Artificial Intelligence
Ollama 3.0: Real-Time 70B Inference via Speculative Decoding
Ollama 3.0 uses speculative decoding to run Llama 3 70B at real-time speeds on local hardware.
Rody • Mar 1, 2026 • 7 min read
Article
PyTorch 3.0: Native 1-bit LLM Training & Distributed Inference
PyTorch 3.0 introduces native 1-bit LLM training and optimized distributed inference, enabling massive models on consumer hardware.
Rody • Mar 1, 2026 • 6 min read