Injecting Knowledge into LLMs via Fine-Tuning
A practical guide to injecting new knowledge into LLMs through fine-tuning, using Q&A pairs generated from documentation.
Today marks three years since ChatGPT launched. In this short article I reflect on how far LLMs have come in just a few years, from getting early access to GPT-4 to now running open models that surpass it, and share two graphs illustrating both the progress in open-weight models and the increasingly close race between OpenAI, Google, and Anthropic (with Google currently in the lead).
A guide to running large language models locally: hardware options, inference engines (vLLM, SGLang, llama.cpp), quantization techniques, and user interfaces.
From fine-tunes to founder stacks, the center of gravity is moving east.
How a small draft model can speed up LLM inference by 1.82× without sacrificing quality: benchmarking Qwen3-32B with speculative decoding.
A practical guide to renting GPUs for running open-weight LLMs with control, privacy, and flexibility.
Learn how to properly set up vLLM with GPT-OSS built-in tools and integrate it with LibreChat to take advantage of those capabilities.