Forecasting with Foundation Models: Capacity Planning and Incident Detection

Foundation models like TimesFM forecast numerical time series zero-shot. I tested them on payment transaction volume, added holiday covariates, and learned where they help and where they hurt.

Jun 24, 2026 AI

Running DeepSeek-V4-Flash at 700 tokens/s on 2x RTX Pro 6000

Run DeepSeek-V4-Flash on a 2x RTX Pro 6000 (96GB each) workstation using the voipmonitor/vllm:lucifer Docker image, a Blackwell-targeted vLLM fork with sm_120 kernels, FP8 KV cache, and MTP speculative decoding.

Jun 20, 2026 AI

Coding locally with Pi Coding agent and open weights models (April 2026 edition)

Run Qwen3.6-27B, Gemma 4 31B, and MiniMax M2.7 locally, then connect them to the Pi coding agent.

Apr 26, 2026 AI

What Happened When I Gave Luna Access to My Email

I gave Luna access to my email. Here's what happened in a single day.

Feb 22, 2026 AI

Luna: An AI Assistant That Works While I Sleep

Luna monitors, follows up, and takes action on her own, powered by a local LLM and a team of background agents.

Jan 31, 2026 AI

Fixing RTX Pro 6000 Blackwell shutdowns with custom fan control

Fix unexpected shutdowns under sustained load on the RTX Pro 6000 Blackwell with a small NVML fan control daemon and systemd.

Jan 17, 2026 AI

The age of hyper-personalized software

Why I run local LLMs to power a multimodal event crawler

Dec 30, 2025 AI

Running MiniMax-M2.1 Locally with Claude Code on Dual RTX Pro 6000

Run Claude Code with your own local MiniMax-M2.1 model using vLLM's native Anthropic API endpoint support.

Dec 27, 2025 AI

Guide on installing and running the best models on a dual RTX Pro 6000 rig with vLLM

Step-by-step vLLM stable/nightly install on Ubuntu 24.04 for a dual RTX Pro 6000 (96GB x2), model download workflow, and a fix for tp=2 hangs (IOMMU). Includes tested serve commands for Devstral 123B, GLM-4.5/4.6V, Qwen3 235B, MiniMax-M2, and gpt-oss-120b.

Dec 25, 2025 AI

Injecting Knowledge into LLMs via Fine-Tuning

A practical guide to injecting new knowledge into LLMs through fine-tuning, using Q&A pairs generated from documentation.

Dec 21, 2025 AI, Development