Injecting Knowledge into LLMs via Fine-Tuning
A practical guide to injecting new knowledge into LLMs through fine-tuning, using Q&A pairs generated from documentation.
Today marks three years since ChatGPT launched. In this short article I reflect on how far LLMs have come in just a few years, from getting early access to GPT-4 to now running open models that surpass it, and share two graphs illustrating both the progress in open-weight models and the increasingly close race between OpenAI, Google, and Anthropic (with Google currently in the lead).
A guide to running large language models locally: hardware options, inference engines (vLLM, SGLang, llama.cpp), quantization techniques, and user interfaces.
From fine-tunes to founder stacks, the center of gravity is moving east.
How a small draft model can speed up LLM inference by 1.82× without sacrificing quality: benchmarking Qwen3-32B with speculative decoding.
A practical guide to renting GPUs for running open-weight LLMs with control, privacy, and flexibility.
Learn how to properly set up vLLM with GPT-OSS built-in tools and integrate it with LibreChat to take advantage of those capabilities.