Pretraining

This is a flyover A high-altitude tour of the pre-training pipeline. Each stage gets its own deep dive elsewhere in the series; this post exists to give those posts shared vocabulary to refer back to. Pre-training is the process that turns a freshly initialized neural network into a base model that can produce plausible continuations of text. Everything else built on top of an LLM (instruction tuning, RLHF, deployment) assumes pre-training has already happened. ...