Simon's Journal

https://bearisland.dev/
Recent content on Simon's Journal
Sun, 07 Jun 2026 00:00:00 +0000

Pre-training: An Overview
https://bearisland.dev/posts/pretraining-overview/
Sat, 06 Jun 2026 00:00:00 +0000
A flyover of the pre-training pipeline. Text in, trained model out. Each stage gets its own deep dive in the rest of the series.

Tokens and Tokenization
https://bearisland.dev/posts/tokens-and-tokenization/
Sun, 07 Jun 2026 00:00:00 +0000
How LLMs split text into tokens, the BPE algorithm, and why 'strawberry' has 3 r's the model can't see.