-
Positional Encoding Explained: Sinusoidal Embeddings and RoPE (Part 2)
Binary encoding gave us the right idea but the wrong shape. Here we replace square waves with smooth ones to derive sinusoidal positional encoding from scratch, and build up to Rotary Position Embeddings (RoPE), the method behind LLaMA, Mistral, and Gemma.
-
Positional Encoding Explained: From Position to Binary Encoding (Part 1)
Language models process text as a sequence of tokens. While token embeddings can represent the meaning of individual words, they do not inherently represent where those words appear in the sequence.
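
To make that concrete, here is a minimal sketch in plain Python. The embedding table `EMBEDDINGS`, its values, and the helper names `embed` and `bag_sum` are all made up for illustration and do not come from any real model or library. It shows that a lookup returns the same vector for a token wherever it appears, so an order-insensitive summary of the sequence cannot tell two word orders apart.

```python
# Hypothetical toy embedding table: token -> vector (values are invented).
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [0.5, 0.5],
}


def embed(tokens):
    """Look up each token's vector; the token's position plays no role."""
    return [EMBEDDINGS[token] for token in tokens]


def bag_sum(vectors):
    """Order-insensitive summary: element-wise sum over all vectors."""
    return [sum(components) for components in zip(*vectors)]


a = embed(["dog", "bites", "man"])
b = embed(["man", "bites", "dog"])

# "dog" maps to the same vector at position 0 (in a) and position 2 (in b).
same_vector = a[0] == b[2]

# Summing discards order, so both sentences collapse to the same summary.
same_summary = bag_sum(a) == bag_sum(b)

print(same_vector, same_summary)
```

Both comparisons hold even though the two sentences mean very different things. Positional encoding exists to close exactly this gap by giving each position its own signal.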