Positional Encoding

Advanced Architectures
Last updated: 2025-01-15
Also known as: position embeddings

What is Positional Encoding?


Positional encoding is a mechanism in transformer models that injects information about token position or sequence order into the model's representations. Unlike recurrent neural networks that process sequences sequentially and inherently capture positional information, transformers process all positions in parallel through attention mechanisms. Without positional encoding, the model would be permutation-invariant, unable to distinguish "the cat sat on the mat" from "mat the on sat cat the."


The original transformer architecture used sinusoidal positional encodings with different frequencies for different dimensions, creating unique patterns for each position that the model could learn to interpret. Modern approaches include learned positional embeddings (trainable vectors for each position), relative positional encodings (representing distances between positions), and rotary positional embeddings (RoPE) that have become popular in recent LLMs. Each approach has different properties regarding sequence length generalization and computational efficiency.
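To make the original sinusoidal scheme concrete, here is a minimal NumPy sketch of the encoding matrix described above; the function name and shapes are illustrative, not from any particular library.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model / 2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)   # one frequency per dimension pair
    angles = positions * angle_rates                        # (seq_len, d_model / 2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# The encodings are simply added to the token embeddings before the first layer:
# inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because each position gets a distinct pattern of phases across frequencies, the model can learn to read off both absolute positions and, via trigonometric identities, relative offsets between positions.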


Positional encoding is fundamental to how transformers understand sequence structure and context. The choice of positional encoding scheme affects the model's maximum context length, ability to generalize to longer sequences than seen during training, and how it represents temporal or sequential relationships. Recent innovations in positional encoding, like RoPE and ALiBi (Attention with Linear Biases), have enabled models to handle much longer context windows, directly impacting the practical capabilities of LLM-based applications.
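As a rough illustration of the rotary approach mentioned above, the sketch below rotates each pair of query/key dimensions by a position-dependent angle, so attention scores depend on relative offsets. It assumes the common half-split pairing convention; details vary across implementations.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply a rotary positional embedding (RoPE) to a (seq_len, d) array of
    query or key vectors. Each dimension pair is rotated by an angle that grows
    linearly with position, so dot products between rotated vectors depend on
    the relative distance between positions rather than their absolute values."""
    seq_len, d = x.shape
    half = d // 2
    freqs = 1.0 / (base ** (np.arange(half) / half))        # per-pair rotation frequencies
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, d / 2)
    cos, sin = np.cos(angles), np.sin(angles)

    x1, x2 = x[:, :half], x[:, half:]   # split vector into rotation pairs
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because the rotation is applied to queries and keys rather than added to embeddings, schemes like this extend more gracefully to longer sequences, which is part of why RoPE and ALiBi appear in many long-context models.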


Related Terms