What is Context Length?
Context length refers to the maximum number of tokens that a large language model can process in a single request, encompassing both the input prompt and the generated output. This limit is determined by the model's architecture and training, particularly its positional encoding scheme and the memory cost of its attention mechanism. Context lengths in current models range from a few thousand tokens to over a million in the most capable models.
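Because the input and output share one budget, the tokens available for generation are whatever remains after the prompt. The sketch below illustrates this arithmetic; the 4-characters-per-token ratio is only a rough rule of thumb for English text (a real system should count with the model's own tokenizer), and the function names and the 8192 limit are illustrative assumptions, not any particular model's API.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count using ~4 characters per token.

    This heuristic stands in for a real tokenizer, which is the
    only accurate way to count tokens for a given model.
    """
    return max(1, len(text) // 4)

def max_output_tokens(prompt: str, context_limit: int) -> int:
    """Tokens left for generation after the prompt is accounted for."""
    remaining = context_limit - estimate_tokens(prompt)
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the context limit")
    return remaining

# Example: a prompt of ~1000 tokens against an assumed 8192-token limit
# leaves roughly 7000 tokens for the model's response.
prompt = "a" * 4000  # ~1000 tokens under the heuristic
print(max_output_tokens(prompt, context_limit=8192))
```

In practice the caller also reserves headroom explicitly (a `max_tokens`-style parameter) so a long response cannot silently run into the limit mid-generation.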
The context length represents a hard constraint on how much information can be provided to the model at once. For conversational agents, this limits how much conversation history can be included. For RAG systems, it constrains how many retrieved documents can be incorporated into the prompt. When working with long documents, the context length may require chunking or summarization strategies to fit the content within the model's processing capacity.
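A minimal sketch of the chunking strategy mentioned above: split a long document into overlapping pieces that each fit a per-chunk token budget, so they can be processed one at a time. The character-based token estimate and the parameter defaults are assumptions for illustration; the overlap keeps sentences that straddle a boundary visible in both neighboring chunks.

```python
def chunk_text(text: str,
               max_tokens: int,
               overlap_tokens: int = 50,
               chars_per_token: int = 4) -> list[str]:
    """Split text into overlapping chunks that each fit a token budget.

    Uses a rough chars-per-token heuristic; a production splitter would
    measure chunks with the target model's tokenizer and prefer to break
    on sentence or paragraph boundaries rather than raw character offsets.
    """
    max_chars = max_tokens * chars_per_token
    step = (max_tokens - overlap_tokens) * chars_per_token
    if step <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    return [text[i:i + max_chars] for i in range(0, len(text), step)]

document = "0123456789" * 1000          # a ~2500-token document
chunks = chunk_text(document, max_tokens=500, overlap_tokens=50)
```

Each chunk can then be summarized or embedded independently, with the overlap preventing information loss at the seams.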
Managing context length is a critical consideration in agent system design. Developers must balance providing sufficient context for accurate responses against the finite context budget. Strategies include prioritizing recent or relevant information, using summarization to compress older context, implementing hierarchical processing for long documents, and carefully tracking token usage to avoid exceeding limits. As models with longer context windows become available, some of these constraints ease, but efficient context management remains important for performance and cost optimization.
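One of the strategies above, prioritizing recent information, can be sketched as a history trimmer that always keeps the system prompt and then retains the newest turns that fit the remaining budget. The message structure, function name, and chars-per-token heuristic are illustrative assumptions, not a specific framework's API.

```python
def trim_history(messages: list[dict],
                 budget_tokens: int,
                 chars_per_token: int = 4) -> list[dict]:
    """Keep the system prompt plus the most recent turns within budget.

    Assumes messages[0] is a system prompt that must always survive;
    older user/assistant turns are dropped first. Token costs use a
    rough chars-per-token heuristic in place of a real tokenizer.
    """
    def cost(msg: dict) -> int:
        return max(1, len(msg["content"]) // chars_per_token)

    system, turns = messages[0], messages[1:]
    budget = budget_tokens - cost(system)
    kept = []
    for msg in reversed(turns):        # walk newest-first
        if cost(msg) > budget:
            break                      # oldest remaining turns are dropped
        kept.append(msg)
        budget -= cost(msg)
    return [system] + list(reversed(kept))
```

A fancier variant would summarize the dropped turns into a single compressed message instead of discarding them outright, trading a little fidelity for a much longer effective memory.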