Learning Timeline
Key Insights
Trimming vs. Summarization
Trimming is free and fast (it adds no latency) but risks losing older information. Summarization preserves the essential information but adds API cost and latency, since it requires an additional model call to produce the summary.
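The summarization trade-off can be sketched as follows: an extra model call (represented here by a pluggable `summarize` callable standing in for a real LLM request, which is where the added cost and latency come from) compresses older turns into a compact memory message that is re-injected ahead of the recent history. The message shape and helper names are illustrative assumptions, not the SDK's actual API.

```python
def summarize_old_turns(messages, keep_last, summarize):
    """Compress all but the last `keep_last` messages into a memory message."""
    old, recent = messages[:-keep_last], messages[-keep_last:]
    if not old:
        return list(messages)  # nothing old enough to summarize
    summary = summarize(old)  # the additional model call: extra cost + latency
    memory = {"role": "system",
              "content": f"Summary of earlier conversation: {summary}"}
    return [memory] + recent

# Stub summarizer for illustration; a real one would call the model.
result = summarize_old_turns(
    [{"role": "user", "content": "A"},
     {"role": "assistant", "content": "B"},
     {"role": "user", "content": "C"}],
    keep_last=1,
    summarize=lambda msgs: f"{len(msgs)} earlier messages",
)
```

Trimming, by contrast, would simply drop the old messages without the extra call, trading information retention for zero added latency.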
The 'Do Not Break Turns' Golden Rule
Never trim in the middle of a 'turn'. A single turn encompasses everything from the user's message through the agent's final response. Splitting this block can cause the agent to lose focus or lose track of the task objectives.
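The golden rule can be sketched as turn-aware trimming: the hypothetical message dicts below use a "role" key, and turn boundaries are detected at user messages, so a turn (user message through the agent's final reply) is always dropped or kept whole, never split.

```python
def trim_to_last_n_turns(messages, n):
    """Keep only the last n complete turns. A turn starts at a user message."""
    # Indices where each turn begins (every user message opens a new turn).
    turn_starts = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(turn_starts) <= n:
        return list(messages)  # nothing to trim
    cut = turn_starts[-n]      # start of the oldest turn we keep
    return messages[cut:]      # cutting here never splits a turn

history = [
    {"role": "user", "content": "Book a flight"},
    {"role": "assistant", "content": "Which city?"},
    {"role": "user", "content": "Paris"},
    {"role": "assistant", "content": "Done."},
    {"role": "user", "content": "Now a hotel"},
    {"role": "assistant", "content": "Booked."},
]

trimmed = trim_to_last_n_turns(history, 2)
# The oldest turn (flight request plus follow-up question) is dropped whole.
```

Because the cut point is always a turn boundary, the agent never sees an orphaned assistant reply without the user message that prompted it.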
Prompt Best Practices for Context Efficiency
Use explicit and structured language. Including too many tool definitions in the context creates 'noise'. It is more efficient to give each agent a small set of targeted tools with clear boundaries than to overload a single agent with every available tool.
Step by Step
Managing Context Life Cycle in Agent Workflows
- Identify the agent workflow type—whether 'Knowledge heavy', 'Tool heavy', or 'Conversational'—to determine the best context management strategy.
- Monitor token usage within sessions to detect 'context bursts' (sudden spikes in context window usage).
- Apply 'Reshape and Fit' techniques to reduce context size without compromising agent functionality.
- Enable 'Context Trimming' by setting a value for the last 'n' turns to retain the most recent messages and discard older history.
- Use 'Context Compaction' to remove large payloads or tool call results from conversation history while maintaining the tool placeholder structure.
- Implement 'Context Summarization' to compress old conversations into a compact 'Memory' object before re-injecting it into the context.
- Set warning thresholds at 40% or 80% of context window capacity to automatically trigger cleanup techniques.
- Open the 'Configurations' page in the OpenAI Agents SDK demo to select the appropriate context management technique for specific agents.
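The threshold-triggered cleanup described above can be sketched by combining two of the techniques: context compaction (replacing bulky tool results with a placeholder while keeping the tool-call structure intact) runs once token usage crosses a warning threshold. The message shape, the crude characters-per-token estimate, and the helper names are illustrative assumptions, not the SDK's actual API.

```python
def estimate_tokens(messages):
    """Rough heuristic: ~4 characters per token."""
    return sum(len(str(m.get("content", ""))) for m in messages) // 4

def compact_tool_results(messages, max_chars=200):
    """Replace oversized tool outputs with a placeholder, keeping structure."""
    compacted = []
    for m in messages:
        if m.get("role") == "tool" and len(str(m.get("content", ""))) > max_chars:
            # The tool message stays in place; only its payload is replaced.
            m = {**m, "content": "[tool result compacted]"}
        compacted.append(m)
    return compacted

def maybe_cleanup(messages, context_window, warn_ratio=0.8):
    """Trigger compaction once usage crosses the warning threshold (e.g. 80%)."""
    if estimate_tokens(messages) >= warn_ratio * context_window:
        return compact_tool_results(messages)
    return messages

messages = [
    {"role": "user", "content": "Fetch the report"},
    {"role": "tool", "content": "x" * 1000},  # large payload bloating context
]
cleaned = maybe_cleanup(messages, context_window=100)
```

A lower threshold (e.g. 40%) could trigger the cheaper compaction first, reserving trimming or summarization for the higher one.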