Chapter 9 Context Engineering
In previous chapters, we have introduced memory systems and RAG for agents. However, to enable agents to stably "think" and "act" in real complex scenarios, memory and retrieval alone are not enough—we need an engineering methodology to continuously and systematically construct appropriate "context" for the model. This is the theme of this chapter: Context Engineering. It focuses on "how to assemble and optimize input context in a reusable, measurable, and evolvable way before each model call", thereby improving correctness, robustness, and efficiency[1][2].
To enable readers to quickly experience the complete functionality of this chapter, we provide a directly installable Python package. You can install the version corresponding to this chapter with the following command:
```bash
pip install "hello-agents[all]==0.2.7"
```

This chapter mainly introduces the core concepts and practices of context engineering, and adds a context builder and two supporting tools to the HelloAgents framework:
- ContextBuilder (hello_agents/context/builder.py): Context builder that implements the GSSC (Gather-Select-Structure-Compress) pipeline, providing a unified context management interface
- NoteTool (hello_agents/tools/builtin/note_tool.py): Structured note tool that supports persistent memory management for agents
- TerminalTool (hello_agents/tools/builtin/terminal_tool.py): Terminal tool that supports file system operations and just-in-time context retrieval for agents
These components together constitute a complete context engineering solution, which is key to implementing long-term task management and agentic search, and will be introduced in detail in subsequent sections.
In addition to installing the framework, you also need to configure the LLM API in .env. The examples in this chapter mainly use large language models for context management and intelligent decision-making.
After configuration is complete, you can start the learning journey of this chapter!
9.1 What is Context Engineering
After years in which prompt engineering was the focus of applied AI, a new term has come to the forefront: Context Engineering. Today, building systems with language models is no longer just about finding the right phrasing and wording in prompts, but about answering a more macro question: What kind of context configuration is most likely to make the model produce the behavior we expect?
The so-called "context" refers to the set of tokens included when sampling a large language model (LLM). The engineering problem at hand is to optimize the utility of these tokens under the inherent constraints of the LLM, in order to stably obtain expected results. To effectively harness LLMs, it is often necessary to "think in context"—that is: at any call, examine the overall state visible to the LLM and predict the behavior this state might induce.
This section will explore the emerging context engineering and provide a refined mental model for building controllable and effective agents.
Context Engineering vs. Prompt Engineering
As shown in Figure 9.1, from the perspective of leading model vendors, context engineering is the natural evolution of prompt engineering. Prompt engineering focuses on how to write and organize LLM instructions to obtain better results (such as system prompt writing and structured strategies); while context engineering is how to plan and maintain the "optimal information set (tokens)" during the inference stage, which includes not only the prompt itself, but also all other information that will enter the context window.
In the early stages of LLM engineering, prompts were often the main work, because most use cases (except daily chat) required fine-tuned prompt optimization for single-turn classification or text generation. As the name suggests, the core of prompt engineering is "how to write effective prompts", especially system prompts. However, as we begin to engineer stronger agents that work over longer time spans and across multiple inference rounds, we need strategies that can manage the entire context state—including system instructions, tools, MCP (Model Context Protocol), external data, message history, etc.
An agent running in a loop will continuously generate data that may be relevant to the next round of inference. This information must be periodically refined. Therefore, the "art and technique" of context engineering lies in identifying which content should enter the limited context window from the continuously expanding "candidate information universe".
9.2 Why Context Engineering is Important
Although models are getting faster and can handle larger data scales, we observe that: like humans, LLMs will "wander" or "get confused" at a certain point. Needle-in-a-haystack benchmarks reveal a phenomenon: context rot—as the number of tokens in the context window increases, the model's ability to accurately recall information from the context actually decreases.
Different models may have smoother degradation curves, but this characteristic appears in almost all models. Therefore, context must be viewed as a limited resource with diminishing marginal returns. Just as humans have limited working memory capacity, LLMs also have an "attention budget". Each new token consumes part of this budget, so we need to be more careful about which tokens should be provided to the LLM.
This scarcity is not accidental, but stems from the architectural constraints of LLMs. Transformers allow each token to establish associations with all tokens in the context, theoretically forming n² pairwise attention relationships. As the context length grows, the model's ability to model these pairwise relationships is "stretched thin", naturally creating tension between "context scale" and "attention concentration". In addition, the model's attention patterns come from the training data distribution—short sequences are usually more common than long sequences, so the model has less experience with "full-context dependencies" and fewer specialized parameters.
Techniques such as position encoding interpolation can allow models to "adapt" to sequences longer than during training at inference time, but at the cost of some precision in understanding token positions. Overall, these factors together form a performance gradient rather than a "cliff-like" collapse: models are still powerful in long contexts, but compared to short contexts, their precision in information retrieval and long-range reasoning will decline.
Based on the above reality, conscious context engineering becomes a necessity for building robust agents.
9.2.1 The "Anatomy" of Effective Context
Under the constraint of "limited attention budget", the goal of excellent context engineering is: maximize the probability of obtaining expected results with as few but high signal density tokens as possible. In practice, we recommend engineering around the following components:
System Prompt: Use clear, direct language, with information pitched at the "right altitude" of detail. Common pitfalls sit at two extremes:
- Over-hardcoding: Writing complex, fragile if-else logic in prompts, with high long-term maintenance costs and fragility.
- Too vague: Only providing macro goals and generalized guidance, lacking specific signals for the expected output, or assuming an incorrect "shared context".

It is recommended to organize prompts into sections (such as background, instructions, tool guidance, and output description), separated by XML tags or Markdown headers. Regardless of format, the pursuit is the "minimum necessary information set" that fully outlines expected behavior ("minimum" does not mean "shortest"). First run the best model on a minimal prompt, then add explicit instructions and examples based on observed failure modes.
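To make the sectioning idea concrete, here is a minimal sketch of a sectioned system prompt. The section names (background, instructions, tool_guidance, output_format) and the helper are purely illustrative, not a prescribed schema:

```python
import re

# Illustrative sectioned system prompt; tag names are an assumption,
# not a fixed convention — pick names that fit your own domain.
SYSTEM_PROMPT = """\
<background>
You assist engineers with data-pipeline code reviews.
</background>

<instructions>
- Answer only from the provided evidence; say "unknown" otherwise.
- Prefer concrete diffs over general advice.
</instructions>

<tool_guidance>
Use the read-file tool before proposing edits; never guess file contents.
</tool_guidance>

<output_format>
Return a short summary followed by a bullet list of findings.
</output_format>
"""

def sections(prompt: str) -> list[str]:
    """List the top-level section tags — handy for debugging and A/B tests."""
    return re.findall(r"^<(\w+)>", prompt, flags=re.MULTILINE)

print(sections(SYSTEM_PROMPT))
# → ['background', 'instructions', 'tool_guidance', 'output_format']
```

Because the skeleton is fixed, a sanity check like `sections()` can verify that every prompt variant still carries the full section set.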
Tools: Tools define the contract between the agent and the information/action space, and must promote efficiency: they must return token-friendly information while encouraging efficient agent behavior. Tools should:
- Have single responsibilities with low overlap, clear interface semantics;
- Be robust to errors;
- Have clear and unambiguous parameter descriptions, fully leveraging the model's strengths in expression and reasoning.

A common failure mode is a "bloated tool set": fuzzy functional boundaries make the decision of "which tool to use" itself ambiguous. If human engineers can't tell which tool to use, don't expect agents to do better. Carefully identifying a "Minimum Viable Tool Set (MVTS)" can often significantly improve stability and maintainability in long-term interactions.
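As an illustration of "single responsibility, unambiguous parameters", here is a hypothetical two-tool spec plus a trivial lint. The tool names, the schema shape, and the lint are all assumptions for demonstration, not a real tool-calling API:

```python
# Hypothetical minimal tool set: each tool does one thing, and the
# descriptions tell the model exactly what inputs look like.
TOOLS = [
    {
        "name": "search_docs",
        "description": "Full-text search over project docs; returns at most `limit` snippets.",
        "parameters": {
            "query": {"type": "string", "description": "Keywords, not a full sentence."},
            "limit": {"type": "integer", "description": "Max snippets, 1-10.", "default": 5},
        },
    },
    {
        "name": "read_doc",
        "description": "Read one document by the exact path returned from search_docs.",
        "parameters": {
            "path": {"type": "string", "description": "Exact path, e.g. docs/setup.md."},
        },
    },
]

def duplicate_names(tools: list[dict]) -> list[str]:
    """One obvious source of ambiguity: two tools sharing a name."""
    names = [t["name"] for t in tools]
    return sorted(n for n in set(names) if names.count(n) > 1)

print(duplicate_names(TOOLS))  # → [] — this set has no name collisions
```

Simple static checks like this won't catch semantic overlap, but they make the "can a human tell which tool to use?" review cheaper to run on every change.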
Few-shot Examples: Always recommend providing examples, but don't recommend stuffing "all boundary conditions" into prompts. Please carefully select a set of diverse and typical examples that directly portray "expected behavior". For LLMs, good examples are worth a thousand words.
The overall guiding principle is: sufficient but compact information. As shown in Figure 9.2, the next step is to bring dynamic retrieval into the runtime loop.
9.2.2 Context Retrieval and Agentic Search
A concise definition: Agent = LLM autonomously calling tools in a loop. As the capabilities of underlying models increase, the autonomy level of agents can be improved: they can more independently explore complex problem spaces and recover from errors.
Engineering practice is gradually transitioning from "one-time retrieval before inference (embedding retrieval)" to "Just-in-time (JIT) context". The latter no longer preloads all relevant data, but maintains lightweight references (file paths, storage queries, URLs, etc.), dynamically loading required data through tools at runtime. This allows the model to write targeted queries, cache necessary results, and analyze large volumes of data with commands like head/tail—without stuffing entire data blocks into context at once. Its cognitive pattern is closer to humans: we don't memorize all information, but use external indexes like file systems, inboxes, bookmarks to extract on demand.
In addition to storage efficiency, metadata of references itself can help refine behavior: directory hierarchy, naming conventions, timestamps, etc., all implicitly convey "purpose and timeliness". For example, tests/test_utils.py and src/core/test_utils.py have different semantic implications.
Allowing agents to autonomously navigate and retrieve also enables progressive disclosure: each interaction step generates new context, which in turn guides the next decision—file size hints at complexity, naming hints at purpose, timestamps hint at relevance. Agents can build understanding layer by layer, keeping only the "currently necessary subset" in working memory, and using "note-taking" for supplementary persistence, thereby maintaining focus rather than being "dragged down by comprehensiveness".
The trade-off is: runtime exploration is often slower than pre-computed retrieval, and requires "opinionated" engineering design to ensure the model has the right tools and heuristics. Without guidance, agents may misuse tools, chase dead ends, or miss key information, causing context waste.
In many scenarios, a hybrid strategy is more effective: preload a small amount of "high-value" context to ensure speed, then allow agents to continue autonomous exploration on demand. The choice of boundaries depends on task dynamics and timeliness requirements. In engineering, you can preload files like "project convention descriptions (such as README/guides)", while providing primitives like glob, grep, allowing agents to retrieve specific files just-in-time, thereby bypassing the sunk costs of outdated indexes and complex syntax trees.
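The "lightweight references, load on demand" idea can be sketched in a few lines. The helper names (`list_refs`, `load_head`) are illustrative, not HelloAgents APIs; the demo uses a throwaway directory standing in for a project workspace:

```python
import tempfile
from pathlib import Path

def list_refs(root: str, pattern: str = "*.md") -> list[str]:
    """Cheap metadata pass: return matching paths, not contents."""
    return sorted(str(p) for p in Path(root).rglob(pattern))

def load_head(path: str, max_chars: int = 500) -> str:
    """Pull only the head of a file on demand, like `head` in a shell."""
    return Path(path).read_text(encoding="utf-8")[:max_chars]

# Demo: one preloaded "high-value" file, everything else fetched just in time
root = tempfile.mkdtemp()
Path(root, "README.md").write_text("# Project\nConventions live here.")
refs = list_refs(root)
print(len(refs))           # 1 — a reference costs almost no tokens
print(load_head(refs[0]))  # content crosses into context only when asked for
```

The hybrid strategy in the text maps directly onto this: preload `load_head("README.md")` for speed, and expose `list_refs`/`load_head` as tools for everything else.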
9.2.3 Context Engineering for Long-Horizon Tasks
Long-horizon tasks require agents to maintain coherence, context consistency, and goal orientation across action sequences that exceed the context window—for example, large codebase migrations or systematic research spanning hours. Simply growing the context window cannot cure "context pollution" and relevance degradation, so we need engineering methods that face these constraints directly: Compaction, Structured note-taking, and Sub-agent architectures.
Compaction
- Definition: When a conversation approaches the context limit, perform high-fidelity summarization and restart a new context window with the summary to maintain long-range coherence.
- Practice: Have the model compress and retain architectural decisions, unresolved defects, implementation details, discarding repetitive tool outputs and noise; the new window carries the compressed summary + a few recent highly relevant artifacts (such as "recently accessed files").
- Tuning suggestions: First optimize recall (ensure no key information is missed), then optimize precision (remove redundant content); a safe "light-touch" compression is to clean up "tool calls and results in deep history".
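The shape of compaction can be sketched without an LLM. This toy version assumes message dicts with `role`/`content` keys and applies only the safe "light-touch" step from the tuning suggestions (dropping deep-history tool output); a real system would summarize with a model instead of concatenating:

```python
def compact(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Collapse deep history into one summary message, keep a recent tail."""
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Light-touch compression: strip bulky tool results from deep history,
    # retain decisions and user intent verbatim (a stand-in for LLM summary).
    kept = [m["content"] for m in old if m["role"] != "tool"]
    summary = {"role": "system",
               "content": "Summary of earlier turns:\n" + "\n".join(kept)}
    return [summary] + recent

history = (
    [{"role": "user", "content": "Migrate module A"},
     {"role": "tool", "content": "(3000 lines of diff output)"},
     {"role": "assistant", "content": "Decision: keep API v1 shim"}]
    + [{"role": "user", "content": f"step {i}"} for i in range(4)]
)
compacted = compact(history)
print(len(compacted))                             # → 5: summary + 4 recent
print("diff output" in compacted[0]["content"])   # → False: tool noise dropped
```

Note that the architectural decision ("keep API v1 shim") survives the reset while the bulky tool output does not—exactly the recall-first, precision-second ordering recommended above.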
Structured note-taking
- Definition: Also called "agent memory". Agents write key information to persistent storage outside the context at fixed frequencies, pulling it back on demand in subsequent stages.
- Value: Maintain persistent state and dependencies with extremely low context overhead. For example, maintaining TODO lists, project NOTES.md, indexes of key conclusions/dependencies/blockers, maintaining progress and consistency across dozens of tool calls and multiple context resets.
- Note: Equally effective in non-coding scenarios (such as long-term strategic tasks, goal management, and statistical counting in games/simulations). Combined with the MemoryTool from Chapter 8, file-based or vector-based external memory can be easily implemented and retrieved at runtime.
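A minimal note store shows the mechanism: key state lives in a file outside the context window and survives context resets. This is a sketch under assumed semantics (JSON key-value notes); the file name and class are illustrative, not the NoteTool API:

```python
import json
import tempfile
from pathlib import Path

class Notes:
    """Persist agent notes to disk; read them back after a context reset."""
    def __init__(self, path):
        self.path = Path(path)

    def write(self, key: str, value) -> None:
        data = json.loads(self.path.read_text()) if self.path.exists() else {}
        data[key] = value
        self.path.write_text(json.dumps(data, indent=2))

    def read(self, key: str, default=None):
        if not self.path.exists():
            return default
        return json.loads(self.path.read_text()).get(key, default)

notes = Notes(Path(tempfile.mkdtemp()) / "NOTES.json")
notes.write("todo", ["migrate utils", "fix flaky test"])
notes.write("blockers", ["CI quota"])
# ...the context window resets here; the notes do not...
print(notes.read("todo"))  # → ['migrate utils', 'fix flaky test']
```

The context cost of this pattern is only whatever subset the agent chooses to `read` back, which is why it scales across dozens of tool calls and multiple resets.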
Sub-agent architectures
- Idea: The main agent is responsible for high-level planning and synthesis, while multiple specialized sub-agents each dig deep, call tools, and explore in "clean context windows", finally only returning condensed summaries (typically 1,000–2,000 tokens).
- Benefits: Achieve separation of concerns. Complex search contexts remain internal to sub-agents, while the main agent focuses on integration and reasoning; suitable for complex research/analysis tasks requiring parallel exploration.
- Experience: Public multi-agent research systems show that this pattern has significant advantages over single-agent baselines in complex research tasks.
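The structure of the pattern—clean contexts inside, condensed digests out—can be shown with stubs. Here `run_subtask` stands in for a real LLM-backed sub-agent call; only its short return value crosses back to the orchestrator:

```python
def run_subtask(topic: str) -> str:
    """Stand-in for a sub-agent: imagine many tool calls and a large
    internal context here. Only a condensed digest crosses the boundary."""
    return f"[{topic}] key findings: ... (condensed to ~1-2k tokens)"

def orchestrate(topics: list[str]) -> str:
    """Main agent: plan, fan out, then synthesize the digests."""
    summaries = [run_subtask(t) for t in topics]  # could run in parallel
    return "Synthesis:\n" + "\n".join(summaries)

report = orchestrate(["API usage", "error patterns"])
print(report.count("key findings"))  # → 2, one digest per sub-agent
```

The separation of concerns is visible in the types: the orchestrator never sees a sub-agent's raw exploration context, only the string each one chooses to return.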
Method trade-offs can follow these rules of thumb:
- Compaction: Suitable for tasks requiring long conversation continuity, emphasizing context "relay".
- Structured note-taking: Suitable for iterative development and research with milestones/phased results.
- Sub-agent architectures: Suitable for complex research and analysis that can benefit from parallel exploration.
Even as model capabilities continue to improve, "maintaining coherence and focus in long interactions" remains a core challenge in building robust agents. Careful and systematic context engineering will maintain its key value in the long term.
9.3 Practice in Hello-Agents: ContextBuilder
This section will detail the context engineering practice in the HelloAgents framework. We will demonstrate step by step how to build a production-grade context management system, from design motivation and core data structures through implementation details to a complete case. The design philosophy of ContextBuilder is "simple and efficient": it removes unnecessary complexity and selects uniformly based on a "relevance + recency" score, in line with the engineering goals of agent modularity and maintainability.
9.3.1 Design Motivation and Goals
Before building ContextBuilder, we first need to clarify its design goals and core value. An excellent context management system should solve the following key problems:
Unified Entry: Abstract "Gather-Select-Structure-Compress" as a reusable pipeline, reducing repetitive template code in Agent implementations. This unified interface design allows developers to avoid repeatedly writing context management logic in each Agent.
Stable Form: Output a context template with a fixed skeleton, facilitating debugging, A/B testing, and evaluation. We adopted a sectioned template structure:
- [Role & Policies]: Clarify the Agent's role positioning and behavioral guidelines
- [Task]: The specific task currently to be completed
- [State]: The Agent's current state and context information
- [Evidence]: Evidence information retrieved from external knowledge bases
- [Context]: Historical dialogue and related memories
- [Output]: Expected output format and requirements
Budget Guardian: Retain high-value information as much as possible within the token budget, providing fallback compression strategies for over-limit contexts. This ensures that even in scenarios with huge amounts of information, the system can run stably.
Minimum Rules: Do not introduce classification dimensions such as source/priority to avoid complexity growth. Practice shows that a simple scoring mechanism based on relevance and recency is effective enough in most scenarios.
9.3.2 Core Data Structures
The implementation of ContextBuilder relies on two core data structures that define the system's configuration and information units.
(1) ContextPacket: Candidate Information Package
from dataclasses import dataclass
from typing import Optional, Dict, Any
from datetime import datetime
@dataclass
class ContextPacket:
"""Candidate information package
Attributes:
content: Information content
timestamp: Timestamp
token_count: Token count
relevance_score: Relevance score (0.0-1.0)
metadata: Optional metadata
"""
content: str
timestamp: datetime
token_count: int
relevance_score: float = 0.5
metadata: Optional[Dict[str, Any]] = None
def __post_init__(self):
"""Post-initialization processing"""
if self.metadata is None:
self.metadata = {}
# Ensure relevance score is within valid range
self.relevance_score = max(0.0, min(1.0, self.relevance_score))

ContextPacket is the basic unit of information in the system. Each candidate information item is encapsulated as a ContextPacket, containing core attributes such as content, timestamp, token count, and relevance score. This unified data structure simplifies subsequent selection and sorting logic.
(2) ContextConfig: Configuration Management
@dataclass
class ContextConfig:
"""Context building configuration
Attributes:
max_tokens: Maximum token count
reserve_ratio: Ratio reserved for system instructions (0.0-1.0)
min_relevance: Minimum relevance threshold
enable_compression: Whether to enable compression
recency_weight: Recency weight (0.0-1.0)
relevance_weight: Relevance weight (0.0-1.0)
"""
max_tokens: int = 3000
reserve_ratio: float = 0.2
min_relevance: float = 0.1
enable_compression: bool = True
recency_weight: float = 0.3
relevance_weight: float = 0.7
def __post_init__(self):
"""Validate configuration parameters"""
assert 0.0 <= self.reserve_ratio <= 1.0, "reserve_ratio must be in [0, 1] range"
assert 0.0 <= self.min_relevance <= 1.0, "min_relevance must be in [0, 1] range"
assert abs(self.recency_weight + self.relevance_weight - 1.0) < 1e-6, \
"recency_weight + relevance_weight must equal 1.0"

ContextConfig encapsulates all configurable parameters, making system behavior flexibly adjustable. Particularly noteworthy is the reserve_ratio parameter, which ensures that key information such as system instructions always has sufficient space and will not be squeezed out by other information.
9.3.3 GSSC Pipeline Detailed Explanation
The core of ContextBuilder is the GSSC (Gather-Select-Structure-Compress) pipeline, which decomposes the context building process into four clear stages. Let's dive into the implementation details of each stage.
(1) Gather: Multi-source Information Collection
The first stage is to collect candidate information from multiple sources. The key to this stage is fault tolerance and flexibility.
def _gather(
self,
user_query: str,
conversation_history: Optional[List[Message]] = None,
system_instructions: Optional[str] = None,
custom_packets: Optional[List[ContextPacket]] = None
) -> List[ContextPacket]:
"""Collect all candidate information
Args:
user_query: User query
conversation_history: Conversation history
system_instructions: System instructions
custom_packets: Custom information packages
Returns:
List[ContextPacket]: Candidate information list
"""
packets = []
# 1. Add system instructions (highest priority, not scored)
if system_instructions:
packets.append(ContextPacket(
content=system_instructions,
timestamp=datetime.now(),
token_count=self._count_tokens(system_instructions),
relevance_score=1.0, # System instructions always retained
metadata={"type": "system_instruction", "priority": "high"}
))
# 2. Retrieve relevant memories from memory system
if self.memory_tool:
try:
memory_results = self.memory_tool.execute(
"search",
query=user_query,
limit=10,
min_importance=0.3
)
# Parse memory results and convert to ContextPacket
memory_packets = self._parse_memory_results(memory_results, user_query)
packets.extend(memory_packets)
except Exception as e:
print(f"[WARNING] Memory retrieval failed: {e}")
# 3. Retrieve relevant knowledge from RAG system
if self.rag_tool:
try:
rag_results = self.rag_tool.execute(
"search",
query=user_query,
limit=5,
min_score=0.3
)
# Parse RAG results and convert to ContextPacket
rag_packets = self._parse_rag_results(rag_results, user_query)
packets.extend(rag_packets)
except Exception as e:
print(f"[WARNING] RAG retrieval failed: {e}")
# 4. Add conversation history (only keep recent N entries)
if conversation_history:
recent_history = conversation_history[-5:] # Default keep recent 5 entries
for msg in recent_history:
packets.append(ContextPacket(
content=f"{msg.role}: {msg.content}",
timestamp=msg.timestamp if hasattr(msg, 'timestamp') else datetime.now(),
token_count=self._count_tokens(msg.content),
relevance_score=0.6, # Base relevance of historical messages
metadata={"type": "conversation_history", "role": msg.role}
))
# 5. Add custom information packages
if custom_packets:
packets.extend(custom_packets)
print(f"[ContextBuilder] Collected {len(packets)} candidate information packages")
return packets

This implementation demonstrates several important design considerations:
- Fault Tolerance Mechanism: Each external data source call is wrapped in try-except, ensuring that failure of a single source does not affect the overall process
- Priority Handling: System instructions are marked as high priority, ensuring they are always retained
- History Limitation: Conversation history only keeps the most recent entries, avoiding the context window being occupied by historical information
(2) Select: Intelligent Information Selection
The second stage is to score and select candidate information based on relevance and recency. This is the core of the entire pipeline and directly determines the quality of the final context.
def _select(
self,
packets: List[ContextPacket],
user_query: str,
available_tokens: int
) -> List[ContextPacket]:
"""Select the most relevant information packages
Args:
packets: Candidate information package list
user_query: User query (for calculating relevance)
available_tokens: Available token count
Returns:
List[ContextPacket]: Selected information package list
"""
# 1. Separate system instructions and other information
system_packets = [p for p in packets if p.metadata.get("type") == "system_instruction"]
other_packets = [p for p in packets if p.metadata.get("type") != "system_instruction"]
# 2. Calculate tokens occupied by system instructions
system_tokens = sum(p.token_count for p in system_packets)
remaining_tokens = available_tokens - system_tokens
if remaining_tokens <= 0:
print("[WARNING] System instructions have occupied all token budget")
return system_packets
# 3. Calculate comprehensive scores for other information
scored_packets = []
for packet in other_packets:
# Calculate relevance score (if not yet calculated)
if packet.relevance_score == 0.5: # Default value, needs recalculation
relevance = self._calculate_relevance(packet.content, user_query)
packet.relevance_score = relevance
# Calculate recency score
recency = self._calculate_recency(packet.timestamp)
# Combined score = relevance weight × relevance + recency weight × recency
combined_score = (
self.config.relevance_weight * packet.relevance_score +
self.config.recency_weight * recency
)
# Filter information below minimum relevance threshold
if packet.relevance_score >= self.config.min_relevance:
scored_packets.append((combined_score, packet))
# 4. Sort by score in descending order
scored_packets.sort(key=lambda x: x[0], reverse=True)
# 5. Greedy selection: fill from high to low score until token limit is reached
selected = system_packets.copy()
current_tokens = system_tokens
for score, packet in scored_packets:
if current_tokens + packet.token_count <= available_tokens:
selected.append(packet)
current_tokens += packet.token_count
else:
# Token budget is full, stop selection
break
print(f"[ContextBuilder] Selected {len(selected)} information packages, total {current_tokens} tokens")
return selected
def _calculate_relevance(self, content: str, query: str) -> float:
"""Calculate relevance between content and query
Uses simple keyword overlap algorithm. In production, can be replaced with vector similarity calculation.
Args:
content: Content text
query: Query text
Returns:
float: Relevance score (0.0-1.0)
"""
# Tokenization (simple implementation, can use more complex tokenizers)
content_words = set(content.lower().split())
query_words = set(query.lower().split())
if not query_words:
return 0.0
# Jaccard similarity
intersection = content_words & query_words
union = content_words | query_words
return len(intersection) / len(union) if union else 0.0
def _calculate_recency(self, timestamp: datetime) -> float:
"""Calculate temporal recency score
Uses exponential decay model, maintains high score within 24 hours, then gradually decays.
Args:
timestamp: Information timestamp
Returns:
float: Recency score (0.0-1.0)
"""
import math
age_hours = (datetime.now() - timestamp).total_seconds() / 3600
# Exponential decay: maintain high score within 24 hours, then gradually decay
decay_factor = 0.1 # Decay coefficient
recency_score = math.exp(-decay_factor * age_hours / 24)
return max(0.1, min(1.0, recency_score))  # Limit to [0.1, 1.0] range

The core algorithm of the selection stage embodies several important engineering considerations:
- Scoring Mechanism: Uses weighted combination of relevance and recency, with configurable weights
- Greedy Algorithm: Fills from high to low score, ensuring selection of the most valuable information within limited budget
- Filtering Mechanism: Filters low-quality information through the min_relevance parameter
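To make the combined score tangible, here is a standalone rerun of the scoring math, reimplementing the same Jaccard relevance and exponential-decay recency as the `_calculate_relevance`/`_calculate_recency` methods above (the example query and content are made up):

```python
import math
from datetime import datetime, timedelta

def jaccard(content: str, query: str) -> float:
    """Same keyword-overlap relevance as _calculate_relevance."""
    a, b = set(content.lower().split()), set(query.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recency(ts: datetime, decay: float = 0.1) -> float:
    """Same exponential-decay recency as _calculate_recency."""
    age_hours = (datetime.now() - ts).total_seconds() / 3600
    return max(0.1, min(1.0, math.exp(-decay * age_hours / 24)))

query = "optimize pandas memory usage"
content = "use category dtype to optimize pandas memory"
rel = jaccard(content, query)   # 3 shared words over 8 unique words
rec = recency(datetime.now() - timedelta(hours=2))
combined = 0.7 * rel + 0.3 * rec  # default relevance/recency weights
print(rel)  # → 0.375
```

With the default weights, a fresh packet earns a near-full recency bonus, so a modest keyword overlap like 0.375 still yields a combined score well above the default `min_relevance` threshold.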
(3) Structure: Structured Output
The third stage is to organize selected information into a structured context template.
def _structure(self, selected_packets: List[ContextPacket], user_query: str) -> str:
"""Organize selected information packages into structured context template
Args:
selected_packets: Selected information package list
user_query: User query
Returns:
str: Structured context string
"""
# Group by type
system_instructions = []
evidence = []
context = []
for packet in selected_packets:
packet_type = packet.metadata.get("type", "general")
if packet_type == "system_instruction":
system_instructions.append(packet.content)
elif packet_type in ["rag_result", "knowledge"]:
evidence.append(packet.content)
else:
context.append(packet.content)
# Build structured template
sections = []
# [Role & Policies]
if system_instructions:
sections.append("[Role & Policies]\n" + "\n".join(system_instructions))
# [Task]
sections.append(f"[Task]\n{user_query}")
# [Evidence]
if evidence:
sections.append("[Evidence]\n" + "\n---\n".join(evidence))
# [Context]
if context:
sections.append("[Context]\n" + "\n".join(context))
# [Output]
sections.append("[Output]\nPlease provide accurate, evidence-based answers based on the above information.")
return "\n\n".join(sections)

The structuring stage organizes scattered information packages into clear sections. This design has several advantages:
- Readability: Clear sections make it easier for both humans and models to understand the context structure
- Debuggability: Problem localization is easier, can quickly identify which area has problematic information
- Extensibility: Adding new information sources only requires creating new sections
(4) Compress: Fallback Compression
The fourth stage is to compress over-limit contexts.
def _compress(self, context: str, max_tokens: int) -> str:
"""Compress over-limit context
Args:
context: Original context
max_tokens: Maximum token limit
Returns:
str: Compressed context
"""
current_tokens = self._count_tokens(context)
if current_tokens <= max_tokens:
return context # No compression needed
print(f"[ContextBuilder] Context over limit ({current_tokens} > {max_tokens}), executing compression")
# Section compression: maintain structural integrity
sections = context.split("\n\n")
compressed_sections = []
current_total = 0
for section in sections:
section_tokens = self._count_tokens(section)
if current_total + section_tokens <= max_tokens:
# Fully retain
compressed_sections.append(section)
current_total += section_tokens
else:
# Partially retain
remaining_tokens = max_tokens - current_total
if remaining_tokens > 50: # Retain at least 50 tokens
# Simple truncation (can use LLM summarization in production)
truncated = self._truncate_text(section, remaining_tokens)
compressed_sections.append(truncated + "\n[... Content compressed ...]")
break
compressed_context = "\n\n".join(compressed_sections)
final_tokens = self._count_tokens(compressed_context)
print(f"[ContextBuilder] Compression complete: {current_tokens} -> {final_tokens} tokens")
return compressed_context
def _truncate_text(self, text: str, max_tokens: int) -> str:
"""Truncate text to specified token count
Args:
text: Original text
max_tokens: Maximum token count
Returns:
str: Truncated text
"""
# Simple implementation: estimate by character ratio
# Should use precise tokenizer in production
char_per_token = len(text) / self._count_tokens(text) if self._count_tokens(text) > 0 else 4
max_chars = int(max_tokens * char_per_token)
return text[:max_chars]
def _count_tokens(self, text: str) -> int:
"""Estimate token count of text
Args:
text: Text content
Returns:
int: Token count
"""
# Simple estimation: Chinese 1 char ≈ 1 token, English 1 word ≈ 1.3 tokens
# Should use actual tokenizer in production
chinese_chars = sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff')
english_words = len([w for w in text.split() if w])
return int(chinese_chars + english_words * 1.3)

The design of the compression stage embodies the principle of "maintaining structural integrity". Even when the token budget is tight, it tries to retain key information from each section.
9.3.4 Complete Usage Example
Now let's demonstrate how to use ContextBuilder in actual projects through a complete example.
(1) Basic Usage
from hello_agents.context import ContextBuilder, ContextConfig
from hello_agents.tools import MemoryTool, RAGTool
from hello_agents.core.message import Message
from datetime import datetime
# 1. Initialize tools
memory_tool = MemoryTool(user_id="user123")
rag_tool = RAGTool(knowledge_base_path="./knowledge_base")
# 2. Create ContextBuilder
config = ContextConfig(
max_tokens=3000,
reserve_ratio=0.2,
min_relevance=0.2,
enable_compression=True
)
builder = ContextBuilder(
memory_tool=memory_tool,
rag_tool=rag_tool,
config=config
)
# 3. Prepare conversation history
conversation_history = [
Message(content="I'm developing a data analysis tool", role="user", timestamp=datetime.now()),
Message(content="Great! Data analysis tools usually need to handle large amounts of data. What tech stack do you plan to use?", role="assistant", timestamp=datetime.now()),
Message(content="I plan to use Python and Pandas, and have completed the CSV reading module", role="user", timestamp=datetime.now()),
Message(content="Good choice! Pandas is very powerful for data processing. Next you may need to consider data cleaning and transformation.", role="assistant", timestamp=datetime.now()),
]
# 4. Add some memories
memory_tool.execute(
"add",
content="User is developing a data analysis tool using Python and Pandas",
memory_type="semantic",
importance=0.8
)
memory_tool.execute(
"add",
content="Completed development of CSV reading module",
memory_type="episodic",
importance=0.7
)
# 5. Build context
context = builder.build(
user_query="How to optimize Pandas memory usage?",
conversation_history=conversation_history,
system_instructions="You are a senior Python data engineering consultant. Your answers need to: 1) Provide specific actionable advice 2) Explain technical principles 3) Provide code examples"
)
print("=" * 80)
print("Built context:")
print("=" * 80)
print(context)
print("=" * 80)

(2) Running Effect Demonstration
After running the above code, you will see the following structured context output:
================================================================================
Built context:
================================================================================
[Role & Policies]
You are a senior Python data engineering consultant. Your answers need to: 1) Provide specific actionable advice 2) Explain technical principles 3) Provide code examples
[Task]
How to optimize Pandas memory usage?
[Evidence]
Core strategies for Pandas memory optimization include:
1. Use appropriate data types (such as category instead of object)
2. Read large files in chunks
3. Use chunksize parameter
---
Data type optimization can significantly reduce memory usage. For example, downgrading int64 to int32 can save 50% memory.
[Context]
user: I'm developing a data analysis tool
assistant: Great! Data analysis tools usually need to handle large amounts of data. What tech stack do you plan to use?
user: I plan to use Python and Pandas, and have completed the CSV reading module
assistant: Good choice! Pandas is very powerful for data processing. Next you may need to consider data cleaning and transformation.
Memory: User is developing a data analysis tool using Python and Pandas
Memory: Completed development of CSV reading module
[Output]
Please provide accurate, evidence-based answers based on the above information.
================================================================================

This structured context contains all necessary information:
- [Role & Policies]: Clarifies the AI's role and answer requirements
- [Task]: Clearly expresses the user's question
- [Evidence]: Relevant knowledge retrieved from the RAG system
- [Context]: Conversation history and related memories, providing sufficient background information
- [Output]: Guides the LLM on how to organize the answer
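Because the output follows a fixed `[Section]` layout, downstream code can pull a single section back out, for example for logging or tests. A minimal sketch (`extract_section` is a hypothetical helper, not part of HelloAgents):

```python
import re

def extract_section(context: str, name: str) -> str:
    """Return the body of a [Name] section, up to the next [Header] or end of text."""
    pattern = rf"\[{re.escape(name)}\]\n(.*?)(?=\n\[|\Z)"
    match = re.search(pattern, context, re.DOTALL)
    return match.group(1).strip() if match else ""

sample = "[Task]\nHow to optimize Pandas memory usage?\n[Output]\nPlease provide accurate answers."
print(extract_section(sample, "Task"))  # -> How to optimize Pandas memory usage?
```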
(3) Integration with Agent
Finally, let's demonstrate how to integrate ContextBuilder into an Agent:
from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
from hello_agents.context import ContextBuilder, ContextConfig
from hello_agents.tools import MemoryTool, RAGTool
class ContextAwareAgent(SimpleAgent):
"""Agent with context awareness capability"""
def __init__(self, name: str, llm: HelloAgentsLLM, **kwargs):
super().__init__(name=name, llm=llm, system_prompt=kwargs.get("system_prompt", ""))
# Initialize context builder
self.memory_tool = MemoryTool(user_id=kwargs.get("user_id", "default"))
self.rag_tool = RAGTool(knowledge_base_path=kwargs.get("knowledge_base_path", "./kb"))
self.context_builder = ContextBuilder(
memory_tool=self.memory_tool,
rag_tool=self.rag_tool,
config=ContextConfig(max_tokens=4000)
)
self.conversation_history = []
def run(self, user_input: str) -> str:
"""Run Agent, automatically build optimized context"""
# 1. Use ContextBuilder to build optimized context
optimized_context = self.context_builder.build(
user_query=user_input,
conversation_history=self.conversation_history,
system_instructions=self.system_prompt
)
# 2. Call LLM with optimized context
messages = [
{"role": "system", "content": optimized_context},
{"role": "user", "content": user_input}
]
response = self.llm.invoke(messages)
# 3. Update conversation history
from hello_agents.core.message import Message
from datetime import datetime
self.conversation_history.append(
Message(content=user_input, role="user", timestamp=datetime.now())
)
self.conversation_history.append(
Message(content=response, role="assistant", timestamp=datetime.now())
)
# 4. Record important interactions to memory system
self.memory_tool.execute(
"add",
content=f"Q: {user_input}\nA: {response[:200]}...", # Summary
memory_type="episodic",
importance=0.6
)
return response
# Usage example
agent = ContextAwareAgent(
name="Data Analysis Consultant",
llm=HelloAgentsLLM(),
system_prompt="You are a senior Python data engineering consultant.",
user_id="user123",
knowledge_base_path="./data_science_kb"
)
response = agent.run("How to optimize Pandas memory usage?")
print(response)

Through this approach, ContextBuilder becomes the "context management brain" of the Agent, automatically handling information collection, filtering, and organization, allowing the Agent to always reason and generate under optimal context.
9.3.5 Best Practices and Optimization Recommendations
When actually applying ContextBuilder, the following best practices are worth noting:
- Dynamically adjust token budget: Adjust max_tokens based on task complexity, using smaller budgets for simple tasks and larger budgets for complex ones.
- Relevance calculation optimization: In production environments, replace simple keyword overlap with vector similarity calculation to improve retrieval quality.
- Caching mechanism: For unchanging system instructions and knowledge base content, implement caching to avoid repeated computation.
- Monitoring and logging: Record statistics for each context build (number of selected packets, token usage rate, etc.) for subsequent optimization.
- A/B testing: For key parameters (such as relevance weight and recency weight), find the optimal configuration through A/B testing.
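For the relevance-calculation recommendation, the keyword-overlap score can be replaced by cosine similarity over embeddings. A sketch with a pluggable `embed` function (the helper names are illustrative; any sentence-embedding model can be dropped in):

```python
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vector_relevance(query: str, text: str,
                     embed: Callable[[str], List[float]]) -> float:
    # embed() could wrap e.g. a sentence-transformers model's encode()
    return cosine(embed(query), embed(text))

# Toy embedding for illustration only: letter-frequency vector
def toy_embed(s: str) -> List[float]:
    return [float(s.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

score = vector_relevance("pandas memory", "optimize pandas memory usage", toy_embed)
print(0.0 <= score <= 1.0)  # -> True
```

In production, `toy_embed` would be replaced by a real embedding model; the cosine computation stays the same.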
9.4 NoteTool: Structured Notes
NoteTool is a structured external memory component provided for "long-horizon tasks". It uses Markdown files as carriers, with YAML front matter in the header to record key information, and the body to record status, conclusions, blockers, and action items. This design combines human readability, version control friendliness, and ease of re-injecting into context, making it an important tool for building long-horizon agents.
9.4.1 Design Philosophy and Application Scenarios
Before diving into implementation details, let's first understand the design philosophy and typical application scenarios of NoteTool.
(1) Why do we need NoteTool?
In Chapter 8, we introduced MemoryTool, which provides powerful memory management capabilities. However, MemoryTool mainly focuses on conversational memory—short-term working memory, episodic memory, and semantic memory. For project-based tasks that require long-term tracking and structured management, we need a lighter, more human-friendly recording method.
NoteTool fills this gap by providing:
- Structured recording: Uses Markdown + YAML format, suitable for both machine parsing and human reading and editing
- Version friendly: Plain text format, naturally supports version control systems like Git
- Low overhead: No complex database operations required, suitable for lightweight state tracking
- Flexible categorization: Flexibly organize notes through type and tags, supporting multi-dimensional retrieval
(2) Typical Application Scenarios
NoteTool is particularly suitable for the following scenarios:
Scenario 1: Long-term Project Tracking
Imagine an agent is assisting with a large codebase refactoring task, which may take days or even weeks. NoteTool can record:
- task_state: Current task status and progress for the current stage
- conclusion: Key conclusions after each stage ends
- blocker: Problems and blocking points encountered
- action: Next action plan
# Record task status
notes.run({
"action": "create",
"title": "Refactoring Project - Phase 1",
"content": "Completed refactoring of data model layer, test coverage reached 85%. Next will refactor business logic layer.",
"note_type": "task_state",
"tags": ["refactoring", "phase1"]
})
# Record blocker
notes.run({
"action": "create",
"title": "Dependency Conflict Issue",
"content": "Found some third-party library versions incompatible, need to resolve. Impact scope: 3 modules in business logic layer.",
"note_type": "blocker",
"tags": ["dependency", "urgent"]
})

Scenario 2: Research Task Management
An intelligent research assistant conducting literature review can use NoteTool to record:
- Core viewpoints of each paper (conclusion)
- Topics to be investigated in depth (action)
- Important references (reference)
Scenario 3: Cooperation with ContextBuilder
Before each round of dialogue, the Agent can retrieve relevant notes through search or list operations and inject them into the context:
# In Agent's run method
def run(self, user_input: str) -> str:
# 1. Retrieve relevant notes
relevant_notes = self.note_tool.run({
"action": "search",
"query": user_input,
"limit": 3
})
# 2. Convert note content to ContextPacket
note_packets = []
for note in relevant_notes:
note_packets.append(ContextPacket(
content=note['content'],
timestamp=datetime.fromisoformat(note['updated_at']),  # parse the ISO string into a datetime
token_count=self._count_tokens(note['content']),
relevance_score=0.7,
metadata={"type": "note", "note_type": note['type']}
))
# 3. Pass notes when building context
context = self.context_builder.build(
user_query=user_input,
custom_packets=note_packets,
...
)

9.4.2 Storage Format Detailed Explanation
NoteTool adopts a hybrid format of Markdown + YAML, which balances structure and readability.
(1) Note File Format
Each note is an independent .md file with the following format:
---
id: note_20250119_153000_0
title: Project Progress - Phase 1
type: task_state
tags: [refactoring, phase1, backend]
created_at: 2025-01-19T15:30:00
updated_at: 2025-01-19T15:30:00
---
# Project Progress - Phase 1
## Completion Status
Completed refactoring of data model layer, main changes include:
1. Unified entity class naming conventions
2. Introduced type hints to improve code maintainability
3. Optimized database query performance
## Test Coverage
- Unit test coverage: 85%
- Integration test coverage: 70%
## Next Steps
1. Refactor business logic layer
2. Resolve dependency conflict issues
3. Increase integration test coverage to 85%

Advantages of this format:
- YAML metadata: Machine-parsable, supports precise field extraction and retrieval
- Markdown body: Human-readable, supports rich formatting (headings, lists, code blocks, etc.)
- Filename as ID: Simplifies management, each note's filename is its unique identifier
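The header/body split can be illustrated with a minimal stdlib-only parser (a sketch handling flat key: value fields only; NoteTool itself uses yaml.safe_load, which also handles lists such as tags):

```python
def split_front_matter(raw: str):
    """Split '---' front matter from the Markdown body (values kept as raw strings)."""
    parts = raw.split('---\n', 2)
    if len(parts) >= 3:
        meta = {}
        for line in parts[1].splitlines():
            if ':' in line:
                key, value = line.split(':', 1)
                meta[key.strip()] = value.strip()
        return meta, parts[2].strip()
    # No front matter: treat the whole file as body
    return {}, raw.strip()

doc = "---\ntitle: Project Progress - Phase 1\ntype: task_state\n---\n\n# Project Progress"
meta, body = split_front_matter(doc)
print(meta["type"])  # -> task_state
print(body)          # -> # Project Progress
```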
(2) Index File
NoteTool maintains a notes_index.json file for quick retrieval and management of notes:
{
"note_20250119_153000_0": {
"id": "note_20250119_153000_0",
"title": "Project Progress - Phase 1",
"type": "task_state",
"tags": ["refactoring", "phase1", "backend"],
"created_at": "2025-01-19T15:30:00",
"updated_at": "2025-01-19T15:30:00",
"file_path": "./notes/note_20250119_153000_0.md"
}
}

The role of this index file:
- Quick retrieval: No need to open each file, search directly from the index
- Metadata management: Centrally manage metadata for all notes
- Integrity check: Can detect missing or corrupted files
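The integrity check can be sketched as a small audit function (a hypothetical helper, assuming the index format shown above):

```python
import json
import os
import tempfile

def find_missing_notes(index_path: str) -> list:
    """Return note IDs from notes_index.json whose .md file no longer exists."""
    with open(index_path, 'r', encoding='utf-8') as f:
        index = json.load(f)
    return [note_id for note_id, meta in index.items()
            if not os.path.exists(meta.get("file_path", ""))]

# Demo: an index pointing at one existing and one missing file
workspace = tempfile.mkdtemp()
good_path = os.path.join(workspace, "note_a.md")
with open(good_path, "w", encoding="utf-8") as f:
    f.write("---\n---\nok")
index_path = os.path.join(workspace, "notes_index.json")
with open(index_path, "w", encoding="utf-8") as f:
    json.dump({
        "note_a": {"file_path": good_path},
        "note_b": {"file_path": os.path.join(workspace, "gone.md")},
    }, f)
print(find_missing_notes(index_path))  # -> ['note_b']
```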
9.4.3 Core Operations Detailed Explanation
NoteTool provides seven core operations covering the complete lifecycle management of notes.
(1) create: Create Note
def _create_note(
self,
title: str,
content: str,
note_type: str = "general",
tags: Optional[List[str]] = None
) -> str:
"""Create note
Args:
title: Note title
content: Note content (Markdown format)
note_type: Note type (task_state/conclusion/blocker/action/reference/general)
tags: Tag list
Returns:
str: Note ID
"""
from datetime import datetime
# 1. Generate unique ID
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
note_id = f"note_{timestamp}_{len(self.index)}"
# 2. Build metadata
metadata = {
"id": note_id,
"title": title,
"type": note_type,
"tags": tags or [],
"created_at": datetime.now().isoformat(),
"updated_at": datetime.now().isoformat()
}
# 3. Build complete Markdown file content
md_content = self._build_markdown(metadata, content)
# 4. Save to file
file_path = os.path.join(self.workspace, f"{note_id}.md")
with open(file_path, 'w', encoding='utf-8') as f:
f.write(md_content)
# 5. Update index
metadata["file_path"] = file_path
self.index[note_id] = metadata
self._save_index()
return note_id
def _build_markdown(self, metadata: Dict, content: str) -> str:
"""Build Markdown file content (YAML + body)"""
import yaml
# YAML front matter
yaml_header = yaml.dump(metadata, allow_unicode=True, sort_keys=False)
# Combined format
return f"---\n{yaml_header}---\n\n{content}"

Usage example:
from hello_agents.tools import NoteTool
notes = NoteTool(workspace="./project_notes")
note_id = notes.run({
"action": "create",
"title": "Refactoring Project - Phase 1",
"content": """## Completion Status
Completed refactoring of data model layer, test coverage reached 85%.
## Next Steps
Refactor business logic layer""",
"note_type": "task_state",
"tags": ["refactoring", "phase1"]
})
print(f"✅ Note created successfully, ID: {note_id}")

(2) read: Read Note
def _read_note(self, note_id: str) -> Dict:
"""Read note content
Args:
note_id: Note ID
Returns:
Dict: Dictionary containing metadata and content
"""
if note_id not in self.index:
raise ValueError(f"Note does not exist: {note_id}")
file_path = self.index[note_id]["file_path"]
# Read file
with open(file_path, 'r', encoding='utf-8') as f:
raw_content = f.read()
# Parse YAML metadata and Markdown body
metadata, content = self._parse_markdown(raw_content)
return {
"metadata": metadata,
"content": content
}
def _parse_markdown(self, raw_content: str) -> Tuple[Dict, str]:
"""Parse Markdown file (separate YAML and body)"""
import yaml
# Find YAML delimiters
parts = raw_content.split('---\n', 2)
if len(parts) >= 3:
# Has YAML front matter
yaml_str = parts[1]
content = parts[2].strip()
metadata = yaml.safe_load(yaml_str)
else:
# No metadata, all as body
metadata = {}
content = raw_content.strip()
return metadata, content

(3) update: Update Note
def _update_note(
self,
note_id: str,
title: Optional[str] = None,
content: Optional[str] = None,
note_type: Optional[str] = None,
tags: Optional[List[str]] = None
) -> str:
"""Update note
Args:
note_id: Note ID
title: New title (optional)
content: New content (optional)
note_type: New type (optional)
tags: New tags (optional)
Returns:
str: Operation result message
"""
if note_id not in self.index:
raise ValueError(f"Note does not exist: {note_id}")
# 1. Read existing note
note = self._read_note(note_id)
metadata = note["metadata"]
old_content = note["content"]
# 2. Update fields
if title:
metadata["title"] = title
if note_type:
metadata["type"] = note_type
if tags is not None:
metadata["tags"] = tags
if content is not None:
old_content = content
# Update timestamp
from datetime import datetime
metadata["updated_at"] = datetime.now().isoformat()
# 3. Rebuild and save
md_content = self._build_markdown(metadata, old_content)
file_path = self.index[note_id]["file_path"]  # file_path is stored in the index, not in the YAML header
with open(file_path, 'w', encoding='utf-8') as f:
f.write(md_content)
# 4. Update index
metadata["file_path"] = file_path  # keep file_path in the index entry
self.index[note_id] = metadata
self._save_index()
return f"✅ Note updated: {metadata['title']}"

(4) search: Search Notes
def _search_notes(
self,
query: str,
limit: int = 10,
note_type: Optional[str] = None,
tags: Optional[List[str]] = None
) -> List[Dict]:
"""Search notes
Args:
query: Search keyword
limit: Return quantity limit
note_type: Filter by type (optional)
tags: Filter by tags (optional)
Returns:
List[Dict]: List of matching notes
"""
results = []
query_lower = query.lower()
for note_id, metadata in self.index.items():
# Type filter
if note_type and metadata.get("type") != note_type:
continue
# Tag filter
if tags:
note_tags = set(metadata.get("tags", []))
if not note_tags.intersection(tags):
continue
# Read note content
try:
note = self._read_note(note_id)
content = note["content"]
title = metadata.get("title", "")
# Search in title and content
if query_lower in title.lower() or query_lower in content.lower():
results.append({
"note_id": note_id,
"title": title,
"type": metadata.get("type"),
"tags": metadata.get("tags", []),
"content": content,
"updated_at": metadata.get("updated_at")
})
except Exception as e:
print(f"[WARNING] Failed to read note {note_id}: {e}")
continue
# Sort by update time
results.sort(key=lambda x: x["updated_at"], reverse=True)
return results[:limit]

(5) list: List Notes
def _list_notes(
self,
note_type: Optional[str] = None,
tags: Optional[List[str]] = None,
limit: int = 20
) -> List[Dict]:
"""List notes (in reverse chronological order by update time)
Args:
note_type: Filter by type (optional)
tags: Filter by tags (optional)
limit: Return quantity limit
Returns:
List[Dict]: List of note metadata
"""
results = []
for note_id, metadata in self.index.items():
# Type filter
if note_type and metadata.get("type") != note_type:
continue
# Tag filter
if tags:
note_tags = set(metadata.get("tags", []))
if not note_tags.intersection(tags):
continue
results.append(metadata)
# Sort by update time
results.sort(key=lambda x: x.get("updated_at", ""), reverse=True)
return results[:limit]

(6) summary: Note Summary
def _summary(self) -> Dict[str, Any]:
"""Generate note summary statistics
Returns:
Dict: Statistical information
"""
total_count = len(self.index)
# Count by type
type_counts = {}
for metadata in self.index.values():
note_type = metadata.get("type", "general")
type_counts[note_type] = type_counts.get(note_type, 0) + 1
# Recently updated notes
recent_notes = sorted(
self.index.values(),
key=lambda x: x.get("updated_at", ""),
reverse=True
)[:5]
return {
"total_notes": total_count,
"type_distribution": type_counts,
"recent_notes": [
{
"id": note["id"],
"title": note.get("title", ""),
"type": note.get("type"),
"updated_at": note.get("updated_at")
}
for note in recent_notes
]
}

(7) delete: Delete Note
def _delete_note(self, note_id: str) -> str:
"""Delete note
Args:
note_id: Note ID
Returns:
str: Operation result message
"""
if note_id not in self.index:
raise ValueError(f"Note does not exist: {note_id}")
# 1. Delete file
file_path = self.index[note_id]["file_path"]
if os.path.exists(file_path):
os.remove(file_path)
# 2. Remove from index
title = self.index[note_id].get("title", note_id)
del self.index[note_id]
self._save_index()
return f"✅ Note deleted: {title}"

9.4.4 Deep Integration with ContextBuilder
The true power of NoteTool lies in its combined use with ContextBuilder. Let's demonstrate this integration through a complete case study.
(1) Scenario Setup
Suppose we are building a long-term project assistant that needs to:
- Record phased progress of the project
- Track pending issues
- Automatically review relevant notes during each conversation
- Provide coherent recommendations based on historical notes
(2) Implementation Example
from hello_agents import SimpleAgent, HelloAgentsLLM
from hello_agents.context import ContextBuilder, ContextConfig, ContextPacket
from hello_agents.tools import MemoryTool, RAGTool, NoteTool
from datetime import datetime
class ProjectAssistant(SimpleAgent):
"""Long-term project assistant, integrating NoteTool and ContextBuilder"""
def __init__(self, name: str, project_name: str, **kwargs):
super().__init__(name=name, llm=HelloAgentsLLM(), **kwargs)
self.project_name = project_name
# Initialize tools
self.memory_tool = MemoryTool(user_id=project_name)
self.rag_tool = RAGTool(knowledge_base_path=f"./{project_name}_kb")
self.note_tool = NoteTool(workspace=f"./{project_name}_notes")
# Initialize context builder
self.context_builder = ContextBuilder(
memory_tool=self.memory_tool,
rag_tool=self.rag_tool,
config=ContextConfig(max_tokens=4000)
)
self.conversation_history = []
def run(self, user_input: str, note_as_action: bool = False) -> str:
"""Run assistant, automatically integrate notes"""
# 1. Retrieve relevant notes from NoteTool
relevant_notes = self._retrieve_relevant_notes(user_input)
# 2. Convert notes to ContextPacket
note_packets = self._notes_to_packets(relevant_notes)
# 3. Build optimized context
context = self.context_builder.build(
user_query=user_input,
conversation_history=self.conversation_history,
system_instructions=self._build_system_instructions(),
custom_packets=note_packets
)
# 4. Call LLM (same message structure as in Section 9.3.4)
messages = [
{"role": "system", "content": context},
{"role": "user", "content": user_input}
]
response = self.llm.invoke(messages)
# 5. If needed, record interaction as note
if note_as_action:
self._save_as_note(user_input, response)
# 6. Update conversation history
self._update_history(user_input, response)
return response
def _retrieve_relevant_notes(self, query: str, limit: int = 3) -> List[Dict]:
"""Retrieve relevant notes"""
try:
# Prioritize retrieving blocker and action type notes
blockers = self.note_tool.run({
"action": "list",
"note_type": "blocker",
"limit": 2
})
# General search
search_results = self.note_tool.run({
"action": "search",
"query": query,
"limit": limit
})
# Merge and deduplicate (list returns metadata keyed by "id", search returns "note_id")
all_notes = {note.get('note_id') or note.get('id'): note for note in blockers + search_results}
return list(all_notes.values())[:limit]
except Exception as e:
print(f"[WARNING] Note retrieval failed: {e}")
return []
def _notes_to_packets(self, notes: List[Dict]) -> List[ContextPacket]:
"""Convert notes to context packets"""
packets = []
for note in notes:
content = f"[Note: {note.get('title', '')}]\n{note.get('content', '')}"  # list results carry metadata only, so content may be absent
packets.append(ContextPacket(
content=content,
timestamp=datetime.fromisoformat(note['updated_at']),
token_count=len(content) // 4, # Simple estimation
relevance_score=0.75, # Notes have high relevance
metadata={
"type": "note",
"note_type": note['type'],
"note_id": note.get('note_id') or note.get('id')
}
))
return packets
def _save_as_note(self, user_input: str, response: str):
"""Save interaction as note"""
try:
# Determine what type of note to save
if "problem" in user_input.lower() or "blocker" in user_input.lower():
note_type = "blocker"
elif "plan" in user_input.lower() or "next" in user_input.lower():
note_type = "action"
else:
note_type = "conclusion"
self.note_tool.run({
"action": "create",
"title": f"{user_input[:30]}...",
"content": f"## Question\n{user_input}\n\n## Analysis\n{response}",
"note_type": note_type,
"tags": [self.project_name, "auto_generated"]
})
except Exception as e:
print(f"[WARNING] Failed to save note: {e}")
def _build_system_instructions(self) -> str:
"""Build system instructions"""
return f"""You are a long-term assistant for the {self.project_name} project.
Your responsibilities:
1. Provide coherent recommendations based on historical notes
2. Track project progress and pending issues
3. Reference relevant historical notes when answering
4. Provide specific, actionable next-step recommendations
Notes:
- Prioritize issues marked as blockers
- Indicate source of basis in recommendations (notes, memory, or knowledge base)
- Maintain awareness of overall project progress"""
def _update_history(self, user_input: str, response: str):
"""Update conversation history"""
from hello_agents.core.message import Message
self.conversation_history.append(
Message(content=user_input, role="user", timestamp=datetime.now())
)
self.conversation_history.append(
Message(content=response, role="assistant", timestamp=datetime.now())
)
# Limit history length
if len(self.conversation_history) > 10:
self.conversation_history = self.conversation_history[-10:]
# Usage example
assistant = ProjectAssistant(
name="Project Assistant",
project_name="data_pipeline_refactoring"
)
# First interaction: Record project status
response = assistant.run(
"We have completed refactoring of the data model layer, test coverage reached 85%. Next plan is to refactor the business logic layer.",
note_as_action=True
)
# Second interaction: Raise issue
response = assistant.run(
"When refactoring the business logic layer, I encountered dependency version conflict issues. How should I resolve this?"
)
# View note summary
summary = assistant.note_tool.run({"action": "summary"})
print(summary)

(3) Running Effect Demonstration
[ContextBuilder] Collected 8 candidate information packages
[ContextBuilder] Selected 7 information packages, total 3500 tokens
✅ Assistant answer:
I noticed this issue was mentioned in your previously recorded notes. According to the note [Refactoring Project - Phase 1], your current test coverage has reached 85%, which is a good foundation.
Regarding the dependency version conflict issue, I recommend:
1. **Use virtual environment isolation**: Create an independent virtual environment for the business logic layer to avoid dependency conflicts with other modules
2. **Lock versions**: Explicitly specify exact versions of all dependencies in requirements.txt
3. **Use pipdeptree**: Analyze the dependency tree to find the root cause of conflicts
I will mark this issue as a blocker and recommend prioritizing its resolution.
[Source: Note note_20250119_153000_0, Project knowledge base]
---
📋 Note summary:
{
"total_notes": 2,
"type_distribution": {
"action": 1,
"blocker": 1
},
"recent_notes": [
{
"id": "note_20250119_154500_1",
"title": "When refactoring the business logic layer, I encountered dependency version conflict issues...",
"type": "blocker",
"updated_at": "2025-01-19T15:45:00"
},
{
"id": "note_20250119_153000_0",
"title": "We have completed refactoring of the data model layer...",
"type": "action",
"updated_at": "2025-01-19T15:30:00"
}
]
}

9.4.5 Best Practices
When actually using NoteTool, the following best practices can help you build more powerful long-horizon agents:
Reasonable note classification:
- task_state: Record phased progress and status
- conclusion: Record important conclusions and findings
- blocker: Record blocking issues, highest priority
- action: Record next action plans
- reference: Record important reference materials
Regular cleanup and archiving:
- For resolved blockers, update to conclusion
- For outdated actions, delete or update promptly
- Use tags for version management, such as ["v1.0", "completed"]
Cooperation with ContextBuilder:
- Retrieve relevant notes before each round of dialogue
- Set different relevance scores based on note type (blocker > action > conclusion)
- Limit number of notes to avoid context overload
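The type-based prioritization can be expressed as a simple scoring table (the numbers are illustrative assumptions to be tuned per project):

```python
# Illustrative relevance scores by note type (assumed values, not framework defaults)
NOTE_TYPE_RELEVANCE = {
    "blocker": 0.9,      # blocking issues should surface first
    "action": 0.8,
    "task_state": 0.75,
    "conclusion": 0.7,
    "reference": 0.6,
}

def note_relevance(note_type: str, default: float = 0.65) -> float:
    """Map a note type to a relevance_score for its ContextPacket."""
    return NOTE_TYPE_RELEVANCE.get(note_type, default)

print(note_relevance("blocker"))  # -> 0.9
print(note_relevance("general"))  # -> 0.65
```

A value from this table would replace the fixed relevance_score=0.75 used in the _notes_to_packets example above.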
Human-machine collaboration:
- Notes are in human-readable Markdown format, supporting manual editing
- Use Git for version control to track note evolution
- At key stages, manually review notes generated by Agent
Automated workflow:
- Regularly generate note summary reports
- Automatically generate project progress documents based on notes
- Synchronize note content to other systems (such as Notion, Confluence)
9.5 TerminalTool: Instant File System Access
In previous chapters, we introduced MemoryTool and RAGTool, which provide conversational memory and knowledge retrieval capabilities respectively. However, in many practical scenarios, agents need instant access and exploration of the file system—viewing log files, analyzing codebase structure, retrieving configuration files, etc. This is where TerminalTool comes in.
TerminalTool provides agents with secure command-line execution capability, supporting common file system and text processing commands, while ensuring system security through multi-layer security mechanisms. This design implements the "Just-in-time (JIT) context" concept mentioned in Section 9.2.2—agents don't need to preload all files, but explore and retrieve on demand.
9.5.1 Design Philosophy and Security Mechanisms
(1) Why do we need TerminalTool?
When building long-horizon agents, we often encounter the following scenarios:
Scenario 1: Codebase Exploration
A development assistant needs to help users understand the structure of a large codebase:
# Traditional approach: Pre-index all files (high cost, may be outdated)
rag_tool.add_document("./project/**/*.py") # Time-consuming, occupies large storage
# TerminalTool approach: Instant exploration
terminal.run({"command": "find . -name '*.py' -type f"}) # Fast, real-time
terminal.run({"command": "grep -r 'class UserService' ."}) # Precise location
terminal.run({"command": "head -n 50 src/services/user.py"}) # View on demand

Scenario 2: Log File Analysis
An operations assistant needs to analyze application logs:
# Check log file size
terminal.run({"command": "ls -lh /var/log/app.log"})
# View latest error logs
terminal.run({"command": "tail -n 100 /var/log/app.log | grep ERROR"})
# Count error type distribution
terminal.run({"command": "grep ERROR /var/log/app.log | cut -d':' -f3 | sort | uniq -c"})

Scenario 3: Data File Preview
A data analysis assistant needs to quickly understand the structure of data files:
# View first few lines of CSV file
terminal.run({"command": "head -n 5 data/sales.csv"})
# Count lines
terminal.run({"command": "wc -l data/*.csv"})
# View column names
terminal.run({"command": "head -n 1 data/sales.csv | tr ',' '\n'"})

The common characteristic of these scenarios is the need for real-time, lightweight file system access rather than pre-indexing and vectorization. TerminalTool is designed precisely for this "exploratory" workflow.
(2) Security Mechanism Detailed Explanation
Allowing agents to execute commands is a powerful but dangerous capability. TerminalTool ensures system security through multi-layer security mechanisms:
First Layer: Command Whitelist
Only allow safe read-only commands, completely prohibit any operations that may modify the system:
ALLOWED_COMMANDS = {
# File listing and information
'ls', 'dir', 'tree',
# File content viewing
'cat', 'head', 'tail', 'less', 'more',
# File search
'find', 'grep', 'egrep', 'fgrep',
# Text processing
'wc', 'sort', 'uniq', 'cut', 'awk', 'sed',
# Directory operations
'pwd', 'cd',
# File information
'file', 'stat', 'du', 'df',
# Others
'echo', 'which', 'whereis',
}

If the agent attempts to execute commands outside the whitelist, it will be immediately rejected:
terminal.run({"command": "rm -rf /"})
# ❌ Command not allowed: rm
# Allowed commands: cat, cd, cut, dir, du, ...

Second Layer: Working Directory Restriction (Sandbox)
TerminalTool can only access the specified working directory and its subdirectories, cannot access other parts of the system:
# Specify working directory during initialization
terminal = TerminalTool(workspace="./project")
# Allowed: Access files within working directory
terminal.run({"command": "cat ./src/main.py"}) # ✅
# Prohibited: Access files outside working directory
terminal.run({"command": "cat /etc/passwd"}) # ❌ Not allowed to access paths outside working directory
# Prohibited: Escape through ..
terminal.run({"command": "cd ../../../etc"}) # ❌ Not allowed to access paths outside working directory

This sandbox mechanism ensures that even if the agent behaves abnormally, it cannot affect other parts of the system.
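The same confinement test used for cd can be applied to any path argument. A pathlib-based sketch (a hypothetical helper, mirroring the relative_to check in the _handle_cd code shown later):

```python
import tempfile
from pathlib import Path

def within_workspace(workspace: str, target: str) -> bool:
    """True if a (relative) target path stays inside the workspace after resolving."""
    ws = Path(workspace).resolve()
    try:
        # resolve() collapses any ".." components before the containment check
        (ws / target).resolve().relative_to(ws)
        return True
    except ValueError:
        return False

ws = tempfile.mkdtemp()
print(within_workspace(ws, "src/main.py"))      # inside the sandbox -> True
print(within_workspace(ws, "../../../../etc"))  # escapes via .. -> False
```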
Third Layer: Timeout Control
Each command has an execution time limit to prevent infinite loops or resource exhaustion:
terminal = TerminalTool(
workspace="./project",
timeout=30 # 30 second timeout
)
# If command execution exceeds 30 seconds
terminal.run({"command": "find / -name '*.log'"})
# ❌ Command execution timeout (exceeded 30 seconds)

Fourth Layer: Output Size Limit
Limit the size of command output to prevent memory overflow:
terminal = TerminalTool(
workspace="./project",
max_output_size=10 * 1024 * 1024 # 10MB
)
# If output exceeds 10MB
terminal.run({"command": "cat huge_file.log"})
# ... (first 10MB of content) ...
# ⚠️ Output truncated (exceeded 10485760 bytes)

Through these four layers of security mechanisms, TerminalTool provides powerful capabilities while maximizing system security.
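As an illustration of the first layer, a whitelist validator might look like the sketch below (hypothetical code, not TerminalTool's actual implementation; it checks only the first token of each pipeline segment and ignores redirection and command substitution, which a production validator must also cover):

```python
import shlex

# Illustrative subset of the whitelist shown above
ALLOWED_COMMANDS = {'ls', 'cat', 'head', 'tail', 'grep', 'find', 'wc', 'sort', 'uniq', 'pwd'}

def validate_command(command: str) -> tuple:
    """Check the leading token of every pipeline segment against the whitelist."""
    for segment in command.split('|'):
        tokens = shlex.split(segment.strip())
        if not tokens:
            return False, "empty command segment"
        if tokens[0] not in ALLOWED_COMMANDS:
            return False, f"Command not allowed: {tokens[0]}"
    return True, "ok"

print(validate_command("tail -n 100 app.log | grep ERROR"))  # -> (True, 'ok')
print(validate_command("rm -rf /"))                          # -> (False, 'Command not allowed: rm')
```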
9.5.2 Core Functionality Detailed Explanation
The implementation of TerminalTool focuses on two core functions: command execution and directory navigation.
(1) Command Execution
The core _execute_command method is responsible for actually executing commands:
def _execute_command(self, command: str) -> str:
"""Execute command"""
try:
# Execute command in current directory
result = subprocess.run(
command,
shell=True,
cwd=str(self.current_dir), # Execute in current working directory
capture_output=True,
text=True,
timeout=self.timeout,
env=os.environ.copy()
)
# Merge standard output and standard error
output = result.stdout
if result.stderr:
output += f"\n[stderr]\n{result.stderr}"
# Check output size
if len(output) > self.max_output_size:
output = output[:self.max_output_size]
output += f"\n\n⚠️ Output truncated (exceeded {self.max_output_size} bytes)"
# Add return code information
if result.returncode != 0:
output = f"⚠️ Command return code: {result.returncode}\n\n{output}"
return output if output else "✅ Command executed successfully (no output)"
except subprocess.TimeoutExpired:
return f"❌ Command execution timeout (exceeded {self.timeout} seconds)"
except Exception as e:
return f"❌ Command execution failed: {e}"
Key points of this implementation:
- Current directory awareness: the cwd parameter executes commands in the correct working directory
- Error handling: standard error is captured and merged to provide complete diagnostic information
- Return code check: non-zero return codes are marked as warnings
- Fault-tolerant design: timeouts and exceptions are handled gracefully and will not crash the agent
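The fault-tolerant pattern can be exercised in isolation: `subprocess.run` raises `TimeoutExpired`, and converting that exception into an ordinary string is what keeps the agent loop alive. A standalone sketch under the same design (our own simplified helper, not the framework code itself):

```python
import subprocess

def run_with_limits(command: str, timeout: int = 2) -> str:
    """Run a shell command; report timeouts and failures as text instead of raising."""
    try:
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=timeout)
        if result.returncode != 0:
            return f"⚠️ Command return code: {result.returncode}\n{result.stderr}"
        return result.stdout or "✅ Command executed successfully (no output)"
    except subprocess.TimeoutExpired:
        return f"❌ Command execution timeout (exceeded {timeout} seconds)"
```

For example, `run_with_limits("sleep 5", timeout=1)` returns the timeout message instead of raising, so a misbehaving command degrades into a diagnostic string the model can read.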
(2) Directory Navigation
Special handling of the cd command supports agent navigation in the file system:
def _handle_cd(self, parts: List[str]) -> str:
"""Handle cd command"""
if not self.allow_cd:
return "❌ cd command is disabled"
if len(parts) < 2:
# cd without parameters, return current directory
return f"Current directory: {self.current_dir}"
target_dir = parts[1]
# Handle relative path
if target_dir == "..":
new_dir = self.current_dir.parent
elif target_dir == ".":
new_dir = self.current_dir
elif target_dir == "~":
new_dir = self.workspace
else:
new_dir = (self.current_dir / target_dir).resolve()
# Check if within working directory
try:
new_dir.relative_to(self.workspace)
except ValueError:
return f"❌ Not allowed to access paths outside working directory: {new_dir}"
# Check if directory exists
if not new_dir.exists():
return f"❌ Directory does not exist: {new_dir}"
if not new_dir.is_dir():
return f"❌ Not a directory: {new_dir}"
# Update current directory
self.current_dir = new_dir
return f"✅ Switched to directory: {self.current_dir}"
This design supports agents in multi-step file system exploration:
# Step 1: View project structure
terminal.run({"command": "ls -la"})
# Step 2: Enter source code directory
terminal.run({"command": "cd src"})
# Step 3: Find specific files
terminal.run({"command": "find . -name '*service*.py'"})
# Step 4: View file content
terminal.run({"command": "cat user_service.py"})
9.5.3 Typical Usage Patterns
TerminalTool supports various common file system operation patterns.
(1) Exploratory Navigation
Agents can explore codebases step by step like human developers:
from hello_agents.tools import TerminalTool
terminal = TerminalTool(workspace="./my_project")
# Step 1: View project root directory
print(terminal.run({"command": "ls -la"}))
"""
total 24
drwxr-xr-x 6 user staff 192 Jan 19 16:00 .
drwxr-xr-x 5 user staff 160 Jan 19 15:30 ..
-rw-r--r-- 1 user staff 1234 Jan 19 15:30 README.md
drwxr-xr-x 4 user staff 128 Jan 19 15:30 src
drwxr-xr-x 3 user staff 96 Jan 19 15:30 tests
-rw-r--r-- 1 user staff 456 Jan 19 15:30 requirements.txt
"""
# Step 2: View source code directory structure
terminal.run({"command": "cd src"})
print(terminal.run({"command": "tree"}))
# Step 3: Search for specific patterns
print(terminal.run({"command": "grep -r 'def process' ."}))
(2) Data File Analysis
Quickly understand the structure and content of data files:
terminal = TerminalTool(workspace="./data")
# View first few lines of CSV file
print(terminal.run({"command": "head -n 5 sales_2024.csv"}))
"""
date,product,quantity,revenue
2024-01-01,Widget A,150,4500.00
2024-01-01,Widget B,200,8000.00
2024-01-02,Widget A,180,5400.00
2024-01-02,Widget C,120,3600.00
"""
# Count total lines
print(terminal.run({"command": "wc -l *.csv"}))
"""
10234 sales_2024.csv
8567 sales_2023.csv
18801 total
"""
# Extract and count product categories
print(terminal.run({"command": "tail -n +2 sales_2024.csv | cut -d',' -f2 | sort | uniq -c"}))
"""
3456 Widget A
4123 Widget B
2655 Widget C
"""
(3) Log File Analysis
Analyze application logs in real time to quickly locate issues:
terminal = TerminalTool(workspace="/var/log")
# View latest error logs
print(terminal.run({"command": "tail -n 50 app.log | grep ERROR"}))
# Count error type distribution
print(terminal.run({"command": "grep ERROR app.log | awk '{print $4}' | sort | uniq -c | sort -rn"}))
"""
245 DatabaseConnectionError
123 TimeoutException
67 ValidationError
34 AuthenticationError
"""
# Find logs for specific time period
print(terminal.run({"command": "grep '2024-01-19 15:' app.log | tail -n 20"}))
(4) Codebase Analysis
Assist with code review and code comprehension:
terminal = TerminalTool(workspace="./codebase")
# Count lines of code
print(terminal.run({"command": "find . -name '*.py' -exec wc -l {} + | tail -n 1"}))
# Find all TODO comments
print(terminal.run({"command": "grep -rn 'TODO' --include='*.py'"}))
# Find definition of specific function
print(terminal.run({"command": "grep -rn 'def process_data' --include='*.py'"}))
# View function implementation
print(terminal.run({"command": "sed -n '/def process_data/,/^def /p' src/processor.py | head -n -1"}))
9.5.4 Collaboration with Other Tools
The true power of TerminalTool lies in its collaborative use with MemoryTool, NoteTool, and ContextBuilder.
(1) Collaboration with MemoryTool
Information discovered by TerminalTool can be stored in the memory system:
# Use TerminalTool to discover project structure
structure = terminal.run({"command": "tree -L 2 src"})
# Store in semantic memory
memory_tool.execute(
"add",
content=f"Project structure:\n{structure}",
memory_type="semantic",
importance=0.8,
metadata={"type": "project_structure"}
)
(2) Collaboration with NoteTool
Important discoveries can be recorded as structured notes:
# Discover a performance bottleneck
log_analysis = terminal.run({"command": "grep 'slow query' app.log | tail -n 10"})
# Record as blocker note
note_tool.run({
"action": "create",
"title": "Database Slow Query Issue",
"content": f"## Problem Description\nFound multiple slow queries affecting system performance\n\n## Log Analysis\n```\n{log_analysis}\n```\n\n## Next Steps\n1. Analyze slow query SQL\n2. Add indexes\n3. Optimize query logic",
"note_type": "blocker",
"tags": ["performance", "database"]
})
(3) Collaboration with ContextBuilder
TerminalTool output can be part of the context:
# Explore codebase
code_structure = terminal.run({"command": "ls -R src"})
recent_changes = terminal.run({"command": "git log --oneline -10"})
# Convert to ContextPacket
from hello_agents.context import ContextPacket
from datetime import datetime
packets = [
ContextPacket(
content=f"Codebase structure:\n{code_structure}",
timestamp=datetime.now(),
token_count=len(code_structure) // 4,
relevance_score=0.7,
metadata={"type": "code_structure", "source": "terminal"}
),
ContextPacket(
content=f"Recent commits:\n{recent_changes}",
timestamp=datetime.now(),
token_count=len(recent_changes) // 4,
relevance_score=0.8,
metadata={"type": "git_history", "source": "terminal"}
)
]
# Include this information when building context
context = context_builder.build(
user_query="How to refactor the user service module?",
custom_packets=packets
)
9.6 Long-Horizon Agent in Practice: Codebase Maintenance Assistant
Now, let's integrate ContextBuilder, NoteTool, and TerminalTool to build a complete long-horizon agent—Codebase Maintenance Assistant. This assistant can:
- Explore and understand codebase structure
- Record discovered issues and improvement points
- Track long-term refactoring tasks
- Maintain coherence under context window limitations
9.6.1 Scenario Setup and Requirements Analysis
Business Scenario
Suppose we are maintaining a medium-sized Python web application. This codebase contains about 50 Python files, built with the Flask framework, covering data models, business logic, API interfaces, and other modules, while also having some technical debt that needs to be gradually cleaned up. In this scenario, we need an intelligent assistant to help us explore the codebase, understand project structure, dependencies, and code style; identify issues in the code, such as code duplication, excessive complexity, lack of tests, etc.; track task progress, record to-do items, completed work, and encountered blockers; and provide coherent refactoring recommendations based on historical context.
Challenges and Solutions
This scenario faces several typical long-horizon task challenges. First is the problem of information exceeding the context window—the entire codebase may contain tens of thousands of lines of code, which cannot be placed in the context window all at once. We solve this by using TerminalTool for instant, on-demand code exploration, viewing specific files only when needed. Second is the cross-session state management challenge—refactoring tasks may last for days and need to maintain progress across multiple sessions. We address this by using NoteTool to record phased progress, to-do items, and key decisions. Finally, there's the issue of context quality and relevance—each conversation needs to review relevant historical information but cannot be overwhelmed by irrelevant information. We use ContextBuilder to intelligently filter and organize context, ensuring high signal density.
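The "filter and organize within a budget" idea behind ContextBuilder reduces to a greedy packing routine: discard packets below a relevance floor, then admit the highest-scoring packets until the token budget is spent. The following sketch illustrates that selection logic only; it is our own simplified helper, not the actual ContextBuilder code:

```python
from typing import Dict, List

def select_packets(packets: List[Dict], max_tokens: int,
                   min_relevance: float = 0.2) -> List[Dict]:
    """Greedy selection: best-scoring packets first, within the token budget."""
    selected, used = [], 0
    for p in sorted(packets, key=lambda p: p["relevance"], reverse=True):
        if p["relevance"] < min_relevance:
            break  # sorted descending, so the rest score even lower
        if used + p["tokens"] > max_tokens:
            continue  # too big for the remaining budget; try smaller packets
        selected.append(p)
        used += p["tokens"]
    return selected
```

With a 600-token budget, a 900-token packet is skipped even though it scores well, while smaller relevant packets still make it in; packets below the relevance floor never enter the context at all.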
9.6.2 System Architecture Design
Our codebase maintenance assistant adopts a three-layer architecture, as shown in Figure 9.3.
9.6.3 Core Implementation
Now let's implement the core class of this system:
from typing import Dict, Any, List, Optional
from datetime import datetime
import json
from hello_agents import SimpleAgent, HelloAgentsLLM
from hello_agents.context import ContextBuilder, ContextConfig, ContextPacket
from hello_agents.tools import MemoryTool, NoteTool, TerminalTool
from hello_agents.core.message import Message
class CodebaseMaintainer:
"""Codebase Maintenance Assistant - Long-horizon agent example
Integrates ContextBuilder + NoteTool + TerminalTool + MemoryTool
Implements cross-session codebase maintenance task management
"""
def __init__(
self,
project_name: str,
codebase_path: str,
llm: Optional[HelloAgentsLLM] = None
):
self.project_name = project_name
self.codebase_path = codebase_path
self.session_id = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
# Initialize LLM
self.llm = llm or HelloAgentsLLM()
# Initialize tools
self.memory_tool = MemoryTool(user_id=project_name)
self.note_tool = NoteTool(workspace=f"./{project_name}_notes")
self.terminal_tool = TerminalTool(workspace=codebase_path, timeout=60)
# Initialize context builder
self.context_builder = ContextBuilder(
memory_tool=self.memory_tool,
rag_tool=None, # This case does not use RAG
config=ContextConfig(
max_tokens=4000,
reserve_ratio=0.15,
min_relevance=0.2,
enable_compression=True
)
)
# Conversation history
self.conversation_history: List[Message] = []
# Statistics
self.stats = {
"session_start": datetime.now(),
"commands_executed": 0,
"notes_created": 0,
"issues_found": 0
}
print(f"✅ Codebase maintenance assistant initialized: {project_name}")
print(f"📁 Working directory: {codebase_path}")
print(f"🆔 Session ID: {self.session_id}")
def run(self, user_input: str, mode: str = "auto") -> str:
"""Run assistant
Args:
user_input: User input
mode: Running mode
- "auto": Automatically decide whether to use tools
- "explore": Focus on code exploration
- "analyze": Focus on problem analysis
- "plan": Focus on task planning
Returns:
str: Assistant's answer
"""
print(f"\n{'='*80}")
print(f"👤 User: {user_input}")
print(f"{'='*80}\n")
# Step 1: Execute preprocessing based on mode
pre_context = self._preprocess_by_mode(user_input, mode)
# Step 2: Retrieve relevant notes
relevant_notes = self._retrieve_relevant_notes(user_input)
note_packets = self._notes_to_packets(relevant_notes)
# Step 3: Build optimized context
context = self.context_builder.build(
user_query=user_input,
conversation_history=self.conversation_history,
system_instructions=self._build_system_instructions(mode),
custom_packets=note_packets + pre_context
)
# Step 4: Call LLM
print("🤖 Thinking...")
response = self.llm.invoke(context)
# Step 5: Post-processing
self._postprocess_response(user_input, response)
# Step 6: Update conversation history
self._update_history(user_input, response)
print(f"\n🤖 Assistant: {response}\n")
print(f"{'='*80}\n")
return response
def _preprocess_by_mode(
self,
user_input: str,
mode: str
) -> List[ContextPacket]:
"""Execute preprocessing based on mode, collect relevant information"""
packets = []
if mode == "explore" or mode == "auto":
# Explore mode: Automatically view project structure
print("🔍 Exploring codebase structure...")
structure = self.terminal_tool.run({"command": "find . -type f -name '*.py' | head -n 20"})
self.stats["commands_executed"] += 1
packets.append(ContextPacket(
content=f"[Codebase Structure]\n{structure}",
timestamp=datetime.now(),
token_count=len(structure) // 4,
relevance_score=0.6,
metadata={"type": "code_structure", "source": "terminal"}
))
if mode == "analyze":
# Analyze mode: Check code complexity and issues
print("📊 Analyzing code quality...")
# Count lines of code
loc = self.terminal_tool.run({"command": "find . -name '*.py' -exec wc -l {} + | tail -n 1"})
# Find TODO and FIXME
todos = self.terminal_tool.run({"command": "grep -rn 'TODO\\|FIXME' --include='*.py' | head -n 10"})
self.stats["commands_executed"] += 2
packets.append(ContextPacket(
content=f"[Code Statistics]\n{loc}\n\n[To-Do Items]\n{todos}",
timestamp=datetime.now(),
token_count=(len(loc) + len(todos)) // 4,
relevance_score=0.7,
metadata={"type": "code_analysis", "source": "terminal"}
))
if mode == "plan":
# Planning mode: Load recent notes
print("📋 Loading task planning...")
task_notes = self.note_tool.run({
"action": "list",
"note_type": "task_state",
"limit": 3
})
if task_notes:
content = "\n".join([f"- {note['title']}" for note in task_notes])
packets.append(ContextPacket(
content=f"[Current Tasks]\n{content}",
timestamp=datetime.now(),
token_count=len(content) // 4,
relevance_score=0.8,
metadata={"type": "task_plan", "source": "notes"}
))
return packets
def _retrieve_relevant_notes(self, query: str, limit: int = 3) -> List[Dict]:
"""Retrieve relevant notes"""
try:
# Prioritize retrieving blockers
blockers = self.note_tool.run({
"action": "list",
"note_type": "blocker",
"limit": 2
})
# Search relevant notes
search_results = self.note_tool.run({
"action": "search",
"query": query,
"limit": limit
})
# Merge and deduplicate
all_notes = {note.get('note_id') or note.get('id'): note for note in (blockers or []) + (search_results or [])}
return list(all_notes.values())[:limit]
except Exception as e:
print(f"[WARNING] Note retrieval failed: {e}")
return []
def _notes_to_packets(self, notes: List[Dict]) -> List[ContextPacket]:
"""Convert notes to context packets"""
packets = []
for note in notes:
# Set different relevance scores based on note type
relevance_map = {
"blocker": 0.9,
"action": 0.8,
"task_state": 0.75,
"conclusion": 0.7
}
note_type = note.get('type', 'general')
relevance = relevance_map.get(note_type, 0.6)
content = f"[Note: {note.get('title', 'Untitled')}]\nType: {note_type}\n\n{note.get('content', '')}"
packets.append(ContextPacket(
content=content,
timestamp=datetime.fromisoformat(note.get('updated_at', datetime.now().isoformat())),
token_count=len(content) // 4,
relevance_score=relevance,
metadata={
"type": "note",
"note_type": note_type,
"note_id": note.get('note_id') or note.get('id')
}
))
return packets
def _build_system_instructions(self, mode: str) -> str:
"""Build system instructions"""
base_instructions = f"""You are the codebase maintenance assistant for the {self.project_name} project.
Your core capabilities:
1. Use TerminalTool to explore codebase (ls, cat, grep, find, etc.)
2. Use NoteTool to record discoveries and tasks
3. Provide coherent recommendations based on historical notes
Current session ID: {self.session_id}
"""
mode_specific = {
"explore": """
Current mode: Explore codebase
You should:
- Actively use terminal commands to understand code structure
- Identify key modules and files
- Record project architecture in notes
""",
"analyze": """
Current mode: Analyze code quality
You should:
- Find code issues (duplication, complexity, TODOs, etc.)
- Evaluate code quality
- Record discovered issues as blocker or action notes
""",
"plan": """
Current mode: Task planning
You should:
- Review historical notes and tasks
- Formulate next action plan
- Update task status notes
""",
"auto": """
Current mode: Auto decision
You should:
- Flexibly choose strategies based on user needs
- Use tools when needed
- Maintain professionalism and practicality in responses
"""
}
return base_instructions + mode_specific.get(mode, mode_specific["auto"])
def _postprocess_response(self, user_input: str, response: str):
"""Post-processing: Analyze response, automatically record important information"""
# If issues found, automatically create blocker note
if any(keyword in response.lower() for keyword in ["issue", "bug", "error", "blocker", "problem"]):
try:
self.note_tool.run({
"action": "create",
"title": f"Issue found: {user_input[:30]}...",
"content": f"## User Input\n{user_input}\n\n## Issue Analysis\n{response[:500]}...",
"note_type": "blocker",
"tags": [self.project_name, "auto_detected", self.session_id]
})
self.stats["notes_created"] += 1
self.stats["issues_found"] += 1
print("📝 Automatically created issue note")
except Exception as e:
print(f"[WARNING] Failed to create note: {e}")
# If task planning, automatically create action note
elif any(keyword in user_input.lower() for keyword in ["plan", "next", "task", "todo"]):
try:
self.note_tool.run({
"action": "create",
"title": f"Task planning: {user_input[:30]}...",
"content": f"## Discussion\n{user_input}\n\n## Action Plan\n{response[:500]}...",
"note_type": "action",
"tags": [self.project_name, "planning", self.session_id]
})
self.stats["notes_created"] += 1
print("📝 Automatically created action plan note")
except Exception as e:
print(f"[WARNING] Failed to create note: {e}")
def _update_history(self, user_input: str, response: str):
"""Update conversation history"""
self.conversation_history.append(
Message(content=user_input, role="user", timestamp=datetime.now())
)
self.conversation_history.append(
Message(content=response, role="assistant", timestamp=datetime.now())
)
# Limit history length (keep recent 10 rounds of conversation)
if len(self.conversation_history) > 20:
self.conversation_history = self.conversation_history[-20:]
# === Convenience methods ===
def explore(self, target: str = ".") -> str:
"""Explore codebase"""
return self.run(f"Please explore the code structure of {target}", mode="explore")
def analyze(self, focus: str = "") -> str:
"""Analyze code quality"""
query = "Please analyze code quality" + (f", focusing on {focus}" if focus else "")
return self.run(query, mode="analyze")
def plan_next_steps(self) -> str:
"""Plan next steps"""
return self.run("Based on current progress, plan next steps", mode="plan")
def execute_command(self, command: str) -> str:
"""Execute terminal command"""
result = self.terminal_tool.run({"command": command})
self.stats["commands_executed"] += 1
return result
def create_note(
self,
title: str,
content: str,
note_type: str = "general",
tags: List[str] = None
) -> str:
"""Create note"""
result = self.note_tool.run({
"action": "create",
"title": title,
"content": content,
"note_type": note_type,
"tags": tags or [self.project_name]
})
self.stats["notes_created"] += 1
return result
def get_stats(self) -> Dict[str, Any]:
"""Get statistics"""
duration = (datetime.now() - self.stats["session_start"]).total_seconds()
# Get note summary
try:
note_summary = self.note_tool.run({"action": "summary"})
except Exception:
note_summary = {}
return {
"session_info": {
"session_id": self.session_id,
"project": self.project_name,
"duration_seconds": duration
},
"activity": {
"commands_executed": self.stats["commands_executed"],
"notes_created": self.stats["notes_created"],
"issues_found": self.stats["issues_found"]
},
"notes": note_summary
}
def generate_report(self, save_to_file: bool = True) -> Dict[str, Any]:
"""Generate session report"""
report = self.get_stats()
if save_to_file:
report_file = f"maintainer_report_{self.session_id}.json"
with open(report_file, 'w', encoding='utf-8') as f:
json.dump(report, f, ensure_ascii=False, indent=2, default=str)
report["report_file"] = report_file
print(f"📄 Report saved: {report_file}")
return report
9.6.4 Complete Usage Example
Now let's demonstrate the workflow of this long-horizon agent through a complete usage scenario:
# ========== Initialize Assistant ==========
from hello_agents import HelloAgentsLLM
maintainer = CodebaseMaintainer(
project_name="my_flask_app",
codebase_path="./my_flask_app",
llm=HelloAgentsLLM()
)
# ========== Day 1: Explore Codebase ==========
# 1. Initial exploration
response = maintainer.explore()
"""
🔍 Exploring codebase structure...
🤖 Thinking...
🤖 Assistant: I have scanned the codebase, this is a Flask web application, mainly containing the following modules:
📁 Project structure:
- app/
├── models/ # Data models (User, Product, Order, etc.)
├── routes/ # API routes
├── services/ # Business logic
└── utils/ # Utility functions
- tests/ # Test files
- migrations/ # Database migrations
- config.py # Configuration file
Total approximately 3,500 lines of Python code.
Recommended next steps:
1. Review core model definitions
2. Check routes and API design
3. Evaluate test coverage
"""
# 2. Deep dive into a specific module
response = maintainer.run("Please analyze the data model design in the app/models/ directory")
"""
🤖 Assistant: I reviewed the app/models/ directory and found the following data models:
1. User (user.py)
- Fields: id, username, email, password_hash, created_at
- Relationships: One-to-many Order
- Issue: ⚠️ Missing index, email field not set with unique constraint
2. Product (product.py)
- Fields: id, name, description, price, stock
- Issue: ✅ Well designed
3. Order (order.py)
- Fields: id, user_id, product_id, quantity, status
- Issue: ⚠️ Missing creation time field, not conducive to data analysis
Recommendations:
1. Add unique constraint to User.email
2. Add created_at and updated_at fields to Order
3. Consider adding soft delete support
I have recorded these issues as blocker notes.
"""
# [Automatically created blocker note]
# ========== Day 2: Analyze Code Quality ==========
# 1. Overall quality analysis
response = maintainer.analyze()
"""
📊 Analyzing code quality...
🤖 Thinking...
🤖 Assistant: Code quality analysis report:
📊 Code statistics:
- Total lines: 3,542 lines
- Average file length: approximately 150 lines
- Largest file: services/order_service.py (456 lines)
⚠️ Issues found:
1. Code duplication (HIGH):
- user_service.py and product_service.py have similar CRUD logic
- Recommendation: Extract base class BaseService
2. Excessive complexity (MEDIUM):
- order_service.py::process_order method has 8 levels of nesting
- Recommendation: Split into multiple small methods
3. Lack of tests (HIGH):
- Test coverage only 45%
- services/ directory has no tests at all
4. TODO/FIXME (12 items):
- Most are "optimize later" type comments
- Recommendation: Clean up or convert to formal tasks
I have recorded these issues as notes and marked priorities.
"""
# [Automatically created multiple blocker notes]
# 2. View specific issue
response = maintainer.run("Please review the process_order method in order_service.py and provide refactoring recommendations")
"""
🤖 Assistant: I reviewed the code of the process_order method:
【Original Code】
def process_order(self, order_id):
order = Order.query.get(order_id)
if order:
if order.status == 'pending':
product = Product.query.get(order.product_id)
if product:
if product.stock >= order.quantity:
# ... 8 levels of nesting continue ...
【Problem Analysis】
1. Deep nesting leads to poor readability
2. Lack of error handling
3. Business logic mixed with data access
【Refactoring Recommendations】
def process_order(self, order_id):
# 1. Early return pattern, reduce nesting
order = self._get_order(order_id)
if not order or order.status != 'pending':
return False
# 2. Extract methods, single responsibility
if not self._check_stock(order):
return False
# 3. Use transactions to ensure consistency
with db.transaction():
self._update_stock(order)
self._update_order_status(order, 'processing')
return True
【Next Steps】
1. I have created a refactoring plan note
2. Recommend writing test cases first
3. Refactor gradually, keeping functionality unchanged
"""
# [Automatically created action note]
# ========== Day 3: Plan Refactoring Tasks ==========
# 1. Review progress
response = maintainer.plan_next_steps()
"""
📋 Loading task planning...
🤖 Thinking...
🤖 Assistant: Based on the exploration and analysis from the past two days, I have organized refactoring task priorities:
🔴 High priority (complete this week):
1. [blocker] Add unique constraint to User.email
- Risk: Low, just database migration
- Effort: 1 hour
2. [blocker] Extract BaseService to reduce code duplication
- Risk: Medium, affects multiple service classes
- Effort: Half a day
3. [blocker] Refactor process_order method
- Risk: High, core business logic
- Effort: 1 day (including writing tests)
🟡 Medium priority (next week):
4. Add unit tests for services/
5. Clean up TODO comments
6. Add time fields to Order
🟢 Low priority (TBD):
7. Performance optimization
8. Documentation updates
Recommendations:
- This week, first resolve data model issues (tasks 1, 6)
- Next week, conduct code refactoring (tasks 2, 3)
- Continuously increase test coverage (task 4)
I have updated task status notes, you can check progress anytime.
"""
# 2. Manually create detailed refactoring plan
maintainer.create_note(
title="Weekly Refactoring Plan - Week 1",
content="""## Objectives
Complete optimization of data model layer
## Task Checklist
- [ ] Add unique constraint to User.email
- [ ] Add created_at, updated_at fields to Order
- [ ] Write database migration scripts
- [ ] Update related test cases
## Schedule
- Monday: Design migration scripts
- Tuesday-Wednesday: Execute migration and test
- Thursday: Update test cases
- Friday: Code Review
## Risks
- Database migration may affect production environment, needs to be executed during off-peak hours
- Existing data may have duplicate emails, need to clean up first
""",
note_type="task_state",
tags=["refactoring", "week1", "high_priority"]
)
print("✅ Created detailed refactoring plan")
# ========== One Week Later: Check Progress ==========
# View note summary
summary = maintainer.note_tool.run({"action": "summary"})
print("📊 Note summary:")
print(json.dumps(summary, indent=2, ensure_ascii=False))
"""
{
"total_notes": 8,
"type_distribution": {
"blocker": 3,
"action": 2,
"task_state": 2,
"conclusion": 1
},
"recent_notes": [
{
"id": "note_20250119_160000_7",
"title": "Weekly Refactoring Plan - Week 1",
"type": "task_state",
"updated_at": "2025-01-19T16:00:00"
},
...
]
}
"""
# Generate complete report
report = maintainer.generate_report()
print("\n📄 Session report:")
print(json.dumps(report, indent=2, ensure_ascii=False))
"""
{
"session_info": {
"session_id": "session_20250119_150000",
"project": "my_flask_app",
"duration_seconds": 172800 # 2 days
},
"activity": {
"commands_executed": 24,
"notes_created": 8,
"issues_found": 3
},
"notes": { ... }
}
"""
9.6.5 Running Effect Analysis
Through this complete case study, we can see several key characteristics of long-horizon agents. First is cross-session coherence—the agent maintains task coherence across multiple days and sessions through NoteTool. Issues explored on day one are automatically considered during day two analysis, day three planning can synthesize all discoveries from the previous two days, and the complete history is preserved when checking a week later. Second is intelligent context management—ContextBuilder ensures high-quality context for each conversation, automatically gathering relevant notes (especially blocker types), dynamically adjusting preprocessing strategies based on conversation mode, and selecting the most relevant information within the token budget.
The third characteristic is just-in-time file system access: TerminalTool supports flexible code exploration without pre-indexing the entire codebase, can view specific file content on demand, and handles complex text processing (grep, awk, etc.). The fourth is automated knowledge management: the system manages discovered knowledge on its own, creating blocker notes when issues are found, creating action notes when plans are discussed, and storing key information in the memory system. The final characteristic is human-machine collaboration: agents complete exploration and analysis automatically, while humans can intervene and guide through the note system, including manually creating detailed planning notes.
This basic framework can be extended further: for example, integrating RAGTool to combine a vector index of the codebase with semantic retrieval, splitting the assistant into specialized explorer, analyzer, and planner agents for multi-agent collaboration, integrating testing tools to automatically verify refactoring results, running git commands through TerminalTool to track code changes, or building a visual interface with Gradio/Streamlit.
9.7 Chapter Summary
In this chapter, we deeply explored the theoretical foundations and engineering practices of context engineering:
Theoretical Level
- Essence of Context Engineering: Evolution from "prompt engineering" to "context engineering", the core is managing limited attention budget
- Context Rot: Understanding performance degradation brought by long contexts, recognizing context as a scarce resource
- Three Major Strategies: Compaction, structured note-taking, sub-agent architectures
Engineering Practice
- ContextBuilder: Implements GSSC pipeline, provides unified context management interface
- NoteTool: Hybrid format of Markdown+YAML, supports structured long-term memory
- TerminalTool: Secure command-line tool, supports instant file system access
- Long-Horizon Agent: Integrates three major tools, builds cross-session codebase maintenance assistant
Core Takeaways
- Layered Design: Instant access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool)
- Intelligent Filtering: Scoring mechanism based on relevance and recency
- Security First: Multi-layer security mechanisms ensure system stability
- Human-Machine Collaboration: Balance between automation and controllability
Through this chapter's learning, you have not only mastered the core technologies of context engineering, but more importantly, understood how to build agent systems that can maintain coherence and effectiveness over long time spans. These skills will become an important foundation for you to build production-level agent applications.
In the next chapter, we will explore agent communication protocols and learn how to enable agents to interact more broadly with the external world.
Exercises
Note: Some exercises do not have standard answers. The focus is on cultivating learners' comprehensive understanding and practical ability in context engineering and long-horizon task management.
This chapter introduced the difference between context engineering and prompt engineering. Please analyze:
- Section 9.1 mentioned "context must be viewed as a limited resource with diminishing marginal returns". Please explain what the "context rot" phenomenon is? Why do we still need to carefully manage context even when models support 100K or even 200K context windows?
- Suppose you want to build a "code review assistant" that needs to analyze a codebase containing 50 files. Please compare two strategies: (1) Load all file content into context at once; (2) Use JIT (Just-in-time) context, retrieving files on demand through tools. Analyze the advantages, disadvantages, and applicable scenarios of each.
- Section 9.2.1 mentioned two extreme pitfalls of system prompts: "over-hardcoding" and "too vague". Please give a practical example of each and explain how to find the right balance.
The GSSC (Gather-Select-Structure-Compress) pipeline is the core technology of this chapter. Please think deeply:
Note: This is a hands-on practice question, actual operation is recommended
- In the ContextBuilder implementation in Section 9.3, the four stages each have different responsibilities. Please analyze: if a stage fails (for example, the Select stage picks irrelevant information, or the Compress stage over-compresses and loses information), what impact will this have on the agent's final performance?
- Based on the code in Section 9.3.4, add a "context quality assessment" function to ContextBuilder: After each context build, automatically evaluate the information density, relevance, and completeness of the context, and provide optimization suggestions.
- The "compression" stage in the GSSC pipeline uses LLM for intelligent summarization. Please think: Under what circumstances might simple truncation or sliding window strategies be more appropriate than LLM summarization? Design a hybrid compression strategy that combines the advantages of multiple compression methods.
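As a starting point for the hybrid-compression exercise (this is scaffolding, not a model answer), the sketch below combines a sliding window with simple truncation. All parameter names and the character-based budget are illustrative assumptions; a real implementation would count tokens and could swap the truncation step for LLM summarization:

```python
def compress(messages: list[str], budget: int, keep_recent: int = 3) -> list[str]:
    """Hybrid baseline: keep the most recent messages verbatim (sliding
    window) and truncate older ones to fit a character budget. An LLM
    summarizer could replace the truncation step when fidelity matters."""
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    used = sum(len(m) for m in recent)
    out: list[str] = []
    for m in older:
        room = budget - used - sum(len(x) for x in out)
        if room <= 0:
            break
        out.append(m[:room])  # cheap, deterministic, lossy at the tail
    return out + recent

history = [f"msg{i}: " + "x" * 50 for i in range(10)]
kept = compress(history, budget=250)
print(len(kept))  # 5
```

Note the design choice: recent messages are always kept intact even if they exceed the budget, on the assumption that the latest turns are the ones the agent cannot afford to distort.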
NoteTool and TerminalTool are key tools supporting long-horizon tasks. Based on Sections 9.4 and 9.5, please complete the following extension practices:
Note: This is a hands-on practice question, actual operation is recommended
- NoteTool uses a hierarchical note system (project notes, task notes, temporary notes). Please design an "automatic note organization" mechanism: When temporary notes accumulate to a certain number, the agent can automatically analyze these notes, promote important information to task notes or project notes, and clean up redundant content.
- TerminalTool provides file system operation capabilities, but Section 9.5.2 emphasizes security design. Please analyze: Are the current security mechanisms (path validation, command whitelist, permission check) sufficient? If the agent needs to access sensitive files or execute dangerous operations, how should a "human-machine collaborative approval" process be designed?
- Combining NoteTool and TerminalTool, design an "intelligent code refactoring assistant": Can analyze codebase structure, record refactoring plans, execute refactoring operations step by step, and track progress and encountered problems in notes. Please draw a complete workflow diagram.
In the "long-horizon task management" case in Section 9.6, we saw the value of context engineering in practical applications. Please analyze in depth:
- The case uses a "layered context management" strategy: instant access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool). Please analyze: How should these three layers coordinate? What information should be placed in which layer? How to avoid information redundancy and inconsistency?
- Suppose an interruption occurs during task execution (such as a system crash or network disconnection), and the agent needs to recover its state from notes and continue execution. Please design a "resume from breakpoint" mechanism: How do you record sufficient state information in notes? How do you verify that the recovered state is correct?
- Long-horizon tasks often involve parallel or serial execution of multiple subtasks. Please design a "task dependency management" system: Can express dependency relationships between tasks (such as "Task B must be executed after Task A is completed"), and automatically schedule task execution order. How should this system integrate with NoteTool?
This chapter repeatedly mentioned the concept of "progressive disclosure". Please think:
- In Section 9.2.2, progressive disclosure is described as "each interaction step produces new context, which in turn guides the next decision". Please design a specific application scenario (such as academic paper writing, complex problem debugging), demonstrating how progressive disclosure helps agents complete tasks more efficiently.
- A potential risk of progressive disclosure is "inefficient exploration": The agent may waste time on unimportant details or miss key information. Please design an "exploration guidance" mechanism: Through heuristic rules or metacognitive strategies, help the agent make smarter decisions about "what to explore next".
- Compare "progressive disclosure" with traditional "load all context at once": In what types of tasks does the former have obvious advantages? In what types of tasks might the latter be more appropriate? Please provide at least 3 examples of different types of tasks.
References
[1] Anthropic. Effective Context Engineering for AI Agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[2] David Kim. Context-Engineering (GitHub). https://github.com/davidkimai/Context-Engineering