Chapter 14: Automated Deep Research Agent
In Chapter 13's travel assistant project, we saw how to apply HelloAgents to a multi-agent product. In this chapter, we move on to knowledge-intensive applications: building an agent assistant that can carry out deep research tasks automatically.
Compared with travel planning, deep research is harder because information keeps branching out, facts change quickly, and users expect claims to be backed by cited sources. To deliver trustworthy research reports, we need to equip agents with three core capabilities:
(1) Problem Analysis: Decompose users' open topics into retrievable query statements.
(2) Multi-Round Information Collection: Gather material over multiple rounds by combining different search APIs, then deduplicate and merge the results.
(3) Reflection and Summarization: Identify knowledge gaps from intermediate results, decide whether further retrieval is needed, and generate structured summaries.
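Capabilities (2) and (3) together form a bounded loop: search, check for gaps, and stop once no gaps remain or a round limit is reached. The following is a minimal, framework-free sketch of that loop, not the project's implementation; `research_with_reflection`, `search`, and `find_gaps` are hypothetical names standing in for a search tool and an LLM-based gap check.

```python
from typing import Callable, List


def research_with_reflection(
    query: str,
    search: Callable[[str], List[str]],
    find_gaps: Callable[[List[str]], List[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Collect notes over several rounds until no knowledge gaps remain.

    `search` and `find_gaps` are hypothetical stand-ins: the first returns
    materials for a query, the second returns follow-up queries for gaps.
    """
    notes: List[str] = []
    pending = [query]
    for _ in range(max_rounds):
        # Multi-round collection: run every pending query, skipping duplicates
        for q in pending:
            for item in search(q):
                if item not in notes:
                    notes.append(item)
        # Reflection: ask which aspects are still missing
        pending = find_gaps(notes)
        if not pending:
            break  # no gaps left, stop retrieving
    return notes


corpus = {"topic": ["a", "b"], "follow-up": ["b", "c"]}
notes = research_with_reflection(
    "topic",
    search=lambda q: corpus.get(q, []),
    find_gaps=lambda ns: [] if "c" in ns else ["follow-up"],
)
print(notes)  # ['a', 'b', 'c']
```

The `max_rounds` bound keeps the loop from running indefinitely when the gap check never reports "done".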
14.1 Project Overview and Architecture Design
14.1.1 Why We Need a Deep Research Assistant
In the era of information explosion, we need to quickly understand new technologies, concepts, or events every day. Traditional research methods have several pain points. The first is information overload: search engines return thousands of results, and you have to click through links one by one and read a lot of content to find what is useful. The second is lack of structure: even when you find relevant information, it is often fragmented and not systematically organized. The third is repetitive labor: every time you research a new topic, you repeat the same "search → read → summarize → organize" cycle.
This is the problem that the deep research assistant needs to solve. It's not just a search tool, but a research assistant that can autonomously plan, execute, and summarize.
Core Value of Deep Research Assistant:
- Save Time: Compress 1-2 hours of research work into 5-10 minutes
- Improve Quality: Systematic research process to avoid missing important information
- Traceable: Record all search results and sources for easy verification and citation
- Extensible: Easily add new search engines, data sources, and analysis tools
14.1.2 Technical Architecture Overview
Like the travel assistant, this system adopts the classic architecture that separates the front end from the back end, as shown in Figure 14.1.
Figure 14.1 Deep Research Assistant Technical Architecture
The system is designed with a four-layer architecture:
Front-End Layer (Vue3+TypeScript): Full-screen modal dialog UI, Markdown result visualization
Back-End Layer (FastAPI): API routing (/research/stream)
Agent Layer (HelloAgents): Three specialized Agents (TODO Planner, Task Summarizer, Report Writer) + Two core tools (SearchTool, NoteTool)
External Service Layer: Search engines + LLM providers
Let's see how a complete research request flows through the system, as shown in Figure 14.2:
Figure 14.2 Deep Research Assistant Data Flow Process
- User Input: The user enters a research topic on the front end
- Front-End Sends: The front end connects to /research/stream via SSE
- Back-End Receives: FastAPI receives the request and creates the research state
- Planning Phase: Calls the research planning Agent, which decomposes the topic into 3-5 subtasks
- Execution Phase: Executes each subtask one by one
  - Use SearchTool to search
  - Call the task summarization Agent to summarize the results
  - Use NoteTool to record the results
- Report Phase: Calls the report generation Agent to integrate all summaries
- Stream Return: Pushes progress and results to the front end via SSE
- Front-End Display: The front end updates task status, progress bar, logs, and report in real time
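Server-Sent Events carry each message as one or more `data:` lines followed by a blank line. Assuming the back end serializes each progress event as a single JSON `data:` line (an assumption; the project's exact wire format may differ), the framing on both ends can be sketched with the standard library alone. `format_sse` and `parse_sse` are hypothetical helper names.

```python
import json
from typing import Iterator, List


def format_sse(event: dict) -> str:
    """Serialize one event as an SSE frame: a single data line plus a blank line."""
    # json.dumps without indent emits no raw newlines, so one data line suffices
    return f"data: {json.dumps(event, ensure_ascii=False)}\n\n"


def parse_sse(stream: str) -> Iterator[dict]:
    """Yield the JSON payload of each frame in a raw SSE stream (assumes LF line endings)."""
    for frame in stream.split("\n\n"):
        data_lines = []
        for line in frame.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The SSE format strips exactly one leading space after the colon
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            # Multiple data lines in one frame are joined with a newline
            yield json.loads("\n".join(data_lines))


events: List[dict] = [
    {"type": "status", "message": "Planning research tasks..."},
    {"type": "task_completed", "task_id": 1},
]
raw = "".join(format_sse(e) for e in events)
print(list(parse_sse(raw)) == events)  # True
```

In the browser, the `EventSource` API performs the parsing side; the Python parser is only useful for testing the stream from a script.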
The project directory structure is as follows:
```
helloagents-deepresearch/
├── backend/                  # Back-end code
│   ├── src/
│   │   ├── agent.py          # Core coordinator
│   │   ├── main.py           # FastAPI entry
│   │   ├── models.py         # Data models
│   │   ├── prompts.py        # Prompt templates
│   │   ├── config.py         # Configuration management
│   │   └── services/         # Service layer
│   │       ├── planner.py    # Planning service
│   │       ├── summarizer.py # Summarization service
│   │       ├── reporter.py   # Report service
│   │       └── search.py     # Search service
│   ├── .env                  # Environment variables
│   ├── pyproject.toml        # Dependency management
│   └── workspace/            # Research notes
│
└── frontend/                 # Front-end code
    ├── src/
    │   ├── App.vue           # Main component
    │   ├── components/       # UI components
    │   │   └── ResearchModal.vue
    │   └── composables/      # Composable functions
    │       └── useResearch.ts
    ├── package.json          # npm dependencies
    └── vite.config.ts        # Build configuration
```
14.1.3 Quick Experience: Run the Project in 5 Minutes
Before diving into implementation details, let's first run the project to see the final result. This way you'll have an intuitive understanding of the entire system.
You can check versions with the following commands:
```bash
python --version  # Should show Python 3.10.x or higher
node --version    # Should show v16.x.x or higher
npm --version     # Should show 8.x.x or higher
```
(1) Start the Back-End
```bash
# 1. Enter back-end directory
cd helloagents-deepresearch/backend

# 2. Install dependencies
# Method 1: Using uv (recommended, faster Python package manager)
uv sync
# Method 2: Using pip
pip install -e .

# 3. Configure environment variables
cp .env.example .env

# 4. Edit .env file, fill in your API keys
# Open .env file with your favorite editor
# At minimum, configure:
# - LLM_PROVIDER (e.g., openai, deepseek, qwen)
# - LLM_API_KEY (your LLM API key)
# - SEARCH_API (e.g., duckduckgo, tavily)

# 5. Start back-end
python src/main.py
```
If everything is working, you'll see output similar to:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
(2) Start the Front-End
Open a new terminal window:
```bash
# 1. Enter front-end directory
cd helloagents-deepresearch/frontend

# 2. Install dependencies
npm install

# 3. Start front-end
npm run dev
```
If everything is working, you'll see output similar to:
```
VITE v5.0.0 ready in 500 ms
➜ Local: http://localhost:5174/
➜ Network: use --host to expose
➜ press h + enter to show help
```
(3) Start Research
Open your browser and visit http://localhost:5174. You'll see a centered input card, as shown in Figure 14.3. Enter a research topic, for example "What kind of organization is Datawhale?", select a search engine (if several are configured), and click the "Start Research" button.
Figure 14.3 Deep Research Assistant Search Page
As shown in Figure 14.4, the system will automatically expand to full screen, with research information displayed on the left and research progress and results displayed in real-time on the right. The entire research process takes about 1-3 minutes, depending on the complexity of the topic and the response speed of the search engine.
Figure 14.4 Deep Research Assistant Expanded Research
After research is complete, you'll see:
- Task List: Shows all subtasks and their status
- Progress Log: Shows all operations during the research process
- Final Report: Structured Markdown report containing summaries of all subtasks and source citations
Now you've successfully run the deep research assistant and have an intuitive understanding of the system.
14.2 TODO-Driven Research Paradigm
14.2.1 What is TODO-Driven Research
A traditional search engine answers one question at a time, while deep research needs to answer a series of related questions. The TODO-driven research paradigm decomposes a complex research topic into multiple subtasks (TODOs), executes them one by one, and integrates the results.
The core idea of this paradigm is to turn the complex task of "research" into a "planning → execution → integration" process.
Let's understand this transformation through an example. Suppose you want to research "What kind of organization is Datawhale?". The traditional search method is:
```
User input: What kind of organization is Datawhale?
Search engine: Returns 10-20 links
User: Clicks on links one by one, reads content, takes notes
Result: Fragmented information, lacking systematization
```
The problem with this approach is that each link covers only one aspect of the topic, the results lack systematic structure, and they have to be organized and summarized manually.
TODO-Driven Approach: Systematic Research
```
User input: What kind of organization is Datawhale?

System planning:
├─ TODO 1: Basic information about Datawhale (organizational positioning)
├─ TODO 2: Main projects of Datawhale (core content)
├─ TODO 3: Community culture of Datawhale (values)
└─ TODO 4: Influence of Datawhale (social contribution)

System execution:
For each TODO:
  1. Search for relevant materials
  2. Summarize key information
  3. Record source citations

System integration:
Generate structured report:
├─ Part 1: Organizational positioning (from TODO 1)
├─ Part 2: Core content (from TODO 2)
├─ Part 3: Values (from TODO 3)
├─ Part 4: Social contribution (from TODO 4)
└─ References: All source citations
```
This approach has several advantages: it decomposes a complex topic into clear sub-questions; it records the search results and summary of each subtask, which makes the research traceable; and its systematic process helps avoid missing important information. It also makes it easy to add new subtasks or adjust the execution order.
A complete TODO-driven research system contains three core elements:
(1) Intelligent Planner (TODO Planner): Responsible for decomposing research topics into subtasks. A good planner needs to understand the key aspects and research objectives of the topic, decompose the topic into 3-5 subtasks (too few won't cover everything, too many will be redundant), and design appropriate search queries for each subtask.
(2) Task Executor: Responsible for executing each subtask. The executor needs to use search engines to obtain relevant materials, extract key information and remove redundant content, while saving all source citations for easy verification.
(3) Report Writer: Responsible for integrating the results of all subtasks. The writer needs to organize content in a logical order, merge duplicate information, and attach source citations to each viewpoint.
In our case, the TODO-driven research process is shown in Figure 14.5:
Figure 14.5 TODO-Driven Research Process
The entire process is linear, but each stage has clear inputs and outputs. This design makes the system easy to understand and debug.
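Because every stage has clear inputs and outputs, the whole flow can be expressed as plain function composition. The sketch below is a hypothetical, framework-free skeleton rather than the project's code: `Todo`, `run_research`, and the injected callables `plan`, `search`, `summarize`, and `write_report` stand in for the planner Agent, SearchTool, the summarizer Agent, and the report Agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Todo:
    id: int
    title: str
    query: str


def run_research(
    topic: str,
    plan: Callable[[str], List[Todo]],
    search: Callable[[str], List[dict]],
    summarize: Callable[[Todo, List[dict]], str],
    write_report: Callable[[str, List[Tuple[Todo, str]]], str],
) -> str:
    # Planning: topic -> list of TODOs
    todos = plan(topic)
    # Execution: each TODO -> search results -> summary
    summaries = [(t, summarize(t, search(t.query))) for t in todos]
    # Integration: all summaries -> one report
    return write_report(topic, summaries)


report = run_research(
    "Datawhale",
    plan=lambda topic: [
        Todo(1, "Basics", f"{topic} intro"),
        Todo(2, "Projects", f"{topic} projects"),
    ],
    search=lambda q: [{"url": "https://example.com/" + q.replace(" ", "-")}],
    summarize=lambda t, results: f"{t.title}: {len(results)} source(s)",
    write_report=lambda topic, s: "\n".join([f"# {topic}"] + [text for _, text in s]),
)
print(report)
```

Injecting each stage as a callable makes every stage easy to stub out when debugging the others, which is the property the linear design is meant to provide.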
14.2.2 Three-Stage Research Process
The TODO-driven research process is divided into three stages: Planning, Execution, and Reporting. Each stage has a dedicated Agent responsible for it.
(1) Stage 1: Planning
The goal of the planning stage is to decompose the research topic into 3-5 subtasks. The system receives the research topic and current date as input, and outputs a JSON-format list of subtasks. Each subtask contains three fields: title (task title), intent (research intent), and query (search query).
The research planning Agent adapts its decomposition strategy to the characteristics of the topic. It usually starts with basic concepts, then moves on to the current state of the technology, practical applications, and development trends, adding comparative analysis when necessary. For example, for "What kind of organization is Datawhale?", the planning Agent might generate the following subtasks:
```json
[
  {
    "title": "Basic information about Datawhale",
    "intent": "Understand Datawhale's organizational positioning, founding time, development history",
    "query": "Datawhale organization introduction history 2024"
  },
  {
    "title": "Main projects of Datawhale",
    "intent": "Understand Datawhale's core open source projects and tutorials",
    "query": "Datawhale projects tutorials open source 2024"
  },
  ...
]
```
A good plan should be comprehensive and logically structured, with precise queries and an appropriate number of subtasks.
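Because the planner's output is free text from an LLM, it helps to validate it before creating TODO items. Here is a minimal sketch, assuming the three-field schema above and the 3-5 item constraint from the prompt. `parse_plan` is a hypothetical helper; it uses `json.JSONDecoder.raw_decode` to read the first complete JSON array even when the model wraps it in extra text.

```python
import json
from typing import List

REQUIRED_FIELDS = ("title", "intent", "query")


def parse_plan(response: str, min_items: int = 3, max_items: int = 5) -> List[dict]:
    """Extract and validate the subtask list from a planner response."""
    decoder = json.JSONDecoder()
    start = response.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            tasks, _ = decoder.raw_decode(response, start)
            break
        except json.JSONDecodeError:
            start = response.find("[", start + 1)  # try the next "["
    else:
        raise ValueError("No JSON array found in planner response")

    if not isinstance(tasks, list) or not min_items <= len(tasks) <= max_items:
        raise ValueError(f"Expected {min_items}-{max_items} subtasks")
    for i, task in enumerate(tasks, start=1):
        if not isinstance(task, dict):
            raise ValueError(f"Subtask {i} is not an object")
        missing = [f for f in REQUIRED_FIELDS if not str(task.get(f, "")).strip()]
        if missing:
            raise ValueError(f"Subtask {i} is missing fields: {missing}")
    return tasks


reply = (
    'Here is the plan:\n[{"title": "A", "intent": "i", "query": "q"}, '
    '{"title": "B", "intent": "i", "query": "q"}, '
    '{"title": "C", "intent": "i", "query": "q"}]\nSee note [1].'
)
print(len(parse_plan(reply)))  # 3
```

Unlike a greedy `\[.*\]` regex, `raw_decode` stops at the end of the first complete array, so bracketed text after the JSON (such as "[1]" above) does not break parsing.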
(2) Stage 2: Execution
The execution stage runs the subtasks one by one, searching for and summarizing relevant materials. The system receives the subtask list and the search engine configuration as input, and outputs a Markdown summary and a list of source citations for each subtask.
For each subtask, the executor will:
Search for materials: Use the configured search engine to execute the search
```python
search_results = search_tool.run({
    "input": task.query,
    "backend": "tavily",
    "mode": "structured",
    "max_results": 5,
})
```
Get search results: Extract title, URL, snippet
```json
{
  "results": [
    {
      "title": "What is a Multimodal Model?",
      "url": "https://example.com/multimodal-model",
      "snippet": "A multimodal model is an AI model that can process multiple types of data..."
    },
    ...
  ]
}
```
Call the summarization Agent: Summarize the search results
```python
summary = summarizer.summarize_task(task, search_results["results"])
```
Record summary and sources: Save to NoteTool
```python
# Format the cited URLs as a numbered list
sources = "\n".join(
    f"[{i}] {r['url']}" for i, r in enumerate(search_results["results"], start=1)
)
note_tool.run({
    "action": "create",
    "title": task.title,
    "content": f"## {task.title}\n\n{summary}\n\n## Sources\n{sources}",
    "tags": ["research", "summary"],
})
```
The task summarization Agent will extract core viewpoints from each search result, merge similar information, retain important numbers, dates, names and other key data, and add source citations for each viewpoint. For example, for the search results of "Basic information about Datawhale", the summarization Agent might generate:
```markdown
## Basic Information about Datawhale

Datawhale is an open source organization focused on data science and AI, founded in 2018[1]. The organization's core mission is "for the learner, grow together with learners", committed to building a pure learning community[2].

**Core Positioning:**
1. **Open Source Education Platform**: Provides high-quality AI and data science learning resources[1]
2. **Learner Community**: Gathers tens of thousands of AI learners and practitioners[3]
3. **Knowledge Sharing**: Advocates open source spirit, all content is completely free and open[2]

**Development History:**
- **2018**: Datawhale was founded, released first open source tutorial[1]
- **2020**: Became one of the leading AI learning communities in China[3]
- **2024**: Released 50+ open source projects, impacting 100,000+ learners[4]

## Sources
[1] https://github.com/datawhalechina
[2] https://datawhale.club/about
[3] https://www.zhihu.com/org/datawhale
[4] https://datawhale.cn
```
During execution, the system pushes progress information to the front end in real time:
```json
{
  "type": "status",
  "message": "Searching: Basic information about Datawhale"
}

{
  "type": "status",
  "message": "Summarizing search results..."
}

{
  "type": "task",
  "task": {
    "id": 1,
    "title": "Basic information about Datawhale",
    "status": "completed"
  }
}
```
(3) Stage 3: Reporting
The goal of the reporting stage is to integrate the summaries of all subtasks and generate the final report. The system receives the summaries of all subtasks and the research topic as input, and outputs the final report in Markdown format. The report contains five parts: title, overview, detailed analysis of each subtask, summary, and references. For example, for "What kind of organization is Datawhale?", the final report might be:
```markdown
# What Kind of Organization is Datawhale?

## Overview
This report systematically researched the open source organization Datawhale, covering four aspects: basic information, main projects, community culture, and influence.

## 1. Basic Information about Datawhale
Datawhale is an open source organization focused on data science and AI, founded in 2018...
(Insert summary of subtask 1 here)

## 2. Main Projects of Datawhale
Datawhale has released multiple high-quality open source tutorials, including Hello-Agents, Joyful-Pandas, etc...
(Insert summary of subtask 2 here)

...

## Summary
Through this research, we learned about Datawhale's organizational positioning, core projects, community culture, and social contributions. Datawhale is a pure learning community that has made important contributions to AI education.

## References
[1] https://github.com/datawhalechina
[2] https://datawhale.club/about
...
```
The report generation Agent organizes content in the logical order of the subtasks, adds a brief overview at the beginning, merges duplicate information, unifies the Markdown formatting, and gathers all source citations into the references section.
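Gathering citations into one references section means the per-subtask numbers ([1], [2], ...) must be remapped to global numbers, and a URL cited by two subtasks should appear only once. The following is a minimal sketch of that renumbering, assuming each summary uses `[n]` markers and ends with a `## Sources` list in the `[n] URL` form shown in Stage 2. `merge_citations` is a hypothetical helper; in the project this work is delegated to the report Agent's prompt.

```python
import re
from typing import Dict, List, Tuple

SOURCE_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s*$", re.MULTILINE)
MARKER = re.compile(r"\[(\d+)\]")


def merge_citations(summaries: List[str]) -> Tuple[List[str], List[str]]:
    """Renumber [n] markers across summaries and build a global reference list."""
    references: List[str] = []  # unique URLs in first-seen order
    bodies: List[str] = []
    for summary in summaries:
        body, _, sources = summary.partition("## Sources")
        # Map this summary's local numbers to global numbers
        local_to_global: Dict[str, int] = {}
        for num, url in SOURCE_LINE.findall(sources):
            if url not in references:
                references.append(url)
            local_to_global[num] = references.index(url) + 1
        # Rewrite markers in the body; numbers without a listed source stay unchanged
        bodies.append(
            MARKER.sub(
                lambda m: f"[{local_to_global[m.group(1)]}]"
                if m.group(1) in local_to_global
                else m.group(0),
                body.strip(),
            )
        )
    return bodies, references


a = "Founded in 2018[1].\n## Sources\n[1] https://github.com/datawhalechina"
b = "Open source[1], free[2].\n## Sources\n[1] https://datawhale.cn\n[2] https://github.com/datawhalechina"
bodies, refs = merge_citations([a, b])
print(bodies)  # ['Founded in 2018[1].', 'Open source[2], free[1].']
print(refs)    # ['https://github.com/datawhalechina', 'https://datawhale.cn']
```

The References section can then be rendered as `[i] url` lines from `refs`, numbered from 1.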
14.3 Agent System Design
14.3.1 Agent Responsibility Division
In the deep research assistant, we designed three specialized Agents, each responsible for a specific task. Splitting the work this way keeps each Agent simple and easy to understand and maintain.
In Chapter 7, we learned how to use SimpleAgent to build agents. Its design philosophy is simple and direct: each time the run() method is called, the Agent analyzes the user's question, decides whether to call tools, and returns the result. This works well for simple tasks, but for complex tasks like deep research, we again rely on multi-agent collaboration.
As shown in Table 14.1, the three Agents are respectively responsible for planning, summarization, and report generation.
Table 14.1 Responsibility Division of Three Agents
Let's introduce the design of each Agent in detail.
Agent 1: Research Planning Expert (TODO Planner)
Responsibility: Decompose research topics into 3-5 subtasks
Design Philosophy: The core task of the research planning expert is to understand the user's research topic, analyze the key aspects of the topic, and then generate a series of subtasks. This process is similar to the "brainstorming" stage of human researchers before starting research.
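The prompt templates in this chapter are plain Python strings filled in with `str.format`, which treats single braces as placeholders. Literal braces, such as those in a JSON example embedded in a prompt, must therefore be doubled; with single braces, `format` would try to look up a field named after the JSON content and raise a `KeyError`. A short demonstration on a shortened template (not the full prompt):

```python
template = """Current date: {current_date}
Example output:
[
{{
"title": "What is a multimodal model"
}}
]"""

# {current_date} is substituted; {{ and }} become literal { and }
prompt = template.format(current_date="2024-06-01")
print(prompt)
```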
Prompt Design:
```python
todo_planner_instructions = """
You are a research planning expert. Your task is to decompose the user's research topic into 3-5 subtasks.

Current date: {current_date}
Research topic: {research_topic}

Please analyze this research topic and decompose it into 3-5 subtasks. Each subtask should:
1. Cover an important aspect of the topic
2. Have a clear research objective
3. Be able to find relevant materials through search engines

Please return the subtask list in JSON format, each subtask containing:
- title: Task title (concise and clear)
- intent: Task intent (why research this)
- query: Search query (query string for search engines, can use English for better search results)

Example output:
[
{{
"title": "What is a multimodal model",
"intent": "Understand the basic concepts of multimodal models to lay the foundation for subsequent research",
"query": "multimodal model definition concept 2024"
}},
...
]

Please ensure:
1. Number of subtasks is between 3-5
2. Subtasks have logical relationships (e.g., from basics to applications, from current status to trends)
3. Search queries can accurately find relevant materials
4. Only return JSON, do not include other text
"""
```
Key Design Points: The prompt includes the current date to get the latest information, explicitly requires JSON format output for easy parsing, helps the Agent understand expected output through examples, and emphasizes constraints such as number of subtasks and logical relationships.
Implementation Code:
ToolAwareSimpleAgent is an extension of SimpleAgent. It is covered in Section 14.3.2, so there is no need to delve into it here.
```python
import json
import re
from typing import Callable, List, Optional

# HelloAgentsLLM, ToolAwareSimpleAgent, TodoItem, todo_planner_instructions,
# and get_current_date come from the framework and the project's own modules.


class PlanningService:
    def __init__(
        self,
        llm: HelloAgentsLLM,
        tool_call_listener: Optional[Callable] = None,
    ):
        self._agent = ToolAwareSimpleAgent(
            name="TODO Planner",
            system_prompt="You are a research planning expert",
            llm=llm,
            tool_call_listener=tool_call_listener,
        )

    def plan_todo_list(self, research_topic: str) -> List[TodoItem]:
        prompt = todo_planner_instructions.format(
            current_date=get_current_date(),
            research_topic=research_topic,
        )
        response = self._agent.run(prompt)
        tasks_payload = self._extract_tasks(response)

        todo_items = []
        for idx, item in enumerate(tasks_payload, start=1):
            task = TodoItem(
                id=idx,
                title=item["title"],
                intent=item["intent"],
                query=item["query"],
            )
            todo_items.append(task)
        return todo_items

    def _extract_tasks(self, response: str) -> List[dict]:
        """Extract the JSON task list from the Agent's response."""
        # Greedily match from the first "[" to the last "]" so the whole
        # array is captured even if the model adds text around it
        json_match = re.search(r"\[.*\]", response, re.DOTALL)
        if json_match:
            return json.loads(json_match.group(0))
        raise ValueError("Unable to extract JSON from response")
```
Agent 2: Task Summarization Expert (Task Summarizer)
Responsibility: Summarize search results, extract key information
Design Philosophy: The core task of the task summarization expert is to read search results, extract key information, and present it in a structured way. This process is similar to human researchers taking notes after reading literature.
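Because the summarizer is expected to attach a source citation to every viewpoint (as in the example summary in Section 14.2.2), a cheap post-check can catch summaries whose `[n]` markers point to sources that were never provided. Here is a minimal sketch, assuming markers use the `[n]` form and `num_sources` is the number of search results passed to the prompt; `find_bad_citations` is a hypothetical helper, not part of the project.

```python
import re
from typing import List


def find_bad_citations(summary: str, num_sources: int) -> List[int]:
    """Return citation numbers in the summary body that have no matching source."""
    # Only check the body; the "## Sources" list itself is not a citation
    body = summary.split("## Sources", 1)[0]
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    return sorted(n for n in cited if not 1 <= n <= num_sources)


summary = (
    "Multimodal models process many data types[1][4].\n"
    "## Sources\n[1] https://example.com/source1"
)
print(find_bad_citations(summary, num_sources=3))  # [4]
```

A non-empty result could trigger a retry of the summarization call or a warning in the progress log.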
Prompt Design:
```python
task_summarizer_instructions = """
You are a task summarization expert. Your task is to summarize search results and extract key information.

Task title: {task_title}
Task intent: {task_intent}
Search query: {task_query}

Search results:
{search_results}

Please carefully read the above search results, extract key information, and return a summary in Markdown format.

The summary should include:
1. **Core Viewpoints**: Core viewpoints and conclusions from search results
2. **Key Data**: Important numbers, dates, names, etc.
3. **Source Citations**: Add source citations for each viewpoint (using [1], [2], etc.)

Please ensure:
1. Summary is concise and clear, avoiding redundancy
2. Retain important details and data
3. Add source citations for each viewpoint
4. Use Markdown format (headings, lists, bold, etc.)

Example output:
## Core Viewpoints
Multimodal models are AI models that can process multiple types of data[1]. Unlike traditional unimodal models, multimodal models can simultaneously understand text, images, audio, etc.[2].

**Key Features:**
- Cross-modal understanding[1]
- Unified representation[3]
- End-to-end training[2]

## Sources
[1] https://example.com/source1
[2] https://example.com/source2
[3] https://example.com/source3
"""
```
Key Design Points: The prompt includes task title, intent, query and other context to help the Agent understand the task, explicitly requires output to include core viewpoints, key data, and source citations, emphasizes adding source citations for each viewpoint, and helps the Agent understand the expected output format through examples.
Implementation Code:
```python
from typing import Callable, List, Optional

# HelloAgentsLLM, ToolAwareSimpleAgent, TodoItem, and task_summarizer_instructions
# come from the framework and the project's own modules.


class SummarizationService:
    def __init__(
        self,
        llm: HelloAgentsLLM,
        tool_call_listener: Optional[Callable] = None,
    ):
        self._agent = ToolAwareSimpleAgent(
            name="Task Summarizer",
            system_prompt="You are a task summarization expert",
            llm=llm,
            tool_call_listener=tool_call_listener,
        )

    def summarize_task(
        self,
        task: TodoItem,
        search_results: List[dict],
    ) -> str:
        # Number the search results so the Agent can cite them as [1], [2], ...
        formatted_sources = self._format_sources(search_results)
        prompt = task_summarizer_instructions.format(
            task_title=task.title,
            task_intent=task.intent,
            task_query=task.query,
            search_results=formatted_sources,
        )
        summary = self._agent.run(prompt)
        return summary

    def _format_sources(self, search_results: List[dict]) -> str:
        """Format search results as a numbered list"""
        formatted = []
        for idx, result in enumerate(search_results, start=1):
            formatted.append(
                f"[{idx}] {result['title']}\n"
                f"URL: {result['url']}\n"
                f"Snippet: {result['snippet']}\n"
            )
        return "\n".join(formatted)
```
Agent 3: Report Writing Expert (Report Writer)
Responsibility: Integrate summaries of all subtasks and generate final report
Design Philosophy: The core task of the report writing expert is to integrate the summaries of all subtasks into a structured report. This process is similar to human researchers writing research reports after completing all investigations.
Prompt Design:
```python
report_writer_instructions = """
You are a report writing expert. Your task is to integrate the summaries of all subtasks and generate a structured research report.

Research topic: {research_topic}

Subtask summaries:
{task_summaries}

Please integrate all the above subtask summaries and generate a structured research report.

The report should include:
1. **Title**: Research topic
2. **Overview**: Briefly introduce the research topic and report structure (2-3 paragraphs)
3. **Detailed Analysis of Each Subtask**: Organize in logical order (using level-2 headings)
4. **Summary**: Summarize the main findings of the research (1-2 paragraphs)
5. **References**: All source citations (grouped by subtask)

Please ensure:
1. Report structure is clear and logically coherent
2. Eliminate duplicate information
3. Retain all source citations
4. Use Markdown format

Example output:
# Latest Advances in Multimodal Large Models

## Overview
This report systematically researched the latest advances in multimodal large models...

## 1. What is a Multimodal Model
(Insert summary of subtask 1 here)

## 2. What are the Latest Multimodal Models
(Insert summary of subtask 2 here)

...

## Summary
Through this research, we learned about...

## References
### Task 1: What is a Multimodal Model
[1] https://example.com/source1
...
"""
```
Key Design Points: The prompt explicitly requires the report to include title, overview, detailed analysis, summary, references and other structures, emphasizes organizing content in logical order, requires merging duplicate information to eliminate redundancy, and retains all source citations.
Implementation Code:
```python
from typing import Callable, List, Optional, Tuple

# HelloAgentsLLM, ToolAwareSimpleAgent, TodoItem, and report_writer_instructions
# come from the framework and the project's own modules.


class ReportingService:
    def __init__(
        self,
        llm: HelloAgentsLLM,
        tool_call_listener: Optional[Callable] = None,
    ):
        self._agent = ToolAwareSimpleAgent(
            name="Report Writer",
            system_prompt="You are a report writing expert",
            llm=llm,
            tool_call_listener=tool_call_listener,
        )

    def generate_report(
        self,
        research_topic: str,
        task_summaries: List[Tuple[TodoItem, str]],
    ) -> str:
        # Format subtask summaries
        formatted_summaries = self._format_summaries(task_summaries)
        prompt = report_writer_instructions.format(
            research_topic=research_topic,
            task_summaries=formatted_summaries,
        )
        report = self._agent.run(prompt)
        return report

    def _format_summaries(
        self,
        task_summaries: List[Tuple[TodoItem, str]],
    ) -> str:
        """Format subtask summaries"""
        formatted = []
        for idx, (task, summary) in enumerate(task_summaries, start=1):
            formatted.append(
                f"## Task {idx}: {task.title}\n"
                f"Intent: {task.intent}\n\n"
                f"{summary}\n"
            )
        return "\n".join(formatted)
```
14.3.2 ToolAwareSimpleAgent Design
In Chapter 7, we implemented SimpleAgent, the basic Agent of the HelloAgents framework. In the deep research assistant, however, we need an Agent that can record its tool calls, which is why we introduce ToolAwareSimpleAgent.
In the deep research assistant, we need to record the tool call status of each Agent for:
- Debugging: View which tools the Agent called and what parameters were passed
- Logging: Record all operations during the research process
- Analysis: Analyze the Agent's behavior patterns
- Progress Display: Show in real-time what the Agent is doing
SimpleAgent itself does not support tool call listening, so we need to extend it.
ToolAwareSimpleAgent adds a tool_call_listener parameter on top of SimpleAgent. This parameter is a callback function that is invoked every time the Agent calls a tool.
Usage Example:
```python
from hello_agents import ToolAwareSimpleAgent


def tool_listener(call_info):
    print(f"Agent: {call_info['agent_name']}")
    print(f"Tool: {call_info['tool_name']}")
    print(f"Parameters: {call_info['parsed_parameters']}")
    print(f"Result: {call_info['result']}")


# llm is a HelloAgentsLLM instance created beforehand
agent = ToolAwareSimpleAgent(
    name="Research Assistant",
    system_prompt="You are a research assistant",
    llm=llm,
    tool_call_listener=tool_listener,
)
```
ToolAwareSimpleAgent inherits from SimpleAgent and overrides the _execute_tool_call method:
```python
from typing import Callable, Optional

# SimpleAgent, HelloAgentsLLM, and ToolRegistry come from the HelloAgents framework


class ToolAwareSimpleAgent(SimpleAgent):
    def __init__(
        self,
        name: str,
        system_prompt: str,
        llm: HelloAgentsLLM,
        tool_registry: Optional[ToolRegistry] = None,
        tool_call_listener: Optional[Callable] = None,
    ):
        super().__init__(
            name=name,
            system_prompt=system_prompt,
            llm=llm,
            tool_registry=tool_registry,
        )
        self._tool_call_listener = tool_call_listener

    def _execute_tool_call(self, tool_name: str, parameters: str) -> str:
        """Execute tool call and notify listener"""
        # Parse the raw parameter string so the listener receives structured data
        parsed_parameters = self._parse_parameters(parameters)
        # Delegate the actual tool execution to SimpleAgent
        result = super()._execute_tool_call(tool_name, parameters)
        # Notify the listener after the tool returns
        if self._tool_call_listener:
            self._tool_call_listener({
                "agent_name": self.name,
                "tool_name": tool_name,
                "parsed_parameters": parsed_parameters,
                "result": result,
            })
        return result
```
In the deep research assistant, we use ToolAwareSimpleAgent to record all Agent tool calls:
```python
class DeepResearchAgent:
    def __init__(self, config: Configuration):
        self.config = config
        self.llm = HelloAgentsLLM(...)

        # Create a tool call listener that forwards each call as an event
        # (_emit_event pushes events to the SSE stream; defined elsewhere in the class)
        def tool_listener(call_info):
            self._emit_event({
                "type": "tool_call",
                "agent": call_info["agent_name"],
                "tool": call_info["tool_name"],
                "parameters": call_info["parsed_parameters"],
            })

        # Create three Agents, all using the same listener
        self.planner = PlanningService(self.llm, tool_listener)
        self.summarizer = SummarizationService(self.llm, tool_listener)
        self.reporter = ReportingService(self.llm, tool_listener)
```
This way, every Agent tool call is recorded and pushed to the front end via SSE, where it is displayed to the user in real time.
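The listener is a plain observer callback, so the pattern can be tried without HelloAgents at all. The sketch below uses a hypothetical `FakeToolAgent` in place of ToolAwareSimpleAgent and collects events in a list where the real system would push them to the SSE stream.

```python
from typing import Callable, Dict, List, Optional


class FakeToolAgent:
    """Hypothetical stand-in for ToolAwareSimpleAgent: runs tools and notifies a listener."""

    def __init__(
        self,
        name: str,
        tools: Dict[str, Callable[[dict], str]],
        tool_call_listener: Optional[Callable[[dict], None]] = None,
    ):
        self.name = name
        self._tools = tools
        self._listener = tool_call_listener

    def call_tool(self, tool_name: str, parameters: dict) -> str:
        result = self._tools[tool_name](parameters)
        # Notify after the tool returns, mirroring _execute_tool_call above
        if self._listener:
            self._listener({
                "agent_name": self.name,
                "tool_name": tool_name,
                "parsed_parameters": parameters,
                "result": result,
            })
        return result


events: List[dict] = []


def tool_listener(call_info: dict) -> None:
    # Keep only the fields the front end needs, as DeepResearchAgent does
    events.append({
        "type": "tool_call",
        "agent": call_info["agent_name"],
        "tool": call_info["tool_name"],
        "parameters": call_info["parsed_parameters"],
    })


tools = {"search": lambda p: f"results for {p['input']}"}
planner = FakeToolAgent("TODO Planner", tools, tool_listener)
summarizer = FakeToolAgent("Task Summarizer", tools, tool_listener)
planner.call_tool("search", {"input": "Datawhale"})
summarizer.call_tool("search", {"input": "Datawhale projects"})
print([e["agent"] for e in events])  # ['TODO Planner', 'Task Summarizer']
```

Sharing one listener across all Agents gives a single, ordered event log, while the `agent` field still tells the front end which Agent made each call.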
14.3.3 Agent Collaboration Mode
The three Agents have a sequential collaboration relationship, as shown in Figure 14.6.
Figure 14.6 Agent Collaboration Process
The characteristics of the sequential collaboration mode are:
- Linear Process: Agents execute in a fixed order
- Clear Input and Output: Each Agent's input comes from the previous Agent's output
- No Concurrency: Only one Agent is working at the same time
DeepResearchAgent is the core coordinator of the entire system, responsible for scheduling the three Agents:
```python
class DeepResearchAgent:
    # self.planner/summarizer/reporter are created in __init__ (shown above);
    # self.search_service and self._emit_event are set up elsewhere in the class.

    def run(self, research_topic: str) -> str:
        # 1. Planning stage
        self._emit_event({"type": "status", "message": "Planning research tasks..."})
        todo_list = self.planner.plan_todo_list(research_topic)
        self._emit_event({"type": "tasks", "tasks": todo_list})

        # 2. Execution stage
        task_summaries = []
        for task in todo_list:
            self._emit_event({
                "type": "status",
                "message": f"Researching: {task.title}",
            })
            # Search
            search_results = self.search_service.search(task.query)
            # Summarize
            summary = self.summarizer.summarize_task(task, search_results)
            task_summaries.append((task, summary))
            self._emit_event({
                "type": "task_completed",
                "task_id": task.id,
            })

        # 3. Reporting stage
        self._emit_event({"type": "status", "message": "Generating report..."})
        report = self.reporter.generate_report(research_topic, task_summaries)
        self._emit_event({"type": "report", "content": report})
        return report
```
14.4 Tool System Integration
14.4.1 SearchTool Extension
In Chapter 7, we implemented a basic version of SearchTool that integrated the Tavily and SerpApi search engines and demonstrated the design idea behind multi-source search. For this chapter's deep research assistant, we further extended SearchTool by adding DuckDuckGo, Perplexity, SearXNG, and other search engines, and by implementing an Advanced mode that combines multiple search engines. Search is the core function of the deep research assistant, and these extensions let the system adapt to different usage scenarios and needs.
As shown in Table 14.2, the search engines added this time have different characteristics and applicable scenarios.
Table 14.2 Multi-Search Engine Comparison
We won't walk through each extension separately here; refer to the source code and the extension examples in Chapter 7 for the implementation. SearchTool provides a unified search interface: whichever search engine is used, it is called the same way.
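The text does not detail how Advanced mode combines engines. One plausible approach, shown here purely as a sketch under that assumption, is to query several backends, interleave their results round-robin so no single engine dominates, skip engines that fail, and drop duplicate URLs. `combined_search` and the backend callables are hypothetical and do not reflect SearchTool's actual interface.

```python
from itertools import zip_longest
from typing import Callable, Dict, List


def combined_search(
    query: str,
    backends: Dict[str, Callable[[str], List[dict]]],
    max_results: int = 5,
) -> List[dict]:
    """Merge results from several search backends, round-robin, without duplicate URLs."""
    per_backend = []
    for name, search in backends.items():
        try:
            results = search(query)
        except Exception:
            continue  # one failing engine should not break the whole search
        # Tag each result with the engine that produced it
        per_backend.append([{**r, "backend": name} for r in results])

    merged: List[dict] = []
    seen = set()
    # zip_longest takes the 1st result of each engine, then the 2nd, and so on
    for row in zip_longest(*per_backend):
        for result in row:
            if result is None or result["url"] in seen:
                continue
            seen.add(result["url"])
            merged.append(result)
            if len(merged) == max_results:
                return merged
    return merged


def fake_a(q):
    return [{"url": "https://a/1"}, {"url": "https://shared"}]


def fake_b(q):
    return [{"url": "https://shared"}, {"url": "https://b/2"}]


def broken(q):
    raise RuntimeError("rate limited")


hits = combined_search("datawhale", {"a": fake_a, "b": fake_b, "x": broken})
print([(h["backend"], h["url"]) for h in hits])
```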
In the deep research assistant, we select the search engine through the configuration file:
```python
# config.py
from enum import Enum

from pydantic import BaseModel


class SearchAPI(str, Enum):
    TAVILY = "tavily"
    DUCKDUCKGO = "duckduckgo"
    PERPLEXITY = "perplexity"
    SEARXNG = "searxng"
    ADVANCED = "advanced"


class Configuration(BaseModel):
    search_api: SearchAPI = SearchAPI.DUCKDUCKGO
    # ...
```
```
# .env
SEARCH_API=tavily
```
This way, users can switch search engines by editing the .env file, without modifying the code.
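To see how a `.env` value can map onto the enum, here is a minimal standard-library sketch that reads `SEARCH_API` from the environment and falls back to the default on a missing or unknown value. The project may load settings differently (for example through pydantic or python-dotenv); `load_search_api` is a hypothetical helper, and the enum is repeated so the example is self-contained.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class SearchAPI(str, Enum):
    TAVILY = "tavily"
    DUCKDUCKGO = "duckduckgo"
    PERPLEXITY = "perplexity"
    SEARXNG = "searxng"
    ADVANCED = "advanced"


def load_search_api(
    env: Optional[Mapping[str, str]] = None,
    default: SearchAPI = SearchAPI.DUCKDUCKGO,
) -> SearchAPI:
    """Read SEARCH_API from the environment, tolerating case and unknown values."""
    env = os.environ if env is None else env
    raw = env.get("SEARCH_API", "").strip().lower()
    try:
        return SearchAPI(raw)  # look up the member by its value
    except ValueError:
        return default


print(load_search_api({"SEARCH_API": "Tavily"}).value)  # tavily
```

Falling back to a default keeps a typo in `.env` from crashing start-up; a stricter variant could raise instead, so misconfiguration is noticed immediately.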
The result returned by SearchTool is a dictionary containing:
- results: List of search results; each result contains a title, URL, and snippet
- backend: The search engine used
- answer: AI-generated answer (Perplexity only)
- notices: Notification information (such as API limits, errors, etc.)
Next, let's look at how a few special cases are handled.
Search results may contain duplicate URLs, so we need to deduplicate them:
from typing import List
def deduplicate_sources(sources: List[dict]) -> List[dict]:
    """Remove duplicate URLs, keeping the first occurrence"""
    seen_urls = set()
    unique_sources = []
    for source in sources:
        if source["url"] not in seen_urls:
            seen_urls.add(source["url"])
            unique_sources.append(source)
    return unique_sources
Search results may also contain a large amount of text, so we limit the number of tokens for each source:
def limit_source_tokens(source: dict, max_tokens: int = 2000) -> dict:
    """Limit the number of tokens for a source"""
    snippet = source["snippet"]
    # Simple token estimation: 1 token is approximately 4 characters
    # (a rough heuristic for English text)
    max_chars = max_tokens * 4
    if len(snippet) > max_chars:
        snippet = snippet[:max_chars] + "..."
    return {
        **source,
        "snippet": snippet
    }
14.4.2 NoteTool Usage
In the deep research assistant, we use NoteTool to persist research progress. NoteTool is a built-in tool integrated in Chapter 9, used to create, read, update, and delete notes.
During research, we record each subtask's search results and summary, as well as the final report. Persisting this information to disk lets an interrupted run resume from where it left off, and it also makes it easy to review every step afterwards and analyze the quality and efficiency of the research.
NoteTool stores notes in the specified workspace directory, with each note being a Markdown file. The note filename is the task ID, and the content includes task title, task intent, search query, search results, and summary.
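Because every subtask's note is written to disk, an interrupted run can skip tasks that already have notes. A minimal sketch, assuming one note per task saved as <task_id>.md inside workspace/notes/ (the naming described above); completed_task_ids is a hypothetical helper, and a real resume step should also check that the note actually contains a summary:

```python
from pathlib import Path
from typing import Set


def completed_task_ids(notes_dir: str) -> Set[int]:
    """Return the IDs of tasks that already have a note on disk.

    Assumes notes are named <task_id>.md; other filenames are ignored.
    """
    path = Path(notes_dir)
    if not path.is_dir():
        return set()  # nothing saved yet
    ids = set()
    for note in path.glob("*.md"):
        if note.stem.isdigit():
            ids.add(int(note.stem))
    return ids
```

A resumed run could then process only `[t for t in todo_items if t.id not in completed_task_ids("./workspace/notes")]`.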
The resulting files are organized as follows:
workspace/
├── notes/
│   ├── 1.md              # Notes for task 1
│   ├── 2.md              # Notes for task 2
│   ├── 3.md              # Notes for task 3
│   └── ...
└── reports/
    └── final_report.md   # Final report
We wrap NoteTool in a NotesService that records the research progress of each subtask:
from typing import List
from hello_agents.tools import NoteTool
from models import TodoItem
class NotesService:
    def __init__(self, workspace: str):
        self.note_tool = NoteTool(workspace=workspace)
    def save_task_summary(
        self,
        task: TodoItem,
        search_results: List[dict],
        summary: str
    ):
        """Save task summary"""
        # Format note content
        content = self._format_note_content(
            task=task,
            search_results=search_results,
            summary=summary
        )
        # Create note
        self.note_tool.run({
            "action": "create",
            "title": f"Task {task.id}: {task.title}",
            "content": content,
            "tags": ["research", "summary"]
        })
    def _format_note_content(
        self,
        task: TodoItem,
        search_results: List[dict],
        summary: str
    ) -> str:
        """Format note content"""
        content = f"# Task {task.id}: {task.title}\n\n"
        content += "## Task Information\n\n"
        content += f"- **Intent**: {task.intent}\n"
        content += f"- **Query**: {task.query}\n\n"
        content += "## Search Results\n\n"
        for idx, result in enumerate(search_results, start=1):
            content += f"[{idx}] {result['title']}\n"
            content += f"URL: {result['url']}\n"
            content += f"Snippet: {result['snippet']}\n\n"
        content += f"## Summary\n\n{summary}\n"
        return content
14.4.3 ToolRegistry Tool Management
ToolRegistry, the HelloAgents tool registry introduced in Chapter 7, manages the registration and invocation of all tools. In the deep research assistant, we use ToolRegistry to manage SearchTool and NoteTool.
Before creating an Agent, we need to register tools first:
from hello_agents import ToolAwareSimpleAgent
from hello_agents.tools import ToolRegistry
from hello_agents.tools import SearchTool
from hello_agents.tools import NoteTool
# Create tools
search_tool = SearchTool(backend="hybrid")
note_tool = NoteTool(workspace="./workspace/notes")
# Create registry
registry = ToolRegistry()
# Register tools
registry.register_tool(search_tool)
registry.register_tool(note_tool)
# Create Agent (`llm` is a HelloAgentsLLM instance created beforehand)
agent = ToolAwareSimpleAgent(
    name="Research Assistant",
    system_prompt="You are a research assistant",
    llm=llm,
    tool_registry=registry
)
When an Agent needs to call a tool, it generates a tool call instruction, as shown in Figure 14.7.
Figure 14.7 Tool Call Process
Tool Call Process:
- Agent generates instruction: The Agent generates a tool call instruction, such as [TOOL_CALL:search_tool:{"input": "Datawhale organization", "backend": "tavily"}]
- Parse instruction: ToolRegistry parses the instruction and extracts the tool name and parameters
- Find tool: ToolRegistry finds the corresponding tool based on the tool name
- Call tool: Call the tool's run method, passing in the parameters
- Return result: The tool returns its execution result
- Format result: The result is formatted as a string and returned to the Agent
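The parse, find, call, and format steps above can be illustrated with a small standalone parser. This is a sketch of the general technique, not ToolRegistry's actual implementation; parse_tool_calls, dispatch, and the plain name-to-callable mapping are hypothetical. It uses the standard library's json.JSONDecoder.raw_decode, which parses one JSON value and reports where it ended, so brackets inside string values are handled correctly:

```python
import json
from typing import Any, Callable, Dict, List, Tuple

PREFIX = "[TOOL_CALL:"
_decoder = json.JSONDecoder()


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Extract (tool_name, params) pairs from [TOOL_CALL:name:{json}] markers."""
    calls = []
    start = text.find(PREFIX)
    while start != -1:
        name_start = start + len(PREFIX)
        colon = text.find(":", name_start)
        if colon == -1:
            break
        name = text[name_start:colon].strip()
        pos = colon + 1
        while pos < len(text) and text[pos].isspace():
            pos += 1
        try:
            # Parse exactly one JSON value; trailing text is left alone
            params, consumed = _decoder.raw_decode(text[pos:])
        except json.JSONDecodeError:
            start = text.find(PREFIX, name_start)
            continue
        end = pos + consumed
        # A valid call has an object argument followed by the closing bracket
        if isinstance(params, dict) and text[end:end + 1] == "]":
            calls.append((name, params))
        start = text.find(PREFIX, end)
    return calls


def dispatch(text: str, tools: Dict[str, Callable[[Dict[str, Any]], Any]]) -> List[str]:
    """Look up each parsed tool by name, call it, and stringify the result."""
    outputs = []
    for name, params in parse_tool_calls(text):
        tool = tools.get(name)
        if tool is None:
            outputs.append(f"Error: unknown tool '{name}'")
        else:
            outputs.append(str(tool(params)))
    return outputs
```

Returning an error string for an unknown tool, instead of raising, lets the Agent see the mistake in its next turn and correct the tool name.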
14.5 Service Layer Implementation
This section introduces the implementation of the core services in detail: PlanningService, SummarizationService, ReportingService, and SearchService. These services bridge Agents and tools and encapsulate the concrete business logic.
14.5.1 Task Planning Service
PlanningService is responsible for calling the research planning Agent to decompose the research topic into subtasks. This is the first and most critical step of the entire research process.
(1) Implementation Approach
Its core responsibilities are:
- Build planning Prompt: Build Prompt based on research topic and current date
- Call planning Agent: Call TODO Planner Agent to generate subtask list
- Parse JSON response: Extract JSON-format subtask list from Agent's response
- Validate subtask format: Ensure each subtask contains required fields (title, intent, query)
import re
import json
from typing import List, Callable, Optional
from datetime import datetime
from hello_agents import HelloAgentsLLM
from hello_agents import ToolAwareSimpleAgent
from models import TodoItem, SummaryState
from prompts import todo_planner_instructions
class PlanningService:
    """Task planning service"""
    def __init__(
        self,
        llm: HelloAgentsLLM,
        tool_call_listener: Optional[Callable] = None
    ):
        self._llm = llm
        self._tool_call_listener = tool_call_listener
        # Create planning Agent
        self._agent = ToolAwareSimpleAgent(
            name="TODO Planner",
            system_prompt="You are a research planning expert, skilled at decomposing complex research topics into clear subtasks.",
            llm=llm,
            tool_call_listener=tool_call_listener
        )
    def plan_todo_list(self, state: SummaryState) -> List[TodoItem]:
        """Plan TODO list
        Args:
            state: Research state, containing research topic
        Returns:
            Subtask list
        """
        # Build Prompt
        prompt = todo_planner_instructions.format(
            current_date=self._get_current_date(),
            research_topic=state.research_topic,
        )
        # Call Agent
        response = self._agent.run(prompt)
        # Parse JSON
        tasks_payload = self._extract_tasks(response)
        # Validate and create TodoItem
        todo_items = []
        for idx, item in enumerate(tasks_payload, start=1):
            # Validate required fields
            if not all(key in item for key in ["title", "intent", "query"]):
                raise ValueError(f"Task {idx} is missing required fields")
            task = TodoItem(
                id=idx,
                title=item["title"],
                intent=item["intent"],
                query=item["query"],
            )
            todo_items.append(task)
        return todo_items
    def _get_current_date(self) -> str:
        """Get current date"""
        return datetime.now().strftime("%Y-%m-%d")
    def _extract_tasks(self, response: str) -> List[dict]:
        """Extract JSON from Agent response
        The Agent's response may contain extra text, such as:
        "Okay, I will plan the following tasks for you:\n[{...}, {...}]\nThese tasks cover..."
        We need to extract the JSON part.
        """
        # Method 1: Use regex to extract JSON array
        json_match = re.search(r'\[.*\]', response, re.DOTALL)
        if json_match:
            json_str = json_match.group(0)
            try:
                return json.loads(json_str)
            except json.JSONDecodeError as e:
                raise ValueError(f"JSON parsing failed: {e}")
        # Method 2: If no JSON array is found, try to parse the entire response directly
        try:
            return json.loads(response)
        except json.JSONDecodeError:
            raise ValueError("Unable to extract JSON from response")
(2) JSON Parsing and Validation
The JSON returned by the Agent may contain extra text or format errors, so we need robust parsing logic:
Common Issues:
- Contains extra text: Agent may add explanatory text before and after JSON
- Format errors: JSON may be missing quotes, commas, etc.
- Missing fields: Some subtasks may be missing required fields
Solutions:
- Use regex: Extract JSON part
- Multiple parsing strategies: First try to extract JSON array, then try to parse directly
- Field validation: Ensure each subtask contains required fields
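The greedy pattern \[.*\] in _extract_tasks spans from the first [ to the last ] in the response, so it breaks when the surrounding prose also contains brackets (for example, a citation such as [1]). A more tolerant sketch using the standard library's json.JSONDecoder.raw_decode, which parses one JSON value and ignores whatever follows it; extract_json_array is a hypothetical replacement, not the project's code, and it only accepts a non-empty array of objects, matching the expected task list:

```python
import json
import re
from typing import Any, Dict, List

# Body of a fenced code block: three backticks, an optional "json" tag, then content
_FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_array(response: str) -> List[Dict[str, Any]]:
    """Return the first non-empty JSON array of objects found in an LLM response."""
    decoder = json.JSONDecoder()
    # Prefer fenced blocks, then fall back to scanning the whole response
    candidates = [m.group(1) for m in _FENCED.finditer(response)] + [response]
    for text in candidates:
        for pos, ch in enumerate(text):
            if ch != "[":
                continue
            try:
                # Stops at the end of the first complete JSON value,
                # so trailing prose (even with brackets) is ignored
                value, _ = decoder.raw_decode(text[pos:])
            except json.JSONDecodeError:
                continue
            if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                return value
    raise ValueError("Unable to extract a JSON array of tasks from response")
```

Trying every [ position costs quadratic time in the worst case, which is acceptable for planner responses of a few kilobytes.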
Example:
# Agent response example 1: Contains extra text
response1 = """
Okay, I will plan the following tasks for you:
[
  {
    "title": "What is a multimodal model",
    "intent": "Understand basic concepts",
    "query": "multimodal model definition"
  },
  {
    "title": "Latest multimodal models",
    "intent": "Understand technical status",
    "query": "latest multimodal models 2024"
  }
]
These tasks cover the basic concepts and latest developments of multimodal models.
"""
# Extract JSON (`service` is a PlanningService instance)
tasks1 = service._extract_tasks(response1)
# Result: [{"title": "What is a multimodal model", ...}, ...]
# Agent response example 2: Pure JSON
response2 = """
[
  {"title": "Basic information about Datawhale", "intent": "Understand organizational positioning", "query": "Datawhale organization introduction"},
  {"title": "Main projects of Datawhale", "intent": "Understand core content", "query": "Datawhale projects tutorials 2024"}
]
"""
# Extract JSON
tasks2 = service._extract_tasks(response2)
# Result: [{"title": "Basic information about Datawhale", ...}, ...]
(3) Planning Quality Assessment
A good plan should meet the following criteria:
- Comprehensive coverage: Cover all important aspects of the topic
- Clear logic: Clear logical relationships between subtasks
- Precise queries: Search queries can accurately find relevant materials
- Appropriate quantity: 3-5 subtasks
We can add an evaluation method:
def evaluate_plan(self, todo_items: List[TodoItem]) -> dict:
    """Evaluate planning quality
    Returns:
        Evaluation results, including score and suggestions
    """
    score = 100
    suggestions = []
    # Check quantity
    if len(todo_items) < 3:
        score -= 20
        suggestions.append("Too few subtasks, may miss important information")
    elif len(todo_items) > 5:
        score -= 10
        suggestions.append("Too many subtasks, may have redundancy")
    # Check query quality
    for task in todo_items:
        if len(task.query.split()) < 2:
            score -= 10
            suggestions.append(f"Query for task '{task.title}' is too simple")
    # Check logical relationships
    # (More complex logic checks can be added here)
    return {
        "score": score,
        "suggestions": suggestions
    }
14.5.2 Summarization Service
SummarizationService is responsible for calling the task summarization Agent to summarize search results. This is a core step of the research process and largely determines the quality of the research.
Its responsibilities are:
- Format search results: Format search results into readable text
- Build summarization Prompt: Build Prompt based on task information and search results
- Call summarization Agent: Call Task Summarizer Agent to generate summary
- Collect source citations: Record the URLs of the search results used as sources for the summary
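The service in this section treats every search-result URL as a source, whether or not the summary actually uses it. If the Task Summarizer prompt asks for [n]-style citations that match the 1-based numbering used when the search results are formatted for the prompt (an assumption; the prompt text is not shown here), the cited URLs can instead be recovered from the summary itself. A minimal sketch; cited_sources is a hypothetical helper:

```python
import re
from typing import Dict, List

_CITATION = re.compile(r"\[(\d+)\]")


def cited_sources(summary: str, search_results: List[Dict[str, str]]) -> List[str]:
    """Return URLs of the sources cited in the summary, in order of first citation.

    Out-of-range markers are ignored and each URL appears only once.
    """
    urls: List[str] = []
    for match in _CITATION.finditer(summary):
        idx = int(match.group(1))
        if 1 <= idx <= len(search_results):
            url = search_results[idx - 1].get("url", "")
            if url and url not in urls:
                urls.append(url)
    return urls
```

Ignoring out-of-range markers also guards against citation numbers the model invented.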
Core code:
from typing import List, Callable, Optional, Tuple
from hello_agents import HelloAgentsLLM
from hello_agents import ToolAwareSimpleAgent
from models import TodoItem
from prompts import task_summarizer_instructions
class SummarizationService:
    """Summarization service"""
    def __init__(
        self,
        llm: HelloAgentsLLM,
        tool_call_listener: Optional[Callable] = None
    ):
        self._llm = llm
        self._tool_call_listener = tool_call_listener
        # Create summarization Agent
        self._agent = ToolAwareSimpleAgent(
            name="Task Summarizer",
            system_prompt="You are a task summarization expert, skilled at extracting key information from search results.",
            llm=llm,
            tool_call_listener=tool_call_listener
        )
    def summarize_task(
        self,
        task: TodoItem,
        search_results: List[dict]
    ) -> Tuple[str, List[str]]:
        """Summarize task
        Args:
            task: Task information
            search_results: Search results list
        Returns:
            (Summary text, source URL list)
        """
        # Format search results
        formatted_sources = self._format_sources(search_results)
        # Build Prompt
        prompt = task_summarizer_instructions.format(
            task_title=task.title,
            task_intent=task.intent,
            task_query=task.query,
            search_results=formatted_sources,
        )
        # Call Agent
        summary = self._agent.run(prompt)
        # Extract source URLs
        source_urls = [result["url"] for result in search_results]
        return summary, source_urls
    def _format_sources(self, search_results: List[dict]) -> str:
        """Format search results
        Format search results into readable text, including:
        - Serial number
        - Title
        - URL
        - Snippet
        """
        formatted = []
        for idx, result in enumerate(search_results, start=1):
            formatted.append(
                f"[{idx}] {result['title']}\n"
                f"URL: {result['url']}\n"
                f"Snippet: {result['snippet']}\n"
            )
        return "\n".join(formatted)
Report Structure Design
The final report should include the following parts:
## References
### Task 1: What is a Multimodal Model
...
### Task 2: What are the Latest Multimodal Models
- https://example.com/gpt4v
...
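Rather than relying on the Report Writer to copy every URL correctly, the References section can also be assembled deterministically from the source URLs already collected for each subtask. A minimal sketch; build_references is a hypothetical helper whose input pairs could be derived from the (task, summary, source_urls) tuples passed to the report generator:

```python
from typing import List, Sequence, Tuple


def build_references(task_sources: Sequence[Tuple[str, Sequence[str]]]) -> str:
    """Render a Markdown References section grouped by subtask.

    task_sources holds (task title, source URLs) pairs in task order;
    a URL repeated within one task is listed only once.
    """
    lines: List[str] = ["## References", ""]
    for idx, (title, urls) in enumerate(task_sources, start=1):
        lines.append(f"### Task {idx}: {title}")
        seen = set()
        for url in urls:
            if url not in seen:
                seen.add(url)
                lines.append(f"- {url}")
        lines.append("")  # blank line between task groups
    return "\n".join(lines).rstrip() + "\n"
```

For example, `build_references([(task.title, urls) for task, _summary, urls in task_summaries])` could be appended to the Agent's report text, so the URLs come straight from search results rather than from model output.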
14.5.3 Report Generation Service
ReportingService is responsible for calling the report generation Agent to integrate the summaries of all subtasks. This is the last step of the research process, generating the final research report.
Its responsibilities are:
- Format subtask summaries: Format all subtask summaries into a unified format
- Build report Prompt: Build Prompt based on research topic and subtask summaries
- Call report Agent: Call Report Writer Agent to generate final report
- Organize citations: Organize all source citations into the references section
Core Code Implementation:
from typing import List, Callable, Optional, Tuple
from hello_agents import HelloAgentsLLM
from hello_agents import ToolAwareSimpleAgent
from models import TodoItem
from prompts import report_writer_instructions
class ReportingService:
    """Report generation service"""
    def __init__(
        self,
        llm: HelloAgentsLLM,
        tool_call_listener: Optional[Callable] = None
    ):
        self._llm = llm
        self._tool_call_listener = tool_call_listener
        # Create report Agent
        self._agent = ToolAwareSimpleAgent(
            name="Report Writer",
            system_prompt="You are a report writing expert, skilled at integrating information and generating structured reports.",
            llm=llm,
            tool_call_listener=tool_call_listener
        )
    def generate_report(
        self,
        research_topic: str,
        task_summaries: List[Tuple[TodoItem, str, List[str]]]
    ) -> str:
        """Generate final report
        Args:
            research_topic: Research topic
            task_summaries: Subtask summary list, each element is (task, summary, source URL list)
        Returns:
            Final report (Markdown format)
        """
        # Format subtask summaries
        formatted_summaries = self._format_summaries(task_summaries)
        # Build Prompt
        prompt = report_writer_instructions.format(
            research_topic=research_topic,
            task_summaries=formatted_summaries,
        )
        # Call Agent
        report = self._agent.run(prompt)
        return report
    def _format_summaries(
        self,
        task_summaries: List[Tuple[TodoItem, str, List[str]]]
    ) -> str:
        """Format subtask summaries
        Format all subtask summaries into a unified format, including:
        - Task serial number
        - Task title
        - Task intent
        - Summary content
        - Source URLs
        """
        formatted = []
        for idx, (task, summary, source_urls) in enumerate(task_summaries, start=1):
            formatted.append(
                f"## Task {idx}: {task.title}\n\n"
                f"**Intent**: {task.intent}\n\n"
                f"{summary}\n\n"
                f"**Sources**:\n"
            )
            for url in source_urls:
                formatted.append(f"- {url}\n")
            formatted.append("\n")
        return "".join(formatted)
14.5.4 Search Scheduling Service
SearchService schedules search engines, executes searches, and returns the results; it is the bridge between the Agents and SearchTool. Instead of the usual pattern in which a SimpleAgent calls tools directly, this intermediate layer runs SearchTool itself and hands the results to the Agent, which lets the Agent focus on processing the retrieved information.
Its responsibilities are:
- Schedule search engine: Select search engine based on configuration
- Execute search: Call SearchTool to execute search
- Process results: Deduplicate, limit tokens, format
- Error handling: Handle search failure situations
Core code:
from typing import List, Optional
import logging
from hello_agents.tools import SearchTool
from config import Configuration
logger = logging.getLogger(__name__)
class SearchService:
    """Search scheduling service"""
    def __init__(self, config: Configuration):
        self.config = config
        # Create SearchTool
        self.search_tool = SearchTool(backend="hybrid")
    def search(
        self,
        query: str,
        max_results: int = 5
    ) -> List[dict]:
        """Execute search
        Args:
            query: Search query
            max_results: Maximum number of results
        Returns:
            Search results list
        """
        try:
            # Call SearchTool
            raw_response = self.search_tool.run({
                "input": query,
                "backend": self.config.search_api.value,
                "mode": "structured",
                "max_results": max_results
            })
            # Extract results
            results = raw_response.get("results", [])
            # Process results
            results = self._deduplicate_sources(results)
            results = self._limit_source_tokens(results)
            logger.info(f"Search successful: {query}, returned {len(results)} results")
            return results
        except Exception as e:
            logger.error(f"Search failed: {query}, error: {e}")
            return []
    def _deduplicate_sources(self, sources: List[dict]) -> List[dict]:
        """Remove duplicate URLs"""
        seen_urls = set()
        unique_sources = []
        for source in sources:
            url = source.get("url", "")
            if url and url not in seen_urls:
                seen_urls.add(url)
                unique_sources.append(source)
        return unique_sources
    def _limit_source_tokens(
        self,
        sources: List[dict],
        max_tokens_per_source: int = 2000
    ) -> List[dict]:
        """Limit the number of tokens per source"""
        limited_sources = []
        for source in sources:
            snippet = source.get("snippet", "")
            # Simple token estimation: 1 token is approximately 4 characters
            max_chars = max_tokens_per_source * 4
            if len(snippet) > max_chars:
                snippet = snippet[:max_chars] + "..."
            limited_sources.append({
                **source,
                "snippet": snippet
            })
        return limited_sources
The search engine is selected based on configuration, as shown in Figure 14.8:
Figure 14.8 Search Engine Scheduling Process
Scheduling Logic:
- Read configuration: Read the SEARCH_API setting from the .env file
- Select engine: Select the search engine based on the configuration (tavily, duckduckgo, perplexity, etc.)
- Execute search: Call SearchTool to execute the search
- Process results: Deduplicate, limit tokens, format
- Return results: Return the processed search results
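The SearchService above returns an empty list when the configured engine fails. One way to degrade gracefully is to try several engines in a fixed order. A minimal sketch, in which each backend is a hypothetical callable taking a query and a result limit (for example, a thin wrapper around SearchTool.run with a fixed backend value); search_with_fallback is not part of the project:

```python
import logging
from typing import Callable, Dict, List, Sequence

logger = logging.getLogger(__name__)

SearchFn = Callable[[str, int], List[dict]]


def search_with_fallback(
    query: str,
    backends: Dict[str, SearchFn],
    order: Sequence[str],
    max_results: int = 5,
) -> List[dict]:
    """Try backends in order and return the first non-empty result list.

    An exception or an empty result moves on to the next backend;
    if every backend fails, an empty list is returned.
    """
    for name in order:
        search = backends.get(name)
        if search is None:
            logger.warning("Unknown search backend: %s", name)
            continue
        try:
            results = search(query, max_results)
        except Exception as exc:  # a real service might catch narrower errors
            logger.warning("Backend %s failed for %r: %s", name, query, exc)
            continue
        if results:
            return results
        logger.info("Backend %s returned no results for %r", name, query)
    return []
```

Putting a free engine such as DuckDuckGo last keeps research running when a paid API hits its quota, at the cost of possibly lower-quality results.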
To improve efficiency and reduce costs, we can add search result caching:
import hashlib
import json
from pathlib import Path
# (Other imports and `logger` as in the previous SearchService listing)
class SearchService:
    def __init__(self, config: Configuration):
        self.config = config
        self.search_tool = SearchTool(backend="hybrid")
        # Cache directory
        self.cache_dir = Path("./cache/search")
        self.cache_dir.mkdir(parents=True, exist_ok=True)
    def search(
        self,
        query: str,
        max_results: int = 5,
        use_cache: bool = True
    ) -> List[dict]:
        """Execute search (with cache)"""
        # Generate cache key
        cache_key = self._generate_cache_key(query, max_results)
        cache_file = self.cache_dir / f"{cache_key}.json"
        # Try to read from cache (entries never expire; delete the directory to refresh)
        if use_cache and cache_file.exists():
            logger.info(f"Reading search results from cache: {query}")
            with open(cache_file, "r", encoding="utf-8") as f:
                return json.load(f)
        # Execute search: _execute_search holds the search logic of the previous
        # listing (call SearchTool, deduplicate, limit tokens)
        results = self._execute_search(query, max_results)
        # Save to cache (empty results are not cached)
        if use_cache and results:
            with open(cache_file, "w", encoding="utf-8") as f:
                json.dump(results, f, ensure_ascii=False, indent=2)
        return results
    def _generate_cache_key(self, query: str, max_results: int) -> str:
        """Generate cache key"""
        # MD5 hash of the query, max results, and search backend
        content = f"{query}_{max_results}_{self.config.search_api.value}"
        return hashlib.md5(content.encode()).hexdigest()
With these four core services (PlanningService, SummarizationService, ReportingService, SearchService), we have a complete research pipeline. Each service has a single responsibility, and they collaborate through clear interfaces to automate the path from research topic to final report.
14.6 Front-End Interaction Design
In the previous sections, we implemented the complete back-end system. This section will introduce the front-end interaction design in detail, including full-screen modal dialog UI, real-time progress display, and research result visualization.
14.6.1 Full-Screen Modal Dialog UI Design
The deep research assistant adopts a full-screen modal dialog UI design, which has the following advantages:
- Immersive experience: Full-screen display, avoiding distractions, focusing on research
- Clear hierarchy: Main page and research page are separated, with clear hierarchy
- Easy to close: Click the close button or press ESC key to return to the main page
- Responsive design: Adapts to different screen sizes
As shown in Figure 14.9, the full-screen modal dialog contains the following parts:
Figure 14.9 Full-Screen Modal Dialog UI
UI Components:
- Top bar: Contains research topic and close button
- Progress area: Shows current research progress (planning, execution, reporting)
- Content area: Shows research results (Markdown format)
- Bottom bar: Shows status information (such as "Researching...", "Completed")
The corresponding Vue implementation is as follows (ResearchModal.vue):
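<template>
  <div v-if="isOpen" class="modal-overlay">
    <div class="modal-container">
      <div class="modal-header">
        <h2>{{ researchTopic }}</h2>
        <button @click="close">Close</button>
      </div>
      <div class="progress-section">
        <div
          class="progress-fill"
          :style="{ width: progressPercentage + '%' }"
        ></div>
        <span>{{ progressText }}</span>
      </div>
      <div class="content-section">
        <div v-if="isLoading">Researching, please wait...</div>
        <div v-else v-html="renderedMarkdown"></div>
      </div>
      <div class="modal-footer">{{ statusText }}</div>
    </div>
  </div>
</template>
<script setup lang="ts">
import { ref, computed, watch, onUnmounted } from 'vue'
import { marked } from 'marked'
interface Props {
  isOpen: boolean
  researchTopic: string
}
const props = defineProps<Props>()
const emit = defineEmits<{
  close: []
}>()
// State
const isLoading = ref(true)
const progressPercentage = ref(0)
const progressText = ref('Preparing...')
const statusText = ref('Researching...')
const markdownContent = ref('')
// Render Markdown
const renderedMarkdown = computed(() => {
  return marked(markdownContent.value)
})
// Close modal
const close = () => {
  emit('close')
}
// Listen for ESC key
const handleKeydown = (e: KeyboardEvent) => {
  if (e.key === 'Escape') {
    close()
  }
}
// Attach the keyboard listener while the modal is open (including on first render)
watch(() => props.isOpen, (isOpen) => {
  if (isOpen) {
    document.addEventListener('keydown', handleKeydown)
  } else {
    document.removeEventListener('keydown', handleKeydown)
  }
}, { immediate: true })
// Remove the listener if the component is destroyed while open
onUnmounted(() => {
  document.removeEventListener('keydown', handleKeydown)
})
</script>
<style>
.modal-overlay {
  position: fixed;
  top: 0;
  left: 0;
  width: 100vw;
  height: 100vh;
  background-color: rgba(0, 0, 0, 0.5);
  display: flex;
  justify-content: center;
  align-items: center;
  z-index: 1000;
}
...
</style>
To adapt to different screen sizes, we add media queries: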
/* Tablet devices */
@media (max-width: 768px) {
  .modal-container {
    width: 95vw;
    height: 95vh;
  }
  .modal-header,
  .progress-section,
  .content-section,
  .modal-footer {
    padding: 15px 20px;
  }
}
/* Mobile devices */
@media (max-width: 480px) {
  .modal-container {
    width: 100vw;
    height: 100vh;
    border-radius: 0;
  }
  .modal-header h2 {
    font-size: 18px;
  }
}
14.6.2 Real-Time Progress Display
The deep research assistant uses Server-Sent Events (SSE) to display progress in real time. SSE is a server-push technology in which the server keeps an HTTP response open and sends data to the client as it becomes available; it is also covered in the protocol chapter.
As shown in Figure 14.10, the SSE process includes the following steps:
Figure 14.10 SSE Process
Process Description:
- Client initiates request: Send a GET request to /api/research with the research topic as a query parameter (the browser EventSource API can only issue GET requests)
- Server establishes SSE connection: Return a text/event-stream response
- Server pushes progress: Push research progress as each stage advances (planning, execution, reporting)
- Client receives progress: Listen for SSE events and update the UI
- Research complete: The server pushes the final report and closes the connection
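Each message in this stream is a line of the form data: <JSON> followed by a blank line, which is the framing the back-end generator produces. A minimal Python sketch of producing and consuming that framing, for example to test the endpoint without a browser; format_sse and parse_sse are hypothetical helpers that handle only data: fields and LF line endings, not the full SSE specification:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def format_sse(payload: Dict[str, Any]) -> str:
    """Serialize one event as an SSE message: a single data line plus a blank line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def parse_sse(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Reassemble JSON payloads from SSE text that may arrive in arbitrary chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line (two consecutive newlines) terminates each message
        while "\n\n" in buffer:
            message, buffer = buffer.split("\n\n", 1)
            data_lines = []
            for line in message.split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # The SSE format drops exactly one space after the colon
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                # Multiple data lines in one message are joined with newlines
                yield json.loads("\n".join(data_lines))
```

Buffering until a blank line matters because network chunks rarely align with message boundaries; a client that parsed each chunk directly would fail on split messages.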
To use SSE across the front end and back end, both sides need the following setup.
Back-End FastAPI SSE Endpoint:
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from typing import AsyncGenerator
import asyncio
import json
from models import SummaryState
app = FastAPI()
# planning_service, search_service, summarization_service, and reporting_service
# are the Section 14.5 service instances, assumed to be created at startup
async def research_stream(topic: str) -> AsyncGenerator[str, None]:
    """Research streaming generator
    Generate SSE format data:
    data: {"type": "progress", "data": {...}}
    """
    try:
        # 1. Planning stage
        yield f"data: {json.dumps({'type': 'progress', 'stage': 'planning', 'percentage': 10, 'text': 'Planning research tasks...'})}\n\n"
        # The services are synchronous, so run them in a worker thread to keep the event loop free
        # (assumes SummaryState can be created from the topic alone)
        state = SummaryState(research_topic=topic)
        todo_items = await asyncio.to_thread(planning_service.plan_todo_list, state)
        yield f"data: {json.dumps({'type': 'plan', 'data': [item.dict() for item in todo_items]})}\n\n"
        # 2. Execution stage
        task_summaries = []
        for idx, task in enumerate(todo_items, start=1):
            # Update progress
            percentage = 10 + (idx / len(todo_items)) * 70
            yield f"data: {json.dumps({'type': 'progress', 'stage': 'executing', 'percentage': percentage, 'text': f'Researching task {idx}/{len(todo_items)}: {task.title}'})}\n\n"
            # Search
            search_results = await asyncio.to_thread(search_service.search, task.query)
            # Summarize
            summary, source_urls = await asyncio.to_thread(summarization_service.summarize_task, task, search_results)
            task_summaries.append((task, summary, source_urls))
            # Push task summary
            yield f"data: {json.dumps({'type': 'task_summary', 'task_id': task.id, 'summary': summary})}\n\n"
        # 3. Reporting stage
        yield f"data: {json.dumps({'type': 'progress', 'stage': 'reporting', 'percentage': 90, 'text': 'Generating final report...'})}\n\n"
        # Generate report
        report = await asyncio.to_thread(reporting_service.generate_report, topic, task_summaries)
        # Push final report
        yield f"data: {json.dumps({'type': 'report', 'data': report})}\n\n"
        # Complete
        yield f"data: {json.dumps({'type': 'progress', 'stage': 'completed', 'percentage': 100, 'text': 'Research complete!'})}\n\n"
    except Exception as e:
        # Error handling
        yield f"data: {json.dumps({'type': 'error', 'message': str(e)})}\n\n"
@app.get("/api/research")
async def research(topic: str):
    """Research endpoint (SSE); GET because the browser EventSource API cannot send POST requests"""
    return StreamingResponse(
        research_stream(topic),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
    )
Front-End Using EventSource to Receive SSE:
// composables/useResearch.ts
import { ref } from 'vue'
export function useResearch() {
  const isLoading = ref(false)
  const progressPercentage = ref(0)
  const progressText = ref('')
  const markdownContent = ref('')
  const error = ref<string | null>(null)
  const startResearch = (topic: string) => {
    isLoading.value = true
    error.value = null
    markdownContent.value = ''
    // Create EventSource (EventSource always sends a GET request)
    const eventSource = new EventSource(`/api/research?topic=${encodeURIComponent(topic)}`)
    // Listen for messages
    eventSource.onmessage = (event) => {
      const data = JSON.parse(event.data)
      switch (data.type) {
        case 'progress':
          progressPercentage.value = data.percentage
          progressText.value = data.text
          // The back end reports completion as a progress event with stage 'completed'
          if (data.stage === 'completed') {
            eventSource.close()
            isLoading.value = false
          }
          break
        case 'plan':
          // Display planning results
          console.log('Planning results:', data.data)
          break
        case 'task_summary':
          // Append task summary to Markdown
          markdownContent.value += `\n\n## Task ${data.task_id}\n\n${data.summary}`
          break
        case 'report':
          // Replace the partial summaries with the final report
          markdownContent.value = data.data
          break
        case 'error':
          error.value = data.message
          eventSource.close()
          isLoading.value = false
          break
      }
    }
    // Connection errors; closing also stops EventSource from reconnecting automatically
    eventSource.onerror = (err) => {
      console.error('SSE error:', err)
      error.value = 'Connection failed, please retry'
      eventSource.close()
      isLoading.value = false
    }
  }
  return {
    isLoading,
    progressPercentage,
    progressText,
    markdownContent,
    error,
    startResearch,
  }
}
Using in Component:
import { useResearch } from '@/composables/useResearch'
const {
  isLoading,
  progressPercentage,
  progressText,
  markdownContent,
  error,
  startResearch
} = useResearch()
const handleStartResearch = (topic: string) => {
  startResearch(topic)
}
14.6.3 Research Result Visualization
Research results are displayed in Markdown format, including headings, paragraphs, lists, quotes, and other elements. We use the marked library to convert Markdown to HTML and apply custom styles.
Rendering Markdown:
import { marked } from 'marked'
// Configure marked
marked.setOptions({
  breaks: true, // Convert single line breaks into <br>
  gfm: true, // Enable GitHub Flavored Markdown
})
// Render
// Note: marked does not sanitize its output; sanitize the HTML (e.g. with DOMPurify)
// before binding it with v-html
const renderedHtml = marked(markdownContent.value)
Research reports contain many source citations, which we group by subtask under a References section, for example:
## References
### Task 1: Basic Information about Datawhale
- [Datawhale GitHub](https://github.com/datawhalechina)
- [Datawhale Official Website](https://datawhale.club)
### Task 2: Main Projects of Datawhale
- [Hello-Agents Tutorial](https://github.com/datawhalechina/Hello-Agents)
...
Together, the full-screen modal dialog UI, SSE-based real-time progress display, and Markdown visualization give users a friendly front end: they can follow the research as it progresses and read the results in a clean, well-formatted layout.
14.7 Chapter Summary
In this chapter, we built a complete automated deep research agent system from scratch. Let's review the core points:
(1) TODO-Driven Research Paradigm
We proposed a new research paradigm: TODO-driven research. It decomposes a complex research topic into executable subtasks and completes the research in three stages:
- Planning stage: Decompose research topic into 3-5 subtasks, each subtask contains title, intent, and search query
- Execution stage: Execute search and summarization for each subtask, generating structured knowledge
- Reporting stage: Integrate summaries of all subtasks, generate final research report
The advantages of this paradigm are:
- Strong controllability: Each subtask has clear objectives and scope
- Reliable quality: Each stage is handled by a dedicated Agent with its own specialized Prompt
- Easy to debug: Can debug each subtask individually
- Good scalability: Can easily add new subtasks or modify existing subtasks
(2) Three-Agent Collaboration System
We designed three specialized Agents, each performing their duties:
- TODO Planner (Research Planning Expert): Responsible for decomposing research topics into subtasks
- Task Summarizer (Task Summarization Expert): Responsible for summarizing search results for each subtask
- Report Writer (Report Writing Expert): Responsible for integrating summaries of all subtasks and generating final report
The advantages of this design are:
- Clear responsibilities: Each Agent focuses on a specific task
- Prompt optimization: Can customize specialized Prompts for each Agent
- Easy to maintain: Modifying one Agent does not affect other Agents
- Quality assurance: Each Agent is an "expert" in their field
(3) ToolAwareSimpleAgent Design
We extended the SimpleAgent of the HelloAgents framework and implemented ToolAwareSimpleAgent. This Agent has tool call listening capability and can:
- Listen to tool calls: Listen to each tool call through callback functions
- Real-time feedback: Push tool call information to the front-end in real-time
- Debugging support: Record all tool calls for easy debugging
This Agent has been integrated into the HelloAgents framework and can be reused in other projects.
(4) Tool System Integration
We fully utilized the tool system of the HelloAgents framework:
- SearchTool: Extended to support more search engines (Tavily, DuckDuckGo, Perplexity, etc.)
- NoteTool: Persist research progress, support recovery and auditing
- ToolRegistry: Unified management of all tools, support custom extensions
Through configuration-based design, users can easily switch search engines without modifying code.
(5) Core Service Implementation
We implemented four core services connecting Agents and tools:
- PlanningService: Call planning Agent, parse JSON, validate format
- SummarizationService: Call summarization Agent, process search results, extract sources
- ReportingService: Call report Agent, integrate summaries, generate report
- SearchService: Schedule search engines, process results, handle errors gracefully, cache results
These services each perform their duties and collaborate through clear interfaces, achieving an automated process from research topic to final report.
(6) Front-End Interaction Design
We designed a user-friendly front-end interface:
- Full-screen modal dialog: Immersive experience, clear hierarchy
- SSE real-time progress: Real-time display of research progress, good user experience
- Markdown visualization: Beautiful format, clear structure
Through the Vue 3 + TypeScript + SSE technology stack, we implemented a modern web application.
These techniques are not limited to deep research assistants; they apply to other AI applications as well. We hope readers will build on this chapter to explore more possibilities and create more powerful AI systems.
In the next chapter, we will build Cyber Town, a multi-agent system combined with a game engine, and explore complex interaction and collaboration patterns between Agents. Stay tuned!