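Dify-03: Agent System, Complete Analysis

1. Module Overview

1.1 Responsibilities and Boundaries

The Agent system is the core Dify module for autonomous task planning and tool invocation. Using the ReAct (Reasoning + Acting) framework and the Function Calling mechanism, it lets an LLM call tools on its own, query knowledge bases, and solve complex problems through multi-round reasoning.

Core responsibilities:

Multiple Agent modes: Chain-of-Thought (CoT) Agent and Function Call Agent
Tool orchestration: manages and invokes built-in tools, custom tools, API tools, and knowledge-base tools
Multi-round reasoning: runs up to 99 iterations and plans task steps automatically
Memory management: maintains the conversation history and reasoning records (TokenBufferMemory)
Streaming output: reports the reasoning process and intermediate results in real time
Context control: manages the token limit automatically and trims history dynamically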

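Inputs:

User query text
Conversation history (multi-turn)
Agent configuration (max_iteration, strategy, tool list)
System prompt

Outputs:

AgentThought stream: reasoning steps, tool calls, observations
Final answer
Token usage statistics
Tool call records

Upstream and downstream dependencies:

Upstream: AgentChatApp, the Agent node in Workflow
Downstream: ToolManager, LLM Provider, Memory Manager, Dataset Retrieval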

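1.2 Agent Mode Comparison

Dify supports two Agent implementation modes:

Feature	CoT Agent	Function Call Agent
Principle	ReAct framework; the LLM generates Thought-Action-Observation text	Uses the LLM's native Function Calling capability
Reasoning style	Explicit reasoning steps, similar to human thinking	Implicit reasoning; the model returns tool calls directly
Tool calls	One tool per step	Several tool calls can be requested in one response
Applicable models	All LLMs (GPT-3.5/4, Claude, etc.)	Only models with native Function Calling (e.g. GPT-3.5-turbo-1106 and later)
Explainability	High; detailed reasoning steps are emitted	Medium; the reasoning stays implicit
Performance	Slower; every round has to generate a Thought	Faster; tools are called directly
Typical use	Complex reasoning tasks, cases that need explainability	Simple tool calls, performance-sensitive cases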

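1.3 Lifecycle

An Agent run goes through 5 phases:

Initialization phase: create the AgentRunner instance, load the configuration, initialize the tools, prepare the memory
Reasoning loop phase:
- Build the prompt (conversation history, available tools, current query)
- Call the LLM to generate a response
- Parse the output (Thought/Action/Tool Call)
- Execute the tool call
- Record the Observation
- Decide whether another iteration is needed
Tool execution phase:
- Validate the parameters
- Call the tool engine
- Format the result
- Handle exceptions
Memory update phase:
- Append the reasoning step to the Scratchpad
- Update the conversation history
- Keep token usage under control
Termination phase:
- Stop when max_iteration is reached
- Stop when a final answer is obtained
- Stop when tool-call failures exceed the limit
- Emit the final result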

2. Overall Architecture

2.1 Architecture Diagram

flowchart TB
    subgraph "Agent System Architecture"
        subgraph "Input Layer"
            Query[User Query]
            History[Conversation History]
            Config[Agent Config]
        end
        
        subgraph "AgentRunner Layer"
            Factory{AgentRunner<br/>Factory}
            CoT[CoT Agent<br/>ReAct Framework]
            FC[Function Call Agent]
            Base[BaseAgentRunner]
        end
        
        subgraph "Reasoning Loop"
            Iterator[Iteration Controller]
            Prompt[Prompt Builder]
            LLM[LLM Call]
            Parser[Output Parser]
        end
        
        subgraph "Tool Management"
            ToolMgr[ToolManager]
            Builtin[Built-in Tools]
            Dataset[Knowledge Base Tools]
            API[API Tools]
            Plugin[Plugin Tools]
        end
        
        subgraph "Tool Execution"
            Engine[ToolEngine]
            Validate[Parameter Validation]
            Invoke[Tool Invocation]
            Format[Result Formatting]
        end
        
        subgraph "Memory Management"
            Memory[TokenBufferMemory]
            MsgHistory[Message History]
            Scratchpad[Reasoning Records]
        end
        
        subgraph "Output Layer"
            Stream[Stream Manager]
            Thought[Reasoning Steps]
            Answer[Final Answer]
        end
    end
    
    Query --> Factory
    History --> Factory
    Config --> Factory
    
    Factory --> CoT
    Factory --> FC
    CoT --> Base
    FC --> Base
    
    Base --> Iterator
    Iterator --> Prompt
    Prompt --> LLM
    LLM --> Parser
    
    Parser -->|tool needed| ToolMgr
    Parser -->|final answer| Answer
    
    ToolMgr --> Builtin
    ToolMgr --> Dataset
    ToolMgr --> API
    ToolMgr --> Plugin
    
    Builtin --> Engine
    Dataset --> Engine
    API --> Engine
    Plugin --> Engine
    
    Engine --> Validate
    Validate --> Invoke
    Invoke --> Format
    Format -->|result| Iterator
    
    Iterator --> Memory
    Memory --> MsgHistory
    Memory --> Scratchpad
    
    Iterator --> Stream
    Stream --> Thought
    Stream --> Answer

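2.2 Layer-by-Layer Description

1. Input Layer

Receives the user query, the conversation history, and the Agent configuration.

Key parameters:

query: the user's current question
conversation_history: list of earlier conversation turns
max_iteration: maximum number of iterations (default 5, maximum 99)
tools: list of available tools
strategy: strategy configuration (CoT or Function Call)

2. AgentRunner Layer

Selects the Agent implementation that matches the configuration.

AgentRunnerFactory:

Checks whether the model supports Function Calling
If it does and the strategy is Function Call, creates a FunctionCallAgentRunner
Otherwise creates a CotAgentRunner

BaseAgentRunner:

Provides shared functionality: tool initialization, memory loading, history assembly
Declares the abstract method _run(), which subclasses implement

Design rationale: the factory pattern makes it easy to switch Agent types, and the base class factors out shared logic to reduce duplicated code.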
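
The selection rule described for AgentRunnerFactory can be sketched as follows. This is a minimal illustration under stated assumptions, not Dify's actual factory: `ModelFeatures`, `supports_function_calling`, and `select_runner_class` are hypothetical names, and the two runner classes are empty stand-ins.

```python
from dataclasses import dataclass
from enum import Enum


class AgentStrategy(Enum):
    FUNCTION_CALL = "function_call"
    REACT = "react"


@dataclass
class ModelFeatures:
    # Hypothetical capability flag; how it gets detected is outside this sketch.
    supports_function_calling: bool


class CotAgentRunner:
    """Stand-in for the ReAct-style runner."""


class FunctionCallAgentRunner:
    """Stand-in for the native Function Calling runner."""


def select_runner_class(strategy: AgentStrategy, features: ModelFeatures) -> type:
    """Pick Function Call only when it is both configured and supported."""
    if strategy is AgentStrategy.FUNCTION_CALL and features.supports_function_calling:
        return FunctionCallAgentRunner
    # Every other combination falls back to the CoT (ReAct) runner.
    return CotAgentRunner
```

Falling back to CoT keeps a Function Call configuration usable even when the selected model lacks native tool calling.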

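3. Reasoning Loop

Controls the Agent's iterative execution flow.

Iteration controller:

iteration = 0
while iteration < max_iteration:
    # Build the prompt
    prompt = self._build_prompt(query, history, scratchpad)
    
    # Call the LLM
    response = self.llm.invoke(prompt)
    
    # Parse the output
    parsed = self.parser.parse(response)
    
    # Either execute the tool or return the answer
    if parsed.has_tool_call:
        result = self.tool_engine.invoke(parsed.tool_call)
        scratchpad.add_observation(result)
        iteration += 1
    else:
        return parsed.answer

Prompt construction:

System prompt: defines the Agent's role and behavioral rules
Tool descriptions: tool specifications in JSON Schema format
Conversation history: the history managed by TokenBuffer
Current query: the user's question

Output parsing:

CoT Agent: parses Thought, Action, Action Input, and Observation
Function Call Agent: parses the tool_calls structure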

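4. Tool Management

Manages and provides the various tools.

ToolManager responsibilities:

Load tool definitions
Provide the tool list to the LLM
Route tool call requests

Tool types:

Type	Description	Example
Built-in tools	Predefined general-purpose tools	Google search, weather lookup, calculator
Knowledge-base tools	Access to datasets	DatasetRetrieverTool
API tools	Custom HTTP requests	Calling a third-party API
Plugin tools	Third-party extensions	Connected through the Plugin system

5. Tool Execution

Executes tool calls and returns the results.

ToolEngine flow:

Parameter validation: checks that the tool parameters are complete and correctly typed
Tool invocation: runs the tool's _run() method
Result formatting: converts the tool output into a standard format
Exception handling: catches and records tool execution errors

Design rationale: a uniform tool interface lets any tool be plugged in, and exception handling ensures that a single failing tool does not break the Agent run.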
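
The four ToolEngine steps can be illustrated with a small sketch. It assumes a hypothetical `ToolSpec` wrapper and `invoke_tool` function (neither is Dify's API) and shows how a tool failure turns into an ordinary result the Agent can observe instead of an exception.

```python
from typing import Any, Callable


class ToolSpec:
    """Hypothetical tool description: a callable plus its required parameters."""

    def __init__(self, name: str, func: Callable[..., Any], required: dict[str, type]):
        self.name = name
        self.func = func
        self.required = required


def invoke_tool(tool: ToolSpec, params: dict[str, Any]) -> dict[str, Any]:
    """Validate, invoke, format; never raise to the caller."""
    # 1. Parameter validation: presence and type of each required argument.
    for key, expected in tool.required.items():
        if key not in params:
            return {"tool": tool.name, "ok": False, "output": f"missing parameter: {key}"}
        if not isinstance(params[key], expected):
            return {"tool": tool.name, "ok": False, "output": f"wrong type for parameter: {key}"}
    # 2. Invocation, wrapped by 4. exception handling.
    try:
        raw = tool.func(**params)
    except Exception as exc:  # a failing tool becomes a result, not a crash
        return {"tool": tool.name, "ok": False, "output": f"tool error: {exc}"}
    # 3. Result formatting: the output is returned as text for the LLM.
    return {"tool": tool.name, "ok": True, "output": str(raw)}


divide = ToolSpec("divide", lambda a, b: a / b, {"a": float, "b": float})
print(invoke_tool(divide, {"a": 6.0, "b": 3.0}))
print(invoke_tool(divide, {"a": 6.0, "b": 0.0}))
```

Returning the error text as the tool output lets the LLM see what went wrong and try a different action in the next round.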

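6. Memory Management

Maintains the conversation history and the reasoning records.

TokenBufferMemory:

Maintains the message list: [{role: user, content: ...}, {role: assistant, content: ...}]
Token limit: trims dynamically according to the model's context window
Trimming strategy: keeps the most recent N conversation turns

AgentScratchpad:

Records the reasoning process of the current run
Structure: [Thought → Action → Observation] * N
Used when building the prompt

Design rationale: TokenBuffer keeps the prompt within the model's limit, and the Scratchpad carries state across reasoning rounds.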
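
As a rough illustration of the `[Thought → Action → Observation] * N` structure, the sketch below stores steps and renders them into the text that gets placed in the prompt. The `Step` and `Scratchpad` classes are hypothetical simplifications, not the actual AgentScratchpad implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    thought: str
    action: str
    action_input: str
    observation: str = ""


@dataclass
class Scratchpad:
    steps: list[Step] = field(default_factory=list)

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def render(self) -> str:
        """Render every step in the Thought/Action/Observation text format."""
        blocks = []
        for s in self.steps:
            blocks.append(
                f"Thought: {s.thought}\n"
                f"Action: {s.action}\n"
                f"Action Input: {s.action_input}\n"
                f"Observation: {s.observation}"
            )
        return "\n".join(blocks)


pad = Scratchpad()
pad.add(Step("Need the sum", "calculator", '{"expression": "2+3"}', "5"))
print(pad.render())
```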

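7. Output Layer

Manages the streaming output and the final result.

Stream manager:

Yields reasoning steps (AgentThought) in real time
Pushes them to the frontend over SSE
Yields the answer last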
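
The streaming behavior can be sketched as a generator that serializes each event as a Server-Sent Events frame. The event names (`agent_thought`, `agent_answer`) and the `agent_events` source are assumptions made for this example; only the frame layout (`event:` and `data:` lines followed by a blank line) comes from the SSE format itself.

```python
import json
from typing import Iterable, Iterator


def to_sse(events: Iterable[dict]) -> Iterator[str]:
    """Serialize each event dict as one Server-Sent Events frame."""
    for event in events:
        payload = json.dumps(event["data"], ensure_ascii=False)
        # An SSE frame is a set of "field: value" lines ended by a blank line.
        yield f"event: {event['type']}\ndata: {payload}\n\n"


def agent_events() -> Iterator[dict]:
    """Hypothetical event source: reasoning steps first, the answer last."""
    yield {"type": "agent_thought", "data": {"thought": "look it up", "action": "google_search"}}
    yield {"type": "agent_answer", "data": {"answer": "42"}}


frames = list(to_sse(agent_events()))
for frame in frames:
    print(frame, end="")
```

Because both functions are generators, each frame can be flushed to the client as soon as the corresponding reasoning step exists.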

3. Core Data Structures

3.1 AgentEntity (Agent Configuration Entity)

3.1.1 Class Diagram

classDiagram
    class AgentEntity {
        +str provider
        +str model
        +AgentStrategy strategy
        +int max_iteration
        +list~AgentToolEntity~ tools
        +PromptTemplate prompt_template
    }
    
    class AgentStrategy {
        <<enumeration>>
        FUNCTION_CALL
        REACT
    }
    
    class AgentToolEntity {
        +str provider_type
        +str provider_id
        +str tool_name
        +dict tool_parameters
    }
    
    AgentEntity --> AgentStrategy
    AgentEntity "1" *-- "*" AgentToolEntity

3.1.2 Field Descriptions

Field	Type	Description
`provider`	str	LLM provider (openai, anthropic, etc.)
`model`	str	Model name (gpt-4, claude-3, etc.)
`strategy`	AgentStrategy	Strategy: FUNCTION_CALL or REACT
`max_iteration`	int	Maximum number of iterations (1-99)
`tools`	list[AgentToolEntity]	Tool list
`prompt_template`	PromptTemplate	Custom prompt template
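
The fields above map naturally onto dataclasses. The sketch below is an assumption-laden simplification rather than Dify's entity definition: `prompt_template` is reduced to a plain string, the enum values are guessed from the configuration examples, and the 1-99 range check in `__post_init__` is added here only to show where such validation could live.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentStrategy(Enum):
    FUNCTION_CALL = "function_call"
    REACT = "react"


@dataclass
class AgentToolEntity:
    provider_type: str
    provider_id: str
    tool_name: str
    tool_parameters: dict = field(default_factory=dict)


@dataclass
class AgentEntity:
    provider: str
    model: str
    strategy: AgentStrategy
    max_iteration: int = 5
    tools: list[AgentToolEntity] = field(default_factory=list)
    prompt_template: str = ""  # simplified: the documented type is PromptTemplate

    def __post_init__(self) -> None:
        # Enforce the documented 1-99 range (validation placement is an assumption).
        if not 1 <= self.max_iteration <= 99:
            raise ValueError("max_iteration must be between 1 and 99")


entity = AgentEntity(
    provider="openai",
    model="gpt-4",
    strategy=AgentStrategy.FUNCTION_CALL,
    tools=[AgentToolEntity("builtin", "google", "google_search")],
)
```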

3.2 AgentScratchpadUnit (Reasoning Unit)

3.2.1 Class Diagram

classDiagram
    class AgentScratchpadUnit {
        <<abstract>>
        +str id
        +int position
    }
    
    class AgentThought {
        +str thought
        +str action
        +dict action_input
    }
    
    class AgentObservation {
        +str tool_name
        +str tool_input
        +str tool_output
    }
    
    class AgentAnswer {
        +str answer
    }
    
    AgentScratchpadUnit <|-- AgentThought
    AgentScratchpadUnit <|-- AgentObservation
    AgentScratchpadUnit <|-- AgentAnswer

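3.2.2 Field Descriptions

AgentThought (thought):

Field	Type	Description
`thought`	str	Reasoning text
`action`	str	Action to perform (tool name)
`action_input`	dict	Action parameters

AgentObservation (observation):

Field	Type	Description
`tool_name`	str	Tool name
`tool_input`	str	Tool input
`tool_output`	str	Tool output

AgentAnswer (answer):

Field	Type	Description
`answer`	str	Final answer text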
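
To make the class hierarchy concrete, here is a minimal sketch that models the three unit types as dataclasses and replays them in `position` order. The default values and the `describe` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentScratchpadUnit:
    id: str
    position: int


@dataclass
class AgentThought(AgentScratchpadUnit):
    thought: str = ""
    action: str = ""
    action_input: dict = field(default_factory=dict)


@dataclass
class AgentObservation(AgentScratchpadUnit):
    tool_name: str = ""
    tool_input: str = ""
    tool_output: str = ""


@dataclass
class AgentAnswer(AgentScratchpadUnit):
    answer: str = ""


def describe(unit: AgentScratchpadUnit) -> str:
    """One-line summary chosen by the concrete unit type (hypothetical helper)."""
    if isinstance(unit, AgentThought):
        return f"[{unit.position}] thought -> {unit.action}"
    if isinstance(unit, AgentObservation):
        return f"[{unit.position}] observation <- {unit.tool_name}"
    if isinstance(unit, AgentAnswer):
        return f"[{unit.position}] answer"
    raise TypeError(f"unknown unit type: {type(unit).__name__}")


units = [
    AgentObservation(id="b", position=2, tool_name="calculator", tool_output="5"),
    AgentThought(id="a", position=1, thought="add the numbers", action="calculator"),
    AgentAnswer(id="c", position=3, answer="5"),
]
timeline = [describe(u) for u in sorted(units, key=lambda u: u.position)]
print("\n".join(timeline))
```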

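4. Detailed API Specification

4.1 CotAgentRunner.run()

4.1.1 Basic Information

Name: CotAgentRunner.run()
Purpose: runs the CoT Agent reasoning flow
Used by: Agent Chat applications, the Workflow Agent node

4.1.2 Method Signature

from typing import Generator

def run(
    self,
    query: str,
    inputs: dict | None = None,
) -> Generator[AgentThought | AgentObservation | AgentAnswer, None, None]:
    """Run the CoT Agent reasoning flow."""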

4.1.3 Core Implementation

def run(self, query: str, inputs: dict | None = None):
    # 1. Initialization
    iteration = 0
    scratchpad = AgentScratchpad()
    
    # 2. Reasoning loop
    while iteration < self.max_iteration:
        # Build the prompt
        prompt = self._build_prompt(query, scratchpad)
        
        # Call the LLM
        response = self.llm_client.invoke(prompt)
        
        # Parse the output
        parsed = self.output_parser.parse(response)
        
        # Emit the reasoning step
        thought = AgentThought(
            thought=parsed.thought,
            action=parsed.action,
            action_input=parsed.action_input,
        )
        yield thought
        
        # "Final Answer" ends the loop; any other action is a tool call
        if parsed.action == "Final Answer":
            yield AgentAnswer(answer=parsed.action_input)
            break
        
        # Keep the Thought/Action in the scratchpad so the next prompt sees it
        scratchpad.add(thought)
        
        # Execute the tool call
        tool_result = self.tool_engine.invoke(
            tool_name=parsed.action,
            tool_input=parsed.action_input,
        )
        
        # Record the observation
        observation = AgentObservation(
            tool_name=parsed.action,
            tool_input=str(parsed.action_input),
            tool_output=tool_result,
        )
        scratchpad.add(observation)
        yield observation
        
        iteration += 1
    
    # The iteration budget ran out without a final answer
    if iteration >= self.max_iteration:
        yield AgentAnswer(answer="Sorry, I could not complete the task within the allowed number of steps.")

4.1.4 Call Chain Analysis

Call path:

AgentChatApp.run()
  ├─> AgentRunnerFactory.create()
  │    └─> CotAgentRunner.__init__()
  │         ├─> _init_tools()
  │         └─> _init_memory()
  └─> CotAgentRunner.run()
       ├─> _build_prompt()
       ├─> llm_client.invoke()
       ├─> output_parser.parse()
       └─> tool_engine.invoke()

Caller: AgentChatApp.run()

class AgentChatApp:
    def run(self, query: str, conversation_id: str | None = None):
        # Resolve the conversation (hypothetical helper; the original snippet
        # used `conversation` without showing where it comes from)
        conversation = self._load_conversation(conversation_id)
        
        # Create the Agent Runner
        agent_runner = AgentRunnerFactory.create(
            tenant_id=self.tenant_id,
            app_config=self.app_config,
            model_config=self.model_config,
            agent_config=self.agent_config,
            conversation=conversation,
        )
        
        # Run the Agent
        for event in agent_runner.run(query=query):
            # Convert to an App event
            app_event = self._convert_to_app_event(event)
            yield app_event

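4.2 FunctionCallAgentRunner.run()

4.2.1 Basic Information

Name: FunctionCallAgentRunner.run()
Purpose: runs the Function Call Agent reasoning flow
Used by: LLMs that support Function Calling

4.2.2 Core Implementation

import json

def run(self, query: str, inputs: dict | None = None):
    # 1. Initialization
    iteration = 0
    messages = self._build_messages(query)
    
    # 2. Reasoning loop
    while iteration < self.max_iteration:
        # Call the LLM with the tool definitions
        response = self.llm_client.invoke(
            messages=messages,
            tools=self._get_tool_schemas(),
            tool_choice="auto",
        )
        
        # Check whether the model requested tool calls
        if response.tool_calls:
            # Execute every requested call. The model may request several in
            # one response; this loop runs them one after another.
            tool_results = []
            for tool_call in response.tool_calls:
                arguments = json.loads(tool_call.function.arguments)
                result = self.tool_engine.invoke(
                    tool_name=tool_call.function.name,
                    tool_input=arguments,
                )
                tool_results.append({
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "content": result,
                })
                
                # Emit the tool call (action_input is the parsed dict)
                yield AgentThought(
                    action=tool_call.function.name,
                    action_input=arguments,
                )
            
            # Append the results to the message list. OpenAI-style chat APIs
            # also expect the assistant message that carried tool_calls to
            # precede these tool messages.
            messages.extend(tool_results)
            iteration += 1
        else:
            # Return the final answer
            yield AgentAnswer(answer=response.content)
            break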

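4.2.3 Key Differences

Feature	CoT Agent	Function Call Agent
Prompt	Contains the ReAct template	Standard chat format
Tool definition	Text description	JSON Schema
Tool calls	One at a time	Several per response
Output format	Thought/Action text that must be parsed	Structured tool_calls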
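
The difference in tool definitions can be shown by rendering one tool both ways. This is an illustrative sketch: the `tool` dictionary and both helper functions are made up for the example, and the schema layout follows the `{"type": "function", "function": {...}}` shape used by OpenAI-style chat APIs.

```python
import json

tool = {
    "name": "calculator",
    "description": "Perform calculations",
    "parameters": {"expression": "string"},
}


def as_react_text(t: dict) -> str:
    """Plain-text description embedded in a ReAct prompt."""
    return f"- {t['name']}: {t['description']}\n  Parameters: {json.dumps(t['parameters'])}"


def as_function_schema(t: dict) -> dict:
    """JSON Schema tool definition passed alongside the messages."""
    return {
        "type": "function",
        "function": {
            "name": t["name"],
            "description": t["description"],
            "parameters": {
                "type": "object",
                "properties": {k: {"type": v} for k, v in t["parameters"].items()},
                "required": list(t["parameters"]),
            },
        },
    }


print(as_react_text(tool))
print(json.dumps(as_function_schema(tool), indent=2))
```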

5. Execution Flow and Sequence

5.1 CoT Agent Execution Flow

5.1.1 Sequence Diagram

sequenceDiagram
    autonumber
    participant U as User
    participant App as AgentChatApp
    participant CoT as CotAgentRunner
    participant LLM as LLM Client
    participant Parser as OutputParser
    participant TE as ToolEngine
    participant Tool as Tool
    
    U->>App: Send query
    App->>CoT: run(query)
    
    loop Reasoning loop (max_iteration)
        CoT->>CoT: _build_prompt()
        CoT->>LLM: invoke(prompt)
        LLM-->>CoT: Response text
        
        CoT->>Parser: parse(response)
        Parser-->>CoT: {thought, action, action_input}
        
        CoT-->>App: yield AgentThought
        
        alt Final answer
            CoT-->>App: yield AgentAnswer
        else Tool call
            CoT->>TE: invoke(tool_name, tool_input)
            TE->>Tool: Execute tool
            Tool-->>TE: Tool result
            TE-->>CoT: formatted_result
            
            CoT->>CoT: Record Observation
            CoT-->>App: yield AgentObservation
        end
    end
    
    App-->>U: Return result

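5.1.2 Flow Description

Phase 1: Initialization (steps 1-2)

The user sends a query
A CotAgentRunner instance is created and run

Phase 2: Reasoning loop (steps 3-15)

Build the prompt (including history and Scratchpad)
Call the LLM to generate a response
Parse the output (Thought, Action, Action Input)
Emit the reasoning step
Check whether it is the final answer
If not, execute the tool call
Record the Observation
Continue with the next iteration

Phase 3: Termination (step 16)

Return the final answer, or stop because the maximum number of iterations was reached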

5.2 Function Call Agent Execution Flow

5.2.1 Sequence Diagram

sequenceDiagram
    autonumber
    participant App as AgentChatApp
    participant FC as FunctionCallRunner
    participant LLM as LLM Client
    participant TE as ToolEngine
    
    App->>FC: run(query)
    FC->>FC: _build_messages()
    
    loop Reasoning loop
        FC->>LLM: invoke(messages, tools)
        LLM-->>FC: response with tool_calls
        
        alt Has tool calls
            par Execute tools in parallel
                FC->>TE: invoke(tool1)
                TE-->>FC: result1
            and
                FC->>TE: invoke(tool2)
                TE-->>FC: result2
            end
            
            FC->>FC: Append tool results to messages
            FC-->>App: yield AgentThought
        else No tool calls
            FC-->>App: yield AgentAnswer
        end
    end
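
The `par` block in the diagram could be realized with a thread pool. The sketch below is one possible approach under stated assumptions, not a description of how Dify schedules tools: the `TOOLS` registry, the simplified call dictionaries, and both functions are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tool registry for the example.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "tip": lambda amount, percent: str(amount * percent / 100),
}


def run_one(call: dict) -> dict:
    """Execute one tool call and wrap the result as a `tool` role message."""
    func = TOOLS[call["name"]]
    result = func(**json.loads(call["arguments"]))
    return {"tool_call_id": call["id"], "role": "tool", "content": result}


def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Run all tool calls of one LLM response concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        # map() returns results in input order, so each result stays
        # aligned with its tool_call_id.
        return list(pool.map(run_one, tool_calls))


calls = [
    {"id": "call_1", "name": "get_weather", "arguments": '{"city": "San Francisco"}'},
    {"id": "call_2", "name": "tip", "arguments": '{"amount": 80, "percent": 25}'},
]
messages = run_parallel(calls)
print(messages)
```

Threads suit this case because tool calls are typically I/O-bound (HTTP requests, retrieval), so they overlap well despite the GIL.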

6. In-Depth Analysis of Key Functions

6.1 Prompt Construction

6.1.1 Function Description

Prompt construction is the foundation of Agent reasoning and directly affects reasoning quality.

CoT Agent prompt structure:

[System prompt]
You are a helpful assistant with access to tools.

Available tools:
- google_search: Search Google for information
  Parameters: {"query": "string"}
- calculator: Perform calculations
  Parameters: {"expression": "string"}

[Reasoning format]
Use the following format:

Thought: your reasoning about what to do
Action: tool name to use
Action Input: tool parameters in JSON
Observation: tool result

[Conversation history]
User: ...
Assistant: ...

[Current Scratchpad]
Thought: ...
Action: ...
Observation: ...

[Current query]
User: {query}

6.1.2 Core Code

def _build_prompt(self, query: str, scratchpad: AgentScratchpad):
    # 1. System prompt
    system_prompt = self._get_system_prompt()
    
    # 2. Tool descriptions
    tools_desc = self._format_tools_description()
    
    # 3. Reasoning format instructions
    format_instruction = self._get_format_instruction()
    
    # 4. Conversation history
    history = self.memory.get_history()
    
    # 5. Current Scratchpad
    scratchpad_text = self._format_scratchpad(scratchpad)
    
    # 6. Assemble the prompt
    prompt = f"{system_prompt}\n\n{tools_desc}\n\n{format_instruction}\n\n{history}\n\n{scratchpad_text}\n\nUser: {query}"
    
    return prompt

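6.2 Output Parsing

6.2.1 Function Description

Parses the LLM output and extracts Thought, Action, and Action Input.

Parsing logic:

import json
import re
from dataclasses import dataclass


@dataclass
class ParsedOutput:
    # Assumed shape of the parse result, inferred from how it is used
    thought: str
    action: str
    action_input: dict


class CotOutputParser:
    def parse(self, text: str) -> ParsedOutput:
        # Extract each section with a regular expression
        thought_match = re.search(r"Thought:\s*(.+?)(?=\nAction:|$)", text, re.DOTALL)
        # The action is the rest of the "Action:" line
        action_match = re.search(r"Action:[ \t]*(.+)", text)
        # Capture up to the next Observation (or end of text) so that
        # multi-line JSON is not cut off at the first newline
        action_input_match = re.search(r"Action Input:\s*(.+?)(?=\nObservation:|$)", text, re.DOTALL)
        
        # Pull out the captured text
        thought = thought_match.group(1).strip() if thought_match else ""
        action = action_match.group(1).strip() if action_match else ""
        action_input_str = action_input_match.group(1).strip() if action_input_match else "{}"
        
        # Parse the JSON parameters; fall back to wrapping the raw text
        try:
            action_input = json.loads(action_input_str)
        except json.JSONDecodeError:
            action_input = {"input": action_input_str}
        
        return ParsedOutput(
            thought=thought,
            action=action,
            action_input=action_input,
        )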

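6.3 Token Management

6.3.1 Function Description

Controls the prompt length so that it does not exceed the model's limit.

TokenBufferMemory implementation:

class TokenBufferMemory:
    def __init__(self, model_context_window: int):
        self.messages = []
        self.max_tokens = model_context_window - 1000  # reserve headroom
    
    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()
    
    def _count_tokens(self, text: str) -> int:
        # Placeholder estimate (an assumption, not the real tokenizer);
        # replace with the model's own token counter
        return max(1, len(text) // 4)
    
    def _trim_if_needed(self):
        # Count the tokens currently held
        total_tokens = sum(self._count_tokens(m["content"]) for m in self.messages)
        
        # Over the limit: drop the oldest messages, keeping the system message
        while total_tokens > self.max_tokens and len(self.messages) > 2:
            # Index 0 is assumed to be the system message, so remove index 1
            removed = self.messages.pop(1)
            total_tokens -= self._count_tokens(removed["content"])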

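7. Practical Cases and Best Practices

7.1 Case 1: Building a Multi-Tool Agent Assistant

7.1.1 Scenario

Build an Agent assistant that supports several tools, such as search, calculation, and weather lookup.

7.1.2 Configuration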

agent_config:
  provider: "openai"
  model: "gpt-4"
  strategy: "function_call"
  max_iteration: 5
  tools:
    - provider_type: "builtin"
      provider_id: "google"
      tool_name: "google_search"
    - provider_type: "builtin"
      provider_id: "calculator"
      tool_name: "calculate"
    - provider_type: "builtin"
      provider_id: "weather"
      tool_name: "get_weather"
  prompt_template: |
    You are a helpful assistant with access to tools.
    Use tools when necessary to answer user questions.

7.1.3 Usage Example

# Create the Agent application
app = AgentChatApp(app_config, model_config, agent_config)

# Run a query
query = "What's the weather in San Francisco and calculate 25% tip on $80?"

for event in app.run(query):
    if isinstance(event, AgentThought):
        print(f"Thought: {event.thought}")
        print(f"Action: {event.action}({event.action_input})")
    elif isinstance(event, AgentObservation):
        print(f"Observation: {event.tool_output}")
    elif isinstance(event, AgentAnswer):
        print(f"Answer: {event.answer}")

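7.2 Case 2: Knowledge-Base Agent

7.2.1 Scenario

Build an Agent that can access the company knowledge base.

7.2.2 Configuration

agent_config:
  strategy: "react"
  max_iteration: 3
  tools:
    - provider_type: "dataset_retrieval"
      dataset_ids: ["kb-001", "kb-002"]
      retrieval_config:
        top_k: 3
        score_threshold: 0.7

7.2.3 Sample Result

User: What is the company's annual leave policy?

Thought: I need to query the company knowledge base for the annual leave policy
Action: dataset_retrieval
Action Input: {"query": "annual leave policy"}
Observation: Found 3 relevant documents:
1. Employees are entitled to 15 days of paid annual leave...
2. Annual leave must be requested in advance...
3. Unused annual leave can be carried over to the next year...

Thought: I have the relevant information and can answer
Action: Final Answer
Answer: According to company policy, employees are entitled to 15 days of paid annual leave...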

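7.3 Best Practices Summary

7.3.1 Choosing a Mode

When to use the CoT Agent:

A detailed reasoning trace is needed
The model does not support Function Calling
The task is complex and has multiple steps

When to use the Function Call Agent:

The model supports Function Calling
High performance is required
The work is mostly tool calls

7.3.2 Parameter Configuration

Setting max_iteration:

Simple tasks: 3-5
Complex tasks: 5-10
Needing more than 10 usually means the task is poorly defined

Number of tools:

Recommended: 3-10 tools
Too many tools make it harder for the LLM to choose
Grouping tools can help

7.3.3 Performance Optimization

1. Prompt optimization

Keep the system prompt concise
Write clear tool descriptions
Provide examples

2. Tool optimization

Name tools clearly
Document parameters in detail
Return structured results

3. Memory management

Set max_tokens sensibly
Clear irrelevant history regularly
Keep the key context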

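Appendix

A. Agent Type Comparison

Dimension	CoT Agent	Function Call Agent
Implementation	ReAct framework	LLM-native Function Calling
Model requirement	Any LLM	Native Function Calling support (e.g. GPT-3.5-turbo-1106+)
Reasoning visibility	High	Medium
Parallel tools	No	Yes
Performance	Slower	Faster
Accuracy	High	High

B. Frequently Asked Questions

Q1: The Agent keeps calling the same tool in a loop?

Check that the tool's return format is clear
Show more of the Scratchpad in the prompt
Improve the system prompt
Lower max_iteration

Q2: The Agent answers directly without calling tools?

Check that the tool descriptions are clear
Emphasize tool use in the system prompt
Provide tool usage examples

Q3: How can the Agent's response time be improved?

Use Function Call mode
Reduce the number of tools
Trim the history
Use a faster model

Document version: v1.0
Generated: 2025-10-04
Maintainer: Backend Team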
