GraphRAG-04: API and CLI Interfaces

1. API Layer

1.1 Indexing API

build_index

Function signature:

async def build_index(
    config: GraphRagConfig,
    method: IndexingMethod | str = IndexingMethod.Standard,
    is_update_run: bool = False,
    memory_profile: bool = False,
    callbacks: list[WorkflowCallbacks] | None = None,
    additional_context: dict[str, Any] | None = None,
    verbose: bool = False,
    input_documents: pd.DataFrame | None = None,
) -> list[PipelineRunResult]

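Parameters:

Parameter	Type	Default	Description
config	GraphRagConfig	required	Configuration object
method	IndexingMethod | str	Standard	Indexing method: standard, fast, standard-update, fast-update
is_update_run	bool	False	Whether this is an incremental update run
memory_profile	bool	False	Enable memory profiling
callbacks	list[WorkflowCallbacks]	None	List of workflow callbacks
additional_context	dict	None	Extra context passed into the workflow state
verbose	bool	False	Verbose logging
input_documents	pd.DataFrame	None	Supply a documents DataFrame directly (skips input loading)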

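Return value (one PipelineRunResult per executed workflow):

[
    PipelineRunResult(
        workflow="load_input_documents",
        result={...},
        errors=[],
    ),
    ...
]

Usage example:

import asyncio
from pathlib import Path

from graphrag.api import build_index
from graphrag.config.load_config import load_config

async def main():
    # load_config takes the project root as a pathlib.Path
    config = load_config(Path("./my_project"))
    results = await build_index(
        config=config,
        method="standard",
        verbose=True,
    )
    # Each PipelineRunResult describes one workflow; a non-empty errors list means it failed
    for result in results:
        status = "failed" if result.errors else "ok"
        print(f"{result.workflow}: {status}")
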
asyncio.run(main())
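Each PipelineRunResult covers only one workflow. A caller usually wants a single pass/fail decision for the whole run. The sketch below derives that decision from the errors field alone. FakeRunResult, summarize_runs, and raise_if_failed are hypothetical names. The stand-in class exists only so the example runs without graphrag installed.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeRunResult:
    """Hypothetical stand-in exposing the two PipelineRunResult fields used below."""

    workflow: str
    errors: list[BaseException] | None = None


def summarize_runs(results: Iterable[Any]) -> dict[str, list[str]]:
    """Group workflow names by outcome: a truthy `errors` value counts as a failure."""
    summary: dict[str, list[str]] = {"ok": [], "failed": []}
    for result in results:
        # errors may be None or an empty list when the workflow succeeded
        summary["failed" if result.errors else "ok"].append(result.workflow)
    return summary


def raise_if_failed(results: Iterable[Any]) -> None:
    """Raise RuntimeError naming every failed workflow, if there is any."""
    failed = summarize_runs(results)["failed"]
    if failed:
        raise RuntimeError(f"indexing failed in: {', '.join(failed)}")
```

In the build_index example, call raise_if_failed(results) right after the await. A failed workflow then stops a script or CI job instead of only printing a status line.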

1.2 Query API

local_search

Function signature:

async def local_search(
    config: GraphRagConfig,
    entities: pd.DataFrame,
    communities: pd.DataFrame,
    community_reports: pd.DataFrame,
    text_units: pd.DataFrame,
    relationships: pd.DataFrame,
    covariates: pd.DataFrame | None,
    community_level: int,
    response_type: str,
    query: str,
    callbacks: list[QueryCallbacks] | None = None,
    verbose: bool = False,
) -> tuple[str, dict]

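Parameters:

Parameter	Type	Description
config	GraphRagConfig	Configuration object
entities	pd.DataFrame	Entity table (entities.parquet)
communities	pd.DataFrame	Community table (communities.parquet)
community_reports	pd.DataFrame	Community report table (community_reports.parquet)
text_units	pd.DataFrame	Text unit table (text_units.parquet)
relationships	pd.DataFrame	Relationship table (relationships.parquet)
covariates	pd.DataFrame | None	Covariate (claim) table; pass None if covariates were not extracted
community_level	int	Community hierarchy level to use (0 to N)
response_type	str	Desired response format (e.g. "Multiple Paragraphs")
query	str	User query
callbacks	list[QueryCallbacks]	Query callbacks (optional)
verbose	bool	Verbose logging

Return value: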

(
    "Answer text...",  # answer generated by the LLM
    {  # context data used to build the answer
        "entities": [...],
        "relationships": [...],
        "text_units": [...],
        "reports": [...],
    }
)

Usage example:

import asyncio
from pathlib import Path

import pandas as pd
from graphrag.api import local_search
from graphrag.config.load_config import load_config

async def main():
    config = load_config(Path("./my_project"))

    # Load the index output tables
    output_dir = "./my_project/output"
    entities = pd.read_parquet(f"{output_dir}/entities.parquet")
    communities = pd.read_parquet(f"{output_dir}/communities.parquet")
    community_reports = pd.read_parquet(f"{output_dir}/community_reports.parquet")
    text_units = pd.read_parquet(f"{output_dir}/text_units.parquet")
    relationships = pd.read_parquet(f"{output_dir}/relationships.parquet")

    # Run the query (covariates=None: no claim table is loaded here)
    response, context = await local_search(
        config=config,
        entities=entities,
        communities=communities,
        community_reports=community_reports,
        text_units=text_units,
        relationships=relationships,
        covariates=None,
        community_level=2,
        response_type="Multiple Paragraphs",
        query="What are the key partnerships?",
    )

    print("Answer:", response)
    print("Context entities:", len(context["entities"]))

asyncio.run(main())
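Loading the five required tables by hand fails late, and with an unhelpful error, if one file is missing. Below is a minimal sketch of an up-front check. locate_index_tables is a hypothetical helper. It assumes that the optional covariate table, when present, is written as covariates.parquet next to the other tables.

```python
from __future__ import annotations

from pathlib import Path

# Table names from the parameter table above; covariates is optional
REQUIRED_TABLES = ("entities", "communities", "community_reports", "text_units", "relationships")
OPTIONAL_TABLES = ("covariates",)


def locate_index_tables(output_dir: str | Path) -> dict[str, Path | None]:
    """Map each table name to its parquet path; optional tables map to None when absent."""
    root = Path(output_dir)
    paths: dict[str, Path | None] = {}
    missing: list[str] = []
    for name in REQUIRED_TABLES:
        path = root / f"{name}.parquet"
        if path.is_file():
            paths[name] = path
        else:
            missing.append(path.name)
    if missing:
        # Report every missing file at once instead of failing on the first read
        raise FileNotFoundError(f"missing index tables in {root}: {', '.join(missing)}")
    for name in OPTIONAL_TABLES:
        path = root / f"{name}.parquet"
        paths[name] = path if path.is_file() else None
    return paths
```

The result plugs into the example above, e.g. `{name: pd.read_parquet(path) if path else None for name, path in locate_index_tables(output_dir).items()}`. That expression also produces a ready-made covariates argument.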

global_search

Function signature:

async def global_search(
    config: GraphRagConfig,
    entities: pd.DataFrame,
    communities: pd.DataFrame,
    community_reports: pd.DataFrame,
    community_level: int | None,
    dynamic_community_selection: bool,
    response_type: str,
    query: str,
    callbacks: list[QueryCallbacks] | None = None,
    verbose: bool = False,
) -> tuple[str, dict]

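Parameters:

Parameter	Type	Description
community_level	int | None	Community level (None = all levels)
dynamic_community_selection	bool	Dynamic community selection (the LLM rates which community reports are relevant)

The remaining parameters have the same meaning as in local_search; global_search does not take text_units, relationships, or covariates.

Streaming query API

from graphrag.api import global_search_streaming, local_search_streaming

# Streaming local search (inside an async function; same arguments as local_search)
async for chunk in local_search_streaming(...):
    print(chunk, end="", flush=True)

# Streaming global search (same arguments as global_search)
async for chunk in global_search_streaming(...):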
    print(chunk, end="", flush=True)
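The streaming functions are async generators, so the caller decides what to do with each chunk. The sketch below echoes the chunks and also keeps the full answer. fake_stream is a hypothetical stand-in for local_search_streaming, so the code runs without graphrag or an LLM. It assumes chunks are plain strings, as the print loop above implies.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for local_search_streaming: yields the answer word by word."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop, as a network stream would
        yield word + " "


async def collect_stream(chunks: AsyncIterator[str], echo: bool = False) -> str:
    """Consume an async chunk stream, optionally echoing it, and return the full text."""
    parts: list[str] = []
    async for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    answer = asyncio.run(collect_stream(fake_stream("GraphRAG streams partial answers"), echo=True))
    print()
```

With graphrag installed, you would pass local_search_streaming(...) in place of fake_stream(...).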

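1.3 Prompt Tuning API

generate_indexing_prompts

Function signature:

async def generate_indexing_prompts(
    config: GraphRagConfig,
    domain: str | None = None,
    selection_method: DocSelectionType = DocSelectionType.RANDOM,
    limit: int = 15,
    max_tokens: int = 12000,
    chunk_size: int = 200,
    language: str | None = None,
    discover_entity_types: bool = True,
    n_subset_max: int = 300,
    k: int = 15,
) -> tuple[str, str, str]

Parameters:

Parameter	Type	Description
domain	str	Domain description (e.g. "medical research")
selection_method	DocSelectionType	Document selection method: random, top, auto
limit	int	Number of documents to sample (random/top)
max_tokens	int	Maximum number of tokens
chunk_size	int	Text chunk size
language	str	Primary language (e.g. "chinese")
discover_entity_types	bool	Whether to discover entity types automatically
n_subset_max	int	Maximum subset size in auto mode
k	int	Documents selected per cluster in auto mode

Return value: a tuple of three prompt strings (graph extraction, entity description summarization, community report generation). The API does not write files; saving the prompts to a directory is handled by the CLI's --output option.

Usage example:

extract_graph_prompt, summarize_prompt, community_report_prompt = await generate_indexing_prompts(
    config=config,
    domain="financial technology",
    selection_method="auto",
    k=20,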
)
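If your installed version returns the three prompt strings (graph extraction, entity summarization, community report) instead of writing files, a script that calls the API must save them itself. save_prompts is a hypothetical helper. Both the file names and the tuple order are assumptions. Make them match the prompt paths your settings.yaml points at.

```python
from __future__ import annotations

from pathlib import Path

# Assumed file names, in the assumed tuple order; align them with settings.yaml
PROMPT_FILENAMES = (
    "extract_graph.txt",
    "summarize_descriptions.txt",
    "community_report_graph.txt",
)


def save_prompts(prompts: tuple[str, str, str], output_dir: str | Path) -> list[Path]:
    """Write the three generated prompts as UTF-8 files under output_dir and return their paths."""
    if len(prompts) != len(PROMPT_FILENAMES):
        raise ValueError(f"expected {len(PROMPT_FILENAMES)} prompts, got {len(prompts)}")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory tree if needed
    written: list[Path] = []
    for name, text in zip(PROMPT_FILENAMES, prompts):
        path = out / name
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```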

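2. CLI Interface

2.1 Command Overview

Command	Description
`graphrag init`	Initialize a project
`graphrag index`	Build the index
`graphrag update`	Incrementally update the index
`graphrag query`	Run a query
`graphrag prompt-tune`	Generate customized prompts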

2.2 graphrag init

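Purpose: initialize a GraphRAG project by generating the default configuration files and directory structure.

Syntax:

graphrag init [OPTIONS]

Options:

Option	Type	Default	Description
--root, -r	Path	.	Project root directory
--force, -f	Flag	False	Overwrite an existing project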

Examples:

# Initialize in the current directory
graphrag init

# Initialize in a specific directory
graphrag init --root ./my_graphrag_project

# Force overwrite
graphrag init --root ./my_graphrag_project --force
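Before the first index run, a quick check that the project is complete can save a failed start. The sketch below looks for three things:

- settings.yaml
- at least one file under input/
- an API key, from either the environment or .env

preflight and read_env_value are hypothetical helpers. The default variable name GRAPHRAG_API_KEY is an assumption, so pass whatever name your settings.yaml references.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_env_value(env_file: Path, key: str) -> str | None:
    """Return the value assigned to `key` in a simple KEY=VALUE .env file, or None."""
    for line in env_file.read_text(encoding="utf-8").splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def preflight(root: str | Path, api_key_var: str = "GRAPHRAG_API_KEY") -> list[str]:
    """List problems that would block a first `graphrag index` run; empty means none found."""
    root = Path(root)
    problems: list[str] = []
    if not (root / "settings.yaml").is_file():
        problems.append("settings.yaml not found (run graphrag init first)")
    input_dir = root / "input"
    if not input_dir.is_dir() or not any(p.is_file() for p in input_dir.iterdir()):
        problems.append("input/ is missing or contains no documents")
    # The key may come from the process environment (e.g. `export`) or from .env
    value = os.environ.get(api_key_var)
    env_file = root / ".env"
    if not value and env_file.is_file():
        value = read_env_value(env_file, api_key_var)
    # Treat an empty value or an angle-bracket placeholder such as <API_KEY> as unset
    if not value or value.startswith("<"):
        problems.append(f"{api_key_var} is not set in the environment or .env")
    return problems
```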

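Resulting directory structure:

my_graphrag_project/
├── settings.yaml        # configuration file
├── .env                 # environment variables (generated by init; fill in GRAPHRAG_API_KEY)
├── input/               # input documents
├── output/              # index output
├── cache/               # cache
└── prompts/             # prompt templates (customizable)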

2.3 graphrag index

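Purpose: build the knowledge graph index.

Syntax:

graphrag index [OPTIONS]

Options:

Option	Type	Default	Description
--root, -r	Path	.	Project root directory
--config, -c	Path	None	Configuration file path (defaults to settings.yaml in the root)
--method, -m	str	standard	Indexing method: standard, fast
--verbose, -v	Flag	False	Verbose logging
--memprofile	Flag	False	Memory profiling
--dry-run	Flag	False	Validate the configuration only; do not run the pipeline
--cache/--no-cache	Flag	True	Enable or disable the cache
--skip-validation	Flag	False	Skip configuration validation
--output, -o	Path	None	Output directory (overrides the configuration)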

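Examples:

# Basic usage
graphrag index --root ./my_project

# Use a custom configuration file
graphrag index --config ./custom_settings.yaml

# Fast mode (NLP-based graph extraction instead of LLM extraction; community reports still use the LLM)
graphrag index --method fast

# Verbose logging
graphrag index --verbose

# Disable the cache (recompute everything)
graphrag index --no-cache

# Validate the configuration only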
graphrag index --dry-run
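When indexing is driven from Python (a scheduler, a notebook, a test), building the argument list explicitly avoids shell-quoting mistakes. build_index_command is a hypothetical helper. It emits only flags from the option table above and validates --method against standard and fast.

```python
from __future__ import annotations

import shlex
from pathlib import Path

INDEX_METHODS = ("standard", "fast")


def build_index_command(
    root: str | Path = ".",
    *,
    method: str = "standard",
    config: str | Path | None = None,
    verbose: bool = False,
    cache: bool = True,
    dry_run: bool = False,
) -> list[str]:
    """Assemble an argv list for `graphrag index` from the documented flags."""
    if method not in INDEX_METHODS:
        raise ValueError(f"method must be one of {INDEX_METHODS}, got {method!r}")
    argv = ["graphrag", "index", "--root", str(root), "--method", method]
    if config is not None:
        argv += ["--config", str(config)]
    if verbose:
        argv.append("--verbose")
    if not cache:
        argv.append("--no-cache")  # only emitted when caching is turned off
    if dry_run:
        argv.append("--dry-run")
    return argv


if __name__ == "__main__":
    # Print the shell-quoted form; pass the list itself to subprocess.run to execute it
    print(shlex.join(build_index_command("./my_project", method="fast", cache=False)))
```

To execute the command, pass the returned list to subprocess.run(argv, check=True). Keeping construction separate from execution makes the command easy to log and to unit-test.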

2.4 graphrag update

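Purpose: incrementally update an existing index.

Syntax:

graphrag update [OPTIONS]

Options: largely the same as graphrag index. An update run uses the incremental variant of the chosen method: the default standard runs as standard-update, and --method fast runs as fast-update.

Examples:

# Incremental update
graphrag update --root ./my_project

# Incremental update in fast mode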
graphrag update --method fast --root ./my_project

2.5 graphrag query

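Purpose: query the knowledge graph.

Syntax:

graphrag query [OPTIONS]

Options:

Option	Type	Default	Description
--method, -m	str	required	Query method: local, global, drift, basic
--query, -q	str	required	Query text
--root, -r	Path	.	Project root directory
--config, -c	Path	None	Configuration file path
--data, -d	Path	None	Index data directory (defaults to output/)
--community-level	int	2	Community level
--dynamic-community-selection	Flag	False	Dynamic community selection (global only)
--response-type	str	Multiple Paragraphs	Response type
--streaming	Flag	False	Stream the output
--verbose, -v	Flag	False	Verbose logging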

Examples:

# Local search
graphrag query \
  --method local \
  --query "What is the relationship between Microsoft and OpenAI?" \
  --root ./my_project

# Global search
graphrag query \
  --method global \
  --query "What are the main trends in AI?" \
  --root ./my_project

# Dynamic community selection
graphrag query \
  --method global \
  --query "Explain the AI ecosystem" \
  --dynamic-community-selection \
  --root ./my_project

# Streaming output
graphrag query \
  --method local \
  --query "Summarize the key findings" \
  --streaming \
  --root ./my_project

# Custom response type
graphrag query \
  --method local \
  --query "List the top 5 entities" \
  --response-type "Bullet List" \
  --root ./my_project
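The same approach works for queries, where some option combinations are easy to get wrong. For example, --dynamic-community-selection only applies to global search. build_query_command is a hypothetical helper. It validates --method against the four documented methods and leaves unset options to the CLI defaults.

```python
from __future__ import annotations

import shlex
from pathlib import Path

QUERY_METHODS = ("local", "global", "drift", "basic")


def build_query_command(
    query: str,
    method: str,
    root: str | Path = ".",
    *,
    community_level: int | None = None,
    dynamic_community_selection: bool = False,
    response_type: str | None = None,
    streaming: bool = False,
) -> list[str]:
    """Assemble an argv list for `graphrag query`; unset options fall back to the CLI defaults."""
    if method not in QUERY_METHODS:
        raise ValueError(f"method must be one of {QUERY_METHODS}, got {method!r}")
    if dynamic_community_selection and method != "global":
        raise ValueError("--dynamic-community-selection only applies to --method global")
    argv = ["graphrag", "query", "--method", method, "--query", query, "--root", str(root)]
    if community_level is not None:
        argv += ["--community-level", str(community_level)]
    if dynamic_community_selection:
        argv.append("--dynamic-community-selection")
    if response_type is not None:
        argv += ["--response-type", response_type]
    if streaming:
        argv.append("--streaming")
    return argv


if __name__ == "__main__":
    # shlex.join shows the shell-quoted form, handy for logging the exact command
    print(shlex.join(build_query_command("What are the main trends in AI?", "global", "./my_project")))
```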

2.6 graphrag prompt-tune

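Purpose: automatically generate prompts tailored to your domain data.

Syntax:

graphrag prompt-tune [OPTIONS]

Options:

Option	Type	Default	Description
--root, -r	Path	.	Project root directory
--config, -c	Path	None	Configuration file path
--domain	str	None	Domain description (e.g. "medical research")
--selection-method	str	random	Document selection method: random, top, auto
--n-subset-max	int	300	Maximum subset size in auto mode
--k	int	15	Documents selected per cluster in auto mode
--limit	int	15	Number of documents in random/top mode
--max-tokens	int	12000	Maximum number of tokens
--chunk-size	int	200	Text chunk size
--overlap	int	100	Overlap between text chunks
--language	str	None	Primary language
--discover-entity-types	Flag	True	Discover entity types automatically (disable with --no-discover-entity-types)
--output, -o	Path	prompts	Output directory
--verbose, -v	Flag	False	Verbose logging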

Examples:

# Basic usage (random document selection)
graphrag prompt-tune --root ./my_project

# Specify the domain
graphrag prompt-tune \
  --domain "financial technology" \
  --root ./my_project

# Use auto mode (cluster-based selection)
graphrag prompt-tune \
  --selection-method auto \
  --n-subset-max 500 \
  --k 20 \
  --root ./my_project

# Specify the language
graphrag prompt-tune \
  --language chinese \
  --root ./my_project

# Custom output directory
graphrag prompt-tune \
  --output ./custom_prompts \
  --root ./my_project
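The option table marks --limit as a random/top setting, and --n-subset-max and --k as auto settings. The hypothetical builder below rejects mismatched combinations instead of letting them be silently ignored. That strictness is a choice of this sketch, not documented CLI behavior.

```python
from __future__ import annotations

from pathlib import Path

SELECTION_METHODS = ("random", "top", "auto")


def build_prompt_tune_command(
    root: str | Path = ".",
    *,
    selection_method: str = "random",
    domain: str | None = None,
    language: str | None = None,
    limit: int | None = None,
    n_subset_max: int | None = None,
    k: int | None = None,
    output: str | Path | None = None,
) -> list[str]:
    """Assemble an argv list for `graphrag prompt-tune`, rejecting mode-mismatched options."""
    if selection_method not in SELECTION_METHODS:
        raise ValueError(f"selection_method must be one of {SELECTION_METHODS}")
    if selection_method == "auto" and limit is not None:
        raise ValueError("--limit applies to random/top; use --n-subset-max and --k with auto")
    if selection_method != "auto" and (n_subset_max is not None or k is not None):
        raise ValueError("--n-subset-max and --k only apply to --selection-method auto")
    argv = ["graphrag", "prompt-tune", "--root", str(root), "--selection-method", selection_method]
    optional = (
        ("--domain", domain),
        ("--language", language),
        ("--limit", limit),
        ("--n-subset-max", n_subset_max),
        ("--k", k),
        ("--output", output),
    )
    for flag, value in optional:
        if value is not None:  # unset options fall back to the CLI defaults
            argv += [flag, str(value)]
    return argv
```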

3. Callbacks

3.1 WorkflowCallbacks

Callback interface for the indexing pipeline. NoopWorkflowCallbacks implements every hook as a no-op, so a subclass only overrides the hooks it needs:

class WorkflowCallbacks:
    def pipeline_start(self, names: list[str]) -> None:
        """Called when the pipeline starts, with the names of the workflows to run."""

    def pipeline_end(self, results: list[PipelineRunResult]) -> None:
        """Called when the pipeline ends."""

    def workflow_start(self, name: str, instance: object) -> None:
        """Called when a workflow starts."""

    def workflow_end(self, name: str, instance: object) -> None:
        """Called when a workflow ends."""

    def progress(self, progress: Progress) -> None:
        """Called on progress updates."""

Errors from individual workflows are reported in the errors field of each PipelineRunResult returned by build_index.

Usage example:

from graphrag.api import build_index
from graphrag.callbacks.noop_workflow_callbacks import NoopWorkflowCallbacks

class MyCallbacks(NoopWorkflowCallbacks):
    def workflow_start(self, name: str, instance: object) -> None:
        print(f"Started: {name}")

    def workflow_end(self, name: str, instance: object) -> None:
        print(f"Finished: {name}")

# Register the callbacks (inside an async function)
await build_index(
    config=config,
    callbacks=[MyCallbacks()],
)
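Callbacks are a convenient place to recover per-workflow timings. The sketch below records wall-clock seconds between the start and end hooks. It is a plain class, so it runs without graphrag installed. The hook names workflow_start and workflow_end are assumed from recent graphrag releases; verify them against the WorkflowCallbacks class in your installed version.

```python
from __future__ import annotations

import time
from collections.abc import Callable


class WorkflowTimer:
    """Record wall-clock seconds per workflow from start/end hooks (duck-typed sketch)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._started: dict[str, float] = {}
        self.durations: dict[str, float] = {}

    # Hook names assumed from recent graphrag releases; rename them if yours differ
    def workflow_start(self, name: str, instance: object) -> None:
        self._started[name] = self._clock()

    def workflow_end(self, name: str, instance: object) -> None:
        start = self._started.pop(name, None)
        if start is not None:  # ignore an end without a matching start
            self.durations[name] = self._clock() - start

    def slowest(self, n: int = 3) -> list[tuple[str, float]]:
        """Return the n longest-running workflows, slowest first."""
        return sorted(self.durations.items(), key=lambda item: item[1], reverse=True)[:n]
```

To use it in a project:

- Make the class subclass graphrag's NoopWorkflowCallbacks (assumed module graphrag.callbacks.noop_workflow_callbacks), so the hooks it does not define stay no-ops.
- Pass an instance to build_index via callbacks and keep a reference to it.
- Call slowest() after the run to see where indexing time goes.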

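3.2 QueryCallbacks

Callback interface for the query process:

class QueryCallbacks:
    def on_context(self, context: Any) -> None:
        """Called once the search context has been built."""

    def on_map_response_start(self, map_response_contexts: list[str]) -> None:
        """Called when the map stage starts (global search)."""

    def on_map_response_end(self, map_response_outputs: list[SearchResult]) -> None:
        """Called when the map stage ends (global search)."""

    def on_reduce_response_start(self, reduce_response_context: str | dict[str, Any]) -> None:
        """Called when the reduce stage starts (global search)."""

    def on_reduce_response_end(self, reduce_response_output: str) -> None:
        """Called when the reduce stage ends (global search)."""

    def on_llm_new_token(self, token: str) -> None:
        """Called for each token the LLM streams while generating the answer."""

Usage example:

from typing import Any

from graphrag.api import local_search
from graphrag.callbacks.query_callbacks import QueryCallbacks

class MyQueryCallbacks(QueryCallbacks):
    def on_context(self, context: Any) -> None:
        print(f"Context contains {len(context['entities'])} entities")

    def on_llm_new_token(self, token: str) -> None:
        print(token, end="", flush=True)

# Register the callbacks (inside an async function)
response, context = await local_search(
    config=config,
    ...,
    callbacks=[MyQueryCallbacks()],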
)
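A query callback can also serve as an audit trail of what an answer was based on. ContextRecorder is a hypothetical, duck-typed class with no graphrag import. It keeps the row count of each context table and the streamed answer text. It makes two assumptions:

- on_context receives a mapping of table name to rows.
- Tokens arrive through on_llm_new_token.

Check both against the QueryCallbacks class you have installed, and subclass it in real use.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


class ContextRecorder:
    """Keep row counts of each context table plus the streamed answer (duck-typed sketch)."""

    def __init__(self) -> None:
        self.table_sizes: dict[str, int] = {}
        self.tokens: list[str] = []

    def on_context(self, context: Any) -> None:
        # Assumes a mapping of table name -> sized rows (a DataFrame or a list both work)
        if isinstance(context, Mapping):
            self.table_sizes = {name: len(rows) for name, rows in context.items()}

    def on_llm_new_token(self, token: str) -> None:
        self.tokens.append(token)

    @property
    def answer(self) -> str:
        """The answer text reassembled from streamed tokens."""
        return "".join(self.tokens)
```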

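4. CLI Best Practices

4.1 Development Workflow

# 1. Initialize the project
graphrag init --root ./my_project

# 2. Prepare data (copy your documents into input/)

# 3. Adjust the configuration (edit settings.yaml)

# 4. Validate the configuration
graphrag index --dry-run --root ./my_project

# 5. Build the index (use standard mode for the first run)
graphrag index --verbose --root ./my_project

# 6. Test a query
graphrag query \
  --method local \
  --query "Test query" \
  --root ./my_project

# 7. Tune the prompts (optional; re-run graphrag index so the new prompts take effect)
graphrag prompt-tune --domain "your domain" --root ./my_project

# 8. Incremental update (after adding new documents)
graphrag update --root ./my_project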

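4.2 Production Deployment

# Configure through environment variables; the names must match the ${...} references in settings.yaml
export GRAPHRAG_API_KEY=sk-...
export AZURE_STORAGE_CONNECTION_STRING=...

# Run indexing in the background
nohup graphrag index \
  --root /var/graphrag/project \
  --config /etc/graphrag/settings.yaml \
  --verbose \
  > /var/log/graphrag/index.log 2>&1 &

# Monitor progress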
tail -f /var/log/graphrag/index.log
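The nohup pipeline above suits an interactive shell. When a job scheduler or a Python service launches indexing, a foreground run is easier to supervise: it appends to the same log and returns the exit code. run_logged is a hypothetical helper built on subprocess.run.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence
from pathlib import Path


def run_logged(argv: Sequence[str], log_path: str | Path) -> int:
    """Run argv to completion, appending stdout and stderr to log_path; return the exit code."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        # stderr is merged into stdout, mirroring `> index.log 2>&1`
        completed = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT, check=False)
    return completed.returncode
```

For example, run_logged(["graphrag", "index", "--root", "/var/graphrag/project", "--verbose"], "/var/log/graphrag/index.log") returns 0 on success. A non-zero code can then trigger an alert or a retry.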

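This document covered GraphRAG's Python API (indexing, querying, and prompt tuning), its CLI commands, and the callback hooks for monitoring indexing and queries. With these interfaces, GraphRAG can be integrated into an application either as a library or as a command-line tool.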
