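kb-arena is one of the Agent workflows featured by AI Skill Hub in this issue. Its overall score is 7.5, which indicates high overall quality. We recommend adding it to your AI toolkit to improve productivity.
kb-arena is an open-source AI workflow that benchmarks 9 retrieval architectures (vector, contextual, QnA, knowledge graph, h, and others). It provides benchmarking and evaluation tools for multiple retrieval architectures, helping developers compare and optimize the performance of different retrieval models.
kb-arena is a complete AI Agent automation workflow solution. Through visual node orchestration, it breaks complex multi-step tasks into clear automated flows that run fully unattended. It supports integration with hundreds of external services and APIs, and suits data-processing pipelines, business automation, and AI-assisted decision systems.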
# Option 1: install with pip (recommended)
pip install kb-arena
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install kb-arena
# Option 3: install from source (for the latest features)
git clone https://github.com/xmpuspus/kb-arena
cd kb-arena
pip install -e .
# Verify the installation
python -c "import kb_arena; print('Installation successful')"
# Command-line usage
kb-arena --help
# Basic usage
kb-arena input_file -o output_file
# Calling from Python code
import kb_arena
# Example
result = kb_arena.process("input")
print(result)
# Example kb-arena configuration file (config.yml)
app:
  name: "kb-arena"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
kb-arena --config config.yml
# Or configure through environment variables
export KB_ARENA_API_KEY="your-key"
export KB_ARENA_OUTPUT_DIR="./output"
Should you use Graph RAG, Vector RAG, or Hybrid? KB Arena tells you — empirically, on your own docs.
Nine retrieval architectures. Your documentation. One winner.
KB Arena is the only open-source benchmark that runs architecturally distinct retrieval strategies — naive vector, contextual vector, Q&A pairs, knowledge graph, hybrid (RRF-fused), RAPTOR, PageIndex, BM25, and rerank-vector (cross-encoder reranking) — head-to-head on your own corpus, with auto-generated questions across 5 difficulty tiers, IR metrics (Recall@k, MRR, NDCG@k), RAGAS metrics, ELO arena voting, a CI gate, and a strategy plugin system.
Embeddings: pluggable across OpenAI, Voyage-3, Cohere, Gemini, BGE (local), Ollama (local) via KB_ARENA_EMBEDDING_PROVIDER. Rerankers: BGE-v2-m3 (local), Cohere Rerank, Voyage Rerank via KB_ARENA_RERANKER_BACKEND.
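
The hybrid strategy above is described as RRF-fused. As a rough illustration of Reciprocal Rank Fusion in general, here is a minimal sketch. Which retrievers KB Arena fuses and which constant it uses are not stated here, so the two input lists and `k=60` (the value commonly used in the RRF literature) are assumptions, and `rrf_fuse` is a hypothetical helper rather than part of the kb-arena API.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: one ranking from a vector index, one from BM25.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and no score normalization across retrievers is needed because only ranks are used.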

---
KB_ARENA_EMBEDDING_PROVIDER selects the embedding backend used by every vector strategy:
| Provider | Why pick it |
|---|---|
| openai (default) | text-embedding-3-large |
| voyage | Current MTEB retrieval leader (+10.58% over OpenAI at matched dims) |
| cohere | Cohere embed-v4 |
| bge | BAAI/bge-large-en-v1.5 (**local, no key**, on-prem-friendly) |
| ollama | Local via Ollama, no key |
| gemini | text-embedding-004 |
Unblocks privacy / on-prem teams (federal, healthcare, finance) and Gemini-shop / Bedrock-shop deployments.
optimize used to report "improved: True, delta=+0.0033" with no honesty layer. v0.8.0 fixes that and adds the metrics every IR benchmark since 2010 expects.
Two changes that close the only gaps a direct competitor (AutoRAG) had on us, and fix the most visible methodological hole in our own numbers.
A focused release that closes the four ship-blocker classes from a multi-dimension audit and adds three differentiated capabilities. Headline numbers in the README are now backed by code that does what it says.
Classical IR metrics computed at the chunk level. See exactly which chunks each strategy surfaced, which it missed, and why one strategy beats another at a metric level — not just at the answer level.
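
To make the chunk-level metrics concrete, here is a minimal sketch of Recall@k, reciprocal rank (MRR is its mean over all questions), and NDCG@k with binary relevance. The function names and the chunk IDs are hypothetical, and KB Arena's own implementation may differ in details such as graded relevance or tie handling.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """NDCG@k with binary gains: DCG of this ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(retrieved[:k], start=1)
        if chunk_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical chunk IDs: the strategy returned four chunks, two are labeled relevant.
retrieved = ["c3", "c1", "c9", "c2"]
relevant = {"c1", "c2"}
print(recall_at_k(retrieved, relevant, k=3))
print(reciprocal_rank(retrieved, relevant))
print(ndcg_at_k(retrieved, relevant, k=4))
```

Recall@k shows which relevant chunks a strategy missed, while reciprocal rank and NDCG@k also penalize relevant chunks that appear low in the ranking.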

That's it. No Neo4j expertise needed. No graph database experience required. KB Arena handles the schema, extraction, and querying.
pip install -e '.[dev]'
Q&A pair generation during build-vectors is now parallelized with asyncio.gather() (5 concurrent). Building QnA indexes on large corpora is up to 5x faster.
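
The pattern described above, `asyncio.gather()` with at most 5 calls in flight, can be sketched as follows. This is an illustration under assumptions: `generate_qa_pair` is a hypothetical stand-in for the LLM call, and using a semaphore to cap concurrency is one common approach, not necessarily how kb-arena does it.

```python
import asyncio


async def generate_qa_pair(section: str) -> dict:
    """Hypothetical stand-in for an LLM call that writes one Q&A pair."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"section": section, "question": f"What does '{section}' cover?"}


async def generate_all(sections, max_concurrent=5):
    """Run all generations concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(section):
        async with semaphore:
            return await generate_qa_pair(section)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(s) for s in sections))


pairs = asyncio.run(generate_all([f"section-{i}" for i in range(12)]))
print(len(pairs))
```

With I/O-bound calls of similar duration, capping concurrency at 5 gives a speedup approaching 5x over running them one at a time, which matches the "up to 5x" wording.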
```bash
pip install kb-arena
pip install kb-arena[pdf]          # PDF support (PyMuPDF)
pip install kb-arena[docx]         # Word documents (mammoth)
pip install kb-arena[web]          # Web scraping (httpx)
pip install kb-arena[all-formats]  # All of the above
```
kb-arena build-graph --corpus my-docs
kb-arena build-vectors --corpus my-docs
Run everything — Neo4j, the API server, and the frontend — in one command:
You have documentation files (markdown, HTML, text, PDFs). You want to know which retrieval strategy works best. Here's everything from zero.
The AWS Compute corpus ships ready to use (75 questions across 5 difficulty tiers):
kb-arena ingest ./datasets/aws-compute/raw/ --corpus aws-compute
kb-arena build-graph --corpus aws-compute
kb-arena build-vectors --corpus aws-compute
kb-arena benchmark --corpus aws-compute
kb-arena label-chunks --corpus aws-compute # v0.5.0: ground truth for IR metrics
kb-arena retriever-lab --corpus aws-compute # v0.5.0: classical IR metrics, no LLM cost
kb-arena serve
---
- kb-arena demo is truly zero-config: LLMClient init is tolerant of missing keys, demo_mode auto-enables, and the dashboard loads instantly
- aws-compute_bm25.json is bundled (it was missing in v0.5.0, so the 8th strategy showed empty in fresh installs)
- vhs tape scripts in docs/tapes/
- kb-arena --version flag

Home: Overview of the 9 strategies, difficulty tiers, and evaluation methodology.

Strategy comparison — Ask the same question to all 9 strategies simultaneously. Compare answers, sources, latency, and cost side-by-side.

Benchmark results — Accuracy table by tier with grouped bar chart.

Knowledge graph — Interactive force-directed visualization of entities extracted from your docs.

Live graph build — Watch entities and relationships stream in as the extractor runs.

---
All prefixed with KB_ARENA_. Loaded from .env or environment.
| Variable | Default | Required | Description |
|---|---|---|---|
| ANTHROPIC_API_KEY | (none) | Yes | Claude for generation, evaluation, extraction |
| OPENAI_API_KEY | (none) | Yes | OpenAI for text-embedding-3-large |
| NEO4J_URI | bolt://localhost:7687 | No | Neo4j connection |
| NEO4J_USER | neo4j | No | Neo4j username |
| NEO4J_PASSWORD | (none) | No | Neo4j password (set to match NEO4J_AUTH in docker-compose) |
| JUDGE_MODEL | claude-opus-4-6 | No | Model used for LLM-as-judge evaluation (default differs from generate model to avoid self-evaluation bias) |
| CHROMA_PATH | ./chroma_data | No | ChromaDB storage path |
| EMBEDDING_MODEL | text-embedding-3-large | No | OpenAI embedding model |
| EMBEDDING_DIMENSIONS | 3072 | No | Embedding vector dimensions |
| GENERATE_MODEL | claude-sonnet-4-6 | No | Generation model |
| FAST_MODEL | claude-haiku-4-5-20251001 | No | Classification model |
| HOST | 0.0.0.0 | No | Server bind address |
| PORT | 8000 | No | Server port |
| DEBUG | false | No | Debug mode |
| BENCHMARK_TEMPERATURE | 0.0 | No | LLM temperature for benchmarks |
| BENCHMARK_MAX_CONCURRENT | 5 | No | Parallel benchmark queries |
| BENCHMARK_QUERY_TIMEOUT_S | 120 | No | Per-query timeout (seconds) |
| BENCHMARK_MAX_RETRIES | 2 | No | Retry count on failures |
| PAGEINDEX_BEAM_WIDTH | 3 | No | Branches to explore per tree level |
| PAGEINDEX_MAX_DEPTH | 4 | No | Maximum tree traversal depth |
| DATASETS_PATH | ./datasets | No | Datasets directory |
| RESULTS_PATH | ./results | No | Results output directory |
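
The `KB_ARENA_` prefix convention can be illustrated with a small settings loader. This is only a sketch of the convention, not kb-arena's actual loader: `load_settings` is a hypothetical helper, it ignores `.env` files, keeps every value as a string, and copies just a handful of defaults from the table above.

```python
import os

PREFIX = "KB_ARENA_"

# A few defaults copied from the table above (values kept as strings).
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "CHROMA_PATH": "./chroma_data",
    "EMBEDDING_DIMENSIONS": "3072",
    "PORT": "8000",
    "BENCHMARK_MAX_CONCURRENT": "5",
}


def load_settings(environ=None):
    """Overlay KB_ARENA_-prefixed variables on the defaults, prefix stripped."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    for key, value in environ.items():
        if key.startswith(PREFIX):
            settings[key[len(PREFIX):]] = value
    return settings


# Only the prefixed variable is picked up; unrelated variables are ignored.
settings = load_settings({"KB_ARENA_PORT": "9000", "HOME": "/root"})
print(settings["PORT"], settings["NEO4J_URI"])
```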
---
pip install kb-arena
kb-arena demo
This launches the dashboard with pre-computed results from the AWS Compute corpus (75 questions, 9 strategies, 5 difficulty tiers). The demo runs in read-only mode — chat, arena, and tools endpoints stay disabled until you set an API key. No Docker, no Neo4j, no surprises.

To enable live chat / arena voting / tools, set KB_ARENA_ANTHROPIC_API_KEY (or KB_ARENA_OPENAI_API_KEY, or use KB_ARENA_LLM_PROVIDER=ollama for free local inference).
kb-arena optimize --corpus my-docs \
  --strategies naive_vector,rerank_vector \
  --top-ks 3,5,10 --chunk-sizes 256,512,1024 \
  --embedding-providers openai,bge --dry-run
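
To get a feel for how large that search space is, the sketch below enumerates it as a plain grid. It assumes every combination of the four options counts as one trial; kb-arena may prune or group combinations differently, and `enumerate_trials` is a hypothetical helper.

```python
from itertools import product

# The option values from the command above.
search_space = {
    "strategy": ["naive_vector", "rerank_vector"],
    "top_k": [3, 5, 10],
    "chunk_size": [256, 512, 1024],
    "embedding_provider": ["openai", "bge"],
}


def enumerate_trials(space):
    """Return one dict per combination in the Cartesian product of all options."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


trials = enumerate_trials(search_space)
print(len(trials))  # 2 * 3 * 3 * 2 = 36
print(trials[0])
```

Each chunk size and embedding provider pair would typically need its own index build, which is why previewing the space before running it is worthwhile.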
Benchmark without pre-written ground truth -- useful for quick evaluation of new corpora before investing in question generation.

Scores on faithfulness and answer relevancy only (no accuracy/completeness since there's no reference to compare against).
Trace the full retrieval pipeline -- intent classification, retrieved sources, latency breakdown, and cost -- without generating a final answer.

Create a .env file or export directly:
export KB_ARENA_ANTHROPIC_API_KEY=sk-ant-... # Claude for generation + evaluation
export KB_ARENA_OPENAI_API_KEY=sk-... # OpenAI for text-embedding-3-large
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
| Command | Description |
|---|---|
| demo | Launch dashboard with pre-computed results (no API keys needed) |
| init-corpus <name> | Scaffold datasets/{name}/ directories |
| ingest <path> | Parse docs into JSONL. Accepts files, dirs, URLs, github:owner/repo. Options: --corpus, --format, --dry-run |
| build-graph | Extract entities/rels into Neo4j. Options: --corpus |
| build-vectors | Build vector indexes + PageIndex tree. Options: --corpus, --strategy |
| generate-questions | Auto-generate benchmark questions. Options: --corpus, --count |
| benchmark | Run evaluation. Options: --corpus, --strategy, --tier, --dry-run |
| optimize | Automated hyperparameter search per strategy. Options: --corpus, --strategies, --top-ks, --chunk-sizes, --embedding-providers, --reranker-backends, --metric, --method (grid/random), --max-trials, --dry-run |
| generate-qa | Generate Q&A pairs from your docs as JSONL. Options: --corpus, --output |
| audit | Find documentation gaps: classifies sections as strong/weak/gap. Options: --corpus, --output, --max-sections |
| fix | Generate fix recommendations with draft content. Options: --corpus, --max-fixes, --output |
| report | Generate report. Options: --corpus, --output, --format (rich/json) |
| serve | Launch API + frontend. Options: --host, --port |
| health | Pipeline status. Options: --format (rich/json) |
All commands are independently re-runnable. Each stage writes to disk so you can re-run any step without repeating earlier ones.
Dry run — Preview what a command will do before committing to expensive LLM calls:
```bash
kb-arena ingest datasets/my-docs/raw/ --corpus my-docs --dry-run
```
Bring your own retrieval strategy without forking. Your module exports a single Strategy subclass with build_index() and query() methods.
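
As a rough sketch of what such a plugin module might look like: only the method names `build_index()` and `query()` come from the description above. The base class defined here is a hypothetical stand-in for kb-arena's real `Strategy` class, and the signatures, return types, and synchronous style are all assumptions.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Hypothetical stand-in for kb-arena's base class; the real one may differ."""

    @abstractmethod
    def build_index(self, documents: list[str]) -> None:
        """Prepare whatever index the strategy needs from the corpus."""

    @abstractmethod
    def query(self, question: str, top_k: int = 3) -> list[str]:
        """Return up to top_k documents relevant to the question."""


class KeywordOverlapStrategy(Strategy):
    """Toy strategy: rank documents by how many query words they share."""

    def __init__(self):
        self._docs = []

    def build_index(self, documents):
        # Store each document with its lowercase token set.
        self._docs = [(doc, set(doc.lower().split())) for doc in documents]

    def query(self, question, top_k=3):
        terms = set(question.lower().split())
        scored = [(len(terms & tokens), doc) for doc, tokens in self._docs]
        scored = [item for item in scored if item[0] > 0]  # drop non-matches
        scored.sort(key=lambda item: item[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


strategy = KeywordOverlapStrategy()
strategy.build_index([
    "Lambda runs code without servers",
    "EC2 provides virtual machines",
])
results = strategy.query("what runs code without servers", top_k=1)
print(results)
```

Keeping the contract to two methods means a plugin only has to decide how to index the corpus and how to answer a query; benchmarking and scoring stay with the harness.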

Fail your pipeline if retrieval quality drops:
```bash
kb-arena benchmark --corpus my-docs --fail-below 0.7
```

```bash
kb-arena benchmark --corpus my-docs --dry-run
```
New "Compare" view in the benchmark UI lets you pick two strategies and see tier-by-tier accuracy, latency, and cost differences side by side.

A new Arena mode for blind head-to-head strategy battles. Ask a question, two random strategies answer it, you vote for the better response. ELO ratings emerge over time.
kb-arena serve # then open /arena in your browser
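
For readers unfamiliar with ELO, the rating update after a single vote can be sketched like this. The starting rating of 1500, the K-factor of 32, and the 400-point scale are conventional chess values used here as assumptions; the constants kb-arena actually uses are not stated, and `update_elo` is a hypothetical helper.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the ELO model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(winner, loser, k=32):
    """Return the new (winner, loser) ratings after one vote."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Two strategies start level; a voter prefers the rerank_vector answer.
ratings = {"naive_vector": 1500.0, "rerank_vector": 1500.0}
ratings["rerank_vector"], ratings["naive_vector"] = update_elo(
    ratings["rerank_vector"], ratings["naive_vector"]
)
print(ratings)
```

An upset win against a higher-rated strategy moves the ratings more than an expected win, which is why rankings stabilize as votes accumulate.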

Benchmark runs now have unique IDs and timestamps. Results are preserved across runs instead of overwritten:
```bash
kb-arena benchmark --corpus my-docs
```
kb-arena generate-questions --corpus my-docs --count 50
kb-arena benchmark --corpus my-docs
Generate Q&A pairs from your docs — use them for chatbot training, FAQ pages, or search indexes. Only needs an Anthropic key (no embeddings, no vector DB).
```bash
kb-arena generate-qa --corpus my-docs
```

Questions are organized into 5 difficulty tiers:
| Tier | Type | Hops | What it tests |
|---|---|---|---|
| 1 | Lookup | 1 | Single-fact lookup from one document |
| 2 | How-To | 1-2 | Multi-step processes, configuration sequences |
| 3 | Comparison | 2-3 | Comparing alternatives, trade-offs between options |
| 4 | Integration | 3-4 | Dependencies and connections between concepts |
| 5 | Architecture | 3-5 | Cross-document synthesis, transitive reasoning |
Use kb-arena generate-questions to auto-generate questions from your docs, or write them by hand in YAML.
---

**JSON output** — Pipe structured data to `jq`, scripts, or CI pipelines:
kb-arena report --corpus my-docs --format json | jq '.corpora'
kb-arena health --format json | jq '.services'

**Pipeline hints** — After every command, see what to run next:
$ kb-arena ingest datasets/my-docs/raw/ --corpus my-docs
Done. 12 documents, 47 sections → datasets/my-docs/processed/documents.jsonl
Next: kb-arena build-graph --corpus my-docs && kb-arena build-vectors --corpus my-docs
**Progress bars** — Every long-running command shows real-time progress (extraction sections, Neo4j batch loading, vector index building, question generation tiers).
**Cost tracking** — Benchmark runs display cumulative API cost in the progress bar and print per-strategy cost/accuracy summaries after completion.
**Verbose mode** — Add `--verbose` / `-v` to any command for debug logging:
kb-arena benchmark --corpus my-docs --verbose
---
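KB Arena is an open-source benchmarking project for comparing the performance of different retrieval architectures. It supports nine retrieval strategies: naive vector, contextual vector, Q&A pairs, knowledge graph, hybrid (RRF-fused), RAPTOR, PageIndex, BM25, and rerank-vector.
New features in KB Arena include a statistically rigorous metrics layer, automated strategy search, graph retrieval, hardening, a ninth strategy, and an embedding-provider abstraction.
Requirements include Python 3.11+, pip, Docker, and API keys (for the LLM provider and OpenAI).
Installation involves installing with pip, installing the development dependencies, and running Neo4j with Docker.
The usage tutorial covers the quick start, the built-in AWS example, configuring environment variables, and using the API.
The configuration notes cover environment variables, MCP, the .env file, and key parameters.
The API/interface notes cover using API keys, trying it in 10 seconds (no API key), and previewing the search space and cost.
The workflow/module notes cover the strategy plugin system, CI/CD integration, and failing the pipeline.
The FAQ summary covers auto-generating benchmark questions, running benchmarks, and the Q&A generator.
kb-arena is a useful open-source AI workflow that provides benchmarking and evaluation tools for multiple retrieval architectures, helping developers compare and optimize the performance of different retrieval models. However, its documentation and API may need further refinement.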
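AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, provided the copyright notice is retained.
Based on our overall assessment, kb-arena performs solidly among Agent workflows and is of good quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.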
| Original name | kb-arena |
| Original description | Open-source AI workflow: Benchmark 9 retrieval architectures (vector, contextual, QnA, knowledge graph, h. ⭐7 · Python |
| Topics | workflow, benchmark, bpref, chromadb, cli, document-retrieval, python |
| GitHub | https://github.com/xmpuspus/kb-arena |
| License | MIT |
| Language | Python |
Listed: 2026-05-21 · Updated: 2026-05-24 · License: MIT · AI Skill Hub does not legally endorse the accuracy of third-party content.