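AI Skill Hub has evaluated PageIndex Agent Workflow and rated it "Highly Recommended". With more than 31.2k stars on GitHub, this agent workflow performs well on feature completeness, community activity, and ease of use. It has an AI score of 8.2 and suits users with some technical background.
PageIndex Agent Workflow is a complete AI agent automation workflow solution. Through visual node orchestration, it breaks complex multi-step tasks down into clear automated pipelines, enabling fully unattended intelligent processing. It supports seamless integration with hundreds of external services and APIs, and is suited to building data-processing pipelines, business automation, and AI-assisted decision systems.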
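```bash
# Option 1: install with pip (recommended)
pip install pageindex
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install pageindex
# Option 3: install from source (for the latest features)
git clone https://github.com/VectifyAI/PageIndex
cd PageIndex
pip install -e .
# Verify the installation
python -c "import pageindex; print('Installation successful')"
```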
```bash
# Command-line usage
pageindex --help
# Basic usage
pageindex input_file -o output_file
```

```python
# Calling from Python code
import pageindex

# Example
result = pageindex.process("input")
print(result)
```
```yaml
# Example pageindex configuration file (config.yml)
app:
  name: "pageindex"
  debug: false
  log_level: "INFO"
```

```bash
# Specify the configuration file at runtime
pageindex --config config.yml
# Or configure via environment variables
export PAGEINDEX_API_KEY="your-key"
export PAGEINDEX_OUTPUT_DIR="./output"
```
Are you frustrated with vector database retrieval accuracy for long professional documents? Traditional vector-based RAG relies on semantic similarity rather than true relevance. But similarity ≠ relevance — what we truly need in retrieval is relevance, and that requires reasoning. When working with professional documents that demand domain expertise and multi-step reasoning, similarity search often falls short.
Inspired by AlphaGo, we propose PageIndex, a vectorless, reasoning-based RAG system that builds a hierarchical tree index from long documents and uses LLMs to reason over that index for agentic, context-aware retrieval. It simulates how human experts navigate and extract knowledge from complex documents through tree search, enabling LLMs to think and reason their way to the most relevant document sections. PageIndex performs retrieval in two steps:

1. Generate a "table-of-contents" tree-structure index of the document.
2. Perform reasoning-based retrieval through tree search over that index.
Compared to traditional vector-based RAG, PageIndex features:

- No Vector DB: Uses document structure and LLM reasoning for retrieval, instead of vector similarity search.
- No Chunking: Documents are organized into natural sections, not artificial chunks.
- Better Explainability and Traceability: Retrieval is based on reasoning, traceable and interpretable, with page and section references. No more opaque, approximate vector search (“vibe retrieval”).
- Context-Aware Retrieval: Retrieval depends on your full context (e.g., conversation history and domain knowledge), and easily incorporates new context.
- Human-like Retrieval: Simulates how human experts navigate and extract knowledge from complex documents.
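The first of the two steps, organizing a document into a tree of natural sections that each keep a page range, can be sketched as below. This is a minimal illustration under stated assumptions, not PageIndex's implementation: it assumes the table of contents has already been extracted as `(level, title, start_page)` tuples (PageIndex builds its tree from the PDF itself, which this sketch skips), and `Node`, `build_tree`, and `show` are hypothetical names. Because every node carries its own page range, retrieval results can cite pages and sections without any chunking or embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of the document; children are its sub-sections."""
    title: str
    start_page: int
    end_page: int
    level: int
    children: list["Node"] = field(default_factory=list)


def build_tree(toc: list[tuple[int, str, int]], total_pages: int) -> list[Node]:
    """Turn flat (level, title, start_page) ToC entries into a nested tree.

    Assumption: a section ends just before the next entry at the same or a
    shallower level starts, or on the last page if there is none. A section
    that shares its start page with its successor keeps at least that page.
    """
    nodes: list[Node] = []
    for i, (level, title, start) in enumerate(toc):
        end = total_pages
        for next_level, _, next_start in toc[i + 1:]:
            if next_level <= level:
                end = max(start, next_start - 1)
                break
        nodes.append(Node(title, start, end, level))

    roots: list[Node] = []
    stack: list[Node] = []  # path from a root down to the most recent node
    for node in nodes:
        # Pop until the top of the stack is a strict ancestor of `node`.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def show(nodes: list[Node], indent: int = 0) -> None:
    """Print the tree with each section's page range."""
    for n in nodes:
        print("  " * indent + f"{n.title} (pp. {n.start_page}-{n.end_page})")
        show(n.children, indent + 1)


toc = [
    (1, "Annual Report", 1),
    (2, "Business Overview", 2),
    (2, "Financial Statements", 10),
    (3, "Balance Sheet", 10),
    (3, "Cash Flow Statement", 14),
    (1, "Notes", 20),
]
tree = build_tree(toc, total_pages=30)
show(tree)
```

The stack-based pass is a common way to nest a flat outline in one scan; the page-boundary rule is a simplifying choice for the sketch.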
PageIndex powers a reasoning-based RAG system that achieved state-of-the-art 98.7% accuracy on FinanceBench, demonstrating superior performance over vector-based RAG solutions in professional document analysis. See our blog post for details.
```bash
pip3 install --upgrade -r requirements.txt
pip3 install openai-agents
```
Note: This package uses standard PDF parsing. For use cases with complex PDFs, our cloud service (via MCP and API) offers enhanced OCR, tree building, and retrieval.
You can follow these steps to generate a PageIndex tree from a PDF document.
For a simple, end-to-end agentic vectorless RAG example using self-hosted PageIndex (with OpenAI Agents SDK), see examples/agentic_vectorless_rag_demo.py.
```bash
python3 examples/agentic_vectorless_rag_demo.py
```
---
Create a .env file in the root directory with your LLM API key. Multi-LLM is supported via LiteLLM:
```bash
OPENAI_API_KEY=your_openai_key_here
```
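As a rough illustration of what happens to that file, here is a stdlib-only sketch that reads `KEY=VALUE` lines into `os.environ`, so that a client such as LiteLLM, which picks up `OPENAI_API_KEY` from the environment, can find the key. The function name `load_env_file` is hypothetical; in practice a package such as python-dotenv usually does this job, and the sketch does not claim to match how PageIndex itself loads its `.env`.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Load KEY=VALUE lines from a .env file into os.environ (hypothetical helper).

    Variables already set in the environment are not overridden.
    Returns the key/value pairs parsed from the file.
    """
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # a missing file is not treated as an error
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without a KEY=VALUE separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Strip surrounding quotes (simplified: no escapes or inline comments).
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        os.environ.setdefault(key, value)
    return parsed


if __name__ == "__main__":
    # Demo against a throwaway file so no real .env is touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / ".env"
        demo.write_text("# LLM credentials\nOPENAI_API_KEY=your_openai_key_here\n", encoding="utf-8")
        print(load_env_file(str(demo)))
```

Using `setdefault` means a key exported in the shell wins over the file, which is the usual convention for `.env` loaders.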
Mafin 2.5 is a reasoning-based RAG system for financial document analysis, powered by PageIndex. It achieved a state-of-the-art 98.7% accuracy on the FinanceBench benchmark, significantly outperforming traditional vector-based RAG systems.
PageIndex's hierarchical indexing and reasoning-driven retrieval enable precise navigation and extraction of relevant context from complex financial reports, such as SEC filings and earnings disclosures.
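The reasoning-driven navigation described above can be approximated as a top-down tree search: at each level a judge decides which child sections to open, and every hit carries its section path and page range. This is a hedged sketch, not PageIndex's retrieval code. The judge is where an LLM would read section titles and summaries and reason about relevance; here it is replaced by a hypothetical keyword-overlap function, `keyword_judge`, purely so the example runs offline, and keyword overlap is not reasoning. `Node`, `tree_search`, and the sample report are likewise illustrative.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str
    pages: tuple[int, int]  # inclusive (start, end) page range
    children: list["Node"] = field(default_factory=list)


# A judge receives the query and candidate sections and returns the ones
# worth opening. In a reasoning-based system this would be an LLM call.
Judge = Callable[[str, list[Node]], list[Node]]


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_judge(query: str, candidates: list[Node]) -> list[Node]:
    """Offline stand-in for an LLM: keep the candidates sharing the most query words."""
    terms = _words(query)
    scores = [len(terms & _words(f"{n.title} {n.summary}")) for n in candidates]
    best = max(scores, default=0)
    return [n for n, s in zip(candidates, scores) if best > 0 and s == best]


def tree_search(query: str, root: Node, judge: Judge) -> list[tuple[list[str], tuple[int, int]]]:
    """Descend from the root, opening only the sections the judge selects.

    Returns (title path, page range) for each section where the descent
    stopped, so every hit can be traced to a section and its pages.
    """
    hits = []
    frontier = deque([([], root)])
    while frontier:
        parent_path, node = frontier.popleft()
        path = parent_path + [node.title]
        chosen = judge(query, node.children) if node.children else []
        if not chosen:
            # Leaf, or no child looks relevant: cite this whole section.
            hits.append((path, node.pages))
        frontier.extend((path, child) for child in chosen)
    return hits


report = Node("Annual Report 2023", "Company overview, financial statements and risks", (1, 50), [
    Node("Business Overview", "Products, markets and strategy", (2, 9)),
    Node("Financial Statements", "Balance sheet, income statement and cash flow statement", (10, 40), [
        Node("Balance Sheet", "Assets, liabilities and equity at year end", (10, 13)),
        Node("Cash Flow Statement", "Operating, investing and financing cash flows", (14, 19)),
    ]),
    Node("Risk Factors", "Market, credit and regulatory risks", (41, 50)),
])

for path, (start, end) in tree_search("operating cash flows", report, keyword_judge):
    # prints: Annual Report 2023 > Financial Statements > Cash Flow Statement (pp. 14-19)
    print(" > ".join(path), f"(pp. {start}-{end})")
```

Only the judged branches are expanded, so the search reads a handful of section summaries instead of scoring every chunk of the document.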
Explore the full benchmark results and our blog post for detailed comparisons and performance metrics.
---
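Highly innovative: the idea of replacing vectorization with reasoning in RAG deserves attention. Strong community interest (31k stars) and an advanced architecture make it a good fit for exploring cutting-edge applications. The completeness of the documentation still needs to be verified.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment, and carrying out the necessary security assessment, before deploying it to production.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, requiring only that the copyright notice be retained.
AI Skill Hub review: PageIndex Agent Workflow's core functionality is complete and of excellent quality. For automation engineers and operations staff, it is worth adding to a personal toolkit. We suggest trying it in a non-production environment first and then rolling it out gradually.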
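| Field | Value |
|---|---|
| Original name | PageIndex |
| Original description | Open-source AI workflow: 📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG. ⭐31.2k · Python |
| Topics | RAG, document indexing, reasoning engine, AI workflow, agent framework |
| GitHub | https://github.com/VectifyAI/PageIndex |
| License | MIT |
| Language | Python |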
Listed: 2026-05-14 · Updated: 2026-05-16 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.