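Recommended by AI Skill Hub: the paper equation extraction tool (paperpipe) is a high-quality AI tool. Its overall AI score is 7.5, a solid result among comparable tools. If you are looking for a dependable AI tool, it is worth a closer look.
paperpipe is an open-source tool written in Python that focuses on equation extraction, academic papers, and AI coding assistants. As a GitHub open-source project it has community support and ongoing releases, its code is fully open to audit, and it can be deployed locally to keep data private. It suits both personal use and integration into enterprise workflows.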
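```bash
# Option 1: install with pip (recommended)
pip install paperpipe

# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install paperpipe

# Option 3: install from source (for the latest features)
git clone https://github.com/hummat/paperpipe
cd paperpipe
pip install -e .

# Verify the installation
python -c "import paperpipe; print('Installation successful')"
```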
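The one-line import check only covers the Python package. A slightly fuller check, sketched below, also looks for a command on `PATH`. The helper name `check_install` is hypothetical, and the defaults assume the package imports as `paperpipe` and that the command is called `papi`, as in the commands further down this page.

```python
import importlib.util
import shutil


def check_install(module: str = "paperpipe", command: str = "papi") -> dict:
    """Report whether a module is importable and a command is on PATH.

    Both names are parameters, so the check makes no hard assumption about
    how the package exposes itself (the defaults are assumptions).
    """
    return {
        "module_found": importlib.util.find_spec(module) is not None,
        "command_found": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    print(check_install())
```

If `module_found` is true but `command_found` is false, the package is probably installed in an environment whose scripts directory is not on `PATH`.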
```bash
# Command-line usage
paperpipe --help

# Basic usage
paperpipe input_file -o output_file
```
```python
# Calling from Python code
import paperpipe

# Example
result = paperpipe.process("input")
print(result)
```
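```yaml
# Example paperpipe configuration file (config.yml)
app:
  name: "paperpipe"
  debug: false
  log_level: "INFO"
```

```bash
# Point to the configuration file at run time
paperpipe --config config.yml

# Or configure through environment variables
export PAPERPIPE_API_KEY="your-key"
export PAPERPIPE_OUTPUT_DIR="./output"
```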
The problem: You're implementing a paper. You need the exact equations, want to verify your code matches the math, and your coding agent keeps hallucinating details. Reading PDFs is slow; copy-pasting LaTeX is tedious.
The solution: paperpipe maintains a local paper database with PDFs, LaTeX source (when available), extracted equations, and coding-oriented summaries. It integrates with coding agents (Claude Code, Codex, Gemini CLI) so they can ground their responses in actual paper content.
```bash
uv tool install paperpipe --with "paperpipe[llm]"      # better summaries via LLMs
uv tool install paperpipe --with "paperpipe[paperqa]"  # RAG via PaperQA2
uv tool install paperpipe --with "paperpipe[leann]"    # local RAG via LEANN
uv tool install paperpipe --with "paperpipe[figures]"  # figure extraction from LaTeX/PDF
uv tool install paperpipe --with "paperpipe[mcp]"      # MCP server integrations (Python 3.11+)
uv tool install paperpipe --with "paperpipe[all]"      # everything
```
<details markdown="1">
<summary>Alternative: pip install</summary>
```bash
pip install paperpipe
pip install 'paperpipe[llm]'
pip install 'paperpipe[paperqa]'  # PaperQA2 + multimodal PDF parsing
pip install 'paperpipe[leann]'
pip install 'paperpipe[figures]'  # figure extraction from LaTeX/PDF
pip install 'paperpipe[mcp]'
pip install 'paperpipe[all]'
```
</details>
<details markdown="1">
<summary>From source</summary>
```bash
git clone https://github.com/hummat/paperpipe && cd paperpipe
pip install -e ".[all]"
```
</details>
papi add --from-file papers.bib
**Title Search:**

```bash
papi install mcp
papi install mcp --claude
papi install mcp --codex
papi install mcp --gemini
papi install mcp --repo
```

```bash
papi index --backend leann
papi index --backend leann --leann-embedding-mode ollama --leann-embedding-model nomic-embed-text
papi index --backend leann --leann-embedding-mode ollama --leann-embedding-host http://localhost:11434
papi index --backend leann --leann-doc-chunk-size 350 --leann-doc-chunk-overlap 128
```
By default, papi ask --backend leann auto-builds the index if missing (disable with --leann-no-auto-index). For explicit derived names such as papers_openai_voyage-4, auto-build infers the embedding mode/model from the name.
</details>
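To make the idea of inferring the embedding mode and model from an index name concrete, here is a minimal sketch. It is not paperpipe's parser: the `<prefix>_<mode>_<model>` layout is an assumption read off the single example `papers_openai_voyage-4`, and the names `DerivedIndex` and `parse_derived_index_name` are hypothetical.

```python
from typing import NamedTuple, Optional


class DerivedIndex(NamedTuple):
    prefix: str
    embedding_mode: str
    embedding_model: str


def parse_derived_index_name(name: str) -> Optional[DerivedIndex]:
    """Split "<prefix>_<mode>_<model>" into its parts (assumed layout).

    Only the first two underscores are used as separators, so a model name
    may itself contain underscores. Returns None if any part is missing.
    """
    parts = name.split("_", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return DerivedIndex(*parts)


print(parse_derived_index_name("papers_openai_voyage-4"))
```

Under that assumption, `papers_openai_voyage-4` splits into the mode `openai` and the model `voyage-4`, while a plain name such as `papers` yields `None` and would need explicit flags.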
papi index --backend pqa --pqa-embedding text-embedding-3-small

In a coding agent, use leann_search() (fast) or retrieve_chunks() (with citations).
</details>
| Variable | Default | Description |
|---|---|---|
| `PAPERPIPE_PQA_INDEX_DIR` | `~/.paperpipe/.pqa_index` | Root directory for PaperQA2 indices |
| `PAPERPIPE_PQA_INDEX_NAME` | `paperpipe_<embedding>` | Index name (subfolder under index dir) |
| `PAPERQA_EMBEDDING` | (from config) | Embedding model (must match index for PaperQA2) |
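The two index variables combine into a single folder path. The sketch below shows one way that resolution could look. The function name `resolve_pqa_index_path` is hypothetical, and the replacement of `/` in the embedding id is an assumption made so that the default name stays a single folder; paperpipe's real naming rule is not shown here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_pqa_index_path(
    embedding: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return <index dir>/<index name> using the documented defaults."""
    env = os.environ if env is None else env
    index_dir = env.get("PAPERPIPE_PQA_INDEX_DIR") or "~/.paperpipe/.pqa_index"
    # Assumption: "/" in a model id is replaced so the name is one folder.
    default_name = "paperpipe_" + embedding.replace("/", "_")
    index_name = env.get("PAPERPIPE_PQA_INDEX_NAME") or default_name
    return Path(index_dir).expanduser() / index_name


print(resolve_pqa_index_path("text-embedding-3-small", env={}))
```

Because the default name depends on the embedding model, switching models points at a different subfolder, which fits the note that the embedding must match the index.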
| Flag | Description |
|---|---|
| `--pqa-llm MODEL` | LLM for answer generation (LiteLLM id) |
| `--pqa-summary-llm MODEL` | LLM for evidence summarization (often cheaper) |
| `--pqa-embedding MODEL` | Embedding model for text chunks |
| `--pqa-temperature FLOAT` | LLM temperature (0.0-1.0) |
| `--pqa-verbosity INT` | Logging level (0-3; 3 = log all LLM calls) |
| `--pqa-agent-type TEXT` | Agent type (e.g., `fake` for deterministic low-token retrieval) |
| `--pqa-answer-length TEXT` | Target answer length (e.g., "about 200 words") |
| `--pqa-evidence-k INT` | Number of evidence pieces to retrieve (default: 10) |
| `--pqa-max-sources INT` | Max sources to cite in answer (default: 5) |
| `--pqa-timeout FLOAT` | Agent timeout in seconds (default: 500) |
| `--pqa-concurrency INT` | Indexing concurrency (default: 1) |
| `--pqa-rebuild-index` | Force full index rebuild |
| `--pqa-retry-failed` | Retry previously failed documents |
| `--format evidence-blocks` | Output JSON with `{answer, evidence[]}` (requires PaperQA2 Python package) |
| `--pqa-raw` | Show raw PaperQA2 output (streaming logs + answer); disables `papi ask` output filtering (also enabled by global `-v`/`--verbose`) |
Any additional arguments are passed through to pqa (e.g., --agent.search_count 10).
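When `papi ask` is driven from a script, building the argument list in code keeps the quoting correct. The sketch below only assembles and prints the command. The helper `build_ask_command` is hypothetical, and `--backend pqa` on `papi ask` is an assumption carried over from the `papi index --backend pqa` form; the flag names and defaults come from the table above.

```python
import shlex
from typing import List, Optional, Sequence


def build_ask_command(
    question: str,
    evidence_k: int = 10,
    max_sources: int = 5,
    timeout: float = 500,
    extra_pqa_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble an argv list for `papi ask` with a few PaperQA2 flags."""
    argv = [
        "papi", "ask", question,
        "--backend", "pqa",  # assumption: same backend name as `papi index`
        "--pqa-evidence-k", str(evidence_k),
        "--pqa-max-sources", str(max_sources),
        "--pqa-timeout", str(timeout),
    ]
    # Anything extra is appended unchanged, to be passed through to pqa.
    argv.extend(extra_pqa_args or [])
    return argv


cmd = build_ask_command(
    "What regularization techniques do these papers use?",
    extra_pqa_args=["--agent.search_count", "10"],
)
print(shlex.join(cmd))
```

With paperpipe installed, the list can be handed to `subprocess.run(cmd, check=True)` directly, with no shell involved.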
```bash
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
export VOYAGE_API_KEY=...
export OPENROUTER_API_KEY=...
```
```bash
papi ask "..." --backend leann --leann-provider ollama --leann-model qwen3:8b
papi ask "..." --backend leann --leann-host http://localhost:11434
papi ask "..." --backend leann --leann-top-k 12 --leann-complexity 64
```
Notes:
- If you use `--leann-provider anthropic`, your `leann` install must include the `anthropic` Python package (`pip install anthropic` in the same environment that runs `leann`).
- You can pass through extra `leann` CLI flags after `--` (useful for debugging), e.g.: `papi -v ask "..." --backend leann -- ...`
paperpipe uses LLMs for generating summaries, extracting equations, and tagging. Without an LLM, it falls back to regex extraction and metadata-based summaries.
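To illustrate what a regex fallback for equations can look like, here is a minimal sketch that pulls display math out of LaTeX source. It is not paperpipe's extractor: the pattern, the set of environments covered, and the name `extract_equations` are all assumptions chosen for the example.

```python
import re
from typing import List

# Display-math forms covered by this sketch: equation/align environments
# (starred or not), \[ ... \], and $$ ... $$. Inline $...$ is ignored.
_EQUATION_PATTERN = re.compile(
    r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}"
    r"|\\\[(.*?)\\\]"
    r"|\$\$(.*?)\$\$",
    re.DOTALL,
)


def extract_equations(latex: str) -> List[str]:
    """Return the bodies of display equations in document order."""
    equations = []
    for match in _EQUATION_PATTERN.finditer(latex):
        # Exactly one of the three body groups is set for any match.
        body = match.group(2) or match.group(3) or match.group(4) or ""
        body = body.strip()
        if body:
            equations.append(body)
    return equations


sample = r"""
The adapted layer computes
\begin{equation}
h = W_0 x + B A x
\end{equation}
and we also use $$a^2 + b^2 = c^2$$ and \[ E = mc^2 \].
"""
for equation in extract_equations(sample):
    print(equation)
```

A pattern like this misses macros, nested environments, and spacing commands such as `\\[2pt]` outside an environment, which is one reason an LLM or a real LaTeX parser gives better results when available.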
```bash
export OLLAMA_HOST=http://localhost:11434
```

For persistent settings, create `~/.paperpipe/config.toml` (override location with `PAPERPIPE_CONFIG_PATH`):

```toml
[llm]
model = "gemini/gemini-2.5-flash"
temperature = 0.3

[embedding]
model = "gemini/gemini-embedding-001"

[paperqa]
settings = "default"
index_dir = "~/.paperpipe/.pqa_index"
summary_llm = "gpt-4o-mini"
enrichment_llm = "gpt-4o-mini"

[leann]
llm_provider = "ollama"
llm_model = "qwen3:8b"
embedding_model = "nomic-embed-text"
embedding_mode = "ollama"

[tags.aliases]
cv = "computer-vision"
nlp = "natural-language-processing"
```
Precedence: CLI flags > env vars > config.toml > built-in defaults.
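The precedence chain can be read as a sequence of fallbacks. The sketch below spells it out for a single setting. The helper `resolve_setting` is hypothetical, and pairing `--pqa-embedding`, `PAPERQA_EMBEDDING`, and `[embedding] model` as three sources of one value is an assumption made for illustration.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(
    cli_value: Optional[Any],
    env_var: str,
    config: Mapping[str, Mapping[str, Any]],
    section: str,
    key: str,
    default: Any,
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Apply: CLI flag > environment variable > config file > default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    if env.get(env_var):  # unset and empty values both fall through
        return env[env_var]
    section_values = config.get(section, {})
    if key in section_values:
        return section_values[key]
    return default


config = {"embedding": {"model": "gemini/gemini-embedding-001"}}
print(resolve_setting(None, "PAPERQA_EMBEDDING", config, "embedding", "model",
                      "text-embedding-3-small", env={}))
```

With no flag and no environment variable, the call above falls through to the config value; passing a `cli_value` would override every other source.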
papi show lora --level tex # exact LaTeX definitions
paperpipe supports two RAG backends for cross-paper questions:
| Backend | Install | Best for |
|---|---|---|
| [PaperQA2](https://github.com/Future-House/paper-qa) | paperpipe[paperqa] | Agentic synthesis with citations (cloud LLMs) |
| [LEANN](https://github.com/yichuan-w/LEANN) | paperpipe[leann] | Local retrieval (Ollama) |
```bash
export GEMINI_API_KEY=...      # default provider
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export VOYAGE_API_KEY=...      # for Voyage embeddings (recommended with Claude)
export OPENROUTER_API_KEY=...  # 200+ models
```

Check which models work with your keys:

```bash
papi models            # probe default models for your configured keys
papi models latest     # probe latest model candidates (gpt-5, Gemini via OpenRouter/Gemini, Claude, Voyage 4)
papi models last-gen   # probe previous generation
papi models all        # probe broader superset
papi models --verbose  # show underlying provider errors
```
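Before probing models, it can help to see which of the provider keys are set in the current environment at all. The sketch below checks presence only and makes no network calls, so it is no substitute for `papi models`. The helper name `configured_keys` is hypothetical; the variable names are the ones listed above.

```python
import os
from typing import Dict, Mapping, Optional

PROVIDER_KEYS = (
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "VOYAGE_API_KEY",
    "OPENROUTER_API_KEY",
)


def configured_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Map each provider variable to True if it is set and non-blank."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name, "").strip()) for name in PROVIDER_KEYS}


for name, present in configured_keys().items():
    print(f"{name}: {'set' if present else 'missing'}")
```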
paperpipe is designed to work with coding agents. Install the skill and MCP servers:

```bash
papi install  # installs skill + MCP for detected CLIs
```

```bash
papi ask "How does LoRA differ from full fine-tuning in terms of parameter count?"
papi ask "What regularization techniques do these papers use?"
```
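paperpipe addresses the problem of implementing papers: it helps developers maintain a local paper database containing PDF files and LaTeX source, so they can check that their code matches the math.
paperpipe offers several features, including better summaries via LLMs, RAG with citations via PaperQA2, and local RAG via LEANN.
Importing from BibTeX files requires the bibtexparser library.
paperpipe can be installed with uv tool, with several install options including BibTeX support, LLM support, and PaperQA2 support.
Using paperpipe involves building an index and then calling leann_search() or retrieve_chunks() from a coding agent; RAG backends such as PaperQA2 are supported.
Configuration includes the MCP environment variables, such as PAPERPIPE_PQA_INDEX_DIR and PAPERPIPE_PQA_INDEX_NAME, along with other options.
paperpipe exposes several commands, such as papi show lora and papi ask, with support for the RAG backends.
The workflow is to build an index and then call leann_search() or retrieve_chunks() from a coding agent, backed by PaperQA2 or another RAG backend.
Common questions include how to use the RAG backends and how to set API keys.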
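A vertical tool focused on extracting equations from academic papers, filling a gap in knowledge-base building for AI coding assistants. Code quality is fairly high, but the project has few stars and its ecosystem still needs to grow.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.
Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security review.
✅ MIT license: one of the most permissive open-source licenses; free for commercial use, modification, and distribution, provided the copyright notice is retained.
Overall, paperpipe is a good-quality AI tool that is reasonably competitive among comparable tools. AI Skill Hub will keep tracking its updates; bookmark it and adopt it when it fits your own use case.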
| Original name | paperpipe |
| Original description | Open-source AI tool: Extract equations and context from research papers for LLM coding assistants (ar. ⭐12 · Python |
| Topics | equation extraction, academic papers, AI coding assistants, arXiv, CLI tools |
| GitHub | https://github.com/hummat/paperpipe |
| License | MIT |
| Language | Python |
Listed: 2026-05-21 · Updated: 2026-05-26 · License: MIT · AI Skill Hub provides no legal endorsement of the accuracy of third-party content.