
Capability tags

🤖 Agent 🔄 Workflow 👁 OCR 🐳 Docker 💻 CLI 🔗 REST API 🧬 Embedding 📚 RAG 🖼 Vision 🔊 TTS

AI tool

AIfred Intelligent Multi-Agent Assistant

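Q: How do I install and start using AIfred-Intelligence?

Visit the AIfred-Intelligence GitHub repository and follow the steps in the README to install the dependencies and run it. The README lists Python 3.10+ and one LLM backend (llama.cpp via llama-swap, Ollama, vLLM, or TabbyAPI) as prerequisites.

Q: Is AIfred-Intelligence free? What is its license?

AIfred-Intelligence is published as open source on GitHub and is free to download. GitHub reports its license as NOASSERTION, which means no standard license was detected, so check the license file in the repository before modifying or redistributing the code.

Q: Who is AIfred-Intelligence for?

AIfred-Intelligence mainly targets users with some technical background, such as developers, data analysts, and AI engineers.

Q: How active is the community, and how well is the project maintained?

AIfred-Intelligence has 32 stars and 2 forks on GitHub. It is a small project that is updated as needed.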

Python-based open-source AI tool, selected from the GitHub community

English name: AIfred-Intelligence

⭐ 32 Stars 🍴 2 Forks 💻 Python 📄 NOASSERTION 🏷 AI score 7.2

Multi-agent · Workflow orchestration · Chain of thought · Self-hosted · Python


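✦ AI Skill Hub recommendation

After review, AI Skill Hub rates AIfred Intelligent Multi-Agent Assistant as "Recommended". The review cites feature completeness, community activity, and ease of use, gives the tool an AI score of 7.2, and considers it suitable for users with some technical background.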

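📚 In-depth analysis

AIfred Intelligent Multi-Agent Assistant is a Python-based open-source tool with 32 stars on GitHub. It covers multi-agent, workflow orchestration, chain-of-thought, and self-hosted use cases. The main advantage of an open-source tool is that the code is fully visible: you can audit it for security and adapt or extend it to your own needs.

**Why use an open-source tool instead of a commercial SaaS?**
For individual developers and privacy-conscious users, a locally deployed open-source tool keeps data on the local machine and outside the data policies of third-party providers. Open-source tools also usually have no usage caps or monthly fees, so for high-frequency use the total cost of ownership (TCO) can be far lower than that of a subscription product.

**Installation and environment setup**
AIfred Intelligent Multi-Agent Assistant depends on a Python runtime (the README requires Python 3.10+). Use pyenv to manage Python versions and avoid polluting the global environment. New users should first create a virtual environment (python -m venv venv && source venv/bin/activate) and then install the dependencies. If something goes wrong, delete the virtual environment and start over without affecting the system.

**Community and maintenance**
GitHub Issues and Discussions are the quickest way to get help. Before asking, search the closed issues, since common questions may already be answered there. When reporting a bug, include the output of pip list, the full error stack trace, and a minimal reproducible example so that maintainers can respond faster. AI Skill Hub will keep tracking releases of AIfred Intelligent Multi-Agent Assistant and report important feature changes.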

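📋 Tool overview

AIfred Intelligent Multi-Agent Assistant is an open-source tool written in Python that focuses on multi-agent collaboration, workflow orchestration, and chain-of-thought reasoning. It is a small GitHub project that is updated as needed. The code is open to audit, and local deployment keeps data private. It can be used on its own or integrated into a larger workflow.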

Field	Value
GitHub Stars	⭐ 32
Forks	2
Language	Python
Supported platforms	Windows / macOS / Linux
Maintenance status	Lightweight project, updated as needed
License	NOASSERTION
AI score	7.2
Tool type	AI tool

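📖 Documentation

The following content was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original source" at the bottom.

📌 Key features

Free and open source, with local deployment that keeps data under your control
Developed in the open on GitHub and updated as needed
Documentation and usage examples are provided in the README
Configurable to fit different environments
Can be integrated into an existing stack as a base component or extended through custom development

🎯 Main use cases

Run locally to protect data privacy and meet compliance requirements
Integrate into existing systems to extend their capabilities
Use as an open-source base for commercial derivative work (check the license first, since GitHub reports NOASSERTION)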

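The installation commands below were generated automatically from the project's language and type and have not been checked against the repository. The official README is authoritative.

Installation commands

# Option 1: install with pip (assumes a PyPI package named aifred-intelligence exists; not confirmed by the README)
pip install aifred-intelligence

# Option 2: install into a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install aifred-intelligence

# Option 3: install from source (latest code)
git clone https://github.com/Peuqui/AIfred-Intelligence
cd AIfred-Intelligence
pip install -e .

# Verify the installation (generated module name; the README's code structure lists the package directory as aifred/)
python -c "import aifred_intelligence; print('installation OK')"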

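📋 Installation steps

Open the GitHub repository page
Install the dependencies as described in the README
Complete the initial configuration for your system environment
Start with the official examples or documentation
If you run into problems, search the GitHub Issues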

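The usage examples below were compiled by AI Skill Hub as generic templates. The command name, flags, and Python function shown are not confirmed by the README excerpt on this page.

Common commands / code examples

# Command-line usage
aifred-intelligence --help

# Basic usage
aifred-intelligence input_file -o output_file

# Calling from Python code
import aifred_intelligence

# Example
result = aifred_intelligence.process("input")
print(result)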

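The configuration example below was generated from typical usage patterns. Adjust the parameters according to the official documentation.

Configuration example

# Example aifred-intelligence configuration file (config.yml)
app:
  name: "aifred-intelligence"
  debug: false
  log_level: "INFO"

# Specify the configuration file at run time
aifred-intelligence --config config.yml

# Or configure through environment variables
export AIFRED_INTELLIGENCE_API_KEY="your-key"
export AIFRED_INTELLIGENCE_OUTPUT_DIR="./output"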

📑 README deep dive (documentation completeness: 82/100)

The following content was parsed directly from the GitHub README, preserving code blocks, tables, and list structure.

Introduction


---

📊 LLM Calls Overview

Mode	Min LLM Calls	Max LLM Calls	Typical Duration
Own Knowledge	1	1	5-30s
Automatik (Cache Hit)	0	0	<1s
Automatik (Direct Answer)	2	3	5-35s
Automatik (Web Research)	4	5	15-60s
Quick Web Search	3	4	10-40s
Deep Web Search	3	4	15-60s

---

✨ Features

🧠 Autonomous Capabilities (Function Calling / Tool Use)

The LLM autonomously decides which tools to use — OpenAI-compatible tool infrastructure with plugin system:

Message Hub — AIfred as Communication Central: AIfred monitors external channels and processes incoming messages autonomously. Runs headless — no browser needed. Channel plugins listen in the background, the LLM processes and replies via Discord/Email independently. The web UI is only needed for initial setup (credentials, plugin toggles) and optional monitoring. Unified plugin system: drop a .py file into plugins/channels/ or plugins/tools/ — auto-discovered, no code changes needed. Built-in channels: E-Mail Monitor (IMAP IDLE push-based + SMTP auto-reply), Discord (bot with channel + DM support, /clear command). Plugin Manager UI modal to enable/disable any plugin at runtime (moves files to disabled/). Pipeline: Channel listener → Envelope normalization → SQLite routing table → AIfred engine call (with full toolkit incl. web research, calendar check) → Auto-reply (optional, per-channel toggle). Agent routing: address Sokrates or Salomo by name. Note: Hub messages are processed without browser State — progress bars, live streaming and sources HTML are not available for Hub-processed messages; this is by design, not a limitation. See Architecture & Setup
Email Integration: Read, search, and send emails via IMAP/SMTP. Sending requires explicit user confirmation (draft → review → confirm). Credentials via .env or UI modal
EPIM Database Integration: Full CRUD access to the EssentialPIM Firebird 2.5 database — the LLM autonomously searches, creates, updates and deletes calendar events, contacts, notes, todos and password entries. Automatic name-to-ID resolution, anti-hallucination guardrails, 7-day date reference
Workspace (Files & Documents): Upload documents (PDF, Word, Excel, PowerPoint, LibreOffice, TXT, MD, CSV), automatic chunking and embedding in ChromaDB via BGE-M3 (8192-token context, 1024-dim, multilingual). Token-accurate chunking with the local Qwen3 tokenizer. LLM can autonomously browse, read (PDFs page-by-page), write, edit, rename and delete files on disk — then index them into the vector database for semantic search with folder filter (search_documents(query=…, folder="bibel/Schlachter")) and chunk-neighbor retrieval (each hit returns its immediate neighbor chunks for full surrounding context). Document Manager UI with preview, bulk-folder index (one click for an entire tree), live file count per folder, orphan cleanup (find indexed entries whose source file is gone) and toast-based feedback for terminal status messages
Sandboxed Code Execution: LLM writes and runs Python code in isolated subprocess. Supports numpy, pandas, matplotlib, plotly, seaborn, scipy, sklearn. Interactive HTML/JS visualizations (Plotly 3D, Canvas games, simulations) directly in chat
Agent Long-Term Memory: Per-agent persistent memory via ChromaDB (BGE-M3 embeddings) — agents autonomously store insights, combined recall (10 recent + semantic search), session pinning. Memory Browser for inspection and cleanup. Incognito mode (🔒)
Tool-Output Token Cap: Single tool result is capped to keep system + history + memory + tool_result ≤ 75% of the active model's context window — guarantees the model has 25% headroom for its answer. JSON-aware truncation: result-list responses are shortened from the end (with _truncated marker) so the model still sees structured data
Automatic Web Research: AI decides autonomously when research is needed. Multi-API (SearXNG primary, Tavily + Brave as fallback) with automatic scraping and LLM-based URL ranking. Semantic vector cache via ChromaDB with volatility-aware reuse threshold (PERMANENT 0.20 / MONTHLY 0.15 / WEEKLY 0.10 / DAILY 0.05) — stable knowledge tolerates wider matches, news-class topics stay tight to avoid stale facts
Additional tools: calculate (math), web_fetch (URL extraction), store_memory (memory)
Full plugin overview: Available Plugins
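
The volatility-aware reuse threshold described under Automatic Web Research can be pictured as a table lookup followed by a distance comparison. The sketch below is an illustration, not AIfred's code: the function name `is_cache_hit` is hypothetical, and it assumes that a lower ChromaDB distance means a closer match and that unknown volatility classes fall back to the strictest threshold. Only the four threshold values come from the README.

```python
# Reuse thresholds per volatility class (values quoted in the README).
REUSE_THRESHOLDS = {
    "PERMANENT": 0.20,
    "MONTHLY": 0.15,
    "WEEKLY": 0.10,
    "DAILY": 0.05,
}


def is_cache_hit(distance: float, volatility: str) -> bool:
    """Return True if a cached entry is close enough to be reused.

    Assumptions (not confirmed by the README): lower distance means a
    closer semantic match, and an unknown volatility class is treated
    like DAILY, the strictest class.
    """
    threshold = REUSE_THRESHOLDS.get(volatility.upper(), REUSE_THRESHOLDS["DAILY"])
    return distance <= threshold
```

With these values, a cached answer at distance 0.12 would be reused for a PERMANENT topic but rejected for a WEEKLY one, which matches the README's point that stable knowledge tolerates wider matches.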

🔧 Technical Highlights

Reflex Framework: React frontend generated from Python
WebSocket Streaming: Real-time updates without polling
Adaptive Temperature: AI selects temperature based on question type
Token Management: Dynamic context window calculation
VRAM-Aware Context: Automatic context sizing based on available GPU memory
Debug Console: Comprehensive logging and monitoring
ChromaDB Server Mode: Thread-safe vector DB via Docker (0.0 distance for exact matches)
GPU Detection: Automatic detection and warnings for incompatible backend-GPU combinations (docs/GPU_COMPATIBILITY.md)
Context Calibration: Intelligent per-model calibration for Ollama and llama.cpp
  Ollama: Binary search with automatic VRAM/Hybrid mode detection (512 token precision)
    Hybrid mode for CPU+GPU offload (MoE vs Dense detection, 3 GB RAM reserve)
    Auto-Hybrid threshold: VRAM-only < 16k tokens → switch to Hybrid
  llama.cpp (3-phase calibration for multi-GPU setups):
    Phase 1 (GPU-only): Binary search on -c with ngl=99, stops llama-swap, tests on temp port
      KV fallback chain: f16 → q8_0 (if < native context) → q4_0 (last resort, only if q8_0 < 32K)
      Small model shortcut: models with native_context ≤ 8192 are tested directly (no binary search)
      flash-attn auto-detection: startup failure → automatic retry without --flash-attn, updates llama-swap YAML on success
    Phase 2 (Speed variant): Min-GPU strategy — calculates minimum GPUs needed for model weights, fewer GPU boundaries = less transfer overhead = faster inference (tradeoff: reduced max context). Own KV chain (f16 → q8_0), independent from Phase 1. Creates a separate model-speed entry in llama-swap YAML with its own KV quant
    Phase 3 (Hybrid fallback): If Phase 1 < 32K → NGL reduction to free VRAM for KV-cache. Inherits KV quantization from Phase 1
    Startup errors (unknown architecture, wrong CUDA version) are logged and never written as false calibration data
  Results cached in unified data/model_vram_cache.json
llama-swap Autoscan: Automatic model discovery on service start (scripts/llama-swap-autoscan.py) — zero manual YAML editing required
  Scans Ollama manifests → creates descriptive symlinks in ~/models/ (e.g., sha256-6335adf... → Qwen3-14B-Q8_0.gguf)
  Scans HuggingFace cache (~/.cache/huggingface/hub/) → creates symlinks for downloaded GGUFs
  VL models (with matching mmproj-*.gguf) automatically get --mmproj argument
  Compatibility test: each new model is briefly started with llama-server — unsupported architectures (e.g. deepseekocr) are detected and excluded before being added to the config
  Skip list (~/.config/llama-swap/autoscan-skip.json): incompatible models are remembered, no re-test on every restart. Delete entry to re-test after a llama.cpp update
  Detects new GGUFs and adds llama-swap config entries with optimal defaults (-ngl 99, --flash-attn on, -ctk q8_0, etc.)
  Automatically maintains groups.main.members in the YAML — all models share VRAM exclusivity without manual editing
  Creates preliminary VRAM cache entries (calibration via UI adds vram_used_mb measured while the model is loaded)
  Creates config.yaml from scratch if not present — no manual bootstrap required
  Runs as ExecStartPre in systemd service → ollama pull model or hf download is all it takes to add a model
Ctx/Speed Switch: Per-agent toggle between two pre-calibrated variants (Ctx = max context, ⚡ Speed = 32K + aggressive GPU split)
RoPE 2x Extended Context: Optional extended calibration up to 2x native context limit
Parallel Web Search: 2-3 optimized queries distributed in parallel across APIs (Tavily, Brave, SearXNG), automatic URL deduplication, optional self-hosted SearXNG
Parallel Scraping: ThreadPoolExecutor scrapes 3-7 URLs simultaneously, first successful results are used
Failed Sources Display: Shows unavailable URLs with error reasons (Cloudflare, 404, Timeout) - persisted in Vector Cache for cache hits
PDF Support: Direct extraction from PDF documents (AWMF guidelines, PubMed PDFs) via PyMuPDF with browser-like User-Agent
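
The binary-search idea behind Context Calibration can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's calibration code: `calibrate_context` and the `fits` probe are hypothetical names, and `fits` stands in for whatever actually loads the model at a given context size and reports success. The sketch also assumes the probe is monotonic (if a context size loads, every smaller one does too). The 512-token step mirrors the precision quoted for Ollama.

```python
from typing import Callable


def calibrate_context(
    fits: Callable[[int], bool],
    native_context: int,
    step: int = 512,
) -> int:
    """Return the largest multiple of `step` (<= native_context) that fits.

    `fits` is a hypothetical probe that tries to load the model with the
    given context size. Returns 0 if not even one step fits.
    """
    lo, hi = 0, native_context // step  # search over multiples of `step`
    while lo < hi:
        mid = (lo + hi + 1) // 2  # upper midpoint, so the range always shrinks
        if fits(mid * step):
            lo = mid  # mid fits: the answer is mid or larger
        else:
            hi = mid - 1  # mid does not fit: the answer is smaller
    return lo * step
```

Searching over multiples of the step keeps the number of probes logarithmic in the native context, which matters when each probe means starting a model server.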

Key Features

Full Remote Control: Control all AIfred settings from anywhere
Live Browser Sync: API changes automatically appear in the browser UI via session mtime-watching
Message Injection: Queue messages that browser processes with full pipeline
Session Management: Access and manage multiple browser sessions
Per-session Config: Agent, discussion mode, and research mode stored per session (not global)
OpenAPI Documentation: Interactive Swagger UI at /docs

Prerequisites

Python 3.10+
LLM Backend (choose one):
  llama.cpp via llama-swap (GGUF models) - best performance, full GPU control (setup guide)
  Ollama (easy, GGUF models) - recommended for getting started
  vLLM (fast, AWQ models) - best performance for AWQ (requires Compute Capability 7.5+)
  TabbyAPI (ExLlamaV2/V3, EXL2 models) - experimental

Zero-Config Model Management (llama.cpp backend): After the initia

🚀 Installation

Example Usage


Use Cases

Cloud Control: Operate AIfred from anywhere via HTTPS/API
Home Automation: Integration with Home Assistant, Node-RED, etc.
Voice Assistants: Alexa/Google Home can send AIfred queries
Batch Processing: Automated queries via scripts
Mobile Apps: Custom apps can use the API
Remote Maintenance: Test and monitor AIfred on headless systems

---

Per-Session Config SSOT

Agent selection, discussion mode, and research mode are persisted per session, not globally. Every chat session has its own config block stored in its session file:

{
  "data": {
    "config": {
      "active_agent": "aifred",
      "multi_agent_mode": "standard",
      "symposion_agents": [],
      "research_mode": "automatik"
    }
  }
}

Clean default on new session: every new chat starts with aifred + standard + automatik — never inheriting from the previous session.
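
Reading the config block with the documented defaults might look like the sketch below. It is an illustration only: the helper name `load_session_config` is hypothetical, and it assumes the session file is plain JSON with the `data.config` layout shown above. The default values are the ones the README states for a new session.

```python
import json

# Defaults for a new session, as stated in the README.
DEFAULT_CONFIG = {
    "active_agent": "aifred",
    "multi_agent_mode": "standard",
    "symposion_agents": [],
    "research_mode": "automatik",
}


def load_session_config(session_json: str) -> dict:
    """Return the per-session config, filling missing keys with defaults."""
    session = json.loads(session_json)
    stored = session.get("data", {}).get("config", {})
    config = {**DEFAULT_CONFIG, **stored}  # stored values override defaults
    # Copy the list so callers cannot mutate the shared default.
    config["symposion_agents"] = list(config["symposion_agents"])
    return config
```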

Multi-tab / cross-channel sync via session file mtime-watching: whenever any writer (browser tab, API, email channel, voice puck) modifies the session file, all other tabs that have this session open detect the change within 1 second and reload — without polling, without events, without race conditions. This replaces the legacy update_flag mechanism entirely.

Get current global settings

curl http://localhost:8002/api/settings

Change model (global setting)

curl -X PATCH http://localhost:8002/api/settings \
  -H "Content-Type: application/json" \
  -d '{"aifred_model": "qwen3:14b"}'
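
The same settings call can be made from Python with only the standard library. The sketch below mirrors the curl example. The helper names `build_settings_patch` and `send` are hypothetical, the host and port are taken from the curl examples, and the assumption that the endpoint returns a JSON body is not confirmed by the README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # host and port from the curl examples


def build_settings_patch(changes: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PATCH /api/settings request."""
    return urllib.request.Request(
        url=f"{base_url}/api/settings",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running AIfred instance):
#   send(build_settings_patch({"aifred_model": "qwen3:14b"}))
```

Separating request construction from sending makes the payload easy to inspect or log before anything goes over the network.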

Switch a session to Tribunal mode (per-session config)

🎤 Voice & Vision Interface

Voice Interface: STT via Whisper Docker container (dual-device: CPU permanent + GPU with TTL auto-unload, Web-UI for model/settings management). TTS engines: Edge TTS, XTTS v2 Voice Cloning, MOSS-TTS 1.7B, DashScope Qwen3-TTS Cloud Streaming, Piper, espeak. Per-agent TTS configuration (voice, speed, pitch, on/off per agent), gapless realtime audio playback
FreeEcho.2 Voice Terminal: Dedicated voice interface for Echo Dot 2 hardware (custom firmware). Wake word detection, immediate browser flush (user question visible within 500ms after STT), deferred TTS container management (parallel GPU cleanup during LLM inference)
Vision/OCR: Image analysis with multimodal LLMs (DeepSeek-OCR, Qwen3-VL, Ministral-3), VL Follow-Up, interactive image crop, 2-model architecture (Vision-LLM + Main-LLM)

🔊 Voice Interface (TTS Engines)

AIfred supports 6 TTS engines with different trade-offs between quality, latency, and resource usage. Each engine was chosen for a specific use case after extensive experimentation.

Engine	Type	Streaming	Quality	Latency	Resources
XTTS v2	Local Docker	Sentence-level	High (voice cloning)	~1-2s/sentence	~2 GB VRAM
MOSS-TTS 1.7B	Local Docker	None (batch after bubble)	Excellent (best open-source)	~18-22s/sentence	~11.5 GB VRAM
DashScope Qwen3-TTS	Cloud API	Sentence-level	High (voice cloning)	~1-2s/sentence	API key only
Piper TTS	Local	Sentence-level	Medium	<100ms	CPU only
eSpeak	Local	Sentence-level	Low (robotic)	<50ms	CPU only
Edge TTS	Cloud	Sentence-level	Good	~200ms	Internet only

Why multiple engines?

The search for the perfect TTS experience led through several iterations:

Edge TTS was the first engine -- free, fast, decent quality, but limited voices and no voice cloning.
XTTS v2 added high-quality voice cloning with multilingual support. Sentence-level streaming works well: while the LLM generates the next sentence, XTTS synthesizes the current one. However, it requires a Docker container and ~2 GB VRAM.
MOSS-TTS 1.7B delivers the best speech quality of all open-source models (SIM 73-79%), but at a cost: ~18-22 seconds per sentence makes it unsuitable for streaming. Audio is generated as a batch after the complete response, which is acceptable for short answers but frustrating for longer ones.
DashScope Qwen3-TTS adds cloud-based voice cloning via Alibaba Cloud's API. By default it uses sentence-level streaming (same as XTTS) for better intonation. A realtime WebSocket mode (word-level chunks, ~200ms first audio) is also implemented but disabled by default -- it trades slightly worse prosody for faster first-audio. To re-enable it, uncomment the WebSocket block in state.py:_init_streaming_tts() (see code comment there).
Piper TTS and eSpeak serve as lightweight offline alternatives that work without Docker, GPU, or internet connection.

Playback Architecture:
Visible HTML5 <audio> widget with blob-URL prefetching (next 2 chunks pre-fetched into memory)
preservesPitch: true for speed adjustments without chipmunk effect
Per-agent voice/pitch/speed settings (AIfred, Sokrates, Salomo can each have distinct voices)
SSE-based audio streaming from backend to browser (persistent connection, 15s keepalive)

☁️ Cloud API Support

AIfred supports cloud LLM providers via OpenAI-compatible APIs:

Provider	Models	API Key Variable
Qwen (DashScope)	qwen-plus, qwen-turbo, qwen-max	`DASHSCOPE_API_KEY`
DeepSeek	deepseek-chat, deepseek-reasoner	`DEEPSEEK_API_KEY`
Claude (Anthropic)	claude-3.5-sonnet, claude-3-opus	`ANTHROPIC_API_KEY`
Kimi (Moonshot)	moonshot-v1-8k, moonshot-v1-32k	`MOONSHOT_API_KEY`

Features:
Dynamic model fetching (models loaded from provider's /models endpoint)
Token usage tracking (prompt + completion tokens displayed in debug console)
Per-provider model memory (each provider remembers its last used model)
Vision model filtering (excludes -vl variants from main LLM dropdown)
Streaming support with real-time output

Note: Cloud APIs don't require local GPU resources, which makes them ideal for:
Testing larger models without hardware investment
Mobile/laptop usage without dedicated GPU
Comparing cloud vs local model quality

📁 Code Structure Reference

Core Entry Points:
aifred/state.py - Main state management, send_message()

Automatik Mode:
aifred/lib/conversation_handler.py - Decision logic, RAG context

Web Research Pipeline:
aifred/lib/research/orchestrator.py - Top-level orchestration (incl. URL ranking)
aifred/lib/research/cache_handler.py - Session cache
aifred/lib/research/query_processor.py - Query optimization + search
aifred/lib/research/url_ranker.py - LLM-based URL relevance ranking (NEW)
aifred/lib/research/scraper_orchestrator.py - Parallel scraping
aifred/lib/research/context_builder.py - Context building + LLM

Document RAG Pipeline:
aifred/lib/document_store.py - ChromaDB Documents collection — token-accurate chunking (Qwen3 tokenizer, char fallback), delete + upsert for clean re-indexing, dual embedding functions (index/query mode), folder filter + chunk-neighbor retrieval in search()
aifred/lib/file_manager.py - Single source of truth for file-system + ChromaDB operations (used by Document UI and Workspace plugin): list/create/delete/rename/index/deindex/search/list_orphaned

Supporting Modules:
aifred/lib/vector_cache.py - ChromaDB semantic cache for web research, includes OllamaEmbeddingFunction with mode-switch (index→GPU+warm, query→CPU)
aifred/lib/agent_memory.py - Per-agent ChromaDB memory collections
aifred/lib/tool_output_cap.py - Token budget for tool results (75% input ratio, JSON-aware truncation, ContextVar-based)
aifred/lib/debug_format.py - Tool-call/result formatting for the debug panel (key=value rendering, agent prefix, token count)
aifred/lib/intent_detector.py - Temperature selection
aifred/lib/agent_tools.py - Web search, scraping, context building
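
The token budget handled by tool_output_cap.py can be illustrated with a short sketch. This is not the module's code. It assumes a crude four-characters-per-token estimate in place of a real tokenizer, and a hypothetical payload shape with a `results` list and a boolean `_truncated` flag. Only the 75% ratio, the end-first truncation, and the `_truncated` marker name come from the README.

```python
import json

INPUT_RATIO = 0.75  # share of the context window available for input (from the README)


def tool_result_budget(context_window: int, used_tokens: int) -> int:
    """Tokens left for one tool result; used_tokens = system + history + memory."""
    return max(0, int(context_window * INPUT_RATIO) - used_tokens)


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return (len(text) + 3) // 4


def cap_result_list(results: list, budget: int) -> str:
    """Drop items from the end of a result list until the JSON fits the budget.

    The payload shape is an assumption made for this sketch.
    """
    kept = list(results)
    payload = {"results": kept}
    while kept and estimate_tokens(json.dumps(payload)) > budget:
        kept.pop()  # shorten from the end, keeping the leading results
        payload = {"results": kept, "_truncated": True}
    return json.dumps(payload)
```

Because whole items are removed rather than characters, the output stays valid JSON and the model still sees structured data.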

📝 Automatik-LLM Prompts Reference

The Automatik-LLM uses dedicated prompts in prompts/{de,en}/automatik/ for various decisions:

Prompt	Language	When Called	Purpose
`intent_detection.txt`	EN only	Pre-processing	Determine query intent (FACTUAL/MIXED/CREATIVE) and addressee
`research_decision.txt`	DE + EN	Phase 3	Decide if web research needed + generate queries
`followup_intent_detection.txt`	DE + EN	Cache follow-up	Detect if user wants more details from cache
`url_ranking.txt`	EN only	Quick-Search Phase 2.5	Rank URLs by relevance (output: numeric indices)

Language Rules:
EN only: Output is structured/numeric (parseable), language doesn't affect result
DE + EN: Output depends on user's language or requires semantic understanding in that language

Prompt Directory Structure:

prompts/
├── de/
│   └── automatik/
│       ├── research_decision.txt      # German queries for German users
│       └── followup_intent_detection.txt
└── en/
    └── automatik/
        ├── intent_detection.txt       # Universal intent detection
        ├── research_decision.txt      # English queries (Query 1 always EN)
        ├── followup_intent_detection.txt
        └── url_ranking.txt            # Numeric output (indices)
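
The language rules and directory layout suggest a simple lookup when resolving a prompt file. The sketch below is an assumption-laden illustration, not AIfred's loader: the function name `prompt_path` is hypothetical, and falling back to English for any language other than German is a guess. The file names and the EN-only set come from the table above.

```python
from pathlib import Path

# Prompts that exist only under en/, per the table above.
EN_ONLY = {"intent_detection.txt", "url_ranking.txt"}


def prompt_path(name: str, user_lang: str, root: Path = Path("prompts")) -> Path:
    """Resolve an Automatik prompt file for the given user language.

    EN-only prompts always come from en/. Other prompts use de/ for
    German users and en/ otherwise (the fallback is an assumption).
    """
    lang = "en" if name in EN_ONLY or user_lang != "de" else "de"
    return root / lang / "automatik" / name
```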

---

🌐 REST API (Browser Remote Control)

AIfred provides a complete REST API for programmatic control - enabling remote operation via Cloud, automation systems, and third-party integrations.

API Endpoints

The API enables pure remote control - messages are injected into browser sessions, the browser performs the full processing (Intent Detection, Multi-Agent, Research, etc.). The user sees everything live in the browser.

Endpoint	Method	Description
`/api/health`	GET	Health check with backend status
`/api/settings`	GET	Retrieve global settings
`/api/settings`	PATCH	Update global settings (backend, models, TTS, …)
`/api/session/config`	POST	Update per-session config (agent, mode, research mode)
`/api/models`	GET	List available models
`/api/chat/inject`	POST	Inject message into browser session
`/api/chat/status`	GET	Check if inference is running (is_generating, message_count)
`/api/chat/history`	GET	Get chat history
`/api/chat/clear`	POST	Clear chat history
`/api/sessions`	GET	List all browser sessions
`/api/system/restart-ollama`	POST	Restart Ollama
`/api/system/restart-aifred`	POST	Restart AIfred
`/api/calibrate`	POST	Start context calibration

Global vs per-session: /api/settings covers truly global settings (backend, models, TTS voices, language, sampling). Anything that belongs to a specific conversation — agent, multi-agent mode, research mode, symposion participants — goes through /api/session/config and is stored in the session file as SSOT.

🔄 Research Mode Workflows

AIfred offers 4 different research modes, each using different strategies depending on requirements. Here's the detailed workflow for each mode:

Inject a message (browser runs full pipeline)

curl -X POST http://localhost:8002/api/chat/inject \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Python?", "device_id": "abc123..."}'
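
A script that injects a message usually needs to wait until the browser has finished generating. The sketch below builds the inject payload and polls /api/chat/status until `is_generating` is false. It is a sketch under assumptions: the helper names are hypothetical, the status endpoint is assumed to return JSON containing the `is_generating` and `message_count` fields named in the endpoint table, and whether it needs a session or device parameter is not stated in the README.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8002"  # host and port from the curl examples


def inject_payload(message: str, device_id: str) -> bytes:
    """Encode the JSON body for POST /api/chat/inject."""
    return json.dumps({"message": message, "device_id": device_id}).encode("utf-8")


def fetch_status(base_url: str = BASE_URL) -> dict:
    """GET /api/chat/status and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(f"{base_url}/api/chat/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_idle(
    get_status: Callable[[], dict] = fetch_status,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until is_generating is false; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if not status.get("is_generating", False):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("inference still running after timeout")
        sleep(interval)
```

Passing `get_status` and `sleep` as parameters lets the polling loop be exercised with fake responses, without a running AIfred instance.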

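🎯 aiskill88 AI review: grade B (2026-05-21)

An innovative multi-agent collaboration framework with a sound design for chain-of-thought and debate modes. The ecosystem is still immature, so keep an eye on long-term maintenance.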

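📚 Practical guide (long-tail questions)

Who it is for

Agent developers building multi-agent collaboration systems
Teams building enterprise knowledge bases or RAG retrieval applications
Document automation that extracts text from images and PDFs
Developers building voice AI products

Best practices

For production deployments, prefer Docker Compose to isolate dependencies, and mount volumes to persist data
For local deployment, prefer quantized GGUF models to save VRAM while keeping responses fast
For RAG in general, a chunk size of 256-512 tokens and a vector store such as pgvector or Qdrant are common choices; AIfred itself indexes documents in ChromaDB with BGE-M3 embeddings
For agent tasks, dry-run the tool-call chain first, then enable autonomous execution

Common mistakes

Committing API keys to the git repository (use .env and add it to .gitignore)
Containers cannot reach localhost on the host; use host.docker.internal
Using different embedding models for indexing and querying, which breaks retrieval
Running out of VRAM (OOM); reduce the context size first or switch to a smaller quantized model
Python dependency conflicts; isolate the environment with venv or uv

Deployment options

Docker: start with docker compose up if the repository provides a compose file (the README runs ChromaDB, Whisper, and some TTS engines in Docker)
CLI: install the Python dependencies with pip and run from the command line
Local deployment: 8 GB of RAM minimum for CPU-only use; a GPU with 16 GB+ of VRAM is recommended
Cloud hosting: can in principle be deployed to a PaaS such as Vercel, Railway, or Fly.io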

Original name	`AIfred-Intelligence`
Original description	Open-source AI workflow: 🤵 AIfred-Intelligence — self-hosted Multi-Agent Assistant with Debate Modes (Sy… ⭐32 · Python
Topics	`Multi-agent · Workflow orchestration · Chain of thought · Self-hosted · Python`
GitHub	https://github.com/Peuqui/AIfred-Intelligence
License	NOASSERTION
Language	Python