LlamaFactory Agent工作流 是 AI Skill Hub 本期精选AI工具之一。在 GitHub 上收获超过 71.2k 颗 Star,综合评分 9.0 分,整体质量较高。我们强烈推荐将其纳入你的 AI 工具库,帮助提升工作效率。
LlamaFactory Agent工作流 是一款基于 Python 开发的开源工具,专注于 模型微调、大语言模型、工作流自动化 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
LlamaFactory Agent工作流 是一款基于 Python 开发的开源工具,专注于 模型微调、大语言模型、工作流自动化 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
# 方式一:pip 安装(推荐)
pip install llamafactory
# 方式二:虚拟环境安装(推荐生产环境)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install llamafactory
# 方式三:从源码安装(获取最新功能)
git clone https://github.com/hiyouga/LlamaFactory
cd LlamaFactory
pip install -e .
# 验证安装
python -c "import llamafactory; print('安装成功')"
# 命令行使用
llamafactory --help
# 基本用法
llamafactory input_file -o output_file
# Python 代码中调用
import llamafactory
# 示例
result = llamafactory.process("input")
print(result)
# llamafactory 配置文件示例(config.yml) app: name: "llamafactory" debug: false log_level: "INFO" # 运行时指定配置文件 llamafactory --config config.yml # 或通过环境变量配置 export LLAMAFACTORY_API_KEY="your-key" export LLAMAFACTORY_OUTPUT_DIR="./output"
| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.11 | >=3.11 |
| torch | 2.0.0 | 2.6.0 |
| torchvision | 0.15.0 | 0.21.0 |
| transformers | 4.49.0 | 4.50.0 |
| datasets | 2.16.0 | 3.2.0 |
| accelerate | 0.34.0 | 1.2.1 |
| peft | 0.14.0 | 0.15.1 |
| trl | 0.8.6 | 0.9.6 |
| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.16.4 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.3 | 0.8.2 |
| flash-attn | 2.5.6 | 2.7.2 |
\* estimated
| Method | Bits | 7B | 14B | 30B | 70B | xB |
|---|---|---|---|---|---|---|
Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18xGB |
Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8xGB |
| Freeze/LoRA/GaLore/APOLLO/BAdam/OFT | 16 | 16GB | 32GB | 64GB | 160GB | 2xGB |
| QLoRA / QOFT | 8 | 10GB | 20GB | 40GB | 80GB | xGB |
| QLoRA / QOFT | 4 | 6GB | 12GB | 24GB | 48GB | x/2GB |
| QLoRA / QOFT | 2 | 4GB | 8GB | 16GB | 24GB | x/4GB |
pip install -r requirements-dev.txt
apt-get install -y build-essential cmake
[!IMPORTANT] Installation is mandatory.
git clone --depth 1 https://github.com/hiyouga/LlamaFactory.git
cd LlamaFactory
pip install -e .
pip install -r requirements/metrics.txt
Optional dependencies available: metrics, deepspeed. Install with: pip install -e . && pip install -r requirements/metrics.txt -r requirements/deepspeed.txt
Additional dependencies for specific features are available in examples/requirements/.
docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
This image is built on Ubuntu 22.04 (x86\_64), CUDA 12.4, Python 3.11, PyTorch 2.6.0, and Flash-attn 2.7.4.
Find the pre-built images: https://hub.docker.com/r/hiyouga/llamafactory/tags
Please refer to build docker to build the image yourself.
<details><summary>Setting up a virtual environment with <b>uv</b></summary>
Create an isolated Python environment with uv:
uv run llamafactory-cli webui
</details>
<details><summary>For Windows users</summary>
You need to manually install the GPU version of PyTorch on the Windows platform. Please refer to the official website and the following command to install PyTorch with CUDA support:
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
python -c "import torch; print(torch.cuda.is_available())"
If you see True then you have successfully installed PyTorch with CUDA support.
Try dataloader_num_workers: 0 if you encounter Can't pickle local object error.
If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you need to install a pre-built version of bitsandbytes library, which supports CUDA 11.1 to 12.2, please select the appropriate release version based on your CUDA version.
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
To enable FlashAttention-2 on the Windows platform, please use the script from flash-attention-windows-wheel to compile and install it by yourself.
</details>
<details><summary>For Ascend NPU users</summary>
To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher: pip install -r requirements/npu.txt. Additionally, you need to install the Ascend CANN Toolkit and Kernels. Please follow the installation tutorial.
You can also download the pre-built Docker images:
```bash
docker pull hiyouga/llamafactory:latest-npu-a2 docker pull hiyouga/llamafactory:latest-npu-a3
cmake -DCOMPUTE_BACKEND=npu -S . make pip install .
2. Install transformers from the main branch.
bash git clone -b main https://github.com/huggingface/transformers.git cd transformers pip install . ```
double_quantization: false in the configuration. You can refer to the example.</details>
For CUDA users:
cd docker/docker-cuda/
docker compose up -d
docker compose exec llamafactory bash
For Ascend NPU users:
cd docker/docker-npu/
docker compose up -d
docker compose exec llamafactory bash
For AMD ROCm users:
cd docker/docker-rocm/
docker compose up -d
docker compose exec llamafactory bash
<details><summary>Build without Docker Compose</summary>
For CUDA users:
docker build -f ./docker/docker-cuda/Dockerfile \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
docker run -dit --ipc=host --gpus=all \
-p 7860:7860 \
-p 8000:8000 \
--name llamafactory \
llamafactory:latest
docker exec -it llamafactory bash
For Ascend NPU users:
docker build -f ./docker/docker-npu/Dockerfile \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
docker run -dit --ipc=host \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-p 7860:7860 \
-p 8000:8000 \
--device /dev/davinci0 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
--name llamafactory \
llamafactory:latest
docker exec -it llamafactory bash
For AMD ROCm users:
docker build -f ./docker/docker-rocm/Dockerfile \
--build-arg PIP_INDEX=https://pypi.org/simple \
-t llamafactory:latest .
docker run -dit --ipc=host \
-p 7860:7860 \
-p 8000:8000 \
--device /dev/kfd \
--device /dev/dri \
--name llamafactory \
llamafactory:latest
docker exec -it llamafactory bash
</details>
<details><summary>Use Docker volumes</summary>
You can uncomment VOLUME [ "/root/.cache/huggingface", "/app/shared_data", "/app/output" ] in the Dockerfile to use data volumes.
When building the Docker image, use -v ./hf_cache:/root/.cache/huggingface argument to mount the local directory to the container. The following data volumes are available.
hf_cache: Utilize Hugging Face cache on the host machine.shared_data: The directionary to store datasets on the host machine.output: Set export dir to this location so that the merged result can be accessed directly on the host machine.</details>
API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
[!TIP] Visit this page for API document. Examples: Image understanding | Function calling
</div>
👋 Join our WeChat, NPU, Lab4AI, LLaMA Factory Online user group.
\ English | [中文 \]
Fine-tuning a large language model can be easy as...
https://github.com/user-attachments/assets/3991a3a8-4276-4d30-9cab-4cb0c4b9b99e
Start local training: - Please refer to usage
Start cloud training: - Colab (free): https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing - PAI-DSW (free trial): https://gallery.pai-ml.com/#/preview/deepLearning/nlp/llama_factory - LLaMA Factory Online: https://www.llamafactory.com.cn/?utm_source=LLaMA-Factory - Alaya NeW (cloud GPU deal): https://docs.alayanew.com/docs/documents/useGuide/LLaMAFactory/mutiple/?utm_source=LLaMA-Factory
Read technical notes: - Documentation (WIP): https://llamafactory.readthedocs.io/en/latest/ - Documentation (AMD GPU): https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/fine_tune/llama_factory_llama3.html - Official Blog: https://blog.llamafactory.net/en/ - Official Course: https://www.lab4ai.cn/course/detail?id=7c13e60f6137474eb40f6fd3983c0f46&utm_source=LLaMA-Factory
[!NOTE] Except for the above links, all other websites are unauthorized third-party websites. Please carefully use them.
Use the following 3 commands to run LoRA fine-tuning, inference and merging of the Qwen3-4B-Instruct model, respectively.
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
See examples/README.md for advanced usage (including distributed training).
[!TIP] Use llamafactory-cli help to show help information. Read FAQs first if you encounter any problems.
| Model | Model size | Template |
|---|---|---|
| [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
| [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
| [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
| [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
| [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
| [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
| [Hunyuan/Hunyuan1.5 (MT)](https://huggingface.co/tencent/) | 0.5B/1.8B/4B/7B/13B | hunyuan/hunyuan_small |
| [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
| [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
| [Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
| [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
| [Ling 2.0 (mini/flash)](https://huggingface.co/inclusionAI) | 16B/100B | bailing_v2 |
| [LFM 2.5 (VL)](https://huggingface.co/LiquidAI) | 1.2B/1.6B | lfm2/lfm2_vl |
| [Llama](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - |
| [Llama 2](https://huggingface.co/meta-llama) | 7B/13B/70B | llama2 |
| [Llama 3-3.3](https://huggingface.co/meta-llama) | 1B/3B/8B/70B | llama3 |
| [Llama 4](https://huggingface.co/meta-llama) | 109B/402B | llama4 |
| [Llama 3.2 Vision](https://huggingface.co/meta-llama) | 11B/90B | mllama |
| [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
| [MiniCPM 4](https://huggingface.co/openbmb) | 0.5B/8B | cpm4 |
| [MiniCPM-o/MiniCPM-V 4.5](https://huggingface.co/openbmb) | 8B/9B | minicpm_o/minicpm_v |
| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
| [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
| [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
| [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
| [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
| [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
| [Qwen2 (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
| [Qwen3 (MoE/Instruct/Thinking/Next)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
| [Qwen3.5](https://huggingface.co/Qwen) | 0.8B/2B/4B/9B/27B/35B/122B/397B | qwen3_5/qwen3_5_nothink |
| [Qwen3.6](https://huggingface.co/Qwen) | 27B/35B | qwen3_6 |
| [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
| [Qwen2.5-Omni](https://huggingface.co/Qwen) | 3B/7B | qwen2_omni |
| [Qwen3-Omni](https://huggingface.co/Qwen) | 30B | qwen3_omni |
| [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
| [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
| [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat 2-2.5](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
[!NOTE] For the "base" models, thetemplateargument can be chosen fromdefault,alpaca,vicunaetc. But make sure to use the corresponding template for the "instruct/chat" models. If the model has both reasoning and non-reasoning versions, please use the_nothinksuffix to distinguish between them. For example,qwen3andqwen3_nothink. Remember to use the SAME template in training and inference. \: You should install thetransformersfrom main branch and useDISABLE_VERSION_CHECK=1to skip version check. \\*: You need to install a specific version oftransformersto use the corresponding model.
Please refer to constants.py for a full list of models we supported.
You also can add a custom chat template to template.py.
LlamaFactory是业界领先的统一微调平台,技术深度与实用性兼备。支持广泛模型生态,ACL2024论文加持,社区活跃度高。是企业和研究者进行模型定制的首选工具。
AI Skill Hub 为第三方内容聚合平台,本页面信息基于公开数据整理,不对工具功能和质量作任何法律背书。
建议在沙箱或测试环境中充分验证后,再部署至生产环境,并做好必要的安全评估。
✅ Apache 2.0 — 宽松开源协议,可商用,需保留版权声明和 NOTICE 文件,含专利授权条款。
经综合评估,LlamaFactory Agent工作流 在AI工具赛道中表现稳健,质量优秀。如果你已有明确的使用需求,可以直接上手体验;如果还在评估阶段,建议对比同类工具后再做决策。
| 原始名称 | LlamaFactory |
| 原始描述 | 开源AI工作流:Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)。⭐71.2k · Python |
| Topics | 模型微调大语言模型工作流自动化开源框架多模态 |
| GitHub | https://github.com/hiyouga/LlamaFactory |
| License | Apache-2.0 |
| 语言 | Python |
收录时间:2026-05-13 · 更新时间:2026-05-16 · License:Apache-2.0 · AI Skill Hub 不对第三方内容的准确性作法律背书。