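AI Skill Hub highly recommends Whisper Speech Recognition Engine, a high-quality AI tool. It has more than 49.7k stars on GitHub and an AI composite score of 8.8, and it performs solidly among comparable tools. If you are looking for a reliable AI tool, it is worth a closer look.
A high-performance C/C++ implementation of OpenAI's Whisper model, optimized for offline speech-to-text. It supports multilingual recognition and has a small resource footprint, so it is well suited for integration into applications or deployment on edge devices.
Whisper Speech Recognition Engine is an open-source tool written in C++ that focuses on speech recognition and speech-to-text. As a GitHub open-source project, it has active community support and continuous releases. Its code is fully transparent and auditable, and it can be deployed locally to keep data private. It offers a stable, reliable solution for both personal use and integration into enterprise workflows.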
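# Clone the repository
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
# Read the installation instructions
cat README.md
# Install the dependencies described in the README, then the tool is ready to use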
# Show the help text
./build/bin/whisper-cli --help
# Basic usage
./build/bin/whisper-cli [options] <input.wav>
# For detailed usage, see the documentation:
# https://github.com/ggml-org/whisper.cpp
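# whisper.cpp configuration
# whisper-cli does not read a configuration file; options are passed as command-line flags, for example:
# -t N        number of threads
# -l LANG     spoken language ("auto" to detect it)
# -otxt       write the transcript to a .txt file
# -of FNAME   output file path, without extension
./build/bin/whisper-cli -m models/ggml-base.en.bin -t 4 -l en -otxt -of jfk -f samples/jfk.wav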

High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model.
Supported platforms:
The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library.
Because the implementation is so lightweight, it is easy to integrate into different platforms and applications. For example, here is a video of the model running fully offline, on-device, on an iPhone 13: whisper.objc
https://user-images.githubusercontent.com/1991296/197385372-962a6dea-bca1-4d50-bf96-1d8c27b98c81.mp4
You can also easily make your own offline voice assistant application: command
https://user-images.githubusercontent.com/1991296/204038393-2f846eae-c255-4099-a76d-5735c25c49da.mp4
On Apple Silicon, the inference runs fully on the GPU via Metal:
https://github.com/ggml-org/whisper.cpp/assets/1991296/c82e8f86-60dc-49f2-b048-d2fdbd6b5225
cmake -B build
cmake --build build -j --config Release
To build with BLAS acceleration instead:
cmake -B build -DGGML_BLAS=1
cmake --build build -j --config Release
./build/bin/whisper-cli [ .. etc .. ]
You can install pre-built binaries for whisper.cpp or build it from source using Conan. Use the following command:
conan install --requires="whisper-cpp/[*]" --build=missing
For detailed instructions on how to use Conan, please refer to the Conan documentation.
First clone the repository:
git clone https://github.com/ggml-org/whisper.cpp.git
Navigate into the directory:
cd whisper.cpp
Then, download one of the Whisper models converted in ggml format. For example:
sh ./models/download-ggml-model.sh base.en
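Now build the whisper-cli example and transcribe an audio file like this:
cmake -B build
cmake --build build -j --config Release
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav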
Approximate disk and memory requirements for each model size:

| Model | Disk | Memory |
|---|---|---|
| tiny | 75 MiB | ~273 MB |
| base | 142 MiB | ~388 MB |
| small | 466 MiB | ~852 MB |
| medium | 1.5 GiB | ~2.1 GB |
| large | 2.9 GiB | ~3.9 GB |
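As a rough aid for choosing a model size, the sketch below picks the largest model whose approximate memory figure from the table fits a given budget. The numbers are copied from the table and treated as MB. Real usage varies with build options, quantization and backend, so this is an illustration rather than a sizing tool. `pick_model` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Approximate memory use per model, taken from the table above (in MB).
struct ModelMem {
    std::string name;
    int mem_mb;
};

// Hypothetical helper: returns the largest model whose approximate memory
// requirement fits within budget_mb, or an empty string if none fits.
std::string pick_model(int budget_mb) {
    const std::vector<ModelMem> models = {
        {"tiny", 273}, {"base", 388}, {"small", 852},
        {"medium", 2100}, {"large", 3900},  // 2.1 GB and 3.9 GB expressed in MB
    };
    std::string best;
    for (const auto& m : models) {  // ordered from smallest to largest
        if (m.mem_mb <= budget_mb) best = m.name;
    }
    return best;
}
```

For example, with a budget of about 1000 MB this returns "small".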
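To run a quantized model, pass the quantized file with -m. The file used here is a Q5_0 version of base.en, which must be produced beforehand with the project's quantization tool:
./build/bin/whisper-cli -m models/ggml-base.en-q5_0.bin ./samples/gb0.wav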
This is a naive example of performing real-time inference on audio from your microphone. The stream tool samples the audio every half a second and runs the transcription continuously. More info is available in issue #10. You will need to have SDL2 installed for it to work properly.
cmake -B build -DWHISPER_SDL2=ON
cmake --build build -j --config Release
./build/bin/whisper-stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000
https://user-images.githubusercontent.com/1991296/194935793-76afede7-cfa8-48d8-a80f-28ba83be7d09.mp4
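Whisper models operate on 16 kHz audio. At that rate, `--step 500 --length 5000` corresponds to 8,000 new samples per step and a window of at most 80,000 samples. The sketch below shows the general rolling-buffer idea: append each new step and keep only the most recent `length` milliseconds. It is a simplified assumption about how such a loop can work and is not taken from the whisper-stream source. `RollingWindow` and `ms_to_samples` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kSampleRate = 16000;  // Whisper expects 16 kHz audio

// Convert a duration in milliseconds to a sample count at 16 kHz.
constexpr std::size_t ms_to_samples(int ms) {
    return static_cast<std::size_t>(ms) * kSampleRate / 1000;
}

// Hypothetical rolling buffer: append each new audio step and keep only the
// most recent length_ms of audio, which would then be passed to the model.
class RollingWindow {
public:
    explicit RollingWindow(int length_ms) : max_samples_(ms_to_samples(length_ms)) {
        assert(length_ms > 0);
    }

    void push(const std::vector<float>& step) {
        buf_.insert(buf_.end(), step.begin(), step.end());
        if (buf_.size() > max_samples_) {
            // Drop the oldest samples so the window never exceeds max_samples_.
            const auto excess = static_cast<std::ptrdiff_t>(buf_.size() - max_samples_);
            buf_.erase(buf_.begin(), buf_.begin() + excess);
        }
    }

    const std::vector<float>& window() const { return buf_; }

private:
    std::size_t max_samples_;
    std::vector<float> buf_;
};
```

After twelve 500 ms steps, the window holds only the last ten steps (80,000 samples).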
There are various examples of using the library for different projects in the examples folder. Some of the examples are even ported to run in the browser using WebAssembly. Check them out!
| Example | Web | Description |
|---|---|---|
| [whisper-cli](examples/cli) | [whisper.wasm](examples/whisper.wasm) | Tool for translating and transcribing audio using Whisper |
| [whisper-bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
| [whisper-stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
| [whisper-command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
| [whisper-server](examples/server) | | HTTP transcription server with OAI-like API |
| [whisper-talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggml-org/whisper.cpp/issues/185) |
| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
If you want some extra audio samples to play with, simply run:
make -j samples
This will download a few more audio files from Wikipedia and convert them to 16-bit WAV format via ffmpeg.
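whisper-cli reads 16-bit WAV input, and the model works on 16 kHz audio, so it can help to check a file before transcribing it. The sketch below inspects a canonical 44-byte PCM WAV header. It assumes the `fmt ` chunk immediately follows the RIFF header (files with extra chunks would need a proper chunk walker). It also expects mono input, which matches the common `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` conversion. `check_wav_header` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Read little-endian integers from a byte buffer.
static std::uint16_t le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical check: true if the buffer starts with a canonical 44-byte
// header describing 16-bit PCM, mono, 16 kHz audio.
bool check_wav_header(const std::vector<std::uint8_t>& data) {
    if (data.size() < 44) return false;
    const std::uint8_t* h = data.data();
    if (std::memcmp(h, "RIFF", 4) != 0 || std::memcmp(h + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(h + 12, "fmt ", 4) != 0) return false;  // assumes fmt chunk comes first
    const std::uint16_t format      = le16(h + 20);  // 1 = integer PCM
    const std::uint16_t channels    = le16(h + 22);
    const std::uint32_t sample_rate = le32(h + 24);
    const std::uint16_t bits        = le16(h + 34);
    return format == 1 && channels == 1 && sample_rate == 16000 && bits == 16;
}
```

In practice you would read the first 44 bytes of the file into the vector. If the check fails, convert the file with ffmpeg first.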
You can download and run the other models as follows:
make -j tiny.en
make -j tiny
make -j base.en
make -j base
make -j small.en
make -j small
make -j medium.en
make -j medium
make -j large-v1
make -j large-v2
make -j large-v3
make -j large-v3-turbo
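Downloaded models are stored under `models/` using the pattern `ggml-<name>.bin`. For example, `base.en` becomes `models/ggml-base.en.bin`, the file passed to `-m` in the whisper-stream command earlier. A small helper can map a model name to that path and reject names outside the list above. This assumes the default download location, and `model_path` is a hypothetical name (C++17 for `std::optional`).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <optional>
#include <string>

// Model names accepted by the download targets listed above.
constexpr std::array<const char*, 12> kModels = {
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large-v3-turbo",
};

// Hypothetical helper: map a model name to its default ggml file path,
// or std::nullopt if the name is not one of the known models.
std::optional<std::string> model_path(const std::string& name) {
    const bool known = std::any_of(kModels.begin(), kModels.end(),
                                   [&](const char* m) { return name == m; });
    if (!known) return std::nullopt;
    return "models/ggml-" + name + ".bin";
}
```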
docker run -it --rm \
  -v path/to/models:/models \
  whisper.cpp:main "whisper-cli -m /models/ggml-base.bin -f ./samples/jfk.wav"
On platforms that support OpenVINO, the Encoder inference can be executed on OpenVINO-supported devices including x86 CPUs and Intel GPUs (integrated & discrete).
This can result in significant speedup in encoder performance. Here are the instructions for generating the OpenVINO model and using it with whisper.cpp:
Windows:
cd models
python -m venv openvino_conv_env
openvino_conv_env\Scripts\activate
python -m pip install --upgrade pip
pip install -r requirements-openvino.txt
Linux and macOS:
cd models
python3 -m venv openvino_conv_env
source openvino_conv_env/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-openvino.txt
Generate an OpenVINO encoder model. For example, to generate a base.en model, use:
python convert-whisper-to-openvino.py --model base.en
This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as ggml models, as that is the default location that the OpenVINO extension will search at runtime.
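Judging by the file names above and the log output further below, the OpenVINO encoder IR and its cache directory are named after the ggml model. `models/ggml-base.en.bin` pairs with `models/ggml-base.en-encoder-openvino.xml` and `models/ggml-base.en-encoder-openvino-cache`. The sketch below derives those paths from a model path, which can help confirm the IR files sit where they are expected. It mirrors the naming seen here rather than the whisper.cpp source. `openvino_paths` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

struct OpenVinoPaths {
    std::string encoder_xml;  // IR model the OpenVINO extension looks for
    std::string cache_dir;    // where compiled device blobs are cached
};

// Hypothetical helper: derive the OpenVINO file names from a ggml model path
// such as "models/ggml-base.en.bin", following the naming in the log output.
OpenVinoPaths openvino_paths(const std::string& model_path) {
    std::string stem = model_path;
    const std::string ext = ".bin";
    if (stem.size() >= ext.size() &&
        stem.compare(stem.size() - ext.size(), ext.size(), ext) == 0) {
        stem.erase(stem.size() - ext.size());  // strip the ".bin" extension
    }
    return {stem + "-encoder-openvino.xml", stem + "-encoder-openvino-cache"};
}
```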
Build whisper.cpp with OpenVINO support: download the OpenVINO package from the release page. The recommended version to use is 2024.6.0. Ready-to-use binaries of the required libraries can be found in the OpenVINO Archives.
After downloading and extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:
Linux:
source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
Windows (cmd):
C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
And then build the project using cmake:
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release
Run the examples as usual. For example:
$ ./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
...
whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = GPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
whisper_ctx_init_openvino_encoder: OpenVINO model loaded
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |
...
The first run on an OpenVINO device is slow, because the OpenVINO framework compiles the IR (Intermediate Representation) model to a device-specific 'blob'. This blob is cached and reused on subsequent runs.
For more information about the OpenVINO implementation please refer to PR #1037.
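The `system_info` line in the log above reports each compiled-in feature as `NAME = value`, separated by `|`. Checking for `OPENVINO = 1` is a quick way to confirm that the build picked up OpenVINO support. The sketch below is a minimal parser that assumes exactly that layout. `parse_system_info` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing spaces.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Hypothetical parser: split a "system_info: A = 1 | B = 0 | ..." line into
// a map of feature name -> value string.
std::map<std::string, std::string> parse_system_info(const std::string& line) {
    std::map<std::string, std::string> out;
    std::string body = line;
    const auto colon = body.find(':');
    if (colon != std::string::npos) body = body.substr(colon + 1);  // drop "system_info:"
    std::stringstream ss(body);
    std::string field;
    while (std::getline(ss, field, '|')) {
        const auto eq = field.find('=');
        if (eq == std::string::npos) continue;  // skip the empty trailing field
        out[trim(field.substr(0, eq))] = trim(field.substr(eq + 1));
    }
    return out;
}
```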
* --vad-threshold: Threshold probability for speech detection. A probability for a speech segment/frame above this threshold will be considered as speech.
* --vad-min-speech-duration-ms: Minimum speech duration in milliseconds. Speech segments shorter than this value will be discarded to filter out brief noise or false positives.
* --vad-min-silence-duration-ms: Minimum silence duration in milliseconds. Silence periods must be at least this long to end a speech segment. Shorter silence periods will be ignored and included as part of the speech.
* --vad-max-speech-duration-s: Maximum speech duration in seconds. Speech segments longer than this will be automatically split into multiple segments at silence points exceeding 98ms to prevent excessively long segments.
* --vad-speech-pad-ms: Speech padding in milliseconds. Adds this amount of padding before and after each detected speech segment to avoid cutting off speech edges.
* --vad-samples-overlap: Amount of audio to extend from each speech segment into the next one, in seconds (e.g., 0.10 = 100ms overlap). This ensures speech isn't cut off abruptly between segments when they're concatenated together.
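To make the options above concrete, the sketch below applies four of them to a sequence of per-frame speech probabilities: threshold, minimum speech duration, minimum silence duration and speech padding. It is a simplified illustration under assumed semantics and is not whisper.cpp's VAD code. The frame duration, the order of the steps, the default values and the names `VadParams` and `detect_speech` are all hypothetical. `--vad-max-speech-duration-s` and `--vad-samples-overlap` are left out, and padded segments may overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical parameters mirroring some --vad-* options (times in ms).
// The values are illustrative, not necessarily whisper.cpp's defaults.
struct VadParams {
    float threshold = 0.5f;
    int min_speech_ms = 250;
    int min_silence_ms = 100;
    int speech_pad_ms = 30;
    int frame_ms = 32;  // assumed duration represented by one probability
};

// Returns [start_ms, end_ms) speech segments from per-frame probabilities.
std::vector<std::pair<int, int>> detect_speech(const std::vector<float>& probs,
                                               const VadParams& p) {
    // 1. Frames above the threshold are speech; join adjacent speech frames.
    std::vector<std::pair<int, int>> raw;
    for (int i = 0; i < static_cast<int>(probs.size()); ++i) {
        if (probs[i] <= p.threshold) continue;
        const int start = i * p.frame_ms, end = (i + 1) * p.frame_ms;
        if (!raw.empty() && raw.back().second == start) raw.back().second = end;
        else raw.emplace_back(start, end);
    }
    // 2. Silences shorter than min_silence_ms do not end a segment.
    std::vector<std::pair<int, int>> merged;
    for (const auto& seg : raw) {
        if (!merged.empty() && seg.first - merged.back().second < p.min_silence_ms)
            merged.back().second = seg.second;
        else merged.push_back(seg);
    }
    // 3. Drop segments shorter than min_speech_ms, then pad the survivors.
    const int total_ms = static_cast<int>(probs.size()) * p.frame_ms;
    std::vector<std::pair<int, int>> out;
    for (const auto& seg : merged) {
        if (seg.second - seg.first < p.min_speech_ms) continue;
        out.emplace_back(std::max(0, seg.first - p.speech_pad_ms),
                         std::min(total_ms, seg.second + p.speech_pad_ms));
    }
    return out;
}
```

With 100 ms frames, a 100 ms gap between two speech runs is bridged when the minimum silence is 150 ms. An isolated 100 ms blip is discarded when the minimum speech duration is 250 ms.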
Use the scripts/bench-wts.sh script to generate a video in the following format:
./scripts/bench-wts.sh samples/jfk.wav
ffplay ./samples/jfk.wav.all.mp4
https://user-images.githubusercontent.com/1991296/223206245-2d36d903-cf8e-4f09-8c3b-eb9f9c39d6fc.mp4
---
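An industry-recognized, high-quality speech recognition solution. The C++ implementation performs well and the community is highly active, which makes it a good fit for scenarios that demand fast inference and privacy protection.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating it thoroughly in a sandbox or test environment before deploying to production, and carrying out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, and only requires that the copyright and permission notice be retained.
Overall, Whisper Speech Recognition Engine is a high-quality AI tool that is competitive among similar tools. AI Skill Hub will continue to track its updates. Bookmark it for reference and adopt it when the timing suits your use case.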
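| Field | Value |
|---|---|
| Original name | whisper-cpp |
| Original description | Open-source AI tool: Port of OpenAI's Whisper model in C/C++. ⭐49.7k · C++ |
| Topics | speech recognition, speech-to-text, C++, offline inference, edge computing |
| GitHub | https://github.com/ggml-org/whisper.cpp |
| License | MIT |
| Language | C++ |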
Listed: 2026-05-14 · Updated: 2026-05-16 · License: MIT · AI Skill Hub makes no legal guarantee of the accuracy of third-party content.