May 15, 2026
Ownership and distribution
Vox Jot does not claim ownership of third-party model weights. Unless a model is listed as a platform runtime, Vox Jot either downloads the upstream model directly or mirrors/converts assets for app-managed installation.
App-managed mirrors and converted assets should preserve upstream license files, model cards, notices, source links, attribution, and lineage metadata where available.
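As an illustration only, the items a mirror is expected to preserve could be tracked in a small manifest record. The sketch below is not Vox Jot's actual format: the `MirrorManifest` class, its field names, and the `missing_items` helper are all hypothetical, chosen to mirror the list in the paragraph above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MirrorManifest:
    """Hypothetical record of what a mirrored or converted asset preserves."""
    license_file: Optional[str] = None   # path to the upstream license text
    model_card: Optional[str] = None     # path to the upstream model card
    notices: Optional[str] = None        # path to upstream notice files
    source_link: Optional[str] = None    # URL of the upstream source
    attribution: Optional[str] = None    # attribution text to display
    lineage: Optional[str] = None        # conversion or mirror notes


def missing_items(manifest: MirrorManifest) -> List[str]:
    """Return the names of preserved items that are absent or empty."""
    return [f.name for f in fields(manifest) if not getattr(manifest, f.name)]
```

Because these items are preserved "where available", an empty field in such a record would be a prompt to check upstream, not necessarily an error.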
Notice rules
- MIT, BSD, and Apache-2.0 models require preserved copyright and license text.
- CC-BY models require attribution, license link, source link, and conversion or mirror notes.
- Gated models require upstream provider terms acceptance before download.
- Non-commercial, research-only, custom, or unknown licenses require separate legal/product review.
- Voice cloning models require permission for any reference voice sample, regardless of model license.
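The rules above can be read as a lookup from license class to required actions. The following is a minimal sketch under stated assumptions: the `required_actions` function, its parameters, and the license identifier strings are hypothetical, and treating every license outside the permissive and CC-BY groups as review-required is one interpretation of the list, not a statement of how Vox Jot implements it.

```python
from typing import List, Optional

# Permissive licenses named in the notice rules.
PERMISSIVE = {"MIT", "Apache-2.0"}


def required_actions(license_id: Optional[str],
                     gated: bool = False,
                     voice_cloning: bool = False) -> List[str]:
    """Map a license identifier to the notice rules it triggers (illustrative)."""
    lic = (license_id or "").strip()
    actions: List[str] = []

    if lic in PERMISSIVE or lic.startswith("BSD"):
        actions.append("preserve copyright and license text")
    elif lic.startswith("CC-BY") and "-NC" not in lic:
        actions.append(
            "attribution, license link, source link, conversion or mirror notes"
        )
    else:
        # Assumption: non-commercial, research-only, custom, and unknown
        # licenses all fall through to review.
        actions.append("separate legal/product review")

    if gated:
        actions.append("accept upstream provider terms before download")
    if voice_cloning:
        # Applies regardless of the model license.
        actions.append("permission for any reference voice sample")
    return actions
```

For example, `required_actions("CC-BY-NC-4.0")` returns only the review action, while `required_actions("MIT", voice_cloning=True)` returns the license-text rule plus the reference-voice permission rule.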
Speech-to-text and file ASR
- OpenAI Whisper / whisper.cpp assets - MIT. Sources: openai/whisper and ggerganov/whisper.cpp.
- NVIDIA Parakeet - CC-BY-4.0; attribution required. Vox Jot may distribute converted or quantized assets through its managed model mirror.
- Useful Sensors Moonshine - MIT.
- FunAudioLLM SenseVoice Small - custom model license requiring review before normal commercial distribution.
- GigaAM v3 - MIT.
- Breeze ASR 25 whisper.cpp conversion - Apache-2.0.
- Qwen3 ASR MLX conversions - Apache-2.0.
- FireRedASR2 AED MLX conversion - Apache-2.0 per current catalog policy.
- Microsoft VibeVoice ASR MLX conversion - MIT.
- Apple SpeechAnalyzer - platform runtime governed by Apple platform terms.
Speech analysis and speaker isolation
- IBM Granite Speech 4.1 2B - Apache-2.0.
- Cohere Transcribe 03-2026 - Apache-2.0 with gated provider terms.
- pyannote Speaker Diarization Community-1 - CC-BY-4.0; attribution required, and gated Hugging Face terms apply.
- pyannote Speaker Diarization 3.1 - MIT; gated Hugging Face terms apply.
- NVIDIA Sortformer v2.1 MLX conversion - CC-BY-4.0; attribution required.
- NVIDIA Sortformer v1 - CC-BY-NC-4.0; gated before download for non-commercial terms acknowledgement.
- Rev.ai Reverb Diarization V2 - custom terms; gated before download.
- Polyvoice ONNX diarization uses WeSpeaker embeddings and Silero VAD assets.
Text-to-speech
- Qwen3 TTS - Apache-2.0.
- OpenVoice - MIT.
- Coqui XTTS v2 - Coqui Public Model License; gated before download for non-commercial terms and reference-voice consent.
- Supertonic 3 - OpenRAIL-M; requires legal/product review before normal commercial distribution.
- Resemble AI Chatterbox - MIT.
- Kokoro - Apache-2.0.
- Dia - Apache-2.0.
- Sesame CSM - Apache-2.0.
- Spark TTS - CC-BY-NC-SA-4.0; gated before download for non-commercial terms.
- Llama OuteTTS - CC-BY-NC-SA-4.0; gated before download for non-commercial terms and reference-voice consent.
- Ming Omni TTS - Apache-2.0.
- KugelAudio - MIT.
- Bark Small MLX conversion - MIT.
- Fish Audio S2 Pro - Fish Audio Research License; gated before download and commercial use requires a separate license.
- Liquid LFM2.5 Audio 1.5B - LFM Open License v1.0; gated before download for custom license review.
- LongCat AudioDiT - MIT.
- Soprano - Apache-2.0.
- MeloTTS English MLX - MIT.
- Higgs Audio v2 - Apache-2.0.
- MOSS-TTS Nano - Apache-2.0.
- Irodori TTS - MIT.
- IndexTTS - Apache-2.0.
- OmniVoice - Apache-2.0.
- VibeVoice Realtime - MIT.
- Voxtral TTS - CC-BY-NC-4.0; gated before download for non-commercial terms.
- VoxCPM2 - Apache-2.0.
- Pocket TTS - CC-BY-4.0; attribution required.
- k2-fsa sherpa-onnx TTS runtime and VITS voice packs - preserve upstream runtime and voice-pack notices.
OCR
- PaddlePaddle PP-OCRv5 - Apache-2.0.
- LightOnOCR-2 1B - Apache-2.0.
- Allen AI olmOCR-2 7B - Apache-2.0.
- GLM-OCR - MIT.
- Chandra OCR 2 - Modified OpenRAIL-M; gated before download for custom license review.
- Qwen2.5-VL OCR - Qwen Research License; gated before download for non-commercial terms.
- Tesseract tessdata_best - Apache-2.0.
Known review-required models
The following catalog rows need legal/product review before normal commercial distribution or should be explicitly gated: XTTS v2, Fish Audio S2 Pro, Spark TTS, Llama OuteTTS, Voxtral TTS, NVIDIA Sortformer v1, Chandra OCR 2, Qwen2.5-VL OCR, Reverb Diarization V2, LFM Audio, and any row whose current license is Other, Custom, Research-only, or missing.
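A catalog check for this list might look like the sketch below. It is illustrative only: the `needs_review` function, the constant names, and the assumption that each catalog row carries a model name and a free-text license value are hypothetical. The model names are copied from the list above.

```python
from typing import Optional

# Models named explicitly in the review-required list.
REVIEW_REQUIRED_MODELS = {
    "XTTS v2",
    "Fish Audio S2 Pro",
    "Spark TTS",
    "Llama OuteTTS",
    "Voxtral TTS",
    "NVIDIA Sortformer v1",
    "Chandra OCR 2",
    "Qwen2.5-VL OCR",
    "Reverb Diarization V2",
    "LFM Audio",
}

# License values that trigger review on their own (compared case-insensitively).
REVIEW_LICENSE_VALUES = {"other", "custom", "research-only"}


def needs_review(model_name: str, license_value: Optional[str]) -> bool:
    """Return True if a catalog row should be reviewed or explicitly gated."""
    if model_name in REVIEW_REQUIRED_MODELS:
        return True
    # A missing or blank license is treated the same as an unknown one.
    if license_value is None or not license_value.strip():
        return True
    return license_value.strip().lower() in REVIEW_LICENSE_VALUES
```

Checking the license value as well as the name means a newly added row with a custom or missing license is caught even before it is added to the named list.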