Vox Jot Third-Party Model Notices

Last Updated

May 15, 2026

Ownership and distribution

Vox Jot does not claim ownership of third-party model weights. Unless a model is listed as a platform runtime, Vox Jot either downloads the model directly from its upstream source or mirrors or converts its assets for app-managed installation.

App-managed mirrors and converted assets should preserve upstream license files, model cards, notices, source links, attribution, and lineage metadata where available.
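
As an illustration only, the preservation rule above could be checked with a small manifest audit. This is a minimal sketch: the field names and the `missing_notice_fields` helper are hypothetical and do not describe Vox Jot's actual mirror format. Because the rule applies "where available", the result is meant as a report for human follow-up, not a hard failure.

```python
# Hypothetical manifest keys, one per item the preservation rule names.
EXPECTED_FIELDS = (
    "license_file",
    "model_card",
    "notices",
    "source_link",
    "attribution",
    "lineage",
)


def missing_notice_fields(manifest: dict) -> list:
    """Return the expected fields that are absent or empty in a manifest."""
    return [field for field in EXPECTED_FIELDS if not manifest.get(field)]


# Example: a mirror entry that kept only the license file and source link.
example_manifest = {
    "license_file": "LICENSE",
    "source_link": "https://example.invalid/upstream-model",
}
report = missing_notice_fields(example_manifest)
```

An empty report means every listed item was carried over. A non-empty report flags items to confirm as genuinely unavailable upstream.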

Notice rules

MIT, BSD, and Apache-2.0 models require preserved copyright and license text.
CC-BY models require attribution, license link, source link, and conversion or mirror notes.
Gated models require acceptance of the upstream provider's terms before download.
Non-commercial, research-only, custom, or unknown licenses require separate legal/product review.
Voice cloning models require permission for any reference voice sample, regardless of model license.
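
The rules above can be sketched as a lookup from a catalog row to its notice obligations. This is a hedged illustration, not Vox Jot's implementation: `CatalogRow`, `notice_obligations`, and the license string matching are assumptions. In particular, the sketch assumes that any license beginning with `CC-BY` carries the attribution obligations and that an `-NC` variant additionally needs review.

```python
from dataclasses import dataclass

# Assumption: licenses are recorded as short SPDX-style identifiers.
PERMISSIVE = {"MIT", "Apache-2.0"}


@dataclass(frozen=True)
class CatalogRow:
    """Hypothetical catalog entry used only for this sketch."""
    name: str
    license_id: str
    gated: bool = False
    voice_cloning: bool = False


def notice_obligations(row: CatalogRow) -> list:
    """Map one catalog row to the obligations named in the notice rules."""
    lic = row.license_id.strip()
    obligations = []
    recognized = False

    if lic in PERMISSIVE or lic.startswith("BSD"):
        obligations.append("preserve copyright and license text")
        recognized = True
    if lic.startswith("CC-BY"):
        obligations += [
            "attribution",
            "license link",
            "source link",
            "conversion or mirror notes",
        ]
        recognized = True
    # Non-commercial, custom, unknown, or missing licenses go to review.
    if not recognized or "-NC" in lic:
        obligations.append("separate legal/product review")
    if row.gated:
        obligations.append("upstream provider terms acceptance before download")
    if row.voice_cloning:
        obligations.append("permission for any reference voice sample")
    return obligations


permissive = notice_obligations(CatalogRow("Kokoro", "Apache-2.0"))
custom = notice_obligations(
    CatalogRow("Coqui XTTS v2", "Coqui Public Model License",
               gated=True, voice_cloning=True)
)
```

The reference-voice permission is kept independent of the license branch, matching the rule that it applies regardless of model license.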

Speech-to-text and file ASR

OpenAI Whisper / whisper.cpp assets - MIT. Sources: openai/whisper and ggerganov/whisper.cpp.
NVIDIA Parakeet - CC-BY-4.0 attribution required. Vox Jot may distribute converted or quantized assets through its managed model mirror.
Useful Sensors Moonshine - MIT.
FunAudioLLM SenseVoice Small - custom model license requiring review before normal commercial distribution.
GigaAM v3 - MIT.
Breeze ASR 25 whisper.cpp conversion - Apache-2.0.
Qwen3 ASR MLX conversions - Apache-2.0.
FireRedASR2 AED MLX conversion - Apache-2.0 per current catalog policy.
Microsoft VibeVoice ASR MLX conversion - MIT.
Apple SpeechAnalyzer - platform runtime governed by Apple platform terms.

Speech analysis and speaker isolation

IBM Granite Speech 4.1 2B - Apache-2.0.
Cohere Transcribe 03-2026 - Apache-2.0 with gated provider terms.
pyannote Speaker Diarization Community-1 - CC-BY-4.0 attribution required and gated Hugging Face terms apply.
pyannote Speaker Diarization 3.1 - MIT and gated Hugging Face terms apply.
NVIDIA Sortformer v2.1 MLX conversion - CC-BY-4.0 attribution required.
NVIDIA Sortformer v1 - CC-BY-NC-4.0; gated before download for acknowledgement of non-commercial terms.
Rev.ai Reverb Diarization V2 - custom terms; gated before download.
Polyvoice ONNX diarization uses WeSpeaker embeddings and Silero VAD assets.

Text-to-speech

Qwen3 TTS - Apache-2.0.
OpenVoice - MIT.
Coqui XTTS v2 - Coqui Public Model License; gated before download for non-commercial terms and reference-voice consent.
Supertonic 3 - OpenRAIL-M; requires legal/product review before normal commercial distribution.
Resemble AI Chatterbox - MIT.
Kokoro - Apache-2.0.
Dia - Apache-2.0.
Sesame CSM - Apache-2.0.
Spark TTS - CC-BY-NC-SA-4.0; gated before download for non-commercial terms.
Llama OuteTTS - CC-BY-NC-SA-4.0; gated before download for non-commercial terms and reference-voice consent.
Ming Omni TTS - Apache-2.0.
KugelAudio - MIT.
Bark Small MLX conversion - MIT.
Fish Audio S2 Pro - Fish Audio Research License; gated before download and commercial use requires a separate license.
Liquid LFM2.5 Audio 1.5B - LFM Open License v1.0; gated before download for custom license review.
LongCat AudioDiT - MIT.
Soprano - Apache-2.0.
MeloTTS English MLX - MIT.
Higgs Audio v2 - Apache-2.0.
MOSS-TTS Nano - Apache-2.0.
Irodori TTS - MIT.
IndexTTS - Apache-2.0.
OmniVoice - Apache-2.0.
VibeVoice Realtime - MIT.
Voxtral TTS - CC-BY-NC-4.0; gated before download for non-commercial terms.
VoxCPM2 - Apache-2.0.
Pocket TTS - CC-BY-4.0 attribution required.
k2-fsa sherpa-onnx TTS runtime and VITS voice packs - preserve upstream runtime and voice-pack notices.

OCR

PaddlePaddle PP-OCRv5 - Apache-2.0.
LightOnOCR-2 1B - Apache-2.0.
Allen AI olmOCR-2 7B - Apache-2.0.
GLM-OCR - MIT.
Chandra OCR 2 - Modified OpenRAIL-M; gated before download for custom license review.
Qwen2.5-VL OCR - Qwen Research License; gated before download for non-commercial terms.
Tesseract tessdata_best - Apache-2.0.

Known review-required models

The following catalog rows need legal/product review before normal commercial distribution or should be explicitly gated: FunAudioLLM SenseVoice Small, Coqui XTTS v2, Supertonic 3, Fish Audio S2 Pro, Spark TTS, Llama OuteTTS, Voxtral TTS, NVIDIA Sortformer v1, Chandra OCR 2, Qwen2.5-VL OCR, Rev.ai Reverb Diarization V2, Liquid LFM2.5 Audio 1.5B, and any row whose current license is Other, Custom, Research-only, or missing.

Model licenses and attribution notices remain publicly available.