Unsigned
7
1
voice-stt
:Voice STT — Faster-Whisper Small
Speech-to-text model for the voice AI pipeline. Uses Faster-Whisper with the CTranslate2 backend for low-latency transcription.
Model Details
| Property | Value |
|---|---|
| Base model | Whisper Small |
| Backend | CTranslate2 |
| Size | ~500MB |
| Latency | <200ms per utterance (CPU) |
| Input | 16kHz mono audio |
| VAD | Built-in (silero) |
KitOps Usage
# Pack
kit pack . -t jozu.ml/arnabchat2001/voice-stt:v1.0.0
# Push
kit push jozu.ml/arnabchat2001/voice-stt:v1.0.0
# Unpack (model weights only)
kit unpack jozu.ml/arnabchat2001/voice-stt:v1.0.0 --filter=model -d ./output
Local Usage
from src.stt_service import STTService
stt = STTService()
text = stt.transcribe(audio_array, sample_rate=16000)