# Voice AI Pipeline with KitOps & Jozu Hub
A self-hosted call centre voice agent built with open-source models, packaged and managed with KitOps and Jozu Hub.
Phone → STT (Faster-Whisper) → LLM (Qwen3) → TTS (Kokoro) → Phone
## Why This Exists

Voice AI pipelines run 3 models that update on different schedules. Without proper model management, you end up with:
- The wrong model version in production during a live call
- No way to roll back when a new model performs worse
- No audit trail of what's running and who deployed it
KitOps packages each model into a versioned, scannable ModelKit. Jozu Hub stores them with security scanning, signed attestations, and policy gates. Together, they give you git-like control over your AI models.
## Architecture

```
┌─────────┐       ┌──────────────────────────────────────┐       ┌─────────┐
│  Phone  │──────▶│          Voice AI Pipeline           │──────▶│  Phone  │
│ (caller)│  SIP  │                                      │  SIP  │ (caller)│
└─────────┘       │   ┌─────┐    ┌──────┐    ┌───────┐   │       └─────────┘
                  │   │ STT │───▶│ LLM  │───▶│  TTS  │   │
                  │   └─────┘    └──────┘    └───────┘   │
                  │    Faster-     Qwen3      Kokoro     │
                  │    Whisper     0.6B        82M       │
                  └──────────────────────────────────────┘
                          ▲                  │
                          │    kit unpack    │
                     ┌────┴──────────────────┴────┐
                     │           Jozu Hub         │
                     │  ┌──────┐ ┌──────┐ ┌──────┐│
                     │  │ STT  │ │ LLM  │ │ TTS  ││
                     │  │ v1.0 │ │ v1.0 │ │ v1.0 ││
                     │  └──────┘ └──────┘ └──────┘│
                     └────────────────────────────┘
```
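The STT → LLM → TTS hand-off above can be sketched in a few lines. This is an illustrative mock of the control flow, not the repo's actual API: the real services (under `models/*/src`) load Faster-Whisper, Qwen3, and Kokoro, while here each stage is a plain callable stub.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    stt: Callable[[bytes], str]   # caller audio -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> synthesized audio

    def handle_turn(self, audio_in: bytes) -> bytes:
        """One conversational turn: transcribe, reason, synthesize."""
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stub stages stand in for the three models so the flow is runnable anywhere.
pipeline = VoicePipeline(
    stt=lambda audio: "what are your opening hours",
    llm=lambda text: f"You asked: {text}. We are open 9 to 5.",
    tts=lambda text: text.encode("utf-8"),
)
print(pipeline.handle_turn(b"\x00\x01").decode("utf-8"))
```

Because each stage is swappable behind a narrow interface, replacing one model version (say, a new TTS) never touches the other two — which is exactly the property the per-model ModelKits preserve at the packaging layer.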
## Models
| Role | Model | Size | Why |
|---|---|---|---|
| STT | Faster-Whisper Small | ~500MB | CTranslate2 backend, <200ms latency, VAD built-in |
| LLM | Qwen3-0.6B (Q4_K_M) | ~400MB | Dense model, fast CPU inference, good for scripted flows |
| TTS | Kokoro | ~300MB | 82M params, sounds better than models 10x its size |
## Quick Start

### Option A: One command from Jozu Hub (recommended)
All three models are pre-packaged on Jozu Hub. One pull, one unpack, you're running.
```bash
kit login jozu.ml
./scripts/pull_from_jozu.sh
pip install -r requirements.txt
python pipeline/server.py
```
This pulls the `voice-pipeline` ModelKit (~1.2GB), which bundles all three models, inference code, configs, and prompts into a single OCI artifact.
### Option B: Per-model deployment (production)
Pull each model independently for version control at the model level:
```bash
kit login jozu.ml
./scripts/pull_from_jozu.sh --separate
pip install -r requirements.txt
python pipeline/server.py
```
This pulls 3 separate ModelKits. You can also cherry-pick:
```bash
# Just the LLM weights (ops deploying to an inference server)
kit unpack jozu.ml/arnabchat2001/voice-llm:v1.0.0 --filter=model

# Just the prompt (dev iterating on the agent persona)
kit unpack jozu.ml/arnabchat2001/voice-llm:v1.0.0 --filter=prompts
```
### Option C: Direct from Hugging Face (no KitOps)
```bash
./scripts/setup_models.sh
pip install -r requirements.txt
python pipeline/server.py
```
## Docker

```bash
# Download models first
./scripts/setup_models.sh   # or kit unpack per Option B

# Run
docker compose up
```
## Testing

```bash
# Audio mode — speak into your mic
python pipeline/ws_test_client.py

# Text mode — type what the caller says
python pipeline/ws_test_client.py --text
```
## KitOps Workflow

### Packaging models
Each model has its own Kitfile — a YAML manifest that bundles weights, code, configs, and test data into a single OCI artifact.
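For orientation, a per-model Kitfile might look roughly like this. The field layout follows the standard Kitfile schema, but the paths, descriptions, and version metadata here are illustrative rather than copied from the repo's actual manifest:

```yaml
manifestVersion: "1.0"

package:
  name: voice-llm
  version: 1.0.0
  description: Qwen3-0.6B agent for scripted call-centre flows
  authors: ["arnabchat2001"]

model:
  name: qwen3-0.6b
  path: ./weights            # quantized GGUF weights for llama.cpp

code:
  - path: ./src
    description: LLM inference wrapper

datasets:
  - name: test-transcripts
    path: ./test_data
    description: Validation transcripts for QA
```

Because each section (`model`, `code`, `datasets`) is addressed separately in the OCI artifact, the selective-unpack filters shown later can pull just one of them.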
```bash
# Pack individual models
kit pack ./models/stt -t jozu.ml/arnabchat2001/voice-stt:v1.0.0
kit pack ./models/llm -t jozu.ml/arnabchat2001/voice-llm:v1.0.0
kit pack ./models/tts -t jozu.ml/arnabchat2001/voice-tts:v1.0.0

# Push to Jozu Hub
kit push jozu.ml/arnabchat2001/voice-stt:v1.0.0
kit push jozu.ml/arnabchat2001/voice-llm:v1.0.0
kit push jozu.ml/arnabchat2001/voice-tts:v1.0.0
```
Or use the all-in-one script:
```bash
VERSION=v1.0.0 ./scripts/pack_and_push.sh
```
### Selective unpack

The `--filter` flag lets different roles get only what they need:
```bash
# Ops team: just the model weights (deploy to inference server)
kit unpack jozu.ml/arnabchat2001/voice-llm:v1.0.0 --filter=model

# Dev team: just the code and prompts (iterate on logic)
kit unpack jozu.ml/arnabchat2001/voice-llm:v1.0.0 --filter=code --filter=prompts

# QA team: just the test data (run validation)
kit unpack jozu.ml/arnabchat2001/voice-llm:v1.0.0 --filter=datasets
```
### Champion/challenger pattern
Safe model updates without downtime:
```bash
# 1. Push a new TTS model as a challenger
kit pack ./models/tts -t jozu.ml/arnabchat2001/voice-tts:v2.0.0
kit push jozu.ml/arnabchat2001/voice-tts:v2.0.0

# 2. Compare before promoting
kit diff jozu.ml/arnabchat2001/voice-tts:v1.0.0 jozu.ml/arnabchat2001/voice-tts:v2.0.0

# 3. Promote when confident
./scripts/promote_challenger.sh voice-tts v2.0.0

# 4. Roll back if needed
./scripts/rollback.sh voice-tts v1.0.0
```
## What Jozu Hub adds

When you `kit push` to Jozu Hub, your ModelKit is automatically:
- Scanned by 5 security scanners (ModelScan, LLM Guard, Garak, Promptfoo, ART)
- Signed with cryptographic attestations (via Cosign)
- Inventoried with an AI SBOM (SPDX 3 format)
- Gated by OPA policies before deployment
This means every model version in your call centre pipeline has a verifiable security and compliance record.
## Project Structure

```
jozu-voice-ai/
├── models/
│   ├── stt/
│   │   ├── Kitfile                ← STT ModelKit definition
│   │   ├── config.yaml            ← Runtime configuration
│   │   ├── src/stt_service.py     ← Faster-Whisper inference wrapper
│   │   ├── weights/               ← Model weights (via kit unpack)
│   │   └── test_data/             ← Validation audio samples
│   ├── llm/
│   │   ├── Kitfile                ← LLM ModelKit definition
│   │   ├── config.yaml
│   │   ├── src/llm_service.py     ← Qwen3 inference wrapper (llama.cpp)
│   │   ├── prompts/               ← System prompt for the agent
│   │   ├── weights/               ← Model weights (via kit unpack)
│   │   └── test_data/             ← Validation transcripts
│   └── tts/
│       ├── Kitfile                ← TTS ModelKit definition
│       ├── config.yaml
│       ├── src/tts_service.py     ← Kokoro inference wrapper
│       ├── weights/               ← Model weights (via kit unpack)
│       └── voice_profiles/        ← Voice configuration
├── pipeline/
│   ├── voice_pipeline.py          ← STT → LLM → TTS orchestration
│   ├── server.py                  ← WebSocket server
│   ├── ws_test_client.py          ← Test client (mic or text input)
│   └── config.yaml
├── scripts/
│   ├── setup_models.sh            ← Download models from Hugging Face
│   ├── pack_and_push.sh           ← Pack + push all ModelKits
│   ├── deploy.sh                  ← Pull + unpack on deployment target
│   ├── promote_challenger.sh      ← Promote a model version to champion
│   └── rollback.sh                ← Roll back a model to previous version
├── Kitfile                        ← Meta ModelKit (whole pipeline in one)
├── Dockerfile
├── docker-compose.yaml
├── requirements.txt
└── blog_outline.md
```
## Why 3 ModelKits + 1 Meta?

**3 individual ModelKits** (one per model) are the production pattern:

- Version each model independently
- Update the TTS voice without re-pushing the LLM
- Roll back just the broken model, not the whole pipeline
- Security scan results are per-model

**1 meta ModelKit** (root Kitfile) is the demo convenience:

- `kit unpack` once to get everything
- The LLM is the `model:`; the STT/TTS weights are `datasets:` (a practical trade-off, since a Kitfile supports one model per artifact)
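A sketch of how that trade-off plays out in the root Kitfile. The structure follows the Kitfile schema, but the paths and names are illustrative, not the repo's exact manifest:

```yaml
manifestVersion: "1.0"

package:
  name: voice-pipeline
  version: 1.0.0
  description: Phone -> STT -> LLM -> TTS call-centre agent

# A Kitfile carries a single model entry, so the LLM takes that slot...
model:
  name: qwen3-0.6b
  path: ./models/llm/weights

# ...and the STT/TTS weights ride along as datasets.
datasets:
  - name: stt-weights
    path: ./models/stt/weights
  - name: tts-weights
    path: ./models/tts/weights

code:
  - path: ./pipeline
    description: Orchestration and WebSocket server
```

This keeps the demo to one pull, at the cost of the STT and TTS weights being typed as datasets in the SBOM and scan reports — which is why the three separate ModelKits remain the production pattern.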
## License
Apache-2.0