Unsigned
9
1

voice-llm

:

Voice LLM — Qwen3-0.6B

Language model for the voice AI pipeline. Uses Qwen3-0.6B in GGUF format via llama.cpp for fast CPU inference.

Model Details

Property Value
Base model Qwen3-0.6B
Quantization Q4_K_M
Size ~400MB
Context 4096 tokens
First token <300ms (CPU)
Architecture Dense transformer

KitOps Usage

# Pack (includes model, code, AND the system prompt)
kit pack . -t jozu.ml/arnabchat2001/voice-llm:v1.0.0

# Unpack just the prompt (for iteration)
kit unpack jozu.ml/arnabchat2001/voice-llm:v1.0.0 --filter=prompts -d ./output

System Prompt

The call centre agent persona is defined in prompts/system_prompt.md. This file is packaged inside the ModelKit, so prompt changes are versioned alongside the model weights.

Local Usage

from src.llm_service import LLMService

llm = LLMService()
reply = llm.generate("Hi, I have a question about my bill.")