Running a ModelKit as a Docker container
Jozu Hub can automatically generate Docker containers and Kubernetes deployments to run your ModelKits locally, in a Kubernetes cluster, or in any other container runtime.
Overview
When you push a ModelKit to Jozu Hub, we generate container definitions to allow you to begin using your ModelKit immediately.
To get started, use the search bar at the top of the Hub, or the Browse and Discover links at the top of the page, to explore available ModelKits or push your own. For now, let's use jozu/qwen2-0.5b.
The repository view contains a set of sub-tabs, including one for Deploy.
Now let's see how to run this model:
- In Docker from the command line
- In a Kubernetes cluster
Docker
You can get the Docker command needed to run your ModelKit-packaged model either from the UI, or via a modified URL you can use from the CLI or code.
Using the UI
Select the "Docker" radio button on the left, then select llama.cpp
on the right.
INFO
Available container types depend on the type of model in the ModelKit. In this example, we are using a ModelKit that contains a GGUF-serialized LLM, so the llama.cpp container type is available.
Now simply copy the Docker CLI command, and you're ready to run the model in Docker.
Using a Modified URL
Alternatively, you can skip the UI and call the generated Docker container directly from a CLI or code by modifying the ModelKit's pull URL.
Each ModelKit has a pull URL with the format <organization>/<repository>:<tag>. To get the generated Docker container, specify which container type you need with a modified URL of the form <organization>/<repository>/<container-type>:<tag>.
For example, to get the llama.cpp container for this ModelKit (compatible with this model), we'll add /llama-cpp before the colon and tag name:
// original ModelKit URL
jozu.ml/jozu/qwen2-0.5b:0.5b-instruct-q4_0
// generated container URL
jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
Details
The compatible container types are listed in the right-hand radio button column on the Deploy tab for the ModelKit.
Running the Container with Docker
Open a terminal on a computer with the Docker CLI or Docker Desktop installed, then paste the command copied from the UI or use the modified URL. For example:
bash
docker run -it --rm \
  --publish 8000:8000 \
  jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
The CLI will first check whether the image is available locally (you may need to grant Docker permission to access local files); if it isn't, Docker will download the image and then start the container.
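If you'd rather fetch the image ahead of time, you can pull it explicitly first:
bash
docker pull jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0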
Open a browser window and navigate to http://localhost:8000/. You'll see the llama.cpp UI, where you can have a conversation with your LLM.
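Beyond the browser UI, you can also query the model over HTTP. As a sketch, assuming the container runs llama.cpp's built-in server (which exposes an OpenAI-compatible chat endpoint), a request could look like this:
bash
# Assumes the llama.cpp server's OpenAI-compatible endpoint is reachable on port 8000
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello! What can you do?"}]}'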
Kubernetes
This documentation outlines how to create a minimal Kubernetes model deployment that you can interact with. A more robust deployment should factor in additional considerations such as scaling, GPUs, and ingress; refer to the Kubernetes documentation for more information on production-ready workloads, or contact Jozu.
Before you begin, make sure you can reach a Kubernetes cluster and that the kubectl CLI is installed on your local machine and configured to communicate with it. For step-by-step instructions, refer to the official guide:
Install and Set Up kubectl
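Before continuing, you can verify that kubectl is installed and can reach your cluster:
bash
kubectl version --client
kubectl cluster-info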
Creating the Namespace
In a terminal configured to communicate with your Kubernetes cluster, run:
bash
kubectl create namespace jozu-ric
This will create a new namespace in the cluster called jozu-ric (you can use any name you want).
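You can confirm the namespace exists by listing it:
bash
kubectl get namespace jozu-ric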
Creating the Deployment YAML File
Create a new file called jozu-deploy.yaml with the following contents:
yaml
apiVersion: v1
kind: Pod
metadata:
  name: qwen2-0.5b-llama-cpp
  labels:
    app: qwen2-0.5b-llama-cpp
spec:
  containers:
  - name: llama-cpp-serve
    image: jozu.ml/jozu/qwen2-0.5b/llama-cpp:latest
    ports:
    - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: qwen2-0.5b-llama-cpp-svc
spec:
  selector:
    app: qwen2-0.5b-llama-cpp
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
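If you'd like to check the manifest for errors before creating any resources, one option is a client-side dry run:
bash
kubectl apply -f jozu-deploy.yaml -n jozu-ric --dry-run=client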
In the same terminal session you used above, run:
bash
kubectl apply -f jozu-deploy.yaml -n jozu-ric
You can watch the pod startup status by running:
bash
kubectl get pods -n jozu-ric --watch
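If the pod doesn't reach the Running state, its logs and events can help you troubleshoot (the pod name comes from the manifest above):
bash
kubectl logs qwen2-0.5b-llama-cpp -n jozu-ric
kubectl describe pod qwen2-0.5b-llama-cpp -n jozu-ric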
Once the pods are up and running, forward the port to access the model inside the cluster. Note that this method doesn't expose the model externally; to make it accessible outside the cluster, you can configure a Service of a different type (such as NodePort or LoadBalancer), or use an Ingress or another proxy solution.
bash
kubectl port-forward svc/qwen2-0.5b-llama-cpp-svc 8000:8000 -n jozu-ric
Accessing the Model Pod
Now you can open http://localhost:8000/ in your browser and see the UI.
You can also use other Kubernetes resources, such as Deployments, to scale the workload.
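As a minimal sketch, a Deployment that runs multiple replicas behind the same Service might look like the following (the replica count is illustrative, and the selector reuses the app label from the manifest above):
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen2-0.5b-llama-cpp
spec:
  replicas: 2  # illustrative; tune to your workload
  selector:
    matchLabels:
      app: qwen2-0.5b-llama-cpp
  template:
    metadata:
      labels:
        app: qwen2-0.5b-llama-cpp
    spec:
      containers:
      - name: llama-cpp-serve
        image: jozu.ml/jozu/qwen2-0.5b/llama-cpp:latest
        ports:
        - containerPort: 8000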