Running a ModelKit as a Docker container
Jozu Hub automatically generates Jozu Runtime Inference Containers (RICs) for each ModelKit. RICs are a ready-to-run set of optimized AI containers designed to streamline the deployment of inference workloads, accelerating time-to-value by providing an optimized container for any AI model and runtime environment.
Jozu RICs can be run with Docker, deployed to Kubernetes, or shipped to any other container runtime.
Below we'll use jozu/qwen2-0.5b as an example repository and model to work with.
Docker
You can get the Docker command needed to run your ModelKit-packaged model either from the Deploy sub-tab in the UI, or via a modified URL you can use from the CLI or code.
Using the UI
- Select the "Docker" radio button on the left, then select llama.cpp on the right.
- Open a terminal on a computer with the Docker CLI or Docker Desktop installed, paste the command copied from the UI (it will look similar to the example below), and hit enter to run the Jozu RIC.
- Open a browser window and navigate to http://localhost:8000/. You'll see the llama.cpp UI, where you can have a conversation with your LLM.
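For reference, the command copied from the UI will look roughly like the one below (the exact tag and port may differ for your ModelKit; this matches the modified-URL example later on this page):

```bash
docker run -it --rm \
  --publish 8000:8000 \
  jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
```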
Using a Modified URL
Each ModelKit in Jozu Hub has a URL with the format:
<username>/<repository>:<tag>
To get the generated Docker container, specify the container type you need by modifying the URL to:
<username>/<repository>/<container-type>:<tag>
For example, to get the llama.cpp container for this ModelKit, we add /llama-cpp before the colon and tag name:
```
// original ModelKit URL
jozu.ml/jozu/qwen2-0.5b:0.5b-instruct-q4_0

// generated container URL
jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
```
Details: The compatible container types are listed in the radio button column on the right side of the Deploy tab for the ModelKit.
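Optionally, you can pre-fetch the generated container image with a standard docker pull using the same modified URL:

```bash
docker pull jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
```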
Now, open a terminal on a computer with the Docker CLI or Docker Desktop installed and run the container using the modified URL, for example:
```bash
docker run -it --rm \
  --publish 8000:8000 \
  jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
```
Docker will first check whether the RIC image is available locally (you may need to grant Docker permission to access local files), download it if necessary, and then start the container.
Open a browser window and navigate to http://localhost:8000/. You'll see the llama.cpp UI, where you can have a conversation with your LLM.
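Besides the browser UI, llama.cpp's server also exposes an OpenAI-compatible HTTP API. Assuming the RIC serves that API on the published port (an assumption; check your container's documentation), you can query the model from the command line as well:

```bash
# Query the OpenAI-compatible chat endpoint served by llama.cpp
# (endpoint path assumes the standard llama.cpp server API)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Say hello in one sentence."}
        ]
      }'
```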
Kubernetes
Before you begin, make sure you can reach a Kubernetes cluster and that the kubectl CLI is installed on your local machine and configured to communicate with it (see the Kubernetes docs for details).
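You can quickly verify that kubectl is installed and can reach the cluster before proceeding:

```bash
# Confirm kubectl is installed and can talk to the cluster
kubectl version
kubectl cluster-info
```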
Creating the Namespace
In a terminal connected to your Kubernetes cluster, run:
```bash
kubectl create namespace jozu-ric
```
This will create a new namespace in the cluster called jozu-ric (you can use any name you want).
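You can confirm the namespace exists before moving on:

```bash
kubectl get namespace jozu-ric
```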
Creating the Deployment YAML File
Create a new file called jozu-deploy.yaml with the following contents:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qwen2-0.5b-llama-cpp
  labels:
    app: qwen2-0.5b-llama-cpp
spec:
  containers:
    - name: llama-cpp-serve
      image: jozu.ml/jozu/qwen2-0.5b/llama-cpp:latest
      ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: qwen2-0.5b-llama-cpp-svc
spec:
  selector:
    app: qwen2-0.5b-llama-cpp
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
```
In the same terminal session you used above, run:
```bash
kubectl apply -f jozu-deploy.yaml -n jozu-ric
```
You can watch the pod startup status by running:
```bash
kubectl get pods -n jozu-ric --watch
```
Once the pod is up and running, forward the port to access the model within the cluster. Note that this method doesn't expose the model externally; to make it accessible outside the cluster, you can configure a Service of a different type (such as NodePort or LoadBalancer), or use an Ingress or another proxy solution.
```bash
kubectl port-forward svc/qwen2-0.5b-llama-cpp-svc 8000:8000 -n jozu-ric
```
Accessing the Model Pod
Now you can open http://localhost:8000/ in your browser and see the UI.
You can also use alternate Kubernetes resources such as Deployments to scale the workload.
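As a rough sketch of that approach (the Deployment and Service names below are illustrative and not part of this guide's manifests), you could run the same RIC image as a Deployment with multiple replicas using kubectl directly:

```bash
# Run the RIC image as a Deployment with two replicas (names are illustrative)
kubectl create deployment qwen2-llama-cpp \
  --image=jozu.ml/jozu/qwen2-0.5b/llama-cpp:latest \
  --replicas=2 --port=8000 -n jozu-ric

# Expose the Deployment inside the cluster on port 8000
kubectl expose deployment qwen2-llama-cpp \
  --port=8000 --target-port=8000 -n jozu-ric
```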
This section outlined how to create a minimal Kubernetes model deployment that you can interact with. For a more robust deployment, you should factor in additional considerations such as scaling, GPUs, and ingress. For more information on getting production-ready workloads, contact Jozu.
Limitations
RICs use ModelKit layers as-is. This introduces some constraints:
- ModelKit contents are mounted into the container root directory
- Files are owned by root, with original permissions
- No file transformations or relocations are performed
Additionally, RICs currently work only with ModelKits that have a single model file (not model parts).
While we work around these limitations by designing the containers to use ModelKits directly, you may still run into compatibility issues on certain platforms. If you encounter issues or have any feedback, please email us at [email protected].