Running a ModelKit as a Docker container
Jozu Hub can automatically generate Docker containers to allow you to run your ModelKits locally, in a Kubernetes cluster, or any other container runtime.
Overview
When you push a ModelKit to Jozu Hub, we generate container definitions to allow you to begin using your ModelKit immediately. Currently, two container types are supported:
- basic: a simple Alpine Linux-based container that includes the model defined in the ModelKit.
- llama.cpp: a llama.cpp inference container that can serve LLM ModelKits containing GGUF-formatted models. The container serves an OpenAI-compatible REST API and can be viewed in your browser at http://localhost:8000.
INFO
More container types will be added over time. In the meantime, we would love to hear your feedback. Reach us at [email protected].
1/ Find a ModelKit
While Jozu Hub generates containers for all ModelKits, it currently works best with LLMs that include GGUF-formatted weights. You can use the search bar at the top of the Hub, or the Browse and Discover links at the top of the page to explore available ModelKits, or push your own.
For now, let's use jozu/qwen2-0.5b, which is a small LLM we can run locally.
2/ Getting the Generated Container
Using the Jozu Hub UI
The repository view includes a set of sub-tabs, one of which is Deploy. Select either the Docker, Kubernetes, or Custom Container deployment type, then (since this is an LLM) select an LLM-compatible container type like Llama.cpp.
You can copy either the docker CLI command or Kubernetes YAML and use them as you normally would.
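If you take the Kubernetes route, a minimal workflow looks like the sketch below. The file and resource names here are hypothetical; use whatever appears in the YAML you copied from the Deploy tab.
bash
# Apply the manifest copied from the Deploy tab (filename is hypothetical)
kubectl apply -f qwen2-llama-cpp.yaml

# Forward the container's port so the API is reachable at http://localhost:8000
# (substitute the deployment name from your generated YAML)
kubectl port-forward deployment/qwen2-llama-cpp 8000:8000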
In this case we'll copy the docker command...
3/ Running the Container with Docker
Open a terminal on a computer with the docker CLI or Docker Desktop installed and paste the command we copied from the UI, for example:
bash
docker run -it --rm \
--publish 8000:8000 \
jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
Docker will first check whether the image is available locally (you may need to grant Docker permission to access local files); if it isn't, it will pull the image and then start the container.
Open a browser window and navigate to http://localhost:8000/. You'll see the llama.cpp UI, where you can have a conversation with your LLM.
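You can also talk to the model from the terminal. llama.cpp's server exposes an OpenAI-compatible chat endpoint, so a quick check looks like this (a sketch, assuming the container from the previous step is still running on port 8000):
bash
# Send a single chat message to the llama.cpp server's
# OpenAI-compatible chat completions endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Say hello in one short sentence."}
    ]
  }'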
4/ From CLI or Code
You can also skip the UI and call the generated Docker container directly from the CLI for any Jozu Hub ModelKit.
Let's imagine a ModelKit called jozu.ml/brad/coffee-optimize:v60. Even without looking at the Jozu Hub UI, we can infer the CLI reference we'd use to pull the generated container.
For example, to get the llama.cpp container for this ModelKit, we add /llama-cpp before the colon and tag name:
jozu.ml/brad/coffee-optimize/llama-cpp:v60
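Putting it together, running this (hypothetical) ModelKit's container mirrors the command we used earlier:
bash
# Run the llama.cpp container inferred for the hypothetical
# coffee-optimize ModelKit, exposing the API on port 8000
docker run -it --rm \
  --publish 8000:8000 \
  jozu.ml/brad/coffee-optimize/llama-cpp:v60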
Details
To select which type of container image to use, we add a section to the ModelKit's reference so that it becomes <organization>/<repository>/<container-type>:<tag>, where <container-type> is:
- basic for the 'basic' container.
- llama-cpp for the llama.cpp container.
- etc...
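For example, using the qwen2-0.5b ModelKit from earlier, the two generated images could be pulled like this. Only the llama-cpp reference appears earlier in this tutorial; the basic reference is inferred from the pattern above and assumes it shares the ModelKit's tag.
bash
# llama.cpp inference container (the one we ran in step 3)
docker pull jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0

# 'basic' Alpine container holding the model files
# (assumption: the basic image shares the ModelKit's tag)
docker pull jozu.ml/jozu/qwen2-0.5b/basic:0.5b-instruct-q4_0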
5/ Next steps
This is a short tutorial on getting started with Jozu Hub containers. From here, you can explore the containers for other ModelKits, or try out deploying containers to a Kubernetes cluster.