Running a ModelKit as a Docker container
Jozu Hub can automatically generate Docker containers so you can run your ModelKits locally, in a Kubernetes cluster, or on any other container runtime.
Overview
When you push a ModelKit to Jozu Hub, we generate container definitions to allow you to begin using your ModelKit immediately. Currently, two container types are supported:
- `basic`: a simple Alpine Linux-based container that includes the model defined in the ModelKit.
- `llama.cpp`: a llama.cpp inference container that can serve LLM ModelKits containing GGUF-formatted models. The container serves an OpenAI-compatible REST API and can be viewed in your browser at http://localhost:8000.
INFO
This feature is in early development, and more container types will be added over time. In the meantime, we would love to hear your feedback. Reach us at [email protected].
1) Find a ModelKit
While Jozu Hub generates containers for all ModelKits, it currently works best with LLMs that include GGUF-formatted weights. You can explore available ModelKits using the search bar at the top of the Hub or the Browse and Discover links at the top of the page, or push your own.
For now, let's use `jozu/qwen2-0.5b`, which is a small LLM we can run locally.
2) Decide which type of container to run
Let's use the `0.5b-instruct-q4_0` tag for the `jozu/qwen2-0.5b` ModelKit. We can select which container we use by updating the ModelKit reference.
Normally, we would access this ModelKit using:

```
jozu.ml/jozu/qwen2-0.5b:0.5b-instruct-q4_0
```

To get the llama.cpp container for this ModelKit, we'll add `/llama-cpp` before the colon and tag name:

```
jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
```
Details
To select which type of container image to use, we add a section to the ModelKit's reference so that it becomes `<organization>/<repository>/<container-type>`, where `<container-type>` is:

- `basic` for the 'basic' container.
- `llama-cpp` for the llama.cpp container.
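For example, the two generated images for the ModelKit in this tutorial would be referenced as shown below. The llama.cpp reference is the one used in this tutorial; the `basic` reference is an assumption based on the naming scheme above.

```bash
# llama.cpp inference container (the reference used in step 2)
docker pull jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0

# basic Alpine-based container -- assumed to follow the same
# <organization>/<repository>/<container-type> naming scheme
docker pull jozu.ml/jozu/qwen2-0.5b/basic:0.5b-instruct-q4_0
```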
3) Run the container with Docker
We'll use the `docker` CLI to run our container, mapping port 8000 so that we can access it in our browser. In the terminal, type:

```bash
docker run -it --rm \
  --publish 8000:8000 \
  jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0
```
You should see the image being downloaded, and then llama.cpp starting up.
Open a browser window and navigate to http://localhost:8000/. You should see the llama.cpp UI, where you can have a conversation with your LLM.
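Because the container serves an OpenAI-compatible REST API, you can also query it from the command line. The sketch below assumes the server is reachable on the mapped port 8000 and exposes llama.cpp's standard `/v1/chat/completions` endpoint; adjust the prompt and parameters as you like.

```bash
# Hedged example: send a chat completion request to the running container.
# Assumes port 8000 is mapped as in the docker run command above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Write a haiku about ModelKits."}
        ],
        "max_tokens": 128
      }'
```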
4) Next steps
This is a short tutorial on getting started with Jozu Hub containers. From here, you can explore the containers for other ModelKits, or try out deploying containers to a Kubernetes cluster.
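As a starting point for the Kubernetes route, a minimal sketch using `kubectl` might look like the following. It assumes your cluster can pull images from jozu.ml (private ModelKits may additionally require an image pull secret) and reuses the same image reference as above; the deployment name `qwen2-demo` is just a placeholder.

```bash
# Hedged sketch: run the same llama.cpp container in a Kubernetes cluster.
kubectl create deployment qwen2-demo \
  --image=jozu.ml/jozu/qwen2-0.5b/llama-cpp:0.5b-instruct-q4_0 \
  --port=8000

# Forward a local port to the deployment so the UI is reachable at localhost:8000
kubectl port-forward deployment/qwen2-demo 8000:8000
```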