smollm2-vllm

By oyedeletemitope76

SmolLM2-135M vLLM ModelKit

This ModelKit contains SmolLM2-135M, an ultra-lightweight 135M-parameter model optimized for ARM64 Macs.

Performance Characteristics

Parameters: 135M (vs TinyLlama's 1.1B)
Memory Usage: ~800MB (vs TinyLlama's 2-3GB)
Model Size: ~400MB (vs TinyLlama's 4GB)
Inference Speed: 2-3x faster than TinyLlama
ARM64 Optimized: Built specifically for Apple Silicon

Usage

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing briefly"}]
  }'
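
The same request can also be sent from a script. The following is a minimal sketch using only the Python standard library, not part of the ModelKit itself. It assumes the server is listening on `localhost:8000` as in the curl example, and that the response follows the OpenAI-style chat completion shape (`choices[0].message.content`). The helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical. Depending on the server version, the request may also need a `model` field naming the served model; the curl example does not include one, so it is omitted here as well.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example above.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion response.

    Assumption: the response is a dict shaped like
    {"choices": [{"message": {"content": "..."}}]}.
    """
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, timeout=60):
    """Send the prompt to the running server and return the reply text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running, `print(ask("Explain quantum computing briefly"))` should print the model's reply. Request construction is kept separate from sending so the payload can be inspected without a live server.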