# SmolVLM Inference Server

A simple FastAPI inference server for the SmolVLM-Instruct multimodal LLM.

## Environment variables

- `MODEL_ID`: `HuggingFaceTB/SmolVLM-Instruct`
- `DEFAULT_PROMPT`: `"Describe the image"`
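
As a rough sketch (variable names and defaults taken from the list above, not verified against `main.py`), the server would typically read these settings at startup:

```python
import os

# Assumed defaults, mirroring the values listed above.
MODEL_ID = os.environ.get("MODEL_ID", "HuggingFaceTB/SmolVLM-Instruct")
DEFAULT_PROMPT = os.environ.get("DEFAULT_PROMPT", "Describe the image")
```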

## Container

```shell
podman run \
    --device nvidia.com/gpu=all \
    --shm-size 1g \
    --name smolvlm-server \
    -p 8000:8000 \
    --rm \
    -v /opt/cache/huggingface:/root/.cache/huggingface \
    metaloom/smolvlm-server:latest
```

Spec

{
  "prompt": "Describe the image",
  "image_url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM.png",
  "image_data": "dGVzd…"
}
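
A hedged client example of posting this body from Python: port 8000 matches the container command above, but the endpoint path (`/generate` here) is a placeholder assumption; check `main.py` for the actual route.

```python
import base64

import requests  # third-party: pip install requests

# Hypothetical endpoint path; the real route is defined in main.py.
SERVER_URL = "http://localhost:8000/generate"

# Send a local image as base64 via "image_data"; alternatively set "image_url".
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "prompt": "Describe the image",
    "image_data": image_b64,
}

resp = requests.post(SERVER_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```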

## Build

```shell
./build.sh
```

## Test

```shell
./test.sh
```

## Development

```shell
pip3 install -r requirements.txt
pip3 install flash-attn --no-build-isolation

uvicorn main:app --reload
```
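
For orientation, here is a minimal sketch of the kind of app `main.py` exposes to `uvicorn`: a single FastAPI endpoint that loads SmolVLM through `transformers` and generates a caption. The route name, field handling, and generation parameters are assumptions for illustration, not the repository's actual code.

```python
import base64
import io
import os

import requests
import torch
from fastapi import FastAPI
from PIL import Image
from pydantic import BaseModel
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = os.environ.get("MODEL_ID", "HuggingFaceTB/SmolVLM-Instruct")
DEFAULT_PROMPT = os.environ.get("DEFAULT_PROMPT", "Describe the image")
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model once at startup.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
).to(DEVICE)

app = FastAPI()


class InferenceRequest(BaseModel):
    # Mirrors the request spec above; provide either image_url or image_data.
    prompt: str | None = None
    image_url: str | None = None
    image_data: str | None = None  # base64-encoded image bytes


@app.post("/generate")  # hypothetical route; check the real main.py
def generate(req: InferenceRequest) -> dict:
    # Load the image from a URL or from base64 data.
    if req.image_url:
        raw = requests.get(req.image_url, timeout=30).content
    else:
        raw = base64.b64decode(req.image_data or "")
    image = Image.open(io.BytesIO(raw)).convert("RGB")

    # Build a chat-style prompt the way the SmolVLM-Instruct model card does.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": req.prompt or DEFAULT_PROMPT},
        ],
    }]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=text, images=[image], return_tensors="pt").to(DEVICE)

    generated = model.generate(**inputs, max_new_tokens=256)
    output = processor.batch_decode(generated, skip_special_tokens=True)[0]
    return {"response": output}
```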
