Can I run Triton Inference Server using multiple MIG instances? #6468
tunahanertekin started this conversation in General
Replies: 2 comments
-
any updates?
-
Hi @tunahanertekin, it is currently a limitation of MIG/CUDA that multiple MIG instances cannot be enumerated by the same process or container: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices.
The blog post achieves scaling across multiple MIG instances by assigning each container/pod exactly one instance; Triton is simply limited by CUDA functionality in this case.
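To illustrate the one-instance-per-container pattern, here is a minimal sketch using plain Docker, assuming the NVIDIA Container Toolkit with MIG support is installed and MIG is already configured. The MIG UUIDs, model repository path, and image tag are placeholders to substitute from your own setup, and the exact --gpus device syntax can vary with toolkit version:

```bash
# List GPUs and MIG devices to get their UUIDs (shown as MIG-<uuid> entries).
nvidia-smi -L

# Run one Triton container per MIG instance, each seeing exactly one device.
# Host ports must differ so the HTTP/gRPC/metrics endpoints do not collide.
docker run -d --gpus '"device=MIG-<uuid-0>"' \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models

docker run -d --gpus '"device=MIG-<uuid-1>"' \
  -p 8010:8000 -p 8011:8001 -p 8012:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models
```

On Kubernetes the same idea applies: each Triton pod requests a single MIG resource (for example nvidia.com/mig-1g.10gb: 1, assuming the device plugin's mixed MIG strategy) and a Service load-balances requests across the pods.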
-
Hi,
When I started Triton Inference Server using Docker on a machine with 4 V100 GPUs, it distributed the load across the GPUs I passed in (I gave the device IDs as --gpus '"device=0,1,2"'). Then I aimed to do the same thing with multiple MIG instances on an A100 instance. I used this manual to deploy Triton Inference Server to my Kubernetes cluster. I have an A100 80GB GPU partitioned into 7 MIG instances of type 1g.10gb. However, Triton Inference Server does not seem to distribute the load after I assigned 4 MIG instances (instead of 1) to the container; it only uses the GPU with ID 0.
Is there a way to use multiple MIG instances with Triton Inference Server? Any kind of help is appreciated.
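As a quick diagnostic (a sketch; the pod name is a placeholder), you can check which devices the Triton container can actually enumerate. With MIG enabled, a single CUDA process can only use one of the attached MIG instances even if several are mounted:

```bash
# List the devices visible inside the Triton pod; each MIG instance shows up
# as a separate MIG-<uuid> entry, but one CUDA process can use only one.
kubectl exec -it <triton-pod-name> -- nvidia-smi -L
```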