Replies: 1 comment
-
Hi @nybbles, thanks for such a detailed discussion post!
This has historically been one of the major selling points of Triton Inference Server: multi-backend serving with optimized C++ implementations behind a single interface, via Python/REST/gRPC. If you're looking to do things in-process with your Python workloads and avoid the network overhead, then it sounds like you're on the right track looking into the new Python in-process API. We'd definitely be interested to hear any early feedback or thoughts you may have from trying it out.
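To make that concrete, the in-process flow looks roughly like this with the `tritonserver` wheel (a rough sketch only — the API is still evolving at the 2.41.0.dev0 stage, so names and signatures may shift between releases; the repository path and model/tensor names below are illustrative):

```python
import numpy
import tritonserver  # from the tritonserver-*.whl bundled with the Python API

# Start Triton inside this Python process -- no HTTP/gRPC hop involved.
server = tritonserver.Server(model_repository="/workspace/models")  # path is illustrative
server.start()

# Look up a loaded model and run inference directly.
model = server.model("my_model")  # model name is illustrative
responses = model.infer(
    inputs={"INPUT0": numpy.array([[1.0, 2.0]], dtype=numpy.float32)}
)

# Responses are iterable; output tensors support DLPack conversion to numpy.
for response in responses:
    print(numpy.from_dlpack(response.outputs["OUTPUT0"]))

server.stop()
```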
CC @nnshah1 who may be able to help here.
For now, you'd probably need to implement a simple dummy/mock `triton_python_backend_utils` module that your code could interact with in the same way.
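A minimal stand-in could look something like this (a sketch, not an official shim — it mirrors only the handful of `triton_python_backend_utils` names discussed in this thread, backed by plain numpy arrays):

```python
# mock_pb_utils.py -- a tiny stand-in for triton_python_backend_utils,
# covering only the pieces needed to drive a TritonPythonModel in-process.
import numpy as np


class Tensor:
    """Named tensor backed by a numpy array, mirroring pb_utils.Tensor."""

    def __init__(self, name, data):
        self._name = name
        self._data = np.asarray(data)

    def name(self):
        return self._name

    def as_numpy(self):
        return self._data


class InferenceRequest:
    """Carries input tensors into a model's execute() method.

    Note: simplified relative to the real BLS InferenceRequest, which also
    takes model_name, requested_output_names, etc.
    """

    def __init__(self, inputs):
        self._inputs = {t.name(): t for t in inputs}

    def inputs(self):
        return list(self._inputs.values())


class InferenceResponse:
    """Carries output tensors (or an error) back out of execute()."""

    def __init__(self, output_tensors=None, error=None):
        self._outputs = {t.name(): t for t in (output_tensors or [])}
        self._error = error

    def output_tensors(self):
        return list(self._outputs.values())


def get_input_tensor_by_name(request, name):
    """Lookup helper matching the real utility's name."""
    return request._inputs.get(name)
```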
-
We are working on streamlining our ML infrastructure (which runs on an IPC in an in-line manufacturing device). We do not have a standardized ML inference API for our models and supporting code. Our models are mostly Python-based and eventually call into sklearn or PyTorch.
For now, we are considering adopting the `TritonPythonModel` API as an internal standard, for the benefits of a standard API, including being able to easily adopt Triton in the future.

The `execute` method in `TritonPythonModel` expects `pb_utils.InferenceRequest` and `pb_utils.InferenceResponse`, and there is related utility code in `triton_python_backend_utils`. We would need to use this utility code to construct `pb_utils.InferenceRequest`s from our own internal abstractions and then translate the `pb_utils.InferenceResponse`s back to our own internal abstractions. `triton_python_backend_utils` is only available from within Triton Inference Server itself, or from `Triton_Inference_Server_Python_API/deps/tritonserver-2.41.0.dev0-py3-none-any.whl`, which I saw is built by the Docker build process for the `tritonserver` Docker image.

For now, we want to continue running our model inference in-process, and just want to adopt a standard API, like `TritonPythonModel`'s, for our models.

Here are my questions:

- Is there documentation for `pb_utils.InferenceRequest` and related abstractions that you'd recommend?

Also, if this approach is flawed in some way that I'm missing, I would love to be alerted about that. Thank you!
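For concreteness, the shape we would standardize on looks roughly like this (a sketch; the tensor names and the doubling stub are illustrative, and `mock_pb_utils` is a hypothetical local module standing in for `triton_python_backend_utils` when running outside Triton):

```python
import numpy as np
import mock_pb_utils as pb_utils  # inside Triton: import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Models implement this interface as our internal standard."""

    def initialize(self, args):
        # Real models would load sklearn/PyTorch weights here; a stub for the sketch.
        self.scale = 2.0

    def execute(self, requests):
        # One response per request, as the python_backend contract requires.
        responses = []
        for request in requests:
            features = pb_utils.get_input_tensor_by_name(request, "FEATURES").as_numpy()
            out = pb_utils.Tensor("PREDICTIONS", features * self.scale)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        pass  # release any resources


# In-process harness: translate an internal dict-of-arrays into an
# InferenceRequest, run execute(), and map the response back.
model = TritonPythonModel()
model.initialize(args={})
request = pb_utils.InferenceRequest(inputs=[pb_utils.Tensor("FEATURES", np.ones((1, 4)))])
(response,) = model.execute([request])
internal_result = {t.name(): t.as_numpy() for t in response.output_tensors()}
model.finalize()
```

The idea is that a thin harness like the last few lines owns the translation to and from our internal abstractions, and the model class itself should then drop into Triton's python_backend unchanged.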