What would be the simplest way to use `triton_python_backend_utils` just to work with `pb_utils.InferenceRequest` and related abstractions? #7104

nybbles · 2024-04-11T18:49:37Z

nybbles
Apr 11, 2024

We are working on streamlining our ML infrastructure (which runs on an IPC in a in-line manufacturing device). We do not have a standardized ML inference API for our models and supporting code. Our models are mostly Python-based and eventually call into sklearn or PyTorch).

For now, we are considering adopting the TritonPythonModel API as an internal standard, for the benefits of a standard API, including being able to easily adopt Triton in the future.

The execute method in TritonPythonModel expects pb_utils.InferenceRequest and pb_utils.InferenceResponse, and there is related utility code from triton_python_backend_utils. We would need to use this utility code to construct pb_utils.InferenceRequests from our own internal abstractions and then translate the pb_utils.InferenceResponses back to our own internal abstractions.

triton_python_backend_utils is only available from within Triton inference server itself, or from Triton_Inference_Server_Python_API/deps/tritonserver-2.41.0.dev0-py3-none-any.whl, which I saw is built by the Docker build process for the tritonserver Docker image.

For now, we want to continue running our model inference in-process, and just want to adopt a standard API, like TritonPythonModel's for our models.

Here are my questions:

Is there a rough timeline for Triton inference server in-process Python API to come out of beta? What is it? We are cautious about building and using the wheel (in a limited way, for interacting with those classes) for our purposes when it is in beta.
Is there some alternative way to work with pb_utils.InferenceRequest and related abstractions that you'd recommend?

Also, if this approach is flawed in some way that I'm missing, I would love to be alerted about that. Thank you!

rmccorm4 · 2024-04-16T21:56:46Z

rmccorm4
Apr 16, 2024
Collaborator

Hi @nybbles, thanks for such a detailed discussion post!

We do not have a standardized ML inference API for our models

This is one of the major selling points of the Triton Inference Server historically, multi-backend with optimized C++ implementations through a single interface via python/REST/GRPC.

If you're looking to do things in-process with your python workloads and avoid the network overhead, then it sounds like you're on the right track looking into the new Python in-process API. We'd definitely be interested to hear any early feedback or thoughts you may have from trying it out.

Is there a rough timeline for Triton inference server in-process Python API to come out of beta? What is it? We are cautious about building and using the wheel (in a limited way, for interacting with those classes) for our purposes when it is in beta.

CC @nnshah1 who may be able to help here.

Is there some alternative way to work with pb_utils.InferenceRequest and related abstractions that you'd recommend?

For now, you'd probably need to implement a simple dummy/mock module that could be interacted with in the same way. The pb_utils helper module is a bit special in how it gets injected through the use of pybind I believe. This could also be a way to think about debugging python models standalone without requiring a running server. CC @Tabrizian if you have any insights here.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

What would be the simplest way to use `triton_python_backend_utils` just to work with `pb_utils.InferenceRequest` and related abstractions? #7104

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

What would be the simplest way to use triton_python_backend_utils just to work with pb_utils.InferenceRequest and related abstractions? #7104

Uh oh!

Uh oh!

nybbles Apr 11, 2024

Replies: 1 comment

Uh oh!

rmccorm4 Apr 16, 2024 Collaborator

What would be the simplest way to use `triton_python_backend_utils` just to work with `pb_utils.InferenceRequest` and related abstractions? #7104

nybbles
Apr 11, 2024

rmccorm4
Apr 16, 2024
Collaborator