Description
FLUX.1 Kontext [dev] was released with TensorRT support, which reportedly cuts VRAM use in half and doubles performance: https://blogs.nvidia.com/blog/rtx-ai-garage-flux-kontext-nim-tensorrt/
Would you consider converting the Hunyuan3D shape and texture models to this format? Most of us are running generation on NVIDIA GPUs anyway, since CPU, AMD, and Mac are not really usable with Hunyuan.
I noticed Tencent Hunyuan already has projects supporting TensorRT.
What is their status, and could they be of use here?
http://huggingface.co/Tencent-Hunyuan/TensorRT-libs
https://huggingface.co/Tencent-Hunyuan/TensorRT-engine
I looked at tutorials on converting fp16 PyTorch models to TensorRT, and my research suggests it is possible; there are clear instructions for the conversion process. I also asked an LLM to write a detailed step-by-step guide, and it indicated the conversion is feasible and would need about 31 GB of VRAM to perform.
Would you consider releasing Hunyuan3D 2.0, 2.1, or future models in this TensorRT format? I haven't seen any other issue mention it yet.
Running Hunyuan3D 2.1 smoothly takes 21 GB of VRAM, which is out of reach for many of us who don't have the highest-end GPUs, unless we rent them from a VM service like RunPod. That's a pity when it should be entirely possible to use the GPUs we already own with 8 GB, 12 GB, or 16 GB of VRAM.