Repositories list
20 repositories
ComfyUI-Copilot
Public
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Pixelle-MCP
Public
An Open-Source Multimodal AIGC Solution based on ComfyUI + MCP + LLM https://pixelle.ai
- Awesome Unified Multimodal Models
Ovis-U1
Public
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
- TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance (ICCV 2025)
flashinfer
Public
Parrot
Public
🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.
Marco-o1
Public
TransBench
Public
Wings
Public
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
Meissonic
Public
AutoGPTQ
Public