My long-term goal is to develop intelligent machines capable of perceiving, understanding, and creating multimodal content, such as videos.
ML PhD at MBZUAI
Pinned
-
MetaAgentX/OpenCaptchaWorld (Public)
The first web-based benchmark and platform for evaluating the visual reasoning and interaction capabilities of MLLM-powered agents through diverse and dynamic CAPTCHA puzzles.
JavaScript 36
-
De-Diffusion (Public)
My own code implementation of the model introduced in the paper "De-Diffusion Makes Text a Strong Cross-Modal Interface".
Python 9