RoboML v0.3.1: More Serving Options and More Models
We're excited to announce the release of RoboML v0.3.1, a major step forward in making open-source ML models easier and faster to deploy for robotics applications. This update focuses on real-time performance, multimodal interaction, and expanding RoboML's capabilities across speech, vision, and planning.
What's New
Real-Time Interaction with WebSockets
RoboML's HTTP server now supports WebSocket endpoints, enabling bidirectional communication and real-time streaming for responsive robotic applications.
Streaming LLM/MLLM Support
Language models served through Hugging Face Transformers, as well as multimodal LLMs (MLLMs), now support streaming outputs. This makes them well suited to interactive use cases such as instruction following and human-robot dialogue.
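To illustrate why streaming matters on a robot, here is a minimal, transport-agnostic sketch of a client loop that acts on each chunk as soon as it arrives instead of waiting for the complete reply. It does not use RoboML's API: `fake_token_stream` is a hypothetical stand-in for messages received over a WebSocket connection, and `consume_stream`, `main`, and the example chunks are illustrative names only.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


async def fake_token_stream(chunks: list[str], delay: float = 0.01) -> AsyncIterator[str]:
    # Hypothetical stand-in for a WebSocket receive loop: yields pre-set chunks
    # with a short pause, mimicking tokens arriving from a streaming model.
    for chunk in chunks:
        await asyncio.sleep(delay)
        yield chunk


async def consume_stream(stream: AsyncIterator[str], on_chunk: Callable[[str], None]) -> str:
    """Hand each chunk to `on_chunk` as soon as it arrives, then return the full reply."""
    parts: list[str] = []
    async for chunk in stream:
        on_chunk(chunk)  # act on partial output, e.g. start TTS or update a UI
        parts.append(chunk)
    return "".join(parts)


async def main() -> tuple[str, list[str]]:
    chunks = ["Move ", "to ", "the ", "red ", "cube."]
    seen: list[str] = []
    reply = await consume_stream(fake_token_stream(chunks), seen.append)
    return reply, seen


if __name__ == "__main__":
    reply, seen = asyncio.run(main())
    print(f"{len(seen)} chunks -> {reply!r}")  # prints: 5 chunks -> 'Move to the red cube.'
```

In a real client, `fake_token_stream` would be replaced by iterating over the incoming messages of a WebSocket connection; the endpoint paths and message format RoboML uses are not covered here, so check its documentation before adapting this loop.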
New Planning Model: RoboBrain2.0 by BAAI
We're thrilled to integrate RoboBrain2.0, a state-of-the-art spatial-temporal reasoning model that enables complex planning with closed-loop feedback and real-time scene understanding.
New Voices, New Choices: TTS Upgrades
Two powerful TTS models have been added:
- Bark by SunoAI: Natural, expressive voice generation.
- MeloTTS by MyShell: High-quality multilingual synthesis (EN, ZH, JP, and more).
Whisper STT has also been upgraded to the FasterWhisper backend, delivering faster and more accurate transcriptions.
Improvements & Fixes
Numerous performance enhancements, bug fixes, and stability improvements have been made across the board.
Breaking Changes
- Vector database and encoding model support has been removed to simplify the stack and focus on deployable, robotics-ready models.
Full Changelog
See the full list of changes here:
0.2.3 → 0.3.1
RoboML is growing as a unified platform for deploying multimodal ML in real-world robotics. With real-time capabilities, speech support, and planning models, v0.3.1 brings us one step closer to true embodied intelligence.
Stay tuned, and happy deploying!