
Turtle VLM Workspace

Turtle VLM Workspace Preview

A complete ROS Noetic workspace that turns a (simulated or real) TurtleBot3 Burger with an Intel RealSense depth camera into a conversational robot:

  • Perception: YOLO v8 + MiDaS depth for 3-D object poses, published via a simple service (/get_seen_objects); a minimal client sketch follows this list.
  • Reasoning / Language: lightweight LLM front-end that converts natural-language commands into a structured action dictionary.
  • Control & Navigation: turns these actions into move_base goals or direct cmd_vel twists.
  • Chat GUI: Tkinter interface with speech I/O and image previews.
  • Gazebo world: ready-to-run CPS lab environment with RViz configs.
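For example, any node can query the perception memory with a standard rospy service call. The snippet below is a minimal sketch only: the actual service definition lives in turtle_vlm_chat, so the GetSeenObjects srv name and the response fields (parallel lists of labels and poses) are assumptions for illustration.

#!/usr/bin/env python3
# Sketch of a /get_seen_objects client. The srv type and response fields
# below are assumed for illustration; use the .srv defined in turtle_vlm_chat.
import rospy
from turtle_vlm_chat.srv import GetSeenObjects  # hypothetical srv name

def list_seen_objects():
    rospy.wait_for_service('/get_seen_objects')
    get_objects = rospy.ServiceProxy('/get_seen_objects', GetSeenObjects)
    response = get_objects()
    # Assumed response layout: parallel lists of labels and 3-D poses.
    for label, pose in zip(response.labels, response.poses):
        rospy.loginfo("%s at (%.2f, %.2f, %.2f)", label,
                      pose.position.x, pose.position.y, pose.position.z)

if __name__ == '__main__':
    rospy.init_node('seen_objects_client')
    list_seen_objects()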

Directory layout

turtle_vlm_ws/
├── .catkin_workspace
├── src/
│   ├── turtle_vlm_chat/             # Main package (code + configs)
│   ├── turtle_vlm_gazebo/           # Gazebo world, RViz config, launch files
│   └── turtlebot3_description_custom/  # Custom URDF for TurtleBot3 + RealSense
└── frames.gv

Key scripts

Path                                           Purpose
turtle_vlm_chat/scripts/vlm_node_yolo.py       Runs YOLO v8, estimates 3-D poses, maintains “seen objects” memory
turtle_vlm_chat/scripts/llm_to_goal_node.py    Converts parsed LLM actions into real robot motion
turtle_vlm_chat/scripts/llm_node.py            Thin wrapper that posts prompts to your chosen LLM endpoint (OpenAI, Ollama, …)
turtle_vlm_chat/scripts/chatGUI_SR.py          Desktop GUI with speech recognition (Google) & TTS (gTTS / pyttsx3)
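To make the pipeline concrete, here is a hedged sketch of the kind of mapping llm_to_goal_node.py performs: a parsed action dictionary becomes a move_base goal. The dictionary keys ("action", "target", "pose") and the helper function are illustrative assumptions, not the node's actual schema; only the move_base action interface is standard ROS.

#!/usr/bin/env python3
# Illustrative sketch only: turning a parsed action dictionary into a
# move_base goal. The dictionary schema is assumed, not the one used by
# llm_to_goal_node.py.
import actionlib
import rospy
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def send_goal_from_action(action):
    # Example input (assumed schema):
    #   {"action": "go_to", "target": "chair", "pose": (1.2, 0.5, 0.0)}
    x, y, _yaw = action["pose"]
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0  # identity orientation for simplicity
    client.send_goal(goal)
    client.wait_for_result()

if __name__ == '__main__':
    rospy.init_node('llm_goal_example')
    send_goal_from_action({"action": "go_to", "target": "chair", "pose": (1.2, 0.5, 0.0)})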

Quick-start (simulation)

Tested on Ubuntu 20.04 + ROS Noetic + Python 3.8.

Simulation Preview

# 1. clone (already done if you're reading this from your repo clone)
git clone git@github.com:Anamika-JH/turtle_vlm_ws.git
cd turtle_vlm_ws

# 2. download YOLO + CLIP + SAM checkpoints (~500 MB total)
bash src/turtle_vlm_chat/models/download_models.sh

# 3. install missing Python deps (virtualenv recommended)
pip install -r src/turtle_vlm_chat/requirements.txt

# 4. build workspace
catkin_make
source devel/setup.bash

# 5. launch Gazebo world + navigation stack
roslaunch turtle_vlm_gazebo nav_sim.launch

# 6. start perception
rosrun turtle_vlm_chat vlm_node_yolo.py

# 7. start LLM interface (edit API key env vars first!)
rosrun turtle_vlm_chat llm_node.py

# 8. start action executor
rosrun turtle_vlm_chat llm_to_goal_node.py

# 9. optional: start the chat GUI
rosrun turtle_vlm_chat chatGUI_SR.py
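Once everything is up, a quick sanity check is to drive the base directly over /cmd_vel, bypassing the LLM pipeline entirely. This is a generic TurtleBot3 sketch, not one of the workspace's scripts; the topic name and speed are the stock defaults.

#!/usr/bin/env python3
# Smoke test: nudge the robot forward over /cmd_vel for ~2 seconds,
# then stop. Independent of the LLM pipeline.
import rospy
from geometry_msgs.msg import Twist

rospy.init_node('cmd_vel_smoke_test')
pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
twist = Twist()
twist.linear.x = 0.1   # m/s, well under the Burger's limit
rate = rospy.Rate(10)  # publish at 10 Hz
for _ in range(20):    # ~2 seconds of motion
    pub.publish(twist)
    rate.sleep()
pub.publish(Twist())   # zero twist = stop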

License

This repository is released under the MIT License.

All third-party model checkpoints keep their original licenses:

  • YOLO v8 – © Ultralytics
  • CLIP – © OpenAI
  • SAM – © Meta AI
  • MiDaS – © Intel ISL
