A complete ROS Noetic workspace that turns a (simulated or real) TurtleBot3 Burger with an Intel RealSense depth camera into a conversational robot:

- Perception: YOLO v8 + MiDaS depth for 3-D object poses, published via a simple service (`/get_seen_objects`).
- Reasoning / Language: lightweight LLM front-end that converts natural-language commands into a structured action dictionary (see the sketch after this list).
- Control & Navigation: turns these actions into `move_base` goals or direct `cmd_vel` twists.
- Chat GUI: Tkinter interface with speech I/O and image previews.
- Gazebo world: ready-to-run CPS lab environment with RViz configs.
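The exact schema of the action dictionary is defined in `llm_node.py` / `llm_to_goal_node.py`; the sketch below is an illustration of the idea only, with hypothetical field names, showing how a parsed "move" action could be streamed out as `cmd_vel` twists:

```python
#!/usr/bin/env python3
"""Illustration only: mapping a structured action dictionary onto robot
motion. The field names ("action", "params", ...) are hypothetical; the
real schema lives in turtle_vlm_chat/scripts/llm_to_goal_node.py."""
import rospy
from geometry_msgs.msg import Twist

# Hypothetical LLM output for "drive forward slowly"
action = {"action": "move", "params": {"linear_x": 0.1, "duration_s": 2.0}}

rospy.init_node("action_dict_demo")
pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)

if action["action"] == "move":
    twist = Twist()
    twist.linear.x = action["params"]["linear_x"]
    end = rospy.Time.now() + rospy.Duration(action["params"]["duration_s"])
    rate = rospy.Rate(10)            # stream the twist at 10 Hz
    while rospy.Time.now() < end and not rospy.is_shutdown():
        pub.publish(twist)
        rate.sleep()
    pub.publish(Twist())             # zero twist = stop the robot
```

For goal-directed commands, `llm_to_goal_node.py` instead sends `move_base` goals; a smoke test for that path follows the quick start below.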
Repository layout:

```
turtle_vlm_ws/
├── .catkin_workspace
├── src/
│   ├── turtle_vlm_chat/                 # Main package (code + configs)
│   ├── turtle_vlm_gazebo/               # Gazebo world, RViz config, launch files
│   └── turtlebot3_description_custom/   # Custom URDF for TurtleBot3 + RealSense
└── frames.gv
```
| Path | Purpose |
|---|---|
| `turtle_vlm_chat/scripts/vlm_node_yolo.py` | Runs YOLO v8, estimates 3-D poses, maintains a “seen objects” memory |
| `turtle_vlm_chat/scripts/llm_to_goal_node.py` | Converts parsed LLM actions into real robot motion |
| `turtle_vlm_chat/scripts/llm_node.py` | Thin wrapper that posts prompts to your chosen LLM endpoint (OpenAI, Ollama, …) |
| `turtle_vlm_chat/scripts/chatGUI_SR.py` | Desktop GUI with speech recognition (Google) & TTS (gTTS / pyttsx3) |
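For a quick feel of the perception interface, the sketch below calls `/get_seen_objects` from Python. The service type name `GetSeenObjects` is a stand-in here; the actual `.srv` definition ships inside `turtle_vlm_chat`, so check its `srv/` directory for the real name and request/response fields:

```python
#!/usr/bin/env python3
"""Sketch of a /get_seen_objects client. `GetSeenObjects` is a stand-in
name for the package's actual .srv type."""
import rospy
from turtle_vlm_chat.srv import GetSeenObjects  # hypothetical srv name

rospy.init_node("seen_objects_client")
rospy.wait_for_service("/get_seen_objects")
get_seen = rospy.ServiceProxy("/get_seen_objects", GetSeenObjects)
response = get_seen()   # assumes an empty request; adjust to the real .srv
print(response)         # labeled 3-D object poses from vlm_node_yolo.py
```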
Tested on Ubuntu 20.04 with ROS Noetic and Python 3.8. Quick start:
```bash
# 1. clone (skip if you're already reading this from inside a clone)
git clone git@github.com:Anamika-JH/turtle_vlm_ws.git
cd turtle_vlm_ws

# 2. download YOLO + CLIP + SAM checkpoints (~500 MB total)
src/turtle_vlm_chat/models/download_models.sh

# 3. install missing Python deps (virtualenv recommended)
pip install -r src/turtle_vlm_chat/requirements.txt

# 4. build the workspace
catkin_make
source devel/setup.bash

# 5. launch Gazebo world + navigation stack
roslaunch turtle_vlm_gazebo nav_sim.launch

# 6. start perception
rosrun turtle_vlm_chat vlm_node_yolo.py

# 7. start the LLM interface (edit API key env vars first!)
rosrun turtle_vlm_chat llm_node.py

# 8. start the action executor
rosrun turtle_vlm_chat llm_to_goal_node.py

# 9. optional: start the chat GUI
rosrun turtle_vlm_chat chatGUI_SR.py
```
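Once step 5 is up, you can optionally verify that `move_base` is accepting goals before wiring in the LLM. This is a generic actionlib check, not part of the repo; the target pose below is arbitrary, so pick a free spot in your map:

```python
#!/usr/bin/env python3
"""Optional sanity check (not part of the repo): send one move_base goal
to confirm the navigation stack from nav_sim.launch accepts goals."""
import actionlib
import rospy
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node("nav_smoke_test")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 0.5   # arbitrary free spot in the map
goal.target_pose.pose.orientation.w = 1.0
client.send_goal(goal)
client.wait_for_result()
print(client.get_state())  # 3 == actionlib GoalStatus.SUCCEEDED
```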
This repository is released under the MIT License.
All third-party model checkpoints keep their original licenses:
- YOLO v8 – © Ultralytics
- CLIP – © OpenAI
- SAM – © Meta AI
- MiDaS – © Intel ISL