An intelligent drone navigation system using Q-Learning to autonomously locate targets in 3D environments
Features • Demo • Installation • Usage • Documentation

This project implements an advanced Q-Learning algorithm to train a virtual drone for autonomous navigation in customizable 3D environments. Originally developed as an innovative solution to a university assignment, it showcases the power of reinforcement learning in robotics applications.
For more details, please read the Project Report.
- Reinforcement Learning: Implements Q-Learning with customizable hyperparameters
- Real-time 3D Visualization: Interactive simulation with matplotlib and tkinter
- Dynamic Retraining: Adapt to new targets without restarting
- Optimized Trajectories: Intelligent path smoothing for efficient navigation
- Performance Monitoring: Track training progress and replay best episodes
- Room Simulation: Customizable 3D room environment for drone navigation
- Target Detection: Intelligent algorithm to locate a target in the simulated room
- Reinforcement Learning: Implements Q-Learning for trajectory optimization
- Visualization: Real-time 3D trajectory plotting for training and performance monitoring
- Dynamic Updates: Allows reconfiguration of the target's location with retraining capabilities
- Replay Mechanism: Replays the best navigation trajectory using generated commands

When you launch ChangingTarget.py, you'll be prompted to configure:
• Room dimensions (depth, width, height)
• Target position (x, y, z coordinates)
• Drone starting position
• Number of training episodes
• Maximum steps per episode

The Q-Learning algorithm trains the drone through multiple episodes:
• The drone explores the environment
• Learns from successful and unsuccessful attempts
• Updates its Q-table based on rewards
• A progress bar shows training advancement
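A minimal sketch of this episode loop is shown below, using illustrative names (run_training, env.reset, env.step, and q_table are assumptions for the example, not necessarily the exact API of FunctionsLib.py):

# Illustrative Q-Learning episode loop (helper names are assumptions, not the project's exact API)
import numpy as np
def run_training(env, q_table, num_episodes, max_steps,
                 alpha=0.05, gamma=0.995, epsilon=0.98,
                 epsilon_decay=0.92, epsilon_min=0.01):
    best_reward, best_actions = float("-inf"), []
    for _ in range(num_episodes):
        state = env.reset()                          # drone returns to its start position
        actions, total_reward = [], 0.0
        for _ in range(max_steps):
            if np.random.rand() < epsilon:           # explore: random action
                action = np.random.randint(len(q_table[state]))
            else:                                    # exploit: best known action
                action = int(np.argmax(q_table[state]))
            next_state, reward, done = env.step(action)
            # Q-table update based on the received reward
            q_table[state][action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state][action]
            )
            actions.append(action)
            total_reward += reward
            state = next_state
            if done:                                 # target reached
                break
        epsilon = max(epsilon * epsilon_decay, epsilon_min)
        if total_reward > best_reward:               # keep the best episode for replay
            best_reward, best_actions = total_reward, actions
    return best_actions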
After training, the simulation automatically displays:
• The most efficient path found
• Smoothed trajectory commands
• Target detection confirmation
Without restarting the program:
• Close the simulation window
• Enter new target coordinates
• The drone starts from its last position
• Retraining adapts to the new target location
- Python 3.8 or higher
- pip package manager
- Virtual environment (recommended)
- Clone the repository
git clone https://github.com/Warukho/Reinforcement-Learning-Navigating-Drone.git
cd Reinforcement-Learning-Navigating-Drone
- Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
# Run the main application
python ChangingTarget.py
Follow the interactive prompts as shown in the demo section above.
Reinforcement_Learning_Navigating_Drone/
│
├── dronecore/ # Core drone mechanics
├── images/ # UI assets
├── README_Data/ # Documentation assets
│
├── ChangingTarget.py # Main application entry point
├── FunctionsLib.py # RL algorithms & utilities
├── dronecmds.py # Drone command interface
├── best_episode_commands.py # Replay functionality
│
├── viewermpl.py # Matplotlib visualizer
├── viewertk.py # Tkinter GUI interface
├── mplext.py # 3D plotting extensions
│
└── requirements.txt # Project dependencies
The drone learns optimal navigation strategies through:
- State Space: Discretized 3D coordinates (x, y, z)
- Action Space: 6 directions × variable distances
- Reward System:
  - +1000 for reaching target
  - Proportional rewards for reducing distance
  - Penalties for inefficient movements
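A minimal sketch of a reward of this shape (the constants and scaling here are illustrative, not necessarily the exact values used in FunctionsLib.py):

# Illustrative reward shaping (constants are examples, not the project's exact values)
import numpy as np
def compute_reward(position, target, previous_distance):
    distance = np.linalg.norm(np.array(target) - np.array(position))
    if distance < 1.0:                               # close enough: target reached
        return 1000, distance
    reward = (previous_distance - distance) * 10     # proportional reward for getting closer
    reward -= 1                                      # small per-step penalty for inefficiency
    return reward, distance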
# Q-table update formula
q_table[state][action] = old_value + α * (reward + γ * max(q_table[next_state]) - old_value)
Dynamic State Discretization
The state space automatically adapts to room dimensions:
state_bins = [
np.linspace(0, room_width, round(5 + (room_width ** 0.45))),
np.linspace(0, room_depth, round(5 + (room_depth ** 0.45))),
np.linspace(0, room_height, round(5 + (room_height ** 0.45)))
]
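A continuous drone position can then be mapped onto these bins, for example with np.digitize (an illustrative helper, not necessarily the one in FunctionsLib.py):

# Illustrative mapping from a continuous position to a discrete state
import numpy as np
def discretize(position, state_bins):
    # np.digitize returns the index of the bin each coordinate falls into
    return tuple(int(np.digitize(coord, bins)) for coord, bins in zip(position, state_bins))
# e.g. state = discretize((4.2, 1.7, 2.0), state_bins) for a 10 x 8 x 3 room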
Trajectory Smoothing Algorithm
Optimizes command sequences by:
- Aggregating movements by direction
- Canceling opposing movements
- Prioritizing larger movements
- Chunking commands to respect maximum distance constraints
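A simplified sketch of this kind of smoothing over signed per-axis moves (the project's actual command format may differ):

# Illustrative smoothing: aggregate per axis, cancel opposites, chunk to a maximum distance
from collections import defaultdict
def smooth_commands(moves, max_distance=100):
    # moves: list of (axis, signed_distance) tuples, e.g. [("x", 30), ("x", -10), ("z", 50)]
    totals = defaultdict(float)
    for axis, dist in moves:
        totals[axis] += dist                          # opposing moves cancel out here
    commands = []
    # larger net movements are emitted first
    for axis, dist in sorted(totals.items(), key=lambda kv: abs(kv[1]), reverse=True):
        remaining, sign = abs(dist), 1 if dist >= 0 else -1
        while remaining > 0:                          # respect the maximum distance per command
            step = min(remaining, max_distance)
            commands.append((axis, sign * step))
            remaining -= step
    return commands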
Adaptive Exploration Strategy
Balances exploration vs exploitation:
# Epsilon-greedy approach with decay
epsilon = max(epsilon * epsilon_decay, epsilon_min)
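With the default values in the table below (initial ε = 0.98, decay 0.92, floor 0.01), the exploration rate reaches its floor after roughly 55 episodes. A quick, illustrative way to inspect the schedule:

# Illustrative: inspect how the exploration rate decays across episodes
epsilon, epsilon_decay, epsilon_min = 0.98, 0.92, 0.01
for episode in range(1, 61):
    epsilon = max(epsilon * epsilon_decay, epsilon_min)
    if episode % 10 == 0:
        print(f"episode {episode:2d}: epsilon = {epsilon:.3f}")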
| Parameter | Default Value | Description |
|---|---|---|
| Learning Rate (α) | 0.05 | Controls how quickly the drone learns |
| Discount Factor (γ) | 0.995 | Importance of future rewards |
| Initial Exploration (ε) | 0.98 | Initial randomness in actions |
| Epsilon Decay | 0.92 | Rate of exploration reduction |
| Minimum Epsilon | 0.01 | Minimum exploration rate |
Modify hyperparameters in FunctionsLib.py:
# Training parameters
alpha = 0.05 # Learning rate
gamma = 0.995 # Discount factor
epsilon = 0.98 # Initial exploration rate
epsilon_decay = 0.92
epsilon_min = 0.01
from FunctionsLib import initialize_settings, training_loop, get_training_results, writing_commands
# Initialize environment
settings = initialize_settings()
# Run training
best_actions, best_trajectory = training_loop(
env_with_viewer,
num_episodes=100,
max_steps=500
)
# Generate replay commands
writing_commands(best_actions, settings["room_x"], settings["room_y"],
settings["room_height"], settings["drone_x"], settings["drone_y"],
settings["target_x"], settings["target_y"], settings["target_z"])
To replay the optimal trajectory after training:
python best_episode_commands.py
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Axel Bouchaud--Roche
- Reinforcement Learning implementation
- Dynamic environment adaptation
- Q-Learning algorithm optimization
- Email: axelbouchaudroche@gmail.com
- GitHub: AxelBcr
Pierre Chauvet
- Core framework development
- Drone command interface
- 3D visualization system
- Email: pierre.chauvet@uco.fr
- GitHub: pechauvet
Léo Bugyan
- Co-development
- Report writing
- GitHub: zenk02