This repository contains two custom Gymnasium environments for training an agent to navigate and map a placenta environment, along with scripts for training agents to perform the defined tasks.
Environment v0 is a simple grid environment. The agent is tasked with discovering the whole grid world. The agent is a camera with a viewport; every grid cell inside the viewport changes its value from 0 to 1.
-
After each step the agent observes its position in the grid. Cells inside the viewport change their values. The default viewport dimensions are 3x3 within a 9x9 grid world.
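A minimal sketch of this discovery logic, assuming a NumPy grid and an agent position given as (row, col); the function name `mark_viewport` is illustrative, not the actual environment code:

```python
import numpy as np

def mark_viewport(grid, pos, view=3):
    """Set every cell inside the agent's viewport to 1 (discovered)."""
    half = view // 2
    r, c = pos
    r0, r1 = max(r - half, 0), min(r + half + 1, grid.shape[0])
    c0, c1 = max(c - half, 0), min(c + half + 1, grid.shape[1])
    grid[r0:r1, c0:c1] = 1
    return grid

grid = np.zeros((9, 9), dtype=np.int8)   # default 9x9 grid world
mark_viewport(grid, pos=(4, 4))          # 3x3 viewport centred on the agent
print(int(grid.sum()))                   # 9 cells discovered
```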
-
The agent has 4 defined actions:
- UP
- DOWN
- LEFT
- RIGHT
Each time an action is chosen, the agent takes a step in that direction with a defined step_size. The default step_size is 1.
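One way the action handling could look, assuming grid coordinates clamped to the world bounds; the action indices and helper name are illustrative:

```python
# Hypothetical mapping of the four discrete actions to (row, col) offsets.
ACTIONS = {
    0: (-1, 0),  # UP
    1: (1, 0),   # DOWN
    2: (0, -1),  # LEFT
    3: (0, 1),   # RIGHT
}

def take_step(pos, action, step_size=1, world_size=9):
    """Move by step_size in the chosen direction, clamped to the grid bounds."""
    dr, dc = ACTIONS[action]
    row = min(max(pos[0] + dr * step_size, 0), world_size - 1)
    col = min(max(pos[1] + dc * step_size, 0), world_size - 1)
    return (row, col)

print(take_step((0, 0), action=3))  # RIGHT -> (0, 1)
```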
-
The agent's task is to discover the whole grid world; when it achieves the goal the episode is terminated and a final reward is given. Additionally, the agent receives a reward for the number of cells discovered in each step. To motivate the agent to move and discover new areas, there is a small penalty for taking a step. [Here table with rewards]
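The exact values belong in the reward table above; the snippet below only mirrors the structure described here, and the constants are placeholders rather than the repository's real values:

```python
STEP_PENALTY = -0.1   # placeholder: small penalty for every step taken
CELL_REWARD = 1.0     # placeholder: reward per newly discovered cell
GOAL_REWARD = 100.0   # placeholder: final reward for discovering the whole grid

def compute_reward(newly_discovered_cells, all_cells_discovered):
    """Per-step reward: step penalty plus a bonus for new cells; final bonus on completion."""
    reward = STEP_PENALTY + CELL_REWARD * newly_discovered_cells
    if all_cells_discovered:
        reward += GOAL_REWARD
    return reward, all_cells_discovered
```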
-
The agent learns a policy using Q-Learning with an epsilon-greedy strategy. [Here pic of Q-Learning algorithm]
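For reference, the generic tabular Q-Learning update with epsilon-greedy exploration looks like the sketch below; the state encoding and hyperparameter values here are illustrative and not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 9 * 9, 4          # illustrative: one state per grid position, 4 moves
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative hyperparameters

def choose_action(state):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """Standard tabular Q-Learning temporal-difference update."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```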
Environment v1 is more complicated than v0. Instead of a grid world, the agent moves its viewport over a placenta segmentation image. Every time the agent discovers a new area, it maps all white pixels (vessels) into its in-memory map.
-
After each step the agent observes the part of the placenta image that fits inside its viewport. The captured viewport is saved to a map in memory. The default viewport dimensions are 256x256.
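A sketch of how a viewport crop could be accumulated into an in-memory map, assuming the segmentation image is a 2D array with white vessel pixels stored as 255; the function and variable names are illustrative:

```python
import numpy as np

VIEWPORT = 256  # default viewport size in pixels

def capture_viewport(image, memory_map, top_left):
    """Crop the current viewport from the image and merge it into the memory map."""
    y, x = top_left
    patch = image[y:y + VIEWPORT, x:x + VIEWPORT]
    memory_map[y:y + VIEWPORT, x:x + VIEWPORT] = np.maximum(
        memory_map[y:y + VIEWPORT, x:x + VIEWPORT], patch
    )
    return patch  # the observation for this step

image = np.zeros((1024, 1024), dtype=np.uint8)  # stand-in for a placenta segmentation image
memory_map = np.zeros_like(image)
obs = capture_viewport(image, memory_map, (128, 256))
```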
-
The agent has 4 defined actions:
- UP
- DOWN
- LEFT
- RIGHT
Each time an action is chosen, the agent takes a step in that direction with a defined step_size. The default step_size is 64.
-
The agent's task is to discover most of the white pixels (90%) in the segmented image; when it achieves the goal the episode is terminated and a final reward is given. Additionally, the agent receives a reward for the number of white pixels discovered in each step. To motivate the agent to move and discover new areas, there is a small penalty for taking a step. [Here table with rewards]
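As in v0, the actual values belong in the reward table; the sketch below only mirrors the description (step penalty, per-pixel reward, termination at the 90% coverage threshold) with placeholder constants:

```python
import numpy as np

STEP_PENALTY = -1.0    # placeholder values, not the real reward table
PIXEL_REWARD = 0.01    # placeholder: reward per newly discovered white pixel
GOAL_REWARD = 100.0    # placeholder: final reward
COVERAGE_GOAL = 0.9    # episode terminates once 90% of white pixels are discovered

def v1_reward(image, memory_map, new_white_pixels):
    """Per-step reward plus a termination check against the coverage goal."""
    total_white = np.count_nonzero(image)
    discovered_white = np.count_nonzero(memory_map)
    terminated = discovered_white >= COVERAGE_GOAL * max(total_white, 1)
    reward = STEP_PENALTY + PIXEL_REWARD * new_white_pixels
    if terminated:
        reward += GOAL_REWARD
    return reward, terminated
```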
-
The agent learns a policy using Deep Q-Learning (DQL) with 2 custom convolutional neural networks (CNNs); a generic sketch of such a network is shown after the hyperparameter list below. [Here pic of architecture of network]
- learning_rate = 0.001
- gamma = 0.9 (discount rate)
- network_sync_rate = 10 (number of steps the agent takes before syncing the policy and target network)
- replay_memory_size = 1000 (size of replay memory)
- mini_batch_size = 32 (size of the training data set sampled from the replay memory)
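The architecture picture above is the authoritative reference; the model below is only a generic CNN Q-network sketch, sized for 256x256 single-channel viewports and 4 actions, to illustrate the kind of policy/target network pair these hyperparameters drive:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative CNN Q-network: a 256x256 grayscale viewport in, 4 Q-values out."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, 256, 256)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.head(self.features(x))

# Two copies, as in standard DQN: a policy network and a target network.
policy_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(policy_net.state_dict())  # re-synced every network_sync_rate steps
optimizer = torch.optim.Adam(policy_net.parameters(), lr=0.001)
```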
- `python3 v1/v1_camera_train_dqn.py` - training with the custom DQN implementation
- `python3 v1/v1_sb3.py` - training with the Stable Baselines3 (SB3) DQN implementation
- Gymnasium
- PyTorch
- Stable Baselines3
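A possible way to install these dependencies from PyPI (package names `gymnasium`, `torch`, `stable-baselines3`); the repository may expect different or pinned versions:

```
pip install gymnasium torch stable-baselines3
```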
- package the environments for installation with pip
- change how the figure is displayed when render = False
- save the model every X episodes under different names
- capture learning time and debug info
- add interactive plots to monitor learning
- optimize calculations, e.g. normalize image pixels
- finish README
- play with parameters [num episodes, hyperparameters, step_size, viewport_size]
- v3 with a MuJoCo environment
- spelling check
- alternative algorithm for mapping (split the viewport into 4 parts, calculate each part's white pixels, and decide based on that)