aipipeline is a library for running AI pipelines and monitoring their performance, e.g. balanced accuracy. Pipelines may include object detection, clustering, classification, and vector search algorithms. It is designed for a number of projects at MBARI that require advanced workflows to process large amounts of images or video. After workflows are developed, they may be moved to the project repositories for production use.
See the MBARI Internal AI documentation for more information on the tools and services used in the pipelines and what is coming in the core roadmap.
Example plots: t-SNE, confusion matrix, and accuracy analysis of exemplar data.
Three tools are required to run the code in this repository:
We recommend the Miniconda version of Anaconda to manage Python versions and virtual environments. It works well across all platforms.
Install it on macOS with the following command:
brew install miniconda
or on Ubuntu by downloading and running the official installer:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
A container runtime such as Docker is also required; it lets you run the pipeline code inside containers.
The just tool. This is a handy command runner for the scripts in this project: it is easier to use than make and cleaner than bash scripts. Try it out!
Install it on macOS with the following command:
port install just
or on Ubuntu with the following command:
sudo apt install just
Clone the repository and run the setup command.
git clone https://github.com/mbari-org/aipipeline.git
cd aipipeline
just setup
Sensitive information is stored in a .env file in the root directory of the project. Create a .env file there with the following contents:
TATOR_TOKEN=your_api_token
REDIS_PASSWORD=your_redis_password
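The loader below is an illustrative sketch for checking that the .env file is well formed, not the project's actual configuration code; only the two variable names come from this README.

```python
import tempfile
from pathlib import Path

def load_env(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Example .env matching the two variables this README requires.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "TATOR_TOKEN=your_api_token\n"
        "REDIS_PASSWORD=your_redis_password\n"
    )
    secrets = load_env(env_path)
```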
Recipes are available to run common operations and processing pipelines. To see the available recipes, run the following command:
just list
list — List recipes
install — Setup the environment
cp-env — Copy the default .env file to the project
update_trackers — Update the environment (run after checking out code changes)
update-env — Update environment
cp-core — Copy core dev code to the project on Doris
cp-dev-cfe — Copy CFE dev code to the project on Doris
cp-dev-ptvr — Copy Planktivore dev code to the project on Doris
cp-dev-uav — Copy UAV dev code to the project on Doris
cp-dev-bio — Copy Bio dev code to the project on Doris
cp-dev-i2map — Copy i2MAP dev code to the project on Doris
cp-dev-vss — Copy Vector Search System (VSS) dev code to the project on Doris
init-labels project='uav' leaf_type_id='19' — Initialize labels for quick lookup
plot-tsne-vss project='uav' — Generate a t-SNE plot of the VSS database
optimize-vss project='uav' *more_args=""
calc-acc-vss project='uav' — Calculate VSS accuracy after download and optimization
reset-vss-all — Reset all VSS data (dangerous)
reset-vss project='uav' — Reset the VSS database for a project
remove-vss project='uav' *more_args="" — Remove a VSS entry (e.g., --doc 'doc:marine organism:*')
init-vss project='uav' *more_args="" — Initialize VSS for a project
load-vss project='uav' — Load precomputed exemplars into VSS
gen-stats-csv project='UAV' data='...' — Generate training-data stats from downloaded data; aggregates stats for nested directories
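The gen-stats-csv recipe aggregates training-data statistics over nested directories. A minimal sketch of that kind of aggregation follows; the directory layout, class names, and CSV columns here are assumptions, not the recipe's actual format.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def class_counts(root: Path) -> Counter:
    """Count images per class, recursively; the class is taken to be
    the name of each image's parent directory."""
    counts = Counter()
    for p in root.rglob("*"):
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
            counts[p.parent.name] += 1
    return counts

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical nested layout: <mission>/<label>/<image>
    for name in ["a/Copepoda/1.jpg", "a/Copepoda/2.png", "b/Larvacea/1.jpg"]:
        f = root / name
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    stats = class_counts(root)
    out = root / "stats.csv"
    with out.open("w", newline="") as fh:
        w = csv.writer(fh)
        w.writerow(["label", "count"])
        for label, n in sorted(stats.items()):
            w.writerow([label, n])
    rows = out.read_text().splitlines()
```

Note that counts for the same label are merged across different mission directories, which is what "aggregate stats for nested directories" suggests.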
load-cfe-isiis-videos missions="" — Load CFE ISIIS mission videos
load-cfe-isiis-sdcat data_dir="" stride="14" — Load CFE ISIIS detections/clusters
cluster-cfe-isiis roi_dir="..." save_dir="..." — Cluster CFE ISIIS Hawaii frames
cluster-cfe-isiis-hawaii-p1 — First-pass clustering for CFE Hawaii
cluster-cfe-isiis-hawaii-p2 p1_dir="" — Second-pass clustering
gen-cfe-data — Generate training data for CFE
transcode-cfe-isiis-rc — Transcode Rachel Carson videos
transcode-cfe-isiis-hawaii — Transcode Hawaii videos
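The transcode recipes convert mission video for downstream use. The sketch below only builds a generic ffmpeg command line; the codec settings shown are common defaults and an assumption here, not the recipes' actual flags.

```python
from pathlib import Path

def transcode_cmd(src: Path, dst: Path, crf: int = 23) -> list[str]:
    """Build an ffmpeg argument list to transcode a video to H.264 MP4.
    Settings are generic assumptions, not this project's configuration."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-c:v", "libx264",          # H.264 video codec
        "-crf", str(crf),           # constant-rate-factor quality
        "-pix_fmt", "yuv420p",      # widest player compatibility
        str(dst),
    ]

# Build (but do not run) the command for a hypothetical input file.
cmd = transcode_cmd(Path("dive.avi"), Path("dive.mp4"))
```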
predict-vss-velella — Predict Velella images using VSS
run-mega-stride-bio video='...' — Mega stride on a bio video
run-mega-track-bio video='...' — Mega tracking on a dive
run-mega-track-test-1min — 1-minute test video
run-mega-track-test-fastapiyv5 — With FastAPI
gen-bio-data image_dir="" — Generate training data for either classification or detection models
run-ctenoA-prod — Inference on videos in TSV
run-mega-inference — Mega inference on one video
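Several recipes sample every N-th video frame rather than processing all of them (e.g., run-mega-stride-bio, and load-cfe-isiis-sdcat with stride="14"). A minimal illustration of stride sampling; the pipeline's real sampling logic may differ.

```python
def stride_frames(total_frames: int, stride: int) -> list[int]:
    """Return the indices of frames to process when keeping
    every `stride`-th frame, starting at frame 0."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, total_frames, stride))

# For a hypothetical 100-frame clip with the stride value 14:
frames = stride_frames(total_frames=100, stride=14)
```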
load-ptvr-images images='tmp/roi' *more_args="" — Load Planktivore ROI images
cluster-ptvr-images *more_args="" — Cluster Planktivore ROI images
load-ptvr-clusters clusters='tmp/roi/cluster.csv' *more_args="" — Load Planktivore ROI clusters
rescale-ifcb-images collection="2014" — Rescale IFCB images
rescale-ptvr-images collection="..." — Rescale Planktivore images
download-rescale-ptvr-images collection="..." — Download and rescale Planktivore images
cluster-ptvr-sweep roi_dir='...' save_dir='...' device='cuda' — Run a cluster sweep on Planktivore data
gen-ptvr-lowmag-data — Generate low-mag training data
init-ptvr-lowmag-vss — Init the VSS DB for low-mag Planktivore data
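The rescale recipes resize ROI images for model input. One common approach, shown here purely as an assumption about what "rescale" involves, is to fit each image inside a fixed square while preserving its aspect ratio.

```python
def rescale_dims(w: int, h: int, target: int) -> tuple[int, int]:
    """Compute a new (width, height) that fits inside a target x target
    square while preserving aspect ratio. Padding to a square canvas
    would typically follow; the exact behavior of the rescale recipes
    is not documented here."""
    scale = target / max(w, h)
    return max(1, round(w * scale)), max(1, round(h * scale))

# A hypothetical 640x480 ROI scaled to fit a 224-pixel square:
dims = rescale_dims(640, 480, 224)
```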
IMPORTANT: Many of these commands are now run on the production server, uav.shore.mbari.org. See the Bitbucket uavprocessing repo for more details.
cluster-uav *more_args="" — Cluster UAV missions (only run on the production server)
detect-uav *more_args="" — Detect UAV missions (only run on the production server)
load-uav-images — Load UAV mission images (only run on the production server)
load-uav type="cluster" — Load UAV detections/clusters (only run on the production server)
fix-uav-metadata — Fix UAV metadata (lat/lon/alt)
detect-uav-test — Test detection on UAV missions
gen-uav-data — Generate training data
run-mega-stride-i2map video='...' vit_model='...' version='...'
run-mega-track-i2map video='...' vit_model='...' version='...'
cluster-i2mapbulk — Run inference & clustering on i2MAP bulk data
transcode-i2map — Transcode i2MAP .mov files to .mp4 for use with Tator
load-i2mapbulk data='data'
download-i2mapbulk-unlabeled — Get unlabeled data
gen-i2map-data — Generate training data from the mantis.shore.mbari.org server for either classification or detection models
gen-i2mapbulk-data — Generate training data from the i2map.shore.mbari.org server for either classification or detection models
replace-m3-urls — Replace m3 URLs with Mantis URLs in the database
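The replace-m3-urls recipe rewrites media URLs in the database. The helper below illustrates the host-swapping idea with placeholder host names; the recipe's real source and target servers are configured in the project.

```python
from urllib.parse import urlsplit, urlunsplit

def swap_host(url: str, old_host: str, new_host: str) -> str:
    """Replace the host of a URL, leaving the scheme, path, and query
    intact. Host names used below are placeholders, not the project's
    actual servers."""
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url  # leave unrelated URLs untouched
    netloc = parts.netloc.replace(old_host, new_host)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

new_url = swap_host("https://m3.example.org/media/v1.mp4",
                    "m3.example.org", "mantis.example.org")
```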
---
- aidata - A tool for extract, transform, load, and download operations on AI data.
- sdcat - Sliced Detection and Clustering Analysis Toolkit; a tool to detect and cluster objects in images.
- deepsea-ai - A tool to train and run object detection and tracking on video at scale in the cloud (AWS).
- fastapi-yolov5 - A RESTful API for running YOLOv5 object detection models on images either locally or in the cloud (AWS).
- fastapi-vss - A RESTful API for vector similarity search using foundation models.
- fastapi-tator - A RESTful API server for bulk operations on a Tator annotation database.
🗓️ Last updated: 2025-08-13