- Introduction
- Project Overview
- Environment Setup
- Object Detection with YOLOv8
- Custom Training with Roboflow
- Object Tracking & Unique ID Assignment
- Team Assignment using KMeans Clustering
- Camera Motion Estimation with Optical Flow
- Perspective Transformation for Real-World Metrics
- Court Keypoint Detection
- Player Speed & Distance Calculation
- Full Pipeline Walkthrough
- Conclusion & Future Work
- References
## Introduction

FootballIQ is an end-to-end AI/Computer Vision system designed to analyze football games from broadcast video. It detects and tracks players, referees, and the ball, then extracts advanced analytics such as player speed, distance covered, team control, and spatial understanding of the football field using court keypoint detection. The system leverages recent advances in deep learning, including YOLOv8 for object detection, custom training for football-specific classes, advanced tracking, clustering, optical flow, perspective transformation, and field keypoint detection.
## Project Overview

This project demonstrates a complete real-world application of computer vision, machine learning, and data analytics in sports. It is accessible to both beginners and experienced ML engineers, and extensible to other sports and analytics use cases.
Key Capabilities:
- Object Detection: Detects players, referees, and ball using YOLOv8.
- Custom Training: Fine-tunes detection models on football-specific datasets for higher accuracy.
- Object Tracking: Assigns unique IDs to each player and ball, maintaining identity across frames.
- Team Assignment: Uses pixel clustering (KMeans) to segment and assign players to teams based on jersey color.
- Camera Motion Correction: Estimates and removes camera movement using optical flow.
- Court Keypoint Detection: Detects keypoints (corners, penalty boxes, center circle, etc.) on the football field to enable accurate perspective transformation and spatial analytics.
- Real-World Analytics: Measures player speed and distance covered in meters via perspective transformation.
- Visualization: Annotates frames with bounding boxes, IDs, team assignments, calculated stats, and pitch keypoints.
## Environment Setup

- Python 3.8+
- Jupyter Notebook (main notebooks)
- Required libraries: `ultralytics`, `opencv-python`, `supervision`, `scikit-learn`, `numpy`, `matplotlib`
- Pretrained/fine-tuned model weights: download from Google Drive

Install dependencies:

```bash
pip install ultralytics opencv-python supervision scikit-learn numpy matplotlib
```
## Object Detection with YOLOv8

YOLO (You Only Look Once) is a family of state-of-the-art, real-time object detectors. YOLOv8, provided by Ultralytics, offers fast and accurate detection, making it ideal for analyzing football matches, where both speed and accuracy are crucial.
Model used: `yolov8m` (medium-sized model, pre-trained on COCO)

- Detects 80 common classes, including `person` and `sports ball`.
- Outputs: bounding box coordinates (`x1, y1, x2, y2`), class label, and confidence score.
- Load YOLOv8 via Ultralytics API.
- Run inference on each frame of the input video.
- Collect bounding boxes, class labels, and probabilities.
- Draw results with OpenCV for visualization, as sketched below.
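A minimal detection sketch using the Ultralytics API; the video file name is illustrative:

```python
from ultralytics import YOLO
import cv2

model = YOLO("yolov8m.pt")  # medium COCO-pretrained weights

cap = cv2.VideoCapture("input_video.mp4")  # illustrative file name
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame)[0]  # one Results object per frame
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])       # bounding box corners
        label = model.names[int(box.cls)]             # class name, e.g. "person"
        conf = float(box.conf)                        # confidence score
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{label} {conf:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```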

Challenges and solutions:

- Problem: The ball was detected in only a few frames; it was often missed due to its small size, occlusions, or similarity to the background.
- Solution: Switched to YOLOv5 for ball detection, which proved more robust for small, fast-moving objects in football footage.
- Problem: The `person` class picked up the audience and people outside the playing field.
- Solution: Trained a custom model on the Roboflow Football Player Detection Dataset, which focuses on in-field players and football-specific classes.
## Custom Training with Roboflow

- Downloaded annotated football images from Roboflow.
- Annotated classes: `{0: 'ball', 1: 'goalkeeper', 2: 'player', 3: 'referee'}`
- Split into train/validation sets.
- Used YOLOv8's training script to fine-tune on the new dataset, as sketched below.
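A minimal fine-tuning sketch; the dataset path and epoch count are illustrative, not the project's exact settings:

```python
from ultralytics import YOLO

# Start from COCO-pretrained weights and fine-tune on the Roboflow export.
model = YOLO("yolov8m.pt")
model.train(
    data="football-players-detection/data.yaml",  # data.yaml from the Roboflow export (illustrative path)
    epochs=100,                                   # illustrative; tune to your compute budget
    imgsz=640,
)
metrics = model.val()  # precision, recall, and mAP on the validation split
```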
Training results:

- Training and validation losses decreased steadily, indicating effective learning.
- Precision, recall, and mAP improved for all classes.

## Object Tracking & Unique ID Assignment

Detection alone is insufficient; we need to track individuals and the ball across frames to analyze movement, speed, and interactions.
- Used the Supervision library for object tracking.
- Each detected player or ball receives a unique tracker ID.
- Custom logic ensures correct identity assignment, even during occlusions or close encounters.
Note: Closest bounding box assignment can fail in close calls; advanced matching algorithms (e.g., Hungarian algorithm) are used for robustness.
- Players are annotated with an ellipse and their unique tracker ID.
- The ball is annotated with a triangle.
- Tracks are drawn over time to visualize player and ball movement paths; see the sketch below.
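A minimal tracking-and-annotation sketch using Supervision's ByteTrack wrapper; the weights path is illustrative, and class indices follow the custom dataset mapping above:

```python
import supervision as sv
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # fine-tuned weights (illustrative path)
tracker = sv.ByteTrack()
ellipse = sv.EllipseAnnotator()    # ellipse under each player
triangle = sv.TriangleAnnotator()  # triangle over the ball
labels = sv.LabelAnnotator()

for frame in sv.get_video_frames_generator("input_video.mp4"):
    detections = sv.Detections.from_ultralytics(model(frame)[0])
    detections = tracker.update_with_detections(detections)
    # detections.tracker_id now carries a stable unique ID per object.
    players = detections[detections.class_id == 2]  # 2 = 'player' in the custom classes
    ball = detections[detections.class_id == 0]     # 0 = 'ball'
    frame = ellipse.annotate(frame, players)
    frame = labels.annotate(frame, players,
                            labels=[f"#{tid}" for tid in players.tracker_id])
    frame = triangle.annotate(frame, ball)
```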
## Team Assignment using KMeans Clustering

Assigning detected players to teams is done by analyzing the dominant color of their jerseys.
- Extract the pixel region inside each player's bounding box.
- Use KMeans clustering to segment pixels and identify the dominant color.
- Compare colors to known team palettes to assign a team label.
- Visualize with colored bounding boxes or labels. A clustering sketch follows this list.
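A minimal sketch of the jersey-color step, assuming a BGR frame and a pixel bounding box; the background/jersey split heuristic is a simplifying assumption, not necessarily the project's exact logic:

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_jersey_color(frame, box):
    """Return the dominant jersey color (BGR) inside a player's bounding box."""
    x1, y1, x2, y2 = box
    crop = frame[y1:y2, x1:x2]
    top_half = crop[: crop.shape[0] // 2]           # jersey region heuristic
    pixels = top_half.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=2, n_init=10).fit(pixels)
    # Assume the cluster covering the crop's corners is grass/background,
    # so the remaining cluster is the jersey (a simplifying assumption).
    grid = km.labels_.reshape(top_half.shape[:2])
    background = np.bincount([grid[0, 0], grid[0, -1],
                              grid[-1, 0], grid[-1, -1]]).argmax()
    return km.cluster_centers_[1 - background]

# Team assignment: compare each jersey color against known team palettes,
# or cluster all players' jersey colors into two team groups.
```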
## Camera Motion Estimation with Optical Flow

To measure player movement accurately, we must separate true player motion from apparent motion caused by camera panning and zooming.
- Calculate optical flow between consecutive frames using OpenCV's `calcOpticalFlowFarneback` or a similar method.
- Estimate the average camera movement in X and Y.
- Subtract the camera motion from each player's movement to get true player displacement, as sketched below.
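A minimal camera-motion sketch with Farneback dense optical flow; the parameter values are typical defaults, not tuned settings:

```python
import cv2
import numpy as np

def camera_motion(prev_frame, frame):
    """Estimate the average (dx, dy) camera shift between two frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    dx = float(np.mean(flow[..., 0]))  # average horizontal flow ≈ camera pan
    dy = float(np.mean(flow[..., 1]))  # average vertical flow ≈ camera tilt
    return dx, dy

# True player displacement per frame = tracked displacement - (dx, dy).
```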
## Perspective Transformation for Real-World Metrics

Pixel movement is not equivalent to real-world distance because of perspective.
- Use OpenCV's perspective transformation (`cv2.getPerspectiveTransform`, `cv2.warpPerspective`) with manually selected reference points (e.g., corners of the field).
- Transform image coordinates into a "bird's-eye view" in which distances are proportional to meters.
- All player movement can then be measured in meters; see the sketch below.
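A minimal mapping sketch; the four pixel reference points below are illustrative values for one camera angle, paired with the field corners of a standard 105 × 68 m pitch:

```python
import cv2
import numpy as np

# Image positions of the four visible field corners (illustrative pixel values).
src = np.float32([[110, 1035], [265, 275], [1640, 260], [1850, 915]])
# The same corners in meters on a 105 x 68 m pitch.
dst = np.float32([[0, 68], [0, 0], [105, 0], [105, 68]])

M = cv2.getPerspectiveTransform(src, dst)

def to_pitch_meters(point_xy):
    """Map a single image point (x, y) to pitch coordinates in meters."""
    p = np.float32([[point_xy]])            # shape (1, 1, 2), as OpenCV expects
    return cv2.perspectiveTransform(p, M)[0, 0]
```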
## Court Keypoint Detection

To achieve accurate spatial analytics and perspective correction, our system includes an explicit court keypoint detection module. This module identifies essential field features such as the four corners, penalty box corners, center circle, center spot, and, where visible, the penalty spots and arcs. These keypoints serve as anchors for robust perspective transformation.
How it works:
- A keypoint detection model (can be a lightweight CNN, YOLO keypoint head, or custom solution) is trained to localize specific field landmarks in each frame.
- Detected keypoints allow automatic mapping between image pixels and real-world field locations, improving the accuracy of all geometric computations.
- These keypoints are used to:
  - Calibrate the homography (perspective transform) for every frame, making analytics robust even with camera shake or zoom.
  - Provide visual overlays showing field lines and regions, which helps in tactical analysis.
  - Enable advanced metrics such as heatmaps, possession zones, and offside-line calculation.
Typical Keypoints Detected:
- Four field corners
- Penalty box corners
- Center circle (center and circumference)
- Center spot
- Penalty spots
Benefits:
- Removes manual intervention for field calibration.
- Handles dynamic camera angles and zooms.
- Essential for real-world metric calculations (distances, speeds, tactical regions). A calibration sketch follows.
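A minimal per-frame calibration sketch, assuming a YOLOv8 pose-style keypoint model fine-tuned on pitch landmarks; the weights file and landmark table are hypothetical:

```python
import cv2
import numpy as np
from ultralytics import YOLO

kp_model = YOLO("pitch_keypoints.pt")  # hypothetical fine-tuned keypoint weights

# Real-world pitch positions (meters) for each landmark index; only the
# four corners are listed here for brevity.
PITCH_LANDMARKS = np.float32([[0, 0], [105, 0], [105, 68], [0, 68]])

def frame_homography(frame, conf_thresh=0.5):
    """Fit an image->pitch homography from the landmarks visible in one frame."""
    result = kp_model(frame)[0]
    if result.keypoints is None or len(result.keypoints.xy) == 0:
        return None
    points = result.keypoints.xy[0].cpu().numpy()    # (num_landmarks, 2) pixels
    confs = result.keypoints.conf[0].cpu().numpy()   # per-landmark confidence
    visible = confs > conf_thresh
    if visible.sum() < 4:                            # a homography needs >= 4 points
        return None
    H, _ = cv2.findHomography(points[visible], PITCH_LANDMARKS[visible], cv2.RANSAC)
    return H
```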
## Player Speed & Distance Calculation

- For each tracked player, calculate the displacement between frames (in meters) and divide by the frame interval to estimate speed (m/s or km/h).
- Aggregate distances to compute total distance covered over time.
- Visualize per-player stats directly on the frame, as in the sketch below.
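A minimal sketch, assuming per-frame player positions already mapped to meters; the frame rate and smoothing window are illustrative:

```python
import numpy as np

FPS = 24        # broadcast frame rate (assumed)
WINDOW = 5      # frames between measurements, to smooth tracking jitter

def speed_and_distance(positions):
    """positions: list of (x, y) pitch coordinates in meters, one per frame."""
    speeds_kmh, total_m = [], 0.0
    for i in range(WINDOW, len(positions), WINDOW):
        step = float(np.linalg.norm(np.subtract(positions[i], positions[i - WINDOW])))
        total_m += step
        speeds_kmh.append(step / (WINDOW / FPS) * 3.6)  # m/s -> km/h
    return speeds_kmh, total_m
```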

## Full Pipeline Walkthrough

1. Input Video → extract frames.
2. YOLOv8 Detection → detect players, referees, and the ball in each frame.
3. Custom Dataset Inference → improve detection for the football scenario.
4. Tracking → assign unique IDs and track objects across frames.
5. Team Color Clustering → assign team labels using KMeans on jersey colors.
6. Camera Motion Estimation → compute and remove camera movement.
7. Court Keypoint Detection → detect and localize field landmarks (corners, circles, spots).
8. Perspective Transform → map pixel positions to real-world coordinates using the keypoints.
9. Speed/Distance Calculation → compute and annotate per-player stats.
10. Visualization → draw bounding boxes, IDs, team colors, speeds, distances, and pitch overlays on frames.
11. Analytics Output → save the annotated video and statistical summaries.
## Conclusion & Future Work

This project demonstrates a comprehensive, real-world application of deep learning and computer vision in sports analytics. By combining state-of-the-art detection, custom training, advanced tracking, clustering, geometric transformations, and court keypoint detection, we deliver actionable insights from football video footage.
Possible Extensions:
- Add event detection (passes, goals, fouls).
- Expand to support other sports.
- Deploy as a web app for real-time analytics.
- Integrate with broadcast overlays for live matches.
- Improve keypoint detection with advanced models or multi-frame temporal smoothing.
## References

- Ultralytics YOLOv8 Documentation
- Roboflow Football Dataset
- Supervision Library
- OpenCV Documentation
- Scikit-learn KMeans
- Homography and Perspective Transformation
For questions, issues, or contributions, please open an issue or submit a pull request.