Commit d022dfa

Add: implement end-to-end latency measurement documentation and text detection script

1 parent f3da799 commit d022dfa
File tree

3 files changed: +345 −0 lines changed
doc/LatencyMeasurement.md

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# End-to-End Latency Measurement — Media Transport Library

This document describes a simple solution for measuring end-to-end latency in Media Transport Library.

## Overview

The solution is based on the ability of FFmpeg to print current timestamps on the sender side (Tx) and the receiver side (Rx), combined with Optical Character Recognition (OCR) to read the timestamps from each received video frame and calculate the delta. OCR was chosen because the text can be recognized reliably even when the picture passes through a lossy video compression algorithm somewhere in the transmission path. To achieve proper measurement accuracy, both the Tx and Rx host machines should be synchronized using Precision Time Protocol (PTP).

> Only the video payloads ST2110-20 and ST2110-22 are supported.
```mermaid
flowchart LR
tx-file((Input video file))
tx-ffmpeg(Tx FFmpeg)
mtl1(ST2110)
NET(network)
mtl2(ST2110)
rx-ffmpeg(Rx FFmpeg)
rx-file((Output video file))

tx-file --> tx-ffmpeg --> mtl1 --> NET --> mtl2 --> rx-ffmpeg --> rx-file

classDef netStyle fill:#ffcccc;
class NET netStyle;
```
## How it works

1. Tx side – The user starts FFmpeg with a special configuration to stream video via ST2110.
1. Rx side – The user starts FFmpeg with a special configuration to receive the video stream from ST2110.
1. Tx side – FFmpeg prints the current timestamp as large text at the top of each video frame and transmits the frame via the network.
1. Rx side – FFmpeg prints the current timestamp as large text at the bottom of each video frame received from the network and saves the result to disk.
1. After the transmission is done, the resulting MPEG video file is on the disk on the Rx side.
1. The user runs the solution script against the MPEG file; the script recognizes the Tx and Rx timestamps in each frame and calculates the average latency from the difference between the timestamps. Additionally, the script generates a latency diagram and stores it in JPEG format on the disk.
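The per-frame delta computed in the last step mirrors the calculation in `text-detection.py`; as a minimal sketch (the `%H:%M:%S:%f` format matches the `drawtext` timestamps shown later in this document):

```python
from datetime import datetime

# Format of the timestamps burned into the frames by the drawtext filter
TIMESTAMP_FORMAT = "%H:%M:%S:%f"

def latency_ms(tx_text: str, rx_text: str) -> float:
    """Milliseconds elapsed between the Tx and Rx timestamp strings."""
    tx = datetime.strptime(tx_text, TIMESTAMP_FORMAT)
    rx = datetime.strptime(rx_text, TIMESTAMP_FORMAT)
    return (rx - tx).total_seconds() * 1000.0

print(latency_ms("13:49:54:100", "13:49:54:665"))
```

Note that Python's `%f` right-pads "100" to 100000 microseconds, i.e. it reads the three `%3N` digits as milliseconds, which is what the solution script relies on.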

## Sample latency diagram

<img src="png/ffmpeg-based-latency-solution-diagram.jpg" width="520">
## Important notice on latency measurement results

> Please note that the calculated average latency is highly dependent on the hardware configuration and the CPU background load, and cannot be treated as an absolute value. The provided solution should only be used for comparing latency across different network configurations and video streaming parameters, as well as for latency stability checks.

## Build and install steps

> It is assumed that Media Transport Library is installed on the Tx and Rx host machines according to the [Build Guide](build.md).

If the FFmpeg plugin was installed earlier, remove its directory before proceeding with the following steps.

1. Install the required packages:

   ```bash
   sudo apt install libfreetype6-dev libharfbuzz-dev libfontconfig1-dev
   ```

1. Clone, build, and install FFmpeg:

   ```bash
   git clone https://github.com/FFmpeg/FFmpeg.git
   cd FFmpeg
   git checkout release/7.0
   # apply the build patch
   git am <repo_dir>/ecosystem/ffmpeg_plugin/7.0/*.patch
   # copy the MTL in/out implementation code
   cp <repo_dir>/ecosystem/ffmpeg_plugin/mtl_*.c -rf libavdevice/
   cp <repo_dir>/ecosystem/ffmpeg_plugin/mtl_*.h -rf libavdevice/
   ./configure --enable-shared --enable-mtl --enable-libfreetype --enable-libharfbuzz --enable-libfontconfig
   make -j "$(nproc)"
   sudo make install
   sudo ldconfig
   ```

1. Install Tesseract OCR:

   ```bash
   sudo apt install tesseract-ocr
   ```

1. Install the required Python packages:

   ```bash
   pip install opencv-python~=4.11.0 pytesseract~=0.3.13 matplotlib~=3.10.3
   ```

1. Set up time synchronization on the host machines.

   > Make sure `network_interface_1` and `network_interface_2` are connected to the same network.

   * __host-1 Controller clock__

     ```bash
     sudo ptp4l -i <network_interface_1> -m 2
     sudo phc2sys -a -r -r -m
     ```

   * __host-2 Worker clock__

     ```bash
     sudo ptp4l -i <network_interface_2> -m 2 -s
     sudo phc2sys -a -r
     ```
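Before measuring, it can be useful to sanity-check the synchronization quality from the `ptp4l`/`phc2sys` console output. The snippet below is only a sketch: it assumes log lines containing `offset <value>` in nanoseconds, as typically printed with `-m`; adjust the regex to whatever your build actually prints.

```python
import re
import statistics

# Assumed log-line shape, e.g. "ptp4l[100.1]: master offset -12 s2 freq +800 path delay 1350";
# adjust the pattern if your ptp4l/phc2sys output differs.
OFFSET_RE = re.compile(r"offset\s+(-?\d+)")

def offset_stats(log_lines):
    """Return (mean, stdev) of the clock offsets, in nanoseconds, found in the log."""
    offsets = [int(m.group(1))
               for line in log_lines
               for m in [OFFSET_RE.search(line)] if m]
    if len(offsets) < 2:
        raise ValueError("not enough offset samples in the log")
    return statistics.mean(offsets), statistics.stdev(offsets)

sample = [
    "ptp4l[100.1]: master offset -12 s2 freq +800 path delay 1350",
    "ptp4l[101.1]: master offset 8 s2 freq +812 path delay 1348",
    "ptp4l[102.1]: master offset -4 s2 freq +805 path delay 1352",
]
print(offset_stats(sample))
```

If the offsets stay within a few microseconds, PTP synchronization is far more accurate than the millisecond resolution of the burned-in timestamps.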

## Example – Measuring transmission latency between two FFmpeg instances on different hosts

This example demonstrates sending a video file from the first FFmpeg instance to the second FFmpeg instance via Media Transport Library, and then calculating the transmission latency from the recorded video.
1. Start the Receiver side FFmpeg instance:

   ```bash
   sudo ffmpeg -y \
     -f mtl_st20p \
     -p_port 0000:af:01.0 \
     -p_sip 192.168.96.2 \
     -p_rx_ip 239.168.85.20 \
     -udp_port 20000 \
     -payload_type 96 \
     -fps 59.94 \
     -pix_fmt yuv422p10le \
     -video_size 1920x1080 \
     -i - \
     -vf \
     "drawtext=fontsize=40: \
     text='Rx timestamp %{localtime\\:%H\\\\\:%M\\\\\:%S\\\\\:%3N}': \
     x=10: y=70: fontcolor=white: box=1: boxcolor=black: boxborderw=10" \
     -vcodec mpeg4 -qscale:v 3 recv.mp4
   ```

1. Start the Sender side FFmpeg instance:

   ```bash
   sudo ffmpeg -i <video-file-path> \
     -vf \
     "drawtext=fontsize=40: \
     text='Tx timestamp %{localtime\\:%H\\\\\:%M\\\\\:%S\\\\\:%3N}': \
     x=10: y=10: fontcolor=white: box=1: boxcolor=black: boxborderw=10" \
     -f mtl_st20p \
     -fps 59.94 \
     -p_port 0000:af:01.1 \
     -p_sip 192.168.96.3 \
     -p_tx_ip 239.168.85.20 \
     -udp_port 20000 \
     -payload_type 96 -
   ```

   When sending a raw video file, e.g. in a YUV format, you have to explicitly specify the file format `-f rawvideo`, the pixel format `-pix_fmt`, and the video resolution `-s WxH`:

   ```bash
   ffmpeg -f rawvideo -pix_fmt yuv422p10le -s 1920x1080 -i <video-file-path> ...
   ```

   It is also recommended to provide the read rate `-readrate` at which FFmpeg will read frames from the file:

   ```bash
   ffmpeg -f rawvideo -readrate 2.4 -pix_fmt yuv422p10le -s 1920x1080 -i <video-file-path> ...
   ```
   The `-readrate` value is calculated from the `-frame_rate` parameter value using the following equation: $readrate = framerate \div 25$. Use the pre-calculated values from the table below.

   | frame_rate | readrate      |
   |------------|---------------|
   | 25         | 25 / 25 = 1   |
   | 50         | 50 / 25 = 2   |
   | 60         | 60 / 25 = 2.4 |

1. Run the script located in `<repo_dir>/script` against the recorded MPEG file. The first argument is the input video file path. The second argument is the optional path of the latency diagram JPEG file to be generated.

   ```bash
   python text-detection.py recv.mp4 recv-latency.jpg
   ```
   Console output:

   ```text
   ...
   Processing Frame: 235
   Processing Frame: 236
   Processing Frame: 237
   Processing Frame: 238
   Processing Frame: 239
   Processing Frame: 240
   Saving the latency chart to: recv-latency.jpg
   File: recv.mp4 | Last modified: 2025-06-02 13:49:54 UTC
   Resolution: 640x360 | FPS: 25.00
   Average End-to-End Latency: 564.61 ms
   ```

See the [Sample latency diagram](#sample-latency-diagram).
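The `-readrate` values in the table above follow directly from the stated equation; a one-line helper reproduces them:

```python
def readrate(frame_rate: float) -> float:
    """-readrate value for a given input frame rate (readrate = framerate / 25)."""
    return frame_rate / 25.0

for fps in (25, 50, 60):
    print(fps, "->", readrate(fps))
```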

## Customization

When modifying the FFmpeg commands, if you change parameters of the `drawtext` filter, especially `fontsize`, `x`, `y`, or `text`, you also have to adjust the Python script __text-detection.py__; please refer to the function `extract_text_from_region(image, x, y, font_size, length)`.
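For example, if the Tx `drawtext` were moved to `x=20, y=30` with `fontsize=48` (hypothetical values), the crop coordinates in the script would have to move with it. The helper below restates the ROI arithmetic of `extract_text_from_region` so the adjusted bounds can be checked without running OCR:

```python
def text_region_bounds(x, y, font_size, length, margin=5):
    """(row_start, row_end), (col_start, col_end) of the OCR crop for a
    drawtext rendered at (x, y); mirrors extract_text_from_region."""
    y0, x0 = max(0, y - margin), max(0, x - margin)
    return (y0, y + font_size + margin), (x0, x + length + margin)

# Hypothetical adjusted placement: x=20, y=30, fontsize=48
print(text_region_bounds(20, 30, 48, 600))
# Default placement used by the script: x=10, y=10, fontsize=40
print(text_region_bounds(10, 10, 40, 600))
```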

script/text-detection.py

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
import os
import re
import sys
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
import pytesseract


def is_display_attached():
    # Check if the DISPLAY environment variable is set
    return 'DISPLAY' in os.environ


def extract_text_from_region(image, x, y, font_size, length):
    """
    Extracts text from a specific region of the image.

    :param image: The image to extract text from.
    :param x: The x-coordinate of the top-left corner of the region.
    :param y: The y-coordinate of the top-left corner of the region.
    :param font_size: The font size of the text.
    :param length: The length of the text to extract.
    :return: The extracted text.
    """
    margin = 5
    y_adjusted = max(0, y - margin)
    x_adjusted = max(0, x - margin)
    height = y + font_size + margin
    width = x + length + margin
    # Define the region of interest (ROI) for text extraction
    roi = image[y_adjusted:height, x_adjusted:width]

    # Use Tesseract to extract text from the ROI
    return pytesseract.image_to_string(roi, lang='eng')


def process_frame(frame_idx, frame):
    print("Processing Frame:", frame_idx)

    timestamp_format = "%H:%M:%S:%f"
    timestamp_pattern = r'\b\d{2}:\d{2}:\d{2}:\d{3}\b'

    # Convert the frame to grayscale for better OCR performance
    frame = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)

    line_1 = extract_text_from_region(frame, 10, 10, 40, 600)
    line_2 = extract_text_from_region(frame, 10, 70, 40, 600)

    # Find the timestamps (type: str) in the extracted text using regex
    tx_time = re.search(timestamp_pattern, line_1)
    rx_time = re.search(timestamp_pattern, line_2)

    if tx_time is None or rx_time is None:
        print("Error: Timestamp not found in the expected format.")
        return 0

    # Convert the timestamp strings to datetime objects
    tx_time = datetime.strptime(tx_time.group(), timestamp_format)
    rx_time = datetime.strptime(rx_time.group(), timestamp_format)

    if tx_time > rx_time:
        print("Error: Transmit time is greater than receive time.")
        return 0

    time_difference = rx_time - tx_time
    time_difference_ms = time_difference.total_seconds() * 1000
    return time_difference_ms


def main():
    if len(sys.argv) < 2:
        print("Usage: python text-detection.py <input_video_file> <output_image_name>")
        sys.exit(1)

    input_video_file = sys.argv[1]
    cap = cv.VideoCapture(input_video_file)
    if not cap.isOpened():
        print("Fatal: Could not open video file.")
        sys.exit(1)

    frame_idx = 0
    time_differences = []

    with ThreadPoolExecutor(max_workers=40) as executor:
        futures = []
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            futures.append(executor.submit(process_frame, frame_idx, frame))
            frame_idx += 1

        for future in futures:
            time_differences.append(future.result())

    # Filter out zero values (frames where OCR failed) from time_differences
    non_zero_time_differences = [td for td in time_differences if td != 0]

    # Calculate the average latency excluding zero values
    if non_zero_time_differences:
        average_latency = np.mean(non_zero_time_differences)

        # Filter out anomalous peaks that differ by more than 25% from the average
        filtered_time_differences = [
            td for td in non_zero_time_differences if abs(td - average_latency) <= 0.25 * average_latency
        ]

        # Calculate the average latency using the filtered data
        filtered_average_latency = np.mean(filtered_time_differences)
    else:
        print("Fatal: No timestamps recognized in the video. No data for calculating latency.")
        sys.exit(1)

    # Plot the non-zero data
    plt.plot(non_zero_time_differences, marker='o')
    plt.title('End-to-End Latency — Media Transport Library')
    plt.xlabel('Frame Index')
    plt.ylabel('Latency, ms')
    plt.grid(True)

    # Adjust the layout to create more space for the text
    plt.subplots_adjust(bottom=0.5)

    # Prepare text for display and stdout
    average_latency_text = f'Average End-to-End Latency: {filtered_average_latency:.2f} ms'
    file_name = os.path.basename(input_video_file)
    file_mod_time = datetime.fromtimestamp(
        os.path.getmtime(input_video_file), tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
    file_info_text = f'File: {file_name} | Last modified: {file_mod_time} UTC'
    width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv.CAP_PROP_FPS)
    video_properties_text = f'Resolution: {width}x{height} | FPS: {fps:.2f}'

    cap.release()

    # Display the text below the plot
    plt.text(0.5, -0.55, average_latency_text,
             horizontalalignment='center', verticalalignment='center',
             transform=plt.gca().transAxes)
    plt.text(0.5, -0.85, file_info_text,
             horizontalalignment='center', verticalalignment='center',
             transform=plt.gca().transAxes)
    plt.text(0.5, -1, video_properties_text,
             horizontalalignment='center', verticalalignment='center',
             transform=plt.gca().transAxes)

    if is_display_attached():
        plt.show()

    if len(sys.argv) == 3:
        filename = sys.argv[2]
        if not filename.endswith('.jpg'):
            filename += '.jpg'
        print("Saving the latency chart to:", filename)
        plt.savefig(filename, format='jpg', dpi=300)

    # Print the summary to stdout
    print(file_info_text)
    print(video_properties_text)
    print(average_latency_text)


if __name__ == "__main__":
    main()
