OpenResearchInstitute/postlocutor

postlocutor

Prototype Receiver for Opulent Voice

This is a simple receiver for the UDP- or TCP-encapsulated version of Opulent Voice frames created by Interlocutor.

See https://github.com/OpenResearchInstitute/interlocutor

Derived from code generated by Claude based on Interlocutor.

In the case of UDP encapsulation, exactly one Opulent Voice frame is included in each UDP packet.

In the case of TCP encapsulation, the TCP byte stream consists of a COBS-encoded stream of Opulent Voice frames. (Yes, this means there are two independent layers of COBS encoding in the TCP-encapsulated frames. One within the over-the-air frames, and one just for TCP encapsulation purposes.)
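As a rough illustration of what one layer of COBS framing does, here is a minimal pure-Python sketch. It is not the code used by this program, which relies on the cobs package. The function names are hypothetical, and the sketch assumes that a single 0x00 byte is the frame delimiter.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the result contains no 0x00 bytes."""
    out = bytearray()
    block = bytearray()              # non-zero bytes collected since the last code byte
    for b in data:
        if b == 0:
            out.append(len(block) + 1)   # code byte: distance to the next zero
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:        # longest run one code byte can describe
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Reverse cobs_encode(). The input must not include the delimiter."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside COBS data")
        block = encoded[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS data")
        out += block
        i += code
        # A code below 0xFF stands for a zero byte, except at the very end.
        if code < 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)


def tcp_encapsulate(frame: bytes) -> bytes:
    """Outer layer only: COBS-encode a whole frame and append the delimiter."""
    return cobs_encode(frame) + b"\x00"
```

Because the encoded frame never contains 0x00, the receiver can split the TCP byte stream on that byte regardless of what the frame itself carries, including the delimiters of the inner COBS layer.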

Within either encapsulation method, the frame structure is now the baseline Opulent Voice structure.

On macOS, you might need to set this before running, so that the Opus library installed by Homebrew can be found:

export DYLD_LIBRARY_PATH=/opt/homebrew/opt/opus/lib:$DYLD_LIBRARY_PATH

Install Dependencies and Run

On macOS,

brew install portaudio
pip3 install opuslib pyaudio cobs emoji_data_python scapy
python3 opulent_voice_receiver.py

Or, using uv,

brew install portaudio
uv init
uv venv
source .venv/bin/activate
uv pip install opuslib pyaudio cobs emoji_data_python scapy
uv run opulent_voice_receiver.py

On Raspberry Pi OS (bookworm),

sudo apt install libopus-dev opus-tools python3-pyaudio portaudio19-dev
pip3 install opuslib pyaudio cobs emoji_data_python scapy sounddevice

On Windows 11, you might have to build opus.dll from source (downloaded from opus-codec.org) and place it in C:\Windows\System32. You'll need cmake and a C compiler; the free "community" version of Microsoft Visual Studio is good enough.

pip3 install opuslib pyaudio cobs emoji_data_python scapy sounddevice

Summary of Major Classes

class OpulentVoiceProtocol (aka OPV)

OpulentVoiceProtocol encapsulates the Opulent Voice frame header and protocol knowledge.

It provides a number of constant values and one method, parse_frame(), which extracts the fields of the frame header.
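The general shape of such a method might look like the sketch below. The header layout used here (a 12-byte header made of a 6-byte station ID, a 3-byte token, and 3 reserved bytes) is an assumption made for illustration and is not taken from this README; the constant and key names are hypothetical as well. Consult the Opulent Voice specification or the source code for the real layout.

```python
import time

# ASSUMED layout, for illustration only: 6 + 3 + 3 header bytes, then payload.
HEADER_SIZE = 12


def parse_frame(frame: bytes):
    """Split a frame into header fields and payload, or return None if too short."""
    if len(frame) < HEADER_SIZE:
        return None
    return {
        "station_id": frame[0:6],     # assumed field position
        "token": frame[6:9],          # assumed field position
        "reserved": frame[9:12],      # assumed field position
        "payload": frame[HEADER_SIZE:],
        "timestamp": time.time(),     # time at which the frame was parsed
    }
```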

class AudioPlayer

AudioPlayer decodes Opus voice packets (using opuslib) and uses pyaudio (which uses PortAudio) to send the decoded audio samples to the default audio output device.

It manages a short queue of pending decoded audio frames. The queue is filled by calls to the method decode_and_queue_audio(), and emptied by the method audio_callback(), which is invoked as a callback by the audio output stream that AudioPlayer itself opens.
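The producer/consumer arrangement can be sketched as follows. This is a simplified stand-in, not the actual AudioPlayer: it leaves out pyaudio and opuslib so that it runs anywhere, takes the decoder as a plain callable, and assumes 16-bit mono samples. The class name, the statistics keys, and the queue length are hypothetical. The callback signature and return value follow the convention pyaudio uses for stream callbacks.

```python
import queue

PA_CONTINUE = 0          # same numeric value as pyaudio.paContinue
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


class AudioQueueSketch:
    def __init__(self, max_frames=10, decode=None):
        self.audio_queue = queue.Queue(maxsize=max_frames)
        # Stand-in for an Opus decoder; the identity function by default.
        self.decode = decode or (lambda packet: packet)
        self.stats = {"queued": 0, "dropped": 0, "underruns": 0}

    def decode_and_queue_audio(self, opus_packet: bytes) -> None:
        """Producer side: called from the network thread for each voice packet."""
        pcm = self.decode(opus_packet)
        try:
            self.audio_queue.put_nowait(pcm)
            self.stats["queued"] += 1
        except queue.Full:
            self.stats["dropped"] += 1      # keep latency bounded: drop, don't block

    def audio_callback(self, in_data, frame_count, time_info, status):
        """Consumer side: called by the audio stream when it needs more samples."""
        try:
            pcm = self.audio_queue.get_nowait()
        except queue.Empty:
            self.stats["underruns"] += 1
            pcm = b"\x00" * (frame_count * BYTES_PER_SAMPLE)   # play silence
        return (pcm, PA_CONTINUE)

    def get_stats(self) -> dict:
        return dict(self.stats)             # copy, so callers cannot mutate ours
```

Using queue.Queue keeps the hand-off thread-safe, since the callback runs on a different thread than the one that receives packets.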

It provides start() and stop() methods, which are called by the methods of the same names of the OpulentVoiceReceiver object.

It keeps some statistics, which can be copied out by calling the get_stats() method.

class OpulentVoiceReceiver

OpulentVoiceReceiver operates the receiver overall.

It accepts the incoming UDP-encapsulated packets by opening a socket and operating a separate thread running the listen_loop() method repeatedly. This method also accepts incoming TCP connections. The thread blocks on a select() call, awaiting either an incoming UDP packet or a new TCP connection. If an incoming UDP packet arrives, it is passed directly to the process_frame() method. If a new TCP connection arrives, it is handled by the handle_tcp_connection() method, which creates a new thread to handle the TCP connection.
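A single pass of such a select()-based loop might look like the sketch below. The function names, the 4096-byte receive size, and the use of callbacks are assumptions made for illustration; in the real program the UDP branch would lead to process_frame() and the TCP branch to handle_tcp_connection().

```python
import select
import socket


def open_sockets(host="127.0.0.1", port=0):
    """Create one UDP socket and one listening TCP socket (port 0 = any free port)."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind((host, port))
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcp.bind((host, port))
    tcp.listen()
    return udp, tcp


def poll_once(udp, tcp, on_frame, on_connection, timeout=1.0):
    """Wait for a UDP datagram or a new TCP connection, and dispatch it."""
    readable, _, _ = select.select([udp, tcp], [], [], timeout)
    for sock in readable:
        if sock is udp:
            data, _addr = udp.recvfrom(4096)   # exactly one datagram per call
            on_frame(data)
        else:
            conn, _addr = tcp.accept()         # new connection, handled elsewhere
            on_connection(conn)
    return len(readable)
```

The timeout lets the surrounding loop re-check its running flag periodically instead of blocking forever.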

handle_tcp_connection() blocks on a recv() call. The data returned by recv() is not necessarily a complete encapsulated Opulent Voice frame. It might be shorter or longer, and could contain data from multiple frames. The data is passed to reassemble_encap(), which sorts this out into complete Opulent Voice frames, using the COBS delimiters added during encapsulation. Each complete frame is then COBS-decoded and finally passed to process_frame(), to be handled identically to a UDP-encapsulated frame.
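The reassembly step can be sketched as a small buffer that is fed whatever recv() returns and hands back only complete frames. This is an illustration rather than the actual reassemble_encap(): the class and method names are hypothetical, and it assumes a single 0x00 byte as the delimiter. The frames it returns are still COBS-encoded, so decoding would follow as a separate step.

```python
class EncapReassembler:
    """Collects TCP stream chunks and splits them into delimiter-terminated frames."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add one recv() result; return the still-encoded frames it completed."""
        self.buffer += chunk
        frames = []
        while True:
            idx = self.buffer.find(b"\x00")
            if idx < 0:
                break                        # no complete frame yet; keep the remainder
            encoded = bytes(self.buffer[:idx])
            del self.buffer[:idx + 1]        # drop the frame and its delimiter
            if encoded:                      # ignore empty frames (back-to-back delimiters)
                frames.append(encoded)
        return frames
```

For example, a frame split across two recv() calls produces nothing on the first call and one frame on the second, while a chunk holding several delimiters produces several frames at once.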

Note that both UDP and TCP encapsulated frames are handled at all times; there is no mode switch between UDP and TCP. As far as this program is concerned, either or both of UDP and TCP can be used at any time. However, there is no particular effort made to guarantee a particular order of delivery of frames received via UDP and TCP, so using both at the same time is not recommended.

process_frame() is also part of OpulentVoiceReceiver. This method parses the packet into the component fields of the Opulent Voice frame header, plus the COBS-encoded payload. These are then passed on to cobs_process_bytes(), which uses the COBS delimiters within the encapsulated payload to reconstruct the original packets, which may be voice, text, control, or general IP data packets. Each processed frame may complete zero, one, or more than one COBS packets. For each packet completed, the method process_COBS_packet() is called.

process_COBS_packet() begins by COBS-decoding the packet. That should produce a valid IP packet, which is checked using Scapy library calls. This program doesn't handle general IP data packets, so we immediately go on to assume that within the IP wrapper is a UDP datagram, check that, and check the IP and UDP checksums. If all checks pass, we then use the UDP destination port to distinguish between voice, text, and control packets, or an unknown UDP packet type. Then, based on the type, process_COBS_packet() acts as follows:

  • When handling a voice frame, it invokes AudioPlayer's decode_and_queue_audio() method to play back the received audio.

  • When handling a text message, it fetches the station ID from the frame header, decodes it, and prints a line consisting of the station ID, a special icon marking this as a text message, and the text data from the packet payload.

  • When handling a control message, it examines the contents of the control message. A recognized control message causes little or no output, but an unknown control message is handled much like a text message.

In the case of an unknown UDP packet type, or an IP packet that isn't UDP at all, the intention is to deliver this packet to the host's network stack for normal processing. This is not yet implemented.
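The classification by destination port can be sketched without Scapy, using only the struct module. This swaps in hand-written header parsing for the Scapy calls the program actually uses, verifies only the IPv4 header checksum (not the UDP checksum), and ignores IP options beyond skipping them. The port numbers in the table are placeholders chosen for illustration, not values stated in this README, and all function names are hypothetical.

```python
import struct

# HYPOTHETICAL port assignments, for illustration only.
SERVICE_BY_PORT = {57373: "voice", 57374: "text", 57375: "control"}


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum, as used by the IPv4 header checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def classify(packet: bytes) -> str:
    """Return the service name for an IPv4/UDP packet, or a reason it was rejected."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return "not-ip"
    ihl = (packet[0] & 0x0F) * 4             # IPv4 header length in bytes
    if ihl < 20 or len(packet) < ihl:
        return "not-ip"
    if ones_complement_sum(packet[:ihl]) != 0xFFFF:
        return "bad-checksum"
    if packet[9] != 17 or len(packet) < ihl + 8:   # protocol 17 = UDP
        return "not-udp"
    dst_port = struct.unpack_from("!H", packet, ihl + 2)[0]
    return SERVICE_BY_PORT.get(dst_port, "unknown-udp")


def build_ipv4_udp(dst_port: int, payload: bytes, src_port: int = 40000) -> bytes:
    """Build a minimal test packet (UDP checksum left at 0, meaning 'not computed')."""
    udp = struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload
    addr = bytes([127, 0, 0, 1])
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp), 0, 0, 64, 17, 0, addr, addr)
    checksum = ~ones_complement_sum(hdr) & 0xFFFF
    return hdr[:10] + struct.pack("!H", checksum) + hdr[12:] + udp
```

A caller would branch on the returned string: play audio for "voice", print for "text" and "control", and (eventually) pass everything else to the host's network stack.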

Summary of Data/Control Flow

  • main creates an OpulentVoiceReceiver

    • OpulentVoiceReceiver.__init__() instantiates an OpulentVoiceProtocol
    • OpulentVoiceReceiver.__init__() instantiates an AudioPlayer
  • main calls OpulentVoiceReceiver.start() and then goes into a loop calling time.sleep(1) until interrupted.

  • OpulentVoiceReceiver.start then:

    • creates and starts a daemon thread running OpulentVoiceReceiver.listen_loop() to receive the encapsulated Opulent Voice packets (if encapsulated in UDP) or new connections (if TCP is used for encapsulation).
  • listen_loop() sits in a select loop, which continues until self.running is false. If a UDP packet comes in, it calls recvfrom(), which returns exactly one packet, and passes that packet to self.process_frame(). If a new TCP connection comes in, a new thread is created to handle incoming data on that connection, using method handle_tcp_connection(), which in turn uses reassemble_encap(), which calls process_frame() when a complete encapsulated frame is found.

  • process_frame() calls self.protocol.parse_frame() to extract the fields of the frame header (plus a timestamp) into a dictionary, which is returned. The payload, which is an IP packet, is passed into Scapy, resulting in a Scapy packet called pkt. We check that it's an IP/UDP packet, confirm its checksums, and get the destination UDP port number. The port number is used to determine which service (voice, text, control) owns the packet.

    • if it's voice, we send the data to self.audio_player.decode_and_queue_audio(), which calls the Opus decoder in self.decoder.decode(), and adds the resulting frame of audio samples to self.audio_queue. Meanwhile, self.audio_player (which is an AudioPlayer) is pulling decoded audio frames from self.audio_queue via audio_callback(), and streaming them to the default audio output device.

    • if it's a text message, we decode the station_id to ASCII and print it with the text message data to the screen.

    • if it's a control message, we recognize known text values of control messages, or print out unknown control messages.

  • If, on the other hand, the packet isn't IP/UDP or its destination port isn't one known to Opulent Voice, we currently discard it. Eventually, this packet will be passed through to the host's network stack.
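The top-level control flow described above (main starts the receiver, the receiver runs its loop in a daemon thread, and main sleeps until interrupted) can be sketched as follows. The class here is a hypothetical skeleton, not OpulentVoiceReceiver itself: its loop only counts iterations where the real one would wait on select(), and the stop() behavior shown is an assumption.

```python
import threading
import time


class ReceiverSkeleton:
    def __init__(self):
        self.running = False
        self.iterations = 0
        self.thread = None

    def start(self):
        self.running = True
        # daemon=True: the thread will not keep the process alive after main exits
        self.thread = threading.Thread(target=self.listen_loop, daemon=True)
        self.thread.start()

    def stop(self):
        self.running = False                 # the loop notices this on its next pass
        if self.thread is not None:
            self.thread.join(timeout=2.0)

    def listen_loop(self):
        while self.running:
            self.iterations += 1             # placeholder for select() and dispatch
            time.sleep(0.01)


def main():
    receiver = ReceiverSkeleton()
    receiver.start()
    try:
        while True:
            time.sleep(1)                    # all the work happens in other threads
    except KeyboardInterrupt:
        pass
    finally:
        receiver.stop()
```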
