This is a simple receiver for the UDP- or TCP-encapsulated version of Opulent Voice frames created by Interlocutor.
See https://github.com/OpenResearchInstitute/interlocutor
Derived from code generated by Claude based on Interlocutor.
In the case of UDP encapsulation, exactly one Opulent Voice frame is included in each UDP packet.
In the case of TCP encapsulation, the TCP byte stream consists of a COBS-encoded stream of Opulent Voice frames. (Yes, this means there are two independent layers of COBS encoding in the TCP-encapsulated frames: one within the over-the-air frames, and one just for TCP encapsulation purposes.)
Within either encapsulation method, the frame structure is now the baseline Opulent Voice structure.
Might need this to run:
export DYLD_LIBRARY_PATH=/opt/homebrew/opt/opus/lib:$DYLD_LIBRARY_PATH
On macOS,
brew install portaudio
pip3 install opuslib pyaudio cobs emoji_data_python scapy
python3 opulent_voice_receiver.py
Or, using UV,
brew install portaudio
uv init
uv venv
source .venv/bin/activate
uv pip install opuslib pyaudio cobs emoji_data_python scapy
uv run opulent_voice_receiver.py
On Raspberry Pi OS (bookworm),
sudo apt install libopus-dev opus-tools python3-pyaudio portaudio19-dev
pip3 install opuslib pyaudio cobs emoji_data_python scapy sounddevice
On Windows 11, you might have to build opus.dll from source (downloaded from opus-codec.org) and place it in C:\Windows\System32. You'll need cmake and a C compiler; the free "community" version of Microsoft Visual Studio is good enough.
pip3 install opuslib pyaudio cobs emoji_data_python scapy sounddevice
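If the receiver fails at import time, a quick check of whether the Opus shared library is discoverable can narrow things down. The sketch below uses only the Python standard library. It assumes, without confirmation from this README, that opuslib finds the library through the same ctypes lookup, and the helper name find_opus() is made up for this example.

```python
import ctypes.util


def find_opus():
    """Return the name or path of the Opus shared library, or None if not found.

    ctypes.util.find_library() applies the platform's usual search rules,
    so on macOS it is affected by DYLD_LIBRARY_PATH and on Windows by PATH.
    """
    return ctypes.util.find_library("opus")


location = find_opus()
if location is None:
    print("Opus library not found; check the install steps above.")
else:
    print(f"Opus library found: {location}")
```

If this prints "not found" on macOS, setting DYLD_LIBRARY_PATH as shown earlier and rerunning the check is a reasonable next step.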
OpulentVoiceProtocol encapsulates the Opulent Voice frame header and protocol knowledge.
It provides a number of constant values and one method, parse_frame(), which extracts the fields of the frame header.
AudioPlayer decodes Opus voice packets and uses pyaudio (which uses PortAudio) to send the decoded audio samples to the default audio output device.
It manages a short queue of pending decoded audio frames. The queue is filled by calls to the method decode_and_queue_audio(), and emptied by callbacks to the method audio_callback(), which are triggered by the audio stream that AudioPlayer itself opens.
It provides start() and stop() methods, which are called by the methods of the same names of the OpulentVoiceReceiver object.
It keeps some statistics, which can be copied out by calling the get_stats() method.
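The queue-and-callback arrangement described above can be sketched without any audio hardware. This is a minimal illustration, not the actual AudioPlayer. The class name PlaybackQueue, the method put_decoded(), the queue depth, the drop-newest policy when the queue is full, and the 16-bit mono sample format are all assumptions made for this example. The callback signature and the (data, flag) return value follow PyAudio's callback mode, with PA_CONTINUE standing in for pyaudio.paContinue.

```python
import queue

PA_CONTINUE = 0  # stand-in for pyaudio.paContinue, so the sketch runs without PyAudio


class PlaybackQueue:
    """Short queue of decoded PCM frames, drained by an audio callback."""

    def __init__(self, max_frames=5, sample_width=2):
        self.frames = queue.Queue(maxsize=max_frames)
        self.sample_width = sample_width  # bytes per sample (assumed 16-bit mono)
        self.stats = {"queued": 0, "dropped": 0, "underruns": 0}

    def put_decoded(self, pcm):
        """Producer side: queue one decoded frame, dropping it if the queue is full."""
        try:
            self.frames.put_nowait(pcm)
            self.stats["queued"] += 1
        except queue.Full:
            self.stats["dropped"] += 1

    def audio_callback(self, in_data, frame_count, time_info, status):
        """Consumer side: return one frame, or silence if nothing is queued."""
        try:
            pcm = self.frames.get_nowait()
        except queue.Empty:
            self.stats["underruns"] += 1
            pcm = b"\x00" * (frame_count * self.sample_width)
        return (pcm, PA_CONTINUE)

    def get_stats(self):
        """Return a copy so callers cannot modify the live counters."""
        return dict(self.stats)
```

Returning silence on underrun keeps the output stream running between transmissions, and a bounded queue keeps playback latency from growing if frames arrive faster than they are played.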
OpulentVoiceReceiver operates the receiver overall.
It accepts the incoming UDP-encapsulated packets by opening a socket and operating a separate thread running the listen_loop() method repeatedly. This method also accepts incoming TCP connections. The thread blocks on a select() call, awaiting either an incoming UDP packet or a new TCP connection. If an incoming UDP packet arrives, it is passed directly to the process_frame() method. If a new TCP connection arrives, it is handled by the handle_tcp_connection() method, which creates a new thread to handle the TCP connection.
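One pass of such a select()-based wait might look like the following sketch. The function names poll_once() and demo(), the callback parameters, and the 4096-byte receive size are hypothetical and chosen for this example. In the real receiver the connection callback would start a thread for the new TCP connection.

```python
import select
import socket


def poll_once(udp_sock, tcp_listener, on_frame, on_connection, timeout=1.0):
    """Wait for a UDP datagram or a new TCP connection and dispatch it."""
    readable, _, _ = select.select([udp_sock, tcp_listener], [], [], timeout)
    for sock in readable:
        if sock is udp_sock:
            # One recvfrom() returns exactly one datagram, i.e. one frame.
            data, _addr = udp_sock.recvfrom(4096)
            on_frame(data)
        else:
            # A readable listening socket means a connection is pending.
            conn, _addr = tcp_listener.accept()
            on_connection(conn)
    return len(readable)


def demo():
    """Send one datagram over loopback and collect what poll_once() delivers."""
    frames, connections = [], []
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        udp.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
        tcp.bind(("127.0.0.1", 0))
        tcp.listen()
        sender.sendto(b"one frame", udp.getsockname())
        poll_once(udp, tcp, frames.append, connections.append, timeout=2.0)
    finally:
        for s in (udp, tcp, sender):
            s.close()
    return frames
```

Using a timeout on select() rather than blocking forever lets the surrounding loop recheck its running flag periodically.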
handle_tcp_connection() blocks on a recv() call. The data returned by recv() is not necessarily a complete encapsulated Opulent Voice frame. It might be shorter or longer, and could contain data from multiple frames. The data is passed to reassemble_encap(), which sorts this out into complete Opulent Voice frames, using the COBS delimiters added during encapsulation. Each complete frame is then COBS-decoded and finally passed to process_frame(), to be handled identically to a UDP-encapsulated frame.
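The reassembly step can be illustrated with a small buffer that splits the byte stream on the zero-byte COBS delimiter and decodes each complete piece. This is a sketch under stated assumptions, not the program's reassemble_encap(). The class name StreamReassembler and its feed() method are invented here, and a pure-Python COBS decoder is included so the example runs without the cobs package that the real program installs.

```python
def cobs_decode(encoded: bytes) -> bytes:
    """Decode one COBS-encoded block (without its trailing zero delimiter)."""
    out = bytearray()
    i, n = 0, len(encoded)
    while i < n:
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside COBS data")
        end = i + code
        if end > n:
            raise ValueError("truncated COBS block")
        out += encoded[i + 1:end]
        i = end
        # A code below 0xFF stands for a zero in the original data,
        # except at the very end of the block.
        if code < 0xFF and i < n:
            out.append(0)
    return bytes(out)


class StreamReassembler:
    """Accumulate recv() chunks and yield complete decoded frames."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buffer += chunk
        frames = []
        while True:
            end = self._buffer.find(0)
            if end < 0:
                break  # no delimiter yet; keep the partial frame buffered
            encoded = bytes(self._buffer[:end])
            del self._buffer[:end + 1]
            if encoded:  # ignore empty pieces between back-to-back delimiters
                frames.append(cobs_decode(encoded))
        return frames
```

Because feed() returns a list, one recv() chunk can yield zero frames (the frame is still incomplete), one frame, or several.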
Note that both UDP and TCP encapsulated frames are handled at all times; there is no mode switch between UDP and TCP. As far as this program is concerned, either or both of UDP and TCP can be used at any time. However, there is no particular effort made to guarantee a particular order of delivery of frames received via UDP and TCP, so using both at the same time is not recommended.
process_frame() is also part of OpulentVoiceReceiver. This method parses the packet into the component fields of the Opulent Voice frame header, plus the COBS-encoded payload. These are then passed on to cobs_process_bytes(), which uses the COBS delimiters within the encapsulated payload to reconstruct the original packets, which may be voice, text, control, or general IP data packets. Each processed frame may complete zero, one, or more than one COBS packets. For each packet completed, the method process_COBS_packet() is called.
process_COBS_packet() begins by COBS-decoding the packet. That should produce a valid IP frame. This is checked (using Scapy library calls). This program doesn't handle general IP data packets, so we immediately go on to assume that within the IP wrapper is a UDP frame, and check that, and check the IP and UDP checksums. If all checks pass, we then use the UDP destination port to distinguish between voice, text, and control packets, or an unknown UDP packet type. Then, based on the type, process_COBS_packet() acts as follows:
- When handling a voice frame, it invokes AudioPlayer's decode_and_queue_audio() method to play back the received audio.
- When handling a text message, it fetches the station ID from the frame header, decodes it, and prints a line consisting of the station ID, a special icon marking this as a text message, and the text data from the packet payload.
- When handling a control message, it examines the contents of the control message. A recognized control message causes little or no output, but an unknown control message is handled much like a text message.
In the case of an unknown UDP packet type, or an IP packet that isn't UDP at all, the intention is to deliver this packet to the host's network stack for normal processing. This is not yet implemented.
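The port-based classification can be sketched with plain struct parsing instead of Scapy. Everything named here is an assumption for illustration: the port numbers in PORT_SERVICES are placeholders rather than values taken from this README, and classify() and build_ipv4_udp() are hypothetical helpers. For brevity the sketch verifies only the IPv4 header checksum and omits the UDP checksum that the real program also checks.

```python
import struct

# Hypothetical port-to-service table, for illustration only.
PORT_SERVICES = {57373: "voice", 57374: "text", 57375: "control"}


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_udp(dport: int, payload: bytes, proto: int = 17) -> bytes:
    """Build a minimal IPv4 packet with a UDP header (UDP checksum left at 0)."""
    udp = struct.pack("!HHHH", 40000, dport, 8 + len(payload), 0) + payload
    src, dst = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp),
                         0, 0, 64, proto, 0, src, dst)
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:] + udp


def classify(packet: bytes) -> str:
    """Return the owning service name, or a reason the packet is not ours."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return "not-ip"
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    if ihl < 20 or len(packet) < ihl:
        return "not-ip"
    # A header whose checksum field is correct sums to zero.
    if internet_checksum(packet[:ihl]) != 0:
        return "bad-checksum"
    if packet[9] != 17 or len(packet) < ihl + 8:  # 17 is the UDP protocol number
        return "not-udp"
    (dport,) = struct.unpack("!H", packet[ihl + 2:ihl + 4])
    return PORT_SERVICES.get(dport, "unknown-udp")
```

The "unknown-udp" and "not-udp" results correspond to the packets that would eventually be handed to the host's network stack.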
- main creates an OpulentVoiceReceiver.
  - OpulentVoiceReceiver.__init__() instantiates an OpulentVoiceProtocol.
  - OpulentVoiceReceiver.__init__() instantiates an AudioPlayer.
- main calls OpulentVoiceReceiver.start() and then goes into a loop calling time.sleep(1) until interrupted.
- OpulentVoiceReceiver.start() then:
  - creates and starts a daemon thread running OpulentVoiceReceiver.listen_loop() to receive the encapsulated Opulent Voice packets (if encapsulated in UDP) or new connections (if TCP is used for encapsulation).
- listen_loop() sits in a select loop. If a UDP packet comes in, it calls recvfrom(), which returns exactly one packet. This loop continues until self.running is false. Each time a packet is received in this way, it is passed to self.process_frame(). If a new TCP connection comes in, a new thread is created to handle incoming data on that connection, using method handle_tcp_connection(), which in turn uses reassemble_encap(), which calls process_frame() when a complete encapsulated frame is found.
- process_frame() calls self.protocol.parse_frame() to extract the fields of the frame header (plus a timestamp) into a dictionary, which is returned. The payload, which is an IP packet, is passed into Scapy, resulting in a Scapy packet called pkt. We check that it's an IP/UDP packet, confirm its checksums, and get the destination UDP port number. The port number is used to determine which service (voice, text, control) owns the packet.
  - If it's voice, we send the data to self.audio_player.decode_and_queue_audio(), which calls the Opus decoder in self.decoder.decode(), and adds the resulting frame of audio samples to self.audio_queue. Meanwhile, self.audio_player (which is an AudioPlayer) is pulling decoded audio frames from self.audio_queue via audio_callback(), and streaming them to the default audio output device.
  - If it's a text message, we decode the station_id to ASCII and print it with the text message data to the screen.
  - If it's a control message, we recognize known text values of control messages, or print out unknown control messages.
- If, on the other hand, the packet isn't IP/UDP or its destination port isn't one known to Opulent Voice, we currently discard it. Eventually, this packet will be passed through to the host's network stack.
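The startup and shutdown flow in the list above can be summarized in a skeleton like the one below. It is a sketch only: ReceiverSkeleton is a hypothetical stand-in for OpulentVoiceReceiver, the loop body is a placeholder for the real select() work, and the iterations counter exists purely so the example has something observable.

```python
import threading
import time


class ReceiverSkeleton:
    """Stand-in showing only the thread lifecycle of the receiver."""

    def __init__(self):
        self.running = False
        self.thread = None
        self.iterations = 0

    def listen_loop(self):
        # The real loop would wait in select() here; this one just counts.
        while self.running:
            self.iterations += 1
            time.sleep(0.01)

    def start(self):
        self.running = True
        # A daemon thread does not keep the process alive after main exits.
        self.thread = threading.Thread(target=self.listen_loop, daemon=True)
        self.thread.start()

    def stop(self):
        self.running = False
        if self.thread is not None:
            self.thread.join(timeout=1.0)


def main():
    receiver = ReceiverSkeleton()
    receiver.start()
    try:
        while True:
            time.sleep(1)  # the main thread only waits to be interrupted
    except KeyboardInterrupt:
        pass
    finally:
        receiver.stop()
```

Calling main() runs until Ctrl-C. Clearing the running flag and then joining the thread gives the loop one last pass to exit cleanly.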