MOVER

Combining Multiple Meeting Recognition Systems

This is the implementation of "MOVER: Combining Multiple Meeting Recognition Systems" (Interspeech 2025).

Install

pip install git+https://github.com/nttcslab-sp/mover

Command line tool

mover --infile ./example_files/sys1.json ./example_files/sys2.json ./example_files/sys3.json --outfile out.json

# Wildcard patterns are also accepted
mover --infile './example_files/*.json' --outfile out.json

Warning! We will continue to refactor the API and update the documentation in the future. The names and usage of function arguments are subject to change.

Python API

from mover import mover

seglst_object = mover(["./example_files/sys1.json", "./example_files/sys2.json", "./example_files/sys3.json"])
seglst_object.dump("out.json")

The JSON files use the SEGment-wise Long-form Speech Transcription annotation (SegLST) format, the file format used in the CHiME challenges (see MeetEval for details on the format). Inside the function, each file is loaded with meeteval.io.load and handled as a meeteval.io.SegLST instance. The mover function also returns a SegLST.
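
For orientation, the following is a minimal sketch of what a SegLST file looks like on disk. It assumes the key names described in the MeetEval documentation (session_id, speaker, start_time, end_time, words); the session, speakers, times, and text are invented for illustration and are not taken from the example files in this repository.

```python
import json
import os
import tempfile

# A SegLST file is a JSON list with one dictionary per segment.
# Key names are assumed from the MeetEval documentation; values are made up.
segments = [
    {
        "session_id": "meeting1",
        "speaker": "spk1",
        "start_time": 0.0,
        "end_time": 2.5,
        "words": "hello everyone",
    },
    {
        "session_id": "meeting1",
        "speaker": "spk2",
        "start_time": 2.7,
        "end_time": 4.1,
        "words": "good morning",
    },
]

# Write the segments to a temporary file and read them back.
path = os.path.join(tempfile.mkdtemp(), "sys1.json")
with open(path, "w") as f:
    json.dump(segments, f, indent=2)

with open(path) as f:
    loaded = json.load(f)
```

A file written this way should be loadable with meeteval.io.load, though that step is omitted here so the sketch runs with the standard library alone.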

Alternatively, SegLST instances can be passed directly as arguments.

from mover import mover
import meeteval

seglst_list = [meeteval.io.load(f) for f in ["./example_files/sys1.json", "./example_files/sys2.json", "./example_files/sys3.json"]]

seglst_object = mover(seglst_list)
seglst_object.dump("out.json")
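
The command line tool accepts wildcard patterns. Whether the Python function expands patterns itself is not stated here, so one cautious option is to expand them yourself with the standard glob module before calling mover. The helpers below, collect_system_outputs and combine, are hypothetical names and not part of the mover package.

```python
import glob


def collect_system_outputs(pattern):
    """Expand a wildcard pattern into a sorted list of file paths."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise ValueError(f"No files match the pattern: {pattern!r}")
    return paths


def combine(pattern, outfile):
    """Combine all system outputs matching the pattern and write the result."""
    # Imported lazily so collect_system_outputs works without mover installed.
    from mover import mover

    seglst_object = mover(collect_system_outputs(pattern))
    seglst_object.dump(outfile)
    return seglst_object
```

Calling combine("./example_files/*.json", "out.json") would then mirror the wildcard command line example. Sorting the paths keeps the input order reproducible across runs.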

Tips

Outputs from different recognition systems may differ in how they write numbers, symbols, and punctuation, so we recommend normalizing the text to your desired style before applying mover. For example, chime_utils.text_norm.get_txt_norm can be applied as follows:

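pip install git+https://github.com/chimechallenge/chime-utils

from chime_utils.text_norm import get_txt_norm
from mover import mover

text_norm_fn = get_txt_norm("chime8")

# seglst_list is the list of SegLST instances loaded in the previous example
for seglst in seglst_list:
    for segment in seglst:
        words = segment["words"]
        # Re-apply the normalizer until the text stops changing (at most 5 passes)
        for _ in range(5):
            words_ = text_norm_fn(words)
            if words == words_:
                break
            words = words_
        else:
            # The loop finished without reaching a stable result
            raise RuntimeError(f"Text normalization did not converge: {words!r}")
        segment["words"] = words
seglst_object = mover(seglst_list)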
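
The loop above re-applies the normalizer because a single pass is not guaranteed to yield text that the normalizer would leave unchanged. As a rough sketch, the same idea can be factored into a small helper. Both normalize_until_stable and toy_norm are hypothetical names used only for illustration; toy_norm stands in for a real normalizer such as the one returned by get_txt_norm.

```python
def normalize_until_stable(text, norm_fn, max_passes=5):
    """Apply norm_fn repeatedly until the output stops changing."""
    for _ in range(max_passes):
        normalized = norm_fn(text)
        if normalized == text:
            return text
        text = normalized
    raise RuntimeError(f"Text normalization did not converge: {text!r}")


def toy_norm(text):
    """Toy normalizer: lowercase and replace each double space with one space."""
    return text.lower().replace("  ", " ")


# One pass over three spaces still leaves a double space behind,
# so a second pass is needed before the text becomes stable.
once = toy_norm("A   B")
stable = normalize_until_stable("A   B", toy_norm)
```

With a helper like this, the inner loop reduces to segment["words"] = normalize_until_stable(segment["words"], text_norm_fn).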

Cite

@inproceedings{kamo25_interspeech,
  title     = {{MOVER: Combining Multiple Meeting Recognition Systems}},
  author    = {Naoyuki Kamo and Tsubasa Ochiai and Marc Delcroix and Tomohiro Nakatani},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {3424--3428},
  doi       = {10.21437/Interspeech.2025-1614},
}
