Language-Based Audio Retrieval

A curated list of academic papers, datasets, and other resources on Language-Based Audio Retrieval.

Papers

Year 2025

Year	Title	Paper	Code
2025	TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining	arXiv	GitHub
2025	FLAM: Frame-Wise Language-Audio Modeling	ICML
2025	CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining	arXiv
2025	M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP	arXiv
2025	TAIL: Text-Audio Incremental Learning	arXiv
2025	ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors	arXiv	GitHub

Year 2024

Year	Title	Paper	Code
2024	Text-based Audio Retrieval by Learning from Similarities between Audio Captions	SPL
2024	Language-based Audio Retrieval with Co-Attention Networks	UIC
2024	Audio meets text: a loss-enhanced journey with manifold mixup and re-ranking	Article	GitHub
2024	Dissecting Temporal Understanding in Text-to-Audio Retrieval	MM	GitHub
2024	Pre-Trained Models, Datasets, Data Augmentation for Language-Based Audio Retrieval	DCASE
2024	Estimated Audio–Caption Correspondences Improve Language-Based Audio Retrieval	DCASE	GitHub
2024	The Language of Sound Search: Examining User Queries in Audio Search Engines	DCASE
2024	Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning	DCASE
2024	Improving Language-Based Audio Retrieval using LLM Augmentations	DCASE
2024	Language-Based Audio Retrieval with GPT-Augmented Captions and Self-Attended Audio Clips	CSCWD
2024	Learning Audio Concepts from Counterfactual Natural Language	ICASSP	GitHub
2024	A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval	ICASSP
2024	Fusing Audio and Metadata Embeddings Improves Language-Based Audio Retrieval	EUSIPCO
2024	Audio-Text Retrieval: Exploring Shared Parameters and Intra-Modal Constraint Loss	ANTIC

Year 2023

Year	Title	Paper	Code
2023	Killing Two Birds with One Stone: Can an Audio Captioning System Also Be Used for Audio-Text Retrieval?	DCASE
2023	Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets	DCASE
2023	Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances	DCASE	GitHub
2023	Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions	INTERSPEECH
2023	Audio Retrieval with WavText5K and CLAP Training	INTERSPEECH
2023	Audio–text retrieval based on contrastive learning and collaborative attention mechanism	Article
2023	Audio-Text Models Do Not Yet Leverage Natural Language	ICASSP
2023	Improving Text-Audio Retrieval by Text-Aware Attention Pooling and Prior Matrix Revised Loss	ICASSP
2023	Data Leakage in Cross-Modal Retrieval Training: A Case Study	ICASSP
2023	Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation	ICASSP
2023	On Negative Sampling for Contrastive Audio-Text Retrieval	ICASSP	GitHub
2023	CLAP: Learning Audio Concepts from Natural Language Supervision	ICASSP
2023	TIAR: Text-Image-Audio Retrieval with weighted multimodal re-ranking	Article	GitHub
2023	Enhancing Audio Retrieval with Attention-based Encoder for Audio Feature Representation	EUSIPCO
2023	Multi-grained Representation Learning for Cross-modal Retrieval	SIGIR
2023	Cross-Modal Audio-Text Retrieval via Sequential Feature Augmentation	CACML

Year 2022

Year	Title	Paper
2022	Language-Based Audio Retrieval with Textual Embeddings of Tag Names	DCASE
2022	Improving Natural-Language-Based Audio Retrieval with Transfer Learning and Audio & Text Augmentations	DCASE
2022	Matching Text and Audio Embeddings: Exploring Transfer-Learning Strategies for Language-Based Audio Retrieval	DCASE
2022	Language-Based Audio Retrieval Task in DCASE 2022 Challenge	DCASE
2022	Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval	INTERSPEECH
2022	On Metric Learning for Audio-Text Cross-Modal Retrieval	INTERSPEECH
2022	Audio-Text Retrieval in Context	ICASSP
2022	Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss	APSIPA ASC
2022	Audio Retrieval with Natural Language Queries: A Benchmark Study	Article

Year 2021

Year	Title	Paper	Code
2021	Audio Retrieval with Natural Language Queries	INTERSPEECH

Before 2021

Year	Title	Paper
2019	Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio	ICASSP
2018	Acoustic Event Search with An Onomatopoeic Query: Measuring Distance between Onomatopoeic Words and Sounds	DCASE
2010	Content-Based Retrieval From Unstructured Audio Databases Using An Ecological Acoustics Taxonomy	ICAD
2008	Large-scale content-based audio retrieval from text queries	ICMR
2007	Audio Information Retrieval using Semantic Similarity	ICASSP
2005	Semantic-based Audio Recognition and Retrieval	Thesis
2002	Semantic-audio retrieval	ICASSP
1999	An overview of audio information retrieval	Article

Audio-Text Datasets

Year	Dataset	Paper	Code
2024	AudioSetCaps	arXiv	GitHub
2024	Auto-ACD	MM	GitHub
2023	WavCaps	TASLP	GitHub
2021	MACS		Zenodo
2020	Clotho	ICASSP	GitHub
2019	AudioCaps	NAACL	GitHub

Competitions

Year	Competition	Website
2024	DCASE 2024 Challenge Task 8	DCASE
2023	DCASE 2023 Challenge Task 6	DCASE
2022	DCASE 2022 Challenge Task 6	DCASE

Journals & Conferences

Type	Name
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Journal	IEEE Transactions on Multimedia
Conference	ACM Special Interest Group on Information Retrieval (SIGIR)
Conference	IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference	Annual Conference of the International Speech Communication Association (INTERSPEECH)
Conference	International Society for Music Information Retrieval Conference (ISMIR)
Conference	ACM Multimedia (MM)
Conference	European Signal Processing Conference (EUSIPCO)
Conference	ACM International Conference on Multimedia Retrieval (ICMR)
Workshop	Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)

xieh97/language-based-audio-retrieval
