xieh97/language-based-audio-retrieval

Language-Based Audio Retrieval

A curated list of academic papers, datasets, and other resources on Language-Based Audio Retrieval.

Papers

Year 2025

| Year | Title | Paper | Code |
| --- | --- | --- | --- |
| 2025 | TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | arXiv | GitHub |
| 2025 | FLAM: Frame-Wise Language-Audio Modeling | ICML | |
| 2025 | CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining | arXiv | |
| 2025 | M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | arXiv | |
| 2025 | TAIL: Text-Audio Incremental Learning | arXiv | |
| 2025 | ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | arXiv | GitHub |

Year 2024

| Year | Title | Paper | Code |
| --- | --- | --- | --- |
| 2024 | Text-based Audio Retrieval by Learning from Similarities between Audio Captions | SPL | |
| 2024 | Language-based Audio Retrieval with Co-Attention Networks | UIC | |
| 2024 | Audio meets text: a loss-enhanced journey with manifold mixup and re-ranking | Article | GitHub |
| 2024 | Dissecting Temporal Understanding in Text-to-Audio Retrieval | MM | GitHub |
| 2024 | Pre-Trained Models, Datasets, Data Augmentation for Language-Based Audio Retrieval | DCASE | |
| 2024 | Estimated Audio–Caption Correspondences Improve Language-Based Audio Retrieval | DCASE | GitHub |
| 2024 | The Language of Sound Search: Examining User Queries in Audio Search Engines | DCASE | |
| 2024 | Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning | DCASE | |
| 2024 | Improving Language-Based Audio Retrieval using LLM Augmentations | DCASE | |
| 2024 | Language-Based Audio Retrieval with GPT-Augmented Captions and Self-Attended Audio Clips | CSCWD | |
| 2024 | Learning Audio Concepts from Counterfactual Natural Language | ICASSP | GitHub |
| 2024 | A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval | ICASSP | |
| 2024 | Fusing Audio and Metadata Embeddings Improves Language-Based Audio Retrieval | EUSIPCO | |
| 2024 | Audio-Text Retrieval: Exploring Shared Parameters and Intra-Modal Constraint Loss | ANTIC | |

Year 2023

| Year | Title | Paper | Code |
| --- | --- | --- | --- |
| 2023 | Killing Two Birds with One Stone: Can an Audio Captioning System Also Be Used for Audio-Text Retrieval? | DCASE | |
| 2023 | Advancing Natural-Language Based Audio Retrieval with Passt and Large Audio-Caption Data Sets | DCASE | |
| 2023 | Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances | DCASE | GitHub |
| 2023 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | INTERSPEECH | |
| 2023 | Audio Retrieval with WavText5K and CLAP Training | INTERSPEECH | |
| 2023 | Audio–text retrieval based on contrastive learning and collaborative attention mechanism | Article | |
| 2023 | Audio-Text Models Do Not Yet Leverage Natural Language | ICASSP | |
| 2023 | Improving Text-Audio Retrieval by Text-Aware Attention Pooling and Prior Matrix Revised Loss | ICASSP | |
| 2023 | Data Leakage in Cross-Modal Retrieval Training: A Case Study | ICASSP | |
| 2023 | Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation | ICASSP | |
| 2023 | On Negative Sampling for Contrastive Audio-Text Retrieval | ICASSP | GitHub |
| 2023 | CLAP: Learning Audio Concepts from Natural Language Supervision | ICASSP | |
| 2023 | TIAR: Text-Image-Audio Retrieval with weighted multimodal re-ranking | Article | GitHub |
| 2023 | Enhancing Audio Retrieval with Attention-based Encoder for Audio Feature Representation | EUSIPCO | |
| 2023 | Multi-grained Representation Learning for Cross-modal Retrieval | SIGIR | |
| 2023 | Cross-Modal Audio-Text Retrieval via Sequential Feature Augmentation | CACML | |

Year 2022

| Year | Title | Paper | Code |
| --- | --- | --- | --- |
| 2022 | Language-Based Audio Retrieval with Textual Embeddings of Tag Names | DCASE | |
| 2022 | Improving Natural-Language-Based Audio Retrieval with Transfer Learning and Audio & Text Augmentations | DCASE | |
| 2022 | Matching Text and Audio Embeddings: Exploring Transfer-Learning Strategies for Language-Based Audio Retrieval | DCASE | |
| 2022 | Language-Based Audio Retrieval Task in DCASE 2022 Challenge | DCASE | |
| 2022 | Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval | INTERSPEECH | |
| 2022 | On Metric Learning for Audio-Text Cross-Modal Retrieval | INTERSPEECH | |
| 2022 | Audio-Text Retrieval in Context | ICASSP | |
| 2022 | Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss | APSIPA ASC | |
| 2022 | Audio Retrieval with Natural Language Queries: A Benchmark Study | Article | |

Year 2021

| Year | Title | Paper | Code |
| --- | --- | --- | --- |
| 2021 | Audio Retrieval with Natural Language Queries | INTERSPEECH | |

Before 2021

| Year | Title | Paper | Code |
| --- | --- | --- | --- |
| 2019 | Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio | ICASSP | |
| 2018 | Acoustic Event Search with An Onomatopoeic Query: Measuring Distance between Onomatopoeic Words and Sounds | DCASE | |
| 2010 | Content-Based Retrieval From Unstructured Audio Databases Using An Ecological Acoustics Taxonomy | ICAD | |
| 2008 | Large-scale content-based audio retrieval from text queries | ICMR | |
| 2007 | Audio Information Retrieval using Semantic Similarity | ICASSP | |
| 2005 | Semantic-based Audio Recognition and Retrieval | Thesis | |
| 2002 | Semantic-audio retrieval | ICASSP | |
| 1999 | An overview of audio information retrieval | Article | |

Audio-Text Datasets

| Year | Dataset | Paper | Code |
| --- | --- | --- | --- |
| 2024 | AudioSetCaps | arXiv | GitHub |
| 2024 | Auto-ACD | MM | GitHub |
| 2023 | WavCaps | TASLP | GitHub |
| 2021 | MACS | Zenodo | |
| 2020 | Clotho | ICASSP | GitHub |
| 2019 | AudioCaps | NAACL | GitHub |

Competitions

| Year | Competition | Website |
| --- | --- | --- |
| 2024 | DCASE 2024 Challenge Task 8 | DCASE |
| 2023 | DCASE 2023 Challenge Task 6 | DCASE |
| 2022 | DCASE 2022 Challenge Task 6 | DCASE |

Journals & Conferences

| Type | Name |
| --- | --- |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) |
| Journal | IEEE Transactions on Multimedia |
| Conference | ACM Special Interest Group on Information Retrieval (SIGIR) |
| Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
| Conference | Annual Conference of the International Speech Communication Association (INTERSPEECH) |
| Conference | International Society for Music Information Retrieval Conference (ISMIR) |
| Conference | ACM Multimedia (MM) |
| Conference | European Signal Processing Conference (EUSIPCO) |
| Conference | ACM International Conference on Multimedia Retrieval (ICMR) |
| Workshop | Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) |
