A curated list of academic papers, datasets, and other resources on Language-Based Audio Retrieval.
Year | Title | Paper | Code |
---|---|---|---|
2025 | TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | arXiv | GitHub |
2025 | FLAM: Frame-Wise Language-Audio Modeling | ICML | |
2025 | CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining | arXiv | |
2025 | M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | arXiv | |
2025 | TAIL: Text-Audio Incremental Learning | arXiv | |
2025 | ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | arXiv | GitHub |
Year | Title | Paper | Code |
---|---|---|---|
2024 | Text-based Audio Retrieval by Learning from Similarities between Audio Captions | SPL | |
2024 | Language-based Audio Retrieval with Co-Attention Networks | UIC | |
2024 | Audio meets text: a loss-enhanced journey with manifold mixup and re-ranking | Article | GitHub |
2024 | Dissecting Temporal Understanding in Text-to-Audio Retrieval | MM | GitHub |
2024 | Pre-Trained Models, Datasets, Data Augmentation for Language-Based Audio Retrieval | DCASE | |
2024 | Estimated Audio–Caption Correspondences Improve Language-Based Audio Retrieval | DCASE | GitHub |
2024 | The Language of Sound Search: Examining User Queries in Audio Search Engines | DCASE | |
2024 | Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning | DCASE | |
2024 | Improving Language-Based Audio Retrieval using LLM Augmentations | DCASE | |
2024 | Language-Based Audio Retrieval with GPT-Augmented Captions and Self-Attended Audio Clips | CSCWD | |
2024 | Learning Audio Concepts from Counterfactual Natural Language | ICASSP | GitHub |
2024 | A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval | ICASSP | |
2024 | Fusing Audio and Metadata Embeddings Improves Language-Based Audio Retrieval | EUSIPCO | |
2024 | Audio-Text Retrieval: Exploring Shared Parameters and Intra-Modal Constraint Loss | ANTIC | |
Year | Title | Paper | Code |
---|---|---|---|
2023 | Killing Two Birds with One Stone: Can an Audio Captioning System Also Be Used for Audio-Text Retrieval? | DCASE | |
2023 | Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets | DCASE | |
2023 | Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances | DCASE | GitHub |
2023 | Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions | INTERSPEECH | |
2023 | Audio Retrieval with WavText5K and CLAP Training | INTERSPEECH | |
2023 | Audio–text retrieval based on contrastive learning and collaborative attention mechanism | Article | |
2023 | Audio-Text Models Do Not Yet Leverage Natural Language | ICASSP | |
2023 | Improving Text-Audio Retrieval by Text-Aware Attention Pooling and Prior Matrix Revised Loss | ICASSP | |
2023 | Data Leakage in Cross-Modal Retrieval Training: A Case Study | ICASSP | |
2023 | Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation | ICASSP | |
2023 | On Negative Sampling for Contrastive Audio-Text Retrieval | ICASSP | GitHub |
2023 | CLAP: Learning Audio Concepts from Natural Language Supervision | ICASSP | |
2023 | TIAR: Text-Image-Audio Retrieval with weighted multimodal re-ranking | Article | GitHub |
2023 | Enhancing Audio Retrieval with Attention-based Encoder for Audio Feature Representation | EUSIPCO | |
2023 | Multi-grained Representation Learning for Cross-modal Retrieval | SIGIR | |
2023 | Cross-Modal Audio-Text Retrieval via Sequential Feature Augmentation | CACML | |
Year | Title | Paper | Code |
---|---|---|---|
2022 | Language-Based Audio Retrieval with Textual Embeddings of Tag Names | DCASE | |
2022 | Improving Natural-Language-Based Audio Retrieval with Transfer Learning and Audio & Text Augmentations | DCASE | |
2022 | Matching Text and Audio Embeddings: Exploring Transfer-Learning Strategies for Language-Based Audio Retrieval | DCASE | |
2022 | Language-Based Audio Retrieval Task in DCASE 2022 Challenge | DCASE | |
2022 | Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval | INTERSPEECH | |
2022 | On Metric Learning for Audio-Text Cross-Modal Retrieval | INTERSPEECH | |
2022 | Audio-Text Retrieval in Context | ICASSP | |
2022 | Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss | APSIPA ASC | |
2022 | Audio Retrieval with Natural Language Queries: A Benchmark Study | Article | |
Year | Title | Paper | Code |
---|---|---|---|
2021 | Audio Retrieval with Natural Language Queries | INTERSPEECH | |
Year | Title | Paper | Code |
---|---|---|---|
2019 | Cross Modal Audio Search and Retrieval with Joint Embeddings Based on Text and Audio | ICASSP | |
2018 | Acoustic Event Search with An Onomatopoeic Query: Measuring Distance between Onomatopoeic Words and Sounds | DCASE | |
2010 | Content-Based Retrieval From Unstructured Audio Databases Using An Ecological Acoustics Taxonomy | ICAD | |
2008 | Large-scale content-based audio retrieval from text queries | ICMR | |
2007 | Audio Information Retrieval using Semantic Similarity | ICASSP | |
2005 | Semantic-based Audio Recognition and Retrieval | Thesis | |
2002 | Semantic-audio retrieval | ICASSP | |
1999 | An overview of audio information retrieval | Article | |
Year | Dataset | Paper | Code |
---|---|---|---|
2024 | AudioSetCaps | arXiv | GitHub |
2024 | Auto-ACD | MM | GitHub |
2023 | WavCaps | TASLP | GitHub |
2021 | MACS | Zenodo | |
2020 | Clotho | ICASSP | GitHub |
2019 | AudioCaps | NAACL | GitHub |
Year | Competition | Website |
---|---|---|
2024 | DCASE 2024 Challenge Task 8 | DCASE |
2023 | DCASE 2023 Challenge Task 6 | DCASE |
2022 | DCASE 2022 Challenge Task 6 | DCASE |
Type | Name |
---|---|
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) |
Journal | IEEE Transactions on Multimedia |
Conference | International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) |
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
Conference | Annual Conference of the International Speech Communication Association (INTERSPEECH) |
Conference | International Society for Music Information Retrieval Conference (ISMIR) |
Conference | ACM Multimedia (MM) |
Conference | European Signal Processing Conference (EUSIPCO) |
Conference | ACM International Conference on Multimedia Retrieval (ICMR) |
Workshop | Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) |