Publications
Journal Papers
- 2025 JSP Unified Speech Enhancement Technique for Diverse Input Conditions
- 2025 CSL An End-to-End Integration of Speech Separation and Recognition With Self-Supervised Learning Representation
- 2025 npj-Acoustics Contextual Understanding With Contextual Embeddings for Multi-Talker Speech Separation and Recognition in a Cocktail Party
- 2025 OJSP SpoofCeleb: Speech Deepfake Detection and SASV in the Wild
- 2024 SPM Module-Based End-to-End Distant Speech Processing: A Case Study of Far-Field Automatic Speech Recognition
- 2023 Applied Sciences Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering
- 2022 TASLP End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party
- 2020 TASLP Improving End-to-End Single-Channel Multi-Talker Speech Recognition
Conference Papers
- 2025 Interspeech Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
- 2025 Interspeech Interspeech 2025 URGENT Speech Enhancement Challenge
- 2025 Interspeech The Text-to-speech in the Wild (TITW) Database
- 2025 Interspeech BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
- 2025 Interspeech Ranking and Selection of Bias Words for Contextual Bias Speech Recognition
- 2025 ICASSP Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
- 2025 NAACL-HLT VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
- 2024 EMNLP Towards Robust Speech Representation Learning for Thousands of Languages (Best Paper Award)
- 2024 ISCSLP Insights from Hyperparameter Scaling of Online Speech Separation
- 2024 Interspeech URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
- 2024 Interspeech Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
- 2024 Interspeech ESPnet-SPK: Full Pipeline Speaker Embedding Toolkit With Reproducible Recipes, Self-Supervised Front-Ends, and Off-the-Shelf Models
- 2024 ICASSP Improving Design of Input Condition Invariant Speech Enhancement Models
- 2024 ICASSP Generation-Based Target Speech Extraction with Speech Discretization and Vocoder
- 2023 ASRU Toward Universal Speech Enhancement For Diverse Input Conditions
- 2023 ASRU Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing
- 2023 ASRU A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction
- 2023 ASRU Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
- 2023 ASRU Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning
- 2023 WASPAA Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
- 2023 Interspeech Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
- 2023 Interspeech Overlap Aware Continuous Speech Separation without Permutation Invariant Training
- 2022 SLT End-to-End Multi-Speaker ASR with Independent Vector Analysis
- 2022 ISCSLP Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition
- 2022 Interspeech Separating Long-form Speech with Group-wise Permutation Invariant Training
- 2022 Interspeech ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
- 2022 ICASSP The SJTU System for Multimodal Information Based Speech Processing Challenge 2021
- 2022 ICASSP Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge (First Place in the 3D Speech Enhancement Task of L3DAS22 Challenge)
- 2022 ICASSP Text Adaptive Detection for Customizable Keyword Spotting
- 2022 ICASSP Exploring Effective Data Utilization for Low-Resource Speech Recognition
- 2021 WASPAA Closing the Gap Between Time-domain Multi-channel Speech Enhancement on Real and Simulation Conditions
- 2021 DSLW The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans
- 2021 ICASSP End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
- 2021 ICASSP Recent Developments on ESPnet Toolkit Boosted by Conformer
- 2021 ICASSP Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation
- 2021 SLT ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
- 2020 Interspeech End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
- 2020 Interspeech Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition
- 2020 ICASSP End-To-End Multi-Speaker Speech Recognition With Transformer
- 2019 ASRU End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform
- 2019 ASRU MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition (Best Paper Award)
- 2019 ASRU A Comparative Study on Transformer vs RNN in Speech Applications
- 2019 Interspeech Knowledge Distillation for End-to-End Monaural Multitalker ASR System
- 2019 Interspeech Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking