Publications

Journal Papers

  1. 2025 JSP Unified Speech Enhancement Technique for Diverse Input Conditions
  2. 2025 CSL An End-to-End Integration of Speech Separation and Recognition With Self-Supervised Learning Representation
  3. 2025 npj-Acoustics Contextual Understanding With Contextual Embeddings for Multi-Talker Speech Separation and Recognition in a Cocktail Party
  4. 2025 OJSP SpoofCeleb: Speech Deepfake Detection and SASV in the Wild
  5. 2024 SPM Module-Based End-to-End Distant Speech Processing: A Case Study of Far-Field Automatic Speech Recognition
  6. 2023 Applied Sciences Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering
  7. 2022 TASLP End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party
  8. 2020 TASLP Improving End-to-End Single-Channel Multi-Talker Speech Recognition

Conference Papers

  1. 2025 Interspeech Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
  2. 2025 Interspeech Interspeech 2025 URGENT Speech Enhancement Challenge
  3. 2025 Interspeech The Text-to-speech in the Wild (TITW) Database
  4. 2025 Interspeech BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
  5. 2025 Interspeech Ranking and Selection of Bias Words for Contextual Bias Speech Recognition
  6. 2025 ICASSP Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
  7. 2025 NAACL-HLT VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
  8. 2024 EMNLP Towards Robust Speech Representation Learning for Thousands of Languages (Best Paper Award)
  9. 2024 ISCSLP Insights from Hyperparameter Scaling of Online Speech Separation
  10. 2024 Interspeech URGENT Challenge: Universality, Robustness, and Generalizability for Speech Enhancement
  11. 2024 Interspeech Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
  12. 2024 Interspeech ESPnet-SPK: Full Pipeline Speaker Embedding Toolkit With Reproducible Recipes, Self-Supervised Front-Ends, and Off-the-Shelf Models
  13. 2024 ICASSP Improving Design of Input Condition Invariant Speech Enhancement Models
  14. 2024 ICASSP Generation-Based Target Speech Extraction with Speech Discretization and Vocoder
  15. 2023 ASRU Toward Universal Speech Enhancement for Diverse Input Conditions
  16. 2023 ASRU Exploring Time-Frequency Domain Target Speaker Extraction for Causal and Non-Causal Processing
  17. 2023 ASRU A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction
  18. 2023 ASRU Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
  19. 2023 ASRU Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning
  20. 2023 WASPAA Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
  21. 2023 Interspeech Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
  22. 2023 Interspeech Overlap Aware Continuous Speech Separation without Permutation Invariant Training
  23. 2022 SLT End-to-End Multi-Speaker ASR with Independent Vector Analysis
  24. 2022 ISCSLP Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition
  25. 2022 Interspeech Separating Long-form Speech with Group-wise Permutation Invariant Training
  26. 2022 Interspeech ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
  27. 2022 ICASSP The SJTU System for Multimodal Information Based Speech Processing Challenge 2021
  28. 2022 ICASSP Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge (First Place in the 3D Speech Enhancement Task of L3DAS22 Challenge)
  29. 2022 ICASSP Text Adaptive Detection for Customizable Keyword Spotting
  30. 2022 ICASSP Exploring Effective Data Utilization for Low-Resource Speech Recognition
  31. 2021 WASPAA Closing the Gap Between Time-domain Multi-channel Speech Enhancement on Real and Simulation Conditions
  32. 2021 DSLW The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans
  33. 2021 ICASSP End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
  34. 2021 ICASSP Recent Developments on ESPnet Toolkit Boosted by Conformer
  35. 2021 ICASSP Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation
  36. 2021 SLT ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
  37. 2020 Interspeech End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
  38. 2020 Interspeech Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition
  39. 2020 ICASSP End-to-End Multi-Speaker Speech Recognition With Transformer
  40. 2019 ASRU End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform
  41. 2019 ASRU MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition (Best Paper Award)
  42. 2019 ASRU A Comparative Study on Transformer vs RNN in Speech Applications
  43. 2019 Interspeech Knowledge Distillation for End-to-End Monaural Multitalker ASR System
  44. 2019 Interspeech Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking