Publications

Journal Papers

2025 JSP Unified Speech Enhancement Technique for Diverse Input Conditions
2025 CSL An End-to-End Integration of Speech Separation and Recognition With Self-Supervised Learning Representation
2025 npj-Acoustics Contextual Understanding With Contextual Embeddings for Multi-Talker Speech Separation and Recognition in a Cocktail Party
2025 OJSP SpoofCeleb: Speech Deepfake Detection and SASV in the Wild
2024 SPM Module-Based End-to-End Distant Speech Processing: A Case Study of Far-Field Automatic Speech Recognition
2023 Applied Sciences Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering
2022 TASLP End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party
2020 TASLP Improving End-to-End Single-Channel Multi-Talker Speech Recognition

Conference Papers

2025 Interspeech Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
2025 Interspeech Interspeech 2025 URGENT Speech Enhancement Challenge
2025 Interspeech The Text-to-speech in the Wild (TITW) Database
2025 Interspeech BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
2025 Interspeech Ranking and Selection of Bias Words for Contextual Bias Speech Recognition
2025 ICASSP Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
2025 NAACL-HLT VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
2024 EMNLP Towards Robust Speech Representation Learning for Thousands of Languages (Best Paper Award)
2024 ISCSLP Insights from Hyperparameter Scaling of Online Speech Separation
2024 Interspeech URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
2024 Interspeech Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
2024 Interspeech ESPnet-SPK: Full Pipeline Speaker Embedding Toolkit With Reproducible Recipes, Self-Supervised Front-Ends, and off-the-Shelf Models
2024 ICASSP Improving Design of Input Condition Invariant Speech Enhancement Models
2024 ICASSP Generation-Based Target Speech Extraction with Speech Discretization and Vocoder
2023 ASRU Toward Universal Speech Enhancement For Diverse Input Conditions
2023 ASRU Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing
2023 ASRU A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction
2023 ASRU Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
2023 ASRU Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning
2023 WASPAA Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
2023 Interspeech Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
2023 Interspeech Overlap Aware Continuous Speech Separation without Permutation Invariant Training
2022 SLT End-to-End Multi-Speaker ASR with Independent Vector Analysis
2022 ISCSLP Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition
2022 Interspeech Separating Long-form Speech with Group-wise Permutation Invariant Training
2022 Interspeech ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
2022 ICASSP The SJTU System for Multimodal Information Based Speech Processing Challenge 2021
2022 ICASSP Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge (First Place in the 3D Speech Enhancement Task of L3DAS22 Challenge)
2022 ICASSP Text Adaptive Detection for Customizable Keyword Spotting
2022 ICASSP Exploring Effective Data Utilization for Low-Resource Speech Recognition
2021 WASPAA Closing the Gap Between Time-domain Multi-channel Speech Enhancement on Real and Simulation Conditions
2021 DSLW The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans
2021 ICASSP End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
2021 ICASSP Recent Developments on ESPnet Toolkit Boosted by Conformer
2021 ICASSP Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation
2021 SLT ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration
2020 Interspeech End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
2020 Interspeech Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition
2020 ICASSP End-To-End Multi-Speaker Speech Recognition With Transformer
2019 ASRU End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform
2019 ASRU MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition (Best Paper Award)
2019 ASRU A Comparative Study on Transformer vs RNN in Speech Applications
2019 Interspeech Knowledge Distillation for End-to-End Monaural Multitalker ASR System
2019 Interspeech Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking

Wangyou Zhang
