Man-Machine Speech Communication
20th National Conference, Ncmmsc 2025, Zhenjiang, China, October 16-19, 2025, Proceedings
Edited by: Jia, Jia; Wu, Zhiyong; Gao, Lijian; Huang, Gongping; Li, Ya
- Paperback
This book constitutes the refereed proceedings of the 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025, held in Zhenjiang, China, during October 16-19, 2025.
The 40 papers included in these proceedings were carefully reviewed and selected from 157 submissions. The conference will feature special events such as a Young Scholars Forum, Student Forum, Industry Forum, and Product and Technology Exhibition. Beyond the main program, the conference will also include public outreach activities, grant-writing workshops, and several special sessions.
Other customers were also interested in
- Rupayan Chakraborty: Analyzing Emotion in Spontaneous Speech (€38.99)
- Man-Machine Speech Communication (€57.99)
- Handbook of Natural Language Processing and Machine Translation (€279.99)
- David S. Taubman: Jpeg2000 Image Compression Fundamentals, Standards and Practice (€158.99)
- Liang Lin: Multimodal Large Models (€151.99)
- Deborah Dahl (ed.): Practical Spoken Dialog Systems (€102.99)
- Chong-Yung Chi: Blind Equalization and System Identification (€52.99)
Product details
- Communications in Computer and Information Science 2662
- Publisher: Springer, Berlin; Springer
- Number of pages: 539
- Publication date: 19 January 2026
- Language: English
- Dimensions: 235mm x 155mm x 30mm
- Weight: 832g
- ISBN-13: 9789819553815
- ISBN-10: 9819553814
- Item no.: 75835432
- Manufacturer information: Libri GmbH, Europaallee 1, 36244 Bad Hersfeld, gpsr@libri.de
Table of contents
- Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios
- Multilevel and Granular L2 Pronunciation Assessment Using Stress-Based Suprasegmental Features and Proficiency Adaptation
- CDMGTU-Net: A Causal Dual-Branch Multi-Channel Speech Enhancement Network with Multi-Scale Gated Feature Fusion
- A Two-Stage Band-Split Mamba-2 Network for Music Source Separation
- Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
- MambaVoc: State Space Models for High-Fidelity Audio Synthesis
- StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
- Automatic Speech Evaluation Method Leveraging Deep Feature Fusion
- Curriculum Reinforcement Learning for Robust Low-Resource Chinese Dialect Speech Recognition
- An Acoustic Study on Intonation Production of English Learners from Guanzhong Region in Shaanxi Province
- Improving Anomalous Sound Detection with Top-M Pseudo-Labeling
- Dementia Detection via Speech Temporal Sequences with Shifted Windows
- CL-EDiff: Cross-lingual emotional TTS system based on diffusion model
- When AI Speaks, Do We Follow? Phonetic Entrainment in Human-AI Dialogues
- Aishell1Mix: Towards Robust Mandarin Speech Separation with Scalable Audio Language Models
- Study of the Low-Rank Minimum Variance Distortionless Response Beamformer for Speech Enhancement
- Exploring Gender Bias in Alzheimer’s Disease Detection: Insights from Mandarin and Greek Speech Perception
- UniDaugMamba: A Unimodal Data-augmented Mamba for Speech-Based Depression Detection
- Serial-Parallel Dual-Path Architecture for Speaking Style Recognition
- Knowledge Augmented Finetuning Matters in Both RAG and Agent Based Dialog Systems
- NC-KWS: Few-Shot Class-Incremental Keyword Spotting Based on Neural Collapse
- ZSEmo-MTVITS: A Zero-Shot Cross-Lingual Emotional Speech Synthesis Model for Mandarin and Tibetan Based on VITS
- CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025
- Accent Familiarity and Phonological Weighting in Spoken-Word Recognition
- Audio Deepfake Detection via Dual Branch Classifier with Self-Supervised Pre-Trained Model
- A Multi-Subspace Attention Approach for Robust Speech Spoofing Detection in Silence-Trimming Conditions
- Temporally Consistent Teeth Restoration for Talking Heads
- EEG as a Biometric Identifier: The Impact of Electrode Arrangement, Brain Areas, and Frequency Bands
- The Phonetic Modification and Facial Movements Made During Mandarin Vowel and Tone Production in Noise
- Exploring Audio-Visual Fusion for Sound Event Localization and Detection with BEATs
- On Multi-Input Multi-Frame MVDR Filter for Speech Enhancement with Heterophasic Presentation
- Adaptive Multi-source Fusion for Uyghur ASR Error Correction
- The determinants of Chinese lexical stress
- Introducing Discriminative Speaker Embeddings for Voice Timbre Attribute Detection
- TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
- A Timbre Attribute Discrimination System Fusing Pre-trained Speaker Feature Extractors with Gender Prior Features
- Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features
- A Hierarchical Fusion Modeling from Perception to Prediction with Personalized Features for Multimodal Depression Detection
- Revisiting Target Signal Definitions in Distortionless Superdirective Beamforming for Reverberant Speech Enhancement
- HiStyle: Hierarchical Style Embedding Prediction for Text-Prompt-Guided Controllable Speech Synthesis