Improved MFCC Features and MLA Model for Speech Emotion Recognition

Abstract

MFCC and its first-order difference features represent the static and dynamic information of speech, respectively, and are commonly used as emotional features in speech emotion recognition (SER). In traditional MFCC feature extraction, the speech signal-to-noise ratio is balanced through manual parameter tuning, which can easily lead to overcompensation. This paper proposes two improved methods, which yield the EMFCC and AMFCC features, respectively. To maximize classification accuracy, an MLA model is built from a pooling layer, an LSTM, and an attention mechanism so that it can effectively capture the emotional information in the features. A mixed feature composed of MFCC, its first-order difference, and the two improved MFCC features achieves an unweighted accuracy of 81.79% on the CASIA corpus. Ablation experiments and a comparison with other advanced SER methods indicate that the improved MFCC features offer a performance advantage.
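The abstract reports an unweighted accuracy but does not define it. In SER work the term usually means the mean of the per-class recalls, so that every emotion class counts equally regardless of how many utterances it has. The sketch below assumes that definition; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition of unweighted accuracy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain overall accuracy, shown only for contrast."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, imbalanced labels (hypothetical, not data from the paper).
y_true = ["angry"] * 4 + ["sad"] * 2
y_pred = ["angry", "angry", "angry", "sad", "sad", "happy"]

ua = unweighted_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2
wa = weighted_accuracy(y_true, y_pred)    # 4 correct out of 6
print(f"UA = {ua:.4f}, WA = {wa:.4f}")
```

On imbalanced data the two metrics differ, which is why the choice of metric matters when comparing reported results.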

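Author: Zhang Xiaoli (张晓莉)

Affiliation: Department of Educational Technology, Fujian Normal University, Fuzhou 350007

Category: Electronic Information Engineering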

Keywords: Speech Emotion Recognition; Mel-Frequency Cepstral Coefficients (MFCC); Long Short-Term Memory; Attention Mechanism

Journal: Fujian Computer (《福建电脑》), 2024, No. 1

Pages: 52-56 (5 pages)

DOI: 10.16707/j.cnki.fjpc.2024.01.010
