
Multi-Model Fusion VoxSRC22 Speaker Diarization System

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD


To address the "who spoke when" problem, this paper proposes a speaker diarization method built from six modules: voice activity detection (VAD), speech enhancement, speaker embedding extraction, speaker clustering, overlapping speech detection (OSD), and result fusion. Speech enhancement improves the performance of voice activity detection. Combining different speaker embedding extractors and clustering algorithms further reduces the system error rate. Handling overlapping speech after system fusion gives the best results. Experiments show that the best system improves on the baseline by 72% in relative terms and achieves a diarization error rate (DER) of 5.48% and a Jaccard error rate (JER) of 32.10% on the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2022 evaluation set, ranking fourth.

DU Yuxuan (杜雨轩); ZHOU Ruohua (周若华)

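School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China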

Computer Science and Automation

speaker diarization; voice activity detection; speaker embedding; speaker clustering; result fusion

Computer Engineering and Applications (《计算机工程与应用》), 2024 (10)

pp. 164-172 (9 pages)

10.3778/j.issn.1002-8331.2301-0080
