Multi-Model Fusion VoxSRC22 Speaker Diarization System

Indexing: Open Access; Peking University Core Journal; CSTPCD

Abstract

To address the "who spoke when" problem, a speaker diarization method is proposed. It consists of six modules: voice activity detection (VAD), speech enhancement, speaker embedding extraction, speaker clustering, overlapping speech detection (OSD), and result fusion. Applying speech enhancement improves VAD performance, and combining different speaker embedding extractors with different clustering algorithms further reduces the diarization error rate. The best results are obtained by handling overlapping speech after system fusion. Experimental results show that the best system improves on the baseline by 72% (relative) and achieves a diarization error rate (DER) of 5.48% and a Jaccard error rate (JER) of 32.10% on the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2022 evaluation set, ranking fourth.
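The two metrics reported above can be illustrated with a small frame-level sketch. It is not the challenge's official scorer. It assumes each recording is already cut into equal-length frames, with each frame holding the set of active speaker labels, so overlapping speech is just a frame with more than one label. DER is computed as (missed + false alarm + speaker confusion) / total reference speaker time, under the best one-to-one mapping between system and reference speakers. JER averages, over reference speakers, 1 minus the Jaccard index between each reference speaker and its mapped system speaker, and an unmapped reference speaker counts as 1. The function names `frame_der` and `frame_jer` are hypothetical. Official scoring (for example with the dscore tool) also applies a forgiveness collar around reference boundaries, which this sketch leaves out.

```python
from itertools import permutations

# Assumed representation: one set of active speaker labels per frame.
Frames = list[set[str]]


def _speakers(frames: Frames) -> list[str]:
    return sorted({s for f in frames for s in f})


def _all_mappings(ref_spk: list[str], sys_spk: list[str]):
    """Yield every maximal one-to-one mapping sys -> ref (brute force, small speaker counts only)."""
    if len(sys_spk) <= len(ref_spk):
        for refs in permutations(ref_spk, len(sys_spk)):
            yield dict(zip(sys_spk, refs))
    else:
        for syss in permutations(sys_spk, len(ref_spk)):
            yield dict(zip(syss, ref_spk))


def frame_der(ref: Frames, hyp: Frames) -> float:
    """Frame-level DER with an optimal speaker mapping and no collar."""
    assert len(ref) == len(hyp), "reference and hypothesis must have the same frames"

    def n_correct(mapping: dict[str, str], r: set[str], h: set[str]) -> int:
        # Reference speakers that the mapped system speakers hit in this frame.
        return len({mapping[s] for s in h if s in mapping} & r)

    mappings = _all_mappings(_speakers(ref), _speakers(hyp))
    best = max(mappings, key=lambda m: sum(n_correct(m, r, h) for r, h in zip(ref, hyp)))

    miss = fa = conf = 0
    for r, h in zip(ref, hyp):
        n_ref, n_sys = len(r), len(h)
        miss += max(0, n_ref - n_sys)                     # reference speech left unlabeled
        fa += max(0, n_sys - n_ref)                       # extra system speech
        conf += min(n_ref, n_sys) - n_correct(best, r, h) # labeled, but with the wrong speaker
    total = sum(len(r) for r in ref)                      # total reference speaker time (frames)
    return (miss + fa + conf) / total


def frame_jer(ref: Frames, hyp: Frames) -> float:
    """Frame-level JER: mean over reference speakers of 1 - |R & S| / |R | S|."""
    ref_spk, sys_spk = _speakers(ref), _speakers(hyp)
    R = {s: {i for i, f in enumerate(ref) if s in f} for s in ref_spk}
    S = {s: {i for i, f in enumerate(hyp) if s in f} for s in sys_spk}

    def total_jer(mapping: dict[str, str]) -> float:
        inv = {r: s for s, r in mapping.items()}          # ref -> paired sys speaker
        err = 0.0
        for r in ref_spk:
            if r in inv:
                s = inv[r]
                err += 1 - len(R[r] & S[s]) / len(R[r] | S[s])
            else:
                err += 1.0                                # unpaired reference speaker
        return err

    best = min(_all_mappings(ref_spk, sys_spk), key=total_jer)
    return total_jer(best) / len(ref_spk)


if __name__ == "__main__":
    ref = [{"A"}, {"A"}, {"B"}, {"B"}, set()]
    hyp = [{"x"}, {"x"}, {"x"}, {"y"}, {"y"}]
    print(frame_der(ref, hyp), frame_jer(ref, hyp))
```

In the toy example the best mapping is x→A, y→B. One frame is confused and one is a false alarm out of four reference frames, so the DER is 0.5. The JER also comes out at about 0.5, the mean of 1/3 for speaker A and 2/3 for speaker B. Brute-force mapping keeps the sketch dependency-free. A real scorer would use the Hungarian algorithm instead.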

Authors: Du Yuxuan (杜雨轩); Zhou Ruohua (周若华)

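Affiliation: School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616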

Subject: Computer Science and Automation

Keywords: speaker diarization; voice activity detection; speaker embedding; speaker clustering; result fusion

Journal: Computer Engineering and Applications (《计算机工程与应用》), 2024 (010)

Pages: 164-172 (9 pages)

DOI: 10.3778/j.issn.1002-8331.2301-0080
