Deep attention-based image caption model with fusion of global and semantic features (open access)

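Abstract (translated from the Chinese)

Existing image caption models that use global features are limited by a uniform receptive field size, while image features based on object regions lack background information. To address this, a new semantic extraction module is proposed to extract semantic features from the image, and a multi-feature fusion module fuses the global features with the semantic features so that the model attends to both the key object content and the background information of the image. A deep attention-based decoding module is also proposed to align visual and textual features and generate higher-quality image captions. The proposed model is evaluated experimentally on the Microsoft COCO dataset; the analysis shows that the method clearly improves captioning performance and is competitive with other advanced models.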

Abstract (English)

To address two limitations of existing image caption models, namely that global features are constrained by a fixed receptive field size and that object-region features lack background information, an image caption model (DFGS) is proposed. A multi-feature fusion module is designed to fuse global and semantic features, allowing the model to attend to both key objects and background information in the image. A deep attention-based decoding module is designed to align visual and textual features and thereby generate higher-quality captions. Experimental results on the MSCOCO dataset show that the proposed model produces more accurate captions and is competitive with other advanced models.
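The abstract does not say how the multi-feature fusion module combines the two feature sets. As a rough illustration only, the sketch below assumes one common choice: each global feature vector attends over the semantic feature vectors with scaled dot-product attention, and the attended result is added back as a residual. The function names `softmax` and `fuse_features`, the feature dimensions, and the random inputs are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(global_feats, semantic_feats):
    """Hypothetical fusion step (an assumption, not the paper's module).

    Each global feature vector attends over all semantic feature vectors
    using scaled dot-product attention; the attended vector is then added
    to the global vector as a residual.

    global_feats:   list of N vectors of length d
    semantic_feats: list of M vectors of length d
    Returns (fused, weights): N fused vectors of length d, and N rows of
    M attention weights.
    """
    d = len(global_feats[0])
    fused, all_weights = [], []
    for g in global_feats:
        # Similarity between this global vector and every semantic vector.
        scores = [
            sum(gi * si for gi, si in zip(g, s)) / math.sqrt(d)
            for s in semantic_feats
        ]
        weights = softmax(scores)
        # Weighted sum of semantic vectors, one output dimension at a time.
        attended = [
            sum(w * s[k] for w, s in zip(weights, semantic_feats))
            for k in range(d)
        ]
        fused.append([gi + ai for gi, ai in zip(g, attended)])
        all_weights.append(weights)
    return fused, all_weights


# Toy inputs: 4 global (grid) vectors and 3 semantic vectors, dimension 8.
random.seed(0)
global_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
semantic_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]

fused, weights = fuse_features(global_feats, semantic_feats)
print(len(fused), len(fused[0]))
```

The residual addition keeps the original global (background) signal while mixing in object-level semantics, which matches the stated goal of attending to both; a real implementation would add learned projections and multiple heads.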

Authors: 及昕浩 (Ji Xinhao); 彭玉青 (Peng Yuqing)

Affiliation: School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China

Category: Computer and Automation

Keywords: image caption; global feature; semantic feature; feature fusion

Journal: 《网络安全与数据治理》 (Cyber Security and Data Governance), 2024, Issue 2

Pages: 49-53 (5 pages)

Funding: Hebei Province Graduate Innovation Project (220056)

DOI: 10.19358/j.issn.2097-1788.2024.02.008
