Journal: Avian Research (鸟类学研究（英文版）)

Step-by-step to success: Multi-stage learning driven robust audiovisual fusion network for fine-grained bird species classification

Abstract: Bird monitoring and protection are essential for maintaining biodiversity, and fine-grained bird classification has become a key focus in this field. Audio-visual modalities provide critical cues for this task, but robust feature extraction and efficient fusion remain major challenges. We introduce a multi-stage fine-grained audiovisual fusion network (MSFG-AVFNet) for fine-grained bird species classification, which addresses these challenges through two key components: (1) the audiovisual feature extraction module, which adopts a multi-stage fine-tuning strategy to provide high-quality unimodal features, laying a solid foundation for modality fusion; (2) the audiovisual feature fusion module, which combines a max pooling aggregation strategy with a novel audiovisual loss function to achieve effective and robust feature fusion. Experiments were conducted on the self-built AVB81 dataset and the publicly available SSW60 dataset, which contain data from 81 and 60 bird species, respectively. Comprehensive experiments demonstrate that our approach achieves notable performance gains, outperforming existing state-of-the-art methods. These results highlight its effectiveness in leveraging audiovisual modalities for fine-grained bird classification and its potential to support ecological monitoring and biodiversity research.
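The abstract names a max pooling aggregation strategy but does not say how it is applied. As a minimal sketch, assuming the audio and visual encoders each produce a feature vector of the same dimension, one common reading is an element-wise maximum across the two modalities. The function name `max_pool_fuse` and the toy values are hypothetical, and this is not the authors' MSFG-AVFNet implementation; the audiovisual loss function is not sketched because the abstract gives no detail on it.

```python
from typing import List, Sequence


def max_pool_fuse(audio_feat: Sequence[float], visual_feat: Sequence[float]) -> List[float]:
    """Element-wise max over two same-length feature vectors.

    Assumption: both modalities are already projected to one shared dimension.
    """
    if len(audio_feat) != len(visual_feat):
        raise ValueError("audio and visual features must share one dimension")
    # For each feature index, keep whichever modality responds more strongly.
    return [max(a, v) for a, v in zip(audio_feat, visual_feat)]


# Toy 4-dimensional embeddings (hypothetical values, not from the paper).
audio = [0.2, 0.9, -0.1, 0.4]
visual = [0.5, 0.3, 0.0, 0.4]
fused = max_pool_fuse(audio, visual)
print(fused)  # [0.5, 0.9, 0.0, 0.4]
```

Under this reading, the fused vector keeps the stronger response per dimension, so a weak or noisy modality cannot pull a confident feature down the way averaging would.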

Authors: Shanshan Xie [1], Jiangjian Xie [1], Yang Liu [1], Lianshuai Sha [1], Ye Tian [1], Jiahua Dong [2], Diwen Liang [2], Kaijun Pan [2], Junguo Zhang [1]

Affiliations:
[1] School of Technology, Beijing Forestry University, Beijing 100083, China; State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China
[2] South China Institute of Environmental Sciences, Ministry of Ecology and Environment & The Key Laboratory of Urban Ecological Environment Simulation and Protection, Ministry of Ecology and Environment of the People's Republic of China, Guangzhou 510535, China

Keywords: Audiovisual modality; Bird species classification; Feature fusion; Fine-grained

DOI: 10.1016/j.avrs.2025.100280

Release date: 2025-12-01 (date first posted online on the Wanfang platform; this is not the paper's publication date)
