Dense Crowd Pose Estimation Algorithm for In-layer Adjustment Feature Pyramid

Semiconductor Optoelectronics, 2024, Vol. 45, No. 6, pp. 931-938. DOI: 10.16818/j.issn1001-5868.2024073103

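Authors: GU Xuejing (1,2), GUO Zhibin (1,2)

Affiliations: (1. College of Electrical Engineering, North China University of Science and Technology, Tangshan, Hebei 063210, China; 2. Tangshan Digital Media Engineering Technology Research Center, Tangshan, Hebei 063210, China)
Author biography: GU Xuejing (born 1972), female, from Tangshan, Hebei Province; PhD, professor. Her main research interests are digital media technology, human-computer interaction, virtual humans and intelligent agents, artificial psychology, and artificial emotion.
Corresponding author: GUO Zhibin
CLC number: TP391.4
Funding: Special Project of the Joint Research Fund for High-End Iron and Steel Metallurgy, Natural Science Foundation of Hebei Province (F2017209120); Tangshan Basic Innovation Team Project on Immersive Virtual Environments (18130221A).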




Abstract:

Existing pose estimation algorithms suffer from missed and false detections in dense crowds. To address this, an improved YOLOv8sPose algorithm for dense crowd pose estimation, YOLOv8Pose-Dense Crowd (YOLOv8Pose-DC), is proposed. First, a centralized intra-layer adjustment feature pyramid network is designed. It combines a deformable attention mechanism with coordinate attention-based spatial pyramid pooling fast (CASPPF) in parallel and applies global centralized adjustment to the pyramid network in a top-down manner, increasing the spatial weight of the global representation within the network so that the improved algorithm obtains comprehensive and discriminative feature representations. Second, a multi-scale dual detection head structure is proposed, which reduces computation while improving detection efficiency. Third, the DySample module is used to improve upsampling efficiency. Finally, a spatial context aware module (SCAM) is added to strengthen the model's ability to associate global information and to suppress irrelevant background so that human features stand out. Experimental results show that, compared with the baseline model, YOLOv8Pose-DC improves mAP@0.5 by 3.1% and recall by 4.2%. The algorithm delivers a substantial performance improvement and meets production requirements.

Keywords: pose estimation; YOLOv8sPose; dense crowds; centralized intra-layer adjustment feature pyramid; DySample
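
The reported metrics, mAP@0.5 and recall, both depend on deciding when a predicted pose counts as a match for a ground-truth pose. The abstract does not state the exact evaluation protocol, so the sketch below is only an illustration under an assumption: it uses the COCO-style object keypoint similarity (OKS) with a 0.5 threshold, which is the usual convention for keypoint benchmarks. The function names, the per-keypoint constants, and the toy poses are hypothetical and are not taken from the paper.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """COCO-style object keypoint similarity for one predicted / ground-truth pose pair.

    pred, gt : lists of (x, y) keypoint coordinates
    visible  : flags, nonzero where the ground-truth keypoint is labeled
    area     : ground-truth object area in pixels (acts as the squared scale)
    sigmas   : per-keypoint falloff constants (assumed values in the demo)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, s in zip(pred, gt, visible, sigmas):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * s) ** 2
        total += math.exp(-d2 / (2 * area * k2))
        count += 1
    return total / count if count else 0.0


def recall_at(preds, gts, threshold=0.5):
    """Fraction of ground-truth poses matched by a prediction with OKS >= threshold.

    Uses greedy one-to-one matching, a simplification of a full benchmark protocol.
    """
    used, matched = set(), 0
    for gt in gts:
        best_j, best = None, threshold
        for j, p in enumerate(preds):
            if j in used:
                continue
            score = oks(p, gt["kpts"], gt["vis"], gt["area"], gt["sigmas"])
            if score >= best:
                best_j, best = j, score
        if best_j is not None:
            used.add(best_j)
            matched += 1
    return matched / len(gts) if gts else 0.0


# Toy data (hypothetical): three keypoints, the third one unlabeled.
gt = {"kpts": [(10, 10), (20, 10), (15, 20)], "vis": [1, 1, 0],
      "area": 100.0, "sigmas": [0.05, 0.05, 0.05]}
near = [(11, 10), (21, 10), (0, 0)]  # 1 px off: OKS = exp(-0.5), about 0.61
far = [(12, 10), (22, 10), (0, 0)]   # 2 px off: OKS = exp(-2), about 0.14

print(oks(near, gt["kpts"], gt["vis"], gt["area"], gt["sigmas"]))
print(recall_at([near], [gt]), recall_at([far], [gt]))
```

In this toy case a one-pixel error still counts as a detection at the 0.5 threshold while a two-pixel error is a miss, which shows why closely packed, small-scale people in dense crowds are sensitive to keypoint localization error.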


Cite this article

GU Xuejing, GUO Zhibin. Dense Crowd Pose Estimation Algorithm for In-layer Adjustment Feature Pyramid[J]. Semiconductor Optoelectronics, 2024, 45(6): 931-938.


History

Received: 2024-07-31
Published online: 2025-02-20
