3D human pose estimation algorithm based on multi-feature extraction

葛森林; 高浩

doi:10.19304/J.ISSN1000-7180.2023.0271

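Abstract (translated from Chinese): As an important task in computer vision within artificial intelligence, 3D human pose estimation has attracted wide attention and has been applied successfully in fields such as human-computer interaction and film and game production. It still faces major challenges, chiefly human body occlusion and redundant viewpoints in datasets, both of which seriously limit improvements in the accuracy and speed of 3D human pose estimation. This paper proposes a 3D human pose estimation method based on multi-feature extraction. First, images are captured from multiple camera views and fed into a 2D human joint detection network to obtain the 2D joints. Next, the captured human data is fed into a joint confidence network to obtain a weight for each joint in each view image. The 2D joint heatmaps are then passed through a heatmap weight network to compute heatmap weights, and the weight features from all views are fused to produce weighted 2D joint heatmaps. Finally, the weighted 2D joint heatmaps and the per-joint weights of each view image are fed into a triangulation algorithm, which maps them to 3D human joints in space. The key idea is to design a joint confidence network that learns a confidence weight for each joint from the input image, while also extracting a weight matrix that reflects the quality of the heatmap features, so as to improve heatmap feature quality in occluded views. In addition, a perceptual hash algorithm is used to run a view-removal experiment on the Occlusion-Person dataset, which increases inference speed while preserving accuracy. The method is end-to-end differentiable and can significantly improve the efficiency and robustness of the algorithm. The method is evaluated on two public datasets, Human3.6M and Occlusion-Person, using the Mean Per Joint Position Error (MPJPE) metric, achieving 27.3 mm and 9.7 mm respectively. The experimental results show a significant performance improvement over state-of-the-art methods.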
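The final step described above, triangulation driven by per-joint confidence weights, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes known 3x4 projection matrices, treats one joint at a time, and solves an inhomogeneous weighted least-squares problem through the normal equations, whereas a learned pipeline would typically use a differentiable solver. The function names `triangulate_weighted`, `project`, and `solve3` are hypothetical.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration: not enough weighted views")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x


def project(P, X):
    """Project a 3D point X with a 3x4 matrix P to image coordinates (u, v)."""
    Xh = list(X) + [1.0]
    x, y, z = (sum(p * c for p, c in zip(row, Xh)) for row in P)
    return (x / z, y / z)


def triangulate_weighted(views):
    """Estimate one 3D joint from (P, (u, v), w) tuples, one per camera view.

    Each view contributes two linear constraints, (u*P[2] - P[0]) . [X, 1] = 0
    and (v*P[2] - P[1]) . [X, 1] = 0, each scaled by the view's confidence w
    (assumption: w is a non-negative scalar, e.g. low for occluded joints).
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for P, (u, v), w in views:
        for coord, r in ((u, 0), (v, 1)):
            row = [w * (coord * P[2][k] - P[r][k]) for k in range(4)]
            a, b = row[:3], -row[3]
            for i in range(3):
                Atb[i] += a[i] * b
                for j in range(3):
                    AtA[i][j] += a[i] * a[j]
    return solve3(AtA, Atb)


# Toy example: two cameras one unit apart along x, plus a wrong detection
# from an "occluded" view whose confidence weight is zero.
P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
joint = [0.5, 0.2, 4.0]
views = [(P1, project(P1, joint), 1.0),
         (P2, project(P2, joint), 0.5),
         (P1, (9.0, 9.0), 0.0)]
print(triangulate_weighted(views))
```

In this sketch a view with weight zero contributes nothing to the solution, which is the mechanism by which low-confidence detections from occluded views are suppressed.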

Abstract: As an important task in artificial intelligence and computer vision, 3D human pose estimation has received widespread attention and has been applied in fields such as human-computer interaction and film and game production. However, 3D human pose estimation still faces significant challenges, mainly human occlusion and view redundancy in datasets, which seriously limit the accuracy and speed of 3D human pose estimation. This paper proposes a multi-feature extraction method for 3D human pose estimation. First, image data is collected from multiple camera views and fed into a 2D human pose estimation network to obtain the 2D joints. Then, the collected human data is input into a joint confidence calculation network to obtain the weight of each joint in each view image. Subsequently, the 2D human joint heatmaps are passed through a heatmap weight calculation network to compute the heatmap weights, and the weighted 2D human joint heatmaps are obtained by fusing the weight features from the different views. Finally, the weighted 2D human joint heatmaps and the weights of each joint in each view image are input into the triangulation algorithm to obtain the 3D human joints in space. The key idea of this paper is to design a joint confidence calculation network that learns the confidence weight of each joint from the input image, and to extract a confidence matrix that reflects the quality of the heatmap, improving the quality of the heatmap features in occluded views. In addition, a perceptual hash algorithm is used to perform a view-removal experiment on the Occlusion-Person dataset, which improves model inference speed while preserving the accuracy of the results. The method is end-to-end differentiable and can significantly improve the efficiency and robustness of the algorithm. This paper evaluates the method using the Mean Per Joint Position Error (MPJPE) metric on two public datasets, Human3.6M and Occlusion-Person, achieving results of 27.3 mm and 9.7 mm, respectively. Experimental results show that the algorithm significantly improves performance compared with state-of-the-art methods.
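For readers unfamiliar with the evaluation metric, MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions, reported in the units of the input (millimetres in the abstract). The following is a minimal sketch for a single pose; the function name `mpjpe` and the optional `root` argument (which aligns both skeletons at a chosen root joint before measuring) are illustrative assumptions, and the exact evaluation protocol used in the paper is not stated in the abstract.

```python
import math


def mpjpe(pred, gt, root=None):
    """Mean per joint position error for one pose.

    pred, gt: sequences of (x, y, z) joint positions in the same order and units.
    root: optional joint index; if given, both poses are translated so that
          this joint sits at the origin before the error is measured.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    if root is not None:
        pred = [tuple(a - b for a, b in zip(p, pred[root])) for p in pred]
        gt = [tuple(a - b for a, b in zip(g, gt[root])) for g in gt]
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy example with two joints: the first matches exactly, the second is off
# by a 3-4-5 triangle, so the per-joint errors are 0 and 5.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, gt))
```

A dataset-level score would average this value over all evaluated poses.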

3D human pose estimation based on multi-feature extraction