STELA: Spatial-temporal enhanced learning with an anatomical graph transformer for 3D human pose estimation

Abstract

Transformers have markedly improved 3D human pose estimation by capturing global spatial and temporal dependencies between joints. To exploit human body topology, several methods incorporate graph representations into the transformer architecture. However, these methods neglect the spatial-temporal anatomical knowledge inherent in the human body and do not consider implicit relationships between non-connected joints. They also concentrate on the trajectories of individual joints and disregard the movement patterns between joint trajectories. In this paper, we propose Spatial-Temporal Enhanced Learning with an Anatomical graph transformer (STELA), which aggregates both the global spatial-temporal relationships and the intricate anatomical relationships between joints. STELA consists of a Global Self-attention (GS) branch and an Anatomical Graph-attention (AG) branch. GS learns long-range dependencies between all joints across all frames. AG models the spatial-temporal anatomical relationships of the human body using skeleton and motion pattern graphs. Extensive experiments show that STELA outperforms state-of-the-art approaches while using 41% fewer parameters on average, reducing MPJPE by an average of 2.7 mm on Human3.6M and 1.5 mm on MPI-INF-3DHP.
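
The reported gains are stated in MPJPE (mean per-joint position error), which is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes poses are given as nested lists shaped [frames][joints][3] in millimetres and that any root alignment has already been applied; the function name `mpjpe` and the toy values are illustrative only.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error over all frames and joints.

    pred, target: nested lists shaped [frames][joints][3], in the same
    units (e.g. mm). Assumes both are already in the same coordinate frame.
    """
    if len(pred) != len(target):
        raise ValueError("frame counts differ")
    total = 0.0
    count = 0
    for frame_p, frame_t in zip(pred, target):
        if len(frame_p) != len(frame_t):
            raise ValueError("joint counts differ")
        for joint_p, joint_t in zip(frame_p, frame_t):
            # Euclidean distance between one predicted and one true joint.
            total += math.dist(joint_p, joint_t)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy example: one frame, two joints (hypothetical values in mm).
pred = [[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]]
target = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
print(mpjpe(pred, target))  # 2.5
```

In the toy example the per-joint errors are 0 mm and 5 mm, so the mean is 2.5 mm. A reduction of 2.7 mm in this metric means the predicted joints are, on average, that much closer to the ground truth.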

Keywords

3D human pose estimation; Graph transformer; Spatial-temporal learning; Anatomical relationship
Title
STELA: Spatial-temporal enhanced learning with an anatomical graph transformer for 3D human pose estimation
Authors
Son, Jian; Lee, Jiho; Kim, Eunwoo
DOI
10.1016/j.cviu.2025.104381
Publication date
2025-06
Type
Article
Journal
Computer Vision and Image Understanding
Volume
257