ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

CVPR 2025

Yuejiao Su   Yi Wang   Qiongyang Hu   Chuang Yang   Lap-Pui Chau

The Hong Kong Polytechnic University

Abstract

Egocentric interaction perception is one of the essential branches in investigating human-environment interaction, which lays the basis for developing next-generation intelligent systems. However, existing egocentric interaction understanding methods cannot simultaneously yield coherent textual and pixel-level responses to user queries, which limits their flexibility for varying downstream application requirements. To comprehend egocentric interactions exhaustively, this paper presents a novel task named Egocentric Interaction Reasoning and pixel Grounding (Ego-IRG). Taking an egocentric image and a query as input, Ego-IRG is the first task that aims to resolve the interactions through three crucial steps: analyzing, answering, and pixel grounding, which results in fluent textual and fine-grained pixel-level responses. Another challenge is that no existing dataset meets the requirements of the Ego-IRG task. To address this limitation, this paper creates the Ego-IRGBench dataset based on extensive manual efforts, which includes over 20k egocentric images with 1.6 million queries and corresponding multimodal responses about interactions. Moreover, we design a unified ANNEXE model to generate text- and pixel-level outputs utilizing multimodal large language models, which enables a comprehensive interpretation of egocentric interactions. The experiments on Ego-IRGBench demonstrate the effectiveness of our ANNEXE model compared with other works.


Ego-IRG Task

Illustration of the Ego-IRG task. This task allows for unified analyzing, answering, and pixel grounding of interactions within egocentric images based on various user queries.

Ego-IRGBench dataset

Illustration of the Ego-IRGBench dataset. This extensive dataset is built upon a collection of 20,504 RGB-D egocentric image pairs extracted from the HOI4D dataset, covering various interactions and environments from a first-person view. Specifically, each egocentric RGB image is paired with a depth map (a.ii) and an interaction description (a), providing spatial information about the scene and outlining the specific interaction taking place. Furthermore, multiple queries are labeled for each image, allowing for various inquiries related to the interactions depicted. Each query is paired with an answer and a corresponding pixel-level grounding mask, forming systematic and complete feedback to the query. It is worth emphasizing that the queries are not limited to inferring a single interaction target (b). Multi-target (c) and no-target (d) queries are also provided in the Ego-IRGBench dataset, which demonstrates the diversity of the dataset.
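To make this annotation structure concrete, the sketch below shows one way a single annotated sample could be organized in code. The field names, class names, and file paths are our own illustrative assumptions and do not describe the official release format.

# Minimal sketch of one hypothetical Ego-IRGBench sample record.
# All field names and file paths are illustrative assumptions, not the released format.
from dataclasses import dataclass
from typing import List

@dataclass
class EgoIRGQuery:
    query: str              # free-form question about the interaction
    answer: str             # fluent textual response
    mask_paths: List[str]   # zero (no-target), one (single-target), or several (multi-target) masks

@dataclass
class EgoIRGSample:
    image_path: str                  # egocentric RGB image (extracted from HOI4D)
    depth_path: str                  # paired depth map
    interaction_description: str     # which interaction takes place in the scene
    queries: List[EgoIRGQuery]       # multiple queries per image

# Example record (paths are placeholders):
sample = EgoIRGSample(
    image_path="images/000001.jpg",
    depth_path="depth/000001.png",
    interaction_description="A hand is picking up a mug from the table.",
    queries=[
        EgoIRGQuery(
            query="Which object is being grasped?",
            answer="The right hand is grasping the mug.",
            mask_paths=["masks/000001_mug.png"],   # single-target query
        ),
        EgoIRGQuery(
            query="Is a laptop involved in the interaction?",
            answer="No, the interaction does not involve a laptop.",
            mask_paths=[],                          # no-target query
        ),
    ],
)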

ANNEXE Model

Overview of the proposed ANNEXE framework. By incorporating the mask generation module, the proposed ANNEXE model can predict precise and query-oriented pixel-level masks regarding egocentric interactions, making it readily applicable to downstream tasks with different requirements.
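As a rough schematic of the three-step analyze, answer, and pixel-ground flow described above, the sketch below wires a multimodal LLM to a mask generation head. Every interface here (MLLM.generate, MaskHead.decode, ego_irg_inference) is a placeholder assumption for illustration only, not the actual ANNEXE implementation.

# Schematic sketch of the analyze -> answer -> pixel-ground flow.
# The MLLM and MaskHead interfaces below are assumed placeholders.
from typing import Protocol, Tuple, Optional
import numpy as np

class MLLM(Protocol):
    def generate(self, image: np.ndarray, prompt: str) -> Tuple[str, np.ndarray]:
        """Return generated text and a grounding embedding (assumed interface)."""

class MaskHead(Protocol):
    def decode(self, image: np.ndarray, embedding: np.ndarray) -> Optional[np.ndarray]:
        """Return a pixel-level mask, or None for no-target queries (assumed interface)."""

def ego_irg_inference(mllm: MLLM, mask_head: MaskHead,
                      image: np.ndarray, query: str):
    # Step 1: analyzing -- summarize the interaction in the egocentric image.
    analysis, _ = mllm.generate(image, "Describe the interaction in this image.")
    # Step 2: answering -- answer the user query, conditioned on the analysis.
    answer, grounding = mllm.generate(image, f"{analysis}\n{query}")
    # Step 3: pixel grounding -- decode a fine-grained mask from the grounding embedding.
    mask = mask_head.decode(image, grounding)
    return analysis, answer, mask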

Qualitative Results

Qualitative results of our ANNEXE model (b) and Ground Truth (c).

Citation

@inproceedings{su:annexe:2025,
    title     = {ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction},
    author    = {Yuejiao Su and Yi Wang and Qiongyang Hu and Chuang Yang and Lap-Pui Chau},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year      = {2025}
}
  

Data

News: The Ego-IRGBench dataset will be released soon.

Ego-IRGBench_Egocentric_Image: egocentric RGB images.

Ego-IRGBench_depth: depth maps.

Ego-IRGBench_Query_Answer: queries and corresponding answers.

Ego-IRGBench_Mask: pixel-level grounding masks.

Baidu Cloud download link.