Jongseo Lee | Research Homepage

About Me

I recently completed my M.S. in Computer Science at Kyung Hee University and am continuing my research at the same lab as a Post-Master Researcher, advised by Prof. Jinwoo Choi. I am currently seeking PhD positions in the United States (Fall 2027).

My research centers on trustworthy multimodal video understanding across vision, language, and audio. Rather than a single technique, I'm drawn to the whole picture — how these models are trained, what they actually learn, and whether their reasoning is grounded, reliable, and interpretable. This thread runs across my work: diagnosing failure modes of Video-LLMs in DeltaDirect, concept-level explanation in DANCE (NeurIPS 2025 Spotlight), audio–visual reasoning in CA²ST (IEEE TPAMI 2026), and continual learning in ESSENTIAL (ICCV 2025 Highlight).

I am now moving toward Physical AI, studying video as the primary modality through which machines perceive, reason about, and eventually act in the physical world.

Before graduate school, I earned dual bachelor's degrees in Biomedical Engineering and Electronics Engineering. That interdisciplinary background — signal processing, embedded systems, and AI — still shapes how I approach research today.

I am always open to collaboration and discussion on computer vision, multimodal AI, and reliable video understanding. Feel free to reach out!

Education

Post-Master Researcher, Computer Science

Kyung Hee University · 2025.9 – Present

M.S. in Computer Science

Kyung Hee University · 2023.8 – 2025.8

B.S. in Electronics Engineering

Kyung Hee University · 2020 – 2023.7

B.S. in Biomedical Engineering

Kyung Hee University · 2017 – 2023.2

Selected Recognition

Spotlight · NeurIPS 2025 · DANCE · top 3.5%

Highlight · ICCV 2025 · ESSENTIAL

Spotlight · CVPR 2025 XAI4CV · PCBEAR · top 16.7%

Journal · IEEE TPAMI 2026 · CA²ST

Research Interests

Trustworthy Multimodal Video

Video-language-audio models we can rely on — with grounded reasoning, hallucination mitigation, and interpretability at the core, so an answer reflects what truly happens on screen.

How Models Learn

A broad curiosity about the learning process itself — how these models are trained, what representations they form, and why they succeed or fail — rather than any single technique.

Video for Physical AI

Extending video understanding toward Physical AI — treating video as the modality through which machines perceive, reason about, and eventually act in the physical world.

Recent News

2026.07

Seeking PhD Positions in the U.S. (Fall 2027) · Open to Opportunities

I completed my M.S. and am now a Post-Master Researcher at the Vision and Learning Lab. I am actively looking for PhD positions in the United States for Fall 2027, focusing on trustworthy multimodal video understanding, Physical AI, and reliable video-language models. If you think I could be a good fit for your group, I would love to hear from you — jong980812@khu.ac.kr.

2026.05

New Preprint on arXiv · arXiv 2026

We released “Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs”. We identify a fundamental failure mode of Video-LLMs — directional motion blindness — and introduce DeltaDirect, a parameter-efficient motion-change head that raises LLaVA-Video-7B from 27.6% to 85.4% on real-world direction QA. See arXiv (2605.22823) and the project page.

2025.11

TPAMI Paper Accepted · IEEE TPAMI

Our paper “CA²ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition” has been accepted to IEEE TPAMI, one of the most prestigious journals in computer vision and AI. This work extends our NeurIPS 2023 paper CAST with audio-visual reasoning in a unified cross-attention framework. Huge thanks to my collaborators Joohyun Chang and Jinwoo Choi. See arXiv (2503.23447).

2025.09

NeurIPS 2025 Spotlight Paper Accepted · NeurIPS 2025 · Spotlight · top 3.5%

Our paper “Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition” has been accepted to NeurIPS 2025 as a Spotlight (top 3.5%). The work disentangles motion dynamics, objects, and scenes into human-understandable concepts, pairing strong performance with clear explanations of model decisions. Grateful to my collaborators Wooil Lee, Gyeong-Moon Park, Seong Tae Kim, and Jinwoo Choi.

2025.07

ICCV 2025 Highlight Paper Accepted · ICCV 2025 · Highlight

My paper “ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning” has been accepted to ICCV 2025 as a Highlight — after three submission attempts. The journey was filled with revisions and rejections, but it proved to be an incredible learning experience. Thank you to everyone who supported me along the way.

Publications

* equal-contribution first authors · † corresponding author

International Conferences & Journals

Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

Jongseo Lee, Hyuntak Lee, Sunghun Kim, Sooa Kim, Jihoon Chung, Jinwoo Choi†

arXiv 2026 Preprint, under review

Paper · Project Page · Code

Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition

Jongseo Lee, Wooil Lee, Gyeong-Moon Park†, Seong Tae Kim†, Jinwoo Choi†

NeurIPS 2025 · Spotlight · top 3.5%

Paper · Code

ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning

Jongseo Lee*, Kyungho Bae*, Kyle Min, Gyeong-Moon Park†, Jinwoo Choi†

ICCV 2025 · Highlight

Paper · Code

PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition

Jongseo Lee, Wooil Lee, Gyeong-Moon Park†, Seong Tae Kim†, Jinwoo Choi†

CVPR 2025 · XAI4CV Workshop · Spotlight · top 16.7%

Paper

CA²ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition

Jongseo Lee*, Joohyun Chang*, Dongho Lee, Jinwoo Choi†

IEEE TPAMI 2026

Paper

PCEvE: Part Contribution Evaluation Based Model Explanation for Human Figure Drawing Assessment and Beyond

Jongseo Lee*, Geo Ahn*, Seong Tae Kim†, Jinwoo Choi†

arXiv 2024 Preprint, under review

Paper

CAST: Cross-Attention in Space and Time for Video Action Recognition

Dongho Lee*, Jongseo Lee*, Jinwoo Choi†

NeurIPS 2023

Paper · Code

Metaverse Interface with Haptic and Rigid Sense Feedback at a Low Cost

Jongseo Lee, Su Hyeon Kim, Sun Woong Jang, Jun Yeong Moon, Doug Young Suh†

Journal of Appropriate Technology 8(2), 2022

DOI

Domestic Conferences

Video Concept Bottleneck Model

Jongseo Lee, Soohyun Park, Jinwoo Choi†

KIISE 2024

Paper

Efficient Video Class Incremental Learning via Class Token Re-Learning

Jongseo Lee, Soohyun Park, Jinwoo Choi†

KIISE 2024

Paper

Audio-Video Cross Attention for Effective Video Action Recognition

Jongseo Lee, Joohyun Chang, Jinwoo Choi†

KIISE 2024

Paper