We address image-to-video generation with explicit user control over the final frame’s disoccluded regions. Current image-to-video pipelines produce plausible motion but struggle to generate predictable, articulated motions while enforcing user-specified content in newly revealed areas. Our key idea is to separate motion specification from appearance synthesis: we introduce a lightweight, user-editable Proxy Dynamic Graph (PDG) that deterministically yet approximately drives part motion, while a frozen diffusion prior synthesizes plausible appearance that follows that motion. In our training-free pipeline, the user loosely annotates and reposes a PDG, from which we compute a dense motion flow that lets us use diffusion as a motion-guided shader. The user then edits the appearance of the disoccluded areas of the image, and we exploit the visibility information encoded by the PDG to perform a latent-space composite that reconciles motion with user intent in these areas. This design yields controllable articulation and user control over disocclusions without finetuning. We demonstrate clear advantages over state-of-the-art alternatives in turning images into short videos of articulated objects, furniture, vehicles, and deformables. Our method mixes generative control, in the form of loose pose and structure, with predictable control, in the form of appearance specification in the disoccluded regions of the final frame, unlocking a new image-to-video workflow.
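To make the PDG abstraction concrete, below is a minimal Python sketch of how such a graph could be represented: nodes hold a part's lifted point cloud plus per-frame rigid transforms, and edges are parent links along which transforms compose. The class and field names are illustrative assumptions on our part, not the actual interface of the method.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class PDGNode:
    """One node of a hypothetical Proxy Dynamic Graph: an object part lifted to 3D."""
    name: str
    points: np.ndarray                  # (N, 3) part point cloud lifted from depth
    parent: "PDGNode | None" = None     # kinematic parent (None for the root)
    # per-frame 4x4 rigid transforms, expressed relative to the parent
    local_transforms: list = field(default_factory=list)

    def world_transform(self, t: int) -> np.ndarray:
        """Compose transforms up the parent chain for frame t."""
        local = self.local_transforms[t]
        if self.parent is None:
            return local
        return self.parent.world_transform(t) @ local

    def points_at(self, t: int) -> np.ndarray:
        """Return the part point cloud posed at frame t."""
        T = self.world_transform(t)
        homo = np.hstack([self.points, np.ones((len(self.points), 1))])
        return (homo @ T.T)[:, :3]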
Motivation: forward vs. backward inconsistency in disocclusions. (a) A forward pass with DaS [1] exposes a large disoccluded region (white mask); (b) a backward pass from the last frame produces a plausible but different reveal; (c) the difference map highlights misalignments concentrated on newly visible areas (rear of the bus, background), showing that forward and backward visibility disagree. Hence, (d) naive pixel copy-paste between the forward and backward passes creates seams and ghosting (yellow circles) due to parallax, shading, and occlusion-order mismatches. This motivates us to solve the disocclusion problem.
From an input image, we build a Proxy Dynamic Graph (PDG) and obtain coarse part tracks (marked with red arrows) and a disocclusion mask. Pass I (top): we run DaS (Diffusion as Shader) to generate a motion-aware video driven by the PDG. The user then edits the final frame to prescribe the desired reveal in the disoccluded regions (top-right). Pass II (bottom): without any retraining, we surgically replace the corresponding feature channels with the edited final-frame features and rerun DaS, yielding a video that preserves the PDG-driven motion while matching the user-specified reveal (bottom-right).
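As a rough illustration of the two-pass splice described above, the sketch below treats the motion-guided generator and the frame encoder as black-box callables; run_das, encode_frame, and the assumption that the final frame occupies the last slot of the latent tensor are hypothetical placeholders rather than the real DaS API.

import torch

def two_pass_disocclusion_edit(run_das, encode_frame,
                               init_latents: torch.Tensor,
                               pdg_flow: torch.Tensor,
                               edited_last_frame: torch.Tensor) -> torch.Tensor:
    """Pass I: motion-guided generation. Pass II: rerun with the user-edited
    final frame spliced into the corresponding latent slot (training-free)."""
    # Pass I: generate a motion-aware video driven by the PDG-derived flow.
    video_latents = run_das(init_latents, pdg_flow)

    # Encode the user-edited final frame into the same latent space.
    edited_latent = encode_frame(edited_last_frame)

    # Surgically replace the last-frame latent with the edited content
    # (assumes latents are laid out as [batch, frames, ...]).
    spliced = video_latents.clone()
    spliced[:, -1] = edited_latent

    # Pass II: rerun the frozen generator so motion and appearance reconcile.
    return run_das(spliced, pdg_flow)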
Examples of PDG-based object part motion editing. We first estimate the depth map, the camera parameters, and the object or object-part segments. This allows us to lift the depth map into a per-part point cloud, and these point clouds form the nodes of the PDG. Based on the relationships and motion parameters of each node, we transform the 3D point clouds across all frames to render a coarse, warped version of the intended object animation (left, Rendered). Our method then generates a high-quality, dynamic video that accurately reflects the user’s intent (right, Ours).
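A hedged numpy sketch of this lift-and-warp step: back-project one part's masked depth pixels with the camera intrinsics, apply a per-frame rigid transform, and re-project to obtain the warped pixel positions (their displacement from the source pixels gives a dense flow). The intrinsics layout and function names are assumptions for illustration, not the paper's exact formulation.

import numpy as np

def backproject_part(depth: np.ndarray, mask: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift the masked depth pixels of one part into a camera-space point cloud."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)          # (N, 3)

def warp_and_project(points: np.ndarray, T: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Apply a per-frame rigid transform T (4x4) and project back to pixels.
    Subtracting the source pixel coordinates from the result yields a dense flow."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    moved = (homo @ T.T)[:, :3]
    uv = moved @ K.T
    return uv[:, :2] / uv[:, 2:3]                # (N, 2) warped pixel coordinates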
Examples of PDG-based object part motion and disocclusion editing. The motion of the objects/parts inevitably reveals disoccluded regions; we reinstate user control over these regions (right, Ours).
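One plausible way to derive such a disocclusion mask from the PDG warp, sketched under our own assumptions rather than taken from the paper: splat the warped part pixels into the final frame and mark source pixels that the parts vacate and nothing re-covers.

import numpy as np

def disocclusion_mask(src_mask: np.ndarray, warped_uv: np.ndarray,
                      height: int, width: int) -> np.ndarray:
    """Pixels covered by the moving parts in the source frame but left
    uncovered after warping are marked as disoccluded (a coarse approximation)."""
    covered = np.zeros((height, width), dtype=bool)
    uv = np.round(warped_uv).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    covered[uv[inside, 1], uv[inside, 0]] = True
    return src_mask & ~covered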
Comparison of our method with the state-of-the-art motion control methods Veo 3.1 [2], DragAnything [3], and Puppet-Master [4] on user-specified object part editing tasks.
Comparison of our method on disocclusion and part motion editing. Since no prior work addresses our novel motion- and disocclusion-aware video generation task, we constructed four baseline methods for comparison (see our paper for more details).
@misc{Qi2025beyondvisible,
title={Beyond the Visible: Disocclusion-Aware Editing via Proxy Dynamic Graphs},
author={Anran Qi and Changjian Li and Adrien Bousseau and Niloy J. Mitra},
year={2025},
eprint={2512.13392},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={http://arxiv.org/abs/2512.13392},
}