We address image-to-video generation with explicit user control over the final frame’s disoccluded regions. Current image-to-video pipelines produce plausible motion but struggle to generate predictable, articulated motions while enforcing user-specified content in newly revealed areas. Our key idea is to separate motion specification from appearance synthesis: we introduce a lightweight, user-editable Proxy Dynamic Graph (PDG) that deterministically yet approximately drives part motion, while a frozen diffusion prior synthesizes plausible appearance that follows that motion. In our training-free pipeline, the user loosely annotates and reposes a PDG, from which we compute a dense motion flow that lets us use diffusion as a motion-guided shader. The user then edits the appearance of the disoccluded areas of the image, and we exploit the visibility information encoded by the PDG to perform a latent-space composite that reconciles motion with user intent in these areas. This design yields controllable articulation and user control over disocclusions without finetuning. We demonstrate clear advantages over state-of-the-art alternatives in turning images into short videos of articulated objects, furniture, vehicles, and deformables. Our method mixes generative control, in the form of loose pose and structure, with predictable control, in the form of appearance specification in the disoccluded regions of the final frame, unlocking a new image-to-video workflow.
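To make the PDG abstraction concrete, below is a minimal Python sketch of how such a graph could be represented: nodes hold a part's lifted point cloud plus per-frame rigid transforms, and edges are parent links along which transforms compose. The class and field names are illustrative assumptions on our part, not the actual interface of the method.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class PDGNode:
    """One node of a hypothetical Proxy Dynamic Graph: an object part lifted to 3D."""
    name: str
    points: np.ndarray                  # (N, 3) part point cloud lifted from depth
    parent: "PDGNode | None" = None     # kinematic parent (None for the root)
    # per-frame 4x4 rigid transforms, expressed relative to the parent
    local_transforms: list = field(default_factory=list)

    def world_transform(self, t: int) -> np.ndarray:
        """Compose transforms up the parent chain for frame t."""
        local = self.local_transforms[t]
        if self.parent is None:
            return local
        return self.parent.world_transform(t) @ local

    def points_at(self, t: int) -> np.ndarray:
        """Return the part point cloud posed at frame t."""
        T = self.world_transform(t)
        homo = np.hstack([self.points, np.ones((len(self.points), 1))])
        return (homo @ T.T)[:, :3]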
Motivation: forward vs. backward inconsistency in disocclusions. (a) A forward pass with DaS [1] exposes a large disoccluded region (white mask); (b) a backward pass from the last frame produces a plausible but different reveal; (c) the difference map highlights misalignments concentrated on newly visible areas (rear of the bus, background), showing that forward and backward visibility disagree. Hence, (d) naive pixel copy-paste between the forward and backward passes creates seams and ghosting (yellow circles) due to parallax, shading, and occlusion-order mismatches. This motivates us to solve the disocclusion problem.
From an input image, we build a Proxy Dynamic Graph (PDG) and obtain coarse part tracks (marked with red arrows) and a disocclusion mask. Pass I (top): we run DaS (Diffusion as Shader) to generate a motion-aware video driven by the PDG. The user then edits the final frame to prescribe the desired reveal in the disoccluded regions (top-right). Pass II (bottom): without any retraining, we surgically replace the corresponding feature channels with the edited final-frame features and rerun DaS, yielding a video that preserves the PDG-driven motion while matching the user-specified reveal (bottom-right).
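As a rough illustration of the two-pass splice described above, the sketch below treats the motion-guided generator and the frame encoder as black-box callables; run_das, encode_frame, and the assumption that the final frame occupies the last slot of the latent tensor are hypothetical placeholders rather than the real DaS API.

import torch

def two_pass_disocclusion_edit(run_das, encode_frame,
                               init_latents: torch.Tensor,
                               pdg_flow: torch.Tensor,
                               edited_last_frame: torch.Tensor) -> torch.Tensor:
    """Pass I: motion-guided generation. Pass II: rerun with the user-edited
    final frame spliced into the corresponding latent slot (training-free)."""
    # Pass I: generate a motion-aware video driven by the PDG-derived flow.
    video_latents = run_das(init_latents, pdg_flow)

    # Encode the user-edited final frame into the same latent space.
    edited_latent = encode_frame(edited_last_frame)

    # Surgically replace the last-frame latent with the edited content
    # (assumes latents are laid out as [batch, frames, ...]).
    spliced = video_latents.clone()
    spliced[:, -1] = edited_latent

    # Pass II: rerun the frozen generator so motion and appearance reconcile.
    return run_das(spliced, pdg_flow)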
Examples of PDG-based object part motion editing. We first estimate the depth map, the camera parameters, and the object or object-part segments. This allows us to lift the depth map into a per-part point cloud, and these point clouds form the nodes of the PDG. Based on the relationships and motion parameters of each node, we transform the 3D point clouds across all frames to render a coarse, warped version of the intended object animation (left, Rendered). Our method then generates a high-quality, dynamic video that accurately reflects the user’s intent (right, Ours).
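A hedged numpy sketch of this lift-and-warp step: back-project one part's masked depth pixels with the camera intrinsics, apply a per-frame rigid transform, and re-project to obtain the warped pixel positions (their displacement from the source pixels gives a dense flow). The intrinsics layout and function names are assumptions for illustration, not the paper's exact formulation.

import numpy as np

def backproject_part(depth: np.ndarray, mask: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift the masked depth pixels of one part into a camera-space point cloud."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)          # (N, 3)

def warp_and_project(points: np.ndarray, T: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Apply a per-frame rigid transform T (4x4) and project back to pixels.
    Subtracting the source pixel coordinates from the result yields a dense flow."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    moved = (homo @ T.T)[:, :3]
    uv = moved @ K.T
    return uv[:, :2] / uv[:, 2:3]                # (N, 2) warped pixel coordinates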
Examples of PDG-based object part motion and disocclusion editing. The motion of the objects/parts inevitably reveals disoccluded regions; we reinstate user control over these regions (right, Ours).
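One plausible way to derive such a disocclusion mask from the PDG warp, sketched under our own assumptions rather than taken from the paper: splat the warped part pixels into the final frame and mark source pixels that the parts vacate and nothing re-covers.

import numpy as np

def disocclusion_mask(src_mask: np.ndarray, warped_uv: np.ndarray,
                      height: int, width: int) -> np.ndarray:
    """Pixels covered by the moving parts in the source frame but left
    uncovered after warping are marked as disoccluded (a coarse approximation)."""
    covered = np.zeros((height, width), dtype=bool)
    uv = np.round(warped_uv).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    covered[uv[inside, 1], uv[inside, 0]] = True
    return src_mask & ~covered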
Comparison of our method with the state-of-the-art motion control methods Veo 3.1 [2], DragAnything [3], and Puppet-Master [4] on user-specified object part editing tasks.
Comparison of our method on disocclusion and part motion editing. Since no prior work addresses our novel motion- and disocclusion-aware video generation task, we constructed four baseline methods for comparison (see our paper for more details).
@misc{Qi2025beyondvisible,
title={Beyond the Visible: Disocclusion-Aware Editing via Proxy Dynamic Graphs},
author={Anran Qi and Changjian Li and Adrien Bousseau and Niloy J. Mitra},
year={2025},
eprint={2512.13392},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={http://arxiv.org/abs/2512.13392},
}