VideoCraft: A Mixed Reality-Empowered Video Generation Workflow with Spatial Layer Editing for Concept Video Creation
Abstract
Concept videos for physical spaces are powerful tools for creators to explore and present spatial design ideas by integrating digital elements into real-world footage. While current video-to-video (V2V) generation models have eased the traditionally labor-intensive creation process, they lack support for seamlessly inserting new objects into the original space and for making precise spatial adjustments. To address these challenges, we propose VideoCraft, a novel mixed reality (MR)-empowered video generation workflow for concept video creation. Through a formative study, we identify key limitations of simply combining MR with V2V models, particularly in localized editing of style and geometry. We therefore introduce a spatial layer editing mechanism into the workflow, enabling intuitive spatial manipulation through layer shaping, features, and states. We evaluate VideoCraft through a controlled user study and expert interviews, demonstrating its effectiveness in enhancing spatial precision and creative control.