VideoCraft: A Mixed Reality-Empowered Video Generation Workflow with Spatial Layer Editing for Concept Video Creation

Boyu Li¹, Linping Yuan², Zeyu Wang¹,²

The Hong Kong University of Science and Technology (Guangzhou)¹, The Hong Kong University of Science and Technology²

Abstract

Concept videos for physical spaces are powerful tools for creators to explore and present spatial design ideas by integrating digital elements into real-world footage. While current video-to-video (V2V) generation models have eased the traditionally labor-intensive creation process, they lack support for seamlessly inserting new objects into original spaces and for making precise spatial adjustments. To address these challenges, we propose VideoCraft, a novel mixed reality (MR)-empowered video generation workflow for concept video creation. Through a formative study, we identify key limitations of simply integrating MR and V2V models, particularly around localized editing of style and geometry. We therefore introduce a spatial layer editing mechanism into the workflow, enabling intuitive spatial manipulation through layer shaping, features, and states. We evaluate VideoCraft through a controlled user study and expert interviews, demonstrating its effectiveness in enhancing spatial precision and creative control.
