MagicScroll: Enhancing Immersive Storytelling with Controllable Scroll Image Generation

The Hong Kong University of Science and Technology (Guangzhou)¹, Xiamen University Malaysia², The Hong Kong University of Science and Technology³

Abstract

Scroll images are a unique medium commonly used in virtual reality (VR) to provide an immersive visual storytelling experience. Despite rapid advances in diffusion-based image generation, generating scroll images suitable for immersive, coherent, and controllable storytelling in VR remains an open research question. This paper proposes a multi-layered, diffusion-based scroll image generation framework with a novel semantic-aware denoising process. We incorporate layout prediction and style control modules to generate coherent scroll images of any aspect ratio. Building on this framework, we use different multi-window strategies to render diverse visual forms such as chains, rings, and forks for VR storytelling. Quantitative and qualitative evaluations demonstrate that our techniques can significantly enhance text-image consistency and visual coherence in scroll image generation, as well as the level of immersion and engagement in VR storytelling. We will release our source code to facilitate collaboration on immersive storytelling between AI researchers and creative practitioners.
