HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

¹The Hong Kong University of Science and Technology (Guangzhou), ²Tencent AI Lab, ³South China University of Technology, ⁴The Hong Kong University of Science and Technology

Abstract

Current text-to-avatar methods often rely on implicit representations, yielding 3D content that artists cannot easily edit or animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, which leverages locally learnable mesh deformation and 2D diffusion priors to produce high-quality digital assets for attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which better expresses identity and geometric details. We also employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated head avatars across multiple views without relying on any specific shape prior. Our framework not only generates realistic shapes and textures that can be further edited via text, but also supports seamless manipulation through attributes preserved from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework generates diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in 3D graphics software, facilitating downstream applications such as more efficient asset creation and animation with preserved attributes.
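To make the deformation scheme concrete, below is a minimal PyTorch sketch of how a learnable per-face vector field could modulate per-face Jacobians with anisotropic scaling while leaving their rotational component untouched. All names here (modulate_jacobians, log_scale) are hypothetical illustrations under stated assumptions, not the paper's actual implementation.

```python
import torch

def modulate_jacobians(J, log_scale):
    """Anisotropically scale per-face Jacobians without changing their rotation.

    J:         (F, 3, 3) per-face Jacobians of the learned deformation.
    log_scale: (F, 3) learnable per-face vector field; exp() yields strictly
               positive anisotropic stretch factors.
    """
    # Polar-style decomposition via SVD: J = U diag(sigma) Vh, with
    # rotation R = U @ Vh and symmetric stretch S = Vh^T diag(sigma) Vh.
    U, sigma, Vh = torch.linalg.svd(J)
    # Scale only the singular values: the rotational part R is unchanged.
    # (Reflections, i.e. det(J) < 0, are not handled in this minimal sketch.)
    sigma = sigma * torch.exp(log_scale)
    return U @ torch.diag_embed(sigma) @ Vh


# Hypothetical usage: start from identity Jacobians (no deformation) and
# optimize both fields against a diffusion-guided loss (not shown here).
num_faces = 1000  # e.g., face count of the template mesh
J = torch.eye(3).expand(num_faces, 3, 3).clone().requires_grad_()
log_scale = torch.zeros(num_faces, 3, requires_grad=True)
J_modulated = modulate_jacobians(J, log_scale)  # (num_faces, 3, 3)
```

In practice, vertex positions would then be recovered from the modulated Jacobians, for example by solving a Poisson system as in Jacobian-field-based deformation methods; that reconstruction step is omitted from this sketch.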
