Night at the Museum

A Scalable Framework for Text-Driven Mesh Motion Generation

Chenyang Xu, Haoran Li, Guangzhao He, Zeyu Jiang, Shichen Zhang,
Juexiao Zhang, Sihang Li, Chen Feng, Jing Zhang
(tentative)
2026 · Under Review


Abstract

We introduce AniMuse, a two-stage framework for text-driven animal mesh animation directly from raw meshes, without predefined skeletons, joint names, or manual rigging. At its core are Semantic Gaussian Bones (SGBs), a compact skeleton-free deformation representation decoded from a globally shared learnable query book and trained through explicit linear blend skinning with topology-aware mask-gated weights. The shared query book yields stable cross-instance bone slots, providing a mesh-native control space for text-conditioned generation and SGB-slot motion inpainting. A DiT-based diffusion model generates per-bone SE(3) trajectories from text and geometric bone latents, while allowing users to clamp selected SGB slots and inpaint the remaining full-body motion. On DeformingThings4D, our rig reduces bidirectional CD-L1 by 39% over the best neural skeleton baseline, and a forward-only variant achieves the lowest CD-L2 overall. On the out-of-domain AnimalML3D benchmark, AniMuse improves overall motion quality over skeleton-based and vertex-based baselines.
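The explicit linear blend skinning step mentioned above can be illustrated with a minimal NumPy sketch. It is an assumption-laden toy, not the AniMuse implementation: the function names `gaussian_skin_weights` and `linear_blend_skinning`, the isotropic Gaussian parameterization, and the plain normalization are hypothetical stand-ins, and the topology-aware mask gating described in the abstract is omitted. Each bone carries a center, a scale, and a rigid transform in SE(3) (rotation plus translation); each vertex is moved by the weighted sum of the per-bone transforms.

```python
import numpy as np


def gaussian_skin_weights(vertices, centers, sigmas):
    """Soft skinning weights from isotropic Gaussians (illustrative assumption).

    vertices: (V, 3) mesh vertices
    centers:  (B, 3) hypothetical bone centers
    sigmas:   (B,)   hypothetical per-bone scales
    returns:  (V, B) weights, each row summing to 1
    """
    # Squared distance from every vertex to every bone center: (V, B)
    d2 = ((vertices[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    logits = -0.5 * d2 / (sigmas[None, :] ** 2)
    # Subtract the row maximum before exponentiating for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def linear_blend_skinning(vertices, weights, rotations, translations):
    """Standard LBS: v' = sum_b w[v, b] * (R_b @ v + t_b).

    vertices:     (V, 3)
    weights:      (V, B), rows sum to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3)    per-bone translations
    returns:      (V, 3) deformed vertices
    """
    # Apply every bone transform to every vertex: (B, V, 3)
    per_bone = np.einsum("bij,vj->bvi", rotations, vertices)
    per_bone = per_bone + translations[:, None, :]
    # Blend the per-bone results with the skinning weights: (V, 3)
    return np.einsum("vb,bvi->vi", weights, per_bone)
```

In this sketch, a motion sequence would be one `(rotations, translations)` pair per frame. Clamping a bone slot then amounts to fixing that bone's transforms across frames while a generative model fills in the remaining bones; that step is not shown here.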

Acknowledgements

Demo data is extracted from Planet Zoo using code provided by Animo. Visual design references: Ruinart Unconventional Gallery, UNESCO Virtual Museum, Microsoft × NHM Visions of Nature.
