Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

Music-conditioned 3D dance generation can facilitate real-world applications such as assisting human performers. However, this task faces several challenges. First, not all physically viable 3D human poses are applicable to dance. Also, the dance sequence generated should be in harmony with the rhythm of the music.


A paper recently published on arXiv.org proposes a new dance generation framework. It has two main components, targeting the spatial and temporal challenges respectively. First, a finite dictionary of quantized dance units is created.
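To make the idea of a finite dictionary of dance units more concrete, here is a minimal sketch of nearest-neighbor quantization: each pose feature is replaced by the index of its closest dictionary entry. The function names, the 2D toy features, and the hand-written codebook are all assumptions for illustration; in the paper the dictionary is learned from 3D pose sequences, and this sketch does not reproduce that training.

```python
import math

def quantize(pose_features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    codes = []
    for feature in pose_features:
        # Euclidean distance from this feature to every codebook entry.
        distances = [math.dist(feature, entry) for entry in codebook]
        codes.append(distances.index(min(distances)))
    return codes

def decode(codes, codebook):
    """Turn a sequence of code indices back into codebook entries."""
    return [codebook[i] for i in codes]

# Hypothetical toy data: three "dance units" in a 2D feature space.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

codes = quantize(features, codebook)
reconstruction = decode(codes, codebook)
```

Because every generated code decodes to an entry of the dictionary, the output can only be assembled from units the dictionary already contains, which is the intuition behind keeping generated poses within the space of valid dance movements.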

In addition, to generate a temporally harmonious dance sequence, a Generative Pre-trained Transformer (GPT)-like network is introduced to translate the music and source pose codes into target future pose codes.

Experiments show that the proposed approach outperforms the current state of the art on both automatic metrics and visual judgments.

Driving 3D characters to dance following a piece of music is highly challenging due to the spatial constraints imposed on poses by choreography norms. In addition, the generated dance sequence also needs to maintain temporal coherence with different music genres. To address these challenges, we propose Bailando, a novel music-to-dance framework with two powerful components: 1) a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences into a quantized codebook, and 2) an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music. With the learned choreographic memory, dance generation is realized on quantized units that meet high choreography standards, such that the generated dance sequences are confined within the spatial constraints. To achieve synchronized alignment between diverse motion tempos and music beats, we introduce an actor-critic-based reinforcement learning scheme for the GPT with a newly designed beat-align reward function. Extensive experiments on the standard benchmark show that our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively. Notably, the learned choreographic memory is shown to discover human-interpretable dancing-style poses in an unsupervised manner.
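The abstract does not spell out the beat-align reward, so the following is only a rough sketch of how alignment between motion beats and music beats could be scored. It assumes beats are given as timestamps in seconds and uses a Gaussian kernel on the gap to the nearest motion beat; the function name, the `sigma` parameter, and the kernel itself are illustrative assumptions, not the reward function defined in the paper.

```python
import math

def beat_align_score(music_beats, motion_beats, sigma=0.1):
    """Average closeness of each music beat to its nearest motion beat.

    Returns a value in [0, 1]; 1.0 means every music beat coincides
    with a motion beat. sigma (seconds) is an assumed tolerance.
    """
    if not music_beats or not motion_beats:
        return 0.0
    total = 0.0
    for music_time in music_beats:
        # Gap to the closest motion beat for this music beat.
        gap = min(abs(music_time - motion_time) for motion_time in motion_beats)
        total += math.exp(-(gap ** 2) / (2 * sigma ** 2))
    return total / len(music_beats)

# Hypothetical beat times in seconds.
music = [0.5, 1.0, 1.5, 2.0]
on_beat = beat_align_score(music, [0.5, 1.0, 1.5, 2.0])
off_beat = beat_align_score(music, [0.75, 1.25, 1.75, 2.25])
```

A score of this kind rises as motion beats land closer to music beats, so it could serve as a signal for a reinforcement learning scheme to reward well-synchronized sequences.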

Link: https://arxiv.org/abs/2203.13055

