Audio2Gesture#

Overview#

Audio2Gesture (A2G) is a neural network trained to generate body motion derived entirely from an audio source. It offers several animation styles and can animate either the full body or the upper body only. Connect your character with the automatic Retargeting tool. A2G provides a high-quality, efficient way to generate body motion for characters in dialogue-heavy scenarios.

Getting Started in Audio2Gesture#

When the Audio2Gesture extension loads, it presents the following pipeline options.

Option	Effect
A2G Offline Pipeline	Loads the “Regular” Audio Player, which uses audio wave files to generate animation clips.
A2G Streaming Pipeline	Loads the “Streaming” Audio Player and enables a runtime workflow for streaming TTS audio.
Base Skeleton	Loads the main skeleton that A2G manipulates to drive performances. The Base Skeleton loads by default when you build a new pipeline. It is provided here as a retargeting reference in case you encounter problems setting up your character retarget.

Audio Players#

See Audio2Face documentation links below.

A2G Offline Pipeline#

Target Skeleton#

This field displays any valid SkelRoot found in the current stage.

Skeleton Connected#

The green check mark means the selected skeleton has retargeting set up and is connected.

Skeleton Not Connected#

Indicates that the currently assigned skeleton is not ready for retargeting. Clicking the icon exposes the Run AutoRetarget command.

Auto Retarget#

On success, a green check mark appears. On failure, you are prompted to open the Retargeting window. For a comprehensive look at the Retargeting tool, refer to the Animation Retargeting documentation.

Open Retargeting Window#

Opens the Retargeting tool for more comprehensive character setup. For more details, refer to the Animation Retargeting documentation.

Run A2G#

Runs an optimization algorithm to find the best-suited animations for the current audio source and parameters, then puts A2G in a run state, ready to receive audio. A progress bar is shown during the process.

Note

Every time you change the parameters for Audio2Gesture, you must click “Run A2G” again so the neural network can process the new parameters.
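If you drive this workflow from your own tooling, the rule in the note can be tracked with a simple "stale settings" check. The sketch below is illustrative only: `GestureSettings` and `RunTracker` are hypothetical names, not part of the Audio2Gesture extension, and the mode label and numeric values are placeholders rather than documented defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GestureSettings:
    """Hypothetical snapshot of the parameters chosen in the A2G panel."""
    style: str
    animation_mode: str
    num_epochs: int
    num_samples: int


class RunTracker:
    """Remembers the settings used for the last run and flags stale results."""

    def __init__(self) -> None:
        self._last_run: Optional[GestureSettings] = None

    def mark_run(self, settings: GestureSettings) -> None:
        # Call this right after a run completes.
        self._last_run = settings

    def needs_rerun(self, settings: GestureSettings) -> bool:
        # True if nothing has run yet or any parameter differs from the last run.
        return settings != self._last_run


tracker = RunTracker()
# Placeholder values, chosen only for the example.
current = GestureSettings("Neutral", "Full Body", num_epochs=10, num_samples=4)
tracker.mark_run(current)

changed = GestureSettings("Big Gestures", "Full Body", num_epochs=10, num_samples=4)
print(tracker.needs_rerun(current))  # False: nothing changed since the last run
print(tracker.needs_rerun(changed))  # True: the style changed
```

Comparing whole snapshots keeps the check independent of which individual parameter was edited.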

Style#

Provides a range of animation styles to suit different spoken-word scenarios:

Neutral (default)
Big Gestures
Calm Speech
Public Speech
Public Speech - casual
Public Speech - behind a table

Animation Mode#

A post-processing feature that lets you choose between a full-body performance and an upper-body-only performance.

Animation Option#

A2G presents several motion options that suit the processed audio file and defaults to the best (“top”) option. You can switch between the other options to explore alternative character performances.

Advanced Settings#

After changing any of these settings, you must run A2G again.

Option	Effect
Num Epochs	A2G performs iterative optimization for each new audio track. More iterations generate better quality.
Num Samples	On each iteration, A2G generates a number of sample animations. More samples produce better quality.
Smoothing Time Span	Controls the duration of the smoothing applied to source animations as they are stitched together.
Audio Sync Strength	Animation smoothing can affect audio synchronization. This option controls the balance between smoothing and sync accuracy.
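To build intuition for why more epochs and more samples tend to improve quality, here is a toy sample-and-select loop. It is not A2G's actual optimizer: `toy_search` is a hypothetical function, and the random "error" merely stands in for how poorly a candidate animation fits the audio. It only shows that the number of candidates evaluated grows as epochs times samples, and that the best candidate found cannot get worse as that number grows under a fixed seed.

```python
import random
from typing import Tuple


def toy_search(num_epochs: int, num_samples: int, seed: int = 0) -> Tuple[float, int]:
    """Return (lowest toy error found, number of candidates evaluated)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best_error = float("inf")
    evaluated = 0
    for _ in range(num_epochs):
        # Each epoch draws num_samples random candidates and keeps the best so far.
        for _ in range(num_samples):
            candidate_error = rng.random()
            best_error = min(best_error, candidate_error)
            evaluated += 1
    return best_error, evaluated


baseline, baseline_count = toy_search(num_epochs=5, num_samples=4)
more_epochs, more_epochs_count = toy_search(num_epochs=10, num_samples=4)
more_samples, more_samples_count = toy_search(num_epochs=5, num_samples=8)

print(baseline_count, more_epochs_count, more_samples_count)  # 20 40 40
print(more_epochs <= baseline, more_samples <= baseline)      # True True
```

The larger settings evaluate twice as many candidates, which is also why they can be expected to take longer in this toy model.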

Animation Graph Setup#

Option	Effect
Character	Select a character from the current stage.
Translation Var	Select a translation variable from an anim graph in the current stage.
Rotation Var	Select a rotation variable from an anim graph in the current stage.

Animation Recording#

Destination path#

Specify the folder on disk where your animation clip will be written. Click the folder icon to select the folder in a browser window. Click the link button to open the folder in the file explorer.

Take Name#

Specify a name for your animation clip. The output USD file is written to {destination_path}/{target_prim_name}_{take_name}.usd and contains one SkelAnimation with the Take Name.
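The naming pattern above can be reproduced in a script, for example to predict where a recorded clip will land before looking for it on disk. This is a minimal sketch: `build_output_path` is a hypothetical helper, not an A2G API, and the example folder, prim name, and take name are made up.

```python
from pathlib import PurePosixPath


def build_output_path(destination_path: str, target_prim_name: str, take_name: str) -> str:
    """Compose {destination_path}/{target_prim_name}_{take_name}.usd."""
    filename = f"{target_prim_name}_{take_name}.usd"
    # PurePosixPath joins with forward slashes regardless of the host OS.
    return str(PurePosixPath(destination_path) / filename)


print(build_output_path("/home/user/a2g_clips", "Character", "take_01"))
# /home/user/a2g_clips/Character_take_01.usd
```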

Export FPS#

Sets the frames per second at which the animation data is recorded (default: 60 fps).

Record#

Clicking Record executes the “run” command and starts playing the audio, producing a clean output of the full animation that matches the duration of the audio clip.
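Because the recorded animation spans the audio clip's duration at the chosen Export FPS, you can estimate the size of the clip ahead of time. The sketch below uses Python's standard `wave` module to read a WAV file's duration. Both helper functions are hypothetical, and rounding up to a whole frame is an assumption, so the exact frame count A2G writes may differ slightly.

```python
import math
import wave


def wav_duration_seconds(path: str) -> float:
    """Read the duration of a WAV file from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def expected_frame_count(duration_seconds: float, export_fps: int = 60) -> int:
    """Estimate the number of animation frames (assumes rounding up)."""
    return math.ceil(duration_seconds * export_fps)


# A 2.5 second clip at the default 60 fps, and the same clip at 24 fps.
print(expected_frame_count(2.5))      # 150
print(expected_frame_count(2.5, 24))  # 60
```

To use it with a real file, pass the result of `wav_duration_seconds("your_clip.wav")` as `duration_seconds`, where the file name is a placeholder for your own audio.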