Wan Animate Pipeline #367

Open
csgoogle wants to merge 1 commit into main from sagarchapara/wananimate-pipeline

Conversation


@csgoogle csgoogle commented Mar 28, 2026

Wan Animate Pipeline

This CL adds the Wan Animate pipeline.

  • Reused the existing Wan attention operator for face encoder cross attention.
  • Swept Flash Attention block-size configurations to identify the best inference setting.

Links

Performance

  • compile_time: 292.74
  • generation_time: 157.69

Configuration

  • cp: 8 (v6e8)
  • cfg: 1.0
  • prev_segments: 5
  • resolution: 1280x720
  • fps: 24
  • generated_frames: 77


@csgoogle csgoogle marked this pull request as ready for review April 6, 2026 16:33
@csgoogle csgoogle requested a review from entrpn as a code owner April 6, 2026 16:33
@csgoogle csgoogle force-pushed the sagarchapara/wananimate-pipeline branch 2 times, most recently from e281524 to 349d080 Compare April 13, 2026 09:10
Comment thread src/maxdiffusion/schedulers/scheduling_unipc_multistep_flax.py
Comment thread src/maxdiffusion/schedulers/scheduling_unipc_multistep_flax.py
Comment thread assets/wan_animate/src_face.mp4 Outdated
Comment thread src/maxdiffusion/pipelines/wan/wan_pipeline_animate.py
Comment thread src/maxdiffusion/schedulers/scheduling_unipc_multistep_flax.py
Comment thread src/maxdiffusion/generate_wan_animate.py
@Perseus14
Collaborator

Please resolve the conflicts and enable support for diagnostics and profiling, as in this PR.

Comment thread .gitignore Outdated
Comment thread src/maxdiffusion/generate_wan_animate.py
Comment thread src/maxdiffusion/configs/base_wan_animate.yml

github-actions Bot commented May 7, 2026

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.


@github-actions github-actions Bot left a comment


📋 Review Summary

The Pull Request introduces the Wan Animate pipeline, which includes the transformer model architecture, inference entry point, and necessary utilities. The implementation is comprehensive and follows the established patterns in the repository, including support for segment-based inference and parity with Diffusers.

🔍 General Feedback

  • Performance Optimization: The current implementation of the transformer re-encodes the face video frames during every denoising step. Since the face video is static throughout the inference process, this encoding can be pre-computed once per segment to significantly reduce redundant computation and speed up generation.
  • Compilation Efficiency: The generation script performs two full inference passes. For high-resolution video generation, this double work is expensive. Consider reducing the number of steps in the first (compile) pass.
  • Robustness: Add checks for optional inputs in the transformer to prevent runtime errors when face_pixel_values is not provided.
  • Code Quality: The reuse of the Wan attention operator and the integration with the existing configuration system is well-done. The use of nnx.scan for transformer blocks ensures memory efficiency during inference.
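The first point above — hoisting a loop-invariant encoding out of the denoising loop — can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `encode_face`, `denoise_step`, and `generate_segment` are hypothetical names, and NumPy stands in for `jnp`.

```python
import numpy as np

def encode_face(face_frames):
    # Hypothetical stand-in for the face encoder; its output is
    # loop-invariant across denoising steps within a segment.
    return face_frames.mean(axis=0)

def denoise_step(latents, face_embed, step):
    # Hypothetical denoising step that consumes the cached embedding.
    return latents - 0.1 * step * face_embed

def generate_segment(latents, face_frames, num_steps=4):
    # Encode the face video once per segment, outside the loop,
    # instead of re-encoding it on every denoising step.
    face_embed = encode_face(face_frames)
    for step in range(num_steps):
        latents = denoise_step(latents, face_embed, step)
    return latents
```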

Comment thread src/maxdiffusion/generate_wan_animate.py
Comment thread src/maxdiffusion/models/wan/transformers/transformer_wan_animate.py
@csgoogle csgoogle force-pushed the sagarchapara/wananimate-pipeline branch 3 times, most recently from ef88d04 to f6b4c22 Compare May 11, 2026 12:22

🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.


@github-actions github-actions Bot left a comment


📋 Review Summary

This PR successfully implements the Wan Animate pipeline in MaxDiffusion, including the 3D transformer architecture, motion and face encoders, and the inference pipeline with temporal tiling. The implementation is robust, follows established patterns for sharding and JIT optimization, and is backed by comprehensive parity tests.

🔍 General Feedback

  • Efficiency: The use of pre-computed motion vectors once per segment and the implementation of scan_layers show good attention to performance on TPU/GPU hardware.
  • Testing: The parity tests are exhaustive and provide high confidence in the implementation's correctness relative to the reference Diffusers implementation.
  • Modularity: Reusing the Wan attention operator for face conditioning is a clean approach.
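The temporal tiling mentioned above processes a long video as a sequence of segments, with each segment reusing a few frames of the previous one as conditioning context. A minimal sketch of how segment boundaries might be computed — `seg_len` and `overlap` are illustrative parameters, not the pipeline's actual configuration keys:

```python
def split_segments(num_frames, seg_len, overlap):
    # Partition frame indices into segments of up to seg_len frames, where
    # each segment reuses `overlap` trailing frames of the previous segment
    # as conditioning context. Returns (start, end) index pairs.
    stride = seg_len - overlap
    starts = range(0, max(num_frames - overlap, 1), stride)
    return [(s, min(s + seg_len, num_frames)) for s in starts]
```

For example, 10 frames with 4-frame segments and a 1-frame overlap yields segments (0, 4), (3, 7), (6, 10), covering every frame exactly once apart from the shared boundary frames.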


@github-actions github-actions Bot left a comment


📋 Review Summary

Additional technical suggestions for the Wan Animate transformer implementation.

🔍 General Feedback

  • Optimization: Minor optimizations in MotionConv2d to avoid repeated array conversions.
  • Robustness: Defensive check in WanAnimateFaceBlockCrossAttention for sequence length consistency.
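The defensive check suggested for the face cross-attention can be illustrated as follows. This is a plain-NumPy sketch with hypothetical names (`face_cross_attention`, `hidden_states`, `face_kv`), not the WanAnimateFaceBlockCrossAttention module itself:

```python
import numpy as np

def face_cross_attention(hidden_states, face_kv):
    # Defensive consistency check before attention: the feature dimension of
    # the face key/value sequence must match the query states; otherwise the
    # matmul below fails with an opaque shape error deep inside the kernel.
    if face_kv.shape[-1] != hidden_states.shape[-1]:
        raise ValueError(
            f"face_kv feature dim {face_kv.shape[-1]} does not match "
            f"hidden_states feature dim {hidden_states.shape[-1]}")
    scale = 1.0 / np.sqrt(hidden_states.shape[-1])
    scores = hidden_states @ face_kv.T * scale          # (q_len, kv_len)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights = weights / weights.sum(-1, keepdims=True)  # softmax over kv_len
    return weights @ face_kv                            # (q_len, dim)
```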


@github-actions github-actions Bot left a comment


📋 Review Summary

The Pull Request introduces the Wan Animate pipeline and its associated 3D transformer model to the maxdiffusion library. The implementation is comprehensive, including checkpointing support, sharding-aware attention, and extensive parity tests against the reference implementation. The code follows the established architectural patterns of the project and integrates well with existing Wan and NNX utilities.

🔍 General Feedback

  • Parity Testing: Excellent inclusion of detailed parity tests (wan_animate_module_parity_test.py) which ensures the JAX/Flax implementation matches the reference torch model.
  • NNX Migration: The use of flax.nnx for the new models is consistent with the project's direction.
  • Robustness: Some minor improvements suggested for robustness (e.g., division by zero checks) and consistency in parameter access within nnx modules.
  • Documentation: The docstrings are informative and follow the project's style.
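The parity tests mentioned above compare module outputs between the JAX/Flax implementation and the reference torch model. The core assertion pattern can be sketched like this — the tolerances and the helper name are illustrative, not those used in wan_animate_module_parity_test.py:

```python
import numpy as np

def assert_parity(flax_out, reference_out, rtol=1e-4, atol=1e-5):
    # Convert both outputs to NumPy and compare elementwise; raises
    # AssertionError with a mismatch report if they diverge beyond tolerance.
    np.testing.assert_allclose(
        np.asarray(flax_out), np.asarray(reference_out), rtol=rtol, atol=atol)
```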

Comment thread src/maxdiffusion/models/wan/transformers/transformer_wan_animate.py
Comment thread src/maxdiffusion/models/wan/transformers/transformer_wan_animate.py
Comment thread src/maxdiffusion/generate_wan_animate.py
Comment thread src/maxdiffusion/models/wan/transformers/transformer_wan_animate.py
Comment thread src/maxdiffusion/generate_wan_animate.py
@csgoogle csgoogle force-pushed the sagarchapara/wananimate-pipeline branch 2 times, most recently from b2ea208 to a2d4356 Compare May 12, 2026 09:52
- Add WanAnimateTransformer3DModel with motion encoder, face encoder,
  and face adapter cross-attention blocks
- Add WanAnimatePipeline supporting animate and replace modes with
  multi-segment temporal stitching
- Add generate_wan_animate.py inference entrypoint
- Add base_wan_animate.yml config for 720p inference
- Pre-compute face motion vectors once per segment instead of every
  denoising step for faster inference
- Simplify face block cross-attention forward pass: replace einops
  with jnp.reshape, remove redundant sharding constraints
- Add parity tests for animate modules and diffusers comparison
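The einops-to-reshape simplification in the commit message can be illustrated with the standard head-splitting pattern. A sketch using NumPy in place of `jnp` (the array APIs are identical here); `split_heads` is a hypothetical name:

```python
import numpy as np

def split_heads(x, num_heads):
    # Plain reshape/transpose equivalent of
    # einops.rearrange(x, "b s (h d) -> b h s d", h=num_heads).
    b, s, dim = x.shape
    head_dim = dim // num_heads
    return x.reshape(b, s, num_heads, head_dim).transpose(0, 2, 1, 3)
```

Dropping the einops dependency in the forward pass keeps the hot path to primitive ops that XLA fuses directly.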
@csgoogle csgoogle force-pushed the sagarchapara/wananimate-pipeline branch from a2d4356 to eddfd4d Compare May 12, 2026 09:55
@Perseus14
Collaborator

LGTM!

The failing test case is unrelated to this change.
