Few-Step Sampling
Few-step diffusion models enable fast inference but generally do not support Classifier-Free Guidance (CFG), which leaves negative prompts ineffective.
NAG restores effective negative prompting, enabling direct suppression of visual, semantic, and stylistic attributes.
This enhances controllability and expands creative freedom across composition, style, and quality.
Flux-Schnell, 4 steps
SD3.5-Large-Turbo, 8 steps
Flux-Dev, 25 steps
Sampling with CFG
NAG is also a general enhancement to standard guidance strategies such as CFG, and it improves results in multi-step models as well.
SD3.5-Large, 25 steps, CFG
Video Generation
NAG extends robust control to video synthesis, enabling content suppression and quality enhancement. This highlights NAG’s capacity as a general-purpose mechanism for diffusion control across spatial and temporal domains.
Wan2.1-T2V-14B, 480p, 25 steps
Wan2.1-I2V-14B, 480p, 25 steps
Approach
The prevailing approach to diffusion model control, Classifier-Free Guidance (CFG), enables negative guidance by extrapolating between positive and negative conditional outputs at each denoising step. However, in few-step regimes, CFG's assumption of consistent structure between diffusion branches breaks down, as these branches diverge dramatically at early steps. This divergence causes severe artifacts rather than controlled guidance.
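As a rough illustration of the extrapolation that CFG performs at each denoising step, the following sketch combines a positive-prompt and a negative-prompt prediction. The function name `cfg_combine`, the `scale` parameter, and the toy arrays are assumptions made for this example; they are not taken from any particular library.

```python
import numpy as np

def cfg_combine(pred_pos, pred_neg, scale):
    """Extrapolate from the negative-branch output past the positive one.

    scale = 1.0 returns the positive prediction unchanged; larger values
    push the result further away from the negative prediction.
    """
    return pred_neg + scale * (pred_pos - pred_neg)

# Toy model outputs for a single denoising step (hypothetical values).
pred_pos = np.array([1.0, 2.0])
pred_neg = np.array([0.0, 1.0])
guided = cfg_combine(pred_pos, pred_neg, scale=3.0)
```

Because the result scales with the gap between the two branches, a large gap (as in the early steps of a few-step sampler) produces a correspondingly large and unconstrained shift in the output.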
Normalized Attention Guidance (NAG) operates in attention space by extrapolating positive and negative features Z+ and Z-, followed by L1-based normalization and α-blending. This constrains feature deviation, suppresses out-of-manifold drift, and achieves stable, controllable guidance.
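The three stages described above (extrapolation, L1-based normalization, α-blending) can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the exact extrapolation form, the clipping threshold `tau`, the default values of `scale`, `tau`, and `alpha`, and the function name `nag_combine` are all assumptions introduced for illustration.

```python
import numpy as np

def nag_combine(z_pos, z_neg, scale=4.0, tau=2.5, alpha=0.25, eps=1e-8):
    """Combine positive/negative attention features of shape (tokens, dim)."""
    # 1. Extrapolate in attention-feature space, away from the negative branch.
    z_ext = z_neg + scale * (z_pos - z_neg)

    # 2. L1-based normalization (assumed form): compare per-token L1 norms
    #    and rescale so the extrapolated norm is at most tau times the
    #    positive-branch norm.
    norm_pos = np.abs(z_pos).sum(axis=-1, keepdims=True)
    norm_ext = np.abs(z_ext).sum(axis=-1, keepdims=True)
    ratio = norm_ext / (norm_pos + eps)
    z_norm = z_ext * np.minimum(ratio, tau) / (ratio + eps)

    # 3. Alpha-blend the normalized features with the positive features.
    return alpha * z_norm + (1.0 - alpha) * z_pos

# Two tokens with two feature channels each (hypothetical values).
z_pos = np.array([[1.0, 0.0], [0.0, 1.0]])
z_neg = np.array([[-1.0, 0.0], [0.0, 1.0]])
z_out = nag_combine(z_pos, z_neg)
```

In this toy input the second token is identical in both branches, so it passes through unchanged, while the first token's extrapolated feature is clipped by the norm bound before blending. That bound is what keeps the guided feature close to the positive branch even when the two branches differ strongly.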

Computational Cost
We measure the per-step sampling latency on an NVIDIA A100 GPU.
Unlike CFG, which doubles the computation of every sampling step, NAG adds computation only in cross-attention layers or MM-DiT blocks.
In Flux, NAG incurs a similar cost to CFG, whereas in SD3.5-Large, SANA, SDXL and Wan2.1, it introduces significantly lower additional inference time.
Model Family | Baseline | CFG | NAG |
---|---|---|---|
Wan2.1 | 10.7 s | +10.7 s (100%) | +1.3 s (12%) |
SANA | 39 ms | +35 ms (90%) | +5 ms (13%) |
SDXL | 75 ms | +25 ms (34%) | +17 ms (22%) |
SD3.5-Large | 231 ms | +219 ms (95%) | +109 ms (43%) |
Flux | 487 ms | +488 ms (100%) | +426 ms (87%) |