Few-Step Sampling
Few-step diffusion models facilitate rapid inference, but generally lack support for CFG, making negative
guidance ineffective.
NAG restores effective negative prompting, enabling direct suppression of visual, semantic, and stylistic
attributes.
This enhances controllability and expands creative freedom across composition, style, and quality.
Flux-Schnell, 4 steps
SD3.5-Large-Turbo, 8 steps
Flux-Dev, 25 steps
Sampling with CFG
NAG is also a general enhancement to standard guidance strategies such as CFG, and it improves results in multi-step models as well.
SD3.5-Large, 25 steps, CFG
Video Generation
NAG extends robust control to video synthesis, enabling content suppression and quality enhancement. This highlights NAG’s capacity as a general-purpose mechanism for diffusion control across spatial and temporal domains.
Wan2.1-T2V-14B, 480p, 25 steps
Wan2.1-I2V-14B, 480p, 25 steps
Approach
The prevailing approach to diffusion model control, Classifier-Free Guidance (CFG), enables negative
guidance by extrapolating between positive and negative conditional outputs at each denoising step. However,
in few-step regimes, CFG's assumption of consistent structure between diffusion branches breaks down, as
these branches diverge dramatically at early steps. This divergence causes severe artifacts rather than
controlled guidance.
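As a minimal sketch of the extrapolation described above, the following Python snippet combines a positive and a negative prediction using a guidance scale. The function name cfg_combine, the scale parameter, and the toy vectors are illustrative assumptions rather than any library's API; real models apply this to full model outputs at every denoising step.

```python
def cfg_combine(pred_pos, pred_neg, scale):
    """Extrapolate from the negative prediction toward (and past) the positive one.

    pred_pos / pred_neg: model outputs for the positive and negative prompts
    (plain lists of floats here; tensors in practice).
    scale: guidance scale (hypothetical name); scale == 1 returns pred_pos.
    """
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


# Toy predictions for a single step (made-up numbers for illustration).
pred_pos = [0.5, -1.0, 2.0]
pred_neg = [0.0, -1.0, 1.0]

guided = cfg_combine(pred_pos, pred_neg, scale=3.0)
print(guided)  # [1.5, -1.0, 4.0]
```

Because the gap between the two branches is multiplied by the scale, any large disagreement between them is amplified in the guided output, which is consistent with the artifacts described for few-step regimes.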
Normalized Attention Guidance (NAG) operates in attention space by extrapolating positive and negative
features Z+ and Z-, followed by L1-based normalization and α-blending. This constrains
feature deviation, suppresses out-of-manifold drift, and achieves stable, controllable guidance.
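The three stages named above (extrapolation, L1-based normalization, α-blending) can be sketched for a single token's attention feature vector as follows. This is a sketch under stated assumptions, not the reference implementation: the names nag_combine, scale, tau, and alpha, their default values, and the exact form of the normalization (capping the ratio of L1 norms at tau) are assumptions made for illustration.

```python
def l1_norm(v):
    """Sum of absolute values of a feature vector."""
    return sum(abs(x) for x in v)


def nag_combine(z_pos, z_neg, scale=4.0, tau=2.5, alpha=0.25):
    """Illustrative NAG-style guidance on one token's attention features.

    z_pos, z_neg: attention outputs for the positive / negative prompt.
    scale, tau, alpha: hypothetical hyperparameters (assumed names and values).
    """
    eps = 1e-8  # guards against division by zero

    # 1. Extrapolate away from the negative features.
    z_ext = [p + scale * (p - n) for p, n in zip(z_pos, z_neg)]

    # 2. L1-based normalization: limit how far the extrapolated norm
    #    may grow relative to the positive features (assumed form).
    ratio = l1_norm(z_ext) / max(l1_norm(z_pos), eps)
    shrink = min(ratio, tau) / max(ratio, eps)
    z_norm = [shrink * x for x in z_ext]

    # 3. Alpha-blend the normalized features with the positive features.
    return [alpha * zn + (1.0 - alpha) * p for zn, p in zip(z_norm, z_pos)]


# Toy features (made-up numbers for illustration).
z_pos = [1.0, 2.0, -1.0]
z_neg = [-3.0, 2.0, 1.0]
z_nag = nag_combine(z_pos, z_neg)
```

In this sketch, the L1 norm of the result can never exceed (alpha * tau + 1 - alpha) times the L1 norm of z_pos, however far apart the two branches are. That bound is one way to read the statement that normalization and blending constrain feature deviation.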

Computational Cost
We measure the per-step sampling latency on an NVIDIA A100 GPU.
Unlike CFG, which doubles the computation of each sampling step, NAG adds computation only in
cross-attention layers or MM-DiT blocks.
In Flux, NAG incurs a cost similar to that of CFG, whereas in SD3.5-Large, SANA, SDXL, and Wan2.1 it adds
significantly less inference time than CFG.
Model Family | Baseline (per step) | CFG overhead | NAG overhead |
---|---|---|---|
Wan2.1 | 10.7 s | +10.7 s (100%) | +1.3 s (12%) |
SANA | 39 ms | +35 ms (90%) | +5 ms (13%) |
SDXL | 75 ms | +25 ms (34%) | +17 ms (22%) |
SD3.5-Large | 231 ms | +219 ms (95%) | +109 ms (43%) |
Flux | 487 ms | +488 ms (100%) | +426 ms (87%) |
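
The percentages in parentheses read as added latency relative to the baseline. As a small worked example, assuming that interpretation, the Wan2.1 row can be recomputed as follows (the helper name relative_overhead is hypothetical):

```python
def relative_overhead(baseline, added):
    """Added per-step latency as a percentage of the baseline latency."""
    return 100.0 * added / baseline


# Wan2.1 row: baseline 10.7 s; CFG adds 10.7 s; NAG adds 1.3 s.
wan_cfg = relative_overhead(10.7, 10.7)
wan_nag = relative_overhead(10.7, 1.3)
print(f"CFG: +{wan_cfg:.0f}%, NAG: +{wan_nag:.0f}%")  # CFG: +100%, NAG: +12%
```

The same calculation applies to the other rows. Because the table lists rounded latencies, recomputed values need not match every listed percentage exactly.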