HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, Jiajun Liang*
MEGVII Technology
*Indicates corresponding author

Increase the resolution and speed of your diffusion models by adding only a single line of code.
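In practice, the "single line" refers to patching a standard diffusers pipeline with the project's helper. The sketch below assumes the `hidiffusion` package from the project's GitHub release exposes `apply_hidiffusion`; check the repo for the current interface — this is an illustration, not a guaranteed API. Imports are kept inside the function so the sketch can be read without the heavyweight dependencies installed.

```python
# Target resolution beyond SDXL's native 1024x1024 training size.
TARGET_RESOLUTION = (2048, 2048)  # (height, width)

def generate(prompt: str):
    # Local imports: diffusers/torch/hidiffusion are only needed at run time.
    import torch
    from diffusers import StableDiffusionXLPipeline
    from hidiffusion import apply_hidiffusion  # helper assumed from the project repo

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    apply_hidiffusion(pipe)  # the single line: swaps in RAU-Net and MSW-MSA

    height, width = TARGET_RESOLUTION
    return pipe(prompt, height=height, width=width).images[0]

if __name__ == "__main__":
    generate("a serene mountain lake in autumn, fluffy clouds").save("out.png")
```

Because HiDiffusion is tuning-free, the patched pipeline keeps the original pretrained weights; only the U-Net's forward pass is modified.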

Higher resolution, better visual quality.

Faster speed, better practicality.

SDXL, 2048x3072. In the depths of a mystical forest, a robotic owl with night vision lenses for eyes watches over the nocturnal creatures.

SDXL, 2048x4096. Autumn season, a serene mountain lake lies beside a mountain. The leaves are yellow, the blue sky with fluffy clouds adds to the tranquility of the landscape.

Abstract

Diffusion models have become a mainstream approach for high-resolution image synthesis. However, directly generating higher-resolution images from pretrained diffusion models encounters unreasonable object duplication and exponentially increased generation time. In this paper, we discover that object duplication arises from feature duplication in the deep blocks of the U-Net. Concurrently, we pinpoint the extended generation time to self-attention redundancy in the U-Net's top blocks. To address these issues, we propose a tuning-free higher-resolution framework named HiDiffusion. Specifically, HiDiffusion contains a Resolution-Aware U-Net (RAU-Net) that dynamically adjusts the feature map size to resolve object duplication, and a Modified Shifted Window Multi-head Self-Attention (MSW-MSA) that utilizes optimized window attention to reduce computation. HiDiffusion can be integrated into various pretrained diffusion models to scale image generation resolution even to 4096×4096, at 1.5-6× the inference speed of previous methods. Extensive experiments demonstrate that our approach can address the object duplication and heavy computation issues, achieving state-of-the-art performance on higher-resolution image synthesis tasks.
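The RAU-Net idea described above can be illustrated with a minimal sketch: shrink an oversized feature map before the deep blocks back toward the spatial size the pretrained U-Net was trained on, then restore it afterwards. The NumPy code below uses average pooling down and nearest-neighbor up purely for illustration — the actual RAU-Net uses the model's own (learned) samplers, and the choice of factor here is a placeholder.

```python
import numpy as np

def rau_downsample(x, factor=2):
    # Resolution-aware downsampling (sketch): average-pool the (H, W, C)
    # feature map so deep U-Net blocks see a training-time spatial size.
    H, W, C = x.shape
    x = x.reshape(H // factor, factor, W // factor, factor, C)
    return x.mean(axis=(1, 3))

def rau_upsample(x, factor=2):
    # Resolution-aware upsampling (sketch): nearest-neighbor resize back
    # to the generation resolution after the deep blocks.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)
```

Shrinking the deep-block features is what suppresses the duplicated object structure: the deep blocks only ever operate at a resolution consistent with pretraining.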

Method


HiDiffusion framework.
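The window-attention half of the framework can likewise be sketched in a few lines: instead of attending over all H×W tokens, MSW-MSA partitions the feature map into windows (shifted across timesteps) and attends within each window, cutting the attention cost from roughly (H·W)² to (H·W)·win² score entries. The NumPy sketch below is not the paper's implementation; the window size, shift values, and single-head attention without projections are simplifications for illustration.

```python
import numpy as np

def window_partition(x, win, shift=(0, 0)):
    # x: (H, W, C) feature map. Optionally roll (shifted windows), then
    # split into non-overlapping win x win windows of flattened tokens.
    H, W, C = x.shape
    x = np.roll(x, shift=(-shift[0], -shift[1]), axis=(0, 1))
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def attention(tokens):
    # Plain scaled dot-product self-attention per window
    # (single head, no learned projections — illustration only).
    q = k = v = tokens
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v
```

Cost intuition: a 128×128 latent attended globally produces (128·128)² score entries; with 64×64 windows each of the 4 windows produces (64·64)², i.e. 4× fewer entries overall, and the saving grows with resolution.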

Text-to-Image Task

Image quality comparison with other high-resolution image generation methods

2048x2048. An Astronaut in space playing an electric guitar, stylistic, cinematic, earth visible in the background.

2048x2048. Girl with pink hair, vaporwave style, retro aesthetic, cyberpunk, vibrant, neon colors, vintage 80s and 90s style, highly detailed.

2048x3072. Roger rabbit as a real person, photorealistic, cinematic.

2048x4096. An otherworldly forest with bioluminescent trees, their neon blue leaves casting an ethereal glow on the path below, and curious creatures with gentle eyes peering from behind the glowing trunks.

4096x4096. An adorable happy brown border collie sitting on a bed, high detail.

4096x4096. Standing tall amidst the ruins, a stone golem awakens, vines and flowers sprouting from the crevices in its body.

Efficiency comparison with other acceleration methods


ControlNet Task

2048x2048 image generation with ControlNet. We can generate better images at a faster speed.

Inpainting Task

2048x2048 image generation on the inpainting task. We can generate better images at a faster speed.

BibTeX

@article{zhang2023hidiffusion,
        title={HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models},
        author={Zhang, Shen and Chen, Zhaowei and Zhao, Zhenyu and Chen, Yuhao and Tang, Yao and Liang, Jiajun},
        journal={arXiv preprint arXiv:2311.17528},
        year={2023}
      }