Add sdxl lightning quant use #992 (changes from 14 commits)

@@ -0,0 +1,122 @@
# Run SDXL-Lightning with OneDiff

1. [Environment Setup](#environment-setup)
   - [Set Up OneDiff](#set-up-onediff)
   - [Set Up Compiler Backend](#set-up-compiler-backend)
   - [Set Up SDXL-Lightning](#set-up-sdxl-lightning)
2. [Compile](#compile)
   - [Without Compile (Original PyTorch HF Diffusers Baseline)](#run-1024x1024-without-compile-original-pytorch-hf-diffusers-baseline)
   - [With OneFlow Backend](#run-1024x1024-with-compile-oneflow-backend)
   - [With NexFort Backend](#run-1024x1024-with-compile-nexfort-backend)
3. [Quantization (Int8)](#quantization-int8)
   - [With Quantization - OneFlow Backend](#run-1024x1024-with-quantization-oneflow-backend)
   - [With Quantization - NexFort Backend](#run-1024x1024-with-quantization-nexfort-backend)
4. [Performance Comparison](#performance-comparison)
5. [Quality](#quality)

## Environment Setup

### Set Up OneDiff
Follow the installation instructions at https://github.com/siliconflow/onediff?tab=readme-ov-file#installation.

### Set Up Compiler Backend
OneDiff supports two compiler backends: OneFlow and NexFort. Follow the backend setup instructions at https://github.com/siliconflow/onediff?tab=readme-ov-file#install-a-compiler-backend.

### Set Up SDXL-Lightning
- HF model: [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning)
- HF pipeline: [diffusers usage](https://huggingface.co/ByteDance/SDXL-Lightning#2-step-4-step-8-step-unet)

## Compile

> [!NOTE]
> The current tests are based on the 8-step distillation model.
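The example script infers the number of inference steps from the checkpoint filename passed via `--cpkt`: it reads the digit immediately after the `sdxl_lightning_` prefix. A minimal sketch of that parsing:

```python
def steps_from_checkpoint(cpkt: str) -> int:
    # The single digit right after the "sdxl_lightning_" prefix is the
    # step count, e.g. "sdxl_lightning_8step_unet.safetensors" -> 8.
    prefix = "sdxl_lightning_"
    return int(cpkt[len(prefix) : len(prefix) + 1])

print(steps_from_checkpoint("sdxl_lightning_8step_unet.safetensors"))  # 8
print(steps_from_checkpoint("sdxl_lightning_4step_lora.safetensors"))  # 4
```

Note this only handles single-digit step counts, which covers the published 2-, 4-, and 8-step checkpoints.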
### Run 1024x1024 Without Compile (Original PyTorch HF Diffusers Baseline)
```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler none \
--saved_image sdxl_light.png
```

### Run 1024x1024 With Compile [OneFlow Backend]
```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler oneflow \
--saved_image sdxl_light_oneflow_compile.png
```

### Run 1024x1024 With Compile [NexFort Backend]
```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler nexfort \
--compiler-config '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last", "options": {"triton.fuse_attention_allow_fp16_reduction": false}}' \
--saved_image sdxl_light_nexfort_compile.png
```
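The `--compiler-config` value is a JSON string; the script decodes it with `json.loads` and passes the result to `compile_pipe` as the NexFort `options`. A quick sanity check that the config used above is well-formed JSON:

```python
import json

compiler_config = (
    '{"mode": "max-optimize:max-autotune:low-precision",'
    ' "memory_format": "channels_last",'
    ' "options": {"triton.fuse_attention_allow_fp16_reduction": false}}'
)

# Mirrors the script: options = json.loads(args.compiler_config)
options = json.loads(compiler_config)
print(options["mode"])  # max-optimize:max-autotune:low-precision
print(options["options"]["triton.fuse_attention_allow_fp16_reduction"])  # False
```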
## Quantization (Int8)

> [!NOTE]
> Quantization is a OneDiff Enterprise feature.

### Run 1024x1024 With Quantization [OneFlow Backend]

Execute the following command to quantize the model, where `--quantized_model` is the output path for the quantized model. For an introduction to the quantization parameters, refer to https://github.com/siliconflow/onediff/blob/main/README_ENTERPRISE.md#diffusers-with-onediff-enterprise.

```bash
python3 onediff_diffusers_extensions/tools/quantization/quantize-sd-fast.py \
--quantized_model /path/to/sdxl_lightning_oneflow_quant \
--conv_ssim_threshold 0.1 \
--linear_ssim_threshold 0.1 \
--conv_compute_density_threshold 300 \
--linear_compute_density_threshold 300 \
--save_as_float true \
--use_lightning 1
```

Test the quantized model:

```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler oneflow \
--use_quantization \
--base /path/to/sdxl_lightning_oneflow_quant \
--saved_image sdxl_light_oneflow_quant.png
```
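With `--use_quantization` and the OneFlow backend, the example script reads a `calibrate_info.txt` file from the quantized model directory and replaces the listed UNet sub-modules with quantizable versions. Each line holds a sub-module name followed by its calibration values. A sketch of that parsing; the sample line below is hypothetical, invented only to illustrate the format the script expects:

```python
def parse_calibrate_info(lines):
    # Each line: "<sub_module_name> <float> <int> <comma-separated floats>",
    # matching the parsing in text_to_image_sdxl_light.py.
    calibrate_info = {}
    for line in lines:
        items = line.strip().split(" ")
        calibrate_info[items[0]] = [
            float(items[1]),
            int(items[2]),
            [float(x) for x in items[3].split(",")],
        ]
    return calibrate_info

# Hypothetical sample line, purely for illustration.
info = parse_calibrate_info(["down_blocks.0.attentions.0.proj_in 0.99 8 0.1,0.2"])
print(info)  # {'down_blocks.0.attentions.0.proj_in': [0.99, 8, [0.1, 0.2]]}
```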
### Run 1024x1024 With Quantization [NexFort Backend]

```bash
python3 onediff_diffusers_extensions/examples/lightning/text_to_image_sdxl_light.py \
--prompt "product photography, world of warcraft orc warrior, white background" \
--compiler nexfort \
--compiler-config '{"mode": "max-optimize:max-autotune:low-precision", "memory_format": "channels_last", "options": {"triton.fuse_attention_allow_fp16_reduction": false}}' \
--use_quantization \
--quantize-config '{"quant_type": "int8_dynamic"}' \
--saved_image sdxl_light_nexfort_quant.png
```
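Similarly, `--quantize-config` is decoded with `json.loads` and unpacked into `quantize_pipe` as keyword arguments (the script calls `quantize_pipe(pipe, ignores=[], **config)`). A small sketch with a stand-in function, only to show the mechanics of that unpacking:

```python
import json

def fake_quantize_pipe(pipe, ignores, **kwargs):
    # Stand-in for onediffx.quantize_pipe; it does no real quantization,
    # it just exposes how the JSON string becomes keyword arguments.
    return kwargs

quantize_config = json.loads('{"quant_type": "int8_dynamic"}')
result = fake_quantize_pipe(None, ignores=[], **quantize_config)
print(result)  # {'quant_type': 'int8_dynamic'}
```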
## Performance Comparison

**Testing on an NVIDIA RTX 4090 GPU, using a resolution of 1024x1024 and 8 steps:**

Data update date: 2024-07-29

| Configuration        | Iteration Speed (it/s) | E2E Time (seconds) | Warmup Time (seconds) <sup>1</sup> | Warmup with Cache Time (seconds) |
|----------------------|------------------------|--------------------|------------------------------------|----------------------------------|
| PyTorch              | 14.68                  | 0.840              | 1.31                               | -                                |
| OneFlow Compile      | 29.06 (+97.83%)        | 0.530 (-36.90%)    | 52.26                              | 0.64                             |
| OneFlow Quantization | 43.45 (+195.95%)       | 0.424 (-49.52%)    | 59.87                              | 0.51                             |
| NexFort Compile      | 28.07 (+91.18%)        | 0.526 (-37.38%)    | 539.67                             | 68.79                            |
| NexFort Quantization | 30.85 (+110.15%)       | 0.476 (-43.33%)    | 610.25                             | 93.28                            |

<sup>1</sup> The OneDiff warmup-with-compilation time was measured on an AMD EPYC 7543 32-core processor.
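The percentage figures in the table are changes relative to the PyTorch baseline row: percent change = (value - baseline) / baseline x 100. Recomputing from the rounded values shown gives slightly different numbers (e.g. +97.96 vs the reported +97.83), presumably because the table was derived from unrounded measurements:

```python
def pct_change(baseline: float, value: float) -> float:
    # Percent change relative to the baseline measurement, rounded to 2 dp.
    return round((value - baseline) / baseline * 100, 2)

print(pct_change(14.68, 29.06))  # 97.96 (table reports +97.83)
print(pct_change(0.840, 0.530))  # -36.9 (table reports -36.90)
```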
## Quality

Quality evaluation results: https://github.com/siliconflow/odeval/tree/main/models/lightning
@@ -0,0 +1,198 @@
import argparse
import json
import os
import torch
from diffusers import StableDiffusionXLPipeline
from huggingface_hub import hf_hub_download
from onediffx import compile_pipe, load_pipe, quantize_pipe, save_pipe
from onediffx.utils.performance_monitor import track_inference_time
from safetensors.torch import load_file
try:
    from diffusers import utils

    USE_PEFT_BACKEND = utils.USE_PEFT_BACKEND
except Exception:
    USE_PEFT_BACKEND = False
parser = argparse.ArgumentParser()
parser.add_argument(
    "--base", type=str, default="stabilityai/stable-diffusion-xl-base-1.0"
)
parser.add_argument("--repo", type=str, default="ByteDance/SDXL-Lightning")
parser.add_argument("--cpkt", type=str, default="sdxl_lightning_8step_unet.safetensors")
parser.add_argument("--variant", type=str, default="fp16")
parser.add_argument(
    "--prompt",
    type=str,
    # default="street style, detailed, raw photo, woman, face, shot on CineStill 800T",
    default="A girl smiling",
)
parser.add_argument("--save_graph", action="store_true")
parser.add_argument("--load_graph", action="store_true")
parser.add_argument("--save_graph_dir", type=str, default="cached_pipe")
parser.add_argument("--load_graph_dir", type=str, default="cached_pipe")
parser.add_argument("--height", type=int, default=1024)
parser.add_argument("--width", type=int, default=1024)
parser.add_argument(
    "--saved_image", type=str, required=False, default="sdxl-light-out.png"
)
parser.add_argument("--seed", type=int, default=1)
parser.add_argument(
    "--compiler",
    type=str,
    default="oneflow",
    help="Compiler backend to use. Options: 'none', 'nexfort', 'oneflow'",
)
parser.add_argument(
    "--compiler-config", type=str, help="JSON string for nexfort compiler config."
)
parser.add_argument(
    "--quantize-config", type=str, help="JSON string for nexfort quantization config."
)
parser.add_argument("--bits", type=int, default=8)
parser.add_argument("--use_quantization", action="store_true")

args = parser.parse_args()

OUTPUT_TYPE = "pil"

# The digit right after "sdxl_lightning_" in the checkpoint name is the step count.
n_steps = int(args.cpkt[len("sdxl_lightning_") : len("sdxl_lightning_") + 1])

is_lora_cpkt = "lora" in args.cpkt

if args.compiler == "oneflow":
    from onediff.schedulers import EulerDiscreteScheduler
else:
    from diffusers import EulerDiscreteScheduler

if is_lora_cpkt:
    if not USE_PEFT_BACKEND:
        print("PEFT backend is required for load_lora_weights")
        exit(0)
    pipe = StableDiffusionXLPipeline.from_pretrained(
        args.base, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    if os.path.isfile(os.path.join(args.repo, args.cpkt)):
        pipe.load_lora_weights(os.path.join(args.repo, args.cpkt))
    else:
        pipe.load_lora_weights(hf_hub_download(args.repo, args.cpkt))
    pipe.fuse_lora()
else:
    if args.use_quantization and args.compiler == "oneflow":
        print("oneflow backend quant...")
        pipe = StableDiffusionXLPipeline.from_pretrained(
            args.base, torch_dtype=torch.float16, variant="fp16"
        ).to("cuda")
        import onediff_quant
        from onediff_quant.utils import replace_sub_module_with_quantizable_module

        quantized_layers_count = 0
        onediff_quant.enable_load_quantized_model()

        # Each line of calibrate_info.txt: "<module> <float> <int> <floats,...>"
        calibrate_info = {}
        with open(os.path.join(args.base, "calibrate_info.txt"), "r") as f:
            for line in f.readlines():
                line = line.strip()
                items = line.split(" ")
                calibrate_info[items[0]] = [
                    float(items[1]),
                    int(items[2]),
                    [float(x) for x in items[3].split(",")],
                ]

        for sub_module_name, sub_calibrate_info in calibrate_info.items():
            replace_sub_module_with_quantizable_module(
                pipe.unet,
                sub_module_name,
                sub_calibrate_info,
                False,
                False,
                args.bits,
            )
            quantized_layers_count += 1

        print(f"Total quantized layers: {quantized_layers_count}")
    else:
        from diffusers import UNet2DConditionModel

        unet = UNet2DConditionModel.from_config(args.base, subfolder="unet").to(
            "cuda", torch.float16
        )
        if os.path.isfile(os.path.join(args.repo, args.cpkt)):
            unet.load_state_dict(
                load_file(os.path.join(args.repo, args.cpkt), device="cuda")
            )
        else:
            unet.load_state_dict(
                load_file(hf_hub_download(args.repo, args.cpkt), device="cuda")
            )
        pipe = StableDiffusionXLPipeline.from_pretrained(
            args.base, unet=unet, torch_dtype=torch.float16, variant="fp16"
        ).to("cuda")

pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

if pipe.vae.dtype == torch.float16 and pipe.vae.config.force_upcast:
    pipe.upcast_vae()

# Compile the pipeline
if args.compiler == "oneflow":
    print("oneflow backend compile...")
    pipe = compile_pipe(
        pipe,
    )
    if args.load_graph:
        print("Loading graphs...")
        load_pipe(pipe, args.load_graph_dir)
elif args.compiler == "nexfort":
    print("nexfort backend compile...")
    nexfort_compiler_config = (
        json.loads(args.compiler_config) if args.compiler_config else None
    )

    options = nexfort_compiler_config
    pipe = compile_pipe(
        pipe, backend="nexfort", options=options, fuse_qkv_projections=True
    )

if args.use_quantization and args.compiler == "nexfort":
    print("nexfort backend quant...")
    nexfort_quantize_config = (
        json.loads(args.quantize_config) if args.quantize_config else None
    )
    pipe = quantize_pipe(pipe, ignores=[], **nexfort_quantize_config)

# Warmup run
with track_inference_time(warmup=True):
    image = pipe(
        prompt=args.prompt,
        height=args.height,
        width=args.width,
        num_inference_steps=n_steps,
        guidance_scale=0,
        output_type=OUTPUT_TYPE,
    ).images

# Normal run
torch.manual_seed(args.seed)
with track_inference_time(warmup=False):
    image = pipe(
        prompt=args.prompt,
        height=args.height,
        width=args.width,
        num_inference_steps=n_steps,
        guidance_scale=0,
        output_type=OUTPUT_TYPE,
    ).images

image[0].save(args.saved_image)

if args.save_graph:
    print("Saving graphs...")
    save_pipe(pipe, args.save_graph_dir)
@@ -1,10 +1,10 @@
 #!/bin/bash

-python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --save_graph --save_graph_dir cached_unet_pipe
+python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --save_graph --save_graph_dir cached_unet_pipe

-python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --load_graph --load_graph_dir cached_unet_pipe
+python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_unet.safetensors --load_graph --load_graph_dir cached_unet_pipe

-HF_HUB_OFFLINE=0 python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --save_graph --save_graph_dir cached_lora_pipe
+HF_HUB_OFFLINE=0 python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --save_graph --save_graph_dir cached_lora_pipe

-HF_HUB_OFFLINE=0 python3 examples/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --load_graph --load_graph_dir cached_lora_pipe
+HF_HUB_OFFLINE=0 python3 examples/lightning/text_to_image_sdxl_light.py --base /share_nfs/hf_models/stable-diffusion-xl-base-1.0 --repo /share_nfs/hf_models/SDXL-Lightning --cpkt sdxl_lightning_4step_lora.safetensors --load_graph --load_graph_dir cached_lora_pipe