**Diffusers** is the go-to library for state-of-the-art pretrained diffusion models for **generating images, audio, and even 3D structures of molecules**. Whether you're looking for a simple inference solution or want to train your own diffusion model, **Diffusers** is a modular toolbox that supports both. The library is designed with a focus on **usability over performance**, **simple over easy**, and **ease of use over abstraction**.

The library has three main components:

1. **State-of-the-art diffusion pipelines** that can be run in inference with just a few lines of code.
2. **Interchangeable noise schedulers** for balancing the trade-off between generation speed and quality.
3. **Pretrained models** that can be used as building blocks and combined with schedulers to create your own end-to-end diffusion systems.

## Installation

**With `pip`**

```bash
pip install --upgrade diffusers
```

**With `conda`**

```sh
conda install -c conda-forge diffusers
```

**Apple Silicon (M1/M2) support**

Please refer to [the documentation](https://huggingface.co/docs/diffusers/optimization/mps).

## Quickstart

To get started, we recommend taking a look at two notebooks:

- The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of using diffusion models, schedulers, and pipelines. Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library.
- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook, which summarizes diffusion model training methods. This notebook takes a step-by-step approach to training your diffusion models on an image dataset, with explanatory graphics.
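If you want a quick, local taste of these building blocks before diving into the notebooks, the sketch below loads the unconditional `google/ddpm-celebahq-256` checkpoint (also used in the Examples section further down), first through the pipeline abstraction and then through its individual model and scheduler components. This is a minimal sketch rather than an official quickstart; it assumes a CUDA GPU, and the number of denoising steps and the output filename are arbitrary choices.

```python
import torch
from diffusers import DDPMPipeline, DDPMScheduler, UNet2DModel

model_id = "google/ddpm-celebahq-256"
device = "cuda"

# 1. the pipeline abstraction: model, scheduler and denoising loop bundled into one object
pipe = DDPMPipeline.from_pretrained(model_id).to(device)
image = pipe().images[0]
image.save("quickstart_pipeline.png")

# 2./3. the individual building blocks: a pretrained model and a noise scheduler
unet = UNet2DModel.from_pretrained(model_id, subfolder="unet").to(device)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
scheduler.set_timesteps(100)  # 100 steps chosen arbitrarily for illustration

# start from pure noise and denoise it step by step with the scheduler
sample = torch.randn(1, unet.config.in_channels, unet.config.sample_size, unet.config.sample_size).to(device)
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = unet(sample, t).sample
    sample = scheduler.step(noise_pred, t, sample).prev_sample
# `sample` now holds the generated image as a tensor in roughly [-1, 1]
```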
## **New** Stable Diffusion is now fully compatible with `diffusers`!

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

You need to accept the model license before downloading or using the Stable Diffusion weights. Please visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.

### Text-to-Image generation with Stable Diffusion

We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it almost always gives the same results as full precision while being roughly twice as fast and requiring half the amount of GPU RAM.

```python
# make sure you're logged in with `huggingface-cli login`
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16, revision="fp16")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

**Note**: If you don't want to use the token, you can also simply download the model weights (after having [accepted the license](https://huggingface.co/CompVis/stable-diffusion-v1-4)) and pass the path to the local folder to the `StableDiffusionPipeline`.

```
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
```

Assuming the folder is stored locally under `./stable-diffusion-v1-4`, you can also run stable diffusion without requiring an authentication token:

```python
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

If you are limited by GPU memory, you might want to consider chunking the attention computation in addition to using `fp16`. The following snippet should need less than 4GB of VRAM.

```python
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_attention_slicing()
image = pipe(prompt).images[0]
```

If you wish to use a different scheduler, you can simply instantiate it before the pipeline and pass it to `from_pretrained`.

```python
from diffusers import LMSDiscreteScheduler

lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    scheduler=lms,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```

If you want to run Stable Diffusion on CPU or you want to have maximum precision on GPU, please run the model in the default *full-precision* setting:

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# disable the following line if you run on CPU
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```

### Image-to-Image text-guided generation with Stable Diffusion

The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.

```python
import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline
device = "cuda"
model_id_or_path = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id_or_path,
    revision="fp16",
    torch_dtype=torch.float16,
)
# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
# and pass `model_id_or_path="./stable-diffusion-v1-4"`.
pipe = pipe.to(device)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images

images[0].save("fantasy_landscape.png")
```

You can also run this example on Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)

### In-painting using Stable Diffusion

The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and a text prompt.

```python
from io import BytesIO

import torch
import requests
import PIL

from diffusers import StableDiffusionInpaintPipeline

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

device = "cuda"
model_id_or_path = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_id_or_path,
    revision="fp16",
    torch_dtype=torch.float16,
)
# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
# and pass `model_id_or_path="./stable-diffusion-v1-4"`.
pipe = pipe.to(device)

prompt = "a cat sitting on a bench"
images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images

images[0].save("cat_on_bench.png")
```

### Tweak prompts reusing seeds and latents

You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb).
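For instance, the following is a minimal sketch of both approaches: fixing the seed of a `torch.Generator`, or creating the initial latents yourself so you can reuse them with a tweaked prompt. The seed value `1024` and the prompt variation are arbitrary choices for illustration; see the notebook above for the full workflow.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"

# fixing the generator's seed makes the result reproducible across runs
generator = torch.Generator(device="cuda").manual_seed(1024)
image = pipe(prompt, generator=generator).images[0]

# alternatively, create the initial latents yourself (shape: batch x channels x height/8 x width/8)
# and reuse them while tweaking the prompt to keep the overall composition
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 512 // 8, 512 // 8),
    generator=torch.Generator(device="cuda").manual_seed(1024),
    device="cuda",
)
image_variant = pipe(prompt + ", oil painting", latents=latents).images[0]
```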
For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0).

## Examples

There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using `DiffusionPipeline`s and Google Colab) and interactive web-tools.

### Running Code

If you want to run the code yourself 💻, you can try out:

- [Text-to-Image Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256)

```python
# !pip install diffusers transformers
from diffusers import DiffusionPipeline

device = "cuda"
model_id = "CompVis/ldm-text2im-large-256"

# load model and scheduler
ldm = DiffusionPipeline.from_pretrained(model_id)
ldm = ldm.to(device)

# run pipeline in inference (sample random noise and denoise)
prompt = "A painting of a squirrel eating a burger"
image = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images[0]

# save image
image.save("squirrel.png")
```

- [Unconditional Diffusion with discrete scheduler](https://huggingface.co/google/ddpm-celebahq-256)

```python
# !pip install diffusers
from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline

model_id = "google/ddpm-celebahq-256"
device = "cuda"

# load model and scheduler
ddpm = DDPMPipeline.from_pretrained(model_id)  # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference
ddpm.to(device)

# run pipeline in inference (sample random noise and denoise)
image = ddpm().images[0]

# save image
image.save("ddpm_generated_image.png")
```

- [Unconditional Latent Diffusion](https://huggingface.co/CompVis/ldm-celebahq-256)
- [Unconditional Diffusion with continuous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024)

(a minimal loading sketch for these last two checkpoints appears after the Web Demos table below)

**Other Notebooks**:
* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
* [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)

### Web Demos

If you just want to play around with some web demos, you can try out the following 🚀 Spaces:

| Model                              | Hugging Face Spaces |
|------------------------------------|---------------------|
| Text-to-Image Latent Diffusion     | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/text2img-latent-diffusion) |
| Faces generator                    | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/celeba-latent-diffusion) |
| DDPM with different schedulers     | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/fusing/celeba-diffusion) |
| Conditional generation from sketch | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/huggingface/diffuse-the-rest) |
| Composable diffusion               | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Shuang59/Composable-Diffusion) |
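As referenced above, the two unconditional checkpoints without a dedicated snippet can be loaded with the generic `DiffusionPipeline` class, which picks the concrete pipeline implementation from the model repository. The following is a minimal sketch; the step counts and output filenames are arbitrary choices, and sampling at these resolutions is slow without a GPU.

```python
# !pip install diffusers
from diffusers import DiffusionPipeline

device = "cuda"

# Unconditional Latent Diffusion (CompVis/ldm-celebahq-256)
ldm = DiffusionPipeline.from_pretrained("CompVis/ldm-celebahq-256")
ldm = ldm.to(device)
image = ldm(num_inference_steps=200).images[0]
image.save("ldm_generated_image.png")

# Unconditional Diffusion with a continuous (score SDE-VE) scheduler (google/ncsnpp-ffhq-1024)
sde_ve = DiffusionPipeline.from_pretrained("google/ncsnpp-ffhq-1024")
sde_ve = sde_ve.to(device)
image = sde_ve(num_inference_steps=2000).images[0]
image.save("sde_ve_generated_image.png")
```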