Diffusers 是一个值得首选用于生成图像、音频甚至 3D 分子结构的,最先进的预训练扩散模型库。无论您是在寻找简单的推理解决方案,还是想训练自己的扩散模型,Diffusers 这一模块化工具箱都能对其提供支持。本库的设计更偏重于可用而非高性能、简明而非简单以及易用而非抽象。
这个库包含三个主要组件:
With pip
pip install --upgrade diffusers
With conda
conda install -c conda-forge diffusers
Apple Silicon (M1/M2) support
Please, refer to the documentation.
In order to get started, we recommend taking a look at two notebooks:
diffusers
!Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the model card for more information.
You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the model card, read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to this section of the documentation.
We recommend using the model in half-precision (fp16
) as it gives almost always the same results as full
precision while being roughly twice as fast and requiring half the amount of GPU RAM.
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_type=torch.float16, revision="fp16")
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
Note: If you don't want to use the token, you can also simply download the model weights
(after having accepted the license) and pass
the path to the local folder to the StableDiffusionPipeline
.
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
Assuming the folder is stored locally under ./stable-diffusion-v1-4
, you can also run stable diffusion
without requiring an authentication token:
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
If you are limited by GPU memory, you might want to consider chunking the attention computation in addition
to using fp16
.
The following snippet should result in less than 4GB VRAM.
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_attention_slicing()
image = pipe(prompt).images[0]
If you wish to use a different scheduler, you can simply instantiate
it before the pipeline and pass it to from_pretrained
.
from diffusers import LMSDiscreteScheduler
lms = LMSDiscreteScheduler(
beta_start=0.00085,
beta_end=0.012,
beta_schedule="scaled_linear"
)
pipe = StableDiffusionPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
scheduler=lms,
)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
If you want to run Stable Diffusion on CPU or you want to have maximum precision on GPU, please run the model in the default full-precision setting:
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# disable the following line if you run on CPU
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
The StableDiffusionImg2ImgPipeline
lets you pass a text prompt and an initial image to condition the generation of new images.
import requests
import torch
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionImg2ImgPipeline
# load the pipeline
device = "cuda"
model_id_or_path = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
model_id_or_path,
revision="fp16",
torch_dtype=torch.float16,
)
# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
# and pass `model_id_or_path="./stable-diffusion-v1-4"`.
pipe = pipe.to(device)
# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))
prompt = "A fantasy landscape, trending on artstation"
images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("fantasy_landscape.png")
You can also run this example on colab
The StableDiffusionInpaintPipeline
lets you edit specific parts of an image by providing a mask and text prompt.
from io import BytesIO
import torch
import requests
import PIL
from diffusers import StableDiffusionInpaintPipeline
def download_image(url):
response = requests.get(url)
return PIL.Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
device = "cuda"
model_id_or_path = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
model_id_or_path,
revision="fp16",
torch_dtype=torch.float16,
)
# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
# and pass `model_id_or_path="./stable-diffusion-v1-4"`.
pipe = pipe.to(device)
prompt = "a cat sitting on a bench"
images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75).images
images[0].save("cat_on_bench.png")
You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. This notebook shows how to do it step by step. You can also run it in Google Colab .
For more details, check out the Stable Diffusion notebook and have a look into the release notes.
There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using DiffusionPipeline
s and Google Colab) and interactive web-tools.
If you want to run the code yourself 💻, you can try out:
Text-to-Image Latent Diffusion
# !pip install diffusers transformers
from diffusers import DiffusionPipeline
device = "cuda"
model_id = "CompVis/ldm-text2im-large-256"
# load model and scheduler
ldm = DiffusionPipeline.from_pretrained(model_id)
ldm = ldm.to(device)
# run pipeline in inference (sample random noise and denoise)
prompt = "A painting of a squirrel eating a burger"
image = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images[0]
# save image
image.save("squirrel.png")
Unconditional Diffusion with discrete scheduler
# !pip install diffusers
from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline
model_id = "google/ddpm-celebahq-256"
device = "cuda"
# load model and scheduler
ddpm = DDPMPipeline.from_pretrained(model_id) # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference
ddpm.to(device)
# run pipeline in inference (sample random noise and denoise)
image = ddpm().images[0]
# save image
image.save("ddpm_generated_image.png")
Other Notebooks:
If you just want to play around with some web demos, you can try out the following 🚀 Spaces: | Model | Hugging Face Spaces | |-------------------------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Text-to-Image Latent Diffusion | | | Faces generator | | | DDPM with different schedulers | | | Conditional generation from sketch | | | Composable diffusion | |