11 months ago · 5d2c725d8a
--- a/README.md
+++ b/README.md
@@ -1,29 +1,10 @@
-<p align="center">
-    <br>
-    <img src="https://github.com/huggingface/diffusers/raw/main/docs/source/imgs/diffusers_library.jpg" width="400"/>
-    <br>
-<p>
-<p align="center">
-    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
-        <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue">
-    </a>
-    <a href="https://github.com/huggingface/diffusers/releases">
-        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
-    </a>
-    <a href="CODE_OF_CONDUCT.md">
-        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
-    </a>
-</p>
-
-🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
-as a modular toolbox for inference and training of diffusion models.
-
-More precisely, 🤗 Diffusers offers:
-
-- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). Check [this overview](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/README.md#pipelines-summary) to see all supported pipelines and their corresponding official papers.
-- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
-- Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
-- Training examples to show how to train the most popular diffusion model tasks (see [examples](https://github.com/huggingface/diffusers/tree/main/examples), *e.g.* [unconditional-image-generation](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation)).
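+**Diffusers** is the go-to library of state-of-the-art pretrained diffusion models for **generating images, audio, and even 3D structures of molecules**. Whether you are looking for a simple inference solution or want to train your own diffusion models, **Diffusers** is a modular toolbox that supports both. The library is designed with a focus on **usability over performance**, **simple over easy**, and **ease of use over abstraction**.
+
+The library has three main components:
+
+1.  **State-of-the-art diffusion pipelines**: run inference with just a few lines of code.
+2.  **Interchangeable noise schedulers**: balance generation speed against quality.
+3.  **Pretrained models**: can be used as building blocks and combined with schedulers to create your own end-to-end diffusion systems.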
 
 ## Installation
 
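The three components described in the added README lines (pipelines, interchangeable noise schedulers, and models) fit together roughly as follows. This is a minimal pure-Python sketch, not the 🤗 Diffusers API: `ToyDDIMScheduler`, `make_oracle_model`, and `toy_pipeline` are hypothetical names, the scheduler assumes a DDIM-style deterministic update (eta = 0) over a linear beta schedule, and an "oracle" that already knows the target sample stands in for a trained UNet.

```python
import math
import random


def linear_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


class ToyDDIMScheduler:
    """Hypothetical scheduler: deterministic DDIM update (eta = 0) on a strided timestep subset."""

    def __init__(self, alpha_bars, num_inference_steps):
        if not 1 <= num_inference_steps <= len(alpha_bars):
            raise ValueError("num_inference_steps must be between 1 and the number of training steps")
        self.alpha_bars = alpha_bars
        stride = len(alpha_bars) // num_inference_steps
        # Noisiest timestep first, e.g. [980, 960, ..., 0] for 50 of 1000 training steps.
        self.timesteps = [i * stride for i in range(num_inference_steps)][::-1]

    def step(self, eps, t, t_prev, x_t):
        ab_t = self.alpha_bars[t]
        ab_prev = self.alpha_bars[t_prev] if t_prev >= 0 else 1.0  # alpha_bar = 1 means "clean"
        # Estimate the clean sample from the predicted noise, then re-noise it to level t_prev.
        x0_pred = [(x - math.sqrt(1 - ab_t) * e) / math.sqrt(ab_t) for x, e in zip(x_t, eps)]
        return [math.sqrt(ab_prev) * x0 + math.sqrt(1 - ab_prev) * e for x0, e in zip(x0_pred, eps)]


def make_oracle_model(target, alpha_bars):
    """Hypothetical stand-in for a trained UNet.

    Returns the noise eps for which x_t = sqrt(alpha_bar_t) * target + sqrt(1 - alpha_bar_t) * eps.
    """
    def model(x_t, t):
        ab = alpha_bars[t]
        return [(x - math.sqrt(ab) * c) / math.sqrt(1 - ab) for x, c in zip(x_t, target)]
    return model


def toy_pipeline(model, scheduler, size, seed=0):
    """Start from Gaussian noise, then alternate model predictions and scheduler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    timesteps = scheduler.timesteps
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else -1
        eps = model(x, t)
        x = scheduler.step(eps, t, t_prev, x)
    return x


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    target = [0.5, -0.25, 1.0]
    model = make_oracle_model(target, alpha_bars)
    for steps in (1000, 50, 10):  # same model and loop, different scheduler settings
        sample = toy_pipeline(model, ToyDDIMScheduler(alpha_bars, steps), size=len(target))
        print(steps, [round(v, 6) for v in sample])
```

Only the scheduler object changes between runs; the model and the loop stay the same. With a real trained model, fewer inference steps trade quality for speed. The oracle recovers the target at any step count by construction, so this sketch only checks that the loop is wired correctly.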
@@ -43,17 +24,7 @@ conda install -c conda-forge diffusers
 
 Please, refer to [the documentation](https://huggingface.co/docs/diffusers/optimization/mps).
 
-## Contributing
-
-We ❤️  contributions from the open-source community! 
-If you want to contribute to this library, please check out our [Contribution guide](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md).
-You can look out for [issues](https://github.com/huggingface/diffusers/issues) you'd like to tackle to contribute to the library.
-- See [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) for general opportunities to contribute
-- See [New model/pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) to contribute exciting new diffusion models / diffusion pipelines
-- See [New scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)
 
-Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or
-just hang out ☕.
 
 ## Quickstart
 
@@ -63,7 +34,7 @@ In order to get started, we recommend taking a look at two notebooks:
   Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library.
 - The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffusion models training methods. This notebook takes a step-by-step approach to training your
   diffusion models on an image dataset, with explanatory graphics. 
-  
+
 ## **New** Stable Diffusion is now fully compatible with `diffusers`!  
 
 Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
@@ -255,7 +226,7 @@ You can generate your own latents to reproduce results, or tweak your prompt on
 
 For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb)
 and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0).
-  
+
 ## Examples
 
 There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using `DiffusionPipeline`s and Google Colab) and interactive web-tools.
@@ -317,80 +288,5 @@ If you just want to play around with some web demos, you can try out the followi
 | Conditional generation from sketch  	| [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/huggingface/diffuse-the-rest)           	|
 | Composable diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Shuang59/Composable-Diffusion)           	|
 
-## Definitions
-
-**Models**: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to *denoise* a noisy input to an image.
-*Examples*: UNet, Conditioned UNet, 3D UNet, Transformer UNet
-
-<p align="center">
-    <img src="https://user-images.githubusercontent.com/10695622/174349667-04e9e485-793b-429a-affe-096e8199ad5b.png" width="800"/>
-    <br>
-    <em> Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
-<p>
-    
-**Schedulers**: Algorithm class for both **inference** and **training**.
-The class provides functionality to compute previous image according to alpha, beta schedule as well as predict noise for training.
-*Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902)
-
-<p align="center">
-    <img src="https://user-images.githubusercontent.com/10695622/174349706-53d58acc-a4d1-4cda-b3e8-432d9dc7ad38.png" width="800"/>
-    <br>
-    <em> Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
-<p>
-    
-
-**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models, possible text encoders, ...
-*Examples*: Glide, Latent-Diffusion, Imagen, DALL-E 2
-
-<p align="center">
-    <img src="https://user-images.githubusercontent.com/10695622/174348898-481bd7c2-5457-4830-89bc-f0907756f64c.jpeg" width="550"/>
-    <br>
-    <em> Figure from ImageGen (https://imagen.research.google/). </em>
-<p>
-    
-## Philosophy
-
-- Readability and clarity is preferred over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
-- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
-- Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of another library, such as text-encoders. Examples for diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).
-
-## In the works
-
-For the first release, 🤗 Diffusers focuses on text-to-image diffusion techniques. However, diffusers can be used for much more than that! Over the upcoming releases, we'll be focusing on:
-
-- Diffusers for audio
-- Diffusers for reinforcement learning (initial work happening in https://github.com/huggingface/diffusers/pull/105).
-- Diffusers for video generation
-- Diffusers for molecule generation (initial work happening in https://github.com/huggingface/diffusers/pull/54)
-
-A few pipeline components are already being worked on, namely:
-
-- BDDMPipeline for spectrogram-to-sound vocoding
-- GLIDEPipeline to support OpenAI's GLIDE model
-- Grad-TTS for text to audio generation / conditional audio generation
-
-We want diffusers to be a toolbox useful for diffusers models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a [GitHub issue](https://github.com/huggingface/diffusers/issues) mentioning what you would like to see.
-
-## Credits
-
-This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:
-
-- @CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion)
-- @hojonathanho original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion) as well as the extremely useful translation into PyTorch by @pesser, available [here](https://github.com/pesser/pytorch_diffusion)
-- @ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim).
-- @yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch)
-
-We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models) as well as @crowsonkb and @rromb for useful discussions and insights.
 
-## Citation
 
-```bibtex
-@misc{von-platen-etal-2022-diffusers,
-  author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
-  title = {Diffusers: State-of-the-art diffusion models},
-  year = {2022},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/huggingface/diffusers}}
-}
-```
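The removed Definitions section splits the work between a model, which parameterizes $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$, and a scheduler, which uses the alpha/beta schedule both to noise training inputs and to compute the previous sample at inference. In the DDPM paper it cites, the network predicts the added noise $\epsilon$, and the reverse mean is derived from that prediction. Below is a minimal pure-Python sketch of both sides, assuming that paper's linear schedule ($\beta_1 = 10^{-4}$ to $\beta_T = 0.02$, $T = 1000$, 0-indexed in the code) and its $\sigma_t^2 = \beta_t$ variance choice. The function names are hypothetical and do not mirror the 🤗 Diffusers `DDPMScheduler` API.

```python
import math
import random


def ddpm_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule with alpha_t = 1 - beta_t and alpha_bar_t = prod_{s<=t} alpha_s."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1) for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def q_sample(x0, t, eps, alpha_bars):
    """Forward (noising) process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]


def training_pair(x0, alpha_bars, rng):
    """Draw a random timestep and Gaussian noise; the model learns to predict eps from (x_t, t)."""
    t = rng.randrange(len(alpha_bars))
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return q_sample(x0, t, eps, alpha_bars), t, eps


def p_sample(x_t, t, eps_pred, betas, alphas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1} from a noise prediction, with sigma_t^2 = beta_t."""
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean  # the final step adds no noise
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

At the last step (`t == 0`) no noise is added, so feeding `p_sample` the exact noise that `q_sample` used recovers `x0` up to rounding; a trained network only approximates that noise, which is why sampling runs through many small steps.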