Nono.MA

JULY 23, 2022

Two months ago, HuggingFace open-sourced Diffusers, "state-of-the-art diffusion models for image and audio generation in PyTorch," at github.com/huggingface/diffusers.

"Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves as a modular toolbox for inference and training of diffusion models."

Here's a text-to-image example from the repository's README.

# !pip install diffusers transformers
from diffusers import DiffusionPipeline

model_id = "CompVis/ldm-text2im-large-256"

# load model and scheduler
ldm = DiffusionPipeline.from_pretrained(model_id)

# run pipeline in inference (sample random noise and denoise)
prompt = "A painting of a squirrel eating a burger"
images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"]

# save images
for idx, image in enumerate(images):
    image.save(f"squirrel-{idx}.png")

Latent diffusion is the model architecture behind the CompVis checkpoint used above; diffusion models of this family also power Google's Imagen and OpenAI's DALL·E 2, both to generate images from text and to increase the resolution of the output images.
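As a rough illustration of the idea (not the library's internals), a diffusion model's forward process gradually adds Gaussian noise to data according to a variance schedule, and generation learns to reverse that process step by step. Here's a minimal NumPy sketch of the forward (noising) step; the schedule values and function names are illustrative, not those used by any specific pipeline:

```python
import numpy as np

def noisy_sample(x0, t, betas, rng):
    """Toy forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]  # cumulative product up to step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)   # illustrative linear schedule, 50 steps
x0 = rng.standard_normal((4, 4))      # stand-in for an image (or its latent)
x_noisy = noisy_sample(x0, t=49, betas=betas, rng=rng)
```

At the final step the sample is close to pure noise; a trained model runs this in reverse, denoising random noise into an image. Latent diffusion's twist is doing this in a compressed latent space rather than on pixels, which is what makes it fast enough to run on consumer hardware.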
