Here are my impressions of OpenAI's latest iteration of DALL·E, an AI system that generates images from text. I've generated images in different styles and variations of my drawings, experimented with public pages, mask edits, uploads, and more.
Hi, everyone. It's Nono here, and this is an overview of DALL·E 2, OpenAI's latest text-to-image generation system, which I've been lucky to get early access to in the past few weeks. I've been playing around with it a bit, and today I'm going to give you some hints on what this system can do, what the web UI looks like, and what you might be able to do with these tools regarding creativity, design, craft, or even making images and image compositions from your own images or from text prompts.
This is the user interface you find once your beta request is approved: you get an email through which you can sign up. I put in my email and password, and then I get to this interface.
The interface is pretty simple.
You just have a text box, like Google search, but instead of Search, the button on the right says Generate. You also have a Surprise Me button that suggests prompts you can use directly or get inspired by to build your own.
You can also upload an image to edit or to generate variations, but in a nutshell, with a prompt like this, these models let you generate images from text.
That means that I can write a sentence. Here we have this example that says "An Impressionist oil painting of sunflowers in a purple vase." In the current interface, that generates six different images, and from those we can generate more variations, do edits, or tweak the prompt a bit to get different options.
Here I was writing something that came to mind. It seems like the exercise here is coming up with the best prompt. In this process you just say, okay, I want images of "a mirrorless camera with E-mount lenses in a dark room, digital art."
"Digital art" and other such modifiers are image styles that DALL·E knows about, styles that were captioned in the data set, and they ensure the image has a certain aesthetic look.
For example, if you just put "photo" or "bokeh photo," you get a photo with the contents of what you've prompted, if the model knows how to generate those.
If you say something like "candy minimalism," "digital art," or "3d animation," the aesthetics of the generated image will change a lot.
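As a toy illustration of how these style modifiers compose, here is a small helper that appends a style to a subject. The function and the way I group the styles are my own, not part of DALL·E; the system itself just receives the final string:

```python
# Hypothetical helper (not part of DALL·E): compose a prompt from a
# subject and an aesthetic style modifier. DALL·E only ever sees the
# resulting string.

def styled_prompt(subject: str, style: str) -> str:
    """Append a style modifier to a base prompt."""
    return f"{subject}, {style}"

# Styles mentioned in this video.
styles = ["digital art", "3d animation", "candy minimalism", "bokeh photo"]
prompts = [styled_prompt("A frog walking with a balloon", s) for s in styles]
for p in prompts:
    print(p)  # e.g. "A frog walking with a balloon, digital art"
```

Swapping only the style part is how I got such different frogs later in this video.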
So let's go ahead and see some of the results that prompt got me from DALL·E 2. Those are mirrorless cameras. I was trying to generate cameras that look like the Sony Alpha mirrorless cameras with E-mount, and these looked pretty good. This one specifically has the colored ring that identifies the Sony Alpha 7 series cameras.
On the right, for some of them, I'm showing a QR code that you can scan with your phone to open a page that DALL·E generates, on which you can see the image I shared.
Next I generated "A frog walking with a balloon, digital art." It produced these things that look more like illustrations, maybe done with a tablet, or with oil or watercolors.
It wasn't quite what I was trying to look for, but they look really good.
And then I put "A frog walking with a balloon, 3d animation." Note how, from the prompt that says "digital art" to this one, I only changed that part to "3d animation," but the style of the generated image changed completely.
Then I selected the top right image, because it's the appearance of the frog as a cartoon or animation that I liked the most, as an image to work upon. So the reference image is the one on the top left. I selected the balloon and changed that part of the sentence: instead of "A frog walking with a balloon, 3d animation," I wrote "A frog walking with a blue furry balloon, 3d animation." So I added "blue furry balloon" and selected the balloon. Some of the results actually look really good. I was looking for something like the image on the bottom middle: everything stays the same, except for the part I selected, which follows the context I've written in the prompt.
This is what you see through that QR code. I'm making an image publicly available, and anyone can access it through the link in that QR code. You can see that the original reference image is the one with the frog and a red balloon.
If you don't make that selection for a local edit and just change the prompt, losing the context of the aesthetics and of the frog on the bottom, this is what it generates instead.
See how different the results are if I prompt from the image on the top left, with a selection so that only the balloon changes, using the sentence "A frog walking with a blue furry balloon," versus prompting from scratch with that same sentence. The frogs are completely different, and they have some blue tones as well.
So in this demo, I actually wrote the prompt and generated those images. Then, when I click Edit, I select that balloon and add the "blue furry" part to the prompt. I click Generate, wait for a bit, and this is what we get. Then I selected the image I liked, clicked Share, and published it. That gives me a public link I can share with you. I shared it through that QR code that was on the page, and then we get to the publicly available page I mentioned before.
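At the time of recording, this mask-edit workflow only existed in the web UI, but the same idea can be sketched in code, assuming an HTTP endpoint similar to OpenAI's later images-edit API. The file names and the helper function here are illustrative, and the actual network call is left out:

```python
# Sketch of the mask-edit ("local edit") workflow, assuming an API in the
# style of OpenAI's later images/edits endpoint. The transparent region of
# the mask marks the area the model is allowed to regenerate; everything
# else is kept from the original image.

def build_edit_request(image_path: str, mask_path: str,
                       prompt: str, n: int = 6) -> dict:
    """Prepare the parameters for a masked-edit request."""
    return {
        "image": image_path,   # original frog image (hypothetical file)
        "mask": mask_path,     # transparent where the balloon was selected
        "prompt": prompt,      # full prompt, including the edited part
        "n": n,                # the web UI returns six candidates
        "size": "1024x1024",
    }

request = build_edit_request(
    "frog.png",
    "balloon_mask.png",
    "A frog walking with a blue furry balloon, 3d animation",
)
# An actual call would upload the files and POST `request` to the API,
# e.g. response = client.images.edit(**request)  # hypothetical client
print(request["prompt"])
```

The key design point is that the prompt still describes the whole scene, not just the masked region, so the model keeps the surrounding context consistent.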
So this is a really cool process in which, just by typing words and making local edits, I can generate new images that didn't exist before. There's some knowledge that the data set and the model have about these images, but we're creating in a completely different way that wasn't possible before.
The images that you generate are super cool. This is "A spaceship controlled by a cute dog in sunglasses, digital art." You can like them more or less, but you have to agree with me that this is surprising: you can get an illustration of what you were thinking, in really high quality and with really appealing graphics, in a matter of seconds.
This is probably a more interesting experiment. This is where you select the part of the form that says "upload your own image." I uploaded the top left image, which is my sketch of a backpack.
And I asked DALL·E 2 to generate different variations of that image for me. From that input, it was able to understand that it was a sketch, that it had tones of red, black, and brown, and that it depicted a backpack with things inside. Which I think is a bit mind-blowing, because a human portrayed something, not even with a camera, just with pen and watercolor. I scanned it, and then I get these alterations of the same concept: it's a backpack, it's somewhat open, it has contents inside, and it has straps you can hold it with.
And here's another experiment I did. I also uploaded a sketch of my own and I did a few rounds of manipulations, both with changing the prompt and local edits.
So I uploaded the top left image and generated different alterations of it. You can see slightly different sketching styles, really similar overall, with the shading done in different ways. We also get the concept of some scribbles outside of the drawing on some of them. It's a box, a broken box, and it also has labels or tags embedded on it, and some shading. So, again, super impressive results.
This is a second-iteration manipulation. I go ahead and paint some blobs on that top left sketch of a box, and I say that I want a box with some flowers and daisies. That's the prompt that I use. After I do that, I get this one among the six results, I select it, and I tell it to give me more variations of it.
So we get these variations from this sketch: variations of a sketch that I uploaded and manipulated with DALL·E. And then, from that resulting image, I'm generating more variations. This one, for instance, is the one I selected afterwards, and I particularly like it, because we start seeing not only the sunflower but also some green color on the leaves and the stem.
What I have here is not DALL·E. This is the architecture of Imagen by Google, and this is the architecture of Parti, another Google model that I believe was released really recently.
How Google Imagen works is really similar to what DALL·E does: you have a prompt, a text, and a frozen text encoder model. You input that text into the encoder, which gives you a text embedding, and then you feed that into a text-to-image diffusion model.
We go from text to embeddings to an image that is 64 by 64 pixels, in the case of Google Imagen. Then we have two super-resolution diffusion model steps that basically add more detail and texture, bringing the resolution up to 1024 by 1024, which is also the resolution that DALL·E lets us download.
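The cascade described above can be sketched with stand-in stages, just to show how the resolutions flow. These are not the real models: the "diffusion" step here is random noise and the "super-resolution" steps are plain nearest-neighbor upsampling, but the shapes match the 64 → 256 → 1024 pipeline:

```python
import numpy as np

# Stand-in sketch of Imagen's cascade (not the real models): a base
# text-to-image diffusion model outputs a 64x64 image, then two
# super-resolution diffusion steps upsample 64 -> 256 -> 1024.

def fake_diffusion(embedding: np.ndarray, size: int) -> np.ndarray:
    """Pretend base model: text embedding -> size x size RGB image."""
    rng = np.random.default_rng(0)
    return rng.random((size, size, 3))

def fake_super_resolution(image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Pretend super-resolution step: nearest-neighbor upsample by `factor`."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

embedding = np.zeros(512)             # stand-in frozen text-encoder output
base = fake_diffusion(embedding, 64)  # (64, 64, 3)
sr1 = fake_super_resolution(base)     # (256, 256, 3)
sr2 = fake_super_resolution(sr1)      # (1024, 1024, 3)
print(base.shape, sr1.shape, sr2.shape)
```

In the real system, each super-resolution step is itself a diffusion model conditioned on the text embedding; only the resolution bookkeeping is shown here.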
And that was a quick overview on some of the experiments that I've done, what DALLE is, and what you can do with it.
I just wanted to let you know that this is really promising. It's a breakthrough in creativity with AI; these are human-and-AI creations. One of the main reasons why they haven't released it to the public yet is that they're concerned about harmful uses, like pornography and other uses that might be harmful to society or might create bias, harassment, and other things that are not fun.
I invite you to apply for the beta and to let me know your thoughts in the chat and in the video comments. Make sure to like the video if this is the type of content you'd like to see more of and you want to support the videos I'm doing here on the channel, and subscribe if you want to get notified when I go live next or upload new videos.
Thanks a lot for watching and I'll see you on the next video.