Which one are you comparing against? I've tried hundreds of prompts between SD and DALL-E and get comparable results. Midjourney was lagging for a while, but the new --testp parameter is really remarkable, which, in my view, makes it superior not only to Stable Diffusion but to DALL-E as well.
An easy example of DALL-E superiority is its ability to combine two different concepts together.
For example, DALL-E performs extremely impressively on prompts in the format of “a still of Homer Simpson in The Godfather” (replace character and movie as you wish). With the other two it's a lot of misses.
With StableDiffusion I can buy a used RTX 3090 on eBay for $650, tell the model to generate 5,000 images, and then review each one until I find what it is I'm looking for.
Turns out a shitload of misses are acceptable when it only takes 4-7 seconds to generate an image from a prompt. 5000 generations on an RTX 3090 take around 7 hours +/- 30 minutes, by the way.
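Sanity-checking that throughput figure with the numbers above (the 4-7 s/image and the 5000-image count are the figures from this comment, not fresh measurements):

```python
# Back-of-envelope check: 5000 images at roughly 5 s each
seconds_per_image = 5.0      # mid-range of the quoted 4-7 s on an RTX 3090
n_images = 5000
hours = n_images * seconds_per_image / 3600
print(f"{hours:.1f} hours")  # ~6.9 hours, consistent with "7 hours +/- 30 minutes"
```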
What I've been doing is generating maybe 100 images, picking the best one, and then generating another 100 from that, using --init-image ("good" image file name) and --init-image-strength 0.2 (or so), either with the original prompt or a slightly tweaked one.
Those are the params I use in ImaginAIry, mileage may vary if you're using a different package.
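For the curious, that iterate-on-a-keeper loop looks roughly like this as ImaginAIry commands (the two --init-image flags are the ones from my workflow above; the batch-count flag is from memory, so check `imagine --help` on your version):

```shell
# Round 1: generate a batch of ~100 and eyeball them for a keeper
imagine "cozy cabin interior, oil painting" --repeats 100

# Round 2: refine the keeper, staying close to it (strength ~0.2)
imagine "cozy cabin interior, oil painting, warm light" \
  --init-image ./keeper.jpg --init-image-strength 0.2 --repeats 100
```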
It's a bit ironic to bring up a 7-hour RTX 3090 run as a cost saving, given that it's something like 3 kWh of electricity, which costs more than DALL-E's already outrageous prices.
While this is likely true for this specific prompt, I think that cherry-picking a single prompt that DALL-E outperforms SD on is not super indicative of anything. I've conversely found a large number of prompts where SD outperforms DALL-E, either in aesthetic quality or just following directions! I think you'd really have to compare both of them across a large number of prompts of different types to be sure.
To say nothing of the fact that you have lots of sliders to configure just how closely or loosely it follows your prompt. And a choice of sampling methods.
You can't just compare SD and DALL-E performance on prompts alone, because SD gives you a lot more levers to steer it in the direction you want.
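To make those levers concrete, here is a hypothetical bundle of the per-generation knobs most SD front-ends expose (the names are my own illustrative choices, not any particular package's API):

```python
# Illustrative SD generation knobs; names are assumptions, not a real API.
sd_knobs = {
    "guidance_scale": 7.5,       # how closely the image follows the prompt (CFG)
    "num_inference_steps": 50,   # more steps: slower, often more detail
    "sampler": "k_euler_a",      # the sampling method, another big lever
    "seed": 42,                  # fixing the seed makes prompt tweaks comparable
    "init_image_strength": 0.2,  # img2img only: how far to drift from the init image
}
for knob, value in sd_knobs.items():
    print(f"{knob}: {value}")
```

DALL-E, by contrast, exposes essentially none of these, so a prompt-for-prompt comparison hides most of what SD can do.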
> house interior, friendly, playful, video game, screenshot, mockup, birds-eye view, top down perspective, jrpg, 32 bit, pixel art, black background
SD absolutely demolishes DALL-E on this one. SD produces really nice-looking output, with a high degree of consistency. DALL-E produces incoherent nonsense.
>An easy example of DALL-E superiority is its ability to combine two different concepts together.
This is a con for some prompts. As an example, I asked for a painting of an elephant and a dog drinking tea together. The result was a dog with an elephant nose next to a teapot.
A similar misfire was the word 'porcupine', which drew pigs, I guess because 'porc' is in it? Anyway, its idea-blending is a little too aggressive.
Start your prompt with "group photo of" and then list the elephant and the dog. If you try this across many images, "group photo" results in about 2x as many keeping the subjects separate.
I'll have to let the AI experts speculate on why SD goes nuts there, because it definitely knows what "The Godfather (1972)" means: if you ask for e.g. 'A still of Patrick Stewart in "The Godfather (1972)"' you get one. (Which I believe DALL-E can't do because of their facial restrictions?)
I would argue that none of these follow the prompt. They all represent a Godfather frame in Simpsons style, which is not the same as placing Homer in a Godfather still.
My experience is that with prompts that fit into OpenAI's limiting content policy, DALL-E text2img results are usually much better. And I use SD like 95% of the time, so it's not that I'm simply more used to DALL-E.
Here I wanted an illustration of a nuclear plant in a Japanese landscape; the first attempt with DALL-E produced multiple good results. I tried SD and MJ (back when MJ didn't use SD) as well, and had trouble even with multiple attempts:
There are others, but anyway I think my examples are not important, since it will always be easy to cherry-pick prompts that yield the best results in model X.
In my experience SD is good at producing (especially non-photo-realistic) art that looks pretty and DALL-E is better at following a specific prompt when I know what exactly I want.
Of course I recognise your experience might (and probably does) differ.
> ...and most of them could be linked to the prompt they came from.
You made it sound as if there were almost no connection between the prompt and the images, and zimpenfish said that the majority could be linked, implying a strong connection. They don't have to be praising it at all to counter your claim.
Not hugely - e.g. taking the 38 prompts including "a painting by William Adolphe Bouguereau" (which is easily the worst of the modifiers for me), 10 of them I'd say gave "no clue to the prompt". For the 56 Munch images, 54 were good and 2 were quibbles ("an isopod as an angel" had no isopod but did have an angelic human - is that a pass or not?)
(Which is probably better than you'd get from a human given the exact same prompts.)
No, sorry, but there's a whole bunch of one-click things now, I think?
I'm running it on Windows 10 using (a modified version of) https://github.com/bfirsh/stable-diffusion.git and Anaconda to create the environment from their `environment.yaml` (all of which was done using the normal `cmd` shell). Then to use it, I activate that env from `cmd` and switch into cygwin `bash` to run the `txt2img.py` script (because it's easier to script, etc.)
[edit: probably helps that I already had a working VQGAN-CLIP setup which meant all the CUDA stuff was already there. For that I followed https://www.youtube.com/watch?v=XH7ZP0__FXs which covered the CUDA installation for VQGAN-CLIP.]
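For anyone wanting to replicate, the setup boils down to something like this (a sketch only; the env name is defined inside the repo's environment.yaml, and script flags may differ between forks, so check the repo's README):

```shell
git clone https://github.com/bfirsh/stable-diffusion.git
cd stable-diffusion
conda env create -f environment.yaml   # env is named in the yaml ("ldm" in the CompVis original)
conda activate ldm
python scripts/txt2img.py --prompt "a still of Patrick Stewart in The Godfather" --n_samples 1
```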