Text-to-image prompting

Thoryn

Latter Liaison
Hello all, I’m hoping people can share their tips and tricks for text-to-image prompting, as I’m hitting a few hurdles that seem too silly to be a problem, yet they are.
My most basic issue at the moment is that I’ve been trying to make some Christmas artwork. The composition in my head is a full view of the tree, with packages and a pony at the bottom of it. Sounds easy and very doable.
The biggest issue I’m having is that no matter what I prompt, it always crops the tree and only shows the very bottom of it. In the rare instance it gives me a seed with a full tree, the tree is pot-plant sized.
I have tried putting “Christmas tree” at the start of the prompt, as well as “majestic Christmas tree”, “full view of Christmas tree”, “big Christmas tree” etc., and putting things like “zoomed in”, “cropped” etc. in the negative prompt. I’ve also tried multiple aspect ratios (1:1, 16:9 and 9:16).
Is there something obvious with my prompting that I’m doing incorrectly? Do I really need to get a LoRA for such a common thing?
Model: ponyDiffusionV6XL_v6StartWithThisOne
Thoryn

Latter Liaison
I remember trying those two features a year or two ago and not getting them to work. Guess I’ll just have to try again.
Another question, though: when prompting, do you guys use natural language describing what you envision, or comma,separated,keywords,like,this?
A mixture? Varies by model?
And to “save on tokens”, do you use more concise but less common words, or spend more tokens on more common words in case the model doesn’t understand the rarer ones?
It does seem rather temperamental. I generally find it helps to mention things that the view needs to be bigger to accommodate. I had some success with this prompt on pony v6:
score_8, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up,
source_pony, rating_safe,
cute Pegasus Derpy sleeping under a tree, Christmas, presents, ornaments, star on top of tree, window, ceiling,
god rays,
“star on top of tree” and “ceiling” helped clue it in that it should be a further away shot.
(Generally, the format I use for pony v6 models is score tags, source, rating, a general description, then tags.)
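If you ever want to assemble prompts in that order programmatically, here is a minimal Python sketch. The function name and arguments are hypothetical; the score, source and rating tag names are the ones used in this thread, and the example call uses the Derpy description and tags from the prompt above (with the score_9 set used later in the thread).

```python
# Hypothetical helper: assembles a Pony v6 prompt in the order
# score tags -> source -> rating -> general description -> tags.
SCORE_TAGS = ("score_9", "score_8_up", "score_7_up", "score_6_up", "score_5_up", "score_4_up")
SOURCES = {"source_pony", "source_furry", "source_anime", "source_cartoon"}
RATINGS = {"rating_safe", "rating_questionable", "rating_explicit"}


def build_pony_prompt(description: str, tags: list[str],
                      source: str = "source_pony", rating: str = "rating_safe",
                      scores: tuple[str, ...] = SCORE_TAGS) -> str:
    """Return a three-line prompt: scores, then source + rating, then description + tags."""
    if source not in SOURCES:
        raise ValueError(f"unknown source tag: {source}")
    if rating not in RATINGS:
        raise ValueError(f"unknown rating tag: {rating}")
    return "\n".join([
        ", ".join(scores) + ",",
        f"{source}, {rating},",
        ", ".join([description, *tags]) + ",",
    ])


if __name__ == "__main__":
    print(build_pony_prompt(
        "cute Pegasus Derpy sleeping under a tree",
        ["Christmas", "presents", "ornaments", "star on top of tree", "window", "ceiling"],
    ))
```

Keeping the sections separate makes it easy to swap the rating or source without touching the description.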
Thoryn

Latter Liaison
@Lord Waite
Still not seeing any success, no matter how much I emphasize the look or size of the room.
Is it possible to attach images here? Can only see URLs to enter, but I have nowhere to host them.
Thoryn

Latter Liaison
@Lord Waite
Thanks for the tip.
I basically only get things like this, where the prompt acts like a toddler and doesn’t listen at all.

parameters
score_9,score_8_up,score_7_up,score_6_up,score_5_up,score_4_up,
Panoramic view of a spacious room with high ceiling,large tainted glass windows,Christmas decorations on walls,Flurry Heart_(Mlp),lying under Christmas tree,presents,<lora:Flurry Heart-Mlp-PonyXL:0.7>,pony,filly,cute,
Negative prompt: anthro,closeup,wip,sketch,blurry,disfigured,bad_hands,badly_drawn,bad_anatomy,watercolor,e621_p_low,thicc,thick,wide_hips,chubby,poofy,hyper,watermark,missing_tail,pillow,bed,couch,sofa,mattress,smiling,standing,sitting,scared,afraid,zoomed_in,cropped,signature,
Steps: 23, Sampler: DPM++ 2S a, Schedule type: Karras, CFG scale: 7, Seed: 582442736, Size: 512x512, Model hash: 67ab2fd8ec, Model: ponyDiffusionV6XL_v6StartWithThisOne, VAE hash: 95f26a5ab0, VAE: sdxl_vae.safetensors, Lora hashes: “Flurry Heart-Mlp-PonyXL: e75f8a2d04d3”, Version: v1.10.1
No problem. And I can definitely see a couple potential issues.
First, 512x512 is not going to get good results with pony v6. It’s an XL based model, so generally speaking, we’re talking 1024x1024. Other good resolutions are 1152 x 896, 896 x 1152, 1216 x 832, 832 x 1216, 1344 x 768, 768 x 1344, 1536 x 640, & 640 x 1536. That’s basically what XL was trained on.
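For reference, here is that resolution list as a small Python table, printing each entry’s aspect ratio and pixel count compared to 512x512, plus a hypothetical helper that picks the listed size closest to a target aspect ratio. Every entry is a multiple of 64 and within about 7% of 1024x1024 in pixel count, and 1024x1024 is exactly 4x the pixels of 512x512.

```python
# Resolutions listed above; all are multiples of 64 and close to 1024 x 1024 pixels.
XL_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def closest_xl_resolution(aspect: float) -> tuple[int, int]:
    """Hypothetical helper: the listed (width, height) whose ratio is nearest to `aspect`."""
    return min(XL_RESOLUTIONS, key=lambda wh: abs(wh[0] / wh[1] - aspect))


if __name__ == "__main__":
    base = 512 * 512
    for w, h in XL_RESOLUTIONS:
        print(f"{w}x{h}: aspect {w / h:.2f}, {w * h:,} px, {w * h / base:.2f}x the pixels of 512x512")
    print("16:9 ->", closest_xl_resolution(16 / 9))
    print("9:16 ->", closest_xl_resolution(9 / 16))
```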
That negative prompt has way too much in it. “e621_p_low” isn’t actually a tag v6 knows. That was for older versions of pony, and was a precursor to the score tags. Usually, I’d say to start out with a very minimal negative prompt, and add things as needed, and also remove them if they aren’t working. Negative prompts that aren’t needed can actually make an image worse.
You do usually want to have a source tag and a rating tag after the scores. source_pony, source_furry, source_anime, and source_cartoon are the big source tags, then rating_safe, rating_questionable, and rating_explicit.
Not sure the lora is needed, either. You could probably remove it and just say “Flurry Heart”. Also, the point of “star on top of tree” was to try and get it to put the star at the top of the tree in the picture, and by extension, the rest of the tree.
I’d personally try something more like:
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up,
source_pony, rating_safe,
Flurry Heart lying under Christmas tree, ceiling, window, Christmas decorations on walls, presents, pony, filly, cute, star on top of tree,
with no negative prompt and no lora, and go from there. Definitely keep in mind that the longer a prompt is, the less weight anything in it carries. The tokenizer can handle 75 tokens (well, 77, but the other two are the start and end tokens); after that, the prompt gets broken into 75-token chunks, and that’s the point where individual tokens start meaning less.
(Looks like you’re using something A1111-based, so there might be a token count at the top of the prompt entry box?)
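To make the chunking concrete, here is a rough Python sketch. It assumes a crude stand-in tokenizer (each word and each comma counts as one token); CLIP’s real BPE tokenizer often splits a word into several tokens, so real counts run higher, and A1111’s actual splitting has extra rules (as far as I know it tries to break chunks at a comma). The `<start>`/`<end>`/`<pad>` strings are placeholders, not real token names. The point being illustrated is only the 75 + 2 = 77 layout per chunk.

```python
import re

CHUNK_SIZE = 75                                  # usable tokens per chunk
BOS, EOS, PAD = "<start>", "<end>", "<pad>"      # placeholders for CLIP's special tokens


def rough_tokens(prompt: str) -> list[str]:
    """Crude stand-in for CLIP's BPE tokenizer: each word and each comma counts as one token."""
    return re.findall(r"[^\s,]+|,", prompt)


def chunk_tokens(tokens: list[str], size: int = CHUNK_SIZE) -> list[list[str]]:
    """Split tokens into `size`-token chunks, pad the last one, and wrap each in
    start/end tokens, so every chunk has size + 2 (= 77) entries."""
    chunks = []
    for start in range(0, max(len(tokens), 1), size):  # an empty prompt still yields one chunk
        body = tokens[start:start + size]
        body += [PAD] * (size - len(body))
        chunks.append([BOS, *body, EOS])
    return chunks


if __name__ == "__main__":
    prompt = ("score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, "
              "source_pony, rating_safe, Flurry Heart lying under Christmas tree, ceiling, window")
    toks = rough_tokens(prompt)
    chunks = chunk_tokens(toks)
    print(f"{len(toks)} rough tokens -> {len(chunks)} chunk(s) of {len(chunks[0])} entries")
```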
Thoryn

Latter Liaison
@Lord Waite
Thanks for the tips. I’ve used 512x512 because I have basically the bare minimum of capable hardware (at that res, with 75-token prompt chunks, it usually takes me almost 5 minutes for ~23-25 steps, and 10 minutes for 32-35), and you’re saying I need to quadruple the res, oof..
(I promised myself not to throw more money at expensive GPUs as I am broke and have stopped gaming, but a 5090 starts to look more appealing the more I mess around with AI).
You’re correct that I’m using Automatic1111 by the way.
(I have pondered alternatives, as the cmd window spews errors left and right even on a fresh, up-to-date install, but it’s the devil I know right now.)
I will avoid LoRAs, clean out the negative prompts (and add things only as needed), up the res to 1024x1024 and do some testing.
Thanks again for all the input, really appreciate it.
@Thoryn
No problem.
512x512 was fine for 1.5-based models; XL and newer have just moved beyond that. Pony v5 was 2.1-based, so 768x768, and v4 and earlier were 1.5-based, IIRC?
What I’ve got is a 3060 12GB, btw, which is probably about as low as you can go and still have 12GB. Though an 8GB card would work as well…
The lora does also add to the amount of memory used. While I don’t think the Flurry Heart one is needed, I will note that my first post was using the Wholesome MLP lora, which is a rather nice art style lora.
You could try changing the sampler to Uni_pc and the schedule to normal, lowering the steps to, say, 12-14, and see if that speeds things up a bit for you.
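If you ever drive A1111 from a script rather than the browser, the same settings can be written as a request body for its web API (enabled with the `--api` launch flag, endpoint `/sdapi/v1/txt2img`). This is a sketch under assumptions: the field names reflect my understanding of A1111 1.9+ (in particular the separate `scheduler` field for the schedule type), and the sampler and schedule names have to match exactly what your install lists in its dropdowns.

```python
import json
import urllib.request


def txt2img_payload(prompt: str, negative_prompt: str = "", *, steps: int = 14,
                    sampler: str = "UniPC", scheduler: str = "Normal", cfg_scale: float = 7.0,
                    width: int = 1024, height: int = 1024, seed: int = -1) -> dict:
    """Build a request body for A1111's /sdapi/v1/txt2img (field names assumed from A1111 1.9+)."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "sampler_name": sampler,   # must match a sampler name your install lists
        "scheduler": scheduler,    # the "Schedule type" dropdown
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "seed": seed,              # -1 = random seed
    }


def send_txt2img(payload: dict, url: str = "http://127.0.0.1:7860/sdapi/v1/txt2img") -> dict:
    """POST the payload to a running A1111 started with --api; the reply holds base64 images."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = txt2img_payload("score_9, score_8_up, score_7_up, source_pony, rating_safe, "
                           "Flurry Heart lying under Christmas tree, star on top of tree")
    print(json.dumps(body, indent=2))  # call send_txt2img(body) only when A1111 is running
```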
tyto4tme4l

Something of an artist
@Thoryn
If you have a weak GPU, how about trying Forge WebUI? It looks almost exactly like A1111, but it should be much faster, especially on a weak GPU. I don’t know about newer versions, but I’m using a release from 02.2024 and it’s working great. I have a GeForce 3060 Ti with 8GB VRAM, and I can generate four 1024x1024 pictures in slightly over a minute, with pretty much no OOMs, errors or crashes.
https://github.com/lllyasviel/stable-diffusion-webui-forge/releases
There are also other UIs like ReForge or ComfyUI; I’d recommend testing different options to see what suits you best. Stability Matrix is great for installing and maintaining multiple UIs.
https://github.com/LykosAI/StabilityMatrix
Thoryn

Latter Liaison
@Lord Waite
For the record, I’m using RTX 3070 TI with 8GB of video memory.
Good point on the fact that LoRAs also use some memory.. best to avoid if possible.
I don’t have a sampler named Uni_pc in my setup, so I kept DPM++ 2S a paired with Karras. When I tested all the samplers (same prompt, seed etc.), that was one of the fastest. (Euler a was slightly faster, and it’s what I used when experimenting with Automatic a couple of times before, but lately I’ve seen DPM++ 2S a more often in the wild, so I figured I’d give it a try.)
I copied your prompt exactly, and it actually gave a decent composition this time!
At 1024x1024 and 15 steps, it took 13 minutes… maybe that’s passable for getting the composition, and then I can use img2img to flesh things out? I should start experimenting with the pipeline soon: loading only one or two LoRAs at a time (especially for kinks and concepts I know the model can’t do well or at all), and figuring out grid editing/prompting, inpainting etc.
score_9,score_8_up,score_7_up,score_6_up,score_5_up,score_4_up,
source_pony,rating_safe,
Flurry Heart lying under Christmas tree,ceiling,window,Christmas decorations on walls,presents,pony,filly,cute,star on top of tree,
Steps: 15, Sampler: DPM++ 2S a, Schedule type: Karras, CFG scale: 6, Seed: 40318147, Size: 1024x1024, Model hash: 67ab2fd8ec, Model: ponyDiffusionV6XL_v6StartWithThisOne, VAE hash: 95f26a5ab0, VAE: sdxl_vae.safetensors, Version: v1.10.1
@tyto4tme4l
Yeah, IIRC, one of the points behind Forge was that it reworked things on A1111 to use code from ComfyUI for some of the backend, because ComfyUI is faster and better with memory.
I’ve got it, but I got used to ComfyUI, and while it might have a steep learning curve, it’s a lot more flexible for things once you know it. (Though, alright, inpainting is still going to be easier on a different UI.)
Thoryn

Latter Liaison
@tyto4tme4l
Stability Matrix looked very promising, sadly it just freezes (and on one of the four attempts, leaked memory) for me.
With Automatic1111, I get random, really long generation times every now and then. I can generate things “quickly” (i.e. 15 minutes) for hours, then suddenly it takes two hours to even reach 20% completion. It even happens when only the seed changes. It’s as if the model gets stuck, and there’s no step-bro to help it out.
Anyone else seeing this on Automatic? Seen it happen with others?
@Thoryn
Even before I moved almost entirely to ComfyUI, I’d switched to SD Next and Forge instead of A1111, so I can’t say. I’ve been feeling like A1111 is getting out of date, really.
Could be it’s switching over to the CPU for some reason, because generating on the CPU does take forever. Maybe the GPU was running out of memory…
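One way to test the out-of-memory theory is to check VRAM usage while one of the slow runs is happening. A minimal Python sketch, assuming an NVIDIA card with `nvidia-smi` on the PATH; the 95% “nearly full” threshold is an arbitrary choice, not a known cutoff.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"]


def parse_vram(csv_text: str) -> list[tuple[int, int]]:
    """Parse nvidia-smi CSV lines like '7012, 8192' into (used_MiB, total_MiB), one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def near_limit(used: int, total: int, threshold: float = 0.95) -> bool:
    """True when usage is at or above `threshold` of total VRAM (threshold is arbitrary)."""
    return used >= threshold * total


if __name__ == "__main__":
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query nvidia-smi: {err}")
    else:
        for i, (used, total) in enumerate(parse_vram(out)):
            note = "  <- nearly full" if near_limit(used, total) else ""
            print(f"GPU {i}: {used}/{total} MiB{note}")
```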
Thoryn

Latter Liaison
@Lord Waite
I’ve had it run out of GPU memory before, during larger batches and when upscaling to too high a resolution. In those cases it either stopped partway through the batch or failed at the start of the upscale, and showed an out-of-memory error in the status area. But it’s still certainly a possibility.
Right now, I’m unsure whether I should pursue Stability Matrix further (it looked really promising). In the meantime, I’ll look at things like Forge and ComfyUI directly.
The main thing with ComfyUI is that it involves building workflows by connecting nodes together. It takes a bit to get used to, and you end up learning a little more about how Stable Diffusion actually works. Also, the UI will look out of date in a fair number of videos and such, because they overhauled it recently (and have been making things easier).
If you want an idea of what you are in for on that front, the first screenshot in my first post in this thread was a ComfyUI workflow.
Oh, and my main reason for suggesting Uni_pc as a sampler was just that one of the big things it’s good at is giving good images in fewer steps. It’s entirely possible A1111 doesn’t have it, as it wasn’t around when A1111 was originally made.
If you do go the ComfyUI route, here’s their website and GitHub:
https://www.comfy.org/
https://github.com/comfyanonymous/ComfyUI
If you scroll down on the second one, there’s a link to a portable version for windows.
My bad if this has already been mentioned, but I sometimes use a chatbot to start prompting: I downloaded an LLM locally and use it to make basic prompts.
I’ve become more aware of when I use hyperlink tags like PDXL3, because the LLM will only use “score_8 score_9” for quality tags, and the results are pretty damn good. In the end I usually rewrite the entire prompt, but it’s perfect for rough drafts.
There are interesting things you can do with that, actually.
First, if you use Ollama for your local LLMs, you can create a Modelfile to make a customized version of a particular model. The key thing here is that there’s a system prompt section, where you can tell it that its purpose is to create prompts, describe exactly the format you want them in (including score tags), tell it that it can be uncensored and NSFW, and give a few example prompts.
Definitely takes some playing with, and you might end up tweaking your system prompt to it a few times.
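As a starting point, here is a Python sketch that prints such a Modelfile. The base model name (`llama3.2`), the temperature, the model name `pony-prompter` and the system prompt wording are all assumptions to adapt; `FROM`, `PARAMETER` and `SYSTEM` are standard Modelfile instructions.

```python
SYSTEM_PROMPT = """You write prompts for the Pony Diffusion V6 XL image model.
Reply with the prompt only, in this order: score tags, source tag, rating tag,
a short general description, then comma-separated tags.
Example:
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up,
source_pony, rating_safe,
Flurry Heart lying under Christmas tree, ceiling, window, presents, star on top of tree,"""


def make_modelfile(base_model: str, system_prompt: str, temperature: float = 0.8) -> str:
    """Return Modelfile text: base model, a sampling parameter, and the system prompt."""
    if '"""' in system_prompt:
        raise ValueError('system prompt must not contain triple quotes (""")')
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


if __name__ == "__main__":
    # Save the output as a file named "Modelfile", then in a terminal:
    #   ollama create pony-prompter -f Modelfile
    #   ollama run pony-prompter "a pony decorating a Christmas tree"
    print(make_modelfile("llama3.2", SYSTEM_PROMPT))
```

Iterating on the system prompt then just means editing the string, regenerating the file and re-running `ollama create`.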
Another useful thing: if you are using Open WebUI to chat with it, you can go into the settings and give it a ComfyUI workflow and the URL of your ComfyUI instance. Then you can click an icon below any of the chatbot’s responses to send it to ComfyUI, have it generate a picture, and put that picture in the chat.
Haven’t actually used that that much, but I was trying it out a bit ago.
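Under the hood, that kind of integration boils down to putting the text prompt into a workflow and queueing it on ComfyUI’s HTTP API. The sketch below is not Open WebUI’s code, just an illustration of the same idea, assuming a workflow saved in ComfyUI’s API format, ComfyUI running on its default port 8188, and a hypothetical node id `6` for the positive CLIPTextEncode node (check the id in your own export).

```python
import copy
import json
import urllib.request


def set_positive_prompt(workflow: dict, node_id: str, text: str) -> dict:
    """Return a copy of an API-format workflow with one CLIPTextEncode node's text replaced."""
    wf = copy.deepcopy(workflow)
    node = wf[node_id]
    if node.get("class_type") != "CLIPTextEncode":
        raise ValueError(f"node {node_id} is {node.get('class_type')}, not CLIPTextEncode")
    node["inputs"]["text"] = text
    return wf


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """Queue the workflow on a running ComfyUI instance via its /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Tiny stand-in for an exported API-format workflow; real exports contain many more nodes.
    workflow = {"6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}}}
    updated = set_positive_prompt(
        workflow, "6", "score_9, source_pony, rating_safe, Flurry Heart under a Christmas tree")
    print(updated["6"]["inputs"]["text"])
    # queue_prompt(updated) would send it to a running ComfyUI instance.
```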
Thoryn

Latter Liaison
Spent hours doing a classroom scene, without even getting the room to where I was satisfied with it, let alone the pony and the pose and interaction within the room.
The tip I received earlier about describing the room in greater detail (floor, ceiling, walls, wall decor, windows..) has proven to be invaluable advice, but it’s often not enough, at least not the way I’ve done it.
Then, after some (OK, many..) hours of fiddling with that, I decided to try another idea I had, which would focus mainly on a basic flank pose and the tail. For a change of scenery, I plopped down a single line describing a bedroom just to have something other than a classroom in the image, and it created lots of beautiful bedroom images. (Getting the tail the way I want it seems to be more difficult in this instance than the rather basic pose I tried in the classroom scene, though.)
Long story short, prompting difficulty can vary greatly depending on the location you prompt for. Bedrooms, at first glance at least, seem to be on the easier side. My main issues with them, though, are that the bed is often malformed and its size is out of proportion to the character.
Thoryn

Latter Liaison
@Sunny
Yes, I tried 3 or 4 classroom LoRAs and played around with adjusting LoRA weights and prompts with them.
Typical problems I had with them were that it insisted on the pony being at the very front/focus of the image and taking up half of it, very stretchy pony bodies, the pony being the wrong size relative to the room, messed-up wall corners and such, pony bodies morphing into the desks, and getting the desks to look coherent (size, legs, and desks morphing into each other).
It can probably be fixed with extensive inpainting or something, but I should be able to get decent results with just prompting correctly, even without using LoRAs. (Most of my time on this has been spent without them, and I’ve gotten some half-decent results.)
Thinking I should probably look into ControlNet, but am uncertain if it will detect and adjust equine poses correctly.
MareStare

Mare Zealot
Getting the composition the way you want may be too much work for txt2img alone. I recommend trying inpainting with a colored scribble (coloring is important to make the AI get the colors right).
For example, this is how I got Fluttershy implanted into the scenery of this image:
Yeah, you can tell my drawing skills aren’t that good, but Zoinksnoob nailed Flutty almost immediately after I pasted the scribble there and inpainted that area with a denoising strength of something like 0.7+. Sometimes it takes several iterations: draw a scribble, let inpainting improve the detail, then refine that more detailed version with a lighter scribble, until things are exactly as you want.
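For a feel of what the denoising strength number does: as I understand it, in A1111-style UIs (Forge included) img2img and inpainting by default start from a partly noised copy of your image and only run roughly strength × steps of the sampling schedule. So 0.7+ leaves plenty of room to rework a scribble, while lower values mostly refine what is already there. A small Python sketch of that arithmetic; the exact behavior can differ with settings (A1111 has an option to run the full slider step count instead), so treat it as an approximation.

```python
def img2img_steps(steps: int, denoising_strength: float) -> int:
    """Approximate steps actually sampled in img2img/inpainting, assuming the default
    behaviour where only the last `denoising_strength` fraction of the schedule is run."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return int(denoising_strength * steps)


if __name__ == "__main__":
    for strength in (0.3, 0.5, 0.7, 0.9):
        print(f"strength {strength}: about {img2img_steps(30, strength)} of 30 steps re-sampled")
```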
Thoryn

Latter Liaison
@MareStare
Really cool to see WIP steps like this and have them explained.
And your sketching abilities are way ahead of mine. :p
What program do you use for SD?
Do you handle all the painting in it, or do it elsewhere and move it over to SD?
MareStare

Mare Zealot
@Thoryn
I used Photoshop for drawing the scribbles, and then moved the images to Forge UI for inpainting. I’m planning to describe some of my learnings and creative process on a shared guide website. I’ll post about it on the Tantabus Discord and create a forum thread on Tantabus when it’s more or less ready. I’d like to collect all the tips and tricks and organize them in a convenient format for beginners to study.