A 6-Layer Framework for Writing AI Image Prompts
A repeatable structure for any image type — posters, photoreal, UI mockups, edits.
Why structure beats keywords
Most prompt advice gives you keywords or example outputs. Neither helps when you sit down to write your own prompt for your own scene.
What helps is structure — a checklist you can run through to make sure your prompt isn't missing anything obvious. This framework isn't an industry standard, just a practical synthesis of what tends to be present in prompts that produce good results.
The 6 layers
[Subject + specifics] + [Action / Pose] + [Environment + cultural anchor] + [Composition: shot, angle, aspect ratio] + [Lighting: quality + direction + temperature] + [Style / Medium]
Prompts that fall flat often skip three or four of these layers.
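As a minimal sketch, the formula can be treated as six named fields that are checked for gaps and then joined in order. The `LayeredPrompt` class, its field names, and its methods are illustrative assumptions, not part of any image model's API.

```python
from dataclasses import dataclass, fields


@dataclass
class LayeredPrompt:
    """Hypothetical container for the six layers, in framework order."""
    subject: str = ""
    action: str = ""
    environment: str = ""
    composition: str = ""
    lighting: str = ""
    style: str = ""

    def missing_layers(self) -> list[str]:
        """Return the names of layers that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty layers into one prompt, one sentence per layer."""
        parts = [getattr(self, f.name).strip().rstrip(".") for f in fields(self)]
        parts = [p for p in parts if p]
        return ". ".join(parts) + "." if parts else ""


thin = LayeredPrompt(subject="a hiker on a mountain")
print(thin.missing_layers())
# ['action', 'environment', 'composition', 'lighting', 'style']
```

Running `missing_layers()` before sending a prompt is one way to use the framework as a checklist rather than a template.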
Layer 1: Subject + specifics
Not "a man." Not "a coffee shop." "A man" is too vague: the model picks the most generic interpretation. Add age, build, clothing material, hair, expression, posture.
"A coffee shop" is too vague. Add era, city, time of day, occupancy, signage.
The rule: if a stranger could draw 50 different pictures from your description, your subject layer is too thin.
Layer 2: Action / pose
What is the subject doing? Standing isn't enough. Standing how? Hands where? Looking where?
This layer matters most for human subjects and scene compositions. For object photography it can collapse to "centered, slight tilt to the right."
Be specific about gaze direction: "looking off-frame toward camera-left" reads completely differently from "looking directly at the camera."
Layer 3: Environment + cultural anchor
Where, when, what's around. This is where cultural anchors do most of the heavy lifting:
- "a coffee shop in the 1990s grunge era"
- "a busy office during the 1999 dot-com boom"
- "a noodle bar in 1985 Tokyo"
Each anchor gives the model a thousand visual associations for free. Use them.
Layer 4: Composition
Shot type, angle, aspect ratio. The vocabulary you need:
Shot type: extreme close-up, close-up, medium close-up, medium shot, medium wide, wide, extreme wide, aerial, top-down flat-lay.
Angle: eye-level, low-angle, high-angle, dutch angle, bird's-eye, worm's-eye.
Aspect ratio: always state it. 16:9, 9:16, 4:5, 1:1, 3:2, 2.39:1.
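One way to keep this layer honest is to validate it against the vocabulary above before it goes into a prompt. The sketch below is a hypothetical helper: the word lists are copied from this section, while the function name, the error messages, and the output phrasing are assumptions.

```python
import re

# Vocabulary copied from the lists above.
SHOT_TYPES = {
    "extreme close-up", "close-up", "medium close-up", "medium shot",
    "medium wide", "wide", "extreme wide", "aerial", "top-down flat-lay",
}
ANGLES = {"eye-level", "low-angle", "high-angle", "dutch angle", "bird's-eye", "worm's-eye"}
# Matches ratios such as 16:9 or 2.39:1.
RATIO = re.compile(r"\d+(?:\.\d+)?:\d+(?:\.\d+)?")


def composition_layer(shot: str, angle: str, aspect_ratio: str) -> str:
    """Build the composition phrase, rejecting anything outside the vocabulary."""
    if shot not in SHOT_TYPES:
        raise ValueError(f"unknown shot type: {shot!r}")
    if angle not in ANGLES:
        raise ValueError(f"unknown angle: {angle!r}")
    if not RATIO.fullmatch(aspect_ratio):
        raise ValueError(f"aspect ratio should look like 16:9, got {aspect_ratio!r}")
    shot_phrase = shot if shot.endswith("shot") else f"{shot} shot"
    text = f"{angle} {shot_phrase}, {aspect_ratio} aspect"
    return text[0].upper() + text[1:]


print(composition_layer("wide", "low-angle", "3:2"))
# Low-angle wide shot, 3:2 aspect
```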
Layer 5: Lighting
Often the single highest-impact lever in a prompt. Three variables:
Quality: soft / harsh / diffused / dappled / specular.
Direction: front / side / back / rim / from camera-left / from above / from below.
Temperature: warm tungsten / cool daylight / golden hour / blue hour / overcast / mixed.
Always name at least one explicit light source. "Soft window light from camera-left" is gold. "Good lighting" is useless.
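The three variables plus a named source can be combined mechanically. This is a rough sketch with a hypothetical function name and argument order; the only rule it enforces is the one stated above, that a light source must be named.

```python
def lighting_layer(quality: str, source: str, direction: str, temperature: str) -> str:
    """Combine quality, a named light source, direction, and temperature."""
    if not source.strip():
        raise ValueError("name at least one explicit light source, e.g. 'window light'")
    text = f"{quality} {source} {direction}, {temperature}".strip()
    return text[0].upper() + text[1:]


print(lighting_layer("soft", "window light", "from camera-left", "warm tungsten"))
# Soft window light from camera-left, warm tungsten
```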
Layer 6: Style / medium
This is where you tell the model whether you want photoreal, illustration, or something stylized. Use art disciplines, not artist names:
- Photoreal: name a film stock — "Kodak Portra 400, fine grain"
- Editorial: "magazine editorial style, slightly desaturated"
- Illustration: "watercolor, visible paper texture, soft color bleeds"
- Vector: "flat vector illustration, 2-color palette"
- Concept art: "matte painting, painterly atmosphere"
The framework in action
Watch the same idea evolve through the layers:
Layer 0 (typical thin prompt): "a hiker on a mountain"
With Subject: "Solo hiker in faded red windbreaker"
Add Action: "...walking away from camera on a sandstone trail"
Add Environment: "...in a desert canyon at golden hour"
Add Composition: "Low-angle wide shot at 24mm, 3:2 aspect"
Add Lighting: "Long shadows raking across the rock, warm rim light from camera-right"
Add Style: "Documentary travel photography, Kodak Portra 400 grain"
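The build-up above can be reproduced as a running join over the layer fragments. The `LAYERS` list reuses the exact fragments from the steps above; the `stack` function and its joining rules (first three layers as one sentence, the rest as separate sentences) are assumptions made for this sketch.

```python
LAYERS = [
    ("Subject", "Solo hiker in faded red windbreaker"),
    ("Action", "walking away from camera on a sandstone trail"),
    ("Environment", "in a desert canyon at golden hour"),
    ("Composition", "Low-angle wide shot at 24mm, 3:2 aspect"),
    ("Lighting", "Long shadows raking across the rock, warm rim light from camera-right"),
    ("Style", "Documentary travel photography, Kodak Portra 400 grain"),
]


def stack(n: int) -> str:
    """Render the prompt using only the first n layers (1 to 6)."""
    # Subject, action, and environment read as a single scene sentence.
    scene = " ".join(text for _, text in LAYERS[:min(n, 3)])
    # Composition, lighting, and style each become their own sentence.
    extras = [text for _, text in LAYERS[3:n]]
    return ". ".join([scene, *extras]) + "."


for n in range(1, len(LAYERS) + 1):
    print(f"{n} layer(s): {stack(n)}")
```

The hand-written version that follows adds a few details beyond the six fragments and reorders the sentences, which is a reasonable final polish once every layer is present.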
Final stacked prompt:
Solo hiker in faded red windbreaker walking away from camera on a sandstone trail in a desert canyon at golden hour. Long shadows raking across the rock, distant rock formations visible in middle distance. Low-angle wide shot at 24mm, f/8, 3:2 aspect. Warm rim light from camera-right, dust catching in the light. Documentary travel photography, Kodak Portra 400 grain.
Where the framework breaks (and what to use instead)
Three categories need a different structure:
Image edits: Use CHANGE / PRESERVE / MATCH instead. The 6-layer formula doesn't apply because you're not generating from scratch.
Typography-heavy posters: Lead with text (in quotes, with placement), then add the visual layer using layers 3-6. Subject and action collapse.
Abstract / experimental work: Photographic vocabulary actively hurts abstract output. Use medium + mood + palette + composition instead, with art-movement anchors instead of camera specs.
For everything else — photoreal, cinematic, character art, interiors, food, fashion, architecture — the 6 layers work as a useful checklist.
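For the image-edit case, a CHANGE / PRESERVE / MATCH prompt might be assembled as three labeled sections. This is only a sketch: the three headings come from the text above, but the function name, the one-line-per-section layout, and the reading of MATCH as "qualities of the original image the edit should blend into" are assumptions.

```python
def edit_prompt(change: list[str], preserve: list[str], match: list[str]) -> str:
    """Build a hypothetical CHANGE / PRESERVE / MATCH edit prompt."""
    sections = [("CHANGE", change), ("PRESERVE", preserve), ("MATCH", match)]
    lines = []
    for label, items in sections:
        if not items:
            raise ValueError(f"{label} section is empty")
        lines.append(f"{label}: " + "; ".join(items))
    return "\n".join(lines)


print(edit_prompt(
    change=["replace the red windbreaker with a yellow rain jacket"],
    preserve=["the hiker's pose", "the canyon background"],
    match=["the existing golden-hour lighting", "the film grain"],
))
```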
Use the framework or use Depikt
You can apply this manually on every prompt. Or you can paste your rough idea into Depikt and get a structured 6-layer prompt back in seconds. Same framework, automated.