Framework·April 25, 2026·7 min

A 6-Layer Framework for Writing AI Image Prompts

A repeatable structure for any image type — posters, photoreal, UI mockups, edits.

Why structure beats keywords

Most prompt advice gives you keywords or example outputs. Neither helps when you sit down to write your own prompt for your own scene.

What helps is structure — a checklist you can run through to make sure your prompt isn't missing anything obvious. This framework isn't an industry standard, just a practical synthesis of what tends to be present in prompts that produce good results.

The 6 layers

[Subject + specifics] + [Action / Pose] + [Environment + cultural anchor] + [Composition: shot, angle, aspect ratio] + [Lighting: quality + direction + temperature] + [Style / Medium]

In practice, prompts that fail tend to skip three or four of these layers.
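If you like to script your prompts, the six-layer formula maps naturally onto a small data structure. The following is a minimal sketch, not part of any tool: the class name LayeredPrompt, its field names, and its two methods are all hypothetical, chosen here just to mirror the layers above. It reports which layers are still empty and joins the filled ones in framework order.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class LayeredPrompt:
    """Hypothetical container for the six layers; names are illustrative only."""

    subject: str = ""
    action: str = ""
    environment: str = ""
    composition: str = ""
    lighting: str = ""
    style: str = ""

    def missing_layers(self) -> list[str]:
        """Return the names of layers that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty layers, in framework order, into one prompt string."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


# A typical thin prompt only fills two of the six layers.
thin = LayeredPrompt(subject="a hiker", environment="on a mountain")
print(thin.missing_layers())
# ['action', 'composition', 'lighting', 'style']
```

Running the check before you generate is a cheap way to notice a skipped layer; whether an empty layer is actually a problem still depends on the image type.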

Layer 1: Subject + specifics

Not "a man." Not "a coffee shop." Both are too vague, so the model picks the most generic interpretation.

For a person, add: age, build, clothing material, hair, expression, posture.

For a place, add: era, city, time of day, occupancy, signage.

The rule: if a stranger could draw 50 different pictures from your description, your subject layer is too thin.

Layer 2: Action / pose

What is the subject doing? Standing isn't enough. Standing how? Hands where? Looking where?

This layer matters most for human subjects and scene compositions. For object photography it can collapse to "centered, slight tilt to the right."

Be specific about gaze direction: "looking off-frame to camera-left" reads completely differently from "looking directly at the camera."

Layer 3: Environment + cultural anchor

Where, when, and what's around. This is where cultural anchors do the heavy lifting:

  • "a coffee shop in the 1990s grunge era"
  • "a busy office during the 1999 dot-com boom"
  • "a noodle bar in 1985 Tokyo"

Each anchor gives the model a thousand visual associations for free. Use them.

Layer 4: Composition

Shot type, angle, aspect ratio. The vocabulary you need:

Shot type: extreme close-up, close-up, medium close-up, medium shot, medium wide, wide, extreme wide, aerial, top-down flat-lay.

Angle: eye-level, low-angle, high-angle, dutch angle, bird's-eye, worm's-eye.

Aspect ratio: always state it. 16:9, 9:16, 4:5, 1:1, 3:2, 2.39:1.
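The composition vocabulary is small enough to keep as a lookup. Here is a rough sketch that builds a composition phrase and rejects terms outside the lists above. The function name composition_phrase and the exact output wording are assumptions made for illustration; image models do not require this phrasing.

```python
# Vocabulary copied from the lists above.
SHOT_TYPES = {
    "extreme close-up", "close-up", "medium close-up", "medium shot",
    "medium wide", "wide", "extreme wide", "aerial", "top-down flat-lay",
}
ANGLES = {
    "eye-level", "low-angle", "high-angle", "dutch angle",
    "bird's-eye", "worm's-eye",
}


def composition_phrase(shot: str, angle: str, aspect_ratio: str) -> str:
    """Build a composition phrase; raise ValueError on unknown vocabulary."""
    if shot not in SHOT_TYPES:
        raise ValueError(f"unknown shot type: {shot!r}")
    if angle not in ANGLES:
        raise ValueError(f"unknown angle: {angle!r}")

    # Accept any "W:H" ratio with positive numbers, e.g. "3:2" or "2.39:1".
    width, _, height = aspect_ratio.partition(":")
    if float(width) <= 0 or float(height) <= 0:
        raise ValueError(f"invalid aspect ratio: {aspect_ratio!r}")

    # Avoid "medium shot shot" for entries that already end in "shot".
    shot_label = shot if shot.endswith("shot") else f"{shot} shot"
    return f"{angle} {shot_label}, {aspect_ratio} aspect ratio"


print(composition_phrase("wide", "low-angle", "3:2"))
# low-angle wide shot, 3:2 aspect ratio
```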

Layer 5: Lighting

Lighting is often the single highest-impact lever in a prompt. It has three variables:

Quality: soft / harsh / diffused / dappled / specular.

Direction: front / side / back / rim / from camera-left / from above / from below.

Temperature: warm tungsten / cool daylight / golden hour / blue hour / overcast / mixed.

Always name at least one explicit light source. "Soft window light from camera-left" is gold. "Good lighting" is useless.
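As a small illustration of that rule, the helper below refuses to build a lighting phrase unless a light source is named. It is only a sketch: lighting_phrase and its parameters are hypothetical, and the direction is passed without the word "from" (for example "camera-left" or "above").

```python
def lighting_phrase(quality: str, source: str, direction: str, temperature: str = "") -> str:
    """Combine quality, a named source, direction, and optional temperature."""
    if not source.strip():
        # "Good lighting" style descriptions fail here: no explicit source.
        raise ValueError("name an explicit light source, e.g. 'window light'")

    phrase = f"{quality} {source} from {direction}"
    if temperature:
        phrase += f", {temperature}"
    return phrase


print(lighting_phrase("soft", "window light", "camera-left"))
# soft window light from camera-left
print(lighting_phrase("harsh", "streetlamp light", "above", "warm tungsten"))
# harsh streetlamp light from above, warm tungsten
```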

Layer 6: Style / medium

This is where you tell the model whether you want photoreal, illustration, or something stylized. Use art disciplines, not artist names:

  • Photoreal: name a film stock — "Kodak Portra 400, fine grain"
  • Editorial: "magazine editorial style, slightly desaturated"
  • Illustration: "watercolor, visible paper texture, soft color bleeds"
  • Vector: "flat vector illustration, 2-color palette"
  • Concept art: "matte painting, painterly atmosphere"

The framework in action

Watch the same idea evolve through the layers:

Layer 0 (typical thin prompt): "a hiker on a mountain"

With Subject: "Solo hiker in faded red windbreaker"

Add Action: "...walking away from camera on a sandstone trail"

Add Environment: "...in a desert canyon at golden hour"

Add Composition: "Low-angle wide shot at 24mm, 3:2 aspect"

Add Lighting: "Long shadows raking across the rock, warm rim light from camera-right"

Add Style: "Documentary travel photography, Kodak Portra 400 grain"

Final stacked prompt, with a few extra details (background rock formations, an f/8 aperture, dust in the light) added during assembly:

Solo hiker in faded red windbreaker walking away from camera on a sandstone trail in a desert canyon at golden hour. Long shadows raking across the rock, rock formations visible in the middle distance. Low-angle wide shot at 24mm, f/8, 3:2 aspect. Warm rim light from camera-right, dust catching in the light. Documentary travel photography, Kodak Portra 400 grain.
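The stacking step itself is mechanical, which a few lines of code can show. This sketch joins the six layer fragments from the walkthrough in framework order; the stack function and its joining rule (first three layers as one sentence, the rest as separate sentences) are assumptions for illustration. Its output is the plain stack, without the hand-added extras in the final prompt above.

```python
# Layer fragments taken from the walkthrough, in framework order.
layers = [
    ("Subject", "Solo hiker in faded red windbreaker"),
    ("Action", "walking away from camera on a sandstone trail"),
    ("Environment", "in a desert canyon at golden hour"),
    ("Composition", "Low-angle wide shot at 24mm, 3:2 aspect"),
    ("Lighting", "Long shadows raking across the rock, warm rim light from camera-right"),
    ("Style", "Documentary travel photography, Kodak Portra 400 grain"),
]


def stack(layers: list) -> str:
    """Join subject, action, and environment into one sentence, then append the rest."""
    scene = " ".join(text for _, text in layers[:3])
    rest = [text for _, text in layers[3:]]
    return ". ".join([scene, *rest]) + "."


print(stack(layers))
```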

Where the framework breaks (and what to use instead)

Three categories need a different structure:

Image edits: Use CHANGE / PRESERVE / MATCH instead. The 6-layer formula doesn't apply because you're not generating from scratch.

Typography-heavy posters: Lead with text (in quotes, with placement), then add the visual layer using layers 3-6. Subject and action collapse.

Abstract / experimental work: Photographic vocabulary actively hurts abstract output. Use medium + mood + palette + composition instead, with art-movement anchors instead of camera specs.

For everything else — photoreal, cinematic, character art, interiors, food, fashion, architecture — the 6 layers work as a useful checklist.
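For the image-edit case, the CHANGE / PRESERVE / MATCH structure can be templated the same way. The article only names the three labels, so the reading used here is an assumption: CHANGE is the one thing to alter, PRESERVE lists what must stay untouched, and MATCH lists qualities the edited region should stay consistent with. The edit_prompt function is hypothetical.

```python
def edit_prompt(change: str, preserve: list, match: list) -> str:
    """Format an edit request as three labeled lines (assumed interpretation)."""
    return "\n".join([
        f"CHANGE: {change}",
        "PRESERVE: " + "; ".join(preserve),
        "MATCH: " + "; ".join(match),
    ])


print(edit_prompt(
    change="replace the red windbreaker with a yellow rain jacket",
    preserve=["the hiker's pose", "the canyon background"],
    match=["the warm rim light from camera-right", "the existing film grain"],
))
# CHANGE: replace the red windbreaker with a yellow rain jacket
# PRESERVE: the hiker's pose; the canyon background
# MATCH: the warm rim light from camera-right; the existing film grain
```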

Use the framework or use Depikt

You can apply this manually on every prompt. Or you can paste your rough idea into Depikt and get a structured 6-layer prompt back in seconds. Same framework, automated.
