Visuals: Create Cover Art & Images

Visuals Cover Art & Images lets you generate artwork from scratch, from a song, or with reference images (like artist photos) that guide style and subject.

What you can create

This creation flow is used for:

Single / album cover art
Promo artwork (release announcements, teasers, social graphics)
Artist imagery (portraits, concepts, themed photos)
General image generation (thumbnails, posters, artwork, infographics)

If you want to turn an image into motion, use Spotify Canvas & Loop Videos (image-to-motion) after you generate your cover.

The part most people get wrong

Getting to the Create page is easy.

The hard part is getting an output that matches what’s in your head.

Most “bad results” come from one of these:

The prompt is too vague
The prompt asks for too many things at once
The reference images are unclear (or fighting each other)
You change 5 settings at once and can’t tell what helped

This article is focused on fixing that.

Step-by-step (quick)

1) Go to Create → Cover Art & Images
2) Choose your direction:

Generate from scratch (prompt-only)
Generate from a song (select a track for song-aware outputs)
Generate with reference images (style and/or subject guidance)

3) Add your inputs (prompt, song, reference images)
4) Pick your settings (style, mood, model)
5) Click Generate

Your results are saved in My Library.

Option A: Generate from scratch (prompt-only)

This is the fastest option and works great when you already know what you want.

Use this when you want to generate almost anything, like:

Cover art concepts (abstract or photoreal)
Characters or mascots
Avatar-style figures for content
YouTube thumbnails, artwork, or graphics
Visual concepts for a release (multiple directions quickly)

Best practice:

Write a clear prompt (see the Prompt Formula below)
Keep your first generation simple
Iterate by changing only 1–2 things at a time

Option B: Generate from a song (song-aware cover art)

Visuals supports music uploads and analyzes your track.

That analysis can guide the vibe of your cover art (tone, energy, pacing).

Use this when:

You want cover art that matches a specific track
You want the AI to follow the song’s vibe without guessing
You don’t have a strong visual direction yet

Tip:

Sometimes song-aware generations can go a little too literal.

If you already have a creative direction, write it clearly, then add a line like:

This is my rough visual vision. Also use the context from the song to guide the mood and style.

Keep the song selected, then iterate your prompt until it matches your direction.

Option C: Use reference images (how to use artist photos correctly)

Reference images are the fastest way to get consistent, usable results.

1) Decide what your reference image is for

Reference images usually do one of two jobs:

Subject reference (who or what should appear)
Style reference (the look and feel)

If you upload an artist photo and you want that person in the image, say it clearly.

Examples:

Create cover art using the person in my reference photo as the main subject. Maintain their identity and facial features.
Keep the same outfit as the reference photo.
Use the person in my reference photo as the subject, but change the outfit to a futuristic stage outfit.

Important:

Avoid using the full name of a publicly known artist/person in your prompt. If you need a specific person, use reference photos and describe the vibe instead.

2) Using multiple reference images

You can upload multiple references to combine:

A person (subject)
A style (lighting, colors, texture)
An object (car, instrument, symbol)

Best practice:

Keep references consistent (don’t mix 5 different styles)
If results get messy, remove references until it improves, then add back one at a time

3) Common mistakes with reference images

Uploading a low-quality selfie and expecting a magazine cover
Uploading multiple faces and not saying which one is the subject
Expecting the AI to recreate a person exactly without saying so in the prompt
Mixing references that don’t match (cartoon + studio portrait + anime)

The Prompt Formula (simple and repeatable)

A strong prompt usually has 4 parts:

Subject (what’s on screen)
Setting (where it is)
Style (how it should look)
Composition (how it’s framed)

Here’s a template you can copy. The Lighting/Color line is an optional extra on top of the 4 parts:

Subject: [who/what]
Setting: [where]
Style: [photoreal / illustration / collage / 3D / minimalist]
Composition: [close-up portrait / centered / wide / negative space]
Lighting/Color: [warm / cool / neon / film grain / high contrast]
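If you draft prompts outside the app, a small script can keep the structure identical every time. The sketch below is only an illustration: `PromptParts` and `build_prompt` are hypothetical names, not part of Visuals, and the result is plain text that you paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container for the fields of the prompt template."""
    subject: str
    setting: str
    style: str
    composition: str
    lighting_color: str = ""  # optional extra detail


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty parts into one plain-language prompt."""
    pieces = [
        parts.subject,
        parts.setting,
        parts.style,
        parts.composition,
        parts.lighting_color,
    ]
    # Trim whitespace and trailing periods, skip empty fields,
    # then end every piece with exactly one period.
    cleaned = [p.strip().rstrip(".") for p in pieces if p.strip()]
    return " ".join(f"{p}." for p in cleaned)


prompt = build_prompt(PromptParts(
    subject="A lone guitarist seen from behind",
    setting="On an empty desert highway at dusk",
    style="Photoreal with warm film grain",
    composition="Wide shot with negative space at the top",
))
print(prompt)
```

Because the fields always appear in the same order, two prompts built this way are easy to compare when you want to see what changed between generations.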

Prompt examples (good)

Artist photo cover (clean and usable)

Create an album cover using the person in my reference photo as the main subject. Maintain their identity and facial features. Use dramatic side lighting, a black background, and a high-contrast cinematic look. Frame it like a real album cover and leave negative space.

Minimalist single cover

Create a minimalist single cover with a black background and one red symbol centered. Make it clean, modern, and high contrast, with subtle texture and lots of negative space.

Retro collage cover

Create an album cover in a vintage collage style. Use torn paper textures, warm film grain, and a dreamy mood. Keep the subject centered and make the composition feel intentional, not messy.

Prompt examples (too vague)

Make a cool cover art
Make it cinematic
Make it modern

If you’re using genre/mood/style buttons and it’s not working, write exactly what you want in plain language.

Getting consistent results (the “pro” workflow)

If you want a consistent brand across a release or catalog:

Pick 1–3 strong reference images
Reuse the same prompt structure
Keep the same style choices
Change 1–2 things at a time (not everything at once)

This is how you stop wasting generations.
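One way to hold yourself to the 1–2 changes rule is to note each attempt and compare it with the previous one. This is a minimal sketch under assumptions: the dictionary keys and the helper names are made up for illustration, and nothing here talks to Visuals.

```python
def changed_keys(previous: dict, current: dict) -> list[str]:
    """Return the sorted names of inputs that differ between two attempts."""
    keys = set(previous) | set(current)
    return sorted(k for k in keys if previous.get(k) != current.get(k))


def is_small_step(previous: dict, current: dict, limit: int = 2) -> bool:
    """True when at most `limit` inputs changed since the last attempt."""
    return len(changed_keys(previous, current)) <= limit


# Hypothetical notes about two generations (keys are assumptions).
attempt_1 = {
    "prompt": "Minimalist single cover, one red symbol centered",
    "style": "minimalist",
    "mood": "calm",
    "model": "Nano Banana",
}
attempt_2 = dict(attempt_1, style="collage", mood="dreamy")

print(changed_keys(attempt_1, attempt_2))   # ['mood', 'style']
print(is_small_step(attempt_1, attempt_2))  # True
```

If `is_small_step` returns False, you changed too much to tell which change helped.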

Important: review your prompt before you generate

There’s a common quirk to watch for.

If you select settings (genre, mood, style) and edit your prompt, in either order, the settings can sometimes rewrite or override parts of your prompt.

Before you click Generate, quickly re-read the prompt to make sure it still says what you intend.
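If you keep your prompts in a text file, you can make that re-read mechanical by listing the phrases that must survive and checking the final prompt against them. The helper below is a hypothetical sketch, not a Visuals feature. It only does a case-insensitive text search on whatever you copy back out of the prompt box.

```python
def missing_phrases(prompt: str, must_keep: list[str]) -> list[str]:
    """Return the phrases from must_keep that no longer appear in prompt."""
    lowered = prompt.lower()
    return [phrase for phrase in must_keep if phrase.lower() not in lowered]


# Example: the prompt as it reads after a settings change (made-up text).
final_prompt = (
    "Create an album cover using the person in my reference photo. "
    "Cinematic look, neon lighting."
)
must_keep = ["reference photo", "black background", "negative space"]

print(missing_phrases(final_prompt, must_keep))
# ['black background', 'negative space']
```

An empty list means every key phrase is still there. Anything else is worth adding back before you generate.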

Choosing an image model (simple guide)

In Cover Art & Images, you can choose between these models:

Nano Banana (default)
Nano Banana Pro
GPT-Image
Flux Kontext Max

Here’s the simplest way to choose.

Nano Banana (default)

Use this when you want fast iteration.

Best for: rapid concepting, exploring directions, quick retries
Speed: usually very fast (often under ~10 seconds)
Strengths: strong character consistency, good quality, best “first pass” model

Nano Banana Pro

Use this when you want your final.

Best for: highest-quality cover art, best text/spelling, best consistency when using your own photos
Speed: slower than Nano Banana (often ~30–60 seconds)
Best workflow: iterate with Nano Banana, then remix/refine with Nano Banana Pro once you like the direction

GPT-Image

Use this when you want to try a different look, but avoid it for projects that require consistent faces.

Strengths: can be strong for certain styles
Tradeoff: weaker character consistency (faces can change between generations)
Speed: can be slower. It varies by load and request type, and some generations or edits commonly take tens of seconds.
Reference: https://community.openai.com/t/image-generation-edit-api-time-out-with-gpt-image-1/1328514

Flux Kontext Max

Use this if you want another high-quality option and you care about prompt adherence and typography.

Strengths: strong prompt following and typography
Speed: varies by provider and settings. Some providers advertise single-digit to low double-digit seconds.

Recommendation:

Start with Nano Banana while you dial in the prompt
Switch to Nano Banana Pro when you want the final output

Reminder:

Nano Banana is the default model in Visuals unless you manually change it.
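The guidance in this section boils down to a couple of simple rules. The sketch below encodes only what this article says. The function names and the `stage` values are assumptions made for illustration, and in the app you pick the model yourself.

```python
def suggest_model(stage: str, text_heavy: bool = False) -> str:
    """Suggest a main model. stage is "draft" or "final" (assumed labels)."""
    if stage not in ("draft", "final"):
        raise ValueError("stage must be 'draft' or 'final'")
    # Finals and text-heavy covers go to the slower, higher-quality model.
    if stage == "final" or text_heavy:
        return "Nano Banana Pro"
    # Everything else starts on the fast default.
    return "Nano Banana"


def alternatives(needs_consistent_faces: bool) -> list[str]:
    """Other models worth trying when you want a different look."""
    options = ["GPT-Image", "Flux Kontext Max"]
    if needs_consistent_faces:
        # Faces can change between generations with GPT-Image.
        options.remove("GPT-Image")
    return options


print(suggest_model("draft"))                   # Nano Banana
print(suggest_model("draft", text_heavy=True))  # Nano Banana Pro
print(alternatives(needs_consistent_faces=True))  # ['Flux Kontext Max']
```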

Titles, logos, and text

If your cover needs a lot of text or perfect spelling, use Nano Banana Pro.

If you want a logo inside the image:

Upload the logo as a reference image
Say it clearly in your prompt (example: “Include the logo from my reference image on the cover, clean and readable”)

Tip:

If you created a great concept in Nano Banana, you can remix it in Nano Banana Pro to clean up spelling and polish the final output.

Troubleshooting: Fix bad results fast

The output doesn’t match my idea

Make the subject more specific
Remove extra ideas from the prompt
Add a reference image

The person looks wrong

Use a better subject photo (clear face, good lighting)
Upload 2–3 photos of the same person (different angles/poses) so the model understands their identity
Say it clearly: “Use the person in my reference photo as the main subject. Maintain their identity and facial features.”
Don’t upload multiple faces unless you want multiple people

The style keeps changing

Reuse the same references
Keep your prompt structure consistent
Avoid mixing styles (realistic + anime + cartoon)

The cover is too busy

Add one of these lines:

Minimal background
Clean composition
Lots of negative space
Centered subject

The image has unwanted text or marks

We don’t currently have a separate “Avoid” or “Negative” field, so include it directly in your prompt.

Add lines like:

No text
No watermark
No extra logos
No random letters

Tip:

What you don’t want is just as important as what you do want.
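If you reuse the same exclusions on every cover, you can append them to the prompt with a tiny helper before pasting it in. This is a sketch with a hypothetical function name. It simply builds the “No ...” lines as plain text, since there is no separate field to send them to.

```python
def with_avoid_lines(prompt: str, avoid: list[str]) -> str:
    """Append one "No ..." sentence per unwanted element to the prompt."""
    base = prompt.strip()
    extras = " ".join(f"No {item.strip()}." for item in avoid if item.strip())
    return f"{base} {extras}".strip()


print(with_avoid_lines(
    "Create a minimalist single cover with a black background.",
    ["text", "watermark", "extra logos"],
))
# Create a minimalist single cover with a black background. No text. No watermark. No extra logos.
```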

Advanced tip: get multiple ideas in one generation

If you want lots of cover art options without spending credits on many generations, you can ask for multiple concepts in one image.

Example workflow:

Set aspect ratio to 9:16
Prompt: “Create a 3×3 grid of 9 different cover art concepts for this song. Each tile should be a different pose/composition, but keep the style consistent. Make each tile clean and readable.”
Generate once to get 9 options
If you like one tile, upload that grid image as a reference image and say which tile to extract and remake as a single high-res cover
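When you tell the model which tile to extract, it helps to be exact about its position. The arithmetic below is a rough sketch that assumes the tiles are evenly spaced and numbered left to right, top to bottom. Real grids may have borders or uneven tiles, and the 1080×1920 size is just an example of a 9:16 image.

```python
def tile_box(width: int, height: int, tile: int,
             rows: int = 3, cols: int = 3) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) pixel bounds of a 1-based tile."""
    if not 1 <= tile <= rows * cols:
        raise ValueError("tile number is outside the grid")
    # Convert the 1-based tile number to zero-based row and column.
    row, col = divmod(tile - 1, cols)
    left = col * width // cols
    right = (col + 1) * width // cols
    top = row * height // rows
    bottom = (row + 1) * height // rows
    return left, top, right, bottom


# Tile 5 is the center tile of a 3×3 grid.
print(tile_box(1080, 1920, tile=5))  # (360, 640, 720, 1280)
```

The returned tuple uses the (left, top, right, bottom) order that common image libraries expect for a crop box, so you can also cut the tile out locally to preview it before asking for a high-res remake.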

Privacy and sharing

Everything you generate is private by default.

You can choose to make any image public to share it and get visibility.

Credits (quick note)

Cover Art & Images uses Visuals Credits when you generate.

You’ll always see the credit cost before generation runs.

If you need help with credits, see: Visuals Credits: How they work.

FAQs

Can I use Visuals Cover Art & Images for non-music designs (thumbnails, infographics, general artwork)?
Yes. You can use it as a general image generator for artwork, thumbnails, infographics, and more.

Do I need to upload a song to create cover art?
No. You can generate from a prompt and reference images. Uploading a song helps when you want song-aware outputs.

How do I make sure my artist photo is used in the image?
Upload the photo as a reference image and say it clearly in your prompt (example: “Use the person in my reference photo as the main subject”).

Why does the text on my cover look misspelled?
AI-generated images often misspell text. Use Nano Banana Pro for text-heavy covers, use dedicated text fields if they are available, or add the text yourself after export.

Can I keep my images private?
Yes. Everything you generate is private by default. You can choose to make an image public if you want.

What should I do if something fails or errors out?
Use the in-app chat bubble to message us in Intercom and include a screenshot and the error message.
