What is Image Edit?
Image Edit is an AI-powered tool designed to help you make detailed edits to your images through simple text descriptions. Whether you need to change lighting, adjust colors, remove unwanted objects, or even add new elements, Image Edit allows you to give specific instructions to the AI, which then generates high-quality results based on your prompts.How to use Image Edit
- Open ImagineArt and select Edit Mode.

- **Add an image to be used as a reference: **Upload the image you want to edit or simply drag it into the ”+” box. This will serve as the reference for the AI to make edits based on your description.Choose an editing model from the options below.
- **Choose an image editing model: **Select from various available models (e.g., Nano Banana, Seedream v4, ChatGPT Edit) depending on the level of detail and style you need.
- **Write the editing prompt: **Now, write a detailed description of the changes you want to make. Be specific about what you want altered—whether it’s adjusting brightness, adding objects, or changing the composition.

- **Choose aspect ratio and generation settings: **Select the aspect ratio (e.g., square, landscape, portrait) and any other settings that match your project’s needs.
- **Click Create: **Once everything is set, click Generate to let the AI process your image and apply the changes based on your prompt.
Be specific in your edit prompt. Instead of “change the background,” try “replace the background with a sun-drenched Tuscan vineyard at golden hour.” Precision helps the AI understand exactly what you want.
Editing models
- Nano Banana
- ChatGPT Image
- Imagine You
- Nano Banana 2
- Nano Banana Pro
- Seedream v5 Lite
- xAI Grok Imagine
- ChatGPT 1.5
- Seedream v4.5
- Flux.2 Max
- Flux.2 Pro
- Flux Kontext Max Multi
- Seedream v4 Edit
Best for: Complex natural language instructions, scene adjustments, and perspective changes.Nano Banana, developed by Google, understands complex editing instructions similar to how a conversational AI processes language. It excels at interpreting multi-part instructions and making nuanced adjustments to scenes, angles, and object placement.Key capabilities:
- Understands complex, multi-step editing instructions in plain language
- Adjusts scene angles, perspectives, and spatial relationships
- Adds, removes, or repositions objects based on descriptions
- Produces high-fidelity textures and stylized outputs
- Select Edit Mode and choose Nano Banana as your model.
- Upload up to 3 reference images.
- Write your editing prompt, describing what you want to change or add.
- Click Create.
"Shift the camera angle to a low-angle shot looking up at the subject. Add dramatic storm clouds to the sky."Best for: Multi-element compositions, detailed scene generation, and images with embedded text.ChatGPT Image by OpenAI excels at prompt accuracy and complex scene construction. It is particularly strong when you need multiple distinct elements to coexist in a single image or when your image needs to include readable, correctly spelled text.Key capabilities:
- Highly prompt-accurate: follows detailed, multi-part instructions closely
- Strong at generating complex scenes with many interacting elements
- Best-in-class text rendering inside images
- Character and object design from detailed descriptions
- Select Edit Mode and choose ChatGPT as your model.
- Upload up to 3 reference images.
- Write your detailed editing or generation prompt.
- Click Create.
"Add a chalk-board sign in the background that reads 'Grand Opening' in bold script. Keep the storefront and lighting unchanged."Best for: Character consistency across multiple scenes, poses, and styles.Imagine You is a specialized character consistency model. Upload a single reference photo of a person or character, and it generates variations of that same character in different poses, outfits, environments, and artistic styles—while preserving their core visual identity.Key capabilities:
- One-shot character consistency from a single reference photo
- Generates variations across styles (photorealistic, cartoon, fantasy, etc.)
- Explore different poses, clothing, and settings while maintaining likeness
- Select presets or guide output through the prompt box
- Select Edit Mode and choose Imagine You as your model.
- Upload a clear reference photo of your character.
- Select a style preset or write a custom prompt to describe the variation you want.
- Set aspect ratio and image count, then click Create.
"Same character, wearing a futuristic silver spacesuit, standing on a Martian landscape, dramatic cinematic lighting."Best for: High-accuracy editing with real-time knowledge, multi-character consistency, and sharp text rendering.Nano Banana 2, powered by Google’s Gemini 3.1 Flash Image model, is an upgraded successor to Nano Banana with significantly improved accuracy, richer visual quality, and real-time web search integration. It can handle up to 5 characters and 14 objects simultaneously while maintaining consistency, and supports resolutions from 512px up to 4K. Its real-time web grounding means it can generate visually accurate representations of real-world subjects, brands, and locations.Key capabilities:
- Real-time web search grounding — Pulls live online information so generated visuals are factually and visually accurate (e.g., real product appearances, landmarks, logos).
- Multi-character and multi-object consistency — Maintain up to 5 characters and 14 objects consistently across a single image or across multiple outputs.
- Precise text rendering — Generate legible, accurate text within images for marketing mockups, greeting cards, posters, and product labels.
- High-resolution output — Supports resolutions up to 4K with vibrant lighting, richer textures, and sharper details.
- Fast Flash-speed generation — Delivers high quality at the speed expected from Gemini Flash architecture.
- Select Edit Mode and choose Nano Banana 2 as your model.
- Upload up to 14 reference images (optional) or start with a text prompt.
- Describe your desired output or edits with as much detail as needed.
- Choose your resolution and aspect ratio.
- Click Create.
"Generate a product advertisement for a luxury watch on a marble surface. Include the text 'Timeless Precision' in elegant serif font at the bottom."Best for: Professional-grade, photorealistic image creation with maximum prompt fidelity and multilingual text support.Nano Banana Pro, powered by Google’s Gemini 3 Pro Image model, is Google’s highest-quality image generation and editing model. Designed for creative professionals and advanced workflows, it produces ultra-high-resolution 4K images with industry-leading text rendering, localized editing, and search-grounded accuracy. It’s the best choice when image quality, detail, and precision are non-negotiable.Key capabilities:
- Ultra-high-resolution 4K output — Generates images at 3840×2160 or higher with exceptional sharpness and detail.
- Industry-leading text rendering — Create infographics, slides, diagrams, and layouts with multi-language text rendered clearly, from short taglines to full paragraphs.
- Localized editing and inpainting — Fine-tune specific areas of an image—adjusting focus, lighting, color grading, or camera angles—without affecting the rest.
- Search-grounded generation — Google Search grounding allows the model to research topics and generate factually accurate, context-aware visuals for maps, diagrams, or real-world subjects.
- Multi-identity consistency — Preserve facial features, clothing, and posture for up to 5 people across multi-image sequences.
- Multilingual support — Accurately render text across different writing systems and languages.
- Select Edit Mode and choose Nano Banana Pro as your model.
- Upload reference images (optional) or start from a detailed text prompt.
- Describe your desired output, specifying style, lighting, text elements, and any localized edits.
- Select your resolution (up to 4K) and aspect ratio.
- Click Create.
"Create a magazine cover layout featuring a woman in a tailored red suit against a white studio backdrop. Include the headline 'The Future of Design' in bold black font at the top, and a subheading in French below."Best for: 4K image generation with deep reasoning, real-time web search, and exceptional text accuracy.Seedream v5 Lite, ByteDance’s most advanced image generation model, combines chain-of-thought visual reasoning with real-time web search to produce native 4K images in 2–3 seconds. It’s the first Seedream model with live internet connectivity, enabling it to retrieve up-to-date visual references during generation. With 99%+ text spelling accuracy in both English and Chinese, it excels at any content that requires precise typography — from product labels to marketing posters.Key capabilities:
- Real-time web search integration — The first Seedream model to retrieve live information during generation, producing more accurate and up-to-date visuals.
- Native 4K generation — Produces high-resolution images natively, without upscaling artifacts.
- Chain-of-thought visual reasoning — The model thinks through complex prompts step by step, resulting in better-composed and more coherent images.
- 99%+ text accuracy — Industry-leading spelling accuracy for rendered text in both English and Chinese across diverse font styles, rotations, and layouts.
- Multi-reference compositing — Combine elements from up to 14 reference images in a single generation.
- Select Edit Mode and choose Seedream v5 Lite as your model.
- Upload up to 14 reference images (optional) or start from a text prompt.
- Write a detailed prompt describing the scene, style, and any text elements you need.
- Set your resolution and aspect ratio.
- Click Create.
"A high-resolution product mockup of a skincare bottle on a pale pink marble surface, with soft natural lighting. Include the brand name 'Lumière' in elegant gold cursive on the label."Best for: Versatile, high-volume image generation across photorealistic, artistic, and illustrative styles.Grok Imagine is xAI’s dedicated image generation model, built for creative flexibility and production-scale output. It supports a wide range of visual styles — from photorealistic portraits to anime, oil painting, and pencil sketches — and accepts up to three reference images per prompt. With generation modes optimized for either quality or speed, and support for up to 300 requests per minute, it’s well-suited for both individual creators and high-volume workflows.Key capabilities:
- Multi-style generation — Produce images ranging from ultra-realistic photography to anime, watercolor, oil painting, digital illustration, and beyond.
- Image-to-image generation — Upload up to 3 reference images and describe the transformation you want.
- Quality and Speed modes — Choose between Speed Mode for fast output and Quality Mode for finer detail and higher fidelity.
- Precise text and logo rendering — Generate legible text, accurate logos, and detailed realistic elements within images.
- Multiple aspect ratios — Supports 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 1:2, and 2:1 ratios.
- High-volume capability — Handles up to 300 API requests per minute for production use cases.
- Select Edit Mode and choose xAI Grok Imagine as your model.
- Upload up to 3 reference images (optional) to guide the visual style or subject.
- Write a prompt describing the desired image, style, and any elements to change or preserve.
- Select your generation mode (Quality or Speed) and aspect ratio.
- Click Create.
"A realistic portrait of a woman with curly red hair standing in a forest at golden hour. Painterly style, warm tones, soft background bokeh."Best for: Precise image editing, inpainting, transparent background generation, and real-time streaming previews.ChatGPT 1.5 (GPT Image 1.5) is OpenAI’s most advanced image generation and editing model, built as the successor to DALL-E 3. It delivers superior instruction-following accuracy with highly targeted edits that change only what you ask for, leaving the rest of the image intact. Key additions include streaming generation (see partial previews as the image builds), transparent background support, and prompt-guided inpainting — making it a powerful tool for production-ready image workflows.Key capabilities:
- Precise targeted editing — Change specific elements (shirt color, background, object position) using natural language while keeping everything else consistent.
- Prompt-guided inpainting — Edit a defined region of an image using a mask, with the model blending changes naturally into the surrounding content.
- Transparent background generation — Produce images with no background, ready for compositing into designs or placing on custom surfaces.
- Streaming generation — Receive up to 3 partial image previews as generation progresses, so you can assess results before completion.
- Consistent facial and detail fidelity — Maintains original composition, lighting, style, and fine-grained detail across edits and iterations.
- Select Edit Mode and choose ChatGPT 1.5 as your model.
- Upload your image (required for editing workflows) or start from a text prompt.
- For inpainting, indicate the region you want to edit; for full edits, describe the changes in natural language.
- Enable streaming if you want partial preview images during generation.
- Click Create.
"Remove the background entirely and replace it with a transparent layer. Keep the product and all its shadows exactly as they are."Best for: Unified text-to-image generation and editing with exceptional typography and multi-image consistency.Seedream v4.5 is ByteDance’s professional-grade image model that unifies text-to-image generation and image editing in a single architecture. Released in December 2025, it delivers up to 4-megapixel (2048×2048) output with outstanding text rendering — supporting complex typography, multiple font styles, curved and rotated text, and multilingual content. Its multi-image support makes it especially effective for projects requiring consistent visual output across many assets.Key capabilities:
- Unified generation and editing — Generate new images from text and edit existing images within the same model and workflow.
- Exceptional text rendering — Accurately renders complex words, multiple text elements, diverse font styles, rotated or curved text, and multilingual content.
- Multi-reference support — Process up to 14 reference images simultaneously, maintaining consistent subjects, lighting, and style across complex compositions.
- High-resolution output — Generates images up to 4MP (2048×2048), suitable for professional print and digital media.
- Multi-image consistency — Ideal for product catalogs, brand campaigns, and visual storytelling where multiple images must share the same character, style, or theme.
- Select Edit Mode and choose Seedream v4.5 as your model.
- Upload up to 14 reference images (optional) for consistency or editing guidance.
- Write a prompt describing your scene, style, and any text elements to include.
- Choose your output resolution and aspect ratio.
- Click Create.
"A series of product packaging designs for a tea brand in a minimalist Japanese style. Each image should feature the same logo, clean typography with the flavor name, and a botanical illustration in muted earth tones."Best for: Top-tier photorealism, character consistency, and professional-grade image editing with maximum prompt fidelity.FLUX.2 Max is Black Forest Labs’ highest-quality image generation and editing model in the FLUX.2 family. Built on a hybrid Mistral-3 24B vision-language model with Rectified Flow Transformer architecture, it delivers unmatched photorealism, the strongest prompt following in its class, and real-time web grounding for factually accurate visuals. It’s the premium choice for commercial campaigns, product imagery, and any creative work where quality is the top priority.Key capabilities:
- Unmatched photorealism — Professional-grade output with exceptional sharpness, lighting, and material detail at up to 4MP resolution.
- Real-time web grounding — Performs live web searches to incorporate accurate, current information into generated images.
- Multi-reference editing — Supports up to 10 reference images with reliable character consistency, product styling, and brand identity preservation.
- Precise hex color control — Specify exact color values for pixel-perfect brand and design consistency.
- Strongest prompt adherence — Maximum faithfulness to complex, multi-element prompts across styles and subjects.
- Adjustable inference — Fine-tune quality with up to 50 inference steps and custom guidance scale.
- Select Edit Mode and choose Flux.2 Max as your model.
- Upload up to 10 reference images (optional) to guide character, style, or product consistency.
- Write a detailed prompt specifying style, lighting, colors, and any elements to preserve or change.
- Set your resolution (up to 4MP) and inference steps.
- Click Create.
"A luxury fashion editorial photograph of a model in a tailored ivory trench coat, standing on a rain-slicked Paris street at night. Cinematic lighting, reflections in puddles, sharp focus on the fabric texture."Best for: High-resolution generation and editing with reliable text rendering and consistent character identity across multiple outputs.FLUX.2 Pro is Black Forest Labs’ professional-tier model in the FLUX.2 family, featuring 32 billion parameters and support for resolutions up to 4 megapixels. Released in November 2025, it delivers accurate text rendering, precise color matching, and consistent character identity across multiple generations — making it a strong choice for creators who need dependable, high-quality output at scale without the maximum compute of FLUX.2 Max.Key capabilities:
- 32-billion parameter architecture — Delivers strong generation quality, prompt understanding, and spatial reasoning.
- 4MP resolution support — Generates images at up to 4 megapixels with high detail and clarity.
- Accurate text rendering — Reliably places legible, well-styled text within images.
- Precise color matching — Consistent color reproduction for brand and product imagery.
- Character identity consistency — Maintain the same character’s appearance across multiple generated outputs.
- Multi-reference editing — Combine reference images for compositing, style matching, and character-guided generation.
- Select Edit Mode and choose Flux.2 Pro as your model.
- Upload reference images (optional) for style or character guidance.
- Write a prompt describing the desired image, specifying details like lighting, composition, and text.
- Set your resolution and aspect ratio.
- Click Create.
"A high-resolution product shot of a premium chocolate box on a dark wood surface, with soft studio lighting. Include 'Maison Noël' in gold serif text embossed on the lid."Best for: Multi-image compositing, precise character consistency across scenes, and context-aware editing using multiple source images.Flux Kontext Max Multi is Black Forest Labs’ highest-quality context-aware image editing model, optimized specifically for multi-image workflows. While the standard Flux Kontext Max excels at single-image editing, the Multi variant is designed to combine, blend, and maintain consistency across up to 8–10 input images simultaneously — ideal for complex compositions, multi-scene campaigns, and character-driven visual storytelling where elements from different sources must seamlessly coexist.Key capabilities:
- Multi-image input support — Accepts up to 8–10 reference images, enabling compositing and element fusion from diverse sources.
- Strongest instruction following — Maximum accuracy in understanding and executing complex, multi-element editing instructions.
- Character and identity consistency — Preserves facial features, clothing, proportions, and visual identity across all inputs and outputs.
- Detail preservation — Fine-grained preservation of textures, lighting, and compositional elements from source images.
- Context-aware editing — Processes all input images in context to maintain coherent spatial relationships, lighting, and style.
- High-resolution output — Generates at up to 4MP across any aspect ratio.
- Select Edit Mode and choose Flux Kontext Max Multi as your model.
- Upload up to 8–10 reference images from which you want to draw elements.
- Write a prompt specifying what to combine, preserve, or change across the input images.
- Set your aspect ratio and output resolution.
- Click Create.
"Combine the character from image 1 with the outdoor environment in image 2. Keep the character's clothing and facial features identical, and match the lighting to the scene in image 2."Best for: High-fidelity, subject-preserving image edits — outfit swaps, product restyling, and interior design changes.Seedream v4 Edit is ByteDance’s dedicated image-to-image editing model, optimized for making precise, targeted changes to existing images while maintaining subject identity, lighting, and overall composition. It’s specifically tuned for tasks like swapping outfits, changing hair or makeup, restyling products, and transforming interior finishes — all without disrupting what you want to keep. Its context-aware approach handles depth, perspective, and lighting consistency automatically.Key capabilities:
- Subject-preserving editing — Accurately identifies and preserves the main subject (person, product, or object) while applying targeted changes.
- High-fidelity output — Reliable skin tones, fabric and material detail, logo reproduction, and fine edge clarity on people and products.
- Natural language spatial instructions — Describe edits in plain language (e.g., “change the wall color to sage green”) for intuitive control.
- Context-aware transformations — Automatically maintains depth, perspective, and lighting consistency when integrating new elements — no manual blending required.
- Multi-reference support — Use multiple reference images to guide editing style, subject identity, or material appearance.
- 4K output — Produces high-definition results up to 4K resolution.
- Select Edit Mode and choose Seedream v4 Edit as your model.
- Upload the image you want to edit as your base image.
- Upload additional reference images (optional) to guide style or subject identity.
- Write a prompt specifying what you want to change and what should stay the same.
- Click Create.
"Change the model's jacket to a cream-colored oversized blazer. Keep her face, hair, pose, and all background elements exactly as they are."Supported models summary
| Model | Primary strength |
|---|---|
| Flux Kontext Max | Precise local edits, character consistency |
| Nano Banana | Complex natural language instructions |
| ChatGPT Image | Multi-element scenes, embedded text |
| Imagine You | Character variations from a single photo |
| Runway Reference | Cross-image style and visual consistency |
| Seedream v4 Edit | Dream-like, artistic edits with creative flair |
| Qwen Edit | Intuitive edits with style transfer and fine-tuning |

