Qwen-Image 20B MMDiT

wavespeed-ai/qwen-image/text-to-image

Qwen-Image is a 20B-parameter MMDiT text-to-image model that generates images from text prompts. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Input
width × height: 1024 × 1024 px (allowed range per side: 256 - 1536)
Synchronous option (API only): if set to true, the request waits until the result has been generated and uploaded before returning, so the result is available directly in the response.
BASE64 option (API only): if enabled, the output is returned as a BASE64-encoded string instead of a URL.
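
When the output is returned as a BASE64 string rather than a URL, the client has to decode it before saving the image. A minimal sketch of that step follows. The helper name `decode_image` is hypothetical, and the handling of a data-URI prefix is an assumption, since the page does not say whether the string carries one.

```python
import base64


def decode_image(encoded: str) -> bytes:
    """Decode a BASE64 image string into raw image bytes.

    Assumption: the string is either plain BASE64 or a data URI such as
    "data:image/png;base64,...."; the page does not specify which.
    """
    encoded = encoded.strip()
    if encoded.startswith("data:") and "," in encoded:
        # Drop the "data:<mime>;base64," prefix and keep only the payload.
        encoded = encoded.split(",", 1)[1]
    # validate=True rejects characters outside the BASE64 alphabet.
    return base64.b64decode(encoded, validate=True)
```

The returned bytes can then be written to disk with an ordinary binary file write.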

Your request will cost $0.02 per run.

For $1 you can run this model approximately 50 times.
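
The arithmetic behind these two figures can be sketched as follows, using `Decimal` to avoid floating-point rounding on currency. The function names are illustrative only; the price is the $0.02 per run quoted above.

```python
from decimal import Decimal

PRICE_PER_RUN = Decimal("0.02")  # USD per run, as quoted on this page


def runs_for_budget(budget_usd: str) -> int:
    """Number of whole runs a budget (given as a decimal string) covers."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


def cost_of_runs(runs: int) -> Decimal:
    """Total cost in USD for a given number of runs."""
    return PRICE_PER_RUN * runs
```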

Examples

A beautiful Chinese woman wearing a “WaveSpeedAI” logo T-shirt is smiling at the camera with a black marker. Behind her, a glass panel reads in handwriting, "Meet Qwen Image - a powerful image foundation model capable of complex text rendering and precise image editing."
Bookstore window display. A sign displays "New Arrivals This Week". Below, a shelf tag with the text "Best-Selling Novels Here". To the side, a colorful poster advertises "Author Meet And Greet on Saturday" with a central portrait of the author. There are four books on the bookshelf, namely "The light between worlds", "When stars are scattered", "The silent patient", and "The night circus".
A girl with little freckles and messy red hair sitting on a rooftop during sunset, denim jacket slightly worn, holding a Polaroid camera, city skyline glowing in soft hues behind her
A man in a suit is standing in front of the window, looking at the bright moon outside the window. The man is holding a yellowed paper with handwritten words on it: "A lantern moon climbs through the silver night, Unfurling quiet dreams across the sky, Each star a whispered promise wrapped in light, That dawn will bloom, though darkness wanders by." There is a cute cat on the windowsill.
An elven queen with long silver hair and glowing blue eyes, wearing a magnificent white gown adorned with jewels. She stands in an ancient, mystical forest surrounded by luminous plants and mist. Moonlight filtering through the canopy, creating magical light and shadows. Fantasy art, epic, intricate details, masterpiece, digital painting.
A Victorian noble lady with an elegant updo and a gentle gaze, wearing a deep red velvet dress, sitting in an ornate library. Warm candlelight illuminates her face and the surrounding bookshelves. In the style of John Singer Sargent, classic oil painting, expressive brushstrokes, masterpiece, rich textures.
A slide featuring artistic, decorative shapes framing neatly arranged textual information styled as an elegant infographic. At the very center, the title "Habits for Emotional Wellbeing" appears clearly, surrounded by a symmetrical floral pattern. On the left upper section, "Practice Mindfulness" appears next to a minimalist lotus flower icon, with the short sentence, "Be present, observe without judging, accept without resisting". Next, moving downward, "Cultivate Gratitude" is written near an open hand illustration, along with the line, "Appreciate simple joys and acknowledge positivity daily". Further down, towards bottom-left, "Stay Connected" accompanied by a minimalistic chat bubble icon reads "Build and maintain meaningful relationships to sustain emotional energy". At bottom right corner, "Prioritize Sleep" is depicted next to a crescent moon illustration, accompanied by the text "Quality sleep benefits both body and mind". Moving upward along the right side, "Regular Physical Activity" is near a jogging runner icon, stating: "Exercise boosts mood and relieves anxiety". Finally, at the top right side, appears "Continuous Learning" paired with a book icon, stating "Engage in new skill and knowledge for growth". The slide layout beautifully balances clarity and artistry, guiding the viewers naturally along each text segment.
A movie poster. The first row is the movie title, which reads "Imagination Unleashed". The second row is the movie subtitle, which reads "Enter a world beyond your imagination". The third row reads "Cast: Qwen-Image". The fourth row reads "Director: The Collective Imagination of Humanity". The central visual features a sleek, futuristic computer from which radiant colors, whimsical creatures, and dynamic, swirling patterns explosively emerge, filling the composition with energy, motion, and surreal creativity. The background transitions from dark, cosmic tones into a luminous, dreamlike expanse, evoking a digital fantasy realm. At the bottom edge, the text "Launching in the Cloud, August 2025" appears in bold, modern sans-serif font with a glowing, slightly transparent effect, evoking a high-tech, cinematic aesthetic. The overall style blends sci-fi surrealism with graphic design flair—sharp contrasts, vivid color grading, and layered visual depth—reminiscent of visionary concept art and digital matte painting, 32K resolution, ultra-detailed.
Real style, three different looking puppies have a camera in front of them and the puppies look at it curiously. Elevated view
A female athlete with defined muscles and a tight ponytail, preparing for a run. She is wearing a black sports top and leggings, her gaze focused and determined. The background is a city running track at dawn with a light mist on the ground. Dynamic action shot, strong rim lighting outlining her silhouette, powerful and energetic, high contrast.

README

Qwen-Image (Text-to-Image)

Qwen-Image is a 20B MMDiT-based text-to-image generation model, especially strong at native text rendering in both English and Chinese. It is a powerful creative tool for posters, comics, and visual storytelling, while also excelling at general image generation from photorealism to anime.

Why it looks great

  • SOTA text rendering: Rivals GPT-4o in English and is best-in-class for Chinese.
  • In-pixel text generation: Text is rendered as part of the image itself (no overlays).
  • Bilingual typography: Handles diverse fonts, styles, and complex layouts.
  • General image capability: Excels across styles, including photorealistic, anime, impressionist, and minimalist.

Limits and Performance

  • Max resolution per job: 1536 × 1536 pixels
  • Custom size: set width and height manually (256 to 1536 px per side)
  • Output formats: JPEG / PNG / WEBP
  • Processing speed: ~5–8 seconds per image (depends on size & queue)
  • Input prompt: supports detailed, multi-line descriptions
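
The size limits above can be applied on the client before a job is submitted. The sketch below scales an oversized request down to the 1536 px maximum and raises undersized sides to the 256 px minimum shown in the input panel. `fit_size` is a hypothetical helper, not part of any WaveSpeed SDK.

```python
MIN_SIDE = 256   # lower bound of the size range shown in the input panel
MAX_SIDE = 1536  # maximum side length listed above


def fit_size(width: int, height: int) -> tuple[int, int]:
    """Fit a requested size into the allowed range.

    The longer side is scaled down to MAX_SIDE (keeping the aspect ratio),
    then each side is clamped to [MIN_SIDE, MAX_SIDE]. Clamping a very
    small side up to MIN_SIDE changes the aspect ratio.
    """
    scale = min(1.0, MAX_SIDE / max(width, height))
    fitted_w = min(MAX_SIDE, max(MIN_SIDE, round(width * scale)))
    fitted_h = min(MAX_SIDE, max(MIN_SIDE, round(height * scale)))
    return fitted_w, fitted_h
```

For example, a 3072 × 1536 request is halved to 1536 × 768, while sizes already inside the range are returned unchanged.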

Price

$0.02 per image.

How to Use

  1. Write a prompt describing the image (can include embedded text).
  2. Adjust size (width & height, up to 1536×1536).
  3. Set a seed for reproducibility.
  4. Choose output_format.
  5. Run the job and download the generated image.
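
The steps above can be sketched as a small request builder. This is an illustration under stated assumptions, not the documented API: the endpoint URL is a placeholder, and the JSON field names (`prompt`, `width`, `height`, `seed`, `output_format`) and the bearer-token header are guesses based on the option names on this page. Check the API reference for the real request schema.

```python
import json
import urllib.request

# Placeholder URL (hypothetical); replace with the endpoint from the API reference.
API_URL = "https://api.example.com/wavespeed-ai/qwen-image/text-to-image"
OUTPUT_FORMATS = {"jpeg", "png", "webp"}


def build_payload(prompt, width=1024, height=1024, seed=None, output_format="jpeg"):
    """Steps 1-4: assemble prompt, size, seed, and output format."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    for name, side in (("width", width), ("height", height)):
        if not 256 <= side <= 1536:
            raise ValueError(f"{name} must be between 256 and 1536, got {side}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    # Field names are assumptions, not confirmed by this page.
    payload = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "output_format": output_format,
    }
    if seed is not None:
        payload["seed"] = seed  # fixed seed for reproducibility (step 3)
    return payload


def build_request(payload, api_key):
    """Step 5 (preparation): wrap the payload in a POST request object."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would submit the job. How the generated image is returned depends on the real response format, which this page does not describe.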

Pro tips for best quality

  • For poster design, explicitly describe font style, placement, and mood.
  • For bilingual text, specify both Chinese and English in the prompt.
  • Reuse the same seed while making small prompt changes to regenerate similar layouts with slight variations.
  • Keep the height:width ratio balanced (avoid extremely tall or wide canvases) for best typography results.