
vision-language

wavespeed-ai/moondream3-preview/caption

Generate descriptive captions for images. Moondream3 Caption creates short, normal, or long descriptions to help you understand and describe visual content.


API-only option: if set to true, the request waits for the result before returning the response.


Your request will cost $0.005 per run.

For $1 you can run this model approximately 200 times.
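The arithmetic behind that estimate can be sketched as follows. This is a minimal illustration that uses the $0.005 per-run price stated above. The helper names `runs_for_budget` and `cost_of_runs` are hypothetical and not part of any WaveSpeed SDK.

```python
from decimal import Decimal

# Per-run price in USD, taken from the pricing note above.
PRICE_PER_RUN = Decimal("0.005")


def runs_for_budget(budget_usd: Decimal) -> int:
    """Return how many whole runs fit in the given budget (hypothetical helper)."""
    return int(budget_usd // PRICE_PER_RUN)


def cost_of_runs(runs: int) -> Decimal:
    """Return the total cost in USD for a number of runs (hypothetical helper)."""
    return PRICE_PER_RUN * runs


if __name__ == "__main__":
    print(runs_for_budget(Decimal("1")))
    print(cost_of_runs(1000))
```

`Decimal` is used instead of `float` so that small per-run prices add up without rounding drift.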

README

Moondream3 Caption - Image Captioning

Moondream3 Caption is a specialized vision-language model for generating descriptive captions for images.

Features

  • Flexible Caption Length: Choose between short, normal, or long descriptions
  • Accurate Descriptions: Trained to generate accurate, contextual captions
  • Fast Processing: Optimized for quick caption generation
  • Natural Language: Human-readable, grammatically correct descriptions

Example Usage

Short Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "short"
}

Normal Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "normal"
}

Long Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "long"
}
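The three request bodies above differ only in the `length` value, so they can be built by one small function. The sketch below is an illustration only: `build_caption_payload` is a hypothetical helper, and it produces just the JSON body. How the body is sent (endpoint, authentication) is not covered on this page and is left out.

```python
import json

# Length values taken from the examples above.
ALLOWED_LENGTHS = ("short", "normal", "long")


def build_caption_payload(image_url: str, length: str) -> dict:
    """Return a request body shaped like the examples above (hypothetical helper)."""
    if length not in ALLOWED_LENGTHS:
        raise ValueError(f"length must be one of {ALLOWED_LENGTHS}, got {length!r}")
    return {"image": image_url, "length": length}


if __name__ == "__main__":
    payload = build_caption_payload("https://example.com/photo.jpg", "short")
    print(json.dumps(payload, indent=2))
```

Rejecting an unknown `length` before the request is sent avoids paying for a run that would fail or fall back to an unintended caption style.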

Best Practices

  • Use 'short' for quick summaries
  • Use 'normal' for balanced descriptions
  • Use 'long' for detailed narratives
  • Supported formats: JPEG, PNG, WebP
  • Max image size: 10MB
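The format and size limits above can be checked locally before uploading. The sketch below is a rough pre-flight check under two assumptions: the format is inferred from the file extension only (the file contents are not inspected), and "10MB" is read as 10 * 1024 * 1024 bytes. The helper name `image_problems` is hypothetical.

```python
from pathlib import PurePath

# Extensions corresponding to the supported formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

# Assumption: "10MB" means 10 MiB; the page does not say which unit is meant.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def image_problems(filename: str, size_bytes: int) -> list:
    """Return a list of human-readable problems; an empty list means the file looks OK."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_IMAGE_BYTES}")
    return problems


if __name__ == "__main__":
    print(image_problems("photo.jpg", 2_000_000))
    print(image_problems("scan.tiff", 12 * 1024 * 1024))
```

For a file on disk, `os.path.getsize(path)` gives the value to pass as `size_bytes`.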

Pricing

Fixed price of $0.005 per request. Contact WaveSpeed for volume discounts.