Each run costs $0.001, so $1 covers approximately 1,000 runs.
Moondream3 Detect is a specialized vision-language model for finding objects in images and returning their bounding box coordinates.
{
  "image": "https://example.com/photo.jpg",
  "prompt": "car"
}

{
  "image": "https://example.com/photo.jpg",
  "prompt": "person"
}

{
  "image": "https://example.com/photo.jpg",
  "prompt": "bicycle"
}
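
The three inputs above differ only in the "prompt" field, so they can be generated from one image URL and a list of object names. The following is a minimal sketch that assumes only the two fields shown above ("image" and "prompt"). The helper name build_detect_payloads is hypothetical and not part of any WaveSpeed SDK, and sending the payloads is left out because the endpoint is not described here.

```python
import json


def build_detect_payloads(image_url, prompts):
    """Return one request body per object name, using the fields shown above."""
    return [{"image": image_url, "prompt": prompt} for prompt in prompts]


# Hypothetical usage: one payload per object to look for in the same image.
payloads = build_detect_payloads(
    "https://example.com/photo.jpg", ["car", "person", "bicycle"]
)
for payload in payloads:
    print(json.dumps(payload))
```

Keeping one object name per payload mirrors the examples, where each input carries a single prompt.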
Returns bounding box coordinates in the format [x1, y1, x2, y2], where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
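
Because a box is given by its two corners, its width, height, and center follow from simple arithmetic. The sketch below assumes a box arrives as a list of four numbers in the order described above. The page does not say whether the values are pixels or normalized fractions, so the code makes no assumption about units. The helper name box_geometry and the sample box are made up for illustration and are not real model output.

```python
def box_geometry(box):
    """Compute width, height, and center of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    if x2 < x1 or y2 < y1:
        raise ValueError("expected top-left corner first, then bottom-right")
    width = x2 - x1
    height = y2 - y1
    center = ((x1 + x2) / 2, (y1 + y2) / 2)
    return width, height, center


# Made-up box for illustration only.
width, height, center = box_geometry([40, 60, 200, 180])
print(width, height, center)
```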
Fixed price per request. Contact WaveSpeed for volume discounts.