vision-language
Your request will cost $0.001 per run.
For $1 you can run this model approximately 1000 times.
Moondream3 Point is a specialized vision-language model for locating specific objects in images and returning their coordinate points.
{
"image": "https://example.com/photo.jpg",
"prompt": "person"
}
{
"image": "https://example.com/photo.jpg",
"prompt": "car"
}
{
"image": "https://example.com/photo.jpg",
"prompt": "laptop"
}
The model returns each point in the format [x, y], where x and y are the pixel coordinates of the object's center or key point.
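As a rough illustration of how a client might prepare one of the requests above and read back the [x, y] result, here is a minimal Python sketch. The helper names `build_payload` and `parse_points` are hypothetical, and the assumption that the output arrives as either a single [x, y] pair or a list of such pairs is ours; the actual response envelope is not described here. No network call is made.

```python
import json


def build_payload(image_url: str, prompt: str) -> str:
    """Serialize a request body using the two fields shown in the examples."""
    if not image_url or not prompt:
        raise ValueError("both image_url and prompt are required")
    return json.dumps({"image": image_url, "prompt": prompt})


def parse_points(raw):
    """Normalize model output into a list of (x, y) float tuples.

    Assumption: the output is either one [x, y] pair or a list of pairs,
    given as a JSON string or as already-decoded Python lists.
    """
    data = json.loads(raw) if isinstance(raw, str) else raw
    # A bare pair of numbers is treated as a single point.
    if len(data) == 2 and all(isinstance(v, (int, float)) for v in data):
        data = [data]
    points = []
    for item in data:
        if len(item) != 2:
            raise ValueError(f"expected [x, y], got {item!r}")
        x, y = item
        points.append((float(x), float(y)))
    return points


if __name__ == "__main__":
    print(build_payload("https://example.com/photo.jpg", "person"))
    print(parse_points("[120, 84]"))
```

Normalizing to a list of tuples keeps downstream code the same whether the prompt matches one object or several.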
Fixed price per request. Contact WaveSpeed for volume discounts.