API for object detection, depth estimation, and distance prediction using computer vision models
This API provides access to advanced computer vision models for real-time image processing. It leverages:
The API supports both HTTP and WebSocket protocols:
- Process a single image via an HTTP request
- Stream images for real-time processing via WebSocket
Process a single image for object detection, depth estimation, and distance prediction.
Content-Type: multipart/form-data
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The image file to process (JPEG, PNG) |
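As a rough illustration of the request described above, the following sketch builds a `multipart/form-data` body with a single `file` part using only the Python standard library. The URL in `API_URL` is a placeholder assumption (this document does not state the host or path), and `build_multipart` and `process_image` are hypothetical helper names, not part of the API.

```python
import json
import os
import urllib.request
import uuid

# Hypothetical endpoint URL: the real host and path are not given in this document.
API_URL = "http://localhost:8000/predict"


def build_multipart(field: str, filename: str, data: bytes, content_type: str):
    """Encode one file as a multipart/form-data body; returns (body, headers)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return head + data + tail, headers


def process_image(path: str, url: str = API_URL) -> dict:
    """POST an image in the `file` field and return the decoded JSON response."""
    with open(path, "rb") as fh:
        data = fh.read()
    # The parameter table lists JPEG and PNG as accepted formats.
    content_type = "image/png" if path.lower().endswith(".png") else "image/jpeg"
    body, headers = build_multipart("file", os.path.basename(path), data, content_type)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `process_image("frame.jpg")` against a running server would return the decoded response as a Python dictionary; most HTTP client libraries can build the multipart body for you, so the manual encoding here only serves to keep the sketch dependency-free.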
Returns a JSON object containing:
| Field | Type | Description |
|---|---|---|
| objects | Array | Array of detected objects with their properties |
| objects[].class | String | Class of the detected object (e.g., 'car', 'person') |
| objects[].distance_estimated | Number | Estimated distance of the object |
| objects[].features | Object | Features used for prediction (bounding box, depth information) |
| frame_id | Number | ID of the processed frame (0 for HTTP requests) |
| timings | Object | Processing time metrics for each step |
{
  "objects": [
    {
      "class": "car",
      "distance_estimated": 15.42,
      "features": {
        "xmin": 120.5,
        "ymin": 230.8,
        "xmax": 350.2,
        "ymax": 480.3,
        "mean_depth": 0.75,
        "depth_mean_trim": 0.72,
        "depth_median": 0.71,
        "width": 229.7,
        "height": 249.5
      }
    },
    {
      "class": "person",
      "distance_estimated": 8.76,
      "features": {
        "xmin": 450.1,
        "ymin": 200.4,
        "xmax": 510.8,
        "ymax": 380.2,
        "mean_depth": 0.58,
        "depth_mean_trim": 0.56,
        "depth_median": 0.55,
        "width": 60.7,
        "height": 179.8
      }
    }
  ],
  "frame_id": 0,
  "timings": {
    "decode_time": 0.015,
    "models_time": 0.452,
    "process_time": 0.063,
    "json_time": 0.021,
    "total_time": 0.551
  }
}
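To show how a client might consume this structure, here is a minimal sketch that picks the closest detected object and recomputes a bounding-box size from its corner coordinates. `SAMPLE` is a trimmed copy of the example response; `nearest_object` and `box_size` are hypothetical helper names. That `width` and `height` equal `xmax - xmin` and `ymax - ymin` is inferred from the example values, not stated by the API.

```python
import json

# Trimmed copy of the example response (only the fields used below).
SAMPLE = json.loads("""
{
  "objects": [
    {"class": "car", "distance_estimated": 15.42,
     "features": {"xmin": 120.5, "ymin": 230.8, "xmax": 350.2, "ymax": 480.3}},
    {"class": "person", "distance_estimated": 8.76,
     "features": {"xmin": 450.1, "ymin": 200.4, "xmax": 510.8, "ymax": 380.2}}
  ],
  "frame_id": 0
}
""")


def nearest_object(response: dict):
    """Return the detected object with the smallest estimated distance, or None."""
    objects = response.get("objects", [])
    if not objects:
        return None
    return min(objects, key=lambda obj: obj["distance_estimated"])


def box_size(features: dict):
    """Recompute (width, height) of a bounding box from its corner coordinates."""
    return features["xmax"] - features["xmin"], features["ymax"] - features["ymin"]


closest = nearest_object(SAMPLE)
print(closest["class"], closest["distance_estimated"])
```

For the sample above, the closest object is the person at 8.76, since that value is smaller than the car's 15.42.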
| Status Code | Description |
|---|---|
| 200 | OK - Request was successful |
| 400 | Bad Request - Empty file or invalid format |
| 500 | Internal Server Error - Processing error |
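One way a client could act on these status codes is sketched below. The exception classes and the `check_status` helper are hypothetical names introduced for illustration; the API itself only defines the three codes in the table.

```python
class BadImageError(ValueError):
    """Raised for 400: the uploaded file was empty or in an invalid format."""


class ProcessingError(RuntimeError):
    """Raised for 500: the server failed while processing the image."""


def check_status(status_code: int) -> None:
    """Raise a descriptive exception for any non-200 status code."""
    if status_code == 200:
        return
    if status_code == 400:
        raise BadImageError("Bad Request: empty file or invalid format")
    if status_code == 500:
        raise ProcessingError("Internal Server Error: processing error")
    # Codes outside the documented table are reported generically.
    raise RuntimeError(f"Unexpected status code: {status_code}")
```

Separating the two error types lets a caller skip a bad image after a 400 while treating a 500 as a server-side fault; whether a retry is appropriate there is not specified by this document.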
Stream images over a persistent connection and receive each result as soon as the frame has been processed. Ideal for video feeds and applications requiring continuous processing.
Note: Prefer this endpoint over HTTP when processing video feeds or many images in rapid succession; a single open WebSocket connection avoids setting up a new request for every image.
Send binary image data directly over the WebSocket connection:
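A minimal client sketch follows, assuming the third-party `websockets` package and a placeholder URL in `WS_URL` (the actual WebSocket path is not given in this document). It also assumes the server sends exactly one JSON reply per image, which the description of incrementing `frame_id` values suggests but does not guarantee. `stream_frames` and `main` are hypothetical names.

```python
import asyncio
import json

# Hypothetical endpoint URL: the real WebSocket path is not given in this document.
WS_URL = "ws://localhost:8000/ws"


async def stream_frames(ws, frames):
    """Send each encoded image as one binary message and collect the JSON replies.

    `ws` is any object with awaitable `send` and `recv` methods, such as a
    connection returned by `websockets.connect`.
    """
    results = []
    for frame in frames:
        await ws.send(frame)   # a bytes payload is sent as a binary message
        reply = await ws.recv()  # assumption: one reply per image sent
        results.append(json.loads(reply))
    return results


async def main(paths):
    """Read image files from disk and stream them over a single connection."""
    import websockets  # third-party: pip install websockets

    frames = []
    for path in paths:
        with open(path, "rb") as fh:
            frames.append(fh.read())
    async with websockets.connect(WS_URL) as ws:
        return await stream_frames(ws, frames)
```

With a server running, `asyncio.run(main(["frame1.jpg", "frame2.jpg"]))` would return one decoded response per image. Waiting for each reply before sending the next frame keeps the sketch simple; a real video client might send and receive concurrently.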
The WebSocket API returns the same JSON structure as the HTTP API, with incrementing frame_id values.
{
  "objects": [
    {
      "class": "car",
      "distance_estimated": 14.86,
      "features": {
        "xmin": 125.3,
        "ymin": 235.1,
        "xmax": 355.7,
        "ymax": 485.9,
        "mean_depth": 0.77,
        "depth_mean_trim": 0.74,
        "depth_median": 0.73,
        "width": 230.4,
        "height": 250.8
      }
    }
  ],
  "frame_id": 42,
  "timings": {
    "decode_time": 0.014,
    "models_time": 0.445,
    "process_time": 0.061,
    "json_time": 0.020,
    "total_time": 0.540
  }
}
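The `timings` object can be used to estimate throughput and to see which stage dominates. The sketch below assumes the values are in seconds, which this document does not state explicitly; `frames_per_second` and `stage_shares` are hypothetical helper names.

```python
# Timings copied from the example response above (assumed to be seconds).
TIMINGS = {
    "decode_time": 0.014,
    "models_time": 0.445,
    "process_time": 0.061,
    "json_time": 0.020,
    "total_time": 0.540,
}


def frames_per_second(timings: dict) -> float:
    """Estimate sequential throughput as the inverse of the total time per frame."""
    return 1.0 / timings["total_time"]


def stage_shares(timings: dict) -> dict:
    """Return each stage's fraction of the total processing time."""
    total = timings["total_time"]
    return {name: value / total for name, value in timings.items() if name != "total_time"}


print(f"{frames_per_second(TIMINGS):.2f} frames per second")
for name, share in stage_shares(TIMINGS).items():
    print(f"{name}: {share:.1%}")
```

Under the seconds assumption, a total of 0.540 per frame corresponds to just under two frames per second when frames are processed one at a time, with `models_time` accounting for most of it (0.445 of 0.540).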
You can test the API directly using the interactive Swagger UI.