Real-Time Image Processing API Documentation

Overview

This API exposes computer vision models for real-time image processing. It combines:

DETR (DEtection TRansformer) - object detection
GLPN (Global-Local Path Networks) - depth estimation
LSTM model - Z-location (distance) prediction for each detected object

The API supports both HTTP and WebSocket protocols:

POST /api/predict

Process a single image via an HTTP request

WS /ws/predict

Stream images for real-time processing via WebSocket

HTTP API Reference

POST /api/predict

Process a single image for object detection, depth estimation, and distance prediction.

Request

Content-Type: multipart/form-data

Parameter	Type	Required	Description
file	File	Yes	The image file to process (JPEG, PNG)

Request Example

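# Python example using requests
import requests

url = "http://localhost:8000/api/predict"

# Open the image in binary mode; the form field must be named "file"
with open("image.jpg", "rb") as f:
    files = {"file": ("image.jpg", f, "image/jpeg")}
    response = requests.post(url, files=files)

response.raise_for_status()  # raises on 400 and 500 responses
data = response.json()
print(data)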
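If requests is not available, the same multipart/form-data upload can be built with the Python standard library. This is a minimal sketch, not an official client. It assumes only what the parameter table states (a single file field named file). The helper names build_multipart and predict are hypothetical, and the default content type of image/jpeg is an assumption you should change for PNG files.

```python
import json
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode a single file field as a multipart/form-data body.

    Returns the body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random token, very unlikely to occur in the image bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(url: str, image: bytes, filename: str = "image.jpg", content_type: str = "image/jpeg") -> dict:
    """POST one image to /api/predict and return the decoded JSON response (hypothetical helper)."""
    body, header = build_multipart("file", filename, image, content_type)
    request = urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    # urlopen raises urllib.error.HTTPError for 400 and 500 responses
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage follows the requests example. Read the file's bytes in binary mode and call predict("http://localhost:8000/api/predict", image_bytes).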

Response

Returns a JSON object containing:

Field	Type	Description
objects	Array	Detected objects, one entry per detection
objects[].class	String	Class label of the detected object (e.g., 'car', 'person')
objects[].distance_estimated	Number	Distance (Z-location) of the object, as predicted by the LSTM model
objects[].features	Object	Inputs used for the distance prediction: bounding box (xmin, ymin, xmax, ymax, width, height) and depth statistics (mean_depth, depth_mean_trim, depth_median)
frame_id	Number	ID of the processed frame (always 0 for HTTP requests)
timings	Object	Processing time for each step (decode_time, models_time, process_time, json_time) and total_time
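A client often needs the detections in a typed form, for example to react to the nearest object first. The sketch below is illustrative only. It assumes that every object carries the features keys shown in the response example. The DetectedObject class and the parse_objects function are hypothetical names, and sorting by distance_estimated is a client-side choice; this document does not say that the API sorts its results.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    cls: str  # "class" in the JSON; renamed because class is a Python keyword
    distance: float  # "distance_estimated"
    bbox: tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) from "features"
    depth_median: float  # "features"."depth_median"


def parse_objects(response: dict) -> list[DetectedObject]:
    """Convert the "objects" array into typed records, nearest object first."""
    objects = []
    for obj in response.get("objects", []):
        f = obj["features"]
        objects.append(
            DetectedObject(
                cls=obj["class"],
                distance=obj["distance_estimated"],
                bbox=(f["xmin"], f["ymin"], f["xmax"], f["ymax"]),
                depth_median=f["depth_median"],
            )
        )
    return sorted(objects, key=lambda o: o.distance)
```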

Response Example

{
  "objects": [
    {
      "class": "car",
      "distance_estimated": 15.42,
      "features": {
        "xmin": 120.5,
        "ymin": 230.8,
        "xmax": 350.2,
        "ymax": 480.3,
        "mean_depth": 0.75,
        "depth_mean_trim": 0.72,
        "depth_median": 0.71,
        "width": 229.7,
        "height": 249.5
      }
    },
    {
      "class": "person",
      "distance_estimated": 8.76,
      "features": {
        "xmin": 450.1,
        "ymin": 200.4,
        "xmax": 510.8,
        "ymax": 380.2,
        "mean_depth": 0.58,
        "depth_mean_trim": 0.56,
        "depth_median": 0.55,
        "width": 60.7,
        "height": 179.8
      }
    }
  ],
  "frame_id": 0,
  "timings": {
    "decode_time": 0.015,
    "models_time": 0.452,
    "process_time": 0.063,
    "json_time": 0.021,
    "total_time": 0.551
  }
}
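In both response examples in this document, total_time equals the sum of the four stage timings. For the example above, 0.015 + 0.452 + 0.063 + 0.021 = 0.551, and models_time accounts for most of that total. The sketch below reports each stage's share of the total so you can see where latency goes. This document does not state the time unit, so the helpers work on ratios only. The function names are hypothetical.

```python
STAGES = ("decode_time", "models_time", "process_time", "json_time")


def stage_shares(timings: dict) -> dict[str, float]:
    """Return each stage's fraction of total_time."""
    total = timings["total_time"]
    if total <= 0:
        raise ValueError("total_time must be positive")
    return {stage: timings[stage] / total for stage in STAGES}


def unaccounted(timings: dict) -> float:
    """Part of total_time not covered by the four stages (close to 0 in the examples)."""
    return timings["total_time"] - sum(timings[s] for s in STAGES)
```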

HTTP Status Codes

Status Code	Description
200	OK - Request was successful
400	Bad Request - Empty file or invalid format
500	Internal Server Error - Processing error
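The server returns 400 for an empty file or an invalid format. A client can catch the common cases before uploading and save a round trip. This sketch checks only the file signature (magic bytes) of JPEG and PNG, the two formats listed in the request table. The server's own validation may be stricter or looser, and sniff_image_type is a hypothetical helper name.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_type(data: bytes) -> str:
    """Return the MIME type for JPEG or PNG bytes, or raise ValueError.

    Mirrors the documented 400 causes (empty file, invalid format) on the client side.
    """
    if not data:
        raise ValueError("empty file")
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    if data.startswith(PNG_MAGIC):
        return "image/png"
    raise ValueError("not a JPEG or PNG image")
```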

WebSocket API Reference

WebSocket /ws/predict

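Stream images for real-time processing and receive results as each frame is processed, all over a single persistent connection. This suits video feeds and other applications that process images continuously.

Note: Because the connection stays open, WebSocket avoids the per-request overhead of HTTP, which otherwise sends a new request with headers and a multipart-encoded body for every image. Prefer this endpoint for video feeds or whenever you need to process many images in rapid succession.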

Connection

// JavaScript example
const socket = new WebSocket('ws://localhost:8000/ws/predict');

socket.onopen = function (event) {
  console.log('Connection established');
};

// Each message is the JSON result for one processed frame
socket.onmessage = function (event) {
  const response = JSON.parse(event.data);
  console.log('Received:', response);
};

socket.onclose = function (event) {
  console.log('Connection closed');
};

Sending Images

Send binary image data directly over the WebSocket connection:

// JavaScript example: sending an image from a canvas
// The socket must be open (readyState === WebSocket.OPEN) before send() is called
function sendImageFromCanvas(canvas) {
  canvas.toBlob(function (blob) {
    const reader = new FileReader();
    reader.onload = function () {
      socket.send(reader.result); // ArrayBuffer holding the JPEG bytes
    };
    reader.readAsArrayBuffer(blob);
  }, 'image/jpeg');
}

// Or from a file input (fileInput is an <input type="file"> element)
fileInput.onchange = function () {
  const file = this.files[0];
  const reader = new FileReader();
  reader.onload = function () {
    socket.send(reader.result); // raw bytes of the selected JPEG or PNG file
  };
  reader.readAsArrayBuffer(file);
};

Response Format

Each WebSocket response has the same JSON structure as the HTTP API response. The difference is frame_id: instead of always being 0, it increments with each processed frame.
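Responses carry a server-side frame_id but no client-supplied identifier. To measure per-frame latency, a client therefore has to pair each response with the frame it sent. The sketch below does this with a FIFO queue. It assumes, without confirmation in this document, that the server replies exactly once per binary message and in the order the frames were sent. LatencyTracker and its methods are hypothetical names, and the injectable clock exists only to make the class easy to test.

```python
import time
from collections import deque


class LatencyTracker:
    """Pair WebSocket responses with sent frames (assumes one in-order reply per frame)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = deque()  # send timestamps of frames still awaiting a reply

    def on_send(self) -> None:
        """Call right before sending a frame."""
        self._pending.append(self._clock())

    def on_response(self, response: dict) -> tuple:
        """Return (frame_id, round-trip latency) for the oldest pending frame."""
        if not self._pending:
            raise RuntimeError("response received with no frame in flight")
        sent_at = self._pending.popleft()
        return response["frame_id"], self._clock() - sent_at

    @property
    def in_flight(self) -> int:
        """Number of frames sent but not yet answered."""
        return len(self._pending)
```

Call on_send() before each send and on_response() in the message handler. A cap on in_flight can also serve as simple backpressure. For example, a client could skip camera frames while one is still pending, so that a slow model does not build up a backlog.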

Response Example

{
  "objects": [
    {
      "class": "car",
      "distance_estimated": 14.86,
      "features": {
        "xmin": 125.3,
        "ymin": 235.1,
        "xmax": 355.7,
        "ymax": 485.9,
        "mean_depth": 0.77,
        "depth_mean_trim": 0.74,
        "depth_median": 0.73,
        "width": 230.4,
        "height": 250.8
      }
    }
  ],
  "frame_id": 42,
  "timings": {
    "decode_time": 0.014,
    "models_time": 0.445,
    "process_time": 0.061,
    "json_time": 0.020,
    "total_time": 0.540
  }
}

Try The API

You can also test the API interactively through its Swagger UI.
