Spaces:

Really-amin
/

Datasourceforcryptocurrency

Running

File size: 16,313 Bytes

48ae4e0

# Charts Validation & Hardening Documentation

## Overview

This document provides comprehensive documentation for the newly implemented chart endpoints with validation and security hardening.

## New Endpoints

### 1. `/api/charts/rate-limit-history`

**Purpose:** Retrieve hourly rate limit usage history for visualization in charts.

**Method:** `GET`

**Parameters:**

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| `hours` | integer | No | 24 | 1-168 | Hours of history to retrieve (clamped server-side) |
| `providers` | string | No | top 5 | max 5, comma-separated | Provider names to include |

**Response Schema:**

```json
[
  {
    "provider": "coingecko",
    "hours": 24,
    "series": [
      {
        "t": "2025-11-10T13:00:00Z",
        "pct": 42.5
      },
      {
        "t": "2025-11-10T14:00:00Z",
        "pct": 38.2
      }
    ],
    "meta": {
      "limit_type": "per_minute",
      "limit_value": 30
    }
  }
]
```

**Response Fields:**

- `provider` (string): Provider name
- `hours` (integer): Number of hours covered
- `series` (array): Time series data points
  - `t` (string): ISO 8601 timestamp with 'Z' suffix
  - `pct` (number): Rate limit usage percentage [0-100]
- `meta` (object): Rate limit metadata
  - `limit_type` (string): Type of limit (per_second, per_minute, per_hour, per_day)
  - `limit_value` (integer|null): Limit value, null if no limit configured

**Behavior:**

- Returns one series object per provider
- Each series contains exactly `hours` data points (one per hour)
- Hours without data are filled with `pct: 0.0`
- If provider has no rate limit configured, returns `meta.limit_value: null` and `pct: 0`
- Default: Returns up to 5 providers with configured rate limits
- Series ordered chronologically (oldest to newest)

**Examples:**

```bash
# Default: Last 24 hours, top 5 providers
curl "http://localhost:7860/api/charts/rate-limit-history"

# Custom: 48 hours, specific providers
curl "http://localhost:7860/api/charts/rate-limit-history?hours=48&providers=coingecko,cmc,etherscan"

# Single provider, 1 week
curl "http://localhost:7860/api/charts/rate-limit-history?hours=168&providers=binance"
```

**Error Responses:**

- `400 Bad Request`: Invalid provider name
  ```json
  {
    "detail": "Invalid provider name: invalid_xyz. Must be one of: ..."
  }
  ```
- `422 Unprocessable Entity`: Invalid parameter type
- `500 Internal Server Error`: Database or processing error

---

### 2. `/api/charts/freshness-history`

**Purpose:** Retrieve hourly data freshness/staleness history for visualization.

**Method:** `GET`

**Parameters:**

| Parameter | Type | Required | Default | Constraints | Description |
|-----------|------|----------|---------|-------------|-------------|
| `hours` | integer | No | 24 | 1-168 | Hours of history to retrieve (clamped server-side) |
| `providers` | string | No | top 5 | max 5, comma-separated | Provider names to include |

**Response Schema:**

```json
[
  {
    "provider": "coingecko",
    "hours": 24,
    "series": [
      {
        "t": "2025-11-10T13:00:00Z",
        "staleness_min": 7.2,
        "ttl_min": 15,
        "status": "fresh"
      },
      {
        "t": "2025-11-10T14:00:00Z",
        "staleness_min": 999.0,
        "ttl_min": 15,
        "status": "stale"
      }
    ],
    "meta": {
      "category": "market_data",
      "default_ttl": 1
    }
  }
]
```

**Response Fields:**

- `provider` (string): Provider name
- `hours` (integer): Number of hours covered
- `series` (array): Time series data points
  - `t` (string): ISO 8601 timestamp with 'Z' suffix
  - `staleness_min` (number): Data staleness in minutes (999.0 indicates no data)
  - `ttl_min` (integer): TTL threshold for this provider's category
  - `status` (string): Derived status: "fresh", "aging", or "stale"
- `meta` (object): Provider metadata
  - `category` (string): Provider category
  - `default_ttl` (integer): Default TTL for category (minutes)

**Status Derivation:**

```
fresh:  staleness_min <= ttl_min
aging:  ttl_min < staleness_min <= ttl_min * 2
stale:  staleness_min > ttl_min * 2  OR  no data (999.0)
```

**TTL by Category:**

| Category | TTL (minutes) |
|----------|---------------|
| market_data | 1 |
| blockchain_explorers | 5 |
| defi | 10 |
| news | 15 |
| default | 5 |

**Behavior:**

- Returns one series object per provider
- Each series contains exactly `hours` data points (one per hour)
- Hours without data are marked with `staleness_min: 999.0` and `status: "stale"`
- Default: Returns up to 5 most active providers
- Series ordered chronologically (oldest to newest)

**Examples:**

```bash
# Default: Last 24 hours, top 5 providers
curl "http://localhost:7860/api/charts/freshness-history"

# Custom: 72 hours, specific providers
curl "http://localhost:7860/api/charts/freshness-history?hours=72&providers=coingecko,binance"

# Single provider, 3 days
curl "http://localhost:7860/api/charts/freshness-history?hours=72&providers=etherscan"
```

**Error Responses:**

- `400 Bad Request`: Invalid provider name
- `422 Unprocessable Entity`: Invalid parameter type
- `500 Internal Server Error`: Database or processing error

---

## Security & Validation

### Input Validation

1. **Hours Parameter:**
   - Server-side clamping: `1 <= hours <= 168`
   - Invalid types rejected with `422 Unprocessable Entity`
   - Out-of-range values automatically clamped (no error)

2. **Providers Parameter:**
   - Allow-list enforcement: Only valid provider names accepted
   - Max 5 providers enforced (excess silently truncated)
   - Invalid names trigger `400 Bad Request` with detailed error
   - SQL injection prevention: No raw SQL, parameterized queries only
   - XSS prevention: Input sanitized (strip whitespace)

3. **Rate Limiting (Recommended):**
   - Implement: 60 requests/minute per IP for chart routes
   - Use middleware or reverse proxy (nginx/cloudflare)

### Security Measures Implemented

✓ Allow-list validation for provider names
✓ Parameter clamping (hours: 1-168)
✓ Max provider limit (5)
✓ SQL injection prevention (ORM with parameterized queries)
✓ XSS prevention (input sanitization)
✓ Comprehensive error handling with safe error messages
✓ Logging of all chart requests for monitoring
✓ No sensitive data exposure in responses

### Edge Cases Handled

- Empty provider list → Returns default providers
- Unknown provider → 400 with valid options listed
- Hours out of bounds → Clamped to [1, 168]
- No data available → Returns empty series or 999.0 staleness
- Provider with no rate limit → Returns null limit_value
- Whitespace in provider names → Trimmed automatically
- Mixed valid/invalid providers → Rejects entire request

---

## Testing

### Automated Tests

Run the comprehensive test suite:

```bash
# Run all chart tests
pytest tests/test_charts.py -v

# Run specific test class
pytest tests/test_charts.py::TestRateLimitHistory -v

# Run with coverage
pytest tests/test_charts.py --cov=api --cov-report=html
```

**Test Coverage:**

- ✓ Default parameter behavior
- ✓ Custom time ranges (48h, 72h)
- ✓ Provider selection and filtering
- ✓ Response schema validation
- ✓ Percentage range validation [0-100]
- ✓ Timestamp format validation
- ✓ Status derivation logic
- ✓ Edge cases (invalid providers, hours clamping)
- ✓ Security (SQL injection, XSS prevention)
- ✓ Performance (response time < 500ms)
- ✓ Concurrent request handling

### Manual Sanity Checks

Run the CLI sanity check script:

```bash
# Ensure backend is running
python app.py &

# Run sanity checks
./tests/sanity_checks.sh
```

**Checks performed:**

1. Rate limit history (default params)
2. Freshness history (default params)
3. Custom time ranges
4. Response schema validation
5. Invalid provider rejection
6. Hours parameter clamping
7. Performance measurement
8. Edge case handling

---

## Performance Targets

### Response Time (P95)

| Environment | Target | Conditions |
|-------------|--------|------------|
| Production | < 200ms | 24h / 5 providers |
| Development | < 500ms | 24h / 5 providers |

### Optimization Strategies

1. **Database Indexing:**
   - Indexed: `timestamp`, `provider_id` columns
   - Composite indexes on frequently queried combinations

2. **Query Optimization:**
   - Hourly bucketing done in-memory (fast)
   - Limited to 168 hours max (1 week)
   - Provider limit enforced early (max 5)

3. **Caching (Future Enhancement):**
   - Consider Redis cache for 1-minute TTL
   - Cache key: `chart:type:hours:providers`
   - Invalidate on new data ingestion

4. **Connection Pooling:**
   - SQLAlchemy pool size: 10
   - Max overflow: 20
   - Recycle connections every 3600s

---

## Observability & Monitoring

### Logging

All chart requests are logged with:

```json
{
  "timestamp": "2025-11-11T01:00:00Z",
  "level": "INFO",
  "logger": "api_endpoints",
  "message": "Rate limit history: 3 providers, 48h"
}
```

### Recommended Metrics (Prometheus/Grafana)

```python
# Counter: Total requests per endpoint
chart_requests_total{endpoint="rate_limit_history"} 1523

# Histogram: Response time distribution
chart_response_time_seconds{endpoint="rate_limit_history", le="0.1"} 1450
chart_response_time_seconds{endpoint="rate_limit_history", le="0.2"} 1510

# Gauge: Current rate limit usage per provider
ratelimit_usage_pct{provider="coingecko"} 87.5

# Gauge: Freshness staleness per provider
freshness_staleness_min{provider="binance"} 3.2

# Counter: Invalid request count
chart_invalid_requests_total{endpoint="rate_limit_history", reason="invalid_provider"} 23
```

### Recommended Alerts

```yaml
# Critical: Rate limit exhaustion
- alert: RateLimitExhaustion
  expr: ratelimit_usage_pct > 90
  for: 3h
  annotations:
    summary: "Provider {{ $labels.provider }} at {{ $value }}% rate limit"
    action: "Add API keys or reduce request frequency"

# Critical: Data staleness
- alert: DataStale
  expr: freshness_staleness_min > ttl_min
  for: 15m
  annotations:
    summary: "Provider {{ $labels.provider }} data is stale ({{ $value }}m old)"
    action: "Check scheduler, verify API connectivity"

# Warning: Chart endpoint slow
- alert: ChartEndpointSlow
  expr: histogram_quantile(0.95, chart_response_time_seconds) > 0.2
  for: 10m
  annotations:
    summary: "Chart endpoint P95 latency above 200ms"
    action: "Check database query performance"
```

---

## Database Schema

### Tables Used

**RateLimitUsage**
```sql
CREATE TABLE rate_limit_usage (
    id INTEGER PRIMARY KEY,
    timestamp DATETIME NOT NULL,  -- INDEXED
    provider_id INTEGER NOT NULL,  -- FOREIGN KEY, INDEXED
    limit_type VARCHAR(20),
    limit_value INTEGER,
    current_usage INTEGER,
    percentage REAL,
    reset_time DATETIME
);
```

**DataCollection**
```sql
CREATE TABLE data_collection (
    id INTEGER PRIMARY KEY,
    provider_id INTEGER NOT NULL,  -- FOREIGN KEY, INDEXED
    actual_fetch_time DATETIME NOT NULL,
    data_timestamp DATETIME,
    staleness_minutes REAL,
    record_count INTEGER,
    on_schedule BOOLEAN
);
```

---

## Frontend Integration

### Chart.js Example (Rate Limit)

```javascript
// Fetch rate limit history
const response = await fetch('/api/charts/rate-limit-history?hours=48&providers=coingecko,cmc');
const data = await response.json();

// Build Chart.js dataset
const datasets = data.map(series => ({
    label: series.provider,
    data: series.series.map(p => ({
        x: new Date(p.t),
        y: p.pct
    })),
    borderColor: getColorForProvider(series.provider),
    tension: 0.3
}));

// Create chart
new Chart(ctx, {
    type: 'line',
    data: { datasets },
    options: {
        scales: {
            x: { type: 'time', time: { unit: 'hour' } },
            y: { min: 0, max: 100, title: { text: 'Usage %' } }
        },
        interaction: { mode: 'index', intersect: false },
        plugins: {
            legend: { display: true, position: 'bottom' },
            tooltip: {
                callbacks: {
                    label: ctx => `${ctx.dataset.label}: ${ctx.parsed.y.toFixed(1)}%`
                }
            }
        }
    }
});
```

### Chart.js Example (Freshness)

```javascript
// Fetch freshness history
const response = await fetch('/api/charts/freshness-history?hours=72&providers=binance');
const data = await response.json();

// Build datasets with status-based colors
const datasets = data.map(series => ({
    label: series.provider,
    data: series.series.map(p => ({
        x: new Date(p.t),
        y: p.staleness_min,
        status: p.status
    })),
    borderColor: getColorForProvider(series.provider),
    segment: {
        borderColor: ctx => {
            const point = ctx.p1.$context.raw;
            return point.status === 'fresh' ? 'green'
                 : point.status === 'aging' ? 'orange'
                 : 'red';
        }
    }
}));

// Create chart with TTL reference line
new Chart(ctx, {
    type: 'line',
    data: { datasets },
    options: {
        scales: {
            x: { type: 'time' },
            y: { title: { text: 'Staleness (min)' } }
        },
        plugins: {
            annotation: {
                annotations: {
                    ttl: {
                        type: 'line',
                        yMin: data[0].meta.default_ttl,
                        yMax: data[0].meta.default_ttl,
                        borderColor: 'rgba(255, 99, 132, 0.5)',
                        borderWidth: 2,
                        label: { content: 'TTL Threshold', enabled: true }
                    }
                }
            }
        }
    }
});
```

---

## Troubleshooting

### Common Issues

**1. Empty series returned**

- Check if providers have data in the time range
- Verify provider names are correct (case-sensitive)
- Ensure database has historical data

**2. Response time > 500ms**

- Check database indexes exist
- Reduce `hours` parameter
- Limit number of providers
- Consider adding caching layer

**3. 400 Bad Request on valid provider**

- Verify provider is in database: `SELECT name FROM providers`
- Check for typos or case mismatch
- Ensure provider has not been renamed

**4. Missing data points (gaps in series)**

- Normal behavior: gaps filled with zeros/999.0
- Check data collection scheduler is running
- Review logs for collection failures

---

## Changelog

### v1.0.0 - 2025-11-11

**Added:**
- `/api/charts/rate-limit-history` endpoint
- `/api/charts/freshness-history` endpoint
- Comprehensive input validation
- Security hardening (allow-list, clamping, sanitization)
- Automated test suite (pytest)
- CLI sanity check script
- Full API documentation

**Security:**
- SQL injection prevention
- XSS prevention
- Parameter validation and clamping
- Allow-list enforcement for providers
- Max provider limit (5)

**Testing:**
- 20+ automated tests
- Schema validation tests
- Security tests
- Performance tests
- Edge case coverage

---

## Future Enhancements

### Phase 2 (Optional)

1. **Provider Picker UI Component**
   - Dropdown with multi-select (max 5)
   - Persist selection in localStorage
   - Auto-refresh on selection change

2. **Advanced Filtering**
   - Filter by category
   - Filter by rate limit status (ok/warning/critical)
   - Filter by freshness status (fresh/aging/stale)

3. **Aggregation Options**
   - Category-level aggregation
   - System-wide average/percentile
   - Compare providers side-by-side

4. **Export Functionality**
   - CSV export
   - JSON export
   - PNG/SVG chart export

5. **Real-time Updates**
   - WebSocket streaming for live updates
   - Auto-refresh without flicker
   - Smooth transitions on new data

6. **Historical Analysis**
   - Trend detection (improving/degrading)
   - Anomaly detection
   - Predictive alerts

---

## Support & Maintenance

### Code Location

- Endpoints: `api/endpoints.py` (lines 947-1250)
- Tests: `tests/test_charts.py`
- Sanity checks: `tests/sanity_checks.sh`
- Documentation: `CHARTS_VALIDATION_DOCUMENTATION.md`

### Contact

For issues or questions:
- Create GitHub issue with `[charts]` prefix
- Tag: `enhancement`, `bug`, or `documentation`
- Provide: Request details, expected vs actual behavior, logs

---

## License

Same as parent project.