Spaces:

Really-amin
/

Datasourceforcryptocurrency

Running

App Files Files Community

Datasourceforcryptocurrency / docs /components /CHARTS_VALIDATION_DOCUMENTATION.md

Really-amin

Upload 301 files

e4e4574 verified 9 days ago

preview code

raw

history blame contribute delete

16.3 kB

	# Charts Validation & Hardening Documentation

	## Overview

	This document provides comprehensive documentation for the newly implemented chart endpoints with validation and security hardening.

	## New Endpoints

	### 1. `/api/charts/rate-limit-history`

	Purpose: Retrieve hourly rate limit usage history for visualization in charts.

	Method: `GET`

	Parameters:

	\| Parameter \| Type \| Required \| Default \| Constraints \| Description \|
	\|-----------\|------\|----------\|---------\|-------------\|-------------\|
	\| `hours` \| integer \| No \| 24 \| 1-168 \| Hours of history to retrieve (clamped server-side) \|
	\| `providers` \| string \| No \| top 5 \| max 5, comma-separated \| Provider names to include \|

	Response Schema:

	```json
	[
	{
	"provider": "coingecko",
	"hours": 24,
	"series": [
	{
	"t": "2025-11-10T13:00:00Z",
	"pct": 42.5
	},
	{
	"t": "2025-11-10T14:00:00Z",
	"pct": 38.2
	}
	],
	"meta": {
	"limit_type": "per_minute",
	"limit_value": 30
	}
	}
	]
	```

	Response Fields:

	- `provider` (string): Provider name
	- `hours` (integer): Number of hours covered
	- `series` (array): Time series data points
	- `t` (string): ISO 8601 timestamp with 'Z' suffix
	- `pct` (number): Rate limit usage percentage [0-100]
	- `meta` (object): Rate limit metadata
	- `limit_type` (string): Type of limit (per_second, per_minute, per_hour, per_day)
	- `limit_value` (integer\|null): Limit value, null if no limit configured

	Behavior:

	- Returns one series object per provider
	- Each series contains exactly `hours` data points (one per hour)
	- Hours without data are filled with `pct: 0.0`
	- If provider has no rate limit configured, returns `meta.limit_value: null` and `pct: 0`
	- Default: Returns up to 5 providers with configured rate limits
	- Series ordered chronologically (oldest to newest)

	Examples:

	```bash
	# Default: Last 24 hours, top 5 providers
	curl "http://localhost:7860/api/charts/rate-limit-history"

	# Custom: 48 hours, specific providers
	curl "http://localhost:7860/api/charts/rate-limit-history?hours=48&providers=coingecko,cmc,etherscan"

	# Single provider, 1 week
	curl "http://localhost:7860/api/charts/rate-limit-history?hours=168&providers=binance"
	```

	Error Responses:

	- `400 Bad Request`: Invalid provider name
	```json
	{
	"detail": "Invalid provider name: invalid_xyz. Must be one of: ..."
	}
	```
	- `422 Unprocessable Entity`: Invalid parameter type
	- `500 Internal Server Error`: Database or processing error

	---

	### 2. `/api/charts/freshness-history`

	Purpose: Retrieve hourly data freshness/staleness history for visualization.

	Method: `GET`

	Parameters:

	\| Parameter \| Type \| Required \| Default \| Constraints \| Description \|
	\|-----------\|------\|----------\|---------\|-------------\|-------------\|
	\| `hours` \| integer \| No \| 24 \| 1-168 \| Hours of history to retrieve (clamped server-side) \|
	\| `providers` \| string \| No \| top 5 \| max 5, comma-separated \| Provider names to include \|

	Response Schema:

	```json
	[
	{
	"provider": "coingecko",
	"hours": 24,
	"series": [
	{
	"t": "2025-11-10T13:00:00Z",
	"staleness_min": 7.2,
	"ttl_min": 15,
	"status": "fresh"
	},
	{
	"t": "2025-11-10T14:00:00Z",
	"staleness_min": 999.0,
	"ttl_min": 15,
	"status": "stale"
	}
	],
	"meta": {
	"category": "market_data",
	"default_ttl": 1
	}
	}
	]
	```

	Response Fields:

	- `provider` (string): Provider name
	- `hours` (integer): Number of hours covered
	- `series` (array): Time series data points
	- `t` (string): ISO 8601 timestamp with 'Z' suffix
	- `staleness_min` (number): Data staleness in minutes (999.0 indicates no data)
	- `ttl_min` (integer): TTL threshold for this provider's category
	- `status` (string): Derived status: "fresh", "aging", or "stale"
	- `meta` (object): Provider metadata
	- `category` (string): Provider category
	- `default_ttl` (integer): Default TTL for category (minutes)

	Status Derivation:

	```
	fresh: staleness_min <= ttl_min
	aging: ttl_min < staleness_min <= ttl_min * 2
	stale: staleness_min > ttl_min * 2 OR no data (999.0)
	```

	TTL by Category:

	\| Category \| TTL (minutes) \|
	\|----------\|---------------\|
	\| market_data \| 1 \|
	\| blockchain_explorers \| 5 \|
	\| defi \| 10 \|
	\| news \| 15 \|
	\| default \| 5 \|

	Behavior:

	- Returns one series object per provider
	- Each series contains exactly `hours` data points (one per hour)
	- Hours without data are marked with `staleness_min: 999.0` and `status: "stale"`
	- Default: Returns up to 5 most active providers
	- Series ordered chronologically (oldest to newest)

	Examples:

	```bash
	# Default: Last 24 hours, top 5 providers
	curl "http://localhost:7860/api/charts/freshness-history"

	# Custom: 72 hours, specific providers
	curl "http://localhost:7860/api/charts/freshness-history?hours=72&providers=coingecko,binance"

	# Single provider, 3 days
	curl "http://localhost:7860/api/charts/freshness-history?hours=72&providers=etherscan"
	```

	Error Responses:

	- `400 Bad Request`: Invalid provider name
	- `422 Unprocessable Entity`: Invalid parameter type
	- `500 Internal Server Error`: Database or processing error

	---

	## Security & Validation

	### Input Validation

	1. Hours Parameter:
	- Server-side clamping: `1 <= hours <= 168`
	- Invalid types rejected with `422 Unprocessable Entity`
	- Out-of-range values automatically clamped (no error)

	2. Providers Parameter:
	- Allow-list enforcement: Only valid provider names accepted
	- Max 5 providers enforced (excess silently truncated)
	- Invalid names trigger `400 Bad Request` with detailed error
	- SQL injection prevention: No raw SQL, parameterized queries only
	- XSS prevention: Input sanitized (strip whitespace)

	3. Rate Limiting (Recommended):
	- Implement: 60 requests/minute per IP for chart routes
	- Use middleware or reverse proxy (nginx/cloudflare)

	### Security Measures Implemented

	✓ Allow-list validation for provider names
	✓ Parameter clamping (hours: 1-168)
	✓ Max provider limit (5)
	✓ SQL injection prevention (ORM with parameterized queries)
	✓ XSS prevention (input sanitization)
	✓ Comprehensive error handling with safe error messages
	✓ Logging of all chart requests for monitoring
	✓ No sensitive data exposure in responses

	### Edge Cases Handled

	- Empty provider list → Returns default providers
	- Unknown provider → 400 with valid options listed
	- Hours out of bounds → Clamped to [1, 168]
	- No data available → Returns empty series or 999.0 staleness
	- Provider with no rate limit → Returns null limit_value
	- Whitespace in provider names → Trimmed automatically
	- Mixed valid/invalid providers → Rejects entire request

	---

	## Testing

	### Automated Tests

	Run the comprehensive test suite:

	```bash
	# Run all chart tests
	pytest tests/test_charts.py -v

	# Run specific test class
	pytest tests/test_charts.py::TestRateLimitHistory -v

	# Run with coverage
	pytest tests/test_charts.py --cov=api --cov-report=html
	```

	Test Coverage:

	- ✓ Default parameter behavior
	- ✓ Custom time ranges (48h, 72h)
	- ✓ Provider selection and filtering
	- ✓ Response schema validation
	- ✓ Percentage range validation [0-100]
	- ✓ Timestamp format validation
	- ✓ Status derivation logic
	- ✓ Edge cases (invalid providers, hours clamping)
	- ✓ Security (SQL injection, XSS prevention)
	- ✓ Performance (response time < 500ms)
	- ✓ Concurrent request handling

	### Manual Sanity Checks

	Run the CLI sanity check script:

	```bash
	# Ensure backend is running
	python app.py &

	# Run sanity checks
	./tests/sanity_checks.sh
	```

	Checks performed:

	1. Rate limit history (default params)
	2. Freshness history (default params)
	3. Custom time ranges
	4. Response schema validation
	5. Invalid provider rejection
	6. Hours parameter clamping
	7. Performance measurement
	8. Edge case handling

	---

	## Performance Targets

	### Response Time (P95)

	\| Environment \| Target \| Conditions \|
	\|-------------\|--------\|------------\|
	\| Production \| < 200ms \| 24h / 5 providers \|
	\| Development \| < 500ms \| 24h / 5 providers \|

	### Optimization Strategies

	1. Database Indexing:
	- Indexed: `timestamp`, `provider_id` columns
	- Composite indexes on frequently queried combinations

	2. Query Optimization:
	- Hourly bucketing done in-memory (fast)
	- Limited to 168 hours max (1 week)
	- Provider limit enforced early (max 5)

	3. Caching (Future Enhancement):
	- Consider Redis cache for 1-minute TTL
	- Cache key: `chart:type:hours:providers`
	- Invalidate on new data ingestion

	4. Connection Pooling:
	- SQLAlchemy pool size: 10
	- Max overflow: 20
	- Recycle connections every 3600s

	---

	## Observability & Monitoring

	### Logging

	All chart requests are logged with:

	```json
	{
	"timestamp": "2025-11-11T01:00:00Z",
	"level": "INFO",
	"logger": "api_endpoints",
	"message": "Rate limit history: 3 providers, 48h"
	}
	```

	### Recommended Metrics (Prometheus/Grafana)

	```python
	# Counter: Total requests per endpoint
	chart_requests_total{endpoint="rate_limit_history"} 1523

	# Histogram: Response time distribution
	chart_response_time_seconds{endpoint="rate_limit_history", le="0.1"} 1450
	chart_response_time_seconds{endpoint="rate_limit_history", le="0.2"} 1510

	# Gauge: Current rate limit usage per provider
	ratelimit_usage_pct{provider="coingecko"} 87.5

	# Gauge: Freshness staleness per provider
	freshness_staleness_min{provider="binance"} 3.2

	# Counter: Invalid request count
	chart_invalid_requests_total{endpoint="rate_limit_history", reason="invalid_provider"} 23
	```

	### Recommended Alerts

	```yaml
	# Critical: Rate limit exhaustion
	- alert: RateLimitExhaustion
	expr: ratelimit_usage_pct > 90
	for: 3h
	annotations:
	summary: "Provider {{ $labels.provider }} at {{ $value }}% rate limit"
	action: "Add API keys or reduce request frequency"

	# Critical: Data staleness
	- alert: DataStale
	expr: freshness_staleness_min > ttl_min
	for: 15m
	annotations:
	summary: "Provider {{ $labels.provider }} data is stale ({{ $value }}m old)"
	action: "Check scheduler, verify API connectivity"

	# Warning: Chart endpoint slow
	- alert: ChartEndpointSlow
	expr: histogram_quantile(0.95, chart_response_time_seconds) > 0.2
	for: 10m
	annotations:
	summary: "Chart endpoint P95 latency above 200ms"
	action: "Check database query performance"
	```

	---

	## Database Schema

	### Tables Used

	RateLimitUsage
	```sql
	CREATE TABLE rate_limit_usage (
	id INTEGER PRIMARY KEY,
	timestamp DATETIME NOT NULL, -- INDEXED
	provider_id INTEGER NOT NULL, -- FOREIGN KEY, INDEXED
	limit_type VARCHAR(20),
	limit_value INTEGER,
	current_usage INTEGER,
	percentage REAL,
	reset_time DATETIME
	);
	```

	DataCollection
	```sql
	CREATE TABLE data_collection (
	id INTEGER PRIMARY KEY,
	provider_id INTEGER NOT NULL, -- FOREIGN KEY, INDEXED
	actual_fetch_time DATETIME NOT NULL,
	data_timestamp DATETIME,
	staleness_minutes REAL,
	record_count INTEGER,
	on_schedule BOOLEAN
	);
	```

	---

	## Frontend Integration

	### Chart.js Example (Rate Limit)

	```javascript
	// Fetch rate limit history
	const response = await fetch('/api/charts/rate-limit-history?hours=48&providers=coingecko,cmc');
	const data = await response.json();

	// Build Chart.js dataset
	const datasets = data.map(series => ({
	label: series.provider,
	data: series.series.map(p => ({
	x: new Date(p.t),
	y: p.pct
	})),
	borderColor: getColorForProvider(series.provider),
	tension: 0.3
	}));

	// Create chart
	new Chart(ctx, {
	type: 'line',
	data: { datasets },
	options: {
	scales: {
	x: { type: 'time', time: { unit: 'hour' } },
	y: { min: 0, max: 100, title: { text: 'Usage %' } }
	},
	interaction: { mode: 'index', intersect: false },
	plugins: {
	legend: { display: true, position: 'bottom' },
	tooltip: {
	callbacks: {
	label: ctx => `${ctx.dataset.label}: ${ctx.parsed.y.toFixed(1)}%`
	}
	}
	}
	}
	});
	```

	### Chart.js Example (Freshness)

	```javascript
	// Fetch freshness history
	const response = await fetch('/api/charts/freshness-history?hours=72&providers=binance');
	const data = await response.json();

	// Build datasets with status-based colors
	const datasets = data.map(series => ({
	label: series.provider,
	data: series.series.map(p => ({
	x: new Date(p.t),
	y: p.staleness_min,
	status: p.status
	})),
	borderColor: getColorForProvider(series.provider),
	segment: {
	borderColor: ctx => {
	const point = ctx.p1.$context.raw;
	return point.status === 'fresh' ? 'green'
	: point.status === 'aging' ? 'orange'
	: 'red';
	}
	}
	}));

	// Create chart with TTL reference line
	new Chart(ctx, {
	type: 'line',
	data: { datasets },
	options: {
	scales: {
	x: { type: 'time' },
	y: { title: { text: 'Staleness (min)' } }
	},
	plugins: {
	annotation: {
	annotations: {
	ttl: {
	type: 'line',
	yMin: data[0].meta.default_ttl,
	yMax: data[0].meta.default_ttl,
	borderColor: 'rgba(255, 99, 132, 0.5)',
	borderWidth: 2,
	label: { content: 'TTL Threshold', enabled: true }
	}
	}
	}
	}
	}
	});
	```

	---

	## Troubleshooting

	### Common Issues

	1. Empty series returned

	- Check if providers have data in the time range
	- Verify provider names are correct (case-sensitive)
	- Ensure database has historical data

	2. Response time > 500ms

	- Check database indexes exist
	- Reduce `hours` parameter
	- Limit number of providers
	- Consider adding caching layer

	3. 400 Bad Request on valid provider

	- Verify provider is in database: `SELECT name FROM providers`
	- Check for typos or case mismatch
	- Ensure provider has not been renamed

	4. Missing data points (gaps in series)

	- Normal behavior: gaps filled with zeros/999.0
	- Check data collection scheduler is running
	- Review logs for collection failures

	---

	## Changelog

	### v1.0.0 - 2025-11-11

	Added:
	- `/api/charts/rate-limit-history` endpoint
	- `/api/charts/freshness-history` endpoint
	- Comprehensive input validation
	- Security hardening (allow-list, clamping, sanitization)
	- Automated test suite (pytest)
	- CLI sanity check script
	- Full API documentation

	Security:
	- SQL injection prevention
	- XSS prevention
	- Parameter validation and clamping
	- Allow-list enforcement for providers
	- Max provider limit (5)

	Testing:
	- 20+ automated tests
	- Schema validation tests
	- Security tests
	- Performance tests
	- Edge case coverage

	---

	## Future Enhancements

	### Phase 2 (Optional)

	1. Provider Picker UI Component
	- Dropdown with multi-select (max 5)
	- Persist selection in localStorage
	- Auto-refresh on selection change

	2. Advanced Filtering
	- Filter by category
	- Filter by rate limit status (ok/warning/critical)
	- Filter by freshness status (fresh/aging/stale)

	3. Aggregation Options
	- Category-level aggregation
	- System-wide average/percentile
	- Compare providers side-by-side

	4. Export Functionality
	- CSV export
	- JSON export
	- PNG/SVG chart export

	5. Real-time Updates
	- WebSocket streaming for live updates
	- Auto-refresh without flicker
	- Smooth transitions on new data

	6. Historical Analysis
	- Trend detection (improving/degrading)
	- Anomaly detection
	- Predictive alerts

	---

	## Support & Maintenance

	### Code Location

	- Endpoints: `api/endpoints.py` (lines 947-1250)
	- Tests: `tests/test_charts.py`
	- Sanity checks: `tests/sanity_checks.sh`
	- Documentation: `CHARTS_VALIDATION_DOCUMENTATION.md`

	### Contact

	For issues or questions:
	- Create GitHub issue with `[charts]` prefix
	- Tag: `enhancement`, `bug`, or `documentation`
	- Provide: Request details, expected vs actual behavior, logs

	---

	## License

	Same as parent project.