Datasourceforcryptocurrency / APL_USAGE_GUIDE.md
Really-amin's picture
Upload 325 files
9d92c17 verified
# Auto Provider Loader (APL) - Usage Guide
**Version:** 1.0
**Last Updated:** 2025-11-16
**Status:** PRODUCTION READY βœ…
---
## Overview
The Auto Provider Loader (APL) is a **real-data-only** system that automatically discovers, validates, and integrates cryptocurrency data providers (both HTTP APIs and Hugging Face models) into your application.
### Key Features
- πŸ” **Automatic Discovery** - Scans JSON resources for provider definitions
- βœ… **Real Validation** - Tests each provider with actual API calls (NO MOCKS)
- πŸ”§ **Smart Integration** - Automatically adds valid providers to config
- πŸ“Š **Comprehensive Reports** - Generates detailed validation reports
- ⚑ **Performance Optimized** - Parallel validation with configurable timeouts
- πŸ›‘οΈ **Auth Handling** - Detects and handles API key requirements
---
## Architecture
### Components
1. **provider_validator.py** - Core validation engine
- Validates HTTP JSON APIs
- Validates HTTP RPC endpoints
- Validates Hugging Face models
- Handles authentication requirements
2. **auto_provider_loader.py** - Discovery and orchestration
- Scans resource files
- Coordinates validation
- Integrates valid providers
- Generates reports
### Provider Types Supported
| Type | Description | Example |
|------|-------------|---------|
| `HTTP_JSON` | REST APIs returning JSON | CoinGecko, CoinPaprika |
| `HTTP_RPC` | JSON-RPC endpoints | Ethereum nodes, BSC RPC |
| `WEBSOCKET` | WebSocket connections | Alchemy WS, real-time feeds |
| `HF_MODEL` | Hugging Face models | Sentiment analysis models |
---
## Quick Start
### 1. Basic Usage
Run the APL to discover and validate all providers:
```bash
cd /workspace
python3 auto_provider_loader.py
```
This will:
- Scan `api-resources/*.json` for provider definitions
- Scan `providers_config*.json` for existing providers
- Discover HF models from `backend/services/`
- Validate each provider with real API calls
- Generate comprehensive reports
- Update `providers_config_extended.json` with valid providers
### 2. Understanding Output
```
================================================================================
πŸš€ AUTO PROVIDER LOADER (APL) - REAL DATA ONLY
================================================================================
πŸ“‘ PHASE 1: DISCOVERY
Found 339 HTTP provider candidates
Found 4 HF model candidates
πŸ”¬ PHASE 2: VALIDATION
βœ… Valid providers
❌ Invalid providers
⚠️ Conditionally available (requires auth)
πŸ“Š PHASE 3: COMPUTING STATISTICS
πŸ”§ PHASE 4: INTEGRATION
πŸ“ PHASE 5: GENERATING REPORTS
```
### 3. Generated Files
After running APL, you'll find:
- `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Human-readable report
- `PROVIDER_AUTO_DISCOVERY_REPORT.json` - Machine-readable detailed results
- `providers_config_extended.backup.{timestamp}.json` - Config backup
- `providers_config_extended.json` - Updated with new valid providers
---
## Validation Logic
### HTTP Providers
For each HTTP provider, APL:
1. **Checks URL structure**
- Detects placeholder variables (`{API_KEY}`, `{PROJECT_ID}`)
- Identifies WebSocket endpoints (`ws://`, `wss://`)
2. **Determines endpoint type**
- JSON REST API β†’ GET request to test endpoint
- JSON-RPC β†’ POST request with `eth_blockNumber` method
3. **Makes real test call**
- 8-second timeout
- Handles redirects
- Validates response format
4. **Classifies result**
- βœ… `VALID` - Responds with 200 OK and valid data
- ❌ `INVALID` - Connection fails, timeout, or error response
- ⚠️ `CONDITIONALLY_AVAILABLE` - Requires API key (401/403)
- ⏭️ `SKIPPED` - WebSocket (requires separate validation)
### Hugging Face Models
For each HF model, APL:
1. **Queries HF Hub API**
- Checks if model exists: `GET https://huggingface.co/api/models/{model_id}`
- Does NOT download or load the full model (saves time/resources)
2. **Validates accessibility**
- βœ… `VALID` - Model found and publicly accessible
- ⚠️ `CONDITIONALLY_AVAILABLE` - Requires HF_TOKEN
- ❌ `INVALID` - Model not found (404) or other error
---
## Configuration
### Environment Variables
APL respects these environment variables:
| Variable | Purpose | Default |
|----------|---------|---------|
| `HF_TOKEN` | Hugging Face API token | None |
| `ETHERSCAN_API_KEY` | Etherscan API key | None |
| `BSCSCAN_API_KEY` | BSCScan API key | None |
| `INFURA_PROJECT_ID` | Infura project ID | None |
| `ALCHEMY_API_KEY` | Alchemy API key | None |
### Validation Timeout
Default timeout is 8 seconds. To customize:
```python
from auto_provider_loader import AutoProviderLoader
apl = AutoProviderLoader()
apl.validator.timeout = 15.0 # 15 seconds
await apl.run()
```
---
## Adding New Provider Sources
### 1. Add to JSON Resources
Create or update a JSON file in `api-resources/`:
```json
{
"registry": {
"my_providers": [
{
"id": "my_api",
"name": "My API",
"category": "market_data",
"base_url": "https://api.example.com/v1",
"endpoints": {
"prices": "/prices"
},
"auth": {
"type": "none"
}
}
]
}
}
```
### 2. Re-run APL
```bash
python3 auto_provider_loader.py
```
APL will automatically discover and validate your new provider.
---
## Integration with Existing Code
### Using Validated Providers
After APL runs, valid providers are in `providers_config_extended.json`:
```python
import json
# Load validated providers
with open('providers_config_extended.json', 'r') as f:
config = json.load(f)
# Get all valid providers
valid_providers = config['providers']
# Use a specific provider
coingecko = valid_providers['coingecko']
print(f"Provider: {coingecko['name']}")
print(f"Category: {coingecko['category']}")
print(f"Response time: {coingecko['response_time_ms']}ms")
```
### Filtering by Category
```python
# Get all market data providers
market_providers = {
pid: data for pid, data in valid_providers.items()
if data.get('category') == 'market_data'
}
```
---
## Conditional Providers
Providers marked as `CONDITIONALLY_AVAILABLE` require API keys:
### 1. Check Requirements
See `PROVIDER_AUTO_DISCOVERY_REPORT.md` for required env vars:
```markdown
### Conditionally Available Providers (90)
- **Etherscan** (`etherscan_primary`)
- Required: `ETHERSCAN_PRIMARY_API_KEY` environment variable
- Reason: HTTP 401 - Requires authentication
```
### 2. Set Environment Variables
```bash
export ETHERSCAN_API_KEY="your_key_here"
export BSCSCAN_API_KEY="your_key_here"
```
### 3. Re-run Validation
```bash
python3 auto_provider_loader.py
```
Previously conditional providers will now validate as VALID if keys are correct.
---
## Performance Tuning
### Parallel Validation
HTTP providers are validated in batches of 10 to balance speed and resource usage:
```python
# In auto_provider_loader.py
batch_size = 10 # Adjust based on your needs
```
Larger batches = faster but more network load
Smaller batches = slower but more conservative
### Timeout Adjustment
For slow or distant APIs:
```python
validator = ProviderValidator(timeout=15.0) # 15 seconds
```
---
## Troubleshooting
### Issue: Many providers marked INVALID
**Possible causes:**
- Network connectivity issues
- Rate limiting (try again later)
- Providers genuinely down
**Solution:** Check individual error reasons in report
### Issue: All providers CONDITIONALLY_AVAILABLE
**Cause:** Most providers require API keys
**Solution:** Set required environment variables
### Issue: HF models all INVALID
**Causes:**
- No internet connection to HuggingFace
- Models moved or renamed
- Rate limiting from HF Hub
**Solution:** Check HF Hub status, verify model IDs
### Issue: Validation takes too long
**Solutions:**
- Reduce batch size
- Decrease timeout
- Filter providers before validation
---
## Advanced Usage
### Validating Specific Providers
```python
from provider_validator import ProviderValidator
import asyncio
async def validate_one():
validator = ProviderValidator()
result = await validator.validate_http_provider(
"coingecko",
{
"name": "CoinGecko",
"category": "market_data",
"base_url": "https://api.coingecko.com/api/v3",
"endpoints": {"ping": "/ping"}
}
)
print(f"Status: {result.status}")
print(f"Response time: {result.response_time_ms}ms")
asyncio.run(validate_one())
```
### Custom Discovery Logic
```python
from auto_provider_loader import AutoProviderLoader
class CustomAPL(AutoProviderLoader):
def discover_http_providers(self):
# Your custom logic
providers = super().discover_http_providers()
# Filter or augment
return [p for p in providers if p['data'].get('free') == True]
apl = CustomAPL()
await apl.run()
```
---
## API Reference
### ProviderValidator
```python
class ProviderValidator:
def __init__(self, timeout: float = 10.0)
async def validate_http_provider(
provider_id: str,
provider_data: Dict[str, Any]
) -> ValidationResult
async def validate_hf_model(
model_id: str,
model_name: str,
pipeline_tag: str = "sentiment-analysis"
) -> ValidationResult
def get_summary() -> Dict[str, Any]
```
### AutoProviderLoader
```python
class AutoProviderLoader:
def __init__(self, workspace_root: str = "/workspace")
def discover_http_providers() -> List[Dict[str, Any]]
def discover_hf_models() -> List[Dict[str, Any]]
async def validate_all_http_providers(providers: List)
async def validate_all_hf_models(models: List)
def integrate_valid_providers() -> Dict[str, Any]
def generate_reports()
async def run() # Main entry point
```
---
## Best Practices
1. **Regular Re-validation**
- Run APL weekly to catch provider changes
- Providers can go offline or change endpoints
2. **Monitor Conditional Providers**
- Set up API keys for high-value providers
- Track which providers need auth
3. **Review Reports**
- Check invalid providers for patterns
- Update configs based on error reasons
4. **Backup Configs**
- APL creates automatic backups
- Keep manual backups before major changes
5. **Test Integration**
- After APL runs, test your application
- Verify new providers work in your context
---
## Zero Mock/Fake Data Guarantee
**APL NEVER uses mock or fake data.**
- All validations are REAL API calls
- All response times are ACTUAL measurements
- All status classifications based on REAL responses
- Invalid providers are GENUINELY unreachable
- Valid providers are GENUINELY functional
This guarantee ensures:
- Production-ready validation results
- Accurate performance metrics
- Trustworthy provider recommendations
- No surprises in production
---
## Support
### Documentation
- `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Latest validation results
- `APL_FINAL_SUMMARY.md` - Implementation summary
- This guide - Usage instructions
### Common Questions
**Q: Can I use APL in CI/CD?**
A: Yes! Run `python3 auto_provider_loader.py` in your pipeline.
**Q: How often should I run APL?**
A: Weekly for production, daily for development.
**Q: Can I add custom provider types?**
A: Yes, extend `ProviderValidator` class with new validation methods.
**Q: Does APL support GraphQL APIs?**
A: Not yet, but you can extend it by adding GraphQL validation logic.
---
## Version History
### v1.0 (2025-11-16)
- Initial release
- HTTP JSON validation
- HTTP RPC validation
- HF model validation (API-based, lightweight)
- Automatic discovery from JSON resources
- Comprehensive reporting
- Zero mock data guarantee
---
*Auto Provider Loader - Real Data Only, Always.*