| # Auto Provider Loader (APL) - Usage Guide | |
| **Version:** 1.0 | |
| **Last Updated:** 2025-11-16 | |
| **Status:** PRODUCTION READY β | |
| --- | |
| ## Overview | |
| The Auto Provider Loader (APL) is a **real-data-only** system that automatically discovers, validates, and integrates cryptocurrency data providers (both HTTP APIs and Hugging Face models) into your application. | |
| ### Key Features | |
| - π **Automatic Discovery** - Scans JSON resources for provider definitions | |
| - β **Real Validation** - Tests each provider with actual API calls (NO MOCKS) | |
| - π§ **Smart Integration** - Automatically adds valid providers to config | |
| - π **Comprehensive Reports** - Generates detailed validation reports | |
| - β‘ **Performance Optimized** - Parallel validation with configurable timeouts | |
| - π‘οΈ **Auth Handling** - Detects and handles API key requirements | |
| --- | |
| ## Architecture | |
| ### Components | |
| 1. **provider_validator.py** - Core validation engine | |
| - Validates HTTP JSON APIs | |
| - Validates HTTP RPC endpoints | |
| - Validates Hugging Face models | |
| - Handles authentication requirements | |
| 2. **auto_provider_loader.py** - Discovery and orchestration | |
| - Scans resource files | |
| - Coordinates validation | |
| - Integrates valid providers | |
| - Generates reports | |
| ### Provider Types Supported | |
| | Type | Description | Example | | |
| |------|-------------|---------| | |
| | `HTTP_JSON` | REST APIs returning JSON | CoinGecko, CoinPaprika | | |
| | `HTTP_RPC` | JSON-RPC endpoints | Ethereum nodes, BSC RPC | | |
| | `WEBSOCKET` | WebSocket connections | Alchemy WS, real-time feeds | | |
| | `HF_MODEL` | Hugging Face models | Sentiment analysis models | | |
| --- | |
| ## Quick Start | |
| ### 1. Basic Usage | |
| Run the APL to discover and validate all providers: | |
| ```bash | |
| cd /workspace | |
| python3 auto_provider_loader.py | |
| ``` | |
| This will: | |
| - Scan `api-resources/*.json` for provider definitions | |
| - Scan `providers_config*.json` for existing providers | |
| - Discover HF models from `backend/services/` | |
| - Validate each provider with real API calls | |
| - Generate comprehensive reports | |
| - Update `providers_config_extended.json` with valid providers | |
| ### 2. Understanding Output | |
| ``` | |
| ================================================================================ | |
| π AUTO PROVIDER LOADER (APL) - REAL DATA ONLY | |
| ================================================================================ | |
| π‘ PHASE 1: DISCOVERY | |
| Found 339 HTTP provider candidates | |
| Found 4 HF model candidates | |
| π¬ PHASE 2: VALIDATION | |
| β Valid providers | |
| β Invalid providers | |
| β οΈ Conditionally available (requires auth) | |
| π PHASE 3: COMPUTING STATISTICS | |
| π§ PHASE 4: INTEGRATION | |
| π PHASE 5: GENERATING REPORTS | |
| ``` | |
| ### 3. Generated Files | |
| After running APL, you'll find: | |
| - `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Human-readable report | |
| - `PROVIDER_AUTO_DISCOVERY_REPORT.json` - Machine-readable detailed results | |
| - `providers_config_extended.backup.{timestamp}.json` - Config backup | |
| - `providers_config_extended.json` - Updated with new valid providers | |
| --- | |
| ## Validation Logic | |
| ### HTTP Providers | |
| For each HTTP provider, APL: | |
| 1. **Checks URL structure** | |
| - Detects placeholder variables (`{API_KEY}`, `{PROJECT_ID}`) | |
| - Identifies WebSocket endpoints (`ws://`, `wss://`) | |
| 2. **Determines endpoint type** | |
| - JSON REST API β GET request to test endpoint | |
| - JSON-RPC β POST request with `eth_blockNumber` method | |
| 3. **Makes real test call** | |
| - 8-second timeout | |
| - Handles redirects | |
| - Validates response format | |
| 4. **Classifies result** | |
| - β `VALID` - Responds with 200 OK and valid data | |
| - β `INVALID` - Connection fails, timeout, or error response | |
| - β οΈ `CONDITIONALLY_AVAILABLE` - Requires API key (401/403) | |
| - βοΈ `SKIPPED` - WebSocket (requires separate validation) | |
| ### Hugging Face Models | |
| For each HF model, APL: | |
| 1. **Queries HF Hub API** | |
| - Checks if model exists: `GET https://huggingface.co/api/models/{model_id}` | |
| - Does NOT download or load the full model (saves time/resources) | |
| 2. **Validates accessibility** | |
| - β `VALID` - Model found and publicly accessible | |
| - β οΈ `CONDITIONALLY_AVAILABLE` - Requires HF_TOKEN | |
| - β `INVALID` - Model not found (404) or other error | |
| --- | |
| ## Configuration | |
| ### Environment Variables | |
| APL respects these environment variables: | |
| | Variable | Purpose | Default | | |
| |----------|---------|---------| | |
| | `HF_TOKEN` | Hugging Face API token | None | | |
| | `ETHERSCAN_API_KEY` | Etherscan API key | None | | |
| | `BSCSCAN_API_KEY` | BSCScan API key | None | | |
| | `INFURA_PROJECT_ID` | Infura project ID | None | | |
| | `ALCHEMY_API_KEY` | Alchemy API key | None | | |
| ### Validation Timeout | |
| Default timeout is 8 seconds. To customize: | |
| ```python | |
| from auto_provider_loader import AutoProviderLoader | |
| apl = AutoProviderLoader() | |
| apl.validator.timeout = 15.0 # 15 seconds | |
| await apl.run() | |
| ``` | |
| --- | |
| ## Adding New Provider Sources | |
| ### 1. Add to JSON Resources | |
| Create or update a JSON file in `api-resources/`: | |
| ```json | |
| { | |
| "registry": { | |
| "my_providers": [ | |
| { | |
| "id": "my_api", | |
| "name": "My API", | |
| "category": "market_data", | |
| "base_url": "https://api.example.com/v1", | |
| "endpoints": { | |
| "prices": "/prices" | |
| }, | |
| "auth": { | |
| "type": "none" | |
| } | |
| } | |
| ] | |
| } | |
| } | |
| ``` | |
| ### 2. Re-run APL | |
| ```bash | |
| python3 auto_provider_loader.py | |
| ``` | |
| APL will automatically discover and validate your new provider. | |
| --- | |
| ## Integration with Existing Code | |
| ### Using Validated Providers | |
| After APL runs, valid providers are in `providers_config_extended.json`: | |
| ```python | |
| import json | |
| # Load validated providers | |
| with open('providers_config_extended.json', 'r') as f: | |
| config = json.load(f) | |
| # Get all valid providers | |
| valid_providers = config['providers'] | |
| # Use a specific provider | |
| coingecko = valid_providers['coingecko'] | |
| print(f"Provider: {coingecko['name']}") | |
| print(f"Category: {coingecko['category']}") | |
| print(f"Response time: {coingecko['response_time_ms']}ms") | |
| ``` | |
| ### Filtering by Category | |
| ```python | |
| # Get all market data providers | |
| market_providers = { | |
| pid: data for pid, data in valid_providers.items() | |
| if data.get('category') == 'market_data' | |
| } | |
| ``` | |
| --- | |
| ## Conditional Providers | |
| Providers marked as `CONDITIONALLY_AVAILABLE` require API keys: | |
| ### 1. Check Requirements | |
| See `PROVIDER_AUTO_DISCOVERY_REPORT.md` for required env vars: | |
| ```markdown | |
| ### Conditionally Available Providers (90) | |
| - **Etherscan** (`etherscan_primary`) | |
| - Required: `ETHERSCAN_PRIMARY_API_KEY` environment variable | |
| - Reason: HTTP 401 - Requires authentication | |
| ``` | |
| ### 2. Set Environment Variables | |
| ```bash | |
| export ETHERSCAN_API_KEY="your_key_here" | |
| export BSCSCAN_API_KEY="your_key_here" | |
| ``` | |
| ### 3. Re-run Validation | |
| ```bash | |
| python3 auto_provider_loader.py | |
| ``` | |
| Previously conditional providers will now validate as VALID if keys are correct. | |
| --- | |
| ## Performance Tuning | |
| ### Parallel Validation | |
| HTTP providers are validated in batches of 10 to balance speed and resource usage: | |
| ```python | |
| # In auto_provider_loader.py | |
| batch_size = 10 # Adjust based on your needs | |
| ``` | |
| Larger batches = faster but more network load | |
| Smaller batches = slower but more conservative | |
| ### Timeout Adjustment | |
| For slow or distant APIs: | |
| ```python | |
| validator = ProviderValidator(timeout=15.0) # 15 seconds | |
| ``` | |
| --- | |
| ## Troubleshooting | |
| ### Issue: Many providers marked INVALID | |
| **Possible causes:** | |
| - Network connectivity issues | |
| - Rate limiting (try again later) | |
| - Providers genuinely down | |
| **Solution:** Check individual error reasons in report | |
| ### Issue: All providers CONDITIONALLY_AVAILABLE | |
| **Cause:** Most providers require API keys | |
| **Solution:** Set required environment variables | |
| ### Issue: HF models all INVALID | |
| **Causes:** | |
| - No internet connection to HuggingFace | |
| - Models moved or renamed | |
| - Rate limiting from HF Hub | |
| **Solution:** Check HF Hub status, verify model IDs | |
| ### Issue: Validation takes too long | |
| **Solutions:** | |
| - Reduce batch size | |
| - Decrease timeout | |
| - Filter providers before validation | |
| --- | |
| ## Advanced Usage | |
| ### Validating Specific Providers | |
| ```python | |
| from provider_validator import ProviderValidator | |
| import asyncio | |
| async def validate_one(): | |
| validator = ProviderValidator() | |
| result = await validator.validate_http_provider( | |
| "coingecko", | |
| { | |
| "name": "CoinGecko", | |
| "category": "market_data", | |
| "base_url": "https://api.coingecko.com/api/v3", | |
| "endpoints": {"ping": "/ping"} | |
| } | |
| ) | |
| print(f"Status: {result.status}") | |
| print(f"Response time: {result.response_time_ms}ms") | |
| asyncio.run(validate_one()) | |
| ``` | |
| ### Custom Discovery Logic | |
| ```python | |
| from auto_provider_loader import AutoProviderLoader | |
| class CustomAPL(AutoProviderLoader): | |
| def discover_http_providers(self): | |
| # Your custom logic | |
| providers = super().discover_http_providers() | |
| # Filter or augment | |
| return [p for p in providers if p['data'].get('free') == True] | |
| apl = CustomAPL() | |
| await apl.run() | |
| ``` | |
| --- | |
| ## API Reference | |
| ### ProviderValidator | |
| ```python | |
| class ProviderValidator: | |
| def __init__(self, timeout: float = 10.0) | |
| async def validate_http_provider( | |
| provider_id: str, | |
| provider_data: Dict[str, Any] | |
| ) -> ValidationResult | |
| async def validate_hf_model( | |
| model_id: str, | |
| model_name: str, | |
| pipeline_tag: str = "sentiment-analysis" | |
| ) -> ValidationResult | |
| def get_summary() -> Dict[str, Any] | |
| ``` | |
| ### AutoProviderLoader | |
| ```python | |
| class AutoProviderLoader: | |
| def __init__(self, workspace_root: str = "/workspace") | |
| def discover_http_providers() -> List[Dict[str, Any]] | |
| def discover_hf_models() -> List[Dict[str, Any]] | |
| async def validate_all_http_providers(providers: List) | |
| async def validate_all_hf_models(models: List) | |
| def integrate_valid_providers() -> Dict[str, Any] | |
| def generate_reports() | |
| async def run() # Main entry point | |
| ``` | |
| --- | |
| ## Best Practices | |
| 1. **Regular Re-validation** | |
| - Run APL weekly to catch provider changes | |
| - Providers can go offline or change endpoints | |
| 2. **Monitor Conditional Providers** | |
| - Set up API keys for high-value providers | |
| - Track which providers need auth | |
| 3. **Review Reports** | |
| - Check invalid providers for patterns | |
| - Update configs based on error reasons | |
| 4. **Backup Configs** | |
| - APL creates automatic backups | |
| - Keep manual backups before major changes | |
| 5. **Test Integration** | |
| - After APL runs, test your application | |
| - Verify new providers work in your context | |
| --- | |
| ## Zero Mock/Fake Data Guarantee | |
| **APL NEVER uses mock or fake data.** | |
| - All validations are REAL API calls | |
| - All response times are ACTUAL measurements | |
| - All status classifications based on REAL responses | |
| - Invalid providers are GENUINELY unreachable | |
| - Valid providers are GENUINELY functional | |
| This guarantee ensures: | |
| - Production-ready validation results | |
| - Accurate performance metrics | |
| - Trustworthy provider recommendations | |
| - No surprises in production | |
| --- | |
| ## Support | |
| ### Documentation | |
| - `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Latest validation results | |
| - `APL_FINAL_SUMMARY.md` - Implementation summary | |
| - This guide - Usage instructions | |
| ### Common Questions | |
| **Q: Can I use APL in CI/CD?** | |
| A: Yes! Run `python3 auto_provider_loader.py` in your pipeline. | |
| **Q: How often should I run APL?** | |
| A: Weekly for production, daily for development. | |
| **Q: Can I add custom provider types?** | |
| A: Yes, extend `ProviderValidator` class with new validation methods. | |
| **Q: Does APL support GraphQL APIs?** | |
| A: Not yet, but you can extend it by adding GraphQL validation logic. | |
| --- | |
| ## Version History | |
| ### v1.0 (2025-11-16) | |
| - Initial release | |
| - HTTP JSON validation | |
| - HTTP RPC validation | |
| - HF model validation (API-based, lightweight) | |
| - Automatic discovery from JSON resources | |
| - Comprehensive reporting | |
| - Zero mock data guarantee | |
| --- | |
| *Auto Provider Loader - Real Data Only, Always.* | |