# Auto Provider Loader (APL) - Usage Guide

**Version:** 1.0
**Last Updated:** 2025-11-16
**Status:** PRODUCTION READY ✅

---

## Overview

The Auto Provider Loader (APL) is a **real-data-only** system that automatically discovers, validates, and integrates cryptocurrency data providers (both HTTP APIs and Hugging Face models) into your application.

### Key Features

- 🔍 **Automatic Discovery** - Scans JSON resources for provider definitions
- ✅ **Real Validation** - Tests each provider with actual API calls (NO MOCKS)
- 🔧 **Smart Integration** - Automatically adds valid providers to config
- 📊 **Comprehensive Reports** - Generates detailed validation reports
- ⚡ **Performance Optimized** - Parallel validation with configurable timeouts
- 🛡️ **Auth Handling** - Detects and handles API key requirements

---

## Architecture

### Components

1. **provider_validator.py** - Core validation engine
   - Validates HTTP JSON APIs
   - Validates HTTP RPC endpoints
   - Validates Hugging Face models
   - Handles authentication requirements

2. **auto_provider_loader.py** - Discovery and orchestration
   - Scans resource files
   - Coordinates validation
   - Integrates valid providers
   - Generates reports

### Provider Types Supported

| Type | Description | Example |
|------|-------------|---------|
| `HTTP_JSON` | REST APIs returning JSON | CoinGecko, CoinPaprika |
| `HTTP_RPC` | JSON-RPC endpoints | Ethereum nodes, BSC RPC |
| `WEBSOCKET` | WebSocket connections | Alchemy WS, real-time feeds |
| `HF_MODEL` | Hugging Face models | Sentiment analysis models |

---

## Quick Start

### 1. Basic Usage

Run the APL to discover and validate all providers:

```bash
cd /workspace
python3 auto_provider_loader.py
```

This will:

- Scan `api-resources/*.json` for provider definitions
- Scan `providers_config*.json` for existing providers
- Discover HF models from `backend/services/`
- Validate each provider with real API calls
- Generate comprehensive reports
- Update `providers_config_extended.json` with valid providers

### 2. Understanding Output

```
================================================================================
🚀 AUTO PROVIDER LOADER (APL) - REAL DATA ONLY
================================================================================

📡 PHASE 1: DISCOVERY
  Found 339 HTTP provider candidates
  Found 4 HF model candidates

🔬 PHASE 2: VALIDATION
  ✅ Valid providers
  ❌ Invalid providers
  ⚠️ Conditionally available (requires auth)

📊 PHASE 3: COMPUTING STATISTICS
🔧 PHASE 4: INTEGRATION
📝 PHASE 5: GENERATING REPORTS
```

### 3. Generated Files

After running APL, you'll find:

- `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Human-readable report
- `PROVIDER_AUTO_DISCOVERY_REPORT.json` - Machine-readable detailed results
- `providers_config_extended.backup.{timestamp}.json` - Config backup
- `providers_config_extended.json` - Updated with new valid providers

---

## Validation Logic

### HTTP Providers

For each HTTP provider, APL:

1. **Checks URL structure**
   - Detects placeholder variables (`{API_KEY}`, `{PROJECT_ID}`)
   - Identifies WebSocket endpoints (`ws://`, `wss://`)

2. **Determines endpoint type**
   - JSON REST API → GET request to test endpoint
   - JSON-RPC → POST request with `eth_blockNumber` method

3. **Makes real test call**
   - 8-second timeout
   - Handles redirects
   - Validates response format

4. **Classifies result**
   - ✅ `VALID` - Responds with 200 OK and valid data
   - ❌ `INVALID` - Connection fails, timeout, or error response
   - ⚠️ `CONDITIONALLY_AVAILABLE` - Requires API key (401/403)
   - ⏭️ `SKIPPED` - WebSocket (requires separate validation)

### Hugging Face Models

For each HF model, APL:

1. **Queries HF Hub API**
   - Checks if model exists: `GET https://huggingface.co/api/models/{model_id}`
   - Does NOT download or load the full model (saves time/resources)

2. **Validates accessibility**
   - ✅ `VALID` - Model found and publicly accessible
   - ⚠️ `CONDITIONALLY_AVAILABLE` - Requires HF_TOKEN
   - ❌ `INVALID` - Model not found (404) or other error

---

## Configuration

### Environment Variables

APL respects these environment variables:

| Variable | Purpose | Default |
|----------|---------|---------|
| `HF_TOKEN` | Hugging Face API token | None |
| `ETHERSCAN_API_KEY` | Etherscan API key | None |
| `BSCSCAN_API_KEY` | BSCScan API key | None |
| `INFURA_PROJECT_ID` | Infura project ID | None |
| `ALCHEMY_API_KEY` | Alchemy API key | None |

### Validation Timeout

Default timeout is 8 seconds. To customize:

```python
import asyncio

from auto_provider_loader import AutoProviderLoader

apl = AutoProviderLoader()
apl.validator.timeout = 15.0  # 15 seconds

# run() is a coroutine, so drive it with asyncio.run() from a script
asyncio.run(apl.run())
```

---

## Adding New Provider Sources

### 1. Add to JSON Resources

Create or update a JSON file in `api-resources/`:

```json
{
  "registry": {
    "my_providers": [
      {
        "id": "my_api",
        "name": "My API",
        "category": "market_data",
        "base_url": "https://api.example.com/v1",
        "endpoints": {
          "prices": "/prices"
        },
        "auth": {
          "type": "none"
        }
      }
    ]
  }
}
```

### 2. Re-run APL

```bash
python3 auto_provider_loader.py
```

APL will automatically discover and validate your new provider.
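
Before re-running APL, it can help to sanity-check a new entry locally. The following is a minimal sketch, not part of APL: `precheck_provider`, `REQUIRED_FIELDS`, and `PLACEHOLDER_RE` are hypothetical names, and the set of required fields is an assumption based on the JSON example above. It mirrors the URL checks described under Validation Logic (placeholder variables and WebSocket schemes) without making any network call.

```python
import re
from typing import Any, Dict, List

# Matches placeholder variables such as {API_KEY} or {PROJECT_ID}.
PLACEHOLDER_RE = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")

# Assumption: fields taken from the example entry in this guide,
# not from APL's actual schema.
REQUIRED_FIELDS = ("id", "name", "base_url")


def precheck_provider(entry: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for one provider entry (no network I/O)."""
    warnings: List[str] = []

    # Flag missing or empty required fields.
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            warnings.append(f"missing field: {field}")

    base_url = str(entry.get("base_url") or "")

    # WebSocket endpoints are classified as SKIPPED by the HTTP validation.
    if base_url.startswith(("ws://", "wss://")):
        warnings.append("WebSocket endpoint: likely to be SKIPPED")
    elif base_url and not base_url.startswith(("http://", "https://")):
        warnings.append("base_url is not an HTTP(S) URL")

    # Placeholders usually mean an API key or project ID must be supplied.
    for name in PLACEHOLDER_RE.findall(base_url):
        warnings.append(f"placeholder {{{name}}}: likely needs an API key")

    return warnings


if __name__ == "__main__":
    entry = {
        "id": "my_api",
        "name": "My API",
        "base_url": "https://api.example.com/v1",
    }
    # An empty list means the entry passed these local checks.
    print(precheck_provider(entry))
```

An empty result only means the entry is well-formed by these local checks; whether the provider is `VALID` is still decided by APL's real API call.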
---

## Integration with Existing Code

### Using Validated Providers

After APL runs, valid providers are in `providers_config_extended.json`:

```python
import json

# Load validated providers
with open('providers_config_extended.json', 'r') as f:
    config = json.load(f)

# Get all valid providers
valid_providers = config['providers']

# Use a specific provider
coingecko = valid_providers['coingecko']
print(f"Provider: {coingecko['name']}")
print(f"Category: {coingecko['category']}")
print(f"Response time: {coingecko['response_time_ms']}ms")
```

### Filtering by Category

```python
# Get all market data providers
market_providers = {
    pid: data
    for pid, data in valid_providers.items()
    if data.get('category') == 'market_data'
}
```

---

## Conditional Providers

Providers marked as `CONDITIONALLY_AVAILABLE` require API keys:

### 1. Check Requirements

See `PROVIDER_AUTO_DISCOVERY_REPORT.md` for required env vars:

```markdown
### Conditionally Available Providers (90)

- **Etherscan** (`etherscan_primary`)
  - Required: `ETHERSCAN_PRIMARY_API_KEY` environment variable
  - Reason: HTTP 401 - Requires authentication
```

### 2. Set Environment Variables

```bash
export ETHERSCAN_API_KEY="your_key_here"
export BSCSCAN_API_KEY="your_key_here"
```

### 3. Re-run Validation

```bash
python3 auto_provider_loader.py
```

Previously conditional providers will now validate as VALID if keys are correct.
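
To see which keys are still unset before re-running, a small check like the one below may be useful. It is a sketch under stated assumptions: `missing_keys` and `KNOWN_KEYS` are hypothetical names, not part of APL, and the variable list is copied from the Environment Variables table in this guide. The report may name other variables for individual providers, so treat the report as the authority.

```python
import os
from typing import List, Mapping, Optional

# Variable names copied from the Environment Variables table in this guide.
KNOWN_KEYS = (
    "HF_TOKEN",
    "ETHERSCAN_API_KEY",
    "BSCSCAN_API_KEY",
    "INFURA_PROJECT_ID",
    "ALCHEMY_API_KEY",
)


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the known variables that are unset or blank in `env`."""
    # Default to the real process environment when no mapping is passed.
    env = os.environ if env is None else env
    return [name for name in KNOWN_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys():
        print(f"not set: {name}")
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test. Note that it only checks that a value is present, not that the key is accepted by the provider.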
---

## Performance Tuning

### Parallel Validation

HTTP providers are validated in batches of 10 to balance speed and resource usage:

```python
# In auto_provider_loader.py
batch_size = 10  # Adjust based on your needs
```

- Larger batches = faster but more network load
- Smaller batches = slower but more conservative

### Timeout Adjustment

For slow or distant APIs:

```python
from provider_validator import ProviderValidator

validator = ProviderValidator(timeout=15.0)  # 15 seconds
```

---

## Troubleshooting

### Issue: Many providers marked INVALID

**Possible causes:**

- Network connectivity issues
- Rate limiting (try again later)
- Providers genuinely down

**Solution:** Check the individual error reasons in the report

### Issue: All providers CONDITIONALLY_AVAILABLE

**Cause:** Most providers require API keys

**Solution:** Set the required environment variables

### Issue: HF models all INVALID

**Causes:**

- No internet connection to HuggingFace
- Models moved or renamed
- Rate limiting from HF Hub

**Solution:** Check HF Hub status, verify model IDs

### Issue: Validation takes too long

**Solutions:**

- Reduce batch size
- Decrease timeout
- Filter providers before validation

---

## Advanced Usage

### Validating Specific Providers

```python
from provider_validator import ProviderValidator
import asyncio

async def validate_one():
    validator = ProviderValidator()

    result = await validator.validate_http_provider(
        "coingecko",
        {
            "name": "CoinGecko",
            "category": "market_data",
            "base_url": "https://api.coingecko.com/api/v3",
            "endpoints": {"ping": "/ping"}
        }
    )

    print(f"Status: {result.status}")
    print(f"Response time: {result.response_time_ms}ms")

asyncio.run(validate_one())
```

### Custom Discovery Logic

```python
import asyncio

from auto_provider_loader import AutoProviderLoader

class CustomAPL(AutoProviderLoader):
    def discover_http_providers(self):
        # Your custom logic
        providers = super().discover_http_providers()

        # Filter or augment: keep only providers flagged as free
        return [p for p in providers if p['data'].get('free') is True]

apl = CustomAPL()

# run() is a coroutine, so drive it with asyncio.run() from a script
asyncio.run(apl.run())
```

---

## API Reference

### ProviderValidator

```python
class ProviderValidator:
    def __init__(self, timeout: float = 10.0)

    async def validate_http_provider(
        provider_id: str,
        provider_data: Dict[str, Any]
    ) -> ValidationResult

    async def validate_hf_model(
        model_id: str,
        model_name: str,
        pipeline_tag: str = "sentiment-analysis"
    ) -> ValidationResult

    def get_summary() -> Dict[str, Any]
```

### AutoProviderLoader

```python
class AutoProviderLoader:
    def __init__(self, workspace_root: str = "/workspace")

    def discover_http_providers() -> List[Dict[str, Any]]
    def discover_hf_models() -> List[Dict[str, Any]]

    async def validate_all_http_providers(providers: List)
    async def validate_all_hf_models(models: List)

    def integrate_valid_providers() -> Dict[str, Any]
    def generate_reports()

    async def run()  # Main entry point
```

---

## Best Practices

1. **Regular Re-validation**
   - Run APL weekly to catch provider changes
   - Providers can go offline or change endpoints

2. **Monitor Conditional Providers**
   - Set up API keys for high-value providers
   - Track which providers need auth

3. **Review Reports**
   - Check invalid providers for patterns
   - Update configs based on error reasons

4. **Backup Configs**
   - APL creates automatic backups
   - Keep manual backups before major changes

5. **Test Integration**
   - After APL runs, test your application
   - Verify new providers work in your context

---

## Zero Mock/Fake Data Guarantee

**APL NEVER uses mock or fake data.**

- All validations are REAL API calls
- All response times are ACTUAL measurements
- All status classifications are based on REAL responses
- Invalid providers are GENUINELY unreachable
- Valid providers are GENUINELY functional

This guarantee ensures:

- Production-ready validation results
- Accurate performance metrics
- Trustworthy provider recommendations
- No surprises in production

---

## Support

### Documentation

- `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Latest validation results
- `APL_FINAL_SUMMARY.md` - Implementation summary
- This guide - Usage instructions

### Common Questions

**Q: Can I use APL in CI/CD?**
A: Yes! Run `python3 auto_provider_loader.py` in your pipeline.

**Q: How often should I run APL?**
A: Weekly for production, daily for development.

**Q: Can I add custom provider types?**
A: Yes, extend the `ProviderValidator` class with new validation methods.

**Q: Does APL support GraphQL APIs?**
A: Not yet, but you can extend it by adding GraphQL validation logic.

---

## Version History

### v1.0 (2025-11-16)

- Initial release
- HTTP JSON validation
- HTTP RPC validation
- HF model validation (API-based, lightweight)
- Automatic discovery from JSON resources
- Comprehensive reporting
- Zero mock data guarantee

---

*Auto Provider Loader - Real Data Only, Always.*