Spaces:

Really-amin
/

Datasourceforcryptocurrency

Running

App Files Files Community

Datasourceforcryptocurrency / APL_USAGE_GUIDE.md

Really-amin

Upload 325 files

9d92c17 verified 7 days ago

preview code

raw

history blame contribute delete

12.4 kB

	# Auto Provider Loader (APL) - Usage Guide

	Version: 1.0
	Last Updated: 2025-11-16
	Status: PRODUCTION READY ✅

	---

	## Overview

	The Auto Provider Loader (APL) is a real-data-only system that automatically discovers, validates, and integrates cryptocurrency data providers (both HTTP APIs and Hugging Face models) into your application.

	### Key Features

	- 🔍 Automatic Discovery - Scans JSON resources for provider definitions
	- ✅ Real Validation - Tests each provider with actual API calls (NO MOCKS)
	- 🔧 Smart Integration - Automatically adds valid providers to config
	- 📊 Comprehensive Reports - Generates detailed validation reports
	- ⚡ Performance Optimized - Parallel validation with configurable timeouts
	- 🛡️ Auth Handling - Detects and handles API key requirements

	---

	## Architecture

	### Components

	1. provider_validator.py - Core validation engine
	- Validates HTTP JSON APIs
	- Validates HTTP RPC endpoints
	- Validates Hugging Face models
	- Handles authentication requirements

	2. auto_provider_loader.py - Discovery and orchestration
	- Scans resource files
	- Coordinates validation
	- Integrates valid providers
	- Generates reports

	### Provider Types Supported

	\| Type \| Description \| Example \|
	\|------\|-------------\|---------\|
	\| `HTTP_JSON` \| REST APIs returning JSON \| CoinGecko, CoinPaprika \|
	\| `HTTP_RPC` \| JSON-RPC endpoints \| Ethereum nodes, BSC RPC \|
	\| `WEBSOCKET` \| WebSocket connections \| Alchemy WS, real-time feeds \|
	\| `HF_MODEL` \| Hugging Face models \| Sentiment analysis models \|

	---

	## Quick Start

	### 1. Basic Usage

	Run the APL to discover and validate all providers:

	```bash
	cd /workspace
	python3 auto_provider_loader.py
	```

	This will:
	- Scan `api-resources/*.json` for provider definitions
	- Scan `providers_config*.json` for existing providers
	- Discover HF models from `backend/services/`
	- Validate each provider with real API calls
	- Generate comprehensive reports
	- Update `providers_config_extended.json` with valid providers

	### 2. Understanding Output

	```
	================================================================================
	🚀 AUTO PROVIDER LOADER (APL) - REAL DATA ONLY
	================================================================================

	📡 PHASE 1: DISCOVERY
	Found 339 HTTP provider candidates
	Found 4 HF model candidates

	🔬 PHASE 2: VALIDATION
	✅ Valid providers
	❌ Invalid providers
	⚠️ Conditionally available (requires auth)

	📊 PHASE 3: COMPUTING STATISTICS
	🔧 PHASE 4: INTEGRATION
	📝 PHASE 5: GENERATING REPORTS
	```

	### 3. Generated Files

	After running APL, you'll find:

	- `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Human-readable report
	- `PROVIDER_AUTO_DISCOVERY_REPORT.json` - Machine-readable detailed results
	- `providers_config_extended.backup.{timestamp}.json` - Config backup
	- `providers_config_extended.json` - Updated with new valid providers

	---

	## Validation Logic

	### HTTP Providers

	For each HTTP provider, APL:

	1. Checks URL structure
	- Detects placeholder variables (`{API_KEY}`, `{PROJECT_ID}`)
	- Identifies WebSocket endpoints (`ws://`, `wss://`)

	2. Determines endpoint type
	- JSON REST API → GET request to test endpoint
	- JSON-RPC → POST request with `eth_blockNumber` method

	3. Makes real test call
	- 8-second timeout
	- Handles redirects
	- Validates response format

	4. Classifies result
	- ✅ `VALID` - Responds with 200 OK and valid data
	- ❌ `INVALID` - Connection fails, timeout, or error response
	- ⚠️ `CONDITIONALLY_AVAILABLE` - Requires API key (401/403)
	- ⏭️ `SKIPPED` - WebSocket (requires separate validation)

	### Hugging Face Models

	For each HF model, APL:

	1. Queries HF Hub API
	- Checks if model exists: `GET https://huggingface.co/api/models/{model_id}`
	- Does NOT download or load the full model (saves time/resources)

	2. Validates accessibility
	- ✅ `VALID` - Model found and publicly accessible
	- ⚠️ `CONDITIONALLY_AVAILABLE` - Requires HF_TOKEN
	- ❌ `INVALID` - Model not found (404) or other error

	---

	## Configuration

	### Environment Variables

	APL respects these environment variables:

	\| Variable \| Purpose \| Default \|
	\|----------\|---------\|---------\|
	\| `HF_TOKEN` \| Hugging Face API token \| None \|
	\| `ETHERSCAN_API_KEY` \| Etherscan API key \| None \|
	\| `BSCSCAN_API_KEY` \| BSCScan API key \| None \|
	\| `INFURA_PROJECT_ID` \| Infura project ID \| None \|
	\| `ALCHEMY_API_KEY` \| Alchemy API key \| None \|

	### Validation Timeout

	Default timeout is 8 seconds. To customize:

	```python
	from auto_provider_loader import AutoProviderLoader

	apl = AutoProviderLoader()
	apl.validator.timeout = 15.0 # 15 seconds
	await apl.run()
	```

	---

	## Adding New Provider Sources

	### 1. Add to JSON Resources

	Create or update a JSON file in `api-resources/`:

	```json
	{
	"registry": {
	"my_providers": [
	{
	"id": "my_api",
	"name": "My API",
	"category": "market_data",
	"base_url": "https://api.example.com/v1",
	"endpoints": {
	"prices": "/prices"
	},
	"auth": {
	"type": "none"
	}
	}
	]
	}
	}
	```

	### 2. Re-run APL

	```bash
	python3 auto_provider_loader.py
	```

	APL will automatically discover and validate your new provider.

	---

	## Integration with Existing Code

	### Using Validated Providers

	After APL runs, valid providers are in `providers_config_extended.json`:

	```python
	import json

	# Load validated providers
	with open('providers_config_extended.json', 'r') as f:
	config = json.load(f)

	# Get all valid providers
	valid_providers = config['providers']

	# Use a specific provider
	coingecko = valid_providers['coingecko']
	print(f"Provider: {coingecko['name']}")
	print(f"Category: {coingecko['category']}")
	print(f"Response time: {coingecko['response_time_ms']}ms")
	```

	### Filtering by Category

	```python
	# Get all market data providers
	market_providers = {
	pid: data for pid, data in valid_providers.items()
	if data.get('category') == 'market_data'
	}
	```

	---

	## Conditional Providers

	Providers marked as `CONDITIONALLY_AVAILABLE` require API keys:

	### 1. Check Requirements

	See `PROVIDER_AUTO_DISCOVERY_REPORT.md` for required env vars:

	```markdown
	### Conditionally Available Providers (90)

	- Etherscan (`etherscan_primary`)
	- Required: `ETHERSCAN_PRIMARY_API_KEY` environment variable
	- Reason: HTTP 401 - Requires authentication
	```

	### 2. Set Environment Variables

	```bash
	export ETHERSCAN_API_KEY="your_key_here"
	export BSCSCAN_API_KEY="your_key_here"
	```

	### 3. Re-run Validation

	```bash
	python3 auto_provider_loader.py
	```

	Previously conditional providers will now validate as VALID if keys are correct.

	---

	## Performance Tuning

	### Parallel Validation

	HTTP providers are validated in batches of 10 to balance speed and resource usage:

	```python
	# In auto_provider_loader.py
	batch_size = 10 # Adjust based on your needs
	```

	Larger batches = faster but more network load
	Smaller batches = slower but more conservative

	### Timeout Adjustment

	For slow or distant APIs:

	```python
	validator = ProviderValidator(timeout=15.0) # 15 seconds
	```

	---

	## Troubleshooting

	### Issue: Many providers marked INVALID

	Possible causes:
	- Network connectivity issues
	- Rate limiting (try again later)
	- Providers genuinely down

	Solution: Check individual error reasons in report

	### Issue: All providers CONDITIONALLY_AVAILABLE

	Cause: Most providers require API keys

	Solution: Set required environment variables

	### Issue: HF models all INVALID

	Causes:
	- No internet connection to HuggingFace
	- Models moved or renamed
	- Rate limiting from HF Hub

	Solution: Check HF Hub status, verify model IDs

	### Issue: Validation takes too long

	Solutions:
	- Reduce batch size
	- Decrease timeout
	- Filter providers before validation

	---

	## Advanced Usage

	### Validating Specific Providers

	```python
	from provider_validator import ProviderValidator
	import asyncio

	async def validate_one():
	validator = ProviderValidator()

	result = await validator.validate_http_provider(
	"coingecko",
	{
	"name": "CoinGecko",
	"category": "market_data",
	"base_url": "https://api.coingecko.com/api/v3",
	"endpoints": {"ping": "/ping"}
	}
	)

	print(f"Status: {result.status}")
	print(f"Response time: {result.response_time_ms}ms")

	asyncio.run(validate_one())
	```

	### Custom Discovery Logic

	```python
	from auto_provider_loader import AutoProviderLoader

	class CustomAPL(AutoProviderLoader):
	def discover_http_providers(self):
	# Your custom logic
	providers = super().discover_http_providers()
	# Filter or augment
	return [p for p in providers if p['data'].get('free') == True]

	apl = CustomAPL()
	await apl.run()
	```

	---

	## API Reference

	### ProviderValidator

	```python
	class ProviderValidator:
	def __init__(self, timeout: float = 10.0)

	async def validate_http_provider(
	provider_id: str,
	provider_data: Dict[str, Any]
	) -> ValidationResult

	async def validate_hf_model(
	model_id: str,
	model_name: str,
	pipeline_tag: str = "sentiment-analysis"
	) -> ValidationResult

	def get_summary() -> Dict[str, Any]
	```

	### AutoProviderLoader

	```python
	class AutoProviderLoader:
	def __init__(self, workspace_root: str = "/workspace")

	def discover_http_providers() -> List[Dict[str, Any]]
	def discover_hf_models() -> List[Dict[str, Any]]

	async def validate_all_http_providers(providers: List)
	async def validate_all_hf_models(models: List)

	def integrate_valid_providers() -> Dict[str, Any]
	def generate_reports()

	async def run() # Main entry point
	```

	---

	## Best Practices

	1. Regular Re-validation
	- Run APL weekly to catch provider changes
	- Providers can go offline or change endpoints

	2. Monitor Conditional Providers
	- Set up API keys for high-value providers
	- Track which providers need auth

	3. Review Reports
	- Check invalid providers for patterns
	- Update configs based on error reasons

	4. Backup Configs
	- APL creates automatic backups
	- Keep manual backups before major changes

	5. Test Integration
	- After APL runs, test your application
	- Verify new providers work in your context

	---

	## Zero Mock/Fake Data Guarantee

	APL NEVER uses mock or fake data.

	- All validations are REAL API calls
	- All response times are ACTUAL measurements
	- All status classifications based on REAL responses
	- Invalid providers are GENUINELY unreachable
	- Valid providers are GENUINELY functional

	This guarantee ensures:
	- Production-ready validation results
	- Accurate performance metrics
	- Trustworthy provider recommendations
	- No surprises in production

	---

	## Support

	### Documentation

	- `PROVIDER_AUTO_DISCOVERY_REPORT.md` - Latest validation results
	- `APL_FINAL_SUMMARY.md` - Implementation summary
	- This guide - Usage instructions

	### Common Questions

	Q: Can I use APL in CI/CD?
	A: Yes! Run `python3 auto_provider_loader.py` in your pipeline.

	Q: How often should I run APL?
	A: Weekly for production, daily for development.

	Q: Can I add custom provider types?
	A: Yes, extend `ProviderValidator` class with new validation methods.

	Q: Does APL support GraphQL APIs?
	A: Not yet, but you can extend it by adding GraphQL validation logic.

	---

	## Version History

	### v1.0 (2025-11-16)
	- Initial release
	- HTTP JSON validation
	- HTTP RPC validation
	- HF model validation (API-based, lightweight)
	- Automatic discovery from JSON resources
	- Comprehensive reporting
	- Zero mock data guarantee

	---

	Auto Provider Loader - Real Data Only, Always.