# Cryptocurrency Data Aggregator - Complete Rewrite A production-ready cryptocurrency data aggregation application with AI-powered analysis, real-time data collection, and an interactive Gradio dashboard. ## Features ### Core Capabilities - **Real-time Price Tracking**: Monitor top 100 cryptocurrencies with live updates - **AI-Powered Sentiment Analysis**: Using HuggingFace models for news sentiment - **Market Analysis**: Technical indicators (MA, RSI), trend detection, predictions - **News Aggregation**: RSS feeds from CoinDesk, Cointelegraph, Bitcoin.com, and Reddit - **Interactive Dashboard**: 6-tab Gradio interface with auto-refresh - **SQLite Database**: Persistent storage with full CRUD operations - **No API Keys Required**: Uses only free data sources ### Data Sources (All Free, No Authentication) - **CoinGecko API**: Market data, prices, rankings - **CoinCap API**: Backup price data source - **Binance Public API**: Real-time trading data - **Alternative.me**: Fear & Greed Index - **RSS Feeds**: CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, Bitcoinist - **Reddit**: r/cryptocurrency, r/bitcoin, r/ethtrader, r/cryptomarkets ### AI Models (HuggingFace - Local Inference) - **cardiffnlp/twitter-roberta-base-sentiment-latest**: Social media sentiment - **ProsusAI/finbert**: Financial news sentiment - **facebook/bart-large-cnn**: News summarization ## Project Structure ``` crypto-dt-source/ ├── config.py # Configuration constants ├── database.py # SQLite database with CRUD operations ├── collectors.py # Data collection from all sources ├── ai_models.py # HuggingFace model integration ├── utils.py # Helper functions and utilities ├── app.py # Main Gradio application ├── requirements.txt # Python dependencies ├── README.md # This file ├── data/ │ ├── database/ # SQLite database files │ └── backups/ # Database backups └── logs/ └── crypto_aggregator.log # Application logs ``` ## Installation ### Prerequisites - Python 3.8 or higher - 4GB+ RAM (for AI models) - Internet connection ### Step 1: Clone Repository ```bash git clone cd crypto-dt-source ``` ### Step 2: Install Dependencies ```bash pip install -r requirements.txt ``` This will install: - Gradio (web interface) - Pandas, NumPy (data processing) - Transformers, PyTorch (AI models) - Plotly (charts) - BeautifulSoup4, Feedparser (web scraping) - And more... ### Step 3: Run Application ```bash python app.py ``` The application will: 1. Initialize the SQLite database 2. Load AI models (first run may take 2-3 minutes) 3. Start background data collection 4. Launch Gradio interface Access the dashboard at: **http://localhost:7860** ## Gradio Dashboard ### Tab 1: Live Dashboard 📊 - Top 100 cryptocurrencies with real-time prices - Columns: Rank, Name, Symbol, Price, 24h Change, Volume, Market Cap - Auto-refresh every 30 seconds - Search and filter functionality - Color-coded price changes (green/red) ### Tab 2: Historical Charts 📈 - Select any cryptocurrency - Choose timeframe: 1d, 7d, 30d, 90d, 1y, All - Interactive Plotly charts with: - Price line chart - Volume bars - MA(7) and MA(30) overlays - RSI indicator - Export charts as PNG ### Tab 3: News & Sentiment 📰 - Latest cryptocurrency news from 9+ sources - Filter by sentiment: All, Positive, Neutral, Negative - Filter by coin: BTC, ETH, etc. - Each article shows: - Title (clickable link) - Source and date - AI-generated sentiment score - Summary - Related coins - Market sentiment gauge (0-100 scale) ### Tab 4: AI Analysis 🤖 - Select cryptocurrency - Generate AI-powered analysis: - Current trend (Bullish/Bearish/Neutral) - Support/Resistance levels - Technical indicators (RSI, MA7, MA30) - 24-72h prediction - Confidence score - Analysis saved to database for history ### Tab 5: Database Explorer 🗄️ - Pre-built SQL queries: - Top 10 gainers in last 24h - All positive sentiment news - Price history for any coin - Database statistics - Custom SQL query support (read-only for security) - Export results to CSV ### Tab 6: Data Sources Status 🔍 - Real-time status monitoring: - CoinGecko API ✓ - CoinCap API ✓ - Binance API ✓ - RSS feeds (5 sources) ✓ - Reddit endpoints (4 subreddits) ✓ - Database connection ✓ - Shows: Status (🟢/🔴), Last Update, Error Count - Manual refresh and data collection controls - Error log viewer ## Database Schema ### `prices` Table - `id`: Primary key - `symbol`: Coin symbol (e.g., "bitcoin") - `name`: Full name (e.g., "Bitcoin") - `price_usd`: Current price in USD - `volume_24h`: 24-hour trading volume - `market_cap`: Market capitalization - `percent_change_1h`, `percent_change_24h`, `percent_change_7d`: Price changes - `rank`: Market cap rank - `timestamp`: Record timestamp ### `news` Table - `id`: Primary key - `title`: News article title - `summary`: AI-generated summary - `url`: Article URL (unique) - `source`: Source name (e.g., "CoinDesk") - `sentiment_score`: Float (-1 to 1) - `sentiment_label`: Label (positive/negative/neutral) - `related_coins`: JSON array of coin symbols - `published_date`: Original publication date - `timestamp`: Record timestamp ### `market_analysis` Table - `id`: Primary key - `symbol`: Coin symbol - `timeframe`: Analysis period - `trend`: Trend direction (Bullish/Bearish/Neutral) - `support_level`, `resistance_level`: Price levels - `prediction`: Text prediction - `confidence`: Confidence score (0-1) - `timestamp`: Analysis timestamp ### `user_queries` Table - `id`: Primary key - `query`: SQL query or search term - `result_count`: Number of results - `timestamp`: Query timestamp ## Configuration Edit `config.py` to customize: ```python # Data collection intervals COLLECTION_INTERVALS = { "price_data": 300, # 5 minutes "news_data": 1800, # 30 minutes "sentiment_data": 1800 # 30 minutes } # Number of coins to track TOP_COINS_LIMIT = 100 # Gradio settings GRADIO_SERVER_PORT = 7860 AUTO_REFRESH_INTERVAL = 30 # seconds # Cache settings CACHE_TTL = 300 # 5 minutes CACHE_MAX_SIZE = 1000 # Logging LOG_LEVEL = "INFO" LOG_FILE = "logs/crypto_aggregator.log" ``` ## API Usage Examples ### Collect Data Manually ```python from collectors import collect_price_data, collect_news_data # Collect latest prices success, count = collect_price_data() print(f"Collected {count} prices") # Collect news count = collect_news_data() print(f"Collected {count} articles") ``` ### Query Database ```python from database import get_database db = get_database() # Get latest prices prices = db.get_latest_prices(limit=10) # Get news by coin news = db.get_news_by_coin("bitcoin", limit=5) # Get top gainers gainers = db.get_top_gainers(limit=10) ``` ### AI Analysis ```python from ai_models import analyze_sentiment, analyze_market_trend from database import get_database # Analyze sentiment result = analyze_sentiment("Bitcoin hits new all-time high!") print(result) # {'label': 'positive', 'score': 0.95, 'confidence': 0.92} # Analyze market trend db = get_database() history = db.get_price_history("bitcoin", hours=168) analysis = analyze_market_trend(history) print(analysis) # {'trend': 'Bullish', 'support_level': 50000, ...} ``` ## Error Handling & Resilience ### Fallback Mechanisms - If CoinGecko fails → CoinCap is used - If both APIs fail → cached database data is used - If AI models fail to load → keyword-based sentiment analysis - All network requests have timeout and retry logic ### Data Validation - Price bounds checking (MIN_PRICE to MAX_PRICE) - Volume and market cap validation - Duplicate prevention (unique URLs for news) - SQL injection prevention (read-only queries only) ### Logging All operations are logged to `logs/crypto_aggregator.log`: - Info: Successful operations, data collection - Warning: API failures, retries - Error: Database errors, critical failures ## Performance Optimization - **Async/Await**: All network requests use aiohttp - **Connection Pooling**: Reused HTTP connections - **Caching**: In-memory cache with 5-minute TTL - **Batch Inserts**: Minimum 100 records per database insert - **Indexed Queries**: Database indexes on symbol, timestamp, sentiment - **Lazy Loading**: AI models load only when first used ## Troubleshooting ### Issue: Models won't load **Solution**: Ensure you have 4GB+ RAM. Models download on first run (2-3 min). ### Issue: No data appearing **Solution**: Wait 5 minutes for initial data collection, or click "Refresh" buttons. ### Issue: Port 7860 already in use **Solution**: Change `GRADIO_SERVER_PORT` in `config.py` or kill existing process. ### Issue: Database locked **Solution**: Only one process can write at a time. Close other instances. ### Issue: RSS feeds failing **Solution**: Some feeds may be temporarily down. Check Tab 6 for status. ## Development ### Running Tests ```bash # Test data collection python collectors.py # Test AI models python ai_models.py # Test utilities python utils.py # Test database python database.py ``` ### Adding New Data Sources Edit `collectors.py`: ```python def collect_new_source(): try: response = safe_api_call("https://api.example.com/data") # Parse and save data return True except Exception as e: logger.error(f"Error: {e}") return False ``` Add to scheduler in `collectors.py`: ```python # In schedule_data_collection() threading.Timer(interval, collect_new_source).start() ``` ## Validation Checklist - [x] All 8 files complete - [x] No TODO or FIXME comments - [x] No placeholder functions - [x] All imports in requirements.txt - [x] Database schema matches specification - [x] All 6 Gradio tabs implemented - [x] All 3 AI models integrated - [x] All 5+ data sources configured - [x] Error handling in every network call - [x] Logging for all major operations - [x] No API keys in code - [x] Comments in English - [x] PEP 8 compliant ## License MIT License - Free to use, modify, and distribute. ## Support For issues or questions: - Check logs: `logs/crypto_aggregator.log` - Review error messages in Tab 6 - Ensure all dependencies installed: `pip install -r requirements.txt` ## Credits - **Data Sources**: CoinGecko, CoinCap, Binance, Alternative.me, CoinDesk, Cointelegraph, Reddit - **AI Models**: HuggingFace (Cardiff NLP, ProsusAI, Facebook) - **Framework**: Gradio --- **Made with ❤️ for the Crypto Community**