# HuggingFace Collection Integration - Complete
## 🎯 Overview
Full integration with the HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.
## ✅ Implemented Features
### 1. **Dataset Import** (`import_prepared_dataset`)
- **Location**: `backend/services/dataset_service.py`
- **Purpose**: Import prepared datasets from ZIP files
- **Features**:
  - Supports both root-level and subfolder `dataset_info.json` structures
  - Automatic name conflict resolution with numeric suffixes (`_1`, `_2`, etc.)
  - Validates dataset structure before import
  - Updates metadata with the new dataset key if renamed
```python
# Example usage in app.py
from backend.services.dataset_service import DatasetService

def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```
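How the importer recognizes the two supported ZIP layouts can be sketched as follows. This is a minimal sketch, assuming the ZIP has already been extracted to a working directory; `extract_zip` and `find_dataset_root` are illustrative names, not the service's actual helpers.
```python
import zipfile
from pathlib import Path

def extract_zip(zip_path: str, dest: Path) -> Path:
    """Unpack the uploaded ZIP into a working directory."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest

def find_dataset_root(extract_dir: Path) -> Path | None:
    """Locate dataset_info.json at the root or one subfolder deep."""
    if (extract_dir / "dataset_info.json").exists():
        return extract_dir  # Option 1: root-level layout
    for child in extract_dir.iterdir():
        if child.is_dir() and (child / "dataset_info.json").exists():
            return child  # Option 2: subfolder layout
    return None  # neither layout found; the import is rejected
```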
### 2. **LoRA Collection Sync** (`sync_on_startup`)
- **Location**: `backend/services/hf_storage_service.py`
- **Purpose**: Automatically download missing LoRAs from the HF collection on app startup
- **Features**:
  - Lists all LoRAs in the collection
  - Compares with the local LoRA directory
  - Downloads only missing LoRAs
  - Handles name conflicts with numeric suffixes
  - Logs sync activity
```python
# Called automatically on app startup (app.py line 82)
from pathlib import Path
from backend.services.hf_storage_service import HFStorageService

hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```
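Internally, the compare-and-download step could look roughly like this. A hedged sketch built on `huggingface_hub`'s `get_collection` and `snapshot_download`; the actual `sync_on_startup` may differ, and real collection slugs carry a trailing ID.
```python
from pathlib import Path
from huggingface_hub import get_collection, snapshot_download

def sync_missing_loras(collection_slug: str, loras_dir: Path) -> list[str]:
    """Download collection LoRAs that are not yet present locally."""
    loras_dir.mkdir(parents=True, exist_ok=True)
    local_names = {p.name for p in loras_dir.iterdir() if p.is_dir()}
    downloaded = []
    for item in get_collection(collection_slug).items:
        if item.item_type != "model":
            continue  # skip non-model collection items
        name = item.item_id.split("/")[-1]  # "Gamahea/lemm-lora-x" -> "lemm-lora-x"
        if name not in local_names:
            snapshot_download(repo_id=item.item_id, local_dir=str(loras_dir / name))
            downloaded.append(name)
    return downloaded
```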
### 3. **Enhanced LoRA Upload**
- **Location**: `app.py` - `start_lora_training()` function
- **Purpose**: Upload trained LoRAs to the HF collection with full metadata
- **Features**:
  - Uploads the LoRA to an individual model repo
  - Adds it to the collection automatically
  - Includes the training config in metadata
  - Returns the repo URL and collection link
  - Graceful error handling (saves locally if the upload fails)
```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += "\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += "\n🔗 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```
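Under the hood, such an upload can be composed from three `huggingface_hub` calls. A minimal sketch following the `Gamahea/lemm-lora-{name}` repo convention described below; the real `upload_lora` also writes the metadata described there, which is omitted here.
```python
from pathlib import Path
from huggingface_hub import add_collection_item, create_repo, upload_folder

def upload_lora_sketch(lora_dir: Path, collection_slug: str) -> dict:
    """Push a trained LoRA to its own model repo, then attach it to the collection."""
    repo_id = f"Gamahea/lemm-lora-{lora_dir.name}"
    create_repo(repo_id, repo_type="model", exist_ok=True)
    upload_folder(folder_path=str(lora_dir), repo_id=repo_id, repo_type="model")
    add_collection_item(collection_slug, item_id=repo_id, item_type="model")
    return {"repo_id": repo_id}
```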
## 📦 Name Conflict Resolution
All import functions implement automatic name conflict resolution (a code sketch follows the example below):
1. **First Check**: Try the original name
2. **If Exists**: Append `_1`, `_2`, `_3`, etc.
3. **Update Metadata**: Store the new name in `dataset_info.json` or `metadata.json`
4. **Log Action**: Inform the user of the renaming
### Example Flow
```
Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
```
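A minimal sketch of the suffixing logic; the helper name is illustrative.
```python
from pathlib import Path

def resolve_name_conflict(base_name: str, parent_dir: Path) -> str:
    """Append _1, _2, ... until the candidate name is free in parent_dir."""
    candidate = base_name
    suffix = 1
    while (parent_dir / candidate).exists():
        candidate = f"{base_name}_{suffix}"
        suffix += 1
    return candidate
```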
## 🔄 Automatic Workflows
### On App Startup
1. Check the HF collection for LoRAs
2. Compare with the local `models/loras/` directory
3. Download any missing LoRAs
4. Log sync results
### After LoRA Training
1. Train the LoRA adapter locally
2. Upload to HF as an individual model repo
3. Add to the collection
4. Return URLs for viewing
### Dataset Import
1. User uploads a ZIP file
2. Extract and validate the structure
3. Check for name conflicts
4. Copy to the `training_data/` directory
5. Update dropdown lists
## 🛠️ Technical Details
### File Structure Support
**LoRA ZIP Files** (both supported):
```
Option 1 (root):
my_lora.zip/
├── metadata.json
├── adapter_config.json
└── adapter_model.safetensors

Option 2 (subfolder):
my_lora.zip/
└── my_lora/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors
```
**Dataset ZIP Files** (both supported):
```
Option 1 (root):
my_dataset.zip/
├── dataset_info.json
├── audio/
│   ├── sample_000001.wav
│   └── sample_000002.wav
└── splits.json

Option 2 (subfolder):
my_dataset.zip/
└── my_dataset/
    ├── dataset_info.json
    ├── audio/
    └── splits.json
```
### Error Handling
All import/sync functions include:
- **Try/except blocks** for graceful error handling
- **Comprehensive logging** with context
- **User-friendly error messages**
- **Fallback behavior** (e.g., save locally if the upload fails)
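As a concrete example, the post-training upload can be wrapped so a failed upload never loses the adapter. A sketch assuming the `hf_storage.upload_lora` call shown earlier; the wrapper name is illustrative.
```python
import logging

logger = logging.getLogger(__name__)

def upload_with_fallback(hf_storage, lora_dir, config):
    """Try the HF upload; on any failure, keep the local copy and report it."""
    try:
        return hf_storage.upload_lora(lora_dir, training_config=config)
    except Exception as exc:  # broad on purpose: any upload failure falls back
        logger.exception("LoRA upload failed; adapter kept locally at %s", lora_dir)
        return {"error": str(exc), "local_path": str(lora_dir)}
```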
## 🌐 HuggingFace Collection Structure
**Collection**: `Gamahea/lemm-100-pre-beta`
- **Purpose**: Organize all LEMM LoRA adapters
- **Visibility**: Public
- **Items**: Individual model repos

**Model Repos**: `Gamahea/lemm-lora-{name}`
- **Type**: LoRA adapters (safetensors)
- **Metadata**: Training config, dataset info, creation date
- **Files**: `adapter_model.safetensors`, `adapter_config.json`, `metadata.json`
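A hypothetical `metadata.json` payload matching that description; the field names and values are assumptions for illustration, not the service's actual schema.
```python
import json
from datetime import datetime, timezone

# Assumed schema covering the three items above: training config, dataset info, creation date
metadata = {
    "name": "lemm-lora-my_style",                              # hypothetical LoRA name
    "dataset": "my_dataset_2",                                 # dataset it was trained on
    "created_at": datetime.now(timezone.utc).isoformat(),
    "training_config": {"epochs": 10, "learning_rate": 1e-4},  # assumed keys
}
print(json.dumps(metadata, indent=2))
```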
## 🎯 User Workflows
### Train & Share a LoRA
1. Prepare a dataset (curated or user audio)
2. Configure training parameters
3. Click "Start Training"
4. Wait for completion
5. The LoRA is automatically uploaded to the HF collection
6. Share the collection link with others
### Use Someone's LoRA
1. Open the LEMM Space
2. The app automatically syncs LoRAs from the collection
3. Select a LoRA in the generation dropdown
4. Generate music with the custom style
### Import a Dataset
1. Export a dataset from another LEMM instance
2. Click "Import Dataset" in the training tab
3. Upload the ZIP file
4. The dataset appears in the training dropdown
5. Use it for LoRA training
## 📁 Related Files
- **HF Storage Service**: [backend/services/hf_storage_service.py](backend/services/hf_storage_service.py)
- **Dataset Service**: [backend/services/dataset_service.py](backend/services/dataset_service.py)
- **Main App**: [app.py](app.py)
- **LoRA Training Service**: [backend/services/lora_training_service.py](backend/services/lora_training_service.py)
## 📝 Commit History
- **17f5813** (latest): Add dataset import & LoRA collection sync
  - `import_prepared_dataset()` method
  - `sync_on_startup()` method
  - Enhanced `upload_lora()` with training_config
  - Numeric suffix naming for conflicts
- **f65e448**: Fixed LoRA import to support both ZIP structures
- **2f0c8b4**: Added "Load for Training" workflow
- **b40ee5f**: Fixed DataFrame handling in dataset preparation
## 🎉 Result
**Complete HuggingFace ecosystem integration!**
- ✅ Auto-sync LoRAs from the collection
- ✅ Upload trained LoRAs to the collection
- ✅ Import/export datasets
- ✅ Name conflict resolution
- ✅ Comprehensive error handling
- ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🎉