# HuggingFace Collection Integration - Complete
## 🎯 Overview
Full integration with HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.
## ✅ Implemented Features
### 1. **Dataset Import** (`import_prepared_dataset`)
- **Location**: `backend/services/dataset_service.py`
- **Purpose**: Import prepared datasets from ZIP files
- **Features**:
- Supports both root-level and subfolder `dataset_info.json` structures
- Automatic name conflict resolution with numeric suffixes (`_1`, `_2`, etc.)
- Validates dataset structure before import
- Updates metadata with new dataset key if renamed
```python
# Example usage in app.py
def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```
### 2. **LoRA Collection Sync** (`sync_on_startup`)
- **Location**: `backend/services/hf_storage_service.py`
- **Purpose**: Automatically download missing LoRAs from HF collection on app startup
- **Features**:
- Lists all LoRAs in collection
- Compares with local LoRA directory
- Downloads only missing LoRAs
- Handles name conflicts with numeric suffixes
- Logs sync activity
```python
# Called automatically on app startup (app.py line 82)
hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```
### 3. **Enhanced LoRA Upload**
- **Location**: `app.py` - `start_lora_training()` function
- **Purpose**: Upload trained LoRAs to HF collection with full metadata
- **Features**:
- Uploads LoRA to individual model repo
- Adds to collection automatically
- Includes training config in metadata
- Returns repo URL and collection link
- Graceful error handling (saves locally if upload fails)
```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += f"\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += f"\n📚 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```
## 📦 Name Conflict Resolution
All import functions implement automatic name conflict resolution:
1. **First Check**: Try original name
2. **If Exists**: Append `_1`, `_2`, `_3`, etc.
3. **Update Metadata**: Store new name in `dataset_info.json` or `metadata.json`
4. **Log Action**: Inform user of renaming
### Example Flow
```
Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
```
## 🔄 Automatic Workflows
### On App Startup
1. Check HF collection for LoRAs
2. Compare with local `models/loras/` directory
3. Download any missing LoRAs
4. Log sync results
### After LoRA Training
1. Train LoRA adapter locally
2. Upload to HF as individual model repo
3. Add to collection
4. Return URLs for viewing
### Dataset Import
1. User uploads ZIP file
2. Extract and validate structure
3. Check for name conflicts
4. Copy to `training_data/` directory
5. Update dropdown lists
## 🛠️ Technical Details
### File Structure Support
**LoRA ZIP Files** (both supported):
```
Option 1 (root):
my_lora.zip/
├── metadata.json
├── adapter_config.json
└── adapter_model.safetensors

Option 2 (subfolder):
my_lora.zip/
└── my_lora/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors
```
**Dataset ZIP Files** (both supported):
```
Option 1 (root):
my_dataset.zip/
├── dataset_info.json
├── audio/
│   ├── sample_000001.wav
│   └── sample_000002.wav
└── splits.json

Option 2 (subfolder):
my_dataset.zip/
└── my_dataset/
    ├── dataset_info.json
    ├── audio/
    └── splits.json
```
### Error Handling
All import/sync functions include:
- **Try-catch blocks** for graceful error handling
- **Comprehensive logging** with context
- **User-friendly error messages**
- **Fallback behavior** (e.g., save locally if upload fails)
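The fallback pattern can be sketched generically. Here `upload_with_fallback` and its `upload_fn` parameter are illustrative stand-ins; the real code wraps `HFStorageService.upload_lora` directly:

```python
import logging

logger = logging.getLogger("lemm.hf")

def upload_with_fallback(upload_fn, lora_dir):
    """Try the HF upload; on any failure keep the local copy and log a warning."""
    try:
        result = upload_fn(lora_dir)
        logger.info("Uploaded %s to %s", lora_dir, result.get("repo_id"))
        return result
    except Exception as exc:
        # Training output is already on disk, so failure is non-fatal
        logger.warning("Upload failed (%s); LoRA kept locally at %s", exc, lora_dir)
        return None
```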
## 📊 HuggingFace Collection Structure
**Collection**: `Gamahea/lemm-100-pre-beta`
- **Purpose**: Organize all LEMM LoRA adapters
- **Visibility**: Public
- **Items**: Individual model repos
**Model Repos**: `Gamahea/lemm-lora-{name}`
- **Type**: LoRA adapters (safetensors)
- **Metadata**: Training config, dataset info, creation date
- **Files**: adapter_model.safetensors, adapter_config.json, metadata.json
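A repo's `metadata.json` might look roughly like this (field names and values are illustrative assumptions based on the list above, not copied from the actual service):

```json
{
  "name": "jazz_lora",
  "created_at": "2025-01-01T00:00:00Z",
  "dataset": "my_dataset",
  "training_config": {
    "rank": 16,
    "learning_rate": 0.0001,
    "epochs": 10
  }
}
```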
## 🎯 User Workflows
### Train & Share a LoRA
1. Prepare dataset (curated or user audio)
2. Configure training parameters
3. Click "Start Training"
4. Wait for completion
5. LoRA automatically uploaded to HF collection
6. Share collection link with others
### Use Someone's LoRA
1. Open LEMM Space
2. App automatically syncs LoRAs from collection
3. Select LoRA in generation dropdown
4. Generate music with custom style
### Import a Dataset
1. Export dataset from another LEMM instance
2. Click "Import Dataset" in training tab
3. Upload ZIP file
4. Dataset appears in training dropdown
5. Use for LoRA training
## 🔗 Related Files
- **HF Storage Service**: [backend/services/hf_storage_service.py](backend/services/hf_storage_service.py)
- **Dataset Service**: [backend/services/dataset_service.py](backend/services/dataset_service.py)
- **Main App**: [app.py](app.py)
- **LoRA Training Service**: [backend/services/lora_training_service.py](backend/services/lora_training_service.py)
## 📝 Commit History
- **17f5813** (latest): Add dataset import & LoRA collection sync
- `import_prepared_dataset()` method
- `sync_on_startup()` method
- Enhanced `upload_lora()` with training_config
- Numeric suffix naming for conflicts
- **f65e448**: Fixed LoRA import to support both ZIP structures
- **2f0c8b4**: Added "Load for Training" workflow
- **b40ee5f**: Fixed DataFrame handling in dataset preparation
## 🎉 Result
**Complete HuggingFace ecosystem integration!**
- ✅ Auto-sync LoRAs from collection
- ✅ Upload trained LoRAs to collection
- ✅ Import/export datasets
- ✅ Name conflict resolution
- ✅ Comprehensive error handling
- ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🚀