metadata
title: ASI V2.5 ZeroGPU Demo
emoji: π
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
ASI V2.5: ZeroGPU H200 Performance Demo
REAL GPU ASI V2.5 Testing on NVIDIA H200!
This Space demonstrates the Adaptive Structured Intelligence V2.5 attention mechanism running on ZeroGPU H200 with 70GB VRAM - finally showing the real speedups ASI was designed for!
✨ Features
- ZeroGPU H200: NVIDIA H200 with 70GB VRAM (FREE with Pro!)
- 🔥 Real GPU Performance: Actual CUDA optimizations and tensor cores
- Long Sequences: Test up to 8192 tokens (impossible on CPU)
- ⚡ Mixed Precision: FP16 optimization for H200
- 🎯 Adaptive Attention: Linear O(L) for long sequences, Exact O(L²) for short
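The adaptive switch described in the last bullet can be sketched in plain Python. This is a minimal illustration, not the ASI V2.5 implementation: the `elu(x) + 1` feature map, the default `threshold=512`, and all function names are assumptions chosen for the example.

```python
import math

def exact_attention(q, k, v):
    """Softmax attention: O(L^2) time in the sequence length L."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out

def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice for this sketch)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_attention(q, k, v):
    """Kernelized attention: O(L) time, keys and values are summarized once."""
    d, dv = len(q[0]), len(v[0])
    kv = [[0.0] * dv for _ in range(d)]  # running sum of phi(k_j) v_j^T
    ksum = [0.0] * d                     # running sum of phi(k_j)
    for kj, vj in zip(k, v):
        fk = phi(kj)
        for a in range(d):
            ksum[a] += fk[a]
            for c in range(dv):
                kv[a][c] += fk[a] * vj[c]
    out = []
    for qi in q:
        fq = phi(qi)
        z = sum(fq[a] * ksum[a] for a in range(d))
        out.append([sum(fq[a] * kv[a][c] for a in range(d)) / z
                    for c in range(dv)])
    return out

def adaptive_attention(q, k, v, threshold=512):
    """Exact attention up to `threshold` tokens, linear attention beyond it."""
    if len(q) <= threshold:
        return "exact", exact_attention(q, k, v)
    return "linear", linear_attention(q, k, v)
```

Both paths return a weighted average of the value rows; only the cost of computing the weights differs.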
🎯 Performance Expectations
| Sequence Length | Expected Speedup | Attention Type |
|---|---|---|
| 1024 tokens | 1.2-1.5x | Linear |
| 2048 tokens | 1.5-2.0x | Linear |
| 4096 tokens | 2.0-2.5x | Linear |
| 8192 tokens | 2.5x+ | Linear |
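To compare a measured run against the table above, a small timing helper is enough. The sketch below is an assumption about how such a check could look; `median_time`, `speedup`, and `classify` are hypothetical names, and only the numeric ranges come from the table.

```python
import statistics
import time

# Expected speedup ranges from the table: (low, high); None means open-ended.
EXPECTED = {
    1024: (1.2, 1.5),
    2048: (1.5, 2.0),
    4096: (2.0, 2.5),
    8192: (2.5, None),
}

def median_time(fn, repeats=5):
    """Median wall-clock seconds over `repeats` calls of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # for GPU work, fn should synchronize the device before returning
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s

def classify(seq_len, measured):
    """Label a measured speedup as below, within, or above the expected range."""
    low, high = EXPECTED[seq_len]
    if measured < low:
        return "below"
    if high is not None and measured > high:
        return "above"
    return "within"
```

For example, `classify(2048, speedup(0.30, 0.18))` reports whether a 2048-token run lands in the 1.5-2.0x band.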
🔧 How to Use
- Configure ASI parameters (threshold, feature_dim, etc.)
- Set sequence lengths to test (try long ones!)
- Click "Run ZeroGPU ASI V2.5 Test"
- See REAL GPU speedups!
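The second step takes a list of sequence lengths. The demo's input format is not documented here, so the sketch below assumes a comma-separated text field; `parse_seq_lengths` is a hypothetical helper, and the 8192 cap is the limit stated in the feature list.

```python
MAX_SEQ_LEN = 8192  # upper limit stated in the feature list

def parse_seq_lengths(text, max_len=MAX_SEQ_LEN):
    """Parse '1024, 2048' into a sorted list of unique, in-range lengths."""
    lengths = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries such as a trailing comma
        n = int(part)  # raises ValueError on non-numeric input
        if not 1 <= n <= max_len:
            raise ValueError(f"sequence length {n} is outside 1..{max_len}")
        lengths.append(n)
    return sorted(set(lengths))
```

Rejecting out-of-range values early avoids spending GPU quota on a run that cannot complete.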
💡 Why ZeroGPU?
- ✅ FREE with Hugging Face Pro ($9/month)
- ✅ 70GB VRAM - handles very long sequences
- ✅ NVIDIA H200 - latest architecture
- ✅ 8x quota with Pro priority
Technical Details
- ASI V2.5: Adaptive attention switching
- CUDA Optimized: Tensor cores + mixed precision
- Memory Efficient: 70GB VRAM utilization
- Production Ready: Real deployment example
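A rough back-of-the-envelope calculation shows why the O(L²) versus O(L) distinction matters for memory. The sketch assumes FP16 storage (2 bytes per element) and an exact-attention kernel that materializes the full score matrix; `feature_dim=64` and `head_dim=64` are illustrative values, not the demo's defaults.

```python
BYTES_FP16 = 2
MIB = 1024 * 1024

def exact_scores_bytes(seq_len, bytes_per_elem=BYTES_FP16):
    """Size of one L x L attention score matrix (per head, per batch item)."""
    return seq_len * seq_len * bytes_per_elem

def linear_state_bytes(feature_dim, head_dim, bytes_per_elem=BYTES_FP16):
    """Size of the linear-attention summary, independent of sequence length:
    a feature_dim x head_dim matrix plus a feature_dim normalizer vector."""
    return (feature_dim * head_dim + feature_dim) * bytes_per_elem

for seq_len in (1024, 2048, 4096, 8192):
    print(seq_len, "tokens:", exact_scores_bytes(seq_len) // MIB, "MiB per head")
print("linear state:", linear_state_bytes(64, 64), "bytes per head")
```

The score matrix grows fourfold each time the sequence length doubles, while the linear-attention state stays constant.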
Links
- 🤗 Model: khopilot/asi-v25-longformer-core
- GitHub: khopilot/asi-v25-longformer-core
- 📦 PyPI: asi-v25-longformer
Finally - ASI V2.5 running on real GPU with the speedups it was designed for!