# User Guide - ZeroGPU LLM Inference
## Quick Start (5 Minutes)
### 1. Choose Your Model
The model dropdown shows 30+ options organized by size:
- **Compact (<2B)**: Fast, lightweight - great for quick responses
- **Mid-size (2-8B)**: Best balance of speed and quality
- **Large (14B+)**: Highest quality, slower but more capable
**Recommendation for beginners**: Start with `Qwen3-4B-Instruct-2507`
### 2. Try an Example Prompt
Click on any example below the chat box to get started:
- "Explain quantum computing in simple terms"
- "Write a Python function..."
- "What are the latest developments..." (requires web search)
### 3. Start Chatting!
Type your message and press Enter or click the Send button.
## Core Features
### Chat Interface
The main chat area shows:
- Your messages on one side
- AI responses with the assistant avatar
- Copy button on each message
- Smooth streaming as tokens are generated
**Tips:**
- Press Enter to send (Shift+Enter for new line)
- Click the Copy button to save responses
- Scroll up to review history
- Use Clear Chat to start fresh
### Model Selection
**When to use each size:**
| Model Size | Best For | Speed | Quality |
|------------|----------|-------|---------|
| <2B | Quick questions, testing | ⚡⚡⚡ | ⭐⭐ |
| 2-8B | General chat, coding help | ⚡⚡ | ⭐⭐⭐ |
| 14B+ | Complex reasoning, long-form | ⚡ | ⭐⭐⭐⭐ |
**Specialized Models:**
- **Phi-4-mini-Reasoning**: Math, logic problems
- **Qwen2.5-Coder**: Programming tasks
- **DeepSeek-R1-Distill**: Step-by-step reasoning
- **Apriel-1.5-15b-Thinker**: Multimodal understanding
### Web Search
Enable this when you need:
- Current events and news
- Recent information (after model training cutoff)
- Facts that change frequently
- Real-time data
**How it works:**
1. Toggle "Enable Web Search"
2. The web search settings accordion appears
3. The system prompt updates automatically
4. The search runs in the background (it won't block the chat)
5. The results are injected into the model's context
**Settings explained** (see the sketch after this list):
- **Max Results**: How many search results to fetch (4 is a good default)
- **Max Chars/Result**: Limits the length of each result (50 keeps the context from being overwhelmed)
- **Search Timeout**: Maximum wait time (5s recommended)
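For the curious, here is a minimal sketch of the fetch-and-inject step. It assumes the search backend is the `duckduckgo_search` package (DuckDuckGo is mentioned under Privacy & Safety below); the helper names `fetch_snippets` and `build_search_context` are illustrative, not the Space's actual API.

```python
# Minimal sketch of the web-search step (assumes the duckduckgo_search package;
# function names are illustrative, not the Space's actual implementation).
from duckduckgo_search import DDGS


def fetch_snippets(query: str, max_results: int = 4, max_chars: int = 50,
                   timeout: float = 5.0) -> list[str]:
    """Fetch search results and trim each snippet to max_chars characters."""
    with DDGS(timeout=timeout) as ddgs:
        results = ddgs.text(query, max_results=max_results)
    return [r["body"][:max_chars] for r in results]


def build_search_context(query: str, **kwargs) -> str:
    """Format the trimmed snippets so they can be appended to the system prompt."""
    snippets = fetch_snippets(query, **kwargs)
    if not snippets:
        return ""
    return "Web search results:\n" + "\n".join(f"- {s}" for s in snippets)
```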
### System Prompt
This defines the AI's personality and behavior.
**Default prompts:**
- Without search: Helpful, creative assistant
- With search: Includes search results and current date
**Customization ideas:**
```
You are a professional code reviewer...
You are a creative writing coach...
You are a patient tutor explaining concepts simply...
You are a technical documentation writer...
```
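For reference, a custom system prompt usually enters generation as the first entry in a chat model's `messages` list, rendered through the tokenizer's chat template. The sketch below uses a small stand-in model (`Qwen/Qwen2.5-0.5B-Instruct`); the Space's own plumbing may differ.

```python
# Sketch: how a system prompt is typically combined with chat history
# (standard Hugging Face chat-template usage; the model is a stand-in).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

system_prompt = "You are a patient tutor explaining concepts simply."
history = [{"role": "user", "content": "Explain recursion."}]

# The system prompt is simply the first message in the conversation.
messages = [{"role": "system", "content": system_prompt}] + history

# Render the conversation in the model's expected chat format.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```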
## Advanced Features
### Advanced Generation Parameters
Click the accordion to reveal these controls:
#### Max Tokens (64-16384)
- **What it does**: Sets maximum response length
- **Lower (256-512)**: Quick, concise answers
- **Medium (1024)**: Balanced (default)
- **Higher (2048+)**: Long-form content, detailed explanations
#### Temperature (0.1-2.0)
- **What it does**: Controls randomness/creativity
- **Low (0.1-0.3)**: Focused, deterministic (good for facts, code)
- **Medium (0.7)**: Balanced creativity (default)
- **High (1.2-2.0)**: Very creative, unpredictable (stories, brainstorming)
#### Top-K (1-100)
- **What it does**: Limits token choices to top K most likely
- **Lower (10-20)**: More focused
- **Medium (40)**: Balanced (default)
- **Higher (80-100)**: More varied vocabulary
#### Top-P (0.1-1.0)
- **What it does**: Nucleus sampling threshold
- **Lower (0.5-0.7)**: Conservative choices
- **Medium (0.9)**: Balanced (default)
- **Higher (0.95-1.0)**: Full vocabulary range
#### Repetition Penalty (1.0-2.0)
- **What it does**: Reduces repeated words/phrases
- **Low (1.0-1.1)**: Allows some repetition
- **Medium (1.2)**: Balanced (default)
- **High (1.5+)**: Strongly avoids repetition (may hurt coherence)
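To see how the sliders above typically translate into code, here is a hedged sketch using standard `transformers` generation keywords with the default values listed above. The Space itself may stream tokens rather than call `generate()` in one shot, and the model name is only a small stand-in.

```python
# Sketch: the UI sliders mapped onto standard transformers generation kwargs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # stand-in; pick any chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Explain recursion in one paragraph.", return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=1024,     # Max Tokens
    do_sample=True,          # required for temperature/top-k/top-p to take effect
    temperature=0.7,         # Temperature
    top_k=40,                # Top-K
    top_p=0.9,               # Top-P
    repetition_penalty=1.2,  # Repetition Penalty
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```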
### Preset Configurations
**For Creative Writing:**
```
Temperature: 1.2
Top-P: 0.95
Top-K: 80
Max Tokens: 2048
```
**For Code Generation:**
```
Temperature: 0.3
Top-P: 0.9
Top-K: 40
Max Tokens: 1024
Repetition Penalty: 1.1
```
**For Factual Q&A:**
```
Temperature: 0.5
Top-P: 0.85
Top-K: 30
Max Tokens: 512
Enable Web Search: Yes
```
**For Reasoning Tasks:**
```
Model: Phi-4-mini-Reasoning or DeepSeek-R1
Temperature: 0.7
Max Tokens: 2048
```
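If you prefer to keep these presets in a script, they reduce to small dictionaries of generation keywords. The representation below is illustrative, not the Space's internal format.

```python
# The presets above expressed as dictionaries of standard generation kwargs.
PRESETS = {
    "creative_writing": {"temperature": 1.2, "top_p": 0.95, "top_k": 80,
                         "max_new_tokens": 2048},
    "code_generation": {"temperature": 0.3, "top_p": 0.9, "top_k": 40,
                        "max_new_tokens": 1024, "repetition_penalty": 1.1},
    "factual_qa": {"temperature": 0.5, "top_p": 0.85, "top_k": 30,
                   "max_new_tokens": 512},
    "reasoning": {"temperature": 0.7, "max_new_tokens": 2048},
}


def preset(name: str) -> dict:
    """Return a copy so callers can tweak values without mutating the preset."""
    return dict(PRESETS[name])
```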
## Tips & Tricks
### Getting Better Results
1. **Be Specific**: "Write a Python function to sort a list" → "Write a Python function that sorts a list of dictionaries by a specific key"
2. **Provide Context**: "Explain recursion" → "Explain recursion to someone learning programming for the first time, with a simple example"
3. **Use System Prompts**: Define role/expertise in system prompt instead of every message
4. **Iterate**: Use follow-up questions to refine responses
5. **Experiment with Models**: Try different models for the same task
### Performance Tips
1. **Start Small**: Test with smaller models first
2. **Adjust Max Tokens**: Don't request more than you need
3. **Use Cancel**: Stop bad generations early
4. **Clear Cache**: Clear chat if experiencing slowdowns
5. **One Task at a Time**: Don't send multiple requests simultaneously
### When to Use Web Search
**✅ Good use cases:**
- "What happened in the latest SpaceX launch?"
- "Current cryptocurrency prices"
- "Recent AI research papers"
- "Today's weather in Paris"
**❌ Don't need search for:**
- General knowledge questions
- Code writing/debugging
- Math problems
- Creative writing
- Theoretical explanations
### Understanding Thinking Mode
Some models output `<think>...</think>` blocks:
```
<think>
Let me break this down step by step...
First, I need to consider...
</think>
Here's the answer: ...
```
**In the UI:**
- The thinking is shown in a "Thought" section
- The answer is shown separately
- Helps you see the reasoning process
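A rough idea of how such output can be split into the thought and the visible answer, assuming the model emits literal `<think>...</think>` tags (the regex and function name here are illustrative):

```python
# Sketch: splitting a model response into its <think> block and final answer.
# Assumes literal <think>...</think> tags; names are illustrative.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is empty if no <think> block exists."""
    match = THINK_RE.search(text)
    if not match:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = THINK_RE.sub("", text, count=1).strip()
    return thinking, answer


raw = "<think>\nLet me break this down step by step...\n</think>\nHere's the answer: ..."
thought, answer = split_thinking(raw)
print(thought)  # Let me break this down step by step...
print(answer)   # Here's the answer: ...
```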
**Best for:**
- Complex math problems
- Multi-step reasoning
- Debugging logic
- Learning how AI thinks
## Troubleshooting
### Generation is Slow
- Try a smaller model
- Reduce Max Tokens
- Disable web search if not needed
- Clear chat history
### Responses are Repetitive
- Increase Repetition Penalty
- Reduce Temperature slightly
- Try a different model
### Responses are Random/Nonsensical
- Decrease Temperature
- Reduce Top-P
- Reduce Top-K
- Try a more stable model
### Web Search Not Working
- Check timeout isn't too short
- Verify internet connection
- Try increasing Max Results
- Check search query in debug panel
### Cancel Button Doesn't Work
- Wait a moment (might be processing)
- Refresh the page if it persists
- Check browser console for errors
## Keyboard Shortcuts
- **Enter**: Send message
- **Shift+Enter**: New line in input
- **Ctrl+C**: Copy (when text selected)
- **Ctrl+A**: Select all in input
## Best Practices
### For Beginners
1. Start with example prompts
2. Use default settings initially
3. Try 2-4 different models
4. Gradually explore advanced settings
5. Read responses fully before replying
### For Power Users
1. Create custom system prompts
2. Fine-tune parameters per task
3. Use debug panel for prompt engineering
4. Experiment with model combinations
5. Utilize web search strategically
### For Developers
1. Study the debug output
2. Test code generation thoroughly
3. Use lower temperature for determinism
4. Compare multiple models
5. Save working configurations
## Privacy & Safety
- **No data collection**: Conversations not stored permanently
- **Model limitations**: May produce incorrect information
- **Verify important info**: Don't rely solely on AI for critical decisions
- **Web search**: Uses DuckDuckGo (privacy-focused)
- **Open source**: Code is transparent and auditable
## Support & Feedback
Found a bug? Have a suggestion?
- Check GitHub issues
- Submit feature requests
- Contribute improvements
- Share your use cases
---
**Happy chatting!**