# πŸ“– User Guide - ZeroGPU LLM Inference

## Quick Start (5 Minutes)

### 1. Choose Your Model
The model dropdown shows 30+ options organized by size:
- **Compact (<2B)**: Fast, lightweight - great for quick responses
- **Mid-size (2-8B)**: Best balance of speed and quality
- **Large (14B+)**: Highest quality, slower but more capable

**Recommendation for beginners**: Start with `Qwen3-4B-Instruct-2507`

### 2. Try an Example Prompt
Click on any example below the chat box to get started:
- "Explain quantum computing in simple terms"
- "Write a Python function..."
- "What are the latest developments..." (requires web search)

### 3. Start Chatting!
Type your message and press Enter or click "πŸ“€ Send"

## Core Features

### πŸ’¬ Chat Interface

The main chat area shows:
- Your messages on one side
- AI responses with a πŸ€– avatar
- Copy button on each message
- Smooth streaming as tokens generate

**Tips:**
- Press Enter to send (Shift+Enter for new line)
- Click Copy button to save responses
- Scroll up to review history
- Use Clear Chat to start fresh

### πŸ€– Model Selection

**When to use each size:**

| Model Size | Best For | Speed | Quality |
|------------|----------|-------|---------|
| <2B | Quick questions, testing | ⚑⚑⚑ | ⭐⭐ |
| 2-8B | General chat, coding help | ⚑⚑ | ⭐⭐⭐ |
| 14B+ | Complex reasoning, long-form | ⚑ | ⭐⭐⭐⭐ |

**Specialized Models:**
- **Phi-4-mini-Reasoning**: Math, logic problems
- **Qwen2.5-Coder**: Programming tasks
- **DeepSeek-R1-Distill**: Step-by-step reasoning
- **Apriel-1.5-15b-Thinker**: Multimodal understanding

### πŸ” Web Search

Enable this when you need:
- Current events and news
- Recent information (after model training cutoff)
- Facts that change frequently
- Real-time data

**How it works:**
1. Toggle "πŸ” Enable Web Search"
2. Web search settings accordion appears
3. System prompt updates automatically
4. Search runs in background (won't block chat)
5. Results injected into context

**Settings explained:**
- **Max Results**: How many search results to fetch (4 is a good default)
- **Max Chars/Result**: Caps the length of each result so it doesn't overwhelm the context (50 is a safe default)
- **Search Timeout**: Maximum time to wait for results (5s recommended)
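To make the settings above concrete, here is a minimal sketch of how fetched results might be trimmed and formatted for injection into the system prompt. The function name and output format are illustrative assumptions, not the app's actual implementation:

```python
from datetime import date

def build_search_context(results, max_results=4, max_chars=50):
    """Hypothetical sketch: trim search results per the Max Results /
    Max Chars settings and format them for the system prompt."""
    lines = []
    for r in results[:max_results]:          # honor Max Results
        snippet = r["body"][:max_chars]      # honor Max Chars/Result
        lines.append(f"- {r['title']}: {snippet}")
    header = f"Web search results (retrieved {date.today().isoformat()}):"
    return header + "\n" + "\n".join(lines)
```

The key point is that each result is truncated *before* injection, so a handful of long pages cannot crowd the model's context window.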

### πŸ“ System Prompt

This defines the AI's personality and behavior.

**Default prompts:**
- Without search: Helpful, creative assistant
- With search: Includes search results and current date

**Customization ideas:**
```
You are a professional code reviewer...
You are a creative writing coach...
You are a patient tutor explaining concepts simply...
You are a technical documentation writer...
```

## Advanced Features

### πŸŽ›οΈ Advanced Generation Parameters

Click the accordion to reveal these controls:

#### Max Tokens (64-16384)
- **What it does**: Sets maximum response length
- **Lower (256-512)**: Quick, concise answers
- **Medium (1024)**: Balanced (default)
- **Higher (2048+)**: Long-form content, detailed explanations

#### Temperature (0.1-2.0)
- **What it does**: Controls randomness/creativity
- **Low (0.1-0.3)**: Focused, deterministic (good for facts, code)
- **Medium (0.7)**: Balanced creativity (default)
- **High (1.2-2.0)**: Very creative, unpredictable (stories, brainstorming)

#### Top-K (1-100)
- **What it does**: Limits token choices to top K most likely
- **Lower (10-20)**: More focused
- **Medium (40)**: Balanced (default)
- **Higher (80-100)**: More varied vocabulary

#### Top-P (0.1-1.0)
- **What it does**: Nucleus sampling threshold
- **Lower (0.5-0.7)**: Conservative choices
- **Medium (0.9)**: Balanced (default)
- **Higher (0.95-1.0)**: Full vocabulary range

#### Repetition Penalty (1.0-2.0)
- **What it does**: Reduces repeated words/phrases
- **Low (1.0-1.1)**: Allows some repetition
- **Medium (1.2)**: Balanced (default)
- **High (1.5+)**: Strongly avoids repetition (may hurt coherence)
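The four sampling parameters interact: temperature reshapes the distribution, then Top-K and Top-P prune it. This illustrative reimplementation (not the app's actual sampling code) shows the order of operations:

```python
import math

def filter_logits(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Sketch of temperature + top-k + top-p filtering over raw logits.
    Returns a renormalized probability distribution over token indices."""
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]

    # Softmax (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-K: only the K most likely tokens survive.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:top_k])

    # Top-P: keep the smallest high-probability prefix whose cumulative
    # mass reaches top_p (the "nucleus").
    cum, nucleus = 0.0, set()
    for i in order:
        if i not in keep:
            break
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens; everything else gets 0.
    mass = sum(probs[i] for i in nucleus)
    return [probs[i] / mass if i in nucleus else 0.0
            for i in range(len(probs))]
```

With a low Top-P the nucleus may collapse to a single token, which is why low Temperature plus low Top-P behaves almost deterministically.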

### Preset Configurations

**For Creative Writing:**
```
Temperature: 1.2
Top-P: 0.95
Top-K: 80
Max Tokens: 2048
```

**For Code Generation:**
```
Temperature: 0.3
Top-P: 0.9
Top-K: 40
Max Tokens: 1024
Repetition Penalty: 1.1
```

**For Factual Q&A:**
```
Temperature: 0.5
Top-P: 0.85
Top-K: 30
Max Tokens: 512
Enable Web Search: Yes
```

**For Reasoning Tasks:**
```
Model: Phi-4-mini-Reasoning or DeepSeek-R1
Temperature: 0.7
Max Tokens: 2048
```
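If you want to reuse these presets outside the UI, they can be kept as plain dicts. The keys below follow the common Hugging Face `transformers` `generate()` argument names, which is an assumption; the app itself manages these values through its sliders:

```python
# Illustrative presets mirroring the tables above.
PRESETS = {
    "creative": {"temperature": 1.2, "top_p": 0.95, "top_k": 80,
                 "max_new_tokens": 2048},
    "code":     {"temperature": 0.3, "top_p": 0.9, "top_k": 40,
                 "max_new_tokens": 1024, "repetition_penalty": 1.1},
    "factual":  {"temperature": 0.5, "top_p": 0.85, "top_k": 30,
                 "max_new_tokens": 512},
}

# Usage sketch (transformers-style API, assumed):
#   model.generate(**inputs, **PRESETS["code"])
```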

## Tips & Tricks

### 🎯 Getting Better Results

1. **Be Specific**: "Write a Python function to sort a list" β†’ "Write a Python function that sorts a list of dictionaries by a specific key"

2. **Provide Context**: "Explain recursion" β†’ "Explain recursion to someone learning programming for the first time, with a simple example"

3. **Use System Prompts**: Define role/expertise in system prompt instead of every message

4. **Iterate**: Use follow-up questions to refine responses

5. **Experiment with Models**: Try different models for the same task

### ⚑ Performance Tips

1. **Start Small**: Test with smaller models first
2. **Adjust Max Tokens**: Don't request more than you need
3. **Use Cancel**: Stop bad generations early
4. **Clear History**: Use Clear Chat if you experience slowdowns
5. **One Task at a Time**: Don't send multiple requests simultaneously

### πŸ” When to Use Web Search

**βœ… Good use cases:**
- "What happened in the latest SpaceX launch?"
- "Current cryptocurrency prices"
- "Recent AI research papers"
- "Today's weather in Paris"

**❌ Don't need search for:**
- General knowledge questions
- Code writing/debugging
- Math problems
- Creative writing
- Theoretical explanations

### πŸ’­ Understanding Thinking Mode

Some models output `<think>...</think>` blocks:

```
<think>
Let me break this down step by step...
First, I need to consider...
</think>

Here's the answer: ...
```

**In the UI:**
- Thinking shows as "πŸ’­ Thought"
- Answer shows separately
- Helps you see the reasoning process

**Best for:**
- Complex math problems
- Multi-step reasoning
- Debugging logic
- Learning how AI thinks
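Separating the thought from the answer is a simple text-splitting step. This hypothetical helper shows one way a UI might do it, matching the `<think>...</think>` tag format in the example above:

```python
import re

# Matches a <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (thinking, answer): the concatenated <think> content,
    and the remaining text with those blocks stripped out."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(thoughts), answer
```

The UI then renders the first part under "πŸ’­ Thought" and the second as the visible reply.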

## Troubleshooting

### Generation is Slow
- Try a smaller model
- Reduce Max Tokens
- Disable web search if not needed
- Clear chat history

### Responses are Repetitive
- Increase Repetition Penalty
- Reduce Temperature slightly
- Try a different model

### Responses are Random/Nonsensical
- Decrease Temperature
- Reduce Top-P
- Reduce Top-K
- Try a more stable model

### Web Search Not Working
- Check timeout isn't too short
- Verify internet connection
- Try increasing Max Results
- Check search query in debug panel

### Cancel Button Doesn't Work
- Wait a moment (might be processing)
- Refresh page if persists
- Check browser console for errors

## Keyboard Shortcuts

- **Enter**: Send message
- **Shift+Enter**: New line in input
- **Ctrl+C**: Copy (when text selected)
- **Ctrl+A**: Select all in input

## Best Practices

### For Beginners
1. Start with example prompts
2. Use default settings initially
3. Try 2-4 different models
4. Gradually explore advanced settings
5. Read responses fully before replying

### For Power Users
1. Create custom system prompts
2. Fine-tune parameters per task
3. Use debug panel for prompt engineering
4. Experiment with model combinations
5. Utilize web search strategically

### For Developers
1. Study the debug output
2. Test code generation thoroughly
3. Use lower temperature for determinism
4. Compare multiple models
5. Save working configurations

## Privacy & Safety

- **No data collection**: Conversations not stored permanently
- **Model limitations**: May produce incorrect information
- **Verify important info**: Don't rely solely on AI for critical decisions
- **Web search**: Uses DuckDuckGo (privacy-focused)
- **Open source**: Code is transparent and auditable

## Support & Feedback

Found a bug? Have a suggestion?
- Check GitHub issues
- Submit feature requests
- Contribute improvements
- Share your use cases

---

**Happy chatting! πŸŽ‰**