---
license: apache-2.0
base_model: deepseek-ai/deepseek-coder-1.3b-instruct
tags:
- security
- vulnerability-detection
- penetration-testing
- code-analysis
- cybersecurity
- lora
- deepseek
library_name: peft
pipeline_tag: text-generation
---

# Pentest Vulnerability Detector

## Model Description

This is a fine-tuned version of DeepSeek-Coder-1.3B-Instruct, specialized for detecting security vulnerabilities in code.

- **Base Model:** deepseek-ai/deepseek-coder-1.3b-instruct
- **Training Data:** 440 synthetic vulnerability examples
- **Training Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization
- **Training Platform:** Google Colab (Free T4 GPU)

## Capabilities

The model can detect and analyze:

- SQL Injection
- Cross-Site Scripting (XSS)
- Command Injection / RCE
- Insecure Direct Object Reference (IDOR)
- Server-Side Request Forgery (SSRF)
- Authentication Bypass
- Cross-Site Request Forgery (CSRF)
- Path Traversal

## Training Details

- **Examples:** 440 vulnerability patterns
- **Epochs:** 3
- **Batch Size:** 2 (with gradient accumulation)
- **Learning Rate:** 2e-4
- **LoRA Rank:** 8
- **Quantization:** 4-bit (NF4)
- **Training Time:** ~45-60 minutes on T4 GPU

A hedged configuration sketch matching these settings is provided in the appendix at the end of this card.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/pentest-vulnerability-detector")

# Analyze code
code = "SELECT * FROM users WHERE id = 'user_input'"
prompt = f"System: You are a security expert.\n\nUser: Analyze this code:\n{code}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Inference Script

For easier usage, use the provided inference script:

```bash
python inference_deepseek.py --model ./model --code "YOUR_CODE_HERE"
```

## Model Performance

The model provides:

- Vulnerability type identification
- Severity assessment (CRITICAL/HIGH/MEDIUM/LOW)
- Detailed attack vector analysis
- Specific remediation recommendations
- Code-specific security guidance

## Limitations

- Not 100% accurate - always verify findings manually
- May have false positives/negatives
- Best used as a pre-screening tool
- Should complement, not replace, manual security testing
- Trained on synthetic data - may need fine-tuning for specific use cases

## Ethical Use

This model is intended for:

- Security research
- Penetration testing (authorized only)
- Code review and security auditing
- Educational purposes

**Do not use for:**

- Unauthorized system access
- Malicious activities
- Illegal purposes

## Training Data

The model was trained on 440 synthetic vulnerability examples covering:

- 100 SQL Injection patterns
- 80 XSS patterns
- 60 Command Injection patterns
- 50 IDOR patterns
- 40 SSRF patterns
- 40 Authentication Bypass patterns
- 40 CSRF patterns
- 30 Path Traversal patterns

## Citation

If you use this model, please cite:

```
@misc{pentest-vulnerability-detector,
  author = {YOUR_NAME},
  title = {Pentest Vulnerability Detector},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/pentest-vulnerability-detector}}
}
```

## License

This model adapter is released under the **Apache 2.0 License**.
The base model (DeepSeek-Coder-1.3B-Instruct) has its own license terms.

### Apache 2.0 License Summary

- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
- ✅ Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state changes made

See the LICENSE file for full terms.

## Contact

For questions or issues, please open an issue on the model repository.

## Acknowledgments

- Base model: DeepSeek-Coder by DeepSeek AI
- Training framework: Hugging Face Transformers, PEFT
- Training platform: Google Colab
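
## Appendix: Training Configuration Sketch

The training script itself is not distributed with this card. For reference, the sketch below shows one way to reproduce the setup listed under Training Details (LoRA rank 8, 4-bit NF4 quantization, batch size 2 with gradient accumulation, learning rate 2e-4, 3 epochs) using Hugging Face Transformers, PEFT, and bitsandbytes. The dataset file (`train.jsonl`), the LoRA `target_modules`, `lora_alpha`, `lora_dropout`, and the gradient accumulation factor are illustrative assumptions, not values recovered from the actual training run.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"

# 4-bit NF4 quantization, as listed under Training Details
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Rank-8 LoRA adapter; the target modules are an assumption, not confirmed by the card
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# "train.jsonl" is a hypothetical file holding the 440 synthetic examples, assumed to
# expose a single "text" field containing the full prompt plus the expected analysis.
dataset = load_dataset("json", data_files="train.jsonl", split="train")
dataset = dataset.map(
    lambda example: tokenizer(example["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="pentest-vulnerability-detector",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # accumulation factor is an assumption
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.model.save_pretrained("pentest-vulnerability-detector")  # saves only the LoRA adapter weights
```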