	---
	language: [es]
	license: mit
	tags:
	- text-classification
	- agriculture
	- climate
	- potato
	- Peru
	- Huancavelica
	- LLaMA
	- environmental-prediction
	model-index:
	- name: llama-lateblight-classifier
	  results:
	  - task:
	      type: text-classification
	      name: Potato Late Blight Risk Classification
	    dataset:
	      name: Huancavelica Late Blight Benchmark (Balanced)
	      type: tabular
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.97
	    - name: F1 (macro)
	      type: f1
	      value: 0.97
	    - name: Precision
	      type: precision
	      value: 0.97
	    - name: Recall
	      type: recall
	      value: 0.97
	pipeline_tag: text-classification
	library_name: transformers
	---

	# 🌾 LLaMA Late Blight Classifier (Huancavelica, Peru)

	This model is a fine-tuned classifier based on `openlm-research/open_llama_3b`, trained to predict potato late blight risk levels (`Bajo`, `Moderado`, `Alto`, i.e. low, moderate, high) in the highlands of Huancavelica, Peru. It takes environmental inputs (temperature, humidity, precipitation) and crop variety metadata and outputs one of the three classes.

	---

	## 🤝 Use Case

	Direct Use: Agronomic advisory systems or research tools predicting potato late blight risk from structured prompts or API queries.

	Not for: Open-ended generation, conversational use, or regions with different pathogen pressures without retraining.
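
A structured prompt can be assembled from raw readings with a small helper. The sketch below mirrors the wording of the training record shown in the Training section; the function name `build_prompt`, its parameters, and the number formatting are assumptions for illustration, not part of the released code.

```python
def build_prompt(temp_c, humidity_pct, precip_mm, variety, scenario_id=1):
    """Format weather readings and variety as a structured Spanish prompt.

    Hypothetical helper: the field order and wording follow the single
    training record shown in this README; the rounding is an assumption.
    """
    return (
        f"Escenario {scenario_id}: "
        f"Temperatura promedio {temp_c:.1f} °C, "
        f"Humedad {humidity_pct:.0f}%, "
        f"Precipitación {precip_mm:.1f} mm, "
        f"Variedad {variety}"
    )


# Values taken from the README's training example.
prompt = build_prompt(17.2, 83, 3.4, "Yungay")
print(prompt)
```

Keeping the prompt close to the training format is likely to matter, since the model was fine-tuned on a small set of prompts with this structure.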

	---

	## 🌐 Model Details

	- Base model: `openlm-research/open_llama_3b`
	- Architecture: OpenLLaMA 3B (LLaMA architecture) with classification head (`AutoModelForSequenceClassification`)
	- Fine-tuning method: Full fine-tuning on a balanced, curated dataset (not LoRA)
	- Tokenizer: Compatible LLaMA tokenizer (`tokenizer.model` included)
	- Language: Spanish (structured prompts)
	- Task: Hard classification (3-class)

	---

	## 🎓 Training

	- Dataset: 156 training + 24 validation examples (balanced across 3 classes)
	- Labels: `Bajo`, `Moderado`, `Alto`
	- Format (JSONL):
	```json
	{
	"instruction": "Evalúa el riesgo de tizón tardío basado en los datos climáticos y la variedad.",
	"input": "Escenario 1: Temperatura promedio 17.2 °C, Humedad 83%, Precipitación 3.4 mm, Variedad Yungay",
	"output": "Moderado"
	}
	```
	- Epochs: 10
	- Optimizer: AdamW (mixed precision)
	- Hardware: 1x A100 40GB (Colab Pro, single GPU)
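
For readers who want to prepare data in the same JSONL format, here is a minimal sketch of turning records into text/label pairs for a sequence classifier. The label-to-id order in `LABEL2ID` and the way `instruction` and `input` are joined are assumptions; check the `id2label` mapping in the model's `config.json` and the original training script before relying on them.

```python
import json
from collections import Counter

# Assumed mapping; the real order is defined by the model's config.json.
LABEL2ID = {"Bajo": 0, "Moderado": 1, "Alto": 2}


def load_examples(lines):
    """Parse JSONL lines into {"text", "label"} dicts."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: instruction and input are joined with a newline.
        text = f"{record['instruction']}\n{record['input']}"
        examples.append({"text": text, "label": LABEL2ID[record["output"]]})
    return examples


# The single record shown above, serialized as one JSONL line.
sample_lines = [
    json.dumps(
        {
            "instruction": "Evalúa el riesgo de tizón tardío basado en los datos climáticos y la variedad.",
            "input": "Escenario 1: Temperatura promedio 17.2 °C, Humedad 83%, Precipitación 3.4 mm, Variedad Yungay",
            "output": "Moderado",
        },
        ensure_ascii=False,
    )
]

examples = load_examples(sample_lines)
# Counting labels is a quick way to confirm the class balance of a full file.
label_counts = Counter(example["label"] for example in examples)
print(examples[0]["label"], dict(label_counts))
```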

	---

	## 🌿 Evaluation (Balanced Test Set, n = 90)

	| Class    | Precision | Recall | F1   | Support |
	|----------|-----------|--------|------|---------|
	| Bajo     | 1.00      | 0.90   | 0.95 | 30      |
	| Moderado | 0.91      | 1.00   | 0.95 | 30      |
	| Alto     | 1.00      | 1.00   | 1.00 | 30      |
	| Accuracy |           |        | 0.97 | 90      |
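
The summary figures can be re-derived from the per-class rows. The sketch below recomputes per-class F1, macro F1, and overall accuracy from the reported precision, recall, and support; it works from the rounded values in the table, so it is a consistency check rather than a re-evaluation of the model.

```python
# (precision, recall, support) per class, copied from the table above.
report = {
    "Bajo": (1.00, 0.90, 30),
    "Moderado": (0.91, 1.00, 30),
    "Alto": (1.00, 1.00, 30),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1_score(p, r) for name, (p, r, _) in report.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# recall * support gives the number of correctly classified samples per class.
correct = sum(r * n for _, r, n in report.values())
total = sum(n for _, _, n in report.values())
accuracy = correct / total

print({name: round(value, 2) for name, value in per_class_f1.items()})
print(round(macro_f1, 2), round(accuracy, 2))
```

With a recall of 0.90 on 30 `Bajo` samples, 3 samples are misclassified out of 90, and the 0.91 precision for `Moderado` is consistent with those 3 being predicted as `Moderado` (30 / 33).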

	---

	## 📈 Intended Use and Limitations

	- Designed for: Highland regions of Peru (especially Huancavelica), where the expert-labeled ground truth reflects local pathogen behavior.
	- Limitations:
	  - May generalize poorly to lowland areas or different varieties.
	  - Not a substitute for in-field disease monitoring.

	---

	## 📑 Citation

	If you use this model, please cite:

	> Jorge Luis Alonso, Predicting Potato Late Blight in Huancavelica Using LLaMA Models, 2025

	---

	## 🌍 License

	MIT License (model + training data)

	---

	## ⚡ Quick Inference Example

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

	# Load the fine-tuned classifier and its tokenizer from the Hub.
	model_id = "jalonso24/llama-lateblight-classifier"
	model = AutoModelForSequenceClassification.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	clf = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=1)

	# Structured Spanish prompt (the training records also include a precipitation field).
	prompt = "Escenario: Temperatura 18.1 °C, Humedad 85%, Variedad Amarilis"
	print(clf(prompt))
	# Example output: [{'label': 'Alto', 'score': 0.95}]
	```
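
The `score` in the output is a class probability. For a single-label model with more than one class, the `text-classification` pipeline applies a softmax over the logits by default and reports the highest-scoring class. The sketch below reproduces that step in plain Python; the logit values and the `ID2LABEL` order are hypothetical and only illustrate the arithmetic.

```python
import math

# Assumed id-to-label order; the real one lives in the model's config.json.
ID2LABEL = {0: "Bajo", 1: "Moderado", 2: "Alto"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for one prompt, not produced by the actual model.
logits = [-1.0, 0.5, 3.0]
pred = top_prediction(logits)
print(pred)
```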