SportExtract NER Model
Model Description
This is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, specifically for football/soccer content.
Base Model: IndoBERT (indobenchmark/indobert-base-p1)
Model Type: Multi-label token classification
Entities Detected
The model can detect the following entities in Indonesian sports articles:
- ATLET - Athletes/Players
- TIM - Teams
- ORGANISASI - Organizations
- KEWARGANEGARAAN - Nationality
- POSISI - Player positions
- UMUR - Age
- AKSI - Actions in matches
- PENGHARGAAN - Awards/achievements
- STATISTIK - Statistics
- SKOR - Match scores
- TANGGAL - Dates
- STADION - Stadiums
- KEJUARAAN - Tournaments/competitions
- ALASAN_PERISTIWA - Event reasons/context
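Because the model is described as multi-label, each token may carry several entity types at once, so per-token scores have to be turned into spans one label at a time. The following is a minimal sketch of that post-processing step, not the author's decoding code. It assumes the model produces one score per label per word, and that the label order matches the list above. The `decode_spans` helper, the 0.5 threshold, and the label ordering are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Entity types listed in this model card. The ordering is an assumption;
# the checkpoint's real label-to-index mapping may differ.
ENTITY_LABELS = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR", "AKSI",
    "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION", "KEJUARAAN",
    "ALASAN_PERISTIWA",
]

Span = Tuple[str, int, int, str]  # (label, start, end_exclusive, text)


def decode_spans(
    tokens: Sequence[str],
    scores: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Span]:
    """Merge consecutive tokens whose score for a label meets the threshold.

    `scores[i][j]` is the score of token i for ENTITY_LABELS[j].
    Labels are decoded independently, so spans of different types may overlap.
    """
    spans: List[Span] = []
    for label_idx, label in enumerate(ENTITY_LABELS):
        start = None
        # Iterate one step past the end so an open span is always closed.
        for i in range(len(tokens) + 1):
            active = i < len(tokens) and scores[i][label_idx] >= threshold
            if active and start is None:
                start = i
            elif not active and start is not None:
                spans.append((label, start, i, " ".join(tokens[start:i])))
                start = None
    # Order spans by position in the sentence.
    return sorted(spans, key=lambda span: (span[1], span[2]))
```

Calling `decode_spans(words, scores)` returns tuples of label, start index, exclusive end index, and the joined text. If the real model scores subword pieces rather than words, the scores would first need to be aggregated per word.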
Usage
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
# Download model
model_path = hf_hub_download(
repo_id="george121212afasf/model",
filename="best_model.pt"
)
# Load checkpoint
checkpoint = torch.load(model_path, map_location='cpu')
# Get tokenizer
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
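# Define the model class used during training, load the checkpoint weights
# into it, and run inference (the model class is not included in this card)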
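The card does not document what `best_model.pt` contains, so a useful first step before writing a model class is to inspect the loaded object. The helper below is a small sketch for that purpose. `summarize_checkpoint` is a hypothetical name, and key names such as `model_state_dict` in the comments are guesses about a common layout, not something the source confirms.

```python
from collections.abc import Mapping
from typing import Any, Dict


def summarize_checkpoint(checkpoint: Any, max_items: int = 5) -> Dict[str, str]:
    """Describe each top-level entry of a loaded checkpoint.

    Works on any mapping. Objects with a `.shape` attribute (such as
    tensors) are reported by shape, nested dicts by size and first keys.
    """
    if not isinstance(checkpoint, Mapping):
        # The file may hold a whole pickled object instead of a dict.
        return {"<root>": type(checkpoint).__name__}

    summary: Dict[str, str] = {}
    for key, value in checkpoint.items():
        if hasattr(value, "shape"):
            summary[str(key)] = f"tensor{tuple(value.shape)}"
        elif isinstance(value, Mapping):
            # A nested dict is often a state dict (e.g. "model_state_dict",
            # which is an assumed key name) or an optimizer state.
            preview = list(value)[:max_items]
            summary[str(key)] = f"dict with {len(value)} entries, e.g. {preview}"
        else:
            summary[str(key)] = type(value).__name__
    return summary
```

Running `summarize_checkpoint(checkpoint)` on the object returned by `torch.load` above should show whether the file is a bare state dict or a wrapper with extra entries, which in turn indicates the layer names a matching model class needs.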
Training Data
Trained on annotated Indonesian sports news articles from various sources.
Model Size
- Parameters: ~125M (IndoBERT base)
- File size: ~1420 MB
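The two figures above can be cross-checked with simple arithmetic. The sketch below estimates how much space about 125M float32 values occupy. The card does not say what the file contains; the `copies=3` case is purely an assumption, illustrating what the size would be if the checkpoint also stored two extra values per parameter, as an Adam-style optimizer state does.

```python
def checkpoint_size_mib(num_params: int, bytes_per_value: int = 4, copies: int = 1) -> float:
    """Estimate storage in MiB for `copies` float arrays of `num_params` values."""
    return num_params * bytes_per_value * copies / 2**20


# Weights only, stored as 32-bit floats.
weights_only = checkpoint_size_mib(125_000_000)

# Hypothetical case: weights plus two optimizer values per parameter.
with_optimizer_state = checkpoint_size_mib(125_000_000, copies=3)

print(f"weights only:         {weights_only:.0f} MiB")
print(f"with optimizer state: {with_optimizer_state:.0f} MiB")
```

Weights alone come to roughly a third of the reported file size, so the file likely holds more than the model weights. Inspecting the checkpoint is the only way to confirm what the extra data is.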
Intended Use
This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:
- Sports journalism analysis
- Automated content tagging
- Information extraction from sports news
- 5W1H (Who, What, When, Where, Why, How) analysis
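As an illustration of the 5W1H use case, extracted entities can be bucketed into question slots. The mapping below is a hypothetical example written for this sketch. The card does not define how entity types correspond to 5W1H, and a real application would likely choose a different assignment.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical label-to-slot mapping (an assumption, not from the model card).
SLOT_BY_LABEL: Dict[str, str] = {
    "ATLET": "who",
    "TIM": "who",
    "ORGANISASI": "who",
    "AKSI": "what",
    "SKOR": "what",
    "PENGHARGAAN": "what",
    "TANGGAL": "when",
    "STADION": "where",
    "ALASAN_PERISTIWA": "why",
    "STATISTIK": "how",
}

SLOTS = ("who", "what", "when", "where", "why", "how", "other")


def group_by_5w1h(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (label, text) pairs into 5W1H slots.

    Labels without a mapping (for example UMUR or POSISI here) go to "other".
    """
    grouped: Dict[str, List[str]] = {slot: [] for slot in SLOTS}
    for label, text in entities:
        grouped[SLOT_BY_LABEL.get(label, "other")].append(text)
    return grouped
```

For example, `group_by_5w1h([("ATLET", "Egy Maulana"), ("STADION", "Gelora Bung Karno")])` places the player under "who" and the stadium under "where", leaving the other slots empty.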
Limitations
- Optimized for Indonesian-language sports content
- Best performance on football, basketball, and badminton articles
- May not generalize well to other sports domains
Contact
For questions or feedback, please open an issue in the repository.