SAM2 ID Segmenter
Lightweight wrapper and fine-tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g., portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine-tuned SAM2 checkpoint, (2) automatic multi-mask generation with tight cropping, and (3) configuration-file-driven training/inference settings.
Status: Inference wrapper implemented (`SamSegmentator`). The end-to-end training loop is a planned addition; the config already anticipates training hyper-parameters.
Contents
- Motivation & Scope
- Intended Use & Non‑Goals
- Repository Structure
- Configuration (`config.json`)
- Installation
- Inference Usage (`SamSegmentator`)
- Dataset & Mask Format (planned training)
- Checkpoints & Auto‑Download
- Metrics (recommended)
- Limitations & Risks
- Roadmap
- License & Citation
1. Motivation & Scope
Document / ID workflows often need fast class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs ordered by area and optionally padded.
2. Intended Use & Non‑Goals
Intended:
- Pre‑segmentation of ID / document fields prior to OCR.
- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.
Non‑Goals:
- Biometric identity verification or authoritative fraud detection.
- Legal decision making without human review.
- Full multi‑modal extraction (text recognition is out of scope here).
3. Repository Structure
model_repo/
config.json # Central hyper‑parameter & path config
README.md # (this file)
checkpoints/ # Local downloaded / fine‑tuned checkpoints
samples/
sample_us_passport.jpg
src/
sam_segmentator.py # Inference wrapper (SamSegmentator)
main.py # Placeholder entry point
Planned: train/ scripts for fine‑tuning (not yet implemented).
4. Configuration (model_repo/config.json)
Key fields (example values included in the repo):
- `model_type`: Always `sam2` here.
- `checkpoint_path`: Path relative to project root or absolute; if omitted and `auto_download=True`, the code will attempt a remote download.
- `image_size`: Target square size used during training (future). The inference wrapper accepts the raw image size.
- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
- `mixed_precision`: Will enable `torch.autocast` during training.
Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.
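For orientation, here is an illustrative shape for `config.json` (field names from the list above; the values and the nested `paths` keys are examples, not the repository's actual defaults):

```json
{
  "model_type": "sam2",
  "checkpoint_path": "checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
  "auto_download": true,
  "image_size": 1024,
  "num_classes": 4,
  "class_names": ["ID1", "ID3", "IDCOVER"],
  "mixed_precision": true,
  "paths": {
    "train_images": "data/train/images",
    "train_masks": "data/train/masks"
  }
}
```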
5. Installation
Prerequisites
- Python 3.10+ (recommended)
- CUDA GPU (optional but recommended for speed)
Using uv (preferred fast resolver)
If pyproject.toml is present (it is), you can do:
uv sync
This creates / updates the virtual environment and installs dependencies.
Using pip (alternative)
python -m venv .venv
.venv\Scripts\activate   # Windows; on Linux/macOS: source .venv/bin/activate
pip install -U pip
pip install -e .
If SAM2 is not available as a published package in your environment, you may need to install it from source; a typical install is sketched below (verify against the upstream SAM2 repository's current instructions).
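A commonly used from-source install (assuming the upstream repository at https://github.com/facebookresearch/sam2; adjust if this project pins a different fork or revision):

```bash
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e .
```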
6. Inference Usage (SamSegmentator)
Minimal example using the sample passport image:
import cv2
from pathlib import Path
from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV); None if the file is missing
assert img_bgr is not None, f"Could not read {image_path}"

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,            # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)

segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
Path("outputs").mkdir(exist_ok=True)  # cv2.imwrite will not create the directory
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)
Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area descending.
Parameter Notes
- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
- Deprecated `pad` (absolute pixels) is still accepted but will warn.
- All additional kwargs are forwarded to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`); see the example below.
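For example (illustrative values; both parameters are standard `SAM2AutomaticMaskGenerator` arguments):

```python
# Extra kwargs are passed through to SAM2AutomaticMaskGenerator unchanged.
segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
    box_nms_thresh=0.6,         # stricter NMS between overlapping proposals
    min_mask_region_area=500,   # drop regions/holes smaller than 500 px
)
```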
7. Dataset & Mask Format (For Future Training)
Expected layout (mirrors paths in config):
data/
train/
images/*.jpg|png
masks/*.png # Single‑channel, integer indices (0=background)
val/
images/
masks/
Class index mapping (example):
class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER
Masks should be stored losslessly (PNG); avoid palette/color-map encodings that may remap values, and write explicit integer pixel values. If masks are ever resized, use nearest-neighbor interpolation so class indices are preserved.
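A quick sanity check for a mask file (a sketch; the path and the 0-3 index range follow the example mapping above):

```python
import cv2
import numpy as np

# Read unchanged so the integer class indices survive as written.
mask = cv2.imread("data/train/masks/example.png", cv2.IMREAD_UNCHANGED)
assert mask is not None, "mask not found or unreadable"
assert mask.ndim == 2, "expected a single-channel index mask"

values = np.unique(mask)
print("class indices present:", values)  # e.g. [0 1 3]
assert 0 <= values.min() and values.max() <= 3, f"unexpected indices: {values}"
```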
8. Checkpoints & Auto‑Download
SamSegmentator will:
- Use the provided `checkpoint_path` if it exists.
- If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` using an environment-configured URL (`SAM2_CHECKPOINT_URL`).
- (Optional) Validate SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
Environment variables:
SAM2_CHECKPOINT_URL=<direct_download_url>
SAM2_CHECKPOINT_SHA256=<hex>
SAM2_CHECKPOINT_DIR=checkpoints
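A minimal sketch of the resolution logic (hypothetical helper using only the standard library; the actual behavior lives in `SamSegmentator` and may differ in detail):

```python
import hashlib
import os
import urllib.request
from pathlib import Path

def resolve_checkpoint(checkpoint_path: str | None = None) -> Path:
    """Return a local checkpoint path, downloading and verifying if needed."""
    if checkpoint_path and Path(checkpoint_path).exists():
        return Path(checkpoint_path)

    url = os.environ["SAM2_CHECKPOINT_URL"]  # required when auto-downloading
    target_dir = Path(os.environ.get("SAM2_CHECKPOINT_DIR", "checkpoints"))
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(url).name

    if not target.exists():
        urllib.request.urlretrieve(url, target)

    expected = os.environ.get("SAM2_CHECKPOINT_SHA256")
    if expected:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != expected.lower():
            raise ValueError(f"SHA256 mismatch for {target}: got {digest}")
    return target
```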
9. Metrics (Recommended When Training Added)
- Mean IoU (per class & macro average)
- Dice coefficient
- Pixel accuracy
- Class frequency distribution (to inform potential class weighting)

Store per-epoch metrics as JSON for reproducibility; a minimal IoU/Dice sketch follows below.
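A minimal sketch of per-class IoU and Dice for integer-indexed masks (hypothetical helper; assumes predictions and ground truth share the index mapping from section 7):

```python
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, num_classes: int):
    """Per-class IoU and Dice for same-shape integer masks (NaN if class absent)."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union if union else float("nan"))
        denom = p.sum() + t.sum()
        dices.append(2 * inter / denom if denom else float("nan"))
    return ious, dices
```

Macro averages can then be taken with `np.nanmean` over the per-class lists.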
10. Limitations & Risks
Technical:
- The current version does not include a fine-tuning script; only the inference wrapper is implemented.
- Automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.
Ethical / Compliance:
- Processing ID documents may involve PII; ensure secure storage and compliant handling.
- Not intended for biometric decisions nor identity verification pipelines without human oversight.
11. Roadmap
- Add training script (supervised fine-tuning using `config.json`).
- Optional class-guided prompting (points / boxes) pipeline.
- Export to ONNX / TorchScript.
- CLI interface for batch folder inference.
- Lightweight web demo (Gradio / FastAPI).
12. License & Citation
Specify a license in a top-level LICENSE file (e.g., MIT or Apache-2.0), ensuring compatibility with SAM2's original license.
Please cite SAM / SAM2 in academic work. Example (placeholder):
@article{kirillov2023segmentanything,
title={Segment Anything},
author={Kirillov, Alexander and others},
journal={arXiv preprint arXiv:2304.02643},
year={2023}
}
For SAM2 specifically, the published reference is:
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}
Acknowledgments
- Meta AI for releasing Segment Anything & SAM2.
- OpenCV, PyTorch, and the broader CV community.
If you have questions or need feature prioritization, open an Issue or start a Discussion.