SAM2 ID Segmenter

A lightweight wrapper and fine-tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine-tuned SAM2 checkpoint, (2) automatic multi-mask generation with tight cropping, and (3) configuration-file-driven training/inference settings.

Status: Inference wrapper implemented (SamSegmentator). The end-to-end training loop is a planned addition; the config already anticipates training hyper-parameters.


Contents

  1. Motivation & Scope
  2. Intended Use & Non‑Goals
  3. Repository Structure
  4. Configuration (config.json)
  5. Installation
  6. Inference Usage (SamSegmentator)
  7. Dataset & Mask Format (planned training)
  8. Checkpoints & Auto‑Download
  9. Metrics (recommended)
  10. Limitations & Risks
  11. Roadmap
  12. License & Citation

1. Motivation & Scope

Document / ID workflows often need fast class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs ordered by area and optionally padded.

2. Intended Use & Non‑Goals

Intended:

  • Pre‑segmentation of ID / document fields prior to OCR.
  • Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
  • Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.

Non‑Goals:

  • Biometric identity verification or authoritative fraud detection.
  • Legal decision making without human review.
  • Full multi‑modal extraction (text recognition is out of scope here).

3. Repository Structure

model_repo/
    config.json          # Central hyper‑parameter & path config
    README.md            # (this file)
checkpoints/           # Locally downloaded / fine-tuned checkpoints
samples/
    sample_us_passport.jpg
src/
    sam_segmentator.py   # Inference wrapper (SamSegmentator)
main.py                # Placeholder entry point

Planned: train/ scripts for fine‑tuning (not yet implemented).

4. Configuration (model_repo/config.json)

Key fields (example values included in the repo):

  • model_type: Always sam2 here.
  • checkpoint_path: Path relative to project root or absolute; if omitted and auto_download=True the code will attempt remote download.
  • image_size: Target square size to be used during training (future). The inference wrapper accepts images at their native resolution.
  • num_classes, class_names: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
  • augmentation, loss, optimizer, lr_scheduler: Reserved for training loop integration.
  • paths: Expected dataset layout for training: data/train/images, data/train/masks, etc.
  • mixed_precision: Will enable torch.autocast during training.

Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.
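A minimal sketch of what config.json might contain, consistent with the fields above (values are illustrative assumptions, not the repo's canonical settings):

{
    "model_type": "sam2",
    "checkpoint_path": "checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
    "image_size": 1024,
    "num_classes": 4,
    "class_names": ["ID1", "ID3", "IDCOVER"],
    "mixed_precision": true,
    "optimizer": {"name": "adamw", "lr": 0.0001},
    "lr_scheduler": {"name": "cosine"},
    "loss": {"name": "dice_ce"},
    "augmentation": {"hflip": 0.5},
    "paths": {
        "train_images": "data/train/images",
        "train_masks": "data/train/masks",
        "val_images": "data/val/images",
        "val_masks": "data/val/masks"
    }
}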

5. Installation

Prerequisites

  • Python 3.10+ (recommended)
  • CUDA GPU (optional but recommended for speed)

Using uv (preferred fast resolver)

If pyproject.toml is present (it is), you can do:

uv sync

This creates / updates the virtual environment and installs dependencies.

Using pip (alternative)

python -m venv .venv
.venv\Scripts\activate         # Windows
source .venv/bin/activate      # Linux / macOS
pip install -U pip
pip install -e .

If SAM2 is not available as a published package in your environment, you may need to install it from source; exact instructions depend on the upstream SAM2 repository (to be pinned here when finalized).
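One common approach (an assumption; verify against the upstream repository before relying on it) is installing directly from Meta's GitHub repo:

pip install "git+https://github.com/facebookresearch/sam2.git"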

6. Inference Usage (SamSegmentator)

Minimal example using the sample passport image:

import cv2
from pathlib import Path
from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,  # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)

segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
Path("outputs").mkdir(exist_ok=True)  # cv2.imwrite fails silently if the directory is missing
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)

Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area descending.

Parameter Notes

  • pad_percent: Relative padding (default 5%) added around each tight bounding box.
  • The deprecated pad parameter (absolute pixels) is still accepted but emits a warning.
  • All additional kwargs are forwarded to SAM2AutomaticMaskGenerator (e.g., box_nms_thresh, min_mask_region_area); see the sketch below.
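For example, a sketch forwarding extra generator parameters (the values are illustrative, not tuned):

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
    box_nms_thresh=0.7,        # forwarded to SAM2AutomaticMaskGenerator
    min_mask_region_area=100,  # suppress tiny spurious regions
)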

7. Dataset & Mask Format (For Future Training)

Expected layout (mirrors paths in config):

data/
    train/
        images/*.jpg|png
        masks/*.png        # Single‑channel, integer indices (0=background)
    val/
        images/
        masks/

Class index mapping (example):

class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER

Masks must be stored losslessly (PNG) so integer class indices survive saving, and resized only with nearest-neighbor interpolation. Avoid palette mismatches; explicit integer pixel values are recommended.
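A quick validation pass along these lines (a sketch assuming the layout and example class mapping above) can catch palette or stray-value issues before training:

import numpy as np
from pathlib import Path
from PIL import Image

VALID_IDS = {0, 1, 2, 3}  # background + ID1, ID3, IDCOVER (example mapping above)

for mask_path in sorted(Path("data/train/masks").glob("*.png")):
    mask = np.asarray(Image.open(mask_path))  # palette ("P") images yield raw indices
    found = set(np.unique(mask).tolist())
    if not found <= VALID_IDS:
        print(f"{mask_path.name}: unexpected pixel values {sorted(found - VALID_IDS)}")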

8. Checkpoints & Auto‑Download

SamSegmentator will:

  1. Use provided checkpoint_path if it exists.
  2. If none is provided and auto_download=True, download the default checkpoint to checkpoints/ using an environment-configured URL (SAM2_CHECKPOINT_URL).
  3. (Optional) Validate SHA256 if SAM2_CHECKPOINT_SHA256 is set.

Environment variables:

SAM2_CHECKPOINT_URL=<direct_download_url>
SAM2_CHECKPOINT_SHA256=<hex>
SAM2_CHECKPOINT_DIR=checkpoints
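The resolution logic is roughly equivalent to this simplified sketch (the authoritative version lives in src/sam_segmentator.py):

import hashlib
import os
import urllib.request
from pathlib import Path

def resolve_checkpoint(checkpoint_path=None, auto_download=True):
    if checkpoint_path and Path(checkpoint_path).exists():
        return Path(checkpoint_path)
    if not auto_download:
        raise FileNotFoundError("No checkpoint available and auto_download is disabled")
    url = os.environ["SAM2_CHECKPOINT_URL"]
    ckpt_dir = Path(os.environ.get("SAM2_CHECKPOINT_DIR", "checkpoints"))
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    target = ckpt_dir / Path(url).name
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    expected = os.environ.get("SAM2_CHECKPOINT_SHA256")
    if expected:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != expected.lower():
            raise ValueError(f"SHA256 mismatch for {target}: got {digest}")
    return target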

9. Metrics (Recommended When Training Added)

  • Mean IoU (per class & macro average)
  • Dice coefficient
  • Pixel accuracy
  • Class frequency distribution (to inform potential class weighting)

Store per-epoch metrics as JSON for reproducibility; a sketch follows.
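A minimal sketch of these metrics computed from a confusion matrix (hypothetical helpers, not yet part of the repo):

import json
import numpy as np

def confusion_matrix(pred, target, num_classes):
    # pred/target: integer class-index arrays of identical shape
    idx = num_classes * target.reshape(-1) + pred.reshape(-1)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def summarize(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1.0)
    dice = 2.0 * tp / np.maximum(2.0 * tp + fp + fn, 1.0)
    return {
        "mean_iou": float(iou.mean()),
        "per_class_iou": iou.tolist(),
        "mean_dice": float(dice.mean()),
        "pixel_accuracy": float(tp.sum() / cm.sum()),
        "class_frequency": (cm.sum(axis=1) / cm.sum()).tolist(),
    }

# One JSON line per epoch keeps runs comparable:
# with open("metrics.jsonl", "a") as f:
#     f.write(json.dumps(summarize(cm)) + "\n")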

10. Limitations & Risks

Technical:

  • The current version includes only the inference wrapper; there is no fine-tuning script yet.
  • Automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.

Ethical / Compliance:

  • Processing ID documents may involve PII; ensure secure storage and compliant handling.
  • Not intended for biometric decisions nor identity verification pipelines without human oversight.

11. Roadmap

  • Add training script (supervised fine‑tuning using config.json).
  • Optional class‑guided prompting (points / boxes) pipeline.
  • Export to ONNX / TorchScript.
  • CLI interface for batch folder inference.
  • Lightweight web demo (Gradio / FastAPI).

12. License & Citation

Specify a license in a top-level LICENSE file (e.g., MIT or Apache-2.0), ensuring compatibility with SAM2's original license.

Please cite SAM / SAM2 in academic work. Example (placeholder):

@article{kirillov2023segmentanything,
    title={Segment Anything},
    author={Kirillov, Alexander and others},
    journal={arXiv preprint arXiv:2304.02643},
    year={2023}
}

Add updated SAM2 citation once official reference is finalized.

Acknowledgments

  • Meta AI for releasing Segment Anything & SAM2.
  • OpenCV, PyTorch, and the broader CV community.

If you have questions or need feature prioritization, open an Issue or start a Discussion.
