AI & ML interests
🤗 Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.
BigLAM: Machine Learning for Libraries, Archives, and Museums
BigLAM is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for Libraries, Archives, and Museums (LAMs).
We aim to:
- Share machine-learning-ready datasets from LAMs via the Hugging Face Hub
- Train and release open-source models for LAM-relevant tasks
- Develop tools and approaches tailored to LAM use cases
Background
BigLAM began as a datasets hackathon within the BigScience 🌸 project, a large-scale, open NLP collaboration.
Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.
What You'll Find
The BigLAM organization hosts:
- Datasets: image, text, and tabular data from and about libraries, archives, and museums
- Models: fine-tuned for tasks like:
  - Art/historical image classification
  - Document layout analysis and OCR
  - Metadata quality assessment
  - Named entity recognition in heritage texts
- Spaces: tools for interactive exploration and demonstration
Get Involved
We welcome contributions! You can:
- Use our datasets and models
- Join the discussion on GitHub
- Contribute your own tools or data
- Share your work using BigLAM resources
Why It Matters
Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:
- Supporting inclusive and responsible AI
- Helping institutions experiment with ML for access, discovery, and preservation
- Ensuring that ML systems reflect diverse human knowledge and expression
- Developing tools and methods that work well with the unique formats, values, and needs of LAMs
Index card datasets for training and evaluating models that convert index cards to structured data/metadata
Datasets which can help train or evaluate various approaches to automatic metadata generation and extraction.
- biglam/doab-metadata-extraction
- biglam/rubenstein-manuscript-catalog
- biglam/bpl-card-catalog
- biglam/harvard-library-bibliographic-dataset
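These catalogue datasets can be pulled straight from the Hub with the Hugging Face datasets library. A minimal sketch, assuming the dataset exposes a default configuration with a train split (check the dataset card for the actual splits and fields):

```python
from datasets import load_dataset

# Stream the DOAB metadata-extraction dataset so the full catalogue
# does not have to be downloaded up front.
ds = load_dataset("biglam/doab-metadata-extraction", split="train", streaming=True)

# Inspect the first record to see which fields are available.
record = next(iter(ds))
print(record.keys())
```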
Models (6)
- biglam/historic-newspaper-illustrations-yolov11 (object detection)
- biglam/medieval-manuscript-yolov11 (object detection)
- biglam/detr-resnet-50_fine_tuned_loc-2023 (object detection, ~41.6M parameters)
- biglam/detr-resnet-50_fine_tuned_nls_chapbooks (object detection, ~41.6M parameters)
- biglam/cultural_heritage_metadata_accuracy (text classification, ~0.1B parameters)
- biglam/autotrain-beyond-the-books (text classification, ~0.1B parameters)
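The DETR checkpoints should work with the standard Transformers object-detection pipeline. A sketch under that assumption; the image file name is a placeholder and the 0.5 score threshold is an arbitrary choice:

```python
from transformers import pipeline

# Load BigLAM's DETR model fine-tuned on NLS chapbook illustrations.
detector = pipeline(
    "object-detection",
    model="biglam/detr-resnet-50_fine_tuned_nls_chapbooks",
)

# Detect illustrations on a scanned page (placeholder file name) and
# keep only reasonably confident boxes.
for prediction in detector("page_scan.jpg", threshold=0.5):
    print(prediction["label"], round(prediction["score"], 3), prediction["box"])
```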
Datasets (38 total; a selection)
- biglam/doab-metadata-extraction (8.09k rows)
- biglam/harvard-library-bibliographic-dataset (11.1M rows)
- biglam/rubenstein-manuscript-catalog (49.7k rows)
- biglam/bpl-card-catalog (838k rows)
- biglam/brill_iconclass (87.7k rows)
- biglam/sloane-index-cards (2.73k rows)
- biglam/newspaper-navigator (6.55M rows)
- biglam/loc_beyond_words (3.56k rows)
- biglam/europeana_newspapers (11.9M rows)
- biglam/european_art (15.2k rows)
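Beyond this selection, the full set of BigLAM repositories can be enumerated programmatically with the huggingface_hub client instead of browsing the listing. A minimal sketch:

```python
from huggingface_hub import HfApi

api = HfApi()

# Print every dataset published under the biglam organization.
for dataset in api.list_datasets(author="biglam"):
    print(dataset.id)

# The same call exists for models.
for model in api.list_models(author="biglam"):
    print(model.id)
```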