Catherine Arnett
catherinearnett
94 followers · 31 following
https://catherinearnett.github.io/
linguist_cat
catherinearnett
catherinearnett.bsky.social
AI & ML interests
multilingual NLP, tokenization
Recent Activity
Authored a paper, 5 days ago: Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Authored a paper, 5 days ago: Explaining and Mitigating Crosslingual Tokenizer Inequities
Authored a paper, 5 days ago: Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
catherinearnett's models (18)
catherinearnett/B-GPT_pl_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 7
catherinearnett/B-GPT_en_pl_sequential • Text Generation • 0.1B • Updated Jun 12 • 18
catherinearnett/B-GPT_pl_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 10
catherinearnett/B-GPT_en_pl_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 25
catherinearnett/B-GPT_el_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 9
catherinearnett/B-GPT_en_el_sequential • Text Generation • 0.1B • Updated Jun 12 • 9
catherinearnett/B-GPT_el_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 1
catherinearnett/B-GPT_en_el_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 4
catherinearnett/B-GPT_es_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 4
catherinearnett/B-GPT_en_es_sequential • Text Generation • 0.1B • Updated Jun 12 • 7
catherinearnett/B-GPT_es_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 14
catherinearnett/B-GPT_en_es_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 12
catherinearnett/B-GPT_nl_en_sequential • Text Generation • 0.1B • Updated Jun 12 • 12
catherinearnett/B-GPT_en_nl_sequential • Text Generation • 0.1B • Updated Jun 12 • 4
catherinearnett/B-GPT_nl_en_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 16
catherinearnett/B-GPT_en_nl_simultaneous • Text Generation • 0.1B • Updated Jun 12 • 15
catherinearnett/pythia-1b-bigram_masked • Updated May 1
catherinearnett/pythia-160m-bigram_masked • Updated May 1