Hugging Face
Pedro Ortiz Suarez
pjox
16 followers · 20 following
https://portizs.eu/
pjox13
pjox
AI & ML interests
Language modeling, parsing, sequence tagging, NER, historical languages.
pjox's activity
New activity in commoncrawl/statistics (over 1 year ago):
#1 Set `sep="\s+"` for the duplicates file, opened by lhoestq (2 comments)
New activity in oscar-corpus/OSCAR-2301 (almost 2 years ago):
#8 Porn-related strings in the datasets (zh), opened by kiwakwok (2 comments)
New activity in oscar-corpus/colossal-oscar-1.0 (about 2 years ago):
#4 colab crashed after trying to load the dataset, opened by MhondGhod (1 comment)
New activity in oscar-corpus/colossal-oscar-1.0 (over 2 years ago):
#3 Change foldernames, opened by hac541309 (2 reactions, 4 comments)
New activity in oscar-corpus/OSCAR-2201 (over 2 years ago):
#12 Unsafe Files, opened by GetzPro (20 comments)
New activity in oscar-corpus/OSCAR-2301 (over 2 years ago):
#6 About the number of documents, opened by lixin4ever (6 comments)
New activity in oscar-corpus/colossal-oscar-1.0 (over 2 years ago):
#1 Upload the rest of the data for 05-06-23, opened by pjox
New activity in oscar-corpus/OSCAR-2301 (over 2 years ago):
#5 Changing into Parquet, opened by hac541309 (1 reaction, 2 comments)
New activity in pjox/dalembert (over 2 years ago):
#1 the link to RoBERTa base model directs us to bert-base-uncased, opened by hurrial (1 comment)
New activity in oscar-corpus/OSCAR-2301 (over 2 years ago):
#3 Deduplicated English Corpus, opened by conceptofmind (1 reaction, 2 comments)
#2 Data hosting on Huggingface, opened by hieuhocnlp (1 comment)
#1 How to download only one language?, opened by musabg (2 comments)
New activity in oscar-corpus/OSCAR-2201 (over 2 years ago):
#10 full of sexy content and does't have 200G in zh corpus, opened by Hzhiqiang (1 comment)