Pedro Ortiz Suarez

pjox
https://portizs.eu/
  • pjox13
  • pjox

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

Recent Activity

authored a paper 3 days ago
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing
updated a dataset 5 months ago
oscar-corpus/oscar
updated a dataset 6 months ago
pjox/tmp4c-index

Organizations

  • ALMAnaCH (Inria)
  • BigScience Workshop
  • OSCAR
  • BigScience Catalogue Data
  • Scilons Project
  • BigScience Data
  • Web Data Commons
  • Speech and Language Technology, DFKI
  • Just some testing..
  • Common Crawl Foundation
  • Occiglot

authored a paper 3 days ago

SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing

Paper • 2512.11192 • Published Dec 12, 2025
authored a paper over 1 year ago

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Paper • 2406.08707 • Published Jun 13, 2024 • 17
authored a paper about 2 years ago

CamemBERT: a Tasty French Language Model

Paper • 1911.03894 • Published Nov 10, 2019 • 4
authored a paper almost 3 years ago

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 36