Hi @wlabchoi!
Thanks for trying out the benchmarks and for the detailed question!
We've migrated everything to our new GitHub repository.
For TeleMath specifically, the evaluation methodology is documented in the paper: *TeleMath: A Benchmark Dataset for Assessing Large Language Models Capability on Telecom Math*.
To answer your questions directly:
- Evaluation metrics: We use pass@1 (single-attempt accuracy) and cons@16 (majority voting over 16 samples) with temperature 0.6 and top_p 0.90 (see the sketch after this list)
- Answer validation: Numerical exact-match; answers are strictly numerical values with units either stated in the question or implied by context
- No post-processing: Answers are compared directly against ground-truth numerical values
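
For a concrete sense of the scoring, here's a minimal sketch of how pass@1 and cons@16 could be computed, illustrative only; the helper names `numeric_equal`, `pass_at_1`, and `cons_at_n` are hypothetical and not taken from our codebase:

```python
from collections import Counter

def numeric_equal(pred: str, target: str) -> bool:
    # Numerical exact-match: parse both sides as numbers and compare directly
    try:
        return float(pred) == float(target)
    except ValueError:
        return False

def pass_at_1(samples: list[str], target: str) -> bool:
    # pass@1: correctness of a single sampled answer
    return numeric_equal(samples[0], target)

def cons_at_n(samples: list[str], target: str) -> bool:
    # cons@N (e.g. N=16): majority vote over the sampled answers,
    # then exact-match the winning answer against the ground truth
    majority, _ = Counter(samples).most_common(1)[0]
    return numeric_equal(majority, target)
```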
You can also find more context on the overall benchmark methodology in our blog post.
For running evaluations locally, check the repo documentation:
- Getting Started
- Running Evaluations
- List of Evals
The framework uses Inspect AI, which should help with reproducibility.
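
If you want to wire up something similar locally, an Inspect AI task looks roughly like the sketch below; the task name, sample question, and scorer choice are placeholders for illustration, not the actual TeleMath implementation:

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.model import GenerateConfig
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def telemath_example():
    # Placeholder sample; the real task loads the TeleMath dataset from the repo
    dataset = [
        Sample(
            input="A channel has 10 MHz bandwidth and a spectral efficiency of "
                  "2 bit/s/Hz. What is the throughput in Mbit/s?",
            target="20",
        )
    ]
    return Task(
        dataset=dataset,
        solver=generate(),
        scorer=match(numeric=True),  # numeric exact-match against the target
        config=GenerateConfig(temperature=0.6, top_p=0.9),
    )
```

You could then run it with `inspect eval` against the model of your choice.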
Thanks for using our benchmarks!