Hi @sirluk, thanks for the great post. Do you know if the above masking technique works with some attention implementations but is incompatible with others? For example, would the above masking work with SDPA, flash_attention_2, and eager (each of these implementations is handled a bit differently in https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L666, for example)?
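For concreteness, here is a minimal sketch of the kind of mask I have in mind: a dense 4D additive mask of the sort SDPA and eager accept (assuming `position_ids` that restart at 0 for each packed document; the helper name `packed_causal_mask` is just for illustration, not from the post):

```python
import torch

def packed_causal_mask(position_ids: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Build a 4D additive attention mask of shape (batch, 1, seq, seq) for packed sequences.

    Assumes position_ids restart at 0 for each packed document, e.g.
    [0, 1, 2, 0, 1] for two documents of lengths 3 and 2 packed into one row.
    A token may only attend to earlier tokens of the same document.
    """
    # Every position that resets to 0 starts a new document.
    doc_ids = (position_ids == 0).long().cumsum(dim=-1)              # (batch, seq)

    # Block-diagonal part: query and key must belong to the same document.
    same_doc = doc_ids.unsqueeze(-1) == doc_ids.unsqueeze(-2)        # (batch, seq, seq)

    # Causal part: a query may only look at keys at the same or earlier positions.
    seq_len = position_ids.shape[-1]
    causal = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=position_ids.device)
    )

    allowed = same_doc & causal                                      # broadcasts over batch

    # Additive mask: 0 where attention is allowed, a large negative value where blocked.
    mask = torch.zeros(allowed.shape, dtype=dtype, device=position_ids.device)
    mask = mask.masked_fill(~allowed, torch.finfo(dtype).min)
    return mask.unsqueeze(1)                                         # (batch, 1, seq, seq)


# Example: two documents of lengths 3 and 2 packed into one sequence of length 5.
position_ids = torch.tensor([[0, 1, 2, 0, 1]])
mask = packed_causal_mask(position_ids)
print(mask[0, 0])  # 5x5 block-diagonal causal mask (0 = attend, min value = blocked)
```

As far as I can tell, flash_attention_2 in Transformers does not consume a dense 4D mask like this, which is why I am wondering whether the technique carries over.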
Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention (Oct 7, 2024)