All HF Hub posts

IlyasMoutawwakil 
posted an update 1 day ago
Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here’s why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we’re no longer constrained by how parameters are laid out inside the safetensors weight files.

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can’t change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren’t possible before.
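
To see why load-time restructuring pays off, here's a minimal, self-contained sketch (not the actual v5 loader code; shapes and names are assumptions) of fusing Q/K/V projection weights into a single matmul:

```python
import torch

# Minimal illustration (not the actual v5 loader): merging separate Q/K/V
# projection weights into one matrix turns three matmuls into a single,
# more compute-dense one. Shapes and names are assumptions.
hidden = 256
q_w, k_w, v_w = (torch.randn(hidden, hidden) for _ in range(3))

# What a load-time converter can do: concatenate once, before inference.
qkv_w = torch.cat([q_w, k_w, v_w], dim=0)        # (3 * hidden, hidden)

x = torch.randn(2, 16, hidden)                   # (batch, seq, hidden)
q, k, v = (x @ qkv_w.T).split(hidden, dim=-1)    # one matmul, three outputs

# Same result as the unfused path, with one kernel launch instead of three.
assert torch.allclose(q, x @ q_w.T, atol=1e-4)
```

The fused path does the same arithmetic with one kernel launch instead of three, which is exactly the kind of layout change torch.compile can't make on its own.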

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match:
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...
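
For a flavor of what this mix-and-match looks like, here's a hedged sketch using kwargs from recent transformers releases; exact v5 option names and the example checkpoint are assumptions:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the mix-and-match idea. These kwargs exist in recent transformers
# releases; exact v5 names/defaults may differ, so treat this as illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",                          # example checkpoint
    device_map="auto",                                           # spread across available devices
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),   # quantization
    attn_implementation="flash_attention_2",                     # attention backend
)
```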

Kudos to everyone involved! I highly recommend:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5
RakshitAralimatti 
posted an update about 4 hours ago
Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into moonshotai's Kimi 2.5.
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/

The model:
✅ Scraped my GitHub repos automatically
✅ Pulled my experience from LinkedIn
✅ Designed an Aurora Glass theme
✅ Mapped every skill to projects
✅ Added animations I'd never code myself


kanaria007 
posted an update 1 day ago
✅ New Article: *Post-Transformer Decision Cores* (v0.1)

Title:
🚀 Post-Transformer Decision Cores: Goal-Native Engines Beyond LLMs
🔗 https://huggingface.co/blog/kanaria007/post-tranformer-decision-cores

---

Summary:
Transformers are powerful—but in SI-Core they’re *not the essence of intelligence*. A *Decision Core* is anything that satisfies the *Jump contracts* (OBS/ETH/MEM/ID/EVAL + RML), and those contracts don’t require next-token prediction.

This article sketches what “post-Transformer” looks like in practice: *goal-native, structure-aware controllers* that may use LLMs as tools—but don’t depend on them as the runtime brain.

> Don’t relax the contracts.
> Replace the engine behind them.

---

Why It Matters:
• Makes LLMs *optional*: shift them to “genesis / exploration / explanation,” while routine high-stakes Jumps run on structured cores
• Improves boring-but-critical properties: *determinism (CAS), fewer inconsistencies (SCI), fewer ETH violations (EAI), better rollback (RBL/RIR)*
• Enables gradual adoption via *pluggable Jump engines* and domain-by-domain “primary vs fallback” switching

---

What’s Inside:
• The architectural inversion: *World → OBS → SIM/SIS → Jump (Decision Core) → RML → Effects* (LLM is just one engine; a minimal interface sketch follows this list)
• Three compatible post-Transformer directions:

1. *World-model + search controllers* (MPC/MCTS/anytime search with explicit GCS + ETH constraints)
2. *Genius-distilled specialized controllers* (distill structure from GeniusTraces; LLM becomes a “genesis tool”)
3. *SIL-compiled Decision Programs* (typed Jump entrypoints, compiler-checked invariants, DPIR/GSPU targeting)
• A realistic migration path: LLM-wrapped → Genius library → shadow dual-run → flip primary by domain → SIL-compiled cores
• How this connects to “reproducing genius”: GRP provides trace selection/format; this article provides the engine architectures
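
All of this vocabulary (Jump, RML, GCS, SIL, ...) comes from the author's SI-Core framework, so code can only gesture at the shape. Purely as a hypothetical sketch of the "pluggable Jump engines" idea, with every name below invented for illustration:

```python
from typing import Optional, Protocol

class DecisionCore(Protocol):
    """Hypothetical interface: anything satisfying the Jump contracts."""
    def jump(self, observation: dict) -> Optional[dict]:
        """Return a decision, or None to decline and defer to a fallback."""
        ...

class StructuredCore:
    """Stand-in for a world-model + search controller (logic omitted)."""
    def jump(self, observation: dict) -> Optional[dict]:
        if observation.get("stakes") == "high":
            return {"action": "hold", "engine": "structured"}
        return None  # decline: outside this core's validated domain

class LLMCore:
    """Stand-in for an LLM-backed engine kept for exploration/explanation."""
    def jump(self, observation: dict) -> Optional[dict]:
        return {"action": "explore", "engine": "llm"}

def decide(obs: dict, primary: DecisionCore, fallback: DecisionCore) -> dict:
    """Domain-by-domain primary-vs-fallback switching, as in the article."""
    return primary.jump(obs) or fallback.jump(obs) or {"action": "noop"}

print(decide({"stakes": "high"}, StructuredCore(), LLMCore()))  # structured core answers
```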

---

📖 Structured Intelligence Engineering Series
tegridydev 
posted an update 1 day ago
Introducing OpenMALx


Repository for Infosec and Machine Learning Resources

OpenMALx is an organization focused on the development of datasets and models for security analysis. The project objective is to provide structured data for training and evaluating large language models in a security context.

---

Technical Focus

**Dataset Formatting:** Processing raw security tool logs into instruction/response pairs for model training (see the sketch after this list).
**Local Execution:** Optimizing models for local hardware to ensure data remains on-premises.
**Response Logic:** Developing structured formats for explaining security vulnerabilities and remediation steps.
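
To make "instruction/response pairs" concrete, here is a guessed-at example of what a record built from a raw tool log could look like; field names and content are assumptions, not the actual openmalx schema:

```python
import json

# Hypothetical record shape for an instruction/response pair derived from a
# security tool log. Field names are assumptions, not the openmalx schema.
record = {
    "instruction": "Summarize the findings in this nmap scan and flag risky services.",
    "input": "PORT   STATE SERVICE\n22/tcp open  ssh\n23/tcp open  telnet",
    "response": (
        "Two services are exposed. SSH on 22/tcp is expected; telnet on "
        "23/tcp transmits credentials in cleartext and should be disabled."
    ),
}
print(json.dumps(record, indent=2))
```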

Active Projects

**infosec-tool-output:** A dataset mapping static and dynamic analysis tool outputs to technical summaries.
openmalx/infosec-tool-output

**open-malsec:** A collection of text-based security threats, including phishing and social engineering samples, for classification tasks.
openmalx/open-malsec
Parveshiiii 
posted an update 2 days ago
🚀 Wanna train your own AI Model or Tokenizer from scratch?

Building models isn’t just for big labs anymore — with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it’s NLP, domain‑specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.

✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain‑specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures

⚡ The best part?
- Tokenizer training (tiktoken / BPE) can be done in **just 3 lines of code** (see the sketch after this list).
- Model training runs smoothly on **Google Colab notebooks** — no expensive hardware required.
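
As a sanity check on the "3 lines" claim, here's one way to do it with the Hugging Face tokenizers library; it assumes a local corpus.txt, and the linked repo's exact code may differ:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a BPE tokenizer from scratch. Assumes a local "corpus.txt";
# the linked repo's exact approach may differ.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
tokenizer.train(files=["corpus.txt"], trainer=trainers.BpeTrainer(vocab_size=8000, special_tokens=["[UNK]"]))
tokenizer.save("tokenizer.json")
```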

📂 Try out my work:
- 🔗 https://github.com/OE-Void/Tokenizer-from_scratch
- 🔗 https://github.com/OE-Void/GPT
scthornton 
posted an update 3 days ago
SecureCode: security-aware code models (3B–20B), trained for review + remediation

I’ve been frustrated by how often code assistants recommend patterns that pass tests but fail security review (e.g., string-built SQL, brittle auth logic, unsafe parsing, insecure defaults). So I built **SecureCode**: a collection of **8 code models (3B → 20B)** trained to behave more like a security reviewer.

What you should expect from SecureCode:

- identify likely vuln patterns and explain *why* they’re risky
- outline plausible abuse paths (defensive framing)
- propose a secure rewrite (drop-in where possible)
- include defense-in-depth guidance + regression tests/checks

Links:

- **Models:** https://huggingface.co/collections/scthornton/securecode
- **Dataset:** scthornton/securecode-v2
- **Paper:** https://arxiv.org/html/2512.18542v1 (SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models, 2512.18542)

**How to test it (copy/paste prompt):**


> You are a senior application security engineer. Review the code below.
>  Output: (1) findings with severity, (2) likely exploit scenarios (high level), (3) secure rewrite,
>  (4) defense-in-depth recommendations, (5) regression tests/checks.
>  Code: `...`
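
If you'd rather script this than paste into a chat UI, here's a minimal sketch with transformers; the repo id below is a placeholder, so pick a real one from the collection above:

```python
from transformers import pipeline

# Placeholder repo id: substitute a real model from the SecureCode collection.
reviewer = pipeline("text-generation", model="scthornton/securecode-3b")

prompt = """You are a senior application security engineer. Review the code below.
Output: (1) findings with severity, (2) likely exploit scenarios (high level),
(3) secure rewrite, (4) defense-in-depth recommendations, (5) regression tests/checks.
Code: query = "SELECT * FROM users WHERE name = '" + user_input + "'"
"""
print(reviewer(prompt, max_new_tokens=512)[0]["generated_text"])
```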



**I’m looking for real-world feedback**

- Your “this slipped through review once” snippets (sanitized is fine)
- False positives / false negatives you observe
- Contributions of new CVE-grounded examples

If you drop a snippet, please include language/framework + what the *correct* remediation looks like in your environment. If you have any contributions or suggestions for the dataset, I'd be happy to hear them. I have some new features and enhancements planned for v3 that are already underway, but for now, I'm focused on testing as many use cases as possible. Appreciate you all!

wangbuer999 
posted an update 3 days ago
HunyuanImage 3.0-Instruct just dropped

Fresh open-source HunyuanImage 3.0 model! Spent 20 mins testing it on a Messi + retro scrambler fusion case

Ran on diffusers v0.26.3 + CUDA 12.1 | 80B MoE params (13B activated) | zero VRAM issues

- strength=0.9: Messi's #10 kit and tattoo sharp, but the moto's rusted metal texture blurred (classic open-source pain)
- strength=0.7: moto/cobblestone background crisp, but Messi's jersey details faded completely
- strength=0.75 + prompt "Blend seamlessly, keep all original details": both subject & background sharp

No ControlNet, no manual masking: the model's chain-of-thought reasoning parses image + prompt first.
Already outperforms Qwen-Image-Edit 2511 (GSB eval +25.7% on single-image edits) | 100% open-source
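
For anyone reproducing the strength sweep, here's a generic diffusers image-to-image sketch showing where strength plugs in. It's illustrative only: the actual HunyuanImage 3.0-Instruct loading path may differ, so check its model card.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Generic img2img sketch: higher strength repaints more (source detail can
# blur), lower strength preserves more (edits can fade). Illustrative only;
# the base checkpoint below is a stand-in, not HunyuanImage 3.0-Instruct.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init = load_image("messi_moto.png")  # hypothetical source image
out = pipe(
    prompt="Blend seamlessly, keep all original details",
    image=init,
    strength=0.75,  # the sweet spot reported above
).images[0]
out.save("fusion.png")
```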

👉 Try it: https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare&modelId=Hunyuan-Image-3.0-Instruct

Technical report: https://arxiv.org/abs/2509.23951

Anyone else struggled with strength tweaks for fusion? This fixed it for my Messi + moto case. Did it work as well for yours?