Very handy
Ci Splunk PRO
Csplk
AI & ML interests
None yet
Recent Activity
liked a model 5 days ago: Cognitive-Lab/NetraEmbed
updated a Space 5 days ago: Csplk/moondream2-batch-processing
upvoted an article 6 days ago: Transformers v5: Simple model definitions powering the AI ecosystem
reacted to Nymbo's post 15 days ago
I've just shipped a major update to the Nymbo/Tools MCP server: Agent_Terminal, a single "master tool" that cuts token usage by over 90%! Anthropic found 98.7% context savings using code execution with MCP, and Cloudflare published similar findings. This is my open-source implementation of the same idea.
# The Problem
Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window - a 10,000-row spreadsheet? That's all going into context just to sum a column.
# The Solution: One Tool to Rule Them All
Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway. Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:
# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)
Don't know what tools are available? The agent can discover them at runtime:
print(search_tools('image'))  # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool
The individual direct tool calls are all still there, but they can be disabled if using the Agent_Terminal. Try it now - https://www.nymbo.net/nymbot
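For a sense of how a gateway like this can work mechanically, here is a minimal, hypothetical sketch (not the actual Nymbo/Tools code): the model writes Python, the server executes it in a namespace that exposes the tool functions, and only what gets printed flows back into the model's context.

```python
import io
import contextlib

# Stand-in tool; the real server would wire this to an actual MCP tool.
def Web_Search(query: str, max_results: int = 5) -> list:
    """Search the web and return result dicts (placeholder data here)."""
    return [{"title": "example result", "url": "https://example.com", "query": query}][:max_results]

TOOLS = {"Web_Search": Web_Search}

def search_tools(keyword: str) -> list:
    """Find tool names matching a keyword."""
    return [name for name in TOOLS if keyword.lower() in name.lower()]

def usage(name: str) -> str:
    """Return the docstring for a specific tool."""
    return TOOLS[name].__doc__ or ""

def agent_terminal(code: str) -> str:
    """Execute model-written Python in a namespace exposing the tools; return only printed output."""
    namespace = {**TOOLS, "search_tools": search_tools, "usage": usage}
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        exec(code, namespace)  # a real server would sandbox and time-limit this
    return out.getvalue()

# Only the printed text re-enters the model's context, not the raw intermediate data.
print(agent_terminal('print(Web_Search("current price of bitcoin", max_results=3))'))
```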
reacted to sergiopaniego's post 21 days ago
we've just added several example scripts to TRL showing how to train models with GRPO using some of the new OpenEnv environments
train a model to interact with a browser (BrowserGym Env), play Wordle (Wordle Env) and moooore!
TRL (GRPO + vLLM) + OpenEnv!
go play with them: https://github.com/huggingface/trl/tree/main/examples/scripts/openenv
examples list: https://huggingface.co/docs/trl/main/en/example_overview#scripts
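The OpenEnv scripts build on TRL's GRPOTrainer; for orientation, here is a minimal GRPO training sketch along the lines of TRL's documented quickstart, using a toy length-based reward instead of an OpenEnv environment (the model and dataset ids are just examples):

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Example dataset with a "prompt" column, as used in the TRL GRPO docs.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="Qwen2.5-0.5B-GRPO", per_device_train_batch_size=4)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # example model id
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```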
reacted to thecollabagepatch's post 25 days ago
hey musicians
hf continues to make the anti-suno device possible with gary4juce, the VST for your DAW that doesn't try to replace you.
v2 just released. https://thepatch.gumroad.com/l/gary4juce (pay what you want)
now you can use google's magenta-realtime model to generate 48k samples based on your input audio (or other model outputs...there's 4 to play with now).
just duplicate my hf space, turn on an L4/L40s and throw the url into the plugin.
i've got a few finetunes you can switch to as well. or you can push your finetune to the hub and play around.
the space: thecollabagepatch/magenta-retry (you can also use the html web tester to play around with realtime generation on the L40s)
reacted to prithivMLmods's post 28 days ago
Try the all-new trending Qwen-Image-Edit specialized adapter demos, including Photo-to-Anime, Light Restoration, Multi-Angle Edits, Relighting, and more, all in a single Hugging Face Space. Below is the demo link.
- Demo-Space: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
- How-to-Use: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast#2
- Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
To know more about it, visit the app page or the respective model page!
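For readers who want to go beyond the demo Space, the adapters are LoRA weights on top of a Qwen-Image-Edit base model. A rough sketch of how such adapters are typically loaded with diffusers, assuming the QwenImageEditPipeline class and using placeholder repo ids (check the Space's files and the individual model cards for the exact setup):

```python
import torch
from diffusers import QwenImageEditPipeline
from diffusers.utils import load_image

# Placeholder ids: consult the Space / model cards for the exact base model and adapter repos.
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("some-user/photo-to-anime-lora")  # hypothetical LoRA repo id

image = load_image("https://example.com/input.jpg")  # placeholder input image
result = pipe(image=image, prompt="turn this photo into an anime illustration").images[0]
result.save("edited.png")
```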
reacted to jbilcke-hf's post about 1 month ago
I made a code sniping agent to detect when new AI papers with code (and weights) are released, and then automatically create a Gradio demo on Hugging Face.
Here are some examples generated 100% automatically:
https://huggingface.co/collections/jbilcke-hf/sniped
I call this agent CheatCode (https://github.com/jbilcke-hf/CheatCode) because it skips so many steps that it kinda feels like breaking the rules of the AI tech release game.
As with any experimental technology, there is still room for improvement:
- Currently the demos are all generated in one go and not built or tested by the agent itself. A more robust version should loop over the deployed app to fix build/runtime issues.
- There is still a bit of human curation done to avoid making demos for things that can't really be demonstrated on ZeroGPU (e.g. tasks taking several minutes)
- Some papers can actually be showcased in a variety of ways, which isn't really supported (see Demo 2)
reacted to prithivMLmods's post about 2 months ago
Let's have the comparison again with Multimodal OCR3:
nanonets/Nanonets-OCR2-3B vs allenai/olmOCR-2-7B-1025 vs rednote-hilab/dots.ocr vs datalab-to/chandra
Try it here @ prithivMLmods/Multimodal-OCR3
Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
reacted to Nymbo's post about 2 months ago
I've made some improvements to my custom Deep_Research tool in the Nymbo/Tools MCP server. I've added a second LLM process and it still takes less than 1 minute to complete!
The original version of my Deep_Research tool would basically dump up to 50 fetched webpages onto the Researcher model (Qwen3-235B), with only a little bit of context shown from each page.
# New "Filterer" Process
The new process includes another LLM call before the researcher process. The Filterer (also Qwen3-235B) gets the query summary and the original 50 pages with low context, and decides which pages are most relevant to the research topic. The Filterer then outputs the URLs to the relevant pages, which are then re-fetched (with more context) and sent to the Researcher.
# Researcher Context
The Researcher now gets only the relevant webpages, then begins writing the report. When testing with 50 initial results, the researcher would often end up with 10-20 results of relevant context.
It still takes less than a minute to accomplish everything, thanks entirely to Cerebras inference. It now takes about 35-45 seconds to complete once the tool is run.
It's also worth noting that both the Filterer and Researcher are now provided the current time/date before they see the content, reducing hallucinations caused by knowledge cutoffs.
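To make the two-stage flow concrete, here is a minimal sketch of the Filterer-then-Researcher pattern described above. It is not Nymbo's actual implementation; llm, search, and fetch are placeholder stand-ins for the Cerebras-served Qwen3-235B calls and the web tools.

```python
from datetime import datetime, timezone

def llm(system: str, prompt: str) -> str:
    """Placeholder: wire this to the real model (Qwen3-235B via Cerebras in the post)."""
    return "https://example.com" if "URLs" in system else "(research report)"

def search(query: str, max_results: int = 50) -> list:
    """Placeholder web search returning low-context snippets."""
    return [{"url": "https://example.com", "snippet": f"a page about {query}"}][:max_results]

def fetch(url: str, max_chars: int = 8000) -> str:
    """Placeholder page fetch with a larger context budget than the search snippets."""
    return f"(full text of {url})"[:max_chars]

def deep_research(query: str) -> str:
    now = datetime.now(timezone.utc).isoformat()  # both stages see the current date/time
    results = search(query, max_results=50)

    # Stage 1 - Filterer: sees all ~50 low-context snippets, returns only the relevant URLs.
    listing = "\n".join(f"{r['url']}: {r['snippet']}" for r in results)
    urls = llm(
        f"Current date: {now}. Return only the URLs relevant to the topic, one per line.",
        f"Topic: {query}\n\n{listing}",
    ).splitlines()

    # Stage 2 - Researcher: relevant pages are re-fetched with more context, then written up.
    sources = "\n\n".join(fetch(u) for u in urls if u.strip())
    return llm(
        f"Current date: {now}. Write a research report with citations.",
        f"Topic: {query}\n\nSources:\n{sources}",
    )

print(deep_research("state of open-source MCP servers"))
```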
reacted to vikhyatk's post 3 months ago
Just released a preview of Moondream 3!
moondream/moondream3-preview
This is a 9B parameter, 2B active MoE VLM with state of the art visual reasoning capabilities.
More details in the release blog post: https://moondream.ai/blog/moondream-3-preview
reacted to prithivMLmods's post 3 months ago
Comparing: DeepCaption-VLA-7B, built on Qwen2.5-VL-7B-Instruct, is tailored for image captioning and vision-language attribution, focusing on precise, descriptive captions of visual properties, object attributes, and scene details. In contrast, Qwen2.5-VL-7B-Abliterated-Caption-it is fine-tuned for abliterated captioning, generating highly detailed descriptions across diverse visual categories.
Models
- DeepCaption-VLA-7B : prithivMLmods/DeepCaption-VLA-7B
- Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it
Spaces
- VisionScope-R2 : prithivMLmods/VisionScope-R2
- Qwen2.5-VL-Outpost : https://huggingface.co/spaces/prithivMLmods/Qwen2.5-VL-Outpost
Collections
- DeepCaption attr. : prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a
- VL Abliterated-Caption : prithivMLmods/vl-abliterated-caption-68a0443b63182e97a15c47a3
- Multimodal VLMs - Until July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
- Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
GitHub
> DeepCaption-VLA-7B [4bit-notebook demo] : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb
> Qwen2.5-VL-3B-Abliterated-Caption-it(caption) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Qwen2.5-VL-3B-Abliterated-Caption-it(caption)/Qwen2_5_VL_3B_Abliterated_Caption_it.ipynb
The community GPU grant was given by Hugging Face - special thanks to them.
To know more about it, visit the app page or the respective model page!
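Both checkpoints are Qwen2.5-VL fine-tunes, so they should load through the standard Qwen2.5-VL path in transformers. A minimal captioning sketch under that assumption (the image URL and prompt are placeholders; see the model cards and the notebooks above for the recommended settings):

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "prithivMLmods/DeepCaption-VLA-7B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/photo.jpg"},  # placeholder image
    {"type": "text", "text": "Describe the visual properties and attributes of this image."},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
caption = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(caption)
```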
replied to ZennyKenny's post 4 months ago
Just use FHE and move on
reacted to fdaudens's post 5 months ago
You might not have heard of Moonshot AI, but within 24 hours, their new model Kimi K2 shot to the top of Hugging Face's trending leaderboard.
So... who are they, and why does it matter?
Had a lot of fun co-writing this blog post with @xianbao, with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.
A few standout facts:
1. From zero to $3.3B in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.
2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI, still a rare ambition among Chinese AI labs.
3. A trillion-parameter model that's surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active params per inference) and dominates on coding/math benchmarks.
4. The secret weapon: Muon optimizer:
A new training method that doubles efficiency, cuts memory in half, and ran 15.5T tokens with zero failures. Big implications.
Most importantly, their move from closed to open source signals a broader shift in China's AI scene, following Baidu's pivot. But as Yang puts it: "Users are the only real leaderboard."
Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained
reacted to sergiopaniego's post 5 months ago
Updated my HF Space for vibe testing smol VLMs on object detection, visual grounding, keypoint detection & counting!
Compare Qwen2.5 VL 3B vs Moondream 2B side-by-side with annotated images & text outputs.
Try examples or test your own images!
Space: sergiopaniego/vlm_object_understanding
reacted to AdinaY's post 5 months ago
The Chinese Open Source Heatmap is live!
You can now track the companies / research labs / communities powering China's open source AI movement.
zh-ai-community/model-release-heatmap-zh
Some highlights:
Giant tech companies are investing more in open source.
- Alibaba: Full stack open ecosystem
- Tencent: Hunyuan image/video/3D
- Bytedance: Catching up fast in 2025
- Baidu: New player in open LLM
New players emerging after the DeepSeek moment.
- Xiaomi
- Red Note
- Bilibili
- MiniMax
- Moonshot AI
The startup list is shifting fast! Those who find a direction aligned with their strengths are the ones who endure.
- DeepSeek
- MiniMax
- StepFun
- Moonshot AI
- Zhipu AI
- OpenBMB
Research labs & communities are making key contributions.
- BAAI
- Shanghai AI Lab
- OpenMOSS
- MAP
I think it's nice that ZeroGPU provides so much to so many people across the Hugging Face community, letting them build and use things that would otherwise be unavailable or behind unfair barriers to entry, at no cost to the average user.
ZeroGPU has max execution time parameters with sensible defaults and ranges, so it should not take too much quota even if left unchecked. Don't forget you will get more quota in no time at all.
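For reference, the execution-time cap is set per function with the spaces decorator on ZeroGPU Spaces; a small illustrative sketch (the pipeline and model id are just examples):

```python
import spaces
import torch
from diffusers import DiffusionPipeline

# Example workload; any GPU-bound function can be wrapped the same way.
pipe = DiffusionPipeline.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16).to("cuda")

@spaces.GPU(duration=60)  # cap this call at 60 seconds of GPU time; shorter caps consume less quota
def generate(prompt: str):
    return pipe(prompt, num_inference_steps=2, guidance_scale=0.0).images[0]
```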
reacted to merve's post 6 months ago
y'all have been asking my opinion on how OCR models compare to each other
I will leave three apps to compare the newest models by @prithivMLmods instead:
> compare Nanonets-OCR-s, Qwen2-VL-OCR-2B-Instruct, RolmOCR, Aya-Vision: prithivMLmods/Multimodal-OCR
> SmolDocling, Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B: prithivMLmods/Multimodal-OCR2
> docscopeOCR, MonkeyOCR, coreOCR: prithivMLmods/core-OCR
reacted to cbensimon's post 6 months ago
ZeroGPU now supports PyTorch native quantization via torchao.
While it hasn't been battle-tested yet, Int8WeightOnlyConfig is already working flawlessly in our tests.
Let us know if you run into any issues, and we're excited to see what the community will build!
import spaces
from diffusers import FluxPipeline
from torchao.quantization.quant_api import Int8WeightOnlyConfig, quantize_
pipeline = FluxPipeline.from_pretrained(...).to('cuda')
quantize_(pipeline.transformer, Int8WeightOnlyConfig()) # Or any other component(s)
@spaces.GPU
def generate(prompt: str):
    return pipeline(prompt).images[0]
reacted to clem's post 7 months ago
It's just become easier to share your apps on the biggest AI app store (aka HF Spaces) for unlimited storage, more visibility and community interactions.
Just pick a React, Svelte, or Vue template when you create your Space, or add app_build_command: npm run build and app_file: build/index.html in your README's YAML block.
Or follow this link: https://huggingface.co/new-space?sdk=static
Let's build!
Time to podman my songs
reacted to Yehor's post 8 months ago
Convert your audio data to Parquet/DuckDB files with blazingly fast speeds!
Repository with pre-built binaries: https://github.com/crs-org/audios-to-dataset