AI & ML interests
None yet
Recent Activity
reacted to robtacconelli's post with 🤯 1 day ago
🧬 Midicoth: diffusion-based lossless compression with no neural net, no GPU, and no training data
What if reverse diffusion could compress text without a neural network?
Midicoth brings score-based denoising into classical compression. It treats prior smoothing as forward noise and reverses it with Tweedie's formula on a binary tree: 3 denoising steps with James-Stein shrinkage, applied after all model blending. ~2,000 lines of C, single CPU core.
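For readers unfamiliar with the two statistical tools named above, the sketch below shows their textbook forms in a Gaussian toy setting. It is not Midicoth's code: the function names, the Gaussian prior, and the parameter values are assumptions chosen for illustration, and the post does not say how these formulas are adapted to probabilities on a binary tree.

```python
def tweedie_posterior_mean(x, sigma2, score):
    """Tweedie's formula: E[theta | x] = x + sigma^2 * d/dx log p(x)."""
    return x + sigma2 * score(x)


def gaussian_marginal_score(tau2, sigma2):
    """Score of the marginal N(0, tau2 + sigma2) when theta ~ N(0, tau2)."""
    return lambda x: -x / (tau2 + sigma2)


def james_stein_positive_part(xs, sigma2):
    """Positive-part James-Stein estimate, shrinking a vector toward zero."""
    d = len(xs)
    norm2 = sum(v * v for v in xs)
    if d < 3 or norm2 == 0.0:
        return list(xs)  # the estimator needs d >= 3 and a nonzero norm
    factor = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return [factor * v for v in xs]


# Hypothetical toy values: prior variance 4, noise variance 1, observation 2.5.
tau2, sigma2, x = 4.0, 1.0, 2.5
denoised = tweedie_posterior_mean(x, sigma2, gaussian_marginal_score(tau2, sigma2))
closed_form = x * tau2 / (tau2 + sigma2)  # exact Gaussian posterior mean
```

In this toy case the Tweedie estimate and the closed-form posterior mean agree, which is the property a denoising step relies on: the score of the noisy marginal is enough to recover the expected clean value.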
Beats every dictionary compressor we tested:
enwik8 (100 MB): 1.753 bpb (-11.9% vs xz, -15% vs Brotli, -24.5% vs bzip2)
alice29.txt: 2.119 bpb (-16.9% vs xz)
Outperforms xz, zstd, Brotli, bzip2, and gzip on every input tested
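To make the bpb figures concrete, the short sketch below converts a bits-per-byte value into a compressed size and back-computes a baseline from a quoted percentage. It assumes the percentages are relative to the baseline compressor's output size, which the post does not state; the helper names are hypothetical.

```python
def compressed_bytes(bpb, input_bytes):
    """Compressed size in bytes implied by a bits-per-byte figure."""
    return bpb * input_bytes / 8


def implied_baseline_bpb(bpb, relative_change):
    """Baseline bpb, assuming relative_change (e.g. -0.119) is measured
    against the baseline's compressed size."""
    return bpb / (1 + relative_change)


enwik8_bytes = 100_000_000  # enwik8 is the first 10^8 bytes of an English Wikipedia dump
size = compressed_bytes(1.753, enwik8_bytes)
xz_bpb = implied_baseline_bpb(1.753, -0.119)
print(f"{size / 1e6:.2f} MB, implied xz ~ {xz_bpb:.3f} bpb")
```

Under that assumption, 1.753 bpb on enwik8 corresponds to about 21.9 MB of output, and the xz figure it is compared against works out to roughly 1.99 bpb.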
PAQ/CMIX still win with hundreds of models + LSTMs. LLM compressors win with pre-trained knowledge. Midicoth narrows the gap with pure statistics: no mixer, no gradient descent, just counting.
The Tweedie denoising layer improves compression by 2.3–2.7% on every file tested, making it the most consistent component in the ablation. Adding SSE or logistic mixers made things worse. In the online setting, count-based estimation beat gradient-based mixing.
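As a rough illustration of what "just counting" can mean in an online setting, here is a minimal sketch of a count-based adaptive bit predictor and the ideal code length it yields. This is not Midicoth's model: the single-context design, the function name, and the default `alpha=0.5` (the Krichevsky-Trofimov estimator) are assumptions made for the example.

```python
import math


def adaptive_code_length(bits, alpha=0.5):
    """Ideal code length in bits when each bit is coded with a smoothed,
    count-based probability estimate that adapts as the stream is read."""
    counts = [0, 0]
    total_bits = 0.0
    for b in bits:
        p = (counts[b] + alpha) / (counts[0] + counts[1] + 2 * alpha)
        total_bits += -math.log2(p)
        counts[b] += 1  # update only after coding, so a decoder can mirror it
    return total_bits


skewed = [0] * 95 + [1] * 5  # hypothetical, heavily biased bit stream
cost = adaptive_code_length(skewed)
print(f"{cost:.1f} bits for {len(skewed)} input bits")
```

The estimator has no learning rate and no gradient step: the only state is two counters, and the smoothing constant `alpha` plays the role of the prior that pulls early estimates toward 1/2.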
No external dependencies. Fully deterministic. Bit-exact encode/decode. ~60 KB/s throughput.
Code: https://github.com/robtacconelli/midicoth
Paper: https://huggingface.co/papers/2603.08771
Space: https://huggingface.co/spaces/robtacconelli/midicoth
If you ever wondered whether diffusion ideas belong in data compression, here's proof they do. ⭐ appreciated!
• Updated • 77.6M • 133
• Updated • 48.9k • 11
cnmoro/PromptSearchTermsDecomposition • Updated • 50k • 8 • 1
cnmoro/reasoning-v1-20m-portuguese • Updated • 20.9M • 214 • 13
cnmoro/smoltalk-555k-ptbr • Updated • 556k • 25 • 3
cnmoro/LogicReasoningEnglishPortuguese • Updated • 10.5k • 11 • 3
cnmoro/LegalAlpacaReasoningRag-Qwen • Updated • 3.04k • 6 • 1
cnmoro/LegalAlpacaReasoningRag • Updated • 3.04k • 8 • 2
cnmoro/DocumentPassageRanking • Updated • 2.09M • 219 • 2
cnmoro/QuestionClassification-v2 • Updated • 129k • 24 • 2
cnmoro/AllTripletsMsMarco-PTBR • Updated • 26.4M • 149 • 7
cnmoro/RagMixPTBR-Legal-Alpaca-2M • Updated • 2.09M • 89 • 8
cnmoro/GPT4-500k-Augmented-PTBR-Clean • Updated • 566k • 50 • 9
cnmoro/QuestionClassification • Updated • 129k • 45 • 1
cnmoro/TextSimplification-PTBR-330k • Updated • 330k • 7 • 2
cnmoro/WizardVicuna-PTBR-Instruct-Clean • Updated • 204k • 12 • 9
cnmoro/Text_Structuring_SOLAR_10.7B_Distilled_Smaller • Updated • 541k • 13 • 2
• Updated • 2.82M • 13 • 5
cnmoro/Text_Structuring_SOLAR_10.7B_Distilled • Updated • 333k • 7 • 3
cnmoro/EXL2_Calibration_Dataset_EN_PTBR • Updated • 100k • 8 • 3
cnmoro/Instruct-PTBR-ENUS-11M • Updated • 2.69M • 23 • 12