I would be incredibly thankful if you could quant these: JetBrains/Mellum-4b-dpo-python and JetBrains/Mellum-4b-dpo-all
Hello there, I am currently tinkering on my own little RAG solution for internal coding help,
but I am trying (more like I must) to accomplish this with the minimal resources I have.
So yes, for the autocomplete part of this whole shebang I really need quantized 4B GGUFs.. yes I know... (IQ4_XS),
as I am incredibly hardware-limited. Here are the full links to the models in question:
https://huggingface.co/JetBrains/Mellum-4b-dpo-python
https://huggingface.co/JetBrains/Mellum-4b-dpo-all
I really REALLY appreciate all the work you guys are doing, thank you from the bottom of my heart. <3
(P.S.: I am also open to other autocomplete suggestions, mostly Python / VBScript. Again, thank you!)
I'm glad we were able to help you with your project. I was a bit skeptical, as both models you requested are of the architecture LlamaForCausalLM, which is often specified when a custom architecture incompatible with llama.cpp is used, so I didn't want to give you false hope by informing you when I queued them 3 hours ago. It's so nice that they both completed successfully. Regarding size, i1-IQ4_XS is probably the lowest you can go without sacrificing quality on a 4B model.
I know, thank you. With context I have only about 4 GB of VRAM available, so it is the only one that fits.
As far as I know it is a custom adaptation of StarCoder that JetBrains used there, but so far it has given the """best""" kind of answers.
Sadly, I am struggling with the autocompletion: the FIM ("fill in the middle") part of code completion is very fiddly, and right now you need luck for the AI to come up with something useful.
No wonder, when you can only give it a bit of context from above and below the part of the code you are currently working on.
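In case it helps anyone picture the problem, this is roughly the shape of the FIM call I am experimenting with. It is only a sketch: the special token strings are an assumption based on the StarCoder lineage (check Mellum's tokenizer config for the exact ones), and the file name is just a placeholder.

```python
# Minimal FIM sketch with llama-cpp-python.
# Assumption: StarCoder-style FIM tokens -- verify against Mellum's tokenizer.
from llama_cpp import Llama

llm = Llama(model_path="Mellum-4b-dpo-python.i1-IQ4_XS.gguf", n_ctx=4096)

prefix = "def load_settings(path):\n    with open(path) as f:\n"
suffix = "\n    return settings\n"

prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

out = llm(
    prompt,
    max_tokens=128,
    temperature=0.2,  # low temperature keeps the completion focused
    stop=["<fim_prefix>", "<fim_suffix>", "<fim_middle>"],
)
print(out["choices"][0]["text"])  # the text that belongs between prefix and suffix
```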
Well, I am looking into it further.
The RAG itself is running: automatic periodic ingestion of our coders' local Git repositories, running an embedding model, a reranker, FAISS, and some other stuff over them, and filling a local LanceDB with the vectorized data.
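Stripped down to its bones, the ingestion side looks something like this. It is a sketch, not my actual code: the embed() placeholder stands in for the llama.cpp embedding model, and the paths and table names are made up.

```python
# Chunk files from the cloned repos, embed each chunk, write the vectors to LanceDB.
import hashlib
import pathlib
import lancedb

def embed(text: str) -> list[float]:
    # Placeholder embedding so the sketch runs end to end --
    # replace with a call to the real llama.cpp embedding model.
    return [float(b) for b in hashlib.sha256(text.encode()).digest()]

def chunks(text: str, size: int = 1200, overlap: int = 200):
    # Simple fixed-size character chunks with a little overlap between them.
    step = size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + size]

db = lancedb.connect("./lancedb")
rows = []
for path in pathlib.Path("./repos").rglob("*.py"):
    source = path.read_text(errors="ignore")
    for piece in chunks(source):
        rows.append({"vector": embed(piece), "text": piece, "file": str(path)})

db.create_table("code_chunks", data=rows, mode="overwrite")
```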
When a question is sent to the application, the input gets vectorized (llama.cpp running an embedding model), plus some other magic, then LanceDB is queried for the data that best fits the question and code, and everything is handed as a whole to...
... a KoboldCPP instance, currently running a version of Qwen Coder, which gets the question and the matching RAG data and generates an answer.
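The query path, reduced to the essentials, is roughly this. Again just a sketch: the URLs and ports are what I would expect from a default llama-server with embeddings enabled and a stock KoboldCPP, and the response shapes can differ between versions, so treat them as assumptions to verify.

```python
# Embed the question, pull the best-matching chunks from LanceDB,
# then hand everything to KoboldCPP for the actual answer.
import requests
import lancedb

EMBED_URL = "http://localhost:8080/v1/embeddings"      # llama-server with embeddings (assumed)
KOBOLD_URL = "http://localhost:5001/api/v1/generate"   # KoboldCPP default port (assumed)

def embed(text: str) -> list[float]:
    # OpenAI-compatible embeddings endpoint; verify the response shape on your version.
    r = requests.post(EMBED_URL, json={"input": text})
    return r.json()["data"][0]["embedding"]

def answer(question: str, k: int = 5) -> str:
    table = lancedb.connect("./lancedb").open_table("code_chunks")
    hits = table.search(embed(question)).limit(k).to_list()
    context = "\n\n".join(hit["text"] for hit in hits)
    prompt = f"Context from our code base:\n{context}\n\nQuestion: {question}\nAnswer:"
    r = requests.post(KOBOLD_URL, json={"prompt": prompt, "max_length": 400})
    return r.json()["results"][0]["text"]

print(answer("How do we parse the config files in project X?"))
```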
Right now the results are really not bad for the nearly lobotomized models I have to use because of the hardware restrictions ^^.
For the next part I want to automatically ingest Drupal Wiki entries for each coding project (this... is a problem in itself... you only get, let's say, horribly HTML-tag-infested raw text back, which makes it a bit hard to parse...) and somehow link them to the cloned local Git repositories, to build an even better database for the questions that come up once the whole thing is vectorized.
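For the HTML mess, my current plan is simply to strip it down to plain text before chunking, something like the sketch below. The tag list is a guess; I still have to look at what the wiki actually returns.

```python
# Strip the tag-infested wiki HTML down to plain text before embedding.
from bs4 import BeautifulSoup

def wiki_html_to_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()  # drop non-content elements entirely
    # newline separator keeps headings and paragraphs from running together
    return soup.get_text(separator="\n", strip=True)
```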
Well.. baby steps :)
Thank you again!
If you have any suggestions for FIM models without thinking (curse you, Qwen3, what obscure things I tried to make you stop thinking! URGH), I am all ears ^^
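(For reference, the only switches I found documented for Qwen3 are the "/no_think" soft switch in the prompt and the enable_thinking=False flag in the chat template, roughly as below; with GGUF backends like KoboldCPP, only the soft switch really applies.)

```python
# Qwen3 "stop thinking" switches, as documented on the Qwen3 model cards:
# a chat-template flag when you render the template yourself, or "/no_think" in the prompt.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

messages = [{"role": "user", "content": "Complete this function for me. /no_think"}]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # hard switch; only works where you control the template rendering
)
print(prompt)
```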
I wish you guys a grand day!