Image understanding error

#7
by dreamsmaner - opened

Model: mradermacher/Huihui-gemma-4-12B-it-abliterated-i1-GGUF@IQ4_XS (https://huggingface.co/mradermacher/Huihui-gemma-4-12B-it-abliterated-i1-GGUF) with Huihui-gemma-4-12B-it-abliterated.mmproj-f16.gguf from (https://huggingface.co/mradermacher/Huihui-gemma-4-12B-it-abliterated-GGUF).
Windows 11
2060/6GB VRAM + 32GB RAM
LM Studio 0.4.16 (build 1) with CUDA 12 llama.cpp (Windows) v2.20.1 -> OpenAI-Compatible API

Msty Sutdio loads the API and chat. Result as below(chat is fine but image understanding is not):

@ User:

  • Who are you? What can you do?

@ huihui-gemma-4-12b-it-abliterated-i1:

  • I am Gemma 4, a large language model developed by Google DeepMind. I am an open weights model designed to understand and process information across different modalities to assist you with a wide range of tasks.
    What I can do:
    Text Processing: I can generate creative content (like stories, poems, and emails), summarize long documents, translate languages, and answer complex questions.
    Image Understanding: I can "see" and interpret images to describe what is happening in them, identify objects, or explain the relationship between different elements in a visual scene.
    Audio Processing: As part of the Gemma 4 family, I can process audio input (specifically within the 2B and 4B versions of my model family).
    Reasoning and Problem Solving: I can help with logical reasoning, mathematical problems, and coding tasks across various programming languages.
    Knowledge Retrieval: My knowledge includes a vast amount of information up until my cutoff date of January 2025, allowing me to provide context on history, science, art, and more.
    Key Modalities:
    Input: I can take in text, images, and audio (depending on the specific model version).
    Output: I generate text only. While I can understand images and audio, I do not generate images or sounds myself.
    I am designed to be versatile—whether you need help brainstorming a new project, explaining a difficult concept, or drafting a professional report, I am here to assist!

@ User:

  • Can you understand this image?
    image1

@ huihui-gemma-4-12b-it-abliterated-i1:

  • thoughting
    ardthought rant1 3libbin ract2or****** rantogram** 1texogram48 rat<|channel>1thianthought ******** artist1ton rato7user** artistólogohellothought rant)*ior rot

Try using a higher-bit quantized version.

Tried Q4_K_M. Sent an image and the model crashed in LM Studio:

2026-06-08 21:19:52 [DEBUG]
Received request: POST to /v1/chat/completions with body [Hidden]
2026-06-08 21:19:53 [DEBUG]
LlamaV4::load called with model path: D:@llm-models\mradermacher\Huihui-gemma-4-12B-it-abliterated-i1-GGUF\Huihui-gemma-4-12B-it-abliterated.i1-Q4_K_M.gguf
LlamaV4::load config: n_parallel=1 n_ctx=131072 kv_unified=true
2026-06-08 21:19:53 [DEBUG]
gemma4 architecture detected, using AUTO flash attention mode
0.00.096.447 I srv load_model: loading model 'D:@llm-models\mradermacher\Huihui-gemma-4-12B-it-abliterated-i1-GGUF\Huihui-gemma-4-12B-it-abliterated.i1-Q4_K_M.gguf'
2026-06-08 21:19:53 [DEBUG]
0.00.890.049 W load: control-looking token: 50 '<|tool_response>' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-08 21:19:53 [DEBUG]
0.00.890.630 W load: control-looking token: 212 '' was not control-type; this is probably a bug in the model. its type will be overridden
2026-06-08 21:19:54 [DEBUG]
0.00.950.372 W load: special_eog_ids contains '<|tool_response>', removing '' token from EOG list
2026-06-08 21:19:58 [DEBUG]
0.05.614.160 W common_init_from_params: KV cache shifting is not supported for this context, disabling KV cache shifting
0.05.614.177 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
2026-06-08 21:19:59 [DEBUG]
0.06.110.968 W init_audio: audio input is in experimental stage and may have reduced quality:
https://github.com/ggml-org/llama.cpp/discussions/13759
0.06.110.985 I srv load_model: loaded multimodal model, 'D:/@llm-models/mradermacher/Huihui-gemma-4-12B-it-abliterated-i1-GGUF/Huihui-gemma-4-12B-it-abliterated.mmproj-f16.gguf'
0.06.110.992 I srv load_model: initializing slots, n_slots = 1
2026-06-08 21:19:59 [DEBUG]
0.06.281.561 W common_speculative_init: no implementations specified for speculative decoding
0.06.281.568 I slot load_model: id 0 | task -1 | new slot, n_ctx = 131072
0.06.281.609 I srv load_model: prompt cache is enabled, size limit: 8192 MiB
0.06.281.611 I srv load_model: use --cache-ram 0 to disable the prompt cache
0.06.281.613 I srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
0.06.281.613 I srv load_model: context checkpoints enabled, max = 32, min spacing = 256
0.06.281.652 I srv init: idle slots will be saved to prompt cache and cleared upon starting a new task
2026-06-08 21:19:59 [DEBUG]
0.06.284.378 I init: chat template, example_format: 'You are a helpful assistantHelloHi thereHow are you?'
2026-06-08 21:19:59 [INFO]
[huihui-gemma-4-12b-it-abliterated-i1@q4_k_m] Running chat completion on conversation with 4 messages.
2026-06-08 21:19:59 [INFO]
[huihui-gemma-4-12b-it-abliterated-i1@q4_k_m] Streaming response...
2026-06-08 21:19:59 [DEBUG]
0.06.285.664 I srv init: init: chat template, thinking = 0
0.06.286.030 I srv update_slots: all slots are idle
2026-06-08 21:19:59 [DEBUG]
LlamaV4::predict slot selection: session_id= server-selected (LCP/LRU)
2026-06-08 21:19:59 [DEBUG]
0.06.723.049 I slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
0.06.723.061 I srv get_availabl: updating prompt cache
0.06.723.075 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
0.06.723.081 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 131072 tokens, 8589934592 est)
0.06.723.085 I srv get_availabl: prompt cache update took 0.02 ms
0.06.723.149 I slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
0.06.723.172 W slot update_slots: id 0 | task 0 | cache reuse is not supported - ignoring n_cache_reuse = 256
2026-06-08 21:19:59 [INFO]
[huihui-gemma-4-12b-it-abliterated-i1@q4_k_m] Prompt processing progress: 0.0%
2026-06-08 21:20:00 [DEBUG]
0.07.286.790 I srv process_chun: processing image...
2026-06-08 21:20:00 [INFO]
[huihui-gemma-4-12b-it-abliterated-i1@q4_k_m] Prompt processing progress: 23.4%
2026-06-08 21:20:00 [DEBUG]
llm-engine\llama.cpp\src\llama-context.cpp:1706: GGML_ASSERT((cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens") failed
2026-06-08 21:20:02 [ERROR]
[huihui-gemma-4-12b-it-abliterated-i1@q4_k_m] The model has crashed without additional information. (Exit code: 18446744072635812000). Error Data: n/a, Additional Data: n/a

dreamsmaner changed discussion status to closed

Update:
After increasing "Evaluation Batch Size" to >= 1024 the model doesn't crash. But the image reading output is still incoherent just like the issue I described.

dreamsmaner changed discussion status to open

Sign up or log in to comment