package/code mismatch

#1
by SmerkyG - opened

After pip install retention, I still get errors running the Brumby model.

modeling_brumby.py", line 43, in <module>
    raise ImportError("Retention is required by the Brumby model. Please install it with `pip install retention`.")
ImportError: Retention is required by the Brumby model. Please install it with `pip install retention`.
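
For context, the import guard in modeling_brumby.py presumably looks something like the sketch below (reconstructed from the tracebacks in this thread, not the verbatim source). If so, it would explain the misleading message: any ImportError from the submodule import gets re-raised as "please install retention", even when retention is installed but missing retention.triton.

try:
    from retention.triton import power_retention, power_retention_inference
except ImportError:
    # swallows the real error (here: a missing retention.triton submodule)
    raise ImportError(
        "Retention is required by the Brumby model. "
        "Please install it with `pip install retention`."
    )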

Here's an indication of why:

>>> from retention.triton import power_retention, power_retention_inference
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'retention.triton'
>>> import retention
>>> from retention._utils import compute_expanded_dim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'retention._utils'

The retention module itself exists, but the retention.triton and retention._utils submodules do not.

manifest ai org

What version of retention is this? Try pip install retention==1.0.5
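
You can check what actually got installed with something like this; the version string and install path should show whether pip picked up an old release:

>>> import importlib.metadata, retention
>>> importlib.metadata.version('retention')
>>> retention.__file__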

Apparently it was a problem with Python < 3.12. After upgrading, there remains a trickier problem: vidrial does not appear to be compatible with AMD devices. Specifically, it fails in get_cuda_device_properties.
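
(For anyone else hitting this: on an older interpreter, pip can end up resolving an older retention release that lacks those submodules. A quick guard, assuming the >=3.12 requirement observed above:)

>>> import sys
>>> assert sys.version_info >= (3, 12), sys.version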

manifest ai org

Yeah, unfortunately vidrial only works with CUDA. The good news is that the triton kernel shouldn't need it; I'll put up a new version of retention that doesn't require vidrial.

Sounds great! (Btw, CUDA code can generally be converted for AMD, but I get why that could be a whole lot trickier with something like vidrial!)

I was planning to run RULER and other context usage tests on the model, since I didn't see those in the README. Unless you already have those results handy? Seems important, since the promise of power retention is tied so much to that aspect.

manifest ai org

Just pushed retention 1.0.6, which doesn't require vidrial. Let me know if it resolves the issue :-)

The model is not yet long-context finetuned, so I think it would achieve the same performance as Qwen3-14B-base on RULER. You are exactly right; we are doing more long-context finetuning to actually realize the advantage of retention.

Gotcha, will be interesting to see once finetuned. What ctxlen was training originally done at?

Even after upgrading to retention 1.0.6, I am unfortunately still getting calls into vidrial via the import chain:

  File "/home/rwkv/.cache/huggingface/modules/transformers_modules/manifestai/Brumby_hyphen_14B_hyphen_Base/abd96ca97105fe9fc0b522f2353ec28b59143c24/modeling_brumby.py", line 40, in <module>
    from retention.triton import power_retention, power_retention_inference
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/__init__.py", line 1, in <module>
    from retention.triton import power_retention, power_retention_inference
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/triton.py", line 4, in <module>
    from retention._update_state.triton import update_state
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/_update_state/__init__.py", line 8, in <module>
    from retention._update_state.vidrial import update_state as update_state_vidrial, default_D as default_D
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/_update_state/vidrial.py", line 7, in <module>
    from vidrial.kernels.sympow_mma.op import op as sympow_mma
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/vidrial/kernels/sympow_mma/op.py", line 2, in <module>
    from vidrial.kernels.sympow.interface import interface_reference as sympow_reference
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/vidrial/kernels/sympow/interface.py", line 1, in <module>
    from vidrial.kernels.sympow.op import op, op_reference
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/vidrial/kernels/sympow/op.py", line 5, in <module>
    from vidrial.kernels.sympow.binding import binding_autotuned
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/vidrial/kernels/sympow/binding.py", line 13, in <module>
    from vidrial.kernels.mma_configurator import SMEM_LIMIT, dtype_to_bytes
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/vidrial/kernels/mma_configurator.py", line 15, in <module>
    device_props = get_cuda_device_basic_props()[0]
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/vidrial/py_utils/gpu.py", line 103, in get_cuda_device_properties
    raise OSError("could not load any of: " + ' '.join(libnames))
OSError: could not load any of: libcuda.so libcuda.dylib nvcuda.dll cuda.dll
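
For reference, this presumably fails on AMD because ROCm builds of PyTorch expose the torch.cuda API (HIP masquerades as CUDA) while no libcuda.so exists to dlopen, so a ctypes probe like vidrial's raises OSError. A quick way to see this:

>>> import ctypes, torch
>>> torch.cuda.is_available()  # True on ROCm builds too
>>> torch.version.hip          # a version string on ROCm, None on CUDA builds
>>> ctypes.CDLL('libcuda.so')  # raises OSError on AMD machines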

manifest ai org

It was trained with 8k context len, using Nemotron's pretraining data.

It looks like vidrial is installed on your system, in which case retention will try to import it. The error should go away if you uninstall it (e.g. pip uninstall vidrial); it's not compatible anyway.

Hmm, basically the same problem once I uninstall vidrial:

Traceback (most recent call last):
  File "/home/rwkv/.cache/huggingface/modules/transformers_modules/manifestai/Brumby_hyphen_14B_hyphen_Base/abd96ca97105fe9fc0b522f2353ec28b59143c24/modeling_brumby.py", line 40, in <module>
    from retention.triton import power_retention, power_retention_inference
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/__init__.py", line 1, in <module>
    from retention.triton import power_retention, power_retention_inference
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/triton.py", line 4, in <module>
    from retention._update_state.triton import update_state
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/_update_state/__init__.py", line 8, in <module>
    from retention._update_state.vidrial import update_state as update_state_vidrial, default_D as default_D
  File "/home/rwkv/miniconda3/lib/python3.13/site-packages/retention/_update_state/vidrial.py", line 5, in <module>
    from vidrial.py_utils.common import default_d_tile
ModuleNotFoundError: No module named 'vidrial.py_utils'

manifest ai org

There was another bug on our end, introduced while disabling vidrial. Hopefully the 2nd time is the charm: try pip install retention==1.0.7 --upgrade
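
(For the curious, the fix is essentially to make the vidrial backend optional at import time; a simplified sketch of the idea, not the actual 1.0.7 code:)

try:
    from retention._update_state.vidrial import update_state as update_state_vidrial
    HAS_VIDRIAL = True
except ImportError:
    # vidrial is a CUDA-only backend; fall back to the triton implementation
    update_state_vidrial = None
    HAS_VIDRIAL = False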

manifest@us-central-3-002:~/sean$ uv venv --seed
Using CPython 3.11.14
Creating virtual environment with seed packages at: .venv
 + pip==25.3
 + setuptools==80.9.0
 + wheel==0.45.1
Activate with: source .venv/bin/activate
manifest@us-central-3-002:~/sean$ source .venv/bin/activate
(sean) manifest@us-central-3-002:~/sean$ uv pip install transformers retention==1.0.7 --upgrade
(sean) manifest@us-central-3-002:~/sean$ uv pip list | grep vidrial
(sean) manifest@us-central-3-002:~/sean$ python
Python 3.11.14 (main, Oct 14 2025, 21:26:53) [Clang 20.1.4 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoModelForCausalLM
>>> AutoModelForCausalLM.from_pretrained('manifestai/Brumby-14B-Base', trust_remote_code=True)
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 8/8 [00:00<00:00, 129.55it/s]
BrumbyForCausalLM(
  (model): BrumbyModel(
    (embed_tokens): Embedding(151936, 5120)
    (layers): ModuleList(
      (0-39): 40 x BrumbyDecoderLayer(
        (self_attn): BrumbyAttention(
          (q_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear(in_features=5120, out_features=1024, bias=False)
          (v_proj): Linear(in_features=5120, out_features=1024, bias=False)
          (g_proj): Linear(in_features=5120, out_features=8, bias=False)
          (o_proj): Linear(in_features=5120, out_features=5120, bias=False)
          (q_norm): BrumbyRMSNorm((128,), eps=1e-06)
          (k_norm): BrumbyRMSNorm((128,), eps=1e-06)
        )
        (mlp): BrumbyMLP(
          (gate_proj): Linear(in_features=5120, out_features=17408, bias=False)
          (up_proj): Linear(in_features=5120, out_features=17408, bias=False)
          (down_proj): Linear(in_features=17408, out_features=5120, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): BrumbyRMSNorm((5120,), eps=1e-06)
        (post_attention_layernorm): BrumbyRMSNorm((5120,), eps=1e-06)
      )
    )
    (norm): BrumbyRMSNorm((5120,), eps=1e-06)
    (rotary_emb): BrumbyRotaryEmbedding()
  )
  (lm_head): Linear(in_features=5120, out_features=151936, bias=False)
)
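
For a fuller smoke test, a minimal generation sketch along the same lines (device_map='auto' assumes accelerate is installed; the prompt is arbitrary):

>>> from transformers import AutoTokenizer
>>> tok = AutoTokenizer.from_pretrained('manifestai/Brumby-14B-Base', trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained('manifestai/Brumby-14B-Base', trust_remote_code=True, device_map='auto')
>>> ids = tok('The capital of France is', return_tensors='pt').to(model.device)
>>> out = model.generate(**ids, max_new_tokens=8)
>>> print(tok.decode(out[0], skip_special_tokens=True))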

Thanks for the quick turnaround, it runs now!

Just a heads up, I'd consider checking ctxlen evals going forward: I ran a quick passkey test, and for me the model begins to fail at around 2k ctxlen even in the easiest setting.
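
In case it's useful, here's a sketch of the kind of quick passkey check I mean (my own minimal construction, not a standard harness): hide a random key between two blocks of filler, roughly 2k tokens total, and see whether greedy decoding returns it.

import random
from transformers import AutoModelForCausalLM, AutoTokenizer

name = 'manifestai/Brumby-14B-Base'
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True, device_map='auto')

key = str(random.randint(10000, 99999))
filler = 'The grass is green. The sky is blue. ' * 100  # roughly 1k tokens per block
prompt = filler + '\nThe pass key is ' + key + '. Remember it.\n' + filler + '\nWhat is the pass key? The pass key is'
ids = tok(prompt, return_tensors='pt').to(model.device)
out = model.generate(**ids, max_new_tokens=8, do_sample=False)  # greedy decoding
answer = tok.decode(out[0][ids['input_ids'].shape[1]:], skip_special_tokens=True)
print('expected:', key, '| got:', answer.strip())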

SmerkyG changed discussion status to closed
SmerkyG changed discussion status to open
SmerkyG changed discussion status to closed
