Config files are broken
"Error(s) in loading state_dict for SAMAudioJudgeModel:
Missing key(s) in state_dict: "data_proj.weight", "data_proj.bias", "audio_codec.encoder.block.0.bias", ... and some more lines"
Issue: Unable to run SAM-Audio locally due to missing torch.nn.attention.flex_attention despite full dependency installation
I’m trying to run SAM-Audio locally using the public GitHub repo (facebookresearch/sam-audio) and the Hugging Face checkpoint facebook/sam-audio-large, but I’m blocked by a PyTorch API issue. My environment is Linux (no sudo, no conda) with Python 3.11.13 via pyenv. pip install . completed successfully. I also installed perception-models (which resolves core.audio_visual_encoder), fixed the FFmpeg/torchcodec issues, and got past all earlier dependency errors.
The import now gets further: from sam_audio import SAMAudio, SAMAudioProcessor runs until it fails with ModuleNotFoundError: No module named 'torch.nn.attention.flex_attention'. I verified that flex_attention.py exists in the PyTorch GitHub source tree (e.g., v2.9.1), but I could not import it from any of the public PyTorch wheels I tested (2.2.x, 2.4/2.5, 2.9.1). This suggests that SAM-Audio depends on an internal or experimental PyTorch API that released builds do not expose. Is SAM-Audio intended to require a PyTorch nightly or a Meta-internal build? Is there a supported public PyTorch version, or a workaround, for running it end-to-end?
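For anyone trying to reproduce this, a small diagnostic along the following lines may help confirm which interpreter and which torch installation are actually in use when the import fails. This is only a sketch: probe_module is a hypothetical helper name introduced here (it is not part of sam-audio or PyTorch), and it relies solely on the standard library's importlib. It reports whether a module imports, where it was loaded from, and the error if it did not.

```python
import importlib
import sys


def probe_module(name: str) -> dict:
    """Try to import `name` and report the outcome without raising.

    Hypothetical helper for illustration; not part of sam-audio or PyTorch.
    """
    report = {"module": name, "importable": False, "file": None, "error": None}
    try:
        mod = importlib.import_module(name)
    except Exception as exc:  # ImportError, or any error raised at import time
        report["error"] = f"{type(exc).__name__}: {exc}"
    else:
        report["importable"] = True
        report["file"] = getattr(mod, "__file__", None)
    return report


if __name__ == "__main__":
    # Shows which Python is running, useful with pyenv and several environments.
    print("python executable:", sys.executable)

    torch_report = probe_module("torch")
    print(torch_report)
    if torch_report["importable"]:
        import torch

        # The version of the torch package that this interpreter actually loads.
        print("torch version:", torch.__version__)

    # The module named in the ModuleNotFoundError above.
    print(probe_module("torch.nn.attention.flex_attention"))
```

Running this with the same interpreter that runs SAM-Audio shows the path and version of the torch package that is actually imported, which can then be compared against the versions believed to be installed.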