---
license: mit
language: en
library_name: transformers
tags:
- unet
- film
- computer-vision
- image-segmentation
- medical-imaging
- pytorch
---

# FILMUnet2D

FILMUnet2D is a 2D U-Net with FiLM (Feature-wise Linear Modulation) conditioning for multi-organ ultrasound segmentation.

## Installation

Make sure you have `transformers` and `torch` installed:

```bash
pip install transformers torch
```

## Usage

You can load the model and processor using the `Auto` classes from `transformers`. Since this repository contains custom code, make sure to pass `trust_remote_code=True`.

```python
import torch
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

# 1. Load the model and processor
repo_id = "AImageLab-Zip/US_FiLMUNet"
processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# 2. Load and preprocess your image
# The processor handles resizing, letterboxing, and normalization.
image = Image.open("path/to/your/image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# 3. Prepare the conditioning input
# This should be an integer tensor holding the organ ID.
# Replace `4` (fetal) with the appropriate ID for your use case;
# see the "Organ IDs" section below.
organ_id = torch.tensor([4])

# 4. Run inference
with torch.no_grad():
    outputs = model(**inputs, organ_id=organ_id)

# 5. Post-process the output to get the final segmentation mask
# The processor converts the logits to a binary mask, automatically removing
# the letterbox padding and resizing back to the original image dimensions.
mask = processor.post_process_semantic_segmentation(
    outputs, inputs, threshold=0.7, return_as_pil=True
)[0]

# 6. Save the result
mask.save("output_mask.png")
print("Segmentation mask saved to output_mask.png")
```

### Model Details

- **Architecture:** U-Net with FiLM layers for conditional segmentation.
- **Conditioning:** The model's output is conditioned on an `organ_id` input.
- **Input:** RGB images.
- **Output:** A single-channel segmentation mask.

### Configuration

The model configuration can be accessed via `model.config`. Key parameters include:

- `in_channels`: Number of input channels (default: 3).
- `num_classes`: Number of output classes (default: 1).
- `n_organs`: Number of distinct organs the model was trained to condition on.
- `depth`: Depth of the U-Net.
- `size`: Base number of filters in the first layer.

### Organ IDs

The `organ_id` passed to the model corresponds to the following mapping:

```python
organ_to_class_dict = {
    "appendix": 0,
    "breast": 1,
    "breast_luminal": 1,
    "cardiac": 2,
    "thyroid": 3,
    "fetal": 4,
    "kidney": 5,
    "liver": 6,
    "testicle": 7,
}
```

### Alternative Versions

This repository contains multiple versions of the model, located in subfolders. You can load a specific version with the `subfolder` parameter.

#### 4-Stage U-Net

This version has a U-Net depth of 4.

```python
from transformers import AutoModel

model_4_stages = AutoModel.from_pretrained(
    "AImageLab-Zip/US_FiLMUNet",
    subfolder="unet_4_stages",
    trust_remote_code=True,
)
```
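To confirm which variant you loaded, you can inspect the parameters listed in the Configuration section. A minimal sketch, assuming the attribute names documented above; the printed values are illustrative and may differ per checkpoint:

```python
# Inspect key configuration parameters (names from the Configuration section).
cfg = model_4_stages.config
print(cfg.depth)        # U-Net depth; expected to be 4 for this variant
print(cfg.in_channels)  # number of input channels (default: 3)
print(cfg.num_classes)  # number of output classes (default: 1)
print(cfg.n_organs)     # number of organs available for conditioning
```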
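Inference with a subfolder variant follows the same steps as the main Usage example. Below is a minimal self-contained sketch for the 4-stage model. It assumes the image processor at the repository root applies the same preprocessing expected by this checkpoint; the image path is a placeholder, and the organ ID is taken from the "Organ IDs" mapping above.

```python
import torch
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

repo_id = "AImageLab-Zip/US_FiLMUNet"

# Assumption: the root-level processor's preprocessing (resizing, letterboxing,
# normalization) also matches the 4-stage checkpoint.
processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id, subfolder="unet_4_stages", trust_remote_code=True
)
model.eval()

# Select the conditioning ID by organ name, per the "Organ IDs" section
# (liver -> 6 in the full mapping above).
organ_id = torch.tensor([6])

image = Image.open("path/to/your/image.png").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, organ_id=organ_id)

# Post-process exactly as in the main Usage example.
mask = processor.post_process_semantic_segmentation(
    outputs, inputs, threshold=0.7, return_as_pil=True
)[0]
mask.save("liver_mask_4_stages.png")
```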