---
language:
- en
tags:
- computer-vision
- segmentation
- few-shot-learning
- zero-shot-learning
- sam2
- clip
- pytorch
license: apache-2.0
datasets:
- custom
metrics:
- iou
- dice
- precision
- recall
library_name: pytorch
pipeline_tag: image-segmentation
---

# SAM 2 Few-Shot/Zero-Shot Segmentation

This repository contains a research framework that combines Segment Anything Model 2 (SAM 2) with few-shot and zero-shot learning techniques for domain-specific segmentation tasks.

## 🎯 Overview

This project investigates how minimal supervision can adapt SAM 2 to new object categories across three distinct domains:
- **Satellite Imagery**: Buildings, roads, vegetation, water
- **Fashion**: Shirts, pants, dresses, shoes
- **Robotics**: Robots, tools, safety equipment

## 🏗️ Architecture

### Few-Shot Learning Framework
- **Memory Bank**: Stores CLIP-encoded support examples for each class (sketched below)
- **Similarity-Based Prompting**: Uses visual similarity between query and support embeddings to generate SAM 2 prompts
- **Episodic Training**: Follows the standard few-shot learning protocol
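
As a minimal, illustrative sketch of the memory bank (not the repository's actual `models/sam2_fewshot.py` implementation), the idea is to store unit-normalized CLIP image embeddings keyed by `(domain, class)` and score a query by cosine similarity against the stored supports. The `clip` package and the `ViT-B/32` encoder name here are assumptions:

```python
import torch
import clip  # OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git)

class ClipMemoryBank:
    """Stores CLIP-encoded support examples keyed by (domain, class)."""

    def __init__(self, device="cuda"):
        self.device = device
        self.model, self.preprocess = clip.load("ViT-B/32", device=device)
        self.bank = {}  # (domain, class) -> list of embedding tensors

    @torch.no_grad()
    def add_example(self, domain, cls, pil_image):
        x = self.preprocess(pil_image).unsqueeze(0).to(self.device)
        emb = self.model.encode_image(x)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize
        self.bank.setdefault((domain, cls), []).append(emb.squeeze(0))

    @torch.no_grad()
    def best_similarity(self, domain, cls, pil_query):
        """Highest cosine similarity between the query and any stored support."""
        x = self.preprocess(pil_query).unsqueeze(0).to(self.device)
        q = self.model.encode_image(x)
        q = q / q.norm(dim=-1, keepdim=True)
        supports = torch.stack(self.bank[(domain, cls)])  # (N, D)
        return (supports @ q.squeeze(0)).max().item()
```

In the full pipeline, the best-matching support's mask would additionally be used to derive point or box prompts for SAM 2; this sketch covers only the retrieval step.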

### Zero-Shot Learning Framework
- **Advanced Prompt Engineering**: Four strategies (basic, descriptive, contextual, detailed); see the sketch below
- **Attention-Based Localization**: Uses CLIP's cross-attention for prompt generation
- **Multi-Strategy Prompting**: Combines different prompt types
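
To make the prompting components concrete, here is a minimal sketch. It is illustrative only: the template wording and the helper names (`PROMPT_STRATEGIES`, `heatmap_to_point_prompt`) are hypothetical rather than the repository's actual API, while the point-prompt format (pixel coordinates plus a positive/negative label) follows SAM's convention:

```python
import numpy as np

# Hypothetical text templates for the four strategies; the exact wording
# in models/sam2_zeroshot.py may differ.
PROMPT_STRATEGIES = {
    "basic":       lambda cls, domain: f"{cls}",
    "descriptive": lambda cls, domain: f"a photo of a {cls}",
    "contextual":  lambda cls, domain: f"a {cls} in a {domain} image",
    "detailed":    lambda cls, domain: f"a clearly visible {cls}, as seen in {domain} imagery",
}

def build_prompts(cls: str, domain: str) -> dict:
    """Expand one (class, domain) pair into all four text variants."""
    return {name: fn(cls, domain) for name, fn in PROMPT_STRATEGIES.items()}

def heatmap_to_point_prompt(heatmap: np.ndarray):
    """Turn a text-image similarity heatmap (H, W) into a SAM-style point prompt.

    Returns (coords, labels): one (x, y) pixel coordinate with a positive
    label, placed at the heatmap's peak.
    """
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array([[x, y]]), np.array([1])
```

Multi-strategy prompting would then run all the variants and combine or select among the resulting masks, for example by prediction confidence.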

## 📊 Performance

### Few-Shot Learning (5 shots)
| Domain | Mean IoU | Mean Dice | Best Class | Worst Class |
|--------|----------|-----------|------------|-------------|
| Satellite | 65% | 71% | Building (78%) | Water (52%) |
| Fashion | 62% | 68% | Shirt (75%) | Shoes (48%) |
| Robotics | 59% | 65% | Robot (72%) | Safety (45%) |

### Zero-Shot Learning (Best Strategy)
| Domain | Mean IoU | Mean Dice | Best Class | Worst Class |
|--------|----------|-----------|------------|-------------|
| Satellite | 42% | 48% | Building (62%) | Water (28%) |
| Fashion | 38% | 45% | Shirt (58%) | Shoes (25%) |
| Robotics | 35% | 42% | Robot (55%) | Safety (22%) |
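
For reference, the reported metrics follow the standard definitions on binary masks: IoU is intersection over union, and Dice is twice the intersection over the total mask area. A minimal NumPy sketch (the repository's `utils/metrics.py` may differ in details such as smoothing):

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """IoU and Dice for a pair of binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)
```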

## 🚀 Quick Start

### Installation
```bash
pip install -r requirements.txt
python scripts/download_sam2.py
```

### Few-Shot Experiment
```python
from models.sam2_fewshot import SAM2FewShot

# Initialize the model (point sam2_checkpoint at the checkpoint
# downloaded by scripts/download_sam2.py)
model = SAM2FewShot(
    sam2_checkpoint="sam2_checkpoint",
    device="cuda"
)

# Add a support example: image is a support image and mask its
# binary ground-truth mask for the target class
model.add_few_shot_example("satellite", "building", image, mask)

# Segment a query image using the stored support examples
predictions = model.segment(
    query_image,
    "satellite",
    ["building"],
    use_few_shot=True
)
```

### Zero-Shot Experiment
```python
from models.sam2_zeroshot import SAM2ZeroShot

# Initialize the model with the same SAM 2 checkpoint
model = SAM2ZeroShot(
    sam2_checkpoint="sam2_checkpoint",
    device="cuda"
)

# Perform zero-shot segmentation from class names alone;
# no support examples are needed
predictions = model.segment(
    image,
    "fashion",
    ["shirt", "pants", "dress", "shoes"]
)
```

## 📁 Project Structure

```
├── models/
│   ├── sam2_fewshot.py        # Few-shot learning model
│   └── sam2_zeroshot.py       # Zero-shot learning model
├── experiments/
│   ├── few_shot_satellite.py  # Satellite experiments
│   └── zero_shot_fashion.py   # Fashion experiments
├── utils/
│   ├── data_loader.py         # Domain-specific data loaders
│   ├── metrics.py             # Comprehensive evaluation metrics
│   └── visualization.py       # Visualization tools
├── scripts/
│   └── download_sam2.py       # Setup script
└── notebooks/
    └── analysis.ipynb         # Interactive analysis
```

## 🔬 Research Contributions

1. **Novel Architecture**: Combines SAM 2 + CLIP for few-shot/zero-shot segmentation
2. **Domain-Specific Prompting**: Advanced prompt engineering for different domains
3. **Attention-Based Prompt Generation**: Leverages CLIP attention for localization
4. **Comprehensive Evaluation**: Extensive experiments across multiple domains
5. **Open-Source Implementation**: Complete codebase for reproducibility

## 📄 Citation

If you use this work in your research, please cite:

```bibtex
@misc{sam2_fewshot_zeroshot_2024,
  title={SAM 2 Few-Shot/Zero-Shot Segmentation: Domain Adaptation with Minimal Supervision},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/esalguero/Segmentation}
}
```

## 🤝 Contributing

We welcome contributions! Please feel free to submit issues, pull requests, or suggestions for improvements.

## 📜 License

This project is licensed under the Apache 2.0 License; see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **GitHub Repository**: [https://github.com/ParallelLLC/Segmentation](https://github.com/ParallelLLC/Segmentation)
- **Research Paper**: See `research_paper.md` for the complete methodology
- **Interactive Analysis**: Use `notebooks/analysis.ipynb` for exploration

---

**Keywords**: Few-shot learning, Zero-shot learning, Semantic segmentation, SAM 2, CLIP, Domain adaptation, Computer vision