MobileSAM (Mobile Segment Anything) is a lightweight, efficient image segmentation model designed for mobile and edge devices. It brings the capabilities of Meta’s Segment Anything Model (SAM) to resource-constrained environments while maintaining compatibility with the original SAM pipeline.
MobileSAM was introduced in the paper Faster Segment Anything: Towards Lightweight SAM for Mobile Applications (Zhang et al., 2023) and is available as pretrained weights (mobile_sam.pt) through Ultralytics.
MobileSAM is a prompt-based instance segmentation model optimized for real-time applications on devices with limited computational resources.
It preserves the full SAM pipeline, including the prompt encoder and the mask decoder; the core architectural difference lies in the image encoder:
| Component | Original SAM | MobileSAM |
|---|---|---|
| Image Encoder | ViT-H (632M parameters) | Tiny-ViT (5M parameters) |
| Mask Decoder | 3.876M parameters | 3.876M parameters |
| Whole Pipeline | 615M parameters | 9.66M parameters |

The resulting speed comparison (single GPU):

| Metric | Original SAM | MobileSAM |
|---|---|---|
| Total Parameters | 615M | 9.66M |
| Inference Speed | 456 ms/image | 12 ms/image |
| Encoder Speed | 452 ms | 8 ms |
| Decoder Speed | 4 ms | 4 ms |
MobileSAM processes an image in approximately 12 ms on a single GPU (8 ms for the encoder, 4 ms for the decoder), making it suitable for real-time deployment. Compared with the original SAM, it is approximately 64× smaller (615M → 9.66M parameters) and 38× faster (456 ms → 12 ms per image).
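As a quick sanity check, the size and speed ratios follow directly from the figures in the tables above:

```python
# Figures reported in the tables above (original SAM vs. MobileSAM).
sam_params, mobilesam_params = 615e6, 9.66e6   # whole-pipeline parameters
sam_latency, mobilesam_latency = 456.0, 12.0   # ms per image, single GPU

size_ratio = sam_params / mobilesam_params     # ~64x fewer parameters
speed_ratio = sam_latency / mobilesam_latency  # ~38x faster inference

print(f"{size_ratio:.0f}x smaller, {speed_ratio:.0f}x faster")
```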
MobileSAM consists of two major components:

- **Tiny-ViT Image Encoder** (5M parameters), a lightweight replacement for SAM's ViT-H encoder
- **Prompt-Guided Mask Decoder** (unchanged from SAM), which accepts point and box prompts and produces high-quality segmentation masks
Because the mask decoder is unchanged, MobileSAM maintains functional equivalence with SAM in terms of prompt behavior.
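Because the heavy encoder runs once per image while the lightweight decoder runs once per prompt, the pipeline amortizes the encoder cost across prompts. A conceptual sketch of this two-stage structure (the function names are placeholders, not the real Ultralytics API):

```python
# Conceptual sketch of a SAM-style pipeline; placeholder functions, not real API.
def encode_image(image: str) -> str:
    # Stands in for the Tiny-ViT encoder (~8 ms per image in MobileSAM).
    return f"embedding({image})"

def decode_mask(embedding: str, prompt: tuple) -> str:
    # Stands in for the prompt-guided mask decoder (~4 ms per prompt).
    return f"mask({embedding}, prompt={prompt})"

emb = encode_image("zidane.jpg")       # encoder cost paid once per image
prompts = [(900, 370), (450, 200)]     # e.g. two foreground point prompts
masks = [decode_mask(emb, p) for p in prompts]
```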
| Model | Weights | Supported Tasks | Inference | Validation | Training | Export |
|---|---|---|---|---|---|---|
| MobileSAM | mobile_sam.pt | Instance Segmentation | ✅ | ❌ | ❌ | ❌ |
Currently, MobileSAM supports inference only.
| Model | Size (MB) | Params (M) | CPU Speed (ms/im) |
|---|---|---|---|
| Meta SAM-b | 375 | 93.7 | 49401 |
| Meta SAM2-b | 162 | 80.8 | 31901 |
| Meta SAM2-t | 78.1 | 38.9 | 25997 |
| MobileSAM | 40.7 | 10.1 | 25381 |
| FastSAM-s (YOLOv8 backbone) | 23.7 | 11.8 | 55.9 |
| YOLOv8n-seg | 6.7 | 3.4 | 24.5 |
| YOLO11n-seg | 5.9 | 2.9 | 30.1 |
Observations:

- MobileSAM is the smallest and fastest of the SAM-family models listed, roughly 9× smaller and 2× faster on CPU than Meta SAM-b.
- CNN-based models (FastSAM-s, YOLOv8n-seg, YOLO11n-seg) are smaller still and roughly three orders of magnitude faster on CPU.
Benchmarks were conducted on Apple M4 Pro (24GB RAM), torch 2.6.0, ultralytics 8.3.90.
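These observations can be reproduced with quick arithmetic on the table values:

```python
# Size (MB) and CPU latency (ms/im) copied from the benchmark table above.
sam_b = {"size_mb": 375, "cpu_ms": 49401}
mobilesam = {"size_mb": 40.7, "cpu_ms": 25381}
yolo11n_seg = {"size_mb": 5.9, "cpu_ms": 30.1}

# MobileSAM vs. the Meta SAM-b checkpoint.
print(f"{sam_b['size_mb'] / mobilesam['size_mb']:.1f}x smaller")  # ~9.2x
print(f"{sam_b['cpu_ms'] / mobilesam['cpu_ms']:.1f}x faster")     # ~1.9x

# YOLO11n-seg vs. MobileSAM on CPU.
print(f"{mobilesam['cpu_ms'] / yolo11n_seg['cpu_ms']:.0f}x faster")  # ~843x
```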
MobileSAM is designed for real-time segmentation on mobile and edge devices with limited computational resources, and it has been adopted in a range of downstream projects.
MobileSAM shares the exact same API as SAM, allowing drop-in replacement.
```python
from ultralytics import SAM

# Load the MobileSAM weights (downloaded automatically on first use)
model = SAM("mobile_sam.pt")

# Segment with a single point prompt (label 1 marks foreground)
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```
Both point prompts (points with foreground/background labels) and box prompts (bboxes) are supported.
```python
from ultralytics.data.annotator import auto_annotate

# Detect objects with YOLO11, then prompt MobileSAM with the resulting boxes
auto_annotate(data="path/to/images", det_model="yolo11x.pt", sam_model="mobile_sam.pt")
```
This enables automatic segmentation dataset generation using YOLO detection + MobileSAM masks.
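The annotator writes YOLO-format segmentation labels: one text file per image, each line holding a class index followed by normalized polygon coordinates. A minimal sketch of parsing such a line (the coordinate values are made up for illustration):

```python
# One object per line: class index, then normalized (x, y) polygon pairs.
# The values below are made-up examples, not real annotator output.
line = "0 0.62 0.41 0.65 0.44 0.61 0.48"

parts = line.split()
cls_id = int(parts[0])
coords = [float(v) for v in parts[1:]]
polygon = list(zip(coords[0::2], coords[1::2]))  # [(x1, y1), (x2, y2), ...]

assert all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in polygon)
```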
MobileSAM was trained on a single GPU in less than a day, using roughly 1% of the original SA-1B dataset. The training code is planned for release.
```bibtex
@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}
```
MobileSAM provides a practical trade-off between segmentation quality and computational cost.
By replacing the ViT-H encoder with Tiny-ViT while retaining the mask decoder, it achieves dramatic parameter and latency reductions without breaking compatibility.
For mobile and embedded segmentation tasks where SAM is too heavy, MobileSAM offers a strong, production-ready alternative.