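OWLv2 (Open-World Localization version 2) is a zero-shot, text-conditioned object detection model developed by Google Research: given an image and a set of free-text queries, it predicts bounding boxes for the objects that match each query. For more information about the model, see the research paper (arXiv:2306.09683) and the model's page on the Hugging Face model hub.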
OWLv2 can be used through the Hugging Face Transformers library for zero-shot object detection tasks.
First, install the required packages:
pip install transformers torch pillow requests
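With the packages installed, a minimal sketch of zero-shot detection might look like the following. It uses the generic `zero-shot-object-detection` pipeline from Transformers rather than the model classes directly. The checkpoint name `google/owlv2-base-patch16-ensemble`, the helper names `load_detector`, `detect`, and `format_detection`, and the default threshold of 0.1 are assumptions made for illustration, not details taken from this document.

```python
def load_detector(checkpoint="google/owlv2-base-patch16-ensemble"):
    """Build a zero-shot object detection pipeline.

    The checkpoint name is an assumption; weights are downloaded on first use.
    """
    # Imported here so the heavy dependency loads only when a detector is built.
    from transformers import pipeline

    return pipeline("zero-shot-object-detection", model=checkpoint)


def detect(detector, image, labels, threshold=0.1):
    """Run the detector on an image (local path, URL, or PIL image).

    `labels` is a list of free-text queries such as "a photo of a cat".
    Returns a list of dicts with "score", "label", and "box" keys.
    """
    return detector(image, candidate_labels=labels, threshold=threshold)


def format_detection(det):
    """Render one detection dict as a single readable line."""
    box = det["box"]
    coords = (box["xmin"], box["ymin"], box["xmax"], box["ymax"])
    return f"{det['label']}: {det['score']:.2f} at {coords}"
```

To try it, build the detector once with `detector = load_detector()`, then call `detect(detector, "path/to/image.jpg", ["a photo of a cat", "a photo of a dog"])` and print each result with `format_detection`. The image path here is a placeholder. Raising `threshold` trades recall for fewer low-confidence boxes.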
If you use OWLv2 in your research, please cite the original paper:
@misc{minderer2023scaling,
  title={Scaling Open-Vocabulary Object Detection},
  author={Matthias Minderer and Alexey Gritsenko and Neil Houlsby},
  year={2023},
  eprint={2306.09683},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
For more information about using OWLv2 with Transformers, see the Hugging Face documentation.