TrOCR (large-sized model, fine-tuned on SROIE)
TrOCR model fine-tuned on the SROIE dataset. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository.
Disclaimer: The team releasing TrOCR did not write a model card for this model so this model card has been written by the Hugging Face team.
The TrOCR model is an encoder-decoder model, consisting of:
Images are processed as sequences of fixed-size patches (16x16 resolution) which are linearly embedded with added position embeddings before being fed to the Transformer encoder. The text decoder then autoregressively generates tokens.
You can use this raw model for optical character recognition (OCR) on single text-line images. For other tasks, check the model hub for fine-tuned versions.
Here's how to use this model in PyTorch:
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests
# Load image (note: this model is designed for printed text)
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# Initialize processor and model
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed')
# Process image and generate text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
@misc{li2021trocr,
title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
year={2021},
eprint={2109.10282},
archivePrefix={arXiv},
primaryClass={cs.CL}
}