OCR PDF
Make a scanned PDF searchable.
Pick a scanned or image-based PDF (max 50 MB)
Heads up: the first run downloads about 12 MB. The OCR engine and English language data load from the Tesseract project’s public CDN the first time and are cached after that. Expect about 5–15 seconds per page on a modern device. English only at launch.
Run optical character recognition on a scanned or image-based PDF. We render each page, recognise the text with Tesseract, and write the original pages back with an invisible text layer on top, so the PDF looks identical but its text is now selectable and searchable. English only at launch. Everything runs in your browser, but the first run downloads about 12 MB of language data from the Tesseract project's public CDN (one-time, then cached).
How it works
How OCR PDF works
Upload your PDF
Drop in a scanned or image-based PDF. Clear, high-contrast scans give the best OCR results, and smaller files load faster.
Wait for OCR
Tesseract.js processes each page (5–15 seconds per page). The first run downloads ~12 MB of language data; subsequent runs use the cache.
Download the searchable PDF
Pages look identical to the original, but text is now selectable, copyable, and searchable.
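As a rough illustration of the limits quoted above, the sketch below checks a file against the 50 MB cap and turns the 5–15 seconds-per-page figure into a waiting-time range. The names `preflight`, `MAX_FILE_BYTES`, and `SECONDS_PER_PAGE` are hypothetical, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption; this is not the tool's actual code.

```typescript
// Assumption: "50 MB" means 50 * 1024 * 1024 bytes; the page does not say which unit it uses.
const MAX_FILE_BYTES = 50 * 1024 * 1024;

// Per-page OCR time range quoted on the page (seconds, modern device).
const SECONDS_PER_PAGE = { min: 5, max: 15 };

interface Preflight {
  accepted: boolean;   // true when the file is non-empty and within the size cap
  minSeconds: number;  // optimistic total OCR time
  maxSeconds: number;  // pessimistic total OCR time
}

// Hypothetical helper: validate the upload size and estimate total OCR time.
function preflight(fileBytes: number, pageCount: number): Preflight {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  return {
    accepted: fileBytes > 0 && fileBytes <= MAX_FILE_BYTES,
    minSeconds: pageCount * SECONDS_PER_PAGE.min,
    maxSeconds: pageCount * SECONDS_PER_PAGE.max,
  };
}

// A 20-page, 8 MB scan: accepted, with roughly 100 to 300 seconds of OCR.
const example = preflight(8 * 1024 * 1024, 20);
console.log(example);
```

The estimate covers recognition only; the one-time download of about 12 MB on the first run comes on top of it.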
FAQ
Frequently asked questions
- OCR looks at each page as an image and identifies the shapes of letters, then writes the recognised text into the PDF as a hidden layer over the original page image. Text selection, copy/paste, and Ctrl/Cmd+F search all start working, without changing how the page looks.
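For the hidden layer to line up with the visible page, each recognised word has to be placed where it appears in the image. The sketch below shows one way that mapping could work. It assumes word boxes arrive in image pixels with the origin at the top-left (the shape Tesseract.js reports as `bbox`), that the page was rendered at a known scale, and that PDF coordinates are in points with the origin at the bottom-left. `PixelBox`, `PdfRect`, and `pixelBoxToPdfRect` are hypothetical names, not part of this tool or of any library.

```typescript
// Word box in image pixels, origin at the top-left corner (y grows downward).
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Rectangle in PDF points, origin at the bottom-left corner (y grows upward).
interface PdfRect { x: number; y: number; width: number; height: number; }

// Hypothetical helper: convert an OCR word box to a PDF rectangle.
// renderScale is pixels per PDF point used when the page was rasterised
// (for example, 2 means a 612 pt wide page became a 1224 px wide image).
function pixelBoxToPdfRect(
  box: PixelBox,
  renderScale: number,
  pageHeightPt: number,
): PdfRect {
  if (renderScale <= 0) {
    throw new RangeError("renderScale must be positive");
  }
  const toPt = 1 / renderScale; // pixels -> points
  return {
    x: box.x0 * toPt,
    // Flip the vertical axis: the bottom edge of the box in the image (y1)
    // becomes the rectangle's lower edge measured up from the page bottom.
    y: pageHeightPt - box.y1 * toPt,
    width: (box.x1 - box.x0) * toPt,
    height: (box.y1 - box.y0) * toPt,
  };
}

// US Letter page (612 x 792 pt) rendered at scale 2.
const rect = pixelBoxToPdfRect({ x0: 200, y0: 100, x1: 400, y1: 140 }, 2, 792);
console.log(rect); // { x: 100, y: 722, width: 100, height: 20 }
```

A writer would then draw the word at that rectangle without painting it, for instance with the PDF text rendering mode for invisible text, so the page image stays untouched while selection and search work.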