How OCR · image/PDF to text works
OCR (Optical Character Recognition) extracts text from scanned images and image-based PDFs, producing a searchable, copyable result. Sunasty runs the Tesseract engine compiled to WebAssembly and loads the language model you select on demand.
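To illustrate what loading a language model "on demand" can look like, here is a minimal sketch of a lazy cache: a model is fetched only when its language is first requested, and later requests reuse the same download. This is not Sunasty's actual code. The names `ModelLoader` and `createModelCache` are hypothetical, and the stand-in loader returns dummy bytes rather than real Tesseract trained data.

```typescript
// Hypothetical loader type: fetches the trained data for one language code.
type ModelLoader = (lang: string) => Promise<Uint8Array>;

// Wraps a loader so each language model is requested at most once,
// and only when that language is first asked for.
function createModelCache(load: ModelLoader): (lang: string) => Promise<Uint8Array> {
  const cache = new Map<string, Promise<Uint8Array>>();
  return (lang: string): Promise<Uint8Array> => {
    let pending = cache.get(lang);
    if (pending === undefined) {
      pending = load(lang).catch((err) => {
        // Forget failed loads so a later call can retry.
        cache.delete(lang);
        throw err;
      });
      cache.set(lang, pending);
    }
    return pending;
  };
}

// Stand-in loader for illustration only; a real one would download model data.
const demoLoader: ModelLoader = async (lang) => new Uint8Array([lang.length]);
const getModel = createModelCache(demoLoader);

void getModel("eng"); // first request triggers the load
void getModel("eng"); // second request reuses the pending promise
```

Caching the promise rather than the finished bytes means two requests made in quick succession still share one download.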
Accuracy depends on the quality of your scan and the language selected. Crisp, high-contrast scans of printed text in a supported language typically yield excellent results; handwriting, low-resolution scans, and pages with complex multi-column layouts or mixed scripts are less accurate. Always review the output, particularly names, numbers, and technical terms. For best results, deskew the scan first with the PDF Deskew tool.
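The review step can be partly automated. The sketch below assumes the recognized words are available with a per-word confidence score from 0 to 100 (Tesseract reports one, but the `OcrWord` shape, the `wordsToReview` name, and the default threshold of 80 are hypothetical choices for illustration). It flags low-confidence words and anything containing a digit; names and technical terms still need a human eye.

```typescript
// Hypothetical shape for one recognized word and its confidence (0 to 100).
interface OcrWord {
  text: string;
  confidence: number;
}

// Returns the words worth a manual check: those below the confidence
// threshold, plus any word containing a digit (numbers are easy to misread).
function wordsToReview(words: OcrWord[], minConfidence: number = 80): OcrWord[] {
  return words.filter(
    (word) => word.confidence < minConfidence || /\d/.test(word.text)
  );
}

// Made-up sample data, not real OCR output.
const sampleWords: OcrWord[] = [
  { text: "Invoice", confidence: 96 },
  { text: "1O4.50", confidence: 91 }, // letter O in place of a zero
  { text: "Mulier", confidence: 62 },
];

for (const word of wordsToReview(sampleWords)) {
  console.log(`check: ${word.text} (confidence ${word.confidence})`);
}
```

Flagging every digit-bearing word regardless of confidence is deliberate: a misread amount such as the `1O4.50` above can still score highly.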