PDF → Text extraction

Extract all readable text from your PDF and download it as a .txt file. Everything stays in your browser; no file is sent to a server.

How PDF to text works

PDF to Text extracts the existing text layer from a PDF and saves it as a plain .txt file. The extraction is performed by pdf.js inside your browser, reading the text content objects embedded in the PDF's page streams. The document never leaves your device; the result is assembled locally and offered as a direct download.
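As a rough illustration of the assembly step, the sketch below joins the text items of one page into a plain string. It assumes items shaped like the ones pdf.js returns from a page's text content (a `str` field and, in recent versions, a `hasEOL` flag). The `PageTextItem` interface and the `itemsToPageText` helper are hypothetical names used for illustration, not this tool's actual code.

```typescript
// Assumed subset of a pdf.js text item. Recent pdf.js versions set `hasEOL`;
// older ones may not, so it is optional here.
interface PageTextItem {
  str: string;
  hasEOL?: boolean;
}

// Join the items of one page: append each string as-is and add a newline
// wherever the item is flagged as ending a line.
function itemsToPageText(items: PageTextItem[]): string {
  let text = "";
  for (const item of items) {
    text += item.str;
    if (item.hasEOL) text += "\n";
  }
  // Drop trailing whitespace so the page does not end in blank lines.
  return text.replace(/\s+$/, "");
}

const sampleItems: PageTextItem[] = [
  { str: "Invoice", hasEOL: false },
  { str: " ", hasEOL: false },
  { str: "2024-001", hasEOL: true },
  { str: "Total due: 120.00", hasEOL: false },
];
const samplePageText = itemsToPageText(sampleItems);
```

Whether to insert a space between items that lack one is a judgment call. This sketch trusts the spacing already present in the items.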

This tool reads a text layer that is already present in the file. If your PDF was created by a word processor or export tool, it almost certainly has a text layer and extraction will work well. If the PDF is a scan of a paper document, the pages contain image data only and there is no text layer to extract; in that case, this tool will return empty or incomplete output. Scanned PDFs require optical character recognition (OCR) to produce text, which is a separate process not performed by this tool. Check whether your PDF has selectable text in a viewer before using this tool.
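One way to flag a likely scan after extraction is to check whether any page produced non-whitespace characters. The sketch below is a heuristic under that assumption, not a description of how this tool decides. `hasNoTextLayer` and `emptyPageRatio` are hypothetical helper names.

```typescript
// True when no page produced any non-whitespace character, which is what an
// image-only (scanned) PDF typically looks like after extraction.
function hasNoTextLayer(pageTexts: string[]): boolean {
  return pageTexts.every((t) => t.trim().length === 0);
}

// Fraction of pages that yielded no text: a rough hint for files that mix
// scanned pages with digitally produced ones.
function emptyPageRatio(pageTexts: string[]): number {
  if (pageTexts.length === 0) return 1;
  const empty = pageTexts.filter((t) => t.trim().length === 0).length;
  return empty / pageTexts.length;
}
```

A ratio close to 1 suggests the file needs OCR. A ratio strictly between 0 and 1 suggests only some pages are scans.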

Written by Bastien Sulyan

How to use PDF to text, step by step

Load your PDF into the text extraction tool.
Wait for pdf.js to read the text layer from all pages.
Review the extracted text preview.
Click download to save the .txt file.
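The steps above can be sketched in code. The example declares only the members it calls (`numPages`, `getPage`, `getTextContent`), modelled on the pdf.js document and page objects. `PdfDocLike`, `PdfPageLike`, `extractPages` and `txtFileName` are hypothetical names for illustration, and the tool's real implementation may differ.

```typescript
// Structural stand-ins for the pdf.js objects this sketch touches.
interface PdfPageLike {
  getTextContent(): Promise<{ items: Array<{ str?: string; hasEOL?: boolean }> }>;
}
interface PdfDocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>;
}

// Read every page in order (pdf.js page numbers start at 1) and return one
// string per page.
async function extractPages(doc: PdfDocLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    let text = "";
    for (const item of content.items) {
      // Items without a `str` field (marked-content markers) carry no text.
      if (typeof item.str !== "string") continue;
      text += item.str + (item.hasEOL ? "\n" : "");
    }
    pages.push(text);
  }
  return pages;
}

// Derive the download name from the uploaded file name.
function txtFileName(pdfName: string): string {
  return pdfName.replace(/\.pdf$/i, "") + ".txt";
}
```

With pdf.js, the document object would typically come from `getDocument({ data }).promise`. In a browser, the joined text could then be wrapped in a `Blob` and offered through an object URL for download.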

Common use cases

Extract the text from a research paper PDF to paste into a note-taking application or run through a summariser.
Pull the content from a PDF invoice into a spreadsheet for bookkeeping without manual retyping.
Recover the text from a corrupted or locked-layout PDF where copy-paste in a viewer is broken.
Convert a PDF article into plain text for processing with a script or command-line tool.

Frequently asked questions

Why does the extracted text come out empty or garbled for some PDFs?

The most common cause is that the PDF is a scan: the pages are images and contain no text layer. Other causes include PDFs whose text has been converted to vector outlines, or that use custom font encodings pdf.js cannot map to readable characters. For scanned documents, OCR is needed to produce text.

Does this tool perform OCR on scanned PDFs?

No. This tool reads an existing text layer from the PDF. It does not perform optical character recognition. For scanned PDFs, use the OCR tool, which passes the page images through a local OCR engine in your browser.

Is the text extraction done on a server or in my browser?

In your browser. pdf.js reads the PDF structure locally, parses the text content objects from each page stream, and assembles the output in browser memory. The PDF data never leaves your device during this process.

Will the formatting and layout be preserved in the text output?

No. Plain text does not carry font, size, colour or position information. The output is unformatted text in reading order as determined by pdf.js. Tables, multi-column layouts and special formatting are flattened. For rich layout preservation, PDF to HTML converters are a better fit.

Can I extract text from a password-protected PDF?

If the PDF has a user password (the password required to open the document), you must provide it for the PDF to be readable at all. Owner-level extraction restrictions may also block the operation. Remove those restrictions first using the PDF Unlock tool, then retry extraction.
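For readers scripting against pdf.js directly, a password can be passed through the `password` option of `getDocument`. A missing or wrong password causes loading to reject with an error named `PasswordException`. The sketch below classifies such an error. The numeric codes are assumed to match the `PasswordResponses` values pdf.js exposes, and `classifyLoadError` is a hypothetical helper, not part of pdf.js or of this tool.

```typescript
// Codes assumed to match pdf.js PasswordResponses.
const NEED_PASSWORD = 1;
const INCORRECT_PASSWORD = 2;

type PasswordIssue = "needs-password" | "wrong-password" | "other";

// Decide whether a document-loading failure was caused by a missing or
// incorrect password, based on the error's `name` and `code` fields.
function classifyLoadError(err: unknown): PasswordIssue {
  if (typeof err !== "object" || err === null) return "other";
  const e = err as { name?: unknown; code?: unknown };
  if (e.name !== "PasswordException") return "other";
  if (e.code === NEED_PASSWORD) return "needs-password";
  if (e.code === INCORRECT_PASSWORD) return "wrong-password";
  return "other";
}
```

A caller could prompt for the password on "needs-password", prompt again on "wrong-password", and report a generic failure otherwise.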

Do I need to create an account to extract text from a PDF?

No. There is no sign-up and no account. Drop the file, read the extracted preview, and download the .txt file.

Does PDF to Text work on a mobile browser?

Yes. pdf.js runs the same way on a phone browser as on desktop. Copy or download the extracted text straight from the mobile page once the extraction finishes.

Related tools

Keep everything local and explore complementary tools.

All PDF tools