Extracting Text from PDF Documents

Here’s a quick tip if you’ve ever needed to extract text from a PDF document.

PDF files aren’t always easy to convert, but it can be done if the PDF is of reasonably good quality (which matters particularly if it is a scan of a hard copy). When you’ve got a hard copy and you’ve lost the electronic copy, you’re really left with two options:

  1. type up the document from scratch,
  2. try to extract the text with OCR (Optical Character Recognition).

If you’ve got a larger document then you might want to go for option two and, if so, then you may be interested in a tool called FreeOCR.

It’s a pretty simple piece of software which you can use to scan a PDF and export the text to Microsoft Word and, as the name implies, it is free. If you have Windows Vista, 7, 8 or 8.1 you don’t need to install anything else, but if you’re running Windows XP you will need the .NET Framework installed to make it work.

Granted, it won’t preserve formatting and you’ll need some additional software to extract images but the text will probably be the bulk of the work for most document conversion needs.
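
Because the formatting isn’t preserved, exported text usually needs a little tidying before it is usable. Here is a minimal sketch of that clean-up in Python, assuming you have saved the OCR output as plain text. The function name `tidy_ocr_text` is made up for illustration and has nothing to do with FreeOCR itself; it only uses Python’s standard library.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Tidy plain text exported by an OCR tool (hypothetical helper)."""
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words that were split with a hyphen at the end of a line,
    # e.g. "docu-\nment" becomes "document".
    # Caveat: this also merges genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat one or more blank lines as a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)

    # Inside each paragraph, collapse line breaks and repeated spaces
    # into single spaces so the text reflows cleanly.
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "This is a docu-\nment that was\nscanned.\n\n\nSecond   paragraph."
    print(tidy_ocr_text(sample))
```

You would still want to proofread the result, since OCR can misread individual characters and this sketch only fixes line breaks and spacing.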
