{"id":6319,"date":"2013-10-22T11:30:40","date_gmt":"2013-10-22T00:30:40","guid":{"rendered":"http:\/\/mingersoft.com\/blog\/?p=6319"},"modified":"2013-10-20T20:45:43","modified_gmt":"2013-10-20T09:45:43","slug":"extracting-text-from-pdf-documents","status":"publish","type":"post","link":"https:\/\/mingersoft.com\/blog\/2013\/10\/extracting-text-from-pdf-documents\/","title":{"rendered":"Extracting Text from PDF Documents"},"content":{"rendered":"<p>Here&#8217;s a quick tip if you need to extract text from a PDF document.<\/p>\n<p>PDF files might not be easy to convert, but it can be done if the PDF is of reasonably good quality (particularly if it is a scan of a hard copy). When you&#8217;ve got a hard copy and you&#8217;ve lost the electronic copy, you&#8217;re really left with two options:<\/p>\n<ol>\n<li>type up the document from scratch,<\/li>\n<li>try to extract the text with OCR (Optical Character Recognition).<\/li>\n<\/ol>\n<p>If you&#8217;ve got a larger document then you might want to go for option two and, if so, you may be interested in a tool called <a title=\"FreeOCR\" href=\"http:\/\/www.paperfile.net\/download.html\" target=\"_blank\">FreeOCR<\/a>.<\/p>\n<p>It&#8217;s a pretty simple piece of software which you can use to scan a PDF and export the text to Microsoft Word and, as the name implies, it is free. As long as you have Windows Vista, 7, 8 or 8.1, you don&#8217;t need to install anything else, but if you&#8217;re running Windows XP you will need the .NET Framework installed to make it work.<\/p>\n<p>Granted, it won&#8217;t preserve formatting and you&#8217;ll need some additional software to extract images, but the text will probably be the bulk of the work for most document conversion needs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s a quick tip if you need to extract text from a PDF document. 
PDF files might not be easy to convert, but it can be done if the PDF is of reasonably good quality (particularly if it is a scan of a hard copy). When you&#8217;ve got a hard copy and you&#8217;ve lost the electronic copy, you&#8217;re &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"https:\/\/mingersoft.com\/blog\/2013\/10\/extracting-text-from-pdf-documents\/\">Continue reading<\/a><\/p>\n","protected":false},"author":1,"featured_media":4087,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[18],"tags":[452,412],"class_list":["post-6319","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-pdf","tag-word","item-wrap"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/mingersoft.com\/blog\/wp-content\/uploads\/2012\/05\/PDF-Icon.png?fit=256%2C256&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/posts\/6319","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/comments?post=6319"}],"version-history":[{"count":0,"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/posts\/6319\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/media\/4087"}],"wp:attachment":[{"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2
\/media?parent=6319"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/categories?post=6319"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mingersoft.com\/blog\/wp-json\/wp\/v2\/tags?post=6319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}