Scraping the relevant accounting information from PDF format into text
Steve Hyde, OnnX Inc, working in the healthcare area
Data from a legacy accounting system could only be provided in PDF format. We used PDF2TXT to convert the PDFs into text files so that we could scrape the relevant accounting information and load it into a proprietary accounting system.
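To illustrate the scraping step that follows the conversion, here is a minimal Python sketch. It is not the code used in this project, and the line layout (date, four-digit account code, description, amount) is a hypothetical assumption, as are the names `LINE_RE`, `parse_amount`, and `scrape_entries`. A real legacy report would need its own pattern.

```python
import re
from decimal import Decimal

# Hypothetical layout of one ledger line in the converted text file:
#   MM/DD/YYYY  <4-digit account>  <description>   <amount>
# Amounts in parentheses are treated as negative (an assumed convention).
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<account>\d{4})\s+"
    r"(?P<description>.+?)\s{2,}"
    r"(?P<amount>\(?[\d,]+\.\d{2}\)?)\s*$"
)


def parse_amount(text):
    """Convert '1,234.56' or '(75.00)' into a Decimal."""
    negative = text.startswith("(") and text.endswith(")")
    value = Decimal(text.strip("()").replace(",", ""))
    return -value if negative else value


def scrape_entries(text):
    """Return one dict per line that matches the assumed ledger layout."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip page headers, totals, and blank lines
        entries.append({
            "date": match.group("date"),
            "account": match.group("account"),
            "description": match.group("description"),
            "amount": parse_amount(match.group("amount")),
        })
    return entries
```

A pattern like this only works if the conversion preserves the column layout of the source PDF consistently across files, which is why the accuracy of the text output matters so much for this kind of project.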
We began this project with over 100 PDF files generated by different versions of a legacy system over several years. We converted these files using PDF-to-text conversion tools from many vendors. Some tools yielded extremely poor results. Others converted some files accurately but failed on the same type of file generated in a different time period or with a different mix of data.
PDF2TXT converted all the files and gave a result that exactly matched the source PDF. No other tool, free or expensive, came close to the quality and accuracy of the output we obtained from PDF2TXT. PDF2TXT changed this application from an impossible-to-manage process into a black box. If you choose another tool, you are wasting your time.

