Images to Data: OCR Software

January 13, 2009

Some software I have used to convert images into text, and even tables:

http://code.google.com/p/ocropus/

and http://code.google.com/p/tesseract-ocr/

Note: Both are open source and funded by Google, which also uses them. They are very helpful for, say, email marketing or converting images into rows and columns of text. You may need to tweak the resolution a bit, and adjust the highlighted scan area, to get good results. With that done, you can convert images into text and numeric data with a simple desktop scanner.
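
As a rough illustration of the "rows and columns" step, here is a minimal Python sketch. It assumes the OCR engine has already written plain text in which columns are separated by tabs or by runs of two or more spaces, which will not hold for every scan. The function names are hypothetical, and the `--dpi` option is assumed to be available (Tesseract 4 or later); the command is only built here, not run.

```python
import re


def build_tesseract_command(image_path, output_base, lang="eng", dpi=300):
    """Build (but do not run) a tesseract command line for one image."""
    # Assumption: the installed tesseract accepts --dpi (version 4 or later).
    return ["tesseract", image_path, output_base, "-l", lang, "--dpi", str(dpi)]


def parse_cell(cell):
    """Return an int or float when the cell looks numeric, else the text."""
    cleaned = cell.replace(",", "")  # drop thousands separators such as 1,200
    try:
        return int(cleaned)
    except ValueError:
        pass
    try:
        return float(cleaned)
    except ValueError:
        return cell


def ocr_text_to_rows(text):
    """Split OCR text into rows, and each row into columns."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        # Columns are assumed to be separated by tabs or 2+ whitespace characters.
        cells = re.split(r"\t+|\s{2,}", line)
        rows.append([parse_cell(c) for c in cells])
    return rows


# Made-up sample text standing in for OCR output.
sample = "Region      Q1     Q2\nNorth East  1,200  3.5\n\nSouth       980    4.25\n"
rows = ocr_text_to_rows(sample)
```

The resulting list of rows can then be written out as CSV or loaded into SPSS, SAS, R or Excel. If the scan is noisy, the column separator rule is usually the first thing that needs adjusting.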

For higher-end needs, such as production environments for questionnaires and responses:

The following comes from the SPSS X List (it's a nice list, with many business problems that are also familiar from the SAS and R lists).

The software that SPSS recommends is ReadSoft (http://www.readsoft.com/). Additionally, SPSS has a couple of complementary products, mrPaper (http://www.spss.com/mrpaper/) and mrScan (http://www.spss.com/mrScan/).

And from Dr Steven Lars on the same list:

Beyond the high (90%+) accuracy of OCR technology in modern scanning, Remark OMR (optical mark recognition) algorithms produce 99.9%+ accuracy in detecting filled or empty closed circles and squares. It worked very well.
Scanning has been accomplished using upper-end, hobby-grade scanners with automatic form-feed options, driven by Windows software. Though these machines were purchased through university purchasing, all could have been purchased at Best Buy (a discount technology store), a comparable store, or from internet discount sources.
You can store data for research purposes, access the data using SPSS or Excel, and respond to counselor questions within a week of receiving the raw, paper surveys.
The current preference is to use web-based information gathering developed through university information technology resources or developed on www.Zoomerang.com (or even www.surveymonkey.com).

The biggest challenges were:
First, we had to be careful in the production of our to-be-scanned forms. Our forms had to be printed on the same copy machine from a set master, or printed on the same laser printer, to ensure accuracy. Poor-quality copies and printing yield inaccurate scanning. Survey color also has to be managed carefully, as some colors are opaque to some optical scanners.
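
To see why print consistency matters for mark recognition, here is a minimal Python sketch of one simple way to classify a bubble: count the fraction of dark pixels inside its region and compare it with two cut-offs. This is not Remark OMR's algorithm, which is not described in the source. The function names, thresholds and pixel grids are all hypothetical and chosen only for illustration.

```python
def fill_ratio(region, dark_threshold=128):
    """Fraction of pixels darker than dark_threshold.

    region is a 2D list of grayscale values (0 = black, 255 = white).
    """
    pixels = [p for row in region for p in row]
    if not pixels:
        raise ValueError("region is empty")
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels)


def classify_mark(region, filled_at=0.5, empty_at=0.2, dark_threshold=128):
    """Label a bubble as 'filled', 'empty' or 'uncertain' (hypothetical cut-offs)."""
    ratio = fill_ratio(region, dark_threshold)
    if ratio >= filled_at:
        return "filled"
    if ratio <= empty_at:
        return "empty"
    return "uncertain"  # candidate for manual review


# Made-up 3x3 pixel grids standing in for scanned bubble regions.
filled = [[20, 30, 25], [15, 10, 35], [40, 22, 18]]
empty = [[250, 245, 240], [30, 248, 252], [251, 249, 247]]  # one outline pixel
faint_copy = [[120, 140, 135], [125, 150, 118], [145, 138, 142]]  # washed-out print

labels = [classify_mark(r) for r in (filled, empty, faint_copy)]
```

In this toy example the washed-out copy lands between the two cut-offs and comes back as "uncertain", which is the kind of result that inconsistent printing can cause under a fixed threshold.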
 

A link to the publishers of Remark OMR:
http://www.gravic.com/remark/officeomr/index.html?gclid=CJj01MTSiZgCFRxNagodd0_IDQ

www.decisionstats.com