Fast Detection of Problems in Scanned Documents

Laserfiche Computer Science, 2016-17

Liaison(s): Tessa Adair ’14, Karl Chan ’89 P19, Carl Sykes
Advisor(s): Yekaterina Kharitonova, Melissa O’Neill
Student(s): Tiffany Sun (PM), Kharisma Calderon, Carmen Mejia, Andrew Scott

Laserfiche builds software that helps organizations digitize content and automate processes. To ensure that data can be extracted accurately from scanned paper documents, Laserfiche provides tools that fix image quality problems such as skew and speckles. The goal of our project was to detect such problems in scanned documents automatically and quickly. By detecting these problems up front, the software can reduce the time and processing required for image correction. Our team extracted features from a collection of scanned images and used machine learning classifiers to predict whether a newly scanned document has problems.
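
The feature-extraction and classification approach described above can be illustrated with a minimal sketch. This is not the team's actual feature set or classifier, which the summary does not specify. It assumes a binarized image given as a list of rows (1 = dark pixel), two hypothetical features (the fraction of isolated dark pixels and the overall dark-pixel fraction), and a simple nearest-centroid classifier standing in for whatever classifiers the project used. All function names, labels, and the toy images are made up for illustration.

```python
from statistics import mean

def speckle_fraction(img):
    """Fraction of all pixels that are dark with no dark 8-neighbours."""
    h, w = len(img), len(img[0])
    isolated = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if not any(neighbours):
                isolated += 1
    return isolated / (h * w)

def dark_fraction(img):
    """Fraction of all pixels that are dark."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))

def extract_features(img):
    # Hypothetical two-element feature vector; a real system would use more.
    return (speckle_fraction(img), dark_fraction(img))

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    by_label = {}
    for feats, label in samples:
        by_label.setdefault(label, []).append(feats)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, feats):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], feats))

# Toy 5x5 "scans" (assumed data, for illustration only).
clean = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
speckled = [
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
new_scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]

centroids = train_centroids([
    (extract_features(clean), "clean"),
    (extract_features(speckled), "speckled"),
])
prediction = predict(centroids, extract_features(new_scan))
```

In this toy setup the new scan contains only isolated dark pixels, so its feature vector lies closer to the "speckled" centroid. The features are cheap to compute in a single pass over the image, which fits the goal of fast detection ahead of any heavier correction step.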