Charles River Analytics, Inc.
Abstract

Automatic Feature Selection with Applications to Script Identification of Degraded Documents

V. Ablavsky and M.R. Stevens

Proceedings of the Seventh International Conference on Document Analysis and Recognition, Edinburgh, Scotland (August, 2003)

Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We present an approach that applies a large pool of image features to a small training sample and uses feature subset selection techniques to automatically select the subset with the most discriminating power. At run time, a classifier coupled with an evidence accumulation engine reports a script label once a preset likelihood threshold has been reached. We apply the system to a diverse corpus of printed Russian and English documents that suffer from common degradation problems. Our validation study shows promising results in terms of both script identification accuracy and the ability to identify script at the scale of individual words and text lines.
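
The two stages described above can be illustrated with a minimal Python sketch. The abstract does not name the selection criterion, the classifier, or the accumulation rule, so everything here is an assumption made for illustration: features are ranked with a Fisher-score filter, per-word classifier outputs are taken as given probabilities, and evidence is combined as summed log-odds under equal priors and independent words. The function names (`fisher_score`, `select_features`, `accumulate_evidence`) and the default threshold of 0.95 are hypothetical.

```python
import math


def fisher_score(values_a, values_b):
    """Class separation of one feature: squared mean gap over summed variances."""
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / len(values_a)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / len(values_b)
    # Small constant avoids division by zero for constant features.
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)


def select_features(samples_a, samples_b, k):
    """Return the indices of the k features with the highest Fisher score.

    samples_a and samples_b are lists of feature vectors, one list per script.
    This is a simple filter method standing in for the paper's selection step.
    """
    n_features = len(samples_a[0])
    scores = [
        fisher_score([s[j] for s in samples_a], [s[j] for s in samples_b])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def accumulate_evidence(word_posteriors, threshold=0.95,
                        labels=("Russian", "English")):
    """Combine per-word P(labels[0] | word) values until one label is confident.

    Returns (label, words_used), or (None, words_used) if the threshold is
    never reached. Assumes independent words and equal priors (an assumption,
    not something stated in the abstract).
    """
    log_odds = 0.0
    used = 0
    for used, p in enumerate(word_posteriors, start=1):
        # Clamp so a single over-confident word cannot produce infinite odds.
        p = min(max(p, 1e-6), 1 - 1e-6)
        log_odds += math.log(p / (1 - p))
        posterior = 1 / (1 + math.exp(-log_odds))
        if posterior >= threshold:
            return labels[0], used
        if 1 - posterior >= threshold:
            return labels[1], used
    return None, used
```

With these assumptions, a call such as `accumulate_evidence([0.7, 0.8, 0.9])` stops as soon as the accumulated posterior crosses the threshold, which is how a label could be reported after only a few words rather than a full page.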

