THE HONG KONG POLYTECHNIC UNIVERSITY
Final Year Project Proposal Mobile Device Optical Character Recognition and Applications
Student Name: YANG Fan Student ID: 06846354d Supervisor: Prof. Henry Chan
This document is the proposal for a final year project supervised by Prof. Henry Chan. It proposes an optical character recognition (OCR) system for mobile devices and several mobile applications to be built on top of that system.
Table of Contents
Problem Statement
Objectives and Outcome
  Objectives
  Outcome
Project Methodology
  Developing on PC
    Preprocessing
    Character Recognition
  Porting to Android
    Why Android?
    Porting Method
Project Schedule
Resources Estimation
Problem Statement
The computational capability of mobile devices has grown rapidly over the past two or three years. Applications that were once impractical on mobile devices because of limited computing power are now becoming available to mobile users. Moreover, with the release of several state-of-the-art mobile operating systems, especially iOS and Android, the creativity of developers and researchers has been unleashed. However, no robust, efficient and free optical character recognition (OCR) system is yet available for mobile devices, even though there are plenty of popular desktop OCR applications. It is therefore important to develop an OCR system suited to mobile devices. Such a system can serve as a base service for other applications, and it also offers a good opportunity to explore OCR as an input method for mobile devices. More specifically, this project tackles the following problems:
Is it possible to build an OCR system on a mobile device? If so, which algorithm is the most effective and efficient?
Given a mobile OCR system, what applications can be built on top of it?
Is it feasible to use an OCR system as a complementary input method?
Therefore, in this proposal, I suggest implementing a mobile OCR system and developing mobile applications that make use of it.
Objectives and Outcome
Objectives
The project has three primary objectives:
Design and Implement a Mobile OCR System
Design and implement a mobile OCR system that can correctly and efficiently recognize characters in an image. Current OCR systems are mostly available on PCs. An effective and efficient mobile OCR system will become the foundation of other OCR-related applications.
Implement an Application that Utilizes the Mobile OCR System
Use the OCR system to implement an application that can recognize a Chinese 2D code. Many existing applications focus on English 2D codes. An English 2D code can easily be sent through SMS messages and, unlike a QR code, does not require an Internet connection. A Chinese 2D code application will serve the same purpose, but will use Chinese characters as the primary encoding characters.
Investigate Possible Ways to Use OCR as Input for Mobile Devices
OCR could become an important supplementary input method for mobile devices. This project will explore possible ways to use OCR to input characters.
Outcome
The output of this project could benefit mobile users in general. The possible outcomes include:
An OCR service that runs on a mobile device and converts the characters contained in an image into text
A mobile application that can recognize and decode a Chinese 2D code
An input method that uses the characters in an image as input on a mobile device
Project Methodology
Due to its complexity and nature, the project is divided into three major phases.
Developing on PC
In the first stage, an experimental OCR system will be developed on a PC for demonstration and validation purposes. The major techniques and theory of optical character recognition (OCR) are mostly mature. To develop the OCR system, the following preprocessing steps will be applied:
Input image -> Segmentation -> Denoising -> Edge detection -> Character recognition
Figure 1 Preprocessing
Preprocessing
The first step of preprocessing is to segment the shape image: a simple thresholding converts the gray-level image into a binary image. In practice, shape images are often corrupted by noise, so the shape obtained from thresholding usually has noise around its boundary. A denoising process is therefore applied to eliminate isolated pixels and small isolated regions or segments. Edge detection is then applied. Applying an edge detector to an image may yield a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings, and curves that correspond to discontinuities in surface orientation. Edge detection may thus significantly reduce the amount of data to be processed, filtering out information that may be regarded as less relevant while preserving the important structural properties of the image.
Figure 2 (panels: Original, Segmentation, Denoised, Edge Detected)
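The three preprocessing steps described above can be sketched as follows. This is a minimal illustration in Python, not the proposed implementation. The function names, the fixed threshold of 128, and the assumption that characters are darker than the background are all hypothetical. The denoising step here removes only single isolated pixels (not small regions), and the edge step is a simple boundary extraction rather than a gradient-based edge detector.

```python
def threshold(gray, t=128):
    """Segment a gray-level image (list of rows, values 0-255).

    Assumption: ink is darker than paper, so pixels below t become 1.
    """
    return [[1 if p < t else 0 for p in row] for row in gray]


def denoise(binary):
    """Clear foreground pixels that have no foreground 8-neighbour."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0  # isolated pixel: treat as noise
    return out


def boundary(binary):
    """Keep foreground pixels with a background (or out-of-image) 4-neighbour."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or binary[ny][nx] == 0:
                    out[y][x] = 1
                    break
    return out


def preprocess(gray, t=128):
    """Run the pipeline: thresholding, denoising, boundary extraction."""
    return boundary(denoise(threshold(gray, t)))
```

Calling `preprocess(gray)` on a list-of-rows gray image returns a binary image in which only the outline pixels of each remaining shape are set, which is the reduced data that the recognition stage would consume.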
Character Recognition
After preprocessing, a character recognition method can be applied. This project will compare three methods, all of which are widely used today.
Fourier Descriptor
The term "Fourier descriptor" describes a family of related image features. Generally, it refers to the use of a Fourier transform to analyze a closed planar curve. Much work has been done on the Fourier descriptor as a mechanism for shape identification, and some work has also used Fourier descriptors to assist in OCR. In the context of OCR, the planar curve is generally derived from a character boundary. Since each of a character's boundaries is a closed curve, the sequence of (x, y) coordinates that specifies the curve is periodic, which makes it well suited to analysis with a Discrete Fourier Transform. In this project, the Fourier descriptor approach will be the primary method of character recognition because of its claimed efficiency and ease of use.
A single connected component (left image) and its boundary curves and centroids (right image).
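One common way to turn a closed boundary into a Fourier descriptor is sketched below in Python. It is an illustration under stated assumptions rather than the method this project will necessarily adopt. The function names are hypothetical, the boundary is assumed to be traced counter-clockwise so that the first harmonic is non-zero, and a direct O(n^2) DFT is used for clarity where a real system would use an FFT. Each boundary point (x, y) is treated as the complex number x + iy. Dropping the zeroth coefficient removes the dependence on position, dividing by the magnitude of the first coefficient removes the dependence on size, and keeping only magnitudes removes the dependence on rotation and on the starting point.

```python
import cmath


def dft(z):
    """Direct discrete Fourier transform of a list of complex numbers."""
    n = len(z)
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
        for u in range(n)
    ]


def fourier_descriptor(points, n_coeffs=4):
    """Return normalised magnitudes of harmonics 2 .. n_coeffs + 1.

    points: list of (x, y) tuples along a closed curve (assumed
    counter-clockwise, so that harmonic 1 is non-zero).
    """
    z = [complex(x, y) for x, y in points]
    coeffs = dft(z)
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("first harmonic is zero; cannot normalise")
    # coeffs[0] (the centroid term) is skipped: translation invariance.
    return [abs(c) / scale for c in coeffs[2:2 + n_coeffs]]
```

Two characters could then be compared by, for example, the Euclidean distance between their descriptor vectors; the choice of distance and of `n_coeffs` is left open here.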
ANN
The Artificial Neural Network (ANN) is a tool well suited to this kind of problem. The ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an ANN is its topology. An ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons, and the same is true of ANNs. Learning typically occurs by example, through training on a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems.
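The idea that training adjusts link weights from input/output examples can be illustrated with the simplest possible case, a single artificial neuron trained with the classic perceptron rule. The Python sketch below is only an illustration. A character recognizer would need a multi-layer network and far more training data, and the function names, learning rate and epoch count are assumptions made for this example.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, rate=1.0):
    """Learn weights from (inputs, target) pairs by error correction.

    After each example, every weight moves in proportion to the
    error (target - output) and to the input on that link.
    """
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            for i in range(n):
                weights[i] += rate * error * inputs[i]
            bias += rate * error
    return weights, bias


# Toy training set: the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

Running `train_perceptron(AND_SAMPLES)` returns a weight vector and bias that reproduce the AND function; the learned values are the only place where the "knowledge" of the task is stored.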
Originating in the late 1950s, neural networks did not gain much popularity until the 1980s, when computing boomed. Today ANNs are mostly used to solve complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to be found) and are often well suited to problems that people are good at solving but traditional methods are not. They are good pattern recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They suit a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling, where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn. However, an ANN is potentially more complex than the Fourier descriptor approach. It will therefore serve as a point of comparison for the Fourier descriptor approach, unless it proves efficient enough to be feasible to deploy on a mobile device.
Template Matching
For characters that have not undergone a transformation such as scaling or rotation, a template matching approach may be effective. The template matching process determines the best location by testing all, or a sample of, the viable locations within the search image that the template image may match. The algorithm may require sampling a large number of points. The number of sampling points can be reduced in two ways: by reducing the resolution of the search and template images by the same factor and performing the operation on the resulting downsized images (multiresolution, or pyramid, image processing), or by providing a search window of data points within the search image so that the template does not have to be tested at every viable data point. The two techniques can also be combined.
Figure: Template, Target, Result
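The exhaustive form of the search described above can be sketched in a few lines of Python. This is an illustrative sketch only. The sum of absolute differences is used as the matching cost, which is one common choice that the proposal itself does not fix, the function name is hypothetical, and neither the pyramid nor the search-window speed-up is included.

```python
def match_template(image, template):
    """Slide the template over every viable position in the image.

    image, template: lists of rows of gray values.
    Returns ((row, col), cost) for the position with the lowest sum
    of absolute differences; ties keep the first position found.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            cost = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th)
                for i in range(tw)
            )
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (y, x), cost
    return best_pos, best_cost
```

The cost of this search grows with the product of the image and template areas, which is why the reductions mentioned in the text matter on a mobile device.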
Porting to Android
Why Android?
Android is a mobile operating system developed by Google and based on a modified version of the Linux kernel. It was initially developed by Android Inc. (a firm later purchased by Google) and subsequently positioned in the Open Handset Alliance. According to NPD Group, Android smartphones ranked first in unit sales among all smartphone OS handsets sold in the U.S. in the second quarter of 2010, at 33%. BlackBerry OS was second at 28%, and iOS was third at 22%. Android is therefore very popular among users and developers. Also, because Android is an open source system, developing applications for it should be a largely pleasant process. More importantly, Android-powered devices usually have substantial computational power, which will help the performance of our system.
Porting Method
Porting a program written in C from the PC platform to Android should be largely straightforward. Google has released the Android NDK toolchain, which makes the porting process fairly smooth.
Project Schedule
The tentative project schedule is as follows:

Milestone                                 Date      Remarks
FYP proposal                              Oct 7     Required by the department
Implement Preprocessing                   Oct 30
Implement Fourier Descriptor              Nov 15
Implement ANN                             Nov 30
Implement Template Matching               Dec 10
Compare Three Approaches                  Dec 20    Decide which approach is most suitable
Design Chinese 2D code                    Dec 30
Mid-term check point Report               Jan 13    Required by the department
Implement Chinese 2D code                 Jan 20
Explore OCR input method                  Feb 10    Decide whether it is feasible to use OCR input
Implement demonstrative OCR input method  Feb 30
Final Report                              April 14  Required by the department
Resources Estimation
This project requires only very limited resources:
One web camera
One Android-powered smartphone
A PC with the Linux operating system installed