HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING FOR PHYSICALLY CHALLENGED USERS

A PROJECT REPORT

Submitted by

ABDUL ASIM A.
AFSHAN S.
ANAND R.

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

B.S. ABDUR RAHMAN CRESCENT ENGINEERING COLLEGE, VANDALUR

ANNA UNIVERSITY: CHENNAI 600 025

APRIL 2009
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report "HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING FOR PHYSICALLY CHALLENGED USERS" is the bonafide work of "ABDUL ASIM A. (40405205001), AFSHAN S. (40405205005) & ANAND R. (40405205008)" who carried out the project work under my supervision.
SIGNATURE                                    SIGNATURE

Dr. T. R. RANGASWAMY                         Dr. ANGELINA GEETHA
HEAD OF THE DEPARTMENT                       SUPERVISOR, Professor
Department of Information Technology        Department of Computer Science
B.S.A. Crescent Engineering College          B.S.A. Crescent Engineering College
Seethakathi Estate, G.S.T. Road,             Seethakathi Estate, G.S.T. Road,
Vandalur, Chennai - 600 048, India           Vandalur, Chennai - 600 048, India
ANNA UNIVERSITY: CHENNAI 600 025
VIVA VOCE EXAMINATION
The viva-voce examination of the following students who have submitted the project work “HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING FOR PHYSICALLY CHALLENGED USERS” is held on _____________
ABDUL ASIM A. (40405205001)
AFSHAN S. (40405205005)
ANAND R. (40405205008)
INTERNAL EXAMINER
EXTERNAL EXAMINER
ACKNOWLEDGEMENT

We are grateful to our Principal, Dr. V. M. PERIASAMY, B.S.A. Crescent Engineering College, for providing us an excellent environment to carry out our course successfully.

We are deeply indebted to our beloved Head of the Department, Dr. T. R. RANGASWAMY, Department of Information Technology, who moulded us both technically and morally for achieving greater success in life.

We express our thanks to our project coordinator, Ms. R. REVATHY, Senior Lecturer, Department of Information Technology, for her valuable suggestions at every stage of our project.

We record our sincere thanks to our guide, Dr. ANGELINA GEETHA, Professor, Department of Computer Science, for being instrumental in the completion of our project with her exemplary guidance.

We thank all the staff members of our department for their valuable support and assistance at various stages of our project development.
TABLE OF CONTENTS

CHAPTER NO.  TITLE                                              PAGE NO.

             ABSTRACT                                           vii
             LIST OF TABLES                                     viii
             LIST OF FIGURES                                    ix
             LIST OF ABBREVIATIONS                              x

1.           INTRODUCTION                                       1
             1.1  Feature Detection                             1
             1.2  Face Detection                                3
             1.3  Algorithms on Face Detection                  3
             1.4  Human Computer Interface for Physically
                  Challenged Users                              4
             1.5  HCI Based on Mouse Movements                  5
             1.6  Related Works                                 6

2.           PROBLEM DEFINITION                                 8

3.           DEVELOPMENT PROCESS                                9
             3.1  Requirement Analysis and Specification        9
                  3.1.1  Input Requirements                     9
                  3.1.2  Output Requirements                    10
                  3.1.3  Functional Requirements                10
             3.2  Resource Requirements                         10
                  3.2.1  Hardware                               11
                  3.2.2  Software                               11
             3.3  Design                                        12
                  3.3.1  System Architecture                    12
                  3.3.2  Detailed Design                        13
                         3.3.2.1  User Interface                14
                         3.3.2.2  Module Description            14
             3.4  Implementation                                19
             3.5  Testing                                       23

4.           APPLICATIONS AND FUTURE ENHANCEMENTS               25

5.           CONCLUSION                                         26

             APPENDIX A – SCREENSHOTS                           27
             REFERENCES                                         35
ABSTRACT

Physically challenged people find it difficult to use a computer because information is presented to them in an inaccessible form. Though many forms of computer access are available for disabled people, these systems are expensive and require sophisticated hardware support. In this context, this system focuses on helping quadriplegic and non-verbal users. The challenge is to develop a Human Computer Interface for such users which is inexpensive and easy to implement.

Human Computer Interface is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use, and with the study of the major phenomena surrounding them. We propose an interface for people with severe disabilities based on face tracking. Body features like the eyes and the lips may also be used for implementing a human computer interface, but with some limitations. In eye tracking, the motion of the pupil is hard to track with a web camera, which would be the primary mode of input in the proposed system. For a physically challenged user, moving the face itself demands greater effort, and hence the finer intricacies of eyeball and lip movement cannot be considered. The system depends on a web camera for input and hence would be affordable for the target users. User friendliness is enhanced as the system is devoid of any sophisticated hardware requirement.
LIST OF TABLES

S.No   Table Name                              Page No.
1.     Hardware Resource requirement table     11
2.     Software Resource requirement table     11
LIST OF FIGURES

Figure No.   Figure Name                                  Page No.
1.1          Head tracking system                         2
3.1          Architecture diagram                         12
3.2          System flow diagram                          15
3.3          Code snippet for webcam capture              16
3.4          Code snippet for face detection              17
3.5          Code snippet for mouse pointer movement      18
3.6          Code snippet for playing video clips         18
3.7          Message Board                                19
3.8          Algorithm flow diagram                       22
LIST OF ABBREVIATIONS

S.No   Acronym    Expansion
1.     CAMSHIFT   Continuously Adaptive Mean Shift
2.     HCI        Human Computer Interface
3.     SDLC       Software Development Life Cycle
4.     GUI        Graphical User Interface
5.     MFC        Microsoft Foundation Class
6.     CLR        Common Language Runtime
7.     ATL        Active Template Library
8.     OpenCV     Open Computer Vision
9.     COM        Component Object Model
1. INTRODUCTION
1.1 Feature Detection

Feature detection is a process by which specialized nerve cells in the brain respond to specific features of a visual stimulus, such as lines, edges, angles, or movement. The nerve cells fire selectively in response to stimuli that have specific characteristics. Feature detection was discovered by David Hubel and Torsten Wiesel of Harvard University.

In computer vision and image processing, the concept of feature detection refers to methods that aim at computing abstractions of image information and making a local decision at every image point as to whether there is an image feature of a given type at that point or not. The resulting features will be subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.

Feature detection is a low-level image processing operation. That is, it is usually performed as the first operation on an image, and it examines every pixel to see if there is a feature present at that pixel. If this is part of a larger algorithm, then the larger algorithm will typically examine the image only in the region of the features. As a built-in prerequisite to feature detection, the input image is usually smoothed by a Gaussian kernel in a scale-space representation, and one or several feature images are computed, often expressed in terms of local derivative operations. Occasionally, a higher-level algorithm may be used to guide the feature detection stage, so that only certain parts of the image are searched for features. Once features have been detected, a local image patch around the feature can be extracted. This extraction may involve quite considerable amounts of image processing. The result is known as a feature descriptor or feature vector.
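As an illustration of the smoothing-and-derivatives pre-processing described above, the following sketch uses the OpenCV C API (the library employed later in this project). The kernel and aperture sizes and the function name are illustrative assumptions, not code from this report.

#include <cv.h>

// Sketch: one level of Gaussian scale space followed by local
// derivative (Sobel) feature images. Sizes are illustrative choices.
void compute_feature_images(const IplImage* gray)
{
    IplImage* smoothed = cvCreateImage(cvGetSize(gray), IPL_DEPTH_8U, 1);
    IplImage* dx = cvCreateImage(cvGetSize(gray), IPL_DEPTH_16S, 1);
    IplImage* dy = cvCreateImage(cvGetSize(gray), IPL_DEPTH_16S, 1);

    // Smooth with a 5x5 Gaussian kernel before differentiation
    cvSmooth(gray, smoothed, CV_GAUSSIAN, 5, 5, 0, 0);

    // First derivatives in x and y act as simple feature images;
    // per-pixel feature decisions would be made on these responses
    cvSobel(smoothed, dx, 1, 0, 3);
    cvSobel(smoothed, dy, 0, 1, 3);

    cvReleaseImage(&smoothed);
    cvReleaseImage(&dx);
    cvReleaseImage(&dy);
}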
Types of Tracking

Eye Tracking:

Eye tracking is the process of measuring either the point of gaze or the motion of an eye relative to the head. An eye tracker is a device for measuring eye positions and eye movements. There are a number of methods for measuring eye movements. The most popular variant uses video images from which the eye position is extracted. Other methods use search coils or are based on the electro-oculogram.

Two general types of eye tracking techniques are used: Bright Pupil and Dark Pupil. Their difference is based on the location of the illumination source with respect to the optics. If the illumination is coaxial with the optical path, then the eye acts as a retro-reflector as the light reflects off the retina, creating a bright pupil effect similar to red eye. If the illumination source is offset from the optical path, then the pupil appears dark because the retro-reflection from the retina is directed away from the camera.

Head Tracking:

Head tracking technology consists of a device transmitting a signal from atop the computer monitor and tracking a reflector placed on the user's head or eyeglasses. As a mouse alternative, this allows the person to control the mouse cursor by moving his/her head. Once calibrated, the movement of the user's head determines the direction in which the onscreen cursor travels. An example of a head tracking system is given in Figure 1.1.
Figure 1.1: Head tracking system

1.2 Face Detection
Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary (digital) images. It detects facial features and ignores anything else, such as buildings, trees and bodies. Face detection can be regarded as a more general case of face localization: in face localization, the task is to find the locations and sizes of a known number of faces (usually one), whereas in face detection one does not have this additional information.

Early face-detection algorithms focused on the detection of frontal human faces, whereas newer algorithms attempt to solve the more general and difficult problem of multi-view face detection, that is, the detection of faces that are either rotated along the axis from the face to the observer (in-plane rotation), rotated along the vertical or left-right axis (out-of-plane rotation), or both.

Face detection is used in biometrics, often as a part of (or together with) a facial recognition system. It is also used in video surveillance, human computer interfaces and image database management. Some recent digital cameras use face detection for autofocus. Face detection is also useful for selecting regions of interest in photo slideshows that use a pan-and-scale effect.
1.3 Algorithms on Face Detection

Neural Network-Based Face Detection by Rowley, Baluja and Kanade:

This is a neural network-based algorithm to detect upright, frontal views of faces in gray-scale images. The algorithm works by applying one or more neural networks directly to portions of the input image and arbitrating their results. Each network is trained to output the presence or absence of a face. The algorithms and training methods are designed to be general, with little customization for faces. Many face detection researchers have used the idea that facial images can be characterized directly in terms of pixel intensities. These images can be characterized by probabilistic models of the set of face images, or implicitly by neural networks or other mechanisms. The parameters for these models are adjusted either automatically from example images or by hand.

Algorithm by Henry Schneiderman and Takeo Kanade:

This algorithm is a statistical method for three-dimensional object detection. The statistics of both object appearance and non-object appearance are represented using histograms. Each histogram represents the joint statistics of a subset of wavelet coefficients and their position on the object. This approach uses many such histograms to represent a wide variety of visual attributes. The algorithm was the first of its kind to reliably detect human faces with out-of-plane rotation.

CAMSHIFT Algorithm:

CAMSHIFT stands for "Continuously Adaptive Mean Shift". It combines the basic Mean Shift algorithm with an adaptive region-sizing step. The kernel is a simple step function applied to a skin-probability map. The skin probability of each image pixel is based on color, using a method called histogram back-projection. Color is represented as hue from the HSV color model. While CAMSHIFT is a very fast and simple method of tracking, it tracks only the center and size of the probability distribution of an object, so it is only as good as the probability distribution that is produced for the object.
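To make the CAMSHIFT description concrete, the sketch below shows one tracking iteration using OpenCV's C API. It assumes a hue histogram (hist) built beforehand from a sample of the tracked object's color and a search window carried over from the previous frame; these names and the termination criteria are illustrative assumptions, not code from the report.

#include <cv.h>

// One CAMSHIFT step: back-project the hue histogram to get a
// skin-probability image, then adaptively shift/resize the window.
void camshift_step(IplImage* frame_hsv, CvHistogram* hist,
                   CvRect* track_window, CvBox2D* face_box)
{
    // Isolate the hue plane of the HSV frame
    IplImage* hue = cvCreateImage(cvGetSize(frame_hsv), 8, 1);
    cvSplit(frame_hsv, hue, 0, 0, 0);

    // Histogram back-projection: each pixel becomes a skin probability
    IplImage* backproject = cvCreateImage(cvGetSize(frame_hsv), 8, 1);
    cvCalcBackProject(&hue, backproject, hist);

    // Mean shift with an adaptive window size over the probability map
    CvConnectedComp comp;
    cvCamShift(backproject, *track_window,
               cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1),
               &comp, face_box);
    *track_window = comp.rect;   // seed window for the next frame

    cvReleaseImage(&hue);
    cvReleaseImage(&backproject);
}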
1.4 Human Computer Interaction for the Physically Challenged

Human-computer interaction (HCI) is the study of interaction between people (users) and computers. It is often regarded as the intersection of computer science, behavioral sciences, design and several other fields of study. Interaction between users and computers occurs at the user interface (or simply interface), which includes both software and hardware, for example, general-purpose computer peripherals and large-scale mechanical systems such as aircraft and power plants.
Persons with severe motion impairments, such as paraplegics and quadriplegics, face difficulty in accessing computer-based systems since they cannot use conventional computer access devices like the mouse or keyboard. Alternate computer interfaces based on the tracking of body features need to be developed for these users. The challenge lies in designing a system which would serve as a general interface between computers and physically challenged users.
1.5 HCI Based on Mouse Movements

Pointing devices like the mouse and trackball enable users to control a pointer and interact with a graphical user interface. The current human-computer interaction mode, based primarily on the keyboard and the mouse, has seen little change since the advent of modern computing. Computers now commonly come with cameras as standard equipment, so it is desirable to employ them in designing next-generation human computer interaction devices. The feasibility of interfaces based on speech-driven input has also been extensively investigated.

Relying on input based on human features has opened up the possibility of developing interfaces for people who cannot use the keyboard or mouse due to severe disabilities. Such systems make use of human features such as the head, eyes, lips or face to track the movement of the user and translate the movements into mouse movements on the screen. The purpose of this project is to develop an interface for quadriplegic and non-verbal users.
1.6 Related Work

In the work of James Gips, Margrit Betke and Peter Fleming (2000), preliminary investigations were carried out for the design of a human computer interface for quadriplegic and non-verbal users. The system is broken down into two main components. The first component is the Vision Computer, which receives real-time input from a camera mounted on the monitor. The second component is the User's Computer, which runs a special driver program in the background to translate the user's movement from the input device into mouse movements on the screen.

A camera mouse system was developed by James Gips, Margrit Betke and Peter Fleming (2002). The system makes use of body features like the tip of the user's nose or finger, or the face, to track the position of the mouse. Various body features are examined for tracking reliability and user convenience. The visual tracking algorithm used in this system is based on cropping an online template of the tracked feature from the current image frame and testing where this template correlates in the subsequent frame. The location of the highest correlation is interpreted as the new location of the feature in the subsequent frame. Our system takes into consideration part of the modules of this algorithm for the regular updating of the image frames.

We study the working of the CAMSHIFT algorithm proposed by Gary R. Bradski (1998) to develop a perceptual user interface. Perceptual interfaces are ones in which the computer is given the ability to sense and produce analogs of the human senses. The CAMSHIFT algorithm is a modification of the mean shift algorithm, which is based on probability distributions. The Continuously Adaptive Mean Shift (CAMSHIFT) algorithm deals with dynamically changing color probability distributions derived from video frames. Since CAMSHIFT relies on color distribution alone, errors in color will cause errors in tracking.

A face detection algorithm based on skin color has been proposed by Sanjay Singh, D. S. Chauhan, Mayank Vatsa and Richa Singh (2003). The authors discuss various algorithms based on skin color. Three main color spaces, RGB, YCbCr and HSI, have been combined to get a new skin-color-based face detection algorithm which achieves higher accuracy. Our system involves the face localization discussed in this publication.

In the work of Rajesh Kumar and Anupam Kumar (2008), alternate input systems to replace the traditional mouse and keyboard are discussed. The authors have developed an input system which uses the head and eyes to track the movements of the user. The algorithm is based upon image matching using correlation coefficients. The system comprises an image tracer module, and the cursor position is determined by calculating the correlation coefficient of a tracing window in image space.

Ian R. Fasel and Javier R. Movellan (2002) have conducted a comprehensive analysis of some techniques used in neurally inspired face detectors. Algorithms such as SNoW, AdaBoost and Bootstrap have been studied. The AdaBoost algorithm is based on active sampling of images, whereas its counterparts use random sampling. It has been experimentally shown that AdaBoost delivers consistent performance under various conditions.

In the work of Zhaomin Zhu, Takashi Morimoto, Hidekazu Adachi, Osamu Kiriyama, Tetsushi Koide and Hans Juergen Mattausch (2005), a face detection system has been proposed based on Haar-like features. The detection technique is based on the idea of the wavelet template that defines the shape of an object in terms of a subset of the wavelet coefficients of the image; the object in this case is the human face. Our system makes use of the Haar face detection algorithm to recognize and track faces from real-time video input. The main tasks involved are webcam capture, face detection and translation of facial movements into mouse movements. A web camera is a low-resolution capture device. The Haar face detection algorithm processes the video feed using a large number of evaluations called classifiers to localize faces. This helps in achieving a high degree of accuracy.
2. PROBLEM DEFINITION
People with severe disabilities resulting from birth, accidents or degenerative diseases, as well as bedridden patients, have been excluded from access to computers and even lack proper means of communication with fellow human beings. Information is presented to them in an inaccessible form. They are unable to speak and have very little or no voluntary muscle control. In most cases, these people are able to move only their heads. Their level of mental functioning might not be known because of their inability to communicate. People with severe physical disabilities are often isolated, spending hours in bed or in a wheelchair at home or in an institutional setting. Computer and communication technology can make all the difference in the world for people with profound physical disabilities.

Our approach is to develop a computer interface for the disabled using facial tracking. The challenge is to develop a low-cost system devoid of any sophisticated hardware for input. The system should also be free of any special hardware to track the desired feature, as this may cause inconvenience to the user. The facial movements of the user are captured using a webcam and translated into mouse pointer movements after pre-processing and applying a face detection algorithm. Thus, by moving the face, the user is able to control the mouse. The interface contains options for raising an alarm, summoning a nurse and playing audio and video for entertainment. An on-screen message board has also been provided to enable the user to communicate effectively.
3. DEVELOPMENT PROCESS
A software development process is a structure imposed on the development of a software product. The activities concerned with the development of software are collectively known as the Software Development Life Cycle (SDLC). The SDLC is any logical process used by a systems analyst to develop an information system, including requirements, validation, training, and user ownership. An SDLC should result in a high-quality system that meets or exceeds customer expectations, reaches completion within time and cost estimates, and works effectively and efficiently in the current and planned information technology infrastructure.
3.1 Requirement Analysis and Specification

The requirement engineering process consists of feasibility study, requirements elicitation and analysis, requirements specification, requirements validation and requirements management. Requirements elicitation and analysis is an iterative process that can be represented as a spiral of activities, namely requirements discovery, requirements classification and organization, requirements negotiation and requirements documentation.
3.1.1 Input Requirements

The input for the human computer interface will be obtained from a web camera. Since the interface depends solely on the camera, care should be taken in choosing it. A web camera is chosen over other video capture devices for two reasons. First, a web camera is less expensive than other visual input devices, which makes the system affordable to every individual. Second, the web camera does not require any specialized drivers or software support, which makes it easy for the developer to access real-time video feeds.
3.1.2 Output Requirements
The output will be the movement of the mouse pointer on the interface. The video stream from the camera will be displayed at the center of the interface along with the tracking of the face.
3.1.3 Functional Requirements

The facial movements of the user are captured through the camera in Visual C++. The live video stream is fed to the face detection algorithm. The detected face is given as input to the tracker module, which translates the facial movements into mouse pointer movements. These can then be used to access the user interface.
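A minimal end-to-end sketch of this capture-detect-track pipeline is given below, assuming the OpenCV C API and the standard frontal-face cascade file. The direct proportional mapping from frame to screen coordinates is an illustrative simplification, not the report's exact tracker code, which is described in Section 3.3.2.2.

#include <cv.h>
#include <highgui.h>
#include <windows.h>

int main(void)
{
    // Load the frontal-face cascade and open the first available camera
    CvHaarClassifierCascade* cascade = (CvHaarClassifierCascade*)
        cvLoad("haarcascade_frontalface_alt.xml", 0, 0, 0);
    CvMemStorage* storage = cvCreateMemStorage(0);
    CvCapture* capture = cvCaptureFromCAM(-1);
    if (!cascade || !capture) return 1;

    for (;;) {
        IplImage* frame = cvQueryFrame(capture);   // next camera frame
        if (!frame) break;

        cvClearMemStorage(storage);
        CvSeq* faces = cvHaarDetectObjects(frame, cascade, storage, 1.1, 2,
                           CV_HAAR_DO_CANNY_PRUNING, cvSize(40, 40));
        if (faces && faces->total > 0) {
            CvRect* r = (CvRect*)cvGetSeqElem(faces, 0);
            // Map the face position in the frame to screen coordinates
            int sx = r->x * GetSystemMetrics(SM_CXSCREEN) / frame->width;
            int sy = r->y * GetSystemMetrics(SM_CYSCREEN) / frame->height;
            SetCursorPos(sx, sy);                  // drive the pointer
        }
    }
    cvReleaseCapture(&capture);
    return 0;
}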
3.2 RESOURCE REQUIREMENTS

Software requirements engineering is a sub-field of software engineering that deals with the elicitation, analysis, specification, and validation of requirements for software. Requirements analysis, in systems engineering and software engineering, encompasses those tasks that go into determining the needs or conditions to be met by a new or altered product, taking account of the possibly conflicting requirements of the various stakeholders, such as beneficiaries or users. Requirements analysis is critical to the success of a development project. Requirements must be actionable, measurable, testable, related to identified business needs or opportunities, and defined to a level of detail sufficient for system design.

3.2.1 Hardware

The minimum hardware requirements for this project are listed in Table 1.
Table 1: Hardware Requirements

Hardware                       Requirement
Processor                      Intel Pentium IV or AMD – 1.8 GHz
Memory                         1 GB RAM
Hard Disk                      1 GB
Video Capture Device (Input)   Logitech or Microsoft web camera
3.2.2 Software

The minimum software requirements for this project are listed in Table 2.

Table 2: Software Requirements

Software           Requirement
Operating System   Windows 2000/XP
Runtime Package    Microsoft Visual C++, Intel OpenCV
Webcam Drivers     Logitech/Microsoft SDK
3.3 DESIGN

Software design is a process of problem-solving and planning for a software solution. After the purpose and specifications of the software are determined, software developers will design or employ designers to develop a plan for a solution. It includes low-level component and algorithm implementation issues as well as the architectural view.
3.3.1 System Architecture
Figure 3.1: Architecture Diagram
The architecture of the system is represented in Figure 3.1. The system receives real-time input from the user via a web camera. The video stream is accessed via the webcam capture module; the vendor-supplied webcam software cannot be used to interface the webcam with the face detection module. The input from the camera is given to the face detection module. The core of the face detection module contains the algorithm, which works on localizing the facial segments from the rest of the image. The algorithm is adapted to detect faces from streaming video feeds.

After the face has been detected in the video stream, the movements of the face are translated into mouse cursor movements on the screen and updated accordingly in real time. The position of the face is converted into onscreen coordinates, and this is mapped into mouse pointer coordinates in the tracker module. Hence, when the user moves his face, the mouse cursor moves correspondingly. This tracking module is interfaced with the Graphical User Interface (GUI). Using the mouse movements, the user can interact with the application interface.
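The mapping from face coordinates to pointer coordinates can be sketched as follows. The report's tracker multiplies the coordinates by a fixed gain (see Section 3.3.2.2); the clamping shown here is an illustrative refinement, not the report's code, which keeps an amplified position from driving the cursor off the screen edge.

#include <windows.h>

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

// Amplify the face position by `gain` and clamp to the screen bounds
void face_to_cursor(int face_x, int face_y, int gain)
{
    int x = clamp(face_x * gain, 0, GetSystemMetrics(SM_CXSCREEN) - 1);
    int y = clamp(face_y * gain, 0, GetSystemMetrics(SM_CYSCREEN) - 1);
    SetCursorPos(x, y);
}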
3.3.2 Detailed Design

Our system provides an efficient way for bedridden people to interact with a computer and also provides an efficient communication system. The main tasks to be accomplished in the development of the proposed system are as follows:

• Accessing the video stream from the video camera in real time
• Detecting the facial motion from the captured video
• Developing the user interface to aid the target users
• Translating the facial motion into an input format which can be used to manipulate the user interface
• Triggering control signals based on the translated input format
3.3.2.1 User Interface

The system has been developed in Microsoft Visual C++ and can be executed by running the project executable file. The web camera has to be set up and initialized before executing the system. The system will automatically detect the web camera, provided there is only one active camera at execution time.

The web camera must be fixed and focused on the facial region of the target user, and care should be taken to align the camera in this way. The system tracks the signals captured by the web camera, analyses them and detects the face region. As the video stream progresses, the facial movement is detected by applying the algorithm. Once face detection has been established, control passes to the mouse pointer, and the user is able to move the mouse pointer by moving his/her face.

At the center of the interface is a display window which shows the real-time video stream from the web camera. It displays the detected face, which is updated constantly in real time. The interface has buttons to invoke various functions: the user is able to raise an alarm, summon a nurse, or play audio and video for entertainment. An onscreen message board can also be invoked for communication purposes. The invoked function can be stopped using the stop button, and the application can be closed using the exit button provided in the interface.
3.3.2.2 Module Description

The basic flow of the system is represented in Figure 3.2. The human computer interface for physically challenged users is made possible by the video feed from the web camera. The modules of the proposed system are as follows:
1. Webcam Capture module
2. Face Detector module
3. Tracker module
4. Application Interface
Figure 3.2: System Flow diagram
Webcam Capture module:

The input for the system is captured using the web camera; lighting conditions should be favourable. The bundled software supplied with the camera can be used to capture images and video, but it cannot be interfaced with the application to be developed. Thus we capture the video stream from the camera in Visual C++ using Microsoft DirectShow. Microsoft DirectShow is part of the Microsoft DirectX SDK, a set of low-level application programming interfaces for creating games and other high-performance multimedia applications. DirectShow automatically detects and uses audio and video acceleration whenever available. The captured video stream is displayed at the center of the user interface and given as input to the face detection module. The code for webcam capture is given in Figure 3.3.
// Capture from the first available camera (-1 lets OpenCV choose)
CvCapture* capture = cvCaptureFromCAM(-1);

// Grab and retrieve the next frame as an IplImage
cvGrabFrame(capture);
IplImage* frame = cvRetrieveFrame(capture);

// Allocate frame_copy with the same size and depth as the frame
if (!frame_copy)
    frame_copy = cvCreateImage(cvSize(frame->width, frame->height),
                               IPL_DEPTH_8U, frame->nChannels);
Figure 3.3: Code snippet for webcam capture
Face Detector module:

The facial movements of the user are captured from the web camera and given to the face detector module. The algorithm used in our system is the Multi-view Face Detection and Recognition algorithm using Haar-like features. Haar-like features are digital image features used in object recognition; they owe their name to their intuitive similarity with Haar wavelets. The feature set considers rectangular regions of the image and sums up the pixels in each region. This sum is used to categorize images: all images whose Haar-like feature in a given rectangular region falls within a certain range of values form one category, and those falling outside this range form another. This roughly divides the set of images into those containing faces and those not containing faces. Once the face has been detected, a coloured box is drawn around the face to localize it. The algorithm constantly localizes the face in the dynamic video stream.
// Load the trained frontal-face Haar cascade from disk
const char* cascade_name = "haarcascade_frontalface_alt.xml";
CvHaarClassifierCascade* cascade =
    (CvHaarClassifierCascade*)cvLoad(cascade_name, 0, 0, 0);
CvMemStorage* storage = cvCreateMemStorage(0);

// Create a down-scaled copy of the input image for faster processing
IplImage* temp = cvCreateImage(cvSize(img->width/scale, img->height/scale), 8, 3);

// Detect faces: scale step 1.1, at least 2 neighbouring hits,
// Canny pruning to skip flat regions, minimum window 40x40
CvSeq* faces = cvHaarDetectObjects(img, cascade, storage, 1.1, 2,
                                   CV_HAAR_DO_CANNY_PRUNING, cvSize(40, 40));
Figure 3.4: Code snippet for face detection
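The rectangular pixel sums on which Haar-like features rely are conventionally computed in constant time per rectangle via an integral image. The sketch below shows this standard technique; the function names are illustrative, and the code is not from the report.

// ii[y*w + x] holds the sum of all pixels above and to the left of
// (x, y), inclusive; build it in one pass over the image
void build_integral(const unsigned char* img, long* ii, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            long up   = (y > 0) ? ii[(y - 1) * w + x] : 0;
            long left = (x > 0) ? ii[y * w + (x - 1)] : 0;
            long diag = (y > 0 && x > 0) ? ii[(y - 1) * w + (x - 1)] : 0;
            ii[y * w + x] = img[y * w + x] + up + left - diag;
        }
}

// Sum of the rectangle with top-left (x, y), width rw and height rh,
// using only four lookups into the integral image
long rect_sum(const long* ii, int w, int x, int y, int rw, int rh)
{
    long A = (x > 0 && y > 0) ? ii[(y - 1) * w + (x - 1)] : 0;
    long B = (y > 0) ? ii[(y - 1) * w + (x + rw - 1)] : 0;
    long C = (x > 0) ? ii[(y + rh - 1) * w + (x - 1)] : 0;
    long D = ii[(y + rh - 1) * w + (x + rw - 1)];
    return D - B - C + A;
}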
Tracker module:

The face detector module draws a square around the localized face. The coordinates of the square are passed to the SetCursorPos function, which moves the mouse pointer when the user moves his/her face. The coordinates are multiplied by a scaling factor in order to enhance mouse movement. The mouse clicking function is implemented using a time delay: when the mouse pointer hovers over a button for a specified time, the button gets clicked. The code snippet for mouse control is given in Figure 3.5.

// Face coordinates, scaled back to full image resolution
pt1.x = r->x * scale;
pt1.y = r->y * scale;
pt2.x = (r->x + r->width) * scale;
pt2.y = (r->y + r->height) * scale;

// Amplify the movement so small head motions span the screen
pt3.x = pt1.x * 7;
pt3.y = pt1.y * 7;
SetCursorPos(pt3.x, pt3.y);

// Mouse clicking: synthesize a left button press and release
mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, GetMessageExtraInfo());
mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, GetMessageExtraInfo());

Figure 3.5: Code snippet for mouse pointer movement
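The hover-to-click behaviour can be realised with a simple dwell counter, called once per frame with the cursor position from GetCursorPos. The radius, tick count and function name below are illustrative assumptions, not values from the report.

#include <windows.h>
#include <stdlib.h>

#define DWELL_RADIUS 10   // pixels of allowed jitter while dwelling
#define DWELL_TICKS  30   // frames to hold still (~1 s at 30 fps)

void dwell_click_update(POINT current)
{
    static POINT anchor = { -10000, -10000 };
    static int ticks = 0;

    if (abs(current.x - anchor.x) < DWELL_RADIUS &&
        abs(current.y - anchor.y) < DWELL_RADIUS) {
        if (++ticks == DWELL_TICKS) {
            // Cursor has dwelled long enough: click once
            mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0);
            mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0);
        }
    } else {
        anchor = current;   // cursor moved away: restart the dwell timer
        ticks = 0;
    }
}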
Application Interface:

The user interface is a Microsoft Foundation Class (MFC) dialog-based application built in VC++. A face tracking display is present at the center of the user interface to show the facial movements of the user. The following function buttons are present around the face tracking display:

Emergency button - raises an alarm when clicked (a handler sketch follows this list)

Video - plays a small video clip as entertainment. The code snippet for playing videos is given in Figure 3.6.

clock1 = MCIWndCreate(GetSafeHwnd(), AfxGetInstanceHandle(),
                      WS_CHILD | WS_VISIBLE, "globe.avi");

Figure 3.6: Code snippet for playing video clips

Audio - plays a small audio clip as entertainment

Message board - enables the user to display small messages to express their needs. A screenshot is provided in Figure 3.7.

Figure 3.7: Message board

Stop - stops the currently invoked function

Exit - used to exit the application
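As an illustration of how such buttons might be wired in the MFC dialog, the fragment below sketches handlers for the Emergency and Stop buttons. CHciDlg, IDC_EMERGENCY, IDC_STOP and "alarm.wav" are illustrative assumptions, not names from the report; PlaySound comes from <mmsystem.h> and requires linking winmm.lib.

BEGIN_MESSAGE_MAP(CHciDlg, CDialog)
    ON_BN_CLICKED(IDC_EMERGENCY, &CHciDlg::OnBnClickedEmergency)
    ON_BN_CLICKED(IDC_STOP, &CHciDlg::OnBnClickedStop)
END_MESSAGE_MAP()

void CHciDlg::OnBnClickedEmergency()
{
    // Loop the alarm asynchronously until the Stop button is pressed
    PlaySound(_T("alarm.wav"), NULL, SND_FILENAME | SND_ASYNC | SND_LOOP);
}

void CHciDlg::OnBnClickedStop()
{
    PlaySound(NULL, NULL, 0);   // a NULL sound name stops any playback
}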
3.4 IMPLEMENTATION

Software implementation involves compilation and execution of the designed system. Modular and subsystem programming code is produced during this stage. Unit testing and module testing are done in this stage by the developers. This stage is intermingled with the next, in that individual modules need testing before integration into the main project. Planning in the software life cycle involves setting goals, defining targets, establishing schedules, and estimating budgets for an entire software project.
Microsoft Visual C++
Microsoft Visual C++ 2005 provides a powerful and flexible development environment for creating Microsoft Windows-based and Microsoft .NET-based applications. It can be used as an integrated development system or as a set of individual tools. Visual C++ comprises these components:

The Visual C++ 2005 compiler tools - The compiler has new features supporting developers who target virtual machine platforms like the Common Language Runtime (CLR). There are now compilers targeting x64 and Itanium. The compiler continues to support targeting x86 machines directly, and optimizes performance for all these platforms.

The Visual C++ 2005 libraries - These include the industry-standard Active Template Library (ATL), the MFC libraries, and standard libraries such as the Standard C++ Library and the C Run-Time Library, which has been extended to provide security-enhanced alternatives to functions known to pose security issues. A new library, the C++ Support Library, is designed to simplify programs that target the CLR.

The Visual C++ 2005 development environment - Although the C++ compiler tools and libraries can be used from the command line, the development environment provides powerful support for project management and configuration (including better support for large projects), source code editing, source code browsing, and debugging tools. This environment also supports IntelliSense, which makes informed, context-sensitive suggestions as code is being authored.

In addition to conventional graphical user interface applications, Visual C++ enables developers to build web applications, smart-client Windows-based applications, and solutions for thin-client and smart-client mobile devices. C++ is the world's most popular systems-level language, and Visual C++ gives developers a world-class tool with which to build software.
Intel OpenCV Library
The Intel Open Source Computer Vision (OpenCV) library is a computer vision library originally developed by Intel. It is free for commercial and research use under a BSD license. The library is cross-platform, and runs on Windows, Mac OS X, Linux, PSP, VCRT (a real-time OS on smart cameras) and other embedded devices. It focuses mainly on real-time image processing; if it finds Intel's Integrated Performance Primitives on the system, it will use these commercially optimized routines to accelerate itself.

Officially launched in 1999, the OpenCV project was initially an Intel Research initiative to advance CPU-intensive applications, part of a series of projects including real-time ray tracing and 3D display walls. The library is mainly written in C, which makes it portable to specific platforms such as digital signal processors, but wrappers for languages such as C# and Python have been developed to encourage adoption by a wider audience. Our system makes use of some functions present in this library in the form of DLLs.
Microsoft DirectShow:

DirectShow, codenamed Quartz, is a multimedia framework and API produced by Microsoft for software developers to perform various operations with media files or streams. It is the replacement for Microsoft's earlier Video for Windows technology. Based on the Microsoft Windows Component Object Model (COM) framework, DirectShow provides a common interface for media across many programming languages, and is an extensible, filter-based framework that can render or record media files on demand at the request of the user or developer. The DirectShow development tools and documentation were originally distributed as part of the DirectX SDK; currently, they are distributed as part of the Windows SDK. DirectShow's counterparts on other platforms include Apple's QuickTime framework and various Linux multimedia frameworks such as GStreamer or Xine.
Working of the Algorithm:

The algorithm used in our system is the Multi-view Face Detection and Recognition algorithm using Haar-like features. This algorithm is designed for still images; it has been modified to detect faces from streaming video feeds. The working of the algorithm is as follows.
[Flow diagram: input image → rectangle scaling → pixel sum calculation → Haar-like feature calculation → node selection → Haar-like feature comparison against the Haar-like features in the database → face detection]

Figure 3.8: Algorithm flow diagram
The overall algorithm is depicted in Figure 3.8. The detection technique is based on the idea of a wavelet template that defines the shape of an object in terms of a subset of the wavelet coefficients of the image. The input image is scanned across location and scale using a scaling factor of 1.1. At each location, an independent decision is made regarding the presence of a face.
This leads to a large number of classifier evaluations. Each classifier is a simple function of rectangular sums followed by a threshold. In each round of boosting, one feature is selected: the one with the lowest weighted error. In subsequent rounds, incorrectly labeled examples are given a higher weight while correctly labeled examples are given a lower weight.

In order to reduce the false positive rate while preserving efficiency, classification is divided into a cascade of classifiers. The input is passed from one classifier to the next as long as each classifier classifies the window as a face. An input window is evaluated on the first classifier of the cascade; if that classifier returns false, then computation on that window ends and the detector returns false. If the classifier returns true, then the window is passed on to the next classifier in the cascade, which evaluates it in the same way. The more a window looks like a face, the more classifiers are evaluated on it and the longer it takes to classify the window.
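A toy sketch of this early-rejection structure is given below, under the simplifying assumption that each stage reduces to a score-versus-threshold test. The Stage struct and scoring stub are illustrative stand-ins for sums of Haar-like rectangle features, not OpenCV's internals.

typedef struct {
    double threshold;   // stage acceptance threshold learned by boosting
} Stage;

// Stand-in for a stage's weighted sum of rectangle-feature responses
static double stage_score(const Stage* s, const double* features, int i)
{
    (void)s;            // a real stage would use its learned features
    return features[i];
}

// A window is a face only if every stage accepts it; most windows
// are rejected by the first, cheapest stages and never reach the rest
static int window_is_face(const Stage* stages, int n_stages,
                          const double* features)
{
    for (int i = 0; i < n_stages; i++) {
        if (stage_score(&stages[i], features, i) < stages[i].threshold)
            return 0;   // rejected: later, costlier stages never run
    }
    return 1;           // accepted by the entire cascade
}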
3.5 TESTING

Testing is the process of evaluating the correctness, quality and completeness of the developed system. Our system was tested with a variety of users, and it was found to detect faces successfully in all cases. The application is also able to pick out faces from considerably large distances. The user requires some training in order to move the mouse efficiently. Face detection is found to be efficient even with an ordinary web camera and under ordinary lighting conditions. However, care should be taken to align the web camera with the facial region of the user for optimum face detection.
4. APPLICATIONS AND FUTURE ENHANCEMENTS

Our system is mainly targeted towards physically disabled people who are quadriplegic and non-verbal, and towards bedridden patients. But this human computer interface has other applications as well. It can be used as an alternative to the traditional mouse and keyboard, to control the entire computer, browse the internet, prepare documents, and so on. As the system is relatively inexpensive, it can be installed in hospitals as a communication system for patients.

The system may also be used as a hands-free navigation device to access a computer, which facilitates multitasking. For example, a doctor performing surgery can make use of this system to issue commands to a computer.

The system can be enhanced with high-resolution cameras, such as infrared cameras, to improve face detection. It can be interfaced with external mobile devices to enhance communication. The system can also be extended for use in biometric security systems.
5. CONCLUSION

The objective of this project is to provide an automated system which captures the facial movements of the target user and correlates them with mouse pointer movements on the screen. The developed interface enables quadriplegic and non-verbal users to access a computer.

A system has been developed for use by disabled people and bedridden patients. A webcam interface captures the facial movements of the user. A face detection algorithm is implemented and integrated with mouse movements on the screen. The system has been integrated with four functions to aid physically challenged people: an emergency button is provided for raising an alarm; clicking on the audio button plays audio files for entertainment; the video button is used to play videos for entertainment; and an onscreen message board has been provided for communication purposes, helping users to display short messages to express their needs. The future focus is on enabling the system to incorporate certain hardware-based interfaces, such as moving a robot.
Appendix A – Screenshots
MAIN INTERFACE
FACE TRACKING 1
FACE TRACKING 2
FACE TRACKING 3
FACE TRACKING 4
FACE TRACKING FOR A BED RIDDEN USER
PLAYING VIDEO
MESSAGE BOARD
REFERENCES
1. Gary R. Bradski (1998), "Computer Vision Face Tracking for Use in a Perceptual User Interface", Intel Technology Journal, Q2 '98, Microcomputer Research Lab, Santa Clara, CA, Intel Corporation.

2. James Gips, Margrit Betke and Peter Fleming (2000), "The Camera Mouse: Preliminary Investigation of Automated Visual Tracking for Computer Access", Computer Science Department, Boston College, Chestnut Hill, MA 02467.

3. James Gips, Margrit Betke and Peter Fleming (2002), "The Camera Mouse: Visual Tracking of Body Features to Provide Computer Access for People with Severe Disabilities", IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 10, No. 1.

4. Rajesh Kumar and Anupam Kumar (2008), "Black Pearl: An Alternative for Mouse and Keyboard", ICGST-GVIP, ISSN 1687-398X, Volume 8, Issue III.

5. Sanjay Kr. Singh, D. S. Chauhan, Mayank Vatsa and Richa Singh (2003), "A Robust Skin Color Based Face Detection Algorithm", Tamkang Journal of Science and Engineering, Vol. 6, No. 4, pp. 227-234.

6. Zhaomin Zhu, Takashi Morimoto, Hidekazu Adachi, Osamu Kiriyama, Tetsushi Koide and Hans Juergen Mattausch (2005), "Multi-View Face Detection and Recognition using Haar-like Features", Research Center for Nanodevices and Systems, Hiroshima University.