
Vision-Based Detection of Driver Fatigue

Sarbjit Singh - sasingh@cs.umn.edu
Nikolaos P. Papanikolopoulos - npapas@cs.umn.edu
Artificial Intelligence, Robotics, and Vision Laboratory
Department of Computer Science,
University of Minnesota, Minneapolis, MN 55455

ABSTRACT
INTRODUCTION
OVERVIEW OF THE SYSTEM
THE EYE TRACKER
THE FATIGUE DETECTOR
RESULTS AND FUTURE WORK
CONCLUSIONS


ABSTRACT

    This page describes advances towards a non-intrusive approach for real-time detection of driver fatigue. The system uses a color video camera that points directly at the driver's face and monitors the driver's eyes in order to detect micro-sleeps (short periods of sleep of about 3-4 seconds). The system uses skin-color information to search for the face in the input image. Keeping only those pixels with skin-like color, we perform a blob operation to determine the exact position of the face. We reduce the search space by analyzing the horizontal gradient map of the face, using the knowledge that eye regions in the face show a large change in the horizontal intensity gradient. To find the exact location of the pupil, we use gray scale model matching. Using this pattern recognition technique, we track the eyes through the video frame sequence until the tracking module detects an error. We also use the same pattern recognition technique to determine whether the eye is open or closed. If the eyes remain closed for an abnormally long time (3-4 seconds), the system concludes that the person is falling asleep and issues a warning signal.

The system uses a Pentium Pro 200 MHz personal computer with a Matrox Genesis imaging board which holds a Texas Instruments TMS320C80 DSP chip. The system's performance is 15 frames per second for tracking and 10 frames per second for fatigue detection.

   

INTRODUCTION

A large number of automobile accidents are caused by driver fatigue. The U.S. National Highway Traffic Safety Administration has indicated that driver fatigue takes the blame for as many as 240,000 motor vehicle accidents a year in the US. Sleep-related accidents cost the American government and businesses an estimated $46 billion a year.
Sleep deprivation and sleep disorders are becoming a more common problem for car drivers in a society in which people seem not to have enough time to perform all the activities they need to carry out on a daily basis. Reducing the number of accidents related to driver fatigue would save society a significant amount of money and personal suffering. By monitoring the driver's symptoms, we can detect driver fatigue early enough to take preventive action and avoid an accident due to lack of awareness.
There are many indicators of oncoming fatigue, some of which are possible to detect with the use of a camera. Two well-known symptoms that we consider feasible to detect are micro-sleeps (short periods, 2-3 seconds, in which the driver loses consciousness) and the forward bouncing movement of the driver's head.
At the moment, we are working on the extraction and tracking of facial features that can be used to determine micro-sleep symptoms.
The input to the system is a continuous sequence of images fed from a video camera. From this sequence, the system can analyze the eyes in each image, as well as compare the eyes between frames.
As in our previous version of the system [1], we have used computer vision techniques to extract and track eye locations throughout the entire video sequence. However, the techniques described in this paper represent significant steps forward in the accuracy and robustness of driver fatigue detection.
Localizing the eyes in the first frame is the most computationally expensive phase of the tracking system. In this phase, the system has no previous information about the location of the eyes in the image. The system has to find the area of the image that will be used in subsequent frames in order to track the eyes. During the tracking phase, the search space is reduced because the system has approximate knowledge of the eyes' position from the previous frame. This tracking can be done at a relatively low computational cost. In order to detect failure in tracking, general constraints such as the distance between the eyes and the horizontal alignment of the two eyes are used. To make sure that the correct feature is being tracked, the eyes should be relocated periodically, even if no failure has been detected. By determining whether the eyes are open or closed during eye tracking, we can detect micro-sleep symptoms that help us determine driver fatigue.
The next section describes some related work in the area of facial feature extraction and tracking and in the detection of driver fatigue. The eye tracking and driver fatigue detection steps are described in subsequent sections. Finally, we present our experimental results.


RELATED WORK

    There has been much study of motor vehicle driver fatigue [3]. There are many commercial systems on the market that detect and signal driver unawareness. Most of these systems monitor the driver's corrective movements of the steering wheel. When these movements are not normal, a warning signal is issued.
    Facial feature extraction is an active research area in computer vision. Yoo [4] proposed an approach that takes into account facial symmetry in order to find the placement of the face in an image. Sophisticated methods such as eigenspace matching [5], contour tracking using snakes [6], deformable templates [7], and ellipse fitting techniques are computationally very expensive. Skin-color-based detection [8,9] is a widely used technique to extract the face region from background images. Eye tracking is also being widely studied due to its potential applications in multimodal user interfaces [10,11].
    This paper describes advances made to the vision-based system for detection of driver fatigue described in [1,2]. The essentials of micro-sleep detection survive in this new version of the system; however, a significantly different technique has been used in order to increase the accuracy and robustness of the system.



OVERVIEW OF THE SYSTEM

Two well-differentiated functional phases have been defined in the system: eye tracking and fatigue detection. The eye tracker receives the first frame as an input from the camera. At this point it is assumed that there is no previous knowledge about the locations of the eyes. We use the initial frame to localize the eyes within the entire input image. Eye localization may fail because of unfavorable illumination conditions or head orientation in the initial image. In this case, we have to grab a new initial frame and apply the localization algorithm again in order to find the positions of the eyes. This process is repeated until we have an acceptable certainty about the positions of the eyes. After the portion of the image containing the driver's eyes is estimated, the system enters its tracking mode. In this mode the search space for the eyes in subsequent input frames is reduced to the small area surrounding the eye regions from the previous input frame. This is based on the assumption that the driver's head exhibits very small displacements in the time required to grab and process consecutive input frames. During tracking, error detection is performed in order to recover from possible tracking failure. If an error occurs, the system leaves tracking mode and recalculates the locations of the eyes from the active image frame.
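The alternation between localization and tracking described above can be sketched as a small control loop. This is only an illustration under stated assumptions: the callbacks `localize`, `track` and `is_valid` are hypothetical placeholders for the image-processing steps described in later sections, and the demo stubs work on frame numbers rather than images.

```python
def eye_positions(frames, localize, track, is_valid):
    """Yield the eye positions found in each frame (None if not found)."""
    eyes = None
    for frame in frames:
        if eyes is not None:
            # Tracking mode: search only near the previous positions.
            eyes = track(frame, eyes)
            if eyes is not None and not is_valid(eyes):
                eyes = None  # tracking error: leave tracking mode
        if eyes is None:
            # Localization mode: search the whole of the current frame.
            eyes = localize(frame)
        yield eyes


# Stub callbacks so the sketch can be run; real ones would process images.
def demo_localize(frame):
    return None if frame == 0 else ("localized", frame)


def demo_track(frame, previous):
    return None if frame == 3 else ("tracked", frame)


demo = list(eye_positions(range(5), demo_localize, demo_track, lambda eyes: True))
```

In the demo, localization fails on frame 0 and tracking fails on frame 3, so the loop falls back to localization on that same frame before resuming tracking.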
The fatigue detection phase is closely related to the tracking phase. The information obtained from processing each consecutive input frame in the eye-tracking phase is fed directly to the fatigue detection phase if no tracking error has occurred. This phase is responsible for extracting the information from the image of the eye that will reveal any signs of micro-sleeps. At each frame, when the eyes are localized, the system determines whether the eyes are open or not. Thus, we are able to tell when the eyes have been closed for too long. In practice, we count the number of consecutive frames during which the eyes are closed. If this number gets too large, we issue a warning signal.
The fatigue detector system is to be mounted on top of the dashboard inside a vehicle, pointing towards the driver's face. The system does not have the capability to control tilt or zoom factors in order to adjust for the driver's head movements; however, this capability would be advisable in order to get better resolution inside the region of interest. For experimentation, we are using a JVC color video camera and a Pentium Pro 200 MHz personal computer with a Matrox Genesis imaging board that uses the Texas Instruments TMS320C80 DSP processor.

THE EYE TRACKER

As described in [1], an eye tracking system can be divided into three functional units: localization of the eyes, tracking of the eyes, and estimation of the tracking error.

Localization of the Eyes
In order to reduce the search space and to avoid any kind of distracting features, the first step we followed was to extract the driver's face from the input image.
In a previous version of our system [1], we had used a symmetry-based approach similar to [4], in which it is assumed that there exists a horizontal symmetry on human faces and that we could reduce the search space for the eyes by focusing our attention on a stripe of the image around this symmetry line. This approach, although very fast, proved to be quite fragile because the system expected the driver's face to be in a fully frontal view every time the system entered the face detection step. Moreover, the technique gave information about the possible left and right margins of the face but was incapable of providing any prediction of the upper and lower margins of the face.
We have adopted an approach that uses skin color information in order to predict and track the position of the driver's face. Skin color models have been shown to be very successful at segmenting human faces in noisy images. We have used the approach suggested in [5]. The main idea behind most skin color models rests on the notion that skin pixels do not vary as much in color as they do in brightness. This is true even for individuals of different races. In other words, given an RGB representation of an image, a pixel P1 with value [r1,g1,b1] in RGB space, and a pixel P2 with value [r2,g2,b2], we say that they have similar color but possibly different brightness if the following expression (described in [5]) holds:

\[ \frac{r_1}{r_2} = \frac{g_1}{g_2} = \frac{b_1}{b_2} \]

As brightness is not important for representing human skin under normal lighting conditions, we can eliminate it by transforming from RGB space (\(\mathbb{R}^3\)) to the chromatic color space (r,g) (\(\mathbb{R}^2\)) by simple normalization:

\[ r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B} \]

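Skin color pixels are clustered in chromatic color space and they can be represented using a Gaussian distribution. For this purpose, we used as a sample space a representative portion of the faces of different people under different lighting conditions. We represented the skin color distribution using the Gaussian model \(N(m, \Sigma^2)\), where \(m = (\bar{r}, \bar{g})\), with

\[ \bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i, \qquad \bar{g} = \frac{1}{N}\sum_{i=1}^{N} g_i \]

and

\[ \Sigma = \begin{bmatrix} \sigma_{rr} & \sigma_{rg} \\ \sigma_{gr} & \sigma_{gg} \end{bmatrix} \]

Here N is the number of sample pixels.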
The procedure for creating a skin color model is fully explained in [5,6].
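As a rough illustration of the skin color model, the sketch below normalizes RGB pixels to chromatic coordinates, fits a two-dimensional Gaussian to a handful of samples, and scores a pixel by its likelihood under that Gaussian. The function names, the sample pixel values and the unnormalized likelihood are assumptions made for the example; they are not taken from the system described here.

```python
import math


def to_chromatic(rgb):
    """Map an (R, G, B) pixel to chromatic coordinates (r, g)."""
    R, G, B = rgb
    total = R + G + B
    if total == 0:  # pure black carries no chromatic information
        return (1.0 / 3.0, 1.0 / 3.0)
    return (R / total, G / total)


def fit_gaussian(samples):
    """Estimate the mean and 2x2 covariance of a list of (r, g) samples."""
    n = len(samples)
    mr = sum(r for r, _ in samples) / n
    mg = sum(g for _, g in samples) / n
    srr = sum((r - mr) ** 2 for r, _ in samples) / n
    sgg = sum((g - mg) ** 2 for _, g in samples) / n
    srg = sum((r - mr) * (g - mg) for r, g in samples) / n
    return (mr, mg), ((srr, srg), (srg, sgg))


def skin_likelihood(rg, mean, cov):
    """Unnormalized Gaussian likelihood exp(-0.5 * d^T cov^-1 d)."""
    dr, dg = rg[0] - mean[0], rg[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    m2 = (d * dr * dr - 2 * b * dr * dg + a * dg * dg) / det
    return math.exp(-0.5 * m2)


# Made-up skin-tone samples, for illustration only.
sample_pixels = [(200, 140, 120), (180, 120, 100), (220, 160, 135),
                 (150, 100, 90), (190, 135, 110)]
mean, cov = fit_gaussian([to_chromatic(p) for p in sample_pixels])
```

A pixel and a darker copy of it with the same RGB proportions map to the same chromatic point, which is the brightness invariance the model relies on. A real model would be trained on many more samples than the five used here.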
Once the skin model was created, we used it in our system to filter the incoming video frames, passing only those pixels that had a high likelihood of being face pixels. This allowed us to quickly detect the region of the image where the face is located. In order to reduce the computational cost, we resample the 640x480 input image into a 160x120 frame. Figure 2.b shows the result of applying this method to the incoming video frame in Figure 2.a. As one may see, only those pixels that resemble skin color have been admitted. In order to determine the exact face region in the image, we use a simple blob operation (we choose the largest connected region as the face region). Figure 2.c shows the binarization of the skin pixels and Figure 2.d shows the result of the entire face detection process after the blob operation is executed.
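The blob step can be sketched as a search for the largest connected region in the binarized skin mask. The version below is a minimal illustration that assumes 4-connectivity and a mask stored as a list of rows; the function name and the choice to return a bounding box are assumptions for the example.

```python
from collections import deque


def largest_blob(mask):
    """Return the bounding box (top, left, bottom, right) of the largest
    4-connected region of truthy cells in a 2D list, or None if empty."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best_size, best_box = 0, None
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill of one connected region.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            size, top, left, bottom, right = 0, r0, c0, r0, c0
            while queue:
                r, c = queue.popleft()
                size += 1
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if size > best_size:
                best_size, best_box = size, (top, left, bottom, right)
    return best_box


demo_mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]
face_box = largest_blob(demo_mask)
```

In the demo mask the isolated pixel in the top-left corner is ignored and the six-pixel region on the right is returned as the face candidate.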

Figure 2.a. Incoming video frame

 

Figure 2.b. Skin color separation

 

Figure 2.c. Binarization of skin color image.

Figure 2.d. Estimated Face position after blob

 


To determine the vertical position of the eyes on the face, we use the observation that eye regions correspond to regions of high intensity gradients, as suggested in [13]. This method was implemented in [1,2] and it is preserved intact in this version of our system. We create a vertical gradient map, G(x,y), from the estimated region of the face. Any edge-detection method can be used. We chose to use a Sobel vertical edge kernel convolution. We then do a horizontal projection over G(x,y) by summing the gray level of pixels along the horizontal rows:

\[ H(y) = \sum_{x} G(x,y) \]

where H(y) is the horizontal projection histogram value at pixel row y over the gradient image. Since both eyes are likely to be positioned at the same row, H(y) will have a strong peak on that row. However, in order to reduce the risk of error, we consider the best three peaks in H(y) for further search rather than the absolute maximum. Figure 3 shows the result of this process.
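The projection and peak selection can be sketched as follows, assuming the gradient map is already available as a list of pixel rows. The function names and the simple local-maximum rule are assumptions made for the example; the original does not specify how peaks are identified.

```python
def horizontal_projection(G):
    """H(y): sum of the gradient gray levels along each pixel row y."""
    return [sum(row) for row in G]


def best_peaks(H, count=3):
    """Rows that are local maxima of H, strongest first, at most `count`."""
    peaks = []
    for y in range(len(H)):
        above = H[y - 1] if y > 0 else float("-inf")
        below = H[y + 1] if y + 1 < len(H) else float("-inf")
        if H[y] > above and H[y] >= below:
            peaks.append(y)
    peaks.sort(key=lambda y: H[y], reverse=True)
    return peaks[:count]


# Tiny made-up gradient map: rows 1, 3 and 5 carry the strongest edges.
demo_G = [[0, 1, 0], [5, 9, 5], [1, 1, 1], [4, 4, 4],
          [0, 0, 0], [2, 3, 2], [1, 0, 0]]
demo_H = horizontal_projection(demo_G)
candidate_rows = best_peaks(demo_H)
```

Keeping three candidate rows rather than only the maximum leaves room for a later stage, such as model matching, to reject rows that correspond to eyebrows or the mouth.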

 

 

 



Figure 3. Vertical edge map, edge projection, and estimated vertical position of the eyes


In order to further separate the two eyes within the predicted vertical position, we define a symmetry point where this separation will take place. A symmetry value is computed for every pixel-column in the reduced image. If the image is represented as \(I(x,y)\), then the symmetry value for a pixel-column x is given by:

\[ S(x) = \sum_{w=1}^{k} \sum_{y=1}^{ysize} \left| I(x-w,\,y) - I(x+w,\,y) \right| \]

S(x) is computed for \(x \in [k,\, xsize-k]\), where k is the maximum distance from the pixel-column at which symmetry is measured, xsize is the width of the image, and ysize is its height. The x corresponding to the lowest value surrounded by two peaks in the symmetry plot of S(x) is the center of the face.
Once we have calculated the symmetry of the reduced image, we can consider the area to the left of the symmetry line as the region containing the left eye and, similarly, the region to the right of the symmetry line as the region containing the right eye. The result of this process is shown in Figure 4. Note that even though the symmetry calculation technique is exactly the same as the one used in [1] to estimate the position and limits of the face, in this system symmetry is only used to separate the left eye from the right eye.
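A minimal sketch of the symmetry computation is given below. It assumes the symmetry value of a column is the sum of absolute differences between pixels mirrored about that column, up to a distance k, and it simplifies the selection rule to the plain minimum of S(x). The function name and the toy image are made up for the example.

```python
def symmetry_values(I, k):
    """Map each column x to S(x); lower values mean stronger symmetry.

    I is a list of pixel rows. With zero-based indexing, S(x) is defined
    for x from k to xsize - k - 1 so that x - w and x + w stay in range.
    """
    ysize, xsize = len(I), len(I[0])
    S = {}
    for x in range(k, xsize - k):
        S[x] = sum(abs(I[y][x - w] - I[y][x + w])
                   for w in range(1, k + 1)
                   for y in range(ysize))
    return S


# Toy two-row image that is mirror-symmetric about column 3.
demo_I = [[1, 2, 3, 9, 3, 2, 1],
          [1, 2, 3, 9, 3, 2, 1]]
demo_S = symmetry_values(demo_I, k=2)
center_column = min(demo_S, key=demo_S.get)
```

On real images the global minimum can fall on a flat background region, which is presumably why the text asks for the lowest value surrounded by two peaks rather than the bare minimum.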

 


Figure 4. Symmetry used to separate individual eye regions.


After estimating the position of the eyes in the image and using the original 640x480 input frame, we use gray scale correlation over the eye region in order to find the exact position of the iris. Gray scale correlation is a pattern matching technique that allows us to search for a pattern in an image. Models were created and stored in a database using rectangular areas of well-known sample eye images from different people and at different face angles (see Figures 5 and 6). A search is performed by assigning a match score to each pixel in the target image, based on how closely the model matches the region around that pixel.
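The match score r is computed as

\[ r = \frac{N\sum IM - \left(\sum I\right)\left(\sum M\right)}{\sqrt{\left[N\sum I^2 - \left(\sum I\right)^2\right]\left[N\sum M^2 - \left(\sum M\right)^2\right]}} \]

where N is the number of pixels in the model, M is the model, and I is the image against which the pattern is being compared. The sums run over the N pixels of the model and the corresponding image region.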
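To make the match score concrete, the sketch below assumes it is the standard normalized gray-scale correlation coefficient between the model and an equally sized image patch, both flattened to lists of gray levels. The function name and the zero score for flat patches are choices made for the example.

```python
import math


def match_score(model, patch):
    """Normalized gray-scale correlation of two equal-length pixel lists."""
    n = len(model)
    sum_i, sum_m = sum(patch), sum(model)
    sum_im = sum(i * m for i, m in zip(patch, model))
    sum_ii = sum(i * i for i in patch)
    sum_mm = sum(m * m for m in model)
    denom = math.sqrt((n * sum_ii - sum_i ** 2) * (n * sum_mm - sum_m ** 2))
    if denom == 0:  # a perfectly flat patch or model has no pattern to match
        return 0.0
    return (n * sum_im - sum_i * sum_m) / denom


demo_model = [10, 20, 30, 40]
brighter_patch = [2 * m + 5 for m in demo_model]   # same pattern, new lighting
inverted_patch = list(reversed(demo_model))        # opposite pattern
```

Because the score is normalized, a patch that differs from the model only by a linear change in brightness and contrast still scores 1, while an inverted pattern scores -1. This tolerance to lighting changes is one reason normalized correlation suits a camera pointed at a driver.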

 

Figure 5. Sample open eye model

Figure 6. Closed eye model

 


We defined an acceptance level for the match score, above which a match is considered to be true. In other words, if the match score of a pixel is above the acceptance level, we consider the iris of the eye to be centered at that pixel; if the match score is below the acceptance level, we conclude that the current pattern has no correspondence, and we use another model retrieved from the database. On the other hand, if a match is found, a new model is grabbed from the live image and used in future pattern matching. We do this in order to avoid searching the database in each eye localization phase and to make the tracking process as smooth and accurate as possible.


Tracking of the Eyes
Tracking a feature in a sequence of images involves looking for that feature in a small neighborhood centered at the location of that feature in the previous frame. The main assumption is that the feature being tracked generally has small displacements between frames. In our case, we do not need to go through the entire process of localizing the eye in the following frames. If we assume that the eyes will not move very far between two consecutive frames, we can predict that the next eye match will happen almost certainly in an area surrounding the current iris position.
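The reduced search area can be sketched as a square window around the previous iris position, clamped to the frame borders. The window half-size of 20 pixels used in the demo is a hypothetical value, not one reported for this system.

```python
def search_window(center, half_size, frame_size):
    """Return (left, top, right, bottom) of a square search window around
    the previous iris position, clamped to the frame borders."""
    (x, y), (width, height) = center, frame_size
    left, top = max(0, x - half_size), max(0, y - half_size)
    right = min(width - 1, x + half_size)
    bottom = min(height - 1, y + half_size)
    return left, top, right, bottom


window = search_window((320, 240), 20, (640, 480))
edge_window = search_window((5, 470), 20, (640, 480))
```

The window size trades speed against robustness: a smaller window is cheaper to search but loses the eye sooner when the head moves quickly between frames.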

 

Figure 7. Tracking eyes


Finding Tracking errors
In order to recover from tracking errors, we make sure that the distance between the eyes remains reasonably constant. We also restrict the eyes to be horizontally close to each other. If any of the geometrical constraints is violated, we relocalize the eyes in the next frame.
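The two geometric checks can be sketched as below. The tolerances are hypothetical, and the second constraint is interpreted here as the eyes lying on nearly the same image row, which is an assumption based on the horizontal alignment mentioned in the introduction.

```python
import math


def tracking_ok(left_eye, right_eye, ref_distance,
                distance_tol=0.25, max_row_offset=15):
    """Check the geometric constraints on a pair of (x, y) eye positions.

    ref_distance is the eye separation measured at localization time.
    distance_tol and max_row_offset are illustrative thresholds only.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)
    distance_stable = abs(distance - ref_distance) <= distance_tol * ref_distance
    rows_aligned = abs(dy) <= max_row_offset
    return distance_stable and rows_aligned
```

For example, with a reference separation of 100 pixels, eyes at (300, 240) and (400, 244) pass both checks, whereas a pair whose rows differ by 60 pixels or whose separation has doubled would trigger relocalization.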


THE FATIGUE DETECTOR

As stated earlier in this paper, this system (like [1,2]) detects micro-sleep symptoms in order to diagnose driver fatigue. As the driver's fatigue increases, his/her eye blinks tend to last longer. We can determine the blink duration by counting the number of consecutive frames in which the eyelids remain closed. Our main problem was to differentiate between an open eye, a closed eye, and a total absence of eyes in the given image. As the eye closes, we can be certain that the pattern-matching algorithm will fail to keep track of the open eye. However, this failure should not always be interpreted as a tracking error, since a closed eye may well have caused it. Taking this into consideration, we have used a second model to keep track of the eye blinking. The model we have used is that of a closed eye, as shown in Figure 6, and the technique used is exactly the same as the one used to find an open eye. If the tracking of the open eye fails, we try the model of a closed eye. If neither model produces an acceptable match, we declare the tracking void and go back to the eye localization step. Figure 8 shows how the system signals a "fatigue alert" when the eyes have been closed for many consecutive input frames (3-4 seconds).
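The decision logic of the fatigue detector can be sketched as a per-frame classification followed by a counter of consecutive closed-eye frames. The acceptance level of 0.7, the frame rate and the 3-second limit in the demo are illustrative assumptions, as are the names used.

```python
def classify_eye(open_score, closed_score, acceptance=0.7):
    """Try the open-eye model first, then the closed-eye model."""
    if open_score >= acceptance:
        return "open"
    if closed_score >= acceptance:
        return "closed"
    return "lost"  # neither model matched: the eyes must be relocalized


class FatigueDetector:
    """Raise an alert after the eyes stay closed for too many frames."""

    def __init__(self, fps=10.0, max_closed_seconds=3.0):
        self.limit = int(fps * max_closed_seconds)
        self.closed_frames = 0

    def update(self, eye_state):
        """Feed one frame's state; return True while the alert is active."""
        if eye_state == "closed":
            self.closed_frames += 1
        else:
            self.closed_frames = 0  # an open or lost eye resets the count
        return self.closed_frames >= self.limit


detector = FatigueDetector(fps=10.0, max_closed_seconds=3.0)
alerts = [detector.update("closed") for _ in range(30)]
```

With these demo values the alert first fires on the 30th consecutive closed frame. Resetting the counter when tracking is lost is a design choice for the sketch; a deployed system might instead hold the count until the eyes are found again.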


Figure 8. Fatigue signal activated



RESULTS AND FUTURE WORK

We have tested the system with drivers of different skin colors, with facial hair, and of different genders. We have monitored the system's response with different degrees of head rotation and inclination. The system is able to complete the eye localization at 10 frames per second and tracking at 15 frames per second. For small head movements, the system rarely loses track of the eyes. The system tolerates head rotation of up to 45 degrees and tilt of up to 30 degrees. Under these circumstances, the system was able to detect prolonged eye blinks 95% of the time, and it produced occasional false alarms.
We are planning to add the capability of auto-zoom on the eyes, once they are localized. This would avoid the trade-off between having a wide field of view in order to locate the eyes and a narrow field of view in order to detect fatigue.
In order to extend the symptoms detected by the fatigue detector, we are currently working on the extraction of other facial features such as the nostrils. The positions of the nostrils, when combined with the positions of the eyes, can give more information on head rotation and tilt. This information would not only help to tolerate greater head rotation in the current eye tracking and micro-sleep detection system, but would also help to detect bouncing movements of the head, another common symptom of driver fatigue.


CONCLUSIONS

We have presented a non-intrusive real-time eye tracking system. The system is able to localize and track the pupils of a driver as soon as he/she sits down in the video camera's field of view. The system uses a skin-color-based approach to locate the face and gray-scale-based eye feature extraction. During tracking, the system is able to automatically detect any error that might have occurred. In case of a tracking error, the system is able to recover and resume proper tracking. The system is able to automatically diagnose fatigue by monitoring the eyes for micro-sleeps. This is achieved by counting the number of consecutive frames in which the eyes are found to be closed.



ACKNOWLEDGMENTS

We would like to thank Steve Hay and Ron Cassellius for their support. We would also like to thank Osama Masoud for his continuous collaboration in the project.


References

[1] Eriksson, M. and Papanikolopoulos, N. (1998). "A vision based system for the detection of driver fatigue," in Proceedings of the ITS America Eighth Annual Meeting, 1998.
[2] Eriksson, M. and Papanikolopoulos, N. (1997). "Eye-tracking for detection of driver fatigue," in Proceedings of the IEEE International Conference on Intelligent Transportation Systems, 1997.
[3] US DOT, FHWA (1996). "Commercial motor vehicle driver fatigue and alertness study," FHWA report number FHWA-MC-97-001, TC report number TP 12876E, November.
[4] Yoo, T.W. and Oh, I.S. (1996). "Extraction of face region and features based on chromatic properties of human faces," Pacific Rim International Conference on Artificial Intelligence, pp. 637-645.
[5] Pentland, A., Moghaddam, B. and Starner, T. (1994). "View-based and modular eigenspaces for face recognition," CVPR'94, pp. 84-91.
[6] Sobottka, K. and Pitas, I. (1996). "Segmentation and tracking of faces in color images," Proceedings of the Second International Conference on Signals, Sys. and Comp., pp. 236-241.
[7] Yuille, A.L., Cohen, D.S. and Hallinan, P.W. (1989). "Feature extraction from faces using deformable templates," in Proceedings of CVPR, pp. 104-109.
[8] Yang, J., Lu, W. and Waibel, A. "Skin-color modeling and adaptation," Proceedings of ACCV'98, vol. II, pp. 687-694 (Hong Kong).
[9] Yang, J., Lu, W. and Waibel, A. "A real-time face tracker," Proceedings of WACV'96 (Sarasota, Florida, USA).
[10] Pomerleau, D. and Baluja, S. (1993). "Non-intrusive gaze tracking using artificial neural networks," AAAI Fall Symposium on Machine Learning in Computer Vision, Raleigh, NC.
[11] Brunelli, R. and Poggio, T. (1993). "Face recognition: features versus templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, pp. 1042-1052.
[12] Chow, G. and Li, X. (1993). "Towards a system for automatic feature detection," Pattern Recognition, Vol. 26, No. 12, pp. 1739-1755.
[13] Stringa, L. (1993). "Eyes detection for face recognition," Applied Artificial Intelligence, No. 7, pp. 365-382.
[14] Cox, I.J., Ghosn, J. and Yianilos, P.N. (1995). "Feature-based recognition using mixture-distance," NEC Research Institute, Technical Report 95-09.
[15] Craw, I., Ellis, H. and Lishman, J.R. (1987). "Automatic extraction of face-features," Pattern Recognition Letters, 5, pp. 183-187.



Copyright © UNIVERSITY OF MINNESOTA 1999 • MINNEAPOLIS, MN 55419 •
E-MAIL: sasingh@cs.umn.edu