HAMA Blog

OpenCV Face Detector (with Haar Features)

[HAMA] Lee Seung-hyun (이승현) (wowlsh93@gmail.com) 2016. 3. 7. 11:37


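Goal

In this session,

  • We will look at the basics of face detection using Haar feature-based cascade classifiers.
  • We will also look at how to extend this to other kinds of detection, such as eye detection.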

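Basics

Object detection using Haar feature-based cascade classifiers is a very efficient object detection method proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features". It is a machine learning based approach in which a cascade function is trained from a large number of positive and negative images.

Here we will work with face detection. First, the algorithm needs a large number of positive images (faces) and negative images (images without faces) to train the classifier. Then we need to extract features from them. Haar features are used for this. They are much like convolutional kernels. Each feature is a single value, obtained by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle.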


[Figure: haar_features.jpg]
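As a rough illustration of how a single Haar feature value is obtained, here is a minimal pure-Python sketch of a two-rectangle feature. The function names, the choice of which half is black, and the toy 4x4 patch are assumptions made for this example, not part of OpenCV.

```python
def rect_sum(img, x, y, w, h):
    """Sum of pixel values in the w-by-h rectangle whose top-left corner is (x, y)."""
    return sum(sum(row[x:x + w]) for row in img[y:y + h])


def two_rect_feature(img, x, y, w, h):
    """Two-rectangle feature: black top half stacked on a white bottom half.

    Returns sum(black) - sum(white). Which half is black is an arbitrary
    choice made for this sketch.
    """
    half = h // 2
    black = rect_sum(img, x, y, w, half)
    white = rect_sum(img, x, y + half, w, half)
    return black - white


# Toy 4x4 grayscale patch: dark rows on top, bright rows below.
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
flat = [[50] * 4 for _ in range(4)]

print(two_rect_feature(patch, 0, 0, 4, 4))  # -1520: strong response on a dark/bright edge
print(two_rect_feature(flat, 0, 0, 4, 4))   # 0: no response on a uniform patch
```

The magnitude of the value tells how strongly the patch matches the dark-over-bright pattern; a uniform patch gives zero.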

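All possible sizes and positions of each kernel are used to compute a large number of features. (Imagine how much computation that takes: even a 24x24 window yields over 160,000 features.) For each feature we need the sum of the pixels under the white and black rectangles. To solve this, the authors introduced the integral image, which reduces the sum over a rectangle of any size to an operation on just four values.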
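Both claims in this paragraph can be checked with a short sketch. The first function counts rectangle features in a 24x24 window, assuming five base shapes (2x1, 1x2, 3x1, 1x3, 2x2) scaled by integer factors; that shape set is an assumption of this example. The other two build an integral image and read a rectangle sum from four lookups. All names are hypothetical.

```python
def count_features(window=24, shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every position and integer scaling of each base shape in the window."""
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                total += (window - w + 1) * (window - h + 1)
    return total


def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (extra zero row and column)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Rectangle sum from four lookups, regardless of the rectangle size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(count_features())         # 162336 under the five-shape assumption
print(box_sum(ii, 1, 1, 2, 2))  # 28, i.e. 5 + 6 + 8 + 9
```

The integral image is computed once per image, after which every feature costs only a handful of additions and subtractions.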

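Most of the features computed this way are irrelevant. For example, consider the image below. The top row shows two good features. The first captures the property that the eye region is usually darker than the nose and cheeks. The second captures the property that the eyes are darker than the bridge of the nose. Applying the same windows to the cheeks or anywhere else is pointless. So how do we select the best features out of 160,000+? That is handled by AdaBoost.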

[Figure: haar.png]

For this, we apply each and every feature to all the training images. For each feature, the algorithm finds the best threshold for classifying the images as face or non-face. Obviously there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that best separate face and non-face images. (The process is not quite this simple. Each image is given an equal weight at the beginning. After each classification, the weights of misclassified images are increased. Then the same process is repeated: new error rates and new weights are calculated. This continues until the required accuracy or error rate is reached, or the required number of features has been found.)
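One boosting round as described above might look like the following sketch. It is a simplified, pure-Python illustration on made-up data, not the OpenCV trainer. It assumes the Viola-Jones style update, which shrinks the weights of correctly classified images; after normalization this is equivalent to raising the weights of the misclassified ones. All function names and the toy values are hypothetical.

```python
import math


def stump_predict(value, threshold, polarity):
    """Weak classifier: 1 (face) if polarity * value < polarity * threshold, else 0."""
    return 1 if polarity * value < polarity * threshold else 0


def best_stump(values, labels, weights):
    """Lowest weighted-error (error, threshold, polarity) for one feature."""
    uniq = sorted(set(values))
    candidates = [(a + b) / 2.0 for a, b in zip(uniq, uniq[1:])]
    best = (float("inf"), None, None)
    for threshold in candidates:
        for polarity in (1, -1):
            error = sum(w for v, y, w in zip(values, labels, weights)
                        if stump_predict(v, threshold, polarity) != y)
            if error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_round(feature_values, labels, weights):
    """One boosting round: pick the best feature, then reweight the images."""
    total = sum(weights)
    weights = [w / total for w in weights]
    best = None
    for j, values in enumerate(feature_values):
        error, threshold, polarity = best_stump(values, labels, weights)
        if best is None or error < best[0]:
            best = (error, j, threshold, polarity)
    error, j, threshold, polarity = best
    error = max(error, 1e-10)          # avoid division by zero on a perfect split
    beta = error / (1.0 - error)
    alpha = math.log(1.0 / beta)       # vote weight of this weak classifier
    new_weights = [
        w * beta if stump_predict(v, threshold, polarity) == y else w
        for v, y, w in zip(feature_values[j], labels, weights)
    ]
    return (j, threshold, polarity, alpha), new_weights


labels = [1, 1, 0, 0]                  # 1 = face, 0 = non-face (toy data)
feature_values = [
    [1, 5, 3, 4],                      # feature 0 evaluated on the four images
    [1, 2, 1, 2],                      # feature 1 evaluated on the four images
]
weights = [0.25, 0.25, 0.25, 0.25]     # equal weights to start
stump, weights = adaboost_round(feature_values, labels, weights)
print(stump)    # chosen (feature index, threshold, polarity, alpha)
print(weights)  # the one misclassified image now carries the largest weight
```

In the next round the weights are normalized again, so the image that feature 0 got wrong dominates the error and pushes the selection toward a feature that handles it.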

The final classifier is a weighted sum of these weak classifiers. Each is called weak because it cannot classify the image on its own, but together they form a strong classifier. The paper says that even 200 features provide detection with 95% accuracy. The final setup had around 6000 features. (Imagine a reduction from 160,000+ features to 6000. That is a big gain.)
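The weighted sum can be sketched as a vote. The snippet below assumes the Viola-Jones decision rule, where a window is accepted when the weighted vote reaches half of the total weight. The stump tuples and feature values are invented for illustration.

```python
def strong_classify(stumps, feature_values):
    """Weighted vote of weak classifiers.

    stumps: list of (feature_index, threshold, polarity, alpha).
    feature_values: feature values for ONE window, indexed by feature_index.
    Returns 1 (face) when the weighted vote reaches half of the total weight.
    """
    vote = 0.0
    total = 0.0
    for j, threshold, polarity, alpha in stumps:
        h = 1 if polarity * feature_values[j] < polarity * threshold else 0
        vote += alpha * h
        total += alpha
    return 1 if vote >= 0.5 * total else 0


# Three made-up weak classifiers with vote weights 1.1, 0.7 and 0.4.
stumps = [(0, 2.0, 1, 1.1), (1, 5.0, -1, 0.7), (2, 0.0, 1, 0.4)]

print(strong_classify(stumps, [1.0, 3.0, -2.0]))  # 1: votes 1.1 + 0.4 reach half of 2.2
print(strong_classify(stumps, [3.0, 3.0, -2.0]))  # 0: the 0.4 vote alone is not enough
```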

So now you take an image, take each 24x24 window, apply 6000 features to it, and check whether it is a face or not. Wow... isn't that a little inefficient and time consuming? Yes, it is. The authors have a good solution for that.
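To get a feel for the cost, here is a back-of-the-envelope count. The 384x288 image size, the one-pixel step, and the single scale are assumptions chosen for this example.

```python
def count_windows(img_w, img_h, win=24, step=1):
    """Number of win-by-win windows that fit in the image at a single scale."""
    if img_w < win or img_h < win:
        return 0
    return ((img_w - win) // step + 1) * ((img_h - win) // step + 1)


windows = count_windows(384, 288)   # assumed example image size
evaluations = windows * 6000        # if every feature ran on every window

print(windows)      # 95665
print(evaluations)  # 573990000
```

That is more than half a billion feature evaluations for one scale of one image, before any larger window sizes are considered.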

In an image, most of the region is non-face. So it is better to have a simple method to check whether a window is not a face region. If it is not, discard it in a single shot and don't process it again. Instead, focus on regions where there can be a face. This way, more time is spent checking possible face regions.

For this they introduced the concept of a Cascade of Classifiers. Instead of applying all 6000 features to a window, the features are grouped into stages of classifiers that are applied one by one. (Normally the first few stages contain very few features.) If a window fails the first stage, discard it; the remaining features are not evaluated on it. If it passes, apply the second stage of features and continue the process. A window that passes all stages is a face region. How's that for a plan!

The authors' detector had 6000+ features in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages. (The two features in the image above were actually obtained as the best two features from AdaBoost.) According to the authors, on average 10 features out of 6000+ are evaluated per sub-window.
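The early-rejection idea can be sketched in a few lines. This is a toy model, not the real detector: a "window" here is just a number standing in for how face-like it is, every feature returns that number, and a stage passes when its summed score reaches a threshold. The stage sizes 1, 10 and 25 are taken from the figures quoted above; everything else is hypothetical.

```python
def run_cascade(stages, window):
    """Evaluate stages in order and stop at the first one that rejects the window.

    stages: list of (features, stage_threshold), where features is a list of
    callables mapping a window to a number.
    Returns (is_face, number_of_features_evaluated).
    """
    evaluated = 0
    for features, stage_threshold in stages:
        score = 0.0
        for feature in features:
            score += feature(window)
            evaluated += 1
        if score < stage_threshold:
            return False, evaluated   # rejected early; later stages never run
    return True, evaluated


def faceness(window):
    """Stand-in feature: the toy window is just a number in [0, 1]."""
    return window


# Three stages with 1, 10 and 25 features.
stages = [([faceness] * n, 0.5 * n) for n in (1, 10, 25)]

print(run_cascade(stages, 0.1))  # (False, 1): rejected by the single first-stage feature
print(run_cascade(stages, 0.9))  # (True, 36): a face-like window pays for 1 + 10 + 25
```

Because most windows look like the first case, the average number of features evaluated per window stays far below the total.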

So this is a simple, intuitive explanation of how Viola-Jones face detection works. Read the paper for more details or check out the references in the Additional Resources section.

(Video link: https://www.youtube.com/watch?v=WfdYYNamHZ8)

Haar-cascade Detection in OpenCV

OpenCV comes with a trainer as well as a detector. If you want to train your own classifier for any object, such as cars or planes, you can use OpenCV to create one. Full details are given here: Cascade Classifier Training.

Here we will deal with detection. OpenCV already contains many pre-trained classifiers for faces, eyes, smiles, etc. Those XML files are stored in the opencv/data/haarcascades/ folder. Let's create a face and eye detector with OpenCV.

First we need to load the required XML classifiers. Then load our input image (or video) in grayscale mode.

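import numpy as np
import cv2

# Load the pre-trained Haar cascades (XML files from opencv/data/haarcascades/)
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
if face_cascade.empty() or eye_cascade.empty():
    raise IOError('Could not load the cascade XML files')

# Read the input image and convert it to grayscale for detection
img = cv2.imread('sachin.jpg')
if img is None:
    raise IOError('Could not read sachin.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)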

Now we find the faces in the image. If faces are found, detectMultiScale returns the position of each detected face as Rect(x,y,w,h). Once we have these locations, we can create an ROI for each face and apply eye detection to that ROI (since eyes are always on the face!).

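# Detect faces with scaleFactor=1.3 and minNeighbors=5
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in faces:
    # Draw a blue box around the face (OpenCV colors are in BGR order)
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # Restrict eye detection to the face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        # Eye coordinates are relative to the ROI; roi_color is a view into img
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()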
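One detail worth spelling out: the eye rectangles are relative to the face ROI, not to the whole image. Drawing on roi_color works because it is a slice of img, but if you need the eye positions in image coordinates, an offset like the one below would be required. The helper name and the sample boxes are made up for illustration.

```python
def eye_to_image_coords(face, eye):
    """Convert an eye box found inside a face ROI to full-image coordinates.

    face: (x, y, w, h) in image coordinates.
    eye:  (ex, ey, ew, eh) relative to the top-left corner of the face ROI.
    """
    x, y, _, _ = face
    ex, ey, ew, eh = eye
    return (x + ex, y + ey, ew, eh)


face = (100, 50, 80, 80)   # made-up face box
eye = (15, 20, 20, 12)     # made-up eye box inside the ROI

print(eye_to_image_coords(face, eye))  # (115, 70, 20, 12)
```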

The result looks like this:

[Figure: face.jpg]

