Shade Analysis on Facial Images for Robotic Lighting
Hisaya Okada, Toyohashi University of Technology, Email: [email protected]
Shigeru Kuriyama, Toyohashi University of Technology, Email: [email protected]
Abstract—The impression of a facial image deeply depends on the shades and shadows cast on the face, and smart lighting control is therefore a key technology for obtaining an appealing appearance.
The optimal control of lighting is usually performed by a skillful photographer or lighting technician in a commercial studio, but the effect of such skillful lighting can be extracted from the features of the shading patterns in professional images.
The target of our research is to develop a robotized LED lighting system that automatically controls its position and orientation to supply the expected optical effects. This article focuses on a method of estimating the lighting condition from a sample image, by which a similar shading style is obtained for a facial image taken under our robotic lighting.
I. INTRODUCTION
The positions and intensities of lighting sources are the most important factors in photography, especially when capturing facial images, because of their large influence on impressions.
Representative lighting setups are categorized as in Fig. 1, which demonstrates large variations in the shading patterns of individual facial parts such as the nose and cheek. Professional photographers use even more sophisticated setups, assisted by reflector boards, to finely tune the shading effects according to the subject's features or the target impression. In this research, we assume that such a skillful lighting condition can be roughly estimated from a sample image captured by a professional photographer. With the estimated condition, a similar impression is automatically imitated using a fully automated smart lighting system, which we call robotic lighting.
The target of this research is to supply control parameters for the robotic lighting from a given sample portrait image.
The position and orientation of a lighting bulb are estimated by extracting the brightness features of the image and comparing them against those of images captured with the robotic lighting. Using this mechanism, ordinary users can mechanically control lighting conditions merely by selecting an image with the desired impression, without any knowledge of lighting.
II. RELATED WORKS
A computer-aided photography system [2] was introduced to assist professionals in designing compositions by navigating the poses of full-body portraits with a Kinect. A camera-mounted movable system [3] was proposed for simultaneously capturing images from multiple angles. The effect of lighting, however, was not fully considered in these systems.
Dynamic control of lighting [4] was proposed by capturing many samples under different lighting conditions and selecting a desirable one in an image-retrieval manner. A movable lighting system [5] that can chase a moving person was also proposed for capturing half-backlit images with a flying drone. Our robotic lighting differs from these prototypes in constructing the system from commercially available devices and in focusing on facial portrait images assumed to be taken in self-shot (or selfie) situations.
In the computer graphics community, various methods have been proposed for controlling shading styles directly on images by estimating and simulating lighting conditions in 2D images or 3D virtual environments. For example, automatic shading of portraits was proposed [6] by superimposing a shading mask onto 2D images or 3D shapes.
The image-based approach is convenient because it requires no estimation of 3D lighting conditions. However, such image modification often causes an unnatural impression due to the lack of physical or optical consistency with the real world.
In 3D model-based simulations, obtaining a natural appearance in synthesized facial images requires analyzing the lighting condition and the material properties, such as diffusion and reflectance, of each face. Moreover, rendering realistic images is unsuited to interactive, real-time manipulation of their impressions because of its large computational cost.
This research develops a smart, automated control system for a movable light using sample images obtained from Web pages or magazines. The lighting condition is extracted from a target image by computing the similarity of its shading pattern against pre-sampled training images, with which the robotic lighting is optimally controlled as a substitute for a lighting technician. Our method extracts the control parameters of the robotic lighting from a single image and therefore requires no costly, complicated computations for estimating a 3D environment from multiple images or movies.
Image feature analysis [7] used Haar-like features to discriminate professional images from ordinary ones, and a quality rating of portrait images [8] was introduced by extracting image features such as color, brightness, and face orientation.
III. SYSTEM OVERVIEW
Fig. 1. Representative lighting styles for facial images [1]

Our robotic lighting consists of a power supply and three LED bulbs (Philips Hue) mounted on a movable base (iRobot Roomba), as shown in Fig. 2. The movable base is controlled through serial communication over ZigBee, and the LEDs are dimmed over WiFi using the HTTP protocol. Although the colors of the LEDs are fully controllable with respect to each RGB component, we fixed them at a color temperature of 4200 K, which is equivalent to the color of fluorescent light.
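As a concrete illustration of this dimming path, the following minimal Python sketch sends a state update to a Philips Hue bridge over HTTP; the bridge address, API username, and light IDs are placeholders rather than our actual configuration, and the mired value 238 simply approximates 1,000,000 / 4200 K.

```python
import requests

BRIDGE_IP = "192.168.1.2"   # hypothetical address of the Hue bridge
API_USER = "devuser"        # hypothetical authorized API username

def set_bulb(light_id: int, brightness: int) -> None:
    """Dim one Hue bulb over WiFi/HTTP while keeping a fixed ~4200 K color temperature."""
    url = f"http://{BRIDGE_IP}/api/{API_USER}/lights/{light_id}/state"
    payload = {
        "on": brightness > 0,
        "bri": max(0, min(brightness, 254)),  # Hue brightness range is 0-254
        "ct": 238,                            # color temperature in mireds (~4200 K)
    }
    requests.put(url, json=payload, timeout=2.0)

# Example: assign different intensities to the three bulbs of the robotic lighting.
for lid, bri in zip((1, 2, 3), (200, 120, 0)):
    set_bulb(lid, bri)
```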
The free parameters of the robotic lighting therefore consist of the intensity of each LED and the position and orientation of the base. The intensities are determined by the control parameters sent through the APIs, and the state of the base is estimated by detecting attached markers in images captured with a ceiling camera, where a standard computer vision technique is employed to estimate the position and orientation of the base from the positions of the markers.
Fig. 2. Robotic lighting
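For reference, one possible realization of this marker-based estimation is sketched below in Python using OpenCV's ArUco module; the marker IDs, the dictionary, and the two-marker layout are assumptions made for illustration rather than a description of our actual implementation, and the ArUco API differs slightly between OpenCV versions.

```python
import cv2
import numpy as np

# Assumed setup: two ArUco tags (IDs 0 and 1) fixed to the front and rear of the base,
# observed by the ceiling camera; pose is recovered in image coordinates.
ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def estimate_base_pose(frame: np.ndarray):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, ARUCO_DICT)
    if ids is None:
        return None  # no markers visible in this frame
    centers = {int(i): c.reshape(4, 2).mean(axis=0)
               for i, c in zip(ids.flatten(), corners)}
    if 0 not in centers or 1 not in centers:
        return None  # one of the two base markers was not detected
    front, rear = centers[0], centers[1]
    position = (front + rear) / 2.0                               # base position (pixels)
    heading = np.arctan2(front[1] - rear[1], front[0] - rear[0])  # orientation (radians)
    return position, heading
```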
We first collect facial images used as training samples by moving the robotic lighting to fixed states that sample the lighting conditions (positions, orientations, and dimming parameters), as shown in Fig. 3, where each number denotes a state of the robotic lighting.
Fig. 3. Location of robotic lighting for capturing training samples
The facial images captured under the same lighting condition are categorized into the same class, and common image features are extracted for each class to determine the most similar lighting condition from an arbitrary input facial image.
IV. IMAGE-BASED ESTIMATION OF LIGHTING CONDITION
Fig. 4 shows the distributions of intensities and their gradients for the luminance images of two persons, where the images are obtained by converting RGB color images into greyscale ones using a conventional method. They were captured with our robotic lighting whose beams are projected in the left, right, and front directions, as shown in Fig. 5, where the top numbers represent the corresponding states in Fig. 3.
This demonstrates that the shading pattern, which corresponds to the variations of intensities and their gradients, deeply depends on the lighting state, while the differences among individual faces are relatively negligible. For this reason, we construct classifiers of the lighting condition by learning the image features captured with the same setting of the robotic lighting, and estimate the condition from only a single image using these classifiers.
The shading styles on faces are mostly affected by the lighting conditions, and their features are obtained by analyzing the regions of bright or shaded patterns and the directions in which the intensity is attenuated. Global computation of shading features is therefore inadequate because it lacks the positional information of such regions.
Since detecting facial parts often becomes difficult when they are covered by large shadows, we instead divide an image into regularly sized rectangular blocks and locally compute intensities and their gradients to build histograms, as shown in Fig. 6.
Then, features are obtained by concatenating the histograms, composing vectors whose dimension is the product of the number of histogram bins and the number of blocks.
These feature vectors are utilized as training samples for classifiers, with which the class of a specific lighting condition is estimated from an input image via an image-retrieval mechanism.
Fig. 4. Variations of intensities and their gradients for different faces

Fig. 5. Example of training images

We experimentally divide each image into 5×5 blocks and compute two 20-bin histograms per block, one for intensity values and one for gradient values, which constitutes a 5×5×20×2 = 1000-dimensional vector as a whole. Since the relative locations of the facial parts are roughly similar among persons, our block-wise feature can approximately capture the individual characteristics of each facial part.
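The block-wise feature extraction can be sketched in Python as follows; the 200×200 resizing, the Sobel gradients, and the per-histogram normalization are our own assumptions, since the paper does not fix these details.

```python
import cv2
import numpy as np

def blockwise_feature(gray: np.ndarray, grid: int = 5, bins: int = 20) -> np.ndarray:
    """Concatenate per-block histograms of intensity and gradient magnitude.

    With grid=5 and bins=20 this yields the 5*5*20*2 = 1000-dimensional vector
    described in the text.
    """
    gray = cv2.resize(gray, (200, 200)).astype(np.float32) / 255.0
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2)

    h, w = gray.shape
    bh, bw = h // grid, w // grid
    feats = []
    for by in range(grid):
        for bx in range(grid):
            sl = (slice(by * bh, (by + 1) * bh), slice(bx * bw, (bx + 1) * bw))
            hist_i, _ = np.histogram(gray[sl], bins=bins, range=(0.0, 1.0), density=True)
            hist_g, _ = np.histogram(grad[sl], bins=bins, range=(0.0, grad.max() + 1e-6), density=True)
            feats.extend([hist_i, hist_g])
    return np.concatenate(feats)
```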
We construct a Support Vector Machine (SVM) using these features, pre-classified according to the lighting states controlled with our robotic lighting. We introduce an Error-Correcting Output Coding method [9] to extend the binary SVM classifier to multiple classes.
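A minimal training sketch in Python, assuming scikit-learn, is shown below. Note that scikit-learn's OutputCodeClassifier generates random output codes, whereas [9] proposes a sparse ternary code design, and the SVM hyperparameters here are placeholders.

```python
from sklearn.multiclass import OutputCodeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_classifier():
    """Binary RBF-SVM wrapped by error-correcting output codes for multi-class use."""
    base_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    return OutputCodeClassifier(base_svm, code_size=2.0, random_state=0)

# features: (n_images, 1000) block-wise vectors; states: lighting-state labels
# clf = build_classifier().fit(features, states)
# predicted_state = clf.predict(blockwise_feature(query_gray)[None, :])
```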
V. EXPERIMENTAL RESULTS
A. Dataset
We collected facial images of 7 subjects lit by our robotic lighting, where 12 patterns of lighting conditions were investigated by combining two bulb locations with the 6 base states (positions and orientations) shown in Fig. 3; Fig. 5 demonstrates examples of the captured images.
Here we construct the classifiers of image features separately for faces with and without glasses in order to investigate their effect. We therefore collected two kinds of image datasets: in one, the training images were captured with the subjects wearing glasses, and in the other the subjects did not wear glasses. Consequently, 7×12×2 = 168 images were captured for training in total.
Moreover, we added the Extended Yale B dataset [10][11], which is often utilized for recognizing human faces under various lighting conditions [12]. The images of this dataset were captured for 38 subjects with a single light source whose direction was altered over 64 directions. Eighteen images were intentionally omitted because they were ill-conditioned for feature extraction, which yields 38×64−18 = 2414 images in total, a number sufficient for reliably estimating the performance of our classifiers.
B. Accuracy in estimating lighting condition
We experimentally investigated the performance of our method in terms of the accuracy in discriminating image features of the same lighting condition.
We evaluated three kinds of features:
• Histogram of luminance intensities
• Histogram of intensity gradients
• Merged histograms of the above two
where these features are computed in two ways: one is globally extracted from a single image as a baseline, and the other is locally extracted from regularly divided blocks.
We computed the error rate as the percentage of falsely classified images, as shown in Table I, where leave-one-out cross-validation is employed for our dataset and 10-fold cross-validation for the Yale B dataset. The result demonstrates that error rates are dramatically decreased by extracting local features in a block-wise manner compared to global ones, and that merging the two kinds of features estimates the lighting conditions more correctly for all datasets. However, the local gradient features show higher error rates than the global ones for our samples. These larger errors may be due to the poorly aligned facial areas in our image dataset caused by manual cropping. In addition, the error rates of the local features are also relatively higher for the facial images with glasses.
This implies that the sharply contrasted images of glasses are sensitive to misalignment of the facial areas, which degrades the reliability of the local features.
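A sketch of the evaluation protocol is given below, assuming scikit-learn and reusing build_classifier from the earlier sketch; whether the 10-fold splits were stratified is our assumption.

```python
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score

def error_rate(clf, X, y, scheme: str) -> float:
    """Percentage of falsely classified images under the chosen cross-validation scheme."""
    cv = LeaveOneOut() if scheme == "loo" else StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    accuracy = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return 100.0 * (1.0 - accuracy.mean())

# error_rate(build_classifier(), features, states, "loo")      # our captured dataset
# error_rate(build_classifier(), yale_X, yale_y, "10fold")     # Extended Yale B dataset
```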
We found that even the mis-estimated images have conditions similar to the input image. For example, Fig. 7 demonstrates that the impressions of the three mis-estimated images are not so different from the correctly estimated ones. Table II shows the mean average precision (mAP) within the top three ranks, the top five ranks, and all ranks, using the merged features. This shows that an image with a similar shading style can be found merely by presenting the top-5 images as candidates.
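Since the paper does not specify the exact mAP variant, the following Python sketch illustrates one common truncated formulation, where each query yields a boolean relevance array over its ranked retrieval results.

```python
import numpy as np

def average_precision(relevant: np.ndarray, k: int = None) -> float:
    """AP for one query; `relevant` marks which ranked results share the query's lighting state."""
    if k is not None:
        relevant = relevant[:k]
    hits = np.flatnonzero(relevant)
    if hits.size == 0:
        return 0.0
    precisions = (np.arange(hits.size) + 1) / (hits + 1)  # precision at each relevant rank
    return float(precisions.mean())

def mean_ap(rankings, k=None) -> float:
    """rankings: one boolean relevance array per query image (mAP@k when k is given)."""
    return float(np.mean([average_precision(r, k) for r in rankings]))
```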
Fig. 8 shows problematic examples, which include a face with glossy skin and a flatly shaded pattern.
Fig. 6. Flow of computing block-wise local features
TABLE I
ERROR RATES VIA CROSS-VALIDATION FOR EACH FEATURE [%]

Feature                              Our samples       Our samples     Yale B
                                     (with glasses)    (no glasses)
Brightness              Global       50.0              59.9            69.8
                        Local        31.0              28.6            19.5
Gradient of brightness  Global       29.8              23.8            26.8
                        Local        32.1              25.0            16.4
Merged                  Global       20.2              25.0            23.1
                        Local        26.1              20.2            13.8
TABLE II
RANKING PERFORMANCES IN CLASSIFYING LIGHTING CONDITIONS

Dataset & condition                             mAP@3    mAP@5    mAP
Yale B dataset                       Global     0.846    0.855    0.858
                                     Local      0.911    0.915    0.916
Robotic lighting (no glasses)        Global     0.835    0.843    0.845
                                     Local      0.869    0.878    0.879
Robotic lighting (with glasses)      Global     0.887    0.892    0.892
                                     Local      0.849    0.855    0.857
A more sophisticated method for discriminating skin materials should be developed for the former case. Although the latter case suggests a limitation of our classification, such featureless appearances can be neglected by our method, which aims at imitating attractive, impressive styles.
C. Impression transfer with robotic lighting
We finally demonstrate our actual application; we compare the impression of images captured with our robotic lighting to that of a sample image selected from the Yale B dataset as a target shading style.
Fig. 9 shows the comparison between these images. The lighting conditions of the two image datasets differ from each other, except that both datasets were captured with a single bulb. Despite such conditional differences, our method can select the most similar lighting condition and generate images that have a similar shading style around the nasal muscle and a similar glossy style on the cheek.
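Putting the pieces together, a hypothetical end-to-end use of the sketches above would map the estimated class to a stored robot configuration; the state-table entries, the drive_base callback, and the reuse of blockwise_feature and set_bulb from the earlier sketches are all illustrative assumptions, not our actual calibration.

```python
# Hypothetical mapping from each trained lighting class to a base pose (x, y, theta)
# and per-bulb brightness; the values below are placeholders, not measured states.
STATE_TABLE = {
    0: {"pose": (0.5, 1.2, 0.0), "bri": (200, 0, 0)},
    1: {"pose": (1.0, 0.8, 1.6), "bri": (0, 200, 120)},
}

def imitate(sample_gray, clf, drive_base):
    """Estimate the lighting state of a sample portrait and reproduce it."""
    state = int(clf.predict(blockwise_feature(sample_gray)[None, :])[0])
    config = STATE_TABLE[state]
    drive_base(*config["pose"])                    # move the Roomba base (user-supplied control)
    for light_id, brightness in enumerate(config["bri"], start=1):
        set_bulb(light_id, brightness)             # dim the Hue bulbs as in Sec. III
```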
Fig. 7. Example of false estimation
Fig. 8. Faces that easily cause false estimation
VI. CONCLUSION AND DISCUSSION
This paper proposed a method of estimating the lighting condition from the features of a single facial image. The robotic lighting system is controlled using the estimated condition to imitate the impression of the image. We experimentally demonstrated its feasibility by investigating the accuracy of estimating lighting conditions.
Essentially, there is no restriction on the sample image of the target shading pattern, except that it should be correctly cropped so that the blocks correspond accurately. A facial image captured under complicated lighting conditions, however, could be selected as an input image whose features were not trained, and such an unexpected lighting condition naturally degrades the reproduction of the shading pattern. A larger set of training images should be collected to ensure scalability against various lighting conditions.

Fig. 9. Style imitations of images captured with robotic lighting
A current limitation of our method is that the classifiers of lighting conditions must be constructed separately for facial images with and without glasses. A smarter classifier that can handle both types of images should be developed by extracting more robust features. Moreover, accurately cropping the facial area is expected to improve the estimation accuracy, which is included in our future work. Intrinsic images [13], which are obtained by decomposing images into the albedo of facial skin and an illumination component, could be introduced to cancel the effect of skin materials. Hierarchical decomposition of image features [14] is also a possible approach for capturing more complicated structures of shading patterns.
Our target is the synthesis of facial images whose impression is intuitively controllable, and thus a psychological evaluation should be developed to qualitatively estimate the allowable difference in shading patterns from the viewpoint of impression. Our future work also includes an on-the-fly, real-time motion-control strategy for our robotic lighting that monitors the change of image features.
REFERENCES
[1] http://www.portraitlighting.net/patternsb.htm
[2] Hongbo Fu, Xiaoguang Han, and Quoc Huy Phan: Data-driven Suggestions for Portrait Posing, SIGGRAPH Asia 2013 Technical Briefs, 29, ACM, 2013.
[3] Yuji Kokumai, Hideki Sasaki, Tomomi Takashina, and Yutaka Iwasaki: A New Photography Style Using a Shooting Assistant Robot, SIGGRAPH Asia 2013 Posters, 24, ACM, 2013.
[4] Ankit Mohan, Jack Tumblin, Bobby Bodenheimer, Cindy Grimm, and Reynold Bailey: Table-top Computed Lighting for Practical Digital Photography, ACM SIGGRAPH 2006 Courses, 2006.
[5] Manohar Srikanth, Kavita Bala, and Fredo Durand: Computational Rim Illumination with Aerial Robots, Proceedings of the Workshop on Computational Aesthetics, ACM, 2014.
[6] YiChang Shih, Sylvain Paris, Connelly Barnes, William T. Freeman, and Fredo Durand: Style Transfer for Headshot Portraits, ACM Trans. Graph. 33, 4, Article 148, 2014.
[7] Xin Jin, Mingtian Zhao, Xiaowu Chen, Qinping Zhao, and Song-Chun Zhu: Learning Artistic Lighting Template from Portrait Photographs, Proceedings of the 11th European Conference on Computer Vision: Part IV, pp. 101-114, 2010.
[8] Miriam Redi, Nikhil Rasiwasia, Gaurav Aggarwal, and Alejandro Jaimes: The Beauty of Capturing Faces: Rating the Quality of Digital Portraits, Computing Research Repository (CoRR), http://arxiv.org/abs/1501.07304
[9] S. Escalera, O. Pujol, and P. Radeva: Separability of ternary codes for sparse designs of error-correcting output codes, Pattern Recog. Lett. 30, 3 (Feb. 2009), 285-297.
[10] The Extended Yale Face Database B, http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html, accessed 2016.6.2.
[11] Athinodoros S. Georghiades, Peter N. Belhumeur, and David J. Kriegman: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose, IEEE Trans. Pattern Anal. Mach. Intell. 23, 6 (June 2001), 643-660.
[12] Kuang-Chih Lee, Jeffrey Ho, and David J. Kriegman: Acquiring Linear Subspaces for Face Recognition under Variable Lighting, IEEE Trans. Pattern Anal. Mach. Intell. 27, 5 (May 2005), 684-698.
[13] Yair Weiss: Deriving Intrinsic Images from Image Sequences, International Conference on Computer Vision, 2001, 68-75.
[14] S. Lazebnik, C. Schmid, and J. Ponce: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Computer Vision and Pattern Recognition, 2006, 2169-2178.