Automated Emotional Facial Expression Assessment and Emotional Elicitation through Film Clip Stimuli

David Ventura1, Luis Heredia1,2,3, Margarita Torrente1,2, Paloma Vicens1,2
1 Rovira i Virgili University, School of Medicine, Laboratory of Toxicology and Environmental Health, Reus, Spain
2 Rovira i Virgili University, Department of Psychology, CRAMC (Research Center for Behavior Assessment), Tarragona, Spain
3 Biomedical Research Institute of Lleida (IRBLleida), Research group on Neurocognition, Psychobiology of personality and Behavior genetics, Lleida, Spain


Introduction
Emotions are personal, object-directed and specific neurophysiological states, apparently unprovoked, that can influence behavior, cognition and the body to motivate and facilitate adaptive responses in line with natural selection (Izard, 1992). A circularity appears to exist between emotions, behavior, cognition and the body: they provide feedback to and modify one another in different ways and at the same time, which has been attributed to the influence of the environment (Frijda, 2008). Today, two main perspectives exist in emotion research. One is the dimensional perspective, which uses the concepts of valence and arousal (Russell, 1980). The other is the discrete perspective, which suggests that there are a number of basic, universally shared emotions (Ekman, 1992; Ekman, Friesen, & Ancoli, 1980). Ekman, Friesen, and Ellsworth (1972) proposed six independent families of basic emotions: happiness, sadness, anger, disgust, surprise and fear. Although these are broadly accepted, other authors have proposed different lists of basic emotions (see Ortony & Turner, 1990, for a review). The two approaches are not incompatible, however. Dimensional and discrete perspectives can be reconciled to some extent by proposing that each discrete emotion represents a combination of several dimensions (Haidt & Keltner, 1999; Russell, 2003). Hybrid theories are supported by the experimental data of Fujimura, Matsuda, Katahira, Okada, and Okanoya (2012), whose participants were able to use both discrete and dimensional perception when evaluating emotional faces. It can also be argued that emotions are sometimes more complex than the notions we use to communicate how we feel, for instance when two or more emotions, usually opposite in valence, are co-activated. These affective experiences have been defined as mixed emotions (Larsen, McGraw, & Cacioppo, 2001).
To assess emotional experience in humans, researchers use stimuli to elicit concrete states that we call "emotions". Since the very first studies, emotional databases have been diversified by using words, pictures and sounds as stimuli (Lang, Bradley, & Cuthbert, 1999; Lang, Bradley, & Cuthbert, 2008). Emotional databases that include film clips have now been created. These combine moving images and sound and provide more ecological and realistic experiences than static databases (Fernández, Pascual, Soler, & Fernández-Abascal, 2011; Gabert-Quillen, Bartolini, Abravanel, & Sanislow, 2015; Gross & Levenson, 1995; Hewig et al., 2005; Philippot, 1993; Rottenberg, Ray, & Gross, 2007; Schaefer, Nils, Sanchez, & Philippot, 2010). Emotional assessment of these databases is frequently based on highly structured emotion questionnaires and self-report inventories. These instruments have several drawbacks, however, including: 1) the need for consciousness and verbal capacity applied to emotions, 2) human cognition biases, such as social desirability, 3) delayed and introspective access to emotional meta-experience that may result in memory bias, and 4) methodology biases, such as limited options or the compulsory performance of tasks other than the mere feeling of emotions (Quigley, Lindquist, & Barrett, 2014; Robinson & Clore, 2002; Rottenberg et al., 2007). Nevertheless, few studies have so far assessed the emotional profiles of film clip databases using alternative assessment methods.
An alternative method for assessing emotions in humans is based on the observation of emotional facial expression. Some authors suggest that facial expressions of basic emotions are fixed and universally shared within and between cultures, regardless of variables such as literacy (Ekman, 1992; Izard, 1992), though emotional functioning is slightly modifiable by social and cultural influences. The literature shows consistency between self-reported discrete emotions and facial expression assessed using the Facial Action Coding System (FACS; Ekman et al., 1980; Ekman & Friesen, 1978; Ekman & Rosenberg, 1997). FACS uses a set of action units (AUs) for isolated or grouped muscles that are responsible for basic facial expressions of emotion. Artificial neural networks have helped to develop different programs for assessing facial emotional expression through FACS. Various automated emotional recognition software programs that detect basic emotions (anger, disgust, fear, happiness, sadness, surprise and a neutral state) are now available for research purposes, such as EmotionID (Baránková, Halamová, Gablíková, Koróniová, & Strnádelová, 2019), FaceReader® (Noldus Information Technologies), Intraface (De la Torre et al., 2015) and FACET (iMotions®), among others. Automated methods for assessing emotions avoid researcher subjectivity and enable FACS analysis to be applied more accurately. An example of the current possibilities of this assessment methodology is the study by Kanovský, Baránková, Halamová, Strnádelová, and Koróniová (2020), which assessed the facial expression of compassion elicited by a film clip. Moreover, this methodology could be more sensitive to changes in emotional intensity over time, and it could help us to better analyze the complex nature of human emotions, including emotional interaction processes.
Therefore, the aim of this study was to assess the emotional profiles of various film clip stimuli through the analysis of emotional facial expression using FaceReader®, an automated recognition software, and to compare the results with those obtained using self-reported methods in previous studies. We expected to observe emotional activation patterns much more complex than those described in previous studies using the same validated stimuli for emotion elicitation.

Participants
Participants in this study were undergraduate students of Psychology during the 2015/2016 academic year. Informed consent was obtained from all individual participants included in the study. All participants had normal or corrected-to-normal visual acuity. The automated facial emotional recognition software used in this study overlays a virtual mesh on the participant's face. A researcher checked all participants for abnormalities in the overlay process, and only those with a good fit throughout the testing period were included in the analysis. The final sample therefore consisted of sixty-five participants (54 women, 11 men; 18-29 years of age, M = 21.44, SD = 2.34).

Stimuli
Eleven dubbed Spanish film clips from previously validated databases were used. Three previously non-validated film clips were also added since neutral and disgust conditions were not available in Spanish or failed to elicit the target emotion in our earlier pilot studies (data not published). Two film clips were used for each of the six emotional categories assessed: happiness, sadness, anger, surprise, fear and disgust.
Music, background sounds, dialogues, etc. were taken into account to determine the start and end point. All videos could be understood without additional information. All the videos came from different film sources. The videos lasted between 0:27 and 3:40 minutes and their average length was 1:28 minutes. A short description of the stimuli, their emotional condition and the studies in which they were previously validated are shown in Table 1.

Table 1
Film clips used as stimuli in the emotional elicitation procedure
Surprise; Gross & Levenson (1995); Rottenberg et al. (2007)
"The shining"; 1:20; A child is playing with toy cars in a hallway when a ball moves towards him. The boy searches for who has thrown it but there is nobody there.
Fear; Gross & Levenson (1995); Rottenberg et al. (2007)
"The ring"; 2:41; A man is working when his TV turns itself on. He tries to turn it off but the TV turns itself on again. The image shows a girl dressed in a dirty white dress climbing out of a well. She has her long hair over her face and walks unsteadily towards the screen. The girl crawls out of the TV and shows her monstrous face. The man shouts and falls down, injuring himself.

Automated Analysis of Facial Expression
Emotional facial expression was analyzed using FaceReader® v6.1 software (Noldus Information Technologies, Wageningen, The Netherlands). FaceReader is software that automatically analyzes facial expressions of emotion. It works in three steps. First, it detects a face in the image and identifies 500 key landmark points using an Active Appearance Model (Cootes & Taylor, 2004). Second, a 3-layer artificial neural network classifies the image according to how likely it is that each emotion is present in the person's face. Finally, the software can assign a label to each target face throughout the assessment period (Goldberg, 2014). The software provides six outputs of emotional intensity for each stimulus (happiness, sadness, anger, surprise, fear, and disgust), each a value from 0 to 1 indicating how much of that emotional expression is being displayed. A higher value indicates a greater likelihood that the person is experiencing the target emotion. According to Lewinski, den Uyl, and Butler (2014), the accuracy of the detection of each emotion is as follows: happiness 94%, sadness 86%, anger 76%, surprise 94%, fear 82% and disgust 92%.
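How the frame-by-frame outputs were reduced to one intensity per emotion and clip is not detailed in this section. As a minimal sketch, assuming a simple mean across the frames of one recording, the six 0-to-1 outputs could be summarized as below. The `frames` structure, the `EMOTIONS` tuple and the `mean_intensities` function are hypothetical names chosen for illustration and do not reflect FaceReader's actual export format; the values are invented.

```python
from statistics import mean

EMOTIONS = ("happiness", "sadness", "anger", "surprise", "fear", "disgust")


def mean_intensities(frames):
    """Average each emotion's intensity (0 to 1) over all frames of one clip."""
    if not frames:
        raise ValueError("at least one frame is required")
    for frame in frames:
        for emotion in EMOTIONS:
            value = frame[emotion]
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{emotion} intensity out of range: {value}")
    return {e: mean(frame[e] for frame in frames) for e in EMOTIONS}


# Hypothetical three-frame recording (illustrative values only).
frames = [
    {"happiness": 0.2, "sadness": 0.02, "anger": 0.01,
     "surprise": 0.03, "fear": 0.00, "disgust": 0.01},
    {"happiness": 0.4, "sadness": 0.01, "anger": 0.01,
     "surprise": 0.05, "fear": 0.01, "disgust": 0.02},
    {"happiness": 0.6, "sadness": 0.01, "anger": 0.00,
     "surprise": 0.04, "fear": 0.01, "disgust": 0.01},
]

profile = mean_intensities(frames)          # one mean intensity per emotion
dominant = max(profile, key=profile.get)    # emotion with the highest mean
print(dominant, round(profile[dominant], 3))
```

Keeping all six means, rather than only the dominant label, is what allows co-activated emotions to be examined later.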

Elicitation Procedure
Before the experimental procedure began, the participants completed an informed consent form. The experiment was performed individually, with each participant alone in a room separate from the investigator, and lasted approximately 35 minutes. The first video, which was also the first stimulus for the neutral condition, contained on-screen instructions. These included keeping postural changes to a minimum, keeping hands and hair away from the face and maintaining eye contact with the stimuli. We assessed six emotional states (happiness, sadness, anger, surprise, fear, and disgust). Stimuli were always presented in the same order. To avoid residual emotional effects, a distractor stimulus lasting 10 minutes was presented between emotional film clips. The distractor consisted of simple mathematical operations (for example, 6 + 5 - 4 =) presented in the center of the screen in white print on a black background. All videos included five seconds of white screen before and after. All participants saw all the videos.
The videos were presented on a 23" PC screen placed 60 cm from the participants, who were seated and listened to the audio through headphones. The room was dark, with only a couple of LED lamps illuminating the participants so that their faces could be recorded via a webcam (Microsoft Lifecam Studio 1425 1080p HD).
Procedures for the study complied with the ethical principles stipulated by the Clinical Research Ethical Committee of the Sant Joan University Hospital (Reus, Spain) with reference number 15-01-29/1proj1.

Statistical Analysis
The statistical software IBM SPSS® v25 (IBM Corporation, New York, USA) was used to analyze the results. Homogeneity of variance was assessed using Levene's test. One-way ANOVA and Tukey-adjusted pairwise tests were used to analyze differences among the six emotional intensities for each film clip. To control for residual effects (Rottenberg et al., 2007), all emotional intensities were also compared with those of the corresponding distractor using multiple t-tests with Holm-Sidak correction. Statistical significance for all tests was set at p < .05.
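The analyses were run in SPSS. As an illustration only, the one-way ANOVA F statistic can be sketched in plain Python as the ratio of between-group to within-group mean squares. The function name `one_way_anova` and both data sets are hypothetical. The second data set merely mirrors the design size (six emotions, 65 participants each), which gives degrees of freedom of 5 and 384 when the six intensities are treated as independent groups.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Small toy data set (invented values) to exercise the function.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_toy, df_b_toy, df_w_toy = one_way_anova(toy)

# Synthetic data with the same layout as the design: 6 emotions x 65 participants.
design = [[((7 * i + j) % 10) / 10 for i in range(65)] for j in range(6)]
_, df_between, df_within = one_way_anova(design)
print(df_between, df_within)  # degrees of freedom: 5 and 384
```

Tukey-adjusted pairwise tests and Levene's test are not reproduced here; in practice a statistics package would be used for those steps.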

Results
The second video ("Beech trees") also had an overall Emotion effect (F(5, 384) = 2.93, MSE = 0.0004, p = .013), and the emotional intensity of sadness was higher than that of fear (p = .011) (Figure 1B). The "Beech trees" stimulus produced no statistically significant changes in emotional intensities in comparison with its distractor (Figure 1C).
HAPPINESS ("The hangover" and "There's something about Mary"): The Emotion factor was statistically significant with "The hangover" clip (F(5, 384) = 35.90, MSE = 0.149, p < .001). For this first video, the emotional intensity of happiness was significantly higher than that of all other emotions (all ps < .001) (Figure 2A). Multiple t-test comparisons showed that only the emotional intensity of happiness increased in comparison to the values for the distractor (p < .001) (Figure 2B).
An overall Emotion effect (F(5, 384) = 37.00, MSE = 0.101, p < .001) was also observed for "There's something about Mary", with the intensity of happiness higher than that of any other emotion (all ps < .001) (Figure 2C). Multiple t-test comparisons showed that the emotional intensities of happiness and fear increased in comparison to the values for the distractors (p < .001 and p = .003, respectively) (Figure 2D).
SADNESS ("Schindler's list" and "Kramer vs. Kramer"): An overall Emotion effect (F(5, 384) = 3.60, MSE = 0.0009, p = .003) was detected with "Schindler's list", and the intensity of anger was higher than that of happiness, surprise or fear (p = .022, p = .029 and p = .005, respectively) (Figure 3A). The intensity of anger for "Schindler's list" was higher than for its distractor (p = .002) (Figure 3B).
There were no significant differences between emotional intensities for "Kramer vs. Kramer" (Figure 3C), nor were there any significant increases in emotional intensities compared to the values for the distractor (Figure 3D).

Figure 1
Intensity means for each emotional state for "Instructions" (A) and "Beech trees" (B). Results are shown for mean and standard error. Letters (a,b) indicate significant differences between emotional states at p < .05. Comparisons of each emotional state intensity for "Beech trees" with those for the distractors (C).

Studia Psychologica, Vol. 62, No. 4, 2020, 350-363
ANGER ("The piano" and "Leaving Las Vegas"): For the first of these videos, no analysis revealed statistically significant differences in mean emotional intensity (Figures 4A and 4B).
There was no overall Emotion effect for the second video, "Leaving Las Vegas" (Figure 4C). However, multiple t-tests showed that the intensity of anger increased significantly in comparison with the distractor (p = .004). A decrease in the intensity of disgust was also observed (p = .031) (Figure 4D).
SURPRISE ("Capricorn one" and "Sea of love"): There were no statistically significant differences either between emotional intensities or between the intensities and the distractors for either stimulus.
FEAR ("The shining" and "The ring"): There was an overall Emotion effect (F(5, 384) = 5.74, MSE = 0.0013, p < .001) for "The shining". Tukey's multiple comparisons test showed that the intensity of happiness was higher than that of all the other emotions (compared to sadness, anger and surprise, p = .001; compared to fear, p < .001; and compared to disgust, p = .002) (Figure 5A). There were no changes with respect to the distractor (Figure 5B).
No statistically significant differences between emotional intensities were observed for the second video in this condition, "The ring" (Figure 5C). As with the first video, no changes in emotional intensities were observed with respect to the previous distractor (Figure 5D).

Figure 2
Intensity means for each emotional state for "The hangover" (A) and "There's something about Mary" (C). Results are shown for mean and standard error. Letters (a,b) indicate significant differences between emotional states at p < .05. Comparisons of each emotional state intensity with those for distractors for "The hangover" (B) and "There's something about Mary" (D). Results are shown for mean and standard error. Asterisks indicate significant differences at p < .01 (**) and p < .001 (***).

Figure 3
Intensity means for each emotional state for "Schindler's list" (A) and "Kramer vs. Kramer" (C). Results are shown for mean and standard error. Letters (a,b) indicate significant differences between emotional states at p < .05. Comparisons of each emotional state intensity with those for distractors for "Schindler's list" (B) and "Kramer vs. Kramer" (D). Results are shown for mean and standard error. Asterisks indicate significant differences at p < .01 (**).

Figure 4
Intensity means for each emotional state for "The piano" (A) and "Leaving Las Vegas" (C). Results are shown for mean and standard error. Comparisons of each emotional state intensity with those for distractors for "The piano" (B) and "Leaving Las Vegas" (D). Results are shown for mean and standard error. Asterisks indicate significant differences at p < .01 (**) and p < .05 (*).

Figure 5
Intensity means for each emotional state for "The shining" (A) and "The ring" (C). Results are shown for mean and standard error. Letters (a,b) indicate significant differences between emotional states at p < .05. Comparisons of each emotional state intensity with those for distractors for "The shining" (B) and "The ring" (D).

Figure 6
Intensity means for each emotional state for "Necrosis" (A) and "Pink flamingos" (C). Results are shown for mean and standard error. Letters (a,b) indicate significant differences between emotional states at p < .05. Comparisons of each emotional state intensity with those for distractors for "Necrosis" (B) and "Pink flamingos" (D). Results are shown for mean and standard error. Asterisks indicate significant differences at p < .001 (***), p < .01 (**), and p < .05 (*).

DISGUST ("Necrosis" and "Pink flamingos"): There was an overall Emotion effect for the first video, "Necrosis" (F(5, 384) = 9.58, MSE = 0.0214, p < .001), which indicates that the intensity of disgust was higher than that of all other emotions except happiness (compared to sadness, p = .020; anger, p = .014; surprise, p = .002; and fear, p = .003). Moreover, the intensity of happiness was higher than that of all other emotions except disgust (p < .001 in all cases) (Figure 6A). Multiple t-tests showed a significant increase in the intensities of fear (p = .007) and disgust (p < .001) (Figure 6B) with respect to the emotional intensities for the distractor.
The "Pink flamingos" video revealed differences in Emotion (F(5, 384) = 5.82, MSE = 0.0165, p < .001). The emotional intensity of happiness was higher than that of any other emotion (compared to surprise and fear, p < .001; compared to anger, p = .001; compared to sadness, p = .011; and compared to disgust, p = .030) (Figure 6C). When we compared stimulus intensities with those for the distractor, we observed a significant increase in the intensities of happiness, fear and disgust (p = .014, p = .020, and p = .011, respectively) (Figure 6D).
Table 2 shows a comparison between the emotions self-reported by participants in previous validation studies and the emotional facial expressions that increased after stimulus presentation in our study.
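The clip-versus-distractor comparisons reported in this section rely on multiple t-tests with Holm-Sidak correction. A minimal sketch of that step-down adjustment is given below for illustration. The function name `holm_sidak` and the raw p-values are hypothetical, and how many comparisons entered each correction family in the original analysis is an assumption not stated in the text.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is corrected for m tests, the next for m - 1, etc.
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw p-values from three clip-versus-distractor t-tests.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_sidak(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

With these invented values only the first comparison stays below the .05 threshold after adjustment, which shows how the correction can change conclusions drawn from unadjusted tests.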

Discussion
The scientific literature usually shows that film clips can elicit discrete target emotions. However, differences in methodological approaches make it difficult to compare emotional video databases. Moreover, the usual classification of film clips into one discrete emotion category does not enable co-activations to be considered for more complex emotional experiences. In this study we have used an automated method to assess continuous emotional facial expressions in response to a sample of film clips previously classified using self-reported questionnaires.
As expected, neither of the film clips used for the neutral condition ("Instructions" and "Beech trees") increased the intensities of the assessed emotions. Stimuli for the neutral condition are the most difficult ones to compare between databases because not enough information is available due to an absence of data and/or the use of emotional cluster analysis (Gabert-Quillen et al., 2015;Gross & Levenson, 1995;Philippot, 1993).
With regard to happiness, in agreement with previous studies (Fernández et al., 2011;Gabert-Quillen et al., 2015;Schaefer et al., 2010) both film clips elicited a facial expression of happiness. However, a possible mixed emotion was elicited with "There's something about Mary" since we clearly observed an increase in the expression of fear. Izard (1972) suggested that "one emotion can almost instantaneously elicit another emotion that amplifies, attenuates, inhibits or interacts with the original emotional experience". A study by Andrade and Cohen (2007) reported the co-activation of fear and happiness as a type of mixed emotion, which suggests that the co-activation observed in "There's something about Mary" may be related to the clip's sexual content.
With regard to sadness, we found that the first film clip ("Schindler's List") produced an increase in anger in comparison with the distractor. Despite provoking exactly the same mean intensity in anger and sadness, this stimulus was classified as sad in a previous study by Fernández et al. (2011), whereas Schaefer et al. (2010) classified this film clip twice in the same database (anger and sadness). In contrast, our data showed that this film clip elicited only anger in our participants. One explanation for this may be the mean age of the sample, which in our study was 21.44 (SD = 2.34), while in the study by Schaefer et al. (2010), where participants reported more anger than sadness for this film, it was 19.6 (SD = 3.11) and in the study by Fernández et al. (2011) it was 29.3 (SD = 12.4). These differences are consistent with the gradual decrease in anger expression across the lifespan reported by Blanchard-Fields and Coats (2008). The second film clip ("Kramer vs. Kramer") did not elicit any emotion in our participants, unlike those in the study by Philippot (1993), who used self-reported questionnaires and classified it as a sad stimulus.
Our results for anger showed that the first film clip ("The Piano") did not elicit any emotion, though there was a slight, non-significant increase in anger. With "Leaving Las Vegas", we observed an increase in the intensity of anger and a significant reduction in disgust in comparison with the distractor. Both film clips have been classified as anger stimuli in previous studies (Fernández et al., 2011; Schaefer et al., 2010). It is also interesting that the study by Fernández et al. (2011), which used the same two films, reported a higher intensity of anger for "The Piano".
Though we proposed "Necrosis" as a disgust stimulus, both fear and disgust were detected. In line with Izard (1992), the emotions of fear and disgust may interact with and amplify each other in this film clip. This interaction between fear and disgust has also been suggested by Morales, Wu, and Fitzsimons (2012). The second film clip ("Pink flamingos") was previously classified as a disgust stimulus (Fernández et al., 2011; Gross & Levenson, 1995; Rottenberg et al., 2007). However, we observed a more complex emotional profile, with increases in happiness, fear and disgust. Our results agree with reports of a possible co-activation of disgust and amusement as a kind of mixed emotion (Hemenover & Schimmack, 2007) in which the activation of happiness compensates for the aversive experience of the co-activation of fear and disgust.
Finally, we observed no differences in emotional intensity for surprise or fear even though previous databases reported discrete classifications for the film clips used in our study. Our data seem to indicate a residual effect for "The Shining" since we observed a higher intensity of happiness within the film but no changes in comparison with the distractor. Moreover, as suggested by Rottenberg et al. (2007), the stimuli for surprise have temporal characteristics that differentiate this emotion from the others, i.e. the short duration between onset and offset makes it a short-lived emotion that is hard to capture in analyses of facial emotional expression. With regard to fear, previous studies have also demonstrated that the results of behavioral and physiological measures do not match those of self-reported methods (Rottenberg, Kasch, Gross, & Gotlib, 2002).
In summary, our results suggest that happiness, anger and disgust are the easiest emotions to elicit and detect in young people using emotional facial recognition software, whereas surprise and fear are the most difficult. Our results also show that both single and multiple emotion activations can be elicited and detected using this method of assessment. Moreover, the results show that film clip stimuli have a complex emotional profile that is difficult to classify into discrete emotional categories, indicating that emotional experience in humans is more complex than the simple stimulus-response paradigm allows. Finally, the incongruence between the emotions self-reported in previous studies and those facially expressed in our study using the same film clip stimuli indicates that there is a difference between the experienced emotion and the thought-out or self-reported emotion.
In addition to these new findings, however, our study has several limitations. The first limitation is the relatively small number of participants. Moreover, the sample was not representative because most of the participants were women. Second, although the stimuli used in the study have been previously validated through self-reported questionnaires, obtaining this information from the same sample in future research could be useful to corroborate our results. Third, we did not collect information about whether the film clips were previously known to participants or were seen for the first time. Finally, we did not evaluate the exact temporal co-occurrence of emotions. Further research is needed to examine the evolution of the emotional state over time in order to determine the exact affective chronometry of video clips and enable non-emotional or unintended emotional fragments to be removed if necessary. Simultaneous physiological and subjective measures, as well as gender and age, should be taken into account in order to better examine the co-occurrence of emotional processes. For more wide-ranging research, we recommend taking the above-mentioned factors into account.