Emotion processing in different media types: Realism, complexity, and immersion

To improve ecological validity when studying real-life phenomena, research has increasingly employed more complex and realistic materials, from pictorial and verbal stimuli (e.g., movies vs. pictures, narratives vs. single words) to interactive or virtual settings. This article aims to understand the emotional impact of these different media types. It first summarizes neuroimaging findings on emotion processing, focusing on the development toward more realistic and complex materials. The presented literature shows that all media types, from simple words to complex movies, may induce consistent emotional responses, mirrored in activations in core emotion regions. Regions related to the (embodied) simulation of another’s bodily state and to mentalizing, the cognitive representation of another’s mental state, are particularly reported in response to more complex, narrative or social materials. Other media-specific responses are described in sensory or language brain regions, while dynamic and multimodal stimuli are reported to yield behavioral advantages together with increased emotional brain responses. Finally, the article discusses the role of immersive processes for emotional engagement in different media settings. The potential to immerse the viewer in fictional or artificial worlds is proposed as a crucial modulator of emotional responses in different media types, leading to the formulation of open questions and implications for future research on emotion processing.

Correspondence to: Lorna Schlochtermeier, Freie Universität Berlin, Habelschwerdter Allee 45, 14195 Berlin; Tel: +49 (0)176 64760649; Fax: +49 (0)30 838 55620; E-mail: l.schlochtermeier@fu-berlin.de

A recent aim of affective neuroscientific research is to use more complex, dynamic and social stimulus materials to create more naturalistic settings and thus overcome the problem of studying real-life phenomena in an artificial laboratory context [25][26][27]. With regard to media reception, the particularly powerful emotional effects of watching narrative movies [28] or reading literature [29][30][31] have been emphasized. This review aims to analyze the emotional impact of different media types, focusing on the development from simple to more complex and realistic materials. To this end, it illustrates similarities and dissociations of the neural networks underlying emotional processing in different media types. It then discusses how the potential to make viewers or listeners dive into and become immersed in artificial or virtual worlds [29,32,33] may play a crucial moderating role for emotional responses to different media types. This article focuses on review articles and neuroimaging studies to capture the main regularities as well as important findings of single studies (based on a search of PubMed and Web of Science for work published after 2000). The search keywords were: emotion processing, fMRI, narratives, movies, films, virtual reality, multimodal, dynamic, complex. Rather than giving an exhaustive review of the literature, we present those studies that best illustrate a comparison between media types. As this procedure and the present state of research appear insufficient to reach final conclusions, outstanding research questions are formulated instead.
The basic question we address is how the response to affective information is moderated by the medium in which it is presented. As an example, consider a picture (i.e., a photo of a snake) from the International Affective Picture System (IAPS, [9]) and its corresponding verbal representation from the Affective Norms for English Text (ANET, [34]): "You watch a giant snake coiled in a display case. You freeze, as the snake's eyes move in your direction, and a red forked-tongue darts out." Is a perceptually more realistic pictorial representation generally more engaging? What if a verbal description leads to deeper attentive and semantic processing, engaging immersive processes like being transported into and absorbed by a story world [e.g., 30,31,35,36,37,38]? We will examine this issue below by considering verbal, pictorial, dynamic audiovisual, virtual and interactive stimulus classes, followed by a discussion of the role of immersion in the different media settings.

Emotion processing with different media types
Verbal stimuli: Words, sentences, poems and narratives
Simple words, sentences, scripts, poems or narratives have been used as stimuli to induce emotional responses. Although perceptually non-realistic, emotional content in verbal stimuli can evoke strong and reliable emotional responses at the behavioral and brain level, engaging manifold brain networks related to language, mentalizing, reward and emotion processing [31][32][33][34][35][36][37][38][39]. In single-word recognition, an emotion processing network including the anterior cingulate cortex (ACC), orbitofrontal cortex (OFC), hippocampus (HP), and extrastriate areas is typically activated [e.g., 39,40,41,42,43,44], while amygdala (AMY) activations are usually found only for highly arousing words [45]. Research on more complex verbal materials, such as sentences, poems, and narratives, suggests an involvement of reward areas [3,19] as well as interactions between emotion processing and mentalizing, the cognitive representation of another's mind (also referred to as theory of mind) [e.g., 46]. Activations are reported consistently in the emotional network together with regions related to mentalizing, such as the temporoparietal junction (TPJ), superior temporal sulcus (STS), middle temporal gyrus (MTG) and medial prefrontal cortex (MPFC) [4,37,47,48]. Furthermore, intense emotional content in written narratives is reported to cause a joint recruitment of both emotional and language regions of the brain [48,49], and engaging in stories activates both mentalizing networks and premotor networks, the latter related to the embodied simulation of another's actions [50]. When listening to emotional stories, in addition to emotion processing, language and mentalizing networks [51], the left anterior temporal lobe plays a key role [52]. 
Emotional prosody further activates the human-selective voice area and the posterior part of the STS, corresponding to the processing of the emotional content of the speaker's voice [53,54]. Synchronizations between subjects in sound and speech comprehension networks (as well as embodied simulation networks) are reported to be modulated by valence and arousal when listening to language [55].

Pictorial stimuli: Pictures of scenes, static and dynamic faces
Compared to words, perceptually more realistic pictures are often reported to have behavioral advantages and to evoke stronger brain responses [39,56]. The literature indicates that complexity contributes to these findings: when controlling for complexity, Schlochtermeier et al. [57] did not replicate this effect, suggesting that a picture's perceptual complexity may account for its increased emotional brain response compared to a word. Pictorial stimuli usually depict complex sceneries or facial expressions, which are presented statically or dynamically. They produce powerful responses in emotional brain regions [58], engaging processes related to the depicted content in face, object or social processing regions. Compared to photos of facial emotional expressions, photos of natural sceneries show specific activations in the lateral occipital cortex and thalamus as well as common activations in emotional networks. Faces, on the other hand, activate specific regions such as the fusiform gyrus (FG) and the STS [e.g., 59]. Increasing complexity and realism by presenting facial expressions dynamically rather than statically results in more pronounced activations. Dynamic emotional expressions are reported to evoke stronger activations in regions such as AMY, MTG, FG, STS and the inferior frontal gyrus (IFG), which, together with findings of faster and better emotion recognition, have been attributed to emotional responding being facilitated through increased facial, movement or social processing [review: 60, meta-analysis: 61].

Dynamic audiovisual stimuli: Silent film clips, audiovisual movies and 3D movies
Apart from short sequences of facial expressions, several stimulus sets taken from fictional movies have been developed with the aim of inducing powerful emotional responses at the behavioral and brain level [e.g., 13]. Regarding neural correlates of emotions elicited by film, Goldin et al. [62] report different emotional networks for sadness, including activations in MPFC, IFG, STG, and AMY, and for happiness, including medial frontal gyrus (MFG), IFG, dorsolateral PFC, temporal lobes (TL), and the caudate nucleus (CAUD). Film stimuli have been used without sound, but also, increasing the stimulus' complexity, with music, thus including the complete movie composition. When comparing uni- and multimodal stimuli, several studies report emotional modulations in regions associated with cross-modal convergence in the TL for multimodal stimuli [e.g., 63,64] as well as stronger responses in limbic and paralimbic structures for congruent facial and vocal emotion expressions [e.g., 65]. It is reported that emotional multisensory information is integrated at multiple levels, from unisensory cortices to higher-order association areas [15,26], which may lead to additive emotional effects [66]. Besides emotional brain activations, networks related to simulation and mentalizing are reported during empathic engagement in audiovisual movies [67], as well as correlations of subjective valence ratings with activations in emotional and sensory simulation regions [68]. Revealing the powerful emotional effects of film stimuli, Nummenmaa et al. [69] showed that, while participants viewed movies, valence modulated widespread synchronizations in emotion and simulation networks, and arousal modulated activation of dorsal attention networks.

Interactive settings and virtual environments
A recent plea to overcome the spectatorial perspective of participants passively watching, reading or listening to emotional or social situations has led to the introduction of methods that include virtual, real-life and interactive settings [20,70,71]. Several studies indicate that the inclusion of bodies, interaction, a further spatial dimension, or further modalities changes the way emotions are processed. Research on emotions in multimodal settings points to the relevance of all human senses for emotion processing [26]. Pleading for the inclusion of bodies in naturalistic emotion research, de Gelder et al. [27] reported that the observation of bodily expressions activates more brain areas than the observation of facial expressions, including the FG and motor-related areas. In addition, the variation of spatial proximity modulates activation in visuospatial brain regions in threat situations, indicating that a spatial dimension may add to the understanding of social emotions [72]. Using video games, Mathiak et al. [21] showed that emotional as well as spatial and self-referential regions played a role in complex affect changes in a virtual environment, while active engagement in a video game influences striatal (CAUD, PUT) reward circuits [73]. Interactive compared to passive perspectives are suggested to differentially involve simulation networks [20]. This indicates that, besides recruiting additional brain areas and facilitating emotional responding, interactive or virtually realistic settings, as compared to passive perceptive ones, might change how we understand another person's feelings or thoughts and thus change the way we evaluate and experience emotional situations.
J Integr Syst Neurosci, 2015 doi: 10.15761/JSIN.1000109
In all, hard and fast distinctions between the processing of emotional information in different media types are difficult to draw. Rather, the literature suggests a complex interplay of emotional information with the language and sensory properties of a stimulus (a tentative overview of processing differences is illustrated in Figure 1). While common activations in core affective regions are consistently reported, emotional information engages several specific processes related to the setting, such as speech, language, movement, object and face recognition as well as multimodal integration, mentalizing, simulation and empathizing. Such interactions of emotional responses with the medium or setting may be interpreted within constructivist accounts of emotion [16,74,75]. These theories propose that emotional experiences recruit situation-specific distributed modal systems of the brain, including the perception of the external environment, internal bodily sensations, and mentalizing, which may explain the media-specific involvement of sensory, sensorimotor or mentalizing processes in emotion processing. Providing a framework for the role of language in emotion processing, Kölsch et al.'s Quartet Theory [76] appears to be the first to include language in its model of the human emotion system. It proposes that language may function to communicate, evoke or regulate emotions. Besides stimulus-specific brain activations, the research presented above suggests some processing advantages for more complex materials, in the sense of behavioral advantages and additional, more pronounced emotional brain activations. If emotional information is presented in more than one modality [26] or includes movement [61], the additional information is integrated in multisensory brain regions, probably facilitating emotional responses, as shown by advantages in emotion recognition and by increased connectivity with and activations in emotional brain regions [15,26,66]. 
In a similar vein, information gained within dynamic interaction and from an additional spatial dimension is suggested to facilitate and modulate the understanding of other people and emotional responding [20,27], while music and affective prosody in language induce sensations directly [76,77]. Apart from some advantages of more complex materials, the reported studies show that more realistic stimuli as well as purely symbolic language or artificial movies are able to elicit reliable and strong emotional responses.
Along these lines, several studies even suggest that fictional settings or artificial movies in particular may have a high emotional impact due to their potential to let the reader simulate a story world [47] or to create a sense of reality [28] by recruiting the brain's mental and embodied simulation systems. Immersion, the concept of being absorbed by or diving into an artificial world [30,31,36,37,38,78,79,80], provides a framework for this phenomenon. Its role for emotional engagement, its relation to realism and its neural mechanisms in different stimulus categories are therefore discussed in the next section.

Immersion
Figure 1. Main differences in emotion effects between the various media types and specific emotion effects evoked by more complex media types (as described by overview articles or the majority of single studies). For clarity, regions are assigned to networks adapted from the constructivist model by Lindquist et al. [75] and from descriptions of mentalizing (MENT) and embodied simulation (ES) networks by Gallese and Guerra et al. [28], Schilbach [20] and Raz et al. [67], coded with different colors (core affect: OFC, ACC, PUT/CAUD, AMY; conceptualization: PCC, mPFC; visual: OCC, FG; language: IFG, TP, STS; mentalizing: TPJ, STS, mPFC, PCC; embodied simulation: IFG, IPL/PRECUN, INS, ACC/MCC, PM/SS (premotor/somatosensory cortex)). It should be noted that this is a simplification, i.e. only key brain regions are depicted and the networks do overlap.

First of all, it should be noted that the concept of immersion is far from unified [30]. It has been defined as the psychological state of constructing a mental picture of, and being transported into and absorbed by, a story world [e.g., 29,36,78,79]. Immersion potential in this context refers to the potential of a medium to induce immersive processes [30,31,79]. A slightly different position, suggested by Slater [81], states that immersion should refer to the technological property of a medium to make an artificial setting seem real, whereas the psychological response should be referred to as "feeling of presence". Although a unified conceptualization is still lacking, immersive processes have been reported to be highly correlated [31] and bidirectionally interrelated with emotional engagement [82]. To better understand the relation between emotional engagement, immersion and realism, Blascovich's social threshold model [83] may be helpful. It proposes that the degree to which an artificial, immersive setting impacts a participant cognitively or emotionally can be referred to as social influence. 
Social influence, here, is determined by the subjective belief that characters are real (social presence), together with the degree to which a presented behavior is realistic (behavioral realism). In other words, the immersion potential of a medium, the realism of the presented behavior, and the emotionality of the content determine the degree of emotional engagement. In this sense, watching a beach ball could be as engaging as watching a real human being, if the setting was immersive and its behavioral aspects realistic and emotional [83]. Supporting the notion that characters do not have to be real to have an emotional impact, studies have shown that expressions of emotions in avatars are perceived and processed similarly to human emotions [27,84]. It is worth noting that the immersion potential of written narratives can be high beyond objective realism [85] and was even reported to be the highest compared to film and music [36]. This issue is addressed by an ongoing discussion on the neurocognitive mechanisms underlying immersion and the sense of presence as well as on technological or compositional factors that may modulate the immersion potential of a medium. According to Ryan [79], the most likely subjective cause of immersion in verbal narratives is theory of mind. Immersion is also suggested to rely on symbol grounding, the grounding of language in sensory-motor brain areas [86], or on the closely related concept of neuronal reuse [87,88], the functional re-use for language of brain areas originally related to face, pattern and object recognition [29]. With regard to movies, it has similarly been suggested that their "reality effect" relies on embodied simulation [28]. Along similar lines, presence in immersive virtual settings is proposed to rely on embodiment processes [89], which here may lead to multisensory body ownership illusions, the belief that virtual body parts are one's own [90]. 
In virtual environments, sensory enrichment, visual scale or haptic feedback have been proposed to increase the sense of presence by facilitating embodiment processes [87]. In literature, inducing fiction feelings like empathy, sympathy, identification, suspense, or vicarious fear or joy facilitates immersion [31], as evidenced by activations in mentalizing regions being enhanced by fictional rather than realistic contextual knowledge [47] and by a suspenseful plot [30,31,36,91]. Backgrounding elements of a story or poem, such as familiarity or situational embedding, also facilitate immersion in reading [18], while in movies, compositional features may enhance immersive processes by facilitating embodiment [28]. Prototypical facial expressions or close-ups of faces, for example, may enhance facial emotional processing [60,92]. Attractive faces capture visual attention [93] and close-ups of tools engage motor simulation [92]. Bakels [94] suggests that lighting, frequency of cuts, or changes in camera angle modulate embodied simulations. To illustrate our assumption that less realistic language stimuli as well as more realistic settings have a high immersion potential, Figure 2 shows the hypothetical immersion potential of the different stimulus categories in relation to realism and complexity in a 3-D space.

Conclusion and outstanding questions
Figure 2. The graph illustrates the hypothetical immersion potential of the discussed stimulus categories (and studies using the materials for emotion induction) in relation to realism and complexity in a 3-D space. It should be noted that the localization of the categories in the 3-D space is not based on conclusive empirical findings. It rather represents a heuristic hypothesis, based on the literature on immersion and illustrates our assumption that perceptually non-realistic, complex language stimuli as well as very realistic settings may have a high immersion potential.

This article summarized recent neuroimaging findings focusing on emotion processing in more complex or realistic materials. All considered media types evoke reliable emotional responses, involving manifold brain networks. Media-specific activations are reported in sensory and socio-emotional regions related to mentalizing and embodied simulation. Additional brain activations and a modulation of socio-emotional effects are found for dynamic, multimodal and interactive stimuli. Further, aesthetically composed movies, as well as materials that rely on our imagination, such as narratives or scripts, are able to evoke strong emotional responses. We therefore propose that the immersion potential of a given stimulus material is a crucial factor determining emotional responses in different media settings. Specific technological or compositional features of a material/medium and its context control its immersion potential, e.g., by facilitating mentalizing and embodiment processes. Experimental evidence for neural correlates of the psychological state of immersion [16,37], and the significance of simulation processes for passive compared to interactive settings [20] are still scarce and inconclusive. In sum, the current literature highlights the importance of investigating the emotional impact of imaginative and passive perceptive as well as interactive or real-life settings, to advance our understanding of the human emotion system. Differences in the processing of and interactions between media types may help understand the complex interplay of symbolic language and pictures as well as of self-referential and (inter-) action-related aspects for emotion processing.