Multimodal Emotion Elicitation and Recognition in Virtual Reality

Kuang, Zheyuan

Access status: Open Access

Field	Value	Language
dc.contributor.author	Kuang, Zheyuan
dc.date.accessioned	2026-06-15T03:13:49Z
dc.date.available	2026-06-15T03:13:49Z
dc.date.issued	2026	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/35413
dc.description.abstract	Virtual Reality (VR) has been effectively used for eliciting emotions, yet most research focuses on the intensity of affective responses rather than on how interaction influences those experiences. To address this gap, this thesis advances a validated VR emotion-elicitation dataset through two extensions. First, we add a new high-arousal, high-valence scene and validate its effectiveness in a within-subject study (N=24). Second, we create interactive and non-interactive versions of each scene to examine the impact of interaction on emotional responses. We evaluate interaction using subjective ratings and physiological signals. Our evaluation study (N=84) shows that interaction not only amplifies emotions but also modulates them in context, supporting coping in negative scenes and enhancing enjoyment in positive scenes. Multimodal Emotion Recognition (MER) increasingly depends on fine-grained, evidence-grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligned across modalities. This thesis presents an LLM-assisted toolkit that supports multimodal emotion data annotation through an inspectable, event-centered workflow. The toolkit aligns heterogeneous recordings, visualizes modalities on a shared timeline, and packages synchronized keyframes and time windows as traceable event packets. It then uses modality-specific tools and prompt templates to draft structured annotations for analyst verification and editing. Building on the dataset extensions and the annotation toolkit, this thesis further investigates MER modeling approaches in VR that integrate behavioural and physiological signals from VR headsets and wearable sensors. We introduce an LLM-based Mixture-of-Experts (MoE) framework, where experts specialize in different modalities and a router assigns weights to experts for each event. The goal is to connect predictions to traceable multimodal evidence and support interpretation of affective cues in interactive VR.	en_AU
dc.language.iso	en	en_AU
dc.subject	Virtual Reality	en_AU
dc.subject	Emotion Elicitation	en_AU
dc.subject	Affective Computing	en_AU
dc.subject	Affective Interaction	en_AU
dc.subject	Multimodal Emotion Recognition	en_AU
dc.title	Multimodal Emotion Elicitation and Recognition in Virtual Reality	en_AU
dc.type	Thesis
dc.type.thesis	Masters by Research	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en_AU
usyd.degree	Master of Philosophy M.Phil	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Sarsenbayeva, Zhanna
usyd.include.pub	No	en_AU