Abstract
We continually and effortlessly make meaning of the sonic world around us. In music and speech, complex auditory percepts arise when the sensory encoding of physical stimulus properties interacts with the neural substrates of domain-specific structural knowledge. For example, in tonal music, the psychological representation of a pitch is determined not only by its afferent sensory properties, but also by its functional relationship with the musical context. Although behavioral, psychophysical, neuroimaging and neurophysiological research over the past several decades has advanced our understanding of how this phenomenon emerges, we have yet to fully explain the neural mechanisms that underlie our ability to extract meaningful information from an acoustic waveform arriving at the ears.
Over the course of three separate experiments, this thesis examined the representational dynamics of musical pitch (chapters 4 & 5) and speech (chapter 6) in human cortex. In each experiment, multiple stimuli of interest were presented to listeners while their magnetoencephalographic (MEG) or electroencephalographic (EEG) activity was recorded. In the first two experiments, stimuli comprised a set of musical tones presented within a Western tonal context. In the third experiment, stimuli comprised eleven different phonemes – the smallest contrastive units of speech capable of changing word-level meaning. Examining multiple stimuli within the same experimental session enabled the neural representation to be characterized in terms of its collective dissimilarity structure. To measure the dissimilarity between the evoked cortical response patterns corresponding to two given stimuli, multivariate pattern analysis (MVPA) was applied to “decode” which stimulus listeners had heard from their underlying cortical activity. For each pairwise stimulus combination, decoding accuracy served as a proxy for representational distance in the brain. To evaluate the extent to which these neural distinctions honored sensory, acoustic, or perceptual features, the empirical M/EEG-based dissimilarities were compared with the predictions of various acoustic, peripheral, and perceptual models of the stimuli within the framework of Representational Similarity Analysis (RSA).
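To make this pipeline concrete, the following is a minimal Python sketch of decoding-based RSA. The array names X (trials × channels × times), y (stimulus labels), and model_rdm, and the choice of a logistic-regression classifier, are illustrative assumptions rather than the thesis's exact implementation.

    import numpy as np
    from itertools import combinations
    from scipy.stats import spearmanr
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def neural_rdm(X, y, labels):
        """Pairwise decoding accuracy as a proxy for representational distance."""
        n = len(labels)
        rdm = np.zeros((n, n))
        for i, j in combinations(range(n), 2):
            mask = np.isin(y, [labels[i], labels[j]])
            Xp = X[mask].reshape(mask.sum(), -1)   # flatten channels x times
            acc = cross_val_score(LogisticRegression(max_iter=1000),
                                  Xp, y[mask], cv=5).mean()
            rdm[i, j] = rdm[j, i] = acc            # chance level = 0.5
        return rdm

    def rsa(neural, model):
        """Rank-correlate empirical and model dissimilarities (returns rho, p)."""
        iu = np.triu_indices_from(neural, k=1)     # off-diagonal entries only
        return spearmanr(neural[iu], model[iu])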
In experiment 1, we sought to assess the extent to which pitch “class” (i.e. the harmonic function served by a tone within its musical context) could be decoded from neuronal population activity. With trained musicians as subjects, we recorded the MEG activity elicited by four different “probe tones” following a brief tonal context. Stimuli comprised four pitch classes whose harmonic and perceptual properties make them strong candidates for observing a clear representational structure in the brain. Two pitch classes (the tonic and dominant) were “in-key” and perceptually stable within the prevailing context, while the other two (the minor 2nd and augmented 4th) were “out-of-key” and highly unstable. Using MVPA, we observed that the cortical responses to stable and unstable pitch classes were highly separable from one another. To a lesser extent, the brain also distinguished between the two stable classes. However, neural distinctions between the unstable classes were relatively weak, suggesting that in the absence of a clear harmonic schema, the brain’s representation of pitch converges. These neural distinctions were best accounted for by a model based on the standard tonal hierarchy, indicating that differences in the cortical population coding of different pitches honored differences in their perceived stability.
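As an illustration, a stability-based model RDM for the four probe tones might be constructed as follows. The ratings are the well-known Krumhansl & Kessler (1982) major-key probe-tone profile; treating these exact values, and absolute rating differences, as the thesis's model is an assumption.

    import numpy as np

    # Krumhansl & Kessler (1982) major-key stability ratings (assumed values)
    stability = {"tonic": 6.35, "dominant": 5.19,
                 "minor 2nd": 2.23, "augmented 4th": 2.52}
    vals = np.array(list(stability.values()))
    model_rdm = np.abs(vals[:, None] - vals[None, :])
    # Predicts stable vs. unstable tones are maximally separable, while the
    # two unstable tones (|2.23 - 2.52| = 0.29) are nearly indistinguishable.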
The aims of experiment 2 were to complete the characterization of the neural representation of musical pitch across all twelve pitch classes, and to examine the temporal dynamics with which sensory representations of pitch (based on acoustics) interface with higher-level representations based on the tonal schema of Western music. Given the high temporal resolution of MEG, the representational dynamics of musical pitch were probed with a sliding classification window, training and testing a new classifier at each time point in the neural epoch. Two different models significantly predicted neural dissimilarities in different peristimulus time windows. Beginning 100 ms after onset, cortical distinctions were explained by differences in the fundamental frequency of tones. However, consistent with the findings of experiment 1, from 200 ms onwards the brain’s representation reflected the hierarchy of perceived stability. In addition to examining the brain’s representation of pitch within one key, we also measured the relationships between different major keys in cortex. In music theory, distances between keys are described by the well-known circle of fifths. Research suggests that the cognitive basis of “tonality” rests in the pattern of dissimilarities between individual pitches; we therefore reasoned that two keys should be related to the extent that they impose a similar structure on individual tones. Indeed, when the original neural distinctions were transposed into different musical keys and the collective dissimilarity structures were correlated with one another, the circle of fifths was recovered. The results of experiments 1 and 2 therefore provide a direct link between the complex perceptual structure of tonal music and its underlying origins in the cortex.
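The key-transposition analysis can be sketched as below. Implementing transposition as a circular rotation of the 12 × 12 pitch-class RDM, and the variable name neural_rdm_12 (the RDM estimated in one reference key), are illustrative assumptions.

    import numpy as np
    from scipy.stats import spearmanr

    def transpose_rdm(rdm, k):
        """Predicted RDM for the key k semitones above the reference key."""
        idx = np.roll(np.arange(12), k)
        return rdm[np.ix_(idx, idx)]

    def key_similarity(neural_rdm_12):
        """Correlate dissimilarity structures across all 12 major keys."""
        keys = [transpose_rdm(neural_rdm_12, k) for k in range(12)]
        iu = np.triu_indices(12, k=1)
        sim = np.zeros((12, 12))
        for a in range(12):
            for b in range(12):
                sim[a, b] = spearmanr(keys[a][iu], keys[b][iu])[0]
        return sim  # e.g. C major should correlate most with G and F major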
As with musical pitch, speech perception arises from the mapping of a continuous acoustic signal onto perceptually discrete, learnt categorical representations. The broad goal of experiment 3 was to understand the neural processing transformations in the ascending auditory pathway that enable a noisy and highly variable acoustic signal to be mapped onto an invariant representation of a given phoneme. Using stimulus decoding methods similar to those of the previous experiments, we characterized the dynamic representation of a set of eleven consonants based on their evoked EEG activity. Results indicated that cortical dissimilarities between consonants were commensurate with their articulatory distinctions, particularly their manner of articulation and, to a lesser extent, their voicing. To examine the relationship between consonant representations at the auditory periphery and in cortex, MVPA was also applied to modelled auditory-nerve (AN) responses to the consonants, and the time-evolving AN-based and EEG-based dissimilarities were compared with one another. Cortical distinctions between consonants in two periods of activity, centered at 130 ms and 400 ms after onset, aligned with their peripheral dissimilarities in distinct onset and post-onset periods respectively. By relating speech representations across articulatory, peripheral and cortical domains, we further the understanding of crucial transformations in the auditory pathway underlying our ability to perceive speech.
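A minimal sketch of this periphery-to-cortex comparison follows, assuming hypothetical time-resolved RDM stacks an_rdms and eeg_rdms (one 11 × 11 dissimilarity matrix per time point in each domain); correlating every pair of time points yields a cross-temporal map in which peripheral onset and post-onset structure can be aligned with cortical periods such as those around 130 ms and 400 ms.

    import numpy as np
    from scipy.stats import spearmanr

    def cross_temporal_rsa(an_rdms, eeg_rdms):
        """Correlate AN-model and EEG RDMs at every pair of time points."""
        iu = np.triu_indices(an_rdms.shape[-1], k=1)
        out = np.zeros((eeg_rdms.shape[0], an_rdms.shape[0]))
        for t_eeg, eeg in enumerate(eeg_rdms):
            for t_an, an in enumerate(an_rdms):
                out[t_eeg, t_an] = spearmanr(eeg[iu], an[iu])[0]
        return out  # rows: cortical time; columns: peripheral time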
In sum, by measuring cortical stimulus representations in a dynamic fashion and relating them to representations across acoustic, peripheral, and perceptual domains, this thesis furthers our understanding of crucial transformations in the auditory pathway that underlie our ability to perceive speech and music.