Cross-modality in multi-channel acousmatic music: the physical and virtual in music where there is ‘nothing to see’

by ADRIAN MOORE

Background

Making sense of the unknown

Perhaps the title should read: ‘Cross-modality in multi-channel acousmatic music: the physical and virtual in music where there is nothing to see (and where we are normally sitting down in a darkened room next to other people who have all paid for the rather bizarre privilege)’.


Figure 1: 2001: A Space Odyssey (Kubrick, 1968)

In 2001: A Space Odyssey, Moon-Watcher first hears a sound then goes to explore:

‘It was a rectangular slab, three times his height but narrow enough to span with his arms, and it was made of some completely transparent material; indeed, it was not easy to see except when the sun glinted on its edges.  As Moon-Watcher had never encountered ice, or even crystal-clear water, there were no natural objects to which he could compare this apparition. It was certainly rather attractive, and though he was wisely cautious of most new things, he did not hesitate for long before sidling up to it. As nothing happened, he put out his hand, and felt a cold, hard surface.’ (Clarke, 2010)

Of course, prior to finding the slab, Moon-Watcher had attempted to reach out and touch the moon. And if we look at the picture, I’m not sure what would scare Moon-Watcher more: the fact that the slab is not casting a shadow or the question ‘who dug this up?’ But the fact of the matter is that, within a natural environment, we suddenly have the alien. Why does Moon-Watcher almost immediately go up to it, touch it and then proceed to taste it? Because for him, the situation is real. The same holds for acousmatic music: however unreal or surreal its sonic worlds, and whether its sounds are natural recordings, synthetic sounds or manipulated sound files, if the environment is perceived to be as real or as plausible as possible we should be able to do more than just hear it. We should be able to attempt to understand it. We might even become part of it, moved beyond a merely academic attempt to decipher sounds. As composers, however, we are sculpting sound. The analogy of a potter working with clay is often used to explain the ‘hands-on’ nature of acousmatic music, where the rules are formulated upon what we hear. We often rely upon the inherent tactility of this analogy to mitigate the lack of a visual presence on stage.

It is very interesting to note that whilst composers and psychologists have approached the problem of cross-modality in music from vastly different starting points and methods, their conclusions are surprisingly similar. It is also interesting that the differences between working with acousmatic music and Western classical music are not as divisive as one might at first think. We can compare Wishart (1986) and Smalley (2007) on aspects of landscape and environment with Gaver’s notions of ‘hard’ and ‘soft’ and the natural sonification of materials (Gaver, 1993a, 1993b); we can place Godøy’s action gestures of ‘hit’ and ‘blow’ (2003; 2006; 2010) alongside Smalley’s notions of surrogacy (1996); and underpinning our empirical listening strategies are Gibson’s ecological notions of affordances (1986), which highlight the unpredictability of our experiential listening.

Composition in the Studio

Let us not forget that, despite the acousmatic veil of the loudspeaker and the listening challenges this provokes, many sonic gestures, even if developed purely through computer code, have their shapes rooted in instrumental gesture and in an analogue musique concrète practice that began with the mixing desk and has moved via the mouse to the Kinect, Leap Motion, video capture and a growing number of instrumental interfaces. For example, in the analogue studio the motion trajectories of sound were always exaggerated through fader gestures, and this practice continues today during expressive ‘sound diffusion’ (Figure 2).


Figure 2: Artificial and natural shape design

The survival instinct

Our instinctual listening modes allow us to analyse the auditory scene (see Bregman, 1994), dissect it for gestalt properties relating to figure and ground, and then prioritise our listening towards what we think is important, based upon proximity, position, amplitude (denoting size) and pitch (denoting speed), often leading to an assessment of threat. Our dissection of an acousmatic phrase works in a similar fashion, though the threats are different. As we dissect, we begin to traverse a continuum of gesture (metaphor to the body) and texture (metaphor to our sense of touch). This division and comparison (especially during composition and reflection) affords frequent use of onomatopoeia as we internalise sounds with words. Again, the composer’s categorisation of sounds as ‘short/long/high/low’ resonates with a long line of empirical research based mainly in instrumental music, from Pratt (1930) through to Rusconi et al. (2006).

As a composer of acousmatic music I am often led from a ‘quick and dirty’ approximation of a sound’s key characteristics to techniques of development that bring the sound to its polar opposite (short – make longer; low – make higher). This approximation is often compounded by an exaggeration of a sound’s properties, something Peter Lennox calls cartoonification (Lennox and Myatt, 2011). The process also affords embodied cognition: key features are retained through imitation, exaggeration and approximation, setting up cross-modal links along the way. As part of my teaching I tend to use a key set of polar opposites to define both sounds and technical processes, and I consider that we choose transformations as a reaction against a sound’s most potent descriptor.
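
By way of rough illustration only (assuming Python with the librosa and soundfile libraries, neither of which plays any part in the works discussed here, and with an entirely hypothetical source file), such a polar-opposite transformation might be sketched as follows:

# A minimal sketch of driving a sound towards its 'polar opposite'
# (short -> longer, low -> higher). The file names and parameter
# values are hypothetical.
import librosa
import soundfile as sf

y, sr = librosa.load("short_low_thud.wav", sr=None)   # hypothetical source file

# Exaggerate the sound's duration: stretch to four times its length.
longer = librosa.effects.time_stretch(y, rate=0.25)

# Exaggerate its register: shift up by an octave (12 semitones).
longer_higher = librosa.effects.pitch_shift(longer, sr=sr, n_steps=12)

sf.write("long_high_variant.wav", longer_higher, sr)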

Main Contribution

Environment and the multi-channel condition

It was stated at the outset that listening to acousmatic music is challenging. Quite often the foreign nature of the sounds themselves immediately sets up barriers and creates distances between listener and landscape. The multi-channel disposition of sounds (and to a lesser extent the multi-channel diffusion of stereo material) can afford an immediate immersion, a throwing in at the deep end if you like. From that point there is the potential (as yet not rigorously explored, only speculated upon here) for the listener to orient their listening with a certain degree of freedom, because the sonic landscape or environment can carry a degree of redundancy within it – something that, whilst not counter to the musical flow, affords a degree of polyphony beyond that achieved by a stereo mix by employing multiple channels of audio output through multiple loudspeakers. Through envelopment and immersion the separation of gesture and texture can be made explicit; so too the horizontal and vertical in sound.

However, despite this immersion, the listener must still negotiate their presence within the space. Smalley’s notions of personal space, distal space and distorted space (2007), and ideas of being in (flowing with) and out of (reflecting upon) time, help us place ourselves as: inside and touching; inside and searching; or outside and searching. The links between space and time help the composer with transformations and ultimately enable structures to take on form. But how are these immersive environments constructed?


Figure 3: Being in and out of time

Ambisonic recordings and the soundfield

The process of ambisonic recording and manipulation affords an accurate representation of a three-dimensional soundfield. Encoding and decoding require careful attention if the recreation is to be exact. In the University of Sheffield Sound Studios we convert recorded A-format signals (from four capsules arranged in a tetrahedron) to B-format (W, X, Y, Z) and decode from there to either 5.1 or 8-channel formats. Unfortunately, we lack the full three-dimensional immersion that speakers at floor and ceiling level would provide. However, there is a solidity to the recreated soundfield, even in a flat plane. This solidity is subtle, and in composition – where distortions can creep in quickly – a realistic space can be simulated quite easily. The compositional balance, however, lies between the composer’s intention of integrity of reproduction and the listener’s acceptance of that reproduction. Notwithstanding any additional composition, the physics of the captured field assures a degree of plausibility through the natural interaction of the sounds recorded.
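
The A-format to B-format stage itself reduces to a simple sum-and-difference matrix over the four capsule signals. The sketch below shows the standard first-order relationship, assuming the classic tetrahedral capsule ordering; real microphones additionally require capsule calibration and filtering, so this is illustrative rather than a drop-in converter:

# A minimal sketch of the first-order A-format to B-format conversion,
# assuming the classic tetrahedral capsule ordering:
#   lfu = left-front-up, rfd = right-front-down,
#   lbd = left-back-down, rbu = right-back-up.
import numpy as np

def a_to_b(lfu: np.ndarray, rfd: np.ndarray,
           lbd: np.ndarray, rbu: np.ndarray):
    """Return (W, X, Y, Z) from four A-format capsule signals."""
    w = lfu + rfd + lbd + rbu          # omnidirectional pressure component
    x = lfu + rfd - lbd - rbu          # front-back figure-of-eight
    y = lfu - rfd + lbd - rbu          # left-right figure-of-eight
    z = lfu - rfd - lbd + rbu          # up-down figure-of-eight
    # Some conventions additionally scale W by 1/sqrt(2).
    return w, x, y, z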

Sounds all around me. Or, is multi-channel music relieving me of my imagination?

Whilst I am convinced of the relative increase in expressive potential afforded by working in multi-channel formats, this in no way diminishes the carefully crafted stereo masterpieces of the last sixty years, which are no less complex and demand an equal degree of active listening. Multi-channel acousmatic music may afford sonic redundancies and may introduce listener freedom. Denis Smalley’s article on Space-form and the Acousmatic Image (Smalley, 2007) runs to definitions of some fifty different types of space. Of particular importance are the spaces most closely associated with distance in relation to the listener. When it comes to identifying sound shapes, spectral space is clearly important. Spectral space is a metaphorical impression of space similar to that of pitch-space; a very low bass rumble accompanied by a very shrill whine will normally define a large spectral space with low density (nothing in the middle). Similarly, broadband noise will also (in theory) define the same space, but the density is such that we do not perceive height or depth, rather simply ‘quantity’.

Spectral space is closely related to perspectival space, which engages with the ‘tracking’ of a sound and a closer understanding of it (potentially outside of the course of time’s flow). Smalley cites Bayle’s notion of environment (with horizon, temperature, climate and the predominance of high to low defining gravity). Plausible sonic environments where agents interact and where nature’s rules such as gravity are followed can be found in numerous pieces of acousmatic music. Examples include the bouncing-ball archetype, the Doppler effect, and the tendency of higher-frequency sounds to require less amplitude to be heard and often to have more agile spatial movement. Smalley places these environments and agents within a navigable gestural space – the perception of source-cause – interacting within an ensemble space – a ‘scene’ – and focused within an arena space – a frame. The above examples enable us to relate quantity to perceived quality and to position ourselves within and outside the space. It is here that time perhaps plays an important role.
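
The bouncing-ball archetype is perhaps the simplest case in which a quasi-physical rule lends a gesture its plausibility: each bounce arrives sooner and more quietly than the last. A minimal sketch of such an onset and amplitude profile (with arbitrary restitution and starting values) might read:

# A minimal sketch of the bouncing-ball archetype: inter-onset times and
# amplitudes both decay geometrically, as for a real bouncing object.
# The restitution coefficient and starting values are arbitrary.
def bounce_profile(first_interval=1.0, restitution=0.8, floor=0.02):
    """Yield (onset_time, amplitude) pairs until the bounces die away."""
    t, interval, amp = 0.0, first_interval, 1.0
    while interval > floor:
        yield t, amp
        t += interval
        interval *= restitution   # each bounce comes sooner...
        amp *= restitution        # ...and is quieter than the last

for onset, amp in bounce_profile():
    print(f"bounce at {onset:6.3f} s, amplitude {amp:.3f}")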

Imagine being close to loudspeakers listening to a multitude of dry, crisp, aggressive sonic gestures. Our position is defined by our avoidance of these sounds (as though they were throwing punches at us). This situation is like landing in the middle of a dense forest. It is we who have to move. Our situation is defined by what we bump into. Time feels more of a challenge. However, consider a similar environment where our acoustic space is viewable and experienced from within but felt to be ‘distant’. This is the environmental approach, and it is especially prominent in multi-channel pieces where loudspeakers are playing vastly different sounds or granulating across multiple channels. It is also the case where we perceive a landscape in the stereo field. We feel as though we could rotate in our seat and appreciate another perspective, as though we were in a boat in the middle of a lake. Normally the components of the soundworld are distant from us. The trees and wildlife are at the periphery of the lake: we can’t reach out and touch them but we can appreciate their movement from afar. Time is ephemeral.

Smalley describes the transmodality of spatial perception in acousmatic music. Key to this is the following: ‘Transmodal linking occurs automatically when the sonic materials seem to evoke what we imagine to be the experience of the world outside the music, and in acousmatic listening (not just acousmatic music) transmodal responses occur even though these senses are not directly activated in order only to listen.’ (Smalley, 2007, 39)

Smalley cites Chion’s perception of rhythm as ‘trans-sensorial’. He also cites his own previous work on gesture and surrogacy and the body’s understanding of energy transferral. After cursory contextualisation of a number of psychological approaches to perception (in particular the works of Handel and Noë), Smalley reveals his need to be ‘able to act in the soundscape, being physically able to be in it; know birds as live, three-dimensional beings that fly; feel the wind; look at, touch and enter water (how could I know that water flows from its sound alone?); walk around and climb a tree; travel in a car; move out of the house into the landscape’, concluding that ‘an acousmatic musical work has the potential to harness my enactment, my spatial enactment.’ (Smalley, 2007, 40)

It is through the idea of environment that we can construct a more embodied experience. More importantly we can accept that reduced listening is but a fraction of what we should be doing when attending to an acousmatic work.

The unreal landscape

Acousmatic compositions are often unreal to many, but in fact the sonic landscapes, no matter how foreign, are to be expected. Only sound is going to come out of the loudspeaker, after all; not cotton wool, nails or some exotic scent. Perhaps our only fear is of sound power and frequency content. However, once the practicalities are dealt with, we can concentrate upon the auditory scene. Trevor Wishart’s imaginary landscape remains unrealistic, but he claims that ‘for most listeners it would remain a real landscape’ (Wishart, 1986, 47). In considering the multi-channel concert environment, Wishart goes on to describe how the listener’s frame of reference may be manipulated through ‘spinning’. Moreover, proximity and amplitude are again related to ‘psychological or social distance’ (Wishart, 1986, 48). We do need to be careful about words like ‘spinning’, however. Any sound made to spin that is not spectrally inclined to do so encourages ‘spinning’ as an (extra-)musical parameter and relegates the sound to a mere test-tone. It is ‘fake’ but ‘real’.

Perhaps the best our multi-channel environments can become is ‘plausible’, and plausibility relies very much upon a certain degree of naturalness. Therefore, just as a painter would set a canvas in a frame and proceed to wash it down, the multi-channel sonic environment is ‘created’ by its boundaries and horizons. These boundaries are not the loudspeakers themselves. At the University of Sheffield Sound Studios we mainly work with 5.1, 7.1 and 8.0 systems, an amalgamation of which is shown in Figure 4. There are many similarities between the eight-channel studio and the 7.1 design. What 7.1 loses in terms of surround, it gains in terms of specificity (bass and centre).


Figure 4: Traditional multi-channel speaker studio
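
For orientation, the sketch below lists nominal loudspeaker azimuths for the three formats under common studio conventions; it is illustrative only and does not reproduce the amalgamated Sheffield layout shown in Figure 4:

# Nominal loudspeaker azimuths in degrees (0 = front centre, positive to the
# right), following common studio conventions rather than the exact
# amalgamated Sheffield layout of Figure 4.
LAYOUTS = {
    # ITU-style 5.1: the LFE channel carries no directional information.
    "5.1": {"C": 0, "L": -30, "R": 30, "Ls": -110, "Rs": 110},
    # A common 7.1 arrangement: side and rear surround pairs plus centre.
    "7.1": {"C": 0, "L": -30, "R": 30, "Lss": -90, "Rss": 90,
            "Lrs": -150, "Rrs": 150},
    # Equal-spaced 8.0 ring, numbered clockwise from front centre.
    "8.0": {f"ch{i + 1}": 45 * i for i in range(8)},
}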

Wishart suggests that ‘imposed’ (gestural) and ‘intrinsic’ morphologies (Wishart, 1986, 59) can be blurred by the acousmatic veil, but from a composer’s point of view we should ask whether this is a good thing. Our multi-channel spaces allow for redundancies that give morphologies time to make themselves understood.

Balance and redundancy

Perhaps the most powerful aspect of the multi-channel environment is the ability to separate layer from layer, figure from ground, texture from gesture. John Young has created numerous works for 24 independent channels and the resulting spaces are indeed plausible, with very clearly defined foregrounds, surround spaces and highly identifiable gestural material. It would be naïve to suggest that all wisp-like sounds must be of short duration and heavily dispersed around the space, but their intrinsic morphology suggests an ephemeral energy profile. Imagine reaching out for a feather floating in the wind. The feather’s profile may occupy all three dimensions (high-low, front-back, left-right) or its descent from high to low may be approximated to a plane.

Bernard Parmegiani’s huge cycle De Natura Sonorum highlights a number of methodologies that demonstrate how process and compositional style are embedded with cross-modal thought. In movement V, Étude élastique, we hear ‘swirling’ sounds and exponential spectral shapes: the metaphor of elasticity and effervescence is tangible. It is clear Parmegiani is thinking about sound diffusion, as channel separation is very fixed (headphone listening highlights this). Large energies are centrally panned; whip-like gestures are left- or right-focused. Large energies also ‘duck’ the surrounding background noise (a kind of mask). The compositional question – and the one that consistently reverts back to a cross-modal naturalness – is this: to what degree can we engage with the motion of the initial objects such that a performance diffusion shapes the sound and is more than mere panning? Thus the very opening gestures of this movement could mirror the feather example earlier. They may start in a distant plane and (as an imposed morphology through diffusion) move to occupy the surround space.
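
The ‘ducking’ of the background can be sketched very simply: the background layer’s gain is pulled down whenever the foreground’s short-term energy rises. A minimal, hypothetical block-wise version (with no smoothing or look-ahead, and arbitrary threshold and depth values) might be:

# A minimal sketch of ducking a background layer under a foreground gesture:
# whenever the foreground's block energy rises above a threshold, the
# background gain is pulled down. All parameter values are illustrative.
import numpy as np

def duck(background: np.ndarray, foreground: np.ndarray,
         block: int = 1024, threshold: float = 0.05, depth: float = 0.25):
    """Return the background with its gain reduced under loud foreground blocks."""
    out = background.copy()
    n = min(len(background), len(foreground))
    for start in range(0, n, block):
        seg = foreground[start:start + block]
        rms = np.sqrt(np.mean(seg ** 2))        # short-term foreground energy
        if rms > threshold:
            out[start:start + block] *= depth   # crude, unsmoothed gain dip
    return out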

So it was with my work Surface and a number of more recent works in the 5.1 and 7.1 studio: the centre loudspeaker became a point of focus, draining the space towards a highly visible loudspeaker or becoming the demonic figure, spewing forth sound into the space.

More recently, in works such as The Battle and Counterattack, I have engaged with a surreal, painterly version of the battlefield; Shakespeare’s Birnam Wood meets Twin Peaks’ forest, the idea of visible and covert forces, the structured attack descending into hand-to-hand combat, the rules of engagement, the art of war. These two works are entirely pictorial, but my compositional aim was to bring the listener into the action through plausible environments, engaging ‘actors’ and a strong sense of cross-modality.

This is particularly evident in Counterattack, where the physicality of sound becomes slightly more ominous through careful but strong use of the low-frequency effects channel (LFE, or sub). Low frequencies are dark and immobile. Set against a ‘wash’ generated through varying (and multiple) reverberation plug-ins on pairs of channels, a ‘fake’ expansive space can be created. Low frequencies alone go part way to creating a horizontal plane or a sense of depth. Some sort of high-frequency-biased or coloured noise is also required – a kind of ‘air’ to provide enclosure, creating an upper canopy. This unoccupied space could be felt as ‘cold’. Within this nebulous space it is aesthetically viable and compositionally practical to introduce more obvious ‘forces’. Acousmatic music again becomes a ‘cinema for the ear’ as a real scene emerges and a narrative develops.
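
Such an ‘air’ layer amounts to quiet, high-pass-filtered noise sitting above the low-frequency bed. A minimal sketch (assuming Python with numpy and scipy, and with arbitrary cut-off, level and duration values) might be:

# A minimal sketch of an 'air' layer: quiet, high-pass-filtered noise that
# sits above a low-frequency bed to suggest an upper canopy. The cut-off
# frequency, level and duration are arbitrary illustrative values.
import numpy as np
from scipy.signal import butter, lfilter

def air_layer(duration_s: float = 30.0, sr: int = 48000,
              cutoff_hz: float = 4000.0, level: float = 0.05) -> np.ndarray:
    noise = np.random.randn(int(duration_s * sr))
    b, a = butter(4, cutoff_hz / (sr / 2), btype="highpass")
    return level * lfilter(b, a, noise)          # pale, 'cold' upper band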

Conclusions

Redundancy… again

And so it is that narrative can begin to dictate structure over materials; form over content. Until, that is, the narrative (an energy of sorts) wanes and materials must, once again, sustain the momentum and direct the composition. The listener, if allowed time to make sense of what they hear and encouraged to engage cross-modal sensitivities by being surrounded with sound, becomes less led by the hand and more excited to explore. Multi-channel environments make this more achievable. Not only can the listener be in ‘search’ mode (with numerous streams of sound separated or balanced by loudspeaker groups), but the independence of the channels can be used to ‘project’ rather than ‘diffuse’ sound. A soloist singing a completely different tune may appear on the centre loudspeaker, surrounded by an unnatural environment. Our slab becomes plausible, an object in itself.

References

Bregman, A. S. (1994). Auditory scene analysis: the perceptual organization of sound. Cambridge, Massachusetts: MIT Press.

Clarke, A. C. (2010). 2001: a space odyssey. Hachette UK.

Gaver, W. W. (1993a). How do we hear in the world? Explorations in ecological acoustics. Ecological Psychology, 5(4), 285-313.

Gaver, W. W. (1993b). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1-29.

Gibson, J. J. (1986). The ecological approach to visual perception. London: Routledge.

Godøy, R. I. (2003). Motor-mimetic music cognition. Leonardo, 36(4), 317-319.

Godøy, R. I. (2006). Gestural-sonorous objects: embodied extensions of Schaeffer’s conceptual apparatus. Organised Sound, 11(2), 149-157.

Godøy, R. I. & Leman, M. (2010). Musical gestures: sound, movement, and meaning. London: Routledge.

Kubrick, S. (Producer & Director). (1968). 2001: a space odyssey [Motion picture]. USA: Metro-Goldwyn-Mayer.

Lennox, P. & Myatt, T. (2011, June). Perceptual cartoonification in multi-spatial sound systems. Paper presented at The 17th International Conference on Auditory Display (ICAD-2011), Budapest, Hungary.

Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13(3), 278.

Rusconi, E., Kwan, B., Giordano, B. L., Umiltà, C., & Butterworth, B. (2006). Spatial representation of pitch height: the SMARC effect. Cognition, 99(2), 113-129.

Smalley, D. (1996). The listening imagination: listening in the electroacoustic era. Contemporary Music Review, 13(2), 77–107.

Smalley, D. (2007). Space-form and the acousmatic image. Organised Sound, 12(1), 35-58.

Wishart, T. (1986). The relation of language to materials. In S. Emmerson (Ed.), The language of electroacoustic music (pp. 41-60). London: Macmillan.