Designing auditory-tactile perception of music

by SEBASTIAN MERCHEL, M. ERCAN ALTINSOY

Background

The main hypothesis to be evaluated was that vibrations, perceived via the vibro-tactile modality, are important for the perception of music. These vibrations can be excited directly via the air or via the surfaces that are in contact with the listener. This study focuses on seat vibrations, such as the vibrations that can be perceived in a church or a classical concert hall (Merchel & Altinsoy, 2013). To test the above hypothesis, sound and vibrations were controlled separately in a laboratory experiment. The vibration signals were generated from DVD audio recordings using different approaches, which are described in detail in the doctoral thesis of the first author (Merchel, 2014). In this paper, only one approach (substitute signals), which is related to frequency discrimination, will be discussed. Therefore, the differential sensitivity of auditory and tactile frequency perception will be described first.

Auditory and Tactile Frequency Discrimination

One of the most evident differences between the auditory and tactile modalities is the dramatically reduced ability to distinguish between vibration frequencies in the tactile domain. By contrast, the ability to discriminate between different frequencies is one of the fundamental characteristics of the auditory system: just-noticeable differences in frequency (JNDFs) smaller than 1 Hz can be perceived at low frequencies. Figure 1 summarizes the data from various laboratories. In audition, stimulus frequency is related to pitch perception. The total number of perceptible pitch steps in the range of human hearing is approximately 1,400 (Olson, 1967).

Fig 1

Figure 1. Auditory and tactile thresholds for frequency discrimination of subsequent sinusoids (stimulus length > 200 ms) as a function of base frequency. The results from several studies are plotted for each modality.

Tactile JNDFs obtained in various studies are plotted for comparison in Figure 1. The difference limens of tactile frequency discrimination at different body sites are much higher than those in audition. Still, in the tactile domain, it is also possible to sort vibrations on a frequency-related scale. However, von Békésy (1957) already noted that there is not a high degree of similarity between pitch sensation in hearing and on the skin.

In summary, tactile frequency perception is weak, and thus, pitch might be of minor importance in the process of vibration generation from music. For instance, it might be possible to strongly modify the frequency content of a vibration signal without influencing overall musical perception. This possibility is evaluated in the following sections.

Method

Vibration Generation: Substitute Signals Approach

The simplest way to generate musical vibrations would be to low-pass filter the audio signal and route it directly to the vibration actuator. However, this approach has several disadvantages, such as possible sound radiation from the shaker at higher frequencies. Therefore, the frequency content of the vibration signal was modified using several substitute signals. Figure 2 illustrates the signal-processing chain. A signal generator was implemented in Pure Data (Pd) to produce continuous sinusoidal tones at 20, 40, 80, and 160 Hz. The frequencies were selected to span a broad frequency range and to be clearly distinguishable considering the JNDFs for vibrations discussed above. Additionally, a condition was included using white Gaussian noise (WGN), low-pass-filtered at 100 Hz. These substitute signals were then multiplied by the envelope of the original low-pass-filtered signal to retain the timing information. An envelope follower was implemented, which calculated the root mean square (RMS) amplitude of the input signal using successive analysis windows. Hann windows were applied, and the window size was set to 1,024 samples, which corresponded to approximately 21 ms, to avoid smearing the impulsive signal content. The period for successive analysis was half of the window size (512 samples), so that consecutive windows overlapped by 50%.
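The envelope-and-substitute step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' Pd patch. The 48 kHz sampling rate is an assumption (it makes 1,024 samples approximately 21.3 ms), all function names are hypothetical, the input is taken to be the already low-pass-filtered mono signal, and the linear interpolation between envelope frames is likewise an assumption, since the text does not state how the frame-wise RMS values were smoothed.

```python
import math

SAMPLE_RATE = 48000          # assumed; 1,024 samples / 48 kHz is about 21.3 ms
WINDOW_SIZE = 1024           # analysis window length in samples (from the text)
HOP_SIZE = WINDOW_SIZE // 2  # analysis period: half the window size


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def rms_envelope(signal, window_size=WINDOW_SIZE, hop_size=HOP_SIZE):
    """Hann-weighted RMS amplitude, one value per analysis frame."""
    window = hann(window_size)
    norm = sum(window)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop_size):
        # Weight the squared samples with the window, then normalize.
        power = sum(window[i] * signal[start + i] ** 2
                    for i in range(window_size)) / norm
        frames.append(math.sqrt(power))
    return frames


def upsample_linear(frames, hop_size, length):
    """Interpolate frame values back to one value per sample (assumed smoothing).

    Frame k is placed at sample k * hop_size; the half-window analysis
    delay is ignored in this sketch.
    """
    if not frames:
        return [0.0] * length
    last = len(frames) - 1
    out = []
    for n in range(length):
        pos = n / hop_size
        k = min(int(pos), last)
        nxt = min(k + 1, last)
        frac = min(pos - k, 1.0)
        out.append((1.0 - frac) * frames[k] + frac * frames[nxt])
    return out


def substitute_vibration(lowpassed, carrier_hz, sample_rate=SAMPLE_RATE):
    """Multiply a sinusoidal substitute signal by the envelope of the input."""
    envelope = upsample_linear(rms_envelope(lowpassed), HOP_SIZE, len(lowpassed))
    return [envelope[n] * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
            for n in range(len(lowpassed))]
```

Calling `substitute_vibration(lowpassed, 40.0)` would yield the 40 Hz condition under these assumptions; the other sinusoidal conditions differ only in `carrier_hz`.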

Fig 2

Figure 2. Signal processing to generate vibration signals from the audio sum. The envelope of the low-pass-filtered signal was extracted and multiplied with substitute signals, such as sinusoids at 20, 40, 80, and 160 Hz or white noise.
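For the noise condition, the substitute signal is white Gaussian noise low-pass-filtered at 100 Hz. The sketch below shows one way such a carrier could be produced. The filter type and order are not given in the text, so the second-order Butterworth design used here is an assumption, as are the 48 kHz sampling rate and all function names. The resulting noise would then be multiplied by the envelope of the low-pass-filtered music signal, as described in the text.

```python
import math
import random

SAMPLE_RATE = 48000  # assumed sampling rate in Hz


def butterworth_lowpass_coeffs(cutoff_hz, sample_rate=SAMPLE_RATE):
    """Second-order Butterworth low-pass biquad (assumed filter design)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    q = 1.0 / math.sqrt(2.0)            # Q of a Butterworth section
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0),
         (1.0 - cos_w0) / a0,
         (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(signal, b, a):
    """Filter a list of samples with a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def lowpass_noise(num_samples, cutoff_hz=100.0, seed=None):
    """White Gaussian noise, low-pass-filtered at cutoff_hz."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    b, a = butterworth_lowpass_coeffs(cutoff_hz)
    return biquad(white, b, a)
```

Because the noise amplitude fluctuates randomly from moment to moment, two equally strong musical transients can produce vibration bursts of different strength, depending on which section of the noise they happen to coincide with.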

Setup

The generated vibration signals were reproduced vertically using an electrodynamic vibration seat. For details on the construction, individual calibration, and specifications of this shaker, please refer to Merchel (2014).

Additionally, a surround setup was used, according to ITU-R BS.775-1 (1992), with five Genelec 8040A loudspeakers and a Genelec 7060B subwoofer. The system was equalized to a flat frequency response at the listener position. To place the subject in a standard multimedia reproduction context, an accompanying picture from the DVD was projected onto a silver screen. The video sequence showed the stage, conductor, or individual instrumentalists while playing.

Participants

Twenty participants (14 male, 6 female) volunteered for this experiment. Most of them were students. Their ages ranged from 20 to 55 years (mean = 24 years), and their body weights ranged from 58 to 115 kg (mean = 75 kg). None of the participants had hearing or spine damage. The self-reported number of concert visits per year ranged from one to approximately 100, with an average of nine. Two participants were members of bands. The preferred music styles varied, ranging from rock and pop to classical and jazz.

Fifteen participants had not been involved in music-related experiments before, whereas five had already participated in two similar pilot experiments (Merchel & Altinsoy, 2008; 2009).

Stimuli

To represent typical concert situations for both classical and modern music, four sequences that included low-frequency content were selected from music DVDs (Wischmann (Director), Smaczny & Atteln (Producers), 2000; Mirow (Director), Koppehele & Koppehele (Producers), 2006; Wübbolt (Director) & Smaczny (Producer), 2007; Blue Man Group Records, 2003). A stimulus length of approximately 1.5 min was chosen to ensure that the participants had sufficient time to become familiar with each stimulus. The following sequences were selected:

  • Bach, Toccata in D minor (church organ)
  • Verdi, Messa da Requiem, Dies Irae (kettledrum, contrabass)
  • Dvořák, Slavonic Dance No. 2 in E minor, op. 72 (contrabass)
  • Blue Man Group, The Complex, Sing Along (bass, percussion, kick drum)

The first piece, Toccata in D minor, is a well-known organ work that is hereby referred to as BACH. It contains a rising and falling succession of notes covering a broad frequency range. Additionally, steady-state tones with a rich overtone spectrum dominate the composition.

The second sequence, Dies Irae, is abbreviated as VERDI. It is a terrifying composition for double choir and orchestra. Impulsive fortissimo sections with a concert bass drum, kettledrum, and tutti orchestra alternate quickly with sections that are dominated by the choir, bowed instruments, and brass winds. The sequence is characterized by strong transients.

The third stimulus, Slavonic Dance No. 2 in E minor, is referred to as DVORAK. It is a calm orchestral piece, dominated by bowed and plucked strings. Contrabasses and cellos continuously generate low frequencies at a low level.

The fourth and final sequence, Sing Along, is a typical pop music example. It is performed by the Blue Man Group, which is further shortened to BMG. The sequence is characterized by the heavy use of drums and percussion. These instruments generate transient content at low frequencies. Additionally, a bass line can be easily identified.

To generate a vibration signal from these sequences, the mono sum of all audio channels was calculated using Pd.
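As a minimal sketch of this step, the mono sum can be written as a sample-wise sum over channels. The function name is hypothetical, and the text does not say whether any channel gains or normalization were applied before summing, so none are applied here.

```python
def mono_sum(channels):
    """Sum equally long channel sample lists into one mono signal."""
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one tuple per sample index across all channels.
    return [sum(samples) for samples in zip(*channels)]
```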

Experimental Design

The concert recordings were played back to each participant using the setup described above. Additional vibrations were reproduced using the vibration chair. The vibration intensities were initially adjusted so that the peak acceleration levels reached approximately 100 dB and were thus clearly perceptible. However, the vibration level could be varied easily if such a reproduction system were to be implemented in a home audio system. Additionally, the perception thresholds for vibrations varied between subjects. Therefore, each subject was asked to adjust the vibration amplitude individually to the preferred level. This adjustment was typically performed within the first 5-10 s of a sequence. Subsequently, the subject had to judge the overall quality of the concert experience using a quasi-continuous scale (Figure 3). Verbal anchor points ranging from bad to excellent were added, similar to the method described in ITU-T P.800 (1996). To prevent dissatisfaction, the subject could interrupt the current stimulus as soon as she/he was confident in her/his judgment.
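To give a feeling for the magnitude of a 100 dB acceleration level, the following sketch converts between level and physical acceleration. The text does not state the reference value, so the commonly used reference of 1 µm/s² is an assumption here, and the function names are hypothetical.

```python
import math

A_REF = 1e-6  # assumed reference acceleration in m/s^2 (1 µm/s^2)


def level_to_acceleration(level_db, a_ref=A_REF):
    """Convert an acceleration level in dB to acceleration in m/s^2."""
    return a_ref * 10.0 ** (level_db / 20.0)


def acceleration_to_level(acceleration, a_ref=A_REF):
    """Convert an acceleration in m/s^2 to an acceleration level in dB."""
    return 20.0 * math.log10(acceleration / a_ref)
```

Under that assumed reference, a level of 100 dB corresponds to an acceleration of about 0.1 m/s².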

The required time varied between subjects, from 30 s to typically no more than 60 s. The stimuli were randomized and divided into blocks of eight stimuli. After each block, participants had the opportunity to relax before continuing with the experiment. Typically, it took 45 min at most to complete three blocks. Before starting the experiment, participants underwent training with three stimuli in order to become familiar with the task and stimulus variations.
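The randomization into blocks can be sketched as follows. The condition list is inferred from the Results section (no vibration, four sinusoids, and WGN) and is an assumption; four sequences times six conditions give 24 stimuli, which is consistent with three blocks of eight. Whether the actual randomization was constrained in any way is not stated, so a plain shuffle is used, and the names are hypothetical.

```python
import random

SEQUENCES = ["BACH", "VERDI", "DVORAK", "BMG"]
# Assumed condition set, inferred from the conditions reported in the results.
CONDITIONS = ["no vibration", "20 Hz", "40 Hz", "80 Hz", "160 Hz", "WGN"]
BLOCK_SIZE = 8


def make_blocks(seed=None):
    """Shuffle all sequence/condition pairs and split them into blocks."""
    rng = random.Random(seed)
    stimuli = [(seq, cond) for seq in SEQUENCES for cond in CONDITIONS]
    rng.shuffle(stimuli)
    return [stimuli[i:i + BLOCK_SIZE]
            for i in range(0, len(stimuli), BLOCK_SIZE)]
```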

Fig 3

Figure 3. Rating scale for evaluation of the overall quality of the concert experience.

Results

The quality scores, using substitute vibration signals, are presented in Figure 4 (see below).

An ANOVA was applied for the statistical analysis. All of the substitute vibrations, except for the 20 Hz condition, were judged to be better than reproduction without vibration at a highly significant level (p < .010). The average differences, compared with the no-vibration condition, ranged from 18 scale units for WGN and the 160 Hz vibration to 29 scale units for the 40 Hz vibration. There was no significant difference between the 20 Hz vibration and the no-vibration condition. The subjects indicated that the 20 Hz vibration was too low in frequency and did not fit with the audio content. In contrast, 40 and 80 Hz appeared to fit well. No complaints about a mismatch between sound and vibration were noted. Interestingly, even the 160 Hz vibration resulted in fair quality ratings. However, a trend toward worse judgments, compared with the 80 Hz condition, was observed (p ≈ .110). A much stronger effect was expected because this vibration frequency is relatively high, and tingling effects can occur. There was some disagreement between participants, as manifested in the slightly larger confidence intervals for this condition. Even more interestingly, the reproduction of WGN resulted in fair quality ratings. However, this condition was still judged to be slightly worse than the 40 and 80 Hz vibrations (average difference = 11, p < .050). The effect was strongest for the BACH sequence, which resulted in poor quality judgments (i.e., a significant interaction between sequence and treatment, p < .010). The BACH sequence contained long tones that lasted for several seconds, which did not fit with the ‘rattling’ vibrations excited by the noise. In contrast, in the BMG, DVORAK, and VERDI sequences, impulses and short tones resulted in brief vibration bursts of white noise, which felt less like ‘rattling’. Nevertheless, the character of the bursts was different from sinusoidal excitation. In the BMG sequence in particular, the amplitude of the transient vibrations generated by the bass drum varied depending on the random section of the noise. This variation is most likely one of the reasons why the quality judgment for BMG in the noise condition tended to be worse than, for example, that for the 40 Hz vibration.

Fig 4

Figure 4. Mean overall quality evaluation for reproduction, using different substitute vibration-generation approaches, plotted with 95% confidence intervals (0 = low quality, 100 = high quality).
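A mean with a 95% confidence interval, as plotted in Figure 4, could be obtained from the ratings of one condition roughly as sketched below. How the intervals in the figure were actually computed is not stated; this sketch assumes a standard Student-t interval for the mean, the function name is hypothetical, and the default critical value of 2.093 is only valid for 20 ratings (19 degrees of freedom).

```python
import math
import statistics

# Two-sided 95% Student-t quantile for 19 degrees of freedom (20 ratings).
T_CRIT_DF19 = 2.093


def mean_with_ci95(ratings, t_crit=T_CRIT_DF19):
    """Return (mean, lower, upper) of a t-based 95% confidence interval."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are required")
    mean = statistics.mean(ratings)
    # Standard error of the mean, scaled by the critical t value.
    half_width = t_crit * statistics.stdev(ratings) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width
```

For a different number of ratings, the matching t quantile would have to be passed as `t_crit`.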

Conclusions

Vibrations were found to play a significant role in the perception of music. Fundamental knowledge about auditory and tactile perception can be used to develop and evaluate perceptually optimized approaches to generate vibrations from music sequences. Even simple vibration signals can improve the perceived quality of the concert experience. For the tested sequences, amplitude-modulated sinusoids at 40 and 80 Hz worked well. This would allow for the use of simple and inexpensive narrow-band inertial vibration actuators in audio reproduction scenarios. However, many more factors need to be considered (Merchel, 2014). The results may be applied to improve audio reproduction systems or even concert halls.

Notes

Address for correspondence: Sebastian Merchel, Dresden University of Technology, 01062 Dresden, Germany.

Email: sebastian.merchel@tu-dresden.de.

References

Békésy, G. von. (1957). Neural volleys and the similarity between some sensations produced by tones and by skin vibrations. The Journal of the Acoustical Society of America, 29(10), 1059-1069.

Blue Man Group Records (2003). The complex rock tour live [DVD]. USA: Warner Music Group.

ITU-R BS.775-1. (1992). Multichannel stereophonic sound system with and without accompanying picture. International Telecommunication Union.

ITU-T P.800. (1996). Methods for subjective determination of transmission quality. International Telecommunication Union.

Merchel, S. (2014). Auditory-tactile music perception. Aachen, Germany: Shaker Verlag.

Merchel, S., & Altinsoy, M. E. (2008). 5.1 oder 5.2 Surround – Ist Surround taktil erweiterbar? In Proceedings of DAGA 2008 – 34th German Annual Conference on Acoustics, Dresden, Germany.

Merchel, S., & Altinsoy, M. E. (2009). Vibratory and acoustical factors in multimodal reproduction of concert DVDs. In Haptic and Audio Interaction Design. Berlin, Germany: Springer.

Merchel, S., & Altinsoy, M. E. (2013). Music-induced vibrations in a concert hall and a church. Archives of Acoustics, 38(1), 13-18.

Mirow, B. (Director), Koppehele, M., & Koppehele, G. (Producers). (2006). Messa da Requiem – Giuseppe Verdi conducted by Placido Domingo [DVD]. Germany: Glor Music Production.

Olson, H. F. (1967). Music, physics and engineering (2nd ed., p. 460). Mineola, USA: Dover Publications.

Wischmann, C. (Director), Smaczny, P., & Atteln, G. (Producers). (2000). Ton Koopman plays Bach [DVD]. Germany: EuroArts Music International.

Wübbolt, G. (Director), & Smaczny, P. (Producer). (2007). Kurt Masur – Eine Geburtstagsgala [DVD]. Germany: MDR Fernsehen & EuroArts Music International.