The influence of image compression rate on perceived audio quality in music video-clips

by DAVID HAMMERSCHMIDT, CLEMENS WÖLLNER

Background

Digital formats, data compression and a variety of internet portals have fundamentally changed how music is produced and perceived. Especially in popular music genres, an increasing number of music videos are produced to raise awareness of the performers. Research on the intermodal integration of video and sound suggests a reciprocal influence of these modalities on perception (Beerends & de Caluwe, 1999; You, Reiter, Hannuksela, Gabbouj, & Perkis, 2010; Ernst & Rohde, 2012). Although the combination of music with visual components has a long tradition predating the internet, the use of music videos has multiplied with the internet, leading to an increased need for compressed digital formats. Therefore, the degree of audiovisual data reduction has to be balanced against the perceived loss of visual and auditory quality.

Current study

This exploratory study investigates the extent to which the quality of compressed visual information affects the perceived audio quality of music videos. Participants judged the auditory, visual and overall quality of a live music video in individually randomized single trials. The video was presented at three different compression rates for the visual quality and at a constant compression rate for the audio quality. The trials were randomized with Matlab. The stimuli with different compression rates were presented in paired-comparison trials (sequence A vs. sequence B) in a within-subject design. The equipment used for the study was a 13.3-inch LED backlit glossy widescreen LCD (1280 × 800 pixel resolution) and Beyerdynamic DT-880 Pro headphones.
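The per-participant randomization of paired-comparison trials can be sketched as follows. This is a minimal Python illustration, not the Matlab script used in the study: the function names, the use of the participant number as a random seed, and the restriction to the three visual-quality pairs (without the catch-trials) are all assumptions made for the example.

```python
import itertools
import random

# Labels for the three visual versions described in the study.
VERSIONS = ["720p", "360p", "240p"]


def build_trials(versions):
    """Return every unordered pair of versions as an (A, B) tuple."""
    return list(itertools.combinations(versions, 2))


def randomize_for_participant(trials, participant_id):
    """Shuffle trial order and A/B position for one participant.

    Seeding with the participant id (an assumption of this sketch)
    makes each participant's sequence reproducible.
    """
    rng = random.Random(participant_id)
    # Randomly decide which member of each pair is played first.
    shuffled = [tuple(rng.sample(pair, 2)) for pair in trials]
    # Randomize the order in which the pairs are presented.
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    trials = build_trials(VERSIONS)
    for participant in range(1, 4):
        print(participant, randomize_for_participant(trials, participant))
```

Randomizing both the trial order and the position of each version within a pair guards against order effects, which matters in a within-subject design where every participant judges every comparison.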

Compression rates

A live music video was edited to a 25-second sequence in Adobe Premiere. Three versions of the stimulus were then generated with the H.264 video codec, corresponding to three visual compression rates: 1280p x 720p (pixels), 480p x 360p, and 320p x 240p. The frame rate was 30 fps, except for the strongest compression rate (320p x 240p), which used 24 fps. The audio quality remained constant for all versions at 320 kbit/s with a sample rate of 48 kHz, using the AAC audio codec. Additionally, ‘catch-trials’ were generated in which the audio was distinctly compressed to 32 kbit/s, as well as a comparison of identical sequences (320p x 240p, 320 kbit/s). These served to indicate whether differences in quality in general, and in the auditory material in particular, were indeed perceived and evaluated by the participants. In total, nine paired-comparison trials had to be judged by each participant.
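For readers who want to produce comparable stimuli, the encoding settings above can be written down as command-line arguments. The sketch below assumes the ffmpeg tool rather than Adobe Premiere, which the study actually used; the file names are hypothetical, and the assignment of 30 fps to the two larger formats follows our reading of the text. The study does not report video bitrates, so none are set here. The code only builds the argument lists and does not run the encoder.

```python
# Visual settings as described in the text: (width, height, frames per second).
# Assumption: only the 320 x 240 version used 24 fps.
VISUAL_SETTINGS = {
    "720p": (1280, 720, 30),
    "360p": (480, 360, 30),
    "240p": (320, 240, 24),
}


def ffmpeg_args(source, label, audio_kbps=320):
    """Build an ffmpeg argument list for one stimulus version.

    `source` and the output file name are hypothetical placeholders.
    """
    width, height, fps = VISUAL_SETTINGS[label]
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",                 # H.264 video codec
        "-vf", f"scale={width}:{height}",  # target picture size
        "-r", str(fps),                    # output frame rate
        "-c:a", "aac",                     # AAC audio codec
        "-b:a", f"{audio_kbps}k",          # audio bitrate in kbit/s
        "-ar", "48000",                    # 48 kHz sample rate
        f"stimulus_{label}_{audio_kbps}k.mp4",
    ]


if __name__ == "__main__":
    # Three regular versions with constant 320 kbit/s audio.
    for label in VISUAL_SETTINGS:
        print(" ".join(ffmpeg_args("source_clip.mp4", label)))
    # A catch-trial version with audio compressed to 32 kbit/s.
    print(" ".join(ffmpeg_args("source_clip.mp4", "240p", audio_kbps=32)))
```

Keeping the audio parameters in one place makes it easy to verify that only the picture differs between the regular versions, which is the property the experiment depends on.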

Stimulus properties

The study used a live recording of the song ‘Life Wasted’ by Pearl Jam. The video can be classified as a performance video, which is the dominant form of visual representation of music and forms the basis for classifying music video typologies (Jost, Klug, Schmidt, Reautschnig, & Neumann-Braun, 2013). It can be characterized as the synchronous visual realization of what is heard, in the form of a representation of a musical performance. Place, time and action are homogeneous. The image appears as a complete source of the sound. The sound determines the images, and thus the greatest possible focus is placed on the sound when perceiving audio-visual music.

The music itself can be classified as Rock/Pop with instrumentation and sound typical of that genre (two electric guitars, electric bass, drums, vocals). The chosen sequence includes four chords (F#, C#, E, B) in 4/4 time at 152 bpm.

Participants

Eleven men and nine women participated in this study (mean age: 26.3 years). All of them had musical experience, with an average of 12.05 years of musical training. Forty per cent of them knew the performing band and sixty-five per cent reported liking the song.

Results

Audio Quality

Results revealed a significant influence of the visual quality on the perceived (but constant) audio quality. Compared to videos with the highest visual quality (1280p x 720p), the perceived audio quality was rated significantly lower for videos with 320p x 240p (t (19) = 3.78, p < .005, d = .84) and 480p x 360p (t (19) = 2.45, p < .050, d = .55). The compression rates of 480p x 360p and 320p x 240p did not differ significantly in audio ratings, though participants tended to judge the audio quality of the less compressed video (480p x 360p) as better (t (19) = 1.64, p = .118) (Figure 1). As expected, the quality assessment of the audio-compressed catch-trials was lower (ps < .001).
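The reported statistics are consistent with one-sample t-tests of the comparison judgments against 0 (‘No difference’), for which Cohen's d equals t divided by the square root of the sample size. The Python sketch below illustrates this relationship under that assumption; the paper does not state which variant of d was used, and the function names are our own.

```python
import math
import statistics


def one_sample_t(differences):
    """Return (t, df, d) for testing the mean of `differences` against 0."""
    n = len(differences)
    mean = statistics.fmean(differences)
    sd = statistics.stdev(differences)  # sample SD (n - 1 denominator)
    t = mean / (sd / math.sqrt(n))      # one-sample t statistic
    d = mean / sd                       # Cohen's d for a one-sample test
    return t, n - 1, d


def d_from_t(t, n):
    """Recover Cohen's d from a one-sample t value and the sample size."""
    return t / math.sqrt(n)


if __name__ == "__main__":
    # t values reported in the text, with n = 20 participants (df = 19).
    for t_value in (3.78, 2.45):
        print(t_value, round(d_from_t(t_value, 20), 3))
```

With n = 20, the two significant t values give d of roughly 0.845 and 0.548, close to the reported effect sizes of .84 and .55; small deviations are expected because the published t values are rounded.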

Figure 1. Judgments of audio quality for pair-wise comparisons and deviation of the judgments from 0 (‘No difference’). Error bars indicate 95% confidence intervals. Compression rates are as follows: 720p: 1280p x 720p, 360p: 480p x 360p, 240p: 320p x 240p.

Visual Quality

As expected, the visual quality was rated lower for the more strongly compressed videos. The video with the highest image resolution (1280p x 720p) was evaluated as significantly better than the videos with 320p x 240p (t (19) = 8.73, p < .001) and 480p x 360p (t (19) = 6.10, p < .001). Similarly, the video with 480p x 360p was rated as better in quality than the one with 320p x 240p (t (19) = 2.71, p < .050). Participants thus discriminated between all of the video image qualities (Figure 2).

Figure 2. Evaluation of visual quality and deviation from 0 (‘No difference’). Error bars indicate 95% confidence intervals.

Overall Quality

The evaluations of overall quality clearly favor the less compressed, higher-resolution videos. Participants rated the 1280p x 720p sequence as significantly better than the 320p x 240p sequence (t (19) = 10.52, p < .001). The comparison of 1280p x 720p with 480p x 360p was equally clear (t (19) = 4.29, p < .001) (Figure 3).

Figure 3. Evaluation of overall quality: deviation of the individual comparisons to the value 0 (‘No difference’). Error bars: 95% confidence interval.

Conclusions

It can be concluded that the visual layer significantly influences the perception of auditory quality in music videos. Therefore, the potential for data compression might be limited in music videos if recipients are to be encouraged to purchase the music. The study shows another aspect of the intermodal integration of video and music: even in the audible domain of music, visual components influence the perception of audio quality. In this study, musically experienced participants were able to detect image differences clearly, which is an important prerequisite for their influence on perceived audio quality. Therefore, it is not surprising that no evaluation trends are observable for the pair-wise comparisons with identical visual quality. Additional catch-trials with different audio compressions were included to ensure that participants actually assessed the stimuli according to audio quality.

Further studies may determine the exact level of compression at which the visual quality affects the perceived audio quality of music, and may also examine differences between various forms of representation. In the future, a multifactorial study will be carried out with different compression rates for both sensory modalities to obtain a more accurate picture of the relationship and mutual influence between visual and auditory information in music videos.

Notes

Address for correspondence: David Hammerschmidt, University of Hamburg, Institute of Systematic Musicology, Neue Rabenstr. 13, 20354 Hamburg, Germany. Email: davidhammerschmidt@gmx.de.

References

Beerends, J. G., & de Caluwe, F. E. (1999). The influence of video quality on perceived audio quality and vice versa. Journal of the Audio Engineering Society, 47(5), 355-362.

Ernst, M. O., & Rohde, M. (2012). Multimodale Objektwahrnehmung. In H. O. Karnath & P. Thier (Eds.), Kognitive Neurowissenschaften (pp. 139-137). Berlin: Springer.

Jost, C., Klug, D., Schmidt, A., Reautschnig, A., & Neumann-Braun, K. (2013). Zur historischen, ästhetischen und systematischen Verrottung des Musikvideos als paradigmatischen Fall der Audiovision. Computergestützte Analyse von audiovisuellen Medienprodukten (Qualitative Sozialforschung Vol. 22; pp. 7-17). Wiesbaden: Springer.

You, J., Reiter, U., Hannuksela, M., Gabbouj, M., & Perkis, A. (2010). Perceptual-based quality assessment for audio-visual services: a survey. Signal Processing: Image Communication, 25, 482-501.