The future of audio quality

11 Dec

Shame on me. It’s been weeks and weeks since I last wrote anything on my blog, I know… I’ve been a bit snowed under with assignments and general work, and my head has been a mess too, but the important thing is that I’m here again, writing.

Although I have some more posts on the shelf that I’ll be delivering in the next few days, this one feels even more urgent than those. The other day I retweeted what I thought was an interesting post from TheProAudioFiles on three good reasons to Record Music at 88.2 kHz, and I got two retweets from two other audio enthusiasts and also a reply from Mr. Kim LaJoie, asking for clarification on this topic from an authoritative source such as an algorithm designer. Browsing his website, I found my way to his blog, which turned out to be literally packed with great advice on recording and mixing techniques that I hope to put to use soon. My question now is: would I have found out about it if Twitter hadn’t been there? Probably not. Another perfect example of the power of social media.

Also, during the Audio and Video Research Methods module of our Master’s course, we’ve been able to carry out some subjective testing for our mini research projects. Most of them were about subjective audio perception and, more precisely, about whether we are able to hear any difference between several mp3 encoding bitrates and WAV files. To keep it short, all of them showed that we are able to hear the difference between an mp3 encoded at 64, 96 or 128 kbps and a WAV file, but when a 320 kbps mp3 was presented against a WAV file… well, no one could tell the difference between them. This was more or less expected, right?

But one of the projects that caught my attention was the one by Luke Harrison on the preference for mp3 vs. WAV among young people (16 to 19 years old) across a number of different music genres. Although the results were deemed inconclusive, because more subjects would be needed for them to be statistically significant, it was surprising to see that most of them preferred the mp3 version of a classical music sample because it sounded “less harsh and brightey” compared to the WAV one. That is, when comparing an mp3 with its high frequencies chopped off against a file which contains all the audio information present in the recording, the youngsters preferred the mp3 version. This shows that although sound perception is something universal, it also relies heavily on education.

Evolution of portable players (photo by Sifter), under CC BY-NC 2.0 license

What this is showing is that a whole generation is now growing up with mp3 as its standard for audio quality, as opposed to previous generations, who had vinyl records, cassettes, Betamax tapes and, closer to the present, CDs and DVDs as standards for quality. Of all these, CD quality is objectively the best in terms of frequency range reproduction, dynamic range, signal-to-noise ratio, storage space and general usage lifetime. But now, mp3 and all sorts of compressed audio for streaming (YouTube, Spotify, GoEar, BBC iPlayer…) are the new standard for audio quality. Thus, our perception of audio is actually being clouded by the latest achievements in compressing sound for fast and reliable streaming and for saving storage space. We have to compress sound as a means of delivering video or audio over long distances, not because it is the best thing to do, but because it is a compromise in terms of quality. The whole process of compressing music and speech for transmission dates back to 1977 with Prof. Karlheinz Brandenburg at the Fraunhofer Institute, and it wasn’t until 1991 that its first draft was completed.

Since then, more compression algorithms have been developed, like MP4, AAC, OGG and Microsoft’s proprietary Windows Media Audio. But despite all of that, it seems that the standard format for portable audio reproduction has been set to mp3. In fact, this is not bad if we know what each thing is for. Mp3 is perfect for listening while you are out in the street, running or doing things around the house. The background noise will keep you from distinguishing the trickiest bits of a tune, so the encoding quality is not that important. And if we are going to be listening to music outdoors most of the time, then saving space in our mp3 player will be crucial for us, so we reduce the bitrate of the mp3 and, with it, the sound quality that we perceive. But, as I said, we’re not going to notice it if we’re outside with all kinds of background noise, and thus with a low signal-to-noise ratio. Convenience wins here.

Of course, this is not the same situation as when you are sitting at home in front of your impressive Hi-Fi setup, waiting to hear a full classical music album. If you are going to plug an iPod into your Hi-Fi to listen to that great album in full, you’d better make sure that it has been encoded at full quality: 320 kbps, separate stereo, etc. Or even better, if you have a player that can handle it, go with lossless compression like FLAC or APE. The files will take a lot more space in your player than mp3s, but they decode to an exact copy of the WAV file while taking less space than it does (by the way, why doesn’t the iPod support the FLAC format?). In the end, it all depends on the use we are going to make of the files.
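To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. It assumes a hypothetical 4-minute stereo track and ignores file headers and tags; FLAC is left out because its size depends on the music itself.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> float:
    """Bitrate of uncompressed PCM audio (what a WAV file holds), in kbps."""
    return sample_rate_hz * bit_depth * channels / 1000


def size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate audio payload size in megabytes (10^6 bytes), headers ignored."""
    return bitrate_kbps * 1000 / 8 * seconds / 1_000_000


SONG_SECONDS = 4 * 60  # hypothetical 4-minute track, chosen only for illustration

cd_kbps = pcm_bitrate_kbps(44_100, 16)  # CD quality: 44.1 kHz, 16 bit, stereo
formats = [("WAV (CD quality)", cd_kbps), ("mp3 320 kbps", 320), ("mp3 128 kbps", 128)]

for label, kbps in formats:
    print(f"{label:>16}: {kbps:7.1f} kbps, about {size_mb(kbps, SONG_SECONDS):.1f} MB")
```

CD-quality WAV runs at about 1411 kbps, so the same song is roughly 42 MB as WAV and under 10 MB as a 320 kbps mp3.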

So, the way of consuming audio has definitely changed in the last 15 years. Nowadays, almost no one sits down at home to listen to a full album from start to end. We might purchase it, download it and listen to it while commuting to work or whatever. And the sound quality that we get is as good as it needs to be for that specific purpose, but no more than that. But let’s go back in time a bit. It all comes down to the recording and mixing stage of that album. The mixing engineer will have a producer by his side (if the engineer isn’t the producer himself) telling him that he’s got to mix the album so that it sounds AWESOME on any mp3 player or crappy radio out there. So you will have to mix according to the main reproduction devices of your “target market”, using lots of weird EQing, compression and limiting… thus squashing the dynamic range of the recording. The result? Those recordings will sound good on mp3 players, but certainly not on any decent Hi-Fi or studio system.

We should then start asking ourselves where we want to take audio in the near future. It’s ironic that even though better studio equipment, better compression algorithms and ever higher capacity storage media are becoming cheaper each and every year, the dynamic range of recordings (with some honorable exceptions) is decreasing and we settle for the lowest quality formats for audio files. But despite that, there are also lots of people concerned about this issue (call them audiophiles, if you wish) who have been demanding high-quality lossless audio files in digital music stores such as iTunes. And they are being listened to.

For example, Paul McCartney’s album “Band on the Run” was remastered last year and put on regular sale and online download in the usual formats, but what made the difference was that, for the first time, HearMusic gave everyone the opportunity to download a 24-bit, 96 kHz FLAC version without peak limiting. Just like the master mix, without any kind of limiting or dynamic compression for CD reproduction. How good is that? Even Apple started turning its eyes to the Beatles’ audiophile fans when it decided to release the whole remastered Beatles catalog in 2009 as 24-bit, 44.1 kHz FLAC files on a USB stick.

What I meant with this post is to offer some light to those pessimistic people who think that there is no future for good quality audio nowadays. These last examples show us that it’s not true. What we are experiencing is just an opening up of the users’ mindset: they are now able to actually choose what suits them better, be it convenience, really good audio quality or both. And I also think that wise record companies will give mixing engineers the opportunity to produce two different audio mixes if need be: one for the mp3 market, and another one for the audio-concerned music lovers. There are already lots of audio sites offering online downloads of high resolution albums, showing us that there is a market for this kind of product.

Only time will tell, but I think that we have reached a point where big advances are being made in terms of audio quality due to the increase in listeners’ concerns. One only has to remember that 3 years ago the distorted CD audio of Metallica’s “Death Magnetic”, caused by excessive compression, led to online petitions to have it remixed, given that a different, non-compressed mix of some songs from that album was used in a PS3 game and sounded WAY better than the “official release”. Music lovers and listeners, often called consumers by record companies, have started to educate themselves about what they shouldn’t hear. And that’s the best start.


5 Responses to “The future of audio quality”

  1. worsleyjoe December 12, 2011 at 17:07 #

    Interesting read here Jorge. What do you think the implications are of this generation of compressed audio lovers? Surely the music industry has considered this and is adapting?

    You’re right about FLAC as well. I have a few of my favourite songs/albums in this format and would love to have them on my iPod.

    • jorgepolvorinos December 13, 2011 at 17:30 #

      I think that we are all becoming increasingly used to compressed music because it’s the most convenient way of listening to music that we can find nowadays. The only solution for this? A bit of education. But I guess it might not be easy to explain to a teenager how mp3 works and why it sounds the way it does.
      To that end, it’d be really good to take a step back and go to a classical music concert in our town, or hear a choir in a church at Christmas, to remind us how real music should sound.
      For us audio geeks, it’s also very good to have a listen to reference albums known to be carefully crafted and mastered, such as some of the ones that we can find on Bob Katz’s Honor Roll webpage. There is a way of recording and mixing that is really creative and organic.

      About FLAC: it’s great, and although it takes a lot of space, it’s good to have albums like those in that format. Whether Apple updates the firmware on the iPods to support this format or not, only time will tell!

  2. Hosea Kurnianta December 19, 2011 at 09:01 #

    Great analysis Jorge!

    In my opinion, space or file size is the main reason people don’t listen to the WAV format. When the first iPod came out in 2001, its maximum capacity was only 5GB. Now, 10 years later, it can store as much as 160GB. But over that time, the file size of each song, in both mp3 and WAV, has not changed much.

    Who knows, in 5 years’ time Apple might come out with an iPod that has 500GB capacity. Perhaps in the near future, when big storage is not as expensive as it is now, people will start to listen to WAV instead of mp3 and still be able to store as many songs as they want.

  3. Dmitryi S. Sizónenco December 4, 2012 at 16:36 #

    You know, it’s not all that simple. For instance, CD-quality audio is noticeably slower and harsher than tape cassettes. You can have the files as proof, but basically, 44 kHz is a very low-res format that just happens to barely fit psychoacoustic thresholds of resolution.
    The reason why young folks chose MP3 over 44/16 is simple – it’s not as harsh. MP3 compression lowpasses, cutting out the harshest parts of CD audio – the high frequencies. It’ll then once again rip out the bands of frequencies which are less likely to be heard. The result, ironically, is a darker, less spacious, but also warmer sound than the original CD, lacking a lot of what makes CDs harsh (by-the-by, MP3 files compressed with LAME also tend to have less of a square waveform in the case of poorly mastered over-loud CDs).

    Long story short, though, CD is not objectively the best format. Far from it. 16-bit sounds hollow and cold and dead at lower loudness (16 bits are only assigned to the roughly 0 to -6 dB range). There’s a bucketload of reasons why people actually tolerate CDs (the marketing brainwash of “perfect sound” being just the major one), but, for one, the loudness war is caused by this lack of resolution – when mastering for CDs, everything has to be squeezed into the top 12 dB, otherwise it sounds cold/hollow. Realistically, whatever has to sound tolerably warm has to be in the -6 to 0 dB range. That’s just because fewer voltage divisions at lower loudness means everything starts to blur together more and more, and actually what’s warm is harmonic detail and speed.

    No such limitations with decent formats like 96/24 and 192/24. And let’s not forget that pretty much any decent-sounding device nowadays oversamples even CD-rate audio. Real 44 kHz format is too slow and dull. Not enough transitions/second. 4X oversampling is common for most CD players nowadays, and for anything that plays digital audio (including Apple players, by the sound of them).

    There’s more to it all, but basically, 44/16 being an acceptable format for music is a huge swindle.

    • jorgepolvorinos December 4, 2012 at 19:15 #

      First of all, about “CD-quality audio is noticeably slower and harsher than tape cassettes”: I don’t know what “harsh” is to your ear, but tape cassettes are the lowest-fidelity format in existence, much worse than vinyl and, of course, worse than CD. Typical frequency responses of cassettes, by tape type, are: Normal: 14 kHz; Chromium Dioxide: 16.5 kHz; Metal: 19 kHz. I’m not talking about noise reduction here. As for signal-to-noise ratio, 16 bits give you 96 dB of dynamic range and 24 bits give you 144 dB, while a Dolby B cassette gives you less than 60 dB.
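As a quick check of the 96 dB and 144 dB figures, here is a minimal sketch of the usual rule of thumb: each bit doubles the number of quantization steps, which adds about 6.02 dB. It gives the theoretical ratio between full scale and one quantization step, not a measurement of any real converter.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```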

      Of course, recording at a 48 or 96 kHz sampling frequency and 24-bit depth is the way to go when you are going to do serious signal processing, because the extra bits give the mathematical operations enough headroom and precision that samples do not overflow and distort. So when you are mixing and mastering, absolutely, go for it.

      But as a final format, I still think that CD is acceptable, and it is very far from being a “huge swindle” as you claim, which is a bit out of place in my humble opinion. If you know about audio, and not only the crapola that audiophiles try to sell in their market, you know that good audio for CD is possible. Let me show you how. Let’s suppose that our starting point is a 48 kHz/24-bit master. We want to convert it to 44.1 kHz/16-bit format for CD distribution.

      First of all, we have to apply resampling. Thus, we have to make sure that the resampling process is good, so we have to watch out for:
      – Aliasing products in the resampler
      – Intermodulation distortion
      I recommend you look into some freeware resamplers and how well they fare for this purpose. The results are quite incredible. There are also websites that compare the performance of various pieces of software capable of resampling.
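To see why this conversion is demanding, here is a small sketch of the arithmetic behind a 48 kHz to 44.1 kHz conversion. It only works out the rate ratio and is not a resampler. Conceptually, a rational resampler upsamples by the numerator, low-pass filters, and keeps one sample out of every "denominator"; real tools do this far more efficiently, and the variable names here are just for illustration.

```python
from fractions import Fraction

SOURCE_RATE = 48_000  # Hz, the master in the example above
TARGET_RATE = 44_100  # Hz, CD

# Fraction reduces the ratio to lowest terms automatically.
ratio = Fraction(TARGET_RATE, SOURCE_RATE)
up, down = ratio.numerator, ratio.denominator

# Rate of the conceptual intermediate signal after upsampling.
intermediate_rate = SOURCE_RATE * up

# The low-pass filter must remove everything above the lower Nyquist
# frequency, otherwise it folds back as aliasing.
nyquist_limit = min(SOURCE_RATE, TARGET_RATE) / 2

print(f"upsample by {up}, downsample by {down}")
print(f"intermediate rate: {intermediate_rate} Hz")
print(f"filter cutoff must sit below {nyquist_limit:.0f} Hz")
```

The awkward 147/160 ratio is the reason the quality of the resampler's filter matters so much for aliasing.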

      Then, on to the dithering process! Regular triangular dither adds a low-level broadband noise signal, at about the level of the 16th bit. But there is also psychoacoustically noise-shaped dither, which subjectively sounds quieter than simple triangular dither because it takes advantage of the ear’s non-linear frequency response to low-level sounds.
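A minimal sketch of what plain triangular (TPDF) dither buys you, using a made-up constant signal of 0.3 LSB: without dither, rounding to 16-bit steps erases it completely, while with dither it survives as the average of the quantized values, at the cost of a little noise. Noise shaping is not shown here, and the function name and test level are assumptions for illustration.

```python
import random


def quantize(x_lsb: float, dither: bool = False, rng: random.Random = None) -> int:
    """Round a value expressed in LSB units to the nearest integer step."""
    if dither:
        # TPDF dither: the sum of two independent uniform values, each
        # within +/-0.5 LSB, has a triangular distribution over +/-1 LSB.
        x_lsb += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(x_lsb)


rng = random.Random(0)  # fixed seed so the run is repeatable
level = 0.3             # hypothetical signal level, below half a step
n = 100_000

plain = sum(quantize(level) for _ in range(n)) / n
dithered = sum(quantize(level, dither=True, rng=rng) for _ in range(n)) / n

print(f"without dither, average output: {plain}")
print(f"with TPDF dither, average output: {dithered:.3f}")
```

Without dither every sample rounds to zero, so the low-level detail is simply gone. With dither the average lands close to the original 0.3, which is why a properly dithered 16-bit file can carry information below its last bit.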

      Whether a CD sounds appealing to you is a matter of taste more than a matter of fact, as your own adjectives show: “cold”, “hollow”, “dead”, “slow” and “dull”. What you might prefer is the sound of a tube amplifier or a tape machine, because they produce even harmonics that sound “warmer” to your ear… but that were not there in the original. Good digital sound is more clinical in that respect, as it is intended not to add any harmonic distortion or coloring to the sound. Note that I am saying GOOD DIGITAL: good performance of the anti-aliasing filters, good performance of the A/D converters, a great slew rate in the converters, etc. Good digital sounds like the original sound, and that might not be for everybody, that’s true. We have grown accustomed to the artifacts created by tape and tubes, and we’ve learnt to love them.

      Also, there is plenty of proof that records mastered at reasonable levels can still sound warm and appealing. Check out Bob Katz’s Honor Roll website, with the best mastered records from different eras and their equivalent loudness in the K-system.
      As Bob himself says when reviewing the album “Stop Making Sense” by Talking Heads (1984):

      “This 16-bit CD illustrates that there is no noise floor problem at high monitor gains and that it is a myth that 16-bit CDs have to be compressed or limited to fit in the medium! After all, CDs have a measurable 115 dB dynamic range (properly dithered)--noise floor is NOT a problem.”

      The only thing that gives CDs their bad image, in my opinion, is the early digital recordings from the 80s and early 90s, when A/D converters were far from the quality that we can achieve today. They have improved a ton in the last 15 years. So that is no reason to despise good digital sound.

      To wrap this up, I would love to set up and run a double-blind ABX test in my University’s listening room or the anechoic room, with studio monitors and instrumentation amplifiers. We could compare a sample of a well-mastered album in high resolution, 96/24, with the same sample properly resampled and dithered to 44.1/16. Get at least 20 subjects to do the test there, and see what the results tell us. Only that way could we finally settle this “preference” thing. Let’s talk with facts and figures in our hands, not with prejudices or presumptions.
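For reading the results of such a test, here is a minimal sketch of the usual one-sided binomial check: if a listener were only guessing, each ABX trial would be a coin flip, so we ask how likely it is to get at least that many answers right by chance. The trial counts below are made-up examples, not results.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring `correct` or more out of `trials` by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical scores for one listener over 16 trials.
for correct in (8, 12, 14):
    print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A small p-value (0.05 is a common threshold) suggests the listener really heard a difference. A score near half right is exactly what guessing would produce.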
