recently generated music waveform data through a crude, simple averaging function that can only generate straight line segments. Furthermore, where straight line segments meet at different angles there are abrupt corners, and we don't want to artificially introduce sharp corners into the music waveform that should be continuously curving.
Two data samples define a straight line, so any averaging calculation that takes into account only two data points at a time will always produce the straight line segments we don't want. You need at least three data points to define any kind of curve. So we need our computer to calculate more complex averages than our simple example, averages which take into account at least three data points at a time. And if we want to calculate a better fitting, possibly more complex curve shape that better represents the originally calculated music waveform data points, then we should take more than three data points at a time, and we should have the computer perform more complex kinds of averaging to do the better curve fitting. Indeed, we could even instruct the computer to calculate averages of averages, in higher order multi-layer or recirculating calculations. We call this high power averaging. Now of course we're getting into some serious algorithmic complexity, which calls for some serious computing horsepower.
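To make the two-point versus three-point distinction concrete, here is a minimal sketch using generic textbook interpolators. These are not claimed to be the averaging used in any particular CD player, and the function names are hypothetical. Both are weighted averages of the samples (their weights sum to 1), but only the three-point version can bend.

```python
def linear_interp(p0, p1, t):
    """Two-point calculation: a straight line from p0 (t=0) to p1 (t=1).
    It is a weighted average with weights (1 - t) and t."""
    return (1 - t) * p0 + t * p1

def quadratic_interp(p0, p1, p2, t):
    """Three-point calculation: the parabola through samples taken at
    positions 0, 1 and 2 (Lagrange form). The three weights also sum to 1,
    so this is still a weighted average, just a curved one."""
    w0 = (t - 1) * (t - 2) / 2
    w1 = -t * (t - 2)
    w2 = t * (t - 1) / 2
    return w0 * p0 + w1 * p1 + w2 * p2

# Samples of the smooth curve y = x**2 taken at x = 0, 1, 2.
samples = (0.0, 1.0, 4.0)

# Halfway between the first two samples the true curve is 0.25.
print(linear_interp(samples[0], samples[1], 0.5))   # 0.5: the straight line misses the curve
print(quadratic_interp(*samples, 0.5))              # 0.25: the parabola follows the curve
```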
Most of us have done that high school math assignment where you see a statistical scatter of sample points, and your job is to plot the best fitting straight line or smooth curve through those points. The lesson in such problems was that the smooth curve represented the true, actual underlying function or waveform, and that the statistical scatter of points represented random noise or experimental distortions or observation errors superimposed on the true function or waveform. Your job was to see past the noise of the scattered points, and be able to find and calculate the true function hiding among the noisy scatter.
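The high school exercise is ordinary least-squares fitting. A minimal sketch for the straight-line case, with a made-up true line (y = 2x + 1) and made-up errors standing in for the scatter:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b through scattered points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Underlying "true" line y = 2x + 1, with small made-up errors added to each point.
xs = [0, 1, 2, 3, 4]
errors = [0.3, -0.2, 0.1, -0.4, 0.2]
ys = [2 * x + 1 + e for x, e in zip(xs, errors)]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))   # 1.96 1.08, close to the true 2 and 1
```

Notice that the fitted line passes through none of the five scattered points exactly, yet it lands close to the true line hiding among them.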
High power averaging does much the same thing. It attempts to see past the random scatter that noise, errors, and distortion impose on digital sample points, to find and calculate the smooth (at middle and low frequencies) original music waveform function hidden among those sample points. In so doing, high power averaging re-calculates the music waveform curve put out by the preceding digital filter calculations, and reduces the various digital errors, noise, and distortions that have contaminated the sample points and scattered them randomly about the true curve. By reducing these errors, noise, and distortions, the re-calculation brings us closer to the original music waveform curve, with better accuracy and resolution (at middle and lower frequencies). This improvement in music waveform accuracy and resolution has audible benefits, including better transparency, better inner detail, better stereo imaging, cleaner purity, more accurate musical sound and more natural musicality (CDs sound less like artificial digital, more like real live music, and more like great analog, which of course does not suffer from digital errors in the first place).
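As a minimal sketch of "seeing past the scatter", assume the contamination is independent, zero-mean Gaussian noise added to a slow 100 cycle tone sampled at 44.1 kc. The centered boxcar moving average used here is the simplest possible averager, a stand-in for illustration and not the high power algorithm the article has in mind; the tone, noise level, and 9-sample window are arbitrary choices.

```python
import math
import random

def moving_average(x, width):
    """Centered boxcar average; near the edges it uses whatever neighbours exist."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

random.seed(1)
fs = 44100.0
f = 100.0  # a low-frequency tone: 441 samples per cycle
true = [math.sin(2 * math.pi * f * n / fs) for n in range(4410)]
noisy = [s + random.gauss(0.0, 0.01) for s in true]
smoothed = moving_average(noisy, 9)

print(rms_error(noisy, true), rms_error(smoothed, true))
```

Because a 9-sample average of independent noise cuts its RMS level by about the square root of 9, the second figure should come out at roughly a third of the first, while the slow tone itself passes through almost untouched.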
You might suspect that all this additional computation is heresy, messing with the sacrosanct nature of the original music signal waveform as it came off the CD. But remember, there was no original music waveform that came off the CD, especially above 2 kc. The true music signal waveform had to be reconstituted, re-created, actually generated anew by the calculations of the (approximately) boxcar digital filter, from the sketchy clues coming off the CD. So the so-called true music waveform is of necessity a product of calculations in your CD player or D-A processor, and thus it is hardly sacrosanct against further calculation. The averaging algorithm effectively comes after the digital filter, and performs re-calculation improvements on the calculations just finished by the digital filter, calculations which just generated the music waveform anew. Note incidentally that there are many acceptable versions of averaging algorithms, and they can be judged by their very different sonic benefits; this is in contrast to the prior digital filter, for which there is only one paradigm ideal of correctness, the boxcar filter shape (an ideal which different CD players approach with varying success).
Increasing Resolution and Accuracy
So far we've talked only about expanding the sampling clock speed by oversampling, not about expanding the bit resolution. Once we've expanded the sampling clock speed and have started calculating averages, it then behooves us to also expand the bit resolution, assuming our DAC chip can accept and resolve the added bit depth. The added bits of resolution occur naturally as a byproduct of averaging calculations (for example, the average of 3 and 4 is 3.5, a value that falls between the integer steps of the original data and so already carries finer resolution information). Since our averaging calculations have generated finer resolution data values, we might as well hold on to this finer resolution and pass it on to the DAC chip, rather than throwing it away. If the averaging algorithm is complex enough, it could easily generate calculated data values with 24 bits of resolution from the 16 bit resolution coming off the CD, so we might as well pass all 24 bits on to the DAC chip, assuming it can process 24 bits of resolution.
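Here is a minimal integer sketch of holding on to those extra bits, assuming plain integer PCM samples and a DAC that accepts a 24-bit word. Both function names are hypothetical. The first rounds the average back onto the 16-bit grid; the second shifts each sample into a 24-bit word first, so 8 extra low-order bits survive the division.

```python
def average_16bit(samples):
    """Average rounded back onto the 16-bit grid: the fractional part is thrown away."""
    n = len(samples)
    return (sum(samples) + n // 2) // n

def average_24bit(samples):
    """Same average, but each 16-bit sample is first shifted left by 8 bits
    into a 24-bit word, so the fractional part is kept as 8 extra bits."""
    n = len(samples)
    total = sum(s << 8 for s in samples)
    return (total + n // 2) // n

pair = [3, 4]
print(average_16bit(pair))          # 4: the half step is lost
print(average_24bit(pair))          # 896
print(average_24bit(pair) / 256)    # 3.5 in units of the original 16-bit step
```

Averaging two samples only ever needs one extra bit; the 24-bit word simply leaves room for longer averages. Whether those extra bits carry true information is exactly the question raised next.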
But do these added bits really represent true increased resolution? Are we truly getting more information, or more accurate information, about the original music signal (which, after all, was only encoded with 16 bits of resolution onto the CD)? The answer is yes, and no. Right off the bat, let's at least note that it certainly doesn't hurt to have the extra bits tossed into the DAC chip's input hopper, so long as the calculations generating these extra bits were done intelligently. Then, let's also note that averaging isn't likely to improve resolution at the top edge of the passband (20 kc for a 44.1 kc CD). Basically, that's because at 20 kc there's only one sample point coming out of the digital filter to define the amplitude of each half cycle of the waveform, so there's only one data point to average at 20 kc, and if you average one data point with itself you don't get any more information than the data point itself.
However, at lower frequencies, i.e. over most of the passband, sophisticated averaging can realize significant improvements in true resolution, and can bring us even closer to re-creating the correct musical waveform that was originally input into the digital system.
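The arithmetic behind "only one sample per half cycle at 20 kc" is just the sampling rate divided by twice the tone frequency. A small sketch, assuming the 44.1 kc CD rate:

```python
def samples_per_half_cycle(freq_hz, fs_hz=44100.0):
    """Number of sample instants falling within one half cycle of a tone."""
    return fs_hz / (2.0 * freq_hz)

for f in (20000, 2000, 200, 50):
    print(f, "Hz:", samples_per_half_cycle(f))
```

That works out to roughly 1.1 samples per half cycle at 20 kc, about 11 at 2 kc, about 110 at 200 cycles, and 441 at 50 cycles, which is why there is so much more to average at middle and low frequencies.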
The best proof of this is the Sony DSD-SACD system, which claims to deliver 20 bits of resolution for most of its passband (not counting its upper frequencies), and which is winning wide praise for its musical resolution and accuracy (especially at middle and lower frequencies). The true native resolution of the DSD system is only a very crude 8 bits. DSD starts out as a 1 bit system, sampling at 256 times the highest audio frequency to be captured, which gives it the equivalent of merely 8 bits of information content (trading bandwidth for bit depth). But a system with only 8 bit resolution sounds very crude, 256 times worse than ordinary 16 bit resolution CDs. So how on earth does DSD get all the way from a lowly 8 bits, past the 16 bits of ordinary CDs, and up to 20 bit resolution that (at least in some aspects) surpasses CD sound? The answer is averaging. Heavy duty, multi-layered (higher order), complex averaging. DSD employs this high power averaging to truly improve bit resolution over much of its passband. Again, at the highest frequencies of the audio passband, up around 20 kc, there is little improvement, for there are not many sample points to average together, so up at these high frequencies DSD's resolution remains mediocre to poor (and we can easily hear its sonic mediocrity here). But at middle and lower frequencies, there are many sample points to average together, so by employing some high power averaging a waveform curve can be calculated that approaches the original music waveform with very good resolution (for middle and lower frequency musical information).
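To see how averaging can pull fine resolution out of a 1 bit stream, here is a toy sketch: a first-order 1 bit sigma delta modulator fed a constant level, followed by a plain average. This is an assumption-laden simplification. Real DSD uses higher order modulators and far more elaborate decimation filtering, and the sketch makes no claim about the 20 bit figure. The modulator's internal error feedback is what lets this simple average converge as quickly as it does.

```python
def first_order_sigma_delta(level, n):
    """Toy first-order 1-bit modulator for a constant input level in (-1, 1).
    Each output is just +1 or -1; the information lives in their density."""
    integrator = 0.0
    previous = 0.0
    bits = []
    for _ in range(n):
        integrator += level - previous   # accumulate the error made so far
        out = 1.0 if integrator >= 0.0 else -1.0
        bits.append(out)
        previous = out
    return bits

def average(values):
    """Plain arithmetic mean, standing in for a decimation filter."""
    return sum(values) / len(values)

bits = first_order_sigma_delta(0.3, 4096)
print(bits[:8])          # any single bit says almost nothing about 0.3
print(average(bits))     # averaging many bits recovers the level closely
```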
Now, high power averaging is a dire necessity for improving resolution in low-bit digital systems with crude native resolution like 1 or 8 bits. But common thinking has been that it's not likewise necessary for PCM digital systems with adequately high native resolution. After all, since 16 bits are supposedly sufficient to define a music waveform with full audible resolution, down to a s/n ratio of about 96 dB, then why bother with averaging? Well, why not borrow this tool that helps low-bit systems? After all, what's good for the goose is also good for the gander. High power averaging can enhance high resolution PCM yet further, just as it enhances low resolution digital systems. High power averaging can enable every digital system to provide an even more accurate re-creation of the original music waveform. It can help every digital system to reveal even more inner detail (with better stereo imaging also, as a result of the extra detail information), to sound even purer, and to sound more naturally musical.
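The "about 96 dB" figure, and the factor of 256 between 8 and 16 bits, both come from the ratio of full scale to one quantization step. A small sketch of that arithmetic:

```python
import math

def bits_to_db(bits):
    """Ratio of full scale to one quantization step, expressed in decibels."""
    return 20.0 * math.log10(2 ** bits)

for b in (8, 16, 20, 24):
    print(b, round(bits_to_db(b), 1))   # 48.2, 96.3, 120.4, 144.5 dB

# 16 bits versus 8 bits: 2**8 = 256 times finer steps, about 48 dB.
print(2 ** 16 // 2 ** 8, round(bits_to_db(16) - bits_to_db(8), 1))
```

The often quoted 6.02N + 1.76 dB rule gives a slightly higher number (about 98 dB at 16 bits) because it measures a full-scale sine against quantization noise rather than full scale against one step.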
If we arrogantly believe that multi-bit (16 or 20) PCM is intrinsically perfect enough to define a music waveform to audible perfection, then we'll naturally think that nothing can improve this PCM digital system further, not averaging nor any other tactic. But if we humbly acknowledge that even multi-bit PCM is laden with errors and approximations, we can then make the intellectual leap to trying to help PCM in practical ways, perhaps even borrowing tools from elsewhere to help.
The key premise of PCM is that there are enough bits at each sampled instant to define the music waveform to audible perfection for that instant, without having to rely on any additional information from neighboring instants. But this doesn't mean that PCM is indeed so perfect that it can't be improved further, and audibly so, by also bringing in that additional information from neighboring instants. This is especially beneficial at improving the sound of all middle and lower musical frequencies. Music's middle and low frequencies are inherently spread out over many, many sample instants, so the information content of the musical waveform for these middle and lower frequencies actually exists as spread out over many samples. Since that's the case, why not take advantage of the fact that the actual information content is spread out over many samples? Why not use an averaging algorithm to gather up this information content from many adjacent samples, and calculate the average trend in order to dramatically reduce digital errors and noise and thereby increase resolution, accuracy, transparency, purity, and natural musicality? Why ignore all this extra information content?
PCM's strength over other digital systems (such as DSD's sigma delta system) is that PCM does in theory completely characterize the waveform amplitude at each sample point, and in theory that's all you need to know to define the waveform completely. But in practice there are many flaws in all digital systems including PCM, flaws which are audibly degrading and should be reduced or eliminated in practice. At very high frequencies of any digital system's passband (whether PCM or 1 bit DSD), averaging cannot yield much if any improvement, since these high frequencies are inherently spread out over only one digital sample (not many samples as middle and low frequencies are), and you get little improvement by just averaging one data point with itself. But at music's middle and lower frequencies, high power averaging can be of enormous benefit, and it can help even strong PCM systems to sound even better, just as well as it helps the truly needy low-bit digital systems to get off the ground.
The algorithm that high power averaging employs, for fitting the best music waveform curve to the scatter of sample points, should respect the high frequency information that's there in the scatter, and leave it there at the inherent native resolution of the digital system. But then, at music's middle and lower frequencies, this same algorithm can beneficially use the information spread among many sample dots in the scatter, to improve the accuracy and resolution of the middle and lower frequency components of the music waveform. At frequencies where there is only 1 sample dot per half cycle, there's no additional information to base resolution improvements upon, but as soon as there is more than 1 sample dot, the algorithm (depending on its power and on the kind of curve it fits) can begin using information from neighboring sample points to fit a better curve, i.e. to calculate the music waveform (already generated by calculation) to even better accuracy.
This algorithm strategy recognizes basically the same phenomenon that compression schemes (MP3, MPEG) do for audio and video, but it then does the opposite thing with this phenomenon. Both recognize that the information for middle and lower frequencies is spread over many sample points. The compression schemes use this finding as an excuse to throw away many sample points, since this information spread over many sample points repeats itself (this throwing away degrades the signal somewhat, but hopefully not too much). In contrast, high power averaging uses this same finding as an excuse to enhance and improve the fidelity of the signal rather than degrade it. Since there is repetitious information among many sample dots, fine, let's put that to advantage by averaging out that repetitious information. Averaging out this information will reduce the random errors and noise that plague each of those sample points, and so will give us much more accurate information than any one of those sample points could individually. The more sample points we can average, the more we can reduce the errors and noise (from the imperfections of digital systems), and thus the closer we can come to re-creating the original signal, with better accuracy and resolution. Since we can average progressively more sample points at progressively lower frequencies, this means that high power averaging can be most sonically spectacular in improving middle and lower frequencies.
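How fast does error shrink as more points are averaged? A quick Monte Carlo sketch, assuming the errors on each reading are independent and zero-mean (the idealized case; correlated errors would not shrink this way). Under that assumption the RMS error of the average falls as one over the square root of the number of points.

```python
import math
import random

def rms_error_of_mean(n_points, noise_sd=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of how far the average of n_points noisy readings
    of the same true value (here 0) lands from that value, as an RMS figure."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        readings = [rng.gauss(0.0, noise_sd) for _ in range(n_points)]
        errors.append(sum(readings) / n_points)
    return math.sqrt(sum(e * e for e in errors) / trials)

for n in (1, 4, 16, 64):
    # measured RMS error next to the 1/sqrt(n) prediction
    print(n, round(rms_error_of_mean(n), 3), round(1 / math.sqrt(n), 3))
```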
Essentially, multi-bit PCM can stand on its own, but it can get even better if we bring to bear upon it some tools borrowed from the world of low-bit digital systems - tools such as high power averaging, which are so crucial in getting these low-bit systems off the ground. We call this concept hybrid PCM, because it combines the strengths of multi-bit PCM with the strengths of tools borrowed from the world of low-bit sigma delta systems. We also discussed hybrid PCM in IAR's earlier article on the dCS Purcell.
The benefits of high power averaging can apply to all digital systems that employ PCM anywhere in their chain. For example, if the medium is a CD, then it is coded in PCM, and its sound can be benefited by high power averaging - even if some elements in the recorder's digitizing ADC or in the playback DAC chip use other coding schemes such as sigma delta.
Incidentally, the original simple interpolation averaging we used above for our opening illustration is actually suboptimal in two regards. First, as noted above, its averaging algorithm is too simple, generating only straight lines between two data sample points or simple curves among three, five etc. sample points. Secondly, this simple type of averaging algorithm anchors itself to each of the data points (at a 44.1 kc rate) calculated by the digital filter as the music waveform, and then fills in meaningful values for the three intermediate sampling points (assuming four times oversampling) between each pair of original points. Thus, its curve fitting activity is anchored to every fourth data point in the oversampled data stream. This activity is useful, mind you, better than not filling in any intermediate values at all. But it is still suboptimal. More powerful averaging algorithms can go further, by freeing themselves from even being anchored to every fourth data point.
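A minimal sketch of the anchored scheme just described, assuming four times oversampling with straight-line fill (the function name is hypothetical). Every fourth output is an original sample passed through untouched.

```python
def oversample_4x_linear(samples):
    """Insert three straight-line values between each pair of original samples.
    Every fourth output is an original sample, i.e. the curve is anchored to them."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(4):
            out.append(a + (b - a) * k / 4)
    out.append(samples[-1])
    return out

original = [0.0, 1.0, 0.0, -1.0]
up = oversample_4x_linear(original)
print(up)
print(up[::4] == original)   # the anchors survive unchanged
```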
Why would it be better to not be anchored to these data points? After all, these data points represent the musical waveform, as best as the digital filter was able to re-create it, and we don't want to lose track of the music waveform by drifting free of these anchor points. The answer lies in an attitude shift. Remember that the best efforts of the digital filter are still imperfect at re-creating the original music waveform, for many reasons, including the fact that no physically real digital filter can mimic the required ideal boxcar filter function. So we can regard the music waveform created by the digital filter as a statistical scatter of data points, approximating the true original music waveform, but still laden with errors, distortions, and noise. Instead of arrogantly assuming that the digital filter's calculations represent a sacrosanct music waveform whose data points (at a 44.1 kc rate) we must anchor to, we humbly acknowledge the imperfections of our digital filters, and try our best to free ourselves from their errors. By using high power averaging as a tool, we can average out the various random errors endemic to digital systems, and we can re-create the complex original music waveform curve that is the best fit among the statistical scatter of error-laden sample dots - yet this best fit curve might not actually touch any of the sample dots.
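One standard, non-anchored technique is Savitzky-Golay smoothing: each point is replaced by the centre value of the parabola that best fits it and its neighbours in the least-squares sense. It is swapped in here purely as an illustration and is not claimed to be what any product discussed by IAR actually does. The sketch uses the classic 5-point quadratic weights and assumes Gaussian noise on a slow tone.

```python
import math
import random

# Classic 5-point quadratic least-squares smoothing weights (Savitzky-Golay).
SG5_WEIGHTS = (-3, 12, 17, 12, -3)
SG5_NORM = 35

def savgol5(x):
    """Replace each interior sample with the centre value of the parabola that
    best fits it and its four neighbours in the least-squares sense.
    The two samples at each end are passed through unchanged in this sketch."""
    y = list(x)
    for i in range(2, len(x) - 2):
        window = x[i - 2:i + 3]
        y[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return y

def rms(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# An exact parabola comes back unchanged: the fit has nothing to correct.
parabola = [float(t * t) for t in range(-4, 5)]

# A slow tone plus noise: the smoothed curve is closer to the truth,
# yet it does not pass through the noisy sample values themselves.
rng = random.Random(3)
fs, f = 44100.0, 100.0
true = [math.sin(2 * math.pi * f * n / fs) for n in range(4410)]
noisy = [s + rng.gauss(0.0, 0.01) for s in true]
smoothed = savgol5(noisy)
touching = sum(1 for a, b in zip(noisy[2:-2], smoothed[2:-2]) if a == b)

print(rms(noisy, true), rms(smoothed, true), touching)
```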
Reducing Digital System Errors
In a digital system, there are many sources and types of random noise, errors, and distortions contaminating the music waveform, and high power averaging can compensate for all of them, thereby helping digital to sound more like real live music (and more like great analog). Indeed, high power averaging is most effective at reducing noise, and thus also at reducing random or noiselike errors or distortions. For music's middle and low frequencies we know a priori that these components of the music waveform are smooth curves with gently changing shapes, without the brief, sudden, random changes that noise, errors, or distortions superimpose on the smooth waveform. High power averaging finds and calculates the correct original smooth curve, and thereby finds and calculates, to a high resolution degree of accuracy, the original music waveform before it was contaminated by random noise, errors, and distortions. Thus, high power averaging can effectively compensate for or eliminate most random noise, errors, and distortions (again, just for music's middle and lower frequencies).
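One concrete case of such noise-like error is coarse quantization. The sketch below assumes a steady level sitting between two quantization steps, and it adds a small uniform dither so that the rounding errors really do scatter randomly (an assumption of this sketch; without it, a perfectly steady input rounds the same way every time and averaging cannot help).

```python
import random

def quantize(value, step):
    """Round to the nearest multiple of step, like a coarse word length."""
    return step * round(value / step)

rng = random.Random(2)
step = 1.0          # one step (LSB) of the coarse system
true_level = 0.3    # a value that sits between two quantization steps

plain = [quantize(true_level, step) for _ in range(1000)]
dithered = [quantize(true_level + rng.uniform(-0.5, 0.5) * step, step)
            for _ in range(1000)]

print(sum(plain) / len(plain))         # 0.0: every reading rounds the same way
print(sum(dithered) / len(dithered))   # near 0.3: the scattered errors average out
```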
It's worth discussing some of these contaminations that digital systems impose on the music waveform. First, let's tackle a core contamination problem, the crude native resolution of digital media. A conventional CD has only 16 bits, and a Sony DSD master tape has only 8 effective bits of resolution. But IAR research showed long ago that the human ear/brain can hear finer than 20 bits of resolution on music. Since the human ear/brain can discern, appraise, and appreciate the true musical waveform to an accuracy of 20 bit resolution, it follows that any representation of that same music waveform with cruder resolution, e.g. only 16 bit quantization, will only crudely approximate the true and audibly discernible amplitude value of the music waveform for each sample point, and will be somewhat erroneous at each sample point. Sometimes the crude approximation will be on the high side of the actual music waveform amplitude, and sometimes on the low side, in a more or less random pattern. Thus, we can regard these random (quantization) errors of the 16 bit resolution digital system as noise superimposed on the correct original music waveform. High power averaging can reduce or compensate for this noise, these random errors, and can thereby get us closer to the correct original music waveform than the crude approximations of 16 bit quantization could. In other words, high power averaging actually delivers a higher resolution, more accurate version of the original music waveform (at least for middle and lower frequencies) than the 16 bit resolution of
(Continued on page 27)