Audio Communication Coder (ACC) concept and operation

 

  • What is ACC ?

    ACC is a new source/perceptual audio compression technology created by ATC Labs. It is described in two papers presented at the 119th AES Convention (New York, October 7-10, 2005) and at the 120th AES Convention (Paris, May 20-23, 2006). The New York AES paper is available here, and the associated PPT presentation is available here (PDF format). The Paris AES paper is available here, and the corresponding PPT presentation is available here (PDF format). Audio demos illustrating the operation of ACC are available below.


  • Yet another audio coder ?

    While ACC falls in the category of source/perceptual coders, it was not designed to compete directly with other well-known audio compression technologies such as MP3, WMA or AAC. Those technologies were designed to optimize compression efficiency in application scenarios such as:

    • broadcast,

    • streaming,

    • messaging,

    • (Internet) download,

    however, they cannot address an emerging application context of growing relevance: real-time, two-way, high-quality audio communication. Their emphasis on high compression efficiency comes at the cost of excessive coding delay and system complexity, which are incompatible with interactive (mobile) communication requirements. In contrast, ACC features low coding delay (less than 50 ms for sampling frequencies between 32 kHz and 48 kHz) and moderate system complexity, and benefits from other innovative design aspects that make it suitable for satisfying:

    • the increased consumer expectation of higher voice/audio quality and new functionalities in mobile communication,

    • the pressure of operators who have strongly invested in 3G licenses and expect the corresponding return by offering consumers new services.
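
    As a rough illustration of what the 50 ms bound means in samples, the following sketch converts a delay bound into a sample count for the sampling frequencies mentioned above. It also reports how long one block of 1024 samples lasts (the transform size quoted further down for the 44100 Hz demos). The function names are hypothetical, and a block duration is only one contribution to delay; the text does not break down how the ACC delay is distributed among the transform, look-ahead and bit-stream buffers.

```python
def delay_budget_samples(fs_hz, max_delay_ms=50):
    """Number of samples that fit inside a delay bound at a given sampling frequency."""
    return fs_hz * max_delay_ms // 1000


def block_duration_ms(block_size, fs_hz):
    """Duration in milliseconds of one block of block_size samples."""
    return 1000.0 * block_size / fs_hz


# Sampling frequencies taken from the text; 1024 is the demo transform size.
for fs in (32000, 44100, 48000):
    budget = delay_budget_samples(fs)
    block = block_duration_ms(1024, fs)
    print(f"{fs} Hz: 50 ms = {budget} samples, 1024-sample block = {block:.2f} ms")
```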

     

  • What are the distinctive features of ACC ?

    Some of the distinctive features of ACC are:

    • ACC offers low end-to-end delay by minimizing the size of the transform and of the look-ahead and bit-stream buffers,

    • ACC uses intra-frame coding strategies only, ensuring high intrinsic error robustness, facilitating error concealment, providing high time resolution in bit-stream access, and facilitating special effects such as fast-rewind and fast-forward play modes,

    • ACC implements efficient bandwidth extension using ABET, a proprietary ATC Labs technology,

    • ACC is amenable to new functionalities including

      • special effects in audio such as modification of voice gender,

      • semantic classification, access, filtering and retrieval of audio in the compressed (i.e., bit-stream) domain.


  • What is the structure of the ACC encoder ?

    The following figure illustrates a block diagram of the ACC encoder.

    The ACC encoder combines:

    • source coding tools,

    • perceptual coding tools,

    • bandwidth reduction/extension tools.

    Bandwidth reduction is separately implemented for the coding of sinusoidal parameters and the coding of the MDCT residual.


  • What about the structure of the ACC decoder ?

    The structure of the ACC decoder corresponding to the previous encoder is depicted in the following figure.

    This figure highlights two important aspects of ACC:

    • bandwidth extension is implemented separately for sinusoids and (residual) noise components, which adds flexibility, notably in controlling the spectral tilt of each individual component,

    • bandwidth extension can be implemented without any filter bank other than the core MDCT, which both minimizes coding delay and reduces system complexity.


  • Are there audio demos ?

    Yes, a few demos corresponding to the results discussed in the New York ACC paper are presented here. These demos assume a sampling frequency of 44100 Hz and an ODFT/MDCT transform size of 1024 samples. Other demos, corresponding to the Paris ACC paper as well as to recent results on mono/stereo encoding, are available below.

    EXAMPLE 1: Bandwidth extension of a sinusoid with vibrato (details in the ACC paper)

    This example illustrates the accuracy of the bandwidth extension (BE) implemented in ACC. The input audio is a single sinusoid that is modulated in frequency. With a bandwidth extension factor of 10, it can be heard and seen that both the center frequency of the sinusoid and its frequency deviation are correctly scaled.
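
    The scaling property described in this example can be checked numerically. The sketch below is not the ACC algorithm (which works on sinusoidal parameters in the ODFT/MDCT domain, as detailed in the paper). It only shows that, for a frequency-modulated sinusoid, multiplying the phase by a factor of 10 multiplies both the center frequency and the frequency deviation by 10. The signal parameters (400 Hz carrier, 20 Hz deviation, 5 Hz vibrato rate) and all function names are assumptions chosen for illustration.

```python
import math


def fm_phase(t, fc, dev, fm):
    """Phase (radians) of a sinusoid whose frequency is fc + dev*sin(2*pi*fm*t)."""
    return 2 * math.pi * fc * t - (dev / fm) * math.cos(2 * math.pi * fm * t)


def inst_freq(phase_fn, t, dt=1e-6):
    """Instantaneous frequency in Hz, from a central difference of the phase."""
    return (phase_fn(t + dt) - phase_fn(t - dt)) / (2 * dt) / (2 * math.pi)


def center_and_deviation(phase_fn, duration, steps=2000):
    """Estimate center frequency and peak deviation over one vibrato period."""
    freqs = [inst_freq(phase_fn, i * duration / steps) for i in range(steps + 1)]
    lo, hi = min(freqs), max(freqs)
    return (lo + hi) / 2, (hi - lo) / 2


FC, DEV, FM = 400.0, 20.0, 5.0   # hypothetical test signal
K = 10                           # bandwidth extension factor from the example


def base(t):
    return fm_phase(t, FC, DEV, FM)


def extended(t):
    # Scaling the phase by K scales the instantaneous frequency by K.
    return K * base(t)


base_center, base_dev = center_and_deviation(base, 1 / FM)
ext_center, ext_dev = center_and_deviation(extended, 1 / FM)
print(f"input:    center {base_center:.1f} Hz, deviation {base_dev:.1f} Hz")
print(f"extended: center {ext_center:.1f} Hz, deviation {ext_dev:.1f} Hz")
```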

     

    EXAMPLE 2: Bandwidth extension of a harmonic complex with vibrato (details in the ACC paper)

    This example extends the previous one: a harmonic complex has been created by adding three partials to the fundamental (f0), which is the FM signal of the previous example. ACC has bandwidth-extended this harmonic complex by adding partials from 11*f0 to 18*f0. It can be heard and seen that bandwidth extension is correctly implemented in terms of frequency modulation and frequency deviation (use of a good audio editor for careful listening is recommended).

     

    EXAMPLE 3: Coding and bandwidth extension of a natural music signal (details in the ACC paper)

    This example uses a short excerpt of a music signal (trumpet solo), coded at a constant bit rate of 24 kbit/s. The basic bandwidth is 5 kHz. Three versions of the coded audio are provided: without bandwidth extension (no BE), with bandwidth extension of sinusoids only (BE S), and with bandwidth extension of both sinusoids and noise (BE S+N) (use of a good audio editor for careful listening is recommended).

    Original | no BE          | BE S           | BE S+N
    -        | short-time PSD | short-time PSD | short-time PSD

  • OK, I understand the ACC concept. How about coding quality with natural audio ?

    ACC is a brand new approach to audio coding that targets emerging application areas that most current audio compression technologies cannot serve, namely:

    • 3G mobile audio communication,

    • wireless links to microphones or loudspeakers,

    • B-channel audio communication,

    • audio/music conferencing (among musicians for example).

    This list is not exhaustive. The following audio demos illustrate the quality achievable in real-time, two-way communication scenarios using constant bit rate (CBR) coding. The sampling frequency is 32 kHz.


    item | Original | 80 kbit/s | 64 kbit/s | 48 kbit/s | 32 kbit/s | 24 kbit/s
    jazz
    harpsichord
    castanets
    male
    Sting
    Vega
    Vega solo

     

    ACC has been designed to adapt quickly (on-the-fly) to fluctuations in the bandwidth of the channel. Because ACC relies on intra-frame coding only, the coding process can adapt to the available channel bandwidth instantaneously (on a frame basis), in a way that maximizes audio quality. The following figure illustrates such a scenario: every 640 ms, the bit rate switches instantaneously (on-the-fly).

    Taking the Vega audio item as the original, on-the-fly bit rate switching was applied as illustrated in the previous figure, and the resulting audio is available next. The transitions are seamless and take full advantage of the audio bandwidth expected for a given bit rate.

    Original | Switched bit rate
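
    Because each frame is coded independently, switching the bit rate amounts to changing the bit budget handed to each frame. The following sketch computes a per-frame bit budget for a rate that changes every 640 ms at a 32 kHz sampling frequency. The frame length of 1024 samples, the order of the rates, and all names are assumptions made for illustration; the actual switching pattern is the one shown in the figure, and the ACC frame length at 32 kHz is not stated here.

```python
FS_HZ = 32000            # sampling frequency used for the demos
FRAME_SAMPLES = 1024     # assumed frame length (hypothetical)
SWITCH_MS = 640          # switching interval from the text
RATES_BPS = [80000, 64000, 48000, 32000, 24000]  # demo rates, order assumed


def frame_bit_budgets(n_frames):
    """Bit budget of each frame when the rate changes every SWITCH_MS."""
    # Number of whole frames inside one 640 ms segment (20 with these values).
    frames_per_segment = SWITCH_MS * FS_HZ // (1000 * FRAME_SAMPLES)
    budgets = []
    for i in range(n_frames):
        rate = RATES_BPS[(i // frames_per_segment) % len(RATES_BPS)]
        # Bits available for one frame at a constant rate.
        budgets.append(rate * FRAME_SAMPLES // FS_HZ)
    return budgets


budgets = frame_bit_budgets(100)
for start in range(0, 100, 20):
    print(f"frames {start:3d}-{start + 19:3d}: {budgets[start]} bits per frame")
```

    With these assumed values, 640 ms holds a whole number of frames, so every switch falls on a frame boundary and no frame has to be split between two rates.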

     

  • How about error concealment ?

    ACC has been designed for real-time, two-way audio communication, so effective error concealment is a major concern. Since ACC does not use any inter-frame coding technique, errors do not spread from one frame to the next, which simplifies error concealment. In addition, error concealment in ACC takes full advantage of the information in correctly decoded (past) audio frames. Due to real-time constraints, ACC performs error concealment using past audio frames only, and not 'future' audio frames, which would increase the communication delay. The following audio example illustrates the performance of ACC error concealment when decoding audio coded at 64 kbit/s (CBR) with a frame loss as high as 40%.

    Original | 40% frame loss
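
    To make the "past frames only" constraint concrete, the sketch below shows a generic, deliberately simple concealment strategy: a lost frame is replaced by an attenuated copy of the most recent output frame. This is a textbook-style stand-in, not the concealment method used in ACC, which is not described here. The function name, the fade factor and the loss pattern (4 of 10 frames lost, i.e. 40%) are assumptions for illustration.

```python
def conceal(frames, received, fade=0.5):
    """Replace each lost frame with a faded copy of the previous output frame.

    Only past frames are used, so concealment adds no extra delay.
    """
    out = []
    last = None
    for frame, ok in zip(frames, received):
        if ok:
            last = list(frame)              # correctly decoded frame
        elif last is not None:
            last = [fade * x for x in last]  # fade further on each consecutive loss
        else:
            last = [0.0] * len(frame)        # nothing decoded yet: output silence
        out.append(last)
    return out


# Ten toy frames of four samples each; frame i holds the value i + 1.
frames = [[float(i + 1)] * 4 for i in range(10)]
# Hypothetical loss pattern with 4 of 10 frames lost (40%).
received = [True, False, True, True, False, False, True, True, False, True]

for ok, frame in zip(received, conceal(frames, received)):
    print("ok  " if ok else "lost", frame)
```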

  • What does stereo ACC sound like ?

    ACC supports both mono and stereo coding. To minimize complexity in different application scenarios, there are two ACC software implementations: one for mono encoding/decoding only, and another for stereo encoding/decoding. Although the stereo ACC software is more complex than the mono ACC software, because it includes additional processing and coding tools devoted to efficient stereo coding, both implementations behave exactly the same way in mono encoding/decoding. All features described above, namely dynamic bit rate switching and error concealment, are equally effective in stereo ACC. The following audio demos illustrate the coding quality with stereo audio material at 128 kbit/s and 96 kbit/s (constant bit rate coding).

     

    item | Original | 128 kbit/s | 96 kbit/s
    David Bowie
    harpsichord
    castanets
    Tracy Chapman
    Depeche Mode
    organ

    Please send your questions or comments to info@atc-labs.com

     

      Copyright © 2003-2009 ATC Labs, Inc.