6.3 GSM Speech Coding

The first stage of speech encoding is to convert human speech, which the microphone produces
as an analogue signal, into a digital equivalent.
GSM achieves this by sampling the analogue voice signal every 125 µs, or 8000 times per
second. Each sample is quantised into one of 8192 voltage levels, and each level is
represented by a 13-bit binary code (2^13 = 8192). Therefore, 8000 samples of 13 bits each
are produced every second, giving a raw data rate of 8000 × 13 = 104,000 bit/s (104 kbps).
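The sampling arithmetic above can be checked with a short Python sketch. The uniform quantiser `quantise()` and its full-scale parameter `v_max` are hypothetical illustrations, assuming a symmetric input range mapped onto signed 13-bit codes; a real handset's analogue-to-digital converter will differ in detail.

```python
SAMPLE_RATE_HZ = 8000   # samples per second
BITS_PER_SAMPLE = 13    # bits per quantised sample

sample_interval_us = 1_000_000 / SAMPLE_RATE_HZ   # 125.0 microseconds between samples
levels = 2 ** BITS_PER_SAMPLE                      # 8192 quantisation levels
raw_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE    # 104000 bit/s

def quantise(voltage, v_max=1.0):
    """Hypothetical uniform quantiser: map a voltage in [-v_max, v_max]
    to a signed 13-bit integer code, clipping at full scale."""
    step = 2 * v_max / levels                       # width of one quantisation level
    code = round(voltage / step)
    lowest, highest = -(levels // 2), levels // 2 - 1   # -4096 .. 4095
    return max(lowest, min(highest, code))
```

With `v_max=1.0`, an input of 0.5 V maps to code 2048 and a full-scale +1.0 V input clips to 4095, the largest positive 13-bit code.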

This raw bit stream is presented to the RPE-LTP (Regular Pulse Excitation with Long-Term
Prediction) vocoder, where it is chopped into 20 ms blocks of 160 samples (2080 bits).
Each block is then processed separately.
The vocoder analyses each 20 ms block and represents it with three sets of parameters:
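A minimal sketch of the framing step, assuming the samples arrive as a Python list. `split_into_frames()` is a hypothetical helper, and discarding a trailing partial frame is an assumption made here for simplicity.

```python
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 13
FRAME_MS = 20

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per 20 ms
BITS_PER_FRAME = SAMPLES_PER_FRAME * BITS_PER_SAMPLE     # 2080 bits per 20 ms

def split_into_frames(samples):
    """Hypothetical helper: chop a sample stream into complete 20 ms frames.
    Any trailing partial frame is dropped (an assumption for this sketch)."""
    n_frames = len(samples) // SAMPLES_PER_FRAME
    return [samples[i * SAMPLES_PER_FRAME:(i + 1) * SAMPLES_PER_FRAME]
            for i in range(n_frames)]
```

One second of speech (8000 samples) yields 50 frames of 160 samples each.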
· Short-term Linear Predictive Coding data (LPC)
· Long-term prediction data (LTP)
· Regular Pulse Excitation data (RPE)
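To give a feel for the short-term LPC part, the sketch below performs a simplified floating-point 8th-order linear predictive analysis of one 160-sample block using autocorrelation and the Levinson-Durbin recursion. This is an illustration of the general technique, not the bit-exact GSM 06.10 procedure, which works in fixed point with a Schur recursion and further transmits the eight reflection coefficients as log-area ratios. The function names and the test signal (a 440 Hz tone plus noise) are hypothetical; the LTP and RPE stages are not shown.

```python
import math
import random

LPC_ORDER = 8  # 8th-order short-term predictor

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame (no windowing)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find a[1..order] for the predictor x_hat[n] = sum(a[j] * x[n - j]).

    Returns (a, k, err): predictor coefficients, reflection coefficients
    and the final prediction-error energy.
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    k = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation not yet explained by the order-(i-1) predictor
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k_i = acc / err
        k.append(k_i)
        new_a = a[:]
        new_a[i] = k_i
        for j in range(1, i):
            new_a[j] = a[j] - k_i * a[i - j]
        a = new_a
        err *= 1.0 - k_i * k_i   # error energy shrinks at each order
    return a[1:], k, err

# Hypothetical 20 ms test block: 440 Hz tone plus a little noise
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 440 * n / 8000) + 0.1 * rng.gauss(0, 1)
         for n in range(160)]
r = autocorrelation(frame, LPC_ORDER)
a, k, err = levinson_durbin(r, LPC_ORDER)
```

For a non-zero block the reflection coefficients all lie strictly between -1 and +1, which is what makes them (and log-area ratios derived from them) convenient to quantise.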

The short-term (LPC) and long-term (LTP) prediction parameters, which carry frequency and
amplitude information about the speech, are each encoded in a 36-bit block, while the RPE data
is encoded in a 188-bit block, primarily to ensure that the characteristic tone of the voice is
reproduced well. Together these make up 36 + 36 + 188 = 260 bits per 20 ms block.
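The bit counts can be tallied as below. The split of each field into individual parameters is summarised from the GSM 06.10 full-rate parameter list: eight log-area ratios for the LPC data, a 7-bit lag and a 2-bit gain per 5 ms subframe for the LTP data, and a grid position, a block maximum and thirteen 3-bit pulses per subframe for the RPE data. Treat it as an illustration of the arithmetic rather than a normative table.

```python
LAR_BITS = [6, 6, 5, 5, 4, 4, 3, 3]  # eight log-area ratios (short-term LPC)
SUBFRAMES = 4                         # each 20 ms block = 4 subframes of 5 ms (40 samples)

LTP_BITS_PER_SUBFRAME = 7 + 2            # lag (40..120) + gain
RPE_BITS_PER_SUBFRAME = 2 + 6 + 13 * 3   # grid position + block maximum + 13 pulses x 3 bits

allocation = {
    "LPC": sum(LAR_BITS),                       # 36 bits
    "LTP": SUBFRAMES * LTP_BITS_PER_SUBFRAME,   # 36 bits
    "RPE": SUBFRAMES * RPE_BITS_PER_SUBFRAME,   # 188 bits
}
total_bits = sum(allocation.values())           # 260 bits per 20 ms block
```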
Therefore, for every 20 ms, 2080-bit data block applied to the vocoder, a 260-bit output
block is produced for the same 20 ms. This corresponds to a data rate of 260 bits / 20 ms =
13 kbps, which is suitable for the bandwidth available on the air interface, and to a
compression ratio of 2080 / 260 = 8:1 (104 kbps down to 13 kbps), achieved without
significant degradation to the voice quality.
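The output rate and compression ratio follow directly from the block sizes. A short sketch using integer milliseconds, so the rates come out exact:

```python
RAW_BITS_PER_FRAME = 2080    # 160 samples x 13 bits
CODED_BITS_PER_FRAME = 260   # LPC + LTP + RPE
FRAME_MS = 20

raw_rate_bps = RAW_BITS_PER_FRAME * 1000 // FRAME_MS       # 104000 bit/s
coded_rate_bps = CODED_BITS_PER_FRAME * 1000 // FRAME_MS   # 13000 bit/s
compression_ratio = RAW_BITS_PER_FRAME / CODED_BITS_PER_FRAME  # 8.0
```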

