A compressed digital speech signal is encoded to provide a transmission signal that is resistant to transmission errors. The compressed speech signal is derived from a digital speech signal by performing a pitch search on a block obtained by dividing the speech signal in time, which provides pitch information for the block. The block of the speech signal is orthogonally transformed to provide spectral data, which is divided by frequency into plural bands in response to the pitch information. Voiced/unvoiced sound discrimination generates voiced/unvoiced (V/UV) information indicating whether the spectral data in each of the plural bands represents a voiced or an unvoiced sound. The spectral data in the plural bands are interpolated to provide spectral amplitudes for a predetermined number of bands, independent of the pitch. Hierarchical vector quantizing is applied to the spectral amplitudes to generate upper-layer indices, representing an overview of the spectral amplitudes, and lower-layer indices, representing details of the spectral amplitudes. CRC error detection coding is applied to the upper-layer indices, the pitch information, and the V/UV information to generate CRC codes. Convolution coding for error correction is applied to the upper-layer indices, the higher-order bits of the lower-layer indices, the pitch information, the V/UV information, and the CRC codes. The convolution-coded quantities from two blocks of the speech signal are then interleaved in a frame of the transmission signal, together with the lower-order bits of the respective lower-layer indices.
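
The error-protection stage described above (CRC coding, convolution coding, then interleaving of two blocks) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the 3-bit CRC generator polynomial, the rate-1/2 convolutional code with constraint length 3, the simple alternating interleave, the bit widths, and the function names `crc_bits`, `conv_encode`, `protect_block` and `build_frame`.

```python
from typing import List, Sequence

Bits = List[int]


def crc_bits(data: Sequence[int], poly: Sequence[int]) -> Bits:
    """Remainder of data * x^r divided by poly over GF(2).

    poly lists the generator coefficients, leading 1 included, so r = len(poly) - 1.
    """
    r = len(poly) - 1
    reg = list(data) + [0] * r
    for i in range(len(data)):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-r:]


def conv_encode(bits: Sequence[int], gens=(0b111, 0b101), k: int = 3) -> Bits:
    """Zero-terminated convolutional encoder, one output bit per generator.

    The generators (7, 5 octal) and constraint length k=3 are assumed values.
    """
    state = 0
    out: Bits = []
    for b in list(bits) + [0] * (k - 1):  # k-1 tail bits flush the register
        state = ((state << 1) | b) & ((1 << k) - 1)
        for g in gens:
            out.append(bin(state & g).count("1") & 1)  # parity of tapped bits
    return out


def protect_block(upper: Bits, lower_hi: Bits, pitch: Bits, vuv: Bits,
                  crc_poly: Sequence[int] = (1, 0, 1, 1)) -> Bits:
    """CRC over upper-layer indices, pitch and V/UV, then convolution coding."""
    crc = crc_bits(upper + pitch + vuv, crc_poly)  # assumed polynomial x^3 + x + 1
    return conv_encode(upper + lower_hi + pitch + vuv + crc)


def build_frame(coded_a: Bits, coded_b: Bits,
                lower_lo_a: Bits, lower_lo_b: Bits) -> Bits:
    """Interleave the coded bits of two blocks, then append the uncoded low-order bits."""
    assert len(coded_a) == len(coded_b)
    interleaved = [bit for pair in zip(coded_a, coded_b) for bit in pair]
    return interleaved + lower_lo_a + lower_lo_b


if __name__ == "__main__":
    # Hypothetical bit widths: 4 upper-layer, 2 lower-layer high, 3 pitch, 2 V/UV.
    block_a = protect_block([1, 0, 1, 1], [0, 1], [1, 1, 0], [1, 0])
    block_b = protect_block([0, 1, 1, 0], [1, 1], [0, 0, 1], [0, 1])
    frame = build_frame(block_a, block_b, [1, 0, 1], [0, 0, 1])
    print(len(frame), frame)
```

In this sketch the lower-order bits of the lower-layer indices bypass both the CRC and the convolutional encoder, mirroring the unequal protection the abstract describes: the bits whose corruption would be most audible receive error detection and correction, while the detail bits are sent unprotected.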