Techniques for multi-party conferencing are provided. A plurality of audio streams is received from a plurality of conference-enabled devices associated with a conference call. Each audio stream includes a corresponding encoded audio signal generated based on sound received at the corresponding conference-enabled device. Two or more of the audio streams are selected based upon an audio characteristic (e.g., a loudness of a person speaking). The selected audio streams are transmitted to each conference-enabled device associated with the conference call. At each conference-enabled device, the selected audio streams are decoded into a plurality of decoded audio streams, the decoded audio streams are combined into a combined audio signal, and the combined audio signal is played through one or more loudspeakers for a user to hear.
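
The select-then-mix flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes each stream has already been decoded into a frame of floating-point samples in [-1.0, 1.0], uses root-mean-square level as a stand-in for the loudness characteristic, and mixes by summing and clipping. The function names `rms_loudness`, `select_streams`, and `mix` are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def rms_loudness(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame (assumed loudness measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_streams(loudness_by_device: Dict[str, float], count: int = 2) -> List[str]:
    """Return the IDs of the `count` loudest streams, loudest first."""
    ranked = sorted(loudness_by_device, key=loudness_by_device.get, reverse=True)
    return ranked[:count]


def mix(decoded: Sequence[Sequence[float]]) -> List[float]:
    """Sum decoded frames sample by sample and clip to [-1.0, 1.0].

    Frames are truncated to the shortest one (an assumption of this sketch).
    """
    if not decoded:
        return []
    length = min(len(frame) for frame in decoded)
    return [
        max(-1.0, min(1.0, sum(frame[i] for frame in decoded)))
        for i in range(length)
    ]
```

In this sketch the selecting side would call `select_streams` on per-device loudness values and forward only the chosen streams, while each receiving device would decode them and pass the decoded frames to `mix` before playback. How loudness is actually obtained from the encoded streams is left open here.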