The main purpose of this work was to investigate the possibility of detecting respiratory diseases in audio recordings of lung auscultation using modern deep learning tools, and to explore data augmentation based on generating synthetic spectral representations of the audio samples. The ICBHI (International Conference on Biomedical and Health Informatics) dataset was used for training, validation and augmentation. It contains lung auscultations of 126 subjects, 920 recordings in total, of which 810 show signs of chronic diseases, 75 of non-chronic diseases and 35 have no pathology. The preprocessing stage includes resampling to a 4 kHz sampling rate and filtering out frequency bands that carry no informative value for the task. Next, each sample was transformed into the frequency domain and mel-spectrograms were generated. To address class imbalance, the required number of synthetic spectrograms generated by convolutional variational autoencoders was added to the minority classes. The classification model was built using classical convolutional neural networks. The quality of the resulting algorithm was evaluated with 10-fold cross-validation. In addition, to assess the generalization of the proposed method, experiments were performed in which the audio recordings were split into training and test sets with patient-level grouping. Model performance was measured using sensitivity, specificity, F1-score and Cohen's kappa. An F1-score of 98.45% was achieved for the 5-class classification problem, which can contribute to the development of methods for synthesizing and augmenting sensitive medical data. At the same time, shortcomings of existing methods in generalizing the obtained predictions were revealed, which opens the way for further research towards clinical detection of respiratory diseases.
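
The preprocessing pipeline summarized above (resampling to 4 kHz, filtering of uninformative frequency bands, mel-spectrogram extraction) could look roughly as follows. This is a minimal sketch in Python using librosa and SciPy; the band-pass cut-off frequencies and mel parameters are illustrative assumptions, not values reported in the paper.

```python
# Sketch of the preprocessing stage: resample to 4 kHz, band-pass filter,
# and extract a log-scaled mel-spectrogram. Cut-offs and mel parameters
# below are assumptions for illustration only.
import librosa
import numpy as np
from scipy.signal import butter, sosfiltfilt

TARGET_SR = 4000               # 4 kHz sampling rate, as stated in the abstract
LOW_CUT, HIGH_CUT = 100, 1800  # assumed informative band for lung sounds (Hz)

def preprocess(path: str) -> np.ndarray:
    """Load a recording, resample it to 4 kHz and band-pass filter it."""
    audio, _ = librosa.load(path, sr=TARGET_SR, mono=True)
    sos = butter(4, [LOW_CUT, HIGH_CUT], btype="bandpass",
                 fs=TARGET_SR, output="sos")
    return sosfiltfilt(sos, audio)

def mel_spectrogram(audio: np.ndarray) -> np.ndarray:
    """Compute a log-scaled mel-spectrogram (parameters are assumptions)."""
    mel = librosa.feature.melspectrogram(
        y=audio, sr=TARGET_SR, n_fft=512, hop_length=128, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)
```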
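
The augmentation step relies on a convolutional variational autoencoder trained on mel-spectrograms, from which synthetic samples for the minority classes are drawn. The sketch below shows one possible PyTorch formulation; the input size (1×64×64), latent dimension and layer configuration are assumptions rather than the authors' architecture, and the spectrograms are assumed to be normalized to [0, 1] so that a binary cross-entropy reconstruction term applies.

```python
# Minimal convolutional VAE sketch for generating synthetic mel-spectrograms.
# Architecture details are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        # Encoder: 1x64x64 spectrogram -> latent mean and log-variance
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 64x16x16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        # Decoder: latent vector -> reconstructed 1x64x64 spectrogram
        self.fc_dec = nn.Linear(latent_dim, 64 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),     # -> 32x32x32
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),   # -> 1x64x64
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        recon = self.decoder(self.fc_dec(z).view(-1, 64, 16, 16))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction term plus KL divergence to the standard normal prior."""
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

# After training, synthetic minority-class spectrograms are sampled from the prior:
# z = torch.randn(n_samples, 32)
# fake = model.decoder(model.fc_dec(z).view(-1, 64, 16, 16))
```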
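
The two evaluation protocols, recording-level 10-fold cross-validation and a patient-grouped train/test split, can be expressed with scikit-learn utilities as sketched below. The function `train_and_predict` is a hypothetical placeholder for fitting the CNN and returning its test-set predictions; specificity would additionally be derived per class from the confusion matrix, which is omitted here for brevity.

```python
# Sketch of the evaluation protocols: recording-level 10-fold CV and a
# patient-grouped split that keeps all recordings of a subject on one side
# of the train/test boundary. Names below are placeholders, not the paper's code.
from sklearn.model_selection import StratifiedKFold, GroupShuffleSplit
from sklearn.metrics import f1_score, cohen_kappa_score, recall_score

def evaluate(y_true, y_pred):
    """Macro-averaged sensitivity (recall), F1-score and Cohen's kappa."""
    return {
        "sensitivity": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }

# X: spectrograms, y: 5-class labels, patient_ids: subject identifier per recording
def recording_level_cv(X, y, train_and_predict, n_splits=10):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        y_pred = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(evaluate(y[test_idx], y_pred))
    return scores

def patient_grouped_split(X, y, patient_ids, train_and_predict, test_size=0.2):
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=0)
    train_idx, test_idx = next(gss.split(X, y, groups=patient_ids))
    y_pred = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
    return evaluate(y[test_idx], y_pred)
```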