People communicate mainly through speech and facial expressions, which reflect their feelings and thoughts; this reflection is an important channel for empathy in human interaction. Today, human emotions can be recognized automatically with the help of artificial intelligence systems. Automatic emotion recognition can increase productivity across human-computer interaction applications, including virtual reality, psychology, and behavior modeling. In this study, we propose a method for improving the accuracy of emotion recognition from speech data. In this method, new features are extracted by convolutional neural networks from the MFCC coefficient matrices of the speech recordings in the Crema-D dataset. Binary particle swarm optimization is then applied to the extracted features to select those most relevant to speech emotion classification, which increases accuracy and reduces the 64 features per recording to 33. In the test results, accuracies of 62.86% with CNN, 63.93% with SVM, and 66.01% with CNN+BPSO+SVM were obtained.
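The feature-selection stage described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: synthetic 64-dimensional feature vectors stand in for the CNN-derived MFCC features, a linear SVM's cross-validated accuracy serves as the BPSO fitness function, and all swarm hyperparameters (swarm size, inertia, acceleration coefficients) are assumed values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for CNN-derived features: 300 samples, 64 features,
# 6 emotion classes (Crema-D distinguishes 6 emotions). Only the first
# 20 features carry class information; the rest are noise.
n, d, k = 300, 64, 6
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, d))
X[:, :20] += y[:, None] * 0.8  # informative features

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"),
                           X[:, mask.astype(bool)], y, cv=3).mean()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Binary PSO: positions are 0/1 feature masks, velocities are real-valued
# and squashed through a sigmoid to give bit-flip probabilities.
swarm, iters, w, c1, c2 = 12, 15, 0.7, 1.5, 1.5  # assumed hyperparameters
pos = rng.integers(0, 2, size=(swarm, d)).astype(float)
vel = rng.normal(scale=0.1, size=(swarm, d))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()
gbest_fit = pbest_fit.max()

for _ in range(iters):
    r1, r2 = rng.random((swarm, d)), rng.random((swarm, d))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = (rng.random((swarm, d)) < sigmoid(vel)).astype(float)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    if fit.max() > gbest_fit:
        gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()

print(f"selected {int(gbest.sum())}/{d} features, CV accuracy {gbest_fit:.3f}")
```

On real data the fitness evaluation would run the same SVM over the CNN feature vectors of the Crema-D recordings; the sigmoid-based position update is the standard binary-PSO variant that keeps the mask discrete while the velocity stays continuous.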