Creating instrumental accompaniment for the vocals in a song depends on the desired mood and the composer's creativity. Models proposed by other researchers are restricted to Musical Instrument Digital Interface (MIDI) files and rely on recurrent neural networks (RNNs) or Transformers for the recursive generation of musical notes. This research presents what we believe to be the first model capable of automatically generating instrumental accompaniment directly from human vocal sound. Our model handles three types of sound input: short input, combed input, and frequency-domain input based on the discrete cosine transform (DCT). By combining sequential models, namely an autoencoder and gated recurrent unit (GRU) networks, we evaluate the resulting models in terms of loss and creativity. The best model achieved an average loss of 0.02993620155. In listening tests, the generated output in the 0-1,600 Hz frequency range was clearly audible, and the tones were fairly harmonious. The model has the potential for further development in future research on sound processing.
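The abstract describes combining an autoencoder with a GRU over DCT-domain audio input. The sketch below is not the authors' published implementation; it is a minimal illustration of how such a pipeline could be wired up in Keras, assuming vocal audio is split into fixed-length frames, moved to the DCT domain, and mapped to instrumental DCT frames. The frame length, sequence length, bottleneck size, and the helper functions `to_dct_frames`/`from_dct_frames` are all illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the authors' code):
# DCT-transformed vocal frames pass through a dense autoencoder bottleneck,
# a GRU models the temporal context, and a dense decoder emits the
# accompanying instrument frames in the DCT domain.
from scipy.fft import dct, idct
from tensorflow.keras import layers, models

FRAME = 1024    # samples per frame (assumed)
SEQ_LEN = 32    # frames per training sequence (assumed)
LATENT = 64     # bottleneck size (assumed)

def to_dct_frames(audio, frame=FRAME):
    """Split a mono signal into frames and move each frame to the DCT domain."""
    n = len(audio) // frame
    frames = audio[: n * frame].reshape(n, frame)
    return dct(frames, type=2, norm="ortho", axis=-1)

def from_dct_frames(frames):
    """Invert the DCT and flatten back to a waveform."""
    return idct(frames, type=2, norm="ortho", axis=-1).reshape(-1)

# Autoencoder-style encoder/decoder wrapped around a GRU sequence model.
inputs = layers.Input(shape=(SEQ_LEN, FRAME))                                # vocal DCT frames
encoded = layers.TimeDistributed(layers.Dense(LATENT, activation="tanh"))(inputs)
context = layers.GRU(LATENT, return_sequences=True)(encoded)
outputs = layers.TimeDistributed(layers.Dense(FRAME))(context)               # instrument DCT frames

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()

# Training would pair vocal sequences with aligned instrumental targets, e.g.:
# model.fit(vocal_dct_sequences, instrument_dct_sequences, epochs=..., batch_size=...)
```

Working in the DCT domain keeps each frame as a fixed-length real-valued vector, which is convenient for a mean-squared-error loss of the kind reported in the abstract; the specific layer sizes and loss choice here are assumptions for illustration only.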