Systems based on deep neural networks (DNNs) have shown promising results in automatic speech recognition (ASR), where one of the biggest challenges is recognizing speech in noisy signals. We combine two well-known deep learning architectures: a convolutional neural network (CNN) for acoustic modeling and a recurrent architecture with connectionist temporal classification (CTC) for sequence modeling, so that the frames of a sequence can be decoded into a word. Experimental results on the BioChaves database show that the proposed architecture achieves improved performance over classical models, such as hidden Markov models (HMMs), when labeling variable-length time sequences.
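
To illustrate how frame-level CTC outputs can be turned into a word, the following minimal Python sketch applies greedy (best-path) decoding: it takes the most likely label per frame, merges consecutive repeats, and removes blanks. The alphabet, the blank symbol `_`, the function name `ctc_greedy_decode`, and the scores are hypothetical and are not taken from the experiments; the decoder actually used with the proposed architecture may differ.

```python
from itertools import groupby

# Hypothetical label set; index 0 is the CTC blank symbol.
ALPHABET = ["_", "c", "a", "t"]
BLANK = "_"


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank=BLANK):
    """Decode per-frame scores into a label string by best-path CTC decoding."""
    # 1. Pick the highest-scoring label for every frame.
    best_path = [alphabet[scores.index(max(scores))] for scores in frame_scores]
    # 2. Merge consecutive repeats of the same label.
    merged = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks, which only separate labels.
    return "".join(label for label in merged if label != blank)


# Made-up scores for seven frames (columns follow ALPHABET order).
frames = [
    [0.10, 0.80, 0.05, 0.05],  # c
    [0.20, 0.70, 0.05, 0.05],  # c
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.60, 0.20],  # a
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.05, 0.05, 0.80],  # t
]
word = ctc_greedy_decode(frames)
print(word)
```

Because blanks are removed only after repeats are merged, a blank between two identical labels keeps them distinct, which is how CTC can emit words with doubled letters.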