In this work, we present an alternative model for Arabic speech recognition that boosts the classification accuracy of basic speech units in the Industrial Internet of Things. The approach integrates both top-down and bottom-up knowledge into the automatic speech attribute transcription (ASAT) framework. ASAT is a family of lattice-based speech recognition systems grounded in the accurate detection of speech attributes. Two state-of-the-art deep neural networks were used for speech attribute detection: a convolutional neural network (CNN) and a feed-forward deep neural network pretrained with a stack of restricted Boltzmann machines (DBN-DNN). These attribute detectors are trained in a data-driven fashion on Arabic speech data transcribed with the Buckwalter scheme. The detectors identify phonologically distinctive articulatory features of Arabic, grouped by manner and place of articulation, thereby extracting characteristics of the human speech production process from the speech dataset. Results are reported in terms of average equal error rate (AEER) and show that the CNN-based attribute model consistently outperforms the DBN-DNN model. Compared with the DBN-DNN detectors, the CNN detectors reduce the AEER by 10.42% (relative) for manner of articulation and by 13.11% (relative) for place of articulation.
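As a point of reference for the evaluation metric, the sketch below computes the equal error rate (EER) for a single attribute detector from its scores on target (attribute-present) and non-target frames; averaging this quantity over all attribute classes gives the AEER reported above. This is the standard definition of EER, not the authors' evaluation code, and the scores here are synthetic.

```python
import numpy as np

def equal_error_rate(pos_scores, neg_scores):
    """Approximate the EER: the operating point where the false-rejection
    rate on target frames equals the false-acceptance rate on non-target
    frames. Scans all observed scores as candidate thresholds."""
    thresholds = np.sort(np.concatenate([pos_scores, neg_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(pos_scores < t)   # targets wrongly rejected
        far = np.mean(neg_scores >= t)  # non-targets wrongly accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Synthetic detector scores for one attribute class (illustrative only)
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.5, 1000)   # attribute present
neg = rng.normal(-1.0, 0.5, 1000)  # attribute absent
print(f"EER = {equal_error_rate(pos, neg):.3f}")
```

A relative AEER improvement such as the 10.42% quoted above is then simply `(aeer_baseline - aeer_cnn) / aeer_baseline`, averaged over the manner-of-articulation attribute classes.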