Abstract

In the era of ubiquitous technology, virtual assistants are all around us and have changed the way we interact with our devices. To better understand their inner workings, we developed a deep convolutional neural network (DCNN) that mimics the audio classification stage of a virtual assistant: the network is trained on mel spectrograms to distinguish a wake word from other audio. Our hypothesis was that mel spectrograms of wake-word and non-wake-word recordings can be used to classify audio accurately. Of the 85 files in our dataset, 58 were used for training and validation and the remaining 27 for testing. On the test set, the model achieved a precision, recall, and accuracy of 1, i.e., 100% accuracy in distinguishing wake words from non-wake words.
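The mel spectrograms underlying this approach map linear frequency onto the perceptual mel scale before computing a spectrogram. As a minimal illustration (the standard O'Shaughnessy mel formula, not the authors' own preprocessing code), the Hz-to-mel conversion and its inverse can be sketched as:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz onto the mel scale (standard formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping: mel value back to frequency in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The scale is roughly linear below ~1 kHz and logarithmic above,
# so equal mel intervals better match perceived pitch differences.
print(hz_to_mel(1000.0))   # close to 1000 mel by construction
```

In practice, a bank of triangular filters spaced evenly on this scale is applied to short-time Fourier transform frames to produce the mel spectrogram images fed to the DCNN.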
