Homer Dudley's VODER is considered the first attempt to synthesize human speech electronically by breaking it down into its acoustic components. Roughly fifty years later, Terminator 2 featured speech synthesized by an artificial intelligence and used to deceive a human.

Speech synthesis is the artificial production of human speech by a computer or other device. It is the counterpart of speech recognition, and its main use is converting text into audio so that people can interact naturally with digital devices. For example, assistive technology uses it to read textual content aloud to visually impaired people.

A separate application is using speech synthesis to clone a specific person's voice. Deepfake voice technology, also called voice cloning, has advanced to the point where it can accurately reproduce a human voice by mimicking the speaker's intonation and other vocal features. This makes it a tool that can be used to harm people. Attackers can use it to fool voice authentication systems, to create fake audio recordings that defame public figures, or to combine a cloned voice with social engineering techniques to deceive their targets. This article discusses the architecture of a voice recognition system designed to significantly reduce the risk of fraud using deepfake voices.