Abstract

Spoken language processing is an important capability of human intelligence that has so far gone unexplored in cognitive architectures. This gap reflects both the combined symbolic and sub-symbolic nature of the speech problem and the capabilities that cognitive architectures provide for modeling the sub-symbolic side and its rich interplay with the symbolic side. Sigma has been designed to leverage the state-of-the-art hybrid (discrete + continuous) mixed (symbolic + probabilistic) capability of graphical models to provide, in a uniform, non-modular fashion, effective forms of both cognitive and sub-cognitive behavior and integration across them. This article extends previous work on speaker-dependent isolated-word recognition to demonstrate that Sigma can feasibly process a stream of fluent audio and recognize phones online and incrementally, with speaker independence. Phone recognition is an important step toward integrating spoken language processing into Sigma. To support speaker independence, this work also extends the acoustic front-end used in the previous work. All of the knowledge used in phone recognition was added supraarchitecturally (i.e., on top of the architecture), without requiring the addition of new mechanisms to the architecture.