Wallace and colleagues’1 article on tagging audio-recorded patient-physician interactions is intriguing because they bring state-of-the-art machine-learning methods to a relatively prosaic problem with potentially great implications for medical decision-making research and clinical decision support. Understanding the patient-physician interaction must be a first step toward supporting shared decision making between these two participants, whether we model it formally, with decision-analytic models based on maximizing expected utility, or whether we attend to the psychological aspects of that interaction. The physician’s opinion probably matters most to patients, but how that opinion is communicated, or how it could be modulated through a decision-making intervention, depends on the type of empirical research conducted by Wallace and colleagues.

The prosaic problem they address is straightforward. The best research on the patient-physician interaction requires video recording and tagging of the video at multiple levels; second best is audio recording. As the authors point out, the dimensions for tagging were first articulated by Roter in her Roter Interaction Analysis System (RIAS) more than 35 years ago, with coding dimensions that include gathering data, education and counseling, building a relationship, activating, and partnership building. Wallace and colleagues’ General Medical Interaction Analysis System (GMIAS) aims to incorporate greater domain knowledge (e.g., the domain of HIV treatment) to enable the computer to perform automated tagging. As a student might say, “Wouldn’t it be great if we could get the computer to do the tedious job of coding?” This is the task that Wallace and colleagues set for themselves.

Their effort begins after the recording has been transcribed and divided into “utterances,” sequences of words spoken by one participant (physician or patient) before the speakers switch. Thus, issues of tone, pauses, and timing are not represented. Further, they restrict the coding to the topic level, that is, the content of the utterance rather than its “speech act,” the interpersonal component. Their topic scheme is still broad, including tags for biomedical, psychosocial, logistics, socializing, antiretroviral (HIV) treatment, and missing/other.

The heart of their contribution is the algorithm that “reads” the utterances and applies these topic tags. The algorithm is based on conditional random fields (CRFs), a type of probabilistic graphical model (PGM). PGMs have a long history, with either path analysis in the 1920s or hidden Markov models from 1966 as progenitors. The Society was involved in some of these developments in the 1980s, when work on uncertainty in the artificial intelligence community overlapped with work in the medical decision-making community because of the common interest in making high-stakes decisions under uncertainty. The key product of that ferment was the Bayesian (belief) network, because the posteriors calculated from those models could be fed into expected-value decision-analytic models, as sketched below. Pearl’s book on these models was influential in giving many researchers a concrete basis for extending this work.
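To make that handoff concrete, here is a minimal sketch in Python, assuming hypothetical probabilities and utilities that appear nowhere in the commentary: a posterior computed by Bayes’ rule, standing in for the output of a Bayesian network, feeds an expected-utility comparison between two actions.

```python
# Minimal sketch: a posterior (here via Bayes' rule, standing in for a
# Bayesian network's output) feeds an expected-value decision analysis.
# All probabilities and utilities below are hypothetical.

def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Hypothetical utilities for each (action, disease-state) pair,
# e.g., quality-adjusted life expectancy on a 0-1 scale.
U = {
    ("treat", True): 0.85, ("treat", False): 0.95,
    ("wait", True): 0.60, ("wait", False): 1.00,
}

p_disease = posterior(prior=0.10, sensitivity=0.90, specificity=0.80)

for action in ("treat", "wait"):
    eu = p_disease * U[(action, True)] + (1.0 - p_disease) * U[(action, False)]
    print(f"{action}: expected utility = {eu:.3f}")
```

With these illustrative numbers the posterior is about 0.33 and treating has the higher expected utility; the point is only the pipeline from posterior to choice, not the particular values.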
But PGMs proved important in many other domains beyond medical diagnosis and decision making, such as computer vision, spatial analysis, bioinformatics, and natural language processing (NLP). Bioinformatics and NLP have many affinities, since so many bioinformatics problems involve sequences of nucleotides or high-order motifs, and NLP involves sequences of phonemes, words, or utterances; a sketch of CRF tagging over such an utterance sequence follows.
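As an illustration of the kind of sequence labeling described above, here is a minimal sketch using the open-source sklearn-crfsuite package. This is not GMIAS itself; the dialogue, features, and training setup are hypothetical stand-ins for a real annotated corpus, and only the topic scheme echoes the one in the article.

```python
# Minimal sketch of linear-chain CRF topic tagging over utterances,
# in the spirit of (not reproducing) GMIAS. The dialogue, features,
# and training setup are hypothetical; a real corpus would contain
# many annotated dialogues.
import sklearn_crfsuite

def utterance_features(dialogue, i):
    """Bag-of-words plus speaker features for the i-th utterance."""
    speaker, text = dialogue[i]
    feats = {"speaker": speaker, "first_utterance": i == 0}
    for word in text.lower().split():
        feats[f"word:{word}"] = True
    return feats

def featurize(dialogue):
    return [utterance_features(dialogue, i) for i in range(len(dialogue))]

# One toy dialogue: (speaker, utterance) pairs with gold topic tags
# drawn from the article's scheme.
dialogue = [
    ("MD", "how have you been feeling since your last visit"),
    ("PT", "tired and a little down to be honest"),
    ("MD", "are you still taking the antiretroviral pills every day"),
    ("PT", "yes every morning with breakfast"),
    ("MD", "let us schedule your labs for next week"),
]
tags = ["biomedical", "psychosocial", "antiretroviral",
        "antiretroviral", "logistics"]

# Learned transition weights between adjacent tags are what make this a
# CRF rather than an independent per-utterance classifier.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=50)
crf.fit([featurize(dialogue)], [tags])
print(crf.predict([featurize(dialogue)])[0])
```

The transition structure is the substantive design choice here: a psychosocial utterance is more likely to follow another psychosocial utterance than to interrupt a run of logistics talk, and a linear-chain CRF can learn exactly that regularity, which an utterance-by-utterance classifier cannot.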