We read with great interest the article by Westen and Weinberger (October 2004) exploring the benefits and limitations of clinical observation and judgment. They make a persuasive case for using statistical methods to aggregate clinicians’ observations in clinical and personality research. Just as increasing the number of items on an assessment instrument can improve reliability, statistically aggregating the judgments of multiple observers is likely to yield a judgment more reliable than that of a single observer. The advantages of pooling multiple observations, however, are not limited to research in clinical and personality psychology. Westen and Weinberger identify two categories of informants—clinicians and participants—but these categories could be expanded to include other observers who might have particular expertise or experience related to the phenomenon of interest. The type of expert best suited to provide observations depends on the type of expertise required. For reports about subjective psychological or physical experience, the individual him- or herself is likely to be most expert. For assessment of clinically observable phenomena, clinicians may indeed bring value-added expertise. There are some domains, however, in which those with the greatest expertise are neither specially trained observers nor self-reporters but, rather, lay observers who have a native or learned ability to detect complicated social or psychological phenomena and make subtle discriminations. This type of expertise is often thought of as intuitive because it uses implicit knowledge that is not always accessible to conscious awareness or capable of being fully articulated. One way to harness this intuitive expertise effectively is to pool the judgments of multiple lay observers. Using lay expertise in observational research runs counter to the prevailing belief that manualized coding systems are required to obtain valid, reliable, and reproducible results.
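The reliability gain from pooling observers follows the same logic as the Spearman–Brown prophecy for test items, and it can be sketched in a brief simulation. Every quantity below (the noise level, the number of raters and targets, the variable names) is an illustrative assumption, not a figure from any study cited here; the sketch simply shows that a composite of several noisy judges tracks a latent "true" score better than any single judge.

```python
# Illustrative simulation (assumed values throughout): each rater's judgment is
# modeled as a latent true score plus independent idiosyncratic error, and the
# composite is the simple mean across raters.
import random
import statistics

random.seed(42)

N_TARGETS = 500  # hypothetical behavior segments being rated
N_RATERS = 6     # hypothetical number of lay observers pooled


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Latent true scores, then each rater = truth + independent error (SD = 1).
true_scores = [random.gauss(0, 1) for _ in range(N_TARGETS)]
raters = [[t + random.gauss(0, 1) for t in true_scores] for _ in range(N_RATERS)]

# Validity of one rater vs. validity of the 6-rater mean composite.
single = pearson(raters[0], true_scores)
pooled = [statistics.fmean(r[i] for r in raters) for i in range(N_TARGETS)]
composite = pearson(pooled, true_scores)

print(f"single-rater validity: {single:.2f}")
print(f"{N_RATERS}-rater composite:    {composite:.2f}")
```

Because the raters' errors are independent, averaging cancels much of the idiosyncratic noise, so the composite's correlation with the true scores exceeds that of any individual rater.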
To be sure, manualized approaches have contributed greatly to research in psychology. However, some phenomena do not lend themselves easily to codification and rule-driven observation. For example, emotions, power dynamics, validation, and empathy have all been shown to be of great relevance to interpersonal functioning, but these are challenging to operationalize. Yet they are phenomena that individuals routinely evaluate in their everyday interactions with others. When using manual-based systems, coders may find that the rules and procedures designed to facilitate rating of these constructs do not fully utilize—and may even interfere with—the natural decoding expertise that they have developed. Needing, for example, to process all of the information in a manual and simultaneously attend to the behavioral segment being coded may overload the cognitive capacities of many individuals and interfere with their ability to utilize important cues. Consider the example of assessing emotion. Because it is socially adaptive to do so, most human beings develop a sophisticated and almost instantaneous ability to recognize others’ emotions. People use complex internal algorithms to synthesize disparate cues about emotion from body language, facial expression, vocal qualities, speech content, and social context. Of course, individual differences in this proficiency exist. Just as aggregating judgments across clinicians removes idiosyncratic biases and yields a more reliable composite judgment, aggregating the individual judgments of multiple untrained judges yields similar benefits. Our research has led us to believe that lay observers’ intuitive judgments about emotions may in fact capture important information that is lost when coders depend on more commonly used manualized approaches such as the Specific Affect Coding System (SPAFF; Gottman, McCoy, Coan, & Collier, 1996) and the Facial Action Coding System (Ekman & Friesen, 1978). 
In our laboratory, we experimented for a year with manualized approaches to coding emotion and then decided to explore the utility of what is often (though perhaps misleadingly) termed naive coding. Inspired by the work of Rosenthal and colleagues on coding nonverbal behaviors (e.g., Ambady & Rosenthal, 1993), we provided untrained, unmarried college-age individuals with a list of 16 emotion descriptors without instructing them on how to interpret these descriptors or on markers of specific emotions. We asked them to watch videotapes of 10-min couple interactions and to code the emotional expression of each partner. For each 30-s segment of the interaction, coders rated the intensity with which each of the 16 emotions was displayed. Although each rater coded in slightly idiosyncratic ways, we obtained good reliabilities on factor-analytically derived scales by combining the ratings of 5–6 coders. The aggregated ratings of these “naive” coders were linked in expected ways with concurrent marital satisfaction and independent assessments of marital adjustment. More important, they predicted with more than 80% accuracy a real-world consequence of great significance: whether couples in committed relationships would remain together after 5 years (Waldinger, Schulz, Hauser, Allen, & Crowell, 2004). When we compared emotion ratings of couple interactions using this approach with ratings of the same interactions by experienced coders using the manualized SPAFF, we found a high degree of correlation between the two sets of ratings (Waldinger et al., 2004). We also