General practice is the point of first contact with the health care system in many countries, whether or not general practitioners (GPs) have a ‘gatekeeper’ role. It is characterized by early disease (and consequently non-specific and atypical presentations), a wide range of potential diagnoses, co-morbid illness (associated with an increase in chronic conditions treated in primary care) and a low prevalence of serious morbidity. These characteristics underline the diagnostic difficulties and conflicting demands that GPs face. Often acting as gatekeepers of the health care system, GPs are under pressure to reduce ‘unnecessary’ investigations and referrals. At the same time, they are required to detect serious conditions early on, but are not always successful. Missed malignancies and myocardial infarctions account for most claims of diagnostic error made against GPs (diagnostic error being the commonest cause of litigation against GPs in both the UK and the USA). Failure to refer appropriately is a second major contributory factor in many successful claims against GPs and may well be influenced by diagnosis. Nevertheless, diagnostic error in primary care is under-researched, possibly because of the difficulties in measuring its true rate and impact and the traditional view that ‘GPs don’t diagnose, they manage’.
From a probability theory perspective, diagnosis is a process of revising the prior probability of a particular condition in the light of new evidence (Bayes’ Rule). The new, updated probability is called the posterior probability. In order to use quantitative information in diagnosis, it is necessary to know the diagnostic value of symptoms and signs (expressed as likelihood ratios) and the rules for combining likelihood ratios. Likelihood ratios are obtained from diagnostic cohort studies in which information on the presence or absence of these diagnostic cues is collected prospectively, independently of the investigations used to reach a final diagnosis.
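The revision of a prior probability by likelihood ratios can be illustrated with the odds form of Bayes' Rule: posterior odds = prior odds × likelihood ratio. The following is a minimal sketch only. The function names are hypothetical, the prior and likelihood ratio values are invented for illustration and are not taken from any study, and multiplying several likelihood ratios together assumes that the cues are conditionally independent, which is often not the case in practice.

```python
from typing import Iterable


def probability_to_odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Update a prior probability with one or more likelihood ratios.

    Assumption (for illustration only): the cues are conditionally
    independent, so their likelihood ratios can simply be multiplied.
    """
    odds = probability_to_odds(prior)
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr  # odds form of Bayes' Rule
    return odds_to_probability(odds)


if __name__ == "__main__":
    prior = 0.10  # illustrative prior probability, not from any study

    # One cue with an illustrative positive likelihood ratio of 4.
    print(round(posterior_probability(prior, [4.0]), 3))  # 0.308

    # Two cues (LR 4 and LR 2), treated as if they were independent.
    print(round(posterior_probability(prior, [4.0, 2.0]), 3))  # 0.471
```

With these invented figures, a single cue raises the probability from 10% to about 31%, and a second cue treated as independent raises it to about 47%. If the two cues are in fact correlated, the second figure overstates the evidence.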
Broekhuizen et al. have conducted a systematic review of diagnostic studies in primary care for the diagnosis of chronic obstructive pulmonary disease (COPD). However, the use of this information in real-life diagnosis is complicated by the fact that most diagnostic cues are not independent; therefore, naively combining the likelihood ratios of correlated cues would inflate the posterior probability of the disease. For example, cough, dyspnoea and wheeze all relate to the same underlying pathology and are not independent. There are two approaches to this problem of combining likelihood ratios. First, a regression method can be used to obtain a mathematical model that adjusts the final probability for cue correlation, as shown in three of the COPD studies. In clinical use, the model would be simplified by expressing the adjusted odds ratios as log coefficients, which are added together to form a score, or Clinical Prediction Rule. Unfortunately, clinical prediction rules do not transfer well from one patient population to another and are often complex to calculate, which limits their use in everyday practice. An alternative approach is to create a Bayesian network that allows for a matrix of cue correlations. Much more data are required to drive a Bayesian network than a clinical prediction rule, and such tools could only be deployed as part of a computerized decision support system.
A long-standing aim of eHealth technologies has been to increase the quality and quantity of data from routine health care and to use these data to provide better evidence for clinical practice. While it is possible to do this via a prospective cohort study, the lack of routine and reliable coding of diagnostic cues in clinical practice is a significant barrier. One European project that has addressed this issue is the Transition Project.
Since 1984, the Transition Project has been formally capturing presenting symptoms, based on episode-oriented epidemiology, and using the International Classification of Primary