Congratulations to Teng and Siegmund on a thorough treatment of the merits of multipoint affected relative pair analysis compared to single-marker analysis (Teng and Siegmund, 1998). The paper and other work by these authors provide an elegant mathematical framework for investigating problems in gene mapping using multiple marker loci. In the first part of this discussion, I will point out some of the practical issues involved in applying multipoint methods to real genetic diseases, in particular some of the problems multipoint mappers encounter. Although the potential benefits of multipoint analysis are substantial, it is helpful to remember that its use is not always straightforward. Research into useful methods for detecting and resolving these problems is a growing but underexplored area of methodology.
Most potential problems in practical applications concern the validity of the assumptions being made. The most important of these probably concern the genetic map, the marker allele frequencies, and the absence of marker typing errors. Parameters associated with the first two assumptions are usually considered fixed in mapping applications, rather than being estimated jointly with the trait model and recombination parameters, and most data sets are analyzed under the assumption that there are no marker typing errors.
Consider first the assumed marker map. As currently implemented, most genome scans begin with a fairly sparse map, with markers 10-20 cM apart. If suggestive linkage is detected, then a denser set of markers is typed in the interesting region. Teng and Siegmund (1998) appear to recommend using a dense map for the initial genome scan as well, with markers perhaps 1-5 cM apart. In addition to the obvious issues of the relative cost of genotyping versus relative pair collection, one problem with dense maps is that, at present, one cannot rely completely on given genetic maps with respect to either the marker order or the intermarker distances.
Investigators either condition on published marker maps or construct their own maps using their own pedigree data. In either case, the resulting maps are often based on fairly small numbers of families, so that marker order often cannot be well estimated. To get a better idea of the problem, consider the construction of a 1 cM map using a set of markers with a true intermarker distance of 1 cM. On average, one expects to observe 1 recombination event between a given pair of adjacent markers for every 100 fully informative meioses. To obtain a precise estimate of the recombination fraction, a much larger number of fully informative meioses will be required. In addition, markers are not generally fully informative, so the required sample sizes will be even larger. As map order is essentially inferred from the recombination fractions, large numbers of families are required to produce a dense genetic map that can be relied upon with confidence.
Using simulations, Hauser, Boehnke, and Risch (1996) and Hauser (1997) investigated the robustness of model-free affected sib-pair lod score methods to marker map errors. The authors found that, on average, isolated errors in either intermarker distance or marker order do not significantly affect the size or power of the linkage statistic but that, in individual data sets, different map orders and map distances can produce very different results. More extensive errors in intermarker distances had a greater effect on power. On average, the lod score decreased if the assumed intermarker distances were smaller than the true distances. If the assumed intermarker distances were larger than the true distances, then the lod score increased if parents were not typed but decreased if parents were typed. They did not investigate the effects of multiple map order errors in a small region.
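The sample-size point made above for a 1 cM map can be illustrated with a short calculation. The sketch below is not taken from any of the cited papers; it assumes that a 1 cM distance corresponds to a recombination fraction of roughly 0.01, that every meiosis is fully informative, and that recombinants are counted directly so that the count is binomial. The function names are hypothetical and chosen only for this illustration.

```python
import math


def expected_recombinants(theta, n_meioses):
    """Expected recombinant count among n fully informative meioses."""
    return theta * n_meioses


def standard_error(theta, n_meioses):
    """Binomial standard error of the estimated recombination fraction."""
    return math.sqrt(theta * (1.0 - theta) / n_meioses)


def prob_no_recombinants(theta, n_meioses):
    """Probability that no recombinant is observed at all."""
    return (1.0 - theta) ** n_meioses


def meioses_for_relative_se(theta, relative_se):
    """Meioses needed so that SE(theta_hat) / theta equals relative_se.

    Solves sqrt(theta * (1 - theta) / n) / theta = relative_se for n.
    """
    return (1.0 - theta) / (theta * relative_se ** 2)


if __name__ == "__main__":
    theta = 0.01  # assumed recombination fraction for a 1 cM interval
    n = 100       # fully informative meioses, as in the text
    print("expected recombinants:", expected_recombinants(theta, n))
    print("standard error:", standard_error(theta, n))
    print("P(no recombinants):", prob_no_recombinants(theta, n))
    print("meioses for 10% relative SE:", meioses_for_relative_se(theta, 0.10))
```

Under these assumptions, 100 meioses yield one expected recombinant, but the standard error of the estimate (about 0.00995) is nearly as large as the recombination fraction itself, and there is roughly a 37% chance of seeing no recombinant at all. Bringing the relative standard error down to 10% would take on the order of 9,900 fully informative meioses, and more when markers are only partially informative.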