1. The Development of PROC MULTTEST

Although still controversial, multiplicity adjustments have recently gained acceptance in varied fields of scientific endeavor and their associated publications. They are considered important in pharmaceutical safety determinations involving multiple endpoints, such as adverse events analysis in clinical trials and animal carcinogenicity studies. Even when the compound tested is completely safe, unadjusted testing methods are likely to produce false positive indications of one or more untoward outcomes. Multiplicity adjustment is also important in epidemiology and other complicated areas of data analysis, where it offers protection against conclusions driven by excessive data mining.

Multiplicity concerns in toxicology and clinical trials prompted the development of the SAS® procedure PROC MULTTEST, which calculates adjusted P-values for a user-supplied family of tests in a wide variety of applications. Biometrics readers will be interested to know that this software is readily available to calculate many (but not all) of the multiplicity-adjusted P-values described by Wright (1992). In addition, the software incorporates many improvements and enhancements, such as the ability to incorporate correlations and nonnormal distributions.

The development of PROC MULTTEST started in May 1987, when one of us (Westfall) presented a resampling approach to calculating adjusted P-values for multiple tests in multivariate binomial models, with special application to the animal carcinogenicity problem. The talk, entitled "Multivariate Binomial Testing," was given as the keynote address for the Midwestern Biopharmaceutical Statistics Workshop (MBSW) held in Muncie, Indiana. Similar approaches to the same problem were developed concurrently and independently, all of which have been published (Farrar and Crump, 1988; Heyse and Rom, 1988; Westfall and Young, 1989).
[An interesting Biometrics connection should be mentioned here. Dr Young thought that Westfall's (1985) publication concerning simultaneous inference with multivariate binary data might be applied to the multiplicity problem in animal carcinogenicity studies. Dr Young therefore invited Dr Westfall to speak at the MBSW conference on this application.]

The precursor to PROC MULTTEST, called PROC MBIN, was developed in 1988. We wrote the specifications for the software, and the coding was performed by Youling Lin at Texas Tech University. Under the auspices of the Pharmaceutical Manufacturing Association, a consortium of drug companies offered financial and intellectual support for the project; specific individuals and companies who contributed are listed below in the Acknowledgements section. The development proceeded through a series of meetings at national and regional statistics conferences; attendees included representatives from the funding organizations and from the United States Food and Drug Administration. From the outset, it was decided that the primary form of output from the software would be the adjusted P-value, for the reasons Wright describes in his opening paragraph.

The procedure PROC MBIN was described by Westfall, Lin, and Young (1989) and was donated to SAS Institute Inc. The software performed multiplicity adjustments for multiple tests (z-score or exact permutation tests) in multivariate (possibly stratified), multiple-group binary outcome situations. The main feature of its output was the use of adjusted P-values to summarize many statistical tests. Single-step resampling (bootstrap or permutation), Bonferroni, and Sidak methods were used to calculate the adjustments. The tabular form of the output was much like Tables 2 and 3 of Wright, showing unadjusted P-values (which we called raw P-values) side by side with various types of adjusted P-values.
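The three kinds of adjustment named above can be illustrated with a minimal Python sketch. It is not the PROC MBIN or PROC MULTTEST code, and the function names, the example P-values, and the input `resampled_min_p` are all hypothetical. The sketch assumes the standard textbook definitions: the Bonferroni adjustment min(1, k p), the Sidak adjustment 1 - (1 - p)^k for k tests, and a single-step resampling adjustment taken as the proportion of resampled data sets whose smallest P-value is at or below the observed raw P-value. Generating those resampled minima (by bootstrap or permutation under the complete null hypothesis) is outside the sketch.

```python
def bonferroni_adjust(raw_p):
    """Bonferroni adjusted P-values: min(1, k * p) for a family of k tests."""
    k = len(raw_p)
    return [min(1.0, k * p) for p in raw_p]


def sidak_adjust(raw_p):
    """Sidak adjusted P-values: 1 - (1 - p)**k for a family of k tests."""
    k = len(raw_p)
    return [1.0 - (1.0 - p) ** k for p in raw_p]


def single_step_resampling_adjust(raw_p, resampled_min_p):
    """Single-step resampling adjustment (assumed min-P form).

    resampled_min_p is a hypothetical input: one value per resampled data
    set, holding the smallest P-value across the whole family of tests.
    Each adjusted P-value is the fraction of resamples whose minimum
    P-value is at or below the observed raw P-value.
    """
    n = len(resampled_min_p)
    return [sum(1 for m in resampled_min_p if m <= p) / n for p in raw_p]


# Illustrative raw P-values and resampled minima (made-up numbers).
raw = [0.01, 0.04, 0.20]
minima = [0.005, 0.02, 0.03, 0.5]

bonf = bonferroni_adjust(raw)
sidak = sidak_adjust(raw)
resamp = single_step_resampling_adjust(raw, minima)

# Raw and adjusted P-values side by side, one row per test.
print(f"{'raw':>8} {'bonferroni':>12} {'sidak':>8} {'resampling':>12}")
for p, b, s, r in zip(raw, bonf, sidak, resamp):
    print(f"{p:8.4f} {b:12.4f} {s:8.4f} {r:12.4f}")
```

In practice far more than four resamples would be used; the short list here only keeps the arithmetic easy to follow.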