In the FDA guidance 'Multiple Endpoints in Clinical Trials', the Bonferroni method was described as follows:
The Bonferroni method is a single-step procedure that is commonly used, perhaps because of its simplicity and broad applicability. It is a conservative test and a finding that survives a Bonferroni adjustment is a credible trial outcome. The drug is considered to have shown effects for each endpoint that succeeds on this test. The Holm and Hochberg methods are more powerful than the Bonferroni method for primary endpoints and are therefore preferable in many cases. However, for reasons detailed in sections IV.C.2-3, sponsors may still wish to use the Bonferroni method for primary endpoints in order to maximize power for secondary endpoints or because the assumptions of the Hochberg method are not justified.
The most common form of the Bonferroni method divides the available total alpha (typically 0.05) equally among the chosen endpoints. The method then concludes that a treatment effect is significant at the alpha level for each one of the m endpoints for which the endpoint’s p-value is less than α/m. Thus, with two endpoints, the critical alpha for each endpoint is 0.025, with four endpoints it is 0.0125, and so on. Therefore, if a trial with four endpoints produces two-sided p-values of 0.012, 0.026, 0.016, and 0.055 for its four primary endpoints, the Bonferroni method would compare each of these p-values to the divided alpha of 0.0125. The method would conclude that there was a significant treatment effect at level 0.05 for only the first endpoint, because only the first endpoint has a p-value of less than 0.0125 (0.012). If two of the p-values were below 0.0125, then the drug would be considered to have demonstrated effectiveness on both of the specific health effects evaluated by the two endpoints.
The Bonferroni method tends to be conservative for the study overall Type I error rate if the endpoints are positively correlated, especially when there are a large number of positively correlated endpoints.
Consider a case in which all of three endpoints give nominal p-values between 0.025 and 0.05, i.e., all ‘significant’ at the 0.05 level but none significant under the Bonferroni method. Such an outcome seems intuitively to show effectiveness on all three endpoints, but each would fail the Bonferroni test. When there are more than two endpoints with, for example, correlation of 0.6 to 0.8 between them, the true family-wise Type I error rate may decrease from 0.05 to approximately 0.04 to 0.03, respectively, with negative impact on the Type II error rate. Because it is difficult to know the true correlation structure among different endpoints (not simply the observed correlations within the dataset of the particular study), it is generally not possible to statistically adjust (relax) the Type I error rate for such correlations. When a multiple-arm study design is used (e.g., with several dose-level groups), there are methods that take into account the correlation arising from comparing each treatment group to a common control group.
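The two worked cases in the guidance text above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `bonferroni` is made up for illustration, and the three p-values in the second case are hypothetical values chosen to lie between 0.025 and 0.05, since the guidance does not give specific numbers.

```python
def bonferroni(p_values, alpha=0.05):
    """Equally weighted Bonferroni test.

    Returns the per-endpoint threshold (alpha / m) and a list of booleans
    saying whether each p-value falls below that threshold.
    """
    m = len(p_values)
    threshold = alpha / m
    return threshold, [p < threshold for p in p_values]


# Four-endpoint example quoted from the guidance.
four_endpoints = [0.012, 0.026, 0.016, 0.055]
threshold4, significant4 = bonferroni(four_endpoints)
print(threshold4, significant4)

# Three-endpoint case: hypothetical p-values, all nominally below 0.05
# but above 0.05 / 3, so none passes the Bonferroni test.
three_endpoints = [0.030, 0.040, 0.045]
threshold3, significant3 = bonferroni(three_endpoints)
print(threshold3, significant3)
```

In the four-endpoint case only the first endpoint (p = 0.012) falls below 0.0125. In the three-endpoint case the threshold is 0.05 / 3, about 0.0167, so every endpoint fails.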
The guidance also discussed the weighted Bonferroni approach:
The Bonferroni test can also be performed with different weights assigned to endpoints, with the sum of the relative weights equal to 1.0 (e.g., 0.4, 0.1, 0.3, and 0.2, for four endpoints). These weights are prespecified in the design of the trial, taking into consideration the clinical importance of the endpoints, the likelihood of success, or other factors. There are two ways to perform the weighted Bonferroni test:
- The unequally weighted Bonferroni method is often applied by dividing the overall alpha (e.g., 0.05) into unequal portions, prospectively assigning a specific amount of alpha to each endpoint by multiplying the overall alpha by the assigned weight factor. The sum of the endpoint-specific alphas will always be the overall alpha, and each endpoint’s calculated p-value is compared to the assigned endpoint-specific alpha.
- An alternative approach is to adjust the raw calculated p-value for each endpoint by the fractional weight assigned to it (i.e., divide each raw p-value by the endpoint’s weight factor), and then compare the adjusted p-values to the overall alpha of 0.05.
These two approaches are equivalent: comparing a p-value to (weight × alpha) leads to the same decision as comparing (p-value / weight) to alpha.
The weights are prespecified in the design of the trial, taking into consideration:
- Clinical importance of the endpoints
- The likelihood of success
- Other factors
Unequal weights may be considered in situations such as the following:
- With two primary efficacy endpoints, the expectation for regulatory approval is greater for one endpoint than for the other
- Sample size calculation indicates that the sample size that is sufficient for primary efficacy endpoint #1 is larger than needed for primary efficacy endpoint #2
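The two equivalent ways of running the weighted Bonferroni test described above can be sketched as follows. The function names are hypothetical. The weights (0.4, 0.1, 0.3, 0.2) come from the guidance text, but pairing them with the four p-values from the earlier unweighted example is an assumption made purely for illustration.

```python
def weighted_bonferroni_alpha_split(p_values, weights, alpha=0.05):
    """Approach 1: compare each p-value to its own alpha (weight * alpha)."""
    return [p < w * alpha for p, w in zip(p_values, weights)]


def weighted_bonferroni_adjusted_p(p_values, weights, alpha=0.05):
    """Approach 2: divide each p-value by its weight, compare to overall alpha."""
    return [p / w < alpha for p, w in zip(p_values, weights)]


weights = [0.4, 0.1, 0.3, 0.2]            # prespecified, must sum to 1.0
p_values = [0.012, 0.026, 0.016, 0.055]   # illustrative pairing (assumption)
assert abs(sum(weights) - 1.0) < 1e-9

by_alpha_split = weighted_bonferroni_alpha_split(p_values, weights)
by_adjusted_p = weighted_bonferroni_adjusted_p(p_values, weights)
print(by_alpha_split)
print(by_adjusted_p)
```

With these inputs the endpoint-specific alphas are 0.02, 0.005, 0.015, and 0.01, and both approaches flag only the first endpoint. Note that the third endpoint (p = 0.016) misses its threshold of 0.015 even though it carries a larger weight than under equal splitting.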
The study was to be considered positive if either of the two coprimary end points, progression-free or overall survival, was significantly longer with durvalumab than with placebo. Approximately 702 patients were needed for 2:1 randomization to obtain 458 progression-free survival events for the primary analysis of progression-free survival and 491 overall survival events for the primary analysis of overall survival. It was estimated that the study would have a 95% or greater power to detect a hazard ratio for disease progression or death of 0.67 and a 85% or greater power to detect a hazard ratio for death of 0.73, on the basis of a log-rank test with a two-sided significance level of 2.5% for each coprimary end point.
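As a rough plausibility check on the quoted power figures, the numbers can be plugged into the Schoenfeld approximation for the log-rank test. This is an assumption on my part: the quoted text does not say which formula the sponsor used, and the function name `logrank_power` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(events, hazard_ratio, alpha_two_sided, allocation=(2, 1)):
    """Approximate power of a two-sided log-rank test (Schoenfeld approximation).

    events          -- number of events at the analysis
    hazard_ratio    -- hazard ratio under the alternative hypothesis
    alpha_two_sided -- two-sided significance level for this endpoint
    allocation      -- randomization ratio (treatment, control)
    """
    total = sum(allocation)
    p1, p2 = allocation[0] / total, allocation[1] / total
    z_alpha = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    z_effect = sqrt(events * p1 * p2) * abs(log(hazard_ratio))
    return NormalDist().cdf(z_effect - z_alpha)


# Each coprimary end point is tested at a two-sided 2.5% level (0.05 / 2).
power_pfs = logrank_power(events=458, hazard_ratio=0.67, alpha_two_sided=0.025)
power_os = logrank_power(events=491, hazard_ratio=0.73, alpha_two_sided=0.025)
print(round(power_pfs, 3), round(power_os, 3))
```

Under these assumptions the approximation gives roughly 0.96 for progression-free survival and roughly 0.85 for overall survival, which is in line with the "95% or greater" and "85% or greater" statements in the quoted text.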
However, in the original study protocol, the weighted Bonferroni method was used, and unequal alpha levels were assigned to overall survival (OS) and progression-free survival (PFS).
The two co-primary endpoints of this study are OS and PFS. To control for type-I error, a significance level of 4.5% will be used for analysis of OS and a significance level of 0.5% will be used for analysis of PFS. The study will be considered positive (a success) if either the PFS analysis results and/or the OS analysis results are statistically significant.
Here, a weight of 0.9 (resulting in an alpha of 0.9 × 0.05 = 0.045) was given to OS and a weight of 0.1 (resulting in an alpha of 0.1 × 0.05 = 0.005) was given to PFS.
The initial assumptions for the primary end-point were an annual rate of 21% on placebo with a risk reduced by 36% (hazard ratio (HR) 0.64) with bosentan and a negligible annual attrition rate. In addition, it was planned to conduct a single final analysis at 0.04 (two-sided), taking into account the existence of a co-primary end-point (change in 6MWD at 16 weeks) planned to be tested at 0.01 (two-sided). Over the course of the study, a number of amendments were introduced based on the evolution of knowledge in the field of PAHs, as well as the rate of enrolment and blinded evaluation of the overall event rate. On implementation of an amendment in 2007, the 6MWD end-point was changed from a co-primary end-point to a secondary endpoint and the Type I error associated with the single remaining primary end-point was increased to 0.05 (two-sided).
RESPIRE 1 Study:
- Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.025)
- Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.025)
RESPIRE 2 Study:
- Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.001)
- Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.049)
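The alpha allocations in the examples above can be turned back into the weights they imply, simply by dividing each alpha by the total. The sketch below does this with exact fractions to avoid floating-point noise. The function name `implied_weights` and the dictionary labels are my own, not terms used in the studies.

```python
from fractions import Fraction


def implied_weights(alphas):
    """Return (total alpha, list of weights) for per-hypothesis alphas.

    Alphas are passed as decimal strings so they convert to exact fractions.
    """
    exact = [Fraction(a) for a in alphas]
    total = sum(exact)
    return total, [a / total for a in exact]


allocations = {
    "Protocol example (OS, PFS)": ["0.045", "0.005"],
    "Bosentan example (primary end-point, 6MWD)": ["0.04", "0.01"],
    "RESPIRE 1 (28 days, 14 days)": ["0.025", "0.025"],
    "RESPIRE 2 (28 days, 14 days)": ["0.001", "0.049"],
}

for name, alphas in allocations.items():
    total, weights = implied_weights(alphas)
    print(name, float(total), [float(w) for w in weights])
```

In every case the alphas sum to 0.05. RESPIRE 1 corresponds to equal weights (0.5 and 0.5), while RESPIRE 2 places almost all of the weight (0.98) on the 14-day regimen.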