Monday, February 12, 2018

Weighted Bonferroni Method (or partition of alpha) in Clinical Trials with Multiple Endpoints


In a previous post, the terms 'multiple endpoints' and 'co-primary endpoints' were discussed. If a study contains two co-primary efficacy endpoints, the study is claimed to be successful only if both endpoints reach statistical significance at alpha=0.05 (no adjustment for multiplicity is necessary). If a study contains multiple (for example, two) primary efficacy endpoints, the study is claimed to be successful if either endpoint is statistically significant. In the latter situation, however, an adjustment for multiplicity is necessary to maintain the overall alpha at 0.05. In other words, the hypothesis test for each individual endpoint is performed at a significance level less than 0.05.

The simplest and most straightforward approach is the Bonferroni correction, which compensates for the increase in the number of hypothesis tests: each individual hypothesis is tested at a significance level of alpha/m, where alpha is the desired overall alpha level (usually 0.05) and m is the number of hypotheses. If there are two hypothesis tests (m=2), each individual hypothesis will be tested at alpha=0.025.
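As an illustration, here is a minimal SAS sketch of the Bonferroni rule; the data set name and the four p-values are made up (they happen to mirror the four-endpoint example in the FDA guidance quoted below). Each raw p-value is compared with alpha/m, or equivalently each p-value multiplied by m is compared with the overall alpha.

data bonf;
   alpha = 0.05;
   m     = 4;                        /* number of primary endpoints            */
   input endpoint $ raw_p;
   ep_alpha = alpha / m;             /* endpoint-specific significance level   */
   adj_p    = min(raw_p * m, 1);     /* equivalent Bonferroni-adjusted p-value */
   reject   = (raw_p < ep_alpha);    /* 1 = significant after adjustment       */
   datalines;
EP1 0.012
EP2 0.026
EP3 0.016
EP4 0.055
;

proc print data=bonf noobs;
run;

With m=4, only EP1 (p=0.012 < 0.0125) would be declared significant, matching the guidance example quoted below.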

In FDA guidance 'Multiple Endpoints in Clinical Trials', the Bonferroni Method was described as the following:
The Bonferroni method is a single-step procedure that is commonly used, perhaps because of its simplicity and broad applicability. It is a conservative test and a finding that survives a Bonferroni adjustment is a credible trial outcome. The drug is considered to have shown effects for each endpoint that succeeds on this test. The Holm and Hochberg methods are more powerful than the Bonferroni method for primary endpoints and are therefore preferable in many cases. However, for reasons detailed in sections IV.C.2-3, sponsors may still wish to use the Bonferroni method for primary endpoints in order to maximize power for secondary endpoints or because the assumptions of the Hochberg method are not justified. The most common form of the Bonferroni method divides the available total alpha (typically 0.05) equally among the chosen endpoints. The method then concludes that a treatment effect is significant at the alpha level for each one of the m endpoints for which the endpoint’s p-value is less than α /m. Thus, with two endpoints, the critical alpha for each endpoint is 0.025, with four endpoints it is 0.0125, and so on. Therefore, if a trial with four endpoints produces two-sided p values of 0.012, 0.026, 0.016, and 0.055 for its four primary endpoints, the Bonferroni method would compare each of these p-values to the divided alpha of 0.0125. The method would conclude that there was a significant treatment effect at level 0.05 for only the first endpoint, because only the first endpoint has a p-value of less than 0.0125 (0.012). If two of the p-values were below 0.0125, then the drug would be considered to have demonstrated effectiveness on both of the specific health effects evaluated by the two endpoints. The Bonferroni method tends to be conservative for the study overall Type I error rate if the endpoints are positively correlated, especially when there are a large number of positively correlated endpoints. Consider a case in which all of three endpoints give nominal p-values between 0.025 and 0.05, i.e., all ‘significant’ at the 0.05 level but none significant under the Bonferroni method. Such an outcome seems intuitively to show effectiveness on all three endpoints, but each would fail the Bonferroni test. When there are more than two endpoints with, for example, correlation of 0.6 to 0.8 between them, the true family-wise Type I error rate may decrease from 0.05 to approximately 0.04 to 0.03, respectively, with negative impact on the Type II error rate. Because it is difficult to know the true correlation structure among different endpoints (not simply the observed correlations within the dataset of the particular study), it is generally not possible to statistically adjust (relax) the Type I error rate for such correlations. When a multiple-arm study design is used (e.g., with several dose-level groups), there are methods that take into account the correlation arising from comparing each treatment group to a common control group.
The guidance also discussed the weighted Bonferroni approach:
The Bonferroni test can also be performed with different weights assigned to endpoints, with the sum of the relative weights equal to 1.0 (e.g., 0.4, 0.1, 0.3, and 0.2, for four endpoints). These weights are prespecified in the design of the trial, taking into consideration the clinical importance of the endpoints, the likelihood of success, or other factors. There are two ways to perform the weighted Bonferroni test:  
  • The unequally weighted Bonferroni method is often applied by dividing the overall alpha (e.g., 0.05) into unequal portions, prospectively assigning a specific amount of alpha to each endpoint by multiplying the overall alpha by the assigned weight factor. The sum of the endpoint-specific alphas will always be the overall alpha, and each endpoint’s calculated p-value is compared to the assigned endpoint-specific alpha.
  • An alternative approach is to adjust the raw calculated p-value for each endpoint by the fractional weight assigned to it (i.e., divide each raw p-value by the endpoint’s weight factor), and then compare the adjusted p-values to the overall alpha of 0.05.
These two approaches are equivalent.
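A minimal SAS sketch of the two approaches is below (the weights follow the 0.4/0.1/0.3/0.2 example from the guidance; the p-values and data set name are hypothetical). An endpoint is rejected under the endpoint-specific alpha exactly when its weight-adjusted p-value is below the overall alpha.

data wbonf;
   alpha = 0.05;
   input endpoint $ weight raw_p;
   ep_alpha = alpha * weight;     /* approach 1: endpoint-specific alpha     */
   adj_p    = raw_p / weight;     /* approach 2: weight-adjusted p-value     */
   reject1  = (raw_p < ep_alpha);
   reject2  = (adj_p < alpha);    /* reject1 and reject2 always agree        */
   datalines;
EP1 0.4 0.018
EP2 0.1 0.004
EP3 0.3 0.020
EP4 0.2 0.015
;

proc print data=wbonf noobs;
run;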

The guidance mentioned that the reasons for using the weighted Bonferroni test are:
  • Clinical importance of the endpoints
  • The likelihood of success
  • Other factors
Other factors could include:
  • With two primary efficacy endpoints, the expectation for regulatory approval is greater for one endpoint than for the other
  • The sample size calculation indicates that a sample size sufficient for primary efficacy endpoint #1 is more than needed for primary efficacy endpoint #2
With the weighted Bonferroni correction, the weights are subjective and essentially arbitrarily selected, which results in the partition of the overall alpha into unequal significance levels for the different endpoints.

There are a lot of applications of Bonferroni and weighted Bonferroni in practice. Here are some examples: 
In the publication Antonia 2017 "Durvalumab after Chemoradiotherapy in Stage III Non–Small-Cell Lung Cancer", two coprimary end points were used in the study:
The study was to be considered positive if either of the two coprimary end points, progression-free survival or overall survival, was significantly longer with durvalumab than with placebo. Approximately 702 patients were needed for 2:1 randomization to obtain 458 progression-free survival events for the primary analysis of progression-free survival and 491 overall survival events for the primary analysis of overall survival. It was estimated that the study would have a 95% or greater power to detect a hazard ratio for disease progression or death of 0.67 and an 85% or greater power to detect a hazard ratio for death of 0.73, on the basis of a log-rank test with a two-sided significance level of 2.5% for each coprimary end point.
However, in the original study protocol, the weighted Bonferroni method was used and unequal alpha levels were assigned to OS and PFS.  
The two co-primary endpoints of this study are OS and PFS. To control the type-I error, a significance level of 4.5% will be used for the analysis of OS and a significance level of 0.5% will be used for the analysis of PFS. The study will be considered positive (a success) if either the PFS analysis results and/or the OS analysis results are statistically significant.
Here, a weight of 0.9 (resulting in an alpha 0.9 x 0.05 = 0.045) was given to OS and a weight of 0.1 (resulting in an alpha 0.1 x 0.05 = 0.005) was given to PFS.

In the COMPASS-2 study (Bosentan added to sildenafil therapy in patients with pulmonary arterial hypertension), the original protocol contained two primary efficacy endpoints, and the weighted Bonferroni method (even though it was not explicitly mentioned in the publication) was used for the multiplicity adjustment. A weight of 0.8 (resulting in an alpha of 0.8 x 0.05 = 0.04) was given to the time to first mortality/morbidity event and a weight of 0.2 (resulting in an alpha of 0.2 x 0.05 = 0.01) was given to the change from baseline to Week 16 in 6MWD.
The initial assumptions for the primary end-point were an annual rate of 21% on placebo with a risk reduced by 36% (hazard ratio (HR) 0.64) with bosentan and a negligible annual attrition rate. In addition, it was planned to conduct a single final analysis at 0.04 (two-sided), taking into account the existence of a co-primary end-point (change in 6MWD at 16 weeks) planned to be tested at 0.01 (two-sided). Over the course of the study, a number of amendments were introduced based on the evolution of knowledge in the field of PAH, as well as the rate of enrolment and blinded evaluation of the overall event rate. On implementation of an amendment in 2007, the 6MWD end-point was changed from a co-primary end-point to a secondary end-point and the Type I error associated with the single remaining primary end-point was increased to 0.05 (two-sided).
According to FDA's briefing book on "Ciprofloxacin Dry Powder for Inhalation (DPI), Meeting of the Antimicrobial Drugs Advisory Committee (AMDAC)", the sponsor (Bayer) conducted two pivotal studies: RESPIRE 1 and RESPIRE 2. Each study contained two hypotheses. Interestingly, for multiplicity adjustment, the Bonferroni method was used for the RESPIRE 1 study and the weighted Bonferroni method for the RESPIRE 2 study. We can only guess why weights of 0.02 and 0.98 (resulting in a partition of alpha into 0.001 and 0.049) were chosen in the RESPIRE 2 study.
RESPIRE 1 Study:
  • Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.025)
  • Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.025)
RESPIRE 2 Study:
  • Hypothesis 1: ciprofloxacin DPI for 28 days on/off treatment regimen versus pooled placebo (alpha=0.001)
  • Hypothesis 2: ciprofloxacin DPI for 14 days on/off treatment regimen versus pooled placebo (alpha=0.049)

Thursday, February 01, 2018

Handling site level protocol deviations

In a previous post, the CDISC data structure for protocol deviations was discussed. The protocol deviation data set (DV domain) is an event data set (just like the way we record adverse events). The tabulation data set should contain one record per protocol deviation per subject. In other words, each protocol deviation is always tied to an individual subject. In the DV data set, each protocol deviation record should have a unique subject identifier (USUBJID).

There are situations where protocol deviations occur at the site level, not the subject level. For example, many study protocols have specific requirements for handling the study drug (or IP - investigational product). The study drug must be stored at the required temperatures. A temperature excursion occurs when a time- and temperature-sensitive pharmaceutical product is exposed to temperatures outside the range prescribed for storage. A temperature excursion may inactivate the study drug, reduce its efficacy, or raise safety concerns. If multiple subjects are enrolled at the problematic site, the protocol deviation associated with the temperature excursion will have an impact on all subjects at that site - this is called a site level protocol deviation.

There is no specific discussion about documenting and handling site level protocol deviations in ICH and CDISC guidelines.

According to CDISC SDTM, protocol deviations should be captured in the DV domain. Under the current SDTM standard, all tabulation data sets, including DV, are designed for subject-level data (with the only exception of the trial design information).

Because site level deviations are not associated with any specific subject, they cannot be directly included in the DV data set. There are two ways to handle site level protocol deviations:

  • Document the site level protocol deviations separately from the subject level protocol deviations, and then describe them in the Clinical Study Report (CSR) and in the Study Data Reviewer's Guide (SDRG) if applicable.
  • If a site level deviation has an impact on all or multiple subjects enrolled at that site, the specific deviation can be repeated for each affected subject (see the sketch below).
It is advisable to pre-specify the instructions for handling site level protocol deviations so that they are recorded appropriately.
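For the second option, a rough SAS sketch is below; the site-level deviation data set (SITEPD) and its variable names are hypothetical, while DM is the standard demographics domain. Each site-level deviation record is joined to every subject enrolled at the affected site, producing one record per deviation per subject that can then be added to DV.

proc sql;
   create table dv_site as
   select dm.studyid,
          dm.usubjid,            /* one record per affected subject          */
          pd.dvterm,
          pd.dvdecod,
          pd.dvstdtc
   from   sitepd as pd
          inner join dm
          on pd.siteid = dm.siteid;
quit;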


Identifying and recording protocol deviations, including site level protocol deviations, should be an ongoing process during the conduct of a clinical trial. If we wait until the end of the study, we may have difficulty determining whether a specific site level deviation had an impact on all subjects or only some of the subjects at that site.

Tuesday, January 23, 2018

CDISC (CDASH and SDTM) and Protocol Deviations

I have previously discussed the protocol deviation issues in clinical trials.
It is important to ensure GCP compliance and adherence to the study protocol in clinical trials. However, due to the complexity of clinical trial operations, it is not possible to conduct a study 100% according to the study protocol. It would be a miracle to complete a study without any protocol deviations. Therefore, identifying and documenting protocol deviations becomes a critical task.
After the study, when the clinical study report (CSR) is prepared, there should be a section describing the protocol deviations, as described in ICH guideline E3 STRUCTURE AND CONTENT OF CLINICAL STUDY REPORTS.
Protocol deviations also have an impact on the statistical analysis. Protocol deviations will not result in subjects being excluded from the full analysis set (usually the intention-to-treat population); however, important protocol deviations may result in subjects being excluded from the per-protocol population. For pivotal studies, analyses using the per-protocol population will always be performed as one type of sensitivity analysis to evaluate the robustness of the study results.
If the results from the main analyses (usually based on the intention-to-treat population) are negative, the analyses of the per-protocol population may not be very meaningful. If the results from the main analyses are positive, the analyses of the per-protocol population become important. If there are a lot of protocol deviations in a study, it may trigger the regulatory reviewers' scrutiny and dampen confidence in the study results.
CDISC's CDASH (the guidelines for case report form design) and SDTM (the guidelines for the standardized tabulation data structure) include detailed discussions of protocol deviations. While violations of the inclusion/exclusion criteria (also called study entry criteria or eligibility criteria) may also be considered protocol deviations, the CDISC discussions cover only protocol deviations that occur after study start (i.e., after subjects are randomized into the study and/or receive the first dose of the study drug). Violations of inclusion/exclusion criteria should be collected separately in the IE form (domain).
CDASH recommends identifying the protocol deviation (DV domain) through other sources, not from the case report form. It stated:
         5.14.1 Considerations Regarding Usage of a Protocol Deviations CRF
The general recommendation is to avoid the creation of a Protocol Deviations CRF (individual sponsors can determine whether it is needed for their particular company), as this information can usually be determined from other sources or derived from other data. As with all domains, Highly Recommended fields are included only if the domain is used. The DV domain table was developed as a guide that clinical teams could use for designing a Protocol Deviations CRF and study database should they choose to do so.
In practice, protocol deviations are usually collected and maintained by the clinical team either in a CTMS (clinical trial management system) or in an Excel spreadsheet, even though there are examples (maybe the future trend) of collecting protocol deviations through electronic data capture (EDC).

If a sponsor decides to use a case report form (paper or electronic) to capture protocol deviations, CDASH recommends the following:
If a sponsor decides to use a Protocol Deviations CRF, the sponsor should not rely on this CRF as the only source of protocol deviation information for a study. Rather, they should also utilize monitoring, data review and programming tools to assess whether there were protocol deviations in the study that may affect the usefulness of the datasets for analysis of efficacy and safety.


SDTM requires a protocol deviations (DV) tabulation data set (dv.xpt) no matter how the original protocol deviation data are collected.


In ADaM, whether to create an analysis data set for protocol deviations is up to each sponsor; it is not required. If a summary table of protocol deviations is needed, it can be programmed from the SDTM DV data set and the ADaM ADSL data set, as sketched below.
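As a rough sketch, assuming the standard CDISC variable names (DVCAT and DVDECOD in DV; TRT01P in ADSL) and recognizing that the actual merge keys and treatment variable may differ by study, the summary could be programmed along these lines:

proc sql;
   create table pd_anl as
   select a.usubjid,
          a.trt01p,              /* planned treatment from ADSL              */
          d.dvcat,               /* protocol deviation category              */
          d.dvdecod              /* standardized protocol deviation term     */
   from   adsl as a
          left join dv as d
          on a.usubjid = d.usubjid;
quit;

proc freq data=pd_anl;
   tables dvcat*trt01p / nocol nopercent missing;   /* deviation records by treatment */
run;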

Thursday, January 11, 2018

Statistician's nightmare - mistakes in statistical analyses of clinical trials

A statistician's job can be risky too.

In recent news announcements, sponsors had to disclose errors in their statistical analyses. All of these errors have consequences for the company's value or even the company's fate. I hope that the study team members who made these kinds of mistakes still have a job at their company. I did have a friend who ended up losing a job due to an incorrectly reported p-value.

Here are two examples. In the first example, the p-value was incorrectly calculated and announced, and later had to be corrected – very embarrassing for the statistician who made this mistake. In the second example, the mistake was more on the programming and data management side. Had the initial results been positive, the sponsor might never have gone back to re-assess the outcomes and the errors might never have been identified.

Example #1:
Axovant Sciences (NASDAQ:AXON) today announced a correction to the data related to the Company’s investigational drug nelotanserin previously reported in its January 8, 2018 press release. In the results of the pilot Phase 2 Visual Hallucination study, the post-hoc subset analysis of patients with a baseline Scale for the Assessment of Positive Symptoms - Parkinson's Disease (SAPS-PD) score of greater than 8.0 was misreported. The previously reported data for this population (n=19) that nelotanserin treatment at 40 mg for two weeks followed by 80 mg for two weeks resulted in a 1.21 point improvement (p=0.011, unadjusted) were incorrect. While nelotanserin treatment at 40 mg for two weeks followed by 80 mg for two weeks did result in a 1.21 point improvement, the p-value was actually 0.531, unadjusted. Based on these updated results, the Company will continue to discuss a larger confirmatory nelotanserin study with the U.S. Food and Drug Administration (FDA) that is focused on patients with dementia with Lewy bodies (DLB) with motor function deficits. The Company may further evaluate nelotanserin for psychotic symptoms in DLB and Parkinson’s disease dementia (PDD) patients in future clinical studies.
Example #2:  
(note: PE: pulmonary exacerbation; PEBAC: pulmonary exacerbation blinded adjudication committee) 
Re-Assessment of Outcomes
Following database lock and unblinding of treatment assignment, the Applicant performed additional data assessments due to errors identified in the programming/data entry that impacted identification of PEs. This led to changes in the final numbers of PEs. Based on discussion the Applicant had with the PEBAC Chair, it was decided that 10 PEs initially adjudicated by the PEBAC were to be re-adjudicated by the PEBAC using complete and final subject-level information. This led to a re-adjudication by the PEBAC who were blinded to subject ID, site ID, and treatment. Result(s) of prior adjudication were not provided to the PEBAC.
 Efficacy results presented in Section 7.3 reflect the revised numbers. Further details regarding the reassessment by the PEBAC are discussed in Section 7.3.6.
7.3.6 Primary Endpoint Changes after Database Lock and Un-Blinding
Following database lock and treatment assignment un-blinding, the Applicant performed additional data assessments leading to changes in the final numbers of PEs. Specifically, per the Applicant, during a review of the ORBIT-3 and ORBIT-4 data occurring after database locking and data un-blinding (for persons involved in the data maintenance and analyses), ‘personnel identified errors in the programming done by Accenture Inc. (data analysis contract research organization (CRO)) and one data entry error that impacted identification of PEs. Because of the programming errors, the Applicant states that they chose to conduct a ‘comprehensive audit of all electronic Case Report Forms (eCRFs) entries for signs, symptoms or laboratory abnormalities as entered in the PE worksheets for all patients in ARD-3150-1201 and ARD-3150-1202’ (ORBIT-3 and ORBIT-4). From this audit, the Applicant notes ‘that no further programming errors’ were identified but instead 10 PE events (three from ORBIT-4 and seven from ORBIT-3) were found for which the PE assessment by the PEBAC was considered potentially incorrect. This was based on the premise that subject-level data provided to the PEBAC during the original PE adjudication were updated at the time of the database lock. Reasons provided are: 1) the clinical site provided update information to the eCRF after
 the initial PEBAC review (2 PEs), 2) incorrect information was supplied to the PEBAC during initial adjudication process (2 PEs), 3) inconsistency between visit dates and reported signs and symptoms (6 PEs). After discussion with the PEBAC Chair, it was decided that these 10 PEs initially deemed PEs by the PEBAC were to be re-assessed by the PEBAC using complete and final subject-level information. This led to a re-adjudication by the PEBAC during a closed session on January 25, 2017. This re-adjudication was coordinated by Synteract (Applicant’s CRO) who provided data to the PEBAC that were blinded to subject ID, site ID, and treatment. In addition, result(s) of prior adjudication were not provided. While the PEBAC was provided with subject profiles for other relevant study visits, the PEBAC focus was only on the selected visits for which data were updated or corrected.
 Because of the identified programming errors and PEBAC re-adjudication, there were two new first PEs added to the Cipro arm in ORBIT-3 and two new first PEs added to the placebo arm in ORBIT-4. Given these changes, the log-rank p-value in ORBIT-4 changed from 0.058 to 0.032 (when including sex and prior PEs strata). The p-value in ORBIT-3 changed from 0.826 to 0.974 remaining insignificant. These changes are summarized in Table 9. Note that there were no overall changes in the results of the secondary endpoints analyses from changes in PE status described above.

It is inevitable that mistakes will be made during the statistical analysis if there are no adequate procedures to prevent them. The following procedures can minimize the chances of making mistakes like those in the examples above.
  • Independent validation process (double programming): the probability that two independent programmers make the same mistake is very low.
  • Dry-run process: using the dirty data, perform the statistical analysis with a dummy randomization schedule, i.e., perform the statistical analysis with the real data but a fake treatment assignment (see the sketch below). The purpose is to do the programming work and check the data up front so that issues and mistakes can be identified and corrected early.
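A minimal sketch of the dry-run idea is below; the data set names, the dummy treatment variable TRTDUMMY, and the 1:1 allocation are all hypothetical. The point is simply to attach a fake treatment assignment to the real data so that the planned analysis programs can be written and exercised before unblinding.

data adsl_dryrun;
   set adsl;                                    /* real (still dirty) subject-level data     */
   if _n_ = 1 then call streaminit(20180111);   /* fixed seed so the dry run is reproducible */
   length trtdummy $8;
   if rand('uniform') < 0.5 then trtdummy = 'Drug';
   else trtdummy = 'Placebo';
run;

All of the planned analysis programs are then run against this dummy-randomized data set so that data issues and programming bugs surface before the real treatment codes are applied.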




Tuesday, January 02, 2018

Adverse Event Collection: When to Start and When to Stop?

In clinical trials, the most critical safety information is the adverse event (AE). There are numerous guidances and guidelines regarding AE collection; however, there is still a lot of confusion. The very basic questions are when to start AE collection and when to stop it. For example, here are some discussions:

When to start the AE collection?

It is a very common practice in industry-sponsored clinical trials that AE record keeping begins after informed consent is obtained. Adverse events are collected even for patients who sign the informed consent but subsequently fail the inclusion/exclusion criteria during the screening period. If we attend GCP training, it is very likely we will be told that this is how adverse events are supposed to be collected in order to be compliant with GCP.

However, the AE definition in the ICH E2A guidance document suggests that adverse events can be recorded at or after the first treatment, not at the signing of the informed consent form (ICF). ICH E2A defines an AE as:
Adverse Event (or Adverse Experience) Any untoward medical occurrence in a patient or clinical investigation subject administered a pharmaceutical product and which does not necessarily have to have a causal relationship with this treatment. An adverse event (AE) can therefore be any unfavourable and unintended sign (including an abnormal laboratory finding, for example), symptom, or disease temporally associated with the use of a medicinal product, whether or not considered related to the medicinal product.
A. Commonly, the study period during which the investigator must collect and report all AEs and SAEs to the sponsor begins after informed consent is obtained and continues through the protocol-specified post-treatment follow-up period. The ICH E2A guidance document defines an AE as “any untoward medical occurrence in a patient or clinical investigation subject administered a pharmaceutical product…” This definition clearly excludes the period prior to the IMP’s administration (in this context a placebo comparator used in a study is considered an IMP). Untoward medical occurrences in subjects who never receive any study treatment (active or blinded) are not treatment emergent AEs and would not be included in safety analyses. Typically, the number of subjects “evaluable for safety” comprises the number of subjects who received at least one dose of the study treatment. This includes subjects who were, for whatever reason, excluded from efficacy analyses, but who received at least one dose of study treatment.
 There are situations in which the reporting of untoward medical events that occur after informed consent but prior to the IMP’s administration may be mandated by the protocol and/or may be necessary to meet country-specific regulatory requirements. For example, it is considered good risk management for sponsors to require the reporting of serious medical events caused by protocol-imposed screening/diagnostic procedures, and medication washout or no treatment run-in periods that precede IMP administration. For example, a protocol-mandated washout period, during which subjects are taken off existing treatments (such as during crossover trials) that they are receiving before the test article is administered, may experience withdrawal symptoms from removal of the treatment and must be monitored closely. If the severity and/or frequency of AEs occurring during washout periods are considered unacceptable, the protocol may have to be modified or the study halted. Some protocols may also require the structured collection of signs and symptoms associated with the disease under study prior to IMP administration to establish a baseline against which post-treatment AEs can be compared. In some countries, regulatory authorities require the expedited reporting of these events to assess the safety of the human research.
For a specific study, the screening procedures and their potential for injury should be considered when deciding when to start AE collection. For a study with very minimal or routine screening procedures (such as a phase I / clinical pharmacology study in healthy volunteers at a phase I clinic), it may be acceptable to collect AEs starting from the first treatment. For a study with comprehensive or invasive screening procedures, it is advised that AE collection start once the subject signs the ICF. For example, in a study assessing the effect of a thrombolytic agent in ischemic stroke patients, the screening procedures include a CT scan and an arteriogram to assess the location and size of the clot – these can cause adverse effects / injuries to the study participants. In this situation, it is strongly advised that AEs be collected starting at ICF signing.

If AEs are collected from ICF signing, the AEs can be divided during the statistical analysis into non-treatment-emergent AEs and treatment-emergent AEs (TEAEs). Non-TEAEs are AEs that occurred prior to the first study treatment, and TEAEs are AEs with an onset date/time at or after the first study treatment. Non-TEAEs and TEAEs are summarized separately, and the extensive safety analyses are mainly based on the TEAEs.
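A minimal sketch of the TEAE flag derivation is below, assuming ADaM-style variable names (ASTDT for the AE start date in ADAE, TRTSDT for the first dose date in ADSL, and TRTEMFL for the flag); handling of partial onset dates is omitted here.

proc sort data=adae; by usubjid; run;
proc sort data=adsl; by usubjid; run;

data adae_flag;
   merge adae(in=inae) adsl(keep=usubjid trtsdt);
   by usubjid;
   if inae;
   length trtemfl $1;
   if nmiss(astdt, trtsdt) = 0 and astdt >= trtsdt then trtemfl = 'Y';
   else trtemfl = 'N';          /* pre-treatment AE (non-TEAE) or unknown onset */
run;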

When to Stop the AE collection?

It is even murkier when to stop the AE collection because the end of the study is trickier to define than the start of the study. A study may have a follow-up period after the completion of the study treatment. A subject may discontinue the study treatment early but remain in the study to the end.

There is no clear guidance on how long after the last study treatment AEs need to be collected. In practice, it is common to continue reporting AEs after the last study treatment – the post-treatment period may be 7 days or 30 days following the last treatment. The decision about AE collection during the follow-up period should be based on the half-life of the study drug, whether there are AEs of special interest related to the investigational drug, and whether the study is in a pediatric or adult population.

In oncology clinical trials, it is typical not to collect adverse events during the long-term follow-up period. Adverse events may be collected only for a short period after the last treatment, for example 30 days, 3 months, or 6 months following the last study treatment. During the long-term follow-up period, only the study endpoints (tumor-related events) such as death, tumor progression, or secondary malignancy are collected.

Should adverse events be collected for subjects who discontinued the study treatment early? There is a good question and answer discussion at firstclinica.com: AE Reporting for Discontinued Patient
QUESTION: What are the investigator's responsibilities in terms of reporting the post-discontinuation adverse events? On one hand, since the patient discontinued from the study, some think that the investigator has no right to review the patient's clinical record under HIPAA (authorization terminated) or informed consent regulations (consent withdrawn) and consequently has no authority or responsibility to report the adverse events. On the other hand, there does not appear to be any variances to an investigator's IND obligations (even when a patient discontinues from the study) with respect to reporting adverse events according to 21 CFR 312.64. Also, would the investigator's reporting responsibilities be the same for Situation A and Situation B?
ANSWER:
FDA has stated that clinical investigators need to capture information about adverse effects resulting from the use of investigational products, whether or not they are conclusively linked to the product. The fact that a subject has voluntarily withdrawn from the study does not preclude FDA's need for such information. In fact, withdrawal is often due to adverse effects, some already realized and others beginning and that will later progress. For your first scenario, that is obviously not a real problem since the investigator is also the individual's private physician and obviously has this information. While you are correct to worry about privacy issues in both scenarios, the public welfare is a larger issue. Failure to capture and report adverse effects, particularly serious adverse effects, will not only be a problem for the individual in question but potentially for other actual and potential study subjects. It is also essential to capture the information so that the total picture is available to FDA when a marketing decision is imminent. The individual in question may be one of very few who would evidence the particular adverse effect, particularly given the limited number of individuals included in a study. However, this information could have major ramifications for the potentially large population of users of the drug once legally marketed. How to best go about collecting the details of the adverse effect is obviously a different issue.
In summary, the AE collection can be depicted as the following where TEAE stands for treatment-emergent adverse event:



Saturday, December 23, 2017

Composite Endpoint and Competing Risk Model

A competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. For example, when the primary outcome is death due to cardiovascular causes, death due to non-cardiovascular causes serves as a competing risk, because subjects who die of non-cardiovascular causes (e.g., death due to cancer) are no longer at risk of death due to a cardiovascular cause. However, when the primary outcome is all-cause mortality, competing risks are absent, as there are no events whose occurrence precludes the occurrence of death due to any cause. In event-driven clinical trials, if a study subject drops out of the study prior to the occurrence of the event of interest, the dropout precludes the occurrence of the event of interest; this is also a competing risk.

The competing risk issue occurs in clinical trials with a composite endpoint (an endpoint with a composite outcome). A composite outcome consists of two or more component outcomes. Patients who have experienced any one of the events specified by the components are considered to have experienced the composite outcome. The main advantages supporting the use of a composite outcome are that it increases statistical efficiency because of higher event rates, which reduces sample size requirements, costs, and time; that it helps investigators avoid an arbitrary choice between several important outcomes that refer to the same disease process; and that it is a means of assessing the effectiveness of a patient-reported outcome that addresses more than one aspect of the patient's health status.
It is common to use a composite endpoint in clinical trials, especially in clinical trials where the primary interest is to reduce the adverse outcomes, but the occurrence of these adverse outcomes may not be frequent enough. If we do a study with each individual component as the endpoint, the sample size required will be too large.

MACE (major adverse cardiac events) is a composite endpoint frequently used in clinical trials assessing treatment effects on cardiac health. MACE is defined as any event of all-cause mortality, myocardial infarction (MI), or stroke. If a patient dies during the study, a subsequent MI or stroke cannot be observed. If an MI or stroke occurs and the subject is discontinued from the study at that point, the death event will not be observed – one component is a competing risk for another component.
In clinical trials in pulmonary arterial hypertension, the composite endpoint is used to evaluate the treatment effect in reducing the mortality and morbidity events. EMA guidance “GUIDELINE ON THE CLINICAL INVESTIGATIONS OF MEDICINAL PRODUCTS FOR THE TREATMENT OF PULMONARY ARTERIAL HYPERTENSION “ suggested the time to clinical worsening as the primary efficacy endpoint where the clinical worsening is defined as a composite endpoint consisting of:
1. All-cause death.
2. Time to non-planned PAH-related hospitalization.
3. Time to PAH-related deterioration identified by at least one of the following parameters:
  • increase in WHO FC;
  • deterioration in exercise testing
  • signs or symptoms of right-sided heart failure

In the study “Selexipag for the Treatment of Pulmonary Arterial Hypertension”, the primary end point in a time-to-event analysis was a composite of death or a complication related to pulmonary arterial hypertension, whichever occurred first, up to the end of the treatment period. The composite endpoint includes the following events:
  • death (all-cause mortality)
  • hospitalization for worsening of PAH based on criteria defined in the study protocol
  • worsening of PAH resulting in the need for lung transplantation or balloon atrial septostomy
  • initiation of parenteral (subcutaneous or intravenous) prostanoid therapy or chronic oxygen therapy due to worsening of PAH
  • disease progression (patients in modified NYHA/WHO functional class II or III at Baseline) confirmed by a decrease in 6MWD from Baseline (≥ 15%, confirmed by 2 tests on different days within 2 weeks) and worsening of NYHA/WHO functional class
  • disease progression (patients in modified NYHA/WHO functional class III or IV at Baseline) confirmed by a decrease in 6MWD from Baseline (≥ 15%, confirmed by 2 tests on different days within 2 weeks) and need for additional PAH-specific therapy.

There is a competing risk issue here; for example, lung transplantation and death compete with each other. If a patient has a lung transplant, the disease course changes, and the chance of death and the occurrence of the other events are altered.

A common approach to avoid the competing risk issue is to analyze the time to first event (whichever of the components defined in the composite endpoint occurs first) as the primary efficacy endpoint, even though this approach is often criticized because the importance / severity of the components is not equal (death should be given much more weight than the non-fatal events). FDA seems to be totally comfortable with the time-to-first-event approach in both the composite endpoint situation (as evidenced by the approval of Selexipag) and the recurrent event situation (as evidenced by the FDA advisory committee meeting discussion). In a panel discussion at the 2017 regulatory-industry workshop on the topic of Better Characterization of Disease Burden by Using Recurrent Event Endpoints, Drs. Bob Temple and Norman Stockbridge both commented that FDA is fine with the time-to-first-event analysis as long as further analyses are performed to evaluate the treatment effect on each individual component.

Competing risk model may be used in statistical analysis of the clinical trial data either as the primary method or as sensitivity analysis. In Schaapveld et al (2015) Second Cancer Risk Up to 40 Years after Treatment for Hodgkin’s Lymphoma, the competing risk model was used for analyzing the cumulative incidence of second cancers.
The cumulative incidence of second cancers was estimated with death treated as a competing risk, and trends over time were evaluated in competing-risk models, with adjustment for the effects of sex, age, and smoking status when appropriate

The competing risk model is more likely to be used as a sensitivity analysis; for example, in the SPRINT study “A Randomized Trial of Intensive versus Standard Blood-Pressure Control”, the Fine–Gray model for the competing risk of death was used as a sensitivity analysis.

There are quite a few discussions about the competing risk model in clinical trials:

In situations where there is a competing risk issue, Gray's method or the Fine and Gray method can be used. These methods are based on the papers below:
  • Gray, R. J. (1988), “A Class of K-Sample Tests for Comparing the Cumulative Incidence of a Competing  Risk,” Annals of Statistics, 16, 1141–1154.
  • Fine, J. P. and Gray, R. J. (1999), “A Proportional Hazards Model for the Subdistribution of a Competing Risk,” Journal of the American Statistical Association, 94, 496–509.

There are SAS macros for Gray's method. More recently, Gray's method and the Fine and Gray method have been built into SAS procedures (PROC LIFETEST and PROC PHREG, respectively), so the competing risk analysis can be performed handily in SAS. Here are some SAS papers regarding competing risk model analysis.
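As a minimal sketch (with a hypothetical data set PAH containing DAYS, STATUS coded 1 for the event of interest, 2 for the competing event, and 0 for censored, and TRT for the treatment group), Gray's test can be obtained from PROC LIFETEST and the Fine and Gray model from PROC PHREG via the EVENTCODE= option:

/* Gray's test comparing cumulative incidence functions between treatment groups */
proc lifetest data=pah;
   time days*status(0) / eventcode=1;
   strata trt;
run;

/* Fine and Gray proportional subdistribution hazards model */
proc phreg data=pah;
   model days*status(0) = trt / eventcode=1;
run;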

Sunday, December 10, 2017

Recurrent Events versus Composite Events: Statistical Analysis Methods for Recurrent Events

Recurrent events are repeated occurrences of the same type of event.
Composite endpoint is a combination of various clinical events that might happen, such as heart attack or death or stroke, where any one of those events would count as part of the composite endpoint.

While composite endpoint may also be discussed within the scope of the recurrent event endpoint, there are some distinctions between these two terms. The methods for statistical analysis are also different:
  • Examples: Recurrent event endpoint - relapses in multiple sclerosis; exacerbations in pulmonary diseases such as chronic obstructive pulmonary disease; bleeding episodes in hemophilia. Composite endpoint - MACE in cardiovascular trials, where MACE (major adverse cardiac event) includes death, MI, and stroke; clinical worsening events in pulmonary arterial hypertension, where clinical worsening includes all-cause death, PAH-related hospitalization, PAH-related deterioration of disease, ...
  • Type of event: a recurrent event endpoint counts the same type of event; a composite endpoint combines different types of events.
  • Contribution of each event: for a recurrent event endpoint, each event contributes equally to the total number of events; for a composite endpoint, it is often criticized that each component may contribute differently to the total count (death is a much more severe event than the others).
  • Study design: a recurrent event study usually has a fixed duration, with events collected over that fixed period of time; a composite endpoint study is usually event-driven, and different subjects may be followed for different durations.
  • Event frequency: recurrent event endpoints are usually used for events that occur relatively frequently; composite endpoints are usually used for events that occur infrequently or rarely (the components are combined to increase the power and minimize the sample size).
  • Analysis methods: a recurrent event endpoint can be analyzed as the frequency of events, the annualized rate of events, the time to first event, the duration of events, or the duration of the event-free period; a composite endpoint can be analyzed as the time to the first event, the time to event for each component, or the frequency of events.
  • Competing risk: less of an issue for recurrent event endpoints; an issue for composite endpoints.
Example of a trial with recurrent event endpoint:
Example of a trial with composite endpoint:






While the composite endpoint is usually analyzed as the time to first event (whichever component occurs first) using a log-rank test or Cox proportional hazards model, the recurrent event endpoint may be analyzed in different ways. Below are some examples of how recurrent event endpoints have been analyzed in published trials.

Emicizumab Prophylaxis in Hemophilia A with Inhibitors

The primary end point was the difference in the rate of treated bleeding events (hereafter referred to as the bleeding rate) over a period of at least 24 weeks between participants receiving emicizumab prophylaxis (group A) and those receiving no prophylaxis (group B) after the last randomly assigned participant had completed 24 weeks in the trial or had discontinued participation, whichever occurred first.
For all bleeding-related end points, comparisons of the bleeding rate in group A versus group B and the intraindividual comparisons were performed with the use of a negative binomial-regression model to determine the bleeding rate per day, which was converted to an annualized bleeding rate.
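A minimal SAS sketch of such a negative binomial model is below; the data set BLEEDS and the variables NBLEED (number of treated bleeds), FUDAYS (days of follow-up), and TRT (1 = prophylaxis, 0 = no prophylaxis) are hypothetical. The log of the follow-up time is used as an offset so that the estimated rates are annualized, and the exponentiated TRT coefficient is the bleeding rate ratio.

data bleeds2;
   set bleeds;
   logfu = log(fudays / 365.25);    /* offset: log of follow-up time in years  */
run;

proc genmod data=bleeds2;
   model nbleed = trt / dist=negbin link=log offset=logfu;
   /* exp(TRT estimate) = annualized bleeding rate ratio (prophylaxis vs none) */
run;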
The primary efficacy end point was the annual rate of sickle cell–related pain crises, which was calculated as follows: total number of crises × 365 ÷ (end date − date of randomization + 1), with the end date defined as the date of the last dose plus 14 days. Annualized rates were used for the comparisons because they take into account the duration that a participant was in the trial. The crisis rate for every patient was annualized to 12 months. The annual crisis rate was imputed for patients who did not complete the trial. The difference in the annual crisis rate between the high-dose crizanlizumab group and the placebo group was analyzed with the use of the stratified Wilcoxon rank-sum test, with the use of categorized history of crises in the previous year (2 to 4 or 5 to 10 crises) and concomitant hydroxyurea use (yes or no) as strata. A hierarchical testing procedure was used (alpha level of 0.05 for high-dose crizanlizumab vs. placebo, and if significant, low-dose crizanlizumab vs. placebo).

A painful crisis was defined as a visit to a medical facility that lasted more than four hours for acute sickling-related pain (hereinafter referred to as a medical contact), which was treated with a parenterally administered narcotic (except for a few facilities in which only orally administered narcotics were used); the definition is similar to that used in a previous study. Annual rates were computed by dividing the number of crises by the number of years elapsed (e.g., 6 crises in 1.9 years = 3.16 crises per year). To test the effect of treatment on the crisis rate, the patients were ranked according to the number of crises they had had per year for observed periods of up to two years. Death was considered the worst outcome, followed by a stroke (defined as a documented new neurologic deficit lasting more than 24 hours, confirmed by a neurologist) or the institution of long-term transfusion therapy (more than four months); outcomes for all other patients were ranked according to the individual crisis rate. These ranks were used to compare the two treatment groups (Van der Waerden’s test). A rank statistic was planned for the primary analysis because it was expected to have more power to detect differences and to be less influenced by extreme values than a t-test of the means.

The primary efficacy endpoint was mean change from baseline in frequency of headache days for the 28-day period ending with week 24. A headache day was defined as a calendar day (00:00 to 23:59) when the patient reported four or more continuous hours of a headache, per the patient diary. Subsequent to study initiation, but prior to study completion and treatment unmasking, the protocol and statistical analysis plan for PREEMPT 2 was amended to change the primary and secondary endpoints, making frequency of headache days the PREEMPT 2 primary endpoint. This change was made based on several factors: availability of PREEMPT 1 data, guidance provided in newly issued International Headache Society clinical trial guidelines for evaluating headache prophylaxis in CM (34) and the earlier expressed preference of the US Food and Drug Administration (FDA), all of which supported using headache day frequency as a primary outcome measure for CM. For each primary and secondary variable, prespecified comparisons between treatment groups were done by analysis of covariance of the change from baseline, with the same variable’s baseline value as a covariate, with main effects of treatment group and medication overuse strata. The baseline covariate adjustment was prespecified as the primary analysis; sensitivity analyses (e.g., rank-sum test on changes from baseline without a baseline covariate) were also performed.

The primary outcome was the time to the first acute exacerbation of COPD, with acute exacerbation of COPD defined as “a complex of respiratory symptoms (increased or new onset) of more than one of the following: cough, sputum, wheezing, dyspnea, or chest tightness with a duration of at least 3 days requiring treatment with antibiotics or systemic steroids.” The primary analysis was based on a log-rank test of the difference between the two treatment groups in the time to the first exacerbation, with no adjustments for baseline covariates. A Cox proportional-hazards  model was used to adjust for differences in prespecified, prerandomization factors that might predict the risk of acute exacerbations of COPD.

The primary outcome was the effect of simvastatin on the exacerbation rate, which was defined as the number of exacerbations per person-year.

COPD exacerbation rates in the two study groups were compared with the use of a rate ratio. The independence of individual exacerbations was ensured by considering participants to have had two separate exacerbations if the onset dates were at least 14 days apart. Exacerbation rates in each group and the between-group differences were analyzed with the use of negative binomial regression modeling and time-weighted intention-to-treat analyses with adjustments of confidence intervals for between-participant variation (overdispersion).




FDA recommended the time to first exacerbation as the primary efficacy endpoint over the frequency of exacerbations. The time to first exacerbation can be analyzed using a log-rank test or a Cox proportional hazards model, as sketched below.
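A minimal sketch of this analysis is below; the data set COPD and the variables TTFE (time to first exacerbation), CNSR (1 = censored), TRT, and the baseline covariates are hypothetical.

proc lifetest data=copd;
   time ttfe*cnsr(1);               /* CNSR = 1 indicates a censored subject   */
   strata trt;                      /* log-rank and Wilcoxon tests by group    */
run;

proc phreg data=copd;
   model ttfe*cnsr(1) = trt age fev1_bl;   /* Cox model adjusted for prespecified baseline covariates */
run;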
Even though FDA agrees that the frequency of exacerbations may be a clinically relevant endpoint, there are several statistical issues and challenges in providing a reliable and unbiased estimate of the treatment effect using this endpoint:
  • Dependencies of exacerbation on previous exacerbations within patients
  • Effect of influential cases as it can potentially impact the results
  • Distinguishing between early vs. late exacerbations as a function of time
  • Distinguishing between first vs. subsequent exacerbations within patients
  • Investigator biases in assessing the number of events (e.g. events occurring close together)