Design and analysis of stratified clinical trials in the presence of bias (2024)




Stat Methods Med Res. 2020 Jun; 29(6): 1715–1727.

Published online 2019 May 10. doi:10.1177/0962280219846146

PMCID: PMC7270725

PMID: 31074333

Ralf-Dieter Hilgers,1 Martin Manolov,1 Nicole Heussen,1,2 and William F Rosenberger3



Abstract

Background

Among various design aspects, the choice of randomization procedure has to be agreed on when planning a clinical trial stratified by center. The aim of the paper is to present a methodological approach to evaluate whether a randomization procedure mitigates the impact of bias on the test decision in clinical trials stratified by center.

Methods

We use the weighted t test to analyze the data from a clinical trial stratified by center with a two-arm parallel group design and an intended 1:1 allocation ratio, aiming to prove a superiority hypothesis with a continuous normal endpoint, without interim analysis and with no adaptation in the randomization process. The derivation is based on the weighted t test under misclassification, i.e. ignoring bias. An additive bias model combining selection bias and time-trend bias is linked to different stratified randomization procedures.

Results

Various aspects of formulating stratified versions of randomization procedures are discussed. A formula for sample size calculation of the weighted t test is derived and used to specify the tolerated imbalance allowed by some randomization procedures. The distribution of the weighted t test under misclassification is deduced, taking the sequence of patient allocation to treatment, i.e. the randomization sequence, into account. An additive bias model combining selection bias and time-trend bias at strata level, linked to the applied randomization sequence, is proposed. With these components, the potential impact of bias on the type I error probability, depending on the selected randomization sequence and thus the randomization procedure, is formally derived and calculated for example settings within a numerical evaluation study.

Conclusion

The proposed biasing policy and test distribution are necessary to conduct an evaluation of the comparative performance of (stratified) randomization procedures in multi-center clinical trials with a two-arm parallel group design. This evaluation enables the choice of the best-practice procedure and stimulates the discussion about the level of evidence resulting from this kind of clinical trial.

Keywords: Multi-center clinical trial, weighted t test, sample size, stratified randomization, type I error probability, selection bias, time-trend bias

1 Introduction

Large clinical trials often stratify the randomization on a small collection of covariates that may introduce heterogeneity into the patient stream. An important covariable in multi-center trials is often the clinical center, as different study personnel, clinical settings, and patient populations may result in differential study outcomes.1 A stratified population-based analysis can be performed with or without stratification in the design. Less is known about the impact of stratification when there is a bias in the clinical trial. In this paper, we explore this issue for both selection bias and chronological bias, and we demonstrate the impact of these biases on a weighted stratified analysis. In so doing, we explore the role of specific stratified randomization procedures (RPs) and how certain procedures may mitigate the effects of bias. The role of RPs in mitigating bias has been explored in prior research for unstratified trials.2–6 But because stratification into K strata creates K different independent randomized clinical trials, and a stratified test combines K independent tests, the impact of bias can be more pronounced.

The paper is organized as follows: In Section 2, we describe different stratified RPs and discuss aspects of formulating stratified versions of RPs. In Section 3, we derive a formulation of the stratified test statistic of Fleiss1 that preserves the allocation sequence, derive the distribution of the test statistic when bias is present but ignored in the analysis, and mention some sample size considerations for the stratified test. In Section 4, we specify the bias model in the form of an additive combination of strata-specific selection bias and strata-specific time-trend bias linked to the stratified allocation sequence. The criterion introduced in Section 5 is used to summarize the impact of the allocation sequence-specific bias on the type I error probability over the range of all sequences induced by a specific RP. Consequently, an assessment of different RPs is enabled, which guides the choice of an RP for application in a particular clinical trial setting. The methodology is applied to some specific scenarios in Section 6 to illustrate the effects. We discuss the findings in Section 7 and draw conclusions in Section 8.

2 Stratified RPs

RPs for clinical trials with two treatments are well described in the literature.2 In principle, any RP used for two-treatment clinical trials can be employed within strata in a stratified randomization. A comprehensive review is given in Rosenberger.2 Complete randomization, in which patients are assigned to treatments with probability 1/2, is rarely used in stratified clinical trials. Rather, some form of restricted randomization is employed in an effort to balance treatments within strata. Hilgers6 categorized restricted RPs into those that force balance in probability, force balance using a maximal tolerated imbalance, or force terminal balance. A selective list of restricted RPs is given as follows:2

  • Efron's biased coin design (EBC(p)), which consists of flipping a biased coin with probability $p \ge 0.5$ in favor of the treatment which has been allocated less frequently, and a fair coin in case of equal numbers of treatment assignments,7

  • Big stick design (BSD(a)), which can be implemented via complete randomization with a forced deterministic assignment when a maximal tolerated imbalance a is reached during the enrollment,8

  • Random allocation rule (RAR), which randomly assigns exactly half of the patients to E and the other half to C,9

  • Permuted block randomization (PBR(b)) with block size b, which uses RAR within blocks of b patients, for b even,10

  • Maximal procedure (MP(a)), which uses the allocation sequences of RAR, additionally imposes a maximal tolerated imbalance (a), and assigns equal probability to all such sequences.11

Note that EBC(p) may be classified as a restricted RP forcing balance in probability. BSD(a) forces balance by a maximal tolerated imbalance a during the allocation process but does not force terminal balance. Restricted RPs with a maximal tolerated imbalance and terminal balance are PBR(b) and MP(a).
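The allocation rules in the list above can be sketched in Python. This is a minimal, hypothetical implementation (the function names `cr`, `ebc`, `bsd`, `rar`, `pbr` and `stratified` are illustrative, not from the paper). It codes E as 1 and C as 0, omits the maximal procedure MP(a), whose sequence-sampling scheme is more involved, and builds a stratified allocation as one independent list per stratum.

```python
import random


def cr(n, rng):
    """Complete randomization: each patient is assigned to E (1) with probability 1/2."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]


def ebc(n, p, rng):
    """Efron's biased coin EBC(p): probability p for the arm allocated less often, fair coin on ties."""
    seq, d = [], 0  # d = (#E - #C) so far
    for _ in range(n):
        prob_e = 0.5 if d == 0 else (p if d < 0 else 1 - p)
        z = 1 if rng.random() < prob_e else 0
        seq.append(z)
        d += 1 if z else -1
    return seq


def bsd(n, a, rng):
    """Big stick design BSD(a): fair coin unless |d| has reached a, then force the lagging arm."""
    seq, d = [], 0
    for _ in range(n):
        if d >= a:
            z = 0
        elif d <= -a:
            z = 1
        else:
            z = 1 if rng.random() < 0.5 else 0
        seq.append(z)
        d += 1 if z else -1
    return seq


def rar(n, rng):
    """Random allocation rule: random permutation of n/2 E's and n/2 C's (n even)."""
    if n % 2:
        raise ValueError("RAR needs an even number of patients")
    seq = [1] * (n // 2) + [0] * (n // 2)
    rng.shuffle(seq)
    return seq


def pbr(n, b, rng):
    """Permuted block randomization PBR(b): consecutive RAR blocks of even size b."""
    if n % b:
        raise ValueError("block size must divide the stratum size")
    return [z for _ in range(n // b) for z in rar(b, rng)]


def stratified(procedure, sizes, rng, **params):
    """Stratified version: an independent randomization list for each stratum."""
    return [procedure(n_j, rng=rng, **params) for n_j in sizes]


rng = random.Random(2019)
print(stratified(pbr, [8, 12], rng, b=4))
print(stratified(ebc, [10, 10], rng, p=2 / 3))
```

Passing the procedure as a function keeps the stratified wrapper identical for every RP, which mirrors the definition that a stratified RP is simply the product of independent within-stratum RPs.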

The International Council for Harmonisation stated in the E9 recommendation (ICH E9):

It is advisable to have a separate random scheme for each centre, i.e., to stratify by centre or to allocate several whole blocks to each centre.

The European Medicines Agency “Guideline on Clinical Trials in Small Populations” recommends stratified randomization to improve power. Using permuted blocks within each stratum is the most popular method of stratified randomization, and this is often called the stratified block design. Blocks can be selected with a fixed size or with variable sizes. However, blocking is not the only method to use within strata. The ICH E912 guidelines also state that “different trial designs will require different procedures for generating randomization schedules.” We now define stratified randomization more formally.

Consider the allocation $z_{ji} \in \{0,1\}$ of patients $i = 1, \dots, n_j$ in stratum $j$ either to treatment E if $z_{ji} = 1$ or to C if $z_{ji} = 0$. An RP is implemented by assigning probabilities $P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j})$ to the possible allocations $Z_j = (Z_{j1}, \dots, Z_{jn_j})$ in stratum $j$. A stratified randomization is implemented by creating independent randomization lists $z_j \in \{0,1\}^{n_j}$ for each stratum $1 \le j \le K$. Denote the possible allocations by $z = (z_1, \dots, z_K) \in \times_{j=1}^{K} \{0,1\}^{n_j} = \{0,1\}^N$ with $N = \sum_{j=1}^{K} n_j$. Then a stratified version of an RP is implemented by assigning probabilities $P(Z = z \mid z \in \{0,1\}^N) = \prod_{j=1}^{K} P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j})$ to the possible allocations $z \in \{0,1\}^N$. Of course, when implementing complete randomization, the stratified and unstratified RPs result in the same set of randomization sequences with the same probabilities, because assignments are independent and equiprobable, i.e.

$$P(Z = z \mid z \in \{0,1\}^N) = \prod_{j=1}^{K} P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j}) = \prod_{j=1}^{K} \frac{1}{2^{n_j}} = \frac{1}{2^N}$$

However, when implementing a stratified restricted RP, this observation generally does not hold and some further definitions are necessary. Even for the very simple RAR, the set of possible randomization sequences is reduced considerably and the probability of a stratified allocation sequence becomes

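$$P(Z = z \mid z \in \{0,1\}^N) = \prod_{j=1}^{K} P(Z_j = z_j \mid z_j \in \{0,1\}^{n_j}) = \prod_{j=1}^{K} \binom{n_j}{n_j/2}^{-1} \neq \binom{N}{N/2}^{-1}$$

where $\binom{N}{N/2}^{-1}$ is the probability of each sequence under unstratified RAR.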
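The two probability statements above can be checked with exact rational arithmetic. A small sketch, assuming even stratum sizes for RAR; the function names are illustrative only:

```python
from fractions import Fraction
from math import comb, prod


def stratified_cr_prob(sizes):
    """P(Z = z) under stratified complete randomization: product of 2^(-n_j) = 2^(-N)."""
    return prod(Fraction(1, 2 ** n_j) for n_j in sizes)


def stratified_rar_prob(sizes):
    """P(Z = z) under stratified RAR: product of binom(n_j, n_j/2)^(-1)."""
    if any(n_j % 2 for n_j in sizes):
        raise ValueError("RAR needs even stratum sizes")
    return prod(Fraction(1, comb(n_j, n_j // 2)) for n_j in sizes)


def unstratified_rar_prob(sizes):
    """P(Z = z) under a single RAR list over all N patients."""
    n = sum(sizes)
    return Fraction(1, comb(n, n // 2))


sizes = [2, 2]
print(stratified_cr_prob(sizes))     # 1/16, same as unstratified CR with N = 4
print(stratified_rar_prob(sizes))    # 1/4: only 2 x 2 = 4 admissible sequences
print(unstratified_rar_prob(sizes))  # 1/6: binom(4, 2) = 6 sequences
```

The reciprocals are the numbers of admissible sequences, which makes the reduction of the sequence set under stratified RAR explicit.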

Another important aspect concerns the “balancing behavior” of restricted RPs. The term restricted refers to the fact that conditions on the randomization process are introduced to control the potential imbalance in the frequency of treatment allocations. Let $s$, $1 \le s \le N$, denote the patient's number in order of appearance in the trial, so that $s = 1$ denotes the first and $s = N$ the last enrolled patient. $n_{jE}(s)$ and $n_{jC}(s)$ denote the numbers of patients allocated to treatments E and C in stratum $j$ until a total of $s$ patients have been recruited in the trial. Then, the imbalance in the number of allocations to treatments E and C in stratum $j$ until a total of $s$ patients are recruited is measured by

$$d_j(s) = n_{jE}(s) - n_{jC}(s) \qquad (1)$$

Three definitions of imbalance are used in the following:

  1. An RP shows overall final balance if $d(N) \overset{\text{def}}{=} \sum_{j=1}^{K} d_j(N) = 0$.

  2. An RP controls the final balance within strata if $d_j(N) = 0$ for $1 \le j \le K$.

  3. An RP controls the maximal tolerated imbalance if $-a \le d_j(s) \le a$ for all $1 \le j \le K$, $1 \le s \le N$.
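The three definitions can be checked on a recorded enrollment stream. A minimal sketch with hypothetical helper names, illustrated on the extreme case discussed in the next paragraph, in which one stratum assigns only E and the other only C:

```python
def imbalance_paths(stream, strata):
    """d_j(s) from equation (1) for s = 1..N.

    stream: list of (stratum j, allocation z) pairs in overall enrollment order,
            with z = 1 for E and z = 0 for C.
    """
    d = {j: 0 for j in strata}
    paths = {j: [] for j in strata}
    for j, z in stream:
        d[j] += 1 if z == 1 else -1
        for k in strata:
            paths[k].append(d[k])
    return paths


def overall_final_balance(paths):
    """Definition 1: d(N) = sum_j d_j(N) = 0."""
    return sum(p[-1] for p in paths.values()) == 0


def final_balance_within_strata(paths):
    """Definition 2: d_j(N) = 0 for every stratum j."""
    return all(p[-1] == 0 for p in paths.values())


def controls_max_imbalance(paths, a):
    """Definition 3: -a <= d_j(s) <= a for all j and s."""
    return all(-a <= x <= a for p in paths.values() for x in p)


# Stratum 1 assigns only E, stratum 2 only C.
paths = imbalance_paths([(1, 1), (2, 0), (1, 1), (2, 0)], strata=[1, 2])
print(paths)  # {1: [1, 1, 2, 2], 2: [0, -1, -1, -2]}
print(overall_final_balance(paths), final_balance_within_strata(paths))  # True False
```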

Of course, controlling the overall final balance does not imply controlling the final balance within strata, i.e. $d_j(N) = 0$. Simply controlling the overall final balance may result in one stratum assigning patients only to E and another stratum assigning the same number of patients only to C, a case which invalidates the estimation of the treatment difference within strata, presumably one issue in the ICH guidance. On the other hand, final balance within strata ($d_j(N) = 0$) implies overall final balance $d(N) = 0$. In the following, we deal with stratified RPs and derive additional restrictions for meaningful definitions. With RAR, stratification and consequently final balance within strata require even sample sizes within each stratum for two treatment arms. In the case of the stratified block design, the requirement of final balance within strata implies that the block sizes are divisors of the stratum sample sizes $n_j$, $1 \le j \le K$. Note that stratified trials with a larger number of centers usually have smaller center sample sizes, and thus final balance within strata forces the block sizes to be small, which increases the potential for selection bias. Of course, center-specific block sizes are possible but rather uncommon. We will consider common block sizes in the following.

It should also be noted that the stratified RAR procedure in general cannot be considered as an unstratified PBR with block sizes $n_j$, $1 \le j \le K$, because patients are enrolled in the strata in parallel, so that in general $d\left(\sum_{j=1}^{k} n_j\right) \neq 0$ for $1 \le k < K$, whereas an unstratified PBR is balanced at the end of each block.

Similar problems arise when controlling the maximal tolerated imbalance with margin $a$, which results in an upper overall bound of $\sum_{j=1}^{K} |d_j(s)| \le Ka$ for all $1 \le s \le N$. Thus, controlling the maximal tolerated imbalance across strata could be accomplished by having a different imbalance level in each stratum. Two straightforward settings are the uniform spread $|d_j(s)| \le a/K$, $1 \le j \le K$, for all $1 \le s \le N$, resulting in $\sum_{j=1}^{K} |d_j(s)| \le \sum_{j=1}^{K} a/K = a$ for all $1 \le s \le N$, and the proportional spread $|d_j(s)| \le a n_j / N$, $1 \le j \le K$, for all $1 \le s \le N$, resulting in $\sum_{j=1}^{K} |d_j(s)| \le \sum_{j=1}^{K} a n_j / N = a$ for all $1 \le s \le N$. Hilgers6 suggests defining $a$ in relation to the loss of power, and this implies a stratum-specific maximal tolerated imbalance according to the rules above.
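The uniform and proportional spreads can be computed exactly; the resulting stratum margins need not be integers. A minimal sketch with hypothetical helper names, assuming that integer margins (as needed, for example, by a big stick design) are obtained by rounding down, which keeps the overall bound $\sum_j |d_j(s)| \le a$:

```python
from fractions import Fraction
from math import floor


def uniform_spread(a, k):
    """Stratum margins a/K, so that sum_j |d_j(s)| <= a."""
    return [Fraction(a, k)] * k


def proportional_spread(a, sizes):
    """Stratum margins a * n_j / N, so that sum_j |d_j(s)| <= a."""
    n = sum(sizes)
    return [Fraction(a * n_j, n) for n_j in sizes]


def integer_margins(margins):
    """Round each margin down; the sum of the rounded margins still does not exceed a."""
    return [floor(m) for m in margins]


print(uniform_spread(12, 2))                               # [Fraction(6, 1), Fraction(6, 1)]
print(integer_margins(proportional_spread(12, [20, 60])))  # [3, 9]
```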

3 Stratified analysis

As mentioned in Section 1, a stratified randomization requires a stratified analysis, although a stratified analysis can be performed whether or not the randomization was stratified. In this section, we examine the distributional properties of a test statistic introduced by Fleiss1 (page 268, formulas 1 and 2), based on a weighted t statistic, for the analysis of stratified clinical trials. While we do not consider randomization tests in this paper, randomization-based inference is clearly an attractive alternative; see Rosenberger.2 The reason for using a parametric t test is that it facilitates our goal of determining the effect of bias on inference, since we can derive the distribution of the test statistic under various forms of bias. In particular, in this section, we derive the non-centrality parameters of the distribution of the test statistic under alternative hypotheses and comment on how they can be used for sample size considerations. In the sequel, we are interested in the role of the RP in the analysis of stratified trials. Because Fleiss wrote specifically about centers rather than strata, we use both terms interchangeably; it should be clear, however, that stratification can be done on variables other than center.

We will consider a two-arm parallel group clinical trial stratified by $K$ centers with no interim analysis. The response to treatments E and C, respectively, is measured with the continuous normally distributed endpoint $y_{ji}$, $1 \le i \le n_j = n_{jE} + n_{jC}$, $1 \le j \le K$, on $n_{jE}$ patients in the experimental group (E) and $n_{jC}$ patients in the control group (C) in center $j$. The total sample size is denoted by $N = \sum_{j=1}^{K} n_j$.

Using the allocation sequence notation, the statistical model assuming no treatment-by-center interaction is

$$y_{ji} = \mu_E Z_{ji} + \mu_C (1 - Z_{ji}) + \tau_{ji} + \varepsilon_{ji} \qquad (2)$$

where $\varepsilon_{ji} \sim N(0, \sigma^2)$, $1 \le i \le n_j = n_{jE} + n_{jC}$, $1 \le j \le K$. The expected treatment effects under E and C are denoted by $\mu_E$ and $\mu_C$, respectively. $Z_{ji}$ denotes the allocation sequence indicator, with $Z_{ji} = 1$ if patient $i$ in center $j$ is allocated to treatment E and $Z_{ji} = 0$ if patient $i$ in center $j$ is allocated to treatment C. Here and in what follows, the notations $n_{jE} = \sum_{i=1}^{n_j} Z_{ji}$ and $n_{jC} = \sum_{i=1}^{n_j} (1 - Z_{ji})$ are used. Furthermore, $\tau_{ji}$ denotes the fixed “bias” effect acting on the response of patient $i$ in center $j$. Without loss of generality, we assume $\tau_{ji} > 0$.
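Model (2) can be simulated directly, which is useful for checking the distributional results below by Monte Carlo. A minimal sketch, assuming allocations and bias effects are stored as one Python list per center; `simulate_model2` and the bias values are hypothetical:

```python
import random


def simulate_model2(z, tau, mu_e, mu_c, sigma, rng):
    """Draw y_ji = mu_E Z_ji + mu_C (1 - Z_ji) + tau_ji + eps_ji with eps_ji ~ N(0, sigma^2).

    z, tau: one list per center j, each of length n_j (z_ji in {0, 1}).
    """
    return [
        [mu_e * z_ji + mu_c * (1 - z_ji) + tau_ji + rng.gauss(0.0, sigma)
         for z_ji, tau_ji in zip(z_j, tau_j)]
        for z_j, tau_j in zip(z, tau)
    ]


rng = random.Random(7)
z = [[1, 0, 1, 0], [0, 1, 1, 0]]
tau = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]]  # hypothetical bias values
y = simulate_model2(z, tau, mu_e=0.0, mu_c=0.0, sigma=1.0, rng=rng)
n_e = [sum(z_j) for z_j in z]              # n_jE
n_c = [len(z_j) - sum(z_j) for z_j in z]   # n_jC
```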

Fleiss's statistic to test the hypothesis $H_0: \mu_E = \mu_C$ of no treatment effect across centers becomes

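$$t = \frac{\sum_{j=1}^{K} w_j D_j}{s_p \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}} = \frac{\left( \sum_{j=1}^{K} w_j D_j \right) \left( \sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*} \right)^{-1}}{s_p / \sigma} \qquad (3)$$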

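where $D_j = \bar{y}_{jE} - \bar{y}_{jC}$ are the mean treatment differences with $\bar{y}_{jE} = \frac{1}{n_{jE}} \sum_{i=1}^{n_j} y_{ji} Z_{ji}$ and $\bar{y}_{jC} = \frac{1}{n_{jC}} \sum_{i=1}^{n_j} y_{ji} (1 - Z_{ji})$. Furthermore, $w_j$ are weights associated with center $j$, $w_j^* = \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}$, and $s_p^2$ is the pooled variance given by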

$$s_p^2 = \frac{\sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1) s_{j\ell}^2}{\sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1)}$$

Here $s_{j\ell}^2$ denotes the variance of treatment group $\ell$ in center $j$. To derive the distribution of equation (3) under model (2), the distributions of the numerator as well as the denominator must be calculated. Note that the variance of the weighted sum of differences is given by

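$$\operatorname{Var}\left( \sum_{j=1}^{K} w_j D_j \right) = \sum_{j=1}^{K} w_j^2 \left( \frac{1}{n_{jE}^2} \sum_{i=1}^{n_j} \operatorname{Var}(y_{ji}) Z_{ji} + \frac{1}{n_{jC}^2} \sum_{i=1}^{n_j} \operatorname{Var}(y_{ji}) (1 - Z_{ji}) \right) = \sum_{j=1}^{K} w_j^2 \left( \frac{\sigma^2}{n_{jE}} + \frac{\sigma^2}{n_{jC}} \right) = \sigma^2 \sum_{j=1}^{K} w_j^2 \frac{n_{jE} + n_{jC}}{n_{jE} \times n_{jC}} = \sigma^2 \sum_{j=1}^{K} w_j^2 / w_j^*$$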

so that the numerator of equation (3) has variance 1. Of course, $D_j$ is normally distributed via the distribution of $y_{ji}$, and thus the expectation of the numerator equals

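$$\delta(Z) = E\left( \frac{\sum_{j=1}^{K} w_j D_j}{\sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*}} \right) = \left( \sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*} \right)^{-1} E\left( \sum_{j=1}^{K} w_j \left( \frac{1}{n_{jE}} \sum_{i=1}^{n_j} y_{ji} Z_{ji} - \frac{1}{n_{jC}} \sum_{i=1}^{n_j} y_{ji} (1 - Z_{ji}) \right) \right) = \left( \sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*} \right)^{-1} \left( (\mu_E - \mu_C) \sum_{j=1}^{K} w_j + \sum_{j=1}^{K} w_j (\bar{\tau}_{jE} - \bar{\tau}_{jC}) \right)$$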

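where $Z = (Z_{11}, \dots, Z_{1n_1}, \dots, Z_{K1}, \dots, Z_{Kn_K})^t$ is the observed allocation vector, $\bar{\tau}_{jE} = \frac{1}{n_{jE}} \sum_{i=1}^{n_j} \tau_{ji} Z_{ji}$ and $\bar{\tau}_{jC} = \frac{1}{n_{jC}} \sum_{i=1}^{n_j} \tau_{ji} (1 - Z_{ji})$. In summary, the numerator is normally distributed with expectation $\delta(Z)$ and variance 1.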
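The expectation $\delta(Z)$ depends on the allocation sequence only through the center sizes $n_{jE}$, $n_{jC}$ and the arm-wise bias means $\bar{\tau}_{jE}$, $\bar{\tau}_{jC}$. A short sketch of this computation (the helper `delta_nc` is hypothetical; each center must contain at least one patient in each arm):

```python
import math


def delta_nc(z, tau, weights, mu_e, mu_c, sigma):
    """First non-centrality parameter delta(Z) of the weighted t statistic (3).

    z, tau: one list per center; weights: one w_j per center.
    Each center needs at least one patient in each arm.
    """
    numerator, var_factor = 0.0, 0.0
    for z_j, tau_j, w_j in zip(z, tau, weights):
        n_e = sum(z_j)
        n_c = len(z_j) - n_e
        tau_e = sum(t for zi, t in zip(z_j, tau_j) if zi == 1) / n_e  # mean bias in E
        tau_c = sum(t for zi, t in zip(z_j, tau_j) if zi == 0) / n_c  # mean bias in C
        w_star = n_e * n_c / (n_e + n_c)
        numerator += w_j * ((mu_e - mu_c) + (tau_e - tau_c))
        var_factor += w_j ** 2 / w_star
    return numerator / (sigma * math.sqrt(var_factor))


# Increasing hypothetical bias, E allocated early in both centers: delta(Z) < 0 under H0.
print(delta_nc([[1, 1, 0, 0], [1, 0, 1, 0]],
               [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]],
               [1.0, 1.0], mu_e=0.0, mu_c=0.0, sigma=1.0))
```

With no bias and $w_j = w_j^*$, the result reduces to $\Delta \sqrt{\sum_j w_j^*} / \sigma$ with $\Delta = \mu_E - \mu_C$.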

Next, we calculate the distribution of the denominator $s_p / \sigma$ using the allocation sequence notation

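$$\left( \sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1) \right) \frac{s_p^2}{\sigma^2} = \sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1) \frac{s_{j\ell}^2}{\sigma^2} = \sum_{j=1}^{K} \left( (n_{jE} - 1) \frac{s_{jE}^2}{\sigma^2} + (n_{jC} - 1) \frac{s_{jC}^2}{\sigma^2} \right) = \sum_{j=1}^{K} \left( \sum_{i=1}^{n_j} Z_{ji} \left( \frac{y_{ji}}{\sigma} - \frac{\bar{y}_{jE}}{\sigma} \right)^2 + \sum_{i=1}^{n_j} (1 - Z_{ji}) \left( \frac{y_{ji}}{\sigma} - \frac{\bar{y}_{jC}}{\sigma} \right)^2 \right)$$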

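Note that $\operatorname{Var}(y_{ji}/\sigma) = 1$, with $E(y_{ji}/\sigma) = (\mu_E + \tau_{ji})/\sigma$ for $Z_{ji} = 1$ and $E(y_{ji}/\sigma) = (\mu_C + \tau_{ji})/\sigma$ for $Z_{ji} = 0$, and that the $y_{ji}/\sigma$ are independent and normally distributed for all $1 \le i \le n_j$ and $1 \le j \le K$. Following the arguments in Johnson and Kotz,13 $\sum_{i=1}^{n_j} Z_{ji} (y_{ji} - \bar{y}_{jE})^2 / \sigma^2$ for group E, i.e. $Z_{ji} = 1$, and $\sum_{i=1}^{n_j} (1 - Z_{ji}) (y_{ji} - \bar{y}_{jC})^2 / \sigma^2$ for group C, i.e. $Z_{ji} = 0$, are $\chi^2$ distributed with $n_{jE} - 1$ and $n_{jC} - 1$ degrees of freedom, respectively, and non-centrality parameters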

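$$\sum_{i=1}^{n_j} \frac{Z_{ji}}{\sigma^2} \left( \mu_E + \tau_{ji} - \frac{1}{n_{jE}} \sum_{i'=1}^{n_j} (\mu_E + \tau_{ji'}) Z_{ji'} \right)^2 = \sum_{i=1}^{n_j} \frac{Z_{ji}}{\sigma^2} (\tau_{ji} - \bar{\tau}_{jE})^2,$$
$$\sum_{i=1}^{n_j} \frac{1 - Z_{ji}}{\sigma^2} \left( \mu_C + \tau_{ji} - \frac{1}{n_{jC}} \sum_{i'=1}^{n_j} (\mu_C + \tau_{ji'}) (1 - Z_{ji'}) \right)^2 = \sum_{i=1}^{n_j} \frac{1 - Z_{ji}}{\sigma^2} (\tau_{ji} - \bar{\tau}_{jC})^2$$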

Applying the fact that the sum of independent non-central $\chi^2_{\nu_j}(\lambda_j)$ random variables is non-central $\chi^2$ with $\sum_j \nu_j$ degrees of freedom and non-centrality parameter $\sum_j \lambda_j$, it follows that $\left( \sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1) \right) s_p^2 / \sigma^2$ is non-central $\chi^2$ distributed with non-centrality parameter

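$$\lambda(Z) = \frac{1}{\sigma^2} \sum_{j=1}^{K} \left( \sum_{i=1}^{n_j} Z_{ji} (\tau_{ji} - \bar{\tau}_{jE})^2 + \sum_{i=1}^{n_j} (1 - Z_{ji}) (\tau_{ji} - \bar{\tau}_{jC})^2 \right) = \frac{1}{\sigma^2} \left( \sum_{j=1}^{K} \sum_{i=1}^{n_j} \tau_{ji}^2 - \sum_{j=1}^{K} n_{jE} \bar{\tau}_{jE}^2 - \sum_{j=1}^{K} n_{jC} \bar{\tau}_{jC}^2 \right)$$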

and

$$df = \sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1) = N - 2K \qquad (4)$$

degrees of freedom. Finally, we have to show the independence of the numerator

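$$\sum_{j=1}^{K} w_j D_j = \sum_{j=1}^{K} w_j (\bar{y}_{jE} - \bar{y}_{jC}) = \sum_{j=1}^{K} w_j \left( \frac{1}{n_{jE}} \sum_{i=1}^{n_j} y_{ji} Z_{ji} - \frac{1}{n_{jC}} \sum_{i=1}^{n_j} y_{ji} (1 - Z_{ji}) \right)$$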

and denominator

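$$\left( \sum_{j=1}^{K} \sum_{\ell = E, C} (n_{j\ell} - 1) \right) s_p^2 = \sum_{j=1}^{K} \left( \sum_{i=1}^{n_j} Z_{ji} (y_{ji} - \bar{y}_{jE})^2 + \sum_{i=1}^{n_j} (1 - Z_{ji}) (y_{ji} - \bar{y}_{jC})^2 \right)$$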

as random variables. Here, Theorem 3 of Searle14 is used, stating that two random variables that can be expressed as $x^t A x$ and $Bx$, where $x \sim N(\mu, V)$, are independent if $BVA = 0$. First, note that $V = \sigma^2 I$ holds in our case.
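Theorem 3's condition can be checked numerically for a single center; this is a sanity check, not a proof. The sketch assumes the observations of the center are ordered with the experimental group first. B is the row vector that maps the responses to $w_j(\bar{y}_{jE} - \bar{y}_{jC})$, and A is the block-diagonal centering matrix that yields the within-group sum of squares, as in the center-wise construction of this section. `searle_check` is a hypothetical helper name.

```python
def searle_check(n_e, n_c, w=1.0, sigma2=1.0):
    """Largest |entry| of B V A for one center, with V = sigma^2 I.

    B: row vector with w/n_E on the n_E experimental and -w/n_C on the n_C control
       observations, so B y = w (mean_E - mean_C).
    A: block-diagonal centering matrix diag(H_{n_E}, H_{n_C}), H_n = I_n - J_n / n,
       so y' A y is the within-group sum of squares.
    """
    n = n_e + n_c
    b = [w / n_e] * n_e + [-w / n_c] * n_c
    a = [[0.0] * n for _ in range(n)]
    for start, size in ((0, n_e), (n_e, n_c)):
        for r in range(size):
            for c in range(size):
                a[start + r][start + c] = (1.0 if r == c else 0.0) - 1.0 / size
    # With V = sigma^2 I, B V A reduces to sigma^2 * (B A).
    bva = [sigma2 * sum(b[k] * a[k][col] for k in range(n)) for col in range(n)]
    return max(abs(v) for v in bva)


print(searle_check(3, 5, w=15 / 8))  # zero up to rounding error
```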

To enable the matrix notation of the above expressions, a usual design matrix X with N rows and two columns for the allocation indicator variables can be defined. The observations can be rearranged by center and treatment group, so that in center 1 the first $n_{1E}$ observations belong to treatment E and the following $n_{1C}$ observations belong to C, and so on, using a suitable permutation matrix P. This reshuffling of the allocation sequence $Z = (Z_{11}, \dots, Z_{Kn_K})^t$ simplifies the matrix notation of the above numerator and denominator in terms of B and A, and the permutation matrix does not affect the matrix equation. Furthermore, it is sufficient to show the matrix equation for a particular center $j$ because of the block structure implied by the independent observations in different centers. With this reshuffling, the notation for center $j$ corresponding to Theorem 3 is

$$B_j = w_j \left( \frac{1}{n_{jE}} \mathbf{1}_{n_{jE}}^t, \; -\frac{1}{n_{jC}} \mathbf{1}_{n_{jC}}^t \right)$$

and, with $H_n = I_n - \frac{1}{n} \mathbf{1}_{n \times n}$, the matrix

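$$A_j = \operatorname{diag}\left( H_{n_{jE}}, H_{n_{jC}} \right) = \operatorname{diag}\left( I_{n_{jE}} - \frac{1}{n_{jE}} \mathbf{1}_{n_{jE} \times n_{jE}}, \; I_{n_{jC}} - \frac{1}{n_{jC}} \mathbf{1}_{n_{jC} \times n_{jC}} \right)$$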

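so that $\sigma^2 B_j I_{n_{jE} + n_{jC}} A_j = 0$ for all $1 \le j \le K$, since $\mathbf{1}_n^t H_n = 0$ for each treatment block, which shows the independence. In summary, the distribution of the statistic in equation (3) is doubly non-central t,13 with $N - 2K$ degrees of freedom and non-centrality parameters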

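$$\delta(Z) = \left( \sigma \sqrt{\sum_{j=1}^{K} w_j^2 / w_j^*} \right)^{-1} \left( (\mu_E - \mu_C) \sum_{j=1}^{K} w_j + \sum_{j=1}^{K} w_j (\bar{\tau}_{jE} - \bar{\tau}_{jC}) \right),$$
$$\lambda(Z) = \frac{1}{\sigma^2} \left( \sum_{j=1}^{K} \sum_{i=1}^{n_j} \tau_{ji}^2 - \sum_{j=1}^{K} n_{jE} \bar{\tau}_{jE}^2 - \sum_{j=1}^{K} n_{jC} \bar{\tau}_{jC}^2 \right) \qquad (5)$$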
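Both expressions for $\lambda(Z)$ can be compared numerically; they agree because $\sum_i Z_{ji} (\tau_{ji} - \bar{\tau}_{jE})^2 = \sum_i Z_{ji} \tau_{ji}^2 - n_{jE} \bar{\tau}_{jE}^2$, and likewise for C. A sketch with hypothetical bias values (helper names are illustrative):

```python
def lambda_nc(z, tau, sigma):
    """Second non-centrality parameter lambda(Z), in both forms used for equation (5).

    Returns (within-group sum of squared bias deviations, expanded form), each / sigma^2.
    """
    direct, expanded = 0.0, 0.0
    for z_j, tau_j in zip(z, tau):
        tau_e = [t for zi, t in zip(z_j, tau_j) if zi == 1]
        tau_c = [t for zi, t in zip(z_j, tau_j) if zi == 0]
        mean_e = sum(tau_e) / len(tau_e)
        mean_c = sum(tau_c) / len(tau_c)
        direct += sum((t - mean_e) ** 2 for t in tau_e) + sum((t - mean_c) ** 2 for t in tau_c)
        expanded += (sum(t * t for t in tau_j)
                     - len(tau_e) * mean_e ** 2 - len(tau_c) * mean_c ** 2)
    return direct / sigma ** 2, expanded / sigma ** 2


def degrees_of_freedom(z):
    """N - 2K from equation (4)."""
    return sum(len(z_j) for z_j in z) - 2 * len(z)


z = [[1, 0, 1, 0], [1, 1, 0, 0]]
tau = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # hypothetical bias values
print(lambda_nc(z, tau, sigma=1.0), degrees_of_freedom(z))
```

A bias that is constant within each treatment group of a center gives $\lambda(Z) = 0$, so only within-group variation of $\tau$ inflates the denominator.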

In the case that sampling is “stratified” by center and the objective is to estimate the overall treatment effect accounting for center, Fleiss1 proposed the weights $w_j = w_j^* = \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}$, resulting in the test statistic

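$$t = \frac{\sum_{j=1}^{K} w_j^* D_j}{s_p \sqrt{\sum_{j=1}^{K} w_j^*}} = \frac{\sum_{j=1}^{K} \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}} D_j}{s_p \sqrt{\sum_{j=1}^{K} \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}}} \qquad (6)$$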
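The statistic in equations (3) and (6) can be computed from observed data in a few lines. A minimal sketch (the function `fleiss_t` is hypothetical; each center needs at least one patient per arm):

```python
import math


def fleiss_t(y, z, weights=None):
    """Weighted stratified t statistic of equation (3).

    y, z: one list per center (responses and 0/1 allocations).
    weights=None uses Fleiss' weights w_j = w_j^* (equation (6)).
    """
    numerator, weight_factor, ss, df = 0.0, 0.0, 0.0, 0
    for j, (y_j, z_j) in enumerate(zip(y, z)):
        y_e = [v for v, zi in zip(y_j, z_j) if zi == 1]
        y_c = [v for v, zi in zip(y_j, z_j) if zi == 0]
        mean_e, mean_c = sum(y_e) / len(y_e), sum(y_c) / len(y_c)
        w_star = len(y_e) * len(y_c) / (len(y_e) + len(y_c))
        w_j = w_star if weights is None else weights[j]
        numerator += w_j * (mean_e - mean_c)        # sum_j w_j D_j
        weight_factor += w_j ** 2 / w_star          # sum_j w_j^2 / w_j^*
        ss += sum((v - mean_e) ** 2 for v in y_e) + sum((v - mean_c) ** 2 for v in y_c)
        df += len(y_e) + len(y_c) - 2
    s_p = math.sqrt(ss / df)                        # pooled SD with N - 2K df
    return numerator / (s_p * math.sqrt(weight_factor))
```

For a single center, both weightings reduce to the usual pooled two-sample t statistic, and multiplying all $w_j$ by a common constant leaves t unchanged.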

In this case, since $w_j^2 / w_j^* = w_j^*$, the first non-centrality parameter $\delta(Z)$ in equation (5) becomes

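$$\delta(Z) = \left( \sigma \sqrt{\sum_{j=1}^{K} w_j^*} \right)^{-1} \left( (\mu_E - \mu_C) \sum_{j=1}^{K} w_j^* + \sum_{j=1}^{K} w_j^* (\bar{\tau}_{jE} - \bar{\tau}_{jC}) \right) \qquad (7)$$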

In the case that sampling is “stratified” by center and the objective is to estimate the overall treatment effect, Fleiss1 proposed the weights $w_j = 1$, so that equation (3) becomes

$$t = \frac{\sum_{j=1}^{K} D_j}{s_p \sqrt{\sum_{j=1}^{K} 1 / w_j^*}}$$

whereas the first non-centrality parameter $\delta(Z)$ equals

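$$\delta(Z) = \left( \sigma \sqrt{\sum_{j=1}^{K} 1 / w_j^*} \right)^{-1} \left( K (\mu_E - \mu_C) + \sum_{j=1}^{K} (\bar{\tau}_{jE} - \bar{\tau}_{jC}) \right)$$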

Weighting centers in the absence and presence of center-by-treatment interaction has been discussed in detail by other authors.15

3.1 Sample size considerations

We now briefly discuss aspects of the sample size and power calculation using the weighted t test statistic. Details can be found in the Supplementary Material, Section S1. The results will be used in our numerical evaluation study.

Assuming no bias ($\tau_{ji} = 0$) in model (2), the sample size to prove the hypothesis $H_0: \mu_E = \mu_C$ vs. $H_1: \mu_E - \mu_C = \Delta$ with the weighted t test (equation (3)) is given by

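$$\frac{\left( \sum_{j=1}^{K} w_j \right)^2}{\sum_{j=1}^{K} w_j^2 / w_j^*} = \frac{\sigma^2}{\Delta^2} \left( t_{N-2K}(1 - \beta) + t_{N-2K}(1 - \alpha/2) \right)^2 \qquad (8)$$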
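The left-hand side of equation (8) acts as an effective sample size that depends on the weights. By the Cauchy–Schwarz inequality, $\left(\sum_j w_j\right)^2 \le \sum_j (w_j^2 / w_j^*) \sum_j w_j^*$, so it never exceeds $\sum_j w_j^*$, and this maximum is attained for $w_j = w_j^*$. A small sketch with a hypothetical two-center trial of 20 and 60 patients, each allocated 1:1 (helper names are illustrative):

```python
def w_star(n_e, n_c):
    """w_j^* = n_jE n_jC / (n_jE + n_jC)."""
    return n_e * n_c / (n_e + n_c)


def effective_size(weights, w_stars):
    """Left-hand side of equation (8): (sum_j w_j)^2 / sum_j (w_j^2 / w_j^*)."""
    return sum(weights) ** 2 / sum(w * w / ws for w, ws in zip(weights, w_stars))


ws = [w_star(10, 10), w_star(30, 30)]    # [5.0, 15.0]
print(effective_size(ws, ws))            # 20.0, the sum of the w_j^*
print(effective_size([1.0, 1.0], ws))    # about 15 for w_j = 1
```

The unequal center sizes cost the unweighted test about a quarter of the effective sample size in this example, whereas equal center sizes would make both weightings coincide.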

The derivation assumes homogeneous variances in all groups and centers. Using the optimal weights of Fleiss,1 i.e. $w_j = w_j^* = \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}}$, equation (8) simplifies to

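$$\sum_{j=1}^{K} \frac{n_{jE} \times n_{jC}}{n_{jE} + n_{jC}} = \frac{\sigma^2}{\Delta^2} \left( t_{N-2K}(1 - \beta) + t_{N-2K}(1 - \alpha/2) \right)^2$$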

which, in the case of a common allocation ratio with $r \cdot n_j = n_{jE}$ and $(1 - r) \cdot n_j = n_{jC}$, $0 < r < 1$, for all $1 \le j \le K$, becomes

$$r (1 - r) N = \frac{\sigma^2}{\Delta^2} \left( t_{N-2K}(1 - \beta) + t_{N-2K}(1 - \alpha/2) \right)^2 \qquad (9)$$

This formula, derived under the assumption of homogeneous variances using the optimal weights and the allocation ratio $r$, can be evaluated from various perspectives. One can determine the sample size necessary to detect a certain treatment effect in a clinical trial, or determine the power for various settings of the allocation ratio. Of course, the relationship of the sample size to the RP is obvious in the case of RPs forcing terminal balance. The power can also be related to RPs with the maximal tolerated imbalance margin $a$. The margin can be justified on the basis of the tolerable loss in power resulting from unbalanced allocation. In this case, equation (9) can be used to describe the relationship between $r$ and the power. Both aspects are addressed in the numerical evaluation study below. Using the weights $w_j = 1$, the left-hand side of equation (8) yields $r (1 - r) K^2 / \sum_{j=1}^{K} n_j^{-1}$ and thus depends on the center sample sizes. In the case of equal center sample sizes, $n_j = N/K$, this equals $r (1 - r) N$, so the same formula can be used for the unweighted test. In contrast to the weighted test, the sample size formula for the unweighted case requires additional assumptions if unequal sample sizes across centers are assumed.
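Equation (9) can be evaluated in both directions with a short sketch. Python's standard library has no t quantile function, so this hypothetical implementation computes the central t distribution through the regularized incomplete beta function (the standard continued-fraction method) and inverts it by bisection; in practice, a statistics library would be used instead. With N = 80, K = 2, α = 0.05, 80% power and σ = 1, it should reproduce the effect size Δ ≈ 0.635 used in the numerical evaluation study, and it evaluates the power loss at r = 0.608.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """CDF of the central t distribution with df degrees of freedom."""
    tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t > 0 else tail


def t_ppf(p, df, lo=-60.0, hi=60.0):
    """Quantile t_df(p) by bisection on t_cdf."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def effect_size(n, k, r=0.5, alpha=0.05, power=0.8, sigma=1.0):
    """Solve equation (9) for Delta."""
    df = n - 2 * k
    q = t_ppf(power, df) + t_ppf(1 - alpha / 2, df)
    return sigma * q / math.sqrt(r * (1 - r) * n)


def power_at(r, delta, n, k, alpha=0.05, sigma=1.0):
    """Solve equation (9) for the power 1 - beta at allocation ratio r."""
    df = n - 2 * k
    x = delta * math.sqrt(r * (1 - r) * n) / sigma - t_ppf(1 - alpha / 2, df)
    return t_cdf(x, df)


delta = effect_size(n=80, k=2)
print(round(delta, 3))                                    # about 0.635
print(round(0.8 - power_at(0.608, delta, n=80, k=2), 3))  # power loss at r = 0.608
```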

4 Stratification in the presence of bias

We now turn to the question of bias. Two common forms of bias encountered in clinical trials are chronological bias due to time trends in patient outcomes,16 and selection bias, which can result in covariate imbalances and inflation of type I error rates.3 By definition, selection bias arises from the conscious or unconscious guessing of treatment assignments, so that patients have a higher chance of assignment to the investigator's treatment of choice for those patients. While double-blinded studies and multi-center studies with a central randomization unit mitigate the possibility of selection bias, Berger17 gives numerous examples of when selection bias has arisen in practice. As the ICH E9 guidelines note,12

It is important to identify potential sources of bias as completely as possible so that attempts to limit such bias may be made…. The treatment effect and treatment comparisons should involve consideration of the potential contribution of bias to the p-value.

A recent paper provides a template for assessing the potential for chronological or selection bias and gives guidance on how to choose an appropriate RP and test statistic to account for that possibility.6 Here, we use a similar model to determine the impact on Fleiss's test in the presence of such bias.

We first specify a compound bias vector $\tau_{ji}$ for stratum $j$ and patient $i$ that is a linear combination of a metric of chronological bias and a metric of selection bias. Taking into account the stratified randomization, we explore a linear time-trend16 model per stratum, similar to Hilgers,6 given by

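$$\tau_{ji} = \underbrace{\theta_j \frac{i}{n_{jE} + n_{jC}}}_{\text{linear time trend}} + \underbrace{\eta_j \frac{n_{jE}(i-1) - n_{jC}(i-1)}{n_{jE}(i-1) + n_{jC}(i-1)}}_{\text{selection bias}} \qquad (10)$$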

Here, the magnitude $\theta_j$ of the linear time trend varies between centers. Note that Hilgers6 proposed to formulate $\theta_j$ as a fraction of the variance $\sigma^2$. The second term generalizes the biasing policy first introduced by Proschan3 for the Gauss test and later investigated by Hilgers6 for the t test. The amount of selection bias $\eta_j$ is allowed to vary between centers. The biasing policy in equation (10) “favors”, or biases, the expected response towards treatment E if E is the less frequently allocated treatment so far, in which case the investigator expects E to be allocated next; the sign of $\eta_j$ determines whether the bias favors E. Other metrics have been used to define the selection bias metric, including just the sign of $n_{jE}(i-1) - n_{jC}(i-1)$. We chose our metric so that it is on roughly the same scale as the chronological bias metric.
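Equation (10) for a single stratum can be sketched as follows. For illustration, this assumes that $i$ counts patients within stratum $j$ and that the selection term, which is 0/0 for the first patient, is set to 0; the sign convention of the selection term is taken exactly as written in equation (10). `bias_vector` is a hypothetical helper name.

```python
def bias_vector(z_j, theta_j, eta_j):
    """tau_ji for one stratum, following equation (10) as written.

    z_j: allocations of stratum j in enrollment order (1 = E, 0 = C).
    The selection term uses the counts n_jE(i-1), n_jC(i-1) before patient i;
    for i = 1 both counts are 0 and the term is set to 0 (an assumption).
    """
    n_j = len(z_j)
    tau, n_e, n_c = [], 0, 0
    for i, z_ji in enumerate(z_j, start=1):
        trend = theta_j * i / n_j                       # linear time trend
        seen = n_e + n_c
        selection = eta_j * (n_e - n_c) / seen if seen else 0.0
        tau.append(trend + selection)
        if z_ji == 1:
            n_e += 1
        else:
            n_c += 1
    return tau


print(bias_vector([1, 0, 1, 1], theta_j=0.2, eta_j=0.0))  # time trend only
print(bias_vector([1, 0, 1, 1], theta_j=0.0, eta_j=0.1))  # selection bias only
```

The stratified bias vector $\tau$ is obtained by applying the function to each stratum's allocation list.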

5 Evaluation criterion

In our numerical evaluation study, we enumerate all possible randomization sequences for four different procedures and relate the bias to the type I error rate by computing the proportion of sequences that preserve the type I error rate at the nominal (0.05) level. If there is no bias (i.e. $\eta_j = \theta_j = 0$), 100% of sequences will preserve the type I error rate, regardless of the procedure used. To be more formal, denote the bias vector by $\tau = (\tau_{11}, \dots, \tau_{1n_1}, \dots, \tau_{K1}, \dots, \tau_{Kn_K})$ and the set of all sequences $z$ generated by the RP by $\Omega_{RP}$. The test statistic $t(z)$, which depends on the randomization sequence, is central t distributed with $N - 2K$ degrees of freedom under the null hypothesis and no bias ($\tau = 0$), i.e. the null hypothesis $H_0$ will be rejected at the $\alpha$ level if $|t(z)| \ge t_{N-2K}(1 - \alpha/2)$. Then, the evaluation criterion can be expressed using our distributional result above, including the non-centrality parameter (7):

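$$P_{RP,\tau}(H_1 \mid H_0) = P_{RP,\tau}\left( Z \in \{0,1\}^N : \left| t_{N-2K, \delta(Z), \lambda(Z)}\left(1 - \tfrac{\alpha}{2}\right) \right| \le \left| t_{N-2K}\left(1 - \tfrac{\alpha}{2}\right) \right| \right) = \sum_{z \in \Omega_{RP}} \mathbb{1}\left\{ F_{N-2K, \delta(z), \lambda(z)}\left( t_{N-2K}\left(\tfrac{\alpha}{2}\right) \right) + F_{N-2K, -\delta(z), \lambda(z)}\left( t_{N-2K}\left(\tfrac{\alpha}{2}\right) \right) \le \alpha \right\} P_{RP,\tau}(Z = z) \qquad (11)$$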

where $F_{N-2K, \delta, \lambda}$ denotes the distribution function of the doubly non-central t distribution with $N - 2K$ degrees of freedom and non-centrality parameters $\delta(Z)$ and $\lambda(Z)$. In the ideal case, the probability should be 1, meaning that the 5% level is maintained by all allocation sequences. A value below 1 indicates that, for the remaining sequences, the actual type I error rate is higher than the target level of 5%. Note that this quantity summarizes the impact of bias over all randomization sequences and directly demonstrates the clinical consequences as well as the “go/no-go” decision of the regulator.
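Equation (11) sums the sequence probabilities over those sequences whose actual two-sided rejection probability stays at or below α. A hedged sketch of that bookkeeping follows. Python's standard library has no doubly non-central t distribution, so the distribution function F is passed in as a parameter; the `toy_cdf` below is a deliberately crude normal stand-in that ignores λ and serves only to exercise the code, and all sequence labels, error rates and probabilities are hypothetical.

```python
from statistics import NormalDist


def sequence_type1_error(cdf, delta, lam, lower_crit):
    """Two-sided rejection probability of one sequence:
    F_{delta,lambda}(t(alpha/2)) + F_{-delta,lambda}(t(alpha/2))."""
    return cdf(lower_crit, delta, lam) + cdf(lower_crit, -delta, lam)


def evaluation_criterion(seq_probs, type1_error, alpha=0.05):
    """Equation (11): total probability of the sequences that keep the level alpha.

    seq_probs: iterable of (z, P(Z = z)) pairs over Omega_RP.
    type1_error: callable mapping z to its actual type I error rate.
    """
    return sum(p for z, p in seq_probs if type1_error(z) <= alpha)


def toy_cdf(x, delta, lam):
    """Toy stand-in for F (NOT the doubly non-central t): normal shifted by delta, ignores lam."""
    return NormalDist(mu=delta, sigma=1.0).cdf(x)


# Hypothetical per-sequence type I error rates and sequence probabilities.
type1 = {"z1": 0.031, "z2": 0.064, "z3": 0.049}
probs = [("z1", 0.5), ("z2", 0.25), ("z3", 0.25)]
print(evaluation_criterion(probs, type1.get))  # 0.75

crit = NormalDist().inv_cdf(0.025)  # stand-in for t_{N-2K}(alpha/2)
print(round(sequence_type1_error(toy_cdf, 0.0, 0.0, crit), 4))  # 0.05
```

Keeping the distribution function as a parameter separates the criterion from the numerical method used for the doubly non-central t, which can then be swapped for a proper implementation.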

6 Numerical evaluation study

The objective of the following numerical evaluation study is to illustrate the effects of stratification in both the randomization and the test statistic. It is not intended as a comprehensive simulation study, recognizing that the specification of the sample size as well as of $\theta_j$ and $\eta_j$ depends on the practical situation. To be more specific, we start with a $K = 2$ center clinical trial and use a total sample size of 80 patients with common $\theta_j$ and $\eta_j$ in all centers. The following reasoning leads to the specification of $\theta_j$ and $\eta_j$. Concerning the linear time trend $\theta_j$, it should be noted that although the $\theta_j$ are defined within each center, the maximal extent of the time trend should not exceed $\sigma$. In contrast, although the magnitude of the selection bias effect $\eta_j$ may vary between centers, it acts like a population effect within a center, and no restriction on its maximal extent may apply. To relate the total sample size of 80 in a $K = 2$ center clinical trial to the effect size, formula (9) is used. The hypothesis $H_0: \mu_E = \mu_C$ vs. $H_1: \mu_E - \mu_C = \Delta$ should be tested with the (optimal) weighted t test (equation (3)), assuming a common variance $\sigma = 1$ and an intended allocation ratio of 1:1 at the 5% significance level with a power of 80%. This results in a uniform effect size of $\Delta = 0.635$. With this effect size, the allocation ratio $r$ is varied so that the loss in power does not exceed 2%. This yields an allocation ratio of $r = 0.608$, which translates to a sample size split of 31:49, corresponding to a maximal tolerable imbalance of 18. With the uniform or proportional spread, this results in a maximal tolerated imbalance by center of 4 and 5, respectively.

For illustration purposes, we compare the stratified and unstratified versions of CR, BSD(9), PBR(4), and EBC(2/3). These four procedures represent complete randomization and the three types of restricted randomization mentioned earlier. These procedures were evaluated for two different splits of the total sample size ($n_1 = n_2 = 40$ and $n_1 = 60$, $n_2 = 20$) and the combinations of selection and time-trend bias $(\eta, \theta) \in \{(0, 0.2), (0.2, 0), (0.2, 0.2), (0, 0.05), (0.05, 0), (0.05, 0.05)\}$. The evaluation criterion was the probability of the sequences protecting the 5% level (equation (11)) for stratified and unstratified randomization, as well as for the stratified ($w_j = 1$, $w_j^*$) and unstratified (us) test statistics and RPs (see Supplementary Material). Note that unstratified randomization and test statistic correspond to the case presented in Hilgers.6 The results for (0, 0.05), (0.05, 0), (0.05, 0.05) are given in Table 1 and those for (0, 0.2), (0.2, 0), (0.2, 0.2) in Table 2. In an additional evaluation, the number of centers $K$ is increased from 2 to 8, splitting the total sample uniformly across the centers, to show whether there is a different influence on the type I error rate. We used an R software script to conduct the analysis; see Supplementary Material.
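For small strata, all sequences of an RP and their probabilities can be enumerated exactly, which yields the set $\Omega_{RP}$ and the weights $P(Z = z)$ required in equation (11). A minimal sketch, assuming each procedure is written as a rule that gives the probability of assigning E to the next patient given the stratum's history; the function names are hypothetical and MP(a) is omitted. The number of sequences grows like $2^n$, so exhaustive enumeration is only practical for small strata.

```python
import itertools
import math
from fractions import Fraction


def cr_rule(history):
    return Fraction(1, 2)


def ebc_rule(p):
    def rule(history):
        d = 2 * sum(history) - len(history)  # #E - #C so far
        return Fraction(1, 2) if d == 0 else (p if d < 0 else 1 - p)
    return rule


def bsd_rule(a):
    def rule(history):
        d = 2 * sum(history) - len(history)
        return Fraction(0) if d >= a else Fraction(1) if d <= -a else Fraction(1, 2)
    return rule


def pbr_rule(b):
    def rule(history):
        k = len(history) % b                      # position within the current block
        e_used = sum(history[len(history) - k:])  # E's already in the current block
        return Fraction(b // 2 - e_used, b - k)
    return rule


def enumerate_sequences(rule, n):
    """All sequences of length n with positive probability, with P(Z = z)."""
    out = []

    def extend(history, prob):
        if len(history) == n:
            out.append((tuple(history), prob))
            return
        p_e = rule(history)
        for z, p_z in ((1, p_e), (0, 1 - p_e)):
            if p_z > 0:
                extend(history + [z], prob * p_z)

    extend([], Fraction(1))
    return out


def stratified_sequences(rule, sizes):
    """Stratified version: independent lists, probabilities multiply across strata."""
    per_stratum = [enumerate_sequences(rule, n_j) for n_j in sizes]
    return [(tuple(z for z, _ in combo), math.prod(p for _, p in combo))
            for combo in itertools.product(*per_stratum)]


seqs = stratified_sequences(pbr_rule(4), [4, 8])
print(len(seqs), sum(p for _, p in seqs))  # 216 1
```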

Table 1.

Probability of stratified and unstratified randomization procedures to keep the 5% level for BSD(9), CR, EBC(0.67) and PBR(4) (BSD(2) and PBR(2) in the 8 × 10 setting), depending on the amount of selection bias $\eta \in \{0, 0.05\}$ and time-trend bias $\Theta \in \{0, 0.05\}$, for different allocation ratios and analysis using the weighted ($w_j^*$), unweighted ($w_j = 1$) and unstratified (us) t test.

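| Allocation ratio | Θ | η | Randomization procedure | Stratified randomization, $w_j^*$ test | Stratified randomization, $w_j = 1$ test | Stratified randomization, us-test | Unstratified randomization, $w_j^*$ test | Unstratified randomization, $w_j = 1$ test | Unstratified randomization, us-test |
|---|---|---|---|---|---|---|---|---|---|
| 20:60 | 0.05 | 0 | BSD (9) | 0.58 | 0.27 | 0.71 | 0.58 | 0.27 | 0.67 |
| | | | CR | 0.58 | 0.28 | 0.67 | 0.58 | 0.28 | 0.68 |
| | | | EBC (0.67) | 0.85 | 0.41 | 0.95 | 0.76 | 0.41 | 0.96 |
| | | | PBR (4) | 1.00 | 0.93 | 1.00 | 1.00 | 0.93 | 1.00 |
| | 0 | 0.05 | BSD (9) | 0.35 | 0.12 | 0.47 | 0.35 | 0.12 | 0.47 |
| | | | CR | 0.34 | 0.11 | 0.46 | 0.34 | 0.11 | 0.47 |
| | | | EBC (0.67) | 0.11 | 0.04 | 0.19 | 0.17 | 0.04 | 0.19 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.05 | 0.05 | BSD (9) | 0.43 | 0.17 | 0.66 | 0.41 | 0.17 | 0.63 |
| | | | CR | 0.42 | 0.17 | 0.63 | 0.42 | 0.17 | 0.63 |
| | | | EBC (0.67) | 0.22 | 0.08 | 0.84 | 0.30 | 0.08 | 0.86 |
| | | | PBR (4) | 0.03 | 0.00 | 1.00 | 0.03 | 0.00 | 1.00 |
| 40:40 | 0.05 | 0 | BSD (9) | 0.57 | 0.31 | 0.76 | 0.59 | 0.31 | 0.67 |
| | | | CR | 0.59 | 0.32 | 0.68 | 0.59 | 0.32 | 0.68 |
| | | | EBC (0.67) | 0.82 | 0.52 | 0.97 | 0.74 | 0.52 | 0.96 |
| | | | PBR (4) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0 | 0.05 | BSD (9) | 0.35 | 0.16 | 0.47 | 0.34 | 0.16 | 0.47 |
| | | | CR | 0.34 | 0.16 | 0.47 | 0.36 | 0.16 | 0.47 |
| | | | EBC (0.67) | 0.11 | 0.04 | 0.20 | 0.15 | 0.04 | 0.20 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.05 | 0.05 | BSD (9) | 0.42 | 0.23 | 0.69 | 0.43 | 0.23 | 0.62 |
| | | | CR | 0.43 | 0.23 | 0.63 | 0.42 | 0.23 | 0.63 |
| | | | EBC (0.67) | 0.22 | 0.10 | 0.88 | 0.29 | 0.10 | 0.87 |
| | | | PBR (4) | 0.03 | 0.00 | 1.00 | 0.02 | 0.00 | 1.00 |
| 8 × 10 | 0.05 | 0 | BSD (2) | 0.79 | 0.13 | 0.98 | 0.68 | 0.13 | 0.66 |
| | | | CR | 0.69 | 0.10 | 0.68 | 0.68 | 0.10 | 0.68 |
| | | | EBC (0.67) | 0.78 | 0.12 | 0.91 | 0.71 | 0.12 | 0.96 |
| | | | PBR (2) | 1.00 | 0.36 | 1.00 | 0.81 | 0.36 | 1.00 |
| | 0 | 0.05 | BSD (2) | 0.00 | 0.00 | 0.15 | 0.04 | 0.00 | 0.48 |
| | | | CR | 0.05 | 0.00 | 0.48 | 0.05 | 0.00 | 0.47 |
| | | | EBC (0.67) | 0.01 | 0.00 | 0.21 | 0.02 | 0.00 | 0.19 |
| | | | PBR (2) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.05 | 0.05 | BSD (2) | 0.00 | 0.00 | 0.86 | 0.05 | 0.00 | 0.63 |
| | | | CR | 0.05 | 0.00 | 0.62 | 0.05 | 0.00 | 0.63 |
| | | | EBC (0.67) | 0.01 | 0.00 | 0.79 | 0.02 | 0.00 | 0.86 |
| | | | PBR (2) | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 |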


BSD: big stick design; EBC: Efron's biased coin design; PBR: permuted block randomization; CR: complete randomization.

Table 2.

Probability of stratified and unstratified randomization procedures to keep the 5% level for BSD(9), CR, EBC(0.67) and PBR(4) (BSD(2) and PBR(2) in the 8 × 10 setting), depending on the amount of selection bias $\eta \in \{0, 0.2\}$ and time-trend bias $\Theta \in \{0, 0.2\}$, for different allocation ratios and analysis using the weighted ($w_j^*$), unweighted ($w_j = 1$) and unstratified (us) t test.

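| Allocation ratio | Θ | η | Randomization procedure | Stratified randomization, $w_j^*$ test | Stratified randomization, $w_j = 1$ test | Stratified randomization, us-test | Unstratified randomization, $w_j^*$ test | Unstratified randomization, $w_j = 1$ test | Unstratified randomization, us-test |
|---|---|---|---|---|---|---|---|---|---|
| 20:60 | 0.2 | 0 | BSD (9) | 0.65 | 0.31 | 0.71 | 0.67 | 0.31 | 0.67 |
| | | | CR | 0.66 | 0.32 | 0.68 | 0.66 | 0.32 | 0.68 |
| | | | EBC (0.67) | 0.90 | 0.47 | 0.95 | 0.84 | 0.47 | 0.96 |
| | | | PBR (4) | 1.00 | 0.97 | 1.00 | 1.00 | 0.97 | 1.00 |
| | 0 | 0.2 | BSD (9) | 0.45 | 0.15 | 0.54 | 0.43 | 0.15 | 0.53 |
| | | | CR | 0.45 | 0.16 | 0.54 | 0.44 | 0.16 | 0.55 |
| | | | EBC (0.67) | 0.15 | 0.05 | 0.23 | 0.23 | 0.05 | 0.24 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.2 | 0.2 | BSD (9) | 0.48 | 0.18 | 0.62 | 0.48 | 0.18 | 0.61 |
| | | | CR | 0.48 | 0.18 | 0.62 | 0.48 | 0.18 | 0.60 |
| | | | EBC (0.67) | 0.28 | 0.09 | 0.83 | 0.34 | 0.09 | 0.84 |
| | | | PBR (4) | 0.03 | 0.00 | 1.00 | 0.03 | 0.00 | 1.00 |
| 40:40 | 0.2 | 0 | BSD (9) | 0.66 | 0.36 | 0.75 | 0.66 | 0.36 | 0.65 |
| | | | CR | 0.67 | 0.36 | 0.67 | 0.67 | 0.36 | 0.67 |
| | | | EBC (0.67) | 0.89 | 0.60 | 0.97 | 0.82 | 0.60 | 0.96 |
| | | | PBR (4) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | 0 | 0.2 | BSD (9) | 0.45 | 0.22 | 0.53 | 0.45 | 0.22 | 0.54 |
| | | | CR | 0.44 | 0.21 | 0.54 | 0.45 | 0.21 | 0.54 |
| | | | EBC (0.67) | 0.15 | 0.05 | 0.24 | 0.22 | 0.05 | 0.25 |
| | | | PBR (4) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.2 | 0.2 | BSD (9) | 0.48 | 0.25 | 0.67 | 0.48 | 0.25 | 0.61 |
| | | | CR | 0.48 | 0.25 | 0.62 | 0.47 | 0.25 | 0.61 |
| | | | EBC (0.67) | 0.27 | 0.11 | 0.86 | 0.33 | 0.11 | 0.84 |
| | | | PBR (4) | 0.01 | 0.00 | 1.00 | 0.01 | 0.00 | 1.00 |
| 8 × 10 | 0.2 | 0 | BSD (2) | 0.72 | 0.11 | 0.97 | 0.61 | 0.11 | 0.66 |
| | | | CR | 0.61 | 0.08 | 0.68 | 0.62 | 0.08 | 0.68 |
| | | | EBC (0.67) | 0.70 | 0.10 | 0.91 | 0.64 | 0.10 | 0.96 |
| | | | PBR (2) | 1.00 | 0.37 | 1.00 | 0.72 | 0.37 | 1.00 |
| | 0 | 0.2 | BSD (2) | 0.00 | 0.00 | 0.25 | 0.03 | 0.00 | 0.54 |
| | | | CR | 0.03 | 0.00 | 0.54 | 0.03 | 0.00 | 0.54 |
| | | | EBC (0.67) | 0.00 | 0.00 | 0.26 | 0.01 | 0.00 | 0.24 |
| | | | PBR (2) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 0.2 | 0.2 | BSD (2) | 0.00 | 0.00 | 0.82 | 0.04 | 0.00 | 0.60 |
| | | | CR | 0.04 | 0.00 | 0.61 | 0.03 | 0.00 | 0.61 |
| | | | EBC (0.67) | 0.00 | 0.00 | 0.77 | 0.01 | 0.00 | 0.85 |
| | | | PBR (2) | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 |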


BSD: big stick design; EBC: Efron's biased coin design; PBR: permuted block randomization; CR: complete randomization.

When both biases are present, stratified randomization with a stratified analysis performs worse than the scenarios with an unstratified analysis. The magnitude does not depend on the balance of sample sizes between centers (20/60 vs. 40/40; Table 1). With the favored weighted test statistic in a stratified analysis, BSD and CR appear to perform much better than all other RPs when both biases are present. However, the effect depends markedly on the type of bias. When only a time trend is present in the data, the final balance procedures (EBC(0.67), PBR) perform better than BSD or CR; the same holds for the unstratified analysis following unstratified randomization. Weighting with wj* performs uniformly better than weighting with wj = 1.

7 Discussion

The approach presented in this paper for multi-center trials follows the ideas of the evaluation of randomization procedures for design optimization (ERDO) framework.6 However, as outlined, many aspects need to be addressed to demonstrate the contribution of randomization to mitigating bias during the planning phase of a multi-center trial.

Although Kraemer18 discussed various RPs in clinical trials, including stratification, the most common choice of stratified randomization is PBR with a common block size.19–22 We have presented new aspects of formulating RPs, whether unrestricted or restricted, in a stratified form so that they induce final balance or a maximal tolerated imbalance; this includes PBR. We have discussed the formulation of stratified unrestricted and restricted procedures, distinguishing three subclasses of restricted RPs: those forcing balance in probability, those forcing balance through a maximal tolerated imbalance, and those forcing terminal balance.
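To make the stratified procedures concrete, the Python sketch below generates allocation sequences independently within each center for CR, EBC(p), BSD(a) and PBR. It is a minimal illustration under our own assumptions, not the authors' supplementary R code. All function names are hypothetical, and reading PBR(4) as block length 4 and BSD(9) as a maximal tolerated imbalance of 9 is our interpretation of the notation.

```python
import random


def cr_next(n_e, n_c, rng):
    """Complete randomization: a fair coin, ignoring the allocation history."""
    return "E" if rng.random() < 0.5 else "C"


def ebc_next(n_e, n_c, rng, p=0.67):
    """Efron's biased coin: the lagging arm is chosen with probability p."""
    d = n_e - n_c
    if d == 0:
        prob_e = 0.5
    elif d < 0:          # E lags behind
        prob_e = p
    else:                # C lags behind
        prob_e = 1.0 - p
    return "E" if rng.random() < prob_e else "C"


def bsd_next(n_e, n_c, rng, a=2):
    """Big stick design: fair coin unless |n_E - n_C| has reached a."""
    d = n_e - n_c
    if d >= a:
        return "C"
    if d <= -a:
        return "E"
    return "E" if rng.random() < 0.5 else "C"


def sequential_sequence(n, rng, rule, **params):
    """Generate n allocations with a rule that looks at the current counts."""
    seq, n_e, n_c = [], 0, 0
    for _ in range(n):
        t = rule(n_e, n_c, rng, **params)
        seq.append(t)
        if t == "E":
            n_e += 1
        else:
            n_c += 1
    return seq


def pbr_sequence(n, rng, block=4):
    """Permuted blocks of (even) length `block`, truncated to n patients."""
    seq = []
    while len(seq) < n:
        blk = ["E"] * (block // 2) + ["C"] * (block // 2)
        rng.shuffle(blk)
        seq.extend(blk)
    return seq[:n]


RULES = {"CR": cr_next, "EBC": ebc_next, "BSD": bsd_next}


def stratified_randomization(center_sizes, procedure, seed=None, **params):
    """Apply the chosen procedure independently within every center (stratum)."""
    rng = random.Random(seed)
    if procedure == "PBR":
        return [pbr_sequence(n_j, rng, **params) for n_j in center_sizes]
    rule = RULES[procedure]
    return [sequential_sequence(n_j, rng, rule, **params) for n_j in center_sizes]
```

For example, `stratified_randomization([20, 60], "BSD", seed=1, a=9)` returns one sequence per center. An unstratified counterpart would instead draw a single sequence over all patients in order of arrival, irrespective of center.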

There are several limitations of this study. First, our compound criterion for selection bias and chronological bias imposes similar scaling on both, but it is difficult or impossible to scale them identically. Second, the weighting of the two criteria is subjective and may be adjusted to account for the different scaling. Third, although our statistical test assumes homogeneous variances across centers, the methodology can be used with standardized observations in the case of known heterogeneous variances across centers.

Our proposed approach is demonstrated in a numerical evaluation study. Here, we use very specific settings, e.g. common selection bias and time-trend effects across centers, and limited sample sizes corresponding to a particular effect size. We are aware that this evaluation study does not mirror all practical situations. However, the specific situation of a planned multi-center clinical trial can easily be embedded into the evaluation study to demonstrate the corresponding effects. Moreover, the results for other evaluation metrics, e.g. the mean type I error probability, are given in supplementary tables. All computations were performed with the R code provided as supplementary material.

We have chosen to use a parametric t test as our evaluation statistic rather than the more natural randomization test.23 Randomization tests can be computed easily with Monte Carlo re-randomization, although power considerations are computationally intensive. They tend to preserve the type I error rate under time trends and require no distributional assumptions.2 Randomization tests can easily be formulated to incorporate stratification, but the theoretical results derived herein could not be obtained for exact randomization tests or Monte Carlo re-randomization tests.
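A Monte Carlo re-randomization test for a stratified design can be sketched as follows. The assumptions are ours, not the paper's: the test statistic is the sum over centers of the E-minus-C mean differences (wj = 1), the observed allocation was generated by stratified PBR with block length 4, and re-randomization repeats that same procedure independently within each center. The helper names are hypothetical.

```python
import random
from statistics import fmean


def pbr(n, rng, block=4):
    """Permuted block randomization with (even) block length `block`."""
    seq = []
    while len(seq) < n:
        blk = ["E"] * (block // 2) + ["C"] * (block // 2)
        rng.shuffle(blk)
        seq.extend(blk)
    return seq[:n]


def stratified_diff(y, assign):
    """Sum over centers of mean(E) - mean(C), i.e. weights w_j = 1."""
    total = 0.0
    for y_j, a_j in zip(y, assign):
        e = [v for v, t in zip(y_j, a_j) if t == "E"]
        c = [v for v, t in zip(y_j, a_j) if t == "C"]
        total += fmean(e) - fmean(c)
    return total


def rerandomization_test(y, assign, n_rand=2000, block=4, seed=None):
    """Two-sided Monte Carlo p-value: redraw stratified PBR sequences for the
    fixed responses and count statistics at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(stratified_diff(y, assign))
    hits = 0
    for _ in range(n_rand):
        new_assign = [pbr(len(y_j), rng, block) for y_j in y]
        if abs(stratified_diff(y, new_assign)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_rand + 1)   # add-one Monte Carlo p-value
```

Re-randomizing with the procedure that actually produced the allocation is what makes such a test valid; using a different procedure (e.g. CR for a PBR trial) would change the reference distribution.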

Our theoretical derivation could be applied to a general class of weights wj, including, in particular, the inverse variance approach, although we focus our numerical evaluation study on the weights wj = 1 and wj = wj*; see Lin.15 Lin stated that many statisticians, as well as the U.S. Food and Drug Administration, recommend the unweighted (wj = 1) analysis.
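The weighted stratified t statistic can be sketched as below. We assume the Fleiss-type form T = Σ_j w_j (ȳ_jE − ȳ_jC) / sqrt(s² Σ_j w_j² (1/n_jE + 1/n_jC)), with s² the pooled within-center, within-arm variance on N − 2K degrees of freedom. As default we take w_j* = n_jE n_jC / n_j; this form is our assumption, so explicit weights can also be passed. With these weights Σ_j w_j*² (1/n_jE + 1/n_jC) = Σ_j w_j*, i.e. they are inverse-variance weights under homogeneous variances.

```python
import math
from statistics import fmean


def weighted_stratified_t(y, assign, weights="star"):
    """Weighted stratified t statistic (Fleiss-type form, assumed).

    y, assign: one list per center with responses and "E"/"C" labels.
    weights: "star" -> w_j = n_jE * n_jC / n_j (our assumed form of w_j*),
             "one"  -> w_j = 1, or an explicit sequence of K weights.
    Returns (t statistic, degrees of freedom N - 2K).
    """
    num = var_factor = ss = 0.0
    n_total = 0
    for j, (y_j, a_j) in enumerate(zip(y, assign)):
        e = [v for v, t in zip(y_j, a_j) if t == "E"]
        c = [v for v, t in zip(y_j, a_j) if t == "C"]
        n_e, n_c = len(e), len(c)
        if weights == "star":
            w = n_e * n_c / (n_e + n_c)
        elif weights == "one":
            w = 1.0
        else:
            w = weights[j]
        m_e, m_c = fmean(e), fmean(c)
        num += w * (m_e - m_c)                      # weighted treatment effect
        var_factor += w ** 2 * (1 / n_e + 1 / n_c)  # Var(num) / sigma^2
        ss += sum((v - m_e) ** 2 for v in e) + sum((v - m_c) ** 2 for v in c)
        n_total += n_e + n_c
    df = n_total - 2 * len(y)   # one mean per center and arm is estimated
    s2 = ss / df                # pooled within-cell variance
    return num / math.sqrt(s2 * var_factor), df
```

With a single center the statistic reduces to the ordinary pooled two-sample t test, and with equal, balanced centers the two weightings coincide because a common scaling of the weights cancels.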

Sample size considerations have been presented by various authors. Whereas Ruvuna24 and Vierron and Giraudeau25 used the normal approximation formula, Lin's15 approach is based on the t statistic. We presented a general sample size formula for the weighted t test with K centers, which generalizes Lin's two-center approach to both the weighted (wj*) and the unweighted (wj = 1) evaluation. Among other uses, our results can demonstrate the effect on power of adding centers while the trial is in progress, which seems to be common practice to increase recruitment. Furthermore, our formulas can be used for power considerations when imbalance in sample sizes between centers is assumed.24,25 Although not discussed here, the approach can be extended to the case of random center sizes by using the corrected variance formulas of Ganju and Mehrotra.26
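The sketch below uses the normal approximation (as in the Ruvuna and the Vierron and Giraudeau approaches) rather than the t-based formula derived in the paper; it only shows how center sizes and weights enter a power calculation. Assuming a common treatment effect Δ and homogeneous variance σ², the noncentrality Δ Σ w_j / (σ sqrt(Σ w_j² (1/n_jE + 1/n_jC))) follows from the variance of Σ w_j (ȳ_jE − ȳ_jC). The function names and the form w_j* = n_jE n_jC / n_j are assumptions.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n_e, n_c, weights="star", alpha=0.05):
    """Normal-approximation power of the two-sided weighted stratified test.

    n_e, n_c: per-center sample sizes in arms E and C.
    weights: "star" (assumed w_j* = n_jE*n_jC/n_j), "one", or explicit list.
    """
    if weights == "star":
        w = [a * b / (a + b) for a, b in zip(n_e, n_c)]
    elif weights == "one":
        w = [1.0] * len(n_e)
    else:
        w = list(weights)
    # SD of sum_j w_j (mean_jE - mean_jC), in units of sigma
    sd_factor = sum(wj ** 2 * (1 / a + 1 / b) for wj, a, b in zip(w, n_e, n_c)) ** 0.5
    ncp = delta * sum(w) / (sigma * sd_factor)
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


def equal_center_size(delta, sigma, k, target=0.8, alpha=0.05):
    """Smallest even size m of k equal centers (1:1 within center) reaching target power."""
    m = 2
    while approx_power(delta, sigma, [m // 2] * k, [m // 2] * k, "star", alpha) < target:
        m += 2
    return m
```

Under these assumptions, splitting 80 patients as 20/60 or 40/40 (1:1 within center) gives the same approximate power with w_j*, whereas the unweighted analysis loses power under the unbalanced split; for Δ/σ = 0.5, two equal centers need 64 patients each for 80% power.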

Although some authors state that randomization is used to avoid bias, bias is quite likely to occur when PBR is used, particularly with a small block size. We present a general, formal analytical approach to show to what extent RPs are able to limit the impact of selection and chronological bias on the test decision.

The idea behind the selection bias model used here originates from a natural preference for one of the treatments. Furthermore, assuming that the allocation process tends to produce a balanced allocation ratio at least at the end, it seems to be very common for investigators to believe that the treatment used most frequently so far is less likely to appear next. Combining these two arguments, it may be reasonable that an investigator who knows, or can best guess, the next allocation chooses the next patient according to the expected treatment. This is also in line with the patient's hope of being assigned to the better treatment. It should be stressed that this process is unconscious or subconscious. The question is not whether selection bias occurs, but rather how much impact of bias one is willing to accept. This can be investigated with the proposed sensitivity analysis approach, even in the planning phase. With this consideration, a unique approach is presented to link the randomization process of unrestricted or restricted procedures with the trial outcome.
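The guessing mechanism described above is commonly formalized with the convergence strategy, in the spirit of Proschan3 and Kennes et al.4: before each allocation the investigator guesses the arm assigned less often so far. The sketch below assumes an investigator preferring E who enrolls a patient whose expected response is shifted by +η when E is expected, by −η when C is expected, and not at all on a tie, with counts taken within the center. It also computes the expected proportion of correct guesses, which shows why small PBR blocks are predictable. Function names are hypothetical.

```python
def convergence_shifts(seq, eta):
    """Selection-bias shift per patient within one center (assumed model).

    Before each allocation the investigator guesses the arm assigned less
    often so far in this center. Preferring E, they enroll a 'better' patient
    (+eta) when E is expected, a 'worse' one (-eta) when C is expected, and
    make no selection (0) on a tie.
    """
    n_e = n_c = 0
    shifts = []
    for t in seq:
        if n_e < n_c:
            shifts.append(eta)
        elif n_e > n_c:
            shifts.append(-eta)
        else:
            shifts.append(0.0)
        if t == "E":
            n_e += 1
        else:
            n_c += 1
    return shifts


def expected_correct_guess_rate(seq):
    """Expected share of correct convergence guesses; a tie counts as 1/2."""
    n_e = n_c = 0
    score = 0.0
    for t in seq:
        if n_e == n_c:
            score += 0.5
        elif (n_e < n_c) == (t == "E"):
            score += 1.0
        if t == "E":
            n_e += 1
        else:
            n_c += 1
    return score / len(seq)
```

Every PBR(2) sequence yields an expected correct-guess rate of 0.75 (the second allocation of each block is fully predictable), whereas under CR the expectation over sequences is 1/2 because the next allocation is independent of the history.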

Of course, other types of time trend, e.g. logarithmic and step time trends,16 or attrition bias could easily be implemented in the model and then used in a numerical evaluation study. For instance, attrition bias could be modeled by a variable taking the value 0 or 1 to indicate missingness, which offers opportunities to study mechanisms such as missing at random.
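Possible functional forms for the linear, logarithmic and step time trends, their additive combination with a selection-bias shift, and a 0/1 missingness indicator are sketched below. The exact parameterization used by Tamm and Hilgers16 may differ; θ, the step position c and the missingness probability function are illustrative, hypothetical parameters.

```python
import math
import random


def linear_trend(i, n, theta):
    """theta * i / n: grows linearly up to theta at the last patient."""
    return theta * i / n


def log_trend(i, n, theta):
    """theta * log(i / n): steep early change, zero at the last patient."""
    return theta * math.log(i / n)


def step_trend(i, n, theta, c):
    """Jump of size theta from position c onwards (n kept for a uniform signature)."""
    return theta if i >= c else 0.0


def expected_responses(seq, mu_e, mu_c, shifts, trend):
    """Additive model: treatment mean + selection shift + time trend.

    `trend` maps the 1-based position i and the center size n to a value.
    """
    n = len(seq)
    return [(mu_e if t == "E" else mu_c) + s + trend(i, n)
            for i, (t, s) in enumerate(zip(seq, shifts), start=1)]


def missing_indicator(n, rng, prob):
    """R_i = 1 marks a missing outcome. prob(i) depends only on observed
    information (here the enrollment position), an MAR-type mechanism."""
    return [1 if rng.random() < prob(i) else 0 for i in range(1, n + 1)]
```

For example, `expected_responses(seq, 0.0, 0.0, shifts, lambda i, n: step_trend(i, n, 0.2, n // 2))` gives the expected responses under H0 with a step halfway through the center's recruitment.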

Within this paper, we formulate a biasing policy for selection and chronological bias in a two-arm, parallel group, multi-center trial analyzed with the weighted stratified t test procedure proposed by Fleiss.1 We further derive the distribution of the stratified weighted test statistic to calculate the impact on the type I error rate. Finally, the impact of the combined additive bias in multi-center trials on the unstratified t test, compared with the weighted stratified t test, is demonstrated in a simulation study.
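The impact on the type I error is derived analytically in the paper; as an informal cross-check it can also be estimated by simulation. The sketch below assumes H0 (both treatment means 0), N(0, 1) errors, a linear trend θ·i/n_j within each center, the convergence-strategy shift η, and w_j* = n_jE n_jC / n_j. It estimates the mean rejection rate over randomization sequences, not the probability of keeping the 5% level reported in the tables. The critical value comes from SciPy if it is installed, otherwise from the normal approximation, which is slightly liberal. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

try:  # exact t quantile when SciPy is available
    from scipy.stats import t as t_dist

    def critical_value(df, alpha):
        return float(t_dist.ppf(1 - alpha / 2, df))
except ImportError:  # fallback: normal quantile (slightly liberal for small df)
    def critical_value(df, alpha):
        return NormalDist().inv_cdf(1 - alpha / 2)


def cr(n, rng):
    """Complete randomization within one center."""
    return ["E" if rng.random() < 0.5 else "C" for _ in range(n)]


def pbr(n, rng, block):
    """Permuted block randomization with (even) block length `block`."""
    seq = []
    while len(seq) < n:
        blk = ["E"] * (block // 2) + ["C"] * (block // 2)
        rng.shuffle(blk)
        seq.extend(blk)
    return seq[:n]


def selection_shifts(seq, eta):
    """Convergence-strategy shift: +eta if E lags, -eta if C lags, 0 on ties."""
    n_e = n_c = 0
    shifts = []
    for t in seq:
        shifts.append(eta if n_e < n_c else (-eta if n_e > n_c else 0.0))
        if t == "E":
            n_e += 1
        else:
            n_c += 1
    return shifts


def weighted_t(y, assign):
    """Weighted stratified t statistic with assumed weights n_jE*n_jC/n_j."""
    num = var = ss = 0.0
    n_total = 0
    for y_j, a_j in zip(y, assign):
        e = [v for v, t in zip(y_j, a_j) if t == "E"]
        c = [v for v, t in zip(y_j, a_j) if t == "C"]
        w = len(e) * len(c) / (len(e) + len(c))
        m_e, m_c = fmean(e), fmean(c)
        num += w * (m_e - m_c)
        var += w * w * (1 / len(e) + 1 / len(c))
        ss += sum((v - m_e) ** 2 for v in e) + sum((v - m_c) ** 2 for v in c)
        n_total += len(e) + len(c)
    df = n_total - 2 * len(y)
    return num / math.sqrt(ss / df * var), df


def mean_type1_error(center_sizes, randomize, eta, theta,
                     n_sim=2000, alpha=0.05, seed=None):
    """Monte Carlo rejection rate under H0 with additive selection + trend bias."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        y, assign = [], []
        for n_j in center_sizes:
            seq = randomize(n_j, rng)
            while "E" not in seq or "C" not in seq:  # need both arms per center
                seq = randomize(n_j, rng)
            shifts = selection_shifts(seq, eta)
            # H0: both treatment means are 0; bias and trend enter additively
            y.append([s + theta * (i + 1) / n_j + rng.gauss(0.0, 1.0)
                      for i, s in enumerate(shifts)])
            assign.append(seq)
        t_stat, df = weighted_t(y, assign)
        if abs(t_stat) > critical_value(df, alpha):
            rejections += 1
    return rejections / n_sim
```

For instance, `mean_type1_error([20, 60], lambda n, r: pbr(n, r, 4), eta=0.2, theta=0.0, seed=1)` estimates the mean type I error of stratified PBR(4) under selection bias only.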

8 Conclusion

Stratification in the randomization process makes the analysis sensitive to bias, i.e. it results in type I error inflation. Procedures forcing terminal balance are worse when the study is prone to selection bias, irrespective of whether a time trend is also present. Unbalanced sample sizes between centers do not affect the results. This leads to the conclusion that stratification in the randomization should be considered carefully if bias is suspected to be present. In summary, the presented approach contributes to optimizing the design of clinical trials stratified by center in order to improve the resulting level of evidence.

Supplemental Material

Supplemental Material 1 (PDF, 45K)
Supplemental Material 2 (PDF, 71K)
Supplemental Material 3 (PDF, 69K)
Supplemental Material 4 (PDF, 70K)
Supplemental Material 5 (PDF, 73K)
Supplemental Material 6 (ZIP, 2.6K)

Supplemental material for Design and analysis of stratified clinical trials in the presence of bias by Ralf-Dieter Hilgers, Martin Manolov, Nicole Heussen and William F Rosenberger in Statistical Methods in Medical Research.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the IDeAl project funded from the European Union Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 602552. RDH received funding from the European Joint Programme on Rare Diseases within the European Union’s Horizon 2020 research and innovation program under grant agreement No. 825575. Part of the work was done while RDH attended the 2018 workshop on Design of Experiments: New Challenges at CIRM Luminy, France. RWTH Aachen University granted RDH computing resources for simulations under project rwth0334.

Supplemental material

Supplemental material is available for this article online.

References

1. Fleiss JL. Analysis of data from multiclinic trials. Control Clin Trials 1986; 7: 267–275.

2. Rosenberger WF, Lachin J. Randomization in clinical trials: theory and practice. New York, NY: Wiley, 2016.

3. Proschan M. Influence of selection bias on type I error rate under random permuted block designs. Stat Sin 1994; 4: 219–231.

4. Kennes LN, Cramer E, Hilgers RD, et al. The impact of selection bias on test decisions in randomized clinical trials. Stat Med 2011; 30: 2573–2581.

5. Tamm M, Cramer E, Kennes LN, et al. Influence of selection bias on the test decision – a simulation study. Methods Inf Med 2012; 51: 138–143.

6. Hilgers RD, Uschner D, Rosenberger WF, et al. ERDO – a framework to select an appropriate randomization procedure for clinical trials. BMC Med Res Methodol 2017; 17(1): 159.

7. Efron B. Forcing a sequential experiment to be balanced. Biometrika 1971; 58: 403–417.

8. Soares JF, Wu CFJ. Some restricted randomization rules in sequential designs. Commun Stat Theory Methods 1982; 12: 2017–2034.

9. Mantel N. Random numbers and experimental design. Am Stat 1969; 23: 32–34.

10. Zelen M. The randomization and stratification of patients to clinical trials. J Chronic Dis 1974; 27: 365–375.

11. Berger VW, Ivanova A, Knoll DM. Minimizing predictability while retaining balance through the use of less restrictive randomization procedures. Stat Med 2003; 22(19): 3017–3028.

12. ICH E9. Statistical principles for clinical trials. https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E9/Step4/E9_Guideline.pdf (accessed 24 April 2019).

13. Johnson NL, Kotz S. Continuous univariate distributions – 2. New York, NY: Wiley, 1970.

14. Searle SR. Linear models. New York, NY: Wiley, 1971.

15. Lin Z. An issue of statistical analysis in controlled multi-centre studies: how shall we weight the centres? Stat Med 1999; 18: 365–373.

16. Tamm M, Hilgers RD. Chronological bias in randomized clinical trials arising from different types of unobserved time trends. Methods Inf Med 2014; 53: 501–510.

17. Berger VW. Selection bias and covariate imbalances in randomized clinical trials. Chichester: Wiley, 2005.

18. Kraemer H, Fendt KH. Random assignment in clinical trials: issues in planning (infant health and development program). J Clin Epidemiol 1990; 43: 1157–1167.

19. Ganju J, Zhou K. The benefit of stratification in clinical trials revisited. Stat Med 2011; 30: 2881–2889.

20. Pickering RM, Weatherall M. The analysis of continuous outcomes in multi-centre trials with small centre sizes. Stat Med 2007; 26: 5445–5456.

21. Chu R, Thabane L, Ma J, et al. Comparing methods to estimate treatment effects on a continuous outcome in multicentre randomized controlled trials: a simulation study. BMC Med Res Methodol 2011; 11: 21.

22. Feaster DJ, Mikulich-Gilbertson S, Brinks AM. Modeling site effects in the design and analysis of multisite trials. Am J Drug Alcohol Abuse 1998; 37: 383–391.

23. Zheng L, Zelen M. Multi-center clinical trials: randomization and ancillary statistics. Ann Appl Stat 2008; 2(2): 582–600.

24. Ruvuna F. Unequal center sizes, sample size, and power in multicenter clinical trials. Drug Inf J 2004; 38: 387–394.

25. Vierron E, Giraudeau B. Sample size calculation for multicenter randomized trial: taking the center effect into account. Control Clin Trials 2007; 28: 451–458.

26. Ganju J, Mehrotra DV. Stratified experiments reexamined with emphasis on multicenter trials. Control Clin Trials 2003; 24: 167–181.

Articles from Statistical Methods in Medical Research are provided here courtesy of SAGE Publications
