This discussion group is for members to post specific clinical trial statistical questions that they have.



  • GHN_Editors The Editorial Team June 2, 2011

    This statistical question was posted by Roma Chilengi as a blog. We are adding it here so that others can also post any similar questions that they may have. Please see Roma's question below:

    To the GHT statisticians, I need confirmation about the best approach to design an epidemiological study on "causes of rota vaccine failure in children". My programme will vaccinate a birth cohort of about 170,000 children with rotavirus vaccine and follow them up. In parallel to this implementation work, I am considering designing some research to evaluate potential causes of vaccine failure such as high levels of maternal immunity, levels of IgA in milk, severe malnutrition and HIV/AIDS. Expected vaccine efficacy is about 49.2%, with a seroconversion rate of about 47.8%. Alpha at 5%, 95% confidence and 80% power. I need some comments/guidance as to the sample size approach. Many thanks, Roma

  • MarcelW Marcel Wolbers June 3, 2011

    Dear Roma

    This sounds like a very interesting study!
    I take it you are not planning to measure the potential risk factors for all 170,000 children? If you are, count yourself fortunate to have such a massive data set; the sample size calculation has then taken care of itself, as it is fixed by the design of the birth cohort.

    If you are talking about planning a sub-study, the sample size critically depends on three things:
    - The tentative strength of the effect of the candidate risk factors on the outcome of vaccine failure
    - The variability of the risk factors
    - The correlation between different risk factors (if you want to study independent effects of candidate factors adjusted for others and/or confounders)
    You might get some ideas about realistic estimates for these values from a literature search.

    More specifically, you will likely model the outcome of vaccine failure with a logistic regression model which includes the risk factors as covariates. There might be simple software or reliable online calculators available for sample size calculation for logistic regression but I am not aware of them. What I have personally done before, though, is to hand-implement the method proposed by Hsieh et al (see below). For a trained statistician, this is relatively easy to do. For others, it might be daunting.
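    As a rough illustration of what such a hand implementation might look like, the sketch below codes the simple formula for a single continuous covariate as it is commonly quoted from Hsieh et al. (1998), together with their variance inflation adjustment for correlation with other covariates. The function names and all input values (event rate at the covariate mean, odds ratio per standard deviation, squared multiple correlation) are illustrative assumptions, not figures from Roma's study, and the formula should be checked against the paper before use.

```python
import math
from statistics import NormalDist


def hsieh_n_continuous(p_at_mean, odds_ratio_per_sd, alpha=0.05, power=0.80):
    """Approximate total n for logistic regression with one continuous covariate.

    p_at_mean: assumed event (vaccine failure) rate at the covariate mean.
    odds_ratio_per_sd: assumed odds ratio for a one-SD increase in the covariate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    beta_star = math.log(odds_ratio_per_sd)        # log odds ratio per SD
    return (z_alpha + z_beta) ** 2 / (p_at_mean * (1 - p_at_mean) * beta_star ** 2)


def inflate_for_correlation(n, r_squared):
    """Inflate n when the covariate is correlated with the other covariates.

    r_squared: assumed squared multiple correlation between the covariate
    of interest and the remaining covariates in the model.
    """
    return n / (1 - r_squared)


# Illustrative inputs only (assumptions, not study data).
n_single = hsieh_n_continuous(p_at_mean=0.5, odds_ratio_per_sd=1.5)
n_adjusted = inflate_for_correlation(n_single, r_squared=0.2)
print(math.ceil(n_single), math.ceil(n_adjusted))
```

    The three inputs map onto the three items listed above: the odds ratio per standard deviation combines effect strength and risk factor variability, and the squared multiple correlation captures the correlation between risk factors.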

    Setting up a formal sample size calculation for logistic regression will still require you to make some clinical assumptions about realistic effect sizes etc., which you might not have at this stage as there hasn't been any prior research done. A pragmatic rule of thumb is that for a logistic regression model you should observe at least 10 events (vaccine failures) for each predictor. The rationale for this is that simulation studies have shown that otherwise a prognostic model developed on the study cohort is not likely to generalize well to the general population, as one is overfitting the data. More on this with references is in the (excellent) book by Frank Harrell (p.60f, see below).
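    The rule of thumb translates into a sub-study size once a failure proportion is assumed. The short sketch below does that arithmetic; the count of four predictors (taken from the four candidate causes in the question) and the 5% failure proportion are purely hypothetical placeholders, and the function name is made up for illustration.

```python
import math


def children_needed(n_predictors, failure_proportion, events_per_predictor=10):
    """Return (events required, children required) under the events-per-predictor rule.

    failure_proportion: assumed proportion of vaccinated children who
    experience vaccine failure (a hypothetical input, not a study result).
    """
    events = events_per_predictor * n_predictors
    children = math.ceil(events / failure_proportion)
    return events, children


# Hypothetical example: 4 candidate predictors, 5% assumed failure proportion.
events, children = children_needed(n_predictors=4, failure_proportion=0.05)
print(events, children)
```

    Note that categorical predictors with several levels, or non-linear terms, count as more than one predictor for this purpose.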

    Best regards,


    Hsieh, F.Y., Bloch, D.A., and Larsen, M.D. (1998). A Simple Method of Sample Size Calculation for Linear and Logistic Regression. Statistics in Medicine, Volume 17, pages 1623-1634.

    Harrell, Frank E. (2001). Regression Modeling Strategies. Springer Series in Statistics.

  • brianfaragher Brian Faragher June 3, 2011

    Dear Roma

    While I agree fully with everything Marcel suggests, I think you also need to look at your sample size problem at a simpler level. Given the size of your cohort (I suspect most medical statisticians are green with envy!), it is likely that you will have more than 80% power to detect even very modest factors influencing vaccine failure. However, it is always wise in my view to explore in some detail the numbers of participants that will need to be exposed to individual risk factors to give you adequate power at this very basic level. Fundamental to this is a decision on the size of effect you wish to detect (as either a risk difference or a risk ratio), which is usually something the clinical members of the research team will have to advise on.
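    For a single binary risk factor, this basic calculation can be sketched with the standard normal-approximation formula for comparing two proportions, allowing for unequal numbers of exposed and unexposed children. The failure risks and the allocation ratio used below are illustrative assumptions only, and the function name is hypothetical; dedicated software will usually offer refinements such as a continuity correction.

```python
import math
from statistics import NormalDist


def n_exposed_two_proportions(p_exposed, p_unexposed, ratio=1.0,
                              alpha=0.05, power=0.80):
    """Number of exposed children needed to detect p_exposed vs p_unexposed.

    ratio: assumed number of unexposed children per exposed child.
    Uses the usual normal approximation without continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    # Pooled failure risk across both groups under the null hypothesis.
    p_bar = (p_exposed + ratio * p_unexposed) / (1 + ratio)
    term_null = z_alpha * math.sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p_exposed * (1 - p_exposed)
                                  + p_unexposed * (1 - p_unexposed) / ratio)
    return math.ceil((term_null + term_alt) ** 2 / (p_exposed - p_unexposed) ** 2)


# Illustrative risks only: 60% failure if exposed vs 50% if unexposed.
n_equal = n_exposed_two_proportions(0.60, 0.50)            # equal group sizes
n_rare = n_exposed_two_proportions(0.60, 0.50, ratio=9.0)  # exposure in 10% of children
print(n_equal, n_rare, n_rare * 10)  # last value: total children when ratio is 9
```

    With a rare exposure, fewer exposed children are needed than in the equal-allocation case, but the total number of children to be measured grows, which is why the prevalence of each risk factor matters as much as the effect size.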

    There are many software packages around that will do the relevant calculations - I personally use WinEpiscope with students as it is freeware - and of course there is the excellent book by Machin.

    Best wishes.

    Brian Faragher
    Liverpool School of Tropical Medicine

  • Thanks, all. I have just been introduced and invited to join the Global Health Trials group. I hope to enjoy the discussions and to learn more along the way, adding to my little knowledge of research work. Thanks

  • GHN_Editors The Editorial Team June 8, 2011

    Thanks Matilda, and welcome! We hope you find the discussions useful and interesting.

  • GHN_Editors The Editorial Team Nov. 20, 2012

    We are planning a new area for statisticians and data managers and so it would be excellent to hear from people who would welcome this and also those who might be able to contribute.
