Sample Size · Effect Size · Power (1–β)

Power Calculation

Design studies that can truly detect meaningful effects. We scope your hypotheses, select the right tests, set defensible assumptions, and compute sample size or power, with sensitivity/robustness checks delivered alongside reproducible scripts and a clear write-up.

Typical target power: 80–90% (two-sided α = 0.05), tuned to your field and constraints.

What you get

  • Assumption sheet (effect size logic, α, tails, design factors)
  • Sample size & power tables for candidate tests/designs
  • Sensitivity analyses (dropouts, unequal groups, ICC/cluster effects)
  • Reproducible scripts (G*Power settings and/or R code)
  • Methods write-up suitable for protocol/IRB/grant/manuscript
  • Two revision rounds to match supervisor/journal feedback

Outcome: a defensible, reviewer-ready sample size justification aligned to your aims, design, and constraints.

Why Power Analysis Matters

  • Reduces false negatives (Type II): avoids underpowered, inconclusive studies.
  • Efficient resource use: right-sized samples save time, funds, and participant burden.
  • Reviewer compliance: many IRBs/journals require explicit sample size justifications.
  • Transparent assumptions: makes design choices explicit and reproducible.
  • Ethical design: balances detection capability with minimal risk/exposure.

Core Inputs We Specify

Effect size

A clinically/academically meaningful difference, expressed as standardized metrics (d, r, OR/RR) or as raw deltas with an SD/variance.

α & tails

Significance level (e.g., .05) and one- vs two-sided hypotheses tied to prior evidence and risks.

Desired power (1–β)

Typical targets 0.80–0.90; higher for critical decisions or multiple endpoints.

Design factors

Group allocation ratio, pairing/repeated measures correlation, blocking/stratification, covariates.

Variance & ICC

Heterogeneity, clustering (multisite/classroom/clinic), design effect for cluster trials.

Outcome type & test

Continuous, binary, count, time-to-event; appropriate tests/models matched to aim.

Common Tests & Designs We Cover

t-tests

One-sample, two-sample (equal/unequal n), paired/repeated measures.

ANOVA/ANCOVA

One-way/Factorial, repeated measures/mixed; covariate-adjusted designs.

Regression/GLM

Linear, logistic, Poisson/negative binomial; R² or predictor effect targets.

Proportions/Chi-square

Two-proportion tests, goodness-of-fit, RxC tables with expected counts.
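
For orientation, a minimal R sketch of a two-proportion calculation via Cohen's h (the event rates below are assumed for illustration, not defaults):

```r
library(pwr)

# Hypothetical rates: 60% vs 45% response. Cohen's h is the
# arcsine-transformed difference between the two proportions.
h <- ES.h(p1 = 0.60, p2 = 0.45)   # ~0.30
pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)
# -> roughly 87 per group under these assumed rates
```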

Survival/Time-to-event

Log-rank/Cox; events needed given hazard ratios, accrual, follow-up, censoring.
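
As a sketch of the events calculation (Schoenfeld's approximation; the hazard ratio and allocation below are assumptions for illustration):

```r
# Required events for a log-rank/Cox comparison, 1:1 allocation.
alpha <- 0.05; power <- 0.80
hr <- 0.70                            # assumed hazard ratio
p1 <- 0.5; p2 <- 0.5                  # allocation proportions
z <- qnorm(1 - alpha / 2) + qnorm(power)
ceiling(z^2 / (p1 * p2 * log(hr)^2))  # ~247 events; total n then follows
                                      # from accrual, follow-up, censoring
```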

Cluster & multilevel

Design effect via ICC; cluster-randomized or classroom/clinic designs.

Non-parametric

Rank-based approximations and robust alternatives when assumptions fail.

Tools we use include G*Power and R (pwr, pwr2ppl, and simulation-based checks), with shared settings for full reproducibility.
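
As a flavor of those reproducible scripts, a minimal pwr example (the effect size and targets are placeholders, not recommendations):

```r
library(pwr)

# Two-sample t-test: n per group for an assumed medium effect (d = 0.5),
# two-sided alpha = .05, 80% power.
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")
# -> ~64 per group (63.77 before rounding up)
```

The same call, re-run across a grid of assumptions, is what feeds the tables and sensitivity analyses described above.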

Our Process

1) Define aims & endpoints

Clarify primary/secondary outcomes, hypotheses, and minimal meaningful effects grounded in literature/practice.

2) Choose tests/models

Map outcomes and design to appropriate tests (two-sample, ANCOVA, Cox, GLMM, etc.), including tails and α.

3) Set assumptions

Variance/SD, ICC, correlations, allocation ratio, covariates, drop-out rates; document sources and rationale.

4) Compute & compare

Generate sample size/power tables across plausible ranges; highlight feasible scenarios and trade-offs.
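
A hedged sketch of how such a table can be generated (the grid of effect sizes and power targets is illustrative):

```r
library(pwr)

# Required n per group across assumed effect sizes and power targets
# (two-sample t-test, two-sided alpha = .05).
grid <- expand.grid(d = c(0.3, 0.5, 0.8), power = c(0.80, 0.90))
grid$n_per_group <- mapply(
  function(d, p) ceiling(pwr.t.test(d = d, sig.level = 0.05, power = p,
                                    type = "two.sample")$n),
  grid$d, grid$power)
grid
```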

5) Sensitivity & robustness

Stress-test assumptions (e.g., higher variance, unequal groups, attrition) and show the implications for n and power.

6) Deliver write-up & scripts

Provide a method statement for protocol/IRB/manuscript plus G*Power screenshots or R code for replication.

FAQ

How do you choose the effect size?

We use prior literature, pilot data, or a “minimal important difference” from domain guidance, and run sensitivity tables across plausible values.

Can you handle unequal group sizes?

Yes: calculations account for unequal (k:n) allocation ratios and the efficiency loss from imbalance; we recommend feasible ratios given your constraints.
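
A brief sketch with assumed numbers, using pwr's two-n t-test routine: if one arm is capped, we can solve for the other.

```r
library(pwr)

# One arm capped at n1 = 100 (assumed); solve for n2 at d = 0.5, 80% power.
pwr.t2n.test(n1 = 100, d = 0.5, power = 0.80, sig.level = 0.05)
# -> n2 ~ 47, versus ~64 per group under balanced 1:1 allocation
```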

How do you handle clustered or multisite designs?

We incorporate the ICC to compute the design effect and adjust the sample size for the number and size of clusters.
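
A minimal sketch of that adjustment (cluster size and ICC below are assumed values):

```r
# Inflate an individually randomized n by the design effect,
# deff = 1 + (m - 1) * ICC.
n_flat <- 128     # n from a standard, non-clustered calculation (assumed)
m      <- 20      # assumed average cluster size
icc    <- 0.05    # assumed intraclass correlation
deff   <- 1 + (m - 1) * icc      # 1.95
ceiling(n_flat * deff)           # -> 250 participants after inflation
```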

What about paired or repeated-measures designs?

Within-subject correlation increases efficiency; we use estimated correlations to adjust n and show sensitivity.
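
A hedged sketch of that adjustment, assuming equal SDs and an illustrative within-subject correlation:

```r
library(pwr)

# Convert a between-subject effect d to the paired metric
# d_z = d / sqrt(2 * (1 - r)), then size a paired t-test.
d <- 0.5; r <- 0.6            # assumed effect and within-subject correlation
dz <- d / sqrt(2 * (1 - r))   # ~0.56
pwr.t.test(d = dz, sig.level = 0.05, power = 0.80, type = "paired")
# higher r -> larger d_z -> fewer subjects than the two-sample design
```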

Should we use a one-sided or two-sided test?

Two-sided testing is standard unless there is a strong a priori directional claim and no interest in effects in the opposite direction; we document the rationale.

Can covariate adjustment reduce the required n?

Yes: an expected R² or baseline-outcome correlation can reduce the required n; we show both adjusted and unadjusted scenarios.
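
A sketch of the common ANCOVA approximation (often credited to Borm and colleagues), with an assumed baseline-outcome correlation:

```r
# Adjusted n is roughly n * (1 - r^2) when a baseline covariate with
# correlation r to the outcome is included in the model.
n_unadjusted <- 128               # assumed unadjusted requirement
r <- 0.5                          # assumed baseline-outcome correlation
ceiling(n_unadjusted * (1 - r^2)) # -> 96, a 25% reduction in this scenario
```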

How do you handle anticipated dropout?

We inflate n by the anticipated attrition and advise strategies (follow-up windows, ITT principles) to mitigate loss of power.
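
The inflation itself is simple arithmetic; a sketch with an assumed 15% dropout rate:

```r
# Enroll n / (1 - dropout) so the completing sample keeps the target power.
n_required <- 128                    # assumed n from the power calculation
dropout <- 0.15                      # assumed attrition
ceiling(n_required / (1 - dropout))  # -> enroll 151 to end near 128 completers
```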

What about non-parametric situations?

We use approximations/transformations or simulation-based checks where parametric assumptions are doubtful.

Can you handle designs without closed-form formulas?

Yes: for designs beyond closed-form solutions, we use R-based simulations to approximate power under your data-generating assumptions.
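
A minimal simulation-based power sketch (every data-generating value here is an assumption for illustration):

```r
# Two-group comparison with skewed (log-normal) outcomes, Wilcoxon test.
set.seed(2024)
n_per_group <- 80                 # assumed
shift <- 0.4                      # assumed location shift on the log scale
sims <- replicate(2000, {
  x <- rlnorm(n_per_group)
  y <- rlnorm(n_per_group, meanlog = shift)
  wilcox.test(x, y)$p.value < 0.05
})
mean(sims)                        # estimated power under these assumptions
```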

What does the final sample size justification look like?

A concise paragraph covering the test, α, tails, effect size definition, power target, assumptions and their sources, the resulting n, and citations, plus scripts/screenshots.

Need a defensible sample size fast?

Send your aim, endpoint, and any pilot numbers. We’ll return a scoped plan, fixed quote, and delivery timeline.