Confidence Interval Estimation

Confidence interval (CI) estimation is a form of interval estimation: from sample data it produces a range of values constructed so that, at a stated confidence level, it covers an unknown population parameter. Unlike a point estimate, a CI quantifies the uncertainty associated with the estimate.

Core Concepts

  • Definition: A range of values derived from sample statistics that is likely to contain the value of an unknown population parameter with a certain level of confidence (e.g., 95%).
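  • Interpretation: If the sampling process were repeated many times, the specified percentage of the calculated intervals (e.g., 95%) would contain the true population parameter. It does not mean there is a 95% probability that one specific calculated interval contains the parameter: in the frequentist view the parameter is fixed, so a given interval either contains it or does not. A direct probability statement about the parameter belongs to the Bayesian credible interval.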
  • Components:
    • Point Estimate: The best guess for the parameter (e.g., sample mean $\bar{x}$).
    • Margin of Error (MoE): Reflects sampling variability; calculated as critical value × standard error, e.g. $z_{\alpha/2} \cdot \sigma / \sqrt{n}$ for a mean.
    • Confidence Level ($1 - \alpha$): The long-run proportion of intervals capturing the true parameter.
  • Assumptions:
    • Random sampling.
    • Independence of observations.
    • Normality of the sampling distribution (justified by the Central Limit Theorem for large $n$, or by inherent normality of the population).
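
The long-run interpretation above can be checked by simulation. The following is a minimal sketch, not a definitive implementation: it assumes a normal population with a known standard deviation, and the function names (z_interval, coverage) and all parameter values are hypothetical choices made for illustration. It draws many samples, builds a z-based interval from each, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Return (lower, upper) for the mean, assuming a known population sigma."""
    n = len(sample)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    center = fmean(sample)
    return center - margin, center + margin


def coverage(true_mean=10.0, sigma=2.0, n=30, trials=10_000,
             confidence=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, confidence)
        hits += lower <= true_mean <= upper
    return hits / trials


if __name__ == "__main__":
    print(f"Empirical coverage: {coverage():.3f}")
```

Under these assumptions the printed fraction should land close to the nominal 0.95. Any single interval in the loop either contains the true mean or does not; the 95% describes the procedure, not one interval.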

Mathematical Formulation

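For a population mean with known variance or large samples:

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

Where:

  • $\bar{x}$ = sample mean
  • $z_{\alpha/2}$ = critical value from the standard normal distribution (about 1.96 for a 95% confidence level)
  • $\sigma$ = population standard deviation (or the sample standard deviation $s$ when $\sigma$ is unknown and the sample is large)
  • $n$ = sample size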
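
As a worked illustration of the formulation above, the sketch below computes the interval as sample mean plus or minus critical value times standard error, using only the Python standard library. The function name mean_confidence_interval, the measurements, and the value sigma=0.2 are hypothetical. When sigma is omitted, the sample standard deviation is substituted, which is only reasonable for large samples; small samples usually call for a t critical value instead.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, confidence=0.95, sigma=None):
    """z-based confidence interval for a population mean.

    If sigma (population standard deviation) is None, the sample
    standard deviation is used as a large-sample substitute.
    """
    n = len(sample)
    x_bar = fmean(sample)
    spread = sigma if sigma is not None else stdev(sample)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * spread / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical measurements; sigma=0.2 is an assumed known value.
data = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 5.4]
lower, upper = mean_confidence_interval(data, confidence=0.95, sigma=0.2)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

Raising the confidence level (for example to 0.99) increases the critical value and widens the interval, while a larger sample size narrows it through the square root of n in the denominator.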

Relation to Uncertainty Quantification

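Accurate CI estimation requires rigorous handling of variance and potential model errors: an interval built on an underestimated variance is too narrow and covers the true parameter less often than its nominal level claims. In modern computational contexts, particularly with generative models, failing to quantify uncertainty can lead to overconfidence in erroneous outputs.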

Key Distinctions

  • Hypothesis Testing: Yields a decision to reject or fail to reject a null hypothesis; a CI instead provides a range of plausible values for the parameter.
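  • Prediction Interval: Estimates where a single future observation will fall; a CI estimates the population parameter. At the same confidence level and from the same data, a PI is wider than the CI for the mean because it also accounts for the variability of an individual observation.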
  • Credible Interval (Bayesian): Directly states the probability that the parameter lies within the interval, given the data.
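
The difference in width between a confidence interval and a prediction interval can be made concrete with a short sketch. It assumes a normal population with a known standard deviation, so that the CI half-width is z·σ/√n and the PI half-width for one new observation is z·σ·√(1 + 1/n). The function name half_widths and the value sigma=2.0 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def half_widths(sigma, n, confidence=0.95):
    """Return (CI half-width, PI half-width) under a known-sigma normal model."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = z * sigma / sqrt(n)          # uncertainty about the mean only
    pi = z * sigma * sqrt(1 + 1 / n)  # adds the spread of one new observation
    return ci, pi


for n in (10, 100, 1000):
    ci, pi = half_widths(sigma=2.0, n=n)
    print(f"n={n:>4}: CI half-width {ci:.3f}, PI half-width {pi:.3f}")
```

As n grows, the CI half-width shrinks toward zero, while the PI half-width levels off near z·σ: more data sharpens the estimate of the mean but does not reduce the scatter of individual observations.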