Analysis of Survey Data

Data analysis must account for the sampling design used to select the sample and collect the responses. Doing so allows the sample results to be generalized to the population and yields design-based measures of uncertainty.

The founder of QuantifyAfrica is the creator of the Python library Samplics for designing and analyzing complex survey data.


Sample Weight Calculation and Adjustments.

The sample weight is the main mechanism for generalizing the sample results to the target population. QuantifyAfrica will calculate design weights and apply nonresponse, poststratification, and calibration adjustments, when applicable. For more complex indicators and/or sampling designs, we will create and use replicate weights such as Balanced Repeated Replication (BRR), Bootstrap, and Jackknife.
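As an illustration of the weighting steps described above, here is a minimal sketch in plain Python. It is not the Samplics API; the function names, the adjustment classes, and the control totals are hypothetical and chosen only for the example.

```python
from collections import defaultdict

def design_weights(selection_probs):
    """Design (base) weight: inverse of each unit's selection probability."""
    return [1.0 / p for p in selection_probs]

def nonresponse_adjust(weights, responded, classes):
    """Move the weight of nonrespondents onto respondents in the same class."""
    total = defaultdict(float)       # weight of all sampled units per class
    resp_total = defaultdict(float)  # weight of respondents per class
    for w, r, c in zip(weights, responded, classes):
        total[c] += w
        if r:
            resp_total[c] += w
    return [w * total[c] / resp_total[c] if r else 0.0
            for w, r, c in zip(weights, responded, classes)]

def poststratify(weights, strata, control_totals):
    """Scale weights so each post-stratum sums to its known population total."""
    sums = defaultdict(float)
    for w, s in zip(weights, strata):
        sums[s] += w
    return [w * control_totals[s] / sums[s] for w, s in zip(weights, strata)]

# Hypothetical four-unit sample
probs = [0.1, 0.1, 0.05, 0.05]
responded = [True, False, True, True]
area = ["urban", "urban", "rural", "rural"]

base = design_weights(probs)                       # [10, 10, 20, 20]
adjusted = nonresponse_adjust(base, responded, area)  # [20, 0, 20, 20]
final = poststratify(adjusted, area, {"urban": 30, "rural": 50})
print(final)  # [30.0, 0.0, 25.0, 25.0]
```

In this sketch the same variable serves as both the nonresponse class and the post-stratum; in practice these are often different.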


Estimation of Population Parameters.

A common study objective is to estimate unknown population parameters, for example, the proportion of people under the poverty level or the proportion of children vaccinated against measles. QuantifyAfrica produces design-based point estimates of these parameters and uses Taylor linearization or replication methods to estimate their variances. Measures of uncertainty are always produced to assess the reliability of the estimates.
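To make the Taylor-based approach concrete, the sketch below estimates a weighted proportion and its linearized standard error for a stratified cluster sample. It is a simplified illustration, not the Samplics implementation: it assumes primary sampling units (PSUs) are treated as sampled with replacement, ignores finite population corrections, and uses a hypothetical function name.

```python
import math
from collections import defaultdict

def weighted_proportion(y, w, strata, psus):
    """Weighted proportion and its Taylor-linearized standard error.

    y: 0/1 outcomes, w: sample weights,
    strata / psus: stratum and PSU label of each observation.
    """
    W = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / W

    # Linearized score of the ratio estimator, summed to PSU level
    psu_scores = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, psus):
        psu_scores[h][c] += wi * (yi - p) / W

    # Between-PSU variance within each stratum, summed over strata
    var = 0.0
    for h, scores in psu_scores.items():
        n_h = len(scores)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        mean = sum(scores.values()) / n_h
        var += n_h / (n_h - 1) * sum((s - mean) ** 2 for s in scores.values())
    return p, math.sqrt(var)

# Hypothetical data: one stratum, four single-unit PSUs, equal weights
p, se = weighted_proportion(
    y=[1, 0, 1, 1], w=[1, 1, 1, 1],
    strata=["A"] * 4, psus=[1, 2, 3, 4],
)
print(p, se)  # 0.75 0.25
```

With equal weights and one unit per PSU, the result reduces to the familiar simple-random-sampling formula, sqrt(p(1 - p) / (n - 1)).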


Categorical Data Analysis.

Tabulation, crosstabulation, hypothesis testing, logistic regression, loglinear and multinomial models, classification, and clustering are all statistical techniques QuantifyAfrica uses to analyze categorical data.
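As a small example of the first of these techniques, a weighted crosstabulation estimates the population share in each cell by summing sample weights rather than counting respondents. The sketch below is plain Python with a hypothetical function name and made-up data; it produces point estimates only and does not compute design-based standard errors or test statistics.

```python
from collections import defaultdict

def weighted_crosstab(rows, cols, weights):
    """Estimated population proportion in each (row, column) cell."""
    cell = defaultdict(float)
    for r, c, w in zip(rows, cols, weights):
        cell[(r, c)] += w
    total = sum(cell.values())
    return {key: value / total for key, value in cell.items()}

# Hypothetical data: area of residence by vaccination status
table = weighted_crosstab(
    rows=["urban", "urban", "rural", "rural"],
    cols=["yes", "no", "yes", "yes"],
    weights=[30, 10, 25, 35],
)
for (area, status), share in sorted(table.items()):
    print(area, status, round(share, 2))
```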


SAE and Disaggregated Estimates.

Disaggregated data is critical for policymakers, executive managers, civil society leaders, and many others who need to tailor communications, decisions, and interventions to the appropriate populations. Small area estimation (SAE) techniques extend the disaggregation power of the sample by combining the survey data with models and auxiliary information external to the survey, such as census, administrative, or geo-referenced remote sensing data.
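One common way SAE borrows strength from auxiliary data is a composite (shrinkage) estimate, as in area-level models of the Fay-Herriot type. The sketch below assumes that the model-based prediction for the area and the between-area model variance have already been obtained; in practice both are estimated from the data. The function name and the numbers are hypothetical.

```python
def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Shrink a direct survey estimate toward a model-based prediction.

    direct:       design-based estimate for the area
    sampling_var: sampling variance of the direct estimate
    synthetic:    prediction from a model using auxiliary data
    model_var:    between-area model variance (assumed known here)
    """
    gamma = model_var / (model_var + sampling_var)  # weight on the direct estimate
    return gamma * direct + (1.0 - gamma) * synthetic

# A well-sampled area keeps more of its direct estimate...
print(composite_estimate(0.40, sampling_var=0.01, synthetic=0.30, model_var=0.01))
# ...while a sparsely sampled area leans more on the model.
print(composite_estimate(0.40, sampling_var=0.03, synthetic=0.30, model_var=0.01))
```

The larger the sampling variance of an area's direct estimate, the smaller the weight it receives, which is how areas with few sampled units still obtain usable estimates.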


Other Topics.

  • Imputation
  • Dimensionality reduction (PCA, factor analysis)
  • Non-probability sampling
  • Record linkage