IBM SPSS Complex Samples

Correctly and Easily Compute Statistics for Complex Sampling

  • Overview
  • Data Analysis
  • Planning
  • Data Management

Do you analyze data from survey or market research, public health datasets, or government agencies? Do you use sample survey methodology in your research, or are your data likely to come from a public-use dataset that includes complex sample designs? Are you confident that the statistical methods you use to analyze sample survey data provide you with the most accurate results?

If you're working with complex sample designs, such as stratified, clustered or multistage sampling, you need specialized statistical techniques to account for the sample design and its associated standard errors.

IBM SPSS Complex Samples (formerly called SPSS Complex Samples) is a module of IBM SPSS Statistics. It provides the specialized planning tools and statistics you need when working with sample survey data. It enables you to make more statistically valid inferences for a population by incorporating the sample design into survey analysis. You can more accurately work with numerical and categorical outcomes in complex sample designs using two algorithms for analysis and prediction. In addition, a new algorithm enables you to predict time to an event. This add-on module is an indispensable statistical tool for survey and market researchers, public opinion researchers, or social scientists, and enables you to reach more accurate conclusions when working with sample survey methodology.

Work Efficiently and Easily

Only IBM SPSS Complex Samples makes understanding and working with your complex sample survey results easy. Through the intuitive interface, you can analyze data and interpret results. When you're finished, you can publish public-use datasets and include your sampling and analysis plans. These plans act as a template and allow you to save all the decisions made when creating the plan—define it once and you're done. This saves time and improves accuracy for yourself and others who may want to plug your plans into the data to replicate results or pick up where you left off.

To begin your work in IBM SPSS Complex Samples, use the wizards, which prompt you for the many factors you must consider before you start planning. If you are creating your own samples, use the Sampling Wizard to define the scheme and draw the sample. If you're using public-use datasets that have already been sampled, such as those provided by the Centers for Disease Control and Prevention (CDC), use the Analysis Preparation Wizard to specify how the samples were defined and how standard errors should be estimated. Once you create a sample or specify standard errors, you can create plans, analyze your data, and produce results (see diagram below for workflow).

You can use the following types of sample design information with IBM SPSS Complex Samples:

  • Stratified sampling—Increase the precision of your sample or ensure a representative sample from key groups by choosing to sample within subgroups of the survey population. For example, subgroups might be a specific number of males or females or contain people in certain job categories, people of a certain age group and so on.
  • Clustered sampling—Select clusters, which are groups of sampling units, for your survey. Clusters can include schools, hospitals or geographic areas with sampling units that might be students, patients or citizens. Clustering often helps make surveys more cost-effective.
  • Multistage sampling—Select an initial or first-stage sample based on groups of elements in your population; then create a second-stage sample by drawing a sub-sample from each selected unit in the first-stage sample. By repeating this step, you can select higher-stage samples. For example, in a face-to-face survey, you might sample city blocks, then households within those blocks, then individuals within those households.
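
To make the stratified case concrete, here is a minimal Python sketch of drawing a simple random sample without replacement inside each stratum. It is an illustration only and not how IBM SPSS Complex Samples is implemented; the function name, the frame layout and the employee example are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_per_stratum, seed=12345):
    """Draw a simple random sample without replacement inside each stratum.

    Returns a list of (unit, weight) pairs, where weight = N_h / n_h is the
    number of population units that each sampled unit represents.
    """
    rng = random.Random(seed)  # fixed seed so the same draw can be repeated
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for name, units in sorted(strata.items()):
        n_h = min(n_per_stratum[name], len(units))
        weight = len(units) / n_h
        for unit in rng.sample(units, n_h):
            sample.append((unit, weight))
    return sample


# Hypothetical frame: 60 clerical employees and 40 managers.
frame = [("clerical", i) for i in range(60)] + [("manager", i) for i in range(40)]
sample = stratified_sample(frame, lambda u: u[0], {"clerical": 6, "manager": 8})
```

The sampling weights add up to the size of the frame, which is what lets later estimates be scaled back to the population.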

Accurate analysis of survey data is easy in IBM SPSS Complex Samples. Start with one of the wizards (which one depends on your data source) and then use the interactive interface to create plans, analyze data and interpret results.


Everything You Need for Data Analysis

As a researcher, you want to be confident about your results. Performing data analysis in IBM SPSS Complex Samples helps you to achieve more statistically valid inferences for populations measured in your complex sample data. IBM SPSS Complex Samples provides you with better results because, unlike most conventional statistical software, it incorporates the sample design into survey analysis. And, it easily plugs into other IBM SPSS Statistics modules so you can seamlessly work in the IBM SPSS Statistics environment.

IBM SPSS Complex Samples provides you with six procedures for analyzing sample survey data. And you can use ordinal data in much the same way you use numeric data.

Complex Samples Descriptives (CSDESCRIPTIVES)—Estimates means, sums and ratios, and computes standard errors, design effects, confidence intervals and hypothesis tests for samples drawn by complex sampling methods. The procedure estimates variances by taking into account the sample design used to select the sample, including equal probability and probability proportionate to size (PPS) methods, and with replacement (WR) and without replacement (WOR) sampling procedures. Optionally, CSDESCRIPTIVES performs analyses for subpopulations.
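
As a rough illustration of what "taking the sample design into account" means, the sketch below estimates a mean, its standard error and a design effect for a stratified simple random sample drawn without replacement, using textbook formulas. It is not the CSDESCRIPTIVES algorithm; the function name, the input layout and the simplified design-effect calculation are assumptions made for this example.

```python
import math


def stratified_mean(strata):
    """strata: list of dicts with population size 'N' and sampled values 'y'.

    Returns (mean, standard_error, design_effect) for a stratified simple
    random sample drawn without replacement.
    """
    N = sum(s["N"] for s in strata)
    n = sum(len(s["y"]) for s in strata)

    mean = 0.0
    variance = 0.0
    for s in strata:
        N_h, y = s["N"], s["y"]
        n_h = len(y)
        ybar_h = sum(y) / n_h
        s2_h = sum((v - ybar_h) ** 2 for v in y) / (n_h - 1)
        W_h = N_h / N  # stratum share of the population
        mean += W_h * ybar_h
        # Finite population correction (1 - n_h / N_h) applies because the
        # sample is drawn without replacement.
        variance += W_h ** 2 * (1 - n_h / N_h) * s2_h / n_h

    # Variance that a simple random sample of the same size would give,
    # using a weighted estimate of the population variance (a simplification).
    s2_pop = sum(
        (s["N"] / len(s["y"])) * (v - mean) ** 2 for s in strata for v in s["y"]
    ) / (N - 1)
    srs_variance = (1 - n / N) * s2_pop / n
    return mean, math.sqrt(variance), variance / srs_variance


# Hypothetical data: two strata whose values differ much more between
# strata than within them.
strata = [
    {"N": 60, "y": [10, 12, 14]},
    {"N": 40, "y": [20, 22, 24, 26]},
]
mean, se, deff = stratified_mean(strata)
```

In this made-up example the design effect is below 1, because the strata are internally homogeneous. Ignoring the design would overstate the standard error here, and with clustered designs it would typically understate it.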

You can also use CSDESCRIPTIVES to specify how to handle missing data:

  • Base each statistic on all valid data for the analysis variable(s) used in computing the statistic. Ratios are computed using all cases with valid data for both of the specified variables. Statistics for different variables may be based on different sample sizes.
  • Use only cases with valid data for all analysis variables when computing statistics. Statistics for different variables are always based on the same sample size.
  • Exclude user-missing values among the strata, cluster, and subpopulation variables
  • Include user-missing values among the strata, cluster, and subpopulation variables. Treat user-missing values for these variables as valid data.
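
The difference between the first two options can be sketched in a few lines of Python. This is only an illustration of the two rules, with `None` standing in for a missing value; the function name and the sample records are hypothetical.

```python
def valid_n(records, variables, listwise):
    """Count the cases that contribute to each variable's statistic.

    records: list of dicts; None marks a missing value.
    listwise=False: each variable uses every case that is valid for it.
    listwise=True: only cases valid on all analysis variables are used.
    """
    if listwise:
        complete = [r for r in records if all(r[v] is not None for v in variables)]
        return {v: len(complete) for v in variables}
    return {v: sum(r[v] is not None for r in records) for v in variables}


# Hypothetical survey records with some missing answers.
records = [
    {"income": 52000, "age": 34},
    {"income": None, "age": 51},
    {"income": 47000, "age": None},
    {"income": 61000, "age": 45},
]
per_variable = valid_n(records, ["income", "age"], listwise=False)
same_cases = valid_n(records, ["income", "age"], listwise=True)
```

With the first rule, income and age are each based on three cases. With the second, both are based on the same two complete cases.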

Complex Samples Tabulate (CSTABULATE)—Displays one-way frequency tables or two-way crosstabulations and associated standard errors, design effects, confidence intervals and hypothesis tests for samples drawn by complex sampling methods. The procedure estimates variances by taking into account the sample design used to select the sample, including equal probability and PPS methods, and with replacement (WR) and without replacement (WOR) sampling procedures. Optionally, CSTABULATE creates tables for subpopulations.

Use the following statistics within the table:

  • Population size
  • Standard error
  • Row percentages
  • Column percentages
  • Table percentages
  • Coefficient of variation
  • Design effects
  • Square root of the design effects
  • Confidence interval
  • Unweighted counts
  • Cumulative population size estimates
  • Cumulative percentages
  • Expected population size estimates
  • Pearson residuals
  • Adjusted Pearson residuals

Use the following statistics and tests for the entire table:

  • Test of homogeneous proportions
  • Test of independence
  • Odds ratio
  • Relative risk
  • Risk difference
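
For a 2-by-2 table, the last three measures can be computed from the estimated population counts in each cell. The sketch below uses the standard definitions and is not taken from the product; the function name and the counts are hypothetical, and standard errors, which would have to reflect the sample design, are left out.

```python
def two_by_two_measures(a, b, c, d):
    """Association measures for a 2-by-2 table of estimated population counts.

    a, b: counts with / without the outcome in group 1
    c, d: counts with / without the outcome in group 2
    """
    risk1 = a / (a + b)
    risk2 = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk1 / risk2,
        "risk_difference": risk1 - risk2,
    }


# Hypothetical weighted counts: 300 of 1,000 exposed and 100 of 1,000
# unexposed people have the outcome.
measures = two_by_two_measures(300, 700, 100, 900)
```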

You can use CSTABULATE to specify how to handle missing data, just as you do with CSDESCRIPTIVES.

Complex Samples General Linear Models (CSGLM)—Enables you to build linear regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA) models for samples drawn by complex sampling methods. The procedure estimates variances by taking into account the sample design used to select the sample, including equal probability and PPS methods, and WR and WOR sampling procedures. Optionally, CSGLM performs analyses for subpopulations.

You can use the following statistics with CSGLM:

  • Model parameters: Coefficient estimates, standard error for each coefficient estimate, t test for each coefficient estimate, confidence interval for each coefficient estimate, and square root of the design effect for each coefficient estimate
  • Population means of dependent variable and covariates
  • Model fit
  • Sample design information

Hypothesis tests include:

  • Test statistics: Wald F test, adjusted Wald F test, Wald Chi-square test, and adjusted Wald Chi-square test
  • Adjustment for multiple comparisons: Least significant difference, Bonferroni, sequential Bonferroni, Sidak, and sequential Sidak
  • Sampling degrees of freedom: Based on sample design or fixed by user
  • Estimated means: Request estimated marginal means for factors and interactions in the model
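
The multiple-comparison adjustments named above have standard textbook forms, sketched here in Python. Whether the product computes them in exactly this way is an assumption; the function name and method labels are invented for this example, and "sequential" is read as the usual step-down version of each method.

```python
def adjust_p_values(p_values, method):
    """Return adjusted p-values for a family of m comparisons."""
    m = len(p_values)
    if method == "lsd":
        return list(p_values)  # least significant difference: no adjustment
    if method == "bonferroni":
        return [min(1.0, m * p) for p in p_values]
    if method == "sidak":
        return [1 - (1 - p) ** m for p in p_values]
    if method not in ("sequential_bonferroni", "sequential_sidak"):
        raise ValueError(f"unknown method: {method}")

    # Step-down methods: the smallest p-value is adjusted for m comparisons,
    # the next for m - 1, and so on; adjusted values never decrease.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        k = m - rank
        if method == "sequential_bonferroni":
            value = min(1.0, k * p_values[i])
        else:
            value = 1 - (1 - p_values[i]) ** k
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


p = [0.01, 0.04, 0.03]
bonferroni = adjust_p_values(p, "bonferroni")
sequential = adjust_p_values(p, "sequential_bonferroni")
```

The sequential versions are never more conservative than their single-step counterparts, which is why they are often preferred when many pairwise comparisons are made.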

Handle missing data using listwise deletion of missing values.

Complex Samples Ordinal Regression (CSORDINAL)—Makes it easier to predict outcomes when you are using ordinal data. The procedure estimates variances by taking into account the sample design used to select the sample, and it can perform an analysis for a subpopulation. You can create models for:

  • Main effects
  • All n-way interactions
  • Fully crossed designs
  • Custom, including nested terms

The CSORDINAL procedure includes the following statistics, tests, and functionality:

  • Statistics
    • Model parameters
    • Model fit
    • Parallel lines tests
    • Summary statistics for model variables
    • Sample design information
  • Hypothesis tests
    • Test statistics
    • Adjustment for multiple comparisons
    • Sampling degrees of freedom
  • Model variables can be saved to the active file or exported to external files that contain parameter matrices
  • Three estimation methods are available: Newton-Raphson, Fisher Scoring, and Fisher Scoring followed by Newton-Raphson
  • Numerous tests of output are also included

Complex Samples Logistic Regression (CSLOGISTIC)—Performs binary logistic regression analysis, as well as multinomial logistic regression (MLR) analysis, for samples drawn by complex sampling methods. The procedure estimates variances by taking into account the sample design used to select the sample, including equal probability and PPS methods, and WR and WOR sampling procedures. Optionally, CSLOGISTIC performs analyses for subpopulations.

You can use the following statistics with CSLOGISTIC:

  • Model parameters: Coefficient estimates, exponentiated estimates, standard error for each coefficient estimate, t test for each coefficient estimate, confidence interval for each coefficient estimate, design effect for each coefficient estimate, square root of the design effect for each coefficient estimate, covariances of parameter estimates, and correlations of the parameter estimates
  • Model fit: Pseudo R-squared and classification table
  • Summary statistics for model variables
  • Sample design information

Hypothesis tests include:

  • Test statistics: Wald F test, adjusted Wald F test, Wald Chi-square test, and adjusted Wald Chi-square test
  • Adjustment for multiple comparisons: Least significant difference, Bonferroni, sequential Bonferroni, Sidak, and sequential Sidak
  • Sampling degrees of freedom: Based on sample design or fixed by user

Handle missing data using listwise deletion of missing values.

Complex Samples Cox Regression (CSCOXREG)—Applies Cox proportional hazards regression to the analysis of survival times, that is, the length of time before an event occurs, for samples drawn by complex sampling methods. CSCOXREG supports continuous and categorical predictors, which can be time-dependent. CSCOXREG provides an easy way of considering differences in subgroups as well as analyzing effects of a set of predictors. The procedure also handles data where there are multiple cases (such as patient visits, encounters, and observations) for a single subject.

You can use the following statistics with CSCOXREG:

  • Sample design information
  • Event and censoring summary
  • Risk set at event time
  • Model parameters: Coefficient estimates, exponentiated estimates, standard error for each coefficient estimate, t test for each coefficient estimate, confidence interval for each coefficient estimate, design effect for each coefficient estimate, square root of the design effect for each coefficient estimate, covariances of parameter estimates, and correlations of the parameter estimates
  • Model assumptions: Test of proportional hazards, parameter estimates for alternative model, covariance matrix for alternative model, and baseline survival and cumulative hazard functions
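
To show what the event and censoring summary and the risk set at each event time describe, here is a small unweighted Python sketch. It ignores sampling weights and the sample design entirely, so it is only a conceptual illustration and not the CSCOXREG computation; the function name and the data are hypothetical.

```python
def risk_sets(subjects):
    """subjects: list of (time, event) pairs; event is True for an observed
    event and False for a censored observation.

    Returns a list of (event_time, number_at_risk, number_of_events).
    """
    event_times = sorted({t for t, event in subjects if event})
    table = []
    for t in event_times:
        # A subject is at risk at time t if still under observation then.
        at_risk = sum(1 for time, _ in subjects if time >= t)
        events = sum(1 for time, event in subjects if event and time == t)
        table.append((t, at_risk, events))
    return table


# Hypothetical follow-up times in months.
subjects = [(2, True), (3, False), (5, True), (5, True), (8, False)]
table = risk_sets(subjects)
n_events = sum(1 for _, event in subjects if event)
n_censored = len(subjects) - n_events
```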

Hypothesis tests include:

  • Test statistics: F test, adjusted F test, Chi-square test, and adjusted Chi-square test
  • Adjustment for multiple comparisons: Least significant difference, Bonferroni, sequential Bonferroni, Sidak, and sequential Sidak
  • Sampling degrees of freedom: Based on sample design or fixed by user



Everything You Need for Planning

To help you through the planning stage in the analytical process, IBM SPSS Complex Samples provides you with specialized tools and procedures for working with sample survey data. And it easily plugs into other IBM SPSS Statistics modules so you can seamlessly work in the IBM SPSS Statistics environment.

Complex Samples Plan (CSPLAN)—Use this procedure to specify the sampling frame to create a complex sample design or analysis specification used by companion procedures in IBM SPSS Complex Samples. With CSPLAN, you can specify how to draw or analyze stratified, clustered or multistage complex sample designs, with or without replacement. Methods for sampling with probability proportionate to size (PPS) are also available.

CSPLAN does not itself draw the sample or analyze data. To sample cases, supply a sample design created by CSPLAN as input to the Complex Samples Selection (CSSELECT) procedure.

Sampling Wizard—If you are creating your own samples, use the Sampling Wizard to define the scheme and draw the sample. From there, you can create plan files that you can save and share with colleagues.

Analysis Preparation Wizard—If you're using public-use datasets that have already been sampled, such as those provided by the CDC, use the Analysis Preparation Wizard to specify how the samples were defined and how standard errors should be estimated. From there, you can create plan files that you can save and share with colleagues.

Plan files—Once you have created plan files, you can save them and treat them as templates. This allows you to save all the decisions you made when creating the plan. This saves time and improves accuracy for yourself and others who may want to plug your plans into the data to replicate results or pick up where you left off.
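
The idea of a plan as a reusable template can be sketched with a small data structure that is written to a file format and read back unchanged. The layout below is entirely hypothetical and is not the plan file format used by IBM SPSS Complex Samples; the class names, field names and method labels are invented for this example.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Stage:
    strata: list       # stratification variables for this stage
    cluster: str       # variable that identifies the sampling unit
    method: str        # e.g. "SRS_WOR" or "PPS_WOR" (labels invented here)
    sample_size: int


@dataclass
class SamplingPlan:
    weight_variable: str
    stages: list = field(default_factory=list)

    def to_json(self):
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        stages = [Stage(**s) for s in raw["stages"]]
        return cls(weight_variable=raw["weight_variable"], stages=stages)


# Hypothetical two-stage design: city blocks first, then households.
plan = SamplingPlan(
    weight_variable="sample_weight",
    stages=[
        Stage(strata=["region"], cluster="city_block", method="PPS_WOR", sample_size=20),
        Stage(strata=[], cluster="household", method="SRS_WOR", sample_size=5),
    ],
)
restored = SamplingPlan.from_json(plan.to_json())
```

Because every design decision lives in the saved plan, a colleague who loads it recovers the same specification without re-entering anything.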



Everything You Need for Data Management

IBM SPSS Complex Samples provides what you need for the data management stage when working with sample survey data. And it easily plugs into other IBM SPSS Statistics modules so you can seamlessly work in the IBM SPSS Statistics environment.

Complex Samples Selection (CSSELECT) procedure—Enables you to select complex, probability-based samples from a population. CSSELECT chooses units according to a sample design created through the CSPLAN procedure.

With this procedure, you can:

  • Control the scope of execution and specify a seed value with the CRITERIA subcommand
  • Control whether or not user-missing values of classification (stratification and clustering) variables are treated as valid values with the CLASSMISSING subcommand
  • Specify general options concerning input and output files with the DATA subcommand
  • Write sampled units to an external file using an option to keep/drop specified variables
  • Automatically save first-stage joint inclusion probabilities to an external file when the plan specifies a probability proportionate to size (PPS) without replacement (WOR) sampling method
  • Opt to generate text files containing a rule that describes characteristics of selected units
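
As an illustration of seeded, probability-based selection, the sketch below implements systematic PPS sampling without replacement, one common textbook method. It is not the algorithm used by CSSELECT; the function names and the example sizes are assumptions, and the sketch assumes that no unit is larger than the sampling interval.

```python
import random


def inclusion_probabilities(sizes, n):
    """First-order inclusion probability of each unit: n * size / total."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def pps_systematic(sizes, n, seed=2024):
    """Select n unit indexes with probability proportionate to size.

    Assumes every size is smaller than total / n, so that no unit can be
    selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    rng = random.Random(seed)  # the seed makes the selection repeatable
    start = rng.random() * interval
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    next_target = 0
    for i, size in enumerate(sizes):
        cumulative += size
        # Select unit i if a target point falls inside its size range.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(i)
            next_target += 1
    return selected


# Hypothetical cluster sizes, such as enrollment at six schools.
sizes = [50, 30, 120, 80, 20, 100]
chosen = pps_systematic(sizes, 2, seed=7)
```

Running the selection again with the same seed returns the same units, which is the practical reason for recording a seed value.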
