Population Assessment of Tobacco and Health

Tips for Using the PATH Study Data User Forum
Testing for statistical differences in cross-sectional prevalence estimates across waves

janahirs
Testing for statistical differences in cross-sectional prevalence estimates across waves

I am interested in comparing cross-sectional prevalence estimates across waves of PATH data. Per the PATH user guide (p. 61), it is not appropriate to compute separate estimates for each wave and compare them by assessing confidence interval overlap. Instead, the guide recommends creating a stacked (or long) dataset with a wave indicator, which I have done. The user guide goes on to say "the subsequent analyses must include the newly created wave indicator variable and the design correctly specified in a software package designed to capture sample variability described in Appendix A....Manipulating the files as described above and using the appropriate variance estimation will correctly reflect these correlations." 

After applying the recommended survey set command in Stata, I have tried several lines of code (some from Appendix A) to compare estimates by wave indicator (e.g., chi-square, regression with categorical wave as a predictor). However, it is unclear to me whether performing these simple tests (for example, a chi-square test comparing prevalence of smoking by wave indicator) with the recommended survey specification (weighted with BRR variance estimation) accounts for the correlation between participants across waves (since they are mostly the same people). Does it? If not, what is the recommended statistical approach? Several statisticians have recommended using mixed models, but it seems to me that this would have been specified in the user guide if needed. There is no specific guidance on this type of analysis in Appendix A.

clairece
Re: Testing for statistical differences in cross-sectional...

Please refer to section 5.4.3.2 “Cross-sectional Analyses Comparing Different (or Partially Overlapping) Sets of Persons between Waves” in the user guide. As indicated in the user guide, to create cross-sectional estimates for comparing waves, the PATH Study data user will have to perform the following data manipulation steps:

  • Rename wave-specific variables, including weight variables, to obtain a single common name for each set of comparable variables. Use the cross-sectional weights for Wave 1 and the single-wave weights for all other waves;
  • Create a wave indicator variable;
  • Concatenate or “stack” data files from each wave to form a single file with one record per respondent per wave in which they provided data.
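The three steps above can be sketched in Python, assuming each wave's file has already been read into a list of dict records. The column names (W1_SMOKER, W1_WEIGHT, and so on) are hypothetical placeholders, not actual PATH variable names; replicate weight columns would be renamed in the same way.

```python
# Hypothetical wave-specific column names mapped to one common name per concept.
# Real PATH variable names differ; replicate weight columns would be mapped the same way.
WAVE_RENAMES = {
    1: {"W1_SMOKER": "smoker", "W1_WEIGHT": "weight"},  # Wave 1: cross-sectional weight
    2: {"W2_SMOKER": "smoker", "W2_WEIGHT": "weight"},  # later waves: single-wave weights
    3: {"W3_SMOKER": "smoker", "W3_WEIGHT": "weight"},
}


def stack_waves(wave_records, renames=WAVE_RENAMES):
    """Stack per-wave records into one long list with a 'wave' indicator.

    wave_records maps a wave number to that wave's list of dict records.
    Columns not listed in the rename map (e.g. PERSONID) keep their names.
    """
    stacked = []
    for wave in sorted(wave_records):
        mapping = renames[wave]
        for record in wave_records[wave]:
            # Step 1: rename wave-specific columns to their common names.
            row = {mapping.get(key, key): value for key, value in record.items()}
            # Step 2: add the wave indicator.
            row["wave"] = wave
            # Step 3: append, giving one record per respondent per wave.
            stacked.append(row)
    return stacked


waves = {
    1: [{"PERSONID": "A", "W1_SMOKER": 1, "W1_WEIGHT": 120.0}],
    2: [{"PERSONID": "A", "W2_SMOKER": 0, "W2_WEIGHT": 118.5},
        {"PERSONID": "B", "W2_SMOKER": 1, "W2_WEIGHT": 95.0}],
}
stacked = stack_waves(waves)
# Person A now appears twice (Wave 1 and Wave 2); person B appears once.
```

The weights are carried over exactly as supplied; the stacked file only supports correct variance estimation if each record keeps its matching replicate weights.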

As stated in the user guide, “even though there is not complete overlap between the two sets of respondents, there are still correlations between the two groups that should be reflected due to partial overlap and because some persons are in the same PSUs. This correlation serves to reduce the estimated variance of the comparison; manipulating the files as described above and using the appropriate variance estimation methods will correctly reflect these correlations.”
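The role of that correlation can be seen in a minimal sketch, assuming each stacked record carries a 0/1 'smoker' flag, a full-sample 'weight', and a list 'rep_weights' of Fay-adjusted BRR replicate weights (all hypothetical field names). The Fay coefficient of 0.3 is an assumption to check against the survey set statement in the user guide. Because the difference between waves is recomputed inside each replicate, the covariance between the two wave estimates enters the variance without a separate term.

```python
FAY = 0.3  # assumed Fay coefficient; check the survey set statement in the user guide


def weighted_prevalence(records, wave, weight_of):
    """Weighted share of records with smoker == 1 among one wave's records."""
    num = den = 0.0
    for r in records:
        if r["wave"] == wave:
            w = weight_of(r)
            num += w * r["smoker"]
            den += w
    return num / den


def brr_difference(records, wave_a, wave_b, fay=FAY):
    """Return (p_b - p_a, Fay-BRR variance of that difference) from a stacked file.

    Each record needs 'wave', 'smoker' (0/1), 'weight' (full sample) and
    'rep_weights' (list of R replicate weights, same length for every record).
    """
    def diff(weight_of):
        return (weighted_prevalence(records, wave_b, weight_of)
                - weighted_prevalence(records, wave_a, weight_of))

    full = diff(lambda r: r["weight"])
    n_reps = len(records[0]["rep_weights"])
    # Recompute the difference itself in every replicate: records that are
    # up- or down-weighted together (same person, same PSU) move both wave
    # estimates together, so their covariance is reflected here.
    reps = [diff(lambda r, k=k: r["rep_weights"][k]) for k in range(n_reps)]
    # Fay BRR variance centred on the full-sample estimate (as with Stata's mse option).
    variance = sum((d - full) ** 2 for d in reps) / (n_reps * (1 - fay) ** 2)
    return full, variance
```

Summing two separately computed single-wave variances would instead treat the waves as independent; when the same respondents answer the same way in both waves, the replicate-based variance of the difference shrinks toward zero, which is the variance reduction the guide describes.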

Claire Cepuran

National Addiction and HIV Data Archive Program
Inter-university Consortium for Political and Social Research (ICPSR)
734-615-1959

janahirs

Thank you for your quick response, Claire. 

These are the instructions I have followed. However, they stop short of actually explaining how to analyze the newly created stacked dataset. I cannot find this information anywhere in the user guide.

Per my original post, after applying the recommended survey set command in Stata, I have tried several lines of code (some from Appendix A) to compare estimates by wave indicator (e.g., chi-square, regression with categorical wave as a predictor). However, it is unclear to me whether performing these simple tests (for example, a chi-square test comparing prevalence of smoking by wave indicator) with the recommended survey specification (weighted with BRR variance estimation) accounts for the correlation between participants across waves (since they are mostly the same people). Does it? If not, what is the recommended statistical approach? Several statisticians have recommended using mixed models, but it seems to me that this would have been specified in the user guide if needed. There is no specific guidance on this type of analysis in Appendix A.

Can you please provide guidance on how to produce and statistically compare cross-sectional estimates with the stacked data file (created per the user guide instructions)? I'd like a p-value comparing smoking prevalence estimates from Waves 1, 2, and 3. Is it sufficient to run the survey set command and then run a chi-square test or a regression with the categorical wave indicator? Or is a mixed model that accounts for clustering by PERSONID required?
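For the three-wave p-value asked about here, one design-based option (a sketch, not an approach the PATH Study endorses) is a Wald test of equal prevalence built from the replicate weights. It assumes hypothetical 'wave', 'smoker', 'weight', and 'rep_weights' fields on each stacked record and an assumed Fay coefficient of 0.3. In this approach, repeat respondents are handled through the replicate weights rather than a PERSONID random effect, assuming a person's records in different waves fall in the same PSU and so move together across replicates. The statistic is referred to a chi-square distribution with 2 degrees of freedom; Stata's tests after svy estimation report an adjusted Wald F based on design degrees of freedom, so p-values will differ somewhat.

```python
import math

FAY = 0.3  # assumed Fay coefficient; check the survey set statement in the user guide


def wave_prevalences(records, weight_of, waves=(1, 2, 3)):
    """Weighted smoking prevalence for each wave, in the order given by waves."""
    totals = {w: [0.0, 0.0] for w in waves}  # wave -> [weighted smokers, weighted total]
    for r in records:
        if r["wave"] in totals:
            w = weight_of(r)
            totals[r["wave"]][0] += w * r["smoker"]
            totals[r["wave"]][1] += w
    return [totals[w][0] / totals[w][1] for w in waves]


def contrasts(p):
    """Differences of Wave 2 and Wave 3 prevalence from Wave 1 (all zero under H0)."""
    return [p[1] - p[0], p[2] - p[0]]


def wald_equal_prevalence(records, fay=FAY):
    """Wald test of H0: p1 == p2 == p3 using a Fay-BRR covariance matrix.

    Returns (chi-square statistic, p-value) on 2 degrees of freedom.
    """
    full = contrasts(wave_prevalences(records, lambda r: r["weight"]))
    n_reps = len(records[0]["rep_weights"])
    reps = [contrasts(wave_prevalences(records, lambda r, k=k: r["rep_weights"][k]))
            for k in range(n_reps)]
    # Replicate covariance of the two contrasts, centred on the full-sample values.
    scale = 1.0 / (n_reps * (1 - fay) ** 2)
    v = [[scale * sum((rep[i] - full[i]) * (rep[j] - full[j]) for rep in reps)
          for j in range(2)] for i in range(2)]
    # Invert the 2x2 covariance matrix directly.
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    inv = [[v[1][1] / det, -v[0][1] / det],
           [-v[1][0] / det, v[0][0] / det]]
    stat = sum(full[i] * inv[i][j] * full[j] for i in range(2) for j in range(2))
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    return stat, math.exp(-stat / 2.0)
```

Extending this to more waves means adding one contrast per extra wave and replacing the hand-written 2x2 inverse with a general matrix inverse.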

Jana

clairece
Re: Testing for statistical differences in cross-sectional...

Thank you for contacting PATH Study Support. The PATH Study is unable to assist with individual analytic questions from researchers or provide any other type of personal assistance, as the study does not endorse specific statistical approaches. The RUF and PUF User Guides, codebooks, and annotated instruments are valuable resources that may help address this question; they are available at https://doi.org/10.3886/Series606.

You may also wish to consult statisticians and analysts at your institution with more specific questions.
