Introduction to Political Science Data Analysis: Political Science 5075/7075
University of Colorado-Boulder
Dr. Vanessa A. Baird, Professor
Office Ketchum 131D; Email Vanessa.Baird@Colorado.edu
Office Hours: Tuesday, Wednesday 1-2
(also by appointment)
David Doherty, Teaching Assistant
David.Doherty@colorado.edu
Office Hours: 10:45-12:15 Tuesday, Thursday
(also by appointment)
The objective of this course is to introduce the methods by which social scientists apply statistics to substantive areas of research. We will examine the elements of research inference, which apply to both qualitative and quantitative designs, but we will focus on being particularly critical of quantitative designs. Since social scientists are interested in explaining why some phenomena vary or change across people, time, or geographical units, we will talk about the variance and distributions of your variables, followed by a discussion of how to infer the manner and extent to which multiple variables “co-vary” (vary together). One of the most important issues in social science is how to “operationalize” the concepts that interest us (match an abstract concept to a quantitative coding scheme), so we will learn how to devise and assess measurement theories. We will look at how to use multiple indicators to represent a single concept and how to evaluate the reliability and validity of the resulting measures. Ordinary Least Squares (OLS) is the most commonly introduced method, and it is important because it serves as the foundation for more complex statistical methods. We will therefore spend a great deal of time learning this method in both bivariate (explaining how one variable varies with another) and multivariate (explaining one variable with at least two others) designs. We will also focus on the logic of inference in OLS hypothesis testing, which involves evaluating the error in your estimates.
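As a rough illustration of what "co-varying" and fitting a bivariate OLS line mean, the sketch below computes the slope as the covariance of x and y divided by the variance of x. It is written in Python purely for illustration (the lab work in this course uses SPSS), and the function names and the education/interest data are hypothetical, not drawn from any course dataset.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance: how x and y move together around their means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def bivariate_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    slope = covariance(x, y) / covariance(x, x)  # cov(x, y) / var(x)
    intercept = mean(y) - slope * mean(x)        # line passes through the means
    return intercept, slope


# Hypothetical data: years of education and a 0-10 political interest score.
education = [8, 10, 12, 12, 14, 16, 16, 18]
interest = [2, 3, 4, 5, 5, 7, 6, 8]

intercept, slope = bivariate_ols(education, interest)
print(f"interest = {intercept:.2f} + {slope:.2f} * education")
```

The slope is read as the expected change in the dependent variable for a one-unit change in the explanatory variable; hypothesis testing then asks how much error surrounds that estimate.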
This class is particularly concerned with model specification, meaning that theories ought to be as exhaustive as possible. However, your model of the world need not (and should not) match the world precisely: as social scientists, we are interested in balancing accuracy with parsimony. Though there are statistical tools that can help you balance parsimony with accuracy, what matters most in this class is developing sound political theory grounded in an exhaustive review of prior studies on your question. The question that we continue to ask as we develop theoretical models is whether there are alternative explanations for our variable of interest that are not included in the present analysis. If so, then the model (the theoretical representation of empirical reality) is not well specified. On the other hand, if explanatory variables do not contribute significantly to our understanding of the concept, then they ought not be included in the analysis. In the latter part of the class, we will learn how to manipulate explanatory variables to incorporate the context of theoretical relationships into our models. We will consider various methods for improving accuracy, using dummy variables for nominal categories, quadratic terms, and interactions. This course will conclude with several sessions on how to evaluate and criticize OLS multivariate models.
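To make the idea of manipulating explanatory variables concrete, here is a minimal sketch of a regression that includes a quadratic term, a dummy variable, and an interaction. It is an illustration in Python rather than the SPSS used in the labs. The data are generated from made-up coefficients so that the fit can be checked by eye, and every name (`design_row`, `ols`, `slope_of_x`) is hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(x, d):
    """Regressors: constant, x, x squared, dummy d, and the x*d interaction."""
    return [1.0, x, x * x, d, x * d]


def slope_of_x(b, x, d):
    """Effect of a small change in x, which depends on both x and the dummy."""
    return b[1] + 2 * b[2] * x + b[4] * d


# Made-up coefficients, used only to generate noise-free illustrative data.
true_b = [1.0, 2.0, -0.1, 3.0, 0.5]
observations = [(float(x), d) for d in (0, 1) for x in range(6)]
rows = [design_row(x, d) for x, d in observations]
y = [sum(b * v for b, v in zip(true_b, row)) for row in rows]

coefs = ols(rows, y)
print("estimated coefficients:", [round(c, 3) for c in coefs])
print("slope of x at x=2, dummy=0:", round(slope_of_x(coefs, 2.0, 0), 3))
print("slope of x at x=2, dummy=1:", round(slope_of_x(coefs, 2.0, 1), 3))
```

The point of the last two lines is that once a quadratic or interaction term is in the model, the slope of x is no longer a single number: it must be interpreted at particular values of x and under each condition of the dummy variable.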
The structure of the course is applied: you should spend about 50% of your time learning and understanding statistics and the other 50% applying those statistics to real questions. Some of the former will happen in class (though you will need to reinforce your understanding of statistics on your own), and most of the latter will happen outside of class, in the lab assignments and on your paper. I expect that you will spend at least 15 hours per week on this class. Each lecture builds substantially on previously covered material, so you cannot afford to miss class. It is your responsibility to ask questions if you do not understand something from the lecture. If you still do not understand, come to my office during my office hours. This is a difficult class and I do not expect you to understand everything the first time. Please take advantage of office hours. I have extra readings that I will make available in case the text is not clear on some issues.
It should go without saying that attendance is absolutely mandatory. Your lab assignments will constitute 25% of your grade (graded as √+, √, √- or 0). If I find that students are not doing the reading, then pop quizzes will be counted along with the lab grades. The final exam will count for 25% of your grade. The paper is 50% of your grade. The paper is divided into three sections: research design, measurement, and analysis. Each subsequent section includes revisions of the previous sections. 10% of your paper grade will consist of your comments on another student’s paper, which are due one week before the due date of each paper.
I find that people learn in many different ways, so I keep readings on these topics from multiple textbooks in my office and will make them available in addition to the required texts for the course. You are responsible for the material; if you have no problem understanding it from my lectures and the texts, you are not required to read the additional materials. However, if you are having any difficulties at all, I expect you to read the extra material.
McClendon, McKee. 1994. Multiple Regression and Causal Analysis. Peacock.
McIver, John P., and Edward G. Carmines. 1981. Unidimensional Scaling. Beverly Hills: Sage Publications.
Carmines, Edward G., and Richard A. Zeller. 1979. Reliability and Validity Assessment. Beverly Hills: Sage Publications.
| Date and Day | Topic | Assignment |
| August 24 | Introduction | |
| August 26 | History of Causality, Political Science Discipline | Get email address and sobek account. Lab Assignment Due: Search through APSA program and PROceedings, and ICPSR for potential research problems and respective data. You will report your findings in class. Reading: McKee, 1-8 |
| August 31 | Spurious Relationships and Types of Variables | Lab Assignment Due: Download and define study 6403 ICPSR data from ICPSR website in SPSS. You should begin to familiarize yourself with SPSS commands using the data lab’s resources. Reading: McKee, 8-18 |
| September 2 | | No class. You should spend your extra time searching for a paper topic. |
| September 7 | Mean, Variance, Standard Deviation, Normality, Probability, Central Limit Theorem | Lab Assignment Due: Come up with your own example of a spurious relationship. You should have a paper topic at this stage. Reading: McKee, 20-28 |
| | Discussion of Crosstabs Analysis | Lab Assignment Due: Do a descriptive univariate analysis that tells you the mean, standard deviation and variance of a particular variable. Also, conduct both histogram and frequency analyses. Interpret the results. Is there a great deal of variance in this variable? Reading: Walt Stone and David Davis: An Introduction to Quantitative Methods. |
| | Introduction to Bivariate Regression: Covariance, Correlation, Expected Values | Lab Assignment Due: Do a crosstab analysis and control for a possible intervening (or spurious) variable. Interpret the results. Explain whether your secondary variable is intervening, spurious or neither and be sure to explain the criteria for obtaining any of the above three. You should have a topic outline and an exhaustive bibliography for your research design at this stage. |
| | Bivariate Regression: Fitting a Line | Reading: McKee 28-41 |
| | Properties of Estimators | |
| | Elements of Computing Bivariate Analysis | Begin Reading: Unidimensional Scaling |
| | Reporting Bivariate Results | Lab Assignment Due: Run a theoretically driven bivariate analysis and write up the results. |
| | Creating Indices; Guttman Scales | Reading: Unidimensional Scaling |
| | Validity | Lab Assignment Due: Find a set of indicators that can be combined into a single variable. Defend your measure against potential alternatives, including standardization. Reading: Reliability and Validity, Chapters 1 and 2 |
| | Reliability | Reading: Reliability and Validity, Chapters 3 and 4 |
| | Missing data and recoding | Lab Assignment Due: Create an index (or use the index you have already created) and discuss reliability and validity of measure. |
| | Introduction to Multivariate Regression | Reading: McKee 60-80 |
| | The Logic of Controlling | Reading: Re-read McKee 60-80 |
| | Multivariate Regression: Residuals | Reading: McKee 94-118 |
| | Interpreting Multivariate Output | Reading: Re-read McKee 94-118 |
| | Review Multivariate Regression | |
| | Dummy Variables | Lab Assignment Due: Run a multiple regression. Interpret the slope coefficients. Note: It would be best if this were your regression analysis for your paper. Reading: McKee 198-214 |
| | Quadratic Terms | |
| | Interactions I: Interpreting Dummy Variable Interactions | Lab Assignment Due: Run a regression with an ordinal variable. Then, create a dummy variable with that ordinal variable and run the same regression with the new variable. Evaluate the differences in the two equations. Reading: McKee 271-281 |
| | Interactions II: Interpreting Ordinal and Categorical Interactions | Reading: McKee 281-287 |
| | Calculating Interactions | |
| | Evaluating and Correcting for Missing Data | Lab Assignment Due: Run regression with an interaction. Interpret the slopes under the different conditions (levels of the interaction variable). Reading: TBA |
| | Evaluating Regression Models: I | Reading: TBA |
| | Evaluating Regression Models: II | Reading: TBA |
| | Evaluating Regression Models: III | Reading: TBA |
Paper due dates:
Wednesday September 22
Wednesday October 27
December 1
FINAL EXAM DUE MONDAY, DECEMBER 13, 5pm