Joan V. Joseph [she/they]



Principal Component Analysis
for Face Detection and Classification



June 1, 2019



The increasing availability and accessibility of visual content data, which can provide distinctive information not available in text data, together with innovations in machine learning, have reduced the costs of analyzing these data and allowed researchers to introduce new techniques such as neural networks and deep learning. Torres (2018) quantifies protest images from the Black Lives Matter movement using the Bag of Visual Words method; Joo and Steinert-Threlkeld (2018) highlight convolutional neural networks, a class of automated methods based on computer vision and deep learning that can analyze visual content automatically; and Zhang and Pan (2018) build a machine-assisted system that combines deep learning, images as data, and two-stage classification to identify collective action events in the real world from social media data. The goal of the techniques used in these studies, which contribute to the study of political communication and social movements among other areas, is image classification, which in the simplest sense is a pattern recognition problem and remains a challenging area of research. Though a difficult task, emulating image classification on a computer system has important implications for computational science, machine learning, and other fields, including political science, where a growing number of researchers use images and videos as data for causal inference. The aim of this tutorial is to highlight a computational model of image classification that is fast, simple, and accurate in a limited environment, does not depend on three-dimensional models or geometry, and has an application to face detection and classification. I highlight Principal Component Analysis (PCA), a nonparametric method for extracting relevant information from high-dimensional data and one of the most traditional and extensively used algorithms in face classification research (Yang et al. 2004; Moon and Phillips 2001).
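To make the idea concrete, the following is a minimal sketch of PCA-based face classification (the "eigenfaces" approach) in Python with scikit-learn. The dataset, number of components, and classifier are illustrative choices, not necessarily those used in the tutorial itself.

```python
# Minimal sketch: project face images onto leading principal components
# ("eigenfaces"), then classify in that low-dimensional space.
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

faces = fetch_lfw_people(min_faces_per_person=60)   # images flattened to pixel vectors
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.25, random_state=0)

# Learn the principal components of the training faces
pca = PCA(n_components=100, whiten=True).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# Classify faces using their coordinates in the eigenface space
clf = KNeighborsClassifier(n_neighbors=5).fit(Z_train, y_train)
print("held-out accuracy:", clf.score(Z_test, y_test))
```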



Structural Topic Models
Latent Topics of Indirect Rule



June 1, 2019



The increasing availability and accessibility of text data, which can provide distinctive information not available in traditional data, together with innovations in machine learning, have reduced the costs of analyzing these data and allowed researchers to introduce methods such as automated text analysis. In this tutorial I present the Structural Topic Model (STM), an unsupervised method for uncovering the thematic structure of a corpus of documents that can improve both qualitative interpretability and causal inference: the topics found in the text may suggest how best to operationalize "difficult to measure" concepts, and improved measurement implies improved causal inference. Building on the tradition of probabilistic topic models such as Latent Dirichlet Allocation and the Correlated Topic Model, the STM's key innovation is that it incorporates metadata, information about each document, into the topic model.
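The STM itself is typically fit with the stm package in R; as a rough illustration of the probabilistic topic-model tradition it builds on, here is a minimal Latent Dirichlet Allocation sketch in Python with scikit-learn on a toy corpus. Note that this baseline does not incorporate document metadata, which is precisely what the STM adds.

```python
# Minimal sketch of an LDA topic model (the tradition STM extends).
# The corpus below is a made-up toy example, not the tutorial's data.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["tax policy and indirect rule in the colonies",
        "local chiefs administered customary courts",
        "revenue collection under colonial administration"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                      # document-term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Inspect the most probable words in each estimated topic
vocab = np.array(vec.get_feature_names_out())
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", ", ".join(vocab[topic.argsort()[-5:]]))
```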



Dynamic Causal Inference
Finite and Polynomial Distributed Lag Models



June 1, 2019



Data that have both cross-sectional and time-series dimensions give researchers more information on unit (individual, state, firm, etc.) behavior and the ability to ask and analyze more theoretically interesting questions than is possible with single-dimensional data. Panel data also have several advantages over purely cross-sectional or time-series data: because they provide information on each unit at multiple points in time, they can help uncover dynamic relationships and measure unit-specific change over time. The most common model used to exploit panel data, accounting for unobserved unit-specific heterogeneity while obtaining estimates on observable variables of interest, is the linear fixed effects model, y_it = γ_i + X_it′β + ε_it, where γ_i allows for unit-specific heterogeneity by letting the intercept vary by unit. The fixed effects model is equivalent to demeaning all observations, so all cross-sectional variation is eliminated and only within-unit temporal variation is used to estimate β. As a result, the fixed effects model provides no leverage for modeling the dynamics of most political economy theories, since the only questions it can answer concern whether temporal variation in x is associated with temporal variation in y. These theoretical models often suggest that changes in x may lead to temporary or permanent changes in y, or that x may have a lagged effect. For example, changes in the rate of monetary growth have only temporary effects on real output growth, and changes in trade openness may have a lagged effect on welfare state policies. In this tutorial, I present the finite and polynomial distributed lag models, which can capture the dynamics of political economy questions of interest, and then use a panel dataset to model the dynamic effects of globalization on the welfare state.
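As a simple illustration of the finite distributed lag setup, the sketch below simulates a toy panel and estimates current and lagged effects of x with unit fixed effects via OLS in Python (pandas and statsmodels). The variable names, lag length, and coefficients are assumptions for the example; the tutorial's data and software may differ.

```python
# Minimal sketch: finite distributed lag model with unit fixed effects
# on a simulated panel (30 units, 20 years).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_units, n_years = 30, 20
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n_units), n_years),
    "year": np.tile(np.arange(n_years), n_units),
    "x": rng.normal(size=n_units * n_years),
})

# Lags of x constructed within each unit
df["x_l1"] = df.groupby("unit")["x"].shift(1)
df["x_l2"] = df.groupby("unit")["x"].shift(2)

# Toy outcome: unit-specific intercept plus current and lagged effects of x
unit_fe = rng.normal(size=n_units)
df["y"] = (unit_fe[df["unit"]] + 0.6 * df["x"]
           + 0.3 * df["x_l1"].fillna(0) + 0.1 * df["x_l2"].fillna(0)
           + rng.normal(scale=0.5, size=len(df)))

# Distributed lag regression with unit dummies; rows lost to lagging are dropped
fdl = smf.ols("y ~ x + x_l1 + x_l2 + C(unit)", data=df.dropna()).fit()
print(fdl.params[["x", "x_l1", "x_l2"]])                    # lag coefficients
print("long-run effect:", fdl.params[["x", "x_l1", "x_l2"]].sum())
```

Summing the lag coefficients gives the long-run (cumulative) effect of a permanent one-unit change in x, which is the quantity of interest in many political economy applications.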



Matching Methods for Causal Inference



July 15, 2019



There is a large literature on methods for estimating average treatment effects under the selection-on-observables assumption. A frequent quantity of interest is the average treatment effect, the average difference in potential outcomes over some population, E[Y1i − Y0i], or the average treatment effect on the treated (ATT), the effect for those in the treatment group, E[Y1i − Y0i | Di = 1]. For example, what is the effect of a job training program on future earnings? The fundamental problem of causal inference makes these quantities difficult to compute: for an individual who participated in the job training program, we can never simultaneously observe their earnings had they participated and their earnings had they not. There are, however, ways to work around this, including matching, weighting, and regression. Suppose we know that ignorability holds conditional on some pretreatment covariates Xi. We know that we must control for Xi in some way, but what is the best way to do so? Matching methods for causal inference selectively prune observations from the data in order to reduce model dependence. They are successful when they simultaneously maximize balance between the treated and control groups on the pretreatment covariates and the number of observations remaining in the data set. This tutorial introduces the foundations of matching and the matching methods researchers employ to work around these issues, which involve taking observational data, such as survey data, and matching people who have similar characteristics but different treatments. If researchers can identify two individuals who are similar in almost every way except their job training status, it stands to reason that the effect of the job training program on the participant can approximate its effect on the non-participant, and vice versa. This is the idea of matching: create two groups that are similar in every respect but their treatment. Matching can then be thought of broadly as any method that aims to balance the distribution of covariates in the treated and control groups. The most important property of matching is that, under ignorability, if we can find a matching solution with good balance on the covariates, then no further modeling of the covariates is necessary and we can ignore the linearity assumption required by regression.
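To illustrate the basic logic, here is a minimal sketch of one-to-one nearest-neighbor covariate matching for the ATT on simulated data, in Python with scikit-learn. The data-generating process and the simple Euclidean-distance match are assumptions for the example; applied work typically relies on dedicated matching software and careful balance checks.

```python
# Minimal sketch: 1-nearest-neighbor matching on covariates to estimate the ATT.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                              # pretreatment covariates
p = 1 / (1 + np.exp(-(X @ np.array([0.8, -0.5, 0.3]))))  # treatment depends on X
D = rng.binomial(1, p)
Y = 2.0 * D + X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=n)  # true ATT = 2

# For each treated unit, find the closest control unit on the covariates
nn = NearestNeighbors(n_neighbors=1).fit(X[D == 0])
_, idx = nn.kneighbors(X[D == 1])

# ATT estimate: average treated-minus-matched-control outcome difference
att = (Y[D == 1] - Y[D == 0][idx.ravel()]).mean()
print("naive difference in means:", Y[D == 1].mean() - Y[D == 0].mean())
print("matched ATT estimate:", att)
```

Because treatment assignment depends on X, the naive difference in means is biased, while the matched comparison recovers an estimate close to the true effect when balance on the covariates is good.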