There is a large literature on methods for estimating average treatment effects under the selection-on-observables assumption. Researchers are frequently interested in the average treatment effect (ATE), the difference in potential outcomes averaged over some population, E[Y_{1i} − Y_{0i}], or in the average treatment effect on the treated (ATT), which is the same average restricted to those in the treatment group, E[Y_{1i} − Y_{0i} | D_i = 1]. For example, what is the effect of job training programs on future earnings? The fundamental problem of causal inference makes these quantities difficult to compute: for an individual who participated in the job training program, we can never simultaneously measure their earnings had they participated and their earnings had they not. There are, however, ways around this problem, including matching, weighting, and regression.
Let's say we know that ignorability holds conditional on some pretreatment covariates X_i. We know that we have to control for X_i in some way, but what is the best way to do this? Matching methods for causal inference selectively prune observations from the data in order to reduce model dependence. They succeed when they simultaneously maximize two things: the balance between the treated and control groups on the pretreatment covariates, and the number of observations remaining in the data set.
This tutorial introduces the foundations of matching and the matching methods that researchers use to work around these issues. The approach takes observational data, such as survey data, and matches people who have similar characteristics but received different treatments. Although we cannot observe both potential outcomes for a single person, researchers can identify two individuals who are similar in almost every way except their job training status. It stands to reason that the non-participant's earnings can stand in for what the participant would have earned without training, and vice versa, so that the effect of the job training program on the participant approximates its effect on the non-participant.
This is the idea behind matching methods: create two groups that are similar in every respect except their treatment. Broadly, matching can be thought of as any method that aims to balance the distribution of covariates in the treated and control groups. The most important property of matching is that, under ignorability, if we are able to find a matching solution with good balance on the covariates, then no further modeling of the covariates is necessary, and we need not rely on the linearity assumption required by regression.
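To make the idea concrete, here is a minimal sketch of one simple matching estimator: one-to-one nearest-neighbor matching with replacement on a single covariate, used to estimate the ATT. Everything in it is an illustrative assumption rather than part of any particular matching package: the function names `simulate` and `match_att`, the simulated data, the true effect of 2.0, and the outcome model, which is deliberately nonlinear in the covariate.

```python
import numpy as np


def simulate(n=2000, effect=2.0, seed=0):
    """Simulate observational data in which treatment depends on a covariate x.

    Hypothetical data-generating process, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-x))      # units with large x are treated more often
    d = rng.random(n) < p             # boolean treatment indicator D_i
    # The outcome is nonlinear in x; the treatment effect is constant.
    y = effect * d + 2.0 * x + x**2 + rng.normal(scale=0.5, size=n)
    return x, d, y


def match_att(x, d, y):
    """Estimate the ATT by 1:1 nearest-neighbor matching with replacement on x."""
    x_t, y_t = x[d], y[d]             # treated units
    x_c, y_c = x[~d], y[~d]           # control units
    # For each treated unit, find the index of the control with the closest x.
    idx = np.abs(x_t[:, None] - x_c[None, :]).argmin(axis=1)
    att = np.mean(y_t - y_c[idx])     # mean treated-minus-matched-control outcome
    # Balance: difference in covariate means before and after matching.
    bal_before = x_t.mean() - x_c.mean()
    bal_after = x_t.mean() - x_c[idx].mean()
    return att, bal_before, bal_after


x, d, y = simulate()
naive = y[d].mean() - y[~d].mean()    # raw difference in means, confounded by x
att, bal_before, bal_after = match_att(x, d, y)

print(f"naive difference in means: {naive:.3f}")
print(f"matched ATT estimate:      {att:.3f}")
print(f"covariate imbalance before matching: {bal_before:.3f}")
print(f"covariate imbalance after matching:  {bal_after:.3f}")
```

In this simulation the raw difference in means is confounded because treated units tend to have larger covariate values. After matching, the covariate means of the two groups should be much closer, and the ATT estimate is formed without specifying how the outcome depends on the covariate. With several covariates, one would typically match on a distance measure or an estimated propensity score instead of a single variable.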