Academic Paper Grokking

Oliver Jack Dean Jul 10, 2023

Lately, I've been pondering the art of reading economic, socio-economic, or any old academic paper teeming with data.

Surprisingly, there aren't many up-to-date guides available on this subject.

However, fear not, for I stumbled upon an amazing open-source course that gave me a potential approach.

First and foremost, I highly recommend checking out Scott Cunningham's remarkable Mixtape Sessions. These sessions offer a delightful and accessible dive into various economic and econometric theories.

As I read the material, I discovered a recurring theme in great papers—they heavily rely on well-crafted and reliable models.

And what often elevates these models to the realm of reliability and high quality is the use of "instrumental variable" (IV) techniques.

Now, IVs are a clever set of statistical techniques that economists, and researchers in general, employ to estimate causal relationships in the face of endogeneity, whether it stems from omitted variables, measurement error, or simultaneity. They prove particularly valuable when controlled experiments are not feasible. And believe me, many academic papers fall under this category!
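To make that concrete, here is a minimal sketch in plain Python (my own illustration, not something taken from the Mixtape Sessions). It simulates data in which an unobserved confounder U drives both X and Y, then compares a naive regression slope with the simple one-instrument IV estimator, cov(Z, Y) / cov(Z, X). The coefficients, the sample size, and the function names are all assumptions I picked for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulate (z, x, y) where x is endogenous because of a hidden confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                  # instrument: moves X, independent of U
        ui = rng.gauss(0, 1)                  # unobserved confounder
        xi = 0.8 * zi + ui + rng.gauss(0, 1)  # X depends on both Z and U
        yi = true_effect * xi + 3.0 * ui + rng.gauss(0, 1)  # Z affects Y only via X
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(x, y):
    """Naive slope of y on x; biased here because U is omitted."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """One-instrument IV estimator: cov(Z, Y) / cov(Z, X)."""
    return cov(z, y) / cov(z, x)


z, x, y = simulate()
print("OLS slope:", round(ols_slope(x, y), 2))
print("IV slope: ", round(iv_slope(z, x, y), 2))
```

In this setup the true effect is 2.0 by construction. The naive slope should land well above it, because it also picks up the confounder, while the IV slope should sit close to 2.0. That only works because Z was built to be relevant (it moves X) and excluded (it touches Y only through X), which is exactly what a paper has to argue for with real data.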

However, finding suitable IVs can be quite challenging for many researchers. Those who manage to identify or develop such IVs truly have a model worth exploring and of "good quality".

IVs are then often combined with other techniques, like Differences-in-Differences (DiD), to understand the causal effect of X on Y.
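For reference, the simplest two-group, two-period version of Differences-in-Differences is just the difference between two before-and-after changes. Here is a minimal sketch; the outcome numbers are made up and `did_estimate` is a hypothetical helper name.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 Differences-in-Differences: change in treated minus change in control."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes for illustration only
treated_pre = [10, 12, 11]
treated_post = [15, 17, 16]
control_pre = [8, 9, 10]
control_post = [10, 11, 12]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # treated rose by 5, control by 2, so the estimate is 3.0
```

The control group's change stands in for what would have happened to the treated group without treatment, which is only credible if the two groups would otherwise have moved in parallel.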

So, based on the Mixtape Sessions and my own musings, I've distilled a few key takeaways that serve as my personal framework for reading and analyzing academic papers in general. Let's dive in:

Models vs. Estimands vs. Estimators

To begin with, when perusing papers, keep in mind that models often serve as descriptive tools, outlining the interactions between different variables, such as economic or healthcare variables.

However, it's crucial to note that many models in academic papers are built upon strong assumptions. If these assumptions are not met, the model risks bias and inconsistency.

Tricky stuff.

But coming back to variables and parameters: in a supply-demand model, for instance, a parameter could be the price elasticity of demand, which captures how responsive the quantity demanded is to changes in price.
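This is also a handy place to separate the three words in the heading. The model says quantity responds to price, the estimand is the elasticity itself, and the estimator is whatever recipe turns data into a number for it. A minimal sketch, assuming a constant-elasticity demand curve and using the slope of log(quantity) on log(price) as the estimator; the demand curve, the data, and the function name are all hypothetical:

```python
import math


def log_log_slope(prices, quantities):
    """OLS slope of log(quantity) on log(price): one estimator of price elasticity."""
    lp = [math.log(p) for p in prices]
    lq = [math.log(q) for q in quantities]
    lp_bar = sum(lp) / len(lp)
    lq_bar = sum(lq) / len(lq)
    num = sum((a - lp_bar) * (b - lq_bar) for a, b in zip(lp, lq))
    den = sum((a - lp_bar) ** 2 for a in lp)
    return num / den


# Hypothetical, noise-free demand curve: Q = 100 * P^(-1.5)
prices = [1.0, 2.0, 4.0, 8.0]
quantities = [100.0 * p ** -1.5 for p in prices]

elasticity = log_log_slope(prices, quantities)
print(round(elasticity, 3))  # recovers the elasticity built into the data, -1.5
```

With real market data this naive estimator is usually not enough, because price and quantity are determined together. That simultaneity is one of the reasons papers reach for instruments.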

Similarly, in the context of disease spread, parameters like the basic reproduction number (R0) and infection rates offer insights into the dynamics of the phenomenon.

The Econ Paper Review Framework

So, how to read econ papers or heavy healthcare data science papers?

Well, here's a simplified framework made up of three high-level components:

Familiarize yourself with the paper's research question and the underlying model used to address it.
Scrutinize the assumptions made within the model and consider their potential impact on the results.
Pay close attention to the parameters employed in the model. Understand their significance and how they contribute to the broader understanding of the topic.

Below, I have expanded these core components and compressed them into a mini-framework, or mini-methodology, for reviewing econ papers, or any kind of paper that uses data and models, based on what I learned from the Mixtape Sessions:

Step	Question	Example	Related Techniques/Terminology
1	What is the main research question or objective of the paper? What parameters are we interested in estimating?	Does the paper aim to determine the causal effect of X on Y? Are we interested in estimating parameters that describe how X and Y interact, such as the elasticity of demand in a supply-demand model?	Causal effect, treatment effect, outcome variable, parameters, elasticity of demand
2	How does the paper establish causal inference between the variables? Does the paper use a DiD approach? (Optional: Not all papers may use this approach)	Does the paper use a randomized controlled trial, natural experiment, or observational data to establish causality? Does the paper use a DiD approach to measure the effect of policy X on outcome Y?	Causal inference, endogeneity, exogeneity, Differences in Differences (DiD)
3	What are the instrumental variables used in the paper, and how are they justified?	Does the paper use Z as an instrument for X? Is there a clear explanation of how Z affects X but not Y, except through X?	Instrumental variables, relevance, exclusion restriction
4	Does the paper discuss any heterogeneous effects in the data?	Does the paper discuss how the effect of X on Y might differ across different groups or under different conditions?	Heterogeneous effects, Local Average Treatment Effect (LATE), Marginal Treatment Effects (MTE)
5	What parameters are being estimated, and how are they being estimated?	Does the paper aim to estimate the average treatment effect (ATE), or is it focused on local average treatment effects (LATE)? What estimators are used?	Parameters, estimands, estimators, Average Treatment Effect (ATE), Local Average Treatment Effect (LATE)
6	Does the paper use machine learning methods for IV estimation? If so, how are these methods justified and implemented?	Does the paper use a machine learning algorithm to estimate the treatment effects? Is there a clear explanation of how the algorithm was trained and validated?	Machine learning, algorithm, training, validation
7	Does the paper address the issue of weak instruments or the use of many instruments in IV estimation? If so, how are these issues addressed?	If the paper uses many instruments, does it use methods like the Lasso for instrument selection? If weak instruments are a concern, does the paper use techniques like the Stock-Yogo weak instrument test?	Weak instruments, many instruments, Lasso, Stock-Yogo weak instrument test
8	How does the paper evaluate the robustness of its findings?	Does the paper perform sensitivity analyses or robustness checks to assess the validity of its findings?	Robustness, sensitivity analysis, validity
9	What are the limitations of the study, and how does the paper address them?	Does the paper acknowledge potential issues with its methodology, such as the exclusion restriction in IV estimation, or potential biases in the machine learning algorithm?	Limitations, methodology, exclusion restriction, bias
10	Does the paper discuss the parallel trends assumption? If so, how is it addressed and tested? (Optional: This is specifically relevant for papers using a DiD approach)	Does the paper discuss the parallel trends assumption in the context of the DiD approach? Is there a clear explanation of how this assumption is tested and what the results of the test are?	Parallel trends assumption, Differences in Differences (DiD), testing parallel trends
11	Does the paper discuss the assumptions behind the model used? If so, how are these assumptions justified and tested?	Does the paper discuss any assumptions about the relationships between variables in the model? Is there a clear explanation of how these assumptions are tested and what the results of the test are?	Model assumptions, testing assumptions, instrumental variables, monotonicity assumption (one of many possible assumptions)
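Step 7 is the easiest one to sanity-check with numbers. A common first look at instrument strength is the first-stage F-statistic, with the widely quoted rule of thumb that values below about 10 are a warning sign. The sketch below is not the Stock-Yogo test itself, just the F-statistic for a single instrument, computed on simulated data. The instrument strengths, sample size, and function names are my own assumptions.

```python
import random


def first_stage_f(z, x):
    """F-statistic for H0: instrument z has no effect on x (single instrument)."""
    n = len(z)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    slope = szx / szz                      # first-stage coefficient on z
    intercept = x_bar - slope * z_bar
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    sigma2 = ssr / (n - 2)                 # residual variance
    se = (sigma2 / szz) ** 0.5             # standard error of the slope
    return (slope / se) ** 2               # with one instrument, F equals t squared


def simulate_first_stage(strength, n=2000, seed=1):
    """Simulate x = strength * z + noise; 'strength' controls instrument relevance."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [strength * zi + rng.gauss(0, 1) for zi in z]
    return z, x


strong_f = first_stage_f(*simulate_first_stage(strength=0.8))
weak_f = first_stage_f(*simulate_first_stage(strength=0.01))
print("strong instrument F:", round(strong_f, 1))
print("weak instrument F:  ", round(weak_f, 1))
```

When a paper reports a first-stage F, this is roughly the quantity to look for. A small value suggests the IV estimate may be unreliable even if the exclusion restriction is argued well.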