
1.       Randomized controlled trial (RCT)

Randomly assign the sample into two groups: treated and untreated (control)

a)       Limits to the method

The method can be socially sensitive, and several conditions must hold:

Internal validity:          the estimated effect must be valid for the study sample

External validity:         the results must be generalizable beyond the sample

Full compliance:         units must actually comply with their assigned treatment status (no outside interference with the sample)

Spillovers:                    the treatment must not generate externalities that affect the randomly assigned control group

Political issues:            dividing people into treated and untreated groups may be politically sensitive, and some may object to being assigned to a group

 

b)       How to test effectiveness with a 90% confidence test

A simple t-test, where (sample 1) and (sample 2) stand for the outcome variables of the two groups:

ci (sample 1) (sample 2), level(90)

ttest (sample 1) == (sample 2), level(90)
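Equivalently, with a single outcome variable and a 0/1 assignment dummy (hypothetical names y and treated), the two-group form of the test at the 90% level is:

ttest y, by(treated) level(90)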

 

c)       Idea of randomized experiments, power of tests

Randomized experiments are the experiments that allow the greatest reliability and validity of statistical estimates of treatment effects. Randomization-based inference is especially important in experimental design and in survey sampling.

The power of a statistical test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.
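In Stata, the required sample size or the power of such a test can be sketched with the power command; the effect size, standard deviation and power level below are illustrative assumptions, not values from the text:

* sample size per group needed to detect a 0.2 SD difference in means with 80% power at the 5% level
power twomeans 0 0.2, sd(1) power(0.8)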

 

2.       Difference-in-differences (DID)

DID combines the before/after and the treated/untreated comparisons:

β = (Y treatment after – Y treatment before) – (Y control after – Y control before)

a)       Compute DID estimator for females

*Simple DID comparison using the 'ttest' command*

 

gen exptot0=exptot if year==0

bysort nh: egen exptot91=max(exptot0)

 

gen exptot1=exptot if year==1

bysort nh: egen exptot98=max(exptot1)

 

gen lexptot91=ln(1+exptot91)

gen lexptot98=ln(1+exptot98)

 

su lexptot98 if year==1 & dfmfd==1

global yt1p1=r(mean)

su lexptot91 if year==1 & dfmfd==1

global yt0p1=r(mean)

 

su lexptot98 if year==1 & dfmfd==0

global yt1p0=r(mean)

su lexptot91 if year==1 & dfmfd==0

global yt0p0=r(mean)

 

display ($yt1p1-$yt0p1)-($yt1p0-$yt0p0)

 

gen lexptot9891=lexptot98-lexptot91

* part (a) asks for females, so the relevant test is the second one, by(dfmfd)

ttest lexptot9891 if year==1, by(dmmfd)

ttest lexptot9891 if year==1, by(dfmfd)

 

b)       Implement DID with a regression

gen lnland=ln(1+hhland/100)

* y = outcome (e.g. lexptot), p = treatment dummy (e.g. dfmfd), yearp = year*p interaction;
* the coefficient on yearp is the DID estimate (nh identifies the household panel)

xtreg y year p yearp sexhead agehead educhead lnland vaccess pcirr rice wheat milk oil egg, i(nh)

 

c)       Unobserved heterogeneity and equal trend assumptions

Unobserved heterogeneity:

When the Conditional Independence Assumption (CIA) does not hold because units were selected into the programme on the basis of unmeasured characteristics that are expected to influence outcomes.

Example: more motivated teachers self-select into a training programme; since motivation is typically not observable, it cannot be introduced in the model, and thus, the matching estimator will be unable to isolate the impact of the treatment from the impact of motivation.

Hence, the usual matching estimator may be biased in the case of selection on unobservables. If longitudinal data are available, the CIA can be relaxed by simply differencing the data. In a regression framework this procedure accounts for any time-invariant unobservable characteristics that simultaneously influence the participation decision and the outcome of interest.
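With the variables already constructed in part (a), a minimal sketch of this differencing idea: regressing the differenced outcome on the treatment indicator removes any time-invariant household characteristics, and the coefficient on dfmfd is the DID estimate.

reg lexptot9891 dfmfd if year==1, robust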

Equal-trends assumption:

The outcome variation we observe for the comparison group would also hold for the treatment group if the programme had not existed. In other words, the DID approach assumes that, without the treatment, there would be no difference in the trends of the outcome variable between the two groups.

 

3.       Propensity score matching (PSM)

The propensity score is the probability of being treated given a set of covariates. Thus, we need to estimate a probabilistic model where the dependent variable is the treatment status and the independent variables are the covariates that determine the probability of being treated.
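A minimal sketch of such a model, assuming the treatment dummy dfmfd and the covariates used in part (a) below; ps_hat is just an illustrative name for the fitted probability:

* estimate the probability of treatment and store the predicted propensity score
logit dfmfd sexhead agehead educhead lnland vaccess pcirr rice wheat milk egg oil

predict double ps_hat, pr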

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained by simply comparing outcomes among units that received the treatment versus those that did not.

PSM is used in non-randomised settings to evaluate a programme. With PSM we no longer match each treated unit to an untreated unit that has exactly the same values for all observed control characteristics. Instead, for each unit in the treatment group and in the pool of untreated units we compute the probability that the unit enrols in the programme based on the observed values of its characteristics (the so-called propensity score) and implement matching using only this information.

The propensity score is a single number between 0 and 1. Once the propensity score has been computed for all units, units in the treatment group can be matched with the units in the pool of untreated units that have the closest propensity score (the "closest units"). These become the comparison group and are used to produce an estimate of the counterfactual. The difference in average outcomes between the treatment units and their matched comparison units is then the estimated impact of the programme.
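A minimal sketch of this matching step, assuming the user-written psmatch2 command is installed, the ps98 score estimated in part (a) below and the lexptot98 outcome from section 2:

* one-to-one nearest-neighbour matching on the propensity score, imposing common support
psmatch2 dfmfd, pscore(ps98) outcome(lexptot98) neighbor(1) common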

PSM relies on two main assumptions:

1) There must be a set of observed characteristics that are able to explain the dissimilarities between treated and untreated units (and therefore the individual’s decision to enrol in the programme). This is the same assumption we analysed in the previous chapter, that is the Conditional Independence Assumption (CIA). In general, the CIA is crucial for identifying the causal effect of the programme because it ensures that, although treated and untreated groups differ, these differences may be accounted for in order to reduce the selection bias.

2) The second key assumption for good matching is the existence of sufficient overlap between the observed characteristics of treated and untreated units. This is called the Common Support Condition (CSC) and formally states that for every value of the covariates X there must be a strictly positive probability of being both treated and untreated: 0 < P(T = 1 | X) < 1.

 

a)       Generate propensity score

 

tab dfmfd

 

label define treatment 1 "treated" 0 "untreated"

label values dfmfd treatment

ta dfmfd

 

pscore dfmfd sexhead agehead educhead lnland vaccess pcirr rice wheat milk egg, pscore(ps98) blockid(blockf1) comsup level(0.01) logit

cap drop ps98 blockf1

pscore dfmfd sexhead agehead educhead lnland vaccess pcirr rice wheat milk egg oil, pscore(ps98) blockid(blockf1) level(0.01) comsup logit

 

table dfmfd, c(sum ps98 min ps98 max ps98)

ta comsup dfmfd

twoway (kdensity ps98 if dfmfd==1 , color(red) xtitle(propensity score) legend(label(1 "Treated") label(2 "Untreated"))) (kdensity ps98 if  dfmfd==0, color(blue) )

 

cap drop comsup2

su ps98 if  dfmfd==1

gen comsup2=(ps98>r(min))

su comsup*

su ps98 if  dfmfd==0

replace comsup2=0 if ps98>r(max)

ta comsup2 dfmfd

 

b)       Caliper matching with radius 0.01

A caliper is a maximum propensity-score distance imposed to avoid the risk of poor matches. Caliper matching uses only the nearest neighbour within each caliper (if any), whilst radius matching uses all untreated units within the caliper (see the radius-matching sketch after the code below). If no untreated units fall within the caliper of a given treated unit, no matching can be implemented and that treated unit is dropped from the estimation sample. The smaller the caliper, the higher the probability of losing treated units, but also the better the matching in terms of similarity of propensity scores. Again, there is a bias/precision trade-off.

 

use dataset, clear

drop if p1==.

gen logit1=log((1-p1)/p1) 

sum logit1

set seed 1000

generate x=uniform()

sort x

psmatch2 aodserv, pscore(logit1) caliper(.01) noreplacement descending

sort _id

g match=id[_n1]

g treat=id if _nn==1

drop if treat==.

sum treat
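For comparison, a minimal sketch of radius matching with the same logit1 score constructed above (assuming psmatch2 is installed): all untreated units within the 0.01 caliper are used as matches rather than only the nearest one.

psmatch2 aodserv, pscore(logit1) radius caliper(.01)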

 

c)       Trade-off between bias and precision; kernel matching; balancing tests

Trade-off: the smaller the caliper, the higher the probability of losing treated units, but also the better the matching in terms of similarity of propensity scores. This is a bias/precision trade-off.

 

Kernel matching (KM): a nonparametric matching algorithm that compares the outcome of each treated unit with a weighted average of the outcomes of all nonparticipants, with higher weights assigned to the outcomes of the most similar nonparticipants (in terms of propensity score).
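A minimal sketch with psmatch2, assuming the ps98 score and the lexptot98 outcome constructed earlier; the Epanechnikov kernel and the bandwidth of 0.06 are illustrative choices:

psmatch2 dfmfd, pscore(ps98) outcome(lexptot98) kernel kerneltype(epan) bwidth(0.06) common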

 

Balancing test: checks whether the propensity score adequately balances characteristics between the treatment and comparison groups; this is another step in assessing the quality of the matching. There should be no statistically significant differences between the covariate means of the treatment and comparison groups. If these tests indicate that balance is not achieved, and there is no other (available) variable that could be added to the model, the specification of the propensity score model must be adjusted, as in the following example.

Example: if there are large mean differences in an important covariate between the treatment and comparison groups, we add the square of that variable and/or its interactions with other variables. The estimation of the propensity score, the matching procedure and the balancing test are then repeated to check for improvement in balancing performance. This process is repeated until balance is achieved.

Sometimes balance on the matched samples is not possible, regardless of the amount of adjustment efforts. A balancing test requires comparing the average characteristics of units with a similar propensity score so as to check whether these characteristics differ between treated and untreated units. A possible strategy is to split the sample of both treated and untreated units in different quantiles as if we were to implement a stratified matching; hence, the number of quantiles should be chosen accordingly with the aim of not observing significant differences in the average propensity score of the two groups in each quantile. Then, for each quantile, simple t-tests can be used to check whether there exist statistically significant differences between the covariate means of the two groups.
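A minimal sketch of such a check, assuming the pstest command that ships with psmatch2 and a preceding psmatch2 call as in the examples above:

* t-tests of covariate means between treated units and their matched comparisons, before and after matching
pstest sexhead agehead educhead lnland vaccess pcirr rice wheat milk egg oil, both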

 

4.       Regression discontinuity design (RDD) approach

Often we are interested in people around a certain threshold (e.g. older than a certain age). By looking at a narrow band of units just below and just above the cut-off point and comparing their outcomes, one can obtain a good estimate of the programme's impact. In fact, when we compare units just above and just below an exogenously given cut-off point we are in a situation very close to a randomised experiment: people who end up just above (or just below) the threshold become eligible (or not eligible) for the programme essentially by chance.

 

a)       Local polynomial regressions

Fit two separate lines, one to the left and one to the right of the threshold, using a kernel smoother (local polynomial regression).

 

sum age unemployment_duration

table age50, c(mean unemployment_duration) row

tab age

 

egen age_bins = cut(age), at(46(0.25)54)

tab age_bins

su age_bins

 

bysort age_bins: egen mean_unempdur = mean(unemployment_duration)

 

sum mean_unempdur

 

scatter mean_unempdur age_bins  || lfit mean_unempdur age_bins if age_bins < 50 ///
|| lfit mean_unempdur age_bins if age_bins >= 50 , xlabel(46(1)54) ylabel(0(10)40) xline(50)

 

gen t=(age >=50)

tab t

 

gen agesc =age - 50

generate t_agesc = t*agesc

 

forvalues i=2/4 {

        cap generate agesc`i' = agesc^`i'

        cap generate t_agesc`i' = t*agesc`i'

        }

       

regress unemployment_duration t agesc t_agesc, robust

regress unemployment_duration t agesc agesc2 t_agesc t_agesc2 , robust

regress unemployment_duration t agesc agesc2 agesc3 t_agesc t_agesc2 t_agesc3, robust

regress unemployment_duration t agesc agesc2 agesc3 agesc4 t_agesc t_agesc2 t_agesc3 t_agesc4, robust

 

predict fitq4

 

cap drop output*

lpoly unemployment_duration age if age<50,  kernel(epan2) generate(output0) at(age) nograph

lpoly unemployment_duration age if age>=50,  kernel(epan2) generate(output1) at(age) nograph

 

sum output0 if age>=49 & age <50

scalar outcome0 =r(mean)

 

sum output1 if age>=50 & age<51

scalar outcome1 =r(mean)

 

scalar diff_outcome= outcome1-outcome0

display diff_outcome

 

twoway scatter mean_unempdur age_bins || line output0 age if age<50 || line output1 age if age>=50, xlabel(46(1)54) ylabel(5(10)40) xline(50)

 

scatter mean_unempdur age_bins  || lfit mean_unempdur age_bins if age_bins < 50  ///
|| lfit mean_unempdur age_bins if age_bins >= 50 ///
|| qfit mean_unempdur age_bins if age_bins < 50  ///
|| qfit mean_unempdur age_bins if age_bins >= 50 ///
|| line output0 age if age_bins < 50             ///
|| line output1 age if age_bins>=50, xlabel(46(1)54) ylabel(0(10)40) xline(50)

 

 

b)       Estimated effect

The estimated effect is the jump in the smoothed outcome at the cut-off, already computed above: diff_outcome = outcome1 - outcome0, i.e. the difference between the local-polynomial predictions just above the threshold (ages 50 to 51) and just below it (ages 49 to 50).

 

c)       Fuzzy RDD, local polynomial regression and SEs

Fuzzy Regression Discontinuity Design (RDD)

Some people above the threshold do not get treated and some people below the threshold do get treated. This occurs when eligibility rules are not strictly followed or when people can migrate between the analysed areas. In these situations there is no sharp switch of the treatment status (from 0 to 1 for everybody) at the cut-off point, but rather a sudden jump in the conditional mean of the treatment status around the cut-off point. This RD framework is called a Fuzzy RD Design: the discontinuity is stochastic ("fuzzy"), and instead of measuring differences in outcomes exactly at the threshold, the impact estimator measures the difference in a neighbourhood around it. In the sharp version, the jump in the treatment status at the threshold is exactly one (from 0 to 1), so the impact estimator is simply the jump in average outcomes at the cut-off point: the average outcome for individuals just above the threshold minus the average outcome for individuals just below it. In the fuzzy version of the RDD, the jump in outcomes is caused by a jump in the treatment status that need not be one, and the effect is given by the jump in average outcomes divided by the jump in the average treatment status around the cut-off point.
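A minimal sketch of the fuzzy estimator with the variables constructed in part (a), assuming a hypothetical take-up dummy treated that is not identical to the eligibility indicator t: crossing the threshold (t) instruments actual treatment in a two-stage least squares regression.

* agesc and t_agesc control for the running variable on each side of the cut-off;
* the +/- 2 year bandwidth is an arbitrary illustrative choice
ivregress 2sls unemployment_duration agesc t_agesc (treated = t) if abs(agesc) < 2, robust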

 

The advantages of the RD method are:

(1) that it yields an unbiased estimate of treatment effect at the discontinuity;

(2) that it can take advantage of known rules for assigning people to the program;

(3) that a group of eligible individuals need not be excluded from treatment.

 

The concerns with RD are:

(1) that it produces local average treatment effects that are not always generalizable;

(2) that the effect is estimated at the discontinuity, so, generally, fewer observations exist than in a randomized experiment with the same sample size.

 

Local polynomial regression: a nonparametric, kernel-weighted regression fitted separately on each side of the cut-off (as done with the lpoly commands above); the RD estimate is the difference between the fitted values just above and just below the threshold.

SE: the lpoly-based jump computed above comes without an analytic standard error; it can be obtained, for example, by bootstrapping the estimate (see the sketch below) or with robust bias-corrected RD methods.
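One way to attach a standard error to the local-polynomial jump, sketched under the assumption that the same unemployment_duration/age data are in memory: wrap the estimate in a small r-class program and bootstrap it (the number of replications is arbitrary). The community-contributed rdrobust command (ssc install rdrobust) is an alternative that reports robust bias-corrected standard errors directly.

cap program drop rd_jump
program define rd_jump, rclass
    * re-estimate the two local polynomial fits and the jump at age 50 on each bootstrap sample
    cap drop out0 out1
    lpoly unemployment_duration age if age<50,  kernel(epan2) generate(out0) at(age) nograph
    lpoly unemployment_duration age if age>=50, kernel(epan2) generate(out1) at(age) nograph
    su out0 if age>=49 & age<50, meanonly
    local y0 = r(mean)
    su out1 if age>=50 & age<51, meanonly
    return scalar jump = r(mean) - `y0'
end

bootstrap jump=r(jump), reps(200): rd_jump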

 

 

5.       Instrumental variable approach

The IV approach can be useful because we can still identify a treatment effect under selection on unobservables. To introduce the IV approach, assume a very simple data generating process (DGP) for observed wages.

Instrumental variable methods allow consistent estimation when the explanatory variables (covariates) are correlated with the error terms of a regression relationship. Such correlation may occur when the dependent variable causes at least one of the covariates ("reverse" causation), when there are relevant explanatory variables which are omitted from the model, or when the covariates are subject to measurement error. In this situation, ordinary linear regression generally produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation and is correlated with the endogenous explanatory variables, conditional on the other covariates. In linear models, there are two main requirements for using an IV:

·         The instrument must be correlated with the endogenous explanatory variables, conditional on the other covariates.

·         The instrument cannot be correlated with the error term in the explanatory equation, that is, the instrument cannot suffer from the same problem as the original predicting variable.

 

 

a)       IV (Wald) estimator by hand: cov(y,z) / cov(t,z)

corr lwage nearc4, cov

global cov_yz=r(cov_12)

 

corr educ nearc4, cov

global cov_tz=r(cov_12)

 

display $cov_yz/$cov_tz

 

b)       IV
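A minimal two-stage least squares sketch, assuming the same lwage, educ and nearc4 variables as in part (a): proximity to a four-year college (nearc4) instruments years of education.

ivregress 2sls lwage (educ = nearc4), robust

* first-stage diagnostics: is the instrument sufficiently correlated with educ?
estat firststage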

 

c)       LATE             

The standard IV estimator can recover local average treatment effects (LATE) rather than average treatment effects (ATE). Imbens and Angrist (1994) demonstrate that the linear IV estimate can be interpreted under weak conditions as a weighted average of local average treatment effects, where the weights depend on the elasticity of the endogenous regressor to changes in the instrumental variables. Roughly, that means that the effect of a variable is only revealed for the subpopulations affected by the observed changes in the instruments, and that subpopulations which respond most to changes in the instruments will have the largest effects on the magnitude of the IV estimate.

For example, if a researcher uses presence of a land-grant college as an instrument for college education in an earnings regression, she identifies the effect of college on earnings in the subpopulation which would obtain a college degree if a college is present but which would not obtain a degree if a college is not present. This empirical approach does not, without further assumptions, tell the researcher anything about the effect of college among people who would either always or never get a college degree regardless of whether a local college exists.

 
