I'm currently working on a fascinating project to test whether privately owned sanitation facilities improve access to sanitation compared with government-operated ones. Like all non-laboratory experiments, it's a bit imperfect. Here's what we've got:
1) Treatment villages started at different times - ranging from 2008 to 2013
2) We have sanitation access data, monthly, from 2008 to 2014
3) We have identifying information and a plethora of demographics, descriptors, and the like.
So, we are going with propensity score matching combined with a difference-in-differences model. What that means is we will have a treatment and a control group, we will match the two on multiple characteristics, and then test whether the change in sanitation access over time differs by a statistically significant amount between treatment and control villages.
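To make the two steps concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the village IDs, the propensity scores (which in practice would be estimated from the demographics, for example with a logistic regression), and the before/after access rates. It uses one-to-one nearest-neighbour matching with replacement, which is only one of several matching schemes.

```python
from statistics import mean

# Hypothetical villages: (id, treated?, propensity score, access before, access after).
# All numbers are invented; real scores would be estimated from the covariates.
villages = [
    ("T1", True, 0.62, 0.40, 0.55),
    ("T2", True, 0.35, 0.30, 0.50),
    ("C1", False, 0.60, 0.42, 0.47),
    ("C2", False, 0.33, 0.28, 0.33),
    ("C3", False, 0.10, 0.20, 0.22),
]


def match_nearest(rows):
    """Pair each treated village with the control whose score is closest.

    Matching is done with replacement, so a control can be reused.
    """
    treated = [v for v in rows if v[1]]
    controls = [v for v in rows if not v[1]]
    return [(t, min(controls, key=lambda c: abs(c[2] - t[2]))) for t in treated]


def did_estimate(pairs):
    """Average of (treated change) minus (matched control change)."""
    return mean((t[4] - t[3]) - (c[4] - c[3]) for t, c in pairs)


pairs = match_nearest(villages)
effect = did_estimate(pairs)
print([(t[0], c[0]) for t, c in pairs])
print(round(effect, 3))
```

The sketch only produces a point estimate. Judging statistical significance would need standard errors on top of this, which are left out here.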
Here's the issue we are running into. Since we have staggered start times, we don't have a clean time 0 (as it were). We thought of simply getting rid of chronological time and lining up the treatment villages by time since their own start: time 0 at each village's start, then time 1, time 2, and so on (measurements taken quarterly).
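The re-indexing idea can be sketched as follows. The helper names `quarter_index` and `to_event_time`, the two villages, and their access figures are all hypothetical, and the sketch assumes each observation is labelled with a calendar (year, quarter).

```python
def quarter_index(year, quarter):
    """Map a calendar (year, quarter) to a running integer, so that
    subtracting two indices counts the quarters between them."""
    return year * 4 + (quarter - 1)


def to_event_time(observations, start):
    """Re-index one village's observations relative to its own start quarter.

    observations: list of ((year, quarter), access) tuples
    start: (year, quarter) in which treatment began in that village
    Returns {event_time: access}; 0 is the start quarter and negative
    values are pre-treatment quarters.
    """
    t0 = quarter_index(*start)
    return {quarter_index(*when) - t0: access for when, access in observations}


# Hypothetical treatment villages with different start dates.
village_a = [((2008, 1), 0.30), ((2008, 2), 0.31), ((2008, 3), 0.38)]
village_b = [((2012, 4), 0.45), ((2013, 1), 0.46), ((2013, 2), 0.52)]

aligned_a = to_event_time(village_a, start=(2008, 2))
aligned_b = to_event_time(village_b, start=(2013, 1))
print(aligned_a)
print(aligned_b)
```

After re-indexing, both villages share the keys -1, 0 and 1 even though their calendar dates are years apart. Note that `to_event_time` needs a `start` argument, and a control village has none to supply.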
That leaves us with the following problem: how do we then pair the treatment villages with control villages, since the controls have no time 0?
The answer...as soon as I figure it out :)