This section addresses options and issues relating to the selection and use of data.
Timing

The timing of evaluation design relative to project design and implementation is the most important determinant of evaluation quality. The earlier one starts, the more flexibility one has in evaluation design, including data collection. Some of the most rigorous evaluation methods, such as random assignment to treatment, cannot be used once project implementation has begun. Likewise, primary baseline data can only be collected before project implementation begins. Otherwise, baseline data must be reconstructed from existing secondary data, if suitable data can be found.

Even when a project is underway or already completed, it may still be possible to design and carry out a credible impact evaluation. Existing data may allow the evaluator to reconstruct baseline data, or to match treatment and comparison observations on pre-intervention characteristics or, if the secondary data were collected during or after project implementation, on time-invariant characteristics.

Identifying and Accessing Existing Data

Finding and accessing existing data can help regardless of the timing of the evaluation, but it becomes more critical the later an evaluation begins, since new data cannot be collected retroactively. Even well-funded evaluations planned before project implementation can benefit from identifying and exploiting useful secondary data. There is no sense in duplicating data collection that has already been done or is already under way.

To judge how useful a particular data set is likely to be, find out what indicators it contains by requesting a copy of the questionnaire used to collect the data. Questionnaires are usually shared readily, while getting access to full data sets may require more work.
Going through a questionnaire first may save the evaluation team the hassle of requesting the full data set only to find that it contains no indicators they can use.

Options for Piggybacking

Leveraging existing data collection efforts can be a very effective way to obtain useful data for a fraction of the cost of collecting it independently. Piggybacking can take two forms:

- Adding questions or modules to an existing survey questionnaire. Many countries may be willing to do this if the extra cost is covered. Some statistical institutes even have a fixed pricing structure based on the amount of content added to the questionnaire.
- Augmenting the sample to include observations needed for the evaluation. A common approach is to over-sample project areas to generate a sufficient number of treatment observations for a small or targeted program, relying on the statistical institute’s existing master sample frame for the comparison observations. Again, many agencies are willing to administer their survey beyond the master sample, provided the additional cost is covered.
Many statistical agencies are justifiably concerned about maintaining the confidentiality of the data they collect. When collaborating with outside partners to collect data, it is important to be clear up front about exactly what will be shared and delivered, and about the ownership of and access rights to the data.

Combining Data from Different Sources

Using data from more than one source is common in impact evaluations. Data from two or more sources can be integrated for an evaluation in several ways:

- Merging separate data sets into one data set
- Conducting the analysis in stages
- Using separate data sets to investigate different aspects of a multi-topic study
Merging separate data sets into one data set

Merging data from different sources into one data set relies on exploiting shared coding across data sets to match observations. For instance, if two or more data sets contain different indicators for overlapping sets of households, merging them puts all the indicators available for each household together in one data set, which can then be used for analysis.

A relatively straightforward case is that of combining rounds of panel data, where households or individuals (or ideally both) are given identification codes to facilitate matching observations across time periods. If the data collection was not carefully implemented, different rounds of panel data may be difficult to match. Even when common codes are used to identify households over time, matching individuals within households from one time period to the next can be tricky if the codes for individuals were not strictly maintained in all rounds. Matching individuals within a household using age and gender is one option, but implementing this in software like Stata is complicated: household composition inevitably changes over time, so a simple ordered listing is insufficient for matching.

In other circumstances, merging data sets becomes a question of what is feasible rather than what is ideal for the evaluation design. An evaluator would like to match observations at the level used in the evaluation, typically households or individuals. However, because individuals or households are rarely designated by standardized codes across data sets, combining data is typically done at a more aggregated level where standardized codes do exist. Geographically defined codes are usually used, but even these may not be defined consistently across the agencies that collect data. Where shared coding is not available, alternative methods of matching may be possible, but they are frequently cumbersome to implement.
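As a rough illustration of merging on shared codes, the sketch below combines two rounds of a household panel on a household identification code and then attaches village-level indicators through a geographic code. All names and values here (`hh_id` keys, `village_code`, `consumption_r1`, `has_clinic`, and the two helper functions) are hypothetical and chosen only for this example; real data sets will use their own coding schemes, and a statistical package would normally handle the merge.

```python
# Hypothetical records from two rounds of a household panel, keyed by a
# household identification code (illustrative values only).
round1 = {
    101: {"village_code": "V01", "consumption_r1": 250.0},
    102: {"village_code": "V01", "consumption_r1": 310.0},
    103: {"village_code": "V02", "consumption_r1": 190.0},
}
round2 = {
    101: {"consumption_r2": 270.0},
    103: {"consumption_r2": 205.0},
    104: {"consumption_r2": 330.0},
}
# Hypothetical community-level data keyed by a shared geographic code.
villages = {"V01": {"has_clinic": True}, "V02": {"has_clinic": False}}


def merge_rounds(first, second):
    """Merge two rounds on the household code and list unmatched codes."""
    matched = {hh: {**first[hh], **second[hh]} for hh in first.keys() & second.keys()}
    only_first = sorted(first.keys() - second.keys())
    only_second = sorted(second.keys() - first.keys())
    return matched, only_first, only_second


def attach_village_data(households, village_data):
    """Attach village-level indicators to each household via village_code."""
    return {
        hh: {**rec, **village_data.get(rec["village_code"], {})}
        for hh, rec in households.items()
    }


panel, only_r1, only_r2 = merge_rounds(round1, round2)
households = attach_village_data(round1, villages)
print(sorted(panel), only_r1, only_r2)
```

Listing the codes that appear in only one source is a simple way to see how much of the sample fails to match before any analysis is run.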
Using address, telephone number, or name as a proxy for identification codes to match households or individuals is sometimes possible, but formatting and spelling need to be exactly the same for software to recognize the match.

Conducting the analysis in stages

Another approach to combining data from different sources for an evaluation is to conduct the analysis in stages, each stage using the appropriate data set. The basic idea is to use one data set to identify and construct a comparison group to estimate the counterfactual, and then to use another data set to compare outcomes for the treatment group against the estimated counterfactual.

Take as an example an evaluation of an infrastructure provision project targeting villages lacking primary schools, public health clinics, and/or potable water sources. Imagine that the available data consist of a pre-implementation community-level infrastructure inventory used to identify and select villages lacking critical infrastructure, and a household survey collected after the project closed. In this scenario, an evaluator could potentially use the community-level data to model project selection and placement for propensity score matching of treatment and comparison villages. The household survey data could then be used to compare outcomes for similar households in matched communities.

Using separate data sets to investigate different aspects of a multi-topic study

A third approach to using separate data sets in an evaluation is to use different methods to address different questions, as data availability allows. The evaluator should use the most rigorous methods possible for each question in the evaluation, as determined by data availability.
Other questions, for which indicators are not present in all the data sources, can be analyzed using less rigorous econometric methods.

Take, for example, an evaluation of a social fund, with population census data collected before the project began and an extensive household survey collected after the project closed. For the few outcome indicators present in both the census and the survey, difference-in-differences estimation could be possible. For the majority of outcome indicators, which are present only in the survey, only cross-sectional analysis is possible.

A good practice in such a scenario is to also apply the weaker methods to the questions that are addressed using the more rigorous methods, in order to compare the results and see whether they are similar. This makes for a decent robustness check on the weaker methods. If the results are not consistent, the evaluator must carefully consider whether the results from the weaker methods may be biased for all the questions they are used to address.
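The robustness check described above can be sketched for a single indicator that is present in both data sources. This is a minimal illustration under stated assumptions: the village-level records, the field names (`treated`, `before`, `after`), and all numbers are hypothetical, and the estimates are simple differences in group means with no standard errors or control variables.

```python
from statistics import mean

# Hypothetical village-level values of one indicator observed both in the
# pre-project census ("before") and the post-project survey ("after").
records = [
    {"treated": True, "before": 40.0, "after": 55.0},
    {"treated": True, "before": 44.0, "after": 61.0},
    {"treated": False, "before": 50.0, "after": 56.0},
    {"treated": False, "before": 54.0, "after": 58.0},
]


def group_mean(rows, treated, key):
    """Mean of one field for either the treatment or the comparison group."""
    return mean(r[key] for r in rows if r["treated"] == treated)


def difference_in_differences(rows):
    """Change in the treatment group minus change in the comparison group."""
    change_treated = group_mean(rows, True, "after") - group_mean(rows, True, "before")
    change_comparison = group_mean(rows, False, "after") - group_mean(rows, False, "before")
    return change_treated - change_comparison


def cross_sectional_difference(rows):
    """Post-project gap between groups, using only the 'after' data."""
    return group_mean(rows, True, "after") - group_mean(rows, False, "after")


did = difference_in_differences(records)
cross = cross_sectional_difference(records)
print(f"difference-in-differences: {did}, cross-sectional: {cross}")
```

In this made-up data the treatment villages start from a lower baseline, so the cross-sectional estimate (1.0) is far smaller than the difference-in-differences estimate (11.0). A gap of that kind on the shared indicators would be a warning that cross-sectional results for the survey-only indicators may be biased in the same direction.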