Laveesh Bhandari: Doing more with less

Last Updated: Wed, Aug 24, 2011 04:20 hrs

From sampling to surveying to estimating, the National Sample Survey Organisation (NSSO) continues to use methods that are consistent across time. However, the economic structure is changing and so is the proclivity of Indian households to respond to surveys. Methods that were developed in the past will not work as well now, and these methods need to change. The changes will require innovative practices that will need to be developed not just in research institutions such as the Indian Statistical Institute, but also within the Central Statistical Organisation and NSSO.

Take sampling, which requires the survey agency to randomly choose from all households in the country. But in an era where temporary and permanent migration is rampant, the base data need to be updated very frequently. Such updated data do not exist all the time. The Census develops a universal list every 10 years that gets dated even before it is released. The Election Commission’s list is supposed to be close to universal, but in practice rarely is. Unique Identification (UID) will have universal coverage after many years. In the past when the numbers were smaller, it was easier to overlook this problem, but now it is much more serious.

Are there solutions? Yes, there are, and we need to identify such possibilities and develop new methods that incorporate them. Take remote sensing data, which can help us identify where new habitations are coming up on a real-time basis. The triangulation of mobile usage can help us identify the major centres of mobile usage and consequently human presence. Admittedly, methods may not currently exist to use such information, but the tools for collecting such information do exist, and new methods can be developed.

The need for a lot of information has led to a very large questionnaire that can take hours to answer, and consequently many households refuse. Detailed item-by-item queries running into thousands are impossible for anyone to recall and respond to. Respondents, rich or poor, who have other more important things to do, will obviously refuse or fudge. Some researchers reward households for taking the time to respond. Yet other methods involve multiple questionnaires, with one set of queries asked of all households and other questions asked of only some households. Econometric methods can then be used to help generate a common set of estimates. These are just some among many experiments that need to be tried.

Policy-makers love the five-yearly large sample surveys that have greater than 100,000 respondents. The annual “small sample” surveys of as many as 40,000 or 50,000 respondents are rarely used. Why? Because 50,000 households is considered a small sample. Time and again, basic statistics has shown that you don’t need such a large sample. And increasing sample size beyond a few thousand does not generate significantly better estimates. People who have directly managed large surveys will tell you that the larger the sample size, the tougher it is to maintain quality. Large surveys require large teams, greater monitoring and control, and there are control losses across each hierarchical level. In other words, large samples typically have lower data quality than small samples when you are in the range of tens of thousands. (Of course we cannot obtain disaggregated estimates, for instance, at the sub-state level, but currently we have a problem at the all-India level. Let’s solve that problem first.)
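
The arithmetic behind the sample-size point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats the survey as a simple random sample and takes a proportion near 50 per cent, which is the worst case for sampling error. Actual NSSO surveys use a stratified multi-stage design, so real margins would differ, but the square-root pattern would still hold. The function name and the sample sizes shown are illustrative choices, not anything taken from NSSO practice.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion.

    Assumes simple random sampling (an assumption made for illustration).
    n: sample size, p: true proportion, z: normal critical value.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Compare a few thousand respondents with the "small" and "large" samples.
for n in (1_000, 5_000, 50_000, 100_000):
    moe_points = 100 * margin_of_error(n)
    print(f"n = {n:>7,}: +/- {moe_points:.2f} percentage points")
```

Because the margin shrinks only with the square root of the sample size, doubling the sample from 50,000 to 100,000 narrows it by a small fraction of a percentage point, a gain that can easily be outweighed by the quality losses described above.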

In other words, we should try to reduce the sample size as much as is possible. And the resources so saved need to be put into ensuring better quality sampling, questionnaires, responses and methods to convert those responses into estimates.

To underscore the same point, currently the small sample surveys generate results that vary greatly year on year, poverty being one example. That is to be expected. There are millions of people in very close vicinity of the poverty line, and so small changes in the poverty line can have a very large impact on the estimated number of poor. The yearly fluctuation in estimates of poverty is natural and not necessarily a flaw in the underlying data-generating process. But even if there is a problem in that process, it is that process that needs to be strengthened.
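
The sensitivity of the poverty count to the poverty line can be illustrated with a toy calculation. Every number here is hypothetical: the sketch assumes monthly per-capita consumption follows a lognormal distribution with a median of 1,000 rupees, places the poverty line at 700 rupees, and uses a round population of 1.2 billion. None of these figures are official; they only show the mechanism.

```python
import math
from statistics import NormalDist

# All parameters below are hypothetical, chosen only for illustration.
MEDIAN_CONSUMPTION = 1000.0   # assumed median monthly consumption (rupees)
SIGMA = 0.5                   # assumed spread of log-consumption
POPULATION = 1.2e9            # round figure for the population


def headcount_ratio(poverty_line):
    """Share of people below the line under the assumed lognormal model."""
    z = (math.log(poverty_line) - math.log(MEDIAN_CONSUMPTION)) / SIGMA
    return NormalDist().cdf(z)


base_line = 700.0
shifted_line = base_line * 1.05   # a 5 per cent shift in the line

base = headcount_ratio(base_line)
shifted = headcount_ratio(shifted_line)

print(f"Headcount ratio at {base_line:.0f}: {100 * base:.1f}%")
print(f"Headcount ratio at {shifted_line:.0f}: {100 * shifted:.1f}%")
print(f"Extra people counted as poor: {(shifted - base) * POPULATION / 1e6:.0f} million")
```

Under these assumed parameters, a shift of only 5 per cent in the line moves tens of millions of people across it, which is why year-on-year swings in poverty estimates need not signal bad data.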

A related aspect is on price index and inflation estimates. Good and timely inflation estimates are required not just for monetary policy purposes but also for estimating poverty. Collection of price data of a predefined set of commodities is among the easiest data collection tasks. But we have a large number of items that are not updated regularly, there are missing data, and reported prices do not appear to be in line with the ruling prices. In short, the problem is the same — we need to invest in control and monitoring of this system.

But there is an additional problem. The government already collects masses of information that is just not used. The ministry of corporate affairs collects company financials, the ministry of commerce collects information on exports and imports but only releases highly aggregated numbers many months after collection, the Provident Fund organisation collects data on employment, the Reserve Bank of India on credit and deposits and so on. But these data are either not used, or released with such a major time lag that not much can be done to feed into policy. The Census also makes it impossible for unconnected researchers to access its raw data. All of this data can help cross-check, calibrate, and fine-tune the data being generated on the Indian economy by other arms of the government.

Finally, the data collection machinery needs to go back to its roots and reconnect with the era where innovation ruled and highly structured administrative mechanisms had not killed initiative. The best minds entered such organisations, were mentored, and they created something unlike anywhere else in the world. We can do it now if we could do it in the past.

Concluded. The first part in this series, “Wanted: New ways to figure the facts”, appeared on August 6, and the second part, “Mis-reading the numbers”, appeared on August 13.
