Saturday, September 22, 2018

Census or models?

From Yale Study Finds Twice as Many Undocumented Immigrants as Previous Estimates by Mohammad Fazel-Zarandi, Jonathan S. Feinstein, and Edward H. Kaplan.

As James C. Scott has noted in Seeing Like A State, one of the principal dynamics of a modern state is the desperate need and desire to exercise greater control by having greater legibility, greater transparency. To know whether everyone is paying their taxes, you have to know how many people there are and what their tax obligation might be. To know that, you need to know in detail how they are making their money.

In a large and complex nation, that can be a challenge.

The demographics alone are challenging. Our Constitution requires a decennial census - everyone must be identified and counted, person by person. This is consequential because billions of federal dollars are distributed based on headcount. A state with a million people will get only a fraction of the disbursements of a state with ten million.

But actually getting an accurate physical headcount is nigh impossible. It is a big country, there are many people who do not wish to be counted, and there are errors in the very process itself. We know that the decennial census headcount is always wrong, but we do not know what the right number is.

There are always advocates who argue that we should not do a census but should instead rely on statistical sampling or algorithmic estimation. There are two weaknesses with the model-based estimation approach. 1) Models are only as good as the information used and the assumptions made. There are some circumstances where a model-based estimate is likely to be more accurate, but there are other cases where it is likely to be less so. 2) The more fundamental weakness in the model-based approach is its susceptibility to being gamed. Atlanta is a good example of this issue, but it is universal.

Atlanta has been a fast growing city for several decades. Growth rates have probably averaged 2-5% a year since at least the 1980s. At 5% a year, by the time the next census rolls around the city will be some 60% larger but still receiving the same distribution as it did ten years ago.
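
The compounding behind that "some 60%" figure can be checked in a few lines. This is only a sketch of the arithmetic; the helper name `cumulative_growth` is made up for illustration, and the 2% and 5% rates are simply the ends of the rough range given above.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total fractional growth after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years - 1.0


# The two ends of the rough 2-5% range, compounded over the ten years
# between one census and the next.
for rate in (0.02, 0.05):
    growth = cumulative_growth(rate, 10)
    print(f"{rate:.0%} a year for 10 years -> {growth:.0%} larger")
# 2% a year for 10 years -> 22% larger
# 5% a year for 10 years -> 63% larger
```

Even at the low end of the range, a city is more than a fifth larger by the next census, which is why the between-census updates matter so much.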

To tackle this, for many programs, the Federal Government starts with the Census number and then does annual or biannual estimate updates to the numbers based on births, deaths, inbound movements, outbound movements, housing starts, etc. For two or three decades, at the end of each decade, with the true-up of the new census, it was found that Atlanta had been providing optimistic estimates to inflate its population for the years between censuses, thereby receiving a higher distribution of Federal dollars than was warranted. The models were gamed.

This corruption shows up time and again. Most recently there has been a dispute over the death count from Hurricane Maria in Puerto Rico, which is rooted in different counting approaches. The death count is in the few dozen if you go by reported deaths and official funerals. It is three thousand if you go by the results from models.

How many people died? Between 50 (census) and 3,000 (model) is a range too wide to be useful for resource allocation or policy. Given the corrupt history of the Puerto Rican government and its inclination to make up numbers, everyone assumes the model-based number is wrong and is only being touted in order to increase the aid they might receive from the Federal Government. But the truth is that the census probably undercounts even though the model almost certainly overestimates.

From the Yale research above, we have the exact same issue in terms of the estimates of the number of illegal aliens in the US.

Generally accepted estimates put the population of undocumented immigrants in the United States at approximately 11.3 million. A new study, using mathematical modeling on a range of demographic and immigration operations data, suggests that the actual undocumented immigrant population may be more than 22 million.

Immigration is the focus of fierce political and policy debate in the United States. Among the most contentious issues is how the country should address undocumented immigrants. Like a tornado that won’t dissipate, arguments have spun around and around for years. At the center lies a fairly stable and largely unquestioned number: 11.3 million undocumented immigrants residing in the U.S. But a paper by three Yale-affiliated researchers suggests all the perceptions and arguments based on that number may have a faulty foundation; the actual population of undocumented immigrants residing in the country is much larger than that, perhaps twice as high, and has been underestimated for decades.

Using mathematical modeling on a range of demographic and immigration operations data, the researchers estimate there are 22.1 million undocumented immigrants in the United States. Even using parameters intentionally aimed at producing an extremely conservative estimate, they found a population of 16.7 million undocumented immigrants.

The results, published in PLOS ONE, surprised the authors themselves. They started with the extremely conservative model and expected the results to be well below 11.3 million.

“Our original idea was just to do a sanity check on the existing number,” says Edward Kaplan, the William N. and Marie A. Beach Professor of Operations Research at the Yale School of Management. “Instead of a number which was smaller, we got a number that was 50% higher. That caused us to scratch our heads.”

Jonathan Feinstein, the John G. Searle Professor of Economics and Management at Yale SOM, adds, “There’s a number that everybody quotes, but when you actually dig down and say, ‘What is it based on?’ You find it’s based on one very specific survey and possibly an approach that has some difficulties. So we went in and just took a very different approach.”

The 11.3 million number is extrapolated from the Census Bureau’s annual American Community Survey. “It’s been the only method used for the last three decades,” says Mohammad Fazel‐Zarandi, a senior lecturer at the MIT Sloan School of Management and formerly a postdoctoral associate and lecturer in operations at the Yale School of Management. That made the researchers curious—could they reproduce the number using a different methodology?

The approach in the new research was based on operational data, such as deportations and visa overstays, and demographic data, including death rates and immigration rates. “We combined these data using a demographic model that follows a very simple logic,” Kaplan says. “The population today is equal to the initial population plus everyone who came in minus everyone who went out. It’s that simple.”

While the logic is simple—tally the inflows and outflows over time—actually gathering, assessing, and inserting the data appropriately into a mathematical model isn’t at all simple. Because there is significant uncertainty, the results are presented as a range. After running 1,000,000 simulations of the model, the researchers’ 95% probability range is 16 million to 29 million, with 22.1 million as the mean.

Notably, the upper bound of the traditional survey approach, which also produces a range, doesn’t overlap with the lower bound of the new modeling method. “There really is some open water between these estimates,” Kaplan says. He believes that means the differences between the approaches can’t be explained by sampling variability or annual fluctuations.
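
The inflow/outflow logic Kaplan describes in the passage above can be sketched as a small simulation. To be clear, this is not the researchers' model. The function names, the uniform parameter ranges, the starting population, and the number of years are all placeholders invented for illustration, and the output says nothing about the real population. It only shows how a simple tally plus uncertain inputs produces a mean and a range rather than a single number.

```python
import random
import statistics


def simulate_population(initial, years, annual_inflow, outflow_rate):
    """One run of the tally: start + everyone who came in - everyone who went out."""
    population = initial
    for _ in range(years):
        # Outflows (removals, voluntary departures, deaths) are modeled here
        # as a fixed share of the current population; this is an assumption.
        population = population + annual_inflow - population * outflow_rate
    return population


def run_simulations(n_runs, initial, years, inflow_range, outflow_rate_range, seed=0):
    """Repeat the tally with randomly drawn inputs; return mean and a 95% range."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        inflow = rng.uniform(*inflow_range)              # hypothetical range
        outflow_rate = rng.uniform(*outflow_rate_range)  # hypothetical range
        results.append(simulate_population(initial, years, inflow, outflow_rate))
    results.sort()
    lower = results[int(0.025 * n_runs)]
    upper = results[int(0.975 * n_runs) - 1]
    return statistics.mean(results), lower, upper


# Placeholder inputs only, NOT the study's data.
mean, lower, upper = run_simulations(
    n_runs=10_000,
    initial=3.0e6,
    years=25,
    inflow_range=(0.5e6, 1.5e6),
    outflow_rate_range=(0.02, 0.06),
)
print(f"mean {mean / 1e6:.1f}M, 95% range {lower / 1e6:.1f}M to {upper / 1e6:.1f}M")
```

The point about gaming made earlier applies here too: the answer is driven entirely by the ranges fed in, so whoever picks the ranges picks the result.
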

[Figure: the simulations from the model.]

I am not surprised that the number might be larger than 11 million. In Atlanta, and indeed, across the southeast, even in rural towns, and in my travels in the Midwest and Northeast, there are whole neighborhoods in cities which used to be working class white or black and which are now Hispanic. That might simply reflect demographic shifting, as people move to the suburbs and such. Nevertheless, my visual sense has been that there has been a large increase in the Latin American population above the official numbers. But you can't trust your eyes alone. You have to trust but verify. I see the numbers. I see the population shifts. They don't seem to reconcile with one another. But I don't trust either the official numbers or my impressions. For me, it is an open question.

Its relevance has to do with social absorption rates and erosion of social trust. Compared to Europe, we are, on average, blessed with the quality of our immigrants, legal and illegal.

However, I think there is good reason to be concerned about the rate at which we are able to absorb and assimilate multiple different cultures. We are fairly unique in being a heterogeneous nation (by race, ethnicity, religion, etc.) compared to most and we are one of the few nations built on an idea of governance rather than blood and soil. We are a nation of laws and Age of Enlightenment rights and beliefs, rare in a world built on castes, blood groups, and other identities.

It takes time for people raised on identities to absorb the beliefs of human rights, checks and balances, consent of the governed, individual rights, etc. What is the highest level of non-conforming belief systems which the national system can sustain before the fabric is torn? I don't know but I have always estimated it to be about 15%, a number which we are bumping up against. If the true number of illegal immigrants is more than double our current estimate, then our total foreign-born population is probably getting well above 15%.
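
A back-of-the-envelope version of that last calculation is sketched below. The 11.3 million and 22.1 million figures come from the Yale article quoted above. The foreign-born total (about 44 million) and the US population (about 325 million) are round numbers I am assuming for illustration, not figures from the study. The sketch also assumes that the additional people are missing from both the foreign-born count and the population count, which may not be exactly right.

```python
def foreign_born_share(foreign_born, total_population, uncounted=0.0):
    """Foreign-born share, optionally adding people missed by both counts."""
    return (foreign_born + uncounted) / (total_population + uncounted)


# From the article quoted above.
ACCEPTED_UNDOCUMENTED = 11.3e6
MODEL_UNDOCUMENTED = 22.1e6

# Round figures assumed for illustration only (not from the study).
ASSUMED_FOREIGN_BORN = 44e6
ASSUMED_POPULATION = 325e6

# People the model implies are missing from the accepted estimate.
uncounted = MODEL_UNDOCUMENTED - ACCEPTED_UNDOCUMENTED

official = foreign_born_share(ASSUMED_FOREIGN_BORN, ASSUMED_POPULATION)
adjusted = foreign_born_share(ASSUMED_FOREIGN_BORN, ASSUMED_POPULATION, uncounted)

print(f"official share: {official:.1%}")  # official share: 13.5%
print(f"adjusted share: {adjusted:.1%}")  # adjusted share: 16.3%
```

Under those assumptions the share moves from a bit under 15% to a bit over it, which is the concern described above.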

It might add to the explanation of why the masses are revolting against the establishments. The masses are living in a more foreign (and sometimes antagonistic) world than are the establishment in their gated communities and private security.

Interesting study which does not resolve that which cannot be answered but which does provide evidence that the accepted number might be materially wrong.