(1) do not make absolutist statements without knowing the nature of the data; (2) do not abuse statistical terminology; (3) do not assert a conspiracy is in place just because the data do not conform to your preferred narrative.
First, consider a comment on the Hurricane Maria death toll:
This [assertion that thousands of American citizens have died] is categorically false, Menzie. Excess deaths in PR through year end, those recorded by the Statistics Office, numbered only 654. Most of these occurred in the last ten days of September and the whole of October. While the power outages there were exacerbated by the state ownership of PR’s utility, a large portion of the excess deaths would likely have occurred regardless, given the terrain and the strength of the hurricane. Thus, perhaps 300-400 of the excess deaths would have occurred regardless of steps anyone could have made to fix the power supply. The remainder can be attributed essentially to the state ownership of the power utility.
I would note that excess deaths fell by half in December. Thus, the data suggests that the hurricane accelerated the deaths of ill and dying people, rather than killing them outright. I would expect the excess deaths at a year horizon (through, say, Oct. 1, 2018) to total perhaps 200-400. Still a notable number, but certainly not 4,600.
See the analysis: https://www.princetonpolicy.com/ppa-blog/2018/5/30/reports-of-death-in-puerto-rico-are-wildly-exaggerated
I would note that the official death toll is 2,975, per the GWU report commissioned by the Commonwealth of Puerto Rico; see discussion of estimates here.
Second, a 2018 post regarding uncertainty in statistical inference.
Mr. Steven Kopits takes issue with the Harvard School of Public Health-led study’s point estimate (4645) and confidence interval (793, 8498) for Puerto Rico excess fatalities post-Maria thusly:
Does Harvard stand behind the study, or not?
That is, does Harvard SPH believe that the central estimate of excess deaths to 12/31 is 4645, or not? Does it stand behind the confidence interval, or not? Is there still a 50+ probably that the death toll comes in over 4600? If there is, then the people of PR need to start looking for the 3,250 missing or the press needs to assume PR authorities are lying. Those are the implied action items.
Or should we just take whatever number HSPH publishes in the future and divide by 3 to get a realistic estimate of actual?
Let’s show a detail of the graph previously displayed (in this post):
Figure 1: Estimates from Santos-Lozada and Jeffrey Howard (Nov. 2017) for September and October (calculated as difference of midpoint estimates), and Nishant Kishore et al. (May 2018) for December 2017 (blue triangles), and Roberto Rivera and Wolfgang Rolke (Feb. 2018) (red square), and Santos-Lozada estimate based on administrative data released 6/1 (large dark blue triangle), end-of-month figures, all on log scale. + indicates upper and lower bounds for 95% confidence intervals. Orange triangle is Steven Kopits estimate for year-end as of June 4. Cumulative figure for Santos-Lozada and Howard October figure author’s calculations based on reported monthly figures.
The middle paragraph (highlighted red) shows a misunderstanding of what a confidence interval is. The true parameter is either in or not in the confidence interval. Rather, this would be a better characterization of a 95% CI:
“Were this procedure to be repeated on numerous samples, the fraction of calculated confidence intervals (which would differ for each sample) that encompass the true population parameter would tend toward 95%.”
In other words, it is a mistake to say there should be a 50% probability that the actual number will be above the point estimate. But that is exactly what Mr. Kopits believes a confidence interval means. He is in this regard incorrect. From PolitiFact:
University of Puerto Rico statistician Roberto Rivera, who along with colleague Wolfgang Rolke used death certificates to estimate a much lower death count, said that indirect estimates should be interpreted with care.
“Note that according to the study the true number of deaths due to Maria can be any number between 793 and 8,498: 4,645 is not more likely than any other value in the range,” Rivera said.
Once again, I think it best that those who wish to comment on estimates should be familiar with statistical concepts.
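The repeated-sampling reading of a 95% CI quoted above can be illustrated with a small simulation. This is only a sketch under assumptions that have nothing to do with the Puerto Rico data: the population is taken to be normal, the "true" mean, standard deviation, and sample size are made-up illustrative values, and the interval is the textbook mean ± 1.96 standard errors. The function names are hypothetical.

```python
import random
import statistics


def ci_covers(true_mean, sigma, n, rng, z=1.96):
    """Draw one sample, build a 95% CI for the mean, report whether it covers the truth."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    point_estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / n ** 0.5
    lower = point_estimate - z * std_error
    upper = point_estimate + z * std_error
    # For any single sample the answer is simply True or False:
    # the fixed parameter is either inside this interval or it is not.
    return lower <= true_mean <= upper


def coverage_rate(trials=10_000, true_mean=100.0, sigma=20.0, n=100, seed=0):
    """Share of intervals, across repeated samples, that contain the true mean.

    true_mean, sigma and n are illustrative assumptions, not estimates of anything.
    """
    rng = random.Random(seed)
    hits = sum(ci_covers(true_mean, sigma, n, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"Share of intervals covering the true value: {coverage_rate():.3f}")
```

The 95% figure describes the long-run behavior of the procedure across many samples. The simulation says nothing about the probability that the true value lies above or below the point estimate in any one realized interval, which is the inference being drawn in the comment above.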
Third, here is an example of data paranoia, from a recent post.
Reader Steve Kopits writes about the debate over employment numbers:
At the same time, I thought it possible that both surveys were in fact correct, but garbled with the effect of the recovery from the suppression, thereby creating misleading impressions because we were misinterpreting the data. That still seems possible, though I’ve read that others think the CES was manipulated to provide a more rosy picture heading into the election.
This statement joins a long pile of such allegations, e.g., Senator Barrasso, Jack Welch, former Rep. Allen West, Zerohedge, Mick Mulvaney, among others. All I can say is that if there was a conspiracy, they didn’t do a very good job. With the benefit of the January benchmark revision, we can update our assessment of how badly the purported conspirators performed their job.
Figure 1: Nonfarm payroll employment in January 2023 release (red), in October 2022 release (blue), in 000’s, s.a. Source: BLS via FRED.
Now, it may eventually turn out (after another benchmark revision, the results of which will be released in February 2024) that Q2 NFP was lower than indicated in the CES. But for purposes of deceiving the electorate in November 2022, this seems like a lousy way of doing it.
In any case, before people start crying that the data are manipulated, I wish they would read the BLS technical notes on (1) revisions and mean absolute revisions, (2) benchmark revisions, (3) the calculation of seasonal adjustment factors, (4) the application of population controls in the CPS. Before they start citing the various series, I wish they understood the informational content (relative to business cycle fluctuations) of the CPS employment series vs. that of the CES employment series. That understanding can be obtained by reading works by people who understand the characteristics of the macro data (Furman (2016); CEA (2017); Goto et al. (2021)).
From a sociological perspective, I do wonder why conspiracy theories are so attractive to some individuals. Here’s a Scientific American article laying out some of the character traits that are associated with adherence to conspiracy theories.