In the New York Times, Gary Marcus and Ernest Davis examine the big claims being made for the big data revolution:
Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence computer program in the last 20 years, from Google’s search engine to the I.B.M. Jeopardy! champion Watson, has involved the substantial crunching of large bodies of data. But precisely because of its newfound popularity and growing use, we need to be levelheaded about what big data can — and can’t — do.
The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.
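The point is easy to demonstrate: any two series that happen to trend the same way over the same period will correlate strongly, whatever the causal story. A minimal sketch, using illustrative numbers (not the actual murder-rate or browser-share data):

```python
# Pearson correlation of two series that merely share a downward trend.
# The figures below are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

murder_rate = [5.7, 5.6, 5.4, 5.0, 4.8, 4.7]  # hypothetical, per 100k, 2006-2011
ie_share    = [80, 75, 68, 62, 55, 45]         # hypothetical market-share %

r = pearson(murder_rate, ie_share)
print(round(r, 3))  # strongly positive, despite no causal link
```

Two declining series will show a near-perfect correlation no matter what drives each of them, which is exactly why the correlation alone tells us nothing about meaning.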
Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement. Molecular biologists, for example, would very much like to be able to infer the three-dimensional structure of proteins from their underlying DNA sequence, and scientists working on the problem use big data as one tool among many. But no scientist thinks you can solve this problem by crunching data alone, no matter how powerful the statistical analysis; you will always need to start with an analysis that relies on an understanding of physics and biochemistry.
Statistician-to-the-stars Nate Silver can shrug off attacks from Republicans over his 2012 electoral forecast or from Democrats unhappy with his latest forecast for the 2014 mid-terms, but he’s finding himself under attack from an unexpected quarter right now:
Ever wondered how it would feel to be dropped from a helicopter into a swirling mass of crazed, genetically modified oceanic whitetip sharks in the middle of a USS-Indianapolis-style feeding frenzy?
Just ask Nate Silver. He’s been living the nightmare all week – ever since he had the temerity to appoint a half-way skeptical scientist as resident climate expert at his “data-driven” journalism site, FiveThirtyEight.
Silver has confessed to The Daily Show that he can handle the attacks from Paul Krugman (“frivolous”), from his ex-New York Times colleagues, and from Democrats disappointed with his Senate forecasts. But what has truly spooked this otherwise fearless seeker-after-truth, apparently, is the self-righteous rage from the True Believers in Al Gore’s Church of Climate Change.
“We don’t pay that much attention to what media critics say, but that was a piece where we had 80 percent of our commenters weigh in negatively, so we’re commissioning a rebuttal to that piece,” said Silver. “We listen to the people who actually give us legs.”
The piece in question was the debut by his resident climate expert, Roger Pielke, Jr., arguing that there was no evidence to support claims by alarmists that “extreme weather events” are on the increase and doing more damage than ever before. Pielke himself is a “luke-warmer” – that is, he believes that mankind is contributing to global warming but is not yet convinced that this contribution will be catastrophic. But neither his scientific bona fides (he was Director of the Center for Science and Technology Policy Research at the University of Colorado Boulder) nor his measured, fact-based delivery was enough to satisfy the ravening green-lust of FiveThirtyEight’s mainly liberal readership.
Maggie McNeill explains why the “sex trafficking” meme has been so relentlessly pushed in the media for the last few years:
Imagine a study of the alcohol industry which interviewed not a single brewer, wine expert, liquor store owner or drinker, but instead relied solely on the statements of ATF agents, dry-county politicians and members of Alcoholics Anonymous and Mothers Against Drunk Driving. Or how about a report on restaurants which treated the opinions of failed hot dog stand operators as the basis for broad statements about every kind of food business from convenience stores to food trucks to McDonald’s to five-star restaurants?
You’d probably surmise that this sort of research would be biased and one-sided to the point of unreliable. And you’d be correct. But change the topic to sex work, and such methods are not only the norm, they’re accepted uncritically by the media and the majority of those who read the resulting studies. In fact, many of those who represent themselves as sex work researchers don’t even try to get good data. They simply present their opinions as fact, occasionally bolstered by pseudo-studies designed to produce pre-determined results. Well-known and easily-contacted sex workers are rarely consulted. There’s no peer review. And when sex workers are consulted at all, they’re recruited from jails and substance abuse programs, resulting in a sample skewed heavily toward the desperate, the disadvantaged and the marginalized.
This sort of statistical malpractice has always been typical of prostitution research. But the incentive to produce it has dramatically increased in the past decade, thanks to a media-fueled moral panic over sex trafficking. Sex-work prohibitionists have long seen trafficking and sex slavery as a useful Trojan horse. In its 2010 “national action plan,” for example, the activist group Demand Abolition writes, “Framing the Campaign’s key target as sexual slavery might garner more support and less resistance, while framing the Campaign as combating prostitution may be less likely to mobilize similar levels of support and to stimulate stronger opposition.”
Emma Pierson does a bit of statistical analysis of some of Shakespeare’s plays and discovers that some of the play names are rather misleading, at least in terms of romantic dialogue:
More than 400 years after Shakespeare wrote it, we can now say that Romeo and Juliet has the wrong name. Perhaps the play should be called Juliet and Her Nurse, which isn’t nearly as sexy, or Romeo and Benvolio, which has a whole different connotation.
I discovered this by writing a computer program to count how many lines each pair of characters in Romeo and Juliet spoke to each other, with the expectation that the lovers in the greatest love story of all time would speak more than any other pair. I wanted Romeo and Juliet to end up together — if they couldn’t in the play, at least they could in my analysis — but the math paid no heed to my desires. Juliet speaks more to her nurse than she does to Romeo; Romeo speaks more to Benvolio than he does to Juliet. Romeo gets a larger share of attention from his friends (Benvolio and Mercutio) and even his enemies (Tybalt) than he does from Juliet; Juliet gets a larger share of attention from her nurse and her mother than she does from Romeo. The two appear together in only five scenes out of 25. We all knew that this wasn’t a play predicated on deep interactions between the two protagonists, but still.
I’m blaming Romeo for this lack of communication. Juliet speaks 155 lines to him, and he speaks only 101 to her. His reticence toward Juliet is particularly inexcusable when you consider that Romeo spends more time talking than anyone else in the play. (He spends only one-sixth of his time in conversation with the supposed love of his life.) One might be tempted to blame this on the nature of the plot; of course the lovers have no chance to converse, kept apart as they are by the loathing of their families! But when I analyzed the script of a modern adaptation of Romeo and Juliet — West Side Story — I found that Tony and Maria interacted more in the script than did any other pair.
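The counting method Pierson describes is simple to sketch: attribute each speech to its speaker, decide whom it is addressed to, and credit the line count to that pair. A toy version, assuming a pre-tagged script (the line counts below are invented; in a real analysis, deciding whom each speech addresses is the hard, manual part):

```python
from collections import Counter

# Each entry: (speaker, addressee, number_of_lines). Figures are
# illustrative only, not Pierson's actual counts.
speeches = [
    ("Juliet", "Nurse", 12),
    ("Nurse", "Juliet", 9),
    ("Juliet", "Romeo", 7),
    ("Romeo", "Juliet", 5),
    ("Romeo", "Benvolio", 10),
]

pair_lines = Counter()
for speaker, addressee, lines in speeches:
    # Treat the pair as unordered, so A-to-B and B-to-A accumulate together.
    pair_lines[frozenset((speaker, addressee))] += lines

for pair, total in pair_lines.most_common():
    print(sorted(pair), total)
```

With these made-up numbers, Juliet and her nurse come out on top — the same shape of result Pierson reports for the real script.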
All this got me thinking: Do any of Shakespeare’s lovers actually, you know, talk to each other? If Romeo and Juliet don’t, what hope do the rest of them have?
Update, 28 March: Chateau Heartiste says that this study shows that pick-up artists and “game” practitioners are right and also proves that “Everything important you need to know about men and women you can find in the works of Shakespeare”.
Tim Worstall pokes fun at a recent Oxfam report that claims that Britain’s five richest families own more than the bottom 20% of the population:
I read this and thought, “well, yes, this is obvious and what the hell’s it got to do with increasing inequality?” Of course Gerald Grosvenor (aka Duke of Westminster) has more wealth than the bottom 10 per cent of the country put together. It’s obvious that the top five families will have more than the bottom 20 per cent of all Britons. Do they think we all just got off the turnip truck or something?
They’ve also managed to entirely screw up the statistic they devised themselves by missing the point that if you’ve no debts and a £10 note then you’ve got more wealth than the bottom 10 or 20 per cent of the population has in aggregate. The bottom levels of our society have negative wealth.
Given what we classify as wealth, the poor have no assets at all. Property, financial assets (stocks, bonds etc), private sector pension plans, these are all pretty obviously wealth.
But then the state pension is also wealth: it’s a promise of a future stream of income. That is indeed wealth just as much as a share certificate or private pension is. But we don’t count that state pension as wealth in these sorts of calculations.
The right to live in a council house at a subsidised rent for the rest of your life is wealth, but that’s not counted either. Hell, the fact that we live in a country with a welfare system is a form of wealth — but we still don’t count that.
Doing this has been called (not by me, originally anyway) committing Worstall’s Fallacy. Failing to take account of the things we already do to correct a problem in arguing that more must be done to correct said problem. We already redistribute wealth by taxing the rich to provide pensions, housing, free education (only until 18 these days) and so on to people who could not otherwise afford them. But when bemoaning the amount of inequality that clearly cries out for more redistribution, we fail to note how much we’re already doing.
So Oxfam are improperly accounting for wealth and they’ve also missed the point that, given the existence of possible negative wealth, then of course one person or another in the UK will have more wealth than the entire lowest swathe.
David Friedman is an economist, so of course he doesn’t claim to be a climate scientist. He can, however, do math and examine numerical evidence … which doesn’t seem to support the most recent explanation for the pause in global warming:
One claim I have repeatedly seen in online arguments about global warming is that it has not really paused, because the “missing heat” has gone into the ocean. Before asking whether that claim is true, it is worth first asking how anyone could know it is true. A simple calculation suggests that the answer is one couldn’t. As follows …
Part of the claim, which I assume is true, is that from 90% to 95% of global heat goes into the ocean, which implies that the heat capacity of the ocean is 10 to 20 times that of the rest of the system. If so, and if the pause in surface and atmosphere temperatures was due to heat for some reason going into the ocean instead, that should have warmed the ocean by 1/10 to 1/20th of the amount by which the rest of the system didn’t warm.
The global temperature trend in the IPCC projections is about .03°C/year. If surface and atmospheric temperature has been flat for 17 years, that would put it about .5° below trend. If the explanation is the heat going into the ocean, the average temperature of the ocean should have risen as a result above its trend by between .025° and .05°.
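Friedman’s arithmetic is easy to reproduce from the numbers he gives (a 0.03°C/year projected trend, a 17-year pause, and an ocean with 10 to 20 times the heat capacity of the rest of the system):

```python
trend = 0.03             # IPCC projected warming, deg C per year
years = 17               # length of the pause
deficit = trend * years  # warming "missing" from surface and atmosphere
print(deficit)           # about 0.5 deg C below trend

# If that heat instead went into an ocean with 10-20x the heat capacity,
# the ocean's mean temperature should sit above ITS trend by only:
low, high = deficit / 20, deficit / 10
print(round(low, 4), round(high, 4))  # roughly 0.025 to 0.05 deg C
```

The whole argument turns on whether ocean temperature measurements are accurate to a few hundredths of a degree.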
Would anyone like to claim that we have data on ocean temperature accurate enough to show a change that small? If not, then the claim is at this point not an observed fact, which is how it is routinely reported, but a conjecture, a way of explaining away the failure of past models to correctly predict current data.
The good news is that in the United States, the number of police officers killed in the performance of their duties dropped to a level last seen in 1959. The bad news is that the number of people killed by the police didn’t drop:
The go-to phrase deployed by police officers, district attorneys and other law enforcement-related entities to justify the use of excessive force or firing dozens of bullets into a single suspect is “the officer(s) feared for his/her safety.” There is no doubt being a police officer can be dangerous. But is it as dangerous as this oft-deployed justification makes it appear?
The annual report from the nonprofit National Law Enforcement Officers Memorial Fund also found that deaths in the line of duty generally fell by 8 percent and were the fewest since 1959.
According to the report, 111 federal, state, local, tribal and territorial officers were killed in the line of duty nationwide this past year, compared to 121 in 2012.
Forty-six officers were killed in traffic-related accidents, and 33 were killed by firearms. The number of firearms deaths fell 33 percent in 2013 and was the lowest since 1887.
This statistical evidence suggests being a cop is safer than it’s been since the days of Sheriff Andy Griffith. Back in 2007, the FBI put the number of justifiable homicides committed by officers in the line of duty at 391. That count only includes homicides that occurred during the commission of a felony. This total doesn’t include justifiable homicides committed by police officers against people not committing felonies and also doesn’t include homicides found to be not justifiable. But still, this severe undercount far outpaces the number of cops killed by civilians.
We should expect the number to always skew in favor of the police. After all, they are fighting crime and will run into dangerous criminals who may respond violently. But to continually claim that officers “fear for their safety” is to ignore the statistical evidence that says being a cop is the safest it’s been in years — and in more than a century when it comes to firearms-related deaths.
Last week, the Fraser Institute published Economic Freedom of North America 2013 which illustrates the relative changes in economic freedom among US states and Canadian provinces:
Reason‘s J.D. Tuccille says of the report, “Canadian Provinces Suck Slightly Less Than U.S. States at Economic Freedom”:
For readers of Reason, Fraser’s definition of economic freedom is unlikely to be controversial. Fundamentally, the report says, “Individuals have economic freedom when (a) property they acquire without the use of force, fraud, or theft is protected from physical invasions by others and (b) they are free to use, exchange, or give their property as long as their actions do not violate the identical rights of others.”
The report includes two rankings of economic freedom — one just comparing state and provincial policies, and the other incorporating the effects of national legal systems and property rights protections. Since people are subject to all aspects of the environment in which they operate, and not just locally decided rules and regulations, it’s that “world-adjusted all-government” score that matters most, and it has a big effect — especially since “gaps have widened between the scores of Canada and the United States in these areas.” The result is that:
[I]n the world-adjusted index the top two jurisdictions are Canadian, with Alberta in first place and Saskatchewan in second. In fact, four of the top seven jurisdictions are Canadian, with the province of Newfoundland & Labrador in sixth and British Columbia in seventh. Delaware, in third spot, is the highest ranked US state, followed by Texas and Nevada. Nonetheless, two Canadian jurisdictions, Prince Edward Island and Nova Scotia, still land in the bottom two spots, just behind New Mexico at 58th and West Virginia at 57th.
Before you assume that the nice folks at Fraser are gloating, or that you should pack your bags for a northern relocation, the authors caution that things aren’t necessarily getting better north of the border. Instead, “their economic freedom is declining more slowly than in the US states.”
As Tim Harford says, “So it’s HIS fault”:
In the 1930s, Austrian sociologist, philosopher and curator Otto Neurath and his wife Marie pioneered ISOTYPE — the International System Of TYpographic Picture Education, a new visual language for capturing quantitative information in pictograms, sparking the golden age of infographics in print.
The Transformer: Principles of Making Isotype Charts is the first English-language volume to capture the story of Isotype, an essential foundation for our modern visual language dominated by pictograms in everything from bathroom signage to computer interfaces to GOOD’s acclaimed Transparencies.
The real cherry on top is a previously unpublished essay by Marie Neurath, who was very much on par with Otto as Isotype’s co-inventor, written a year before her death in 1986 and telling the story of how she carried on the Isotype legacy after Otto’s death in 1946.
Tim Worstall on a Wall Street Journal article which asks “how do we measure inequality”. Tim says “not that way, idiots” (although I might have imagined the “idiots” part):
The title of the piece is “How do you measure ‘inequality’?” to which a very good response is “Not that way”. For although all the numbers there are exact and accurate (well, as much as any economic statistic is such) the whole statement is entirely misleading. For the numbers that are being used for the USA are calculated on an entirely different basis to the way that the numbers for the other countries are. So much so that in this instance we have Wikipedia being more accurate than either the WSJ or the CIA itself. Which, while amusing, isn’t quite the world I think we’d all like to have.
Here’s what the problem is. Conceptually we can measure inequality in a number of different ways and this particular one, the Gini, looks at the spread of incomes across the society. OK, no need for the details of how we calculate it except for one. We again, conceptually, have two different incomes that can be measured.
So, the guy pulling down $1 million a year dealing bonds on Wall Street. Does he really have an income of $1 million a year? Or is it more true to say that he gets $600,000 a year after the Feds, NY State and NYC have all dipped their hands into his paycheck? And the guy at the other end, making $15,000 a year as a greeter at WalMart. Is he really making $15,000? Or should we add in the EITC, the State EITC (if there is one), Section 8 housing vouchers, Medicaid and all the rest to what he’s earning? He might be consuming as if he’s getting $25 k a year, even though his market income is only $15k.
What we actually do is we calculate both of these. The first is called the Gini at market incomes, the second the Gini after taxes and benefits. There’s nothing either right or wrong about either measure: they just are what they are. However, we do have to be clear about which we are using in any circumstance and similarly, very clear about not comparing inequality in one country by one measure with inequality in another by the other measure. Yet, sadly, that is exactly what is being done here.
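The two measures can diverge a lot, which a toy calculation makes vivid. A minimal Gini sketch over a hypothetical five-person society, before and after a flat tax funding an equal transfer (all figures invented for illustration):

```python
def gini(incomes):
    """Gini coefficient via the mean-absolute-difference formula."""
    n = len(incomes)
    total = sum(incomes)
    # Sum of absolute differences over all ordered pairs, normalized.
    mad = sum(abs(a - b) for a in incomes for b in incomes)
    return mad / (2 * n * total)

market = [15_000, 30_000, 50_000, 90_000, 1_000_000]

# Hypothetical redistribution: 30% flat tax, revenue shared equally.
tax_rate = 0.30
revenue = sum(x * tax_rate for x in market)
post = [x * (1 - tax_rate) + revenue / len(market) for x in market]

print(round(gini(market), 3), round(gini(post), 3))
```

The same society produces two quite different Gini numbers depending on whether you measure market incomes or incomes after taxes and transfers — which is exactly why comparing one country’s market-income Gini against another’s post-tax Gini is meaningless.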
In Forbes, Tim Worstall explains a misunderstanding of Ricardo’s Iron Law of One Price on the part of the Guardian:
This is a fun little bit of data calculation and visualisation. It’s a database and then mapping of the global price list for Apple’s iPhone 5s. And there are two interesting ways of using it. The first is simply to look at how prices differ around the world:
You can do this in USD or GBP as you wish. And this can be used to explore the violations of Ricardo’s Iron Law of One Price. Which is where David Ricardo insisted that the prices of traded goods would inevitably move to being equal all over the world. Well, equal minus the transport costs of getting them around the world. And transport costs for an iPhone are trivial: it would be amazing if Apple were paying more than a couple of dollars to airfreight one to anywhere at all. So, we would expect prices to be the same everywhere: but they obviously are not.
However, when The Guardian reports on this something appears to go wrong. Not their fault I suppose, it’s about economics and lefties never really do get that subject. But here:
Similar to the way the Economist tracks the cost of the ubiquitous McDonalds burger across countries, nations and states, Mobile Unlocked tracked the price of the iPhone 5S across 47 countries in native currencies with native sales tax, and then converted those prices into US dollars (USD) or British pounds (GBP).
No … the Big Mac Index operates entirely and exactly the other way around. We need to make the distinction between traded goods and non-traded goods. The Iron Law only works on traded goods. What we’re trying to find out with PPP calculations is what are the price differentials of non-traded goods? Which is why the Big Mac is used. It is (supposedly at least) exactly the same all over the world. It is also made almost entirely from local produce bought at the local price in local markets. US Big Macs use American beef, Argentine ones Argentine and so on. So we get to see the impact of local prices on the same product worldwide. That’s what we’re actually attempting with that Big Mac Index. The Economist then goes on to compare the prices of this non-traded good with exchange rates and attempt to work out whether the exchange rates are correct or not.
This is entirely different from using the price of a traded good to measure local price variations. For what we’re going to be measuring here is what interventions there are into stopping the Iron Law working, not what local price levels are.
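The Big Mac logic can be sketched in a few lines: the implied PPP exchange rate is the ratio of the local price to the US price, and comparing that to the market rate shows whether the currency looks over- or under-valued. The prices and rate below are hypothetical, not the Economist’s actual figures:

```python
def implied_ppp(local_price, us_price):
    # Exchange rate (local currency per USD) that would equalize Big Mac prices.
    return local_price / us_price

def valuation(local_price, us_price, market_rate):
    # Positive means the local currency looks overvalued against the dollar.
    return implied_ppp(local_price, us_price) / market_rate - 1

us_price = 4.50     # hypothetical US Big Mac price, USD
local_price = 65.0  # hypothetical local-currency price
market_rate = 18.0  # hypothetical local units per USD

ppp = implied_ppp(local_price, us_price)
print(round(ppp, 2), round(valuation(local_price, us_price, market_rate), 3))
```

In this made-up case the implied PPP rate is below the market rate, so the local currency would look undervalued — a judgement you can only make because the Big Mac is a non-traded good priced from local inputs.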
Statistics can be very helpful tools in analysis, but the quality of analysis will depend on the accuracy of the statistics. In the US, the organization responsible for compiling the unemployment numbers is the Bureau of Labor Statistics (BLS). They actually compile several different categories of unemployment data, only one of which is commonly used by the media: the U-3 unemployment rate. Wendy McElroy explains why this may be a very misleading number:
The Bureau of Labor Statistics (BLS) compiles the United States’ unemployment statistics every month. It looks at six different categories of data, called U-1 through U-6. U-3 counts how many people were unemployed but actively looking for work during the past month; this is the official unemployment rate broadcast by the media. By contrast, U-6 counts the unemployed and underemployed who are excluded from the U-3 data. For example, U-3 classifies people who have unsuccessfully looked for a job in the last year as “not participating in the labor force” rather than as unemployed; U-6 counts them. U-6 also includes part-time workers who need more employment in order to live, though the number of these workers is dwarfed by the number of long-term unemployed. (“Long-term unemployment” is defined as lasting 27 weeks or more.)
The data included in the categories increase as the numbers ascend; the categories are defined as follows:
- U-3 Total unemployed, as a percent of the civilian labor force
- U-4 Total unemployed plus discouraged workers
- U-5 Total unemployed, plus discouraged workers, plus all other persons marginally attached to the labor force
- U-6 Total unemployed, plus all persons marginally attached to the labor force, plus total employed part time for economic reasons
What is America’s real unemployment rate? According to U-3 for October 2013, 11.3 million people were officially unemployed. BLS adds that 91,541,000 working age people did not participate in the labor force. If these numbers are added together, there are 102 million working age Americans who are either unemployed or not in the labor force for reasons that are not clear; for example, they could be retired. The non-working population represents 37.2% of working age people.
(Note: it is not known how the federal furlough of employees during the October shutdown affected the data, if at all. The furloughed employees seem to have been counted as both unemployed and working because they eventually received full payment for the time off.)
The unemployment rate reflected by the last four categories of BLS data breaks down as follows:
- U-3 = 7.3%
- U-4 = 7.8%
- U-5 = 8.6%
- U-6 = 13.8%
The American media used the U-3 numbers and reported the unemployment rate for October to be 7.3%, which is about half of the more realistic U-6 total. The media also glossed over U-3 figures that were alarming. For example, the official rate for teen unemployment (16 to 19 years old) stood at 22.2%, and black unemployment at 13.1%.
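The gap is easy to quantify from the rates listed above, and the categories necessarily nest — each broader measure adds groups to the one before it, so the rates can only go up:

```python
# October 2013 rates, per the BLS figures quoted above.
rates = {"U-3": 7.3, "U-4": 7.8, "U-5": 8.6, "U-6": 13.8}

ratio = rates["U-3"] / rates["U-6"]
print(round(ratio, 2))  # the headline U-3 rate is roughly half the U-6 rate

# Each measure is a superset of the previous one, so the sequence is non-decreasing.
ordered = [rates[k] for k in ("U-3", "U-4", "U-5", "U-6")]
assert ordered == sorted(ordered)
```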
According to a Dallas newspaper, Houston is the focal point of a vast sex trafficking operation:
Check out this obvious crap — unbelievable to any thinking person — in the November 22 Dallas Morning News.
The Texas Senator and Representative that the paper apparently very credulously and obediently took notes from contend that there are 300,000 sex trafficking cases prosecuted every year — “in Houston alone.”
Here’s the quote from the Dallas Morning News editorial:
Editorial: Cracking down on sex traffickers
Two Texas Republicans, Sen. John Cornyn and Rep. Ted Poe of the Houston area, are co-sponsoring a bill that would impose stiff penalties on these adult victimizers of up to life in prison. The Justice for Victims of Trafficking Act, which has bipartisan support in both houses, would supplement an existing law that focuses primarily on punishing sex-trafficking organizations abroad.
Poe and Cornyn estimate that one-quarter of U.S. sex-trafficking victims have Texas roots. Poe says our state’s proximity to Mexico and high immigrant population give the state a particularly high profile. In Houston alone, about 300,000 sex trafficking cases are prosecuted each year.
Do they work butt-drunk at this paper?
300,000? Do you realize how many people that is?
Of course, Houston’s population is only 2.161 million. So, throw in my fantasy guesstimate of at least 200,000 uncaught and unpunished people guilty of sex trafficking on top of the 300,000 supposedly documented. This suggests that a vast segment of Houston’s population — at least 15 percent and maybe 25 percent — is engaged in the business of sex trafficking.
Math is hard.
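For the record, the arithmetic the editorial board skipped:

```python
population = 2_161_000   # Houston's population, per the post
claimed_cases = 300_000  # prosecutions per year, per the editorial
fantasy_extra = 200_000  # the post's tongue-in-cheek guess at the uncaught

# Share of the city's entire population implied by the claim:
print(round(claimed_cases / population * 100, 1), "%")
print(round((claimed_cases + fantasy_extra) / population * 100, 1), "%")
```

Even taking the 300,000 figure at face value, it would mean roughly one prosecuted sex-trafficking case per year for every seven Houstonians, infants and grandparents included.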
In Maclean’s, a look at the feel-good but economically silly reasons for senior discounts:
The seniors discount has long been justified as a way to recognize the constraints faced by pensioners stuck on fixed incomes, and as a modest token of appreciation for a lifetime spent paying taxes and contributing to society. And for those truly in need, who would quibble? But with half a million Baby Boomers — a group not known for frugality or lack of financial resources — turning 65 every year for the next few decades, the seniors discount is in for much greater scrutiny.
There was a time when the seniors discount made a lot more sense. In the mid-1970s, nearly 30 per cent of all seniors were considered poor, as defined by Statistics Canada’s low-income cut-off. But today, this has fallen to a mere 5.2 per cent. The impact of this turnaround is hard to overstate. Seniors once faced the highest rates of poverty in Canada; now they enjoy the lowest level of any age group: The poverty rate among seniors is almost half that of working-age Canadians.
Thanks to a solid system of government support programs, the very poorest seniors receive more income in retirement than they did when they were of working age. The near-elimination of seniors’ poverty is widely considered to be Canada’s greatest social policy triumph of the past half-century.
This tremendous improvement in seniors’ financial security has dramatically changed the distribution of income across age categories, as well. In 1976, median income for senior households was 41 per cent of the national average. Today, it’s 67 per cent. Over the same period, median income for families where the oldest member is aged 25-34 has fallen in both absolute and relative terms.
Then there’s the vast wealth generated for the Boomer generation by the housing and stock markets (only some of which was lost during the great recession). The stock of wealth in housing, pensions and financial assets held by the average senior family is nearly double that of working-age households. Accounting for the financial benefits of home ownership and rising house values, Statistics Canada calculates the true net annual income of retired households rises to 87 per cent of a working-age household’s income. In other words, non-working seniors are making almost as much as folks in their prime earning years, but without all the expenses and stressors that go with a job, children at home, or middle age. Not only that, the current crop of seniors enjoys historically high rates of pension coverage. The much-publicized erosion of private-sector pensions will hit younger generations who are currently far from retirement.
Debbie Downer Colin Campbell takes a survey of the state of Canada’s economy:
A key qualification for landing a job at the Bank of Canada, it seems, is an unfailing sense of optimism. In 2009, the bank forecast the economy would grow 3.3 per cent in 2011. It grew 2.5 per cent. In 2011, it said the economy would grow 2.9 per cent in 2013. It will likely be just 1.6 per cent. Now it says the economy will grow 2.3 per cent next year. How likely is that? The bank has consistently viewed the economy through rose-coloured glasses in recent years, perhaps believing its low-interest-rate policy will eventually bear fruit. Rates have been held at one per cent for three years now. But the economy seems only to be getting worse.
It grew 0.3 per cent in August, Statistics Canada said last week — mostly attributed to a familiar crutch, the oil business. Elsewhere, things aren’t looking up. A new TD Bank report said corporate Canada is “in a slump,” with profits down 16 per cent from their post-recession peak in 2011. Some observers point out that Canada is still doing better than Europe and Japan. But so are most countries that aren’t in a recession, from South Africa and New Zealand to Equatorial Guinea and Guatemala. After breezing through the recession, Canada is back to old habits: hoping its fortunes (i.e., exports) will rise along with America’s comeback. But the U.S., too, is back in a rut. Last week, the Federal Reserve said it would continue with its $85-billion-a-month bond-buying stimulus program.
With the economy sputtering, Ottawa has meanwhile remained preoccupied with fiscal restraint and balancing the budget within two years. So, with neither low interest rates nor government spending providing a boost, the outcome seems predictable: Official growth forecasts will look nice, but will keep missing the mark.