The Signal and the Noise: Why So Many Predictions Fail—but Some Don’t – Nate Silver
Thoughts: I enjoyed The Signal and the Noise. While it could have been about half the length—Silver goes into a lot of detail on each of his topics—the writing is clear, engaging and approachable. I found the chapter on climate change predictions particularly perceptive, and I’m curious to know how it would be rewritten now, a decade after the book was published.
(The notes below are not a summary of the book, but rather raw notes - whatever I thought, at the time, might be worth remembering. I read this as an e-book, so page numbers are as they appeared in the app I used, Libby.)
Silver, Nate. 2020 [2012]. The Signal and the Noise: Why So Many Predictions Fail—but Some Don’t. Penguin Books.
Preface to the 2020 Edition
- 25: System 1 thinking can be good for making predictions, but only once one has worked hard to build a good intuition for a class of problems. But for situations where we haven’t worked to develop these intuitions, System 1 can easily lead us astray, so System 2 must step in
- 29: to check out: James Surowiecki’s The Wisdom of Crowds
- 29-30: Silver summarizes Surowiecki’s conclusions for where collective judgment is usually more accurate than individual judgment:
- Diversity - “If everybody comes in thinking the same way about a problem, then forming a group is pointless…. You want people with a diversity of backgrounds, experiences, and skill sets as part of your group.”
- Independence - “People need to be able to share ideas and dissent openly, without fear of reprisal…. Moreover, you want to avoid information cascades wherein people’s preferences are dependent on everyone else’s preferences.”
- Trust - “In any sort of group that purports to be representative, …people need to trust that their representatives have the collective interest in mind rather than looking out for themselves.”
Introduction
- 51: “We can never make perfectly objective predictions. They will always be tainted by our subjective point of view…. ¶ [This book asserts] that a belief in the objective truth—and a commitment to pursuing it—is the first prerequisite of making better predictions. The forecaster’s next commitment is to realize that she perceives it imperfectly. ¶ Prediction is important because it connects subjective and objective reality.”
1: A Catastrophic Failure of Prediction
- 67-68: Risk vs. Uncertainty (as characterized by Frank H. Knight):
- Risk: Something for which you can quantify your uncertainty; e.g. gambling where the odds are known
- Uncertainty: Risk that is hard to measure
- 68-69: Over the 20th century, housing prices basically held steady relative to inflation, but were not really profitable - much lower returns than investing in the stock market.
2: Are You Smarter Than a Television Pundit?
- 95-96: In Phil Tetlock’s study of expert political predictions, he identified two main groups of people, which he called hedgehogs and foxes, after the Greek poet Archilochus: “The fox knows many little things, but the hedgehog knows one big thing.”
- Hedgehogs: “believe in Big Ideas” - have one main model to explain the world, and believe it can account for basically all interactions in society
- Foxes: “believe in a plethora of ideas and in taking a multitude of approaches toward a problem. They tend to be more tolerant of nuance, uncertainty, complexity, and dissenting opinion.”
- 96: “Whereas the hedgehogs’ forecasts were barely any better than random chance, the foxes demonstrated predictive skill”
- 100: “While foxes tend to get better at forecasting with experience, the opposite is true of hedgehogs: their performance tends to worsen as they pick up additional credentials.”
3: All I Care About Is W’s and L’s
- 151: “When we have trouble categorizing something, we’ll often overlook or misjudge it.”
4: For Years You’ve Been Telling Us That Rain Is Green
- 169: For chaos theory to apply in a system, two properties must hold:
- The system is dynamic: its behaviour at one moment in time is influenced by its behaviour in the past
- The system is nonlinear: its relationships are exponential, not additive/logarithmic/etc.
- 176: The National Weather Service has discovered that humans improve the accuracy of its precipitation forecasts by 25% and its temperature forecasts by 10%, and these ratios have held roughly constant over time, even as its computer models improve.
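The two properties above can be illustrated with the logistic map (my own sketch, not an example from the book): each value depends on the previous one (dynamic), and x enters squared (nonlinear). At r = 4 the map is chaotic, so two nearly identical starting points eventually diverge completely.

```python
# Minimal sketch of a chaotic system: the logistic map (my illustration,
# not from the book). Dynamic: each value depends on the previous one.
# Nonlinear: x appears squared. At r = 4 the map is chaotic.
def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1 - x))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)  # perturb by one part in a billion

early_gap = abs(a[1] - b[1])                     # still tiny after one step
max_gap = max(abs(x - y) for x, y in zip(a, b))  # eventually order 1
```

After a few dozen iterations the two trajectories bear no resemblance to each other - the practical limit on weather forecasting that this chapter describes.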
5: Desperately Seeking Signal
- 222: overfitting vs underfitting:
- underfitting: your model is not capturing as much of the signal present in the training data as it could
- overfitting: your model is fitting the noise in the training data, rather than the signal
- 227: Overfitting is particularly pernicious: “it makes our model look better on paper [(because it will have a high degree of correlation with the data)] but performs worse in the real world”
- 232: in a passage talking about predicting earthquakes, which tend to have a power law distribution - j: Seems reminiscent of some of the claims made by Steven Pinker in The Better Angels of Our Nature about the decline in violence over the past century - to what extent could these predictions be similarly overfit? He makes some big claims about “hemoclasms” based on just a couple of data points, iirc…
- 234: Silver asserts that complexity theory is different than chaos theory, even though they’re sometimes lumped together: “[Complexity] theory suggests that very simple things can behave in strange and mysterious ways when they interact with one another.” j: So complexity theory deals with emergence etc., while chaos theory doesn’t necessarily?
- Per Bak offers the example of a sand pile to which individual grains are added - any individual grain can either stick, or tumble off, or set off an avalanche. Complex systems tend to be marked by “large periods of apparent stasis marked by sudden and catastrophic failures”
- 234: Different types of noise have different underlying probability distributions (e.g. white noise). Complex systems tend to produce Brownian noise.
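The overfitting/underfitting distinction above can be checked numerically (my own sketch, not Silver's): fit a low-degree and a high-degree polynomial to noisy samples of a line, and compare errors on the training data against fresh held-out data.

```python
# Sketch of overfitting (my illustration, not from the book): the
# high-degree polynomial fits the training noise, so it "looks better
# on paper" but generalizes worse.
import numpy as np

rng = np.random.default_rng(0)

def noisy_line(n=20):
    x = np.linspace(0, 1, n)
    return x, 2 * x + 1 + rng.normal(scale=0.2, size=n)  # signal + noise

x_train, y_train = noisy_line()
x_test, y_test = noisy_line()  # same signal, fresh noise

def train_test_error(degree):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return mse(x_train, y_train), mse(x_test, y_test)

train_lo, test_lo = train_test_error(1)   # matches the true signal
train_hi, test_hi = train_test_error(15)  # chases the noise
```

The degree-15 fit has lower training error than the degree-1 fit, but its held-out error is worse than its training error suggests - exactly the "better on paper, worse in the real world" pattern from p. 227.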
6: How to Drown in Three Feet of Water
- 263: Group forecasting beating individual forecasting “has been found to be true in almost every field in which it has been studied.”
7: Role Models
- 289: SIR epidemiological model: can be useful as a starting point, but like most simple models, it doesn’t come close to capturing the progression of actual outbreaks. “The model requires a lot of assumptions to work properly, some of which are not very realistic in practice.” - Assumes everyone in the population behaves the same, that they intermingle at random, that they are all equally susceptible, etc.
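The basic SIR model can be sketched in a few lines (my own illustration, not Silver's code), using exactly the simplifying assumptions the note mentions: everyone mixes at random and is equally susceptible.

```python
# Minimal SIR sketch (my illustration): susceptible/infected/recovered
# compartments stepped forward with simple Euler integration.
#   S -> I at rate beta * S * I / N;  I -> R at rate gamma * I
def sir(s, i, r, beta, gamma, days, dt=0.1):
    n = s + i + r
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

# 1,000 people, one initial case; basic reproduction number beta/gamma = 2.5
s, i, r = sir(999, 1, 0, beta=0.5, gamma=0.2, days=365)
```

Even this toy run shows why the assumptions matter: the model predicts a single smooth wave that infects most of the population, whereas real outbreaks are shaped by heterogeneous behaviour and contact patterns it cannot see.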
8: Less and Less and Less Wrong
- 325-326: The frequentist model of statistics assumes that uncertainty is a statistical problem, coming about from samples being an imperfect representation of the population
- it assumes that there is no sampling bias, and that the underlying measurement uncertainty follows a bell curve
- 327: “The bigger problem, however, is that frequentist methods—in striving for immaculate statistical procedures that can’t be contaminated by the researcher’s bias—keep him hermetically sealed off from the real world. These methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability.”
- 329: “The most practical definition of a Bayesian prior might simply be the odds at which you are willing to place a bet.”
- 335: There has been a movement over the past few decades (continuing beyond 2012, I assume) among some statisticians arguing that frequentist statistics should not be taught to undergrads.
9: Rage Against the Machines
10: The Poker Bubble
11: If You Can’t Beat ’Em…
- 422: Three things about aggregate forecasts:
- “While the aggregate forecast will essentially always be better than the typical individual’s forecast, that doesn’t necessarily mean it will be good.”
- “The most robust evidence indicates that this wisdom-of-crowds principle holds when forecasts are made independently before being averaged together.”
- “Although the aggregate forecast is better than the typical individual’s forecast, it does not necessarily hold that it is better than the best individual’s forecast.”
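The first and third points can be illustrated numerically (my own sketch, not from the book): average many independent noisy forecasts of a known value, then compare the aggregate's error to the typical individual's.

```python
# Numeric sketch of the wisdom-of-crowds points above (my illustration):
# 1,000 forecasters independently estimate a true value with noisy errors.
import numpy as np

rng = np.random.default_rng(1)
truth = 50.0
forecasts = truth + rng.normal(scale=10.0, size=1000)  # independent errors

aggregate_error = abs(forecasts.mean() - truth)              # error of the average
typical_error = float(np.median(np.abs(forecasts - truth)))  # typical individual
best_error = float(np.abs(forecasts - truth).min())          # luckiest individual
```

The aggregate reliably beats the typical forecaster, but the single best individual forecast can still beat the aggregate - and, per the second point, the averaging only works this well because the errors are independent.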
12: A Climate of Healthy Skepticism
- main idea: While climate change models appear to be quite robust when considering long-term trends, using them to predict the weather of an individual year, or even an individual decade, comes with an enormous degree of uncertainty.
- 507: It’s important not to reach for hyperbole when making predictions, because those predictions become priors to be updated when new data comes in.
- Example: someone, trying to drum up support for action on climate change, predicts that the next year has a 99% chance of being warmer than average. If that prediction doesn’t come true, a skeptic would (rightly!) consider that to be strong evidence against the reality of climate change—based on the prediction, we would be exceedingly unlikely to observe the lower temperature given that a global warming trend is occurring. A more responsible prediction, like a 55% or 60% chance that next year will be warmer than average, is much more robust against noisy year-to-year fluctuations.
- “When we advance more confident claims and they fail to come to fruition, this constitutes much more powerful evidence against our hypothesis. We can’t really blame anyone for losing faith in our forecasts when this occurs; they are making the correct inference under Bayesian logic”
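The example above can be worked through with Bayes's theorem (the specific prior and likelihood numbers below are my own illustrative assumptions, not figures from the book).

```python
# Bayesian update for the climate example (my sketch; the numbers are
# illustrative assumptions). Hypothesis: a warming trend is real.
# Evidence observed: next year turns out NOT to be warmer than average.
def posterior_after_cooler_year(prior, p_warmer_if_trend, p_warmer_if_no_trend=0.5):
    like_trend = 1 - p_warmer_if_trend        # P(cooler year | trend)
    like_no_trend = 1 - p_warmer_if_no_trend  # P(cooler year | no trend)
    evidence = prior * like_trend + (1 - prior) * like_no_trend
    return prior * like_trend / evidence

prior = 0.9  # assumed strong prior belief in the warming trend
overconfident = posterior_after_cooler_year(prior, 0.99)  # the 99% forecast
modest = posterior_after_cooler_year(prior, 0.60)         # a 60% forecast
```

One cooler-than-average year collapses the overconfident forecaster's prior from 0.9 to roughly 0.15, while the modest forecast barely moves it (roughly 0.88) - the skeptic's inference really is "correct under Bayesian logic."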
13: What You Don’t Know Can Hurt You
- 520: Thomas Schelling: “There is a tendency in our planning to confuse the unfamiliar with the improbable. The contingency we have not considered looks strange; what looks strange is thought improbable; what is improbable need not be considered seriously.”
Conclusion
- 557: “Bayes’s theorem requires us to state—explicitly—how likely we believe an event is to occur before we begin to weigh the evidence.”
- 558: “What isn’t acceptable under Bayes’s theorem is to pretend that you don’t have any prior beliefs. You should work to reduce your biases, but to say that you have none is a sign that you have many.”
- “This is perhaps the easiest Bayesian principle to apply: make lots of forecasts…. It’s the only way to get better.” j: This is stating what I already know: must put into practice!
Posted: Jan 23, 2022. Last updated: Aug 31, 2023.