As Erik Larson points out in The Myth of Artificial Intelligence, what computers “know” must be painstakingly programmed
WILLIAM A. DEMBSKI
APRIL 24, 2021
Larson did an interesting podcast with the Brookings Institution through its Lawfare Blog shortly after the release of his book. It’s well worth a listen, and Larson elucidates in that interview many of the key points in his book. The one place in the interview where I wish he had elaborated further was on the question of abductive inference (aka retroductive inference or inference to the best explanation). For me, the key to understanding why computers cannot, and most likely will never, be able to perform abductive inferences is the problem of underdetermination of explanation by data. This may seem like a mouthful, but the idea is straightforward. For context, if you are going to get a computer to achieve anything like understanding in some subject area, it needs a lot of knowledge. That knowledge, in all the cases we know, needs to be painstakingly programmed. This is true even of machine learning situations where the underlying knowledge framework needs to be explicitly programmed (for instance, even Go programs that achieve world class playing status need many rules and heuristics explicitly programmed).
The Underdetermination Problem
Humans, on the other hand, need none of this. On the basis of very limited or incomplete data, we nonetheless come to the right conclusion about many things (yes, we are fallible, but the miracle is that we are right so often). Noam Chomsky’s entire claim to fame in linguistics really amounts to exploring this underdetermination problem, which he referred to as “the poverty of the stimulus.” Humans pick up language despite very varied experiences with other human language speakers. Babies born in abusive and sensory deprived environments pick up language. Babies subject to Mozart from the womb and with rich sensory environments pick up language. Language results from growing up with cultured and articulate parents. Language results from growing up with boorish and inarticulate parents. Yet in all cases, the actual amount of language exposure is minimal compared to language ability that emerges and the knowledge of the world that results. On the basis of the language exposure, many different ways of understanding the world might have developed, and yet we seem to get things right (much of the time). Harvard philosopher Willard Quine, in his classic Word and Object (1960), struggled with this phenomenon, arguing for what he called the indeterminacy of translation to make sense of it.
The problem of underdetermination of explanation by data appears not just in language acquisition but in abductive inference as well. It’s a deep fact of mathematical logic (i.e., the Löwenheim-Skolem theorem) that any consistent collection of statements (think data) has infinitely many mathematical models (think explanations). This fact is reflected in ordinary everyday abductive inferences. We are confronted with certain data, such as missing documents from a bank safety deposit box. There are many, many ways this might be explained: a thermodynamic accident in which the documents spontaneously combusted and disappeared, a corrupt bank official who stole the documents, a nefarious relative who got access and stole the documents, etc.
No End to “Et Cetera”
But the “et cetera” here has no end. Maybe it was space aliens. Maybe you yourself took and hid the documents, and are now suffering amnesia. There are a virtually infinite number of possible explanations. And yet, somehow, we are often able to determine the best explanation, perhaps with the addition of more data/evidence. But even adding more data/evidence doesn’t eliminate the problem because however much data/evidence you add, the underdetermination problem remains. You may eliminate some hypotheses (perhaps the hypothesis that the bank official did it, but not other hypotheses). But by adding more data/evidence, you’ll also invite new hypotheses. And how do you know which hypotheses are even in the right ballpark, i.e., that they’re relevant? Why is the hypothesis that the bank official took the documents more relevant than the hypothesis that the local ice cream vendor took them? What about the local zoo keeper? We have no clue how to program relevance, and a fortiori we have no clue how to program abductive inference (which depends on assessing relevance). Larson makes this point brilliantly in his book.