Nobody doubts that humans have unique innate capabilities for understanding language (although it is unknown to what extent these capabilities are specific to language and to what extent they are general cognitive abilities related to sequencing and forming abstractions).

This result was taken by Chomsky and others to mean that it is impossible for children to learn human languages without having an innate "language organ." As others have shown, this was an invalid conclusion; the task of getting 100% on the quiz (which Gold called language identification in the limit) really has nothing in common with the task of language acquisition performed by children, so Gold's Theorem has no relevance.

Gold's result is that if the infinite set of languages are all generated by context-free grammars, then there is no strategy for the guesser that guarantees she gets 100% correct every time, no matter what N you choose for the birthday.

But the vast majority of people who study interpretation tasks, such as speech recognition, quickly see that interpretation is an inherently probabilistic problem: given a stream of noisy input to my ears, what did the speaker most likely mean?
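The question "what did the speaker most likely mean?" can be sketched as picking the candidate that maximizes P(intended) × P(heard | intended). The following is a minimal illustration only: the two candidate phrases, the probability values, and the function names `posterior` and `most_likely` are all assumptions made up for this example, not estimates from any corpus or part of any real recognizer.

```python
# Toy noisy-channel decoder. All probabilities below are invented for
# illustration; they are not estimates from any corpus.

# Prior: how likely the speaker is to intend each phrase, P(intended).
PRIOR = {
    "recognize speech": 0.7,
    "wreck a nice beach": 0.3,
}

# Channel: how likely each intended phrase is to produce the sound
# that was actually heard, P(heard | intended).
LIKELIHOOD = {
    "recognize speech": 0.4,
    "wreck a nice beach": 0.5,
}


def posterior(prior, likelihood):
    """Return P(intended | heard) for each candidate via Bayes' rule."""
    joint = {phrase: prior[phrase] * likelihood[phrase] for phrase in prior}
    total = sum(joint.values())
    return {phrase: p / total for phrase, p in joint.items()}


def most_likely(prior, likelihood):
    """Return the candidate phrase with the highest posterior probability."""
    post = posterior(prior, likelihood)
    return max(post, key=post.get)


if __name__ == "__main__":
    for phrase, p in posterior(PRIOR, LIKELIHOOD).items():
        print(f"{phrase}: {p:.3f}")
    print("best guess:", most_likely(PRIOR, LIKELIHOOD))
```

With these made-up numbers, the acoustically better match loses to the phrase that is more plausible a priori, which is the sense in which interpretation weighs evidence rather than applying a categorical rule.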

But I observe that science and engineering develop together, and that engineering success shows that something is working right, and so is evidence (but not proof) of a scientifically successful model.

For the remainder of this essay we will concentrate on scientific reasons: that probabilistic models better represent linguistic facts, and statistical techniques make it easier for us to make sense of those facts.

This section has shown that one reason why the vast majority of researchers in computational linguistics use statistical models is an engineering reason: statistical models have state-of-the-art performance, and in most cases non-statistical models perform worse.

In a probabilistic framework, there will be multiple parameters, perhaps with continuous values, and it is easy to see how the shift can take place gradually over two centuries.
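As a rough sketch of the contrast, a single continuous parameter (the probability that a speaker uses the newer form) can drift a little with each generation, whereas a categorical parameter can only flip. The starting and ending probabilities, the number of generations, and the function names below are hypothetical choices for illustration; linear drift is assumed only for simplicity.

```python
def gradual_shift(p_start=0.05, p_end=0.95, generations=8):
    """Probability of the new form in each generation, assuming linear drift."""
    step = (p_end - p_start) / (generations - 1)
    return [p_start + i * step for i in range(generations)]


def categorical_shift(switch_at=4, generations=8):
    """A categorical parameter: the new form is either never or always used."""
    return [0.0 if g < switch_at else 1.0 for g in range(generations)]


if __name__ == "__main__":
    for g, (p, c) in enumerate(zip(gradual_shift(), categorical_shift())):
        print(f"generation {g}: probabilistic={p:.2f} categorical={c:.0f}")
```

In the probabilistic version every generation uses both forms at slightly different rates; in the categorical version there is one generation at which the change happens all at once.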

