O’Reilly, 2009. 504 p.
This is a book about Natural Language Processing. By natural language we mean a language that is used for everyday communication by humans, such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing (NLP for short) in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves understanding complete human utterances, at least to the extent of being able to give useful responses to them.
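To make the simple end of that spectrum concrete, here is a minimal sketch of counting word frequencies to compare two texts. It is not taken from the book: it uses only the Python standard library rather than NLTK, the helper names `word_frequencies` and `relative_frequency` are hypothetical, the two sample strings are invented for illustration, and the regular-expression tokenizer is a deliberately crude assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens in text (crude regex tokenizer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def relative_frequency(counts, word):
    """Share of all tokens accounted for by word (0.0 for an empty text)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


# Two invented samples standing in for texts by different writers.
sample_a = "The cat sat on the mat. The cat slept."
sample_b = "A storm is coming, and a ship is leaving."

freq_a = word_frequencies(sample_a)
freq_b = word_frequencies(sample_b)

# Compare how heavily each sample relies on a few common function words.
for word in ("the", "a", "is"):
    print(word,
          round(relative_frequency(freq_a, word), 3),
          round(relative_frequency(freq_b, word), 3))
```

Relative rather than raw counts are used so that texts of different lengths can be compared; a real stylistic comparison would need far larger samples and a more careful tokenizer.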
Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish. By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.
This book provides a highly accessible introduction to the field of NLP. It can be used for individual study or as the textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics. The book is intensely practical, containing hundreds of fully worked examples and graded exercises.
The book is based on the Python programming language together with an open-source library called the Natural Language Toolkit (NLTK).
Language Processing and Python
Accessing Text Corpora and Lexical Resources
Processing Raw Text
Writing Structured Programs
Categorizing and Tagging Words
Learning to Classify Text
Extracting Information from Text
Analyzing Sentence Structure
Building Feature-Based Grammars
Analyzing the Meaning of Sentences
Managing Linguistic Data