Coolen A.C.C., Kühn R., Sollich P. Theory of Neural Information Processing Systems
Oxford University Press, 2005. 586 p.
The study of the principles behind information processing in complex networks of simple interacting decision-making units, be these units cells (‘neurons’) in brains or in other nervous tissue, or electronic processors (or even software) in artificial systems inspired by biological neural networks, is one of the few truly interdisciplinary scientific enterprises. The field involves biologists (and psychologists), computer scientists, engineers, physicists, and mathematicians; over the years these have all moved in and out of the centre stage in various combinations and permutations, modulated and triggered by advances in experimental, mathematical, or computational techniques. The reason for its unique interdisciplinary character is that this multifaceted area of research, which from now on we will simply denote as the study of ‘neural information processing systems’, is one of the few which meets the fundamental requirement of fruitful interdisciplinary science: all disciplines get something interesting and worthwhile out of the collaboration. The biologist benefits from tapping into the mathematical techniques offered by the more quantitative sciences, the computer scientist or engineer who is interested in machine learning finds inspiration in biology, and the theoretical physicist or applied mathematician finds new and challenging application domains for newly developed mathematical techniques.
We owe the knowledge that brain tissue consists of complicated networks of interacting brain cells mainly to the work (carried out towards the end of the nineteenth century) of two individuals, who shared the 1906 Nobel Prize in medicine in recognition of this achievement: Camillo Golgi, who invented a revolutionary staining method that for the first time enabled us to actually see neurons and their connections under a microscope, and Santiago Ramón y Cajal, who used this new technique to map out systematically and draw in meticulous and artful detail the various cell types and network structures which were now being revealed (in fact Cajal had originally wanted to be an artist). Initially, and for several decades, neural networks continued to be regarded as a branch of medicine and biology. This situation changed, however, with the birth of programmable computing machines around the time of the Second World War, when the word ‘computer’ was still used to denote a person doing computations. It came to be realized that programmable machines might be made to ‘think’, and, conversely, that human thinking could perhaps be understood in the language of programmable machines. This period also saw the conception of ‘information theory’, which was largely the brainchild of Claude Shannon. Probably the first to focus systematically on the information processing capabilities of neural networks were Warren McCulloch and Walter Pitts, who published in 1943 a paper (‘A Logical Calculus of the Ideas Immanent in Nervous Activity’) that can safely be regarded as the starting point of our research field. Looking back, one cannot help observing that McCulloch and Pitts were surprisingly typical of the kind of scientist that would henceforth tend to be drawn into this area. McCulloch had studied philosophy and psychology, then moved into medicine, and ended up in a laboratory of electronic engineering. Pitts, who was only 20 when ‘A Logical Calculus’ was published, initially studied mathematics and also ended up in electronic engineering, but he never received a formal academic degree. It is not unreasonable to take the view that bringing together these disparate scientific backgrounds and interests was crucial to the achievement of McCulloch and Pitts.
The field never lost the interdisciplinary flavour with which it was born. Since the 1940s its popularity has peaked at (roughly) 20-year intervals, with a second wave in the 1960s (the launch of the perceptron, and the exploration of learning rules for individual neurons), and a more recent wave in the 1980s (which saw the development of learning rules for multilayer neural networks, and the extensive application of statistical mechanics techniques to recurrent ones). Extrapolation of this trend would suggest that interesting times might soon be upon us. However, the interdisciplinary character of neural network research was also found to have drawbacks: it is neither a trivial matter to keep the disciplines involved connected (due to language barriers, motivation differences, lack of appropriate journals, etc.), nor to execute effective quality control (which here requires both depth and unusual breadth). As a result, several important discoveries had to be made more than once before they were recognized as such (and hence credit was not always allocated where in retrospect it should have been). In this context one may appreciate the special role of textbooks, which allow those interested in contributing to this field to avoid first having to study discipline-specific research papers from fields in which they have not been trained.
Following the most recent wave of activity in the theory of neural information processing systems, several excellent textbooks intended specifically for an interdisciplinary audience were published around 1990. Since then, however, the connectivity between disciplines has again decreased. Neural network research still continues with energy and passion, but now mostly according to the individual scientific agendas, the style, and the notation of the traditional stakeholding disciplines. As a consequence, those neural network theory textbooks which deal with the progress achieved since (roughly) 1990 tend to be of a different character. They are excellent expositions, but often quite specialized, and focused primarily on the questions and methods of a single discipline.
The present textbook aims to partly remedy this situation by giving an explicit, coherent, and up-to-date account of the modern theory of neural information processing systems, aimed at students with an undergraduate degree in any quantitative discipline (e.g. computer science, physics, engineering, biology, or mathematics). The book tries to cover all the major theoretical developments from the 1940s right up to the present day, as they have been contributed over the years by the different disciplines, within a uniform style of presentation and mathematical notation. It starts with simple model neurons in the spirit of McCulloch and Pitts, and includes not only the mainstream topics of the 1960s and 1980s (perceptrons, multilayer networks, learning rules and learning dynamics, Boltzmann machines, statistical mechanics of recurrent networks, etc.) but also the more recent developments of, say, the last 15 years (such as the application of Bayesian methods, Gaussian processes, and support vector machines) and an introduction to Amari’s information geometry. The text is fully self-contained, including introductions to the various discipline-specific mathematical tools (e.g. information theory or statistical mechanics), and with multiple exercises on each topic. It does not assume prior familiarity with neural networks; only the basic elements of calculus and linear algebra, and an open mind, are required. The book is pitched at the typical postgraduate student: it hopes to bring students with an undergraduate degree to the level where they can actually contribute to research in an academic or industrial environment. As such, the book could be used in the classroom as a textbook for postgraduate lecture courses, for the training of individual Ph.D. students in the first phase of their studies, or as a reference text for those who are already involved in neural information processing research. The material has been developed, used, and tested by the authors over a period of some eight years, split into four individual one-semester lecture courses, in the context of a one-year interdisciplinary Master’s programme in Information Processing and Neural Networks at King’s College London.
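As a minimal illustration of the kind of model neuron the book starts from (a sketch added here for orientation only, not taken from the text), a McCulloch–Pitts unit simply fires when the weighted sum of its binary inputs reaches a threshold; the weights and threshold below are purely illustrative choices.

  import numpy as np

  def mcculloch_pitts_neuron(inputs, weights, threshold):
      # Binary threshold unit: output 1 if the weighted input sum
      # reaches the threshold, 0 otherwise.
      return int(np.dot(weights, inputs) >= threshold)

  # Example: a single unit realizing the logical AND of two binary inputs
  # (weights 1, 1 and threshold 2 are illustrative, not from the book).
  for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
      print(x, mcculloch_pitts_neuron(np.array(x), np.array([1.0, 1.0]), 2.0))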
Part I Introduction to neural networks
General introduction
Layered networks
Recurrent networks with binary neurons
Part II Advanced neural networks
Competitive unsupervised learning processes
Bayesian techniques in supervised learning
Gaussian processes
Support vector machines for binary classification
Part III Information theory and neural networks
Measuring information
Identification of entropy as an information measure
Building blocks of Shannon’s information theory
Information theory and statistical inference
Applications to neural networks
Part IV Macroscopic analysis of dynamics
Network operation: macroscopic dynamics
Dynamics of online learning in binary perceptrons
Dynamics of online gradient descent learning
Part V Equilibrium statistical mechanics of neural networks
Basics of equilibrium statistical mechanics
Network operation: equilibrium analysis
Gardner theory of task realizability
A: Probability theory in a nutshell
B: Conditions for the central limit theorem to apply
C: Some simple summation identities
D: Gaussian integrals and probability distributions
E: Matrix identities
F: The δ-distribution
G: Inequalities based on convexity
H: Metrics for parametrized probability distributions
I: Saddle-point integration