North-Holland, 1991. 286 p.
The first time I became interested in Neural Nets and Statistical Pattern Recognition was in early 1958, while I was a graduate student in the Moore School of Electrical Engineering of the University of Pennsylvania. My student subscription to The New Yorker magazine brought many chuckles from cartoons and stories, but the only item from all those many issues that has stayed with me was a column in the December 6, 1958 issue titled "Rival". It covered an interview with Frank Rosenblatt, in which he described his hopes for his "artificial intelligences" which would rival humans in perception and problem solving. By the time I read this column I knew a fair amount about Rosenblatt's research on Perceptrons, since, as part of a machine learning and recognition research project and a search for a dissertation topic, I had spent much time poring over his Cornell Aeronautical Laboratory reports. I had also read parts of the book Stochastic Models for Learning by Bush and Mosteller (Wiley, 1955) and had been studying papers on Statistical Discrimination, in particular papers by C.R. Rao and the chapter on Problems of Discrimination in his book Advanced Statistical Methods in Biometric Research (Wiley, 1952). About the same time Robert Bush joined the University of Pennsylvania as chairman of Psychology. I chose Bush as my dissertation advisor and, with some support from R. Duncan Luce, did a dissertation (for the Ph.D. in electrical engineering!) on the analysis of some stochastic processes arising from Luce's nonlinear "Beta" model for learning. This is how learning models, artificial neural networks, and statistical pattern classification came together in my cognizance.
Two years later, when I joined General Dynamics/Electronics (GD/E) in Rochester, New York, as Manager of the Machine Intelligence Advanced Development Laboratory, it seemed as though every company and university laboratory was working on perceptron-type machines. At GD/E we also implemented our own version of an adaptive pattern recognizer, which was soon called APE (Adaptive Pattern Encoder). There were many other learning machines implemented by various organizations, machines with names such as MINOS, SOCRATES, and of course ADALINE and MADALINE. It was a time for catchy names and audacious claims [see Kanal, Proc. IEEE, October 1972]. Clearly PERCEPTRON and ADALINE were the key innovations and they had memorable names, although I have it on good authority that in the 1980s, when the new machine vision company Perceptron was formed, its founders had no idea that the name they had come up with had a previous incarnation. Because of simultaneous exposure to papers on learning models, perceptrons, and statistical discrimination, my attempts at understanding perceptrons and other "bionic" networks were formulated in terms of statistical classification methods, stochastic approximation procedures, and stochastic models for learning. "Evaluation of a Class of Pattern Recognition Networks", presented at the Bionics conference in Ithaca, NY, in 1961 and reprinted in this book, summarized some of that understanding. It may seem surprising now, but at that time some well-known researchers writing in the engineering literature on pattern recognition had stated that the use of a weighted sum of binary variables, as done in the perceptron-type classification function, limited the variables to being statistically independent.
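For the modern reader, the decision rule at issue is simply a thresholded weighted sum of binary inputs, and it carries no independence assumption. The following minimal Python sketch (the data, names, and parameters are illustrative and not taken from any of the machines mentioned above) trains such a threshold logic unit with the classical error-correction rule on binary features that are deliberately statistically dependent:

    import numpy as np

    rng = np.random.default_rng(0)

    # Binary features that are deliberately *dependent*: x2 copies x1
    # 90% of the time, so the components are far from independent.
    def sample(n, p):
        x1 = (rng.random(n) < p).astype(float)
        x2 = np.where(rng.random(n) < 0.9, x1, 1 - x1)
        x3 = (rng.random(n) < p).astype(float)
        return np.column_stack([x1, x2, x3])

    X = np.vstack([sample(200, 0.2), sample(200, 0.8)])  # class 0, class 1
    y = np.array([0] * 200 + [1] * 200)
    Xa = np.column_stack([X, np.ones(len(X))])  # augment for the threshold term

    # Perceptron-type classification function: sign of a weighted sum of the
    # binary variables, trained with the mistake-driven error-correction rule.
    w = np.zeros(Xa.shape[1])
    for _ in range(50):
        for xi, yi in zip(Xa, y):
            pred = float(xi @ w > 0)
            w += (yi - pred) * xi  # update only when the unit errs

    acc = np.mean((Xa @ w > 0).astype(int) == y)
    print(f"training accuracy with dependent binary features: {acc:.2f}")

The weighted sum simply learns to discount the redundant component; nothing in the convergence argument requires the inputs to be independent.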
Rosenblatt had not limited himself to using just a single Threshold Logic Unit but used networks of such units. The problem was how to train multilayer perceptron networks. A paper on the topic written by Block, Knight, and Rosenblatt was murky indeed, and did not demonstrate a convergent procedure to train such networks. In 1962-63 at Philco-Ford, seeking a systematic approach to designing layered classification nets, we decided to use a hierarchy of threshold logic units, with a first layer of "feature logics" which were threshold logic units on overlapping receptive fields of the image, feeding two additional levels of weighted threshold logic decision units. The weights in each level of the hierarchy were estimated using statistical methods rather than iterative training procedures [L.N. Kanal & N.C. Randall, Recognition System Design by Statistical Analysis, Proc. 19th Conf. ACM, 1964]; a toy reconstruction of this design style appears in the sketch below. We referred to the networks as two-layer networks since we did not count the input as a layer. On a project to recognize tanks in aerial photography, the method worked well enough in practice that the U.S. Army agency sponsoring the project decided to classify the final reports, although previously the project had been unclassified. We were unable to publish the classified results! Then, because our sponsors had become enamoured of the claimed promise of coherent optical filtering as a parallel implementation for automatic target recognition, the funding we had been promised was diverted from our electro-optical implementation to a coherent optical filtering group. Some years later we presented the arguments favoring our approach, compared to optical implementations and trainable systems, in an article titled "Systems Considerations for Automatic Imagery Screening" by T.J. Harley, L.N. Kanal, and N.C. Randall, which is included in the IEEE Press reprint volume Machine Recognition of Patterns, edited by A. Agrawala. In the years which followed, multilevel statistically designed classifiers and AI search procedures applied to pattern recognition held my interest, although comments in my 1974 survey "Patterns in Pattern Recognition: 1968-1974" [IEEE Trans. on Information Theory, 1974] mention papers by Amari and others and show an awareness that neural networks and biologically motivated automata were making a comeback.
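The sketch below is a toy Python reconstruction of that design style under stated assumptions: synthetic binary images, and only one decision level for brevity (the actual system fed two additional decision levels and was designed on real imagery). A first layer of fixed "feature logic" threshold units operates on overlapping receptive fields, and the decision weights are then computed in closed form from class statistics (a Fisher-type discriminant here) rather than by iterative training:

    import numpy as np

    rng = np.random.default_rng(1)

    def receptive_fields(side, size, stride):
        """Index sets of overlapping size x size windows on a side x side image."""
        fields = []
        for r in range(0, side - size + 1, stride):
            for c in range(0, side - size + 1, stride):
                idx = [(r + i) * side + (c + j)
                       for i in range(size) for j in range(size)]
                fields.append(np.array(idx))
        return fields

    def feature_layer(images, fields):
        """Each threshold unit fires if over half the pixels in its field are on."""
        return np.column_stack(
            [(images[:, f].sum(axis=1) > f.size / 2).astype(float) for f in fields]
        )

    side = 8
    fields = receptive_fields(side, size=4, stride=2)  # overlapping 4x4 windows

    # Synthetic two-class binary images: class 1 is brighter on the left half.
    def sample(n, label):
        p = np.full((side, side), 0.3)
        if label == 1:
            p[:, : side // 2] = 0.7
        return (rng.random((n, side * side)) < p.ravel()).astype(float)

    X0, X1 = sample(300, 0), sample(300, 1)
    F0, F1 = feature_layer(X0, fields), feature_layer(X1, fields)

    # Decision level: weights estimated statistically from class means and
    # scatter (regularized Fisher discriminant), with no iterative training.
    m0, m1 = F0.mean(axis=0), F1.mean(axis=0)
    Sw = np.cov(F0.T) + np.cov(F1.T) + 1e-3 * np.eye(len(fields))
    w = np.linalg.solve(Sw, m1 - m0)
    b = -w @ (m0 + m1) / 2  # threshold midway between projected class means

    F = np.vstack([F0, F1])
    y = np.array([0] * 300 + [1] * 300)
    acc = np.mean(((F @ w + b) > 0).astype(int) == y)
    print(f"accuracy of statistically designed two-layer net: {acc:.2f}")

The point of the design choice is that every weight is a plug-in statistical estimate, so the network is fully determined by class statistics of the feature-layer outputs, avoiding the convergence questions that plagued iterative multilayer training at the time.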
In the last few years trainable multilayer neural networks have returned to dominate research in pattern recognition, and this time there is potential for gaining much greater insight into their systematic design and performance analysis. Artificial neural networks trained on sample data are nonparametric statistical estimators for densities and classifiers. This leads to many questions about ANNs in comparison to alternative statistical methodologies. Such questions include the information requirements of each approach, the sample sizes needed for design and test, the robustness to incomplete data and different types of noise, and the generalization capability of competing procedures. Additional points of comparison concern the relative sizes of the feature vectors for each pattern category; the capability for variable-length vector pattern recognition; the capability for fusion of multiple sources or sensors; the ability to incorporate domain knowledge; the ability to work with other pattern recognition paradigms in an integrated fashion; the ability of the methodology to extend to other types of problem solving, e.g., combinatorial optimization, resource allocation, etc., using the same general network architecture; and the suitability for easy mapping to VLSI or other parallel architectures. The capability of neural networks to combine adaptation with parallelism in an easy and natural fashion, and the ability to learn continuously while working on a problem in a real environment, are of particular interest. Finally, the cost of implementation and of training personnel in the methodology will also be a determinant of comparative success.
Some of the above questions are beginning to be addressed in the literature, and the present volume is a good start in this direction. I am thankful to Professors Anil K. Jain and Ishwar K. Sethi for their initiative in assembling and editing this volume, and to the authors of each chapter for their contributions. The richness of the artificial neural network paradigm for pattern recognition ensures that, despite the many individuals working in this area, much work remains to be done to gain a true understanding of ANN methodologies and their relation to better understood pattern recognition methods. I expect that additional volumes will be assembled and published in this book series on the subject of artificial neural networks and their relation to and interaction with statistical pattern recognition, genetic algorithms, expert systems, and other approaches to the machine recognition of patterns.
I ANN and SPR Relationship
Evaluation of a Class of Pattern-Recognition Networks
Links between Artificial Neural Networks (ANN) and Statistical Pattern Recognition
Small Sample Size Problems in Designing Artificial Neural Networks
On Tree Structured Classifiers
Decision Tree Performance Enhancement Using an Artificial Neural Network Implementation
II Applications
Bayesian and Neural Network Pattern Recognition: A Theoretical Connection and Empirical Results with Handwritten Characters
Shape and Texture Recognition by a Neural Network
Neural Networks for Textured Image Processing
Markov Random Fields and Neural Networks with Applications to Early Vision Problems
Connectionist Models and their Application to Automatic Speech Recognition
III Implementation Aspects
Dynamic Associative Memories
Optical Associative Memories
Artificial Neural Nets in MOS Silicon