
Greff K. Extending Hierarchical Temporal Memory for Sequence Classification

Diploma thesis, Technische Universität Kaiserslautern, 2010, 93 p.
This thesis tackles the problem of sequence learning using Hierarchical Temporal Memory as a first step towards a framework for combined temporal and spatial inference. Sequence learning is a crucial part of intelligence; Sun [2001] even considered it the most prevalent form of human and animal learning. There are many applications where sequences are the pivotal element, including natural language processing, speech recognition, video analysis, planning, robotics, adaptive control, time series prediction, finance, DNA sequencing, compression and many more. A large variety of methods deal with the different forms of sequence learning, among them time-series analysis, regression, compression, grammars, symbolic planning, hidden Markov models, conditional random fields and recurrent neural networks.
However, in most real-world applications, the temporal component is not the only dimension of the problem. Often there is also a spatial problem to solve, such as object recognition in the case of video analysis or identifying phonemes for speech recognition. While classification of spatial patterns is well studied, the combination of the two, spatial and temporal classification, is not. In most cases they are performed separately: spatial classification is done first, and sequence learning is then applied to the resulting classes. Well-known examples are hidden Markov models and conditional random fields. Only a few methods, such as recurrent neural networks, combine spatial and temporal learning in a tightly integrated way.
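As a rough illustration of this decoupled pipeline (a minimal sketch, not taken from the thesis; the toy data, the nearest-centroid classifier and all names are made-up assumptions), the following Python fragment labels each frame with a simple spatial classifier and then learns transition statistics over the resulting label stream alone:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "spatial" data: 2-D frames scattered around three class centroids.
    centroids = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    true_labels = rng.integers(0, 3, size=200)
    frames = centroids[true_labels] + rng.normal(scale=0.5, size=(200, 2))

    # Stage 1: spatial classification, frame by frame (nearest centroid).
    def classify_spatial(frame):
        return int(np.argmin(np.linalg.norm(centroids - frame, axis=1)))

    label_stream = [classify_spatial(f) for f in frames]

    # Stage 2: sequence learning sees only the class labels; the frames
    # themselves are no longer available at this point, which is exactly
    # the information loss discussed below.
    transitions = np.zeros((3, 3))
    for a, b in zip(label_stream, label_stream[1:]):
        transitions[a, b] += 1
    transition_probs = transitions / transitions.sum(axis=1, keepdims=True)
    print(transition_probs)

Anything the sequence stage could have exploited in the raw frames, for instance how confident the spatial stage was, is already discarded when the second stage begins.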
The problem is that useful information is lost due to this separation. Temporal classification could support spatial classification at various levels of abstraction, for example to filter background noise or to track multiple objects and predict their mutual occlusion. Sequence learning enables the powerful ability to make predictions, which could be applied to verify the interpretation and to disambiguate input. This is true, up to a point, even for a simple concatenation of spatial and then temporal learning, but it would be much stronger for real joint inference. Figuring out how to tightly combine both methods is therefore an important issue.
Hierarchical Temporal Memory (HTM) possibly offers a solution for this joint inference. It is a fairly new technology (2008), inspired by the human cortex, for (spatial) classification. HTMs have shown promising results, including CAPTCHA recognition [Hall and Poplin, 2007], content-based image retrieval [Bobier and Wirth, 2008] and spoken digit recognition [van Doremalen and Boves, 2008].
What is interesting about HTM is that, although it is designed for spatial classification, it uses the temporal structure of the training data. The idea can roughly be summarized as: observations that are close to each other in time are likely to belong to the same cause or object. This is used to build invariant representations at different levels of abstraction within the hierarchical structure, so HTMs exploit yet another advantage of the connection between spatial and temporal structure. However, when it comes to classification, HTMs discard all the temporal information; once training is completed, inference relies on spatial information only. At this point there is obviously room for improvement.
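As a rough illustration of this temporal grouping idea (again only an assumed toy sketch, not the actual HTM node algorithm), the fragment below pools quantized patterns that frequently follow each other in time into the same group, a simple stand-in for the invariant representations formed during training:

    from collections import defaultdict

    # Hypothetical stream of quantized spatial patterns observed over time.
    sequence = ["A1", "A2", "A3", "A1", "A2", "B1", "B2", "B1", "B2", "A1", "A2"]

    # Count how often one pattern directly follows another.
    counts = defaultdict(int)
    for prev, curr in zip(sequence, sequence[1:]):
        if prev != curr:
            counts[frozenset((prev, curr))] += 1

    # Keep only frequent transitions and take connected components: each
    # component is one "temporal group" of patterns assumed to share a cause.
    edges = defaultdict(set)
    for pair, count in counts.items():
        if count >= 2:
            a, b = tuple(pair)
            edges[a].add(b)
            edges[b].add(a)

    groups, seen = [], set()
    for start in sequence:
        if start in seen or start not in edges:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        groups.append(component)

    print(groups)  # e.g. [{'A1', 'A2'}, {'B1', 'B2'}]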
The goal of this thesis is to explore the possibilities of combining spatial and temporal learning by extending Hierarchical Temporal Memory. We adopt the assumption of hierarchically structured data, which allows HTMs to split the main classification task into a set of smaller tasks at different levels of abstraction. This moves the separation of spatial and temporal learning to a much smaller scale, so their cooperation can be much closer. Another advantage of this approach is that, because the two tasks remain separate, we can potentially re-use any existing algorithms that have been developed for sequence learning and for spatial classification.
Implicit in this approach is the assumption that the data is structured hierarchically in both space and time; otherwise, the described way of splitting will not work. But we believe that this hierarchical structure is inherent in a wide range of real data. Temporal hierarchies can be found, for example, in speech, which decomposes into phonemes, syllables, morphemes, words and sentences; in music, which can be divided into themes, periods, phrases and motifs (see also figure 1.1); and in movies, which consist of frames, shots, scenes and parts. Spatial hierarchies are even more readily found: a car consists of a chassis, wheels, doors and an engine, which in turn consists of cylinders, spark plugs, valves and so forth. A piece of music often contains different instruments and sometimes one or more voices. A typical dinner consists of an appetizer, a main course and a dessert, and so on.
Considering these examples, we expect the required structure to be inherent in many interesting real-world problems. A framework that provides close cooperation between spatial and temporal learning will therefore probably improve performance significantly on complicated problems like video analysis or speech recognition with high background noise. In general, this method could help to tackle very difficult problems that have not been solved yet.
In this thesis, we extend the theoretical framework of HTMs to enable them to do sequence classification. The improved framework is implemented and used to evaluate the algorithms on artificial data. We show this approach to be a viable first step towards joint inference.
Hierarchical Temporal Memory.
State of the Art.
Hierarchical Learning.
Implementation.
Results.
Conclusion and Perspective.
A. Algorithms.