Adjeroh D., Bell T., Mukherjee A. The Burrows-Wheeler Transform. Data Compression, Suffix Arrays, and Pattern Matching

pdf file
size 2,09 MB

added by Shushimora 12/27/2015 00:53
info modified 12/27/2015 23:06

Adjeroh D., Bell T., Mukherjee A. The Burrows-Wheeler Transform. Data Compression, Suffix Arrays, and Pattern Matching

Springer, 2008. — 353 p.
The Burrows-Wheeler Transform is one of the best lossless compression methods available. It is an intriguing — even puzzling — approach to squeezing redundancy out of data, it has an interesting history, and it has applications well beyond its original purpose as a compression method. It is a relatively late addition to the compression canon, and hence our motivation to write this book, looking at the method in detail, bringing together the threads that led to its discovery and development, and speculating on what future ideas might grow out of it.
The book is aimed at a wide audience, ranging from those interested in learning a little more than the short descriptions of the BWT given in standard texts, through to those whose research is building on what we know about compression and pattern matching. The first few chapters are a careful description suitable for readers with an elementary computer science background (and these chapters have been used in undergraduate courses), but later chapters collect a wide range of detailed developments, some of which are built on advanced concepts from a range of computer science topics (for example, some of the advanced material has been used in a graduate computer science course in string algorithms). Some of the later explanations require some mathematical sophistication, but most should be accessible to those with a broad background in computer science.
We have aimed to provide a detailed introduction to the current state of knowledge about the Burrows-Wheeler Transform. This ranges from explanations and examples of how the transform works, through analyzing the theoretical performance of the transform from various view points, to considering issues relevant to implementing it on “real” systems. Each chapter (except the last one) contains a “further reading” section to guide the reader around the large collection of literature that has explored the BWT in detail, and Appendix B points to ongoing research.
An important theme in this book is pattern matching and text indexing using the BWT. Because the transformed text contains a sorted version of the original text, it has considerable potential to help with locating patterns, and we look in detail at a number of variations that have been proposed and evaluated.
The BWT literature uses a variety of notation for the various structures used in the transform. Where possible we have tried to use standard notation, but unfortunately some key notations conflict with those used in the standard pattern matching literature, and so we have chosen to coin some new notations to avoid having the same notation with two meanings, at times in the same paragraph! Appendix A gives a summary of the notation used to avoid any confusion.
The BWT continues to be actively researched, and this book is merely a milestone in its history. Appendix B gives links to web sites that will be worth watching for future developments of the BWT and related systems.

How the Burrows-Wheeler Transform works.
Coders for the Burrows-Wheeler Transform.
Suffix trees and suffix arrays.
Analysis of the Burrows-Wheeler Transform.
Variants of the Burrows-Wheeler Transform.
Exact and approximate pattern matching.
Other applications of the Burrows-Wheeler Transform.

A: Notation.
B: Ongoing work on the Burrows-Wheeler Transform.