Parallel Construction of Suffix Trees

Jim Fix, Department of Mathematics, Reed College

Abstract: Suffix trees and their close brethren, suffix arrays, are structures for storing and indexing a full text. As an index they are extremely powerful, handling a variety of questions about the text, say, "What's the longest repeated component?", quickly. They use memory efficiently, using space that's proportional to the size of the text, and they can be quickly constructed with a single pass over the text.

However, for very long texts that exceed the capacity of a conventional computer's internal memory, one needs to play algorithmic tricks—using multiple networked computers or carefully managing external storage to disk—to benefit from their use as a text representation. In this talk, I'll examine parallel algorithms for constructing suffix trees and how their exploration inspired recent novel sequential algorithms for suffix arrays.