Towards a complete perspective on labeled tree indexing: new size bounds, efficient constructions, and beyond
氏名 Shunsuke Inenaga
勤務先・所属・役職 Kyushu University・Department of Informatics・Associate Professor
A labeled tree (trie) is a compact representation of a set of strings. This paper considers the labeled tree indexing problem, and provides a number of new results. Kosaraju [FOCS1989] proposed the suffix tree for a backward trie, where the strings in the trie are read in the leaf-to-root direction. We call a usual trie as a forward trie. Despite a few works after Kosaraju, indexing forward/backward tries is not well understood yet. We show a full perspective on the sizes of indexing structures such as suffix trees/arrays, DAWGs, and CDAWGs, for forward and backward tries. Some of them take O(n) space in the size n of the input trie, while the others can occupy O(n2) space. Further, we show that there is a compact O(n)-space implicit representation of the DAWG for a forward trie, whose naive representation requires O(n2) space. This compact representation allows for simulating each DAWG edge traversal in O(log σ) time, and can be constructed in O(n) time and space over any integer alphabet of size σ = O(n). This readily extends to the first indexing structure that permits bidirectional pattern searches over a trie within linear space in the input trie size.
This paper discusses fundamental techniques on compact representation of a set of strings, which are important for high-speed information processing of many kinds of databases. The author comprehensively describes the state-of-the-art methods for labeled tree indexing problems, such as backward tries, suffix trees/arrays, DAWGs and their variations. The author then proposes several non-trivial, more exact theoretical results on space and time complexities to evaluate the performances of those related data structures. The author also clarifies the current remaining problems to suggest some prospective research directions. The paper is very well-organized and it will lead to progress of theories and applications for various information processing. Thus, we consider that the paper is worthy of awarding a 60th Anniversary Best Paper.