Approximating Optimal Binary Decision Trees

Brent Heeringa
Department of Computer Science, University of Massachusetts Amherst

Abstract: Decision trees have a rich history in computer science with applications ranging from medical diagnosis to experiment design. An instance of Decision Tree (DT) is a set of m binary tests T=(T_1, ..., T_m) and a set of n items X=(X_1, ..., X_n). The goal is to output a binary tree where each internal node is a test, each leaf is an item and the total external path length of the tree is minimized. The total external path length of the tree is the sum of the depths of all the leaves in the tree. In this talk, I give a (ln n + 1)-approximation for the Decision Tree problem. I also show that DT does not have a Polynomial Time Approximation Scheme (PTAS) unless P=NP. This work, while providing the first non-trivial upper and lower bounds on approximating DT, also demonstrates that DT and a subtly different problem which also bears the name decision tree, have fundamentally different approximation complexity.