We consider multiclass classification problems where the set of labels is organized hierarchically as a category tree. We associate each node in the tree with a classifier and classify the examples recursively from the root to the leaves. We propose a hierarchical Support Vector Machine (SVM) that encourages the classifier at each node of the tree to be different from the classifiers at its ancestors. More specifically, we introduce regularizations that force the normal vector of the classifying hyperplane at each node to be orthogonal to those at its ancestors as much as possible. We establish conditions under which training such a hierarchical SVM is a convex optimization problem, and develop an efficient dual-averaging method for solving it. We evaluate the method on a number of real-world text categorization tasks and obtain state-of-the-art performance.
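As a rough illustration of the orthogonality idea described above, the sketch below computes one possible penalty: the sum of absolute inner products between each node's normal vector and those of its ancestors. The abstract does not state the exact form of the regularizer, so the absolute-value form, the unweighted sum, and the names `ancestors`, `orthogonality_penalty`, `parent`, and `W` are assumptions made for this example only.

```python
import numpy as np

def ancestors(parent, node):
    """Return all ancestors of `node`, nearest first, given a child -> parent map."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def orthogonality_penalty(W, parent):
    """Sum |w_i . w_a| over every node i and each ancestor a that has a classifier.

    W      : dict mapping node id -> normal vector (1-D numpy array); hypothetical layout
    parent : dict mapping node id -> parent id (the root has no entry)
    The penalty is zero exactly when every node's normal vector is orthogonal
    to the normal vectors of all of its ancestors.
    """
    total = 0.0
    for i, w_i in W.items():
        for a in ancestors(parent, i):
            if a in W:
                total += abs(float(np.dot(w_i, W[a])))
    return total

# Toy chain 0 -> 1 -> 2 with made-up two-dimensional normal vectors.
parent = {1: 0, 2: 1}
W = {
    0: np.array([1.0, 0.0]),
    1: np.array([0.0, 1.0]),  # orthogonal to node 0, contributes nothing
    2: np.array([1.0, 1.0]),  # overlaps with both ancestors
}
penalty = orthogonality_penalty(W, parent)
```

In a training objective, a term of this kind would be added to the usual hinge loss, so that a child classifier is discouraged from reusing the directions its ancestors already rely on.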
Microsoft Research Technical Report MSR-TR-2011-54. A short version of this paper (without the proofs in the appendix) appears in Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.