Classification with Guaranteed Probability of Error

We introduce a general-purpose learning machine that we call the “Guaranteed Error Machine”, or GEM, together with two learning algorithms: a “real GEM algorithm”, meant for use in real applications, and an “ideal GEM algorithm”, introduced as a theoretical tool; the two algorithms exhibit identical behavior most of the time. Unlike most learning machines, GEM has a ternary-valued output: besides 0 and 1, it can return an “unknown” label, expressing doubt. Our central result is that, under general conditions, the statistics of the generalization error for the ideal GEM algorithm are universal, in the sense that they remain the same independently of the (unknown) mechanism that generates the data. As a consequence, the user can select a desired level of generalization error and the learning machine adjusts automatically to meet this level, with no knowledge of the data generation mechanism required; the adjustment is achieved by modulating the size of the region where the machine returns the “unknown” label. The key point is that this process introduces no conservatism, because the statistics of the generalization error are exactly known. We further show that the generalization error of the real algorithm is never larger than that of the ideal algorithm. Thus, the generalization error computed for the latter can be rigorously used as a bound for the former and, moreover, provably provides tight evaluations in normal cases.
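
As a concrete illustration of the ternary-output idea, the minimal Python sketch below shows a classifier that abstains by returning “unknown” whenever a scalar score falls inside a band around the decision boundary; widening the band enlarges the abstention region and thereby lowers the chance of a wrong committed 0/1 answer. This is a hedged sketch of the abstention mechanism only, not the GEM construction from the paper: the function ternary_classify, the band half-width tau, and the example scores are hypothetical.

    from typing import Literal

    Label = Literal[0, 1, "unknown"]

    def ternary_classify(score: float, tau: float) -> Label:
        """Return 1 or 0 when the score is confidently on one side of the
        decision boundary, and "unknown" inside a band of half-width tau
        around it. Enlarging tau widens the "unknown" region, trading
        coverage for a smaller probability of error on committed answers."""
        if score > tau:
            return 1
        if score < -tau:
            return 0
        return "unknown"

    if __name__ == "__main__":
        # With tau = 0.0 the machine always commits; raising tau makes it
        # abstain near the boundary, where errors are most likely.
        for s in (-0.8, -0.1, 0.05, 0.9):
            print(s, ternary_classify(s, tau=0.3))

In GEM, by contrast, the size of the “unknown” region is not tuned by hand: it is set so that the committed answers meet the user-specified level of generalization error, which is possible precisely because the error statistics of the ideal algorithm are known.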
