Discrete Optimal Transport problems give rise to very large linear programs (LP) with a particular structure of the constraint matrix. In this paper we present an interior point method (IPM) specialized for the LP originating from the Kantorovich Optimal Transport problem. Knowing that optimal solutions of such problems display a high degree of sparsity, we propose a column-generation-like technique to force all intermediate iterates to be as sparse as possible. The algorithm is implemented nearly matrix-free. Indeed, most of the computations avoid forming the huge matrices involved and solve the Newton system using only a much smaller Schur complement of the normal equations. We prove theoretical results about the sparsity pattern of the optimal solution, exploiting the graph structure of the underlying problem. We use these results to mix iterative and direct linear solvers efficiently, in a way that avoids producing preconditioners or factorizations with excessive fill-in and at the same time guaranteeing a low number of conjugate gradient iterations. We compare the proposed sparse IPM method with a state-of-the-art solver and show that it can compete with the best network optimization tools in terms of computational time and memory usage. We perform experiments with problems reaching more than a billion variables and demonstrate the robustness of the proposed method.