The approximation of stochastic processes by trees is an important topic in multistage stochastic programming. In this paper we focus on improving the approximation of large trees by smaller (tractable) trees. The quality of the approximation is measured by the nested distance, recently introduced in [Pflug]. The nested distance is derived from the Wasserstein distance. It additionally takes into account the effect of information, which is increasing over time. After discussing the basic relations between processes and trees and reviewing the nested distance we introduce and analyze an algorithm for finding good approximations. The algorithm, step by step, improves the probabilities on a tree, and also improves the paths. For the important case of quadratic nested distances the algorithm, generalizing multistage, k-means clustering, finds locally best approximating trees in finitely many iterations.