Stochastic approximation techniques play an important role in solving many problems encountered in machine learning or adaptive signal processing. In these contexts, the statistics of the data are often unknown a priori or their direct computation is too intensive, and they have thus to be estimated online from the observed signals. For batch optimization of an objective function being the sum of a data fidelity term and a penalization (e.g. a sparsity promoting function), Majorize- Minimize (MM) methods have recently attracted much interest since they are fast, highly flexible, and effective in ensuring convergence. The goal of this paper is to show how these methods can be successfully extended to the case when the data fidelity term corresponds to a least squares criterion and the cost function is replaced by a sequence of stochastic approximations of it. In this context, we propose an online version of an MM subspace algorithm and we study its convergence by using suitable probabilistic tools. Simulation results illustrate the good practical performance of the proposed algorithm associated with a memory gradient subspace, when applied to both non-adaptive and adaptive filter identification problems.