We consider optimization problems over the Stiefel manifold whose objective function is the summation of a smooth function and a nonsmooth function. Existing methods for solving this kind of problems can be classified into three classes. Algorithms in the first class rely on information of the subgradients of the objective function and thus tend to converge slowly in practice. Algorithms in the second class are proximal point algorithms, which involve subproblems that can be as difficult as the original problem. Algorithms in the third class are based on operator-splitting techniques, but they usually lack rigorous convergence guarantees. In this paper, we propose a retraction-based proximal gradient method for solving this class of problems. We prove that the proposed method globally converges to a stationary point. Iteration complexity for obtaining an $\epsilon$-stationary solution is also analyzed. Numerical results on solving sparse PCA and compressed modes problems are reported to demonstrate the advantages of the proposed method.