Projecting a vector onto a simplex is a well-studied problem that arises in a wide range of optimization problems. Numerous algorithms have been proposed for determining the projection; however, all but one of these algorithms are serial. We address this gap by developing a method that preprocesses the input vector by decomposing and distributing it across multiple processors for local projection. Our method is especially effective when the projection is highly sparse; which is the case, for instance, in large-scale problems with i.i.d. entries. Moreover, the method can be adapted to work with a broad range of serial algorithms from the literature. We fill in theoretical gaps in serial algorithm analysis, and develop similar results for our parallel analogues. Numerical experiments conducted on a wide range of large-scale instances demonstrate the practical effectiveness of the method.