We present MadNCL, a GPU implementation of Algorithm NCL, an augmented Lagrangian method for solving large-scale and degenerate nonlinear programs (NLPs). Although interior-point methods and sequential quadratic programming are widely used for solving NLPs, augmented Lagrangian methods are known to be more robust to constraint degeneracy and can detect infeasibility rapidly. We introduce several enhancements to Algorithm NCL, including fusion of the inner and outer loops and the use of extrapolation steps, which improve both efficiency and convergence stability. Further, NCL is well suited to GPU architectures because the quadratic penalty terms regularize the KKT systems. In particular, the NCL subproblem formulation allows the KKT systems to be expressed naturally in either stabilized or condensed form, whereas the interior-point approach requires aggressive reformulations or relaxations to make it suitable for GPUs. Both systems can be solved efficiently on GPUs using sparse $LDL^\top$ factorization with static pivoting, as implemented in NVIDIA cuDSS. Building on these advantages, we examine the KKT systems arising from NCL subproblems. MadNCL uses MadNLP as an interior-point subproblem solver and relies on the stabilized and condensed formulations of the KKT systems to compute Newton steps. Numerical experiments on various large-scale and degenerate NLPs, including optimal power flow, COPS benchmarks, and security-constrained optimal power flow, demonstrate that MadNCL runs efficiently on GPUs while handling problem degeneracy effectively, including problems with complementarity constraints (MPCCs).
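To make the claim about stabilized and condensed KKT systems concrete, the following is a minimal sketch that assumes the usual form of the NCL subproblem from the NCL literature. The notation is not taken from this abstract and is illustrative only: $y_k$ and $\rho_k$ denote the multiplier estimate and penalty parameter at outer iteration $k$, $r$ the constraint residual variables, $W$ the Hessian of the Lagrangian, $\Sigma$ the diagonal interior-point barrier term, $J$ the constraint Jacobian, and $p_x$, $p_r$, $p_c$ generic residuals of the optimality conditions.

```latex
% NCL subproblem at outer iteration k (assumed standard form)
\begin{align*}
  \min_{x,\,r}\quad & f(x) + y_k^\top r + \tfrac{1}{2}\rho_k \|r\|^2 \\
  \text{s.t.}\quad  & c(x) + r = 0, \qquad x \ge 0 .
\end{align*}

% Newton system of an interior-point method applied to the subproblem
\begin{equation*}
  \begin{bmatrix}
    W + \Sigma & 0 & J^\top \\
    0 & \rho_k I & I \\
    J & I & 0
  \end{bmatrix}
  \begin{bmatrix} \Delta x \\ \Delta r \\ \Delta \lambda \end{bmatrix}
  = -
  \begin{bmatrix} p_x \\ p_r \\ p_c \end{bmatrix}.
\end{equation*}

% Eliminating \Delta r = -(p_r + \Delta\lambda)/\rho_k gives the stabilized system
\begin{equation*}
  \begin{bmatrix}
    W + \Sigma & J^\top \\
    J & -\rho_k^{-1} I
  \end{bmatrix}
  \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix}
  =
  \begin{bmatrix} -p_x \\ -p_c + \rho_k^{-1} p_r \end{bmatrix}.
\end{equation*}

% Eliminating \Delta\lambda = \rho_k (J \Delta x + p_c) - p_r gives the condensed system
\begin{equation*}
  \bigl( W + \Sigma + \rho_k J^\top J \bigr)\, \Delta x
  = -p_x - J^\top \bigl( \rho_k\, p_c - p_r \bigr).
\end{equation*}
```

Under these assumptions, the $-\rho_k^{-1} I$ block keeps the stabilized system nonsingular even when $J$ is rank deficient, which is one plausible reading of why a factorization with static pivoting suffices; the condensed matrix is positive definite whenever $W + \Sigma + \rho_k J^\top J$ is.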