Though empirical testing is broadly used to evaluate heuristics, there are shortcomings with how it is often applied in practice. In a systematic review of Max-Cut and Quadratic Unconstrained Binary Optimization (QUBO) heuristics papers, we found only 4% publish source code, only 14% compare heuristics with identical termination criteria, and most experiments are performed with an artificial, homogeneous set of problem instances. To address these limitations, we implement and release as open-source a code-base of 10 MaxCut and 27 QUBO heuristics.We perform heuristic evaluation using cloud computing across a library of 3,296 instances. This large-scale evaluation provides insight into the types of problem instances for which each heuristic performs well or poorly. Because no single heuristic outperforms all others across all problem instances, we use machine learning to predict which heuristic will work best on a previously unseen problem instance, a key question facing practitioners.