We study the problem of choosing the best subset of p features in linear regression given n observations. This problem naturally contains two objective functions including minimizing the amount of bias and minimizing the number of predictors. The existing approaches transform the problem into a single-objective optimization problem either by combining the two objectives using some weights or by treating one of them as a constraint. We explain the main weaknesses of both of these approaches, and to overcome their drawbacks, we propose a bi-objective mixed integer linear programming approach with the property that it can handle additional constraints as well. We conduct a computational study and show that existing bi-objective optimization solvers are able to solve the problem in a reasonable time.