In September of 2015, President Obama announced the release of data on all universities in the United States. The data includes "how much each school’s graduates earn, how much debt they graduate with, and what percentage of a school’s students can pay back their loans – which will help all of us see which schools do the best job of preparing America for success."
So let's jump into the data and see what it tells us about each of these three measures: earnings, debt, and loan repayment. But first, to get some perspective on the issue of college debt, I want to address something we've all heard a lot about: the rising cost of college tuition. Many articles have covered this subject in publications such as The New York Times (ref 1, 2), The Washington Post (ref), Time Magazine (ref), and others.
In figure 1 I plot the average faculty salary vs the in-state tuition and fees at nearly every university in the United States. I've color-coded the universities by sector in order to make better sense of the data.
The increase in tuition and faculty salaries is striking. Take Columbia University in New York, for example. In 2001, tuition and fees were $27k/year; in 2014 that number had grown to $51k/year, an 89% increase. Over the same period, cumulative inflation raised prices by just 39%.
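To make the comparison concrete, the short calculation below reproduces the 89% figure and also expresses Columbia's 2001 tuition in 2014 dollars. It uses only the numbers quoted above (rounded to the nearest $1k) and assumes the 39% is cumulative price inflation over 2001 to 2014.

```python
# Columbia University tuition and fees, in thousands of dollars (figures quoted above)
tuition_2001 = 27.0
tuition_2014 = 51.0
cumulative_inflation = 0.39  # assumed: 39% total price increase from 2001 to 2014

# Nominal percentage increase in tuition
nominal_increase = (tuition_2014 - tuition_2001) / tuition_2001

# What 2001 tuition would have cost in 2014 if it had only kept pace with inflation
tuition_2001_in_2014_dollars = tuition_2001 * (1 + cumulative_inflation)

# Real (inflation-adjusted) increase
real_increase = tuition_2014 / tuition_2001_in_2014_dollars - 1

print(f"Nominal increase: {nominal_increase:.0%}")
print(f"2001 tuition in 2014 dollars: ${tuition_2001_in_2014_dollars:.1f}k")
print(f"Real increase: {real_increase:.0%}")
```

Under these assumptions, 2001 tuition would have been about $37.5k in 2014 dollars, so even after adjusting for inflation the increase is roughly 36%.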
Something else that catches the eye in figure 1 is the linearity of the data for non-profit 4-year colleges (orange). Fitting a line to these data, shown in figure 2, gives a slope of 1.62: on average, for every dollar that tuition is raised, faculty salaries are raised by $1.62. This raises the question of how many students there are per faculty member on average, since that determines what fraction of the additional revenue from a tuition increase goes to faculty salaries. Although the CollegeScorecard data does not include the number of faculty per university (surprisingly, it isn't one of the 1745 columns!), this information can be found in the Delta Cost Project database. Those data reveal that in 2012, non-profit 4-year colleges had on average 16.22 students per full-time faculty member. A one-dollar tuition increase therefore brings in about $16.22 per faculty member, of which $1.62 goes to that member's salary, so roughly 1/10 (1.62/16.22 ≈ 0.10) of the new revenue goes towards paying the faculty more. This is a bit misleading because many of the top schools have lower student-faculty ratios (Harvard's, for example, is 7:1), but I'll save a more in-depth analysis for another post.
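The slope and the revenue split can be checked with a few lines of Python. This is a sketch only: the (tuition, salary) pairs are hypothetical points placed exactly on a line of slope 1.62, not Scorecard data, and `fit_line` is a hand-written ordinary least-squares fit (NumPy's `polyfit` with degree 1 would give the same answer).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (tuition, average faculty salary) pairs in dollars, NOT real data;
# generated to lie exactly on a line of slope 1.62 purely to illustrate the fit.
tuition = [20000, 30000, 40000, 50000]
salary = [1.62 * t + 40000 for t in tuition]

slope, intercept = fit_line(tuition, salary)

# A $1 tuition increase per student brings in `students_per_faculty` dollars per
# faculty member, of which `slope` dollars go to that member's salary.
students_per_faculty = 16.22  # Delta Cost Project average quoted above (2012)
fraction_to_faculty = slope / students_per_faculty

print(f"slope = {slope:.2f}, fraction to faculty = {fraction_to_faculty:.3f}")
```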
Now that we have some perspective on tuition, let's return to how much each school’s graduates earn and how much debt they graduate with. The CollegeScorecard data contains a column for "median earnings of students working and not enrolled 10 years after entry" and one for "the median original amount of the loan principal upon entering repayment." For brevity I'll refer to these as "earnings" and "debt", respectively. Note that there are many variations of these columns in the dataset if you're interested in exploring further.
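A minimal sketch of pulling these two columns out of the Scorecard CSV with Python's standard `csv` module. It assumes the Scorecard variable names `INSTNM`, `MD_EARN_WNE_P10`, and `DEBT_MDN` (check them against the data dictionary for the release you download) and treats placeholder strings such as `"PrivacySuppressed"` and `"NULL"` as missing. The two sample rows are made up.

```python
import csv
import io

# Assumed Scorecard variable names for the two columns described above,
# mapped to the short labels used in the text.
COLUMNS = {
    "MD_EARN_WNE_P10": "earnings",
    "DEBT_MDN": "debt",
}


def to_number(value):
    """Convert a Scorecard cell to float; suppressed or missing values become None."""
    if value in ("", "NULL", "NA", "PrivacySuppressed"):
        return None
    return float(value)


def load_earnings_and_debt(f):
    """Read a CSV file object and keep only the name, earnings, and debt columns."""
    rows = []
    for record in csv.DictReader(f):
        row = {"name": record["INSTNM"]}
        for source, label in COLUMNS.items():
            row[label] = to_number(record[source])
        rows.append(row)
    return rows


# Hypothetical two-row extract for illustration only (values are made up).
sample = io.StringIO(
    "INSTNM,MD_EARN_WNE_P10,DEBT_MDN\n"
    "Example College A,45000,15000\n"
    "Example College B,PrivacySuppressed,9500\n"
)
rows = load_earnings_and_debt(sample)
print(rows)
```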
In figure 3 I plot earnings vs debt for all universities in the United States. This plot appears much more scattered, with less clumping by sector compared to the faculty salary vs tuition plot above.
So which universities yield the highest-earning students? I assumed that Ivy League schools would pepper the top of this plot, but I was wrong. In fact, the schools that produce the top earners are medically associated colleges. At the top of the list is the Louisiana State University Health Sciences Center-Shreveport, although this is slightly misleading considering it has a very small number of undergraduates (50 in 2012, currently 35 ref). A list of the top-earning universities is given in table 1.
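Ranking institutions by earnings is just a sort on one column. The sketch below uses three rows copied from table 1; `min_undergrads` is a hypothetical parameter, not something used in this post, showing one way to keep very small programs from dominating the top of the list.

```python
def top_earners(rows, n=30, min_undergrads=0):
    """Return the n rows with the highest median earnings.

    `min_undergrads` is a hypothetical filter for dropping very small
    institutions, whose medians rest on few graduates.
    """
    eligible = [
        r for r in rows
        if r["earnings"] is not None and r["undergrads"] >= min_undergrads
    ]
    return sorted(eligible, key=lambda r: r["earnings"], reverse=True)[:n]


# Three rows from table 1: median earnings ($) and undergraduate enrollment.
rows = [
    {"name": "Louisiana State University Health Sciences Center", "earnings": 186500, "undergrads": 50},
    {"name": "SUNY Downstate Medical Center", "earnings": 128000, "undergrads": 335},
    {"name": "Massachusetts Institute of Technology", "earnings": 89200, "undergrads": 4477},
]

top_two = top_earners(rows, n=2)
top_large = top_earners(rows, n=2, min_undergrads=100)

for r in top_two:
    print(r["name"], r["earnings"])
for r in top_large:
    print(r["name"], r["earnings"])
```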
The next question is what the opposite table looks like: which universities are at the bottom of the list in terms of student earnings? Inspecting the bottom-earning schools, I found that a number of those in the bottom 30 listed "PrivacySuppressed", meaning the sample size was so small that someone might be able to identify the people whose salary data was used. I have excluded these universities from the list in table 2.
Top-earning universities

| Rank | Institution name | Median earnings ($) | Total number of undergrads at institution | Sample size for earnings |
|---|---|---|---|---|
| 1 | Louisiana State University Health Sciences Center | 186,500 | 50 | 70 |
| 2 | SUNY Downstate Medical Center | 128,000 | 335 | 211 |
| 3 | Albany College of Pharmacy and Health Sciences | 118,800 | 1,069 | 618 |
| 5 | Samuel Merritt University | 108,000 | 520 | 681 |
| 6 | University of Medicine and Dentistry of New Jersey | 107,100 | 974 | 1,346 |
| 7 | University of Texas Southwestern Medical Center | 106,900 | 33 | 36 |
| 8 | University of the Sciences | 95,800 | 1,782 | 968 |
| 10 | Montefiore School of Nursing | 89,500 | 129 | 97 |
| 11 | Massachusetts Institute of Technology | 89,200 | 4,477 | 770 |
| 12 | Los Angeles County College of Nursing and Allied Health | 87,200 | 208 | 117 |
| 14 | Thomas Jefferson University | 86,300 | 744 | 909 |
| 15 | Cochran School of Nursing | 86,000 | 93 | 139 |
| 17 | Upstate Medical University | 85,900 | 295 | 210 |
| 18 | Helene Fuld College of Nursing | 84,200 | 354 | 442 |
| 20 | Stevens Institute of Technology | 83,700 | 2,542 | 991 |
| 21 | United States Merchant Marine Academy | 82,000 | 987 | 116 |
| 22 | University of Maryland Baltimore | 80,700 | 722 | 617 |
| 23 | Worcester Polytechnic Institute | 80,300 | 3,841 | 1,343 |
| 24 | University of Pennsylvania | 79,700 | 10,679 | 2,570 |
| 25 | Rensselaer Polytechnic Institute | 79,600 | 5,300 | 2,058 |
| 26 | The California Maritime Academy | 79,400 | 971 | 373 |
| 27 | DigiPen Institute of Technology | 79,400 | 963 | 430 |
| 28 | Medical University of South Carolina | 79,400 | 204 | 747 |
| 29 | Rose-Hulman Institute of Technology | 79,200 | 2,097 | 690 |
| 30 | Maine Maritime Academy | 78,800 | 968 | 430 |
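The bottom-of-the-list filtering described above for table 2 amounts to dropping the `"PrivacySuppressed"` rows before sorting in ascending order. A minimal sketch with hypothetical rows, keeping earnings as the raw strings found in the CSV:

```python
SUPPRESSED = "PrivacySuppressed"


def bottom_earners(rows, n=30):
    """Return the n lowest-earning rows, skipping privacy-suppressed or missing earnings."""
    reported = [r for r in rows if r["earnings"] not in (SUPPRESSED, None, "")]
    return sorted(reported, key=lambda r: float(r["earnings"]))[:n]


# Hypothetical rows for illustration; earnings are raw CSV strings.
rows = [
    {"name": "School A", "earnings": "21000"},
    {"name": "School B", "earnings": SUPPRESSED},
    {"name": "School C", "earnings": "18500"},
    {"name": "School D", "earnings": "30000"},
]

lowest = bottom_earners(rows, n=2)
print([r["name"] for r in lowest])
```

Filtering before sorting matters here: a suppressed value is not a number, so it would otherwise either break the numeric sort or, if coerced to zero, land misleadingly at the very bottom.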
The data show that Clinton College, a historically Black college in South Carolina, is at the very bottom of the list in terms of student earnings, followed by Gallipolis Career College.
The rest of the list includes a handful of schools in Puerto Rico, tribal colleges, and a few "business" schools. But something else caught my eye while exploring the interactive version of the graph. In figure 4 I have filtered for universities whose names contain art, design, music, or conservatory. Most of these universities fall below $50k in earnings, and the average is $36,203.
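A name filter like the one behind figure 4 can be written as a single regular expression. This sketch uses whole-word, case-insensitive matching, which is an assumption: a plain substring test for "art" would also catch names such as "Stuart" or "Hartford", so the exact filter used for figure 4 may differ. The rows are hypothetical.

```python
import re

# Whole-word, case-insensitive match on the keywords from the text.
ARTS_PATTERN = re.compile(r"\b(art|arts|design|music|conservatory)\b", re.IGNORECASE)


def arts_schools(rows):
    """Keep rows whose institution name contains one of the keywords as a whole word."""
    return [r for r in rows if ARTS_PATTERN.search(r["name"])]


def mean_earnings(rows):
    """Average median earnings, ignoring rows without a reported value."""
    values = [r["earnings"] for r in rows if r["earnings"] is not None]
    return sum(values) / len(values)


# Hypothetical rows for illustration only.
rows = [
    {"name": "Example Institute of Art", "earnings": 30000},
    {"name": "Example School of Design", "earnings": 40000},
    {"name": "Stuart Example College", "earnings": 60000},
    {"name": "Hartford Example University", "earnings": 55000},
    {"name": "Example Conservatory of Music", "earnings": 35000},
]

selected = arts_schools(rows)
average = mean_earnings(selected)
print([r["name"] for r in selected], average)
```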
I next used scikit-learn to predict which colleges are most likely to fail next. My approach is to use data on universities that have closed to find which open universities are most like them, and thus most likely to fail. I trained a Random Forest model, tuning it with a grid search, and it yields about 96% overall accuracy.

Universities most likely to fail
| Rank | Institution name | Heightened cash monitoring (1 = yes) | Probability closed |
|---|---|---|---|
| 1 | Southern California University SOMA | 1 | 0.806655 |
| 2 | Institute of Clinical Acupuncture & Oriental Med | 0 | 0.514355 |
| 3 | American National University-Lexington | 1 | 0.429721 |
| 4 | Ultrasound Medical Institute | 1 | 0.426612 |
| 7 | Instituto Tecnologico de Puerto Rico-Recinto d... | 0 | 0.322036 |
| 9 | Globe University-Madison East | 0 | 0.301356 |
| 10 | Globe University-Sioux Falls | 0 | 0.296969 |
| 11 | Brown Mackie College-Miami | 0 | 0.289619 |
| 13 | Oxford Graduate School | 0 | 0.270720 |
| 16 | Instituto Tecnologico de Puerto Rico-Recinto d... | 0 | 0.256295 |
| 18 | Technical Career Institutes | 0 | 0.250084 |
| 19 | American InterContinental University-Atlanta | 0 | 0.244845 |
| 20 | Los Angeles Film School | 0 | 0.241422 |
| 22 | University of Phoenix-Connecticut | 0 | 0.235022 |
| 24 | Expression College for Digital Arts | 0 | 0.228275 |
| 25 | The Art Institute of New York City | 0 | 0.222006 |
| 27 | Institute of Production and Recording | 0 | 0.218016 |
| 28 | The Art Institute of Atlanta | 0 | 0.212843 |
| 29 | Rabbinical College of Ohr Shimon Yisroel | 0 | 0.212571 |
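The modeling step described before the table can be sketched with scikit-learn's `RandomForestClassifier` and `GridSearchCV`. This is a minimal sketch on synthetic data from `make_classification`, not the code behind the table: the features, the label encoding (1 = closed), the parameter grid, and the train/test split are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for Scorecard features (tuition, faculty salary, etc.).
# Label 1 = closed, 0 = still operating; closures are rare, hence the 90/10 weights.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the hyperparameters actually searched are not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)  # refits the best model on all training data

accuracy = search.score(X_test, y_test)

# Probability of the "closed" class for each held-out institution; sorting by
# this column in descending order gives a ranking like the table above.
prob_closed = search.predict_proba(X_test)[:, 1]
ranking = prob_closed.argsort()[::-1]

print(search.best_params_, round(accuracy, 3))
print(prob_closed[ranking[:5]])
```

In a real run, the probabilities would be computed for institutions that are still operating. Because closed schools are a small minority, overall accuracy can look high even for a model that rarely flags closures; passing `scoring="roc_auc"` to `GridSearchCV` is one alternative.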
To gauge which variables matter most, I retrained the model with just one variable at a time. For example, I retrained the model using only tuition and recorded the average score, then repeated this with faculty salary, and so on. I find that the most important variables are as follows:
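The single-variable retraining procedure described above can be sketched in the same way. The feature names are hypothetical placeholders attached to synthetic data, and the score is mean 5-fold cross-validated accuracy; which score and folds were actually used is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical names standing in for Scorecard columns; the data are synthetic.
feature_names = ["tuition", "faculty_salary", "earnings", "debt"]
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=2, n_redundant=0, random_state=1
)

scores = {}
for i, name in enumerate(feature_names):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    # Train on a single column (sliced as i:i+1 to keep X two-dimensional)
    # and record the mean cross-validated accuracy.
    scores[name] = cross_val_score(model, X[:, i:i + 1], y, cv=5).mean()

# Higher single-feature score = more predictive on its own.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

A fitted `RandomForestClassifier` also exposes `feature_importances_`, which ranks features within one model trained on all of them; the one-at-a-time approach instead measures how much each variable can predict by itself.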
The last piece I would like to investigate is what fraction of students at each university can pay back their loans, but I'll have to save that for another post. This could indicate which universities are taking advantage of students who can't afford the loans they take out.
In conclusion, these data show that rising tuition costs are far outpacing inflation. Furthermore, they reveal that the median salaries of students who attend undergraduate medical and nursing schools are amongst the highest, some even higher than those of Ivy League graduates. And lastly, they show that going to art or music school probably isn't the best decision if you're looking to maximize your future earning potential. The choice is yours!