"Median Years to Ph.D." in new Conference Board study of doctorate programs is not what you think!

Ed Lazowska
Department of Computer Science & Engineering
University of Washington

September 1995


The just-released Conference Board study of research-doctorate programs in the United States includes a measure for each program labeled "Median Years to Degree," which is widely interpreted to be "the median number of years that students spend in this graduate program." Don't be fooled!

In fact, what the study reports is best described as "the median number of years that elapse from when the student first enters any educational program in any field at any institution after receiving his/her Bachelors degree, until the student receives his/her Ph.D."

Suppose, for example, that a student enters a Masters program immediately after receiving his/her Bachelors degree, and graduates from this Masters program in 2 years. Then the student enters the workforce for 5 years. Wanting to make the transition to research (perhaps in an entirely different field than the Masters degree!), the student then enrolls in a Ph.D. program, from which s/he graduates 4 years later. The Ph.D.-granting institution probably feels pretty good -- cranked this student out in 4 years! But in the Conference Board study, this student will weigh in at 2 + 5 + 4 = 11 years!
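The arithmetic above can be sketched in a few lines of Python. All values are the hypothetical ones from the example, not data from the study:

```python
# Hypothetical timeline for the student described above
# (illustrative values only, not taken from the study's data).
masters_years = 2      # Masters program entered right after the Bachelors degree
workforce_years = 5    # full-time employment before entering the Ph.D. program
phd_years = 4          # time actually spent in the Ph.D. program

# What the Ph.D.-granting institution would report about itself:
time_in_program = phd_years

# What the Conference Board study's "MYD" counts for this student:
# everything elapsed since first post-Bachelors enrollment anywhere.
elapsed_years = masters_years + workforce_years + phd_years

print(time_in_program)   # 4
print(elapsed_years)     # 11
```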

This semantic confusion is one issue: we all attach a particular meaning to the "MYD" measure, and that meaning is not at all what the measure actually represents. A careful definition of "MYD" would reduce confusion somewhat. But many would still be misled by the title. And even if the semantic confusion could be cleared up, my opinion is that the "MYD" measure does not convey information that characterizes in a meaningful way the graduate program to which it is attached: it is not relevant to a student trying to choose between graduate programs, nor to an administrator looking for bloated programs. I might even argue that it does not represent something worth tabulating and reporting at all, and that it's confusing to do so. (Looking more broadly than individual graduate programs, I question whether "MYD" is even germane to a field as a whole, since so many factors contribute to the measure.)

Background

In the Department of Computer Science & Engineering at the University of Washington, we routinely calculate the median time that students spend in our doctoral program. This number has been stable at between 5 and 6 years for more than a decade. (We do not require a Masters degree en route to a Ph.D., so this number represents "total time in graduate school" for a student who enters directly from a Bachelors program. There are no tricky semantics here -- it's exactly what you'd expect.)

We were, therefore, surprised when the new Conference Board study reported 8.19 years as the "MYD" for our program. Spurred on by MIT, which noticed a similar phenomenon, we explored further.

Our first step was to calculate the median time spent in our program for graduates in the specific years considered by the Conference Board study. We did this, using our own database, and confirmed a value in the 5 to 6 year range.

Next, with the help of our Graduate School, we obtained data directly from the NRC Survey of Earned Doctorates -- the actual data that had been used as input to the Conference Board study. (Graduating students fill out an SED form which is sent to NRC.) The SED form asks the student for a wide range of data: year of high school graduation, years of attendance at every college (including 2-year) and graduate institution where the student has spent time, full time equivalent years as a student since receipt of first Bachelors degree, etc. While there were of course a few glitches among our 60-odd graduates over the multi-year reporting interval, overall the return rate was very high and the quality of the data was very good. We calculated a variety of measures from this data, and formed a variety of hypotheses.

Finally, NRC staff provided essential assistance by re-working their calculations for our program and reviewing them with us. Without this assistance -- way beyond the call of duty -- we would still be speculating. (Data gathering and analysis for the study are the responsibility of NRC's Office of Scientific and Engineering Personnel, which looks at human resource issues across all science and engineering fields.)

What "MYD" Really Means

As noted in the preamble, the "MYD" measure in the Conference Board study, while widely interpreted to be "the median number of years that students spend in this graduate program," is in fact "the median number of years that elapse from when the student first enters any educational program in any field at any institution after receiving his/her Bachelors degree, until the student receives his/her Ph.D." In disciplines or instances where significant employment occurs between receipt of a Masters degree and entry into a Ph.D. program, the difference can be huge.

This is not a measure that we've ever calculated for our graduate program, nor is it a measure that we consider particularly germane. Surely, time spent fully employed as part of a career plan, between receiving a Masters degree from some other institution and enrolling in our graduate program, is not characteristic of our graduate program. (Pushing a bit harder, it's not even obvious that the time spent in that Masters program elsewhere is germane, since we don't require a Masters degree en route to the Ph.D., and all students, regardless of background, enter our program on an even footing in terms of the "checkpoints" of the program.)

This is by far the greatest source of the discrepancy between the "MYD" figure reported by the Conference Board study and our own intuition about our graduate program. It's worth noting, though, that even when we use the Conference Board's "MYD" definition and calculate this measure from our own database, we obtain somewhat different results than the Conference Board study. There are several secondary contributing factors which may be of interest.

First, the Conference Board study calculates an integer number of years for each student, by subtracting the calendar year of entry from the calendar year of exit. A student who enters in September of Year X and graduates in January of Year X+5 actually spent 4.33 years in the program, but will be reported as 5 -- a small but consistent effect, since most students first enroll in the fall.
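The effect of calendar-year subtraction can be illustrated with Python's standard datetime module. The specific dates below are made up to match the September-entry, January-graduation pattern described above:

```python
from datetime import date

# Illustrative dates (not from the study's data): a student who enters
# in September of year X and graduates in January of year X+5.
entry = date(1988, 9, 1)
graduation = date(1993, 1, 1)

# Actual elapsed time in the program, in years.
actual_years = (graduation - entry).days / 365.25

# What the study computes: calendar year of exit minus calendar year of entry,
# an integer that rounds fall entrants upward.
reported_years = graduation.year - entry.year

print(round(actual_years, 2))  # 4.33
print(reported_years)          # 5
```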

(It's worth noting, in this context, how the study arrives at an "MYD" that is reported to two decimal digits. Those students who fall in the median year are considered to have graduated uniformly across that year, and based upon this, an offset within that year is calculated to two digits and reported.)
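The interpolation described above is, as best I can tell, the standard grouped-data median: find the year containing the median, then assume graduates are spread uniformly across that year to compute a fractional offset. A sketch, with invented counts for illustration:

```python
# Hedged sketch of the study's two-decimal median, assuming the standard
# grouped-data interpolation. The counts below are invented for illustration,
# not taken from any program's actual data.
counts_by_year = {6: 10, 7: 14, 8: 20, 9: 12, 10: 4}  # years-to-degree -> students

n = sum(counts_by_year.values())
half = n / 2

cumulative = 0
for year in sorted(counts_by_year):
    if cumulative + counts_by_year[year] >= half:
        # Students in the median year are assumed to graduate uniformly
        # across that year, yielding a fractional offset within it.
        myd = year + (half - cumulative) / counts_by_year[year]
        break
    cumulative += counts_by_year[year]

print(round(myd, 2))  # 8.3
```

This is how a discrete set of integer year-counts can yield a figure like "8.19 years."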

Second, students occasionally mis-code themselves. In the case of our own program, four students coded themselves as "Computer Engineering" rather than "Computer Science" and were attributed to our Electrical Engineering department in the study ... offset by four students we'd never heard of who coded themselves "Computer Science!"

Third, students who omit essential fields from the SED form must of course be omitted from the calculation. This affected a non-negligible number of our graduates.

For simplicity, this explanation has been presented in the context of the University of Washington Department of Computer Science & Engineering, but it applies to all programs surveyed in the Conference Board study.

Lessons

State definitions precisely. From the Conference Board study document, one would be unlikely to discern this definition of "MYD" and its implications.

Avoid using titles that will be assumed by many to mean something other than what is really being reported. It is better to choose a title with no obvious semantics than one with the wrong obvious semantics.

Be mindful of correct definitions when making statements. Statements in the Conference Board study document such as "It took graduates in the 1980s longer to earn a degree on average than graduates of these programs took 10 years earlier" would seem to contribute to misinterpretation.

Consider the appropriateness of measures. Understanding the definition of "MYD" will allow the community to consider if this is the most appropriate measure. The SED form includes a wide variety of data; "MYD" is the measure that the Conference Board study has chosen to calculate and use.

Don't confuse accuracy and precision. The Conference Board study reports to two decimal digits a widely-misunderstood measure with lots of fuzz in it.

Handling survey instruments is difficult. Coding errors are inevitable. If the community wants reliable analyses, we are going to have to take the time to verify that we are providing reliable data.

Acknowledgements

Jeff Dean, a graduate student in our department, noticed the anomalous figure reported for us immediately after the Computing Research Association placed the Conference Board study's Computer Science information on the Web. (Juan Osuna at CRA was responsible for this effort, and also provided much assistance in tracking things down.) John Guttag of MIT contacted me after noting the same anomaly for his program, and furnished considerable guidance.

At the University of Washington, contributions came from Frankye Jones (our staff graduate program advisor), Carl Ebeling (our faculty graduate program advisor), Dale Johnson (Dean of the Graduate School), and John Drew (Manager of Computer Services at the Graduate School).

At the National Academy of Sciences, Charlotte Kuh and Jim Voytuk of the Office of Scientific and Engineering Personnel (the organization responsible for the Conference Board study) expended a large amount of time and patience helping us understand what was going on. It's important to note the magnitude and complexity of the Conference Board study: 41 fields, 274 universities, 3,634 research-doctorate programs, 78,000 faculty members, and, by 1993, nearly 40,000 Ph.D.s awarded per year. Marjory Blumenthal of the Computer Science and Telecommunications Board also provided guidance.


lazowska at cs.washington.edu