Do Gynecologists Suck at Doing Hysterectomies? The Messy Science Of Public Health Studies

A new study in the journal Obstetrics & Gynecology reports that a substantial number of surgeons performing hysterectomies do only one of these surgeries per year. Furthermore, hysterectomies performed by these so-called “very low-volume surgeons” were associated with an increased risk of complications and death, and with higher costs.

In the study, “very low-volume” surgeons were defined as surgeons who perform one hysterectomy per year. The researchers analyzed a New York database to investigate the outcomes of all surgeons who performed a hysterectomy in that state from 2000 to 2014.

When I first read the paper’s abstract, I immediately thought: here is yet another example of the harmful consequences of a broken health care system. This study has “sensational” written all over it. Understandably, people are going to jump all over this in outrage. Why are gynecologists who only do one hysterectomy per year allowed to operate? The pitchforks are being raised as we speak.

But if you look at the data and methodology, the study has a lot of limitations, and its results should be approached with some caution.

First of all, it is astonishing that 41% of surgeons are doing only one hysterectomy per year. But these “very low-volume surgeons” account for just 1% of all the hysterectomies. That means that of the 434,000 hysterectomies analyzed in the study, only a little over 4,000 were done by “very low-volume surgeons.”

So yes, a substantial percentage of surgeons performing hysterectomies are only doing one per year. However, 99% of hysterectomies are done by higher-volume surgeons. This is an important point because it means that only a small share of the overall complications, deaths, and costs from hysterectomies involved “very low-volume surgeons.” For example, the overall complication rate was 32% for “very low-volume surgeons” and 10% for “higher-volume surgeons.” Applied to the case counts above, that works out to about 43,000 complications among patients of higher-volume surgeons versus about 1,400 among patients of “very low-volume surgeons.” Similarly, the patient death rate was higher for “very low-volume surgeons” (2.5%) than for “higher-volume surgeons” (0.2%), but when you look at the absolute numbers, there were about 100 deaths in the “very low-volume” group versus about 860 in the “higher-volume” group.
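For anyone who wants to check my math, here is a rough sketch of the calculation in Python. It uses only the rounded figures quoted above (434,000 cases, a 1% share, and the reported rates), not the exact counts from the paper, and the function name absolute_counts is just my own label for illustration.

```python
# Rounded figures quoted in this post; they are approximations,
# not the exact counts reported in the paper.
TOTAL_HYSTERECTOMIES = 434_000
VERY_LOW_VOLUME_SHARE = 0.01  # about 1% of all cases


def absolute_counts(total, low_share, low_rate, high_rate):
    """Turn event rates into approximate event counts for each group.

    Returns (events in the very low-volume group,
             events in the higher-volume group).
    """
    low_cases = total * low_share
    high_cases = total - low_cases
    return low_cases * low_rate, high_cases * high_rate


# Complication rates: 32% (very low-volume) vs. 10% (higher-volume)
low_comp, high_comp = absolute_counts(
    TOTAL_HYSTERECTOMIES, VERY_LOW_VOLUME_SHARE, 0.32, 0.10
)

# Death rates: 2.5% (very low-volume) vs. 0.2% (higher-volume)
low_deaths, high_deaths = absolute_counts(
    TOTAL_HYSTERECTOMIES, VERY_LOW_VOLUME_SHARE, 0.025, 0.002
)

print(f"Complications: ~{low_comp:,.0f} vs. ~{high_comp:,.0f}")
print(f"Deaths:        ~{low_deaths:,.0f} vs. ~{high_deaths:,.0f}")
```

The point of the exercise is that a much higher rate in a tiny group can still add up to far fewer events than a lower rate in a huge group.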

Second, since this is a retrospective observational study based on a database, it can be riddled with problems of confounding variables and poor data quality. I’ll explain more about data quality later when I use the example of Texas’ maternal mortality issue. As an example of confounding, the researchers did find that “very low-volume surgeons” were more likely to be doing that one hysterectomy per year because it was an emergent or urgent case. We know emergent or urgent surgical cases inherently carry higher risks of complications and mortality. Also, the study authors don’t explain why these surgeons were more likely to perform emergent or urgent hysterectomies. Did they have no one else available to assist them or do the case for them? Moreover, “very low-volume surgeons” were also more likely to operate on older patients and patients with more medical problems, which inherently makes surgery riskier.

Third, the authors couldn’t say what type of surgical specialists the “very low-volume surgeons” were. You would assume that all these hysterectomies were done by a gynecologist or a gynecologic specialist like a gynecologic oncologist, but this is only an assumption. Compared to non-gynecologic surgeons, gynecologists obviously get more substantial training in residency on performing hysterectomies. We get even more training when we subspecialize in some of the gyn subspecialties like gynecologic oncology. The authors also didn’t report whether non-gynecologic surgeons made up a larger share of the “very low-volume” group.

Finally, there is the old saying, “Garbage in, garbage out.” One of the significant limitations of database studies is that administrative coding or classification mistakes at the data-entry stage can create errors in the results. In the state of Texas, we recently had this issue happen with our maternal mortality rates. A study from the same journal reported that Texas had the worst maternal mortality in the country. As a result of the tragedy and embarrassment for the state, a Task Force was created by the state to look into this public health disaster. After years of money spent, investigations into why the problem existed in Texas, and the political and media firestorm that it fueled (including being the impetus for an article that I wrote for the Houston Chronicle), the Task Force found that Texas’ maternal mortality wasn’t the worst in the country. In fact, Texas’ maternal mortality rate was about the middle of the pack among all 50 states. So why was Texas’ maternal mortality not as bad as first reported? The Task Force reviewed the cases and found dozens of administrative coding and classification errors. Dozens of women were erroneously identified on their death certificates as being pregnant at the time of death, misclassified by doctors and medical examiners using the state’s electronic reporting system.

Yes, studies like these are important and are great for getting conversations started about the safety and quality of our healthcare system, but we should take them with a grain of salt (sometimes with a lot of salt), especially when it comes to strategies for changing public and health policy. Why are “very low-volume surgeons” more likely to do emergent or urgent hysterectomies? Why are they more likely to operate on older patients and patients with more medical problems? Is it bad decision making before surgery? Is it a matter of technical proficiency? Or is it something else that was not measured in the study? If we truly want to make accurate and effective changes in healthcare policy, we need to look more closely at the data to explain why these differences genuinely exist, or find more high-quality evidence.
