Watson and His AI Cousins Will
Always Need Humans:
Is the Converse True?
by Emily Hamilton
Senior Vice President, PeriGen
Most of us will easily concede that computers are better at number crunching than humans. How many of us, even in our prime, can quickly complete the dreaded serial sevens test (counting down from one hundred by sevens, a clinical test of mental status)?
As for higher-level functions like reasoning, clinical judgment, strategic planning, creativity and empathy, surely these are better achieved by humans. Well, yes, but maybe not always.
This year Google’s AlphaGo defeated a human champion at the ancient game of Go, not by brute force (calculating the best of every possible move at each turn) but by using deep neural networks to learn successful and efficient strategies. AlphaGo learned its strategies by playing the game. With modern computational capacity, AlphaGo was able to play more games in a day than a human could play in a decade. Furthermore, it could remember that experience!
Go is not medicine. What does the evidence show in medicine?
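To make the point concrete, here is a minimal sketch of the serial sevens countdown in Python. The function name `serial_sevens` and its parameters are illustrative choices, not part of any clinical tool.

```python
def serial_sevens(start=100, step=7):
    """Count down from `start` in steps of `step`, stopping before the value goes below zero."""
    return list(range(start, -1, -step))


print(serial_sevens())
# prints [100, 93, 86, 79, 72, 65, 58, 51, 44, 37, 30, 23, 16, 9, 2]
```

The machine finishes the whole sequence instantly and without error, which is exactly why the test is informative about people and not about computers.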
In 1954 the acclaimed psychologist Paul E. Meehl began a debate that would last more than half a century when he compared the accuracy of clinical versus statistical methods to predict patient condition.(1) His analysis, described in the book Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence, concluded that statistical methods (e.g., explicit equations, actuarial tables, defined algorithmic prediction) outperformed clinical methods (e.g., subjective or informal reasoning, clinical intuition).
Later, in 2000, Grove et al. published a comprehensive analysis of relevant publications on man-versus-machine methods.(2) Their meta-analysis included 136 published reports and compared the performance of clinical and statistical methods in a wide variety of domains. Their results confirmed the findings of Meehl! Statistical methods outperformed clinical methods again.
They reported that the better performance of statistical methods held across subject matter (medical, mental health, forensic, academic performance), although the advantage was greatest in the forensic domains. The level of clinician experience did not make a difference, even when the statistical methods were compared to the best-performing clinician(s). The superior results were not entirely uniform, however. In about half of the studies the difference was small, and the clinical methods performed approximately as well as the statistical methods. In about one third, the statistical methods substantially outperformed clinicians, especially when clinical interviews were involved. That is, detection rates were higher by about 10% or more for predictions with intermediate accuracy. In a small minority, 6% of the studies, the clinical methods were better.
In 2006, Hilton et al. reported similar findings and noted a widening gap between statistical and clinical methods when reviewing 66 years of research on the prediction of violence.(3) Reports in the current medical literature differ somewhat. A recent review by Sanders et al. showed more equivalence between clinical methods and statistical prediction, using a wider variety of assessment measures. Only 31 studies met their inclusion criteria, highlighting both the relative scarcity of complex statistical techniques in clinical use and the scientific inadequacy of the comparison methods.(4)
There are many reasons to believe that clinical judgment is better today than in previous eras
Our basic understanding of disease has improved. We have better laboratory tests, higher standards for medical evidence and easier access to information. One could argue that the clinician today has better access and better information than clinicians of many years ago, when there were few genetic markers, biomarkers and environmental conditions to consider. Indeed, we may have too much information. The very same mental processes that are essential to “size up” a situation efficiently in the face of so much information can also result in erroneous decisions on occasion.
Two well-established psychological phenomena bear special mention in any discussion of medical error. The first, often called availability bias, arises because recent events or vivid anecdotes form strong and highly influential memories that can distort our perception of the real incidence or usual consequences of specific scenarios. The second, tunnel vision, refers to the tendency to perceive and confirm information that aligns with a particular viewpoint. It includes framing bias, the tendency to create a coherent interpretation without examining all available information, and confirmation bias, which refers to seeking only the information that supports a particular opinion. Finally, too much information can actually obscure critical information. These biases and the burden of too much information are not so problematic for statistical methods.
Pitting clinical methods against computer-based methods is unrealistic. “Medical reasoning” and “statistical algorithms” are both derived from real clinical data. Moreover, clinicians incorporate statistical methods unconsciously when reasoning. They consider the background incidence of the condition and the typical constellations of signs and symptoms, and they weigh the pros and cons of potential diagnoses and treatments. Many clinicians know and use scoring systems, which are essentially simplified statistical weighting methods. Statistics is but a formalized mathematical way to analyze real data and then summarize it succinctly to help us make inferences. Thus one would expect the performance measures of clinicians and statistical methods to converge.
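A short Python sketch can make this unconscious arithmetic explicit. It is an illustration under stated assumptions, not a validated clinical tool: the function names, the finding names and every number below are invented for the example. The first function combines a background incidence with the strength of one finding (its likelihood ratio) using the odds form of Bayes' rule; the second is a points-based score of the kind many clinicians already use.

```python
def posterior_probability(prevalence, likelihood_ratio):
    """Update a background incidence with one finding (odds form of Bayes' rule)."""
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical point weights, invented for illustration only.
EXAMPLE_WEIGHTS = {"finding_a": 2, "finding_b": 1, "finding_c": 1}


def simple_score(findings, weights=EXAMPLE_WEIGHTS):
    """Add up the points for every finding that is present (a weighted checklist)."""
    return sum(points for name, points in weights.items() if findings.get(name, False))


# Assumed inputs: a condition with 5% background incidence and a finding
# that is four times more likely in patients who have the condition.
p = posterior_probability(prevalence=0.05, likelihood_ratio=4.0)
print(f"Probability after the finding: {p:.3f}")  # prints 0.174

print("Score:", simple_score({"finding_a": True, "finding_c": True}))  # prints 3
```

Even with a fairly strong finding, the probability rises only from 5% to about 17%, because the background incidence still dominates. Clinicians perform this kind of weighting informally; a statistical method simply does it explicitly and the same way every time.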
Mark Twain is often credited with writing, “Facts are stubborn things, statistics are more pliable.” But in this context, it is the clinicians who are more pliable. Clinicians can obtain and integrate information from additional sources, see exceptions to the rules, factor in patient fears and desires and even make do with missing data. Clinicians communicate with patients, reason and have empathy. However, occasionally they get tired, take risky shortcuts and must deal with competing interests. In contrast, statistical facts are stubborn things, not subject to the effects of fatigue or recent experience. At present they are neither very communicative nor empathetic, although robotic companions for seniors may change our opinion.
The strengths of human and statistical methods are complementary
Objective, unbiased statistical methods help to counter the potential for human bias, reduce information overload and help the seasoned clinician make more confident decisions. The division between clinical reasoning and statistical methods is becoming increasingly blurred. The good news is that the best is yet to come, and it will probably arrive on your phone.
1. Meehl PE. Clinical versus statistical prediction: a theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press; 1954.
2. Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson C. Clinical versus mechanical prediction: a meta-analysis. Psychol Assess. 2000;12(1):19-30.
3. Hilton NZ, Harris GT, Rice ME. Sixty-six years of research on the clinical versus actuarial prediction of violence. The Counseling Psychologist. 2006;34(3):400-409.
4. Sanders S, Doust J, Glasziou P. A systematic review of studies comparing diagnostic clinical prediction rules with clinical judgment. PLoS One. 2015 Jun 3;10(6):e0128233.
5. Lee YH, Bang H, Kim DJ. How to establish clinical prediction models. Endocrinol Metab (Seoul). 2016 Mar;31(1):38-44.