Controversy is no stranger to electronic fetal monitoring (EFM). Few technologies in medicine can claim association with such a wide range of professional reaction. At different times EFM technology has been inflated, berated, debated and sometimes obfuscated. Despite controversy, it remains a mainstay of intrapartum care, suggesting that clinicians find its benefits outweigh its disadvantages.
Clinical benefit is supported by many encouraging reports showing falling rates of intrapartum-related neonatal encephalopathy (NE).(1-7) A myriad of factors influence complicated outcomes such as NE, and it would be incorrect to attribute the improvement to any single one. That said, several large jurisdictions with population-wide statistics have shown steady declines, often reaching reductions in the range of 40% per decade. Figure 1 shows declining numbers of births with intrapartum-related neonatal encephalopathy by decade in several regions.(7)
Figure 1. Regional trends in the numbers of births with intrapartum-related neonatal encephalopathy
Improvement in outcome is not likely to be related simply to better recording of the fetal heart rate. EFM monitors have changed little in the past thirty years. In contrast, clinical practices today are very different from those in the mid-1980s, when the largest randomized clinical trial (RCT) on EFM was reported.(8) During that study, perinatal hypoxic death or newborn seizures occurred at an astounding rate of approximately 1 in every 225 births and primary cesarean rates were around 2.4%.(8) Today electronic fetal monitoring is used in the vast majority of births in hospitals. In high-income countries, the rate of NE is around 1.5 per 1000 births. Preventable intrapartum stillbirths are almost eliminated and cesarean rates often exceed 30%.
Over the last few decades we have come to realize that human factors and system failures play a substantial role in adverse outcomes across all branches of medicine.(9) Human actions, such as delayed recognition of tracing abnormality or delayed intervention, are reported to have occurred in approximately half of birth-related asphyxia injuries.(10-14) Drawing from aviation and military experience, we have adopted their models to build less error-prone health care systems. Consequently, policies and procedures to redress the reasons underlying human error are now widespread.(15-18) In middle- and low-income countries, improving the national socioeconomic status and access to basic and safe healthcare is strongly associated with improved health outcomes.
In order to better understand the relationship between the EFM and the improvement in NE it is helpful to review the timeline of advances in fetal monitoring.
The invention of the stethoscope is generally attributed to René Laennec in 1816. He listened to adult heart tones using a roll of paper and later fashioned a wooden tube-like instrument.
“Ear trumpets” (flared, funnel-shaped devices used to concentrate sound waves) had been used as hearing aids for more than a century. Laennec's internist friend, Jean Alexandre Lejumeau, Vicomte de Kergaradec, tried this device to listen to the “noises” of amniotic fluid but instead he heard fetal heart tones. His questioning merits translation: “From the changes occurring in the strength and rate of fetal heart beats, wouldn’t it be possible to know about the status of health or sickness of the fetus?”(19, 20) How prescient, considering that we ask the same question nearly 200 years later!
Despite the challenges of disseminating medical information in that era, the significance of fetal heart tones and heart rate as an indicator of fetal status was recognized relatively rapidly. By 1833 Dublin physician Evory Kennedy published a monograph on auscultation of the fetal heart, commenting on the ominous heart rate pattern of “slowness of its return when a contraction is passing on,” the effects of head or cord compression on heart rate, and the significance of meconium-stained amniotic fluid.(20, 21) The first stethoscope designed specifically for fetal heart tones was created by Adolphe Pinard in 1895, 79 years after Laennec's invention. Another half century passed before reports appeared describing electronic methods to detect the fetal heart rate. Figure 2 shows a timeline of three phases and key events in the evolution of electronic fetal monitoring.
Figure 2. Historical phases in the evolution of electronic fetal monitoring
Spanning two decades, this first phase saw the creation of the basic electronic sensors and equipment to measure the fetal heart rate (FHR) and contractions. By 1968 the first commercial fetal monitor device was released using phonocardiography and external tocography. Phonocardiography quickly gave way to the direct scalp electrode for the measurement of FHR, followed by the appearance of ultrasound-based sensors.(22)
During the next three decades, our understanding of human fetal physiology grew extensively. With measurement now possible it became feasible to observe fetal heart rate patterns during pregnancy and in labor. Using animal experimentation with pharmacological blocking agents, vessel cannulation/occlusion and nerve transection studies, researchers elucidated how the fetal heart was regulated. Retrospective analysis of tracings in labors with adverse outcomes or other “natural experiments” like anencephaly or severe fetal anemia revealed other insights. (Important cardiac control mechanisms are discussed in more detail in the upcoming paper, EFM Basics: Physiology.) A PubMed search using the terms “fetal heart rate” finds over 8,000 publications from the 30-year physiology phase. An outstanding summary and organization of electronic fetal monitoring literature was compiled by the Royal College of Obstetricians and Gynaecologists in the UK and remains an excellent resource today.(23)
As information became available from diverse research teams and electronic fetal heart rate monitoring adoption rose, an assortment of new terms followed. In 1971 the first international conference on monitoring of the fetal heart convened and developed consensus on related terminology. Unfortunately, its conclusions were never published. Over the years terminology was revisited by several professional groups.(24-28) Measuring the FHR, labeling patterns, and understanding the relevant physiology did not in themselves improve outcomes. However, they were the foundation upon which all guidelines directing clinical management were built.
Clinical guidelines for electronic fetal monitoring management generally have two components:
- A graded classification that defines the degree of tracing abnormality.
- The nature and urgency of intervention, which the classification in turn ordains.
A variety of EFM scoring systems were developed for both the antepartum and intrapartum periods. The newly introduced term reactive antepartum tracing gave rise to no fewer than 21 different definitions of a reactive test!(29) Early methods for the more difficult task of intrapartum classification used scoring systems. Points were awarded for various features with a maximum possible score of 10. Higher scores were more reassuring than lower scores. In practice scores were generally grouped into three levels, so in effect 3-level classification systems have been in use for about 40 years.
Early classification systems published by Krebs et al in 1979 and FIGO in 1987 are summarized in Appendix 1. (30, 31)
Four modern classification systems place more reliance upon minimal baseline variability as a deciding factor, specify more details on the size, number and type of decelerations that are of concern and recommend clinical actions. For ease of comparison, they are also summarized in Appendix 1.(32-35)
The clinical goals of electronic fetal monitoring are to identify fetuses with increased risk of hypoxic injury so that intervention can avoid adverse outcome without also causing excessive numbers of unnecessary interventions. Thus, it is important to measure how often the classification method detected as abnormal the tracings of babies who showed hypoxic injury and how often it did the same in babies who were born without a problem. These measures are the hallmarks used to assess the performance of a diagnostic method. However, the goal of EFM is prevention of illness and not diagnosis. It is not straightforward to measure the performance of a prevention technique such as EFM for two important reasons.
The first reason is related to the “intervention paradox”. The fundamental measures of performance of a diagnostic test are sensitivity (% of unhealthy patients with a positive test) and false positive rate (% of healthy patients with a positive test). When the EFM is positive, intervention can prevent the illness from occurring. The intervention paradox is that a positive test is then accompanied by a healthy outcome due to successful clinical intervention. We rarely know with certainty in this situation if a bad outcome was truly averted or if the EFM-based indication was just a false positive. Unless there is a test that can indicate that a bad outcome was impending, intervention paradox will cause us to underestimate sensitivity and overestimate false positive rates. Thus, we should keep in mind that measured sensitivity and specificity are very conservative estimates of the value of EFM.
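The arithmetic behind the intervention paradox can be made concrete with a small worked example. The sketch below is illustrative only: the cohort sizes, the number of flagged tracings and the number of rescued fetuses are hypothetical values chosen for easy arithmetic, not data from any study, and the names `Cohort`, `true_rates` and `observed_rates` are introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical cohort; every number is an assumption for illustration."""
    at_risk: int          # fetuses that would be injured without intervention
    healthy: int          # fetuses that would be well regardless
    flagged_at_risk: int  # at-risk fetuses whose tracing was called abnormal
    flagged_healthy: int  # healthy fetuses whose tracing was called abnormal
    rescued: int          # flagged at-risk fetuses whose injury was prevented


def true_rates(c: Cohort) -> tuple[float, float]:
    """Sensitivity and false positive rate if impending injury were knowable."""
    return c.flagged_at_risk / c.at_risk, c.flagged_healthy / c.healthy


def observed_rates(c: Cohort) -> tuple[float, float]:
    """The same measures computed from birth outcome alone.

    Rescued fetuses are born well, so they leave the 'injured' group and are
    counted as healthy babies with a positive test (apparent false positives).
    """
    injured = c.at_risk - c.rescued
    injured_flagged = c.flagged_at_risk - c.rescued
    well = c.healthy + c.rescued
    well_flagged = c.flagged_healthy + c.rescued
    return injured_flagged / injured, well_flagged / well


cohort = Cohort(at_risk=10, healthy=990,
                flagged_at_risk=8, flagged_healthy=99, rescued=6)

true_sens, true_fpr = true_rates(cohort)
obs_sens, obs_fpr = observed_rates(cohort)
print(f"true:     sensitivity {true_sens:.1%}, false positive rate {true_fpr:.1%}")
print(f"observed: sensitivity {obs_sens:.1%}, false positive rate {obs_fpr:.1%}")
```

In this hypothetical cohort the tracing flags 8 of 10 at-risk fetuses, yet because 6 of them are rescued, only 2 of the 4 injured babies carry a positive test. Measured sensitivity falls from 80% to 50% while the measured false positive rate rises slightly, which is the direction of bias described above.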
Intervention paradox should not prevent us from measuring sensitivity and specificity. These fundamental performance measures remain useful, especially when comparing EFM classification systems applied to the same dataset, because the systems are then subject to the same limitations. Both high sensitivity and high specificity are desirable for different reasons. High sensitivity is desirable because hypoxic injury can have devastating long-term consequences. High specificity is desirable because normal outcome is far more common than hypoxic injury and many interventions based on a high false positive rate will create a major health care burden with no benefit. Summing sensitivity and specificity, each expressed as a percentage, is a simple way of combining both measures with equal weight in order to compare the classification methods. A perfect test would have a score of 200; a test that was no better than chance would have a score of 100.
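The summed score is easy to compute, and a short sketch shows why 100 is the chance level. The function names below are hypothetical and introduced only for illustration. The "chance" test is modelled, as an assumption, as one that calls a fixed percentage of tracings abnormal regardless of outcome, so its sensitivity equals that percentage and its specificity equals the remainder.

```python
def combined_score(sensitivity_pct: float, specificity_pct: float) -> float:
    """Equal-weight sum of sensitivity and specificity, both in percent."""
    return sensitivity_pct + specificity_pct


def chance_score(positive_pct: float) -> float:
    """Score of a test that flags positive_pct of tracings at random.

    Random flagging hits the same share of injured and healthy babies, so
    sensitivity = positive_pct and specificity = 100 - positive_pct.
    """
    return combined_score(positive_pct, 100 - positive_pct)


print(combined_score(100, 100))                    # perfect test
print([chance_score(p) for p in (10, 50, 90)])     # random flagging
```

Whatever share of tracings a random test flags, the two terms always add to 100, which is why any score above 100 reflects information beyond chance. The same quantity, divided by 100 and reduced by 1, is known in the diagnostic testing literature as Youden's index.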
Table 1 shows a summary of performance for a variety of classification methods.(36-40) The sensitivity and false positive rates are shown graphically in Figure 3. The studies are not comparable in terms of adverse outcome studied. Some examined HIE, others used various levels of acidemia. Some used visual analysis and others used automated methods. Nevertheless the results show an interesting pattern.
Figure 3 demonstrates the limitation of rule-based classification methods using classical EFM features. High sensitivity can be achieved only at the expense of a high false positive rate. The general relationship between sensitivity and false positive rate is shown by the solid line. Any new technique with a true advance should show performance levels much higher and to the left of this line. The best performing method is represented by the spot that is closest to the upper left corner of the graph.
Figure 3. Graphical representation of the relationship between sensitivity and false positive rates for several FHR classification methods
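The "closest to the upper left corner" criterion can be expressed as a distance on the sensitivity versus false positive rate plot. The following is a minimal sketch under stated assumptions: the three methods and their operating points are hypothetical placeholders, not the values from Table 1, and plain Euclidean distance to the corner is one reasonable reading of "closest".

```python
import math

# Hypothetical operating points: (false positive rate, sensitivity), as fractions.
# These are illustrative values only and do not come from the cited studies.
methods = {
    "method A": (0.60, 0.90),
    "method B": (0.20, 0.60),
    "method C": (0.05, 0.40),
}


def distance_to_ideal(fpr: float, sensitivity: float) -> float:
    """Euclidean distance from a point to the ideal corner (FPR 0, sensitivity 1)."""
    return math.hypot(fpr, 1.0 - sensitivity)


# Rank methods from closest to farthest from the upper left corner.
ranked = sorted(methods, key=lambda name: distance_to_ideal(*methods[name]))
best = ranked[0]

for name in ranked:
    fpr, sens = methods[name]
    print(f"{name}: distance {distance_to_ideal(fpr, sens):.3f}")
```

With these made-up points, a method with very high sensitivity and a method with a very low false positive rate both sit farther from the corner than the balanced one, which mirrors the tradeoff the figure illustrates.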
The second reason measuring the performance of electronic fetal monitoring is challenging relates to the use of clinical outcome as a measurement. Clinical outcome results from the cumulative effect of several steps, beginning with signal acquisition, followed by diagnosis, and ending with timely and effective intervention. A deficiency at any step can negate all the benefit of a previous step. Likewise, poor performance at an early step can hamper all subsequent steps. Several studies have reported that human actions, such as delayed recognition of tracing abnormality and/or delayed intervention, were present in large percentages of cases with asphyxial injuries.(10-14) Thus, clinical outcome measures reflect the entire process instead of any single step. Reflect upon the similarity of the basic electronic fetal monitor and the classification systems used in the 1980s to their counterparts in place today. Contrast the rate of perinatal death or HIE of 1 in 225 in the 1980s with the 1.5 per 1000 commonly seen today. Something happened to improve outcomes.
The patient safety movement has helped us understand the causes of error-prone health care systems. Some mitigation is directed at system vulnerabilities, such as legislation to limit working hours, recommendations on staff-to-patient ratios, setting standards for availability of obstetricians and operating room facilities, simulation training for emergency procedures and formal feedback on performance to clinicians.(15-18) Other actions are very specific to electronic fetal monitoring: standardized nomenclature, graded classifications of abnormality, and formal guidelines for clinical management. Finally, improved understanding of the association between certain fetal heart rate patterns and outcome helps clinicians respond better. For example, defining the correlation between rates of fetal death or HIE and the interval between persistent bradycardia and delivery is crucial to establishing desirable response times.(41)
Another advance has been the growing realization that small changes in critical areas can have a major impact. Simple checklists, limited to a few top items, have been associated with dramatic improvements across diverse medical specialties. Some of the most notable checklist achievements are highlighted in Figure 4.(42)
Figure 4. Falling complication rates associated with clinical checklist use
In obstetrics, compliance with a simple checklist of six EFM-related conditions to trigger discontinuation of oxytocin was associated with significant clinical benefit. In a study spanning four years and 14,398 term inductions, outcomes were compared in patients with and without checklist compliance.(43) Compliance was associated with fewer NICU admissions (2.9% with compliance vs 4.4% without) and a lower primary cesarean rate (15.8% vs 18.8%). Despite all of the limitations of EFM, usage consistent with management guidelines was associated with improved outcomes.
Today clinicians rely largely upon visual inspection of electronic fetal monitoring (EFM) tracings to assess fetal tolerance to labor. Rule-based classification methods are limited since they require a tradeoff between sensitivity and false positive rates. The EFM features in the rules are constrained to what the human eye can recognize and measure. Human assessment is inconsistent, especially when assessments are carried out over long periods of time in the presence of clinician fatigue or distractions. Low-incidence, high-consequence medical problems are among the most challenging for clinicians, especially when false positives are common. From the perspective of a front-line clinician, an FHR aberration is almost always associated with a good outcome. Considering all of these impediments, it is quite remarkable that HIE levels have fallen, albeit at the cost of high cesarean rates.
Active exploration in three areas is likely to advance the quest to reduce birth-related injury.
- Organizational and human behavior science will help us design and sustain higher quality healthcare systems. However, even a perfectly functioning labor & delivery unit cannot overcome the basic limitation of current EFM guidelines to identify and intervene on behalf of fetuses with increased risk of hypoxic injury without excessive numbers of unnecessary interventions.
- New sensors could provide a better or more direct measure of fetal cerebral state with respect to impending hypoxic injury. Experience with fetal oxygen saturation techniques and ST segment analysis of the fetal ECG has been disappointing to date.
- Better analysis of existing electronic fetal monitoring signals using approaches that can measure components that are not readily measurable or visible to the human eye.
We are entering a new phase in the evolution of fetal surveillance. Historically, EFM research was stymied by our inability to access and analyze large data sets. We have entered the “big data” era in obstetrics. Hospitals store vast amounts of clinical data and digital EFM tracings. We can now analyze these digital EFM signals directly and measure standard EFM features as well as components and relationships that are not readily visible to humans. As these impediments are overcome, we are seeing a resurgence of research producing a better understanding of which electronic fetal monitoring characteristics are truly predictive of neonatal depression or metabolic acidosis, along with a search for new sensors.(44-46)
We all look forward to the next phase in the evolution of fetal surveillance where we will have better healthcare delivery practices and better technology to help with the difficult challenge of assessing the fetal status during labor.