Main

Genomic medicine (GM) uses information about a person’s genome to improve his or her health. Growing interest in GM coincides with heightened recognition of the need for better-quality evidence to support informed decisions. Therefore, the widespread adoption of GM into patient care will require high-quality evidence that it improves patient outcomes when compared with conventional care. A third trend, which may influence the research in genomic medicine, is the inclusion of study outcomes that patients say are of greatest concern to them.

This focus on patient-relevant outcomes is the hallmark of comparative effectiveness research (CER). The purpose of CER is to improve the evidence base for making decisions that are relevant to patients and other stakeholders. CER encompasses the synthesis of existing evidence and the generation of new evidence that compares alternative approaches to the prevention, diagnosis, or treatment of a health condition. In the context of genetic tests, CER is applicable both in the analysis of individual studies and when conducting systematic reviews of a body of evidence. Specifically, after establishing the analytic validity (reliability in clinical laboratory practice) and clinical validity (diagnostic or prognostic accuracy) of a test, CER is the approach for determining how the use of the test impacts health outcomes compared with an appropriate alternative (no testing or a comparison test).

The purpose of this review is to identify opportunities for CER to contribute to making the application of GM to patient care more evidence-based and more patient-centered. Previous studies have proposed a conceptual framework or focused on specific topics such as cancer tumor profiling.1,2,3,4,5,6,7,8 We summarize the findings of systematic reviews of CER of specific GM interventions (“structured review”) and use expert assessment of these findings to help identify evidence gaps (“landscape analysis”). Structured literature reviews map the literature landscape by identifying what evidence is available and assessing the findings with respect to gaps in the evidence,9 whereas landscape analysis provides an overview of a topic by combining a structured literature review with expert input.10

Methods

We abstracted and then summarized information from each included systematic review (Figure 1). A technical working group (TWG) with GM expertise (the members are listed in the Supplementary Materials online) reviewed this information and suggested ways to portray the current landscape of GM. Our research questions were as follows:

  1. What is the evidence from systematic reviews of CER of GM?

    a. What tests, testing indications, comparators, and outcomes have been studied?

    b. What is the impact of GM on patient outcomes (clinical utility)?

    c. Did the systematic reviews use standard methods to evaluate the quality of the studies and what did they conclude?

    d. What did the reviews say about the potential clinical role of the GM interventions?

    e. Did any studies use patient-reported outcomes (e.g., impact on activities of daily living)?

    f. Did the reviews identify gaps in the evidence about patient outcomes? What CER might address those gaps?

  2. When taking into account both the evidence from the systematic reviews and expert assessments, what is the current state of CER of GM and its future?

Figure 1

Study design. We conducted a structured literature review by abstracting information from each included review and then summarizing the results (Table 1; Supplementary Materials online). The reviews, along with interviews and assessments by the Technical Working Group, were used to develop the landscape analysis.

Structured review

We systematically identified, selected, and appraised systematic reviews conducted by technology-assessment groups (TAGs), which we defined as organizations that assess health technologies to support clinical practice guidelines or insurance coverage decisions. We limited ourselves to systematic reviews because they have several desirable characteristics. Most importantly for assessing the quality of the body of evidence (our key objective), they use widely accepted quality-assessment instruments that incorporate measures of the components of study quality. TAGs also choose topics that are important to the public at large. For example, the Blue Cross Blue Shield Association Technology Evaluation Center (BCBSA TEC; now called BCBSA Evidence Street) performs systematic reviews for its independent Medical Advisory Panel, and other organizations11 use its reports; the Agency for Healthcare Research and Quality (AHRQ) uses public input to inform the choice of topics for systematic reviews. Finally, systematic reviews are comprehensive by design, use methods intended to minimize bias, and collectively summarize a vast body of individual studies.

We describe our approach to identifying systematic reviews in Figure 2 and the Supplementary Materials online. We first developed a list of potential TAGs based on our review of the literature and confirmed this list with the TWG. Using inclusion criteria, we narrowed this to 13 key TAGs (Supplementary Materials online). We searched TAG websites using the key words “personalized medicine,” “precision medicine,” or “genomic testing” and by reviewing report titles. We included a systematic review if it summarized multiple studies of CER that addressed GM tests (as of December 2015). We limited our search to reviews from the past 5 years, which is an empirically derived interval after which a systematic review is considered outdated.12

Figure 2

PRISMA diagram. We conducted a structured literature review and identified 348 evidence reviews on the TAG websites. After screening, we included 21 in the study.

We found relevant systematic reviews by BCBSA TEC (9 reviews), AHRQ (10 reviews), and the Cochrane Collaboration (2 reviews). Only two CER reviews were older than 5 years. All 21 performed at least a Medline search, and five performed a gray literature search. All had prespecified inclusion and exclusion criteria. In all the AHRQ and Cochrane reviews, two individuals independently extracted data from individual studies. Fourteen systematic reviews described the standards used to rate the risk of bias (Supplementary Materials online).

Data abstraction

We abstracted variables chosen to reflect study objectives and approaches used to prioritize topics for CER13 and TWG input (Supplementary Materials online). We coded reviews using the test(s) included in the review as the unit of analysis, not each individual study within the review. We categorized the reviews according to clinical testing indications (e.g., cancer tumor profiling tests). Four authors (K.P., P.D., H.S., and M.D.) summarized each review. One author (H.S.) assessed the quality of the methods used in individual studies within reviews by describing the instruments used to assess quality and recording the overall quality ratings of the evidence (Table 1). We used two approaches to assess whether our conclusions were likely to have been different if we had reviewed individual studies rather than TAG reviews. First, we determined whether the TWG reviewed all key conclusions and noted any outdated findings. Second, we compared conclusions from an included TAG review of sequencing tests for prenatal screening with those of recent systematic reviews not included in our study because they were not TAG reviews14,15,16 as well as with those of key individual studies.17,18 We found that these conclusions were similar and would not have substantively changed the findings of our study.

Table 1 Reviews of comparative effectiveness research of genomic medicine (N = 21)

Synthesis of the body of evidence

We first obtained background information by conducting semistructured interviews with each TWG member and four other experts, including one patient advocate (Supplementary Materials online). Later, we asked TWG members to identify related GM tests that had not been subjected to CER and important changes in the evidence since the last study included in the 21 reviews. TWG members also commented on the CER questions. Responses reflected the individual TWG member’s opinion, not a formal process of priority setting.

Results

Table 1 lists the 21 reviews along with their stated objectives and their evidence quality ratings (details in Supplementary Materials online). Every review found methodological shortcomings and little or no evidence about the impact of GM on patient outcomes. Cancer-related tests predominated (tumor profiling n = 9; germ-line testing n = 4), whereas the other eight reviews were divided among five topics. Twelve reviews discussed the importance of patient-centered outcomes, although most noted the limited direct evidence about these outcomes. Two reviews focused on delivery of genomic information to patients (communication of risk information and approaches to risk assessment). Most reviews (81%) explicitly identified comparators, which included no testing, other genomic tests, nongenomic tests, clinical criteria, and phenotype-based risk scores.

Cancer: tumor profiling for cancer diagnosis, prognosis, and/or treatment

Clinical context. Tumor profiling means testing tumor tissue for mutations or for abnormal expression of gene products (gene expression profiling; GEP) that may be driving malignant behavior. Tumor profiling may classify the patient’s probability of recurrent disease or identify treatments that target the molecular mechanism of malignant growth.

Review topics. Review topics include urine-based tests for bladder and prostate cancer; gene expression tests for prostate, breast, and colon cancer; tests for cancers of unknown primary sites; prognostic tests for common cancers; genetic tests for cancer drug metabolism variants; and molecular tests to target cancer treatment.

Review conclusions (N = 9)19,20,21,22,23,24,25,26,27

  • The best evidence for a gene expression profiling (GEP) test shows that OncotypeDX, which estimates the recurrence rate after surgery for early-stage breast cancer, improves predictions based on clinical prognostic factors alone. Studies with a low to moderate risk of bias show that lower-risk OncotypeDX results are associated with lower rates of adjuvant chemotherapy.

  • No published trials have prospectively measured the effect of OncotypeDX testing on clinical outcomes of early-stage breast cancer (clinical utility). A randomized trial (TAILORx)28 of adjuvant chemotherapy versus no chemotherapy is underway involving patients at intermediate risk for recurrence by OncotypeDX testing.

  • In a 2013 review,20 tests for individual mutations (KRAS, ALK, EGFR, BRAF) and expression of multiple mRNA biomarkers (OncotypeDX, MammaPrint) improved prognostication when compared to clinical predictors (clinical validity). No study directly assessed whether these tests change downstream health outcomes, although OncotypeDX results did affect treatment decision making.

  • A 2013 review25 of multigene panels to detect targeted therapy opportunities in advanced cancer found three prospective-retrospective studies using archival tumor samples. One compared outcomes of therapy matched to a single panel biomarker with unmatched therapy, whereas two had no controls.

  • Studies should measure whether genomic tests lead to better clinical outcomes than alternative prognostic methods during different stages of common cancers.

Landscape analysis. The evidence base for using genomic tests to individualize cancer care is small. Good-quality evidence shows an effect on treatment choice in one test–cancer combination (OncotypeDX for recurrence of early-stage breast cancer). Furthermore, the MINDACT29 and TAILORx28 studies showed that the MammaPrint and OncotypeDX GEP panels predicted which early-stage breast cancer patients with low risk scores may be managed with endocrine therapy and avoid chemotherapy.29 The outcomes of these large trials involving breast cancer have been overall survival, disease-free survival, and survival without distant metastases. However, the RxPONDER trial (evaluation of OncotypeDX in node-positive patients) did involve patient advocates in the design of the study and includes quality of life and other patient-reported outcomes.11

Although GEP tests can estimate prognosis (clinical validity), CER should assess their effect on clinical outcomes (clinical utility). CER could also focus on colorectal, lung, skin, and hematologic cancers, for which tests have been developed and could be useful.30 The one review that assessed multiple molecular markers to target treatment found no high-quality evidence and highlighted the methodological complexities of designing unbiased studies to measure clinical utility.25

Cancer risk assessment

Clinical context. Detecting genes that are associated with an increased probability of developing cancer or harboring an undiagnosed cancer may lead to more intense surveillance or early treatment.

Review topics. Review topics include fecal DNA testing for colorectal cancer risk and genomic risk assessment for breast cancer risk.

Review conclusions (N = 4)31,32,33,34

  • Fecal DNA testing can detect colorectal cancer and large adenomas that are likely to become malignant. The only study of a currently marketed test measured sensitivity and specificity (clinical validity), but not the added impact of testing on clinical outcomes.34

  • Short-term patient outcomes, such as distress and accuracy of perceived risk, improve after breast cancer risk assessment.

  • An intact chain of evidence leads from a strong family history of breast cancer to GM testing for BRCA1/2 to better outcomes after prophylactic bilateral mastectomy for BRCA mutation carriers,35,36 but no studies have directly measured the impact of BRCA testing on health outcomes. Existing studies lack real-world settings or diverse at-risk populations.

  • Studies should examine consequences of testing for individuals and families, including acceptability to patients, adherence to screening, delivery of genomic testing, and models to estimate the incremental net benefit of testing and optimal testing intervals.

Landscape analysis. The list of inherited genetically determined variations in cancer risk is growing, as is the demand for genomic approaches to risk assessment. The included systematic reviews found no studies of clinical impact on disease outcomes. Targets for CER include family history and BRCA testing to identify breast/ovarian cancer risk and family histories of colorectal cancer and testing for mutations in the five genes associated with Lynch syndrome.30 The availability of several genomic tests for the same cancer presents opportunities for head-to-head comparisons. In addition to clinical outcomes, these studies should evaluate the acceptability of the tests and follow-up rates after abnormal test results.

Chronic conditions including neurodevelopmental delays

Clinical context. Among children with delayed cognitive development, identifying a gene that is associated with a specific condition has potential benefits, including informed reproductive decision making, a firmer prognosis, access to needed services, avoidance of unnecessary testing, and, possibly, improved health outcomes.

Review topics. Review topics include testing for developmental delay, intellectual disability, and autism spectrum disorder.

Review conclusions (N = 2)37,38

  • Observational, noncomparative studies have measured the yield of chromosomal microarray testing for gene copy number variants (which are more common in developmentally disabled children) and the actions taken by families. The effects of these actions and whether they would have occurred without GM testing are unknown.

  • No study has compared the clinical outcomes of genomic testing for neurodevelopmental disorders with no testing.

  • Comparative studies should assess positive and negative outcomes important to patients and their families, including the impact on reproductive decision making.

Landscape analysis. Chronic conditions have a high health burden and impact family members. Testing could potentially be useful for a large number of chronic conditions, but our selection of reviews addressed only neurodevelopmental delays in children (two reviews). Tests for familial hypercholesterolemia30 offer other opportunities for CER studies of the incremental benefits and harms of genomic testing. Because genes related to chronic diseases typically have low penetrance and high heterogeneity, assessing the impact of genomic testing on health outcomes will most likely require very large populations, suggesting the need for disease registries. Although the protracted course of chronic disease implies the need for long follow-up, the effect of genomic testing could lead relatively quickly to interventions that affect short-term patient-centered outcomes.

Pharmacogenetic testing

Clinical context. The metabolism of some drugs (conversion to an active form or an inactive form) is under genetic control. Mutations in the genes for enzymes that metabolize drugs may lead to too much or too little active drug when physicians prescribe standard doses. A pharmacogenetic test aims to detect a genetic basis for differences in the response to a drug.

Review topics. Review topics include testing for CYP2C19 genetic variants to guide antiplatelet therapy in coronary artery disease (clopidogrel) and testing for CYP2D6 genetic variants to guide tamoxifen therapy for women at high risk for primary breast cancer or recurrence.

Review conclusions (N = 2)39,40

  • A heterogeneous body of evidence regarding the effects of testing for CYP2C19 variants on clinical outcomes is insufficient to show that testing or CYP2C19 status alters cardiovascular event rates. Studies were small, had short-term outcomes, and seldom reported clinical outcomes.

  • Trials of the impact of CYP2D6 testing on breast cancer outcomes have not been performed, probably because the evidence that variants in the CYP2D6 gene affect clinical outcomes is observational, inconsistent, and of only moderate quality.

  • Studies should focus on standardizing testing methods and directly comparing the impact of testing strategies on patient-relevant clinical outcomes in large trials.

Landscape analysis. The two reviews identified only one small trial that measured the impact of pharmacogenetic testing on clinical outcomes. Since the completion of the systematic review, the National Heart, Lung, and Blood Institute and Mayo Clinic have been sponsoring a large pragmatic trial (TAILOR-PCI) to evaluate the use of CYP2C19 genotyping to guide the choice between clopidogrel and ticagrelor. The purpose of this study is to determine whether using pharmacogenetic (PGx) testing to guide antiplatelet therapy improves cardiovascular outcomes following coronary stent placement in patients with impaired activation of clopidogrel.41 There are many other opportunities to conduct CER studies of PGx tests because nearly 200 drug labels contain pharmacogenetic information.42 For example, three high-quality trials published in 2013 showed that, overall, a clinical algorithm was a better approach than PGx testing for dosing either warfarin or acenocoumarol and phenprocoumon and achieving desired states of anticoagulation.43,44,45 PGx testing might impact several conditions (e.g., infectious disease, mental health conditions, Stevens-Johnson syndrome).46 Further opportunities for CER include comparing PGx testing before starting treatment versus prescribing without testing. Testing before treatment requires access to PGx data and clinical decision support at the point of care but overcomes many of the logistical hurdles that have limited clinical integration of PGx testing.47

Prenatal screening

Clinical context. Aneuploidies (an abnormal number of autosomal or sex chromosomes) often lead to developmental abnormalities, the best known of which is Down syndrome (trisomy 21). Until recently, noninvasive tests for aneuploidies were not specific enough to act upon and therefore required tissue sampling for confirmatory karyotyping. Sequencing DNA from maternal serum can detect fragments of fetal DNA, which are present as early as 8–10 weeks of gestation.

Review topics. Review topics include sequencing-based tests for fetal Down syndrome (trisomy 21) and other aneuploidies.

Review conclusions (N = 2)48,49

  • The goals of the studies were to measure sensitivity, specificity, and posttest probabilities, but not to measure the clinical impact of fetal DNA testing versus other noninvasive tests for aneuploidies.

  • Tests for aneuploidies in maternal serum are nearly 100% specific (rare false-positive results) and more sensitive than conventional tests. The studies’ risk of bias was low.

  • The posttest probabilities of aneuploidies after positive and negative test results were better than those of conventional screening tests, thereby providing indirect evidence of better clinical utility for DNA sequencing–based tests for trisomy 21, trisomy 18, and trisomy 13.

  • No studies measured patient-centered outcomes.

  • Future studies should compare all potential screening strategies and prospectively examine outcomes in average-risk populations and screening for other chromosomal abnormalities.
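The link between test accuracy and posttest probability noted in the conclusions above follows from Bayes' theorem. The sketch below is an illustration only: the pretest probability and the sensitivity and specificity values are hypothetical placeholders chosen to show the arithmetic, not figures taken from the included reviews, and the function name `posttest_probability` is our own.

```python
def posttest_probability(pretest, sensitivity, specificity, positive=True):
    """Probability of the condition after a test result, by Bayes' theorem."""
    if positive:
        true_pos = sensitivity * pretest
        false_pos = (1 - specificity) * (1 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * pretest
    true_neg = specificity * (1 - pretest)
    return false_neg / (false_neg + true_neg)


# Hypothetical inputs (assumptions for illustration, not data from the reviews).
pretest = 0.01                      # assumed pretest probability of aneuploidy
dna_test = (0.99, 0.999)            # assumed (sensitivity, specificity)
conventional_test = (0.85, 0.95)    # assumed (sensitivity, specificity)

ppv_dna = posttest_probability(pretest, *dna_test, positive=True)
ppv_conv = posttest_probability(pretest, *conventional_test, positive=True)
neg_dna = posttest_probability(pretest, *dna_test, positive=False)
neg_conv = posttest_probability(pretest, *conventional_test, positive=False)

print(f"After a positive result: DNA-based {ppv_dna:.3f}, conventional {ppv_conv:.3f}")
print(f"After a negative result: DNA-based {neg_dna:.5f}, conventional {neg_conv:.5f}")
```

With these assumed inputs, the higher specificity is what drives the large difference in the probability of aneuploidy after a positive result, which is why fewer confirmatory invasive procedures would be expected.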

Landscape analysis. Prenatal screening for fetal chromosomal abnormalities is considered the standard of care in the United States for women at high risk. Because of its very high specificity, cell-free DNA screening is a strong alternative to testing for maternal serum markers and fetal ultrasound and should result in fewer invasive procedures to perform karyotyping. CER studies could compare parent preferences for further testing after learning the results of DNA testing versus conventional noninvasive testing, explore ethical questions, and study patient, family, and system effects of expanding prenatal testing to average-risk populations.

Population screening for risk assessment

Clinical context. DNA-based tests can detect gene variants that increase the risk of diseases like breast cancer, lung cancer, or diabetes. In some cases, detecting such variants could lead to preventive measures (e.g., bilateral mastectomy for carriers of BRCA mutations, which increase the risk of breast cancer) or changes in risky behaviors.

Review topics. Review topics include effects of communicating DNA-based risk estimates on risk-reducing behaviors.

Review conclusions (N = 1)50

  • Poor-quality evidence suggests that communicating genetic disease risk has little or no impact on smoking or physical exercise.

  • No studies had a low risk of bias or compared genomic test and conventional risk factor counseling versus counseling alone.

  • Larger and better-quality randomized controlled trials should examine patient-centered outcomes, including behavior and unintended adverse effects. Sample sizes should be large enough to detect small effects.
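The recommendation in the list above that sample sizes be large enough to detect small effects can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a minimal sketch under stated assumptions: the behavior-change rates are hypothetical, the function name `n_per_arm` is our own, and the calculation ignores dropout, clustering, and multiple outcomes.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided test of two proportions."""
    if p_control == p_intervention:
        raise ValueError("The two proportions must differ.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # quantile for desired power
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    delta = p_control - p_intervention
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Hypothetical rates of behavior change (assumptions, not data from the review).
n_small_effect = n_per_arm(0.10, 0.12)   # 2-percentage-point difference
n_large_effect = n_per_arm(0.10, 0.20)   # 10-percentage-point difference

print(f"Per-arm sample size, small effect: {n_small_effect}")
print(f"Per-arm sample size, large effect: {n_large_effect}")
```

Because the required sample size grows with the inverse square of the difference to be detected, a trial sized for a large effect will be badly underpowered for the small effects that are plausible for risk communication.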

Landscape analysis. Although risky behaviors are a major health threat, current evidence does not support claims that DNA-based risk assessments motivate behavior change. The key unanswered question is the added effect of genomic risk information on clinical outcomes when combined with and compared to conventional testing, family history, clinical risk factors, and treatment. Such studies require a comparison group that receives conventional care but no genomic testing. Several randomized trials included in the only systematic review that focused on provision of genetic information to reduce risk behaviors (e.g., smoking, diet, physical activity) did meet this study design criterion but were of low quality. The best approach to assessing the impact of genomic testing on risky behaviors such as cigarette smoking may be to study screening for mutations in highly penetrant genes in patients with a target condition for which effective interventions exist. Such studies should examine whether genomic information has an incremental impact on behavior and outcomes by using carefully designed comparators.

Whole-exome sequencing and whole-genome sequencing and testing for rare diseases

Clinical context. Disorders that present with multiple anomalies, often early in life, suggest a mutation in a single gene. These disorders can be difficult to diagnose because of their rarity, nonspecific phenotype, and lack of a well-defined pathway to a diagnosis. Establishing a genetic cause can lead to specific treatment and to establishing the carrier status of family members.

Review topics. Review topics include sequencing for disorders caused by a single gene.

Review conclusions (N = 1)51

  • Sequencing may end “diagnostic odysseys” for patients and inform reproductive decisions, but the only information regarding clinical utility is anecdotal (no systematic study of clinical impact).

  • No studies have examined the broader uses of sequencing tests.

  • Ethical issues, such as the consequences of pursuing secondary findings, require study.

Landscape analysis. Opportunities for CER studies may increase as the indications for whole-exome sequencing (WES) and whole-genome sequencing (WGS) expand. Testing only to diagnose a suspected rare disease will have a small public health impact due to low prevalence and lack of treatments. Evidence of clinical utility is lacking for the broader use of WES and WGS beyond diagnosis of rare diseases. The American College of Medical Genetics and Genomics guidelines on the return of secondary findings from WES and WGS list some potentially important tests for CER studies such as detection of genetic familial hypercholesterolemia.52,53

Discussion

In summary, we found a very limited body of evidence about the effect of using genomic tests on health outcomes and many evidence gaps for CER to address. Like the systematic reviews, we defined clinical utility as improved health outcomes triggered by a test result and mediated by changes in patient behavior and clinical decisions. We also found a lack of evidence for effects on intermediate outcomes (e.g., avoiding unnecessary care, improving access to services, or providing prognostic or predictive information), which hampers efforts to build a chain of evidence from test results through these intermediate outcomes to altered clinical outcomes.

Key implications of our findings for research and policy are presented here.

Address important questions

Two important decisions relevant to CER involve whether to perform a test and what to do with patients at intermediate risk or with uncertain test results. The first decision addresses the question “Is the test result likely to change the treatment plan suggested by my clinical assessment?” A CER study design might compare health-related outcomes after the clinical assessment alone with outcomes after performing the assessment plus a GM test. The second decision is in regard to what to do in ambiguous situations. A study design might randomly assign patients with intermediate-risk test results to receive either low-intensity treatment or high-intensity treatment.

Diversify the evidence base

Most of the reviews focused on cancer; however, many oncology-specific GM tests known to have promising evidence supporting their use in clinical practice were not included in the reviews because of either their scope or their timing. In addition to oncology, active areas of research in neurology, psychiatry, cardiology, and rare disorders suggest opportunities for future studies and evidence syntheses.54 The Centers for Disease Control and Prevention’s Office of Public Health Genomics categorizes genetic tests in three tiers according to whether they have a base of synthesized evidence.30 The agency’s list of tests categorized as “Tier 2” would be attractive targets for future CER studies because they are mentioned in clinical practice guidelines or Food and Drug Administration labeling, but the supporting evidence is insufficient to guide clinical use. Finally, even the best studies focused on how test results affected decision-making by clinicians rather than by patients. Missing are high-quality studies comparing the effect of different models for communicating DNA-based disease risk estimates on patient motivation to take appropriate actions.

Use established methods to improve the evidence base

We found only a few randomized trials designed to assess the impact of GM testing on patient outcomes. However, observational studies (cohort, prospective-retrospective, single arm) and use of indirect evidence (modeling) can also assess clinical utility, albeit with less certain results. Studies to establish analytic and clinical validity should precede studies to measure clinical utility to ensure that study participants are exposed to accurate and reliable tests that could change patient care. Many of the included tests had good evidence regarding analytic validity but lacked a chain of evidence leading from test results to clinical outcomes. Several publications have provided methodological guidance for studying genomic tests.55,56,57 GM has some characteristics that may require more complex analyses than other types of interventions (e.g., analyzing the impact of inherited mutations on family members), although conventional study designs can still be used. Using established methods could reduce concerns that GM is subject to higher evidence standards than other interventions.

Include outcomes that matter to patients

For genomic tests to become routine practice, using them should favorably impact outcomes that matter to patients. To determine whether GM is fulfilling its promise, patient-centered CER should compare GM tests plus non-GM risk assessment versus non-GM risk assessment alone, other GM tests, or both. Research team leaders should use real-world settings and seek advice from patients about which outcomes matter most.58

Use consistent and unbiased study methods

In the future, the evidence for individualized patient care will increasingly come from statistically reliable subgroup analyses derived from patient-level meta-analyses. These meta-analyses require many large, high-quality studies. Some evidence gaps in the systematic reviews were due to poor-quality research that could not support any conclusions. Moreover, the studies used diverse study designs, tests, study populations, and outcome measures, which would make it difficult to draw conclusions from a pooled study population. To raise the standard of evidence, GM researchers should cooperate to establish study quality standards.

Keep pace with technology changes

As GM evolves beyond single-gene tests, CER studies must keep pace. Only two of our systematic reviews covered multigene tests, WES, and WGS. Multigene tests are complex and evolving. A recent National Academy of Medicine report noted the obstacles and recommended several approaches to address them.59 Tumor-agnostic, biomarker-driven clinical trials such as NCI-MATCH, SWOG’s LUNG-MAP, and the American Society of Clinical Oncology’s TAPUR are evidence of progress. Recent publications about WES in rare diseases, cancer, and complex disorders suggest the need to address current evidence gaps about personal and clinical utility soon.53,60

Our review suggests opportunities to close important evidence gaps about using genomic tests; however, additional methods-development work is needed. Methodological topics that need ongoing stakeholder-informed dialogue include the following.

Patient preferences

The purpose of CER is to design and conduct studies that meet the information needs of patients, clinicians, and policymakers facing decisions. Engaging these stakeholders as partners in all phases of the research process should ensure that future CER meets their information needs. Meaningful engagement of patients in GM CER will require attention to numeracy and genetic literacy and the willingness to engage in shared decision-making about performing genomic tests and interpreting their results. For example, noninvasive prenatal screening tests for fetal chromosomal abnormalities (i.e., trisomy 21, 18, and 13) have become standard clinical practice on the basis of their clinical validity and private insurer coverage.48 However, questions remain about whether clinicians and patients fully understand the risks and benefits of using these tests, which also detect sex chromosome aneuploidies and microdeletions, as compared to alternative screening methods that do not detect them. Also, we lack research about how patients and clinicians make decisions after positive test results and about between-partner concordance about pregnancy termination decisions in patient subgroups stratified by maternal risk, race, ethnicity, or socioeconomic status.61,62 Another issue is that outcomes of WES and WGS may lead to gains in well-being that go beyond the impact on morbidity and mortality (e.g., reassurance or providing a diagnosis for an untreatable condition).51 GM tests may also cause harm. CER questions to address might include whether WES tests increase personal well-being beyond conventional testing and genetic counseling and under what circumstances. The answers to these questions might inform the debate about whether to consider personal utility when developing practice guidelines and coverage policies.

Behavior change strategies

Changing risky behaviors is difficult, and the motivation to change is part of the problem.50 Of particular relevance to common, chronic conditions is whether current risk assessment approaches such as obtaining family histories are sufficient or whether taking the next step of performing genetic testing leads to better health outcomes.38 In the field of cardiovascular disease, CER questions involve the following: the net benefits to patients, families, and society of adding genetic information on cardiovascular disease to conventional risk assessment approaches, such as family risk history and clinical risk scores; and whether providing genetic information in addition to information about smoking, hypertension, and hypercholesterolemia leads to the adoption of a healthier lifestyle or more comprehensive preventive therapy and, ultimately, better cardiac outcomes. Trials to date have shown little or no effect on health-risk behaviors.

Value

With decreasing costs of next-generation sequencing, proponents of multigene panels have argued that it is more efficient to test for multiple mutations simultaneously rather than to conduct multiple single-gene tests. However, high-quality evidence to support this argument is lacking.63 Advocates for using multigene panels for tumor profiling argue that future treatment advances depend on an understanding of tumor biology gained through tumor biomarker panels, not conventional classifications based on tumor histology, grade, and stage.64 Key questions include the optimal gene panel size, how to select treatments based on presumed driver mutations, and the role of tumor genome heterogeneity.25,65,66 CER could address the key question of how often patients are matched to potential treatments using large, tumor-agnostic multigene panels and how the resulting clinical outcomes compare with those achieved using tumor-specific single-gene testing and clinical factors alone.

Our study had limitations. We examined systematic reviews rather than individual CER studies, and we selected systematic reviews commissioned or conducted by TAGs. Thus, our results do not represent the entire universe of CER studies of GM. This strategy did mean that experienced teams of systematic reviewers performed the included reviews using standardized definitions of study quality, which was important to one of our principal goals: to form a credible judgment regarding the quality of CER for GM. Although the TWG found the conclusions of the reviews to be representative of the broader literature, we did not systematically search for recent articles on these topics. Although we included a range of TAGs in our search, 2 of them (AHRQ and BCBSA TEC) produced all but 2 of the 21 included systematic reviews and therefore provide a predominantly United States–focused perspective. Finally, to address our study objectives, we had to categorize and summarize across disparate reviews and augment the review findings with expert opinions. Therefore, readers should not consider our results definitive. Still, the included systematic reviews cover most of the spectrum of clinical topics addressed by GM. Future research should examine the cost-effectiveness and budget impact of GM tests and how best to align the evidence with the current use of these tests.

In conclusion, our findings can inform decisions about where to focus future research and policy initiatives. Over the next few decades, patients, clinicians, and policymakers will be asking whether the added information provided by GM leads to better health outcomes than using conventional clinical information, such as family history and non-GM tests. CER is the research design for answering these questions.

Disclosure

This study was funded in part by a National Human Genome Research Institute grant (R01HG007063 to K.A.P.) and a National Cancer Institute grant to the UCSF Helen Diller Family Comprehensive Cancer Center (5P30CA082013). In addition, research reported in this publication was partially funded through PCORI contracts (to K.A.P., P.A.D., S.R.T., and L.A.O.). H.C.S. is an employee of PCORI. The other authors declare no conflict of interest.