Thought Exercises on Accountability and Performance Measures at the National Heart, Lung, and Blood Institute (NHLBI): An Invited Commentary for Circulation Research

Michael S. Lauer, MD
From the Office of the Director, Division of Cardiovascular Sciences, NHLBI, Bethesda, MD.

Originally published 18 Feb 2011. https://doi.org/10.1161/RES.0b013e3182125662. Circulation Research. 2011;108:405–409.

Recently, popular magazines and newspapers broadcast skeptical headlines, such as “Desperately Seeking Cures”1; “A Decade Later, Genetic Map Yields Few Cures”2; “Faltering Cancer Trials”3; and “Grant System Leads Cancer Researchers to Play It Safe.”4 Some patients, physicians, advocacy groups, journalists, scholars, and policymakers openly wonder about the value of billions of dollars in government-supported research. During the past 10 years, the NIH budget has doubled, yet cardiovascular disease, while decreasing in incidence and severity, is still the leading cause of death, and cancer incidence and death rates have declined little.
Along with “Where Are the Cures?” critics ask “Who Is Accountable?”5

Interest in Measuring the Performance of Science

There is a rapidly growing body of scholarship on the metrics of science: methods by which scientists, scientific organizations, policymakers, and funding agencies can gauge the value and impact of their work and investments.6 To assess the impact of the 2009 American Recovery and Reinvestment Act (ARRA), the National Science Foundation, the National Institutes of Health, and the White House Executive Office are engaged in an ambitious “STAR METRICS” project.7 Metrics often focus on publications and citations (“bibliometrics”) but also consider commercial products and quantitative impacts on practice or intellectual thought. Although scientists are typically eager to publish their work and see it cited, many worry about seeing their value to society, employers, and funders graded by black-box numbers.8 Some professional groups have harshly criticized bibliometrics, noting their inherent flaws and the risk that the act of measuring science will damage innovation, which is at the heart of the scientific enterprise.9

At NHLBI, we increasingly recognize that we need to be held accountable for our performance, which means that it is incumbent on us to assess the performance of the scientists and projects we choose to support. It is a genuine challenge, however, for any nonprofit organization to assess its performance, because we cannot look to readily available metrics such as revenues, profits, and stock prices. As in any business, it is important to align investments with expected returns. U.S. taxpayers are investing more than $30 billion a year in the NIH and expect more concrete measures than advancement in biomedical knowledge and a fiscal accounting of funds allocated to research.
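The “black-box numbers” that worry scientists are typically simple aggregates of publication and citation counts. As an illustrative sketch only, using the standard definition of the h-index (described later in Table 2) and entirely hypothetical citation data, such metrics can be computed as follows:

```python
# Toy bibliometrics sketch: hypothetical data, standard definitions.

def h_index(citations):
    """Largest h such that h papers have each been cited at least h times."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank  # this paper still has at least `rank` citations
        else:
            break
    return h

# Hypothetical citation counts for one investigator's 8 publications
citations = [52, 30, 18, 12, 9, 4, 2, 0]

papers = len(citations)
total = sum(citations)
print(f"Publications:          {papers}")
print(f"Citations:             {total}")
print(f"Citations/publication: {total / papers:.1f}")
print(f"h-index:               {h_index(citations)}")
```

The ease of such a computation is part of the concern: this hypothetical record collapses to an h-index of 5, a number that says nothing about whether the work changed practice or scientific thought.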
Many scientists whose career trajectories are largely dependent on securing NIH funding may be frustrated by critics who hold them responsible for high population burdens of disease, along with the inability of a fragmented healthcare system to exploit prior research discoveries. How can we manage expectations and encourage investments appropriate to desired outcomes?

A Thought Exercise About Unemployment and Jobs Training

Imagine that instead of funding research in cardiovascular biology and disease, we at NHLBI were asked to establish and run a jobs-training program in an impoverished city, a city that has seen its manufacturing base disappear and has not yet realized the benefits of the new service- and technology-oriented economy. Our ultimate “end product” would be a city bustling with full employment, with high levels of personal income and socioeconomic well-being. As a community, we’ll need to bring together numerous partners and stakeholders, including legislators, government executives and agencies, employers, labor unions, and schools and universities. We’ll need to develop a multipronged strategy to deal with short- and long-term factors that contribute to unemployment; one part of that strategy may be to establish a jobs-training program, the program for which we now find ourselves responsible.

Should our jobs-training program be held solely responsible for the city’s unemployment rate? Somehow, that does not seem right, as there are many factors contributing to unemployment that go well beyond what we can possibly address with the resources available to us. However, there are performance measures we could “accept” as reasonable reflections of the quality of our work. These might include the number of unemployed workers we train and the number of courses we offer. But these measures describe only the resources we have put in place; they do not tell us whether we provided the right training to the right people.
We could look for other measures of impact, such as how many (and what proportion) of our trainees secured employment, how many (and what proportion) secured high-paying jobs, and how many employers choose to come to us to find future employees. If we focus on these measures, we can demonstrate our value to the community and, equally important, make informed decisions about changes that will improve our efficiency and effectiveness.

Results-Based Accountability

We have just applied an approach called Results-Based Accountability (RBA), which is “a disciplined way of thinking and taking action that can be used to improve the quality of life in communities, cities, counties, states, and nations. Results-based accountability can also be used to improve the performance of programs, agencies, and service systems.”10 During the past few years, a number of NIH leaders have learned about RBA as part of a formal NIH senior leadership program run in conjunction with the University of Maryland School of Public Policy. I summarize how we applied RBA to unemployment and job training in Table 1.

Table 1. How Results-Based Accountability Applies to Two Different Types of Government Programs

| Unit of accountability/strategy | City Jobs Program | Research Funding Agency |
|---|---|---|
| Population (for communities, states, nation) | City residents and external businesses that may be attracted to relocate | U.S. and global population |
| Results | Full employment, economic well-being | Nation free of cardiovascular disease |
| Indicators | Unemployment rate; annual personal income; job satisfaction ratings | Cardiovascular mortality rate; myocardial infarction incidence and case fatality |
| Strategies | Identify causes of unemployment (manufacturing shifts, tax policies, inadequate education); identify new opportunities for employment; identify partners (employers, state government, labor unions, schools and universities); establish programs (eg, job-training program) | Identify causes (biology of cardiovascular disease, population behaviors, economic and social policies); identify scientific opportunities for investigation; identify partners (researchers, policymakers, physicians, schools and universities); establish programs (eg, government funding of researchers) |
| Performance (for programs/agencies): identify customers | Unemployed workers; potential employers | Applicants/potential investigators; universities, along with their applicants and grantees |
| Identify services | Provide training of new skills | Enable applicants to navigate the system by establishing fair competition, answering questions, and providing advice; enable grantees to perform research by giving funds, clarifying rules, monitoring progress, and allowing flexibility as appropriate |
| Identify metrics (over time; Figure 1): How much? | Courses offered (eg, computer programming, basics of laboratory technique); number of students enrolled | Applications received and reviewed; grants funded; research funds disbursed |
| How well? | Faculty hired and retained; students who completed courses and passed exams | Grant reviews processed and completed according to schedule; funds disbursed according to schedule; progress reports received, reviewed, and acted upon according to schedule |
| What impact? | Number or proportion of students who secured jobs; average income of students who secured jobs | Applicant and grantee queries responded to in a timely and effective manner; clinical trials that recruit subjects and complete processes on time and within planned budgets; publications (including number, cost per publication, appearance in high-impact journals); citations (including number, cost per citation, appearance in major systematic reviews or guidelines, appearance in high-impact journals, h-index); commercial (including patents, products readied for clinical testing, cost per patent); economic (jobs created, additional business activity stimulated); practice or policy (including changes to guidelines or law, changes in evidence-based prescription practices) |
| Tell the stories behind the metrics | Are there measurement artifacts that convey misleading impressions? Were the appropriate skills taught? Were students correctly matched to courses? Were the right faculty hired? Did we adequately research employer needs? | Are there measurement artifacts that convey misleading impressions? Is peer review identifying the best scientists and best projects? Is competition from industry, foundations, and agencies from other countries crowding us out? Are budgets appropriate and adequately flexible? Do they account for dynamic changes in technologies and prices? Are powerful economic forces keeping researchers and/or patients out of government-sponsored projects? Are economic forces preventing adoption and dissemination of evidence-based technologies? |
| Identify partners | Local employers, educators, experts in program evaluation | Academic leaders, professional societies, journal editors, other government agencies, industry researchers, research leaders, experts in program evaluation |
| Strategies and action plan | Modify and improve metrics; change hiring criteria for faculty; adopt technologies to improve teaching efficiency; systematize feedback with employers to enable continuous change and improvement | Modify and improve metrics; modify peer review criteria; issue new program announcements (sometimes with dedicated review) or initiatives; convene workshops and working groups; create joint programs with other agencies; establish special advantages (eg, for early-stage investigators) |

Emphasis is on performance accountability. The table provides examples but is not meant to be comprehensive.

In RBA, we start by thinking about population ends, in this case an economically successful city operating at full employment. We then move to the role of our jobs-training program and come to realize that we should focus on means, in this case training unemployed workers and enabling them to secure new jobs. We identify measurable and meaningful performance metrics, such as the proportion of our trainees who secure jobs. RBA is a helpful construct because it forces us to separate population ends from program performance.

In Table 1 and Figure 1, I summarize a similar thought exercise about how we, a Federal funding agency that supports biomedical researchers, can use the RBA construct to evaluate our performance. We proceed through a series of steps as follows:

1) Identify our customers and articulate the services we provide (eg, provide funds to applicants who submit highly meritorious proposals).
2) Identify a relatively small number of “headline” performance metrics. Some metrics focus on process (eg, accruing patients to trials, completing application reviews on time, answering queries from applicants and grantees), whereas others focus on impact (eg, publications, citations, ancillary studies, new commercial products, changes in clinical practice).
3) For each metric, plot changes over time and forecast what is likely to happen if we make no changes in our processes.
4) “Tell the story” behind the metric. We ask a series of questions to identify factors that have improved or worsened performance.
5) Based on our story, identify partners and possible strategies to “turn the curve,” that is, attempt to change adverse trends.

Figure 1. How to use performance metrics in results-based accountability. The X-axis represents time, whereas the Y-axis represents a quantitative metric, which could measure process or impact. After plotting the data, along with a forecast of what we think will happen in the absence of any change, we seek to “tell the story” behind the numbers by asking questions that identify contributing and restricting factors. Armed with the data and the story, we find partners and develop strategies that will “turn the curve,” producing better outcomes. Adapted from reference 10.

Performance Metrics for Biomedical Research

To identify what performance metrics might be useful, we can do another thought experiment. Suppose we were an outright failure; that is, we funded universities and researchers who did absolutely nothing scientific with our money. How would we know? We would find no publications that could be linked to our grants. With no publications, there would be no citations. We might also find no evidence of new products being made ready for clinical testing and eventual commercialization (ie, no patents).
We would see no trials going anywhere near completion, resulting in no publications, citations, or guidelines citing our research for recommendations on changes in practice.

This worst-case scenario is easy for all of us to assess. In the real world, it is more difficult to assess whether a given level of productivity is appropriate in quantity (numbers of publications or inventions), quality (impact on the field of science or practice of medicine), and timeliness (how long it takes for work to “pay off”). During the past few years, there has been extraordinary interest in bibliometrics, science performance metrics based on publications and citations. I summarize some commonly used measures in Table 2.6 Some scholars have suggested that bibliometrics are powerful predictors of future scientific success and recognition11 and that they can be used to assess teamwork in science.12 Others express “deep concern” about the misapplication of metrics for purposes beyond those for which they were created13 and about the misuse of citation streams to support false beliefs.14 Remarkably, an almost identical conversation is taking place about education, in which there is an increasing emphasis on accountability of schools at all levels, along with concern that the items that can be measured may not reflect value and may actually give schools, including universities, an incentive to reward activities that improve test performance at the expense of developing critical thinking and creativity in students.15

Table 2. Overview of Commonly Used Bibliometrics (adapted from reference 6)

| Metric | Definition(s) | Comments |
|---|---|---|
| Publications | Number in peer-reviewed journals; number of original research articles (ie, excluding reviews, commentaries, and editorials); number in high-impact journals | Some publications may be missed because authors do not include the grant number in their manuscripts. Incentives for “salami slicing.” Some consider focus on journal impact (as assessed by impact factor) to be a “mortal sin.” Along with citations, can be used to measure the extent and impact of scientific teamwork. Difficult to assess the relative contributions of multiple coauthors, and of specific grants and contracts for publications that credit multiple grants and contracts. Difficult to compare across fields. |
| Citations | Total number; number in peer-reviewed journals; number per publication; number weighted for impact factor of citing journal, for number of coauthors, or for scientific field | Highly correlated with scientific field, career stage, and personal reputation. May spread misinformation, reflecting network phenomena rather than genuine scientific impact.14 May not necessarily reflect positive impact on the field. |
| Impact factor | For a journal, the impact factor for year y is the number of citations appearing in year y to articles published in years y−1 and y−2, divided by the total number of “citable articles” published in years y−1 and y−2 | Not a valid measure of individual researchers or an individual article. Even for high-impact journals, most citations come from a small proportion of articles. Difficult to compare across fields. |
| H-index | For a researcher (or laboratory, department, or any unit), the h-index is the number of publications that have been cited at least h times (eg, an h-index of 40 means a researcher has published at least 40 articles that have each been cited at least 40 times) | Measures the impact of a large body of work. Correlates with other measures of scientific recognition, like Nobel prizes. Can only increase with age, meaning it is unable to detect declining impact. Difficult to compare across fields. |
| Online accesses | Number of visits; number of downloads | Assesses impact beyond other scientists who publish articles. Standards not yet developed. |

An Incomplete Example

Using an internal NIH-based search tool, I used the keyword “myocard*” to identify 1267 R01 grants that were awarded between 1990 and 2009 through the NHLBI cardiovascular research division and that were classified as “nonhuman.” To date, these grants and their successful competitive renewals account for $2.2 billion in funding. They have led to at least 15 656 publications that have garnered 448 830 citations (approximately 29 citations per publication). The average cost per publication is $143 276, and the average cost per citation is $4997. These figures compare favorably with national estimates of cost per publication for all academic-based research, which has increased from $186 567 in 1998 to $308 641 in 2008.16

Figure 2 shows the behavior of selected process and impact metrics over time. There has been a marked increase in the number of projects funded and in total funding (even after accounting for inflation17), coincident in part with the NIH budget doubling in the late 1990s and early 2000s. To allow for fair comparisons, the output metrics related to publications and citations are shown for each project from its start until 5 years later. The number of publications increased commensurate with the increase in funding, whereas the cost per publication remained remarkably constant. However, the number of citations has not kept pace, with a decrease in the number of citations per publication and an increase in cost per citation.

Figure 2. An example showing the behavior of selected performance metrics for a portfolio of nonhuman R01 grants that responded to a keyword of “myocard*” and that were administered by the NHLBI cardiovascular research division.
Process measures (projects and funding) are shown for projects starting between 1990 and 2009. Impact measures (publications, citations, and costs) are shown only for projects starting between 1990 and 2004, to allow all projects the opportunity to garner citations up to 5 years after their start. To assure fair comparisons, we consider only publications and citations occurring within 5 years of project initiation. All dollar figures are adjusted for the Biomedical Research and Development Price Index (BRDPI).17

Our next step, which is critical, is to ask a series of questions that will reveal the story behind these data. That exercise is beyond the scope of this article, but we might start by asking why the citation rate is going down. Is it because the field is losing luster? Or, perhaps, is it because we funded a number of relatively new investigators during the doubling, whose projects have not yet matured to the point at which they can garner many citations?

Closing Thoughts

As stewards of taxpayer monies, we at NHLBI expect to be held accountable for our decisions and policies. No differently than scientists, hospital administrators, and for-profit company executives, we look to performance metrics as tools, nothing more and nothing less, to help us understand the stories behind the effectiveness of our work and to make the best-informed decisions. We look to colleagues in NIH, in the extramural scientific community, in medical editing, and in professional societies to work with us as we develop greater sophistication in program evaluation, ultimately to reach the goal of a society free of the ravages of cardiovascular disease.

Footnotes

The opinions expressed in this NHLBI Page are not necessarily those of the editors or of the American Heart Association.

Correspondence to Michael S. Lauer, MD, Office of the Director, Division of Cardiovascular Sciences, NHLBI, 6701 Rockledge Drive, Room 8128, Bethesda, MD 20892. E-mail [email protected]