Parsing Volume to Value, Proxy Measures, and the Streetlight Effect

Despite some concern that the migration from fee-for-service to value-based payment (VBP) is being reversed, there remains strong momentum for VBP, both nationally (in the form of the bipartisan 21st Century Cures Act, passed and signed into law last December) and in many state and commercial initiatives, including the one I’m personally involved with, the New York Delivery System Reform Incentive Payment Program (DSRIP). Defining value, of course, is not easy. I’ve often returned to Michael Porter’s short essay on this topic when I feel my definition meandering. Go read it now. Please.

Ok, you’re back? Cool. That was good, eh? I love the last paragraph:

The failure to prioritize value improvement in health care delivery and to measure value has slowed innovation, led to ill-advised cost containment, and encouraged micromanagement of physicians’ practices, which imposes substantial costs of its own. Measuring value will also permit reform of the reimbursement system so that it rewards value by providing bundled payments covering the full care cycle or, for chronic conditions, covering periods of a year or more. Aligning reimbursement with value in this way rewards providers for efficiency in achieving good outcomes while creating accountability for substandard care.

As CEO of Alliance for Better Health, I’ve been working with care delivery organizations in our community to navigate the path forward. They clearly have a foot in each of two canoes: the majority of their reimbursement continues to come from traditional sources with a traditional structure (more patients seen = more money), and then, from the edges, they have people like me telling them that the future is something different. It’s new, and it will pay them to do something they would like to do, but they’re not quite sure how to do it, and, yes, some fear accountability.

Walk before we run, or jump right into the deep end? How do we traverse this gap between where we are and where we would like to be? One framework says that there is no traverse at all: we need to leapfrog to tomorrow and start from scratch. Iora Health is one such model. Care providers are focused on personalized, proactive care. The practice is led by health coaches, nurses, physicians, and administrators working together as teams to maximize health for the communities they serve. Reduced cost is a byproduct of great care, not a target in itself. The office workflow is different from a traditional practice, the architecture is different, the hiring practices are different, the EHR is different. This model steps out of the old canoe and into the new one. For those with the guts, it’s a great model. For the rest, a slower path may work better.

Of the slower paths, there are a handful of options, and many of them are complementary rather than mutually exclusive. Accountable Care Organizations represent a compelling alternative to the Iora-style leapfrog. By offering a migration path with increasing levels of shared risk, an ACO can coalesce a community of providers, collaborate with the federal government or commercial payers to standardize care for the better, and improve health outcomes. There are many models of ACOs, but I would argue that a common thread for the successful ones is that they have maintained laser focus on two guiding principles: a) success will attract the right partners; b) great primary care is the keystone of an ACO.

Let’s parse this for a moment: why do I say that success will attract the right partners? There is a misconception that one should start with the creation of a large ACO. Growing the number of care delivery organizations will grow the number of “accountable lives” (people) and therefore, if one follows the “bigger is better” hypothesis, one can take advantage of the scale to reduce overall risk and create a more powerful negotiating lever with the payers. While seductive, this hypothesis is flawed. A big network is hard to manage, and an ACO will be forever “herding cats” if it starts too big. It won’t see shared savings, and it won’t be able to meaningfully accept risk, because it can’t be confident that it will perform well.

An alternative model, and one that has been followed by all successful ACOs (which, of course, includes my friends at Aledade), is to start small. For the first turn of the ACO wheel in a community, focus on a small group of providers who are “all-in.” They are fully engaged and dedicated to the success of the program. When successful, this attracts others to the program like moths to a light bulb. The ACO can then attract great partners (great primary care providers) rather than working hard to corral everyone and then re-educate them to the new ways. The difference, of course, is “pull” vs. “push.” “Pull” usually works, and if it doesn’t, it wasn’t meant to. “Push” never does. We call this Motivational Leadership™. (More on this in another blog post.)

DSRIP Performance. Many states have DSRIP programs, and it’s beyond the scope of my essay today to explain what DSRIP is, or what exactly New York’s variant represents. Today, our focus is on DSRIP performance. Click on the image over there (Alliance Performance Measures) for a snapshot of what I mean. Each line is a measure, and our performance against each measure will determine a payment from the New York Department of Health. The program (more than) pays for itself: with improved health of a population, unnecessary acute care services are prevented. Healthier people, better care experience, lower cost. In that order. One challenge that we have is that the dependent variable here is our community’s performance, yet we won’t know what that is for 6-12 months, which gets us to the heart of our story today: proxy measures and why we need them.
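To make that measure-to-payment mechanic a bit more concrete, here is a minimal, purely illustrative sketch in Python. The measure names, baselines, targets, observed values, and dollar figures are all invented; the real DOH valuation methodology is considerably more involved than “pay in proportion to the gap closed.”

```python
# Illustrative only: hypothetical measures, targets, and dollar values.
# The actual DOH valuation methodology is more involved than this.

measures = [
    # (name, baseline, target, observed, dollars at stake)
    ("Potentially preventable ED visits per 100 members", 42.0, 38.0, 39.5, 250_000),
    ("Follow-up after hospitalization within 30 days (%)", 55.0, 65.0, 63.0, 150_000),
]

def earned_payment(baseline, target, observed, dollars):
    """Pay in proportion to the gap closed between baseline and target."""
    gap = target - baseline
    if gap == 0:
        return dollars
    progress = (observed - baseline) / gap  # works whether lower or higher is better
    return dollars * max(0.0, min(1.0, progress))

total = sum(earned_payment(b, t, o, d) for _, b, t, o, d in measures)
print(f"Earned this cycle: ${total:,.0f}")
```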

  1. Problem to solve: we want to pay our community for performance against DSRIP goals. Most of these goals, of course, are measures. We call them outcome measures, but internally we know that most of them are process measures. That’s ok. It’s all a continuum. We’re not going to measure life expectancy (we don’t have 50 years), so we’ll have to draw the line somewhere, and preventable ED visits (and the 38 other measures you can see by clicking on that thumbnail above) may be just fine.
  2. Hurdle to leap: DSRIP funds have too long a payment lag. Telling a CBO or small practice or a hospital CFO that I’ll pay you Tuesday for a hamburger today (I’ll pay you in 2019 for preventing ED visits now) just won’t work. It’s too far off. I can’t train my dog to sit by rewarding him an hour later. I need to tie the positive reinforcement to the act that I’m reinforcing.
  3. Opportunity: we’ve created an incentive program in which we have committed to distribute funds (which we have in the bank) in advance of performance. Up to 30% of the funds that could be earned this year will be distributed quarterly (up to 7.5% per quarter) for near-term performance. (There’s a small sketch of this arithmetic after this list.)
  4. You are now asking the right (next) question: how will you know what near-term performance looks like? Aaahh ... yes! We will need to measure performance! In some cases (preventable ED visits) we will do our best to mirror DOH methods with the data that we have available from claims, from clinical data feeds, and from other sources. Of course, relying on “the data we have available” is a classic quality measurement challenge: the so-called streetlight effect. We’ll avoid that as much as possible by using proxy measures.
  5. Proxy Measures are therefore a big topic of conversation in these parts. What’s a good one? What’s not? We want to let the community do some of this work, as thinking about how to measure value is a great exercise for them as they transition to value-based payment. We don’t need them to make these perfect! That’s what I think is the elegance of this model. Worst case: they make easy proxy measures that look like success, get 30% up front, miserably fail on the “real” measures from DOH, and we get $0 at the end. This is fine. We will have tried and they will have “cheated” us for 30%. But we have the 30% this year to cover our experiment because of the evolution of New York DOH’s DSRIP program: this year, we still get some funding to support “pay for reporting.” Next year, we shift to nearly 100% “pay for performance” and $0 for “pay for reporting.” By allowing for this evolution, we encourage providers to experiment with proxy measures, allow them to be imperfect, all while pulling (not pushing!) forward into value-based payment. It’s unlikely that they’ll fail miserably and “cheat” us. Much more likely is that this is enough to cause them to work really hard for true success. The 30% is then just a pre-payment, and they’ll get the 70% next year when it flows from DOH for our extraordinary performance.

What’s an example of a proxy measure? Ideally, a proxy measure is a perfect reflection of the “real” measure we’re aiming to satisfy. So if we want to reduce preventable Emergency Department visits, and our performance measure will be “% annual reduction in preventable ED visits,” then a monthly (weekly? daily?) measure of this would be optimal. Indeed, if we had rapid insight, we could intervene. This is where quality measurement, if performed in real time, actually becomes decision support. (This is a topic for another day ...) So here’s an example of a less obvious but perfectly reasonable proxy measure: if we accept the hypothesis that preventable ED visits are a given percentage of all ED visits, and the hypothesis that ED visits resulting in hospital admissions are less likely to be preventable ED visits (they represent conditions that merit a hospital admission), then if the proportion of ED visits that result in hospital admissions grows, one might conclude that the number/proportion of preventable (unnecessary) visits fell. Long-term, this would be a terrible performance measure, since it may cause the ED staff to feel pressure to admit more patients. But as a proxy for a reduced number of preventable ED visits, I think it does a nice job (there’s a small sketch of this below, too). Do you agree? Disagree?
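Here is the promised sketch of the advance-payment arithmetic from item 3. The 30% annual and 7.5% quarterly caps come straight from the description above; the annual dollar value and the quarterly proxy scores are invented for illustration, and the real distribution methodology is ours to design (and is more nuanced than this).

```python
# Sketch of the advance-distribution arithmetic described in item 3.
# The annual_value figure and quarterly scores are hypothetical;
# the 30% annual / 7.5% quarterly caps are from the text above.

annual_value = 1_000_000               # hypothetical dollars earnable by a partner this year
advance_cap = 0.30 * annual_value      # at most 30% distributed in advance
quarterly_cap = 0.075 * annual_value   # at most 7.5% per quarter

# Quarterly proxy-measure scores in [0, 1]; each quarter pays out
# up to the quarterly cap, never exceeding the annual advance cap.
quarterly_scores = [0.9, 0.6, 1.0, 0.8]

advanced = 0.0
for q, score in enumerate(quarterly_scores, start=1):
    payment = min(score * quarterly_cap, advance_cap - advanced)
    advanced += payment
    print(f"Q{q}: advance ${payment:,.0f} (cumulative ${advanced:,.0f})")

# If DOH later confirms performance, the rest flows next year;
# the advance is effectively a pre-payment against that total.
print(f"Remaining if performance is confirmed: ${annual_value - advanced:,.0f}")
```

The key design point, as described above, is that the advance is capped, so the worst case is bounded at the 30% we have already set aside.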
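And here is a small sketch of the admission-proportion proxy from item 5. The visit counts are made up; the point is only the reasoning: if ED visits that end in admission are rarely the preventable ones, then a rising admission share is consistent with a shrinking pool of preventable visits.

```python
# Illustrative sketch of the admission-proportion proxy from item 5.
# Counts are invented. The reasoning: if ED visits that end in admission
# are rarely preventable, a rising admission share suggests that
# preventable (unnecessary) visits are falling.

periods = [
    # (label, total ED visits, visits resulting in admission)
    ("baseline month", 1200, 240),
    ("this month",     1050, 236),
]

for label, visits, admits in periods:
    print(f"{label}: {visits} ED visits, admission share {admits / visits:.1%}")

baseline_share = periods[0][2] / periods[0][1]
current_share = periods[-1][2] / periods[-1][1]

if current_share > baseline_share:
    print("Admission share rose: consistent with fewer preventable ED visits.")
else:
    print("Admission share flat or falling: no signal of fewer preventable visits.")
```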

You can play too, if you like. Here is an editable spreadsheet with all 39 of our measures. Add/edit columns with your ideas for proxies! You can also see much of the baseline data for the DSRIP performance measures (and others) by poking around here.