Most Organizations Select a Coding Vendor Carefully and Then Stop Managing Them
The RFP goes out. References get checked. A pilot runs for 30 days. Leadership signs the contract and moves on. Six months later, denials are creeping up and nobody can say exactly why. Chart count reports arrive on schedule, but they tell you almost nothing about whether money is being left on the table or whether a payer audit is quietly building in the background.
This is the standard pattern, and it costs real money. A coding accuracy rate that drifts from 95 percent to 91 percent sounds minor. Across a high-volume outpatient facility or a busy multispecialty group, that gap can translate to six figures in annual revenue leakage, rework cost, and denial write-offs, before you factor in any downstream audit exposure.
A coding vendor scorecard does not fix a bad vendor. What it does is make drift visible early enough to act on it, and it gives your vendor a clear, contractual definition of what good performance looks like rather than leaving them to define it themselves.
Why Price-Per-Chart Is the Wrong Management Metric
Price per chart gets the most attention during vendor selection and almost none after the contract starts. That is a problem because it optimizes for the wrong behavior. A coder paid by volume has a structural incentive to move charts, not to interrogate documentation or query a physician when the diagnosis does not support the planned procedure.
Volume metrics reward throughput. They do not reward catching an underdocumented complication that would justify a higher-severity DRG, or flagging a modifier 25 situation that protects a same-day E/M from a bundling edit. They do not penalize a coder who downcodes to avoid a query because queries slow turnaround time.
A vendor reporting 500 charts coded per day is giving you an activity report. It tells you nothing about whether the codes were right, whether they captured the full clinical picture, or whether they will survive a RAC or UPIC review. Managing a coding relationship on chart counts alone is the equivalent of managing a sales team only on call volume with no attention to close rate or deal quality.
The scorecard replaces that single vanity metric with five categories that connect directly to cash, compliance, and clinical accuracy.
The Five KPIs That Belong on Every Coding Vendor Scorecard
1. Coding Accuracy Rate, Measured by Independent Audit
This is the anchor metric. Coding accuracy rate is the percentage of coded charts that, on review, require no change to the principal diagnosis, secondary diagnoses, procedures, or modifiers. The industry benchmark for acceptable outpatient and professional coding accuracy is generally 95 percent or higher on a statistically valid sample. Facility inpatient coding, where DRG assignment is the outcome measure, often carries the same threshold but is evaluated with more weight on MS-DRG or APR-DRG impact.
The critical word in that definition is "independent." Vendor-reported accuracy is not accuracy. It is self-assessment, and no serious operational agreement should treat the two as equivalent. The scorecard should require an independent coding quality audit of a statistically meaningful sample, typically 30 to 50 records per coder or per specialty per quarter, conducted by a party that has no financial interest in the outcome. A vendor's resistance to this clause should be treated as a red flag, not accommodated as an exception.
Define what a coding error means before signing the contract. A missed secondary diagnosis that changes the DRG and reduces reimbursement is not the same as a minor sequencing variation with no payment impact, but both should be counted, weighted differently, and tracked over time.
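One way to make that definition concrete is to record each audited chart with its errors split by payment impact, then report both a plain accuracy rate and a weighted error score. The sketch below is illustrative only: the `AuditedChart` structure, the two error classes, and the 3-to-1 weighting are assumptions to be replaced by whatever your contract defines.

```python
from dataclasses import dataclass


@dataclass
class AuditedChart:
    chart_id: str
    payment_impact_errors: int = 0  # e.g. a missed secondary diagnosis that changes the DRG
    no_impact_errors: int = 0       # e.g. a sequencing variation with no payment effect


def accuracy_rate(charts):
    """Share of audited charts that required no change at all."""
    if not charts:
        raise ValueError("audit sample is empty")
    clean = sum(
        1 for c in charts
        if c.payment_impact_errors == 0 and c.no_impact_errors == 0
    )
    return clean / len(charts)


def weighted_error_score(charts, impact_weight=3.0, no_impact_weight=1.0):
    """Weighted errors per audited chart. The weights are hypothetical."""
    if not charts:
        raise ValueError("audit sample is empty")
    total = sum(
        c.payment_impact_errors * impact_weight
        + c.no_impact_errors * no_impact_weight
        for c in charts
    )
    return total / len(charts)


sample = [
    AuditedChart("A1"),
    AuditedChart("A2", payment_impact_errors=1),
    AuditedChart("A3", no_impact_errors=2),
    AuditedChart("A4"),
]
print(accuracy_rate(sample))         # 0.5
print(weighted_error_score(sample))  # 1.25
```

Reporting both numbers keeps every error counted while letting the payment-impact errors drive the trend line.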
2. Denial and Rejection Rate Attributable to Coding
Not every denial is a coding denial, and your scorecard should isolate the ones that are. Denials driven by medical necessity mismatches on ICD-10-CM diagnosis codes, improper modifier use, NCCI bundling violations, and procedure-diagnosis mismatches on CPT claims are directly within the vendor's control.
Set a baseline from your own historical data, then hold the vendor to that benchmark or better. Tracking coding-attributable denials as a percentage of total claims submitted gives you a rate you can trend month over month. A denial rate above 2 to 3 percent on coding-specific grounds in most specialty or outpatient settings is a performance conversation. Above 5 percent is a contract conversation.
Watch the rework cost too. Each denied claim that requires appeal, additional documentation, or resubmission carries an administrative cost often estimated between $25 and $118 per claim depending on complexity. Multiply that by monthly denial volume and the case for investing in accuracy gets easier to make internally.
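The denial thresholds and the rework multiplication above can be sketched in a few lines. The function names are hypothetical, the 2 percent trigger is an assumption that takes the lower end of the "2 to 3 percent" range, and the claim counts in the example are made up for illustration.

```python
def coding_denial_rate(coding_denials, claims_submitted):
    """Coding-attributable denials as a share of total claims submitted."""
    if claims_submitted <= 0:
        raise ValueError("claims_submitted must be positive")
    return coding_denials / claims_submitted


def classify_denial_rate(rate, performance_threshold=0.02, contract_threshold=0.05):
    """Map a denial rate to the type of conversation it calls for."""
    if rate > contract_threshold:
        return "contract conversation"
    if rate > performance_threshold:
        return "performance conversation"
    return "within range"


def monthly_rework_cost_range(coding_denials, low_per_claim=25.0, high_per_claim=118.0):
    """Low and high rework cost estimates using the per-claim range cited above."""
    return coding_denials * low_per_claim, coding_denials * high_per_claim


# Hypothetical month: 4,000 claims submitted, 120 denied on coding grounds.
rate = coding_denial_rate(120, 4000)
print(classify_denial_rate(rate))      # performance conversation
print(monthly_rework_cost_range(120))  # (3000.0, 14160.0)
```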
3. Turnaround Time and Coding Lag
Coding lag, the number of days between a service date and the date a claim is ready to drop, is one of the most direct connections between coding operations and days in AR. A clean claim that takes 11 days to code instead of 5 adds nearly a week to your revenue cycle for that encounter before the payer clock even starts.
Standard service levels for outpatient and physician (ProFee) coding should target chart completion within 24 to 48 hours of documentation availability for routine encounters and within 72 hours for complex cases requiring queries. Inpatient facility coding timelines vary by case complexity, but a benchmark of 3 to 5 days post-discharge for most DRGs is defensible.
Build turnaround time into the scorecard with both an average and a tail metric. Hitting a 48-hour average while leaving 15 percent of charts past 7 days is a cash flow problem that the average hides.
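A minimal sketch of reporting the average and the tail side by side follows. The `turnaround_summary` helper and the sample hours are assumptions; the example is built to reproduce the scenario above, a 48-hour average with 15 percent of charts past 7 days.

```python
from statistics import mean


def turnaround_summary(hours_to_code, tail_cutoff_hours=7 * 24):
    """Average turnaround plus the share of charts past the tail cutoff."""
    if not hours_to_code:
        raise ValueError("no charts in the reporting period")
    late = sum(1 for h in hours_to_code if h > tail_cutoff_hours)
    return {
        "average_hours": mean(hours_to_code),
        "share_past_cutoff": late / len(hours_to_code),
    }


# Hypothetical month: 17 charts coded in 24 hours, 3 charts coded in 184 hours.
hours = [24] * 17 + [184] * 3
summary = turnaround_summary(hours)
print(summary["average_hours"])      # 48
print(summary["share_past_cutoff"])  # 0.15
```

The average alone would pass a 48-hour service level here, which is why the tail share belongs on the scorecard next to it.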
4. Query Rate and Physician Response Time
A coding partner who never sends physician queries is not necessarily efficient. They may be making assumptions, defaulting to lower-specificity codes, or avoiding the friction of documentation improvement because it slows throughput. A query rate of zero is a red flag, not a performance achievement.
The scorecard should track the percentage of charts that generate a clarification query, the specificity and compliance of those queries under AHIMA and ACDIS guidelines, and the average time from query submission to physician response. Unresolved or pending queries sitting beyond 48 to 72 hours should have an escalation path defined in the contract.
On the inpatient side, a clinical documentation integrity program cannot function without a coding vendor who actively participates in the query process. If your vendor is producing query rates dramatically below specialty benchmarks, ask to see a sample of recently coded charts in that specialty. The documentation gaps will still be there. The queries just are not.
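Query rate and open-query aging are both simple to compute once queries are logged with a submission timestamp. The sketch below assumes a hypothetical log keyed by query ID and uses 72 hours as the escalation limit, the upper end of the window mentioned above.

```python
from datetime import datetime, timedelta


def query_rate(charts_with_query, charts_coded):
    """Share of coded charts that generated at least one clarification query."""
    if charts_coded <= 0:
        raise ValueError("charts_coded must be positive")
    return charts_with_query / charts_coded


def overdue_queries(open_queries, now, limit_hours=72):
    """Return IDs of open queries older than the escalation limit.

    open_queries maps a query ID to the datetime the query was submitted.
    """
    limit = timedelta(hours=limit_hours)
    return sorted(qid for qid, sent in open_queries.items() if now - sent > limit)


# Hypothetical open-query log.
now = datetime(2024, 3, 8, 9, 0)
open_queries = {
    "Q1": datetime(2024, 3, 4, 9, 0),  # 96 hours old
    "Q2": datetime(2024, 3, 7, 9, 0),  # 24 hours old
    "Q3": datetime(2024, 3, 5, 8, 0),  # 73 hours old
}
print(overdue_queries(open_queries, now))  # ['Q1', 'Q3']
print(query_rate(45, 1000))                # 0.045
```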
5. Case-Mix Index and E/M Distribution Stability
Case-mix index for facility coding and E/M level distribution for outpatient coding and professional fee coding serve as early warning indicators of systematic undercoding or overcoding. Neither should swing dramatically month to month without a corresponding shift in the actual patient population or payer mix.
A drop in CMI without a change in clinical volume or acuity is often a sign of reduced secondary diagnosis capture. A sudden compression toward level 3 and 4 E/M codes (99213 and 99214) across an internal medicine practice, when documentation would support a higher distribution, is undercode creep. A sudden shift in the other direction warrants a different conversation about audit exposure.
Track these distributions quarterly. Compare them to prior periods and to specialty benchmarks. Make the vendor explain deviations in writing.
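A period-over-period comparison of E/M distributions can be reduced to a single number. The sketch below uses total variation distance (half the sum of absolute differences in share), which is one reasonable choice and not something the scorecard itself prescribes; the code counts are invented, and any threshold for what counts as a deviation worth explaining is yours to set.

```python
from collections import Counter


def em_distribution(codes):
    """Share of encounters billed at each E/M code."""
    if not codes:
        raise ValueError("no encounters in the period")
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def distribution_shift(prior, current):
    """Total variation distance between two distributions (0 = identical, 1 = disjoint)."""
    codes = set(prior) | set(current)
    return 0.5 * sum(abs(prior.get(c, 0.0) - current.get(c, 0.0)) for c in codes)


# Hypothetical quarters for one practice.
prior_q = em_distribution(["99213"] * 10 + ["99214"] * 8 + ["99215"] * 2)
current_q = em_distribution(["99213"] * 14 + ["99214"] * 6)
print(round(distribution_shift(prior_q, current_q), 3))  # 0.2
```

A shift of this size with no change in patient mix is the kind of deviation the vendor should be asked to explain in writing.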
Setting Service Levels and Remediation Terms
A scorecard without consequences is a suggestion. Each KPI needs a defined threshold that triggers a formal remediation process, and remediation needs teeth.
A reasonable structure looks like this: if accuracy drops below 95 percent on an independent audit, the vendor submits a root cause analysis within 10 business days and a corrective action plan within 20. A second consecutive audit below threshold triggers a fee reduction, a coder reassignment, or both. Three consecutive failures give you the right to terminate without penalty.
Remediation terms protect both parties. They give the vendor a structured path to recover before losing the contract, and they protect you from being locked into an underperforming partner with no contractual recourse. If a vendor pushes back hard on remediation language, that resistance tells you something important before you sign.
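The escalation ladder described above is easy to encode, which also makes it easy to apply consistently. This is a sketch under the assumption that audit results are stored oldest first as fractions; the function name and return strings are illustrative, not contract language.

```python
def remediation_step(audit_accuracies, threshold=0.95):
    """Return the remediation step implied by the most recent run of failed audits.

    audit_accuracies is an oldest-first list of independent audit results (0.0 to 1.0).
    """
    consecutive_failures = 0
    for accuracy in reversed(audit_accuracies):
        if accuracy < threshold:
            consecutive_failures += 1
        else:
            break  # a passing audit resets the streak

    if consecutive_failures == 0:
        return "no action"
    if consecutive_failures == 1:
        return "root cause analysis in 10 business days, corrective action plan in 20"
    if consecutive_failures == 2:
        return "fee reduction, coder reassignment, or both"
    return "termination without penalty available"


print(remediation_step([0.96, 0.94]))        # root cause analysis in 10 business days, corrective action plan in 20
print(remediation_step([0.94, 0.93]))        # fee reduction, coder reassignment, or both
print(remediation_step([0.93, 0.92, 0.91]))  # termination without penalty available
```

Only the trailing run of failures counts here, so a passing audit between two failures restarts the ladder; whether that matches your intent is a contract decision.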
Reporting Cadence: What to Expect and When
Monthly reporting is the minimum for an active coding relationship. The monthly report package should include, at the least, the accuracy rate from the most recent audit cycle, the denial rate attributable to coding, average and tail turnaround time, open query aging, and a comparison of E/M or CMI distribution against the prior period and prior year.
Quarterly business reviews should go deeper: coder-level performance breakdowns, specialty-level accuracy comparisons, payer-specific denial trend analysis, and a forward-looking plan on any identified risk areas. These are not status meetings. They are structured accountability sessions, and your vendor should be bringing the data, not waiting for you to pull it.
If your current vendor's monthly report is a chart count and an invoice, you do not have a performance management relationship. You have a transaction.
Red Flags That a Vendor Is Optimizing for Volume Over Accuracy
- Resistance to independent audits or requests to use their own internal QA results as the accuracy metric
- No structured query process or query rates that are implausibly low for the specialty mix
- Reporting that leads with charts coded and buries or omits denial attribution data
- Coder turnover that is never disclosed, meaning the coders on your account change frequently without notice
- Flat E/M distributions that do not shift when documentation complexity shifts
- Pushback on remediation clauses during contract negotiation
- Inability to produce coder-level accuracy breakdowns, meaning all accuracy reporting is pooled and anonymized
Any one of these in isolation is a conversation. Several together are a pattern.
A Good Partner Reports Against This Scorecard Without Being Asked
The vendors worth keeping are the ones who come to the quarterly review with the scorecard already filled in. They flag their own misses before you find them, they bring an explanation and a plan, and they treat the accuracy conversation as a shared interest rather than a threat.
If you are not sure what your current vendor's accuracy rate actually is, or if it has never been measured by an independent reviewer, that is the starting point. Understanding the criteria for evaluating a coding partner against objective standards, and understanding why billing companies build audit oversight into their outsourcing model, both point to the same conclusion: accountability has to be designed in, not assumed.
Before you can manage a vendor against a scorecard, you need to know where you are starting. Use the free Coding Outsourcing ROI Calculator to quantify what current coding performance gaps are costing your organization in real revenue terms, so the scorecard conversation starts with numbers, not hunches.
If you want a vendor that brings this reporting structure as standard practice rather than under contractual pressure, talk to the MedCodex Health team about our coding quality audit and managed coding programs, built around the metrics that actually connect to your bottom line.