10 Common Benchmarking Mistakes in Transfer Pricing (and How to Avoid Them)
Borys Ulanenko
CEO of ArmsLength AI
TL;DR - Key Takeaways
The #1 mistake: insufficient documentation of your screening process. Without an accept/reject matrix showing why each company was included or excluded, your study is materially weaker in audit.
Quality over quantity: 6 excellent comparables beat 15 marginal ones. Tax authorities will strip weak comparables from your set, potentially eliminating your range.
Update annually—at minimum. Relying on outdated studies (especially post-COVID) signals to tax authorities that you didn't do your homework.
Match the PLI to the value driver. Using ROA for a low-asset distributor or Operating Margin for a pass-through agent creates immediate audit risk.
A self-audit checklist before submission can catch most of these mistakes—and save you significant audit headache later.
Quick Answer: The 10 Most Common Mistakes
The most common benchmarking mistakes are: (1) poor screening documentation—no accept/reject matrix, (2) wrong PLI selection—mismatched to the tested party's value driver, (3) inflated sample size with weak comparables, (4) ignoring working capital adjustments when differences are material, (5) outdated studies—not updating annually, (6) wrong database or market—missing local comparables when required, (7) improper handling of loss-makers—blanket exclusion or inclusion without analysis, (8) inconsistent multi-year analysis—mixing single-year and multi-year data, (9) shallow functional analysis—boilerplate that doesn't support your method, and (10) incomplete accept/reject matrix—missing reasons or companies.
These mistakes aren't hypothetical—they're routinely found in audits and have triggered significant adjustments. Addressing them before submission dramatically improves defensibility.
Mistake #1: Poor Screening Documentation
What it is: Failing to document the search and screening process for comparables. The study shows 10 final comparables but no record of the dozens of companies considered and rejected.
Why it happens: Practitioners treat the search process as internal workpapers and only include results in the report. Time pressure or lack of a standardized process leads to skipping detailed documentation.
Tax authority reaction: Lack of screening documentation is a major red flag. HMRC notes that transfer pricing documentation is often "too high level" and insufficiently evidenced, making it impossible to confirm an arm's length result. The IRS expects sufficient documentation to show you reasonably selected and applied the best method—and that you can produce it quickly in audit. Weak support for comparable selection increases audit burden and invites extensive Information Document Requests asking for details you should have documented upfront.
Audit Risk: If you cannot show which companies were rejected and why, auditors may suspect you excluded companies arbitrarily to manipulate the outcome. The credibility of the entire study collapses.
How to fix it:
Document every step of your comparables search
Record database query parameters: keywords, NAICS/NACE codes, geographic scope, financial screens, date of search
Create an accept/reject matrix listing all companies that passed initial filters with your decision and reason for each
Ensure reasons are specific: "Reject – persistent losses," "Reject – >25% related-party sales," "Accept – functionally similar distribution services"
The matrix demonstrates a diligent, consistent process and can save hours in audit by proving you applied criteria systematically.
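As a minimal illustration, the sketch below builds a small accept/reject matrix with pandas. The company names, thresholds, and reason wording are hypothetical placeholders, and the columns should be adapted to the screening criteria actually applied in the study.

```python
import pandas as pd

# Hypothetical accept/reject matrix: one row per company that passed the
# initial quantitative database filters, with the decision and a specific reason.
matrix = pd.DataFrame([
    {"company": "Distributor Alpha SA", "country": "DE", "turnover_m_eur": 48.2,
     "related_party_sales_pct": 5, "decision": "Accept",
     "reason": "Functionally similar distribution services; independent"},
    {"company": "Trading Beta GmbH", "country": "AT", "turnover_m_eur": 12.7,
     "related_party_sales_pct": 40, "decision": "Reject",
     "reason": ">25% related-party sales"},
    {"company": "Logistics Gamma BV", "country": "NL", "turnover_m_eur": 95.0,
     "related_party_sales_pct": 0, "decision": "Reject",
     "reason": "Persistent losses in 3 of 3 years"},
])

# Export alongside the study so the full screening trail is preserved.
matrix.to_csv("accept_reject_matrix.csv", index=False)
print(matrix[["company", "decision", "reason"]])
```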
Mistake #2: Wrong PLI Selection
What it is: Choosing an inappropriate Profit Level Indicator for the tested party's business model. Using Return on Assets for a distributor with minimal assets, or Operating Margin when pass-through costs dominate.
Why it happens: PLI selection is sometimes done by rote ("we always use operating margin") or based on what makes the tested party look better. A shallow functional analysis contributes—if you haven't truly understood the tested party's value drivers, you'll pick a PLI that doesn't capture them.
Tax authority reaction: Using the wrong PLI attracts immediate auditor criticism. IRS regulations emphasize that the PLI should "provide the most reliable measure of an arm's-length result." Auditors may recompute your analysis with a different PLI—and if that yields a different result, expect an adjustment.
The Rule: Match the PLI to the value driver. Revenue-driven → Operating Margin. Cost-driven → Net Cost Plus. Asset-intensive → ROA/ROOA. Pass-through with value in OPEX → Berry Ratio (with conditions).
How to fix it:
Perform rigorous functional analysis first—identify what drives profits
If the tested party uses few tangible assets, avoid ROA
If substantial pass-through costs exist, consider Berry Ratio (document OECD ¶2.107–2.108 conditions)
Cross-check by testing different PLIs as a sensitivity analysis—if the choice alone is the difference between in-range and out-of-range, investigate
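As a quick reference for the rule above, the sketch below computes the common PLIs from simplified, invented financials. The line-item definitions (for example, what counts as operating assets or pass-through costs) are assumptions and should follow the conventions documented in the study.

```python
# Simplified, hypothetical financials for a tested party (amounts in thousands).
sales = 10_000
cogs = 7_000                 # cost of goods sold
operating_expenses = 2_000   # SG&A and other OPEX
operating_assets = 1_500     # assumed definition of operating assets

gross_profit = sales - cogs
operating_profit = gross_profit - operating_expenses
total_costs = cogs + operating_expenses

# Common Profit Level Indicators (PLIs)
operating_margin = operating_profit / sales              # revenue-driven activities
net_cost_plus = operating_profit / total_costs           # cost-driven activities
return_on_assets = operating_profit / operating_assets   # asset-intensive activities
berry_ratio = gross_profit / operating_expenses          # value added through OPEX

print(f"Operating margin: {operating_margin:.1%}")
print(f"Net cost plus:    {net_cost_plus:.1%}")
print(f"Return on assets: {return_on_assets:.1%}")
print(f"Berry ratio:      {berry_ratio:.2f}")
```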
Mistake #3: Inflated Sample Size with Weak Comparables
What it is: Including too many comparables, some of which are poor quality matches, just to have a "big sample." Valuing quantity over quality.
Why it happens: Practitioners believe larger samples are inherently more defensible. Some include marginal comparables to ensure the tested party falls within range—padding the set with low- or high-margin companies to widen it. The "if in doubt, include it" approach leads to bloated sets where comparables aren't truly comparable.
Tax authority reaction: Tax authorities prioritize quality over quantity. Courts have consistently upheld authorities rejecting weak comparables from taxpayer sets—even when this reduces the sample size significantly. The underlying principle: a smaller set of truly comparable companies is more reliable than a larger set with "apples and oranges." The IRS FAQs explicitly encourage stress-testing your set by removing one comparable and seeing what happens—if your result hinges on a single company, the set isn't reliable.
How to fix it:
Focus on comparability, not count—6 excellent comparables beat 15 marginal ones
Use financial consistency checks: persistent losses, extremely high margins, erratic results all warrant investigation
Create a "quality scorecard" evaluating how closely each matches on industry, functions, market, size, asset intensity
Test: if removing one comparable changes your outcome from in-range to out-of-range, your set is too volatile (a minimal sketch of this check appears below)
There's no regulatory "minimum 10" rule. Many jurisdictions accept 3-5 comparables if quality is high. Strive for a sample where you could defend each company individually in front of an auditor.
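To make the removal test concrete, here is a minimal sketch that recomputes the range after dropping each comparable in turn. The margins are invented, and using the interquartile range as the arm's length range is an assumption; substitute whatever range construction the jurisdiction applies.

```python
import numpy as np

# Hypothetical comparable operating margins and tested-party result.
comparable_margins = np.array([0.021, 0.034, 0.038, 0.041, 0.055, 0.062, 0.118])
tested_party_margin = 0.040

def in_interquartile_range(margins, tested):
    """Check whether the tested result falls inside the interquartile range."""
    q1, q3 = np.percentile(margins, [25, 75])
    return q1 <= tested <= q3

print("Full set in range:", in_interquartile_range(comparable_margins, tested_party_margin))

# Leave-one-out stress test: does dropping any single comparable flip the conclusion?
robust = True
for i, dropped in enumerate(comparable_margins):
    reduced = np.delete(comparable_margins, i)
    if not in_interquartile_range(reduced, tested_party_margin):
        robust = False
        print(f"Warning: result falls out of range when the comparable with "
              f"margin {dropped:.1%} is removed")
if robust:
    print("Result stays in range under every single-comparable removal")
```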
Mistake #4: Ignoring Working Capital Adjustments
What it is: Failing to perform working capital adjustments when comparables have materially different levels of receivables, payables, or inventory than the tested party. These differences affect profitability—a company carrying large receivables has financing costs that lower reported profit.
Why it happens: Working capital adjustments (WCAs) can be complex and are sometimes viewed as optional. Practitioners skip them to avoid complication or assume the impact is immaterial. In other cases, balance sheet data isn't readily available. Practice is inconsistent: some treat WCA as standard, others as "rarely needed."
Tax authority reaction: Working capital differences are a recognized comparability factor. OECD and many local regulations acknowledge that where payment terms or inventory differ and impact pricing, adjustment can improve comparability. In some audits (particularly in India), authorities may compute WCAs even if the taxpayer did not. If the tested party's working capital is an outlier, auditors will ask why no adjustment was made.
How to fix it:
Assess working capital profiles early: compare DSO, days payable, and inventory days between tested party and comparables
Define a materiality threshold (e.g., "if differences in DSO exceed 30 days, consider adjustment")
When adjusting, use a consistent formula and reasonable interest rate
If you decide not to adjust, document why: "WC levels are within the range of comparables (all DSO 60–75 days), so no adjustment was made"
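There is no single mandated formula, but one common approach adjusts each comparable's operating profit for the difference in working capital intensity relative to the tested party, priced at a short-term interest rate. The sketch below illustrates that approach with invented figures; the 5% rate and the items included in working capital are assumptions, not prescribed inputs.

```python
# Hypothetical figures (amounts in thousands); the interest rate is an assumed short-term rate.
interest_rate = 0.05

tested = {"sales": 10_000, "receivables": 2_500, "inventory": 1_200, "payables": 900}
comparable = {"sales": 8_000, "receivables": 800, "inventory": 700,
              "payables": 600, "operating_profit": 400}

def wc_intensity(c):
    """Net working capital (receivables + inventory - payables) as a share of sales."""
    return (c["receivables"] + c["inventory"] - c["payables"]) / c["sales"]

# Quick materiality screen on receivables days (DSO).
dso_tested = tested["receivables"] / tested["sales"] * 365
dso_comp = comparable["receivables"] / comparable["sales"] * 365
print(f"DSO tested party: {dso_tested:.0f} days, comparable: {dso_comp:.0f} days")

# Adjust the comparable's profit toward the tested party's working capital intensity.
adjustment = (wc_intensity(tested) - wc_intensity(comparable)) * comparable["sales"] * interest_rate
adjusted_margin = (comparable["operating_profit"] + adjustment) / comparable["sales"]

print(f"Unadjusted operating margin: {comparable['operating_profit'] / comparable['sales']:.1%}")
print(f"Working capital adjustment:  {adjustment:.1f}")
print(f"Adjusted operating margin:   {adjusted_margin:.1%}")
```

Whether to adjust at all should still follow the materiality screen and documentation approach described in the bullets above.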
Mistake #5: Outdated Studies
What it is: Relying on old or stale comparables for current-year transfer pricing without proper updates. Reusing the same set for multiple years under the assumption that "nothing has changed."
Why it happens: Benchmarking studies are time-consuming and costly. OECD notes that some tax administrations may accept refreshing the comparables search every three years, with annual financial data updates in between. But this isn't a universal safe harbor—economic conditions change (recessions, booms, pandemics) and comparables can drift (acquired, diversified, defunct).
Tax authority reaction: Many tax authorities expect annual, contemporaneous documentation. While comparables searches might not be redone every single year if facts are constant, very old studies are viewed with suspicion. The IRS expects analysis reflecting the economic conditions of the tax year in question—especially after events like COVID-19. Tax authorities may conduct their own fresh search and find different results.
Post-COVID Alert: Studies based on pre-2020 data applied to 2020-2022 results are especially vulnerable. Economic volatility means comparability requires year-specific judgment.
How to fix it:
Update financials annually at minimum; conduct a full re-search every 3 years (or sooner if material changes occur)
Confirm existing comparables are still valid each year (still independent? still in same business?)
If operating conditions are unchanged, document that: "Screening of database as of 2025 did not identify new companies meeting criteria. The industry and functions remain consistent, so prior comparables were retained with updated FY2024 data."
Always use the latest available financial data for comparables
Mistake #6: Using the Wrong Database or Market
What it is: Conducting the comparables search in a database not appropriate for the transaction's geographic or industry context. Using only foreign comparables when local ones are available and preferred.
Why it happens: Convenience or subscription access—the analyst uses whichever database they have, even if it's not ideal for that region. Many countries have a preference or requirement for local market comparables, and ignoring this creates risk.
Tax authority reaction: Many jurisdictions have strong preferences for local comparables:
Japan: Often expects domestic comparables where available for Japanese tested parties
India: TPOs frequently reject foreign comparables if sufficient Indian ones are available
Thailand: Strong local preference; taxpayer should evidence scarcity if using foreign comparables
China: SAT strongly favors Chinese comparables and maintains internal data sources
If you use only European companies for an Indian transaction, the TPO will likely reject most of them as not reflecting Indian market conditions.
How to fix it:
Start the search in the local market where the jurisdiction requires or prefers domestic comparables
If local data is insufficient, expand carefully to the region and justify how those markets are economically similar
Document steps taken to find local comparables (even if none found): "Searches for domestic comparables were conducted. In [Country], data on independent companies providing XYZ services is scarce; therefore, regional comparables were used with adjustments as necessary."
Mistake #7: Mishandling Loss-Making Comparables
What it is: Improper inclusion or exclusion of comparables solely based on whether they have losses. This manifests two ways: (a) automatically excluding all loss-makers without analysis, even if losses reflect normal business cycles, or (b) including a loss-maker whose losses stem from extraordinary circumstances unlike the tested party's situation.
Why it happens: Loss-makers are contentious. Some practitioners take a "reject all losses" rule of thumb, believing independents don't persistently sell at losses (not always true). Others include loss-makers to defend a low margin without checking why the loss occurred. Without clear guidance, firms default to convenience or bias.
Tax authority reaction: OECD Guidelines explicitly state that you should not reject a comparable solely because it has losses (¶3.64–3.65). Recent case law reinforces this: the Italian Supreme Court (Decision No. 19512, July 2024) ruled that loss-making comparables cannot be excluded just for being in loss—the reasoning must be economic, not automatic. Colombia's Council of State (2024) similarly clarified that a single year of loss doesn't justify exclusion unless losses are recurrent or signal materially different economic conditions.
The Principle: Investigate the circumstances. Ask: Do losses reflect normal business volatility, or abnormal conditions (bankruptcy, restructuring, speculative strategy)? Document your reasoning for each loss-maker.
How to fix it:
Evaluate loss-making comparables on their merits
Ask: Is the loss due to normal industry downturn (include) or extraordinary circumstances (investigate/exclude)?
Are losses recurrent or a one-year dip? Single-year losses within a multi-year period may not warrant exclusion
If excluding, document why: "Company X rejected—two years of significant losses (30% negative margins) resulted from major restructuring, not reflective of routine operations or tested party's risk profile"
Don't apply blanket "Profit > 0" filters without secondary review
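One way to operationalize this is to flag loss-makers for manual review instead of dropping them automatically. The margins and the "two or more loss years out of three" trigger in the sketch below are illustrative assumptions, not regulatory thresholds.

```python
# Hypothetical three-year operating margins per comparable.
margins = {
    "Company A": [0.04, 0.05, 0.03],
    "Company B": [0.02, -0.01, 0.04],    # single-year dip
    "Company C": [-0.12, -0.30, -0.08],  # recurrent, significant losses
}

for name, yearly in margins.items():
    loss_years = sum(1 for m in yearly if m < 0)
    if loss_years == 0:
        status = "keep"
    elif loss_years == 1:
        status = "keep, but document whether the loss year reflects normal volatility"
    else:
        status = "flag for review; exclude only with documented economic reasoning"
    print(f"{name}: {loss_years} loss year(s) -> {status}")
```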
Mistake #8: Inconsistent or Improper Multi-Year Analysis
What it is: Inconsistency in using multi-year data—such as averaging multiple years of comparables data but testing the single-year result of the tested party, or cherry-picking years for different companies without a consistent approach.
Why it happens: Practitioners automatically average three years of comparables data without realizing some tax authorities require year-by-year analysis. Others average because it makes the tested party look better (diluting one bad year). Inconsistencies arise when data availability varies—instead of dropping comparables, analysts average whatever is available for each.
Tax authority reaction: Consistency is universally expected. The Canada Revenue Agency explicitly states that arm's length pricing should be established year by year, not by averaging multiple years. If you mix single-year and multi-year data, auditors will flag it as a methodological flaw. If your analysis masks a non-compliant year through averaging, the IRS will still target that year.
How to fix it:
Determine local expectations: does the tax authority prefer year-by-year or allow multi-year averaging?
Be consistent: if averaging comparables over three years, also consider the tested party's three-year average (as a supplement, not a replacement)
Keep data alignment: use the same three years for all comparables and the tested party
Address missing data: if one comparable lacks a year, either exclude it or analyze it separately—don't silently average each comparable over a different set of years
Don't average away non-compliance: if one year is outside range, explain it rather than hoping averaging will mask it
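As a minimal sketch of the alignment point, the snippet below keeps only comparables with data for the same three years and then computes both year-by-year margins and weighted three-year averages from that same aligned data. All figures and company names are invented.

```python
import pandas as pd

# Hypothetical operating profit and sales per comparable and year (thousands).
data = pd.DataFrame([
    {"company": "Comp A", "year": 2022, "operating_profit": 40, "sales": 1000},
    {"company": "Comp A", "year": 2023, "operating_profit": 55, "sales": 1100},
    {"company": "Comp A", "year": 2024, "operating_profit": 50, "sales": 1200},
    {"company": "Comp B", "year": 2022, "operating_profit": 30, "sales": 800},
    {"company": "Comp B", "year": 2023, "operating_profit": 20, "sales": 850},
    {"company": "Comp B", "year": 2024, "operating_profit": 35, "sales": 900},
    {"company": "Comp C", "year": 2023, "operating_profit": 60, "sales": 700},  # 2022/2024 missing
])

required_years = {2022, 2023, 2024}

# Keep only comparables covering all required years; handle the rest explicitly.
coverage = data.groupby("company")["year"].apply(set)
complete = coverage[coverage.apply(lambda years: years == required_years)].index
aligned = data[data["company"].isin(complete)]
print("Dropped for incomplete coverage:", sorted(set(data["company"]) - set(complete)))

# Year-by-year operating margins (the presentation some authorities expect).
per_year = aligned.assign(margin=aligned["operating_profit"] / aligned["sales"])
print(per_year.pivot(index="company", columns="year", values="margin").round(3))

# Weighted three-year average per comparable, built from the same aligned years.
totals = aligned.groupby("company")[["operating_profit", "sales"]].sum()
print((totals["operating_profit"] / totals["sales"]).round(3))
```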
Mistake #9: Shallow Functional Analysis
What it is: Providing a superficial functional analysis that doesn't adequately distinguish the tested party's profile or support the choice of comparables and method. Generic sentences like "Company A is a distributor that performs marketing and sales functions" that could describe any distributor anywhere.
Why it happens: Some view functional analysis as compliance formality and reuse boilerplate. If the preparer doesn't engage with operational teams, they can't gather nuanced information. Time constraints lead to cutting corners. Template-based or AI-generated outputs can be generic.
Tax authority reaction: Auditors place huge importance on functional analysis—it's often the first thing they read. The IRS has criticized taxpayers who present just a "functional analysis checklist" without connecting it to the transfer pricing method. HMRC notes common issues include analysis being too high-level or one-sided (not analyzing both sides of the transaction). A shallow functional analysis undermines confidence in the entire study.
How to fix it:
Invest time in a thorough functional analysis—it's the backbone of your study
Gather information from people who know the business: operations, finance, supply chain
Cover all five comparability factors: functions performed, assets used, risks assumed, contractual terms, economic conditions
Describe not just what functions are performed but how they are performed and how significant they are
Link functional analysis to your benchmarking choices: "Because the entity does not assume significant product liability risk, a routine cost-plus margin is appropriate, and comparables were chosen accordingly"
Perform a two-sided analysis: at least acknowledge what the counterparty does
The Test: Read your functional analysis aloud. If it could apply to any company in your industry, it's too generic. Add specific facts about this tested party.
Mistake #10: Incomplete Accept/Reject Matrix
What it is: Failing to include a complete accept/reject summary—or including one that's missing key elements. The matrix might omit reasons for rejection, not list all companies considered, or lack important fields (turnover, related-party sales percentage).
Why it happens: Preparing a comprehensive matrix is tedious. Practitioners assume a narrative description suffices, or fear listing every rejected company gives auditors a roadmap to second-guess each exclusion. Sometimes it's oversight—the team screened internally but didn't document every step in the report.
Tax authority reaction: An incomplete matrix raises questions: What are you hiding? Did you apply criteria consistently? Some jurisdictions have specific requirements—for example, certain Israeli regimes (such as local R&D/cost-plus frameworks under ITA circulars) require attaching a full TP study including an accept/reject matrix to the annual return. If only aggregate stats are given ("we excluded 10 companies due to losses"), auditors will ask you to name them and prove it.
How to fix it:
Include a detailed accept/reject matrix in your study
At minimum, list: each company that passed initial quantitative filters, whether accepted or rejected, and reason for rejection
Include the key screening fields relied on (e.g., turnover, related-party sales percentage) so an auditor can verify each decision
Frequently Asked Questions
What's the single most common benchmarking mistake?
Insufficient documentation of the comparables selection process—essentially, poor or missing support for why certain comparables were chosen and others rejected. Tax authorities consistently complain about documentation that shows final outcomes without the trail. When rationale isn't clear, it raises suspicions of cherry-picking. This is the most prevalent issue and, fortunately, the easiest to fix: document everything in an accept/reject matrix.
How often should I update my benchmarking study?
Annually at minimum, even if that means just updating comparables' financials and confirming nothing material changed. OECD notes that some tax administrations may accept refreshing the search every three years with annual financial updates—but this isn't a universal safe harbor, and many authorities expect annual contemporaneous documentation. Given recent volatility (COVID, supply chains, inflation), older studies may not reflect current arm's length conditions. Stay updated to control your narrative.
What triggers a transfer pricing audit related to benchmarking?
Common triggers include: (1) consistent losses or low profits without clear reason, (2) significant year-on-year profit swings after reorganization, (3) large related-party transactions relative to overall business, (4) payments to low-tax jurisdictions, (5) industries or transaction types under regulatory focus (pharma, tech, intangibles), and (6) poor documentation quality from prior years—which signals higher risk.
Can I reuse the same benchmarking study for multiple years?
You can reuse elements, but don't copy-paste without updating. At minimum, update financials each year and check if comparables are still valid (not acquired, diversified, or defunct). If nothing significant changed, you might carry the same set for 2-3 years—but document that you reviewed and determined it still appropriate. Be cautious: if audited in Year 3, authorities will examine whether Year 3 conditions match Year 1's.
What if my comparable set is small (fewer than 5 companies)?
Small sets aren't automatically rejected—quality matters more than quantity. Many jurisdictions accept 3-5 comparables if they're highly comparable. Document thoroughly why more couldn't be found and whether additional adjustments are needed. If the set is genuinely robust, defend it confidently. However, if a single comparable drives your outcome, that's a vulnerability—consider whether you've applied filters too aggressively.
How do I know if my comparables are "good enough"?
Signs of a solid set: (1) they truly mirror the tested transaction's functional and risk profile, (2) no single comparable drives the outcome—removing one doesn't drastically change the range, (3) quality over quantity—each is defensible on its own, (4) consistent data across all (same years, same PLI denominator definitions), (5) no glaring differences unaddressed (different business models, persistent losses, different country). If defending any single comparable makes you uncomfortable, it probably doesn't belong.
What do tax authorities expect to see in documentation?
A complete local file analysis, including: (1) clear description of controlled transactions, (2) detailed functional analysis (FAR), (3) explanation of method chosen and why appropriate, (4) selection of comparables with search criteria and accept/reject summary, (5) comparables data with company descriptions and financials, (6) any comparability adjustments made, (7) arm's length range and how tested party compares, (8) conclusion on arm's length compliance. Documentation should allow an auditor to reproduce and understand your analysis step by step.