Borys Ulanenko
CEO of ArmsLength AI

A transfer pricing benchmarking study supports an arm’s length result by comparing the tested party’s profitability to independent companies with similar economically relevant characteristics (functions, assets, risks, contractual terms, and economic circumstances). When the screening trail, comparability rationale, or financial adjustments aren’t coherent, tax authorities can more easily challenge (or rebuild) your analysis—often by stripping weak comparables, changing the profit level indicator (PLI), or rerunning the search.
The most common benchmarking mistakes are covered one by one below.
These mistakes aren't hypothetical—they're routinely found in audits and have triggered significant adjustments. Addressing them before submission dramatically improves defensibility.
What it is: Failing to document the search and screening process for comparables. The study shows 10 final comparables but no record of the dozens of companies considered and rejected.
Why it happens: Practitioners treat the search process as internal workpapers and only include results in the report. Time pressure or lack of a standardized process leads to skipping detailed documentation.
Tax authority reaction: Lack of screening documentation is a major red flag. HMRC notes that transfer pricing documentation is often "too high level" and insufficiently evidenced, making it impossible to confirm an arm's length result. The IRS expects sufficient documentation to show you reasonably selected and applied the best method—and that you can produce it quickly in audit. Weak support for comparable selection increases audit burden and invites extensive Information Document Requests asking for details you should have documented upfront.
Audit Risk: If you cannot show which companies were rejected and why, auditors may suspect you excluded companies arbitrarily to manipulate the outcome. The credibility of the entire study collapses.
How to fix it:
The matrix demonstrates a diligent, consistent process and can save hours in audit by proving you applied criteria systematically.
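One practical way to keep that trail is to log every company touched during the search in a structured accept/reject table, recording the screening step, the decision, and a specific reason. A minimal sketch using Python and pandas; the column names, reason wording, and figures are illustrative rather than a prescribed format:

```python
import pandas as pd

# Illustrative accept/reject log: one row per company considered,
# with the screening step, decision, and a specific reason.
screening_log = pd.DataFrame([
    {"company": "Alpha Distribution GmbH", "step": "quantitative", "decision": "accept",
     "reason": "Passed all filters", "turnover_eur_m": 48.2, "related_party_pct": 0.0},
    {"company": "Beta Logistics SpA", "step": "quantitative", "decision": "reject",
     "reason": "Related-party sales above 25% of revenue", "turnover_eur_m": 112.0, "related_party_pct": 41.0},
    {"company": "Gamma Trading SA", "step": "manual review", "decision": "reject",
     "reason": "Primarily a manufacturer per annual report", "turnover_eur_m": 77.5, "related_party_pct": 5.0},
])

# Summary an auditor can reconcile: how many companies were rejected
# at each step, and for which reasons.
summary = (screening_log[screening_log["decision"] == "reject"]
           .groupby(["step", "reason"]).size().rename("companies"))
print(summary)
```

Exporting a log like this as an appendix gives the auditor the same funnel you actually ran, rather than a narrative reconstruction after the fact.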
What it is: Choosing an inappropriate Profit Level Indicator for the tested party's business model. Using Return on Assets for a distributor with minimal assets, or Operating Margin when pass-through costs dominate.
Why it happens: PLI selection is sometimes done by rote ("we always use operating margin") or based on what makes the tested party look better. A shallow functional analysis contributes—if you haven't truly understood the tested party's value drivers, you'll pick a PLI that doesn't capture them.
Tax authority reaction: Using the wrong PLI attracts immediate auditor criticism. IRS regulations emphasize that the PLI should "provide the most reliable measure of an arm's-length result." Auditors may recompute your analysis with a different PLI—and if that yields a different result, expect an adjustment.
The Rule: Match the PLI to the value driver. Revenue-driven → Operating Margin. Cost-driven → Net Cost Plus. Asset-intensive → ROA/ROOA. Pass-through with value in OPEX → Berry Ratio (with conditions).
How to fix it:
For detailed guidance, see our PLI Selection Guide.
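To make the PLI-to-value-driver mapping above concrete, the common PLIs reduce to simple ratios over a tested party's (or comparable's) simplified financials. A minimal sketch, assuming operating profit equals revenue less COGS less operating expenses; all figures are illustrative:

```python
def plis(revenue, cogs, opex, operating_assets):
    """Common profit level indicators from simplified financials."""
    gross_profit = revenue - cogs
    operating_profit = gross_profit - opex
    return {
        # Revenue-driven tested parties (e.g., routine distributors)
        "operating_margin": operating_profit / revenue,
        # Cost-driven tested parties (e.g., contract manufacturers, service providers)
        "net_cost_plus": operating_profit / (cogs + opex),
        # Asset-intensive tested parties
        "return_on_operating_assets": operating_profit / operating_assets,
        # Pass-through cost structures where the value added sits in OPEX
        "berry_ratio": gross_profit / opex,
    }

# Illustrative distributor: revenue 100, COGS 80, OPEX 15, operating assets 40
print(plis(revenue=100.0, cogs=80.0, opex=15.0, operating_assets=40.0))
# operating_margin 0.05, net_cost_plus ~0.053, return_on_operating_assets 0.125, berry_ratio ~1.33
```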
What it is: Including too many comparables, some of which are poor quality matches, just to have a "big sample." Valuing quantity over quality.
Why it happens: Practitioners believe larger samples are inherently more defensible. Some include marginal comparables to ensure the tested party falls within range—padding the set with low- or high-margin companies to widen it. The "if in doubt, include it" approach leads to bloated sets where comparables aren't truly comparable.
Tax authority reaction: Tax authorities prioritize quality over quantity. Courts have consistently upheld authorities rejecting weak comparables from taxpayer sets—even when this reduces the sample size significantly. The underlying principle: a smaller set of truly comparable companies is more reliable than a larger set with "apples and oranges." The IRS FAQs explicitly encourage stress-testing your set by removing one comparable and seeing what happens—if your result hinges on a single company, the set isn't reliable.
How to fix it:
There's no regulatory "minimum 10" rule. Many jurisdictions accept 3-5 comparables if quality is high. Strive for a sample where you could defend each company individually in front of an auditor; a simple sensitivity check is sketched below.
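The stress test the IRS FAQs describe is easy to script: drop each comparable in turn, recompute the interquartile range, and see how far it moves. A minimal sketch; the margins are made-up illustrations, and numpy's default interpolation is used rather than any jurisdiction-specific quartile method:

```python
import numpy as np

# Illustrative comparable operating margins (%); values are made up.
margins = {"Comp A": 2.1, "Comp B": 3.4, "Comp C": 3.9, "Comp D": 4.6,
           "Comp E": 5.2, "Comp F": 6.0, "Comp G": 11.8}

def iqr(values):
    """Interquartile range using numpy's default interpolation."""
    return np.percentile(values, 25), np.percentile(values, 75)

low, high = iqr(list(margins.values()))
print(f"Full set IQR: {low:.2f}% to {high:.2f}%")

# Leave-one-out: does the range move materially when a single comparable drops out?
for name in margins:
    rest = [v for k, v in margins.items() if k != name]
    lo, hi = iqr(rest)
    print(f"Without {name}: {lo:.2f}% to {hi:.2f}%")
```

If removing a single company shifts the range materially, that company is doing too much work and the set needs strengthening.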
What it is: Failing to perform working capital adjustments when comparables have materially different levels of receivables, payables, or inventory than the tested party. These differences affect profitability—a company carrying large receivables has financing costs that lower reported profit.
Why it happens: Working capital adjustments (WCAs) can be complex and are sometimes viewed as optional. Practitioners skip them to avoid complication or assume the impact is immaterial. In other cases, balance sheet data isn't readily available. Practice is inconsistent: some treat the adjustment as standard, others as "rarely needed."
Tax authority reaction: Working capital differences are a recognized comparability factor. The OECD and many local regulations acknowledge that where payment terms or inventory differ and impact pricing, adjustment can improve comparability. In some audits (particularly in India), authorities may compute WCAs even if the taxpayer did not. If the tested party's working capital is an outlier, auditors will ask why no adjustment was made.
How to fix it:
For step-by-step calculations, see our Working Capital Adjustments Guide.
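For orientation, one common approach (in the spirit of the OECD's illustrative guidance) nudges each comparable's operating margin by an interest rate times the difference between the tested party's and the comparable's working-capital-to-sales ratios. A minimal sketch; the direction of adjustment, the interest rate, and the balance sheet averaging convention all vary in practice and should follow your documented methodology:

```python
def wca_adjusted_margin(comp_operating_profit, comp_sales,
                        comp_receivables, comp_inventory, comp_payables,
                        tp_wc_to_sales, interest_rate):
    """Adjust a comparable's operating margin toward the tested party's
    working capital intensity (one common convention; others exist)."""
    comp_wc_to_sales = (comp_receivables + comp_inventory - comp_payables) / comp_sales
    # Positive when the tested party carries more working capital than the comparable.
    adjustment = interest_rate * (tp_wc_to_sales - comp_wc_to_sales)
    return comp_operating_profit / comp_sales + adjustment

# Illustrative comparable: 3.0% margin, working capital at 12% of sales,
# versus a tested party at 20% of sales and a 4% short-term interest rate.
adjusted = wca_adjusted_margin(comp_operating_profit=3.0, comp_sales=100.0,
                               comp_receivables=10.0, comp_inventory=8.0, comp_payables=6.0,
                               tp_wc_to_sales=0.20, interest_rate=0.04)
print(f"Adjusted operating margin: {adjusted:.4f}")  # 0.0300 + 0.04 * (0.20 - 0.12) = 0.0332
```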
What it is: Relying on old or stale comparables for current-year transfer pricing without proper updates. Reusing the same set for multiple years under the assumption that "nothing has changed."
Why it happens: Benchmarking studies are time-consuming and costly. The OECD notes that some tax administrations may accept refreshing the comparables search every three years, with annual financial data updates in between. But this isn't a universal safe harbor—economic conditions change (recessions, booms, pandemics) and comparables can drift (acquired, diversified, defunct).
Tax authority reaction: Many tax authorities expect annual, contemporaneous documentation. While comparables searches might not be redone every single year if facts are constant, very old studies are viewed with suspicion. The IRS expects analysis reflecting the economic conditions of the tax year in question—especially after events like COVID-19. Tax authorities may conduct their own fresh search and find different results.
Post-COVID Alert: Studies based on pre-2020 data applied to 2020-2022 results are especially vulnerable. Economic volatility means comparability requires year-specific judgment.
How to fix it:
What it is: Conducting the comparables search in a database not appropriate for the transaction's geographic or industry context. Using only foreign comparables when local ones are available and preferred.
Why it happens: Convenience or subscription access—the analyst uses whichever database they have, even if it's not ideal for that region. Many countries have a preference or requirement for local market comparables, and ignoring this creates risk.
Tax authority reaction: In practice, some tax authorities may prefer local comparables where reliable local data exists—especially when market conditions are likely to differ materially. If your scope is misaligned (e.g., using only foreign comparables with no documented attempt to identify local candidates), expect questions and a potential re-run of the search by the auditor.
If you use only European companies for an Indian transaction, the Transfer Pricing Officer (TPO) will likely reject most of them as not reflecting Indian market conditions.
How to fix it:
What it is: Improper inclusion or exclusion of comparables solely based on whether they have losses. This manifests two ways: (a) automatically excluding all loss-makers without analysis, even if losses reflect normal business cycles, or (b) including a loss-maker whose losses stem from extraordinary circumstances unlike the tested party's situation.
Why it happens: Loss-makers are contentious. Some practitioners take a "reject all losses" rule of thumb, believing independents don't persistently sell at losses (not always true). Others include loss-makers to defend a low margin without checking why the loss occurred. Without clear guidance, firms default to convenience or bias.
Tax authority reaction: OECD Guidelines explicitly state that you should not reject a comparable solely because it has losses. Recent case law reinforces this: the Italian Supreme Court (Decision No. 19512, July 2024) ruled that loss-making comparables cannot be excluded just for being in loss—the reasoning must be economic, not automatic. Colombia's Council of State (2024) similarly clarified that a single year of loss doesn't justify exclusion unless losses are recurrent or signal materially different economic conditions.
The Principle: Investigate the circumstances. Ask: Do losses reflect normal business volatility, or abnormal conditions (bankruptcy, restructuring, speculative strategy)? Document your reasoning for each loss-maker.
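A simple way to operationalize this is to count loss years over the period analyzed and route the results to manual review, rather than auto-rejecting. A minimal sketch with made-up figures:

```python
# Operating margins by year for each candidate comparable (illustrative figures).
candidates = {
    "Comp A": [3.1, 2.4, 2.9],     # consistently profitable
    "Comp B": [1.8, -0.6, 2.2],    # single loss year
    "Comp C": [-4.5, -3.8, -5.1],  # recurrent losses
}

for name, yearly_margins in candidates.items():
    loss_years = sum(1 for m in yearly_margins if m < 0)
    if loss_years == 0:
        note = "keep"
    elif loss_years < len(yearly_margins):
        note = "investigate the cause of the loss year before deciding"
    else:
        note = "recurrent losses: review economic circumstances and document the decision"
    print(f"{name}: {loss_years} loss year(s) -> {note}")
```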
How to fix it:
What it is: Inconsistency in using multi-year data—such as averaging multiple years of comparables data while testing the tested party on a single-year result, or cherry-picking different years for different companies instead of applying one approach throughout.
Why it happens: Practitioners automatically average three years of comparables data without realizing some tax authorities require year-by-year analysis. Others average because it makes the tested party look better (diluting one bad year). Inconsistencies arise when data availability varies—instead of dropping comparables, analysts average whatever is available for each.
Tax authority reaction: Consistency is universally expected. The Canada Revenue Agency explicitly states that arm's length pricing should be established year by year, not by averaging multiple years. If you mix single-year and multi-year data, auditors will flag it as a methodological flaw. If your analysis masks a non-compliant year through averaging, the IRS will still target that year.
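Whichever convention your jurisdiction expects (year-by-year testing or a multi-year average), apply it identically to every comparable and to the tested party. A minimal sketch of a weighted average over the same fiscal years (total operating profit over total sales); all figures are illustrative:

```python
# Operating profit and sales by year (illustrative). Use the same fiscal years
# for every comparable and for the tested party; drop a comparable rather than
# averaging whatever years happen to be available for it.
financials = {
    "Tested party": {"op": [4.0, 2.5, 3.5], "sales": [100.0, 95.0, 110.0]},
    "Comp A":       {"op": [5.1, 4.8, 5.5], "sales": [120.0, 118.0, 130.0]},
    "Comp B":       {"op": [2.0, 1.2, 2.6], "sales": [60.0, 58.0, 65.0]},
}

for name, f in financials.items():
    # Weighted average: total operating profit over total sales for the same years.
    weighted_margin = sum(f["op"]) / sum(f["sales"])
    print(f"{name}: weighted average operating margin {weighted_margin:.2%}")
```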
How to fix it:
What it is: Providing a superficial functional analysis that doesn't adequately distinguish the tested party's profile or support the choice of comparables and method. Generic sentences like "Company A is a distributor that performs marketing and sales functions" that could describe any distributor anywhere.
Why it happens: Some view functional analysis as compliance formality and reuse boilerplate. If the preparer doesn't engage with operational teams, they can't gather nuanced information. Time constraints lead to cutting corners. Template-based or AI-generated outputs can be generic.
Tax authority reaction: Auditors place huge importance on functional analysis—it's often the first thing they read. The IRS has criticized taxpayers who present just a "functional analysis checklist" without connecting it to the transfer pricing method. HMRC notes common issues include analysis being too high-level or one-sided (not analyzing both sides of the transaction). A shallow functional analysis undermines confidence in the entire study.
How to fix it:
The Test: Read your functional analysis aloud. If it could apply to any company in your industry, it's too generic. Add specific facts about this tested party.
What it is: Failing to include a complete accept/reject summary—or including one that is missing key elements. The matrix might omit reasons for rejection, not list all companies considered, or lack important fields (turnover, related-party transaction percentage).
Why it happens: Preparing a comprehensive matrix is tedious. Practitioners assume a narrative description suffices, or fear listing every rejected company gives auditors a roadmap to second-guess each exclusion. Sometimes it's oversight—the team screened internally but didn't document every step in the report.
Tax authority reaction: An incomplete matrix raises questions: What are you hiding? Did you apply criteria consistently? Some jurisdictions have specific requirements—for example, certain Israeli regimes (such as local R&D/cost-plus frameworks under ITA circulars) require attaching a full TP study including an accept/reject matrix to the annual return. If only aggregate stats are given ("we excluded 10 companies due to losses"), auditors will ask you to name them and prove it.
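A quick completeness check on the matrix before filing catches missing fields and undocumented rejections. A minimal sketch; the required field list and figures are illustrative, not a statutory minimum:

```python
import pandas as pd

# Illustrative accept/reject matrix loaded from your workpapers.
matrix = pd.DataFrame([
    {"company": "Alpha GmbH", "decision": "accept", "reason": "Passed all screens",
     "turnover_eur_m": 48.2, "related_party_pct": 0.0},
    {"company": "Beta SpA", "decision": "reject", "reason": None,
     "turnover_eur_m": 112.0, "related_party_pct": 41.0},
])

# Illustrative required fields; adjust to local documentation rules.
required_fields = {"company", "decision", "reason", "turnover_eur_m", "related_party_pct"}

missing_fields = required_fields - set(matrix.columns)
unreasoned_rejections = matrix[(matrix["decision"] == "reject") & (matrix["reason"].isna())]

print("Missing fields:", missing_fields or "none")
print("Rejections without a documented reason:", len(unreasoned_rejections))
```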
How to fix it:
What it is: Treating companies as “independent” when they have material related-party links, related-party transactions, or are part of a group that influences pricing. This can happen through superficial ownership screens, overreliance on a single database flag, or not validating independence during manual review.
Why it happens: Independence filters are easy to apply mechanically, but hard to apply correctly. Data can be incomplete, and “independence” indicators differ by database. Time pressure pushes teams to accept database labels at face value.
Tax authority reaction: If a comparable is shown to be non-independent, it can be removed—sometimes along with other companies screened using the same weak logic. That undermines the set and your screening credibility.
How to fix it:
For a systematic approach to independence screening, see our Quantitative Screening Filters guide.
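Beyond the database's own independence indicator, a second-pass ownership check during manual review catches mislabeled companies. A minimal sketch; the 25% threshold and the data fields are illustrative, since indicators and thresholds differ by database and jurisdiction:

```python
# Shareholder data gathered during manual review (illustrative figures).
candidates = {
    "Comp A": {"db_flag": "independent", "max_corporate_shareholder_pct": 10.0},
    "Comp B": {"db_flag": "independent", "max_corporate_shareholder_pct": 60.0},
    "Comp C": {"db_flag": "unknown",     "max_corporate_shareholder_pct": None},
}

OWNERSHIP_THRESHOLD = 25.0  # illustrative screening threshold, not a universal rule

for name, data in candidates.items():
    pct = data["max_corporate_shareholder_pct"]
    if pct is None:
        verdict = "ownership data missing: verify from filings before accepting"
    elif pct > OWNERSHIP_THRESHOLD:
        verdict = "possible group member: do not rely on the database flag alone"
    else:
        verdict = "passes the ownership screen: still check related-party disclosures in the notes"
    print(f"{name} (database flag: {data['db_flag']}): {verdict}")
```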
What it is: Applying quantitative filters (industry codes, size, profitability, data availability, independence) in a way that creates arbitrary exclusions, inconsistent denominators, or a biased candidate pool. Common symptoms include overly tight size filters, profitability screens applied too early, or inconsistent data availability requirements.
Why it happens: Filters are often copied from prior studies without checking whether they fit the tested party and data universe. Teams also confuse “efficient” with “defensible,” optimizing for speed rather than a transparent, reproducible funnel.
Tax authority reaction: If your quantitative screen looks arbitrary—or if it appears designed to force an outcome—auditors will rerun it with different thresholds and argue your set is unreliable.
How to fix it:
For recommended filter sequencing and common pitfalls, see our Quantitative Screening Filters guide.
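A reproducible funnel applies the filters in a fixed, documented order and records the surviving count after each step, so an auditor can retrace the screen. A minimal sketch with an illustrative candidate pool and thresholds:

```python
import pandas as pd

# Illustrative candidate pool; in practice this comes from the database export.
pool = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E", "F"],
    "nace_match": [True, True, True, False, True, True],
    "turnover_eur_m": [5.0, 40.0, 120.0, 30.0, 0.8, 60.0],
    "years_of_data": [3, 3, 2, 3, 3, 3],
    "independent": [True, True, True, True, False, True],
})

# Apply filters in a fixed, documented order and record the count after each step.
steps = [
    ("Industry code match", pool["nace_match"]),
    ("Turnover 1-100 EUR m", pool["turnover_eur_m"].between(1, 100)),
    ("3 years of data available", pool["years_of_data"] >= 3),
    ("Independence indicator", pool["independent"]),
]

mask = pd.Series(True, index=pool.index)
print(f"Candidate pool: {len(pool)}")
for label, condition in steps:
    mask &= condition
    print(f"After '{label}': {int(mask.sum())} remaining")
print("Survivors:", ", ".join(pool.loc[mask, "company"]))
```

Recording the count after each filter also makes it obvious when a single threshold is doing most of the exclusion, which is exactly where an auditor will probe.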
What it is: Selecting TNMM (or another method) by default without documenting why it’s the most appropriate method given the accurately delineated transaction, available data, and the reliability of internal vs external comparables.
Why it happens: TNMM is often practical with public data, so teams treat it as the default. But “practical” isn’t the same as “most appropriate,” and tax authorities expect to see the reasoning chain.
Tax authority reaction: If the method choice is weak, authorities may recharacterize the analysis (e.g., different method, different tested party, different PLI), which can cascade into a different outcome.
How to fix it:
If you need a full end-to-end benchmark workflow, see our Benchmarking Study Guide and Tested Party Selection Guide.
What it is: Recognizing material differences (capacity utilization, accounting classification, risk profile, or other structural differences) but either not adjusting—or adjusting inconsistently without explaining the impact.
Why it happens: Beyond working capital adjustments, many adjustments require judgment and extra analysis, so teams either skip them or apply them selectively. Sometimes the issue is documentation: the team did consider adjustments, but never wrote down why they were immaterial.
Tax authority reaction: If differences are obvious and unaddressed, auditors can argue your set isn’t comparable, remove companies, or substitute their own adjustments.
How to fix it:
For common adjustments beyond WCA, see our Comparability Adjustments Guide.
What it is: A disconnect between the narrative (transaction description, functional analysis, risk allocation) and the benchmarking mechanics (tested party, segmentation, PLI definition, financials used). For example: the report describes a “routine, low-risk distributor,” but the financial segmentation includes non-routine revenue streams or includes/excludes costs inconsistently with the chosen PLI.
Why it happens: Different teams own different parts of the file. Functional analysis is written from interviews, the benchmark is built from database work, and financial segmentation is done separately—without one person reconciling the full story end-to-end.
Tax authority reaction: Inconsistencies are easy audit wins. They trigger follow-up questions, broader information requests, and reduce trust in the whole analysis—even if the numbers are otherwise reasonable.
How to fix it:
For how benchmarking integrates into the Local File narrative, see Local File Best Practices.
Before finalizing a benchmarking study, run through a pre-submission checklist built from the mistakes above to catch weaknesses before an auditor does. Understanding what tax authorities scrutinize helps you preempt problems.
Insufficient documentation of the comparables selection process—essentially, poor or missing support for why certain comparables were chosen and others rejected. Tax authorities consistently complain about documentation that shows final outcomes without the trail. When rationale isn't clear, it raises suspicions of cherry-picking. This is the most prevalent issue and, fortunately, the easiest to fix: document everything in an accept/reject matrix.
Annually at minimum, even if that means just updating comparables' financials and confirming nothing material changed. The OECD notes that some tax administrations may accept refreshing the search every three years with annual financial updates—but this isn't a universal safe harbor, and many authorities expect annual contemporaneous documentation. Given recent volatility (COVID, supply chains, inflation), older studies may not reflect current arm's length conditions. Stay updated to control your narrative.
Common triggers include: (1) consistent losses or low profits without clear reason, (2) significant year-on-year profit swings after reorganization, (3) large related-party transactions relative to overall business, (4) payments to low-tax jurisdictions, (5) industries or transaction types under regulatory focus (pharma, tech, intangibles), and (6) poor documentation quality from prior years—which signals higher risk.
You can reuse elements, but don't copy-paste without updating. At minimum, update financials each year and check if comparables are still valid (not acquired, diversified, or defunct). If nothing significant changed, you might carry the same set for 2-3 years—but document that you reviewed the set and determined it is still appropriate. Be cautious: if audited in Year 3, authorities will examine whether Year 3 conditions still match those of Year 1.
Small sets aren't automatically rejected—quality matters more than quantity. Many jurisdictions accept 3-5 comparables if they're highly comparable. Document thoroughly why more couldn't be found and whether additional adjustments are needed. If the set is genuinely robust, defend it confidently. However, if a single comparable drives your outcome, that's a vulnerability—consider whether you've applied filters too aggressively.
Signs of a solid set: (1) they truly mirror the tested transaction's functional and risk profile, (2) no single comparable drives the outcome—removing one doesn't drastically change the range, (3) quality over quantity—each is defensible on its own, (4) consistent data across all (same years, same PLI denominator definitions), (5) no glaring differences unaddressed (different business models, persistent losses, different country). If defending any single comparable makes you uncomfortable, it probably doesn't belong.
A complete local file analysis, including: (1) clear description of controlled transactions, (2) detailed functional analysis (FAR), (3) explanation of method chosen and why appropriate, (4) selection of comparables with search criteria and accept/reject summary, (5) comparables data with company descriptions and financials, (6) any comparability adjustments made, (7) arm's length range and how tested party compares, (8) conclusion on arm's length compliance. Documentation should allow an auditor to reproduce and understand your analysis step by step.
The OECD Transfer Pricing Guidelines provide comprehensive guidance on avoiding these benchmarking mistakes.