Borys Ulanenko
CEO of ArmsLength AI

Transfer pricing documentation requires specific citations to the OECD Guidelines. Tax authorities expect paragraph numbers. Audit defense often hinges on exact quotes.
When you ask ChatGPT about DEMPE functions or safe harbours, it gives you a reasonable answer. When you ask for the exact paragraph reference and text, it gives you something that looks like a citation:
"According to OECD Transfer Pricing Guidelines ¶1.6, the arm's length principle requires that conditions of transactions between associated enterprises be consistent with those between independent enterprises..."
The paragraph number exists. The quote sounds right. But when you check the actual ¶1.6, it says something completely different.
We built an OECD Guidelines API because we suspected this was happening. This benchmark confirms it and measures how severe the problem is across different AI configurations.
Before we present the findings, it's worth acknowledging progress. The default ChatGPT experience today (GPT-5.2 with web search enabled) is considerably more reliable than it was a year ago. When it has access to web search, it can often find and cite OECD text correctly.
Our benchmark shows ChatGPT with web search achieving 83% overall accuracy. That's not perfect, but it's a real improvement over earlier models.
The problem is that many professional applications don't use ChatGPT with web search. Internal tools, agents, and automations typically run on GPT-4.1 or similar models without internet access. These models rely entirely on training data, and that's where citation accuracy breaks down.
We ran 10 transfer pricing questions through three configurations:
| Configuration | Model | Tools | Use Case |
|---|---|---|---|
| Baseline | GPT-4.1 | None | Internal tools, agents, automations |
| Web Search | GPT-5.2 | Web Search | Default ChatGPT experience |
| OECD API | GPT-5.2 | OECD Guidelines API | RAG-grounded citation |
The questions covered DEMPE functions, safe harbours, method selection, intangibles, and specific paragraph lookups. We verified every citation against the actual OECD Transfer Pricing Guidelines 2022 text.
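To make the setup concrete, here is a minimal sketch of how a single baseline run could look. The model name, prompt, and client wiring are illustrative, not the benchmark's actual harness; the point is simply that the baseline configuration gets the question and nothing else, with no tools or retrieved text.

```python
# Minimal sketch of one baseline run: one question, no tools, no web access.
# Model name and prompt are illustrative, not the benchmark's actual harness.
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "Quote the exact text of OECD Transfer Pricing Guidelines 2022 "
    "paragraph 1.6 on the arm's length principle."
)

def run_baseline(question: str) -> str:
    """Ask the model with no retrieval; it must answer from training data alone."""
    response = client.chat.completions.create(
        model="gpt-4.1",          # baseline configuration from the table above
        messages=[{"role": "user", "content": question}],
        temperature=0,            # deterministic output so runs are comparable
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(run_baseline(QUESTION))
```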
The OECD API configuration achieved 99% accuracy. ChatGPT with web search scored 83%. The baseline model without web access scored 28%.
| Configuration | Citation accuracy | Quote accuracy | Gap |
|---|---|---|---|
| OECD API | 100% | 99% | 1pp |
| Web Search | 97% | 66% | 31pp |
| Baseline | 58% | 9% | 49pp |
ChatGPT with web search performed well on most questions. It found indexed OECD text on third-party sites and provided verbatim quotes with correct paragraph references.
However, results varied by question. On some queries, it refused to provide full quotes, citing copyright concerns. On others, it gave only short fragments that weren't useful for documentation.
A tool that works well most of the time but refuses unpredictably isn't reliable enough when you need to cite a specific paragraph in a tax authority filing.
Models without web access (GPT-4.1 and similar) performed poorly. We expected some paraphrasing. What we found was outright fabrication.
| Failure Type | Description | Count |
|---|---|---|
| Fabricated quotes | Quote doesn't exist in cited paragraph | 4 |
| Wrong paragraph | Correct concept, wrong paragraph number | 2 |
| Invented references | Paragraph doesn't mention the claimed content | 2 |
| Total failures | | 8/10 |
Example: ALP Definition (Q03)
We asked for the exact text of ¶1.6.
The model claimed ¶1.6 says:
"Under the arm's length principle, the conditions of transactions between associated enterprises should not differ from those which would be made between independent enterprises..."
Actual ¶1.6 text:
"The authoritative statement of the arm's length principle is found in paragraph 1 of Article 9 of the OECD Model Tax Convention..."
The model didn't paraphrase. It fabricated a quote that sounds plausible but doesn't exist in that paragraph. A professional citing this in documentation would be providing incorrect information.
Example: Interquartile Range (Q08)
The model claimed ¶3.55 discusses "statistical tools, such as the interquartile range" and ¶3.56 mentions "a statistical range (e.g. the interquartile range)."
In reality, only ¶3.57 mentions interquartile range in the entire Guidelines. The model invented IQR references in adjacent paragraphs.
Models without web access understand transfer pricing concepts well. They can explain DEMPE, discuss method selection, and describe why R&D performers might deserve more than cost-plus compensation. But they don't have the actual OECD Guidelines text, so they reconstruct what they think paragraphs say. These reconstructions are plausible but often wrong.
The API configuration achieved 99% accuracy by retrieving actual paragraph text before responding. It also found paragraphs that other configurations missed.
For the R&D/IP question, the API cited ¶6.79:
"Compensation based on a reimbursement of costs plus a modest mark-up will not reflect the anticipated value of, or the arm's length price for, the contributions of the research team in all cases."
Neither the baseline model nor web search cited this paragraph. They discussed the concepts but missed the specific authoritative text.
The OECD API provides three tools to the model:
- `oecd_search`: semantic and keyword search across all paragraphs
- `oecd_get_paragraphs`: retrieve specific paragraphs by reference
- `oecd_context_pack`: get curated bundles for a topic

The model searches for concepts, retrieves actual paragraph text, and responds with verifiable citations. Every quote comes from the authoritative source.
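As a rough sketch, the three tools could be exposed to the model through standard function calling. Only the tool names come from the article; the parameter schemas and descriptions below are assumptions for illustration.

```python
# Hypothetical tool definitions exposing the OECD API to a function-calling model.
# Tool names are from the article; parameter schemas are assumptions.
OECD_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "oecd_search",
            "description": "Semantic and keyword search across all OECD TPG paragraphs.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "oecd_get_paragraphs",
            "description": "Retrieve specific paragraphs by reference, e.g. '1.6' or '6.79'.",
            "parameters": {
                "type": "object",
                "properties": {
                    "references": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["references"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "oecd_context_pack",
            "description": "Get a curated bundle of paragraphs for a topic such as 'DEMPE'.",
            "parameters": {
                "type": "object",
                "properties": {"topic": {"type": "string"}},
                "required": ["topic"],
            },
        },
    },
]
```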
| Configuration | Avg Time | Input Tokens | Notes |
|---|---|---|---|
| Baseline | 15s | ~40 | Fast but unreliable |
| OECD API | 65s | ~15,000 | Best accuracy per token |
| Web Search | 122s | ~45,000 | Slowest, high token usage |
The API approach uses 67% fewer tokens than web search while achieving higher accuracy.
If you use ChatGPT for TP research: Web search mode (the default) is reasonably reliable for finding OECD guidance. But verify quotes before using them in documentation. The model sometimes refuses to quote or provides only fragments.
If you build TP tools: Models without web access fabricate citations. If your application needs to cite OECD paragraphs, you need retrieval augmentation that preserves paragraph structure. Standard RAG that chunks documents arbitrarily won't let you cite "¶1.6" because chunks don't align with official paragraph boundaries.
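A minimal sketch of what paragraph-aligned chunking means in practice is shown below. It assumes you have the Guidelines as plain text in which each paragraph starts with its official number (e.g. "1.6 The authoritative statement..."); the regex and data shape are illustrative, but the idea is that each chunk keeps its official reference so the model can cite "¶1.6" directly.

```python
# Sketch of paragraph-aligned chunking. Assumes plain text where each OECD
# paragraph begins with its number; regex and data shape are illustrative.
import re

PARA_START = re.compile(r"^(\d+\.\d+)\s", re.MULTILINE)

def split_by_official_paragraphs(text: str) -> dict[str, str]:
    """Return {paragraph_ref: paragraph_text}, preserving official OECD numbering."""
    chunks: dict[str, str] = {}
    matches = list(PARA_START.finditer(text))
    for i, m in enumerate(matches):
        ref = m.group(1)                                        # e.g. "1.6"
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks[ref] = text[m.start():end].strip()
    return chunks

# Because chunk boundaries coincide with official paragraph boundaries,
# every retrieved chunk can be cited as "¶1.6", "¶6.79", and so on.
```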
If you audit TP documentation: Be aware that AI-generated OECD citations may not match the source. Consider requesting page numbers from the official PDF or evidence that citations have been verified.
We tested 10 questions across four categories:
| Category | What We Tested |
|---|---|
| Direct Citation | Can the model cite specific paragraphs accurately? |
| Trap Questions | Does the model invent guidance on topics not covered? |
| Technical Interpretation | Can the model synthesize method selection guidance? |
| Multi-Step Reasoning | Can the model cite multiple relevant sections? |
For each response, we extracted paragraph references, retrieved actual text via the OECD API, and compared claimed quotes against the source. Responses were scored as accurate (exact match), partial (paraphrase), or fabricated (no match).
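The verification step can be sketched roughly as follows. The reference-extraction regex, the word-overlap rule, and the 0.6 threshold are illustrative assumptions, not the benchmark's actual scoring criteria, and the real comparison against retrieved OECD text may well have involved manual review.

```python
# Rough sketch of citation verification. The regex, overlap rule, and threshold
# are illustrative assumptions, not the benchmark's actual scoring criteria.
import re

def extract_refs(answer: str) -> list[str]:
    """Pull paragraph references like '1.6' or '¶6.79' out of a model answer."""
    return re.findall(r"¶?\s*(\d+\.\d+)", answer)

def score_quote(claimed_quote: str, actual_text: str) -> str:
    """Classify a claimed quote against the retrieved paragraph text."""
    if claimed_quote.lower() in actual_text.lower():
        return "accurate"                           # verbatim match
    quote_words = set(claimed_quote.lower().split())
    para_words = set(actual_text.lower().split())
    overlap = len(quote_words & para_words) / max(len(quote_words), 1)
    return "partial" if overlap > 0.6 else "fabricated"
```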
The OECD Guidelines API used in this benchmark is publicly available. You can request an API key and integrate verified OECD citations into your own tools and workflows.