OpenAI o3 vs Fintool Chat

Testing how OpenAI o3's advanced reasoning coupled with internet search performs against specialized financial AI infrastructure when analyzing complex equity research questions from FinanceBench.

Nicolas Bustamante

Jun 3, 2025 • 8 min read

When OpenAI released o3 on April 16, 2025, it promised a new era of AI reasoning. The model introduces a “private chain of thought” mechanism that lets it plan and execute intermediate reasoning steps before generating a response. With benchmark results such as 96.7% accuracy on AIME 2024, a demonstration of strong mathematical reasoning, o3 appeared to be the breakthrough needed for complex analytical tasks.

But how does this latest model perform when it comes to the precision-critical world of financial analysis? We put OpenAI o3 head-to-head with Fintool Chat across four complex financial questions from FinanceBench, the industry's leading benchmark for evaluating LLMs on equity research tasks.

Financial analysis demands more than just sophisticated reasoning—it requires absolute accuracy, verifiable sources, and audit-ready citations. A single incorrect number can cascade into flawed investment decisions worth millions. This is where the rubber meets the road for AI models claiming to handle professional financial work.

Our test questions span the core challenges of equity research: calculating capital expenditures from cash flow statements, computing depreciation margins, deriving EBITDA figures, and analyzing inventory turnover ratios. Each question requires parsing complex 10-K filings, extracting precise numerical data, and performing accurate calculations—exactly the kind of work that separates reliable financial AI from expensive hallucination machines.

Side-by-Side Response Comparison

Question 1: What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.

From FinanceBench, the leading industry benchmark for evaluating LLMs on Equity Research questions

OpenAI o3 Response

Inaccurate

Slow

Thought for 1m 14s

Capital expenditures in FY 2018: $1,749 million (≈ $1.75 billion) cash outflow, shown on the cash-flow statement under “Purchases of property, plant and equipment.”
stockanalysis.com

Issues Identified

Incorrect data: $1,749 million instead of $1,577 million (10.9% error)
Unreliable source: stockanalysis.com (retail website, not an official SEC filing)
Cannot audit the numbers without proper citations
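
The error percentages quoted in these comparisons can be reproduced as a relative error measured against the 10-K figure. The snippet below is a minimal sketch of that arithmetic; `relative_error_pct` is a hypothetical helper name used only for illustration.

```python
def relative_error_pct(reported: float, actual: float) -> float:
    """Relative error of a reported figure, as a percentage of the actual value."""
    return abs(reported - actual) / abs(actual) * 100


# 3M FY2018 capital expenditure in USD millions:
# o3's answer versus the figure cited from the 10-K.
error = relative_error_pct(reported=1749, actual=1577)
print(f"{error:.1f}% error")  # 10.9% error
```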

Fintool Response

Accurate

Fast

Thought for 12 sec

Capital expenditures for FY 2018 were $1,577 million, as reported in the Consolidated Statements of Cash Flows under Purchases of property, plant and equipment.
10-K FY 2018

Accuracy Verified

Right number: $1,577 million (correct amount)
Auditable source: Official SEC 10-K filing
Faster by: 84% (12 sec vs 1m 14s)

Question 2: Answer the following question as if you are an equity research analyst and have lost internet connection so you do not have access to financial metric providers. According to the details clearly outlined within the P&L statement and the statement of cash flows, what is the FY2015 depreciation and amortization (D&A from cash flow statement) % margin for AMD?

OpenAI o3 Response

Inaccurate

Slow

Thought for 11 seconds

Based on AMD's FY2015 10-K: Depreciation & amortization (cash-flow add-back): ≈ $243 million. Revenue: ≈ $3.99 billion. D&A margin = $243m/$3990m ≈ 6.1%. So the FY2015 D&A margin is roughly 6%.
No citation

Issues Identified

Incorrect D&A: $243 million instead of $167 million (45.5% error)
Wrong margin calculation: 6.1% instead of 4.2% (45% error)
No source citations to verify the hallucinated numbers

Fintool Response

Accurate

Fast

Thought for 8 seconds

FY2015 depreciation and amortization (D&A) % margin for AMD: D&A for FY2015 (from cash flow statement): $167 million. Revenue for FY2015: $3,991 million. D&A % margin = ($167 million / $3,991 million) * 100 = 4.2%
10-K FY 2015

Accuracy Verified

Correct D&A: $167 million (accurate from 10-K)
Right margin: 4.2% (verified calculation)
Faster by: 27% (8 sec vs 11 sec)
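
Both margins above come from the same formula, D&A divided by revenue, so the gap between them comes down to which D&A figure goes in. As a rough check, assuming the inputs quoted in the two responses (USD millions) and a hypothetical helper `margin_pct`:

```python
def margin_pct(line_item: float, revenue: float) -> float:
    """Express a line item as a percentage of revenue."""
    return line_item / revenue * 100


# AMD FY2015, USD millions, as quoted in each response above.
fintool_margin = margin_pct(line_item=167, revenue=3991)
o3_margin = margin_pct(line_item=243, revenue=3990)

print(f"Fintool inputs: {fintool_margin:.1f}%")  # Fintool inputs: 4.2%
print(f"o3 inputs: {o3_margin:.1f}%")            # o3 inputs: 6.1%
```

The division itself is trivial; the error enters through the D&A input, which is why a source-level citation matters more than the calculation.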

Question 3: Basing your judgments off of the cash flow statement and the income statement, what is American Water Works's FY2021 unadjusted operating income + depreciation and amortization from the cash flow statement (unadjusted EBITDA) in USD millions?

OpenAI o3 Response

Inaccurate

Slow

Thought for 1m 23s

Component: USD millions (source line item)

Operating income: 1,350 (income statement, “Operating Income”)

Depreciation & amortization: 649 (cash-flow statement, “Depreciation & Amortization”)

Unadjusted EBITDA = $1,350 + $649 = $1,999 million (about $2.0 billion)

stockanalysis.com

Issues Identified

Wrong operating income: $1,350 million instead of $1,196 million (12.9% error)
Incorrect D&A: $649 million instead of $636 million (2.0% error)
Wrong EBITDA: $1,999 million instead of $1,832 million (9.1% error)
Unreliable source: stockanalysis.com (not official SEC filing)

Fintool Response

Accurate

Fast

Thought for 11 sec

To calculate American Water Works's FY2021 unadjusted EBITDA:

Operating Income (FY2021): $1,196 million (10-K)

Depreciation and Amortization (FY2021): $636 million (10-K)

Calculation: EBITDA = Operating Income + Depreciation and Amortization

EBITDA (FY2021) = $1,196 million + $636 million = $1,832 million (10-K)

Accuracy Verified

Correct operating income: $1,196 million (accurate from 10-K)
Right D&A: $636 million (verified from cash flow)
Accurate EBITDA: $1,832 million (correct calculation)
Faster by: 87% (11 sec vs 1m 23s)
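
Because unadjusted EBITDA is a plain sum, any gap in a component carries straight through to the total. The sketch below, using only the figures quoted in the two responses (USD millions) and illustrative variable names, shows how the two component gaps add up to the gap in the final number.

```python
# American Water Works FY2021, USD millions.
filing = {"operating_income": 1196, "d_and_a": 636}  # figures cited from the 10-K
o3 = {"operating_income": 1350, "d_and_a": 649}      # figures in o3's answer

ebitda_filing = sum(filing.values())
ebitda_o3 = sum(o3.values())

# Gap per component, and the resulting gap in the total.
component_gaps = {name: o3[name] - filing[name] for name in filing}
total_gap = ebitda_o3 - ebitda_filing

print(ebitda_filing)   # 1832
print(ebitda_o3)       # 1999
print(component_gaps)  # {'operating_income': 154, 'd_and_a': 13}
print(total_gap)       # 167
```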

Question 4: Roughly how many times has JnJ sold its inventory in FY2022? Calculate inventory turnover ratio for FY2022

OpenAI o3 Response

Inaccurate

Slow

Thought for 55 seconds

Inventory turnover (FY 2022)

Item: 2022 figure (USD bn)

Cost of goods sold (COGS): 24.60

Ending inventory: 10.27

2021 ending inventory: 10.39 (≈1.15 % above 2022)

Average inventory 2022 ≈ (10.27 + 10.39)/2 ≈ 10.33 bn

Inventory-turnover ratio 2022: COGS ÷ Average inventory ≈ 24.60 / 10.33 ≈ 2.4×

That means J&J “sold through” its entire stock roughly 2 to 3 times during the year—about one turn every 153 days.

Macrotrends

Issues Identified

Wrong COGS: $24.60 billion instead of $31.089 billion (21% error)
Incorrect ending inventory: $10.27 billion instead of $12.483 billion (the prior-year figure, $10.39 billion, matches the 10-K's $10.387 billion)
Wrong inventory turnover: 2.4× instead of 2.72× (11.8% error)
Unreliable source: Macrotrends (not official SEC filing)

Fintool Response

Accurate

Fast

Thought for 14 sec

To calculate the inventory turnover ratio for Johnson & Johnson (JNJ) in FY2022:

Formula: inventory_turnover_ratio = cost_of_goods_sold / average_inventory

Step 1: Find Cost of Goods Sold (COGS) for FY2022:

COGS (FY2022) = $31,089 million (10-K)

Step 2: Find Average Inventory for FY2022:

Inventory at start of FY2022 (Jan 2, 2022): $10,387 million (10-K)

Inventory at end of FY2022 (Jan 1, 2023): $12,483 million (10-K)

Average Inventory = (10,387 + 12,483) / 2 = $11,435 million

Step 3: Calculate Inventory Turnover Ratio:

inventory_turnover_ratio = 31,089 / 11,435 ≈ 2.72

Accuracy Verified

Correct COGS: $31,089 million (accurate from 10-K)
Right inventory figures: $10,387M & $12,483M (verified from 10-K)
Accurate inventory turnover: 2.72× (correct calculation)
Faster by: 75% (14 sec vs 55 sec)
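
The turnover formula used in both responses can be written as a small function. This is a minimal sketch, assuming the average-inventory convention shown above (opening plus closing inventory, divided by two); the function name is illustrative.

```python
def inventory_turnover(cogs: float, opening_inventory: float, closing_inventory: float) -> float:
    """Inventory turnover = COGS / average of opening and closing inventory."""
    average_inventory = (opening_inventory + closing_inventory) / 2
    return cogs / average_inventory


# Johnson & Johnson FY2022, USD millions, figures cited from the 10-K.
from_filing = inventory_turnover(31_089, 10_387, 12_483)

# Same formula with the USD-billion inputs from o3's answer.
from_o3_inputs = inventory_turnover(24.60, 10.39, 10.27)

print(f"{from_filing:.2f}x")     # 2.72x
print(f"{from_o3_inputs:.1f}x")  # 2.4x
```

Both systems apply the same formula correctly; the results diverge only because the inputs differ.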

Accuracy Trumps Sophistication

Our four-question comparison reveals important insights about AI in financial analysis. OpenAI o3 demonstrates impressive capabilities and sophisticated reasoning, and it is one of the models we leverage within Fintool's infrastructure, but raw model sophistication alone isn't sufficient for accurate equity research.

The key difference lies not in the underlying AI model, but in the complete infrastructure designed around it. Effective financial AI requires purpose-built data pipelines, verified SEC filing integration, specialized financial context, and audit-ready citation systems. Even the most advanced language models need this foundation to deliver the precision that equity research demands.
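
To make "audit-ready citation" concrete, here is one way a figure could be forced to travel with its source. This is a hypothetical sketch, not Fintool's actual implementation: the `CitedFigure` type and its fields are assumptions made for illustration, populated with the 3M capital expenditure figure from Question 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedFigure:
    """A number that cannot exist without its filing reference (hypothetical type)."""
    label: str
    value_usd_m: float
    filing: str      # e.g. "10-K FY 2018"
    statement: str   # e.g. "Consolidated Statements of Cash Flows"
    line_item: str   # the caption of the row the number was read from

    def __post_init__(self) -> None:
        # Reject any figure that cannot be traced back to a filing line item.
        for field_name in ("filing", "statement", "line_item"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{self.label}: missing {field_name}")

    def cite(self) -> str:
        """Render the figure together with everything needed to audit it."""
        return (
            f"{self.label}: ${self.value_usd_m:,.0f} million "
            f'({self.filing}, {self.statement}, "{self.line_item}")'
        )


capex = CitedFigure(
    label="3M FY2018 capital expenditures",
    value_usd_m=1577,
    filing="10-K FY 2018",
    statement="Consolidated Statements of Cash Flows",
    line_item="Purchases of property, plant and equipment",
)
print(capex.cite())
```

The design choice in this sketch is that validation happens at construction time, so an uncited number fails loudly instead of reaching the final answer.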

OpenAI o3 Performance

0/4 questions answered correctly
Average error on final answers: roughly 19%
No verifiable source citations
1.4-7.5x slower response times
Relies on unreliable web sources

Fintool Performance

4/4 questions answered correctly
Zero calculation errors
Official SEC filing citations
1.4-7.5x faster responses
Audit-ready documentation
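
The speed comparisons can be recomputed from the "Thought for" durations shown in each response. The sketch below parses those duration strings and prints, per question, the share of time saved and the speed ratio; the helper names are illustrative.

```python
import re


def parse_seconds(text: str) -> int:
    """Convert a duration such as '1m 14s', '12 sec' or '55 seconds' to seconds."""
    minutes = re.search(r"(\d+)\s*m\b", text)
    seconds = re.search(r"(\d+)\s*s", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


def percent_faster(slow_s: int, fast_s: int) -> float:
    """Share of the slower response time that the faster system saves."""
    return (slow_s - fast_s) / slow_s * 100


# (o3, Fintool) "Thought for" durations as shown in each response.
timings = {
    "Q1 3M capex": ("1m 14s", "12 sec"),
    "Q2 AMD D&A margin": ("11 seconds", "8 seconds"),
    "Q3 American Water Works EBITDA": ("1m 23s", "11 sec"),
    "Q4 J&J inventory turnover": ("55 seconds", "14 sec"),
}

for question, (o3, fintool) in timings.items():
    slow, fast = parse_seconds(o3), parse_seconds(fintool)
    print(f"{question}: {percent_faster(slow, fast):.0f}% faster, {slow / fast:.1f}x")
```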

The implications extend far beyond academic benchmarks. In professional finance, where a single miscalculated ratio can swing investment decisions worth millions, infrastructure and verifiability matter as much as reasoning capability. While OpenAI o3's advanced architecture provides powerful reasoning foundations, it requires specialized financial infrastructure to channel that power effectively for equity research applications.

This is why Fintool combines advanced AI models like o3 with purpose-built financial infrastructure—verified data sources, specialized financial context, comprehensive SEC filing integration, and audit-ready citation systems. The result is both the sophistication of cutting-edge AI and the reliability that professional finance demands.

Ready to Experience Accurate Financial AI?

See the difference that purpose-built financial AI makes. Get precise, citation-backed answers to your equity research questions in seconds, not minutes.
