12 May 2026·7 min read

Why Claude is the default AI across every SourceForge product

The reasoning behind making Anthropic Claude the default text model across every SourceForge product — and the one exception we make for media.

Indrajit Banerjee
Founder, SourceForge

When we standardised our AI stack two years ago, the obvious question was which model family to default to. We picked Claude, and we wrote a lint rule that fails CI on any call to a different text provider. This is the reasoning.

One model family, one brand voice

Most of what our products generate is customer-facing prose. Invoices. Compliance reports. Vendor emails. Blog drafts. Conversation transcripts. The single biggest source of customer-perceived quality drift is prose where half sounds like one writer and half sounds like another. If you switch model families behind the scenes (Claude here, GPT-4 there, Gemini for the cheap stuff), the voice fragments. The customer can hear it, even when they can't articulate it.

Standardising on Claude means one prompt-engineering discipline. The brand voice profile we capture in onboarding sticks. The Brand Voice Enforcer agent has one fingerprint to compare against, not three.

Reliability on long structured tasks

Most of our work is not chatbots. We write invoices, draft reports, run multi-step agent workflows, and generate compliance documents that must be valid input to a downstream parser. Long, structured, deterministic-feeling output is where Claude has consistently outperformed the alternatives on our workloads.

Two specific behaviours matter:

  • Claude follows long system prompts faithfully — the 60-agent framework in Marketing OS uses system prompts of 4–8K tokens that define tools, schemas, voice rules and refusal conditions. Faithfulness to these matters more than benchmark headroom.
  • Claude's structured output mode is reliable enough that we treat its JSON as parseable by default. We still validate with Zod at the boundary, but parse failures are rare.
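To make "validate at the boundary" concrete, here is a minimal sketch of the pattern. It is deliberately dependency-free: the hand-written checks stand in for what a Zod schema would do, and the InvoiceDraft shape, its field names and parseInvoiceDraft are hypothetical illustrations, not our production schema.

```typescript
// Hypothetical shape for illustration only; not SourceForge's real schema.
interface InvoiceDraft {
  invoiceNumber: string;
  currency: string; // three-letter uppercase code, e.g. "INR"
  totalMinorUnits: number; // integer amount in the currency's minor unit
}

type ParseResult =
  | { ok: true; value: InvoiceDraft }
  | { ok: false; error: string };

// Parse the model's raw text and check it before anything downstream sees it.
// Never throws: every failure comes back as { ok: false, error }.
function parseInvoiceDraft(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "expected a JSON object" };
  }
  const { invoiceNumber, currency, totalMinorUnits } = data as Record<string, unknown>;
  if (typeof invoiceNumber !== "string" || invoiceNumber.length === 0) {
    return { ok: false, error: "invoiceNumber must be a non-empty string" };
  }
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    return { ok: false, error: "currency must be a 3-letter uppercase code" };
  }
  if (
    typeof totalMinorUnits !== "number" ||
    !Number.isInteger(totalMinorUnits) ||
    totalMinorUnits < 0
  ) {
    return { ok: false, error: "totalMinorUnits must be a non-negative integer" };
  }
  return { ok: true, value: { invoiceNumber, currency, totalMinorUnits } };
}
```

Returning a result object instead of throwing keeps the rare parse failure on an ordinary code path, so the caller can retry or escalate without a try/catch around every model call.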

Honest refusals

Claude refuses unsafe requests, and when it doesn't know something, it tends to say so. For products that touch finance and HR — vendor invoices, AML thresholds, customer KYC — we'd rather a refusal than a confident hallucination. Anthropic's training has shifted the optimisation away from "always have something to say".

A predictable partner

Anthropic's pricing and model availability have been stable in a way that matters when you ship multi-year contracts. Claude 3.5 Sonnet stayed usable in production for a year before it was deprecated; the migration to Claude Sonnet 4 was a config change. We've never had to rewrite a prompt because the model behaviour drifted underneath us.

The one exception

We make exactly one. Media workloads (image generation through gpt-image-1, speech-to-text through Whisper, and text-to-speech where a neural voice is required) go to OpenAI. Claude does not currently offer these capabilities, and the open-source alternatives are not yet at the bar we need.

The split is enforced by lint. Calls to the Anthropic SDK can only originate from packages/ai/anthropic-client.ts. Calls to the OpenAI SDK can only originate from packages/ai/openai-client.ts, and are restricted at the agent level to AGT-120 through AGT-124 in Marketing OS. Any other usage fails CI. This is the kind of discipline that's expensive to set up and cheap to maintain.
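A rule like this can be approximated in a few lines. The sketch below is not our actual lint rule: it is a standalone check, with a hypothetical findSdkViolations helper, that assumes the SDKs are imported under their npm package names (@anthropic-ai/sdk and openai) and that file paths are repo-relative. It uses a naive regex rather than a real parser, and it does not cover the agent-level restriction.

```typescript
// SDK package name -> the single file allowed to import it (paths from the text).
const SDK_OWNERS: Record<string, string> = {
  "@anthropic-ai/sdk": "packages/ai/anthropic-client.ts",
  "openai": "packages/ai/openai-client.ts",
};

// Return one message per forbidden SDK import found in `source`.
// `filePath` is assumed to be repo-relative with forward slashes.
function findSdkViolations(filePath: string, source: string): string[] {
  const violations: string[] = [];
  // Matches: from "pkg", import "pkg", require("pkg") and import("pkg").
  // Naive by design: it will also match inside comments and string literals.
  const importPattern = /(?:from|import|require)\s*\(?\s*["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const specifier = match[1];
    for (const pkg of Object.keys(SDK_OWNERS)) {
      const owner = SDK_OWNERS[pkg];
      // Catch both the package root and deep imports such as "openai/resources".
      const isSdk = specifier === pkg || specifier.startsWith(pkg + "/");
      if (isSdk && filePath !== owner) {
        violations.push(`${filePath}: "${specifier}" may only be imported from ${owner}`);
      }
    }
  }
  return violations;
}

// Usage: a CI script would run this over every source file and exit non-zero
// if any violations come back.
const sample = findSdkViolations("apps/billing/src/invoice.ts", 'import OpenAI from "openai";');
console.log(sample);
```

In a real codebase the same constraint is usually expressed through the linter's own restricted-import rule, with an exception for the two client files. The standalone version just shows how small the underlying check is.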

Cost economics that compound

The headline price difference between providers matters less than the prompt-cache hit rate, the Batch API discount and the model-tier mix you can hold across a portfolio. On our workloads:

  • Cache hit rate sits at 80%-plus on system prompts and brand context
  • Batch API saves around 50% on non-realtime workloads (most reporting and content-gen)
  • The model mix runs at 15% Haiku, 75% Sonnet, 10% Opus — enforced by per-agent budgets

The compounded effect is roughly an order of magnitude cheaper per deliverable than the headline list price would suggest. Standardising on one provider is what makes the cache economics work — you can't cache across providers.
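The way these levers multiply can be shown with a little arithmetic. The sketch below models input tokens only, and every number in it is an assumption for illustration: a cached token billed at 0.1 of the uncached price, a 50% batch discount, a hypothetical 60% of traffic going through batch, and the two discounts stacking multiplicatively. It ignores cache-write premiums, output tokens and the model-tier mix, so it does not reproduce the order-of-magnitude figure above.

```typescript
interface CostLevers {
  cacheHitRate: number; // share of input tokens served from cache, 0..1
  cacheReadMultiplier: number; // price of a cached token relative to an uncached one
  batchShare: number; // share of traffic sent through the Batch API, 0..1
  batchDiscount: number; // fractional discount on batched traffic, 0..1
}

// Effective input-token cost as a fraction of the undiscounted list price.
// Assumes the cache and batch discounts apply independently and multiply.
function effectiveInputCostMultiplier(levers: CostLevers): number {
  const cacheFactor =
    levers.cacheHitRate * levers.cacheReadMultiplier + (1 - levers.cacheHitRate);
  const batchFactor =
    levers.batchShare * (1 - levers.batchDiscount) + (1 - levers.batchShare);
  return cacheFactor * batchFactor;
}

// Illustrative inputs only; batchShare in particular is a made-up figure.
const multiplier = effectiveInputCostMultiplier({
  cacheHitRate: 0.8,
  cacheReadMultiplier: 0.1,
  batchShare: 0.6,
  batchDiscount: 0.5,
});
// cacheFactor = 0.8 * 0.1 + 0.2 = 0.28; batchFactor = 0.6 * 0.5 + 0.4 = 0.7
console.log(multiplier.toFixed(3));
```

Under these assumptions, caching and batching alone bring input cost to about a fifth of list price. Tier mix and per-agent budgets act on top of that.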

When we'd revisit

The decision isn't sacred. We'd revisit if Anthropic's pricing moved sharply, if a competitor shipped a clearly superior tool-calling implementation, or if Claude itself drifted in quality. We watch this quarterly. So far, the answer has been: stay on Claude. The default is sticky for a reason — every product we ship is a vote of confidence in that decision, and switching costs across nine products are non-trivial.

The lesson, if there is one, is that a single-provider strategy is a forcing function. It makes you engineer the cost levers properly. Multi-provider sounds prudent and turns out to be just inertia in disguise.


Published 12 May 2026 by SourceForge Software Services Pvt Ltd. Replies, corrections and follow-up questions: info@sourceforge.in.
