DeepSeek V4-Pro and V4-Flash: Frontier AI at Indie Prices
A frontier-class model just dropped at $0.28 per million tokens
DeepSeek released preview versions of its V4-Pro and V4-Flash models today, exactly one year after the original R1 release that upended the global AI conversation. V4-Pro is now the largest open-weights model in the world. V4-Flash costs roughly one-hundredth of what frontier US models charge for the same work.
For small businesses paying per token every time an AI tool runs, that pricing gap is the entire story.
The news in five facts
- V4-Pro is a 1.6-trillion-parameter Mixture-of-Experts model with 49 billion parameters active per token. V4-Flash is a lighter 284-billion-parameter variant with 13 billion active. Both have 1-million-token context windows.
- Both are published under the MIT license — commercial use allowed, no royalty, no gatekeeping.
- V4-Pro output pricing: $3.48 per million tokens. V4-Flash output pricing: $0.28 per million tokens, according to DeepSeek’s own API docs.
- V4-Pro “claims top performance on coding and math among open models” and trails GPT-5.4 and Gemini 3.1-Pro by what DeepSeek calls “approximately 3 to 6 months” on general benchmarks, per coverage in Fortune.
- The models are tuned for Huawei Ascend chips, not Nvidia — a continuation of the hardware-independence push we covered last month.
The number that matters: $0.28 per million output tokens means a small business can run 50,000 AI-generated review replies, at roughly a thousand output tokens each, for about fourteen dollars.
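If you want to sanity-check that figure, the arithmetic is short. The per-reply token count is an assumption; swap in your own numbers:

```python
# Back-of-envelope cost check for the $14 claim.
replies = 50_000
tokens_per_reply = 1_000           # assumption; shorter replies cost proportionally less
price_per_million = 0.28           # V4-Flash output pricing, USD

total_tokens = replies * tokens_per_reply
cost = total_tokens / 1_000_000 * price_per_million
print(f"{total_tokens:,} output tokens -> ${cost:.2f}")  # 50,000,000 output tokens -> $14.00
```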
Why this matters for small businesses
The V4 preview does not mean you should rush to swap vendors. It means the floor under AI pricing just moved again.
The tools you pay for got cheaper to build
Most of the AI products small businesses use — chatbots, content tools, intake widgets, review responders — are thin wrappers over a few underlying models. When the cost of the underlying model drops an order of magnitude, the economics of the wrappers change. Either the wrapper passes savings along, or a competitor shows up charging less.
This is the same pattern Nvidia’s GTC keynote on “cheap inference” described in March. V4-Flash turns that forecast into a shipping model. The vendors that can’t justify their pricing against a $0.28 alternative will either discount or disappear.
Open-source frontier models finally cross “good enough”
Twelve months ago, the honest argument for paying top dollar to OpenAI or Anthropic was capability. On most business tasks, the open models weren’t close. V4-Pro narrows that gap to 3 to 6 months on DeepSeek’s own benchmarks — and on coding and math, they claim the lead among open models outright.
For a small business writing product descriptions, summarizing calls, or drafting email replies, a 3-to-6-month capability gap is not a real gap. The work gets done.
Vendor concentration risk is falling
DeepSeek’s seven-hour outage on April 17 was a reminder that one cheap model hosted by one provider is still a single point of failure. Open weights change that math. Any business that licenses V4-Pro or V4-Flash can run it through Together.ai, Fireworks, Groq, or Cloudflare Workers AI — or self-host on rented GPUs — and swap providers without changing code. That optionality used to be enterprise-only.
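In practice, "swap providers without changing code" usually means the hosts expose OpenAI-compatible endpoints, so only a base URL and model name change. Here is a minimal sketch of that pattern; the URLs and model identifiers below are placeholders, not confirmed endpoints, so check each provider's docs before using them:

```python
import os
from openai import OpenAI  # pip install openai

# Provider becomes a config choice, not a code change. Base URLs and model
# names are illustrative placeholders only.
PROVIDERS = {
    "provider_a": {"base_url": "https://api.provider-a.example/v1", "model": "deepseek-v4-flash"},
    "provider_b": {"base_url": "https://api.provider-b.example/v1", "model": "deepseek-v4-flash"},
}

cfg = PROVIDERS[os.environ.get("LLM_PROVIDER", "provider_a")]
client = OpenAI(base_url=cfg["base_url"], api_key=os.environ["LLM_API_KEY"])

resp = client.chat.completions.create(
    model=cfg["model"],
    messages=[{"role": "user", "content": "Draft a polite reply to this customer review: ..."}],
)
print(resp.choices[0].message.content)
```

Switching hosts then means changing an environment variable, which is exactly the optionality the open weights buy you.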
Our take: watch the price, not the press release
DeepSeek’s marketing leans into the “Sputnik moment” comparison. Ignore that. The real story is that the pricing on the hosted API is aggressive enough to reset expectations for every AI vendor serving small businesses.
The bottom line: V4-Flash at $0.28 makes frontier-ish AI a line-item commodity, not a budget conversation.
Two cautions before you switch anything.
Trust and provenance still matter
The Anthropic accusation that DeepSeek distilled stolen training data has never been resolved. If your business handles anything sensitive — medical notes, legal drafts, customer PII — route those workloads through a US-based provider hosting the open weights rather than DeepSeek’s own API. The open MIT license makes that easy. The hosted endpoint is the part worth being cautious about.
“Preview” means preview
DeepSeek labeled both models as preview releases. That typically means pricing, context behavior, and availability can change. Don’t build production infrastructure on a preview endpoint without a fallback model configured. This is the same rule that applies to any model from any vendor — OpenAI’s “research previews” have the same caveat — but it matters more when the draw is price.
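One way to honor that rule is a thin fallback wrapper: try the preview endpoint first, and fall back to a model you already trust if the call fails. A rough sketch, again with placeholder URLs and model names rather than confirmed endpoints:

```python
from openai import OpenAI

# Ordered routes: preview endpoint first, a trusted fallback second.
# Base URLs and model names are illustrative placeholders.
ROUTES = [
    (OpenAI(base_url="https://preview.example/v1", api_key="..."), "deepseek-v4-flash-preview"),
    (OpenAI(base_url="https://fallback.example/v1", api_key="..."), "your-current-model"),
]

def complete(prompt: str) -> str:
    last_error = None
    for client, model in ROUTES:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                timeout=30,  # don't let a flaky preview endpoint hang the whole job
            )
            return resp.choices[0].message.content
        except Exception as exc:  # rate limits, outages, a model quietly withdrawn
            last_error = exc
    raise RuntimeError(f"All providers failed: {last_error}")
```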
What you should do this week
Three low-effort moves that will compound as prices keep falling.
- Price your current AI spend per output token. Pull one month of invoices and divide by the number of output tokens the tool actually generated (a quick sketch of the math follows this list). Most small businesses discover they are paying the equivalent of $20 to $60 per million tokens for tasks V4-Flash handles at $0.28. That number is your negotiating leverage the next time a vendor raises prices.
- Ask your vendors which models they use — and whether they will swap. A good AI vendor will tell you. A great one will let you pick. If yours can’t or won’t, that is a signal. Our guide to evaluating AI tools for small business walks through the specific questions to ask.
- Identify one high-volume, low-stakes task — review replies, FAQ answers, meeting summaries — and put it on a cheap model. V4-Flash, Mistral Small 4, or an equivalent. Measure quality for two weeks. If it holds, the savings show up immediately.
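For the first item on that list, the math is one division plus a comparison. The figures below are made-up examples; use your own invoice and whatever output-token count you can pull from the vendor dashboard or your own logs:

```python
# Effective price per million output tokens, from one month of real usage.
monthly_invoice_usd = 400.0            # example: what you paid the tool last month
output_tokens_generated = 8_000_000    # example: tokens the tool actually produced

effective_price = monthly_invoice_usd / output_tokens_generated * 1_000_000
print(f"Effective: ${effective_price:.2f} per million output tokens")        # $50.00
print(f"Against V4-Flash at $0.28: roughly {effective_price / 0.28:.0f}x")   # ~179x
```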
Watch for
- US and EU hosted endpoints for V4-Pro and V4-Flash showing up on Together, Fireworks, and Groq within a week. That is the signal it is safe to use in business workflows.
- Quiet price drops from OpenAI, Anthropic, and Google on their cheaper tiers within the next 60 days. When the big labs cut prices on their “mini” and “flash” models, it means DeepSeek’s pricing broke something.
The quiet part
The loud headline today is “China’s DeepSeek is back.” The quiet one is that frontier AI now costs what a domain name used to. Small businesses that used to worry about whether they could afford AI should worry about whether they are paying too much.
Looking at whether your AI tools pass the new cost test? Browse our AI Employees catalog — each agent runs on whichever model gets the job done, not whichever vendor had the biggest launch party. Or get in touch if you’d rather have someone else do the math.