DeepSeek V4 Is MIT-Licensed: Should Developers Self-Host or Stay on Closed APIs?

DeepSeek V4 launched 24 April 2026 with open weights. For most devs the closed API stays cheaper. The math, GPU honesty, and when self-host pays.

4 May 2026 Updated 19 May 2026 ~12 min read

Composite editorial imagery for Neural Tech Daily’s coverage of the 24 April 2026 DeepSeek V4 launch. Diagram by Neural Tech Daily.

The bottom line

DeepSeek V4-Pro and V4-Flash shipped on 24 April 2026 under an MIT license, with weights on Hugging Face and a hosted API at platform.deepseek.com.¹ For nearly every developer reading this, the practical answer is the same: keep using the closed APIs you’re already on, and call DeepSeek’s hosted V4-Flash endpoint at $0.14 input / $0.28 output per million tokens when its price actually matters to your bill.²

V4-Flash is the variant most teams should look at first. It’s the lighter 284B-parameter mixture-of-experts model with 13B active parameters, priced an order of magnitude below Claude Sonnet 4.5 and GPT-5.5, and runnable on a small GPU cluster if self-host genuinely matters to you. V4-Pro is the 1.6-trillion-parameter flagship. Its hosted API runs $1.74 input / $3.48 output per million tokens (with a 75% promo discount taking it to $0.435 / $0.87 through 31 May 2026), and self-hosting it needs roughly eight H100 80GB GPUs per replica, which is a ₹35-40 lakh per month rental on AWS Mumbai.²,³ Closed-API spend has to clear genuinely large token volumes before that math turns over.

The interesting story is what MIT-licensed weights buy you when you’re not running the model yourself: audit access, fine-tuning rights, regulatory headroom under DPDP Act 2023 for sensitive workloads, and an exit option if any one provider squeezes pricing.⁴ That’s the value, and it’s real. The “self-host on a laptop” framing is not.

What actually shipped on 24 April 2026

DeepSeek released two model variants. V4-Pro is the 1.6T-parameter flagship with 49B active parameters per forward pass, served through a mixture-of-experts router. V4-Flash is a 284B-parameter variant with 13B active parameters, intended for latency-sensitive deployments and as the more practical self-host target.⁵ Both ship with MIT-licensed weights, among the most permissive licenses any frontier-class open model has carried.

The headline benchmarks, as published on the V4-Pro Hugging Face model card, claim 80.6% on SWE-bench Verified, 95.2% on HMMT 2026 February, and 89.8% on IMOAnswerBench for V4-Pro.⁵ MIT Technology Review’s launch coverage frames V4 as in genuine competition with closed-API frontier systems.⁶ Our own caveat is that benchmark numbers from any lab need third-party replication before becoming load-bearing. Independent tracking by Artificial Analysis confirms the 80.6% SWE-bench Verified figure and notes V4-Pro effectively ties Gemini 3.1 Pro and trails Claude Opus 4.6 (80.8%) by 0.2 points.⁷ Treat the published numbers as the lab’s claim plus one independent corroboration, not as a settled empirical fact across all evaluators.

The hosted API at platform.deepseek.com prices V4-Flash at $0.14 per million input tokens and $0.28 per million output tokens; V4-Pro at $1.74 input / $3.48 output, with a 75% discount running through 31 May 2026 that takes V4-Pro to $0.435 / $0.87.² Verify the live page before any commercial decision because API pricing fluctuates. Against Claude Sonnet 4.5 ($3 in / $15 out), V4-Flash is roughly 20x cheaper on input and 50x cheaper on output, and it is roughly an order of magnitude cheaper than GPT-5.5’s comparable tier; V4-Pro at the regular rate is roughly 1.7x cheaper than Sonnet 4.5 on input and roughly 4x cheaper on output. The chat interface at chat.deepseek.com is free for browser-based use.⁸
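To make the price gaps concrete, here is a minimal Python sketch that divides one list price by another. The prices are the ones quoted in this article; the dictionary layout and the function name price_ratio are illustrative assumptions, not part of any provider's SDK.

```python
# Per-million-token list prices in USD, as quoted in this article.
# Verify against the live pricing pages before relying on them.
PRICES = {
    "v4-flash": {"input": 0.14, "output": 0.28},
    "v4-pro": {"input": 1.74, "output": 3.48},
    "sonnet-4.5": {"input": 3.00, "output": 15.00},
}


def price_ratio(expensive: str, cheap: str, direction: str) -> float:
    """How many times pricier `expensive` is than `cheap` for one token direction."""
    return PRICES[expensive][direction] / PRICES[cheap][direction]


for cheap in ("v4-flash", "v4-pro"):
    for direction in ("input", "output"):
        ratio = price_ratio("sonnet-4.5", cheap, direction)
        print(f"Sonnet 4.5 vs {cheap} ({direction}): {ratio:.1f}x")
```

On these figures Sonnet 4.5 comes out at roughly 21x and 54x the V4-Flash price on input and output, and roughly 1.7x and 4.3x the regular V4-Pro price.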

The “self-host” claim, honestly

Open weights are not the same thing as runnable weights. V4-Pro at 1.6T parameters with 49B active needs roughly 8x H100 80GB GPUs to serve one production replica with reasonable batch size and acceptable latency. That is not a hardware footprint a solo developer or a lean Indian startup keeps around. It’s a hardware footprint a company with a Series B and a sovereign-AI mandate keeps around.

The Hugging Face model card for deepseek-ai/DeepSeek-V4-Pro, reproduced for editorial coverage of the model release.

What the GPU rental actually costs in India, as of 5 May 2026:

AWS EC2 p5.48xlarge (8x H100 80GB): on-demand pricing of $55.04 per hour according to Vantage’s published tracker,³ which is $40,179 per month at 730 hours, or roughly ₹38 lakh at $1 = ₹95 (call it ₹35-40 lakh per month) for one replica running 24x7.⁹ Note that $55.04/hour is the US East (N. Virginia) rate Vantage surfaces; Mumbai (ap-south-1) pricing typically runs higher and should be checked at the AWS Pricing console before commitment. Reserved-instance and savings-plan pricing can knock 30-40% off this on multi-year commits.
E2E Networks (Indian cloud, Bangalore and Delhi data centres): publishes H100 GPU cloud pricing in INR; advertised rates land in roughly the ₹1.13-1.82 lakh per H100 per month range depending on commitment, so an 8x cluster lands in the ₹9-15 lakh range.¹⁰ Check live pricing.
Lambda Labs (US-region H100 clusters, no India region as of writing): around $2-3 per H100 hour on-demand, which puts an 8x cluster at roughly ₹11-17 lakh per month at 730 hours and $1 = ₹95. US-region hosting typically adds noticeable cross-border latency for Indian users.
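The conversions in the list above follow one pattern: hourly dollar rate, times GPUs, times hours per month, times the exchange rate. A minimal Python sketch of that pattern follows, assuming the article's 730-hour month and $1 = ₹95; the function names are illustrative.

```python
HOURS_PER_MONTH = 730   # the article's 24x7 convention
INR_PER_USD = 95.0      # the article's assumed exchange rate
LAKH = 100_000          # 1 lakh = 100,000 rupees


def monthly_usd(hourly_usd: float, gpus: int = 1) -> float:
    """Monthly on-demand cost in USD for `gpus` units billed at `hourly_usd` each."""
    return hourly_usd * gpus * HOURS_PER_MONTH


def usd_to_lakh(usd: float) -> float:
    """Convert a USD amount to lakh rupees at the assumed exchange rate."""
    return usd * INR_PER_USD / LAKH


# AWS p5.48xlarge is priced per instance (all 8 GPUs), so gpus=1 here.
aws = monthly_usd(55.04)
print(f"AWS p5.48xlarge: ${aws:,.0f}/month = ₹{usd_to_lakh(aws):.1f} lakh")

# Lambda-style per-GPU pricing: 8 GPUs at the quoted $2 to $3 per hour.
for rate in (2.0, 3.0):
    cost = monthly_usd(rate, gpus=8)
    print(f"8x H100 at ${rate:.0f}/GPU-hour: ₹{usd_to_lakh(cost):.1f} lakh/month")
```

Swapping in a different exchange rate or a reserved-instance discount is a one-line change, which is the main reason to keep the arithmetic in code rather than in a spreadsheet cell.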

V4-Flash at 284B / 13B active is more humane. The model runs on a 4x H100 cluster at FP16 precision, or fewer GPUs with FP8 or INT4 quantisation per common vLLM and SGLang deployment recipes. On E2E Networks that lands at roughly ₹4.5-7 lakh per month for an FP16 4x H100 cluster, or ₹1.5-2 lakh per month for a single H100 running a quantised build at meaningful throughput. Flash is the realistic self-host target for an Indian team that wants the open-weights advantages without the eight-figure annual GPU spend. The trade-off is capability. Flash isn’t V4-Pro, and benchmark gaps on hard reasoning tasks are real even before independent replication lands.
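A quick way to sanity-check any GPU count is to compare the raw size of the weights with the total GPU memory on offer. The Python sketch below does only that. It is a weights-only floor under stated assumptions: it ignores KV cache, activations, and CPU offload, and it counts total parameters rather than active ones, because a mixture-of-experts model normally keeps every expert resident in memory. The function names are illustrative.

```python
import math

H100_MEMORY_GB = 80  # per-GPU memory of the H100 80GB parts discussed above


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Weights-only size in GB: one billion parameters at 8 bits is 1 GB."""
    return params_billion * bits_per_param / 8


def min_gpus_for_weights(params_billion: float, bits_per_param: int,
                         gpu_gb: int = H100_MEMORY_GB) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weight_memory_gb(params_billion, bits_per_param) / gpu_gb)


# V4-Flash total parameter count from the article: 284B.
for label, bits in (("FP16", 16), ("FP8", 8), ("INT4", 4)):
    size = weight_memory_gb(284, bits)
    gpus = min_gpus_for_weights(284, bits)
    print(f"{label}: {size:.0f} GB of weights, at least {gpus}x H100 80GB")
```

By this arithmetic, 16-bit weights for 284B parameters come to about 568 GB, more than the 320 GB on four H100 80GB cards. GPU counts like the ones quoted above therefore hold only if the served checkpoint uses a lower-precision format or offloads part of the model, so check the model card and your serving recipe before budgeting.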

The “run it on your laptop” framing that drifts through some open-source coverage is wrong on its face. A 49B-active MoE will not run on consumer hardware at production latency. Quantised local inference for hobby experimentation is possible at slow token rates; serving real traffic is not.

When closed-API beats self-host on cost

The arithmetic is cleaner than the marketing on either side suggests. Take the AWS 8x H100 figure of roughly ₹38 lakh per month (the US East on-demand rate converted at $1 = ₹95, used here as a floor for Mumbai) and divide by the closed-API blended rate per million tokens to find the break-even token volume.

If you’re paying DeepSeek’s hosted V4-Pro API at the regular rate of $1.74 input / $3.48 output (assume a 50/50 mix at $2.61 blended per million tokens, roughly ₹248 at $1 = ₹95), the break-even volume is:

₹38,00,000 / ₹248 per million tokens ≈ 15,300 million tokens per month, or roughly 510 million tokens per day over a 30-day month. Below that volume, calling DeepSeek’s own hosted API at the regular rate is cheaper than self-hosting V4-Pro on AWS Mumbai.
At V4-Pro’s promo rate of $0.435 / $0.87 (through 31 May 2026; blended $0.6525 per million ≈ ₹62), break-even balloons to roughly 2 billion tokens per day. The promo makes the hosted API even harder to beat.
Against Claude Sonnet 4.5 at $3 / $15 (blended $9 per million ≈ ₹855), break-even drops to roughly 4.4 billion tokens per month, or about 150 million tokens per day.
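The three break-even figures above all come from the same division, sketched here in Python. The assumptions are the article's own: a 50/50 input/output mix, $1 = ₹95, a fixed GPU bill of ₹38 lakh per month, and a 30-day month. The function names are illustrative.

```python
INR_PER_USD = 95.0     # the article's assumed exchange rate
DAYS_PER_MONTH = 30    # used to turn monthly volume into daily volume


def blended_usd_per_mtok(input_usd: float, output_usd: float,
                         input_share: float = 0.5) -> float:
    """Blended USD price per million tokens for a given input/output mix."""
    return input_share * input_usd + (1 - input_share) * output_usd


def break_even_mtok_per_month(fixed_inr_per_month: float, input_usd: float,
                              output_usd: float, input_share: float = 0.5) -> float:
    """Million tokens per month at which API spend equals the fixed GPU bill."""
    blended_inr = blended_usd_per_mtok(input_usd, output_usd, input_share) * INR_PER_USD
    return fixed_inr_per_month / blended_inr


GPU_BILL_INR = 3_800_000  # roughly ₹38 lakh per month for 8x H100

for name, inp, out in (("V4-Pro regular", 1.74, 3.48),
                       ("V4-Pro promo", 0.435, 0.87),
                       ("Claude Sonnet 4.5", 3.00, 15.00)):
    monthly = break_even_mtok_per_month(GPU_BILL_INR, inp, out)
    print(f"{name}: {monthly:,.0f}M tokens/month, "
          f"{monthly / DAYS_PER_MONTH:,.0f}M tokens/day")
```

The input_share parameter is worth varying: output tokens cost twice as much as input tokens on DeepSeek and five times as much on Sonnet 4.5, so a workload that is mostly prompt or mostly completion moves the break-even point noticeably.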

For V4-Flash, the math runs differently because hardware is much smaller. A single H100 on E2E at roughly ₹1.8 lakh per month, against V4-Flash’s hosted API at $0.14 / $0.28 (blended $0.21 per million ≈ ₹20), works out to:

₹1,80,000 / ₹20 per million tokens = 9,000 million tokens per month, or roughly 300 million tokens per day, before self-hosting V4-Flash beats the V4-Flash hosted API.
Against Claude Sonnet 4.5 (blended ≈ ₹855 per million), V4-Flash self-hosting on a single H100 breaks even at roughly 210 million tokens per month, or about 7 million tokens per day.
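A break-even volume only helps if the rented hardware can actually serve it. One rough check is to convert tokens per day into the sustained tokens per second the deployment would have to average around the clock. The Python sketch below does that conversion for the break-even figures in this section. It says nothing about what an H100 can deliver, which the article does not measure and which you would need to benchmark on your own workload.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_tokens_per_second(tokens_per_day: float) -> float:
    """Average throughput needed to serve `tokens_per_day` evenly over 24 hours."""
    return tokens_per_day / SECONDS_PER_DAY


for label, per_day in (("V4-Flash vs its own hosted API", 300_000_000),
                       ("V4-Flash vs Claude Sonnet 4.5", 7_000_000),
                       ("V4-Pro vs its own hosted API", 510_000_000)):
    rate = sustained_tokens_per_second(per_day)
    print(f"{label}: about {rate:,.0f} tokens/second sustained")
```

Real traffic is bursty, so peak demand will sit above these averages. If the measured throughput of your deployment falls short of the required rate, the break-even point needs more GPUs and moves further out.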

So the self-host break-even depends almost entirely on which closed API you’re comparing to and which V4 variant you’d run. The answer changes by orders of magnitude.

Indian developer cost framing: DeepSeek V4 vs closed APIs vs self-host (as of 5 May 2026; verify live pricing before commercial decision)

Axis	DeepSeek V4-Flash (hosted API)	DeepSeek V4-Pro (hosted API)	Claude Sonnet 4.5 (API)	Self-host V4-Pro on AWS Mumbai
Approx. cost per million input tokens	$0.14	$1.74 ($0.435 promo)	$3.00	Fixed monthly GPU bill
Approx. cost per million output tokens	$0.28	$3.48 ($0.87 promo)	$15.00	Fixed monthly GPU bill
Monthly fixed cost	Pay-as-you-go	Pay-as-you-go	Pay-as-you-go	~₹35-40 lakh (8x H100, 24x7)
Break-even volume vs self-host on AWS Mumbai	Far above any realistic Indian dev volume	~510M tokens/day	~150M tokens/day	—
DPDP Act 2023 fit for sensitive data	Hosted in DeepSeek infra (China)	Hosted in DeepSeek infra (China)	Hosted in Anthropic infra (US)	Indian data centre, full audit
Fine-tuning / fork rights	Limited (hosted)	Limited (hosted)	Limited (hosted)	Full (MIT license)

So the call for almost every Indian developer reading this is: stay on the closed API you’re already paying for. If your monthly LLM bill is sub-₹50,000, the comparison isn’t even close. You’d waste money self-hosting V4-Pro to save on token costs you don’t have. Even at ₹2-3 lakh per month in token spend, you’re still in API territory unless something else (data sovereignty, fine-tuning needs, audit) is forcing the move.

When open weights actually matter

The case for caring about V4, even if you don’t self-host, is real. Three scenarios where MIT-licensed weights matter beyond the “save money on tokens” framing:

DPDP Act 2023 sensitive workloads. If you’re processing health, financial, or government data and your DPO has flagged cross-border transfer to OpenAI or Anthropic as a compliance risk, V4-Flash on an Indian-data-centre H100 (E2E Networks, Yotta, NxtGen, or a private GPU server) gives you a model running entirely within the Indian regulatory perimeter. The Digital Personal Data Protection Act, 2023 governs cross-border data transfer and processor accountability for personal data of Indian principals; running inference inside India sidesteps the cross-border transfer questions entirely.⁴ The model is auditable because the weights are public; the inference logs are yours; the data doesn’t leave the country. That’s a real differentiator no closed API offers today.
Fine-tuning for Indic-language or domain-specific work. The MIT license means you can LoRA-fine-tune V4-Flash on Hindi, Tamil, Bengali, Telugu, Marathi, or Kannada corpora, or on legal, medical, or agricultural domain text, and ship the resulting weights in your own product. Fine-tuning hosted Claude or GPT-5.5 is possible only in narrow OpenAI / Anthropic-approved configurations, and the resulting weights aren’t yours to redistribute.
Provider-lock exit option. If Anthropic or OpenAI raises API prices 3x next quarter, having a tested fallback path on DeepSeek hosted API plus a rehearsed migration to self-hosted V4-Flash is procurement insurance. You don’t need to be running it; you need to know you can.

For a solo dev, freelance contractor, or pre-seed startup, none of these usually clear the urgency bar. For a Series B fintech, a hospital chain building internal tooling, or a government-adjacent consultancy, all three can.

What about V4-Flash?

The DeepSeek API platform at api-docs.deepseek.com/quick_start/pricing, reproduced for editorial coverage of the V4 pricing surface.

V4-Flash is the variant most Indian developer teams should actually look at if any of the open-weights advantages above apply. The 284B-parameter / 13B-active footprint runs on a 4x H100 cluster at FP16, or on fewer GPUs with FP8 or INT4 quantisation, which puts the serving-cost band somewhere between ₹1.5 lakh and ₹7 lakh per month on E2E Networks or comparable Indian GPU clouds depending on quantisation choice and throughput target.¹⁰ That’s a serving-cost band where the math can work for a team with serious volume, real DPDP Act constraints, or a fine-tuning roadmap.

Capability won’t match V4-Pro. The benchmark gap between Flash and Pro hasn’t been independently replicated yet, and DeepSeek’s own announcement positions Flash as the latency-sensitive variant rather than the strongest reasoning model. Treat Flash as “good enough for production with fine-tuning” rather than “drop-in for every Claude Sonnet workload.”

The realistic Indian deployment shape, if you’re going to self-host, is V4-Flash on E2E Networks or Yotta, fine-tuned on your domain corpus and serving a specific product surface where the open-weights advantages compound, rather than a general LLM-replacement layer.

What this changes today

For nearly every individual Indian developer: nothing immediately. The closed APIs you’re using are still cheaper for your token volume, easier to integrate, and don’t carry the operational tax of running production GPU infrastructure.

For technical leads at companies with DPDP Act 2023 obligations, regulated industries, or serious fine-tuning roadmaps: V4 is worth a real evaluation in the next 30-60 days. Spin up V4-Flash on E2E or Yotta, run your evaluation suite against it, measure the latency from Indian endpoints, and assess whether the open-weights advantages clear your specific cost and compliance bar.
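For the latency part of that evaluation, a small harness that times repeated requests and reports percentiles is usually enough to start. The Python sketch below is one way to do it, under stated assumptions: send_request is a hypothetical callable you supply that makes one complete request to your own endpoint, and fake_request is a stand-in used only so the example runs without a server.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def measure_latency(send_request: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time `runs` sequential calls and summarise wall-clock latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "max_s": latencies[-1],
    }


def fake_request() -> None:
    """Hypothetical placeholder; replace with a real call to your endpoint."""
    time.sleep(0.01)


print(measure_latency(fake_request, runs=5))
```

Run it from the network location your users are actually in, with prompts of realistic length. Sequential timing like this measures single-request latency only, so concurrent load needs a separate test.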

For everyone else: bookmark V4 as a credible exit option if the closed-API economics shift, and check back when the open-source community has run independent benchmarks. The MIT license is the part of this story that ages well. It doesn’t expire when V4.1 ships.

Prices and availability fluctuate; verify the live AWS, E2E, DeepSeek, Anthropic, and OpenAI pricing pages before committing to any architecture decision.

How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.

Sources consulted

Cited Sources

1. DeepSeek API Docs: V4 announcement (news260424), 24 April 2026 launch, V4-Pro and V4-Flash variant details, MIT license confirmation (accessed 2026-05-05)
2. DeepSeek API Docs: Models & Pricing page; V4-Flash $0.14 input / $0.28 output per million tokens; V4-Pro $1.74 input / $3.48 output (75% promo discount running to 31 May 2026 takes V4-Pro to $0.435 / $0.87) (accessed 2026-05-05)
3. Vantage: AWS p5.48xlarge on-demand pricing $55.04/hour (8x H100 80GB), monthly cost ~$40,179 at 730 hours; verify Mumbai region availability separately (accessed 2026-05-05)
4. Ministry of Electronics and IT: Digital Personal Data Protection Act, 2023; cross-border data transfer and processor accountability framework for personal data of Indian principals (accessed 2026-05-05)
5. Hugging Face: deepseek-ai/DeepSeek-V4-Pro model card; 1.6T-parameter MoE architecture with 49B active parameters; 80.6% SWE-bench Verified, 95.2% HMMT 2026 February, 89.8% IMOAnswerBench; MIT license (accessed 2026-05-05)
6. MIT Technology Review: Why DeepSeek's V4 matters, 24 April 2026 launch coverage and capability framing (accessed 2026-05-05)
7. Artificial Analysis: independent tracking of V4-Pro and V4-Flash; corroborates 80.6% SWE-bench Verified, frames V4-Pro as tying Gemini 3.1 Pro and trailing Claude Opus 4.6 by 0.2 points (accessed 2026-05-05)
8. DeepSeek chat interface: free browser-based consumer endpoint (accessed 2026-05-05)
9. AWS EC2 P5 instance type page: H100 GPU specifications and instance family reference (accessed 2026-05-05)
10. E2E Networks: H100 GPU cloud pricing in India; advertised rates roughly ₹1.13-1.82 lakh per H100 per month depending on commitment; Bangalore and Delhi data centres (accessed 2026-05-05)