Promptfoo vs LangSmith vs Helicone: Which LLM Observability Tool in 2026?

Promptfoo for pre-deploy eval, Helicone for production logging, LangSmith if you're on LangChain. Most teams need two of three, not all three.

5 May 2026 Updated 19 May 2026 ~21 min read

Promptfoo LangSmith Helicone

The three LLM observability tools compared in this guide.

The bottom line

For a dev team in 2026 weighing Promptfoo, LangSmith, and Helicone, the three tools are not really competing on the same axis. Each owns a different part of the LLM observability stack, and most teams pick two of the three rather than all three or just one. (Pricing and feature claims as of 5 May 2026 across all three products; tool pricing changes frequently, so verify before subscribing.)

Pick Promptfoo for the pre-deploy evaluation layer. The framework is open-source under MIT licence, runs locally as a CLI¹, and does what nothing else on this list does well: A/B test prompts and models against a fixed test set with assertion-based grading². Write a YAML config, run promptfoo eval, get a comparison matrix. For a team that wants to verify prompt quality before shipping changes to production, this is the right tool, and the local-first design means no SaaS dependency at the eval stage.

Pick LangSmith for production tracing if your application is built on LangChain or LangGraph³. The integration is first-party, every chain step traces automatically, and the evaluation harness lets you run reference-based and LLM-as-judge graders against captured traces⁴. The free Developer tier covers 5,000 traces a month; the Plus tier starts at $39 per seat per month with 10,000 traces included plus volume-based add-ons for teams that need higher volume and collaboration features⁵. For non-LangChain stacks, the value-to-friction ratio is weaker; Helicone is usually the better choice.

Pick Helicone for production logging and cost tracking on any LLM stack. Integration is a one-line proxy change for OpenAI-compatible APIs or a wrapper SDK for direct integration⁶. The free tier covers 10,000 requests a month, and the Pro tier at $79 per month adds unlimited request retention and team collaboration⁷. Open-source under Apache 2.0⁸, so self-hosting is a real option for teams with data-residency constraints, particularly in regulated industries.

The combination most production teams converge on: Promptfoo for pre-deploy eval plus Helicone for post-deploy logging. LangSmith joins the stack only when the application is already on LangChain. Skip all three if the application makes fewer than a few hundred LLM calls a day and a Postgres table with a logs schema does the job. Tooling earns its place at scale, not at the smallest cases.

Vendor state, March 2026

Two acquisitions in March 2026 reshaped the picture for all three tools. The recommendations above still hold; the strategic framing around each tool has shifted.

Promptfoo was acquired by OpenAI on 9 March 2026. The open-source CLI continues to ship under the existing MIT licence and OpenAI stewardship; the commercial Promptfoo Cloud tier is now part of OpenAI’s evals offering¹⁰. For teams already running on OpenAI models, the integration story is now tighter. For teams on Anthropic, Google, or open-source providers, Promptfoo’s CLI remains provider-neutral but the long-term roadmap on the commercial Cloud tier is worth tracking; the vendor-neutral framing the project leaned on through 2025 has changed shape.

Helicone was acquired by Mintlify on 3 March 2026. Mintlify positioned the deal as a documentation-and-observability suite; the Helicone roadmap is now in maintenance mode (security patches, bug fixes, and new model support continue to ship; active feature development has ended)¹¹. Existing Helicone deployments and the open-source self-hosted path continue to work. Teams choosing Helicone in mid-2026 should treat it as a stable proxy-style observability layer rather than a platform with a forward roadmap; teams committing for a multi-year horizon should evaluate Langfuse or Arize Phoenix as alternatives that retain active development.

LangSmith renamed its paid tiers in March 2026. The previous “Plus” tier became “Developer” (the free tier with 5,000 traces / month); the previous “Pro” tier became “Plus” at $39 per seat per month with 10,000 traces included plus volume-based add-ons; the Enterprise tier is unchanged⁵. Articles and forum threads written before March 2026 may still reference the old tier names.

What each tool actually is

Promptfoo, in one paragraph

Promptfoo is an open-source CLI and library for testing LLM applications¹. The mental model is “unit tests for prompts.” A developer writes a YAML or JSON config that lists prompts, models to test against, and a set of test cases with assertions (“output contains X,” “output matches regex Y,” “an LLM judge says the output answers the question”). Run promptfoo eval, and the tool calls each model with each prompt against each test case, grades the output, and produces a side-by-side comparison². The framework runs locally by default; no data leaves the developer’s machine. There is also a hosted Cloud tier for teams that want shared dashboards, a vulnerability scanner for adversarial testing, and CI integration⁹, but the open-source core is fully usable on its own.

LangSmith, in one paragraph

LangSmith is the production observability and evaluation platform from LangChain, the team behind the LangChain framework and LangGraph agent runtime³. The mental model is “traces, datasets, and graders.” Every LangChain or LangGraph application step (LLM call, tool invocation, retrieval lookup, sub-chain execution) emits a trace event when LangSmith is enabled; the platform stores the trace, lets the developer browse and filter, and pipes traces into datasets that grading workflows can run against⁴. The integration with LangChain is one environment variable; the integration with non-LangChain stacks goes through the LangSmith SDK, which is more wiring than the LangChain path. Pricing has a free Developer tier with 5,000 traces per month, a paid Plus tier from $39 per seat per month with 10,000 traces included plus volume-based add-ons, and an Enterprise tier on request⁵.

Helicone, in one paragraph

Helicone is a production LLM observability platform structured around the question “where do my requests, costs, and latencies go”⁶. Two integration shapes: a proxy that sits between the application and OpenAI-compatible APIs (one-line base URL change) and direct integrations through the Helicone SDK or async loggers for non-proxy paths. Once integrated, Helicone captures every request, response, latency, token count, and cost; surfaces them in a dashboard with filters, custom properties, and user-level segmentation; and lets the developer set up alerts, caching, and rate limits at the gateway layer⁶. Helicone is open-source under Apache 2.0⁸, and the team publishes a self-hosting path for teams that need to keep request data inside their own infrastructure.

At a glance: the comparison table

Tool state as of 5 May 2026, fetched from each project's official documentation, pricing page, and GitHub repository. Prices fluctuate; verify before subscribing. Two of the three vendors changed corporate ownership in March 2026 (Promptfoo to OpenAI; Helicone to Mintlify) and LangSmith renamed paid tiers in the same month; treat tier names and pricing as point-in-time. Pricing in USD; Indian buyers should factor in forex spread plus 18 per cent IGST on equivalent imports of services where applicable.

Axis	Promptfoo	LangSmith	Helicone
Primary purpose	Pre-deploy prompt and model A/B evaluation	Production tracing, datasets, and grading for LangChain stacks	Production logging, cost tracking, and gateway features for any LLM stack
Tool type	Open-source CLI plus library; optional hosted Cloud tier	Managed SaaS; tightly integrated with LangChain and LangGraph	Managed SaaS or self-hosted; works with OpenAI-compatible APIs and direct SDK integrations
Open-source licence	MIT (fully open-source)	Closed-source SaaS; LangChain framework itself is open-source but LangSmith is not	Apache 2.0 (fully open-source, self-hostable)
Local-first option	Yes — runs entirely on the developer's machine; no data leaves by default	No — SaaS-only for the LangSmith product	Self-host path available; SaaS is the default
LangChain integration	Works with any LLM provider; LangChain support via the OpenAI-compatible interface	First-party; one environment variable; every chain step auto-traces	Works with any LLM provider; LangChain stacks integrate through the OpenAI-compatible base URL
Production tracing	Not the primary focus; the Cloud tier adds limited production-trace features	Deep, multi-level trace tree across chains, agents, tools, retrievers	Request and response logs with metadata; flat trace shape, not multi-level chain trees
Eval framework	The core feature; assertions, LLM-as-judge, model comparisons in YAML	Datasets and grading workflows; reference-based and LLM-as-judge graders	Lighter eval surface; primary use is logging plus alerting, not structured eval
Free tier	Open-source CLI is fully free; Cloud free tier for individuals	5,000 traces per month on the Developer tier (renamed from Plus in March 2026); one seat	10,000 requests per month on the free tier; basic dashboards
Paid tier entry price (USD, as of May 2026; verify before purchase)	Cloud Team $50 per month flat-rate; Enterprise custom pricing; open-source CLI free	Developer free with 5,000 traces / month; Plus from $39 per seat per month with 10,000 traces; Enterprise on request	Pro from $79 per month with unlimited seats; usage-based overages
Indian payment path	USD billing on Cloud; open-source CLI carries no billing surface	USD billing through LangChain Inc.; standard forex plus GST on imported services	USD billing on SaaS; self-hosted path eliminates billing entirely
Best fit	Teams that need structured pre-deploy eval; CI checks on prompt changes; model and provider A/B tests	Teams already on LangChain or LangGraph who need deep production traces and grading without separate wiring	Teams on any LLM stack who need cost visibility, request logging, and gateway features without a framework lock-in

Promptfoo

Primary purpose: Pre-deploy prompt and model A/B evaluation
Tool type: Open-source CLI plus library; optional hosted Cloud tier
Open-source licence: MIT (fully open-source)
Local-first option: Yes — runs entirely on the developer's machine; no data leaves by default
LangChain integration: Works with any LLM provider; LangChain support via the OpenAI-compatible interface
Production tracing: Not the primary focus; the Cloud tier adds limited production-trace features
Eval framework: The core feature; assertions, LLM-as-judge, model comparisons in YAML
Free tier: Open-source CLI is fully free; Cloud free tier for individuals
Paid tier entry price (USD, as of May 2026; verify before purchase): Cloud Team $50 per month flat-rate; Enterprise custom pricing; open-source CLI free
Indian payment path: USD billing on Cloud; open-source CLI carries no billing surface
Best fit: Teams that need structured pre-deploy eval; CI checks on prompt changes; model and provider A/B tests

LangSmith

Primary purpose: Production tracing, datasets, and grading for LangChain stacks
Tool type: Managed SaaS; tightly integrated with LangChain and LangGraph
Open-source licence: Closed-source SaaS; LangChain framework itself is open-source but LangSmith is not
Local-first option: No — SaaS-only for the LangSmith product
LangChain integration: First-party; one environment variable; every chain step auto-traces
Production tracing: Deep, multi-level trace tree across chains, agents, tools, retrievers
Eval framework: Datasets and grading workflows; reference-based and LLM-as-judge graders
Free tier: 5,000 traces per month on the Developer tier (renamed from Plus in March 2026); one seat
Paid tier entry price (USD, as of May 2026; verify before purchase): Developer free with 5,000 traces / month; Plus from $39 per seat per month with 10,000 traces; Enterprise on request
Indian payment path: USD billing through LangChain Inc.; standard forex plus GST on imported services
Best fit: Teams already on LangChain or LangGraph who need deep production traces and grading without separate wiring

Helicone

Primary purpose: Production logging, cost tracking, and gateway features for any LLM stack
Tool type: Managed SaaS or self-hosted; works with OpenAI-compatible APIs and direct SDK integrations
Open-source licence: Apache 2.0 (fully open-source, self-hostable)
Local-first option: Self-host path available; SaaS is the default
LangChain integration: Works with any LLM provider; LangChain stacks integrate through the OpenAI-compatible base URL
Production tracing: Request and response logs with metadata; flat trace shape, not multi-level chain trees
Eval framework: Lighter eval surface; primary use is logging plus alerting, not structured eval
Free tier: 10,000 requests per month on the free tier; basic dashboards
Paid tier entry price (USD, as of May 2026; verify before purchase): Pro from $79 per month with unlimited seats; usage-based overages
Indian payment path: USD billing on SaaS; self-hosted path eliminates billing entirely
Best fit: Teams on any LLM stack who need cost visibility, request logging, and gateway features without a framework lock-in

Pick Promptfoo when pre-deploy evaluation is the bottleneck

Promptfoo is the right pick when the team’s discomfort is “we changed a prompt and we’re not sure whether it’s better.” Three signals say this is the situation.

The first signal is that the team has a body of test cases, even informal ones. Promptfoo’s value comes from running prompts and models against a fixed set of inputs and grading the outputs. Teams without test cases get less out of the framework, but writing the first thirty cases tends to be the cheapest investment in prompt quality a team can make. Once the cases exist, switching prompts becomes a measured comparison rather than a vibe check².

The second signal is that the team wants to A/B test models or providers. Comparing GPT-4o-mini against Claude Haiku against an open-source model on the team’s actual workload is a Promptfoo strength. The framework runs each prompt across each model and produces a side-by-side matrix with assertion pass rates, latency, token cost, and qualitative output samples². For an Indian team weighing whether a self-hosted Llama deployment can replace a frontier API call on a specific task, Promptfoo is the right tool to answer the question with data rather than instinct.

The third signal is that the team wants prompt evaluation in CI. Promptfoo runs as a CLI, exits non-zero when assertions fail, and integrates with GitHub Actions or any CI runner⁹. A pull request that changes the prompt also runs the eval, and the CI status reflects whether the new prompt passes the assertion suite. This is the cheapest way to keep a team from accidentally shipping a regression on a critical path.

The trade-off is that Promptfoo is not the right tool for production observability. Once the application is deployed and serving traffic, the question shifts from “is this prompt good” to “what just happened in this request.” A team that needs full request-level logs, cost dashboards, and trace trees for every call is better served by Helicone or LangSmith.

For Indian developers, Promptfoo’s open-source core is fully usable without a billing relationship. The framework runs locally; the LLM calls go directly to whichever provider’s API the team is testing; no third party sits between the developer and the model. The Cloud tier is USD-billed when the team needs shared dashboards, but the entry point is free and stays free for many teams¹.

LangSmith product page from langchain.com showing the production tracing and evaluation platform with first-party LangChain and LangGraph integration

Image: LangSmith product page (langchain.com/langsmith), used for editorial coverage of the LLM observability tool compared in this guide.

Pick LangSmith when the application is already on LangChain

LangSmith is the right pick when the team has already chosen LangChain or LangGraph as the orchestration layer and now needs production observability. The integration is first-party and the friction is one environment variable³.

The clearest signal is that the team is using LangGraph for stateful agents and needs to debug what happens across multi-step workflows. LangSmith’s trace view shows the full execution tree: which nodes fired, what state was passed between them, what the LLM saw at each step, what tools were called, and how long each step took⁴. For a team debugging an agent that occasionally loops or escalates incorrectly, the trace tree is what makes the failure mode visible. Recreating this view in a non-LangSmith stack is possible but takes meaningful engineering work.

The second signal is that the team wants to run grading workflows against captured traces. LangSmith datasets let the developer pin a set of traces, label them with expected outputs, and run reference-based or LLM-as-judge graders against them⁴. The output is a per-grader pass rate and the per-trace failure list. For a team that wants to track prompt quality over time without writing a separate eval harness, this is a tighter fit than wiring Promptfoo against the same traces.

The third signal is that the team values having tracing, evaluation, and a hosted dashboard from one vendor whose framework they have already committed to. Vendor concentration cuts both ways; a team that is happy with LangChain’s API stability and pace of change will find LangSmith’s tight integration a feature rather than a risk.

The trade-off is the lock-in. LangSmith works best with LangChain. A team that builds part of its stack on LangChain and another part on plain provider SDKs ends up with traces in LangSmith for half the surface and gaps for the other half⁴. Migrating off LangChain later means evaluating the observability replacement separately, since LangSmith does not travel cleanly with a non-LangChain rewrite.

For Indian developers, LangSmith’s billing is in USD and follows the pattern of other US-billed developer SaaS tools: forex plus GST on imported services. US, EU, and UK teams pay the USD sticker directly. The Developer tier at 5,000 traces a month is enough for prototype work; the Plus tier (renamed from Pro in March 2026) at $39 per seat per month (approximately ₹3,315 at 2026-05-19 reference rates of $1 ≈ ₹85; FX fluctuates) with 10,000 traces included plus volume-based add-ons is where a small production team typically lands⁵. There is no India-specific pricing or India-region data residency option on the public pricing page; verify with sales if data residency is a hard requirement.

Pick Helicone when the application is on any LLM stack and needs cost visibility

Helicone is the right pick when the team needs production observability and the application is not built on LangChain, or when LangChain is in the stack but the team wants tracing and cost tracking that travels independently of the framework choice.

The clearest signal is that the team’s primary observability question is cost. “How much are we spending on LLM calls this month, broken down by feature, by user, by model” is a Helicone strength. The dashboard surfaces token usage, cost per request, cost per user, and cost trends, and the gateway features (caching, rate limits, fallbacks) let the team take direct action on the cost line⁶. For a startup with USD-billed LLM costs and a team trying to stay under a monthly budget, this is the load-bearing tool.

The second signal is integration shape. Helicone’s proxy mode is a one-line base URL change for OpenAI-compatible APIs⁶. Point the OpenAI client at oai.helicone.ai instead of api.openai.com, add a header with the Helicone API key, and every request flows through the gateway with logging on by default. For non-OpenAI-compatible providers or for stacks that prefer a non-proxy integration, the Helicone SDK and async loggers offer a wrapper-based path. The integration footprint is intentionally small; teams that already have an LLM client wired up can add Helicone in an afternoon.

The third signal is data residency. Helicone is open-source under Apache 2.0⁸ and the project publishes a self-hosting path. For Indian teams in regulated industries (banking, insurance, healthcare) where request data cannot leave the team’s infrastructure under DPDP Act or sector-specific rules, self-hosting Helicone is a real option. The trade-off is the operational cost of running the platform (Postgres, Clickhouse, and the Helicone services themselves), but for teams with a platform engineering function, the cost is bounded and predictable.

The trade-off is that Helicone is not the deepest eval tool. The platform supports prompt versioning, custom properties, and basic eval workflows, but a team that needs structured CI assertions or model A/B tests across a test suite will want Promptfoo on top. Pairing the two is common: Promptfoo runs pre-deploy eval, Helicone runs production logging, and the team gets both halves of the observability surface.

For Indian developers, Helicone’s free tier at 10,000 requests a month is useful for prototype and small-production workloads⁷. Pro at $79 per month (approximately ₹6,715 at 2026-05-19 reference rates) adds unlimited request retention and team features. The self-hosted path eliminates the USD billing surface entirely, at the cost of running the infrastructure. US, EU, and UK teams pay the USD sticker directly.

Helicone product home page from helicone.ai showing the production LLM observability platform with proxy-mode integration, cost tracking, and gateway features

Image: Helicone product page (helicone.ai), used for editorial coverage of the LLM observability tool compared in this guide.

How to combine them

Most production teams in 2026 end up running at least two of these three. Three combinations are common, and each has a reason.

Promptfoo plus Helicone is the default for teams not on LangChain. Promptfoo runs in CI on every prompt change, gating regressions before they ship. Helicone sits in front of the production LLM calls, logging every request, surfacing cost and latency, and giving the team a single pane to debug a specific request that went wrong. The two tools cover the pre-deploy and post-deploy halves of the observability problem with no overlap and no LangChain dependency¹⁶.

LangSmith plus Promptfoo is the pattern for teams on LangChain who want structured CI gating in addition to LangSmith’s trace and grading workflows. LangSmith handles the production observability layer with deep chain traces; Promptfoo handles the per-PR assertion suite that LangSmith’s grading workflows do not gate at commit time⁴. The two overlap somewhat on the eval axis, but the local-first CI integration of Promptfoo is the differentiator.

Helicone plus LangSmith is uncommon. Both tools cover production observability; running both means double-paying for trace storage and managing two dashboards. Most teams pick one or the other⁶⁴.

The fourth combination, all three, is rare and usually a sign the team has not yet decided which two it actually needs. The honest pattern: start with Promptfoo for eval, add Helicone or LangSmith for production observability based on whether the stack is LangChain-shaped or not.

How to choose

Four questions narrow the decision, and the March 2026 vendor changes shift the weight of two of them.

One. Is the bottleneck pre-deploy or post-deploy? If the team’s primary discomfort is “we changed a prompt and we’re not sure whether it’s better,” the bottleneck is pre-deploy eval and Promptfoo is still the right place to start¹. The OpenAI acquisition does not change the CLI behaviour at the open-source layer; it does shift the long-term portability question for teams that want the commercial Cloud tier without a tightening tie to OpenAI’s ecosystem¹⁰. If the discomfort is “we have a production request that returned the wrong thing and we don’t know why,” the bottleneck is post-deploy observability and the choice is between LangSmith and Helicone.

Two. Is the application built on LangChain or LangGraph? If yes, LangSmith’s integration depth is hard to match; the chain-trace tree is what makes multi-step debugging tractable, and the integration cost is one environment variable³. LangSmith remains the standalone option in this category, at the cost of USD-card billing and the $39 per seat per month Plus tier post-rename⁵. If the application is not on LangChain, the choice tightens further given the Helicone status note below.

Three. How long is the commitment horizon for production observability? Helicone’s proxy-mode integration still works well for current cost-tracking and request-logging needs, and the open-source self-hosted path remains valid⁶. The Mintlify acquisition put the roadmap into maintenance-only mode¹¹, so teams committing for a multi-year horizon should evaluate Langfuse or Arize Phoenix as alternatives that retain active development. Teams shipping in the next 6 to 12 months and willing to revisit the tooling choice in 2027 can pick Helicone today and accept the trajectory.

Four. What are the data residency and billing constraints? Regulated industries with DPDP Act constraints, banking circulars, or sector-specific data-localisation rules should look at Helicone’s self-hosted path first; Apache 2.0 plus a documented self-hosting setup is a clearer story than an SaaS that needs an enterprise-tier negotiation for India-region data residency⁸. The Mintlify ownership does not change the licence or the self-hosting capability, but it does mean the self-hosted artefacts ship at a slower cadence going forward¹¹. For teams without residency constraints, the SaaS tiers of LangSmith and Helicone are the simpler operational choice; the trade-off is USD billing and standard forex friction.

A fifth consideration for cost-conscious teams: LLM bills in USD compound quickly at scale, and the team that does not have visibility into cost per feature, cost per user, and cost per request finds out about budget overruns at the end of the month rather than during the day. Helicone’s gateway features, caching that returns identical-input responses from a cache, rate limits per API key, and fallbacks to cheaper models when the primary is rate-limited, are the most direct cost-control levers in this comparison⁶. For high-volume production workloads, that cost lever is what often decides the tool choice — even with the maintenance-only roadmap caveat.

Honest caveats

Three things readers should know before treating this comparison as settled.

First, the vendor landscape shifted twice in March 2026. Promptfoo joined OpenAI; Helicone joined Mintlify with its roadmap moved to maintenance-only¹⁰¹¹. LangSmith’s non-LangChain SDK has continued to mature through 2026 and is the most actively-developed of the three commercial offerings as of writing. Re-read each tool’s release notes and ownership state before treating any specific feature comparison or roadmap claim as load-bearing.

Second, pricing in USD is the floor for Indian teams; the actual cost includes forex spread (typically 1 to 3 per cent depending on the card or remittance path) plus 18 per cent GST on imported services where applicable. The published USD numbers in this article are the vendor list price; the landed cost in INR is higher. For teams optimising for unit economics, the self-hosted path of Helicone or the open-source CLI of Promptfoo eliminates this friction at the cost of operational overhead.

Third, tool choice does not substitute for engineering discipline. A team that picks the right tool but never writes assertions, never reviews traces, and never acts on cost dashboards will not get the benefit any of these tools promise. Tool choice is a forcing function for the practice; the practice is what determines whether the LLM application is reliable, cheap, and improving over time.

For a dev team in 2026, the tooling decision is meaningful but is not the highest-payoff choice. The eval discipline (do we have test cases?), the cost discipline (do we know cost per feature?), and the failure-mode discipline (do we review traces from failures, not just passing requests?) weigh more heavily over a 12-month horizon than the specific tool stack. Pick the tools whose mental models match the team’s actual bottlenecks, and spend the engineering time on the discipline the tools enable.

How this article was made: an autonomous AI pipeline researched, drafted, fact-checked, and reviewed this piece, aggregating publicly-available information from the sources consulted below. AI (artificial intelligence) can make mistakes, so please cross-check the consulted sources before acting on anything here. Neural Tech Daily is not liable for decisions or outcomes based on this article.

Sources consulted

Cited Sources

1. Promptfoo project home: open-source CLI and library for testing LLM applications; MIT licence; runs locally by default with optional hosted Cloud tier for shared dashboards, vulnerability scanning, and CI integration. (accessed 2026-05-05) ↩
2. Promptfoo documentation introduction: YAML-config-driven prompt evaluation; assertions cover deterministic checks (contains, regex, JSON schema) and LLM-as-judge graders; runs prompts across multiple models and produces a side-by-side comparison matrix. (accessed 2026-05-05) ↩
3. LangSmith product home: production tracing and evaluation platform from the LangChain team; first-party integration with LangChain and LangGraph; one environment variable enables auto-tracing of every chain step. (accessed 2026-05-05) ↩
4. LangSmith documentation: trace view shows the full execution tree across chains, agents, tools, retrievers; datasets let developers pin traces and run reference-based or LLM-as-judge graders; non-LangChain integration via the LangSmith SDK is supported but requires more wiring than the LangChain path. (accessed 2026-05-05) ↩
5. LangSmith pricing page: Developer tier free with 5,000 traces per month and one seat; Plus tier from \$39 per seat per month with 10,000 traces included plus volume-based trace add-ons; Enterprise tier on request. USD billing. (accessed 2026-05-05) ↩
6. Helicone product home: production LLM observability platform; proxy-mode integration for OpenAI-compatible APIs is a one-line base URL change; SDK and async-logger paths for non-proxy integrations; dashboard surfaces token usage, cost per request, cost per user, latency; gateway features include caching, rate limits, and fallbacks. (accessed 2026-05-05) ↩
7. Helicone pricing page: free tier with 10,000 requests per month and basic dashboards; Pro tier from \$79 per month with unlimited request retention and team features; volume-based add-ons for higher request counts; self-hosted path eliminates billing surface entirely. (accessed 2026-05-05) ↩
8. Helicone GitHub repository (Helicone/helicone): Apache 2.0 licence; documented self-hosting path with Postgres, Clickhouse, and Helicone services; suitable for teams with data-residency constraints under DPDP Act or sector-specific rules. (accessed 2026-05-05) ↩
9. Promptfoo GitHub repository (promptfoo/promptfoo): MIT licence; CLI exits non-zero when assertions fail; integrates with GitHub Actions and other CI runners for per-PR prompt-quality gating. (accessed 2026-05-05) ↩
10. OpenAI acquires Promptfoo (announcement, 9 March 2026): the open-source Promptfoo CLI continues under existing MIT licence and OpenAI stewardship; commercial Promptfoo Cloud tier joined OpenAI's evals offering. Roadmap impact for non-OpenAI providers is worth tracking; vendor-neutral framing the project leaned on through 2025 has shifted shape. (accessed 2026-05-19) ↩
11. Mintlify acquires Helicone (announcement, 3 March 2026): Mintlify positioned the deal as a documentation-and-observability suite; Helicone roadmap is now in maintenance mode (security patches, bug fixes, and new model support continue to ship per Helicone's joining-Mintlify post; active feature development has ended). Existing deployments and the open-source self-hosted path continue to function. (accessed 2026-05-19) ↩