Well Tested Team

Release confidence editorial

What is vibe testing? A category for testing AI-built apps

Vibe testing is QA for the AI-coding generation: a real browser loop that turns a one-sentence goal into a public, shareable proof page. Here is what the category is, what it isn't, and how to use it without overpromising.

Vibe testing is the practice of running real browser checks against apps built (or modified) with AI coding tools, with results that are public and shareable. It is the QA category for the AI-coding generation — built for the moment a PM types "log in and check onboarding" into a chat and expects a result, not a ticket.

This post defines the category, separates it from adjacent ideas that are often confused with it, and lays out the honest scope: what is live in the core loop (AI planner, vision evidence, release health), what is in private beta, and what is still roadmap.


Why a new QA category now

The shape of "shipping software" changed in 2024–2026. Three things shifted at once:

  • AI coding tools (Cursor, Claude Code, Copilot, v0, Lovable, Bolt) let one person push to a deploy URL in an afternoon.
  • AI-built apps generate copy, routes, and UI faster than anyone can write tests for them.
  • Trust signals still expect a working product: SEO, sign-up flows, onboarding, dashboards, billing.

The result is that the natural unit of work is "this URL" and "this one-sentence goal." Not "this user story in our test management system." Not "this Q4 regression sprint."

That unit is what vibe testing is designed to run against.


Definition

Vibe testing is:

  • A real browser test, dispatched against a real URL with a real Playwright (or comparable) engine.
  • Authored from a one-sentence goal, optionally refined by an LLM into a small, stable scenario.
  • Honest about what executed: passed, failed, partial, timed out, or unavailable.
  • Output as a shareable proof page — a public URL anyone with the token can read, that shows the run, the steps, and the failure mode if any.

It is not:

  • An autonomous test-generator that authors tests for your repo on every PR.
  • A visual snapshot review tool.
  • A self-healing suite that silently rewrites flaky assertions.
  • A memory graph of "what your product does."
  • A "k6-style" load test.
  • A no-signup /check endpoint for live production monitoring.

Those are useful adjacent ideas, but they aren't vibe testing, and we do not market them as live until they are.


The unit of work

The smallest useful vibe test fits on one line:

URL + one-sentence goal → public proof page.

For example:

  • https://acme.com + "User can sign up with email and reach the dashboard."
  • https://acme.com/pricing + "Pricing tiers render with the right monthly and annual amounts."
  • https://acme.com/checkout + "Checkout flow accepts a Stripe test card and returns a confirmation page."

The output is a public, token-gated page that anyone can read:

  • What the URL was.
  • What the goal said.
  • What steps the LLM authored.
  • Which steps passed, which failed, what the artifact looked like.
  • An honest error category if the run failed (unavailable_mcp, timeout, invalid_target, partial).
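
To make the shape of that page concrete, here is a minimal sketch of a proof-page record. The class and field names (ProofPage, Step, artifact) are hypothetical and chosen for illustration; only the four error categories come from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Error categories named in the article; any other value is rejected.
ERROR_CATEGORIES = {"unavailable_mcp", "timeout", "invalid_target", "partial"}


@dataclass
class Step:
    description: str                # what the LLM-authored step set out to do
    passed: bool                    # whether the step succeeded in the browser
    artifact: Optional[str] = None  # e.g. a screenshot path (hypothetical field)


@dataclass
class ProofPage:
    url: str
    goal: str
    steps: List[Step] = field(default_factory=list)
    error_category: Optional[str] = None

    def __post_init__(self) -> None:
        # Refuse categories the article does not define, so a run cannot
        # be labelled with an invented failure mode.
        if self.error_category is not None and self.error_category not in ERROR_CATEGORIES:
            raise ValueError(f"unknown error category: {self.error_category}")

    def summary(self) -> str:
        # One-line summary: how many steps passed, plus the category if any.
        passed = sum(1 for step in self.steps if step.passed)
        line = f"{passed}/{len(self.steps)} steps passed"
        if self.error_category:
            line += f" ({self.error_category})"
        return line
```

A record with one passing and one failing step and the partial category would summarize as "1/2 steps passed (partial)".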

This proof page is the artifact that earns the name: URL-in, evidence-out.


Where vibe testing earns its keep

Three concrete use cases where vibe testing outperforms the alternatives:

1. After a vibe-coding session

A PM types into a chat: "log in, hit the new dashboard, take a screenshot." Vibe testing is what runs when that chat answer needs to be real. It produces a public link the PM can paste into Slack without saying "trust me, it works on my laptop."

2. Pre-launch smoke

A startup wants to confirm "sign-up, dashboard, billing" still works after a Friday deploy. Vibe testing produces three public links in three minutes — one per flow — that anyone can audit without logging in.

3. Sales proof

A founder needs to show "we tested it for real" to a prospect. A vibe-test share page is a stronger artifact than a screenshot: it shows the goal, the URL, the steps, the result. It is harder to fake and harder to misread.


The honest scope today

A vendor that promises everything is lying. Here's where things stand as of 2026-06-17:

Wired and ready (the AI-native core loop):

  • ✅ AI planner at /app/vibe: URL + one sentence produces a plan with stated assumptions, explicit missing context, and a per-step confidence score. Customers can push back on the plan before the browser dispatches.
  • ✅ Playwright MCP browser run against any URL the user provides. Real clicks, real network, real timing.
  • ✅ Vision evidence pass: a vision model scores the screenshots and the step trace to produce a verdict (passed, failed, unavailable, partial).
  • ✅ Shareable proof page at /vibe/[slug]/share?t=[token] (revocable, no login required to read).
  • ✅ Customer release-health dashboard: every run feeds a release-health entry on Growth (basic) and Scale (full + bug / regression history + exports).
  • ✅ OpenTelemetry traces for every step, correlated by welltested.vibe_run_id. SigNoz dashboard for run lifecycle, MCP outcomes, and failure categories.
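
As an illustration of the planner output described in the first bullet, the sketch below models a plan with assumptions, missing context, and per-step confidence, and lists what a customer might want to push back on before dispatch. All names here (Plan, PlannedStep, review_notes), the 0.0 to 1.0 confidence scale, and the 0.7 threshold are assumptions for the example, not the product's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlannedStep:
    action: str
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (certain)


@dataclass
class Plan:
    url: str
    goal: str
    assumptions: List[str] = field(default_factory=list)
    missing_context: List[str] = field(default_factory=list)
    steps: List[PlannedStep] = field(default_factory=list)


def review_notes(plan: Plan, min_confidence: float = 0.7) -> List[str]:
    """Collect the items worth reviewing before the browser run starts."""
    # Missing context comes first: it usually explains the low-confidence steps.
    notes = [f"Missing context: {item}" for item in plan.missing_context]
    notes += [
        f"Low confidence ({step.confidence:.2f}): {step.action}"
        for step in plan.steps
        if step.confidence < min_confidence
    ]
    return notes
```

An empty list from review_notes would mean the plan has no flagged gaps under these assumed thresholds; it says nothing about whether the run itself will pass.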

Live in private beta (allowlisted users):

  • 🚧 Stripe-billed self-serve with the 7-day trial flow (Starter, Growth) and the 3-run Free tasting menu. Trial, billing, and quota enforcement are verified end-to-end in Stripe test mode; production-mode products and the live webhook are gated on the production billing pass.
  • 🚧 k6 / API stress execution for Growth and Scale, bounded by plan limits and served through tokenized API-stress proof pages. Public launch still requires worker deployment verification.

Not live yet (and not marketed as live):

  • ❌ Autonomous test generation that reads your repo and writes PRs.
  • ❌ Self-healing assertions.
  • ❌ Production monitoring on a public URL (no-signup /check).
  • ❌ A memory graph of "everything your product does."

That list matters because it is the same list we work from internally. We don't market features we haven't shipped. The "wired and ready" bullets above are the user-facing surface we test against every deploy; the "in private beta" bullets are honest about the launch gates that are still open (the production billing pass and worker deployment verification); the "not live yet" list covers features we explicitly do not advertise.


How to run your first vibe test

  1. Sign up at /login.
  2. Open /app/vibe.
  3. Paste a URL and a one-sentence goal.
  4. Watch the run progress. When it terminates, you get a public share link.
  5. Share that link in Slack, in an investor update, or with a prospect — no login required to read it.

Pricing and quota: see /pricing.


FAQ

Is vibe testing just a UI smoke test with extra steps?

It overlaps with UI smoke testing on the "real browser against a real URL" axis. The differences are: the input is a one-sentence goal (not a scripted test), the output is a public proof page (not a JUnit report), and the distribution is shareable (not buried in CI).

Does vibe testing replace my existing test suite?

No. It is a complement, not a replacement. Use your test suite for fast PR feedback. Use vibe testing for "this URL works against this goal" — especially when the URL was just deployed or just changed.

What happens when the goal is vague?

The LLM-authored scenario is conservative: 3–6 small, verifiable steps that target stable UI text (labels, roles, visible copy, URL paths). The output is honest about what it could and could not verify. A vague goal produces a smaller scenario, not a fake pass.
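
The constraints in that answer can be expressed as a small check. This is a sketch under stated assumptions: the ScenarioStep shape and the target-kind names (label, role, text, url_path) are hypothetical stand-ins for "labels, roles, visible copy, URL paths", and the real planner may enforce them differently.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical names for the stable target kinds listed above.
STABLE_TARGET_KINDS = {"label", "role", "text", "url_path"}


@dataclass
class ScenarioStep:
    target_kind: str  # how the step locates its target
    target: str       # e.g. "Sign up" or "/dashboard"
    expectation: str  # what should be observable after the step


def scenario_problems(
    steps: List[ScenarioStep], min_steps: int = 3, max_steps: int = 6
) -> List[str]:
    """Return reasons a scenario falls outside the conservative shape."""
    problems = []
    if not (min_steps <= len(steps) <= max_steps):
        problems.append(f"expected {min_steps}-{max_steps} steps, got {len(steps)}")
    for index, step in enumerate(steps, start=1):
        if step.target_kind not in STABLE_TARGET_KINDS:
            problems.append(
                f"step {index} uses unstable target kind '{step.target_kind}'"
            )
    return problems
```

A scenario that targets a raw CSS selector, for example, would be flagged here rather than run and reported as a pass.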

Can I revoke a public share link?

Yes. The share URL carries a 32-character token. Revoking the token 404s the link without breaking the underlying run.
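
A minimal sketch of how a revocable, token-gated link like this could work. The ShareLinks class is hypothetical and in-memory only, and the token format is an assumption: the article says 32 characters, and this example uses 32 hexadecimal characters from Python's secrets module. The path shape follows the /vibe/[slug]/share?t=[token] pattern given earlier.

```python
import secrets
from typing import Dict
from urllib.parse import urlencode


def new_share_token() -> str:
    # 16 random bytes rendered as hex gives a 32-character token.
    return secrets.token_hex(16)


def share_path(slug: str, token: str) -> str:
    return f"/vibe/{slug}/share?{urlencode({'t': token})}"


class ShareLinks:
    """In-memory token registry; run data is stored elsewhere and untouched."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}  # token -> run slug

    def issue(self, slug: str) -> str:
        token = new_share_token()
        self._active[token] = slug
        return share_path(slug, token)

    def revoke(self, token: str) -> None:
        # Removing the token invalidates the link; the run itself is not deleted.
        self._active.pop(token, None)

    def status_for(self, slug: str, token: str) -> int:
        # 200 only when the token is active and belongs to this slug.
        return 200 if self._active.get(token) == slug else 404
```

Keeping tokens separate from run records is what lets revocation return a 404 for the link while the underlying run stays intact.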

What if Playwright MCP is unavailable?

The share page says so, explicitly, with an unavailable_mcp failure category. The intent (URL + goal) is preserved; the result is not faked.


Summary

  • Vibe testing is the QA category for the AI-coding generation.
  • The unit is URL + one-sentence goal → public proof page.
  • The proof is honest: passed, failed, partial, timed out, or unavailable — never faked.
  • The artifact is shareable: a tokenized public link anyone with the token can read.
  • The scope is explicit: we list what is live and what is not, the same list we run on internally.

If you want to put this loop behind your next deploy, the entry point is /app/vibe. The pricing is at /pricing. The canonical reference is this page.

Vibe testing · live in private beta
Test a URL with the AI planner.
Paste a URL and one sentence — the AI planner drafts a Playwright scenario, the browser runs it, the vision pass scores it, and you get a shareable proof page.

Free gives you 3 runs per month to prove the loop; Starter and Growth have a 7-day trial (card on file, no charge if you cancel).