From the team that tested Chrome & Search

AI for Confidence

Free AI quality report · No credit card · ~15 min
The First AI Model
Built for QA & Testing
Spend pennies on AI compute. Save dollars in human QA hours. Faster findings, fewer false positives — at a fraction of the cost of a single manual tester.
TAI Model · Testers.AI

An AI Model Built for Software Quality and Confidence

ChatGPT & Claude vs. TAI Model

  • ChatGPT & Claude: Trained to sound helpful, not test. TAI Model: Built to test, not to chat.
  • ChatGPT & Claude: Hallucinate bugs that don't exist. TAI Model: Every finding grounded in evidence.
  • ChatGPT & Claude: False positives waste engineer time. TAI Model: 5 modes tune precision vs. recall.
  • ChatGPT & Claude: Not focused on QA tasks. TAI Model: Fine-tuned on QA data only.
  • ChatGPT & Claude: Thousands of lost hours at scale. TAI Model: Ensemble-orchestrated for coverage.

  • ⚖️ Balanced (balanced): Best overall F1. Good precision and recall. Good for: general QA, sprint testing.
  • 🪨 Groundedness (grounded): Almost no hallucinations. Good for: exec reporting.
  • 🎯 High Precision (precision): Near-zero false positives. Good for: sign-off gates.
  • 🔍 Maximum Recall (recall): Catches more, trades some precision. Good for: exploratory testing.
  • 🌐 Discovery (discovery): Finds novel or unexpected issues. Good for: first-time audits.
  • Fine-tuned (tuned, 0.5×): Fastest and cheapest. Good for: high-volume CI.

Balanced: Best overall F1. Good precision and recall. Default choice for most teams.
⚡ General QA · sprint testing

  • Precision: 72%
  • Recall: 68%
  • Issues found: 60%
  • Discovery: 45%
  • Groundedness: 80%

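
For readers who want to see how the "best overall F1" claim relates to the precision and recall figures shown for the balanced mode, here is a minimal sketch. It assumes those two percentages are precision and recall in the usual statistical sense and applies the standard F1 definition (their harmonic mean). The function name is illustrative, not part of the product.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Balanced-mode figures from the page, read as fractions (an assumption).
balanced_f1 = f1_score(0.72, 0.68)
print(f"{balanced_f1:.3f}")  # prints 0.699
```

Because F1 is a harmonic mean, it drops quickly when either input is low, which is why a mode tuned purely for precision or purely for recall would not be expected to score best on it.
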
A fast triage agent filters noise first, then specialist sub-agents handle precision, grounding, and discovery in parallel. Those extra pennies per run save dollars of human QA time and compress release cycles. Single-tier models fall short here: they offer no way to trade precision against recall, so every false positive or missed bug falls back on your engineers.
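
The two-stage shape described above (triage first, then specialists in parallel) can be sketched roughly as follows. This is an illustration of the pattern only, not the TAI Model's implementation: the finding records, the `noise` flag, and every function name are hypothetical, and the "agents" are stubs where a real system would call a model.

```python
import asyncio

# Hypothetical finding records; the field names are illustrative only.
RAW_FINDINGS = [
    {"id": 1, "text": "Checkout button unresponsive", "noise": False},
    {"id": 2, "text": "Console log line, no user impact", "noise": True},
]


async def triage(findings):
    """Stage 1: a cheap filter that drops obvious noise before the costly stage."""
    return [f for f in findings if not f["noise"]]


async def specialist(name, findings):
    """Stage 2: stand-in for one specialist sub-agent; it only records finding ids."""
    await asyncio.sleep(0)  # placeholder for a model call
    return {name: [f["id"] for f in findings]}


async def run_pipeline(findings):
    kept = await triage(findings)
    # The three specialists run concurrently over the triaged findings.
    results = await asyncio.gather(
        specialist("precision", kept),
        specialist("grounding", kept),
        specialist("discovery", kept),
    )
    merged = {}
    for result in results:
        merged.update(result)
    return merged


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(RAW_FINDINGS)))
```

Running the specialists with `asyncio.gather` means the second stage takes roughly as long as its slowest member rather than the sum of all three.
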


AI Testing Plugin · Early Access

Test as you build: inside Claude, Codex & more

The AI testing plugin runs directly inside your coding environment. As your AI writes code, it audits in real time — finding bugs, regressions, and quality gaps before they ever reach review.

Claude Code
MCP Plugin
Integrates directly into Claude Code as an MCP server. Every file your agent touches gets tested — bugs surface as inline findings before you commit.
Codex / Antigravity / VS Code
Extension
A VS Code extension that integrates with Codex, Antigravity, and Copilot workflows. Run quality checks on any diff, surfacing real bugs — not just linting — without leaving your editor.
Zero setup, zero context switch
Testing happens where you already are. No dashboard, no separate tool, no tab switching — just findings inline as you work.
🎯
Real bugs, not noise
30 specialized AI testers — accessibility, security, UX, performance — run in parallel. Every finding is grounded, prioritized, and actionable.
🔁
Continuous, not gated
Testing runs on every change, not at release. Catch regressions the moment AI introduces them — before they compound into incidents.
🧠
Quality aware of your context
The plugin understands your stack, your user flows, and your risk areas. Findings are scoped to what matters in your product — not generic warnings.
Free during early access · We'll reach out with setup instructions
30 AI testing specialists — run in parallel on every report

Can I bring my own LLM?

Yes. Pick from Anthropic Claude, OpenAI (GPT-4o / GPT-5 etc.), Google Gemini, or Azure OpenAI. For fully air-gapped or zero-egress setups, point the platform at a self-hosted endpoint (Ollama, vLLM, LocalAI, or any OpenAI-compatible API). Provider + model are passed per-request via the provider / model fields, or set globally per deployment.

Can I self-host on my own private network?

Yes — three ways:

  • Docker / Docker Compose — one-line bring-up via cloud/enterprise/docker-compose.yml.
  • Kubernetes — manifests in cloud/enterprise/kubernetes/; tested on EKS, GKE, AKS, and bare-metal k3s.
  • Single VM — clone, set ADMIN_TOKEN + LLM key, docker compose up -d --build. Up in under 10 minutes.

All three ship as the same Node + Playwright + cloudflared image, with Firestore (or any Firestore-API-compatible backend) for metadata and a configurable object store for artifacts.

Can I run fully air-gapped?

Yes. Pair a self-hosted deployment with a self-hosted LLM endpoint (Ollama / vLLM / LocalAI) and the entire system runs without outbound internet — neither testers.ai nor any LLM vendor sees your traffic or your reports. The hosted UI, the runner, the LLM call, and the artifact store all live on your network.

How do I tunnel into private / VPN-protected targets?

The runner can bring up a tunnel for the duration of a single test, then tear it down. Supported tunnel types:

  • Tailscale — join the runner to your tailnet; address the target by its tailnet hostname.
  • cloudflared — runs the Cloudflare connector inside the runner container.
  • ngrok — for ad-hoc reverse tunnels.
  • SSH reverse — opens an SSH reverse forward to your jump host.
  • WireGuard, OpenVPN, IPSec — supported on self-hosted deployments.
  • GCP VPC connector — for managed Cloud Run deployments inside your GCP project.
  • Reverse proxy — pass-through if your target is already exposed via a corporate reverse-proxy host.

Can I import / export tests + findings?

Yes. Every stored report renders to multiple formats on demand:

  • JSON — full report (issues, severity, evidence, persona reviews, flow steps, screenshots, timing). Stable schema, version-tagged. GET /r/:id.json
  • Markdown — a human-readable report with embedded screenshots and one fix-prompt per issue. GET /r/:id.md
  • TXT — a flat list of every issue's prompt-to-fix-this-issue, ready to pipe into your AI coding agent (Claude, Cursor, Copilot, Antigravity).
  • HTML — the shareable web report (/r/:id), with the report itself shareable as a permanent URL.

Test cases can also be exported to CSV, Jira, TestRail, or Xray directly from the chat UI.
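
As a rough illustration of the export routes listed above, the helper below builds the URL for each format that has a documented path. It assumes the `/r/:id`, `/r/:id.json`, and `/r/:id.md` routes hang off the hosted report base URL. The TXT format is left out because no route is given for it, and `abc123` is a made-up report ID.

```python
from urllib.parse import quote

# Hosted base URL; a self-hosted deployment would use its own (assumption:
# the export routes share this base).
HOSTED_BASE_URL = "https://reports.jank.ai"

# Suffix per format, taken from the routes named in the list above.
EXPORT_SUFFIXES = {"html": "", "json": ".json", "md": ".md"}


def export_url(report_id: str, fmt: str, base_url: str = HOSTED_BASE_URL) -> str:
    """Return the URL for one export format of a stored report."""
    if fmt not in EXPORT_SUFFIXES:
        raise ValueError(f"unsupported format: {fmt!r}")
    safe_id = quote(report_id, safe="")
    return f"{base_url.rstrip('/')}/r/{safe_id}{EXPORT_SUFFIXES[fmt]}"


print(export_url("abc123", "md"))  # prints https://reports.jank.ai/r/abc123.md
```
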

Can reports be shared?

Yes. Every run gets a permanent shareable URL (https://reports.jank.ai/r/<id> on hosted, or your equivalent base URL on self-host). You choose visibility: "public" (anyone with the link can view the report) or visibility: "private" (admin-token gated). An optional emails list sends a "report ready" email when a run completes.

How long does a run take?

A full multi-dimensional run (bug finding + exploratory + functional + competitive + personas + accessibility + crawl) typically lands in ~12–15 minutes. Smaller scoped runs (single-page bugs only, no personas, no flows) finish in 3–5 minutes. Every agent runs in parallel — adding more dimensions doesn't multiply the runtime, it just lights up more lanes.

What can I configure per run?

  • URLs: 1 to 25 per submission, batch-mode supported.
  • Subpages: let the AI pick N additional pages from the entry URL (or disable).
  • Flows: generate N test flows; pass customPrompt to steer the agent (e.g., "focus on the checkout funnel").
  • Personas: generate N persona reviews with optional customPrompt to bias toward your audience.
  • Provider + model: pick LLM per-run.
  • Visibility: public or private (admin-token gated).
  • Tunnel spec: Tailscale, cloudflared, ngrok, SSH, WireGuard, OpenVPN, IPSec, GCP VPC.
  • Email notifications: comma-separated list of recipients per run.
  • Custom checks: per-brand / per-customer test rules layered on top of the standard suite.
  • Label: free-form tag for grouping in the admin dashboard.
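
A client might sanity-check a few of these options before submitting. The sketch below is hypothetical: the `RunConfig` class and its attribute names are invented for illustration and are not the platform's request schema. It checks the 1 to 25 URL limit, restricts visibility to the two values described in the sharing answer, and splits a comma-separated recipient string.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_URLS = 25  # "1 to 25 per submission"
VISIBILITIES = ("public", "private")


@dataclass
class RunConfig:
    """Hypothetical client-side container for a subset of the per-run options."""
    urls: List[str]
    visibility: str
    provider: Optional[str] = None
    model: Optional[str] = None
    emails: List[str] = field(default_factory=list)
    label: str = ""

    def validate(self) -> None:
        if not 1 <= len(self.urls) <= MAX_URLS:
            raise ValueError(f"expected 1 to {MAX_URLS} URLs, got {len(self.urls)}")
        if self.visibility not in VISIBILITIES:
            raise ValueError(f"visibility must be one of {VISIBILITIES}")


def parse_recipients(raw: str) -> List[str]:
    """Split a comma-separated recipient string, dropping blanks and whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


config = RunConfig(
    urls=["https://example.com"],
    visibility="private",
    emails=parse_recipients("qa@example.com, lead@example.com"),
)
config.validate()  # raises ValueError if a limit is violated
```
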

Does it have a REST API + CLI?

Yes. POST /api/reports with a JSON list of URLs and the runner returns report IDs immediately; poll GET /api/reports/:id for status, fetch /r/:id.json for the result. Auth is via an X-Api-Key header. There's also a scripts/submit.sh curl wrapper bundled with the cloud package, and a CLI runner for CI pipelines (GitHub Actions, GitLab CI, Jenkins, CircleCI).
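
The submit-then-poll flow described above might look roughly like this with only the Python standard library. Several details are assumptions rather than documented behavior: the body is sent as a bare JSON array (one reading of "a JSON list of URLs"; the real body may be an object that also carries fields such as provider and model), and the status response is assumed to expose a `state` field that becomes `"done"`. The function names are illustrative.

```python
import json
import time
import urllib.request


def build_submit_request(base_url, api_key, urls):
    """Build (but do not send) the POST /api/reports request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/reports",
        data=json.dumps(urls).encode("utf-8"),  # assumed body shape: bare JSON array
        method="POST",
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
    )


def fetch_json(request):
    """Send a request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_report(base_url, api_key, report_id, interval_s=15.0,
                    max_polls=120, fetch=fetch_json, sleep=time.sleep):
    """Poll GET /api/reports/:id until the assumed 'state' field reads 'done'."""
    status_request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/reports/{report_id}",
        headers={"X-Api-Key": api_key},
    )
    for _ in range(max_polls):
        status = fetch(status_request)
        if status.get("state") == "done":  # field name and value are assumptions
            return status
        sleep(interval_s)
    raise TimeoutError(f"report {report_id} not done after {max_polls} polls")
```

Once the status reads done, the result can be fetched from /r/:id.json. The `fetch` and `sleep` parameters exist so the polling loop can be exercised without a live server.
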

What about admin / ops?

An admin dashboard at /admin shows every report, its queue/running/done state in real time, with one-click retry on failures. Per-key quotas, per-account demo limits, and a separate ops API (see docs/api-internal.md) cover the operator side. Artifacts are versioned in object storage; metadata and run state live in Firestore (or a Firestore-compatible store on self-host).

IcebergQA · Expert-Managed AI Testing

Want AI Testing Experts to Run It For You?

IcebergQA is a different category of QA service — built around AI from the start, not retrofitted onto a manual practice. Senior QA engineers run the AI agents, curate the findings, and deliver a clear roadmap. You get the speed of automation with the judgment of experience — so your team ships with confidence, no matter how fast.

500+ client engagements
20+ years of QA services delivery
10+ year avg client tenure
30+ AI testing agents across web, mobile & API
Most requested
🔄
Convert Existing Tests to AI
Your Selenium or Playwright suite took years to build. We migrate it to AI-native automation — faster execution, self-healing selectors, and real-quality signals — without throwing away what works.
Selenium → AI · Playwright → AI · Manual scripts
Add value immediately
Add AI to Your Existing Tests
Not ready to migrate? We layer AI-powered quality checks on top of your current suite — catching what your existing tests miss without disrupting your workflow or CI pipeline.
No migration needed · CI/CD ready · Instant coverage lift
Fully bespoke
🎨
Custom & Bespoke AI Testing
We work with you as much or as little as you like — consultative, AI-first, and deeply technical. From a one-time audit to an embedded quality partner, the engagement is shaped entirely around your product and team.
Consultative · AI-first · Any scope
🔍
Deep Audit & Competitive Benchmarking
Full analysis of your existing QA stack with live benchmarking against category competitors.
🤝
Collaborative Roadmap
We build the testing strategy with you — then execute as a full managed service or embedded partner.
📈
Flexible & Scalable
Starts with a $5,000 discovery month — no long-term contract. Scale up or partner as needed.
🎯
Risk-Based Coverage Planning
Prioritize by AI risk signals and business context — not just code coverage numbers.
IcebergQA is led by Jason Arbon (testers.ai, ex-Google/Microsoft) and Phil Lew (XBOSoft, 20+ years enterprise QA), the team behind testers.ai.
Confidence Engineering · Open Manifesto
"The next era of engineering will be defined by who can justify confidence in what they generate."

AI now generates more code than any team can fully read or understand. Software ships continuously — faster than the verification models built around it.

94% code coverage and 10,000 passing tests are not confidence. They are the illusion of diligence. Vanity metrics that give teams permission to ship without evidence.

Perfect software isn't achievable. But confidence is. Confidence is knowing — with real evidence — what works, what breaks, and where risk lives right now.

Ship with real evidence, not optimism. The teams that win in the GenAI era will be the ones who can prove what they're shipping is ready.

Real-world outcomes over synthetic validation
For people, AI agents, APIs, and the systems that depend on them
The testing coverage gap — what's being missed
The gap between what's tested and what's actually shipping
Testing frequency vs. shipping frequency
Shipping frequency has outpaced verification cadence
Evidence over opinion
Verification over assumption
Judgment over blind automation
Risk over vanity metrics
Connection types · Starter plan
Configure these when submitting a report on Starter plan and above — test behind login walls, staging environments, and private networks.