Getting ChatGPT to Write Accurate Playwright Test Scripts Without Flaky Selectors

June 30, 2026

You paste in a user story, ask ChatGPT to write a Playwright test, and get back something that looks perfectly reasonable. Then it fails on the first run because the model reached for div.btn-primary instead of a stable selector, and your CI pipeline turns red before anyone even reviews the feature.

The problem is not that ChatGPT cannot write Playwright tests. It can. The problem is that without explicit constraints, it defaults to whatever selector pattern it saw most often in its training data, which skews toward class names, XPath, and positional selectors that are fragile by design.

What You'll Learn

  • Why ChatGPT gravitates toward unstable selectors and how to override that default
  • Four prompt patterns that consistently produce stable, role-based Playwright tests
  • How to give the model enough context to avoid guessing at your DOM structure
  • Common async handling mistakes in AI-generated tests and how to catch them before they hit CI
  • A review checklist for hardening ChatGPT output before committing it

Prerequisites

This guide assumes you already have a Playwright project set up (v1.30+ recommended) and are comfortable writing basic tests yourself. You do not need any special ChatGPT subscription; these prompting techniques work with GPT-4o and GPT-4 Turbo. Examples are in TypeScript, but the concepts apply equally to JavaScript.

The Root Cause: How ChatGPT Picks Selectors by Default

Large language models are pattern-matching engines. When ChatGPT sees a request like "write a Playwright test for a login form," it generates code that resembles the Playwright tests most common in its training data. Publicly available test suites lean heavily on CSS class selectors (.login-btn), nth-child tricks, and text matching against raw display strings.

None of those are inherently wrong, but they couple your tests tightly to implementation details. A designer renames btn-primary to btn-cta, or a translator updates the button label, and the test breaks, even though the feature works perfectly.

Playwright's own best-practice guidance pushes toward user-facing locators such as getByRole and getByLabel, with getByTestId as the explicit fallback when no role or label fits. ChatGPT knows these APIs exist, but it often will not reach for them unless you explicitly tell it to. That is the lever you have.

Prompt Pattern 1: Enforce Role-Based and Test-ID Selectors

The single most effective change you can make is adding a hard constraint list to your prompt. State the allowed selector strategies explicitly and forbid the fragile ones.

Here is a reusable constraint block you can paste at the top of any Playwright prompt:

Selector rules (follow strictly):
1. Prefer getByRole() with a name option wherever a semantic role exists.
2. Use getByLabel() for form fields that have associated labels.
3. Use getByTestId() for elements that have a data-testid attribute.
4. Use getByText() only for asserting visible content, not for clicking.
5. Never use CSS class selectors, ID selectors, or XPath.
6. Never use nth-child or positional selectors.

Pair that with a concrete request:

Write a Playwright TypeScript test that:
- Navigates to /login
- Fills the email field (label: "Email address")
- Fills the password field (label: "Password")
- Clicks the submit button (role: button, name: "Sign in")
- Asserts the user lands on /dashboard

Apply the selector rules above. Use async/await throughout. Do not use page.waitForTimeout().

The output you get from that prompt is dramatically more stable than the default. A typical result looks like this:

import { test, expect } from '@playwright/test';

test('user can log in with valid credentials', async ({ page }) => {
  await page.goto('/login');

  await page.getByLabel('Email address').fill('user@example.com');
  await page.getByLabel('Password').fill('s3cr3t');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await expect(page).toHaveURL('/dashboard');
});

Compare that to what you get without constraints: page.locator('.login-form input[type="email"]').fill(...). The unconstrained version breaks if you restructure the form or update your CSS library.

Prompt Pattern 2: Give ChatGPT Your Component Structure

ChatGPT hallucinates selectors when it has to guess your DOM. The fix is simple: paste in the relevant HTML or JSX and tell it to derive selectors from that, not from assumptions.

You do not need to share your entire component. A trimmed snapshot of the interactive elements is enough:

Here is the relevant JSX for the checkout form:

<form aria-label="Checkout">
  <label htmlFor="card-number">Card number</label>
  <input id="card-number" data-testid="card-number-input" />

  <label htmlFor="expiry">Expiry date</label>
  <input id="expiry" data-testid="expiry-input" />

  <button type="submit">Pay now</button>
</form>

Using the selector rules above, write a Playwright test that fills out this form and submits it.
Derive all selectors from the actual markup. Do not invent attributes that are not present.

That last sentence matters. ChatGPT will sometimes add data-testid attributes that do not exist in your real markup. Telling it explicitly to use only what is present in the snippet you provided cuts that behaviour significantly.
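A quick way to catch invented attributes is to compare the test IDs the generated test uses against the ones in the snippet you pasted. The sketch below is a hypothetical helper, not part of Playwright: the function names are made up for illustration, and it assumes test IDs appear as plain string literals (it will not see IDs built from variables or JSX expressions).

```typescript
// Hypothetical helper: lists data-testid values that a generated test uses
// but that never appear in the markup you pasted into the prompt.

/** Returns the first capture group of every match of `pattern` in `source`. */
function collectMatches(source: string, pattern: RegExp): string[] {
  // Rebuild the regex with the global flag so exec() advances through the text.
  const globalPattern = new RegExp(pattern.source, 'g');
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = globalPattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

/** Returns the test IDs the test references but the markup never declares. */
function findInventedTestIds(markup: string, testSource: string): string[] {
  const declared = collectMatches(markup, /data-testid=["']([^"']+)["']/);
  const used = collectMatches(testSource, /getByTestId\(\s*["'`]([^"'`]+)["'`]\s*\)/);
  return used.filter((id) => declared.indexOf(id) === -1);
}

// Example input: the markup from the prompt and a made-up generated test.
const markup = `
  <input id="card-number" data-testid="card-number-input" />
  <input id="expiry" data-testid="expiry-input" />
`;
const generatedTest = `
  await page.getByTestId('card-number-input').fill('4242 4242 4242 4242');
  await page.getByTestId('cvc-input').fill('123');
`;

// 'cvc-input' is not in the markup, so it is reported as invented.
console.log(findInventedTestIds(markup, generatedTest));
```

Anything this reports is either an attribute you still need to add to the component or a selector to send back to ChatGPT for correction.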

If your application does not have data-testid attributes yet, this is a good moment to add them to your most-tested components. They are a low-cost investment with a high payoff in test stability. You can mention that to ChatGPT too: "Add data-testid attributes to the JSX and then write the test using those attributes." It will generate both the markup changes and the test together.

Prompt Pattern 3: Specify Async and Waiting Strategy Explicitly

After selector problems, async handling is the second most common source of flakiness in AI-generated Playwright tests. ChatGPT sometimes inserts page.waitForTimeout(2000) as a lazy way to handle animations or network delays. That is a hard-coded sleep, and it will either make your suite slow or fail in a different environment where the delay is longer.

Playwright has proper waiting built in: page.waitForURL(), page.waitForResponse(), expect(locator).toBeVisible(), and locator assertions that auto-retry. Tell ChatGPT to use those instead.

Async waiting rules:
- Never use page.waitForTimeout().
- Use await expect(locator).toBeVisible() to wait for elements to appear.
- Use await page.waitForURL() after navigation.
- Use await page.waitForResponse() if the action triggers a network request that must complete before the assertion.
- Locator actions already auto-wait for the element to be actionable, and expect() locator assertions auto-retry; do not add manual retries around them.

Here is an example of the pattern applied to a scenario involving a network request:

test('submitting the form saves data to the API', async ({ page }) => {
  await page.goto('/settings/profile');

  await page.getByLabel('Display name').fill('Ada Lovelace');

  const saveResponse = page.waitForResponse(
    (resp) => resp.url().includes('/api/profile') && resp.status() === 200
  );

  await page.getByRole('button', { name: 'Save changes' }).click();
  await saveResponse;

  await expect(page.getByRole('alert')).toHaveText('Profile saved.');
});

Notice page.waitForResponse() is set up before the click, not after. This is a race-condition trap that ChatGPT frequently gets wrong when you do not specify the ordering. If you see the model setting up the response waiter after the triggering action, flag it explicitly in a follow-up message: "Move the waitForResponse setup to before the click to avoid a race condition."
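If the ordering rule feels arbitrary, a toy model can make it concrete. The sketch below is not Playwright's implementation; FakeNetwork and observedResponses are hypothetical names, and the response is delivered synchronously to keep the example deterministic. In a real browser the response arrives asynchronously, so the wrong order fails only some of the time, which is exactly what makes it flaky.

```typescript
// Toy model of "start listening for a response" versus "trigger the response".
// This is NOT how Playwright works internally; it only illustrates ordering.

type ResponseListener = (url: string) => void;

class FakeNetwork {
  private listeners: ResponseListener[] = [];

  /** Registers a listener; it only sees responses emitted after this call. */
  onResponse(listener: ResponseListener): void {
    this.listeners.push(listener);
  }

  /** Delivers a response to whoever is listening at this moment. */
  emitResponse(url: string): void {
    for (const listener of this.listeners) {
      listener(url);
    }
  }
}

/** Runs one scenario and returns the responses the waiter observed. */
function observedResponses(waiterFirst: boolean): string[] {
  const network = new FakeNetwork();
  const seen: string[] = [];
  const startWaiting = () => network.onResponse((url) => seen.push(url));
  const click = () => network.emitResponse('/api/profile');

  if (waiterFirst) {
    startWaiting(); // waiter registered before the action, as in the test above
    click();
  } else {
    click(); // the response fires before anyone is listening
    startWaiting();
  }
  return seen;
}

console.log(observedResponses(true)); // the waiter saw the response
console.log(observedResponses(false)); // the waiter saw nothing
```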

This kind of attention to detail in AI-generated async code mirrors what you would watch for in ChatGPT-generated retry logic, where similar timing assumptions create silent bugs.

Prompt Pattern 4: Ask for Page Object Model Structure

If you are generating tests for a feature that you will revisit often, ask ChatGPT to produce a Page Object Model (POM) instead of a flat test function. A POM encapsulates all the selectors in one class, so when your UI changes, you update one file instead of hunting across a test suite.

Generate a Playwright Page Object Model class for the login page at /login.
Apply the selector rules above.
The class should expose methods: goto(), fillEmail(email), fillPassword(password), submit(), and a getter currentUrl.
Then write one test that uses this class to verify a successful login.

A well-structured output looks like this:

// login-page.ts
import { type Page, type Locator } from '@playwright/test';

export class LoginPage {
  readonly page: Page;
  readonly emailInput: Locator;
  readonly passwordInput: Locator;
  readonly submitButton: Locator;

  constructor(page: Page) {
    this.page = page;
    this.emailInput = page.getByLabel('Email address');
    this.passwordInput = page.getByLabel('Password');
    this.submitButton = page.getByRole('button', { name: 'Sign in' });
  }

  async goto() {
    await this.page.goto('/login');
  }

  async fillEmail(email: string) {
    await this.emailInput.fill(email);
  }

  async fillPassword(password: string) {
    await this.passwordInput.fill(password);
  }

  async submit() {
    await this.submitButton.click();
  }

  get currentUrl() {
    return this.page.url();
  }
}

// login.spec.ts
import { test, expect } from '@playwright/test';
import { LoginPage } from './login-page';

test('successful login redirects to dashboard', async ({ page }) => {
  const loginPage = new LoginPage(page);
  await loginPage.goto();
  await loginPage.fillEmail('user@example.com');
  await loginPage.fillPassword('s3cr3t');
  await loginPage.submit();

  await expect(page).toHaveURL('/dashboard');
});

The POM approach also gives ChatGPT a template to follow in subsequent prompts. Once you have a base class, you can paste it in and say "write three more tests using this LoginPage class" and the model will stay consistent with your existing structure.

Common Pitfalls in AI-Generated Playwright Tests

Chaining locators by index

ChatGPT sometimes writes page.locator('button').nth(0).click() when it cannot identify a unique selector. This is a red flag. If no unique role or label distinguishes the button, your prompt should tell the model to ask you for a data-testid rather than guessing positionally.

Missing baseURL configuration

Generated tests often hard-code full URLs like https://staging.example.com/login. If your Playwright config sets a baseURL, the test should use a relative path instead. Add this to your prompt: "Use relative paths in page.goto(); do not hard-code a domain."

Assertions that never fail

A subtle one: await expect(page.locator('body')).toBeVisible() will always pass because the body is always present. ChatGPT inserts these as filler when it is uncertain what to assert. Specify exactly what the test should confirm: the URL, a specific heading, a success message. Vague test goals produce vague assertions.

Forgetting to handle authentication state

If your test requires a logged-in user, ChatGPT will often replay the full login flow in every test. In Playwright you can store authentication state with storageState and reuse it. Mention this in your prompt if it applies: "Assume auth state is set up via storageState in the project config. Do not include a login flow in this test."

This kind of context-setting is similar to what you need when prompting ChatGPT for background task configs β€” the model needs to know what the surrounding infrastructure already handles so it does not duplicate it.
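For reference, the baseURL and storageState pitfalls above both assume configuration along these lines. This is a minimal sketch, not your project's actual config: the staging URL is the placeholder domain used earlier, and the storage file path is an assumed location that a separate setup step would need to write (for example by calling page.context().storageState({ path }) after signing in).

```typescript
// playwright.config.ts (sketch; the URL and the file path are placeholders)
import type { PlaywrightTestConfig } from '@playwright/test';

const config: PlaywrightTestConfig = {
  use: {
    // Lets tests call page.goto('/login') instead of hard-coding a domain.
    baseURL: 'https://staging.example.com',
    // Loads a previously saved signed-in session into every test's context.
    // Assumed path: a setup step must have created this file beforehand.
    storageState: 'playwright/.auth/user.json',
  },
};

export default config;
```

Pasting the relevant part of your real config into the prompt gives ChatGPT the same information with less room for guessing.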

Reviewing and Hardening the Output

Even with the best prompts, treat ChatGPT output as a first draft written by a capable but context-limited collaborator. Run through this checklist before committing any generated test:

  • No CSS or XPath selectors. Search the file for the strings locator('. and locator('# and locator('// and replace any hits with role-based equivalents.
  • No waitForTimeout. A single grep catches this. If you find one, ask ChatGPT to rewrite that block using a proper locator assertion or waitForResponse.
  • Response waiters appear before the triggering action. Read through any waitForResponse or waitForRequest calls and confirm they are set up before the click or form submit.
  • Assertions are specific. Every test should assert something that would actually fail if the feature broke. "Page is visible" is not a meaningful assertion.
  • Run it once locally in headed mode. npx playwright test --headed lets you watch what the test is doing. Flakiness that is invisible in logs often becomes obvious when you see the browser.
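The mechanical items on this checklist can be scripted. The sketch below is a hypothetical check, assuming you run it over the text of each test file from a pre-commit hook or CI step; the rule names and regular expressions are illustrative and deliberately narrow, so they will miss variants such as page.click('.foo') or locator('div.btn').

```typescript
// Hypothetical pre-commit check: flags patterns from the checklist above.
// The rule names and regular expressions are illustrative, not exhaustive.

interface Finding {
  line: number; // 1-based line number in the test file
  rule: string;
  text: string;
}

const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: 'no-wait-for-timeout', pattern: /\.waitForTimeout\s*\(/ },
  { rule: 'no-css-class-locator', pattern: /\.locator\(\s*['"`]\./ },
  { rule: 'no-css-id-locator', pattern: /\.locator\(\s*['"`]#/ },
  { rule: 'no-xpath-locator', pattern: /\.locator\(\s*['"`](\/\/|xpath=)/ },
];

/** Scans test source line by line and reports every rule violation. */
function lintTestSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return findings;
}

// Example input: two violations followed by one acceptable line.
const sample = [
  "await page.locator('.btn-primary').click();",
  'await page.waitForTimeout(2000);',
  "await page.getByRole('button', { name: 'Sign in' }).click();",
].join('\n');

for (const finding of lintTestSource(sample)) {
  console.log(`${finding.line}: ${finding.rule}: ${finding.text}`);
}
```

A script like this does not replace the manual review; it cannot tell whether an assertion is meaningful or whether a response waiter is in the right place.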

The review discipline here is the same muscle you need when auditing any AI-generated infrastructure code. If you have read through how to prompt ChatGPT for Nginx configs, you will recognise the same pattern: constrain the output format, provide real context, verify the result against your actual environment.

For teams doing this at scale, it is worth building a small prompt template file that lives in the repo. Every engineer uses the same constraint block when generating tests, which keeps the selector strategy consistent across the whole suite. You can pair that with a lint rule or a pre-commit hook that rejects any test file containing waitForTimeout.
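If your team would rather generate prompts from code than copy them from a file, the shared template can be a small function. The sketch below is one possible shape, not a prescribed tool: SELECTOR_RULES and buildPlaywrightPrompt are hypothetical names, and the rule text is copied from the constraint block in Prompt Pattern 1.

```typescript
// Hypothetical shared template: every engineer builds prompts from the same rules.

const SELECTOR_RULES: string[] = [
  'Prefer getByRole() with a name option wherever a semantic role exists.',
  'Use getByLabel() for form fields that have associated labels.',
  'Use getByTestId() for elements that have a data-testid attribute.',
  'Use getByText() only for asserting visible content, not for clicking.',
  'Never use CSS class selectors, ID selectors, or XPath.',
  'Never use nth-child or positional selectors.',
];

/** Builds a prompt: numbered selector rules, the test steps, closing constraints. */
function buildPlaywrightPrompt(steps: string[]): string {
  const rules = SELECTOR_RULES.map((rule, i) => `${i + 1}. ${rule}`);
  const stepLines = steps.map((step) => `- ${step}`);
  return [
    'Selector rules (follow strictly):',
    ...rules,
    '',
    'Write a Playwright TypeScript test that:',
    ...stepLines,
    '',
    'Apply the selector rules above. Use async/await throughout. Do not use page.waitForTimeout().',
  ].join('\n');
}

// Example: a two-step login scenario.
console.log(
  buildPlaywrightPrompt(['Navigates to /login', 'Asserts the user lands on /dashboard'])
);
```

Keeping the rules in one array means a change to the selector policy reaches every prompt at once.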

The broader point is that ChatGPT is genuinely useful here. The model understands Playwright's API well enough to produce correct, idiomatic code; it just needs your constraints to channel that knowledge toward stable output rather than the path of least resistance. The same principle applies across AI-assisted coding: giving the model explicit rules about what not to do is often more effective than describing what you want in positive terms alone.

Next Steps

  • Add the selector constraint block to a playwright-prompt-template.md file in your repo so the whole team uses consistent prompts.
  • Audit your existing test suite for CSS class selectors and waitForTimeout calls, and use ChatGPT to rewrite those too, now that you have the right prompt.
  • Add data-testid attributes to your five most-tested components if they do not already have them; it makes every future test easier to generate and more stable.
  • Set up a pre-commit hook or CI lint step that fails if any test file contains waitForTimeout or a raw CSS class locator such as locator('.btn-primary').
  • Try the Page Object Model prompt pattern on a high-churn page in your app; the upfront investment pays off quickly when that page changes.

Frequently Asked Questions

Why does ChatGPT keep generating CSS class selectors for Playwright tests?

ChatGPT defaults to the selector patterns most common in its training data, which includes many older test suites that rely on CSS classes. You can override this by explicitly listing allowed selector strategies in your prompt and forbidding class and ID selectors.

How do I stop AI-generated Playwright tests from using waitForTimeout?

Add a rule to your prompt that explicitly bans page.waitForTimeout() and lists the correct alternatives: expect(locator).toBeVisible() for element waits, page.waitForURL() for navigation, and page.waitForResponse() for network-dependent actions. ChatGPT usually follows the constraint, but still search the output for waitForTimeout before committing.

Is it safe to use data-testid attributes in production code?

Yes, data-testid attributes have no effect on rendering or behavior and add only a negligible amount of HTML weight. They are widely used for testability, and Playwright supports them directly through getByTestId(), although its documentation recommends user-facing locators such as getByRole() as the first choice.

Can ChatGPT generate a full Page Object Model for a complex multi-step flow?

ChatGPT handles POM generation well when you provide the relevant HTML or JSX and a clear list of the interactions to encapsulate. For complex flows, break the prompt into one page object at a time rather than asking for the entire application at once.

How do I handle authentication state in ChatGPT-generated Playwright tests?

Tell ChatGPT in your prompt that authentication is handled via Playwright's storageState in the project config and to skip any login steps. Without that instruction the model will often include a full login flow in every test, which slows your suite and creates redundant coverage.
