GitHub Copilot for Writing Tests: Skip the Boilerplate, Fix the Gaps

June 07, 2026 · 8 min read

You have a function that works. You know you should write tests for it. You open a new file, stare at the blank screen, and spend ten minutes writing imports and setup code before you even touch an assertion. That friction is exactly where GitHub Copilot earns its place in a testing workflow.

But Copilot is not a QA engineer. It will generate plausible-looking tests that miss edge cases, test the wrong thing, or give you the comfortable illusion of coverage without the substance. This guide shows you how to use it well, and how to catch it when it cuts corners.

What You'll Learn

  • How to prompt Copilot to generate useful test scaffolding quickly
  • Which types of tests Copilot handles well versus where it struggles
  • How to review generated tests so they actually protect your code
  • Common patterns for edge cases Copilot routinely misses
  • A practical workflow for integrating Copilot into your test-writing habit

Prerequisites

This article assumes you have GitHub Copilot set up in VS Code, JetBrains, or another supported editor. The examples use Python with pytest, but the concepts apply to any language. You should be comfortable writing basic tests manually, because Copilot works best when you can evaluate what it gives you.

Why Tests Are the Perfect Copilot Task

Tests have a predictable structure. There is setup, an action, and an assertion. That repetition is exactly what autocompletion models are good at. When Copilot sees a function signature and a test file, it has strong patterns to draw from.

The boilerplate problem is real. Writing the twentieth setUp method or the fifteenth fixture feels mechanical because it is. Copilot handles that mechanical layer well, which frees you to focus on what actually requires judgment: deciding what to test and whether the assertions are meaningful.

Getting Copilot to Generate Tests Quickly

The most reliable trigger is context. Open your source file and your test file side by side, or at minimum have the source file open in a recent tab. Copilot uses the surrounding context in your editor session to infer what it should generate.

Start with a comment that names the function and the scenario:

# Tests for calculate_discount(price, customer_tier)

import pytest
from myapp.pricing import calculate_discount

After that comment and import block, start typing def test_ and Copilot will typically suggest a complete test function. Accept it, then add another comment like # test when customer_tier is invalid and let it suggest the next one.

This comment-driven approach is more reliable than waiting for Copilot to guess what you want. Treat your comments as a test plan written in plain English, and let Copilot translate each bullet into code.

A Realistic Example

Say you have this function:

def calculate_discount(price: float, customer_tier: str) -> float:
    """Return the discounted price based on customer tier."""
    tiers = {"bronze": 0.05, "silver": 0.10, "gold": 0.20}
    if price <= 0:
        raise ValueError("Price must be positive")
    discount = tiers.get(customer_tier, 0)
    return round(price * (1 - discount), 2)

With the source file open, Copilot will typically generate something like:

def test_gold_tier_discount():
    assert calculate_discount(100.0, "gold") == 80.0

def test_unknown_tier_no_discount():
    assert calculate_discount(100.0, "platinum") == 100.0

def test_negative_price_raises():
    with pytest.raises(ValueError):
        calculate_discount(-10.0, "gold")

That is a reasonable starting set. The happy path is covered, the unknown tier is handled, and the ValueError is tested. But notice what is missing; this is where your judgment is essential.

The Gaps Copilot Routinely Leaves

Copilot tends to generate tests for the obvious paths. It reads the logic that exists and mirrors it back as assertions. That is useful but insufficient. Here are the gaps to check for every time.

Zero and boundary values

In the example above, the function raises on price <= 0. Copilot tested -10.0 but likely skipped 0.0 exactly. Boundary conditions, the exact values at which behavior changes, are where bugs hide.

# Copilot missed this one, so add it yourself
def test_zero_price_raises():
    with pytest.raises(ValueError):
        calculate_discount(0.0, "gold")

Floating-point precision

Copilot may assert == 80.0 on a float result without hesitation. For this function the rounding call makes that safe, but in functions that chain float arithmetic, exact equality assertions are a trap. Use pytest.approx or math.isclose when the function does not explicitly round.
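
To see why exact equality is a trap, here is a minimal sketch. The add_surcharges helper is hypothetical and not part of the pricing example; it only stands in for any function that chains float arithmetic without rounding.

```python
import math


def add_surcharges(base: float, *surcharges: float) -> float:
    """Hypothetical helper: sums float amounts without rounding the result."""
    total = base
    for surcharge in surcharges:
        total += surcharge
    return total


def test_chained_float_sum_needs_tolerance():
    result = add_surcharges(0.1, 0.2)
    # In binary floating point, 0.1 + 0.2 is not exactly 0.3,
    # so an exact equality assertion against 0.3 would fail.
    assert result != 0.3
    # A tolerance-based comparison expresses the real intent.
    assert math.isclose(result, 0.3, rel_tol=1e-9)
    # With pytest available, the equivalent is:
    #     assert result == pytest.approx(0.3)
```

If Copilot suggests a bare == on an unrounded float result, swapping in one of these comparisons is usually a one-line change.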

None and type mismatches

What happens when customer_tier is None? Or an integer? Copilot rarely generates tests for incorrect argument types unless you explicitly ask with a comment like # test when customer_tier is None. Because None is not a key in the tiers dictionary, tiers.get(None, 0) returns 0, so the function silently applies no discount. Whether that is correct is a product decision, but the test should make the behavior explicit either way.
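
As a sketch of what "making the behavior explicit" can look like, the tests below pin down what the function currently does with a None or integer tier. The function is copied from the example earlier in this article so the block runs on its own; the test names are only suggestions.

```python
def calculate_discount(price: float, customer_tier: str) -> float:
    """Return the discounted price based on customer tier (copied from above)."""
    tiers = {"bronze": 0.05, "silver": 0.10, "gold": 0.20}
    if price <= 0:
        raise ValueError("Price must be positive")
    discount = tiers.get(customer_tier, 0)
    return round(price * (1 - discount), 2)


def test_none_tier_currently_applies_no_discount():
    # Documents today's behavior: None is not a key, so the lookup falls back to 0.
    assert calculate_discount(100.0, None) == 100.0


def test_integer_tier_currently_applies_no_discount():
    # Same fallback for a wrong type that happens to be hashable.
    assert calculate_discount(100.0, 3) == 100.0
```

If the product decision turns out to be "reject invalid tiers", these tests are the ones that should change to expect an exception, which is exactly why they are worth writing down.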

Side effects and state mutation

For pure functions like this example, Copilot does fine. For functions that write to a database, call an external API, or modify shared state, Copilot-generated tests often skip mocking entirely. They may call the real dependency, which makes your test suite slow, brittle, and environment-dependent.

When you see a generated test that calls a method touching I/O, pause and add a mock before accepting it.

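from unittest.mock import patch

# Hypothetical module path; import process_purchase from wherever it lives in your project.
from myapp.orders import process_purchase

def test_send_confirmation_email_called_on_purchase():
    # Patch the name where process_purchase looks it up. If that module does
    # "from myapp.notifications import send_email", patch "myapp.orders.send_email" instead.
    with patch("myapp.notifications.send_email") as mock_send:
        process_purchase(order_id=42)
        mock_send.assert_called_once()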
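
If you want to try the idea without a myapp package, here is a self-contained variant. It assumes, purely for illustration, that the email sender is passed in as a parameter; this process_purchase and the address format are hypothetical stand-ins, not the real function.

```python
from unittest.mock import Mock


def process_purchase(order_id: int, send_email) -> None:
    """Hypothetical stand-in: completes a purchase, then sends one confirmation."""
    send_email(to=f"customer-{order_id}@example.com", subject="Order confirmed")


def test_purchase_sends_exactly_one_confirmation():
    fake_send = Mock()
    process_purchase(order_id=42, send_email=fake_send)
    # Check the arguments as well as the call count, so a confirmation
    # sent to the wrong recipient would fail the test.
    fake_send.assert_called_once_with(
        to="customer-42@example.com", subject="Order confirmed"
    )
```

No real email is sent, and the test runs the same way on any machine.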

Prompting for Edge Cases Directly

You do not have to wait for Copilot to discover gaps on its own. Prompt it explicitly. After your initial generated tests, add a comment block describing the edge cases you care about:

# Edge cases to cover:
# - price is a very large number (no overflow expected but assert it)
# - customer_tier is an empty string
# - price has many decimal places, check rounding

Copilot will attempt to generate a test for each line. You still need to verify the assertions are correct, but now you are directing the generation rather than hoping it reads your mind.
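
For reference, this is roughly what the three edge cases above might look like once you have checked the assertions by hand. Treat it as a sketch: the test names and input values are my own choices, and the function is copied from the earlier example so the block is runnable on its own.

```python
import math


def calculate_discount(price: float, customer_tier: str) -> float:
    """Return the discounted price based on customer tier (copied from above)."""
    tiers = {"bronze": 0.05, "silver": 0.10, "gold": 0.20}
    if price <= 0:
        raise ValueError("Price must be positive")
    discount = tiers.get(customer_tier, 0)
    return round(price * (1 - discount), 2)


def test_very_large_price_keeps_gold_discount():
    result = calculate_discount(1_000_000_000_000.0, "gold")
    # Tolerance-based check: the point is the 20% discount, not the last float bit.
    assert math.isclose(result, 800_000_000_000.0)


def test_empty_string_tier_gets_no_discount():
    assert calculate_discount(100.0, "") == 100.0


def test_many_decimal_places_rounds_to_cents():
    # 19.999 * 0.90 is about 17.9991, which rounds to 18.0 at two decimals.
    assert calculate_discount(19.999, "silver") == 18.0
```

Each expected value here was worked out from the tier table rather than copied from a suggestion, which is the part Copilot cannot do for you.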

Integration Tests: More Scaffolding, More Caution

Copilot is useful for integration test scaffolding too: setting up a test client, constructing a request payload, and asserting a status code. For a FastAPI or Flask endpoint, it can generate a fixture and a test in seconds.

import pytest
from fastapi.testclient import TestClient
from myapp.main import app

@pytest.fixture
def client():
    return TestClient(app)

def test_create_order_returns_201(client):
    response = client.post("/orders", json={"product_id": 1, "quantity": 2})
    assert response.status_code == 201

That is genuinely useful boilerplate. But Copilot often stops at the status code. A real integration test should also check the response body, verify the database state changed, and confirm any side effects. Add those assertions yourself; the scaffolding only got you started faster.

Reviewing Copilot Tests Before You Commit

Treat every Copilot-generated test like a code review item. Ask three questions before accepting it:

  1. Is the assertion testing behavior or implementation? A test that asserts a specific internal variable was set is brittle. A test that asserts the visible output is correct is robust.
  2. Would this test catch a real bug? Delete the assertion mentally. If the test would still pass with a broken version of the function, the assertion is not doing its job.
  3. Does this test depend on external state? If yes, is that dependency mocked, or is it acceptable for this test to require a live environment?

These three questions take about thirty seconds per test. They are the difference between a test suite that gives you confidence and one that gives you green checkmarks.
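
The second question is easier to apply when you see it in action. In this sketch, a deliberately broken variant of the function (hypothetical, written only for this demonstration) is run against a weak check and a strong check.

```python
def calculate_discount(price: float, customer_tier: str) -> float:
    """Return the discounted price based on customer tier (copied from above)."""
    tiers = {"bronze": 0.05, "silver": 0.10, "gold": 0.20}
    if price <= 0:
        raise ValueError("Price must be positive")
    discount = tiers.get(customer_tier, 0)
    return round(price * (1 - discount), 2)


def broken_calculate_discount(price: float, customer_tier: str) -> float:
    """Deliberately broken variant: ignores the tier and never discounts."""
    if price <= 0:
        raise ValueError("Price must be positive")
    return round(price, 2)


def weak_check(fn) -> bool:
    # Only verifies that a float comes back, so almost any implementation passes.
    return isinstance(fn(100.0, "gold"), float)


def strong_check(fn) -> bool:
    # Verifies the visible behavior: gold customers pay 80.0 on a price of 100.0.
    return fn(100.0, "gold") == 80.0


# The weak check cannot tell the two implementations apart; the strong one can.
assert weak_check(calculate_discount) and weak_check(broken_calculate_discount)
assert strong_check(calculate_discount)
assert not strong_check(broken_calculate_discount)
```

A generated test that behaves like weak_check still turns the suite green, which is why it deserves a second look before you commit it.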

Common Pitfalls

Accepting tests without running them. Copilot occasionally generates test code that does not execute correctly: wrong import paths, mismatched argument counts, or assertions on attributes that do not exist. Always run the test before moving on.

Trusting coverage percentages generated by Copilot tests. High line coverage is easier to fake with Copilot-generated tests than with hand-written ones, because Copilot mirrors the code paths it sees. Coverage percentage tells you which lines were executed, not whether the assertions caught anything meaningful.

Letting Copilot name your tests for you without checking. Names like test_function_works are useless. Good test names describe the scenario and the expected outcome: test_gold_tier_applies_20_percent_discount. Rename any vague suggestions before committing.

Over-relying on it for complex domain logic. If a function implements a multi-step business rule, Copilot's generated tests reflect the code as written. If the code has a logical error, the test will encode that error too. For business-critical logic, write at least the core assertions yourself from the specification, not from the implementation.
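
One way to keep the specification in charge is to write the expected values down as data before looking at the code. The sketch below assumes a hypothetical spec that says bronze, silver, and gold customers get 5%, 10%, and 20% off; the buggy implementation is invented to show what a spec-derived check catches.

```python
# Expected results worked out by hand from the (hypothetical) written spec:
# bronze = 5% off, silver = 10% off, gold = 20% off.
SPEC_CASES = [
    (200.0, "bronze", 190.0),
    (50.0, "silver", 45.0),
    (100.0, "gold", 80.0),
]


def buggy_calculate_discount(price: float, customer_tier: str) -> float:
    """Invented for this demo: silver is mistakenly set to 15% instead of 10%."""
    tiers = {"bronze": 0.05, "silver": 0.15, "gold": 0.20}
    if price <= 0:
        raise ValueError("Price must be positive")
    return round(price * (1 - tiers.get(customer_tier, 0)), 2)


def failures_against_spec(fn) -> list:
    """Return every spec case the given implementation gets wrong."""
    return [
        (price, tier, expected, fn(price, tier))
        for price, tier, expected in SPEC_CASES
        if fn(price, tier) != expected
    ]


# A test mirrored from the buggy code would assert 42.5 for silver and pass.
# The spec-derived cases flag it instead.
assert failures_against_spec(buggy_calculate_discount) == [(50.0, "silver", 45.0, 42.5)]
```

Copilot can still generate the loop and the plumbing; the numbers in SPEC_CASES are the part that should come from you.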

A Practical Workflow

Here is a repeatable pattern that balances Copilot's speed with your own judgment:

  1. Write a comment block listing the scenarios you want to test before generating anything. Think of it as a micro test plan.
  2. Let Copilot generate the scaffolding for each scenario.
  3. Run the tests immediately. Fix anything that does not execute.
  4. Read each assertion. Replace any that test implementation details.
  5. Add boundary and error cases that Copilot missed. A quick mental pass through zero values, None inputs, and type mismatches is usually enough.
  6. Check whether any test touches real I/O and add mocks where needed.

This workflow typically takes 20–30% of the time that writing tests from scratch would. The savings come from the scaffolding layer (imports, fixtures, the test function skeleton), not from skipping the review.

Wrapping Up

Copilot is a useful testing partner for eliminating the setup friction that makes developers delay writing tests in the first place. It is not a replacement for thinking about what your tests should prove.

Here are concrete next steps you can take today:

  • Pick one module you have been avoiding testing. Open a new test file, write a comment-based test plan, and let Copilot generate the scaffolding. Review and fill the gaps.
  • Add a step to your PR review checklist: check whether new Copilot-generated tests include assertions on edge cases, not just the happy path.
  • Practice the three-question review (behavior vs. implementation, catches real bugs, external state) on the next batch of generated tests until it becomes automatic.
  • Try the explicit edge-case prompt pattern on an existing function with tricky boundary behavior and see how many scenarios Copilot can surface versus how many you had to add yourself.
  • Measure your test-writing time before and after adopting this workflow for two weeks. The boilerplate savings are real; make sure the quality holds up too.
