Frequently Asked Questions

Can GitHub Copilot generate tests for functions it has never seen before?

Yes, but the quality depends heavily on the context you provide. Always attach the source file using the #file reference in Copilot Chat rather than describing the function in plain text, or Copilot will generate tests based on assumptions about what the function does rather than its actual implementation.

How do I know if Copilot-generated tests are actually testing the right behavior?

Run the tests, then manually verify the expected values in the assertions for any logic-heavy cases. Also enable branch coverage with the pytest-cov plugin (pytest --cov=billing --cov-branch) to check whether every code path in the function is exercised, not just every line.

What's the best way to prompt Copilot to cover edge cases in legacy functions?

After generating an initial test suite, send a follow-up prompt asking Copilot to list edge cases it might have missed, then generate tests for them. Being explicit in your initial prompt about inputs like None, zero, negative numbers, and boundary values also dramatically improves coverage.

Does Copilot work for writing tests in languages other than Python?

Yes — the prompting strategy in this guide applies to any language Copilot supports, including JavaScript with Jest, Java with JUnit, and C# with xUnit. The #file context and explicit coverage instructions work the same way regardless of language.

Should I mock external dependencies when Copilot writes tests for legacy functions?

Always mock external dependencies like database calls, HTTP requests, and file system access. Tell Copilot explicitly which dependencies exist and which mocking library to use, otherwise it may generate tests that require a live environment to run or that silently pass because the mock is patching the wrong path.

Copilot in VS Code: Write Tests for Legacy Functions

You inherit a function that's been running in production for four years. No tests, no comments, and the original author left the company two years ago. You need to change it — but without tests, every edit is a leap of faith.

GitHub Copilot in VS Code can accelerate the process of writing those first tests significantly. But the output is only as good as the context you give it. This guide walks through a repeatable workflow that produces tests you can actually trust.

What You'll Learn

How to read and summarize a legacy function before prompting Copilot
The difference between using inline suggestions vs. Copilot Chat for test generation
How to write prompts that produce tests with real coverage, not just happy-path stubs
How to validate Copilot's test output before committing it
Common failure modes to watch out for

Prerequisites

You'll need VS Code with the GitHub Copilot extension installed and an active Copilot subscription. The examples in this article use Python with pytest, but the prompting strategy applies equally to JavaScript/TypeScript with Jest, Java with JUnit, or C# with xUnit. You should be comfortable reading the legacy function's language even if you didn't write it.

Understanding the Function Before You Prompt

Copilot generates better tests when you understand the function first. Don't skip this step and expect the AI to do the analysis for you — it will fill gaps with assumptions that may be wrong.

Before opening Copilot Chat, answer these four questions about the function:

What does it accept? List every parameter and its expected type and shape.
What does it return? Note the return type and any documented or implied invariants.
What are the edge cases? Empty inputs, None, negative numbers, empty lists, very large values.
What side effects does it have? Database writes, file I/O, external HTTP calls — these need mocking.

Here's the example function we'll use throughout this article:

def calculate_late_fee(days_overdue, base_amount, membership_tier):
    if days_overdue <= 0:
        return 0.0
    rate = 0.05
    if membership_tier == "gold":
        rate = 0.03
    elif membership_tier == "platinum":
        rate = 0.01
    fee = base_amount * rate * days_overdue
    if fee > base_amount * 0.5:
        fee = base_amount * 0.5
    return round(fee, 2)

It's short, but it has multiple branches and a cap on the output. Those are exactly the things a generated test suite might miss if you don't guide Copilot explicitly.

How to Use Copilot Inline Suggestions to Bootstrap Tests

Copilot's inline autocomplete works best when it has strong surrounding context. Open a new test file and start writing a test function name that describes intent clearly.

import pytest
from billing import calculate_late_fee

def test_calculate_late_fee_returns_zero_when_not_overdue():

After typing that function signature, pause for a second. Copilot will often suggest a complete test body. Accept it with Tab if the logic looks correct, then move to the next test. The key here is that your function name is the prompt. A name like test_fee_gold_tier_applies_reduced_rate will guide Copilot toward the right assertion far more reliably than test_fee_2.

Use this approach to quickly scaffold five to eight test shells before switching to Copilot Chat for the harder cases. It's faster than typing and gets you a rough structure to react to.
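As an illustration, an accepted suggestion for that first test name might look something like the sketch below. The specific inputs (100.0, "standard", -3) are assumptions chosen for the example, and Copilot's actual suggestion will vary. The function is copied inline only so the snippet runs on its own; in a real test file the import from billing stays.

```python
# Copied from billing.py so this sketch runs standalone.
def calculate_late_fee(days_overdue, base_amount, membership_tier):
    if days_overdue <= 0:
        return 0.0
    rate = 0.05
    if membership_tier == "gold":
        rate = 0.03
    elif membership_tier == "platinum":
        rate = 0.01
    fee = base_amount * rate * days_overdue
    if fee > base_amount * 0.5:
        fee = base_amount * 0.5
    return round(fee, 2)


def test_calculate_late_fee_returns_zero_when_not_overdue():
    # days_overdue == 0 takes the early-return branch.
    assert calculate_late_fee(0, 100.0, "standard") == 0.0
    # Negative days take the same early-return branch.
    assert calculate_late_fee(-3, 100.0, "standard") == 0.0
```

Before pressing Tab, check that the suggested body really exercises the branch the name promises. Here both calls hit the days_overdue <= 0 check and nothing else.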

Using Copilot Chat to Generate Tests With Context

For functions with branching logic, Copilot Chat gives you much more control. Open the Chat panel (Ctrl+Alt+I on Windows/Linux, Ctrl+Cmd+I on Mac) and use the #file context variable to attach the source file directly.

A basic prompt might look like:

Using #file:billing.py, write a complete pytest test suite for the calculate_late_fee function. Cover: zero and negative days overdue, each membership tier, the 50% fee cap, and floating point rounding. Use descriptive test function names.

The #file reference means Copilot reads the actual source rather than inferring from your description. This matters. Without it, Copilot often generates tests for a function it has imagined, not the one you have. If your function lives in a larger module, you can also highlight just the function, right-click, and choose Copilot > Generate Tests from the context menu (the exact menu label can differ between extension versions). This scopes the context to the selection automatically.

Copilot Chat is also useful for understanding legacy code you haven't deciphered yet. This pairs well with the workflow described in prompting Copilot Chat for accurate refactors on legacy codebases, which covers how to extract meaning from dense or poorly commented code before modifying it.

Crafting Prompts That Produce Useful Tests

The single biggest mistake people make is asking Copilot to "write tests" without specifying what the tests should cover. You get happy-path stubs that all pass trivially and tell you nothing about the function's actual behavior under stress.

Be explicit about coverage targets

Instead of:

Write tests for calculate_late_fee.

Use:

Write pytest tests for calculate_late_fee that test: (1) days_overdue of 0 and -5, (2) membership tiers "standard", "gold", and "platinum", (3) a scenario where the computed fee would exceed 50% of base_amount to confirm the cap, (4) an unrecognized tier string such as "silver" to confirm it falls back to the standard rate, and (5) a base_amount of 0.

That second prompt is harder to write, but it forces you to think through the function's contract — and Copilot's output will be dramatically more useful.
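To make the five coverage targets concrete, here is a minimal sketch of the kind of suite the explicit prompt is aiming for. The inputs and the single parametrized layout are assumptions for illustration, not Copilot's literal output, and the function is copied inline only so the block runs standalone.

```python
import pytest


# Copied from billing.py so this sketch runs standalone.
def calculate_late_fee(days_overdue, base_amount, membership_tier):
    if days_overdue <= 0:
        return 0.0
    rate = 0.05
    if membership_tier == "gold":
        rate = 0.03
    elif membership_tier == "platinum":
        rate = 0.01
    fee = base_amount * rate * days_overdue
    if fee > base_amount * 0.5:
        fee = base_amount * 0.5
    return round(fee, 2)


@pytest.mark.parametrize(
    "days_overdue, base_amount, tier, expected",
    [
        (0, 100, "standard", 0.0),    # (1) not overdue
        (-5, 100, "standard", 0.0),   # (1) negative days
        (1, 100, "standard", 5.0),    # (2) 5% per day
        (1, 100, "gold", 3.0),        # (2) 3% per day
        (1, 100, "platinum", 1.0),    # (2) 1% per day
        (20, 100, "gold", 50.0),      # (3) 60.0 computed, capped at 50.0
        (2, 100, "silver", 10.0),     # (4) unrecognized tier uses the 5% rate
        (5, 0, "standard", 0.0),      # (5) zero base amount
    ],
)
def test_calculate_late_fee(days_overdue, base_amount, tier, expected):
    assert calculate_late_fee(days_overdue, base_amount, tier) == expected
```

Each row maps back to a numbered item in the prompt, which makes it easy to see at review time whether Copilot skipped one.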

Ask for edge cases explicitly

After the first batch of tests, follow up with:

What edge cases might I have missed for this function? List them, then generate tests for any you think are untested.

Copilot will often surface things like base_amount being negative, or a membership_tier of None, that weren't in your original list. You're using it as a thinking partner here, not just a code printer.
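Edge cases like these are best written as characterization tests: they record what the function does today, which is not necessarily what it should do. The sketch below pins down the two cases mentioned above, plus a case-sensitivity check that is an extra assumption of this example. The function is copied inline so the block runs on its own.

```python
# Copied from billing.py so this sketch runs standalone.
def calculate_late_fee(days_overdue, base_amount, membership_tier):
    if days_overdue <= 0:
        return 0.0
    rate = 0.05
    if membership_tier == "gold":
        rate = 0.03
    elif membership_tier == "platinum":
        rate = 0.01
    fee = base_amount * rate * days_overdue
    if fee > base_amount * 0.5:
        fee = base_amount * 0.5
    return round(fee, 2)


def test_negative_base_amount_hits_the_cap_branch():
    # fee = -100 * 0.03 * 5 = -15.0; the cap is -50.0 and -15.0 > -50.0,
    # so the cap branch fires and the result is -50.0.
    assert calculate_late_fee(5, -100, "gold") == -50.0


def test_none_tier_falls_back_to_standard_rate():
    # None matches neither tier string, so the 5% default applies.
    assert calculate_late_fee(2, 100, None) == 10.0


def test_tier_match_is_case_sensitive():
    # "Gold" is not "gold", so the 5% default applies.
    assert calculate_late_fee(1, 100, "Gold") == 5.0
```

The negative base_amount result is the kind of surprise worth raising with whoever owns the function before you change anything.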

Request mocks when you need them

If the function makes external calls, tell Copilot explicitly:

The function calls db.get_member() which hits a database. Use unittest.mock.patch to mock it in each test so tests run without a real database connection.
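A test produced from a prompt like that might resemble the following sketch. Everything here is hypothetical: the db object, its get_member function, and member_late_fee are stand-ins invented so the example runs standalone, and patch.object is used instead of a string path only because there is no real module to import.

```python
from types import SimpleNamespace
from unittest.mock import patch


def _live_get_member(member_id):
    raise RuntimeError("would hit the real database")


# Hypothetical stand-in for the real db module.
db = SimpleNamespace(get_member=_live_get_member)


def member_late_fee(member_id, days_overdue, base_amount):
    """Hypothetical legacy function: looks up the tier, then applies its daily rate."""
    tier = db.get_member(member_id)["tier"]
    rate = {"gold": 0.03, "platinum": 0.01}.get(tier, 0.05)
    return round(base_amount * rate * days_overdue, 2)


def test_member_late_fee_uses_gold_rate():
    # Replace the database call for the duration of the test only.
    with patch.object(db, "get_member", return_value={"tier": "gold"}) as mock_get:
        assert member_late_fee(42, 2, 100) == 6.0
    # Confirm the mock was actually the thing that got called.
    mock_get.assert_called_once_with(42)
```

The final assert_called_once_with line is worth keeping in generated tests: it fails loudly if the mock was never reached.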

This same principle — being explicit about what you want the AI to handle and what you want it to avoid — applies across AI-assisted coding tasks. The approach to writing SQL with ChatGPT without blind trust covers a similar mindset for query generation.

Reading and Validating What Copilot Gives You

Never commit Copilot-generated tests without running them and reading every assertion. This isn't optional. Here's what to check:

Do the tests actually run? Run pytest -v immediately. A test that errors on import is useless.
Do the assertions match real expected values? Copilot sometimes calculates expected values incorrectly. Check the math manually for at least the arithmetic-heavy cases.
Are the test names accurate? A test named test_gold_tier_applies_discount that actually tests the platinum tier is a time bomb.
Are mocks patching the right import path? Mocks that patch the wrong module path silently do nothing.
Do all tests pass for the right reasons? A test that asserts fee == 0.0 and passes because of a bug, not because the function is correct, gives false confidence.

For the calculate_late_fee function, manually verify a case like this:

# Gold tier, 20 days overdue, $100 base
# rate = 0.03, fee = 100 * 0.03 * 20 = 60.0
# cap = 100 * 0.5 = 50.0
# fee exceeds cap, so fee = 50.0
assert calculate_late_fee(20, 100, "gold") == 50.0

Work through the arithmetic yourself before trusting Copilot's expected value in the assertion. This habit catches tests whose expected value is simply wrong, something a test run alone cannot distinguish from a bug in the function.
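The mock-path check from the list above is the easiest one to get wrong, so here is a minimal sketch of what it means. The two in-memory modules, fake_db and fake_billing, are hypothetical stand-ins for db.py and billing.py, built with types.ModuleType only so the example runs without files on disk.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for db.py.
fake_db = types.ModuleType("fake_db")
exec(
    "def get_member(member_id):\n"
    "    raise RuntimeError('live database call')\n",
    fake_db.__dict__,
)
sys.modules["fake_db"] = fake_db

# Hypothetical stand-in for billing.py, which imports the name directly.
fake_billing = types.ModuleType("fake_billing")
exec(
    "from fake_db import get_member\n"
    "def tier_for(member_id):\n"
    "    return get_member(member_id)['tier']\n",
    fake_billing.__dict__,
)
sys.modules["fake_billing"] = fake_billing


def tier_with_patch(target):
    """Patch get_member at `target` and report what tier_for actually did."""
    with patch(target, return_value={"tier": "gold"}):
        try:
            return fake_billing.tier_for(1)
        except RuntimeError:
            return "real get_member ran"


def test_patch_must_target_the_importing_module():
    # Patching the defining module leaves fake_billing's own reference untouched.
    assert tier_with_patch("fake_db.get_member") == "real get_member ran"
    # Patching the name where it is looked up replaces the call.
    assert tier_with_patch("fake_billing.get_member") == "gold"
```

When the code under test uses from db import get_member, the patch target has to be the module that did the importing, not the module that defined the function.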

If you're concerned about Copilot generating suggestions that silently break edge cases in other parts of your codebase, debugging AI code suggestions that silently break edge cases is worth reading alongside this workflow.

Common Pitfalls When Using Copilot for Legacy Tests

Copilot tests the function it imagines, not the one you have

Without explicit file context, Copilot invents plausible function behavior based on the name and any surrounding code. Always attach the source file using #file or by highlighting and using the context menu. Don't describe the function in plain text and expect accurate tests.

All tests pass but coverage is shallow

A test suite where every test exercises the default happy path can show 90% line coverage while testing almost nothing useful. Use pytest-cov with branch coverage enabled (pytest --cov=billing --cov-branch) to spot untested branches, not just untested lines.
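The gap between line and branch coverage is easiest to see on an if with no else. In the sketch below, cap_fee is a hypothetical helper that isolates the cap logic from calculate_late_fee for illustration.

```python
def cap_fee(fee, base_amount):
    """Hypothetical helper isolating the cap logic from calculate_late_fee."""
    if fee > base_amount * 0.5:
        fee = base_amount * 0.5
    return round(fee, 2)


def test_cap_applied():
    # Executes every line of cap_fee, so line coverage alone reads 100%.
    assert cap_fee(60.0, 100) == 50.0


def test_cap_not_applied():
    # Only this test takes the path that skips the if body.
    assert cap_fee(10.0, 100) == 10.0
```

With only test_cap_applied in the suite, a line-coverage report has nothing to flag, while a branch-coverage report marks the skipped-if path as never taken.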

Mocks don't reflect real behavior

Copilot's mocks are structurally correct but often return simplified values. A mocked database call that always returns True doesn't help you test what happens when the database returns None or raises an exception. Edit the mock return values to cover failure scenarios too.
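As a sketch of what covering failure scenarios can look like, the tests below make the mocked call return None and raise an exception. The db object and tier_for_member are hypothetical stand-ins, and the fallback to "standard" is an assumed behavior for the example rather than something taken from the billing code.

```python
import pytest
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for the real db module.
db = SimpleNamespace(get_member=lambda member_id: {"tier": "gold"})


def tier_for_member(member_id):
    """Hypothetical lookup: unknown members are treated as standard tier."""
    member = db.get_member(member_id)
    return member["tier"] if member else "standard"


def test_unknown_member_defaults_to_standard():
    # The database found nothing for this id.
    with patch.object(db, "get_member", return_value=None):
        assert tier_for_member(999) == "standard"


def test_database_error_is_not_swallowed():
    # The database call itself fails.
    with patch.object(db, "get_member", side_effect=ConnectionError("db down")):
        with pytest.raises(ConnectionError):
            tier_for_member(42)
```

return_value handles the "nothing found" case and side_effect handles the "call failed" case; a generated suite that only uses the former is leaving the error path untested.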

Copilot generates tests for the wrong version of the function

If you've edited the function during the same session, Copilot may still be working from an older version that is earlier in the conversation. After significant changes, start a new Copilot Chat session or explicitly re-attach the updated file with #file.

Generated docstrings in test functions are misleading

Sometimes Copilot adds a docstring to a test that describes different behavior than what the test actually checks. Delete or rewrite any docstring that doesn't match the assertion. The same discipline applies when generating accurate docstrings with Copilot for production code — always verify the description matches the behavior.

Wrapping Up: Next Steps

Copilot won't write a perfect test suite for your legacy function on the first try, but it will get you to a working first draft in a fraction of the time it would take manually. The key is treating it as an accelerator for your thinking, not a replacement for it.

Here are five concrete actions to take after reading this:

Pick one untested legacy function in your codebase right now and run through the four pre-prompt questions in the "Understanding the Function" section.
Use #file context every time you open Copilot Chat for test generation — make it a non-negotiable habit.
Enable branch coverage with pytest --cov=billing --cov-branch (this needs the pytest-cov plugin) to see what your generated tests actually miss, not just what lines they touch.
Review every assertion manually for at least one arithmetic or logic-heavy case per function before committing the tests.
Follow up with an edge-case prompt after the first batch of tests — ask Copilot what it might have missed, then evaluate its suggestions critically.
