Generating Accurate Docstrings With Copilot Without Editing Every Output
You ask Copilot to document a function and it returns something plausible-looking but wrong: the parameter descriptions are generic, the return type is inferred incorrectly, and the summary describes what the function looks like it does rather than what it actually does. You spend more time fixing the output than you would have spent writing the docstring from scratch.
The problem is not that Copilot is bad at docstrings. The problem is that you have not given it enough signal. With a few specific techniques, you can make Copilot's docstring suggestions accurate enough to accept most of the time without a full review pass.
What You'll Learn
- Why Copilot produces inaccurate or generic docstrings and how to address the root cause
- How to prompt Copilot Chat to follow a specific docstring format consistently
- How to use seed docstrings to lock in style across a file or module
- Techniques for handling complex functions with branching logic or multiple return types
- How to add a lightweight automated review layer so bad docstrings don't merge silently
Prerequisites
This guide assumes you are using GitHub Copilot with either the inline suggestion feature or Copilot Chat (available in VS Code, JetBrains, or Visual Studio). Most examples use Python with Google-style or NumPy-style docstrings, but the techniques apply equally to JavaScript JSDoc and other formats. You should be comfortable writing prompts in Copilot Chat.
Why Copilot Gets Docstrings Wrong
Copilot generates docstrings by looking at your function signature, the body, and the surrounding context in the file. When any of those three signals are weak or ambiguous, it fills in gaps with plausible-sounding filler. A function called process_data with a generic df parameter will almost always get a generic docstring back.
There are three common failure modes. First, the summary sentence describes the structure of the code rather than the business intent — "iterates over rows and applies transformation" instead of "calculates rolling 7-day revenue per region." Second, parameter types are omitted or wrong when they are not annotated. Third, the format drifts between NumPy, Google, and Epytext style depending on what Copilot saw earlier in the file.
All three problems are solvable once you understand that Copilot's output quality is mostly a function of context quality.
How Copilot Reads Your Function to Generate Docs
When you trigger inline completion on an empty docstring, Copilot sends a window of the surrounding code to the model. That window includes the lines immediately before and after the cursor — typically the function signature, any existing type annotations, the first chunk of the function body, and the last few docstrings or comments in the same file.
This means two things are directly in your control: the type annotations on the function and the existing docstrings nearby. Improve those two inputs and Copilot's output improves without any prompting at all. Add type hints to every parameter, and Copilot will use them. Write one high-quality docstring earlier in the file, and Copilot will mirror its format for subsequent ones.
# Before: Copilot has nothing to work with
def calculate_discount(price, customer_tier, promo_code):
    ...

# After: Copilot has everything it needs
def calculate_discount(
    price: float,
    customer_tier: str,
    promo_code: str | None = None,
) -> float:
    ...
That annotation change alone is usually enough to fix wrong-type issues in generated docstrings. It takes ten seconds and it also improves your IDE's type checking, so there is no trade-off.
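If you want to find the functions that still need annotations before you trigger Copilot, a small check with the standard library's inspect module is enough. This is a minimal sketch, not part of Copilot or any linter; the helper name unannotated_parts is hypothetical, and the sample function is deliberately left partly unannotated.

```python
import inspect
from typing import Callable


def unannotated_parts(func: Callable) -> list[str]:
    """Return the parameter names (plus 'return') that have no type annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


# Deliberately incomplete: only customer_tier is annotated.
def calculate_discount(price, customer_tier: str, promo_code=None):
    ...


print(unannotated_parts(calculate_discount))
# ['price', 'promo_code', 'return']
```

Anything this reports is a gap Copilot would otherwise have to guess at.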
Establishing a Docstring Convention Copilot Can Follow
If your codebase has no consistent docstring format, Copilot will invent one per file. Pick a format — Google, NumPy, or reStructuredText — and write it down in a comment at the top of your module or in a project-level CONVENTIONS.md that Copilot Chat can reference.
The fastest way to establish consistency is to write one excellent docstring for the most important function in a file, then let Copilot follow from there. Here is a Google-style example that gives Copilot a clear template to replicate:
def calculate_discount(
    price: float,
    customer_tier: str,
    promo_code: str | None = None,
) -> float:
    """Apply a tiered discount and optional promo code to a base price.

    Discount tiers are: 'standard' (0%), 'silver' (5%), 'gold' (10%).
    A valid promo_code adds an additional 3% on top of the tier discount.
    Raises ValueError if customer_tier is unrecognized.

    Args:
        price: The base price in USD before any discounts.
        customer_tier: Loyalty tier label. Must be 'standard', 'silver', or 'gold'.
        promo_code: Optional promotional code string. Ignored if None.

    Returns:
        The discounted price as a float, rounded to two decimal places.

    Raises:
        ValueError: If customer_tier is not a recognized tier label.
    """
    ...
Notice what makes this useful as a Copilot seed: each Args entry leads with a description of meaning, not just the type (which is already in the signature). The Returns section mentions rounding behavior, which is not obvious from the signature. The Raises section is present. Copilot will model subsequent docstrings in the file on this one.
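A seed docstring only helps if it matches the code, so it is worth checking each claim against the body. The article leaves the body elided, so the following is one possible implementation under stated assumptions: the 3% promo bonus is added to the tier percentage (not compounded), and any non-empty promo code counts as valid. The constant names are hypothetical.

```python
from __future__ import annotations

# Tier percentages taken from the seed docstring.
TIER_DISCOUNTS = {"standard": 0.00, "silver": 0.05, "gold": 0.10}
PROMO_BONUS = 0.03  # assumption: added to the tier rate, not compounded


def calculate_discount(
    price: float,
    customer_tier: str,
    promo_code: str | None = None,
) -> float:
    """Apply a tiered discount and optional promo code to a base price."""
    if customer_tier not in TIER_DISCOUNTS:
        # Matches the Raises section of the seed docstring.
        raise ValueError(f"Unrecognized customer tier: {customer_tier!r}")
    discount = TIER_DISCOUNTS[customer_tier]
    if promo_code:  # assumption: any non-empty code is treated as valid
        discount += PROMO_BONUS
    # Matches the Returns section: rounded to two decimal places.
    return round(price * (1 - discount), 2)


print(calculate_discount(100.0, "gold"))            # 90.0
print(calculate_discount(200.0, "silver", "SAVE"))  # 184.0
```

Each docstring section maps to one line of behavior here: the tier lookup, the ValueError, and the rounding. If a section has no counterpart in the body, the seed is teaching Copilot to invent details.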
Prompting Copilot Chat for Docstrings Instead of Inline Completion
Inline completion is fast but context-limited. For anything more complex than a 10-line function, use Copilot Chat instead. You get more control over the output and you can specify the exact format in the prompt itself.
Open Copilot Chat, select the function (or paste it in), and use a prompt like this:
Write a Google-style docstring for the selected function.
Rules:
- The one-line summary must describe business intent, not implementation steps.
- Use the type annotations already in the signature; do not repeat types in Args.
- Include a Raises section only if there is an explicit raise statement in the body.
- If the function mutates any argument in place, note that in the summary.
- Do not add examples unless the logic is non-obvious.
The "business intent" instruction is the most impactful line here. It pushes Copilot away from structural summaries like "loops over items" toward intent-driven summaries like "returns the cheapest available flight for a given route."
If you work across many files, store this prompt as a snippet in your editor so you can paste it in one keystroke. That is faster than editing bad output every time. This is similar to the pattern described in prompting Copilot Chat for accurate refactors on legacy codebases, where reusable prompt templates dramatically reduce correction overhead.
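If your team uses more than one docstring style, you may prefer to generate the prompt rather than store several near-identical snippets. The sketch below simply assembles the prompt text shown above; build_docstring_prompt and BASE_RULES are hypothetical names, and nothing here talks to Copilot itself.

```python
# Rules copied from the prompt above. The wording mentions "Args", so adjust
# that rule if you generate prompts for a non-Google style.
BASE_RULES = (
    "The one-line summary must describe business intent, not implementation steps.",
    "Use the type annotations already in the signature; do not repeat types in Args.",
    "Include a Raises section only if there is an explicit raise statement in the body.",
    "If the function mutates any argument in place, note that in the summary.",
    "Do not add examples unless the logic is non-obvious.",
)


def build_docstring_prompt(style: str = "Google", extra_rules: tuple[str, ...] = ()) -> str:
    """Assemble the docstring prompt for a given style, with optional extra rules."""
    lines = [f"Write a {style}-style docstring for the selected function.", "Rules:"]
    lines += [f"- {rule}" for rule in (*BASE_RULES, *extra_rules)]
    return "\n".join(lines)


print(build_docstring_prompt())
```

Passing extra_rules lets you append function-specific instructions, such as a reminder to document several return paths, without editing the shared template.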
Using a Seed Docstring to Anchor the Style
When you start a new file, write the first docstring by hand for the most representative function. Make it complete and correct. Every subsequent Copilot suggestion in that file will treat it as a style anchor.
For a module with mixed utility functions, use the module-level docstring (for a package, the one at the top of __init__.py) to set context:
"""Pricing utilities for the checkout service.

All public functions in this module expect prices in USD as float values
rounded to two decimal places. Functions raise ValueError for invalid
tier labels and TypeError for non-numeric price inputs.

Docstring style: Google. Type annotations are authoritative; do not
repeat types inside Args descriptions.
"""
The last two sentences are written explicitly for Copilot. They tell it how to format Args entries and establish a behavioral contract. You are essentially writing a prompt inside your source file, and it works.
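Because that style declaration is plain text, you can also read it back programmatically, for example to confirm that every module in a package declares a style. This is a minimal sketch that assumes the "Docstring style: X." line format used above; declared_docstring_style is a hypothetical helper.

```python
from __future__ import annotations

import ast


def declared_docstring_style(source: str) -> str | None:
    """Return the style named on a 'Docstring style: X' line of the module docstring."""
    module_doc = ast.get_docstring(ast.parse(source))
    if module_doc is None:
        return None
    for line in module_doc.splitlines():
        if line.lower().startswith("docstring style:"):
            # Keep only the first word-like token after the colon, e.g. "Google".
            value = line.split(":", 1)[1].strip().split(".")[0].strip()
            return value.lower() or None
    return None


SAMPLE = '''"""Pricing utilities for the checkout service.

Docstring style: Google. Type annotations are authoritative; do not
repeat types inside Args descriptions.
"""
'''

print(declared_docstring_style(SAMPLE))  # google
```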
Handling Complex Functions With Multiple Return Paths
Functions with branching logic — multiple return statements, optional return types, or union return types — are where Copilot docstrings break down most often. It tends to document the happy path and ignore the rest.
Your first move is to annotate the return type explicitly with a union or Optional:
def find_user(user_id: int, include_deleted: bool = False) -> dict | None:
    ...
When you prompt Copilot Chat for this function, add an explicit instruction:
This function has two distinct return conditions.
Document both in the Returns section:
- What it returns when the user is found
- What it returns when the user is not found or is deleted and include_deleted is False
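For reference, here is roughly what a good result looks like once both return conditions are documented. The body is a sketch built on a hypothetical in-memory store (_USERS) with a "deleted" flag, since the article does not show the real implementation.

```python
from __future__ import annotations

# Hypothetical in-memory store standing in for a real database.
_USERS = {
    1: {"id": 1, "name": "Ada", "deleted": False},
    2: {"id": 2, "name": "Grace", "deleted": True},
}


def find_user(user_id: int, include_deleted: bool = False) -> dict | None:
    """Look up a user record by ID, hiding soft-deleted users by default.

    Args:
        user_id: Primary key of the user to look up.
        include_deleted: If True, soft-deleted users are returned as well.

    Returns:
        The user record when the user exists and is either active or
        include_deleted is True. None when no user has this ID, or when
        the user is soft-deleted and include_deleted is False.
    """
    user = _USERS.get(user_id)
    if user is None:
        return None
    if user["deleted"] and not include_deleted:
        return None
    return user


print(find_user(2))                        # None
print(find_user(2, include_deleted=True))  # the record for Grace
```

The Returns section names each condition explicitly, which is exactly what the prompt above asks Copilot to produce.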
For functions that return different shapes depending on a flag parameter, consider whether the function itself needs to be split up. A docstring that requires three paragraphs to explain two return paths is a signal that the function is doing too much. Copilot's struggle to document it accurately is surfacing a design issue.
Common Pitfalls That Corrupt Copilot's Output
Stale or wrong comments in the function body
Copilot reads inline comments as documentation hints. If you have a comment that says # apply discount when the code actually validates input, Copilot will document the function as one that applies a discount. Audit misleading comments before you generate docstrings for a file.
Function names that obscure intent
A function named handle, run, or process gives Copilot almost nothing to work with. Either rename the function before generating the docstring, or add an intent comment on the line above the def:
# Validates promo code format and checks active status in the promotions table
def handle(code: str) -> bool:
    ...
That comment is the strongest context signal for the docstring. Remove it after you have accepted the generated docstring if you do not want it in production.
Mixing docstring styles in the same file
If half your functions have NumPy-style docstrings and half have Google-style, Copilot will alternate between them unpredictably. Standardize before generating. A one-time batch conversion with Copilot Chat on your legacy files is faster than doing it by hand.
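Before standardizing, it helps to know how mixed a file actually is. The following sketch counts docstring styles per file using simple pattern heuristics on section headers; it is an approximation with hypothetical helper names, not a full docstring parser.

```python
import ast
import re
from collections import Counter


def detect_style(docstring: str) -> str:
    """Guess the docstring style from its section markers (heuristic)."""
    # NumPy: a section name followed by a line of dashes.
    if re.search(r"^[ \t]*(Parameters|Returns|Raises)[ \t]*\n[ \t]*-{3,}[ \t]*$",
                 docstring, re.MULTILINE):
        return "numpy"
    # Google: a section name followed by a colon on its own line.
    if re.search(r"^[ \t]*(Args|Returns|Raises|Yields):[ \t]*$", docstring, re.MULTILINE):
        return "google"
    # reStructuredText: field lists such as ":param x:" or ":returns:".
    if re.search(r"^[ \t]*:(param|returns?|raises?)\b", docstring, re.MULTILINE):
        return "rest"
    return "unknown"


def styles_in_source(source: str) -> Counter:
    """Count the detected style of every function docstring in a source string."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                counts[detect_style(doc)] += 1
    return counts


SAMPLE = '''
def a(x):
    """Do a.

    Args:
        x: Input.
    """

def b(y):
    """Do b.

    Parameters
    ----------
    y : int
        Input.
    """
'''

print(styles_in_source(SAMPLE))
```

A file that reports more than one style is a candidate for batch conversion before you generate anything new in it.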
Accepting without reading the Raises section
Copilot sometimes invents exceptions that are not in the code. Always read the Raises section before accepting. A docstring that claims a function raises KeyError when it does not will mislead every caller who reads it. This is the same silent-correctness problem described in debugging AI code suggestions that silently break edge cases.
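You can back up that manual read with a rough cross-check between the Raises section and the raise statements actually present in the body. This sketch assumes Google-style docstrings and only sees exceptions raised directly in the function (not ones propagated from callees), so treat its output as a prompt to look closer, not a verdict. All helper names are hypothetical.

```python
import ast
import re


def raised_in_body(func: ast.AST) -> set[str]:
    """Collect exception names from explicit raise statements inside a function."""
    names: set[str] = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Raise) and node.exc is not None:
            # "raise ValueError(...)" is a Call; "raise ValueError" is a Name.
            target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(target, ast.Name):
                names.add(target.id)
            elif isinstance(target, ast.Attribute):
                names.add(target.attr)
    return names


def documented_raises(docstring: str) -> set[str]:
    """Collect exception names listed under a Google-style 'Raises:' section."""
    names: set[str] = set()
    in_raises = False
    for line in docstring.splitlines():
        if line.strip() == "Raises:":
            in_raises = True
        elif in_raises and line.strip():
            if not line[0].isspace():
                break  # an unindented line starts the next section
            # Heuristic: a wrapped description line containing "word:" also matches.
            match = re.match(r"\s+([A-Za-z_][\w.]*):", line)
            if match:
                names.add(match.group(1).rsplit(".", 1)[-1])
    return names


def invented_exceptions(source: str) -> dict[str, set[str]]:
    """Map function names to documented exceptions with no matching raise."""
    report: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node) or ""
            extra = documented_raises(doc) - raised_in_body(node)
            if extra:
                report[node.name] = extra
    return report


SAMPLE = '''
def lookup(table, key):
    """Return the value stored under key.

    Raises:
        KeyError: If key is missing.
        ValueError: If key is empty.
    """
    if not key:
        raise ValueError("empty key")
    return table.get(key)
'''

print(invented_exceptions(SAMPLE))  # {'lookup': {'KeyError'}}
```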
Automating the Review Layer
Even with good prompting, you should not rely on manual review as your only quality gate. Add a lightweight automated check to your CI pipeline or pre-commit hooks.
Two linters cover this between them: pydocstyle checks that docstrings follow the formatting conventions of your chosen style, while pydoclint compares each docstring against the function itself, confirming that all parameters are documented and that the Returns section is present when the return type is not None. Install pydoclint and add it to your pre-commit config:
repos:
  - repo: https://github.com/jsh9/pydoclint
    rev: 0.3.8
    hooks:
      - id: pydoclint
        args:
          - --style=google
          - --arg-type-hints-in-signature=True
          - --arg-type-hints-in-docstring=False
The flag --arg-type-hints-in-docstring=False tells pydoclint not to require type repetition in Args entries since you are relying on type annotations in the signature. This keeps your docstrings concise and your linter happy at the same time.
Running this on every commit means a Copilot-generated docstring that is missing a parameter or inventing a non-existent exception will fail the hook before it ever merges. You shift from reviewing every docstring to reviewing only the ones the linter flags, which is a much smaller set.
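To make "missing a parameter" concrete, here is a rough approximation of that single check written against live function objects. It is not how pydoclint is implemented; it assumes Google-style Args sections, and both undocumented_params and the sample function send_invoice are hypothetical.

```python
import inspect
import re
from typing import Callable


def undocumented_params(func: Callable) -> list[str]:
    """Return signature parameters that have no entry in the Args section."""
    doc = inspect.getdoc(func) or ""
    documented: set[str] = set()
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line.strip():
            if not line[0].isspace():
                break  # an unindented line starts the next section
            # Matches "name:", "name (type):", "*args:" and "**kwargs:".
            match = re.match(r"\s+\*{0,2}(\w+)(?:\s*\(.*?\))?:", line)
            if match:
                documented.add(match.group(1))
    return [
        name
        for name in inspect.signature(func).parameters
        if name not in ("self", "cls") and name not in documented
    ]


def send_invoice(customer_id: int, amount: float, currency: str = "USD") -> bool:
    """Email an invoice to a customer.

    Args:
        customer_id: ID of the customer to bill.
        amount: Invoice total before tax.

    Returns:
        True if the email was queued.
    """
    return True


print(undocumented_params(send_invoice))  # ['currency']
```

In practice you would rely on the linter for this; the sketch only shows the kind of mismatch it flags.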
For teams using ChatGPT in addition to Copilot for documentation tasks, the same principle applies: structured prompt templates plus automated linting beats ad hoc review. The pattern for getting useful code reviews from ChatGPT without generic feedback is directly transferable here — specificity in the prompt, structure in the output, automation to catch the rest.
Wrapping Up
Getting accurate docstrings from Copilot is not about finding the perfect one-shot prompt. It is about giving Copilot enough structured context that it has no reason to guess. Here are the concrete steps to take now:
- Add type annotations to every parameter and return type in functions you plan to document. Do this before triggering Copilot.
- Write one seed docstring per file by hand, following your chosen format exactly. Let Copilot mirror it for the rest of the file.
- Use a saved Copilot Chat prompt that specifies format, instructs on business intent summaries, and lists exactly what to include in Raises.
- Audit inline comments before bulk-generating docstrings. Misleading comments produce misleading docs.
- Add pydoclint or pydocstyle to your pre-commit hooks so structural errors are caught automatically and you only manually review flagged output.
Follow those five steps and you will spend most of your time writing code, not fixing documentation that was almost right.
Frequently Asked Questions
Why does Copilot keep generating the wrong parameter descriptions in docstrings?
Copilot infers parameter descriptions from the function body and nearby code context. If your parameters have generic names or lack type annotations, Copilot fills gaps with plausible-sounding but inaccurate descriptions. Adding explicit type hints and a seed docstring in the same file dramatically improves accuracy.
How do I get Copilot to use Google style docstrings consistently across a project?
Write one complete, correct Google-style docstring for the first documented function in each file and Copilot will mirror that format for subsequent completions. For new files, add a module-level docstring that explicitly states the docstring style convention — Copilot reads it as context.
Is there a way to validate that Copilot-generated docstrings are complete without reading every one?
Yes — tools like pydoclint and pydocstyle can be added to pre-commit hooks to check that all parameters are documented, the Returns section is present, and the format matches your chosen style. This way you only manually review the docstrings that fail the automated check.
Can I use Copilot Chat to generate docstrings for multiple functions at once?
You can paste several functions into a Copilot Chat session with a prompt specifying your format rules and ask it to return all docstrings in one response. This works well for batches of 3 to 5 short functions, but accuracy tends to drop for larger batches where context gets crowded.
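If you want to keep batches small without counting by hand, you can split a file's top-level functions into groups before pasting them. This is a minimal sketch using the standard library's ast module; function_batches is a hypothetical helper, and the default batch size of 4 is just a value inside the 3 to 5 range mentioned above.

```python
import ast


def function_batches(source: str, batch_size: int = 4) -> list[list[str]]:
    """Split the top-level functions of a source string into small batches."""
    tree = ast.parse(source)
    functions = [
        ast.get_source_segment(source, node)  # source text starting at the def line
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return [functions[i:i + batch_size] for i in range(0, len(functions), batch_size)]


# Five tiny placeholder functions, f0 through f4.
SAMPLE = "\n\n".join(f"def f{i}(x):\n    return x + {i}" for i in range(5))

for batch in function_batches(SAMPLE, batch_size=3):
    print(len(batch), "functions in this batch")
```

Each batch can then be pasted into Copilot Chat together with your format rules.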
Why does Copilot sometimes add exceptions to the Raises section that don't exist in my code?
Copilot infers potential exceptions from patterns it associates with similar functions, not solely from explicit raise statements in your code. Always read the Raises section before accepting a suggestion, and instruct Copilot Chat to include Raises entries only when there is an explicit raise statement in the function body.