Validating AI solutions
AI-generated code often runs but fails on edge cases — learn the three-layer validation framework and how to use AI itself to find the gaps.
- Apply the three-layer validation framework to any AI-generated solution
- Write test cases before asking AI to implement, then verify the implementation passes them
- Use AI to surface edge cases its own solution might miss
- Recognise common hallucination patterns in AI-generated code
Code that runs is not the same as code that is correct. This gap — between "it executes without error" and "it produces the right result in every case that matters" — is where AI-generated solutions most often fail. Understanding that gap and knowing how to close it systematically is one of the most practically important skills in working with AI.
The three layers of validation
Check every AI-generated solution at three distinct layers. Passing one layer says nothing about whether the solution will pass the next.
Layer 1: Does it run? Syntax errors, import errors, and type mismatches that Python only reports when the offending line executes. These are the most surface-level failures, and AI rarely fails at this layer for straightforward tasks in common languages.
Layer 2: Does it produce correct output for the obvious case? You give it the example from the prompt, or a simple representative input, and check the result. AI often passes this layer too, because the obvious case is the one that was implicitly or explicitly in the prompt.
Layer 3: Is it correct for all cases that matter? Edge cases, boundary conditions, unexpected inputs, scale. This is where AI-generated solutions fail most often — not because the model is careless, but because edge cases weren't in the specification and the model had no reason to consider them.
A solution that passes layers 1 and 2 and fails layer 3 can look completely correct until it reaches production. The layer-3 gap is the dangerous one.
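To make the three layers concrete, here is a minimal sketch built around a hypothetical AI-generated `average` function (the function and task are illustrative assumptions). It clears layers 1 and 2 and only fails once an input the prompt never mentioned arrives.

```python
# Hypothetical AI-generated helper: the mean of a list of numbers.
def average(values):
    return sum(values) / len(values)

# Layer 1: it runs. Defining and calling it raises no syntax or import errors.
# Layer 2: the obvious case, the kind of example a prompt would contain, is correct.
print(average([2, 4, 6]))  # 4.0

# Layer 3: an edge case nobody specified.
try:
    average([])
except ZeroDivisionError as exc:
    print("fails on empty input:", exc)
```

Nothing about the first two checks hints at the third failure, which is why each layer needs its own check.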
Test-driven development with AI
One of the most effective ways to close the layer-3 gap is to write test cases before asking AI to implement the solution. This is test-driven development (TDD), applied to AI-assisted coding.
The workflow:
- Write the tests. Before you ask for an implementation, write 3–5 test cases that cover the obvious case, boundary cases, and any cases your domain knowledge tells you might be tricky. You don't need to run them yet: you're defining the contract.
- Ask for an implementation that passes the tests. Include the tests in your prompt: "Write a function that satisfies these tests: [paste tests]." The tests become part of the specification.
- Run the tests against the implementation. If any fail, you have a precise failure description to give the model in the next iteration.
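A minimal sketch of this workflow in plain Python, assuming a hypothetical `slugify` task. The tests are written first and pasted into the prompt; the implementation stands in for what a model might return and is one plausible answer, not a reference one.

```python
import re

# Step 1: the tests, written before any implementation exists.
def test_slugify():
    assert slugify("Hello World") == "hello-world"    # obvious case
    assert slugify("") == ""                          # empty input
    assert slugify("  padded  ") == "padded"          # surrounding whitespace
    assert slugify("many   spaces") == "many-spaces"  # repeated separators
    assert slugify("Rock & Roll!") == "rock-roll"     # punctuation dropped

# Step 2: the implementation returned for the prompt
# "Write a function that satisfies these tests: [paste tests]".
def slugify(title: str) -> str:
    # Lowercase, collapse each run of characters outside a-z and 0-9 into one hyphen,
    # then trim any hyphens left at either end.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Step 3: run the tests against the implementation.
test_slugify()
print("all slugify tests pass")
```

This implementation passes all five tests, yet it turns "Café" into "caf". The tests never mention non-ASCII input, so that behaviour is a layer-3 gap the contract simply didn't cover.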
The tests give you something AI cannot easily fabricate: an independent definition of correct behaviour that predates the implementation.
You don't need a formal test framework for this to work. Even informal checks, such as "assert result == expected" in Python or a plain list of "given X, expect Y" examples in your prompt, are more effective than describing expected behaviour in prose.
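One way to keep informal examples honest is to store them once and use them twice: rendered as "given X, expect Y" lines for the prompt, and run as checks against whatever comes back. The `clamp` function and both helper names below are hypothetical, chosen only for illustration.

```python
# "Given X, expect Y" pairs for a hypothetical clamp(value, low, high) function.
EXAMPLES = [
    ((5, 0, 10), 5),     # inside the range
    ((-3, 0, 10), 0),    # below the lower bound
    ((42, 0, 10), 10),   # above the upper bound
    ((0, 0, 10), 0),     # exactly on the lower bound
    ((10, 0, 10), 10),   # exactly on the upper bound
]

def examples_as_prompt(name, examples):
    """Render the examples as plain 'given X, expect Y' lines to paste into a prompt."""
    lines = [f"Write a Python function {name} that satisfies these examples:"]
    for args, expected in examples:
        lines.append(f"- given {name}{args}, expect {expected!r}")
    return "\n".join(lines)

def check_examples(func, examples):
    """Return the examples func gets wrong; an empty list means every example passes."""
    failures = []
    for args, expected in examples:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures

print(examples_as_prompt("clamp", EXAMPLES))

# A candidate implementation, standing in for what the model returns.
def clamp(value, low, high):
    return max(low, min(value, high))

print(check_examples(clamp, EXAMPLES))  # []
```

Because the prompt text and the checks come from the same list, the specification you gave the model and the one you verify against cannot drift apart.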
Asking AI for its own edge cases
Once you have a working implementation, ask the model directly:
"What edge cases could this function fail on? Include cases you think are unlikely but possible."
This is the "request a critique" strategy from the prompt crafting module, applied to correctness. The model often knows — in the abstract — about edge cases it didn't handle when focusing on the primary implementation. Asking explicitly surfaces them.
Common categories worth probing:
- Empty input: empty list, empty string, empty dictionary
- Single-element input: when the "more than one" assumption breaks down
- All-identical input: sorts, deduplication, comparisons where every element is the same
- Type edge cases: zero, negative numbers, None, very large numbers
- Encoding and whitespace: strings with leading/trailing spaces, mixed case, unicode
You don't need to ask about all of these every time. But knowing the categories means you can prompt specifically: "How does this handle an empty input list? What about a list with one element?"
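The categories above can also be turned into a small probe harness that you run on every candidate implementation. This is a sketch under stated assumptions: the function under test is a hypothetical `second_largest`, and because it takes a list of numbers, the encoding and whitespace category would need a separate set of string probes.

```python
# Edge-case probes grouped by the categories listed above.
EDGE_CASES = {
    "empty list": [],
    "single element": [7],
    "all identical": [3, 3, 3, 3],
    "zero and negatives": [0, -1, -5],
    "very large numbers": [10**18, 10**18 + 1],
    "contains None": [1, None, 2],
}

def probe(func, cases):
    """Call func on each probe input and record either its result or the exception type."""
    report = {}
    for label, value in cases.items():
        try:
            report[label] = ("ok", func(value))
        except Exception as exc:  # deliberately broad: we want to see every failure
            report[label] = ("error", type(exc).__name__)
    return report

# Hypothetical AI-generated function under test: the second-largest value.
def second_largest(values):
    return sorted(values)[-2]

for label, outcome in probe(second_largest, EDGE_CASES).items():
    print(f"{label:20} {outcome}")
```

The empty and single-element probes raise IndexError and the None probe raises TypeError. The all-identical probe returns 3 without error, which is the more interesting result: whether "second largest" of [3, 3, 3, 3] should be 3 is a specification question a crash probe cannot answer for you.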
Common hallucination patterns
AI-generated code can contain specific classes of errors that are worth recognising:
Invented functions. The model generates a call to a function that does not exist in the standard library or the specified library. It sounds plausible — the name is consistent with the library's naming conventions — but it simply isn't there. Check any unfamiliar function calls against the official documentation before trusting them.
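Reading the documentation is the real safeguard, but a quick programmatic check can flag an invented name before you go looking. This sketch (the helper name is hypothetical) only tells you whether the name exists in the version installed in your environment; it says nothing about whether the function does what the model claims. Note that importing a module runs its top-level code.

```python
import importlib

def attribute_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module_name imports and has the dotted attribute path attr_path."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True

print(attribute_exists("os.path", "exists"))            # True: a real function
print(attribute_exists("os.path", "exists_or_create"))  # False: plausible-sounding but invented
print(attribute_exists("json", "JSONDecoder.decode"))   # True: a method on a real class
```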
Incorrect library APIs. Real functions called with wrong signatures, wrong parameter names, or parameter orders that changed between library versions. The model's training data spans multiple library versions and may have learned outdated patterns.
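For the wrong-signature case, Python's `inspect.signature(func).bind(...)` can test whether a call would be accepted without executing it. The example uses a realistic mix-up: passing pathlib's `parents=` keyword to `os.makedirs`. Two limits of this sketch (the helper name is hypothetical): a function that accepts `**kwargs` will bind any keyword name, so invented parameters slip through, and some built-in functions expose no signature metadata, in which case `inspect.signature` raises ValueError.

```python
import inspect
import os
from pathlib import Path

def call_matches_signature(func, *args, **kwargs) -> bool:
    """Return True if func would accept these arguments. func itself is never called."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True

# os.makedirs takes exist_ok but not parents; parents belongs to Path.mkdir.
print(call_matches_signature(os.makedirs, "out", exist_ok=True))               # True
print(call_matches_signature(os.makedirs, "out", parents=True))                # False
print(call_matches_signature(Path("out").mkdir, parents=True, exist_ok=True))  # True
```

Because `bind` only matches arguments to parameters, no directory is created by running these checks.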
Off-by-one errors. Slicing, indexing, and range boundaries are a persistent source of subtle bugs in AI-generated code. A solution that's correct for a list of length 5 may be wrong for length 1 or length 0.
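A classic Python slicing boundary bug, shown as a sketch with hypothetical function names and assuming `n` is never negative: `items[-n:]` looks like "the last n items" and is correct for every n from 1 upward, but when n is 0 it becomes `items[0:]`, the whole list.

```python
def last_n_buggy(items, n):
    # Correct for n >= 1, but items[-0:] is items[0:], i.e. the whole list.
    return items[-n:]

def last_n(items, n):
    # Compute the start index explicitly so n == 0 yields an empty list,
    # and clamp it at 0 so n larger than len(items) returns the whole list.
    return items[max(len(items) - n, 0):]

data = [1, 2, 3, 4, 5]
print(last_n_buggy(data, 2), last_n(data, 2))  # [4, 5] [4, 5]
print(last_n_buggy(data, 0), last_n(data, 0))  # [1, 2, 3, 4, 5] []
print(last_n_buggy([], 3), last_n([], 3))      # [] []
```

A test suite that only ever asks for the last 2 or 3 items would pass both versions.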
Wrong assumptions about data types. The model may assume a value is always a list when it can also be None, or assume a string is always ASCII when it may contain unicode. These assumptions are rarely visible in the code — they're missing guards rather than present errors.
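Here is what those missing guards look like in practice, in a sketch with hypothetical function names. The naive versions contain no visible error; they simply assume the input is always a list and always ASCII. The byte-limit example assumes the text is stored as UTF-8.

```python
def total_naive(prices):
    # Assumes prices is always a list; sum(None) raises TypeError.
    return sum(prices)

def total(prices):
    # Explicit guard: treat a missing list as "no prices". Whether None should
    # mean 0 or be rejected is a specification decision, not something to guess.
    if prices is None:
        return 0
    return sum(prices)

def fits_naive(text, max_bytes):
    # Counts characters, which only equals the byte count for ASCII text.
    return len(text) <= max_bytes

def fits(text, max_bytes):
    # Counts the bytes that will actually be stored, assuming UTF-8.
    return len(text.encode("utf-8")) <= max_bytes

word = "caf\u00e9"  # "café": 4 characters, 5 bytes in UTF-8
print(total([3, 4]), total(None))             # 7 0
print(fits_naive(word, 4), fits(word, 4))     # True False
```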
The most dangerous hallucination pattern is confident incorrectness — when the model presents a wrong answer with no hedging or uncertainty signals. AI-generated code that "looks right" deserves the same scrutiny as code that looks suspicious. The model's confidence is not evidence of correctness.
Code review as a validation step
After running tests, do a line-by-line read of the implementation. This is not optional. Testing tells you whether the code is correct for the cases you tested. Reading helps you judge whether the code is correct in general: whether the logic makes sense independent of any specific test.
Specific things to look for in a line-by-line review of AI-generated code:
- Every function call to a library: does it exist, and does it have this signature?
- Every conditional: are the comparisons correct? Are there missing else branches?
- Every loop: are the bounds right? Is there an off-by-one?
- Every assumption about data shape: can the actual data ever violate this assumption?
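A missing else branch is a good example of something reading catches more easily than testing. In this sketch (the grading function is hypothetical), tests that only use scores from 0 to 100 all pass, but reading the conditional shows that a negative score falls through every branch and silently returns None.

```python
def grade(score):
    # Hypothetical AI-generated function: correct for every score from 0 to 100.
    if score >= 90:
        return "A"
    elif score >= 70:
        return "B"
    elif score >= 0:
        return "C"
    # No else branch: a negative score reaches the end and the function returns None.

print(grade(95), grade(75), grade(10))  # A B C
print(grade(-5))                        # None

def grade_reviewed(score):
    # The reviewed version rejects out-of-range input explicitly instead of implicitly.
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "A"
    if score >= 70:
        return "B"
    return "C"
```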
Ask the model to walk you through the logic if any part is unclear. "Explain what line 7 does and why you chose this approach" is a valid review step.
Where to go next
You can now generate AI solutions and validate them systematically. The final lesson in this module covers a different role for AI: using it as a research and documentation partner — the practices for learning about unfamiliar codebases, APIs, and concepts without losing your grip on what's actually true.
Knowledge check
1. At which validation layer do AI-generated solutions most often fail?
2. Which of the following are recognised hallucination patterns in AI-generated code? Select all that apply.
3. If AI-generated code passes all the test cases you wrote, you can be confident it is correct for all inputs.