What AI is good at — and not so good at
Map the mechanism to real tasks so you know exactly when to trust AI output and when to verify it.
- List the categories of tasks where LLMs reliably excel
- List the categories where LLMs reliably struggle or fail
- Apply the "trust vs. verify" decision to a given task
- Recognise the warning signs of a hallucinated or unreliable response
The previous lesson showed that LLMs optimise for plausibility, not truth, and that they sample from a probability distribution. Those properties have direct, predictable consequences: there are entire categories of tasks where AI performs reliably well, and entire categories where it performs reliably poorly. Knowing the map is the most practical skill you can bring to any AI tool.
This is not a lesson about any one product's limitations — the landscape changes quickly. It is about the structural properties of LLMs that make certain tasks easy and others hard, and those structural properties are stable.
Where AI excels
Synthesis and transformation. Take existing content and reshape it: summarise a long document, translate it into another language, rewrite it in simpler terms, extract the key points into a bullet list, or convert an informal note into a formal email. The model has seen enormous amounts of each kind of text and can shift register fluently.
Explaining and teaching. Given a topic, an LLM can produce a clear explanation calibrated to a given level of background knowledge. It does this well because explanations in written text are abundant in training data. This is genuinely useful — ask it to explain a concept five different ways until one clicks.
Generating first drafts and skeletons. Writing the first version of something is the expensive part: a blank page, a stub function, a draft email you will revise anyway. AI is fast here and costs you nothing but editing time. Use it for boilerplate, scaffolding, and first passes.
Pattern transformation on code. Renaming things consistently, converting loops to list comprehensions, adding error handling everywhere it is missing, adapting code from one framework to another. These are pattern-matching tasks, and LLMs are excellent pattern-matchers.
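As a small illustration of the loop-to-comprehension case, here is a minimal sketch. The function names and sample data are made up for this example. The point is that the two forms can be checked against each other, which is also how you would verify an AI-produced transformation.

```python
def squares_of_evens_loop(numbers):
    """Original form: build the result with an explicit loop."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_comprehension(numbers):
    """Transformed form: the same logic as a list comprehension."""
    return [n * n for n in numbers if n % 2 == 0]


# Hypothetical sample input; any list of integers works.
sample = [1, 2, 3, 4, 5, 6]

# Both versions must agree. Running this check is the verification step.
assert squares_of_evens_loop(sample) == squares_of_evens_comprehension(sample)
```

A transformation like this changes form, not behaviour, so a quick equality check on a few inputs is usually enough to catch a slip.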
Brainstorming. Need five approaches to a design problem? Ten variable names? A list of edge cases you might have missed? An AI can generate options quickly and without the social friction of a brainstorm with colleagues. You still decide which options are good.
Where AI struggles
Precise factual recall. A model's "knowledge" is encoded in its weights at training time and does not update as you use it. Those weights store statistical patterns, not a lookup table of facts, so details can blur together. The model may confidently cite a paper that does not exist, state the wrong release date for a library version, or give you a phone number that belongs to someone else. Any claim that requires precise factual accuracy needs external verification.
Arithmetic and exact computation. LLMs are not calculators. They produce token sequences that look like arithmetic, but the process is statistical interpolation, not actual calculation. For anything beyond trivial arithmetic, do not rely on the model; use code or a calculator. Asking the model to write code that computes something is usually far more reliable than asking it to compute the value directly, because the code is executed by a real interpreter and you can inspect and test it.
A particularly dangerous failure mode: the model produces a number that is nearly right. A wrong answer that looks plausible is harder to catch than an obviously wrong one. Always verify computed values.
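To make the "write code instead of computing directly" advice concrete, here is a minimal sketch of a compound-interest calculation. The function name, parameters, and figures are illustrative assumptions, not taken from any particular tool. It uses Python's standard `decimal` module so the result is exact to the cent.

```python
from decimal import Decimal, ROUND_HALF_UP


def compound_balance(principal: str, annual_rate: str, years: int) -> Decimal:
    """Compound a balance once per year and round to two decimal places.

    Amounts are passed as strings so Decimal stores them exactly.
    """
    balance = Decimal(principal)
    growth = Decimal(1) + Decimal(annual_rate)
    for _ in range(years):
        balance *= growth
    return balance.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical inputs: 1000.00 at 5% per year for 3 years.
print(compound_balance("1000", "0.05", 3))
```

Whether the code came from you or from an AI, the number it prints is the output of real arithmetic, and you can re-run it with different inputs to sanity-check it.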
Tasks requiring real-world grounding. On its own, a model cannot browse the web, does not know what time it is, and cannot access your files, your codebase, or any live system. It can do these things only when the product explicitly gives it a tool or puts the information into the prompt. Its knowledge also has a training cutoff. Questions about recent events, current prices, live system state, or anything that requires up-to-the-minute information are outside its reach, and it may confabulate an answer rather than admit ignorance.
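One way to see what "grounding" means in practice is to fetch the live value yourself and hand it to the model in the prompt. The sketch below is an assumption about how you might do that for the current time. `build_grounded_prompt` is a hypothetical helper, and no particular AI product or API is implied.

```python
from datetime import datetime, timezone
from typing import Optional


def build_grounded_prompt(question: str, now: Optional[datetime] = None) -> str:
    """Prepend the current UTC time so the model does not have to guess it.

    `now` can be supplied for testing; by default the system clock is read.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    stamp = current.isoformat(timespec="seconds")
    return f"Current UTC time: {stamp}\n\nQuestion: {question}"


# The clock value comes from your machine, not from the model's training data.
print(build_grounded_prompt("How many days are left in this month?"))
```

The same pattern applies to prices, file contents, or system state: the trustworthy value comes from a source you control, and the model only works with what you give it.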
Consistent reasoning over very long contexts. LLMs work within a context window. As the conversation grows longer, content from earlier in the context tends to be used less reliably, and details can be overlooked or contradicted. Long, complex multi-step reasoning across many turns tends to accumulate errors and drift. This is not a bug that is about to be fixed; it is a consequence of how attention works at scale.
Knowing what it doesn't know. This may be the most important limitation. Human experts usually have calibrated uncertainty: they know the edges of their knowledge and will say "I'm not sure, you should verify this." LLMs frequently do not. How confident a response sounds correlates poorly with how accurate it is. A response about a topic the model knows nothing about often reads exactly like a response about a topic it knows well.
Applying the trust-vs-verify decision
A useful rule of thumb: ask whether the task is about pattern and form or about specific facts and computation.
- Pattern and form (drafting, explaining, transforming, brainstorming): lean on the AI, edit the result.
- Specific facts, numbers, citations, real-world state: treat AI output as a first hypothesis, then verify against sources you trust.
- Code: read it before running it. AI-generated code is often correct in structure but wrong in detail — a library method that does not exist, an off-by-one error, a missing edge case. Running it is verification; reading it is understanding.
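The three kinds of detail error named in the code bullet above can be sketched in a few lines. Everything here is a made-up illustration: `last_n_buggy` stands in for a hypothetical AI draft, and `list.add` stands in for a plausible-sounding method that Python lists do not have.

```python
# A library method that does not exist: lists have `append`, not `add`.
assert not hasattr(list, "add")
assert hasattr(list, "append")


def last_n_buggy(items, n):
    """Hypothetical AI draft: off by one, so it returns n + 1 items."""
    return items[len(items) - n - 1:]


def last_n(items, n):
    """Corrected version that returns the last n items.

    Handles the edge case n == 0, where items[-0:] would return
    the whole list instead of an empty one.
    """
    if n <= 0:
        return []
    return items[-n:]


data = [1, 2, 3, 4, 5]
print(last_n_buggy(data, 2))  # one item too many
print(last_n(data, 2))
```

Reading the draft shows the suspicious `- 1` in the slice; running it on a small input confirms the problem. Both steps are cheap compared with finding the bug in production.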
A practical rule: if a mistake in this output would cause you real harm (wrong medical information, incorrect financial figures, buggy code in production), verify it independently. If a mistake just means you have to edit a draft, the cost is low and AI is a net time-saver.
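The rule of thumb and the practical rule can be combined into a single decision, sketched below. This is only one way to encode the lesson's guidance. `TaskKind` and `recommended_action` are names invented for this example, and real decisions will need more judgment than a lookup.

```python
from enum import Enum


class TaskKind(Enum):
    PATTERN_AND_FORM = "pattern_and_form"            # drafting, explaining, transforming
    FACTS_AND_COMPUTATION = "facts_and_computation"  # numbers, citations, live state
    CODE = "code"


def recommended_action(kind: TaskKind, mistake_causes_real_harm: bool) -> str:
    """Return a suggested handling for AI output of the given kind."""
    # Harm outranks everything else: verify independently.
    if mistake_causes_real_harm:
        return "verify independently"
    if kind is TaskKind.PATTERN_AND_FORM:
        return "use the output and edit it"
    if kind is TaskKind.CODE:
        return "read it before running it"
    return "treat as a first hypothesis and check trusted sources"


print(recommended_action(TaskKind.PATTERN_AND_FORM, mistake_causes_real_harm=False))
print(recommended_action(TaskKind.FACTS_AND_COMPUTATION, mistake_causes_real_harm=False))
print(recommended_action(TaskKind.CODE, mistake_causes_real_harm=True))
```

The ordering of the checks carries the main idea: the cost of a mistake is considered before the type of task.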
Check your understanding
- 1. Which of the following are tasks where LLMs reliably perform well?
- 2. You ask an AI to tell you the current exchange rate between USD and EUR. The AI gives a specific number confidently. What should you do?
- 3. Asking an AI to write code that computes a value is generally more reliable than asking it to compute the value directly.
Where to go next
You now have a clear picture of what AI can and cannot do. The next step is practice: the lab will walk you through probing these strengths and limits directly, so you develop an intuition you can trust in real situations.