AI Capabilities and Limitations - Certification Study Guide
AI Capabilities and Limitations - Certification Study Guide
Course: AI Capabilities and Limitations Modules: 6 Target: All professionals seeking a deeper technical understanding of how LLMs work Difficulty: Intermediate
MODULE 1: Getting Started
Key Notes
- Scope of “AI” in this course: Large Language Models (LLMs) specifically — not robotics, computer vision, or narrow AI systems
- LLMs: Claude, GPT-4, Gemini, Llama, Mistral — text-in, text-out systems at their core
- Understanding HOW AI works is not just academic — it directly explains WHY AI behaves the way it does
How AI Gets Its Character — 4-Stage Training Process:
Stage 1: PRE-TRAINING
─────────────────────────────────────────────────────────────
Trained on massive text datasets (web, books, code, etc.)
Learns statistical patterns: what words/concepts follow others
Develops broad world knowledge and language capability
Result: a model that can predict text, but with no alignment
Scale: trillions of tokens; months of compute
Stage 2: SUPERVISED FINE-TUNING (SFT)
─────────────────────────────────────────────────────────────
Human trainers demonstrate preferred responses
Model learns to produce helpful, formatted, structured answers
Result: a model that can follow instructions and be useful
Scale: thousands to millions of curated examples
Stage 3: REINFORCEMENT LEARNING FROM HUMAN FEEDBACK (RLHF)
─────────────────────────────────────────────────────────────
Human raters compare responses and rank them
A "reward model" is trained on these preferences
The LLM is fine-tuned to maximize reward model scores
Result: better alignment with human values and preferences
Stage 4: CONSTITUTIONAL AI / ADDITIONAL ALIGNMENT (varies by provider)
─────────────────────────────────────────────────────────────
Explicit principles guide model behavior
Model learns to critique and revise its own outputs
Anthropic's approach: a "constitution" of values + self-critique
Result: stronger safety and values alignment
Key Insight: AI is a statistical pattern matcher shaped by human feedback — not a thinking being, not a database, not a search engine. Understanding this unlocks understanding of every capability and limitation covered in this course.
What “AI character” means:
- Tone, helpfulness, caution, verbosity = shaped by training choices
- Safety behaviors = trained, not hard-coded rules
- Knowledge = statistical patterns, not stored facts
- Reasoning = pattern-based, not logical computation
- Different providers make different training choices → different behaviors across models
Training Data Composition (approximate, varies by model):
- Web crawl data (Common Crawl and similar): largest share — broad but noisy
- Books and long-form text: higher quality writing patterns
- Code repositories: programming knowledge and structure
- Academic and scientific text: domain knowledge
- Curated datasets: filtered for quality
- The composition shapes what the model “knows” well vs. poorly
Best Practices
- When an AI behaves unexpectedly, ask: “Which stage of training might explain this?”
- Treat AI outputs as the most statistically likely response to your input, not the factually correct answer
- Understand that different AI providers make different training choices — behavior varies across models
Example
Claude declines to help with certain requests not because it cannot produce the text, but because training shaped it to decline. A different model with different training choices might respond differently. Neither is “smarter” — they reflect different training priorities.
Try It Out: Ask two different AI models the same borderline request (e.g., “write a persuasive essay arguing a controversial position”). Note: do they respond differently? What does the difference tell you about their training choices?
MODULE 2: Next Token Prediction
Key Notes
- The core mechanism of every LLM: predict the next most likely token given all previous tokens
- This single mechanism explains both the most impressive capabilities and the most confounding failures
What is a Token?
- Token ≠ word. A token is roughly 4 characters or 0.75 words on average
- “Unbelievable” → approximately 3 tokens: “Un” + “bel” + “ievable”
- “the” → 1 token (common words are compact)
- “mRNA” → may be 2–3 tokens (technical/rare terms cost more)
- Chinese and Japanese characters may be 1–2 tokens each
- Whitespace and punctuation are often their own tokens
- Why it matters: context windows are measured in tokens, not words — 100K token limit ≠ 100K words
Token Counting Examples:
"Hello, world!" → ~4 tokens
"The quick brown fox" → ~4 tokens
"Electroencephalography" → ~6 tokens
"AI will transform every
industry in the next
decade, experts say." → ~17 tokens
Average English text: ~750 words ≈ 1,000 tokens
How Next-Token Prediction Produces Intelligent-Seeming Output:
Prompt: "The capital of France is"
Model calculates probability of every possible next token:
"Paris" → 94.2% probability
"Lyon" → 1.1% probability
"a" → 0.8% probability
"situated" → 0.6% probability
...
Selects "Paris" → then predicts next token given "...France is Paris"
→ and so on until a stop token or length limit
Temperature — Controlling Randomness:
| Temperature | Effect | Token Selection | Best For |
|---|---|---|---|
| 0.0 | Fully deterministic | Always highest-probability token | Factual lookups, code generation |
| 0.1–0.3 | Near-deterministic | Nearly always top token, slight variation | Technical writing, precise instructions |
| 0.5–0.7 | Balanced | Top tokens with moderate variation | General writing, analysis |
| 0.9–1.0 | Creative | Broader distribution, more surprise | Brainstorming, creative writing |
| 1.5+ | High randomness | Very broad; often incoherent | Experimental use only |
Side-by-Side Temperature Examples (same prompt):
Prompt: "Describe how clouds form."
Temperature 0.1:
"Clouds form when water vapor in the atmosphere cools and condenses
around tiny particles called condensation nuclei, such as dust or
sea salt. As warm, moist air rises, it expands and cools..."
[Accurate, predictable, textbook-style]
Temperature 1.0:
"Clouds are born from invisible breath — the sky's own exhalation.
When warm air climbs, carrying its invisible cargo of water molecules,
it reaches the cold heights where those molecules must make a choice..."
[More creative, less predictable, still generally accurate]
Temperature 1.5:
"Clouds. Water. Cold meets warm in the atmospheric ballet of
condensation nuclei dancing with vapor — sky's tears waiting to happen,
meteorologically speaking, of course..."
[Fragmented, creative, unreliable for factual accuracy]
Capabilities That Emerge from Next-Token Prediction:
- Fluent language: predicting plausible next tokens produces grammatical, coherent text naturally
- Code generation: code has high statistical regularity — prediction works extremely well for common patterns
- Style matching: if you write formally, subsequent tokens are predicted to be formal
- Apparent reasoning: chain-of-thought works because reasoning steps are statistically likely to follow each other correctly
- Translation: cross-lingual training data teaches token-to-token mappings across languages
- Summarization: common text patterns around “in summary” and “key points” provide strong training signal
Limitations That Emerge from Next-Token Prediction:
- Hallucinations: the model predicts the most statistically plausible next token — not the most factually accurate. A confident, specific-sounding false statement can be MORE statistically likely than “I don’t know.”
- No self-correction by default: the model does not verify output against a knowledge store — it just keeps predicting
- Sensitive to phrasing: small changes in input change the probability distribution, which changes the output significantly
- Not actually counting or computing: apparent mathematical ability is pattern matching, not calculation — errors increase with complexity
Why Hallucinations Are Fundamental (not a bug to be fixed):
The model is doing exactly what it is designed to do:
predicting statistically likely text.
"The study found that [X]% of adults..."
→ A specific percentage is statistically likely after this phrase
→ The model predicts one (e.g., "73%")
→ The model has no mechanism to check if 73% is real
→ It just knows "73" is a plausible token sequence here
"According to [Author] ([Year])..."
→ A plausible author name and year are statistically likely
→ The model predicts them confidently
→ The citation may be entirely fabricated
This is not a failure of the mechanism — it IS the mechanism.
Verification requires a separate system (search, RAG, human review).
Fluent ≠ Accurate — The Core Confusion: Most AI failures come from users trusting fluency as a signal of accuracy. They are completely independent:
- A hallucination reads exactly as well as a true statement
- Grammatical perfection has zero correlation with factual correctness
- Confident tone is a property of the training data, not a signal of certainty
Best Practices
- For precision-critical outputs (statistics, citations, names, dates), always verify externally
- Use lower temperature settings for factual or technical tasks; higher for creative tasks
- Provide the factual content you need in your prompt — don’t expect the model to recall it accurately
- Treat confident AI assertions about specific facts with the same skepticism as a very well-read but unreliable friend
Try It Out: Ask an AI a factual question where you already know the precise answer (a specific statistic from your field, a date, a technical specification). Note whether the answer is correct, close, or wrong — and observe how confidently it was stated regardless of accuracy.
Example
A journalist asks AI for statistics on youth homelessness to include in an article. The AI produces several specific percentages with apparent source attribution. The journalist fact-checks all of them — two are close but not exact, one cannot be found anywhere, and one cites a real report but with a different number. The journalist uses none of the AI statistics, instead asking AI to help structure the article once they have verified the data themselves.
MODULE 3: Knowledge
Key Notes
- What LLMs “know”: statistical patterns derived from training data, not stored facts in a database
- The distinction matters: a database knows a fact with certainty; an LLM predicts what text about a topic typically says
Training Data Composition and Knowledge Depth:
HIGH-QUALITY KNOWLEDGE (frequent, consistent training data):
─────────────────────────────────────────────────────────────
Major world events and history (well-documented)
Popular programming languages (Python, JavaScript, etc.)
Well-documented science and mathematics
English-language literature and culture
Widely-covered current events (up to cutoff)
VARIABLE KNOWLEDGE (less frequent or inconsistent data):
─────────────────────────────────────────────────────────────
Niche academic fields
Less-documented languages and cultures
Recent developments near the knowledge cutoff
Rapidly evolving fields (AI itself, recent legislation)
Regional or local information
LOW-QUALITY / HIGH-HALLUCINATION RISK:
─────────────────────────────────────────────────────────────
Obscure topics with minimal internet presence
Private organizational knowledge (your company, your team)
Information that is mostly paywalled or not on the web
Very recent events (post-cutoff)
Highly technical specializations with small communities
Knowledge Capabilities:
| Capability | Description | Reliability |
|---|---|---|
| Breadth | Enormous — trained on virtually every domain | High |
| Depth on common topics | Strong — common topics appear frequently in training | High |
| Depth on specialized/rare topics | Variable to low | Medium–Low |
| Cross-domain synthesis | Strong — can connect concepts across fields | High |
| Historical knowledge | Strong for well-documented history | High |
| Technical knowledge | Strong for widely documented technologies | High |
| Recent events (near cutoff) | Partial, inconsistent | Medium |
| Post-cutoff events | None — will hallucinate or hedge | Zero |
Knowledge Cutoff Implications:
WELL BEFORE CUTOFF: Strong, consistent knowledge available
NEAR CUTOFF: Partial, inconsistent, may be wrong
AFTER CUTOFF: No knowledge — model will either hallucinate
or correctly hedge ("I don't have info on that")
HIGH-RISK DOMAINS FOR CUTOFF ISSUES:
─────────────────────────────────────────────────────────────
Current events and breaking news
Recently passed or amended legislation
Current drug approvals and clinical guidelines
Latest software versions, APIs, and frameworks
Current market prices and economic data
Recently published research
Changes in organizational leadership
Sports results and standings
AI model capabilities (the field moves faster than any cutoff)
How AI “Knowledge” Actually Works: AI does not retrieve stored facts. It generates text that is statistically consistent with what training data said about a topic:
- Topics that appeared frequently and consistently → high accuracy
- Topics with conflicting information in training → AI may present one view or hedge
- Topics barely present in training → high hallucination risk, low accuracy
- Your private organizational data → AI has no knowledge whatsoever unless you provide it
The RAG Pattern (Retrieval-Augmented Generation):
THE PROBLEM:
"What does our policy say about expense reimbursement?"
→ AI may hallucinate a policy that doesn't match yours
THE SOLUTION (RAG):
Paste the actual policy text into the prompt, then ask:
"Based on the policy text above, what does it say about
expense reimbursement?"
→ AI now reasons from the actual text you provided
WHY IT WORKS:
The context window contains the authoritative information.
Next-token prediction now draws on that text, not training data.
Accuracy is dramatically higher for specific organizational content.
COMMON RAG APPLICATIONS:
─────────────────────────────────────────────────────────────
Organizational policies and procedures
Product documentation and specs
Legal contracts and agreements
Scientific papers and reports (paste abstracts/excerpts)
Financial statements and reports
Meeting notes and prior decisions
Handling Knowledge Limitations — Decision Framework:
| Situation | Strategy |
|---|---|
| Need current events | Provide context documents; use search-augmented AI |
| Need recent research | Paste abstracts or excerpts into the prompt |
| Need organizational knowledge | RAG — paste the relevant document |
| Need precision facts | Verify independently; don’t rely on AI recall |
| Need proprietary information | AI cannot know it; provide it explicitly |
| Near knowledge cutoff | Ask AI to flag its uncertainty; verify externally |
Best Practices
- Always check the knowledge cutoff date of the AI model you are using
- For any claim about recent events, current data, or rapidly changing fields: verify externally
- Use the RAG pattern for organizational knowledge: paste documents into context rather than expecting AI to know them
- Ask AI to indicate its confidence and flag where its knowledge may be dated
Try It Out: Ask an AI about a development in your field from the last 12 months. Then ask about something that happened 5 years ago in the same field. Compare: where does it hedge? Where is it confident? Does confidence correlate with accuracy in your experience?
Example
A policy analyst asks AI to summarize the current regulatory landscape for fintech lending. AI produces a confident, well-structured summary — but the analyst notices it does not mention a major regulatory change from 8 months ago. The AI’s training cutoff predates the change. The analyst uses AI’s historical framework as a structure, then manually updates with current regulatory information from primary sources.
MODULE 4: Working Memory
Key Notes
- Context window = the “working memory” of an LLM — everything it can “see” at once
- Everything inside the context window influences the model’s predictions
- Everything outside the context window does not exist for the model
Context Window Basics:
| Concept | Explanation |
|---|---|
| Context window | Maximum tokens model can process in one interaction |
| Input tokens | Tokens in your prompt, documents, conversation history |
| Output tokens | Tokens the model generates in its response |
| Total = input + output | Must fit within context window limit |
| Conversation history | Each turn’s messages accumulate in the context window |
Context Window Sizes (approximate, vary by model and version):
| Scale | Approximate Tokens | Approximate Words | What Fits |
|---|---|---|---|
| Small | 4K–8K tokens | 3K–6K words | A few pages of text |
| Medium | 32K–128K tokens | 24K–96K words | A short book or long report |
| Large | 200K tokens | ~150K words | A full novel |
| Extended | 1M+ tokens | ~750K words | Multiple books |
The “Lost in the Middle” Effect — Critical Concept:
CONTEXT STRUCTURE:
[Beginning] ──────────── [Middle] ──────────── [End]
Model attention tends to be stronger at the beginning and end.
Information buried deep in a very long middle section
may be recalled less reliably in the model's output.
PRACTICAL IMPLICATION:
Put your most critical instructions at the BEGINNING of your prompt.
Put your most important constraints at the END as a reminder.
Do NOT bury key requirements in the middle of a long preamble.
RESEARCH FINDING:
In long contexts, models can "lose" information from the middle even
when it would be retrievable from shorter contexts. The effect is
more pronounced in older/smaller models; newer large-context models
handle it better, but the risk never fully disappears.
Context Window Capabilities:
- Can process entire long documents in one interaction (within limits)
- Maintains conversation coherence across many turns
- Can follow complex multi-step instructions when all steps are in context
- Can reference material from earlier in the conversation
- Can “remember” persona and constraints set at the start of the session
Context Window Limitations:
| Limitation | What It Means | Practical Impact |
|---|---|---|
| Finite size | Cannot hold unlimited information | Very long contexts must be truncated or summarized |
| Lost in the middle | Middle content may be recalled less reliably | Place critical info at start/end |
| No persistence | Context resets between conversations | No memory of past sessions by default |
| Cost scales with tokens | Longer contexts cost more to process | Budget implications for API use |
| Attention degradation | Tracking degrades in very long contexts | Reliability decreases with context length |
| Output length limit | Responses are also token-limited | Long generations may be cut off |
Strategies for Managing Long Contexts:
STRATEGY 1: Chunking
Break a large document into sections.
Process each section individually.
Ask AI to produce a summary after each section.
At the end, feed all summaries to AI for synthesis.
STRATEGY 2: Selective extraction
Instead of pasting the full document, extract only relevant sections.
"Here is Section 3.2 of the contract regarding liability..."
More precise, less noise, better results.
STRATEGY 3: Summarization handoff
In a very long conversation, ask AI to summarize the key points so far.
Start a new conversation with that summary as the opening context.
Preserves the key information without the full token cost.
STRATEGY 4: Critical info placement
Always put key instructions at the start of the prompt.
Restate critical constraints at the end: "Remember: output as JSON only."
Don't rely on the model to recall buried middle content.
STRATEGY 5: /compact or conversation management
Use available tools to compact/summarize long conversations.
Some AI interfaces offer built-in context management features.
No Persistent Memory (by default):
- Standard AI has no memory between conversations
- Each new conversation starts from zero — it does not know you or your history
- Workaround: paste relevant context from previous sessions into the new conversation
- Some tools offer memory features — understand exactly what is stored, how, and for how long
- Memory features vary significantly by provider and product tier
Practical Limits for Common Tasks:
FITS COMFORTABLY IN CONTEXT:
─────────────────────────────────────────────
A 20-page report (200K+ token models)
An hour of meeting transcript (~7,500 words)
A short codebase (single files or small modules)
A research paper with discussion (10,000–15,000 words)
APPROACHES LIMITS / REQUIRES MANAGEMENT:
─────────────────────────────────────────────
A full book (50,000+ words)
A large codebase (multiple files, many functions)
A months-long email chain
Multiple long documents simultaneously
BEYOND PRACTICAL LIMITS (even large context models):
─────────────────────────────────────────────
An entire database
All files in a large project
Continuous real-time data streams
Best Practices
- Place the most critical instructions at the beginning of your prompt, not the end of a long preamble
- For very long documents, guide AI to specific sections rather than asking it to “read everything”
- Periodically summarize and restate key context in long conversations
- Never assume AI remembers anything from a previous session
Try It Out: Start a conversation with AI, establish a specific persona and set of constraints (e.g., “You are a concise communicator. Always respond in exactly 3 bullet points. Never use more than 10 words per bullet.”). Have a long back-and-forth exchange on an unrelated topic (20+ messages). Then ask a new question and see whether the original constraints are still honored. This demonstrates instruction drift in long contexts.
Example
A legal team uses AI to review a 200-page contract for risk clauses. Instead of pasting all 200 pages and asking “find all risks,” they work section by section, asking AI to analyze one section at a time and produce a risk summary. At the end, they feed all summaries to AI and ask for a consolidated risk report. This keeps each interaction well within context limits, avoids attention degradation, and produces more reliable results than one massive context dump.
MODULE 5: Steerability
Key Notes
- Steerability = the ability to direct AI behavior through instructions, persona, constraints, and context
- Steerability is what makes AI useful across diverse tasks — the same model can be a tutor, analyst, coder, or creative writer
- Steerability also has limits — training shapes what the model will and won’t do
How Steerability Works (mechanically): Instructions become part of the context window. They shift the probability distribution of next-token prediction.
- “Respond only in bullet points” makes bullet-point-starting tokens more likely
- “You are a formal business analyst” makes formal, analytical token sequences more likely
- The model does not “obey” in a mechanical sense — instructions weight the probabilities
- Very long or complex instructions may only partially shift the distribution
Steerability Capabilities:
| Capability | Example | How Well It Works |
|---|---|---|
| Instruction following | “Respond only in bullet points” → AI uses bullets | Very well for simple, specific instructions |
| Tone adaptation | “Be formal and concise” → AI adjusts register | Well; more pronounced with examples |
| Role adoption | “You are a financial analyst” → AI adjusts expertise | Well; domain knowledge depth still limited by training |
| Format control | “Output as a markdown table” → structured output | Very well for well-defined formats |
| Style matching | “Write like my example” → AI matches style | Well with a clear example provided |
| Constraint adherence | “Do not mention competitors” → AI avoids competitors | Well for simple constraints; degrades with complexity |
| Persona maintenance | “Always respond as [character]” → sustained persona | Degrades in very long conversations |
| Language selection | “Respond only in French” → French output | Well; quality varies by language support |
System Prompts vs. User Prompts:
SYSTEM PROMPT (when available):
─────────────────────────────────────────────────────────────
Set at conversation start by the deployer/operator
Establishes persistent context, role, constraints for entire session
Generally has stronger influence than user messages
Invisible to the end user in many deployments
Used by organizations to create consistent AI behavior
USER PROMPT:
─────────────────────────────────────────────────────────────
Individual turn instructions from the user
Can override system prompt instructions in some cases
Influence may degrade over long conversations
INSTRUCTION HIERARCHY (typical):
Training constraints > System prompt > User instructions
(Safety training beats everything; system prompt beats user for most things)
Steerability Limitations:
| Limitation | Description | How to Handle |
|---|---|---|
| Instruction complexity cap | Very long multi-step instructions may not all be followed | Break into simpler, sequential instructions |
| Instruction drift | In long conversations, early instructions lose influence | Restate key constraints periodically |
| Training boundaries | Model will resist instructions conflicting with safety training | Work within the constraints; do not try to circumvent |
| Competing instructions | Conflicting instructions produce unpredictable results | Resolve conflicts before prompting |
| Persona breaks | Sustained persona may break under pressure or in long sessions | Re-establish persona explicitly when needed |
| Over-specification | Too many constraints at once reduces output quality | Prioritize; use 3–5 key constraints maximum |
| Format regression | AI may revert to default format in long conversations | Restate format requirements as needed |
The Steerability-Safety Interaction:
TRAINING INSTALLS FLOOR BEHAVIORS:
─────────────────────────────────────────────────────────────
Certain behaviors are trained to be very resistant to steering.
This is intentional: prevents steering AI toward harmful outputs.
WHEN AI "REFUSES" AN INSTRUCTION:
Training is working as designed. The refusal is a trained behavior.
JAILBREAKING:
Attempts to circumvent safety behaviors through clever prompting.
Modern models are robustly trained against common jailbreak attempts.
Attempting jailbreaks in professional contexts violates acceptable use.
WHAT IS NOT A SAFETY ISSUE (just needs better prompting):
Getting AI to write in a specific format it's struggling with
Getting AI to maintain a tone it keeps drifting from
Getting AI to stay on topic in long conversations
These are steerability challenges, not safety interactions.
Optimal Steerability Patterns:
EFFECTIVE STEERING:
───────────────────────────────────────────────────────────
Role + Task + Constraints + Format = clear, specific output
"You are a senior data analyst (role). Analyze the attached
sales data (task). Focus only on Q3 2024 (constraint).
Output a 5-bullet executive summary (format)."
INEFFECTIVE STEERING:
───────────────────────────────────────────────────────────
"Can you help me with this data?"
[No role, no task specification, no constraints, no format]
OVER-SPECIFIED (too many constraints):
───────────────────────────────────────────────────────────
"Use exactly 47 words, bullet points only, each starting with a
verb, no adjectives, formal register, British spelling, avoid
passive voice, use Oxford commas, include exactly 3 citations..."
[Competing constraints; model cannot satisfy all simultaneously]
Practical Steerability Applications:
Customer service deployment:
System prompt: "You are a customer service representative for [Company].
Be warm, empathetic, and professional. Keep responses under 150 words.
Focus only on [product] support questions. If a question is about billing,
say: 'I'll connect you with our billing team.' Do not discuss competitors."
Internal analyst assistant:
System prompt: "You are a business analyst assistant for [Company]'s
strategy team. Be concise and data-focused. Always structure answers
with: Key Finding, Supporting Evidence, Recommended Action.
Flag explicitly when you are uncertain about a fact."
Educational tutor:
System prompt: "You are a patient tutor for introductory calculus students.
Never give the answer directly — always guide through questions.
If the student is stuck after 3 attempts, provide a hint, not the answer.
Celebrate correct reasoning, not just correct answers."
Best Practices
- Invest time in role and constraint specification — this is the highest-leverage prompting investment
- For recurring tasks: build a reusable system prompt or prompt template that captures all steering
- If AI ignores an instruction: restate it, place it closer to the beginning, or simplify competing constraints
- Do not fight training boundaries — work within them to find compliant approaches
Try It Out: Write a system prompt for an AI assistant for a specific role (your job, a student helper, a writing coach). Deploy it and have a 10-message conversation. Then deliberately try to get the AI to break its persona or constraints through normal conversation pressure (not jailbreaking — just sustained off-topic conversation). Observe when and how the persona degrades.
Example
A customer service team deploys an AI assistant. Without a system prompt: the AI responds in varying tones, sometimes too casual, sometimes too long, occasionally going off-topic. With a well-crafted system prompt: “You are a customer service representative for [Company]. Always be warm and professional. Keep responses under 150 words. Focus only on [product] support. Escalate billing questions to a human agent.” — behavior is consistent, on-brand, and within scope.
MODULE 6: Conclusion — When Properties Collide
Key Notes
- The four properties (next-token prediction, knowledge, working memory, steerability) do not operate independently — they interact, amplify each other, and sometimes conflict
- Understanding which property is “responsible” for an unexpected output leads to targeted solutions
Property Interactions — Positive:
| Combination | Emergent Capability |
|---|---|
| Knowledge + Steerability | Powerful Q&A with domain expertise and controlled format |
| Context Window + Steerability | Consistent multi-step, long-form task completion |
| Next-token + Knowledge | Fluent, contextually appropriate writing across domains |
| All four aligned | Complex reasoning tasks, nuanced analysis, extended projects |
| Steerability + Context Window | Persona maintained consistently across a long session |
| Knowledge + Next-token | Sophisticated analogies and cross-domain synthesis |
Property Interactions — Problematic (8+ collision scenarios):
| Collision Scenario | Properties Involved | What Happens | Diagnosis |
|---|---|---|---|
| Confident hallucination | Next-token + Knowledge gap | AI states false information with high fluency and confidence | Most dangerous failure mode |
| Instruction drift in long sessions | Steerability + Working memory | Early constraints gradually lose influence as context grows | Common in multi-turn sessions |
| Outdated confident advice | Knowledge (cutoff) + Steerability | AI confidently applies a steered role using outdated information | Knowledge cutoff + role adoption |
| Context truncation | Working memory (limits) + Steerability | Instructions truncated; AI forgets constraints set earlier | Long context + instruction placement |
| Fabricated domain citation | Next-token + Knowledge + Steerability | Steered to expert role, AI predicts domain-appropriate but fabricated citations | Expert role amplifies hallucination confidence |
| Inconsistency across sessions | Working memory (no persistence) + Steerability | Same question gets different answers in different sessions | No memory; different random seeds |
| Mid-context confusion | Working memory (lost in middle) + Knowledge | AI ignores relevant context pasted into the middle of a long prompt | Attention degradation |
| Safety-steerability tension | Steerability + Training | Instructions ask AI to do something training resists | Training floors override steerability |
The “Which Property Caused This?” Decision Tree:
UNEXPECTED AI BEHAVIOR
│
▼
Is the output factually wrong or fabricated?
├── Yes → Is this about post-cutoff events?
│ ├── Yes → KNOWLEDGE (cutoff) issue
│ │ Fix: provide the correct info; use RAG
│ └── No → NEXT-TOKEN PREDICTION (hallucination) issue
│ Fix: verify; provide context; don't rely on AI recall
└── No → Continue
│
▼
Is AI ignoring your instructions?
├── Yes → Is this a safety/ethical refusal?
│ ├── Yes → STEERABILITY + TRAINING interaction
│ │ Fix: work within constraints; rephrase the request
│ └── No → STEERABILITY (drift or complexity) issue
│ Fix: simplify; restate; move instructions to start
└── No → Continue
│
▼
Is AI forgetting earlier context?
├── Yes → Is this between sessions?
│ ├── Yes → WORKING MEMORY (no persistence)
│ │ Fix: provide context at session start each time
│ └── No → WORKING MEMORY (lost in middle or window size)
│ Fix: restate; summarize; restructure prompt
└── No → Continue
│
▼
Is output inconsistent with identical prompts?
└── Yes → NEXT-TOKEN PREDICTION (temperature/randomness)
Fix: lower temperature for deterministic tasks
The Capability-Limitation Continuum (Expanded): Every strength has a shadow — produced by the same mechanism.
PROPERTY CAPABILITY LIMITATION
──────────────────────────────────────────────────────────────────────
Next-token Fluent, coherent, Hallucinations; fluency ≠
prediction grammatically perfect text accuracy; no self-check
Next-token Style matching, tone Sensitive to phrasing;
prediction adaptation, creative writing different prompts = different outputs
Knowledge Broad domain expertise Knowledge cutoff; accuracy
across virtually all fields degrades for rare topics
Knowledge Cross-domain synthesis, Overconfidence about topics
analogies, connections with limited training data
Working memory Process long documents; Finite window; lost-in-middle;
(context window) maintain conversation no persistence between sessions
coherence
Working memory Follow complex multi-step Instruction drift in long
instructions within session conversations
Steerability Adapts to any task, Safety limits that cannot be
persona, format, domain overridden; training floors
Steerability Consistent behavior in Over-specification degrades
deployed systems output; persona breaks down
Diagnostic Framework — Full Version:
| Symptom | Most Likely Property | Targeted Solution |
|---|---|---|
| Confident false information | Next-token + Knowledge | Provide correct info in prompt; verify outputs |
| Fabricated citation | Next-token prediction | Never use AI citations without verification |
| Doesn’t know recent event | Knowledge (cutoff) | Provide the information in the prompt |
| Ignores instructions | Steerability (drift or conflict) | Restate; simplify; reposition |
| Refuses a reasonable request | Steerability + Training | Rephrase; work within training constraints |
| Wrong format | Steerability (insufficient specificity) | Be more explicit; provide an example |
| Forgets earlier context | Working memory (window size) | Restate key context; use compact/summarize |
| Inconsistent across sessions | Working memory (no persistence) | Provide context at session start each time |
| Different outputs to same prompt | Next-token (temperature) | Lower temperature; accept natural variation |
| Loses persona in long chat | Steerability + Working memory | Restate persona; use system prompt |
| Too verbose / too brief | Steerability (missing constraints) | Specify length explicitly |
| Misses key detail in long doc | Working memory (lost in middle) | Restructure; put key content at start/end |
The Practitioner’s Mental Model:
When AI surprises you, ask:
1. Is this a KNOWLEDGE issue?
→ Was the info in training data? Is it post-cutoff? Is it obscure?
→ Fix: provide the information yourself (RAG pattern)
2. Is this a NEXT-TOKEN issue?
→ Is AI predicting plausible text rather than accurate text?
→ Fix: verify; provide context; use lower temperature
3. Is this a WORKING MEMORY issue?
→ Did key context fall out of the window or get ignored?
→ Fix: restate; summarize; restructure prompt; start fresh
4. Is this a STEERABILITY issue?
→ Is AI not following instructions as intended?
→ Fix: clarify; simplify; reposition; check for conflicts
The Honest Assessment of AI:
- AI is not artificially constrained intelligence waiting to be unleashed
- AI is a genuinely novel kind of system with genuine strengths and genuine structural limits
- The limits are not failures — they are properties of the mechanism
- AI fluency means working skillfully within those properties, not around them
- The practitioners who get the most from AI are those with accurate mental models — not those with the highest expectations
Best Practices
- Develop the habit of diagnosing unexpected outputs before retrying — random retry rarely fixes the underlying issue
- Match your mitigation strategy to the property causing the problem
- Maintain realistic expectations: AI is extraordinary at some things and structurally incapable of others
- Share your understanding of AI properties with colleagues — most frustration with AI comes from mismatched expectations
Try It Out: Deliberately induce each of the four failure modes:
- Hallucination: Ask AI about a real but obscure person or paper — see if it fabricates details
- Knowledge cutoff: Ask about a recent event you know occurred after the training cutoff
- Lost in middle: Paste a long document; hide a specific unusual instruction in the middle; ask AI to follow it
- Instruction drift: Set 5 strict formatting constraints; have a 15-message conversation; see which constraints survive
Example
A researcher notices AI keeps returning the same hallucinated citation despite being told the citation is wrong. Diagnosis: this is a next-token prediction issue — the statistically likely token sequence after this topic continues to predict that citation. Random correction mid-conversation does not update the model’s weights. Solution: provide the correct citation explicitly in the prompt, ask AI to use only sources provided, and verify independently. The researcher now understands why “just tell it not to” doesn’t fix hallucinations.
Final Checklist
- I can explain the 4-stage process by which an LLM gets its character (pre-training, SFT, RLHF, alignment)
- I can describe next-token prediction in plain language
- I can explain why hallucinations are a structural feature of next-token prediction, not a fixable bug
- I can define “token” and explain why it differs from a word, with examples
- I can explain what temperature controls and give examples of when to use low vs. high settings
- I can explain why fluent text does not imply accurate text
- I can explain what a knowledge cutoff is and name 5+ high-risk domains for cutoff issues
- I can describe the RAG pattern and explain when to use it
- I can describe training data composition and how it affects knowledge depth
- I can explain what a context window is and name the “lost in the middle” effect
- I can describe strategies for managing long contexts (chunking, summarization, selective extraction)
- I can explain why AI has no persistent memory by default
- I can name 5+ steerability capabilities and 5+ steerability limitations
- I can explain the difference between system prompts and user prompts
- I can explain how safety training interacts with steerability
- I can use the “which property caused this?” decision tree to diagnose an unexpected output
- I can pair each property’s limitation with a targeted mitigation strategy
- I can name at least 6 collision scenarios where properties interact problematically
- I can use the capability-limitation continuum to explain any LLM behavior