← Claude Code & Certification

AI Capabilities and Limitations - Certification Study Guide

AI Capabilities and Limitations - Certification Study Guide

Course: AI Capabilities and Limitations Modules: 6 Target: All professionals seeking a deeper technical understanding of how LLMs work Difficulty: Intermediate


MODULE 1: Getting Started

Key Notes

  • Scope of “AI” in this course: Large Language Models (LLMs) specifically — not robotics, computer vision, or narrow AI systems
  • LLMs: Claude, GPT-4, Gemini, Llama, Mistral — text-in, text-out systems at their core
  • Understanding HOW AI works is not just academic — it directly explains WHY AI behaves the way it does

How AI Gets Its Character — 4-Stage Training Process:

  Stage 1: PRE-TRAINING
  ─────────────────────────────────────────────────────────────
  Trained on massive text datasets (web, books, code, etc.)
  Learns statistical patterns: what words/concepts follow others
  Develops broad world knowledge and language capability
  Result: a model that can predict text, but with no alignment
  Scale: trillions of tokens; months of compute

  Stage 2: SUPERVISED FINE-TUNING (SFT)
  ─────────────────────────────────────────────────────────────
  Human trainers demonstrate preferred responses
  Model learns to produce helpful, formatted, structured answers
  Result: a model that can follow instructions and be useful
  Scale: thousands to millions of curated examples

  Stage 3: REINFORCEMENT LEARNING FROM HUMAN FEEDBACK (RLHF)
  ─────────────────────────────────────────────────────────────
  Human raters compare responses and rank them
  A "reward model" is trained on these preferences
  The LLM is fine-tuned to maximize reward model scores
  Result: better alignment with human values and preferences

  Stage 4: CONSTITUTIONAL AI / ADDITIONAL ALIGNMENT (varies by provider)
  ─────────────────────────────────────────────────────────────
  Explicit principles guide model behavior
  Model learns to critique and revise its own outputs
  Anthropic's approach: a "constitution" of values + self-critique
  Result: stronger safety and values alignment

Key Insight: AI is a statistical pattern matcher shaped by human feedback — not a thinking being, not a database, not a search engine. Understanding this unlocks understanding of every capability and limitation covered in this course.

What “AI character” means:

  • Tone, helpfulness, caution, verbosity = shaped by training choices
  • Safety behaviors = trained, not hard-coded rules
  • Knowledge = statistical patterns, not stored facts
  • Reasoning = pattern-based, not logical computation
  • Different providers make different training choices → different behaviors across models

Training Data Composition (approximate, varies by model):

  • Web crawl data (Common Crawl and similar): largest share — broad but noisy
  • Books and long-form text: higher quality writing patterns
  • Code repositories: programming knowledge and structure
  • Academic and scientific text: domain knowledge
  • Curated datasets: filtered for quality
  • The composition shapes what the model “knows” well vs. poorly

Best Practices

  • When an AI behaves unexpectedly, ask: “Which stage of training might explain this?”
  • Treat AI outputs as the most statistically likely response to your input, not the factually correct answer
  • Understand that different AI providers make different training choices — behavior varies across models

Example

Claude declines to help with certain requests not because it cannot produce the text, but because training shaped it to decline. A different model with different training choices might respond differently. Neither is “smarter” — they reflect different training priorities.

Try It Out: Ask two different AI models the same borderline request (e.g., “write a persuasive essay arguing a controversial position”). Note: do they respond differently? What does the difference tell you about their training choices?


MODULE 2: Next Token Prediction

Key Notes

  • The core mechanism of every LLM: predict the next most likely token given all previous tokens
  • This single mechanism explains both the most impressive capabilities and the most confounding failures

What is a Token?

  • Token ≠ word. A token is roughly 4 characters or 0.75 words on average
  • “Unbelievable” → approximately 3 tokens: “Un” + “bel” + “ievable”
  • “the” → 1 token (common words are compact)
  • “mRNA” → may be 2–3 tokens (technical/rare terms cost more)
  • Chinese and Japanese characters may be 1–2 tokens each
  • Whitespace and punctuation are often their own tokens
  • Why it matters: context windows are measured in tokens, not words — 100K token limit ≠ 100K words

Token Counting Examples:

  "Hello, world!"             → ~4 tokens
  "The quick brown fox"       → ~4 tokens
  "Electroencephalography"    → ~6 tokens
  "AI will transform every
   industry in the next
   decade, experts say."      → ~17 tokens
  Average English text:       ~750 words ≈ 1,000 tokens

How Next-Token Prediction Produces Intelligent-Seeming Output:

  Prompt: "The capital of France is"

  Model calculates probability of every possible next token:
  "Paris"      → 94.2% probability
  "Lyon"       → 1.1% probability
  "a"          → 0.8% probability
  "situated"   → 0.6% probability
  ...

  Selects "Paris" → then predicts next token given "...France is Paris"
  → and so on until a stop token or length limit

Temperature — Controlling Randomness:

TemperatureEffectToken SelectionBest For
0.0Fully deterministicAlways highest-probability tokenFactual lookups, code generation
0.1–0.3Near-deterministicNearly always top token, slight variationTechnical writing, precise instructions
0.5–0.7BalancedTop tokens with moderate variationGeneral writing, analysis
0.9–1.0CreativeBroader distribution, more surpriseBrainstorming, creative writing
1.5+High randomnessVery broad; often incoherentExperimental use only

Side-by-Side Temperature Examples (same prompt):

  Prompt: "Describe how clouds form."

  Temperature 0.1:
  "Clouds form when water vapor in the atmosphere cools and condenses
   around tiny particles called condensation nuclei, such as dust or
   sea salt. As warm, moist air rises, it expands and cools..."
  [Accurate, predictable, textbook-style]

  Temperature 1.0:
  "Clouds are born from invisible breath — the sky's own exhalation.
   When warm air climbs, carrying its invisible cargo of water molecules,
   it reaches the cold heights where those molecules must make a choice..."
  [More creative, less predictable, still generally accurate]

  Temperature 1.5:
  "Clouds. Water. Cold meets warm in the atmospheric ballet of
   condensation nuclei dancing with vapor — sky's tears waiting to happen,
   meteorologically speaking, of course..."
  [Fragmented, creative, unreliable for factual accuracy]

Capabilities That Emerge from Next-Token Prediction:

  • Fluent language: predicting plausible next tokens produces grammatical, coherent text naturally
  • Code generation: code has high statistical regularity — prediction works extremely well for common patterns
  • Style matching: if you write formally, subsequent tokens are predicted to be formal
  • Apparent reasoning: chain-of-thought works because reasoning steps are statistically likely to follow each other correctly
  • Translation: cross-lingual training data teaches token-to-token mappings across languages
  • Summarization: common text patterns around “in summary” and “key points” provide strong training signal

Limitations That Emerge from Next-Token Prediction:

  • Hallucinations: the model predicts the most statistically plausible next token — not the most factually accurate. A confident, specific-sounding false statement can be MORE statistically likely than “I don’t know.”
  • No self-correction by default: the model does not verify output against a knowledge store — it just keeps predicting
  • Sensitive to phrasing: small changes in input change the probability distribution, which changes the output significantly
  • Not actually counting or computing: apparent mathematical ability is pattern matching, not calculation — errors increase with complexity

Why Hallucinations Are Fundamental (not a bug to be fixed):

  The model is doing exactly what it is designed to do:
  predicting statistically likely text.

  "The study found that [X]% of adults..."
  → A specific percentage is statistically likely after this phrase
  → The model predicts one (e.g., "73%")
  → The model has no mechanism to check if 73% is real
  → It just knows "73" is a plausible token sequence here

  "According to [Author] ([Year])..."
  → A plausible author name and year are statistically likely
  → The model predicts them confidently
  → The citation may be entirely fabricated

  This is not a failure of the mechanism — it IS the mechanism.
  Verification requires a separate system (search, RAG, human review).

Fluent ≠ Accurate — The Core Confusion: Most AI failures come from users trusting fluency as a signal of accuracy. They are completely independent:

  • A hallucination reads exactly as well as a true statement
  • Grammatical perfection has zero correlation with factual correctness
  • Confident tone is a property of the training data, not a signal of certainty

Best Practices

  • For precision-critical outputs (statistics, citations, names, dates), always verify externally
  • Use lower temperature settings for factual or technical tasks; higher for creative tasks
  • Provide the factual content you need in your prompt — don’t expect the model to recall it accurately
  • Treat confident AI assertions about specific facts with the same skepticism as a very well-read but unreliable friend

Try It Out: Ask an AI a factual question where you already know the precise answer (a specific statistic from your field, a date, a technical specification). Note whether the answer is correct, close, or wrong — and observe how confidently it was stated regardless of accuracy.

Example

A journalist asks AI for statistics on youth homelessness to include in an article. The AI produces several specific percentages with apparent source attribution. The journalist fact-checks all of them — two are close but not exact, one cannot be found anywhere, and one cites a real report but with a different number. The journalist uses none of the AI statistics, instead asking AI to help structure the article once they have verified the data themselves.


MODULE 3: Knowledge

Key Notes

  • What LLMs “know”: statistical patterns derived from training data, not stored facts in a database
  • The distinction matters: a database knows a fact with certainty; an LLM predicts what text about a topic typically says

Training Data Composition and Knowledge Depth:

  HIGH-QUALITY KNOWLEDGE (frequent, consistent training data):
  ─────────────────────────────────────────────────────────────
  Major world events and history (well-documented)
  Popular programming languages (Python, JavaScript, etc.)
  Well-documented science and mathematics
  English-language literature and culture
  Widely-covered current events (up to cutoff)

  VARIABLE KNOWLEDGE (less frequent or inconsistent data):
  ─────────────────────────────────────────────────────────────
  Niche academic fields
  Less-documented languages and cultures
  Recent developments near the knowledge cutoff
  Rapidly evolving fields (AI itself, recent legislation)
  Regional or local information

  LOW-QUALITY / HIGH-HALLUCINATION RISK:
  ─────────────────────────────────────────────────────────────
  Obscure topics with minimal internet presence
  Private organizational knowledge (your company, your team)
  Information that is mostly paywalled or not on the web
  Very recent events (post-cutoff)
  Highly technical specializations with small communities

Knowledge Capabilities:

CapabilityDescriptionReliability
BreadthEnormous — trained on virtually every domainHigh
Depth on common topicsStrong — common topics appear frequently in trainingHigh
Depth on specialized/rare topicsVariable to lowMedium–Low
Cross-domain synthesisStrong — can connect concepts across fieldsHigh
Historical knowledgeStrong for well-documented historyHigh
Technical knowledgeStrong for widely documented technologiesHigh
Recent events (near cutoff)Partial, inconsistentMedium
Post-cutoff eventsNone — will hallucinate or hedgeZero

Knowledge Cutoff Implications:

  WELL BEFORE CUTOFF:    Strong, consistent knowledge available
  NEAR CUTOFF:           Partial, inconsistent, may be wrong
  AFTER CUTOFF:          No knowledge — model will either hallucinate
                         or correctly hedge ("I don't have info on that")

  HIGH-RISK DOMAINS FOR CUTOFF ISSUES:
  ─────────────────────────────────────────────────────────────
  Current events and breaking news
  Recently passed or amended legislation
  Current drug approvals and clinical guidelines
  Latest software versions, APIs, and frameworks
  Current market prices and economic data
  Recently published research
  Changes in organizational leadership
  Sports results and standings
  AI model capabilities (the field moves faster than any cutoff)

How AI “Knowledge” Actually Works: AI does not retrieve stored facts. It generates text that is statistically consistent with what training data said about a topic:

  • Topics that appeared frequently and consistently → high accuracy
  • Topics with conflicting information in training → AI may present one view or hedge
  • Topics barely present in training → high hallucination risk, low accuracy
  • Your private organizational data → AI has no knowledge whatsoever unless you provide it

The RAG Pattern (Retrieval-Augmented Generation):

  THE PROBLEM:
  "What does our policy say about expense reimbursement?"
  → AI may hallucinate a policy that doesn't match yours

  THE SOLUTION (RAG):
  Paste the actual policy text into the prompt, then ask:
  "Based on the policy text above, what does it say about
   expense reimbursement?"
  → AI now reasons from the actual text you provided

  WHY IT WORKS:
  The context window contains the authoritative information.
  Next-token prediction now draws on that text, not training data.
  Accuracy is dramatically higher for specific organizational content.

  COMMON RAG APPLICATIONS:
  ─────────────────────────────────────────────────────────────
  Organizational policies and procedures
  Product documentation and specs
  Legal contracts and agreements
  Scientific papers and reports (paste abstracts/excerpts)
  Financial statements and reports
  Meeting notes and prior decisions

Handling Knowledge Limitations — Decision Framework:

SituationStrategy
Need current eventsProvide context documents; use search-augmented AI
Need recent researchPaste abstracts or excerpts into the prompt
Need organizational knowledgeRAG — paste the relevant document
Need precision factsVerify independently; don’t rely on AI recall
Need proprietary informationAI cannot know it; provide it explicitly
Near knowledge cutoffAsk AI to flag its uncertainty; verify externally

Best Practices

  • Always check the knowledge cutoff date of the AI model you are using
  • For any claim about recent events, current data, or rapidly changing fields: verify externally
  • Use the RAG pattern for organizational knowledge: paste documents into context rather than expecting AI to know them
  • Ask AI to indicate its confidence and flag where its knowledge may be dated

Try It Out: Ask an AI about a development in your field from the last 12 months. Then ask about something that happened 5 years ago in the same field. Compare: where does it hedge? Where is it confident? Does confidence correlate with accuracy in your experience?

Example

A policy analyst asks AI to summarize the current regulatory landscape for fintech lending. AI produces a confident, well-structured summary — but the analyst notices it does not mention a major regulatory change from 8 months ago. The AI’s training cutoff predates the change. The analyst uses AI’s historical framework as a structure, then manually updates with current regulatory information from primary sources.


MODULE 4: Working Memory

Key Notes

  • Context window = the “working memory” of an LLM — everything it can “see” at once
  • Everything inside the context window influences the model’s predictions
  • Everything outside the context window does not exist for the model

Context Window Basics:

ConceptExplanation
Context windowMaximum tokens model can process in one interaction
Input tokensTokens in your prompt, documents, conversation history
Output tokensTokens the model generates in its response
Total = input + outputMust fit within context window limit
Conversation historyEach turn’s messages accumulate in the context window

Context Window Sizes (approximate, vary by model and version):

ScaleApproximate TokensApproximate WordsWhat Fits
Small4K–8K tokens3K–6K wordsA few pages of text
Medium32K–128K tokens24K–96K wordsA short book or long report
Large200K tokens~150K wordsA full novel
Extended1M+ tokens~750K wordsMultiple books

The “Lost in the Middle” Effect — Critical Concept:

  CONTEXT STRUCTURE:
  [Beginning] ──────────── [Middle] ──────────── [End]

  Model attention tends to be stronger at the beginning and end.
  Information buried deep in a very long middle section
  may be recalled less reliably in the model's output.

  PRACTICAL IMPLICATION:
  Put your most critical instructions at the BEGINNING of your prompt.
  Put your most important constraints at the END as a reminder.
  Do NOT bury key requirements in the middle of a long preamble.

  RESEARCH FINDING:
  In long contexts, models can "lose" information from the middle even
  when it would be retrievable from shorter contexts. The effect is
  more pronounced in older/smaller models; newer large-context models
  handle it better, but the risk never fully disappears.

Context Window Capabilities:

  • Can process entire long documents in one interaction (within limits)
  • Maintains conversation coherence across many turns
  • Can follow complex multi-step instructions when all steps are in context
  • Can reference material from earlier in the conversation
  • Can “remember” persona and constraints set at the start of the session

Context Window Limitations:

LimitationWhat It MeansPractical Impact
Finite sizeCannot hold unlimited informationVery long contexts must be truncated or summarized
Lost in the middleMiddle content may be recalled less reliablyPlace critical info at start/end
No persistenceContext resets between conversationsNo memory of past sessions by default
Cost scales with tokensLonger contexts cost more to processBudget implications for API use
Attention degradationTracking degrades in very long contextsReliability decreases with context length
Output length limitResponses are also token-limitedLong generations may be cut off

Strategies for Managing Long Contexts:

  STRATEGY 1: Chunking
  Break a large document into sections.
  Process each section individually.
  Ask AI to produce a summary after each section.
  At the end, feed all summaries to AI for synthesis.

  STRATEGY 2: Selective extraction
  Instead of pasting the full document, extract only relevant sections.
  "Here is Section 3.2 of the contract regarding liability..."
  More precise, less noise, better results.

  STRATEGY 3: Summarization handoff
  In a very long conversation, ask AI to summarize the key points so far.
  Start a new conversation with that summary as the opening context.
  Preserves the key information without the full token cost.

  STRATEGY 4: Critical info placement
  Always put key instructions at the start of the prompt.
  Restate critical constraints at the end: "Remember: output as JSON only."
  Don't rely on the model to recall buried middle content.

  STRATEGY 5: /compact or conversation management
  Use available tools to compact/summarize long conversations.
  Some AI interfaces offer built-in context management features.

No Persistent Memory (by default):

  • Standard AI has no memory between conversations
  • Each new conversation starts from zero — it does not know you or your history
  • Workaround: paste relevant context from previous sessions into the new conversation
  • Some tools offer memory features — understand exactly what is stored, how, and for how long
  • Memory features vary significantly by provider and product tier

Practical Limits for Common Tasks:

  FITS COMFORTABLY IN CONTEXT:
  ─────────────────────────────────────────────
  A 20-page report (200K+ token models)
  An hour of meeting transcript (~7,500 words)
  A short codebase (single files or small modules)
  A research paper with discussion (10,000–15,000 words)

  APPROACHES LIMITS / REQUIRES MANAGEMENT:
  ─────────────────────────────────────────────
  A full book (50,000+ words)
  A large codebase (multiple files, many functions)
  A months-long email chain
  Multiple long documents simultaneously

  BEYOND PRACTICAL LIMITS (even large context models):
  ─────────────────────────────────────────────
  An entire database
  All files in a large project
  Continuous real-time data streams

Best Practices

  • Place the most critical instructions at the beginning of your prompt, not the end of a long preamble
  • For very long documents, guide AI to specific sections rather than asking it to “read everything”
  • Periodically summarize and restate key context in long conversations
  • Never assume AI remembers anything from a previous session

Try It Out: Start a conversation with AI, establish a specific persona and set of constraints (e.g., “You are a concise communicator. Always respond in exactly 3 bullet points. Never use more than 10 words per bullet.”). Have a long back-and-forth exchange on an unrelated topic (20+ messages). Then ask a new question and see whether the original constraints are still honored. This demonstrates instruction drift in long contexts.

Example

A legal team uses AI to review a 200-page contract for risk clauses. Instead of pasting all 200 pages and asking “find all risks,” they work section by section, asking AI to analyze one section at a time and produce a risk summary. At the end, they feed all summaries to AI and ask for a consolidated risk report. This keeps each interaction well within context limits, avoids attention degradation, and produces more reliable results than one massive context dump.


MODULE 5: Steerability

Key Notes

  • Steerability = the ability to direct AI behavior through instructions, persona, constraints, and context
  • Steerability is what makes AI useful across diverse tasks — the same model can be a tutor, analyst, coder, or creative writer
  • Steerability also has limits — training shapes what the model will and won’t do

How Steerability Works (mechanically): Instructions become part of the context window. They shift the probability distribution of next-token prediction.

  • “Respond only in bullet points” makes bullet-point-starting tokens more likely
  • “You are a formal business analyst” makes formal, analytical token sequences more likely
  • The model does not “obey” in a mechanical sense — instructions weight the probabilities
  • Very long or complex instructions may only partially shift the distribution

Steerability Capabilities:

CapabilityExampleHow Well It Works
Instruction following“Respond only in bullet points” → AI uses bulletsVery well for simple, specific instructions
Tone adaptation“Be formal and concise” → AI adjusts registerWell; more pronounced with examples
Role adoption“You are a financial analyst” → AI adjusts expertiseWell; domain knowledge depth still limited by training
Format control“Output as a markdown table” → structured outputVery well for well-defined formats
Style matching“Write like my example” → AI matches styleWell with a clear example provided
Constraint adherence“Do not mention competitors” → AI avoids competitorsWell for simple constraints; degrades with complexity
Persona maintenance“Always respond as [character]” → sustained personaDegrades in very long conversations
Language selection“Respond only in French” → French outputWell; quality varies by language support

System Prompts vs. User Prompts:

  SYSTEM PROMPT (when available):
  ─────────────────────────────────────────────────────────────
  Set at conversation start by the deployer/operator
  Establishes persistent context, role, constraints for entire session
  Generally has stronger influence than user messages
  Invisible to the end user in many deployments
  Used by organizations to create consistent AI behavior

  USER PROMPT:
  ─────────────────────────────────────────────────────────────
  Individual turn instructions from the user
  Can override system prompt instructions in some cases
  Influence may degrade over long conversations

  INSTRUCTION HIERARCHY (typical):
  Training constraints > System prompt > User instructions
  (Safety training beats everything; system prompt beats user for most things)

Steerability Limitations:

LimitationDescriptionHow to Handle
Instruction complexity capVery long multi-step instructions may not all be followedBreak into simpler, sequential instructions
Instruction driftIn long conversations, early instructions lose influenceRestate key constraints periodically
Training boundariesModel will resist instructions conflicting with safety trainingWork within the constraints; do not try to circumvent
Competing instructionsConflicting instructions produce unpredictable resultsResolve conflicts before prompting
Persona breaksSustained persona may break under pressure or in long sessionsRe-establish persona explicitly when needed
Over-specificationToo many constraints at once reduces output qualityPrioritize; use 3–5 key constraints maximum
Format regressionAI may revert to default format in long conversationsRestate format requirements as needed

The Steerability-Safety Interaction:

  TRAINING INSTALLS FLOOR BEHAVIORS:
  ─────────────────────────────────────────────────────────────
  Certain behaviors are trained to be very resistant to steering.
  This is intentional: prevents steering AI toward harmful outputs.

  WHEN AI "REFUSES" AN INSTRUCTION:
  Training is working as designed. The refusal is a trained behavior.

  JAILBREAKING:
  Attempts to circumvent safety behaviors through clever prompting.
  Modern models are robustly trained against common jailbreak attempts.
  Attempting jailbreaks in professional contexts violates acceptable use.

  WHAT IS NOT A SAFETY ISSUE (just needs better prompting):
  Getting AI to write in a specific format it's struggling with
  Getting AI to maintain a tone it keeps drifting from
  Getting AI to stay on topic in long conversations
  These are steerability challenges, not safety interactions.

Optimal Steerability Patterns:

  EFFECTIVE STEERING:
  ───────────────────────────────────────────────────────────
  Role + Task + Constraints + Format = clear, specific output

  "You are a senior data analyst (role). Analyze the attached
  sales data (task). Focus only on Q3 2024 (constraint).
  Output a 5-bullet executive summary (format)."

  INEFFECTIVE STEERING:
  ───────────────────────────────────────────────────────────
  "Can you help me with this data?"
  [No role, no task specification, no constraints, no format]

  OVER-SPECIFIED (too many constraints):
  ───────────────────────────────────────────────────────────
  "Use exactly 47 words, bullet points only, each starting with a
  verb, no adjectives, formal register, British spelling, avoid
  passive voice, use Oxford commas, include exactly 3 citations..."
  [Competing constraints; model cannot satisfy all simultaneously]

Practical Steerability Applications:

Customer service deployment:

System prompt: "You are a customer service representative for [Company].
Be warm, empathetic, and professional. Keep responses under 150 words.
Focus only on [product] support questions. If a question is about billing,
say: 'I'll connect you with our billing team.' Do not discuss competitors."

Internal analyst assistant:

System prompt: "You are a business analyst assistant for [Company]'s
strategy team. Be concise and data-focused. Always structure answers
with: Key Finding, Supporting Evidence, Recommended Action.
Flag explicitly when you are uncertain about a fact."

Educational tutor:

System prompt: "You are a patient tutor for introductory calculus students.
Never give the answer directly — always guide through questions.
If the student is stuck after 3 attempts, provide a hint, not the answer.
Celebrate correct reasoning, not just correct answers."

Best Practices

  • Invest time in role and constraint specification — this is the highest-leverage prompting investment
  • For recurring tasks: build a reusable system prompt or prompt template that captures all steering
  • If AI ignores an instruction: restate it, place it closer to the beginning, or simplify competing constraints
  • Do not fight training boundaries — work within them to find compliant approaches

Try It Out: Write a system prompt for an AI assistant for a specific role (your job, a student helper, a writing coach). Deploy it and have a 10-message conversation. Then deliberately try to get the AI to break its persona or constraints through normal conversation pressure (not jailbreaking — just sustained off-topic conversation). Observe when and how the persona degrades.

Example

A customer service team deploys an AI assistant. Without a system prompt: the AI responds in varying tones, sometimes too casual, sometimes too long, occasionally going off-topic. With a well-crafted system prompt: “You are a customer service representative for [Company]. Always be warm and professional. Keep responses under 150 words. Focus only on [product] support. Escalate billing questions to a human agent.” — behavior is consistent, on-brand, and within scope.


MODULE 6: Conclusion — When Properties Collide

Key Notes

  • The four properties (next-token prediction, knowledge, working memory, steerability) do not operate independently — they interact, amplify each other, and sometimes conflict
  • Understanding which property is “responsible” for an unexpected output leads to targeted solutions

Property Interactions — Positive:

CombinationEmergent Capability
Knowledge + SteerabilityPowerful Q&A with domain expertise and controlled format
Context Window + SteerabilityConsistent multi-step, long-form task completion
Next-token + KnowledgeFluent, contextually appropriate writing across domains
All four alignedComplex reasoning tasks, nuanced analysis, extended projects
Steerability + Context WindowPersona maintained consistently across a long session
Knowledge + Next-tokenSophisticated analogies and cross-domain synthesis

Property Interactions — Problematic (8+ collision scenarios):

Collision ScenarioProperties InvolvedWhat HappensDiagnosis
Confident hallucinationNext-token + Knowledge gapAI states false information with high fluency and confidenceMost dangerous failure mode
Instruction drift in long sessionsSteerability + Working memoryEarly constraints gradually lose influence as context growsCommon in multi-turn sessions
Outdated confident adviceKnowledge (cutoff) + SteerabilityAI confidently applies a steered role using outdated informationKnowledge cutoff + role adoption
Context truncationWorking memory (limits) + SteerabilityInstructions truncated; AI forgets constraints set earlierLong context + instruction placement
Fabricated domain citationNext-token + Knowledge + SteerabilitySteered to expert role, AI predicts domain-appropriate but fabricated citationsExpert role amplifies hallucination confidence
Inconsistency across sessionsWorking memory (no persistence) + SteerabilitySame question gets different answers in different sessionsNo memory; different random seeds
Mid-context confusionWorking memory (lost in middle) + KnowledgeAI ignores relevant context pasted into the middle of a long promptAttention degradation
Safety-steerability tensionSteerability + TrainingInstructions ask AI to do something training resistsTraining floors override steerability

The “Which Property Caused This?” Decision Tree:

  UNEXPECTED AI BEHAVIOR
          │
          ▼
  Is the output factually wrong or fabricated?
  ├── Yes → Is this about post-cutoff events?
  │         ├── Yes → KNOWLEDGE (cutoff) issue
  │         │         Fix: provide the correct info; use RAG
  │         └── No  → NEXT-TOKEN PREDICTION (hallucination) issue
  │                   Fix: verify; provide context; don't rely on AI recall
  └── No  → Continue
          │
          ▼
  Is AI ignoring your instructions?
  ├── Yes → Is this a safety/ethical refusal?
  │         ├── Yes → STEERABILITY + TRAINING interaction
  │         │         Fix: work within constraints; rephrase the request
  │         └── No  → STEERABILITY (drift or complexity) issue
  │                   Fix: simplify; restate; move instructions to start
  └── No  → Continue
          │
          ▼
  Is AI forgetting earlier context?
  ├── Yes → Is this between sessions?
  │         ├── Yes → WORKING MEMORY (no persistence)
  │         │         Fix: provide context at session start each time
  │         └── No  → WORKING MEMORY (lost in middle or window size)
  │                   Fix: restate; summarize; restructure prompt
  └── No  → Continue
          │
          ▼
  Is output inconsistent with identical prompts?
  └── Yes → NEXT-TOKEN PREDICTION (temperature/randomness)
            Fix: lower temperature for deterministic tasks

The Capability-Limitation Continuum (Expanded): Every strength has a shadow — produced by the same mechanism.

  PROPERTY         CAPABILITY                    LIMITATION
  ──────────────────────────────────────────────────────────────────────
  Next-token       Fluent, coherent,             Hallucinations; fluency ≠
  prediction       grammatically perfect text    accuracy; no self-check

  Next-token       Style matching, tone          Sensitive to phrasing;
  prediction       adaptation, creative writing  different prompts = different outputs

  Knowledge        Broad domain expertise        Knowledge cutoff; accuracy
                   across virtually all fields   degrades for rare topics

  Knowledge        Cross-domain synthesis,       Overconfidence about topics
                   analogies, connections        with limited training data

  Working memory   Process long documents;       Finite window; lost-in-middle;
  (context window) maintain conversation         no persistence between sessions
                   coherence

  Working memory   Follow complex multi-step     Instruction drift in long
                   instructions within session   conversations

  Steerability     Adapts to any task,           Safety limits that cannot be
                   persona, format, domain       overridden; training floors

  Steerability     Consistent behavior in        Over-specification degrades
                   deployed systems              output; persona breaks down

Diagnostic Framework — Full Version:

SymptomMost Likely PropertyTargeted Solution
Confident false informationNext-token + KnowledgeProvide correct info in prompt; verify outputs
Fabricated citationNext-token predictionNever use AI citations without verification
Doesn’t know recent eventKnowledge (cutoff)Provide the information in the prompt
Ignores instructionsSteerability (drift or conflict)Restate; simplify; reposition
Refuses a reasonable requestSteerability + TrainingRephrase; work within training constraints
Wrong formatSteerability (insufficient specificity)Be more explicit; provide an example
Forgets earlier contextWorking memory (window size)Restate key context; use compact/summarize
Inconsistent across sessionsWorking memory (no persistence)Provide context at session start each time
Different outputs to same promptNext-token (temperature)Lower temperature; accept natural variation
Loses persona in long chatSteerability + Working memoryRestate persona; use system prompt
Too verbose / too briefSteerability (missing constraints)Specify length explicitly
Misses key detail in long docWorking memory (lost in middle)Restructure; put key content at start/end

The Practitioner’s Mental Model:

  When AI surprises you, ask:

  1. Is this a KNOWLEDGE issue?
     → Was the info in training data? Is it post-cutoff? Is it obscure?
     → Fix: provide the information yourself (RAG pattern)

  2. Is this a NEXT-TOKEN issue?
     → Is AI predicting plausible text rather than accurate text?
     → Fix: verify; provide context; use lower temperature

  3. Is this a WORKING MEMORY issue?
     → Did key context fall out of the window or get ignored?
     → Fix: restate; summarize; restructure prompt; start fresh

  4. Is this a STEERABILITY issue?
     → Is AI not following instructions as intended?
     → Fix: clarify; simplify; reposition; check for conflicts

The Honest Assessment of AI:

  • AI is not artificially constrained intelligence waiting to be unleashed
  • AI is a genuinely novel kind of system with genuine strengths and genuine structural limits
  • The limits are not failures — they are properties of the mechanism
  • AI fluency means working skillfully within those properties, not around them
  • The practitioners who get the most from AI are those with accurate mental models — not those with the highest expectations

Best Practices

  • Develop the habit of diagnosing unexpected outputs before retrying — random retry rarely fixes the underlying issue
  • Match your mitigation strategy to the property causing the problem
  • Maintain realistic expectations: AI is extraordinary at some things and structurally incapable of others
  • Share your understanding of AI properties with colleagues — most frustration with AI comes from mismatched expectations

Try It Out: Deliberately induce each of the four failure modes:

  1. Hallucination: Ask AI about a real but obscure person or paper — see if it fabricates details
  2. Knowledge cutoff: Ask about a recent event you know occurred after the training cutoff
  3. Lost in middle: Paste a long document; hide a specific unusual instruction in the middle; ask AI to follow it
  4. Instruction drift: Set 5 strict formatting constraints; have a 15-message conversation; see which constraints survive

Example

A researcher notices AI keeps returning the same hallucinated citation despite being told the citation is wrong. Diagnosis: this is a next-token prediction issue — the statistically likely token sequence after this topic continues to predict that citation. Random correction mid-conversation does not update the model’s weights. Solution: provide the correct citation explicitly in the prompt, ask AI to use only sources provided, and verify independently. The researcher now understands why “just tell it not to” doesn’t fix hallucinations.


Final Checklist

  • I can explain the 4-stage process by which an LLM gets its character (pre-training, SFT, RLHF, alignment)
  • I can describe next-token prediction in plain language
  • I can explain why hallucinations are a structural feature of next-token prediction, not a fixable bug
  • I can define “token” and explain why it differs from a word, with examples
  • I can explain what temperature controls and give examples of when to use low vs. high settings
  • I can explain why fluent text does not imply accurate text
  • I can explain what a knowledge cutoff is and name 5+ high-risk domains for cutoff issues
  • I can describe the RAG pattern and explain when to use it
  • I can describe training data composition and how it affects knowledge depth
  • I can explain what a context window is and name the “lost in the middle” effect
  • I can describe strategies for managing long contexts (chunking, summarization, selective extraction)
  • I can explain why AI has no persistent memory by default
  • I can name 5+ steerability capabilities and 5+ steerability limitations
  • I can explain the difference between system prompts and user prompts
  • I can explain how safety training interacts with steerability
  • I can use the “which property caused this?” decision tree to diagnose an unexpected output
  • I can pair each property’s limitation with a targeted mitigation strategy
  • I can name at least 6 collision scenarios where properties interact problematically
  • I can use the capability-limitation continuum to explain any LLM behavior