Agentic Architecture & Orchestration — Complete Lesson

Domain 1 · 27% of exam · 7 Task Statements

The heaviest domain at 27%. It covers everything from the mechanics of a single agent loop to coordinating fleets of specialized subagents. The exam tests whether you know the exact stopping condition for a loop, whether you understand that subagents have zero inherited context, and whether you reach for programmatic enforcement or prompt instructions when correctness is non-negotiable.

Every scenario in the exam touches Domain 1 — whether it's the Customer Support agent enforcing identity verification, the Research system coordinating subagents, or the CI/CD pipeline decomposing large reviews into focused passes.

Task Statement 1.1

Design and implement agentic loops for autonomous task execution

The agentic loop is the fundamental execution primitive. Getting the termination condition wrong is the most common and most consequential implementation error — and the exam tests it explicitly.

The Core Concept

An agentic loop sends a message to Claude, receives a response, checks the stop_reason field, executes any requested tools, appends the results to the conversation history, and repeats. The loop continues until Claude signals it is finished by returning stop_reason == "end_turn".

The critical insight: Claude decides when it is done, not the developer's iteration counter or text-parsing logic. The stop_reason field is the only reliable signal. Everything else is a workaround that will fail in production.

The Exam Principle: The loop terminates when stop_reason == "end_turn". The loop continues when stop_reason == "tool_use". Never parse text content to determine termination. Never rely solely on iteration caps. These are the two most tested facts in 1.1.

Loop Lifecycle

📤

1. Send Request

Send the current conversation history (including all previous tool results) to Claude. The model reasons over the full history to decide its next action.

🔍

2. Inspect stop_reason

"tool_use" → Claude wants to call a tool. "end_turn" → Claude is finished. These two values drive normal loop control; any other value (such as "max_tokens") signals an abnormal stop that should be handled explicitly, not looped on.

⚙️

3. Execute Tools

For each tool call in the response, execute the tool and collect the result. Tool calls are in response.content blocks with type == "tool_use".

📥

4. Append Results

Append the assistant's response AND the tool results to conversation history. Both are required — omitting the assistant turn corrupts the conversation structure.

Correct Implementation

python — correct agentic loop PRODUCTION PATTERN
def run_agent(client, tools, initial_message):
    messages = [{"role": "user", "content": initial_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            tools=tools,
            messages=messages
        )

        # ✓ ONLY correct termination signal
        if response.stop_reason == "end_turn":
            break

        # ✓ Continue loop on tool_use
        if response.stop_reason == "tool_use":
            # Append assistant response to history
            messages.append({
                "role": "assistant",
                "content": response.content
            })

            # Execute each tool call and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # ✓ Append tool results for next iteration
            messages.append({
                "role": "user",
                "content": tool_results
            })
        else:
            # Abnormal stop (e.g. "max_tokens"): fail loudly rather than loop forever
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

    return response

Anti-Patterns the Exam Tests

✗ Text-Based Termination
# NEVER do this
for block in response.content:
    if block.type == "text":
        if "done" in block.text.lower() or "finished" in block.text.lower():
            break  # unreliable!
✗ Arbitrary Iteration Cap
# NEVER use as primary stop
for i in range(10):
    response = client.messages.create(...)
    # Silently truncates valid work
    # No signal to the user
    # May cut off mid-task
🚨
Iteration caps as safety nets are fine. The anti-pattern is using them as the primary stopping mechanism. A safety cap of 50 iterations with a stop_reason check inside is correct. A loop that stops at 10 iterations without ever checking stop_reason is not.
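As a concrete illustration, here is a minimal sketch of the cap-as-safety-net pattern. The FakeClient stub and its scripted stop reasons are invented for this example; in production the client is the real Messages API and the tool-execution step is filled in.

```python
class FakeResponse:
    """Stub response exposing only stop_reason (illustration only)."""
    def __init__(self, stop_reason):
        self.stop_reason = stop_reason

class FakeClient:
    """Scripted stand-in for the API client: two tool turns, then done."""
    def __init__(self):
        self._script = ["tool_use", "tool_use", "end_turn"]
    def create(self):
        return FakeResponse(self._script.pop(0))

def run_with_safety_cap(client, max_iterations=50):
    for i in range(1, max_iterations + 1):
        response = client.create()
        if response.stop_reason == "end_turn":
            return response, i  # primary termination signal
        # stop_reason == "tool_use": execute tools, append results (elided)
    # The cap is a safety net only; hitting it is an error, not completion
    raise RuntimeError(f"Safety cap hit after {max_iterations} iterations")

response, turns = run_with_safety_cap(FakeClient())
# The loop ends on end_turn; the cap never fires on a normal task.
```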

Exam Traps for Task 1.1

Trap: Parse "I'm done" or "Task complete" in response text to terminate.
Why it fails: Text content is non-deterministic — Claude may phrase completion differently, or include those words mid-task.
Correct pattern: Check stop_reason == "end_turn" exclusively.

Trap: Check whether the response has no tool_use blocks as a completion signal.
Why it fails: Responses truncated at the token limit or paused by a server tool also contain no tool_use blocks; content structure is not a reliable proxy for the API's termination signal.
Correct pattern: Rely on stop_reason, not content structure.

Trap: Omit the assistant turn when appending tool results.
Why it fails: The API requires alternating user/assistant turns. Jumping straight to tool results breaks the conversation structure.
Correct pattern: Append the assistant response first, then append tool results as a user turn.

Trap: Use pre-configured decision trees instead of model-driven tool calling.
Why it fails: Removes the model's ability to reason about context — inflexible and brittle for novel inputs.
Correct pattern: Let Claude decide which tool to call based on context; use programmatic gates only for ordering constraints.

🔨 Implementation Task

T1

Build and Stress-Test a Production Agentic Loop

Implement a loop and deliberately trigger each failure mode to confirm your termination logic is correct.

  • Implement the agentic loop using stop_reason as the sole termination signal
  • Add a safety cap of 25 iterations with an explicit warning log — verify it never fires on normal tasks
  • Test: have Claude call 3 tools in sequence — confirm all results are appended and reasoning is continuous
  • Break it intentionally: omit the assistant turn append — observe the API error and understand why
  • Add text-based termination as a second branch — prove it fires incorrectly on a response that contains "done" mid-reasoning

Exam Simulation — Task 1.1

Question 1 — Task 1.1 Customer Support Agent
An agent loop is implemented with a check: "if the response has no tool_use blocks, break the loop." During load testing, some support requests are being cut off — the agent stops mid-investigation without completing. What is the root cause?
  • A. The tool descriptions are too vague, causing Claude to skip tool calls on complex requests
  • B. The termination condition is incorrect — the loop should check stop_reason == "end_turn", not the presence of tool_use blocks, since Claude may reason across multiple turns before needing tools
  • C. The agent needs a higher iteration cap — increase from 10 to 25 iterations
  • D. Tool results are not being appended to conversation history, causing Claude to repeat tool calls
Correct: B
B is correct. A response can lack tool_use blocks for reasons other than genuine completion: the output may be truncated at the token limit (stop_reason == "max_tokens"), or a long-running server tool may pause the turn (stop_reason == "pause_turn"). A loop that breaks on the absence of tool_use blocks treats these states as completion and cuts the agent off mid-investigation. Checking stop_reason == "end_turn" distinguishes genuine completion from every other stop condition. A is wrong: tool description quality doesn't cause early termination. C is wrong: an iteration cap is a band-aid and not the right mental model for loop control. D is wrong: missing tool results cause repetition, not early termination.
Question 2 — Task 1.1 Multi-Agent Research System
A developer implements the following loop termination: if "research complete" in response.content[0].text: break. What is wrong with this approach, and what is the correct implementation?
  • A. The string comparison is case-sensitive — use .lower() to make it robust
  • B. The response may contain multiple content blocks — iterate over all blocks to check for the phrase
  • C. Text-based termination is fundamentally unreliable — the phrase may appear in mid-task reasoning or not at all when the task genuinely completes. Use response.stop_reason == "end_turn"
  • D. The phrase should be injected into the system prompt so Claude consistently uses it as a completion signal
Correct: C
C is correct. Text-based termination is non-deterministic. Claude may write "research complete" while mid-task (e.g., "This section of research is complete, moving on to..."), or it may finish without using that exact phrase. The stop_reason field is a structured signal set programmatically by the API — the only reliable termination signal. A and B fix edge cases without addressing the fundamental design flaw. D is an interesting idea but still non-deterministic — Claude may paraphrase or the phrase may appear mid-reasoning.
Task Statement 1.2

Orchestrate multi-agent systems with coordinator-subagent patterns

Multiple specialized agents working in concert, coordinated by a single orchestrator. The coordinator pattern provides observability, consistent error handling, and controlled information flow — but only if you design it correctly.

The Core Concept

In a hub-and-spoke architecture, a coordinator agent receives the original request, decomposes it into subtasks, delegates each to a specialized subagent, collects results, and synthesizes the final response. No subagent communicates directly with another — all routing passes through the coordinator.

The Exam Principle: Subagents have isolated context. They do not inherit the coordinator's conversation history automatically. Every piece of context a subagent needs must be explicitly provided in its prompt. This is the most tested fact in 1.2 and 1.3.

Hub-and-Spoke Architecture

🎯

Coordinator Role

Analyzes query complexity, decomposes into subtasks, selects which subagents to invoke, aggregates results, evaluates coverage, and re-delegates if gaps exist.

🔬

Subagent Role

Specialized for one task type. Receives a complete, self-contained prompt from the coordinator. Executes and returns structured results. No awareness of other subagents.

🚦

All Routing Through Coordinator

Prevents spaghetti communication patterns. Enables consistent error handling, logging, and retry logic in one place rather than scattered across agents.

🔄

Iterative Refinement

Coordinator evaluates synthesis output for coverage gaps, re-delegates with targeted queries, and re-invokes synthesis — repeating until coverage is sufficient.

Coordinator Design Principles

The most common coordinator failure is overly narrow task decomposition — breaking "impact of AI on creative industries" into only visual arts subtasks, because that's what the coordinator knows best. The result: every subagent completes successfully, but the final output has systematic blind spots.

✗ Narrow Decomposition

Topic: "AI in creative industries"
Subtasks assigned:
  1. AI in digital art creation
  2. AI in graphic design
  3. AI in photography
Result: 100% visual arts. Music, writing, film: missed. All subagents: "successful".

✓ Comprehensive Decomposition

Topic: "AI in creative industries"
Subtasks assigned:
  1. AI in visual arts & design
  2. AI in music composition
  3. AI in writing & journalism
  4. AI in film & video production
Coverage: breadth-first, explicit.
  • Design coordinator prompts specifying research goals and quality criteria — not step-by-step procedural instructions, to preserve subagent adaptability
  • Partition scope across subagents to minimize duplication (distinct subtopics or source types per agent)
  • Implement iterative refinement: evaluate synthesis output → identify gaps → re-delegate with targeted queries → re-synthesize
  • Route all inter-subagent information through the coordinator — never allow direct subagent-to-subagent communication
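The principles above can be sketched as a small coordination loop. The subagent here is a stub function and all names are invented for illustration; in a real system each delegation is a Task tool invocation.

```python
def web_search_agent(subtopic):
    """Stub subagent; in a real system this is a Task tool invocation."""
    return {"subtopic": subtopic, "findings": f"findings on {subtopic}"}

def coordinate(topic, initial_subtasks, required_coverage):
    """Hub-and-spoke coordination with iterative refinement."""
    results = {}
    pending = list(initial_subtasks)
    while pending:
        # Every delegation passes through the coordinator; subagents
        # never talk to each other directly.
        for subtask in pending:
            results[subtask] = web_search_agent(subtask)
        # Iterative refinement: evaluate coverage, re-delegate the gaps.
        pending = [c for c in required_coverage if c not in results]
    return results

coverage = ["visual arts", "music", "writing", "film"]
report = coordinate("AI in creative industries", ["visual arts"], coverage)
# A narrow first decomposition is back-filled by the refinement pass,
# so the final report covers all four required categories.
```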

Exam Traps for Task 1.2

Trap: Assuming subagents automatically inherit coordinator context.
Why it fails: They do not. Each subagent invocation is a fresh context. Assuming inheritance leads to silent failures where subagents lack required information.
Correct pattern: Explicitly pass all needed context in the subagent's prompt.

Trap: Blaming the synthesis agent when the final output has coverage gaps.
Why it fails: If each subagent completed successfully, the gap is in what they were assigned — coordinator decomposition is the root cause.
Correct pattern: Inspect coordinator logs first. Narrow decomposition is the most common cause of systematic gaps.

Trap: Allowing subagents to call each other directly for efficiency.
Why it fails: Bypasses the coordinator's observability and error handling — creates spaghetti flows that are impossible to debug.
Correct pattern: All communication routes through the coordinator; the coordinator handles retries and routing decisions.

🔨 Implementation Task

T2

Build a 3-Agent Research Coordinator

Implement a coordinator + web search agent + synthesis agent. Deliberately create and then fix a decomposition failure.

  • Implement the coordinator with hub-and-spoke routing — all subagent communication through coordinator only
  • Run on "impact of remote work on urban planning" — log the decomposition. Identify if any major category is missing
  • Implement the iterative refinement loop: coordinator evaluates synthesis output and re-delegates if coverage is below threshold
  • Test context isolation: verify the synthesis agent has no access to the web search agent's raw conversation — only the coordinator-passed results
  • Deliberately break decomposition by giving coordinator a narrow system prompt — observe the coverage gap and fix it

Exam Simulation — Task 1.2

Question 1 — Task 1.2 Multi-Agent Research System
After running the system on "impact of AI on creative industries," all subagents complete successfully, but the final reports cover only visual arts, missing music, writing, and film. The coordinator's logs show it decomposed the topic into: "AI in digital art," "AI in graphic design," and "AI in photography." What is the most likely root cause?
  • A. The synthesis agent lacks instructions for identifying coverage gaps in findings from other agents
  • B. The coordinator's task decomposition is too narrow — it assigned only visual arts subtasks and the subagents executed those correctly, leaving other creative domains unassigned
  • C. The web search agent's queries are not comprehensive enough — it needs expanded search parameters
  • D. The document analysis agent is filtering out non-visual-arts sources due to overly restrictive relevance criteria
Correct: B
B is correct. The coordinator's logs are the diagnostic — it decomposed "creative industries" into only visual arts subtasks. The subagents did exactly what they were assigned. The failure is upstream in the coordinator's decomposition logic, not in any subagent's execution. A, C, D incorrectly blame downstream agents that performed their assigned tasks correctly. This is the official exam question Q7 — always trace failures to their root in the coordinator before blaming subagents.
Question 2 — Task 1.2 Multi-Agent Research System
A coordinator agent manages 3 specialized subagents: WebSearchAgent, DocumentAnalysisAgent, and DataValidationAgent. For each request, WebSearch and DocumentAnalysis can run simultaneously (independent inputs), while DataValidation must run after both complete. Currently all three run sequentially: total latency = 45s + 60s + 30s = 135s. What architecture achieves the minimum possible latency?
  • A. Give the coordinator a meta-tool that triggers all three agents simultaneously and waits for all results before proceeding
  • B. Emit both WebSearchAgent and DocumentAnalysisAgent Task tool calls in a single coordinator response (parallel), then in the next coordinator turn emit DataValidationAgent after both results are available
  • C. Have WebSearch and DocumentAnalysis each invoke DataValidation directly when they finish, removing the coordinator from the final step
  • D. Merge WebSearch and DocumentAnalysis into a single SuperSearchAgent that handles both operations internally, reducing coordinator round-trips from 3 to 2
Correct: B
B is correct. Multiple Task tool calls in a single coordinator response triggers parallel API execution — WebSearch and DocumentAnalysis run concurrently (max(45s, 60s) = 60s), then DataValidation runs after (30s) = 90s total, down from 135s. A is wrong: A "meta-tool" for parallel execution doesn't exist in the Claude Agent SDK — the Task tool IS the correct mechanism, and multiple Task calls in one response is how parallel execution works. C is wrong: Subagents invoking other subagents directly creates unpredictable coordination and breaks the hub-and-spoke pattern. D is wrong: Merging reduces round-trips but doesn't achieve parallel execution of independent operations — the merged agent still processes them sequentially.
Question 3 — Task 1.2 Multi-Agent Research System
A coordinator agent manages 4 subagents and accumulates all their result outputs in its context. By request 3 in a session, the coordinator is using 180k tokens of a 200k token model — mostly subagent result outputs from earlier steps that are no longer needed. The system is approaching context limits after just 3 requests. What is the most scalable architectural fix?
  • A. Switch to a model with a larger context window (e.g., 1M tokens) to accommodate accumulating subagent results across more requests
  • B. Add a system prompt instruction: "Summarize and discard subagent results immediately after each synthesis step to conserve context"
  • C. Externalize subagent results to a shared state store (file or memory object passed via tool results), and have the coordinator reference only what it currently needs for the active reasoning step rather than accumulating all history in its context
  • D. Reduce the number of active subagents from 4 to 2 to limit the volume of result data entering the coordinator context
Correct: C
C is correct. The coordinator shouldn't hold all results in its context — results should live outside the model context and be fetched or referenced when needed. This is the architectural fix: decouple result storage from model context. A is wrong — this is the canonical exam trap: scaling the context window is a band-aid, not a fix. At 1M tokens, the same accumulation pattern repeats at request 10 or 20. B is wrong: Prompt instructions for summarization are probabilistic and unreliable for structured data — the coordinator may not consistently apply them. D is wrong: Reducing subagent count limits the system's capability and doesn't address the accumulation architectural pattern.
Task Statement 1.3

Configure subagent invocation, context passing, and spawning

The mechanics of subagent creation: what tool spawns them, how context flows in, and how parallel execution works. The exam tests precise knowledge of the Task tool and the parallel spawning pattern.

The Core Concept

Subagents are not automatically created — they are spawned using the Task tool. The coordinator must have "Task" in its allowedTools list, and each subagent is defined via AgentDefinition with its own system prompt, description, and tool restrictions.

Isolation is absolute. Subagents do not share memory. They do not inherit parent context. Each invocation of a subagent is fresh. If you need the synthesis agent to know what the web search agent found, you must explicitly include those findings in the synthesis agent's prompt.

The Task Tool

🔑

Requirement: allowedTools

The coordinator's allowedTools must include "Task". Without this, the coordinator cannot spawn subagents regardless of prompt instructions.

📝

AgentDefinition

Defines each subagent type with: description, system prompt, and tool restrictions. The system prompt scopes the subagent's behavior. Tool restrictions enforce role separation.

📦

Complete Context in Prompt

Every piece of information the subagent needs must be in the Task call's prompt. Source URLs, document names, prior agent outputs — all must be explicitly included.

🌿

Fork-Based Session Management

Fork sessions create independent branches from a shared analysis baseline — enabling divergent explorations without contaminating the main session context.
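As a wiring sketch, the coordinator configuration described above might look like the following. This is a plain-dict illustration of the shape, not a verbatim SDK call; the key names mirror the allowedTools / AgentDefinition concepts from this section, and the exact fields should be checked against the current Claude Agent SDK reference.

```python
# Assumed-shape configuration sketch (not a verbatim SDK signature)
coordinator_options = {
    # Without "Task" in allowedTools the coordinator cannot spawn subagents
    "allowedTools": ["Task"],
    "agents": {
        "web-search": {
            "description": "Searches the web and returns structured findings",
            "prompt": "You are a web research specialist. Return JSON with "
                      "source_url, excerpt, relevance_score, and date fields.",
            "tools": ["WebSearch"],  # tool restriction enforces role separation
        },
        "synthesis": {
            "description": "Combines findings into a cited report",
            "prompt": "You are a synthesis specialist. Preserve source "
                      "attribution for every finding you include.",
            "tools": [],  # no external tools needed for synthesis
        },
    },
}
```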

Context Passing Pattern

When passing context between agents, use structured data formats that separate content from metadata. Raw text blobs lose attribution — source URLs, document names, and page numbers disappear during synthesis.

python — explicit context passing to synthesis agent CORRECT PATTERN
# ✓ Pass structured context explicitly — not raw text
synthesis_prompt = f"""
You are a synthesis agent. Combine the following research findings
into a comprehensive report on: {research_topic}

Web Search Results:
{json.dumps(web_results, indent=2)}

Document Analysis:
{json.dumps(doc_analysis, indent=2)}

Each result includes: source_url, excerpt, relevance_score, date.
Preserve source attribution in your synthesis.

Quality criteria: Cover all major categories. Flag any gaps.
"""

# ✗ DO NOT pass context like this:
synthesis_prompt = f"Synthesize this: {str(all_results)}"
# ↑ Loses structure, attribution, and metadata

Parallel Spawning

To run subagents in parallel, emit multiple Task tool calls in a single coordinator response. Spawning them across separate turns forces sequential execution and negates the latency benefit.

✗ Sequential (Separate Turns)

Turn 1: Task(web_search, topic_A) → wait for result
Turn 2: Task(doc_analysis, topic_B) → wait for result
Turn 3: Task(web_search, topic_C)
3x latency. Sequential by design.

✓ Parallel (Single Response)

Turn 1: [ Task(web_search, topic_A),
          Task(doc_analysis, topic_B),
          Task(web_search, topic_C) ]
→ all 3 run concurrently
1x latency. Parallel by design.
  • Include complete findings from prior agents directly in the subagent's prompt — never assume it can access them from history
  • Use structured data formats (JSON with source URLs, document names, page numbers) to preserve attribution when passing context
  • Spawn parallel subagents by emitting multiple Task calls in a single coordinator response — not across separate turns
  • Design coordinator prompts with goals and quality criteria — not step-by-step instructions — to preserve subagent adaptability
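To see the latency effect, here is a small simulation in which threads stand in for Task calls emitted in a single response; the agent names and durations are arbitrary stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def subagent(name, seconds):
    """Stand-in for a Task call that takes `seconds` to complete."""
    time.sleep(seconds)
    return name

tasks = [("web_search", 0.3), ("doc_analysis", 0.2), ("validation", 0.1)]

# Sequential: one Task per coordinator turn, cost is the sum of durations
start = time.perf_counter()
sequential = [subagent(name, secs) for name, secs in tasks]
seq_time = time.perf_counter() - start

# Parallel: all Task calls in one response, cost approaches the slowest agent
start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(lambda t: subagent(*t), tasks))
par_time = time.perf_counter() - start

# seq_time is roughly 0.6s (sum); par_time is roughly 0.3s (max)
```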

Exam Traps for Task 1.3

Trap: Spawning subagents across multiple turns for parallelism.
Why it fails: Each turn is sequential — the coordinator waits for each Task result before proceeding to the next turn.
Correct pattern: Emit all parallel Task calls in a single coordinator response.

Trap: Passing raw string concatenation of results between agents.
Why it fails: Loses structure and attribution — source URLs, dates, and metadata disappear; the synthesis agent cannot distinguish findings.
Correct pattern: Pass structured JSON with explicit fields for content, source, date, relevance.

Trap: Giving the coordinator step-by-step procedural instructions.
Why it fails: Overly procedural prompts make subagents rigid — they can't adapt when intermediate results reveal new requirements.
Correct pattern: Specify research goals and quality criteria; let subagents determine their approach.

🔨 Implementation Task

T3

Implement Parallel Subagent Spawning with Structured Context

Build parallel spawning and structured context passing, then measure the latency difference vs sequential.

  • Configure coordinator with allowedTools: ["Task", ...] and define 3 subagent types via AgentDefinition
  • Implement sequential spawning — measure total latency for 3 subagents
  • Implement parallel spawning (multiple Task calls in one response) — measure total latency and confirm it's ~1x not 3x
  • Pass context using a structured JSON schema including: content, source_url, date, relevance_score
  • Verify isolation: confirm synthesis agent has zero access to web search agent's raw conversation history

Exam Simulation — Task 1.3

Question 1 — Task 1.3 Multi-Agent Research System
The synthesis agent is producing reports that miss source attributions — it references findings but cannot cite which web source they came from. The web search agent correctly returns results with URLs. What is the most likely cause?
  • A. The synthesis agent's system prompt does not instruct it to include citations
  • B. The coordinator passes web search results as a plain text string, stripping the structured metadata (source URLs, dates) that was in the original JSON response
  • C. The synthesis agent inherits the web search agent's context and is discarding the URL fields
  • D. The web search tool is not returning URLs in its response schema
Correct: B
B is correct. Context passing format is the most common source of lost attribution. If the coordinator converts structured JSON to a plain text summary before passing it to the synthesis agent, source URLs are silently discarded. The fix: pass the full structured JSON preserving all metadata. A is a secondary issue that a prompt fix could address, but the root cause is the data pipeline losing structure. C is wrong: Subagents don't inherit context — they receive only what the coordinator explicitly passes. D is wrong: The problem statement states the web search agent correctly returns URLs.
Question 2 — Task 1.3 Multi-Agent Research System
A research pipeline spawns 4 subagents sequentially, each taking ~15 seconds. Total pipeline time is 60+ seconds. The team wants to reduce this to ~15 seconds. What is the correct architectural change?
  • A. Use a faster Claude model for subagents to reduce per-agent latency from 15s to 5s
  • B. Add caching to the coordinator so repeated queries reuse previous subagent results
  • C. Emit all 4 Task tool calls in a single coordinator response so the subagents execute in parallel rather than sequentially across separate turns
  • D. Merge all 4 subagents into a single agent that handles all research tasks to eliminate coordination overhead
Correct: C
C is correct. Multiple Task calls in a single response triggers parallel execution — all 4 run concurrently, so total time approaches the slowest single agent (~15s) rather than the sum (~60s). This is the explicit parallel spawning pattern from the exam guide. A reduces per-agent time but doesn't address the sequential architecture — time is still 4x the single-agent time. B helps for repeated identical queries but doesn't solve the fundamental sequencing. D merges specializations, degrading quality and violating the role-scoping principle.
Task Statement 1.4

Implement multi-step workflows with enforcement and handoff patterns

When the business cannot tolerate a 1% failure rate, prompt instructions are not enough. This task statement is about knowing when to build programmatic gates — and how to hand off gracefully when the agent can't proceed alone.

The Core Concept

Prompt-based guidance tells Claude what it should do. Programmatic enforcement tells the system what it can do. For critical business logic — identity verification before financial operations, compliance checks before data access — only programmatic gates provide the deterministic guarantees that production requires.

The Exam Principle: "Enhance the system prompt to say X is mandatory" is always the wrong answer when the question describes a scenario with financial consequences or identity verification. The correct answer is a programmatic prerequisite gate that blocks the downstream tool call until the prerequisite is satisfied.

Programmatic Prerequisite Gates

A prerequisite gate intercepts a tool call and checks whether required prior steps have been completed. If not, it blocks the call and returns a structured error explaining what must happen first.

python — prerequisite gate blocking process_refund DETERMINISTIC ENFORCEMENT
def execute_tool(tool_name, tool_input, session_state):
    # ✓ Programmatic gate — runs before every tool execution
    if tool_name in ["process_refund", "lookup_order"]:
        if not session_state.get("verified_customer_id"):
            return {
                "error": "Prerequisite not met",
                "required": "get_customer must be called first",
                "reason": "Identity verification required before order operations"
            }

    if tool_name == "get_customer":
        result = get_customer_impl(tool_input)
        # ✓ Gate is cleared once prerequisite completes
        session_state["verified_customer_id"] = result["customer_id"]
        return result

    # All other tools dispatch normally
    # (dispatch_tool is a placeholder for the normal execution path)
    return dispatch_tool(tool_name, tool_input)

    # ✗ Prompt-only approach (12% failure rate):
    # "Always call get_customer before any order operations"
✗ Prompt-Based (12% Failure)

System prompt: "Customer verification via get_customer is mandatory before any order operations."
Production result:
  - 88% compliance rate
  - 12% skips to lookup_order
  - Wrong refunds issued
  - Financial impact

✓ Programmatic Gate (0% Bypass)

Gate logic: if process_refund is called and no verified_customer_id → block + return error
Production result:
  - 100% compliance
  - Zero bypasses possible
  - Deterministic guarantee

Structured Handoff Protocols

When an agent must escalate to a human, the handoff package must be complete — the human agent receiving it has no access to the conversation history. Every decision, finding, and recommendation must be compiled into a self-contained summary.

  • Decompose multi-concern requests into distinct items, investigate each in parallel using shared context, then synthesize a unified resolution before handoff
  • Compile structured handoff summaries with: customer ID, issue root cause, refund amount or action taken, recommended next action
  • Include what was attempted and what was not — the human agent needs to know where to pick up
  • Never escalate with "I couldn't help" — always include the investigation results that led to escalation
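The checklist above can be sketched as an explicit compile step. All field names and sample values here are illustrative, not a prescribed schema.

```python
def build_handoff_package(session):
    """Compile a self-contained escalation summary. The human reviewer
    sees only this package, never the agent's conversation history."""
    return {
        "customer_id": session["customer_id"],
        "issue_root_cause": session["root_cause"],
        "action_taken": session.get("action_taken", "none"),
        "attempted": session["attempted_steps"],
        "not_attempted": session["remaining_steps"],
        "recommended_next_action": session["recommendation"],
    }

package = build_handoff_package({
    "customer_id": "C-1042",
    "root_cause": "duplicate charge on order O-9981",
    "attempted_steps": ["verified identity", "located duplicate charge"],
    "remaining_steps": ["issue refund above $500 approval threshold"],
    "recommendation": "approve refund of $612.40",
})
# The package records what was done AND what was not, so the human
# agent knows exactly where to pick up.
```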

Exam Traps for Task 1.4

Trap: Enhancing the system prompt to make verification "mandatory".
Why it fails: Prompt instructions have a non-zero failure rate for compliance. "Mandatory" in a prompt is advisory, not enforced.
Correct pattern: A programmatic gate that physically blocks the tool call until prerequisite state is set.

Trap: Adding few-shot examples showing correct tool order.
Why it fails: Few-shot examples improve probability but don't provide deterministic guarantees for financial operations.
Correct pattern: Gates for financial/identity operations; few-shot for classification and routing where probabilistic behavior is acceptable.

Trap: Escalating with just "I need human assistance".
Why it fails: The human agent has no context — they must start over. Wastes the investigation work already done.
Correct pattern: Structured handoff including customer ID, root cause, what was found, and recommended action.

🔨 Implementation Task

T4

Build a Prerequisite Gate for a Financial Workflow

Implement the get_customer → lookup_order → process_refund gate chain and prove it enforces order deterministically.

  • Implement session state tracking with verified_customer_id flag
  • Build the prerequisite gate: block lookup_order and process_refund until get_customer sets the flag
  • Test bypass attempt: craft a prompt where the user volunteers their order ID — confirm the gate still fires
  • Implement structured handoff: when a refund exceeds $500, compile a complete handoff package and escalate
  • Compare with prompt-only: remove the gate, add a system prompt instruction — run 10 tests and count how many bypass verification

Exam Simulation — Task 1.4

Question 1 — Task 1.4 Customer Support Agent
Production data shows that in 12% of cases, the agent skips get_customer entirely and calls lookup_order using only the customer's stated name, occasionally leading to misidentified accounts and incorrect refunds. What change would most effectively address this reliability issue?
  • A. Add a programmatic prerequisite that blocks lookup_order and process_refund calls until get_customer has returned a verified customer ID
  • B. Enhance the system prompt to state that customer verification via get_customer is mandatory before any order operations
  • C. Add few-shot examples showing the agent always calling get_customer first, even when customers volunteer order details
  • D. Implement a routing classifier that analyzes each request and enables only the subset of tools appropriate for that request type
Correct: A
A is correct. This is the official exam question Q1. Programmatic enforcement provides deterministic guarantees that prompt-based approaches cannot. The 12% failure rate is already proof that the prompt approach is insufficient. B and C are both probabilistic — they may reduce the failure rate but cannot eliminate it when errors have financial consequences. D addresses which tools are available, not their execution order — the actual problem.
Question 2 — Task 1.4 Customer Support Agent
A customer support agent has a 5-step workflow: identify_customer → lookup_order → assess_eligibility → calculate_compensation → process_refund. Post-deployment analysis shows that in 8% of cases, process_refund is called with a null customer_id because a lookup_order failure was silently absorbed 3 steps earlier. What is the most effective architectural change to prevent null propagation through the workflow?
  • A. Add validation in process_refund to reject calls with null customer_id and return a structured error message
  • B. Add a system prompt instruction: "Always verify customer_id is not null before processing a refund"
  • C. Add retry logic at the lookup_order step so it retries up to 3 times before returning null
  • D. Implement a programmatic gate at lookup_order's output: if the result lacks a valid customer_id, immediately raise a structured error that halts the workflow — preventing downstream tools from being called with incomplete state
Correct: D
D is correct. A gate at the failure source stops propagation immediately rather than letting null flow through 3 intermediate steps. This is enforcement at the right layer — the workflow cannot proceed past lookup_order with invalid state. A is wrong: Validation at process_refund only catches the null 3 steps later, after the workflow has done unnecessary work and potentially taken other actions. B is wrong: The 8% failure rate proves prompt instructions aren't sufficient — programmatic enforcement is required. C is wrong: Retry addresses transient failures, not the null propagation architecture problem. If lookup_order returns null after retries, the same null still propagates downstream.
Question 3 — Task 1.4 Customer Support Agent
Your agent must enforce this rule: a refund cannot exceed the original order total. The order total is retrieved by lookup_order at step 2, and the refund amount is determined by calculate_compensation at step 4. Currently, process_refund occasionally executes refunds exceeding the order total due to compensation calculation errors. What is the most reliable enforcement mechanism?
  • A. Add a validation instruction to calculate_compensation: "The compensation must never exceed the original order total retrieved in step 2"
  • B. Implement a pre-call guard on process_refund that receives both refund_amount and order_total as required inputs, compares them programmatically in application code, and raises a structured error if refund_amount > order_total
  • C. Add a post-call audit tool that reviews completed refunds and flags overages for manual reversal within 24 hours
  • D. Log the order total to session state at lookup_order so it is consistently available for the model to reference when calculating compensation
Correct: B
B is correct. A programmatic comparison in application code before the refund executes is deterministic — it cannot be bypassed regardless of what the model decided. The guard fires before any funds move. A is wrong: Prompt instructions in calculate_compensation are probabilistic — the same conditions causing occasional overages will cause them again. C is wrong: Post-call audits are a recovery mechanism, not a prevention mechanism — money has already moved before the audit runs. D is wrong: Logging the order total improves auditability but adds no enforcement — the model can still pass an incorrect amount to process_refund regardless of what's in session state.
Task Statement 1.5
Task Statement 1.5

Apply Agent SDK hooks for tool call interception and data normalization

Hooks are the layer between tool execution and model reasoning. They enable deterministic compliance and clean data normalization — without polluting tool implementations or system prompts.

The Core Concept

Hooks intercept the tool call lifecycle at two points: before a tool is called (tool call interception) and after it returns (PostToolUse). This gives you a centralized place to enforce compliance rules and normalize data formats before Claude reasons about tool results.

The Decision Rule: Use hooks when the business rule requires guaranteed compliance. Use prompts when probabilistic compliance is acceptable. The exam will present scenarios — your job is to classify them correctly. Financial operations, refund thresholds, and PII handling = hooks. Style guidelines and output formatting = prompts.

Hook Patterns

📥

PostToolUse Hook

Intercepts tool results before the model sees them. Use for: normalizing heterogeneous data formats (Unix timestamps → ISO 8601, numeric status codes → human-readable strings) from different MCP tools.

📤

Tool Call Interception Hook

Intercepts outgoing tool calls before execution. Use for: blocking policy-violating actions (refunds exceeding $500), redirecting to alternative workflows, logging compliance events.

🔒

Deterministic Guarantee

Hook logic runs in application code — not through the LLM. This means 100% enforcement. A hook that blocks a call will never fail due to model reasoning.

🧹

Data Normalization

Multiple MCP tools return different timestamp formats, status codes, and field names. PostToolUse hooks normalize everything to a consistent schema before Claude processes it.

Hook Implementations

python — PostToolUse: normalize timestamps from multiple MCP tools DATA NORMALIZATION
from datetime import datetime, timezone

def post_tool_use_hook(tool_name, tool_result):
    """Normalize heterogeneous data before the model processes it."""

    # Unix timestamp → ISO 8601
    if "created_at" in tool_result:
        ts = tool_result["created_at"]
        if isinstance(ts, (int, float)):
            tool_result["created_at"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

    # Numeric status codes → human-readable strings
    status_map = {1: "active", 2: "pending", 3: "cancelled"}
    if "status" in tool_result and isinstance(tool_result["status"], int):
        tool_result["status"] = status_map.get(tool_result["status"], "unknown")

    return tool_result
python — Tool call interception: block refunds exceeding policy threshold COMPLIANCE ENFORCEMENT
def pre_call_hook(tool_name, tool_input, session_state):
    """Block policy-violating actions before execution."""

    if tool_name == "process_refund":
        amount = tool_input.get("amount", 0)

        if amount > 500:
            # Block and redirect to escalation workflow
            return {
                "blocked": True,
                "reason": "Refund exceeds $500 policy maximum",
                "action": "escalate_to_manager",
                "amount_requested": amount
            }

    return None  # None = allow the call to proceed
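A minimal sketch of how the two hooks slot into the tool-dispatch path. The `dispatch_tool_call` function and its arguments are illustrative, not SDK API, and the hook bodies here are condensed re-definitions of the ones above so the sketch is self-contained:

```python
def pre_call_hook(tool_name, tool_input, session_state):
    # Condensed $500 policy from above: block and redirect to escalation.
    if tool_name == "process_refund" and tool_input.get("amount", 0) > 500:
        return {"blocked": True,
                "reason": "Refund exceeds $500 policy maximum",
                "action": "escalate_to_manager"}
    return None  # allow the call

def post_tool_use_hook(tool_name, tool_result):
    # Condensed normalization from above: numeric status → string.
    status_map = {1: "active", 2: "pending", 3: "cancelled"}
    if isinstance(tool_result.get("status"), int):
        tool_result["status"] = status_map.get(tool_result["status"], "unknown")
    return tool_result

def dispatch_tool_call(tool_name, tool_input, session_state, tools):
    """Pre-call gate → tool execution → post-call normalization."""
    verdict = pre_call_hook(tool_name, tool_input, session_state)
    if verdict is not None:
        return verdict  # blocked: the model sees the structured refusal
    raw = tools[tool_name](tool_input)
    return post_tool_use_hook(tool_name, raw)
```

The ordering is the point: the pre-call hook fires before the tool ever runs, and the PostToolUse hook fires before the model ever sees the raw result — both in application code, outside the model's control.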

Exam Traps for Task 1.5

  • Trap: Use the system prompt to enforce the $500 refund policy. Why it fails: prompts are probabilistic — a sufficiently unusual input or edge case will bypass the instruction. Correct pattern: a pre-call hook that reads the amount and blocks before execution, 100% of the time.
  • Trap: Normalize data formats inside each tool implementation. Why it fails: scatters normalization logic across tools — inconsistent, hard to audit, breaks when new tools are added. Correct pattern: a centralized PostToolUse hook normalizes all tool outputs to a consistent schema.
  • Trap: Use hooks for all compliance, even style/formatting guidelines. Why it fails: overkill — hooks are for deterministic requirements; prompts handle stylistic preferences effectively. Correct pattern: hooks for financial, identity, and regulatory compliance; prompts for formatting, tone, and style.

🔨 Implementation Task

T5

Build a Hook Layer for a Customer Support Agent

Implement both a PostToolUse normalization hook and a pre-call compliance hook.

  • Build PostToolUse hook that normalizes: Unix timestamps → ISO 8601, numeric status codes → strings, currency in cents → formatted dollars
  • Build pre-call hook blocking process_refund when amount > $500 and redirecting to escalate_to_manager
  • Test: send a refund request for $750 — confirm the hook blocks it before the tool executes, not after
  • Test normalization: mock a tool that returns Unix timestamp and numeric status — confirm Claude receives ISO and string versions
  • Compare: remove the hook and add a system prompt for the $500 rule — run 20 tests with varying inputs and count bypasses

Exam Simulation — Task 1.5

Question 1 — Task 1.5 Customer Support Agent
Your agent uses 3 MCP tools from different vendors. Each returns timestamps in a different format: Tool A returns Unix timestamps, Tool B returns ISO 8601 strings, Tool C returns "MM/DD/YYYY" formatted strings. Claude is making errors in time-based comparisons because it processes different formats inconsistently. What is the most architecturally sound fix?
  • A. Add a system prompt instruction: "All timestamps should be treated as UTC and converted to ISO 8601 before comparison"
  • B. Update each of the 3 MCP tool implementations to return ISO 8601 format
  • C. Implement a PostToolUse hook that normalizes all tool outputs to ISO 8601 before the model processes them, centralizing format logic in one place
  • D. Add a dedicated timestamp normalization tool that Claude can call after each tool result to convert the format
Correct: C
C is correct. A PostToolUse hook centralizes normalization in one place — consistent, auditable, and automatically applied to any new tool added in the future. A is wrong: System prompts are probabilistic and still require Claude to correctly interpret heterogeneous inputs. B would work but requires modifying external vendor tools — often not possible, and scatters normalization logic. D is over-engineered and adds unnecessary tool calls and latency to every interaction.
Question 2 — Task 1.5 Customer Support Agent
Your team needs to enforce a policy that no refund over $500 can be processed without manager approval. Which implementation approach provides the required deterministic guarantee?
  • A. Add to the system prompt: "For refunds over $500, always check with a manager before processing"
  • B. Include 5–8 few-shot examples in the system prompt demonstrating the agent escalating large refunds
  • C. Implement a pre-call interception hook on process_refund that reads the amount, blocks calls over $500, and routes to an escalation workflow — running in application code before any LLM call
  • D. Add input validation to the process_refund tool that returns an error for amounts over $500
Correct: C
C is correct. A pre-call hook runs in application code and cannot be bypassed by any model behavior. A and B are probabilistic — they depend on LLM compliance, which has a non-zero failure rate. D is close but input validation inside the tool fires after the LLM has already decided to call it; the hook fires before — preventing the call entirely and enabling redirection to escalation. Additionally, tool-level validation still invokes the tool; the hook redirects to a different workflow.
Task Statement 1.6
Task Statement 1.6

Design task decomposition strategies for complex workflows

Not all workflows need the same decomposition strategy. Fixed sequential pipelines excel at predictable multi-step tasks. Dynamic adaptive decomposition shines for open-ended investigation. Knowing which to use — and why — is what the exam tests.

The Core Concept

Task decomposition is the act of breaking a complex goal into a sequence of smaller, executable steps. The right pattern depends on how much is known about the task upfront: prompt chaining for predictable workflows where the steps are known in advance, dynamic decomposition for open-ended investigations where each step reveals what to do next.

Decomposition Patterns

🔗

Prompt Chaining

Fixed sequential pipeline. Each step's output is the next step's input. Use when: the workflow has predictable stages (analyze file → summarize → compare → report). Steps known in advance.

🌱

Dynamic Adaptive Decomposition

Generates subtasks based on what's discovered. Use when: the task is open-ended and intermediate findings change what to explore next (e.g., "add comprehensive tests to a legacy codebase").

📄

Per-File + Integration Pass

Split large multi-file reviews: analyze each file individually for local issues, then run a separate cross-file integration pass. Avoids attention dilution in single-pass reviews of 10+ files.

🗺️

Map-First, Then Plan

For open-ended tasks: first map the full scope (all files, all dependencies), identify high-impact areas, then generate a prioritized plan that can adapt as dependencies are discovered.
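The chaining pattern reduces to a list of step callables, each consuming the previous step's output. The steps below are illustrative stand-ins for model calls, not a real pipeline:

```python
def run_chain(steps, initial_input):
    """Fixed sequential pipeline: each step's output is the next step's input."""
    result = initial_input
    for step in steps:
        result = step(result)
    return result

# Illustrative steps; a real chain would call the model at each stage.
steps = [
    lambda text: {"parsed": text.split()},                   # parse
    lambda d: {**d, "summary": f"{len(d['parsed'])} tokens"},  # summarize
    lambda d: f"Report: {d['summary']}",                     # report
]
```

Because the stages are known upfront, there is nothing for the model to plan — which is exactly why chaining beats dynamic decomposition for predictable workflows.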

💡
Per-file + integration pass is tested directly. A single-pass review of 14 files produces inconsistent depth and contradictory findings. The fix is not a larger context window — it's splitting into per-file local passes plus a separate cross-file integration pass. This pattern appears in Domain 1 (decomposition) and Domain 4 (multi-pass review).
✗ Single-Pass Review (14 files)
Input: all 14 files simultaneously.
Problems observed:
  • Superficial comments on some files
  • Obvious bugs missed
  • Contradictory feedback across files
  • Pattern flagged in file A, approved in identical file B
Root cause: attention dilution.
✓ Per-file + Integration Pass
Pass 1: analyze each file individually → consistent depth per file → local issues caught reliably.
Pass 2: cross-file integration review → data flow analysis → interface contracts → global pattern consistency.
Root cause eliminated.
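The two-pass structure can be sketched as a small orchestration function; `review_file` and `review_integration` are hypothetical stand-ins for focused model calls:

```python
def multi_pass_review(files, review_file, review_integration):
    """Per-file local passes, then one cross-file integration pass."""
    # Pass 1: each file gets its own focused call → consistent depth.
    local_findings = {path: review_file(path, content)
                      for path, content in files.items()}
    # Pass 2: the integration pass sees the per-file summaries, not raw
    # files, so its context stays focused on cross-file concerns.
    integration = review_integration(local_findings)
    return {"local": local_findings, "integration": integration}
```

The design choice worth noting: the integration pass consumes summaries rather than raw files, so neither pass carries the full 14-file context that caused the dilution in the first place.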

Exam Traps for Task 1.6

The TrapWhy It FailsCorrect Pattern
Use dynamic decomposition for a predictable multi-step review Dynamic decomposition adds overhead and unpredictability when the steps are already known and fixed Prompt chaining for predictable workflows; dynamic only for open-ended investigation tasks
Switch to a larger context model to review 14 files in one pass Context window size doesn't solve attention dilution — models still process middle content less reliably Split into per-file passes + integration pass; attention is consistent within each focused pass
For "add tests to a legacy codebase," start implementing immediately Without mapping the codebase first, tests will duplicate existing coverage and miss high-impact areas Map structure → identify high-impact areas → create prioritized plan → implement adaptively

🔨 Implementation Task

T6

Implement Both Decomposition Patterns and Compare

Build both patterns on the same problem set and demonstrate when each is appropriate.

  • Implement a prompt chain for a 5-step code review: parse → analyze → summarize per file → cross-file compare → report
  • Implement dynamic decomposition for "identify all test gaps in this codebase" — observe how the plan adapts to discoveries
  • Run a single-pass review on 8 files. Document the inconsistencies in depth and any contradictions
  • Re-run with per-file passes + integration pass. Compare output quality and consistency
  • Classify 5 new task descriptions as "prompt chain" or "dynamic" and justify each classification

Exam Simulation — Task 1.6

Question 1 — Task 1.6 Claude Code for CI/CD
A PR modifying 14 files produces inconsistent single-pass review results: detailed feedback for some files, superficial comments for others, obvious bugs missed, and contradictory feedback — flagging a pattern as problematic in one file while approving identical code elsewhere. How should you restructure the review?
  • A. Split into focused passes: analyze each file individually for local issues, then run a separate integration-focused pass examining cross-file data flow
  • B. Require developers to split large PRs into smaller submissions of 3–4 files before automated review runs
  • C. Switch to a higher-tier model with a larger context window to give all 14 files adequate attention in one pass
  • D. Run three independent review passes on the full PR and only flag issues appearing in at least two of three runs
Correct: A
A is correct. Per-file passes ensure consistent depth; the integration pass catches cross-file issues. This is the official exam question Q12. B shifts burden to developers without solving the review quality problem. C is the canonical exam trap — context window size does not solve attention dilution. A model with a 200k token window still processes middle content less reliably than a focused per-file review. D adds cost and complexity without addressing inconsistency — three inconsistent passes still produce inconsistent results.
Question 2 — Task 1.6 Claude Code for CI/CD
A team uses a single Claude Code agent to review a microservices PR touching: 4 service implementations, 3 shared library modules, 1 database migration, and 12 test files (20 files total, given simultaneously). Reviews are technically complete but lack depth — obvious architectural risks are noted superficially while trivial issues receive detailed discussion. What decomposition strategy would most improve review quality?
  • A. Switch to a model with a larger context window so all 20 files receive proportional attention without context dilution
  • B. Split into specialized passes: a per-file structural pass (one agent call per significant file), an integration pass examining cross-service data contracts and shared library changes, and a separate database migration safety pass
  • C. Run the same review 3 times with different prompts and take the union of all findings to maximize coverage
  • D. Provide a ranked priority list in the system prompt so the agent allocates more reasoning to high-risk files first
Correct: B
B is correct. Decomposing by concern rather than by file count ensures each pass has a focused, well-defined goal: structural correctness per file, cross-service contract integrity, and migration safety are distinct analyses each requiring full context. A is wrong — this is the canonical exam trap: larger context windows don't solve attention dilution. Adding more files to a larger window produces the same uneven depth of analysis. C is wrong: Multiple runs of the same under-scoped review produce consistent shallow analysis — union of three shallow passes doesn't produce depth. D is wrong: Prioritization improves ordering but doesn't fix the fundamental problem — 20 files in one context still produces shallow analysis on lower-priority files regardless of ordering.
Question 3 — Task 1.6 Multi-Agent Research System
A research synthesis task receives findings from 6 domain specialists (all given simultaneously). Post-review, the synthesis agent merged contradictory statistics from reports 2 and 5 without flagging the conflict, and ignored report 4 entirely. Which decomposition strategy would most reliably surface conflicts and ensure complete coverage?
  • A. Add a system prompt instruction: "Identify and flag all contradictions between reports before synthesizing, and confirm you have read all 6 reports"
  • B. Run the synthesis agent 3 times and use the majority position on each claim to reduce variance
  • C. Decompose into: (1) per-report structured extraction passes normalizing each report into a claim-evidence-source schema, then (2) a conflict detection pass comparing claims pairwise across all 6 reports, then (3) a final synthesis pass that explicitly addresses identified conflicts
  • D. Place report 4 first in the input order to ensure it receives the most attention through the primacy effect
Correct: C
C is correct. Structural decomposition guarantees coverage: by normalizing all 6 reports into a consistent claim schema first, the conflict detection pass has clean structured data to compare — it cannot miss report 4 because all 6 reports are processed in the extraction phase. The synthesis pass then works from verified, structured, conflict-annotated input. A is wrong: The same conditions that caused the miss (6 reports simultaneously in one context) persist — instructions don't change the attention distribution. B is wrong: Majority voting suppresses minority findings. If reports 2 and 5 each have a different contradictory statistic, the "majority" is silence — neither is repeated across 3 runs. D is wrong: Primacy improves the first few reports, not coverage of all 6. Report 4 placed later would still receive diminished attention.
Task Statement 1.7
Task Statement 1.7

Manage session state, resumption, and forking

Long-running investigations don't fit in a single session. This task statement covers how to pause and resume work correctly — and how to explore divergent paths from a shared baseline without contaminating either branch.

The Core Concept

Sessions preserve conversation history and tool results so a long investigation can be paused and continued later. But resumption is not always the right choice: when the files being analyzed have changed since the last session, the cached tool results are stale and the model will reason incorrectly from them.

The Key Decision: Resume when prior context is mostly valid. Start fresh with an injected summary when prior tool results are stale. The exam tests this decision — "stale tool results from a codebase that has changed" always calls for fresh start with summary injection, not blind resumption.

Session Resumption

▶️

--resume <session-name>

Continues a specific prior named conversation. The full history — including all tool calls and results — is restored. Use when: investigation is paused mid-task and no analyzed files have changed.

📋

Summary Injection

Start a new session but open with a structured summary of prior findings. Use when: files have been modified since the last session — prior tool results no longer reflect reality.

🎯

Targeted Re-Analysis

When resuming after file changes, inform the agent specifically which files changed — don't require full re-exploration of unchanged areas. Focus re-analysis on what's different.

⚠️

Stale Tool Results

A resumed session where files have changed since the last run contains tool results that contradict the current state of the code. The model will reason incorrectly from stale data.

Session Forking

fork_session creates an independent branch from the current session's state. Both branches share the same history up to the fork point, then diverge independently. Neither branch's changes affect the other.

💡
When to fork: You've completed a shared codebase analysis and want to explore two different testing strategies from the same starting point. Fork once after the analysis — then each branch explores its strategy independently, with the full shared analysis as context.
✗ Resume Stale Session
Session from 2 days ago:
  • Analyzed auth.py → found 3 issues
  • Analyzed models.py → mapped schema
Since then:
  • auth.py was refactored
  • models.py has new fields
Resume anyway? The model reasons from stale data: wrong conclusions, missed issues.
✓ Fresh Start + Summary Injection
New session opens with:
"Prior analysis summary:
  • auth.py: had 3 issues (now refactored)
  • models.py: schema mapped (new fields added)
Files changed since last session:
  • auth.py (full re-analysis needed)
  • models.py (check new fields only)
Re-analyze these specifically."
  • Use --resume <session-name> for named session continuation when prior context is mostly valid
  • Use fork_session to compare two approaches (e.g., testing strategies, refactoring patterns) from a shared analysis baseline
  • Choose summary injection over resumption when prior tool results are stale — inject a structured summary of findings as the first message
  • When resuming after changes, explicitly tell the agent which specific files changed — enable targeted re-analysis rather than full re-exploration
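Summary injection needs no special SDK support — it is plain prompt construction for the first message of the fresh session. The shape below is one reasonable format, not a prescribed one:

```python
def build_summary_injection(prior_findings, changed_files):
    """Compose the opening message of a fresh session from prior findings.

    prior_findings: {file_path: one-line finding from the stale session}
    changed_files:  set of paths modified since that session
    """
    lines = ["Prior analysis summary:"]
    for path, finding in prior_findings.items():
        marker = " (CHANGED since analysis)" if path in changed_files else ""
        lines.append(f"- {path}: {finding}{marker}")
    lines.append("Files changed since last session: " + ", ".join(sorted(changed_files)))
    lines.append("Re-analyze only the changed files; treat other findings as current.")
    return "\n".join(lines)
```

This carries forward the still-valid findings while explicitly scoping re-analysis to what changed — the "targeted re-analysis" pattern from the bullets above.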

Exam Traps for Task 1.7

The TrapWhy It FailsCorrect Pattern
Resume a session after the codebase has been refactored Tool results from the previous session reflect the old codebase — model reasons incorrectly from stale data Start fresh with a summary of prior findings; specify which files changed for targeted re-analysis
Fork a session to run two strategies, then merge results back Fork branches are independent — there's no merge operation. Results from forked branches must be collected by the original coordinator session Fork for independent exploration; have the coordinator collect and compare results from both branches
Re-explore the entire codebase after a targeted file change Wastes time and context on files that haven't changed; prior analysis of unchanged files is still valid Inform the resumed session specifically which files changed — only re-analyze those

🔨 Implementation Task

T7

Implement Session Resumption and Forking with Stale Detection

Build session management that correctly handles stale results and enables divergent exploration.

  • Implement a named session workflow: analyze a codebase, pause, resume with --resume and verify context is intact
  • Simulate a stale session: modify two files after a session, attempt resumption — observe where the model reasons incorrectly from old data
  • Fix it: implement fresh start with structured summary injection specifying which files changed
  • Implement fork_session: from a shared analysis baseline, explore "add unit tests" vs "add integration tests" in parallel branches
  • Compare branch results in the original coordinator session and synthesize the better approach

Exam Simulation — Task 1.7

Question 1 — Task 1.7 Developer Productivity with Claude
A developer paused a codebase analysis session yesterday. Since then, two files were significantly refactored. They want to continue the investigation. What is the most reliable approach?
  • A. Use --resume to continue the session, then tell Claude about the two changed files in a follow-up message
  • B. Use --resume directly — Claude will automatically detect that the files have changed and re-analyze them
  • C. Start a new session with a structured summary of prior findings, explicitly noting which two files changed and requesting targeted re-analysis of only those files
  • D. Discard the prior session entirely and re-analyze the full codebase from scratch
Correct: C
C is correct. When tool results from a prior session are stale, starting fresh with a summary injection is more reliable than resumption. The fresh session uses only current file state; the summary carries forward valid findings from unchanged files; the targeted re-analysis focuses effort on what actually changed. A is closer but still problematic — the resumed session already contains stale tool results in its history, and the model will reason from those before seeing the follow-up message. B is wrong: Claude has no mechanism to automatically detect file changes — it reasons from what's in its context window. D throws away all prior valid analysis unnecessarily.
Question 2 — Task 1.7 Developer Productivity with Claude
After completing a comprehensive codebase analysis, a team wants to explore two different refactoring strategies from the same baseline — without either exploration influencing the other. What is the correct tool?
  • A. Start two entirely separate new sessions from scratch, each with the codebase analysis re-run
  • B. Use fork_session on the completed analysis session to create two independent branches that both share the analysis as context
  • C. Use a single session and explore both strategies sequentially, using /compact between each to reduce context
  • D. Use --resume twice from the same session — the second resume creates a copy automatically
Correct: B
B is correct. fork_session is exactly the tool for this scenario: both branches inherit the full analysis context, and neither branch's exploration contaminates the other. A wastes effort by re-running the entire codebase analysis twice — fork lets both branches share the work already done. C is wrong: Sequential exploration in a single session means strategy A's findings are in context during strategy B's exploration — they contaminate each other. D is wrong: --resume continues a session, it does not copy or fork it.