Our CCA-F Exam Dumps searched resources by candidates preparing for the Claude Certified Architect Foundations (CCA-F) certification. The CCA-F certification is a globally recognized foundational credential designed for individuals who want to build a strong career in cloud architecture and modern AI-driven infrastructure. It validates essential skills in system design, cloud fundamentals, and architectural thinking.
The CCA-F certification is ideal for beginners and professionals who want to enter the field of cloud computing and architecture. It provides a solid foundation for advanced certifications and real-world technical roles.
What is CCA-F Certification?
The Claude Certified Architect Foundations (CCA-F) certification is an entry-level exam that evaluates a candidate’s understanding of cloud architecture principles, system design, and infrastructure fundamentals.
It is designed to test both theoretical knowledge and practical problem-solving abilities in modern computing environments. Candidates who pass this certification demonstrate that they have the essential skills required for cloud-based architecture roles.
Best Certification: Claude Certified Architect Foundations (CCA-F) Exam 2026
The CCA-F certification is one of the most valuable entry-level certifications in 2026 for cloud and architecture learners. With increasing demand for cloud professionals and AI-based systems, this certification provides a strong starting point for a successful IT career.
It is highly recommended for individuals aiming to enter cloud computing, system architecture, and AI infrastructure roles.
CCA-F Exam Format
The exam format is designed to evaluate how well candidates can apply their knowledge in real scenarios.
The exam includes:
Multiple-choice questions (MCQs)
Scenario-based problem-solving questions
Cloud architecture fundamentals
System design concepts
AI and infrastructure basics
Candidates are expected to demonstrate logical thinking and a strong understanding of architectural principles.
CCA-F Exam Details
Field
Details
Exam Name
Claude Certified Architect Foundations (CCA-F)
Certification Level
Foundation
Duration
90 Minutes (Approx.)
Question Type
Multiple Choice + Scenario-Based Questions
Mode
Online / Test Center (varies by provider)
Language
English
The exam focuses on practical understanding rather than memorization, ensuring candidates are prepared for real-world challenges.
Exam Criteria and Requirements
The CCA-F certification is beginner-friendly and does not require advanced technical qualifications.
Requirements include:
Basic understanding of IT concepts
Familiarity with cloud computing fundamentals
Logical reasoning skills
Interest in system architecture and AI systems
There are no strict eligibility requirements, making it accessible for beginners and students.
Benefits of CCA-F Certification
The CCA-F certification offers multiple career and learning benefits:
Strong foundation in cloud architecture
Improved job opportunities in IT and cloud industries
Recognition of technical skills globally
Better salary prospects
Preparation for advanced-level certifications
Hands-on understanding of real-world architecture concepts
This certification helps candidates build confidence and technical expertise for future career growth.
FAQs About CCA-F Exam
What is the name of the Claude certification?
The certification is called Claude Certified Architect Foundations (CCA-F).
What is the exam passing score?
The passing score typically ranges between 65% to 75%, depending on the exam provider.
What is the format of the CCA-F exam?
The exam consists of multiple-choice questions and scenario-based architecture questions.
What is the exam requirements?
There are no strict requirements. Basic IT knowledge and understanding of cloud computing concepts are recommended.
How to Prepare for the CCA-F Exam with Pass4surexams
Earning the Claude Certified Architect – Foundations (CCA-F) credential is the ultimate way to prove you can design enterprise-scale, production-grade AI systems. However, because it relies on rigorous, scenario-based architecture questions, memorizing basic documentation won’t cut it. To confidently clear the 720/1000 passing threshold, you need a smart, targeted preparation strategy built around Pass4surexams.
Here is exactly how to ace your CCA-F exam using our proven platform:
Test Drive with Free Demo Questions
Don't dive in blindly. We provide free demo questions directly on our exam page so you can experience the exact caliber of the real proctored exam. These introductory questions mirror the actual formatting and scenario depth of the test, giving you an immediate baseline of where your skills stand before you commit.
Practice in a Realistic Updated Test Engine
The real CCA-F exam features complex, multi-agent workflows and detailed system architecture scenarios. Our updated test engine replicates the precise online testing environment. By practicing under real timed conditions, you will master managing the 120-minute limit across 60 grueling questions, training your brain to quickly diagnose broken agent loops, tool schemas, and context constraints.
We don't just provide study materials; we partner in your success. Our Expert's stands firmly behind our product with a 100% passing guarantee. We are so confident that our premium questions will give you the edge that we back every single purchase with a complete money-back assurance. If you prepare with our materials and do not pass your exam, we will refund your investment.
Get started with your free demo today and claim your certified future!
Anthropic CCA-F Sample Questions
Question # 1
A developer has built their complete agent system and needs to do a final review before productiondeployment. According to the exam guide's reliability checklist, which of the following is NOT arecommended pre-deployment check?
A. Verify that all safety-critical rules are enforced via hooks/code rather than only via prompts B. Confirm that rate limiting and circuit breakers are configured for all external tool calls C. Ensure 100% test coverage of all possible user input combinations D. Validate that compaction preserves critical state information through custom instructions
Answer : C Explanation:
"Ensure 100% test coverage of all possible user input combinations" is not a realistic or recommended pre-deployment check for agent systems. Here's why:
User inputs are essentially infinite and unpredictable — it's technically impossible to enumerate and test every possible combination.
The exam guide (and general AI/agent engineering best practices) focuses on risk-based testing, not exhaustive input coverage.
Aiming for 100% input coverage is a false goal that would consume unlimited time and resources without meaningfully improving reliability.
Why the other options ARE legitimate checks: A. Safety-critical rules enforced via hooks/code (not just prompts)
This is a core principle in Claude agent design — prompts alone can be ignored or bypassed, so critical guardrails must be enforced at the code/infrastructure level.
B. Rate limiting and circuit breakers for external tool calls
Essential for production reliability. Without these, a misbehaving tool or external API can cascade into full system failure. This is a standard reliability checklist item.
D. Compaction preserves critical state via custom instructions
In long-running agents, context windows get compacted (summarized). If critical state isn't preserved through that process, the agent can lose important context mid-task. Validating this is a real and important pre-deployment concern.
Question # 2
Your organization is building a document review agent that processes hundreds of contracts daily. Eachcontract review generates a structured report. They want to measure the quality of reviews over time todetect drift or degradation. What evaluation architecture supports continuous quality monitoring?
A. Randomly sample 5% of reviews for manual human evaluation weekly B. Run every review through a second Claude evaluation pass that scores quality on predefined
dimensions C. Compare each review's structure to a template and flag deviations D. Use a combination: automated structural checks on 100% of reviews plus LLM-based evaluation on
10% sample plus human review of flagged outliers
Answer : D Explanation:D describes a layered evaluation architecture — which is the recommended approach for continuous quality monitoring at scale. Let's break it down:LayerWhat it doesCoverageAutomated structural checksFast, cheap, catches obvious issues100% of reviewsLLM-based evaluationScores quality on deeper dimensions10% sampleHuman reviewCatches what automation misses, validates edge casesFlagged outliers only
This gives you:
Full coverage for structural issues
Deep quality scoring without the cost of running LLM eval on everything
Human judgment where it matters most
This is exactly how production AI monitoring systems are designed — not one method alone, but complementary layers.
Why the others fall short:
A. 5% manual review weekly
Too slow to detect drift in real time
Human-only evaluation doesn't scale to hundreds of contracts daily
Weekly cadence means problems can go undetected for days
B. Second Claude pass on every review
Expensive and slow at 100% coverage
LLM evaluating LLM output can have correlated blind spots — Claude may consistently miss the same errors
Not sustainable as volume grows
C. Template structure comparison only
Only catches formatting/structural deviations
Completely misses content quality, accuracy, and reasoning issues
Too shallow for meaningful quality monitoring
Question # 3
A developer needs to understand how Claude handles system prompt instructions vs. user instructionswhen they conflict. A system prompt says 'Always respond in formal English' but a user says 'talk tome casually bro.' What is the expected behavior according to Claude's instruction hierarchy?
A. User instructions always override system prompt instructions B. Claude will average the two styles and respond in semi-formal language C. System prompt instructions take precedence over user instructions — Claude should maintain
formal English D. The most recent instruction takes precedence regardless of source
Answer : C Explanation:According to Claude's instruction hierarchy:
The system prompt is set by the operator (the developer/organization deploying Claude)
The user is the end user interacting with Claude
Operators have higher trust than users in Claude's hierarchy
Therefore, system prompt instructions take precedence over conflicting user instructions
In this case, the operator has explicitly set a rule — "Always respond in formal English" — so Claude should maintain that regardless of what the user requests.
Why the others are wrong:
A. User instructions always override system prompt
Completely backwards. Users have less authority than operators by design. Operators set the rules for how Claude behaves in their product.
B. Claude averages the two styles (semi-formal)
Claude doesn't "split the difference." It follows the hierarchy. There's no blending mechanism — that would undermine the whole point of operator control.
D. Most recent instruction wins regardless of source
This is how a simple chatbot might work, but not Claude. Recency doesn't trump hierarchy. A user saying something last doesn't override what the operator established in the system prompt.
Question # 4
A team implements an agent with the Agent SDK. They want to add observability: logging each toolcall with timing, inputs, outputs, and model decisions. The Agent SDK uses setting_sources forconfiguration. Where should they implement the logging?
A. In the system prompt: 'Log every tool call you make with its parameters and timing' B. Enable debug mode in the Agent SDK to get automatic logging C. Wrap each tool handler with a logging decorator that captures inputs, outputs, execution time, and
errors before returning results to the Agent SDK D. Add a 'log' tool that Claude calls after every operation
Answer : C Explanation:The correct answer is C.
Why C is correct:
Wrapping tool handlers with a logging decorator is the proper software engineering approach for observability. Here's why:
It captures logging at the code level — reliable, consistent, and not dependent on Claude's behavior
Records inputs, outputs, execution time, and errors automatically every single time
Completely transparent to Claude — it doesn't change how the agent thinks or operates
This is standard production observability practice (same pattern used with APIs, microservices, etc.)
Works with the Agent SDK's architecture by sitting between the SDK and the tool execution
python
# Simple example of the pattern
def logging_decorator(tool_func):
def wrapper(*args, **kwargs):
start = time.time()
result = tool_func(*args, **kwargs)
log(tool=tool_func.__name__, inputs=args,
output=result, time=time.time()-start)
return result
return wrapper
Why the others are wrong:
A. Log via system prompt
Prompting Claude to "log its tool calls" is unreliable — Claude might forget, skip, or format inconsistently
Critical observability should never depend on model behavior alone
Same principle as the earlier question: safety/reliability rules go in code, not prompts
B. Enable debug mode
Debug mode is for development only — not suitable for production observability
Typically too verbose, unstructured, and not customizable for business needs
Not a real production logging strategy
D. Add a 'log' tool Claude calls after every operation
Relies on Claude remembering to call it after every operation — not guaranteed
Adds unnecessary tool calls, increasing latency and cost
Claude could skip it, call it inconsistently, or get distracted mid-task
Question # 5
An architect wants their agent to handle a multi-language customer base (English, Spanish, Japanese).The system prompt is in English. When a customer writes in Japanese, the agent should respond inJapanese. System prompt instructions and tool results are in English. How should the translation behandled?
A. Keep the system prompt and tools in English — Claude naturally responds in the customer'slanguage. Tool results in English are understood by Claude and the response is generated in thedetected language B. Translate the system prompt into each language and use the matching one per request C. Use a translation MCP tool to convert everything to the customer's language before Claude
processes it D. Create separate agent configurations per language
Answer : A Explanation:The correct answer is A.
Why A is correct:
Claude is natively multilingual — this is a core capability, not something that needs to be engineered around. Here's why A works:
Claude can read and understand English system prompts and tool results
Claude can detect the language of the user's message automatically
Claude can generate a response in the customer's language (Japanese, Spanish, etc.) even though its internal context is in English
No translation infrastructure needed — Claude handles this naturally
This is the simplest, most cost-effective, and most reliable architecture.
Why the others are wrong:
B. Translate system prompt per language
Unnecessary complexity — Claude doesn't need the system prompt in the customer's language to respond in that language
Translation of system prompts introduces risk of meaning drift or errors
Multiplies maintenance burden (every system prompt change must be updated in N languages)
C. Use a translation MCP tool for everything
Massive overhead and latency — translating every input/output through a separate tool
Introduces another failure point in the pipeline
Completely unnecessary given Claude's native multilingual ability
D. Separate agent configurations per language
Huge operational overhead — maintaining multiple agents for what Claude handles natively
Scaling problem: what happens when you add a 4th or 5th language?
Overkill solution to a non-problem
The key principle:
Don't engineer solutions for problems Claude already solves natively
Claude's multilingual capability means you get language handling for free. A good architect recognizes when to leverage the model's built-in strengths rather than adding unnecessary complexity.
Question # 6
A developer is implementing streaming for their Claude integration. They want to display a 'typing'indicator while Claude is generating thinking blocks, and switch to displaying text when the responsetext starts. Which streaming events signal this transition?
A. A 'phase_change' event indicates the switch from thinking to text generation B. The content_block_stop event for the thinking block followed by content_block_start for the text
block signals the transition C. The message_delta event includes a 'current_phase' field D. Thinking and text generation happen in separate streaming connections
Answer : B Explanation:The correct answer is B.
Why B is correct:
Claude's streaming API uses content block events to signal what type of content is being generated. The transition from thinking to text happens like this:
content_block_start (type: "thinking") ? show typing indicator
delta events... ? still thinking
content_block_stop ? thinking done ?
content_block_start (type: "text") ? switch to text display ?
delta events... ? stream text to UI
content_block_stop ? response complete
So the exact transition signal is:
content_block_stop ending the thinking block ? hide typing indicator
content_block_start with type "text" ? begin displaying streamed text
This gives you precise, reliable control over your UI state.
Why the others are wrong:
A. phase_change event
This event does not exist in the Anthropic streaming API
A made-up event — don't be fooled by plausible-sounding names in exam questions
C. message_delta includes a current_phase field
Also does not exist — message_delta carries token usage and stop reasons, not phase information
Another fabricated field
D. Separate streaming connections for thinking vs text
Completely false — everything happens over a single streaming connection
Splitting connections would be architecturally bizarre and unnecessary
The key principle:
Content blocks are the fundamental unit of streaming structure in Claude's API
Each block has a clear lifecycle — start ? deltas ? stop — and the type field on content_block_start tells you exactly what's coming. This is how you build responsive, accurate streaming UIs.
Question # 7
A team builds an MCP tool that manages Kubernetes clusters. The tool includes operations likescale_deployment, delete_pod, and drain_node. These are high-impact operations that should requireconfirmation. How should the tool design handle dangerous operations?
A. Add 'DANGEROUS:' prefix to tool descriptions and rely on Claude to warn the user B. Separate dangerous tools into a different MCP server that requires admin credentials C. Require the user to type 'CONFIRM' before each dangerous operation D. Implement a two-phase execution: the tool first returns a preview of what will happen (dry run), and
requires a second call with a confirmation token to execute
Answer : D Explanation:The correct answer is D.
Why D is correct:
The two-phase execution pattern (dry run + confirmation token) is the proper engineering solution for dangerous operations. Here's how it works:
Phase 1 — Preview:
User/Claude calls scale_deployment(replicas=0)
Tool returns: {
"preview": "This will scale deployment 'api-server' to 0 replicas,
causing downtime for all users",
"confirmation_token": "abc123xyz",
"expires_in": 60
}
Phase 2 — Execute:
Claude shows preview to user, user confirms
Tool called again with confirmation_token: "abc123xyz"
? Operation executes
Why this is the right approach:
Confirmation is enforced at the code/tool level — not dependent on Claude's behavior
The preview gives Claude and the user full understanding of consequences before acting
The token ensures the exact same operation is confirmed (no bait-and-switch)
Token expiry prevents stale confirmations from executing later
This is a well-established pattern in infrastructure tooling (Terraform plan ? apply, kubectl dry-run ? apply)
Why the others are wrong:
A. 'DANGEROUS:' prefix in description, rely on Claude to warn
Relies entirely on Claude's behavior — not guaranteed or enforceable
Claude might warn inconsistently or skip warnings under certain prompting
Critical safety must live in code, not prompts (recurring principle)
B. Separate MCP server with admin credentials
Adds access control but doesn't solve the confirmation problem
An admin with credentials can still accidentally trigger dangerous operations
Credentials ? confirmation of intent
C. Require user to type 'CONFIRM'
Better than A, but still weak — it's just a string check with no context
Doesn't show the user what they're confirming (no preview)
Can become muscle memory — users type CONFIRM without reading
No token means no guarantee the confirmed action matches what executes
The key principle:
High-impact irreversible operations need enforcement at the tool level with a preview-before-execute pattern
This is the same philosophy as the logging and safety questions — anything critical must be handled in infrastructure/code, not left to Claude or user discipline alone.
Question # 8
A senior developer is optimizing their token costs. They discover that 60% of their API cost comesfrom input tokens (most content is repeated context). Only 15% comes from output tokens, and 25%from thinking tokens. What optimization provides the biggest cost reduction?
A. Reduce output length with max_tokens to save on the 15% output cost B. Reduce thinking budget to save on the 25% thinking cost C. Switch to a smaller model that has lower per-token costs across all categories D. Implement prompt caching to save on the 60% input cost — cache reads cost only 10% of base
input price
Answer : D Explanation:The correct answer is D.
Why D is correct:
The math is straightforward — attack the biggest cost first.
Cost Category% of TotalOptimizationPotential SavingInput tokens60%Prompt caching (90% discount)~54% of total costThinking tokens25%Reduce budgetPartial savingOutput tokens15%Reduce max_tokensMinimal saving
Prompt caching charges cache reads at only 10% of the base input price — a 90% discount on that category. Since input tokens are 60% of total cost:
Without caching: 60% of bill = input tokens
With caching: 60% × 10% = 6% of bill for same content
Savings = ~54% reduction in total API cost
This is the highest leverage optimization by far
— especially when the developer already confirmed that "most content is
repeated context," which is exactly the use case prompt caching is
designed for.
Why the others are wrong:
A. Reduce output length (15% of cost)
You're optimizing the smallest slice of the bill
Even cutting output tokens by 50% only saves ~7.5% total
Low leverage
B. Reduce thinking budget (25% of cost)
Better than A, but thinking tokens often directly affect output quality
Cutting thinking to save cost can degrade the very results you're paying for
Still less leverage than caching the 60%
C. Switch to a smaller model
Reduces costs across all categories proportionally
But sacrifices capability and quality — may not be acceptable for the use case
Also doesn't exploit the specific insight that repeated context is the problem
Blunt instrument vs. targeted fix
The key principle:
Always optimize the largest cost driver first, and use the tool designed specifically for that problem
Prompt
caching exists precisely for repeated context scenarios. When 60% of
your cost is repeated input and caching gives a 90% discount on that —
it's not even a close call.
Question # 9
A developer is configuring Claude Code for a project that uses both JavaScript and Rust. TheJavaScript code follows Prettier formatting while Rust follows rustfmt conventions. How shouldCLAUDE.md handle this dual-language setup?
A. Include both formatting conventions in the root CLAUDE.md with clear section headers B. Create separate CLAUDE.md files in the js/ and rust/ directories C. Use .claude/rules/ with glob-based rules: one file with **/*.js,**/*.ts pattern for Prettier rules, and
another with **/*.rs pattern for rustfmt rules D. Rely on the language-specific formatters and don't include formatting rules in CLAUDE.md
Answer : C Explanation:The correct answer is C.
Why C is correct:
Using glob-based rules in .claude/rules/ is the most precise and scalable approach for a dual-language setup:
.claude/
rules/
javascript.md # applies to **/*.js, **/*.ts
rust.md # applies to **/*.rs
Each rules file only activates when Claude is working on files that match its glob pattern. So:
Editing a .js file ? Prettier rules load automatically
Editing a .rs file ? rustfmt rules load automatically
No cross-contamination between language conventions
Why this is best:
Rules are context-activated — Claude gets exactly the relevant rules for the file it's editing
Clean separation of concerns — each language's conventions live in their own file
Scales easily — adding a 3rd language (Python, Go, etc.) is just adding another rules file
No cognitive overhead — Claude isn't processing Rust rules while editing JavaScript
Why the others are wrong:
A. Both conventions in root CLAUDE.md with section headers
Claude loads all rules for every file regardless of language
Risk of confusion or mixing conventions when context switching
Works but is less precise than glob-based targeting
B. Separate CLAUDE.md in js/ and rust/ directories
Only works if your project is strictly separated by directory
Real projects often have mixed structures or shared directories
Less flexible than glob patterns which match by file extension regardless of location
D. Rely on formatters, skip CLAUDE.md entirely
Formatters handle auto-formatting but Claude still needs to know conventions when writing new code
Without rules, Claude may generate code that passes logic but fails formatting checks
Leaves Claude without guidance on style decisions formatters don't enforce
The key principle:
Glob-based rules give Claude precise, context-aware instructions — the right rules for the right files at the right time
This is the most targeted and maintainable architecture for multi-language projects in Claude Code.
Question # 10
A platform engineer is building a system where multiple Claude agents share information through acentralized message bus. Agent A publishes analysis results, Agents B and C consume results relevantto their domains. The messages include metadata, status, and data payloads. What is the keyconsideration for the message format?
A. Use the same structured format that each consuming agent can parse without additional tool calls,
with clear metadata for routing and versioning B. Use plain text messages for maximum compatibility C. Use different formats per agent pair for maximum efficiency D. Let each agent define its preferred input format
Answer : A Explanation:The correct answer is A.
Why A is correct:
In a multi-agent message bus architecture, the message format is the contract between all agents. A well-designed format needs:
json
Structured format means agents can parse directly without extra tool calls — reducing latency and cost
Clear metadata enables routing (Agent B and C only consume messages relevant to their domain)
Versioning in metadata allows the system to evolve without breaking existing consumers
Consistency means all agents speak the same language — no translation layer needed
This mirrors real-world message bus design (Kafka, RabbitMQ, etc.) — proven patterns apply
Why the others are wrong:
B. Plain text for maximum compatibility
Plain text requires each agent to parse/interpret meaning from unstructured content — unreliable and expensive
No standard structure means no reliable routing or versioning
"Maximum compatibility" is a false benefit — structured formats like JSON are universally compatible
C. Different formats per agent pair
Creates an N×N compatibility problem — every new agent needs custom format handling for every other agent
Impossible to maintain at scale
Defeats the entire purpose of a centralized message bus
D. Let each agent define its preferred input format
Same problem as C — no shared contract means chaos
Agent A would need to know every consumer's preferred format and publish multiple versions
Not scalable and tightly couples producers to consumers
The key principle:
In
multi-agent systems, a shared structured message format with metadata
for routing and versioning is the foundation of a maintainable, scalable
architecture
The message bus is only as good as the contract it enforces. Option A is the only answer that treats the message format as a first-class architectural concern rather than an afterthought.
Question # 11
A developer is using Claude's structured output to generate JSON configuration files. They need theoutput to include comments explaining each configuration option. JSON doesn't support comments.What is the best structured output approach?
A. Use json_object mode which allows comments in the output B. Use YAML mode instead of JSON for comment support C. Generate the JSON and comments in separate API calls D. Generate a schema with parallel fields: each config option has a 'value' field and a 'comment' field,
e.g., {port: {value: 8080, comment: 'Server listening port'}}
Answer : D Why D is Correct?
This is the most elegant and practical solution within the constraints of JSON. By designing a schema where each configuration option has both a value and a comment field, you:
Stay within valid JSON — no syntax violations
Keep everything in one response — comments are co-located with their respective config values
Remain machine-readable — the JSON can still be parsed programmatically
Allow downstream processing — a script can strip out comment fields to produce a clean config, while humans can read the annotated version
Key Takeaway: When a format has limitations (like JSON lacking comment support), the best approach is to design your schema to work around the limitation — not fight the format or add unnecessary complexity.
Question # 12
An AI engineer is implementing an evaluation framework for their agent system. They want to measurewhether the agent asks for clarification when a customer's request is ambiguous rather than makingassumptions. What evaluation metric captures this behavior?
A. Response accuracy — measuring whether final answers are correct B. Clarification rate — measuring the percentage of ambiguous inputs where the agent asks for
clarification before acting, compared against a ground truth set of inputs that require clarification C. Response time — faster responses indicate the agent made assumptions D. Token efficiency — lower tokens mean more decisive responses
Answer : B Why B is Correct?
This metric is purpose-built for exactly the behavior being measured. Here's why it's the right fit:
It directly measures the target behavior — whether the agent recognizes ambiguity and responds appropriately by asking for clarification
It uses a ground truth set of known ambiguous inputs, making it objective and repeatable
It produces a quantifiable percentage, making it trackable over time and across model versions
It captures both failure modes:
Agent asks for clarification when it shouldn't ? false positives
Agent makes assumptions when it should ask ? false negatives
How it works in practice:
Clarification Rate = (Ambiguous inputs where agent asked for clarification)
?????????????????????????????????????????????????????
(Total ambiguous inputs in ground truth set)
Target: as close to 100% as possible
Key Takeaway: When evaluating agent behavior, your metric must directly observe that behavior — not infer it from indirect signals like speed or token count. Clarification Rate does exactly that by comparing agent actions against a curated ground truth of ambiguous cases.
Question # 13
A developer wants to add a keyboard shortcut in Claude Code that switches between regular mode andplan mode during interactive sessions. What is the default keyboard shortcut for toggling plan mode?
A. Shift+Tab to toggle between plan mode and regular mode B. Tab to cycle between modes C. Ctrl+P to toggle plan mode D. Escape to enter plan mode
Answer : A Explanation:Shift+Tab is the default keyboard shortcut for toggling plan mode in Claude Code. It cycles through modes — including Plan — mid-session, with no restart required. LowcodeMore specifically, Shift+Tab cycles through the three modes in order: Edit ? Auto-Accept ? Plan. Medium
Here's why the other options are incorrect:
B. Tab — plain Tab is not a mode-cycling shortcut; it's Shift+Tab.
C. Ctrl+P — this is not a Claude Code shortcut for plan mode. You can type /plan as a slash command, but Ctrl+P is not the keyboard shortcut.
D. Escape — pressing Escape (or Esc twice) is used to cancel or rewind a conversation, not to enter plan mode.
Bonus tip: On Windows with Claude Code 2.1.3+, there is a known bug where Shift+Tab only cycles between Edit and Auto-Accept, skipping Plan mode entirely. The workaround is to use the /plan slash command or the mode selector in the UI instead.
Question # 14
A team is building an MCP server that integrates with Salesforce. Their tool catalog includes:search_contacts, get_contact_by_id, update_contact, create_opportunity, get_opportunity_pipeline,update_opportunity_stage, generate_report, and send_email. This is 8 tools. During testing, Claudeoccasionally picks the wrong tool (e.g., search_contacts when get_contact_by_id is more appropriate).What improves tool selection accuracy?
A. Add detailed descriptions that clearly differentiate each tool's purpose with usage guidance: e.g.,'search_contacts: Use when you need to FIND contacts by name, email, or company. NOT forretrieving a known contact (use get_contact_by_id instead)' B. Reduce to 4 tools by combining related operations C. Add a tool_router tool that Claude calls first to determine which tool to use D. Rename tools with numbered prefixes: tool1_search, tool2_get, etc.
Answer : A Explanation:The root cause of Claude picking the wrong tool is ambiguity in tool descriptions — Claude can't clearly distinguish when to use one over another. Detailed descriptions with explicit usage guidance and negative examples ("NOT for X, use Y instead") directly solve this. Here's why each option stands: A ? — Add detailed, differentiating descriptions
This is exactly what Anthropic's prompt engineering guidance recommends for tool selection. The example given is ideal because it:
States the specific trigger condition ("when you need to FIND contacts")
Adds a negative example that steers away from the wrong tool ("NOT for retrieving a known contact")
Cross-references the correct alternative ("use get_contact_by_id instead")
This is low-effort, zero architectural change, and directly targets the failure mode observed in testing. B ? — Reduce to 4 tools by combining operations
Merging tools trades one problem for another. A combined search_or_get_contact tool now pushes the disambiguation problem inside the tool call — Claude has to pass a parameter that decides behavior, which is harder to get right and makes tool behavior less predictable. You also lose the clarity of single-responsibility tools.
C ? — Add a tool_router tool
This adds latency (an extra round trip on every invocation), burns tokens, and is redundant. The whole point of well-written tool descriptions is that Claude already performs routing natively. Adding a meta-tool to do what descriptions should do is an architectural antipattern that compounds the problem rather than fixing it.
D ? — Rename with numbered prefixes
tool1_search, tool2_get conveys zero semantic meaning. Tool names and descriptions are how Claude understands intent — making names less descriptive makes selection worse, not better. This solves nothing.
Question # 15
A technical lead is migrating their agent from a 200K token context model (Claude Sonnet 4.5) to a 1Mtoken context model (Claude Sonnet 4.6). They expect to handle longer documents. What otherconsideration should they account for with the larger context?
A. The 1M context model has the same pricing per token, so costs only increase proportionally to usage B. Prompt caching minimum thresholds differ between models — Sonnet 4.5 has 1024 token minimum
while Sonnet 4.6 has 2048 token minimum, affecting caching strategy C. The architecture doesn't need any changes — just update the model parameter D. The response quality is identical regardless of context length
Answer : B Explanation:The answer is B, but with an important nuance — the specific numbers in the question are partially correct but need clarification.
What the official docs actually say:
The prompt caching minimum threshold is 1024 tokens for Claude Sonnet 4.5 (and a select group of other models). For Sonnet 4.6 and Opus 4.6, the minimum is 2048 tokens. So option B's core claim — that thresholds differ between the two models — is accurate and real. Here's a full breakdown: A ? — "Costs only increase proportionally" This
is misleading. Sonnet 4.6 includes the full 1M token context window at
standard pricing, so a 900K-token request is billed at the same
per-token rate as a 9K request — that part is true. But "only
proportionally" ignores the doubled caching threshold (from 1024 ? 2048
tokens), which can silently break an existing caching strategy and cause
unexpected cost increases if prompts that were previously cached no
longer qualify B ? — Prompt caching minimums differ This is the correct and most practically important consideration. The minimum token thresholds for prompt caching are: Haiku 4.5 requires 1,024 tokens, while Sonnet 4.6 and Opus 4.6 require 2,048 tokens. A
team migrating from Sonnet 4.5 (1024-token minimum) to Sonnet 4.6
(2048-token minimum) could silently lose all prompt caching benefits if
their cached content — say, a 1,500-token system prompt — falls below
the new threshold. Shorter prompts cannot be cached even if marked with cache_control, and no error is returned, making this a subtle and easy-to-miss regression C ? — "Just update the model parameter" This
is the most dangerous option. Beyond the caching threshold issue,
longer context also affects latency, cost per request, and the need to
revisit prompt structure. Simply swapping the model string without
reviewing caching strategy, cost projections, and context management is a
recipe for surprise bills and silent performance regressions. D ? — "Response quality is identical regardless of context length" False.
Research consistently shows that model attention and retrieval accuracy
can degrade with very long contexts, particularly for information in
the middle of the window (the "lost in the middle" problem). Quality
considerations absolutely accompany a 200K ? 1M context migration.
The key takeaway:
When migrating to a larger context model, always re-audit your prompt
caching strategy. The higher minimum threshold on Sonnet 4.6 means
previously-cached prompts may silently stop being cached, turning a
cost-saving feature into a hidden cost driver with zero error feedback.
Question # 16
A developer has a prompt engineering challenge: they need Claude to fill in a form that has 20 fields,but typically only 5-8 fields are relevant for any given request. The remaining fields should be null.Using structured output with all 20 fields as required causes Claude to hallucinate values for irrelevantfields. What schema design fixes this?
A. Make all 20 fields optional (nullable) with clear descriptions indicating when each field applies B. Use a dynamic schema generated per-request based on the input context C. Keep all fields required but add a 'confidence' field alongside each to flag uncertain values D. Create 4 different schemas for different form types, each with only the relevant required fields
Answer : B Explanation: A is correct.
This is a classic structured output problem where the schema design is forcing hallucination. Here's the full breakdown:
A ? — Make all 20 fields optional (nullable) with clear descriptions The hallucination is caused by required semantics. When a field is marked required, Claude is constrained to produce something — so it invents plausible-sounding values rather than leave a field empty. Removing that constraint resolves it directly.
The fix is two-part:
Nullable/optional fields — Claude can now legitimately return null without violating the schema
Clear descriptions indicating when each field applies — this is equally critical. Without guidance like "Only populate if the request involves a physical address", Claude still has to guess. Good descriptions give Claude the decision rule to apply, not just permission to omit
This
approach requires zero architectural change, works within a single
schema, and keeps the output structure predictable across all requests. B ? — Dynamic schema generated per-request This sounds sophisticated but introduces real problems. Generating a schema per-request means you need a first pass
to analyze the input and decide which fields are relevant — which is
exactly the reasoning you want Claude itself to do during form-filling.
You've now added latency (an extra LLM call or classification step),
complexity in schema management, and a new failure mode: what if the
schema generator gets it wrong? You've also lost a single predictable
output shape, complicating downstream parsing. Option A achieves the
same selective population without any of this overhead.
C ? — Add a confidence field alongside each
This doubles your schema complexity (20 fields ? 40 fields) without solving the root cause. Claude still has to produce a value
for required fields — now it just also rates how confident it is in the
hallucination. Downstream, you'd need logic to discard low-confidence
values, which means you're building the null-handling you could have
gotten for free with option A. It also makes the output significantly
harder to parse and use.
D ? — Create 4 schemas for different form types This only works if your form types are perfectly discrete and known in advance — which the question implies they aren't (5–8 variable
fields relevant per request). You'd also need a classification step to
route to the right schema, and edge cases that span multiple "types"
would fall through the cracks. It trades one maintenance problem
(hallucinated fields) for another (schema proliferation and routing
logic). Option A handles the variability naturally.
The core principle: Schema constraints are instructions. Marking a field required tells Claude it must
have a value, which guarantees hallucination when the data doesn't
exist. Making fields optional with clear applicability descriptions
tells Claude when to populate them — which is the actual behavior you want.
Question # 17
A solutions engineer is troubleshooting an agent that uses the Agent SDK. The agent should use the'search' tool before answering questions, but it frequently answers from training knowledge withoutsearching. The tool is properly defined and the system prompt says to 'use the search tool to findanswers.' What is the most effective fix?
A. Set tool_choice: {'type': 'tool', 'name': 'search'} for the first turn to force the initial search, then
switch to 'auto' for subsequent turns B. Rewrite the system prompt to be more emphatic: 'ALWAYS use the search tool' C. Set tool_choice: 'any' to force tool use on every turn D. Add a PreToolUse hook that blocks non-search responses
Answer : A Explanation: A ? — Force tool_choice: {type: 'tool', name: 'search'} on the first turn, then switch to autoThis precisely matches the failure mode. The agent is skipping the search on the initial turn — answering directly from training knowledge before any tool use occurs. Forcing tool_choice to a specific tool on turn 1 makes that search non-negotiable at the API level, not just at the instruction level.Then switching to auto for subsequent turns is the right design because:
Follow-up reasoning, synthesis, and clarifying questions shouldn't require a search every time
The model needs freedom to call other tools or respond directly once it has retrieved real information
Locking tool_choice permanently would cause unnecessary searches on turns where they add no value
This is a surgical, architectural fix — it solves the problem exactly where it occurs (turn 1) without over-constraining the rest of the conversation. The core principle: When instruction-following fails for a specific behavioral requirement, move enforcement from the prompt layer to the API constraint layer. tool_choice with a named tool is a hard constraint — the model cannot route around it regardless of what its training knowledge suggests. Reserve that constraint for exactly the turn where the behavior is required, then release it.
Question # 18
A team maintains a Claude Code project with a growing CLAUDE.md file that's now 800 lines. ClaudeCode seems to be ignoring rules defined in the latter half of the file. What is the recommendedapproach for managing large CLAUDE.md files?
A. Delete rules that seem to be ignored — they're probably unnecessary B. Increase Claude's context window to ensure the full file is processed C. Convert the CLAUDE.md to a more compact YAML format D. Split into a concise root CLAUDE.md with the most critical rules, use @import for detailed
sections, and use .claude/rules/ files for context-specific rules with glob patterns
Answer : D Explanation:D is correct — and confirmed by official Claude Code documentation and best practices.D ? — Concise root CLAUDE.md + @import for detail + .claude/rules/ with glob patternsThis directly addresses the root cause. If your CLAUDE.md is too long, Claude ignores half of it because important rules get lost in the noise. The fix operates on two levels: Structural decomposition: The .claude/rules/ directory lets you organize project instructions into multiple focused rule files instead of one large CLAUDE.md. All markdown files in this directory are automatically loaded into Claude Code's context when launched. Context-aware loading via glob patterns: The most powerful aspect of .claude/rules/ is conditional application — you can scope rules to specific files using YAML frontmatter with a paths field. This means React component rules only load when working on React files, database rules only when touching database files, and so on — the right rules are injected at the right moment rather than everything competing for attention all the timePriority is preserved: Rules files load with the same high priority as CLAUDE.md. When a rule has paths frontmatter, it only loads (and receives high priority) when Claude is working on matching filesThe practical structure looks like:
A developer builds an MCP tool that generates images using Stable Diffusion. The tool takes a promptand returns a base64-encoded image. Claude then describes the image to the user. However, Claude'sdescriptions don't match the generated images because Claude can't actually see the tool output as animage. How should this be architectured?
A. Have Claude generate the image description from the original prompt without seeing the actual
image B. Have the tool return a text description of what it generated alongside the image C. Store the image and return a URL that Claude mentions to the user D. Include the base64 image in a subsequent message with image content type so Claude can actually
see and describe it
Answer : D Explanation:D is correct.
This
is fundamentally a data type problem — Claude is receiving image data
encoded as a string, not as an image content block, so it literally
cannot see it.
D ? — Include the base64 image in a subsequent message with the image content type
The Anthropic API supports multimodal content blocks. A tool result can include — or a follow-up message can contain — an image content block with source.type: "base64" and the correct media_type.
When structured this way, Claude actually processes the image through
its vision capabilities rather than seeing a wall of base64 text.
The correct architecture:
Tool runs, generates the image, returns base64
The MCP layer or orchestrator wraps that base64 in a proper image content block
Claude receives it as an actual image and can genuinely see and describe it
This
is the only option that gives Claude real visual grounding — everything
else is a workaround that produces descriptions disconnected from
actual output. The core principle: Claude's multimodal capabilities only activate when image data is passed through the correct content type structure in the API. Base64 as a raw string is opaque text. Base64 wrapped in {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}} is a real image Claude can see. The architecture fix is in how the data is typed and delivered, not in working around the limitation.
Question # 20
Your team is designing an agent evaluation framework. They need to test whether the agent handlestool failures gracefully across 50 different error scenarios. Running each scenario manually takes 10minutes. What is the most efficient approach?
A. Create a comprehensive test prompt that covers all 50 scenarios in one conversation B. Test 10 representative scenarios and extrapolate for the remaining 40 C. Hire a QA team to manually test each scenario D. Use the Message Batches API to run all 50 scenarios as a batch — each scenario is a separate
request with a mocked tool failure, and results are analyzed programmatically
Answer : D Explanation:D — Message Batches API with mocked tool failures, analyzed programmatically This is purpose-built for exactly this use case. The math alone makes the case: 50 scenarios × 10 minutes each = 8+ hours of sequential manual testing. The Batches API runs all 50 in parallel as a single async job at a 50% token cost discount, and returns structured results you can analyze programmatically — pass/fail rates, failure mode categorization, response pattern analysis — without any human in the loop. The architecture is clean:
Each of the 50 scenarios is a self-contained request with a system prompt defining the agent, a user message triggering the tool call, and a mocked tool result returning the specific error condition to test
The batch job runs them all concurrently overnight or in the background
Results come back as structured output you diff against expected graceful-handling behavior
Regressions are caught automatically on every future code change by re-running the batch
This is how evaluation frameworks are supposed to work — deterministic, repeatable, scalable, and cheap enough to run continuously.
Question # 21
A developer's agent needs to process CSV files uploaded by users. The files vary from 100 rows to 1million rows. For small files, direct context inclusion works well. For large files, the context can't holdall data. What scalable architecture handles both cases?
A. Always process the full CSV regardless of size, relying on compaction if needed B. Always chunk the CSV into 100-row segments and process sequentially C. For small files (<5K rows): include directly in context. For large files: load into a database and
provide SQL query tools for the agent to explore the data on demand D. Convert all CSVs to summary statistics before processing
Answer : C Explanation:A size-based routing strategy handles both cases efficiently. Small files work well in context (full datavisibility, no tool overhead). Large files are loaded into a queryable database, giving the agent ondemand access without context overflow. Option A can't handle 1M rows. Option C is inefficient forsmall files. Option D loses individual record access.
Question # 22
A development team is implementing context awareness (citations) for a RAG application. They wantto display inline citations like 'According to [Source A], the market grew by 15%.' The API returnscitation objects with character-level indices. How should the client render these?
A. Use the citation indices to insert reference markers into the text, linking each marker back to the
specific passage in the source document B. Append all citations as footnotes at the end of the response C. All of the above are valid rendering approaches depending on UX requirements D. Create a sidebar with source highlights that appear when hovering over cited text
Answer : C Explanation:Context awareness provides character-level citation data that supports multiple rendering approaches:inline reference markers, footnotes, interactive sidebars, or highlighted source passages. The choicedepends on the application's UX requirements. The API provides the raw data (character indices tosource locations), and the client decides the presentation. All three approaches are valid uses of thecitation data.
Question # 23
A team is evaluating their Claude Code setup and wants to understand the permission model. They adda new tool via MCP that can execute shell commands. When an agent tries to use this tool, it gets apermission error. Where are MCP tool permissions managed?
A. In .claude/settings.json under the 'allowedTools' and 'blockedTools' arrays, with glob pattern
matching for MCP tool names B. In the CLAUDE.md file under an 'allowed_tools' section C. In the MCP server configuration file D. Permissions are managed per-conversation and reset each time
Answer : A Explanation:Claude Code manages MCP tool permissions through settings files (.claude/settings.json for projectlevel, settings.local.json for local overrides) using 'allowedTools' and 'blockedTools' arrays. Thesesupport glob patterns for MCP tool names (e.g., 'mcp_server_*' to allow all tools from a specificserver). Option A conflates documentation with configuration. Option C doesn't control Claude Codepermissions. Option D doesn't match the persistent model.
Question # 24
A developer builds an MCP server that provides a search tool. The search tool returns results withURLs. After deployment, they notice Claude sometimes fabricates URLs that look similar to realresults but don't exist. What server-side measures prevent this?
A. Add URL validation in the system prompt instructing Claude to only use URLs from tool results B. Use context awareness (citations) to track which URLs come from tool results vs. model generation C. Implement client-side URL validation that checks all URLs in Claude's response D. Return URLs as structured data in tool results (not embedded in text) and instruct Claude to
reference results by ID rather than regenerating URLs
Answer : B Explanation:Context awareness tracks which parts of Claude's response reference specific input context (includingtool results). This enables the application to verify that URLs in the response actually originated fromtool results rather than being generated by the model. Combined with client-side validation, thisprovides attribution verification. Option A is unreliable. Option B helps but doesn't fully preventfabrication. Option C validates but doesn't attribute.
Question # 25
A product team is designing a fraud detection agent for a bank. The agent analyzes transactions in realtime and flags suspicious activity. False positives block legitimate transactions, causing customerfrustration. False negatives allow fraud through. Currently the agent flags 30% of transactions, ofwhich 90% are false positives. What analysis framework should be applied?
A. Lower the sensitivity threshold to reduce false positives B. Use a tiered confidence system: high-confidence fraud ? block immediately, medium-confidence? add friction (2FA), low-confidence ? allow with monitoring. Each tier has different precision/recalltradeoffs C. Remove automated blocking entirely and rely on post-transaction review D. Train a separate classifier to pre-filter the agent's results
Answer : B Explanation:A tiered confidence system maps different certainty levels to different actions. High-certainty fraud isblocked (high precision required), medium certainty adds friction without blocking (balancedapproach), and low certainty allows transactions with monitoring (prioritizing availability). Thisframework addresses the precision/recall tradeoff at each tier. Option A reduces all sensitivity withoutnuance. Options C and D address symptoms.
Join the Conversation
Be part of the conversation — share your thoughts, reply to others, and contribute your experience.
Jason Reed
I passed on my first attempt after focusing on deployment and disaster recovery topics.
Jason Reed
I passed on my first attempt after focusing on deployment and disaster recovery topics.
Sofia Bennett
Same here. Practicing infrastructure workflow questions daily really improved my confidence.