The Model Context Protocol (MCP) is Anthropic's open standard for connecting AI models to external tools and data sources. Instead of hard-coding tool definitions into each agent, an MCP server centralizes them: any compliant client (Claude Code, Claude Desktop, custom agents) connects to the server and gets access to its full tool catalog.
This article covers the architecture of a production MCP server built with Spring AI's MCP server support - one that has been running daily since early 2025, backing a fleet of autonomous Claude agents handling CRM sync, document processing, analytics pipelines, and vault maintenance.
Tech Stack
- Java 21, Spring Boot 3.5
- Spring AI 1.1.2 (MCP server / webmvc transport)
- Quartz Scheduler (job scheduling and agent dispatch)
- JGit (git operations)
- dav4jvm (CardDAV / WebDAV contacts)
- Apache Commons Email, SendGrid (email send/receive)
- JSch (SSH to reMarkable tablet)
- SQLite (local persistence)
- Gradle
Architecture: One Server, Multiple Domains
The server exposes tools across 9 functional domains:
- Email - IMAP/SMTP message retrieval, search, metadata, and send
- Filesystem / Obsidian vault - Read, write, move, search, frontmatter operations, section-level edits on a local markdown vault
- Git - add, commit, log, diff, show on the vault repository
- reMarkable tablet - SSH connectivity, document and page retrieval, PNG export, annotation extraction
- CardDAV / WebDAV - Contact read, write, and vCard upload
- Scheduling - Quartz job management: create, list, delete, trigger agents on demand
- System utilities - UUID generation, date/time conversion, epoch math, notifications
- HTTP - Outbound GET requests with response capture
- Email store - Structured email search and metadata retrieval for CRM workflows
All 9 domains run in a single Spring Boot deployment.
Why one server? A single deployment means one connection for clients, a shared Spring context (beans, configuration, and database connections shared across domains), and a single monitoring surface. The tradeoff is a single point of failure - acceptable for personal automation infrastructure, but a concern for multi-tenant or high-availability deployments where domain isolation would matter more.
Defining Tools in Spring AI
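Spring AI discovers tools via annotations. A method annotated with @Tool on a @Component is added to the server's tool catalog once its bean is exposed through a ToolCallbackProvider (for example, one built with MethodToolCallbackProvider), which the MCP server auto-configuration picks up.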
The tool description matters more than the implementation. Claude uses descriptions to decide which tool to call and what arguments to pass. A few patterns that hold up in production:
Be explicit about side effects. Read tools and write tools need clear differentiation in their descriptions. Tools that modify state should say so - Claude is cautious about side effects and will use that information.
Make error returns structured. Return a consistent error object rather than throwing exceptions. Agents handle structured error responses better than stack traces, and they can make retry or fallback decisions based on error content.
Use natural-language parameter names. vaultRootPath and vaultSubPath communicate their purpose. p1 and p2 make the LLM guess. This is especially important for parameters with non-obvious constraints (e.g., a path that must be relative to the vault root, not an absolute filesystem path).
Test descriptions before wiring into agents. Drop into Claude.ai and simulate the tool selection decision in a message: "I have a tool called X that does Y. Given this situation, would you call it?" If Claude's reasoning doesn't match your expectation, the description needs work before you depend on it in a production workflow.
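Two of the patterns above, structured error returns and descriptively named parameters with a path constraint, can be sketched together in plain Java. This is a minimal illustration, not the server's actual code: the ToolResult envelope, the VaultPaths helper, and the error codes are hypothetical names introduced here. Only the parameter names vaultRootPath and vaultSubPath come from the article.

```java
import java.nio.file.Path;

/** Hypothetical result envelope: a tool returns this instead of throwing. */
record ToolResult(boolean ok, String data, String errorCode, String errorMessage) {
    static ToolResult success(String data) {
        return new ToolResult(true, data, null, null);
    }

    static ToolResult failure(String errorCode, String errorMessage) {
        return new ToolResult(false, null, errorCode, errorMessage);
    }
}

/** Hypothetical helper that enforces "relative to the vault root" on a sub-path. */
final class VaultPaths {
    private VaultPaths() {}

    static ToolResult resolve(String vaultRootPath, String vaultSubPath) {
        Path root = Path.of(vaultRootPath).toAbsolutePath().normalize();
        Path sub = Path.of(vaultSubPath);

        // An absolute sub-path violates the documented constraint.
        if (sub.isAbsolute()) {
            return ToolResult.failure("PATH_NOT_RELATIVE",
                    "vaultSubPath must be relative to the vault root: " + vaultSubPath);
        }

        // Normalizing collapses ".." segments so an escape attempt becomes visible.
        Path resolved = root.resolve(sub).normalize();
        if (!resolved.startsWith(root)) {
            return ToolResult.failure("PATH_OUTSIDE_VAULT",
                    "vaultSubPath escapes the vault root: " + vaultSubPath);
        }
        return ToolResult.success(resolved.toString());
    }
}
```

An agent that receives PATH_NOT_RELATIVE can correct the argument and retry, which is harder to do from a raw stack trace.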
Agents as Markdown Files
Rather than hard-coding agent logic in Java, agents are defined as markdown files. Each file contains YAML frontmatter (agent ID, schedule, allowed tools) and a system prompt in the body.
A minimal agent definition looks like:
---
agent_id: crm-email-sync
schedule: 0 */60 * * * ?
tools:
- EmailStore_findAllMessageMetadataSinceDateReceived
- EmailStore_findMessagesByEmailIds
- Obsidian_readTextFile
- Obsidian_writeTextFile
- Obsidian_updateFrontmatterProperties
- System_getTimeAsFormat
---
You are a CRM email sync agent. Your job is to find new emails
involving CRM contacts since the last sync date...
The scheduler reads these files and dispatches agents according to their configured Quartz cron expression. Agents can also be triggered on demand via a remote trigger endpoint.
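Reading such a file can be sketched as follows. This is an assumption-laden illustration: AgentDefinition and AgentFileParser are hypothetical names, and the hand-rolled parsing only handles the flat shape shown in the example above (scalar keys plus a tools list). A real implementation would more likely hand the frontmatter to a YAML library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory form of one agent file. */
record AgentDefinition(String agentId, String schedule, List<String> tools, String systemPrompt) {}

final class AgentFileParser {
    private AgentFileParser() {}

    static AgentDefinition parse(String fileContent) {
        List<String> lines = fileContent.lines().toList();
        if (lines.isEmpty() || !lines.get(0).strip().equals("---")) {
            throw new IllegalArgumentException("agent file must start with a '---' line");
        }

        Map<String, String> scalars = new LinkedHashMap<>();
        List<String> tools = new ArrayList<>();
        String currentKey = null;
        int i = 1;
        for (; i < lines.size(); i++) {
            String line = lines.get(i).strip();
            if (line.equals("---")) {
                break; // closing frontmatter delimiter
            }
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("- ")) {
                // List items are only collected under the "tools" key.
                if ("tools".equals(currentKey)) {
                    tools.add(line.substring(2).strip());
                }
            } else {
                // Split on the first colon so cron expressions stay intact.
                int colon = line.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("unrecognized frontmatter line: " + line);
                }
                currentKey = line.substring(0, colon).strip();
                String value = line.substring(colon + 1).strip();
                if (!value.isEmpty()) {
                    scalars.put(currentKey, value);
                }
            }
        }
        if (i >= lines.size()) {
            throw new IllegalArgumentException("frontmatter is not closed");
        }

        // Everything after the closing delimiter is the system prompt.
        String body = String.join("\n", lines.subList(i + 1, lines.size())).strip();
        return new AgentDefinition(
                scalars.get("agent_id"), scalars.get("schedule"), List.copyOf(tools), body);
    }
}
```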
Defining agents as files pays off quickly: they are easy to version-control, modify without a redeploy, and inspect when something goes wrong. Every agent change is a git commit. git log shows exactly what changed when an agent starts misbehaving.
Scheduler Integration
Quartz handles job scheduling. Each agent definition maps to a Quartz job. At fire time, the scheduler resolves the agent file, builds a Claude API request with the system prompt and tool access list, and submits it to the Anthropic API.
The variable ${START_EPOCH} is interpolated into the system prompt at fire time. Its value is the current Unix epoch in seconds with 000 appended, which yields a millisecond-format timestamp at one-second precision. This gives agents a reliable "now" reference without requiring a time tool call at the start of every run. Agents use it to calculate cutoff dates for incremental sync operations.
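The interpolation step is small enough to sketch directly. The class and method names below are hypothetical, and cutoffMillis only illustrates the kind of arithmetic an agent performs with the value; the article does not describe how the real scheduler implements the substitution.

```java
import java.time.Duration;
import java.time.Instant;

final class PromptVariables {
    private PromptVariables() {}

    /** Epoch seconds with "000" appended: milliseconds at one-second precision. */
    static String startEpoch(Instant fireTime) {
        return fireTime.getEpochSecond() + "000";
    }

    /** Replaces every literal ${START_EPOCH} placeholder in the prompt. */
    static String interpolate(String systemPrompt, Instant fireTime) {
        return systemPrompt.replace("${START_EPOCH}", startEpoch(fireTime));
    }

    /** Cutoff for an incremental sync: start time minus a lookback window, in ms. */
    static long cutoffMillis(String startEpoch, Duration lookback) {
        return Long.parseLong(startEpoch) - lookback.toMillis();
    }
}
```

For a fire time of 1700000000 epoch seconds, the placeholder becomes 1700000000000, and a one-hour lookback gives a cutoff of 1699996400000.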
Prompt Caching for Scheduled Agents
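Scheduled agents have stable system prompts: the same prompt fires on every run. Anthropic's prompt caching stores a marked prompt prefix with a default 5-minute TTL, refreshed each time the cached prefix is read.
For agents that fire every 60 minutes, the cache will be cold on arrival. At the default TTL the same holds for agents that fire every 10-15 minutes (checker agents polling for new events), so cross-run savings for those depend on the longer 1-hour TTL option, which carries a higher write cost. Within a single run, caching still helps: each tool-call round trip resends the system prompt, and every request after the first can read it from cache. Mark the system prompt as cacheable in the API request. Cache reads cost roughly 10% of the base input-token price at current pricing - meaningful when a long system prompt is sent dozens of times per day.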
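A rough cost comparison makes the effect concrete. The multipliers below are assumptions, not figures from the article: a cache write at 1.25x and a cache read at 0.10x the base input-token price for the 5-minute TTL. Check them against current pricing. The class name and the unit ("base-price input tokens") are also illustrative.

```java
final class PromptCacheCost {
    // Assumed multipliers relative to the base input-token price.
    static final double CACHE_WRITE_MULTIPLIER = 1.25;
    static final double CACHE_READ_MULTIPLIER = 0.10;

    private PromptCacheCost() {}

    /** System-prompt cost with caching, in units of base-price input tokens. */
    static double cachedCost(long promptTokens, long cacheWrites, long cacheReads) {
        return promptTokens
                * (cacheWrites * CACHE_WRITE_MULTIPLIER + cacheReads * CACHE_READ_MULTIPLIER);
    }

    /** System-prompt cost when every request pays the full input price. */
    static double uncachedCost(long promptTokens, long requests) {
        return (double) promptTokens * requests;
    }
}
```

Under these assumptions, a 4,000-token system prompt sent 10 times with one cache write and nine cache reads costs the equivalent of about 8,600 base-price tokens, against 40,000 without caching.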
Tool Count Grows Fast
9 domains sounds manageable. In practice, each domain has 5-15 tools. The total catalog is 60+ tools.
Claude handles large tool catalogs, but each agent should receive only the tools it needs - not the full catalog. This keeps the context window focused, reduces the chance of a wrong tool selection, and makes the agent's behavior easier to reason about. The agent definition file's tools list enforces this at dispatch time.
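The dispatch-time filter can be sketched as a lookup against the full catalog. ToolSpec and ToolAllowList are hypothetical stand-ins; the real server would presumably filter whatever tool objects Spring AI holds. Failing on an unknown name is a design choice assumed here, so that a typo in an agent file surfaces at dispatch rather than as a silently missing tool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical catalog entry. */
record ToolSpec(String name, String description) {}

final class ToolAllowList {
    private ToolAllowList() {}

    /** Returns only the catalog entries named in the agent's tools list, in list order. */
    static List<ToolSpec> select(List<ToolSpec> catalog, List<String> allowedNames) {
        Map<String, ToolSpec> byName = new LinkedHashMap<>();
        for (ToolSpec tool : catalog) {
            byName.put(tool.name(), tool);
        }

        // Reject names that are not in the catalog instead of dropping them silently.
        List<String> unknown = allowedNames.stream()
                .filter(name -> !byName.containsKey(name))
                .toList();
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException("agent requests unknown tools: " + unknown);
        }

        return allowedNames.stream().distinct().map(byName::get).toList();
    }
}
```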
Logging Tool Call Chains
The most useful debugging data is the sequence of tool calls an agent made and what each one returned. Log this at the server level. When an agent produces unexpected output, the tool call chain tells you exactly where reasoning diverged - whether it was a bad tool selection, an unexpected return value, or a multi-step logic error.
Without this log, debugging a complex multi-step agent run requires re-firing it and hoping the same inputs are still available.
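One possible shape for such a log is an in-memory chain per agent run. Everything named here is hypothetical, including the assumption that the dispatcher supplies a runId with each call; a production server would also persist the entries (for example to the SQLite store) rather than keep them only in memory.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** One recorded tool invocation. */
record ToolCall(Instant at, String toolName, String arguments, String result) {}

final class ToolCallChainLog {
    private final Map<String, List<ToolCall>> callsByRun = new ConcurrentHashMap<>();

    /** Appends a call to the chain for the given run, preserving call order. */
    void record(String runId, String toolName, String arguments, String result) {
        callsByRun
                .computeIfAbsent(runId, id -> Collections.synchronizedList(new ArrayList<>()))
                .add(new ToolCall(Instant.now(), toolName, arguments, result));
    }

    /** Returns an immutable snapshot of the chain, or an empty list for an unknown run. */
    List<ToolCall> chain(String runId) {
        List<ToolCall> calls = callsByRun.get(runId);
        if (calls == null) {
            return List.of();
        }
        synchronized (calls) { // synchronizedList requires manual locking while copying
            return List.copyOf(calls);
        }
    }

    /** Renders the chain as numbered lines: "1. tool(args) -> result". */
    String format(String runId) {
        StringBuilder out = new StringBuilder();
        int step = 1;
        for (ToolCall call : chain(runId)) {
            out.append(step++).append(". ").append(call.toolName())
                    .append('(').append(call.arguments()).append(") -> ")
                    .append(call.result()).append('\n');
        }
        return out.toString();
    }
}
```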