Model Context Protocol Servers, widely known as MCPs, are the software components that give AI agents their hands. Where the Model Context Protocol defines the rules of engagement, an MCP server is the actual implementation — a lightweight, purpose-built application that connects an AI agent to a specific external system, whether that's a database, a file system, a calendar, or a third-party API. Any compatible AI host can connect to an MCP server, discover what it can do, and start using it, all without a single line of custom integration code.
That last part is worth pausing on. Before this standard existed, connecting an AI agent to a new tool meant writing bespoke integration code from scratch, every single time. A team building an agent that needed to read from a database, check a calendar, and query a CRM was looking at three separate custom integrations, each with its own authentication logic, error handling, and maintenance burden. MCPs replace that entire mess with a single, standardized interface. The protocol is the rulebook; the server is the player on the field.
When developers say "we built an MCP for that," they mean they created one of these server implementations. Each server is designed around a single domain — one server for the database, one for the calendar, one for the CRM. This modular approach lets teams assemble a library of independent servers that an AI agent can compose together to solve complex, multi-step problems.
What Lives Inside a Server
Every MCP server exposes its capabilities through three distinct types of interfaces. Understanding these three types is the key to understanding what any given server can and cannot do.
The first type is resources. Resources are read-only data surfaces. They give the agent context about the world it's operating in — a database schema, a text file, an API response, a list of available records. Resources are accessed via specific URIs, so the agent can request exactly the information it needs without the server having to push everything at once.
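The URI-addressed, read-only nature of resources can be sketched in a few lines. This is a toy in-memory lookup, not the real SDK surface; the `db://` and `file://` URIs and the `read_resource` helper are illustrative.

```python
# Toy sketch of URI-addressed, read-only resources. A real MCP server
# declares resources through an SDK; this just shows the access pattern:
# the agent asks for exactly one URI, the server returns exactly that data.
from urllib.parse import urlparse

RESOURCES = {
    "db://schema/users": '{"columns": ["id", "email", "created_at"]}',
    "file://notes/today.txt": "Standup at 10:00",
}

def read_resource(uri: str) -> str:
    """Return the read-only content behind a URI, or raise if unknown."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("db", "file"):
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    if uri not in RESOURCES:
        raise KeyError(f"unknown resource: {uri}")
    return RESOURCES[uri]

schema = read_resource("db://schema/users")
```

Because each resource is addressed individually, the server never has to push its entire data surface at once.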
The second type is prompts. These are pre-defined templates stored on the server that guide how the agent interacts with the underlying system. By keeping prompts server-side, developers can update and refine the interaction logic without touching the agent itself. It's a clean separation of concerns that makes the whole system much easier to maintain.
The third type — and the most powerful — is tools. Tools are how the agent takes action: writing a file, updating a database record, triggering an API call, sending a message. Tools are defined with strict input and output schemas, so the agent knows exactly what parameters to provide and what to expect back. Critically, well-designed tools must be stateless and idempotent, meaning they can be called multiple times without causing unintended side effects. AI agents are non-deterministic by nature — they may retry a failed request, run multiple tools in parallel, or abandon a workflow midway through. A tool that isn't idempotent can turn that unpredictability into a data corruption incident (Janakiram MSV, 2025).
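To make the idempotency point concrete, here is a sketch of a tool with a declared input schema and an idempotent implementation. The `upsert_record` tool, its schema, and the in-memory store are all hypothetical; the schema shape mirrors JSON Schema.

```python
# Hedged sketch: a tool with a strict input schema and an idempotent body.
# The tool name, fields, and store are illustrative, not a real API.
TOOL_SCHEMA = {
    "name": "upsert_record",
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "status": {"type": "string"},
        },
        "required": ["id", "status"],
    },
}

STORE: dict[str, str] = {}

def upsert_record(id: str, status: str) -> dict:
    # Keyed writes are naturally idempotent: calling this twice with the
    # same arguments leaves the store in the same state as calling it once.
    STORE[id] = status
    return {"id": id, "status": STORE[id]}

first = upsert_record("rec-1", "active")
second = upsert_record("rec-1", "active")  # a retried call is harmless
```

An append-style tool (`create_record` with an auto-generated id, say) would fail this test: a retry would silently create a duplicate.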
Choosing Between Local and Remote Deployments
Model Context Protocol Servers can be deployed in two fundamentally different configurations, and the choice between them shapes everything from security posture to latency.
A local deployment runs the server on the same machine as the AI agent or host application. Communication happens over standard input and output streams — about as simple as it gets. Because data never leaves the local environment, this approach offers the highest security profile and the lowest latency. It's the natural choice for servers that interact with local file systems, sensitive internal databases, or any system where data residency is a hard requirement.
A remote deployment, by contrast, hosts the server on separate infrastructure — a cloud instance, a managed service, or a shared internal platform. Communication occurs over HTTP, typically using Streamable HTTP for streaming responses. Remote servers are essential for integrating with external SaaS platforms or for providing shared capabilities across multiple agents running in different environments. The trade-off is real: remote deployments introduce authentication complexity, network latency, and a broader attack surface that requires careful engineering to manage (Apideck, 2025).
The deployment choice also affects how the server is discovered. Local servers are typically registered directly in the host application's configuration. Remote servers can be discovered dynamically via a registry or directory service, which opens the door to more flexible, adaptable workflows where agents can find and utilize new capabilities on the fly.
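As a concrete illustration, many host applications register a local server with a small JSON configuration block along these lines. The exact keys vary by host, and the server name and package below are hypothetical.

```json
{
  "mcpServers": {
    "calendar": {
      "command": "npx",
      "args": ["-y", "@example/calendar-mcp-server"]
    }
  }
}
```

The host launches the command as a child process and communicates with it over its standard input and output streams.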
The Role of Context in Server Design
When building an MCP server, one of the most critical design decisions is how to manage context. AI agents operate within strict context window limits, meaning they can only process a certain amount of information at any given time. If a server returns too much data, it can overwhelm the agent, causing it to lose track of its original goal or simply fail to process the response. Conversely, if a server returns too little data, the agent may not have enough information to make an informed decision, leading to hallucinated or incorrect actions.
To strike the right balance, developers must design their servers to be context-aware. This often involves implementing pagination or filtering mechanisms for resources, allowing the agent to request data in manageable chunks. For example, instead of returning an entire database table, a server might return only the first ten rows, along with a token that the agent can use to request the next ten rows if needed. This approach not only respects the agent's context limits but also improves overall system performance by reducing the amount of data that needs to be transmitted over the network.
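The cursor-token pattern described above can be sketched as follows. The in-memory table and the `nextCursor` field name are illustrative; a real server would translate the cursor into a database offset or keyset.

```python
# Sketch of cursor-based pagination for a resource. The agent receives one
# manageable page plus an opaque token it can pass back for the next page.
ROWS = [{"id": i} for i in range(25)]
PAGE_SIZE = 10

def list_rows(cursor: int = 0) -> dict:
    """Return one page of rows plus a nextCursor, or None when exhausted."""
    page = ROWS[cursor:cursor + PAGE_SIZE]
    has_more = cursor + PAGE_SIZE < len(ROWS)
    return {"rows": page, "nextCursor": cursor + PAGE_SIZE if has_more else None}

page1 = list_rows()                     # first ten rows
page2 = list_rows(page1["nextCursor"])  # next ten, only if the agent asks
```

The agent pays the context cost of one page at a time, and pages it never requests are never transmitted.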
Another key aspect of context management is the use of semantic search. Rather than forcing the agent to sift through raw data, a server can use embeddings to find and return only the most relevant information based on the agent's current task. This is particularly useful for servers that integrate with large document repositories or knowledge bases. By offloading the search and filtering logic to the server, developers can ensure that the agent receives exactly what it needs, exactly when it needs it, without wasting valuable context space on irrelevant details.
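Server-side semantic filtering reduces, at its core, to ranking stored embeddings by similarity to the query embedding. This toy sketch uses three-dimensional vectors as stand-ins for real embedding-model output; the document titles and vectors are invented.

```python
import math

# Toy sketch of server-side semantic search: rank documents by cosine
# similarity to a query embedding and return only the top matches.
DOCS = {
    "refund policy": [0.9, 0.1, 0.0],
    "api rate limits": [0.1, 0.9, 0.2],
    "holiday schedule": [0.0, 0.2, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k document titles most similar to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]
```

Only the top-ranked documents ever reach the agent; the rest of the repository never touches its context window.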
Ultimately, the goal of context management is to make the server as helpful and efficient as possible. A well-designed server acts as a smart assistant to the AI agent, anticipating its needs and providing the right information in the right format. This requires a deep understanding of both the underlying data source and the specific capabilities and limitations of the AI models that will be consuming the data. As the MCP ecosystem continues to mature, we can expect to see more sophisticated context management strategies emerge, further blurring the line between the agent's internal reasoning and the external world it interacts with.
Getting a Server Ready for Production
There's a significant gap between a server that works on a developer's laptop and one that holds up in production. Moving an MCP server across that gap is where the real engineering discipline kicks in. A server must behave like a robust microservice: handling concurrent requests, surviving network failures, and responding gracefully to malformed inputs. The idempotency requirement for tools is especially critical here. The retries, parallel tool calls, and abandoned workflows described earlier aren't edge cases in production; they're routine behavior, and a tool that modifies state in a non-idempotent way can corrupt data or leave systems in an inconsistent state under any of them.
Observability is non-negotiable, and it's also one of the areas where MCP servers most often fall short in early implementations. Production servers need structured logs with correlation identifiers, latency metrics, and success and failure rate tracking. Without this instrumentation, debugging a misbehaving agent in production is like trying to find a specific conversation in a room where everyone is talking at once. Servers should also explicitly surface their rate limits and soft limits to the client, so the agent can budget its tool calls and avoid hammering the underlying system (Janakiram MSV, 2025).
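A minimal version of that instrumentation might look like the sketch below. The field names are illustrative rather than a required schema.

```python
import json
import time
import uuid

# Sketch of structured, correlated logging for tool calls. Every record
# carries a correlation id so one agent workflow can be traced end to end.
def log_tool_call(tool: str, correlation_id: str, ok: bool, latency_ms: float) -> str:
    record = {
        "ts": time.time(),
        "correlationId": correlation_id,
        "tool": tool,
        "outcome": "success" if ok else "failure",
        "latencyMs": round(latency_ms, 2),
    }
    # In production this line would go to stderr or a log shipper, never
    # stdout: in local stdio deployments, stdout carries the protocol stream.
    return json.dumps(record)

cid = str(uuid.uuid4())
entry = json.loads(log_tool_call("upsert_record", cid, True, 12.5))
```

With one correlation id per agent workflow, the "room where everyone is talking at once" collapses into a single grep.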
Handling large outputs deserves special attention. When a tool generates a massive amount of data, returning the entire payload in a single response can overwhelm the agent's context window. Production servers should instead return handles or URIs to resources, letting the agent fetch data incrementally. For long-running operations, Streamable HTTP lets the server emit incremental progress chunks, keeping the agent informed without holding open idle connections.
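The handle-instead-of-payload pattern can be sketched like this. The `blob://` scheme, the content-addressed store, and both helper functions are hypothetical.

```python
import hashlib

# Sketch: rather than inlining a huge payload, the tool stores it and
# returns a short URI handle the agent can fetch from incrementally.
BLOBS: dict[str, bytes] = {}

def store_large_output(data: bytes) -> str:
    """Stash a large result and return a compact handle to it."""
    key = hashlib.sha256(data).hexdigest()[:16]
    BLOBS[key] = data
    return f"blob://{key}"

def fetch_blob(uri: str, offset: int = 0, length: int = 1024) -> bytes:
    """Read one slice of the stored payload, not the whole thing."""
    key = uri.removeprefix("blob://")
    return BLOBS[key][offset:offset + length]

uri = store_large_output(b"x" * 50_000)   # 50 KB result
chunk = fetch_blob(uri)                   # agent pulls 1 KB at a time
```

The tool's response stays a few dozen bytes regardless of how large the underlying result is.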
The Registry and the Growing Ecosystem
The real promise of the MCP standard isn't any single server — it's the ecosystem of servers that's emerging as more developers adopt the protocol. Thousands of community-built integrations are now available, covering everything from popular developer tools to enterprise CRM systems. The official MCP Registry serves as the central discovery layer for this ecosystem.
The registry doesn't host server code. Instead, it stores standardized metadata in a specific JSON format: the server's unique namespace, its location on a package registry like npm or Docker Hub, and execution instructions. By decoupling the metadata from the code, the registry lets developers use their preferred hosting and distribution platforms while still participating in the broader discovery ecosystem (Model Context Protocol, 2025).
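A registry entry for a hypothetical server might look roughly like the fragment below. The field names follow the general shape of the registry's metadata format, but treat them as illustrative and check the current schema before publishing.

```json
{
  "name": "io.github.example/crm-server",
  "description": "MCP server exposing read-only CRM records",
  "version": "1.2.0",
  "packages": [
    {
      "registryType": "npm",
      "identifier": "@example/crm-mcp-server",
      "version": "1.2.0"
    }
  ]
}
```

Note that the entry points at npm; the registry itself never stores the code.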
Trust is handled through namespace authentication. Publishers must verify ownership of their domains or GitHub accounts before listing a server, which ensures that a server claiming to integrate with a specific platform was actually published by the legitimate owner of that platform. Downstream aggregators and marketplaces consume this metadata via a REST API, adding community ratings, curation, and security scanning on top of the baseline registry.
Navigating the Security Risks
Connecting autonomous AI agents to internal systems is genuinely powerful, and genuinely risky. MCP servers must implement robust access controls and follow the principle of least privilege. For remote servers, OAuth 2.1 is the required authentication standard. Predictable session identifiers are a known attack vector and must be avoided.
The confused deputy problem is a particularly subtle risk. It occurs when an AI agent, acting on behalf of a user, is tricked by a malicious third-party server into executing an unauthorized command. Defending against this requires servers to never echo secrets in tool results or elicitation messages, and to require explicit human confirmation before any action that changes state or incurs a cost. Human-in-the-loop workflows aren't just a nice-to-have — for high-stakes operations, they're the last line of defense (Janakiram MSV, 2025).
Tool poisoning is the other major threat. A malicious server can provide intentionally flawed prompts or resources to manipulate an agent's behavior in ways that are difficult to detect. The defense is strict validation: all inputs and outputs must be checked against predefined JSON schemas, and host applications should implement sandboxing and isolation to limit the blast radius of a compromised server.
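A hand-rolled sketch of that strict validation is below. A production server should use a full JSON Schema validator; this simplified check covers only required keys, primitive types, and the rejection of unexpected fields, and the schema itself is hypothetical.

```python
# Minimal sketch of strict input validation. Unknown fields are rejected
# outright -- unexpected keys are a common vector for smuggling
# instructions or data past the server.
SCHEMA = {
    "required": ["id", "status"],
    "types": {"id": str, "status": str},
}

def validate_input(args: dict, schema: dict) -> list[str]:
    """Return a list of validation errors; empty means the input is clean."""
    errors = []
    for key in schema["required"]:
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, expected in schema["types"].items():
        if key in args and not isinstance(args[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    for key in args:
        if key not in schema["types"]:
            errors.append(f"unexpected field: {key}")
    return errors
```

The same discipline applies on the way out: outputs should be checked against their declared schema before being returned to the agent.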
Putting Humans Back in the Loop with Elicitation
One of the more interesting recent additions to the specification is elicitation — a mechanism that allows a server to pause an operation mid-execution and request additional information or confirmation from the user. It's a formal bridge between fully autonomous execution and human oversight.
The use case is intuitive. If an agent is about to execute a tool that will delete a database record or send an email on behalf of a user, the server can use elicitation to surface a confirmation prompt before proceeding. The user sees the request, approves or rejects it, and the server continues or aborts accordingly. For anyone who has ever watched an AI agent confidently do the wrong thing, elicitation is a welcome addition to the toolkit.
Elicitation isn't universally supported yet, so servers must check for host compatibility during the initial handshake. If the host doesn't support it, the server must degrade gracefully — either rejecting the operation or falling back to safe default parameters. Servers must also never use elicitation to harvest sensitive data or bypass authentication flows. The feature is powerful precisely because it touches the user directly, which makes it a target for abuse if not implemented carefully.
Managing Versioning and Compatibility
Building an MCP server is not a one-time project. Servers evolve. The systems they integrate with change. The agents that consume them get updated. Managing compatibility across all of these moving parts requires a disciplined versioning strategy. Semantic versioning is the standard approach, with breaking changes clearly documented. During the initial connection handshake, the server publishes its full list of capabilities, allowing the host application to adapt its behavior programmatically based on what's actually available.
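A host-side compatibility check under semantic versioning might look like this sketch; the helper names are hypothetical, and real constraint matching (caret ranges, pre-release tags) is more involved.

```python
# Sketch of a semantic-version compatibility check: same major version,
# and the server must be at least as new as the required baseline.
def parse_semver(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def compatible(server_version: str, required: str) -> bool:
    s, r = parse_semver(server_version), parse_semver(required)
    # A major-version bump signals a breaking change, so it never matches.
    return s[0] == r[0] and s >= r
```

Under this rule, 1.4.2 satisfies a 1.2.0 requirement, while 2.0.0 does not: the major bump signals a breaking change the host can't assume it handles.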
The broader ecosystem is also still maturing. Different host applications support different subsets of the specification. Features like OAuth 2.1 and structured content outputs may not be universally available. Servers that want to work across a wide range of environments need to implement graceful degradation and feature flags, ensuring they function correctly even when the host can't support every capability.
As the specification continues to evolve — the shift from Server-Sent Events to Streamable HTTP being a recent example — developers who treat their servers as living software rather than one-time builds will be much better positioned to take advantage of new capabilities as they land. The standard is governed by the Agentic AI Foundation under the Linux Foundation, which means changes go through a vendor-neutral process. That's a meaningful commitment to stability for anyone building on top of it.
Vetting Third-Party Servers
The open nature of the ecosystem is one of its greatest strengths and one of its most significant risks. Anyone can publish a server, and not all of them will meet enterprise security or performance standards. Organizations deploying third-party servers in production need a systematic evaluation process.
Namespace authentication in the registry provides a baseline of trust, but it only confirms that the publisher owns the associated domain or GitHub account — not that the code itself is safe or well-built. Source code review, dependency auditing, and load testing under realistic conditions are all necessary steps before trusting a third-party server with access to internal systems. Monitoring the server's resource consumption and error rates during testing reveals how it behaves under stress, which is often very different from how it behaves in a demo.
The discipline of evaluating and vetting these servers is becoming a genuine engineering competency. As the ecosystem grows and more organizations rely on MCP servers as critical infrastructure, the teams that build rigorous evaluation processes early will be the ones that avoid the expensive lessons later.
For organizations building on top of Sandgarden, this evaluation process is built into the platform. Sandgarden's infrastructure layer handles the authentication, observability, and rate limit management that production MCP deployments require, letting engineering teams focus on the domain logic of their servers rather than the plumbing. As the MCP ecosystem continues to mature, platforms that abstract away this operational complexity will play an increasingly important role in making the standard accessible to teams that don't have the bandwidth to build all of that infrastructure from scratch.


