diff --git a/03-GettingStarted/12-sampling/README.md b/03-GettingStarted/12-sampling/README.md
new file mode 100644
index 000000000..28a94ceba
--- /dev/null
+++ b/03-GettingStarted/12-sampling/README.md
@@ -0,0 +1,353 @@
# Sampling - delegate features to the Client

Sometimes you need the MCP Client and the MCP Server to collaborate to achieve a common goal. You might have a case where the Server requires the help of an LLM that sits on the client side. For this situation, sampling is what you should use.

Let's explore some use cases and how to build a solution involving sampling.

## Overview

In this lesson, we focus on explaining when and where to use Sampling and how to configure it.

## Learning Objectives

In this chapter, we will:

- Explain what Sampling is and when to use it.
- Show how to configure Sampling in MCP.
- Provide examples of Sampling in action.

## What is Sampling and why use it?

Sampling is an advanced feature that works in the following way:

```mermaid
sequenceDiagram
    participant User
    participant MCP Client
    participant LLM
    participant MCP Server

    User->>MCP Client: Author blog post
    MCP Client->>MCP Server: Tool call (blog post draft)
    MCP Server->>MCP Client: Sampling request (create summary)
    MCP Client->>LLM: Generate blog post summary
    LLM->>MCP Client: Summary result
    MCP Client->>MCP Server: Sampling response (summary)
    MCP Server->>MCP Client: Complete blog post (draft + summary)
    MCP Client->>User: Blog post ready
```

In words: the server needs an LLM to produce the summary, so it sends a *sampling request* back to the client; the client runs the prompt through its own LLM and returns the result, which the server then uses to complete the tool call.

### Sampling request

Now that we have a high-level view of a credible scenario, let's talk about the sampling request the server sends back to the client. Here's what such a request can look like in JSON-RPC format:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Create a blog post summary of the following blog post: "
        }
      }
    ],
    "modelPreferences": {
      "hints": [
        {
          "name": "claude-3-sonnet"
        }
      ],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a helpful assistant.",
    "maxTokens": 100
  }
}
```

There are a few things here worth calling out:

- **Prompt**: under `content` -> `text` is our prompt, the instruction that tells the LLM to summarize the blog post content.
- **modelPreferences**: this section is just that, a preference, a recommendation of what configuration to use with the LLM. The client (and ultimately the user) can choose whether to follow these recommendations or override them. In this case it recommends a model to use and how to prioritize intelligence and speed.
- **systemPrompt**: your normal system prompt that gives the LLM a personality and contains guiding instructions.
- **maxTokens**: a recommendation for how many tokens the LLM should spend on this task.

### Sampling response

The response is what the MCP Client ends up sending back to the MCP Server: the client calls its LLM, waits for the result, and then constructs this message. Here's what it can look like in JSON-RPC:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "Here's your abstract "
    },
    "model": "gpt-5",
    "stopReason": "endTurn"
  }
}
```

Note how the response is an abstract of the blog post, just like we asked for. Also note how the `model` that was used isn't the one we asked for: "gpt-5" instead of "claude-3-sonnet". This illustrates that the client can change its mind about which model to use and that your sampling request is only a recommendation.
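To make the request/response pair concrete, here is a minimal sketch of the callback an MCP Client can register to service such a request, assuming the Python MCP SDK (the same types appear in the full client in the solution later in this lesson). The `call_llm` helper is a stand-in for whatever LLM the client actually talks to:

```python
from mcp import ClientSession, types
from mcp.shared.context import RequestContext


async def call_llm(prompt: str, system_prompt: str) -> str:
    """Stand-in for the client's own LLM call (OpenAI, GitHub Models, a local model, ...)."""
    return "Here's your abstract "


async def handle_sampling_message(
    context: RequestContext[ClientSession, None],
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    # The server's sampling request arrives here; grab the prompt text.
    prompt = params.messages[0].content.text

    # Run the prompt through whatever LLM this client uses.
    summary = await call_llm(prompt, system_prompt="You are a helpful assistant.")

    # Construct the result that becomes the JSON-RPC response shown above.
    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(type="text", text=summary),
        model="gpt-5",  # report whichever model was actually used
        stopReason="endTurn",
    )
```

How this callback is wired into a client session is shown in the configuration section below.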
Now that we understand the main flow, and a useful task to apply it to ("blog post creation + abstract"), let's see what we need to do to get it to work.

### Message types

Sampling messages aren't constrained to text; you can also send images and audio. Here's how the content differs in JSON-RPC:

**Text**

```json
{
  "type": "text",
  "text": "The message content"
}
```

**Image content**

```json
{
  "type": "image",
  "data": "base64-encoded-image-data",
  "mimeType": "image/jpeg"
}
```

**Audio content**

```json
{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "mimeType": "audio/wav"
}
```
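As a sketch of what that can look like from Python, here's how a sampling message carrying an image could be constructed, assuming the SDK's `ImageContent` type can be used as the message content as the spec above suggests (the variable names are only illustrative):

```python
import base64

from mcp.types import ImageContent, SamplingMessage

# Assume `image_bytes` already holds the raw bytes of the picture we want described.
image_bytes = b"...raw JPEG bytes..."

message = SamplingMessage(
    role="user",
    content=ImageContent(
        type="image",
        data=base64.b64encode(image_bytes).decode(),
        mimeType="image/jpeg",
    ),
)

# A tool could then pass [message] to ctx.session.create_message(...),
# exactly like the text-only example later in this lesson.
```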
> NOTE: for more detailed info on Sampling, check out the [official docs](https://modelcontextprotocol.io/specification/2025-06-18/client/sampling)

## How to Configure Sampling in the Client

> Note: if you're only building a server, you don't need to do much here.

In the client, you need to declare the sampling capability like so:

```json
{
  "capabilities": {
    "sampling": {}
  }
}
```

This capability is then advertised when your chosen client initializes its connection with the server.
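If you're building the client yourself with the Python MCP SDK, declaring the capability comes down to passing a sampling callback when the session is created; the SDK advertises sampling support during initialization on your behalf. Here is a minimal sketch that reuses the `handle_sampling_message` callback from earlier and assumes a local server script started over stdio (the file name and tool arguments are placeholders):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed: the server built in the next section lives in server.py next to this script.
server_params = StdioServerParameters(command="python", args=["server.py"])


async def run() -> None:
    async with stdio_client(server_params) as (read, write):
        # Passing sampling_callback is what enables the sampling capability
        # for this session.
        async with ClientSession(
            read, write, sampling_callback=handle_sampling_message
        ) as session:
            await session.initialize()
            result = await session.call_tool(
                "create_blog",
                arguments={"title": "My post", "content": "My draft"},
            )
            print(result.content[0].text)


if __name__ == "__main__":
    asyncio.run(run())
```

A full, working version of this client (including a real LLM call) is part of the solution linked at the end of this lesson.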
## Example of Sampling in Action - Create a Blog Post

Let's code a sampling server together. We will need to do the following:

1. Create a tool on the server.
1. Have that tool create a sampling request.
1. Wait for the client's answer to the sampling request.
1. Produce the tool result from that answer.

Let's see the code step by step:

### -1- Create the tool

**python**

```python
@mcp.tool()
async def create_blog(title: str, content: str, ctx: Context[ServerSession, None]) -> str:
    """Create a blog post and generate a summary"""
```

### -2- Create a sampling request

Extend your tool with the following code. The call to `ctx.session.create_message` is what sends the sampling request to the client; the tool then waits until the client's LLM has produced a response:

**python**

```python
post = BlogPost(
    id=len(posts) + 1,
    title=title,
    content=content,
    abstract=""
)

prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content}"

result = await ctx.session.create_message(
    messages=[
        SamplingMessage(
            role="user",
            content=TextContent(type="text", text=prompt),
        )
    ],
    max_tokens=100,
)
```

### -3- Wait for the response and return the result

**python**

```python
post.abstract = result.content.text

posts.append(post)

# return the complete blog post
return json.dumps({
    "id": post.title,
    "abstract": post.abstract
})
```
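One caveat: the sampling result's `content` can in principle be text, image or audio, so `result.content.text` assumes the client answered with text. If you want to be defensive, a sketch of the same step with an explicit type check (using the `TextContent` type this server already imports) could look like this:

```python
from mcp.types import TextContent

# Only read .text if the client actually returned text content.
if isinstance(result.content, TextContent):
    post.abstract = result.content.text
else:
    post.abstract = "No text summary was returned by the client."

posts.append(post)

return json.dumps({
    "id": post.title,
    "abstract": post.abstract
})
```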
### -4- Full code

**python**

```python
import json
from typing import List

from pydantic import BaseModel

from mcp.server.fastmcp import Context, FastMCP
from mcp.server.session import ServerSession
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("Blog post generator")


class BlogPost(BaseModel):
    id: int
    title: str
    content: str
    abstract: str


posts: List[BlogPost] = []


@mcp.tool()
async def create_blog(title: str, content: str, ctx: Context[ServerSession, None]) -> str:
    """Create a blog post and generate a summary"""

    post = BlogPost(
        id=len(posts) + 1,
        title=title,
        content=content,
        abstract=""
    )

    prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content}"

    # Send a sampling request to the client and wait for its LLM to answer.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    post.abstract = result.content.text

    posts.append(post)

    # return the complete blog post (draft + generated abstract)
    return json.dumps({
        "id": post.title,
        "abstract": post.abstract
    })


if __name__ == "__main__":
    print("Starting server...")
    mcp.run(transport="streamable-http")

# run with: python server.py
```

### -5- Testing it in Visual Studio Code

To test this out in Visual Studio Code, do the following:

1. Start the server in a terminal.
1. Add it to *mcp.json* (and make sure it's started), e.g. something like so:

    ```json
    "servers": {
        "blog-server": {
            "type": "http",
            "url": "http://localhost:8000/mcp"
        }
    }
    ```

1. Type a prompt:

    ```text
    create a blog post named "Where Python comes from", the content is "Python is actually named after Monty Python Flying Circus"
    ```

1. Allow sampling to happen. The first time you test this you will be presented with an additional dialog you need to accept; after that you will see the usual dialog asking you to run the tool.

1. Inspect the results. They are rendered nicely in GitHub Copilot Chat, but you can also inspect the raw JSON response.

**Bonus**. Visual Studio Code tooling has great support for sampling. You can configure sampling access for your installed server by navigating to it like so:

1. Navigate to the Extensions section.
1. Select the cog icon for your installed server in the "MCP SERVERS - INSTALLED" section.
1. Select "Configure Model Access"; here you can select which models GitHub Copilot is allowed to use when performing sampling. You can also see the sampling requests that happened recently by selecting "Show Sampling Requests".

## Assignment

In this assignment, you will build a slightly different sampling integration, one that supports generating product descriptions. Here's your scenario:

**Scenario**: A back-office worker at an e-commerce company needs help; it takes way too much time to write product descriptions. You are to build a solution where a tool "create_product" can be called with "title" and "keywords" as arguments, and it should produce a complete product including a "description" field that is populated by the client's LLM.

TIP: use what you learned earlier about constructing a server and its tool using a sampling request. A possible starting point is sketched right below.
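If you want a nudge before looking at the solution, here is a minimal skeleton of such a tool. It mirrors the `create_blog` tool above; the argument names follow the provided solution, and storing the product and returning it in full is left to you:

```python
@mcp.tool()
async def create_product(product_name: str, keywords: str, ctx: Context[ServerSession, None]) -> str:
    """Create a product and ask the client's LLM for a description."""

    prompt = f"Create a product description about {product_name} described as {keywords}"

    # Same pattern as create_blog: delegate the text generation to the client.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    # TODO: build a Product, store it, and return it with the generated description.
    return result.content.text
```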
## Solution

[Solution](./solution/README.md)

## Key Takeaways

Sampling is a powerful feature that allows the server to delegate tasks to the client when it needs the help of an LLM: the server sends a sampling request, the client runs it through its own model, and the server uses the result to finish its work.

## What's Next

TODO

diff --git a/03-GettingStarted/12-sampling/code/python/README.md b/03-GettingStarted/12-sampling/code/python/README.md
new file mode 100644
index 000000000..c980f1cc8
--- /dev/null
+++ b/03-GettingStarted/12-sampling/code/python/README.md
@@ -0,0 +1,49 @@
# Run the sample

## Create virtual environment

```sh
python -m venv venv
source ./venv/bin/activate
```

## Install dependencies

```sh
pip install "mcp[cli]"
```

## Run the server

```sh
python server.py
```

This starts the server with the streamable HTTP transport; it listens on http://localhost:8000/mcp by default.

## Test the server out with GitHub Copilot and VS Code

Add the entry to *mcp.json* like so:

```json
"servers": {
    "my-mcp-server-999e9ea3": {
        "url": "http://localhost:8000/mcp",
        "type": "http"
    }
}
```

Make sure you click "start" on the server.

In GitHub Copilot, paste the following prompt:

```text
create a blog post named "Where Python comes from", the content is "Python is actually named after Monty Python Flying Circus"
```

The first time, you will be asked whether to accept a sampling action; then you will be asked to allow the "create_blog" tool to run. You should see a response similar to:

```json
{
  "result": "{\"id\": \"Where Python comes from\", \"abstract\": \"# Python's Origin\\n\\nPython, the popular programming language, derives its name from **Monty Python's Flying Circus**, the British comedy troupe, rather than the snake. This naming choice reflects the creator's desire to make programming more fun and accessible.\"}"
}
```

diff --git a/03-GettingStarted/12-sampling/code/python/server.py b/03-GettingStarted/12-sampling/code/python/server.py
new file mode 100644
index 000000000..6abf8b1d9
--- /dev/null
+++ b/03-GettingStarted/12-sampling/code/python/server.py
@@ -0,0 +1,69 @@
import json
from typing import List

from pydantic import BaseModel

from mcp.server.fastmcp import Context, FastMCP
from mcp.server.session import ServerSession
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("Blog post generator")


class BlogPost(BaseModel):
    id: int
    title: str
    content: str
    abstract: str


posts: List[BlogPost] = []


@mcp.tool()
async def create_blog(title: str, content: str, ctx: Context[ServerSession, None]) -> str:
    """Create a blog post and generate a summary"""

    post = BlogPost(
        id=len(posts) + 1,
        title=title,
        content=content,
        abstract=""
    )

    prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content}"

    # Send a sampling request to the client and wait for its LLM to answer.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    post.abstract = result.content.text

    posts.append(post)

    # return the complete blog post (draft + generated abstract)
    return json.dumps({
        "id": post.title,
        "abstract": post.abstract
    })


if __name__ == "__main__":
    print("Starting server...")
    mcp.run(transport="streamable-http")

# run with: python server.py

diff --git a/03-GettingStarted/12-sampling/solution/README.md b/03-GettingStarted/12-sampling/solution/README.md
new file mode 100644
index 000000000..7b8127226
--- /dev/null
+++ b/03-GettingStarted/12-sampling/solution/README.md
@@ -0,0 +1,4 @@
Solutions:

- [Python](./python/README.md)
- [TypeScript](./typescript/README.md)

diff --git a/03-GettingStarted/12-sampling/solution/python/README.md b/03-GettingStarted/12-sampling/solution/python/README.md
new file mode 100644
index 000000000..9e68286d9
--- /dev/null
+++ b/03-GettingStarted/12-sampling/solution/python/README.md
@@ -0,0 +1,48 @@
# Running this sample

It's recommended to install `uv`, but it's not a must; see [instructions](https://docs.astral.sh/uv/#highlights).

## -0- Create a virtual environment

```bash
python -m venv venv
```

## -1- Activate the virtual environment

```bash
source ./venv/bin/activate
```

On Windows, use `venv\Scripts\activate` instead.

## -2- Install the dependencies

```bash
pip install "mcp[cli]"
```

## -3- Run the sample

The easiest way to exercise the solution is to run the included client, which starts the server over stdio and services its sampling requests with GitHub Models (it needs a `GITHUB_TOKEN` environment variable): `python client.py`. Alternatively you can serve the SSE app with `uvicorn server:app --port 8000` and connect from an MCP host, or run the server directly:

```bash
mcp run server.py
```

## -4- Test the sample

If you serve the SSE app, add it to *mcp.json*, e.g. something like so (the server name is up to you):

```json
"servers": {
    "product-server": {
        "type": "sse",
        "url": "http://localhost:8000/sse"
    }
}
```

Start the server and type a prompt similar to:

```text
create a product called "paprika" with the keywords "red, juicy, vegetable"
```

You should see output containing the created product as JSON, including a "description" field generated by the client's LLM.

diff --git a/03-GettingStarted/12-sampling/solution/python/client.py b/03-GettingStarted/12-sampling/solution/python/client.py
new file mode 100644
index 000000000..bab62ab4f
--- /dev/null
+++ b/03-GettingStarted/12-sampling/solution/python/client.py
@@ -0,0 +1,121 @@
"""
Client that starts the solution server over stdio and services its sampling
requests with GitHub Models.

Run with:
    python client.py
(requires a GITHUB_TOKEN environment variable)
"""

import asyncio
import os

from openai import OpenAI

from mcp import ClientSession, StdioServerParameters, types
from mcp.client.stdio import stdio_client
from mcp.shared.context import RequestContext

# Create server parameters for stdio connection
server_params = StdioServerParameters(
    command="python",  # Using python to run the server
    args=["server.py"]
)


async def call_llm(prompt: str, system_prompt: str) -> str:
    """Call an LLM via GitHub Models to answer the sampling request."""
    client = OpenAI(
        base_url="https://models.github.ai/inference",
        api_key=os.environ["GITHUB_TOKEN"],
    )

    response = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="openai/gpt-4o-mini",
        temperature=1,
        max_tokens=200,
        top_p=1
    )

    return response.choices[0].message.content


# Sampling callback: invoked whenever the server sends a sampling request
async def handle_sampling_message(
    context: RequestContext[ClientSession, None], params: types.CreateMessageRequestParams
) -> types.CreateMessageResult:
    print(f"Sampling request: {params.messages}")

    message = params.messages[0].content.text

    response = await call_llm(
        message,
        "You're a helpful assistant, keep to the topic, don't make things up too much but definitely create a compelling product description",
    )

    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(
            type="text",
            text=response,
        ),
        model="openai/gpt-4o-mini",  # report the model that was actually used
        stopReason="endTurn",
    )


async def run():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write, sampling_callback=handle_sampling_message) as session:
            # Initialize the connection
            await session.initialize()

            # Call the create_product tool; the server answers with a sampling
            # request that handle_sampling_message services.
            result = await session.call_tool(
                "create_product",
                arguments={"product_name": "paprika", "keywords": "red, juicy, vegetable"},
            )
            print("result:", result.content[0].text)


def main():
    """Entry point for the client script."""
    asyncio.run(run())


if __name__ == "__main__":
    main()

diff --git a/03-GettingStarted/12-sampling/solution/python/server.py b/03-GettingStarted/12-sampling/solution/python/server.py
new file mode 100644
index 000000000..ff9069fff
--- /dev/null
+++ b/03-GettingStarted/12-sampling/solution/python/server.py
@@ -0,0 +1,75 @@
import json
from typing import List

from pydantic import BaseModel
from starlette.applications import Starlette
from starlette.routing import Mount

from mcp.server.fastmcp import Context, FastMCP
from mcp.server.session import ServerSession
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("My App")


class Product(BaseModel):
    id: int
    name: str
    description: str

    def __init__(self, name: str, description: str):
        # Auto-assign an incrementing id based on how many products exist.
        super().__init__(
            id=len(products) + 1,
            name=name,
            description=description
        )


products: List[Product] = []


@mcp.tool()
async def create_product(product_name: str, keywords: str, ctx: Context[ServerSession, None]) -> str:
    """Create a product and generate a product description using LLM sampling."""

    product = Product(name=product_name, description="")

    prompt = f"Create a product description about {product_name} described as {keywords}"

    # Ask the client's LLM to write the description via a sampling request.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    product.description = result.content.text

    products.append(product)

    # return the complete product
    return json.dumps({
        "id": product.id,
        "name": product.name,
        "description": product.description
    })


# Mount the SSE server to an ASGI app so it can also be served over HTTP
app = Starlette(
    routes=[
        Mount('/', app=mcp.sse_app()),
    ]
)

if __name__ == "__main__":
    print("Starting server...")
    mcp.run()

# run the SSE app with: uvicorn server:app --port 8000 (from this directory)

diff --git a/03-GettingStarted/12-sampling/solution/typescript/README.md b/03-GettingStarted/12-sampling/solution/typescript/README.md
new file mode 100644
index 000000000..e69de29bb