iOS Development MCP Server

FoundationModels-Using-on-device-LLM-in-your-app.md•12 KiB

# Foundation Models: Using Apple's On-Device LLM in Your Apps

## Overview

Foundation Models is an Apple framework that provides access to the on-device large language models (LLMs) that power Apple Intelligence. The framework lets developers add generative AI capabilities to their apps without requiring cloud connectivity or compromising user privacy.

Key capabilities include:

- Text generation and understanding
- Content summarization and extraction
- Structured data generation
- Custom tool integration

## Getting Started

### Check Model Availability

Always check whether the model is available before attempting to use it. Availability depends on device factors such as Apple Intelligence support, system settings, and device state.

```swift
import SwiftUI
import FoundationModels

struct GenerativeView: View {
    // Create a reference to the system language model
    private var model = SystemLanguageModel.default

    var body: some View {
        switch model.availability {
        case .available:
            // Show your intelligence UI
            Text("Model is available")
        case .unavailable(.deviceNotEligible):
            // Show an alternative UI
            Text("Device not eligible for Apple Intelligence")
        case .unavailable(.appleIntelligenceNotEnabled):
            // Ask the person to turn on Apple Intelligence
            Text("Please enable Apple Intelligence in Settings")
        case .unavailable(.modelNotReady):
            // The model isn't ready (downloading or other system reasons)
            Text("Model is downloading or not ready")
        case .unavailable(let other):
            // The model is unavailable for an unknown reason
            Text("Model unavailable: \(other)")
        }
    }
}
```

### Create a Session

After confirming model availability, create a `LanguageModelSession` to interact with the model:

```swift
// Create a basic session with the system model
let session = LanguageModelSession()

// Create a session with instructions
let instructions = """
    You are a helpful assistant that provides concise answers.
    Keep responses under 100 words and focus on clarity.
    """
let sessionWithInstructions = LanguageModelSession(instructions: instructions)
```

- For single-turn interactions, create a new session each time
- For multi-turn interactions, reuse the same session to maintain context

## Basic Usage

### Provide Instructions to the Model

Instructions help steer the model's behavior for your specific use case. The model prioritizes instructions over prompts.

Good instructions typically specify:

- The model's role (e.g., "You are a mentor")
- What the model should do (e.g., "Help extract calendar events")
- Style preferences (e.g., "Respond as briefly as possible")
- Safety measures (e.g., "Respond with 'I can't help with that' for dangerous requests")

```swift
let instructions = """
    You are a cooking assistant.
    Provide recipe suggestions based on ingredients.
    Keep suggestions brief and practical for home cooks.
    Include approximate cooking time.
    """
let session = LanguageModelSession(instructions: instructions)
```

### Provide a Prompt to the Model

A prompt is the input that the model responds to. Effective prompts are:

- Conversational (questions or commands)
- Focused on a single, specific task
- Clear about the desired output format and length

```swift
// Simple prompt
let prompt = "What's a good month to visit Paris?"

// Specific prompt with output constraints
let specificPrompt = "Write a profile for the dog breed Siberian Husky using three sentences."
```

### Generate a Response

Call the model asynchronously to get a response:

```swift
// Basic response generation
let response = try await session.respond(to: prompt)
print(response.content)

// With custom generation options
let options = GenerationOptions(temperature: 0.7)
let customResponse = try await session.respond(to: prompt, options: options)
```

Note: A session can only handle one request at a time. Check `isResponding` to verify that the session is idle before sending a new request.

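The one-request-at-a-time rule can be enforced with a small guard around `respond(to:)`. The following is a minimal sketch, not a framework API: the helper name `sendIfIdle` is hypothetical, and it assumes the caller would rather skip a prompt than queue it while the session is busy. Only `isResponding`, `respond(to:)`, and `content` come from the text above.

```swift
import FoundationModels

/// Hypothetical helper (not part of FoundationModels): sends `prompt`
/// only when the session is idle, and returns nil when a request is
/// already in flight.
func sendIfIdle(
    _ prompt: String,
    using session: LanguageModelSession
) async throws -> String? {
    // A session handles one request at a time, so skip rather than overlap.
    guard !session.isResponding else { return nil }

    let response = try await session.respond(to: prompt)
    return response.content
}
```

In a SwiftUI view, the same `isResponding` check could also be used to disable a send button while a response is being generated.
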
## Advanced Features

### Guided Generation

Guided generation lets you receive model responses as structured Swift data instead of raw strings. This provides stronger guarantees about the format of the response.

#### 1. Define a Generable Type

```swift
@Generable(description: "Basic profile information about a cat")
struct CatProfile {
    // A guide isn't necessary for basic fields
    var name: String

    @Guide(description: "The age of the cat", .range(0...20))
    var age: Int

    @Guide(description: "A one sentence profile about the cat's personality")
    var profile: String
}
```

#### 2. Request a Response in Your Custom Type

```swift
// Generate a response using the custom type
let catResponse = try await session.respond(
    to: "Generate a cute rescue cat",
    generating: CatProfile.self
)

// Use the structured data
print("Name: \(catResponse.content.name)")
print("Age: \(catResponse.content.age)")
print("Profile: \(catResponse.content.profile)")
```

#### 3. Print a Response from Your Custom Type

When printing values from a `LanguageModelSession.Response`, always use the instance property `content`, not `output`. For example:

```swift
import FoundationModels
import Playgrounds

@Generable
struct CookbookSuggestions {
    @Guide(description: "Cookbook Suggestions", .count(3))
    var suggestions: [String]
}

#Playground {
    let session = LanguageModelSession()
    let prompt = "What's a good name for a cooking app?"
    let response = try await session.respond(
        to: prompt,
        generating: CookbookSuggestions.self
    )

    // Notice that printed values come from content, not output.
    print(response.content.suggestions)
}
```

### Tool Calling

Tool calling lets the model use custom code you provide to perform specific tasks, access external data, or integrate with other frameworks.

#### 1. Create a Custom Tool

```swift
// Define a tool for searching recipes
struct RecipeSearchTool: Tool {
    // The model reads the name and description to decide when to call the tool
    let name = "searchRecipes"
    let description = "Searches the recipe database for recipes matching a search term"

    // Arguments must be Generable so the model can produce them
    @Generable
    struct Arguments {
        var searchTerm: String
        var numberOfResults: Int
    }

    // Earlier SDK betas wrapped the result in ToolOutput; a String is returned here
    func call(arguments: Arguments) async throws -> String {
        // Search your recipe database
        let recipes = await searchRecipes(term: arguments.searchTerm, limit: arguments.numberOfResults)

        // Return results as a string the model can use
        return recipes.map { "- \($0.name): \($0.description)" }.joined(separator: "\n")
    }

    private func searchRecipes(term: String, limit: Int) async -> [Recipe] {
        // Implementation to search your database
        // ...
    }
}
```

#### 2. Provide the Tool to a Session

```swift
// Create the tool
let recipeSearchTool = RecipeSearchTool()

// Create a session with the tool
let session = LanguageModelSession(tools: [recipeSearchTool])

// The model will automatically use the tool when appropriate
let response = try await session.respond(to: "Find me some pasta recipes")
```

#### 3. Handle Tool Errors

```swift
do {
    let answer = try await session.respond(to: "Find a recipe for tomato soup.")
} catch let error as LanguageModelSession.ToolCallError {
    // Access the name of the tool
    print(error.tool.name)

    // Access the underlying error (RecipeSearchToolError is your own error type)
    if case .databaseIsEmpty = error.underlyingError as? RecipeSearchToolError {
        // Handle specific error
    }
} catch {
    print("Other error: \(error)")
}
```

## Snapshot streaming

- LLMs generate text as short groups of characters called tokens.
- Typically, streamed output is delivered as a series of deltas, each carrying the newly generated tokens. Foundation Models does this differently.
- As deltas are produced, the responsibility for accumulating them usually falls on the developer.
- You append each delta as it comes in, and the response grows as you do. This gets tricky when the result has structure.
- If you want to show a single field, such as a greeting string, after each delta, you have to parse it out of the accumulated text, and that's not trivial, especially for complicated structures.
- Structured output is at the core of the Foundation Models framework, which is why the framework streams snapshots instead.

### What are snapshots

- Snapshots represent a partially generated response. Their properties are all optional, and they get filled in as the model produces more of the response.
- Snapshots are a robust and convenient representation for streaming structured output.
- You are already familiar with the `@Generable` macro. As it turns out, it's also where the definitions for partially generated types come from.
- If you expand the macro, you'll discover that it produces a type named `PartiallyGenerated`. It is effectively a mirror of the outer structure, except that every property is optional.
- The partially generated type comes into play when you call the `streamResponse` method on your session.

```swift
import FoundationModels
import Playgrounds

@Generable
struct TripIdeas {
    @Guide(description: "Ideas for upcoming trips")
    var ideas: [String]
}

#Playground {
    let session = LanguageModelSession()
    let prompt = "What are some exciting trip ideas for the upcoming year?"
    let stream = session.streamResponse(
        to: prompt,
        generating: TripIdeas.self
    )

    // Each iteration delivers an updated snapshot of the response so far
    for try await partial in stream {
        print(partial)
    }
}
```

- `streamResponse` returns an async sequence, and the elements of that sequence are instances of a partially generated type.
- Each element in the sequence contains an updated snapshot.
- These snapshots work well with declarative frameworks like SwiftUI.
- First, create state that holds a partially generated type.
- Then iterate over the response stream, store its elements in that state, and watch as your UI comes to life.

## Best Practices and Limitations

### Context Size Limits

- The system model supports up to 4,096 tokens per session
- A token is roughly 3-4 characters in languages like English
- All instructions, prompts, and outputs count toward this limit
- If you exceed the limit, you'll get a `LanguageModelSession.GenerationError.exceededContextWindowSize` error
- For large data processing, break the input into smaller chunks and process them across multiple sessions

### Optimizing Performance

- Use `GenerationOptions` to tune model behavior:

  ```swift
  let options = GenerationOptions(temperature: 2.0) // Higher temperature = more creative
  ```

- Use Xcode Instruments to monitor request performance
- Access `Transcript` entries to see model actions during a session:

  ```swift
  let transcript = session.transcript
  ```

### Prompt Engineering Tips

- Be specific about what you want
- Specify output constraints (e.g., "in three sentences")
- Break complex tasks into multiple simple prompts
- Use examples in instructions to guide the model's output format

## References

- [Generating content and performing tasks with Foundation Models](https://developer.apple.com/documentation/FoundationModels/generating-content-and-performing-tasks-with-foundation-models)
- [Generating Swift data structures with guided generation](https://developer.apple.com/documentation/FoundationModels/generating-swift-data-structures-with-guided-generation)
- [Expanding generation with tool calling](https://developer.apple.com/documentation/FoundationModels/expanding-generation-with-tool-calling)
- [Human Interface Guidelines: Generative AI](https://developer.apple.com/design/human-interface-guidelines/technologies/generative-ai)
