Amazon Bedrock Knowledge Base MCP Server

create_data_source

Add an S3 data source to an Amazon Bedrock Knowledge Base, specifying bucket location, optional folder prefixes, and custom parsing or chunking strategies for document processing.

Instructions

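Creates a data source in a Knowledge Base.

A data source defines where the Knowledge Base retrieves its data from. You specify an S3 bucket and can optionally include only specific prefixes (folders). Specifying parsing and chunking settings lets you apply a different processing method to each data source.

Note: Even when the Knowledge Base's storage_type is S3_VECTORS, the data source type is always S3. These are distinct concepts:

storage_type: Storage setting of the Knowledge Base (S3 or S3_VECTORS)
dataSourceConfiguration.type: Type of the data source (S3, WEB, CONFLUENCE, etc.)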
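
As a rough illustration of the distinction above, the sketch below builds a dataSourceConfiguration dictionary from the tool's flat arguments. The helper name build_data_source_configuration is hypothetical and does not come from the server's source; the key names (type, s3Configuration, bucketArn, inclusionPrefixes) follow the handler docstring quoted under Implementation Reference.

```python
from typing import Any, Dict


def build_data_source_configuration(
    bucket_arn: str, inclusion_prefixes: str = ""
) -> Dict[str, Any]:
    """Hypothetical helper: map the tool's flat arguments to dataSourceConfiguration."""
    # The data source type stays "S3" even when the Knowledge Base
    # storage_type is S3_VECTORS; the two settings are unrelated.
    config: Dict[str, Any] = {
        "type": "S3",
        "s3Configuration": {"bucketArn": bucket_arn},
    }
    # Split the comma-separated string, dropping blanks and stray whitespace.
    prefixes = [p.strip() for p in inclusion_prefixes.split(",") if p.strip()]
    if prefixes:
        # An empty string means "all objects", so the key is simply omitted.
        config["s3Configuration"]["inclusionPrefixes"] = prefixes
    return config
```

Calling it with "documents/,pdfs/" would yield a two-element inclusionPrefixes list, while an empty string leaves the key out so that the whole bucket is in scope.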

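Args:
knowledge_base_id: ID of the Knowledge Base to add the data source to
name: Name of the data source (1-100 characters)
source_type: Data source type (currently only 'S3' is supported; default: 'S3')
bucket_arn: ARN of the S3 bucket to use as the data source (format: arn:aws:s3:::BUCKET_NAME)
inclusion_prefixes: Comma-separated string of S3 prefixes to include (optional)
    Example: multiple prefixes can be given, as in "documents/,images/"
    If the string is empty, all objects in the bucket are included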

Parsing settings (optional):
parsing_strategy: Parsing strategy
    - 'BEDROCK_FOUNDATION_MODEL': Parsing with a foundation model
      (can process multimodal data such as images, tables, and graphs; the prompt is customizable)
    - 'BEDROCK_DATA_AUTOMATION': Parsing with Bedrock Data Automation
      (can process multimodal data; fully managed; no additional prompt required)
    Note: If not specified, the Knowledge Base default setting is used
parsing_model_arn: ARN of the foundation model (required when parsing_strategy='BEDROCK_FOUNDATION_MODEL')
    Example: "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
    Supported models: Claude 3 Sonnet, Claude 3 Opus, Claude 3 Haiku, etc.
parsing_modality: Multimodal setting
    - 'MULTIMODAL': Process both text and images (optional)
parsing_prompt_text: Text of the parsing prompt (optional)
    Text that tells the foundation model how to interpret the document
    Example: "Extract all text, tables, and figures from this document."
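
The parsing parameters above could be assembled into a parsingConfiguration dictionary roughly as follows. This is a sketch, not the server's code: the helper name build_parsing_configuration is hypothetical, and the nested key names (parsingStrategy, bedrockFoundationModelConfiguration, modelArn, parsingModality, parsingPrompt, bedrockDataAutomationConfiguration) reflect my understanding of the Bedrock CreateDataSource request shape and should be checked against the API reference.

```python
from typing import Any, Dict, Optional


def build_parsing_configuration(
    parsing_strategy: str = "",
    parsing_model_arn: str = "",
    parsing_modality: str = "",
    parsing_prompt_text: str = "",
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: map flat parsing arguments to parsingConfiguration."""
    if not parsing_strategy:
        return None  # nothing specified: the Knowledge Base default applies

    config: Dict[str, Any] = {"parsingStrategy": parsing_strategy}
    if parsing_strategy == "BEDROCK_FOUNDATION_MODEL":
        # The model ARN is mandatory for foundation-model parsing.
        if not parsing_model_arn:
            raise ValueError(
                "parsing_model_arn is required for BEDROCK_FOUNDATION_MODEL"
            )
        fm_config: Dict[str, Any] = {"modelArn": parsing_model_arn}
        if parsing_modality:
            fm_config["parsingModality"] = parsing_modality
        if parsing_prompt_text:
            fm_config["parsingPrompt"] = {"parsingPromptText": parsing_prompt_text}
        config["bedrockFoundationModelConfiguration"] = fm_config
    elif parsing_strategy == "BEDROCK_DATA_AUTOMATION":
        # Fully managed: no model ARN or prompt, only the optional modality.
        if parsing_modality:
            config["bedrockDataAutomationConfiguration"] = {
                "parsingModality": parsing_modality
            }
    else:
        raise ValueError(f"unsupported parsing_strategy: {parsing_strategy!r}")
    return config
```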

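Chunking settings (optional):
chunking_strategy: Chunking strategy
    - 'FIXED_SIZE': Split into fixed-size chunks (recommended: max_tokens=1000, overlap_percentage=20)
    - 'HIERARCHICAL': Split into hierarchical chunks (two layers: large chunks and small chunks)
    - 'SEMANTIC': Split into semantic chunks (uses NLP to group similar content)
    - 'NONE': Do not split (each file is treated as a single chunk)
    Note: If not specified, the Knowledge Base default setting is used
chunking_max_tokens: Maximum number of tokens (used when chunking_strategy='FIXED_SIZE' or 'SEMANTIC')
    - FIXED_SIZE: 1 or more (recommended: 500-2000)
    - SEMANTIC: 1 or more (recommended: 1000-3000)
chunking_overlap_percentage: Overlap percentage (used when chunking_strategy='FIXED_SIZE')
    Range: 1-99 (recommended: 10-30)
    Percentage of tokens that overlap between adjacent chunks
chunking_overlap_tokens: Number of overlap tokens (used when chunking_strategy='HIERARCHICAL')
    Number of overlapping tokens used in hierarchical chunking
chunking_buffer_size: Buffer size (used when chunking_strategy='SEMANTIC')
    Range: 0-1 (recommended: 1)
    Size of the moving context window used when comparing sentences
chunking_breakpoint_threshold: Breakpoint percentile threshold (used when chunking_strategy='SEMANTIC')
    Range: 50-99 (recommended: 80-95)
    Similarity threshold for splitting chunks (lower values create more chunks)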
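
To make the strategy-to-parameter mapping above concrete, here is a minimal sketch that turns the flat chunking arguments into a chunkingConfiguration dictionary and enforces the documented ranges. The helper name build_chunking_configuration is hypothetical, and the nested key names are my understanding of the Bedrock request shape. The tool exposes no per-level token limits for HIERARCHICAL, so parent_max_tokens and child_max_tokens are assumed parameters with placeholder values.

```python
from typing import Any, Dict, Optional


def build_chunking_configuration(
    chunking_strategy: str = "",
    chunking_max_tokens: int = 0,
    chunking_overlap_percentage: int = 0,
    chunking_overlap_tokens: int = 0,
    chunking_buffer_size: int = 0,
    chunking_breakpoint_threshold: int = 0,
    parent_max_tokens: int = 1500,  # assumption: not a parameter of the tool
    child_max_tokens: int = 300,  # assumption: not a parameter of the tool
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: map flat chunking arguments to chunkingConfiguration."""
    if not chunking_strategy:
        return None  # nothing specified: the Knowledge Base default applies

    config: Dict[str, Any] = {"chunkingStrategy": chunking_strategy}
    if chunking_strategy == "NONE":
        return config  # each file becomes one chunk; no extra settings

    if chunking_strategy == "FIXED_SIZE":
        if chunking_max_tokens < 1:
            raise ValueError("chunking_max_tokens must be 1 or more")
        if not 1 <= chunking_overlap_percentage <= 99:
            raise ValueError("chunking_overlap_percentage must be in 1-99")
        config["fixedSizeChunkingConfiguration"] = {
            "maxTokens": chunking_max_tokens,
            "overlapPercentage": chunking_overlap_percentage,
        }
    elif chunking_strategy == "HIERARCHICAL":
        if chunking_overlap_tokens < 1:
            raise ValueError("chunking_overlap_tokens must be 1 or more")
        config["hierarchicalChunkingConfiguration"] = {
            # Two layers: large parent chunks, then small child chunks.
            "levelConfigurations": [
                {"maxTokens": parent_max_tokens},
                {"maxTokens": child_max_tokens},
            ],
            "overlapTokens": chunking_overlap_tokens,
        }
    elif chunking_strategy == "SEMANTIC":
        if chunking_max_tokens < 1:
            raise ValueError("chunking_max_tokens must be 1 or more")
        if chunking_buffer_size not in (0, 1):
            raise ValueError("chunking_buffer_size must be 0 or 1")
        if not 50 <= chunking_breakpoint_threshold <= 99:
            raise ValueError("chunking_breakpoint_threshold must be in 50-99")
        config["semanticChunkingConfiguration"] = {
            "maxTokens": chunking_max_tokens,
            "bufferSize": chunking_buffer_size,
            "breakpointPercentileThreshold": chunking_breakpoint_threshold,
        }
    else:
        raise ValueError(f"unsupported chunking_strategy: {chunking_strategy!r}")
    return config
```

Because the tool's integer defaults are all 0, a sketch like this treats 0 as an invalid value for the range-checked fields rather than silently substituting the recommended values; the real server may behave differently.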

Returns:
DataSourceResponseDict: Result of creating the data source
    - data_source_id: ID of the created data source
    - status: Status of the data source (e.g. 'CREATING', 'ACTIVE', 'FAILED')

Raises:
ValueError: If an input value is invalid (invalid source_type, validation error, etc.)
    - source_type is anything other than 'S3'
    - parsing_strategy='BEDROCK_FOUNDATION_MODEL' is given without parsing_model_arn
    - The ARN format is invalid
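
The three error conditions listed above could be checked up front along these lines. This is only a sketch: the function name validate_create_data_source_args is hypothetical, and the bucket ARN pattern is an assumption (it accepts only the standard aws partition and a simplified bucket-name alphabet), since the server's actual validation is not shown in this listing.

```python
import re

# Assumed pattern for arn:aws:s3:::BUCKET_NAME; the real check may differ.
_BUCKET_ARN_RE = re.compile(r"^arn:aws:s3:::[a-z0-9][a-z0-9.\-]{1,61}[a-z0-9]$")


def validate_create_data_source_args(
    source_type: str,
    bucket_arn: str,
    parsing_strategy: str = "",
    parsing_model_arn: str = "",
) -> None:
    """Hypothetical pre-flight check; raises ValueError on invalid input."""
    if source_type != "S3":
        raise ValueError(f"unsupported source_type: {source_type!r}")
    if not _BUCKET_ARN_RE.match(bucket_arn):
        raise ValueError(f"invalid bucket ARN: {bucket_arn!r}")
    if parsing_strategy == "BEDROCK_FOUNDATION_MODEL" and not parsing_model_arn:
        raise ValueError(
            "parsing_model_arn is required for BEDROCK_FOUNDATION_MODEL"
        )
```

Failing fast before the AWS call keeps the error message local and specific instead of surfacing it later as an API-side rejection.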

Examples:
# Create a basic data source (default settings)
create_data_source(
    knowledge_base_id="KB123",
    name="My Data Source",
    source_type="S3",
    bucket_arn="arn:aws:s3:::my-bucket"
)

# Data source that includes only specific prefixes
create_data_source(
    knowledge_base_id="KB123",
    name="Documents Only",
    source_type="S3",
    bucket_arn="arn:aws:s3:::my-bucket",
    inclusion_prefixes="documents/,pdfs/"
)

# Use custom parsing and chunking settings
create_data_source(
    knowledge_base_id="KB123",
    name="Custom Data Source",
    source_type="S3",
    bucket_arn="arn:aws:s3:::my-bucket",
    parsing_strategy="BEDROCK_FOUNDATION_MODEL",
    parsing_model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
    parsing_modality="MULTIMODAL",
    chunking_strategy="FIXED_SIZE",
    chunking_max_tokens=1000,
    chunking_overlap_percentage=20
)

Input Schema

Name	Required	Default
`knowledge_base_id`	Yes
`name`	Yes
`source_type`	No	S3
`bucket_arn`	No
`inclusion_prefixes`	No
`parsing_strategy`	No
`parsing_model_arn`	No
`parsing_modality`	No
`parsing_prompt_text`	No
`chunking_strategy`	No
`chunking_max_tokens`	No
`chunking_overlap_percentage`	No
`chunking_overlap_tokens`	No
`chunking_buffer_size`	No
`chunking_breakpoint_threshold`	No

Output Schema

Name	Required	Description	Default
`status`	Yes
`data_source_id`	Yes

Implementation Reference

src/bedrock_kb_mcp_server/bedrock_client.py:337-385 (handler)

The actual implementation that calls the Bedrock API to create the data source.

def create_data_source(
    self,
    knowledge_base_id: str,
    name: str,
    data_source_configuration: Dict[str, Any],
    vector_ingestion_configuration: Optional[Dict[str, Any]] = None,
) -> DataSourceResponseDict:
    """
    Creates a data source in a Knowledge Base.

    A data source defines where the Knowledge Base retrieves its data from.
    Usually this is an S3 bucket.

    Args:
        knowledge_base_id: ID of the Knowledge Base to add the data source to
        name: Name of the data source
        data_source_configuration: Dictionary of data source settings
            - type: Data source type (currently only "S3" is supported)
            - s3Configuration: S3 settings
                - bucketArn: S3 bucket ARN
                - inclusionPrefixes (optional): List of S3 prefixes to include
        vector_ingestion_configuration: Dictionary of vector ingestion settings (optional)
            - parsingConfiguration: Parsing settings (optional)
            - chunkingConfiguration: Chunking settings (optional)

    Returns:
        DataSourceResponseDict: Result of creating the data source
            - data_source_id: ID of the created data source
            - status: Status of the data source

    Raises:
        ClientError: If the AWS API call fails
    """
    try:
        # Build the API call parameters
        api_params = {
            "knowledgeBaseId": knowledge_base_id,
            "name": name,
            "dataSourceConfiguration": data_source_configuration,
        }

        # Add the vector ingestion configuration if one was given;
        # it can carry parsing and chunking settings
        if vector_ingestion_configuration:
            api_params["vectorIngestionConfiguration"] = vector_ingestion_configuration

        # Call the AWS Bedrock API to create the data source
        response = self.bedrock_agent.create_data_source(**api_params)
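
The excerpt above stops at the API call, so how the response is turned into a DataSourceResponseDict is not shown. The sketch below is one plausible way to do it, assuming the response nests the result under a "dataSource" key with "dataSourceId" and "status" fields, which is my understanding of the Bedrock Agent client's return shape. The function name create_data_source_sketch and the TypedDict definition are illustrative stand-ins, and the ClientError handling from the original is omitted.

```python
from typing import Any, Dict, Optional, TypedDict


class DataSourceResponseDict(TypedDict):
    """Stand-in for the server's response type (fields from its docstring)."""

    data_source_id: str
    status: str


def create_data_source_sketch(
    bedrock_agent: Any,
    knowledge_base_id: str,
    name: str,
    data_source_configuration: Dict[str, Any],
    vector_ingestion_configuration: Optional[Dict[str, Any]] = None,
) -> DataSourceResponseDict:
    """Hypothetical completion: call the client and map its response."""
    api_params: Dict[str, Any] = {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": data_source_configuration,
    }
    if vector_ingestion_configuration:
        api_params["vectorIngestionConfiguration"] = vector_ingestion_configuration

    response = bedrock_agent.create_data_source(**api_params)

    # Assumed response shape: {"dataSource": {"dataSourceId": ..., "status": ...}}
    data_source = response["dataSource"]
    return {
        "data_source_id": data_source["dataSourceId"],
        "status": data_source["status"],
    }
```

Passing the client in as an argument, rather than reading it from self, is a choice made here only so the sketch can be exercised with a stub object.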

src/bedrock_kb_mcp_server/main.py:518-605 (handler)

The MCP tool handler function which invokes the bedrock_client.create_data_source logic. Note: The code provided for handle_create_data_source spans from line 518 to 722 in main.py.

def create_data_source(
    knowledge_base_id: str,
    name: str,
    source_type: str = "S3",
    bucket_arn: str = "",
    inclusion_prefixes: str = "",
    # Parsing settings (optional)
    parsing_strategy: str = "",
    parsing_model_arn: str = "",
    parsing_modality: str = "",
    parsing_prompt_text: str = "",
    # Chunking settings (optional)
    chunking_strategy: str = "",
    chunking_max_tokens: int = 0,
    chunking_overlap_percentage: int = 0,
    chunking_overlap_tokens: int = 0,
    chunking_buffer_size: int = 0,
    chunking_breakpoint_threshold: int = 0,
) -> DataSourceResponseDict:
    """
    Creates a data source in a Knowledge Base.

    A data source defines where the Knowledge Base retrieves its data from.
    You specify an S3 bucket and can optionally include only specific
    prefixes (folders). Specifying parsing and chunking settings lets you
    apply a different processing method to each data source.

    Note: Even when the Knowledge Base's `storage_type` is `S3_VECTORS`, the data source `type`
    is always `S3`. These are distinct concepts:
    - `storage_type`: Storage setting of the Knowledge Base (S3 or S3_VECTORS)
    - `dataSourceConfiguration.type`: Type of the data source (S3, WEB, CONFLUENCE, etc.)

    Args:
        knowledge_base_id: ID of the Knowledge Base to add the data source to
        name: Name of the data source (1-100 characters)
        source_type: Data source type (currently only 'S3' is supported; default: 'S3')
        bucket_arn: ARN of the S3 bucket to use as the data source (format: arn:aws:s3:::BUCKET_NAME)
        inclusion_prefixes: Comma-separated string of S3 prefixes to include (optional)
            Example: multiple prefixes can be given, as in "documents/,images/"
            If the string is empty, all objects in the bucket are included

        Parsing settings (optional):
        parsing_strategy: Parsing strategy
            - 'BEDROCK_FOUNDATION_MODEL': Parsing with a foundation model
              (can process multimodal data such as images, tables, and graphs; the prompt is customizable)
            - 'BEDROCK_DATA_AUTOMATION': Parsing with Bedrock Data Automation
              (can process multimodal data; fully managed; no additional prompt required)
            Note: If not specified, the Knowledge Base default setting is used
        parsing_model_arn: ARN of the foundation model (required when parsing_strategy='BEDROCK_FOUNDATION_MODEL')
            Example: "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            Supported models: Claude 3 Sonnet, Claude 3 Opus, Claude 3 Haiku, etc.
        parsing_modality: Multimodal setting
            - 'MULTIMODAL': Process both text and images (optional)
        parsing_prompt_text: Text of the parsing prompt (optional)
            Text that tells the foundation model how to interpret the document
            Example: "Extract all text, tables, and figures from this document."

        Chunking settings (optional):
        chunking_strategy: Chunking strategy
            - 'FIXED_SIZE': Split into fixed-size chunks (recommended: max_tokens=1000, overlap_percentage=20)
            - 'HIERARCHICAL': Split into hierarchical chunks (two layers: large chunks and small chunks)
            - 'SEMANTIC': Split into semantic chunks (uses NLP to group similar content)
            - 'NONE': Do not split (each file is treated as a single chunk)
            Note: If not specified, the Knowledge Base default setting is used
        chunking_max_tokens: Maximum number of tokens (used when chunking_strategy='FIXED_SIZE' or 'SEMANTIC')
            - FIXED_SIZE: 1 or more (recommended: 500-2000)
            - SEMANTIC: 1 or more (recommended: 1000-3000)
        chunking_overlap_percentage: Overlap percentage (used when chunking_strategy='FIXED_SIZE')
            Range: 1-99 (recommended: 10-30)
            Percentage of tokens that overlap between adjacent chunks
        chunking_overlap_tokens: Number of overlap tokens (used when chunking_strategy='HIERARCHICAL')
            Number of overlapping tokens used in hierarchical chunking
        chunking_buffer_size: Buffer size (used when chunking_strategy='SEMANTIC')
            Range: 0-1 (recommended: 1)
            Size of the moving context window used when comparing sentences
        chunking_breakpoint_threshold: Breakpoint percentile threshold (used when chunking_strategy='SEMANTIC')
            Range: 50-99 (recommended: 80-95)
            Similarity threshold for splitting chunks (lower values create more chunks)

    Returns:
        DataSourceResponseDict: Result of creating the data source
            - data_source_id: ID of the created data source
            - status: Status of the data source (e.g. 'CREATING', 'ACTIVE', 'FAILED')

    Raises:
        ValueError: If an input value is invalid (invalid source_type, validation error, etc.)
            - source_type is anything other than 'S3'
            - parsing_strategy='BEDROCK_FOUNDATION_MODEL' is given without parsing_model_arn

src/bedrock_kb_mcp_server/models.py:434-455 (schema)

The Pydantic model for validating the CreateDataSource MCP tool input.

class CreateDataSourceRequest(BaseModel):
    """
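    Validation model for data source creation requests.

    Defines the parameters required to create a data source and
    validates the input values.

    Attributes:
        knowledge_base_id: ID of the Knowledge Base to add the data source to
        name: Name of the data source (1-100 characters)
        source_type: Data source type (currently only S3 is supported; default: S3)
        bucket_arn: ARN or S3 URI of the S3 bucket to use as the data source
        inclusion_prefixes: Comma-separated string of S3 prefixes to include (optional)
        vector_ingestion_configuration: Vector ingestion settings (optional)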
    """
    # Knowledge Base ID: required
    knowledge_base_id: str = Field(..., description="Knowledge Base ID")

    # Data source name: required, 1-100 characters
    name: str = Field(..., min_length=1, max_length=100, description="Data source name")

    # Data source type: defaults to S3

Tool Definition Quality

Grade A, 4.5/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does well by explaining the creation operation, clarifying conceptual distinctions, noting current limitations (only S3 supported), and documenting error conditions (ValueError cases). It could improve by mentioning authentication requirements or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (overview, args, parsing settings, chunking settings, returns, raises, examples). While comprehensive, it's appropriately sized for a complex tool with many parameters. Some sections could be more concise, but overall it's well-organized and front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (15 parameters, 0% schema coverage, no annotations), the description provides comprehensive documentation. It covers purpose, parameters, return values, error conditions, and includes practical examples. The presence of an output schema means the description doesn't need to fully explain return values, and it appropriately focuses on usage guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and 15 parameters, the description provides extensive parameter documentation beyond what the schema offers. It explains each parameter's purpose, provides examples, notes optional vs required status, documents dependencies between parameters, and gives recommended values for many settings.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a data source for a Knowledge Base, specifying it defines where data is retrieved from. It distinguishes from siblings by focusing on data source creation rather than other operations like listing, retrieving, or updating knowledge bases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool (to create data sources for Knowledge Bases) and clarifies important conceptual distinctions between storage_type and dataSourceConfiguration.type. However, it doesn't explicitly mention when NOT to use it or name specific alternative tools for different scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/r3-yamauchi/bedrock-kb-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.