# Chat Completion Tools Configuration
# Provides tools for calling OpenAI-compatible LLM inference servers from Teradata
chat_ai_mapreduce:
type: prompt
description: >
Multi-step workflow to answer a high-level question using Teradata SQL
and the chat_aggregatedCompleteChat tool. The agent first builds a
Teradata query, then runs aggregated chat completion, and finally
synthesizes a global answer.
prompt: |
You are going to answer the high-level question: "{question}".
You must answer this question in three steps:
Step 1 – Build the SQL query
- Create a Teradata SQL query that selects the texts relevant to this high-level question.
- The query must return a single column, renamed to "txt".
- If it is reasonable for this question, filter the rows to keep only texts that are relevant to the question.
- If such filtering is not clearly possible or meaningful, skip the filtering and explain why in your reasoning (but not in the SQL).
- Remember you are querying a Teradata database, so use valid Teradata SQL syntax.
- Unless the question explicitly states that there must be no sampling, add a `SAMPLE 1000` clause after any filtering to limit the number of rows.
- Use the available tools to discover actual databases, tables, and columns before writing the final query.
- Do not add a semicolon at the end of the SQL statement.
- Use only plain, simple characters (standard ASCII where possible); avoid smart quotes and other special characters.
- In your final output for this step, provide only the SQL query, nothing else.
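For illustration, a query of the expected shape might look like this (the database, table, and column names below are hypothetical; always discover the real ones with the available tools first):

```sql
-- Hypothetical names, for illustration only
SELECT review_text AS txt
FROM demo_db.customer_reviews
WHERE review_text IS NOT NULL
SAMPLE 1000
```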
Step 2 – Run aggregated chat completion
- Call the tool `chat_aggregatedCompleteChat`.
- Pass the SQL from Step 1 as the `sql` parameter.
- For the `system_message` parameter, construct a system prompt for the LLM that:
- Focuses on a single text row at a time (do not ask about the whole dataset at once).
- Guides the model to produce an answer that contributes to the original high-level question, but from the perspective of just that one text.
- Strongly enforces that the response for each text must be very short (no longer than 2–3 words).
- Instructs the model to return an empty string if the text is not relevant to the high-level question.
- May include examples of possible responses, but must not restrict the output to a fixed closed list.
- Use only plain, simple characters (standard ASCII where possible); avoid smart quotes and other special characters.
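As an illustration only (assuming a hypothetical high-level question about customer complaints), a suitable system prompt might read:

```text
You will receive a single customer review. In at most 2-3 words, name the
main complaint it expresses (for example: "late delivery", "poor quality",
"rude support" - these are examples, not a fixed list). If the review is
not relevant to customer complaints, return an empty string.
```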
Step 3 – Synthesize the aggregated answer
- Use the aggregated results from Step 2 (unique `response_txt` values and their counts) to produce a final, high-level answer to the original question.
- When summarizing, remember that the responses in Step 2 were LLM-generated labels, so:
- Different labels might refer to the same underlying reason or category (for example, "bad quality" vs "poor quality").
- Where appropriate, merge or interpret similar responses together.
- Provide a clear, concise summary that explains the dominant patterns and insights that answer the high-level question.
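As a purely hypothetical illustration of this merging step:

```text
Aggregated results: "poor quality": 120, "bad quality": 45, "late delivery": 80
Merged summary: quality issues (~165 mentions) are the dominant theme,
followed by delivery delays (80 mentions).
```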
Follow these three steps exactly; do not invent your own steps.
parameters:
question:
name: question
description: "High-level question you want to answer using aggregated analysis over text data in Teradata. Explicitly state if sampling is allowed or not."
required: true
type_hint: str