# Reexpress MCP Server Guidelines
1. This is not intended for adversarial settings, nor for fully unattended agent settings without a human-in-the-loop. Our pre-trained SDM estimator is intended to enhance your productivity (as the single personal user of the local SDM server) **for AI-assisted software development and data science settings, and to assist with QA questions of a technical nature when connected to web search or your retrieval system over technical documentation or code.** The underlying SDM estimator is calibrated at an alpha of 0.9, so it is neither intended nor suitable for high-risk applications (such as fully unattended pipelines). SDM estimators are what you will need for such settings, but rather than using our general pre-trained estimator, we recommend increasing alpha to 0.99 (or higher, as applicable), calibrating against your domain-specific data, and structuring your deployment to hard-reject any predictions estimated to fall outside the High-Reliability region. We also recommend fine-tuning the underlying language model as an SDM language model, constructing the SDM estimator with direct access to the hidden states of a large model as part of the training process to maximize the proportion of in-domain high-probability verifications, rather than the post-hoc proxy used here over an ensemble of models, which marginalizes over the text of the prompt and response. (See the paper linked in the README for technical details.) If you are interested in training SDM language models at scale, or in building a system to serve other users, contact us.
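For illustration only, here is a minimal sketch of the hard-reject structure described above, under the assumption that your re-calibrated, domain-specific estimator reports a calibrated probability and whether the prediction falls inside the High-Reliability region. The names below (`VerificationResult`, `accept`) are hypothetical and are not part of this server.

```python
# Hypothetical sketch (not part of the Reexpress MCP Server): a hard-reject gate for an
# unattended pipeline, assuming your own domain-calibrated SDM estimator reports a
# calibrated probability and a High-Reliability-region flag for each prediction.
from dataclasses import dataclass

ALPHA = 0.99  # stricter calibration target for high-risk settings, per the guideline above


@dataclass
class VerificationResult:
    probability: float                # calibrated probability from your domain-specific estimator
    in_high_reliability_region: bool  # whether the prediction is estimated to fall in that region


def accept(result: VerificationResult) -> bool:
    """Hard-reject anything outside the High-Reliability region or below the alpha target."""
    return result.in_high_reliability_region and result.probability >= ALPHA
```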
2. Do not expect the model to help you adjudicate social or political questions, nor to elucidate the nuances of current news stories. It is not designed for that, and such questions will tend to be out-of-distribution or have relatively low predictive probabilities. The normal principles of statistical machine learning, and the epistemological limits of predicting future events, making decisions over out-of-distribution concepts, and so on, apply to any learned model. Use your judgement.
3. The MCP server provides flexibility to use across MCP clients, to connect to your own retrieval systems, to use your own prompts, to use other LLM agents, and to construct test-time search and reflection as needed for your tasks. However, that flexibility also leaves more room for weak links in grounded verification than if we tightly controlled the full stack. Some things to keep in mind:
- Simply put, aim not to make the verification harder than it needs to be. **View it as an expert second opinion, but it won't be informative unless you provide sufficient context.** It can't read your mind. Break down the problem into sub-tasks (manually or via LLM assistance), rather than a general "here's a large codebase [or project]; please debug [or fix] it". The latter might be a starting point, but it will often result in a low probability estimate. However, from there, you can use Reexpress to guide the resolution to your questions at a more targeted level.
- For coding, providing the verbatim error messages is often very helpful; it lets the tool-calling LLM and the verification LLMs give you guidance that goes well beyond the error message itself.
- For retrieval settings, an example of breaking a larger problem down into sub-tasks is first asking whether a retrieved document is relevant to your question. For code, you might ask to check the validity of an imported package name or the arguments to a function.
- For questions about APIs and software after the knowledge cutoff dates of the underlying LLMs, you will want to ground the verification with links to official documentation obtained via web search or your internal retrieval system. That's straightforward to do in MCP clients with built-in web search, such as VSCode Copilot and Claude Desktop; it may require a web-search (or retrieval) MCP server in other cases.
- Remember that there's a distinction between the external files that the tool-calling LLM (e.g., Claude Opus 4.1) sees and the files that the verification LLMs of the Reexpress tool see. For small code snippets, it can be reasonable to have the tool-calling LLM send the data to the Reexpress tool, but for larger files, use ReexpressFileSet() (and optionally, the convenience function ReexpressDirectorySet()) to send files directly to the verification LLMs and avoid the tool-calling LLM misrepresenting the grounding document.
- Unless you make the web-search or retrieved source documents directly available to Reexpress via ReexpressFileSet(), low probability estimates may result, and the verification LLMs may warn that it is unclear whether the tool-calling LLM has accurately cited its sources. Currently we do not provide a built-in tool to send web content or retrieved data directly to the verification LLMs, other than saving it to disk and adding it via ReexpressFileSet(); see the sketch after this list.
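As a rough illustration of that workflow, the sketch below saves retrieved content to disk and then registers it with the verification LLMs via ReexpressFileSet(), using the MCP Python client SDK. The server launch command and the tool's argument name are assumptions; check the README and the server's advertised tool schema for the actual values.

```python
# Rough sketch using the MCP Python client SDK (package `mcp`). The server launch command
# and the ReexpressFileSet() argument name below are assumptions; consult the README and
# the server's advertised tool schema for the actual values.
import asyncio
from pathlib import Path

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # 1. Save the retrieved web content to disk so the verification LLMs can read it directly.
    grounding_file = Path("grounding/api_docs_excerpt.md")
    grounding_file.parent.mkdir(parents=True, exist_ok=True)
    grounding_file.write_text("<paste the retrieved documentation excerpt here>")

    # 2. Register the file with the verification LLMs via ReexpressFileSet().
    server = StdioServerParameters(command="python", args=["-m", "reexpress_mcp_server"])  # assumed launch command
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "ReexpressFileSet",
                arguments={"file_paths": [str(grounding_file.resolve())]},  # assumed argument name
            )
            print(result)


asyncio.run(main())
```

In normal use, the tool-calling LLM in your MCP client issues the ReexpressFileSet() call itself; the point is simply that the file contents, rather than the tool-calling LLM's paraphrase of them, are what reach the verification LLMs.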
4. Our recommended prompt for the main Reexpress tool is a good place to start. With Opus 4.1, for example, it will often return to you after only 1-3 turns of checking with the Reexpress tool. That's typically what you want when getting started, until you've adapted the estimator to your tasks (e.g., via ReexpressAddFalse or ReexpressAddTrue). You can also provide additional guidance to the tool-calling LLM, such as instructing it to continue for up to a set number of tool calls until a probability of at least 90% is reached.
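If you prefer to orchestrate that pattern programmatically rather than via prompt guidance, a minimal sketch follows; `call_reexpress` and `revise_answer` are hypothetical placeholders for your own MCP client plumbing and LLM calls, not functions provided by this server.

```python
# Hypothetical sketch of the "keep checking until confident" guidance above. The two helper
# functions are placeholders: wire them to your MCP client session and tool-calling LLM.
MAX_TOOL_CALLS = 3         # cap on Reexpress tool calls, as suggested above
TARGET_PROBABILITY = 0.90  # stop once the verification probability reaches at least 90%


def call_reexpress(question: str, answer: str) -> float:
    """Placeholder: call the Reexpress tool and return its calibrated probability."""
    raise NotImplementedError


def revise_answer(question: str, answer: str, probability: float) -> str:
    """Placeholder: ask the tool-calling LLM to revise its answer given the verification feedback."""
    raise NotImplementedError


def verify_with_retries(question: str, draft_answer: str) -> tuple[str, float]:
    answer, probability = draft_answer, 0.0
    for _ in range(MAX_TOOL_CALLS):
        probability = call_reexpress(question, answer)
        if probability >= TARGET_PROBABILITY:
            break
        answer = revise_answer(question, answer, probability)
    return answer, probability
```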
5. MCP (and LLM agents in general) is something of a wild west in terms of authentication and security permissions. Only add other MCP servers from sources you trust, preferably those with open-source codebases.