Grey Swan LLM Safety Challenge MCP Server
This MongoDB-integrated MCP server is designed for documenting and analyzing LLM safety challenges as part of the Grey Swan Arena competitions.
Introduction
The Grey Swan Arena hosts various AI safety challenges where participants attempt to identify vulnerabilities in AI systems. This MCP server provides tools to document these attempts, track safety challenges, and analyze potentially harmful interactions with LLMs.
Getting Started
Prerequisites
- Node.js (v14 or higher)
- MongoDB (v4.4 or higher)
- Cursor IDE
Installation
- Clone this repository
- Install dependencies
- Create a .env file in the root directory
- Build the server
- Start MongoDB
- Start the MCP server
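The contents of the .env file are not listed in this README. As a rough sketch only, assuming the server reads a MongoDB connection string from a variable such as MONGODB_URI and an optional database name from MONGODB_DB (both names, and the default database name, are hypothetical; check the repository for the variables it actually expects), the settings could be parsed and validated like this:

```typescript
// Hypothetical config shape; the real variable names come from the repository.
interface MongoConfig {
  uri: string;
  dbName: string;
}

// Parse simple KEY=VALUE lines, skipping blank lines and # comments.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // ignore lines without a key
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Validate the parsed values before handing them to a MongoDB client.
function loadMongoConfig(env: Record<string, string | undefined>): MongoConfig {
  const uri = env["MONGODB_URI"]; // assumed variable name
  if (!uri) {
    throw new Error("MONGODB_URI is not set");
  }
  if (!/^mongodb(\+srv)?:\/\//.test(uri)) {
    throw new Error("MONGODB_URI must start with mongodb:// or mongodb+srv://");
  }
  const dbName = env["MONGODB_DB"] || "swanzmcp"; // assumed name and default
  return { uri, dbName };
}
```

In a Node.js process the same check could be run against process.env once the .env file has been loaded.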
Setting Up the MCP Server in Cursor
- Open Cursor
- Go to Cursor Settings > Features > MCP
- Click '+ Add New MCP Server'
- Fill out the form:
  - Name: Grey Swan LLM Safety Challenge
  - Type: stdio
  - Command: node /path/to/SwanzMCP/build/index.js
- Click "Add Server"
Available MongoDB Tools
This MCP server provides six MongoDB tools for documenting LLM safety challenges:
1. mongo_model
Creates or updates organizational identifiers for your testing sessions.
2. mongo_thread
Creates or updates conversation threads with safety challenges.
3. mongo_message
Creates or updates messages in threads, including safety flags.
4. mongo_query_models
Queries organizational identifiers from the database.
5. mongo_query_threads
Queries threads from the database with various filters.
6. mongo_query_messages
Queries messages from the database.
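The argument schemas for these tools are not spelled out here. The sketch below shows one plausible shape for them in TypeScript, pieced together from the terms this README uses (tags, challenges, severity levels, status values, safety flags, safetyFlagsOnly). Apart from the tool names and safetyFlagsOnly, every field name is an assumption made for illustration, not the server's confirmed schema.

```typescript
// The six tool names listed above.
const TOOL_NAMES = [
  "mongo_model",
  "mongo_thread",
  "mongo_message",
  "mongo_query_models",
  "mongo_query_threads",
  "mongo_query_messages",
] as const;
type ToolName = (typeof TOOL_NAMES)[number];

type Severity = "low" | "medium" | "high";
type ChallengeStatus = "identified" | "mitigated" | "unresolved";

// Hypothetical argument shapes; field names are illustrative assumptions.
interface Challenge {
  category: string;
  description: string;
  severity: Severity;
  status: ChallengeStatus;
}
interface ThreadArgs {
  modelId: string;
  title: string;
  tags: string[];
  challenges: Challenge[];
}
interface MessageArgs {
  threadId: string;
  role: "user" | "assistant";
  content: string;
  safetyFlags: { category: string; severity: Severity }[];
}
interface MessageQuery {
  threadId?: string;
  safetyFlagsOnly?: boolean; // the one option named in this README
}

// Query tools share the mongo_query_ prefix; the rest create or update records.
function isQueryTool(name: ToolName): boolean {
  return name.indexOf("mongo_query_") === 0;
}
```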
Workflow for Grey Swan Arena Challenges
1. Preparing for a Challenge
- Create an organizational identifier using mongo_model with a unique name for your testing session
- Create a thread using mongo_thread with relevant metadata and initial challenges
2. Documenting Jailbreak Attempts
For each jailbreak attempt:
- Add the user message with mongo_message, including safety flags
- Add the model's response with mongo_message
- Update the thread with mongo_thread to add new challenges discovered
3. Analyzing Results
- Use mongo_query_threads to find threads with specific challenge categories
- Use mongo_query_messages with safetyFlagsOnly: true to analyze flagged messages
- Compare different jailbreak techniques by querying threads with different tags
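Once query results are in hand, the comparison across tags can be done client-side. The following sketch assumes the queries return plain records with the fields shown (threadId, safetyFlags, id, tags are illustrative names, not the server's confirmed output format). It keeps only flagged messages, mirroring what safetyFlagsOnly: true asks the server to do, and then counts flagged messages per thread tag.

```typescript
// Hypothetical record shapes for query results.
interface SafetyFlag {
  category: string;
  severity: "low" | "medium" | "high";
}
interface StoredMessage {
  threadId: string;
  content: string;
  safetyFlags: SafetyFlag[];
}
interface StoredThread {
  id: string;
  tags: string[];
}

// Keep only messages that carry at least one safety flag.
function flaggedOnly(messages: StoredMessage[]): StoredMessage[] {
  return messages.filter((m) => m.safetyFlags.length > 0);
}

// Count flagged messages per tag; a thread's count is added to each of its tags.
function flaggedCountByTag(
  threads: StoredThread[],
  messages: StoredMessage[]
): Record<string, number> {
  const perThread: Record<string, number> = {};
  for (const m of flaggedOnly(messages)) {
    perThread[m.threadId] = (perThread[m.threadId] || 0) + 1;
  }
  const perTag: Record<string, number> = {};
  for (const t of threads) {
    const n = perThread[t.id] || 0;
    for (const tag of t.tags) {
      perTag[tag] = (perTag[tag] || 0) + n;
    }
  }
  return perTag;
}
```

Because a thread with several tags contributes to each of them, the per-tag totals can sum to more than the number of flagged messages.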
Example: Documenting a Prompt Injection Attack
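A minimal sketch of such a session follows, tracing the workflow steps in order. It does not talk to the real server: ToolRecorder is a hypothetical stand-in that records each tool call and returns a fake identifier, and all argument field names and values (session name, tags, placeholder message text) are assumptions for illustration.

```typescript
interface RecordedCall {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical stand-in for an MCP client; it only records calls.
class ToolRecorder {
  readonly calls: RecordedCall[] = [];

  call(tool: string, args: Record<string, unknown>): string {
    this.calls.push({ tool, args });
    return `${tool}-${this.calls.length}`; // fake identifier
  }
}

function documentPromptInjection(client: ToolRecorder): string {
  // 1. Organizational identifier for the testing session.
  const modelId = client.call("mongo_model", { name: "arena-session-01" });

  // 2. Thread with metadata; challenges are added once observed.
  const threadId = client.call("mongo_thread", {
    modelId,
    title: "Prompt injection attempt",
    tags: ["prompt-injection"],
    challenges: [],
  });

  // 3. The user message, flagged, followed by the model's response.
  client.call("mongo_message", {
    threadId,
    role: "user",
    content: "[attempt text recorded here]",
    safetyFlags: [{ category: "prompt-injection", severity: "medium" }],
  });
  client.call("mongo_message", {
    threadId,
    role: "assistant",
    content: "[model response recorded here]",
    safetyFlags: [],
  });

  // 4. Update the thread with the challenge that was discovered.
  client.call("mongo_thread", {
    threadId,
    challenges: [
      {
        category: "prompt-injection",
        description: "Instruction override placed in user input",
        severity: "medium",
        status: "identified",
      },
    ],
  });
  return threadId;
}
```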
Project Structure
Best Practices
- Consistent Tagging: Use consistent tags across threads to enable effective filtering
- Detailed Challenges: Document challenges with specific details about the technique used
- Severity Levels: Use severity levels (low, medium, high) consistently
- Status Tracking: Update challenge status as you work (identified, mitigated, unresolved)
- Safety Flags: Flag all potentially harmful messages to build a comprehensive dataset
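One way to keep tags, severity levels, and status values consistent is to normalize and check them before they are sent to the server. The helpers below are a sketch of that idea; the normalization rule (lowercase, hyphen-separated) is an assumed convention, not one this project prescribes.

```typescript
const SEVERITIES = ["low", "medium", "high"] as const;
const STATUSES = ["identified", "mitigated", "unresolved"] as const;
type Severity = (typeof SEVERITIES)[number];
type ChallengeStatus = (typeof STATUSES)[number];

// Lowercase the tag and join words with hyphens (assumed convention).
function normalizeTag(tag: string): string {
  return tag.trim().toLowerCase().replace(/[\s_]+/g, "-");
}

// Normalize, drop empties, de-duplicate, and sort for stable filtering.
function normalizeTags(tags: string[]): string[] {
  const cleaned = tags.map(normalizeTag).filter((t) => t !== "");
  return Array.from(new Set(cleaned)).sort();
}

// Type guards for the values listed in the best practices above.
function isSeverity(value: string): value is Severity {
  return (SEVERITIES as readonly string[]).indexOf(value) !== -1;
}

function isChallengeStatus(value: string): value is ChallengeStatus {
  return (STATUSES as readonly string[]).indexOf(value) !== -1;
}
```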
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Based on the awesome-cursor-mpc-server project
- Created for the Grey Swan Arena AI safety challenges