# Security Scanning - USPTO Final Petition Decisions MCP
This document describes the secret scanning infrastructure for the USPTO Final Petition Decisions (FPD) MCP, designed to prevent accidental exposure of sensitive credentials.
## Overview
The FPD MCP uses **[detect-secrets](https://github.com/Yelp/detect-secrets)** to scan for accidentally committed secrets in the codebase. This provides **two layers of protection**:
1. **Pre-commit hooks** - Scans files locally before they're committed to git
2. **GitHub Actions CI/CD** - Scans all code pushed to GitHub on every push/PR
## What Gets Scanned
### 1. **Secret Detection** (20+ types)
Uses detect-secrets to find accidentally committed credentials.
### 2. **Prompt Injection Detection** (70+ patterns)
Scans for malicious prompt patterns targeting the Final Petition Decisions system.
### Secret Types Detected (20+ types)
The scanner detects the following types of secrets:
**API Keys & Authentication:**
- AWS/Azure/GCP cloud credentials
- GitHub/GitLab tokens
- Mistral API keys (used for OCR)
- OpenAI API keys
- JWT tokens
- Basic authentication credentials
**Infrastructure Secrets:**
- Private SSH keys
- SSL/TLS certificates
- Database passwords
- NPM tokens
- PyPI tokens
**High-Entropy Strings:**
- Base64-encoded secrets (entropy limit: 4.5)
- Hexadecimal secrets (entropy limit: 3.0)
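To make the entropy limits concrete, here is a minimal sketch of a Shannon entropy check. It is a simplification for illustration only: the function names are hypothetical, and the detect-secrets plugins apply their own character-set extraction and extra heuristics on top of the raw calculation.

```python
import math
from collections import Counter

# Limits quoted in this document; treat them as configuration values.
BASE64_LIMIT = 4.5
HEX_LIMIT = 3.0


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def exceeds_limit(candidate: str, limit: float) -> bool:
    """Flag a string whose entropy is strictly above the given limit."""
    return shannon_entropy(candidate) > limit
```

A repeated character has zero entropy, while a string that uses all 16 hex digits once reaches 4.0 bits per character and would be flagged against the 3.0 hex limit.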
**Communication Services:**
- Slack tokens
- Discord bot tokens
- Telegram bot tokens
- Twilio API keys
- SendGrid API keys
- Mailchimp API keys
### Prompt Injection Attack Patterns (70+ patterns)
**Attack Categories Detected:**
- Instruction override attempts ("ignore previous instructions")
- System prompt extraction ("show me your instructions")
- AI behavior manipulation ("you are now a different AI")
- Output format manipulation ("encode in hex")
- Social engineering ("we became friends")
**Final Petition Decisions Specific:**
- Petition data extraction ("extract all petition numbers")
- Decision reasoning disclosure ("reveal director decisions")
- USPTO API bypass attempts ("ignore API restrictions")
- CFR rule manipulation ("override 37 CFR requirements")
- Petitioner information exfiltration ("dump petitioner data")
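The categories above can be pictured as a list of case-insensitive regular expressions applied line by line. The sketch below is an assumption about the general approach, not the contents of `.security/check_prompt_injections.py`: the pattern list and the `find_injections` name are hypothetical and cover only a few of the quoted phrases.

```python
import re

# Hypothetical patterns modelled on the example phrases in this document.
PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"show\s+me\s+your\s+instructions", re.IGNORECASE),
    re.compile(r"extract\s+all\s+petition\s+numbers", re.IGNORECASE),
    re.compile(r"override\s+37\s+CFR\s+requirements", re.IGNORECASE),
]


def find_injections(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for every pattern hit, 1-indexed."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.group(0)))
    return hits
```

Reporting the line number alongside the matched text keeps the output actionable when a hit turns up in a long test fixture or document.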
### Files Excluded from Scanning
The following file patterns are excluded to reduce false positives:
- `configs/**/*.json` - Example configuration files with placeholder keys
- `*.md` - Markdown documentation files (they may contain example keys; real secrets must still be kept out of them)
- `package-lock.json` - NPM lock files
- `*.lock` - Other lock files (Cargo.lock, poetry.lock, etc.)
**Important:** `CLAUDE.md` is in `.gitignore` and is not scanned. All other files not matched by the patterns above, including test files, are scanned to prevent accidental exposure.
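Because detect-secrets takes exclusions as regular expressions (`--exclude-files`), the glob-style patterns listed above need regex equivalents. The following sketch shows one possible translation. The `EXCLUDE` expression and `is_excluded` helper are assumptions for illustration, not the project's actual configuration.

```python
import re

# Assumed regex equivalents of the glob patterns listed in this section.
EXCLUDE = re.compile(
    r"^configs/.*\.json$"         # configs/**/*.json
    r"|\.md$"                     # *.md
    r"|(^|/)package-lock\.json$"  # package-lock.json
    r"|\.lock$"                   # *.lock
)


def is_excluded(path: str) -> bool:
    """Return True if a POSIX-style, repo-relative path would be skipped."""
    return EXCLUDE.search(path) is not None
```

Checking a handful of real paths against the expression before committing it is a cheap way to confirm that source and test files are still scanned.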
## Installation
### Prerequisites
You need Python 3.11+ and either:
- **uv** (recommended, already used by FPD MCP)
- **pip** (standard Python package manager)
### Option 1: Using uv (Recommended)
```bash
# Install detect-secrets as a tool
uv tool install detect-secrets
# Verify installation
uv tool run detect-secrets --version
```
### Option 2: Using pip
```bash
# Install detect-secrets
pip install detect-secrets
# Verify installation
detect-secrets --version
```
### Install Pre-commit Hooks
```bash
# Install pre-commit package
pip install pre-commit
# Install git hooks (run from the project root; adjust the path to your clone)
cd path/to/uspto_fpd_mcp
pre-commit install
# Verify installation
pre-commit --version
```
## Usage
### Prompt Injection Detection
**Manual Scanning:**
```bash
# Scan for prompt injection patterns
uv run python .security/check_prompt_injections.py src/ tests/ *.md
# Scan specific directories
uv run python .security/check_prompt_injections.py src/fpd_mcp/
# Run via pre-commit hook
uv run pre-commit run prompt-injection-check --all-files
# Test with verbose output
uv run python .security/check_prompt_injections.py --verbose src/ tests/
```
### Pre-commit Hooks (Local Development)
Once installed, pre-commit hooks run automatically on `git commit`:
```bash
# Make changes to files
git add src/fpd_mcp/config/settings.py
# Commit (hooks run automatically)
git commit -m "Update settings"
# If secrets detected:
# - Commit is blocked
# - Files with secrets are listed
# - Fix the issues and try again
```
**Bypass hooks (NOT RECOMMENDED):**
```bash
# Only use in emergencies (requires justification in commit message)
git commit --no-verify -m "Emergency fix - bypassing hooks"
```
### Manual Scanning
Run secret scanning manually anytime:
```bash
# Scan all files against baseline (uv method)
uv tool run detect-secrets scan --baseline .secrets.baseline
# Scan all files against baseline (pip method)
detect-secrets scan --baseline .secrets.baseline
# Scan specific file
uv tool run detect-secrets scan --baseline .secrets.baseline src/fpd_mcp/main.py
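# Scan git history (last 100 commits)
# Note: confirm that your detect-secrets version accepts --stdin (see `detect-secrets scan --help`)
git log --all --pretty=format: -p -100 | uv tool run detect-secrets scan --stdin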
```
### Updating the Baseline
When you add new test placeholders or example configurations, update the baseline:
```bash
# Update baseline with new findings (uv method)
uv tool run detect-secrets scan --baseline .secrets.baseline
# Update baseline with new findings (pip method)
detect-secrets scan --baseline .secrets.baseline
# Review changes
git diff .secrets.baseline
# Commit updated baseline
git add .secrets.baseline
git commit -m "Update secrets baseline for new test placeholders"
```
**When to update the baseline:**
- Adding new test files with placeholder keys like `"test_key_for_unit_tests"`
- Adding example configuration files with dummy credentials
- After verifying a detection is a false positive (e.g., UUID, documentation example)
**When NOT to update the baseline:**
- When a real secret is detected (remove the secret instead!)
- When you're unsure if it's a real secret (ask for review first)
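When reviewing a baseline change, it can help to list every entry by file, line, and detector type rather than reading the raw JSON diff. The sketch below assumes the baseline has a top-level `results` object that maps filenames to lists of findings with `type` and `line_number` fields. The helper names are hypothetical.

```python
import json


def load_baseline(path: str = ".secrets.baseline") -> dict:
    """Read a baseline file from disk and return it as a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_baseline(baseline: dict) -> list[tuple[str, int, str]]:
    """Return sorted (filename, line_number, secret_type) rows for review."""
    rows = []
    for filename, findings in baseline.get("results", {}).items():
        for finding in findings:
            rows.append(
                (filename, finding.get("line_number", 0), finding.get("type", "unknown"))
            )
    return sorted(rows)
```

Calling `summarize_baseline(load_baseline())` from the project root prints nothing by itself; iterate over the returned rows to show them in whatever format suits the review.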
## GitHub Actions Integration
The FPD MCP automatically scans code on every push to, and every pull request targeting, the `main`, `master`, or `develop` branches.
### Workflow Configuration
**File:** `.github/workflows/secret-scan.yaml`
**Triggers:**
- Push to `main`, `master`, or `develop`
- Pull requests targeting those branches
**What it does:**
1. Checks out code with full git history
2. Installs detect-secrets
3. Scans current codebase against `.secrets.baseline`
4. Scans last 100 commits of git history
5. Fails build if new secrets detected (not in baseline)
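Step 5 amounts to a set difference: anything found in the fresh scan that is not already recorded in the baseline fails the build. The sketch below illustrates that comparison under the assumption that both inputs use the `results` layout with a `hashed_secret` field per finding. It is not the workflow's actual implementation, and the function names are hypothetical.

```python
def finding_keys(scan: dict) -> set[tuple[str, str]]:
    """Collect (filename, hashed_secret) pairs from a scan or baseline dict."""
    return {
        (filename, finding["hashed_secret"])
        for filename, findings in scan.get("results", {}).items()
        for finding in findings
    }


def new_findings(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Return findings present in `current` but absent from `baseline`."""
    return sorted(finding_keys(current) - finding_keys(baseline))


def exit_code(baseline: dict, current: dict) -> int:
    """Return 1 (fail the build) if any new finding exists, else 0."""
    return 1 if new_findings(baseline, current) else 0
```

Keying on the hashed secret rather than the line number means that moving an already-reviewed placeholder within a file does not count as a new finding in this sketch.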
### Viewing Scan Results
**On GitHub:**
1. Go to your repository
2. Click **Actions** tab
3. Select **Secret Scanning** workflow
4. View latest run results
**Build failures:**
- If scan finds new secrets, the build fails
- Check the workflow logs for detected secrets
- Remove the secrets, and update the baseline only for findings verified as false positives
- Push again to re-run checks
## Troubleshooting
### False Positives
**Problem:** Scanner detects a UUID or other non-secret as a secret
**Solution:**
```bash
# Add to baseline (after verifying it's not a real secret)
uv tool run detect-secrets scan --baseline .secrets.baseline
# Review the addition
git diff .secrets.baseline
# Commit the updated baseline
git add .secrets.baseline
git commit -m "Add false positive UUID to baseline"
```
### Pre-commit Hook Not Running
**Problem:** Hooks don't run on `git commit`
**Solution:**
```bash
# Verify hooks are installed
ls -la .git/hooks/pre-commit
# If missing, reinstall
pre-commit install
# Test manually
pre-commit run --all-files
```
### Baseline File Conflicts
**Problem:** Merge conflicts in `.secrets.baseline` after updating from main
**Solution:**
```bash
# After resolving the conflict markers (the baseline must be valid JSON), rescan to refresh it
uv tool run detect-secrets scan --baseline .secrets.baseline
# Review changes carefully
git diff .secrets.baseline
# Commit regenerated baseline
git add .secrets.baseline
git commit -m "Regenerate secrets baseline after merge"
```
### Real Secret Committed by Mistake
**Problem:** You accidentally committed a real secret
**Solution:**
**1. Immediately rotate the compromised credential:**
- USPTO API key: Generate new key at https://developer.uspto.gov/
- Mistral API key: Generate new key at https://console.mistral.ai/api-keys/
**2. Remove from git history:**
```bash
# Use git filter-repo (recommended) or BFG Repo Cleaner
# See: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository
# Example with git filter-repo (install first: pip install git-filter-repo)
# Note: this removes the whole file from every commit; keep a cleaned copy if you still need it
git filter-repo --invert-paths --path path/to/file/with/secret.py
```
**3. Force push (if already pushed to GitHub):**
```bash
# git filter-repo removes the origin remote; re-add it first (git remote add origin <url>)
git push --force origin main
```
**4. Update baseline and verify:**
```bash
uv tool run detect-secrets scan --baseline .secrets.baseline
git add .secrets.baseline
git commit -m "Remove compromised secret from baseline"
git push origin main
```
### Scanner Hangs or Times Out
**Problem:** Scanning very large files or repos takes too long
**Solution:**
```bash
# detect-secrets does not appear to provide a --timeout option for scan; list the supported flags
uv tool run detect-secrets scan --help
# Exclude large files instead (--exclude-files takes a regular expression)
uv tool run detect-secrets scan --baseline .secrets.baseline --exclude-files 'large_file\.json'
```
## Best Practices
### For Developers
**✅ DO:**
- Run `pre-commit run --all-files` before pushing
- Use environment variables for all secrets (never hardcode)
- Store secrets in `CLAUDE.md` (already in `.gitignore`)
- Use placeholder keys in tests (e.g., `"test_key_for_unit_tests"`)
- Update baseline when adding legitimate test placeholders
- Review baseline changes carefully before committing
**❌ DON'T:**
- Commit files with real API keys, tokens, or passwords
- Use `--no-verify` to bypass hooks (except emergencies)
- Update baseline to hide real secrets (rotate the secret instead!)
- Share API keys in chat, email, or documentation
- Reuse the same API key across multiple projects
### For API Key Management
**USPTO API Key:**
- Store in Claude Desktop config: `%APPDATA%\Claude\claude_desktop_config.json`
- Or use environment variable: `$env:USPTO_API_KEY="your_key"`
- Rotate quarterly or after any security incident
**Mistral API Key (optional):**
- Store in same Claude Desktop config file
- Or use environment variable: `$env:MISTRAL_API_KEY="your_key"`
- Monitor usage costs ($0.001/page for OCR)
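On the code side, reading keys from the environment keeps them out of source files entirely. Here is a minimal sketch of that pattern. `load_api_key` and `MissingSecretError` are hypothetical names, and `src/fpd_mcp/config/settings.py` may handle this differently.

```python
import os
from collections.abc import Mapping
from typing import Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def load_api_key(
    name: str = "USPTO_API_KEY",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the named key from the environment, failing loudly if unset."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"{name} is not set; export it instead of hardcoding a key"
        )
    return value
```

Accepting an optional `env` mapping lets unit tests pass a placeholder such as `{"USPTO_API_KEY": "test_key_for_unit_tests"}` without touching the real environment.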
### For Code Reviews
**Check for:**
- Files with high-entropy strings (not in baseline)
- Hardcoded credentials or tokens
- Configuration files with real secrets (should use env vars)
- Test files with real API keys (should use placeholders)
**Verify:**
- Baseline updates are justified (false positives or test placeholders)
- No `--no-verify` commits without good reason
- Environment variables are documented in README
## Integration with FPD MCP Architecture
### Protected Files
The following FPD MCP files are particularly sensitive and actively scanned:
**Configuration:**
- `src/fpd_mcp/config/settings.py` - Environment variable handling
- `field_configs.yaml` - Field configuration (no secrets, but scanned)
**Test Files:**
- `tests/test_basic.py` - Unit tests with placeholder keys (baseline tracked)
- `tests/test_integration.py` - Integration tests (baseline tracked)
- `tests/test_extraction.py` - Document extraction tests (CONTAINS REAL KEYS - NOT COMMITTED)
**API Client:**
- `src/fpd_mcp/api/fpd_client.py` - API authentication logic
**Documentation:**
- `CLAUDE.md` - **IN .GITIGNORE** - Contains real keys, never committed
- `README.md` - Public documentation (scanned to ensure no real keys)
### Excluded Files
**Automatically excluded:**
- `CLAUDE.md` - In `.gitignore` (contains real keys for development)
- `configs/**/*.json` - Example configurations with placeholder keys
- `*.md` files - Markdown documentation (they may contain example keys; real secrets must still be kept out of them)
### Cross-MCP Consistency
The FPD MCP secret scanning matches the PTAB MCP pattern:
- Same detect-secrets version (v1.5.0)
- Same exclusion patterns
- Same pre-commit hooks
- Same GitHub Actions workflow structure
- Consistent with Patent File Wrapper MCP security practices
## Additional Resources
- **detect-secrets documentation:** https://github.com/Yelp/detect-secrets
- **USPTO API key management:** https://developer.uspto.gov/
- **Mistral API key management:** https://console.mistral.ai/api-keys/
- **GitHub secret scanning:** https://docs.github.com/en/code-security/secret-scanning
- **FPD MCP Security Guidelines:** `SECURITY_GUIDELINES.md`
## Support
**Found a security issue?**
- Create a private security advisory on GitHub
- Or email: [contact information]
**Questions about secret scanning?**
- Review this documentation
- Check troubleshooting section
- Open an issue on GitHub (without revealing secrets!)
---
**Last Updated:** 2025-10-12
**Version:** 1.0
**detect-secrets Version:** 1.5.0