PRE_COMMIT_CHECKLIST.mdโข3.8 kB
# โ
Pre-Commit Security Checklist
## ๐ **BEFORE PUSHING TO GITHUB**
Run through this checklist to ensure no sensitive information is committed:
### **1. Sensitive Files Removed**
- [ ] `mcp1-467108-702d9a41627c.json` - โ DELETED
- [ ] Any other `*.json` service account keys - โ DELETED
- [ ] Real `.env` file with actual credentials - โ DELETED
### **2. Template Files Updated**
- [ ] `.env` contains only placeholder values
- [ ] `mcp_config.json` contains only placeholder values
- [ ] No hardcoded folder IDs in any Python files
- [ ] No hardcoded file paths in any Python files
### **3. Git Configuration**
- [ ] `.gitignore` file exists and includes sensitive patterns
- [ ] `.gitignore` includes `*.json` (except specific allowed files)
- [ ] `.gitignore` includes `.env` and environment files
- [ ] `.gitignore` includes `processed_datasets/` and logs
### **4. Documentation**
- [ ] `SECURITY_SETUP.md` exists and is complete
- [ ] `README.md` includes security warnings
- [ ] All setup instructions reference template files only
### **5. Code Review**
- [ ] No service account keys in any file
- [ ] No Google Drive folder IDs in any file
- [ ] All sensitive data uses environment variables
- [ ] Error messages don't expose sensitive information
## ๐งช **Testing Commands**
Run these commands to verify security:
```bash
# Check what files will be committed
git status
git add . --dry-run
# Verify .gitignore is working
git check-ignore *.json
git check-ignore .env
git check-ignore processed_datasets/
# Search for potential secrets in code
grep -r "1roPPn6" . --exclude-dir=.git || echo "โ
No folder IDs found"
grep -r "1lJ5OKM" . --exclude-dir=.git || echo "โ
No folder IDs found"
grep -r "mcp1-467108" . --exclude-dir=.git || echo "โ
No service account references found"
# Check for hardcoded credentials
grep -r "service-account" . --exclude-dir=.git --exclude="*.md" || echo "โ
No hardcoded service accounts"
```
## ๐จ **Red Flags - DO NOT COMMIT IF YOU SEE:**
- Any `.json` files with real credentials
- Folder IDs starting with `1roPPn6` or `1lJ5OKM`
- File paths pointing to real service account keys
- `.env` files with real values
- Any files in `processed_datasets/` directory
- Log files or temporary files
## โ
**Green Light - Safe to Commit:**
- Only template and placeholder values
- All sensitive files in `.gitignore`
- Documentation emphasizes security setup
- No real credentials anywhere in the repository
## ๐ง **Final Verification**
```bash
# Create a test clone to verify
cd /tmp
git clone /path/to/your/repo test-clone
cd test-clone
# Try to run - should fail safely with template values
python local_test.py
# Should show error about missing/invalid credentials
# Verify no sensitive files
find . -name "*.json" -not -path "./.git/*"
# Should only show template files like mcp_config.json
echo "โ
Repository is safe for public GitHub!"
```
## ๐ **Commit Message Template**
```
feat: Add MCP dataset onboarding server
- FastAPI server with dataset processing endpoints
- True MCP server for LLM integration
- Automated file monitoring and processing
- Google Drive integration with secure credential handling
- Comprehensive documentation and security guides
โ ๏ธ Template repository - requires user configuration
๐ No sensitive data included - see SECURITY_SETUP.md
```
## ๐ฏ **After Publishing**
1. **Update repository description**: "MCP Dataset Onboarding Server - Automated data processing with LLM integration (requires setup)"
2. **Add topics**: `mcp`, `dataset`, `automation`, `fastapi`, `google-drive`, `data-quality`
3. **Create release**: Tag v1.0.0 with release notes
4. **Monitor**: Watch for any accidental credential commits in issues/PRs
Remember: **Security is not optional!** ๐