by leeguooooo
PERFORMANCE_OPTIMIZATION_PLAN.md (12.7 kB)
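# Performance Optimization Plan

## Current Performance Bottlenecks

### 🐌 Problem 1: Downloading full messages (most severe)

**Current state**:
```python
# src/legacy_operations.py:159
mail.fetch(email_id, '(RFC822)')  # downloads the full message (body + attachments)
```

**Impact**:
- Listing 50 emails = downloading 50 full messages
- 1-5MB each (with attachments) → 50-250MB in total
- Network transfer time: tens of seconds, sometimes minutes

**Solution**:
```python
# Download header fields only
fetch_parts = '(BODY.PEEK[HEADER.FIELDS (From To Subject Date Message-ID)] FLAGS RFC822.SIZE)'
mail.fetch(email_id, fetch_parts)
```

**Result**:
- Under 1KB per message (headers only)
- 50 messages = about 50KB (vs. up to 250MB)
- **Up to 5000x less data transferred** 🚀 (the comparison tables below estimate roughly a 10x reduction in wall-clock time)

---

### 🐌 Problem 2: Reconnecting on every call

**Current state**:
```python
# On every call
mail = conn_mgr.connect_imap()  # new TCP + TLS connection
mail.login(...)
mail.select(folder)
# ... operations
mail.logout()  # closes the connection
```

**Impact**:
- Every `list_emails` call pays for a TCP handshake, a TLS handshake, and an IMAP login
- Latency: ~500ms-2s (depending on the network and the server)

**Solution**:
```python
# Use a connection pool
from connection_pool import ConnectionPool

pool = ConnectionPool()
with pool.get_connection(account_id) as mail:
    # Reuse an existing connection
    mail.select(folder)
    # ... operations
# The connection goes back to the pool instead of being closed
```

**Result**:
- First call: ~1s (connection setup)
- Subsequent calls: ~50ms (connection reuse)
- **About 20x faster** 🚀

---

### 🐌 Problem 3: The sync database is not used

**Current state**:
- `email_sync.db` holds a complete email cache
- `list_emails` / `search_emails` still query IMAP live
- Only n8n uses the sync database

**Solution**:
```python
def list_emails(limit=50, use_cache=True, **kwargs):  # other parameters elided
    if use_cache and sync_enabled():
        # Read from SQLite (milliseconds)
        return read_from_sync_db(limit, **kwargs)
    else:
        # Live IMAP (seconds)
        return fetch_from_imap(limit, **kwargs)
```

**Result**:
- Cache hit: ~10ms
- IMAP query: ~5s
- **About 500x faster** 🚀

---

## Implementation Plan

### Phase 1: Quick wins (lightweight, 1-2 hours)

#### 1.1 Modify `fetch_emails` - download headers only

**File**: `src/legacy_operations.py`

**Before**:
```python
mail.fetch(email_id, '(RFC822)')  # full message
```

**After**:
```python
# Download header fields + FLAGS + size only
fetch_cmd = '(BODY.PEEK[HEADER.FIELDS (From To Subject Date Message-ID)] FLAGS RFC822.SIZE)'
result, data = mail.fetch(email_id, fetch_cmd)
```

**Processing logic**:
```python
import email
import re

# data format: [(b'1 (FLAGS (...) BODY[...] {123}', b'header bytes'), ...]
# The first tuple holds the metadata prefix and the header literal
meta_bytes, header_bytes = next(item for item in data if isinstance(item, tuple))
msg = email.message_from_bytes(header_bytes)

# IMAP does not fix the order of FETCH data items, so FLAGS and RFC822.SIZE
# may arrive before or after the literal: search every part of the response
meta = b' '.join(
    part if isinstance(part, bytes) else part[0] for part in data
).decode('ascii', errors='replace')

# Extract FLAGS
is_unread = '\\Seen' not in meta

# Extract the size
size_match = re.search(r'RFC822\.SIZE (\d+)', meta)
size = int(size_match.group(1)) if size_match else 0
```

**Scope**:
- ✅ `fetch_emails()` - list display
- ❌ `get_email_detail()` - still downloads the full message (correct behavior)

---

#### 1.2 Batched UID FETCH

**Current**:
```python
for email_id in email_ids:
    mail.fetch(email_id, ...)  # 50 network round trips
```

**Optimized**:
```python
# Fetch many messages in one round trip.
# email_ids must be UIDs (e.g. from mail.uid('SEARCH', ...)), not sequence numbers.
# A comma-separated set is used instead of a first:last range, because a range
# would also return messages in between that were not part of the search result.
uid_set = ','.join(
    uid.decode() if isinstance(uid, bytes) else str(uid) for uid in email_ids
)
result, data = mail.uid('FETCH', uid_set, fetch_cmd)
# Parse the batched response
```

**Result**:
- Network round trips: 50 → 1
- **A further 5-10x speedup**

---

### Phase 2: Connection pool integration (medium, 2-3 hours)

#### 2.1 Modify `get_connection_manager()`

**File**: `src/legacy_operations.py`

**Add a connection pool**:
```python
from connection_pool import ConnectionPool

# Module level
_connection_pool = None

def get_connection_pool():
    global _connection_pool
    if _connection_pool is None:
        _connection_pool = ConnectionPool(
            max_connections_per_account=3,
            connection_timeout=60,
            idle_timeout=300
        )
    return _connection_pool

def fetch_emails(limit=50, unread_only=False, folder="INBOX", account_id=None):
    pool = get_connection_pool()
    with pool.get_connection(account_id) as mail:
        # Use the pool-managed connection
        mail.select(folder)
        # ... operations
    # The connection is returned to the pool automatically
```

**Functions to change**:
- `fetch_emails`
- `get_email_detail`
- `mark_email_read`
- `delete_email`
- All other IMAP operations

---

### Phase 3: Sync database integration (heavyweight, 4-6 hours)

#### 3.1 Add a cached read path

**New file**: `src/operations/cached_operations.py`

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timedelta

class CachedEmailOperations:
    def __init__(self, db_path='email_sync.db'):
        self.db_path = db_path

    def list_emails_cached(self, limit=50, unread_only=False, folder='INBOX',
                           account_id=None, max_age_minutes=5):
        """Read the email list from the cache; return None if the cache is stale."""
        # closing() releases the connection even if a query raises
        with closing(sqlite3.connect(self.db_path)) as conn:
            cursor = conn.cursor()

            # Check cache freshness
            cursor.execute("""
                SELECT MAX(last_synced)
                FROM emails
                WHERE account_id = ? AND folder = ?
            """, (account_id, folder))

            last_sync = cursor.fetchone()[0]
            if not last_sync or \
               datetime.now() - datetime.fromisoformat(last_sync) > timedelta(minutes=max_age_minutes):
                # Cache is stale: return None to trigger a live query
                return None

            # Read from the cache
            query = """
                SELECT uid, from_addr, subject, date, flags, message_id
                FROM emails
                WHERE account_id = ? AND folder = ?
            """
            if unread_only:
                # COALESCE keeps rows whose flags column is NULL (treated as unread)
                query += " AND COALESCE(flags, '') NOT LIKE '%\\Seen%'"
            query += " ORDER BY date DESC LIMIT ?"

            cursor.execute(query, (account_id, folder, limit))
            rows = cursor.fetchall()

        emails = []
        for row in rows:
            emails.append({
                'id': row[0],  # UID
                'from': row[1],
                'subject': row[2],
                'date': row[3],
                'unread': '\\Seen' not in (row[4] or ''),
                'message_id': row[5],
                'account_id': account_id
            })
        return emails
```

#### 3.2 Modify `fetch_emails` to support the cache

```python
def fetch_emails(limit=50, unread_only=False, folder="INBOX",
                 account_id=None, use_cache=True):
    """
    Fetch emails (with optional caching)

    Args:
        use_cache: If True, try to read from sync database first
    """
    # Try the cache first
    if use_cache:
        cached_ops = CachedEmailOperations()
        cached_result = cached_ops.list_emails_cached(
            limit, unread_only, folder, account_id,
            max_age_minutes=5  # 5-minute cache
        )
        if cached_result is not None:
            logger.debug(f"Returning cached emails for {account_id}")
            return {
                "emails": cached_result,
                "from_cache": True,
                "account_id": account_id
            }

    # Cache miss or cache disabled: fall back to live IMAP
    logger.debug(f"Fetching live emails for {account_id}")
    # ... existing IMAP logic
```

---

#### 3.3 Initialize the sync service

**Make sure background sync is running**:
```bash
# Initialize the sync
python scripts/init_sync.py

# Start the scheduler (long-running)
python -m src.operations.sync_scheduler &

# Or use systemd (recommended)
sudo systemctl enable mcp-email-sync
sudo systemctl start mcp-email-sync
```

**Verify the synced data**:
```bash
# Check the database
sqlite3 email_sync.db "SELECT COUNT(*) FROM emails;"

# Check the most recent sync time
sqlite3 email_sync.db "SELECT account_id, MAX(last_synced) FROM emails GROUP BY account_id;"
```

---

### Phase 4: Very large messages (optional, 1-2 hours)

#### 4.1 Body truncation

```python
MAX_BODY_PREVIEW = 50 * 1024  # 50K characters (about 50KB for ASCII text)

def get_email_detail(email_id, **kwargs):  # other parameters elided
    # ... fetch and parse the message into `msg` (elided)
    body = extract_body(msg)
    body_size = len(body)  # original size, recorded before truncation

    # Truncate overly long bodies
    body_truncated = body_size > MAX_BODY_PREVIEW
    if body_truncated:
        body = body[:MAX_BODY_PREVIEW]

    return {
        "body": body,
        "body_truncated": body_truncated,
        "body_size": body_size,
        # ... other fields
    }
```

#### 4.2 Lazy attachment loading

```python
# The list returns attachment metadata only
attachments = [{
    "filename": part.get_filename(),
    # Decoded size in bytes (decode=False would report the base64-encoded size)
    "size": len(part.get_payload(decode=True) or b''),
    "content_type": part.get_content_type(),
    "download_url": f"/api/attachment/{email_id}/{idx}"  # downloaded on demand
} for idx, part in enumerate(msg.walk()) if part.get_filename()]
```

---

## Performance Comparison

### Before optimization

| Operation | Time | Network traffic | Bottleneck |
|------|------|----------|------|
| list_emails (50 messages) | 30-60s | 50-250MB | Downloading full messages |
| Every operation | +1-2s | - | Reconnecting |
| search_emails | 20-40s | 30-150MB | Same as above |

**Overall experience**: 😫 Very slow

---

### After Phase 1 (headers only)

| Operation | Time | Network traffic | Improvement |
|------|------|----------|------|
| list_emails (50 messages) | 3-5s | < 50KB | ✅ 10x faster |
| Every operation | +1-2s | - | Still needs a connection |
| search_emails | 2-4s | < 30KB | ✅ 10x faster |

**Overall experience**: 🙂 Usable

---

### After Phase 2 (+ connection pool)

| Operation | Time | Network traffic | Improvement |
|------|------|----------|------|
| list_emails (first call) | 3-5s | < 50KB | Same as above |
| list_emails (subsequent) | 0.5-1s | < 50KB | ✅ 50x faster |
| search_emails | 0.5-1s | < 30KB | ✅ 50x faster |

**Overall experience**: 😊 Fast

---

### After Phase 3 (+ sync cache)

| Operation | Time | Network traffic | Improvement |
|------|------|----------|------|
| list_emails (cache hit) | 10-50ms | 0 | ✅ 500x faster |
| list_emails (cache miss) | 0.5-1s | < 50KB | Falls back to Phase 2 |
| search_emails (cached) | 5-20ms | 0 | ✅ 1000x faster |

**Overall experience**: 🤩 Extremely fast

The "Improvement" figures in these tables are relative to the unoptimized baseline.

---

## Implementation Recommendations

### Recommended order

1. **Implement immediately**: Phase 1 (headers only)
   - Largest impact
   - Lowest risk
   - Least work

2. **Short term**: Phase 2 (connection pool)
   - Connection stability needs testing
   - Moderate amount of change

3. **Long term**: Phase 3 (sync cache)
   - The sync service must run reliably
   - Cache consistency must be handled
   - Largest performance gain

4. **As needed**: Phase 4 (very large messages)
   - Targets specific scenarios
   - Optional optimization

---

## Risk Assessment

### Phase 1 risk: low ✅
- **Compatibility**: supported by the IMAP standard
- **Test scope**: list_emails
- **Rollback**: simple (switch back to RFC822)

### Phase 2 risk: medium ⚠️
- **Connection leaks**: cleanup needs rigorous testing
- **Concurrency**: multiple requests share the connection pool
- **Server limits**: some IMAP servers cap the number of concurrent connections

**Mitigations**:
- Limit the pool size (2-3 connections per account)
- Implement connection health checks
- Add connection timeouts and retries

### Phase 3 risk: high ⚠️⚠️
- **Stale cache**: users see outdated data
- **Sync failure**: the database is not updated
- **Consistency**: IMAP and the cache diverge

**Mitigations**:
- Short cache TTL (5 minutes)
- Provide a "refresh" button that forces a live query
- Monitor sync health
- Fall back to a live query on a cache miss

---

## Monitoring Metrics

### Add performance logging

```python
import time

def fetch_emails(*args, **kwargs):  # parameters elided
    start_time = time.perf_counter()  # monotonic clock, suited to measuring durations
    cache_hit = False

    # ... operation (populates `emails` and sets `cache_hit`)

    elapsed = time.perf_counter() - start_time
    logger.info(f"fetch_emails: {elapsed:.2f}s, cache_hit={cache_hit}, count={len(emails)}")
```

### Monitoring dashboard

Suggested metrics to track:
- Average response time
- Cache hit rate
- Number of IMAP connections
- Network traffic
- Sync lag

---

## Next Steps

### Start now (Phase 1)

```bash
# 1. Create a working branch so the current code stays untouched
git checkout -b feature/performance-optimization

# 2. Modify fetch_emails
# Edit src/legacy_operations.py

# 3. Test
python test_account_id_fix.py

# 4. Performance test
time python -c "from src.legacy_operations import fetch_emails; fetch_emails(50)"

# 5. Commit
git add src/legacy_operations.py
git commit -m "perf: optimize list_emails to fetch headers only (Phase 1)"
```

### Prepare for Phase 2

```bash
# Check the connection pool implementation
ls src/connection_pool.py

# Test the connection pool
python -c "from src.connection_pool import ConnectionPool; pool = ConnectionPool(); print('OK')"
```

### Prepare for Phase 3

```bash
# Check the sync database
sqlite3 email_sync.db "SELECT COUNT(*) FROM emails;"

# If it is empty, initialize the sync
python scripts/init_sync.py

# Start background sync
python -m src.operations.sync_scheduler &
```

---

Ready to start? I can help you implement Phase 1!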
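
Phase 1.2 stops at "parse the batched response" without showing how. The following is a minimal sketch of that step, assuming the response has the shape `imaplib` usually returns for a multi-message `UID FETCH`: one `(metadata, header_literal)` tuple per message, interleaved with bare `bytes` items that carry whatever followed the literal. `parse_batched_headers` and the sample data are hypothetical and not part of the project.

```python
import email
import re

UID_RE = re.compile(rb'UID (\d+)')
SIZE_RE = re.compile(rb'RFC822\.SIZE (\d+)')
FLAGS_RE = re.compile(rb'FLAGS \(([^)]*)\)')

def parse_batched_headers(data):
    """Turn an imaplib FETCH response list into per-message summaries."""
    results = []
    for i, item in enumerate(data):
        if not isinstance(item, tuple):
            continue  # bare bytes items are consumed together with their tuple
        meta, header_bytes = item
        # Data items may also follow the literal, in the next bare-bytes element
        has_trailer = i + 1 < len(data) and isinstance(data[i + 1], bytes)
        meta_all = meta + b' ' + (data[i + 1] if has_trailer else b'')

        msg = email.message_from_bytes(header_bytes)
        uid = UID_RE.search(meta_all)
        size = SIZE_RE.search(meta_all)
        flags = FLAGS_RE.search(meta_all)
        flag_list = flags.group(1).split() if flags else []
        results.append({
            'uid': uid.group(1).decode() if uid else None,
            'from': msg.get('From', ''),
            'subject': msg.get('Subject', ''),
            'unread': b'\\Seen' not in flag_list,
            'size': int(size.group(1)) if size else 0,
        })
    return results

# Hand-written sample response (assumed shape), with FLAGS before and after the literal
sample = [
    (b'1 (UID 101 FLAGS (\\Seen) RFC822.SIZE 2048 BODY[HEADER.FIELDS (From Subject)] {44}',
     b'From: a@example.com\r\nSubject: Hello\r\n\r\n'),
    b')',
    (b'2 (UID 102 RFC822.SIZE 512 BODY[HEADER.FIELDS (From Subject)] {42}',
     b'From: b@example.com\r\nSubject: Hi\r\n\r\n'),
    b' FLAGS ())',
]
summaries = parse_batched_headers(sample)
```

Matching on the `UID` in each metadata prefix, rather than on position, keeps the result correct even if the server returns messages in a different order than requested. Subjects that use MIME encoded-words would still need `email.header.decode_header` before display.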

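
The Phase 3 fallback rule (serve from the cache only while the last sync is within the TTL, otherwise query IMAP live) can be exercised without a real mailbox. The sketch below assumes an `emails` table with `account_id`, `folder`, and `last_synced` (ISO 8601 text) columns, as implied by the queries in Phase 3.1; the actual schema of `email_sync.db` is not shown in this plan, and `cache_is_fresh` is a hypothetical helper.

```python
import sqlite3
from datetime import datetime, timedelta

def cache_is_fresh(conn, account_id, folder, max_age_minutes=5, now=None):
    """Return True if the newest sync for this account/folder is within the TTL."""
    now = now or datetime.now()
    row = conn.execute(
        "SELECT MAX(last_synced) FROM emails WHERE account_id = ? AND folder = ?",
        (account_id, folder),
    ).fetchone()
    if not row[0]:
        return False  # never synced: treat as a cache miss
    age = now - datetime.fromisoformat(row[0])
    return age <= timedelta(minutes=max_age_minutes)

# In-memory database with the assumed minimal schema
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE emails (account_id TEXT, folder TEXT, last_synced TEXT)")

now = datetime(2024, 1, 1, 12, 0, 0)  # fixed clock so the example is deterministic
conn.execute("INSERT INTO emails VALUES (?, ?, ?)",
             ('acct-1', 'INBOX', (now - timedelta(minutes=2)).isoformat()))
conn.execute("INSERT INTO emails VALUES (?, ?, ?)",
             ('acct-2', 'INBOX', (now - timedelta(minutes=30)).isoformat()))

fresh = cache_is_fresh(conn, 'acct-1', 'INBOX', now=now)    # synced 2 minutes ago
stale = cache_is_fresh(conn, 'acct-2', 'INBOX', now=now)    # synced 30 minutes ago
missing = cache_is_fresh(conn, 'acct-3', 'INBOX', now=now)  # never synced
```

Passing `now` in as a parameter is a design choice for testability: the TTL boundary can be checked with fixed timestamps instead of depending on the wall clock.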