# ContextFS Sync Service Architecture
## Overview
The ContextFS sync service enables multi-device memory synchronization using vector clocks for conflict detection. This document describes the current implementation architecture.
## System Components
```
┌─────────────────────────────────────────────────────────────────┐
│                             DEVICES                             │
├─────────────────┬─────────────────┬─────────────────────────────┤
│     Laptop      │     Desktop     │        Linux Server         │
│     (macOS)     │    (Windows)    │          (Ubuntu)           │
├─────────────────┼─────────────────┼─────────────────────────────┤
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐             │
│ │  ContextFS  │ │ │  ContextFS  │ │ │  ContextFS  │             │
│ │    Core     │ │ │    Core     │ │ │    Core     │             │
│ └──────┬──────┘ │ └──────┬──────┘ │ └──────┬──────┘             │
│        │        │        │        │        │                    │
│ ┌──────┴──────┐ │ ┌──────┴──────┐ │ ┌──────┴──────┐             │
│ │   SQLite    │ │ │   SQLite    │ │ │   SQLite    │             │
│ │ + ChromaDB  │ │ │ + ChromaDB  │ │ │ + ChromaDB  │             │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘             │
└────────┬────────┴────────┬────────┴────────┬────────────────────┘
         │                 │                 │
         │ Push/Pull       │ Push/Pull       │ Push/Pull
         │ (HTTP)          │ (HTTP)          │ (HTTP)
         ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                           SYNC SERVER                           │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                   FastAPI Application                   │   │
│   │                       (Port 8766)                       │   │
│   ├─────────────────────────────────────────────────────────┤   │
│   │ /api/sync/register │ Device registration                │   │
│   │ /api/sync/push     │ Push local changes                 │   │
│   │ /api/sync/pull     │ Pull server changes                │   │
│   │ /api/sync/status   │ Get sync status                    │   │
│   └──────────────────────┬──────────────────────────────────┘   │
│                          │                                      │
│   ┌──────────────────────┴──────────────────────────────────┐   │
│   │                  PostgreSQL + pgvector                  │   │
│   │  ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐    │   │
│   │  │  memories   │ │  sessions   │ │  memory_edges   │    │   │
│   │  │  (synced)   │ │  (synced)   │ │  (synced)       │    │   │
│   │  └─────────────┘ └─────────────┘ └─────────────────┘    │   │
│   │  ┌─────────────┐ ┌─────────────┐                        │   │
│   │  │   devices   │ │ sync_state  │                        │   │
│   │  └─────────────┘ └─────────────┘                        │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```
## Key Files
| File | Purpose |
|------|---------|
| `src/contextfs/sync/vector_clock.py` | VectorClock and DeviceTracker classes |
| `src/contextfs/sync/protocol.py` | Pydantic models for sync protocol |
| `src/contextfs/sync/client.py` | SyncClient for device-side operations |
| `src/contextfs/sync/cli.py` | CLI commands (`contextfs sync ...`) |
| `src/contextfs/sync/path_resolver.py` | Cross-machine path normalization |
| `service/api/sync_routes.py` | Server-side FastAPI endpoints |
| `service/db/models.py` | SQLAlchemy models for PostgreSQL |
## Vector Clock Implementation
### Data Structure
```python
class VectorClock(BaseModel):
    clock: dict[str, int]  # {device_id: counter}
```
Example:
```json
{"laptop-abc123": 5, "desktop-def456": 3, "server-xyz789": 1}
```
### Operations
| Operation | Description |
|-----------|-------------|
| `increment(device_id)` | Increment counter for device |
| `merge(other)` | Take max of each component |
| `happens_before(other)` | Check if self causally precedes other |
| `concurrent_with(other)` | Check if clocks are concurrent (conflict) |
| `dominates(other)` | Check if self causally follows other |
| `prune(active_devices, max_devices)` | Limit clock size |
### Happens-Before Relation
```
VC1 < VC2 iff ∀d: VC1[d] ≤ VC2[d] AND ∃d: VC1[d] < VC2[d]
```
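A minimal sketch of how these operations follow from the relation above (method names match the table; `prune` is omitted for brevity, and the real implementation in `src/contextfs/sync/vector_clock.py` may differ in detail):

```python
from pydantic import BaseModel, Field


class VectorClock(BaseModel):
    clock: dict[str, int] = Field(default_factory=dict)  # {device_id: counter}

    def increment(self, device_id: str) -> None:
        self.clock[device_id] = self.clock.get(device_id, 0) + 1

    def merge(self, other: "VectorClock") -> "VectorClock":
        # Component-wise maximum over the union of device IDs.
        devices = self.clock.keys() | other.clock.keys()
        return VectorClock(
            clock={d: max(self.clock.get(d, 0), other.clock.get(d, 0)) for d in devices}
        )

    def happens_before(self, other: "VectorClock") -> bool:
        # VC1 < VC2 iff every component is <= and at least one is strictly <.
        devices = self.clock.keys() | other.clock.keys()
        all_le = all(self.clock.get(d, 0) <= other.clock.get(d, 0) for d in devices)
        any_lt = any(self.clock.get(d, 0) < other.clock.get(d, 0) for d in devices)
        return all_le and any_lt

    def dominates(self, other: "VectorClock") -> bool:
        return other.happens_before(self)

    def concurrent_with(self, other: "VectorClock") -> bool:
        # Neither side causally precedes the other -> concurrent edit (conflict).
        return not self.happens_before(other) and not other.happens_before(self)
```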
### Conflict Detection
| Client Clock | Server Clock | Result |
|--------------|--------------|--------|
| `{A:2, B:1}` | `{A:1, B:1}` | **Accept** (server behind) |
| `{A:1, B:1}` | `{A:2, B:1}` | **Reject** (client behind/stale) |
| `{A:2, B:1}` | `{A:1, B:2}` | **Conflict** (concurrent) |
| `{A:2, B:2}` | `{A:2, B:2}` | **Accept** (equal) |
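Using the sketch above, the concurrent case (row three) is detected as follows:

```python
client = VectorClock(clock={"A": 2, "B": 1})
server = VectorClock(clock={"A": 1, "B": 2})

client.happens_before(server)   # False
server.happens_before(client)   # False
client.concurrent_with(server)  # True -> flagged as a conflict
```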
## Sync Protocol
### Push Flow
```
Client                                           Server
  │                                                 │
  │ 1. Query local memories (since last sync)       │
  │ 2. Increment vector clock for each              │
  │ 3. Extract embeddings from ChromaDB             │
  │                                                 │
  │ POST /api/sync/push ───────────────────────────►│
  │ {device_id, memories[], sessions[]}             │
  │                                                 │
  │                       4. For each memory:       │
  │                         - Compare clocks        │
  │                         - Accept/Reject/Conflict│
  │                         - Store embedding       │
  │                                                 │
  │◄────────────────────────────── Response ────────│
  │ {accepted, rejected, conflicts[]}               │
  │                                                 │
  │ 5. Update local vector clocks                   │
  │                                                 │
```
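Step 4 on the server reduces to comparing the incoming clock against the stored one for each memory. A sketch of that decision using the `VectorClock` sketch above (the helper name and return values are illustrative; the actual handler lives in `service/api/sync_routes.py`):

```python
def classify_push(incoming: VectorClock, stored: VectorClock | None) -> str:
    """Illustrative per-memory push decision, matching the conflict table."""
    if stored is None or stored.happens_before(incoming) or incoming.clock == stored.clock:
        return "accept"    # entity unseen, server behind, or clocks equal (no-op)
    if incoming.happens_before(stored):
        return "reject"    # client pushed a stale version
    return "conflict"      # concurrent edits -> returned to the client
```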
### Pull Flow
```
Client                                           Server
  │                                                 │
  │ POST /api/sync/pull ───────────────────────────►│
  │ {device_id, since_timestamp}                    │
  │                                                 │
  │                     1. Query memories           │
  │                        WHERE updated_at > since │
  │                     2. Include embeddings       │
  │                                                 │
  │◄────────────────────────────── Response ────────│
  │ {memories[], sessions[], has_more}              │
  │                                                 │
  │ 3. Batch insert to SQLite (skip_rag)            │
  │ 4. Insert embeddings to ChromaDB                │
  │ 5. Update last_sync timestamp                   │
  │                                                 │
```
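Because pull responses are paginated via `has_more`, the client loops until the server is drained. A sketch using `httpx` (the payload fields follow the diagram; the cursor-advance scheme is an assumption, not the documented protocol):

```python
import httpx


def pull_all(server_url: str, device_id: str, since: str) -> list[dict]:
    """Illustrative client loop: page through server changes until drained."""
    memories: list[dict] = []
    while True:
        resp = httpx.post(
            f"{server_url}/api/sync/pull",
            json={"device_id": device_id, "since_timestamp": since},
        )
        resp.raise_for_status()
        page = resp.json()
        memories.extend(page["memories"])
        if not page.get("has_more"):
            return memories
        # Advance the cursor past this page (ISO timestamps compare lexically).
        since = max(m["updated_at"] for m in page["memories"])
```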
## Data Models
### SyncedMemory (Protocol)
```python
class SyncedMemory(SyncableEntity):
    # Core content
    content: str
    type: str = "fact"
    tags: list[str]
    summary: str | None

    # Namespace
    namespace_id: str = "global"

    # Portable source (cross-machine)
    repo_url: str | None       # git@github.com:user/repo.git
    repo_name: str | None      # Human-readable name
    relative_path: str | None  # Path from repo root

    # Legacy fields
    source_file: str | None
    source_repo: str | None
    source_tool: str | None

    # Sync metadata (inherited)
    vector_clock: dict[str, int]
    content_hash: str | None
    deleted_at: datetime | None
    last_modified_by: str | None

    # Embedding (synced!)
    embedding: list[float] | None  # 384-dim vector
```
### SyncedMemoryModel (PostgreSQL)
```python
class SyncedMemoryModel(Base):
    __tablename__ = "memories"

    id: Mapped[str] = mapped_column(Text, primary_key=True)
    content: Mapped[str]
    type: Mapped[str]
    tags: Mapped[list[str]] = mapped_column(ARRAY(Text))

    # Sync fields
    vector_clock: Mapped[dict] = mapped_column(JSONB)
    content_hash: Mapped[str | None]
    deleted_at: Mapped[datetime | None]
    last_modified_by: Mapped[str | None]

    # Embedding (pgvector)
    embedding = mapped_column(Vector(384), nullable=True)
```
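One payoff of storing embeddings in pgvector is that the server can run similarity search in plain SQL. A sketch of such a query with SQLAlchemy (this is not part of the sync protocol; the session handling and function name are assumptions):

```python
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession


async def nearest_memories(session: AsyncSession, query_vec: list[float], k: int = 10):
    # pgvector's cosine_distance comparator compiles to the <=> operator.
    stmt = (
        select(SyncedMemoryModel)
        .where(SyncedMemoryModel.deleted_at.is_(None))  # skip tombstones
        .order_by(SyncedMemoryModel.embedding.cosine_distance(query_vec))
        .limit(k)
    )
    return (await session.execute(stmt)).scalars().all()
```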
## Embedding Synchronization
### Problem
Traditional approaches require either:
1. **Centralized queries** - Network dependency for every search
2. **Local recomputation** - Expensive (10-50ms/memory on CPU)
### Solution
Sync embeddings alongside content:
```
Push: ChromaDB → Extract → HTTP → PostgreSQL (pgvector)
Pull: PostgreSQL → HTTP → Insert → ChromaDB (no recompute!)
```
### Implementation
**Push (client.py:_get_embeddings_from_chroma)**
```python
def _get_embeddings_from_chroma(self, memory_ids: list[str]) -> dict[str, list[float]]:
    collection = self.ctx.rag._collection
    result = collection.get(ids=memory_ids, include=["embeddings"])
    return {id_: list(emb) for id_, emb in zip(result["ids"], result["embeddings"])}
```
**Pull (client.py:_add_embeddings_to_chroma)**
```python
def _add_embeddings_to_chroma(self, embeddings: list[tuple[str, list[float], str]]):
    # Each tuple is assumed to be (memory_id, vector, document_text).
    collection = self.ctx.rag._collection
    ids, vectors, documents = zip(*embeddings)
    collection.upsert(ids=list(ids), embeddings=list(vectors), documents=list(documents))
```
## Device Management
### Device Registration
```python
class DeviceRegistration(BaseModel):
    device_id: str       # laptop-abc123 (auto-generated)
    device_name: str     # "My Laptop"
    platform: str        # darwin, linux, windows
    client_version: str  # "0.1.0"
```
### Device ID Generation
```python
import socket
import uuid

def _get_or_create_device_id(self) -> str:
    hostname = socket.gethostname()
    mac = uuid.getnode()  # hardware MAC address as a 48-bit integer
    return f"{hostname}-{mac:012x}"[:32]
```
The resulting ID is stored in `~/.contextfs/device_id`.
### Sync State Persistence
Stored in SQLite `sync_state` table:
| Column | Type | Description |
|--------|------|-------------|
| device_id | TEXT | Primary key |
| server_url | TEXT | Sync server URL |
| last_sync_at | TIMESTAMP | Last successful sync |
| last_push_at | TIMESTAMP | Last push |
| last_pull_at | TIMESTAMP | Last pull |
| device_tracker | TEXT (JSON) | DeviceTracker data |
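A sketch of how a client might persist this row with the standard `sqlite3` module (the column subset and UPSERT form are illustrative, not the actual client code):

```python
import json
import sqlite3


def save_sync_state(db_path: str, device_id: str, server_url: str, tracker: dict) -> None:
    """Illustrative upsert of the sync_state row described above."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """
            INSERT INTO sync_state (device_id, server_url, last_sync_at, device_tracker)
            VALUES (?, ?, CURRENT_TIMESTAMP, ?)
            ON CONFLICT(device_id) DO UPDATE SET
                server_url = excluded.server_url,
                last_sync_at = excluded.last_sync_at,
                device_tracker = excluded.device_tracker
            """,
            (device_id, server_url, json.dumps(tracker)),
        )
```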
## Path Normalization
### Problem
Absolute paths differ across machines:
- macOS: `/Users/mlong/code/myrepo/src/file.py`
- Linux: `/home/mlong/code/myrepo/src/file.py`
- Windows: `C:\Users\mlong\code\myrepo\src\file.py`
### Solution: Portable Paths
Store repository URL + relative path:
```python
class PortablePath:
    repo_url: str       # git@github.com:user/repo.git
    repo_name: str      # myrepo
    relative_path: str  # src/file.py
```
### PathResolver
```python
def normalize(self, absolute_path: str) -> PortablePath:
    """Convert absolute path to portable format."""
    repo_root = find_git_root(absolute_path)
    repo_url = get_git_remote(repo_root)
    relative = os.path.relpath(absolute_path, repo_root)
    return PortablePath(repo_url=repo_url, relative_path=relative)

def resolve(self, portable: PortablePath) -> Path | None:
    """Convert portable path to local absolute path."""
    # Look up local clone of repo_url; None if this machine has no clone.
    local_root = find_local_clone(portable.repo_url)
    if local_root is None:
        return None
    return local_root / portable.relative_path
```
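A round trip then looks like this (the `PathResolver()` constructor and the commented values are illustrative):

```python
resolver = PathResolver()

portable = resolver.normalize("/Users/mlong/code/myrepo/src/file.py")
# portable.repo_url      -> "git@github.com:user/myrepo.git"
# portable.relative_path -> "src/file.py"

# On a different machine, resolve against whatever local clone exists;
# returns None when the repo is not cloned there.
local = resolver.resolve(portable)  # e.g. Path("/home/mlong/code/myrepo/src/file.py")
```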
## CLI Commands
```bash
# Register device with sync server
contextfs sync register --server http://localhost:8766 --name "My Device"
# Push local changes
contextfs sync push --server http://localhost:8766
contextfs sync push --server http://localhost:8766 --all # Push ALL memories
# Pull from server
contextfs sync pull --server http://localhost:8766
contextfs sync pull --server http://localhost:8766 --all # Initial sync
# Full bidirectional sync
contextfs sync all --server http://localhost:8766
# Check status
contextfs sync status --server http://localhost:8766
# Run sync daemon
contextfs sync daemon --server http://localhost:8766 --interval 300
```
## Deployment
### Docker Compose
```yaml
# docker-compose.sync.yml
services:
  sync-postgres:
    image: pgvector/pgvector:pg15
    environment:
      POSTGRES_DB: contextfs_sync
      POSTGRES_USER: contextfs
      POSTGRES_PASSWORD: contextfs
    ports:
      - "5432:5432"

  sync-server:
    build: .
    environment:
      DATABASE_URL: postgresql+asyncpg://contextfs:contextfs@sync-postgres/contextfs_sync
    ports:
      - "8766:8766"
    depends_on:
      - sync-postgres
```
### Start Services
```bash
# Start infrastructure
docker-compose -f docker-compose.sync.yml up -d sync-postgres
# Run server locally (development)
python -m service.api.main
# Or run everything in Docker
docker-compose -f docker-compose.sync.yml up -d
```
## Conflict Resolution
### Current Strategy
Conflicts are returned to the client for manual resolution:
```python
class ConflictInfo(BaseModel):
    entity_id: str
    entity_type: str  # "memory", "session", "edge"
    client_clock: dict[str, int]
    server_clock: dict[str, int]
    client_content: str | None
    server_content: str | None
    client_updated_at: datetime
    server_updated_at: datetime
```
### Future Options
1. **Last-Write-Wins (LWW)** - Use timestamp to auto-resolve (sketched after this list)
2. **LLM-Powered Merge** - Use AI to intelligently merge versions
3. **CRDT-Style** - Automatic merge for compatible operations (tag unions)
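A sketch of option 1, building on the `ConflictInfo` and `VectorClock` definitions above (the helper name is illustrative, and the policy assumes device wall clocks are trustworthy enough, which LWW inherently requires):

```python
def resolve_lww(conflict: ConflictInfo) -> tuple[str | None, dict[str, int]]:
    """Illustrative last-write-wins policy: newest wall-clock update wins."""
    if conflict.client_updated_at >= conflict.server_updated_at:
        winner = conflict.client_content
    else:
        winner = conflict.server_content
    # Merge both clocks so the resolved version dominates the two inputs;
    # a real resolver would also increment for the resolving device.
    merged = VectorClock(clock=conflict.client_clock).merge(
        VectorClock(clock=conflict.server_clock)
    )
    return winner, merged.clock
```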
## Performance Characteristics
| Operation | 1K memories | 10K memories |
|-----------|-------------|--------------|
| Push (with embeddings) | ~350ms | ~3.7s |
| Pull (with embeddings) | ~390ms | ~3.8s |
| Incremental sync | ~50ms | ~55ms |
### Optimizations
1. **Batch Save** - Single transaction for bulk inserts
2. **Skip RAG on Pull** - Embeddings inserted directly, no recompute
3. **Pagination** - Large syncs paginated (1000 items/page)
4. **Content Hashing** - Skip identical content (deduplication; see the sketch below)
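The deduplication check in item 4 can be as simple as a SHA-256 digest of the memory body (`needs_push` is an illustrative helper; the hash populates the `content_hash` field shown in the data models above):

```python
import hashlib


def content_hash(content: str) -> str:
    """Stable digest stored in the content_hash field (illustrative helper)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def needs_push(local_content: str, remote_hash: str | None) -> bool:
    # Skip the network round trip when the server already has identical content.
    return remote_hash is None or content_hash(local_content) != remote_hash
```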
## Security Considerations
### Current State
- Device ID-based identification (no authentication)
- Plain HTTP (use HTTPS in production)
### Recommended for Production
1. API key or OAuth authentication
2. TLS for transport security
3. Server-side encryption for sensitive memories
4. Rate limiting per device
## Future Enhancements
1. **CRDT Integration** - Automatic conflict resolution for compatible ops
2. **Selective Sync** - Namespace-based sync policies
3. **Federated Architecture** - Multiple sync servers
4. **End-to-End Encryption** - Client-side encryption with key sharing
5. **WebSocket Push** - Real-time sync notifications