Allows for the integration of Rewind.ai data into Raycast, enabling users to search and retrieve past conversation transcripts and screen OCR content through the launcher interface.
Supports filtering and retrieving screen OCR text content specifically captured while using the Safari web browser.
Provides tools to interface with and query the Rewind.ai SQLite database to programmatically access audio transcripts, screen OCR data, and activity tracking records.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type `@` followed by the MCP server name and your instructions, e.g., "@RewindMCP What did I discuss in my meeting with Sarah an hour ago?"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
RewindDB
A Python library for interfacing with the Rewind.ai SQLite database.
Changelog
2025-07-04 - Voice Export & Training Data Features
NEW: `--export-own-voice` CLI option for exporting the user's voice transcripts organized by day
NEW: `--speech-source` filter to separate user voice (me) from other speakers (others)
NEW: Multi-format export support: text, JSON, and audio file export
NEW: `--export-format audio` with `--audio-export-dir` for exporting the actual M4A audio files
NEW: `my-words.sh` script for generating word clouds from your voice data
ENHANCED: RewindDB core library now supports speech source filtering
USE CASE: Perfect for collecting clean voice training data for LLM fine-tuning
FILTER: Text exports contain only the user's voice (no other speakers); audio exports contain full conversations
Project Overview
RewindDB is a Python library that provides a convenient interface to the Rewind.ai SQLite database. Rewind.ai is a personal memory assistant that captures audio transcripts and screen OCR data in real-time. This project allows you to programmatically access and search through this data, making it possible to retrieve past conversations, find specific information mentioned in meetings, or analyze screen content from previous work sessions.
The project consists of three main components:
A core Python library (`rewinddb`) for direct database access
Command-line tools for transcript retrieval, keyword searching, screen OCR data retrieval, and activity tracking
An MCP STDIO server that exposes these capabilities to GenAI models through the standardized Model Context Protocol
The main purpose of this project, for me, was to connect Rewind to my Raycast:
Installation
Prerequisites
Python 3.6+
Install from Source
Manual Installation
Configuration
RewindDB uses a .env file to store database connection parameters. This approach avoids hardcoding sensitive information like database paths and passwords in the source code.
Setting Up the .env File
Create a `.env` file in your project directory, or in your home directory as `~/.rewinddb.env`, then add the following configuration parameters:
For example:
Custom .env File Location
You can also specify a custom location for your .env file when using the library or CLI tools:
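The `.env` format itself is just `KEY=VALUE` lines. As a minimal sketch of how such a file can be parsed (the key names `DB_PATH` and `DB_PASSWORD` below are invented placeholders, not necessarily the keys RewindDB actually expects):

```python
import os
import tempfile

def load_env(path):
    """Parse a simple KEY=VALUE .env file into a dict, skipping blanks and comments."""
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip().strip('"')
    return config

# Demo with a throwaway file; the key names are placeholders for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write('# Rewind database settings\nDB_PATH="/tmp/rewind.sqlite3"\nDB_PASSWORD=secret\n')
    env_path = f.name

config = load_env(env_path)
print(config)  # {'DB_PATH': '/tmp/rewind.sqlite3', 'DB_PASSWORD': 'secret'}
os.unlink(env_path)
```

Libraries like `python-dotenv` do the same job with more edge-case handling; the sketch only shows the file format.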
CLI Tools
transcript_cli.py
Retrieve audio transcripts from the Rewind.ai database with advanced voice filtering and export capabilities.
Basic Transcript Retrieval
Voice Source Filtering
Voice Export for Training Data
Perfect for collecting clean voice training data for LLM fine-tuning
Key Features:
Clean Training Data: Text exports contain only YOUR voice, with other speakers filtered out
Audio Export: M4A files organized by day with transcript summaries
Multiple Formats: Text (readable), JSON (structured), Audio (original files)
Day Organization: Perfect for chronological training data or analysis
Word Cloud: Quick visualization of your most-used words with `my-words.sh`
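The day-by-day organization that the voice export produces can be sketched roughly as follows. The sample rows and the function are illustrative only, not the tool's actual code:

```python
from collections import defaultdict
from datetime import datetime

# Toy transcript rows: (timestamp, speech source, word) -- invented sample data
# using the 'me' / 'others' speechSource values the database records.
words = [
    (datetime(2025, 7, 3, 9, 15), "me", "hello"),
    (datetime(2025, 7, 3, 9, 15), "others", "hi"),
    (datetime(2025, 7, 4, 14, 2), "me", "status"),
    (datetime(2025, 7, 4, 14, 2), "me", "update"),
]

def export_own_voice(rows):
    """Keep only the user's words ('me') and group them by calendar day."""
    by_day = defaultdict(list)
    for ts, source, word in rows:
        if source == "me":
            by_day[ts.date().isoformat()].append(word)
    return {day: " ".join(ws) for day, ws in sorted(by_day.items())}

print(export_own_voice(words))
# {'2025-07-03': 'hello', '2025-07-04': 'status update'}
```

The result is one clean, single-speaker text chunk per day, which is the shape fine-tuning pipelines usually want.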
search_cli.py
Search for keywords across both audio transcripts and screen OCR data.
ocr_cli.py
Retrieve screen OCR (Optical Character Recognition) data from the Rewind.ai database. This tool allows you to see what text was visible on your screen during specific time periods, providing complete OCR text content rather than just metadata about frames and nodes.
Key Features:
Time formats: Supports relative time ("1 hour", "5h", "30m", "2d", "1w") and absolute time ranges
Application filtering: Use `--list-apps` to see available applications, then `--app` to filter by a specific app
Flexible time input: Accepts various formats including date-only, time-only, and full datetime strings
Text extraction: Shows actual text content that was visible on screen, organized by timestamp and application
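As a sketch of how relative time strings in these formats can be turned into a `timedelta` (an illustration of the accepted format, not `ocr_cli.py`'s actual parser):

```python
import re
from datetime import timedelta

# Map unit spellings to timedelta keyword arguments.
UNITS = {
    "m": "minutes", "minute": "minutes", "minutes": "minutes",
    "h": "hours", "hour": "hours", "hours": "hours",
    "d": "days", "day": "days", "days": "days",
    "w": "weeks", "week": "weeks", "weeks": "weeks",
}

def parse_relative(text):
    """Turn strings like '1 hour', '5h', '30m', '2d', '1w' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]+)\s*", text)
    if not match:
        raise ValueError(f"unrecognized time period: {text!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return timedelta(**{UNITS[unit]: amount})

print(parse_relative("5h"))      # 5:00:00
print(parse_relative("1 hour"))  # 1:00:00
print(parse_relative("2d"))      # 2 days, 0:00:00
```

The same pattern also accepts the space-free forms like "1hour" and "30minutes" that the MCP tools use.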
activity_cli.py
Display comprehensive activity tracking data from the Rewind.ai database, including computer usage patterns, application usage statistics, and calendar meetings.
Key Features:
Active Hours: Shows when your computer was actively being used, with hourly and daily breakdowns
Application Usage: Displays top applications by usage time with visual charts
Calendar Meetings: Shows meeting statistics, duration, and distribution by time of day
Visual Charts: Includes simple ASCII bar charts for easy data visualization
Time Zone Support: Displays times in local timezone by default, with UTC option available
MCP STDIO Server
The Model Context Protocol (MCP) server exposes RewindDB functionality to GenAI models through the standardized MCP STDIO protocol. This implementation is fully MCP-compliant and works with MCP clients like Claude, Raycast, and other AI assistants.
Quick Start
Available Tools
The MCP server provides the following tools:
`get_transcripts_relative`: Get audio transcripts from a relative time period (e.g., "1hour", "30minutes", "1day", "1week"). Returns transcript sessions with full text content suitable for analysis, summarization, or detailed review. Each session includes complete transcript text and word-by-word timing.

`get_transcripts_absolute`: PRIMARY TOOL for meeting summaries. Get complete audio transcripts from a specific time window (e.g., "3 PM meeting"). This is the FIRST tool to use when asked to summarize meetings, calls, or conversations from specific times. Returns full transcript sessions with complete text content ready for analysis and summarization.

`search_transcripts`: Search for specific keywords/phrases in transcripts. NOT for meeting summaries; use `get_transcripts_absolute` instead when asked to summarize meetings from specific times. This tool finds keyword matches with context snippets, useful for finding specific topics or names mentioned across multiple sessions.

`search_screen_ocr`: Search through OCR screen content for keywords. Finds text that appeared on screen during specific time periods. Use this to find what was displayed on screen, applications used, or visual content during meetings or work sessions. Complements audio transcripts by showing what was visible.

`get_screen_ocr_relative`: Get all screen OCR content from a relative time period (e.g., "2hours", "1day"). Returns complete OCR text content that was visible on screen, organized by application and timestamp. Use this to see everything that was displayed during a time period.
Parameters: `time_period` (required, e.g., "1hour", "30minutes", "1day", "1week"), `application` (optional, filter by app name)
Use Case: "Show me all screen content from the last 2 hours" or "Show me all Chrome content from the last day"

`get_screen_ocr_absolute`: Get all screen OCR content from a specific time window. Returns complete OCR text content from the specified time range, with optional application filtering. Essential for reviewing what was visible during meetings or work sessions.
Parameters: `from` (required, ISO format), `to` (required, ISO format), `timezone` (optional), `application` (optional, filter by app name)
Use Case: "Show me all screen content from my 3 PM meeting" or "Show me all Slack content from yesterday afternoon"

`get_ocr_applications_relative`: Discover all applications that have OCR data from a relative time period. Shows which applications were active and their activity levels. Use this to identify applications before filtering OCR content.
Parameters: `time_period` (required, e.g., "1hour", "30minutes", "1day", "1week")
Use Case: "What applications were active in the last 4 hours?" - helps users discover what apps to filter by
Returns: Frame count per application, OCR node count (activity level), number of unique windows, and the time range when each application was active, sorted by activity level

`get_ocr_applications_absolute`: Discover all applications that have OCR data from a specific time window. Helps identify what applications were active during specific meetings or time periods.
Parameters: `from` (required, ISO format), `to` (required, ISO format), `timezone` (optional)
Use Case: "What applications were active during my meeting from 2-3 PM?" - helps users discover what apps to filter by
Returns: Frame count per application, OCR node count (activity level), number of unique windows, and the time range when each application was active, sorted by activity level

`get_activity_stats`: Get activity statistics for a specified time period (e.g., "1hour", "30minutes", "1day", "1week"). Provides comprehensive statistics about audio recordings, screen captures, and application usage.

`get_transcript_by_id`: FOLLOW-UP TOOL. Get complete transcript content by audio ID. Use this AFTER `get_transcripts_absolute` to retrieve full transcript text for summarization. Essential second step when the first tool shows preview text that needs complete content for proper analysis.
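On the wire, an MCP STDIO client invokes these tools with newline-delimited JSON-RPC 2.0 messages. A minimal sketch of what a `tools/call` request for `get_transcripts_relative` looks like (the request shape follows the MCP specification; the helper function is just for illustration):

```python
import json

def make_tool_call(request_id, tool, arguments):
    """Build an MCP 'tools/call' JSON-RPC 2.0 request, as written to the
    server's stdin one JSON object per line."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

request = make_tool_call(1, "get_transcripts_relative", {"time_period": "1hour"})
line = json.dumps(request)  # an MCP client writes this, followed by '\n'
print(line)
```

Clients like Raycast or Claude generate these messages for you; the sketch only shows what "exposes tools over STDIO" means concretely.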
OCR Tools Workflow Integration
The OCR tools work together in a natural workflow for comprehensive screen content analysis:
Discovery Phase: Use `get_ocr_applications_*` to see what applications were active.
Example: `get_ocr_applications_relative` with "2hours" → returns Chrome, Slack, VS Code, Zoom, etc.
Focused Retrieval: Use `get_screen_ocr_*` with an application filter to get specific content.
Example: `get_screen_ocr_relative` with time_period="2hours", application="Chrome" → returns all Chrome OCR content from the last 2 hours
Keyword Search: Use the existing `search_screen_ocr` for specific content within the results.
Example: `search_screen_ocr` with keyword="meeting notes", application="Slack" → returns specific matches for "meeting notes" in Slack
Key Features:
Application Filtering: Both OCR content tools support optional application filtering with case-insensitive matching (e.g., "chrome" matches "Google Chrome")
Rich Metadata: Application discovery tools provide frame count, OCR node count (activity level), number of unique windows, and time ranges
Consistent Time Handling: All tools use smart datetime parsing supporting both relative time periods and absolute time ranges with timezone handling
Complete Content Access: OCR content tools return actual OCR text content grouped by frame, not just metadata
Meeting Analysis: Perfect for reviewing what was displayed during meetings or work sessions
Benefits and Use Cases:
Complete OCR Access: Users can now pull all screen content from any time window, not just search for specific keywords
Application Discovery: Easy way to see what apps were active during specific periods before filtering content
Flexible Filtering: Can focus on specific applications after discovery phase
Meeting Analysis: Perfect for reviewing what was displayed during meetings or presentations
Work Session Review: Analyze screen activity and content during specific work periods
Seamless Integration: Works with existing search and transcript tools for comprehensive data analysis
MCP Client Integration
The server follows the MCP specification and can be used with any MCP-compatible client. Example configuration:
For detailed STDIO MCP setup instructions, configuration examples, and troubleshooting, see README-MCP-STDIO.md.
Library Usage
Basic Usage
Retrieving Audio Transcripts
Retrieving Screen OCR Data
Searching Across Data
Database Schema
The Rewind.ai database contains several key tables:
`audio`: Stores audio recording segments with timestamps
`transcript_word`: Contains individual transcribed words linked to audio segments
`frame`: Stores screen capture frames with timestamps
`node`: Contains text elements extracted from screen captures (OCR)
`segment`: Tracks application and window usage sessions
`event`: Stores calendar events and meetings
`searchRanking_content`: Stores OCR text content for searching
Data Types Explained
Audio Recordings
Audio recordings are captured by Rewind.ai when you speak or when there's audio playing on your computer. Each recording is stored as a segment in the audio table with metadata like start time and duration. These recordings are then processed to extract transcribed words.
Audio snippets are stored on disk at:
Transcript Words
Individual words extracted from audio recordings through speech recognition. Each word in the transcript_word table includes information about when it occurred within the audio recording (timeOffset), its position in the full text (fullTextOffset), and its duration. Transcript words are linked to their source audio recording.
Key Fields:
`speechSource`: Identifies the speaker: `'me'` for the user's voice, `'others'` for other speakers
`word`: The transcribed word text
`timeOffset`: Timing within the audio segment (milliseconds)
`duration`: Length of the spoken word (milliseconds)
This speaker identification enables clean voice training data export by filtering to only the user's spoken words.
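A minimal sketch of that filtering against a simplified in-memory copy of `transcript_word` (only the columns listed above; the sample rows are invented):

```python
import sqlite3

# In-memory mock of transcript_word with the key fields described above --
# an illustration only, not the full Rewind.ai schema.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE transcript_word (
        word TEXT, speechSource TEXT, timeOffset INTEGER, duration INTEGER
    )
""")
db.executemany(
    "INSERT INTO transcript_word VALUES (?, ?, ?, ?)",
    [
        ("good", "me", 0, 300),
        ("morning", "me", 320, 420),
        ("hello", "others", 800, 350),
    ],
)

# Filter to the user's own voice, as the --speech-source option does.
rows = db.execute(
    "SELECT word FROM transcript_word WHERE speechSource = ? ORDER BY timeOffset",
    ("me",),
).fetchall()
print([w for (w,) in rows])  # ['good', 'morning']
```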
Frames
Screenshots captured by Rewind.ai at regular intervals as you use your computer. Each frame in the frame table includes a timestamp (createdAt) and is linked to the application segment it belongs to. Frames are the visual equivalent of audio recordings, capturing what was on your screen at specific moments.
Screen recordings are stored on disk as chunks at:
Where:
YYYYMM is the year and month (e.g., 202505 for May 2025)
DD is the day (e.g., 13 for the 13th)
[chunk_id] is a unique identifier for the recording chunk
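Assembling that relative layout from a frame timestamp can be sketched like this. The helper function and the chunk id are hypothetical, and the Rewind base directory is deliberately left out:

```python
from datetime import datetime
from pathlib import PurePosixPath

def chunk_relative_path(created_at, chunk_id):
    """Build the YYYYMM/DD/[chunk_id] layout described above, relative to
    wherever Rewind.ai stores its recording chunks."""
    return PurePosixPath(
        created_at.strftime("%Y%m"),  # e.g., 202505 for May 2025
        created_at.strftime("%d"),    # e.g., 13 for the 13th
        chunk_id,
    )

# 'chunk-abc123' is a made-up identifier for illustration.
path = chunk_relative_path(datetime(2025, 5, 13), "chunk-abc123")
print(path)  # 202505/13/chunk-abc123
```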
Nodes
Text elements extracted from screen captures using Optical Character Recognition (OCR). Each node in the node table represents a piece of text visible on your screen, including its position (leftX, topY, width, height) and other metadata. Nodes are linked to the frame they were extracted from. They are the visual equivalent of transcript words.
SearchRanking_Content
This table stores the actual OCR text content extracted from screen captures. Alongside its `id`, it contains three content columns:
`id`: A unique identifier that can be used to locate the corresponding recording chunk
`c0`: The main text content extracted from the screen
`c1`: Timestamp information
`c2`: Window/application information
This table is crucial for searching through screen content and is used by the new OCR text retrieval methods (get_screen_ocr_text_absolute() and get_screen_ocr_text_relative()) to provide complete OCR text content rather than just metadata about frames and nodes.
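A keyword search over this table essentially reduces to a `LIKE` query. Here is a sketch against a tiny in-memory stand-in (the rows are invented, and the real table holds far more data):

```python
import sqlite3

# Minimal stand-in for searchRanking_content: id, c0 = text,
# c1 = timestamp, c2 = window/application.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE searchRanking_content (id INTEGER, c0 TEXT, c1 TEXT, c2 TEXT)")
db.executemany(
    "INSERT INTO searchRanking_content VALUES (?, ?, ?, ?)",
    [
        (1, "quarterly meeting notes", "2025-05-13T15:00:00", "Slack"),
        (2, "pull request review", "2025-05-13T16:10:00", "Google Chrome"),
    ],
)

# Keyword search with an application filter, roughly what search_screen_ocr
# does; SQLite's LIKE is case-insensitive for ASCII, so "slack" matches "Slack".
rows = db.execute(
    "SELECT id, c2 FROM searchRanking_content "
    "WHERE c0 LIKE ? AND c2 LIKE ?",
    ("%meeting%", "%slack%"),
).fetchall()
print(rows)  # [(1, 'Slack')]
```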
Segments
Application usage sessions that track when you were using specific applications and windows. Each segment in the segment table includes the application bundle ID, window name, start time, and end time. Segments help organize frames and audio recordings by the application context they occurred in.
Events
Calendar events and meetings that were scheduled during your computer usage. The event table stores information about these events, including title, start time, end time, and other metadata. Events provide additional context about what you were doing during specific time periods.
Key relationships:
Audio segments are linked to transcript words
Frames are linked to nodes (text elements)
Frames and audio segments are associated with application segments
Events may be associated with specific segments
SearchRanking_content entries are linked to frames and contain the actual OCR text
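The first of these relationships can be sketched with a toy join. The column names below are simplified assumptions for illustration, not the exact Rewind.ai schema:

```python
import sqlite3

# Toy version of the audio <-> transcript_word relationship described above.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE audio (id INTEGER PRIMARY KEY, startTime TEXT);
    CREATE TABLE transcript_word (
        audioId INTEGER REFERENCES audio(id),
        word TEXT, timeOffset INTEGER
    );
    INSERT INTO audio VALUES (1, '2025-05-13T15:00:00');
    INSERT INTO transcript_word VALUES (1, 'project', 0), (1, 'update', 450);
""")

# Reassemble a session's text by joining each word to its source recording
# and ordering by the word's offset within the audio segment.
rows = db.execute("""
    SELECT a.startTime, w.word
    FROM audio a JOIN transcript_word w ON w.audioId = a.id
    ORDER BY w.timeOffset
""").fetchall()
text = " ".join(word for _, word in rows)
print(text)  # project update
```

The same join-then-order pattern applies to frames and nodes on the visual side.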
Development
Setup for Development
Running Tests
Troubleshooting
Database Connection Issues
If you encounter database connection errors:
Verify that the Rewind.ai application is installed and has created the database
Check that your `.env` file contains the correct database path and password
Ensure the RewindDB module is properly installed
No Transcripts Found
If no transcripts are returned:
Verify that the time range contains data (try expanding the time range)
Check that Rewind.ai was actively recording during the requested time period
Use the `--debug` flag with CLI tools to see more information
MCP Server Issues
If the MCP server fails to start or respond:
Check that all MCP dependencies are installed: `pip install "mcp>=0.1.0"`
Verify your database configuration in the `.env` file
Check the server logs in `/tmp/mcp_stdio.log` for specific error messages
Stats CLI
The stats_cli.py tool provides comprehensive statistics about your Rewind.ai data:
The Stats CLI provides information about:
Database overview (size, tables, record counts)
Audio transcript statistics (counts by time period, earliest records)
Screen OCR statistics (counts by time period, earliest records)
Application usage statistics (most used applications, usage time)
Table record counts
This tool is useful for understanding the scope and content of your Rewind.ai data, and for diagnosing potential issues with data collection or storage.
License
MIT