# iMessage Database Format Documentation
This document describes the iMessage database format used on macOS, including the schema, data types, and how to query message content. This information was reverse-engineered from macOS Tahoe (26.x) and is known to work on recent macOS versions. The exact version when Apple introduced the `attributedBody` binary format is undocumented.
## Database Location and Access
### File Location
```
~/Library/Messages/chat.db
```
This is a SQLite database file. Attachments are stored separately in:
```
~/Library/Messages/Attachments/
```
### Required Permissions
Reading the database requires **Full Disk Access** permission on macOS. This must be granted to the application accessing the database (Terminal, Python, Claude Desktop, etc.) via:
System Settings → Privacy & Security → Full Disk Access
### Connection Best Practices
The database uses SQLite with WAL (Write-Ahead Logging) mode, which allows reading while Messages.app is writing. To avoid lock contention:
```python
import sqlite3
# Use URI mode with read-only flag
uri = f"file://{db_path}?mode=ro"
conn = sqlite3.connect(uri, uri=True)
# Set busy timeout to wait for locks
conn.execute("PRAGMA busy_timeout = 5000")
# Use row factory for dict-like access
conn.row_factory = sqlite3.Row
```
## Database Schema
### Core Tables
#### `message` - Individual Messages
| Column | Type | Description |
|--------|------|-------------|
| ROWID | INTEGER | Primary key |
| guid | TEXT | Unique message identifier |
| text | TEXT | Plain text content (may be NULL on recent macOS) |
| attributedBody | BLOB | Binary format containing styled text (recent macOS) |
| handle_id | INTEGER | Foreign key to `handle.ROWID` |
| date | INTEGER | CoreData timestamp (see below) |
| date_read | INTEGER | When message was read |
| date_delivered | INTEGER | When message was delivered |
| is_from_me | INTEGER | 1 if sent by user, 0 if received |
| is_read | INTEGER | 1 if read, 0 otherwise |
| is_delivered | INTEGER | 1 if delivered |
| is_sent | INTEGER | 1 if sent successfully |
| cache_has_attachments | INTEGER | 1 if message has attachments |
| service | TEXT | "iMessage" or "SMS" |
| reply_to_guid | TEXT | GUID of message being replied to |
| associated_message_guid | TEXT | For reactions/tapbacks |
| associated_message_type | INTEGER | Type of association (tapback type, etc.) |
#### `handle` - Contacts/Recipients
| Column | Type | Description |
|--------|------|-------------|
| ROWID | INTEGER | Primary key |
| id | TEXT | Phone number (e.g., "+15551234567") or email |
| service | TEXT | "iMessage" or "SMS" |
| country | TEXT | Country code |
#### `chat` - Conversations
| Column | Type | Description |
|--------|------|-------------|
| ROWID | INTEGER | Primary key |
| guid | TEXT | Unique chat identifier (see format below) |
| display_name | TEXT | Group name (NULL for 1:1 chats) |
| service_name | TEXT | "iMessage" or "SMS" |
| chat_identifier | TEXT | Phone/email for 1:1, identifier for groups |
#### `attachment` - Media Files
| Column | Type | Description |
|--------|------|-------------|
| ROWID | INTEGER | Primary key |
| guid | TEXT | Unique identifier |
| filename | TEXT | Full path to file |
| mime_type | TEXT | MIME type (e.g., "image/jpeg") |
| transfer_name | TEXT | Original filename |
| total_bytes | INTEGER | File size |
### Join Tables
#### `chat_message_join`
Links messages to chats (a message belongs to one chat).
| Column | Type | Description |
|--------|------|-------------|
| chat_id | INTEGER | Foreign key to `chat.ROWID` |
| message_id | INTEGER | Foreign key to `message.ROWID` |
| message_date | INTEGER | Timestamp for ordering |
#### `chat_handle_join`
Links handles to chats (participants in a conversation).
| Column | Type | Description |
|--------|------|-------------|
| chat_id | INTEGER | Foreign key to `chat.ROWID` |
| handle_id | INTEGER | Foreign key to `handle.ROWID` |
#### `message_attachment_join`
Links attachments to messages.
| Column | Type | Description |
|--------|------|-------------|
| message_id | INTEGER | Foreign key to `message.ROWID` |
| attachment_id | INTEGER | Foreign key to `attachment.ROWID` |
### Table Relationships
```
handle ←──┐
│ (chat_handle_join)
chat ─────┤
│ (chat_message_join)
message ──┼─── attachment (message_attachment_join)
│
└─── handle (message.handle_id)
```
## Chat GUID Format
The `chat.guid` column uses a specific format:
```
{service};{type};{identifier}
```
### Components
| Component | Values | Description |
|-----------|--------|-------------|
| service | `iMessage`, `SMS` | The messaging service |
| type | `-`, `+` | `-` = 1:1 conversation, `+` = group chat |
| identifier | varies | Phone/email for 1:1, unique ID for groups |
### Examples
```
iMessage;-;+15551234567 # 1:1 iMessage with phone number
iMessage;-;user@example.com # 1:1 iMessage with email
SMS;-;+15551234567 # 1:1 SMS conversation
iMessage;+;chat123456 # Group iMessage chat
```
### Parsing Code
```python
def parse_chat_guid(guid: str) -> tuple[str, bool, str]:
"""Parse a chat GUID into components.
Returns:
Tuple of (service, is_group, identifier)
"""
parts = guid.split(";", 2)
if len(parts) != 3:
return ("unknown", False, guid)
service = parts[0]
is_group = parts[1] == "+"
identifier = parts[2]
return (service, is_group, identifier)
```
## CoreData Timestamp Format
Apple uses a timestamp format based on the **CoreData epoch** (January 1, 2001) rather than the Unix epoch (January 1, 1970).
### Epoch Offset
```python
COREDATA_EPOCH = 978307200 # Seconds between Unix epoch and CoreData epoch
```
### Timestamp Units
The `date` column may be in **seconds** or **nanoseconds** depending on macOS version:
- **Older macOS**: Seconds since CoreData epoch
- **Recent macOS**: Nanoseconds since CoreData epoch
### Detection Logic
```python
def coredata_to_datetime(timestamp: int | None) -> datetime | None:
"""Convert CoreData timestamp to Python datetime."""
if timestamp is None:
return None
# Detect nanoseconds (values > 1 trillion are in nanoseconds)
if timestamp > 1_000_000_000_000:
timestamp = timestamp / 1_000_000_000
# Convert to Unix timestamp
unix_ts = timestamp + 978307200
return datetime.fromtimestamp(unix_ts, tz=timezone.utc)
```
### Example Values
| CoreData Value | Interpretation | Date |
|---------------|----------------|------|
| 725760000 | Seconds | 2024-01-01 00:00:00 UTC |
| 725760000000000000 | Nanoseconds | 2024-01-01 00:00:00 UTC |
## attributedBody Binary Format
### Overview
On recent macOS versions, message text is stored in the `attributedBody` column as a binary "typedstream" format, while the `text` column is often NULL. This is an Apple-proprietary serialization of `NSAttributedString`.
### Format Structure
The binary data follows this general structure:
```
[Header: "streamtyped" + metadata]
[NSAttributedString class marker]
[NSString class marker]
[Type markers: 0x01 0x94 0x84 0x01]
['+' marker: 0x2B]
[Length prefix: 1 or 3 bytes]
[UTF-8 text content]
[Attribute dictionary (NSDictionary)]
```
### Hex Dump Example
A simple "Hello World!" message:
```
00000000: 04 0b 73 74 72 65 61 6d 74 79 70 65 64 81 e8 03 ..streamtyped...
00000010: 84 01 40 84 84 84 12 4e 53 41 74 74 72 69 62 75 ..@....NSAttribu
00000020: 74 65 64 53 74 72 69 6e 67 00 84 84 08 4e 53 4f tedString....NSO
00000030: 62 6a 65 63 74 00 85 92 84 84 84 08 4e 53 53 74 bject.......NSSt
00000040: 72 69 6e 67 01 94 84 01 2b 0c 48 65 6c 6c 6f 20 ring....+.Hello
00000050: 57 6f 72 6c 64 21 86 ... World!.
```
### Key Markers
| Offset | Bytes | Meaning |
|--------|-------|---------|
| 0x00 | `04 0b` | Header length markers |
| 0x02 | `streamtyped` | Format identifier |
| 0x17 | `NSAttributedString` | Root class |
| 0x3c | `NSString` | Text content class |
| 0x47 | `01 94 84 01` | Type markers (format metadata) |
| 0x4b | `2b` | '+' marker - text follows |
| 0x4c | `0c` | Length byte (12 = "Hello World!".length) |
| 0x4d | UTF-8 | Actual message text |
### Length Prefix Encoding
The length prefix after the '+' marker uses a variable-length encoding:
#### Single-byte length (strings up to 127 bytes)
```
[+] [length: 1 byte] [text: length bytes]
```
Example: `2b 0c 48 65 6c 6c 6f ...` = "Hello World!" (12 chars)
#### Two-byte length (strings 128-65535 bytes)
```
[+] [0x81] [length low byte] [length high byte] [text: length bytes]
```
The `0x81` byte indicates a 2-byte little-endian length follows.
Example: `2b 81 80 00 ...` = 128-byte string
### Parsing Algorithm
```python
def parse_attributed_body(blob: bytes) -> str | None:
"""Extract text from attributedBody binary format."""
# 1. Find NSString marker
ns_idx = blob.find(b"NSString")
if ns_idx == -1:
return None
# 2. Search for '+' marker within 20 bytes
search_start = ns_idx + 8 # len("NSString")
search_region = blob[search_start:search_start + 20]
plus_offset = search_region.find(b"+")
if plus_offset == -1:
return None
# 3. Get length byte
len_offset = search_start + plus_offset + 1
length = blob[len_offset]
# 4. Handle 2-byte length
if length == 0x81:
length = blob[len_offset + 1] | (blob[len_offset + 2] << 8)
text_start = len_offset + 3
else:
text_start = len_offset + 1
# 5. Extract UTF-8 text
text_bytes = blob[text_start:text_start + length]
return text_bytes.decode("utf-8")
```
### Edge Cases
#### Emoji Messages
Emoji are encoded as UTF-8, so a single emoji like "😀" is:
- 1 character in the string
- 4 bytes in UTF-8
- Length byte = 4
```
+.....NSString....+ 04 f0 9f 98 80 # Length 4, then 😀 in UTF-8
```
#### System Messages
Some messages have no text content (e.g., "Jon renamed the group"):
- The `text` column is NULL
- The `attributedBody` may be NULL or contain only metadata
- Parser returns None for these
#### Messages with Only Attachments
Photo/video-only messages:
- `cache_has_attachments = 1`
- `text` is NULL
- `attributedBody` may be NULL
- Use the attachment join table to find associated files
## Common SQL Queries
### Get Recent Messages
```sql
SELECT
m.ROWID,
m.guid,
m.text,
m.attributedBody,
m.is_from_me,
m.date,
h.id as sender,
c.guid as chat_guid,
c.display_name as chat_name
FROM message m
LEFT JOIN handle h ON m.handle_id = h.ROWID
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
JOIN chat c ON cmj.chat_id = c.ROWID
ORDER BY m.date DESC
LIMIT 50;
```
### Get Messages from a Specific Contact
```sql
SELECT
m.ROWID,
m.text,
m.attributedBody,
m.is_from_me,
m.date
FROM message m
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
JOIN chat c ON cmj.chat_id = c.ROWID
JOIN chat_handle_join chj ON c.ROWID = chj.chat_id
JOIN handle h ON chj.handle_id = h.ROWID
WHERE h.id = '+15551234567'
ORDER BY m.date DESC
LIMIT 100;
```
### List All Conversations
```sql
SELECT
c.ROWID as chat_id,
c.guid,
c.display_name,
c.chat_identifier,
(SELECT COUNT(*) FROM chat_message_join cmj WHERE cmj.chat_id = c.ROWID) as message_count,
(SELECT MAX(m.date) FROM message m
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
WHERE cmj.chat_id = c.ROWID) as last_message_date
FROM chat c
ORDER BY last_message_date DESC;
```
### Search Messages by Content
```sql
SELECT
m.ROWID,
m.text,
m.date,
h.id as sender,
c.display_name as chat_name
FROM message m
LEFT JOIN handle h ON m.handle_id = h.ROWID
JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
JOIN chat c ON cmj.chat_id = c.ROWID
WHERE m.text LIKE '%search term%'
ORDER BY m.date DESC
LIMIT 50;
```
Note: This only searches the `text` column. On recent macOS, you must also parse `attributedBody` in application code.
### Get Attachments for a Message
```sql
SELECT
a.filename,
a.mime_type,
a.transfer_name,
a.total_bytes
FROM attachment a
JOIN message_attachment_join maj ON a.ROWID = maj.attachment_id
WHERE maj.message_id = 12345; -- message ROWID
```
## Handle Normalization
Phone numbers and emails may be stored with varying formats. Normalize for comparison:
```python
import re
def normalize_handle(handle: str) -> str:
"""Normalize a phone number or email for comparison."""
handle = handle.strip()
# Email addresses: lowercase
if "@" in handle:
return handle.lower()
# Phone numbers: keep only digits and leading +
if handle.startswith("+"):
return "+" + re.sub(r"[^\d]", "", handle[1:])
return re.sub(r"[^\d]", "", handle)
```
## Version Differences
The exact macOS version boundaries are undocumented, but the general pattern is:
| Feature | Older macOS | Recent macOS |
|---------|-------------|--------------|
| Text storage | `text` column | `attributedBody` column |
| Timestamp unit | Seconds | Nanoseconds |
| Message format | Plain text | Binary typedstream |
The code handles both formats via fallback logic: try `text` first, then parse `attributedBody`.
## References
This documentation was created through reverse engineering of the iMessage database. No official Apple documentation exists for:
- The attributedBody binary format
- The typedstream serialization format
- Database schema details
The information here is based on analysis of actual database contents on macOS Tahoe 26.x.