# 📜 Marqant (.mq) Format Specification v2.0 - Rotating Token Edition
> "Marqant isn't a format. It's AI sheet music for memory fields." 🎼
> "Same meaning. Fewer notes. Faster recall."
## Overview
Marqant v2.0 introduces the **Rotating Token System** - a revolutionary approach where token assignments are dynamic and contextual, with tokens rotating based on frequency and locality. This creates an ultra-efficient compression format that's both copy-paste friendly and AI-optimized.
### Key Principles
1. **Token Economy**: Tokens are precious resources (only 127 assignable in `.mq`, the full 8-bit range in `.mqb`)
2. **Frequency-Based Assignment**: A pattern must repeat at least twice to earn a token
3. **Size-Aware Allocation**: Token count scales with file size (larger files = more tokens)
4. **Copy-Paste Friendly**: `.mq` format uses only printable ASCII characters
5. **Binary Mode**: `.mqb` format uses the full 8-bit range for maximum compression
## File Formats
### .mq Format (Text-Safe)
- Uses ASCII printable characters (0x20-0x7E)
- Tilde (`~`, 0x7E) as primary control character
- Safe for copying through any text medium
- Ideal for embedding in documentation, chat, emails
### .mqb Format (Binary)
- Uses the full 8-bit range (0x00-0xFF)
- Maximum compression efficiency
- Suitable for file storage and transport
- Not safe for text-based transmission
## File Structure
```
Line 1: MQ2~<timestamp>~<orig_size>~<comp_size>~<token_count>~<format>~<level>
Line 2: ~T<base_token_map>
Line 3: [~X<extended_token_map>] (optional, for L1+)
Line 4: ~~~~
Line 5+: <compressed_content>
```
### Header Fields
- `MQ2`: Format signature (version 2)
- `<timestamp>`: Unix timestamp (hex)
- `<orig_size>`: Original size in bytes (hex)
- `<comp_size>`: Compressed size in bytes (hex)
- `<token_count>`: Number of active tokens (hex)
- `<format>`: `mq` or `mqb`
- `<level>`: Decoder level required (`L0`, `L1`, `L2`)
### Header Examples
```
MQ2~6743A100~1000~800~50~mq~L0    # Basic compression, Level 0 decoder OK
MQ2~6743A100~1000~600~80~mqb~L1   # Uses extended tokens, needs Level 1
MQ2~6743A100~1000~400~A0~mqb~L2   # Full features, needs Level 2
```
## Token Architecture
### Base Token Space (1 byte)
All tokens are single bytes (0x00-0xFF), with special reservations:
```
0x00-0x1F: Control tokens (newline, tab, etc.)
0x20-0x7E: Direct ASCII passthrough (in .mq format)
0x7F: Extension token (X-token) - "The Gateway"
0x80-0xFE: Assignable pattern tokens
0xFF: Reserved for future use
```
### Extension Mechanism (X-token)
The extension token (0x7F) acts as a prefix to access extended features:
```
0x7F 0x00-0x0F: Extended pattern tokens (4096 additional tokens)
0x7F 0x10-0x1F: Semantic markers (code blocks, sections, etc.)
0x7F 0x20-0x2F: Compression modes (zlib, brotli, custom)
0x7F 0x30-0x3F: Metadata operations
0x7F 0x40-0x4F: Cross-file references
0x7F 0x50-0x5F: Delta/diff operations
0x7F 0x60-0xFF: Future extensions
```
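An extended pattern token packs its 12-bit id into the low nibble of the type byte plus one more byte (`0x7F 0x0N NN`). A minimal encode/decode sketch - the function names are illustrative, not part of the spec:

```rust
/// Encode an extended pattern token id (0..4096) as 0x7F + type + low byte.
fn encode_ext_pattern(id: u16) -> [u8; 3] {
    assert!(id < 4096, "extended pattern ids are 12-bit");
    [0x7F, (id >> 8) as u8, (id & 0xFF) as u8]
}

/// Recover the id from the two bytes that follow 0x7F.
fn decode_ext_pattern(type_byte: u8, low: u8) -> u16 {
    ((type_byte & 0x0F) as u16) << 8 | low as u16
}

fn main() {
    let [x, t, l] = encode_ext_pattern(0x123);
    println!("{:02X} {:02X} {:02X}", x, t, l); // 7F 01 23
    assert_eq!(decode_ext_pattern(t, l), 0x123);
}
```

Because `id < 4096`, the type byte always lands in 0x00-0x0F, matching the extended-pattern range above.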
### Decoder Levels
1. **Level 0 (Basic)**: Ignores X-tokens, processes only base tokens
2. **Level 1 (Extended)**: Supports extended pattern tokens
3. **Level 2 (Full)**: Supports all extension features
## Token Assignment Algorithm
### 1. Frequency Analysis Phase
```rust
// Scan content for repeated patterns
let patterns = find_repeated_patterns(content, /* min_length */ 2);

// Keep only patterns that appear at least twice
let mut valid_patterns: Vec<Pattern> =
    patterns.into_iter().filter(|p| p.count >= 2).collect();

// Sort by savings: (pattern_length - 1) * (count - 1)
valid_patterns.sort_by(|a, b| b.savings().cmp(&a.savings()));
```
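The savings formula above can be turned into a tiny helper; the function name and the guard for sub-minimum patterns are illustrative, not part of the spec:

```rust
/// Bytes saved by giving a repeated pattern a 1-byte token:
/// each occurrence shrinks from `len` bytes to 1, and roughly one
/// occurrence's worth of bytes pays for the token-map entry.
fn savings(len: usize, count: usize) -> usize {
    if len < 2 || count < 2 {
        return 0; // too short or too rare to earn a token
    }
    (len - 1) * (count - 1)
}

fn main() {
    // " the " (5 bytes) seen 40 times saves (5-1)*(40-1) bytes
    println!("{}", savings(5, 40)); // 156
}
```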
### 2. Token Allocation
```rust
// Base token allocation (Level 0 compatible)
const RESERVED_TOKENS: u16 = 32;   // 0x00-0x1F
const ASCII_PASSTHROUGH: u16 = 95; // 0x20-0x7E (in .mq)
const X_TOKEN: u8 = 0x7F;
const RESERVED_END: u8 = 0xFF;

// Available base tokens (u16: this arithmetic would overflow u8)
let mut available_base: u16 = 256 - RESERVED_TOKENS - 1 - 1; // -1 for X_TOKEN, -1 for 0xFF
if format == "mq" {
    available_base -= ASCII_PASSTHROUGH;
}

// Extended tokens (optional)
let available_extended = if use_extensions {
    4096 // 0x7F 0x00-0x0F + second byte
} else {
    0
};
```
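Plugging in the constants: a `.mq` file is left with 256 − 32 − 95 − 1 − 1 = 127 assignable base tokens, which is exactly the 0x80-0xFE range from the token architecture section. A quick check (the function name is illustrative):

```rust
/// Assignable base tokens after reservations; `text_safe` selects .mq vs .mqb rules.
fn available_base_tokens(text_safe: bool) -> u16 {
    let mut available: u16 = 256 - 32 - 1 - 1; // control range, X-token, 0xFF
    if text_safe {
        available -= 95; // 0x20-0x7E pass through unchanged in .mq
    }
    available
}

fn main() {
    println!("{}", available_base_tokens(true)); // 127 == 0xFE - 0x80 + 1
}
```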
### 3. Smart Assignment Strategy
```rust
// Assign most frequent patterns to single-byte tokens
for (i, pattern) in patterns.iter().enumerate() {
    if i < available_base {
        // Direct single-byte assignment
        assign_base_token(pattern, base_tokens[i]);
    } else if use_extensions && i < available_base + available_extended {
        // Extended token (3 bytes: 0x7F + type + id)
        assign_extended_token(pattern, i - available_base);
    }
}

// Rotation windows for large documents
if content.len() > rotation_threshold {
    // Re-evaluate token assignments per window
    for window in split_into_windows(content) {
        optimize_tokens_for_window(window);
    }
}
```
## Token Map Format
### Compact Notation
```
~T<token><pattern>|<token><pattern>|...
```
### Examples
```
~T counting|! the |" and |# for |$ with |% that |& this |
~T(function|)return|*const |+let |,if |
```
### Special Tokens
- `~~`: Literal tilde
- `~n`: Newline (when needed)
- `~t`: Tab (when needed)
- `~s`: Space (in token definitions only)
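Parsing the compact notation comes down to splitting on `|` and peeling the first byte off each entry. A sketch that ignores the `~~`/`~n`/`~t`/`~s` escapes for brevity (function name is illustrative):

```rust
use std::collections::HashMap;

/// Parse a `~T` map line like `~T!alpha|"beta|` into token -> pattern.
/// Escape sequences (`~~`, `~n`, `~t`, `~s`) are not handled here.
fn parse_token_map(line: &str) -> HashMap<u8, String> {
    let body = line.strip_prefix("~T").unwrap_or(line);
    let mut map = HashMap::new();
    for entry in body.split('|').filter(|e| e.len() >= 2) {
        let bytes = entry.as_bytes();
        // First byte is the token, the remainder is the pattern
        map.insert(bytes[0], String::from_utf8_lossy(&bytes[1..]).into_owned());
    }
    map
}

fn main() {
    let map = parse_token_map("~T!alpha|\"beta|");
    println!("{} {}", map[&b'!'], map[&b'"']); // alpha beta
}
```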
## Extension Benefits
### Why This Design Works
1. **Backward Compatibility**: Level 0 decoders can process files with extensions - they just skip the extended features
2. **Graceful Degradation**: Files remain readable even without extension support
3. **Future-Proof**: 0x7F followed by up to two more bytes gives access to 256×256 = 65,536 possible extension codes
4. **Efficiency**: Common patterns use 1 byte, less common use 3 bytes (0x7F + 2 bytes)
5. **Clean Design**: Unlike x86, our prefixes are designed from the start, not bolted on
### Decoder Compatibility Matrix
| Feature | Level 0 | Level 1 | Level 2 |
|---------|---------|---------|---------|
| Base tokens (1 byte) | ✓ | ✓ | ✓ |
| Skip X-tokens safely | ✓ | ✓ | ✓ |
| Extended pattern tokens | Skip | ✓ | ✓ |
| Semantic markers | Skip | Skip | ✓ |
| Compression modes | Skip | Skip | ✓ |
| Cross-file refs | Skip | Skip | ✓ |
## Compression Examples
### Example 1: Basic Compression (Level 0 Compatible)
```markdown
# Project Documentation
This project implements a fast and efficient compression algorithm
that provides excellent compression ratios for markdown files.
## Features
- Fast compression and decompression
- Excellent compression ratios
- Streaming support for large files
```
### Compressed (Level 0 - Base tokens only)
```
MQ2~65432100~1A3~8F~15~mq~L0
~T\x80 Project|\x81 Documentation|\x82 implements|\x83 compression|\x84 algorithm|\x85 fast|\x86 efficient|\x87 excellent|\x88 ratios|\x89 markdown|\x8A Features|\x8B and|\x8C files|\x8D for|\x8E Streaming|
~~~~
#\x80\x81
This project\x82 a\x85\x8B\x86\x83\x84
that provides\x87\x83\x88\x8D\x89\x8C.
##\x8A
-\x85\x83\x8B decompression
-\x87\x83\x88
-\x8E support\x8D large\x8C
```
### Example 2: Extended Compression (Level 1)
Same content, but with more patterns using extended tokens:
```
MQ2~65432100~1A3~7A~1F~mq~L1
~T\x80 Project|\x81 Documentation|...basic tokens...
~X\x00\x00 compression algorithm|\x00\x01 excellent compression ratios|\x00\x02 Fast compression and decompression|
~~~~
#\x80\x81
This project\x82 a fast and efficient\x7F\x00\x00
that provides\x7F\x00\x01 for markdown files.
##\x8A
-\x7F\x00\x02
-\x7F\x00\x01
- Streaming support for large files
```
### Size Comparison
- Original: 419 bytes
- Level 0: ~280 bytes (33% compression)
- Level 1: ~220 bytes (48% compression)
- Level 2 with zlib: ~150 bytes (64% compression)
## Advanced Features
### 1. Context-Aware Tokens
Different token sets for different content types:
```
~C:code
~T{if(|)}|*return |+void |,const |
~C:markdown
~T### |!## |"- |#**|$*|
```
### 2. Sliding Window Optimization
For very large files (>100KB):
```
~W:1000 // Window size in tokens
~R:5000 // Rotation point
```
### 3. Token Inheritance
Child sections can inherit parent tokens:
```
~P:parent
~T common tokens...
~P:child^parent // Inherits parent tokens
~t additional child tokens...
```
### 4. Metadata Preservation
```
~M:author=TheCheet
~M:version=2.0
~M:encoding=utf-8
```
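Metadata lines are plain key=value pairs behind a `~M:` prefix, so a parser is a one-liner per line. A minimal sketch (function name illustrative, error handling omitted):

```rust
use std::collections::HashMap;

/// Collect `~M:key=value` lines into a map; other lines are ignored.
fn parse_metadata(lines: &[&str]) -> HashMap<String, String> {
    lines
        .iter()
        .filter_map(|l| l.strip_prefix("~M:"))
        .filter_map(|kv| kv.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let meta = parse_metadata(&["~M:author=TheCheet", "~M:version=2.0"]);
    println!("{}", meta["author"]); // TheCheet
}
```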
## Decompression Algorithm
### Level 0 Decoder (Basic - 8-bit only)
```rust
fn decompress_basic(compressed: &[u8]) -> String {
    let (header, content) = parse_header(compressed);
    let token_map = parse_base_tokens(header); // Only 1-byte tokens
    let mut output = String::new();
    let mut i = 0;

    while i < content.len() {
        let byte = content[i];
        if byte == 0x7F {
            // Skip the X-token: extended pattern tokens (type nibble 0x0)
            // take 3 bytes in total, every other extension takes 2
            i += match content.get(i + 1) {
                Some(t) if t >> 4 == 0x0 => 3,
                _ => 2,
            };
            continue;
        }

        if let Some(pattern) = token_map.get(&byte) {
            output.push_str(pattern);
        } else if byte >= 0x20 && byte <= 0x7E {
            output.push(byte as char); // ASCII passthrough
        }
        i += 1;
    }

    output
}
```
### Level 1 Decoder (With Extensions)
```rust
fn decompress_extended(compressed: &[u8]) -> String {
    let (header, content) = parse_header(compressed);
    let base_tokens = parse_base_tokens(header);
    let ext_tokens = parse_extended_tokens(header);
    let mut output = String::new();
    let mut i = 0;

    while i < content.len() {
        let byte = content[i];

        if byte == 0x7F && i + 1 < content.len() {
            let ext_type = content[i + 1];

            match ext_type >> 4 {
                0x0 => {
                    // Extended pattern token
                    if i + 2 < content.len() {
                        let token_id = ((ext_type & 0x0F) as u16) << 8 | content[i + 2] as u16;
                        if let Some(pattern) = ext_tokens.get(&token_id) {
                            output.push_str(pattern);
                        }
                        i += 3;
                    } else {
                        i = content.len(); // truncated token: stop, don't loop forever
                    }
                }
                0x1 => {
                    // Semantic marker - Level 2 feature
                    i += 2; // Skip for Level 1
                }
                _ => i += 2, // Unknown extension, skip
            }
        } else if let Some(pattern) = base_tokens.get(&byte) {
            output.push_str(pattern);
            i += 1;
        } else {
            output.push(byte as char);
            i += 1;
        }
    }

    output
}
```
## Binary Format (.mqb)
### Extended Token Range
- Uses bytes 0x00-0xFF (excluding reserved)
- Reserved: 0x00 (null), 0x0A (LF), 0x0D (CR)
- Available tokens: 253
### Binary Header
```
[4 bytes] Magic: "MQB\x02"
[4 bytes] Timestamp (big-endian)
[4 bytes] Original size
[4 bytes] Compressed size
[2 bytes] Token count
[2 bytes] Header checksum
[Variable] Token map (length-prefixed)
[4 bytes] Content separator: 0xFFFFFFFF
[Variable] Compressed content
```
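The fixed-length prefix of this header is 20 bytes. A serialization sketch, assuming the size and count fields share the big-endian layout stated for the timestamp (the function name is illustrative):

```rust
/// Serialize the fixed-length part of an .mqb header (20 bytes).
/// Big-endian is assumed for all multi-byte fields, matching the timestamp.
fn write_mqb_header(ts: u32, orig: u32, comp: u32, tokens: u16, checksum: u16) -> Vec<u8> {
    let mut out = Vec::with_capacity(20);
    out.extend_from_slice(b"MQB\x02");              // magic
    out.extend_from_slice(&ts.to_be_bytes());       // timestamp
    out.extend_from_slice(&orig.to_be_bytes());     // original size
    out.extend_from_slice(&comp.to_be_bytes());     // compressed size
    out.extend_from_slice(&tokens.to_be_bytes());   // token count
    out.extend_from_slice(&checksum.to_be_bytes()); // header checksum
    out // token map, 0xFFFFFFFF separator, and content follow
}

fn main() {
    let h = write_mqb_header(0x6743A100, 0x1000, 0x800, 0x50, 0);
    println!("{}", h.len()); // 20
}
```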
## Performance Characteristics
### Compression Ratios
- Markdown: 40-60% compression
- Code: 30-50% compression
- JSON/XML: 50-70% compression
- Logs: 60-80% compression
### Speed
- Compression: ~100MB/s
- Decompression: ~500MB/s
- Memory usage: O(token_count + window_size)
## Implementation Notes
### Token Selection Strategy
1. **Length × Frequency**: Prioritize patterns with the highest `(length - 1) × (frequency - 1)`
2. **Boundary Awareness**: Prefer patterns that start/end at word boundaries
3. **Nested Patterns**: Avoid tokens that overlap significantly
4. **Context Clustering**: Group related patterns together in token space
### Memory Efficiency
- Streaming decompression: Read token map, then stream content
- Chunked compression: Process large files in windows
- Token recycling: Reuse tokens for different patterns in different sections
## CLI Usage
```bash
# Compress with .mq format (text-safe)
mq compress file.md -o file.mq
# Compress with .mqb format (binary, max compression)
mq compress file.md -o file.mqb --binary
# Decompress (auto-detects format)
mq decompress file.mq -o file.md
# Analyze compression potential
mq analyze file.md --verbose
# Streaming compression
cat large.md | mq compress --stream > large.mq
# Inspect compressed file
mq inspect file.mq --show-tokens
```
## Integration Examples
### Smart Tree Bundle
```bash
# Bundle entire project into single .mq file
st --mode bundle /project > project.mq
# Extract bundle
mq extract project.mq -o /restored/project/
```
### Git Integration
```bash
# Add .mq differ
git config diff.mq.textconv "mq decompress"
# Track compressed versions
echo "*.mq filter=mq" >> .gitattributes
```
## Future Extensions
### Planned Features
1. **Delta Compression**: Store only changes between versions
2. **Semantic Tokens**: AI-aware token assignment based on meaning
3. **Multi-Language**: Optimized token sets per programming language
4. **Encryption**: Built-in encryption with the token map as part of the key
5. **Distributed Tokens**: Share token maps across related files
### Research Areas
- Machine learning for optimal token assignment
- Quantum-inspired superposition tokens
- Neural compression with learned representations
- Cross-file token sharing for projects
## The X-Token Philosophy
### Learning from x86's Mistakes
The x86 instruction set grew organically, with prefixes added ad hoc over decades:
- Multiple conflicting prefix bytes
- Complex decoder state machines
- Instruction-length nightmares
- Backward compatibility through accidents
### The Marqant Way
We designed the extension mechanism right from the start:
- **Single Gateway**: Only 0x7F opens the extension space
- **Clear Hierarchy**: Extensions organized by function (0x0X = patterns, 0x1X = semantic, etc.)
- **Skip-Safe**: Unknown extensions can be safely skipped
- **Length-Explicit**: X-tokens always consume exactly 2 or 3 bytes total
### Real-World Benefits
```
Basic User: "I just need simple compression"
→ Use Level 0 tools, ignore extensions completely
Power User: "I need maximum compression"
→ Level 1 gives you 4096 extra tokens
AI System: "I need semantic understanding"
→ Level 2 provides rich metadata and cross-references
Future User: "I need feature X that doesn't exist yet"
→ Assign it to 0x7F 0x60-0xFF, older decoders still work!
```
### Implementation Simplicity
```c
// Dead simple Level 0 decoder core: tokens[] maps a byte to its
// pattern string, or NULL for plain passthrough bytes
while (pos < len) {
    unsigned char byte = data[pos++];
    if (byte == 0x7F) {
        // X-tokens are 2-3 bytes; extended patterns (type 0x0N) carry one extra byte
        pos += (pos < len && (data[pos] >> 4) == 0x0) ? 2 : 1;
        continue;
    }
    if (tokens[byte])
        fputs(tokens[byte], stdout); // expand pattern token
    else
        putchar(byte);               // passthrough
}
```
## The Cheet Says...
> "Version 2 is like going from acoustic to electric - same song, but now we can really shred! 🎸 The rotating tokens are like a DJ mixing tracks - always keeping the best beats in rotation!"
> "And that X-token? It's like having a backstage pass - basic fans enjoy the show, but if you've got the pass, you get to see ALL the magic! Unlike x86 which is like a tour bus held together with duct tape, we built a proper tour rig from day one! 🚌→🚁"
---
*Remember: The best compression is the one that preserves meaning while minimizing bits. Marqant v2.0 does both, with style!*