# 📜 Marqant (.mq) Format Specification v2.0 - Rotating Token Edition

> "Marqant isn't a format. It's AI sheet music for memory fields." 🎼
> "Same meaning. Fewer notes. Faster recall."

## Overview

Marqant v2.0 introduces the **Rotating Token System** - a revolutionary approach where token assignments are dynamic and contextual, with tokens rotating based on frequency and locality. This creates an ultra-efficient compression format that's both copy-paste friendly and AI-optimized.

### Key Principles

1. **Token Economy**: Tokens are precious resources (only 256 available in `.mq`, full 8-bit range in `.mqb`)
2. **Frequency-Based Assignment**: A pattern must repeat at least twice to earn a token
3. **Size-Aware Allocation**: Token count scales with file size (larger files = more tokens)
4. **Copy-Paste Friendly**: `.mq` format uses only printable ASCII characters
5. **Binary Mode**: `.mqb` format uses full 8-bit range for maximum compression

## File Formats

### .mq Format (Text-Safe)
- Uses ASCII printable characters (0x20-0x7E)
- Tilde (`~`, 0x7E) as primary control character
- Safe for copying through any text medium
- Ideal for embedding in documentation, chat, emails

### .mqb Format (Binary)
- Uses full 8-bit range (0x00-0xFF)
- Maximum compression efficiency
- Suitable for file storage and transport
- Not safe for text-based transmission

## File Structure

```
Line 1:  MQ2~<timestamp>~<orig_size>~<comp_size>~<token_count>~<format>~<level>
Line 2:  ~T<base_token_map>
Line 3:  [~X<extended_token_map>] (optional, for L1+)
Line 4:  ~~~~
Line 5+: <compressed_content>
```

### Header Fields
- `MQ2`: Format signature (version 2)
- `<timestamp>`: Unix timestamp (hex)
- `<orig_size>`: Original size in bytes (hex)
- `<comp_size>`: Compressed size in bytes (hex)
- `<token_count>`: Number of active tokens (hex)
- `<format>`: `mq` or `mqb`
- `<level>`: Decoder level required (`L0`, `L1`, `L2`)

### Header Examples
```
MQ2~6743A100~1000~800~50~mq~L0    # Basic compression, Level 0 decoder OK
MQ2~6743A100~1000~600~80~mqb~L1   # Uses extended tokens, needs Level 1
MQ2~6743A100~1000~400~A0~mqb~L2   # Full features, needs Level 2
```
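Because Line 1 is plain ASCII, any tool can read the header without touching the token machinery. The following is a minimal, illustrative sketch of parsing that line; the `MqHeader` struct and `parse_mq2_header` function are names invented for this example (they are not part of the spec or the `mq` tool), and a real implementation would also cross-check the sizes against the decoded output.

```rust
/// Illustrative header struct; field meanings follow the table above.
#[derive(Debug)]
struct MqHeader {
    timestamp: u64,   // Unix timestamp (hex in the file)
    orig_size: u64,   // Original size in bytes
    comp_size: u64,   // Compressed size in bytes
    token_count: u32, // Number of active tokens
    format: String,   // "mq" or "mqb"
    level: String,    // "L0", "L1", or "L2"
}

/// Parse Line 1 of a .mq file, e.g. "MQ2~6743A100~1000~800~50~mq~L0".
fn parse_mq2_header(line: &str) -> Option<MqHeader> {
    let parts: Vec<&str> = line.trim_end().split('~').collect();
    // Expect the signature plus six fields.
    if parts.len() != 7 || parts[0] != "MQ2" {
        return None;
    }
    Some(MqHeader {
        timestamp: u64::from_str_radix(parts[1], 16).ok()?,
        orig_size: u64::from_str_radix(parts[2], 16).ok()?,
        comp_size: u64::from_str_radix(parts[3], 16).ok()?,
        token_count: u32::from_str_radix(parts[4], 16).ok()?,
        format: parts[5].to_string(),
        level: parts[6].to_string(),
    })
}

fn main() {
    println!("{:?}", parse_mq2_header("MQ2~6743A100~1000~800~50~mq~L0"));
}
```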
## Token Architecture

### Base Token Space (1 byte)
All tokens are single bytes (0x00-0xFF), with special reservations:

```
0x00-0x1F: Control tokens (newline, tab, etc.)
0x20-0x7E: Direct ASCII passthrough (in .mq format)
0x7F:      Extension token (X-token) - "The Gateway"
0x80-0xFE: Assignable pattern tokens
0xFF:      Reserved for future use
```

### Extension Mechanism (X-token)
The extension token (0x7F) acts as a prefix to access extended features:

```
0x7F 0x00-0x0F: Extended pattern tokens (4096 additional tokens)
0x7F 0x10-0x1F: Semantic markers (code blocks, sections, etc.)
0x7F 0x20-0x2F: Compression modes (zlib, brotli, custom)
0x7F 0x30-0x3F: Metadata operations
0x7F 0x40-0x4F: Cross-file references
0x7F 0x50-0x5F: Delta/diff operations
0x7F 0x60-0xFF: Future extensions
```

### Decoder Levels
1. **Level 0 (Basic)**: Ignores X-tokens, processes only base tokens
2. **Level 1 (Extended)**: Supports extended pattern tokens
3. **Level 2 (Full)**: Supports all extension features

## Token Assignment Algorithm

### 1. Frequency Analysis Phase
```rust
// Scan content for repeated patterns
patterns = find_repeated_patterns(content, min_length=2)

// Filter patterns that appear at least twice
valid_patterns = patterns.filter(|p| p.count >= 2)

// Calculate savings: (pattern_length - 1) * (count - 1)
patterns.sort_by_savings_descending()
```

### 2. Token Allocation
```rust
// Base token allocation (Level 0 compatible)
const RESERVED_TOKENS: u8 = 32;   // 0x00-0x1F
const ASCII_PASSTHROUGH: u8 = 95; // 0x20-0x7E (in .mq)
const X_TOKEN: u8 = 0x7F;
const RESERVED_END: u8 = 0xFF;

// Available base tokens
available_base = 256 - RESERVED_TOKENS - 1 - 1; // -1 for X_TOKEN, -1 for 0xFF
if format == "mq" {
    available_base -= ASCII_PASSTHROUGH;
}

// Extended tokens (optional)
if use_extensions {
    available_extended = 4096; // 0x7F 0x00-0x0F + second byte
}
```

### 3. Smart Assignment Strategy
```rust
// Assign most frequent patterns to single-byte tokens
for (i, pattern) in patterns.iter().enumerate() {
    if i < available_base {
        // Direct single-byte assignment
        assign_base_token(pattern, base_tokens[i]);
    } else if use_extensions && i < available_base + available_extended {
        // Extended token (2 bytes: 0x7F + extended_id)
        assign_extended_token(pattern, i - available_base);
    }
}

// Rotation windows for large documents
if content.length > rotation_threshold {
    // Re-evaluate token assignments per window
    for window in split_into_windows(content) {
        optimize_tokens_for_window(window);
    }
}
```

## Token Map Format

### Compact Notation
```
~T<token><pattern>|<token><pattern>|...
```

### Examples
```
~T counting|! the |" and |# for |$ with |% that |& this |
~T(function|)return|*const |+let |,if |
```

### Special Tokens
- `~~`: Literal tilde
- `~n`: Newline (when needed)
- `~t`: Tab (when needed)
- `~s`: Space (in token definitions only)
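To make the compact notation concrete, here is a minimal sketch of reading a `~T` line into a token map. It assumes entries are separated by `|`, that the first character of each entry is the token and the remainder is its pattern, and that the special tokens above are resolved inside patterns; the function name is illustrative only, and the real tokenizer may differ.

```rust
use std::collections::HashMap;

/// Sketch: parse one "~T<token><pattern>|..." line into token -> pattern.
fn parse_token_map_line(line: &str) -> HashMap<char, String> {
    let mut map = HashMap::new();
    let body = line.strip_prefix("~T").unwrap_or(line);
    for entry in body.split('|') {
        let mut chars = entry.chars();
        if let Some(token) = chars.next() {
            let raw: String = chars.collect();
            // Resolve the special tokens listed above.
            let pattern = raw
                .replace("~~", "~")
                .replace("~n", "\n")
                .replace("~t", "\t")
                .replace("~s", " ");
            if !pattern.is_empty() {
                map.insert(token, pattern);
            }
        }
    }
    map
}

fn main() {
    let map = parse_token_map_line("~T(function|)return|*const |+let |,if |");
    assert_eq!(map.get(&'('), Some(&"function".to_string()));
    println!("parsed {} tokens", map.len());
}
```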
## Extension Benefits

### Why This Design Works
1. **Backward Compatibility**: Level 0 decoders can process files with extensions - they just skip the extended features
2. **Graceful Degradation**: Files remain readable even without extension support
3. **Future-Proof**: 0x7F gives us access to 256×256 = 65,536 possible extensions
4. **Efficiency**: Common patterns use 1 byte, less common use 3 bytes (0x7F + 2 bytes)
5. **Clean Design**: Unlike x86, our prefixes are designed from the start, not bolted on

### Decoder Compatibility Matrix
| Feature | Level 0 | Level 1 | Level 2 |
|---------|---------|---------|---------|
| Base tokens (1 byte) | ✓ | ✓ | ✓ |
| Skip X-tokens safely | ✓ | ✓ | ✓ |
| Extended patterns | Skip | ✓ | ✓ |
| Semantic markers | Skip | Skip | ✓ |
| Compression modes | Skip | Skip | ✓ |
| Cross-file refs | Skip | Skip | ✓ |

## Compression Examples

### Example 1: Basic Compression (Level 0 Compatible)
```markdown
# Project Documentation

This project implements a fast and efficient compression algorithm that provides excellent compression ratios for markdown files.

## Features
- Fast compression and decompression
- Excellent compression ratios
- Streaming support for large files
```

### Compressed (Level 0 - Base tokens only)
```
MQ2~65432100~1A3~8F~15~mq~L0
~T\x80 Project|\x81 Documentation|\x82 implements|\x83 compression|\x84 algorithm|\x85 fast|\x86 efficient|\x87 excellent|\x88 ratios|\x89 markdown|\x8A Features|\x8B and|\x8C files|\x8D for|\x8E Streaming|
~~~~
#\x80\x81 This project\x82 a\x85\x8B\x86\x83\x84 that provides\x87\x83\x88\x8D\x89\x8C. ##\x8A -\x85\x83\x8B decompression -\x87\x83\x88 -\x8E support\x8D large\x8C
```

### Example 2: Extended Compression (Level 1)
Same content, but with more patterns using extended tokens:

```
MQ2~65432100~1A3~7A~1F~mq~L1
~T\x80 Project|\x81 Documentation|...basic tokens...
~X\x00\x00 compression algorithm|\x00\x01 excellent compression ratios|\x00\x02 Fast compression and decompression|
~~~~
#\x80\x81 This project implements a fast and efficient\x7F\x00\x00 that provides\x7F\x00\x01 for markdown files. ##\x8A -\x7F\x00\x02 -\x7F\x00\x01 - Streaming support for large files
```

### Size Comparison
- Original: 419 bytes
- Level 0: ~280 bytes (33% compression)
- Level 1: ~220 bytes (48% compression)
- Level 2 with zlib: ~150 bytes (64% compression)

## Advanced Features

### 1. Context-Aware Tokens
Different token sets for different content types:
```
~C:code ~T{if(|)}|*return |+void |,const |
~C:markdown ~T### |!## |"- |#**|$*|
```

### 2. Sliding Window Optimization
For very large files (>100KB):
```
~W:1000 // Window size in tokens
~R:5000 // Rotation point
```

### 3. Token Inheritance
Child sections can inherit parent tokens:
```
~P:parent ~T common tokens...
~P:child^parent // Inherits parent tokens
~t additional child tokens...
```

### 4. Metadata Preservation
```
~M:author=TheCheet
~M:version=2.0
~M:encoding=utf-8
```

## Decompression Algorithm

### Level 0 Decoder (Basic - 8-bit only)
```rust
fn decompress_basic(compressed: &[u8]) -> String {
    let (header, content) = parse_header(compressed);
    let token_map = parse_base_tokens(header); // Only 1-byte tokens
    let mut output = String::new();
    let mut i = 0;

    while i < content.len() {
        let byte = content[i];
        if byte == 0x7F {
            // Skip X-token and its parameter
            i += 2;
            continue;
        }

        if let Some(pattern) = token_map.get(&byte) {
            output.push_str(pattern);
        } else if byte >= 0x20 && byte <= 0x7E {
            output.push(byte as char); // ASCII passthrough
        }
        i += 1;
    }

    output
}
```

### Level 1 Decoder (With Extensions)
```rust
fn decompress_extended(compressed: &[u8]) -> String {
    let (header, content) = parse_header(compressed);
    let base_tokens = parse_base_tokens(header);
    let ext_tokens = parse_extended_tokens(header);
    let mut output = String::new();
    let mut i = 0;

    while i < content.len() {
        let byte = content[i];

        if byte == 0x7F && i + 1 < content.len() {
            let ext_type = content[i + 1];

            match ext_type >> 4 {
                0x0 => {
                    // Extended pattern token
                    if i + 2 < content.len() {
                        let token_id = ((ext_type & 0x0F) as u16) << 8 | content[i + 2] as u16;
                        if let Some(pattern) = ext_tokens.get(&token_id) {
                            output.push_str(pattern);
                        }
                    }
                    // Always advance past the 3-byte X-token so a truncated
                    // stream cannot stall the loop
                    i += 3;
                }
                0x1 => {
                    // Semantic marker - Level 2 feature
                    i += 2; // Skip for Level 1
                }
                _ => i += 2, // Unknown extension, skip
            }
        } else if let Some(pattern) = base_tokens.get(&byte) {
            output.push_str(pattern);
            i += 1;
        } else {
            output.push(byte as char);
            i += 1;
        }
    }

    output
}
```

## Binary Format (.mqb)

### Extended Token Range
- Uses bytes 0x00-0xFF (excluding reserved)
- Reserved: 0x00 (null), 0x0A (LF), 0x0D (CR)
- Available tokens: 253

### Binary Header
```
[4 bytes]  Magic: "MQB\x02"
[4 bytes]  Timestamp (big-endian)
[4 bytes]  Original size
[4 bytes]  Compressed size
[2 bytes]  Token count
[2 bytes]  Header checksum
[Variable] Token map (length-prefixed)
[4 bytes]  Content separator: 0xFFFFFFFF
[Variable] Compressed content
```
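For the binary variant, the fixed-width fields can be read directly before the length-prefixed token map. The sketch below follows the layout above but makes two assumptions the spec leaves open: all multi-byte integers are big-endian (stated explicitly only for the timestamp), and the checksum is not verified here. The function name is illustrative.

```rust
/// Sketch: read the fixed-size .mqb header fields (20 bytes before the token map).
fn read_mqb_header(data: &[u8]) -> Option<(u32, u32, u32, u16, u16)> {
    if data.len() < 20 || &data[0..4] != b"MQB\x02" {
        return None; // wrong magic or truncated header
    }
    let be32 = |b: &[u8]| u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
    let be16 = |b: &[u8]| u16::from_be_bytes([b[0], b[1]]);
    let timestamp = be32(&data[4..8]);
    let orig_size = be32(&data[8..12]);
    let comp_size = be32(&data[12..16]);
    let token_count = be16(&data[16..18]);
    let checksum = be16(&data[18..20]);
    Some((timestamp, orig_size, comp_size, token_count, checksum))
}

fn main() {
    let mut sample = b"MQB\x02".to_vec();
    sample.extend_from_slice(&[0u8; 16]); // zeroed fields, just for the demo
    println!("{:?}", read_mqb_header(&sample));
}
```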
## Performance Characteristics

### Compression Ratios
- Markdown: 40-60% compression
- Code: 30-50% compression
- JSON/XML: 50-70% compression
- Logs: 60-80% compression

### Speed
- Compression: ~100MB/s
- Decompression: ~500MB/s
- Memory usage: O(token_count + window_size)

## Implementation Notes

### Token Selection Strategy
1. **Length × Frequency**: Prioritize patterns with highest `(length - 1) × (frequency - 1)` (see the sketch after this list)
2. **Boundary Awareness**: Prefer patterns that start/end at word boundaries
3. **Nested Patterns**: Avoid tokens that overlap significantly
4. **Context Clustering**: Group related patterns together in token space
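The savings score from point 1 can be computed directly. The sketch below counts fixed-length substrings and ranks them by `(length - 1) × (frequency - 1)`; the `Candidate` type, the single fixed pattern length, and the lack of boundary or overlap handling are simplifications for illustration, not the reference tokenizer.

```rust
use std::collections::HashMap;

#[derive(Debug)]
struct Candidate {
    pattern: String,
    frequency: usize,
    savings: usize,
}

/// Sketch: rank all substrings of length `len` by (length - 1) * (frequency - 1).
fn rank_candidates(content: &str, len: usize) -> Vec<Candidate> {
    let bytes = content.as_bytes();
    let mut counts: HashMap<&[u8], usize> = HashMap::new();
    for window in bytes.windows(len) {
        *counts.entry(window).or_insert(0) += 1;
    }
    let mut candidates: Vec<Candidate> = counts
        .into_iter()
        .filter(|&(_, freq)| freq >= 2) // a pattern must repeat at least twice
        .map(|(pat, freq)| Candidate {
            pattern: String::from_utf8_lossy(pat).into_owned(),
            frequency: freq,
            savings: (len - 1) * (freq - 1),
        })
        .collect();
    candidates.sort_by(|a, b| b.savings.cmp(&a.savings));
    candidates
}

fn main() {
    for c in rank_candidates("the cat and the dog and the bird", 4).iter().take(3) {
        println!("{:?}", c);
    }
}
```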
### Memory Efficiency
- Streaming decompression: Read token map, then stream content
- Chunked compression: Process large files in windows
- Token recycling: Reuse tokens for different patterns in different sections

## CLI Usage
```bash
# Compress with .mq format (text-safe)
mq compress file.md -o file.mq

# Compress with .mqb format (binary, max compression)
mq compress file.md -o file.mqb --binary

# Decompress (auto-detects format)
mq decompress file.mq -o file.md

# Analyze compression potential
mq analyze file.md --verbose

# Streaming compression
cat large.md | mq compress --stream > large.mq

# Inspect compressed file
mq inspect file.mq --show-tokens
```

## Integration Examples

### Smart Tree Bundle
```bash
# Bundle entire project into single .mq file
st --mode bundle /project > project.mq

# Extract bundle
mq extract project.mq -o /restored/project/
```

### Git Integration
```bash
# Add .mq differ
git config diff.mq.textconv "mq decompress"

# Track compressed versions
echo "*.mq filter=mq" >> .gitattributes
```

## Future Extensions

### Planned Features
1. **Delta Compression**: Store only changes between versions
2. **Semantic Tokens**: AI-aware token assignment based on meaning
3. **Multi-Language**: Optimized token sets per programming language
4. **Encryption**: Built-in encryption with token map as part of key
5. **Distributed Tokens**: Share token maps across related files

### Research Areas
- Machine learning for optimal token assignment
- Quantum-inspired superposition tokens
- Neural compression with learned representations
- Cross-file token sharing for projects

## The X-Token Philosophy

### Learning from x86's Mistakes
The x86 instruction set grew organically, with prefixes added ad-hoc over decades:
- Multiple conflicting prefix bytes
- Complex decoder state machines
- Instruction length nightmares
- Backward compatibility through accidents

### The Marqant Way
We designed the extension mechanism right from the start:
- **Single Gateway**: Only 0x7F opens the extension space
- **Clear Hierarchy**: Extensions organized by function (0x0X = patterns, 0x1X = semantic, etc.)
- **Skip-Safe**: Unknown extensions can be safely skipped
- **Length-Explicit**: X-tokens always consume exactly 2 or 3 bytes total

### Real-World Benefits
```
Basic User: "I just need simple compression"
→ Use Level 0 tools, ignore extensions completely

Power User: "I need maximum compression"
→ Level 1 gives you 4096 extra tokens

AI System: "I need semantic understanding"
→ Level 2 provides rich metadata and cross-references

Future User: "I need feature X that doesn't exist yet"
→ Assign it to 0x7F 0x60-0xFF, older decoders still work!
```

### Implementation Simplicity
```c
// Dead simple Level 0 decoder core
while (pos < len) {
    byte = data[pos++];
    if (byte == 0x7F) {
        pos += 2; // Skip extension
        continue;
    }
    output(tokens[byte] || byte);
}
```

## The Cheet Says...

> "Version 2 is like going from acoustic to electric - same song, but now we can really shred! 🎸 The rotating tokens are like a DJ mixing tracks - always keeping the best beats in rotation!"

> "And that X-token? It's like having a backstage pass - basic fans enjoy the show, but if you've got the pass, you get to see ALL the magic! Unlike x86 which is like a tour bus held together with duct tape, we built a proper tour rig from day one! 🚌→🚁"

---

*Remember: The best compression is the one that preserves meaning while minimizing bits. Marqant v2.0 does both, with style!*
