Skip to main content
Glama
RAG_PERSONA_ARCHITECTURE.mdβ€’4.87 kB
# 🧠 RAG 기반 μ§„μ§œ μ „λ¬Έκ°€ μ‹œμŠ€ν…œ (v3.1.0) **핡심**: 페λ₯΄μ†Œλ‚˜ μ—­λŸ‰ 상세 λ¬Έμ„œν™” β†’ RAG둜 동적 검색 β†’ λ¬΄ν•œν•œ μ „λ¬Έμ„± --- ## 🎯 μ•„ν‚€ν…μ²˜ ### κΈ°μ‘΄ vs RAG **κΈ°μ‘΄ (v3.0.0)**: ``` 410-llm-engineer.txt (15KB) β†’ Context 전체 λ‘œλ“œ β†’ ν•œκ³„: μ œν•œλœ 깊이, 높은 토큰 λΉ„μš© ``` **RAG (v3.1.0)**: ``` 410-llm-engineer.txt (1KB λ©”νƒ€λ°μ΄ν„°λ§Œ) knowledge-base/410-llm-engineer/ (100MB+) β”œβ”€β”€ core-competencies/ (200 pages) β”œβ”€β”€ case-studies/ (100+ cases) β”œβ”€β”€ code-examples/ (500+ examples) └── research-papers/ (200+ summaries) β†’ 질문 λ°œμƒ μ‹œ RAG 검색 β†’ κ΄€λ ¨ 3-5 청크만 (3KB) β†’ 98% 토큰 절감 + λ¬΄ν•œν•œ 깊이 ``` --- ## πŸ—οΈ 디렉토리 ꡬ쑰 ``` persona-mcp/ β”œβ”€β”€ community/ β”‚ └── 410-llm-engineer.txt (κ²½λŸ‰ 메타데이터) β”‚ β”œβ”€β”€ knowledge-base/ β”‚ └── 410-llm-engineer/ β”‚ β”œβ”€β”€ core-competencies/ β”‚ β”‚ β”œβ”€β”€ transformer-architectures.md (50 pages) β”‚ β”‚ β”œβ”€β”€ prompt-engineering.md (80 pages) β”‚ β”‚ β”œβ”€β”€ model-optimization.md (40 pages) β”‚ β”‚ └── deployment-strategies.md (35 pages) β”‚ β”œβ”€β”€ case-studies/ β”‚ β”‚ β”œβ”€β”€ gpt4-deployment.md β”‚ β”‚ β”œβ”€β”€ llama-fine-tuning.md β”‚ β”‚ └── 100+ more... β”‚ β”œβ”€β”€ code-examples/ β”‚ β”‚ β”œβ”€β”€ quantization.py β”‚ β”‚ β”œβ”€β”€ flash-attention.py β”‚ β”‚ └── 500+ more... β”‚ └── research-papers/ β”‚ └── 200+ summaries β”‚ └── src/rag/ β”œβ”€β”€ embeddings.ts (Voyage AI) β”œβ”€β”€ vectorStore.ts (ChromaDB) └── retrieval.ts (Semantic Search + Rerank) ``` --- ## πŸ”„ μ›Œν¬ν”Œλ‘œμš° ### 1. μ˜€ν”„λΌμΈ 인덱싱 (μ„œλ²„ μ‹œμž‘ μ‹œ) ```typescript // λͺ¨λ“  지식 λ¬Έμ„œ β†’ μ²­ν‚Ή β†’ μž„λ² λ”© β†’ 벑터 DB knowledge-base/410-llm-engineer/ (500 files) β†’ μ²­ν‚Ή (1000 tokens/chunk, 200 overlap) β†’ Voyage AI μž„λ² λ”© (1024-dim) β†’ ChromaDB μ €μž₯ (50,000 chunks) ``` ### 2. μ‹€μ‹œκ°„ 검색 (μ‚¬μš©μž 질문 μ‹œ) ```typescript User: "How to optimize 70B model inference?" β†’ Query μž„λ² λ”© β†’ 벑터 검색 (Top 20) β†’ Cohere Rerank (Top 5) β†’ Context ꡬ성 (3KB) β†’ LLM 응닡 ``` --- ## πŸ’Ύ 기술 μŠ€νƒ | μ»΄ν¬λ„ŒνŠΈ | 선택 | 이유 | |---------|------|------| | **μž„λ² λ”©** | Voyage AI | MTEB #1 μ„±λŠ₯ | | **벑터 DB** | ChromaDB | κ°„νŽΈ, 둜컬 | | **μž¬μˆœμœ„ν™”** | Cohere Rerank | 졜고 정확도 | | **μ²­ν‚Ή** | LangChain | 검증됨 | --- ## πŸ“Š μ„±λŠ₯ | μ§€ν‘œ | 전체 λ‘œλ“œ | RAG | |------|-----------|-----| | Context | 150K tokens | 3K tokens | | 정확도 | 85% | **92%** | | λΉ„μš© | $0.45/req | **$0.009/req** | | 깊이 | 15KB | **100MB+** | **효과**: 98% 토큰 절감 + 7% 정확도 ν–₯상 + λ¬΄ν•œν•œ μ „λ¬Έμ„± --- ## πŸ› οΈ κ΅¬ν˜„ μ˜ˆμ‹œ ### embeddings.ts ```typescript import { VoyageAIClient } from '@voyageai/client'; export class EmbeddingService { async embedDocument(personaId: string, filePath: string) { const content = await fs.readFile(filePath, 'utf-8'); const chunks = await this.splitter.splitText(content); const embeddings = await this.voyageClient.embed({ input: chunks }); return chunks.map((chunk, i) => ({ id: `${personaId}-${i}`, vector: embeddings.data[i].embedding, content: chunk })); } } ``` ### retrieval.ts ```typescript export class RetrievalService { async retrieve(personaId: string, query: string) { // 1. Query μž„λ² λ”© const queryVector = await this.embeddings.embedQuery(query); // 2. 벑터 검색 (Top 20) const candidates = await this.vectorStore.search(queryVector, personaId, 20); // 3. Cohere Rerank (Top 5) const reranked = await this.cohere.rerank({ query, documents: candidates.map(c => c.content), topN: 5 }); return reranked.results; } } ``` --- ## πŸš€ μ‹€ν–‰ κ³„νš ### Week 1-2: RAG 인프라 - [ ] ChromaDB + Voyage AI 연동 - [ ] μž„λ² λ”©/검색 νŒŒμ΄ν”„λΌμΈ - [ ] 410-llm-engineer 지식 베이슀 μž‘μ„± (200 pages) ### Week 3-4: 톡합 - [ ] MCP μ„œλ²„ RAG 톡합 - [ ] 10개 핡심 페λ₯΄μ†Œλ‚˜ 지식 베이슀 - [ ] μ„±λŠ₯ 벀치마크 ### Week 5-8: ν™•μž₯ - [ ] 142개 페λ₯΄μ†Œλ‚˜ 전체 지식 베이슀 - [ ] ν”„λ‘œλ•μ…˜ μ΅œμ ν™” - [ ] v3.1.0 Release --- ## πŸ’‘ 핡심 μž₯점 1. **λ¬΄ν•œν•œ 깊이**: 100MB+ 지식 vs 15KB μ œν•œ 2. **항상 μ΅œμ‹ **: 지식 베이슀 μ—…λ°μ΄νŠΈ μ¦‰μ‹œ 반영 3. **λΉ„μš© 효율**: 98% 토큰 절감 4. **높은 정확도**: κ΄€λ ¨ μ •λ³΄λ§Œ 제곡 (λ…Έμ΄μ¦ˆ 제거) 5. **ν™•μž₯μ„±**: 142 personas Γ— 100MB = 14.2GB (관리 κ°€λŠ₯) --- **μƒνƒœ**: βœ… μ•„ν‚€ν…μ²˜ 섀계 μ™„λ£Œ **λ‹€μŒ**: RAG 인프라 ꡬ좕 + 첫 지식 베이슀 μž‘μ„±

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/seanshin0214/persona-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server