batch_ingest_embeddings
Processes content files for batch embedding generation. Analyzes the input file format, extracts the text content, and writes a JSONL file ready for embedding creation with the specified task type.
Instructions
EMBEDDINGS CONTENT INGESTION - Specialized ingestion for embeddings batch processing. WORKFLOW: 1) Analyzes content structure, 2) Extracts text for embedding, 3) Formats as JSONL with proper embedContent structure including task_type, 4) Validates format. OPTIMIZED FOR: Text extraction from various formats (CSV columns, JSON fields, TXT lines, MD sections). RETURNS: JSONL file ready for batch_create_embeddings with task_type embedded in each request.
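The formatting and validation steps of the workflow above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the helper names `build_embedding_requests` and `validate_jsonl` are hypothetical, and the per-line record layout (`key`, `request`, `content.parts[].text`, `task_type`) is an assumption about what "embedContent structure" means here.

```python
import json

# Task types listed in the tool's input schema.
VALID_TASK_TYPES = {
    "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION",
    "CLUSTERING", "RETRIEVAL_QUERY", "CODE_RETRIEVAL_QUERY",
    "QUESTION_ANSWERING", "FACT_VERIFICATION",
}


def build_embedding_requests(texts, task_type):
    """Turn extracted text snippets into JSONL lines (assumed layout)."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"Unknown task type: {task_type}")
    lines = []
    for index, text in enumerate(texts, start=1):
        text = text.strip()
        if not text:
            continue  # skip blank entries; keys keep the source position
        record = {
            "key": f"request-{index}",  # hypothetical key naming scheme
            "request": {
                "content": {"parts": [{"text": text}]},
                "task_type": task_type,  # embedded in every request
            },
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def validate_jsonl(lines):
    """Check each line parses as JSON and carries a known task type."""
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)
        if record["request"].get("task_type") not in VALID_TASK_TYPES:
            raise ValueError(f"Line {number} has no valid task_type")
    return len(lines)
```

Writing the result to disk is then a matter of joining the lines with newlines, for example `open(path, "w", encoding="utf-8").write("\n".join(lines) + "\n")`.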
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| inputFile | Yes | Path to content file | |
| outputFile | No | Optional output JSONL path | |
| textField | No | For CSV/JSON: field name containing text to embed (auto-detected if not provided) | |
| taskType | Yes | Embedding task type (RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, RETRIEVAL_QUERY, CODE_RETRIEVAL_QUERY, QUESTION_ANSWERING, FACT_VERIFICATION). Use batch_query_task_type if unsure. | |
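As an illustration of the schema above, a set of arguments might look like the dictionary below. The file paths and the `textField` value are made-up examples, and `check_arguments` is a hypothetical client-side helper for catching missing required fields before calling the tool; it is not part of the tool itself.

```python
# Parameters marked "Yes" in the Required column of the input schema.
REQUIRED_FIELDS = ("inputFile", "taskType")

# All parameter names from the input schema.
KNOWN_FIELDS = ("inputFile", "outputFile", "textField", "taskType")


def check_arguments(arguments):
    """Return a list of problems found in a tool-call argument dict."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not arguments.get(name):
            problems.append(f"missing required field: {name}")
    for name in arguments:
        if name not in KNOWN_FIELDS:
            problems.append(f"unknown field: {name}")
    return problems


# Example arguments (paths and field name are illustrative only).
example_arguments = {
    "inputFile": "data/articles.csv",
    "outputFile": "data/articles_embeddings.jsonl",
    "textField": "body",
    "taskType": "RETRIEVAL_DOCUMENT",
}

problems = check_arguments(example_arguments)
```

Leaving out `textField` is valid according to the table, since the field is auto-detected when not provided.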