{
"cells": [
{
"cell_type": "markdown",
"id": "c65bdb85",
"metadata": {},
"source": [
"# Distilling Knowledge into Tiny LLMs\n",
"\n",
"Large Language Models (LLMs) are the magic behind AI. These massive billion and trillion parameter models have been shown to generalize well when trained on enough data.\n",
"\n",
"A big problem is that they are hard to run and expensive. So many just call LLMs through APIs such as OpenAI or Claude. Additionally, in many instances, developers spend a lot of time with complex prompt logic hoping to cover all the edge cases and believe they need a model that's large enough to handle all the rules.\n",
"\n",
"If you truly want control over your business processes, running a local model is a better choice. And the good news is that it doesn't have to be a giant and expensive multi-billion parameter model. We can finetune LLMs to handle our specific business logic, which helps us take control and limit prompt complexity. \n",
"\n",
"This notebook will show how we can distill knowledge into tiny LLMs."
]
},
{
"cell_type": "markdown",
"id": "279ecd4b",
"metadata": {},
"source": [
"# Install dependencies\n",
"\n",
"Install `txtai` and all dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f7bca3a",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"!pip install git+https://github.com/neuml/txtai#egg=txtai[pipeline-train] datasets"
]
},
{
"cell_type": "markdown",
"id": "c7b1b4be",
"metadata": {},
"source": [
"# The LLM\n",
"\n",
"We'll use a [600M parameter Qwen3 model](https://hf.co/qwen/qwen3-0.6b) for this example. Our target task will be translating user requests into linux commands."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e1d95dd8",
"metadata": {},
"outputs": [],
"source": [
"from txtai import LLM\n",
"\n",
"llm = LLM(\"Qwen/Qwen3-0.6B\")"
]
},
{
"cell_type": "markdown",
"id": "8f276cfc",
"metadata": {},
"source": [
"Let's try one with the base model as it is."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "686c3983",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ps -e'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm(\"\"\"\n",
"Translate the following request into a linux command. Only print the command.\n",
"\n",
"Find number of logged in users\n",
"\"\"\", maxlength=1024)"
]
},
{
"cell_type": "markdown",
"id": "dcd92414",
"metadata": {},
"source": [
"As we can see, the model actually has a good understanding and at least prints a command. But in this case it's not correct. Let's get to fine-tuning!"
]
},
{
"cell_type": "markdown",
"id": "73666982",
"metadata": {},
"source": [
"# Finetuning the LLM with knowledge\n",
"\n",
"Yes, 600M parameters is small and we can't possibly expect it to do well with everything. But the good news is that we can distill knowledge into this tiny LLM and make it better. We'll use this [linux commands dataset](https://huggingface.co/datasets/mecha-org/linux-command-dataset) from the Hugging Face Hub. We'll also use this [training pipeline from txtai](https://neuml.github.io/txtai/pipeline/train/trainer).\n",
"\n",
"First, we'll create the training dataset. We'll use the same prompt strategy from above.\n",
"\n",
"```python\n",
"\"\"\"\n",
"Translate the following request into a linux command. Only print the command.\n",
"\n",
"{user request}\n",
"\"\"\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "43d6b563",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"from transformers import AutoTokenizer\n",
"\n",
"# LLM path\n",
"path = \"Qwen/Qwen3-0.6B\"\n",
"tokenizer = AutoTokenizer.from_pretrained(path)\n",
"\n",
"# Load the training dataset\n",
"dataset = load_dataset(\"mecha-org/linux-command-dataset\", split=\"train\")\n",
"\n",
"def prompt(row):\n",
" text = tokenizer.apply_chat_template([\n",
" {\"role\": \"system\", \"content\": \"Translate the following request into a linux command. Only print the command.\"},\n",
" {\"role\": \"user\", \"content\": row[\"input\"]},\n",
" {\"role\": \"assistant\", \"content\": row[\"output\"]}\n",
" ], tokenize=False, enable_thinking=False)\n",
"\n",
" return {\"text\": text}\n",
"\n",
"# Map to training prompts\n",
"train = dataset.map(prompt, remove_columns=[\"input\", \"output\"])"
]
},
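{
"cell_type": "markdown",
"id": "3f2a9c1e",
"metadata": {},
"source": [
"As a quick sanity check (an extra step, not required for training), we can print the first formatted example to confirm the chat template rendered the system, user and assistant turns as expected."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b7d0e2a",
"metadata": {},
"outputs": [],
"source": [
"# Inspect the first formatted training prompt\n",
"print(train[0][\"text\"])"
]
},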
{
"cell_type": "code",
"execution_count": 2,
"id": "7f71dce0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <div>\n",
" \n",
" <progress value='210' max='210' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
" [210/210 01:12, Epoch 1/1]\n",
" </div>\n",
" <table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>Step</th>\n",
" <th>Training Loss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>50</td>\n",
" <td>0.625300</td>\n",
" </tr>\n",
" <tr>\n",
" <td>100</td>\n",
" <td>0.490200</td>\n",
" </tr>\n",
" <tr>\n",
" <td>150</td>\n",
" <td>0.403300</td>\n",
" </tr>\n",
" <tr>\n",
" <td>200</td>\n",
" <td>0.391800</td>\n",
" </tr>\n",
" </tbody>\n",
"</table><p>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from txtai.pipeline import HFTrainer\n",
"\n",
"# Load the training pipeline\n",
"trainer = HFTrainer()\n",
"\n",
"# Train the model\n",
"# Set output_dir to save, trained in memory for this example\n",
"model = trainer(\n",
" \"Qwen/Qwen3-0.6B\",\n",
" train,\n",
" task=\"language-generation\",\n",
" maxlength=512,\n",
" bf16=True,\n",
" per_device_train_batch_size=4,\n",
" num_train_epochs=1,\n",
" logging_steps=50,\n",
")"
]
},
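{
"cell_type": "markdown",
"id": "9e4c6a1b",
"metadata": {},
"source": [
"As the comment above notes, passing `output_dir` saves the model during training. For completeness, here is a minimal sketch of saving the in-memory result afterwards. It assumes `HFTrainer` returned a `(model, tokenizer)` tuple and that both objects support the standard Hugging Face `save_pretrained` method; the output path is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c8f3d7e",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: persist the trained model and tokenizer to disk\n",
"# Assumes model is a (model, tokenizer) tuple; the output path is hypothetical\n",
"trained, trainedtokenizer = model\n",
"trained.save_pretrained(\"qwen3-linux-commands\")\n",
"trainedtokenizer.save_pretrained(\"qwen3-linux-commands\")"
]
},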
{
"cell_type": "code",
"execution_count": null,
"id": "68b5bc4e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'who | wc -l'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from txtai import LLM\n",
"\n",
"llm = LLM(model)\n",
"\n",
"llm([\n",
" {\"role\": \"system\", \"content\": \"Translate the following request into a linux command. Only print the command.\"},\n",
" {\"role\": \"user\", \"content\": \"Find number of logged in users\"}\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7427186d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ls ~/'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm([\n",
" {\"role\": \"system\", \"content\": \"Translate the following request into a linux command. Only print the command.\"},\n",
" {\"role\": \"user\", \"content\": \"List the files in my home directory\"}\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a141d63b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'zip -r data.zip data'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm([\n",
" {\"role\": \"system\", \"content\": \"Translate the following request into a linux command. Only print the command.\"},\n",
" {\"role\": \"user\", \"content\": \"Zip the data directory with all it's contents\"}\n",
"])"
]
},
{
"cell_type": "markdown",
"id": "fc7d0fda",
"metadata": {},
"source": [
"It even works well without the system prompt."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "a7d0c671",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'du -sh ~'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm(\"Calculate the total amount of disk space used for my home directory. Only print the total.\")"
]
},
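{
"cell_type": "markdown",
"id": "7a2e5c9d",
"metadata": {},
"source": [
"Since we repeat the same system prompt for every request, we could wrap it in a small helper. The `to_command` function below is a hypothetical convenience wrapper for this notebook, not part of `txtai`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d6b8f0a",
"metadata": {},
"outputs": [],
"source": [
"def to_command(request):\n",
"    # Hypothetical helper that wraps the system prompt used throughout this notebook\n",
"    return llm([\n",
"        {\"role\": \"system\", \"content\": \"Translate the following request into a linux command. Only print the command.\"},\n",
"        {\"role\": \"user\", \"content\": request}\n",
"    ])\n",
"\n",
"to_command(\"Show the 10 largest files in the current directory\")"
]
},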
{
"cell_type": "markdown",
"id": "48cfcacf",
"metadata": {},
"source": [
"# Wrapping up\n",
"\n",
"This notebook demonstrated how it's very straightforward to distill knowledge into LLMs with `txtai`. Don't always go for the giant LLM, spend a little time finetuning a tiny LLM, it is well worth it!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "local",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.19"
}
},
"nbformat": 4,
"nbformat_minor": 5
}