# 03 — Scrape LLM Wrapper Audit (`scrape_pages`)

## Metadata

| Field | Value |
|---|---|
| Canonical tool | `scrape_pages` |
| Legacy alias | `scrape_links` -> `scrape_pages` |
| Source | `src/services/llm-processor.ts` (`processContentWithLLM`) |
| Trigger | Per URL when `use_llm=true` |
| Role | Wraps extraction instruction + page content before LLM call |

## Current Wrapper Text (Verbatim)

**Branch A (`what_to_extract` provided):**

```text
Extract and clean the following content. Focus on: {config.what_to_extract}

Content:
{truncatedContent}
```

**Branch B (`what_to_extract` missing):**

```text
Clean and extract the main content from the following text, removing navigation, ads, and irrelevant elements:

{truncatedContent}
```
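
As a reading aid, the two branches above can be reconstructed as a single function. This is a minimal sketch, not the code in `src/services/llm-processor.ts`: the name `buildCurrentWrapper`, the `WrapperConfig` type, and the choice to treat an empty string as "missing" are assumptions made for illustration.

```typescript
// Hypothetical reconstruction of the current wrapper; the real
// processContentWithLLM may be structured differently.
interface WrapperConfig {
  what_to_extract?: string;
}

function buildCurrentWrapper(config: WrapperConfig, truncatedContent: string): string {
  if (config.what_to_extract) {
    // Branch A: caller supplied extraction targets.
    return (
      `Extract and clean the following content. Focus on: ${config.what_to_extract}` +
      `\n\nContent:\n${truncatedContent}`
    );
  }
  // Branch B: no targets (assumption: undefined or empty string both land here).
  return (
    "Clean and extract the main content from the following text, " +
    "removing navigation, ads, and irrelevant elements:" +
    `\n\n${truncatedContent}`
  );
}
```

Note that only Branch A carries a `Content:` delimiter; Branch B separates instruction from page text with a blank line alone.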

## Context in Prompt Assembly

1. `src/tools/scrape.ts` builds `enhancedInstruction` (prefix + user target + suffix).
2. `processContentWithLLM` wraps that instruction in the frame shown above.
3. The final request is currently user-role only, so no constraints are sent in a system role.

## Criticism Table (12)

| # | Criticism | Impact |
|---:|---|---|
| 1 | No system-role message for fixed rules | Lower instruction priority |
| 2 | Wrapper repeats "extract/clean" semantics already in prefix | Token waste |
| 3 | Branches A and B have different policy surfaces | Inconsistent outputs |
| 4 | `Focus on:` is vague boundary text | Weaker controllability |
| 5 | No explicit anti-preamble rule here | Fluff drift possible |
| 6 | No explicit output-shape contract | Parsing variability |
| 7 | Generic `Content:` delimiter only | Instruction/source bleed risk |
| 8 | Branch B bypasses enriched extraction targets | Lower relevance |
| 9 | Fixed overhead repeats per URL | Cost scales linearly with URL count |
| 10 | Wrapper language is verbose relative to control value | Poor density |
| 11 | No confidence/uncertainty output guidance | Inconsistent certainty handling |
| 12 | Not normalized with `deep_research` system+user split style | Cross-tool inconsistency |
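
To make criticism 9 concrete, the sketch below counts the fixed characters Branch A adds per URL and multiplies by the batch size. It is a rough illustration only: it measures characters rather than tokens, it ignores the prefix and suffix added in `src/tools/scrape.ts`, and `fixedOverheadChars` is a hypothetical helper name.

```typescript
// Fixed text of Branch A with both placeholders removed.
const BRANCH_A_FIXED =
  "Extract and clean the following content. Focus on: \n\nContent:\n";

// Total wrapper overhead, in characters, for a batch of URLs.
// Every URL pays the same fixed cost, so the total grows linearly.
function fixedOverheadChars(urlCount: number): number {
  return BRANCH_A_FIXED.length * urlCount;
}
```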

## Recommended Wrapper (copy-paste design)

```text
SYSTEM: Extract only from SOURCE. No hallucination. No preamble. Structured data -> markdown table; otherwise tight bullets.
USER: TARGETS:\n{what_to_extract_or_default}\n\nSOURCE:\n{truncatedContent}
```
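
One way the recommended wrapper could map onto a chat-style message array is sketched below. The `ChatMessage` shape, the function name `buildRecommendedMessages`, and the `DEFAULT_TARGETS` text are assumptions; the source does not specify the default used for `{what_to_extract_or_default}` or the client library's message type.

```typescript
type Role = "system" | "user";

interface ChatMessage {
  role: Role;
  content: string;
}

// Stable policy, taken verbatim from the recommended wrapper.
const SYSTEM_POLICY =
  "Extract only from SOURCE. No hallucination. No preamble. " +
  "Structured data -> markdown table; otherwise tight bullets.";

// Assumed placeholder default; the real default is not given in this audit.
const DEFAULT_TARGETS = "Main content of the page";

function buildRecommendedMessages(
  truncatedContent: string,
  whatToExtract?: string,
): ChatMessage[] {
  // A single code path for both former branches: a missing or blank target
  // falls back to the default instead of switching to a different prompt.
  const targets = whatToExtract?.trim() || DEFAULT_TARGETS;
  return [
    { role: "system", content: SYSTEM_POLICY },
    { role: "user", content: `TARGETS:\n${targets}\n\nSOURCE:\n${truncatedContent}` },
  ];
}
```

Because the fallback happens inside the user message, the system policy stays identical whether or not `what_to_extract` is provided, which addresses criticisms 3 and 8.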

## Alternatives (3)

| Alternative | Pros | Cons |
|---|---|---|
| A — **Recommended** system+user split | Best adherence, clear precedence, lower redundancy | Requires message-shape refactor |
| B — Single-user compact frame | Minimal code change | Weaker policy priority |
| C — Keep current, trim wording | Safest rollout | Retains structural weaknesses |

## System vs User Tradeoff

- Put stable policy in **system** (groundedness, format, anti-fluff).
- Put variable intent + source in **user**.
- Current design blends both, reducing consistency.
