
hz_fetch_items

Fetch and deduplicate content from specified sources, then write results to the raw processing stage for analysis.

Instructions

Fetch and deduplicate content, then write the results to the run's raw stage.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| hours | No | | |
| run_id | No | | |
| horizon_path | No | | |
| config_path | No | | |
| sources | No | | |
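Since no field is required, a minimal arguments payload can set only what it needs. A hypothetical example (the source name shown is illustrative, not taken from the server's configuration):

```python
# Hypothetical arguments payload for hz_fetch_items; every field is optional.
args = {
    "hours": 48,          # lookback window in hours (server default is 24)
    "run_id": None,       # omit or pass None to let the server create a new run
    "sources": ["rss"],   # hypothetical source name; restricts which sources are fetched
}
```

Omitted keys fall back to the defaults declared in the tool's signature.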

Output Schema

No output fields are defined.

Implementation Reference

- The `fetch_items` method in `HorizonPipelineService` handles fetching items from sources, merging cross-source duplicates, and saving the raw data:

```python
async def fetch_items(
    self,
    hours: int = 24,
    run_id: str | None = None,
    horizon_path: str | None = None,
    config_path: str | None = None,
    sources: list[str] | None = None,
) -> dict[str, Any]:
    if hours <= 0:
        raise HorizonMcpError(code="HZ_INVALID_INPUT", message="hours 必须大于 0。")

    ctx, selected_sources, unknown_sources = self._build_context(
        horizon_path=horizon_path,
        config_path=config_path,
        sources=sources,
    )

    storage = make_storage(ctx.runtime, ctx.config_path)
    orchestrator = make_orchestrator(ctx.runtime, ctx.config, storage)

    run_id = self.run_store.create_run(run_id)
    since = datetime.now(timezone.utc) - timedelta(hours=hours)

    raw_items = await orchestrator._fetch_all_sources(since)
    merged_items = orchestrator._merge_cross_source_duplicates(raw_items)

    self.run_store.save_items(run_id, "raw", items_to_dicts(merged_items))
    meta = self.run_store.update_meta(
        run_id,
        {
            "horizon_path": str(ctx.horizon_path),
            "config_path": str(ctx.config_path),
            "hours": hours,
            "since": since.isoformat(),
            "source_selection": selected_sources,
            "unknown_sources": unknown_sources,
            "raw_count_before_merge": len(raw_items),
            "raw_count": len(merged_items),
        },
    )

    return {
        "run_id": run_id,
        "fetched": len(merged_items),
        "raw_before_merge": len(raw_items),
        "source_counts": get_source_counts(merged_items),
        "artifact": str((self.run_store.run_dir(run_id) / "raw_items.json").resolve()),
        "meta": meta,
    }
```
- The `hz_fetch_items` MCP tool is registered here and delegates to `service.fetch_items`:

```python
@mcp.tool()
async def hz_fetch_items(
    hours: int = 24,
    run_id: str | None = None,
    horizon_path: str | None = None,
    config_path: str | None = None,
    sources: list[str] | None = None,
) -> dict[str, Any]:
    """抓取并去重内容,写入 run 的 raw 阶段。"""

    return await _run_tool(
        "hz_fetch_items",
        lambda: service.fetch_items(
            hours=hours,
            run_id=run_id,
            horizon_path=horizon_path,
            config_path=config_path,
            sources=sources,
        ),
    )
```
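The `_merge_cross_source_duplicates` implementation is not shown above. A minimal sketch of what cross-source deduplication could look like, assuming items carry a `url` field (the item shape and key choice here are hypothetical):

```python
from typing import Any

def merge_cross_source_duplicates(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first item seen per URL; later duplicates from other sources are dropped."""
    seen: dict[str, dict[str, Any]] = {}
    for item in items:
        key = item["url"]
        if key not in seen:
            seen[key] = item
    return list(seen.values())

raw = [
    {"url": "https://a.example/post", "source": "rss"},
    {"url": "https://a.example/post", "source": "hn"},   # duplicate of the first
    {"url": "https://b.example/post", "source": "rss"},
]
merged = merge_cross_source_duplicates(raw)
```

This would explain why the tool reports both `raw_before_merge` and `fetched` counts: the difference is the number of duplicates collapsed.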
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'fetch and deduplicate content' and 'write to the raw stage', which implies data retrieval, processing, and storage, but lacks details on permissions, side effects, error handling, or performance traits. For a tool with 5 parameters and no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
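Based on the implementation shown above, a more disclosing description might read as follows. This is a sketch of improved wording, not the server's actual text:

```python
# Hypothetical rewrite of the tool description that surfaces side effects
# visible in the implementation: run creation, file writes, and network fetches.
DESCRIPTION = (
    "Fetch items from configured sources over the last N hours, deduplicate "
    "them across sources, and write the merged result to raw_items.json in "
    "the run directory. Creates a new run when run_id is omitted. "
    "Performs network requests; raises HZ_INVALID_INPUT when hours <= 0."
)
```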

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at a single sentence, but it is under-specified rather than efficiently informative. It is front-loaded and free of redundancy, yet its lack of detail means the few tokens it spends buy little clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters with 0% schema coverage, no annotations, and an output schema (which might help with return values), the description is incomplete. It doesn't explain the tool's role in the pipeline, parameter purposes, or behavioral context, making it inadequate for a tool of this complexity. Sibling tools suggest a data processing workflow, but this isn't leveraged.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so parameters are undocumented in the schema. The description does not mention any parameters or provide meaning beyond the generic 'fetch and deduplicate content'. It fails to compensate for the low coverage, leaving all 5 parameters (hours, run_id, horizon_path, config_path, sources) without contextual explanation in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
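One way to close the 0% coverage gap is to supply a schema with per-parameter descriptions. A hypothetical JSON Schema fragment for the five inputs (the wording is illustrative, not the project's; the `hours > 0` constraint is taken from the implementation's validation):

```python
# Hypothetical JSON Schema fragment documenting the inputs of hz_fetch_items.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "hours": {
            "type": "integer", "default": 24, "minimum": 1,
            "description": "Lookback window in hours; must be greater than 0.",
        },
        "run_id": {
            "type": ["string", "null"],
            "description": "Run identifier to use; a new run is created when omitted.",
        },
        "horizon_path": {
            "type": ["string", "null"],
            "description": "Path used to build the Horizon context; defaults apply when omitted.",
        },
        "config_path": {
            "type": ["string", "null"],
            "description": "Path to the pipeline configuration; defaults apply when omitted.",
        },
        "sources": {
            "type": ["array", "null"], "items": {"type": "string"},
            "description": "Subset of configured sources to fetch; all sources when omitted.",
        },
    },
    "required": [],
}
```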

Purpose: 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description '抓取并去重内容,写入 run 的 raw 阶段' (Fetch and deduplicate content, write to the raw stage of a run) states a purpose but is vague. It mentions 'fetch and deduplicate content' without specifying what content or from where, and 'write to the raw stage of a run' lacks context on what a 'run' or 'raw stage' entails. It doesn't clearly distinguish from siblings like hz_enrich_items or hz_filter_items, which might involve similar processing stages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives is provided. The description implies it's for an initial data ingestion phase ('raw stage'), but it doesn't specify prerequisites, when not to use it, or how it relates to siblings like hz_run_pipeline or hz_list_runs. Usage is only vaguely implied by the mention of 'run' and 'raw stage'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
