Skip to main content
Glama
mayai-it

bandiradar-mcp

by mayai-it

BandiRadar

PyPI CI Python 3.12 License: MIT

Open-source engine that monitors Italian public funding opportunities (public tenders, grants, incentives), normalizes them into one canonical model, and ranks them against a company profile with a two-stage matcher. One normalized feed of OPEN Italian tenders (incl. sub-threshold gare) + incentives, behind a crawl that repairs itself when a portal drifts.

Runs offline, zero secrets · 9 live key-less sources + 10 LLM-assisted scrapers · includes live OPEN Italian tenders (incl. sub-threshold) · optional LLM Stage-2 · MIT

Coverage

Coverage map — an honest map of where Italian public funding is published and what BandiRadar covers: open feeds vs gated, with the honest gap.

Italian public funding — data coverage map

Related MCP server: Grant Hunter MCP

Features

  • Two-stage matcher — a deterministic prefilter + LLM relevance scoring, with a zero-secrets offline heuristic fallback (the LLM is optional).

  • 9 live, key-less sources — TED (EU), incentivi.gov.it (national), anac_pvl (national open tenders), and the regions Lombardia, Lazio, Sicilia, Emilia- Romagna and Trento (FEASR); plus ANAC OCDS as a key-less historical / awarded-contracts feed (analysis, not open calls). 10 LLM-assisted scrapers for API-less portals — Toscana, Veneto, Piemonte, Puglia, Sardegna, FVG, Campania, Calabria, Basilicata and Liguria (live fetch needs an LLM key; --sample replays a recorded extraction offline).

  • Live OPEN Italian tenders (anac_pvl) — the national Pubblicità a Valore Legale feed of open, biddable gare, incl. sub-threshold ones TED never lists, no credentials — the biddable feed the other sources lack.

  • Self-healing crawl — when a scraper's listing drifts, an LLM re-derives the crawl recipe (data, not code); it's adopted only if it exactly reproduces the last-good results, otherwise human-flagged — never silently. Demonstrated on Toscana.

  • ANAC historical-benchmark enrichment — value/volume/seasonality context per CPV division, optionally attached to matches.

  • Document enrichment (PDF/OCR) — optionally pull attachment PDFs into the matcher so it reads the real requirements, not just title + CPV.

  • watch monitor loop (new/amended deltas) + JSON/RSS export.

  • CLI + MCP server — drive it from a shell or from an AI agent.

  • Fully offline on --sample — every demo and the whole test suite run with no network and no secrets.

Table of contents

Quickstart

30 seconds, offline, no keys:

pip install bandiradar
bandiradar match --profile mayai --sample

Or from a source checkout:

uv sync
uv run bandiradar match --profile mayai --sample

Real output on the bundled sample data:

4 matching opportunities for 'MayAI':

#1  score 55  [open]  Manifestazione d'interesse per l'accesso ai servizi per la digitalizzazione forniti da SoE AP EDIH
     issuer: Ministero delle Imprese e del Made in Italy (Campania)   deadline: 2026-06-30
     why: capability overlap: artificiale, digitalizzazione, intelligenza; within profile value range; national scope
     https://www.medisdih.it/wp/

#2  score 52  [open]  Voucher Digitalizzazione PMI 2025
     issuer: LazioInnova (Lazio)   deadline: —
     why: capability overlap: cloud, conforme, dati, digitalizzazione, machine; region match: Lazio
     https://www.lazioinnova.it/bandi/voucher-digitalizzazione-pmi-2025/

#3  score 44  [open]  Italia – Servizi di gestione dati – SERVIZIO DI GESTIONE … COMUNE DI ROCCA IMPERIALE (CS)
     issuer: CENTRALE UNICA DI COMMITTENZA … CASSANO ALL'IONIO E TREBISACCE (—)   deadline: —
     why: CPV prefix match (depth 2); capability overlap: dati; eu scope
     https://ted.europa.eu/en/notice/-/detail/376324-2026

#4  score 42  [closing_soon]  Donne e Impresa 2026
     issuer: LazioInnova (Lazio)   deadline: 2026-06-10
     why: capability overlap: data, software; region match: Lazio
     risk: deadline closing soon
     https://www.lazioinnova.it/bandi/donne-e-impresa-2026/

--profile accepts either a bundled example name (mayai, medtech_lombardia, pmi_toscana, … — packaged in the wheel, so the demos work from a pip install too) or a path to your own profile YAML.

Add --json for machine-readable output. Live opportunities come from the 9 key-less sources (incentivi, TED, anac_pvl open tenders, and the regions Lombardia, Lazio, Sicilia, Emilia-Romagna, Trentino); anac adds historical awarded-contract data (see Sources and Status).

Works across company types

BandiRadar isn't tuned to one company — it runs any profile against every source. bandiradar batch runs the bundled profile suite and compares results. Real output on --sample (offline heuristic):

PROFILE                          #  TOP MATCH (score)                      BY SOURCE
------------------------------------------------------------------------------------
Consulenza Strategica S.r.l.    11  Avviso Trasformazioni - Servizi… (55)  incentivi:5 lazio:5 ted:1
Costruzioni Lombarde S.r.l.      2  LAVORI DI FORMAZIONE MANUTENZIO… (56)  lombardia:1 ted:1
Trattoria & Bottega S.r.l.       4  Manifestazione d'interesse per … (55)  incentivi:2 lazio:2
Manifattura Esempio S.r.l.       2  Voucher 3I - Investire in innov… (36)  incentivi:2
MayAI                            4  Manifestazione d'interesse per … (55)  incentivi:1 lazio:2 ted:1
MedForniture Lombardia S.r.l.    2  FORNITURA DI DISPOSITIVI PER EN… (76)  lombardia:2
Innova Toscana S.r.l.           12  Bando 1.3.2 - Sostegno alle PMI… (60)  incentivi:3 ted:2 toscana:7
Studio Associato Commercialis…   8  Efficienza energetica e rinnova… (52)  incentivi:2 lazio:5 ted:1

The suite spans distinct Italian SME segments — AI/software (MayAI), manufacturing, medical-devices (Lombardy), accounting, construction, hospitality/retail (keyword-driven, no CPV), and consultancy. Counts are real matches on the tiny bundled sample; a segment can legitimately show few hits when the sample doesn't cover it. Keyword/capability overlap ignores a curated list of generic procurement filler (lavori, servizi, fornitura, manutenzione, …), so matches reflect sector-bearing terms rather than boilerplate.

uv run bandiradar batch --sample              # human comparison table
uv run bandiradar batch --sample --json       # machine-readable
uv run bandiradar batch --sample --csv out.csv

With an LLM key the same table gets sharper scores and ranking — see Stage 2: LLM scoring.

How it works

        ┌─────────┐   ┌───────────┐   ┌────────┐   ┌────────┐   ┌──────────┐
 sources│ INGEST  │──▶│ NORMALIZE │──▶│ STORE  │──▶│ MATCH  │──▶│ DELIVER  │
        └─────────┘   └───────────┘   └────────┘   └────────┘   └──────────┘
          fetch        raw→canonical   sqlite       2 stages     cli/mcp
                                      + dedupe                  (dashboard=pro)

Italian public funding is scattered across dozens of fragmented sources. BandiRadar pulls opportunities from those sources, maps each into a single canonical Opportunity model, and surfaces the few that matter for a given company — with reasons and deadlines. Matching is two stages:

  1. Deterministic prefilter — a pure, explainable function (region/geo, value range, deadline, exclusions, and a relevance signal: the opportunity's CPV codes prefix-matched against the profile's cpv_interests, or a keyword overlap). Cuts thousands of rows to dozens. No LLM, no network.

  2. LLM relevance — scores the survivors 0–100 with reasons, matched capabilities, eligibility flags, and risk notes. It ships with a zero-secrets offline fallback (a deterministic heuristic), so the whole thing runs in CI and in agent dev loops without any API key.

A thin core service layer orchestrates the pipeline; the CLI and MCP server are shells over it with no business logic. Storage is stdlib SQLite with change detection: a changed content_hash bumps the version, marks the row amended, and makes it re-notifiable (a tender rettifica should re-notify). See ARCHITECTURE.md for the full design.

Stage 2: LLM scoring

Stage 2 is off by default (zero secrets → deterministic offline heuristic). To enable real LLM relevance scoring:

uv sync --extra anthropic        # or: --extra openai  (optional SDKs)
# in .env (gitignored):
#   BANDIRADAR_LLM_PROVIDER=anthropic        # or openai
#   ANTHROPIC_API_KEY=sk-ant-...             # or OPENAI_API_KEY=...
#   BANDIRADAR_LLM_MODEL=...                 # optional; defaults to a cheap Haiku-class model

.env is auto-loaded — no manual export. With no key (or no SDK), the engine falls back to the heuristic, so CI and offline runs need nothing. When the LLM path is active you'll see a one-time scoring via anthropic:<model> on stderr.

The LLM is more discriminating than the heuristic — same prefiltered set, sharper scores/ranking (bandiradar batch --sample):

                                  heuristic        LLM (anthropic, Haiku)
Costruzioni  (real construction)    56               92   ← genuine match promoted
Costruzioni  (IT doc-digitization)  (kept ~36)       15   ← cross-sector match demoted
MayAI        top match            software-licenses  ML/data tender (88)
Studio comm. software-licenses      50               25   ← weak fit penalized
MedForniture medical devices        76               92   ← strong sector fit held

Matching quality (measured)

Most matching repos ask you to trust them. This one ships the numbers. On a labelled gold set of 312 real opportunities × 8 company profiles (src/bandiradar/data/eval/), here is the matcher quality — reproduce it any time with bandiradar eval --diagnostics (offline for the heuristic; set an LLM key for the LLM column):

min_score sweep — precision@5 / precision@10 / recall / false-positive-rate / returned
                 P@5   P@10  recall  FPR    returned
HEURISTIC (offline, zero-secret)
  recall  (0)   0.34  0.20   0.87   0.29     99      ← the firehose
  balanced(20)  0.34  0.20   0.87   0.29     99      ← scores too coarse to move
  precision(40) 0.53  0.39   0.64   0.24     82
  (60)          0.25  0.25   0.03   0.00      2      ← collapses; no usable cut
LLM pointwise (anthropic Haiku)
  recall  (0)   0.37  0.24   0.87   0.29     99      ← the firehose
  balanced(20)  0.46  0.37   0.78   0.11     51
  precision(40) 0.73  0.68   0.45   0.03     26      ← the operating point
  (60)          0.81  0.80   0.41   0.02     20

Read it: with an LLM, raising the cutoff cleanly trades recall for precision — at precision (min_score ≥ 40) P@5 0.73 / P@10 0.68 / FPR 0.03, roughly double the precision of the unfiltered firehose (P@5 0.37 / FPR 0.29) while still holding ~half the recall. The offline heuristic is a genuine zero-secret fallback (P@5 0.34) but its scores are too coarse to threshold — it has no usable precision cut (it collapses to 2 items at 60). So the LLM is the matcher to ship, and precision modes are meaningful only with a key; keyless runs are recall-oriented whatever the mode.

Operating-point modes

match / watch / batch (and the MCP search_opportunities) take a --mode:

mode

cutoff

with an LLM key

use it for

precision

min_score ≥ 40

P@5 0.73, P@10 0.68, FPR 0.03

a tight shortlist

balanced (default)

min_score ≥ 20

P@5 0.46, recall 0.78

day-to-day

recall

everything prefiltered

recall 0.87

the monitor's safety net

--min-score N still works for power users (it overrides --mode).

Honest limits (also measured — eval --diagnostics)

  • Embeddings (semantic prefilter, the embeddings extra) are built and measured but net-negative at the current recall ceiling: ~+0.02 recall for a 1.2–2.7× larger candidate set and higher FPR, so they ship optional and off.

  • Recall ceiling is real. Gate attribution shows the few relevant items the prefilter drops are 4/6 correctly-closed bandi (the deadline gate is right — expired calls shouldn't surface) and only 2 a lexical gap; no over-strict gate to tune.

  • Listwise reranking (eval --rerank) is an optional cheaper top-k mode (one LLM call/profile vs N) that lifts top-k slightly but loses the calibrated thresholding — so pointwise stays the default.

Sources

Source

What it delivers

Live fetch

incentivi

incentivi.gov.it (MIMIT) — the national catalogue of business incentives / grants (kind="incentive"), national and regional. The grant side, and the source a digital SME profile actually matches.

✅ Wired — the official IODL open-data export, no API key.

ted

TED — Tenders Electronic Daily, the EU's portal for above-threshold, OPEN, biddable tenders (includes large Italian public tenders).

✅ Wired — anonymous, no API key.

anac_pvl

ANAC Pubblicità a Valore Legale — the national feed of OPEN Italian public tenders (kind="tender"), incl. sub-threshold ones TED never lists; notices stay online until their deadline. This is the live open-calls feed the others lack. Carries buyer, oggetto, CIG, importo (sparse), CPV, region. CPV labels are resolved to official 8-digit CPV codes (EU vocabulary; often coarse division-level); region is resolved province→comune(ISTAT)→buyer→national. Caveats: importo often absent; CPV codes can be coarse.

✅ Wired — public JSON API, no credentials; keeps only still-open gare (deadline in the future).

lombardia

Regione Lombardia — regional / sub-threshold public tenders (kind="tender"), from the Osservatorio Regionale (Socrata SODA). Carries CPV, value, and province.

✅ Wired — Socrata SODA, no API key.

lazio

Regione Lazio — regional business incentives (kind="incentive"), from the LazioInnova bandi portal (WordPress REST API). The source the MayAI dogfood profile matches.

✅ Wired — WP REST, no API key.

toscana

Regione Toscana — regional business incentives (kind="incentive"), from the Sviluppo Toscana bandi portal. First LLM-assisted scraper: the portal has no field API, so an LLM extracts the canonical fields from each bando page.

⚠️ Wired — live fetch() needs an LLM key; fields are extracted from the portal's HTML bando pages. --sample replays a recorded extraction offline.

veneto

Regione del Veneto — regional bandi (tenders + incentives, classified per atto) from the SIU portal. LLM scraper: the portal's JSON layer stonewalls bots, so the server-rendered landing seeds the crawl and an LLM extracts each Dettaglio page. Honest scope: one visit surfaces the landing's ~10 atti; the daily monitor accumulates them.

⚠️ Wired — live fetch() needs an LLM key; --sample replays a recorded extraction offline.

piemonte

Regione Piemonte — regional bandi from the dedicated Drupal portal (bandi.regione.piemonte.it). LLM scraper seeded by the server-rendered Views listing filtered to stato "Aperto" server-side; an LLM extracts each detail page.

⚠️ Wired — live fetch() needs an LLM key; --sample replays a recorded extraction offline.

puglia

Regione Puglia — PR 2021-2027 avvisi from pr2127.regione.puglia.it. LLM scraper seeded by the portal's Liferay news-list fragment, keeping only items badged "Bando aperto". (The historic sistema.puglia.it is a frameset service registry with no scadenze — not viable.)

⚠️ Wired — live fetch() needs an LLM key; --sample replays a recorded extraction offline.

sardegna

Regione Sardegna — regional agevolazioni from Sardegna Impresa (Drupal). LLM scraper seeded by the server-rendered /it/agevolazioni Views listing (structured per-item scadenza).

⚠️ Wired — live fetch() needs an LLM key; --sample replays a recorded extraction offline.

fvg

Regione FVG — contributi-bearing bandi in corso from the regione.fvg.it bandi module, via the portal's own "misure contributive" filter (server-side). LLM scraper. (CI: routed via the EU-pinned relay — the host drops runner IPs.)

⚠️ Wired — live fetch() needs an LLM key; --sample replays a recorded extraction offline.

campania

Regione Campania — curated open business bandi from Sviluppo Campania (/bandi-aperti/ media-image widgets; the FESR portal blocks even the relay). LLM scraper; honest scope: the curated open set is small (~6).

⚠️ Wired — live fetch() needs an LLM key; --sample replays a recorded extraction offline.

calabria

Regione Calabria — PR 2021-2027 bandi from Calabria Europa: the bando custom post type is OPEN over WP-REST (records carry only id/link/title), so the JSON listing seeds the crawl and the LLM extracts the rich detail pages.

⚠️ Wired — live fetch() needs an LLM key; --sample replays offline.

basilicata

Regione Basilicata — all regional avvisi from the dedicated portalebandi (open avvisi-e-bandi CPT over WP-REST; structured detail pages: Destinatari, Importo, giorni alla scadenza). The LLM classifies tenders vs incentives.

⚠️ Wired — live fetch() needs an LLM key; --sample replays offline.

liguria

Regione Liguria — active contributi from the portal's Joomla publiccompetition search, filtered server-side (tipologia "contributi" + stato "Attivi"; session cookie + CSRF token handled by the crawl).

⚠️ Wired — live fetch() needs an LLM key; --sample replays offline.

sicilia

Regione Siciliana — regional FESR/FSC incentives (kind="incentive"), from EuroInfoSicilia. Standard WordPress posts under the "Bandi e Avvisi" category (config over the shared WP base + a categories filter).

✅ Wired — WP REST, no API key.

emilia_romagna

Regione Emilia-Romagna — regional incentives (kind="incentive") from the Politiche territoriali portal. Plone Bando content type with a structured scadenza_bando deadline (no text-parsing).

✅ Wired — plone.restapi @search, no API key.

trentino

Provincia Autonoma di Trento — FEASR rural-development incentives (kind="incentive"), from a dati.trentino.it CKAN open-data CSV (carries currently-open bandi, with importo and open/close dates).

✅ Wired — CKAN CSV, no API key.

anac

ANAC / PNCP open-contracting (OCDS) data — historical / awarded contracts (> €40k, monthly), not open calls. Surfaces mostly-closed opportunities (the matcher drops them); its value is market/history analysis. Region is absent in the data → national.

✅ Wired — streams the Open Contracting mirror (CC BY 4.0, no API key), capped at 500 releases/run.

uv run bandiradar fetch --source incentivi --sample   # offline, bundled real capture
uv run bandiradar match --profile mayai --source incentivi --sample
uv run bandiradar match --profile mayai --source ted --sample

The --sample fixtures are real captures (data/fixtures/*.json). incentivi exercises the canonical superset on the grant side — no CPV, a funding range, and an eligibility text the matcher reads. TED carries above-threshold contracts often far larger than a micro-SME's range, so a small profile matches only the few that fit — which is why incentive/national/regional sources matter too.

A regional example (the bundled medtech_lombardia example profile, a Lombardy medical-devices distributor) matches open Lombardy tenders, while the Lazio-only MayAI profile correctly drops them — regional filtering in action:

uv run bandiradar match --profile medtech_lombardia --source lombardia --sample
# -> 3 open medical-device tenders (region match, CPV 33*, within value range)
uv run bandiradar match --profile mayai --source lombardia --sample
# -> No matching opportunities (Lazio profile, Lombardy bandi dropped on region)

And the dogfood closes the loop — MayAI is a Lazio company, and lazio (LazioInnova) is where it finally matches its own region:

uv run bandiradar match --profile mayai --source lazio --sample
# -> Voucher Digitalizzazione PMI 2025 (52), Donne e Impresa 2026 (42, closing soon)
#    region match: Lazio; overlap: digitalizzazione, software, cloud, dati

Source data licensing is consolidated under Data and licenses.

Regional coverage

Two reusable bases make a sizeable share of Italian regions config-only: WordPressBandiSource (sources/wordpress.py) for WP-REST portals — Lazio (LazioInnova) and Sicilia (EuroInfoSicilia, standard posts + a categories filter) are configs over it — and PloneBandoSource (sources/plone.py) for the many PAs running Plone with the AGID Bando content type, where a structured scadenza_bando beats text-parsing — Emilia-Romagna is the reference config. Open-data tables get a dedicated adapter (Socrata for lombardia, a CKAN CSV for trentino FEASR). Each is a config/adapter + a fixture + a test, not core code.

Honestly, though, that clean pattern is rare — most Italian regional agency portals are bespoke sites with no public open-bandi API, so each new region usually needs its own adapter (CKAN/Socrata like lombardia, or HTML scraping) rather than a one-line config. We don't ship half-working adapters: a portal that's unreachable, retrospective-only, or API-less is skipped, not faked. The per-region status (what's been checked, where coverage is needed) lives in docs/regions.mdregional contributions are very welcome.

For those API-less portals, toscana is the reference LLM-assisted scraper: fetch() lists each bando's detail page, fetches the HTML, and an LLM extracts the canonical fields (title, deadline, eligibility, amounts, keywords), cached per URL. That extraction is I/O, so it lives in fetch() and needs an LLM keyto_opportunities stays pure, and --sample replays a recorded extraction with zero secrets:

uv run bandiradar fetch --source toscana --sample   # offline, recorded extraction
uv run bandiradar match --profile pmi_toscana --source toscana --sample
# -> Bando Energia Imprese (92), Bando 1.3.2 Sostegno alle PMI/BEI (82),
#    Bando Energia Immobili Imprese (78); public-only Energia Pubblico dropped to 15

Self-healing crawl

A scraper's listing drifts and the crawl re-derives its own recipe — offline

A scraper's fragile part is the crawl (the listing it depends on), not the extraction — the LLM already adapts to changed HTML. So the crawl is data, not code: a CrawlRecipe (where the listing is + dotted paths to each field). That makes drift detectable and the fix machine-checkable:

  1. Spine — every healthy crawl validates its results and snapshots the last-good ones (the golden). A drift (renamed/moved fields → unusable refs) is detected deterministically, not by a crash.

  2. Healer — on drift, an LLM is shown one live listing item and the broken recipe, and asked to re-derive only the paths (data, never code).

  3. Guard — the candidate recipe is adopted only if it exactly reproduces the golden. If it parses but differs (content genuinely changed) or stays broken, the recipe is left untouched and the source is flagged for a human — never a silent swap. Adoptions are auditable ({recipe, adopted_at, reason, validated_by}).

uv run python scripts/demo_self_heal.py   # offline, fake healer: drift → heal → recovered

First demonstrated on the toscana scraper. This keeps a scraper alive across small portal changes without shipping new code — and refuses to guess when it can't prove the fix. Where the open engine stops and managed/premium coverage begins is mapped in the coverage map.

Intelligence and benchmarks

A separate track (not the matcher) ingests ANAC historical OCDS data — awarded public contracts — and computes compact benchmarks per CPV-division × region: award value distribution (median, p25/p75, min/max), volume, seasonality (by year), and distinct-supplier counts.

uv run bandiradar benchmarks build --sample          # offline, bundled real capture
uv run bandiradar benchmarks show --cpv 45           # region falls back to national

Real output on the bundled sample:

CPV division 45  [national]
  awards (count): 22   distinct suppliers: 21
  value EUR: median 470,768  p25 121,649  p75 1,594,879
  range: 68,117 – 11,369,083
  by year: 2022:22

Honest data caveats:

  • The dataset is retrospectiveawarded contracts (> €40k), not open calls.

  • It has awards + suppliers but no tenderers list, so we cannot derive a "number of bidders". We derive value/volume/seasonality/supplier counts only.

  • The release addresses carry city + postal code but no region/NUTS, so benchmarks are national-only for now (region stays None); the model and aggregation already support regional buckets for when a region-bearing source arrives.

Enrichment: benchmarks in the matcher

The benchmarks are optional matcher enrichment (injected like the score cache — the matcher works fine without them). Add --with-benchmarks to match and each scored opportunity gets, for its CPV division, a historical-context reason plus a value-sanity risk note when it declares an estimated value:

uv run bandiradar benchmarks build --sample
uv run bandiradar match --profile mayai --source ted --sample --with-benchmarks
#1  score 44  [open]  Italia – Servizi di gestione dati – SERVIZIO DI GESTIONE ... ROCCA IMPERIALE (CS)
     why: CPV prefix match (depth 2); capability overlap: dati; eu scope; ANAC history (CPV 72, national): 8 awards, median EUR 104,326, p25-p75 EUR 71,619-183,410
     https://ted.europa.eu/en/notice/-/detail/376324-2026

Value-sanity triggers when the opportunity declares a value — e.g. a Lombardy medical-devices tender (--profile medtech_lombardia --source lombardia --sample --with-benchmarks):

#1  score 76  [open]  FORNITURA DI DISPOSITIVI PER ENDOSCOPIA DIGESTIVA … ASST LODI …
    why: ...; ANAC history (CPV 33, national): 7 awards, median EUR 1,183,540, p25-p75 EUR 104,190-1,640,000
    risk: estimated value EUR 2,133,178 is above the historical p75 EUR 1,640,000 for this category

Enrichment is append-only on a copy of the cached match: the cache always stores the bare match, so repeated runs never double-append. The search_opportunities MCP tool takes the same with_benchmarks flag. ANAC data licensing is under Data and licenses.

Document enrichment (PDF/OCR)

Most of a tender's real requirements live in attachment PDFs (the disciplinare/bando), not in the title or CPV. With --with-documents, the matcher fetches an opportunity's document_urls, extracts the text, and folds it into every matching input — the prefilter keyword gate, the offline heuristic's overlap, and the LLM prompt — so requirements that exist only in the attachments can still drive (or sink) a match. Extracted text is cached per URL (SQLite), so PDFs aren't re-downloaded.

uv run bandiradar match --profile mine.yaml --source ted --sample --with-documents
  • Optional and injected — like the score/benchmark caches; off by default. The default install only needs pypdf.

  • OCR for scanned PDFs is the optional ocr extra: uv sync --extra ocr plus the system binaries tesseract and poppler. When absent, OCR is skipped cleanly (text-based PDFs still work). Enrichment never raises into the matcher — a failed fetch/parse degrades to no added text.

  • Honest source coverage: only ted currently carries a per-notice document link (the notice PDF). lombardia, incentivi, and anac expose no per-document attachment URL in their data, so document_urls is empty for them (no faking) — until those links are wired, --with-documents is a no-op there.

--with-documents fetches PDFs over the network, so (unlike the default --sample flow) it is not offline.

Watch and export

watch is a monitor loop: it fetches, applies the storage change-detection, and reports only matches whose opportunity is new or amended since the last watch run (a per-profile marker is persisted; --since overrides it).

# 1st run: all current matches are "new"; a 2nd run reports nothing new
uv run bandiradar watch --profile mayai --source incentivi --sample
# write a feed instead of printing
uv run bandiradar watch --profile mayai --source incentivi --sample --rss ~/feed.xml

export is the full, non-delta dump of current matches (--json or --rss PATH).

Scheduling is your cron — this is open-core (single-user/local). For example:

0 8 * * *  cd /path/to/bandiradar && uv run bandiradar watch --profile mine.yaml --rss ~/feed.xml

Managed delivery (WhatsApp/email/alerts), scheduling SaaS, and multi-tenant hosting live in bandiradar-pro.

Live monitor (runs itself, daily)

This repo monitors itself. A GitHub Actions workflow (.github/workflows/monitor.yml) runs every day at 05:23 UTC (an off-peak minute — GitHub often skips on-the-hour schedules; and on demand): it fetches every key-less source — plus toscana, so the self-healing crawl drift-check runs in production — watches every bundled profile, and publishes the results to the orphan monitor-data branch:

  • feeds/<profile>.xml / feeds/<profile>.json — the new/amended matches per profile;

  • STATUS.md — run date, per-source outcome + counts, new matches per profile, and the crawl-recipe state (ok / drift / healed / flagged).

It runs with zero secrets (guardrail 1): keyless ⇒ recall mode + offline heuristic matcher, and crawl drift is only detected. Add the optional ANTHROPIC_API_KEY repo secret and the same workflow scores with the LLM and activates the crawl healer (a drifted recipe is auto-re-derived and adopted only if it reproduces the golden exactly). The data branch is kept flat — one force-pushed commit per run, so generated state never bloats the repo history. A run fails only when every source fails; partial failures are warnings in STATUS.md.

It fetches once per run (the first profile fetches every source; the others reuse the DB via watch --skip-fetch), so the whole job takes ~5 minutes, not 30+. Every request sends an identifying User-Agent and a short connect timeout.

Operational protections (v0.5.1). With an LLM key, a per-run spend cap (BANDIRADAR_LLM_BUDGET) bounds new scorings — items beyond the cap are deferred and amortized by the score cache across the next runs, not dropped. Before each publish, bandiradar prune trims stale raw_docs of long-closed opportunities and old run rows (then VACUUMs) to keep the data branch well under GitHub's blob limit, without touching the score cache or crawl recipes. And the long step is time-boxed so doctor + STATUS + publish always run: if the run is truncated, STATUS.md says so (⚠️ Run truncated: X/N profiles completed) instead of republishing stale numbers.

Known limit — PA hosts that geo-block datacenter IPs (and the optional relay). Some open endpoints drop datacenter / extra-EU IPs at the connection level — incentivi.gov.it does, so GitHub-hosted runners couldn't reach it (verified: works from residential / EU IPs). Not fixable in client code. The engine supports an optional HTTP relay for exactly this: requests to allowlisted hosts are transparently rewritten to <relay>?u=<original-url> at the HTTP layer (no adapter changes), driven by three env vars / repo secrets — BANDIRADAR_RELAY_URL, BANDIRADAR_RELAY_TOKEN (sent as X-Relay-Token; from secrets, never the repo), BANDIRADAR_RELAY_HOSTS (comma-separated allowlist).

In this repo's live monitor the incentivi gap is solved: the relay runs as a Vercel function pinned to an EU region (fra1 → EU egress; reference source in infra/vercel-relay/ — the deployment and its token are the operator's infrastructure). An earlier Cloudflare-Worker attempt did NOT work: Workers execute on the edge nearest the CALLER, so a US runner got US egress and the geo-block stayed (HTTP 522). With the env unset, nothing changes: the repo stays keyless and fully functional, and the gap returns — visibly classified in STATUS.md, never silently. The pre-flight step probes incentivi both direct and via relay and logs both outcomes. (TED's earlier 403 from CI was a different issue — a default-User-Agent block — and is fixed: with our identifying User-Agent, TED fetches from the runners.)

AI agents (MCP)

BandiRadar ships a thin MCP server (FastMCP), so you can drive it from Claude. Six tools:

list_sources · fetch_opportunities · search_opportunities · score_opportunity · get_matches · get_profile

uv run bandiradar mcp

Registration and an offline example session are in docs/MCP.md.

Status

  • Offline, zero-secret — every demo above and the whole test suite run with no network and no API key.

  • 9 live key-less sourcesincentivi, ted, anac_pvl, lombardia, lazio, sicilia, emilia_romagna, trentino (open calls) plus anac (historical). --sample keeps them offline against recorded real captures.

  • Live OPEN Italian tendersanac_pvl (Pubblicità a Valore Legale) is the national feed of open, biddable gare, incl. sub-threshold ones TED never lists, no credentials; it keeps only still-open notices.

  • 10 LLM-assisted scraperstoscana, veneto, piemonte, puglia, sardegna, fvg, campania, calabria, basilicata, liguria: live fetch() extracts fields from each portal's HTML bando pages with an LLM (needs a key); --sample replays offline.

  • Self-healing crawl — a drifted scraper listing triggers an LLM that re-derives the crawl recipe (data, not code); it's adopted only when it exactly reproduces the last-good results, else human-flagged.

  • Stage-2 LLM scoring is wired and working (optional); with no key it transparently uses the offline heuristic — a proxy, not real semantic relevance.

  • Live ANAC/PNCP fetch is wired — streams the Open Contracting OCDS mirror (key-less), capped at 500 releases/run. The data is retrospective (awarded contracts), so it surfaces mostly-closed opportunities — useful for history / market analysis, not as a feed of open calls.

  • Live-fetch robustness shipped (0.2.0) — retries/backoff, pagination, per-source isolation (one source failing never aborts the others), per-record quarantine, and a doctor diagnostic. Dirty single records are tolerated, never fatal.

  • Honest limitation: the real residual gap is coverage, not robustness — Italian regional funding is fragmented across bespoke API-less portals, and the richest tender documents are gated. See the coverage map for the open-vs-gated landscape and where the open/Pro boundary falls.

Open core vs Pro

Anything a single user can run locally is open. Anything managed, multi-client, or a delivery channel lives in the private bandiradar-pro, which depends on this package — never the reverse.

bandiradar (this repo, MIT)

bandiradar-pro (private)

Engine (ingest/normalize/match)

imports it

Source framework (Source interface + registry)

Reference adapters (ANAC, incentivi.gov.it)

Two-stage matcher (incl. offline fallback)

CLI + MCP server

Dashboard (web UI)

Premium / regional source adapters

Delivery channels (WhatsApp, email, alerts)

Multi-tenant, managed hosting, scheduling SaaS

Roadmap

Shipped

  • Canonical model + Source framework + two-stage matcher (deterministic prefilter + LLM relevance with a zero-secrets offline fallback) + SQLite with change-detection + CLI + MCP server.

  • Live sources: TED (EU open tenders), incentivi.gov.it (national incentives), anac_pvl (national OPEN tenders — Pubblicità a Valore Legale, incl. sub-threshold), and the regions Lombardia (Socrata tenders), Lazio (LazioInnova incentives), Sicilia (EuroInfoSicilia FESR/FSC), Emilia-Romagna (Plone Bando) and Trentino (CKAN FEASR), all key-less; ANAC OCDS wired as a capped, key-less historical / awarded-contracts feed (analysis, not open calls).

  • CPV resolver (Italian CPV labels → official 8-digit EU codes) + region fallback (province → comune/ISTAT → buyer → national) — measured keyless recall gains on tender profiles.

  • LLM-assisted scraper for API-less regional portals — Regione Toscana (Sviluppo Toscana) is the first instance (live fetch needs an LLM key).

  • Self-healing crawl — crawl recipes as data + drift detection + golden-sample guard + an LLM recipe healer (gated adoption; human-flagged otherwise).

  • Coverage map — honest open-vs-gated landscape of Italian funding data.

  • Intelligence track: ANAC historical benchmarks + optional matcher enrichment (--with-benchmarks).

  • watch monitor loop (new/amended deltas) + JSON/RSS export.

  • Embeddings semantic prefilter — built and measured; ships optional and off (net-negative at the current recall ceiling — see Honest limits under Matching quality).

Upcoming

  • More community/regional source adapters (via the Source framework — 6 regions covered so far; the per-territory recon in the coverage map shows where help is welcome).

  • bandiradar-pro (private): dashboard, WhatsApp/email delivery, scheduling SaaS, multi-tenant hosting.

Contributing

Every source is fetch + a pure to_opportunities, plus a recorded fixture and a test — adding one is a new file, no core changes. See CONTRIBUTING.md and the add-a-source skill (skills/add-a-source/) for the full copy-pasteable template; the playbook also lives in CLAUDE.md ("How to add a new Source").

Each source also has an offline contract test against a recorded real response (tests/cassettes/), plus an opt-in live drift check that runs only with uv run pytest -m live (never in CI). See CONTRIBUTING.md for how to run it and re-record a cassette when an API changes.

License

MIT © MayAI — see LICENSE.

Data and licenses

BandiRadar consumes public open data; each source keeps its own licence, which its operator requires you to honor:

  • TED — Tenders Electronic Daily (EU) — the EU's public procurement journal (Publications Office of the EU). Notice data is reusable under the Commission's open-data reuse policy; the live ted fetch uses the anonymous public api.ted.europa.eu search API (no key). Attribute TED / the Publications Office.

  • incentivi.gov.it (IODL 2.0) — published by the Ministero delle Imprese e del Made in Italy under the Italian Open Data License v2.0 (attribution required). The live incentivi fetch hits the same open-data export endpoint the portal's own "Scarica dataset" button uses (no separate static file; the download is built client-side from that endpoint).

  • Regione Lombardia (CC0 1.0) — dataset k6cb-4hbm (Bandi di gara — Osservatorio Regionale), via the dati.lombardia.it Socrata SODA API.

  • Regione Lazio / LazioInnova — bandi published by LazioInnova (the regional development agency) and read via its WordPress REST API (lazioinnova.it/wp-json). Source: LazioInnova / Regione Lazio; attribute the source when reusing.

  • Regione Toscana / Sviluppo Toscana — bandi published on the Sviluppo Toscana portal (sviluppo.toscana.it); detail-page links come from its WP REST listing and the fields are LLM-extracted from each public bando page. Source: Sviluppo Toscana / Regione Toscana; attribute the source when reusing.

  • Regione Siciliana / EuroInfoSicilia — FESR/FSC bandi published on euroinfosicilia.it and read via its WordPress REST API (category "Bandi e Avvisi"). Source: Regione Siciliana — EuroInfoSicilia; attribute the source.

  • Regione del Veneto — atti published on the public SIU bandi portal (bandi.regione.veneto.it); the landing page seeds the crawl and the fields are LLM-extracted from each public Dettaglio page. Source: Regione del Veneto; attribute the source when reusing.

  • Regione Piemonte — bandi published on bandi.regione.piemonte.it; the public Views listing seeds the crawl and the fields are LLM-extracted from each public detail page. Source: Regione Piemonte; attribute the source when reusing.

  • Regione Puglia — avvisi published on the PR Puglia 2021-2027 portal (pr2127.regione.puglia.it); the public news listing seeds the crawl and the fields are LLM-extracted from each public detail page. Source: Regione Puglia; attribute the source when reusing.

  • Regione Autonoma della Sardegna / Sardegna Impresa — agevolazioni published on sardegnaimpresa.eu; the public listing seeds the crawl and the fields are LLM-extracted from each public detail page. Source: Regione Autonoma della Sardegna; attribute the source when reusing.

  • Regione Autonoma Friuli Venezia Giulia — bandi published in the regione.fvg.it bandi/avvisi module; the contributi-filtered search seeds the crawl and the fields are LLM-extracted from each public detail page. Source: Regione Autonoma Friuli Venezia Giulia; attribute the source when reusing.

  • Regione Campania / Sviluppo Campania — bandi published on sviluppocampania.it; the public open-bandi page seeds the crawl and the fields are LLM-extracted from each public post. Source: Sviluppo Campania / Regione Campania; attribute the source when reusing.

  • Regione Calabria / Calabria Europa — bandi published on calabriaeuropa.regione.calabria.it (PR 2021-2027) and read via its open WP-REST bando type; fields LLM-extracted from each public page. Source: Regione Calabria; attribute the source when reusing.

  • Regione Basilicata — avvisi published on portalebandi.regione.basilicata.it and read via its open WP-REST type; fields LLM-extracted from each public page. Source: Regione Basilicata; attribute.

  • Regione Liguria — bandi published on regione.liguria.it (publiccompetition); the public filtered search seeds the crawl and fields are LLM-extracted from each public page. Source: Regione Liguria; attribute.

  • Regione Emilia-Romagna — bandi published on the regional Politiche territoriali portal (politicheterritoriali.regione.emilia-romagna.it) and read via plone.restapi (portal_type=Bando). Source: Regione Emilia-Romagna; attribute the source.

  • Provincia Autonoma di Trento (CC BY 4.0) — FEASR bandi calendar from the dati.trentino.it CKAN open-data portal. Source: Provincia Autonoma di Trento; attribute the source.

  • ANAC public contracts (CC BY 4.0) — via the Open Contracting Data mirror; both the anac source and the intelligence track stream the gzipped JSONL memory-safely (line by line) through the shared reader in bandiradar/ocp.py.

Install Server
A
license - permissive license
B
quality
A
maintenance

Maintenance

Maintainers
Response time
0dRelease cycle
11Releases (12mo)
Commit activity

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mayai-it/bandiradar'

If you have feedback or need assistance with the MCP directory API, please join our Discord server