Automates Instagram profile investigation by scraping profile data and recent-post metadata, then generating AI-powered analytical reports on user activity and content patterns.
Integrates with OpenAI's API to generate qualitative summaries and analytical reports from scraped Instagram profile data and post metadata.
Instagram MCP Investigator
Node.js MCP server that automates a Chromium session with Playwright, scrapes a target Instagram profile (using a saved login state), and feeds the results to OpenAI for an annotated report.
Prerequisites
Node.js 20+
Installed Playwright browsers (npx playwright install chromium)
A valid Instagram session captured in storageState.json
An OpenAI API key (optional but recommended)
Setup
Install dependencies:
npm install
Install the Chromium runtime that Playwright drives:
npx playwright install chromium
Capture a logged-in Instagram session (this opens a non-headless browser):
npm run login
Log in manually in the launched window, then return to the terminal and press Enter to persist storageState.json.
Copy .env.example to .env and fill in the values you need. At a minimum, set OPENAI_API_KEY to enable report generation.
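Before registering the server, it can help to confirm that storageState.json really holds an Instagram login. The sketch below assumes Playwright's storage-state layout (a top-level cookies array whose expires field is Unix seconds, with -1 marking a session cookie) and assumes Instagram's login cookie is named sessionid. hasInstagramSession is a hypothetical helper, not part of this project.

```typescript
// Subset of Playwright's storage-state file layout that this check needs.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  expires: number; // Unix seconds; -1 marks a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
  origins?: unknown[];
}

// Hypothetical helper: true if the state holds a non-expired Instagram login cookie.
// ASSUMPTION: Instagram's login cookie is named "sessionid".
function hasInstagramSession(
  state: StorageState,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  return state.cookies.some((c) => {
    const host = c.domain.replace(/^\./, ""); // cookie domains may carry a leading dot
    const isInstagram = host === "instagram.com" || host.endsWith(".instagram.com");
    const notExpired = c.expires === -1 || c.expires > nowSeconds;
    return c.name === "sessionid" && c.value.length > 0 && isInstagram && notExpired;
  });
}
```

In practice the object would come from JSON.parse on the file's contents; if the check fails, rerun npm run login.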
Running the MCP server
The server communicates over stdio, so it can be registered with any MCP-compatible client.
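Over stdio, an MCP client and server exchange JSON-RPC 2.0 messages, one per line, and tools are invoked with the tools/call method. The sketch below builds the line a client would write to run this server's tool; it omits the initialize handshake a real client performs first, and buildToolCallFrame and the example arguments are hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Builds one newline-terminated stdio frame asking the server to run
// instagram_profile_report with the given arguments.
function buildToolCallFrame(id: number, args: Record<string, unknown>): string {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "instagram_profile_report", arguments: args },
  };
  // The stdio transport delimits messages with newlines, so the JSON must not
  // contain raw newlines; JSON.stringify without indentation guarantees that.
  return JSON.stringify(request) + "\n";
}
```

For example, buildToolCallFrame(2, { username: "example_user", maxPosts: 5 }) yields a single line ready to write to the server's stdin.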
The following environment variables control scraping and summarisation:
PLAYWRIGHT_STORAGE_STATE: path to the saved session (defaults to ./storageState.json).
PLAYWRIGHT_HEADLESS: set to false to watch the browser.
PLAYWRIGHT_BROWSER_CHANNEL: set to chrome to run the system Chrome instead of the bundled Chromium.
MAX_POSTS: default number of recent posts to fetch.
OPENAI_MODEL: overrides the model used for summarisation.
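One way to read these variables into a typed configuration is sketched below. It assumes headless mode is the default (the variable only needs to be set to false to watch the browser). loadConfig is hypothetical, and FALLBACK_MAX_POSTS is an illustrative value, since the project's own default for MAX_POSTS is not documented here.

```typescript
// Configuration derived from the environment variables listed above.
interface ScraperConfig {
  storageStatePath: string;
  headless: boolean;
  browserChannel?: string; // e.g. "chrome" for the system browser
  maxPosts: number;
  openaiModel?: string;
}

type Env = Record<string, string | undefined>;

// ASSUMPTION: illustrative fallback; the real MAX_POSTS default is not documented.
const FALLBACK_MAX_POSTS = 12;

// Hypothetical loader: applies the documented defaults and parses numbers.
function loadConfig(env: Env): ScraperConfig {
  const parsedMax = Number.parseInt(env.MAX_POSTS ?? "", 10);
  return {
    storageStatePath: env.PLAYWRIGHT_STORAGE_STATE || "./storageState.json",
    // Headless unless explicitly set to "false".
    headless: (env.PLAYWRIGHT_HEADLESS ?? "").toLowerCase() !== "false",
    browserChannel: env.PLAYWRIGHT_BROWSER_CHANNEL || undefined,
    maxPosts: Number.isFinite(parsedMax) && parsedMax > 0 ? parsedMax : FALLBACK_MAX_POSTS,
    openaiModel: env.OPENAI_MODEL || undefined,
  };
}
```

Taking the environment as a parameter rather than reading process.env directly keeps the function easy to test.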
MCP tool contract
The server exposes a single tool named instagram_profile_report.
Input schema:
username (required): Instagram handle, with or without the leading @.
maxPosts (optional): maximum number of recent posts to inspect (up to 50).
headless (optional): override headless mode for this invocation.
storageStatePath (optional): use an alternate saved login state.
includeRaw (optional): append the raw JSON payload to the textual response.
model (optional): use a different OpenAI chat model.
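The input rules above can be sketched as a normalisation step: strip a leading @, cap maxPosts at 50, and let per-invocation values override server defaults. normalizeInput and Defaults are hypothetical names, and clamping out-of-range values (rather than rejecting them) and the lower bound of 1 are assumptions. The profile URL uses Instagram's public https://www.instagram.com/<handle>/ form.

```typescript
// Arguments as a client might send them, per the input schema above.
interface ToolInput {
  username: string;
  maxPosts?: number;
  headless?: boolean;
  storageStatePath?: string;
  includeRaw?: boolean;
  model?: string;
}

// Server-side defaults, e.g. loaded from environment variables.
interface Defaults {
  maxPosts: number;
  headless: boolean;
  storageStatePath: string;
  model?: string;
}

const MAX_POSTS_LIMIT = 50; // documented upper bound for maxPosts

// Hypothetical normaliser. ASSUMPTION: out-of-range maxPosts is clamped to 1..50.
function normalizeInput(input: ToolInput, defaults: Defaults) {
  const username = input.username.trim().replace(/^@/, "");
  if (username.length === 0) {
    throw new Error("username is required");
  }
  const requested =
    typeof input.maxPosts === "number" && Number.isFinite(input.maxPosts)
      ? input.maxPosts
      : defaults.maxPosts;
  const maxPosts = Math.min(Math.max(Math.floor(requested), 1), MAX_POSTS_LIMIT);
  return {
    username,
    profileUrl: `https://www.instagram.com/${encodeURIComponent(username)}/`,
    maxPosts,
    // Per-invocation values win; otherwise fall back to server defaults.
    headless: input.headless ?? defaults.headless,
    storageStatePath: input.storageStatePath ?? defaults.storageStatePath,
    includeRaw: input.includeRaw ?? false,
    model: input.model ?? defaults.model,
  };
}
```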
Workflow overview
The tool invocation resolves the target profile URL and opens it via Playwright.
The script reuses storageState.json to stay logged in.
Recent post metadata (image URL, alt text, timestamp, caption preview) and profile stats are serialised into JSON.
The JSON is sent to OpenAI for a qualitative summary.
The MCP tool returns a textual report (and, optionally, the raw JSON) to the requesting client.
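The last three steps of this workflow can be sketched as plain data handling, assuming the server uses OpenAI's Chat Completions API (a model plus a messages array, sent to POST /v1/chat/completions) and returns an MCP text content item. The property names, the ProfileStats fields, the system prompt, and the "Raw data:" label are all illustrative assumptions, not the project's actual code; the network call itself is omitted.

```typescript
// Post fields named in the workflow above; exact property names are assumed.
interface PostMetadata {
  imageUrl: string;
  altText: string;
  timestamp: string; // ISO 8601
  captionPreview: string;
}

// ASSUMPTION: which profile stats are scraped is not documented; these are illustrative.
interface ProfileStats {
  posts?: number;
  followers?: number;
  following?: number;
}

interface ScrapePayload {
  username: string;
  profile: ProfileStats;
  posts: PostMetadata[];
}

// Request body for the Chat Completions endpoint. The prompt wording is a placeholder.
function buildChatRequest(payload: ScrapePayload, model: string) {
  return {
    model,
    messages: [
      {
        role: "system",
        content: "Write a qualitative summary of this Instagram profile's activity and content patterns.",
      },
      { role: "user", content: JSON.stringify(payload) },
    ],
  };
}

// MCP tool results carry a list of content items; text items have this shape.
interface TextContent {
  type: "text";
  text: string;
}

// Assembles the tool reply: the report, with the raw JSON appended when includeRaw is set.
function buildToolResult(
  report: string,
  payload: ScrapePayload,
  includeRaw: boolean,
): { content: TextContent[] } {
  let text = report;
  if (includeRaw) {
    text += "\n\nRaw data:\n" + JSON.stringify(payload, null, 2);
  }
  return { content: [{ type: "text", text }] };
}
```

Sending the payload as compact JSON keeps the prompt small, while the raw copy returned to the client is indented for readability.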
Troubleshooting
Redirected to login: refresh the saved session with npm run login.
Not enough post data: verify that the profile is public and that the logged-in account has access to it.
No summary returned: confirm that OPENAI_API_KEY is set and that the specified model is available.
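These three checks can be automated by mapping what a run observed to the hints above. In the sketch below, RunObservations and diagnose are hypothetical. It assumes a login redirect lands under Instagram's /accounts/login path and treats zero scraped posts as "not enough post data", both of which are heuristics.

```typescript
// Observations gathered after a run; field names are hypothetical.
interface RunObservations {
  finalUrl: string; // page URL after navigation settled
  postsFound: number;
  summaryReturned: boolean;
  hasApiKey: boolean; // whether OPENAI_API_KEY was set
}

// Maps observations to the troubleshooting hints above.
// ASSUMPTION: a login redirect ends up under /accounts/login.
function diagnose(obs: RunObservations): string[] {
  const hints: string[] = [];
  const path = obs.finalUrl.replace(/^https?:\/\/[^/]+/, ""); // strip scheme and host
  if (path.startsWith("/accounts/login")) {
    hints.push("Redirected to login: refresh the saved session with npm run login.");
  }
  if (obs.postsFound === 0) {
    hints.push("Not enough post data: verify that the profile is public and that the logged-in account has access to it.");
  }
  if (!obs.summaryReturned) {
    hints.push(
      obs.hasApiKey
        ? "No summary returned: confirm that the specified model is available."
        : "No summary returned: OPENAI_API_KEY is not set.",
    );
  }
  return hints;
}
```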
Next steps
Add richer scraping (e.g., fetching captions via GraphQL requests) in case Instagram's layout changes.
Extend the MCP tool to cache results or stream image thumbnails for downstream automation.
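As a starting point for caching, a small in-memory cache with a time-to-live could sit in front of the scraper, keyed by username and post limit so that differently sized requests do not collide. ReportCache is hypothetical, and an in-memory map only lasts for the lifetime of the server process.

```typescript
// Hypothetical in-memory TTL cache for report results.
class ReportCache<T> {
  entries = new Map<string, { value: T; expiresAt: number }>();
  ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Username plus post limit, so a 5-post and a 50-post report are cached separately.
  key(username: string, maxPosts: number): string {
    return `${username}:${maxPosts}`;
  }

  get(username: string, maxPosts: number, now: number = Date.now()): T | undefined {
    const k = this.key(username, maxPosts);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(k); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(username: string, maxPosts: number, value: T, now: number = Date.now()): void {
    this.entries.set(this.key(username, maxPosts), { value, expiresAt: now + this.ttlMs });
  }
}
```

The optional now parameter lets tests control time without mocking Date.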
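Local debugging
To check the server's behaviour before connecting it to a client, you can invoke the debug script.
Pass --no-headless to watch the browser.
Use --storage-state to specify a different session file.
If you do not need the OpenAI summary, you can run without setting OPENAI_API_KEY (a notice is shown in place of the summary).
Progress logs are shown by default. Pass --verbose for more detail or --quiet to run silently.
If page loads are slow and time out, specifying --wait-until domcontentloaded or --wait-until load makes runs more stable (this can also be set with the PLAYWRIGHT_WAIT_UNTIL environment variable).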
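The debug flags can be parsed into an options object as sketched below. The accepted wait-until values are the ones Playwright's page.goto takes for its waitUntil option ("load", "domcontentloaded", "networkidle", "commit"). parseDebugArgs is hypothetical, and letting the --wait-until flag take precedence over PLAYWRIGHT_WAIT_UNTIL is an assumption.

```typescript
// waitUntil values accepted by Playwright's page.goto.
const WAIT_UNTIL_VALUES = ["load", "domcontentloaded", "networkidle", "commit"] as const;
type WaitUntil = (typeof WAIT_UNTIL_VALUES)[number];

type LogLevel = "quiet" | "normal" | "verbose";

interface DebugOptions {
  headless: boolean;
  storageState?: string;
  logLevel: LogLevel;
  waitUntil?: WaitUntil;
}

function isWaitUntil(value: string): value is WaitUntil {
  return (WAIT_UNTIL_VALUES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical parser for the debug flags; other arguments are ignored.
// ASSUMPTION: --wait-until overrides PLAYWRIGHT_WAIT_UNTIL.
function parseDebugArgs(argv: string[], env: Record<string, string | undefined> = {}): DebugOptions {
  const opts: DebugOptions = { headless: true, logLevel: "normal" };
  let waitUntil = env.PLAYWRIGHT_WAIT_UNTIL;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--no-headless") opts.headless = false;
    else if (arg === "--verbose") opts.logLevel = "verbose";
    else if (arg === "--quiet") opts.logLevel = "quiet";
    else if (arg === "--storage-state") opts.storageState = argv[++i]; // consumes the next argument
    else if (arg === "--wait-until") waitUntil = argv[++i];
  }
  if (waitUntil !== undefined) {
    if (!isWaitUntil(waitUntil)) {
      throw new Error(`invalid --wait-until value: ${waitUntil}`);
    }
    opts.waitUntil = waitUntil;
  }
  return opts;
}
```

Validating the value up front turns a typo into a clear error instead of a failed navigation.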
The server can only run on the client's local machine because it depends on local resources.