# Instagram MCP Investigator
Node.js MCP server that automates a Chromium session with Playwright, scrapes a target Instagram profile (using a saved login state), and feeds the results to OpenAI for an annotated report.
## Prerequisites
- Node.js 20+
- Installed Playwright browsers (`npx playwright install chromium`)
- A valid Instagram session captured in `storageState.json`
- OpenAI API key (optional but recommended)
## Setup
1. Install dependencies:
```bash
npm install
```
2. Install the Chromium runtime that Playwright drives:
```bash
npx playwright install chromium
```
3. Capture a logged-in Instagram session (opens a non-headless browser):
```bash
npm run login
```
- Log in manually in the launched window.
- Return to the terminal and press Enter to persist `storageState.json`.
4. Copy `.env.example` to `.env` and fill in the values you need. Set `OPENAI_API_KEY` to enable the OpenAI summary in the report.
## Running the MCP server
The server communicates over stdio, so it can be registered with any MCP-compatible client.
```bash
npm start
```
Environment variables control scraping and summarisation behaviour:
- `PLAYWRIGHT_STORAGE_STATE`: path to the saved session (defaults to `./storageState.json`).
- `PLAYWRIGHT_HEADLESS`: set to `false` to watch the browser.
- `PLAYWRIGHT_BROWSER_CHANNEL`: set to `chrome` if you want to run the system Chrome instead of bundled Chromium.
- `MAX_POSTS`: default number of recent posts to fetch.
- `OPENAI_MODEL`: overrides the model used for summarisation.
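As a rough illustration of how these variables could be turned into runtime options, here is a minimal sketch. The helper name `readConfig`, the fallback of 12 posts, and the rule that only the literal string `false` disables headless mode are assumptions made for this example, not taken from the server's source.

```javascript
// Sketch only: `readConfig` and its fallbacks are illustrative assumptions.
function readConfig(env = process.env) {
  const maxPosts = Number.parseInt(env.MAX_POSTS ?? "", 10);
  return {
    // Documented default path for the saved session.
    storageStatePath: env.PLAYWRIGHT_STORAGE_STATE || "./storageState.json",
    // Assumption: anything other than the literal string "false" keeps headless mode on.
    headless: env.PLAYWRIGHT_HEADLESS !== "false",
    // undefined means "use the bundled Chromium".
    channel: env.PLAYWRIGHT_BROWSER_CHANNEL || undefined,
    // Assumption: fall back to 12 posts when MAX_POSTS is unset or not a positive integer.
    maxPosts: Number.isInteger(maxPosts) && maxPosts > 0 ? maxPosts : 12,
    // undefined leaves the choice of model to the summarisation step.
    model: env.OPENAI_MODEL || undefined,
  };
}

console.log(readConfig({ PLAYWRIGHT_HEADLESS: "false", MAX_POSTS: "6" }));
```

Passing the environment in as an argument keeps the function easy to test without mutating `process.env`.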
## MCP tool contract
The server exposes a single tool named `instagram_profile_report`.
Example input:
```json
{
  "username": "tanaka_insta",
  "maxPosts": 12,
  "headless": false,
  "storageStatePath": "./storageState.json",
  "includeRaw": true,
  "model": "gpt-4o-mini"
}
```
- `username` (required): Instagram handle with or without the leading `@`.
- `maxPosts` (optional): maximum number of recent posts to inspect (capped at 50).
- `headless` (optional): override headless mode per invocation.
- `storageStatePath` (optional): use an alternate saved login state.
- `includeRaw` (optional): append the raw JSON payload to the textual response.
- `model` (optional): set a different OpenAI chat model.
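The parameter rules above could be enforced with a small normalisation step before scraping starts. The sketch below is hypothetical: `normalizeInput` is an illustrative name, and clamping out-of-range `maxPosts` values (rather than rejecting them) is an assumption about how the cap might be applied.

```javascript
// Sketch only: `normalizeInput` is a hypothetical helper, not part of the published tool.
const MAX_POSTS_LIMIT = 50; // documented cap on maxPosts

function normalizeInput(input, defaults = { maxPosts: 12 }) {
  if (typeof input?.username !== "string") {
    throw new TypeError("username is required");
  }
  // Accept the handle with or without the leading "@".
  const username = input.username.trim().replace(/^@/, "");
  if (username === "") {
    throw new TypeError("username is required");
  }
  // Assumption: non-integer or missing values fall back to the default.
  const requested = Number.isInteger(input.maxPosts) ? input.maxPosts : defaults.maxPosts;
  return {
    ...input,
    username,
    // Assumption: out-of-range values are clamped to 1..50 instead of rejected.
    maxPosts: Math.min(Math.max(requested, 1), MAX_POSTS_LIMIT),
    includeRaw: input.includeRaw === true,
  };
}

console.log(normalizeInput({ username: "@tanaka_insta", maxPosts: 80 }));
```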
## Workflow overview
1. Tool invocation resolves the target profile URL and opens it via Playwright.
2. The script reuses `storageState.json` to stay logged in.
3. Recent post metadata (image URL, alt text, timestamp, caption preview) and profile stats are serialised into JSON.
4. The JSON is sent to OpenAI for a qualitative summary.
5. The MCP tool returns a textual report (and optional raw JSON) to the requesting client.
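Steps 1 and 5 of this workflow can be sketched without a browser. Both helper names below are hypothetical, and returning the raw JSON as a second text item (rather than concatenating it into the report string) is an assumption. The `{ content: [{ type: "text", text }] }` shape follows the MCP tool result format.

```javascript
// Sketch only: both helpers are hypothetical names used for illustration.
function profileUrl(username) {
  // Step 1: accept the handle with or without a leading "@".
  const handle = username.trim().replace(/^@/, "");
  return `https://www.instagram.com/${encodeURIComponent(handle)}/`;
}

function buildToolResult(report, rawData, includeRaw) {
  // Step 5: MCP tool results carry a list of content items; text items use type "text".
  const content = [{ type: "text", text: report }];
  if (includeRaw) {
    // Assumption: the raw payload is appended as its own pretty-printed text item.
    content.push({ type: "text", text: JSON.stringify(rawData, null, 2) });
  }
  return { content };
}

console.log(profileUrl("@tanaka_insta"));
console.log(buildToolResult("Report text", { posts: [] }, true));
```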
## Troubleshooting
- **Redirected to login**: refresh the saved session with `npm run login`.
- **Not enough post data**: verify the profile is public and that the logged-in account has access.
- **No summary returned**: confirm `OPENAI_API_KEY` is set and the specified model is available.
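The "redirected to login" case can be detected programmatically by inspecting the URL the page ends up on (for example, the value returned by Playwright's `page.url()`). This is a minimal sketch; `isLoginRedirect` is a hypothetical helper, and it assumes that an expired session lands on a path under `/accounts/login`.

```javascript
// Sketch only: `isLoginRedirect` is a hypothetical helper.
function isLoginRedirect(currentUrl) {
  const { hostname, pathname } = new URL(currentUrl);
  // Assumption: an expired session is redirected to a path under /accounts/login.
  return hostname.endsWith("instagram.com") && pathname.startsWith("/accounts/login");
}

console.log(isLoginRedirect("https://www.instagram.com/accounts/login/?next=%2Ftanaka_insta%2F"));
console.log(isLoginRedirect("https://www.instagram.com/tanaka_insta/"));
```

A check like this lets a tool return a clear "run `npm run login` again" message instead of an empty report.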
## Next steps
- Add richer scraping (e.g., fetch captions via GraphQL requests) if Instagram layout changes.
- Extend the MCP tool to cache results or stream image thumbnails for downstream automation.
## Local debugging
To verify behaviour before connecting a client, you can call the debug script.
```bash
npm run debug -- tanaka_insta --max-posts 6 --include-raw
```
- Add `--no-headless` to watch the browser.
- Use `--storage-state` to point at a different session file.
- If you do not need the OpenAI summary, you can run without setting `OPENAI_API_KEY` (the summary section shows a notice instead).
- Progress logs are shown by default. Pass `--verbose` for more detail or `--quiet` to suppress them.
- If pages load slowly and time out, passing `--wait-until domcontentloaded` or `--wait-until load` makes runs more stable (also configurable via the `PLAYWRIGHT_WAIT_UNTIL` environment variable).
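For readers curious how flags like these might be parsed, here is a minimal sketch using Node's built-in `util.parseArgs`. The function name `parseDebugArgs`, the shape of the returned object, and the precedence of `--quiet` over `--verbose` are assumptions. The actual debug script may parse its arguments differently.

```javascript
const { parseArgs } = require("node:util");

// Sketch only: `parseDebugArgs` and its return shape are illustrative assumptions.
function parseDebugArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the username is passed as a bare argument
    options: {
      "max-posts": { type: "string" },
      "include-raw": { type: "boolean" },
      "no-headless": { type: "boolean" },
      "storage-state": { type: "string" },
      "wait-until": { type: "string" },
      verbose: { type: "boolean" },
      quiet: { type: "boolean" },
    },
  });
  const maxPosts = values["max-posts"];
  return {
    username: positionals[0],
    maxPosts: maxPosts === undefined ? undefined : Number.parseInt(maxPosts, 10),
    includeRaw: values["include-raw"] === true,
    // undefined means "keep whatever the environment configured".
    headless: values["no-headless"] === true ? false : undefined,
    storageStatePath: values["storage-state"],
    // The flag wins over the PLAYWRIGHT_WAIT_UNTIL environment variable.
    waitUntil: values["wait-until"] ?? process.env.PLAYWRIGHT_WAIT_UNTIL,
    // Assumption: --quiet takes precedence when both flags are given.
    logLevel: values.quiet ? "quiet" : values.verbose ? "verbose" : "normal",
  };
}

console.log(parseDebugArgs(["tanaka_insta", "--max-posts", "6", "--include-raw"]));
```

`parseArgs` runs in strict mode by default, so an unknown flag throws instead of being silently ignored.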