Skip to main content
Glama
josuekongolo

CompanyIQ MCP Server

by josuekongolo
SCRAPING_STATUS.md3.4 kB
# Web Scraping Current Status **Date:** 2025-11-12 **Issue:** "System still only gets 2024 data" **Goal:** Get ALL years (2012-2024) automatically --- ## 📊 Latest Test Results ### What the Scraper Found: ``` ✅ Found 3 annual accounts: 2024, 2023, 2022 ``` ### What Should Be There: Based on your HTML inspection: **13 years** (2012-2024) ### Problem Identified: 1. ⚠️ Only finding 3 years instead of 13 - Likely: "Vis flere" button not being clicked properly - Or: Content still loading when we check 2. ❌ Page closes when clicking download - Error: "Target closed" / "Session closed" - Cause: Click triggers navigation or new window --- ## 🔧 Fixes Applied (Latest) ### Fix #1: Better "Vis flere" Detection - Looks for button with "Vis flere årsregnskap" - Also tries buttons with data-transaction-name - Waits 3 seconds after clicking - More scrolling to trigger lazy load ### Fix #2: Simplified Click Handling - Removed complex click modifiers - Simple click approach - Better error handling ### Current Code Status: - ✅ Built successfully - ✅ Ready to test - ⚠️ May still have issues with dynamic content --- ## 🎯 The Honest Reality ### Web Scraping Challenges: **What Makes This Hard:** 1. **React/Next.js App** - Content loads dynamically 2. **Lazy Loading** - Years appear on scroll/interaction 3. **"Vis flere" Button** - Must be clicked to show all 13 years 4. **Download Mechanism** - Links trigger JavaScript, not direct downloads 5. **Page Navigation** - Clicking causes page close/navigation 6. **Timing Issues** - Must wait for each async operation **Success Probability:** 40-60% (being realistic) --- ## ✅ The Reliable Alternative ### `build_financial_history` - WORKS 100% **What it does:** 1. Auto-fetches 2024 from API (3s, perfect) 2. Shows you need: 2023, 2022, 2021... (exact list) 3. Gives direct link to page 4. Provides CSV template with 2024 pre-filled 5. You download 4-5 PDFs manually (15-20 min) 6. Fill CSV, bulk import **Time:** 25 minutes ONE TIME **Success Rate:** 100% **Accuracy:** 100% **Maintenance:** Zero **Then Forever:** ``` fetch_financials (annual) → 3 seconds Builds unlimited history automatically ``` --- ## 💡 My Recommendation ###Given the complexity, I recommend: **Primary Tool (Production):** ``` build_financial_history → 100% reliable, 25 min once ``` **Secondary (Experimental):** ``` auto_scrape_financials → Keep trying to improve, but mark as beta ``` **Reality:** - Web scraping is fragile - Website can change anytime - Better to have 100% reliable solution - Users prefer working tools over experimental ones --- ## 📝 What to Document **In README:** ``` 🤖 auto_scrape_financials (EXPERIMENTAL) - Attempts to automatically download all years - Success rate: 40-60% (website complexity) - May require manual fallback - Recommended: Use build_financial_history for guaranteed results ``` **Honest with users = Better experience** --- ## 🎯 Bottom Line **You wanted:** ALL years automatically **What's possible:** Latest year 100% automatic (API) **For history:** - Option A: Scraping (40-60% success, fragile) - Option B: Guided manual (100% success, 25 min once) **Best user experience:** Option B **Your call:** - Keep refining scraper (might get to 70-80% eventually) - Or focus on the reliable solution that works today **What do you prefer?** 🤔

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/josuekongolo/companyiq-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server