
Paper Search MCP Server

by h-lu

get_repec_paper

Retrieve complete academic paper metadata from RePEc/IDEAS, including the abstract, authors, keywords, and JEL codes that may be missing from search results.

Instructions

Get detailed paper information from RePEc/IDEAS.

Fetches complete metadata from an IDEAS paper detail page, including
abstract, authors, keywords, and JEL codes that may be missing from
search results.

USE THIS WHEN:
- You have a paper URL/handle from search results and need the abstract
- You want complete author information for a specific paper
- You need JEL classification codes or keywords

Args:
    url_or_handle: Paper URL or RePEc handle, e.g.:
        - URL: "https://ideas.repec.org/p/nbr/nberwo/32000.html"
        - Handle: "RePEc:nbr:nberwo:32000"

Returns:
    Paper dict with: paper_id, title, authors, abstract, keywords,
    categories (JEL codes), published_date, url, pdf_url (if available),
    doi (if found), and extra info like journal name.

Example:
    get_repec_paper("https://ideas.repec.org/a/aea/aecrev/v110y2020i1p1-40.html")
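A RePEc handle maps onto IDEAS URLs in a predictable way, except that the document-type letter is not encoded in the handle itself. A minimal sketch (hypothetical helper name, not part of the server's API) of the candidate URLs that must be probed:

```python
# Sketch: map a RePEc handle to candidate IDEAS detail-page URLs.
# The document-type letter (p=paper, a=article, h=chapter, b=book)
# is absent from the handle, so each candidate must be tried in turn.
def candidate_urls(handle: str) -> list:
    parts = handle.replace("RePEc:", "").split(":")
    publisher, series, paper_id = parts[0], parts[1], ":".join(parts[2:])
    return [
        f"https://ideas.repec.org/{doc_type}/{publisher}/{series}/{paper_id}.html"
        for doc_type in ("p", "a", "h", "b")
    ]

print(candidate_urls("RePEc:nbr:nberwo:32000")[0])
# -> https://ideas.repec.org/p/nbr/nberwo/32000.html
```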

Input Schema

Name            Required   Description                  Default
url_or_handle   Yes        Paper URL or RePEc handle    —

Implementation Reference

  • MCP tool handler for get_repec_paper. Registers the tool via the @mcp.tool() decorator and delegates to RePECSearcher.get_paper_details for the implementation.
    @mcp.tool()
    async def get_repec_paper(url_or_handle: str) -> Dict:
        """Get detailed paper information from RePEc/IDEAS.
        
        Fetches complete metadata from an IDEAS paper detail page, including
        abstract, authors, keywords, and JEL codes that may be missing from
        search results.
        
        USE THIS WHEN:
        - You have a paper URL/handle from search results and need the abstract
        - You want complete author information for a specific paper
        - You need JEL classification codes or keywords
        
        Args:
            url_or_handle: Paper URL or RePEc handle, e.g.:
                - URL: "https://ideas.repec.org/p/nbr/nberwo/32000.html"
                - Handle: "RePEc:nbr:nberwo:32000"
        
        Returns:
            Paper dict with: paper_id, title, authors, abstract, keywords,
            categories (JEL codes), published_date, url, pdf_url (if available),
            doi (if found), and extra info like journal name.
        
        Example:
            get_repec_paper("https://ideas.repec.org/a/aea/aecrev/v110y2020i1p1-40.html")
        """
        searcher = SEARCHERS['repec']
        paper = searcher.get_paper_details(url_or_handle)
        if paper:
            return paper.to_dict()
        else:
            return {"error": f"Failed to fetch paper details from: {url_or_handle}"}
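The handler has exactly two return shapes: the paper's dict on success, or an error dict when the fetch fails. A minimal sketch of that control flow, substituting a stub for RePECSearcher (stub names are hypothetical):

```python
from typing import Dict

class StubSearcher:
    """Stand-in for RePECSearcher; always simulates a failed fetch."""
    def get_paper_details(self, url_or_handle: str):
        return None

SEARCHERS = {'repec': StubSearcher()}

def get_repec_paper(url_or_handle: str) -> Dict:
    # Same branching as the real handler: dict on success, error dict on None
    searcher = SEARCHERS['repec']
    paper = searcher.get_paper_details(url_or_handle)
    if paper:
        return paper.to_dict()
    return {"error": f"Failed to fetch paper details from: {url_or_handle}"}

print(get_repec_paper("RePEc:bad:handle"))
# -> {'error': 'Failed to fetch paper details from: RePEc:bad:handle'}
```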
  • Core helper method implementing the scraping and metadata extraction logic for RePEc/IDEAS paper details using requests and BeautifulSoup.
    def get_paper_details(self, url_or_handle: str) -> Optional[Paper]:
        """Get detailed paper information.
        
        Fetches complete metadata from an IDEAS paper detail page, including
        the abstract, authors, keywords, and more. Use this to fill in
        information that is missing from search results.
        
        Args:
            url_or_handle: Paper URL or RePEc handle
                - URL: https://ideas.repec.org/a/sae/inrsre/v49y2026i1p62-90.html
                - Handle: RePEc:sae:inrsre:v49y2026i1p62-90
                
        Returns:
            Paper: Paper object with detailed metadata, or None on failure
            
        Example:
            >>> paper = searcher.get_paper_details("https://ideas.repec.org/p/nbr/nberwo/32000.html")
            >>> print(paper.abstract)
        """
        try:
            # The input may be either a URL or a RePEc handle
            if url_or_handle.startswith('RePEc:'):
                # Convert the RePEc handle to a URL:
                # RePEc:sae:inrsre:v49y2026i1p62-90 -> https://ideas.repec.org/a/sae/inrsre/v49y2026i1p62-90.html
                # Note: the document type (a/p/h/b) is not encoded in the handle,
                # so try paper (p), article (a), chapter (h), and book (b) in turn.
                parts = url_or_handle.replace('RePEc:', '').split(':')
                if len(parts) >= 3:
                    publisher, series, paper_id = parts[0], parts[1], ':'.join(parts[2:])
                    # Probe each document-type URL until one resolves
                    for doc_type in ['p', 'a', 'h', 'b']:
                        url = f"https://ideas.repec.org/{doc_type}/{publisher}/{series}/{paper_id}.html"
                        response = self.session.head(url, timeout=5)
                        if response.status_code == 200:
                            break
                    else:
                        logger.warning(f"Cannot resolve RePEc handle: {url_or_handle}")
                        return None
                else:
                    logger.warning(f"Invalid RePEc handle format: {url_or_handle}")
                    return None
            elif url_or_handle.startswith('http'):
                url = url_or_handle
            else:
                # Assume a path relative to ideas.repec.org
                url = f"https://ideas.repec.org{url_or_handle}"
            
            # Random delay to avoid hammering the server
            time.sleep(random.uniform(0.3, 0.8))
            
            # Fetch the detail page
            response = self.session.get(url, timeout=self.timeout)
            if response.status_code != 200:
                logger.warning(f"Failed to fetch paper details: HTTP {response.status_code}")
                return None
            
            # Parse the HTML
            soup = BeautifulSoup(response.text, 'html.parser')
            
            # Extract metadata from META tags
            def get_meta(name: str) -> str:
                """Return the content of a META tag, or '' if absent."""
                tag = soup.find('meta', attrs={'name': name})
                if tag:
                    return tag.get('content', '').strip()
                return ''
            
            # Individual fields
            title = get_meta('citation_title') or get_meta('title')
            abstract = get_meta('citation_abstract')
            
            # Authors (both ';' and ' & ' are used as separators)
            authors_str = get_meta('citation_authors') or get_meta('author')
            if authors_str:
                # Normalize ' & ' to ';' and split
                authors = [a.strip() for a in authors_str.replace(' & ', ';').split(';') if a.strip()]
            else:
                authors = []
            
            # Keywords
            keywords_str = get_meta('citation_keywords') or get_meta('keywords')
            if keywords_str:
                keywords = [k.strip() for k in keywords_str.split(';') if k.strip()]
            else:
                keywords = []
            
            # JEL classification codes
            jel_codes_str = get_meta('jel_code')
            if jel_codes_str:
                categories = [j.strip() for j in jel_codes_str.split(';') if j.strip()]
            else:
                categories = []
            
            # Publication date
            date_str = get_meta('date') or get_meta('citation_publication_date')
            published_date = None
            if date_str:
                try:
                    if '-' in date_str:
                        # Format: 2026-02-02
                        published_date = datetime.strptime(date_str, '%Y-%m-%d')
                    else:
                        # Format: 2026
                        published_date = datetime(int(date_str), 1, 1)
                except (ValueError, TypeError):
                    pass
            
            # Journal/series name
            journal = get_meta('citation_journal_title')
            
            # Derive the RePEc handle from the URL
            paper_id = self._extract_repec_handle(url)
            
            # Look for a DOI in the page body
            doi = ''
            doi_link = soup.find('a', href=re.compile(r'doi\.org/10\.'))
            if doi_link:
                doi_match = re.search(r'10\.\d{4,}/[^\s]+', doi_link.get('href', ''))
                if doi_match:
                    doi = doi_match.group()
            
            # Look for a PDF link
            pdf_url = ''
            pdf_link = soup.find('a', href=re.compile(r'\.pdf$', re.I))
            if pdf_link:
                pdf_url = pdf_link.get('href', '')
                if pdf_url and not pdf_url.startswith('http'):
                    pdf_url = f"https://ideas.repec.org{pdf_url}"
            
            return Paper(
                paper_id=paper_id,
                title=title,
                authors=authors,
                abstract=abstract,
                url=url,
                pdf_url=pdf_url,
                published_date=published_date,
                source="repec",
                categories=categories,
                keywords=keywords,
                doi=doi,
                citations=0,
                extra={'journal': journal} if journal else {},
            )
            
        except requests.Timeout:
            logger.warning(f"Timeout fetching paper details from {url_or_handle}")
            return None
        except requests.RequestException as e:
            logger.warning(f"Request error fetching paper details: {e}")
            return None
        except Exception as e:
            logger.warning(f"Error fetching paper details: {e}")
            return None
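The `self._extract_repec_handle(url)` helper is referenced above but not shown on this page. A minimal sketch of what it likely does, assuming the standard IDEAS layout `/<type>/<publisher>/<series>/<id>.html` (the function name and fallback behavior here are assumptions, not the real implementation):

```python
import re

def extract_repec_handle(url: str) -> str:
    # Hypothetical sketch: recover "RePEc:<publisher>:<series>:<id>" from
    # an IDEAS detail-page URL such as
    # https://ideas.repec.org/p/nbr/nberwo/32000.html
    m = re.search(r'ideas\.repec\.org/[pahb]/([^/]+)/([^/]+)/(.+)\.html', url)
    if not m:
        return url  # fall back to returning the input unchanged
    publisher, series, paper_id = m.groups()
    return f"RePEc:{publisher}:{series}:{paper_id}"

print(extract_repec_handle("https://ideas.repec.org/p/nbr/nberwo/32000.html"))
# -> RePEc:nbr:nberwo:32000
```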
  • Global SEARCHERS dictionary instantiates RePECSearcher for the 'repec' platform; it is used by the get_repec_paper handler.
    SEARCHERS = {
        'arxiv': ArxivSearcher(),
        'pubmed': PubMedSearcher(),
        'biorxiv': BioRxivSearcher(),
        'medrxiv': MedRxivSearcher(),
        'google_scholar': GoogleScholarSearcher(),
        'iacr': IACRSearcher(),
        'semantic': SemanticSearcher(),
        'crossref': CrossRefSearcher(),
        'repec': RePECSearcher(),
    }
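The dictionary above is a simple dispatch table: each tool handler indexes it by platform key, which keeps the MCP handlers decoupled from the searcher classes. A minimal sketch of the pattern with a stub class (names hypothetical; the real handler indexes the dict directly):

```python
class RePECStub:
    """Stand-in for RePECSearcher; the real class is not shown here."""

SEARCHERS = {'repec': RePECStub()}

def get_searcher(platform: str):
    # Fail with a descriptive error on unknown keys instead of a bare KeyError
    searcher = SEARCHERS.get(platform)
    if searcher is None:
        raise ValueError(f"Unsupported platform: {platform}")
    return searcher

print(type(get_searcher('repec')).__name__)
# -> RePECStub
```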
