get_repec_paper
Retrieve complete academic paper metadata from RePEc/IDEAS, including abstracts, authors, keywords, and JEL codes not available in search results.
Instructions
Get detailed paper information from RePEc/IDEAS.
Fetches complete metadata from an IDEAS paper detail page, including
abstract, authors, keywords, and JEL codes that may be missing from
search results.
USE THIS WHEN:
- You have a paper URL/handle from search results and need the abstract
- You want complete author information for a specific paper
- You need JEL classification codes or keywords
Args:
url_or_handle: Paper URL or RePEc handle, e.g.:
- URL: "https://ideas.repec.org/p/nbr/nberwo/32000.html"
- Handle: "RePEc:nbr:nberwo:32000"
Returns:
Paper dict with: paper_id, title, authors, abstract, keywords,
categories (JEL codes), published_date, url, pdf_url (if available),
doi (if found), and extra info like journal name.
Example:
get_repec_paper("https://ideas.repec.org/a/aea/aecrev/v110y2020i1p1-40.html")
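The Returns description above can be illustrated with a sketch of a successful result. The field names come from the tool's docstring; every value is a hypothetical placeholder, not real paper metadata:

```python
# Illustrative shape of a successful get_repec_paper result.
# Field names follow the Returns section; all values are placeholders.
example_result = {
    "paper_id": "RePEc:nbr:nberwo:32000",    # RePEc handle
    "title": "An Example Working Paper",      # placeholder
    "authors": ["Author One", "Author Two"],  # placeholder
    "abstract": "A short abstract...",        # placeholder
    "keywords": ["growth", "productivity"],   # placeholder
    "categories": ["E22", "O47"],             # JEL codes (placeholder)
    "published_date": "2024-01-01",           # placeholder
    "url": "https://ideas.repec.org/p/nbr/nberwo/32000.html",
    "pdf_url": "https://example.org/paper.pdf",  # placeholder; "" if unavailable
    "doi": "",                                # "" when no DOI is found
    "extra": {"journal": ""},                 # e.g. journal name, when present
}

# Keys listed in the Returns section that a caller can rely on.
required_keys = {"paper_id", "title", "authors", "abstract", "keywords",
                 "categories", "published_date", "url", "pdf_url", "doi"}
```

On failure the handler instead returns `{"error": "Failed to fetch paper details from: ..."}`, so callers should check for the `error` key before reading paper fields.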
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url_or_handle | Yes | Paper URL or RePEc handle, e.g. `https://ideas.repec.org/p/nbr/nberwo/32000.html` or `RePEc:nbr:nberwo:32000` | — |
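When the input is a RePEc handle rather than a URL, the implementation below expands it into candidate IDEAS URLs, one per document type, and probes each one. A minimal standalone sketch of that expansion (`candidate_urls` is a hypothetical helper name; the p/a/h/b type list mirrors the one used in `get_paper_details`):

```python
def candidate_urls(handle: str) -> list:
    """Expand a RePEc handle into candidate IDEAS detail-page URLs.

    IDEAS URLs embed a document type (p=paper, a=article, h=chapter,
    b=book) that the handle itself does not carry, so each type must
    be tried in turn.
    """
    if not handle.startswith('RePEc:'):
        raise ValueError(f"Not a RePEc handle: {handle}")
    parts = handle.replace('RePEc:', '').split(':')
    if len(parts) < 3:
        raise ValueError(f"Invalid RePEc handle format: {handle}")
    # Paper IDs may themselves contain ':', so rejoin the tail.
    publisher, series, paper_id = parts[0], parts[1], ':'.join(parts[2:])
    return [
        f"https://ideas.repec.org/{t}/{publisher}/{series}/{paper_id}.html"
        for t in ('p', 'a', 'h', 'b')
    ]

urls = candidate_urls("RePEc:nbr:nberwo:32000")
```

The real implementation issues a `HEAD` request against each candidate and stops at the first `200` response.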
Implementation Reference
- `paper_find_mcp/server.py:856-888` (handler): MCP tool handler for `get_repec_paper`. Registers the tool via the `@mcp.tool()` decorator and delegates to `RePECSearcher.get_paper_details` for the implementation.

```python
@mcp.tool()
async def get_repec_paper(url_or_handle: str) -> Dict:
    """Get detailed paper information from RePEc/IDEAS.

    Fetches complete metadata from an IDEAS paper detail page, including
    abstract, authors, keywords, and JEL codes that may be missing from
    search results.

    USE THIS WHEN:
    - You have a paper URL/handle from search results and need the abstract
    - You want complete author information for a specific paper
    - You need JEL classification codes or keywords

    Args:
        url_or_handle: Paper URL or RePEc handle, e.g.:
            - URL: "https://ideas.repec.org/p/nbr/nberwo/32000.html"
            - Handle: "RePEc:nbr:nberwo:32000"

    Returns:
        Paper dict with: paper_id, title, authors, abstract, keywords,
        categories (JEL codes), published_date, url, pdf_url (if available),
        doi (if found), and extra info like journal name.

    Example:
        get_repec_paper("https://ideas.repec.org/a/aea/aecrev/v110y2020i1p1-40.html")
    """
    searcher = SEARCHERS['repec']
    paper = searcher.get_paper_details(url_or_handle)
    if paper:
        return paper.to_dict()
    else:
        return {"error": f"Failed to fetch paper details from: {url_or_handle}"}
```
- Core helper method implementing the scraping and metadata extraction logic for RePEc/IDEAS paper details using requests and BeautifulSoup.

```python
def get_paper_details(self, url_or_handle: str) -> Optional[Paper]:
    """Fetch detailed paper information.

    Retrieves complete metadata from an IDEAS paper detail page, including
    the abstract, authors, keywords, etc. Use this method to fill in
    fields missing from search results.

    Args:
        url_or_handle: Paper URL or RePEc handle
            - URL: https://ideas.repec.org/a/sae/inrsre/v49y2026i1p62-90.html
            - Handle: RePEc:sae:inrsre:v49y2026i1p62-90

    Returns:
        Paper: a Paper object with detailed information, or None on failure

    Example:
        >>> paper = searcher.get_paper_details("https://ideas.repec.org/p/nbr/nberwo/32000.html")
        >>> print(paper.abstract)
    """
    try:
        # Handle the input: it may be a URL or a RePEc handle
        if url_or_handle.startswith('RePEc:'):
            # Convert the RePEc handle into a URL:
            # RePEc:sae:inrsre:v49y2026i1p62-90 -> https://ideas.repec.org/a/sae/inrsre/v49y2026i1p62-90.html
            # Note: the document type (a/p/h/b) must be guessed; try paper (p) and article (a) first
            parts = url_or_handle.replace('RePEc:', '').split(':')
            if len(parts) >= 3:
                publisher, series, paper_id = parts[0], parts[1], ':'.join(parts[2:])
                # Try a URL for each document type
                for doc_type in ['p', 'a', 'h', 'b']:
                    url = f"https://ideas.repec.org/{doc_type}/{publisher}/{series}/{paper_id}.html"
                    response = self.session.head(url, timeout=5)
                    if response.status_code == 200:
                        break
                else:
                    logger.warning(f"Cannot resolve RePEc handle: {url_or_handle}")
                    return None
            else:
                logger.warning(f"Invalid RePEc handle format: {url_or_handle}")
                return None
        elif url_or_handle.startswith('http'):
            url = url_or_handle
        else:
            # Assume a relative path
            url = f"https://ideas.repec.org{url_or_handle}"

        # Random delay between requests
        time.sleep(random.uniform(0.3, 0.8))

        # Request the page
        response = self.session.get(url, timeout=self.timeout)
        if response.status_code != 200:
            logger.warning(f"Failed to fetch paper details: HTTP {response.status_code}")
            return None

        # Parse the HTML
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract information from META tags
        def get_meta(name: str) -> str:
            """Return the content of a META tag."""
            tag = soup.find('meta', attrs={'name': name})
            if tag:
                return tag.get('content', '').strip()
            return ''

        # Extract the individual fields
        title = get_meta('citation_title') or get_meta('title')
        abstract = get_meta('citation_abstract')

        # Authors (supports both ; and & as separators)
        authors_str = get_meta('citation_authors') or get_meta('author')
        if authors_str:
            # Replace & with ; then split
            authors = [a.strip() for a in authors_str.replace(' & ', ';').split(';') if a.strip()]
        else:
            authors = []

        # Keywords
        keywords_str = get_meta('citation_keywords') or get_meta('keywords')
        if keywords_str:
            keywords = [k.strip() for k in keywords_str.split(';') if k.strip()]
        else:
            keywords = []

        # JEL classification codes
        jel_codes_str = get_meta('jel_code')
        if jel_codes_str:
            categories = [j.strip() for j in jel_codes_str.split(';') if j.strip()]
        else:
            categories = []

        # Date
        date_str = get_meta('date') or get_meta('citation_publication_date')
        published_date = None
        if date_str:
            try:
                if '-' in date_str:
                    # Format: 2026-02-02
                    published_date = datetime.strptime(date_str, '%Y-%m-%d')
                else:
                    # Format: 2026
                    published_date = datetime(int(date_str), 1, 1)
            except (ValueError, TypeError):
                pass

        # Journal/series name
        journal = get_meta('citation_journal_title')

        # Extract the RePEc handle
        paper_id = self._extract_repec_handle(url)

        # Try to find a DOI (from the page content)
        doi = ''
        doi_link = soup.find('a', href=re.compile(r'doi\.org/10\.'))
        if doi_link:
            doi_match = re.search(r'10\.\d{4,}/[^\s]+', doi_link.get('href', ''))
            if doi_match:
                doi = doi_match.group()

        # Try to find a PDF link
        pdf_url = ''
        pdf_link = soup.find('a', href=re.compile(r'\.pdf$', re.I))
        if pdf_link:
            pdf_url = pdf_link.get('href', '')
            if pdf_url and not pdf_url.startswith('http'):
                pdf_url = f"https://ideas.repec.org{pdf_url}"

        return Paper(
            paper_id=paper_id,
            title=title,
            authors=authors,
            abstract=abstract,
            url=url,
            pdf_url=pdf_url,
            published_date=published_date,
            source="repec",
            categories=categories,
            keywords=keywords,
            doi=doi,
            citations=0,
            extra={'journal': journal} if journal else {},
        )

    except requests.Timeout:
        logger.warning(f"Timeout fetching paper details from {url_or_handle}")
        return None
    except requests.RequestException as e:
        logger.warning(f"Request error fetching paper details: {e}")
        return None
    except Exception as e:
        logger.warning(f"Error fetching paper details: {e}")
        return None
```
- `paper_find_mcp/server.py:75-85` (registration): Global `SEARCHERS` dictionary instantiates `RePECSearcher` for the `'repec'` platform, used by the `get_repec_paper` handler.

```python
SEARCHERS = {
    'arxiv': ArxivSearcher(),
    'pubmed': PubMedSearcher(),
    'biorxiv': BioRxivSearcher(),
    'medrxiv': MedRxivSearcher(),
    'google_scholar': GoogleScholarSearcher(),
    'iacr': IACRSearcher(),
    'semantic': SemanticSearcher(),
    'crossref': CrossRefSearcher(),
    'repec': RePECSearcher(),
}
```
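The scraper above reads its metadata from citation META tags on the IDEAS detail page. A dependency-free sketch of the same extraction, using only the stdlib `html.parser` instead of BeautifulSoup (the tag names are those the implementation reads; the HTML snippet and its values are placeholders, not real paper metadata):

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            d = dict(attrs)
            if 'name' in d and 'content' in d:
                self.meta[d['name']] = d['content'].strip()


# A trimmed stand-in for an IDEAS detail page (placeholder values).
html = """
<head>
  <meta name="citation_title" content="An Example Paper">
  <meta name="citation_authors" content="Jane Doe &amp; John Smith">
  <meta name="citation_keywords" content="growth; productivity">
</head>
"""

collector = MetaCollector()
collector.feed(html)

title = collector.meta.get('citation_title', '')
# Authors: normalize ' & ' to ';' then split, as get_paper_details does
authors_str = collector.meta.get('citation_authors', '')
authors = [a.strip() for a in authors_str.replace(' & ', ';').split(';') if a.strip()]
keywords = [k.strip() for k in collector.meta.get('citation_keywords', '').split(';') if k.strip()]
```

The real implementation layers fallbacks on top of this (`citation_title` falling back to `title`, `citation_authors` to `author`, and so on) and converts the collected strings into a `Paper` object.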