search_citeseerx
Search academic papers from the CiteSeerX digital library to find relevant research publications for your query.
Instructions
Search academic papers from CiteSeerX digital library.
Args:
- query: Search query string (e.g., 'machine learning').
- max_results: Maximum number of papers to return (default: 10).

Returns: List of paper metadata in dictionary format.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search query string (e.g., 'machine learning'). | |
| max_results | No | Maximum number of papers to return. | 10 |
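
As a rough illustration of the schema above, an MCP client would typically send a JSON-RPC `tools/call` request whose `arguments` object carries these two fields. The sketch below only builds that payload; the helper name `build_search_request` and the `request_id` value are hypothetical and not part of paper_search_mcp. Only `query` and `max_results` come from the schema.

```python
import json
from typing import Any, Dict


def build_search_request(
    query: str, max_results: int = 10, request_id: int = 1
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` payload for search_citeseerx.

    Hypothetical helper for illustration; not part of paper_search_mcp.
    """
    if not query:
        # `query` is the only required field in the input schema.
        raise ValueError("query is required by the input schema")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_citeseerx",
            "arguments": {"query": query, "max_results": max_results},
        },
    }


if __name__ == "__main__":
    # Print the payload a client might send for a five-result search.
    print(json.dumps(build_search_request("machine learning", max_results=5), indent=2))
```

Omitting `max_results` falls back to the documented default of 10, so a minimal request needs only `query`.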
Implementation Reference
- paper_search_mcp/server.py:947-957 (handler): Tool definition and handler for 'search_citeseerx'. It uses 'async_search' with 'citeseerx_searcher'.

      async def search_citeseerx(query: str, max_results: int = 10) -> List[Dict]:
          """Search academic papers from CiteSeerX digital library.

          Args:
              query: Search query string (e.g., 'machine learning').
              max_results: Maximum number of papers to return (default: 10).
          Returns:
              List of paper metadata in dictionary format.
          """
          papers = await async_search(citeseerx_searcher, query, max_results)
          return papers if papers else []

- The actual implementation of the CiteSeerX search logic, within the 'CiteSeerXSearcher' class.

      def search(self, query: str, max_results: int = 10, **kwargs) -> List[Paper]:
          """
          Search CiteSeerX for computer science papers.

          Args:
              query: Search query string
              max_results: Maximum results to return (default: 10)
              **kwargs: Additional parameters:
                  - year: Filter by publication year
                  - author: Filter by author name
                  - venue: Filter by conference/journal venue
                  - min_citations: Minimum citation count
                  - sort: Sort by 'relevance', 'date', 'citations'

          Returns:
              List of Paper objects
          """
          papers = []

          try:
              # Prepare parameters for CiteSeerX API
              params = {
                  'q': query,
                  'max': min(max_results, 100),  # CiteSeerX default max
                  'start': 0,
                  'sort': kwargs.get('sort', 'relevance')
              }

              # Add filters
              if 'year' in kwargs:
                  year = kwargs['year']
                  if isinstance(year, str) and '-' in year:
                      # Handle year range
                      year_range = year.split('-')
                      if len(year_range) == 2:
                          params['year'] = f"{year_range[0]}-{year_range[1]}"
                  else:
                      params['year'] = str(year)

              if 'author' in kwargs:
                  params['author'] = kwargs['author']

              if 'venue' in kwargs:
                  params['venue'] = kwargs['venue']

              if 'min_citations' in kwargs:
                  params['minCitations'] = kwargs['min_citations']

              logger.debug(f"Searching CiteSeerX with params: {params}")

              response = self._get(self.SEARCH_API, params=params)
              response.raise_for_status()

              data = response.json()

              # CiteSeerX API returns results in 'result' field
              results = data.get('result', {}).get('hits', {}).get('hit', [])

              # Handle single result (API returns dict instead of list for single result)
              if isinstance(results, dict):
                  results = [results]

              for result in results:
                  try:
                      paper = self._parse_citeseerx_result(result)
                      if paper:
                          papers.append(paper)
                          if len(papers) >= max_results:
                              break
                  except Exception as e:
                      logger.warning(f"Error parsing CiteSeerX result: {e}")
                      continue

              logger.info(f"Found {len(papers)} papers from CiteSeerX for query: {query}")

          except requests.RequestException as e:
              logger.error(f"CiteSeerX API request error: {e}")
              if hasattr(e, 'response') and e.response is not None:
                  logger.error(f"Response status: {e.response.status_code}")
                  if e.response.status_code == 429:
                      logger.warning("CiteSeerX rate limit exceeded")
          except json.JSONDecodeError as e:
              logger.error(f"Failed to parse CiteSeerX JSON response: {e}")
          except Exception as e:
              logger.error(f"Unexpected error in CiteSeerX search: {e}")

          return papers

- paper_search_mcp/server.py:303-303 (registration): Task registration mapping 'search_citeseerx' in the server.

      task_map[source] = search_citeseerx(query, max_results_per_source)
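
Two steps of the searcher shown in this reference are easy to get wrong: building the query parameters from the optional filters, and normalizing a response that holds a single hit as a dict rather than a list. The sketch below restates both as pure functions so they can be checked without network access. The function names `build_params` and `extract_hits` are hypothetical, and the placement of the `else` branch in the year handling reflects one reading of the flattened excerpt, not a confirmed detail of the library.

```python
from typing import Any, Dict, List


def build_params(query: str, max_results: int = 10, **kwargs: Any) -> Dict[str, Any]:
    """Restate the searcher's parameter-building step as a pure function.

    Hypothetical helper; parameter names follow the excerpt in this reference.
    """
    params: Dict[str, Any] = {
        "q": query,
        "max": min(max_results, 100),  # cap of 100, as in the excerpt
        "start": 0,
        "sort": kwargs.get("sort", "relevance"),
    }
    if "year" in kwargs:
        year = kwargs["year"]
        if isinstance(year, str) and "-" in year:
            parts = year.split("-")
            if len(parts) == 2:  # a range such as "2010-2015"
                params["year"] = f"{parts[0]}-{parts[1]}"
        else:
            # Assumed branch placement: single years are stringified.
            params["year"] = str(year)
    if "author" in kwargs:
        params["author"] = kwargs["author"]
    if "venue" in kwargs:
        params["venue"] = kwargs["venue"]
    if "min_citations" in kwargs:
        params["minCitations"] = kwargs["min_citations"]
    return params


def extract_hits(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the list of hits, wrapping a lone dict hit in a list."""
    hits = data.get("result", {}).get("hits", {}).get("hit", [])
    return [hits] if isinstance(hits, dict) else hits
```

Note that under this reading a malformed range such as '2010-2015-2020' sets no year filter at all, and that the registered tool passes only `query` and `max_results`, so the keyword filters are reachable only when the searcher's `search` method is called directly.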