Glama

extract_douyin_text

Extract text content from Douyin video share links using speech recognition to convert audio to readable text.

Instructions

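Extract the text content of a video from a Douyin share link.

Parameters:
- share_link: Douyin share link, or text containing the link
- model: speech recognition model (optional, defaults to paraformer-v2)

Returns:
- The extracted text content

Note: requires the DASHSCOPE_API_KEY environment variable to be set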
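
The environment-variable requirement can be checked before the tool is ever called. The following is a minimal sketch, not the server's own code: `require_api_key` and its `env` parameter are hypothetical names introduced here so the check can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the DashScope API key, or raise if it is missing or blank.

    `env` is a hypothetical parameter added for testability; by default
    the real process environment is used.
    """
    env = os.environ if env is None else env
    api_key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not api_key:
        raise ValueError("DASHSCOPE_API_KEY is not set")
    return api_key


if __name__ == "__main__":
    try:
        require_api_key()
        print("DASHSCOPE_API_KEY is configured")
    except ValueError as exc:
        print(f"Configuration problem: {exc}")
```

Failing fast like this surfaces a configuration problem before any network request is made.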

Input Schema

Name        Required  Description  Default
share_link  Yes
model       No
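
As an illustration of how these two parameters travel over the wire, here is a sketch that builds a JSON-RPC 2.0 `tools/call` request, the message MCP clients send to invoke a tool. The helper name `build_tool_call` and the placeholder share link are assumptions made for this example; a real client library would normally build this message for you.

```python
import json
from typing import Optional


def build_tool_call(share_link: str, model: Optional[str] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for extract_douyin_text."""
    if not share_link or not share_link.strip():
        raise ValueError("share_link is required")
    arguments = {"share_link": share_link}
    # `model` is optional; omit it so the server falls back to its default.
    if model is not None:
        arguments["model"] = model
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract_douyin_text", "arguments": arguments},
    }


if __name__ == "__main__":
    # Placeholder share link; it does not point at a real video.
    print(json.dumps(build_tool_call("https://v.douyin.com/EXAMPLE/"), indent=2))
```

Leaving `model` out of `arguments` rather than sending `null` keeps the request aligned with the schema, where only `share_link` is required.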

Output Schema

Name    Required  Description  Default
result  Yes
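
A client can read the `result` field from the tool response along these lines. This is a sketch under assumptions: it supposes the server returns the MCP `tools/call` result shape with a `structuredContent` object matching the output schema, plus a `content` list of text items as a fallback. The helper name `extract_result_text` and the sample responses are hypothetical.

```python
def extract_result_text(response: dict) -> str:
    """Pull the transcript text out of a JSON-RPC `tools/call` response."""
    result = response.get("result", {})
    # Collect plain-text content items; these also carry error messages.
    texts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    if result.get("isError"):
        raise RuntimeError("tool call failed: " + " ".join(texts))
    # Prefer the structured form that mirrors the output schema.
    structured = result.get("structuredContent")
    if isinstance(structured, dict) and "result" in structured:
        return structured["result"]
    return "\n".join(texts)


if __name__ == "__main__":
    # Made-up response used only to exercise the helper.
    sample = {
        "jsonrpc": "2.0",
        "id": 1,
        "result": {
            "content": [{"type": "text", "text": "hello world"}],
            "structuredContent": {"result": "hello world"},
        },
    }
    print(extract_result_text(sample))
```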

Implementation Reference

  • Main handler function decorated with @mcp.tool() that orchestrates parsing the share link and extracting text from the video using DouyinProcessor class methods.
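    @mcp.tool()
    async def extract_douyin_text(
        share_link: str,
        model: Optional[str] = None,
        ctx: Context = None
    ) -> str:
        """
        Extract the text content of a video from a Douyin share link.

        Parameters:
        - share_link: Douyin share link, or text containing the link
        - model: speech recognition model (optional, defaults to paraformer-v2)

        Returns:
        - The extracted text content

        Note: requires the DASHSCOPE_API_KEY environment variable
        """
        try:
            # Read the API key from the environment
            api_key = os.getenv('DASHSCOPE_API_KEY')
            if not api_key:
                raise ValueError("Environment variable DASHSCOPE_API_KEY is not set; add your Alibaba Cloud Bailian API key to the configuration")

            processor = DouyinProcessor(api_key, model)

            # Parse the share link.
            # Assuming Context is the MCP Python SDK's FastMCP Context, its
            # logging helpers are coroutines and must be awaited.
            await ctx.info("Parsing the Douyin share link...")
            video_info = processor.parse_share_url(share_link)

            # Extract text directly from the video URL
            await ctx.info("Extracting text from the video...")
            text_content = processor.extract_text_from_video_url(video_info['url'])

            await ctx.info("Text extraction complete!")
            return text_content

        except Exception as e:
            await ctx.error(f"An error occurred during processing: {str(e)}")
            # Chain the original exception so the traceback is preserved
            raise Exception(f"Failed to extract Douyin video text: {str(e)}") from e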
  • Helper method in DouyinProcessor class that parses Douyin share URL/text to extract no-watermark video URL, title, and ID by scraping the page and parsing JSON data.
    def parse_share_url(self, share_text: str) -> dict:
        """Extract the no-watermark video link from the share text"""
        # Extract the share link
        urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', share_text)
        if not urls:
            raise ValueError("No valid share link found")

        share_url = urls[0]
        # Follow the short-link redirect; the last path segment is the video ID
        share_response = requests.get(share_url, headers=HEADERS)
        video_id = share_response.url.split("?")[0].strip("/").split("/")[-1]
        share_url = f'https://www.iesdouyin.com/share/video/{video_id}'

        # Fetch the video page content
        response = requests.get(share_url, headers=HEADERS)
        response.raise_for_status()

        pattern = re.compile(
            pattern=r"window\._ROUTER_DATA\s*=\s*(.*?)</script>",
            flags=re.DOTALL,
        )
        find_res = pattern.search(response.text)

        if not find_res or not find_res.group(1):
            raise ValueError("Failed to parse video info from the HTML")

        # Parse the JSON data
        json_data = json.loads(find_res.group(1).strip())
        VIDEO_ID_PAGE_KEY = "video_(id)/page"
        NOTE_ID_PAGE_KEY = "note_(id)/page"

        if VIDEO_ID_PAGE_KEY in json_data["loaderData"]:
            original_video_info = json_data["loaderData"][VIDEO_ID_PAGE_KEY]["videoInfoRes"]
        elif NOTE_ID_PAGE_KEY in json_data["loaderData"]:
            original_video_info = json_data["loaderData"][NOTE_ID_PAGE_KEY]["videoInfoRes"]
        else:
            raise Exception("Unable to parse video or image-set info from the JSON")

        data = original_video_info["item_list"][0]

        # Get the video info; swapping "playwm" for "play" yields the no-watermark URL
        video_url = data["video"]["play_addr"]["url_list"][0].replace("playwm", "play")
        desc = data.get("desc", "").strip() or f"douyin_{video_id}"

        # Replace characters that are illegal in file names
        desc = re.sub(r'[\\/:*?"<>|]', '_', desc)

        return {
            "url": video_url,
            "title": desc,
            "video_id": video_id
        }
  • Helper method in DouyinProcessor that performs speech-to-text extraction directly from the video URL using Dashscope (Aliyun) ASR Transcription API.
    def extract_text_from_video_url(self, video_url: str) -> str:
        """Extract text from a video URL (using the Alibaba Cloud Bailian API)"""
        try:
            # Start the asynchronous transcription task
            task_response = dashscope.audio.asr.Transcription.async_call(
                model=self.model,
                file_urls=[video_url],
                language_hints=['zh', 'en']
            )

            # Wait for the transcription to finish
            transcription_response = dashscope.audio.asr.Transcription.wait(
                task=task_response.output.task_id
            )

            if transcription_response.status_code == HTTPStatus.OK:
                # Fetch the transcription result
                for transcription in transcription_response.output['results']:
                    url = transcription['transcription_url']
                    result = json.loads(request.urlopen(url).read().decode('utf8'))

                    # Save the result to a temporary file; write UTF-8 explicitly
                    # because ensure_ascii=False emits non-ASCII characters
                    temp_json_path = self.temp_dir / 'transcription.json'
                    with open(temp_json_path, 'w', encoding='utf-8') as f:
                        json.dump(result, f, indent=4, ensure_ascii=False)

                    # Extract the text content
                    if 'transcripts' in result and len(result['transcripts']) > 0:
                        return result['transcripts'][0]['text']
                    else:
                        return "No text content recognized"

                # An empty results list would otherwise fall through and return None
                return "No text content recognized"

            else:
                raise Exception(f"Transcription failed: {transcription_response.output.message}")

        except Exception as e:
            raise Exception(f"Error while extracting text: {str(e)}") from e
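
The parsing steps in the helpers above can be tried offline, without network access or an API key. The sketch below restates three of them as standalone functions: taking the video ID from a resolved URL, pulling the play URL out of the `window._ROUTER_DATA` script payload, and reading the first transcript. The function names and all sample data are made up for illustration; only the regular expression, the JSON keys, and the `playwm` to `play` substitution come from the code above.

```python
import json
import re

ROUTER_DATA_RE = re.compile(r"window\._ROUTER_DATA\s*=\s*(.*?)</script>", re.DOTALL)


def video_id_from_url(resolved_url: str) -> str:
    """Take the last path segment of the redirected share URL."""
    return resolved_url.split("?")[0].strip("/").split("/")[-1]


def play_url_from_html(html: str) -> str:
    """Find the embedded JSON and return the first no-watermark play URL."""
    match = ROUTER_DATA_RE.search(html)
    if not match:
        raise ValueError("window._ROUTER_DATA not found")
    loader = json.loads(match.group(1).strip())["loaderData"]
    for key in ("video_(id)/page", "note_(id)/page"):
        if key in loader:
            item = loader[key]["videoInfoRes"]["item_list"][0]
            return item["video"]["play_addr"]["url_list"][0].replace("playwm", "play")
    raise ValueError("no video or note entry in loaderData")


def first_transcript_text(result: dict) -> str:
    """Return the first transcript's text, or an empty string if none exist."""
    transcripts = result.get("transcripts") or []
    return transcripts[0]["text"] if transcripts else ""


# Made-up page fragment shaped like the structure the helper expects.
SAMPLE_HTML = (
    '<script>window._ROUTER_DATA = {"loaderData": {"video_(id)/page": '
    '{"videoInfoRes": {"item_list": [{"video": {"play_addr": {"url_list": '
    '["https://example.com/aweme/v1/playwm/?video_id=abc"]}}}]}}}}</script>'
)

if __name__ == "__main__":
    print(video_id_from_url("https://www.iesdouyin.com/share/video/7300000000000000000/?region=CN"))
    print(play_url_from_html(SAMPLE_HTML))
    print(first_transcript_text({"transcripts": [{"text": "sample transcript"}]}))
```

Because the page structure is scraped rather than documented, any of these keys may change; keeping each step in its own function makes such breakage easier to locate.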

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It successfully reveals several important behavioral traits: the tool performs text extraction from video content, requires an API key (DASHSCOPE_API_KEY environment variable), uses speech recognition (implied by the model parameter), and has an optional model parameter with a default. However, it doesn't mention rate limits, error conditions, or authentication details beyond the API key requirement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, parameters, returns, notes) and uses only essential sentences. Each section earns its place by providing distinct information. The Chinese text is concise and direct, though the formatting with section headers could be slightly more polished.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, speech recognition functionality) and the presence of an output schema (which handles return values), the description provides good contextual coverage. It explains the core functionality, parameters, authentication requirement, and basic workflow. The main gap is lack of error handling information and more detailed model options.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate. It provides meaningful semantic context for both parameters: 'share_link' accepts Douyin share links or text containing links, and 'model' specifies the speech recognition model with a default value. This adds substantial value beyond the bare schema, though it could provide more detail about valid model options or link formats.
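
To make the "text containing links" case concrete, the sketch below applies the same URL pattern used in the implementation reference to a share message and returns the first match. The function name `first_share_url` and the sample message are assumptions made for this example.

```python
import re

# Same URL pattern as the implementation reference.
URL_RE = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)


def first_share_url(share_text: str) -> str:
    """Return the first URL found in free-form share text."""
    urls = URL_RE.findall(share_text)
    if not urls:
        raise ValueError("no URL found in share text")
    return urls[0]


if __name__ == "__main__":
    # Made-up share message; the short link is a placeholder.
    message = "Check this out https://v.douyin.com/EXAMPLE/ copy this link and open Douyin"
    print(first_share_url(message))
```

The pattern stops at whitespace and at non-ASCII characters, so surrounding caption text in either English or Chinese is not swallowed into the URL.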

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('从抖音分享链接提取视频中的文本内容' - extract text content from Douyin share links) and the resource (Douyin video content). It distinguishes itself from sibling tools like 'get_douyin_download_link' and 'parse_douyin_video_info' by focusing specifically on text extraction rather than downloading or general video information parsing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context through the mention of Douyin share links and the optional model parameter, but doesn't explicitly state when to use this tool versus the sibling tools. There's no guidance about alternative approaches or specific scenarios where this tool is preferred over 'parse_douyin_video_info' which might also provide text information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yzfly/douyin-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.