Glama

URL Reputation and Validity Checker

by prismon
validators.cpython-313.pyc (11.4 kB)
This file is compiled CPython 3.13 bytecode and cannot be displayed as text. The following is recoverable from the names and strings it contains.

Source module: `url_reputation_checker/validators.py` ("Core validation logic for URLs.")

Imports: `asyncio`, `re`, `ssl`, `time`, `datetime`, `typing`, `urllib.parse`, `certifi`, `httpx`, `validators`, `bs4.BeautifulSoup`, and `URLValidationResult`, `ConfidenceLevel`, `ValidationLevel` from the package's `models` module.

Class `URLValidator` ("Handles URL validation and basic checks."):

- `__init__(timeout=10.0, user_agent=None)`: the user agent defaults to `URL-Reputation-Checker/1.0`.
- `__aenter__` / `__aexit__`: async context manager that opens and closes an `httpx.AsyncClient`, which follows redirects and verifies certificates against the `certifi` bundle.
- `is_valid_url(url)`: "Check if URL has valid format." (uses the `validators` package).
- `check_url(url, level=ValidationLevel.STANDARD)`: "Validate a single URL." Returns a `URLValidationResult` with `is_valid`, `status_code`, `response_time`, `content_length`, `ssl_valid`, `warnings`, `confidence_level`, and `metadata` (`final_url`, `redirect_count`, `content_type`). Failures are reported as "Invalid URL format", "Request timeout", or "Request failed: <error>".
- `_validate_ssl(url)`: "Validate SSL certificate."
- `_validate_content(content, headers)`: "Validate page content for suspicious indicators." Warns about very short pages (under 100 characters), parking-page phrases ("domain for sale", "this domain is parked", "buy this domain", "domain parking", "under construction", "coming soon"), and invalid or unparseable HTML.
- `_check_suspicious_patterns(url, content)`: "Check for patterns common in AI-hallucinated URLs." Warns about dated `/blog/` paths, versioned `/docs/v<major>.<minor>.<patch>/api` paths, `/research/papers/<year>/` paths, four-word `/products/` slugs, unusually deep paths, excessive subdomains, and possible typosquatting of github.com, google.com, microsoft.com, or amazon.com.
- `_is_typosquatting(domain, target)`: "Check if domain might be typosquatting a target domain."
- `_levenshtein_distance(s1, s2)`: "Calculate Levenshtein distance between two strings."
- `_determine_confidence(is_valid, warnings)`: "Determine confidence level based on validation results."
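As far as the compiled `check_url` can be read, the follow-up checks depend on the requested `ValidationLevel`: a TLS handshake for `https://` URLs above `BASIC`, content heuristics for 200 responses above `BASIC`, and URL-shape heuristics only at `COMPREHENSIVE`. The sketch below restates that gating as a pure function so it can be tested without network access. The `ValidationLevel` enum here is a hypothetical stand-in for the one in the package's models module, and `planned_checks` is an illustrative name, not part of the package.

```python
from enum import Enum


class ValidationLevel(Enum):
    # Hypothetical stand-in: the real enum lives in the package's models
    # module, which is not shown, so these member values are assumptions.
    BASIC = "basic"
    STANDARD = "standard"
    COMPREHENSIVE = "comprehensive"


def planned_checks(url: str, level: ValidationLevel, status_code: int) -> list[str]:
    """Return the names of the follow-up checks to run after the HTTP request."""
    checks = []
    # TLS handshake only for https:// URLs, and never at BASIC level.
    if url.startswith("https://") and level != ValidationLevel.BASIC:
        checks.append("ssl")
    # Page-content heuristics need a 200 response and at least STANDARD level.
    if level != ValidationLevel.BASIC and status_code == 200:
        checks.append("content")
    # URL-shape heuristics (hallucination patterns, typosquatting) run only
    # at COMPREHENSIVE level.
    if level == ValidationLevel.COMPREHENSIVE:
        checks.append("patterns")
    return checks
```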
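`_validate_ssl` is documented only as "Validate SSL certificate."; the compiled code appears to open a TLS connection with `asyncio.open_connection` and treat any failure as an invalid certificate. A minimal sketch of that idea follows, assuming the standard library's default trust store (`ssl.create_default_context()`) instead of the `certifi` bundle the module uses. `check_tls` and its `timeout` parameter are hypothetical.

```python
import asyncio
import ssl
from urllib.parse import urlparse


async def check_tls(url: str, timeout: float = 10.0) -> bool:
    """Return True if a TLS handshake with the URL's host succeeds."""
    parsed = urlparse(url)
    if not parsed.hostname:
        return False
    # Assumption: the system trust store stands in for the certifi bundle.
    context = ssl.create_default_context()
    try:
        port = parsed.port or 443  # raises ValueError for a malformed port
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(parsed.hostname, port, ssl=context),
            timeout=timeout,
        )
    except (OSError, ValueError, asyncio.TimeoutError):
        # Covers DNS failures, refused connections, certificate errors
        # (ssl.SSLError subclasses OSError) and handshake timeouts.
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the handshake already succeeded; ignore errors while closing
    return True
```

The sketch catches connection, TLS, and timeout errors explicitly rather than every exception, so a programming error is not silently reported as a bad certificate.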
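`_validate_content` flags very short pages, known parking-page phrases, and pages without `<html>`/`<body>` elements. The sketch below reproduces those heuristics with the phrases and messages recovered from the compiled file, but swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it has no third-party dependencies. Two details are assumptions rather than readings of the original: the structure check runs only when the `content-type` header starts with `text/html`, and the phrase loop stops at the first match. `content_warnings` and `_TagCollector` are hypothetical names, and `headers` is a plain dict with lowercase keys here, unlike httpx's case-insensitive headers.

```python
from html.parser import HTMLParser

# Parking-page phrases as they appear in the compiled module.
PARKING_INDICATORS = (
    "domain for sale",
    "this domain is parked",
    "buy this domain",
    "domain parking",
    "under construction",
    "coming soon",
)


class _TagCollector(HTMLParser):
    """Record the (lowercased) name of every start tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: set[str] = set()

    def handle_starttag(self, tag: str, attrs) -> None:
        self.tags.add(tag)


def content_warnings(content: str, headers: dict[str, str]) -> list[str]:
    warnings = []
    if len(content) < 100:
        warnings.append("Very short content - possible placeholder page")
    lowered = content.lower()
    for indicator in PARKING_INDICATORS:
        if indicator in lowered:
            warnings.append(f"Possible parking page: '{indicator}' found")
            break  # this sketch reports only the first matching phrase
    # Assumption: only inspect structure for responses declared as HTML.
    if headers.get("content-type", "").startswith("text/html"):
        collector = _TagCollector()
        collector.feed(content)
        collector.close()
        if not {"html", "body"} <= collector.tags:
            warnings.append("Invalid HTML structure")
    return warnings
```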
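`_check_suspicious_patterns` targets URL shapes that language models tend to invent. The four regular expressions and the warning messages below are taken from the compiled file; the depth and subdomain thresholds are not legible there, so `max_depth=6` and `max_dots=3` are hypothetical defaults. The module also runs a typosquatting comparison against github.com, google.com, microsoft.com, and amazon.com, which this sketch leaves out. `pattern_warnings` is an illustrative name.

```python
import re
from urllib.parse import urlparse

# Regexes as they appear among the compiled module's string constants.
HALLUCINATION_PATTERNS = (
    r"/blog/\d{4}/\d{2}/\d{2}/[a-z-]+",
    r"/docs/v\d+\.\d+\.\d+/api",
    r"/research/papers/\d{4}/",
    r"/products/[a-z]+-[a-z]+-[a-z]+-[a-z]+",
)


def pattern_warnings(url: str, max_depth: int = 6, max_dots: int = 3) -> list[str]:
    # max_depth and max_dots are hypothetical thresholds; the compiled
    # file's actual values are not readable.
    warnings = []
    parsed = urlparse(url)
    path = parsed.path.lower()
    if any(re.search(pattern, path) for pattern in HALLUCINATION_PATTERNS):
        warnings.append("URL pattern commonly seen in AI hallucinations")
    # Count non-empty path segments: "/a/b/" -> ["a", "b"].
    segments = [part for part in path.split("/") if part]
    if len(segments) > max_depth:
        warnings.append("Unusually deep URL path structure")
    # Many dots in the host usually means many subdomain labels.
    if parsed.netloc.count(".") > max_dots:
        warnings.append("Excessive subdomains")
    return warnings
```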
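The typosquatting check compares only the first DNS label of two domains and measures their Levenshtein distance; `_levenshtein_distance` appears to use the classic two-row dynamic-programming formulation, keeping just the previous row of the edit-distance table. The sketch below follows that structure. The cut-off `max_distance=2` is an assumption, since the compiled constant is not readable, and `levenshtein`/`is_typosquatting` are illustrative names.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Edit distance (insert, delete, substitute) using two rows of the DP table."""
    if len(s1) < len(s2):
        return levenshtein(s2, s1)  # keep the shorter string on the inner loop
    if not s2:
        return len(s1)
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous[j + 1] + 1
            deletions = current[j] + 1
            substitutions = previous[j] + (c1 != c2)
            current.append(min(insertions, deletions, substitutions))
        previous = current
    return previous[-1]


def is_typosquatting(domain: str, target: str, max_distance: int = 2) -> bool:
    if domain == target:
        return False
    # Compare only the first label: "githb.com" -> "githb", "github.com" -> "github".
    distance = levenshtein(domain.split(".")[0], target.split(".")[0])
    return 0 < distance <= max_distance
```

Comparing only the first label means `github.io` is not flagged against `github.com` (distance 0), while `githb.com` is (distance 1).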
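`_determine_confidence` maps a result to a `ConfidenceLevel`: an invalid URL is reported with `HIGH` confidence, a valid URL with no warnings is `HIGH`, a few warnings give `MEDIUM`, and more give `LOW`. The boundary between `MEDIUM` and `LOW` is not legible in the compiled file, so `medium_max=2` below is an assumption, and the enum is a hypothetical stand-in for the package's own.

```python
from enum import Enum


class ConfidenceLevel(Enum):
    # Hypothetical stand-in for the package's enum (defined in a models
    # module that is not shown); member values are assumptions.
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def determine_confidence(is_valid: bool, warnings: list[str], medium_max: int = 2) -> ConfidenceLevel:
    if not is_valid:
        # An invalid result is reported with HIGH confidence regardless of warnings.
        return ConfidenceLevel.HIGH
    if not warnings:
        return ConfidenceLevel.HIGH
    if len(warnings) <= medium_max:
        return ConfidenceLevel.MEDIUM
    return ConfidenceLevel.LOW
```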


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/prismon/reputation-checker-mcp'
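The same request can be made from Python with the standard library. This is a minimal sketch: it assumes the endpoint returns JSON, whose schema is not described on this page, and `build_server_request`/`fetch_server` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    # owner/name fill the two path segments used in the curl command,
    # e.g. "prismon" / "reputation-checker-mcp".
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    # Assumption: the response body is a JSON object.
    with urllib.request.urlopen(build_server_request(owner, name), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_server("prismon", "reputation-checker-mcp")` requests the URL used in the curl command.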

If you have feedback or need assistance with the MCP directory API, please join our Discord server.