Metadata-Version: 2.1
Name: charset-normalizer
Version: 2.1.1
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
Home-page: https://github.com/ousret/charset_normalizer
Author: Ahmed TAHRI @Ousret
Author-email: ahmed.tahri@cloudnursery.dev
License: MIT
Project-URL: Bug Reports, https://github.com/Ousret/charset_normalizer/issues
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/en/latest
Keywords: encoding,i18n,txt,text,charset,charset-detector,normalization,unicode,chardet
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Typing :: Typed
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: unicode_backport
Requires-Dist: unicodedata2 ; extra == 'unicode_backport'

# Charset Detection, for Everyone 👋

The Real First Universal Charset Detector

> A library that helps you read text from an unknown charset encoding.
> Motivated by `chardet`, I'm trying to resolve the issue by taking a new approach.
> All IANA character set names for which the Python core library provides codecs are supported.


This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.

| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
| ------------- | :-------------: | :------------------: | :------------------: |
| `Fast` | ❌ | ✅ | ✅ |
| `Universal**` | ❌ | ✅ | ❌ |
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
| `License` | LGPL-2.1 _restrictive_ | MIT | MPL-1.1 _restrictive_ |
| `Native Python` | ✅ | ✅ | ❌ |
| `Detect spoken language` | ❌ | ✅ | N/A |
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
| `Whl Size` | 193.6 kB | 39.5 kB | ~200 kB |
| `Supported Encoding` | 33 | :tada: [93](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |

*\*\* : They clearly use encoding-specific code, even though it covers most of the commonly used encodings.*

Did you get here because of the logs? See [https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html](https://charset-normalizer.readthedocs.io/en/latest/user/miscellaneous.html)

## ⭐ Your support

*Fork, test-it, star-it, submit your ideas! We do listen.*

## ⚡ Performance

This package offers better performance than its counterpart Chardet. Here are some numbers.

| Package | Accuracy | Mean per file (ms) | File per sec (est) |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet](https://github.com/chardet/chardet) | 86 % | 200 ms | 5 file/sec |
| charset-normalizer | **98 %** | **39 ms** | 26 file/sec |

| Package | 99th percentile | 95th percentile | 50th percentile |
| ------------- | :-------------: | :------------------: | :------------------: |
| [chardet](https://github.com/chardet/chardet) | 1200 ms | 287 ms | 23 ms |
| charset-normalizer | 400 ms | 200 ms | 15 ms |

Chardet's performance on larger files (1MB+) is very poor. Expect a huge difference on large payloads.

> Stats are generated using 400+ files with default parameters. For more details on the files used, see the GHA workflows.
> And yes, these results might change at any time. The dataset can be updated to include more files.
> The actual delays heavily depend on your CPU capabilities. The factors should remain the same.
> Keep in mind that the stats are generous and that Chardet's accuracy vs. ours is measured using Chardet's initial capability
> (e.g. Supported Encoding). Challenge them if you want.

[cchardet](https://github.com/PyYoshi/cChardet) is a non-native (C++ binding) and unmaintained faster alternative with better accuracy than chardet but lower than this package. If speed is the most important factor, you should try it.

## ✨ Installation

Using PyPI for the latest stable release:
```sh
pip install charset-normalizer -U
```

If you want a more up-to-date `unicodedata` than the one available in your Python setup:
```sh
pip install charset-normalizer[unicode_backport] -U
```

## 🚀 Basic Usage

### CLI

This package comes with a CLI.

```
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                  file [file ...]

The Real First Universal Charset Detector. Discover originating encoding used
on text file. Normalize text to unicode.

positional arguments:
  files                 File(s) to be analysed

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Display complementary information about file if any.
                        Stdout will contain logs about the detection process.
  -a, --with-alternative
                        Output complementary possibilities if any. Top-level
                        JSON WILL be a list.
  -n, --normalize       Permit to normalize input file. If not set, program
                        does not write anything.
  -m, --minimal         Only output the charset detected to STDOUT. Disabling
                        JSON output.
  -r, --replace         Replace file when trying to normalize it instead of
                        creating a new one.
  -f, --force           Replace file without asking if you are sure, use this
                        flag with caution.
  -t THRESHOLD, --threshold THRESHOLD
                        Define a custom maximum amount of chaos allowed in
                        decoded content. 0. <= chaos <= 1.
  --version             Show version information and exit.
```

```bash
normalizer ./data/sample.1.fr.srt
```

:tada: Since version 1.4.0, the CLI produces easily usable results on stdout in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```

### Python

*Just print out normalized text*
```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```

*Normalize any text file*
```python
from charset_normalizer import normalize

try:
    normalize('./my_subtitle.srt')  # should write to disk my_subtitle-***.srt
except IOError as e:
    print('Sadly, we are unable to perform charset normalization.', str(e))
```

*Upgrade your code without effort*
```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) backward-compatible (BC) result possible.

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it did not meet my expectations, and I wanted to propose a
reliable alternative using a completely different method. Also! I never back down from a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can
produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the package **ftfy** with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer's goal is to convert a raw file in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure chaos, or the mess once opened (by chunks) with a corresponding charset encoding.
- Extract matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language.

**Wait a minute**, what is chaos/mess and coherence according to **YOU?**

*Chaos:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then
**I established** some ground rules about **what is obvious** when **it seems like** a mess.
I know that my interpretation of what is chaotic is very subjective, so feel free to contribute in order to
improve or rewrite it.

*Coherence:* For each language on earth, we have computed ranked letter appearance occurrences (as best we can). I thought
that this information was worth something here, so I use those records against decoded text to check whether I can detect intelligent design.

## ⚡ Known limitations

- Language detection is unreliable when text contains two or more languages sharing identical letters (e.g. HTML with English tags plus Turkish content, both using Latin characters).
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © 2019 [Ahmed TAHRI @Ousret](https://github.com/Ousret).
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
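
To make the steps listed under *How* more concrete, here is a minimal sketch using only the Python standard library. It is not this package's actual implementation: `CANDIDATES`, `mess_ratio` and `guess_encoding` are hypothetical names chosen for illustration, and the mess measure (share of control, unassigned or private-use characters) is a deliberately crude stand-in for the real chaos rules. Coherence / language probing is left out entirely.

```python
import unicodedata

# Hypothetical short candidate list; the real package tries many more codecs.
CANDIDATES = ["ascii", "utf_8", "cp1252", "latin_1"]


def mess_ratio(text: str) -> float:
    """Crude stand-in for 'chaos': the share of suspicious characters."""
    if not text:
        return 0.0
    suspicious = 0
    for ch in text:
        # Control (Cc), unassigned (Cn) and private-use (Co) characters are
        # rarely found in text written by humans; common whitespace is fine.
        if ch not in "\t\r\n" and unicodedata.category(ch) in ("Cc", "Cn", "Co"):
            suspicious += 1
    return suspicious / len(text)


def guess_encoding(payload: bytes, candidates=CANDIDATES):
    """Return the candidate with the lowest mess, or None if none fits."""
    scored = []
    for name in candidates:
        try:
            text = payload.decode(name)
        except UnicodeDecodeError:
            continue  # this table cannot fit the binary content: discard it
        scored.append((mess_ratio(text), name))
    if not scored:
        return None
    # On equal mess, min() keeps the earliest candidate in the list.
    return min(scored, key=lambda item: item[0])[1]


print(guess_encoding("héllo wörld".encode("utf_8")))
print(guess_encoding("héllo".encode("cp1252")))
```

Note that in the first call `cp1252` and `latin_1` also decode the payload with zero mess under this crude measure, and `utf_8` only wins because it comes earlier in the list. Ties like this are one reason a second signal such as coherence is useful.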