Hacker News

Text Classification with Python 3.14's ZSTD Module


11 min read Via maxhalford.github.io

Mewayz Team

Editorial Team


Python 3.14 adds the compression.zstd module to the standard library, and it opens up a surprisingly effective way to classify text without machine learning models. By measuring how well a compressor can compress two texts together, you can estimate how similar they are, a technique known as Normalized Compression Distance (NCD), and Zstandard now makes it fast enough for production workloads.

How Does Compression-Based Text Classification Actually Work?

The core idea behind compression-based classification comes from information theory. When a compression algorithm such as Zstandard processes a body of text, it builds an internal dictionary of recurring patterns. If two texts share vocabulary, phrasing, and structure, compressing them concatenated produces output only slightly larger than compressing the larger text alone. If they are unrelated, the compressed size of the concatenation approaches the sum of the two individual compressed sizes.

This relationship is captured by the Normalized Compression Distance formula: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(x) is the compressed size of text x and C(xy) is the compressed size of the two texts concatenated. An NCD value close to 0 means the texts are very similar, while a value close to 1 means they share almost no information.
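
A minimal sketch of the formula above, assuming Python 3.14's compression.zstd module is available. The helper names compressed_size and ncd are illustrative, and the zlib fallback is only a stand-in so the example also runs on older interpreters; it is not equivalent to Zstandard.

```python
try:
    from compression import zstd  # standard library in Python 3.14+

    def compressed_size(data: bytes) -> int:
        return len(zstd.compress(data))
except ImportError:
    import zlib  # assumption: stand-in compressor for interpreters older than 3.14

    def compressed_size(data: bytes) -> int:
        return len(zlib.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Two near-duplicate texts and one unrelated byte pattern.
a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
c = bytes(range(256)) * 4

print(f"similar pair:   {ncd(a, b):.3f}")
print(f"unrelated pair: {ncd(a, c):.3f}")
```

The similar pair should score noticeably lower than the unrelated pair. Exact values depend on the compressor and its level, so treat them as relative rather than absolute.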

What makes this approach notable is that it requires no model training, no tokenization, no embeddings, and no GPU. The compressor itself acts as a learned model of the text's structure. Research published in papers such as "Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors" (2023) showed that gzip-based NCD is competitive with BERT on some benchmarks, which renewed interest in the technique.

Why Is Python 3.14's Zstandard Module a Game-Changer for NCD?

Before Python 3.14, using Zstandard meant installing the third-party python-zstandard package. The new compression.zstd module, introduced through PEP 784, ships with CPython itself. That means zero dependency overhead and a stable API backed by Meta's battle-tested libzstd. For classification tasks specifically, Zstandard offers several advantages over gzip or bzip2:

  • Speed: Zstandard compresses roughly 3-5x faster than gzip at comparable ratios, so batch classification over thousands of documents can finish in seconds rather than minutes
  • Tunable compression levels: Levels 1 through 22 let you trade speed for ratio, so you can balance NCD accuracy against throughput requirements
  • Dictionary support: Pre-trained Zstandard dictionaries can substantially improve compression of small texts (under 4KB), which is exactly the document size range where NCD accuracy matters most
  • Streaming API: The module supports incremental compression, which allows classification pipelines to process texts without loading the whole corpus into memory
  • Standard library stability: No version conflicts and no supply-chain risk; from compression import zstd works on standard Python 3.14+ installations
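
The speed-versus-ratio trade-off in the list above can be measured directly. This is a rough sketch: the profile_levels helper, the chosen levels, and the sample text are illustrative assumptions, and the zlib fallback (whose levels only go up to 9) exists only so the example runs on interpreters older than 3.14.

```python
import time

try:
    from compression import zstd  # standard library in Python 3.14+

    LEVELS = (1, 3, 10, 19)

    def compress(data: bytes, level: int) -> bytes:
        return zstd.compress(data, level=level)
except ImportError:
    import zlib  # assumption: stand-in compressor for older interpreters

    LEVELS = (1, 6, 9)

    def compress(data: bytes, level: int) -> bytes:
        return zlib.compress(data, level)


def profile_levels(data: bytes) -> dict[int, tuple[int, float]]:
    """Return {level: (compressed_size, seconds)} for each level in LEVELS."""
    results = {}
    for level in LEVELS:
        start = time.perf_counter()
        size = len(compress(data, level))
        results[level] = (size, time.perf_counter() - start)
    return results


sample = ("Invoice 1042 is overdue, please review the attached statement. " * 200).encode("utf-8")
for level, (size, seconds) in profile_levels(sample).items():
    print(f"level {level:>2}: {size} bytes in {seconds * 1000:.3f} ms")
```

Timings on a single small sample are noisy, so a real comparison would repeat the measurement over a representative set of documents.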

Key insight: Compression-based classification works best when you need a fast, dependency-free baseline that handles multilingual text out of the box. Because compressors operate on raw bytes rather than language-specific tokens, they classify Chinese, Arabic, or mixed-language documents as readily as English, with no language model required.


What Does a Practical Implementation Look Like?

A minimal NCD classifier in Python 3.14 fits in under 30 lines. You keep one reference text per class, then for each new document you compute the NCD against every reference and assign the class with the smallest distance. The core logic is as follows.

First, import the module with from compression import zstd. Define a function that takes two byte strings, compresses each one individually, compresses their concatenation, and returns the NCD score. Then build a dictionary that maps class labels to representative sample texts. For each incoming document, iterate over the classes, compute the NCD, and pick the smallest.
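
The steps above can be sketched as follows. The function names, class labels, and sample texts are illustrative assumptions rather than a reference implementation, and the zlib fallback is only there so the sketch runs on interpreters older than 3.14.

```python
try:
    from compression import zstd  # standard library in Python 3.14+

    def compressed_size(data: bytes) -> int:
        return len(zstd.compress(data))
except ImportError:
    import zlib  # assumption: stand-in compressor for older interpreters

    def compressed_size(data: bytes) -> int:
        return len(zlib.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def classify(doc: str, references: dict[str, str]) -> str:
    """Return the label whose reference text has the smallest NCD to doc."""
    encoded = doc.encode("utf-8")
    return min(
        references,
        key=lambda label: ncd(encoded, references[label].encode("utf-8")),
    )


# Hypothetical reference texts, one per class.
references = {
    "sports": (
        "The team won the match after the striker scored a late goal. "
        "The coach praised the defense, and the fans cheered as the league "
        "title race tightened. The goalkeeper saved a penalty in the second "
        "half of the match."
    ),
    "cooking": (
        "Preheat the oven and whisk the eggs with sugar and flour. Simmer "
        "the sauce with garlic and onions, then season the dish with salt "
        "and pepper. Bake the cake until golden and serve the dish warm."
    ),
}

doc = "The striker scored a late goal and the goalkeeper saved a penalty in the second half."
print(classify(doc, references))
```

With one reference per class this is effectively a nearest-neighbour lookup, so the quality of the reference texts matters more than any tuning of the compressor.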

Raising the compression level to 10 improves accuracy to roughly 68%, at the cost of reducing throughput to about 2,500 documents per second. These numbers do not match fine-tuned transformers, but they provide a solid baseline for prototyping, data labeling triage, or environments where installing ML dependencies is not practical.

How Does NCD Compare to Advanced ML Approaches?

The honest answer is that NCD is not a replacement for transformer-based classifiers in high-stakes production systems. Models such as BERT or GPT-based classifiers reach 94%+ accuracy on standard benchmarks. NCD with Zstandard nevertheless occupies a distinct niche. It works best in cold-start scenarios where you have fewer than 50 labeled examples per class, a regime where even fine-tuned models struggle. It needs zero training time, handles any language or script without modification, and runs entirely on CPU with constant memory.

This two-stage pipeline greatly reduces inference cost while preserving overall accuracy. Systems that process user-generated data at scale, such as Mewayz's 207-module business OS used by more than 138,000 merchants, benefit from lightweight classification to route messages, tag content, and personalize user experiences without heavy infrastructure.

What Are the Limitations and Best Practices?

Compression-based classification has known limitations that you should plan for. Short texts (under 100 bytes) yield unreliable NCD scores because the compressor does not have enough data to build meaningful patterns. The approach is also sensitive to the choice of reference texts: poorly chosen representatives degrade accuracy severely. And because NCD is a distance metric rather than a probabilistic model, it does not natively produce confidence scores.

For classifying small texts, pre-train a Zstandard dictionary on your domain corpus; this single step can improve accuracy by 8-12 percentage points on short documents.
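
One way a dictionary can help with very short documents, sketched under stated assumptions. Instead of a trained dictionary (which compression.zstd builds with train_dict and which needs a sizeable sample set), this variation uses each class's sample text as a raw-content dictionary and picks the class whose dictionary compresses the document smallest. It compares compressed sizes rather than computing the NCD formula. The helper names and sample texts are hypothetical, and the zlib fallback with its zdict option is only a stand-in for interpreters older than 3.14.

```python
try:
    from compression import zstd  # standard library in Python 3.14+

    def size_with_dict(data: bytes, dict_content: bytes) -> int:
        # is_raw=True treats the bytes as plain content, not a trained dictionary.
        zd = zstd.ZstdDict(dict_content, is_raw=True)
        return len(zstd.compress(data, zstd_dict=zd))
except ImportError:
    import zlib  # assumption: stand-in compressor for older interpreters

    def size_with_dict(data: bytes, dict_content: bytes) -> int:
        comp = zlib.compressobj(zdict=dict_content)
        return len(comp.compress(data) + comp.flush())


def classify_short(doc: str, class_samples: dict[str, str]) -> str:
    """Pick the class whose sample text, used as a dictionary, compresses doc smallest."""
    encoded = doc.encode("utf-8")
    return min(
        class_samples,
        key=lambda label: size_with_dict(encoded, class_samples[label].encode("utf-8")),
    )


# Hypothetical per-class sample text; a real setup would use far more data.
class_samples = {
    "sports": (
        "The team won the match after the striker scored a late goal. "
        "The goalkeeper saved a penalty in the second half of the match."
    ),
    "cooking": (
        "Preheat the oven and whisk the eggs with sugar and flour. "
        "Bake the cake until golden and serve the dish warm."
    ),
}

print(classify_short("The goalkeeper saved a penalty", class_samples))
```

Because the dictionary supplies the shared context up front, a document of a few dozen bytes can still be matched against class-specific phrasing, which is where plain NCD tends to be least reliable.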

Frequently Asked Questions

Does compression-based classification work for sentiment analysis?

It can, but with caveats. Sentiment analysis requires detecting subtle tonal differences between texts that are otherwise structurally similar. NCD works better for topic classification, where documents in different classes use distinct vocabulary. For sentiment, accuracy typically lands around 55-60%: better than random, but not production-ready on its own. Combining NCD features with a simple logistic regression model improves results considerably.

Can I use the compression.zstd module in Python versions before 3.14?

No. The compression.zstd module is new in Python 3.14. For earlier versions, install the python-zstandard package from PyPI, which provides equivalent compress() and decompress() functions. The NCD logic stays the same; only the import statement changes. Once you upgrade to 3.14, you can drop the third-party dependency entirely.

How does Zstandard NCD perform compared to TF-IDF with cosine similarity?

On multi-class topic classification with balanced datasets, TF-IDF with cosine similarity typically reaches 75-82% accuracy, compared with 62-68% for Zstandard NCD. However, TF-IDF requires a fitted vectorizer, a defined vocabulary, and language-specific stop-word lists. Zstandard NCD needs none of this preprocessing, works across languages out of the box, and classifies new documents in constant time regardless of vocabulary size. For rapid prototyping or multilingual scenarios, NCD is often the fastest path to a working system.
