HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
31.
▲
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets
github.com/jmaczan
discuss
2 years ago
yu3zhou4
5 points
32.
▲
Show HN: 1gbps Tokenizer written in Assembly. 20x faster than HuggingFace
github.com/dogmaticdev
2 comments
2 months ago
dogmaticdev
3 points
33.
▲
Node.js Open Source LLM Tokenizer
github.com/jakecyr
1 comment
2 years ago
jakecyr
2 points
34.
▲
LLM Tokenization Demo
github.com/tokfan
discuss
10 months ago
tokfan
2 points
35.
▲
The Worst (But Only) Claude 3 Tokenizer
github.com/javirandor
discuss
2 years ago
dpaleka
2 points
36.
▲
Neural Tokenizer
github.com/Kyubyong
discuss
9 years ago
kyubyong
2 points
37.
▲
Claude-Tokenwise – CLI wrapper for efficient Claude token usage
github.com/nniinnoo
1 comment
4 months ago
little_epsilon
1 point
38.
▲
Show HN: CLI Tokenizer – A tiny tool for prompt engineers
github.com/ericciarla
discuss
2 years ago
ericciarla
1 point
39.
▲
Stripe on Apple watchOS 3
github.com/appintheair
discuss
10 years ago
Bayram
4 points
40.
▲
LLM Tokenizer in Zig
github.com/Mario-SO
discuss
10 months ago
mariodev__
1 point
41.
▲
Very simple javascript highlighter that can be used in blog posts
github.com/fatih-erikli
discuss
2 years ago
fatih-erikli
1 point
42.
▲
Show HN: I'm writing a library to apply NLP techniques to StarCraft 2
github.com/ZephyrBlu
discuss
5 years ago
ZephyrBlu
1 point
43.
▲
PRFI Protocol: Decentralized API Tokenization with Proof-of-Work Mining
github.com/sr-oliveiraa
discuss
a year ago
gustavudeoli
1 point
44.
▲
Card Network Tokenization: A Savior or Hidden Menace
github.com/juspay
discuss
3 years ago
manojr13
1 point
45.
▲
Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken
github.com/M4THYOU
73 comments
a year ago
matthewolfe
281 points
46.
▲
Tiktoken: OpenAI’s Tokenizer
github.com/openai
74 comments
4 years ago
azhenley
153 points
47.
▲
Code for the Byte Pair Encoding algorithm, commonly used in LLM tokenization
github.com/karpathy
31 comments
2 years ago
magoghm
81 points
48.
▲
55x Speedup of Andrej Karpathy's Minbpe LLM Tokenizer with PyTorch/CUDA
github.com/kuprel
9 comments
2 years ago
kuprel
19 points
49.
▲
Show HN: Open-source card tokenization service in Rust
github.com/juspay
discuss
3 years ago
thala
14 points
50.
▲
XML Tokenizer that's 4x faster than stdlib's XML
github.com/muktihari
1 comment
2 years ago
todsacerdoti
10 points
51.
▲
TokenMonster: Ungreedy tokenizer, outperforming tiktoken by 35%
github.com/alasdairforsythe
discuss
3 years ago
tosh
10 points
52.
▲
Show HN: A Command-Line Sentence Tokenizer Written in Golang
github.com/neurosnap
1 comment
11 years ago
qudat
6 points
53.
▲
From Scratch GPT Built with NumPy (Tokenizer, Model, Adam)
github.com/codiceSpaghetti
discuss
a year ago
xnan
6 points
54.
▲
Show HN: Rust BPE tokenizer for Qwen models that's 12x faster than HuggingFace
github.com/sweepai
discuss
9 months ago
williamzeng0
5 points
55.
▲
Chiffon: A very small ECMAScript parser, tokenizer in JS
github.com/polygonplanet
discuss
11 years ago
shawndumas
5 points
56.
▲
Jargon: tokenizers and lemmatizers for Go
github.com/clipperhouse
discuss
8 years ago
mwsherman
4 points
57.
▲
Fast JSON parser in Rust that uses SIMD and avoids tokenisation
github.com/pikkr
discuss
9 years ago
tambourine_man
4 points
58.
▲
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and Voice Cloning
github.com/OpenBMB
discuss
7 months ago
chaosprint
3 points
59.
▲
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
github.com/facebookresearch
discuss
9 months ago
montyanderson
3 points
60.
▲
Show HN: Jsmn_Zig – Memory-Efficient JSON Tokenizer for Zig
discuss
9 months ago
potom
3 points