Norod78
posted an update about 2 months ago
Multilingual Tokenization Showdown
Analyzing 12 LLM Tokenizers Across 204 Languages.

First, I've created a dataset with Wikipedia's "Cat" article text in 272 languages:
Norod78/WikiCat-Multilingual

For each language entry with at least 100 words, I tokenized the text with each of the 12 tokenizers and calculated two ratios: characters per token and words per token. The higher these ratios are, the more text each token represents on average for that language (which may also let an LLM learn more per parameter when trained on a dataset in that language).
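As a rough illustration of the two ratios, here is a minimal sketch. It is not the code from the repo: the function names, the whitespace-based word count, and counting spaces as characters are all assumptions, and `toy_tokenize` is a hypothetical stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List


def tokenization_ratios(text: str, tokenize: Callable[[str], List]) -> Dict[str, float]:
    """Return characters-per-token and words-per-token for one text."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("tokenizer produced no tokens")
    # Assumption: words are whitespace-separated. This is only a rough count
    # for languages written without spaces between words.
    words = text.split()
    return {
        # Assumption: every character counts, including spaces.
        "chars_per_token": len(text) / len(tokens),
        "words_per_token": len(words) / len(tokens),
    }


def toy_tokenize(text: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: cut each word into pieces of at most piece_len chars."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return pieces


sample = "The cat is a small domesticated carnivorous mammal"
ratios = tokenization_ratios(sample, toy_tokenize)
print(ratios)
```

To try this with a real tokenizer, something like `AutoTokenizer.from_pretrained(name).encode` from the transformers library could be passed as `tokenize`. Whether special tokens should be counted is a choice to make explicitly, since it shifts both ratios.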

You can see a slideshow summary of the results here:
https://norod.github.io/wikicat-tokenizer-eval/tokenizer-slideshow.html

I hope I interpreted the results correctly. The code is available on GitHub, so you can re-create the raw results JSONL with this repo:
https://github.com/Norod/wikicat-tokenizer-eval

Post on X:
https://x.com/Norod78/status/1984366900550266999

Norod78
posted an update almost 2 years ago
I've prepared a Google Colab notebook which allows you to play with interpolating between different people using IP-Adapter SDXL Face-ID Plus.

import numpy as np
import torch
from tqdm import tqdm

# Prepare num_of_results evenly spaced mix factors between 0 and 1
t_space = torch.linspace(0, 1, num_of_results)
for t in tqdm(t_space):
    mix_factor = t.item()
    # Interpolate between the two face images
    image = (image1 * (1 - mix_factor) + image2 * mix_factor).astype(np.uint8)
    # Interpolate between the two face embeddings
    faceid_embeds = torch.lerp(faceid_embeds1, faceid_embeds2, t)
    # Generate the interpolated result
    images = ip_model.generate(
        prompt=prompt, negative_prompt=negative_prompt, face_image=image,
        faceid_embeds=faceid_embeds, shortcut=v2, num_samples=2, scale=scale,
        s_scale=s_scale, guidance_scale=guidance_scale, width=width, height=height,
        num_inference_steps=steps, seed=seed)
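
For anyone curious what the interpolation step computes, here is a dependency-free sketch of the same arithmetic as `torch.linspace` and `torch.lerp`, using plain Python lists. The helper names and the example vectors are made up for illustration; real Face-ID embeddings are much larger tensors.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def lerp(start, end, weight):
    """Element-wise linear interpolation: start + weight * (end - start)."""
    return [s + weight * (e - s) for s, e in zip(start, end)]


# Hypothetical 3-dimensional "embeddings" standing in for the two faces
embeds1 = [0.0, 2.0, -1.0]
embeds2 = [1.0, 0.0, 3.0]

# One interpolated embedding per mix factor
frames = [lerp(embeds1, embeds2, t) for t in linspace(0.0, 1.0, 5)]
for frame in frames:
    print(frame)
```

A mix factor of 0 reproduces the first embedding and a mix factor of 1 reproduces the second, so the generated images should move gradually from one person to the other.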


Link to notebook:
Norod78/face_id_v2_test_code

Link to Face-ID Repo:
h94/IP-Adapter-FaceID

Link to all sorts of generated examples (Use the file tab):
Norod78/face_id_v2_test_code
