Fix tokenizer

#16
by stephantulkens - opened

Hello! The tokenizer used here contains an incorrect, redundant pre-tokenizer. This can lead downstream tools to believe that pre-tokenization (e.g., splitting) is happening when it is not. Would you accept a PR for this?
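
For anyone who wants to check this themselves, here is a minimal sketch using the `tokenizers` library: it loads a locally downloaded `tokenizer.json` and prints what the declared pre-tokenizer actually does to a sample string (the file path and sample text are illustrative, not from this repo).

```python
from tokenizers import Tokenizer

# Load the tokenizer definition (path is illustrative; download tokenizer.json
# from the model repo first).
tok = Tokenizer.from_file("tokenizer.json")

pre = tok.pre_tokenizer
if pre is None:
    print("No pre-tokenizer configured.")
else:
    # pre_tokenize_str returns (piece, (start, end)) pairs. A redundant or
    # no-op pre-tokenizer will return the input essentially unsplit, even
    # though the config suggests splitting is taking place.
    print(pre.pre_tokenize_str("Hello world, this is a test."))
```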

Google org

Hi @stephantulkens ,

Welcome to Google's Gemma family of open source models, and thanks for bringing this to our attention. Yes, Gemma models are open source and we accept community contributions. Please raise a PR for the changes with the necessary details; once it's reviewed, the PR will be merged.

Thanks.

Ok thanks! I made a PR separately.

stephantulkens changed discussion status to closed
