Fix tokenizer

#16
by stephantulkens - opened

Hello! The tokenizer used here contains an incorrect, redundant pre-tokenizer. This can lead downstream tools to believe that pre-tokenization (e.g., splitting) is happening when it is not. Would you accept a PR for this?
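
For anyone who wants to check this themselves, here is a minimal sketch using the `tokenizers` library: it loads a locally downloaded `tokenizer.json` and prints what the declared pre-tokenizer actually does to a sample string (the file path and sample text are illustrative, not from this repo).

```python
from tokenizers import Tokenizer

# Load the tokenizer definition (path is illustrative; download tokenizer.json
# from the model repo first).
tok = Tokenizer.from_file("tokenizer.json")

pre = tok.pre_tokenizer
if pre is None:
    print("No pre-tokenizer configured.")
else:
    # pre_tokenize_str returns (piece, (start, end)) pairs. A redundant or
    # no-op pre-tokenizer will return the input essentially unsplit, even
    # though the config suggests splitting is taking place.
    print(pre.pre_tokenize_str("Hello world, this is a test."))
```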

Google org

Hi @stephantulkens ,

Welcome to Google's Gemma family of open source models, and thanks for bringing this to our attention. Yes, Gemma models are open source and we accept community contributions. Please raise a PR for the changes with the necessary details; once it's reviewed, the PR will be merged.

Thanks.

Ok thanks! I made a PR separately.

stephantulkens changed discussion status to closed
