Ġ in tokenizer
#20
by Sm1Ling - opened
Why are there so many "Ġ" characters in the tokenizer?
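My understanding is that this character stands for a space. GPT-2-style byte-level BPE tokenizers map the space byte to the printable character "Ġ", so a token like "Ġworld" means " world", i.e. a word that begins after a space. I think its presence improves the model's behavior around word boundaries, since it lets the tokenizer tell a word-initial token apart from the same characters appearing inside a word.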
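A minimal sketch of where "Ġ" comes from, assuming the tokenizer follows the GPT-2-style byte-level convention. Every byte is mapped to a visible character. Printable bytes stay as they are, and the rest, including the space byte 0x20, are shifted to code points from 256 upward, which places space at U+0120 ("Ġ"). The names `bytes_to_unicode`, `BYTE_TO_CHAR`, `PRETOKENIZE` and `pretokenize` are illustrative, and the splitting regex is a hypothetical simplification of the real pre-tokenizer.

```python
import re


def bytes_to_unicode() -> dict[int, str]:
    """Map every byte value (0-255) to a printable Unicode character.

    Printable bytes map to themselves; the remaining bytes (control
    characters, space, etc.) are shifted to code points starting at 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable byte: assign the next unused code point >= 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()

# Hypothetical, simplified pre-tokenization pattern: a single leading
# space stays attached to the word (or punctuation run) that follows it.
PRETOKENIZE = re.compile(r" ?\w+| ?[^\w\s]+|\s+")


def pretokenize(text: str) -> list[str]:
    """Split text into chunks and rewrite each chunk's bytes as visible chars."""
    return [
        "".join(BYTE_TO_CHAR[b] for b in chunk.encode("utf-8"))
        for chunk in PRETOKENIZE.findall(text)
    ]


if __name__ == "__main__":
    print(repr(BYTE_TO_CHAR[ord(" ")]))   # 'Ġ' (U+0120): the space byte
    print(repr(BYTE_TO_CHAR[ord("\n")]))  # 'Ċ' (U+010A): the newline byte
    print(pretokenize("Hello world"))     # ['Hello', 'Ġworld']
```

Because the space is folded into the following chunk before it is mapped, "Ġ" ends up at the start of tokens that begin a new word. This explains why so many vocabulary entries carry it.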
I appreciate your answer a lot!
Sm1Ling changed discussion status to closed