If you choose this second option, there are three possibilities you can use to gather all the input TensorsNevertheless, in the vocabulary size growth in RoBERTa allows to encode almost any word or subword without using the unknown token, compared to BERT. This gives a considerable advantage to RoBERTa as the model can now more fully understand com… Read More