The Wayback Machine - http://web.archive.org/web/20220102135832/https://github.com/topics/bert
#

bert

Here are 1,992 public repositories matching this topic...

transformers
ikergarcia1996
ikergarcia1996 commented Dec 10, 2021

🚀 Feature request

Fast Tokenizer for DeBERTA-V3 and mDeBERTa-V3

Motivation

DeBERTa V3 is an improved version of DeBERTa. With the V3 version, the authors also released a multilingual model, "mDeBERTa-base", that outperforms XLM-R-base. However, DeBERTa V3 currently lacks a FastTokenizer implementation, which makes it impossible to use with some of the example scripts (they require a Fa

haystack
maxupp
maxupp commented Nov 12, 2021

_handle_duplicate_documents and _drop_duplicate_documents in the Elasticsearch document store always report self.index as the index with the conflict, which is incorrect.

Edit: Upon further investigation, this is actually a lot worse. Using multiple indices with the ElasticSearch DocumentStore is completely broken due to the fact that this is used in `_handle_duplicate_do

akari0216
akari0216 commented Sep 2, 2021

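Thank you for reporting issues with PaddleNLP, and thank you for your contribution to PaddleNLP!
When submitting your issue, please also provide the following information:

  • Version and environment information
    1) PaddleNLP and PaddlePaddle versions: please provide your PaddleNLP and PaddlePaddle version numbers, e.g. PaddleNLP 2.0.4, PaddlePaddle 2.1.1
    2) System environment: please describe your system type, e.g. Linux/Windows/macOS, and your Python version
  • Reproduction information: if you are reporting an error, please give the reproduction environment and steps
    paddle version 2.0.8, paddlenlp version 2.1.0
    Suggestion: could the paddlenlp documentation list which tokenization scheme each model's tokenizer is based on (for example, the BERT tokenizer is WordPiece-based and the XLNet tokenizer is SentencePiece-based), along with corresponding input and output examples?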
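
As an illustration of the kind of input/output example the suggestion asks for, here is a minimal sketch of the greedy longest-match-first splitting that WordPiece-style tokenizers apply to a single word. This is not PaddleNLP's implementation: the function name `wordpiece_tokenize`, the `TOY_VOCAB` vocabulary, and the default `[UNK]` and `##` markers are assumptions chosen for illustration. SentencePiece-style tokenizers differ in that they operate on raw text and mark word boundaries with a `▁` symbol instead of a continuation prefix.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it appears in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"token", "##izer", "##s", "un", "##related"}

print(wordpiece_tokenize("tokenizers", TOY_VOCAB))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("unrelated", TOY_VOCAB))   # ['un', '##related']
print(wordpiece_tokenize("xyz", TOY_VOCAB))         # ['[UNK]']
```

A real tokenizer would first normalize the text and split it into words before applying this per-word step, and would load its vocabulary from the model's files.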
