bert
Here are 1,992 public repositories matching this topic...
Natural Language Processing Tutorial for Deep Learning Researchers
Updated Jul 25, 2021 - Jupyter Notebook
Mapping a variable-length sentence to a fixed-length vector using BERT model
Updated Jul 1, 2021 - Python
Large Scale Chinese Corpus for NLP
Updated Oct 22, 2020
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
Updated Dec 17, 2021 - Python
Updated Dec 31, 2021 - Rust
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
Updated Dec 27, 2021 - Python
Chooses 15% of tokens
The paper says:
Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy.
This means that 15% of the tokens are chosen for certain.
In https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, however, each individual token has a 15% chance of going through the follow-up procedure.
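The difference between the two readings can be sketched in a few lines of Python. This is only an illustration of the point raised in the issue, not code from either the paper or the linked repository; the function names `select_per_token` and `select_fixed_fraction` are hypothetical, and rounding to at least one position in the second function is an assumption.

```python
import random


def select_per_token(tokens, prob=0.15, rng=random):
    """Pick each position independently with probability `prob`.

    The number of selected positions varies from call to call and can be zero.
    """
    return [i for i in range(len(tokens)) if rng.random() < prob]


def select_fixed_fraction(tokens, frac=0.15, rng=random):
    """Pick a fixed number of positions, round(frac * n), and at least one.

    The "at least one" floor is an assumption made for this sketch.
    """
    k = max(1, round(len(tokens) * frac))
    return sorted(rng.sample(range(len(tokens)), k))


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "my dog is hairy".split()
    # Per-token selection: the count differs between draws.
    print([len(select_per_token(sentence, rng=rng)) for _ in range(10)])
    # Fixed-fraction selection: always exactly one position for four tokens.
    print([len(select_fixed_fraction(sentence, rng=rng)) for _ in range(10)])
```

On a short sentence the per-token variant frequently selects nothing at all, while the fixed-fraction variant always selects the same number of positions; on long sequences the two agree on average.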
PositionalEmbedding
TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning, plus private server services
Updated Feb 24, 2021 - Python
_handle_duplicate_documents and _drop_duplicate_documents in the Elasticsearch document store always report self.index as the index with the conflict, which is incorrect.
Edit: Upon further investigation, this is actually a lot worse. Using multiple indices with the ElasticSearch DocumentStore is completely broken because this is used in `_handle_duplicate_do
A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS, large-scale Chinese pre-trained ALBERT models
Updated Oct 22, 2020 - Python
BertViz: Visualize Attention in Transformer Models ( BERT, GPT-2, BART, etc.)
Updated Dec 29, 2021 - Python
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Updated Jul 15, 2021 - Jupyter Notebook
pycorrector is a toolkit for text error correction. It implements Kenlm, Seq2Seq_Attention, BERT, MacBERT, ELECTRA, ERNIE, Transformer, and other models, ready to use out of the box.
Updated Dec 28, 2021 - Python
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Updated Sep 12, 2021 - Python
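Suggestion: add tokenizer types and examples to the documentation
You are welcome to report problems with using PaddleNLP, and thank you very much for your contribution to PaddleNLP!
When filing your issue, please also provide the following information:
- Version and environment information
1) PaddleNLP and PaddlePaddle versions: please provide your PaddleNLP and PaddlePaddle version numbers, e.g. PaddleNLP 2.0.4, PaddlePaddle 2.1.1
2) System environment: please describe the system type, e.g. Linux/Windows/MacOS, and the Python version
- Reproduction information: if reporting an error, please give the reproduction environment and reproduction steps
paddle version 2.0.8, paddlenlp version 2.1.0
Suggestion: could the PaddleNLP documentation list which type each model's tokenizer is based on (for example, the BERT tokenizer is WordPiece-based and the XLNet tokenizer is SentencePiece-based), along with corresponding input and output examples?
Some specific suggestions: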
State of the Art Natural Language Processing
Updated Jan 1, 2022 - Scala
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Updated Nov 11, 2021 - Python
Implementation of BERT that could load official pre-trained models for feature extraction and prediction
Updated Jun 19, 2021 - Python
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Updated Jul 9, 2021 - Python
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Updated Aug 26, 2021 - Python
A curated list of pretrained sentence and word embedding models
Updated Apr 23, 2021 - Python
Chinese pre-trained RoBERTa models: RoBERTa for Chinese
Updated Jul 8, 2021 - Python
LightSeq: A High Performance Library for Sequence Processing and Generation
Updated Dec 29, 2021 - Cuda
Multi-Task Deep Neural Networks for Natural Language Understanding
Updated Nov 13, 2021 - Python
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Updated Dec 27, 2021 - Python
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Updated Jan 1, 2022 - Python
BERT NLP papers, applications, and GitHub resources, including the newest XLNet; BERT- and XLNet-related papers and GitHub projects
Updated Mar 21, 2021
Super easy library for BERT based NLP models
Updated Dec 24, 2021 - Python


Fast Tokenizer for DeBERTA-V3 and mDeBERTa-V3
Motivation
DeBERTa V3 is an improved version of DeBERTa. With the V3 version, the authors also released a multilingual model "mDeBERTa-base" that outperforms XLM-R-base. However, DeBERTa V3 currently lacks a FastTokenizer implementation which makes it impossible to use with some of the example scripts (They require a Fa