Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
From the code (input_pipeline.py) I can see that the ParallelTextInputPipeline automatically generates the SEQUENCE_START and SEQUENCE_END tokens (which means that the input text does not need to have those special tokens).
Does `ParallelTextInputPipeline` also perform **padding**?
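A minimal sketch of what such an input pipeline typically does: wrapping each tokenized sequence with start/end markers and right-padding a batch to a common length. The names and behavior here are illustrative assumptions, not tf-seq2seq's actual API.

```python
SEQUENCE_START = "<s>"
SEQUENCE_END = "</s>"
PAD = "<pad>"

def add_special_tokens(tokens):
    # Wrap a token list with explicit start/end markers, as the
    # pipeline is described as doing automatically for raw text.
    return [SEQUENCE_START] + tokens + [SEQUENCE_END]

def pad_batch(batch):
    # Right-pad every sequence to the longest one in the batch.
    max_len = max(len(seq) for seq in batch)
    return [seq + [PAD] * (max_len - len(seq)) for seq in batch]

batch = [add_special_tokens(s.split()) for s in ["hello world", "hi"]]
padded = pad_batch(batch)
```

After this, `padded[1]` is `["<s>", "hi", "</s>", "<pad>"]`, so both sequences in the batch share one length.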
When positional encoding is disabled, the embedding scaling is also disabled even though the operations are independent:
https://github.com/OpenNMT/OpenNMT-py/blob/1.0.0/onmt/modules/embeddings.py#L48
As a consequence, Transformer models with relative position representations do not follow the reference implementation, which scales the embedding [by default](https://github.com/tensorflow/tensor
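The reference Transformer scales embeddings by sqrt(d_model) independently of whether a positional encoding is added. A hedged sketch of keeping the two choices decoupled (illustrative only, not OpenNMT-py's actual code):

```python
import math

def embed(ids, table, d_model, scale=True, position_encoding=None):
    # Look up embedding vectors for the given token ids.
    vecs = [table[i] for i in ids]
    # Scale by sqrt(d_model) regardless of whether a positional
    # encoding is applied -- the two operations are independent,
    # which is the point of the issue above.
    if scale:
        factor = math.sqrt(d_model)
        vecs = [[x * factor for x in v] for v in vecs]
    # Optionally add a (precomputed) positional encoding per step.
    if position_encoding is not None:
        vecs = [[x + p for x, p in zip(v, position_encoding[t])]
                for t, v in enumerate(vecs)]
    return vecs

vecs = embed([0], {0: [1.0] * 4}, d_model=4)  # scaled by sqrt(4) = 2
```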
Current documentation in README explains how to install the toolkit and how to run examples. However, I don't think this is enough for users who want to make some changes to the existing recipes or make their own new recipe. In that case, one needs to understand what run.sh does step by step, but I think docs for that are missing at the moment. It would be great if we provide documentation for:
Lingvo
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Hi,
Is it possible to add benchmarks of some models to the documentation for comparison purposes?
Run time would also be helpful; for example, 1M iterations take a weekend on a GTX 1080.
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Search across all Chinese NLP datasets, with commonly used English NLP datasets included
Open-Source Neural Machine Translation in Tensorflow
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
TransformerDecoder.forward: where does self.training come from?
https://github.com/asyml/texar-pytorch/blob/d17d502b50da1d95cb70435ed21c6603370ce76d/texar/torch/modules/decoders/transformer_decoders.py#L448-L449
All arguments should state their types explicitly in the docstring. E.g., what is the type of `infer_mode`? The [method signature](https://texar-pytorch.readthedocs.
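For context on the question in the issue title above: in PyTorch, `self.training` is a boolean attribute inherited from `torch.nn.Module`, flipped by `.train()` and `.eval()`. A simplified stand-in for that mechanism (not the actual torch implementation, and the decoder behavior shown is only illustrative):

```python
class Module:
    # Simplified stand-in for torch.nn.Module: every module carries
    # a `training` flag that train()/eval() flip recursively.
    def __init__(self):
        self.training = True
        self._children = []

    def train(self, mode=True):
        self.training = mode
        for child in self._children:
            child.train(mode)
        return self

    def eval(self):
        return self.train(False)

class TransformerDecoder(Module):
    def forward(self, x):
        # Decoders typically branch on self.training, e.g. between
        # teacher forcing (training) and autoregressive inference.
        return "train_path" if self.training else "infer_path"

dec = TransformerDecoder()   # dec.training is True by default
dec.eval()                   # flips dec.training to False
```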
Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
Neural Machine Translation with Keras
An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
Based on this line of code:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/decoders/output_projection.py#L125
The current implementation isn't flexible enough: if we train a "submodel" (e.g. a decoder without attention, not containing any ctx_tensors), we cannot use the trained variables to initialize a model with attention defined, because the size of the dense layer's matrix input becomes different.
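The size mismatch can be made concrete with a small sketch. Assuming (as the linked code suggests) that the output projection consumes the decoder state, the previous-output embedding, and any context tensors concatenated along the feature axis, the projection's weight-matrix width depends on which contexts exist; the function name and sizes below are hypothetical:

```python
def output_projection_input_width(state_size, embedding_size, ctx_sizes):
    # The projection consumes [state; embedding; contexts] concatenated
    # on the feature axis, so a model trained without attention
    # (ctx_sizes == []) has a narrower weight matrix than one trained
    # with attention, and the variables cannot be reused directly.
    return state_size + embedding_size + sum(ctx_sizes)

without_attn = output_projection_input_width(512, 256, [])     # 768
with_attn = output_projection_input_width(512, 256, [512])     # 1280
```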
ByteNet for character-level language modelling
Minimalist NMT for educational purposes
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Python port of Moses tokenizer, truecaser and normalizer
Machine-Translation-based sentence alignment tool for parallel text
A PyTorch implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation"
Implementation of Dual Learning NMT on PyTorch
Open-Source Machine Translation Quality Estimation in PyTorch
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Environment
tensorflow==1.14.0
Log
$ python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
WARNING:tensorflow:From build_vocab.py:44: VocabularyProcessor.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorfl
[ACL 2020] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
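The deprecation warning in the log above comes from `tf.contrib.learn`'s `VocabularyProcessor`, which was removed along with `tf.contrib`. For the simple vocabulary-building use shown in `build_vocab.py`, plain Python suffices; this sketch assumes whitespace tokenization and is not the script's actual implementation:

```python
from collections import Counter

def build_vocab(lines, min_count=1):
    # Count whitespace-separated tokens and emit one vocabulary
    # entry per token, most frequent first -- replacing the
    # deprecated VocabularyProcessor for this use case.
    counts = Counter(tok for line in lines for tok in line.split())
    return [tok for tok, c in counts.most_common() if c >= min_count]

vocab = build_vocab(["the cat sat", "the cat"])
```

Writing the result with one token per line reproduces the `> vocab.en` output format used in the command above.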
Code for AAAI2020 paper "Graph Transformer for Graph-to-Sequence Learning"
Description
I am wondering when "Assessing the Factual Accuracy of Generated Text" in https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wikifact will be publicly available, since it's already been 6 months. @bengoodrich