Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers process and interact with human language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

🚀 Feature request

Add a better error message to HubertForCTC and Wav2Vec2ForCTC when labels are larger than the vocab size.

Motivation

Following this issue: huggingface/transformers#12264, it is clear that an error should be raised if any of the labels are > self.config.vocab_size, or else silent errors can sneak into the training script.
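A minimal sketch of the kind of check the request describes, written in plain Python rather than against the actual model code. The function name `check_labels_within_vocab` and the `IGNORE_INDEX` padding value of -100 are assumptions made for illustration, not the library's API. The sketch also assumes valid label ids run from 0 to vocab_size - 1, so a label equal to the vocab size is rejected as well.

```python
from typing import Iterable

# Assumption: label positions filled with this value are padding and are skipped.
IGNORE_INDEX = -100


def check_labels_within_vocab(labels: Iterable[Iterable[int]], vocab_size: int) -> None:
    """Raise ValueError if any non-ignored label id falls outside [0, vocab_size)."""
    for row_idx, row in enumerate(labels):
        for label in row:
            if label == IGNORE_INDEX:
                continue
            if label < 0 or label >= vocab_size:
                raise ValueError(
                    f"Label id {label} in sequence {row_idx} is outside the valid "
                    f"range [0, {vocab_size - 1}] for vocab_size={vocab_size}."
                )


# Passes silently: every label is a valid id or padding.
check_labels_within_vocab([[0, 1, 31], [5, IGNORE_INDEX]], vocab_size=32)
```

Failing early with the offending id and sequence index makes the problem visible before training starts, instead of surfacing later as a silent error.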

So w

In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        window=m.ws,
        epochs=m.epoch,
        negative=m.neg,
        # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
        # or model=3 supervi

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file itself and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
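One way to handle this, sketched with the Python standard library only: detect gzip input by its magic bytes and return a text handle either way, so line-reading code does not need to know whether the file is compressed. The helper names `open_maybe_compressed` and `read_lines` are hypothetical and are not part of AllenNLP.

```python
import gzip
from typing import Iterator, TextIO


def open_maybe_compressed(path: str) -> TextIO:
    """Open a text file, transparently decompressing it when it is gzipped."""
    # Check the gzip magic bytes rather than trusting the file extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_lines(path: str) -> Iterator[str]:
    """Yield non-empty lines from a plain or gzipped text file."""
    with open_maybe_compressed(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line
```

With a helper like this, the same loop can read `data.jsonl` and `data.jsonl.gz` without any change to the calling code.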

@lhoestq

Hi
The Wikiann dataset needs to have a "spans" column, which is necessary to be able to use this dataset, but this column is missing from huggingface datasets. Could you please have a look? Thank you, @lhoestq

We have separate CI and CD pipelines, which actually share a lot of code (like the testing part). This is a bit annoying to maintain at the moment, because we have to make every change twice. It would be great to modularize this somehow so that both workflows


Here are 14,884 public repositories matching this topic...

huggingface / transformers

apachecn / AiLearning

google-research / bert

hankcs / HanLP

explosion / spaCy

oxford-cs-deepnlp-2017 / lectures

virgili0 / Virgilio

RaRe-Technologies / gensim

keon / awesome-nlp

bharathgs / Awesome-pytorch-list

RasaHQ / rasa

flairNLP / flair

allenai / allennlp

chiphuyen / stanford-tensorflow-tutorials

nltk / nltk

spencermountain / compromise

NLP-LOVE / ML-NLP

hanxiao / bert-as-service

botpress / botpress

graykode / nlp-tutorial

huggingface / datasets

stanfordnlp / CoreNLP

sloria / TextBlob

jina-ai / jina

brightmart / text_classification

PaddlePaddle / PaddleHub

crownpku / Awesome-Chinese-NLP

brightmart / nlp_chinese_corpus

dragen1860 / TensorFlow-2.x-Tutorials

NLPchina / ansj_seg