Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Intuitive find & replace CLI (sed alternative)
Text Classification Algorithms: A Survey
Python library for creating PEG parsers
A simple Python module for parsing human names into their individual components
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Open Korean Text Processor - An Open-source Korean Text Processor
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, with 330 million English tweets).
A fast implementation of Aho-Corasick in Rust.
pip install -r requirements.txt
python -m spacy download en_core_web_sm # depends on which language you want to use: https://spacy.io/usage/models
THE String Processing Package for R (with ICU)
A low level regular expression library that uses deterministic finite automata.
A tool that lets you detect and translate text.
Python library for Natural Language Preprocessing (NLPre)
Text vectorization tool to outperform TFIDF for classification tasks
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Extract indicators of compromise from text, including "escaped" ones.
A web app to create and browse text visualizations for automated customer listening.
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku and Zenkaku
Stanford NLP group's shared Python tools.
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Preprocessing Library for Natural Language Processing
I'd like to be able to run commands on all lines of a file. For example,
bsed wrap lines with "
should execute on all lines of the file. The current workaround is to include some trivial filter, like
wrap lines containing '.' with "
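To make the requested behavior concrete, here is a minimal Python sketch of "wrap every line, with an optional filter". It is only an illustration and says nothing about how bsed is implemented or what syntax it accepts; the function name `wrap_lines` and its `wrapper` and `predicate` parameters are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional


def wrap_lines(
    lines: Iterable[str],
    wrapper: str = '"',
    predicate: Optional[Callable[[str], bool]] = None,
) -> Iterator[str]:
    """Wrap each line in `wrapper`, keeping its trailing newline intact.

    Hypothetical helper: with no predicate, every line is wrapped, which is
    the behavior the request asks for. With a predicate, only matching lines
    are wrapped and the rest pass through unchanged.
    """
    for line in lines:
        body = line.rstrip("\n")      # line content without the newline
        newline = line[len(body):]    # whatever newline was stripped, if any
        if predicate is None or predicate(body):
            yield f"{wrapper}{body}{wrapper}{newline}"
        else:
            yield line


if __name__ == "__main__":
    sample = ["first line\n", "\n", "last line"]
    for out in wrap_lines(sample):
        print(out, end="")
    print()
```

Making the filter optional, rather than requiring a trivial always-true filter, keeps the common "apply to everything" case as the default.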