Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Intuitive find & replace CLI (sed alternative)
Text Classification Algorithms: A Survey
Python library for creating PEG parsers
A simple Python module for parsing human names into their individual components
PyNLPl, pronounced 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Simple SQL-like syntax on top of Perl text processing.
Open Korean Text Processor - An Open-source Korean Text Processor
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).
A fast implementation of Aho-Corasick in Rust.
Textpipe: clean and extract metadata from text
A low level regular expression library that uses deterministic finite automata.
UNIC: Unicode and Internationalization Crates for Rust
A tool that allows you to detect and translate text.
Text vectorization tool designed to outperform TF-IDF on classification tasks
Utility collection for Japanese text processing: Hiraganize, Katakanize, and Romanize.
Extract indicators of compromise from text, including "escaped" ones.
Python library for Natural Language Preprocessing (NLPre)
A web app to create and browse text visualizations for automated customer listening.
Stanford NLP group's shared Python tools.
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku and Zenkaku
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Preprocessing Library for Natural Language Processing
CogComp's light-weight Python NLP annotators
A Golang library for processing Asciidoc files.