Updated Oct 16, 2021 - Shell
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Intuitive find & replace CLI (sed alternative)
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Text Classification Algorithms: A Survey
Python library for creating PEG parsers
Program to convert lines of text into a tree structure.
A fast implementation of Aho-Corasick in Rust.
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).
A simple Python module for parsing human names into their individual components
Open Korean Text Processor - An Open-source Korean Text Processor
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. Most notably, PyNLPl features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
I'd like to be able to run commands on all lines of a file. For example, bsed wrap lines with " should execute on all lines of the file. The current workaround is to include a trivial filter, such as wrap lines containing '.' with "
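The requested behaviour can be illustrated with a minimal Python sketch. This is not bsed's implementation; the function name `wrap_lines` and its `containing` parameter are hypothetical, and the filter is assumed to be a literal substring match rather than a pattern.

```python
from typing import Optional


def wrap_lines(text: str, wrapper: str, containing: Optional[str] = None) -> str:
    """Wrap lines of `text` in `wrapper`.

    If `containing` is None, every line is wrapped (the requested behaviour).
    Otherwise only lines that include the literal substring `containing`
    are wrapped (an assumed stand-in for the filter in the workaround).
    """
    out = []
    for line in text.splitlines():
        if containing is None or containing in line:
            out.append(f"{wrapper}{line}{wrapper}")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "first line.\nsecond line"
    # No filter: both lines are wrapped.
    print(wrap_lines(sample, '"'))
    # Literal '.' filter: only the line containing a period is wrapped.
    print(wrap_lines(sample, '"', containing="."))
```

Under this literal-match assumption, the '.' workaround skips any line that has no period, which shows why a filter-free form that applies to all lines is useful.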
Textpipe: clean and extract metadata from text
A low level regular expression library that uses deterministic finite automata.
Automatic Korean word spacing with Python
Are there any cheat sheets available for stringi? Like this one for stringr: http://edrub.in/CheatSheets/cheatSheetStringr.pdf
A cheat sheet would be helpful, since base R, stringr, and stringi have similar but different syntax, which can sometimes be confusing.
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Text vectorization tool to outperform TFIDF for classification tasks
A tool that lets you detect and translate text.
Python library for Natural Language Preprocessing (NLPre)
Short text clustering preprocessing module (Short text cluster)
Recreated sources for the book "UNIX Text Processing," published in 1987.
Please change, e.g., https://github.com/sstadick/hck/releases/download/v0.7.1/hck-windows-amd64 to https://github.com/sstadick/hck/releases/download/v0.7.1/hck-windows-amd64.exe ;) Thanks :)