LunaSec - Open Source AppSec platform that automatically notifies you the next time vulnerabilities like Log4Shell or node-ipc happen. Track your dependencies and builds in a centralized service. Get started in one-click via our GitHub App or host it yourself. https://github.com/apps/lunatrace-by-lunasec/
A secure user directory built for developers to comply with the GDPR
Unsupervised text tokenizer focused on computational efficiency
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two big corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).
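The hashtag-splitting step ekphrasis describes relies on corpus word statistics. A minimal sketch of that idea (this is not ekphrasis' actual API; the tiny frequency table below is invented for illustration, where ekphrasis would use counts from Wikipedia and Twitter):

```python
import math

# Hypothetical unigram counts; ekphrasis derives these from large corpora.
COUNTS = {"make": 50, "america": 30, "great": 60, "again": 40, "a": 500, "me": 200}
TOTAL = sum(COUNTS.values())

def word_prob(word):
    # Smoothed unigram probability; unseen words get a strong length penalty.
    return COUNTS.get(word, 0.01 / 10 ** len(word)) / TOTAL

def segment(text):
    """Viterbi-style DP: best split of `text` into words by unigram log-prob."""
    best = [(0.0, [])]  # best[i] = (log-prob, words) for text[:i]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - 12), i):  # cap candidate word length at 12
            prev_lp, prev_words = best[j]
            lp = prev_lp + math.log(word_prob(text[j:i]))
            candidates.append((lp, prev_words + [text[j:i]]))
        best.append(max(candidates))
    return best[-1][1]

print(segment("makeamericagreatagain"))  # → ['make', 'america', 'great', 'again']
```

The same dynamic program handles any concatenated string, which is why a hashtag like `#makeamericagreatagain` can be split without an explicit delimiter.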
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
The slides for most of the courses need to be unzipped.
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are currently implemented.
Rule-based token, sentence segmentation for Russian language
Fast and customizable text tokenization library with BPE and SentencePiece support
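For context, the byte-pair encoding (BPE) scheme such libraries support can be sketched as the classic merge-learning loop; this is an illustrative toy, not the library's own optimized implementation:

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the corpus vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its merged symbol."""
    merged, new_symbol = " ".join(pair), "".join(pair)
    return {word.replace(merged, new_symbol): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Words are pre-split into characters, separated by spaces.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges, vocab = learn_bpe(corpus, 3)
print(merges)  # → [('e', 's'), ('es', 't'), ('l', 'o')]
```

Production tokenizers apply the learned merge list greedily at encode time, which is where the speed-focused implementations differ from this sketch.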
An official Sudachi clone in Rust
TokenScript schema, specs and paper
Simple NLP in Rust with Python bindings
This repository is a complete guide to natural language processing (NLP) in Python, covering various NLP techniques, including parsing and text processing, and showing how to use NLP for text feature engineering.
High performance tokenizers for natural language processing and other related tasks
Implementation of the GBST block from the Charformer paper, in PyTorch
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Language Modeling and Text Classification in Malayalam Language using ULMFiT
Collection of Wongnai's datasets
How easy would it be to change the library to add versions of the encode and decode functions where the payload JSON is provided / returned as raw JSON text?
There are other good JSON generation/parsing libraries available, and some people may wish to use them to generate or process the payload rather than the built-in claim processing.
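What such a raw-JSON variant might look like can be sketched with the standard library alone. `encode_raw` and `decode_raw` are hypothetical names, not this library's API, and the sketch is a bare HS256 JWS signer with none of the claim validation a real JWT library performs:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # base64url without padding, as JWS requires.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def encode_raw(payload_json: str, key: bytes) -> str:
    """Sign a payload that is already JSON text, skipping claim processing."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(payload_json.encode())
    signing_input = f"{header}.{body}".encode("ascii")
    sig = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def decode_raw(token: str, key: bytes) -> str:
    """Verify the signature and return the payload as raw JSON text."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    pad = "=" * (-len(body) % 4)
    return base64.urlsafe_b64decode(body + pad).decode()

token = encode_raw('{"sub":"alice"}', b"secret")
print(decode_raw(token, b"secret"))  # → {"sub":"alice"}
```

The point is that signing and verification only need the payload bytes, so a library could expose raw-text variants and leave serialization to whichever JSON library the caller prefers.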
Natural Language Processing Toolkit in Golang
Currently, different transaction types show different details. Asset transactions don't show "To" and "From". On RVN transactions, the "From" is almost always listed as "unknown". Maybe change this to list a sending address, and if one of the sending addresses has a label, put the label after it in parentheses, as is done in the "To" section. See inconsistencies below:
![assettx](htt