Updated
Jul 8, 2022 - Python
Natural Language Processing Best Practices & Examples
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more
Snips Python library to extract meaning from text
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
DELTA is a deep learning based natural language and speech processing platform.
Deprecated in favor of https://github.com/facebook/duckling
An open-source AI chatbot platform builder written entirely in C#, running on .NET Core with machine learning algorithms.
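A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front end is integrated with Django, and RESTful interfaces for the NLP and KG components are already packaged.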
Rasa UI is a frontend for the Rasa Framework
User Simulation for Task-Completion Dialogues
The Matrix connector randomly crashes when connected to a public room.
Create a skill and point it to #geeklab:linuxdelta.com and wait for people t
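A chatbot based on natural language understanding and machine learning, supporting concurrent multi-user access and customizable multi-turn dialogues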
Hardware-accelerated vector database and search engine. Available as an HTTP service or as an embedded library.
One line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
JavaScript Web SDK for Dialogflow
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
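API of Articut, a Chinese word segmenter with semantic part-of-speech tagging. Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.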
Pen.el stands for Prompt Engineering in Emacs. It facilitates the creation, discovery and usage of prompts to language models. Pen supports OpenAI, EleutherAI, Aleph-Alpha, HuggingFace and others. It's the engine for the LookingGlass imaginary web browser.
A collection of resources to make a smart speaker
The first large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code. (AACL-IJCNLP 2020)
Description
When tokenizers.create is called with the model and vocab file for a custom corpus, the code throws an error and fails to generate the BERT vocab file.
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
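The message suggests that every special token passed to the tokenizer is expected to already exist as a control token in the sentencepiece vocabulary. A minimal sketch of that kind of check, in plain Python, may help narrow down which tokens are the problem. The function name `missing_control_tokens` and the token lists are hypothetical illustrations, not GluonNLP's actual code or values from this report.

```python
def missing_control_tokens(special_tokens, control_tokens):
    """Return the special tokens that are not registered as control tokens."""
    control = set(control_tokens)
    return [tok for tok in special_tokens if tok not in control]


# Hypothetical example values, not taken from the report above.
control_tokens = ["<s>", "</s>", "<pad>"]
special_tokens = ["<s>", "</s>", "<pad>", "[CLS]", "[SEP]"]

missing = missing_control_tokens(special_tokens, control_tokens)
if missing:
    # Any token listed here would need to be present as a control token
    # in the sentencepiece model before the tokenizer accepts it.
    print("Not control tokens in the vocabulary:", missing)
```

If a check like this reports tokens such as `[CLS]` or `[SEP]`, one likely remedy is to retrain the sentencepiece model with those tokens declared as control symbols, though the exact fix depends on how the model was built.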
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p