Comprehensive Chinese NLP resource collection for developers
Large-scale Chinese natural language processing corpus for training and fine-tuning language models
Gathers text and metadata from the web using crawling, scraping, and extraction techniques.
A research corpus for benchmarking AI systems on abstract reasoning tasks.
A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.
This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.
A Chinese name corpus and generator for natural language processing and entity recognition.
CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.
A pre-trained ALBERT model for self-supervised learning of Chinese language representations.
A collection of deep learning and reinforcement learning research papers, code, and resources.
A powerful Python-based web scraper for extracting data from Weibo, a popular Chinese social media platform.
A collection of awesome chatbot projects, corpora, papers, and tutorials for developers working with AI-powered chatbots.
A deep learning model for analyzing a large corpus of cleartext passwords.
A single-document unsupervised keyword extraction tool focused on AI and machine learning use cases.
A large annotated semantic parsing corpus for developing natural language interfaces.
A benchmark for evaluating language understanding models and datasets for the Chinese language.
WikiChat is an improved Retrieval Augmented Generation (RAG) model that reduces language model hallucination by retrieving data from a corpus.
An open-source tool for annotating Chinese text corpora, useful for NLP and text analysis projects.
A Python library that implements a transformer-based seq2seq model for language translation tasks.
A parallel corpus of classical Chinese and modern Chinese texts for language processing and analysis.