Explore Projects

Discover 26 open source projects

Search: corpus

Showing 1-20 of 26 projects

fighting41love/funNLP

Comprehensive Chinese NLP resource collection for developers

79.2K
Archived
Python
LLM Frameworks
RAG & Vector
Python
#nlp #chinese-nlp #ai-resources

brightmart/nlp_chinese_corpus

Large-scale Chinese natural language processing corpus for training and fine-tuning language models

9.9K
Stable
LLM Frameworks
#chinese #nlp #corpus

adbar/trafilatura

Gathers text and metadata from the web using crawling, scraping, and extraction techniques.

5.4K
Stable
Python
React
#web-scraping #text-extraction #metadata-gathering

fchollet/ARC-AGI

A research corpus for benchmarking AI systems on abstract reasoning tasks.

4.7K
Experimental
JavaScript
Agents & Orchestration
Computer Vision
JavaScript
#artificial-intelligence #intelligence-testing #psychometrics

CLUEbenchmark/CLUEDatasetSearch

A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.

4.4K
Archived
Python
Datasets
Tutorials & Courses
Python
#chinese #nlp #datasets

first20hours/google-10000-english

A list of the 10,000 most common English words, useful for NLP and language modeling tasks.

4.3K
Archived
Databases
NLP
#nlp #language-modeling #dataset

wainshine/Chinese-Names-Corpus

A Chinese name corpus and generator for natural language processing and entity recognition.

4.3K
Stable
Databases
CLI Tools
#corpus #dataset #names

CLUEbenchmark/CLUE

CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.

4.2K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#chinese #nlp #bert

brightmart/albert_zh

A pre-trained ALBERT model for self-supervised learning of Chinese language representations.

4.0K
Archived
Python
LLM Frameworks
API Frameworks
PyTorch
#albert #bert #chinese-nlp

endymecy/awesome-deeplearning-resources

A collection of deep learning and reinforcement learning research papers, code, and resources.

3.0K
Active
React
#deep-learning #reinforcement-learning #research-papers

lucasjinreal/weibo_terminater

A powerful Python-based web scraper for extracting data from Weibo, a popular Chinese social media platform.

2.3K
Archived
Python
Backend & APIs
Scraping & ETL
Python
#web-scraper #social-media-data #chinese-corpus

fendouai/Awesome-Chatbot

A collection of awesome chatbot projects, corpora, papers, and tutorials for developers working with AI-powered chatbots.

2.2K
Archived
Python
LLM Frameworks
API Frameworks
TensorFlow
#chatbot #nlp #seq2seq

philipperemy/tensorflow-1.4-billion-password-analysis

A deep learning model for analyzing a large corpus of clear-text passwords.

2.0K
Archived
Python
Natural Language Processing
Deep Learning
TensorFlow
#deep-learning #natural-language-processing #password-analysis

INESCTEC/yake

A single-document unsupervised keyword extraction tool focused on AI and machine learning use cases.

1.8K
Stable
Jupyter Notebook
Corpus-Independent Keyword Extraction
#ai #keyword-extraction #unsupervised

salesforce/WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.

1.8K
Stable
HTML
React
#natural-language-interface #semantic-parsing #corpus

ChineseGLUE/ChineseGLUE

A benchmark for evaluating language understanding models and datasets for the Chinese language.

1.8K
Archived
Python
LLM Frameworks
Datasets
Python
#nlp #benchmarking #language-understanding

stanford-oval/WikiChat

WikiChat is an improved Retrieval Augmented Generation (RAG) model that reduces language model hallucination by retrieving data from a corpus.

1.6K
Active
Python
RAG & Vector
LLM Frameworks
Python
#chatbot #factuality #emnlp2023

deepwel/Chinese-Annotator

An open-source tool for annotating Chinese text corpora, useful for NLP and text analysis projects.

1.5K
Archived
JavaScript
Computer Vision
Search
React
#nlp #text-annotation #chinese

SamLynnEvans/Transformer

A Python library that implements a transformer-based seq2seq model for language translation tasks.

1.4K
Archived
Python
LLM Frameworks
API Frameworks
#nlp #machine-translation #transformer

NiuTrans/Classical-Modern

A parallel corpus of classical Chinese and modern Chinese texts for language processing and analysis.

1.4K
Archived
Python
Databases
Tutorials & Courses
#corpus #parallel-corpus #traditional-chinese