Explore Projects

Discover 382 open source projects

Active filters (1):
Search: datasets

Showing 221-240 of 382 projects

zwang4/awesome-machine-learning-in-compilers

A collection of research papers and tools related to using machine learning for compiler and system optimization.

1.7K
Active
ML Ops
Build Tools
#machine-learning#compilers#optimization

Toyhom/Chinese-medical-dialogue-data

This repository contains a dataset of Chinese medical dialogues for NLP and conversational AI research.

1.6K
Archived
Python
LLM Frameworks
Datasets
#medical-data#chinese-language#natural-language-processing

bespokelabsai/curator

A Python library for synthetic data curation and structured data extraction for machine learning models.

1.6K
Active
Python
Synthetic Data
LLM Frameworks
#machine-learning#data-generation#data-curation

koaning/drawdata

A Python library that allows developers to easily draw datasets within their notebooks.

1.6K
Active
JavaScript
Databases
Charts & Visualization
#data#visualization#notebook

EleutherAI/the-pile

The Pile is a large, diverse language model training dataset for use in AI research and development.

1.6K
Archived
Python
LLM Frameworks
Datasets
#language-model#dataset#machine-learning

huggingface/aisheets

Build, enrich, and transform datasets using AI models, with no code.

1.6K
Stable
TypeScript
LLM Frameworks
AI SDKs & Wrappers
#ai#llms#nocode

teknium1/GPTeacher

A collection of modular datasets generated by GPT-4 for AI code generation and prompt engineering.

1.6K
Archived
Python
LLM Wrappers & SDKs
AI Code Generation
#llm#gpt#ai-generation

PKU-Alignment/safe-rlhf

A safe reinforcement learning from human feedback (RLHF) system for aligning large language models with human values.

1.6K
Stable
Python
LLM Frameworks
Reinforcement Learning
#ai-safety#large-language-models#reinforcement-learning

gururise/AlpacaDataCleaned

A cleaned and curated version of the Alpaca dataset from Stanford, useful for machine learning projects.

1.6K
Archived
Python
Datasets
#machine-learning#dataset#computer-vision

sudharsan13296/Awesome-Meta-Learning

A curated list of resources for meta-learning, including papers, code, books, and more for developers working with AI tools.

1.6K
Archived
LLM Frameworks
Tutorials & Courses
#meta-learning#few-shot-learning#one-shot-learning

lotus-data/lotus

A Python library that uses LLMs and embeddings to process datasets, with up to 1000x speedups.

1.6K
Active
Python
LLM Frameworks
ETL & Pipelines
#ai-data-processing#llm#semantic-search

capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

1.5K
Stable
Python
ETL & Pipelines
CLI Tools
#data-profiling#data-analysis#privacy

antontarasenko/smq

A collection of SQL queries to analyze social media datasets.

1.5K
Archived
TSQL
Databases
API Frameworks
#sql#social-media#data-analysis

allenai/bi-att-flow

An implementation of the BiDAF (Bi-Directional Attention Flow) network for question answering on the SQuAD dataset.

1.5K
Archived
Python
NLP
API Frameworks
PyTorch
#bidaf#question-answering#nlp

Autodesk/react-base-table

A highly performant and flexible React table component for displaying large datasets.

1.5K
Archived
JavaScript
Component Libraries (React)
UI Component Libraries
React
#react#table#virtualized

yechens/NL2SQL

A project that brings together text-to-SQL datasets, solutions, and research papers for developers working with AI tools.

1.5K
Archived
LLM Frameworks
Databases
#text-to-sql#dataset#research-papers

meta-llama/synthetic-data-kit

A tool for generating high-quality synthetic datasets.

1.5K
Stable
Python
React
#synthetic-data-kit#data-generation#llm

CLUEbenchmark/CLUENER2020

CLUENER2020 is a Chinese fine-grained named entity recognition dataset and benchmark for AI-powered NLP development.

1.5K
Archived
Python
Fine-tuning
Databases
#chinese-ner#named-entity-recognition#seq2seq

Vchitect/VBench

An open-source benchmarking tool for evaluating video generation models.

1.5K
Active
Python
React
#authentication#benchmarking#evaluation

facebookresearch/fastMRI

A large-scale dataset of raw MRI measurements and clinical MRI images for medical imaging research.

1.5K
Archived
Python
Computer Vision
Datasets
PyTorch
#medical-imaging#mri-reconstruction#deep-learning
