Explore Projects

Discover 382 open source projects

Search: dataset

Showing 221-240 of 382 projects

zwang4/awesome-machine-learning-in-compilers

A collection of research papers and tools related to using machine learning for compiler and system optimization.

1.7K

Active

ML Ops

Build Tools

#machine-learning #compilers #optimization

Toyhom/Chinese-medical-dialogue-data

This repository contains a dataset of Chinese medical dialogues for NLP and conversational AI research.

1.6K

Archived

Python

LLM Frameworks

Datasets

#medical-data #chinese-language #natural-language-processing

bespokelabsai/curator

A Python library for synthetic data curation and structured data extraction for machine learning models.

1.6K

Active

Python

Synthetic Data

LLM Frameworks

#machine-learning #data-generation #data-curation

koaning/drawdata

A Python library that allows developers to easily draw datasets within their notebooks.

1.6K

Active

JavaScript

Databases

Charts & Visualization

#data #visualization #notebook

EleutherAI/the-pile

The Pile is a large, diverse language model training dataset for use in AI research and development.

1.6K

Archived

Python

LLM Frameworks

Datasets

#language-model #dataset #machine-learning

huggingface/aisheets

Build, enrich, and transform datasets using AI models, with no code required.

1.6K

Stable

TypeScript

LLM Frameworks

AI SDKs & Wrappers

#ai #llms #nocode

teknium1/GPTeacher

A collection of modular datasets generated by GPT-4 for AI code generation and prompt engineering.

1.6K

Archived

Python

LLM Wrappers & SDKs

AI Code Generation

#llm #gpt #ai-generation

PKU-Alignment/safe-rlhf

A framework for safe reinforcement learning from human feedback (RLHF), aimed at aligning large language models with human values.

1.6K

Stable

Python

LLM Frameworks

Reinforcement Learning

#ai-safety #large-language-models #reinforcement-learning

gururise/AlpacaDataCleaned

A cleaned and curated version of the Alpaca dataset from Stanford, useful for machine learning projects.

1.6K

Archived

Python

Datasets

#machine-learning #dataset #computer-vision

sudharsan13296/Awesome-Meta-Learning

A curated list of resources for meta-learning, including papers, code, books, and more for developers working with AI tools.

1.6K

Archived

LLM Frameworks

Tutorials & Courses

#meta-learning #few-shot-learning #one-shot-learning

lotus-data/lotus

A Python library that uses LLMs and embeddings to process datasets, with claimed speedups of up to 1000x.

1.6K

Active

Python

LLM Frameworks

ETL & Pipelines

#ai-data-processing #llm #semantic-search

capitalone/DataProfiler

A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.

1.5K

Stable

Python

ETL & Pipelines

CLI Tools

#data-profiling #data-analysis #privacy

antontarasenko/smq

A collection of SQL queries to analyze social media datasets.

1.5K

Archived

TSQL

Databases

API Frameworks

#sql #social-media #data-analysis

allenai/bi-att-flow

A TensorFlow implementation of the BiDAF (Bi-Directional Attention Flow) network for question answering on the SQuAD dataset.

1.5K

Archived

Python

NLP

API Frameworks

PyTorch

#bidaf #question-answering #nlp

Autodesk/react-base-table

A highly performant and flexible React table component for displaying large datasets.

1.5K

Archived

JavaScript

Component Libraries (React)

UI Component Libraries

React

#react #table #virtualized

yechens/NL2SQL

A project that collects text-to-SQL datasets, solutions, and research papers.

1.5K

Archived

LLM Frameworks

Databases

#text-to-sql #dataset #research-papers

meta-llama/synthetic-data-kit

A tool for generating high-quality synthetic datasets.

1.5K

Stable

Python

React

#synthetic-data-kit #data-generation #llm

CLUEbenchmark/CLUENER2020

CLUENER2020 is a Chinese fine-grained named entity recognition dataset and benchmark for AI-powered NLP development.

1.5K

Archived

Python

Fine-tuning

Databases

#chinese-ner #named-entity-recognition #seq2seq

Vchitect/VBench

An open-source benchmarking tool for evaluating video generation models.

1.5K

Active

Python

React

#authentication #benchmarking #evaluation

facebookresearch/fastMRI

A large-scale dataset of raw MRI measurements and clinical MRI images for medical imaging research.

1.5K

Archived

Python

Computer Vision

Datasets

PyTorch

#medical-imaging #mri-reconstruction #deep-learning
