Explore Projects

Discover 382 open source projects

Active filters (1):
Search: dataset

Showing 361-380 of 382 projects

rixwew/pytorch-fm

A PyTorch library for building Factorization Machine models for click-through rate prediction tasks.

1.1K
Archived
Python
ML Ops
API Frameworks
PyTorch
#factorization-machines #ctr-prediction #collaborative-filtering

liucongg/NLPDataSet

A repository containing various NLP datasets collected and organized by the owner.

1.1K
Archived
Datasets
#nlp #datasets #data-collection

doc-analysis/TableBank

TableBank is a benchmark dataset for table detection and recognition, useful for building computer vision models.

1.1K
Archived
Computer Vision
#computer-vision #table-detection #table-recognition

mihail911/nlp-library

A curated collection of research papers and resources for natural language processing (NLP) practitioners.

1.1K
Archived
LLM Frameworks
Books & Guides
#nlp #deep-learning #language-model

tb0hdan/domains

A large dataset of Internet domains that can be used for search engine development and research.

1.1K
Active
JavaScript
API Frameworks
Databases
Node.js
#dataset #web-scraping #search-engines

databricks/lilac

An open-source Python library that helps curate better data for large language models (LLMs).

1.1K
Archived
Python
LLM Frameworks
Data Analysis
Python
#data-curation #unstructured-data #dataset-analysis

ternaus/TernausNet

A PyTorch implementation of the TernausNet model for image segmentation, pre-trained on the Kaggle Carvana dataset.

1.1K
Archived
Python
Computer Vision
Backend Frameworks
PyTorch
#image-segmentation #computer-vision #deep-learning

orobix/Prototypical-Networks-for-Few-shot-Learning-PyTorch

A PyTorch implementation of Prototypical Networks for Few-Shot Learning, a powerful technique for training AI models on small datasets.

1.1K
Archived
Python
ML Ops
Inference
PyTorch
#prototypical-networks #few-shot-learning #computer-vision

comet-ml/kangas

A Jupyter Notebook-based library for exploring and analyzing multimedia datasets at scale.

1.1K
Archived
Jupyter Notebook
Data Analysis
Dataframe
#data-exploration #data-visualization #machine-learning

NEU-Gou/awesome-reid-dataset

A collection of public person re-identification datasets for computer vision and AI research.

1.1K
Stable
Computer Vision
#computer-vision #dataset #person-re-identification

HRNet/HRNet-Image-Classification

A high-resolution network (HRNet) model for image classification trained on the ImageNet dataset.

1.0K
Archived
Python
Computer Vision
PyTorch
#image-classification #computer-vision #deep-learning

chatopera/insuranceqa-corpus-zh

An open-source corpus dataset for chatbots and question-answering systems in the insurance domain.

1.0K
Experimental
Python
LLM Frameworks
CMS & Content
#chatbot #corpus #dataset

piskvorky/gensim-data

A data repository for pre-trained NLP models and corpora to use in language processing projects.

1.0K
Archived
Python
LLM Frameworks
Databases
Python
#nlp #corpora #pretrained-models

google/cluster-data

A dataset of Borg cluster traces from Google, useful for researchers and developers working on distributed systems and cloud infrastructure.

1.0K
Stable
TeX
Databases
Monitoring
#distributed-systems #cloud-infrastructure #cluster-data

facebookresearch/cc_net

Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.

1.0K
Archived
Python
ETL & Pipelines
CLI Tools
Python
#data-processing #web-crawling #data-cleanup

PRBonn/lidar-bonnetal

A Python library for semantic and instance segmentation of LiDAR point clouds for autonomous driving.

1.0K
Archived
Python
Computer Vision
API Frameworks
Python
#lidar #point-cloud #segmentation

unrealcv/synthetic-computer-vision

A Python library for generating synthetic datasets and tools for computer vision applications.

1.0K
Archived
Python
Computer Vision
Datasets
#computer-vision #dataset #synthetic-data

OpenBioLink/ThoughtSource

A central, open resource for data and tools related to chain-of-thought reasoning in large language models.

1.0K
Archived
Jupyter Notebook
LLM Frameworks
Databases
Jupyter Notebook
#nlp #question-answering #reasoning

madnight/githut

A GitHub language statistics tool that provides insights into programming language usage across GitHub repositories.

1.0K
Archived
JavaScript
Charts & Visualization
ETL & Pipelines
React
#github-statistics #programming-languages #data-visualization

declare-lab/MELD

A multimodal dataset for emotion recognition in conversation, useful for building conversational AI and chatbots.

1.0K
Archived
Python
Computer Vision
Agents & Orchestration
Python
#emotion-recognition #multimodal-interactions #dialogue-systems
