Python ETL framework for real-time analytics and LLM pipelines.
A collection of handy Bash one-liners and terminal tricks for data processing and Linux system maintenance.
Miller is a powerful CLI tool for processing tabular data like CSV, TSV, and JSON, similar to awk, sed, and other Unix utilities.
A Go tool for selecting, updating, and deleting data from various file formats like JSON, YAML, and XML.
Data transformation framework for AI with ultra-fast, incremental processing capabilities.
A Python library for processing and analyzing data with foundation models and large language models.
A highly optimized GPU-accelerated library for deep learning training and inference applications.
A lightweight data processing framework built on DuckDB and 3FS for vibe coders working with AI tools.
LLM-based operators and pipelines for data preparation.
Numaflow is a Kubernetes-native platform to run massively parallel data/streaming jobs.
A large-scale pretrained dialogue model for building conversational AI applications.
Texar is a toolkit for machine learning, NLP, and text generation in TensorFlow, part of the CASL project.
Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.
Concurrent Python made simple, with support for asyncio, multiprocessing, and threading.
Scalable data preprocessing and curation toolkit for large language models (LLMs).
A Python library and tools for generating and inspecting data for pre-training large language models (LLMs).
A repository providing data science tools and examples for the Google Cloud Platform.
Distribute and run AI workloads on Kubernetes with a Python-based infrastructure toolkit, in the style of PyTorch.