
GitHub - huggingface/datasets: The largest hub of ready-to-use ...
🤗 Datasets is a lightweight library providing two main features: one-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (image datasets, …
datasets · GitHub Topics · GitHub
Dec 29, 2025 · GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.
GitHub - ncbi/datasets: NCBI Datasets is a new resource that lets you ...
NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases. - ncbi/datasets
GitHub - datasets/commons: DataHub commons. Wiki catalog of …
DataHub commons. Wiki catalog of interesting and important datasets - datasets/commons
GitHub - luminati-io/Free-datasets: A collection of multiple free ...
This repository contains a collection of free datasets with thousands of records for use in data analysis, machine learning, and research. The datasets span multiple domains, from business to social media …
A collection of datasets originally distributed in R packages
Rdatasets is a collection of 3499 datasets which were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more …
TensorFlow Datasets - GitHub
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ... - tensorflow/datasets
Releases · huggingface/datasets - GitHub
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools - huggingface/datasets
Google Research Datasets - GitHub
Datasets released by Google Research. Google Research Datasets has 172 repositories available.
GitHub - allenai/olmocr: Toolkit for linearizing PDFs for LLM datasets ...
Toolkit for linearizing PDFs for LLM datasets/training. License: Apache-2.0.