Health Data Analytics

This guide is for the introductory seminar for the Health Data Analytics MS program in the Department of Rehabilitation and Health Services.

Practice Datasets

These datasets are in open repositories for public use, and they have usually been compiled from other public and government sources. You may find them useful for practicing your skills. However, unlike some of the other publicly accessible raw data we link to, these datasets have already been compiled or analyzed in some way, so your professor or instructor may not allow their use in your course. Before using these data for an assignment, please check with your faculty.

Each dataset description below is followed by the license associated with that dataset.

UCI Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

Please note that this repository often links to datasets hosted on sites like Kaggle, which you may not be able to use for your assignments. These Kaggle datasets are usually cleaned-up versions of data that were originally published on public government websites.

Kaggle
Kaggle is a subsidiary of Google and an online community of data scientists and people from related fields. Kaggle allows users to find and publish datasets, explore and build models in a web-based data science environment, and work with other data scientists and machine learning engineers.

Many professors do not allow Kaggle datasets to be used in assignments. While these data might be useful for practicing your skills, please check with your faculty and your course syllabi before using any of them.

Most Kaggle data has been pulled from other public, often government, sources. If you find a dataset on Kaggle that you would like to use, check its provenance and trace it back to the source of the original raw data. Please keep in mind that because Kaggle projects have already been completed, many data analytics faculty do not allow projects that are too similar to work already posted on Kaggle.

Miscellaneous