
gitworkflows/Data-sets

 
 


Here you can find all data sets that are used in examples at Pythonfordatascience.org.

These data sets are open to the public and can be downloaded and used by anyone. The source of each data set is included in this README file.

To download all files, click the "Clone or download" drop-down arrow and select "Download ZIP". This downloads every data set in the repository. Alternatively, click the file you are interested in and then click the "Raw" button, which opens the file in the browser. The URL of that page can then be passed to pandas.read_csv() to import the data set.
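As a minimal sketch of the "Raw" URL option, the snippet below builds a raw-file URL and passes it to pandas.read_csv(). The helper names raw_url and load_dataset are hypothetical, and both the raw.githubusercontent.com URL pattern and the master branch name are assumptions; copy the actual URL from the "Raw" button if it differs.

```python
from urllib.parse import quote

import pandas as pd

# Assumptions: repository path taken from this page; branch name is a guess.
REPO = "gitworkflows/Data-sets"
BRANCH = "master"


def raw_url(filename, repo=REPO, branch=BRANCH):
    """Build the raw-file URL for a data set (hypothetical helper)."""
    # quote() escapes spaces, e.g. in "Energy Level.csv"
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{quote(filename)}"


def load_dataset(filename):
    """Read a data set straight from its raw URL into a DataFrame."""
    return pd.read_csv(raw_url(filename))


# Example (requires network access):
# df = load_dataset("Energy Level.csv")
# print(df.head())
```

File names that contain spaces are percent-encoded by quote(), so the same helper works for every file listed here.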

Data sets (in no particular order)
The Energy Level.csv data set is a simulated data set created for an independent t-test example that compares two groups, Group A and Group B, on some outcome measure. The values are integers from 0 to 9 (np.random.randint(10, ...) excludes the upper bound) and can represent anything that fits within that scale. It was created using the following Python code:

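import numpy as np
import pandas as pd

# Fix the seed so the simulated values are reproducible
np.random.seed(12345678)

# 100 rows of integers in 0-9, one column per group
df = pd.DataFrame(np.random.randint(10, size=(100, 2)), columns=['Group A', 'Group B'])

df.to_csv("Energy Level.csv", index=False)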
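To illustrate the intended use, here is a hedged sketch of an independent t-test on the two columns. It regenerates the data in memory with the same seed and calls rather than reading the CSV. Using scipy.stats.ttest_ind is an assumption; the original examples may use a different function or options.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Recreate the simulated data in memory (same seed and calls as above)
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(10, size=(100, 2)),
                  columns=['Group A', 'Group B'])

# randint(10) draws integers from 0 up to, but not including, 10
print(df.min().min(), df.max().max())

# Independent two-sample t-test comparing Group A with Group B
result = stats.ttest_ind(df['Group A'], df['Group B'])
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```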

The automotive_data.csv file was downloaded from Kaggle.com from the user Ramakrishnan Srinivasan; the link to the full page is here: https://www.kaggle.com/toramky/automobile-dataset

The responses.csv file was downloaded from Kaggle.com from the user Miroslav Sabo; the link to the full page is here: https://www.kaggle.com/miroslavsabo/young-people-survey. The "Participant Number" column is not part of the original data set; it was added to support the merging examples.

The responses_state.csv file is a simulated file (not real data) to be paired with the responses.csv data in the merging examples.
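A rough sketch of such a merge is shown below, using tiny in-memory stand-ins instead of the real files. It assumes that responses_state.csv shares the "Participant Number" key; the "Music" and "State" columns and all values are hypothetical placeholders.

```python
import pandas as pd

# Tiny stand-in frames; the real files have many more rows and columns.
# "Participant Number" comes from the README; the "Music" and "State"
# columns and their values are made up for illustration.
responses = pd.DataFrame({
    "Participant Number": [1, 2, 3],
    "Music": [5, 4, 5],
})
responses_state = pd.DataFrame({
    "Participant Number": [1, 2, 4],
    "State": ["NY", "CA", "TX"],
})

# Left merge keeps every row of responses; unmatched rows get NaN for State
merged = pd.merge(responses, responses_state,
                  on="Participant Number", how="left")
print(merged)
```

A left merge is only one choice; how="inner" would instead drop participants that have no matching state row.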

The admission.csv file is from the logistic regression example created by UCLA for their walkthrough of how to conduct logistic regression using Stata. The original data link is here: https://stats.idre.ucla.edu/stat/stata/dae/binary.dta
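One possible way to turn a Stata .dta file into a CSV with pandas is sketched below. This is not necessarily how admission.csv was produced. To run without network access it writes a small stand-in .dta first; the column names and values are assumptions. pandas.read_stata() also accepts a URL, so the link above could be passed to it directly.

```python
import os
import tempfile

import pandas as pd

# Stand-in data; the column names are assumptions, not taken from the file.
sample = pd.DataFrame({
    "admit": [0, 1, 1],
    "gre": [380, 660, 800],
    "gpa": [3.61, 3.67, 4.00],
})

with tempfile.TemporaryDirectory() as tmp:
    dta_path = os.path.join(tmp, "binary.dta")
    csv_path = os.path.join(tmp, "admission.csv")

    # Write a small .dta so the example runs without network access
    sample.to_stata(dta_path, write_index=False)

    # The conversion itself: read the Stata file, then save it as CSV
    df = pd.read_stata(dta_path)
    df.to_csv(csv_path, index=False)

    round_trip = pd.read_csv(csv_path)

print(round_trip)
```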

blood_pressure.csv is an example data set that is included in Stata. This file was exported from within Stata to be used within Python.

difficile.csv is a made-up data set that was created to be used in an example.

fairpoor.csv is a made-up data set that was created to be used in an example.
