gitworkflows/Data-sets: Data sets used in examples

Here you can find all of the data sets that are used in examples at Pythonfordatascience.org.

These data sets are open to the public and can be downloaded and used by anyone. The source of each data set will be included in this README file.

To download all of the files, click the "Clone or download" drop-down arrow and select "Download ZIP". This downloads every data set used in the examples. Alternatively, click the file you are interested in and then click the "Raw" button, which opens the file in the browser. The URL of that page can then be passed to pandas.read_csv() to import the data set.
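The "Raw" approach can be wrapped in a small helper. This is a minimal sketch, assuming the files are served from raw.githubusercontent.com under the gitworkflows/Data-sets repository on a branch named "master"; the branch name and the helper names raw_url and load_dataset are assumptions for illustration, so check the URL your browser shows after clicking "Raw". File names containing spaces, such as Energy Level.csv, need to be percent-encoded in the URL.

```python
from urllib.parse import quote

import pandas as pd

# Assumptions: repository location and default branch name; adjust BRANCH
# if the URL shown after clicking "Raw" uses a different branch.
USER = "gitworkflows"
REPO = "Data-sets"
BRANCH = "master"


def raw_url(filename, user=USER, repo=REPO, branch=BRANCH):
    """Build a raw.githubusercontent.com URL for one file (hypothetical helper)."""
    # quote() percent-encodes characters such as the space in "Energy Level.csv"
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{quote(filename)}"


def load_dataset(filename, **read_csv_kwargs):
    """Read one CSV from the repository straight into a DataFrame (hypothetical helper)."""
    return pd.read_csv(raw_url(filename), **read_csv_kwargs)


# Example usage (requires network access):
# df = load_dataset("Energy Level.csv")
# print(df.head())
```

Extra keyword arguments are passed through unchanged to pandas.read_csv(), so options such as usecols or nrows still work.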

Data sets (in no particular order)
The Energy Level.csv data set is a simulated data set that was created for use in an independent t-test comparing two groups, Group A and Group B, on some outcome measure. The values are integers from 0 to 9 and can represent anything that fits on that scale. It was created using the following Python code:

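import numpy as np
import pandas as pd

# Fix the seed so the same "random" values are generated every time
np.random.seed(12345678)

# 100 rows x 2 columns of random integers from 0 to 9 (the upper bound 10 is excluded)
df = pd.DataFrame(np.random.randint(10, size=(100, 2)), columns=['Group A', 'Group B'])

# Write the data to disk without the DataFrame index
df.to_csv("Energy Level.csv", index=False)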
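Because the seed is fixed, the data can be regenerated and checked before running the test it was built for. The sketch below confirms the 0 to 9 range and runs a two-sample t-test with scipy.stats.ttest_ind; using SciPy here is an assumption, since this README does not say which library the t-test example itself uses.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Re-create the data exactly as described above (same seed, same call)
np.random.seed(12345678)
df = pd.DataFrame(np.random.randint(10, size=(100, 2)), columns=["Group A", "Group B"])

# randint(10) draws integers from 0 to 9 inclusive, so no value can equal 10
print(df.min().min(), df.max().max())

# Independent (two-sample) t-test; the default equal_var=True is Student's
# pooled-variance test, while equal_var=False would give Welch's variant
result = stats.ttest_ind(df["Group A"], df["Group B"])
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Reading the saved file with pd.read_csv("Energy Level.csv") should give the same two columns as the regenerated DataFrame.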

The automotive_data.csv file was downloaded from Kaggle.com, where it was posted by the user Ramakrishnan Srinivasan; the full page is here: https://www.kaggle.com/toramky/automobile-dataset

The responses.csv file was downloaded from Kaggle.com, where it was posted by the user Miroslav Sabo; the full page is here: https://www.kaggle.com/miroslavsabo/young-people-survey. The "Participant Number" column is not part of the original data set; it was added to show examples of how to merge data sets.
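An ID column like "Participant Number" can be added with DataFrame.insert(). This is a minimal sketch, assuming a simple 1-based row number; the README does not say how the column was actually created, and the small DataFrame below is a hypothetical stand-in for responses.csv, not its real contents.

```python
import pandas as pd

# Hypothetical stand-in for responses.csv (the real file has many survey columns)
responses = pd.DataFrame({"Music": [5, 4, 3], "Movies": [5, 5, 4]})

# Insert a 1-based participant ID as the first column so each row can later be
# matched to rows in another table (an assumption about how the ID was built)
responses.insert(0, "Participant Number", range(1, len(responses) + 1))
print(responses)
```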

The responses_state.csv file is a simulated file (not real data) to be paired with the responses.csv data in the merging examples.
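Pairing the two files comes down to a merge on the shared key. The sketch below assumes both files contain a "Participant Number" column; the "State" column and all values are hypothetical stand-ins, since the README does not list the columns of responses_state.csv. It contrasts an inner join with a left join using pandas.merge().

```python
import pandas as pd

# Hypothetical stand-ins; the real responses_state.csv columns may differ
responses = pd.DataFrame({"Participant Number": [1, 2, 3], "Music": [5, 4, 3]})
states = pd.DataFrame({"Participant Number": [1, 2, 4], "State": ["Ohio", "Texas", "Utah"]})

# Inner join keeps only participants present in both tables (1 and 2 here);
# validate="one_to_one" raises MergeError if either key column has duplicates
inner = pd.merge(responses, states, on="Participant Number", how="inner", validate="one_to_one")

# Left join keeps every survey response; participant 3 gets NaN for State
left = pd.merge(responses, states, on="Participant Number", how="left", validate="one_to_one")

print(inner)
print(left)
```

The validate argument is optional, but it turns an accidental many-to-many match into an error instead of silently duplicated rows.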

The admission.csv file comes from the logistic regression example that UCLA created for its walkthrough of how to conduct logistic regression in Stata. The original data link is here: https://stats.idre.ucla.edu/stat/stata/dae/binary.dta
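Since the original data is a Stata .dta file, pandas can convert it to CSV directly with pandas.read_stata() and DataFrame.to_csv(). The sketch below is an assumption about one way to do the conversion, not necessarily how admission.csv was produced; the helper name dta_to_csv and the demo columns are hypothetical, and the demo writes a small .dta file to a temporary folder so it runs offline.

```python
import os
import tempfile

import pandas as pd


def dta_to_csv(dta_path, csv_path):
    """Read a Stata .dta file (local path or URL) and save it as a CSV (hypothetical helper)."""
    df = pd.read_stata(dta_path)
    df.to_csv(csv_path, index=False)
    return df


# Offline demo: a small hypothetical table stands in for binary.dta
demo = pd.DataFrame({"admit": [0, 1, 1, 0], "gpa": [3.1, 3.8, 3.5, 2.9]})
with tempfile.TemporaryDirectory() as tmp:
    dta_file = os.path.join(tmp, "demo.dta")
    csv_file = os.path.join(tmp, "demo.csv")
    demo.to_stata(dta_file, write_index=False)  # write a .dta file without the index
    converted = dta_to_csv(dta_file, csv_file)
    round_trip = pd.read_csv(csv_file)

print(round_trip)

# With network access, and if the UCLA link is still live:
# dta_to_csv("https://stats.idre.ucla.edu/stat/stata/dae/binary.dta", "admission.csv")
```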

blood_pressure.csv is an example data set that is included with Stata. This file was exported from Stata for use in Python.

difficile.csv is a made up data set that was created to be used in an example.

fairpoor.csv is a made up data set that was created to be used in an example.

Files in this repository:

Energy Level.csv
Iris_Data.csv
README.md
admission.csv
automotive_data.csv
blood_pressure.csv
crop_yield.csv
diamonds.csv
difficile.csv
fairpoor.csv
responses.csv
responses_state.csv
sexual_comp.csv