katsully/camp-data-crash-course

Introduction

Data science is a mix of

  • Hacking Skills
  • Math & Stats
  • Subject Expertise
  • Detective Skills!

This repo contains only a few examples, but many, many more can be found here. This is from Joel Grus's amazing book, Data Science from Scratch.


Data can be complex, enormous, and beyond tedious to parse by hand. This is why we need tools like Python to help us. Data can be used to determine Twitter trends and pinpoint where epidemics began, and Facebook even used its data to draw out global migration patterns.

Data has also played a major role in recent events: this is a great article about data highlighting racial inequality, this is a great compiled list of institutions and organizations using data to highlight racial inequality, and this looks at a data-driven model to predict the severity of COVID-19 cases.

Example Dataset

This repo uses two CSV (comma-separated values) files from NYC OpenData. For Civilian Complaints, we are looking at all complaints gathered by the Civilian Complaint Review Board. Download the data set as a CSV file here. The second example covers all complaints received pertaining to the Department of Buildings. Download the data set as a CSV file here. Be sure to save the files inside this repository's folder! Some of the columns contain codes, such as Disposition Codes and Complaint Categories, and codes on their own are not very helpful, so I've included some resources to help 'decode' this dataset.
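As a rough illustration of what 'decoding' a coded column involves, here is a minimal sketch using only Python's standard library. The column names, the codes, and the labels are all made up for this example; the real NYC OpenData files use their own column names and code lists, so treat `SAMPLE_CSV`, `DISPOSITION_LABELS`, and `decode_rows` as hypothetical placeholders.

```python
import csv
import io

# Hypothetical sample rows; the real files have different columns and values.
SAMPLE_CSV = """complaint_id,disposition_code
1001,A1
1002,B2
1003,ZZ
"""

# Hypothetical code-to-description lookup, invented for illustration only.
DISPOSITION_LABELS = {
    "A1": "Example label one",
    "B2": "Example label two",
}


def decode_rows(csv_file, code_column, labels):
    """Return every row as a dict with an added '<code_column>_label' key."""
    decoded = []
    for row in csv.DictReader(csv_file):
        code = row[code_column]
        # Codes missing from the lookup are flagged rather than dropped.
        row[code_column + "_label"] = labels.get(code, "Unknown code: " + code)
        decoded.append(row)
    return decoded


rows = decode_rows(io.StringIO(SAMPLE_CSV), "disposition_code", DISPOSITION_LABELS)
for row in rows:
    print(row["complaint_id"], row["disposition_code_label"])
```

To try this on a downloaded file, you could pass an open file object (for example from `open("your_file.csv", newline="")`) in place of the `io.StringIO` sample, along with a lookup built from the real code lists.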

Getting Started

First, you need Python installed on your computer. (Some versions of macOS come with Python already installed.) You can download Python here. I recommend the macOS 64-bit installer for Mac users (if Python is not already installed) and the Windows x86-64 executable installer for Windows users. (Be sure to add Python to the PATH during installation!)

You'll also need to install Jupyter Notebook. Following the directions from the official site, type the following commands in the terminal (or Command Prompt on Windows):

python -m pip install --upgrade pip
python -m pip install jupyter

If you have two versions of Python installed, you may need to run

pip3 install jupyter

We also need to install two Python modules, pandas and matplotlib. To do this, type the following in the terminal:

pip install pandas
pip install matplotlib

Again, if you have two versions of Python installed, you may need to run

pip3 install pandas
pip3 install matplotlib
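If you are unsure whether the installs landed in the Python you are actually running, a quick check like the one below can help. This is only a sketch: `missing_modules` is a helper name invented for this example, and it only tests whether each module can be found, not whether it works correctly.

```python
import importlib.util


def missing_modules(names):
    """Return the names from the list that this Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two modules this repo asks you to install.
missing = missing_modules(["pandas", "matplotlib"])

if missing:
    print("Not installed for this Python:", ", ".join(missing))
else:
    print("pandas and matplotlib are both available.")
```

If a module shows up as missing even though pip reported success, the install probably went to a different Python version, which is the situation where the pip3 commands are worth trying.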

Finally, we'll run jupyter notebook in the terminal. Be sure to run this command from the folder where you downloaded the repo!

Questions? Concerns? Please create an issue in the repo!
