
Extraction of NSW Property Data

Date: 24/04/2023

Last Updated: 06/03/2024

Author: Joseph Cheng

Programming Language: Python 3.8 and above

Disclaimer

  • This is a personal, non-profit project intended to give the public access to these datasets, which can help people make decisions when analysing the property market.
  • If the owner or government custodian of this data source requires me to take this project down, I will do so immediately.

Main Objective

  • Property data is difficult to gather these days. Luckily, in New South Wales, Australia, the NSW State Government provides a public dataset of transactional property sales data (see the links below)
  • The objective is to create a clean, comprehensible dataset with historical property information for NSW, Australia, based on the raw data provided by the government
  • Please reach out to me with any feedback or improvements and I will try my best to update the dataset as soon as possible

Personal Remarks

  • I am also a first home buyer looking for opportunities in the property market. I hope that by sharing this code repository, more people can access property data and find their dream home more easily.

How to Run

  • Download the data from the "NSW data source" links listed under Resources
  • Unzip all the archives and save them in the appropriate location (TODO)
  • pip install the required Python libraries
  • Run the main file (see the sketch after this list)
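The repository does not yet document the exact folder layout (the location is marked TODO above), so the following is only a minimal sketch of the extraction step; the `downloads` and `data` folder names are assumptions.

```python
# Minimal sketch of the unzip step. The "downloads" and "data" folder names
# are assumptions; the repository leaves the exact layout as a TODO.
from pathlib import Path
import zipfile

DOWNLOAD_DIR = Path("downloads")  # where the NSW archives were saved
DATA_DIR = Path("data")           # where the extracted files should go

DATA_DIR.mkdir(exist_ok=True)
for archive in DOWNLOAD_DIR.glob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        # one sub-folder per archive keeps each download's contents separate
        zf.extractall(DATA_DIR / archive.stem)
```

After extraction, install the dependencies and run the entry point, e.g. `pip install -r requirements.txt` followed by `python main.py` (both file names are assumptions; the README only says to install the required libraries and run the main file).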

Resources

  1. Australian property heat map application
  2. NSW Property Sales Information
  3. NSW Bulk Land Value Information
  4. NSW raw data fields
  5. NSW fields description
  6. Multiprocessing with Python
  7. Python Web Scraper
  8. NSW Postcodes
  9. Headers and Cookies for Web Scraping
  10. Web Scraping Best Practices
  11. 10 tips for web scraping

Anti-scraping Techniques

  • Request volume: too many requests within a given time frame, or too many parallel requests from the same IP
  • Request patterns: repeated, regular requests (X requests every Y seconds)
  • Honeypots: link traps webmasters can add to the HTML that are hidden from human visitors (see the sketch after this list)
  • Redirecting the request to a page with a CAPTCHA
  • JavaScript checks
  • Anti-bot mechanisms can spot patterns in the number of clicks, the clicks' location, the interval between clicks, and other metrics
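Honeypot links are usually hidden from human visitors with inline CSS, which is also how a scraper can recognise them. A rough sketch of such a filter, assuming BeautifulSoup is used for parsing (illustrative only, not code from this repository):

```python
# Rough sketch: collect only the links a human visitor could actually see,
# skipping anchors hidden with inline CSS, which is how honeypot traps are
# commonly planted. Assumes BeautifulSoup; not code from this repository.
from bs4 import BeautifulSoup

HIDDEN_MARKERS = ("display:none", "display: none",
                  "visibility:hidden", "visibility: hidden")

def visible_links(html):
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        style = (anchor.get("style") or "").lower()
        if any(marker in style for marker in HIDDEN_MARKERS):
            continue  # likely a honeypot trap; do not follow
        links.append(anchor["href"])
    return links
```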

Todos

  • Set Your Timeout to at Least 60 Seconds (see the sketch after this list)
  • Don’t Set Custom Headers Unless You 100% Need To
  • Always Send Your Requests to the HTTPS Version
  • Avoid Using Sessions Unless Completely Necessary
  • Manage Your Concurrency Properly
  • Verify if You Need Geotargeting Before Running Your Scraper
  • If you want to be able to interact with the page (click on a button, scroll, etc.) then you will need to use your own Selenium, Puppeteer, or Nightmare headless browser
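A minimal sketch of how a few of the simpler items above could look in Python, using `requests` and a small process pool (see also resource 6, Multiprocessing with Python). The URL and pool size are placeholders, not values used by this project:

```python
# Minimal sketch of a few items above: HTTPS-only URLs, a generous timeout,
# no custom headers, no session object, and explicit, bounded concurrency.
# The URL and pool size are placeholders, not values from this repository.
from multiprocessing import Pool

import requests

BASE_URL = "https://example.com/listings"  # placeholder; always use the HTTPS version

def fetch(page_number):
    response = requests.get(
        f"{BASE_URL}?page={page_number}",
        timeout=60,  # at least 60 seconds, as recommended above
    )
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # keep the number of parallel requests modest
        pages = pool.map(fetch, range(1, 11))
```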

Tips

  • Set Random Intervals In Between Your Requests (see the sketch after this list)
  • Set a Referrer
  • Use a Headless Browser
  • Avoid Honeypot Traps
  • Detect Website Changes
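A rough sketch of the first two tips combined: a randomised delay between requests and an explicit Referer header. The delay bounds and the referrer value are illustrative assumptions:

```python
# Rough sketch of the first two tips: a random interval between requests and
# a Referer header. Delay bounds and referrer value are illustrative only.
import random
import time

import requests

def polite_get(url):
    time.sleep(random.uniform(2.0, 8.0))  # random interval, not a fixed pattern
    return requests.get(
        url,
        headers={"Referer": "https://www.google.com/"},  # plausible referrer
        timeout=60,
    )
```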
