edechamps Home Hunt pipeline

edechamps Home Hunt (EHH) is a geographic data processing application (pipeline) built within the kmlpipe framework. The goal of the pipeline is to help Etienne Dechamps (the author of this pipeline and of kmlpipe in general), faced with the huge sprawling mess that is the London property market, find a place to live according to his specific personal criteria. Hopefully this can be useful to other people who would like to follow a similar approach.

The pipeline is designed to automate the following tasks:

  • Gather data about supermarkets in London, being careful to exclude small convenience stores (get-supermarkets stage);
  • Gather data about Hyperoptic-enabled buildings (get-hyperoptic stage);
  • Fetch and merge property listings from a given set of about 20 weighted areas (zones) in London (get-listings stage);
  • Only keep new properties that did not already appear in a previous run of the pipeline;
  • For each property, compute the real transit time to the workplace using public transport, the real walking time to the nearest supermarket, and the distance to the nearest Hyperoptic-enabled site (widen stage);
  • For each property, compute a score based on all the information gathered so far and the given keywords, weights and thresholds (compute-partial-scores, compute-total-scores stages);
  • Finally, sort the resulting property list by score and format the results for presentation in a KML viewer such as Google Earth (present stage).
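The filtering, scoring and sorting steps above can be illustrated with a minimal Python sketch. This is not the pipeline's actual implementation (EHH is built on kmlpipe and operates on KML data); the `Listing` class, the keyword weights, the thresholds and the linear scoring formula are all hypothetical and chosen only to show the general shape of the computation.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: str
    description: str
    transit_minutes: float    # public transport time to the workplace
    walk_minutes: float       # walking time to the nearest supermarket
    hyperoptic_metres: float  # distance to the nearest Hyperoptic-enabled site


# Hypothetical keywords, weights and thresholds (not the author's values).
KEYWORD_WEIGHTS = {"balcony": 2.0, "ground floor": -1.0}
MAX_TRANSIT_MINUTES = 45.0
MAX_WALK_MINUTES = 15.0
MAX_HYPEROPTIC_METRES = 100.0


def threshold_score(value, limit, weight):
    """Linear score: `weight` at 0, zero at `limit`, negative beyond it."""
    return weight * (1.0 - value / limit)


def partial_scores(listing):
    """One score per criterion, loosely mirroring a partial-scores stage."""
    text = listing.description.lower()
    return {
        "keywords": sum(w for k, w in KEYWORD_WEIGHTS.items() if k in text),
        "transit": threshold_score(listing.transit_minutes, MAX_TRANSIT_MINUTES, 5.0),
        "supermarket": threshold_score(listing.walk_minutes, MAX_WALK_MINUTES, 3.0),
        "hyperoptic": 2.0 if listing.hyperoptic_metres <= MAX_HYPEROPTIC_METRES else 0.0,
    }


def total_score(listing):
    """Sum of the partial scores, loosely mirroring a total-scores stage."""
    return sum(partial_scores(listing).values())


def rank_new_listings(listings, seen_ids):
    """Drop listings seen in a previous run, then sort by descending score."""
    new = [item for item in listings if item.listing_id not in seen_ids]
    return sorted(new, key=total_score, reverse=True)


listings = [
    Listing("a", "Bright flat with balcony", 20.0, 5.0, 50.0),
    Listing("b", "Ground floor studio", 40.0, 12.0, 500.0),
    Listing("c", "Seen in an earlier run", 10.0, 2.0, 10.0),
]
ranked = rank_new_listings(listings, seen_ids={"c"})
print([item.listing_id for item in ranked])
```

Keeping the partial scores separate from the total makes it easy to see why a given property ranks where it does, which is presumably why the pipeline splits scoring into two stages.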

EHH flow diagram

In the end, the results look like this:

Screenshot of the edechamps Home Hunt Pipeline

Running the pipeline

To run the pipeline on a small number of properties for a quick test, use the following procedure:

  1. Make sure kmlpipe is fully operational, i.e. test/run-scenarios succeeds and curl is installed.
  2. Create or reuse a Google Cloud project and enable the Google Places and Google Distance Matrix APIs. Get the API key.
    • Note: running the EHH pipeline in test mode should result in only a small number of Google Maps API requests, far fewer than the free quota allows. Nevertheless, if you're paranoid about being billed actual money, feel free to set your own quotas.
  3. Run:
    mkdir cache
    edechamps-home-hunt/run \
        --google-key=<YOUR GOOGLE API KEY> --cache-directory cache > result.kml
    • Note: this can take a few minutes to run and uses a sizable amount of RAM. This is because the Hyperoptic data contains thousands of places, making it expensive to process, even with a small number of properties.
  4. Open result.kml in a KML viewer, preferably Google Earth Pro.

With the above command, the EHH pipeline will run in "test" mode, which only fetches a total of 10 properties in 2 zones. To run the full pipeline (thousands of properties in 20 zones), use the --production flag. Note that a production run takes much longer and can make a sizable dent in your Google API free quota. Subsequent runs are much cheaper thanks to the cache.
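To illustrate why a cache directory makes later runs cheaper, here is a minimal Python sketch of an on-disk response cache keyed by request URL. This README does not describe how kmlpipe actually lays out its cache, so the `cached_fetch` function, the hashing scheme and the example URL are assumptions made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_fetch(url, cache_dir, fetch):
    """Return the body for `url`, calling `fetch(url)` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it can safely be used as a file name.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()
    body = fetch(url)
    path.write_bytes(body)
    return body


# Stand-in fetcher that records calls instead of touching the network.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<kml/>"


with tempfile.TemporaryDirectory() as tmp:
    url = "https://example.invalid/distance?origin=a&destination=b"  # hypothetical
    first = cached_fetch(url, tmp, fake_fetch)
    second = cached_fetch(url, tmp, fake_fetch)

print(len(calls))
```

The second call is served from disk, so the fetcher runs only once. Applied to paid API requests, the same idea means a repeated run only pays for requests it has not made before.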