rapidsai-community/mortgage

Mortgage Workflow

The Dataset

The dataset used with this workflow is derived from Fannie Mae’s Single-Family Loan Performance Data with all rights reserved by Fannie Mae. This processed dataset is redistributed with permission and consent from Fannie Mae.

To acquire this dataset, please visit the RAPIDS Datasets Homepage.

Introduction

The Mortgage workflow is composed of three core phases:

  1. ETL - Extract, Transform, Load
  2. Data Conversion
  3. ML - Training

ETL

Data is:

  1. Read in from storage
  2. Transformed to emphasize key features
  3. Loaded into volatile memory for conversion
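The three ETL steps above can be sketched on the CPU with the standard library alone. This is only an illustration of the read, transform, load pattern, not the workflow's actual GPU code: the pipe-delimited sample, the column names, and the rule used to derive the `delinquent` label are all assumptions made for the example.

```python
import csv
import io

# Hypothetical pipe-delimited sample; the column names are illustrative only.
RAW = """loan_id|credit_score|original_upb|delinquency_status
100|720|250000|0
101||180000|3
102|655|320000|1
"""


def extract(text):
    """Read delimited rows from storage (here, an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def transform(rows, default_score=0.0):
    """Cast fields to numbers, fill missing scores, and derive a binary label."""
    out = []
    for row in rows:
        score = float(row["credit_score"]) if row["credit_score"] else default_score
        out.append({
            "loan_id": int(row["loan_id"]),
            "credit_score": score,
            "original_upb": float(row["original_upb"]),
            # Assumed rule: label is 1 when the delinquency status is above 0.
            "delinquent": int(int(row["delinquency_status"]) > 0),
        })
    return out


def load(rows):
    """Hold the transformed records in memory as one list per column."""
    return {key: [r[key] for r in rows] for key in rows[0]}


table = load(transform(extract(RAW)))
```

Keeping the result as one list per column mirrors the columnar layout that dataframe libraries use, which makes the later split into labels and features a matter of picking columns.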

Data Conversion

Features are:

  1. Broken into (labels, data) pairs
  2. Distributed across many workers
  3. Converted into compressed sparse row (CSR) matrix format for XGBoost
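To make the conversion steps above concrete, here is a small pure-Python sketch that splits rows into (labels, data), deals them out to workers, and builds a CSR representation per worker. The helper names (`split_labels`, `partition`, `to_csr`), the round-robin distribution, and the sample rows are assumptions for illustration; the workflow itself presumably relies on library routines for each step.

```python
def split_labels(rows, label_index):
    """Separate each row into its label and its remaining feature values."""
    labels = [row[label_index] for row in rows]
    data = [row[:label_index] + row[label_index + 1:] for row in rows]
    return labels, data


def partition(labels, data, n_workers):
    """Deal rows round-robin into one (labels, data) pair per worker."""
    return [(labels[i::n_workers], data[i::n_workers]) for i in range(n_workers)]


def to_csr(dense):
    """Convert a dense row-major matrix to (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                col_indices.append(col)
        # row_ptr[i + 1] marks where row i ends inside `values`.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


# Hypothetical rows: the first column is the label, the rest are features.
rows = [
    [1, 0.0, 2.5, 0.0],
    [0, 0.0, 0.0, 0.0],
    [1, 4.0, 0.0, 1.5],
    [0, 0.0, 3.0, 0.0],
]

labels, data = split_labels(rows, label_index=0)
shards = [(lab, to_csr(dat)) for lab, dat in partition(labels, data, n_workers=2)]
```

CSR stores only the non-zero values along with their column indices and one offset per row, so a mostly empty feature matrix takes far less memory than its dense form.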

Machine Learning

The CSR data is fed into a distributed training session with xgboost.dask.
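A distributed training call with `xgboost.dask` might look roughly like the sketch below. The helper names, the parameter values, and the number of boosting rounds are assumptions rather than the workflow's actual settings, and the `device` parameter assumes XGBoost 2.0 or later. The sketch takes whatever Dask collections `DaskDMatrix` accepts (for example Dask arrays) and does not build the CSR input described above.

```python
def make_params(use_gpu=False):
    """Build illustrative XGBoost parameters for binary classification."""
    params = {
        "objective": "binary:logistic",
        "tree_method": "hist",
        "max_depth": 8,
    }
    if use_gpu:
        # Assumes XGBoost 2.0+, where GPUs are selected with `device`.
        params["device"] = "cuda"
    return params


def train_distributed(client, X, y, num_boost_round=10, use_gpu=False):
    """Train on Dask collections X (features) and y (labels) across workers."""
    # Imported lazily so the helpers above load without XGBoost installed.
    from xgboost import dask as dxgb

    dtrain = dxgb.DaskDMatrix(client, X, y)
    result = dxgb.train(
        client,
        make_params(use_gpu),
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dtrain, "train")],
    )
    # `train` returns a dict holding the trained booster and the eval history.
    return result["booster"], result["history"]
```

Given a `dask.distributed.Client` connected to a cluster and Dask collections `X` and `y`, calling `train_distributed(client, X, y)` returns the trained booster together with the per-round evaluation history.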

Performance

We regularly benchmark RAPIDS on this workload to measure its performance not only against Apache Spark on CPUs but also against past versions of RAPIDS.

[Figure: Slide 1]

[Figure: Slide 2]
