About Freddie Mac

The data originates from Freddie Mac, a company chartered by Congress to make mortgages more accessible to Americans. It does this by purchasing mortgages: lenders sell their mortgage loans to Freddie Mac, which frees up the lenders' capital for new loans, and Freddie Mac in turn pools the loans into securities that are sold to investors. A security is a financial contract that gives its holder a claim on something else. In this case, mortgage-backed securities give investors a monthly stream of income for decades.

As a major purchaser of mortgages, Freddie Mac holds a large volume of mortgage data covering the entire US, which makes it well suited to the Home Loan Adviser model.

About the Raw Data

The data is provided by Freddie Mac in zip files by year. The download is available here. A guide is also provided here.

Each year of the SFLL data is split into quarters, which are also provided as zip files. Each quarter contains two files: one with loan origination data and one with the performance of each loan. Both files share a “key” column named loan_sequence_number, which can be used to join them if needed.
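The join on loan_sequence_number can be sketched as follows. This is a minimal illustration using only the Python standard library, not necessarily how the project performs the join. The sample rows and the helper name join_on_loan are hypothetical, and the sketch assumes that a loan has one origination row and may have several performance rows.

```python
# Made-up rows standing in for the two files of one quarter (illustrative values only).
origination = [
    {"loan_sequence_number": "F20Q10000001", "credit_score": 720},
    {"loan_sequence_number": "F20Q10000002", "credit_score": 680},
]
performance = [
    {"loan_sequence_number": "F20Q10000001", "monthly_reporting_period": "202003"},
    {"loan_sequence_number": "F20Q10000001", "monthly_reporting_period": "202004"},
    {"loan_sequence_number": "F20Q10000002", "monthly_reporting_period": "202003"},
]


def join_on_loan(origination_rows, performance_rows, key="loan_sequence_number"):
    """Attach each loan's origination fields to every one of its performance rows."""
    by_loan = {row[key]: row for row in origination_rows}
    joined = []
    for perf in performance_rows:
        orig = by_loan.get(perf[key])
        if orig is None:
            continue  # performance row with no matching origination record
        joined.append({**orig, **perf})
    return joined


joined = join_on_loan(origination, performance)
```

Each joined row carries both the static origination fields and one reporting period's performance fields, so a loan appears once per reporting period.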

#| eval: false
# Extract the yearly zip files automatically (on macOS)
cd ~/Downloads
unzip "historical_data_*.zip"

# Remove the yearly zip files from the folder, then re-run to extract the quarterly zip files
unzip "historical_data_*.zip"
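For readers not on a Mac, the same two-pass extraction could be sketched in Python with the standard zipfile module. This is an illustrative alternative rather than the method used for this project. The function name extract_nested and the output folder in the usage comment are hypothetical, and the sketch assumes a single level of nesting (a yearly zip that holds quarterly zips).

```python
import zipfile
from pathlib import Path


def extract_nested(zip_path, out_dir):
    """Extract zip_path into out_dir, then unpack any zip files found inside it.

    Handles one level of nesting (a yearly zip holding quarterly zips).
    Returns the sorted list of extracted .txt paths.
    """
    zip_path = Path(zip_path).resolve()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as outer:
        outer.extractall(out_dir)

    for inner in sorted(out_dir.rglob("*.zip")):
        if inner.resolve() == zip_path:
            continue  # skip the outer archive if it lives in out_dir
        with zipfile.ZipFile(inner) as quarter:
            quarter.extractall(inner.parent)
        inner.unlink()  # the quarterly zip is no longer needed

    return sorted(out_dir.rglob("*.txt"))


# Hypothetical usage, mirroring the shell commands above:
# for yearly in (Path.home() / "Downloads").glob("historical_data_*.zip"):
#     extract_nested(yearly, Path.home() / "Downloads" / "raw")
```

Deleting each quarterly zip after it is unpacked replaces the manual "remove the main zip and re-run" step.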

Once the nested zip files were extracted, there were a total of 40 files. The raw data is in .txt format, with fields separated by vertical bars (|), or “pipes”, rather than commas or tabs. The files have no header row, so all columns were named on import.
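Reading a pipe-delimited file that has no header row, and naming the columns on import, could look like the sketch below. It uses only the standard csv module. The snake_case column names are my own rendering of the first four entries in the Data Dictionary, the sample values are made up, and read_pipe_file is a hypothetical helper.

```python
import csv
import io

# First four origination columns, in Data Dictionary order.
# A real import would list all 32 names.
ORIGINATION_COLUMNS = [
    "credit_score",
    "first_payment_date",
    "first_time_homebuyer_flag",
    "maturity_date",
]


def read_pipe_file(handle, column_names):
    """Yield one dict per row of a header-less, pipe-delimited file.

    Values are kept as strings; empty fields come back as "".
    Fields beyond the supplied column names are dropped by zip().
    """
    reader = csv.reader(handle, delimiter="|")
    for row in reader:
        yield dict(zip(column_names, row))


# Made-up sample standing in for an origination file.
sample = io.StringIO("720|202003|N|205002\n|202004|Y|205003\n")
rows = list(read_pipe_file(sample, ORIGINATION_COLUMNS))
```

With pandas, the equivalent call would be read_csv with sep="|", header=None, and names set to the column list.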

Data Dictionary

Origination file:

column_name type max_length
0 Credit Score Numeric 4
1 First Payment Date Date 6
2 First Time Homebuyer Flag Alpha 1
3 Maturity Date Date 6
4 Metropolitan Statistical Area (MSA) Or Metropo... Numeric 5
5 Mortgage Insurance Percentage (MI %) Numeric 3
6 Number of Units Numeric 2
7 Occupancy Status Alpha 1
8 Original Combined Loan-to-Value (CLTV) Numeric 3
9 Original Debt-to-Income (DTI) Ratio Numeric 3
10 Original UPB Numeric 12
11 Original Loan-to-Value (LTV) Numeric 3
12 Original Interest Rate Numeric - 6,3 6
13 Channel Alpha 1
14 Prepayment Penalty Mortgage (PPM) Flag Alpha 1
15 Amortization Type (Formerly Product Type) Alpha 5
16 Property State Alpha 2
17 Property Type Alpha 2
18 Postal Code Numeric 5
19 Loan Sequence Number Alpha Numeric - PYYQnXXXXXXX 12
20 Loan Purpose Alpha 1
21 Original Loan Term Numeric 3
22 Number of Borrowers Numeric 2
23 Seller Name Alpha Numeric 60
24 Servicer Name Alpha Numeric 60
25 Super Conforming Flag Alpha 1
26 Pre-HARP Loan Sequence Number Alpha Numeric - PYYQnXXXXXXX 12
27 Program Indicator Alpha Numeric 1
28 HARP Indicator Alpha 1
29 Property Valuation Method Numeric 1
30 Interest Only (I/O) Indicator Alpha 1
31 Mortgage Insurance Cancellation Indicator Alpha 1

Performance file:

column_name type max_length
0 Loan Sequence Number Alpha Numeric - PYYQnXXXXXXX 12
1 Monthly Reporting Period Date 6
2 Current Actual UPB Numeric - 12,2 12
3 Current Loan Delinquency Status Alpha Numeric 3
4 Loan Age Numeric 3
5 Remaining Months to Legal Maturity Numeric 3
6 Defect Settlement Date Date 6
7 Modification Flag Alpha 1
8 Zero Balance Code Numeric 2
9 Zero Balance Effective Date Date 6
10 Current Interest Rate Numeric - 8,3 8
11 Current Deferred UPB Numeric 12
12 Due Date of Last Paid Installment (DDLPI) Date 6
13 MI Recoveries Numeric - 12,2 12
14 Net Sales Proceeds Alpha-Numeric 14
15 Non MI Recoveries Numeric - 12,2 12
16 Expenses Numeric - 12,2 12
17 Legal Costs Numeric - 12,2 12
18 Maintenance and Preservation Costs Numeric - 12,2 12
19 Taxes and Insurance Numeric - 12,2 12
20 Miscellaneous Expenses Numeric - 12,2 12
21 Actual Loss Calculation Numeric - 12,2 12
22 Modification Cost Numeric - 12,2 12
23 Step Modification Flag Alpha 1
24 Deferred Payment Plan Alpha 1
25 Estimated Loan-to-Value (ELTV) Numeric 4
26 Zero Balance Removal UPB Numeric - 12,2 12
27 Delinquent Accrued Interest Numeric - 12,2 12
28 Delinquency Due to Disaster Alpha 1
29 Borrower Assistance Status Code Alpha 1
30 Current Month Modification Cost Numeric - 12,2 12
31 Interest Bearing UPB Numeric - 12,2 12

Data Load & Clean

About Missing Values

(how we're handling missing values)

Categorical Variables

(“encoding the categorical values”)

Target Variable

(“ever 90+ days delinquent”)
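One way an “ever 90+ days delinquent” label could be derived from the performance rows is sketched below. This is a sketch under stated assumptions, not the project's final definition. It assumes the Current Loan Delinquency Status column holds numeric codes that count 30-day buckets (0 = current, 1 = 30-59 days, 2 = 60-89 days, 3 = 90-119 days, and so on), which should be confirmed against the Freddie Mac guide. Non-numeric codes are treated as not delinquent, the column names are hypothetical snake_case versions of the Data Dictionary entries, and the sample rows are made up.

```python
def is_90_plus(status):
    """True when a delinquency status code means 90 or more days past due.

    Assumption: numeric codes count 30-day buckets, so 3 or higher is 90+ days.
    Non-numeric or empty codes are treated as not delinquent in this sketch.
    """
    status = status.strip()
    return status.isdigit() and int(status) >= 3


def ever_90_plus_by_loan(performance_rows):
    """Map each loan_sequence_number to True if any of its rows is 90+ days late."""
    flags = {}
    for row in performance_rows:
        loan = row["loan_sequence_number"]
        late = is_90_plus(row["current_loan_delinquency_status"])
        flags[loan] = flags.get(loan, False) or late
    return flags


# Made-up performance rows for three loans.
sample_rows = [
    {"loan_sequence_number": "F20Q10000001", "current_loan_delinquency_status": "0"},
    {"loan_sequence_number": "F20Q10000001", "current_loan_delinquency_status": "2"},
    {"loan_sequence_number": "F20Q10000001", "current_loan_delinquency_status": "3"},
    {"loan_sequence_number": "F20Q10000002", "current_loan_delinquency_status": "0"},
    {"loan_sequence_number": "F20Q10000002", "current_loan_delinquency_status": "1"},
    {"loan_sequence_number": "F20Q10000003", "current_loan_delinquency_status": "RA"},
]
target = ever_90_plus_by_loan(sample_rows)
```

The result holds one label per loan, which matches the one-row-per-loan shape of the origination file.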

Sampling

(about imbalances, “weighted sampling or SMOTE”)

Data Exploration

(data exploration and descriptive stats)

Data Visualizations

(Are we showing distributions and such?)