From standard workflows to advanced optimization techniques
This section covers the core end-to-end workflow for an IFRS 9 expected credit loss (ECL) calculation. The examples walk through the fundamental steps required for a compliant model.
Loading raw loan-level data and merging with macroeconomic indicators.
import pandas as pd
# Load loan portfolio and macro data
df_loans = pd.read_csv('raw_data/loan_portfolio.csv')
df_macro = pd.read_excel('raw_data/macro_forecasts.xlsx')
# Merge macro data onto loan data based on reporting date
df_merged = pd.merge(df_loans, df_macro, on='reporting_date', how='left')
PROC IMPORT DATAFILE="raw_data/loan_portfolio.csv" OUT=work.loans DBMS=CSV REPLACE; RUN;
PROC IMPORT DATAFILE="raw_data/macro_forecasts.xlsx" OUT=work.macro DBMS=XLSX REPLACE; RUN;
PROC SQL;
CREATE TABLE work.merged AS
SELECT a.*, b.*
FROM work.loans a LEFT JOIN work.macro b
ON a.reporting_date = b.reporting_date;
QUIT;
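A quick post-merge check helps confirm that every reporting date matched a macro forecast; the sketch below uses gdp_growth as an illustrative stand-in for any macro column.
# Rows where the left join found no macro forecast ('gdp_growth' is a stand-in for any macro column)
unmatched = df_merged[df_merged['gdp_growth'].isna()]
if not unmatched.empty:
    print(f"{len(unmatched)} loans have no macro forecast for their reporting date")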
Basic validation for negative exposure and missing ratings.
# Fill missing ratings
df_merged['rating'] = df_merged['rating'].fillna('Unrated')
# Filter out invalid records (negative exposure)
df_clean = df_merged[df_merged['exposure'] >= 0].copy()
DATA work.clean;
SET work.merged;
IF exposure >= 0;
IF MISSING(rating) THEN rating = "Unrated";
RUN;
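For an audit trail it is worth logging how many records the exposure filter removes; a minimal sketch on the pandas side:
# Records removed by the exposure filter (negative or missing exposure)
n_dropped = len(df_merged) - len(df_clean)
print(f"Dropped {n_dropped} of {len(df_merged)} records")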
Aggregating portfolio by product type for high-level analysis.
# Total exposure and loan count per product segment
segment_stats = df_clean.groupby(['product_type']).agg({
    'exposure': 'sum',
    'loan_id': 'count'
}).reset_index()
PROC SUMMARY DATA=work.clean NWAY;
CLASS product_type;
VAR exposure;
OUTPUT OUT=work.segments SUM(exposure)=total_exposure N(exposure)=count;
RUN;
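Segment totals are easier to compare as portfolio shares; one way to add them to the pandas output above:
# Each segment's exposure as a share of the total portfolio
segment_stats['exposure_share'] = segment_stats['exposure'] / segment_stats['exposure'].sum()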
Applying IFRS 9 logic (30 DPD, 90 DPD) to determine stages.
import numpy as np
# Stage 3: credit-impaired (defaulted or more than 90 days past due)
# Stage 2: significant increase in credit risk (more than 30 DPD or on the watchlist)
conditions = [
    (df_clean['default_flag'] == 1) | (df_clean['dpd'] > 90),
    (df_clean['dpd'] > 30) | (df_clean['watchlist'] == 1)
]
# Everything else remains in Stage 1
df_clean['stage'] = np.select(conditions, [3, 2], default=1)
DATA work.staged;
SET work.clean;
IF default_flag = 1 OR dpd > 90 THEN stage = 3;
ELSE IF dpd > 30 OR watchlist = 1 THEN stage = 2;
ELSE stage = 1;
RUN;
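Before computing provisions, a quick look at the resulting stage mix is a useful plausibility check; a minimal sketch on the pandas side:
# Exposure and loan count per stage
stage_mix = df_clean.groupby('stage').agg(exposure=('exposure', 'sum'), loans=('loan_id', 'count'))
print(stage_mix)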
Computing the final provision amount.
# Stage 1 uses the 12-month PD; Stages 2 and 3 use the lifetime PD
df_clean['applied_pd'] = np.where(df_clean['stage'] == 1, df_clean['pd_12m'], df_clean['pd_life'])
# One-period discount factor at the effective interest rate
df_clean['df'] = 1 / (1 + df_clean['eir'])
# ECL = PD x LGD x EAD x discount factor
df_clean['ecl'] = df_clean['applied_pd'] * df_clean['lgd'] * df_clean['exposure'] * df_clean['df']
DATA work.final;
SET work.staged;
IF stage = 1 THEN applied_pd = pd_12m; ELSE applied_pd = pd_life;
df = 1 / (1 + eir);
ecl = applied_pd * lgd * exposure * df;
RUN;
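To make the formula concrete with illustrative numbers (not taken from the portfolio): a Stage 1 loan with an exposure of 100,000, a 12-month PD of 2%, an LGD of 45% and an EIR of 5% gives ECL = 0.02 × 0.45 × 100,000 / 1.05 ≈ 857.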
This section covers techniques for handling large datasets, improving model accuracy, and implementing complex business rules.
Using chunking and indexing to handle millions of records.
# Read the portfolio in 100,000-row chunks instead of loading it in one pass
chunk_size = 100000
chunks = []
for chunk in pd.read_csv('huge_portfolio.csv', chunksize=chunk_size):
    # Downcast exposure to the smallest float dtype to reduce memory use
    chunk['exposure'] = pd.to_numeric(chunk['exposure'], downcast='float')
    chunks.append(chunk)
df_all = pd.concat(chunks)
PROC DATASETS LIBRARY=work;
MODIFY large_portfolio;
INDEX CREATE loan_id / NOMISS UNIQUE;
QUIT;
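If even the concatenated frame would not fit in memory, an alternative is to aggregate each chunk as it is read and keep only the running totals; a sketch using the same file and columns:
# Aggregate per chunk so only the running totals stay in memory
totals = None
for chunk in pd.read_csv('huge_portfolio.csv', chunksize=chunk_size):
    part = chunk.groupby('product_type')['exposure'].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)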
Using Z-scores to flag anomalies.
from scipy import stats
# Flag exposures more than three standard deviations from the portfolio mean
df_clean['exposure_z'] = np.abs(stats.zscore(df_clean['exposure']))
outliers = df_clean[df_clean['exposure_z'] > 3]
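Because exposure distributions differ by product, it can be more informative to compute the Z-scores within each segment; a possible refinement using groupby-transform:
# Z-score of exposure within each product segment
grp = df_clean.groupby('product_type')['exposure']
df_clean['exposure_z_seg'] = (df_clean['exposure'] - grp.transform('mean')) / grp.transform('std')
segment_outliers = df_clean[df_clean['exposure_z_seg'].abs() > 3]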
Validating discriminatory power.
from sklearn.metrics import roc_curve, auc
# Observed defaults vs. predicted PDs (e.g. default_flag and pd_12m)
y_true, y_prob = df_clean['default_flag'], df_clean['pd_12m']
fpr, tpr, _ = roc_curve(y_true, y_prob)
gini = 2 * auc(fpr, tpr) - 1  # Gini = 2 * AUC - 1
Probation logic for Stage 2 to 1 transfers.
# A cured loan stays in Stage 2 until it has been clean for three months (probation)
in_probation = (df['months_since_trigger'] < 3) & (df['prev_stage'] == 2)
df['stage'] = np.select([(df['dpd'] > 30) | in_probation], [2], default=1)
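Folding the probation rule into the full staging logic from the standard workflow amounts to adding the probation term to the Stage 2 condition; a sketch reusing the earlier conditions:
# Stage 3 unchanged; Stage 2 also keeps cured loans that are still in probation
conditions = [
    (df['default_flag'] == 1) | (df['dpd'] > 90),
    (df['dpd'] > 30) | (df['watchlist'] == 1) | in_probation
]
df['stage'] = np.select(conditions, [3, 2], default=1)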
Summing marginal ECL over the term structure.
# Marginal PDs per year, taken from the columns pd1-pd5 used in the SAS version
pd_curves = df[['pd1', 'pd2', 'pd3', 'pd4', 'pd5']].to_numpy()
eir, lgd, exposure = df['eir'].to_numpy(), df['lgd'].to_numpy(), df['exposure'].to_numpy()
lifetime_ecl = np.zeros(len(df))
for t in range(5):
    df_t = (1 + eir) ** -(t + 1)  # discount factor for year t+1
    lifetime_ecl += pd_curves[:, t] * lgd * exposure * df_t
DATA work.lifetime;
    SET work.staged;
    ARRAY marginal_pd[5] pd1-pd5;
    lifetime_ecl = 0;
    DO t = 1 TO 5;
        df = (1 + eir) ** (-t);
        loss = marginal_pd[t] * lgd * exposure * df;
        lifetime_ecl = SUM(lifetime_ecl, loss);
    END;
RUN;