Part 1: Standard Pipeline

The core end-to-end workflow for an IFRS 9 expected credit loss (ECL) calculation. These examples cover the fundamental steps required for a compliant model, with each step shown in both Python (pandas) and SAS.

1. Data Ingestion

Loading raw loan-level data and merging with macroeconomic indicators.

import pandas as pd

# Load loan portfolio and macro data
df_loans = pd.read_csv('raw_data/loan_portfolio.csv')
df_macro = pd.read_excel('raw_data/macro_forecasts.xlsx')

# Merge macro data onto loan data based on reporting date
df_merged = pd.merge(df_loans, df_macro, on='reporting_date', how='left')
PROC IMPORT DATAFILE="raw_data/loan_portfolio.csv"
    OUT=work.loans DBMS=CSV REPLACE;
RUN;

PROC IMPORT DATAFILE="raw_data/macro_forecasts.xlsx"
    OUT=work.macro DBMS=XLSX REPLACE;
RUN;

PROC SQL;
    CREATE TABLE work.merged AS
    SELECT a.*, b.*
    FROM work.loans a
    LEFT JOIN work.macro b
        ON a.reporting_date = b.reporting_date;
QUIT;
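A left join silently leaves macro fields missing for any reporting date absent from the forecast file, so a quick coverage check after the merge is worthwhile. A minimal sketch, assuming gdp_growth is one of the macro columns (the name is illustrative):

# Count loans whose reporting_date found no macro forecast
# ('gdp_growth' is a stand-in for any column from df_macro)
unmatched = df_merged['gdp_growth'].isna().sum()
if unmatched > 0:
    print(f"{unmatched} loans have no macro forecast for their reporting date")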

2. Data Quality Check

Basic validation for negative exposure and missing ratings.

# Fill missing ratings
df_merged['rating'] = df_merged['rating'].fillna('Unrated')

# Filter out invalid records (negative exposure)
df_clean = df_merged[df_merged['exposure'] >= 0].copy()
DATA work.clean;
    SET work.merged;
    IF exposure >= 0;
    IF MISSING(rating) THEN rating = "Unrated";
RUN;
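Dropping records without measuring the impact makes the cleaning step hard to audit. A minimal profiling sketch, run before the fill and filter, documents how many rows each rule touches:

# Profile the issues before cleaning so the impact is documented
n_total = len(df_merged)
n_neg = (df_merged['exposure'] < 0).sum()
n_unrated = df_merged['rating'].isna().sum()
print(f"Negative exposure: {n_neg}/{n_total} rows")
print(f"Missing rating:    {n_unrated}/{n_total} rows")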

3. Segmentation

Aggregating portfolio by product type for high-level analysis.

segment_stats = df_clean.groupby(['product_type']).agg({
    'exposure': 'sum',
    'loan_id': 'count'
}).reset_index()
PROC SUMMARY DATA=work.clean NWAY;
    CLASS product_type;
    VAR exposure;
    OUTPUT OUT=work.segments
        SUM(exposure)=total_exposure
        N(exposure)=count;
RUN;
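Counts and exposure totals alone say little about segment risk. A common extension is an exposure-weighted average PD per segment; a sketch, assuming the pd_12m column used in the ECL step below is already on the loan file:

# Exposure-weighted 12-month PD per product segment
weighted_pd = df_clean.groupby('product_type').apply(
    lambda g: (g['pd_12m'] * g['exposure']).sum() / g['exposure'].sum()
)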

4. Staging Assessment

Applying the IFRS 9 staging thresholds (30 and 90 days past due, DPD), together with default and watchlist flags, to assign each loan to Stage 1, 2, or 3.

import numpy as np

conditions = [
    (df_clean['default_flag'] == 1) | (df_clean['dpd'] > 90),
    (df_clean['dpd'] > 30) | (df_clean['watchlist'] == 1)
]
df_clean['stage'] = np.select(conditions, [3, 2], default=1)
DATA work.staged;
    SET work.clean;
    IF default_flag = 1 OR dpd > 90 THEN stage = 3;
    ELSE IF dpd > 30 OR watchlist = 1 THEN stage = 2;
    ELSE stage = 1;
RUN;
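A quick distribution check confirms the staging rules behave as expected, e.g. that Stage 3 is not absorbing an implausible share of the book:

# Sanity-check the staging result: loan counts and exposure per stage
stage_summary = df_clean.groupby('stage').agg(
    loans=('loan_id', 'count'),
    exposure=('exposure', 'sum')
)
print(stage_summary)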

5. ECL Calculation

Computing the final provision as PD x LGD x exposure, discounted at the effective interest rate (EIR).

df_clean['applied_pd'] = np.where(df_clean['stage'] == 1,
                                  df_clean['pd_12m'],
                                  df_clean['pd_life'])
df_clean['df'] = 1 / (1 + df_clean['eir'])
df_clean['ecl'] = (df_clean['applied_pd'] * df_clean['lgd']
                   * df_clean['exposure'] * df_clean['df'])
DATA work.final;
    SET work.staged;
    IF stage = 1 THEN applied_pd = pd_12m;
    ELSE applied_pd = pd_life;
    df = 1 / (1 + eir);
    ecl = applied_pd * lgd * exposure * df;
RUN;
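For reporting, it is useful to roll the loan-level ECL up to stage level and express it as a coverage ratio:

# Provision and coverage ratio (ECL / exposure) by stage
prov = df_clean.groupby('stage').agg(
    ecl=('ecl', 'sum'),
    exposure=('exposure', 'sum')
)
prov['coverage'] = prov['ecl'] / prov['exposure']
print(prov)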

Part 2: Advanced Techniques

Techniques for handling large datasets, improving model accuracy, and implementing complex business rules.

Big Data Optimization

Using chunking and indexing to handle millions of records.

chunk_size = 100000
chunks = []
for chunk in pd.read_csv('huge_portfolio.csv', chunksize=chunk_size):
    chunk['exposure'] = pd.to_numeric(chunk['exposure'], downcast='float')
    chunks.append(chunk)
df_all = pd.concat(chunks)
PROC DATASETS LIBRARY=work;
    MODIFY large_portfolio;
    INDEX CREATE loan_id / NOMISS UNIQUE;
QUIT;
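Downcasting after the read helps, but declaring dtypes up front avoids materialising full-width float64/object columns in each chunk at all. A sketch (the column list is illustrative):

# Declaring dtypes at read time cuts peak memory per chunk
dtypes = {'loan_id': 'int64', 'exposure': 'float32', 'rating': 'category'}
df_all = pd.concat(
    pd.read_csv('huge_portfolio.csv', dtype=dtypes, chunksize=100_000)
)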

Outlier Detection

Using Z-scores to flag anomalies.

from scipy import stats

df_clean['exposure_z'] = np.abs(stats.zscore(df_clean['exposure']))
outliers = df_clean[df_clean['exposure_z'] > 3]
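Z-scores assume roughly normal exposures, which loan books rarely have. An interquartile-range rule is a more robust alternative on heavy-tailed data; a sketch:

# IQR-based flag, less sensitive to the extreme values it hunts for
q1, q3 = df_clean['exposure'].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = df_clean[
    (df_clean['exposure'] < q1 - 1.5 * iqr) |
    (df_clean['exposure'] > q3 + 1.5 * iqr)
]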

Model Metrics (Gini)

Validating the PD model's discriminatory power with the Gini coefficient, computed as 2 * AUC - 1.

from sklearn.metrics import roc_curve, auc

fpr, tpr, _ = roc_curve(y_true, y_prob)
gini = 2 * auc(fpr, tpr) - 1
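Here y_true and y_prob are assumed to be the observed default flags and the model PDs for the validation sample. A self-contained toy run shows the shape of the inputs:

# Toy inputs: observed default flags and predicted PDs
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.10, 0.80, 0.20, 0.60, 0.90, 0.30, 0.40])
fpr, tpr, _ = roc_curve(y_true, y_prob)
print(f"Gini: {2 * auc(fpr, tpr) - 1:.3f}")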

Staging with Cooling Period

Probation (cooling-period) logic that holds recently cured loans in Stage 2 before allowing a transfer back to Stage 1.

in_probation = (df['months_since_trigger'] < 3) & (df['prev_stage'] == 2)
# Parentheses matter here: '|' binds tighter than '>' in Python
df['stage'] = np.select([(df['dpd'] > 30) | in_probation], [2], default=1)
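The months_since_trigger counter has to be maintained between reporting runs. One way, sketched below assuming a prior-period snapshot df_prev keyed on loan_id (both names hypothetical): reset the counter whenever the trigger is active, otherwise increment the carried-over value.

# Maintain the probation counter across periods
# (df_prev is a hypothetical prior-period snapshot keyed on loan_id)
df = df.merge(df_prev[['loan_id', 'months_since_trigger']],
              on='loan_id', how='left', suffixes=('', '_prev'))
df['months_since_trigger'] = np.where(
    df['dpd'] > 30,                                # trigger active: reset
    0,
    df['months_since_trigger_prev'].fillna(0) + 1  # trigger clear: count up
)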

Lifetime Survival Calculation

Summing marginal ECL over the term structure.

# eir, lgd, exposure are NumPy arrays aligned with df;
# pd_curves is an (n, 5) array of marginal PDs, one column per year
lifetime_ecl = np.zeros(len(df))
for t in range(5):
    df_t = (1 + eir) ** -(t + 1)  # discount factor for year t+1
    lifetime_ecl += pd_curves[:, t] * lgd * exposure * df_t
/* Assumes the marginal PDs pd1-pd5 are columns on the input dataset */
DATA work.lifetime;
    SET work.final;
    ARRAY marginal_pd[5] pd1-pd5;
    DO t = 1 TO 5;
        df = (1 + eir) ** -t;
        loss = marginal_pd[t] * lgd * exposure * df;
        lifetime_ecl = SUM(lifetime_ecl, loss);  /* SUM() treats the initial missing value as 0 */
    END;
RUN;
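Both loops treat pd_curves / pd1-pd5 as already-unconditional marginal PDs. If the term structure instead holds conditional (hazard) PDs, each year's loss must be weighted by the probability of surviving to that year, which is where the survival element of the heading comes in. A sketch under that assumption, with hazard as a hypothetical (n, 5) array of conditional PDs:

# Convert conditional (hazard) PDs into unconditional marginal PDs
# by tracking cumulative survival
survival = np.ones(len(df))
lifetime_ecl = np.zeros(len(df))
for t in range(5):
    marginal_pd = survival * hazard[:, t]  # P(default in year t+1)
    df_t = (1 + eir) ** -(t + 1)
    lifetime_ecl += marginal_pd * lgd * exposure * df_t
    survival *= 1 - hazard[:, t]           # survive through year t+1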