Hiring Manager Guide: Commercial PD Modeling Team
Interviewer's Preamble: "Thank you for coming in today. As you know, our team is responsible for developing, maintaining, and validating Probability of Default models for our commercial loan portfolios. SAS is our primary tool for data manipulation, analysis, and modeling. These next few questions are designed to help me understand your practical SAS skills and your approach to solving problems we face every day."
"Imagine you have two datasets. CUSTOMER_MASTER
contains unique customer information like CustomerID
, IndustryCode
, and YearsInBusiness
. LOAN_PERFORMANCE
contains monthly performance data for every loan, with fields like LoanID
, CustomerID
, SnapshotDate
, and DaysPastDue
. How would you combine these two datasets to bring the customer's IndustryCode
into the loan performance table? Please describe how you might do this using both a DATA Step MERGE
and PROC SQL
."
PROC SORT
both datasets by the key (CustomerID
) before the merge. This is a fundamental check.LEFT JOIN
, with LOAN_PERFORMANCE
as the left table, and identify the correct join key in an ON
clause.PROC SQL
's flexibility for complex joins or the potential efficiency of a DATA Step MERGE
on large, pre-sorted tables."What is the practical difference between using a WHERE
statement and an IF
statement to filter observations within a DATA step? When would you choose one over the other?"
WHERE
statement filters data as it's read *before* it enters the Program Data Vector (PDV). An IF
statement filters *after* the observation is in the PDV.WHERE
is more efficient for filtering based on existing variables. IF
is necessary when filtering on a variable that was just created within the same DATA step."Let's say you have the SnapshotDate
and the OriginationDate
for each loan. You need to create a new variable called LoanAgeMonths
. How would you code this in a DATA step?"
INTCK
function. The correct syntax would be LoanAgeMonths = INTCK('MONTH', OriginationDate, SnapshotDate);
.INTCK
counts boundaries and considering if the business definition of "age" requires a different approach."In our commercial portfolio data, financial statement variables often have missing values. What are some SAS techniques you would use to identify and then handle these missing values before modeling?"
PROC MEANS
(with NMISS
), PROC FREQ
(with the /MISSING
option), or PROC UNIVARIATE
.PROC STDIZE
or mean/median from PROC MEANS
), and creating a binary flag "Is_Variable_Missing" as a predictive feature."You need to create a summary table that shows the total exposure, average risk rating, and number of loans for each IndustryCode
. How could you accomplish this using PROC MEANS
and then using PROC SQL
?"
CLASS
statement for the grouping variable (IndustryCode
), a VAR
statement for the variables to summarize, and an OUTPUT
statement to create a new dataset.SELECT
statement with aggregate functions (SUM()
, AVG()
, COUNT()
) and a GROUP BY
clause."For building a PD model, PROC LOGISTIC
is our standard tool. Can you describe three key options or statements within this procedure that you would use, and what information you would look for in the output to assess the initial model?"
MODEL
statement, the CLASS
statement (for categorical variables), the SELECTION=
option for variable selection, and the OUTPUT
statement to save predicted probabilities."Describe a situation where you would use a SAS Macro. Can you explain the difference between %LET
and CALL SYMPUTX
for creating macro variables?"
%LET
is a global statement to assign a static value. CALL SYMPUTX
is a DATA step function used to assign a value from a dataset observation to a macro variable, demonstrating a deeper, more dynamic usage."Our portfolio data can be very large, with millions of rows. Your code is running too slowly. What are the first three things you would investigate to improve its performance?"
WHERE
statements and KEEP
/DROP
options as early as possible to minimize data processing.PROC SORT
s. Use indexes where available."You receive a dataset where a customer's payment history is in a 'wide' format, with columns like Pay_Status_Jan
, Pay_Status_Feb
, etc. For modeling, you need this in a 'long' format. How would you accomplish this transformation in SAS?"
PROC TRANSPOSE
, as it's designed for this exact task.ARRAY
and a DO
loop is also a great answer and demonstrates strong fundamental data manipulation skills."Walk me through the high-level SAS steps you would take to prepare an analysis-ready dataset for a PD model, starting from the raw tables we discussed earlier. You don't need to write the exact code, but describe the procedures and data steps you would use in order."
PROC SQL
), 2) Define Target Variable (DATA Step), 3) Clean & Engineer Features (DATA Step), 4) Perform EDA (PROC FREQ
, PROC MEANS
), and 5) Finalize Dataset for modeling.