Hiring Manager Guide: Commercial PD Modeling Team
Interviewer's Preamble: "Thank you for coming in today. As you know, our team is responsible for developing, maintaining, and validating Probability of Default models for our commercial loan portfolios. SAS is our primary tool for data manipulation, analysis, and modeling. These next few questions are designed to help me understand your practical SAS skills and your approach to solving problems we face every day."
"Imagine you have two datasets. CUSTOMER_MASTER contains unique customer information like CustomerID, IndustryCode, and YearsInBusiness. LOAN_PERFORMANCE contains monthly performance data for every loan, with fields like LoanID, CustomerID, SnapshotDate, and DaysPastDue. How would you combine these two datasets to bring the customer's IndustryCode into the loan performance table? Please describe how you might do this using both a DATA Step MERGE and PROC SQL."
PROC SORT both datasets by the key (CustomerID) before the merge. This is a fundamental check.LEFT JOIN, with LOAN_PERFORMANCE as the left table, and identify the correct join key in an ON clause.PROC SQL's flexibility for complex joins or the potential efficiency of a DATA Step MERGE on large, pre-sorted tables."What is the practical difference between using a WHERE statement and an IF statement to filter observations within a DATA step? When would you choose one over the other?"
WHERE statement filters data as it's read *before* it enters the Program Data Vector (PDV). An IF statement filters *after* the observation is in the PDV.WHERE is more efficient for filtering based on existing variables. IF is necessary when filtering on a variable that was just created within the same DATA step."Let's say you have the SnapshotDate and the OriginationDate for each loan. You need to create a new variable called LoanAgeMonths. How would you code this in a DATA step?"
INTCK function. The correct syntax would be LoanAgeMonths = INTCK('MONTH', OriginationDate, SnapshotDate);.INTCK counts boundaries and considering if the business definition of "age" requires a different approach."In our commercial portfolio data, financial statement variables often have missing values. What are some SAS techniques you would use to identify and then handle these missing values before modeling?"
PROC MEANS (with NMISS), PROC FREQ (with the /MISSING option), or PROC UNIVARIATE.PROC STDIZE or mean/median from PROC MEANS), and creating a binary flag "Is_Variable_Missing" as a predictive feature."You need to create a summary table that shows the total exposure, average risk rating, and number of loans for each IndustryCode. How could you accomplish this using PROC MEANS and then using PROC SQL?"
CLASS statement for the grouping variable (IndustryCode), a VAR statement for the variables to summarize, and an OUTPUT statement to create a new dataset.SELECT statement with aggregate functions (SUM(), AVG(), COUNT()) and a GROUP BY clause."For building a PD model, PROC LOGISTIC is our standard tool. Can you describe three key options or statements within this procedure that you would use, and what information you would look for in the output to assess the initial model?"
MODEL statement, the CLASS statement (for categorical variables), the SELECTION= option for variable selection, and the OUTPUT statement to save predicted probabilities."Describe a situation where you would use a SAS Macro. Can you explain the difference between %LET and CALL SYMPUTX for creating macro variables?"
%LET is a global statement to assign a static value. CALL SYMPUTX is a DATA step function used to assign a value from a dataset observation to a macro variable, demonstrating a deeper, more dynamic usage."Our portfolio data can be very large, with millions of rows. Your code is running too slowly. What are the first three things you would investigate to improve its performance?"
WHERE statements and KEEP/DROP options as early as possible to minimize data processing.PROC SORTs. Use indexes where available."You receive a dataset where a customer's payment history is in a 'wide' format, with columns like Pay_Status_Jan, Pay_Status_Feb, etc. For modeling, you need this in a 'long' format. How would you accomplish this transformation in SAS?"
PROC TRANSPOSE, as it's designed for this exact task.ARRAY and a DO loop is also a great answer and demonstrates strong fundamental data manipulation skills."Walk me through the high-level SAS steps you would take to prepare an analysis-ready dataset for a PD model, starting from the raw tables we discussed earlier. You don't need to write the exact code, but describe the procedures and data steps you would use in order."
PROC SQL), 2) Define Target Variable (DATA Step), 3) Clean & Engineer Features (DATA Step), 4) Perform EDA (PROC FREQ, PROC MEANS), and 5) Finalize Dataset for modeling.