Commercial Bank - PD Modeling Analyst Hiring Test

Position: SAS Modeling Analyst - Probability of Default (PD) Models
Department: Commercial Credit Risk

Technical SAS Questions

1. Data Step vs. PROC SQL

When would you use a DATA step MERGE rather than PROC SQL to combine datasets, and vice versa?

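Answer:
- DATA step (MERGE): better for sequential, row-by-row logic such as BY-group processing (FIRST./LAST. variables) and conditional merging with IN= flags. Both inputs must be sorted by the BY variables, and a many-to-many MERGE does not produce every key combination.
- PROC SQL: better for joins that need no pre-sorting (INNER, LEFT, FULL), keys with different names, many-to-many joins, and aggregations (GROUP BY). For very large tables that are already sorted, a DATA step merge can be faster, so efficiency depends on data volume, sort order, and indexes.
Example: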
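/* DATA step merge: both inputs must already be sorted by customer_id (PROC SORT) */
DATA merged_ds;
  MERGE dataset1(IN=in1) dataset2(IN=in2);
  BY customer_id;
  IF in1;  /* keep every dataset1 row, i.e. left-join behavior */
RUN;

/* PROC SQL: flexible joins, no pre-sorting required */
PROC SQL;
  CREATE TABLE merged_sql AS
  SELECT a.*, b.*  /* SAS keeps a.customer_id and warns that b.customer_id already exists */
  FROM dataset1 AS a
  LEFT JOIN dataset2 AS b
    ON a.customer_id = b.customer_id;
QUIT;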
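To make the "conditional merging" point concrete, the sketch below uses IN= flags to route records from a one-to-many merge into three output datasets in a single pass, then shows the kind of grouped aggregation where PROC SQL is more concise. The dataset names loans and ratings and the variables segment and default_flag are hypothetical; it assumes ratings has at most one row per customer_id and default_flag is numeric 0/1.

```sas
/* Hypothetical inputs: loans (one row per loan), ratings (one row per customer). */
/* MERGE requires both inputs sorted by the BY variable.                          */
PROC SORT DATA=loans;   BY customer_id; RUN;
PROC SORT DATA=ratings; BY customer_id; RUN;

/* One pass, three outputs: matched loans, unrated loans, orphan ratings */
DATA loans_rated loans_unrated ratings_orphan;
  MERGE loans(IN=in_loan) ratings(IN=in_rating);
  BY customer_id;
  IF in_loan AND in_rating THEN OUTPUT loans_rated;
  ELSE IF in_loan THEN OUTPUT loans_unrated;
  ELSE OUTPUT ratings_orphan;
RUN;

/* PROC SQL: grouped aggregation in one step, no sorting needed */
/* (segment and default_flag are hypothetical variables)       */
PROC SQL;
  CREATE TABLE segment_dr AS
  SELECT segment,
         COUNT(*)           AS n_loans,
         MEAN(default_flag) AS default_rate
  FROM loans_rated
  GROUP BY segment;
QUIT;
```

Because all three datasets are written by one DATA step, the inputs are read only once; reproducing the same split in PROC SQL would take separate queries or a FULL JOIN followed by filtering.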
2. Macro Variables

Explain the difference between %LET and CALL SYMPUTX. Provide an example where you would use a macro variable in PD modeling.

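Answer:
- %LET: assigns the value when the macro processor encounters the statement, before any DATA step runs, so it can only hold fixed text (static).
- CALL SYMPUTX: assigns the value while a DATA step executes, so it can take values from the data (dynamic). It trims leading and trailing blanks, and the macro variable is available only after the step finishes (after RUN), not within the same step.
Example: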
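/* %LET for static values: stores the text '31DEC2023'd, which resolves to a SAS date literal */
%LET cutoff_date = '31DEC2023'd;

/* CALL SYMPUTX for dynamic values: row count of default_data, known only at run time */
DATA _null_;
  SET default_data END=last;
  IF last THEN CALL SYMPUTX('total_defaults', _N_);
RUN;
%PUT NOTE: total_defaults = &total_defaults;

/* Usage in PD modeling: restrict the development sample to the cutoff date */
PROC LOGISTIC DATA=loans;
  WHERE date <= &cutoff_date;
  MODEL default(event='1') = ltv income;
RUN;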
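As a further PD-modeling sketch, the macro below parameterizes the development cutoff so the same model can be refit on several snapshots (for example, to compare coefficient stability over time), and CALL SYMPUTX records each sample's observed default rate. The macro name fit_pd, its parameters, and the output dataset names are hypothetical; the dataset loans and the variables date, default, ltv and income come from the example above, with default assumed numeric 0/1.

```sas
/* Hypothetical macro: refit the PD model on data up to a given cutoff date */
%MACRO fit_pd(cutoff=, out=);
  /* Runtime value: observed default rate of the development sample */
  DATA _null_;
    SET loans(WHERE=(date <= &cutoff)) END=last;
    n_obs + 1;               /* sum statement: retained row counter */
    n_def + (default = 1);   /* adds 1 for each defaulted loan      */
    IF last THEN CALL SYMPUTX("odr_&out", n_def / n_obs);
  RUN;
  /* RUN ends the step, so the macro variable exists by this point */
  %PUT NOTE: Observed default rate for &out = &&odr_&out;

  /* OUTEST= saves the fitted coefficients for later comparison */
  PROC LOGISTIC DATA=loans(WHERE=(date <= &cutoff)) OUTEST=&out;
    MODEL default(event='1') = ltv income;
  RUN;
%MEND fit_pd;

%fit_pd(cutoff='31DEC2022'd, out=est_2022)
%fit_pd(cutoff='31DEC2023'd, out=est_2023)
```

Inside a macro with parameters, CALL SYMPUTX writes to the macro's local symbol table, so each odr_ variable is visible only within that call; the saved OUTEST= datasets persist for comparison afterwards.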

Banking & Risk-Specific SAS Questions

6. Time Aggregation for PD Data

In SAS, how would you handle panel data (e.g., multiple snapshots of borrower data over time) before fitting a PD model?

Answer:
/* Aggregate to one record per borrower (latest snapshot) */
PROC SORT DATA=panel_data;
  BY customer_id date;
RUN;

DATA latest_snapshot;
  SET panel_data;
  BY customer_id;
  IF last.customer_id;  /* keep the most recent observation per borrower */
RUN;
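Keeping the latest snapshot is simple, but for a PD development sample it can pick up information recorded after a default. A common alternative, sketched here under stated assumptions, fixes an observation date, keeps borrowers that are performing at that date, and flags default over the following 12 months. It assumes month-end snapshots and a hypothetical numeric 0/1 variable default_flag in panel_data; the macro variable obs_date and the output dataset names are illustrative.

```sas
/* Assumption: panel_data holds month-end snapshots with customer_id, date (SAS date) */
/* and a hypothetical numeric 0/1 default_flag.                                        */
%LET obs_date = '31DEC2022'd;

/* 1. Cohort: borrowers not in default at the observation date */
DATA cohort;
  SET panel_data;
  WHERE date = &obs_date AND default_flag = 0;
RUN;

PROC SQL;
  /* 2. Worst default status per borrower over the next 12 months */
  CREATE TABLE future_default AS
  SELECT customer_id, MAX(default_flag) AS default_12m
  FROM panel_data
  WHERE date > &obs_date
    AND date <= INTNX('MONTH', &obs_date, 12, 'E')
  GROUP BY customer_id;

  /* 3. Attach the target; borrowers with no later snapshots are */
  /*    treated as non-default here (a modeling choice)          */
  CREATE TABLE pd_dev AS
  SELECT c.*, COALESCE(f.default_12m, 0) AS default_12m
  FROM cohort AS c
  LEFT JOIN future_default AS f
    ON c.customer_id = f.customer_id;
QUIT;
```

The result has one row per performing borrower at the observation date, with characteristics measured at that date and a forward-looking 12-month default target.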

Scoring Guidance

Score | Criteria
Strong (9-10) | Efficient code, explains banking context (e.g., Basel), uses advanced techniques (macros, PHREG).
Average (5-8) | Basic SAS skills but lacks optimization/regulatory awareness.
Weak (0-4) | Struggles with DATA steps, PROC SQL, or PD-specific requirements.