Question 1 – Code Review
Look at the following snippet and tell me what it does, why the data-step-merge is risky, and how you would refactor it to avoid accidental many-to-many joins and missing BY values.
/* What it does: left-joins loans to financials on facility_id, keeps only matches. */
/* Risks: many-to-many merge if duplicates exist; silently drops records with missing BY values. */
proc sql;
create table merged as
select l.*, f.*
from loans as l
join financials as f
on l.facility_id = f.facility_id;
quit;
Question 2 – PD Vintage Analysis
Using only PROC SQL or the data step, calculate the cumulative 12-month default rate for each origination vintage.
proc sql;
create table vintage_pd as
select vintage_qtr,
sum(flag_default)/count(distinct facility_id) as cum12_pd
from (
select l.vintage_qtr,
l.facility_id,
max(case when intck('month',l.orig_date,p.report_date) between 1 and 12
and p.dpd ge 90 then 1 else 0 end) as flag_default
from loans l
join perf.monthly p
on l.facility_id = p.facility_id
group by l.vintage_qtr, l.facility_id
)
group by vintage_qtr;
quit;
Question 3 – MNAR Imputation
Describe two SAS procedures or macros you could use to implement multiple-imputation by chained equations (MICE) and explain how you would pool the resulting PD parameter estimates.
/* PROC MI with FCS for MICE */
proc mi data=loans nimpute=10 seed=12345;
fcs nbiter=20;
var debt_service_coverage_ratio ltv dscr other_ratios;
run;
/* PROC MIANALYZE to pool logistic regression results */
proc mianalyze parms=parms_pd;
modeleffects intercept ltv dscr;
run;
Question 4 – PROC LOGISTIC Fine-Tuning
Show the exact options you would add to PROC LOGISTIC to fit a PD model with backward selection (SBC), forced zero intercept, exposure weighting, and export ROC & parameter estimates.
proc logistic data=train;
model default(event='1') = &vars
/ selection=backward(select=SBC) noint weight=exposure_wgt;
output out=scored pred=p;
roc; ods output roc=rocdata;
ods output parameterestimates=parms;
run;
Question 5 – Scalability Debate
Compare Base-SAS multithreaded, PROC HPLOGISTIC, and CAS actions on Viya for scoring 25 million facilities quarterly. Which do you recommend?
Base-SAS: single-machine RAM-bound.
PROC HPLOGISTIC: multithreaded, up to ~10 M rows.
CAS actions (Viya): distributed memory, linear scale.
Recommendation: use CAS action logistic in Viya if available; else PROC HPLOGISTIC.
Question 6 – Data-Step Array Puzzle
Given 60 monthly PD forecasts (pd1–pd60), compute cumulative 12-month PD under conditional independence.
data want;
set have;
array pd{60} pd1-pd60;
cum12_pd = 1 - exp(sum(log(1-pd{i}}), i=1 to 12));
run;
Question 7 – Hash Object vs. Formats
Explain when a hash object is preferable to a user-defined format for NAICS look-up and show skeleton code.
/* Hash – dynamic, multi-var return */
if _n_=0 then do;
declare hash h(dataset:'industry_map');
h.defineKey('naics');
h.defineData('risk_weight');
h.defineDone();
end;
rc = h.find();
/* Format – fastest, single var */
proc format;
value $naics_fmt
'311'='Low'
'312'='Medium';
run;
Question 8 – Macro Facility Debugging
Describe systematic debugging steps and defensive coding for an inherited macro that throws “Apparent symbolic reference not resolved”.
/* Debug steps */
options mprint mlogic symbolgen;
%put _user_;
/* Defensive macro */
%macro score_pd(lib=, ds=);
%if %length(&lib)=0 or %length(&ds)=0 %then %do;
%put ERROR: parameters missing;
%return;
%end;
...
%mend;
Question 9 – Stress-Testing Integration
Outline a macro that re-calculates PDs under 20 macro scenarios, writes outputs, and logs audit metadata.
%macro stress_pd(scen_file=);
proc import datafile="&scen_file" out=scenarios dbms=csv replace; run;
data _null_;
set scenarios end=eof;
call symputx(cats('scen',_n_),scenario_name);
call symputx('nscen',_n_);
run;
%do i=1 %to &nscen;
data loans_scen; set loans; scenario="&&scen&i"; macro_pd=...; run;
proc append base=results.pd_scenarios data=loans_schen force; run;
proc sql; insert into audit_log set timestamp=datetime(), scenario="&&scen&i", nobs=&sqlobs; quit;
%end;
%mend;
Question 10 – Git & Enterprise Guide Workflow
Propose a workflow so analysts can use Enterprise Guide interactively while production runs the exact Git-tagged version.
1. Clone repo to local workspace.
2. EG project links *.sas files via relative paths; .egp not committed.
3. Commit .sas files with Git; tag releases.
4. Production batch job runs: sasbatch.bat myfile.sas from tagged release branch.