
Research

Research for Accountable Credit Decisioning

NomoCrit began as a research effort in credit decisioning, model governance, fairness-aware lending systems, and explainable AI. Today the platform is operational, but the research program remains intentionally unfinished. Credit decisioning sits at the intersection of prediction, policy, access, regulation, and institutional accountability. The goal is not to claim these problems are solved. It is to build infrastructure through which they can be measured, studied, audited, and improved.

Overview

Why This Research Exists

Most production credit systems still inherit assumptions from scorecard-era architectures. Even when machine learning is introduced, the surrounding logic often stays unchanged: a bureau score, a fixed threshold, a few rules, and a static workflow. That structure is familiar and easy to operate, but it becomes technically limiting as lenders serve heterogeneous, thin-file, and digitally originated borrowers.

The lineage of this work traces back to research under Prof. Vijay Keswani at IIT Delhi, focused on dynamic thresholding, counterfactual recourse, and cluster-level fairness analysis. That bounded investigation expanded into a platform for banks, NBFCs, fintech lenders, MFIs, and alternative credit providers operating in real decision environments. The questions below explain why the program continues.

Bureau-centric representation

A borrower is often reduced to a credit score, a few delinquency flags, and utilization. That compression is insufficient for thin-file, informally employed, or newly originated borrowers.

Historical labels are not neutral

Default data is the product of prior approval policy and unequal access. A model can learn that underserved groups are riskier when the labels reflect exclusion, not capacity to repay.

Thin-file borrowers underserved

Applicants with sparse history are routed to blunt reject rules — not because their risk is understood, but because the system lacks calibrated uncertainty and alternative-data tooling.

A score is not an explanation

A probability of default does not say why an application was declined, which variables mattered, or whether the decision was sensitive to threshold choice.

Recourse is underdeveloped

Adverse-action guidance is often generic. A path that is mathematically valid but operationally impossible — or that breaks after retraining — is not meaningful recourse.

Static thresholds underperform

A single global cut-off assumes one decision boundary fits heterogeneous populations. The same threshold can be too conservative in one cluster and too permissive in another.

Population shift is the norm

Product mix, geographies, and macro conditions change. A model can become miscalibrated or uneven across subgroups without any dramatic collapse in top-line AUC.

Fairness begins upstream

The problem starts in data generation, approval policy, reject handling, and feature construction — not in a post-hoc compliance check applied at the end.

Active Research Areas

Nine open lines of work

Each area is framed as a question we are still answering, with the current lines of inquiry, why it matters operationally, and what remains unresolved.

01 Fairness-Aware Credit Modelling

Can a lending system be fair across the slices where decisions actually land (products, ticket sizes, geographies, and proxy-sensitive groups), not only in aggregate?

Current Lines of Inquiry
  • Statistical parity and approval-rate disparity across product segments
  • Equal opportunity and equalized odds treated as diagnostic signals, not optimization targets
  • Groupwise calibration: does a 0.30 predicted default probability mean the same realized outcome across groups?
  • Proxy bias detection — separating lawful signal from harmful proxy correlation
  • Lending-specific tradeoffs studied as a constrained decision problem
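The diagnostics listed above can be computed side by side for each group. What follows is a minimal sketch, not the platform's implementation. It assumes a PD model in which an applicant is approved when predicted PD falls below a single threshold, and it assumes outcomes are known for every applicant. The function name group_diagnostics is hypothetical.

```python
from collections import defaultdict

def group_diagnostics(records, threshold):
    """Per-group fairness diagnostics for a PD model under one approval threshold.

    records: iterable of (group, predicted_pd, defaulted) tuples, where
    defaulted is 1 if the loan defaulted and 0 otherwise. An applicant is
    approved when predicted_pd < threshold. Assumes outcomes are known for
    every applicant, which real lending data rarely provides.
    """
    rows_by_group = defaultdict(list)
    for group, pd_hat, y in records:
        rows_by_group[group].append((pd_hat, y))

    nan = float("nan")
    report = {}
    for group, rows in rows_by_group.items():
        n = len(rows)
        repaid = [p for p, y in rows if y == 0]
        defaulted = [p for p, y in rows if y == 1]
        report[group] = {
            # Statistical-parity view: share of the group that is approved.
            "approval_rate": sum(p < threshold for p, _ in rows) / n,
            # Equal-opportunity view: share of eventual repayers who were approved.
            "tpr": sum(p < threshold for p in repaid) / len(repaid) if repaid else nan,
            # Equalized odds also compares this: share of eventual defaulters approved.
            "fpr": sum(p < threshold for p in defaulted) / len(defaulted) if defaulted else nan,
            # Calibration-in-the-large: mean predicted PD minus realized default rate.
            "calibration_gap": sum(p for p, _ in rows) / n - len(defaulted) / n,
        }
    return report
```

Differences in tpr and fpr across groups give the equal-opportunity and equalized-odds gaps, used here as diagnostic signals rather than optimization targets. The groupwise calibration question above calls for a finer check. That check would bin predicted PD (for example, around 0.30) and compare realized default rates bin by bin within each group.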
Why It Matters

Aggregate acceptability can mask sharp local disparity. Fairness interacts with approval rates, default rates, provisioning, collections, and capital allocation — a change to one can shift burden elsewhere.

Open Challenges

Labels reflect selective approval history. Protected attributes are often partial or unavailable. No single metric resolves whether a pipeline is fair, useful, and viable at once.

02 Bias Detection and Mitigation

Where does bias enter a lending pipeline, and which disparities are correctable model artifacts versus structural risk differences?

Current Lines of Inquiry
  • Segment-level error analysis
  • Policy-threshold sensitivity analysis
  • Groupwise calibration studies
  • Feature-contribution auditing and proxy correlation review
  • Historical portfolio disparity mapping
  • Remediation testing under controlled shadow deployment
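Policy-threshold sensitivity analysis can be illustrated by sweeping candidate thresholds and tracking how the approval-rate gap between groups moves. This is a simple sketch under the same assumption as elsewhere: approval means predicted PD below the threshold. The function names are hypothetical.

```python
def approval_rate(pds, threshold):
    """Share of applicants approved when approval means predicted PD < threshold."""
    return sum(p < threshold for p in pds) / len(pds)

def threshold_sensitivity(pds_by_group, thresholds):
    """For each candidate threshold, return (threshold, per-group approval rates,
    largest gap between any two groups, i.e. max rate minus min rate)."""
    report = []
    for t in thresholds:
        rates = {g: approval_rate(pds, t) for g, pds in pds_by_group.items()}
        report.append((t, rates, max(rates.values()) - min(rates.values())))
    return report

# Example: the gap doubles when the cut-off moves from 0.2 to 0.4.
example = threshold_sensitivity(
    {"A": [0.05, 0.15, 0.25, 0.35], "B": [0.1, 0.3, 0.5, 0.7]}, [0.2, 0.4]
)
```

A disparity that swings sharply across nearby thresholds suggests the observed gap partly reflects threshold choice, not only model behavior.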
Why It Matters

In lending, bias is as much a systems problem as a measurement task. It can enter through missing data, target leakage, legacy policy rules, selective labels, proxy-rich features, and threshold choice.

Open Challenges

Not every disparity can or should be corrected automatically. Some reveal structural conditions or product-design problems rather than model discrimination. Mitigation is treated with caution.

03 Explainability Research

What makes an explanation faithful, stable, and useful to underwriters, compliance officers, and borrowers, rather than merely a list of top features?

Current Lines of Inquiry
  • SHAP-based local attribution and its stability across adjacent model versions
  • Layered global importance — portfolio, product, cluster, time-window, and drift-conditioned
  • Decision traceability: model version, threshold version, segment assignment, explanation, recourse state, policy flags
  • Adverse-action reasoning as a translation problem between model behavior and institutional communication
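Decision traceability is essentially a data-structure question. The sketch below shows one per-decision record that captures the items listed above, serialized to JSON for an audit log. The class name, field names, and decision labels are hypothetical illustrations, not a documented NomoCrit schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionTrace:
    """One auditable record per credit decision (illustrative field names)."""
    application_id: str
    model_version: str
    threshold_version: str
    segment: str
    predicted_pd: float
    threshold: float
    decision: str                  # e.g. "approve", "decline", or "refer"
    top_attributions: dict         # feature name -> signed contribution
    recourse_state: str            # e.g. "none", "issued", "invalidated"
    policy_flags: tuple = ()
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize deterministically (sorted keys) for append-only logging."""
        return json.dumps(asdict(self), sort_keys=True)
```

The record is frozen so it cannot be edited after the fact. Versions are stored alongside the score so that a later reviewer can ask why a decision was made under a specific model state.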
Why It Matters

A score by itself cannot be audited, defended, or acted on. Decision-level traceability is what lets institutions answer why a decision was made under a specific model state.

Open Challenges

Explanations can be technically correct yet operationally unhelpful, or simplified enough to mislead. The goal is usefulness without sacrificing faithfulness — not explanation volume.

04 Counterfactual Recourse

Is the path to approval actionable, plausible, fair, and still valid after the model changes?

Current Lines of Inquiry
  • Feasibility and immutability constraints on recommended changes
  • Causal consistency and product-policy compatibility
  • Temporal validity across retraining
  • Whether counterfactuals support human review more effectively than raw feature attributions
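Feasibility and immutability constraints can be shown on a deliberately simple case. The case is a linear logistic PD model, with recourse restricted to changing one actionable feature within plausible bounds. This is a hypothetical sketch, not the program's recourse method. It ignores causal consistency and multi-feature changes, and the helper names are invented for illustration.

```python
import math

def pd_logistic(x, weights, bias):
    """Predicted probability of default from a linear logistic model."""
    z = bias + sum(w * x[f] for f, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

def single_feature_recourse(x, weights, bias, threshold, actionable, margin=1e-6):
    """List single-feature changes that would move a declined applicant below the PD threshold.

    actionable: feature -> (lower, upper) bounds on plausible new values.
        Immutable features (age, for example) are simply left out.
    Returns (feature, current_value, required_value, distance) tuples, closest first.
    """
    z = bias + sum(w * x[f] for f, w in weights.items())
    z_cut = math.log(threshold / (1.0 - threshold))   # logit of the PD cut-off
    if z < z_cut:
        return []                                     # already approved: no recourse needed
    options = []
    for f, (lower, upper) in actionable.items():
        w = weights.get(f, 0.0)
        if w == 0.0:
            continue                                  # this feature cannot move the score
        # Solve z + w * delta = z_cut - margin so the new PD lands strictly below the cut-off.
        required = x[f] + (z_cut - margin - z) / w
        if lower <= required <= upper:                # feasibility constraint
            options.append((f, x[f], required, abs(required - x[f])))
    return sorted(options, key=lambda o: o[3])
```

Even this toy version shows the tension described below. The smallest change is the one closest to the boundary, and if the bounds are tightened to reflect what a borrower can realistically achieve, the option can disappear entirely.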
Why It Matters

A system that can only say no, without a structured path forward, is incomplete. Recourse touches borrower communication, internal appeals, accountability, and policy stress testing.

Open Challenges

Recourse optimized only for proximity to the decision boundary can be unrealistic. Recommendations can also break after retraining, a blind spot that motivated formalizing recourse validity degradation as a measurable quantity.

05 Dynamic Threshold Optimization

When should decision boundaries vary across segments, on what basis, and under what governance controls?

Current Lines of Inquiry
  • How much value is lost by static global thresholds
  • Which segmentation strategy best supports threshold adaptation
  • When dynamic thresholding improves profitability but worsens subgroup disparity
  • How threshold updates should be governed under drift
  • What calibration guarantees are required before optimization becomes trustworthy
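Value lost by a static global threshold can be estimated by comparing per-segment thresholds with a single global one on the same grid. The sketch below assumes a deliberately crude profit model: each approved repayer earns a fixed gain and each approved defaulter costs a fixed loss. Real economics would add pricing, provisioning, and capital. All names are hypothetical.

```python
def portfolio_profit(rows, threshold, gain, loss):
    """rows: (predicted_pd, defaulted) pairs. Approve when predicted_pd < threshold;
    approved repayers earn `gain`, approved defaulters cost `loss`."""
    return sum(-loss if y else gain for p, y in rows if p < threshold)

def best_threshold(rows, grid, gain, loss):
    """Grid point with the highest profit (the first one wins ties)."""
    return max(grid, key=lambda t: portfolio_profit(rows, t, gain, loss))

def compare_policies(rows_by_segment, grid, gain, loss):
    """Return (global threshold, static profit, per-segment thresholds, dynamic profit)."""
    all_rows = [r for rows in rows_by_segment.values() for r in rows]
    t_global = best_threshold(all_rows, grid, gain, loss)
    static = portfolio_profit(all_rows, t_global, gain, loss)
    per_segment = {s: best_threshold(rows, grid, gain, loss)
                   for s, rows in rows_by_segment.items()}
    dynamic = sum(portfolio_profit(rows_by_segment[s], t, gain, loss)
                  for s, t in per_segment.items())
    return t_global, static, per_segment, dynamic
```

Because every segment can always fall back to the global threshold, the in-sample dynamic profit can never be lower than the static one. Thresholds should therefore be chosen on one sample and evaluated on another, and subgroup disparity should be checked alongside profit.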
Why It Matters

The conversion from score to decision is mediated by thresholds, policy, capital, and economics. Threshold policy affects approval rates, default rates, portfolio yield, fairness, and manual-review routing.

Open Challenges

Dynamic thresholding is where model science meets business policy. Gains in one objective can degrade another, so it is treated as a research layer of the system, not a deployment constant.

06 Alternative Data Evaluation

Does an alternative signal add stable, lawful, interpretable, and governance-compatible value beyond bureau and internal data?

Current Lines of Inquiry
  • Incremental predictive value relative to bureau and internal data
  • Stability of alternative features under drift
  • Proxy-bias risk and calibration contribution
  • Product-specific utility against documentation and explainability burden
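Incremental predictive value is only the first of these questions, but it can be measured with care. The sketch below compares AUC on a shared holdout between a baseline model and a model that adds an alternative feature. It uses a paired bootstrap to show how often the lift vanishes under resampling. It assumes both score vectors come from models trained on the same data, and the function names are hypothetical. It says nothing about proxy risk or documentation burden.

```python
import random

def auc(scores, labels):
    """Probability that a random defaulter (label 1) scores above a random repayer (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_lift(base_scores, augmented_scores, labels, n_boot=200, seed=0):
    """Paired bootstrap of AUC(augmented) - AUC(base) on the same holdout loans.
    Returns (point estimate, share of resamples in which the lift is <= 0)."""
    rng = random.Random(seed)
    n = len(labels)
    point = auc(augmented_scores, labels) - auc(base_scores, labels)
    non_positive, done = 0, 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:            # AUC is undefined without both classes; redraw
            continue
        lift = (auc([augmented_scores[i] for i in idx], y)
                - auc([base_scores[i] for i in idx], y))
        non_positive += lift <= 0
        done += 1
    return point, non_positive / n_boot
```

A lift that turns non-positive in a large share of resamples is weak evidence of additive value. Repeating the comparison across time windows speaks to stability under drift.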
Why It Matters

Domains range from cash-flow and bank-transaction summaries to GST or invoice patterns, device signals, merchant flows, platform income, and utility regularity. Each is promising and easily overclaimed.

Open Challenges

More data is not automatically better. Signal gain alone is not enough — the operating principle is additive evidence under governance constraints.

07 Reject Inference and Selective Labels

How severe is sample-selection bias, and how much observed subgroup disparity is an artifact of selective labels?

Current Lines of Inquiry
  • Severity of sample selection bias in historical training data
  • Which reject-inference assumptions are defensible in specific product settings
  • Whether shadow deployment or policy experiments can improve label coverage
  • Disentangling genuine disparity from selective labeling
Why It Matters

Institutions observe realized outcomes only for approved borrowers. Rejected applicants generate no labels — a selective-label problem at the heart of model development.
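The effect of selective labels is easy to see in a synthetic simulation. A historical policy approves applicants whose noisy risk score falls below a cut-off, and outcomes are recorded only for those approvals. The distributions and parameters below are invented for illustration, and this is not a reject-inference method.

```python
import random

def simulate_selective_labels(n=20000, threshold=0.15, seed=7):
    """Simulate a lender that observes outcomes only for approved applicants.

    Each applicant has a true default probability; the historical policy
    approves when a noisy estimate of that probability is below `threshold`.
    Returns (default rate in the full population, default rate among approved).
    """
    rng = random.Random(seed)
    all_defaults = approved_defaults = approved = 0
    for _ in range(n):
        true_pd = rng.uniform(0.0, 0.4)
        score = true_pd + rng.gauss(0.0, 0.05)   # the old policy's imperfect view
        defaulted = rng.random() < true_pd
        all_defaults += defaulted
        if score < threshold:                    # only these outcomes are ever recorded
            approved += 1
            approved_defaults += defaulted
    return all_defaults / n, approved_defaults / approved
```

The population default rate is never observed in practice. A model trained only on the approved column therefore learns from a systematically lower-risk sample, and reject inference tries to estimate exactly that missing gap.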

Open Challenges

This remains one of the foundational unresolved problems in credit modelling. Any institution serious about fairness or benchmarking eventually confronts it. We treat it as core research, not a detail.

08 Human-AI Collaborative Underwriting

How should machine intelligence and human judgement be combined to improve consistency, transparency, and auditability?

Current Lines of Inquiry
  • When high-uncertainty cases should be routed to human review
  • Which explanation formats improve underwriter consistency
  • Whether segment-aware policies reduce unnecessary rejections without raising manual load
  • How human overrides are logged and fed back into governance
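A routing rule for high-uncertainty cases, together with an override log, might look like the following. The band around the threshold, the uncertainty cut-off, and the function names are all hypothetical policy parameters chosen for illustration, not recommended values.

```python
def route(predicted_pd, threshold, uncertainty, band=0.03, max_uncertainty=0.10):
    """Return 'approve', 'decline', or 'refer' (send to human review).

    Refer when the score is within `band` of the threshold, or when the
    model's own uncertainty estimate exceeds `max_uncertainty`.
    """
    if uncertainty > max_uncertainty or abs(predicted_pd - threshold) <= band:
        return "refer"
    return "approve" if predicted_pd < threshold else "decline"

def record_override(log, application_id, system_decision, human_decision, reason):
    """Append an override entry so governance can later measure override rates and reasons."""
    log.append({
        "application_id": application_id,
        "system": system_decision,
        "human": human_decision,
        "reason": reason,
        "overridden": system_decision != human_decision,
    })
    return log
```

Widening the band raises manual load, and narrowing it pushes more borderline cases into automatic decisions. Logged overrides make that tradeoff measurable.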
Why It Matters

Many lenders need better visibility, segmentation, and control over parts of the decision process — not wholesale replacement. A large design space sits between manual underwriting and full automation.

Open Challenges

The long-term position is not that automation eliminates human judgement, but that intelligence is structured to improve consistency and auditability where human review remains necessary.

09 Governance and Monitoring

How do you detect meaningful change (drift, miscalibration, subgroup instability) and connect it to action?

Current Lines of Inquiry
  • PSI (Population Stability Index) as one part of a broader stack: meaningful warning versus noisy artifact
  • CSI (Characteristic Stability Index) for feature- and segment-level characteristic change
  • Drift detection across covariate, label, concept, segment-mix, and pipeline shifts
  • Calibration monitoring, champion-challenger evaluation, and measurable retraining triggers
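PSI compares the binned distribution of a recent sample with a baseline. CSI applies the same calculation to a single input feature, or within a segment. A minimal sketch follows, assuming fixed bin edges chosen on the baseline. The small floor on empty bins is a common practical choice, not part of any standard definition.

```python
import bisect
import math

def bin_shares(values, edges, eps=1e-6):
    """Share of values in each bin defined by ascending edges. Values outside the
    range are clamped into the first or last bin; empty bins get share eps."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    return [max(c / len(values), eps) for c in counts]

def psi(baseline, recent, edges):
    """Population Stability Index: sum over bins of (recent - baseline) * ln(recent / baseline)."""
    b, r = bin_shares(baseline, edges), bin_shares(recent, edges)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))
```

An often-cited rule of thumb reads values below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant. These are heuristics, not statistical tests, and the result depends on binning and sample size. That dependence is the warning-versus-artifact question raised above.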
Why It Matters

A model can keep strong rank ordering while becoming poorly calibrated. Probabilities feed thresholds, pricing, and review, so monitoring and governance must be first-class, not retrofitted.
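A small worked example makes this concrete. Halving every predicted PD stands in here for a drift that shifts the probability level. It leaves the ranking, and therefore AUC, untouched, yet the model now underestimates risk by 25 percentage points on average. The data are toy values.

```python
def auc(scores, labels):
    """Probability that a random defaulter (label 1) scores above a random repayer (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(scores, labels):
    """Mean predicted PD minus observed default rate (0 means calibrated on average)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)

labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]      # calibrated on average: mean 0.5, default rate 0.5
drifted = [s / 2 for s in scores]            # same ordering, probabilities halved
```

Any threshold or price set on the original probability scale would now approve riskier borrowers than intended, even though a rank-based monitor reports no change.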

Open Challenges

Tooling is necessary but insufficient. No monitoring system helps unless it is connected to governance routines, escalation paths, and retraining decisions.

Methodological Contributions

Proposed internal metrics

Several strands of the original research led to internal metrics intended to measure aspects of the decision system that standard classification metrics miss.

DTPG: Dynamic Threshold Profit Gain
What It Measures

The incremental value of segment-aware threshold policies relative to a single static threshold baseline.

Why It Was Developed

If borrower populations are heterogeneous, a uniform threshold may leave measurable value on the table. DTPG reframes thresholding as a measurable policy problem rather than a heuristic tuning exercise.

Status — Proposed, under active evaluation
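The text above defines DTPG conceptually rather than by formula. Below is one plausible formalization, assuming a simple per-loan profit model and should not be read as the program's exact definition. Here S_k is segment k of K, \hat{p}_i is loan i's predicted PD, y_i is 1 if the loan defaulted, g is the gain on an approved repaid loan, \ell is the loss on an approved default, and T is the candidate threshold grid.

```latex
\Pi(\tau_1,\dots,\tau_K) = \sum_{k=1}^{K} \sum_{i \in S_k}
  \mathbf{1}\!\left[\hat{p}_i < \tau_k\right]\bigl(g\,(1 - y_i) - \ell\, y_i\bigr)

\mathrm{DTPG} = \max_{\tau_1,\dots,\tau_K \in T} \Pi(\tau_1,\dots,\tau_K)
  \;-\; \max_{\tau \in T} \Pi(\tau,\dots,\tau)
```

Because a single shared threshold is a special case of the segment-aware policy, DTPG is never negative when thresholds are chosen and scored on the same data. A meaningful estimate therefore fits thresholds on one sample and measures profit on another. Dividing by the static profit gives a relative version.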
RVDR: Recourse Validity Degradation Rate
What It Measures

How often previously valid recourse recommendations become invalid after a model update or retraining.

Why It Was Developed

Most recourse is generated once and never monitored through lifecycle changes. RVDR converts recourse from a static explanation output into a monitored governance object.

Status — Proposed, under active evaluation
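Read literally, RVDR is a ratio. It counts the recourse targets that were approvable under the old model and asks what share of them the updated model would now decline. The sketch below is one simple operationalization under that reading. It treats each recommendation as the feature state the borrower was told to reach, and its function name and arguments are hypothetical.

```python
def rvdr(recourse_points, old_pd, new_pd, threshold):
    """Share of previously valid recourse targets that the updated model no longer approves.

    recourse_points: feature dicts describing the recommended target states.
    old_pd, new_pd: callables mapping a feature dict to a predicted PD.
    Approval means PD < threshold.
    """
    valid_before = [x for x in recourse_points if old_pd(x) < threshold]
    if not valid_before:
        return 0.0
    broken = sum(new_pd(x) >= threshold for x in valid_before)
    return broken / len(valid_before)

# Toy example: retraining makes utilization weigh more heavily.
old_model = lambda x: 0.5 * x["utilization"]
new_model = lambda x: 0.8 * x["utilization"]
targets = [{"utilization": u} for u in (0.2, 0.4, 0.5, 0.7)]
```

Tracking this rate per model release, and per segment, is what turns recourse into a monitored governance object rather than a one-off output.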
CFDI: Cluster-Fairness Disparity Index
What It Measures

Fairness instability across borrower clusters or subgroups, even when aggregate fairness metrics appear acceptable.

Why It Was Developed

Population-level fairness can mask severe local disparities. CFDI surfaces a common failure mode where product mix, ticket size, region, and borrower type create sharp within-portfolio heterogeneity.

Status — Proposed, under active evaluation
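The masking pattern that CFDI targets can be reproduced with a toy portfolio. In it, the aggregate approval-rate gap between two groups is zero while each cluster carries a 50-point gap in opposite directions. The quantity computed below, the largest absolute within-cluster gap, is an illustrative stand-in in the spirit of CFDI, not its actual definition. The group labels "A" and "B" and the function names are hypothetical.

```python
import math
from collections import defaultdict

def approval_gap(rows):
    """Approval rate of group 'A' minus group 'B' for rows of (cluster, group, approved)."""
    rate = {}
    for g in ("A", "B"):
        sel = [approved for _, grp, approved in rows if grp == g]
        rate[g] = sum(sel) / len(sel) if sel else float("nan")
    return rate["A"] - rate["B"]

def cluster_disparity_profile(rows):
    """Return (aggregate gap, per-cluster gaps, largest absolute within-cluster gap)."""
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row[0]].append(row)
    per_cluster = {c: approval_gap(r) for c, r in by_cluster.items()}
    worst = max((abs(v) for v in per_cluster.values() if not math.isnan(v)),
                default=float("nan"))
    return approval_gap(rows), per_cluster, worst

portfolio = [("c1", "A", 1), ("c1", "A", 1), ("c1", "B", 1), ("c1", "B", 0),
             ("c2", "A", 1), ("c2", "A", 0), ("c2", "B", 1), ("c2", "B", 1)]
```

Here the aggregate gap is 0.0 while the worst cluster gap is 0.5. A cluster-level index of this kind is meant to surface exactly that pattern.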

These metrics remain under active evaluation and should be interpreted as research instruments rather than universal standards or validated industry benchmarks.

Empirical Foundation

Evidence and Validation

The foundation is not purely conceptual. It has been informed by working empirical pipelines across academic and industry-style datasets.

Public Benchmark Lineage

Earlier work on the Taiwan Credit dataset established the end-to-end research pipeline: feature engineering, clustering, threshold optimization, recourse generation, fairness auditing, drift monitoring, and dashboarding. That stage demonstrated the feasibility of the broader architecture and produced early evidence for threshold and recourse research.

Larger Portfolio-Style Evaluation

Subsequent development extended the work into realistic portfolio analyses, including CRIF-style structured credit data. Those studies mattered not because they solved every modelling issue, but because they exposed the operational depth real institutions require:

  • Portfolio segmentation
  • Cohort analysis
  • Product-level stress behavior
  • Dynamic policy simulation
  • Governance reporting
  • Explainability & fairness integration

The evidence so far is encouraging, but it should be interpreted carefully. Strong results in one dataset or portfolio context are not universal claims. The program is oriented around replication, shadow testing, and institution-specific validation.

Limitations

What Remains Unresolved

A serious research program must state clearly what it has not solved. These are the constraints we work inside, not around.

01

Fairness is not solved

No fairness metric or remediation procedure fully resolves the normative and operational complexity of lending. Different criteria conflict, protected attributes are partial, proxies are embedded, and labels are selective.

02

Explainability can become performative

It is possible to generate explanations that are technically correct but operationally unhelpful, or simplified enough to mislead. The challenge is improving usefulness without sacrificing faithfulness.

03

Recourse quality is hard to guarantee

Counterfactual recourse is sensitive to feature design, causal assumptions, policy rules, and model updates. Stable, feasible, borrower-meaningful recourse in real environments remains open.

04

Alternative data stays governance-heavy

Even when alternative signals improve prediction, they can increase interpretability burden, proxy-bias risk, and regulatory complexity. Signal gain alone is not enough.

05

Monitoring requires institutional discipline

No monitoring system helps unless it is connected to governance routines, escalation paths, and retraining decisions. Tooling is necessary but insufficient on its own.

Future Research Roadmap

Where the Research Is Going

These are future investigations that extend beyond the current scoring, fairness, and monitoring layers, framed as open directions rather than shipped features.

Direction 01

Graph-Based Credit Intelligence

Borrowers, merchants, co-applicants, employers, devices, and transaction networks form relational structures. Graph approaches may surface shared-risk and ecosystem-level patterns that flat tabular models miss.

Direction 02

Causal Inference for Lending

Most credit systems remain associational. Causal inference offers a more principled route to policy analysis, recourse realism, and selective-label reasoning — treatment effects of approval policy, collections, and routing.

Direction 03

Dynamic Borrower Representations

Borrowers are not static vectors; their behavior evolves. Future work includes richer temporal and longitudinal representations that model trajectory rather than snapshot features alone.

Direction 04

Foundation Models for Financial Decisioning

Large representation models for tabular, transactional, textual, and multi-modal data may reshape credit modelling. Calibration, privacy, controllability, auditability, and transparency remain open questions.

Direction 05

Multi-Modal Risk Assessment

Credit decisions increasingly draw from structured fields, transaction summaries, documents, GST records, bank statements, and underwriting notes. Multi-modal architectures may improve signal but raise governance complexity.

Direction 06

Agentic Underwriting Systems

This direction asks whether decision-support agents can coordinate document review, policy retrieval, explanation, and adverse-action drafting. It is approached cautiously: agentic systems in lending must be bounded, auditable, and constrained by clear approval authority.

Direction 07

Responsible AI in Lending

The umbrella theme. The objective is not abstract principles but operationalizing responsible AI through measurable workflows — monitoring, review, explanation, recourse, validation, and documented governance.

Collaboration

Institutional Research Collaborations

The ideal collaborations are not procurement exercises but research-informed institutional pilots. NomoCrit is open to working with:

  • Banks
  • NBFCs
  • Fintech lenders
  • Microfinance institutions
  • Digital lending platforms
  • Alternative credit providers

Possible Collaboration Formats

Historical portfolio analysis

Run NomoCrit on historical loan data to study segmentation, threshold policy, fairness behavior, calibration, and portfolio health under alternative decision strategies.

Fairness audits

Evaluate aggregate and subgroup disparities across products, geographies, and borrower segments — and identify where population-level acceptability masks localized failures.

Explainability audits

Assess whether existing model explanations are faithful, stable, useful to credit teams, and suitable for adverse-action workflows.

Shadow deployment pilots

Run NomoCrit alongside an existing process without immediate production replacement. Compare recommendations, threshold effects, explanation quality, and subgroup behavior under controlled observation.

Model benchmarking

Benchmark scorecards, tree ensembles, segment-aware systems, and hybrid workflows under shared governance metrics rather than predictive lift alone.

Reject inference studies

Study the extent of sample selection bias and evaluate institution-specific strategies for addressing missing rejected-applicant outcomes.

Portfolio health assessment

Analyze vintage behavior, stress concentration, drift, calibration decay, and emerging risk pockets across product segments.

Governance implementation

Design and test practical workflows for fairness review, drift escalation, explanation logging, model version control, and retraining triggers.

Closing Position

The platform is real. The deeper work remains ongoing.

Credit decisioning cannot be handled by static scorecards, isolated dashboards, or one-time fairness checks. It requires systems that support continuous experimentation, measurement, audit, and revision. The most valuable collaborations are those willing to test assumptions, expose limitations, and improve systems through evidence rather than narrative. That is the purpose of this program.