Research

Research for Accountable Credit Decisioning

NomoCrit began as a research effort in credit decisioning, model governance, fairness-aware lending systems, and explainable AI. Today the platform is operational, but the research program remains intentionally unfinished. Credit decisioning sits at the intersection of prediction, policy, access, regulation, and institutional accountability. The goal is not to claim these problems are solved. It is to build infrastructure through which they can be measured, studied, audited, and improved.

Overview

Why This Research Exists

Most production credit systems still inherit assumptions from scorecard-era architectures. Even when machine learning is introduced, the surrounding logic often stays unchanged — a bureau score, a fixed threshold, a few rules, and a static workflow. That structure is familiar and easy to operate, but it is technically limited as lenders serve heterogeneous, thin-file, and digitally originated borrowers.

The lineage of this work traces back to research under Prof. Vijay Keswani at IIT Delhi, focused on dynamic thresholding, counterfactual recourse, and cluster-level fairness analysis. That bounded investigation expanded into a platform for banks, NBFCs, fintech lenders, MFIs, and alternative credit providers operating in real decision environments. The questions below are why the program continues.

Bureau-centric representation

A borrower is often reduced to a credit score, a few delinquency flags, and utilization. That compression is insufficient for thin-file, informally employed, or newly originated borrowers.

Historical labels are not neutral

Default data is the product of prior approval policy and unequal access. A model can learn that underserved groups are riskier when the labels reflect exclusion, not capacity to repay.

Thin-file borrowers underserved

Applicants with sparse history are routed to blunt reject rules — not because their risk is understood, but because the system lacks calibrated uncertainty and alternative-data tooling.

A score is not an explanation

A probability of default does not say why an application was declined, which variables mattered, or whether the decision was sensitive to threshold choice.

Recourse is underdeveloped

Adverse-action guidance is often generic. A path that is mathematically valid but operationally impossible — or that breaks after retraining — is not meaningful recourse.

Static thresholds underperform

A single global cut-off assumes one decision boundary fits heterogeneous populations. The same threshold can be too conservative in one cluster and too permissive in another.

Population shift is the norm

Product mix, geographies, and macro conditions change. A model can become miscalibrated or uneven across subgroups without any dramatic collapse in top-line AUC.

Fairness begins upstream

The problem starts in data generation, approval policy, reject handling, and feature construction — not in a post-hoc compliance check applied at the end.

Active Research Areas

Nine open lines of work

Each area is framed as a question we are still answering, with the current lines of inquiry, why it matters operationally, and what remains unresolved.

01 Fairness-Aware Credit Modelling

Can a lending system be fair across the slices where decisions actually land (products, ticket sizes, geographies, and proxy-sensitive groups), not only in aggregate?

Current Lines of Inquiry

Statistical parity and approval-rate disparity across product segments
Equal opportunity and equalized odds treated as diagnostic signals, not optimization targets
Groupwise calibration: does a 0.30 predicted default probability correspond to the same realized default rate in every group?
Proxy bias detection — separating lawful signal from harmful proxy correlation
Lending-specific tradeoffs studied as a constrained decision problem
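
The first and third diagnostics in the list above can be sketched in a few lines. This is a minimal illustration, not NomoCrit's implementation: the record layout `(group, predicted_pd, approved, defaulted)`, the function names, and the toy data are all assumptions made for the example. Realized outcomes are only read for approved loans, which mirrors the selective-label constraint discussed elsewhere on this page.

```python
from collections import defaultdict

def groupwise_diagnostics(records):
    """records: iterable of (group, predicted_pd, approved, defaulted).

    Hypothetical layout. `defaulted` is only read for approved loans,
    because rejected applicants have no observed outcome.
    """
    buckets = defaultdict(list)
    for group, pd_hat, approved, defaulted in records:
        buckets[group].append((pd_hat, approved, defaulted))

    report = {}
    for group, rows in buckets.items():
        booked = [(p, d) for p, a, d in rows if a]
        entry = {"approval_rate": len(booked) / len(rows)}
        if booked:
            mean_pd = sum(p for p, _ in booked) / len(booked)
            default_rate = sum(1 for _, d in booked if d) / len(booked)
            # Positive gap: the model under-predicts default for this group.
            entry["calibration_gap"] = default_rate - mean_pd
        else:
            entry["calibration_gap"] = None
        report[group] = entry
    return report

def approval_rate_disparity(report):
    """Spread between the highest and lowest group approval rates."""
    rates = [r["approval_rate"] for r in report.values()]
    return max(rates) - min(rates)

# Toy data, invented for illustration only.
records = [
    ("A", 0.10, True, False), ("A", 0.20, True, False),
    ("A", 0.30, True, True),  ("A", 0.60, False, False),
    ("B", 0.20, True, False), ("B", 0.40, True, True),
    ("B", 0.55, False, False), ("B", 0.70, False, False),
]
report = groupwise_diagnostics(records)
disparity = approval_rate_disparity(report)
```

On real portfolios the calibration gap would normally be computed per score band rather than per group overall, and both numbers would be read as diagnostic signals rather than optimization targets.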

Why It Matters

Aggregate acceptability can mask sharp local disparity. Fairness interacts with approval rates, default rates, provisioning, collections, and capital allocation — a change to one can shift burden elsewhere.

Open Challenges

Labels reflect selective approval history. Protected attributes are often partial or unavailable. No single metric resolves whether a pipeline is fair, useful, and viable at once.

02 Bias Detection and Mitigation

Where does bias enter a lending pipeline, and which disparities are correctable model artifacts versus structural risk differences?

Current Lines of Inquiry

Segment-level error analysis
Policy-threshold sensitivity analysis
Groupwise calibration studies
Feature-contribution auditing and proxy correlation review
Historical portfolio disparity mapping
Remediation testing under controlled shadow deployment

Why It Matters

In lending, bias is as much a systems problem as a measurement task. It can enter through missing data, target leakage, legacy policy rules, selective labels, proxy-rich features, and threshold choice.

Open Challenges

Not every disparity can or should be corrected automatically. Some reveal structural conditions or product-design problems rather than model discrimination. Mitigation is treated with caution.

03 Explainability Research

What makes an explanation faithful, stable, and useful to underwriters, compliance officers, and borrowers, rather than merely a list of top features?

Current Lines of Inquiry

SHAP-based local attribution and its stability across adjacent model versions
Layered global importance — portfolio, product, cluster, time-window, and drift-conditioned
Decision traceability: model version, threshold version, segment assignment, explanation, recourse state, policy flags
Adverse-action reasoning as a translation problem between model behavior and institutional communication
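
One way to picture the decision-traceability item above is as an immutable record written at decision time. The sketch below assumes a flat schema; the class name `DecisionTrace`, every field name, and the sample values are hypothetical and chosen only to mirror the elements listed (model version, threshold version, segment assignment, explanation, recourse state, policy flags).

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass(frozen=True)  # frozen: a trace should not be edited after the fact
class DecisionTrace:
    application_id: str
    model_version: str
    threshold_version: str
    segment: str
    predicted_pd: float
    decision: str
    # (feature name, signed contribution) pairs, e.g. from a local attribution method
    explanation: Tuple[Tuple[str, float], ...] = ()
    recourse_state: Optional[str] = None
    policy_flags: Tuple[str, ...] = ()

# Illustrative values only.
trace = DecisionTrace(
    application_id="app-0001",
    model_version="pd-model-1.4.2",
    threshold_version="thr-2024-03",
    segment="thin_file",
    predicted_pd=0.34,
    decision="decline",
    explanation=(("utilization", 0.08), ("months_on_book", -0.03)),
    recourse_state="issued",
    policy_flags=("manual_review_eligible",),
)
trace_row = asdict(trace)  # plain dict, ready for an audit log or table
```

Keeping the model and threshold versions on the same row as the explanation is what lets a later reviewer reconstruct why a decision was made under a specific model state.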

Why It Matters

A score by itself cannot be audited, defended, or acted on. Decision-level traceability is what lets institutions answer why a decision was made under a specific model state.

Open Challenges

Explanations can be technically correct yet operationally unhelpful, or simplified enough to mislead. The goal is usefulness without sacrificing faithfulness — not explanation volume.

04 Counterfactual Recourse

Is the path to approval actionable, plausible, and fair, and is it still valid after the model changes?

Current Lines of Inquiry

Feasibility and immutability constraints on recommended changes
Causal consistency and product-policy compatibility
Temporal validity across retraining
Whether counterfactuals support human review more effectively than raw feature attributions

Why It Matters

A system that can only say no, without a structured path forward, is incomplete. Recourse touches borrower communication, internal appeals, accountability, and policy stress testing.

Open Challenges

Recourse optimized for proximity to the boundary can be unrealistic. Recommendations also break after retraining — a blind spot that motivated formalizing recourse validity degradation as a measurable quantity.

05 Dynamic Threshold Optimization

When should decision boundaries vary across segments, on what basis, and under what governance controls?

Current Lines of Inquiry

How much value is lost by static global thresholds
Which segmentation strategy best supports threshold adaptation
When dynamic thresholding improves profitability but worsens subgroup disparity
How threshold updates should be governed under drift
What calibration guarantees are required before optimization becomes trustworthy

Why It Matters

The conversion from score to decision is mediated by thresholds, policy, capital, and economics. Threshold policy affects approval rates, default rates, portfolio yield, fairness, and manual-review routing.

Open Challenges

Dynamic thresholding is where model science meets business policy. Gains in one objective can degrade another, so it is treated as a research layer of the system, not a deployment constant.

06 Alternative Data Evaluation

Does an alternative signal add stable, lawful, interpretable, and governance-compatible value beyond bureau and internal data?

Current Lines of Inquiry

Incremental predictive value relative to bureau and internal data
Stability of alternative features under drift
Proxy-bias risk and calibration contribution
Product-specific utility against documentation and explainability burden

Why It Matters

Domains range from cash-flow and bank-transaction summaries to GST or invoice patterns, device signals, merchant flows, platform income, and utility regularity. Each is promising and easily overclaimed.

Open Challenges

More data is not automatically better. Signal gain alone is not enough — the operating principle is additive evidence under governance constraints.

07 Reject Inference and Selective Labels

How severe is sample-selection bias, and how much observed subgroup disparity is an artifact of selective labels?

Current Lines of Inquiry

Severity of sample selection bias in historical training data
Which reject-inference assumptions are defensible in specific product settings
Whether shadow deployment or policy experiments can improve label coverage
Disentangling genuine disparity from selective labeling

Why It Matters

Institutions observe realized outcomes only for approved borrowers. Rejected applicants generate no labels — a selective-label problem at the heart of model development.

Open Challenges

This remains one of the foundational unresolved problems in credit modelling. Any institution serious about fairness or benchmarking eventually confronts it. We treat it as core research, not a detail.

08 Human-AI Collaborative Underwriting

How should machine intelligence and human judgement be combined to improve consistency, transparency, and auditability?

Current Lines of Inquiry

When high-uncertainty cases should be routed to human review
Which explanation formats improve underwriter consistency
Whether segment-aware policies reduce unnecessary rejections without raising manual load
How human overrides are logged and fed back into governance
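
As a concrete starting point for the routing question above, one simple policy sends cases whose score falls close to the decision threshold to a human reviewer. This is a sketch under stated assumptions: distance from the threshold stands in for model uncertainty, and the function name `route`, the 0.30 threshold, and the 0.05 review band are invented for illustration.

```python
def route(pd_hat, threshold=0.30, review_band=0.05):
    """Three-way decision: approve, decline, or manual review.

    Hypothetical policy: scores within `review_band` of the threshold
    are treated as too close to call and routed to an underwriter.
    """
    if abs(pd_hat - threshold) <= review_band:
        return "manual_review"
    return "approve" if pd_hat < threshold else "decline"

decisions = [route(p) for p in (0.10, 0.32, 0.60)]
```

Widening the band lowers automated error near the boundary but raises manual load, which is the tradeoff the third item in the list asks about. A calibrated uncertainty estimate would be a better routing signal than raw distance from the threshold.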

Why It Matters

Many lenders need better visibility, segmentation, and control over parts of the decision process — not wholesale replacement. A large design space sits between manual underwriting and full automation.

Open Challenges

The long-term position is not that automation eliminates human judgement, but that intelligence is structured to improve consistency and auditability where human review remains necessary.

09 Governance and Monitoring

How do you detect meaningful change (drift, miscalibration, subgroup instability) and connect it to action?

Current Lines of Inquiry

Population Stability Index (PSI) as one part of a broader stack: meaningful warning versus noisy artifact
Characteristic Stability Index (CSI) for feature- and segment-level characteristic change
Drift detection across covariate, label, concept, segment-mix, and pipeline shifts
Calibration monitoring, champion-challenger evaluation, and measurable retraining triggers
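
For readers unfamiliar with PSI, the standard calculation compares the share of a population in each score bin between a baseline window and a current window. The sketch below uses the usual formula, the sum over bins of (actual share minus expected share) times the log of their ratio. The function names, the fixed bin edges, and the `eps` floor for empty bins are choices made for this example.

```python
import bisect
import math

def bin_shares(values, edges, eps=1e-6):
    """Share of `values` in each bin defined by sorted `edges`.

    Values outside [edges[0], edges[-1]] are ignored. Empty bins are
    floored at `eps` so the logarithm below stays finite.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [max(c / total, eps) for c in counts]

def psi(expected, actual, edges):
    """Population Stability Index between a baseline and a current sample."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Toy score samples, invented for illustration.
baseline = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
current = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
edges = [0.0, 0.5, 1.0]

psi_same = psi(baseline, baseline, edges)
psi_shift = psi(baseline, current, edges)
```

Applying the same function to a single input feature instead of the model score gives a characteristic-level reading of the kind CSI reports. The result is sensitive to the binning scheme and to sample size, which is one reason a PSI value on its own can be a noisy artifact.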

Why It Matters

A model can keep strong rank ordering while becoming poorly calibrated. Probabilities feed thresholds, pricing, and review, so monitoring and governance must be first-class, not retrofitted.
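
The first sentence above can be checked with a small example: any strictly increasing distortion of the scores leaves rank ordering, and therefore AUC, unchanged, while the average predicted probability can move far from the realized default rate. The helper names and the toy data below are assumptions made for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (defaulter, non-defaulter) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_gap(scores, labels):
    """Mean predicted probability minus realized default rate."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)

labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
# Hypothetical drifted model: same ordering, systematically lower probabilities.
drifted = [s ** 2 for s in scores]

auc_before, auc_after = auc(scores, labels), auc(drifted, labels)
gap_before = calibration_gap(scores, labels)
gap_after = calibration_gap(drifted, labels)
```

Here the AUC is identical before and after, while the calibration gap roughly triples in magnitude. A monitor that watches only rank-ordering metrics would miss this change, even though it alters every decision that depends on a probability threshold.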

Open Challenges

Tooling is necessary but insufficient. No monitoring system helps unless it is connected to governance routines, escalation paths, and retraining decisions.

Methodological Contributions

Proposed internal metrics

Several strands of the original research led to internal metrics intended to measure aspects of the decision system that standard classification metrics miss.

DTPG: Dynamic Threshold Profit Gain

What It Measures

The incremental value of segment-aware threshold policies relative to a single static threshold baseline.

Why It Was Developed

If borrower populations are heterogeneous, a uniform threshold may leave measurable value on the table. DTPG reframes thresholding as a measurable policy problem rather than a heuristic tuning exercise.

Status — Proposed, under active evaluation
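
This page does not give a formula for DTPG, so the following is only one possible reading: the profit of the best per-segment thresholds minus the profit of the best single threshold, under a simple fixed gain and loss per loan. The function names, the `gain` and `loss` parameters, the threshold grid, and the toy portfolio are all assumptions made for illustration.

```python
def policy_profit(loans, threshold, gain=1.0, loss=5.0):
    """Profit from approving every loan with predicted PD below `threshold`.

    loans: list of (predicted_pd, defaulted). `gain` and `loss` are
    hypothetical per-loan economics, not calibrated values.
    """
    return sum((-loss if defaulted else gain)
               for pd_hat, defaulted in loans if pd_hat < threshold)

def best_threshold(loans, grid, **econ):
    """Threshold on `grid` with the highest profit (first one wins ties)."""
    return max(grid, key=lambda t: policy_profit(loans, t, **econ))

def dtpg(segments, grid, **econ):
    """Segment-aware profit minus single-threshold profit, on the same data."""
    pooled = [loan for loans in segments.values() for loan in loans]
    static_profit = policy_profit(pooled, best_threshold(pooled, grid, **econ), **econ)
    dynamic_profit = sum(
        policy_profit(loans, best_threshold(loans, grid, **econ), **econ)
        for loans in segments.values()
    )
    return dynamic_profit - static_profit

# Toy portfolio, invented for illustration.
segments = {
    "salaried": [(0.05, False), (0.10, False), (0.30, True)],
    "thin_file": [(0.25, False), (0.35, False), (0.40, False), (0.60, True)],
}
grid = [0.2, 0.3, 0.5, 0.7]
gain_value = dtpg(segments, grid)
```

Because the thresholds here are tuned and scored on the same loans, this in-sample version can never be negative. A meaningful estimate would tune thresholds on one window and measure profit on a later one, and would report subgroup disparity alongside the gain.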

RVDR: Recourse Validity Degradation Rate

What It Measures

How often previously valid recourse recommendations become invalid after a model update or retraining.

Why It Was Developed

Most recourse is generated once and never monitored through lifecycle changes. RVDR converts recourse from a static explanation output into a monitored governance object.

Status — Proposed, under active evaluation
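
Taking the definition above at face value, RVDR can be sketched as a ratio: of the recourse targets that the old model would have approved, what fraction does the new model no longer approve? The function signature, the callable-model interface, and the toy models below are hypothetical. The source states only what the metric measures, not how it is computed.

```python
def rvdr(recourse_points, old_model, new_model, old_threshold, new_threshold):
    """Fraction of previously valid recourse targets that the new model rejects.

    recourse_points: feature dicts describing the state a borrower was told
    to reach. Models are callables returning a predicted PD; approval means
    PD below the threshold. All of this is an illustrative interface.
    """
    valid_before = [x for x in recourse_points if old_model(x) < old_threshold]
    if not valid_before:
        return 0.0
    broken = sum(1 for x in valid_before if new_model(x) >= new_threshold)
    return broken / len(valid_before)

# Toy models: the retrained model penalizes utilization more heavily.
old_model = lambda x: 0.5 * x["utilization"]
new_model = lambda x: 0.8 * x["utilization"]

issued = [{"utilization": u} for u in (0.2, 0.5, 0.55, 0.7)]
rate = rvdr(issued, old_model, new_model, old_threshold=0.3, new_threshold=0.3)
```

In this toy case three of the four targets were valid under the old model and two of those fail under the new one, giving a rate of two thirds. Tracking the ratio at every retraining is what turns recourse into a monitored object rather than a one-off output.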

CFDI: Cluster-Fairness Disparity Index

What It Measures

Fairness instability across borrower clusters or subgroups, even when aggregate fairness metrics appear acceptable.

Why It Was Developed

Population-level fairness can mask severe local disparities. CFDI surfaces a common failure mode where product mix, ticket size, region, and borrower type create sharp within-portfolio heterogeneity.

Status — Proposed, under active evaluation
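
The failure mode CFDI targets can be reproduced with a handful of rows. The sketch below is one possible formulation, not the definition used in the research: it takes the approval-rate gap between groups as the fairness measure and reports the worst within-cluster gap minus the portfolio-level gap. The function names, the record layout, and the data are invented for illustration.

```python
from collections import defaultdict

def approval_gap(rows):
    """Largest difference in approval rate between any two groups.

    rows: iterable of (group, approved).
    """
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in rows:
        totals[group] += 1
        approved[group] += int(ok)
    rates = [approved[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

def cfdi(records):
    """Worst within-cluster gap minus the aggregate gap (hypothetical form).

    records: iterable of (cluster, group, approved).
    Returns (index, per_cluster_gaps, aggregate_gap).
    """
    by_cluster = defaultdict(list)
    for cluster, group, ok in records:
        by_cluster[cluster].append((group, ok))
    aggregate = approval_gap([(g, ok) for _, g, ok in records])
    per_cluster = {c: approval_gap(rows) for c, rows in by_cluster.items()}
    return max(per_cluster.values()) - aggregate, per_cluster, aggregate

# Two clusters whose disparities point in opposite directions.
records = (
    [("c1", "A", True)] * 4 + [("c1", "B", True)] * 2 + [("c1", "B", False)] * 2 +
    [("c2", "A", True)] * 2 + [("c2", "A", False)] * 2 + [("c2", "B", True)] * 4
)
index, per_cluster, aggregate = cfdi(records)
```

Both groups are approved at 75% across the whole portfolio, so the aggregate gap is zero, yet each cluster shows a 50-point gap. That cancellation is exactly the case where an aggregate fairness check passes while local disparity is severe.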

These metrics remain under active evaluation and should be interpreted as research instruments rather than universal standards or validated industry benchmarks.

Empirical Foundation

Evidence and Validation

The foundation is not purely conceptual. It has been informed by working empirical pipelines across academic and industry-style datasets.

Public Benchmark Lineage

Earlier work on the Taiwan Credit dataset established the end-to-end research pipeline: feature engineering, clustering, threshold optimization, recourse generation, fairness auditing, drift monitoring, and dashboarding. That stage demonstrated the feasibility of the broader architecture and produced early evidence for threshold and recourse research.

Larger Portfolio-Style Evaluation

Subsequent development extended the work into realistic portfolio analyses, including CRIF-style structured credit data. Those studies mattered not because they solved every modelling issue, but because they exposed the operational depth real institutions require:

Portfolio segmentation
Cohort analysis
Product-level stress behavior
Dynamic policy simulation
Governance reporting
Explainability & fairness integration

The evidence so far is encouraging, but it should be interpreted carefully. Strong results in one dataset or portfolio context are not universal claims. The program is oriented around replication, shadow testing, and institution-specific validation.

Limitations

What Remains Unresolved

A serious research program must state clearly what it has not solved. These are the constraints we work inside, not around.

Fairness is not solved

No fairness metric or remediation procedure fully resolves the normative and operational complexity of lending. Different criteria conflict, protected attributes are partial, proxies are embedded, and labels are selective.

Explainability can become performative

It is possible to generate explanations that are technically correct but operationally unhelpful, or simplified enough to mislead. The challenge is improving usefulness without sacrificing faithfulness.

Recourse quality is hard to guarantee

Counterfactual recourse is sensitive to feature design, causal assumptions, policy rules, and model updates. Stable, feasible, borrower-meaningful recourse in real environments remains open.

Alternative data stays governance-heavy

Even when alternative signals improve prediction, they can increase interpretability burden, proxy-bias risk, and regulatory complexity. Signal gain alone is not enough.

Monitoring requires institutional discipline

No monitoring system helps unless it is connected to governance routines, escalation paths, and retraining decisions. Tooling is necessary but insufficient on its own.

Future Research Roadmap

Where the Research Is Going

These are future investigations that extend beyond the current scoring, fairness, and monitoring layers. They are framed as open directions, not shipped features.

Direction 01

Graph-Based Credit Intelligence

Borrowers, merchants, co-applicants, employers, devices, and transaction networks form relational structures. Graph approaches may surface shared-risk and ecosystem-level patterns that flat tabular models miss.

Direction 02

Causal Inference for Lending

Most credit systems remain associational. Causal inference offers a more principled route to policy analysis, recourse realism, and selective-label reasoning — treatment effects of approval policy, collections, and routing.

Direction 03

Dynamic Borrower Representations

Borrowers are not static vectors; their behavior evolves. Future work includes richer temporal and longitudinal representations that model trajectory rather than snapshot features alone.

Direction 04

Foundation Models for Financial Decisioning

Large representation models for tabular, transactional, textual, and multi-modal data may reshape credit modelling. Calibration, privacy, controllability, auditability, and transparency remain open questions.

Direction 05

Multi-Modal Risk Assessment

Credit decisions increasingly draw from structured fields, transaction summaries, documents, GST records, bank statements, and underwriting notes. Multi-modal architectures may improve signal but raise governance complexity.

Direction 06

Agentic Underwriting Systems

The open question is whether decision-support agents can coordinate document review, policy retrieval, explanation, and adverse-action drafting. This is approached cautiously: agentic systems in lending must be bounded, auditable, and constrained by clear approval authority.

Direction 07

Responsible AI in Lending

The umbrella theme. The objective is not abstract principles but operationalizing responsible AI through measurable workflows — monitoring, review, explanation, recourse, validation, and documented governance.

Collaboration

Institutional Research Collaborations

The ideal collaborations are not procurement exercises. They are research-informed institutional pilots. NomoCrit is open to work with:

Banks
NBFCs
Fintech lenders
Microfinance institutions
Digital lending platforms
Alternative credit providers

Possible Collaboration Formats

Historical portfolio analysis

Run NomoCrit on historical loan data to study segmentation, threshold policy, fairness behavior, calibration, and portfolio health under alternative decision strategies.

Fairness audits

Evaluate aggregate and subgroup disparities across products, geographies, and borrower segments — and identify where population-level acceptability masks localized failures.

Explainability audits

Assess whether existing model explanations are faithful, stable, useful to credit teams, and suitable for adverse-action workflows.

Shadow deployment pilots

Run NomoCrit alongside an existing process without immediate production replacement. Compare recommendations, threshold effects, explanation quality, and subgroup behavior under controlled observation.

Model benchmarking

Benchmark scorecards, tree ensembles, segment-aware systems, and hybrid workflows under shared governance metrics rather than predictive lift alone.

Reject inference studies

Study the extent of sample selection bias and evaluate institution-specific strategies for addressing missing rejected-applicant outcomes.

Portfolio health assessment

Analyze vintage behavior, stress concentration, drift, calibration decay, and emerging risk pockets across product segments.

Governance implementation

Design and test practical workflows for fairness review, drift escalation, explanation logging, model version control, and retraining triggers.

Closing Position

The platform is real. The deeper work remains ongoing.

Credit decisioning cannot be handled by static scorecards, isolated dashboards, or one-time fairness checks. It requires systems that support continuous experimentation, measurement, audit, and revision. The most valuable collaborations are those willing to test assumptions, expose limitations, and improve systems through evidence rather than narrative. That is the purpose of this program.
