Using Machine Learning to Solve Data Imbalance in AML L1 Alerts

Data in the banking and financial services sectors has grown exponentially with the rise in money laundering and other financial crimes across the globe. Anti-money laundering (AML) data, in particular, has evolved dramatically and grown in volume due to the complexity of existing alerts as well as the generation of new types of alerts.

Understanding customer-level and transaction data is important in model development activities, which are vital to AML programs.

Various studies on financial crime compliance (FCC) have found a growing data imbalance problem between the minority class and the majority class (the minority class being true matches or true alerts, and the majority class being false matches or false positives).

Classical or traditional models favour the majority class and usually show inferior performance on the minority class. Presenting imbalanced data to a classifier produces undesirable results, such as much lower performance on testing data than on training data.
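The bias towards the majority class can be illustrated with a minimal sketch. The labels below are synthetic, and the 98/2 split between false positives and true matches is an assumption chosen for illustration, not a figure from this article. A hypothetical "model" that always predicts the majority class looks accurate overall while detecting no true matches.

```python
# Illustrative only: synthetic labels, not real AML alert data.
def evaluate(y_true, y_pred):
    """Return overall accuracy and recall on the minority class (label 1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall

# Assumed mix: 980 false positives (0) and 20 true matches (1).
y_true = [0] * 980 + [1] * 20
# A trivial baseline that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
# prints: accuracy=0.98, minority recall=0.00
```

Accuracy alone therefore says little about how well the minority class is handled; per-class measures such as recall give a clearer picture.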

However, a good AML model should perform equally well on both minority and majority classes.

Cost-sensitive learning methods address the imbalance by assigning higher costs to the misclassification of observations in the minority class. However, using a cost-sensitive learning method requires knowledge of the cost of misclassification, which is often unknown and therefore has to be assumed.
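One common way to apply assumed misclassification costs is to shift the decision threshold on a model's predicted probability. The sketch below is not taken from this article: the function names are hypothetical and the cost values are assumptions. It assumes correct decisions cost nothing, so an alert with probability p of being a true match is flagged when the expected cost of a miss, p × cost_fn, is at least the expected cost of a false alarm, (1 − p) × cost_fp.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimises expected cost.

    Flagging costs (1 - p) * cost_fp in expectation; not flagging costs
    p * cost_fn. Flagging is cheaper when p >= cost_fp / (cost_fp + cost_fn).
    Assumes correct decisions carry zero cost.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)

def decide(p_true_match, cost_fp, cost_fn):
    """Return 1 (escalate the alert) or 0 (close it) for a predicted probability."""
    return 1 if p_true_match >= cost_threshold(cost_fp, cost_fn) else 0

# Assumed costs: a missed true match is nine times as costly as a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=9.0)
print(f"threshold={threshold:.2f}")               # prints: threshold=0.10
print(decide(0.20, cost_fp=1.0, cost_fn=9.0))     # prints: 1
print(decide(0.05, cost_fp=1.0, cost_fn=9.0))     # prints: 0
```

Because the threshold depends entirely on the cost ratio, any error in the assumed costs carries straight through to the decisions, which is the practical weakness noted above.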

Machine learning algorithms and data mining solutions have provided an opportunity to understand the nature of imbalanced data. Machine learning techniques attempt to resolve class imbalance problems using sampling techniques, optimisation of model structure and learning algorithms. For imbalanced datasets, applying traditional methodologies such as K Nearest Neighbors and Naive Bayes results in inferior performance.
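As a minimal sketch of one conventional sampling technique, the snippet below performs random oversampling: minority-class rows are duplicated at random, with replacement, until both classes are the same size. The function name, the toy data and the fixed seed are assumptions for illustration, not part of the original article.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate random minority rows until the two classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    n_majority = sum(1 for y in labels if y != minority_label)
    n_extra = n_majority - len(minority)
    if not minority or n_extra <= 0:
        # Nothing to duplicate, or the data is already balanced.
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority_label] * n_extra

# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2

new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))  # prints: 8 8
```

Resampling of this kind is normally applied only to the training split, so that the test data keeps the original class distribution.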

In this paper, we focus on the current challenges faced in using traditional methods for classification with imbalanced datasets, which rely on conventional sampling techniques to balance datasets. Additionally, we discuss alternative techniques for rebalancing the data and a few machine learning classification algorithms that adapt themselves to detecting the minority class.