How Big Data and Machine Learning Shield BFSI from Financial Fraud

Mar 05, 2019 by Parul Singh

If you use online banking, there is a good chance you have received calls asking for your card details or login credentials. Financial fraud is nothing new to the industry, but data has become more accessible and data breaches more frequent. Consider the following statistics:

  • In 2017, over 25,000 cases of online banking fraud affected around 18% of Indians.
  • The same year, Indian banks reported at least one fraud every hour.
  • Indian banks lost INR 109.75 crore to theft and online fraud in 2018.

Financial fraud detection is a daunting task for organizations, be it banks, insurance companies or eCommerce retailers. By some estimates, organizations lose around 5% of their revenue every year to fraudulent transactions.

Timely detection (and prevention) of fraud can help these organizations save resources and gain the confidence of all stakeholders, be it employees, customers, shareholders, investors or vendors. Big Data and Machine Learning have emerged as potent weapons against this menace.

Financial Fraud Detection Methods: Traditional vs. Modern

Traditional fraud detection methods use explicit signals to detect anomalies. For example, an unusually large transaction, or transactions on the same card from two different cities within a single day, may indicate fraud.

Conventional systems rely on hundreds of manually written scenarios (rules) to detect anomalies. Every time a new method of fraud is discovered, a new scenario has to be added by hand, which is time-consuming.
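To make the rule-based approach concrete, here is a minimal sketch of the two scenarios mentioned above written as hand-coded checks. The `Txn` record, the rule names and the INR 1,00,000 cut-off are hypothetical choices for illustration; "within a single day" is interpreted here as the same calendar date.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Txn:
    card_id: str
    amount: float  # transaction value in INR
    city: str
    timestamp: datetime

LARGE_AMOUNT_INR = 100_000  # hypothetical cut-off, chosen only for illustration

def rule_large_amount(txn, history):
    """Scenario 1: flag any single transaction above a fixed amount."""
    return txn.amount > LARGE_AMOUNT_INR

def rule_two_cities_same_day(txn, history):
    """Scenario 2: flag a card used in two different cities on the same calendar day."""
    return any(
        prev.card_id == txn.card_id
        and prev.timestamp.date() == txn.timestamp.date()
        and prev.city != txn.city
        for prev in history
    )

# Each newly discovered fraud pattern means writing and registering another function here.
RULES = {
    "large_amount": rule_large_amount,
    "two_cities_same_day": rule_two_cities_same_day,
}

def check(txn, history):
    """Return the names of every rule that the transaction triggers."""
    return [name for name, rule in RULES.items() if rule(txn, history)]

history = [Txn("C1", 800, "Mumbai", datetime(2019, 3, 5, 10, 0))]
print(check(Txn("C1", 950, "Delhi", datetime(2019, 3, 5, 18, 0)), history))
# ['two_cities_same_day']
```

The `RULES` dictionary is where the manual effort accumulates: nothing is learned from data, so every new scenario has to be coded and registered by hand.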

This approach is reactive rather than proactive, and unfit for real-time threat detection. Such rules are too simple to detect hidden correlations between the variables involved, and they cannot handle huge volumes of data.

Modern financial fraud detection methods employ sophisticated Big Data and Machine Learning platforms. These platforms process enormous datasets and handle hundreds of variables simultaneously, such as transaction volume, device used, time, location and customer background. Their algorithms can detect implicit correlations between user behavior and the likelihood of fraudulent actions.

Companies are switching to these modern-day tools to reduce manual work and combat financial fraud in real time. Modern algorithms also help improve customer experience, as they reduce the verification steps involved in a transaction.

Big Data and Machine Learning Models to Detect Financial Fraud

Big Data platforms make use of Machine Learning models to combat financial fraud. The following steps are usually involved:

1) Identifying business objectives: The first step in financial fraud detection is to define the goal behind building the model. An organization could be trying to identify unusual credit card transactions, or to check for inconsistencies in the information provided by a credit card applicant.

2) Preparing the database: Once the objectives have been identified, the next step is to prepare the data to be fed into the models. This process is time-consuming, as the models need voluminous datasets for higher accuracy. The data fed into the system should be free from errors and should not contain irrelevant variables.

For example, a credit card transaction database could include millions of rows, each representing a different transaction. Each row has several columns, each representing a variable such as transaction ID, transaction value, user location, card type, date of the transaction, IP address and so on. The last column is usually the target column, which takes a value of 0 or 1: 0 could stand for a legitimate transaction and 1 for a fraudulent one, or vice versa.
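The table layout and the cleaning requirements above can be sketched with Python's standard `csv` module. The three rows, the column names and the `is_fraud` target (1 = fraudulent) are hypothetical; the sketch drops the transaction ID as an irrelevant variable and discards incomplete rows rather than guessing missing values.

```python
import csv
import io

# A hypothetical extract of transaction rows; column names are illustrative.
RAW = """transaction_id,transaction_value,user_location,card_type,date,ip_address,is_fraud
T001,750.00,Pune,debit,2019-03-01,10.0.0.1,0
T002,,Pune,credit,2019-03-01,10.0.0.2,0
T003,30000.00,Chennai,credit,2019-03-02,10.0.0.3,1
"""

IRRELEVANT = {"transaction_id"}  # an identifier carries no predictive signal
TARGET = "is_fraud"              # last column: 0 = legitimate, 1 = fraudulent

def prepare(text):
    """Parse rows, drop irrelevant columns and discard rows with missing values."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if any(value == "" for value in row.values()):
            continue  # incomplete record: exclude it rather than guess
        label = int(row.pop(TARGET))
        for col in IRRELEVANT:
            row.pop(col)
        row["transaction_value"] = float(row["transaction_value"])
        features.append(row)
        labels.append(label)
    return features, labels

X, y = prepare(RAW)
print(len(X), y)  # 2 [0, 1]
```

Real pipelines usually impute or repair some missing values instead of dropping every incomplete row; dropping is used here only to keep the example short.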

3) Building the model and making predictions: The next step is to build the model using the data prepared in step 2. Modern detection systems use both supervised and unsupervised learning techniques to build sophisticated models for greater accuracy in threat detection.

Supervised learning techniques use labelled datasets, in which each transaction's target value marks it as ‘fraudulent’ or ‘not fraudulent’. The model learns from this labelled data to distinguish fraudulent transactions from genuine ones.

When new transactions are fed into the trained model, it classifies each one as fraudulent or legitimate.
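As one illustration of a supervised classifier, the sketch below trains logistic regression from scratch with gradient descent on a tiny labelled dataset, then classifies new transactions. The two features (amount as a fraction of the card's limit, and a new-device flag), the data, the learning rate and the 0.5 threshold are all hypothetical; production systems would typically use an established ML library and far more data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic-regression weights with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error: predicted fraud probability minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (fraudulent) if the predicted fraud probability reaches the threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return int(p >= threshold)

# Hypothetical labelled data: [amount as a fraction of the card's limit, new device (0/1)]
X_train = [[0.05, 0], [0.08, 0], [0.10, 0], [0.07, 1],
           [0.90, 1], [0.95, 1], [0.80, 0], [0.85, 1]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(X_train, y_train)
print(predict(w, b, [0.06, 0]), predict(w, b, [0.92, 1]))  # 0 1
```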

Unsupervised learning techniques use unlabelled data and let the model learn the inherent structure of the data on its own, identifying hidden correlations between the variables in question. Unsupervised models can detect outlier transactions and behavior.

For unsupervised learning, clustering algorithms such as k-means clustering are used. K-means groups observations with similar characteristics into a chosen number (k) of clusters; k is set in advance, but the algorithm determines the clusters on its own. Because k-means assigns every observation to some cluster, aberrant observations are typically identified as those lying unusually far from the centre of their cluster.
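A minimal sketch of this idea: plain k-means (Lloyd's algorithm) followed by a distance-to-nearest-centroid test for outliers. The two features, the data points, k = 2 and the distance threshold of 10 are hypothetical; in practice features would be scaled first and the threshold tuned on historical data.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means. Centroids are seeded with the first k points so runs are repeatable."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Every point is assigned to its nearest centroid; k-means never leaves one out.
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids

def outliers(points, centroids, threshold):
    """Flag points lying farther than `threshold` from their nearest centroid."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > threshold]

# Hypothetical features: (amount in INR thousands, hour of day)
points = [(0.6, 10), (2.0, 19), (0.8, 11), (0.7, 10), (0.9, 12),
          (2.2, 20), (1.9, 19), (30.0, 3)]
centroids = kmeans(points, k=2)
print(outliers(points, centroids, threshold=10.0))  # [(30.0, 3)]
```

The late-night INR 30,000 transaction is pulled into the daytime-shopping cluster and even shifts its centroid, but it still sits much farther from that centroid than any regular transaction, which is what the outlier test picks up.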

To make predictions more accurate, data scientists often construct multiple models, each of which analyzes the same transaction, and combine their judgements. Using multiple models lets data scientists leverage the strengths of each one and build robust threat detection systems.
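Combining several models is commonly done by voting. The sketch below shows two simple schemes, averaging probabilities (soft voting) and majority voting, over three stand-in models. The models themselves are hypothetical one-line heuristics used only so the example runs; in a real system they would be trained classifiers.

```python
from statistics import mean

# Three hypothetical stand-in models; each maps a transaction to a fraud probability in [0, 1].
def amount_model(txn):
    return min(txn["amount"] / 50_000, 1.0)

def location_model(txn):
    return 0.9 if txn["city"] != txn["home_city"] else 0.1

def device_model(txn):
    return 0.8 if txn["new_device"] else 0.2

MODELS = [amount_model, location_model, device_model]

def soft_vote(models, txn, threshold=0.5):
    """Average the models' probabilities, then apply a single threshold."""
    score = mean(model(txn) for model in models)
    return score, score >= threshold

def majority_vote(models, txn, threshold=0.5):
    """Flag the transaction if more than half of the models flag it individually."""
    votes = sum(model(txn) >= threshold for model in models)
    return votes > len(models) / 2

txn = {"amount": 30_000, "city": "Chennai", "home_city": "Pune", "new_device": False}
score, flagged = soft_vote(MODELS, txn)
print(round(score, 3), flagged)    # 0.567 True
print(majority_vote(MODELS, txn))  # True
```

Soft voting lets a very confident model outweigh a hesitant one, while majority voting treats every model's decision equally; which works better depends on how well calibrated the individual models are.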

4) Upgrading the model: The model needs to be fed new data from time to time to remain effective. As new transactions occur, new fraud patterns emerge, so data scientists keep adding these transactions to the training data and retraining the model to enhance its fraud detection capability.
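One way to keep a model current is to retrain it periodically on the most recent labelled transactions. In the minimal sketch below, the `RollingRetrainer` class, its window size and its retraining interval are hypothetical, and `train_fn` stands in for whatever training routine is already in use. The sliding window is a design choice: it lets stale patterns age out, whereas keeping all history preserves rare older fraud types at the cost of more data.

```python
from collections import deque

class RollingRetrainer:
    """Keep a sliding window of recent labelled transactions and refit periodically.

    `train_fn(X, y)` is any function that returns a fitted model (a placeholder here).
    """

    def __init__(self, train_fn, window=10_000, retrain_every=1_000):
        self.train_fn = train_fn
        self.buffer = deque(maxlen=window)  # oldest rows drop out automatically
        self.retrain_every = retrain_every
        self.since_last = 0
        self.model = None

    def add(self, features, label):
        """Record one newly labelled transaction; retrain once enough have arrived."""
        self.buffer.append((features, label))
        self.since_last += 1
        if self.since_last >= self.retrain_every:
            X, y = zip(*self.buffer)
            self.model = self.train_fn(list(X), list(y))
            self.since_last = 0

# Toy "model": just the fraud rate observed in the current window.
retrainer = RollingRetrainer(lambda X, y: sum(y) / len(y), window=3, retrain_every=3)
for label in [0, 0, 1, 1, 1, 1]:
    retrainer.add({"amount": 1_000}, label)
print(retrainer.model)  # 1.0
```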

Applications of Financial Fraud Detection Systems in the BFSI Industry

1) Combating Spurious Insurance Claims: By one estimate, fraud costs American insurance companies as much as $80 billion every year. Claimants often stage incidents such as road accidents or fires, or overstate the amount of damage, to draw higher payouts from insurers.

To deal with this menace, several insurance companies have begun to build a repository of customer data by collating data from many different sources, e.g. the customer's social media feed, transaction history, call centre recordings, credit history, financial background, address changes and criminal record, if any.

Insurers even fetch real-time data from third-party sources (such as weather and traffic data) to augment their models. Big Data tools such as Apache Hadoop and Spark process this repository and carry out link analysis across the different datasets to detect any inconsistencies in a claim.

For example, a customer may claim car insurance on the grounds that the car was damaged in rain, while weather data for the time and place stated in the claim shows sunny weather. Insurers can use such inconsistencies to expose spurious claims.
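The weather cross-check above boils down to joining claims with an external dataset on location and date. This sketch does the join with a plain Python dictionary; the weather feed, claim records and condition labels are hypothetical, and at production scale the same join would run on a platform such as Spark rather than in a single process.

```python
from datetime import date

# Hypothetical third-party weather feed keyed by (city, date).
weather = {
    ("Mumbai", date(2019, 3, 1)): "sunny",
    ("Mumbai", date(2019, 3, 2)): "heavy rain",
}

claims = [
    {"claim_id": "CL-1", "cause": "rain", "city": "Mumbai", "date": date(2019, 3, 1)},
    {"claim_id": "CL-2", "cause": "rain", "city": "Mumbai", "date": date(2019, 3, 2)},
]

RAIN_CONDITIONS = {"rain", "heavy rain", "drizzle"}

def inconsistent_claims(claims, weather):
    """Return IDs of rain-damage claims whose recorded weather shows no rain."""
    flagged = []
    for claim in claims:
        observed = weather.get((claim["city"], claim["date"]))
        # Only flag when weather data exists and contradicts the stated cause.
        if claim["cause"] == "rain" and observed is not None and observed not in RAIN_CONDITIONS:
            flagged.append(claim["claim_id"])
    return flagged

print(inconsistent_claims(claims, weather))  # ['CL-1']
```

A flagged claim is a lead for an investigator rather than proof of fraud; the missing-data case is deliberately left unflagged.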

2) Tackling Loan Application Fraud: A person applying for a loan can commit fraud in several ways. He can assume a false identity, provide incomplete information, forge documents, or modify certain details to improve his chances of qualifying for the loan. As a result, loan companies often end up lending to unqualified applicants.

Big Data and Machine Learning software can build a scoring model that calculates the probability that an application is fraudulent and compares it against a threshold. This helps loan companies assess which applications are most likely to be fraudulent. Such assessments can save these companies millions by reducing the time and effort needed to verify each application.
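A scoring model of this kind can be sketched as a logistic score compared against a threshold, with applications above it routed to manual review. The three application features, their weights, the bias and the 0.5 threshold are hypothetical values for illustration; a real model would learn its weights from past labelled applications.

```python
import math

# Hypothetical weights; in practice these would be learned from labelled past applications.
WEIGHTS = {
    "identity_mismatch": 2.5,    # 1 if submitted details do not match identity records
    "missing_fields": 0.4,       # number of fields left blank
    "document_edit_flags": 1.8,  # number of forgery indicators on uploaded documents
}
BIAS = -4.0
THRESHOLD = 0.5  # the standard value the fraud probability is compared against

def fraud_probability(app):
    """Logistic score: probability that the application is fraudulent."""
    z = BIAS + sum(WEIGHTS[k] * app.get(k, 0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def triage(app):
    """Send likely-fraudulent applications to manual review; fast-track the rest."""
    return "manual review" if fraud_probability(app) >= THRESHOLD else "fast track"

clean = {"identity_mismatch": 0, "missing_fields": 1, "document_edit_flags": 0}
suspicious = {"identity_mismatch": 1, "missing_fields": 2, "document_edit_flags": 1}
print(triage(clean), "|", triage(suspicious))  # fast track | manual review
```

The time saving comes from the triage step: only the applications above the threshold need full manual verification.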

3) Preventing Fraudulent Transactions: Companies can detect unusual transactions through user behavior analysis, which considers several metrics. These can include login metrics such as the device used, number of login attempts, IP address and time of day, or non-transaction metrics such as account balance, transaction history and the addition of a new user.

For example, a credit card owner visits a particular supermarket every two days at a specific time and spends INR 500-1000 with his card on each visit. Every three days, he also visits a petrol pump close to his house.

If one day the same card shows three transactions at some other supermarket amounting to INR 30,000, the behavior analysis model will flag them as potentially fraudulent and alert the card owner.

Descriptive statistics such as the mean, median and standard deviation come in handy while analyzing user behavior. For example, if a transaction value exceeds the customer's average by more than a few standard deviations, it raises a red flag and the user is notified immediately.
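Applied to the supermarket example, this check can be sketched with Python's `statistics` module. The spending history is hypothetical, and the cut-off of three standard deviations above the mean is a common convention assumed here, not a figure from the text.

```python
from statistics import mean, stdev

def is_anomalous(history, amount, k=3.0):
    """Flag an amount more than k sample standard deviations above the customer's mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two past amounts
    return amount > mu + k * sigma

# Hypothetical past supermarket spends (INR) for the customer in the example above.
history = [620, 850, 540, 990, 760, 700, 910, 580]
print(is_anomalous(history, 880))     # False
print(is_anomalous(history, 30_000))  # True
```

Here the mean is about INR 744 and the standard deviation about INR 163, so anything above roughly INR 1,232 is flagged. The median is often added alongside because, unlike the mean, a single past outlier barely moves it.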

4) Combating Spoofing in E-commerce: Many fraudsters clone an entire e-commerce site, or the specific pages from which orders are placed. A customer who visits the cloned site is duped into buying from it, as its look and feel is identical to the real site.

The spoofed site collects all of the customer's details, including credit card credentials, as they are entered. The customer even receives a transaction notification to make it appear real. Fraudsters can collect the credentials of thousands of customers this way. Machine Learning algorithms use behavior analytics to detect such malicious websites and alert customers in time.

The Bottom Line

While Big Data and Machine Learning offer robust platforms to combat financial fraud, not every organization can implement these solutions in-house. These platforms require large volumes of carefully prepared data in addition to deep technological and domain expertise.

Outsourcing financial fraud analysis to experienced third-party vendors can shield organizations from malicious attacks and help them focus on their core business.
