Apr. 01 2019 - AI is on the rise at financial institutions, and there are many applications in play today. From intelligent virtual assistants to techniques that quickly sift through mounds of data to identify investment signals, this technology is helping firms work more efficiently and make faster decisions. But the attention that AI is receiving in the press can leave business professionals intrigued by techniques and tools that take them down the wrong path. Given this, we thought it would be useful to discuss a six-step, common-sense approach to assessing AI opportunities for your business.
- 1. Define your problem
- 2. Design a framework for identifying a solution
- 3. Determine data needs
- 4. Evaluate different techniques
- 5. Assess delivery options
- 6. Consider maintenance issues
A Six-Step Approach
- 1. Define your problem
This is the start of a typical business case where you need to address the “three W’s”: what problem are you trying to solve, for whom, and why?
What: You should be as precise as possible when you define the problem, breaking it down by internal issues, such as tasks that take too much time to complete, or external issues, such as the need to better understand lending risks.
For whom: Identifying who will use the solution can help dictate what you are able to do and how quickly it can be implemented. Using AI to support a credit risk model to assess lending decisions for individuals, for example, will have different constraints and time requirements than automating the collection of company filings.
Why: Having a clear definition of the reasons behind the initiative and the benefits you hope to achieve can help keep you focused on the end game, and build internal support.
By taking the time upfront to define the three W’s, you are setting the stage to better address the steps that follow, such as identifying data needs and how the solution will be delivered to end users.
Let’s look at a lending example: Your institution is facing increased competition from FinTech firms that can offer quick loan decisions for small loans to small businesses. You would like to automate your current labor-intensive process to be more responsive to potential borrowers. This involves ingesting financials and other documents from borrowers, analyzing the data, recommending a yes-no decision and a price, and communicating that decision. This may also require accessing and analyzing new data sets, which could provide timely indicators of risks (e.g., the CEO at the company requesting a loan has previously directed three successful businesses, providing confidence in their management skills). This could help you be more competitive, make better risk-adjusted decisions, and expand your business relationships.
- 2. Design a framework for identifying a solution
The objective here is to define your measures of success that, in turn, can help point you to potential solutions to the problem you have identified in Step 1. For example, if you are working on enhancing a model or automating a process, you should clearly define your target performance metrics (e.g., improved accuracy, a lower mean squared error, etc.) and the amount of gain you hope to achieve in each of these metrics that would justify a model update or change in a client’s existing workflow. If building a risk model, are you seeking performance gains only, or performance gains with a transparent model? If you are looking to speed up the loan origination process, do you have a target turnaround time and a hypothesis as to which component can be done faster? Answering these questions will help lead you to one AI technique over another.
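To make the idea of a target gain concrete, here is a minimal Python sketch. It assumes mean squared error is the chosen metric and that a 10% relative improvement is the bar for a model update; the threshold, the function names, and the sample figures are all hypothetical and would come from your own business case.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def justifies_update(current_mse, candidate_mse, min_relative_gain=0.10):
    """True when the candidate reduces MSE by at least min_relative_gain."""
    if current_mse <= 0:
        return False  # the current model is already error-free on this sample
    relative_gain = (current_mse - candidate_mse) / current_mse
    return relative_gain >= min_relative_gain


# Hypothetical observed loss rates and two sets of model predictions.
actual = [0.02, 0.10, 0.05, 0.30]
current_model = [0.05, 0.05, 0.05, 0.20]
candidate_model = [0.03, 0.08, 0.05, 0.27]

current_mse = mean_squared_error(actual, current_model)
candidate_mse = mean_squared_error(actual, candidate_model)
update_is_justified = justifies_update(current_mse, candidate_mse)
```

Agreeing on the threshold before any modelling starts keeps the later comparison of techniques honest: a candidate either clears the bar or it does not.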
The potential solutions that you identify at this stage should be defined in detail and tested. For example, if you decide to improve your credit risk model and add new data sources, but require transparency, you might start with visualizing data relationships and running simple Machine Learning (ML) models that give computers the capability to learn without being explicitly programmed. Whatever approach you take, testing should involve rapid prototyping and comparing alternative methodologies and their outcomes.
It is important that these testing and validation activities be conducted under reproducible research principles, where data and software code are available so they may be verified and built upon by others. This can help prove to stakeholders how your solution is addressing their problem, and let teams within your organization leverage your methods and results when working on similar problems. Importantly, it also speeds up the move to production. There are software tools, such as notebooks and Markdown documents,1 that help scientists easily document and share their workflows and narratives, and software containers2 that support the computational reproducibility of a scientist’s work by packaging their computing environment in a self-contained system that is designed to run the same way, regardless of where it is deployed.
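As a small illustration of these principles, the Python sketch below records what a colleague would need to repeat an experiment: the interpreter version, the random seed, and a hash of the input data. It is only one possible approach, and the names run_manifest, split_train_holdout, and SEED are hypothetical; it uses the standard library only and complements, rather than replaces, notebooks and containers.

```python
import hashlib
import json
import platform
import random
import sys


def run_manifest(data_bytes, seed):
    """Describe the inputs a reader would need to repeat this run."""
    return {
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "seed": seed,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


def split_train_holdout(rows, seed, holdout_fraction=0.25):
    """Shuffle with a private, seeded generator so the split is repeatable."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(20))  # stand-in for loan application records
SEED = 2019
manifest = run_manifest(json.dumps(rows).encode("utf-8"), SEED)
train, holdout = split_train_holdout(rows, SEED)
print(json.dumps(manifest, indent=2))
```

Storing the manifest next to the results lets a reviewer confirm they are working from the same data and the same split before comparing numbers.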
Lending example: Begin by determining your measure of success (e.g., decrease processing time of loan applications from 20 days to three days), and then start gathering some candidate solutions to automate your current process. For example, a potential solution might include sourcing new data about your client and using this as part of your adjudication models. At each step, leverage the reproducible research framework, so your work can be easily verified and built upon by others.
- 3. Determine data needs
In this step, you need to determine what data will be required, and where you may have gaps in coverage, history, or quality. As you consider how best to fill the gaps, review possible data restrictions, privacy concerns, and whether you can gather information on your own or need to have a license agreement. (Note: defining data to solve a problem is more straightforward than looking for a problem to apply to available data.)
Once defined, you need to ingest the datasets into your environment and test whether they fit your purpose. If ‘big data’ is required, consider obtaining a representative sample that you can review in your current computing environment, and start going through your reproducible framework to assess how it works. ‘Model-free’ visualizations, which involve plotting attributes of your data that you expect to be meaningful for your problem across datasets and/or over time, may be useful. If you are dealing with a dataset that has many attributes, however, such as text data, most insight will be obtained after building a prototype model and doing visual inspections of the outputs or attributes that the model identifies as being relevant.
Lending example: What data does your model require? If the borrower is or has been a client in another part of the institution, can that data be cross-referenced to the lending model analysis and scenarios?
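One way to draw a sample from a dataset that is too large to load at once is reservoir sampling, sketched below in Python. This is an illustration rather than a recommendation: the record layout and the sample size of 500 are hypothetical, and a uniform random sample may under-represent rare segments, in which case a stratified sample could be more appropriate.

```python
import random


def reservoir_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random from an iterable."""
    rng = random.Random(seed)
    sample = []
    for index, record in enumerate(records):
        if index < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                sample[slot] = record
    return sample


# Stand-in for a large dataset: a generator that yields one record at a time.
applications = ({"id": i, "amount": 1000 + 10 * i} for i in range(100_000))
sample = reservoir_sample(applications, k=500, seed=42)
```

Because the records are read in a single pass and only k of them are held in memory, the same function works on a file or database cursor that yields rows one by one.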
- 4. Evaluate different techniques
There are a number of sub-fields within AI, and you will need to identify the right technique to solve your problem. Two important areas include: i) ML, mentioned earlier, that gives computers the capability to learn without being explicitly programmed, and ii) Deep Learning (DL), which is ML using deep neural networks to learn from a hierarchy of concepts, and which is inspired by functions of the human brain. Recent successful applications of AI include Natural Language Processing (NLP), which deals with the interaction of human and machine languages, as well as computer vision and speech recognition, all of which leverage DL techniques.
The technique you ultimately choose will likely depend on its performance for your application and the expected maintenance costs. Developing and deploying powerful ML/DL solutions is relatively easy and inexpensive. The open source community has made them accessible, and state-of-the-art neural network architectures can be implemented in a handful of lines of code. Maintenance of ML/DL-based solutions is costly, however, as they are prone to accumulating hidden technical debt,3 which should be thought through and planned for in advance.4
In addition to performance and maintenance costs, fairness (i.e., the absence of bias) should be considered, a topic that has recently received significant attention from the media and policymakers. Transparency of AI-based models is also important, especially for topics that are under regulatory oversight, such as consumer credit risk models. We approach transparency by: i) following the reproducible research principles mentioned earlier, which make our methodology accessible, and ii) helping users understand why a given solution delivers a particular outcome. The latter can be achieved by providing ‘global’ interpretations of the solution that show which factors are the main drivers of an algorithm, as well as ‘local’ interpretations that describe the relationship between the output and input of an algorithm for a single sample of data.5
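The difference between global and local interpretations is easiest to see on a simple linear score, where both can be computed exactly. The Python sketch below is an illustration under stated assumptions: the weights, feature names, and borrowers are hypothetical, and the calculation is not the surrogate-model approach used for non-linear models, which has to approximate these quantities.

```python
# Hypothetical linear credit score: score = intercept + sum(weight * feature).
WEIGHTS = {"leverage": -2.0, "years_in_business": 0.5, "late_payments": -1.5}
INTERCEPT = 3.0

borrowers = [
    {"leverage": 0.5, "years_in_business": 10, "late_payments": 0},
    {"leverage": 1.5, "years_in_business": 2, "late_payments": 2},
    {"leverage": 1.0, "years_in_business": 6, "late_payments": 1},
]


def score(borrower):
    return INTERCEPT + sum(WEIGHTS[name] * borrower[name] for name in WEIGHTS)


def feature_means(rows):
    return {name: sum(r[name] for r in rows) / len(rows) for name in WEIGHTS}


def local_contributions(borrower, means):
    """How far each feature moves this borrower's score from the average score."""
    return {name: WEIGHTS[name] * (borrower[name] - means[name]) for name in WEIGHTS}


def global_importance(rows, means):
    """Mean absolute local contribution of each feature across all borrowers."""
    totals = {name: 0.0 for name in WEIGHTS}
    for row in rows:
        for name, value in local_contributions(row, means).items():
            totals[name] += abs(value)
    return {name: total / len(rows) for name, total in totals.items()}


means = feature_means(borrowers)
local = local_contributions(borrowers[1], means)  # explains one decision
overall = global_importance(borrowers, means)     # ranks the model's drivers
```

Here the local view explains why the second borrower scores below average, feature by feature, while the global view ranks the features by how much they move scores across the whole sample.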
Lending example: Can you leverage your existing model or should you consider expanding it to consider additional variables? What are your regulatory and compliance requirements, and are there some privacy restrictions? These questions will help define what approach is reasonable.
- 5. Assess delivery options
Who you will serve and your product and technical requirements will determine how you should deliver your solution and what additional material may be required. For example, you will need to take certain steps if it is going to a downstream API, and others if it is going directly to end users who may need detailed documentation. You need to make it easy for your users to understand and approve of your solution by enabling them to manipulate the data, run what-if scenarios, and see the value of what you have built. Spending time building interactive visualizations using Shiny,6 Dash,7 or other applications may enhance adoption rates by internal and external clients.
Lending example: Assuming this is an extension of your current business, will you leverage your existing channels or add new digital channels? If you are adding channels, there are quite a few other questions that need to be addressed around technology, platforms, continuity, cyber risk, and regulatory requirements.
- 6. Consider maintenance issues
The world is changing fast, so you can’t execute and forget; you need to continuously monitor your input, model, and output. You need to take steps to validate that your solution is being used and is performing as intended. This calls for putting in place a regular monitoring and action plan to know if your data or model starts to deteriorate and how you will respond. Consider how you will decide if you have to upgrade your model or not, and how easy it will be to test a new algorithm within your pipeline. Also determine how dependent your solution is on key individuals and how quickly you can bring new members of your team up to speed if these individuals are not available. These are all business-as-usual technology and model life-cycle management considerations that need extra planning and design for AI-based solutions, because of the complexities and hidden technical debt we referenced earlier.
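One common way to check whether the input data or model output has drifted is the Population Stability Index (PSI), which compares the distribution of a variable in a recent period against a baseline. The Python sketch below is a minimal version under stated assumptions: the bin edges and sample scores are hypothetical, and the 0.25 alert level is a widely quoted rule of thumb rather than a standard, so it should be set to suit your own risk appetite.

```python
import math


def bin_fractions(values, edges):
    """Share of values falling in each bin defined by ascending edges."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        position = sum(value > edge for edge in edges)
        counts[position] += 1
    return [count / len(values) for count in counts]


def population_stability_index(baseline, recent, edges, floor=1e-6):
    """PSI = sum over bins of (recent - baseline) * ln(recent / baseline)."""
    expected = bin_fractions(baseline, edges)
    observed = bin_fractions(recent, edges)
    psi = 0.0
    for e, o in zip(expected, observed):
        e, o = max(e, floor), max(o, floor)  # avoid log(0) for empty bins
        psi += (o - e) * math.log(o / e)
    return psi


EDGES = [0.25, 0.5, 0.75]  # hypothetical score bands
ALERT_THRESHOLD = 0.25     # assumption: rule-of-thumb level for a major shift

baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
recent_scores = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.95, 0.85]

psi = population_stability_index(baseline_scores, recent_scores, EDGES)
needs_review = psi > ALERT_THRESHOLD
```

Running a check like this on a schedule, for both inputs and outputs, gives the monitoring plan a concrete trigger for investigating or retraining the model.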
Lending example: There needs to be a deliberate plan to manage the maintenance, including an outline of roles, responsibilities, and frequency. There are two primary aspects to look at: i) monitor what you have put in place to determine whether goals for speed, accuracy, and other metrics are being met, and ii) continue to expand your data sources and, potentially, add new models.
Keep in mind any cyber security, privacy, regulatory, region-specific, and policy issues, as they will create guardrails for your solution. Cyber risk is critical, of course: it can be an external force, but you also need to consider how your model can be affected by internal factors. For example, adversarial training and other steps to make your model robust to malicious inputs must be taken into account to secure success for your AI-based solutions.
Having followed this six-step process, you should have enough information to make an informed business decision as to whether to proceed or not. Even if you do not implement an AI solution at this point, you and your team will have a methodology in place for evaluating the next opportunity. We are all in business to solve problems and, if approached properly, AI can present exciting options.
3 This term likens poor software design to servicing a financial debt which, ultimately, needs to be paid with interest, that is, extra work.
4 For more information on this topic see: Sculley, et al., “Hidden Technical Debt in Machine Learning Systems”, (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf).
5 A useful implementation of this is provided by LIME, which uses an easily interpretable local surrogate model to describe what is happening in the neighborhood of the prediction to be explained; Ribeiro, et al., “Why Should I Trust You?: Explaining the Predictions of Any Classifier”, (https://arxiv.org/abs/1602.04938).
6 Shinyapps.io, https://www.shinyapps.io/.
7 Dash by Plotly, https://plot.ly/products/dash/.