DataDecisionMakers
The rapid technical progress and widespread adoption of artificial intelligence (AI)-based products and workflows are influencing many aspects of human and business activities across banking, healthcare, advertising and many more industries. Although the accuracy of AI models is undoubtedly the most important factor to consider while deploying AI-based products, there is an urgent need to understand how AI can be designed to operate responsibly.
Responsible AI is a framework that any organization developing software needs to adopt to build customer trust in the transparency, accountability, fairness and security of its deployed AI solutions. At the same time, a key aspect of making AI responsible is having a development pipeline that promotes the reproducibility of results and manages the lineage of data and ML models.
Low-code machine learning is gaining popularity with tools like PyCaret, H2O.ai and DataRobot, which let data scientists run pre-canned patterns for feature engineering, data cleansing, model development and statistical performance comparison. However, these packages often lack patterns for responsible AI that evaluate ML models for fairness, transparency, explainability, causality and more.
Here, we demonstrate a quick and easy way to integrate PyCaret with Microsoft's RAI (Responsible AI) framework to generate a detailed report showing error analysis, explainability, causality and counterfactuals. The first part is a code walkthrough that shows developers how an RAI dashboard can be built. The second part is a detailed evaluation of the RAI report.
First, we install the libraries needed. This can be done on your local machine with Python 3.6+ or on a SaaS platform like Google Colab.
An upgrade of pandas and NumPy is needed for now, though this should be fixed shortly. Also, don’t forget to restart the runtime if you are installing in Google Colab.
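After installing (and restarting the runtime on Colab), it can help to confirm which versions actually ended up in the environment. The sketch below uses only the standard library and assumes Python 3.8+ for `importlib.metadata`. The names `pycaret` and `raiwidgets` are the PyPI distribution names; the helper `installed_versions` is a hypothetical name introduced for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Libraries this walkthrough relies on. Missing ones can be installed first,
# for example with: pip install pycaret raiwidgets
for name, version in installed_versions(
    ["pycaret", "raiwidgets", "pandas", "numpy"]
).items():
    print(f"{name}: {version or 'not installed'}")
```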
Next, we load the data from GitHub, then cleanse it and do feature engineering with PyCaret.
The dataset is a simulated loan applications dataset with features like gender, marital status, employment, income, etc. of applicants. PyCaret has a cool feature that makes the training and testing data frames available after feature engineering via the get_config method. We use this to get cleansed features that we will later feed to the RAI widget.
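The preparation step described above might look roughly like the following sketch. `setup` and `get_config` are functions in `pycaret.classification`; the wrapper name `prepare_splits`, the `session_id` value and the target column name you pass in are assumptions, since the original data URL and column names are not reproduced here. The configuration keys shown are the ones available in the PyCaret 2.x line; newer releases may expose the processed frames under different keys.

```python
def prepare_splits(data, target, session_id=42):
    """Run PyCaret preprocessing and return the resulting train/test splits.

    `data` is a pandas DataFrame of loan applications and `target` is the
    name of its label column. Both are supplied by the caller.
    """
    # Imported lazily so this helper can be defined without PyCaret installed.
    from pycaret.classification import get_config, setup

    # setup() infers column types, cleanses and encodes the features,
    # and creates the train/test split.
    setup(data=data, target=target, session_id=session_id)

    # get_config() exposes variables that setup() created.
    keys = ("X_train", "y_train", "X_test", "y_test")
    return {key: get_config(key) for key in keys}
```

With a DataFrame `df` loaded through `pandas.read_csv`, a call such as `splits = prepare_splits(df, target="Loan_Status")` would return the four objects in a dictionary. `Loan_Status` is a hypothetical column name used only for this example.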
Now we run PyCaret to build multiple models and compare them on Recall as a statistical performance metric.
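As a sketch of this step: Recall is the share of actual positives that the model finds, TP / (TP + FN), and PyCaret's `compare_models` accepts a `sort` argument that ranks the candidate models by a chosen metric. The helper names `recall` and `best_model_by_recall` are hypothetical, and the second one assumes `setup()` has already been run in the same session.

```python
def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN). Requires at least one actual positive."""
    return true_positives / (true_positives + false_negatives)


def best_model_by_recall():
    """Train PyCaret's candidate classifiers and return the top one by Recall.

    Must be called after pycaret.classification.setup().
    """
    # Imported lazily so this helper can be defined without PyCaret installed.
    from pycaret.classification import compare_models

    # sort="Recall" orders the comparison grid by cross-validated recall
    # and returns the best-ranked estimator.
    return compare_models(sort="Recall")


# A model that finds 90 of 100 actual positives has a recall of 0.9.
print(recall(true_positives=90, false_negatives=10))
```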
Our top model is a Random Forest Classifier with a Recall of 0.9, which we can plot here.
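The original plot is not reproduced here, so the following is only a sketch of how a PyCaret model can be plotted. `plot_model` is part of `pycaret.classification`; the choice of a confusion matrix and a precision-recall curve is an assumption, as is the helper name `plot_top_model`.

```python
def plot_top_model(model, plots=("confusion_matrix", "pr")):
    """Render one PyCaret diagnostic plot per entry in `plots`."""
    # Imported lazily so this helper can be defined without PyCaret installed.
    from pycaret.classification import plot_model

    for name in plots:
        # Each call draws one chart for the fitted estimator.
        plot_model(model, plot=name)
```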
Now, we will write our 10 lines of code to build an RAI dashboard using the feature data frames and models we generated from PyCaret.
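The original listing is not reproduced here. A minimal sketch of such code, assuming the `RAIInsights` class from the `responsibleai` package (installed alongside `raiwidgets`), could look like this. The helper names `attach_target` and `build_rai_insights` are hypothetical, and the treatment features are left as a parameter because the exact column names are not given.

```python
def attach_target(features, labels, target):
    """Return a copy of `features` with the label column added back.

    RAIInsights expects train and test frames that include the target column,
    whereas PyCaret hands back features and labels separately.
    """
    frame = features.copy()
    frame[target] = labels
    return frame


def build_rai_insights(model, train, test, target, treatment_features,
                       categorical_features=None, total_cfs=10):
    """Configure and compute the RAI components for a classifier."""
    # Imported lazily so this helper can be defined without the package installed.
    from responsibleai import RAIInsights

    insights = RAIInsights(
        model, train, test, target, "classification",
        categorical_features=categorical_features or [],
    )
    insights.explainer.add()        # global and local explanations
    insights.error_analysis.add()   # error tree and heat map
    insights.causal.add(treatment_features=list(treatment_features))
    # desired_class="opposite" asks for examples that flip a binary outcome.
    insights.counterfactual.add(total_CFs=total_cfs, desired_class="opposite")
    insights.compute()              # runs every component added above
    return insights
```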
The above code, though pretty minimalist, does a lot under the hood. It creates RAI insights for a classification task and adds modules for explainability and error analysis. Then, a causal analysis is done based on two treatment features: credit history and marital status. Also, counterfactual analysis is done for 10 scenarios. Now, let’s generate the dashboard.
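Generating the dashboard can be sketched as a single call, assuming the `ResponsibleAIDashboard` class from `raiwidgets` and an already computed insights object. The wrapper name `launch_dashboard` is hypothetical; leaving `port` as `None` lets the library choose a free port.

```python
def launch_dashboard(insights, port=None):
    """Start the RAI dashboard server for a computed RAIInsights object."""
    # Imported lazily so this helper can be defined without raiwidgets installed.
    from raiwidgets import ResponsibleAIDashboard

    # Constructing the dashboard starts a local web server for the widget.
    return ResponsibleAIDashboard(insights, port=port)
```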
The above code will start the dashboard on a port like 5000. On a local machine, you could directly go to http://localhost:5000 and see the dashboard. On Google Colab, you need to do a simple trick to see this dashboard.
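The exact trick is not reproduced here. One approach that works in Colab notebooks, shown as an assumption rather than the original listing, is to ask the Colab kernel for a proxied address of the local port through `google.colab.output.eval_js`. The helper name `colab_dashboard_url` is hypothetical, and the port must match the one the dashboard actually started on.

```python
def colab_dashboard_url(port=5000):
    """Return a browser-reachable URL for a server running inside Colab."""
    # Only importable inside a Google Colab runtime.
    from google.colab.output import eval_js

    # The Colab kernel maps the local port to a public proxy address.
    return eval_js(f"google.colab.kernel.proxyPort({int(port)})")
```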
This will give you a URL to view the RAI dashboard. You can see some components of the RAI dashboard below. Here are some major results of this analysis that were generated automatically to complement the AutoML analysis done by PyCaret.
Error analysis: We see that the error rate is high for rural property areas and our model has a negative bias for this feature.
Global explainability – feature importance: We see that the feature importance remains the same across both cohorts: all data (blue) and property area rural (orange). For the orange cohort, property area does have a bigger impact, but credit history is still the #1 factor.
Local explainability: We see that credit history is also an important feature for an individual prediction (row #20).
Counterfactual analysis: We see that for the same row #20, a change in decision from N to Y is possible (based on the data) if credit history and loan amount are changed.
Causal inference: We consider causal analysis to study the impact of two treatments, credit history and employment status, and see that credit history has a greater direct impact on approval.
A responsible AI analysis report showing model error analysis, explainability, causal inference and counterfactuals can add great value to the traditional statistical metrics, such as precision and recall, that we usually use as levers to evaluate models. With modern tools like PyCaret and RAI dashboards, it’s easy to build these reports. They can also be developed using other tools; the key is that data scientists need to evaluate models for these responsible AI patterns to make sure their models are ethical as well as accurate.
Dattaraj Rao is chief data scientist at Persistent.
© 2022 VentureBeat. All rights reserved.