Showing posts with label Machine Learning. Show all posts
Monday, 24 June 2024

Exploring ML Explainability in SAP HANA PAL – Classification and Regression

1. Introduction


In this blog post, we delve into the concept of Machine Learning (ML) explainability in the SAP HANA Predictive Analysis Library (PAL) and show how PAL integrates this feature into its classification and regression algorithms, providing an effective tool for understanding predictive models. ML explainability is integral to achieving SAP's ethical AI goals of fairness, transparency, and trustworthiness in AI systems.

Upon completing this article, your key takeaways will be:

  • An understanding of the concept of ML Explainability.
  • How to utilize HANA PAL for ML Explainability in classification and regression tasks.
  • Hands-on experience with Python Machine Learning Client for SAP HANA (hana-ml) through an example.

Wednesday, 6 March 2024

Global Explanation Capabilities in SAP HANA Machine Learning

Machine learning (ML) has great potential for improving products and services across various industries. However, the explainability of ML models is crucial for their widespread adoption. First, explanations help build trust and transparency between users and models. When users understand how an ML model works, they are more likely to trust its results. Moreover, explainability allows for better debugging of complex models. By providing explanations for a model's decisions, researchers can gain insights into the underlying patterns, which helps identify potential biases or flaws. Furthermore, explainability enables auditing, a prerequisite for using models in regulated industries such as finance and healthcare.

Saturday, 24 June 2023

SHAP Interaction values with Automated Predictive (APL)

We already covered SHAP-explained models for classification and regression scenarios in a previous APL blog post, and at the time we talked briefly about the main effect of a predictor and its interaction effect with the other predictors of the model. Now with HANA ML 2.17, you have the ability to visualize the interaction between variables in a heatmap. This new visualization adds to the bar chart Variable Importance in providing a global explanation of the classification/regression model. To get that new feature you need APL 2311 or a later version.

This blog will walk you through an example using the Census dataset that comes with APL.
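
To make the distinction between a predictor's main effect and its interaction effects more concrete, here is a minimal sketch in plain Python. It assumes the convention used in the SHAP literature: for a single prediction the interaction values form a symmetric matrix, the diagonal holds the main effects, and each row sums to the SHAP value of that predictor. The feature names and numbers are made up for illustration; they are not APL output, and the format APL returns is not shown in this excerpt.

```python
# Hypothetical SHAP interaction matrix for ONE prediction (illustrative values only).
features = ["age", "education", "hours_per_week"]
interaction = [
    [0.30, 0.05, -0.02],
    [0.05, 0.10, 0.01],
    [-0.02, 0.01, -0.20],
]


def main_effects(matrix):
    """Main effect of each predictor: the diagonal of the interaction matrix."""
    return [matrix[i][i] for i in range(len(matrix))]


def shap_values(matrix):
    """SHAP value of each predictor: main effect plus its interaction terms (row sum)."""
    return [sum(row) for row in matrix]


def strongest_interaction(matrix, names):
    """Return (name_i, name_j, value) for the off-diagonal cell with the largest magnitude."""
    best = None
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if best is None or abs(matrix[i][j]) > abs(best[2]):
                best = (names[i], names[j], matrix[i][j])
    return best


print(main_effects(interaction))
print(shap_values(interaction))
print(strongest_interaction(interaction, features))
```

A heatmap of interaction strength is essentially a picture of the off-diagonal cells of such a matrix, aggregated over many predictions.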

Monday, 29 May 2023

How to compare an APL model to a non-APL model ─ Part 2

After completing the first part of this blog, you should have a hold-out dataset in HANA dedicated to testing for both the Census case and the California Housing case. In this second part we will build an APL model and a non-APL model on the same training data. Predictions will be made against the hold-out dataset to ensure a fair comparison between the two models. We will use standard metrics to measure the accuracy of our classification models and our regression models.
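
The excerpt does not name the metrics used, so the following is only a sketch of two common choices, assumed here for illustration: accuracy for classification and root mean squared error (RMSE) for regression. Both functions take the actual values and the predictions for the same hold-out rows, which is what makes the comparison between two models fair.

```python
import math


def accuracy(actual, predicted):
    """Share of hold-out rows where the predicted class equals the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted numeric values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical hold-out labels and predictions from two models.
actual_classes = [1, 0, 1, 1]
model_a_classes = [1, 0, 0, 1]
model_b_classes = [1, 0, 1, 1]
print(accuracy(actual_classes, model_a_classes), accuracy(actual_classes, model_b_classes))
```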

Regression Use Case


We define the HANA dataframes for training and for test using the tables prepared during part 1:

Wednesday, 17 May 2023

SAP Datasphere: Seamless extraction of business insights in multi-cloud environments with HANA Machine Learning and FedML

This blog post describes how SAP Datasphere can be used to provide a seamless data science experience by facilitating the training of machine learning (ML) models on different platforms (e.g. using hana_ml in SAP HANA and using FedML on hyperscaler landscapes). Furthermore, it shows how those ML models can work hand in hand to provide data-driven insights to business users without the need for expensive data replication, while fully preserving the business context of the data. To illustrate the point, a real-life use case is reviewed, and the selection of an ML runtime is discussed in terms of data gravity, the availability of the required ML tools on the platform, and the business criticality of the data. The objective of this blog post is to provide a high-level concept and guidelines for data scientists and architects working on similar multi-cloud cases.

Friday, 14 October 2022

Develop a Machine Learning Application on SAP BTP – Data Science Part

In a series of blog posts, we address the topic of how to develop a Machine Learning Application on SAP BTP. The overall sequence of steps performed by the involved personas is depicted below:

In this particular blog of the series, we focus on the data scientist’s work, i.e., understanding the business problem, performing experiments, creating appropriate machine learning models and finally generating the corresponding design time artifacts. These artifacts can then be exchanged with the application developer by pushing/pulling them to a common git repository.

Saturday, 14 May 2022

APL Time Series Forecast using a Segmented Measure

The latest release of the Automated Predictive Library (APL) introduces the capability to build several time series models at once from a segmented measure like Sales by Store for example or Profit by Product. No need any more to define a loop in your SQL code or Python code. Just tell APL what column represents the segment in your dataset. You can also specify how many HANA tasks to run in parallel for a faster execution.

This new capability requires HANA ML 2.13 and APL 2209.

Let’s see how it works in Python and then in SQL.

Friday, 8 April 2022

Deploy Machine Learning/Exploratory Data Analysis Models to SAP Business Technology Platform

DISCLAIMER: Please note that the content of this blog post is for demonstration purposes only. It should not be used productively without evaluating its impact on the production environment.

Introduction:

In this blog, we will implement an end-to-end solution for a Python-based web application (Flask) on SAP Business Technology Platform.

◉ We will use a cloud-based HANA DB and leverage the Python package hdbcli to fetch the relevant data using SQL statements.

◉ We will use Python data science packages such as pandas, seaborn and matplotlib to display various graphs for Exploratory Data Analysis.

Monday, 4 April 2022

Multiclass Classification with APL (Automated Predictive Library)

Common machine learning scenarios such as fraud detection, customer churn, and employee flight risk aim to predict Yes/No outcomes using binary classification models. But sometimes the target to predict has more than two classes. This is the case with Delivery Timeliness, which can have three categories: Early/On-time/Late.

From this article you will learn how to train and apply a multiclass classification model in a Python notebook with HANA ML APL.

The following example was built using HANA ML 2.12.220325 and APL 2209.

Census Income will be our training dataset.

Friday, 1 April 2022

Two simple tips to boost the working efficiency of a Data Science Project

How can we make our daily work more efficient? Is there a straightforward answer? For me, the answer is just one word: experience.

Having participated in several Data Science projects over the last years, I was really amazed at how quickly you can confirm the saying “Almost 70-80% of a Data Science project is spent on the Data preparation”. This blog post presents two simple tips regarding the data preparation process.

The first one compares four different ways in which a data scientist can create random sample datasets from an initial dataset in SAP HANA, and what their potential uses are. The second one shows the power of SAP HANA ML for creating and automating a set of new aggregated columns (max(), sum(), avg() for example) from existing columns without writing long, complex SQL queries (the feature engineering part).
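
The two ideas above can be sketched in plain Python to show what is being computed. This is only an illustration under stated assumptions: the post itself does this work inside SAP HANA, the four sampling methods it compares are not listed in this excerpt, and the function names, column names and data below are hypothetical.

```python
import random
from collections import defaultdict


def random_sample(rows, fraction, seed=42):
    """Draw a reproducible random sample containing roughly `fraction` of the rows."""
    rng = random.Random(seed)
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


def aggregate(rows, key, value):
    """Compute max, sum and avg of `value` for each distinct `key` (aggregated features)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        group: {"max": max(vals), "sum": sum(vals), "avg": sum(vals) / len(vals)}
        for group, vals in groups.items()
    }


# Hypothetical transactions: ten rows spread over two stores.
transactions = [{"store": "A" if i % 2 == 0 else "B", "sales": i * 10} for i in range(10)]

print(random_sample(transactions, fraction=0.3))
print(aggregate(transactions, key="store", value="sales"))
```

Fixing the seed keeps the sample stable between runs, which helps when the same sample has to be reused across experiments.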

Monday, 14 February 2022

Forecasting Intermittent Time Series with Automated Predictive (APL)

Starting with version 2203 of the Automated Predictive Library (APL), intermittent time series are given special treatment. When the target value has many zeros, typically when the demand for a product or a service is sporadic, APL no longer puts various forecasting models in competition; instead, it systematically uses the Single Exponential Smoothing (SES) technique.
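
For readers unfamiliar with SES, here is a minimal textbook sketch in plain Python. It is not APL's implementation: how APL initializes the level, chooses the smoothing factor, or decides that a series is intermittent is not described in this excerpt. The initialization with the first observation, the `alpha` value and the sample data are assumptions made for illustration.

```python
def zero_share(series):
    """Fraction of observations equal to zero, a rough indicator of intermittency."""
    if not series:
        raise ValueError("series must not be empty")
    return sum(1 for value in series if value == 0) / len(series)


def ses_forecast(series, alpha):
    """Single Exponential Smoothing: return the final level, used as a flat forecast."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])  # assumption: initialize the level with the first observation
    for value in series[1:]:
        # New level = weighted average of the latest observation and the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


monthly_quantity = [0, 0, 4, 0]  # hypothetical sporadic demand
print(zero_share(monthly_quantity))
print(ses_forecast(monthly_quantity, alpha=0.5))
```

Because SES has no trend or seasonal component, the forecast is the same value for every future period.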

For SAP Analytics Cloud users, this functionality is coming with the 2022.Q2 QRC release in May.

Let’s take the following monthly quantity as an example.

Monday, 11 October 2021

Detecting Contextual Anomalies with SAP HANA ML

Introduction

What is an Anomaly?

The goal here is to detect outlier data points, which do not follow the collective common pattern of the majority of data points, hence can be easily separated from the group.

Some of the possible use cases here are: 

Wednesday, 16 June 2021

Scheduling Python code on Cloud Foundry

This blog starts with a very simple example to schedule a Python file on Cloud Foundry, just to introduce the most important steps. That concept is then extended to schedule a Python file, which applies a trained Machine Learning model in SAP HANA.

Run Python file locally

We would like to schedule a Python file, not a Jupyter Notebook. Hence use your preferred local Python IDE or editor to run this simple file, helloworld.py.
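
The content of helloworld.py is not reproduced in this excerpt. A minimal stand-in, assuming the file does nothing more than print a greeting, might look like this:

```python
# helloworld.py (hypothetical stand-in; the original file is not shown here)


def greeting():
    """Return the text the script prints."""
    return "Hello world"


if __name__ == "__main__":
    # Runs only when the file is executed directly, e.g. `python helloworld.py`.
    print(greeting())
```

Keeping the logic in a function and the call behind the `__main__` guard makes the same file easy to run locally and to invoke from a scheduler later.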

Monday, 14 June 2021

Piecewise Linear Trend with Automated Time Series Forecasting (APL)

If you are a user of APL time series, you have probably seen models fitting a linear trend or a quadratic trend to your data. With version 2113, the Automated Predictive Library introduces an additional method called Piecewise Linear that can detect breakpoints in your series. You don’t have to do anything new to take advantage of this functionality: the trend is detected automatically, as shown in the example below.

For SAP Analytics Cloud users, note that Piecewise Linear Trend is coming with the 2021.Q3 QRC (August release).

Monday, 12 April 2021

Hands-On Tutorial: Leverage SAP HANA Machine Learning in the Cloud through the Predictive Analysis Library

The hard truth is that many machine learning projects fail to make it into production. It takes time and real effort to move from a machine learning model to a real business application. This is due to many different reasons, for example:

1. Limited data access

2. Poor data quality

3. Small computing power

4. No version control

Wednesday, 16 December 2020

COPD study, explanation and interpretability with Python machine learning client for SAP HANA

Chronic obstructive pulmonary disease (COPD) is a type of obstructive lung disease. Globally, it is estimated that 3.17 million deaths were caused by this disease in 2015. Exposure to indoor and outdoor air pollution, tobacco smoke (especially secondhand smoke), dusts and fumes are the key risk factors. In this blog post, I’d like to introduce two new features of the Python machine learning client for SAP HANA, the dataset report and the model report, which help me study COPD cases. These two features make it much more convenient for data scientists to analyze their data and the trained model. Let’s go through a use case to learn how they work.

Tuesday, 21 July 2020

Get On-boarded with HANA XSA and Python Application using hdbcli

In this post, I will discuss how to connect a Python application to HANA XSA using hdbcli. I will also cover the prerequisites for setting up the environment and demo the end-to-end execution steps using screenshots.

Some previous knowledge of Python will be helpful but is not required, as I will explain everything in detail as we go.

Tuesday, 28 January 2020

My Adventures in Machine Learning 2

My project literally starts from scratch. I have some experience in operating SAP BW systems and troubleshooting various performance issues, but that always was human experience applied to a specific technical problem. Now there is an (at least for me) completely new field: Applying Machine Learning to operations. During the last years I read a lot about the AI hype. AI was applied almost exclusively to business cases. This means to me stuff inside the SAP applications or some other business software layers. I haven’t really seen a case for trying to apply AI or Machine Learning below, in the ordinary operations. Sure, there is lots of automation in operations, but hardly anyone that I am aware of talks about AI or ML in operations. This makes it hard to use some best practices which might guide me.

Friday, 17 January 2020

HANA ML DataFrame: End-to-end methods and its usage

A small write-up on the HANA ML dataframe. Writing about something you learn in your day-to-day job is a process of learning, exposure and knowledge sharing. So, holding my passion for technology in both hands, here comes my first post of 2020, and the topic is everyone’s favorite: HANA machine learning, and this time it’s about the dataframe.