Thursday, 23 June 2016

SAP HANA Database as a Graph Store - Uncovering the network using Graph Algorithms

This Document is part of the series on HANA Database as a Graph Store in SAP HANA SPS 12 : SAP HANA Database as a Graph Store - Introduction

Let us figure out the other possibilities to uncover fraud rings in Insurance-Fraud scenario discussed in the previous documents using Graph Algorithms in SAP HANA.

Shortest Path Algorithm:


This action provides the information for the shortest path from the starting vertex to all reachable vertices.

The action GET_SHORTEST_PATHS_ONE_TO_ALL returns the shortest paths from the provided start vertex to all reachable vertices in the graph also known as single-source shortest path (SSSP). The resulting shortest paths form a tree structure with the start vertex at the root.
All other vertices carry the shortest distance (smallest weight) information. The non-negative edge weights are read from the column provided in the edge table.

let us now create a calculation scenario with “Shortest Path” Algorithm to figure out the existence of connection from person(Tony) to all other vertices(cars, person, accidents) in the fraud-ring detected :


CREATE CALCULATION SCENARIO "INSURANCE_FRAUD"."SSSP_EXAMPLE" USING '
<?xml version="1.0"?>
<cubeSchema version="2" operation="createCalculationScenario" defaultLanguage="en">
<calculationScenario schema="INSURANCE_FRAUD" name="SSSP_EXAMPLE">
<calculationViews>
<graph name="sssp_node" defaultViewFlag="true" schema="INSURANCE_FRAUD" workspace="GRAPH" action="GET_SHORTEST_PATHS_ONE_TO_ALL">
<expression>
<![CDATA[{
"parameters": {
"startVertex": "Tony",
"outputWeightColumn": "DISTANCE"
}
}]]>
</expression>
<viewAttributes>
<viewAttribute name="NAME" datatype="string"/>
<viewAttribute name="DISTANCE" datatype="int"/>
</viewAttributes>
</graph>
</calculationViews>
</calculationScenario>
</cubeSchema>
' WITH PARAMETERS ('EXPOSE_NODE'=('sssp_node', 'SSSP_EXAMPLE'));

In this we define the name of the action(Graph Algo), Parameters such as the start vertex from which we must get the all the connected vertices without a depth limit with its shortest path.

It creates a calculation scenario with a single graph node using ‘Shortest-Path’ action on the graph workspace “INSURANCE_FRAUD”.”GRAPH”. This scenario traverses the underlying graph using all edges from start-vertex ‘Tony’ and returns all the connected vertices from the start-vertex with its shortest path.

Execute the below query on the above created Calc Scenario to  find the relationship of John/Person(Vertex) with all Other Vertex(Cars and Accidents)  in the fraud-ring:

SELECT * FROM "INSURANCE_FRAUD"."SSSP_EXAMPLE" ORDER BY "DISTANCE";


Isomorphic Subgraph Algorithm


This Action GET_ISOMORPHIC_SUBGRAPHS returns a projection of subgraphs within a given graph workspace that are isomorphic to a given subgraph pattern.

With the created Graph Database in the previous discussion, let us get the information of a subgraph that are isomorphic to a sub-graph pattern in the insurance-fraud ring.

To achieve this, let us create a calculation scenario with “Isomorphic Subgraph” Algorithm to figure out the matching isomorphic subgraph.

In this we define the name of the action(Graph Algo),a subgraph pattern containing a set of vertex variables, a set of edge variables, a condition-clause, a projection list, an order-by list, a limit and an offset.

Let us define a pattern of Person ----(Who Advocates) Person ----(* any relation)----car(which is involved)----(accident) as a parameter to the Isomorphic subgraph algorithm.

CREATE CALCULATION SCENARIO "INSURANCE_FRAUD"."PATTERN_EXAMPLE" USING '
<?xml version="1.0"?>
<cubeSchema version="2" operation="createCalculationScenario" defaultLanguage="en">
<calculationScenario schema="INSURANCE_FRAUD" name="PATTERN_EXAMPLE">
<calculationViews>
<graph name="get_iso_subgraph_node" defaultViewFlag="true" schema="INSURANCE_FRAUD" workspace="GRAPH" action="GET_ISOMORPHIC_SUBGRAPHS">
<expression>
<![CDATA[{
"parameters": {
"pattern": {"version": "01.00.00",
"subgraph" : {
"vertexVariables" : ["A","B","C","D"],
"edgeVariables":["E1","E2","E3"],
"projection":[
{
"variable" :"A",
"attribute":"NAME",
"alias":"ANAME"
                     },
                                          {
"variable" :"B",
"attribute":"NAME",
"alias":"BNAME"
},
                                          {
"variable" :"C",
"attribute":"NAME",
"alias":"CNAME"
},
{
"variable" :"D",
"attribute":"NAME",
"alias":"DNAME"
}
],
"condition":{
"type":"AND",
"arguments" : [
{
                     "type":"-->",
                               "arguments" : ["E1","A","B"]
},
{
"type":"-->",
                               "arguments" : ["E2","B","C"]
},
{
"type":"-->",
                               "arguments" : ["E3","C","D"]
},
{
"type":"=",
                               "arguments" : [
                                    {
                                    "type" : ".",
                                    "arguments" : ["E1","TYPE"]
                                    },
                                    {
                                    "type" : "literal",
                                    "datatype":"string",
                                    "value":"Advocates"
                                    }
                                    ]
}

]
}
}
                    
}
}
             }]]>
</expression>
<viewAttributes>
<viewAttribute name="ANAME" datatype="string"/>
<viewAttribute name="BNAME" datatype="string"/>
                         <viewAttribute name="CNAME" datatype="string"/>
                         <viewAttribute name="DNAME" datatype="string"/>
</viewAttributes>
</graph>
</calculationViews>
</calculationScenario>
</cubeSchema>'
WITH PARAMETERS ('EXPOSE_NODE'=('get_iso_subgraph_node','PATTERN_EXAMPLE'));

This pattern of parameter must provide the result of all the sub-graphs that starts from a person vertex who advocates another person which is extrapolated with all the relations from the former person to the cars that are involved in accidents in the fraud-ring.

Execute the below query on the above created Calc Scenario to  find the relationship of John/Jane Person(Vertex) with all Other Vertex(Cars and Accidents)  in the fraud-ring:

SELECT * FROM INSURANCE_FRAUD"."PATTERN_EXAMPLE";

Result set must provide all the sub-graph with a person who advocates(Tony in our case) to all other people(John,Jane) who has seek for his support and from there the graph is extended by the relationship of John and Jane to the Cars involved in accident as shown below :


Thus with Isomorphic Sub-Graph algorithm we are able to find all the required pattern of sub-graphs from the holistic graph set.

In our example we must be able to fetch the result all the sub-graphs that involved people who are advocated by Tony who are in-turn involved in some relationships(drivers, passengers, witnesses) to the cars that are involved in accident.Above result gives 6 such sub-graphs of pattern:
  • John advocated by Tony who drove Maruti Car involved in accident1.
  • John advocated by Tony who witnessed for BMW car involved in accident1.
  • Jane advocated by Tony who drove Audi car involved in accident2.
  • Jane advocated by Tony who drove Audi car involved in accident3.

And similar such patterned sub-graphs.

Above calculation scenarios for various Graph Algorithms can also be created using hdbcalculation view's Graph Node in DB Module of XS Advance.

Here by, we complete the discussion on understanding and working of HANA as a Graph Database along with the discussion on Graph Algorithms that can be used on the defined graph data set.

Source: scn.sap.com

No comments:

Post a Comment