Wednesday, 3 July 2019

SAP Hana Geospatial – Making things simple

SAP Hana customers often ask about an easy way to use Hana Geospatial capabilities to address their business scenarios related to simple location problems such as determining the “distance between two locations” or if “location X is accessible from location Y in less than 10 minutes by car”, or what is “minimum distance between location Z to location A”…

In this blog we are going to describe a real use case scenario proposed by a customer who needs to answer some questions involving geolocation used by their target oriented marketing campaigns to find the right clients for increasing sales.

We demonstrate how easy is to integrate spatial information with SAP Hana and query it effectively using the Hana spatial integrated engine. So far, their initial approach was to use trigonometric functions, remember your math classes, to compute the distance between locations using the “as crow flies” method.

Scenario description

Customer proposed a very simple case providing information about two entities:

◈ Stores: sample dataset of the shopping centers owned by the customer with textual address information (street, zip code, etc.) and location information (longitude, latitude)

◈ Clients: sample of their customers related information containing textual address information including also location (longitude, latitude)

SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials
SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials

With that information the goal was to answer the following questions:

1. What is the store closest to a particular customer?
2. Is the store located less than 1 kilometer?
3. Could a customer reach the store by walking within 5 minutes?
4. Could a customer reach the store by car within 10 minutes?
5. What is the number of stores located less than 20 kilometers away from the customer?

Currently for answering those they were using “manual” distance calculations approaches resulting in convoluted programs and difficult to integrate with regular SQL queries. In addition, the methods followed by the customer were unrealistic not taking into account street and road landscapes.

Data preparation


We must first generate the Hana spatial data that we need to answer the questions using the Hana Spatial predicates afterwards.

ST_POINTs

We start generating the ST_POINT columns for storing the coordinates for each store and customer entry. This can be done quickly using a couple of SQL commands for each table.

◈ Extend the existing Hana table adding a ST_POINT(4326) column initially containing NULL values: we use 4326 as Spatial Reference Identifier as we want to use spheroidal surface for Earth representation (same as the one used by GPS systems).

◈ Initialize the value using the provided (X,Y) (longitude, latitude) coordinates that each record already has (VL_X and VL_Y)

ALTER TABLE "GEO_DATA"."STORES" ADD (COORDINATES ST_POINT(4326) NULL);
UPDATE "GEO_DATA"."STORES" SET COORDINATES = NEW ST_POINT(VL_X, VL_Y);

Same for CLIENTS table:

ALTER TABLE "GEO_DATA"."CLIENTS" ADD (COORDINATES ST_POINT(4326) NULL);
UPDATE "GEO_DATA"."CLIENTS" SET COORDINATES = NEW ST_POINT(VL_X, VL_Y);

ISOCHRONES


The second step is to track the reachability information for the stores, in other words, we need to represent the areas that are accessible, for example, by the following conditions:

◈ 5 minutes by walking
◈ 10 minutes by walking
◈ 15 minutes by walking
◈ 5 minutes by car
◈ 10 minutes by car
◈ 15 minutes by car

For this we used the isochrone concept getting for each case a polygon representing the area that can be reached from the store position:

SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials

Those polygons can be obtained, for instance, using openrouteservice.org via the “Isochrones” service. For our exercise a very simple Jupyter python notebook was created for calling the free openrouteservice API and to store the provided polygons into the Stores table.

We added then 6 new columns of type ST_GEOMETRY(4326), a polygon in the end,  to the Stores table, to persist each reachable area.

ALTER TABLE "GEO_DATA"."STORES" ADD(ISOCRON5MW  ST_GEOMETRY(4326) NULL);
ALTER TABLE "GEO_DATA"."STORES" ADD(ISOCRON10MW ST_GEOMETRY(4326) NULL);
ALTER TABLE "GEO_DATA"."STORES" ADD(ISOCRON15MW ST_GEOMETRY(4326) NULL);
ALTER TABLE "GEO_DATA"."STORES" ADD(ISOCRON5MC  ST_GEOMETRY(4326) NULL);
ALTER TABLE "GEO_DATA"."STORES" ADD(ISOCRON10MC ST_GEOMETRY(4326) NULL);
ALTER TABLE "GEO_DATA"."STORES" ADD(ISOCRON15MC ST_GEOMETRY(4326) NULL);

For each store we get the 6 wanted polygons and update them in a single SQL UPDATE statement. This is a snippet of the python code used in the Jupyter notebook:

◈ It connects to Hana to get the coordinates.
◈ It connects to openrouteservice.org and gets the different isochrones: “driving-car”, “foot-walking”, etc.
◈ Finally, it connects to Hana and updates the table with the polygons for the store.
   
    from hdbcli import dbapi
    from openrouteservice import client
    import json
    
    api_key = 'Token obtained from openroute'
    clior = client.Client(key=api_key)
    
    con = dbapi.connect(address='HANA host', port=<port number>,
                        user='user name', password='password')
    cur = con.cursor()
    cur.execute("""SELECT ID, VL_X, VL_Y FROM "GEO_DATA"."STORES" """)
    
    res = cur.fetchall()
    
    profiles = ['driving-car', 'foot-walking']

    #5, 10 and 15 minutes (converted into seconds)
    ranges = [300, 600, 900]
    polygons = []

    for store in res:
        # Get isochrones for each row
        # Update corresponding columns
        c_x = store[1]
        c_y = store[2]
        polygons = [] 
        for profile in profiles:
            for r in ranges:
                iso_result = clior.isochrones(locations=[(c_x, c_y)],
                                              profile=profile,
                                              range=[r],
                                              dry_run = False)
                geom = iso_result['features'][0]['geometry']
                polygons.append(json.dumps(geom))
                
        update_sql = """UPDATE "GEO_DATA"."STORES"
                        SET ISOCRON5MC  = ST_GEOMFROMGEOJSON(?, 4326), 
                            ISOCRON10MC = ST_GEOMFROMGEOJSON(?, 4326), 
                            ISOCRON15MC = ST_GEOMFROMGEOJSON(?, 4326),
                            ISOCRON5MW  = ST_GEOMFROMGEOJSON(?, 4326),
                            ISOCRON10MW = ST_GEOMFROMGEOJSON(?, 4326),
                            ISOCRON15MW = ST_GEOMFROMGEOJSON(?, 4326)
                        WHERE ID = %d """ %(store[0])
        cur2 = con.cursor()
        cur2.execute(update_sql, polygons)

The polygons from openrouteservice are returned as a Polygon object with its list of coordinates (one for each point) in GeoJSON format:

{'coordinates': [[[-0.446451, 39.476014],
   [-0.445327, 39.475588],
   [-0.444924, 39.475452],
   [-0.438826, 39.473127],
   [-0.433999, 39.4713],
   [-0.42627, 39.466775],
   [-0.421565, 39.463238],
   [-0.416286, 39.460046],
   ...
   [-0.446451, 39.476014]]],
 'type': 'Polygon'}

For storing them into the ST_GEOMETRY column in Hana we can use ST_GeomFromGeoJSON() method directly as shown in the python program:

        update_sql = """UPDATE "GEO_DATA"."STORES"
                        SET ISOCRON5MC  = ST_GEOMFROMGEOJSON(?, 4326), 
                            ISOCRON10MC = ST_GEOMFROMGEOJSON(?, 4326), 
                            ISOCRON15MC = ST_GEOMFROMGEOJSON(?, 4326),
                            ISOCRON5MW  = ST_GEOMFROMGEOJSON(?, 4326),
                            ISOCRON10MW = ST_GEOMFROMGEOJSON(?, 4326),
                            ISOCRON15MW = ST_GEOMFROMGEOJSON(?, 4326)
                        WHERE ID = %d """ %(store[0])

For a graphical representation of the reachable areas in the Jupyter notebook we decided to use folium python library:

◈ It draws the maps using openstreetmap as maps provider.
◈ “Markers” and “Polygons” can be added quite easily.

Let’s see an example with the Stores table and one of the reachable polygons (for example the one for “5 minutes by car”)

We retrieve the location using directly the VL_X and VL_Y fields and for the polygon we could use the method to retrieve the Hana spatial object in GeoJSON text format:

cur.execute("""SELECT ISOCRON5MC.ST_AsGeoJSON(),
                       VL_X,
                       VL_Y,
                       NAME FROM STORES
                WHERE ID = 30111 """)
res = cur.fetchall()

Showing the store location is straight forward (we need to be careful as folium expect the coordinates in reverse order, first latitude then longitude)

import folium
m = folium.Map(location=[res[0][2], res[0][1]], zoom_start=14)
folium.Marker([res[0][2], res[0][1]]).add_to(m)
m

SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials

Now to draw the area for the isochrone we just add the polygon to the same map object. But a small preparation step is needed to extract the coordinates for the points from the GeoJSON format and again reverse the values from (longitude,latitude) to (latitude,longitude)

The GeoJSON text output looks like the following:

'{"type": "Polygon", "coordinates": [[[-4.634976, 36.546284], [-4.634647, 36.544776], [-4.633848, 36.543983], [-4.628487, 36.534139], [-4.62702, 36.532061], [-4.622769, 36.530737], [-4.617062, 36.536948], [-4.616745, 36.540534], ..., [-4.634976, 36.546284]]]}'

With a few python lines we can change this to the expected format required by folium:

import json

list_coords = json.loads(res[0][0])
list_coords = list_coords['coordinates'][0]
list_coords = [(y,x) for x,y in list_coords]
list_coords
[[-4.634976, 36.546284], [-4.634647, 36.544776], [-4.633848, 36.543983], [-4.628487, 36.534139], [-4.62702, 36.532061], [-4.622769, 36.530737], [-4.617062, 36.536948], [-4.616745, 36.540534], ..., [-4.634976, 36.546284]] 

Just adding that to the folium map object makes the isochrone polygon visible:

folium.vector_layers.Polygon(list_coords,
                             color='#ffd600',
                             fill_color='#ffd600',
                             fill_opacity=0.5,
                             weight=3).add_to(m)
m

SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials

Answering the questions


Now that all the data is ready we can answer the proposed questions with very simple SQL queries by using Hana geospatial predicates.

For the exercise, we picked up randomly one of the customers from the Clients table, for example, the one with ID = 475903

1. Which is the store closest to the customer?

– The store with ID = 30216

2. Is the store located less than 1 km?

– Yes.

The answers can be easily derived just using the MIN() SQL aggregate function with the ST_Distance() spatial method joining both tables for getting the distance for the closest store:

SELECT MIN(stores.COORDINATES.ST_DISTANCE(clients.COORDINATES, 'kilometer'))
FROM "GEO_DATA"."STORES" stores,
     "GEO_DATA"."CLIENTS" clients
WHERE clients."ID" = 475903

| MIN(STORES.COORDI                                                     |
| --------------------------------------------------------------------- |
|                                                     0.874822458959487 |
1 row selected (overall time 56.818 msec; server time 31.743 msec)

So the closest store is located at 0.87 km (so less than 1 km) from the customer.

For obtaining the store details like the ID or even the position to plot it in the map a query like the following can be used:

SELECT stores."ID",
       Stores."VL_X",
       stores."VL_Y" 
FROM "GEO_DATA"."STORES" stores, 
     (SELECT stores."ID" "ID",
             MIN(stores.COORDINATES.ST_DISTANCE(clients.COORDINATES, 'kilometer'))
      FROM
          "GEO_DATA"."STORES" stores,
          "GEO_DATA"."CLIENTS" clients
      WHERE clients."ID" = 475903
      GROUP BY stores."ID"
      ORDER BY 2
      LIMIT 1
     ) closest_stores
WHERE stores."ID" = closest_stores."ID"

| ID          | VL_X                    | VL_Y                    |
| ----------- | ----------------------- | ----------------------- |
|       30216 |               -3.656899 |       40.54659199999999 |
1 row selected (overall time 277.303 msec; server time 198.717 msec)

Adding the location as a marker to the folium map shows both client and store together:

folium.Marker([res[0][5], res[0][4]],
              popup="<i>"+ res[0][1] + "</i>", tooltip="Closest Store",
              icon=folium.Icon(color='green')).add_to(m)

SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials

3. Could a customer reach the store by walking within 5 minutes?

– No.

4. Could a customer reach the store by car within 10 minutes?

– Yes.

For answering those, the isochrones are needed, and we can use the Hana spatial predicate “covered by” (ST_CoveredBy):

SELECT
    clients.COORDINATES.ST_CoveredBy(stores.ISOCRON5MW) 
        as "5 minutes walking",
    clients.COORDINATES.ST_CoveredBy(stores.ISOCRON10MW)
        as "10 minutes walking",
    clients.COORDINATES.ST_CoveredBy(stores.ISOCRON15MW)
        as "15 minutes walking",
    clients.COORDINATES.ST_CoveredBy(stores.ISOCRON5MC)
        as "5 minutes car",
    clients.COORDINATES.ST_CoveredBy(stores.ISOCRON10MC)
        as "10 minutes car",
    clients.COORDINATES.ST_CoveredBy(stores.ISOCRON15MC)
       as "15 minutes car"
FROM "GEO_DATA"."STORES" stores, "GEO_DATA"."CLIENTS" clients
WHERE
    clients."ID" = 475903 AND
    stores."ID" = 30216

| 5 minutes walking  | 10 minutes walking  | 15 minutes walking  | 5 minutes car  | 10 minutes car  | 15 minutes car  |
| ------------------ | ------------------- | ------------------- | -------------- | --------------- | --------------- |
|                  0 |                   0 |                   1 |              1 |               1 |               1 |
1 row selected (overall time 15.151 msec; server time 4983 usec)

In this case we see that the customer can reach the store by car (in 5,10 and 15 minutes) and walking in 15 minutes.

Adding in the folium map both locations (customer and store) and polygons for “15 minutes walking” and “15 minutes by car” shows the following:

SAP HANA Geospatial, SAP HANA Study Materials, SAP HANA Certifications, SAP HANA Tutorial and Materials

5. How many stores are located less than 20 kilometers away from the customer?

– 43

The last question is again addressed by a simple query combining the SQL COUNT() aggregate and the distance spatial predicate (ST_Distance)

SELECT COUNT(*)
FROM
    "GEO_DATA"."STORES" stores,
    "GEO_DATA"."CLIENTS" clients
WHERE
    clients."ID" = 475903 AND
    stores.COORDINATES.ST_DISTANCE(clients.COORDINATES, 'kilometer') < 20
| COUNT(*)             |
| -------------------- |
|                   43 |
1 row selected (overall time 52.003 msec; server time 25.774 msec)

No comments:

Post a Comment