Wednesday, 24 November 2021

Linking Latitude and Longitude to Countries with SAP HANA Cloud

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

The actual solution of the question above is super-easy and comes down to a single select statement using a spatial join. So, the major part of this explanation will be about data import of the following files:

1. We need a dataset with global latitude and longitude data. On Kaggle there is a nice data set with Global Power Plants. This dataset already contains a country column, which we will of course ignore (otherwise the question above does not make much sense). Instead of this, you can also use your own dataset, where the problem statement makes more sense.

2. To determine to which country a coordinate belongs, we need to have an idea of the extent of each country. The easiest way is to upload a shapefile with global country borders to our SAP HANA Cloud database.

Import a CSV file with Latitude and Longitude Values

The first part is not even related to the initial question, since you would typically already have some lat/lon data in your database when asking this question. Still, I think it’s a good idea to make the upload part of the description, so that this blogs works as a self-contained end-to-end example.

Please skip this part in case you want to leverage your own lat/lon dataset.

1. Download and unzip the Global Power Plants dataset from Kaggle (requires a free account).

2. Open your SAP HANA Cloud instance in Database Explorer

3. Right-click your instance and choose ‘Import Data’

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

4. Choose ‘Import Data’ again

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

5. Select the downloaded CSV file

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

6. Choose your import target

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

7. Adjust the type mapping. Make sure to change Capacity, Latitude and Longitude to Double and increase the length of the text types (e.g. to 500).

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

8. Start the import and wait for the records to be imported into table ‘PLANTS’.

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

You wouldn’t believe it, but this was the most complex part! From now on, things will get easier.

Import a Shapefile with Global Country Borders


Next, we need to import the global country borders. The easiest way to do this is by using a standard exchange format for spatial data – the so called Shapefile.

1. Download the shapefile representation of the World Administrative Boundaries.
2. Do not unzip this file. The zip format is just fine for import.
3. As before, open Database Explorer, right-click your instance and choose ‘Import Data’.
4. This time select ‘Import Esri Shapefiles’

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

5. Select the downloaded zip file

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

6. Select the target schema for the import. I am using the same target as for the CSV import.

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

7. As a Spatial Reference Identifier choose “WGS 84 (planar)” with SRID 1000004326.

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

8. Start the import and wait for the records to be imported into table ‘world-administrative-boundaries’.
We have now imported all the data for our example.

Determine the Country for each Power Plant


Again, our power plant dataset already has the country information included. We will ignore this information and add the country by the knowledge of latitude and longitude information. Of course, we can use the country information of the dataset itself to double-check our results for correctness.

To match each lat/lon point with its country, we need to execute two spatial operations:

1. Create a point geometry from lat/lon information. This can be done using SQL function ST_GeomFromText. The minimal usage example looks like the following:

SELECT ST_GeomFromText('POINT(8.642057 49.293432)', 1000004326) FROM DUMMY;​

2. We do a spatial join by predicate ST_Intersects to match two entries in the respective table in case the point is contained within the polygon of the country.

Knowing these two spatial functions, we can add the country information from table ‘world-administrative-boundaries’ to our data table by executing this SQL query:

SELECT wab."iso3", wab."name", p.*
FROM PLANTS p 
LEFT JOIN "world-administrative-boundaries" wab 
ON ST_GeomFromText('POINT(' || p."Longitude" || ' ' || p."Latitude" || ')', 1000004326).ST_Intersects(wab.SHAPE) = 1

As a result we added columns “iso3” (the ISO code of the country) as well as the standardized “name” to our result set. We can also see that the information matches the datasets country in most cases. Exception can be found in the bordering regions of countries, where the level of detail of the used world border are probably not sufficient.

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

This solved our initial question. So everyone, who is short on time, can safely stop reading.

⚠️ In this example we are generating the geometries on the fly instead of using the built-in column type ST_Geometry. Thus SAP HANA cannot make use of spatial indices, which results in decreased performance. I strongly recommend to use persistent geometries with type ST_Geometry for productive scenarios.

Add Additional Insights By Spatial Aggregation


Now, that we already have the data imported and sufficiently answered our initial question, we may as well add a little analysis on top. The following view helps us identify the countries with the most capacity by power plants. Note, that we aggregate over the spatial dimension.

CREATE OR REPLACE VIEW CAPA_PER_COUNTRY AS (
SELECT wab."iso3", wab."name", wab.SHAPE, SUM(p."Capacity (MW)") AS CAPACITY
FROM PLANTS p 
JOIN "world-administrative-boundaries" wab 
ON ST_GeomFromText('POINT(' || p."Longitude" || ' ' || p."Latitude" || ')', 1000004326).ST_Intersects(wab.SHAPE) = 1
GROUP BY wab."iso3", wab."name", wab.SHAPE
)

The result doesn’t come as a total surprise: The biggest countries are the ones with the highest capacity.

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

To make the results more interesting, we could generate the same report, but this time ordered by capacity per country size. To evaluate the size of a country, we can use spatial function ST_Area.

SELECT TOP 10 * 
FROM CAPA_PER_COUNTRY 
ORDER BY CAPACITY/SHAPE.ST_AREA() DESC

The result of this analysis looks significantly different.

SAP HANA Exam, SAP HANA Tutorial and Material, SAP HANA Certification, SAP HANA Learning, SAP HANA Career, SAP HANA Jobs

We could have easily achieved the top list by capacity by just using the Global Power Plants dataset and without doing our spatial join. However, the last analysis (capacity per area) can only be done knowing the actual administrative boundaries.

Source: sap.com

No comments:

Post a Comment