Motivation
Global warming is affecting the temperature, precipitation levels, and soil moisture of our forest, turning them into kindling during wildfire season. To exemplify the severity of the problem, the 2018 wildfire season in the State of California USA was the most destructive and deadliest recorded thus far, with over 7500 fires burning an area of over 1,670,000 acres.
The economic impact amounted to several billion dollars, thousands of buildings destroyed, hundreds of fatalities and tens of persons injured. These reports do not even consider additional cost to the states and the federal government in fire-suppression management.
When weather conditions such as low relative humidity, strong wind, and high temperatures raise the probability of wildfires to start, authorities issue a mandatory evacuation notice and the area must be evacuated immediately.
Depending upon the location of the fire, winds, topography, vegetation, etc., officials will determine the areas to be evacuated, escape routes to use and temporary assembly areas to await transfer to a safe location.
With the vision to reduce the threat or loss of human life, real and personal property, infrastructure, and damage to the natural resources and watersheds caused by wildfires we use SAP HANA Spatial and Graph to identify optimal evacuation routes as described below.
Figure 1. Area of interest near Chapala lake, Mexico. The insert depicts four temporary assembly locations shown as dots in red. Routes will be computed from a point of origin to one of these points.
As an example, we picked a region near the lake of Chapala, Mexico as shown in Figure 1. This is an area with hazardous natural topography with shifting wind conditions that is prone to wildfires. Given four shelter points (red dots in the figure), we identified evacuation zones and calculated least cost paths in an evacuation scenario. The path costs should reflect their level of safety and are based on elevation, soil type, vegetation and avoiding water bodies, areas with steep slope, fire, etc. These are the core processing steps:
1. Convert a “Digital Elevation Model” (DEM) encoded as raster data (a .tif image) into vector data (point data in a flat file).
2. The vectorized DEM model is loaded into HANA and spatial clustering is used to aggregate all relevant cost measures along discrete cells – in our case hexagons/squares. Furthermore, no-go areas are identified such as bodies of water and fire areas.
3. Given the hexagon/square tessellation, we create a network where each hexagon/square cell is connected to its neighbors.
4. On this network, we use HANA Graph to calculate all shortest paths from a start point (e.g. evacuation point) to understand “isolines“, i.e. areas which can be reached with the same budget/cost, or catchment areas for the shelter points of Figure 1 (red dots).
5. Similarly, we use a shortest path to identify the least-cost path given a complex cost function.
Conversion of a “Digital Elevation Model” (DEM) raster to vector
Vegetation, soil moisture, land cover classification, and global elevation data is available from public sources. The DEM files contain either points (vector) or pixels (raster), with each point or pixel having an elevation value. In this example the DEM was downloaded from U.S. Geological Survey (USGS).
(a) Elevation.
(b) Vegetation.
Figure 2. Elevation data and vegetation cover.
Elevation and vegetation cover information is encoded in .tif image files where the measured data are stored in so-called bands. In the picture above (Figure 2) the measures are visualized using color coding: areas in red have high elevation/vegetation, areas in blue correspond to low elevation/vegetation. In order to work with this data in HANA, we need to vectorize: each pixel is extracted from the image along with its LAT/LON coordinates and the elevation/vegetation. The raster images had a resolution of 1440*1680 pixels rendering a table with approximately 2.5 million records, consuming 25MB of memory.
There are many tools and libraries that provide raster to vector conversion. The generated vector data can be uploaded as flat file into a HANA table which looks like this:
Figure 3. Linear combination of slope, vegetation cover, bodies of water and fire with a resolution of 1440*1680 pixels. Red areas correspond to no-go areas, either areas of fire, dense vegetation or water bodies. The blue areas correspond to flat areas and sparse vegetation. The red dots are the assembly points.
Go/no-go areas are determined by a combination of slope, vegetation cover, bodies of water and location of the fire as shown in Figure 3. The column “water” in the table refers to the bodies of water such as lakes or rivers. By default, bodies of water are not be used for evacuation. It is also important to consider buffer areas around the no-go zones. Slope, vegetation cover, and type of transportation play an important role in determining proper buffer size around no-go areas. For example, considering that the width of the path is finite, it might allow one person to travel but not a group of 100 at once, or if the vehicle is wide or narrow, heavy or light it might be able to go or not through heavy vegetation or high-slope terrain. In the next sections we will show how to create buffers around the no-go zones.
Spatial clustering and buffer calculation
SAP HANA supports spatial clustering in multiple flavors, using (rectangular) grids, k-Means, DBSCAN, and hexagon clustering. Below we give two examples of how clustering is used starting with hexagons. The simplest way to use hexagon clustering is to just define the number of hexagons you would like to have on one axis.
SELECT ST_CLUSTERID() AS “ID”, ST_CLUSTERCELL() AS “CLUSTERCELL_SHAPE”, COUNT(*) AS “COU”, AVG(“MOISTURE”) AS “AVG_MOIST_VAL”
FROM “DEM_MEX”
GROUP CLUSTER BY “PIXEL”
USING HEXAGON X CELLS 300
An example of clustering using rectangular grids is:
SELECT ST_CLUSTERID() AS “ID”, ST_CLUSTERENVELOPE() AS “SHAPE”, COUNT(*) AS “COU”, AVG(“VEGETATION”) AS “AVG_ VEGETATION _VAL”
FROM “DEM_MEX”
GROUP CLUSTER BY “SHAPE”
USING GRID X CELLS 300 Y CELLS 300
The result in the rectangular case is a 300*300 grid of squares. All grid cells carry a count and an average measure value. Figure 4 below was obtained using QGIS, an open source Geographic Information System. It shows the clustered vegetation cover – blue is low value, red is high.
Figure 4. Vegetation cover with low levels of vegetation cover corresponding to blue and high density of vegetation to read. The insert shows a detail of the hexagonal cluster shape.
(a) No-go polygons.
(b) No-go polygons + buffer.
Figure 5. The picture on the left correspond to the no-go areas. The picture on the right depicts polygons representing the no-go areas plus a buffer around.
The calculated average values of the cluster cells were used to identify no-go areas as shown in Figure 5 in red color. In the figure, a threshold between 0 and 1 is used, with 0 corresponding to flat areas with no vegetation and no fire hazard, whereas 1 correspond to either fire, body of water or a combination of slope/vegetation that makes impossible to cross through that area. Assuming that we have computed the no-go areas, we can now compute a buffer around the set of points by first computing the alphashape followed by a buffer as shown in the following queries:
SELECT ST_AlphaShapeAggr(CENTROID, 1000) AS “ALPHA_SHAPE” FROM “GRID_CLUSTERS_MEX”;
SELECT SHAPE.ST_Buffer(500) AS “ALPHA_SHAPE_BUFFER” FROM “ALPHASHAPE”;
Once we have these buffer polygons we can identify and mark the cluster cells that lie within. An example of this query is
SELECT a.“ID”, “CENTROID” FROM “GRID_CLUSTERS_MEX” a , ALPHASHAPE where “CENTROID”.ST_Within(“ALPHA_SHAPE_BUFFER”) = 1;
Create a network (graph)
Path finding is a well-known problem in network analysis. For this, we will use the HANA Graph engine – we just have to expose our data as a network, i.e. a set of nodes and edges. In our case, the nodes are the cluster cells, the edges are connections from the hexagon/rectangle (SOURCE) to its direct neighbors (TARGET). For most of the hexagons/squares there are obviously 6/8 neighbors. Now that we have the connections, we need to calculate the measures which are used in our cost function. The things we calculate are:
◉ Average slope of the SOURCE and TARGET.
◉ Average vegetation cover of the SOURCE and TARGET.
◉ Line (the line-string geometry that connects the centroid of the SOURCE to the centroid of the TARGET)
◉ Length of line (the physical distance between the centroids)
◉ Visibility of the connections (is the edge intersection a no-go area – in our case there is a lake which we cannot cross or fire hazard)
The cost function is implemented as a SQL view on top of the connection data. This is quite flexible as we can use the full breadth of SQL functionality. The cost function may look like this:
0.5 * “LINE_LENGTH” * (“SLOPE” + 0.01*“VEGETATION” + “FIRE”) AS “COST” where “LINE_LENGTH” is the distance between nodes, i.e. horizontal, vertical and at 45 degrees. The SQL view also contains restrictions:
WHERE “NOGO” = 0 AND “COST” > 0 AND < “COST” < 1.1, so we exclude certain areas from the network: no-go areas (a lake) are avoided, as well as areas above/below a certain go-no-go value based on slope, vegetation cover and fire hazard.
Isolines
The “isolines” are generated using SHORTEST_PATHS_ONE_TO_ALL in GraphScript. GraphScript is the programming model of the SAP HANA Graph engine for server-side procedures. In essence, this is one line of code:
GRAPH g_spoa = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, “CALC_COST”, (Edge e) => DOUBLE{ RETURN :e.“COST”; })
where :g is out network, :v_start is the start point, and “COST” is the result from our cost function which is used to calculate path costs. In our example, we call this function multiple times – for four different start hexagons, i.e. evacuation points). The result of the shortest paths calculation is
◉ each hexagon/rectangle T in our area is assigned to its nearest start/evacuation point S
◉ for each T we know the cost to travel from the nearest S to T
◉ for each T we know the incoming edge that is on the shortest path from S to T, so we can backtrack for each T
In Figure 6 below we see evacuation zones for four evacuation points (red dots). The red areas are no-go zones – there is a lake and fire.
Least cost paths
Similar to the isolines calculation, we use the SHORTEST_PATH function in GraphScript to calculate an actual path from A to B. Again, this is just a few lines of code, at its core there is:
WEIGHTEDPATH<DOUBLE> p = SHORTEST_PATH (:g, :v_start, :v_target, (Edge e) => DOUBLE{ RETURN :e.“COST”; });
The result is
◉ set of nodes that make up the path
◉ set of edges which contain the number of “hops” in our path, the accumulated costs, and other measures that can be aggregated – MIN/MAX/AVG elevation
The black dotted path in Figure 6 below avoids no-go areas (lake and fire) and minimizes cost according to our defined cost function. It is obviously not the shortest path, but a path which minimizes cost (slope, vegetation cover, etc.).
Figure 6. The picture shows the evacuation areas for four evacuation points (red dots). The red areas are no-go zones. The black dots indicate a least cost path between two points.
No comments:
Post a Comment