Example Use Case for the Document Store
Overview
In part one of this blog series the Document Store was introduced. The most interesting part about this is the possibility to interact with the relational world inside SAP HANA, e.g. joining a collection with a table. Since this is a very abstract feature, the following sections will introduce an example use case including SQL queries for a better and deeper understanding of the SAP HANA JSON Document Store.
Introduction of the Use Case
A web shop for a various kind of products uses SAP HANA as its database. The data is being stored in a simplified data model. This rudimentary model helps to focus on the key points that this blog wants to describe.
◈ The Customer table stores all information about the customers of the shop like the name and the address for delivery purposes.
◈ Orders are stored in the Order table with a zero to many relationship to the customer table. Since the web shop only needs information about the creation date of an order, only this besides an identifier and the affected customer is saved.
◈ Details of an order like the bought products and their amount are stored in OrderPosition table. Each order position belongs to one order and each order needs at least one order position.
◈ Information about the offered products of the web shop are stored in the Products table with information like their name and their price.
Scenario
The web shop becomes now more popular and wants to improve the user experience on its website. For that, additional information about the products like artist and genre for CD’s or the number of pages and the edition of books should be shown. Besides the simple showing of additional fields, this should also be included into the search functionality, so that customers can search for artists or genre.
Solution 1: Use relational possibilities
Without using any additional SAP HANA non-relational technologies or secondary databases, the following ways are possible to store additional data in a searchable way.
Single Table Inheritance
To enable the database structure to store additional attributes, the single table inheritance pattern could be used. Each new product attribute is stored as a new column in the Product table. This pattern leads to many NULL values and denormalization, because with any new product and its new attributes, the table gets more complex and confusing.
Class Table Inheritance
The class table inheritance pattern will help to get rid of the NULL values and denormalization. For each kind of product, a new table is created with all the needed attributes.
This solution leads to many tables, especially when some attributes only needs to be saved for a subgroup of products categories. In this case, a new hierarchy level is needed. In addition, the application needs to know in which table the concrete product information is stored. If this information isn’t directly available (e.g. by context), it could be stored as an additional attribute in the base table or by looking which table contains the id.
Beside the class table inheritance, the concrete table inheritance is also usable. These two approaches differ in the existence of a base class; the concrete inheritance doesn’t use one, so all common attributes are stored in the product tables.
Store as an additional column
As a last option, the data can be stored as one string or LOB inside a separate column in the products table. The data could be formatted as simple key-value pairs, JSON or XML. To extract certain field values, e.g. SAP HANA fulltext search or JSON functions could be used, but without using supportive tools the information is mostly hidden.
Solution 2: Use the SAP HANA JSON Document Store
The relational data model solutions for this task are limited, sometimes complex and not flexible at all. The usage of SAP HANA JSON Document Store will help to solve the disadvantages in an easy and flexible way.
For that the Product table is replaced with a collection. Each document in the table should have the same fields as the table and all other attributes optionally.
{ "id": 469,
"price": 69.05,
"name": "Product 469",
"author": "Muriel Towe",
"pages": 759,
"edition": 4
}
{ "id": 501,
"price": 91.18,
"name": "Product 501",
"artist": "Emile Gain",
"albumname": "AlbumName 1",
"genre": "Country"
}
Alternatively, to the usage of only one collection, multiple collections can be used. Moreover, the Product table can be used as the parent table and only the additional fields are stored inside a collection.
Since the Document Store is a document-oriented database it has no fixed structure which allows dynamic adding or removing of fields for individual documents. This flexibility applies also to the “mandatory” fields because the Document Store can’t force the existence of fields. In that case the application has to make sure that the fields exist.
SAP HANA allows joins between collections and tables, so nothing else needs to be changed to the web shop to offer more and more kind of products. The major advantages of this solution are the easy development and fast deployment combined with the probability to change the data model in a more flexible way. Because the Document Store is an integral part of SAP HANA, no secondary database is needed. If JSON is used by the application, the documents could be used from the Document Store without additional formatting or parsing.
Use Case Example Queries
In this section example queries are presented which might be used by the web shop with focus on the special functionalities of the Document Store.
Get the favorite music genre of a customer
The following queries will return the most bought music genre of the customer with the id 2.
With relational possibilities (Class Table Inheritance):
SELECT TOP 1 CD.Genre, COUNT(*) AS "COUNT" FROM WEBSHOP."Customer" C
JOIN WEBSHOP."Order" O On O.CustomerID = C.ID
JOIN WEBSHOP."OrderPosition" OP ON OP.OrderID = O.ID
JOIN WEBSHOP."CD" CD ON CD.ID = OP.ProductID
WHERE C.ID = '2'
GROUP BY CD.Genre
ORDER BY COUNT DESC;
With the Document Store:
WITH productView AS (SELECT "id", "genre" FROM WEBSHOP."ProductCollection" WHERE "genre" IS SET)
SELECT TOP 1 P."genre", COUNT(*) AS "COUNT" FROM WEBSHOP."Customer" C
JOIN WEBSHOP."Order" O On O.CustomerID = C.ID
JOIN WEBSHOP."OrderPosition" OP ON OP.OrderID = O.ID
JOIN productView P ON P."id" = OP.ProductID
WHERE C.ID = '2'
GROUP BY P."genre"
ORDER BY COUNT DESC;
These two statements producing the same result but have a small difference in their SQL. The second statement must use the WITH-SELECT statement to define the fields of the documents selected by the query. The additional WHERE condition in the WITH clause filters all documents without the field genre.
Add and remove fields from documents
The UPDATE statement allows to add and remove fields to the documents. For example, a new attribute called authorDetails for a certain author (in this case Ollie Nesey) needs to be added to all documents which are containing this author:
UPDATE WEBSHOP."ProductCollection"
SET "authorDetails" = 'This author is special'
WHERE "author" = 'Ollie Nesey';
To remove this field again, UNSET can be used.
UPDATE WEBSHOP."ProductCollection" UNSET "authorDetails";
Return JSON in a SELECT
SELECT can be used to return individual fields, as shown above. Using SELECT *, the entire document is being returned as a string. Moreover, it is possible to define the structure of JSON documents to be returned:
SELECT {
"productID": "id",
"articleName": "name",
"articlePrice": "price",
"articleType": 'book'}
FROM WEBSHOP."ProductCollection"
WHERE "pages" IS SET;
This returns JSON values like:
{ "productID": 469,
"articleName": "Product 469",
"articlePrice": 69.05,
"articleType": "book"
}
Performance Recommendation
When it’s needed to use tables and the Document Store together, it is highly recommended to use precondition filters on the used collection as much as possible, as that will increase the overall performance and decrease the memory consumption as well.
No comments:
Post a Comment