One of these is SAP HANA SQLScript, which is used to develop high-performance stored procedures for the SAP HANA in-memory database. Unfortunately, SAP did not provide any static code analysis for SQLScript (in contrast to SAP Code Inspector for ABAP). Moreover, there are no precise guidelines how to develop good SQLScript code so far. In this post I'll present our initial thoughts on assessing the code quality of SQLScript.
The starting point to identify relevant static checks was the SAP HANA SQLScript Reference, which already mentions some (very general) best practices (chapter 13). Some of the recommendations there are very easy to detect automatically, e.g.
The first rule in the SAP best practices is more interesting: Reduce complexity of SQL statements. Here, the recommendation is to extract sub-queries from complex queries and assign these to table variables. Sub-queries are SELECTs which occur within an outer SQL statement. These can be used in most places where a table name could be used, especially after the FROM clauses. Since sub-queries can be as complex as any SELECT statement and may contain sub-queries again, the heavy use of sub-queries may result in code that is unreadable.
Here is an example of nested queries (adopted from the SAP HANA SQLSCript Reference):
SELECT * FROM
(
SELECT publisher, COUNT (*) AS cnt FROM :books
GROUP BY publisher
)
WHERE cnt >=
(
SELECT MAX (cnt) FROM
(
SELECT publisher, COUNT (*) AS cnt FROM :books
GROUP BY publisher
)
);
This can be improved by extracting the sub-query (note: the same sub-query is used twice here).
books_per_publisher =
SELECT publisher, COUNT (*) AS cnt FROM :books
GROUP BY publisher;
SELECT * FROM :books_per_publisher
WHERE cnt >=
(
SELECT MAX (cnt) FROM :books_per_publisher
);
The extraction of sub-queries will result in more comprehensive programs: Using descriptive names for the table variables increases the readability of the code a lot, since it documents the intention of the sub-query. Thus, it would be good to have automatic checks to avoid such complex SQL statements. To cover this demand, we have derived checks to identify
Non-trivial Sub-queries
Since there is hardly any benefit to extract very simple SELECT sub-queries (e.g. selecting rows from a table only), a check which reports any sub-query is not very useful in practice. Only non-trivial sub-queries should be regarded as finding. Thus it is required to distinguish between trivial and non-trivial sub-queries. For now we applied the following definition: Non-trivial queries are either nested or contain any of the following clauses: WHERE, JOIN, GROUP BY, HAVING or UNION. Otherwise the sub-query is regraded as trivial.
Deeply Nested Sub-queries
Sub-queries may recursively contain sub-queries, which make the code even harder to understand. Thus I've added a separate check to identify deeply nested sub-queries. The nesting depth level is configurable. In contrast to the check for non-trivial queries (which already covers nested queries) this checks allows explicitly identifying deeply nested queries, which can be reported with higher severity (e.g. red instead of yellow).
Sub-queries without parentheses
A third check came to life as a side-effect during implementation of the complex sub-queries checks, since it backups the detection of sub-queries: The check searches for sub-queries, which are not enclosed in parentheses. Usually all sub-queries should be enclosed in parentheses, otherwise it is hard to understand which clauses belong to the sub-query and which to the surrounding query. This check reports missing parentheses, but it can be suppressed for specific SQL statements (e.g. within INSERT statements)
These checks will lead to findings as in the following image:
The starting point to identify relevant static checks was the SAP HANA SQLScript Reference, which already mentions some (very general) best practices (chapter 13). Some of the recommendations there are very easy to detect automatically, e.g.
- Avoid using cursors
- Avoid using dynamic SQL
Complex SQL Statements
The first rule in the SAP best practices is more interesting: Reduce complexity of SQL statements. Here, the recommendation is to extract sub-queries from complex queries and assign these to table variables. Sub-queries are SELECTs which occur within an outer SQL statement. These can be used in most places where a table name could be used, especially after the FROM clauses. Since sub-queries can be as complex as any SELECT statement and may contain sub-queries again, the heavy use of sub-queries may result in code that is unreadable.
Here is an example of nested queries (adopted from the SAP HANA SQLSCript Reference):
SELECT * FROM
(
SELECT publisher, COUNT (*) AS cnt FROM :books
GROUP BY publisher
)
WHERE cnt >=
(
SELECT MAX (cnt) FROM
(
SELECT publisher, COUNT (*) AS cnt FROM :books
GROUP BY publisher
)
);
This can be improved by extracting the sub-query (note: the same sub-query is used twice here).
books_per_publisher =
SELECT publisher, COUNT (*) AS cnt FROM :books
GROUP BY publisher;
SELECT * FROM :books_per_publisher
WHERE cnt >=
(
SELECT MAX (cnt) FROM :books_per_publisher
);
The extraction of sub-queries will result in more comprehensive programs: Using descriptive names for the table variables increases the readability of the code a lot, since it documents the intention of the sub-query. Thus, it would be good to have automatic checks to avoid such complex SQL statements. To cover this demand, we have derived checks to identify
- Non-trivial sub-queries
- Deeply nested sub-queries
- Sub-queries without parentheses
Non-trivial Sub-queries
Since there is hardly any benefit to extract very simple SELECT sub-queries (e.g. selecting rows from a table only), a check which reports any sub-query is not very useful in practice. Only non-trivial sub-queries should be regarded as finding. Thus it is required to distinguish between trivial and non-trivial sub-queries. For now we applied the following definition: Non-trivial queries are either nested or contain any of the following clauses: WHERE, JOIN, GROUP BY, HAVING or UNION. Otherwise the sub-query is regraded as trivial.
Deeply Nested Sub-queries
Sub-queries may recursively contain sub-queries, which make the code even harder to understand. Thus I've added a separate check to identify deeply nested sub-queries. The nesting depth level is configurable. In contrast to the check for non-trivial queries (which already covers nested queries) this checks allows explicitly identifying deeply nested queries, which can be reported with higher severity (e.g. red instead of yellow).
Sub-queries without parentheses
A third check came to life as a side-effect during implementation of the complex sub-queries checks, since it backups the detection of sub-queries: The check searches for sub-queries, which are not enclosed in parentheses. Usually all sub-queries should be enclosed in parentheses, otherwise it is hard to understand which clauses belong to the sub-query and which to the surrounding query. This check reports missing parentheses, but it can be suppressed for specific SQL statements (e.g. within INSERT statements)
These checks will lead to findings as in the following image:
Our Standard Analysis
Further we apply our standard analyses which we already use for ABAP also to SQLScript, e.g.
- Duplicated Code
- Deeply nested procedures / functions
- Long procedures / functions
For now, we are still evaluating the practical use of these checks and probably we'll and up with even more checks. Hopefully this will help to keep SQLScript code easy to maintain.
Source: scn.sap.com
No comments:
Post a Comment