Friday, 3 May 2024

Cost optimized SAP HANA DR options on Google Cloud

Abstract


Business continuity is of utmost importance for any organization. A well-defined High Availability (HA) and Disaster Recovery (DR) strategy keeps business-critical SAP applications accessible during any planned or unplanned outage. The SAP HANA database, being a central component of an SAP application, is configured with a suitable HA/DR setup so that the business data is available at a secondary node or site.

HA for the HANA database provides fault tolerance, mostly against infrastructure-related failures: the database fails over to a hot standby secondary node deployed in cluster mode within the same region. RPO and RTO are close to zero in this case, because most of the steps are automated by the cluster manager. Synchronous HANA replication across zones of the same region keeps the secondary HANA node in sync with the primary node at all times.

If the primary site is lost completely, so that both the primary node and its HA hot standby are unavailable, a DR solution in a separate geographical location (termed the secondary or DR region in public cloud) acts as a safety net for business continuity. Asynchronous HANA replication continuously ships data from the primary node to the standby node in the secondary region. A manual failover is required to promote the DR node to primary.

Let's discuss some SAP HANA DR options along with their pros and cons.

> Performance optimized DR setup (cost challenge)

For mission-critical applications that require a minimum RTO (a few minutes to hours) to have the system up and running in the secondary region with all the business data, a performance optimized DR setup using SAP HANA System Replication (HSR) is deployed. In this DR setup, the computing capacity of the DR HANA node is kept the same as that of the primary HANA node, and the full data set is loaded into the memory of the DR HANA node when replication is set up. After this initial load, all changes committed on the primary are replicated continuously to the secondary through log shipping.
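Registering the DR node for asynchronous replication is typically done with SAP's `hdbnsutil` tool on the secondary. The minimal sketch below only assembles that command line so it can be reviewed before use. The host name, instance number and site name are hypothetical placeholders, and the operation mode shown (`logreplay`) is an assumption rather than something this setup prescribes.

```python
import shlex


def build_sr_register_command(primary_host, primary_instance, site_name,
                              replication_mode="async",
                              operation_mode="logreplay"):
    """Return the hdbnsutil argument list that registers a DR secondary."""
    return [
        "hdbnsutil", "-sr_register",
        f"--remoteHost={primary_host}",
        f"--remoteInstance={primary_instance}",
        f"--replicationMode={replication_mode}",
        f"--operationMode={operation_mode}",
        f"--name={site_name}",
    ]


# Hypothetical values; run the resulting command as the <sid>adm user
# on the DR node while its HANA instance is stopped.
cmd = build_sr_register_command("hana-prd-primary", "00", "DR_SITE")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a string makes it easy to pass to `subprocess.run` later without shell quoting issues.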

As depicted in the diagram below, the CPU and memory configuration of the secondary node is kept the same as that of the primary node, so the major chunk of data is already loaded in the memory of the secondary HANA node. In a disaster scenario, this configuration lets the secondary node take over as primary in the minimum possible time and with almost no data loss. However, maintaining PRD-equivalent hardware at the secondary site for the DR node adds significant cost.

[Diagram: performance optimized DR setup]

> Cost optimized DR setup (higher RTO)

To overcome the cost challenge of the performance optimized DR setup, we can consider the following cost optimized SAP HANA DR options. The trade-off with these setups is a higher RTO in exchange for a lower DR cost.

(i) Shared HANA DR node

In a DR setup where the secondary node is sized identically to the PRD primary, the resources on the secondary node generally cannot be used for anything else until a takeover takes place. In the shared DR setup, PRD data is not loaded into memory but kept on disk at the secondary site, so the node's resources can be shared with another non-PRD SAP HANA instance, e.g. QAS or TEST, on the same node. To achieve this, the memory allocation of the PRD secondary instance is restricted and the rest of the memory is allocated to the non-PRD (QAS/TEST) instance.
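The memory split can be sketched as follows. This is a minimal illustration, assuming the restriction is applied through the `global_allocation_limit` parameter (in MB) in the `[memorymanager]` section of the secondary's `global.ini`, and that preload is switched off through `preload_column_tables` in `[system_replication]`. The node and limit sizes are hypothetical.

```python
import configparser
import io


def shared_dr_settings(node_memory_gb, prd_secondary_limit_gb):
    """Return (global.ini text for the PRD secondary, GB left on the node)."""
    if not 0 < prd_secondary_limit_gb < node_memory_gb:
        raise ValueError("PRD secondary limit must be below the node's memory")

    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep parameter names exactly as written
    # global_allocation_limit is expressed in MB.
    cfg["memorymanager"] = {
        "global_allocation_limit": str(prd_secondary_limit_gb * 1024)
    }
    # Keep replicated column tables on disk instead of preloading them.
    cfg["system_replication"] = {"preload_column_tables": "false"}

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue(), node_memory_gb - prd_secondary_limit_gb


# Hypothetical sizes: a 1024 GB node with 128 GB kept for the PRD secondary.
ini_text, non_prd_gb = shared_dr_settings(1024, 128)
print(ini_text)
print(f"Up to {non_prd_gb} GB remain for QAS/TEST (before OS overhead)")
```

The non-PRD instance would normally get its own allocation limit as well, so that the two instances together never exceed the physical memory of the node.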

In a PRD primary disaster scenario, the non-PRD system (QAS/TEST) is stopped, the full memory and resources are allocated to the PRD standby instance, the data is loaded from disk into memory, and the instance is brought up as primary. These steps naturally increase the recovery time, but the setup has the advantage of a low DR cost because the same DR node also hosts our QAS/TEST instance.
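The "allocate full memory" step amounts to undoing the restrictions on the PRD standby before takeover. A minimal sketch, assuming those restrictions live in `global.ini` as `global_allocation_limit` and `preload_column_tables`, and that removing the limit restores HANA's default allocation behaviour. The function name is hypothetical, and rewriting the file this way drops any comments it contains.

```python
import configparser
import io


def release_dr_restrictions(global_ini_text):
    """Return global.ini text with the memory cap removed and preload on."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep parameter names exactly as written
    cfg.read_string(global_ini_text)

    # Drop the cap so the standby can use the node's full memory.
    if cfg.has_section("memorymanager"):
        cfg.remove_option("memorymanager", "global_allocation_limit")

    # Load column tables into memory again.
    if not cfg.has_section("system_replication"):
        cfg.add_section("system_replication")
    cfg.set("system_replication", "preload_column_tables", "true")

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


restricted = """
[memorymanager]
global_allocation_limit = 131072

[system_replication]
preload_column_tables = false
"""
released = release_dr_restrictions(restricted)
print(released)
```

In a real runbook this would run only after the QAS/TEST instance has been stopped, and before the takeover is triggered on the standby.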

(ii) Lean HANA DR node

Compared to the shared DR setup, here we opt for a bare minimum memory configuration for the PRD secondary instance, just enough for the replicated PRD data to keep being written to disk. Thus the DR node does not need to match the sizing/memory configuration of the PRD primary. As in the shared DR setup, preloading of column tables into the memory of the standby HANA node is disabled by setting the database parameter “preload_column_tables” to false.

In a PRD primary disaster scenario, the DR node is stopped and upgraded to a configuration matching the PRD primary (a Google Cloud VM type approved by SAP for PRD use), and the full memory and resources are allocated to the PRD standby instance. The database parameter “preload_column_tables” must be changed back to its default value of true so that the complete data, including the column tables, is loaded into memory. Compared to the shared setup, this setup significantly reduces the DR cost, because the standby node's compute and memory are kept at the bare minimum needed to support data replication from the primary node.
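The resize-and-takeover sequence can be sketched as an ordered list of commands. This is an assumption-laden outline, not a tested runbook: the instance name, zone and machine type are hypothetical placeholders, the `gcloud` commands run from an admin workstation, and the `hdbnsutil` takeover runs on the DR VM as the `<sid>adm` user after `preload_column_tables` has been set back to true.

```python
import shlex


def lean_dr_failover_commands(instance, zone, prd_machine_type):
    """Return the ordered commands that resize the lean DR VM and take over."""
    gcloud = ["gcloud", "compute", "instances"]
    return [
        # 1. A VM must be stopped before its machine type can change.
        gcloud + ["stop", instance, f"--zone={zone}"],
        # 2. Switch to the machine type used by the PRD primary.
        gcloud + ["set-machine-type", instance, f"--zone={zone}",
                  f"--machine-type={prd_machine_type}"],
        # 3. Boot the resized VM.
        gcloud + ["start", instance, f"--zone={zone}"],
        # 4. On the VM itself: promote the standby to primary.
        ["hdbnsutil", "-sr_takeover"],
    ]


# Hypothetical names; the machine type must match the real PRD primary.
commands = lean_dr_failover_commands("hana-dr", "us-east4-a", "m1-ultramem-80")
for command in commands:
    print(shlex.join(command))
```

Printing the commands first (a dry run) gives the operator a chance to verify names and zones before anything is executed during an incident.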

The minimum memory configuration and Google Cloud VM type needed for the cost optimized lean DR secondary node should be calculated as per SAP guidelines and supporting documentation (SAP Note 1999880, FAQ: SAP HANA System Replication). It is advisable to run a pilot/PoC to determine the exact memory and sizing requirements for the lean DR node and to catch any unforeseen issues upfront.
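For a first estimate ahead of the PoC, the calculation can be captured in a small helper. The shape of the formula below (a floor, plus row store, replay working set and a fixed overhead) is only an illustrative assumption; the actual terms and constants must be taken from the SAP Note, and every number and VM name in the example is hypothetical.

```python
def lean_secondary_memory_gb(row_store_gb, replay_working_set_gb,
                             fixed_overhead_gb, floor_gb):
    """Estimate memory for a lean secondary; all inputs come from SAP guidance."""
    needed = row_store_gb + replay_working_set_gb + fixed_overhead_gb
    return max(floor_gb, needed)


def smallest_fitting_vm(required_gb, vm_memory_gb):
    """Pick the VM type with the least memory that still covers the estimate."""
    candidates = [(mem, name) for name, mem in vm_memory_gb.items()
                  if mem >= required_gb]
    if not candidates:
        raise ValueError(f"no VM type offers {required_gb} GB")
    return min(candidates)[1]


# Hypothetical inputs and a hypothetical catalogue of VM types (GB of RAM).
required = lean_secondary_memory_gb(row_store_gb=40, replay_working_set_gb=90,
                                    fixed_overhead_gb=50, floor_gb=64)
catalogue = {"vm-small": 128, "vm-medium": 256, "vm-large": 512}
print(required, smallest_fitting_vm(required, catalogue))
```

The PoC then serves to validate the estimate under real replication load, which a static calculation cannot capture.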

(iii) Backup-Restore HANA DR

In this most cost-effective HANA DR solution, no dedicated secondary HANA node is deployed and there is no real-time data replication from the primary HANA node.

However, we need to ensure that backups (HANA database data and logs, plus application file systems) are stored in dual-region or multi-region mode so that they are available in another region identified as the DR site. The RTO to bring up the primary instance at the DR site will be quite high: if the primary region becomes unavailable, one needs to set up the servers in the DR region from scratch, install the application along with the database, and then restore the database from the backup.
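To see why the RTO is high, it helps to add up the recovery steps. A back-of-the-envelope sketch, assuming the steps run strictly one after another and that restore time scales linearly with backup size; all durations, sizes and throughput figures are hypothetical inputs, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    name: str
    minutes: float


def backup_restore_rto(provision_min, install_min,
                       backup_size_gb, restore_gb_per_min):
    """Return the recovery steps and their total duration in minutes."""
    if restore_gb_per_min <= 0:
        raise ValueError("restore throughput must be positive")
    steps = [
        RecoveryStep("provision servers in DR region", provision_min),
        RecoveryStep("install application and database", install_min),
        RecoveryStep("restore database from backup",
                     backup_size_gb / restore_gb_per_min),
    ]
    return steps, sum(step.minutes for step in steps)


# Hypothetical figures: 1800 GB backup restored at 10 GB per minute.
steps, total_minutes = backup_restore_rto(60, 120, 1800, 10)
for step in steps:
    print(f"{step.name}: {step.minutes:.0f} min")
print(f"Estimated RTO: {total_minutes / 60:.1f} h")
```

Plugging in figures from an actual restore test shows which step dominates and where automation (for example, pre-built images) would shorten the recovery most.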

We must also reserve the needed computing capacity in the DR region so that the required VMs can be deployed quickly at the DR site in a disaster scenario. We must also ensure network connectivity (VPN) to the DR site to access the applications.
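On Google Cloud, capacity of this kind is usually held with a zonal Compute Engine reservation. The sketch below only builds the `gcloud` command for review; the reservation name, zone, machine type and VM count are hypothetical, and whether a reservation fits the budget is a separate decision, since reserved capacity is billed even while unused.

```python
import shlex


def reservation_command(name, zone, machine_type, vm_count):
    """Return the gcloud command that reserves VM capacity in the DR zone."""
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    return [
        "gcloud", "compute", "reservations", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--vm-count={vm_count}",
    ]


# Hypothetical values for a single PRD-sized HANA VM in the DR region.
reserve_cmd = reservation_command("hana-dr-capacity", "us-east4-a",
                                  "m1-ultramem-80", 1)
print(shlex.join(reserve_cmd))
```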
