Data science is a lot like cooking your favorite meal. While we can usually just go to the local supermarket and pick up our raw ingredients, it is often not that easy for a data scientist. Imagine that, before cooking your favorite meal, you didn't know whether the supermarket was open or whether the food was even edible. Hence, before the fun part, the cooking and eating, can start, we need to acquire, organize and structure our data. In my experience this is one of the crucial parts of a Machine Learning use case, and it usually takes most of the time.

Often, the data does not just reside locally in a CSV or Excel file on our laptop but originally lives in a database like SAP HANA. To work on a database like SAP HANA you usually use the Structured Query Language (SQL), which is over 40 years old. But as a huge R fan I want to stay in my familiar environment and not switch back and forth. For example, after the first modeling phase I may have to go back to the data preparation phase to engineer new features. Hence, I want to be more flexible but still use the power of SAP HANA.

The R package dbplyr brings both worlds together: it is designed to let you work with database tables as if they were local data frames in R. The goal of dbplyr is to automatically generate SQL statements for you, focusing on SELECT statements. This means you can continue to use the dplyr functions you are already familiar with.
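
To make the translation idea a bit more concrete, here is a minimal sketch that needs no database connection at all. It assumes dbplyr 2.0.0 or later, which provides `simulate_hana()` to mimic a SAP HANA backend. The table `sales` and its columns `region` and `amount` are hypothetical names I made up for illustration, not a real schema.

```r
library(dplyr)
library(dbplyr)

# Hypothetical table definition: column names are invented for illustration.
# simulate_hana() stands in for a real SAP HANA connection, so nothing is
# sent to a database here.
sales <- lazy_frame(
  region = character(),
  amount = numeric(),
  con = simulate_hana()
)

# Ordinary dplyr verbs; dbplyr records them instead of executing them.
query <- sales %>%
  filter(amount > 100) %>%
  group_by(region) %>%
  summarise(total = sum(amount, na.rm = TRUE))

# Print the SQL that dbplyr would generate for this pipeline.
show_query(query)

# The same SQL as a plain character string, e.g. for logging.
sql_text <- as.character(sql_render(query))
```

With a real connection you would replace the `lazy_frame()` call with `tbl(con, "SALES")` (again a hypothetical table name) and call `collect()` at the end to pull the result into R.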