We’ve encountered several situations recently where it was required to acquire information from external systems that already had an API to expose data with web services.
The usual way would be writting a script or using an ETL, but those options had some disadvantages:
◆ Provide obsolete data, unless scheduled frequently, but then it creates a workload to refresh a dataset that might not be used
◆ Difficult to pass parameters and control from SAP HANA
With data federation in SAP HANA SDI, we can make virtual tables, meaning that they can be manipulated or used like regular tables, but no data is actually stored. When a query references a virtual table, the data is sent by an agent connected to the source system.
In the example below, we’ll make a virtual table of 5 columns that is populated at every query by the content coming from a web service.
SAP HANA SDI SDK provides the ability implement any processing and formatting logic with an interface at the database level, making it seamless to embed in views, odata service or any HANA database artifact.
After browsing through the code of the adapters published on github, it was a bit surprised by the amount of functions exposed, allowing to control most aspects of the behavior of the adapter. My situation required to collect data from multiple web services, and I didn’t want to write each time all mandatory adapter functions so I wrote a few abstract classes to factor in some tasks:
◈ expose data source parameters to the data provisioning UI in SAP HANA
◈ make web services appear as virtual tables or virtual functions
◈ send rows to SAP HANA
◈ And some minor tasks: initializing an http client, json parsing
Here’s a useful class diagram, the adapter base class a few methods that must be implemented to create a working adapter:
The usual way would be writting a script or using an ETL, but those options had some disadvantages:
◆ Provide obsolete data, unless scheduled frequently, but then it creates a workload to refresh a dataset that might not be used
◆ Difficult to pass parameters and control from SAP HANA
With data federation in SAP HANA SDI, we can make virtual tables, meaning that they can be manipulated or used like regular tables, but no data is actually stored. When a query references a virtual table, the data is sent by an agent connected to the source system.
In the example below, we’ll make a virtual table of 5 columns that is populated at every query by the content coming from a web service.
SAP HANA SDI SDK provides the ability implement any processing and formatting logic with an interface at the database level, making it seamless to embed in views, odata service or any HANA database artifact.
Factoring code to minimize effort of adding new web services
After browsing through the code of the adapters published on github, it was a bit surprised by the amount of functions exposed, allowing to control most aspects of the behavior of the adapter. My situation required to collect data from multiple web services, and I didn’t want to write each time all mandatory adapter functions so I wrote a few abstract classes to factor in some tasks:
◈ expose data source parameters to the data provisioning UI in SAP HANA
◈ make web services appear as virtual tables or virtual functions
◈ send rows to SAP HANA
◈ And some minor tasks: initializing an http client, json parsing
Here’s a useful class diagram, the adapter base class a few methods that must be implemented to create a working adapter:
The two dotted lines represent class inheritance. So for every web service functionnalty there are three parts to write :
1. Describe input parameters and result set format
2. Make the Request
3. Transform the output into a result set
Example with a file hosted on a public web server
Here’s an example that hopefully won’t draw too many copyright concerns, a customer needed to fetch some financial information from company filings available publicly on the website of the SEC.
There is a file called master index that is updated regularly. Instead of scripting a download, transformation and import of the file, we’ll make the content of the file available as a database table.
The content starts with 11 lines of header then the values are separated by pipes :
Description: Master Index of EDGAR Dissemination Feed
Last Data Received: June 27, 2018
Comments: webmaster@sec.gov
Anonymous FTP: ftp://ftp.sec.gov/edgar/
Cloud HTTP: https://www.sec.gov/Archives/
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000045|NICHOLAS FINANCIAL INC|10-K|2018-06-27|edgar/data/1000045/0001193125-18-205637.txt
1000045|NICHOLAS FINANCIAL INC|3|2018-04-02|edgar/data/1000045/0001736213-18-000001.txt
1000045|NICHOLAS FINANCIAL INC|3|2018-06-27|edgar/data/1000045/0001357521-18-000003.txt
1000045|NICHOLAS FINANCIAL INC|4|2018-04-27|edgar/data/1000045/0001609591-18-000002.txt
1000045|NICHOLAS FINANCIAL INC|8-K|2018-04-04|edgar/data/1000045/000119312 5-18-106701.txt
We will call this webservice SED Edgar Master
Webservice Description
We implement the abstract class WebserviceConfig in SECEdgarMasterService. In this case, there is no input parameters, we simply define the webservice as a virtual table of 5 columns.
public class SECEdgarMasterService extends WebserviceConfig{
final static String DESC="Master Index of EDGAR Dissemination Feed";
public SECEdgarMasterService() {
super("SEC_EDGAR_MASTER",
DESC,
new SECEdgarMasterRequest(),
new SECEdgarMasterResponse());
}
@Override
public Metadata getMetadata() throws AdapterException {
TableMetadata tm=new TableMetadata();
tm.setName(name);
tm.addDescription(AdapterLang.ENGLISH, DESC);
ArrayList<Column> cols = new ArrayList<Column>();
cols.add(new Column("CIK", DataType.INTEGER));
cols.add(new Column("COMPANY_NAME", DataType.NVARCHAR, 128));
cols.add(new Column("FORM_TYPE", DataType.NVARCHAR, 20));
cols.add(new Column("DATE_FILED", DataType.NVARCHAR, 10));
cols.add(new Column("FILE_NAME", DataType.NVARCHAR, 128));
tm.setColumns(cols);
return tm;
}
@Override
public NodeType getNodeType() {
return NodeType.TABLE;
}
}
Making the Request
We implement the abstract class WebserviceRequestHandler in SECEdgarMasterRequest. It consists of opening the URL and returning the result as a CloseableHttpResponse
public class SECEdgarMasterRequest extends WebserviceRequestHandler{
final static String URL="https://www.sec.gov/Archives/edgar/full-index/master.idx";
@Override
public CloseableHttpResponse exec(Metadata func, HashMap<String, String> connProps)
throws AdapterException, ClientProtocolException, IOException, URISyntaxException {
CloseableHttpClient httpclient=WebserviceRequestHandler.getDefaultHttpClient();
HttpGet httpget = new HttpGet(URL);
CloseableHttpResponse resp = httpclient.execute(httpget);
logger.debug("HttpAuthRequest finished for "+URL);
return resp;
}
@Override
public String getURL(Metadata func) throws AdapterException {
return URL;
}
}
Formatting the Result
We implement the abstract class WebserviceResponseHandler in SECEdgarMasterResponse. It consumes the output of the http request line by line. After skipping the header, it builds a list of records and returns it. The WebserviceAdapter class will then handle sending the records to SAP HANA in bulk.
public class SECEdgarMasterResponse extends WebserviceResponseHandler {
static final int SKIP_LINES=12;
static final String FIELD_SEPARATOR="\\|";
@Override
public LinkedList<WebserviceResponseRecord> handleResponse(CloseableHttpResponse r, String url)
throws AdapterException, ParseException, IOException {
BufferedReader in = new BufferedReader(new InputStreamReader(r.getEntity().getContent()));
String line;
String[] fields;
LinkedList<WebserviceResponseRecord> res = new LinkedList<WebserviceResponseRecord>();
for(int i=0;i<SKIP_LINES;i++)
in.readLine();
while((line = in.readLine()) != null) {
fields=line.split(FIELD_SEPARATOR);
if(fields.length==5){
res.add(new SECEdgarMasterRecord(Integer.parseInt(fields[0]),
fields[1],
fields[2],
fields[3],
fields[4]
));
}
else {
logger.debug("strange row: "+line);
}
}
in.close();
return res;
}
/**
* Describes a tag
* @author I063909
*
*/
public static class SECEdgarMasterRecord extends WebserviceResponseRecord{
int cik;
String cName;
String fType;
String dt;
String fName;
public SECEdgarMasterRecord(int cik, String cName, String fType, String dt, String fName){
this.cik=cik;
this.cName=cName;
this.fType=fType;
this.dt=dt;
this.fName=fName;
}
@Override
public void appendTo(AdapterRowSet rows) throws AdapterException {
AdapterRow row=rows.newRow();
row.setColumnValue(0, cik);
row.setColumnValue(1, cName);
row.setColumnValue(2, fType);
row.setColumnValue(3, dt);
row.setColumnValue(4, fName);
}
}
}
The results
And that’s it for coding ! Then we build the adapter, deploy the jar at an exsiting data provisioning agent as a custom adapter, and register the adapter in SAP HANA. When browsing the remote source associated with the adapter, we can see 4 webservices exposed as virtual tables, including our example one.
We then create a virtual table pointing to the SEC_EDGAR_MASTER table, and we can start to explore the content of the table.
Fetching the first row of the virtual table takes 3 seconds, and fetching all 227 000 rows takes 3.2 seconds.
No comments:
Post a Comment