Oracle Analytics

Get Involved. Join the Conversation.

Topic

    Olivier Sinquin
    OAC Data Gateway FAQ
    Topic posted August 13, 2019 by Olivier SinquinRed Ribbon: 250+ Points, last edited September 11, 2019, tagged Administration, Tip 
    1619 Views, 3 Comments
    Title:
    OAC Data Gateway FAQ
    Summary:
    Everything you ever wanted to know about OAC Data Gateway (with inputs from Product Management)
    Content:

    What is Data Gateway?

    Overview

    OAC can seamlessly connect to many data sources that are accessible via a public IP (e.g. Oracle ADW, Oracle SaaS, Saleforce… ).

    But some data sources are not exposed over the public internet.

    Data Gateway allows OAC to connect to those data sources that are not directly accessible via a public IP, those are usually on-premise databases, but they can also be cloud data sources that don’t expose a public IP (e.g. Oracle DBCS in a customer private subnet).

    Data Gateway consists of on an agent connected to the data source and relaying queries coming from OAC.

    It’s important to note that all communication is initiated by the Data Gateway, not by OAC. This ensures that there is no ingress traffic on the database side, which is a preferred approach in terms of security and makes it easy setup Data Gateway (no special port to open in the firewall).

    What is the flow? In a nutshell:

    1. the BI server in OAC issues queries and when the queries involve remote data sources OAC queues them
    2. Data Gateway calls in to OAC and picks queries off the queue
    3. Data Gateway executes those queries locally against data sources
    4. Data Gateway then returns the results to OAC

    Difference between Remote Data Connector and Data Gateway

    Data Gateway supersedes the Remote Data Connector utility that was used in earlier releases (and is still required when connecting to Essbase as of August 2019). Although you can still use Remote Data Connector, Oracle recommends moving to Data Gateway if you are not connecting to Essbase.

    The key difference between Remote Data Connector and Data Gateway is that the direction of traffic got inverted. With Remote Data Connector the communication was initiated by OAC. With Data Gateway the communication is initiated by the Data Gateway. OAC never initiates communication with Data Gateway.

    Because of this:

    • a DMZ should not be necessary
    • ports in the customer firewall do not need to be opened for ingress  traffic anymore
    • OAC does not need to be white-listed any longer

    In addition, Data Gateway is self-contained, it comes with its own application server (Jetty) so there is no need for WebLogic or Tomcat anymore.

    Finally because there can be multiple Data Gateways serving the same databases, failover can be implemented and highly available architectures are possible.

    FAQ

    Why was Remote Data Connector deprecated?

    Data Gateway offers a superior architecture: it avoids ingress traffic, it does not require opening special ports in the firewall, it does not require the customer to provide an application server (remove dependency on  Weblogic or Tomcat), it enables high availability architecture ( a single Remote Data Connector vs multiple gateways to balance the load)

    How to install Data Gateway?

    For details You can refer to the documentation as well as this blog post by the Oracle A-Team but in a nutshell you must:

    1. [OAC side] Enable Data Gateway in OAC
    2. [DG side] Install Data Gateway on the server
    3. [DG side] Configure Data Gateway:
      1. Copy/paste the URL of the OAC server it is paired to
      2. Copy paste the IP or URL of the allowed hosts ( AKA data sources) it is servicing
    4. [DG side] Generate the Data Gateway key and Copy it to clipboard
    5. [OAC side] Add the Data Gateway in OAC by pasting the key
    6.  [DG side] click on Test then Save

    Where is Data Gateway running? Do we need to install it on some VM?

    There are 2 sides to the Data Gateway: The OAC side and the Database side.

    • On the OAC side, the Data Gateway queue is managed by OAC, nothing to install.
    • On the database side, the Data Gateway agent usually runs on dedicated server or VM next to the data source, but  it could also run on a simple laptop or be deployed in a compute instance in the cloud, as long as it is able to connect to the data source.

    Where should Data Gateway be installed?

    The Data Gateway should be installed in a subnet that gives visibility to both OAC and the target data source(s).

    From that subnet it should be possible to open a browser and connect to OAC, and use a generic JDBC tool to connect to the data source from the host where Data Gateway has been installed.

    What are the supported Operating Systems?

    As of August 2019 Data Gateway Agents can be installed on the following version of Linux, Windows or Mac:

    • Oracle Linux 6 Update Level 6+
    • Oracle Linux 7 Update Level 0+
    • Red Hat Enterprise Linux 6 Update Level 6+
    • Red Hat Enterprise Linux 7 Update Level 0+
    • SLES 11 Service Pack 3+
    • SLES 12 Service Pack 1+
    • Microsoft Windows x64 (64-bit) 7 SP1+, 8.1, or 10; Windows Server 2012 R2
    • Mac OS Sierra (10.12), High Sierra (10.13)

    As of August 2019 the installation wizard is available for Linux only. Oracle plans to have it available for Windows too.

    Oracle recommends to:

    • Use Linux for server use cases (e.g. OAC connecting to corporate databases)
    • Use Windows and Mac for personal gateway use cases (i.e. connect OAC to the database under my desk)

    For other operating systems where Data Gateway is not available, Remote Data Connector on Weblogic/Tomcat will continue to be made available but will not be updated with new functionality.

    Can we run Data Gateway as a service on windows server?

    In theory Data Gateway could be run as a windows service but this has not been tested.

    Can there be multiple Data Gateways?

    Yes, you can install, configure and register many Data Gateways per OAC service instance, but as of August 2019 all Data Gateways must be capable of servicing all remote queries. This means that all Data Gateways must be able to query all remote data sources, in other words all Data Gateways attached to an OAC server must be configured to serve the same list of databases (same list of hosts).

    In addition, there can be multiple Data Gateways per node (physical or virtual) when running the Linux version of Data Gateway.  As of August 2019 there can only be one Data Gateway per server/VM when running the windows version, but the intention is to remove this restriction in the future.

    If HA is required, it is recommended to have at least 2 Data Gateways (meaning 2 VMs) per OAC server/environment. The more data gateways, the more the load on the data sources can be balanced.

    How is HA achieved for the Data Gateway?

    On the OAC side, high availably it natively provided by Oracle Cloud Infrastructure.

    On the Data Gateway side, HA can be achieved by setting up 2 Data Gateways for each OAC server/environment.

    Each Data Gateway will be set up with the same list of data sources it can connect to, so it will be able to issue query to the same list of databases. If a Data Gateway was to fail then, the next query will be run by the other Data Gateway. NB: the first query would be lost and would need to be triggered by the user again. 

    How to size Data Gateway?

    As of August 2019 Oracle advises to use Remote Data Connector sizing guidelines  (based on: 100Mbps bandwidth; 5000 resultset rows; 35ms latency) to size the server that will host Data Gateway.

    The Remote Data Connector sizing calculator can be found in the general OAC sizing calculator used by Oracle Solution Engineers.

    Sizing guidelines specific to Data Gateway are being worked on.

    However, Oracle expects the performance characteristics to be similar to Remote Data Connector in terms of resource usage .

    Do customers need to open ports in their firewalls to enable OAC/Data Gateway communication?

    No, with Data Gateway there is no special egress/ingress port requirements, communication is done through the standard 443 HTTPS port

    • No ingress traffic (all traffic is initiated by the Agent).
    • Egress traffic uses port 443, just like browser access to OAC or via a proxy… If you are able to access OAC from a browser on the machine where Data Gateway is installed, you should be good.

    How come traffic is only egress?

    It could seem counter intuitive that the Data Gateway always initiate the communication when in reality it should be responding to a querying need from the OAC server.

    This push from OAC to Data Gateway is emulated using a technique called long polling.

    Every time a Data Gateway is added, it initiates a communication with its corresponding OAC server.

    Basically, telling the OAC server that it’s available to help with remote queries.

    That HTTPS query can be held indefinitely (or at least until the firewall timeout is reached and the firewall terminates the connection).

    Once the OAC server needs to query a remote database, OAC responds to the HTTPS query that was held, including the query that needs to be run.

    Data Gateway then initiate a new HTTPS communication with OAC, including the resultset related to the query tied to the previous HTTPS request.

    This technique guarantees that the Data Gateway always initiate the communication.

    How are the SSL Certificates managed?

    The HTTPS communication between Data Gateway and OAC leverages OAC’s SSL certificates.

    How secure is Data Gateway?

    When a Data Gateway is setup a pair of public and private key is generated. Those keys are used to encrypt all communication between OAC and the Data Gateway. This provides security for replay or man in the middle attack.

    In addition, the TLS1.2 encryption brought by the HTTPS connection brings another layer of encryption.

    Finally, if “Allowed Hosts” are specified in Data Gateway, then only those databases can be reached.

    NB: If you login to OAC using a browser, then inspect the security (using Chrome this means open Developer tools and look at the security tab), you will see first hand the connection security settings.  In 105.3 it is showing:

    The connection to this site is encrypted and authenticated using TLS 1.2, ECDHE_RSA with P-256, and AES_256_GCM.

    Data Gateway connects in the same way, with the addition of the signature of each packet.

    What is the Key for?

    When a Data Gateway is set up, the click on the “Re-generate a Key” button triggers the creation of both a private and a public key.  Only the public key is shown in the config UI.  The private key is kept by the Data Gateway agent.  You manually copy the public key to the target OAC service instance so that OAC has the public key and the Data Gateway agent has the private key.  The Data Gateway agent communicates with OAC over HTTPS, but in addition, the packets of data are encrypted and decrypted using the public and private keys. So only the registered Data Gateway agents can communicate with OAC though this mechanism.

    The private key is stored in a keystore file at the root of the Data Gateway folder. The keystore does not contain the public key - this is shown only once, for OAC registration. The private key in the keystore is encrypted and no one has the keystore password so the keystore would be useless on any other installation of Data Gateway. However it is recommended to protect the keystore file using operating system security.

    How performant is Data Gateway?

    Data Gateway uses SQL over HTTPS.  Whereas a direct connection will take some chatter to connect, send the query then return the results, the connection is already there between OAC and Data Gateway and the data is sent back in packets designed to go over HTTPS.

    In addition, Data Gateway leverages OAC query engine to optimize performances, including query federation and data caching. Data Gateway has no inherent performance boosting mechanism.

    OAC’s BI server will determine which queries to issue to which connections/Data Gateway and then leverage cache and/or stitch together the results if there are multiple queries being issued for the same analysis.

    From a Data Gateway perspective, there is no difference between running a query against an Oracle database in the cloud or going through Data Gateway to run that query against a database on premise. OAC query engine works its magic then passes the queries along, if it turns out the queries involve some on-prem data then the queries are added to Data Gateway queue.

    Are there safeguards at the Data Gateway and/or the Agent level to prevent “crazy” queries to be run?

    Preventing queries with the potential of altering the data:

    As of August 2019, only SELECT statements are allowed.  

    Preventing queries with the potential of cross join:

    If the query comes from the RPD, the data model defined in the RPD will ensure that the query generated by the query engine is sound and does not result in a cross join

    If the query is ad-hoc/direct, that means we are querying a single table – so there is no chance of cross join. Join/blend of datasets will occur in OAC.

    Preventing queries with the potential of returning too much data:

    Depending on the size of the OAC instance a max row limit of 1.1M/2.2M rows applies (https://docs.oracle.com/en/cloud/paas/analytics-cloud/acsom/create-services-oracle-analytics-cloud.html#GUID-164D8568-9AE3-4A74-9F1A-0D87B78713C9)  and is enforced by adding a FETCH clause to the query.

    What type of API is used for the communication between Data Gateway and the Data Gateways?

    This is a private API.  It uses HTTPS as transport. Data packets are signed and encrypted using a public/private key mechanism.

    What is the Data Gateway throughput and when would additional Data Gateways be needed to balance it out?

    Additional testing of Data Gateway under various configuration is under way (server OS, connection bandwidth, datasource type, request frequency, dataset size…) but we expect the throughput of individual Data Gateways to be similar to the one of Remote Data Connector.

    Are there performance considerations specific to Data Gateway?

    Long polling requires that the HTTPs connection get maintained indefinitely.

    There will 1 be HTTPs connection per Data Gateway.

    If the customer implements HA, there will be 2 Data Gateways per OAC server/environment.

    This can result in multiple open connections that need to be maintained indefinitely,

    The customer firewall must also allow long delays for HTTPS responses.

    What is the timeout setting?

    If the response time exceeds the 10 minutes timeout defined in OAC the query will be killed by OAC what ever the reason (database issue, Data Gateway issue, networking issue…)

    Where are Data Gateway logs stored?

    Data Gateway logs can be found in logs subfolder within the Data Gateway main folder.

    For example on windows they can be found in : <Install folder>\logs\datagateway.log

    Can we enable more detailed logging?

    yes, by updating the logger.rolling.level property.

    On the Windows version of Data Gateway :

    • Edit: <Install folder>\war\datagateway\WEB-INF\classeslog4j2.properties
    • Change the line:

    logger.rolling.level = ERROR

    • to this:

    logger.rolling.level = TRACE (or ALL or DEBUG)

    • Restart Data Gateway

    On the Linux version of Data Gateway :

    • Edit: domain/jettybase/resources/jetty-logging.properties
    • Change it from this:

    org.eclipse.jetty.util.log.class=org.eclipse.jetty.util.log.StrErrLog
    org.eclipse.jetty.LEVEL=INFO

    • to this:

    org.eclipse.jetty.util.log.class=org.eclipse.jetty.util.log.StrErrLog
    org.eclipse.jetty.LEVEL=ALL



    and the log file is in
    /domain/jettybase/logs/[date].jetty.log 

    Can we use Data Gateway for Usage Tracking on Prem?

    It might work in 105.4 when more than just SELECT statements are possible

    What on-prem data sources can Data Gateway be used against?

    Here is the list of supported Data Sources  as of August 2019, with a distinction between direct connection (via DV) and modeled connection (via RPD)

    NB: As of August 2019 connection to Essbase can’t be done via Data Gateway and must still rely on Remote Data Connector.

    NB: Although not officially supported some customers have been able to successfully use Data Gateway against DB2/Z using the generic DB2 driver. Try at your own risk…

    The current list of supported Data Sources is available at:

    https://docs.oracle.com/en/cloud/paas/analytics-cloud/acubi/supported-data-sources.html

     

    Remote Connection

    for Data Sets

    Remote Connection

    for Data Models

    Oracle DB

    Yes

    Yes

    Oracle Essbase

    Yes (Remote Data Connector but not Data Gateway)

    Yes (Remote Data Connector but not Data Gateway)

    Apache Hive

    Not yet

    Yes

    Amazon Redshift

    Not yet

    Not yet

    DB2

    Yes

    Yes

    Impala

    Not yet

    Yes

    MySQL

    Not yet

    Yes

    Spark

    Not yet

    Yes

    SQL Server

    Yes

    Yes

    Teradata

    Yes

    Yes

    Can we run both Remote Data Connector and Data Gateway?

    No, a given OAC server can either use Remote Data Connector OR Data Gateway, not both simultaneously.

    Since there can only be one Remote Data Connector per OAC server, an HA architecture is not possible when RDC is required. Which, as of August 2019, would only occur when Essbase is a required data source.

    NB: Although Remote Data Connector documentation suggests it only supports a single data source, like Data Gateway, a Remote Data Connector can connect to multiple data sources of the same or different types.

    Can Data Gateway connect to SQL Server through Active Directory?

    No, OAC only supports connecting to SQL Server with UserID and Password.

    Troubleshooting

    Agent state change failed with error: Agent name or OAC URL not specified or the Key pair not generated’

    If you get that message when setting Data Gateway try to

    • Save  then Enable
    • if still not working : close the application and restart the application
    • if still not working : check your networking (on/off VPN)

    Invalid OAC URL

    If Data Gateway seems unable to reach OAC.

    • Make sure Data Gateway is well defined and enabled on the OAC side
    • Make sure you can reach the OAC URL from the environment where Data Gateway is running (e.g.  from a browser)
    • Make sure nothing is blocking the communication (e.g. VPN) -  traffic has to be permitted through the firewall

     If needed you can use a proxy by following those steps (detailed in https://community.oracle.com/message/15411756#15411756 ):

    1. edit the file obiee_rdc_agent.properties located at the root of the Data Gateway folder and provide proxyHost and proxyPort there
    2. once you save the file, immediately go to you Data Gateway UI configuration page, and hit 'Save' button. Only this way your settings from previous step are made persistent.
    3. now you should be able to configure you agent as normal, but it will use your proxy settings

    Conflict between Data Gateway and Data Visualisation Desktop.

    Data Gateway and DVD both use jetty, which can cause issues. It’s a known bug that is being worked on. Data Gateway will probably be packaged within DVD in the future.

    For now: Stop both application, then start Data Gateway first, then DVD second

    Linux installation

    When using OAC 105.3, you need to install Remote Data Connector from the 105.3 Data Gateway Linux installer just to ‘Load Java Data Sources’ in Admintool.  In the next version you will not need this Remote Data Connector install.  See Release Notes:
    https://docs.oracle.com/en/cloud/paas/analytics-cloud/acski/index.html#ACSKI-GUID-0DF51FB0-83B4-4BFA-B6B5-B11F0A3EF369

    Loading Java Data Sources in the BI Admintool

    If you are already using Remote Data Connector. you do not need to change a thing.

    If you are a new user,  to Load Java Data Sources in the Admintool:

    • File > Load Java Data Sources
    • Enter the Hostname (FQDN)
    • Enter the port (non-SSL unless you need to use SSL – there is nothing sensitive here)
    • Ensure UserID and Password are NOT empty.  Put anything, just be creative, this will not actually be validated

    Note that there is no validation when simply calling ‘Load Java Data Sources’ as it is a public call, but you do need to send something in the userID and password fields.

    JDBC Driver Classes and Connection String

    JDBC Driver Classes and Connection String example:

    SQL Server – com.oracle.bi.jdbc.sqlserver.SQLServerDriver – jdbc:oracle:sqlserver://MyServer:1433;DatabaseName=MyDB

    Creating remote connections to data sources other than Oracle requires changing the database type

    When specifying a remote connection from the data model (.rpd file) to a remote data source, you must initially specify Oracle as the database type while you set up the connection pool to point to Data Gateway. After saving your connection pool, you must change the database type to the appropriate type supported as a remote connection.

    The Load Java Data Source operation fails in Developer Client Tool for Oracle Analytics Cloud

    In Developer Client Tool for Oracle Analytics Cloud, when you select the Load Java Data Sources option from the File menu, the operation fails. To work around this issue:
    Install Oracle Analytics Cloud Remote Data Connector by running the Data Gateway installer and at the Select Remote Data Version dialog, select Remote Data Connector rather than Data Gateway.
    In Developer Client Tool for Oracle Analytics Cloud, from the File menu, select Load Java Data Sources.

    NB: You should always use the version of Admintool, Data Gateway or RDC that corresponds to the version of OAC.

    This means use RDC installed through the Data Gateway installer.  This means Linux only in DG 5.3.  The issue goes away in DG 5.4.

    More Known Issues in Oracle Analytics Cloud Data Gateway

     

    Image:

    Comment