Business Intelligence

Topic

    Brian Hall
    Update / Inserts in Data-Sync (Answered)
    Topic posted April 29, 2019 by Brian Hall, tagged BI Publisher, OAC Enterprise Edition, OBI
    118 Views, 14 Comments
    Title:
    Update / Inserts in Data-Sync
    Summary:
    Looking for the count of records Inserted & Updated
    Content:

    Is there a way to find out how many records were inserted and how many were updated? I need separate counts, not total counts.

    Version:
    Oracle Analytics Cloud Data Sync (2.5)

    Best Comment

    Vijay Ranganathan

    Unfortunately there is no direct way of figuring it out. Data Sync uses merge statements to do the insert or update in a single SQL statement, and from the rows-affected count (reported in the log file) one cannot tell which is which. Data Sync uses two writers to push the data, hence the two counts - they do not reflect the insert/update split.

    If this information is extremely important to you, you can do the following (using version 2.6):

    1.  Create two new columns called PROCESS_ID (NUMBER(38)) and DML_CODE (VARCHAR(1)) using Data Loads->Column Mapping->Manage Target Columns->New Target Column.

    2.  Assign runtime defaults to these two new columns by clicking on Target Expression. For the PROCESS_ID column, choose the default %CURRENT_PROCESS_ID; for DML_CODE, choose the default %DML_CODE.

    The next time you run the data load, PROCESS_ID will be populated with the run's process ID, and DML_CODE will have 'I' for inserts and 'U' for updates.

    Then you can identify the number of inserts/updates for any run with a simple select statement like so:

    select count(*), DML_CODE from my_table where process_id = ? group by DML_CODE

    Hope this helps!
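    The counting query above can be exercised end to end; here is a minimal sketch (Python with an in-memory SQLite table, and hypothetical table and column values) of how the DML_CODE tagging yields separate insert and update counts per run:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    amount NUMERIC,
    process_id INTEGER,   -- populated from %CURRENT_PROCESS_ID
    dml_code TEXT         -- 'I' for insert, 'U' for update
)""")

# Simulate one Data Sync run (process_id = 101): two inserts, one update.
rows = [(1, 10, 101, "I"), (2, 20, 101, "I"), (3, 30, 101, "U")]
con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

# The accepted answer's query: separate counts per DML code for a given run.
counts = dict(
    con.execute(
        "SELECT dml_code, COUNT(*) FROM my_table "
        "WHERE process_id = ? GROUP BY dml_code",
        (101,),
    ).fetchall()
)
print(counts)  # {'I': 2, 'U': 1}
```

    In Data Sync itself the two columns would be filled by the %CURRENT_PROCESS_ID and %DML_CODE runtime defaults; here the rows are inserted by hand purely to show the query.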

    Comments

    • Brian Hall

      Maybe this is it.

      Looks like 909 inserted and 5000 updated

      LOG: C:\OAC\OACDataSync_V2_5\log\BHS_OAC_DS_PSOFT-VOUCHER_DATA.25940821

      Truncate Stage Table
      SQL: TRUNCATE TABLE "S1#FACT_VOUCHER_PO"
      Stored Procedure: false
      Continue on error: false
      Number of retries: 1
       
      361  GLOBAL  2019-04-28 07:02:29.707 EST  Reader1 - 5909 records processed, total time: 4 second(s)375 millisecond(s), idle time: 0 millisecond(s)
      362  GLOBAL  2019-04-28 07:02:29.707 EST  Reader1 - successfully completed
      368  GLOBAL  2019-04-28 07:03:12.893 EST  Writer1 - 909 records processed, 0 records failed, total time: 47 second(s)514 millisecond(s), idle time: 4 second(s)328 millisecond(s) (9%), merge time: 41 second(s)592 millisecond(s) (87%)
      Merge counts: 
      "Merge SQL INSERT_UPDATE_DATA_COPY" processed 909 row(s)
      "Truncate Stage Table" reported 0 row(s)
      369  GLOBAL  2019-04-28 07:03:12.893 EST  Writer1 - successfully completed
      370  GLOBAL  2019-04-28 07:04:12.324 EST  Writer2 - 5000 records processed, 0 records failed, total time: 1 minute(s), 46 second(s), 945 millisecond(s), idle time: 2 second(s)484 millisecond(s) (2%), merge time: 1 minute(s), 40 second(s), 582 millisecond(s) (94%)
      Merge counts: 
      "Merge SQL INSERT_UPDATE_DATA_COPY" processed 5000 row(s)
      "Truncate Stage Table" reported 0 row(s)
      371  GLOBAL  2019-04-28 07:04:12.324 EST  Writer2 - successfully completed
      372  GLOBAL  2019-04-28 07:04:12.324 EST  5909 records read.
      Read throughput, records per second: 1355.
      373  GLOBAL  2019-04-28 07:04:12.324 EST  5909 stage and 5909 total records written/processed.
      Write throughput, records per second: 69.
      374  GLOBAL  2019-04-28 07:04:12.324 EST  Merge counts: 
      "Merge SQL INSERT_UPDATE_DATA_COPY" processed 5909 row(s)
      "Truncate Stage Table" reported 0 row(s)
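      For what it's worth, the per-writer totals in a log like this can be pulled out programmatically; this is a rough sketch in Python, assuming the line format shown in the excerpt above. Note that these are per-writer totals, not insert vs. update counts:

```python
import re

# Sample lines in the format of the log excerpt above.
log = """\
368  GLOBAL  2019-04-28 07:03:12.893 EST  Writer1 - 909 records processed, 0 records failed
"Merge SQL INSERT_UPDATE_DATA_COPY" processed 909 row(s)
370  GLOBAL  2019-04-28 07:04:12.324 EST  Writer2 - 5000 records processed, 0 records failed
"Merge SQL INSERT_UPDATE_DATA_COPY" processed 5000 row(s)
"""

# Pull each writer's processed count from its summary line.
writer_counts = {
    m.group(1): int(m.group(2))
    for m in re.finditer(r"(Writer\d+) - (\d+) records processed", log)
}
print(writer_counts)  # {'Writer1': 909, 'Writer2': 5000}
```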

      • Brian Hall

        Thanks for the help. Is it possible to access the refresh dates outside of Data Sync?

        • Vijay Ranganathan

          No, it's not possible. The metadata and the runtime data are stored in a file-based relational database called JavaDB, and it's not possible to query that database from outside.

          However, if you want to use it anywhere in the data load routines, you can create a parameter (under the Parameters tab). Choose Timestamp as the data type, choose Runtime variable, and then choose SOURCE_REFRESH_TIMESTAMP or TARGET_REFRESH_TIMESTAMP and a format. You can refer to the parameter anywhere in the data load routines by using the %% prefix. For example, if you created MY_PARAMETER, you would refer to it as %%MY_PARAMETER.

          • Brian Hall

            We use data (PSOFT) that is always one day old. Our Prod data is moved into our FINT DB at 12:01 AM every night, and at 7:00 AM that morning Data Sync runs. Our counts are always off because the refresh date and the entered/updated dates in Data Sync are off by 12 hours or more. We need a way to compensate for this difference. Right now I must go in manually every day and change the refresh date from 7:00 AM to 12:01 AM to get all records.

            • Vijay Ranganathan

              That's easy - to compensate, you have two ways (in general).

              1.  If your source DB's timezone is different from that of the machine where Data Sync runs, be sure to specify the timezone information for the connection - that way Data Sync will automatically compensate for the time difference between the timezones.

              2.  If you want to manually specify how much data to re-extract (12 hours in your example), set it under Jobs->Connectivity Parameters->Source Connection->Prune Time. Let's say you last ran the job at 12:01 on April 28; the current run would normally ask for records changed since 12:01 April 28. With a prune time of 12 hours, it will instead ask for all data changed since 00:01 April 28 (offset by 12 hours).

              I am guessing the second option is what you want.
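              The prune-time behavior in option 2 is just a fixed subtraction from the last refresh timestamp; this Python sketch mirrors the arithmetic from the example above (an illustration of the described behavior, not Data Sync code):

```python
from datetime import datetime, timedelta

# Example above: job last ran at 12:01 (noon) on April 28,
# with a prune time of 12 hours configured.
last_refresh = datetime(2019, 4, 28, 12, 1)
prune_time = timedelta(hours=12)

# Data Sync asks for records changed since (last refresh - prune time).
extract_since = last_refresh - prune_time
print(extract_since)  # 2019-04-28 00:01:00
```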

              • Brian Hall

                Thank you so much!!! This is a great help. BTW, is this information documented anywhere?

                • Vijay Ranganathan

                  It's a not-so-common use case, so it's probably not documented clearly. But this is the precise use case for the prune time concept. Hope you find this functionality useful.

                  • Brian Hall

                    One more question. I set the prune time yesterday and my process ran this morning. Is there a log file that will tell me what refresh date/time was used? If all went well, it should show 12:04 AM 04/30/2019, not 09:04 AM 04/30/2019.

                    • Vijay Ranganathan

                      Go to the Jobs->Runs->Tasks subtab. Choose the row you are interested in and click Details. Choose the row that reads INSERT_UPDATE_DATACOPY, then double-click Additional Info/Log. A log file name is published like so:

                      D:\datasync\log\ReadFromOracle-Oracle12c.25944081\CR_AB_DATA_FROM_OCS-AB_OCS_TARGET.20190430.1020.log

                      That log file will contain the filter criteria used.

    • Brian Hall

      We want to alter the SOURCE_REFRESH_TIMESTAMP and TARGET_REFRESH_TIMESTAMP from their set time (7:00 AM) to 12:01 AM for each FACT table.

      • Vijay Ranganathan

        You cannot update it from outside. Manually, yes - you can go to Connections->Refresh Dates and update the value. But this will not be sustainable, as you would be doing it prior to every run. It is best to use the prune time concept at the job level; it will apply to all the tables being populated in a project.

    • Brian Hall

      Thanks for all your help!