Reporting and Analytics for Service

Topic

    Topic posted February 4, 2019 by Guru Perumal
    Title:
    Custom report extract from Oracle Service Cloud
    Summary:
    Custom report extract from Oracle Service Cloud
    Content:

    Hi There,

    We have a custom report created in Oracle RightNow that reads data from multiple tables (six of them, I believe).

    If we run the extract manually for all the records, I get the error below:

    "The report could not be processed because it exceeds the query size limitation, please reduce the amount of data processed by the report". 

    I modified the filter so that it fetches a maximum of 10,000 records, and that works fine. However, I have to extract approximately 1 million records, and performing this task manually is tedious.

     

    To avoid this, I have written custom PHP code that runs the report over ranges that each fetch fewer than 10,000 records, and it works fine. For example, my report has filters on incident number (start and end), so I always keep the start/end range under 10,000 records. The code accepts a start and end range covering up to 200,000 records, splits it into batches of 10,000, runs the extract for each batch, and saves the records to a file, which I then download to my local machine.
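
    Roughly, the batching logic looks like the sketch below (simplified; the REST endpoint version, report ID, and filter names are placeholders rather than my actual report definition):

    <?php
    // Simplified sketch of the batching loop. The report ID, filter names,
    // and REST version below are placeholders for illustration only.
    $site     = 'mysite.custhelp.com';
    $reportId = 100001;   // placeholder report ID
    $batch    = 10000;    // stay under the query size limitation
    $start    = 1;        // first incident number in the range
    $end      = 200000;   // last incident number in the range

    $out = fopen('extract.csv', 'a');
    for ($lo = $start; $lo <= $end; $lo += $batch) {
        $hi = min($lo + $batch - 1, $end);
        $body = json_encode([
            'id'      => $reportId,
            'filters' => [
                ['name' => 'start_incident', 'values' => [(string) $lo]],
                ['name' => 'end_incident',   'values' => [(string) $hi]],
            ],
        ]);
        $ch = curl_init("https://$site/services/rest/connect/v1.3/analyticsReportResults");
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => $body,
            CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
            CURLOPT_USERPWD        => getenv('OSVC_ADMIN') . ':' . getenv('OSVC_PASSWORD'),
            CURLOPT_RETURNTRANSFER => true,
        ]);
        $result = json_decode(curl_exec($ch), true);
        curl_close($ch);
        foreach ($result['rows'] ?? [] as $row) {
            fputcsv($out, $row);   // one CSV line per report row
        }
    }
    fclose($out);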

    The challenge here is that this works only when the range I provide extracts up to 30-40k records (about 20 MB); if the record count increases beyond that, it fails.

    Any inputs are much appreciated. Thanks.

     

    Abhi

    Version:
    Oracle Service Cloud 18C

    Comment

     

    • Rajan Davis

      Hi Abhi,

      Can you attach this report?

      I would recommend using the REST API to pull this information instead. If you use the osvc-crest-api-access-token, you can have long-running scripts that should be able to pull a million records fairly easily (depending on how many columns you are retrieving).
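
      For illustration, here is a rough PHP sketch of that pattern. The exact way the session token comes back and is re-sent is my assumption here, so treat it as a starting point rather than a reference implementation:

      <?php
      // Authenticate once with basic auth, capture the session token from the
      // response headers, and reuse it for the rest of a long-running script.
      $site = 'mysite.custhelp.com';   // placeholder site

      function osvcGet($url, array $headers) {
          $ch = curl_init($url);
          curl_setopt_array($ch, [
              CURLOPT_HTTPHEADER     => $headers,
              CURLOPT_RETURNTRANSFER => true,
              CURLOPT_HEADER         => true,   // keep response headers in the output
          ]);
          $raw  = curl_exec($ch);
          $size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
          curl_close($ch);
          return [substr($raw, 0, $size), substr($raw, $size)];
      }

      // First request: basic auth; grab the token from the response headers.
      $auth = 'Authorization: Basic ' . base64_encode(getenv('OSVC_ADMIN') . ':' . getenv('OSVC_PASSWORD'));
      list($respHeaders, $respBody) = osvcGet(
          "https://$site/services/rest/connect/v1.3/queryResults?query=" .
              rawurlencode('SELECT count(incidents.id) as total FROM incidents'),
          [$auth]
      );
      preg_match('/osvc-crest-api-access-token:\s*(\S+)/i', $respHeaders, $m);
      $token = $m[1] ?? '';

      // Subsequent requests: send the token instead of the credentials.
      list(, $respBody) = osvcGet(
          "https://$site/services/rest/connect/v1.3/queryResults?query=" .
              rawurlencode('SELECT incidents.id FROM incidents LIMIT 10000'),
          ["osvc-crest-api-access-token: $token"]
      );
      echo $respBody;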

      If you can provide an example of what you need to pull, I can write an example script in Node, Python, C#, or PHP.

      Kind Regards,

      Rajan

      • Guru Perumal

        Thanks, Rajan.

         

        I need to pull the data from a custom report, and my requirement is to pull up to 1.5 million records.

         

        Please let me know what additional information is needed, and I can provide the details. Thanks.

        • Rajan Davis

          Hi Abhi,

          It looks like the latest version of the REST API supports bulk extracts. I'm going to take a look at this, but if you're not on the latest version, the approach below should work.

          After doing some research, I have a few different methods to propose, depending on what type of data is being pulled from the reports. This response is kind of long, but hopefully something in here will help you out.

          There are a few issues that I see for doing a bulk report/data export:

          1. If you are trying to get blob/freeform text data, it can really slow down any exporting. I haven't figured out a good way around this, but if you are trying to work with this data, it might make more sense to export it to an external database when it is created, using an asynchronous CPM (a rough sketch follows this list).
          2. It is essentially impossible to know in advance the total number of rows a report will return, as this is not exposed in either the SOAP or REST APIs. If this were accessible, you could get a bit more creative with the approach, although it wouldn't make anything easier.
          3. Finally, even if you know the number of results, you don't always know the size of the data that's coming in. Thread text has a tendency to be very large, so even if you can get 20,000 rows, the byte size of the data can be so large that the request doesn't complete (more on this below).
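
          As a rough illustration of the CPM idea in point 1: the sketch below pushes thread text to a placeholder external endpoint as incidents are saved. The class name and endpoint are made up, and a real deployment also needs the usual CPM test harness class.

          <?php
          /**
           * CPMObjectEventHandler: ThreadExporter
           * Package: OracleServiceCloud
           * Objects: Incident
           * Actions: Create, Update
           * Version: 1.2
           */
          // Sketch of an asynchronous CPM that pushes thread text to an external
          // store as incidents are created/updated, so bulk exports never have to
          // pull blob data back out of the report database.
          use \RightNow\CPM\v1 as RNCPM;

          class ThreadExporter implements RNCPM\ObjectEventHandler
          {
              public static function apply($runMode, $action, $incident, $cycle)
              {
                  if ($cycle !== 0) return;   // skip CPM-triggered re-saves

                  \load_curl();               // OSvC requires loading curl explicitly
                  foreach ($incident->Threads as $thread) {
                      $payload = json_encode([
                          'incident_id' => $incident->ID,
                          'text'        => $thread->Text,
                      ]);
                      // Placeholder endpoint for the external database/service.
                      $ch = curl_init('https://example.com/thread-archive');
                      curl_setopt_array($ch, [
                          CURLOPT_POST           => true,
                          CURLOPT_POSTFIELDS     => $payload,
                          CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
                          CURLOPT_RETURNTRANSFER => true,
                      ]);
                      curl_exec($ch);
                      curl_close($ch);
                  }
              }
          }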

          For the examples I will give, I am using a custom command-line tool that I built to demonstrate the different options, as well as a tool called jq for working with JSON.

          1. If the data does NOT contain blob/freeform text data, you can probably use a ROQL query against the report database instead. I did a quick test with the REST API, and you can pull 100,000 records per request, assuming that the size of the data is not too large:

          First, I ran a query to check the number of threads:

          $ osvc-rest query "SELECT COUNT(incidents.threads.threadList.id) as thread_count FROM incidents" -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE
          

          This returns the following parsed JSON:

          [
            {
              "thread_count": 155643
            }
          ]
          

          Secondly, I will run a query to get thread information (without the actual thread text) and pipe the output to a file:

          $ osvc-rest query "USE REPORT; SELECT incidents.threads.threadList.id as thread_id, incidents.threads.threadList.createdTime as created_time, incidents.threads.threadList.account.lookupName as account_name, incidents.threads.threadList.entryType.lookupName as entry_type, incidents.threads.threadList.contact.lookupName as contact_name from incidents" -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE >> threads_info.json
          

          After the file has been created, I can use jq to tell me how many records have been returned with the following command:

          $ cat threads_info.json | jq "length"
            100000
          

          It will return 100000. To get the remaining records, I can run a similar command to the one before, but piped to a different file and with a limit and offset of 100,000 added to the query:

          $ osvc-rest query "USE REPORT; SELECT incidents.threads.threadList.id as thread_id, incidents.threads.threadList.createdTime as created_time, incidents.threads.threadList.account.lookupName as account_name, incidents.threads.threadList.entryType.lookupName as entry_type, incidents.threads.threadList.contact.lookupName as contact_name from incidents limit 100000 offset 100000" -u
          $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE >> threads_info2.json
          

          To combine the files, I can use the following jq command; it is basically combining the files and removing any duplicates:

          $ jq -s '[.[][]] | unique' threads_info*.json >> threads_info_combined.json
          

          When I check the length of the file, I should get the same count as the first command:

          $ cat threads_info_combined.json | jq "unique | length"
          155643
          

          2. If you are trying to pull report data with freeform text/blob type data, using ROQL and the report database will not work as effectively. What I have found is that if a REST response has a large byte size, the REST server will not respond properly.

          Here is what happens when I try to add text data to one of the queries above:

          Request:

          $ osvc-rest query "USE REPORT; SELECT incidents.threads.threadList.id as thread_id, incidents.threads.threadList.createdTime as created_time, incidents.threads.threadList.account.lookupName as account_name, incidents.threads.threadList.entryType.lookupName as entry_type, incidents.threads.threadList.contact.lookupName as contact_name, incidents.threads.threadList.text as thread_text from incidents" -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE
          

          Response:

          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
              <title>There is a an error with your request</title>
              <meta http-equiv="content-type" content="text/html; charset=utf-8" />
          </head>
          
          <body>
            <p>There has been an error with your request.</p>
          </body>
          </html>
          

          In this case, there are a couple of options, the first of which is to create a custom report and keep making requests until all of the rows have been collected. Here is how to do that with my command line tool:

          1. To retrieve JSON: 

          $ osvc-rest report --id 100562 --utc -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE -v latest -a "Dumping Threads" --debug
          

          2. To return a CSV named "threads_dump.csv": 

          $ osvc-rest report --id 100562 --utc --csv "threads_dump" -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE -v latest -a "Dumping Threads" --debug
          

          What the above commands will do is:

          1. Make an initial HTTP request.
          2. If the first request returns 10,000 results, take the access-token header, add it to the request headers, and make another HTTP request with the offset and limit set to 10,000.
          3. The requests keep going until the number of results is less than 10,000, and the tool will either return a large JSON array (the default) or create a CSV file named "threads_dump" and append to it for each request (that loop is sketched in PHP after this list).
          4. The "--debug" flag will show the request headers being sent for each request so that you can see how many requests you have made.

          Because there are two blob fields in this report (threads.note and threads.mail_header), this takes an extremely long time to run. If you were trying to get 1.5 million records, it would probably take about an hour and a half to run...


          3. This is something I haven't fully implemented in my command line tool, but I think the best way to perform a bulk export, regardless of the data type, would be to use some form of concurrency.

          My command line tool supports concurrent queries; here is a modified version of the first example:

          $ osvc-rest query "USE REPORT; SELECT incidents.threads.threadList.id as thread_id, incidents.threads.threadList.createdTime as created_time, incidents.threads.threadList.account.lookupName as account_name, incidents.threads.threadList.entryType.lookupName as entry_type, incidents.threads.threadList.contact.lookupName as contact_name from incidents" "USE REPORT; SELECT incidents.threads.threadList.id as thread_id, incidents.threads.threadList.createdTime as created_time, incidents.threads.threadList.account.lookupName as account_name, incidents.threads.threadList.entryType.lookupName as entry_type, incidents.threads.threadList.contact.lookupName as contact_name from incidents LIMIT 100000 OFFSET 100000" --concurrent -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE -v latest -a "Using multiple queries" >> threads_info_concurrent.json
          

          Basically, I am running two HTTP requests, with the first one getting the first 100,000 rows and the second one getting the remaining records.
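
          In plain PHP, the same fan-out could be sketched with curl_multi. The queries are shortened to a single column for readability, and I am assuming the queryResults response wraps the rows in an "items" array:

          <?php
          // Sketch: run two ROQL report-database queries concurrently and append
          // each page to the same file, mirroring the ">>" redirect above so the
          // jq "flatten | unique" step still combines the pages afterwards.
          $site    = 'mysite.custhelp.com';   // placeholder site
          $base    = "https://$site/services/rest/connect/v1.3/queryResults?query=";
          $queries = [
              'USE REPORT; SELECT incidents.threads.threadList.id as thread_id FROM incidents',
              'USE REPORT; SELECT incidents.threads.threadList.id as thread_id FROM incidents LIMIT 100000 OFFSET 100000',
          ];

          $multi   = curl_multi_init();
          $handles = [];
          foreach ($queries as $q) {
              $ch = curl_init($base . rawurlencode($q));
              curl_setopt_array($ch, [
                  CURLOPT_USERPWD        => getenv('OSVC_ADMIN') . ':' . getenv('OSVC_PASSWORD'),
                  CURLOPT_RETURNTRANSFER => true,
              ]);
              curl_multi_add_handle($multi, $ch);
              $handles[] = $ch;
          }

          do {   // drive both transfers until they finish
              curl_multi_exec($multi, $running);
              curl_multi_select($multi);
          } while ($running > 0);

          foreach ($handles as $ch) {
              $page = json_decode(curl_multi_getcontent($ch), true);
              file_put_contents('threads_info_concurrent.json',
                  json_encode($page['items'][0]['rows'] ?? []) . "\n", FILE_APPEND);
              curl_multi_remove_handle($multi, $ch);
          }
          curl_multi_close($multi);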

          To combine the results, I will use the following jq command: 

          $ cat threads_info_concurrent.json | jq -s "flatten | unique" >> threads_info_concurrent_combined.json
          

          To check that I have the correct number, I can run this command:

          $ cat threads_info_concurrent_combined.json | jq "unique | length"
          155643
          

          Ultimately, what would work would be to:

          1. Use ROQL
          2. Get the count of items
          3. Split up the items into smaller groups and run multiple requests in different threads (see the sketch after this list)
          4. Combine the results
          5. Pipe to JSON or CSV
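
          Steps 1 through 3 are mostly arithmetic. Here is a short sketch, with the row count hard-coded where a count query would normally run:

          <?php
          // Sketch of splitting a known row count into LIMIT/OFFSET windows that
          // the concurrent fetcher above would then run in parallel.
          $count = 155643;   // in practice, fetched via a count(...) ROQL query
          $batch = 20000;

          $queries = [];
          for ($offset = 0; $offset < $count; $offset += $batch) {
              $queries[] = "USE REPORT; SELECT incidents.threads.threadList.id as thread_id "
                         . "FROM incidents LIMIT $batch OFFSET $offset";
          }
          // $queries now holds ceil(155643 / 20000) = 8 windows, ready for the
          // curl_multi loop; the combined pages are then piped to JSON or CSV.
          print_r($queries);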

          I am thinking of adding something like the following command to my library to support this:

          $ osvc-rest bulk-query --select "incidents.threads.threadList.id as thread_id, incidents.threads.threadList.createdTime as created_time, incidents.threads.threadList.account.lookupName as account_name, incidents.threads.threadList.entryType.lookupName as entry_type, incidents.threads.threadList.contact.lookupName as contact_name" --from "incidents" --batch 20000 -u $OSVC_ADMIN -p $OSVC_PASSWORD -i $OSVC_SITE -v latest -a "Bulk ROQL exports"
          

          My thought is to get the count of each column from the "--select" flag and use the highest count to divide the requests by the number provided in "--batch". The requests would run concurrently and combine the results until the total number of results is fulfilled.

          I would try the first few examples provided; however, if you need to get thread text, I am looking to implement and test this last example in the next couple of weeks.

          Kind Regards,

          Raj