Big Data

Get Involved. Join the Conversation.

Comments

  • Craig Decker

    JP,

    I am really interested in CDH 6.1 due to the erasure coding support. Can you give an update on Oracle's plans to make this available on the BDA?

    Thanks!

  • Greg Haskell

    JP - We're trying to plan our next upgrade from BDA 4.11 (CDH 5.13), and I had a couple of questions I was hoping you might have some feedback on. Before continuing, I should mention that we have an on-premises BDA, not cloud, but I'm hoping these questions still apply to on-prem.
     
    1) In your first post, you mention that the last CDH 5 based release will be BDA 4.14 which will be CDH 5.15.x. However, BDA 4.13 is already on CDH 5.15. Was this just a typo or is the version of CDH really not changing from BDA 4.13 to BDA 4.14? 

    2) I see from Cloudera's site that there is no upgrade path to CDH 6.0.x from CDH 5.15 and higher. So this means that if we upgrade to BDA 4.13 or higher, we'll have to remain on CDH 5 until a BDA release comes out that's compatible with CDH 6.1. With that in mind, do you have any sort of guess as to when we might anticipate a BDA release that's compatible with CDH 6.1? (2Q2019, 3Q2019, "impossible to say or share at this point :-)", etc.)

    Thanks!

  • Alexey Filanovskiy

    Thank you for a good question.

    Oracle RDBMS 12.2 allows you to create partitioned external tables, and you can use this feature to achieve your goal.

    Let me show how you can maintain incremental statistics for a partitioned Hive table with Big Data SQL.

    Note 1: this example is based on the TPC-DS dataset.

    Note 2: to simplify the explanation, the scripts below are prefixed with "hive>" for statements that should be run in Hive and "SQL>" for the Oracle RDBMS part.

    1) Create a partitioned table in Hive:

    ----------------------------------------------------------------------

    hive> drop table parq.store_sales_part;

     CREATE TABLE parq.store_sales_part(

      ss_sold_date_sk bigint, 

      ss_sold_time_sk bigint, 

      ss_item_sk bigint, 

      ss_customer_sk bigint, 

      ss_cdemo_sk bigint,

      ss_hdemo_sk bigint, 

      ss_addr_sk bigint, 

      ss_store_sk bigint,

      ss_promo_sk bigint, 

      ss_ticket_number bigint,

      ss_quantity int, 

      ss_wholesale_cost decimal(7,2), 

      ss_list_price decimal(7,2),

      ss_sales_price decimal(7,2), 

      ss_ext_discount_amt decimal(7,2),

      ss_ext_sales_price decimal(7,2),

      ss_ext_wholesale_cost decimal(7,2), 

      ss_ext_list_price decimal(7,2),

      ss_ext_tax decimal(7,2), 

      ss_coupon_amt decimal(7,2),

      ss_net_paid decimal(7,2), 

      ss_net_paid_inc_tax decimal(7,2),

      ss_net_profit decimal(7,2))

      partitioned by (

      year INT,

      month INT,

      day INT)

    stored as parquetfile;

    ----------------------------------------------------------------------

    2) Load some data into this table (into 2 partitions):

    ----------------------------------------------------------------------

    hive> SET hive.exec.dynamic.partition=true;

     SET hive.exec.dynamic.partition.mode=nonstrict;

     SET hive.exec.max.dynamic.partitions =10000;

     INSERT INTO TABLE parq.store_sales_part PARTITION (year, month, day)

     SELECT store_sales.*, dt.d_year, dt.d_moy, dt.d_dom

    FROM  parq.store_sales, parq.date_dim dt

    WHERE

    dt.d_date_sk = store_sales.ss_sold_date_sk

    and d_year=2003

    and d_moy=01

    and (d_dom=01 or d_dom=02);

    ----------------------------------------------------------------------

    3) Obtain the DDL for the external table and create it in Oracle RDBMS:

    ----------------------------------------------------------------------

    SQL> set serveroutput on;

    DECLARE

       DDLtxt CLOB;

    BEGIN

       dbms_hadoop.create_extddl_for_hive(

         CLUSTER_ID=>'pmcluster',

         DB_NAME=>'parq',

         HIVE_TABLE_NAME=>'store_sales_part',

         HIVE_PARTITION=>TRUE,

         TABLE_NAME=>'ora_store_sales_part',

         PERFORM_DDL=>FALSE,

         TEXT_OF_DDL=>DDLtxt

       );

       dbms_output.put_line(DDLtxt);

    END;

    /

     

    SQL> CREATE TABLE "BDSQL"."ORA_STORE_SALES_PART" (ss_sold_date_sk NUMBER(19), ss_sold_time_sk NUMBER(19), ss_item_sk NUMBER(19), ss_customer_sk NUMBER(19), ss_cdemo_sk NUMBER(19), ss_hdemo_sk NUMBER(19), ss_addr_sk NUMBER(19), ss_store_sk NUMBER(19), ss_promo_sk NUMBER(19), ss_ticket_number NUMBER(19), ss_quantity NUMBER(10), ss_wholesale_cost NUMBER(7,2), ss_list_price NUMBER(7,2), ss_sales_price NUMBER(7,2), ss_ext_discount_amt NUMBER(7,2), ss_ext_sales_price NUMBER(7,2), ss_ext_wholesale_cost NUMBER(7,2), ss_ext_list_price NUMBER(7,2), ss_ext_tax NUMBER(7,2), ss_coupon_amt NUMBER(7,2),

    ss_net_paid NUMBER(7,2), ss_net_paid_inc_tax NUMBER(7,2), ss_net_profit NUMBER(7,2), year NUMBER(10), month NUMBER(10), day NUMBER(10))  ORGANIZATION EXTERNAL

     (TYPE ORACLE_HIVE

     DEFAULT DIRECTORY DEFAULT_DIR

     ACCESS PARAMETERS (

    com.oracle.bigdata.cluster=pmcluster

    com.oracle.bigdata.tablename=parq.store_sales_part)

    ) REJECT LIMIT UNLIMITED

    PARTITION BY LIST (year, month, day)

    (

    PARTITION "P_484070065" VALUES (2003, 1, 1),

    PARTITION "P_288436660" VALUES (2003, 1, 2)

    );

    ----------------------------------------------------------------------
    4) Gather statistics for the newly created table:
    ----------------------------------------------------------------------

    SQL> begin

        DBMS_STATS.GATHER_TABLE_STATS (ownname => 'BDSQL',

                                       tabname => 'ORA_STORE_SALES_PART');

    end;

    /

    ----------------------------------------------------------------------
    5) Verify that statistics were gathered:
    ----------------------------------------------------------------------

    SQL> select PARTITION_NAME, LAST_ANALYZED from ALL_TAB_STATISTICS

    where

    owner=upper('BDSQL')

    and table_name='ORA_STORE_SALES_PART';

     

    PARTITION_NAME                                                             LAST_ANALYZED

    ------------------------------------------------------------------------ -------------------

    P_288436660                                                             02-05-2019 00:13:40

    P_484070065                                                             02-05-2019 00:13:40

    ----------------------------------------------------------------------

    6) Set up the property for using incremental statistics (for more details check here):

    ----------------------------------------------------------------------

    SQL> begin

    DBMS_STATS.SET_TABLE_PREFS('BDSQL','ORA_STORE_SALES_PART','INCREMENTAL','TRUE');

    end;

    /

    ----------------------------------------------------------------------
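    As a quick sanity check (not shown in the original output), you can confirm that the preference took effect with DBMS_STATS.GET_PREFS; this is a minimal sketch based on the standard DBMS_STATS API, not output captured from this cluster:

    ```sql
    -- Returns 'TRUE' once the INCREMENTAL preference is set on the table
    SQL> SELECT DBMS_STATS.GET_PREFS('INCREMENTAL', 'BDSQL', 'ORA_STORE_SALES_PART') AS incremental
         FROM dual;
    ```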

    7) Add a partition to the Hive table:

    ----------------------------------------------------------------------

    hive> SET hive.exec.dynamic.partition=true;

     SET hive.exec.dynamic.partition.mode=nonstrict;

     SET hive.exec.max.dynamic.partitions =10000;

     INSERT INTO TABLE parq.store_sales_part PARTITION (year, month, day)

     SELECT store_sales.*, dt.d_year, dt.d_moy, dt.d_dom

     FROM  parq.store_sales, parq.date_dim dt

    WHERE

    dt.d_date_sk = store_sales.ss_sold_date_sk

    and d_year=2003

    and d_moy=01

    and d_dom=03;

    ----------------------------------------------------------------------

    8) Add the corresponding partition to the Oracle table:

    ----------------------------------------------------------------------

    SQL> alter table BDSQL.ORA_STORE_SALES_PART add partition "NEW_PARTITION" VALUES (2003, 1, 3);

    ----------------------------------------------------------------------
    9) Make sure that there are no statistics for this newly added partition:
    ----------------------------------------------------------------------

    SQL> select PARTITION_NAME, LAST_ANALYZED from ALL_TAB_STATISTICS

    where

    owner=upper('BDSQL')

    and table_name='ORA_STORE_SALES_PART';

     

    PARTITION_NAME                                                             LAST_ANALYZED

    ------------------------------------------------------------------------ -------------------

    P_288436660                                                             02-05-2019 00:13:40

    P_484070065                                                             02-05-2019 00:13:40

    NEW_PARTITION

    ----------------------------------------------------------------------

    10) Run the procedure for gathering statistics (it may happen automatically during your maintenance window):

    ----------------------------------------------------------------------

    SQL> begin

        DBMS_STATS.GATHER_TABLE_STATS (ownname => 'BDSQL',

                                       tabname => 'ORA_STORE_SALES_PART');

    end;

    /

    ----------------------------------------------------------------------

    11) Validate that LAST_ANALYZED changed only for the new partition:

    ----------------------------------------------------------------------

    SQL> select PARTITION_NAME, LAST_ANALYZED from ALL_TAB_STATISTICS

    where

    owner=upper('BDSQL')

    and table_name='ORA_STORE_SALES_PART';

     

    PARTITION_NAME                                                             LAST_ANALYZED

    ------------------------------------------------------------------------ -------------------

    P_288436660                                                             02-05-2019 00:13:40

    P_484070065                                                             02-05-2019 00:13:40

    NEW_PARTITION                                                           02-05-2019 00:24:26

    ----------------------------------------------------------------------
    This output shows that the statistics-gathering procedure doesn't touch the previously analyzed partitions and only processes the new one.
    Hope this is what you wanted to achieve.
  • Jean-Pierre Dijcks

    One more correction on the above: it looks like we are going to go with CDH 6.0.1, as 6.0.2 is not yet available. We're rounding out the release and hope to announce availability shortly.

    JP

  • candan pehlivanoglu

    Sorry, we were on holiday and I didn't see this comment. My fault. I've uploaded it.

     

    CREATE EXTERNAL TABLE `firmarec`(
      `snpst_dt` timestamp,
      `src_dt` timestamp,
      `upd_dt` timestamp,
      `hex_payload` string,
      `frsbkd` string,
      `frtelno` string,
      `fr_konharf` string,
      `fr_durumkodu` string,
      `fr_yetkikodu` string,
      `fr_adi1` string,
      `fr_adi2` string,
      `fr_adres1` string,
      `fr_adres2` string,
      `fr_telefon_1` bigint,
      `fr_telefon_2` bigint,
      `fr_dogum_tarihi_yyag` bigint,
      `fr_dogum_yeri` string,
      `fr_baba_adi` string,
      `fr_kisilikkd` string,
      `fr_kisiliknev` int,
      `fr_merkilkod` int,
      `fr_yer_kodu` string,
      `fr_kit_kodu` string,
      `fr_nitelik_kodu` string,
      `fr_tescil_durumu` string,
      `fr_istirak_kodu` string,
      `fr_istigal_harfi` string,
      `fr_musteri_no` bigint,
      `fr_vergi_dai_kod` string,
      `fr_vergi_dai_sicilno` string,
      `fr_vkn` string,
      `fr_ulk_kodu` string,
      `fr_memzuc_kodu` string,
      `fr_sicilno` string,
      `fr_kib_sicilno` string,
      `fr_tic_sic_memkd` string,
      `fr_kib_sic_turu` string,
      `fr_tic_sic_ilcekd` string,
      `fr_kib_sehir_kodu` string,
      `fr_kredi_grubu` string,
      `fr_provizyon_kodu` string,
      `fr_prov_veren_fir_anah` bigint,
      `fr_prov_veren_fir_anah_eski` bigint,
      `fr_prov_iptal_yyag` bigint,
      `fr_birlesen_fir_anah` bigint,
      `fr_acilis_tarihi_yyag` bigint,
      `fr_son_kullandirma_yyag` bigint,
      `fr_son_update_tarihi_yyag` bigint,
      `fr_onc_son_kullandirma_yyag` bigint,
      `fr_istihbarat_tarihi_yyag` bigint,
      `fr_ilk_tahsis_tarihi_yyag` bigint,
      `fr_risk_yyyyaa` bigint,
      `fm_isl_yyyyaa` bigint,
      `fr_memzuc_geri_bildirim_yyyyaa` bigint,
      `fr_bankamiz_yyyyaa` bigint,
      `fr_memzuc_bankamiz_yyyyaa` bigint,
      `fr_tasfiye_tarihi_yyag` bigint,
      `fr_tasfiye_nedeni` string,
      `fr_yeniden_yapilandirma` string,
      `fr_firma_turu` string,
      `fr_yil_istisna` string,
      `fr_ay_istisna` string,
      `fr_cek_ith_ihr_kodu` string,
      `fr_kategori` string,
      `fr_yeni_kisilik_nevi` string,
      `fr_eski_yetkikodu` string,
      `fr_yeni_nitelik_kodu` int,
      `fr_sakla_yetkikodu` string,
      `fr_devir_nedeni` string,
      `fr_gv_kod` string,
      `fr_gv_hata_kod` string,
      `fr_kanuni_takip_varmi` string,
      `fr_kredi_kullaniyormu` string,
      `fr_sirano` bigint,
      `fr_son_teminat_no` int,
      `fr_arsivtar` bigint,
      `fr_bch_idaritakip` string,
      `fr_bch_kanunitakip` string,
      `fr_dbmussnc_flag` string,
      `fr_devir_flag` string,
      `fr_f2_ypdos_flag` string,
      `fr_yhkskd` string,
      `fr_yp_eslemekd` string,
      `fr_tuketici_kodu` string,
      `fr_kredi_teklif_sayisi` int,
      `fr_yetki_flag` string,
      `fr_nm_tasfiye_limit_yp` decimal(20,2),
      `fr_nm_tasfiye_risk_yp` decimal(20,2),
      `fr_aysonu_toplam_limit` decimal(20,2),
      `fr_convert_flag` string,
      `fr_aysonu_firma_turu` string,
      `fr_nm_rotatif_limit_tp` decimal(20,2),
      `fr_nm_rotatif_risk_tp` decimal(20,2),
      `fr_nm_rotatif_limit_yp` decimal(20,2),
      `fr_nm_rotatif_risk_yp` decimal(20,2),
      `fr_nm_tasfiye_limit_tp` decimal(20,2),
      `fr_nm_tasfiye_risk_tp` decimal(20,2),
      `fr_rotatif_limit_tp` decimal(20,2),
      `fr_rotatif_limit_tp_yyag` bigint,
      `fr_rotatif_risk_tp` decimal(20,2),
      `fr_rotatif_limit_yp` decimal(20,2),
      `fr_rotatif_limit_yp_yyag` bigint,
      `fr_rotatif_risk_yp` decimal(20,2),
      `fr_tasfiye_limit_tp` decimal(20,2),
      `fr_tasfiye_limit_tp_yyag` bigint,
      `fr_tasfiye_risk_tp` decimal(20,2),
      `fr_tasfiye_limit_yp` decimal(20,2),
      `fr_tasfiye_limit_yp_yyag` bigint,
      `fr_tasfiye_risk_yp` decimal(20,2),
      `fr_risk_tavani_usd` decimal(20,2),
      `fr_risk_tavani_usd_yyag` bigint,
      `fr_nakdi_risk_yp` decimal(20,2),
      `fr_nakdi_risk_tp` decimal(20,2),
      `fr_bch_risk` decimal(20,2),
      `fr_iskonto_istira_risk` decimal(20,2),
      `fr_sirket_kredi_karti_risk` decimal(20,2),
      `fr_tuketici_risk` decimal(20,2),
      `fr_ticari_ek_hesap_risk` decimal(20,2),
      `fr_gayrinakdi_risk_yp` decimal(20,2),
      `fr_gayrinakdi_risk_tp` decimal(20,2),
      `fr_tazmin_tp` decimal(20,2),
      `fr_tazmin_yp` decimal(20,2),
      `fr_ilk_kr_tahsistar_yyag` bigint,
      `fr_vkn_tckn` bigint,
      `fr_tckn` bigint,
      `fr_gecen_aysonu_rotlim_tl` decimal(20,2),
      `fr_gecen_aysonu_rotlim_yp` bigint,
      `fr_gecen_aysonu_taslim_tl` decimal(20,2),
      `fr_gecen_aysonu_taslim_yp` bigint,
      `fr_upd_check_str` string,
      `fr_upd_date_time` bigint,
      `fr_risk_tavani_dvzkd` string,
      `fr_rating_sektor_kodu` string,
      `fr_gecen_aysonu_top_nak_tl` decimal(20,2),
      `fr_gecen_aysonu_top_nak_yp` bigint,
      `fr_gecen_aysonu_top_gnak_tl` decimal(20,2),
      `fr_gecen_aysonu_top_gnak_yp` bigint,
      `fr_vob_risk_tp` decimal(20,2),
      `fr_kyp_son_onay_tarihi_yyag` bigint,
      `fr_kyp_son_onay_saati_hhmmssff` bigint,
      `fr_prop_validity_start_yyag` bigint,
      `fr_prop_validity_end_yyag` bigint,
      `fr_active_proposal_flag` string,
      `fr_active_switching_flag` string,
      `fr_active_provizyon_flag` string,
      `fr_customer_transfer_flag` string,
      `fr_son_teklif_onay_birimi` string,
      `fr_kyp_son_onay_merci_kodu` string,
      `fr_kyp_son_onayli_proposal_id` string,
      `fr_kyp_pilot_flag` string,
      `fr_santral_devri_flag` string,
      `fr_tic_krd_cnt` int,
      `fr_bir_krd_cnt` int,
      `fr_tasima_kodu` string,
      `fr_nace_istigal` string,
      `fr_nace_kisim` string,
      `fr_nace_bolum` string,
      `fr_nace_grup1` string,
      `fr_nace_grup2` string,
      `fr_nace_grup3` string,
      `fr_nace_grup4` string,
      `fr_birlesen_yyag` bigint,
      `fr_ciro_gecen_yil` bigint,
      `fr_ciro_2yil_once` bigint,
      `fr_calisan_sayisi` bigint,
      `fr_aktiflik_gecen_yil` bigint,
      `fr_aktiflik_2yil_once` bigint,
      `fr_ciro_ismas_ilkgunc_yyag` bigint,
      `fr_ciro_ismas_gecyil_flag` string,
      `fr_ciro_ismas_2yilon_flag` string,
      `fr_vkn_tckn_kod` string,
      `fr_musno_degis_flag` string,
      `fr_rttf_convert` string,
      `fr_reserve_limit_tp` decimal(20,2),
      `fr_reserve_limit_yp` decimal(20,2),
      `fr_reserve_risk_tp` decimal(20,2),
      `fr_reserve_risk_yp` decimal(20,2),
      `fr_filler` string,
      `segckey` string)
    PARTITIONED BY (
      `y` int,
      `m` int,
      `d` int)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION
      'hdfs://bda-ns/data/hive/warehouse/etloffloading.db/firmarec'
    TBLPROPERTIES (
      'transient_lastDdlTime'='1545053179')
    Time taken: 0.257 seconds, Fetched: 194 row(s)

  • Marty Gubar

    Hi Candan -

    This looks like a Hive table over a CSV table. Any chance you have the DDL for the Parquet-based table?

    Thanks!

    Marty

  • candan pehlivanoglu

    OK, I've attached it.

    hive_table.sql (797 Bytes)
  • Marty Gubar

    Nice job sorting this out Candan :)

    I'm curious - would you mind posting the Hive DDL and the corresponding Oracle DDL?  This could be helpful to others.

    Thanks!

    Marty

  • candan pehlivanoglu

    Sorry for the delayed answer, Marty.

    Yes, you are right. Some queries were working properly, but for some tables it didn't work. I requested a code change from our team, and the parameter below fixed the problem:

    --conf "spark.sql.parquet.writeLegacyFormat=true"

     

     

      

     
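    For context, that flag is applied when Spark writes the Parquet files. A minimal sketch of how it would be passed on spark-submit (the job script name here is hypothetical):

    ```shell
    # Hypothetical job script name; the flag makes Spark write decimals and
    # timestamps in the legacy Parquet layout that Hive and Big Data SQL expect.
    spark-submit \
      --conf "spark.sql.parquet.writeLegacyFormat=true" \
      etl_offload_job.py
    ```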

  • Jean-Pierre Dijcks

    I've added a new post on the forum about the release timelines. That way it is hopefully easier for people to find the right information.

    JP

  • Jean-Pierre Dijcks

    We are finalizing dates and will update everyone on Friday with more specifics. Right now we expect 5.16.2 to arrive around March, as it looks like CDH 6 will slip to mid-January.

    JP

  • Chris Peart

    Hi Jean-Pierre,

    Thanks for the detailed response. In answer to your questions, we are planning to upgrade to CDH 6 for the following reasons:

    1. Apache Hive 2.1: performance enhancements and fixes for security vulnerabilities present in Hive 1.1.0.
    2. Hadoop 3: lower storage overhead, NameNode HA (multiple standbys), and containerization technologies such as Docker.

    These are the two main reasons for upgrading to CDH 6.1 for now, but I'm sure there will be many more.

     

    BTW, we had a call with Alexey Filanovskiy on Friday which proved very beneficial. We are probably going to wait for CDH 6 to settle down and become more stable, so we will likely hold out for CDH 6.1 or a later, more stable 6.x release.

    We are still planning to upgrade our BDA 4.7 cluster in January 2019, so it would be good to know when 5.16.2 comes out, as it may be worth waiting a few more weeks for the final cut of CDH 5.x.

    Looking forward to your updates on version releases.

    Many Thanks,

    Chris

     

  • Marty Gubar

    Hi Candan -

    Can you provide some more details about the error?  Are you saying that some queries offload without retries - while other queries do see retries?  If that's the case, can you share the ddl for those tables that are showing retries?

    Thanks.
    Marty

  • Jean-Pierre Dijcks

    Hi Chris,

    Q: Looking at the install/upgrade notes for CDH 6.0 it states upgrades from CDH 5.15 are not supported.

    A: Correct... (which you knew, but just confirming for the community)

    Q: We at DAZN are looking to re-image a BDA 4.7 appliance in mid-January 2019. We would like to use BDA 4.14 with a view to upgrading to CDH 6.0; are you still on course to release this version by mid-December?

    A: No. We are knee-deep in two major projects, both to be delivered in the coming weeks: 1) uptake of CDH 6.0.1 and 2) online migration from OL6 to OL7 on the BDA clusters. The plan is to release CDH 6 before the Xmas break, and OL7 after the break in January. CDH 5.16.0 was released and pulled, and quickly replaced with 5.16.1 at the end of November. Because of the urgency of the two projects mentioned, we will only start the 5.16.1 or .2 work once one of these finishes. We are also thinking of spending a bit more time on testing with 5.16, as it will be the terminal release for 5.x. So currently we anticipate that we will get 5.16.2 to your team in the February/March timeframe.

    Certainly open to hearing your feedback on this.

    Q: Can you confirm if BDA 4.14 will support upgrade to CDH 6.0?

    A: Not yet; we need to discuss. Here we'd love to understand your direction. As I said, we are putting out 6.0.1 ASAP. Would you want to go to 6.x ASAP? If so, would you mind sharing your reasoning? We're trying to figure out whether 6.1 or 6.0 makes more sense.

    Hopefully this gives you some insight. And as said, please do share your thoughts.

    Thanks,

    JP

  • Chris Peart

    Hi Jean-Pierre,

    Looking at the install/upgrade notes for CDH 6.0 it states upgrades from CDH 5.15 are not supported.

    We at DAZN are looking to re-image a BDA 4.7 appliance in mid-January 2019. We would like to use BDA 4.14 with a view to upgrading to CDH 6.0; are you still on course to release this version by mid-December?

    Can you confirm if BDA 4.14 will support upgrade to CDH 6.0?

    Many Thanks,

    Chris