Comments

  • Jean-Pierre Dijcks

    One more correction on the above. It looks like we are going to go with CDH 6.0.1, as 6.0.2 is not yet available. We are rounding out the release and hope to announce its availability shortly.

    JP

  • Jean-Pierre Dijcks

    I've added a new post to the forum on the release timelines. That way it is hopefully easier for people to find the right information.

    JP

  • Jean-Pierre Dijcks

    We are finalizing dates and will update everyone on Friday with more specifics. Right now we expect 5.16.2 to arrive around March, as it looks like CDH 6 will slip to mid-January.

    JP

  • Jean-Pierre Dijcks

    Hi Chris,

    Q: Looking at the install/upgrade notes for CDH 6.0 it states upgrades from CDH 5.15 are not supported.

    A: Correct... (which you knew, but just confirming for the community)

    Q: We at DAZN are looking to re-image a BDA 4.7 appliance in mid-January 2019. We would like to use BDA 4.14 with a view to upgrading to CDH 6.0. Are you still on course to release this version by mid-December?

    A: No. We are knee-deep in two major projects, both to be delivered in the coming weeks: 1) uptake of CDH 6.0.1 and 2) online migration from OL6 to OL7 on the BDA clusters. The plan is to release CDH 6 before the Christmas break, and OL7 after the break in January. CDH 5.16.0 was released and pulled, then quickly replaced by 5.16.1 at the end of November. Because of the urgency of the two projects mentioned, we will only start the 5.16.1 or 5.16.2 work once one of them finishes. We are also planning to spend a bit more time testing 5.16, as it will be the terminal release for 5.x. So we currently anticipate getting 5.16.2 to your team in the February/March timeframe.

    We are certainly open to hearing your feedback on this.

    Q: Can you confirm if BDA 4.14 will support upgrade to CDH 6.0?

    A: Not yet; we need to discuss that. Here we'd love to understand your direction. As I said, we are putting out 6.0.1 as soon as possible. Would you want to go to 6.x right away? If so, would you mind sharing your reasoning? We are trying to figure out whether 6.1 or 6.0 makes more sense.

    Hopefully this gives you some insight. And, as I said, please do share your thoughts.

    Thanks,

    JP

  • Jean-Pierre Dijcks

    We are a little behind on Big Data Lite (BDLite), due to the switch to OL7 on the VM, among other things.

    Would a cloud solution be an option? Imagine a short-lived cluster that is small, secure and HA (e.g. like your small BDA), but in the cloud. Or is that not possible in your environment, so it would have to be on-prem?

  • Jean-Pierre Dijcks

    BDA 4.13.0 is now available for download, see the forum posts for details.

    JP

  • Jean-Pierre Dijcks

    Makes sense. I would just chart out the requirements, and then design a solution on top of that with the two BDAs.

    One additional thought, you could consider Big Data Lite VM as a functional development environment: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

    JP

  • Jean-Pierre Dijcks

    To add one more thought; I didn't want to dilute the above with a cloud pitch :-)

    If you are looking for DR of your data (as opposed to workload failover), you could consider using something like Object Storage in Oracle Cloud. BDA comes with Big Data Manager (as of 4.12), which enables drag-and-drop file transfer to Object Storage. It generates a Spark application that parallelizes the data flow to Object Storage. Once the data is there, you can easily load it into Big Data Cloud Service or, as an example, work on it with Autonomous Data Warehouse Cloud.

    Big Data Cloud Service (BDCS) could also be used as a dev environment, and you could use BDR (Cloudera Backup and Disaster Recovery) to replicate data directly into BDCS.

    Some of this is covered in this OBE (Oracle by Example) tutorial: Ingesting and Backing Up Data

    JP

  • Jean-Pierre Dijcks

    Hi Robin,

    Well, in general the answer is yes, this is technically possible. Whether it is advisable is a different matter.

    Here are some scenarios we have seen customers use:

    • Have a primary cluster (PRD) and a secondary (DR). Replicate from PRD to DR. Run analytics workloads, as well as testing, QA and performance testing, on the DR system. This way you use DR for productive work (analytics) while still having a DR copy. This assumes that replication is timely enough for the analytics to be useful to the business.
    • Have an active-active setup with data in sync across the two clusters and run workloads across both as you see fit, essentially load balancing. Customers implement this using:
      • WanDisco: a partner solution to manage the synchronization
      • Dual ingest with ETL tools, pushing the data into both clusters; the ETL tools are "responsible" for ensuring the clusters are in sync
      • Kafka in front of the clusters, with both clusters subscribing to the topics and leveraging Kafka's guaranteed delivery to ensure they are in sync (eventually)
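
    To make the "in sync (eventually)" part of the Kafka option concrete, here is a toy, in-memory Python simulation. It is not the Kafka client API: Topic, Cluster, poll and lag are hypothetical names used only for illustration. The point it shows is that each cluster tracks its own read offset (much like a separate Kafka consumer group), so a cluster that falls behind simply resumes from where it left off and ends up with the same data as the other one.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log standing in for a single-partition Kafka topic (hypothetical)."""
    records: list = field(default_factory=list)

    def publish(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly appended record


@dataclass
class Cluster:
    """Hypothetical stand-in for one BDA cluster subscribed to the topic."""
    name: str
    data: list = field(default_factory=list)
    offset: int = 0  # next offset to read; tracked independently per cluster

    def poll(self, topic, max_records=None):
        """Read from this cluster's own offset forward and apply the records."""
        end = len(topic.records)
        if max_records is not None:
            end = min(end, self.offset + max_records)
        batch = topic.records[self.offset:end]
        self.data.extend(batch)
        self.offset = end
        return len(batch)

    def lag(self, topic):
        """Number of published records this cluster has not applied yet."""
        return len(topic.records) - self.offset


topic = Topic()
prod = Cluster("PRD")
dr = Cluster("DR")

for event in ["e1", "e2", "e3", "e4"]:
    topic.publish(event)

prod.poll(topic)                        # PRD keeps up
dr.poll(topic, max_records=2)           # DR temporarily falls behind
print(prod.lag(topic), dr.lag(topic))   # 0 2

dr.poll(topic)                          # DR catches up from its own offset
print(prod.data == dr.data)             # True
```

    In a real deployment the per-cluster offsets would be committed by Kafka consumers and retention settings would bound how far a cluster may lag before it can no longer catch up; the sketch ignores both.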

    Now, one of the problems you will run into is that you do not have a cluster you can use to verify upgrades and test applications. In other words, when moving to the next CDH version on the BDA, how do you verify that everything works? If you use DR for that and the upgrade breaks something, you are in a bit of a pickle.

    Regarding development, I assume you are developing analytics, pipelines or similar. I think you can use DR for that. You will just need to make sure that none of that work EVER modifies the original data sets, which can be achieved by structuring the directories and ACLs to prevent deletes etc.

    All in all, as with all of these decisions, be mindful of saving money only to lose a lot more by breaking something. Most of our production installations are really a three-system setup: Prod, DR and Dev/Sandbox. Dev/Sandbox is used for new versions, new models, new apps, upgrades, betas etc. DR is used for analytics that are not very sensitive to data latency, and Prod for ingest, ETL, production apps etc.

    Comments welcome!

    JP

  • Jean-Pierre Dijcks

    Update: with 5.15.1 imminent and our release equally imminent, we decided to add one week to our release schedule and take up CDH 5.15.1 at the end of this week. We will then release, including that .1 update, around August 30 (tentative dates, of course).

    JP

  • Jean-Pierre Dijcks

    Hi there,

    I should have been clearer in my post. The post refers to BDA, so the on-premises version. Once that is out, it becomes available in BDCS after a short certification step. The current estimate is that the BDA release will be out mid-August.

    One gotcha on that date: it is also roughly when the next CDH update (5.15.1) is planned. If the two are close enough, we will pick up that code and delay the BDA release by a week, to ensure we have the latest stability fixes on BDA and to save customers the time of applying the .1 parcels after we release BDA.

    Hope that clarifies.

    JP

  • Jean-Pierre Dijcks

    Nice summary, thanks for the write-up.

    JP

  • Jean-Pierre Dijcks

    In general, all big data topics should come here. That enables us to cover related topics in one place and address BDA, BDCS and BDCC (Big Data Cloud at Customer), as well as architectures, hybrid setups etc. So, yes, please use this forum, and just clarify that you are asking about BDA.

    JP