Databricks brings its Delta Lake project to the Linux Foundation

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation and placing it under an open governance model. The company announced the launch of Delta Lake earlier this year and, even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data, putting it into data lakes and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose and not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”

Read more

The LF's press release

  • The Delta Lake Project Turns to Linux Foundation to Become the Open Standard for Data Lakes

    Amsterdam and San Francisco, October 16, 2019 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that it will host Delta Lake, a project focusing on improving the reliability, quality and performance of data lakes. Delta Lake, announced by Databricks earlier this year, has been adopted by thousands of organizations and has a thriving ecosystem of supporters, including Intel, Alibaba and Booz Allen Hamilton. To further drive adoption and contributions, Delta Lake will become a Linux Foundation project and use an open governance model.

    Every organization aspires to get more value from data through data science, machine learning and analytics, but they are massively hindered by the lack of data reliability within data lakes. Delta Lake addresses data reliability challenges by making transactions ACID compliant, enabling concurrent reads and writes. Its schema enforcement capability helps to ensure that the data lake is free of corrupt and non-conformant data. Since its launch in October 2017, Delta Lake has been adopted by over 4,000 organizations and processes over two exabytes of data each month.

Delta Lake finds new home at Linux Foundation

  • Delta Lake finds new home at Linux Foundation

    Databricks used the ongoing Spark + AI Summit Europe to announce a change in the governance of Delta Lake.

    The storage layer was introduced to the public in April 2019 and is now in the process of moving to the Linux Foundation, which also fosters software projects such as the Linux kernel and Kubernetes.

    The new home is meant to drive the adoption of Delta Lake and establish it as a standard for managing big data. Databricks’ cofounder Ali Ghodsi commented on the move in a canned statement. “To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

Open source Delta Lake project moves to the Linux Foundation

  • Open source Delta Lake project moves to the Linux Foundation

    Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.

    Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by thousands of organizations, including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.

    Data lakes are popularly used by large enterprises as they provide a reliable way of ensuring that data can be accessed by anyone within an organization. They can be used to store any kind of data, including both structured and unstructured information in its native format, and also support analysis of data that helps provide real-time insights on business matters.

Databricks contributes Delta Lake to the Linux Foundation

  • Databricks contributes Delta Lake to the Linux Foundation

    The Databricks-led open source Delta Lake project is getting a new home and a new governance model at the Linux Foundation.

    In April, the San Francisco-based data science and analytics vendor open sourced the Delta Lake project, in an attempt to create an open community around its data lake technology. After months of usage and feedback from a community of users, Databricks decided that a more open model for development, contribution and governance was needed and the Linux Foundation was the right place for that.

Databricks’ Delta Lake Moves To Linux Foundation

Unifying cloud storage and data warehouses: Delta Lake project..

  • Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

    Going cloud for your storage needs comes with some baggage. On the one hand, it's cheap, elastic, and convenient - it just works. On the other hand, it's messy, especially if you are used to working with data management systems like databases and data warehouses.

    Unlike those systems, cloud storage was not designed with things such as transactional support or metadata in mind. If you work with data at scale, these are pretty important features. This is why Databricks introduced Delta Lake to add those features on top of cloud storage back in 2017.

SDxCentral coverage

More in Tux Machines

Growing the Linux app Ecosystem at LAS 2019

The third Linux Application Summit (LAS) kicks off this week in Barcelona, Spain. Formerly organised under the GNOME project and known as Libre Application Summit, the new LAS is a joint effort between the KDE and GNOME projects. The aim of the conference is to encourage the growth of a vibrant Linux application ecosystem. Canonical are proud sponsors of LAS 2019, and are sending along a team to represent Ubuntu and Snapcraft. The volunteers on the organising committee each have a long history in the Linux application community. They’ve all worked on platforms and infrastructure to enable new software development for Linux. I took some time to chat with some of the team about what LAS means for them. Aleix Pol, representing KDE, has worked on Linux applications for a while, and is hopeful for increased collaboration between application developers and platform maintainers. Aleix told me: “While we [GNOME and KDE] are sizeable organisations, we have massive tasks at hand. We need to create an environment where people can come and create their solutions for all of us.” This applies both to application developers and to those who work primarily on the platforms themselves. He continued: “With GNOME, we share pieces of software, we share users and we even share some of our dreams. Meeting, talking and collaborating can only be beneficial”. Aleix also highlighted the benefits of meeting in person at events like LAS: “There’s a very different kinds of visitor. The ones who have been around will be putting faces to nicknames and having these discussions that IRC and mailing lists can’t sustain”. Read more

Kdenlive 19.08.3 is out

The last minor release of the 19.08 series is out with a fair amount of usability fixes while preparations are underway for the next major version. The highlights include an audio mixer, improved effects UI and some performance optimizations. Grab the nightly AppImage builds, give it a spin and report any issues. Read more
