Advancing no-impact and low-impact maintenance technologies

“This post continues our reliability series kicked off by my July blog post highlighting several initiatives underway to keep improving platform availability, as part of our commitment to provide a trusted set of cloud services. Today I wanted to double-click on the investments we’ve made in no-impact and low-impact update technologies including hot patching, memory-preserving maintenance, and live migration. We’ve deployed dozens of security and reliability patches to host infrastructure in the past year, many of which were implemented with no customer impact or downtime. The post that follows was written by John Slack from our core operating systems team, who is the Program Manager for several of the update technologies discussed below.” – Mark Russinovich, CTO, Azure

This post was co-authored by Apurva Thanky, Cristina del Amo Casado, and Shantanu Srivastava from the engineering teams responsible for these technologies.

 

We regularly update Azure host infrastructure to improve the reliability, performance, and security of the platform. While the purposes of these ‘maintenance’ updates vary, they typically involve updating software components in the hosting environment or decommissioning hardware. If we go back five years, the only way to apply some of these updates was by fully rebooting the entire host. This approach took customer virtual machines (VMs) down for minutes at a time. Since then, we have invested in a variety of technologies to minimize customer impact when updating the fleet. Today, the vast majority of updates to the host operating system are deployed in place with absolute transparency and zero customer impact using hot patching. In infrequent cases in which the update cannot be hot patched, we typically utilize low-impact memory preserving update technologies to roll out the update.

Even with these technologies, there are still other rare cases in which we need to do more impactful maintenance (including evacuating faulty hardware or decommissioning old hardware). In such cases, we use a combination of live migration, in-VM notifications, and planned maintenance providing customer controls.

Thanks to continued investments in this space, we are at a point where the vast majority of host maintenance activities do not impact the VMs hosted on the affected infrastructure. We’re writing this post to be transparent about the different techniques that we use to ensure that Azure updates are minimally impactful.

Plan A: Hot patching

Function-level “hot” patching provides the ability to make targeted changes to running code without incurring any downtime for customer VMs. It does this by redirecting all new invocations of a function on the host to an updated version of that function, so it is considered a ‘no impact’ update technology. Wherever possible we use hot patching to apply host updates completely avoiding any impact to the VMs running on that host. We have been using hot patching in Azure since 2017. Since then, we have worked to broaden the scope of what we can hot patch. As an example, we updated the host operating system to allow the hypervisor to be hot patched in 2018. Looking forward, we are exploring firmware hot patches. This is a place where the industry typically hasn't focused. Firmware has always been viewed as ‘if you need to update it, reboot the server,’ but we know that makes for a terrible customer experience. We've been working with hardware manufacturers to consider our own firmware to make them hot patchable and incrementally updatable.

Some large host updates contain changes that cannot be applied using function-level hot patching. For those updates, we endeavor to use memory-preserving maintenance.

Plan B: Memory-preserving maintenance

Memory-preserving maintenance involves ‘pausing’ the guest VMs (while preserving their memory in RAM), updating the host server, then resuming the VMs and automatically synchronizing their clocks. We first used memory-preserving maintenance for Azure in 2018. Since then we have improved the technology in three important ways. First, we have developed less impactful variants of memory-preserving maintenance targeted for host components that can be serviced without a host reboot. Second, we have reduced the duration of the customer experienced pause. Third, we have expanded the number of VM types that can be updated with memory preserving maintenance. While we continue to work in this space, some variants of memory-preserving maintenance are still incompatible with some specialized VM offerings like M, N, or H series VMs for a variety of technical reasons.

In the rare case we need to make more impactful maintenance (including host reboots, VM redeployment), customers are notified in advance and given the opportunity to perform the maintenance at a time suitable for their workload(s).

Plan C: Self-service maintenance

Self-service maintenance involves providing customers and partners a window of time, within which they can choose when to initiate impactful maintenance on their VM(s). This initial self-service phase typically lasts around a month and empowers organizations to perform the maintenance on their own schedules so it has no or minimal disruption to users. At the end of this self-service window, a scheduled maintenance phase begins—this is where Azure will perform the maintenance automatically. Throughout both phases, customers get full visibility of which VMs have or have not been updated—in Azure Service Health or by querying in PowerShell/CLI. Azure first offered self-service maintenance in 2018. We generally see that administrators take advantage of the self-service phase rather than wait for Azure to perform maintenance on their VMs automatically.

In addition to this, when the customer owns the full host machine, either using Azure Dedicated Hosts or Isolated virtual machines, we recently started to offer maintenance control over all non-zero impact platform updates. This includes rebootless updates which only cause a few seconds pause. It is useful for VMs running ultra-sensitive workloads which can’t sustain any interruption even if it lasts just for a few seconds. Customers can choose when to apply these non-zero impact updates in a 35-day rolling window. This feature is in public preview, and more information can be found in this dedicated blog post.

Sometimes in-place update technologies aren’t viable, like when a host shows signs of hardware degradation. In such cases, the best option is to initiate a move of the VM to another host, either through customer control via planned maintenance or through live migration.

Plan D: Live migration

Live migration involves moving a running customer VM from one “source” host to another “destination” host. Live migration starts by moving the VM’s local state (including RAM and local storage) from the source to the destination while the virtual machine is still running. Once most of the local state is moved, the guest VM experiences a short pause usually lasting five seconds or less. After that pause, the VM resumes running on the destination host. Azure first started using live migration for maintenance in 2018. Today, when Azure Machine Learning algorithms predict an impending hardware failure, live migration can be used to move guest VMs onto different hosts preemptively.

Amongst other topics, planned maintenance and AI Operations were covered in Igal Figlin’s recent Ignite 2019 session “Building resilient applications in Azure.” Watch the recording here for additional context on these, and to learn more about how to take advantage of the various resilient services Azure provides to help you build applications that are inherently resilient.

The future of Azure maintenance 

In summary, the way in which Azure performs maintenance varies significantly depending on the type of updates being applied. Regardless of the specifics, Azure always approaches maintenance with a view towards ensuring the smallest possible impact to customer workloads. This post has outlined several of the technologies that we use to achieve this, and we are working diligently to continue improving the customer experience. As we look toward the future, we are investing heavily in machine learning-based insights and automation to maintain availability and reliability. Eventually, this “AI Operations” model will carry out preventative maintenance, initiate automated mitigations, and identify contributing factors and dependencies during incidents more effectively than our human engineers can. We look forward to sharing more on these topics as we continue to learn and evolve.
Quelle: Azure

Getting started with new table formats on Dataproc

At Google Cloud, we’re always looking for ways to help you connect data sources and get the most out of the big data that your business gathers. Dataproc is a fully managed service for running Apache Hadoop ecosystem software such as Apache Hive, Apache Spark, and many more in the cloud. We’re announcing that table format projects Delta Lake and Apache Iceberg (Incubating) are now available in the latest version of Cloud Dataproc (version 1.5 Preview). You can start using them today with either Spark or Presto. Apache Hudi is also available on Dataproc 1.3.With these table formats, you can now use Dataproc for workloads that need: ACID transactionData versioning (a.k.a. time travel)Schema enforcementSchema evolution and moreIn this blog, we will walk you through what table formats are, why they are useful, and how to use them on Dataproc with some examples.Benefits of table formatsACID transaction capability is very important to business operations. In the data warehouse, it is very common that users generate reports based on a common set of data. While building reports, there are other applications and users that might write to the same set of tables. Because Hadoop Distributed File System (HDFS) and object stores are designed to be like file systems, they are not providing transactional support. Implementing transactions in distributed processing environments is a challenging problem. For example, implementation typically has to consider locking access to the storage system, which comes at the cost of overall throughput performance. Table formats such as Apache Iceberg and Delta Lake solve these ACID requirements efficiently by pushing these transactional semantics and rules into the file formats themselves. Another benefit of table formats is data versioning. This provides a snapshot of your data in history. You can look up data history and even roll back to the data at a certain time or version in history. It makes debugging and maintaining your data system much easier when there are mistakes or bad data.Getting to know table formatsMost big data platforms store data as files or objects on the underlying storage systems. These files have certain structures and access protocols to represent a table of data (think of the Parquet file format, for example). As the size of these tables grow, they are divided up into multiple files. That allows tables that are bigger than the storage system limitations on a single file or object. This also allows you to filter unnecessary files based on data value (partitioning). And, you can have multiple writers at once. The way of organizing these files into tables is known as a table format. Table formats on Google CloudAs the modern data lake has started to merge with the modern data warehouse, the data lake has taken on responsibility for features previously reserved for the data warehouse.A common scenario when building big data processing pipelines is to structure the unstructured data (log files, database backups, clickstream, etc.) into tables that can be used by the data analyst or data scientist. In the past, this often involved using a tool like Apache Sqoop to export the results of big data processing into a data warehouse or relational database system (RDBMS) to make it easier for data to be interpreted by the business. However, as tools like Spark and Presto have grown in terms of both features and adoption, the same data users now prefer the functionality offered by these data lake tools over the traditional data warehouse or SQL-only interface. However, because this data is necessary to the business, the storage expectations related to ACID transactions, schema evolution, etc. became a missing link. In Google Cloud, BigQuery storage solves these problems and needs. You can access BigQuery storage with Spark using the Spark-BigQuery connector, and BigQuery storage is fully managed, with the maintenance and operations overhead taken care of by the Google engineering team.   In addition to BigQuery storage, Dataproc customers using Cloud Storage have had many of these same table-like features that solve for some of the basic warehousing use cases. For example, Cloud Storage is strongly consistent at the object level. And starting with version 2.0 of the Cloud Storage Connector for Hadoop, cooperative locking is supported for directory modification operations performed though the Hadoop file system shell (hadoop fs command) and other HCFS API interfaces to Cloud Storage. However, in the open source community, Delta Lake and Apache Iceberg (Incubating) are two solutions that approximate traditional data warehouses in functionality. Apache Hudi (Incubating) is another solution to this problem that also provides a way to accommodate incremental data. While these file formats will involve some do-it-yourself operations, you can gain a lot of flexibility and portability using these open source file formats.  The start of table formats in OSS Big Data: Apache HiveAn intuitive way of organizing files as a table is using a directory structure. For example, a directory represents a table. Each of its subdirectories can be named based on partition values. Each of these subdirectories contains other subpartitions or data files. This is basically how Apache Hive manages data. A component separate but related to Hive, the Hive Metastore keeps track of this table and partition information. However, as Hive data warehouses increased in data size and moved to the cloud, the Hive approach to table formats started to expose its limitations. To name just a few:Hive requires a listing operation to find its data. It is expensive on object stores.The structure of Hive data storage is against the best practices of object store structure, which prefers data evenly distributed to avoid hot-spotting.Reading a table that is being written to can lead to the wrong result.Adding and dropping partitions directly on HDFS breaks atomicity and table statistics, which could lead to wrong results. Users have to know a table’s physical layout (partition columns) to write efficient queries. Changes to layout break user queries.To solve the limitations of existing table formats, the open source community has come up with table formats. Let’s see how to run them on Google Cloud. Running Apache Iceberg on Google CloudApache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. In addition to the new features listed above, Iceberg also added hidden partitioning. It abstracts the partitioning of a table by defining the relationship between the partition column and actual partition value. So changing table partitioning will not break user queries. Iceberg also provides a series of optimizations such as advanced filtering based on column statistics and other metrics, avoiding listing and renaming files, isolation and concurrent writing. Iceberg works very well with Dataproc. Iceberg uses a pointer to the latest version of a snapshot and needs a mechanism to ensure the atomicity when switching versions. Iceberg provides two options to track tables: Hive catalog—uses the Hive catalog and Hive Metastore to keep track of tablesHadoop tables—tracks tables by maintaining a pointer on Cloud StorageHive catalog tables rely on Hive Metastore to provide atomicity when switching pointers. Hadoop tables rely on a file system such as HDFS that provides atomic renaming operation. When using in Cloud Dataproc, Iceberg can utilize the Hive Metastore, which is backed by the Cloud SQL database. Using Iceberg with SparkTo get started, create a Cloud Dataproc cluster with the newest 1.5 image. After the cluster is created, SSH to the cluster and run Apache Spark.Now, you can get started by creating an Iceberg table on Cloud Storage using Hive Catalog. First, start spark-shell and tell it to use a Cloud Storage bucket to store data:Use the Hive catalog to create a table and write some data:Now, change the table schema by adding another column named “count”:Let’s add more data and see how it handles schema evolution:You can look at the table history:You can also look into the valid snapshots, manifest files, and data files:Let’s say we made a mistake by adding the row with value of id=6 and want to go back to see a correct version of the table using time travel:Running Delta Lake on Google CloudDelta Lake focuses on bringing RDBMS-like features to Spark. In addition to the features such as ACID transactions, time travel, and schema evolution, Delta Lake also provides the ability to delete, update, and upsert data.Delta Lake writes data files in the Parquet format on the storage system. Tables can be registered either using path or using Hive Metastore. On the metadata side, similar to Iceberg, Delta Lake manages table metadata in files on the storage system. Delta Lake uses a transaction log mechanism to make sure that there is a single source of truth for the system and to implement atomicity. Delta Lake breaks down user operations such as DELETE into a few actions such as add file or update metadata. Delta Lake then writes these actions into the transaction log in order as commits. Each commit will result in a JSON file under the _delta_log subdirectory under the table directory. Working with Cloud StorageDelta Lake stores transaction logs on object store when running on cloud. To ensure atomicity of the commit, writing to the transaction log needs to be atomic. Delta Lake relies on the storage system to provide that. In fact, it requires a few characteristics from storage:Atomic visibility of files: Any file written through this store must be made visible atomically. In other words, this should not generate partial files.Mutual exclusion: Only one writer must be able to create (or rename) a file at the final destination.Consistent listing: Once a file has been written in a directory, all future listings for that directory must return that file.Some object stores on other cloud providers do not provide a consistent listing, which could lead to data loss. So, in some cases Delta Lake only allows one single Spark session to write to the transaction log on the object store at a time. For the same reason, Delta Lake also disallows multiple writers, further limiting the concurrency and throughput of Delta Lake.Cloud Storage allows multiple writers to the transaction log and guarantees consistency , allowing you to use Delta Lake to its full potential. Delta Lake by default uses HDFS as the transaction log store. You can easily use Cloud Storage to store your transaction log by pointing your table to a location on Cloud Storage using the format of gs://<bucket-name>/<your-table>. Use Delta Lake with SparkTo get started, create a Dataproc cluster with the newest 1.5 image. After the cluster is created, SSH to the cluster and run Apache Spark.Now, you can get started by creating a Delta Lake table on Cloud Storage. First, start spark-shell and tell it to use a Cloud Storage bucket to store data:Let’s create a Delta Lake table on Cloud Storage:Next, overwrite the original data. Then, use time travel to see the earlier version:Inspect table history:Update:Vacuum/delete older snapshots:Running Apache Hudi on Google CloudAt the moment, Hudi can only run on Dataproc 1.3 version because of open issues like supporting Scala 2.12 and upgrading Avro library. You can get started with Apache Hudi using the following steps:After the Spark shell starts, use the quick start tutorial from Hudi. The example inside the quick start uses the file system. But you can simply point the file system to use Cloud Storage by starting with the following code:Get in touch with the Dataproc team anytime. Special thanks to Ryan Blue, who led the development of Apache Iceberg, for reviewing this blog.
Quelle: Google Cloud Platform