Building cloud-native applications with Azure and HashiCorp

With each passing year, more and more developers are building cloud-native applications. As developers build more complex applications, they are looking to innovators like Microsoft Azure and HashiCorp to reduce the complexity of building and operating these applications. HashiCorp and Azure have worked together on a myriad of innovations, including tools that connect cloud-native applications to legacy infrastructure and tools that secure and automate the continuous deployment of customer applications and infrastructure. Azure is deeply committed to being the best platform for open source software developers like HashiCorp to deliver their tools to their customers in an easy-to-use, integrated way. Azure innovations like the managed applications platform that powers the HashiCorp Consul Service on Azure are great examples of this commitment to collaboration and a vibrant open source startup ecosystem. We’re also committed to the development of open standards that help these ecosystems move forward, and we’re thrilled to have been able to collaborate with HashiCorp on both the CNAB (Cloud Native Application Bundle) and SMI (Service Mesh Interface) specifications.

Last year at HashiConf 2018, I had the opportunity to share how we had started to integrate Terraform and Packer into the Azure platform. I’m incredibly excited to get the opportunity to return this year to share how these integrations are progressing and to share a new collaboration on cloud-native networking. With this new work, we now have collaborations that help customers connect and operate their applications on Azure using HashiCorp technology.

Connect — HashiCorp Consul Service on Azure

After containers and Kubernetes, one of the most important innovations in microservices has been the development of the service mesh concept. Earlier this year we partnered with HashiCorp and others to announce the release of the Service Mesh Interface (SMI), a collaborative, implementation-agnostic API for the configuration and deployment of service mesh technology. We collaborated with HashiCorp to produce an implementation of the SMI traffic access control (TAC) specification using Consul Connect. Today we’re excited that Azure customers can take advantage of the HashiCorp Consul Service on Azure, powered by the Azure Managed Applications platform. HashiCorp Consul provides a solution to simplify and secure service networking, and with this new managed offering, our joint customers can focus on the value of Consul while remaining confident that the experts at HashiCorp are taking care of the management of the service, reducing complexity for customers and enabling them to focus on cloud-native innovation.

Provision — HashiCorp Terraform on Azure

HashiCorp Terraform is a great tool for declarative deployments to Azure. We're seeing great momentum with the adoption of HashiCorp Terraform on Azure: the number of customers has doubled since the beginning of the year, and customers are using Terraform to automate Azure infrastructure deployment and operation in a variety of scenarios.
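As a minimal sketch of what such a declarative deployment looks like (the resource names and region are placeholders, and provider versions have moved on since this was written):

```hcl
# Configure the Azure Resource Manager provider and declare a resource
# group plus a storage account inside it. Running `terraform init` and
# then `terraform apply` creates both.
provider "azurerm" {
  # Credentials come from the Azure CLI or environment variables.
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "examplestoracct"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```

Because the configuration is declarative, `terraform plan` shows exactly what would change before `terraform apply` changes it, which is the workflow these Azure integrations build on.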

The momentum is fantastic on the contribution front as well, with nearly 180 unique contributors to the Terraform provider for Azure Resource Manager. Community involvement, together with our increased three-week release cadence (currently at version 1.32), ensures broader coverage of Azure services by Terraform. Additionally, after customer and community feedback regarding the need for additional Terraform modules for Azure, we've been working hard at adding high-quality modules and have now doubled the number of Azure modules in the Terraform Registry, bringing it to over 120 modules.

We believe all these additional integrations enable customers to manage infrastructure as code more easily and simplify managing their cloud environments. Learn more about Terraform on Azure.

Microsoft and HashiCorp are working together to provide integrated support for Terraform on Azure. Customers using Terraform on Microsoft's Azure cloud are mutual customers, and both companies are united in providing troubleshooting and support services. This joint entitlement process provides collaborative support across companies and platforms while delivering a seamless customer experience. Customers using the Terraform provider for Azure can file support tickets with Microsoft, and customers with Terraform on Azure support can file tickets with either Microsoft or HashiCorp.

Deploy — Collaborating on Cloud Native Application Bundles specification

One of the critical problems solved by containers is the hermetic packaging of a binary into a package that is easy to share and deploy around the world. But a cloud-native application is more than a binary, and this is what led to the co-development, with HashiCorp and others, of the Cloud Native Application Bundle (CNAB) specification. CNABs allow you to package images alongside configuration tools like Terraform and other artifacts, so that a user can seamlessly deploy an application from a single package. I’ve been excited to see the community work together to bring the specification to a 1.0 release that shows CNAB is ready for all of the world’s deployment needs. Congratulations to the team on the work and the fantastic partnership.

If you want to learn more about the ways in which Azure and HashiCorp collaborate to make cloud-native development easier, please check out the links below:

HashiCorp announcement regarding the HashiCorp Consul Service offering on Azure
Terraform on Azure
Cloud Native Application Bundles (CNAB)

Source: Azure

Alpha access to Cloud Dataproc Jobs on GKE

Cloud Dataproc is Google Cloud’s fully managed Apache Hadoop and Spark service. The mission of Cloud Dataproc has always been to make it simple and intuitive for developers and data scientists to apply their existing tools, algorithms, and programming languages to cloud-scale datasets. Its flexibility means you can continue to use the skills and techniques you are already using to explore data of any size. We hear from enterprises and SaaS companies around the world that they’re using Cloud Dataproc for data processing and analytics.

Cloud Dataproc now offers alpha access to Spark jobs on Google Kubernetes Engine (GKE). (Find out more about the program here.) This means you can take advantage of the latest approaches in machine learning and big data analysis (Apache Spark and Google Cloud) together with the state-of-the-art cloud management capabilities that developers and data scientists have come to rely upon with Kubernetes and GKE. Using these tools together can bring you flexibility, auto-healing jobs, and a unified infrastructure, so you can focus on workloads, not maintaining infrastructure. Email us for more information and to join the alpha program. Let’s take a look at Cloud Dataproc in its current form and what the new GKE alpha offers.

Cloud Dataproc now: Cloud-native Apache Spark

Cloud Dataproc has democratized big data and analytics processing for thousands of customers, offering the ability to spin up a fully loaded and configured Apache Spark cluster in minutes. With Cloud Dataproc, features such as Component Gateway enable secure access to notebooks with zero setup or installation, letting you immediately start exploring data of any size. These notebooks, combined with Cloud Dataproc Autoscaling, make it possible to run ML training or process data of various sizes without ever having to leave your notebook or worry about how the job will get done.
The underlying Cloud Dataproc cluster simply adjusts compute resources as needed, within predefined limits. Once your ML model or data engineering job is ready for production, or for use in an automated or recurring way, you can use the Cloud Dataproc Jobs API to submit a job to an existing Cloud Dataproc cluster with a jobs.submit call over HTTP, using the gcloud command-line tool, or in the Google Cloud Platform Console itself. Submitting your Spark code with the Jobs API ensures the jobs are logged and monitored, in addition to having them managed across the cluster. It also makes it easy to separate the permissions of who has access to submit jobs on a cluster and who has permissions to reach the cluster itself, without needing a gateway node or an application like Livy.

Cloud Dataproc next: Extending the Jobs API with GKE

The Cloud Dataproc Jobs API has been a perfect match for companies who prefer to wrap their job automation and extract, transform, and load (ETL) processing jobs in custom tooling such as Spotify’s Spydra or Cloud Dataproc’s Workflow Templates. However, developers and data scientists who have embraced containerization and the cloud management capabilities of Kubernetes have started to demand more from their big data processing services. To automate your Spark job today, you would either need to keep running the cluster that created the job (expensive, and it does not take advantage of the pay-as-you-need capability of the cloud), or you would need to carefully track how to re-create that same cluster environment in the cloud, which can become a complicated mixture of configurations, initialization scripts, conda environments, and library/package management scripts. This process can be additionally cumbersome in multi-tenant environments, where various software packages, configurations, and OS updates may conflict.
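The Jobs API flow described above can be sketched with the gcloud CLI; the cluster name and region here are placeholders:

```shell
# Submit the SparkPi example bundled with Cloud Dataproc to an existing
# cluster ("my-cluster" is a placeholder). Under the hood this is the
# same jobs.submit call made over HTTP.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  -- 1000

# Because the job went through the Jobs API, it is logged and monitored;
# list recent jobs for the cluster to confirm.
gcloud dataproc jobs list --region=us-central1 --cluster=my-cluster
```

Note that the submitter only needs permission to call the Jobs API, not network access to the cluster itself, which is the separation of concerns described above.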
With Cloud Dataproc on Kubernetes, you can eliminate the need for multiple types of clusters that have various sets of software, and the complexity that’s involved. By extending the Cloud Dataproc Jobs API to GKE, you can package all the various dependencies of your job into a single Docker container. This Docker container allows you to integrate Spark jobs directly into the rest of your software development pipelines. Additionally, by extending the Cloud Dataproc Jobs API for GKE, administrators have a unified management system where they can tap into their Kubernetes knowledge. You can avoid having a silo of Spark applications that need to be managed in standalone virtual machines or in Apache Hadoop YARN.

Kubernetes: Yet another resource negotiator?

Apache Hadoop YARN (introduced in 2012) is a resource negotiator commonly found in Spark platforms across on-prem and cloud. YARN provides the core capabilities of scheduling computing resources in Cloud Dataproc clusters that are based on Compute Engine. By extending the Jobs API in Cloud Dataproc with GKE, you can choose to replace your YARN management with Kubernetes. There are some key advantages to using Kubernetes over YARN:

1. Flexibility. Greater flexibility of production jobs can be achieved by having a consistent configuration of software libraries embedded with the Spark code. Containerizing Spark jobs isolates dependencies and resources at the job level instead of the cluster level. This flexibility will give you more predictable workload cycles and make it easier to target your troubleshooting when something does go wrong.

2. Auto-healing. Kubernetes provides declarative configuration for your Spark jobs. This means that you can declare at the start of the job the resources required to process the job. If for some reason Kubernetes resources (i.e., executors) become unhealthy, Kubernetes will automatically restore them, and your job will continue to run with the resources you declared at the onset.

3. Unified infrastructure. At Google, we have used a system called Borg to unify all of our processing, whether it’s a data analytics workload, a website, or anything else. Borg’s architecture served as the basis for Kubernetes, which you can use to remove the need for a big data (YARN) silo. By migrating Spark jobs to a single cluster manager, you can focus on modern cloud management in Kubernetes. At Google, having a single cluster manager system has led to more efficient use of resources and provided a unified logging and management framework. This same capability is now available to your organization. Kubernetes is not just “yet another resource negotiator” for big data processing. It’s an entirely new way of approaching big data that can greatly improve the reliability and management of your data and analytics workloads.

Spark jobs on GKE in action

Let’s walk through what is involved with submitting an Apache Spark job to Cloud Dataproc on GKE during the alpha phase.

Step 0: Register your GKE cluster with Cloud Dataproc

Before you can execute Cloud Dataproc jobs on GKE, you must first register your GKE cluster with Cloud Dataproc. During alpha, the registration is completed with a helm installation. Once the GKE cluster is registered, you will be able to see your GKE cluster unified with the rest of your Cloud Dataproc clusters by listing your Cloud Dataproc clusters.

Step 1: Define your Cloud Dataproc Docker container

Cloud Dataproc offers Docker images that match the bundle of software provided on the Cloud Dataproc image version list. The alpha offering contains an image based on Debian 9 Stretch that mirrors the same Spark 2.4.3 package as Cloud Dataproc 1.4. This makes it seamless to port Spark code between Cloud Dataproc running on Compute Engine and Cloud Dataproc jobs on GKE. This Docker container encapsulates not only Cloud Dataproc’s agent for job management but also builds on top of Google Cloud’s Spark Operator for Kubernetes (in beta).
This fully open source operator provides many of the integrations between Kubernetes and the rest of the Google Cloud Platform, including:

Integration with BigQuery, Google’s serverless data warehouse
Google Cloud Storage as a replacement for HDFS
Logs shipped to Stackdriver Monitoring
Access to sparkctl, a command-line tool that simplifies client-local application dependencies in a Kubernetes environment

This Cloud Dataproc Docker container can be customized to include all the packages and configurations needed for your Spark job.

Step 2: Submit your job

Once the Docker container is ready, you can submit a Cloud Dataproc job to the GKE cluster. You can follow the same instructions that you would use for any Cloud Dataproc Spark job.

Extending Cloud Dataproc with your own container

Running the above job will mirror a software environment on Kubernetes to that found on Cloud Dataproc. However, with the GKE option, there is an extra benefit of being able to specify a container image associated with the job. This container property provides a reliable pairing of your job code and necessary software configurations.

Starting your Cloud Dataproc on Kubernetes testing

At Google Cloud, we work with thousands of customers who have migrated production workloads to Kubernetes and reaped the benefits described in this post. However, it’s important to note that while Cloud Dataproc is a generally available service used to run a variety of mission-critical applications across enterprises, the Cloud Dataproc on GKE feature is in alpha and still under active development. Kubernetes support in the latest stable version of Spark is still considered an experimental feature. In future versions, there may be behavior changes around configuration, container images, and entry points. The Google Cloud Spark Operator that is core to this Cloud Dataproc offering is also a beta application and subject to the same stipulations.
So far, we’ve been very impressed and excited by the preliminary adoption and the new workloads customers have unlocked by running their Spark processing on Kubernetes. We’re looking forward to taking this journey to production together with our customers and invite you to join our alpha program. Email us for more information and to join the alpha program.
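The container customization described in Step 1 can be sketched as a Dockerfile. The base image path and the pinned package below are hypothetical — the actual alpha image location is provided through the program — but the shape of the customization is as described above:

```dockerfile
# Hypothetical base image; during the alpha the real path comes from the
# Cloud Dataproc team (it mirrors the Dataproc 1.4 / Spark 2.4.3 bundle
# and includes the Dataproc job agent).
FROM gcr.io/cloud-dataproc/dataproc-spark:1.4-alpha

# Layer in the extra libraries your Spark job needs, so the job and its
# dependencies ship together as a single artifact.
RUN pip install --no-cache-dir pandas==0.25.1

# Bake the job code itself into the image.
COPY my_spark_job.py /opt/jobs/my_spark_job.py
```

Because the image pairs the job code with its exact software environment, re-running the job later never requires reconstructing cluster init scripts or conda environments.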
Source: Google Cloud Platform

Modernize Apache Spark with Cloud Dataproc on Kubernetes

Google Cloud Dataproc provides open source data and analytic processing for data engineers and data scientists who need to process data and train models faster at scale. However, as enterprise infrastructure becomes increasingly hybrid in nature, machines can sit idle, single-workload clusters continue to sprawl, and open source software and libraries continue to become outdated and incompatible with your stack. It’s critical that Cloud Dataproc continues to empower data professionals to focus more on workloads than infrastructure by combining the best of cloud and open source. We’re happy to announce alpha availability of Cloud Dataproc for Kubernetes so that we can continue to support this vision. With this announcement, we are bringing enterprise-grade support, management, and security to Apache Spark jobs running on GKE clusters. (Get all the technical details here.)

“Enterprises are increasingly looking for products and services that support data processing across multiple locations and platforms,” said Matt Aslett, Research Vice President at 451 Research. “The launch of Cloud Dataproc on Kubernetes is significant in that it provides customers with a single control plane for deploying and managing Apache Spark jobs on Google Kubernetes Engine in both public cloud and on-premises environments.”

This is the first step in a larger journey to a container-first world. While Apache Spark is the first open source processing engine we will bring to Cloud Dataproc on Kubernetes, it won’t be the last. Kubernetes has flipped the big data and machine learning open source software (OSS) world on its head, since it gives data scientists and data engineers a way to unify resource management, isolate jobs, and build resilient infrastructures across any environment.
Deploy unified resource management

With this alpha announcement, big data professionals are no longer obligated to deal with two separate cluster management interfaces to manage open source components running on Kubernetes and YARN. Using Cloud Dataproc’s new capabilities, you’ll get one central view that can span both cluster management systems. Supporting both YARN and Kubernetes can bring your enterprise the needed flexibility to modernize certain hybrid workloads while continuing to monitor YARN-based workloads.

Isolate OSS jobs to accelerate the analytics life cycle

Containerizing and isolating OSS jobs on Kubernetes will allow data professionals to move faster and remove the version and library dependencies associated with traditional big data technologies. You can move models and new ETL pipelines from dev to production without having to worry about compatibility. Building on a new agile infrastructure like Kubernetes will make OSS easier and faster to upgrade.

Build resilient infrastructure

Deploying Spark jobs on a self-healing GKE environment will help mission-critical ETL and machine learning jobs run smoothly. Data scientists and data engineers don’t have to worry about sizing and building clusters, manipulating Docker files, or messing around with Kubernetes networking configurations. It just works. With leading support from the team that built Kubernetes, enterprises have access to the skills they need to close any Kubernetes skills gap on their team.

Open source has always been a core pillar of Google Cloud’s data and analytics strategy. As we continue to work with the community to set industry standards, we continue to integrate those standards into our services so organizations around the world can unlock the value of data faster. Moving Cloud Dataproc to Kubernetes involved changes to Cloud Dataproc and the open source ecosystem that we run as a managed service.
We will continue to work with other open source communities, like Apache Flink, to enable Cloud Dataproc on Kubernetes capabilities for more and more open source projects. This alpha announcement of bringing enterprise-grade support, management, and security to Apache Spark jobs on Kubernetes is the first of many as we aim to simplify infrastructure complexities for data scientists and data engineers around the world. Email us for more information and to join the alpha program. Also, be sure to check out the tech deep dive on this alpha.
Source: Google Cloud Platform

How to deploy a Windows container on Google Compute Engine

Last year, we published a blog post and demonstrated how to deploy a Windows container running Windows Server 2016 on Google Compute Engine. Since then, there have been a number of important developments. First, Microsoft announced the availability of Windows Server 2019. Second, Kubernetes 1.14 was released with support for Windows nodes and Windows containers.

Supporting Windows workloads and helping you modernize your apps using containers and Kubernetes is one of our top priorities at Google Cloud. Soon after the Microsoft and Kubernetes announcements, we added support for Windows Server 2019 in Compute Engine and Windows containers in Google Kubernetes Engine (GKE). Given this expanded landscape for Windows containers on Google Cloud, let’s take a fresh look at how best to deploy and manage them. In this first post, we’ll show you how to deploy an app to a Windows container on Windows Server 2019 on Compute Engine. Then stay tuned for the next post, where we’ll deploy and manage the same Windows container via Kubernetes 1.14 on GKE. Let’s get started!

Create a Windows Server instance on Compute Engine

First, you need a Windows Server instance on which to run a Windows container. Compute Engine comes with many flavors of Windows Server (Server vs. Server Core) and many versions (2008 to 2019). There are also container-optimized versions that come with Docker and some base images already installed. For this exercise, let’s choose the latest container-optimized version of Windows Server.

In the Google Cloud console, create a VM with the Windows Server 2019 Datacenter for Containers image. Make sure that HTTP/HTTPS traffic is enabled, and select “Allow full access to all Cloud APIs.” Allowing HTTP/HTTPS and Cloud API traffic will be useful later when we want to push and pull Docker images. Once the VM is up and running, you can set a Windows password and connect to it using Remote Desktop (RDP).
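If you prefer the command line to the console, the same VM can be created with gcloud. The instance name and zone are placeholders, and the image family shown is the one this image was published under at the time — worth confirming with `gcloud compute images list --project=windows-cloud`:

```shell
# Create a container-optimized Windows Server 2019 VM. The http-server
# tag assumes the default-allow-http firewall rule exists in the project;
# --scopes=cloud-platform grants the "full access to all Cloud APIs" option.
gcloud compute instances create my-windows-vm \
  --zone=us-central1-a \
  --machine-type=n1-standard-2 \
  --image-project=windows-cloud \
  --image-family=windows-2019-for-containers \
  --tags=http-server \
  --scopes=cloud-platform

# Set a Windows password so you can connect over RDP.
gcloud compute reset-windows-password my-windows-vm --zone=us-central1-a
```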
Inside the VM, open a Command Prompt in Admin mode and list the local Docker images. You’ll see that Docker is already installed and the Windows Server Core image for 2019 has already been pulled onto the VM. We will use this as the base image for our Windows container app.

Create a Windows container app

For the app inside the Windows container, let’s use an IIS web server. IIS has an image for Windows Server 2019. We could use the image as is, and it would serve the default IIS page, but let’s do something more interesting and have IIS serve a page we define. Create a folder called my-windows-app containing an index.html with the content you want IIS to serve.

Build a Docker image

Next, let’s create a Dockerfile for the Docker image. Notice that we’re using the IIS container image version that is compatible with Windows Server 2019. Build the Docker image and tag it with Google Container Registry and your project ID; this will be useful when we push the image to Container Registry later (replace dotnet-atamel with your project ID). Once the Docker image is built, you will be able to see it along with its IIS dependency.

Run your Windows container

We’re now ready to run the Windows container. From inside the command prompt, run the container and expose it on port 80, then check that the container is running. To see the web page, go to the External IP column of the Compute Engine instance and simply open it with HTTP in the browser. We’re now running an IIS site inside a Windows container! If you want to try out these steps on your own, we also published a codelab on this topic.

Note that this setup is not ideal for production. It does not survive server restarts or crashes. In a production system, you’ll want a static IP for your VM and a startup script to start the container. This takes care of server restarts but doesn’t help much with server crashes.
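The build-and-run sequence above might look like the following sketch (shown in shell form; inside the VM you would run the docker commands from the Admin Command Prompt). The image name iis-site-windows is a placeholder, and the exact IIS image tag is an assumption worth checking against Microsoft's container registry:

```shell
cd my-windows-app

# Dockerfile: serve our index.html from the Windows Server 2019 IIS image.
cat > Dockerfile <<'EOF'
FROM mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
WORKDIR /inetpub/wwwroot
COPY index.html .
EOF

# Build and tag for Google Container Registry (replace dotnet-atamel with
# your project id), then run the container and expose it on port 80.
docker build -t gcr.io/dotnet-atamel/iis-site-windows:v1 .
docker run -d -p 80:80 gcr.io/dotnet-atamel/iis-site-windows:v1

# Confirm the container is running.
docker ps
```

Tagging the image with the gcr.io prefix up front is what makes the later `docker push` to Container Registry a one-liner.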
To make the app resilient against server crashes, you can run the container inside a pod managed by Kubernetes. The process for doing this will be the topic of our next blog post.
Source: Google Cloud Platform