Bayer Crop Science seeds the future with 15000-node GKE clusters

Editor's note: Today's post examines how GKE's support of up to 15,000 nodes per cluster benefits a wide range of use cases, including helping Bayer Crop Science rapidly process new information arriving from its genotyping labs.

At Google, scalability is a core requirement for the products we build. With more enterprises adopting Google Kubernetes Engine (GKE), we've been working to push the limits of a GKE cluster well beyond the previously supported limits: specifically, clusters with up to 15,000 nodes. That's the most nodes supported by any cloud-based Kubernetes service, and three times the number supported by open-source Kubernetes. There are various use cases where this kind of huge scale is useful:

Running large, internet-scale services.
Simplifying infrastructure management by having fewer clusters to manage.
Batch processing, shortening the time needed to process data by temporarily using many more resources.
Absorbing large spikes in resource demand, for example during a gaming launch or an online ecommerce campaign. Being able to resize an existing cluster rather than provisioning a new one can improve the availability and performance of your service.

Having 15,000 nodes per cluster is all the more significant when you consider that the scalability of an IT system is much more than just how many nodes it supports. A scalable system needs to be able to use a significant amount of resources and still serve its purpose. In the context of a Kubernetes cluster, the number of nodes is usually a proxy for the size of a cluster and its workloads. When you take a closer look, though, the situation is far more complex.

The scale of a Kubernetes cluster is like a multidimensional object composed of all the cluster's resources, and scalability is an envelope that limits how much you can stretch that object. The number of pods and containers, the frequency of scheduling events, the number of services and endpoints in each service: these and many others are good indicators of a cluster's scale. The control plane must also remain available and workloads must be able to execute their tasks. What makes operating at a very large scale harder is that there are dependencies between these dimensions. For more information and examples, check out this document on Kubernetes scalability thresholds and our GKE-specific scalability guidelines.

The Kubernetes scalability envelope, based on http://www.gregegan.net/APPLETS/29/29.html

It's not just hyperscale services that benefit from running on highly scalable platforms; smaller services benefit too. By pushing the limits of an environment's scalability, you also expand your comfort zone, with more freedom to make mistakes and use non-standard design patterns without jeopardizing the reliability and performance of your infrastructure. For a real-world example of highly scalable platforms, today we are hearing from the team at Bayer Crop Science and learning about a recent project they designed.

Setting out on a journey to run at 15,000-node scale

To make it possible for GKE users to run workloads that need more than 5,000 nodes in one cluster, we engaged a group of design partners in a closed early access program. Precision agriculture company Bayer Crop Science (BCS) is currently one of the biggest users of GKE, with some of the largest GKE clusters in the Google Cloud fleet. Specifically, it uses GKE to help it make decisions about which seeds to advance in its research and development pipeline, and eventually which products (seeds) to make available to farmers.
Doing this depends upon having accurate and plentiful genotype data. With 60,000 germplasm in its corn catalog alone, BCS can't test each seed population individually; rather, it uses other data sets, like pedigree and ancestral genotype observations, to infer the likely genotypes of each population. This way, BCS data scientists can answer questions like "will this seed be resistant to a particular pest?", reducing how much farmland they need each year to operate the seed production pipeline.

Bayer Crop Science's 60,000-member corn "galaxy," where every dot is an individual corn germplasm, and every line is the relationship between them. Credit: Tim Williamson

Last year, BCS moved its on-premises calculations to GKE, where the availability of up to 5,000-node clusters allowed scientists to precalculate the data they would need for the month and run it as a single massive multi-day batch job. Previously, scientists had to specially request the genotype data they needed for their research, often waiting several days for the results. To learn more, watch this presentation from Next '19 by BCS's Jason Clark.

Bayer Crop Science infrastructure/architecture

To facilitate the rapid processing of new information arriving from genotyping labs, BCS implemented an event-driven architecture. When a new set of genotype observations passes quality control, it's written to a service and an event is published to a Cloud Pub/Sub topic. The inference engine watches this topic, and if the incoming events match the requirements to allow inference, a job request is created and placed on another topic. The inference engine worker nodes are deployed on the largest available Kubernetes cluster using a Horizontal Pod Autoscaler that looks at the depth of work on the incoming queue. Once a worker selects a job from the topic, it stages all the required inputs, including the genotype observations that initially triggered the job, and runs the genotype inference algorithm. Results are written into a service for accessibility, and an event is emitted to a genotype inference topic. Downstream processes, like decision making based on inferred genotypes, are wired into this event stream and begin their work as soon as they receive the event.
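As a rough illustration (not BCS's actual code), a queue-driven worker like the one described above can be sketched in a few dozen lines of Go. The project ID, the subscription and topic names, and the runInference helper below are hypothetical placeholders:

package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	client, err := pubsub.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		log.Fatalf("pubsub.NewClient: %v", err)
	}
	defer client.Close()

	jobs := client.Subscription("inference-jobs")  // hypothetical job-request subscription
	results := client.Topic("genotype-inference")  // hypothetical result topic

	// Receive blocks and hands each job request to the callback. Because the
	// Horizontal Pod Autoscaler adds or removes worker pods based on queue
	// depth, each worker only needs to handle one job at a time.
	err = jobs.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		out, err := runInference(ctx, m.Data) // stage inputs and run the algorithm (placeholder)
		if err != nil {
			m.Nack() // put the job back on the queue so another worker can pick it up
			return
		}
		// Emit an event so downstream consumers can start as soon as results exist.
		if _, err := results.Publish(ctx, &pubsub.Message{Data: out}).Get(ctx); err != nil {
			m.Nack()
			return
		}
		m.Ack()
	})
	if err != nil {
		log.Fatalf("Receive: %v", err)
	}
}

// runInference is a placeholder for the genotype inference step; on preemptible
// nodes it would also checkpoint progress periodically (see the lessons learned
// below) so an evicted job can be resumed by the next worker.
func runInference(ctx context.Context, job []byte) ([]byte, error) {
	return job, nil
}

The point of the sketch is simply that a worker pool driven by queue depth scales naturally with the cluster, which is what made the 15,000-node tests below straightforward.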
Preparations and joint tests

BCS's biggest cluster used to infer the data (that is, for imputation) uses up to 4,479 nodes with 94,798 CPUs and 455 TB of RAM. And because that imputation pipeline is a highly parallelizable batch job, scaling it to run on a 15,000-node cluster was straightforward.

In our joint tests we used the cluster hosting the inference engine, and its autoscaling capabilities, to scale up both the size of the workload and the amount of available resources. We aimed to scale the cluster from 0 to 15,000 nodes of large machines (16-CPU highmem instances with 104 GB of RAM each), for a total of 240,000 CPU cores and 1.48 PiB of RAM.

To keep the cost of these resources low, the cluster hosting the inference engine worker pods used exclusively preemptible instances, while the supporting services hosting the input data and handling outputs ran on regular instances. With preemptible VMs, BCS gets a massive amount of compute power while cutting compute costs almost by a factor of five.

With 15,000 nodes at its disposal, BCS also saves a lot of time. In the old on-premises environment with 1,000 CPUs, BCS could process roughly 62,500,000 genotypes per hour. With clusters at the previous 5,000-node limit, BCS can process about 100 times faster, and with 240,000 CPUs across 15,000 nodes it can process roughly 15,000,000,000 genotypes per hour. That gives BCS the flexibility to make model revisions and quickly reprocess the entire data backlog, or quickly add inference based on new data sets, so data scientists can continue to work rather than waiting for batch jobs to finish.

Lessons learned from running at large scale

Both Google and BCS learned a lot from running a workload across a single 15,000-node cluster. For one thing, scaling the components that interact with the cluster proved to be very important. As GKE processed data with increased throughput, we had to scale up other components of the system too, for example by increasing the number of instances on which Spanner runs.

Another important takeaway was the importance of managing preemptible VMs. Preemptible VMs are highly cost efficient, but they run for at most 24 hours and can be evicted at any point during that period. To use preemptible VMs effectively, BCS checkpointed job state to Google Cloud Storage every 15 minutes. That way, if a job is preempted before it completes, the job request falls back into the queue and is picked up and continued by the next available worker.

Sowing the seeds of innovation

For Bayer Crop Science to handle large amounts of genome data, it needs significant amounts of infrastructure on demand. When all is said and done, being able to run clusters with thousands of nodes helps BCS deliver precomputed data quickly: today it can reprocess the entire data set in two weeks, and clusters of up to 15,000 nodes will help cut that time down to four days. This way analysts don't have to request specific batches to be processed offline. BCS also realized the value of testing hypotheses on large datasets quickly, in a non-production setting.

Thanks to this collaboration, all GKE users will soon be able to access these capabilities, with support for 15,000-node clusters broadly available later this year. Stay tuned for more updates from the GKE team. In particular, be sure to join our session during Next OnAir on August 25, where we'll talk about how Google Cloud is collaborating with large Mesos and Aurora users to offer similar hyperscale experiences on GKE.
Quelle: Google Cloud Platform

Rules Engine for Azure Front Door and Azure CDN is now generally available

Today we are announcing the general availability of the Rules Engine feature on both Azure Front Door and Azure Content Delivery Network (CDN). Rules Engine places the specific routing needs of your customers at the forefront of Azure's global application delivery services, giving you more control over how you define and enforce what content gets served from where. Both services offer customers the ability to deliver content fast and securely using Azure's best-in-class network. We have learned a lot from our customers during the preview and look forward to sharing the latest updates going into general availability.

How Rules Engine works

We recently talked about how we are building and evolving the architecture and design of Azure Front Door Rules Engine. The Rules Engine implementation for Content Delivery Network follows a similar design. However, rather than grouping rules into Rules Engine configurations, you create and apply all rules directly to each Content Delivery Network endpoint. Content Delivery Network Rules Engine also introduces the concept of a global rule, which acts as a default rule for each endpoint and always triggers its actions.

General availability capabilities

Azure Front Door

The most important feedback we heard during the Azure Front Door Rules Engine preview was the need for higher rule limits. Effective today, you will be able to create up to 25 rules per configuration, for a total of 10 configurations, giving you the ability to create a total of 250 rules across your Azure Front Door. There remains no additional charge for Azure Front Door Rules Engine.

Azure Content Delivery Network 

Similarly, Azure Content Delivery Network limits have been updated. During the preview, users had access to five total rules per CDN endpoint, including the global rule. As part of general availability, the first five rules will continue to be free of charge, and users can now purchase additional rules to customize CDN behavior further. We're also increasing the limits within each rule to ten match conditions and five actions.

Rules Engine scenarios

Rules Engine streamlines security and content delivery logic at the edge, a benefit to both current and new customers of either service. Different combinations of match conditions and actions give you fine-grained control over which users get what content, making the scenarios you can address with Rules Engine virtually endless.

For instance, it's an ideal solution for legacy application migrations, where you don't want to worry about users accessing old applications or being unable to find content in your new apps. Similarly, geo-match and device-identification capabilities ensure that your users always see the content best suited to their location and device. Implementing security headers and cookies with Rules Engine can also ensure that no matter how your users come to interact with the site, they do so over a secure connection, preventing browser-based vulnerabilities from impacting your site.

Here are some additional scenarios that Rules Engine empowers:

Enforce HTTPS to ensure all your end users interact with your content over a secure connection.
Implement security headers such as HTTP Strict-Transport-Security (HSTS), X-XSS-Protection, Content-Security-Policy, and X-Frame-Options to prevent browser-based vulnerabilities, as well as Access-Control-Allow-Origin headers for Cross-Origin Resource Sharing (CORS) scenarios. Security-based attributes can also be defined with cookies.
Route requests to mobile or desktop versions of your application based on the patterns in the contents of request headers, cookies, or query strings.
Use redirect capabilities to return 301, 302, 307, and 308 redirects to the client to redirect to new hostnames, paths, or protocols.
Dynamically modify the caching configuration of your route based on the incoming requests.
Rewrite the request URL path and forward the request to the appropriate backend in your configured backend pool.
Optimize media delivery to tune the caching configuration based on file type or content path (Azure Content Delivery Network only).

Next steps

We look forward to working with more customers using both Azure Front Door and Content Delivery Network Rules Engine. For more information, please see the documentation for Azure Front Door Rules Engine and Azure Content Delivery Network Rules Engine.
Quelle: Azure

Azure Container Registry: Securing container workflows

Securing any environment requires multiple lines of defense. Azure Container Registry recently announced the general availability of features like Azure Private Link, customer-managed keys, dedicated data-endpoints, and Azure Policy definitions. These features provide tools to secure Azure Container Registry as part of the container end-to-end workflow.

Customer-managed keys

By default, when you store images and other artifacts in an Azure Container Registry, content is automatically encrypted at rest with Microsoft-managed keys.

Choosing Microsoft-managed keys means that Microsoft manages the key's lifecycle. Many organizations have stricter compliance needs, requiring ownership and management of the key's lifecycle and access policies. In such cases, customers can choose customer-managed keys that are created and maintained in a customer's Azure Key Vault instance. Since the keys are stored in Key Vault, customers can also closely monitor access to these keys using the built-in diagnostics and audit logging capabilities in Key Vault. Customer-managed keys supplement the default encryption capability with an additional encryption layer using keys provided by customers. See details on how you can create a registry enabled for customer-managed keys.

Private links

Container Registry previously had the ability to restrict access using firewall rules. With the introduction of Private Link, the registry endpoints are assigned private IP addresses, and traffic between your virtual network and the service is routed over the Microsoft backbone network.

Private Link support has been one of the top asks, allowing customers to benefit from the Azure management of their registry while benefiting from tightly controlled network ingress and egress.

Private Link is available across a wide range of Azure resources, with more coming soon, allowing you to run a wide range of container workloads with the security of a private virtual network. See the documentation on how to configure Azure Private Link for Container Registry.

Dedicated data-endpoints

Private Link is the most secure way to control network access between clients and the registry, as network traffic is limited to the Azure Virtual Network. When Private Link can't be used, dedicated data-endpoints can minimize data exfiltration concerns. Enabling dedicated data-endpoints means you can configure firewall rules with registry-specific fully qualified domain names ([registry].[region].data.azurecr.io) rather than a wildcard rule (*.blob.core.windows.net) that covers all storage accounts.

You can enable dedicated data-endpoints using the Azure portal or the Microsoft Azure CLI. The data endpoints follow a regional pattern, <registry-name>.<region>.data.azurecr.io. In a geo-replicated registry, enabling data endpoints allows endpoints in all replica regions. Review the documentation on how to enable dedicated data endpoints to learn more.
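As a quick illustration of that naming pattern (a sketch only, with a made-up registry name and regions, not an official tool), the following Go snippet prints the fully qualified domain names you would put on an egress allow list for a geo-replicated registry:

package main

import "fmt"

// dataEndpoints returns the registry's REST (login) endpoint plus the dedicated
// data endpoint for each replica region, following the
// <registry-name>.<region>.data.azurecr.io pattern described above.
func dataEndpoints(registry string, regions []string) []string {
	endpoints := []string{fmt.Sprintf("%s.azurecr.io", registry)}
	for _, region := range regions {
		endpoints = append(endpoints, fmt.Sprintf("%s.%s.data.azurecr.io", registry, region))
	}
	return endpoints
}

func main() {
	// Hypothetical registry replicated to two regions.
	for _, e := range dataEndpoints("myregistry", []string{"eastus", "westeurope"}) {
		fmt.Println(e)
	}
}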

Azure built-in policies

Security capabilities only protect your workflows if they're actually put in place. To ensure your Azure resources follow security best practices, Azure Container Registry has added built-in Azure Policy definitions that you can use to enforce security rules. Here are some of the built-in policies that you can enable for your container registry:

Container Registries should be encrypted with a customer-managed key. Audit Container Registries that do not have encryption enabled with customer-managed keys.
Container Registries should not allow unrestricted network access. Audit Container Registries that do not have any network (IP or VNET) rules configured and allow all network access by default. Container Registries with at least one IP or firewall rule, or configured virtual network will be deemed compliant.
Container Registries should use private links. Audit Container Registries that do not have at least one approved private endpoint connection. Clients in a virtual network can securely access resources that have private endpoint connections through private links.

Using Azure Policy, you can ensure that your registries stay compliant with your organization's compliance needs.

Additional links

Learn more about Azure Container Registry.
UserVoice: To vote for existing requests or create a new request.
Issues: To view existing bugs and issues or log new ones.
Azure Container Registry documentation: For Azure Container Registry tutorials and documentation.

Quelle: Azure

Streamline connectivity and improve efficiency for remote work using Azure Virtual WAN

Today, we see a huge shift to remote work due to the global pandemic. Organizations around the world need to enable more of their employees to work remotely. We are working to address common infrastructure challenges businesses face when helping remote employees stay connected at scale.

A common operational challenge is to seamlessly connect remote users to on-premises resources. Even within Microsoft, we’ve seen our typical remote access of roughly 55,000 employees spike to as high as 128,000 employees while we’re working to protect our staff and communities during the global pandemic. Traditionally, you planned for increased user capacity, deployed additional on-premises connectivity resources, and had time to re-arrange routing infrastructure to meet organization transit connectivity and security requirements. Today’s dynamic environment demands rapid enablement of remote connectivity. Azure Virtual WAN supports multiple scenarios providing large scale connectivity and security in a few clicks.

Azure Virtual WAN provides network and security in a unified framework. Typically deployed with a hub and spoke topology, the Azure Virtual WAN architecture enables scenarios such as:

Branch connectivity via connectivity automation provided by Virtual WAN VPN/SD-WAN partners.
IPsec VPN connectivity.
Remote User VPN (Point-to-Site) connectivity.
Private (ExpressRoute) connectivity.
Intra cloud connectivity (transitive connectivity for Virtual Networks).
Transit connectivity for VPN and ExpressRoute.
Routing.
Security with Azure Firewall and Firewall Manager.

Organizations can quickly use Virtual WAN to deploy remote user connectivity in minutes and provide access to on-premises resources. A standard virtual WAN allows fully meshed hubs and routing infrastructure.

 
Here is how to support remote users:

Set up remote user connectivity: Connect to your Azure resources with an IPsec/IKE (IKEv2) or OpenVPN connection. This requires a virtual private network (VPN) client to be configured for the remote user. The Azure VPN Client, OpenVPN Client, or any client that supports IKEv2 can be used. For more information, see Create a point-to-site connection.
Enable connectivity from the remote user to on-premises resources. There are two options:

Set up Site-to-Site connectivity with an existing VPN device. When you connect the IPsec VPN device to the Azure Virtual WAN hub, interconnectivity between the Point-to-Site User VPN (remote user) and the Site-to-Site VPN is automatic. For more information on how to set up Site-to-Site VPN from your on-premises VPN device to Azure Virtual WAN, see Create a Site-to-Site connection using Virtual WAN.
Connect your ExpressRoute circuit to the Virtual WAN hub. Connecting an ExpressRoute circuit requires deploying an ExpressRoute gateway in Virtual WAN. As soon as you have deployed one, interconnectivity between the Point-to-Site User VPN (remote user) and ExpressRoute is automatic. To create the ExpressRoute connection, see Create an ExpressRoute connection using Virtual WAN. You can use an existing ExpressRoute circuit to connect to Azure Virtual WAN.

Connect your Azure resources to the Virtual Hub: Select a Virtual Network and attach it to your hub of choice.
Set up firewall policies in Virtual Hub: A secured virtual hub is an Azure Virtual WAN hub with associated security and routing policies configured by Azure Firewall Manager. Use secured virtual hubs to easily create native security services for traffic governance and protection. You can choose the services to protect and govern your network traffic with Azure Firewall. Azure Firewall Manager also allows you to use your familiar, best-in-breed, third-party security as a service (SECaaS) offerings to protect Internet access for your users. To create a firewall policy and secure your hub, see Secure your cloud network with Azure Firewall Manager using the Azure portal.

Learn more

For additional information, please explore these resources.

•    Virtual WAN Global Transit Architecture.
•    SD-WAN Connectivity Architecture with Virtual WAN.
•    Virtual WAN Monitoring (metrics and logs).
•    Install Azure Firewall in Virtual Hub.
•    Virtual WAN FAQ.
•    Virtual WAN pricing.
•    Using Azure Virtual WAN to support remote work documentation.
Quelle: Azure

Containerize Your Go Developer Environment – Part 3

In this series of blog posts, we show how to put in place an optimized containerized Go development environment. In part 1, we explained how to start a containerized development environment for local Go development, building an example CLI tool for different platforms. Part 2 covered how to add Go dependencies, caching for faster builds and unit tests. This third and final part is going to show you how to add a code linter, a GitHub Action CI, and some extra build optimizations.

Adding a linter

We'd like to automate checking for good programming practices as much as possible, so let's add a linter to our setup. The first step is to modify the Dockerfile:

# syntax = docker/dockerfile:1-experimental

FROM --platform=${BUILDPLATFORM} golang:1.14.3-alpine AS base
WORKDIR /src
ENV CGO_ENABLED=0
COPY go.* .
RUN go mod download
COPY . .

FROM base AS build
ARG TARGETOS
ARG TARGETARCH
RUN --mount=type=cache,target=/root/.cache/go-build \
  GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o /out/example .

FROM base AS unit-test
RUN --mount=type=cache,target=/root/.cache/go-build \
  go test -v .

FROM golangci/golangci-lint:v1.27-alpine AS lint-base

FROM base AS lint
COPY --from=lint-base /usr/bin/golangci-lint /usr/bin/golangci-lint
RUN --mount=type=cache,target=/root/.cache/go-build \
  --mount=type=cache,target=/root/.cache/golangci-lint \
  golangci-lint run --timeout 10m0s ./...

FROM scratch AS bin-unix
COPY --from=build /out/example /
...

We now have a lint-base stage, which is an alias for the golangci-lint image that contains the linter we would like to use. We then have a lint stage that runs the linter, mounting a cache in the correct place.

As we did for the unit tests, we can add a lint rule to our Makefile. We can also alias the test rule to run the linter and the unit tests:

all: bin/example
test: lint unit-test

PLATFORM=local

.PHONY: bin/example
bin/example:
	@docker build . --target bin \
	--output bin/ \
	--platform ${PLATFORM}

.PHONY: unit-test
unit-test:
	@docker build . --target unit-test

.PHONY: lint
lint:
	@docker build . --target lint

Adding a CI

Now that we’ve containerized our development platform, it’s really easy to add CI for our project. We only need to run our docker build or make commands from the CI script. To demonstrate this, we’ll use GitHub Actions. To set this up, we can use the following .github/workflows/ci.yaml file:

name: Continuous Integration

on: [push]

jobs:
  ci:
    name: CI
    runs-on: ubuntu-latest
    env:
      DOCKER_BUILDKIT: "1"
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Run linter
        run: make lint
      - name: Run unit tests
        run: make unit-test
      - name: Build Linux binary
        run: make PLATFORM=linux/amd64
      - name: Build Windows binary
        run: make PLATFORM=windows/amd64

Notice that the commands we run on the CI are identical to those that we use locally and that we don’t need to do any toolchain configuration as everything is already defined in the Dockerfile!

One last optimization

Performing a COPY will create an extra layer in the container image, which slows things down and uses extra disk space. This can be avoided by using RUN --mount and bind mounting from the build context, from a stage, or from an image. Adopting this pattern, the resulting Dockerfile is as follows:

# syntax = docker/dockerfile:1-experimental

FROM --platform=${BUILDPLATFORM} golang:1.14.3-alpine AS base
WORKDIR /src
ENV CGO_ENABLED=0
COPY go.* .
RUN go mod download

FROM base AS build
ARG TARGETOS
ARG TARGETARCH
RUN --mount=target=. \
  --mount=type=cache,target=/root/.cache/go-build \
  GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o /out/example .

FROM base AS unit-test
RUN --mount=target=. \
  --mount=type=cache,target=/root/.cache/go-build \
  go test -v .

FROM golangci/golangci-lint:v1.27-alpine AS lint-base

FROM base AS lint
RUN --mount=target=. \
  --mount=from=lint-base,src=/usr/bin/golangci-lint,target=/usr/bin/golangci-lint \
  --mount=type=cache,target=/root/.cache/go-build \
  --mount=type=cache,target=/root/.cache/golangci-lint \
  golangci-lint run --timeout 10m0s ./...

FROM scratch AS bin-unix
COPY --from=build /out/example /

FROM bin-unix AS bin-linux
FROM bin-unix AS bin-darwin

FROM scratch AS bin-windows
COPY --from=build /out/example /example.exe

FROM bin-${TARGETOS} AS bin

The default mount type is a read-only bind mount from the context that you pass with the docker build command. This means that you can replace the COPY . . with a RUN --mount=target=. wherever you need the files from your context to run a command but do not need them to persist in the final image.

Instead of keeping a separate Go module download step, we could remove it and just use a cache mount for /go/pkg/mod.
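That variant isn't shown in the post, but as a rough sketch it could look like the following, keeping the # syntax line and the other stages as before while dropping COPY go.* . and RUN go mod download from the base stage:

# Sketch only: no explicit module download in the base stage.
FROM --platform=${BUILDPLATFORM} golang:1.14.3-alpine AS base
WORKDIR /src
ENV CGO_ENABLED=0

FROM base AS build
ARG TARGETOS
ARG TARGETARCH
# Bind mount the source and cache both downloaded modules and build artifacts;
# go build fetches modules on the first run and reuses the cache afterwards.
RUN --mount=target=. \
  --mount=type=cache,target=/go/pkg/mod \
  --mount=type=cache,target=/root/.cache/go-build \
  GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o /out/example .

The trade-off is that the downloaded modules then live only in the BuildKit cache rather than in an image layer, so a cold cache re-downloads them.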

Conclusion

This series of posts showed how to put in place an optimized containerized Go development environment and then how to use this same environment on the CI. The only dependencies for those who would like to develop on such a project are Docker and make, and make can optionally be replaced by another scripting tool.

You can find the source for this example on my GitHub: https://github.com/chris-crone/containerized-go-dev

You can read more about the experimental Dockerfile syntax here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md

If you're interested in the build tooling at Docker, take a look at the Buildx repository: https://github.com/docker/buildx
The post Containerize Your Go Developer Environment – Part 3 appeared first on Docker Blog.
Quelle: https://blog.docker.com/feed/