Easily upgrade Windows Server 2008 R2 while migrating to Google Cloud

Cloud migration projects almost always involve components that span multiple applications, technologies, and platforms. This means they also usually require a comprehensive strategy, including multiple products and solutions, to get migrations over the finish line. There’s no magical ‘one size fits all’ approach, much as we wish there were. Of all the applications you’ve got running and might be looking to migrate, at least a few are probably running on Windows Server 2008 R2—in fact, as many as 60% of systems globally are estimated to still be running this version of Windows Server. And unless you’ve just returned from a nice three-year vacation, you probably know that Windows Server 2008 R2 reached End of Support (EOS) last month. This is likely to put some of you in a bind, and might accelerate your schedule to migrate or replatform these systems.

Given that time is of the essence and that you probably don’t have unlimited resources to dedicate to this transition, we’re happy to announce a new feature for Migrate for Compute Engine that lets you simultaneously migrate and replatform your Windows Server 2008 R2 systems to Windows Server 2012. Everything you had running on the original system will persist, but when the migration is done it’ll be running the new OS, Windows Server 2012. You can do this with your physical and virtual servers from on-prem, and also with VMs currently running in AWS or Azure. It’s a fast way to accomplish your upgrade and migration goals in one seamless motion, and better than the other options you might be facing.

Speaking of other options, let’s review the alternatives. One option is to perform a manual, in-place upgrade on any of your Windows Server 2008 R2 VMs (current guidance and installation media). You could also manually create another VM that’s running a supported version of the OS (such as Windows Server 2012 or 2016) and then manually migrate your apps and data.
The challenge with both of these options, though, is that they require time and staff. That’s why we’re so excited about this new Migrate for Compute Engine feature, which lets you avoid these challenges completely by simply replatforming to Windows Server 2012 while you migrate. Technically, there’s one more option for solving your Windows Server 2008 R2 problem: paying Microsoft a large sum to buy another few years of security updates. But that’s costly and just ‘kicks the can down the road,’ as they say. That’s why we recommend a more proactive approach, which also happens to be faster, easier, and less costly.

Whichever route you ultimately choose, it’s always a best practice to make sure your systems are running current, supported versions of the OS. We’re confident that Migrate for Compute Engine will help you accomplish this goal more smoothly than traditional approaches to replatforming. And when it comes to migration, where you likely have a lot to figure out, our mission at Google Cloud is to provide migration pathways that are low in risk and toil, but high in success! To learn more, read up on Migrate for Compute Engine or check out our data center migration solutions page.
Source: Google Cloud Platform

Working with Hedera Hashgraph to support the next generation of distributed ledger technology

Distributed ledger technology (DLT) developers continue to present exciting opportunities for the expansion of the digital economy. DLT can increase security and decrease the friction and costs associated with the secure exchange of information, from family photos to international remittances. While many recent DLT advancements have occurred primarily on top of public and permissionless DLTs such as Ethereum and Bitcoin, the challenge of delivering fast and inexpensive transactions with finality has slowed the rate of DLT adoption by large enterprises that require the ability to process thousands of transactions per second.

To address these challenges, Hedera developed a DLT that’s based on the hashgraph consensus algorithm. This algorithm was leveraged to build the Hedera network, which began open access in September 2019. The Hedera network now offers the accessibility advantages of an open public network and the technical capability to securely deliver large volumes of faster and less expensive transactions to enterprises all over the world.

But low levels of friction are not enough to ensure organizations will conduct cryptoeconomic activity on Hedera’s network; participants must also trust in the stability and reliability of the operations and governance of Hedera’s protocol and network. Since the inception of its DLT, Hedera has ensured the stability and reliability of its network by leveraging the Google Cloud Platform (GCP) premium network tier, a highly provisioned, low-latency global fiber optic network. By using Google’s network, Hedera has been able to focus on nurturing its community of developers who are building high throughput, low latency decentralized applications (dApps) on Hedera. “We initially turned to Google Cloud for its ease of use, networking advantages and overall platform performance,” said Atul Mahamuni, SVP of Products for Hedera.
“The Google Cloud Platform was the perfect foundation to help us achieve our goals to bring enterprise-ready DLT adoption at scale.”

Now, Hedera is expanding this relationship by selecting Google Cloud as a preferred cloud provider for its public testnets and Hedera Consensus Service ecosystem. Thanks to the extensive geographical coverage of Google’s network, the growing global Hedera developer community can ensure both access and performance of their applications no matter where they are in the world. “As we continue to evolve our infrastructure, we’re excited to expand our work with Google Cloud to now make DLT even more readily available for our community members, no matter their size or industry,” said Mahamuni. “This is about developing solutions that drive real business value.”

Finally, Google is becoming a member of the Hedera Governing Council, which leads strategic planning for the Hedera network and works to ensure the stability and reliability of network protocols and operations. As part of our council membership, Google Cloud will operate a Hedera network node, and make the ledger data available for analytics alongside GCP’s other public DLT datasets, bolstering GCP’s position as the cloud provider of choice for DLT networks and decentralized applications. We’re inspired by what Hedera has accomplished to date, and look forward to providing the infrastructure and technologies to support what’s possible with distributed ledger technology.
Source: Google Cloud Platform

Last month today: January at Google Cloud

We welcomed January with a splash at Google Cloud, with product and service announcements around security, storage, and application development. Plus, we added more options to get your legacy systems to cloud, offered some Kubernetes tips, and opened up registration for Next ‘20 in April. Read on for details on the top stories.

Making cloud stronger, easier—and colder

Google Cloud Secret Manager, now in beta, offers a way to easily and securely store API keys, passwords, certificates, and other sensitive information on Google Cloud. It supports global names and replication policies, first-class versioning, principles of least privilege, audit logging, and more. Cloud Storage got its coldest storage class yet with the addition of the Archive class, announced last month. This brings even more flexibility and cost savings for your storage options, particularly for large, rarely accessed data sets that need long-term storage. Archive storage can be accessed quickly when you need it, without performance bottlenecks.

Welcoming platforms and partners

Google announced the acquisition of AppSheet last month. This leading no-code application development platform can power a range of apps, from CRM and field inspections to personalized reporting. AppSheet complements Google Cloud’s strategy to reimagine the app dev space with its platform, which helps enterprises innovate with no-code development, workflow automation, app integration, and API management.

The road to cloud is paved with legacy systems and complex workloads, and plenty of options to modernize enterprise workloads. To smooth the way, Google Cloud now supports IBM Power Systems as part of our cloud solutions. You can now run IBM Power Systems as a service on Google Cloud, whether you’re using AIX, IBM i, or Linux on IBM Power.
This can be especially useful if you’re applying a hybrid cloud strategy.

Learn something new, online or in person

This new guide walks you through our most popular tips and best practices on deploying and using Kubernetes and Google Kubernetes Engine (GKE). You’ll find guidance on using Kubernetes Namespaces to stay organized and improve performance, an explanation of using readiness and liveness probes, help deciding whether to run a database on Kubernetes, and more.

When you’re at work, it’s nice to be as productive as you can—but it’s not always easy at the speed of modern business. Last month’s tips on managing information overload at work offered ideas about how you can use your time more efficiently to avoid interruptions, prioritize work, and communicate clearly. Plus, you’ll find details on some of the G Suite features designed to help you work faster and smarter.

And finally, we announced Google Cloud Next ‘20, our annual event that brings together the cloud community for three days of learning, networking, and more. Registration is open now for the show, where you’ll hear from expert speakers and customers and choose from more than 500 sessions. Next ‘20 takes place April 6-8 in San Francisco.

That’s a wrap for January. Let us know what you’re working on!
Source: Google Cloud Platform

Forrester looks at the opportunity for Google Cloud partners

One reason that I joined Google Cloud’s channel team was the importance Google places on partners and the opportunity we have together to help customers succeed. In 2019, our partners responded to this opportunity and are seeing the results. Google recently commissioned a new Total Economic Impact study from Forrester, which shines some light on our partner community, how they are helping customers succeed, and the opportunities they’re seeing. Based on interviews with Google Cloud partners, Forrester’s research helps illustrate why so many cloud and enterprise partners are building on, supporting, and providing unique services to customers deploying Google Cloud. As we enter 2020, the opportunity ahead for partners working with Google Cloud is larger than ever. Let’s look at some highlights from the study.

Why are partners choosing to grow with Google Cloud?

Open, multi-cloud strategy. Customers overwhelmingly choose multi-cloud environments, and partners are increasingly choosing to work with Google Cloud thanks to this multi-cloud approach. But equally important is Google Cloud’s support of open source software, like Kubernetes. Partners who want openness are choosing to work with Google Cloud.

Data, analytics, and AI capabilities. Our differentiated capabilities in data and analytics—BigQuery, Dataflow, Dataproc, and TensorFlow—are also key reasons partners choose Google Cloud. A good example of this is our partner Deloitte, which in 2019 began to build new solutions leveraging BigQuery, AI, and ML to help customers in clinical research better leverage enormous quantities of data.

How are partners growing their businesses on Google Cloud?

Forrester data reveals that partners are seeing strong revenue and margin growth by building products, developing services, and selling capabilities on Google Cloud, and they’ve also seen success in differentiating themselves by earning specializations for the platform.

Growing services offerings.
As customer demand for Google Cloud capabilities grows, partners are seeing increased business from services offerings, helping customers implement GCP capabilities, including AI and ML, data center modernization, and multi-cloud deployments with Anthos. For example, the average migration services deal size for Google Cloud partners increased three to six times over the last three years, while cloud modernization and application development deals grew three to five times during the same period.

High-value projects. Partners in Forrester’s study were also able to increase their margins by adding services around higher-value analytics, AI/ML, cloud modernization, and cloud-native application development projects, as were partners who built custom IP on GCP. While Google Cloud offers strong margins for products across the board, like G Suite, the study showed that partners who moved “up the stack” to more complex technologies earned larger margins overall.

Investing in talent and skills development. In Forrester’s research, some partners reported up to 800% growth in hiring over a four-year period. This aligns with what we already know: Google Cloud skills and talent are in high demand, and hiring and building skills around Google Cloud continues to be a tremendous area of opportunity and differentiation for our partners.

Certification and training. Google Cloud partners are continuing to invest in training and certifications for their teams. For example, one partner surveyed by Forrester has a goal to have 70% of its technical staff certified on Google Cloud by the end of the year. Our partners have many opportunities to build new skills on Google Cloud, including sales and technical enablement offerings, professional development tools and incentives, and in many cases, no-cost certification training.

Expertises and Specializations in key areas.
Investments in Expertises and Specializations help partners differentiate themselves and fill the technical skills gaps that customers need covered. The number of our partners that have earned Specializations increased nearly three times over the past year (based on Google Cloud data), which was reflected in interviews conducted by Forrester as well.

What opportunities lie ahead?

Forrester asked partners about the road ahead, such as where Google Cloud partners see the most opportunity to grow and differentiate their businesses. We know that it’s still early days for cloud migration across industries, but a few key product areas stood out as opportunities for partners—particularly data analytics, artificial intelligence, machine learning, and building custom IP. Customer engagements in these areas are leading to repeat business and larger contract values for partners, Forrester found, so building expertise can help “further bolster partners’ value proposition and differentiation in the marketplace.”

We’re excited about the shared opportunity we have with partners to help customers solve their trickiest challenges and address their biggest opportunities with Google Cloud. To learn more, download and read Forrester’s full TEI study, “The Google Cloud Business Opportunity for Partners,” here. Visit us here to learn about our Partner Advantage program, including more details about how you can start earning new Google Cloud Specializations and Expertises.
Source: Google Cloud Platform

How SAP users are achieving retail transformation with Google Cloud

The retail industry is in the midst of a transformation. Online commerce has emerged as a force to reckon with, commanding close to $6 trillion in market opportunity by 2022. With so much at stake, nearly half of all retailers are looking to the cloud to improve customer omnichannel experience and retail store performance. And retailers utilizing SAP solutions are no exception: 75% of retailers surveyed by the Americas’ SAP Users Group (ASUG whitepaper) expressed plans to increase digital investments in the next two years by at least 10% in order to accelerate digital transformation. Of those surveyed, 1 in 4 intend to increase investments significantly, by 50% or more.

Retailers know what they need to offer to evolve today: a customer-focused, data-driven, seamless customer experience. But that journey is filled with technological roadblocks that are leaving even the largest retailers in limbo. For retailers innovating with SAP technologies, these roadblocks can make it difficult to migrate, deploy, and run new software that’s expensive and challenging to scale on legacy, on-premises infrastructure. Central to making the transformation journey a success is leveraging the public cloud and choosing the right public cloud service provider (CSP)—remember that not all clouds are created equal. Here at Google Cloud, we’ve helped SAP customers and retailers achieve transformation success by:

- Giving customers a simplified cloud journey with access to our Cloud Acceleration Program (CAP) and our robust partner community.
- Helping to accelerate innovation with industry-leading advanced analytics and AI/ML tools.
- Providing a scalable and elastic infrastructure to rightsize your applications and instances.
- Minimizing downtime through automated infrastructure maintenance with our Live Migration offering.

Let’s take a look at how three retailers using SAP on Google Cloud were able to face their technology challenges head-on and bring their visions for digital transformation to life.

Omnichannel: MediaMarktSaturn’s road to customer-centricity

Customers in the digital age expect personalized, seamless omnichannel experiences—from browsing online or via mobile to in-store and checkout. Most retailers are eager to deliver on this expectation, especially with rising technologies like AI, ML, and predictive analytics promising seamless omnichannel experiences. But contrast retail’s future tech landscape with today’s reality: 75% of SAP retail solution customers who participated in our recent ASUG study qualify as digital newcomers that are still in the early stages of transformation. In order to successfully offer personalized, customer-centric omnichannel experiences, retailers must generate customer insights in real time. However, this requires massive compute resources that are beyond the capabilities of most current on-premises infrastructures leveraging SAP.

MediaMarktSaturn Retail Group, one of the world’s leading consumer electronics retailers, recently encountered data pipeline challenges that prevented the company from modernizing its omnichannel and retail strategies. MediaMarktSaturn was looking to unify its large data sets and infrastructure across its SAP solutions to generate accurate and relevant insights for both its business and its customers.
However, MediaMarktSaturn’s legacy hardware infrastructure was not only incapable of handling the data volumes required to realize its omnichannel goal, but was also unable to scale up and back down again to accommodate varying levels of traffic without disruption. To overcome these technical and infrastructural hurdles, MediaMarktSaturn chose Google Cloud to help modernize and migrate its SAP workloads into the cloud. Together with Google Cloud, MediaMarktSaturn decided to leverage Google Kubernetes Engine (GKE), BigQuery, and Bigtable to store, mine, cleanse, and analyze data to generate real-time, personalized insights that would better serve customers across all channels. The effort has so far yielded a 30% increase in conversion rates, thanks to optimized search technology and high-performance data handling. Looking to the future and equipped with the tools to modernize its retail strategy, MediaMarktSaturn has started to build analytics tools that explore price elasticity and price prediction based on multiple variables.

Store operations: How Loblaw is delighting customers with seamless experiences

Building on the omnichannel experience, retailers are also rapidly modernizing store operations, outpacing the agility of their on-premises SAP infrastructure. With optimized express checkout, on-shelf and intelligent inventory management, and dynamic assortment planning on the retail tech horizon, it’s becoming increasingly critical that retail businesses have the foundation to build, test, and deploy the emerging technologies they need to compete. Retailers that delay infrastructure modernization in favor of layering new swaths of code on top of legacy systems risk creating a highly complex, tightly coupled, and unscalable monolith that’s prone to downtime and data inaccuracies.
Loblaw, Canada’s food and pharmacy leader and the nation’s largest retailer, recently encountered data pipeline issues similar to MediaMarktSaturn’s while leveraging SAP Hybris in traditional on-prem environments. It had the goal of enabling personalized product recommendations on ecommerce platforms, but the technology was missing the mark, as the quality of suggestions and response latency had room for improvement. Loblaw also wanted to enable marketers to run promotions at any time, without requiring conversations with IT to prepare ecommerce systems. Loblaw decided to leverage public cloud because achieving its vision on-premises would require expanding its data centers and creating dedicated IT maintenance and operations teams. Rather than investing even more resources to support dated, inflexible technology solutions, Loblaw picked Google Cloud:

“We thought, ‘Why don’t we offload all that effort to someone who’s doing it at scale, making the appropriate investments, and staying ahead in technology so that we can really focus our efforts on driving value to the customer,’” says Hesham Fahmy, Vice President of Technology at Loblaw.

The first phase of Loblaw’s migration to the cloud involved its online grocery store, QuickShop, which leverages transaction data from SAP Hybris. Google Cloud offers a certified infrastructure for SAP Hybris, removing the administrative burden required to create an architectural foundation for modernization. Loblaw also uses BigQuery to run real-time analysis of customer data across the buying lifecycle to serve customers with more relevant offers. As a result of the partnership between Google Cloud and SAP, Loblaw has experienced a four-fold improvement in QuickShop’s performance, a three-fold increase in site capacity, and a 50% time savings for its Site Reliability Engineers, allowing the company to focus on further innovations in customer experience.
Logistics, fulfillment, and delivery: MultiPharma’s path to serving customer needs with automated warehouses

They may not get as much attention, but back-end operations are critical to retail success. Real-time, accurate, automated warehouse management is one of those workloads. From robotics and RFID tagging to on-demand inventory management, warehousing systems require a vast amount of data from all across a retailer’s ecosystem, both online and in-store. Much like the issues that come with developing omnichannel and store operations innovations, modernizing a company’s warehousing can strain legacy, on-prem infrastructure, causing inaccuracies, downtime, and unfulfilled orders.

For pharmaceutical retailer MultiPharma, a key value proposition is prompt delivery of medication orders to pharmacists, even during periods of high demand. This required heavy investments in warehouse distribution, robotics, and automation—technologies that need scalable, elastic, and extensible infrastructure. MultiPharma originally satisfied this need with a legacy back-end SAP system and its own private cloud. But issues with cost and flexibility prompted the company to leverage SAP HANA and move to the public cloud. While the company considered several cloud services providers, MultiPharma selected Google Cloud for its superior VM solutions, flexible sizing, and pricing structures. MultiPharma phased the migration of SAP workloads into Google Cloud, the first phase of which involved creating a development environment for teams to conduct agile testing before finishing the production environment. Within the first phase, MultiPharma is already reaping benefits, including greater flexibility and increased resources that allow it to concentrate on further business innovations, such as optimizing ecommerce and customer-facing applications.
As the retail industry continues to transform, retailers that embrace cloud technologies are increasingly positioned to take advantage of emerging opportunities. But in order for increased investments in digital transformation to pay off, retailers leveraging SAP need to ensure their infrastructure and data pipelines are ready for upcoming innovations. Although many enterprises may be tempted to solve this challenge temporarily by layering software on legacy, on-prem architecture, doing so almost guarantees an inflexible, unscalable, inelastic, and costly monolith incapable of continuous modernization. Like MediaMarktSaturn, Loblaw, and MultiPharma, forward-thinking retailers should consider leveraging the cloud’s many offerings and managed services, not only to remove the burden of infrastructure and data development and maintenance, but also to get the best performance from their SAP and technology investments. To learn more about Google Cloud’s work with retailers utilizing SAP technologies and get key takeaways, read “Google Cloud Strategy Guide: 5 Learnings for Your SAP Retail Workloads.” You can also learn more about our SAP and retail industry solutions.
Source: Google Cloud Platform

Building more secure data pipelines with Cloud Data Fusion

For those of you working in data analytics, ETL and ELT pipelines are an important piece of your data foundation. Cloud Data Fusion is our fully managed data integration service for quickly building and managing data pipelines. Cloud Data Fusion is built on the open source project CDAP, and this open core lets you build portable data pipelines.

A CDAP server might satisfy your need to run a few simple data pipelines. But when it comes to securing a larger number of business-critical data pipelines, you’ll often need to put a lot more effort into logging and monitoring those pipelines. You’ll also need to manage authentication and authorization to protect that data when you have servers running workloads for multiple teams and environments. These additional services can require a lot of maintenance effort from your operations team and take time away from development. The goal is running pipelines, not logging, monitoring, or the identity and access management (IAM) service.

We designed Cloud Data Fusion to take care of most of this work for you. And since it’s part of Google Cloud, you can take advantage of built-in security benefits when using Cloud Data Fusion rather than self-managed CDAP servers:

- Cloud-native security control with Cloud IAM; identity management and authentication are handled by Cloud Identity
- Full observability with Stackdriver Logging and Monitoring, including pipeline logs and audit logs
- Reduced exposure to the public internet with private networking

Let’s take a look at these features in detail.

Access control with Cloud IAM

The number one reason to use Cloud Data Fusion over self-managed CDAP servers is that it integrates seamlessly with Cloud IAM, which lets you control access to your Cloud Data Fusion resources. With Cloud IAM, Cloud Data Fusion is able to easily integrate with other Google Cloud services.
You can also use Cloud Identity for user and group management and authentication (such as multi-factor authentication, or MFA), instead of implementing or deploying your own.

There are two predefined roles in Cloud Data Fusion: admin and viewer. Following the IAM principle of least privilege, the admin role should only be assigned to users who need to manage (create and delete) instances. The viewer role should be assigned to users who only need to access instances, not manage them. Both roles can access the Cloud Data Fusion web UI to create pipelines and plugins. Assign roles and permissions to groups of users instead of assigning them to users directly whenever possible. This helps you control users’ access to Cloud Data Fusion resources in a more organized manner, especially when you assign permissions to the same groups repeatedly across multiple projects. Read more about the two Cloud Data Fusion roles and their corresponding permissions.

Private IP instance

A private IP instance of Cloud Data Fusion connects with your Virtual Private Cloud (VPC) privately. Traffic over this network does not go through the public internet, which reduces the potential attack surface. You can find out more about setting up private IP for Cloud Data Fusion.

VPC Service Controls

We’re also announcing beta support for VPC Service Controls in Cloud Data Fusion. You can now prevent data exfiltration by adding a Cloud Data Fusion instance to your service perimeter.
When configured with VPC-SC, any pipeline that reads data from within the perimeter will fail if it tries to write that data outside the service perimeter.

Stackdriver Logging

Stackdriver Logging and Monitoring are disabled by default in Cloud Data Fusion, but we recommend you enable these tools for observability. With the extra information provided by the logs and metrics, you can not only investigate and respond to incidents faster, but also understand how to manage your particular infrastructure and workloads more effectively in the long run. There are a range of logs that can help you run your Cloud Data Fusion pipelines better.

Pipeline logs

These are generated by your pipelines in Cloud Data Fusion. They are useful for understanding and troubleshooting your Cloud Data Fusion pipelines. You can find these logs in the Cloud Data Fusion UI as well as in the Stackdriver logs of the Dataproc clusters that execute the pipelines.

Admin activity audit logs

These logs record operations that modify the configuration or metadata of your resources. Admin activity audit logs are enabled by default and cannot be disabled.

Data access audit logs

Data access audit logs contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read user-provided resource data. Admin activity audit logs and data access audit logs are useful for tracking who accessed or made changes to your Cloud Data Fusion resources. If there is any malicious activity, a security admin will be able to find and track down the bad actor in the audit logs.

These Google Cloud features can give you extra control and visibility into your Cloud Data Fusion pipelines. Cloud IAM helps you control who can access your Cloud Data Fusion resources; a private instance minimizes exposure to the public internet; and Stackdriver Logging and Monitoring provide information about your workloads, changes in permissions, and access to your resources.
Together, they create a more secure solution for your data pipeline on Google Cloud.

Learn more about Cloud Data Fusion.
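As a concrete sketch of the least-privilege guidance above, here is how you might bind the two predefined Cloud Data Fusion roles to groups with gcloud. The project ID and group addresses are placeholders; substitute your own.

```shell
# Bind the Cloud Data Fusion admin role to the operations group;
# only these users can create and delete instances.
gcloud projects add-iam-policy-binding my-project \
    --member="group:datafusion-admins@example.com" \
    --role="roles/datafusion.admin"

# Give pipeline developers the viewer role, which is enough to open
# the web UI and build pipelines without managing instances.
gcloud projects add-iam-policy-binding my-project \
    --member="group:datafusion-developers@example.com" \
    --role="roles/datafusion.viewer"
```

Binding roles to groups rather than individual users keeps access reviews manageable as teams change.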
Source: Google Cloud Platform

Scale your Composer environment together with your business

When you’re building data pipelines, it’s important to consider business needs now and in the future. We often hear from customers that they want to configure and optimize their Cloud Composer environments. So we on the Cloud Composer engineering team will share in this post how Cloud Composer—built on Apache Airflow—works, and offer some tips to optimize your Cloud Composer performance. Cloud Composer is a fully managed workflow orchestration service that lets you author, schedule, and monitor pipelines that span clouds and on-premises data centers. It’s built on Apache Airflow open source software and operated using the Python programming language.

We’ll start by analyzing how Airflow configurations can affect performance, then offer tips on ways to quickly bootstrap your initial settings for high performance. You may also find this sizing guide helpful—make a copy and add your own numbers.

Understanding Apache Airflow scheduler architecture with CeleryExecutor

Let’s start with this detailed architecture of the Airflow scheduler/worker in Cloud Composer. This assumes you’re already familiar with the overall Cloud Composer architecture and Apache Airflow concepts. In the diagram below, you can see that the process of parsing DAGs loads DAGs from files repeatedly. The process checks DAGs and fires scheduling actions, such as starting a DAG run or creating a task. Tasks are sent to workers for execution via a Redis-based task queue.

[Figure: Scheduler architecture of Airflow on Composer]

The scheduler launches multiple processes by calling Python’s multiprocessing.Process to parse DAG files in parallel.
The total number of DAG processing processes that the scheduler can launch is limited by the Airflow config [scheduler] max_threads.

Each DAG parsing process completes the following steps:
- Parse a subset of DAG files to generate DAG runs and tasks for those DAG runs.
- Collect tasks that meet all dependencies.
- Set these tasks to the SCHEDULED state.

The main process of the scheduler does the following in a loop:
- Collect all SCHEDULED tasks from the DAG parsing processes.
- Set eligible tasks to the QUEUED state.
- Send a certain number of QUEUED tasks into the Celery queue. This number is calculated from the `parallelism` config parameter, which represents the maximum number of tasks running concurrently. Remaining tasks stay in the QUEUED state.

Life of a task in Cloud Composer
Every Airflow task goes through the process and constraints depicted below before being executed by a worker. In sequence, an Airflow task needs to pass these Airflow config constraints before it is finally executed:

[Diagram: Constraints in Airflow at different stages]

1. The DAG parsing process in the scheduler parses the DAG definition, creating task instances for each task in the DAG.
2. If all of a task’s dependencies are met, the task is set to the SCHEDULED state.
3. The scheduler main process picks up tasks in the SCHEDULED state, taking into account the constraints `dag_concurrency` (maximum number of tasks per DAG) and `non_pooled_task_slot_count` (maximum number of tasks in the system), together with other queueing criteria. Tasks that pass are set to the QUEUED state.
4. Next, the scheduler main process places tasks into the Celery queue based on the `parallelism` constraint, which limits the number of queued tasks in the Celery queue.
Queued tasks are kept in the QUEUED state. Last, worker processes take tasks from the Celery queue as long as the number of tasks in the worker is lower than the `worker_concurrency` constraint. Tasks actually running in a worker are set to the RUNNING state.

Recommended Airflow config variables for optimal performance
Here’s a quick reference table with our recommendations for various Airflow configs that may affect performance. We’ll discuss the rationale behind each of them in the following sections.

Choose the right Airflow scheduler settings
When you need to scale your Cloud Composer environment, you’ll want to choose the right Airflow configs as well as node and machine type settings. The Airflow default config for scheduler max_threads is only two, which means that even if the Airflow scheduler pod runs on a 32-core node, it can only launch two DAG parsing processes. Therefore, we recommend setting max_threads to at least the number of vCPUs per machine.

If you find tasks are spending a long time in the SCHEDULED state, they may be constrained by dag_concurrency or non_pooled_task_slot_count; consider increasing the value of those two options. If you find tasks are stuck in the QUEUED state, they may be constrained by parallelism. They may, however, also be limited by worker processing power, because tasks are only set to the RUNNING state after they’re picked up by a worker. Consider increasing parallelism or adding more worker nodes.

Test Airflow worker performance
Cloud Composer launches a worker pod for each node in your environment. Each worker pod can launch multiple worker processes to fetch and run tasks from the Celery queue. The number of processes a worker pod can launch is limited by the Airflow config worker_concurrency.
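To make these state transitions concrete, here is a toy model (not Composer code; the function and numbers are purely illustrative) of how the constraints above gate how many tasks advance on a single scheduler pass:

```python
def admit_tasks(scheduled_per_dag, dag_concurrency,
                non_pooled_task_slot_count, parallelism, worker_slots):
    """Return (queued, in_celery, running) counts for one pass.

    scheduled_per_dag maps a DAG id to its count of SCHEDULED tasks.
    """
    # dag_concurrency caps tasks per DAG; non_pooled_task_slot_count
    # caps tasks across the whole system.
    queued = sum(min(n, dag_concurrency) for n in scheduled_per_dag.values())
    queued = min(queued, non_pooled_task_slot_count)
    # parallelism caps how many QUEUED tasks enter the Celery queue.
    in_celery = min(queued, parallelism)
    # Workers pick up tasks only while free slots remain below the
    # total worker_concurrency across pods.
    running = min(in_celery, worker_slots)
    return queued, in_celery, running
```

For example, with 10 and 3 scheduled tasks in two DAGs, dag_concurrency=5, non_pooled_task_slot_count=7, parallelism=6, and 4 free worker slots, only 7 tasks are queued, 6 reach Celery, and 4 run.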
To test worker performance, we ran a test based on a no-op PythonOperator and found that six or seven concurrent worker processes already fully utilize one vCPU with 3.75GB RAM (the default n1-standard-1 machine type). Adding worker processes beyond that can introduce large context-switch overhead and can even result in out-of-memory issues for worker pods, ultimately disrupting task execution.

worker_concurrency = 6-8 * cores_per_node (or per 3.75GB of RAM)

Cloud Composer uses six as the default concurrency value for environments. For environments with more cores in a single node, use the formula above to quickly get a worker_concurrency number that works for you. If you do want higher concurrency, we recommend monitoring worker pod stability closely after the new value takes effect. Worker pod evictions that happen because of out-of-memory errors may indicate that the concurrency value is too high; your real limit may vary depending on your worker processes’ memory consumption. Another consideration is long-running operations that are not CPU-intensive, such as polling status from a remote server, which still consume the memory of a whole Airflow process. We advise raising your worker_concurrency number slowly and monitoring closely after each adjustment.

Consider more nodes vs. more powerful machines

[Diagram: Big node setup vs. small node setup with the same number of vCPUs. In the image on the right, the Airflow scheduler pod runs on a relatively less powerful machine.]

Our internal tests show that worker processing power is influenced most by the total number of vCPU cores rather than by machine type.
There’s not much difference in worker processing power between a small number of multi-core machines and a large number of single-core machines, as long as the total number of CPU cores is the same. However, in the small-node setup, with a large number of less powerful machines, the scheduler runs on a small machine and may not have enough compute power to produce tasks for the workers to execute. Therefore, we recommend setting up a Cloud Composer cluster with a relatively small number of powerful machines, keeping in mind that if the number of machines is too small, a failure of one machine will impact the cluster severely.

Our internal tests show that with a worker_cores:scheduler_cores ratio of up to around 9:1, there is no difference in system throughput for the same number of cores, as long as there are no long-running tasks. We recommend exceeding that ratio only when you have long-running tasks. You can use the formula below to quickly calculate a good worker_cores:scheduler_cores ratio to start with. For example, if you set up your environment initially with three nodes and two cores per machine and then estimate you may have 24 long-running tasks at the same time, you could try to scale your environment up to 9 + 24 / (2 * 6) = 11 nodes. If you want more performance, it may be worth trying a more powerful machine type instead.

Use our sizing guide to get started, and have a wonderful journey with Cloud Composer!
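The sizing rules of thumb above can be collected into a small calculator. This is a sketch only: the 6-processes-per-core factor and the worked example’s baseline of 9 nodes come straight from the text, and your own numbers will differ.

```python
import math

def recommended_worker_concurrency(cores_per_node, ram_gb_per_node,
                                   procs_per_unit=6):
    """worker_concurrency ~= 6-8 x cores per node (or per 3.75GB RAM),
    limited by whichever resource is scarcer."""
    units = min(cores_per_node, ram_gb_per_node / 3.75)
    return max(1, math.floor(procs_per_unit * units))

def estimated_nodes(base_nodes, long_running_tasks, cores_per_node,
                    procs_per_core=6):
    """Reproduce the worked example: nodes needed once long-running
    tasks are spread over per-node worker slots."""
    extra = long_running_tasks / (cores_per_node * procs_per_core)
    return math.ceil(base_nodes + extra)
```

On the default n1-standard-1 (1 vCPU, 3.75GB RAM) this reproduces the default concurrency of 6, and estimated_nodes(9, 24, 2) reproduces the 11-node worked example above.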
Source: Google Cloud Platform

Bringing a passion for privacy to cloud

Editor’s note: This is the first in our Black History Month series of Cloud Googler profiles. Michee Smith is a product manager within Google Cloud who’s responsible for building products that help protect customer privacy when they store their data within Google Cloud. In her almost five years at Google, she’s led the charge on many key projects, like launching our first Political Ads Transparency Report, redesigning the Transparency Report to be even easier to use, and working on our efforts across Google to comply with the GDPR. We sat down with Michee to talk about her career path, her tech passions, why representation matters, and why staying true to yourself is a winning formula.

Michee on why privacy matters
I’m passionate about making our customers super comfortable with holding data inside of Google Cloud. In my opinion, doing privacy right is twofold: Developer tools should make privacy the easiest thing to build into the product. Developers shouldn’t have to jump through hoops to deliver notice and consent flows, and enforce identity and access management policies. It should be natural to the tooling they use and how they work. The second part, and most important, is that customer expectations should be set, then met, around who accesses their data and how it’s used. Users should never be surprised about who saw their data and how it was used; products with transparency and control built in by default help make sure they aren’t. At Google, I work on products like Access Transparency and Cloud Audit Logs, which help users easily track who has accessed their data and when. My affection toward privacy and security started at my previous job, when I began to learn how nuanced and technical these important topics are. I became a privacy champion for my products, holding office hours and privacy assessments to help teams understand why privacy is important and how to make it easier for users to adopt these products.
Then, Google hired me to work on privacy full time, and I’ve worked on a bunch of interesting projects, including Google’s efforts to comply with the GDPR, Europe’s broad privacy regulations. Now, as part of Google Cloud, I lead a team that builds products to make sure we are a trusted cloud for customers to put their data in. At Cloud, I have the opportunity to help not just Google, but many companies across the world, work to ensure the privacy of their data. One project on my team, called Key Access Justifications, lets customers truly be the final arbiters of access to their own data, and gives transparency around Google personnel access to their data. I also get to work on Google’s compliance efforts, and work with product managers and engineers on the best ways we can give our customers data controls.

“Working at the speed of trust is faster than anything else.”

On choosing her path
I knew I wanted to study computer science in high school, and that meant being around people who were different from those I grew up with in my predominantly Black neighborhood. So I applied for a full scholarship to Rochester Institute of Technology to be around people who were different from me. College was the first time I had an email address, and I didn’t even have a computer until I started my first job. I learned technical skills, and also about different cultures and groups of people. Ultimately, that understanding and empathy not only helps me navigate the corporate world, but also helps me build products for everyone.

I’ve always had a belief in myself, which I credit in part to being raised in the Black church, a supportive and encouraging environment. But even so, it’s sometimes been hard to maintain that confidence. I regretted not taking certain opportunities at RIT, like collaborating with my chemistry professor on a biomedical computing research project, because I didn’t want to fail. I didn’t believe in myself enough.
The first time someone asked me to speak at a conference, I had to be pushed. But after it was well received, I realized I shouldn’t counsel myself out of doing anything. Let other people tell me “no,” but don’t let myself be the one to say it. I want people to know I’m not a unicorn—I’m not here because I’m necessarily special, but because I haven’t let rejection stop me. My philosophy is that I’m the only one who can tell me I can’t do something.

“The superpower I rely on is that I won’t let other people tell me I’m not good enough.”

On being yourself, no matter what
At my first job after college in 2003, people would talk over me. I was just the college hire, the junior employee. I had to show up, and I decided to start dressing up for work as a way to show my presence and be taken more seriously. Surprisingly, it worked! Soon I was leading on projects, and my colleagues started looking to me for my counsel. In making that decision to change how I dressed, I became conscious of how I can adapt to situations without changing who I am. It’s a negotiation with myself that I don’t take lightly. I always remember to be true to my values, and I tell my managers that I value camaraderie. I spend more time with my colleagues than my family, so I need to feel good about working with them, and them working with me. Being myself at work lets other people feel comfortable being themselves around me, too. That camaraderie and trust allows us to work quickly and effectively.

On the importance of mentors and representation
I believe you should always have three relationships: mentor, mentee, peer. I always have those going at all times, whether formal or informal, and I regularly speak at conferences and attend student events, too. Relationships and representation in tech really matter. Oftentimes, people of color don’t see people who look like us in these roles and on stages.
There’s a sense of gratitude, belonging, and relief to see someone who looks like you. I want to show up to help others imagine themselves in this role, building products for everyone, and let them know that they, too, can find their superpower.
Source: Google Cloud Platform

Identifying and tracking toil using SRE principles

One of the key measures that Google site reliability engineers (SREs) use to verify our effectiveness is how we spend our time day-to-day. We want ample time available for long-term engineering project work, but we’re also responsible for the continued operation of Google’s services, which sometimes requires doing some manual work. We aim for less than half of our time to be spent on what we call “toil.” So what is toil, and how do we stop it from interfering with our engineering velocity? We’ll look at these questions in this post.

First, let’s define toil, from chapter 5 of the Site Reliability Engineering book: “Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.”

Some examples of toil include:
- Handling quota requests
- Applying database schema changes
- Reviewing non-critical monitoring alerts
- Copying and pasting commands from a playbook

A common thread in all of these examples is that they do not require an engineer’s human judgment. The work is easy but not very rewarding, and it interrupts us from making progress on engineering work to scale services and launch features. Here’s how to take your team through the process of identifying, measuring, and eliminating toil.

Identifying toil
The hardest part of tackling toil is identifying it. If you aren’t explicitly tracking it, there’s probably a lot of work happening on your team that you aren’t aware of. Toil often arrives as a request texted or emailed to an individual, who dutifully completes the work without anyone else noticing. We heard a great example of this from CRE Jamie Wilkinson in Sydney, Australia, who shared this story of his experience as an SRE on a team managing one of Google’s datastore services. Jamie’s SRE team was split between Sydney and Mountain View, CA, and there was a big disconnect between the achievements of the two sites.
Sydney was frustrated that the project work they relied upon—and the Mountain View team committed to—never got done. One of the engineers from Sydney visited the team in Mountain View, and discovered they were being interrupted frequently throughout the day, handling walk-ups and IMs from the Mountain View-based developers. Despite regular meetings to discuss on-call incidents and project work, and complaints that the Mountain View side felt overworked, the Sydney team couldn’t help because they didn’t know the extent of these requests. So the team decided to require all the requests to be submitted as bugs. The Mountain View team had been trained to leap in and help with every customer’s emergency, so it took three months just to make the cultural change. Once that happened, they could establish a rotation of people across both sites to distribute load, see stats on how much work there was and how long it took, and identify repetitive issues that needed fixing.

“The one takeaway from this was that when you start measuring the right thing, you can show people what is happening, and then they agree with you,” Jamie said. “Showing everyone on the team the incoming vs. outgoing ticket rates was a watershed moment.”

When tracking your work this way, it helps to gather some lightweight metadata in a tracking system of your choice, such as:
- What type of work was it (quota changes, push release to production, ACL update, etc.)?
- What was the degree of difficulty: easy (<1 hour), medium (hours), or hard (days)? (Based on human hands-on time, not elapsed time.)
- Who did the work?

This initial data lets you measure the impact of your toil. Remember, however, that the emphasis is on lightweight in this step. Extreme precision has little value here; it actually places more of a burden on your team if they need to capture many details, and makes them feel micromanaged. Another way to successfully identify toil is to survey your team.
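One lightweight way to capture that metadata is a tiny record per ticket plus a simple aggregation. This is a sketch only: the hour weights per difficulty bucket are illustrative assumptions, not Google’s numbers.

```python
from dataclasses import dataclass

# Rough hands-on-hours midpoints per difficulty bucket (assumed values).
DIFFICULTY_HOURS = {"easy": 1, "medium": 4, "hard": 16}

@dataclass
class ToilTicket:
    work_type: str   # e.g. "quota change", "release push", "ACL update"
    difficulty: str  # "easy" (<1 hour), "medium" (hours), "hard" (days)
    assignee: str

def hours_by_type(tickets):
    """Aggregate estimated hands-on hours per type of work."""
    totals = {}
    for t in tickets:
        totals[t.work_type] = (totals.get(t.work_type, 0)
                               + DIFFICULTY_HOURS[t.difficulty])
    return totals
```

Sorting the aggregate by hours quickly surfaces the repetitive issues worth automating first, without asking anyone to log precise timesheets.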
Another Google CRE, Vivek Rau, would regularly survey Google’s entire SRE organization. Because the size and shape of toil varied between SRE teams, ticket metrics were harder to analyze at a company-wide level. He surveyed SREs every three months to identify common issues across Google that were eating away at our time for project work. Try this sample toil survey to start:

1. Averaging over the past four weeks, approximately what fraction of your time did you spend on toil? (Scale: 0-100%)
2. How happy are you with the quantity of time you spend on toil? (Not happy / OK / No problem at all)
3. What are your top three sources of toil? (On-call response / Interrupts / Pushes / Capacity / Other / etc.)
4. Do you have a long-term engineering project in your quarterly objectives? (Yes / No)
5. If so, averaging over the past four weeks, approximately what fraction of your time did you spend on your engineering project? (Scale: 0-100%)
6. In your team, is there toil you could automate away, but you don’t do so because that very toil takes time away from long-term engineering work? If so, please describe below. (Open response)

Measuring toil
Once you’ve identified the work being done, how do you determine if it’s too much? It’s pretty simple: regularly (we find monthly or quarterly to be a good interval), compute an estimate of how much time is being spent on various types of work. Look for patterns or trends in your tickets, surveys, and on-call incident response, and prioritize based on the aggregate human time spent. Within Google SRE, we aim to keep toil below 50% of each SRE’s time, to preserve the other 50% for engineering project work. If the estimates show that we have exceeded the 50% toil threshold, we plan work explicitly with the goal of reducing that number and getting the work balance back into a healthy state.

Eliminating toil
Now that you’ve identified and measured your toil, it’s time to minimize it.
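Checking survey results against the 50% budget can be as simple as the sketch below (the threshold default reflects the practice described above; the function name is our own):

```python
def toil_status(toil_fractions, threshold=0.5):
    """Given per-engineer fractions of time spent on toil (0.0-1.0),
    return the team mean and whether it exceeds the toil budget."""
    mean = sum(toil_fractions) / len(toil_fractions)
    return mean, mean > threshold
```

A team reporting fractions of 0.5, 0.75, and 1.0 averages 0.75 and is over budget, which is the signal to explicitly plan toil-reduction work.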
As we’ve hinted already, the solution is typically to automate the work. This is not always straightforward, however, and the aim shouldn’t be to eliminate all toil. Automating tasks that you rarely do (for example, deploying your service in a new location) can be tricky, because the procedure you used or the assumptions you made while automating may have changed by the time you do that same task again. If a large amount of your time is spent on this kind of toil, consider how you might change the underlying architecture to smooth out this variability. Do you use an infrastructure-as-code (IaC) solution for managing your systems? Can the procedure be executed multiple times without negative side effects? Is there a test to verify the procedure?

Treat your automation like any other production system. If you have an SLO practice, use some of your error budget to automate away toil. Complete postmortems when your automation fails, and fix it as you would any user-facing system. You want your automation available to you in any situation, including production incidents, to free humans to do the work they’re good at. If you’ve gotten your users familiar with opening tickets to request help, use your ticketing system as the API for automation, making the work fully self-service.

Also, because toil isn’t just technical but also cultural, make sure the only people doing toil work are the people explicitly assigned to it. This might be your on-caller, or a rotation of engineers scheduled to deal with “tickets” or “interrupts.” This preserves the rest of the team’s time for project work and reinforces a culture of surfacing and accounting for toil.

A note on complexity vs. toil
Sometimes we see engineers and leadership mistaking technical or organizational complexity for toil. The effects on humans are similar, but the work fails to meet the definition at the start of this post.
Where toil is work that is basically of no enduring value, complexity often makes valuable work feel onerous. Google SRE Laura Beegle has been investigating this within Google, and suggests a different approach to addressing complexity: While there’s intense satisfaction in designing a simple, robust system, it inevitably becomes somewhat more complex, simply by existing in a distributed environment, being used by a diverse range of users, or growing to serve more functionality over time. We want our systems to evolve over time, while also reducing what we call “experienced complexity”—the negative feelings based on mismatched expectations about how long or difficult a task is to complete.

Quantifying the subjective experience of your systems is known by another name: user experience. The users in this case are SREs. The observable outcome of well-managed system complexity is a better user experience. Addressing the user experience of supporting your systems is engineering work of enduring value, and therefore not the same as toil. If you find that complexity is threatening your system’s reliability, take action. By following a blameless postmortem process, or surveying your team, you can identify situations where complexity resulted in unexpected results or a longer-than-expected recovery time.

Some manual care and feeding of the systems we build is inevitably required, but the number of humans needed shouldn’t grow linearly with the number of VMs, users, or requests. As engineers, we know the power of using computers to complete routine tasks, but we often find ourselves doing that work by hand anyway. By identifying, measuring, and reducing toil, we can reduce operating costs and ensure time to focus on the difficult and interesting projects instead. For more about SRE, learn about the fundamentals or explore the full SRE book.
Source: Google Cloud Platform

Expanding our alliance with Cisco in hybrid cloud and the contact center

Over the past three years, we’ve worked closely with Cisco to deliver a number of customer-focused solutions in areas such as hybrid cloud, multi-cloud, work transformation, and contact center integrations. This week at Cisco Live in Barcelona, we’re sharing updates on our joint work in two key areas of customer demand—hybrid cloud solutions and the digital contact center.

Announcing the availability of Anthos 1.2 with Cisco HyperFlex
At Next ‘19, Cisco and Google Cloud announced a hybrid cloud partnership to bring Anthos and Cisco HyperFlex to our shared customers. After working closely across our engineering and business development teams, today we are excited to announce the general availability of Anthos 1.2 with Cisco HyperFlex, with a Cisco Validated Design (CVD) for the joint solution coming soon.

Google Cloud’s Anthos deployed with Cisco HyperFlex enables you to modernize in place with your existing resources. You can automate policy and security at scale, track configuration and policy changes to maintain an audit log of system configuration, and update configurations in seconds across all of your Anthos environments. It also provides consistency: the same experience on-prem and in the cloud.

“This is an important milestone in our hybrid cloud partnership with Google Cloud,” said Kaustubh Das, Vice President of Product Management, Cisco. “With Anthos and the HyperFlex Data Platform, our customers now have a highly available and resilient on-prem data platform for running Kubernetes workloads at scale. We now have all the benefits of Anthos on a system that delivers predictable performance, enterprise-grade data services, storage optimization, security and zero downtime during upgrades.”

Cisco HyperFlex unifies compute, storage, and networking from your core to the edge. Anthos GKE on-prem deployed on Cisco HyperFlex provides a Container-as-a-Service environment based on our recently released Anthos 1.2.
This solution provides end-to-end orchestration, management, and a scalable architecture to deploy Anthos on Cisco HyperFlex with the HyperFlex CSI (Container Storage Interface) for persistent storage. Customers looking at hybrid cloud models will get a consistent Kubernetes experience on-prem as well as in the cloud, with:
- A single management control plane for the entire hardware lifecycle.
- Scalable and highly available hyperconverged infrastructure to meet container applications’ compute, network, and storage needs.
- Faster turnaround time, making it a good fit for DevOps and CI/CD use cases.
- Anthos’s single control plane for multi-cloud management, allowing you to deploy applications across hybrid and multi-cloud environments without changing the underlying code.
- Automated policies and security at scale.

“Cisco and Google Cloud have combined Cisco’s leading hyperconverged technology with Google Cloud’s Anthos to make hybrid cloud containerization a reality for our customers,” said Dave Sellers, General Manager, MultiCloud at World Wide Technology. “Leveraging our Advanced Technology Center, WWT is providing our customers a unique educational and hands-on lab experience showcasing the unique value proposition offered by these cutting-edge technologies.”

Customers in our first Anthos lab day event last week with WWT are already sharing positive feedback with us—and we’ve been thrilled by the reception. Customer centricity is a joint value shared within this partnership; it has helped inform the direction of our products and will continue to shape the future of Anthos.

Expanding our partnership to modernize the Contact Center
We are also excited to expand our partnership with Cisco by offering Contact Center AI through Cisco’s platform. Cisco is now bringing in Google Cloud’s natural language processing (NLP), AI, and ML capabilities to create a seamless end-to-end conversational experience for customers.
With the release of our joint solution, Cisco is now introducing Google Cloud’s conversational IVR, Virtual Agent, Agent Assist, and Insights to its contact center offering. Powered by Google Cloud’s conversational AI, our Contact Center AI offering helps businesses create richer, more natural-sounding, and more helpful conversational experiences within the contact center. Customers can use natural language to describe the reason for their text or call. The Virtual Agent can then either assist the customer or route the conversation to the appropriate agent, with Cisco Contact Center’s industry-leading routing technology routing the customer based on the understood intent. AI and NLP continue to assist the conversation by surfacing knowledge articles, recommendations, and turn-by-turn guidance for the agent. The agent is then assisted with wrap-up, and business leaders can use the data with Insights for sentiment analysis and spotting trends.

Google Cloud Contact Center AI, in partnership with Cisco, improves the customer experience, increases agent satisfaction, and provides insights to business leaders—and it does all this while deflecting more calls, reducing average handle time, and lowering costs. And since Google Cloud and Cisco have done the hard work on the backend, the solution is easier to implement. No machine learning experts needed!

“We’re excited to launch this joint solution that infuses AI from Google Cloud into our Contact Center and transforms how our joint customers do business,” said Omar Tawakol, VP/GM at Cisco Contact Center. “With this integration, we’re combining Google Cloud’s natural language processing and AI capabilities with our industry-leading contact center capabilities to empower agents to provide better customer service and vastly improve the experience for the end customer.”

You can learn more about Anthos and Contact Center AI on our website.
And if you’re attending Cisco Live Barcelona, we invite you to stop by Booth 02 to learn more about our joint solutions—details are here.
Source: Google Cloud Platform