How MEDITECH adds advanced security to its cloud-based healthcare solutions with Cloud IDS

MEDITECH develops electronic health record (EHR) solutions that enhance the interactions of physicians and clinicians with patients. The company empowers healthcare organizations large and small to deliver secure, cost-effective patient care. MEDITECH's intuitive and mobile offerings include software for health information management, patient care and patient safety, emergency department management, oncology, genomics, population health, laboratories, blood banks, revenue cycle management, home health, virtual care, and many other areas of healthcare.

Proficiency with cloud technology is a competitive advantage and major selling point for MEDITECH. On its website, the company describes its MEDITECH Cloud Platform as a way to "[g]ive your clinicians and patients a better, more mobile healthcare experience while ensuring your organization's long-term sustainability" and "the latest step in our journey to deliver innovative, cost-effective healthcare technology."

Cloud IDS: Built on industry-leading network security technology

Google Cloud IDS is built with Palo Alto Networks' industry-leading threat detection technologies, so it was a natural choice for MEDITECH. Tom Moriarty, Manager, Information Security at MEDITECH, says: "Keeping our environment secure is our primary reason for deploying Cloud IDS. In healthcare, infrastructure and patient data security are absolutely crucial."

Fast and easy to deploy

Tom and his team were extremely impressed with how quickly they were able to set up Cloud IDS. "We deployed it in a couple of days," he reports. "It should take one person less than a day of steady work."

Because Cloud IDS is already integrated with the components of Google Cloud, implementation required very little configuration work and no changes to MEDITECH's network architecture. Since the company had experience with Palo Alto Networks' intrusion detection software, the team was confident about the detections and already knew how to interpret them. Because of the architecture, they didn't need to write any rules or come up with their own detections: industry-leading, third-party validated threat detection rules were already included and updated daily. And of course, the security team is happy to be working with a cloud-hosted product that doesn't require installing and managing any local hardware or software, or managing load balancing, scaling, performance monitoring, and licensing.

Out-of-the-box integration with Google Chronicle

Google Chronicle is an advanced security analytics tool that enables security teams to identify and respond to fast-moving attacks. It can store and analyze petabytes of security telemetry. By correlating threat indicators inside an organization's network with intelligence about global threats in the wild, it supports threat detection, threat hunting, malware investigation, security forensics, and other key activities of cybersecurity groups.

Tom's team was very pleased that the new threat detection solution works out of the box with Chronicle. He says: "We are using Google Chronicle as our security analytics tool for our corporate environment. By integrating Cloud IDS with Chronicle, we are able to analyze threats surfaced by Cloud IDS."

Helping with compliance

MEDITECH works to meet or exceed the requirements of industry standards and security frameworks.
Google Cloud security technology plays an important role in achieving this objective, according to Tom: "Cloud IDS helps us address our compliance requirements and best practices."

Conclusion

MEDITECH's business strategy depends on maintaining a position at the forefront of cloud technology, and strong security is a key element of that position. Cloud IDS provides the company with industry-leading threat detection technology, simplified operations, and integration with key security workflow tools like Chronicle. It also supports the organization's compliance requirements. That will allow MEDITECH to continue to deliver innovative, cost-effective, and secure technology to its healthcare customers and their patients.
Source: Google Cloud Platform

Cloud IDS for network-based threat detection is now generally available

As more and more applications move to the cloud, cloud network security teams have to keep them secure against an ever-evolving threat landscape. Shielding applications against network threats is also one of the most important criteria for regulatory compliance. For example, effective intrusion detection is a requirement of the Payment Card Industry Data Security Standard (PCI DSS 3.2.1). To address these challenges, many cloud network security teams build their own complex network threat detection solutions based on open source or third-party IDS components. These bespoke solutions can be difficult and costly to operate, and they often lack the scalability that is required to protect dynamic cloud applications.

Earlier this year, we announced Cloud IDS, a new cloud-native network security offering that delivers on our vision of Invisible Security, where key security capabilities are continuously engineered into our trusted cloud platform. Today we're excited to announce the general availability of Cloud IDS. This core network security offering helps detect network-based threats and helps organizations meet compliance standards that call for the use of an intrusion detection system. Cloud IDS is built with Palo Alto Networks' industry-leading threat detection technologies, providing high levels of security efficacy that enable you to detect malicious activity with few false positives. The general availability release includes these enhancements:

- Service availability in all regions
- Auto-scaling available in all regions
- Detection signatures automatically updated daily
- Support for customers' HIPAA compliance requirements (under the Google Cloud HIPAA Business Associate Agreement)
- ISO 27001 certification (and in the audit process to support customers' PCI DSS compliance requirements by year end)
- Integration with Chronicle, Google's security analytics platform, to help organizations investigate threats surfaced by Cloud IDS

Managed network threat detection with full traffic visibility

Cloud IDS delivers cloud-native, managed, network-based threat detection. It features simple setup and deployment, and gives customers visibility into traffic entering their cloud environment (north-south traffic) and into traffic between workloads (east-west traffic). Cloud IDS empowers security teams to focus their resources on high-priority issues instead of designing and operating complex network threat detection solutions.

Avaya

Avaya is a leader in cloud communications and collaboration solutions. Cloud IDS was enabled for Avaya's Google Cloud environment to address network threat detection requirements. John Akerboom, Sr. Director for Architecture & Experience Platforms at Avaya, shared his experience with Cloud IDS: "It was easy to setup: a couple clicks, a few settings, and a few minutes later it was up and running," explained Akerboom. "We had a scanner running, and some pen testing going on. We went into the Google Cloud IDS UI and saw all those things in progress."

Lytics

Graham Forest, Principal Operations Engineer at Lytics, a cloud-native customer data platform (CDP) vendor headquartered in Oregon, summarized his take on Cloud IDS this way: "It's built-in to our platform on Google Cloud; it's just a toggle, with a giant team of Google SREs behind it.
The implementation cost is extremely low; reliability and architecture complexity are not impacted, and maintenance cost is low." Forest chose Cloud IDS for these main reasons: "Our customers require compliance validation, like SOC2, and our larger financial customers run their own audits on our service. Our initial interest was to fulfill those compliance requirements. But we also want indication when attackers are attempting to breach our network, and we want to know immediately. We get both with this solution!"

MEDITECH

Medical Information Technology, Inc. (MEDITECH) empowers providers and patients around the world with its Expanse EHR (Electronic Health Record), setting new standards for electronic medical record usability, efficiency, and provider and patient satisfaction. The company's cloud-native solutions are built on Google Cloud, representing the latest step in MEDITECH's journey to deliver innovative, cost-effective healthcare technology that is also safe and secure.

"In healthcare, infrastructure and patient data security are absolutely crucial. Keeping our environment secure is our primary reason for deploying Cloud IDS," said Tom Moriarty, Manager, Information Security, MEDITECH. "The ease of setup and its cloud-native design add value, by protecting access to high quality healthcare for a diverse range of geographic settings and healthcare needs."

MEDITECH also has previous experience with Cloud IDS' threat detection from Palo Alto Networks. "We are using Palo Alto Networks IDS and IPS in our on-premises network, and we look forward to leveraging the same advantages in our cloud hosted environment," said Moriarty. MEDITECH's confidence in these offerings stems from deploying them in-house. "We are using Google Chronicle as our security analytics tool for our corporate environment. By integrating Cloud IDS with Chronicle, we are able to analyze threats surfaced by Cloud IDS. This also helps us address our compliance requirements," Moriarty concluded. Read more about MEDITECH's use of Cloud IDS in their detailed case study.

Detect at scale, investigate, and respond to threats in all regions

Cloud IDS is now available in all regions. It provides protection against malware, viruses and spyware, command and control (C2) attacks, and vulnerabilities such as buffer overflow and illegal code execution attacks. Autoscaling dynamically adjusts Cloud IDS capacity as your traffic throughput changes, so you can automatically keep up with your scale needs. Threat signature updates are applied daily so you can stay ahead of new threat variants. You can now use Chronicle to investigate the threats surfaced in Cloud IDS. With the Chronicle integration, you can store and analyze Cloud IDS threat logs along with all your security telemetry data in one place so that you can effectively investigate and respond to threats at scale.

Getting started

You can get started with Cloud IDS through the GCP console. Watch the Getting started with Cloud IDS video, which walks you through the high-level architecture and a product demo. Cloud IDS pricing is based on a per-hour charge for the Cloud IDS endpoint and the amount of traffic that is inspected. You can learn more about Cloud IDS and express interest in a trial using the Cloud IDS webpage.
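For teams that prefer the command line, creating an endpoint looks roughly like the following. This is a minimal sketch, not a full quickstart: the endpoint name, network, and zone are placeholders, and it assumes private services access is already configured on the VPC.

```
# Create a Cloud IDS endpoint in the zone whose traffic you want to inspect.
gcloud ids endpoints create my-ids-endpoint \
    --network=my-vpc \
    --zone=us-central1-a \
    --severity=INFORMATIONAL

# Traffic is delivered to the endpoint by a Packet Mirroring policy attached
# to the endpoint's forwarding rule (see the Cloud IDS documentation).
```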
Source: Google Cloud Platform

Cloud Security Podcast by Google turns 46 – Reflections and lessons!

Time flies when you're having fun! We've produced 46 episodes of the Cloud Security Podcast by Google since our launch in February 2021. Looking back, we'd like to share some cloud security lessons and insights we picked up along the way. Over the course of 2021, the following themes emerged as the most popular with our audience:

- Zero trust security
- Cloud threat detection
- Making cloud migrations more secure
- Data security in the cloud

Let's explore each of these while highlighting some of the more interesting episodes.

Zero trust

On the zero trust side, we had a great episode where we interviewed the creator of the term "zero trust," John Kindervag. We looked through more than 10 years of zero trust history, beginning with the coining of the term in 2010 and early Google efforts in this area. John also shared some practical tips on how to approach zero trust in today's IT environments. The second zero trust episode focused on the technical details of collecting data for successful zero trust implementations. We covered some of the critical tasks and data points you must have before beginning any zero trust project. Rest assured, more episodes on zero trust are coming.

Cloud migration security

The topic of security during cloud migration has been covered both at the leadership level, in our CISO panels, and through field lessons from customers, partners, and Googlers. For example, in our CISO panel, Phil Venables, Google Cloud's CISO, and others emphasized that security in the cloud involves a mindset shift, not just a technology change. On the other hand, while looking at some of the implementation lessons, we covered common mistakes that companies make while migrating. One of our partners shared lessons they've learned supporting cloud migrations. We also touched on how some organizations faced challenges abandoning pre-cloud thinking and practices.

When migrating to the cloud, where you're starting from matters as much as where you're going, and even where you and your customers are located, as we cover in our Europe-focused episode. Specifically, for our users in Europe, a different set of regulatory challenges are in play, including the overlapping and multiplicative regulatory complexity that arises from European federalism. Finally, most organizations really migrate data and workloads to multiple clouds, and there are specific multi-cloud security challenges covered in this episode.

Cloud threat detection

We dug deep into the topic of threat detection, looking at many angles: from more philosophical challenges down to operational issues with creating rules and practicing detection engineering. A very popular episode shares how some threat detection challenges are solved here at Google. Specifically, we covered how our engineers pursue threat research, then create detection code, and then follow up by triaging and responding to the "signals" generated by their detection logic. Yes, Google security engineers both write detection logic and respond to the output of that detection logic. Talk about aligned incentives to create low-noise rules!

No Google security story would be complete without mentioning our fun episode with Heather Adkins. She shared her perspective on securing Google and her talk at RSA 2021, which, unlike the proverbial tree falling in the forest, really did happen, even if virtually. Some great content on SIEM modernization was revealed in the episode where we interviewed one of the key implementation partners for Chronicle and Google Cloud security.
We covered how SIEM technology is evolving in the cloud age, and plan to further explore this rich topic in future episodes. Another excellent episode with a Chronicle user focused on how SIEM technology has evolved and how to make it work for you now and in the future.

Data security in the cloud

Data security in the cloud presents both new challenges and solutions to old challenges. Pervasive encryption in GCP certainly solves some challenges, while at the same time reliance on identity is difficult for organizations that are used to building network security barriers between attackers and data. We covered foundational approaches to data security in the cloud and key pillars of a strategy in our second episode. Next, we asked more key questions about how secure data in the cloud really is and what controls are most important to address customer needs. A NEXT 2021 special episode gathered together several product managers who build various data security products at Google Cloud (our DLP, encryption, and more). They spoke to some of the data security innovations built here at Google and how they've been productized for our Cloud customers.

Other topics and notable episodes

We've also talked in depth about automated response to security events in the cloud. Cloud platforms are API-first environments, so security response can be automated in ways that weren't previously possible. We spoke with a cloud security director who automated vulnerability and threat response at a large American bank, and we spoke with the engineering team who built, from the ground up, the automated response system for a large pharmaceutical company. We also covered some of the interesting security research done at Google, such as at VirusTotal (two episodes, in fact) and by our counter-abuse team.

What's next

You can review past episodes on the site and subscribe for upcoming episodes (please!) via Google Podcasts, Apple Podcasts, and Spotify. Also, do follow Cloud Security Podcast on Twitter for episode announcements and audience commentary. Finally, let us know what we should cover in 2022! We look forward to another exciting year bringing you some of the most interesting and diverse voices across the cloud security community.
Source: Google Cloud Platform

How Vuclip safeguards its cloud environment across 100+ projects with Security Command Center

Entertainment has never been more accessible. As our phones are now an inextricable part of our lives, there's an increasing appetite for mobile video content, and that is what Vuclip delivers. Vuclip is a leading video-on-demand service for mobile devices with more than 41 million monthly active users across more than 22 countries.

Speed is critical to the viewing experience, and delivering crisp, no-buffer video streaming was one of the reasons we decided to migrate to Google Cloud in 2017. Now we have replaced our monolithic on-prem infrastructure with a microservices-based production environment that's almost fully on Google Cloud. Most services run on Google Kubernetes Engine, which delivers effortless scalability and quick time to market for new features and updates.

With a huge footprint in the cloud across multiple companies, we're a big target for attacks, from data breaches to hackers trying to access our systems illegally. We must prepare for these attacks proactively and mitigate them quickly when they happen. That's why we decided to use Google Cloud's Security Command Center (SCC) Premium to protect our technology environment across our complex microservices-based architecture.

Increasing security and time-to-market with Security Command Center

Before signing up for SCC Premium, we conducted a proof of concept with help from the Google Cloud team to experience its capabilities firsthand. What stood out to us was that SCC wouldn't just help us mitigate attacks; it would strengthen our entire security apparatus by continuously identifying the weaknesses of our system and giving us recommendations on how to improve it.

In the past, we had quite a traditional security model. Business units were responsible for their own security setup and received support from Group Risk, our company's internal security audit team, to review developed applications before they could go into production. With SCC, it's easier for us to detect findings and build the right security configurations into new services as we build them. We can configure policy based on SCC recommendations and act on suggestions quickly, unlike earlier when everything was reported back to the Group Risk team for review. This has really reduced our time to market: going into production used to take at least a month; now we can do it in a week.

Centralizing visibility for continuous insights

With SCC Premium, we now streamline many security processes that used to require a lot of manual effort. In the past, we had to conduct regular vulnerability scans of our most critical systems, but with microservices running across more than 100 projects it was difficult to deliver constant security checks on all of them. With centralized visibility, SCC enables us to monitor all of these projects continuously to discover misconfigurations and threats quickly, while making sure we're adhering to our compliance standards.

Here's what it looks like day to day: for every new and existing project, when new services are added to the system, our policies require the SRE team to configure SCC into the setup from the beginning. That's how we can make sure that every surface and every application stack is utilizing the platform to help us detect all alerts and suggestions. We integrate all of these notifications into our Pub/Sub alerting system, giving us centralized visibility over our security posture across multiple projects.
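For teams building a similar pipeline, SCC findings can be streamed to Pub/Sub with a notification config. The sketch below uses placeholder organization, project, topic, and filter values rather than Vuclip's actual configuration:

```
# Create a topic and a Security Command Center notification config that
# publishes active findings to it.
gcloud pubsub topics create scc-findings

gcloud scc notifications create all-active-findings \
    --organization=123456789012 \
    --pubsub-topic=projects/my-project/topics/scc-findings \
    --filter='state="ACTIVE"'
```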
Every misconfiguration revealed with comprehensive alerts

Improved visibility enables us to keep an eye on our systems proactively. Let's take IP addresses, for example. Whenever we set up a new system, we must configure a new public-facing IP address from the GKE endpoint. When that happens, we get an alert from SCC, informing us that a new public IP address is being set up. Right away, SCC identifies any vulnerabilities or misconfigurations, such as missing firewall rules. With that constant visibility, as opposed to the spaced-out vulnerability scans of the past, we achieve a continuous level of security that improves our overall posture.

Mitigating threats in ¼ of the time

This comprehensive security posture naturally leads to an increased number of alerts from SCC. Not all of them relate to serious attacks that need to be mitigated right away. That's why we have dedicated team members, on a rotating basis, who scroll through the alerts to identify the most pressing threats and decide on further actions. If there's a problem we need to mitigate, we can do it in about a quarter of the time it used to take without SCC. This is because we no longer have to identify issues and search for solutions ourselves; instead, the issue is pointed out immediately in the alert.

A great side effect of these detailed alerts and recommendations is that our employees learn more about security-related matters. This experience trains them on how to improve our systems in the future and helps them prepare for more serious attacks.

Strengthening compliance for faster approval

Another area where SCC is helpful is compliance. Our baseline for new and existing services is the CIS Google Cloud Computing Foundations Benchmark, and SCC enables us to meet its requirements more efficiently with targeted suggestions. This facilitates approval from the Group Risk team before we launch a service: they can see exactly how compliant we are with the CIS standard, further improving our time-to-market and overall security posture.

Entertaining the world securely with Security Command Center

With SCC Premium, we've moved from a traditional security model reliant on intermittent vulnerability scans to a much more agile security strategy with continuous monitoring and centralized visibility and control. We're excited to explore more of SCC's features in the future, such as the ability to mute findings, which will help us to disable certain alerts we don't need to be reminded of.

Our evolution with SCC hasn't just made Vuclip more secure and compliant; it's helped us to reduce our time-to-market, delivering our services faster without compromising on security. In a fast-paced media world, that's exactly what we need to remain the video-on-demand service provider of choice and entertain people around the world.
Source: Google Cloud Platform

Postmortems at Loon: a guiding force for rapid development

Loon's Production Engineering/SRE team was founded by Google SRE alumni, so it is no surprise that it instituted a culture of blameless postmortems that became a key feature of Loon's approach to incident response. Blameless postmortems originated as an aerospace practice in the mid-20th century, so it was particularly fitting that they came full circle to be used at a company that melded cutting-edge aerospace work with the development of a communications platform and the world's first stratospheric temporospatial software-defined network. The use of postmortems became a standardizing factor across Loon's teams — from avionics and manufacturing, to flight operations, to software platforms and network service. This blog post discusses how Loon moved from a heterogeneous approach to postmortems to eventually standardize and share this practice across the organization — a shift that helped the company move from R&D to commercial service in 2020.

Background

Postmortems

Many industries have adopted the use of postmortems — they are fairly common in high-risk fields where mistakes can be fatal or extremely expensive. Postmortems are also widespread in industries and projects where bad processes or assumptions can incur expensive project development costs and avoiding repeat mistakes is a priority. Individual industries and organizations often develop their own postmortem standards or templates so that postmortems are easier to create and digest across teams.

Blameless postmortems likely originated in the healthcare and aerospace industries in the mid-20th century. Because of the high cost of failure, these industries needed to create a culture of transparency and continuous improvement that could only come from openly discussing failure. As the original SRE book states, blameless postmortems are key to "an environment where every 'mistake' is seen as an opportunity to strengthen the system." The goal of a postmortem is to document an incident or event in order to foster learning from it, both among the affected teams and beyond. The postmortem usually includes a timeline of what happened, the solutions implemented, the incident's impact, the investigation into root causes, and changes or follow-ups to stop it from happening again. To facilitate learning, SRE's postmortem format includes both what went well — acknowledging the successes that should be maintained and expanded — and what went poorly and needs to be changed. In this way, postmortem action items are key to prioritizing work that ensures the same failures don't happen again.

Loon

Loon aimed to supply internet access to unserved and underserved populations around the world by providing connectivity via stratospheric balloons. These high-altitude "flying cell towers" covered a much wider footprint than a terrestrial tower, and could be deployed (and repositioned) into the most remote corners of the earth without expensive overland transportation and installation. As the first company to attempt anything like this, Loon dealt with a number of systems that were complex, challenging, or novel: superpressure balloons designed to stay aloft for hundreds of days, wind-dependent steering, a software-defined network consisting of constantly moving nodes, and extremes of temperature and weather at 20 km above Earth's surface.

Prod Team

The initial high-risk operations of Loon's mission were avionic: could we launch and steer balloons carrying a networking payload long enough to reach and serve the targeted region?
As such, the earliest failure reports within Loon (which weren't officially called "postmortems" at the time) mostly involved balloon construction or flight, and drew on the experience of team members who had worked in the Avionics, Reliability Engineering, and/or Flight Safety fields. As Loon's systems evolved and matured, they started to require operational reliability as well. Just before graduating from a purely R&D project in Google's "moonshot factory" incubator X to a company with commercial goals, Loon started building a Site Reliability Engineering (SRE) team known internally as Prod Team. In order to effectively offer internet connectivity to users, Loon had to solve network serving failures with the same rigor as hardware failures. Prod Team took the lead on a number of practices to improve network reliability. The Prod Team had three primary goals:

- Ensure that the fleet's automation, management, and safety-critical systems were built and operated to meet the high safety bar of the aviation industry.
- Lead the integration of the communications services (e.g., LTE) end to end.
- Own the mission of fielding and providing a reliable commercial service (Loon Library) in the real world.

Postmortems at Loon

The Early Days

Postmortems were one tool for reaching Prod Team's (SRE's) goals. Prod Team often interacted with SREs in other infrastructure support teams that the Loon service connected to, such as the team developing the Evolved Packet Core (EPC), our telco partner counterparts, and teams that handle edge network connectivity. Postmortems provided a common tool for sharing incident information across all these teams, and could even span multiple companies when upstream problems impacted customers. At Loon, postmortems served the following goals:

- Document and transcribe the events, actions, and remedies related to an incident.
- Provide a feedback loop to rectify problems.
- Indicate where to build better safeguards and alerts.
- Break down silos between teams in order to facilitate cross-functional knowledge sharing and accelerate development.
- Identify macro themes and blind spots over the longer term.

The combination of aerospace and high tech brought two strong practices of writing postmortems, but also the challenge of how to own, investigate, or follow up on problems that crossed those boundaries, or when it wasn't clear where the system fault lay. Loon's teams across the hardware, software, and operations orgs used postmortems, as was standard practice in their fields for incident response. The Flight Operations Team, which handled the day-to-day operations of steering launched balloons, captured in-flight issues in a tracking system. The tracking system was part of the anomaly resolution system devised to identify and resolve root-cause problems. Seeking to complement the anomaly resolution system, the Flight Operations Team incorporated the SRE software team's postmortem format for incidents that needed further investigation — for example, failure to avoid a storm system, deviations from the simulated (expected) flight path that led to an incident, and flight operator actions that directly or indirectly caused an incident.
Given that most incidents spanned multiple teams (e.g., when automation failed to catch an incorrect command sent by a flight operator, which resulted in a hardware failure), utilizing a consistent postmortem format across teams simplified collaboration.

The Aviation and Systems Safety Team, which focused on safety related to the flight system and flight process, also brought their own tradition and best practices of postmortems. Their motto, "Own our Safety", brought a commitment to continually improving safety performance and building a positive safety culture across the company. This was one of the strengths of Loon's culture: all the organizations were aligned not just on our audacious vision to "connect people everywhere", but also on doing so safely and effectively. However, because industry standards for postmortems and how to handle different types of problems varied across teams, there was some divergence in process. We proactively encouraged teams to share postmortems between teams, between orgs, and across the company so that anyone could provide feedback and insight into an incident. In that way, anyone at Loon could contribute to a postmortem, see how an incident was handled, and learn about the breadth of challenges that Loon was solving.

Challenges

While everyone agreed that postmortems were an important practice, in a fast-moving start-up culture it was a struggle to comprehensively follow through on action items. This probably comes as no surprise to developers in similar environments — when the platform or services that require investment are rapidly changing or being replaced, it's hard to spend resources on not repeating the same mistakes. Ideally, we would have prioritized postmortems that focused on best practices and learnings that were applicable to multiple generations of the platform, but those weren't easy to identify at the time of each incident.

Even though the company was not especially large, the novelty of Loon's platform and the interconnectedness of its operations made it difficult to determine which team was responsible for writing a postmortem and investigating root causes. For example, a 20-minute service disruption on the ground might be caused by a loss of connectivity from the balloon to the backhaul network, a pointing error with the antennae on the payload, insufficient battery levels, or wind that temporarily blew the balloon out of range. Actual causes could be quite nuanced, and often were attributable to interactions between multiple sub-systems. Thus, we had a chicken-and-egg problem: which team should start the postmortem and investigation, and when should they hand off the postmortem to the teams that likely owned the faulty system or process? Not all teams had a culture of postmortems, so the process could stall depending on the system where the root cause originated. For that reason, Loon's Prod Team/SREs advocated for a company-wide blameless postmortem culture.

Much of how Loon used postmortems, especially in software development and Prod Team, was in line with SRE industry standards. In the early days of Loon, however, there were no service level objectives or agreements (SLOs/SLAs). As Loon was an R&D project, we wrote postmortems when a test network failed to boot after launch, or when performance didn't meet the team's predictions, rather than for "service outages".
Later on, when Loon supplied commercial service in disaster relief areas in Peru and Kenya, the Prod Team could more clearly identify the types of user-facing incidents that required postmortems due to failure to meet SLAs.

Improving and Standardizing Loon's Postmortem Processes

Moving Loon from an R&D model to the model of reliability and safety necessary for a commercial offering required more than simply performing postmortems. Sharing the postmortems openly and widely across Loon was critical to building a culture of continuous improvement and addressing root causes. To increase cross-team awareness of incidents, in 2019 we instituted a Postmortem Working Group. In addition to reading and discussing recent postmortems from across the company, the goals of the working group were to make it easier to write postmortems, promote the practice of writing postmortems, increase sharing across teams, and discuss the findings of these incidents in order to learn the patterns of failure. Its founding goal was to "Cultivate a postmortem culture in Loon to encourage thoughtful risk taking, to take advantage of mistakes, and to provide structure to support improvement over time." While the volume of postmortems could ebb and flow across weeks and months, over multiple years of commercial service we expected to be able to identify macro-trends that needed to be addressed with the cooperation of multiple teams.

In addition to the Postmortem Working Group, we also created a postmortem mailing list and a repository of all postmortems, and presented a "Lunch & Learn" on blameless postmortems. Prod Team and several other teams' meetings had a standing agenda item to review postmortems of interest from across the company, and we sent a semi-annual email celebrating Loon's "best of" recent incidents: the most interesting or educational outages.

Once we had a standardized postmortem template in place, we could adopt and reuse it to document commercial service field tests. By recording a timeline and incidents, defining a process and space to determine root causes of problems, recording measurements and metrics, and providing the structure for action item tracking, we brought the benefits of postmortem retrospectives to prospective tasks. When Loon began commercial trials in countries like Peru and Kenya, we conducted numerous field tests. These tests required engineers from Loon and/or the telco partner to travel to remote locations to measure the strength of the LTE signal on the ground. Prod Team proactively used the postmortem template to document the field tests. It provided a useful format to record the log of test events, results that did and did not match expectations, and links to further investigations into those failures. For a cutting-edge project in a highly variable operating environment, using the postmortem template as our default testing template was an acknowledgement that we were in a state of constant and rapid iteration and improvement. These trials took place in early to mid-2020, under the sudden specter of COVID and the subsequent shift towards working from home. The structured communications at the core of Loon's postmortem process were particularly helpful as we moved from in-person coordination rooms to WFH.

What Loon Learned from Standardizing Postmortems

Postmortems are widely used in various industries because they are effective.
At Loon, we saw that even fast-moving startups and R&D projects should invest early in a transparent and blameless postmortem culture. That culture should include a clear process for writing postmortems, clear guidelines for when to conduct a postmortem, and a staffed commitment to follow up on action items. Meta-reviews across postmortems and outages revealed several trends:

The many points of failure we observed across the range of postmortems were indicative of both the complexity of Loon's systems and the complexity of some of its supporting infrastructure. Postmortems are equally adept at finding flaky tests and fragile processes as they are at finding hardware failures or satellite network outages. These are complexities familiar to many startups, where postmortems can help manage the tradeoff between making changes safely and moving quickly to try many new things.

Loon was still operating a superhero culture: across a wide range of issues, a small set of experts were repeatedly called upon to fix the system. This dynamic is common in startups, and is not meant as a pejorative, but it was markedly different from the system maturity that many of Prod Team/SRE were used to. Once we identified this pattern, our plan for commercial service was to staff a 24×7 oncall rotation, complemented by Program Managers driving intentional processes to de-risk production.

Postmortems provided a space to ask questions like, "What other issues could pop up in this realm?", which prompted us to solve for the broader class of problems rather than the specific problems we'd already seen. This practice also stopped people from brushing off problems in the name of development speed, or from dismissing issues because they "just concerned a prototype".

Tips and Takeaways

While the specifics of Loon's journey to standardize postmortems tell the story of one company, we have some tips and takeaways that should be applicable at most organizations.

Tip 1: Adopting a blameless postmortem culture requires everyone to participate

Although the initiative of writing postmortems often originates with a software team, if you want every team to adopt the practice, we suggest trying the following:

- Give a talk about postmortems and how and why they could benefit all.
- Form a postmortem working group.
- Invite people representing different teams to be part of the postmortem working group. They will give insights into what could work better for their respective teams.
- Don't make the postmortem working group responsible for writing the postmortems — this approach doesn't scale. Reviewing and consulting on postmortems may be in scope of their duties, especially while new teams are adopting this practice.

Tip 2: Define a lightweight postmortem process

Especially during adoption, you want teams to see the benefits of postmortems, not the burden of writing them. Creating a postmortem template with minimum requirements can be helpful; a minimal example appears at the end of this post.

Tip 3: Define a clear owner for postmortems

Who should write a postmortem, and when? For software teams with an oncall rotation, the answer is clear: the person who was oncall during the incident is the owner, and we write postmortems when a service interruption breached SLOs. But when the service has no SLOs, or when a team doesn't have an oncall rotation, you need defined criteria. Bonus points if the outage involves multiple systems and teams.
The following exercises can help in this area:

- Reflect on these topics from the point of view of each team, and from the point of view of the interaction between teams.
- For each team, define what type of incident(s) should trigger a postmortem.
- Within the team, define who should own writing each postmortem. Avoid putting the entire burden on the same person frequently; consider forming a rotation.

Tip 4: Encourage blameless postmortems and make people proud of them

Consider some activities that can help foster a blameless postmortem culture:

- Write a report of the best postmortems over a given period and circulate them broadly.
- Conduct training on how to write postmortems.
- Train managers and encourage them to prioritize postmortems on their teams.

Conclusion

When Loon shut down, addressing all these points was still a work in progress. We don't have a teachable moment of "this postmortem process will solve your failures", because postmortems don't do that. However, we could see where postmortems stopped us from needing to deal with the same failures repeatedly… and where sometimes we did experience repeat incidents because the action items from the first postmortem weren't prioritized enough. And so this piece of writing — effectively, a postmortem on Loon's postmortems — serves up a familiar lesson: postmortems work, but only as well as they are widely accepted and adhered to.
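To make Tip 2 concrete, here is a minimal postmortem template along the lines of the contents described earlier in this post. It is an illustrative sketch, not Loon's actual template:

```
Title / Incident ID:
Status: draft | in review | final
Owner(s):
Summary: one or two sentences on what happened and the user-visible impact
Impact: duration, affected users or regions, SLO/safety impact
Timeline: timestamped log of detection, escalation, mitigation, resolution
Root cause(s): what failed, including contributing factors
What went well / What went poorly:
Action items: owner, priority, and tracking link for each follow-up
```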
Source: Google Cloud Platform

Tokopedia’s journey to creating a Customer Data Platform (CDP) on Google Cloud Platform

Founded in 2009, Tokopedia is an ecommerce platform that enables millions of Indonesians to transact online. As the company grows, there is an urgent need to better understand customers' behavior in order to improve the customer experience across the platform. Tokopedia now has more than 100 million monthly active users, and the demographics and preferences of these users all differ. One way to meet their needs is through personalization. Normally, a user needs to browse through thousands of products in order to find the item they are looking for. By creating product recommendations that are relevant to each user, we shorten their search journey and hopefully increase conversion early on in the journey. To build personalization, the Data Engineering team's Customer Data Platform (CDP) helped provide access to user attributes. These attributes developed by the Data Engineering team come in handy for different use cases across functions and teams.

Previously, two main challenges were observed:

- The need for speed and answers caused an increase in data silos. As the need for personalization increased across the company, different teams were building their own personalization features. However, limited time and the need to simplify communication across teams resulted in the decision for each team to create its own data pipeline. This caused redundancies, because similar data was developed by different teams, and these redundancies slowed development of new personalized features even when some of the attributes had already been built in a different module.
- Inconsistent data definitions. As each team created its own data pipeline, there were many cases where each team had a different definition of a user's attributes. On several occasions, this caused misunderstandings during meetings and unsynchronized user journeys, because different teams applied different attribute values to the same user. For example, team A evaluated user_id 001 as a woman in her 20s, while team B, having a different set of attributes and definitions, evaluated user_id 001 as a woman in her 30s. These differences in definitions and attributes can lead to different conclusions and results, and consequently different personalizations. As a result, customers might face an inconsistent experience during their journey on Tokopedia. Imagine being shown content related to college necessities in one module, and then mom-and-baby content in a different module.

Previous State of Data Distribution

Now, with the CDP, different teams do not have to constantly rebuild the infrastructure. The same attributes only need to be processed once and can be used by different teams across the company. This optimizes development time, cost, and effort. Another advantage of having the CDP is a single definition of attributes across services and teams. Since different teams look at the same attributes inside the CDP, this reduces the chances of misunderstanding and strengthens synchronization between teams. This gives customers a consistent experience across the Tokopedia platform and enables teams to display relevant content.

CDP High-level Concept

There are several key factors required in building the CDP platform at Tokopedia. The journey is as follows:
1. Define and Make a List of Attributes

During this phase, we work with the Product and Analyst teams to define all of the user attributes required to build the CDP. Our product team interviewed several stakeholders to understand different perspectives regarding user attributes. As a result, an initial attribute list was made that included gender, age group, location, and so on. This process is repeated in order to develop the best understanding of the user attributes.

2. Platform Design

After doing comprehensive reviews, we decided to build our CDP platform using several GCP tech stacks.

CDP Architecture

BigQuery was chosen as the analytics backend of our CDP self-service. Meanwhile, Google Cloud Bigtable was selected as the serving backend, which our services interact with to enable personalization. In designing the Bigtable storage, the schema design is very important: the frequency and categorization of attributes affect how we design the column qualifiers, while the CDP attribute affects how we design the row key. We also opted to create a caching mechanism to reduce the load on Bigtable for similar read activity. We built the cache using Redis with a certain Time to Live (TTL) to ensure optimized performance. (A simplified sketch of this read path appears below.) In addition, we applied a Role-Based Access Control (RBAC) mechanism on the CDP API to control which services can access which attributes in the CDP.

3. Monitoring and alerting

Another important point in building a CDP is developing the correct monitoring and alerting system to maintain stability on our platform. A soft and a hard threshold on each metric is established and monitored. Once a threshold is reached, alerts are sent through the communication channel. Based on the current architecture, there are several parts where we need to enable monitoring and alerting.

Data Pipeline

One of the things we need to monitor is resource consumption during computation and in the data pipeline from data sources to the CDP storage, as we use BigQuery and Dataflow for data computation and the data pipeline. In BigQuery, we need to monitor the slot utilization used to compute the data aggregations and manipulations that produce the attributes.

Data Quality

When building the CDP, high-quality data was important in order for it to be a trusted platform. Several metrics that are important in terms of data quality are data completeness, data validity, data anomaly, and data consistency. Therefore, monitoring needs to be enabled to track these metrics.

Storage and API Performance

Since the CDP's backend and API directly interact with several front-facing features, we have to ensure the availability of the CDP service. Since we're using Bigtable as the backend, monitoring of CPU, latency, and RPS is required. These metrics are provided by default in Bigtable monitoring.

4. Discoverability across the company

Many users have been asking how they can browse the attributes that our CDP offers. Initially, we started out by documenting our attributes and sharing the documentation with our stakeholders. However, as the number of attributes increased, it became increasingly harder for people to go through our documentation. This pushed us to start integrating the CDP terminology into our Data Catalog. Our Data Catalog plays an important role in enabling users to browse attributes in the CDP, including the definition of each attribute and how they can retrieve the data.
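To make the serving path in the Platform Design section more concrete, here is a minimal sketch of a cached attribute lookup. All names (project, instance, table, column family, TTL) are illustrative assumptions, not Tokopedia's actual schema:

```python
import json

import redis
from google.cloud import bigtable

# Illustrative resources; real deployments would take these from configuration.
bt_client = bigtable.Client(project="my-project")
table = bt_client.instance("cdp-instance").table("user_attributes")
cache = redis.Redis(host="cdp-cache", port=6379)
CACHE_TTL_SECONDS = 300  # the "certain Time to Live" mentioned above


def get_user_attributes(user_id: str) -> dict:
    """Return CDP attributes for a user, consulting the Redis cache first."""
    cached = cache.get(user_id)
    if cached is not None:
        return json.loads(cached)

    # The row key follows the CDP attribute design (here, simply the user id).
    row = table.read_row(user_id.encode("utf-8"))
    attributes = {}
    if row is not None:
        for qualifier, cells in row.cells.get("attrs", {}).items():
            attributes[qualifier.decode("utf-8")] = cells[0].value.decode("utf-8")

    # Cache the result with a TTL to reduce read load on Bigtable.
    cache.setex(user_id, CACHE_TTL_SECONDS, json.dumps(attributes))
    return attributes
```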
5. Implementation and adoption of the platform

Another key point for a successful CDP implementation is collaboration across teams on the front-end services. There are several types of CDP implementation at Tokopedia: Personalization, Marketing Analytics, and Self-Service Analytics.

Personalization

The most common usage of the CDP is personalizing a user's journey. One example of personalization is the search feature. The product team personalizes the user's search results based on the user's address, so that the user is able to find products that are in proximity to their location. After agreeing on the definition of user address, we created a CDP API contract with the Search team so that development could run in parallel. As a result, today our users have a better experience based on their location.

Marketing Analytics

When we started building the CDP platform, we discussed existing use cases with the Marketing team. One of their goals was to personalize and optimize marketing efforts, such as sending notifications to the right users based on their attributes, both to reduce unnecessary notification costs for unrelated users and to enhance the overall user experience by avoiding spam notifications. Once we understood their needs, we looked at the ways in which the CDP could cater to them. We discussed with the relevant teams how to integrate their segmentation engine and communication channels with the CDP platform, and which user attributes to use when sending marketing push notifications.

Self-Service Analytics

The CDP is also often used for self-service analytics, enabling quick insights on user demographics and behavior in certain segments. To build this self-serve analytics tool, our team consulted with the Product and Analyst teams to define the user demographic attributes that business and product users most often select for insights. After understanding the attributes required, we worked with the Business Intelligence team to enable visualization for end users. This allows different teams to understand our users better and gain insights on how we can improve our platform.

The CDP implementation has created a significant impact across different use cases and helped Tokopedia become a more data-driven company. Through the CDP, we are also able to strengthen one of our core DNA values, Focus on Consumer. By sharing the CDP framework, we hope to bring value and help others more easily create a thriving CDP platform.
Source: Google Cloud Platform

Ensuring scale and compliance of your Terraform Deployment with Cloud Build

Terraform is an open source Infrastructure as Code tool that is popular with platform developers building reusable cloud automation. The Terraform Provider for Google Cloud Platform continues to add support for the latest Google Cloud features, such as Anthos on GKE, and our teams continue to expand Terraform integrations, including the Cloud Foundation Toolkit and Terraform Validator.

How do teams use Terraform on Google Cloud? While the simplest approach is to run terraform init, plan and apply directly from your terminal, it isn't recommended for automating your production deployments. First, there is a decision on how to store your Terraform state in a way that is secure, compliant, and enables team collaboration. Second, there's a question of scale and reliability. Over the course of even the simplest cloud deployment, Terraform can end up making thousands of Create/Read/Update/Delete API calls to the endpoints used by the Terraform providers, some of which will inevitably hit quota issues or need to be retried for other reasons. For platform administrators who are looking to ensure the best deployment practices for their curated Terraform solutions while benefiting from the simplicity of the Google Cloud Console, there's the Terraform Private Catalog integration that we enabled earlier this year.

Outside of Private Catalog, Cloud Build and Cloud Storage have been the recommended approach for using Terraform on Google Cloud. Using a remote backend prevents race conditions and simplifies sharing reusable modules between different configurations. With Cloud Build you can configure a GitOps CI/CD pipeline to automatically plan and apply your Terraform configuration when changes are pushed to the repo. These are widely popularized benefits explored in Managing infrastructure as code with Terraform, Cloud Build, and GitOps. In addition, there are lesser-known advantages of Cloud Build, particularly for enterprise customers: Cloud Build's concurrency capabilities and VPC-SC support, and Cloud Storage versioning, security, and compliance. Let's explore these benefits in more detail.

Cloud Build's ability to scale makes it capable of processing multiple Terraform deployments across regions, globally and simultaneously. By default, Cloud Build supports 30 concurrent builds, with additional builds queued and processed after the running builds complete. In some cases this may not be enough. Customers who initiate parallel deployments to multiple zones, or those who provision infrastructure on behalf of multiple tenants, often need to run more concurrent deployments to complete all of them within the allotted deployment window. The Cloud Build private pools feature allows up to 100 concurrent builds, which may be further adjusted upon request. This is an example of creating a private pool and then using it when submitting a build:
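A minimal sketch, with placeholder project, region, network, and pool names:

```
# Create a private worker pool peered to your VPC network.
gcloud builds worker-pools create terraform-pool \
    --region=us-central1 \
    --peered-network=projects/my-project/global/networks/my-vpc

# Point the build at the pool. In cloudbuild.yaml this is the
# options.pool.name field:
#   options:
#     pool:
#       name: projects/my-project/locations/us-central1/workerPools/terraform-pool
gcloud builds submit --config=cloudbuild.yaml --region=us-central1
```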
A full step-by-step example of creating a private pool and submitting 80+ Terraform deployments with Cloud Build simultaneously is available here. Using Cloud Build removes the need to build a custom high-scale Terraform provisioning service, and provides observability and diagnostics for each of the build instances launched and their results. Using Cloud Build with private pools also enables recommended security features such as VPC Service Controls, which lets you set a secure perimeter to protect against data exfiltration, with additional restrictions to limit access to the specified private pools. This makes it unnecessary to configure a dedicated bastion host inside the perimeter, which improves the overall security posture.

Beyond using Cloud Storage for remote state storage, additional reasons to use Cloud Storage include versioning, security, and compliance. Enabling versioning protects against state file corruption and allows you to view earlier versions. Versioning can be enabled with the gsutil command. In addition to versioning, you can use customer-supplied encryption keys to encrypt the Terraform state file. After you generate the key, you can specify it as the encryption_key parameter of your backend object. Once encrypted, you can still view the contents of your state by adding the encryption_key option to your boto configuration file. Finally, Cloud Storage is one of the Google Cloud services covered by FedRAMP High, which is important for enterprises that are seeking their own FedRAMP authorization on top of Google Cloud (for more details, see the Compliance resource center).

To summarize, using Cloud Build and Cloud Storage for your Terraform deployments enables high scalability, security, and compliance with simpler configuration, via the familiar gcloud and Google Cloud console interfaces. Please check out this sample for step-by-step guidance.
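For reference, the Cloud Storage settings described above look roughly like the following; bucket names, prefixes, and keys are placeholders. Object versioning is enabled on the state bucket with gsutil:

```
gsutil versioning set on gs://my-terraform-state
```

and a customer-supplied encryption key can then be referenced from the gcs backend block, for example:

```
terraform {
  backend "gcs" {
    bucket         = "my-terraform-state"
    prefix         = "env/prod"
    encryption_key = "<base64-encoded AES-256 key>"
  }
}
```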

Enabling keyless authentication from GitHub Actions

GitHub Actions is a third-party CI/CD solution popular among many Google Cloud customers and developers. When a GitHub Actions Workflow needs to read or mutate resources on Google Cloud – such as publishing a container to Artifact Registry or deploying a new service with Cloud Run – it must first authenticate.

Traditionally, authenticating from GitHub Actions to Google Cloud required exporting and storing a long-lived JSON service account key, turning an identity management problem into a secrets management problem. Not only did this introduce additional security risks if the service account key were to leak, it also meant developers could not authenticate from GitHub Actions to Google Cloud if their organization had disabled service account key creation (a common security best practice) via organization policy constraints like constraints/iam.disableServiceAccountKeyCreation.

Now, with GitHub's introduction of OIDC tokens into GitHub Actions Workflows, you can authenticate from GitHub Actions to Google Cloud using Workload Identity Federation, removing the need to export a long-lived JSON service account key. This approach brings several benefits:

Fine-grained scoping. Workload Identity Pools and Providers can define fine-grained attribute mappings between the OIDC token and the available permissions in Google Cloud. Whereas a JSON service account key is either accessible or inaccessible, Workload Identity Federation can be configured to selectively allow authentication based on properties in the downstream OIDC tokens. For GitHub Actions, that means you can, for example, restrict authentication to certain repositories, usernames, branch names, or published claims. You can also combine these into more complex, compound constraints using CEL.

Short-lived credentials. Unlike JSON service account keys, Workload Identity Federation generates short-lived OAuth 2.0 or JWT credentials. By default, these credentials automatically expire one hour after they are created, potentially reducing the time a malicious actor would be able to exploit a compromised credential.

Minimal management overhead. JSON service account keys must be securely stored, rotated, and managed. Even at a small scale, this can be toilsome and error-prone. Because Workload Identity Federation uses short-lived credentials, there are no secrets to rotate or manage beyond the initial configuration.

A new GitHub Action – auth!

To ease the process of authenticating and authorizing GitHub Actions Workflows to Google Cloud via Workload Identity Federation, we are introducing a new GitHub Action – auth! The auth action joins our growing collection of Google-managed GitHub Actions and makes it simple to set up and configure authentication to Google Cloud. Once an auth step is configured with a workload_identity_provider and a service_account, it authenticates all future steps: the gcloud command-line tool, the official Google Cloud client libraries, and popular third-party tools like Terraform automatically detect and use this authentication. Additionally, all of the Google GitHub Actions support this authentication mechanism; for example, you can chain the auth GitHub Action with the get-gke-credentials GitHub Action. If you are using third-party tools that do not support Application Default Credentials, or if you want to invoke Google Cloud APIs manually via curl, the auth GitHub Action can also create OAuth 2.0 tokens and JWTs for use in future steps – for example, creating a short-lived OAuth 2.0 access token and then using that token to access a secret from Google Secret Manager with curl, as sketched below.
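A minimal sketch of such a workflow job follows; the pool, provider, project, service account, and secret names are placeholders, and the action version pin is only illustrative:

```yaml
jobs:
  read-secret:
    runs-on: ubuntu-latest

    # Required so GitHub issues an OIDC token to the job.
    permissions:
      contents: read
      id-token: write

    steps:
      # Exchange the GitHub OIDC token for short-lived Google Cloud credentials.
      - id: auth
        uses: google-github-actions/auth@v0
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/my-pool/providers/my-provider
          service_account: my-sa@my-project.iam.gserviceaccount.com
          token_format: access_token

      # Use the generated OAuth 2.0 access token directly with curl.
      - name: Access secret
        run: |-
          curl -s \
            -H "Authorization: Bearer ${{ steps.auth.outputs.access_token }}" \
            "https://secretmanager.googleapis.com/v1/projects/my-project/secrets/my-secret/versions/latest:access"
```

Dropping token_format and following the auth step with another Google GitHub Action, such as get-gke-credentials, gives the basic keyless pattern described above.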
To ease migration and to support legacy workflows, the auth GitHub Action also supports authenticating via a Google Cloud service account key JSON file. Learn more about the auth GitHub Action and check out the examples at google-github-actions/auth.

Setting up Identity Federation for GitHub Actions

To use the new auth action, you need to set up and configure Workload Identity Federation by creating a Workload Identity Pool and a Workload Identity Provider. The attribute mappings map claims in the GitHub Actions JWT to assertions you can make about the request (like the repository or GitHub username of the principal invoking the GitHub Action), and they can be used to further restrict the authentication using --attribute-condition flags. For example, you can map the repository claim to an attribute, which can then be used to restrict authentication to specific repositories. Finally, allow authentications from the Workload Identity Provider to impersonate the desired Service Account. For more configuration options, see the Workload Identity Federation documentation. If you are using Terraform to automate your infrastructure provisioning, check out the GitHub OIDC Terraform module too.

Towards invisible security

At first, authenticating to Google Cloud from a GitHub Action without a long-lived JSON service account key might seem like magic, but it's all part of Google Cloud's ongoing efforts to make security invisible and our platform secure by default. Using Workload Identity Federation to replace long-lived JSON service account keys in GitHub Actions delivers improvements in security and auditability. To get started, check out the auth GitHub Action today!

Related Article
Keyless API authentication—Better cloud security through workload identity federation, no service account keys necessary
With workload identity federation, you can securely operate your workloads and no longer have to worry about managing service account keys.
Read Article

Using BigQuery with data sources in Google Cloud VMware Engine

This blog is intended for customers who have migrated on-premises data sources to Google Cloud VMware Engine and want to use the data and analytics services provided by Google Cloud. One of the objectives of customers who choose Google Cloud is to apply Google Cloud analytics to their datasets. If you are an IT decision maker or a data architect who wants to quickly put your data to work with Google analytics, this blog describes approaches for accessing your data within BigQuery, where advanced analytics and machine learning on your datasets become possible.

Why?

Data consumption and analytics are at the forefront of technology. Customers today consume and manage large amounts of data and resource pools. These challenges create an opportunity for Google Cloud to help you manage and understand your existing databases without costly re-architecting of your source systems or data location. This blog presents approaches for accessing Google Cloud data and analytics services with your existing data without having to re-architect your databases. Once your data sources are in Google Cloud VMware Engine, Google's highly available and fault-tolerant infrastructure can be leveraged to enhance the performance of data pipelines. These solutions aim to reduce the time to extract value from your datasets with the cloud-native analytics available via BigQuery.

Migrating via Google Cloud VMware Engine offers advantages to all parts of data operations. Database administrators (DBAs) and virtual infrastructure/cloud admins can work in cloud environments that feel familiar from on-premises. The on-premises infrastructure team can enable the data science/AI/machine learning (ML) teams using familiar toolsets, and those teams now have access to Google Cloud AI/ML/data analytics capabilities for their on-premises data.

For example, if you want to uncover cross-sell opportunities within your products, the first step is to ensure that product usage and billing datasets across your products are connected for analytics. The DBA team identifies these datasets and the infrastructure team enables access to them. The application team then replicates this data to BigQuery and uses approaches such as BigQuery ML recommendations to uncover cross-sell opportunities. Another example is forecasting usage growth for operations and growth planning: once your sales data is replicated into BigQuery, advanced time-series forecasting approaches become available for your datasets.

What does this cover?

We present approaches to replicate your relational datasets into BigQuery in a private and secure way using either Cloud Data Fusion or Datastream. Data Fusion is an ETL tool that supports various kinds of data pipelines. Datastream is a change data capture and replication service. With both services, data always stays within your Google Cloud projects and is accessed over internal IP. We will focus on real-time replication, so that you can continuously access data from operational data stores such as SQL Server, MySQL, and Oracle within BigQuery. Moving data from your data sources to the cloud and maintaining data pipelines to your data warehouses via Extract Transform Load (ETL) is a time-consuming activity. An alternative approach is ELT (Extract Load Transform), which loads data into the target system (e.g., BigQuery) before transforming it.
The ELT process is frequently preferred over traditional ETL because it is simpler to implement and loads the data faster. With your datasets now in Google Cloud, data teams can use Cloud Data Fusion and Datastream over the high-speed, low-latency Google Cloud network to replicate or move data from your VMware infrastructure to destinations such as Cloud Storage buckets or BigQuery. For simplicity, we assume that all services are consumed within the same project. We will also touch on pricing implications when moving data from Google Cloud VMware Engine, from on-premises, or from another virtual private cloud (VPC).

Cloud Data Fusion

Cloud Data Fusion provides a visual point-and-click interface that enables code-free deployment of ETL/ELT data pipelines, and it also provides a replication accelerator that lets you replicate your tables into BigQuery. Cloud Data Fusion internally sets up a tenant project with its own VPCs to manage Cloud Data Fusion resources. To access data sources within Google Cloud VMware Engine using Cloud Data Fusion, we use a reverse proxy on the main VPC. The setup works as follows: data workloads run on the Google Cloud VMware Engine instance within the project, and the Google Cloud VMware Engine environment is reached via a project-level VPC peered with Google Cloud VMware Engine. A Compute Engine instance on the project-level VPC exposes a reverse proxy to the Google Cloud VMware Engine database for services that cannot reach the Google Cloud VMware Engine instance directly. A Cloud Data Fusion instance enabled with private IP access and network peering to the main VPC can then access the data through the reverse proxy instance. The process for setting up internal IP access and network peering on Cloud Data Fusion is described in this documentation. Once this peering is complete, we use a Java Database Connectivity (JDBC) connector within Cloud Data Fusion to access our databases, either for replication or for advanced ETL operations. To enable change data capture, the database within Google Cloud VMware Engine must be configured to track and capture changes. The full setup and replication process is described in the documentation for MySQL and for SQL Server.

Google Cloud Datastream

Datastream is a serverless change data capture and replication service. You can access streaming, low-latency data from Oracle and MySQL databases on Google Cloud VMware Engine. This approach offers more flexibility in managing data flow pipelines. The solution is currently in pre-general availability and is only available in select regions. This option also requires a reverse proxy configured on a Compute Engine instance, which is used to access data sources within Google Cloud VMware Engine; the setup is described in this documentation. The complete setup for Datastream can be found in this how-to guide. To enable replication, you configure a stream in Datastream; the stream reads data from the database and pipes it to a Cloud Storage sink. Datastream accesses the data through a reverse proxy that needs to be exposed on the customer's VPC. To pipe the data to BigQuery, we use the pre-configured Datastream to BigQuery template within Dataflow.

How to get started?

The first step is to migrate workloads to Google Cloud VMware Engine; your cloud admin/architect will typically drive this.
If they were not already identified during the migration phase, the next step is to identify the databases residing on virtual machines hosted within Google Cloud VMware Engine and recreate existing reports using BigQuery. In most organizations, multiple personas are involved in this process: a data architect might be the best source of information on data sources, a solutions architect will have insights on the cost/performance implications, and infrastructure input will be needed for the network interfaces. The steps below outline one possible approach:

1. Identify datasets residing on virtual machines migrated to Google Cloud VMware Engine that are used for reports.
2. Select the right pipeline (Datastream vs. Data Fusion) based on the database type and the pipeline requirements (price/performance trade-offs and ease of use).
3. Based on the data pipeline, select the appropriate region. There are no data egress charges within the same region.
4. Set up the reverse proxy to the Google Cloud VMware Engine dataset.
5. Set up the replication service with performance parameters based on the replication performance needed.
6. Enable analytics and visualization on the dataset based on the business requirements.

Conclusion

The Google Cloud VMware Engine service is a fast and easy way to enable data analytics and visualization using your existing datasets. You can now leverage your existing VMware operational posture to enable cloud analytics without time-consuming re-architecting of your databases. These approaches let you take advantage of the performance benefits of dedicated hardware on Google Cloud while connecting with the world's most advanced data capabilities.

Acknowledgements

The authors would like to thank Manoj Sharma and Sai Gopalan for their input on this blog.

Related Article
Monitoring made simple for Google Cloud VMware Engine and Google Cloud operations suite
Learn how we simplified monitoring for Google Cloud VMware Engine and Google Cloud operations suite.
Read Article

Unlocking opportunities with data transformation

One of the biggest challenges data executives face today is turning the immense amount of information that their organization, customers, and partners (in other words, their whole ecosystem) are creating into a competitive advantage. In my role here at Google Cloud, I specialize in everything data, from analytics to business intelligence, data science, and AI. My team's work is split into three main activities:

1. Engagement with customers and the partner community. About 70% of my time is spent with customers, and it's where I've gathered the insights I'm going to share with you today.
2. Product strategy and execution. This time is for strategizing and planning around all our new Cloud launches and products.
3. Go-to-market globally. This is where we ask all the tough questions: How do we make it easier for our customers to onboard? And get the most out of our services? To transform and innovate? And then we solve for them.

It's safe to say data-driven transformation is my bread and butter, and I want it to be yours too. My aim is to help people think about data in a new way: not something to be afraid of, but something to leverage and grow with. There are still lots of problems to be solved in our industry, but data is helping us unlock a world of opportunities.

What modern data architectures look like today

There's a treasure trove of new technologies transforming the way companies do business at incredible speeds. I think of companies like PayPal, which migrated over 20 petabytes of data to serve its 3,000+ users, and Verizon Media, which ingested 200 terabytes of data daily and stored 100 petabytes in BigQuery. Even traditional retailers like Crate & Barrel are making strides in the cloud, doubling their return-on-ad-spend (ROAS) while only increasing investment by 20%.

But what do these companies all have in common? A modern approach to their data practices and platforms. There are three attributes that I think all organizations should take into account:

1. Embrace the old with the new. Every single one of the most important brands on earth has legacy systems. They've developed leadership over decades, and these systems (before the cloud came along) got them there.
2. Don't discard what's going to get you there (i.e., multi-cloud). All modern architectures today are multi-cloud by default. According to Flexera, over 80% of businesses reported using a multi-cloud strategy this year and over 90% have a hybrid strategy in place.
3. Data is no longer a stagnant asset. Organizations that win with data think about it as part of an 'ecosystem' of opportunity, where insights arise from emerging data, whether from interconnected data networks or from their partners' data. This is a trend organizations should keep their eye on: a study from Gartner predicts that by 2023, organizations that promote data sharing will outperform their peers on most business value metrics.

How to make the best hires for your data team

Leaders often say that their competitive advantage comes from their people, not just services or products. While most companies now recognize the importance of data and analytics, many still struggle to get the right people in place. The best way to think about how many data people to hire is to ask yourself what percentage of your total employee base they should make up. I agree with Kirk Borne, Chief Science Officer at DataPrime Solutions, who says that your entire organization should be 'data literate'.
And when we say literate, we mean able to recognize, understand, and talk about data. One third of your company should be 'data fluent', meaning able to analyze and present informed results with data. And finally, 10% of employees should be 'data professionals' who are paid to create value from data. That's where all your chief scientists, data analysts, engineers, and business intelligence specialists come into play. The ideal data team structure of course depends on the type and size of the company. Furniture and home e-commerce company Wayfair, for instance, has approximately 3,000 engineers and data scientists, close to 18% of its total workforce.

Who should own the data?

There are a lot of questions around who data leaders should work for and who should own the data. It's tough to answer because there are so many choices. Should it be the CTO? Or the CFO, whose initiatives are around cost reduction? Or the CPO, who may focus on product analytics only? When we ask customers at scale, it typically sits under the CFO or CTO. And while that makes sense, I think there's something else we should be asking: How should data be approached so that companies are enabled to innovate with it?

A trend we're hearing a lot more about is data mesh. This data ownership approach essentially centralizes data and decentralizes analytics through 'data neighborhoods.' It allows business users and data scientists to access, analyze, and augment insights, but in a way that's connected to the centralized strategy and abides by corporate rules and policies.

Data: 2022 and beyond

Data analytics, data integration, and data processing can be very complex, especially as we begin to modernize. So I'd like to leave you with a 'gotcha' moment, and that's data sharing. You can't expect to reap the benefits of data instantly. First you have to work with it, clean it up, and analyze it. The real innovators are those looking at the wider picture, considering analytics solutions and sharing and combining datasets.

My advice for people who want to get started? Forget the notion of new and existing use cases and focus on business value from day one. How are you going to measure that? And how are you sharing that with the leaders supporting your initiative? Data is constantly growing and trends are always shifting, so we need to stay on our toes. Data-driven transformation gives businesses real-time insights and prepares you for the unpredictable. So looking forward to 2022, I'd say use data to plan for change and plan for the unexpected.

A data cloud offers a comprehensive and proven approach to cloud, allowing you to increase agility, innovate faster, get value from your data, and support business transformation. Google Cloud is uniquely positioned to help businesses get there. Learn how.