Cloud CISO Perspectives: February 2022

As the war in Ukraine continues to unfold, I want to update you on how we're supporting our customers and partners during this time. Google is taking a number of actions. Our security teams are actively monitoring developments, and we offer a host of security products and services designed to keep customers and partners safe from attacks. We have published security checklists for small businesses and medium-to-large enterprises, to enable entities to take the necessary steps to promote resilience against malicious cyber activity.

Below, I'll recap the latest efforts from the Google Cybersecurity Action Team, such as our second Threat Horizons Report, and highlight new capabilities from our cloud security product teams, who have been working to deliver new controls, security solutions, and more to earn the trust of our customers globally.

Munich Cyber Security Conference

Earlier this month, I joined a panel at the Munich Cyber Security Conference (Digital Edition) to discuss supply chain risks and cyber resiliency. It was great to see a packed agenda featuring diverse voices from the security industry along with government leaders and policymakers coming together to discuss the challenges we're working to collectively solve in cybersecurity. One area of particular focus is securing the software supply chain. During the panel, we talked about Google's approach to building our own internal software and incorporating open source code in a secure way. This has been the foundation of our BeyondProd approach. We implement multiple layers of safeguards, like multi-party change controls and a hardened build process that produces digitally signed software that our infrastructure explicitly validates before executing. We've since turned this into an open framework that all organizations can use to assess themselves and their supply chains: SLSA. How we collectively as an industry secure the software supply chain and prevent vulnerabilities in open source software will continue to be critical for cloud and SaaS providers, governments, and maintainers throughout 2022.

Google Cloud Security Talks

On March 9, we'll host our first Cloud Security Talks of 2022, which will focus on how enterprises can modernize their approach to threat detection and response with Google Cloud. Sessions will highlight how SecOps teams can leverage our threat detection, investigation, and response capabilities across on-premises, cloud, and hybrid environments, including new SOAR capabilities from our recent acquisition of Siemplify. Register here.

Google Cybersecurity Action Team Highlights

Here are the latest updates, products, services, and resources from our cloud security teams this month:

Security

FIDO security key support for GCE VMs: Physical security keys can now be used to authenticate to Google Compute Engine virtual machine (VM) instances that use our OS Login service for SSH management. Security keys offer some of the strongest protection against phishing and account takeovers and are strongly recommended in administrative workflows like this.

IAM Conditions and Tags support in Cloud SQL: We introduced IAM Conditions and Tags in Cloud SQL, which bring powerful new capabilities for finer-grained administrative and connection access control for Cloud SQL instances.

Achieving Autonomic Security Operations: Anton Chuvakin and Iman Ghanizada from the Cybersecurity Action Team shared their latest blog post on how organizations can achieve Autonomic Security Operations by leveraging key learnings from SRE principles.
The post highlights multiple ways automation can serve as a force multiplier to achieve better outcomes in your SOC.

Certificate Manager integration with External HTTPS Load Balancing: We released the public preview of our Certificate Manager service and its integration with External HTTPS Load Balancing to help simplify the way you deploy HTTPS services for your customers. You can bring your own TLS certificates and keys if you have an existing certificate lifecycle management solution, or use Google Cloud's fully managed TLS offerings. Another helpful feature of this release is the integration of alerts on certificate expiry into Cloud Logging.

Virtual Machine Threat Detection: The cloud is impacted by unique threat vectors, but it also offers novel opportunities to build effective detection into the platform natively. This dynamic underpins our latest Security Command Center Premium capability: Virtual Machine Threat Detection (VMTD). VMTD helps ensure strong protection for VM-based workloads by providing agentless memory scanning that can detect threats like cryptomining malware inside your Google Compute Engine VMs.

Chrome Browser Cloud Management: A large part of enterprise security is protecting the endpoints that access the web, and a big part of this is not only using a secure browser like Chrome, but also how you manage and support it. We offer many of these capabilities in Chrome Browser Cloud Management, along with our overall zero trust approach. We also recently extended CIS benchmark coverage to include Chrome.

Google Cloud architecture diagramming tool: We recently launched the brand new Google Cloud Architecture Diagramming Tool. This is an awesome tool for cloud architects, developers, and security teams alike, and it's another opportunity for us to be helpful by providing pre-baked reference architectures in the tool. Watch out for more on this as we build in more security patterns.

Some of the Best Security Tools Might Not be "Security Tools": Remember, there are many problems in risk management, security, and compliance that don't need specialist security tools. In fact, some of the best tools might come from our data analysis and AI stacks, such as our Vertex AI capability. Check out these new training features from the team.

Stopping website attacks with reCAPTCHA Enterprise: reCAPTCHA Enterprise is a great solution that mitigates many of the issues in the OWASP Automated Threat Handbook and can be deployed seamlessly for your website.

Industry updates

Open source software security: Just a few weeks after technology companies (including Google) and industry foundations convened at the White House summit on open source security, the OpenSSF announced the Alpha-Omega Project. The project aims to help improve software supply chain security for 10,000 OSS projects through direct engagement of software security experts and automated testing. Microsoft and Google are supporting the Alpha-Omega Project with an initial investment of $5 million.

Building cybersecurity resilience in healthcare: Taylor Lehmann and Seth Rosenblatt from Google's Cybersecurity Action Team recently outlined best practices healthcare leaders can adopt to build resilience for IT systems, overcome attacks to improve both security and business outcomes, and above all, protect patient care and data.
Threat Intelligence

Threat Horizons Report Issue 2: Providing timely, actionable cloud threat intelligence to our customers so they can take action to protect their environments is critical, and this is the aim of our Threat Horizons report series. Customers benefit from guidance on how to securely use and configure the cloud, which is why we operate within a "shared fate" model that exemplifies a true partnership with our customers regarding their security outcomes. In the latest Google Cybersecurity Action Team Threat Horizons Report, we observed that vulnerable instances of Apache Log4j are still being sought by attackers, which requires continued vigilance by customers and cloud providers alike in ensuring patching is effective. Additionally, Google Cloud Threat Intelligence has observed that the Sliver framework is being used by adversaries after initial compromise in attempts to ensure they maintain access to networks. Check out the full report for this month's findings and best practices you can adopt to stay protected against these and other evolving threats.

Controls

Assured Workloads for EU: Organizations around the world need confidence that they can meet their unique and evolving needs for security, privacy, and digital sovereignty as they use cloud services. Assured Workloads for EU, now GA, allows GCP customers to create and maintain workloads with data residency in their choice of EU Google Cloud regions, personnel access and customer support restricted to EU persons located in the EU, and cryptographic control over data access using encryption keys stored outside Google Cloud infrastructure.

Client Authorization for gRPC Services with Traffic Director: One way developers use the open source gRPC framework is for backend service-to-service communications. The latest release of Traffic Director now supports client authorization for proxyless gRPC services. This release, in conjunction with Traffic Director's capability for managing mTLS credentials for Google Kubernetes Engine (GKE), enables customers to centrally manage access between workloads using Traffic Director.

Don't forget to sign up for our newsletter if you'd like to have our Cloud CISO Perspectives post delivered every month to your inbox. We'll be back next month with more updates and security-related news.

Related Article: "Cloud CISO Perspectives: January 2022." Google Cloud CISO Phil Venables shares his thoughts on the latest security updates from the Google Cybersecurity Action Team.
Source: Google Cloud Platform

Four ways Google Cloud Marketplace simplifies buying cloud software

Discovering, purchasing, and managing solutions on Google Cloud has never been easier thanks to Google Cloud Marketplace, where you can explore thousands of enterprise-ready products and services that integrate with Google Cloud. Marketplace simplifies buying solutions from Google and top software vendors as all purchases are seamlessly added to your existing Google Cloud invoice, so you receive just one bill from Google. We're continuously improving how our customers can evaluate, procure, and manage cloud software online because we know that's increasingly how you want to buy. In fact, in 2021 our marketplace third-party transaction value was up more than 500% YoY from 2020 (Q1-Q3). Let's revisit the top four improvements we've made to the cloud software buying and selling experience in just the last few months that have accelerated buying momentum.

1. Find what you're looking for faster

With enhanced filters, quickly discover top solutions ready to run in your Google Cloud instance. Filters are now more intuitive, allowing you to browse solutions by industry, type, category, use case, and more. Check out the "free trial" filter if you want to try a solution before buying. And if you're looking for inspiration, we surface popular, new, and featured products across categories. Once you've found what you're looking for, you'll benefit from more detailed product cards to help you evaluate what's best for your organization. Explore the Google Cloud Marketplace catalog to find innovative solutions to your business problems.

2. Buy and sell the way you want

We've added new subscription models and payment schedules, making it simpler than ever to save money and meet your organization's procurement needs. Through Google Cloud Marketplace, you can negotiate with third-party vendors and retire Google Cloud committed spend on first and third-party purchases. Many of our software vendors now have the ability to offer flexible subscription models, including flat fees, usage-based, hybrid flat fee + usage, and committed use discounts that customers can pay for monthly or up front. These partners can now also include standard or customized terms to meet your organization's procurement needs. Learn more about initiating and accepting customized offers. And to make buying easier in large organizations, billing admins can now set up a procurement governance workflow that allows end users to submit procurement requests for new SaaS products directly from Google Cloud Marketplace.

For the many software vendors accelerating growth via the Google Cloud Marketplace, in addition to these new pricing tools and the partner investments we announced last month, we've also broadened migration to Producer Portal, our new offer publishing, deal-making, and analytics portal. This new experience makes many of the new pricing features possible. Plus, we've improved partner disbursements to set you up for success as your marketplace deal flow grows – all in pursuit of being the easiest cloud to go to market with. If you're interested in selling your enterprise-grade solution to Google Cloud customers, here's how to get started.

3. Manage purchases conveniently

After purchasing a great solution, you've got to set up and manage it. We've also made a few improvements to speed up your post-purchase time-to-value:

The new Your Orders page provides a unified experience where you can view and manage all third-party subscription purchases under a single billing account.
Your Products helps developers easily find, access, and receive relevant updates for the products they use or that are available to them within their project, so they always have them at their fingertips.

Plus, we've simplified SaaS setup for admins with the Service Account Provisioning UI. It used to be that the software vendor had to manually share service account details with you, and someone on your end needed to provide access via the command-line interface (CLI). We've made this easier: software vendors can now get the access their product needs with just a few clicks. And if you're a SaaS vendor looking to speed up customer onboarding, read how you can incorporate this feature into your product.

4. Maintain control over your solutions – easily

Once customers build or buy, establishing and enforcing standards and controls across your cloud landscape can be a headache, especially in large organizations. Luckily, Google Cloud Marketplace has Service Catalog functionality built in to make teams more productive by helping you efficiently distribute approved solution configurations across your organization. Enforcing governance policies in-console with Service Catalog not only simplifies compliance, but it also saves engineering time by reducing manual configuration steps and post-build reviews. And now, with the ability to publish Terraform configurations, Service Catalog allows for a consistent, flexible approach that reduces the need for organizations to learn multiple infrastructure-as-code (IaC) solutions. Ready to get started? Be up and running fast with our Service Catalog quickstart guide.

We know you'll love these enhancements, and we're not slowing down. We can't wait to share more about what we're working on soon.

Related Article: "Google Cloud doubles-down on ecosystem in 2022 to meet customer demand." Google Cloud will double spend in its partner ecosystem over the next few years, including new benefits, incentives, programs, and training.
Source: Google Cloud Platform

Google Cloud Data Heroes Series: Meet Lynn, a cloud architect equipping bioinformatic researchers with genomic-scale data pipelines on GCP

Google Cloud Data Heroes is a series where we share stories of the everyday heroes who use our data analytics tools to do amazing things. Like any good superhero tale, we explore our Google Cloud Data Heroes' origin stories, how they moved from data chaos to a data-driven environment, what projects and challenges they are overcoming now, and how they give back to the community.

(Photo: Lynn Langit rides her bike in the middle of a snowy Minnesota winter.)

For our first issue, we couldn't be more excited to introduce Google Cloud Data Heroine Lynn Langit. Lynn is a seasoned businesswoman in Minnesota beginning her eleventh year as the founder of her own consulting business, Lynn Langit Consulting LLC. Lynn wears many data professional hats, including Cloud Architect, Developer, and Educator. If that wasn't already a handful, she also loves riding her bike in any season of the year (pictured on the right), which you might imagine gets a bit challenging when you have to invest in studded snow tires for your bike!

Tell us how you got to be a data practitioner. What was that experience like, and how did this journey bring you to GCP?

I worked on the business side of tech for many years. While I enjoyed my work, I found I was intrigued by the nuanced questions practitioners could ask – and the sophisticated decisions they could make – once they unlocked value from their data. This initial intrigue developed into a strong curiosity, and I ultimately made the switch from business worker to data practitioner over 15 years ago. This was a huge change in career considering I got my bachelor's degree in Linguistics and German. And so I started small. I taught myself most everything, both at the beginning and even now, through online resources, courses, and materials. I began with databases and data warehousing, specifically building and tuning many enterprise databases. It wasn't until Hadoop/NoSQL became available that I pivoted to Big Data. Back then, I supplemented my self-paced learning with Microsoft technologies, even earning all Microsoft certifications in just one year. When I noticed the industry shifting from on premises to cloud, I shifted my learning from programming to cloud, too. I have been working in the public cloud for over ten years already!

"I started with AWS, but recently I have been doing most everything in GCP. I particularly love implementing data pipelining, data ops, and machine learning."

How did you supplement your self-teaching with Google Cloud data upskilling opportunities like product deep dives and documentation, courses, skills, and certificates?

One of the first Google Cloud data analytics products I fell in love with was BigQuery. BigQuery was my gateway product into a much larger open, intelligent, and unified data platform full of products that combine data analytics, databases, AI/ML, and business intelligence. I've used BigQuery forever. It's been amazing since its initial release, and it keeps getting better and better. Then I discovered Dataproc and Bigtable. Dataproc is my go-to for Apache Spark projects, and I've used Bigtable for several projects as well. I am also a heavy user of TensorFlow and AutoML.

I've achieved Skills Badges in BigQuery, Data Analysis, and more. I've also achieved Google's Professional Data Engineer certification, and have been a Google Developer Expert since 2012.
Most recently, I was named one of few Data Analysis Innovator Champions within the Google Cloud Innovators Program, which I'm particularly excited about because I've heard it's a coveted spot for data practitioners and necessitates a Googler nomination to move from the Innovator membership to the Champion title!

You're undoubtedly a data analytics thought leader in the community. When did you know you had moved from data student to data master, and what data project are you most excited about?

I knew I had graduated, if you will, to the data architect realm once I was able to confidently do data work that matters, even if that work was outside of my usual domains: adTech and finTech. For example, my work over the past few years has been around human health outcomes, including combatting the COVID-19 pandemic. I do this by supporting scientists and bioinformatic researchers with genomic-scale data pipelines. Did I know anything about genomics before I started? Not at all! I self-studied bioinformatics and recorded my learnings on GitHub. Along the way, I adapted my learnings into an open source GCP course on GitHub aimed at researchers who are new to working with GCP. What's cool about the course is that I begin from the true basics of how to set up a GCP account. Then I gradually work up to mapping out genomic-scale data workflows, pipelines, analyses, batch jobs, and more using BigQuery and a host of other Google Cloud data products. Now, I've received feedback that this repository has made a positive impact on researchers' ability to process and synthesize enormous amounts of data quickly. Plus, it achieves the greater goal of broadening accessibility to a public cloud like GCP.

In what ways do you think you uniquely bring value back to the data community? Why is it important to you to give back to the data community?

I stay busy always sharing my learnings back to the community. I record cloud and big data technical screencasts (demos) on YouTube, I've authored 25 data and cloud courses on LinkedIn Learning, and I occasionally write Medium articles on cloud technology and random thoughts I have about everyday life. I'm also the cofounder of Teaching Kids Programming, with a mission to help equip middle and high school teachers with a great programming curriculum in Java.

If I had to rationalize why giving back to the data community is important to me, I'd say this: I just turned 60 and I am learning cutting-edge technology constantly – my latest foray is into cloud quantum computing. Technology benefits us when we combine life experience with curiosity, so I feel an immense duty to keep learning and share my progress and success along the way!

Begin your own hero's journey

Ready to embark on your Google Cloud data adventure? Begin your own hero's journey with GCP's recommended learning path, where you can achieve badges and certifications along the way. Join the Cloud Innovators program today to stay up to date on more data practitioner tips, tricks, and events.

Connect with Google's data community at our upcoming virtual event, "Latest Google Cloud data analytics innovations". Register and save your spot now to get your data questions answered live by GCP's top data leaders and watch demos from our latest products and features, including BigQuery, Dataproc, Dataplex, Dataflow, and more. Lynn will take the main stage as an emcee for this event – you won't want to miss it!

Finally, if you think you have a good Data Hero story worth sharing, please let us know!
We'd love to feature you in our series as well.

Related Article: "Google data experts share top data practitioner skills needed in 2022." Top data analytics skills to learn in 2022 as a data practitioner; Google Cloud experts weigh in.
Source: Google Cloud Platform

Developing high-quality ML solutions

When a deployed ML model produces poor predictions, it can be due to a wide range of problems. It can be the result of bugs that are typical in any program—but it can also be the result of ML-specific problems. Perhaps data skews and anomalies are causing model performance to degrade over time. Or the data format is inconsistent between the model's native interface and the serving API. If models aren't monitored, they can fail silently. When a model is embedded into an application, issues like this can create poor user experiences. If the model is part of an internal process, the issues can negatively impact business decision-making.

Software engineering has many processes, tools, and practices to ensure software quality, all of which help make sure that the software is working in production as intended. These tools include software testing, verification and validation, and logging and monitoring. In ML systems, the tasks of building, deploying, and operating the systems present additional challenges that require additional processes and practices. Not only are ML systems particularly data-dependent because they inform decision-making from data automatically, but they're also dual training-serving systems. This duality can result in training-serving skew. ML systems are also prone to staleness in automated decision-making systems.

These additional challenges mean that you need different kinds of testing and monitoring for ML models and systems than you do for other software systems—during development, during deployment, and in production. Based on our work with customers, we've created a comprehensive collection of guidelines for each process in the MLOps lifecycle. The guidelines cover how to assess, ensure, and control the quality of your ML solutions. We've published this complete set of guidelines on the Google Cloud site. To give you an idea of what you can learn, here's a summary of what the guidelines cover:

Model development: These guidelines are about building an effective ML model for the task at hand by applying relevant data preprocessing, model evaluation, and model testing and debugging techniques.

Training pipeline deployment: These guidelines discuss ways to implement a CI/CD routine that automates the unit tests for model functions and the integration tests of the training pipeline components (a minimal sketch of such a routine appears at the end of this post). The guidelines also help you apply an appropriate progressive delivery strategy for deploying the training pipeline.

Continuous training: These guidelines provide recommendations for extending your automated training workflows with steps that validate the new input data for training, and that validate the new output model that's produced after training. The guidelines also suggest ways to track the metadata and the artifacts that are generated during the training process.

Model deployment: These guidelines address how to implement a CI/CD routine that automates the process of validating compatibility of the model and its dependencies with the target deployment infrastructure. These recommendations also cover how to test the deployed model service and how to apply progressive delivery and online experimentation strategies to decide on a model's effectiveness.

Model serving: These guidelines concern ways to monitor the deployed model throughout its prediction serving lifetime to check for performance degradation and dataset drift. They also provide suggestions for monitoring the efficiency of the model service.

Model governance: These guidelines concern setting model quality standards.
They also cover techniques for implementing procedures and workflows to review and approve models for production deployment, as well as managing the deployed model in production.

To read the full list of our recommendations, read the document "Guidelines for developing high-quality ML solutions."

Acknowledgements: Thanks to Jarek Kazmierczak, Renato Leite, Lak Lakshmanan, and Etsuji Nakai for their valuable contributions to the guide.
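As a closing illustration of the training pipeline deployment guidance above, here is a minimal, hypothetical sketch of a CI configuration that runs unit tests for model code and an integration test of the pipeline components before anything is deployed. Cloud Build is assumed purely for illustration; the step image, requirements file, and test paths are placeholders rather than names taken from the guide:

# cloudbuild.yaml (hypothetical sketch)
steps:
# Unit tests for model functions (preprocessing, feature engineering, loss, etc.)
- id: unit-tests
  name: python:3.9
  entrypoint: bash
  args: ['-c', 'pip install -r requirements.txt && pytest tests/unit']
# Integration tests that exercise the training pipeline components end to end on a small sample
- id: pipeline-integration-tests
  name: python:3.9
  entrypoint: bash
  args: ['-c', 'pip install -r requirements.txt && pytest tests/pipeline_integration']

A routine like this would typically be triggered on every change to the training pipeline's repository, so that broken preprocessing or training code is caught before the pipeline itself is redeployed.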
Source: Google Cloud Platform

Build a data mesh on Google Cloud with Dataplex, now generally available

Democratizing data insights and accelerating data-driven decision making is a top priority for most enterprises seeking to build a data cloud. This often requires building a self-serve data platform that can span data silos and enable at-scale usage and application of data to drive meaningful business insights. Organizations today need the ability to distribute ownership of data across teams that have the most business context, while ensuring that the overall data lifecycle management and governance is consistently applied across their distributed data landscape.

Today we are excited to announce the general availability of Dataplex, an intelligent data fabric that enables you to centrally manage, monitor, and govern data across data lakes, data warehouses, and data marts, and make this data securely accessible to a variety of analytics and data science tools.

With Dataplex, enterprises can easily delegate ownership, usage, and sharing of data to data owners who have the right business context, while still having a single pane of glass to consistently monitor and govern data across various data domains in their organization. With built-in data intelligence, Dataplex automates data discovery, data lifecycle management, and data quality, enabling data productivity and accelerating analytics agility.

Here is what some of our customers have to say:

"We have PBs of data stored in GCS and BigQuery in GCP, accessed by 1000s of internal users daily," said Saral Jain, Director of Engineering, Snap Inc. "Dataplex enables us to deliver a business domain specific, self-service data platform across distributed data, with de-centralized data ownership but centralized governance and visibility. It significantly reduces the manual toil involved in data management, and automatically makes this data queryable via both BigQuery and open source applications. We are very excited to adopt Dataplex as a central component for building a unified data mesh across our analytics data."

"As the central data team at Deutsche Bank, we are building a data mesh to standardize data discovery, access control and data quality across the distributed domains," said Balaji Maragalla, Director Big Data Platform at Deutsche Bank. "To help us on this journey, we are excited to use Dataplex to enable centralized governance for our distributed data. Dataplex formalizes our data mesh vision and gives us the right set of controls for cross-domain data organization, data security, and data quality."

"As one of the largest entertainment companies in Japan, we generate TBs of data every day and use it to make business critical decisions," said Iwao-san, Director of Data Analytics at DeNA. "While we manage each product independently as a separate domain, we want to centralize governance of data across our products. Dataplex enables us to effectively manage and standardize data quality, data security, and data privacy for data across these domains. We are looking forward to building trust in our data with Google Cloud's Dataplex."

One of the key use cases that Dataplex enables is a data mesh architecture. Let's take a closer look at how you can use Dataplex as the data fabric that enables a data mesh.

What is a Data Mesh?

With enterprise data becoming more diverse and distributed, and the number of tools and users that need access to this data growing, organizations are moving away from monolithic data architectures that are domain agnostic.
While monolithic, centrally managed architectures create data bottlenecks and impact analytics agility, a completely decentralized architecture where business domains maintain their own purpose-built data lakes also has its pitfalls and results in data duplication and silos, making governance of this data impossible. Per Gartner, "Through 2025, 80% of organizations seeking to scale digital business will fail because they do not take a modern approach to data and analytics governance."

The data mesh architecture, first proposed in this paper by Zhamak Dehghani, describes a modern data stack that moves away from a monolithic data lake or data warehouse architecture to a distributed, domain-specific architecture that enables autonomy of data ownership and provides agility with decentralized, domain-aware data management, while providing the ability to centrally govern and monitor data across domains. To learn more, refer to this Build a Modern Distributed Data Mesh whitepaper.

How to make Data Mesh real with Google Cloud

Dataplex provides a data management platform to easily build independent data domains within a data mesh that spans your organization, while still maintaining central controls for governing and monitoring the data across domains.

"Dataplex is embodying the principles of Data Mesh as we have envisioned in Adeo. Having a first party, cloud-native, product to architect a Data Mesh in GCP is crucial for effective data sharing and data quality amongst teams. Dataplex streamlines productivity, allowing teams to build data domains and orchestrate data curation across the enterprise. I only wish we had Dataplex three years ago." —Alexandre Cote, Product Leader with ADEO

Imagine you have the following domains in your organization. With Dataplex, you can logically organize your data and related artifacts, such as code, notebooks, and logs, into a Dataplex Lake, which represents a data domain. You can model all the data in a particular domain as a set of Dataplex Assets within a lake without physically moving data or storing it in a single storage system. Assets can refer to Cloud Storage buckets and BigQuery datasets stored in multiple Google Cloud projects, and can manage both analytics and operational data, structured and unstructured, that logically belongs to a single domain. Dataplex Zones enable you to group assets and add structure that captures key aspects of your data – its readiness, the workloads it is associated with, or the data products it is serving. (A minimal gcloud sketch of creating a lake, zone, and asset appears at the end of this post.)

The lakes and data zones in Dataplex enable you to unify distributed data and organize it based on the business context. This forms the foundation for managing metadata, setting up governance policies, monitoring data quality, and so on, giving you the ability to manage your distributed data at scale. Now let's take a look at one of the domains in a little more detail.

Automatically discover metadata across data sources: Dataplex provides metadata management and cataloging that enables all members of the domain to easily search, browse, and discover tables and filesets, as well as augment them with business and domain-specific semantics. Once data is added as assets, Dataplex automatically extracts the associated metadata and keeps it up to date as the data evolves.
This metadata is made available for search, discovery, and enrichment via integration with Data Catalog.

Enable interoperability of tools: The metadata curated by Dataplex is automatically made available as runtime metadata to power federated open source analytics via Apache Spark SQL, HiveQL, Presto, and so on. Compatible metadata is also automatically published as external tables in BigQuery to enable federated analytics via BigQuery.

Govern data at scale: Dataplex enables data administrators and stewards to consistently and scalably manage their IAM data policies to control data access across distributed data. It provides the ability to centrally govern data across domains while enabling autonomous and delegated ownership of data. It provides the ability to manage reader/writer permissions on the domains and the underlying physical storage resources. Dataplex integrates with Stackdriver to provide observability, including audit logs, data metrics, and logs.

Enable access to high quality data: Dataplex provides built-in data quality rules that can automatically surface issues in your data. You can run these rules as data quality tasks across your data in BigQuery and GCS.

One-click data exploration: Dataplex enables data engineers, data scientists, and data analysts with a built-in, self-serve, serverless data exploration experience to interactively explore data and metadata, iteratively develop scripts, and deploy and monitor data management workloads. It provides content management across SQL scripts and Jupyter notebooks that makes it easy to create domain-specific code artifacts and share or schedule them from that same interface.

Data management: You can also leverage the built-in data management tasks that address common needs such as tiering, archiving, or refining data. Dataplex integrates with Google Cloud's native data tools such as Dataproc Serverless, Dataflow, Data Fusion, and BigQuery to provide an integrated data management platform.

With the collective of data, metadata, policies, code, interactive and production analytics infrastructure, and data monitoring, Dataplex delivers on the core value proposition of a data mesh: data as the product.

"Consistent data management and governance of distributed data remains a top priority for most of our clients today. Dataplex enables a business-centric data mesh architecture and significantly lowers the administrative overhead associated with managing, monitoring, and governing distributed data. We are excited to collaborate with the Dataplex team to enable enterprise clients to be more data-driven and accelerate their digital transformation journeys." —Navin Warerkar, Managing Director, Deloitte Consulting LLP, and US Google Cloud Data & Analytics GTM Leader

Next steps

Get started with Dataplex today by using this quickstart guide or this data mesh tutorial, or contact the Google Cloud sales team.

Related Article: "Introducing Dataplex—an intelligent data fabric for analytics at scale." Dataplex unifies distributed data to help automate data management and power analytics at scale.
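To make the lake, zone, and asset building blocks described above a bit more concrete, here is a minimal, hedged sketch of how a single domain might be assembled with the gcloud CLI. The lake, zone, project, and bucket names are hypothetical, and the exact flags (for example, --resource-location-type and --resource-name) are assumptions based on the current gcloud dataplex command surface rather than commands taken from this post:

# Create a lake that represents the "sales" data domain (names are placeholders)
$ gcloud dataplex lakes create sales-domain --location=us-central1

# Create a raw zone within the lake to group landing data
$ gcloud dataplex zones create raw-zone --lake=sales-domain --location=us-central1 --type=RAW --resource-location-type=SINGLE_REGION

# Attach an existing Cloud Storage bucket to the zone as an asset
$ gcloud dataplex assets create sales-events --lake=sales-domain --zone=raw-zone --location=us-central1 --resource-type=STORAGE_BUCKET --resource-name=projects/my-project/buckets/sales-raw-events

Because the asset only points at the existing bucket, the data stays where it is; Dataplex discovery can then populate the metadata that powers the search, governance, and exploration capabilities described above.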
Source: Google Cloud Platform

Google data experts share top data practitioner skills needed in 2022

It's 2022, and nanosatellites, NFTs, and autonomous cars that deliver your pizza are in full force. In a world where people rely on simple technology to untangle complex problems, companies must deliver simple experiences to be successful in today's landscape. For many cloud providers this means enabling tightly integrated data offerings that simplify the data delivery process without losing sight of the sophisticated needs of the modern data consumer. But while the name of the game is helping companies reach informed decisions from their data more simply and quickly, what about the data practitioners – data analysts, data engineers, database administrators, developers, etc. – who use these cloud data tools and technologies every day? To proactively stay ahead of data cloud market trends in 2022, should data practitioners invest their time in specializing their data cloud skill sets (e.g., going deep in, say, data pipelining skills) or instead invest their time generalizing their data cloud skill sets (e.g., growing proficiencies across a mix of data analytics, databases, AI/ML, and more domains)?

Skill deep or wide with data – that is the question

For Abdul Razack, VP, Solutions Engineering, Technology Solutions and Strategy at Google Cloud, the answer is a bit of both. "Data practitioners need to be broad in terms of their technology skills, but specialized with respect to the domain or domains in which they apply them. The reason why is because many things that used to be separate skill sets are now converging – like business analytics, streaming, machine learning, data pipelines, and data warehousing. Data practitioners need to be able to implement end-to-end workflows that solve specific business problems using skills from each category."

It's true: thousands of customers are choosing Google's data cloud because it offers a unified and open approach to cloud that enables their practitioners to break down silos, begin and end projects without leaving the data platform, and innovate faster across their organization. The data practitioners who mirror Google data cloud's frame of mind of being smart and agile across data domains in their skilling and learning will reap the benefits of solving more nuanced problems – building out internet-scale applications, fine-tuning smart processes with analytics and AI, constructing data meshes that make product building simple, etc. – at a larger scale than they would if they specialized in just one or two areas alone.

"Of course, at the end of the day, it depends on what tools a data practitioner is using to complete their workflows. There's only so much you can learn and so many skills you can develop when you're using limited tools. Growing data proficiencies across the board is made a lot easier when you're using a data platform like BigQuery to address all these needs. BigQuery eliminates the choices you have to make – for instance, you don't have to choose between streaming data and data at rest, batch and real time, or business intelligence and data science.
This freedom gives data professionals a huge advantage when they're building their skill sets and taking on more complex projects." – Abdul Razack, VP, Solutions Engineering, Technology Solutions and Strategy, Google Cloud

Knowing your value is half the battle when upskilling

While some experts think technology is the limiting factor of whether or not you can even go wide or go deep in the first place, others, like Google Cloud's Head of Data and Analytics Bruno Aziza, purport that it also depends on who you are, who you wish to be, and what investments your company is making to ensure you can become that person.

"If you wish to set yourself up to be a Chief Data Officer, then you'll want to understand how technologies fit together across your data estate first," said Aziza. "Only after you feel like you're the go-to 'data person' can you then decide which part of the technology stack you want to double-down on."

But technology isn't everything. Aziza notes, "Make sure you focus on the business impact that your data work provides. You want to spend as much time as you can with your business counterparts to understand their business goals and challenges. The Harvard Business Review provides great guidance on how to succeed as a Chief Data Officer."

Even if you don't have your sights set on a C-suite role, both Aziza and Razack contend that the number one skill data practitioners should tackle in 2022 is actually a broad and perhaps abstract one: develop and exercise the curiosity to solve problems with a data-driven strategy. That is, today's data practitioners should always be interested in educating themselves about the industry and continually upskilling in something. And their employers should also be invested in helping practitioners develop those interests, most likely through exposure to learning materials, engaging in career conversations, subsidized courses, or incentives attached to pursuing a new certification or skill.

"Every industry is going through a digital transformation, and the ability to identify what data to collect, how to prepare the data, and how to derive insights from it is critical. Therefore, the ability to find business challenges and formulate a data-driven approach to address those problems is the most important skill to have." – Abdul Razack, VP, Solutions Engineering, Technology Solutions and Strategy, Google Cloud

"Whether you're a data engineer, data analyst, citizen data scientist, or data practitioner by any other name, asking more questions and being curious to learn more should be that thing that you gravitate towards in those spare moments… Be a constant learner. New concepts pop up all the time and you want to be the person who can learn the fastest so you can advance your company's mission and contribute back to the community.

Take the example of the "Data Mesh" I just wrote about in VentureBeat. You'll find 3 types of attitudes towards this new concept. There are Disciples, who encourage continued learning only from the source – like the author of a new book or the creator of a theory. There are Distractors, who tell you that new skills, trends, and technologies are fake news. And there are Distorters, like vendors who will sell you one easy-fix solution.
But it's the data practitioner who needs to proceed with caution when interacting with all three types and forge their own path to discovering the truth when they're learning and building skills. And for better or worse, this comes with trial and error, experimentation, and an eagerness to grow relative to where they began."

Ready to start data upskilling? Start here.

For those interested in keeping up their data curiosities, check out our Data Journeys video series. Each week, Bruno Aziza investigates a new authentic customer's data journey – from migrating to cloud or building a data platform to carrying out new data-for-good initiatives. Learn how they did it, their data dos and don'ts, and what's next for them on their journey. These videos include a flavor of both specializing your data competencies and broadening your data competencies.

For those interested in deep skilling, connect with Google's data community at our upcoming virtual event: Latest Google Cloud data analytics innovations. Register and save your spot now to get your data questions answered live by GCP's top data leaders and watch demos from our latest products and features, including BigQuery, Dataproc, Dataplex, Dataflow, and more.

If you have any questions or need support along your learning journey – we're here for you! Sign up to be a Google Cloud Innovator, and join the Google Cloud Data Analytics Community.

Related Article: "The top three insights we learned from data analytics customers in 2021." Google Cloud announces the top data analytics stories from 2021, including the top three trends and lessons they learned from customers th…
Source: Google Cloud Platform

Scaling to new heights with Cloud Memorystore and Envoy

Modern applications need to process large-scale data at millisecond latency to provide experiences like instant gaming leaderboards, fast analysis of streaming data from millions of IoT sensors, or real-time threat detection of malicious websites. In-memory datastores are a critical component to deliver the scale, performance, and availability required by these modern applications. Memorystore makes it easy for developers building applications on Google Cloud to leverage the speed and powerful capabilities of the most loved in-memory store: Redis.

Memorystore for Redis Standard Tier instances are a popular choice for applications requiring a highly available Redis instance. Standard Tier provides a failover replica across zones for redundancy and provides fast failover with a 99.9% SLA. However, in some cases, your applications may need to scale beyond the limitations of a single Standard Tier instance. Read replicas allow you to scale to a higher read throughput, but your application may require higher write throughput or a larger keyspace size as well. In these scenarios, you can implement a strategy to partition your cache usage across multiple independent Memorystore instances, which is known as client-side sharding. In this post, we'll discuss how you can implement your own client-side sharding strategy to scale infinitely with Cloud Memorystore and Envoy.

Architectural Overview

Let's start by discussing an architecture of GCP native services alongside open-source software which can scale Cloud Memorystore beyond its usual limits. To do this, we'll be sharding a cache such that the total keyspace is split among multiple otherwise independent Memorystore instances. Sharding can pose challenges to client applications, which must then be rewritten for awareness of the appropriate place to search for a specific key and must be updated to scale the backend. However, client-side sharding can be easier to implement and maintain by encapsulating the sharding logic in a proxy, allowing your application and sharding logic to be updated independently. You'll find a sample architecture below, and we'll briefly detail each of the major components.

Memorystore for Redis

Cloud Memorystore for Redis enables GCP users to quickly deploy a managed Redis instance within a GCP project. A single node Memorystore instance can support a keyspace as large as 300 GB and a maximum network throughput of 16 Gbps. With Standard Tier you get a highly available Redis instance with built-in health checks and fast automatic failover.

Today, we'll show you how to deploy multiple Standard Tier Cloud Memorystore instances which can be used together to scale beyond the limits of a single instance for an application with increased scale demands. Each individual Memorystore instance will be deployed as a standalone instance that is unaware of the other instances within its shared host project. In this example, you'll deploy three Standard Tier instances which will be treated as a single unified backend. By using Standard Tier instances instead of self-managed Redis instances on GCE, you get the benefit of:

Highly available backends: Standard Tier provides high availability without requiring any additional work from you.
Enabling high availability on self-managed Redis instances on GCE can add additional complexities and failure points.

Integrated monitoring: Memorystore is integrated with Cloud Monitoring, and you can easily monitor the individual shards using Cloud Monitoring, compared to having to deploy and manage monitoring agents on self-managed instances.

Memtier Benchmark

Memtier Benchmark is a commonly used command line utility for load generation and benchmarking of key-value databases. You will deploy and use this utility to demonstrate the ability to easily scale to high query volume. Similar benchmarking tools or your own Redis client application could be used instead of Memtier Benchmark.

Envoy

Envoy is an open-source network proxy designed for service oriented architectures. Envoy supports many different filters which allow it to support network traffic from many different software applications and protocols. For this use case, you will deploy Envoy with the Redis filter configured. Rather than connecting directly to Memorystore instances, the Redis clients will connect to the Envoy proxy. By appropriately configuring Envoy, you can take a collection of independent Memorystore instances and define them as a cluster where inbound traffic will be load balanced among the individual instances. By leveraging Envoy, you decrease the likelihood of needing a significant application rewrite to leverage more than one Memorystore instance for higher scale. To ensure compatibility with your application, you'll want to review the list of the Redis commands which Envoy currently supports.

Let's get started.

Prerequisites

To follow along with this walkthrough, you'll need a GCP project with permissions to do the following:

Deploy Cloud Memorystore for Redis instances (permissions)
Deploy GCE instances with SSH access (permissions)
Cloud Monitoring viewer access (permissions)
Access to Cloud Shell or another gcloud-authenticated environment

Deploying the Memorystore Backend

You'll start by deploying a backend cache which will serve all of your application traffic. As you're looking to scale beyond the limits of a single node, you'll deploy a series of Standard Tier instances. From an authenticated Cloud Shell environment, this can be done as follows:

$ for i in {1..3}; do gcloud redis instances create memorystore${i} --size=1 --region=us-central1 --tier=STANDARD --async; done

If you do not already have the Memorystore for Redis API enabled in your project, the command will ask you to enable the API before proceeding. While your Memorystore instances deploy, which typically takes a few minutes, you can move on to the next steps.

Creating a Client and Proxy VM

Next, you need a VM where you can deploy a Redis client and the Envoy proxy. You'll be creating a single GCE instance where you deploy these two applications as containers. This type of deployment is referred to as a "sidecar architecture," which is a common Envoy deployment model. Deploying in this fashion nearly eliminates any added network latency as there is no additional physical network hop that takes place. While you are deploying a single vertically scaled client instance, in practice you'll likely deploy many clients and proxies, so the steps outlined in the following sections could be used to create a reusable instance template or repurposed for GKE.
You can start by creating the base VM:

$ gcloud compute instances create envoy-memtier-client --zone=us-central1-a --machine-type=e2-highcpu-32 --image-family cos-stable --image-project cos-cloud

We've opted for a Container-Optimized OS instance as you'll be deploying Envoy and Memtier Benchmark as containers on this instance.

Configure and Deploy the Envoy Proxy

Before deploying the proxy, you need to gather the necessary information to properly configure the Memorystore endpoints. To do this, you need the host IP addresses for the Memorystore instances you have already created. You can gather these programmatically:

$ for i in {1..3}; do gcloud redis instances describe memorystore${i} --region us-central1 --format=json | jq -r ".host"; done

Copy these IP addresses somewhere easily accessible, as you'll use them shortly in your Envoy configuration. Next, you'll need to connect to your newly created VM instance so that you can deploy the Envoy proxy. You can do this easily via SSH in the Google Cloud Console. More details can be found here.

After you have successfully connected to the instance, you'll create the Envoy configuration. Start by creating a new file named envoy.yaml on the instance with your text editor of choice, and populate it with a Redis proxy configuration that lists your three Memorystore instances as endpoints, entering the three IP addresses of the instances you created (a sketch of such a configuration appears below, just before the benchmark is run). The IP addresses need to be inserted into the endpoint configuration for each instance near the bottom of the file. If you chose to create a different number of Memorystore instances, simply add or remove endpoints from the configuration file.

Before you move on, take a look at a few important details of the configuration:

We've configured the Redis proxy filter to support the Redis traffic which you'll be forwarding to Cloud Memorystore.
We've configured the Envoy proxy to listen for client Redis traffic on port 6379.
We've chosen MAGLEV as the load balancing policy for the Memorystore instances which make up the client-side sharded cluster. You can learn more about the various types of load balancing available here. Scaling up and down the number of Memorystore backends requires rebalancing data and configuration changes which are not covered in this tutorial.

Once you've added your Memorystore instance IP addresses, save the file locally to your container OS VM where it can be easily referenced. Now, you'll use Docker to pull the official Envoy proxy image and run it with your own configuration.

$ docker run --rm -d -p 8001:8001 -p 6379:6379 -v $(pwd)/envoy.yaml:/envoy.yaml envoyproxy/envoy:v1.21.0 -c /envoy.yaml

Now that Envoy is deployed, you can test it by visiting the admin interface from the container VM:

$ curl -v localhost:8001/stats

If successful, you should see a printout of the various Envoy admin stats in your terminal. Without any traffic yet, these will not be particularly useful, but they allow you to ensure that your container is running and available on the network. If this command does not succeed, we recommend checking that the Envoy container is running. Common issues include syntax errors within your envoy.yaml and can be found by running your Envoy container interactively and reading the terminal output.

Deploy and Run Memtier Benchmark

While you're still SSH'ed into the container OS VM, you will also deploy the Memtier Benchmark utility which you'll use to generate artificial Redis traffic. Since you are using Memtier Benchmark, you do not need to provide your own dataset. The utility will populate the cache for you using a series of set commands.
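Before generating load, it can help to see the shape of the configuration described in the proxy step above. The following is a minimal sketch rather than the exact file from the original walkthrough: it assumes the Envoy v3 API with the Redis proxy network filter, the MAGLEV policy, and the admin interface on port 8001, and the 10.0.0.x addresses are placeholders for your actual Memorystore host IPs:

# envoy.yaml (minimal sketch; replace placeholder IPs with your instance host IPs)
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 8001 }
static_resources:
  listeners:
  - name: redis_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 6379 }   # clients connect here
    filter_chains:
    - filters:
      - name: envoy.filters.network.redis_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
          stat_prefix: memorystore
          settings:
            op_timeout: 5s
          prefix_routes:
            catch_all_route:
              cluster: memorystore_cluster
  clusters:
  - name: memorystore_cluster
    type: STATIC
    connect_timeout: 3s
    lb_policy: MAGLEV                                           # consistent hashing across shards
    load_assignment:
      cluster_name: memorystore_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 10.0.0.1, port_value: 6379 }   # memorystore1 host IP
        - endpoint:
            address:
              socket_address: { address: 10.0.0.2, port_value: 6379 }   # memorystore2 host IP
        - endpoint:
            address:
              socket_address: { address: 10.0.0.3, port_value: 6379 }   # memorystore3 host IP

As an optional sanity check before benchmarking, you can send a single command through the proxy, for example with redis-cli from the official Redis image: $ docker run --rm --network=host redis:6.2 redis-cli -h 127.0.0.1 -p 6379 PING. A PONG response means Envoy is accepting Redis traffic and forwarding it to one of the Memorystore shards.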
You can run a series of benchmark tests:

$ for i in {1..15}; do docker run --network="host" --rm -d redislabs/memtier_benchmark:1.3.0 -s 127.0.0.1 -p 6379 --test-time=300 --key-maximum=10000; done

Here are some configuration options of note:

If you have configured Envoy to listen on another port, specify the appropriate port after the -p flag.
We have chosen to run the benchmark for a set period of time (5 minutes, specified in seconds) by using the --test-time flag rather than a set number of requests, which is the default behavior.
By default, the utility uses a uniform random pattern for getting and setting keys. You will not modify this, but it can be specified using the --key-pattern flag.
The utility works by performing gets and sets based on the minimum and maximum values of the key range as well as the specified key pattern which we just discussed. We will decrease the size of this key range by setting the --key-maximum parameter. This allows us to ensure a higher cache hit ratio, which is more representative of most real world applications.
The --ratio flag allows us to modify the set-to-get ratio of commands issued by the utility. By default, the utility issues 10 get commands for every set command. You can easily modify this ratio to better match your workload's characteristics.
You can increase the load generated by the utility by increasing the number of threads with the --threads flag and/or by increasing the number of clients per thread with the --clients flag. The above command uses the default number of threads (4) and clients (50).

Observe the Redis Traffic

Once you have kicked off the load tests, you can confirm that traffic is being balanced across the individual Memorystore instances via Cloud Monitoring. You can easily set up a custom dashboard that shows the calls per minute for each of the Memorystore instances. Let's start by navigating to the Cloud Monitoring Dashboards page. Next, you'll click "Create Dashboard". You will see many different types of widgets on the left side of the page which can be dragged onto the canvas on the right side of the page. You'll select a "Line" chart and drag it onto the canvas. You then need to populate the line chart with data from the Memorystore instances. To do this, you'll configure the chart via "MQL", which can be selected at the top of the chart configuration pane. For ease, we've created a query which you can simply paste into your console to populate your chart (a sketch of such a query appears at the end of this section). If you have created your Memorystore instances with a different naming convention or have other Memorystore instances within the same project, you may need to modify the resource.instance_id filter. Once you're finished, ensure that your chart is viewing the appropriate time range, and you should see nearly perfect distribution of the client workload across the Memorystore instances, effectively allowing infinite horizontal scalability for demanding workloads. More details on creating and managing custom dashboards can be found here.

As you modify the parameters of your own testing, you'll also want to keep the performance of the client and proxy in mind. As you vertically scale the number of operations sent by a client, you'll eventually need to horizontally scale the number of clients and sidecar proxies which you have deployed to scale smoothly. You can view the Cloud Monitoring graphs for GCE instances as well. More details can be found here.
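For reference, an MQL query along the lines described in the dashboard step above might look like the following. This is a sketch rather than the exact query from the original post; it assumes Memorystore's standard redis.googleapis.com/commands/calls metric and the memorystore1-3 naming used earlier in this walkthrough:

fetch redis_instance
| metric 'redis.googleapis.com/commands/calls'
| filter resource.instance_id =~ 'memorystore.*'
| align rate(1m)
| every 1m
| group_by [resource.instance_id], [calls_per_second: sum(val())]

Each Memorystore instance appears as its own line on the chart; if the three series track each other closely, the MAGLEV policy is spreading the keyspace roughly evenly across the shards.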
Clean Up

If you have followed along, you'll want to spend a few minutes cleaning up resources to avoid accruing unwanted charges. You'll need to delete any deployed Memorystore instances and any deployed GCE instances.

Memorystore instances can be deleted with:

$ gcloud redis instances delete <instance-name> --region=<region>

If you followed the tutorial, you can use a command like:

$ for i in {1..3}; do gcloud redis instances delete memorystore${i} --region=us-central1 --async; done

Note: You'll need to manually acknowledge the deletion of each instance via the terminal.

The GCE Container-Optimized OS instance can be deleted with:

$ gcloud compute instances delete <instance-name>

If you created additional instances, you can simply chain their names in a single command, separated by spaces.

Conclusion

Client-side sharding is one strategy to address high-scale use cases with Cloud Memorystore, and Envoy and its Redis filter make the implementation simple and extensible. The outline provided above is a great place to get started. These instructions can easily be extended to support other client deployment models, including GKE, and can be scaled out horizontally to reach even higher scale. As always, you can learn more about Cloud Memorystore through our documentation or request desired features via our public issue tracker.

Related Article: Get 6X read performance with Memorystore for Redis Read Replicas. Memorystore for Redis supports Read Replicas preview, allowing you to scale up to five replicas and achieve over one million read request…
Source: Google Cloud Platform

Cloud Spanner myths busted

Intro to Cloud Spanner

Cloud Spanner is an enterprise-grade, globally distributed, externally consistent database that offers virtually unlimited scale and industry-leading 99.999% availability. It requires no maintenance windows and offers a familiar PostgreSQL interface. It combines the benefits of relational databases with the scalability and availability of non-relational databases. As organizations modernize and simplify their tech stack, Spanner provides a unique opportunity to transform the way they think about and use databases as part of building new applications and customer experiences.

But choosing a database for your workload can be challenging; there are many options in the market, and each one has a different onboarding and operating experience. At Google Cloud we know it's hard to navigate this choice, and we are here to help. In this blog post, I want to bust the seven most common misconceptions that I regularly hear about Spanner so that you can confidently make your decision.

Myth #1: Only use Spanner if you have a massive workload

The truth is that Spanner powers Google's most popular, globally available products, like YouTube, Drive, and Gmail, and has enabled many large-scale transformations, including those of Uber, Niantic, and ShareChat. It is also true that Spanner processes more than one billion queries per second at peak.

At the same time, many customers also use Spanner for their smaller workloads (both in terms of transactions per second and storage size) for availability and scalability reasons. For example, Google Password Manager has small workloads that run on Spanner. These customers cannot tolerate downtime, require high availability to power their applications, and seek scale insurance for future growth.

Limitless scalability with the highest availability is critical in many industry verticals such as gaming and retail, especially when a newly launched game goes viral and becomes an overnight success, or when a retailer has to handle a sudden surge in traffic during a Black Friday/Cyber Monday sale. Regardless of workload size, every customer on the journey to the cloud wants the benefits of scalability and availability while reducing the operational burden and the costs associated with patching, upgrades, and other maintenance.

Myth #2: Spanner is too expensive

The truth is that when looking at the cost of a database, it is better to consider the Total Cost of Ownership (TCO) and the value it offers rather than the raw list price. Spanner delivers significant value to our customers, including critical things like availability, price-performance, and reduced operational costs.

Availability: Spanner provides high availability and reliability by synchronously replicating data. When it comes to disaster recovery, Spanner offers zero RPO and zero RTO for zonal failures in the case of regional instances, and for regional failures in the case of multi-regional instances. Less downtime, more revenue!

Price-performance: Spanner offers one of the industry's leading price-performance ratios, which makes it a great choice if you are running a demanding, performance-sensitive application. Great customer experiences require consistent, optimal latencies!

Reduced operational cost: With Spanner, customers enjoy zero-downtime upgrades and schema changes, and no maintenance windows. Sharding is handled automatically, so the challenges associated with scaling up traditional databases don't exist.
Spend more time innovating, and less time administering!

Security & Compliance: By default, Spanner offers encryption for data in transit via its client libraries and for data at rest using Google-managed encryption keys. CMEK support for Spanner lets you retain complete control of the encryption keys. Spanner also provides VPC Service Controls support and has the compliance certifications and approvals needed for workloads requiring ISO 27001, 27017, 27018, PCI DSS, SOC 1|2|3, HIPAA, and FedRAMP.

With Spanner, you have peace of mind knowing that your data's security, availability, and reliability won't be compromised. And best of all, with the introduction of granular instance sizing, you can now get started for as little as $65/month and unlock the value Spanner offers.

Pro tip: Use the autoscaler to right-size your Spanner instances, and take advantage of TTL to reduce the amount of data stored.

Myth #3: You have to make a trade-off between scale, consistency, and latency

The truth is that, depending on the use case and instance configuration, users can run Spanner without having to pick between consistency, latency, and scale.

To provide strong data consistency, Spanner uses a synchronous, Paxos-based replication scheme in which replicas acknowledge every write request. A write is committed when a majority of the replicas (e.g., 2 out of 3), called a quorum, agree to commit the write. In the case of regional instances, the replicas are within the region, so writes are faster than in multi-region instances, where the replicas are distributed across multiple regions. In the latter case, forming a quorum on writes can result in slightly higher latency. Nevertheless, Spanner multi-regions are carefully designed in geographical configurations that ensure the replicas can communicate fast enough that write latencies remain acceptably low.

A read can be served strong (the default) or stale. A strong read is a read at the current timestamp and is guaranteed to see all the data that has been committed up until the start of the read. In some cases, this means the serving replica has to contact the leader to ensure that it has the latest data; in a multi-region instance where the read is served from a non-leader replica, this can make read latency slightly higher than if it were served from the leader region. A stale read is a read executed at a timestamp in the past; it is performed over data that was committed at that timestamp and can therefore be served at very low latency by the closest replica that is caught up to that timestamp. If your application is latency sensitive, stale reads may be a good option, and we recommend using a staleness value of 15 seconds.

Myth #4: Spanner does not have a familiar interface

The truth is that Spanner offers the flexibility to interact with the database via a SQL dialect based on the ANSI 2011 standard, as well as via REST or gRPC APIs, which are optimized for performance and ease of use. In addition to Spanner's native interface, we recently introduced a PostgreSQL interface for Spanner that leverages the ubiquity of PostgreSQL to meet development teams where they are, with an interface they are already familiar with.
The PostgreSQL interface provides a rich subset of the open-source PostgreSQL SQL dialect, including common query syntax, functions, and operators. It supports a core collection of open-source PostgreSQL data types, DDL syntax, and information schema views. You get PostgreSQL familiarity and relational semantics at Spanner scale. Learn more about our PostgreSQL interface here.

Myth #5: The only way to get observability data is via the Spanner console

The truth is that the Spanner client libraries support OpenCensus tracing and metrics, which give insight into client internals and aid in debugging production issues. For instance, client-side traces and metrics include session- and transaction-related information. Spanner also supports an OpenTelemetry receiver, which provides an easy way to process and visualize metrics from Cloud Spanner system tables and export them to the application performance monitoring (APM) tool of your choice. This could be an open-source combination of a time-series database like Prometheus coupled with a Grafana dashboard, or a commercial offering like Splunk, Datadog, Dynatrace, New Relic, or AppDynamics. We've also published reference Grafana dashboards so that you can debug the most common user journeys, such as "Why is my tail latency high?" or "Why do I see a CPU spike when my workload did not change?" Here is a sample Docker service that shows how the Cloud Spanner receiver can work with the Prometheus exporter and Grafana dashboards.

We are continuing to embrace open standards and to integrate with our partner ecosystem, and we continue to evolve the observability experience offered by the Google Cloud console so that our customers get the best experience wherever they are.

Myth #6: Spanner is only for global workloads requiring copies in multiple regions

The truth is that, while Spanner offers a range of multi-region instance configurations, it also offers a regional configuration in each GCP region. Each regional node is replicated in three zones within the region, while a multi-regional node is replicated at least five times across multiple regions. A regional configuration offers four nines (99.99%) of availability and protection against zonal failures. Typically, multi-regional instance configurations are indicated if your application runs workloads in multiple geographical locations or your business needs 99.999% availability and protection against regional failures. Learn more here.

Myth #7: Spanner schema changes require expensive locks

The truth is that Spanner never takes table-level locks. Spanner uses a multi-version concurrency control architecture to manage concurrent versions of schema and data, allowing ad hoc, online schema changes that do not require any downtime, additional tools, migration pipelines, or complex rollback/backup plans.
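As a minimal sketch of what this looks like in practice (the instance, database, table, and column names here are illustrative assumptions, not taken from the post), an online schema change can be issued with a single gcloud command while the database continues serving traffic:

$ gcloud spanner databases ddl update example-db \
    --instance=example-instance \
    --ddl='ALTER TABLE Singers ADD COLUMN BirthDate DATE'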
When issuing a schema update, you can continue writing to and reading from the database without interruption while Spanner backfills the update, whether you have 10 rows or 10 billion rows in your table. The same mechanism powers point-in-time recovery (PITR) and snapshot queries: using stale reads, you can query or restore both the schema and the state of the data as of a given timestamp and query condition, up to a maximum of seven days in the past.

Now that we've learned the truth about Cloud Spanner, I invite you to get started by visiting our website.

Related Article: Improved troubleshooting with Cloud Spanner introspection capabilities. Cloud-native database Spanner has new introspection capabilities to monitor database performance and optimize application efficiency.
Source: Google Cloud Platform

Announcing Google Cloud 2022 Summits [frequently updated]

Register for our 2022 Google Cloud Summit series, and be among the first to learn about new solutions across data, machine learning, collaboration, security, sustainability, and more. You'll hear from experts, explore customer perspectives, engage with interactive demos, and gain valuable insights to help you accelerate your business transformation. Bookmark the Google Cloud Summit series website to easily find updates as news develops. Can't join us for a live broadcast? You can still register to enjoy all summit content, which becomes available for on-demand viewing immediately following each event.

Upcoming events

Data Cloud Summit | April 6, 2022

Mark your calendars for the Google Data Cloud Summit on April 6, 2022. Join us to explore the latest innovations in AI, machine learning, analytics, databases, and more. Learn how organizations are using a simple, unified, open approach with Google Cloud to make smarter decisions and solve their most complex business challenges. At the event, you will gain insights that can help move you and your organization forward. From our opening keynote to customer spotlights to sessions, you'll have the chance to uncover up-to-the-minute insights on how to make the most of your data. Equip yourself with the technology, the confidence, and the experience to capitalize on the next wave of data solutions. Register today for the 2022 Google Data Cloud Summit.
Source: Google Cloud Platform

Strengthen protection for your GCE VMs with new FIDO security key support

With the release of OpenSSH 8.2 almost two years ago, native support for FIDO authentication became an option in SSH. This meant that you could have your SSH private key protected in a purpose-built security key, rather than storing the key locally on a disk where it may be more susceptible to compromise. Building on this capability, today we are excited to announce in public preview that physical security keys can be used to authenticate to Google Compute Engine (GCE) virtual machine (VM) instances that use our OS Login service for SSH management.

These advances in OpenSSH made it easier to protect access to sensitive VMs by setting up FIDO authentication to these hosts and physically protecting the keys used to grant access. And while we've seen adoption of this technology, we also know that managing these keys can be challenging, particularly the manual process of generating and storing FIDO keys. Additionally, physical security key lifecycle issues could leave you without access to your SSH host: if you lose or misplace your security key, you could be locked out.

At Google Cloud we've been working hard on integrating our industry-first account-level support for FIDO security keys with SSH in a way that makes it simple to get all the benefits of using FIDO security keys for SSH login, without any of the drawbacks. Now, when you enable security key support through OS Login for your GCE VMs, one of your security keys will be required to complete the login process, and any of the security keys configured on your Google account will be accepted during login. If you ever lose a security key, you can simply update your security key configuration (i.e., delete the lost key and add a new one), and your VMs will automatically accept the new configuration on the next login.

If desired, OS Login's FIDO security key support can be combined with 2-Step Verification to add an extra layer of security with two-factor authentication (2FA). When this is enabled, a user is required to both have their security key available and prove authorized access to their Google Account through additional factors at the time they log in to their GCE instance.

If you'd like to learn more or try this capability out on your own instances, visit our documentation to get started; a short illustrative sketch of the commands involved follows below.
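As a brief illustrative sketch of what enabling this can look like (the metadata keys follow the OS Login documentation, while the VM name and zone are assumptions for the example), you can turn on OS Login with security key support at the project level and then connect with gcloud:

$ gcloud compute project-info add-metadata \
    --metadata=enable-oslogin=TRUE,enable-oslogin-sk=TRUE

$ gcloud beta compute ssh my-admin-vm --zone=us-central1-a

When you connect, you should be prompted to tap your security key to complete the login.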
Source: Google Cloud Platform