Join us in evolving the usability of GitOps

Kubernetes configuration automation remains challenging

Companies of all sizes are leveraging Kubernetes to modernize how they build, deploy, and operate applications on their infrastructure. As these companies expand the number of development and production clusters they use, creating and enforcing consistent configurations and security policies across a growing environment becomes difficult. To address this challenge, it is increasingly common for platform teams to use the GitOps methodology to deploy configuration and policies consistently across clusters and environments with a version-controlled deployment process. Using the same principles as Kubernetes itself, GitOps reconciles the desired state of clusters with a set of declarative Kubernetes configuration files in a versioned storage system, typically git. However, implementing the git workflow is often left as an exercise for the user: repo, branch, and directory organization, versioning and tagging, change proposal and approval authorization, pre-merge validation checks, and so on. It can be difficult to set up appropriately, especially when managing changes across the tens, hundreds, or even thousands of applications deployed at large enterprises.

Moreover, configuration is typically represented using code and code-like formats, such as templates, domain-specific languages, and general-purpose programming languages, which effectively require manual authoring and editing. Here is a very simple template for generating Kubernetes RoleBindings:

```yaml
{{- range .roleBindings }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ .name }}
  namespace: {{ .namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: {{ .roleKind }}
  name: {{ .role }}
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: {{ .namespace }}.admin@bigco.com
{{- end }}
```

Cross-functional collaboration across platform and application teams can become a bottleneck, especially as the needs of individual teams differ from one another, requiring frequent template changes that potentially affect all uses of the templates. For example, the template above does not support binding to ServiceAccounts; adding that option could potentially affect all uses of the template.

Since such configuration tools assume they exclusively generate and set the desired state, they are not interoperable with easier-to-use client surfaces, such as Graphical User Interfaces (GUIs) and Command-Line Interfaces (CLIs). Some of these client surfaces support transitioning to configuration tools by providing the ability to download or output the YAML representations of resources. Once that transition is made, however, it's a one-way door, and future edits must be made manually, to a different format, through a different process. We've heard from users that changes that take only seconds to make in a GUI can take days to make through configuration tools. Wouldn't it be great if you didn't have to choose between "the easy way" and "the right way"? To really make GitOps usable, we need to address the inherent dichotomy between preferred client surfaces and configuration tools.

Making configuration authoring and editing a first-class citizen

We previously open sourced kpt, a package-centric toolchain for helping platform teams manage their infrastructure.
To address the usability challenges outlined previously, we are extending that toolchain with Porch, the package orchestrator, which enhances the toolchain by enabling a What You See Is What You Get (WYSIWYG) configuration authoring, automation, and delivery experience. This experience simplifies managing Kubernetes platforms and KRM-driven infrastructure at scale by manipulating declarative Configuration as Data, separated from the code that transforms it. Whereas GitOps automates on-the-fly configuration generation from existing configuration packages and repositories and deploys the output of that process to Kubernetes, the package orchestrator automates configuration package creation, editing, transformation, upgrades, and other package lifecycle operations, creating and managing the content to be deployed via GitOps.

We created an open-source plugin for the Backstage platform portal framework that provides a WYSIWYG GUI experience. It builds on the package orchestrator to allow platform and application teams to easily author and edit configuration, while enforcing guardrails. You don't need to write YAML, patches, or templates, or even branch, commit, tag, push, and merge changes.

This approach is unique in that it avoids many of the pitfalls currently faced in the ecosystem when building a GUI on top of GitOps. In particular, prevailing approaches require creating abstractions, often thin ones, that need to be custom-built on top of the Kubernetes resource model. This creates a situation where platform teams need to do a lot of additional work to create a management experience on top of Kubernetes, and lose out on the value of the ecosystem of tooling and educational content built around the standard Kubernetes (and extensions') resource types. By leveraging Configuration as Data and package orchestration, we enable a GUI that complements the existing ecosystem rather than requiring thin abstractions that just get in the way. The GUI modifies configuration data very similarly to GUIs that directly operate on the live state in Kubernetes: the resource schemas are identical, since Kubernetes is natively declarative. Since it is early, the GUI supports a limited use case: provisioning and managing namespaces and their adjacent Kubernetes policy resources. Over time we plan to build in support for other critical use cases faced by cluster administrators today, which is mostly a matter of implementing form editors for additional resource types and transformer functions for additional customization scenarios.

As shown in our tutorial, blueprints can be created through a simple form-based UI, again without templates. Just draft examples of the resources to deploy, similar to kustomize bases. Resources can be added, edited, or deleted without writing YAML. Like kustomize, kpt uses KRM functions to transform the resources in order to create variants: you select functions from the catalog and choose their inputs, and you have a recipe for creating as many similar instances as are needed. Functions can also be used to validate blueprints and their derived instances, similar to Kubernetes admission control. There's no need to build a whole new Operator or monolithic configuration generator just to automate provisioning groups of resources.
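To make the KRM function model above concrete, here is a minimal sketch of a transformer function. This is an illustration only, not code shipped with kpt: a KRM function is simply a program that reads a ResourceList from stdin and writes the transformed ResourceList to stdout. The choice of Python with PyYAML and the ConfigMap key name ("namespace") are assumptions for the example.

```python
# Illustrative sketch of a KRM transformer function (assumes PyYAML is installed).
# The orchestrator invokes the function with a ResourceList on stdin and reads
# the transformed ResourceList back from stdout.
import sys
import yaml


def main():
    resource_list = yaml.safe_load(sys.stdin)

    # functionConfig carries the inputs selected for this function,
    # here assumed to be a ConfigMap with a "namespace" key.
    fn_config = resource_list.get("functionConfig") or {}
    target_ns = (fn_config.get("data") or {}).get("namespace")

    for resource in resource_list.get("items", []):
        # Stamp the target namespace onto every namespaced resource in the package.
        if target_ns and resource.get("kind") != "Namespace":
            resource.setdefault("metadata", {})["namespace"] = target_ns

    yaml.safe_dump(resource_list, sys.stdout, default_flow_style=False)


if __name__ == "__main__":
    main()
```

Validator functions follow the same contract, reporting findings rather than mutating resources.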
Composable functions enable a low-code experience for platform builders and a no-code experience for platform users. To see this in action, check out our demo video.

A GUI isn't the only capability enabled by making the configuration in storage mutable. Nephio, the Cloud Native Network Automation project, is building on kpt, Porch, and Config Sync to fully automate configuration of interconnected network functions and the underlying infrastructure that supports those functions. Configuration as Data provides the foundational API for configuration data, enabling mutation by Nephio automation controllers.

Configuration as Data is a novel approach that doesn't sacrifice usability or the potential for higher-level automation in order to enable reproducibility. Instead, it supports an interoperable, WYSIWYG, automatable configuration authoring and editing experience. We are looking to demonstrate this innovative approach and engage with the community on advancing it further.

Come innovate with us

We are looking to engage with the community to advance this technology. In particular, we are deeply interested in collaborating with developers working on GitOps technologies or looking to build around the existing GitOps technologies. We are including our own GitOps reference implementation, Config Sync, as part of kpt, but our interface to GitOps is intended to be extensible. Please check out our contact page to connect with us, or jump straight to contributing. We'd love to hear from you and collaborate with you so that we can make GitOps usable by everyone.
Source: Google Cloud Platform

Standardization, security, and governance across environments with Anthos Multi-Cloud

Kubernetes is being used for an ever-growing percentage of the production applications that power the world. Day 2 operations are now in focus as organizations scale from just a few clusters and applications to many clusters across multiple environments: in one cloud, in multiple clouds, and even on premises. How do you establish "sameness" across all of your clusters, regardless of where they are?

Standardization, security, and governance

Container platform teams are tasked with keeping groups of clusters up to date and aligned with their organization's standards and security policies. They need to automate as much of this work as possible, since managing one cluster is very different from managing tens or hundreds across geographies. Automation, and keeping things as similar as possible ("sameness," a concept Google uses internally for Kubernetes management), is critical. Anthos has a number of benefits operators can take advantage of when it comes to establishing "sameness" with regard to standardization, security, and governance across Kubernetes clusters. As a first step in evaluating Anthos, it is best to define the environment you will be operating in:

- Do you want to utilize existing Kubernetes clusters deployed with first-party Kubernetes services such as Google Cloud's GKE, Amazon's EKS, or Azure's AKS?
- Are you looking to standardize on GKE across clouds for runtime consistency?

This decision will define which multi-cloud product, Anthos Clusters (GKE on AWS/Azure/GCP) or Attached Clusters (any CNCF-conformant Kubernetes), is best suited to your use case when it comes to applying standardization, security, and governance across your Kubernetes estate.

Standardization, security, and governance across environments

Anthos Configuration Management (ACM) Config Sync, Policy Controller, and Service Mesh can be extended to popular Kubernetes distributions such as EKS and AKS, in addition to GKE. In a multi-tenant environment you can manage the baseline configurations required across all clusters, such as telemetry, infosec tooling, and networking controls, centrally in your ACM git repo, while allowing your teams access to namespaces for application deployment and configuration. This architecture provides a safe landing zone for applications while providing automation tooling for day 2 operations. Application teams are free to use their application deployment tool of choice within a defined namespace, while the operations group manages each cluster from a centralized git repo. ACM does allow fine-grained configuration syncing per cluster based on labeling schemas, which may be required if operating across environments or geographies where different tooling or policy is required.

Example multi-cluster/multi-environment strategy for establishing standardization, security, and governance

Use case spotlight: Hosted SaaS deployments

Deploying microservice-based software across many public cloud accounts is made possible with the Anthos Multi-Cloud API, which allows standardization of your Kubernetes runtime and of lifecycle management activities for the cluster and associated infrastructure across environments, with centralized remote management, telemetry, and logging. Maintaining a common runtime, security posture, toolset, and observability plane across customer deployments is critical to scaling and supporting a distributed user base.
These capabilities of the Anthos Multi-Cloud product have been embraced by software vendors that need to be able to provide infrastructure and application-level support into their customers' cloud environments. In the example diagram below, Anthos maintains the state of each cluster in each end-user account and associated GCP project. Clusters are connected to a unified CD pipeline via Config Sync. Telemetry across the cluster projects is consolidated to a custom dashboard in Google Cloud operations for a consolidated view of the entire estate.

Multi-cluster/multi-account strategy for establishing standardization, security, and governance over remote applications

"As an integration platform that runs on multi-clouds, we chose Anthos for multi-cloud deployments to standardize our operations across multiple clouds while relying on GKE's valuable security and governance features which already serve us far and wide. With Anthos, we have normalized our operations and fully unified our infrastructure support." – Diego Maia, Head of SRE, Digibee

New features with Anthos 1.11 for Multi-Cloud

The following multi-cloud features are part of Anthos 1.11:

- Anthos Service Mesh Topology Diagrams for GKE on AWS
- Support for Windows Worker Nodes
- Support for Dedicated Hosts/Instances for GKE on AWS
- Application Logging for Linux and Windows workloads

Related article: Best practices for upgrading your Anthos clusters on bare metal.
Source: Google Cloud Platform

Google Cloud at KubeCon EU: New projects, updated services, and how to connect

It's that time of the year again, when we get excited about all things cloud-native and gear up to connect, share, and learn from fellow developers and technologists at KubeCon EU 2022. Here is a quick round-up of the latest news from the Google open source and Kubernetes teams, and how to connect with us this week at the event.

Google's continued commitment to the open-source community

For over 20 years, Google has helped define the state of computing with its commitment to open source. Google originated Kubernetes and has supported the evolution of the project since contributing it to the Cloud Native Computing Foundation (CNCF) in 2015. Kubernetes became central to cloud-native computing because it was open sourced and placed under the governance of a neutral body. Since then, we've continued to invest deeply in cloud-native open source technologies on behalf of our customers. Most recently we completed the transition of Knative to the CNCF and announced our intent to contribute Istio to the organization, which, alongside Kubernetes and Knative, is a critical part of cloud-native infrastructure. We continue to support the evolution of these projects and will be hosting KnativeCon at KubeCon EU, where you can learn more about the project and how you can join the community to help it grow further.

Building new capabilities for critical workloads in the cloud

Kubernetes has been a transformative technology, bringing cloud-native best practices and design patterns to a number of industries. Yet AI/ML, batch, and HPC workloads have lagged behind their traditional enterprise counterparts, primarily due to complex scheduling and resource allocation needs that make these scientific workloads difficult to deploy and scale. Google, along with a number of community members, is working to make Kubernetes a first-class platform for these workloads through improvements to the batch API, improved scheduling performance, and leading the development of Kueue, a Kubernetes-native work queue. Combined with Google Cloud's leading hardware and autoscaling capabilities, these upstream efforts make Google Kubernetes Engine (GKE) an ideal platform for AI/ML and batch computing. To learn more about how Google is helping to add these critical capabilities to the project, join us at Batch and HPC Day and Kubernetes AI Day onsite during KubeCon.

Driving Kubernetes ease of use for customers through new open source projects

We are embedded in open source communities, and believe in the power of the community to drive innovation and make it easier for everyone to build in the cloud. This week we reached an important milestone with a new open source offering from Google: Config Connector and Config Sync are now available as open source (Config Connector, Config Sync), joining Gatekeeper. Now, the entirety of Anthos Config Management is based on open source. We've also added Config Sync and the new package orchestrator to the kpt project. Together, these projects provide an end-to-end portable solution that enables a "What You See Is What You Get" configuration authoring, automation, and delivery experience, simplifying the management of Kubernetes platforms and KRM-driven infrastructure at scale. We are seeking help from the community to innovate with us on this project, as we hope that it can help improve how others build platforms on top of Kubernetes. We are happy to accept contributions to kpt from the community and our customers.
You can check out more information here on how to get involved as this project grows.

Adding a high-usage tier to Managed Service for Prometheus

This March, we introduced Google Cloud Managed Service for Prometheus, and Kubernetes users are enthusiastic about the monitoring service's ease of use and scalability. To get a sense of why customers are using it, you can read about the experience of Maisons du Monde, a French furniture and home decor company that adopted Managed Service for Prometheus after first running the open source version. In fact, Managed Service for Prometheus' scalability is so strong that we've introduced a new high-usage tier designed for customers with extremely large volumes of metrics: more than 500 billion metric samples per month. This new pricing tier is 50% less than the previous highest-tier list price. We've also reduced the list price of lower-usage tiers by 25%. To get started with Managed Service for Prometheus, try out our new Managed Service for Prometheus qwiklab at no charge now through June 15, and join us on Tuesday at KubeCon for the presentation "Easy, scalable metrics for Kubernetes with Managed Service for Prometheus."

The most automated and scalable managed Kubernetes

Kubernetes is not just a technology: it's a model for creating value for your business, a way of developing apps and services, and a means to secure and develop cloud-native IT capabilities for innovation. Given our long history with Kubernetes, we are able to offer unparalleled managed services based on critical open source projects. Created by the same developers that built Kubernetes, Google Kubernetes Engine (GKE) leads the way in cloud-based Kubernetes services for running containerized applications. GKE makes it easy to realize the benefits of innovation initiatives without getting bogged down troubleshooting infrastructure issues and managing the day-to-day operations of enterprise-scale container deployment. With the fully managed Autopilot mode of operation combined with multi-dimensional autoscaling capabilities, GKE delivers the most dimensions of automation to efficiently and easily operate your applications. Only GKE can run 15,000-node clusters, outscaling other cloud providers by up to 10x and letting you run applications effectively and reliably at scale. At KubeCon you will have direct access to our Kubernetes experts, starting on May 17th at our co-located event, Build with the most automated and scalable Kubernetes, hosted by Google Cloud. Join us to learn what is new in the world of containers and Kubernetes at Google Cloud and get access to technical demos.

More ways to engage with Google expertise at KubeCon EU

Explore several interesting courses to help get you started with Kubernetes by visiting our virtual booth. This includes some top sessions produced in the Learn Kubernetes with Google video series and an opportunity to claim exclusive swag from Google Cloud to support your Kubernetes learning. You can also join over 25 sessions from Googlers onsite at the event. Kubernetes builds on more than 15 years of running Google's containerized workloads and the invaluable contributions from the open source community. Have a question? Curious about the latest things in Google Cloud, or want to talk to Kubernetes experts? Join us virtually on the CNCF Slack in the #6-kubecon-googlecloud channel! There will be a number of Google Cloud and cloud-native open source community members available to field your questions.
You can also request some time with our team on the ground. We are looking forward to connecting with developers and sharing expertise from some of our top Kubernetes experts this week at the event.
Source: Google Cloud Platform

New observability features for your Splunk Dataflow streaming pipelines

We're thrilled to announce several new observability features for the Pub/Sub to Splunk Dataflow template to help operators keep tabs on their streaming pipeline performance. Splunk Enterprise and Splunk Cloud customers use the Splunk Dataflow template to reliably export Google Cloud logs for in-depth analytics for security, IT, or business use cases. With newly added metrics and improved logging for the Splunk IO sink, it's now easier to answer operational questions such as:

- Is the Dataflow pipeline keeping up with the volume of logs generated?
- What is the latency and throughput (events per second, or EPS) when writing to Splunk?
- What is the response status breakdown of the downstream Splunk HTTP Event Collector (HEC), and what are the potential error messages?

This critical visibility helps you derive your log export service-level indicators (SLIs) and monitor for any pipeline performance regressions. You can also more easily root-cause potential downstream failures between Dataflow and Splunk, such as Splunk HEC network connection or server issues, and fix the problem before it cascades. To help you quickly chart these new metrics, we've included them in the custom dashboard that is part of the updated Terraform module for Splunk Dataflow. You can use those Terraform templates to deploy the entire infrastructure for log export to Splunk, or just the Monitoring dashboard alone.

Log Export Ops Dashboard for Splunk Dataflow

More metrics

In your Dataflow Console, you may have noticed several new custom metrics (highlighted below) for jobs launched from template version 2022-03-21-00_RC01 (that is, gs://dataflow-templates/2022-03-21-00_RC01/Cloud_PubSub_to_Splunk) or later.

Pipeline instrumentation

Before we dive into the new metrics, let's take a step back and go over the Splunk Dataflow job steps. The following flowchart represents the different stages that comprise a Splunk Dataflow job, along with the corresponding custom metrics. In this pipeline, we utilize two types of Apache Beam custom metrics (a sketch of both follows this list):

- Counter metrics, labeled 1 through 10 above, used to count messages and requests (both successful and failed).
- Distribution metrics, labeled A through C above, used to report on the distribution of request latency (both successful and failed) and batch size.
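Both metric types are standard Apache Beam primitives. The Splunk Dataflow template itself is written in Java, but as an illustrative sketch, recording such metrics inside a DoFn looks roughly like this with the equivalent Beam Python SDK API (send_to_splunk_hec below is a hypothetical stand-in for the template's HEC write, not part of the template):

```python
# Illustrative sketch only: the actual template is Java, and send_to_splunk_hec
# is a hypothetical placeholder for the HTTP Event Collector write step.
import time

import apache_beam as beam
from apache_beam.metrics import Metrics


def send_to_splunk_hec(batch):
    """Hypothetical stand-in for the HEC write; always 'succeeds' here."""
    return True


class WriteToSplunkFn(beam.DoFn):
    def __init__(self):
        # Counter metrics: count events and requests, successful or failed.
        self.successful_events = Metrics.counter(self.__class__, "outbound-successful-events")
        self.failed_events = Metrics.counter(self.__class__, "outbound-failed-events")
        # Distribution metrics: report latency and batch-size distributions.
        self.write_latency_ms = Metrics.distribution(
            self.__class__, "successful_write_to_splunk_latency_ms")
        self.batch_size = Metrics.distribution(self.__class__, "write_to_splunk_batch")

    def process(self, batch):
        start = time.time()
        ok = send_to_splunk_hec(batch)
        self.write_latency_ms.update(int((time.time() - start) * 1000))
        self.batch_size.update(len(batch))
        if ok:
            self.successful_events.inc(len(batch))
        else:
            self.failed_events.inc(len(batch))
        yield batch
```

Counters and distributions declared this way surface as Dataflow custom metrics, which is how the metrics described in this post appear in the Dataflow Console and Cloud Monitoring.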
Downstream request visibility

Splunk Dataflow operators have relied on some of these pre-built custom metrics to monitor log messages' progress through the different pipeline stages, particularly in the last stage, Write To Splunk, with the metrics outbound-successful-events (counter #6 above) and outbound-failed-events (counter #7 above) tracking the number of messages that were successfully exported (or not) to Splunk. While operators had visibility into the outbound message success rate, they lacked visibility at the HEC request level. Splunk Dataflow operators can now monitor not only the number of successful and failed HEC requests over time, but also the response status breakdown, to determine whether a request failed due to a client request issue (e.g. invalid Splunk index or HEC token) or a transient network or Splunk issue (e.g. server busy or down), all from the Dataflow Console, with the addition of counters #7-10 above, that is:

- http-valid-requests
- http-invalid-requests
- http-server-error-requests

Splunk Dataflow operators can also now track the average latency of downstream requests to Splunk HEC, as well as the average request batch size, by using the new distribution metrics #A-C, that is:

- successful_write_to_splunk_latency_ms
- unsuccessful_write_to_splunk_latency_ms
- write_to_splunk_batch

Note that a Distribution metric in Beam is reported by Dataflow as four sub-metrics suffixed with _MAX, _MIN, _MEAN, and _COUNT. That is why these 3 new distribution metrics translate to 12 new metrics in Cloud Monitoring, as you can see in the earlier job info screenshot from the Dataflow Console. Dataflow currently does not support creating a histogram to visualize the breakdown of these metrics' values, so the _MEAN metric is the only useful sub-metric for our purposes. As an all-time average value, _MEAN cannot be used to track changes over arbitrary time intervals (e.g. hourly), but it is useful for capturing a baseline, tracking a trend, or comparing different pipelines.

Dataflow custom metrics, including the aforementioned metrics reported by the Splunk Dataflow template, are a chargeable feature of Cloud Monitoring. For more information on metrics pricing, see Pricing for Cloud Monitoring.

Improved logging

Logging HEC errors

To further root-cause downstream issues, HEC request errors are now adequately logged, including both the response status code and message. You can retrieve them directly in Worker Logs from the Dataflow Console by setting log severity to Error. Alternatively, for those who prefer using Logs Explorer, you can use the following query:

```
log_id("dataflow.googleapis.com/worker")
resource.type="dataflow_step"
resource.labels.step_id="WriteToSplunk/Write Splunk events"
severity=ERROR
```

Disabling batch logs

By default, Splunk Dataflow workers log every HEC request. Even though these requests are often batched events, these 'batch logs' are chatty, adding two log messages for every HEC request. With the addition of the request-level counters (http-*-requests), latency and batch size distributions, and HEC error logging mentioned above, these batch logs are generally redundant. To control worker log volume, you can now disable these batch logs by setting the new optional template parameter enableBatchLogs to false when deploying the Splunk Dataflow job. For more details on the latest template parameters, refer to the template user documentation.

Enabling debug level logs

The default logging level for Google-provided templates written using the Apache Beam Java SDK is INFO, which means all messages of INFO and higher, i.e. WARN and ERROR, will be logged. If you'd like to enable lower log levels like DEBUG, you can do so by setting the --defaultWorkerLogLevel flag to DEBUG while starting the pipeline using the gcloud command-line tool. You can also override log levels for specific packages or classes with the --workerLogLevelOverrides flag. For example, the HttpEventPublisher class logs the final payload sent to Splunk at the DEBUG level. You can set the --workerLogLevelOverrides flag to {"com.google.cloud.teleport.splunk.HttpEventPublisher":"DEBUG"} to view the final message in the logs before it is sent to Splunk, and keep the log level at INFO for other classes.
Exercise caution while using this, as it will log all messages sent to Splunk under the Worker Logs tab in the console, which might lead to log throttling or reveal sensitive information.

Putting it all together

We put all this together in a single Monitoring dashboard that you can readily use to monitor your log export operations.

Pipeline Throughput, Latency & Errors

This dashboard is a single pane of glass for monitoring your Pub/Sub to Splunk Dataflow pipeline. Use it to ensure your log export is meeting your dynamic log volume requirements, by scaling to an adequate throughput (EPS) rate while keeping latency and backlog to a minimum. There's also a panel to track pipeline resource usage and utilization, to help you validate that the pipeline is running cost-efficiently during steady state.

Pipeline Utilization and Worker Logs

For specific guidance on handling and replaying failed messages, refer to Troubleshoot failed messages in the Splunk Dataflow reference guide. For general information on troubleshooting any Dataflow pipeline, check out the Troubleshooting and debugging documentation, and for a list of common errors and their resolutions, look through the Common error guidance documentation. If you encounter any issue, please open an issue in the Dataflow templates GitHub repository, or open a support case directly in your Google Cloud Console. For a step-by-step guide on how to export GCP logs to Splunk, check out the Deploy production-ready log exports to Splunk using Dataflow tutorial, or use the accompanying Terraform scripts to automate the setup of your log export infrastructure along with the associated operational dashboard.

Related article: What's new with Splunk Dataflow template: Automatic log parsing, UDF support, and more.
Source: Google Cloud Platform

Google’s open-source solution to DFDL Processing

The cloud has become the choice for extending and modernizing applications, but there are some situations where the transition is not straightforward, such as migrating applications that access data from a mainframe environment. Migrating the data and migrating the applications can get out of sync at certain points, so mechanisms need to be in place during the transition to support interoperability with legacy workloads and to access data held on the mainframe. For the latter, the Data Format Description Language (DFDL), an open standard modeling language from the Open Grid Forum (OGF), has been used to access data from a mainframe, e.g. IBM Integration Bus. DFDL uses a model or schema that allows text or binary data to be parsed from its native format and presented as an information set outside the mainframe (i.e., a logical representation of the data contents, independent of the physical format).

DFDL processing with IBM App Connect

If we talk about solutions for parsing and processing data described by DFDL, one of the options in the past has been IBM App Connect, which allows development of custom solutions via IBM DFDL. The following diagram represents a high-level architecture of a DFDL solution implemented on IBM App Connect.

IBM App Connect brings stable integration to the table at an enterprise-level cost. According to IBM's sticker pricing as of May 2022, IBM App Connect charges $500 and above per month for using App Connect with IBM Cloud services. These prices exclude the cost of storing and maintaining DFDL definitions in the mainframe. With the introduction of Tailored Fit Pricing on IBM z15, the cost of maintaining the mainframe can range from $4,900 to $9,300 per month over the span of 5 years, which may be costly for a small or medium business that only wants to process data defined by DFDL.

Introducing the Google open-source DFDL processor with Google Cloud

At Google our mission is to build for everyone, everywhere. With this commitment in mind, the Google Cloud team has developed and open-sourced a DFDL processor that is easily accessible and customizable for organizations to use. We understand that mainframes can be expensive to maintain and use, which is why we have integrated Cloud Firestore and Bigtable as the databases to store the DFDL definitions. Firestore can provide 100K reads, 25K writes, 100K deletes, and 1 TB of storage per month for approximately $186 per month. Bigtable, on the other hand, provides a fast, scalable database solution for storing terabytes, or even petabytes, of data at relatively low cost. Moving away from the mainframe and adopting cloud-native database solutions can save organizations thousands of dollars every month.

Next, we have substituted App Connect with a combination of our open-source DFDL processor, the Cloud Pub/Sub service, and the open-source Apache Daffodil library. Pub/Sub provides the connection between the mainframe and the processor, and from the processor to the downstream applications. The Daffodil library helps in compiling schemas and outputting infosets for a given DFDL definition and message. The total cost of employing the Pub/Sub service and the Daffodil library comes out to approximately $117 per month, which means an organization can save a minimum of $380 per month by using this solution. The table below shows a summary of the cost difference between the solutions discussed above.

How it works

The data described by the DFDL usually needs to be available in widely used formats such as JSON, in order to be consumed by downstream applications which might have already been migrated to a cloud-native environment. To achieve this, cloud-native applications or services can be implemented in conjunction with Google Cloud services to accept the textual or binary data as input from the mainframe, fetch the corresponding DFDL from a database, and finally compile and output the equivalent JSON for the downstream applications to consume. The following diagram describes the high-level architecture.

An application can be built to process the information received from the mainframe, e.g. a DFDL Processor Service, leveraging the Daffodil API to parse the data against the corresponding DFDL schema and output the JSON. DFDL schema definitions can be migrated and stored in Firestore or Bigtable; since these definitions rarely change and can be stored in a key-value format, the storage of preference is a non-relational managed database. Google Cloud Pub/Sub provides an eventing mechanism that receives the binary or textual message from a data source, i.e. the mainframe, in a Pub/Sub topic. This allows the DFDL Processor to access the data, retrieve the corresponding DFDL definition from Firestore or Bigtable, and finally pass both on to the Daffodil API to compile and output the JSON result. The JSON result is finally published into a resulting Pub/Sub topic for any downstream application to consume. It is recommended to follow the CloudEvents schema specification, which describes events in common formats to provide interoperability across services, platforms, and systems.

You can find examples of the implementation in GitHub:

- Firestore Example
- Bigtable Example
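To make the flow above more concrete, here is a minimal sketch of such a DFDL Processor Service in Python. It is not the code from the linked examples: the project, topic, subscription, collection, and attribute names are assumptions, and the Daffodil call is left as a placeholder because Apache Daffodil is a Java/Scala library that would typically be invoked through a JVM bridge or a sidecar process.

```python
# Minimal sketch of a DFDL processor service (illustrative only; names below
# are assumptions, not those used in the linked GitHub examples).
import json

from google.cloud import firestore, pubsub_v1

PROJECT_ID = "my-project"               # assumed
INPUT_SUBSCRIPTION = "dfdl-input-sub"   # assumed
OUTPUT_TOPIC = "dfdl-output"            # assumed

db = firestore.Client(project=PROJECT_ID)
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()


def parse_with_daffodil(dfdl_schema, raw_payload):
    """Placeholder for the call into the Apache Daffodil API (a Java/Scala
    library), e.g. via a JVM bridge or sidecar. Returns the parsed infoset
    as a Python dict."""
    raise NotImplementedError


def handle_message(message):
    # The message attributes are assumed to carry the DFDL definition id.
    schema_id = message.attributes.get("dfdlSchemaId")
    schema_doc = db.collection("dfdl-definitions").document(schema_id).get()
    dfdl_schema = schema_doc.to_dict()["schema"]

    # Parse the binary/textual payload into an infoset, then publish it as JSON.
    infoset = parse_with_daffodil(dfdl_schema, message.data)
    topic_path = publisher.topic_path(PROJECT_ID, OUTPUT_TOPIC)
    publisher.publish(topic_path, json.dumps(infoset).encode("utf-8"))
    message.ack()


if __name__ == "__main__":
    sub_path = subscriber.subscription_path(PROJECT_ID, INPUT_SUBSCRIPTION)
    streaming_pull = subscriber.subscribe(sub_path, callback=handle_message)
    print(f"Listening on {sub_path}...")
    streaming_pull.result()  # blocks until interrupted
```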
Conclusion

In this post, we have discussed different pipelines used to process data defined by DFDL, and compared their costs. We have also demonstrated how to use Cloud Pub/Sub, Firestore, and Bigtable to create a service that listens for binary event messages, extracts the corresponding DFDL definition from a managed database, and processes it to output JSON that can then be consumed by downstream applications using well-established technologies and libraries.

1. Price comparison analysis as of May 2022 and subject to change based on usage.

Related article: 5 principles for cloud-native architecture—what it is and how to master it.
Source: Google Cloud Platform

How a top gaming company transformed its approach to security with Splunk and Google Cloud

Since Aristocrat's founding in 1953, technology has constantly transformed gaming, and the digital demands on our gaming business are a far cry from the challenges we faced when we started. As we continue to expand globally, security and compliance are top priorities. Managing IT security for several gaming subsidiaries and our core business became more complex as we entered new markets and scaled up our number of users. We needed a centralized platform that could give us full visibility into all of our systems and efficient monitoring capabilities to keep data and applications secure. We also needed the ability to secure our systems without compromising user experiences. We turned to Google Cloud and Splunk to better manage complexity and support highly efficient, secure, and more dynamic gaming experiences for everyone. We are committed to using today's modern technologies to give players more optimal experiences.

Bringing our digital footprint into the cloud

When we set out on our digital transformation, we looked to address many business requirements. These requirements included:

- Regulation: We wanted a platform that could efficiently address our industry's stringent and global regulatory compliance requirements.
- Player experience: Our IT environment must support smooth gaming experiences to keep users engaged and satisfied.
- Scalability: As we grow and diversify, meeting the changing demands of an increasingly global gaming community, we need an easily scalable platform to align with our current and future needs.

Google Cloud offered us the perfect foundation through solutions such as Compute Engine, Google Kubernetes Engine, BigQuery, and Google Cloud Storage. These acted as the right infrastructure components for us for the following reasons:

- Google Cloud is globally accessible and supports compliance, helping to streamline security and regulatory processes for our team. With Google Cloud, we can manage our entire development and delivery processes globally with fast and efficient reconciliation of regional compliance requirements.
- When we need to adjust existing infrastructure or deliver new capabilities, Google Cloud accelerates the process and takes the heavy lifting off of our team.
- Google Cloud allows us to support tens of thousands of players on each of our apps while experiencing minimal downtime and low latency. The importance of this support can't be overstated in an industry where players have little to no patience if lags in games occur.

We migrated our back-office IT stack alongside our consumer-facing production applications to Google Cloud, given our positive experiences with compliance, security, scalability, and process management. This migration has significantly accelerated our digital transformation while streamlining our infrastructure for faster and more cost-effective performance. In many ways, Google Cloud has been, pun perhaps intended, a game-changer for us. For instance, when we suddenly had to support a lot of remote work during the COVID-19 pandemic, native identity and access management tools in Google Cloud allowed us to retire costly VPNs used for backend access and quickly adopt a more easily managed, cost-effective zero-trust security posture.

Accessing vital third-party partners and managed services

Aristocrat has many IT needs best addressed in a multi-cloud environment. Google Cloud is particularly attractive given its strong cloud interoperability, as well as the many products and services available on Google Cloud Marketplace.
The marketplace accelerated our deployment of key third-party apps, including Splunk and Qualys. Given the personal information we store and the global regulatory compliance statutes with which we must comply, security lies at the heart of our business. Splunk is a critical component of our digital transformation because it offers solutions that provide the enhanced monitoring capabilities and visibility we need. The integration between Splunk and Google Cloud gives us confidence that our data is secure. We know our data can be secure in Google Cloud, while simplified billing through Google Cloud Marketplace makes payments and license tracking easier for our procurement team.

As part of our protected environment, we use the Splunk platform as our security information and event management system, leveraging the InfoSec app for Splunk that provides continuous monitoring and advanced threat detection to significantly improve our security. We can manipulate and present data in Splunk in a way that provides us with a single pane of glass for our hybrid, multi-cloud environment and our third-party apps and systems. Splunk observability tools have likewise helped us track browser-based applications, like our online gaming apps, to monitor details related to security and performance.

Splunk and Google Cloud have transformed how we operate. We can now quickly ingest and analyze data at scale within our refined approach to security management by offloading software management to Splunk and Google Cloud. This enables us to approach security more strategically, and positions us to integrate more AI/ML capabilities into our products for even greater governance and performance. This is just the beginning of our journey with Splunk and Google Cloud. We're excited to see the innovation we can continue bringing to the gaming community worldwide.
Source: Google Cloud Platform

Sharpen your machine learning skills at Google Cloud Applied ML Summit

Artificial intelligence (AI), and particularly machine learning (ML), continues to advance at a breakneck pace. We see it throughout projects and commentaries across the broader technology industry. We see it in the amazing things our customers are doing, from creating friendly robots to aid childhood development, to leveraging data for better manufacturing and distribution, to fostering internal innovation through hackathons. And we see it in our own research and product development at Google, from improved machine learning models for our Speech API, to integrations that streamline data management and ML modeling, to making AlphaFold (DeepMind's breakthrough protein structure prediction system) available to researchers throughout the world using Vertex AI.

At Google Cloud, we've helped thousands of companies to accelerate their AI efforts, empower their data scientists, and extend the ability to build AI-driven apps and workflows to more people, including those without data science or ML expertise. Next month, we'll take the next step in this journey with our customers at Google Cloud Applied ML Summit. Join us June 9 for this digital event, which will bring together some of the world's leading ML and data science professionals to explore the latest cutting-edge AI tools for developing, deploying, and managing ML models at scale. On-demand sessions kick off at 9:00 AM Pacific with "Accelerating the deployment of predictable ML in production," featuring VP & GM of Google Cloud AI & Industry Solutions Andrew Moore; Google Cloud Developer Advocate Priyanka Vergadia; Ford Director of AI and Cloud Bryan Goodman; and Uber AI Director of Engineering Smitha Shyam.

At the summit, you'll learn how companies like General Mills, Vodafone, H&M, and CNA Insurance are developing, deploying, and safely managing long-running, self-improving AI services. Get insights in practitioner sessions where you can find new ways to:

- Build reliable, standardized AI pipelines across Spark on Google Cloud, Dataproc, BigQuery, Dataplex, Looker, and more, with a unified experience from Vertex AI, all in the session "Data to deployment – 5x faster."
- Train high-quality ML models in minutes with AutoML innovations born of the latest Google Brain research, explored in the session "End-to-end AutoML for model prep."
- Make the most of your Google Cloud investments in Vertex AI Training and Vertex AI Prediction to help you deploy custom models built on TensorFlow, PyTorch, scikit-learn, XGBoost, and other frameworks. Check out the session "ML prediction and serving: Vertex AI roadmap."
- Automate and monitor AI integration, deployment, and infrastructure management to drive greater speed and efficiency. Don't miss the session "Machine learning operations (MLOps) strategy and roadmap."
- Streamline the process to audit, track, and govern ML models as they adapt to live data within a dynamic environment, without degrading performance. Dive into this topic in the session "Model governance and auditability."
You can choose from over a dozen sessions across three tracks: "Data to ML Essentials," "Fast-track Innovation," and "Self-improving ML." Session topics range from MLOps best practices, to Google Cloud customer experiences, to the importance of model auditability and explainable and responsible AI, with multiple customer panels and "ask me anything" sessions to help you get the insights and develop the skills to take your business's ML efforts to the next level. We're committed to continuing to serve our customers in this rapidly evolving space, and we're excited to learn and collaborate with you at this event. To register and reserve your seat for the Applied ML Summit, visit this link.

Related article: Unified data and ML: 5 ways to use BigQuery and Vertex AI together.
Source: Google Cloud Platform

Your guide to sessions at Google Cloud Security Summit 2022

Google Cloud Security Summit is just a few days away! We have an exciting agenda with a keynote, a demo, and breakout sessions across four tracks: Zero Trust, Secure Software Supply Chain, Ransomware & Emerging Threats, and Cloud Governance & Sovereignty. By attending this summit, you will be the first to learn about the new products and advanced capabilities we are announcing from Google Cloud security, and discover new ways to define and drive your security strategy and solve your biggest challenges. We hope you'll join us for the Security Summit digital online event on May 17, 2022, to learn from experts, explore the latest tools, and share our vision for the future of security. Register here for the event and watch the sessions live and on demand. If you are in Europe, the Middle East, or Africa, please visit the EMEA page to view summit events in your time zone and captions in your local language.

Security Summit Keynote

Charting a safer future with Google Cloud
Featured Speakers: Chris Inglis, National Cyber Director, Executive Office of the President, White House; Jonathan Meadows, Head of Cloud Cyber Security Engineering, Citibank; Sunil Potti, General Manager and Vice President of Cloud Security, Google Cloud
Cybersecurity remains at the top of every organization's agenda. Join our opening keynote to hear how Google Cloud's unique capabilities and expertise can help organizations, large and small, in the public or private sector, address today's most prominent security challenges and imperatives: Zero Trust, securing the software supply chain, ransomware and other emerging threats, and cloud governance and digital sovereignty. Whether you use our trusted cloud for digital transformation, or continue to operate on-premises or in other clouds, you'll learn how we can help you be safer with Google.

Demo

Modern threat detection, investigation, and response with Google Cloud's SecOps suite
Featured Speakers: Arnaud Loos, Customer Engineer, Google Cloud; Svetla Yankova, Head of Customer Engineering, Google Cloud
To stay secure in today's growing threat landscape, organizations must detect and respond to cyber threats at unprecedented speed and scale. This demonstration will showcase Google Cloud's Security Operations Suite and its unique approach to building modern threat detection, investigation, and response.

Breakout Sessions

We have 19 breakout sessions that include talks from Google speakers, our customers, and our partners. The breakout sessions are spread across four tracks covering Zero Trust, Secure Software Supply Chain, Ransomware & Emerging Threats, and Cloud Governance & Sovereignty.

Zero Trust Track

1. How Google is helping customers move to Zero Trust
Featured Speakers: Aman Diwakar, Security Engineering Manager – Corporate Security, DoorDash; Jeanette Manfra, Senior Director, Risk and Compliance, Google Cloud; Tanisha Rai, Product Manager, Google Cloud
Enterprises around the globe are committed to moving to a Zero Trust architecture, but actually making that happen can be hard. Every day, we hear from customers asking how they can set up a Zero Trust model like Google's, and we are here to help. Tune in to this session to hear speakers discuss how Google did it and how we can now help you with a comprehensive set of products, advisory services, and solutions.
Whether you're "born in the cloud," a government agency looking to meet federal directives, or somewhere in between, Google Cloud products like BeyondCorp Enterprise and our set of partner solutions can help you jump-start your Zero Trust approach.

2. A look ahead: the future of BeyondCorp Enterprise
Featured Speakers: Prashant Jain, Product Manager, Google Cloud; Jian Zhen, Product Manager, Google Cloud
Google pioneered Zero Trust. Now we're pioneering rapid Zero Trust transformation. We know one size does not fit all, and Zero Trust capabilities should conform to your needs, not vice versa. Join this session to learn more about how BeyondCorp Enterprise enables you to quickly and flexibly apply a Zero Trust approach to meet your application use cases and security requirements. Hear from product leaders as they share updates on new BeyondCorp capabilities, partnerships, and integrations that enable you to deliver rapid wins and avoid drawn-out deployment projects.

3. CrowdStrike and Deloitte: Managing cloud migration, remote workforce, and today's threats
Featured Speakers: Chris Kachigian, Sr. Director, Global Solutions Architecture, CrowdStrike; Mike Morris, Detect and Respond CTO, Head of Engineering, Deloitte; McCall McIntyre, Strategic Technology Partner Lead, Google Cloud
Your organization is on its cloud migration journey, you have a remote or hybrid workforce, and your extended infrastructure is more dependent than ever on disparate devices, partners, and apps. To make things even more complicated, threat actors are targeting you in all of these facets, causing business disruption. How can you secure this new extended environment without negatively impacting user productivity? Join this Lightning Talk to learn more about how CrowdStrike and Deloitte have helped customers solve for just that.

4. Working safer with Google Workspace
Featured Speakers: Neil Kumaran, Product Lead, Gmail & Chat Security & Trust, Google Cloud; Nikhil Sinha, Sr. Product Manager, Workspace Security, Google Cloud
Google Workspace is on a mission to make phishing and malware attacks a thing of the past. Google keeps more people safe online than anyone else in the world; according to our research, Gmail blocks more than 99.9% of malware and phishing attempts from reaching users' inboxes. We do this by using our expertise in protecting against threats at scale to protect every customer by default. This session will provide an overview of how Google Workspace's layered, AI-powered protections function across Gmail, Docs, Sheets, Slides, and Drive. We'll examine real-life examples of large malware attacks to showcase how advanced capabilities like sandboxing, deep-learning-based malicious document classification, and performant, deep antivirus protections work to help stop threats.

5. Securing IoT devices using Certificate Authority Service
Featured Speakers: Sudhi Herle, Director, Engineering & Product Management, Android Platform Security, Google Cloud; Anoosh Saboori, Product Manager, Google Cloud; Mahesh Venugopala, Director Security, Autonomic
Scaling security for IoT devices can be challenging. As the IoT market continues to grow, it is imperative that strong security measures are put in place to protect the information these devices send to the cloud. Join this session to learn how Google customers can leverage capabilities such as Certificate Authority Service to apply Zero Trust principles to secure IoT devices.

Secure Software Supply Chain Track
6. Building trust in your software supply chain
Featured Speakers: Nikhil Kaul, Head of Product Marketing – Application Modernization, Google Cloud; Victor Szalvay, Outbound Product Manager, Google Cloud
Whether you're building an application on Kubernetes, or in a serverless or virtual machine environment, end-to-end security is critical for mitigating the vulnerabilities lurking within open source software, as well as those related to recent cybersecurity attacks and data breaches. Come learn how you can meet guidelines from the U.S. government and adopt an in-depth, security-first approach with Google Cloud that embeds security at every step of your software life cycle.

7. Protecting and securing your Kubernetes infrastructure with enterprise-grade controls
Featured Speaker: Gari Singh, Product Manager, Google Cloud
Kubernetes is not just a technology. It's also a model for creating value for your business, a way of developing apps and services, and a means to help secure and develop cloud-native IT capabilities for innovation. Google Kubernetes Engine (GKE) allows your developers to spend less time worrying about security and to achieve more secure outcomes. In this session, learn how you can set up enterprise-grade security for your app right out of the box. We'll cover the latest security controls, hardened configuration, and policies for GKE, including confidential computing options.

8. Managing the risks of open source dependencies in your software supply chain
Featured Speaker: Andy Chang, Group Product Manager, Google Cloud
Open-source software code is available to the public – free for anyone to use, modify, or inspect. But securing open-source code, including fixing known vulnerabilities, is often done on an ad hoc, volunteer basis. Join this session to learn how our new Google Cloud solution addresses open-source software security.

Ransomware and Emerging Threats Track

9. A holistic defense strategy for modern ransomware attacks
Featured Speaker: Adrian Corona, Head of Security Solutions GTM, Google Cloud
Making your organization resilient against modern ransomware attacks requires holistic detection, protection, and response capabilities. In this session, we'll demonstrate how you can apply a cyber resilience framework, and products from Google Cloud and partners, to help thwart threats and combat ransomware attacks.

10. Taking an autonomic approach to security operations
Featured Speakers: Anton Chuvakin, Head of Security Solution Strategy, Google Cloud; Iman Ghanizada, Head of Autonomic Security Operations, Google Cloud
Security operations centers are constantly pressed for time. Analysts seldom have the luxury to "clear the board" of active attacks and, as a result, can often feel overwhelmed. In this talk, we'll show you how you can turn the tide and leverage Chronicle and Siemplify to prioritize and automate your SecOps, giving analysts valuable time back to focus on the threats that matter.

11. Insight and perspective from the Unit 42 Ransomware Threat Report
Featured Speakers: Joshua Haslett, Strategic Technology Partnership Manager, Google Cloud; Josh Zelonis, Field CTO and Evangelist, Palo Alto Networks
Ransomware groups turned up the pressure on their victims in 2021, demanding higher ransoms and using new tactics to force them into paying. In fact, the average ransomware demand in cases handled by Unit 42 in 2021 climbed 144% since 2020.
At the same time, there was an 85% increase in the number of victims who had their names and other details posted publicly on dark web "leak sites" that ransomware groups use to coerce their targets. As the ransomware landscape continues to evolve, and threat actors leverage new creative techniques to cripple business operations, what can your organization do to prepare and stay ahead of threats? Join us for this presentation as we discuss the key findings in our 2022 Unit 42 Ransomware Threat Report.

12. Cloud-native risk management and threat detection with Security Command Center
Featured Speakers: Thomas Meriadec, Head of Cloud Platforms Security & Compliance, Veolia; Tim Wingerter, Product Manager, Google Cloud
As organizations move to the cloud, continuous monitoring of the environment for risk posture and threats is critical. In this session, learn how Security Command Center Premium provides risk management and threat detection capabilities to help you manage and improve your cloud security and risk posture. Join us to hear about Veolia's experience with Security Command Center Premium.

13. Securing web applications and APIs anywhere
Featured Speakers: Shelly Hershkovitz, Product Manager, Apigee API Security, Google Cloud; Gregory Lebovitz, Product Management, Cloud Network Security, Google Cloud
Application attack vectors are increasing rapidly, and many organizations seek to protect against the different types of application and API attacks. Join this session to learn how Google Cloud can help protect and secure applications and APIs from fraud, abuse, and attacks – such as DDoS, API abuse, bot fraud, and more – using our Web App and API Protection (WAAP) offering.

14. Maximizing your detection & response capabilities
Featured Speakers: Magali Bohn, Director, Partnerships and Channels GSEC, Google Cloud; Brett Perry, CISO, Dot Foods; Jason Sloderbeck, Vice President, Worldwide Channels, CYDERES
Join Google Cloud, Cyderes (Cyber Defense and Response), and Dot Foods as we discuss best practices and real-world use cases that enable a company to detect threats and respond to incidents in real time. Learn about their autonomic security operations journey and how they've scaled a robust, cost-efficient program to accelerate their digital transformation and overall growth.

Cloud Governance & Sovereignty Track

15. Achieving your digital sovereignty with Google Cloud
Featured Speaker: Dr. Wieland Holfelder, Vice President Engineering, Google Cloud
Google Cloud's unique approach, which includes strong local partnerships, helps organizations balance transparency, control, and the ability to survive the unexpected – on a global scale. Join this session to learn how you can meet current and emerging digital sovereignty goals.

16. Compliance with confidence: Meeting regulatory mandates using software-defined community clouds
Featured Speakers: Bryce Buffaloe, Product Manager Security & Compliance, Google Cloud; Jamal Mahboob, Customer Engineer, Google Cloud
Adopting the cloud in regulated industries can require constraints on data residency, as well as support for specific security controls. Learn how Google Cloud can help provide assurances without the strict physical infrastructure constraints of legacy approaches, enabling organizations to benefit from cloud innovation while meeting their compliance needs.

17. Demystifying cyber security analytics – Scalable approaches for the real world
Featured Speakers: Philip Bice, Global Lead – Service Provider Partnerships, Google Cloud; Chris Knackstedt, Sr. Manager / Data Scientist, Deloitte & Touche LLP
Manager / Data Scientist, Deloitte & Touche LLPIn this session, join security leaders from Deloitte & Touche LLP and Google Cloud for an insightful conversation on key trends and challenges warranting the need for scalable, flexible and predictive security analytics solutions for today’s hybrid, multi cloud technology environments. The speakers will share practical approaches to designing and deploying use case-driven security analytics by leveraging the power of Google Cloud native data management and analytics services. The session will also cover solutions and managed services offered jointly by Deloitte and Google Cloud that can help organizations maintain their competitive differentiation and continually accelerate cyber security maturity.18. Best practices for defining and enforcing policies across your Google Cloud environmentFeatured Speakers:Vandhana Ramadurai, Sr. Product Manager, Google CloudSri Subramanian, Head of Product, Cloud Identity and Access Management, Google CloudLearn how to take a policy-driven approach to governing your cloud resources. In this session, we’ll cover best practices that enable organizations to shift from remediating resources that violate requirements to a more proactive state for preventing those violations.19. A comprehensive strategy for managing sensitive data in the cloudFeatured Speakers:Nelly Porter, Group Product Manager, Google CloudMatt Presson, Lead Security Architect, Product Security, BullishData is a big asset and a big risk, and classification and protection of it is an important task for organizations. In this session, learn how you can leverage Google security tools to more effortlessly take back control of your data.In addition to these sessions, there will be on-demand videos and demos published on May 17 that you can watch at your convenience by visiting the Security Summit page. We can’t wait for you to join us and learn all things Security at Google Cloud Security Summit!Related ArticleCloud CISO Perspectives: April 2022Google Cloud CISO Phil Venables shares his thoughts on the latest security updates from the Google Cybersecurity Action Team.Read Article
Source: Google Cloud Platform

Extending BigQuery Functions beyond SQL with Remote Functions, now in preview

Today we are announcing the Preview of BigQuery Remote Functions. Remote Functions are user-defined functions (UDFs) that let you extend BigQuery SQL with your own custom code, written and hosted in Cloud Functions, Google Cloud's scalable, pay-as-you-go functions-as-a-service. A remote UDF accepts columns from BigQuery as input, performs actions on that input using a Cloud Function, and returns the result of those actions as a value in the query result. With Remote Functions, you can now write custom SQL functions in Node.js, Python, Go, Java, .NET, Ruby, or PHP. This means you can personalize BigQuery for your company and leverage the same management and permission models, all without having to manage a server.

In what type of situations could you use remote functions?

Before today, BigQuery customers could create user-defined functions (UDFs) in either SQL or JavaScript that ran entirely within BigQuery. While these functions are performant and fully managed from within BigQuery, customers expressed a desire to extend BigQuery UDFs with their own external code. Here are some examples of what they have asked for:

Security and Compliance: Use data encryption and tokenization services from the Google Cloud security ecosystem for external encryption and de-identification. We've already started working with key partners like Protegrity and CyberRes Voltage on using these external functions as a mechanism to merge BigQuery into their security platforms, which will help our mutual customers address strict compliance controls.

Real-Time APIs: Enrich BigQuery data using external APIs to obtain the latest stock price data, weather updates, or geocoding information.

Code Migration: Migrate legacy UDFs or other procedural functions written in Node.js, Python, Go, Java, .NET, Ruby, or PHP.

Data Science: Encapsulate complex business logic and score BigQuery datasets by calling models hosted in Vertex AI or other machine learning platforms.

Getting Started

Let's go through the steps to use a BigQuery remote UDF; a rough command-line sketch of these steps appears a little further below.

Set up the BigQuery connection:
1. Create a BigQuery connection.
   a. You may need to enable the BigQuery Connection API.

Deploy a Cloud Function with your code:
1. Deploy your Cloud Function.
   a. You may need to enable the Cloud Functions API.
   b. You may need to enable the Cloud Build API.
2. Grant the BigQuery connection's service account access to the Cloud Function.
   a. One way to find the service account is the bq CLI show command:

      bq show --location=US --connection $CONNECTION_NAME

Define the BigQuery remote UDF:
1. Create the remote UDF definition within BigQuery.
   a. One way to find the endpoint name is the gcloud CLI functions describe command:

      gcloud functions describe $FUNCTION_NAME

Use the BigQuery remote UDF in SQL:
1. Write a SQL statement as you would when calling any UDF.
2. Get your results!

How remote functions can help you with common data tasks

Let's take a look at some examples of how using BigQuery with remote UDFs can help accelerate development and enhance data processing and analysis.

Encryption and Decryption

As an example, let's create a simple custom encryption and decryption Cloud Function in Python. The encryption function receives the data and returns an encrypted, base64-encoded string. In the same Cloud Function, the decryption function receives an encrypted, base64-encoded string and returns the decrypted string.
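To make the setup steps listed above concrete, here is a rough command-line sketch before we dive into the example code. It is only an illustration: the project ID, region, connection name (my-bq-cf-connection), and function name (remote_security, matching the example that follows) are assumptions, and the exact flags may differ for your environment.

# 1. Create a BigQuery Cloud-resource connection (requires the BigQuery Connection API).
bq mk --connection --location=US --project_id=<project-id> \
    --connection_type=CLOUD_RESOURCE my-bq-cf-connection

# 2. Deploy the Cloud Function that will back the remote UDF.
gcloud functions deploy remote_security --region=us-central1 \
    --runtime=python39 --trigger-http --entry-point=remote_security

# 3. Find the connection's service account, then let it invoke the function.
bq show --location=US --connection <project-id>.US.my-bq-cf-connection
gcloud functions add-iam-policy-binding remote_security --region=us-central1 \
    --member="serviceAccount:<connection-service-account>" \
    --role="roles/cloudfunctions.invoker"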
A data engineer would be able to enable this functionality in BigQuery. The Cloud Function receives the data as an HTTP request and determines which operation to perform. The additional userDefinedContext fields allow you to send extra pieces of data to the Cloud Function.

import base64
import json

from Crypto.Cipher import AES  # pycryptodome, used by the encryption/decryption helpers


def remote_security(request):
    # BigQuery sends the rows to process as a JSON payload in an HTTP POST.
    request_json = request.get_json()
    # The user_defined_context configured on the BigQuery function tells us
    # which operation this single endpoint should perform.
    mode = request_json['userDefinedContext']['mode']
    calls = request_json['calls']
    not_extremely_secure_key = 'not_really_secure'
    if mode == "encryption":
        return encryption(calls, not_extremely_secure_key)
    elif mode == "decryption":
        return decryption(calls, not_extremely_secure_key)
    return json.dumps({"Error in Request": request_json}), 400

The result is returned in a specific JSON-formatted response that BigQuery parses.

def encryption(calls, not_extremely_secure_key):
    return_value = []
    for call in calls:
        data = call[0].encode('utf-8')
        cipher = AES.new(
            not_extremely_secure_key.encode('utf-8')[:16],
            AES.MODE_EAX
        )
        cipher_text = cipher.encrypt(data)
        # Prepend the nonce so decryption can recover it, then base64-encode
        # the result for transport back to BigQuery.
        return_value.append(
            str(base64.b64encode(cipher.nonce + cipher_text))[2:-1]
        )
    # BigQuery expects one result per input row, under the "replies" key.
    return json.dumps({"replies": return_value})

This Python code is deployed to Cloud Functions, where it waits to be invoked.

Let's add the user-defined function to BigQuery so we can invoke it from a SQL statement. The additional user_defined_context is sent to the Cloud Function as extra context in the request payload, so you can map multiple remote functions to one endpoint.

CREATE OR REPLACE FUNCTION `<project-id>.demo.decryption` (x STRING) RETURNS STRING
REMOTE WITH CONNECTION `<project-id>.us.my-bq-cf-connection`
OPTIONS (
  endpoint = 'https://us-central1-<project-id>.cloudfunctions.net/remote_security',
  user_defined_context = [("mode", "decryption")]
);

Once we've created our functions, users with the right IAM permissions can use them in SQL on BigQuery (a query sketch follows below). If you're new to Cloud Functions, be aware that there are small startup delays known as "cold starts". The neat thing is you can call APIs as well, which is how our partners at Protegrity and Voltage enable their platforms to perform encryption and decryption of BigQuery data.

Calling APIs to enrich your data

Users such as data analysts can easily use the user-defined functions without needing other tools or moving the data out of BigQuery. You can enrich your dataset with many more APIs; for example, the Google Cloud Natural Language API can analyze the sentiment of your text without requiring another tool.

import json

from google.cloud import language_v1


def call_nlp(calls):
    return_value = []
    client = language_v1.LanguageServiceClient()
    for call in calls:
        text = call[0]
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        # Score the sentiment of each input row with the Natural Language API.
        sentiment = client.analyze_sentiment(
            request={"document": document}
        ).document_sentiment
        return_value.append(str(sentiment.score))
    return_json = json.dumps({"replies": return_value})
    return return_json

Once the Cloud Function is deployed and the remote UDF definition is created in BigQuery, you can invoke the NLP API and use the data it returns in your queries.

Custom Vertex AI endpoint

Data scientists can integrate Vertex AI endpoints and other APIs for custom models, all from the SQL console.
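Whichever backend the Cloud Function calls (encryption, the Natural Language API, or a Vertex AI endpoint as described next), invoking the remote UDF from BigQuery is ordinary SQL. The query below is only a sketch: the table, columns, and the sentiment function are illustrative assumptions; only the decryption UDF is actually registered above.

SELECT
  customer_id,
  demo.decryption(encrypted_email) AS email,        -- remote UDF registered above
  demo.sentiment_score(feedback_text) AS sentiment  -- hypothetical UDF wrapping call_nlp
FROM
  `<project-id>.demo.customer_feedback`;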
Remember, remote UDFs are meant for scalar execution. You can deploy a model to a Vertex AI endpoint, which is yet another API, and then call that endpoint from Cloud Functions.

import json

from google.cloud import aiplatform
from google.cloud.aiplatform.gapic.schema import predict
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_classification(calls):
    # Vertex AI endpoint details; client_options, project, location, and
    # endpoint_id are assumed to be configured elsewhere in the function.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    return_value = []
    # Call the endpoint for each input row from BigQuery.
    for call in calls:
        content = call[0]
        instance = predict.instance.TextClassificationPredictionInstance(
            content=content,
        ).to_value()
        instances = [instance]
        parameters_dict = {}
        parameters = json_format.ParseDict(parameters_dict, Value())
        response = client.predict(
            endpoint=endpoint, instances=instances, parameters=parameters
        )
        return_value.append(str(response.predictions))
    # Return the predictions to BigQuery in the expected "replies" format.
    return json.dumps({"replies": return_value})

Try it out today

Try out BigQuery remote UDFs today!
Source: Google Cloud Platform

Introducing AlloyDB for PostgreSQL: Free yourself from expensive, legacy databases

Enterprises are struggling to free themselves from legacy database systems, and need an alternative option to modernize their applications. Today at Google I/O, we're thrilled to announce the preview of AlloyDB for PostgreSQL, a fully managed, PostgreSQL-compatible database service that provides a powerful option for modernizing your most demanding enterprise database workloads. Compared with standard PostgreSQL, in our performance tests, AlloyDB was more than four times faster for transactional workloads, and up to 100 times faster for analytical queries. AlloyDB was also two times faster for transactional workloads than Amazon's comparable service. This makes AlloyDB a powerful new modernization option for transitioning off of legacy databases.

As organizations modernize their database estates in the cloud, many struggle to eliminate their dependency on legacy database engines. In particular, enterprise customers are looking to standardize on open systems such as PostgreSQL to eliminate expensive, unfriendly licensing and the vendor lock-in that comes with legacy products. However, running and replatforming business-critical workloads onto an open source database can be daunting: teams often struggle with performance tuning, disruptions caused by vacuuming, and managing application availability. AlloyDB combines the best of Google's scale-out compute and storage, industry-leading availability, security, and AI/ML-powered management with full PostgreSQL compatibility, paired with the performance, scalability, manageability, and reliability benefits that enterprises expect to run their mission-critical applications.

As noted by Carl Olofson, Research Vice President, Data Management Software, IDC, "databases are increasingly shifting into the cloud and we expect this trend to continue as more companies digitally transform their businesses. With AlloyDB, Google Cloud offers large enterprises a big leap forward, helping companies to have all the advantages of PostgreSQL, with the promise of improved speed and functionality, and predictable and transparent pricing."

AlloyDB is the next major milestone in our journey to support customers' heterogeneous migrations. For example, we recently added Oracle-to-PostgreSQL schema conversion and data replication capabilities to our Database Migration Service, while our new Database Migration Program helps you accelerate your move to the cloud with tooling and incentive funding.

"Developers have many choices for building, innovating and migrating their applications. AlloyDB provides us with a compelling relational database option with full PostgreSQL compatibility, great performance, availability and cloud integration. We are really excited to co-innovate with Google and can now benefit from enterprise grade features while cost-effectively modernizing from legacy, proprietary databases."—Bala Natrajan, Sr. Director, Data Infrastructure and Cloud Engineering, PayPal

Let's dive into what makes AlloyDB unique

With AlloyDB, we're tapping into decades of experience designing and managing some of the world's most scalable and available database services, bringing the best of Google to the PostgreSQL ecosystem. At AlloyDB's core is an intelligent, database-optimized storage service built specifically for PostgreSQL. AlloyDB disaggregates compute and storage at every layer of the stack, using the same infrastructure building blocks that power large-scale Google services such as YouTube, Search, Maps, and Gmail.
This unique technology allows it to scale seamlessly while offering predictable performance. Additional investments in analytical acceleration, embedded AI/ML, and automatic tiering of data mean that AlloyDB is ready to handle any workload you throw at it, with minimal management overhead.

Finally, we do all this while maintaining full compatibility with PostgreSQL 14, the latest version of the advanced open source database, so you can reuse your existing development skills and tools, and migrate your existing PostgreSQL applications with no code changes, benefiting from the entire PostgreSQL ecosystem. Furthermore, by using PostgreSQL as the foundation of AlloyDB, we're continuing our commitment to openness while delivering differentiated value to our customers.

"We have been so delighted to try out the new AlloyDB for PostgreSQL service. With AlloyDB, we have significantly increased throughput, with no application changes to our PostgreSQL workloads. And since it's a managed service, our teams can spend less time on database operations, and more time on value added tasks."—Sofian Hadiwijaya, CTO and Co-Founder, Warung Pintar

With AlloyDB you can modernize your existing applications with:

1. Superior performance and scale
AlloyDB delivers superior performance and scale for your most demanding commercial-grade workloads. AlloyDB is four times faster than standard PostgreSQL and two times faster than Amazon's comparable PostgreSQL-compatible service for transactional workloads. Multiple layers of caching, automatically tiered based on workload patterns, provide customers with best-in-class price/performance.

2. Industry-leading availability
AlloyDB provides a high-availability SLA of 99.99%, inclusive of maintenance. AlloyDB automatically detects and recovers from most database failures within seconds, independent of database size and load. AlloyDB's architecture also supports non-disruptive instance resizing and database maintenance. The primary instance can resume normal operations in seconds, while replica pool updates are fully transparent to users. This ensures that customers have a highly reliable, continuously available database for their mission-critical workloads.

"We are excited about the new PostgreSQL-compatible database. AlloyDB will bring more scalability and availability with no application changes. As we run our e-commerce platform and its availability is important, we are specially expecting AlloyDB to minimize the maintenance downtime."—Ryuzo Yamamoto, Software Engineer, Mercari (Souzoh, Inc.)

3. Real-time business insights
AlloyDB delivers up to 100 times faster analytical queries than standard PostgreSQL. This is enabled by a vectorized columnar accelerator that stores data in memory in an optimized columnar format for faster scans and aggregations. This makes AlloyDB a great fit for business intelligence, reporting, and hybrid transactional and analytical (HTAP) workloads. And even better, the accelerator is auto-populated, so you can improve analytical performance with the click of a button.

"At PLAID, we are developing KARTE, a customer experience platform. It provides advanced real-time analytics capabilities for vast amounts of behavioral data to discover deep insights and create an environment for communicating with customers. AlloyDB is fully compatible with PostgreSQL and can transparently extend column-oriented processing.
We think it's a new powerful option with a unique technical approach that enables system designs to integrate isolated OLTP, OLAP, and HTAP workloads with minimal investment in new expertise. We look forward to bringing more performance, scalability, and extensibility to our analytics capabilities by enhancing data integration with Google Cloud's other powerful database services in the future."—Takuya Ogawa, Lead Product Engineer, PLAID

4. Predictable, transparent pricing
AlloyDB makes keeping costs in check easier than ever. Pricing is transparent and predictable, with no expensive, proprietary licensing and no opaque I/O charges. Storage is automatically provisioned and customers are only charged for what they use, with no additional storage costs for read replicas. A free ultra-fast cache, automatically provisioned in addition to instance memory, allows you to maximize price/performance.

5. ML-assisted management and insights
Like many managed database services, AlloyDB automatically handles database patching, backups, scaling, and replication for you. But it goes several steps further by using adaptive algorithms and machine learning for PostgreSQL vacuum management, storage and memory management, data tiering, and analytics acceleration. It learns about your workload and intelligently organizes your data across memory, an ultra-fast secondary cache, and durable storage. These automated capabilities simplify management for DBAs and developers.

AlloyDB also empowers customers to better leverage machine learning in their applications. Built-in integration with Vertex AI, Google Cloud's artificial intelligence platform, allows users to call models directly within a query or transaction. That means high throughput, low latency, and augmented insights, without having to write any additional application code.

Get started with AlloyDB

A modern database strategy plays a critical role in developing great applications faster and delivering new experiences to your customers. The AlloyDB launch is an exciting milestone for Google Cloud databases, and we're thrilled to see how you use it to drive innovation across your organization and regain control and freedom of your database workloads.

To learn more about the technology innovations behind AlloyDB, check out this deep dive into its intelligent storage system. Then, visit cloud.google.com/alloydb to get started and create your first cluster. You can also review the demos and launch announcements from Google I/O 2022.

Related Article: AlloyDB for PostgreSQL under the hood: Intelligent, database-aware storage. In this technical deep dive, we take a look at the intelligent, scalable storage system that powers AlloyDB for PostgreSQL.
Source: Google Cloud Platform