Understanding and verifying security of Diffie-Hellman parameters

There are two methods commonly used to agree on shared secrets: have one party use a long-term asymmetric key to encrypt the secret and send it to the owner of the key (as in an RSA key exchange), or have both parties exchange messages that each contribute to the computed shared secret (what we call Diffie-Hellman key exchange). Learn more about Diffie-Hellman in this post.
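To make the second approach concrete, here is a toy sketch of a Diffie-Hellman exchange over a tiny prime group. The parameters (p = 23, g = 5) are for illustration only; real deployments use vetted groups of 2048 bits or more.

```python
# Toy Diffie-Hellman over a small prime field. Parameters are illustrative
# only; production systems use standardized, much larger groups.
import secrets

p = 23  # public prime modulus (toy size)
g = 5   # public generator

a = secrets.randbelow(p - 2) + 1  # Alice's private exponent
b = secrets.randbelow(p - 2) + 1  # Bob's private exponent

A = pow(g, a, p)  # Alice sends A to Bob
B = pow(g, b, p)  # Bob sends B to Alice

# Both sides derive the same shared secret without ever transmitting it.
assert pow(B, a, p) == pow(A, b, p)
print("shared secret:", pow(B, a, p))
```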
Source: CloudForms

Your App Deserves More than Kubernetes Ingress: Kubernetes Ingress vs. Istio Gateway [webinar]

Any application running at production scale should have an "Ingress" to expose itself to the outside world. While Kubernetes provides the "Ingress" resource for this purpose, its feature set is limited and depends on the Ingress controller (usually NGINX) being used. Alternatively, you can leverage Istio and take advantage of its more feature-rich Ingress … Continued
Source: Mirantis

Monitor and troubleshoot your VMs in context for faster resolution

Troubleshooting production issues with virtual machines (VMs) can be complex and often requires correlating multiple data points and signals across infrastructure and application metrics, as well as raw logs. When your end users are experiencing latency, downtime, or errors, switching between different tools and UIs to perform a root cause analysis can slow your developers down. Saving time when accessing the necessary data, deploying fixes, and verifying those fixes can save your organization money and keep the confidence of your users.

We are happy to announce the general availability of an enhanced "in-context" set of UI-based tools for Compute Engine users to help make the troubleshooting journey easier and more intuitive. From the Google Cloud Console, developers can click into any VM and access a rich set of pre-built visualizations designed to give insights into common scenarios and issues associated with CPU, disk, memory, networking, and live processes. With access to all of this data in one location, you can easily correlate between signals over a given timeframe.

Bringing more operations data to your VMs

A collection of high-level metrics has always been available in the Compute Engine console page. However, your feedback let us know that you still had to navigate between different tools to perform a proper root cause analysis. For example, seeing that CPU utilization peaked during a certain time frame might be a helpful starting point, but resolving the issue will require a deeper understanding of what is driving the utilization. Furthermore, you will want to correlate this data with processes and other signals, such as I/O wait time versus user space versus kernel space. With this in mind, we added metrics, charts, and a variety of new visualizations to the Compute Engine page, many requiring zero setup time. Some of these new additions are populated with in-depth metrics provided by the Google Cloud Ops Agent (or legacy agents if you're currently using them), which can easily be installed via Terraform, Puppet, Ansible, or an install script.

(Screenshot: Some of the observability data available when you click into your VM.)

New charts that leverage the metrics from the Ops Agent include CPU utilization as reported by the OS, memory utilization, memory breakdown by user, kernel, and disk cache, I/O latency, disk utilization and queue length, process metrics, and many more.

(Screenshot: A more detailed look at the data available when you click into the VM, including metrics and logs.)

While no single troubleshooting journey fits all needs, this enhanced set of observability tools should make the following scenarios faster and more intuitive:

Identifying networking changes via metrics and logs. By comparing unexpected increases in network traffic, network packet size, or spikes in new network connections against logs by severity, developers might identify a correlation between the traffic increase and critical log errors. By further navigating to the Logs section of the tools, one can quickly filter to critical logs only and expand sample log messages to discover detailed logs around timeout messages or errors caused by the increased load. Deep links to the Logs Explorer, filtered to the VM of interest, allow for fast and seamless navigation between Compute Engine and Cloud Logging.

Determining the impact of specific processes on utilization. By comparing times of high CPU or memory utilization against top processes, operators can determine whether a specific process (as denoted by command line or PID) is over-consuming. They can then refactor or terminate a process altogether, or choose to run a process on a machine better suited for its compute and memory requirements. Alternatively, there may be many short-lived processes that do not show up in the processes snapshot but are visible as a spike in the Process Creation Rate chart. This can lead to a decision to refactor so that process duration is distributed more efficiently.

Choosing appropriate disk size for workloads. A developer may notice that the "Peak 1-second IOPS" have begun to hit a flat line, indicating the disk is hitting a performance limit. If the "I/O Latency Avg" also shows a corresponding increase, this could indicate that I/O throttling is occurring. Finally, breaking down peak IOPS by storage type, one might see that Persistent Disk SSD is responsible for the majority of the peak IOPS, which could lead to a decision to increase the size of the disk to get a higher block storage performance limit.

Security operations and data sovereignty. Operators may be in charge of enforcing security protocols around external data access, or creating technical architecture for keeping data within specific regions for privacy and regulatory compliance. Using the Network Summary, operators can determine at a glance whether a VM is establishing connections and sending traffic primarily to VMs and Google services within the same project, or whether connections and traffic ingress/egress may be occurring externally. Likewise, operators can determine whether new connections are being established or traffic is being sent to different regions or zones, which may lead to new protocols to block inter-region data transfer.

Cost optimization via networking changes. A developer may notice that the majority of VM-to-VM traffic is being sent inter-region, as opposed to traffic remaining in the same region. Because this inter-region traffic is slower and is charged at an inter-region rate, the developer can choose to reconfigure the VM to communicate instead with local replicas of the data it needs in its same region, reducing both latency and cost.

Measuring and tuning memory performance. The Ops Agent is required for most VM families to collect memory utilization. By examining the memory usage by top processes, a developer may detect a memory leak and reconfigure or terminate the offending process. Additionally, an operator may examine the breakdown of memory usage by type and notice that disk cache usage has hit the limit of using all memory not in use by applications, correlating with an increase in disk latency. They may choose to upsize to a memory-optimized VM to allow enough memory for both applications and disk caching.

These are just a few of the use cases where your team may leverage these new capabilities to spend less time troubleshooting, optimize for costs, and improve your overall experience with Compute Engine.

Get Started Today

To get started, navigate to Compute Engine > VM Instances, click into a specific VM of interest, and navigate to the Observability tab. You can also check out our developer docs for additional guidance on how to use these tools to troubleshoot VM performance, and we recommend installing the Ops Agent on your Compute Engine VMs to get the most out of these new tools.
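If you prefer to pull the same signals programmatically, for example to correlate them with your own tooling, the Cloud Monitoring API exposes the metrics behind these charts. Here is a minimal sketch that reads an hour of CPU utilization for a single VM; the project and instance IDs are placeholders.

```python
# Read one hour of CPU utilization for a single Compute Engine VM from the
# Cloud Monitoring API. Project and instance IDs below are placeholders.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"       # placeholder project
instance_id = "1234567890123456789"        # placeholder VM instance ID

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

series = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type = "compute.googleapis.com/instance/cpu/utilization" '
            f'AND resource.labels.instance_id = "{instance_id}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# Print each sampled point so it can be compared against logs or process data.
for ts in series:
    for point in ts.points:
        print(point.interval.end_time, point.value.double_value)
```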
If you have specific questions or feedback, please join the discussion on our Google Cloud Community, Cloud Operations page.
Source: Google Cloud Platform

Saving the day: How Cloud SQL makes data protection easy

If you're managing a crucial application that has to be fully fault-tolerant, you need your system to be able to handle every fault, no matter the type and scope of failure, with minimal downtime and data loss. Protecting against these faults means juggling numerous variables that can impact performance as well as recovery time and cost.

Today's managed database services take over the operational complexity that used to fall on database administrators. Growing your organization's fault tolerance used to mean adding machines, compute, and storage, plus the operational costs of IT management: performing backups, writing scripts, creating dashboards, and carrying out testing to make sure your platform is ready when problems arise, all in a secure way.

At Google, our Cloud SQL managed database service offers three fault tolerance mechanisms: backup, high availability, and replication. There are three major factors to consider for each of them:

RTO (recovery time objective): When a failure happens, how much time can be lost before significant harm occurs?
RPO (recovery point objective): When a failure happens, how much data can be lost before significant harm occurs?
Cost: How cost-effective is this solution?

We've heard from customers like Major League Baseball, HSBC, and Equifax that they have strict data-protection needs and require highly fault-tolerant multi-region applications, and they've all chosen Cloud SQL to meet those needs. Let's take a closer look at how the decision-making process plays out for each recovery solution.

High availability (HA)

If your application is business critical, you require minimal RTO and zero RPO: a high availability configuration ensures that you and your customers are protected. If the primary instance fails, a standby instance is ready to take over with no data loss. There's an additional cost here, but doing this manually brings a far greater operational cost, since you have to detect and verify the fault, do the failover, and make sure it's correct (you can't have two primary instances or you risk data corruption), then finally connect the application to the new database.

Cloud SQL removes all that complexity. Choose high availability for a given instance and we'll replicate the data across multiple zones, synchronously, to each zone's persistent disk. If an HA instance has a failure, you don't have to think about when to fail over because Cloud SQL detects the failure and automatically initiates failover, for a full recovery with no data loss within minutes. Cloud SQL also moves the IP address during failover so your application can easily reconnect. MLB, for example, uses Cloud SQL high availability to serve prediction data to live games with minimal downtime. Dev/test instances don't need those same guarantees, but can use local backups to recover from any potential failure.
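For illustration, here is a hedged sketch of what enabling high availability looks like through the Cloud SQL Admin API's Python discovery client. The instance name, region, and machine tier are placeholders, and most teams would do this from the Console or gcloud instead.

```python
# Hedged sketch: create a Cloud SQL instance with regional (HA) availability
# via the Cloud SQL Admin API discovery client. Names and sizes are placeholders.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")

body = {
    "name": "orders-db",                    # placeholder instance name
    "databaseVersion": "POSTGRES_13",
    "region": "us-east1",
    "settings": {
        "tier": "db-custom-2-7680",
        "availabilityType": "REGIONAL",     # HA: synchronous replication across zones
        "backupConfiguration": {
            "enabled": True,
            "pointInTimeRecoveryEnabled": True,
        },
    },
}

operation = sqladmin.instances().insert(project="my-project", body=body).execute()
print(operation["name"])  # long-running operation to poll
```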
Cross-region replica

If a whole Google Cloud region goes down, you still need your business to continue to run. That's where cross-region replication comes in: a hot standby replica in another Google Cloud region provides an RTO of minutes and an RPO typically under a minute. If you create a read replica in a region separate from your primary instance and you get hit with a regional outage, your application and database can start serving customers from another region within minutes. But this solution can be complex, and enabling it yourself can be difficult and time-consuming. Securing cross-geography traffic demands end-to-end encryption and can bring connectivity issues too.

This is where the fully managed Cloud SQL solution shines. We offer cross-region replication for the MySQL, PostgreSQL, and SQL Server database engines, easily configured and bolstered by Google's interconnected global network. Just say, "I'm in U.S. East, I want to create a replica in U.S. West," and it's done, reliably and securely.

Backup

When you suffer data loss because of an operations error (for example, a bug in a script dropped your tables) or human error (for example, someone dropped the wrong table by accident), backups help you restore lost data to your Cloud SQL instance. Our low-cost backup mechanism features point-in-time, granular recovery, meaning that if you accidentally delete data or something else goes wrong, you can ask for recovery of the state of that database down to the millisecond, such as Monday at 12:53pm. Your valuable data is automatically replicated multiple times in multiple geographic locations, which enables automatic handling of failover in cases of major failure. You can rest assured that your database is available and your data is secure, even during a major failure.

Cloud SQL provides automated and on-demand backups. With automated backups, Google manages the backups so that you can easily restore them when required, and scheduled backups are taken automatically by default. With on-demand backups, you can create a backup at any time, which is useful if you are about to perform a risky operation on your database. Cloud SQL also lets you select a custom location for your backup data. When the backup is stored in multiple regions and there's an outage in the region that contains the source instance, you can restore a backup to a new or existing instance in a different region. This is also useful if your organization needs to comply with data residency regulations that require you to keep your backups within a specific geographic boundary.

Putting it all together

For critical workloads, MLB configures their Cloud SQL instances with backups, high availability, and cross-region replication. Doing so ensures they can recover from many failure types:

To recover from human error ("Oops, I didn't mean to delete that"), MLB uses backups and point-in-time recovery to recover to a specific millisecond or database transaction.
To automatically recover from primary instance failures and zonal outages, MLB uses Cloud SQL's high availability configuration.
To protect against regional outages, MLB uses cross-region replication.

Creating a robust configuration, like MLB did, takes just a few minutes. Get started in our Console or review the documentation.
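If you script your environments, here is a similarly hedged sketch, with the same caveats (names, regions, and fields are illustrative), of adding a cross-region read replica and taking an on-demand backup through the same Admin API client.

```python
# Hedged sketch: add a cross-region read replica and an on-demand backup
# using the Cloud SQL Admin API discovery client. Names and regions are
# placeholders.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")
project = "my-project"  # placeholder project

# Hot standby in another region: a read replica of the us-east1 primary.
replica_body = {
    "name": "orders-db-replica-west",
    "region": "us-west1",
    "masterInstanceName": "orders-db",
    "settings": {"tier": "db-custom-2-7680"},
}
sqladmin.instances().insert(project=project, body=replica_body).execute()

# On-demand backup before a risky operation on the primary.
sqladmin.backupRuns().insert(
    project=project, instance="orders-db", body={"description": "pre-migration"}
).execute()
```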
Source: Google Cloud Platform

BigQuery Admin reference guide: Data governance

Hopefully you've been following along with our BigQuery Admin series and are well on your way to getting ramped up with BigQuery. Now that you're equipped with the fundamentals, let's talk about something that's relevant for all data professionals: data governance.

What does data governance mean?

Data governance is everything you do to ensure your data is secure, private, accurate, available, and usable inside of BigQuery. With good governance, everyone in your organization can easily find, and leverage, the data they need to make effective decisions, all while minimizing the overall risk of data leakage or misuse and ensuring regulatory compliance.

BigQuery security features

Because BigQuery is a fully managed service, we take care of a lot of the hard stuff for you! Like we talked about in our post on BigQuery Storage Internals, BigQuery data is replicated across data centers to ensure reliability and availability, and data is always encrypted at rest. By default, we'll manage encryption keys for you. However, you have the option to leverage customer-managed encryption keys by using Cloud KMS to automatically rotate and destroy encryption keys. You can also leverage Google Virtual Private Cloud (VPC) Service Controls to restrict traffic to BigQuery. When you correctly apply these controls, unauthorized networks can't access BigQuery data, and data can't be copied to unauthorized Google Cloud projects. Free communication can still occur within the perimeter, but communication is restricted across the perimeter.

Aside from leveraging BigQuery's out-of-the-box security features, there are also ways to improve governance from a process perspective. In this post, we'll walk you through the different tactics to ensure data governance at your organization.

Dataset onboarding: Understanding & classifying data

Data governance starts with dataset onboarding. Let's say you just received a request from someone on your eCommerce team to add a new dataset that contains customer transactions. The first thing you'll need to do is understand the data. You might start by asking questions like these:

What information does this contain?
How will it be used to make business decisions?
Who needs access to this data?
Where does the data come from, and how will analysts get access to it in BigQuery?

Understanding the data helps you make decisions on where the new table should live in BigQuery, who should have access to this data, and how you'll plan to make the data accessible inside of BigQuery (e.g. leveraging an external table, batch loading data into native storage, etc.). For this example, the transactions live in an OLTP database. Looking at the existing table in our database, it has information about the order (when it was placed, who purchased it, any additional comments for the order) and details on the items that were purchased (the item ID, cost, category, etc.).

Now that we have an idea of what data exists in the source, and what information is relevant for the business, we can determine which fields we need in our BigQuery table and what transformations are necessary to push the data into a production environment.

Classifying information

Data classification means identifying the types of information contained in the data and storing it as searchable metadata.
By properly classifying data you can make sure that it's handled and shared appropriately, and that data is discoverable across your organization. Since we know what the production table should look like, we can go ahead and create an empty BigQuery table, with the appropriate schema, that will house the transactions. As far as storing metadata about this new table, we have two different options.

Using labels

On the one hand, we can leverage labels. Labels can be used on many BigQuery resources, including projects, datasets, and tables. They are key:value pairs that can be used to filter data in Cloud Monitoring, or used in queries against the INFORMATION_SCHEMA to find data that pertains to specific use cases.

Although labels provide logical segregation and management of different business purposes in the Cloud ecosystem, they are not meant to be used in the context of data governance. Labels cannot specify a schema, you can't apply them to specific fields in your table, and they cannot be used to establish access policies or track resource hierarchy. It's pretty clear that our transactions table may contain personally identifiable information (PII); specifically, we may want to mark the email address column as "Has_PII": True. Instead of using labels on our new table, we'll leverage Data Catalog to establish a robust data governance policy, incorporating metadata tags on BigQuery resources and individual fields.

Using Data Catalog tags

Data Catalog is Google Cloud's data discovery and metadata management service. As soon as you create a new table in BigQuery, it is automatically discoverable in Data Catalog. Data Catalog tracks all technical metadata related to a table, such as its name, description, time of creation, column names, and datatypes.

In addition to the metadata that is captured through the BigQuery integration, you can create schematized tags to track additional business information. For example, you may want to create a tag that tracks information about the source of the data, the analytics use case related to the data, or column-level information related to security and sharing. Going back to that email column we mentioned earlier, we can simply attach a column-level governance tag to the field and fill out the information by specifying that email_address is not encrypted, it does contain PII, and more specifically it contains an email address.

While this may seem like a fairly manual process, Data Catalog has a fully equipped API which allows tags to be created, attached, and updated programmatically. With tags and technical metadata captured in a single location, data consumers can come to Data Catalog and search for what they need.
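As a rough illustration of the two options, here is a hedged Python sketch: a plain key:value label set through the BigQuery client, and a column-level tag attached through the Data Catalog client. The project, dataset, and table names are placeholders, and the tag template (with fields like has_pii and pii_type) is hypothetical; you would create your own template first.

```python
# Hedged sketch: a table-level label vs. a column-level Data Catalog tag.
# Project, dataset, table, and the "governance" tag template are placeholders.
from google.cloud import bigquery, datacatalog_v1

# 1. A simple key:value label on the table (no schema, no column scope).
bq = bigquery.Client()
table = bq.get_table("my-project.ecommerce.transactions")   # placeholder table
table.labels = {"data_domain": "ecommerce"}
bq.update_table(table, ["labels"])

# 2. A schematized, column-level governance tag in Data Catalog.
dc = datacatalog_v1.DataCatalogClient()
entry = dc.lookup_entry(
    request={
        "linked_resource": (
            "//bigquery.googleapis.com/projects/my-project"
            "/datasets/ecommerce/tables/transactions"
        )
    }
)
tag = datacatalog_v1.Tag(
    template="projects/my-project/locations/us/tagTemplates/governance",  # hypothetical template
    column="email_address",
    fields={
        "has_pii": datacatalog_v1.TagField(bool_value=True),
        "pii_type": datacatalog_v1.TagField(string_value="EMAIL_ADDRESS"),
        "encrypted": datacatalog_v1.TagField(bool_value=False),
    },
)
dc.create_tag(parent=entry.name, tag=tag)
```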
Ingesting & staging data

With metadata for the production table in place, we need to focus on how to push data into this new table. As you probably know, there are lots of different ways to pre-process and ingest data into BigQuery. Often customers choose to stage data in Google Cloud services to kick off transformation, classification, or de-identification workflows. There are two pretty common paths for staging data for batch loading:

Stage data in a Cloud Storage bucket: Pushing data into a Cloud Storage bucket before ingesting it into BigQuery offers flexibility in terms of data structure and may be less expensive for storing large amounts of information. Additionally, you can easily kick off workflows when new data lands in a bucket by using Pub/Sub to trigger transformation jobs. However, since transformations will happen outside of the BigQuery service, data engineers will need familiarity with other tools or languages, and blob storage also makes it difficult to track column-level metadata.

Stage data in a BigQuery staging container: Pushing data into BigQuery gives you the opportunity to track metadata for specific fields earlier in the funnel, through BigQuery's integration with Data Catalog. When running scan jobs with Data Loss Prevention (we'll cover this in the next section), you can leave out specific columns and store the results directly in the staging table's metadata inside of Data Catalog. Additionally, transformations to prepare data for production can be done using SQL statements, which may make them easier to develop and manage.

Identifying (and de-identifying) sensitive information

One of the hardest problems related to data governance is identifying any sensitive information in new data. Earlier we talked through tracking known metadata in Data Catalog, but what happens if we don't know whether data contains any sensitive information? This is especially relevant for free-form text fields, like the comments field in our transactions. With the data staged in Google Cloud, there's an opportunity to programmatically identify any PII, or even remove sensitive information from the data, using Data Loss Prevention (DLP).

DLP can be used to scan data for different types of sensitive information such as names, email addresses, locations, credit card numbers, and others. You can kick off a scan job directly from BigQuery, Data Catalog, or the DLP service or API, and DLP can scan data that is staged in BigQuery or in Cloud Storage. Additionally, for data stored in BigQuery, you can have DLP push the results of the scan directly into Data Catalog.

You can also use the DLP API to de-identify data. For example, we may want to replace any instances of names, email addresses, and locations with an asterisk ("*"). In our case, we can leverage DLP to scan the comments column from our staging table in BigQuery, save the results in Data Catalog, and, if there are instances of sensitive data, run a de-identification workflow before pushing the sanitized data into the production table. Note that building a pipeline like the one we're describing does require the use of some other tools: we could use a Cloud Function to make the API call, and an orchestration tool like Cloud Composer to run each step in the workflow (trying to decide on the right orchestration tool? Check out this post). You can walk through an example of running a de-identification workflow using DLP and Composer in this post.
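Here is a hedged sketch of the DLP calls involved, inspecting a single comment string and masking what it finds. In a real pipeline you would point DLP at the staging table rather than at in-memory text, and the project ID and info types shown are just examples.

```python
# Hedged sketch: inspect a free-form comment for PII and mask it with "*".
# The project ID, text, and chosen info types are placeholders.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # placeholder project

comment = "Please ship to Jane Doe, jane.doe@example.com"
inspect_config = {
    "info_types": [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}],
}

# 1. Inspect: report which sensitive info types appear in the text.
findings = dlp.inspect_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "item": {"value": comment},
    }
)
for f in findings.result.findings:
    print(f.info_type.name, f.likelihood)

# 2. De-identify: replace matched values with "*" before loading to production.
masked = dlp.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "character_mask_config": {"masking_character": "*"}
                        }
                    }
                ]
            }
        },
        "item": {"value": comment},
    }
)
print(masked.item.value)
```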
Data sharing

BigQuery Identity and Access Management

Google Cloud as a whole leverages Identity and Access Management (IAM) to manage permissions across cloud resources. With IAM, you manage access control by defining who (identity) has what access (role) for which resource. BigQuery, like other Google Cloud resources, has several predefined roles, or you can create custom roles based on more granular permissions.

When it comes to granting access to BigQuery data, many administrators choose to grant Google Groups, representing the company's different departments, access to specific datasets or projects, so policies are simple to manage. You can see some examples of different business scenarios and the recommended access policies here. In our retail use case, we have one project for each team. Each team's Google Group would be granted the BigQuery Data Viewer role to access information stored in their team's project. However, there may be cases where someone from the ecommerce team needs data from a different project, like the product development team's project. One way to grant limited access to data is through the use of authorized views.

Protecting data with authorized views

Giving a view access to a dataset is also known as creating an authorized view in BigQuery. An authorized view allows you to share query results with particular users and groups without giving them access to the underlying source data. So in our case, we can simply write a query to grab the pieces of information the ecommerce team needs to effectively analyze the data and save that view into the existing ecommerce project that they already have access to.

Column-level access policies

Aside from controlling access to data using standard IAM roles, or granting access to query results through authorized views, you can also leverage BigQuery's column-level access policies. For example, remember that email address column we marked as containing PII earlier in this post? We may want to ensure that only members with high-security clearance have access to query those columns. We can do this by:

First, defining a taxonomy in Data Catalog, including a "High" policy tag for fields with high-security clearance.
Next, adding our group of users who need access to highly sensitive data as Fine-Grained Access Readers on the High resource.
Finally, setting the policy tag on the email column.

You can find some tips on creating column-level access policies in our documentation on best practices.

Row-level access policies

Aside from restricting access to certain fields in our new table, we may want to only grant users access to rows that are relevant to them. For example, analysts from different business units might only get access to rows that represent transactions for their business unit. In this case, the Google Group that represents the Activewear team should only have access to orders that were placed on items categorized as "Active". In BigQuery, we can accomplish this by creating a row-level access policy on the transactions table, as sketched below. You can find some tips on creating row-level access policies in our documentation on best practices.
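For the Activewear example, the row-level policy is a single DDL statement; here is a minimal sketch run through the BigQuery Python client, with the table, group, and column names as placeholders.

```python
# Minimal sketch: restrict the Activewear team's group to rows where
# category = "Active". Table, group, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE OR REPLACE ROW ACCESS POLICY activewear_only
ON `my-project.ecommerce.transactions`
GRANT TO ("group:activewear-team@example.com")
FILTER USING (category = "Active");
"""
client.query(ddl).result()  # analysts in that group now see only "Active" rows
```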
When to use what for data sharing

At the end of the day, you can achieve your goal of securing data using one or more of the concepts we discussed earlier. Authorized views add a layer of abstraction to sharing data by providing the necessary information to certain users without giving them direct access to the underlying dataset. For cases where you want to transform the data (e.g. pre-aggregate before sharing), authorized views are ideal. While authorized views can be used for managing column-level access, it may be preferable to leverage Data Catalog, since you can centralize access knowledge in a single table's metadata and control access through hierarchical taxonomies. Similarly, leveraging row-level access policies instead of authorized views to filter out rows may be preferable when it is easier to manage a single table with multiple access policies than multiple authorized views in different places.

Monitoring data quality

One last element of data governance that we'll discuss here is monitoring data quality. The quality of your BigQuery data can drop for many different reasons: maybe there was a problem in the data source, or an error in your transformation pipeline. Either way, you'll want to know if something is amiss and have a way to inform data consumers at your organization. Just like we described earlier, you can leverage an orchestration tool like Cloud Composer to create pipelines for running different SQL validation tests.

Validation tests can be created in a few different ways:

One option is to leverage open source frameworks, like this one that our professional services team put together. Using frameworks like these, you can declare rules for when validation tests pass or fail.
Similarly, you can use a tool like Dataform, which offers the ability to leverage YAML files to declare validation rules. Dataform recently came under the Google Cloud umbrella and will be open to new customers soon; join the waitlist here!
Alternatively, you can always roll your own solution by programmatically running queries using built-in BigQuery functionality like ASSERT. If the assertion is not valid, BigQuery returns an error that can inform the next step in your pipeline (see the sketch below).

Based on the outcome of the validation test, you can have Composer send you a notification using Slack or other built-in notifiers. Finally, you can use Data Catalog's API to update a tag that tracks the data quality for the given table; check out some example code here! With this information added to Data Catalog, it becomes searchable by data consumers at your organization so that they can stay informed on the quality of information they use in their analysis.
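As a concrete example of the ASSERT option mentioned above, here is a minimal sketch: the query fails when the rule is violated, and that failure can drive the next step in a Composer DAG or a notification. The table and the rule itself are placeholders.

```python
# Minimal sketch: a SQL validation test using BigQuery's ASSERT statement.
# The table name and the rule (no NULL order IDs) are placeholders.
from google.cloud import bigquery
from google.api_core.exceptions import GoogleAPICallError

client = bigquery.Client()

check = """
ASSERT (
  SELECT COUNT(*)
  FROM `my-project.ecommerce.transactions`
  WHERE order_id IS NULL
) = 0 AS "found transactions with a NULL order_id";
"""

try:
    client.query(check).result()
    print("validation passed")
except GoogleAPICallError as err:  # a failed assertion surfaces as a query error
    print("validation failed:", err)
    # e.g. fail the Composer task, send a notification, or update the
    # data-quality tag in Data Catalog
```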
What's next?

One thing that we didn't mention in this post, but is certainly relevant to data governance, is ongoing monitoring around usage auditing and access policies. We'll be going into more detail on this in a few weeks when we cover BigQuery monitoring as a whole. Be sure to keep an eye out for more in this series by following me on LinkedIn and Twitter!
Source: Google Cloud Platform