AWS Glue crawlers support incremental crawling of Amazon S3 on existing AWS Glue Data Catalog tables

AWS Glue includes crawlers that are based on Amazon S3 event notifications. This feature simplifies dataset discovery by scanning only the data indicated by Amazon S3 events. The Glue crawler extracts the data schema and automatically populates the AWS Glue Data Catalog, so the metadata is always current. Crawling datasets based on S3 events shortens the time to insight by making newly ingested data quickly available for analysis with your preferred analytics and machine learning tools.
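
As a rough illustration of the event-based mode described above, the following Python sketch builds the arguments for the Glue CreateCrawler API with an S3 target that reads change events from an SQS queue. The crawler name, role ARN, database, bucket path, and queue ARN are placeholders and not taken from the announcement, and the sketch uses a plain S3 target rather than a Data Catalog target for simplicity.

```python
def build_event_crawler_request(name, role_arn, database, s3_path, queue_arn):
    """Build keyword arguments for Glue's CreateCrawler call in S3 event mode.

    All argument values are supplied by the caller; nothing here is
    specific to a real account.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                # EventQueueArn points at the SQS queue that receives
                # the S3 event notifications for this path.
                {"Path": s3_path, "EventQueueArn": queue_arn}
            ]
        },
        # CRAWL_EVENT_MODE tells the crawler to scan only the changes
        # reported by S3 events instead of the whole path.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    }


def create_event_crawler(request):
    """Send the request to AWS Glue (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    glue = boto3.client("glue")
    return glue.create_crawler(**request)
```

Keeping the request builder separate from the API call makes it possible to inspect or unit-test the configuration without touching an AWS account.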
Source: aws.amazon.com

Amazon EC2 High Memory instances are now available in the Asia Pacific (Singapore), Canada (Central), and AWS GovCloud (US-East) Regions

Starting today, Amazon EC2 High Memory instances with 3 TiB of memory (u-3tb1.56xlarge) are available in the Asia Pacific (Singapore) and Canada (Central) Regions. In addition, High Memory instances with 6 TiB of memory (u-6tb1.56xlarge, u-6tb1.112xlarge) are now available in the Canada (Central) Region, and High Memory instances with 12 TiB of memory (u-12tb1.112xlarge) are now available in AWS GovCloud (US-East). Customers can use these new High Memory instances with the On-Demand and Savings Plan purchase options.
Source: aws.amazon.com

Introducing automated failover for private workloads using Cloud DNS routing policies with health checks

High availability is an important consideration for many customers, and we’re happy to introduce health checking for private workloads in Cloud DNS to build business continuity/disaster recovery (BC/DR) architectures. Typical BC/DR architectures are built using multi-regional deployments on Google Cloud. In a previous blog post, we showed how highly available global applications can be published using Cloud DNS routing policies. The globally distributed, policy-based DNS configuration provided reliability, but in case of a failure, it required manual intervention to update the geo-location policy configuration. In this blog we will use Cloud DNS health check support for Internal Load Balancers to automatically fail over to healthy instances.

We will use the same setup we used in the previous blog. We have an internal knowledge-sharing web application. It uses a classic two-tier architecture: front-end servers tasked to serve web requests from our engineers and back-end servers containing the data for our application. Our San Francisco, Paris, and Tokyo engineers will use this application, so we decided to deploy our servers in three Google Cloud regions for better latency, performance, and lower cost.

High level design

The wiki application is accessible in each region via an Internal Load Balancer (ILB). Engineers use the domain name wiki.example.com to connect to the front-end web app over Interconnect or VPN. The geo-location policy will use the Google Cloud region where the Interconnect or VPN lands as the source for the traffic and look for the closest available endpoint.

[Figure: DNS resolution based on the location of the user]

With the above setup, if our application in one of the regions goes down, we have to manually update the geo-location policy and remove the affected region from the configuration. Until someone detects the failure and updates the policy, the end users close to that region will not be able to reach the application. Not a great user experience.

How can we design this better? Google Cloud is introducing Cloud DNS health check support for Internal Load Balancers. For an internal TCP/UDP load balancer, we can use the existing health checks for a back-end service, and Cloud DNS will receive direct health signals from the individual back-end instances. This enables automatic failover when the endpoints fail their health checks.

For example, if the US front-end service is unhealthy, Cloud DNS may return the closest region load balancer IP (in our example, Tokyo’s) to the San Francisco clients depending on the latency.

[Figure: DNS resolution based on the location of the user and health of the ILB back ends]

Enabling the health checks for the wiki.example.com record provides us with automatic failover in case of a failure and ensures that Cloud DNS always returns only the healthy endpoints in response to the client queries. This removes manual intervention and significantly improves the failover time.

The Cloud DNS routing policy configuration would look like this:

Creating the Cloud DNS managed zone:

```shell
# Private zone for the wiki front end, visible only to the prod-vpc network
gcloud dns managed-zones create wiki-private-zone \
    --description="DNS Zone for the front-end servers of the wiki application" \
    --dns-name=wiki.example.com \
    --networks=prod-vpc \
    --visibility=private
```

Creating the Cloud DNS record set:

For health checking to work, we need to reference the ILB using the ILB forwarding rule name. If we use the ILB IP instead, then Cloud DNS will not check the health of the endpoint. See the official documentation page for more information on how to configure Cloud DNS routing policies with health checks.

```shell
# Geo-routed A record that maps each region to its ILB forwarding rule
gcloud dns record-sets create front.wiki.example.com. \
    --ttl=30 \
    --type=A \
    --zone=wiki-private-zone \
    --routing-policy-type=GEO \
    --routing-policy-data="us-west2=us-ilb-forwarding-rule;europe-west1=eu-ilb-forwarding-rule;asia-northeast1=asia-ilb-forwarding-rule" \
    --enable-health-checking
```

Note: Cloud DNS uses the health checks configured on the load balancers themselves. Users do not need to configure any additional health checks for Cloud DNS. See the official documentation page for information on how to create health checks for GCP Load Balancers.

With this configuration, if we were to lose the application in one region due to an incident, the health checks on the ILB would fail, and Cloud DNS would automatically resolve new user queries to the next closest healthy endpoint.

We can expand this configuration to ensure that front-end servers send traffic only to healthy back-end servers in the region closest to them. We would configure front-end servers to connect to the global hostname backend.wiki.example.com. The Cloud DNS geo-location policy with health checks will use the front-end servers’ GCP region information to resolve this hostname to the closest available healthy back-end tier Internal Load Balancer.

[Figure: Front-end to back-end communication (instance to instance)]

Putting it all together, we now have set up our multi-regional and multi-tiered application with DNS policies to automatically fail over to a healthy endpoint closest to the end user.
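
To make the failover behavior easier to reason about, here is a small Python simulation of a geo-location policy with health checking. It is only a sketch of the idea, not how Cloud DNS is implemented: the proximity order per region is an assumption made up for the example, and the forwarding rule names are the ones from the record set above.

```python
# Forwarding rule per region, as used in the record set above.
FORWARDING_RULES = {
    "us-west2": "us-ilb-forwarding-rule",
    "europe-west1": "eu-ilb-forwarding-rule",
    "asia-northeast1": "asia-ilb-forwarding-rule",
}

# Hypothetical "closest first" order for each client region.
# The real service decides proximity itself; these lists are assumptions.
PROXIMITY = {
    "us-west2": ["us-west2", "asia-northeast1", "europe-west1"],
    "europe-west1": ["europe-west1", "us-west2", "asia-northeast1"],
    "asia-northeast1": ["asia-northeast1", "us-west2", "europe-west1"],
}


def resolve(client_region, healthy_regions):
    """Return the forwarding rule of the closest healthy region.

    Returns None when no region is healthy. That is a choice made for
    this sketch, not a statement about what Cloud DNS answers then.
    """
    for region in PROXIMITY[client_region]:
        if region in healthy_regions:
            return FORWARDING_RULES[region]
    return None


if __name__ == "__main__":
    all_regions = set(FORWARDING_RULES)
    # Normal operation: San Francisco clients land in us-west2.
    print(resolve("us-west2", all_regions))
    # The US front end fails its health checks: traffic moves to Tokyo.
    print(resolve("us-west2", all_regions - {"us-west2"}))
```

The point of the sketch is that the health signal is part of the lookup itself, so no one has to edit the policy when a region goes down.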
Source: Google Cloud Platform

BigQuery’s performance and scale means that everyone gets to play

Editor’s note: Today, we’re hearing from telematics solutions company Geotab about how Google BigQuery enables them to democratize data across their entire organization and reduce the complexity of their data pipelines.

Geotab’s telematics devices and an extensive range of integrated sensors and apps record a wealth of raw vehicle data, such as GPS, engine speeds, ambient air temperatures, driving patterns, and weather conditions. With the help of our telematics solutions, our customers gain insights that help them optimize fleet operations, improve safety, and reduce fuel consumption. Google BigQuery sits at the heart of our platform as the data warehouse for our entire organization, ingesting data from our vehicle telematics devices and all customer-related data. Essentially, each of the nearly 3 billion raw data records that we collect every day across the organization goes into BigQuery, whatever its purpose. In this post, we’ll share why we leverage BigQuery to accelerate our analytical insights, and how it’s helped us solve some of our most demanding data challenges.

Democratizing big data with ease

As a company, Geotab manages geospatial data, but the general scalability of our data platform is even more critical for us than specific geospatial features. One of our biggest goals is to democratize the use of data within the company. If someone has an idea to use data to inform some aspect of the business better, they should have the green light to do that whenever they want.

Nearly every employee within our organization has access to BigQuery to run queries related to the projects that they have permission to see. Analysts, VPs, data scientists, and even users who don’t typically work with data have access to the environment to help solve customer issues and improve our product offerings.

While we have petabytes of information, not everything is big: our tables range in size from a few megabytes up to several hundred terabytes.
Of course, there are many tricks and techniques for optimizing query performance in the BigQuery environment, but most users don’t have to worry about optimization, parallelization, or scalability.

The beauty of the BigQuery environment is that it handles all of that for us behind the scenes. If someone needs insight from data and isn’t a BigQuery expert, we want them to be as comfortable querying those terabytes as they are on smaller tables, and this is where BigQuery excels. A user can write a simple query just as easily on a billion rows as on 100 rows without once thinking about whether BigQuery can handle the load. It’s fast, reliable, and frees up our time to rapidly iterate on product ideation and data exploration.

Geotab has thousands of dashboards and scheduled queries constantly running to provide insights for various business units across the organization. While we do hit occasional performance and optimization bumps, most of the time, BigQuery races through everything without a hiccup. Also, the fact that BigQuery is optimized for performance on small tables means we can spread our operations and monitoring across the organization without too much thought: 20% of the queries we run touch less than 6 MB of data, while 50% touch less than 800 MB. That’s why it’s important that BigQuery excels not only at scale but at throughput for more bite-sized applications. The confidence we have in BigQuery to handle these loads across so many disparate business units is part of why we continue to push for more and more teams to take a data-driven approach to their business objectives.

Reducing the complexity of the geospatial data pipeline

The ability of BigQuery to manage vast amounts of geospatial data has also changed our approach to data science.
On the scale we are operating, with tens of petabytes of data, it’s not feasible for us to operate with anything other than BigQuery. In the past, using open-source geospatial tools, we would hit limits at volumes of around 2.5 million data points. BigQuery allows us to model over 4 billion data points, which is game-changing. Even basic functions, such as ingesting and managing geospatial polygons, used to be a complex workflow to string together in Python with Dataflow. Now, those geographic data types are handled natively by BigQuery and can be streamed directly into a table.

Even better, all of the analytics, model building, and algorithm development can happen in that same environment, without ever leaving BigQuery. No other solution would provide geospatial model building and analytics at this scale in a single environment.

Here’s an example. We have datasets of vehicle movements through intersections. Even just a few years ago, we struggled to run an intersection dataset at scale and had to limit its use to one city at a time. Today, we are processing all the intersection data for the entire world every day without ever leaving BigQuery. Rather than worry about architecting a complex data pipeline across multiple tools, we can focus on what we want to do with the data and the business outcomes we are trying to achieve.

BigQuery is more than a data warehouse

We frequently deal with four or five billion data points in our analytics applications, and BigQuery functions like a data lake. It’s not just our SQL database: it also easily supports all of our unstructured data, such as BLOBs from our CRM systems or GIS data files as well as images. It’s been a fascinating experience to see SQL consuming more and more unstructured data and applying a more relational structure that makes it consumable and familiar to analysts with traditional database management skills.
A great example is BigQuery’s support for JSON functions, which allows us to take hierarchical non-uniform data structures of metadata from things like OpenStreetMap and store it natively in BigQuery with easy access to descriptive keys and values. As a result, we can hire a wider range of analysts for roles across the business, not just PhD-level data scientists, knowing they can work effectively with the data in BigQuery.

Even within our data science team, most of the things that we needed Python to accomplish a few years ago can now be done in SQL. That allows us to spend more time deriving insights rather than managing extended parts of the data pipeline. We also leverage SQL capabilities, such as stored procedures, to run within the data warehouse and churn through billions of data points with a five-second latency.

The ease of using SQL to access this data has made it possible to democratize data across our company and give everyone the opportunity to use data to improve outcomes for our customers and develop interesting new applications.

Reimagining innovation with Google Cloud

Over the years, we haven’t stayed with BigQuery because we have to; we want to. Google Cloud is helping us drive the insights that will fuel our future and the future of all organizations looking to raise the bar with data-driven insights and intelligence. BigQuery’s capabilities have continued to evolve along with our needs, with the addition of increasingly complex analytics, data science methodologies, geospatial support, and BQML. BigQuery offers Geotab an environment that provides a unique ability to manage, transform, and analyze geospatial data at enormous scale. It also makes it possible to aggregate all kinds of other structured and unstructured data needed for our business into a single source of truth, against which all of our analytics can be performed.
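
As a hedged illustration of the JSON access pattern mentioned above, the following Python sketch generates a BigQuery SQL statement that pulls named keys out of a JSON column with the JSON_VALUE function. The table name, the column name `tags`, and the keys `highway` and `name` are made-up examples in the style of OpenStreetMap metadata; they are not Geotab’s actual schema.

```python
import re

# Only allow simple identifiers for the JSON column and keys, since they
# are interpolated into the SQL text. The table name is NOT validated
# here and should come from trusted configuration.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_tag_query(table, json_column, keys):
    """Return a SELECT that extracts each key from a JSON column.

    Each key becomes one output column via JSON_VALUE(column, '$.key').
    """
    for name in [json_column, *keys]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    columns = ",\n  ".join(
        f"JSON_VALUE({json_column}, '$.{key}') AS {key}" for key in keys
    )
    return f"SELECT\n  {columns}\nFROM `{table}`"


if __name__ == "__main__":
    # Hypothetical table holding OpenStreetMap-style features.
    print(build_tag_query(
        "my_project.my_dataset.osm_features", "tags", ["highway", "name"]
    ))
```

The generated statement could then be run from the BigQuery console or a client library. The sketch stops at producing the SQL so it stays independent of any project or credentials.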
Source: Google Cloud Platform

How UX researchers make Google Cloud better with user feedback

Customer experiences are critical to user experience (UX) researchers at every level of developing Google Cloud products. Whether it’s migrating a user to Google Cloud, helping them understand it once they are there, or delving into using Cloud services, one thing is clear: learning from the people who use Google Cloud is fundamental.

Understanding our users

UX researchers touch various points of the customer journey, like migration, cloud operations, and data analytics. In each of these areas, understanding the customer’s business needs and goals grants the UX researcher greater insight into how to provide the best possible experience. This is primarily done by engaging with user feedback. From widely used products like Google Kubernetes Engine and BigQuery to the targeted solutions of the Recommendation API and Error Reporting tools, Google Cloud’s team of UX researchers is pursuing a deep understanding of customers’ workflows and pain points. Between in-person and remote sessions with UX researchers and online surveys, our Google User Experience Research program offers a range of options for customers to engage in user feedback.

Applying insights to our products

Google’s researchers work with our product development team to act on user feedback. Insights learned during one of Google Cloud’s early customer migrations resulted in the co-creation of our beta and general availability versions of Migrate for Windows Containers in Google Slides. Not only did this underscore the importance of proactive and collaborative customer integration, but it was well received in the developer community. The user group also broadened the Google Cloud team’s perspective on the Error Reporting system by requesting that the product solutions expand to handle more categories of errors and events. As a result, the Error Reporting system now catches more types of issues, which are also presented to customers, making the reports more useful for them.

Advocating for user insights

Researchers are the champions of usability in applications, a principle that guides UX researchers as they tap into real-world customer experiences to inspire new roadmaps and ways of working. This process is made possible when users share their lived experiences with our researchers. If you are interested in helping Google Cloud become more helpful for everyone, sign up to be a part of the Cloud UX team’s user participant pool.
Source: Google Cloud Platform