Cloud Computing Köln - Seite 1540 von 6935 - Neues zu Cloud Computing, Internet of Things und Technologien

medium.com – Compass has been growing rapidly over the last couple of years. As we became the largest independent brokerage in terms of 2021 closed sales volume as of March 25, 2022, our tech team has been…
Quelle: news.kubernauts.io

10. Dezember 2022

da Agency

How StreamNative facilitates integrated use of Apache Pulsar through Google Cloud

StreamNative, a company founded by the original developers of Apache Pulsar and Apache BookKeeper, is partnering Google Cloud to build a streaming platform on open source technologies. We are dedicated to helping businesses generate maximum value from their enterprise data by offering effortless ways to realize real-time data streaming. Following the release of StreamNative Cloud in August 2020, which provides scalable and reliable Pulsar-Cluster-as-a-Service, we introduced StreamNative Cloud for Kafka. This is to enable a seamless switch between Kafka API and Pulsar. We then launched StreamNative Platform to support global event streaming data platforms in multi-cloud and hybrid-cloud environments.By leveraging our fully-managed Pulsar infrastructure services, our enterprise customers can easily build their event-driven applications with Apache Pulsar and get real-time value from their data. There are solid reasons why Apache Pulsar has become one of the most popular messaging platforms in modern cloud environments, and we have strong beliefs in its capabilities of simplifying building complex event-driven applications. The most prominent benefits of using Apache Pulsar to manage real-time events include:Single API: When building a complex event-driven application, it traditionally requires linking multiple systems to support queuing, streaming and table semantics. Apache Pulsar frees developers from the headache of managing multiple APIs by offering one single API that supports all messaging-related workloads.Multi-tenancy: With the built-in multi-tenancy feature, Apache Pulsar enables secure data sharing across different departments with one global cluster. This architecture not only helps reduce infrastructure costs, but also avoids data silos.Simplified application architecture: Pulsar clusters can scale to millions of topics while delivering consistent performance, which means that developers don’t have to restructure their applications when the number of topic-partitions surpasses hundreds. The application architecture can therefore be simplified.Geo-replication: Apache Pulsar supports both synchronous and asynchronous geo-replication out-of-the-box, which makes building event-driven applications in multi-cloud and hybrid-cloud environments very easy.Facilitating integration between Apache Pulsar and Google CloudTo allow our customers to fully enjoy the benefits of Apache Pulsar, we’ve been working on expanding the Apache Pulsar ecosystem by improving the integration between Apache Pulsar and powerful cloud platforms like Google Cloud. In mid-2022, we added Google Cloud Pub/Sub Connector for Apache Pulsar, which enables seamless data replication between Pub/Sub and Apache Pulsar, and Google Cloud BigQuery Sink Connector for Apache Pulsar, which synchronizes Pulsar data to BigQuery in real time, to the Apache Pulsar ecosystem.Google Cloud Pub/Sub Connector for Apache Pulsar uses Pulsar IO components to realize fully-featured messaging and streaming between Pub/Sub and Apache Pulsar, which has its own distinctive features. Using Pub/Sub and Apache Pulsar at the same time enables developers to realize comprehensive data streaming features on their applications. However, it requires significant development effort to establish seamless integration between the two tools, because data synchronization between different messaging systems depends on the functioning of applications. When applications stop working, the message data cannot be passed on to the other system.Our connector solves this problem by fully integrating with Pulsar’s system. There are two ways to import and export data between Pub/Sub and Pulsar. The first, is the Google Cloud Pub/Sub source that feeds data from Pub/Sub topics and writes data to Pulsar topics. Alternatively, the Google Cloud Pub/Sub sink can pull data from Pulsar topics and persist data to Pub/Sub topics. Using Google Cloud Pub/Sub Connector for Apache Pulsar brings three key advantages:Code-free integration: No code-writing is needed to move data between Apache Pulsar and Pub/Sub.High scalability: The connector can be run on both standalone and distributed nodes, which allows developers to build reactive data pipelines in real time to meet operational needs.Less DevOps resources required: The DevOps workloads of setting up data synchronization are greatly reduced, which translates into more resources to be invested in unleashing the value of data.By using the BigQuery Sink Connector for Apache Pulsar, organizations can write data from Pulsar directly to BigQuery. This is unlike before, where developers could only use Cloud Storage Sink Connector for Pulsar to move data to Cloud Storage, and then query the imported data with external tables in BigQuery which had many limitations, including low query performance and no support for clustered tables.Pulling data from Pulsar topics and persisting data to BigQuery tables, our BigQuery sink connector supports real-time data synchronization between Apache Pulsar and BigQuery. Just like our Pub/Sub connector, Google Cloud BigQuery Sink Connector for Apache Pulsar is a low-code solution that supports high scalability and greatly reduces DevOps workloads. Furthermore, our BigQuery connector possesses the Auto Schema feature, which automatically creates and updates BigQuery table structures based on the Pulsar topic schemas to ensure smooth and continuous data synchronization.Simplifying Pulsar resource management on KubernetesAll the products of StreamNative are built on Kubernetes, and we’ve been developing tools that can simplify resource management on Kubernetes platforms like Google Cloud Kubernetes (GKE). In August 2022, we introduced Pulsar Resources Operator for Kubernetes, which is an independent controller that provides automatic full lifecycle management for Pulsar resources on Kubernetes.Pulsar Resources Operator uses manifest files to manage Pulsar resources, which allows developers to get and edit resource policies through the Topic Custom Resources that render the full field information of Pulsar policies. It enables easier Pulsar resource management compared with using command line interface (CLI) tools, because developers no longer need to remember numerous commands and flags to retrieve policy information. Key advantages of using Pulsar Resources Operator for Kubernetes include:Easy creation of Pulsar resources: By applying manifest files, developers can swiftly initialize basic Pulsar resources in their continuous integration (CI) workflows when creating a new Pulsar cluster.Full integration with Helm: Helm is widely used as a package management tool in cloud-native environments. Pulsar Resource Operator can seamlessly integrate with Helm, which allows developers to manage their Pulsar resources through Helm templates.How you can contributeWith the release of Google Cloud Pub/Sub Connector for Apache Pulsar, Google Cloud BigQuery Sink Connector for Apache Pulsar, and Pulsar Resources Operator for Kubernetes, we have unlocked the application potential of open tools like Apache Pulsar by making them simpler to build, easier to manage, and extended their capabilities. Now, developers can build and run Pulsar clusters more efficiently and maximize the value of their enterprise data. These three tools are community-driven services and have their source codes hosted in the StreamNative GitHub repository. Our team welcomes all types of contributions for the evolution of our tools. We’re always keen to receive feature requests, bug reports and documentation inquiry through GitHub, emails or Twitter.
Quelle: Google Cloud Platform

10. Dezember 2022

da Agency

How to build comprehensive customer financial profiles with Elastic Cloud and Google Cloud

Financial institutions have vast amounts of data about their customers. However, many of them struggle to leverage data to their advantage. Data may be sitting in silos or trapped on costly mainframes. Customers may only have access to a limited quantity of data, or service providers may need to search through multiple systems of record to handle a simple customer inquiry. This creates a hazard for providers and a headache for customers. Elastic and Google Cloud enable institutions to manage this information. Powerful search tools allow data to be surfaced faster than ever – Whether it’s card payments, ACH (Automated Clearing House), wires, bank transfers, real-time payments, or another payment method. This information can be correlated to customer profiles, cash balances, merchant info, purchase history, and other relevant information to enable the customer or business objective. This reference architecture enables these use cases:1. Offering a great customer experience: Customers expect immediate access to their entire payment history, with the ability to recognize anomalies. Not just through digital channels, but through omnichannel experiences (e.g. customer service interactions).2. Customer 360: Real-time dashboards which correlates transaction information across multiple variables, offering the business a better view into their customer base, and driving efforts for sales, marketing, and product innovation.Customer 360: The dashboard above looks at 1.2 billion bank transactions and gives a breakdown of what they are, who executes them, where they go, when and more. At a glance we can see who our wealthiest customers are, which merchants our customers send the most money to, how many unusual transactions there are – based on transaction frequency and transaction amount, when folks spend money and what kind spending and income they have.3. Partnership management: Merchant acceptance is key for payment providers. Having better access to present and historical merchant transactions can enhance relationships or provide leverage in negotiations. With that, banks can create and monetize new services.4. Cost optimization: Mainframes are not designed for internet-scale access. Along-side with technological limitation, the cost becomes a prohibitive factor. While Mainframes will not be replaced any time sooner, this architecture will help to avoid costly access to data to serve new applications.5. Risk reduction: By standardizing on the Elastic Stack, banks are longer limited in the number of data sources they can ingest. With this, banks can better respond to call center delays and potential customer-facing impacts like natural disasters. By deploying machine learning and alerting features, banks can detect and stamp out financial fraud before it impacts member accounts.Fraud detection: The Graph feature of Elastic helped a financial services company to identify additional cards that were linked via phone numbers and amalgamations of the original billing address on file with those two cards. The team realized that several credit unions, not just the original one where the alert originated from, were being scammed by the same fraud ring.ArchitectureThe following diagram shows the steps to move data from Mainframe to Google Cloud, process and enrich the data in BigQuery, then provide comprehensive search capabilities through Elastic Cloud.This architecture includes the following components:Move Data from Mainframe to Google CloudMoving data from IBM z/OS to Google Cloud is straightforward with the Mainframe Connector, by following simple steps and defining configurations. The connector runs in z/OS batch job steps and includes a shell interpreter and JVM-based implementations of gsutil, bq and gcloud command-line utilities. This makes it possible to create and run a complete ELT pipeline from JCL, both for the initial batch data migration and ongoing delta updates.A typical flow of the connector includes:Reading the mainframe datasetTranscoding the dataset to ORCUploading ORC file to Cloud StorageRegister ORC file as an external table or load as a native tableSubmit a Query job containing a MERGE DML statement to upsert incremental data into a target table or a SELECT statement to append to or replace an existing tableHere are the steps to install the BQ MainFrame Connector:copy mainframe connector jar to unix filesystem on z/OScopy BQSH JCL procedure to a PDS on z/OSedit BQSH JCL to set site specific environment variablesPlease refer to the BQ Mainframe connector blog for example configuration and commands.Process and Enrich Data in BigQueryBigQuery is a completely serverless and cost-effective enterprise data warehouse. Its serverless architecture lets you use SQL language to query and enrich Enterprise scale data. And its scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. An integrated BQML and BI Engine enables you to analyze the data and gain business insights. Ingest Data from BQ to Elastic CloudDataflow is used here to ingest data from BQ to Elastic Cloud. It’s a serverless, fast, and cost-effective stream and batch data processing service. Dataflow provides an Elasticsearch Flex Template which can be easily configured to create the streaming pipeline. This blog from Elastic shows an example on how to configure the template.Cloud Orchestration from MainframeIt’s possible to load both BigQuery and Elastic Cloud entirely from a mainframe job, with no need for an external job scheduler.To launch the Dataflow flex template directly, you can invoke the gcloud dataflow flex-template run command in a z/OS batch job step.If you require additional actions beyond simply launching the template, you can instead invoke the gcloud pubsub topics publish command in a batch job step after your BigQuery ELT steps are completed, using the –attribute option to include your BigQuery table name and any other template parameters. The pubsub message can be used to trigger any additional actions within your cloud environment.To take action in response to the pubsub message sent from your mainframe job, create a Cloud Build Pipeline with a pubsub trigger and include a Cloud Build Pipeline step that uses the gcloud builder to invoke gcloud dataflow flex-template run and launch the template using the parameters copied from the pubsub message. If you need to use a custom dataflow template rather than the public template, you can use the git builder to checkout your code followed by the maven builder to compile and launch a custom dataflow pipeline. Additional pipeline steps can be added for any other actions you require.The pubsub messages sent from your batch job can also be used to trigger a Cloud Run service or a GKE service via Eventarc and may also be consumed directly by a Dataflow pipeline or any other application.Mainframe Capacity PlanningCPU consumption is a major factor in mainframe workload cost. In the basic architecture design above, the Mainframe Connector runs on the JVM and runs on zIIP processor. Relative to simply uploading data to cloud storage, ORC encoding consumes much more CPU time. When processing large amounts of data it’s possible to exhaust zIIP capacity and spill workloads onto GP processors. You may apply the following advanced architecture to reduce CPU consumption and avoid increased z/OS processing costs.Remote Dataset Transcoding on Compute Engine VMTo reduce mainframe CPU consumption, ORC file transcoding can be delegated to a GCE instance. A gRPC service is included with the mainframe connector specifically for this purpose. Instructions for setup can be found in the mainframe connector documentation. Using remote ORC transcoding will significantly reduce CPU usage of the Mainframe Connector batch jobs and is recommended for all production level BigQuery workloads. Multiple instances of the gRPC service can be deployed behind a load balancer and shared by all Mainframe Connector batch jobs.Transfer Data via FICON and InterconnectGoogle Cloud technology partners offer products to enable transfer of mainframe datasets via FICON and 10G ethernet to Cloud Storage. Obtaining a hardware FICON appliance and Interconnect is a practical requirement for workloads that transfer in excess of 500GB daily. This architecture is ideal for integration of z/OS and Google Cloud because it largely eliminates data transfer related CPU utilization concerns.We really appreciate Jason Mar from Google Cloud who provided rich context and technical guidance regarding the Mainframe Connector, and Eric Lowry from Elastic for his suggestions and recommendations, and the Google Cloud and Elastic team members who contributed to this collaboration.
Quelle: Google Cloud Platform