Announcing curated detections in Chronicle SecOps Suite

A critical component of any security operations team’s job is to deliver high-fidelity detections of potential threats across the breadth of adversary tactics. But increasingly sophisticated threat actors, an expanding attack surface, and an ever-present cybersecurity talent shortage make this task more challenging than ever. Google keeps more people safe online than anyone else. Individuals, businesses, and governments globally depend on our products that are secure by design and secure by default. Part of the “magic” behind Google’s security is the sheer scale of threat intelligence we are able to derive from our billions of users, browsers, and devices. Today, we are putting the power of Google’s intelligence in the hands of security operations teams.

We are thrilled to announce the general availability of curated detections as part of our Chronicle SecOps Suite. These detections are built and actively maintained by our Google Cloud Threat Intelligence (GCTI) team to reduce manual toil for your team. They provide security teams with high-quality, actionable, out-of-the-box threat detection content curated, built, and maintained by GCTI researchers. Our scale and depth of intelligence, gained by securing billions of users every day, gives us a unique vantage point to craft effective and targeted detections. These native detection sets cover a wide variety of threats for cloud and beyond, including Windows-based attacks like ransomware, remote-access tools (RATs), infostealers, data exfiltration, suspicious activity, and weakened configurations.

With this launch, security teams can readily leverage Google’s expertise and unique visibility into the threat landscape. This release helps understaffed and overstressed security teams keep up with an ever-evolving threat landscape, quickly identify threats, and drive effective investigation and response. With this new release, security teams can:
- Enable high-quality curated detections with a single click from within the Chronicle console.
- Operationalize data with high-fidelity threat detections, stitched with context available from authoritative sources (such as IAM and CMDB).
- Accelerate investigation and response by finding anomalous assets and domains with prevalence visualization for the detections triggered.
- Map detection coverage to the MITRE ATT&CK framework to better understand adversary tactics and techniques and uncover potential gaps in defenses.

Detections are constantly updated and refined by GCTI researchers based on the evolving threat landscape. The first release of curated detections includes two categories that cover a broad range of threats:
- Windows-based threats: Coverage for several classes of threats including infostealers, ransomware, RATs, misused software, and crypto activity.
- Cloud attacks and cloud misconfigurations: Secure cloud workloads with additional coverage around exfiltration of data, suspicious behavior, and additional vectors.

Let’s look at an example of how you can put curated detections to work within the Chronicle dashboard, monitor coverage, and map to MITRE ATT&CK®. An analyst can learn more details about specific detections and understand how they map to the MITRE ATT&CK framework. There are customized settings to configure deployment and alerting, and to specify exceptions via reference lists. You can see each rule that has generated a detection against your log data in the Chronicle rules dashboard.
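The curated rule content itself is authored and maintained by GCTI rather than by customers, but rules in the Chronicle rules dashboard are expressed in the YARA-L detection language. As a rough, hypothetical illustration only (this is not one of the curated detections, and the rule logic shown is a sketch), a simple YARA-L 2.0 rule looks something like this:

```
rule illustrative_encoded_powershell_launch {
  meta:
    author = "example"
    description = "Hypothetical sketch: flags PowerShell launched with an encoded command"
    severity = "HIGH"

  events:
    // UDM event for a process launch with a suspicious command line.
    $e.metadata.event_type = "PROCESS_LAUNCH"
    $e.principal.process.command_line = /powershell.*-enc/ nocase

  condition:
    $e
}
```

Curated rule sets ship ready to enable with a single click, so writing rules like this is optional; the same dashboard is where you scope them with reference lists and review the detections they generate.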
You can observe detections associated with the rule and pivot to investigative views. For example, here is the detection view from the timeline of an Empire PowerShell stager launch triggered by the Windows RAT rule set. You can also easily pivot to associated information and investigate the asset on which it was triggered.

By surfacing impactful, high-efficacy detections, Chronicle can enable analysts to spend time responding to actual threats and reduce alert fatigue. Our customers who used curated detections during our public preview were able to detect malicious activity and take action to prevent threats earlier in their lifecycle. And there’s more to come: we will be delivering a steady release of new detection categories covering a wide variety of threats, community-driven content, and other out-of-the-box analytics.

Ready to put Google’s intelligence to work in your security operations center? Contact Google Cloud sales or your customer success manager (CSM). You can also learn more about all these new capabilities in Google Chronicle in our product documentation.

Thank you to Mike Hom (Product Architect, Chronicle) and Ben Walter (Engineering Manager, Google Cloud Threat Intelligence), who helped with this launch.

Related article: Introducing Cloud Analytics by MITRE Engenuity Center in collaboration with Google Cloud. To better analyze the growing volumes of heterogeneous security data, Google has partnered with MITRE to create the Cloud Analytics proje…
Source: Google Cloud Platform

How a Vulnerability Exploitability eXchange can help healthcare prioritize cybersecurity risk

Diagnosing and treating chronic pain can be complex, difficult, and full of uncertainties for a patient and their treating physician. Depending on the condition of the patient and the knowledge of the physician, making the correct diagnosis takes time, and experimenting with different treatments might be required. This trial-and-error process can leave the patient in a world of pain and confusion until the best remedies can be prescribed.

It’s a situation similar to the daily struggle that many of today’s security operations teams face. Screaming from the mountaintops “just patch it!” isn’t very helpful when security teams aren’t sure whether applying a patch might create even worse issues like crashes, incompatibility, or downtime. Like a patient with chronic pain, they may not know the source of the pain in their system. Determining which vulnerabilities to prioritize patching, and ensuring those fixes actually leave you with a more secure system, is one of the hardest tasks a security team can face. This is where a Vulnerability Exploitability eXchange (VEX) comes in.

The point of VEX

In previous blogs, we’ve discussed how establishing visibility and awareness into patient safety and technology is vital to creating a resilient healthcare system. We’ve also looked at how combining software bills of materials (SBOM) with Google’s Supply chain Levels for Software Artifacts (SLSA) framework can help build more secure technology that enables resilience. The SBOM provides visibility into the software you’re using and where it comes from, while SLSA provides guidelines that help increase the integrity and security of the software you then build. Rapid diagnostic assessments can be added to that equation with VEX, which the National Telecommunications and Information Administration describes as a “companion” document that lives side-by-side with the SBOM.

To go back to our medical metaphor, VEX is a mechanism for software providers to tell security teams where to look for the source of the pain. VEX data can help with software audits when inventory and vulnerability data need to be captured at a specific point in time. That data can also be embedded into automated security tools to make it easier to prioritize vulnerability patching. You can then think of SBOM as the prescription label on a bottle of medication, SLSA as the child-proof lid and tamper-proof seal guaranteeing the safety of the medication, and VEX as the bottle’s safety warnings.

As a diagnostic aid, a VEX can help security teams make accurate diagnoses of “what could hurt” and identify system weaknesses before the bad guys do. Yet making an accurate assessment of that threat model can be challenging, especially when looking at the software we use to run systems. The ability to quickly and accurately evaluate an organization’s weaknesses and pain points can be vital to hastening the response to a vulnerability and stopping cyberattacks before they become destructive. We believe that VEX is an important part of the equation to help secure the software supply chain. As an example, look no further than the Apache Log4j vulnerabilities revealed in December 2021. Global industries including healthcare were dealt another blow when Apache’s Log4j 2 logging system was found to be so vulnerable that relatively unsophisticated threat actors could quickly infiltrate and take over systems.
Through research conducted by Google and information contributed by CISA, we learned of examples where vulnerabilities in Log4j 2, a single software component, could potentially impact thousands of companies using software that depends on it, because of its near-ubiquitous use. While a VEX would not capture zero-day vulnerabilities, it would be able to inform security teams of other known vulnerabilities in Log4j 2. Once vulnerabilities have been published, security teams could use an SBOM to find them, and use VEX to understand whether remediation is a priority.

How does VEX contribute to visibility?

A key reason we focus on visibility mechanisms like SBOM and SLSA is that they give us the ability to understand our risks. Without the ability to see into what we must protect, it can be difficult to determine how to quickly reduce risk. Visibility is a crucial first step to stopping malicious hackers. Yet without context, visibility leaves security teams overwhelmed with data. Why? Well, where would you start when trying to mitigate the 30,000 known vulnerabilities affecting just open source software, according to the Open Source Vulnerabilities database (OSV)? NIST’s National Vulnerability Database (NVD) is tracking close to 181,000 vulnerabilities. We’ll be patching into the next millennium if we adopt a “patch everything” approach.

It’s impossible to address every vulnerability individually. To make progress, security teams need to be able to prioritize findings and go after the ones that will have the greatest impact first. The goal of a VEX artifact is to make prioritization a little easier. While SBOMs are created or changed when the material included in a build is updated, VEX documents are intended to be changed and distributed when a new vulnerability or threat emerges. This means that VEX and SBOM should be maintained separately. Since security researchers and organizations are constantly discovering new cybersecurity vulnerabilities and threats, a more dynamic mechanism like VEX can help ensure builders and operators have the ability to quickly ascertain the risks of the software they are using.

Let’s dig into this VEX example from CycloneDX. You can see the list of vulnerabilities found, the third parties who track and report those vulnerabilities, vulnerability ratings per CVSS, and most importantly, a statement from the developer that guides the operator reading the VEX to those vulnerabilities that are exploitable and need to be protected against. At the bottom, you’ll see that the VEX “affects” an SBOM. This information allows the user of the VEX document to refer to its companion SBOM. By necessity, the VEX is intentionally decoupled from the SBOM because they need to be updated at different times. A VEX document will need to be updated when new vulnerabilities emerge. An SBOM will need to be updated when changes to the software are made by a manufacturer. Although they can and need to be updated separately, the contents of each document can stay aligned because they are linked.
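To make that structure concrete, here is a minimal, hypothetical CycloneDX-style VEX fragment. It is not the CycloneDX example referenced above; the CVE, component reference, and analysis values are illustrative only:

```
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "vulnerabilities": [
    {
      "id": "CVE-2021-44228",
      "source": { "name": "NVD", "url": "https://nvd.nist.gov/vuln/detail/CVE-2021-44228" },
      "ratings": [
        { "score": 10.0, "severity": "critical", "method": "CVSSv31" }
      ],
      "analysis": {
        "state": "not_affected",
        "justification": "code_not_reachable",
        "detail": "The vulnerable JNDI lookup path is never invoked by this product."
      },
      "affects": [
        { "ref": "urn:cdx:f0a1b2c3-1111-2222-3333-444455556666/1#pkg:maven/com.example/app@1.2.3" }
      ]
    }
  ]
}
```

The analysis block is the developer statement the operator acts on, and the affects reference is the link back to the companion SBOM described above.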
Increasing resilience powered by visibility: SBOM+VEX+SLSA

VEX could dramatically improve how security vulnerabilities are handled. It’s not uncommon to find operators buried in vulnerabilities, best-guessing the ones that need fixing, and trying to make sense of tens (and sometimes hundreds) of pages of documentation to determine the best, lowest-impact fix. With SBOM+SLSA+VEX, operators are using software-driven mechanisms to conduct analyses and evaluate risk instead of relying on intuition and best guesses.

The tripartite SBOM+SLSA+VEX approach provides an up-to-date list of issues and perspective on what needs attention. This is a transformative development in security, enabling teams to get a better handle on vulnerability mitigation, starting where it could hurt the most.

Driven by repeated cyberattacks on critical infrastructure such as healthcare, government regulators have taken a more interested stance in software security and supply chains. Strengthening the effectiveness of SBOMs in the United States is a big part of the newly proposed Protecting and Transforming Cyber Health Care (PATCH) Act. The law would require medical device manufacturers to adhere to minimum cybersecurity standards in their products, including the creation of SBOMs for their devices, and to plan to monitor and patch any cybersecurity vulnerabilities that are discovered during the device’s lifetime. Meanwhile, new draft medical device cybersecurity guidance from the FDA continues that agency’s involvement in aggressively encouraging medical device manufacturers to improve the cybersecurity resilience of their products. The White House has weighed in on SBOMs as well: an Executive Order from May 2021 lays out requirements for secure software development, including the production and distribution of SBOMs for software used by the federal government.

Regardless of how these initiatives pan out, Google believes controls like those provided by SBOM+SLSA+VEX are critical to protect software and build a resilient healthcare ecosystem. This approach provides detailed, critical risk exposure data to security teams so they can take the necessary steps to reduce immediate and long-term risks.

What do we suggest you do?

At Google, we are working with the Open Source Security Foundation on supporting SBOM development. Our Know, Prevent, Fix report on secure software development creates a broader outline of how Google thinks about securing open source software from preventable vulnerabilities. You can read more about these efforts for securing workloads on Google Cloud from our Cloud Architecture Center. Take a look at Cloud Build, a Google Cloud service that can be used to generate up to SLSA Level 2 build artifacts.

Customers often have difficulty getting full visibility and control over vulnerabilities because of their dependence on open source software (OSS). Assured Open Source Software (Assured OSS) is the Google Cloud service that helps teams both secure the external OSS packages they use and overcome avoidable vulnerabilities by simply eliminating them from the code base. Finally, ask us about Google’s Cybersecurity Action Team, the world’s premier security advisory team, with its singular mission of supporting the security and digital transformation of governments, critical infrastructure, enterprises, and small businesses.

If you’re a software supplier, please consider our suggestions above. Whether you are or not, you should begin:
- Contractually mandating that SBOM+VEX+SLSA (or equivalent) artifacts be provided for all new software.
- Training procurement teams to ask for and use SBOM+VEX+SLSA to make purchasing decisions. There should be no reason an organization procures software or hardware with known, preventable issues.
  Even if they do, the information these mechanisms provide should help security teams decide whether they can live with the risks before the equipment enters their networks.
- Establishing a governance program that ensures those who control procurement decisions are aware of and own the risks associated with the software they are buying.
- Enabling security teams to build pipelines to ingest SBOM+VEX+SLSA artifacts into their security operations and use them to strategically advise and drive mitigation activities.

At Google, we believe the path to resilience begins with building visibility and structural awareness into software, hardware, and the equipment it rides on as a critical first step. Time will tell if VEX becomes widely adopted, but the point behind it won’t change: we can’t know how we are vulnerable without visibility. VEX is an important concept in this regard.

Next month, we’ll be shifting gears slightly to focus on building resilience by establishing a security culture that obsesses over its patients and products.

Related article: How SLSA and SBOM can help healthcare’s cybersecurity resiliency. There’s more to securing healthcare technology than just data privacy. Here’s why resilient healthcare security needs SBOM and SLSA.
Source: Google Cloud Platform

A visual tour of Google Cloud certifications

Interested in becoming Google Cloud certified? Wondering which Google Cloud certification is right for you? We’ve got you covered.

Check out the latest #GCPSketchnote illustration, a framework to help you determine which Google Cloud certification is best suited to validate your current skill set and propel you toward future cloud career goals. Follow the arrows to see where you land, and for tips on how to prepare for your certification on Google Cloud Skills Boost:
- Cloud Digital Leader – This certification is for anyone who wishes to demonstrate their knowledge of cloud computing basics and how Google Cloud products and services can be used to achieve an organization’s goals.
- Associate Cloud Engineer – This certification is for candidates who have a solid understanding of Google Cloud fundamentals and experience deploying cloud applications, monitoring operations, and managing cloud enterprise solutions.
- Professional Google Cloud certifications – These certifications are ideal for candidates with in-depth, hands-on experience setting up cloud environments for organizations based on their business needs, and who have experience deploying services and solutions. They include:
  - Professional Cloud Architect
  - Professional Cloud Developer
  - Professional Data Engineer
  - Professional Cloud Database Engineer
  - Professional DevOps Engineer
  - Professional Machine Learning Engineer
  - Professional Network Engineer
  - Professional Security Engineer
  - Professional Workspace Administrator

Continue along the arrows for tips on how to prepare for your certification, while earning completion badges and skill badges along the way through our on-demand learning platform, Google Cloud Skills Boost.

Where will your certification journey take you? Get started preparing for your certification today. New users are eligible for a 30-day no-cost trial on Google Cloud Skills Boost.

Related article: Meet the new Professional Cloud Database Engineer certification. Google Cloud launches a new Professional certification.
Source: Google Cloud Platform

Join us for a show-and-tell edition of Google Cloud Security Talks

If you’re new to Security Talks, you should know that this program is part of an ongoing series where we bring together experts from the Google Cloud security team, including the Google Cybersecurity Action Team and Office of the CISO, and the greater industry to share information on our latest security products, innovations, and best practices.

The Q3 installment of the Google Cloud Security Talks on Aug. 31 is a special show-and-tell edition. We’re not just going to share what you need to know about our portfolio of products, we’re also going to be showing you how to use them as well. The format for this round of Security Talks will be focused on practitioners and emphasize how to apply Google Cloud products in popular use cases. This time, Security Talks sessions will spotlight key use cases and include how-to demonstrations. You’ll be able to glean best practices and see how you can apply these exact same scenarios to your own environment.

Our agenda is packed with insightful sessions across Zero Trust, security operations, secure cloud, and more, including:
- How to leverage SOAR to grow your automated response playbook library’s value – but not the complexity
- The ins and outs of protecting critical apps from fraud and bots
- How to create and manage compliant environments in Google Cloud
- How to get started with network-based threat detection in Google Cloud
- Guidance on where to begin your Zero Trust journey
- Tips for succeeding with your cloud data security strategy
- Google Cloud’s latest security innovations and product updates

And don’t miss the live Cloud Security Podcast roundtable featuring Mandiant Senior Director Robert Wallace, Cybereason Security Strategy Director Ken Westin, and our own Office of the CISO Director of Financial Services Alicja Cade in conversation with host Anton Chuvakin. Our esteemed panel will dig into the latest security trends and how to apply what we’ve learned from them to your own environment.

We’re looking forward to seeing you there. Sign up today to reserve your virtual seat. The Google Cloud Security Talks is 100% digital and free to attend. All sessions will be available on demand after the event. Until then, stay secure.

Related article: Join us for Google Cloud Security Talks: Zero Trust edition. Join us for Google Cloud Security Talks with sessions focused on zero trust. Learn how you can protect your users and critical information.
Source: Google Cloud Platform

How autonomic data security can help define cloud’s future

“Ninety percent of all data today was created in the last two years—that’s 2.5 quintillion bytes of data per day,” according to business data analytics company Domo. That would be a mind-bending statistic, except that it’s already five years old. As data usage has undergone drastic expansion and change in the past five years, so have your business needs for data. Technology such as cloud computing and AI has changed how we use data, derive value from data, and glean insights from data. Your organization is no longer just crunching and re-crunching the same data sets. Data moves, shifts, and replicates as you mingle data sets and gain new value in the process, as we say in our Data Cloud story. All the while, your data resides in, and is being created in, new places.

Data lives in a myriad of locations now and requires access from different locations and mediums, yet many of today’s security models are not geared toward this. In short, your data has fallen out of love with your security model, but attackers have not. So, how do we realign data and security so they are once again in a healthy relationship?

Google Cloud, as a leader in cloud data management and cloud security, is uniquely positioned to define and lead this effort. We’ve identified some challenges around the classic approach to data security and the changes triggered by the near-ubiquity of the cloud. The case is compelling for adopting a modern approach to data security, and we contend that the optimal way forward is with autonomic data security. A relatively new concept, autonomic data security is security that’s been integrated with data throughout its lifecycle. It can make things easier on users by freeing them from defining and redefining myriad rules about who can do what, when, and where. It’s an approach that keeps pace with constantly evolving cyberthreats and business changes. Autonomic data security can help you keep your IT assets more secure and can make your business and IT processes speedier. For example, data sharing with partners and data access decisions simultaneously become faster and more secure. This may sound like magic, but in fact it relies on a constant willingness to change and adapt to both business changes and threat evolution.

Taking the precepts, concepts, and forward-looking solutions presented in this paper into consideration, we strongly believe that now is the right time to assess where you and your business are when it comes to data security. Cloud also brings an incredible scale of computing: where gigabytes once roamed, petabytes are now common. This means that many data security approaches, especially the manual ones, are no longer practical. To prepare for the future of data security, we recommend you challenge your current model and assumptions, ask critical questions, evaluate where you are, and then start to put a plan in place for how you could incorporate the autonomic data security pillars into your data security model.

There are two sets of questions organizations need to answer as they start this journey. The first set will help you identify the nature and status of your data, and inform the answers to the second set.
- What data do I have?
- Who owns it?
- Is it sensitive?
- How is it used?
- What is the value in storing the data?

The second set focuses on higher-level problems:
- What is my current approach to data security? Where does it fail to support the business and counter the threats?
- Does it support my business?
- Should I consider making a change?
  And if yes, in what direction?

The path to improved data security starts by asking the right questions. You can read the full Autonomic Data Security paper for a more in-depth exploration here, and learn more about the idea in this podcast episode.

Related article: [Infographic] Achieving Autonomic Security Operations: Why metrics matter (but not how you think). Metrics can be a vital asset – or a terrible failure – for keeping organizations safe. Follow these tips to ensure security teams are tra…
Source: Google Cloud Platform

Best practices for migrating Hive ACID tables to BigQuery

Are you looking to migrate a large number of Hive ACID tables to BigQuery? ACID-enabled Hive tables support transactions that accept update and delete DML operations. In this blog, we will explore migrating Hive ACID tables to BigQuery. The approach explored in this blog works for both compacted (major/minor) and non-compacted Hive tables. Let’s first understand the term ACID and how it works in Hive.

ACID stands for four traits of database transactions:
- Atomicity (an operation either succeeds completely or fails; it does not leave partial data)
- Consistency (once an application performs an operation, the results of that operation are visible to it in every subsequent operation)
- Isolation (an incomplete operation by one user does not cause unexpected side effects for other users)
- Durability (once an operation is complete, it will be preserved even in the face of machine or system failure)

Starting in version 0.14, Hive supports all ACID properties, which enables it to use transactions, create transactional tables, and run queries like insert, update, and delete on tables. Under the hood, a Hive ACID table stores its files in the ORC ACID format. To support ACID features, Hive stores table data in a set of base files and all insert, update, and delete operation data in delta files. At read time, the reader merges the base file and delta files to present the latest data. As operations modify the table, a lot of delta files are created and need to be compacted to maintain adequate performance. There are two types of compaction, minor and major:
- Minor compaction takes a set of existing delta files and rewrites them to a single delta file per bucket.
- Major compaction takes one or more delta files and the base file for the bucket and rewrites them into a new base file per bucket. Major compaction is more expensive but is more effective.

Organizations configure automatic compactions, but they also need to perform manual compactions when automatic compaction fails. If compaction is not performed for a long time after a failure, it results in a lot of small delta files. Running compaction on these large numbers of small delta files can become a very resource-intensive operation and can run into failures as well. Some of the issues with Hive ACID tables are:
- NameNode capacity problems due to small delta files.
- Table locks during compaction.
- Running major compactions on Hive ACID tables is a resource-intensive operation.
- Longer time taken for data replication to DR due to small files.

Benefits of migrating Hive ACID tables to BigQuery

Some of the benefits of migrating Hive ACID tables to BigQuery are:
- Once data is loaded into managed BigQuery tables, BigQuery manages and optimizes the data stored in its internal storage and handles compaction. So there will not be any small-file issue like we have with Hive ACID tables.
- The locking issue is resolved here, as the BigQuery Storage Read API is gRPC based and highly parallelized.
- As ORC files are completely self-describing, there is no dependency on Hive Metastore DDL. BigQuery has a built-in schema inference feature that can infer the schema from an ORC file and supports schema evolution without any need for tools like Apache Spark to perform schema inference.
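Before looking at the sample table used in this migration, here for reference is how the manual compactions described earlier are typically triggered and monitored in Hive. This is an illustrative sketch only; the table name matches the sample table introduced in the next section:

```
-- Illustrative only: queue a minor or major compaction for a Hive ACID table,
-- then check compaction status.
ALTER TABLE employee_trans COMPACT 'minor';
ALTER TABLE employee_trans COMPACT 'major';
SHOW COMPACTIONS;
```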
Hive ACID table structure and sample data

Here is the schema of the sample Hive ACID table “employee_trans”:

```
hive> show create table employee_trans;
OK
CREATE TABLE `employee_trans`(
  `id` int,
  `name` string,
  `age` int,
  `gender` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans'
TBLPROPERTIES (
  'bucketing_version'='2',
  'transactional'='true',
  'transactional_properties'='default',
  'transient_lastDdlTime'='1657906607')
```

This sample ACID table “employee_trans” has 3 records:

```
hive> select * from employee_trans;
OK
1    James    30    M
3    Jeff     45    M
2    Ann      40    F
Time taken: 0.1 seconds, Fetched: 3 row(s)
```

For every insert, update, and delete operation, small delta files are created. This is the underlying directory structure of the Hive ACID-enabled table:

```
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delete_delta_0000005_0000005_0000
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delete_delta_0000006_0000006_0000
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delta_0000001_0000001_0000
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delta_0000002_0000002_0000
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delta_0000003_0000003_0000
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delta_0000004_0000004_0000
hdfs://hive-cluster-m/user/hive/warehouse/aciddb.db/employee_trans/delta_0000005_0000005_0000
```

The ORC files in an ACID table are extended with several columns:

```
struct<
  operation: int,
  originalTransaction: bigInt,
  bucket: int,
  rowId: bigInt,
  currentTransaction: bigInt,
  row: struct<...>
>
```

Steps to migrate Hive ACID tables to BigQuery

Migrate the underlying Hive table HDFS data

Copy the files present under the employee_trans HDFS directory and stage them in GCS. You can use either the HDFS2GCS solution or DistCp. The HDFS2GCS solution uses open source technologies to transfer data and provides several benefits like status reporting, error handling, fault tolerance, incremental/delta loading, rate throttling, start/stop, checksum validation, byte-to-byte comparison, and more. Here is the high-level architecture of the HDFS2GCS solution. Please refer to the public GitHub URL HDFS2GCS to learn more about this tool.

The source location may contain extra files that we don’t necessarily want to copy. Here, we can use filters based on regular expressions to do things such as copying files with the .ORC extension only.

Load the ACID tables as-is to BigQuery

Once the underlying Hive ACID table files are copied to GCS, use the bq load tool to load the data into a BigQuery base table.
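For example, a minimal sketch of that load step might look like the following. The bucket path is a placeholder, the dataset matches the one used in the query later in this post, and, because ORC files are self-describing, no explicit schema is supplied:

```
# Load the staged ORC base/delta files into a BigQuery base table.
# BigQuery infers the schema, including the ACID metadata columns, from the ORC files.
bq load \
  --source_format=ORC \
  hivetobq.basetable \
  "gs://my-staging-bucket/employee_trans/*"
```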
This base table will have all the change events.

Data verification

Run “select *” on the base table to verify whether all the changes are captured. (Note: “select * …” is used here for demonstration purposes and is not a stated best practice.)

Loading to the target BigQuery table

The following query will select only the latest version of all records from the base table, discarding the intermediate delete and update operations. You can either load the results of this query into a target table using a scheduled query on demand with the overwrite option, or you can create this query as a view on the base table to get the latest records from the base table directly.

```
WITH
  latest_records_desc AS (
    SELECT
      Row.*,
      operation,
      ROW_NUMBER() OVER (PARTITION BY originalTransaction ORDER BY originalTransaction ASC, bucket ASC, rowId ASC, currentTransaction DESC) AS rownum
    FROM
      `hiveacid-sandbox.hivetobq.basetable` )
SELECT id, name, age, gender
FROM
  latest_records_desc
WHERE
  rownum=1
  AND operation != 2
```

Once the data is loaded into the target BigQuery table, you can perform validation using the steps below:

a. Use the Data Validation Tool (DVT) to validate the Hive ACID table against the target BigQuery table. DVT provides an automated and repeatable solution to perform schema and validation tasks. This tool supports the following validations:
- Column validation (count, sum, avg, min, max, group by)
- Row validation (BQ, Hive, and Teradata only)
- Schema validation
- Custom query validation
- Ad hoc SQL exploration

b. If you have analytical HiveQLs running on this ACID table, translate them using the BigQuery SQL translation service and point them to the target BigQuery table.

Hive DDL migration (optional)

Since ORC is self-contained, leverage BigQuery’s schema inference feature when loading; there is no dependency on extracting Hive DDLs from the Metastore. But if you have an organization-wide policy to pre-create datasets and tables before migration, this step will be useful and a good starting point.

a. Extract Hive ACID DDL dumps and translate them using the BigQuery translation service to create equivalent BigQuery DDLs. There is a batch SQL translation service to bulk-translate exported HQL (Hive Query Language) scripts from a source metadata bucket in Google Cloud Storage into BigQuery-equivalent SQL in a target GCS bucket. You can also use the BigQuery interactive SQL translator, a live, real-time SQL translation tool across multiple SQL dialects that translates a query in the HQL dialect into a BigQuery Standard SQL query. This tool can reduce the time and effort needed to migrate SQL workloads to BigQuery.

b. Create managed BigQuery tables using the translated DDLs. Here is a screenshot of the translation service in the BigQuery console. Submit “Translate” to translate the HiveQLs and “Run” to execute the query. For creating tables from batch-translated bulk SQL queries, you can use the Airflow BigQuery operator (BigQueryInsertJobOperator) to run multiple queries.
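As a sketch of that approach (the DAG name, project layout, and DDL text below are illustrative placeholders, with the DDL standing in for output from the translation service):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="create_translated_hive_tables",
    start_date=datetime(2022, 8, 1),
    schedule_interval=None,  # trigger manually once translation completes
    catchup=False,
) as dag:
    # One BigQueryInsertJobOperator per translated DDL statement.
    create_employee_trans = BigQueryInsertJobOperator(
        task_id="create_employee_trans",
        configuration={
            "query": {
                # Placeholder for a DDL produced by the BigQuery translation service.
                "query": "CREATE TABLE IF NOT EXISTS hivetobq.employee_trans "
                         "(id INT64, name STRING, age INT64, gender STRING)",
                "useLegacySql": False,
            }
        },
    )
```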
After the DDLs are converted, copy the ORC files to GCS and perform ELT in BigQuery. The pain points of Hive ACID tables are resolved when migrating to BigQuery. When you migrate the ACID tables to BigQuery, you can leverage BigQuery ML and GeoViz capabilities for real-time analytics. If you are interested in exploring more, please check out the additional resources section.

Additional Resources
- Hive ACID
- ACID ORC Format
- HDFS2GCS Solution
- DistCp
- Data Validation Tool
- BigQuery Translation Service

Related article: Scheduling a command in GCP using Cloud Run and Cloud Scheduler. How to efficiently and quickly schedule commands like Gsutil using Cloud Run and Cloud Scheduler.
Source: Google Cloud Platform

Snooze your alert policies in Cloud Monitoring

Does your development team want to snooze alerts during non-business hours? Or proactively prevent the creation of expected alerts for an upcoming maintenance window? Cloud Alerting in Google Cloud’s operations suite now supports the ability to snooze alert policies for a given period of time. You can create a Snooze by providing specific alert policies and a time period. During this window, if the alert policy is violated, no incidents or notifications are created. When the window ends, the alerting behavior resumes as normal.

Your team can use this feature in a variety of ways. One example is to avoid being paged for non-production environments over the weekend. Another is to plan for a known maintenance window or cutover period. You can also quiet the noise during a growing outage, among other approaches.

To create a Snooze, go to Monitoring > Alerting, find the new Snoozes table, and click Create Snooze. You provide the name of the Snooze and the time period, and select the desired alert policies. After you select the criteria, a table lists recent incidents that match them; events like those won’t cause an alert when the Snooze is active. You will also see a timeline visualization of all past, active, and upcoming Snoozes. If you’d like to adjust the duration, you can go back and edit the details. For more information, please see the documentation.

In the future, we’ll expand this functionality to allow snoozing by labels. You’ll be able to temporarily silence alerts by resource, system, metric, and custom labels, which will allow you to snooze all alert policies in a specific environment, zone, or team. This functionality will also be extended to the API, allowing you to create Snoozes programmatically for regularly repeating events.

Related article: Add severity levels to your alert policies in Cloud Monitoring. Add static and dynamic severity levels to your alert policies for easier triaging and include these in notifications when sent to 3rd par…
Source: Google Cloud Platform

Accelerate your developer productivity with Query Library

Our goal in Cloud Logging is to help increase developer productivity by streamlining the troubleshooting process. The time spent writing and executing a query, and then analyzing the errors, can impact developer productivity. Whether you’re troubleshooting an issue or analyzing your logs, finding the right logs quickly is critical. That’s why we recently launched a Query Library and other new features to make querying your logs even easier. The Query Library in Cloud Logging makes it easier to find logs faster by using common queries.

Build queries faster with our templates

The new text search and drop-down features are designed to make querying something you can achieve with a few mouse clicks. These features automatically generate the necessary Logging query language for you. The Query Library extends this simplicity with templates for common GCP queries. The Query Library is located in the query builder bar next to the Suggested queries. To help you find the most relevant queries, you’ll notice the following details:
- Query categories – Each query is broken down into categories that can be used to easily narrow down to relevant queries.
- Query occurrences – To help you pick queries that have the most useful results, sparklines are displayed for queries that have logs in your project.
- Query details – Each query has a description along with the Logging query language statement.
- Run/Stream – Run the query or start streaming logs right from the library.
- Save – Save the query in your list of saved queries.

The road ahead

We’re committed to making Logs Explorer the best place to troubleshoot your applications running on Google Cloud. Over the coming months, we have many more changes planned to make Logs Explorer both easier and more powerful for all users. If you haven’t already, get started with the Logs Explorer and join the discussion on our Cloud Operations page on the Google Cloud Community site.

Related article: Google Cloud Deploy gets continuous delivery productivity enhancements. In this latest release, Google Cloud Deploy got improved onboarding, delivery pipeline management, and additional enterprise features.
Source: Google Cloud Platform

Google Cloud and Apollo 24|7: Building a Clinical Decision Support System (CDSS) together

A Clinical Decision Support System (CDSS) is an important technology for the healthcare industry that analyzes data to help healthcare professionals make decisions related to patient care. The market for clinical decision support systems appears poised for expansion, with one study predicting a compound annual growth rate (CAGR) of 10.4% from 2022 to 2030, reaching $10.7 billion.

For any health organization that wants to build a CDSS, one key building block is locating and extracting the medical entities present in clinical notes, medical journals, discharge summaries, and similar documents. Along with entity extraction, the other key components of a CDSS are capturing temporal relationships, subjects, and certainty assessments.

At Google Cloud, we know how critical it is for the healthcare industry to build CDSS systems, so we worked with Apollo 24|7, the largest multi-channel digital healthcare platform in India, to build the key blocks of their CDSS solution. We helped them parse discharge summaries and prescriptions to extract medical entities. These entities can then be used to build a recommendation engine that helps doctors with “next best action” recommendations for medicines, lab tests, and more.

Let’s take a sneak peek at Apollo 24|7’s entity extraction solutions and the various Google AI technologies that were tested to form the technology stack.

Datasets used

To perform our experiments on entity extraction, we used two types of datasets:
- i2b2 dataset – i2b2 is an open-source clinical data warehousing and analytics research platform that provides annotated, de-identified patient discharge summaries made available to the community for research purposes. This dataset was primarily used for training and validation of the models.
- Apollo 24|7’s dataset – De-identified doctor’s notes from Apollo 24|7 were used for testing. Doctors annotated them to label the entities and offset values.

Experimentation and choosing the right approach: four models put to the test

For entity extraction, both Google Cloud products and open-source approaches were explored. Below are the details.

1. Healthcare Natural Language API: This is a no-code approach that provides machine learning solutions for deriving insights from medical text. Using this, we parsed unstructured medical text and then generated a structured data representation of the medical knowledge entities stored in the data for downstream analysis and automation. The process includes:
- Extracting information about medical concepts like diseases, medications, medical devices, procedures, and their clinically relevant attributes;
- Mapping medical concepts to standard medical vocabularies such as RxNorm, ICD-10, MeSH, and SNOMED CT (US users only);
- Deriving medical insights from text and integrating them with data analytics products in Google Cloud.

The advantage of using this approach is that it not only extracts a wide range of entity types like MED_DOSE, MED_DURATION, LAB_UNIT, and LAB_VALUE, but also captures functional features such as temporal relationships, subjects, and certainty assessments, along with confidence scores. Since it is available on Google Cloud, it offers long-term product support. It is also the only fully managed NLP service among all the approaches tested, and hence requires the least effort to implement and manage.
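For reference, a minimal invocation sketch is shown below. The project ID, location, and clinical text are placeholders, and the response returns the extracted entities, relationships, and confidence scores described above:

```
# Hypothetical example: extract medical entities from free text with the
# Healthcare Natural Language API. PROJECT_ID is a placeholder.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://healthcare.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/services/nlp:analyzeEntities" \
  -d '{"documentContent": "Patient reports fever and headache; started amoxicillin 500 mg twice daily."}'
```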
But one thing to keep in mind is that, because the Healthcare NL API offers pre-trained natural language models, it currently cannot be used to train custom entity extraction models on custom annotated medical text or to extract custom entities. That has to be done via AutoML Entity Extraction for Healthcare, another Google Cloud service for custom model development. Custom model development is important for adapting the pre-trained models to new languages or region-specific natural language processing, such as medical terms whose use may be more prevalent in India than in other regions.

2. Vertex AutoML Entity Extraction for Healthcare: This is a low-code approach that’s already available on Google Cloud. We used AutoML Entity Extraction to build and deploy custom machine learning models that analyzed documents, categorized them, and identified entities within them. This custom machine learning model was trained on the annotated dataset provided by the Apollo 24|7 team. The advantage of AutoML Entity Extraction is that it gives you the option to train on a new dataset. However, one prerequisite to keep in mind is that it needs a little pre-processing to capture the input data in the required JSONL format. Since this is an AutoML model just for entity extraction, it does not extract relationships, certainty assessments, and so on.

3. BERT-based models on Vertex AI: Vertex AI is Google Cloud’s fully managed, unified AI platform to build, deploy, and scale ML models faster, with pre-trained and custom tooling. We experimented with multiple custom approaches based on pre-trained BERT-based models, which have shown state-of-the-art performance on many natural language tasks. To gain a better contextual understanding of medical terms and procedures, these BERT-based approaches are explicitly trained on medical domain data. Our experiments were based on BioClinical BERT, BioLink BERT, Blue BERT trained on the PubMed dataset, and Blue BERT trained on the PubMed + MIMIC datasets. The major advantage of these BERT-based models is that they can be fine-tuned on any entity recognition task with minimal effort. However, since this is a custom approach, it requires some technical expertise. Additionally, it does not extract relationships, certainty assessments, and so on, which is one of the main limitations of using BERT-based models.

4. ScispaCy on Vertex AI: We used Vertex AI to perform experiments based on ScispaCy, a Python package containing spaCy models for processing biomedical, scientific, or clinical text. Along with entity extraction, ScispaCy on Vertex AI provides additional components like an abbreviation detector, entity linking, and more. However, when compared to other models, it was less precise, with too many junk phrases, like “Admission Date,” captured as entities.

“Exploring multiple approaches and understanding the pros and cons of each approach helped us to decide on the one that would fit our business requirements,” according to Abdussamad M, Engineering Lead at Apollo 24|7.

Evaluation strategy

In order to match the parsed entities with the test data labels, we used extensive matching logic comprising the four methods below:
- Exact match – Exact match captures entities where the model output and the entities in the test dataset match exactly. Here, the offset values of the entities have also been considered.
  For example, the entity “gastrointestinal infection” that is present as-is in both the model output and the test label will be considered an exact match.
- Match-score logic – We used a scoring logic for matching the entities. For each word in the test data labels, every word in the model output is matched along with the offset. A score is calculated between the entities, and based on a threshold it is considered a match.
- Partial match – In this matching logic, entities like “hypertension” and “hypertensive” are matched based on fuzzy logic.
- UMLS abbreviation lookup – We also observed that the medical text had some abbreviations, like AP meaning abdominal pain. These were first expanded by doing a lookup on the respective UMLS (Unified Medical Language System) tables and then passed to the individual entity extraction models.

Performance metrics

We used precision and recall metrics to compare the outcomes of the different models and experiments. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. The example below shows how to calculate these metrics for a given sample.

Example sample: “Krish has fever, headache and feels uncomfortable”
Expected entities: [“fever”, “headache”]
Model output: [“fever”, “feels”, “uncomfortable”]

Thus, precision = 1/3 (only “fever” among the three extracted entities is relevant) and recall = 1/2 (only one of the two expected entities was retrieved).

Experimentation results

The following table captures the results of the above experiments on Apollo 24|7’s internal datasets. Finally, the Blue BERT model trained on the PubMed dataset had the best performance metrics, an 81% improvement over Apollo 24|7’s baseline model, with the Healthcare Natural Language API providing the context, relationships, and codes. This performance could be further improved by implementing an ensemble of these two models.

“With the Blue BERT model giving the best performance for entity extraction on Vertex AI, and the Healthcare NL API being able to extract the relationships, certainty assessments, and so on, we finally decided to go with an ensemble of these two approaches,” Abdussamad added.

Fast-track end-to-end deployment with Google Cloud AI Services (AIS)

Google AIS (Professional Services Organization) helped Apollo 24|7 build the key blocks of the CDSS system. The partnership between Google Cloud and Apollo 24|7 is just one of the latest examples of how we’re providing AI-powered solutions to solve complex problems and help organizations drive the desired outcomes. To learn more about Google Cloud’s AI services, visit our AI & ML Products page, and to learn more about Google Cloud solutions for health care, explore our Google Cloud Healthcare Data Engine page.

Acknowledgements

We’d like to give special thanks to Nitin Aggarwal, Gopala Dhar, and Kartik Chaudhary for their support and guidance throughout the project. We are also thankful to Manisha Yadav, Santosh Gadgei, and Vasantha Kumar for implementing the GCP infrastructure. We are grateful to the Apollo team (Chaitanya Bharadwaj, Abdussamad GM, Lavish M, Dinesh Singamsetty, Anmol Singh, and Prithwiraj) and our partner team from HCL/Wipro (Durga Tulluru and Praful Turanur) who partnered with us in delivering this successful project.
Special thanks to the Cloud Healthcare NLP API team (Donny Cheung, Amirhossein Simjour, and Kalyan Pamarthy).

Related article: HIMSS 2022: Improving health through data interoperability and natural language processing. At HIMSS 2022, Google Cloud showcases how data interoperability and natural language processing can help improve health outcomes.
Source: Google Cloud Platform

Google Cloud Deploy gets continuous delivery productivity enhancements

Since Google Cloud Deploy became generally available in January 2022, we’ve remained focused on our core mission: making it easier to establish and operate software continuous delivery to a Google Kubernetes Engine environment. Through ongoing conversations with developers, DevOps engineers, and business decision makers alike, we’ve received feedback about onboarding speed, delivery pipeline management, and expanding enterprise features. Today, we are pleased to introduce numerous feature additions to Google Cloud Deploy in these areas.

Faster onboarding

Skaffold is an open source tool that orchestrates continuous development, continuous integration (CI), and continuous delivery (CD), and it’s integral to Google Cloud Deploy. Through Skaffold and Google Cloud Deploy, the local application development loop is seamlessly connected to a continuous delivery capability, bringing consistency to your end-to-end software delivery lifecycle tooling.

This may be the first time your team is using Skaffold. To help, Google Cloud Deploy can now generate a Skaffold configuration for single-manifest applications when one is not present. When you create a release, the new ‘gcloud deploy releases create … --from-k8s-manifest’ command takes an application manifest and generates a Skaffold configuration. This lets your application development teams and continuous delivery operators familiarize themselves with Google Cloud Deploy, reducing early-stage configuration and learning friction as they establish their continuous delivery capabilities. When you use this option, you can review the generated Skaffold configuration, and as your comfort with Skaffold configuration and Google Cloud Deploy increases, you can develop your own Skaffold configurations tailored to your specific delivery pipeline needs.
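For example, a single-manifest onboarding flow might look like the following sketch; the pipeline name, region, and file name are illustrative:

```
# Create a release directly from a Kubernetes manifest; Google Cloud Deploy
# generates the Skaffold configuration on your behalf.
gcloud deploy releases create rel-001 \
  --project=my-project \
  --region=us-central1 \
  --delivery-pipeline=my-pipeline \
  --from-k8s-manifest=kubernetes.yaml
```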
Delivery pipeline management

Continuous delivery pipelines are always in use: new releases navigate a progression sequence as they make their way out to the production target. The journey, however, isn’t always smooth, and you may need to manage your delivery pipeline and related resources more discretely.

With the addition of delivery pipeline suspension, you can now temporarily pause problematic delivery pipelines to restrict all release and rollout activity. By pausing the activity, you can undertake an investigation to identify problems and their root cause. Sometimes it isn’t the delivery pipeline that has a problem, but rather a release. Through release abandonment, you can prohibit application releases that have a feature defect, outdated library, or other identified issues from being deployed further. Release abandonment ensures an undesired release won’t be used again, while keeping it available for issue review and troubleshooting.

A suspended delivery pipeline and abandoned releases

When reviewing or troubleshooting release application manifest issues, you may want to compare application manifests between releases and target environments to determine when an application configuration changed and why. But comparing application manifests can be hard, requiring you to use the command line to locate and diff multiple files. To help, Google Cloud Deploy now has a Release inspector, which makes it easy to review application manifests and compare them across releases and targets within a delivery pipeline.

Reviewing and comparing application manifests with the Release inspector

Rollout listings within the Google Cloud Deploy console have, to date, been limited to a specific release or target. A complete delivery pipeline rollout listing (and filtering) has been a standing request, and you can now find it on the delivery pipeline details page.

Delivery pipeline details now with a complete rollouts listing

Finally, execution environments are an important part of configuring custom render and deploy environments. In addition to the ability to specify custom worker pools, Cloud Storage buckets, and service accounts, we’ve added an execution timeout to better support long-running deployments.

Expanded enterprise features

Enterprise environments frequently have numerous requirements to be able to operate, such as security controls, logging, Terraform support, and regional availability. In a previous blog post, we announced support for VPC Service Controls (VPC-SC) in Preview. We are pleased to announce that Google Cloud Deploy support for VPC-SC is now generally available. We’ve also documented how you can configure customer-managed encryption keys (CMEK) with services that depend on Google Cloud Deploy.

There are also times when reviewing manifest-render and application deployment logs may not be sufficient for troubleshooting. For these situations, we’ve added Google Cloud Deploy service platform logs, which may provide additional details toward issue resolution.

Terraform plays an important role in deploying Google Cloud resources. You can now deploy Google Cloud Deploy delivery pipelines and target resources using Google Cloud Platform’s Terraform provider. With this, you can deploy Google Cloud Deploy resources as part of a broader Google Cloud Platform resource deployment.

Regional availability is important for businesses that need a regional service presence. Google Cloud Deploy is now available in an additional nine regions, bringing the total number of Google Cloud Deploy worldwide regions to 15.

The future

Comprehensive, easy-to-use, and cost-effective DevOps tools are key to building an efficient software delivery capability, and it’s our hope that Google Cloud Deploy will help you implement complete CI/CD pipelines. And we’re just getting started. Stay tuned as we introduce exciting new capabilities and features to Google Cloud Deploy in the months to come. In the meantime, check out the product page, documentation, quickstart, and tutorials. Finally, if you have feedback on Google Cloud Deploy, you can join the conversation. We look forward to hearing from you.

Related article: Google Cloud Deploy, now GA, makes it easier to do continuous delivery to GKE. The Google Cloud Deploy managed service, now GA, makes it easier to do continuous delivery to Google Kubernetes Engine.
Source: Google Cloud Platform