Cloud CISO Perspectives: November 2021

We’re coming up on the end of the year, yet many of the most pressing security themes from 2021 remain the same, from securing open source software to enabling zero trust architectures, and more. In this month’s post, I’ll recap the latest updates from the Google Cybersecurity Action Team and industry progress on important security efforts.

Thoughts from around the industry

Securing open source software: Google’s Open Source Software team recently announced ClusterFuzzLite, a continuous fuzzing solution that can run as part of CI/CD workflows to find vulnerabilities. With just a few lines of code, GitHub users can integrate ClusterFuzzLite into their workflow and fuzz pull requests to catch bugs before they are committed. Implementing security checks as early as possible in developer workflows is paramount for improving supply chain security, and NIST’s guidelines for software verification specify fuzzing among the minimum standard requirements for code verification. (A minimal fuzz-target sketch appears at the end of this post.)

Runtime cloud-native security: Google Cloud’s Eric Brewer and I discussed the latest trends and the role of cloud providers and startups with InfoWorld in the ‘Race to Secure Kubernetes at Runtime’. Our work in this space goes back many years, to when we outlined our approach to cloud-native security through our BeyondProd framework. It details one of the core design principles of cloud-native security architectures: protections must extend to how code is changed and how user data in microservices is accessed.

The risks and opportunities of the transition to cloud computing: Office of the CISO Director Nick Godfrey and I sat down with Robert Sales of the Global Association of Risk Professionals to discuss the digital risk management landscape. Our discussion covers timely themes: how ensuring the safe adoption of cloud computing is becoming an increasing priority, reflecting the benefits an organization can accrue from a digital transformation in terms of agility, quality of products and services provided to customers, and relevance in the marketplace; and how cloud-driven transformation can actually mitigate existing security, control, and resilience risks. Check out the full webinar here.

Open source DDR controller framework for mitigating Rowhammer: Google and Antmicro developed the new Rowhammer Tester platform, which gives memory security researchers and manufacturers a flexible environment for experimenting with new types of attacks and finding better Rowhammer mitigation techniques. This important work demonstrates how open source, vendor-neutral IP, tools, and hardware can produce better platforms for more effective research and product development.

Ethical AI best practices: Many of you are likely engaged within your organizations on controls around AI, including an ethical framework for its use. Take a look at SEED (Security, Ethics, Explainability and Data) in this great summary from Maribel Lopez, Founder, Analyst & Author at Lopez Research, on the importance of controls in AI.

Google Cybersecurity Action Team Highlights

Here’s a snapshot of the latest updates, new services, and resources across our Google Cybersecurity Action Team and Google Cloud Security products since our last post.

Security

Reducing risk and increasing sustainability: Veolia, the global leader in optimized resource management, is using Google Cloud’s Security Command Center (SCC) Premium as the core product for protecting the company’s technology environments. In a recent blog post, Thomas Meriadec, Technical Lead and Product Manager for Veolia’s Google Cloud implementation, discusses how SCC Premium serves as the company’s risk management platform and enables Veolia to streamline the process of security management.

Compliance

Google Cybersecurity Action Team’s Risk and Compliance as Code (RCaC) solution helps organizations prevent security misconfigurations and automate cloud compliance. The solution enables compliance and security control automation through a combination of Google Cloud products, blueprints, partner integrations, workshops, and services to simplify and accelerate time to value.

We announced new public sector authorizations, including the Impact Level 4 (IL4) designation for Google Cloud services and FedRAMP High for Google Workspace. These authorizations are part of our ongoing commitment to help the US federal government modernize its security with cloud-native services at scale. For Google Workspace, this means federal agencies now have a completely cloud-native alternative for productivity and collaboration tools. The IL4 authorization for select GCP services demonstrates the efficacy of our security controls at scale across our public cloud infrastructure.

Controls

We released new security capabilities for Traffic Director, Google Cloud’s enterprise-ready control plane product: fully managed workload credentials for Google Kubernetes Engine (GKE) via our managed CA Service, and policy enforcement to govern workload communications. The fully managed credentials provide the foundation for expressing workload identities and securing connections between workloads with mutual TLS (mTLS), in keeping with zero trust principles.

Review our timely guidance here on how to create and safeguard admin accounts in GCP, including links to more in-depth guidance in our resource guides.

Threat Intelligence

Google’s Cybersecurity Action Team released the first issue of the new Threat Horizons report, which is based on cybersecurity threat intelligence observations from Google’s internal security teams. Part of offering a secure cloud computing platform is providing cloud users with cybersecurity threat intelligence so they can configure their environments and defenses in the ways most specific to their needs. This new report provides actionable intelligence that helps organizations ensure their cloud environments are protected against ever-evolving threats. Our future reports will continue to provide threat horizon scanning, trend tracking, and Early Warning announcements about emerging threats requiring immediate action. Learn more in our blog post or click here to download the executive summary.

Must-listen podcasts

Our Cloud Security Podcast has some must-listen episodes this month. Hear from MK Palmore, a new director in Google Cloud’s Office of the CISO and member of the Cybersecurity Action Team, on how Missing Diversity Hurts Your Security; from Ryan Noon, CEO at Material Security, on why email phishing still isn’t solved; and from the GSK team on the difference between cloud misconfigurations and on-premises infrastructure misconfigurations. Finally, the latest episode features an interview with a Chronicle customer about their SIEM experience.

Upcoming Q4 Security Talks – all things Zero Trust

Our Google Cloud Security Talks event for Q4 will focus on a topic we’ve emphasized continuously in our Cloud CISO Perspectives: zero trust.
Join us on December 15 to hear from leaders across Google as well as leading-edge customers on the many facets of an enterprise zero trust journey. Click here to reserve your spot and we’ll see you there (virtually).

If you’d like to have this Cloud CISO Perspectives post delivered to your inbox every month, click here to sign up. We’ll be back next month for our final Cloud CISO Perspectives blog of 2021.

Related article: Cloud CISO Perspectives: October 2021
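As a closing technical aside on the ClusterFuzzLite item above, here is a minimal, hypothetical libFuzzer-style fuzz target written with the Atheris Python fuzzer, the kind of target a continuous fuzzing setup like ClusterFuzzLite can run against pull requests. The choice of json.loads as the code under test and the 1,024-character input cap are illustrative assumptions, not part of ClusterFuzzLite’s documentation; in a real project, the target would exercise your own parsing or input-handling code and be wired up through the project’s CI configuration.

```python
# A minimal Atheris fuzz target (illustrative; the code under test is assumed).
import sys

import atheris

with atheris.instrument_imports():
    import json  # stand-in for your own parsing code


def TestOneInput(data: bytes) -> None:
    """Feed fuzzer-generated bytes to the code under test.

    Uncaught exceptions or crashes are reported as findings; expected
    parse errors are swallowed so they do not count as bugs.
    """
    fdp = atheris.FuzzedDataProvider(data)
    text = fdp.ConsumeUnicodeNoSurrogates(1024)  # cap input size (assumption)
    try:
        json.loads(text)
    except json.JSONDecodeError:
        pass  # malformed input is expected, not a bug


def main() -> None:
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    main()
```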
Source: Google Cloud Platform

Achieving Autonomic Security Operations: Reducing toil

Almost two decades of Site Reliability Engineering (SRE) has proved the value of incorporating software engineering practices into traditional infrastructure and operations management. In a parallel world, we’re finding that similar principles can radically improve outcomes for the Security Operations Center (SOC), a domain plagued with infrastructure and operational challenges. As more organizations go through digital transformation, building a highly effective threat management function becomes one of their top priorities. In our paper, “Autonomic Security Operations — 10X Transformation of the Security Operations Center”, we’ve outlined our approach to modernizing security operations.

One of the core elements of the security operations modernization journey is a relentless focus on eliminating “toil.” Toil is an SRE term, defined in the SRE book as “the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” If you’re a security analyst, you may realize that sifting through toil is one of the most significant and burdensome elements of your role. For some analysts, their entire workload fits the SRE definition of toil.

Another example from the same source states, “If your service remains in the same state after you have finished a task, the task was probably toil.” Sound familiar? Some would say that most SOC work is inherently like this: attackers come, alerts trigger, triage and investigate, adjust, tune, respond, rinse, and repeat. If our infrastructure remains in the same state afterward, that may be the desired outcome, but we are still left with all of the operational challenges that make the analyst’s work cumbersome.

So, let’s talk about how you can make your SOC behave more like good SRE teams do. First, where is that 10X improvement mentioned in the paper likely to come from? If you face an increase in attacks, an increase in assets under protection, or an increase in the complexity of your environment, a “toil-based” SOC will need to grow at least linearly with those changes. To handle 2X the attacks or 2X the scope (such as cloud added to your SOC coverage), you will need 2X the people, and sometimes 2X the budget to spend on tools.

However, if we transform the SOC based on the principles we discuss in the ASO paper, an increase in data and complexity may not require doubling your team and budget (two things that are quite an uphill battle for many security leaders!). The evolution of security operations in general, and SOC effectiveness in particular, depends heavily on driving an engineering-first mindset when operating secure systems at modern scale. You can’t “ops” your way to a modern SOC, but you can “dev” your way there! Using modern tools like Chronicle for detection and investigation can also help you reach that goal.

So, how can we put these and other SRE lessons to work in your SOC?

- First, educate your team on how SRE philosophies can be implemented in the SOC. Find opportunities to do team-building exercises and empower your team to define the cultural transformation. Driving a cultural shift requires an inspired, motivated, and disciplined team.
- Invest in learning programs to upskill your analysts and develop more engineering skills. Investing in your team’s careers will lead to more positive sentiment, a more motivated workforce, and a more solution-oriented team than a traditional operations approach.
- Aim to cap your ops time at 50%; try spending the remaining 50% on improving systems and detections with an “automate-first” mindset. BTW, engineering is not the same as writing code: “Engineering work is novel and intrinsically requires human judgment. It produces a permanent improvement in your service, and is guided by a strategy.”
- “Commit to eliminate a bit of toil each week with some good engineering” in your SOC. Here are some SOC examples: tweak that rule that produces non-actionable alerts, write a SOAR playbook to auto-close some alerts using context data, or script a check that log collection is running optimally (a minimal sketch of such an auto-close step appears at the end of this post).
- Finally, consider hiring security automation engineers who have operations experience, or who can ramp up quickly. The right person can set the tone for leading your whole team through the evolution to an “SRE-inspired” 10X SOC.

We at the Google Cybersecurity Action Team look forward to helping organizations of all sizes and capabilities achieve Autonomic Security Operations. While the challenges that plague the SOC can at times seem insurmountable, incremental engineering improvements can drive exponential outcomes. As you develop your roadmap for modernizing your threat management capabilities, we’re here to partner with you along the journey.

Here are some additional resources that provide perspectives on the transition to more autonomic security operations:

- “Modernizing SOC … Introducing Autonomic Security Operations”
- “Autonomic Security Operations — 10X Transformation of the Security Operations Center”
- “SOC in a Large, Complex and Evolving Organization” (Google Cloud Security Podcast ep26) and “The Mysteries of Detection Engineering: Revealed!” (ep27)
- “A SOC Tried To Detect Threats in the Cloud … You Won’t Believe What Happened Next”
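To illustrate the “eliminate a bit of toil each week” idea referenced above, here is a minimal, hypothetical sketch of an auto-close step that uses context data to dismiss alerts that need no human triage. The alert fields, the scanner IP and noisy-rule lists, and the close_alert placeholder are all assumptions for illustration; a real playbook would pull context from your asset inventory and call your SOAR or case management API.

```python
# A hypothetical auto-close step for a SOAR-style playbook (illustrative only).
from dataclasses import dataclass


@dataclass
class Alert:
    rule_name: str
    source_ip: str
    severity: str


# Context data that would normally come from asset inventory or threat intel.
KNOWN_SCANNER_IPS = {"10.0.0.12", "10.0.0.13"}  # internal vulnerability scanners
NOISY_RULES = {"legacy_port_scan_detection"}    # rules known to be non-actionable


def should_auto_close(alert: Alert) -> bool:
    """Return True when context says the alert needs no human triage."""
    if alert.rule_name in NOISY_RULES and alert.severity == "low":
        return True
    if alert.source_ip in KNOWN_SCANNER_IPS:
        return True
    return False


def close_alert(alert: Alert, reason: str) -> None:
    # Placeholder: a real playbook would call the SOAR / case management API here.
    print(f"closing {alert.rule_name} from {alert.source_ip}: {reason}")


def triage(alerts: list[Alert]) -> list[Alert]:
    """Auto-close what can safely be closed; return what still needs an analyst."""
    remaining = []
    for alert in alerts:
        if should_auto_close(alert):
            close_alert(alert, reason="benign by context (auto-closed by playbook)")
        else:
            remaining.append(alert)
    return remaining
```

The point is not these specific rules, but that each small automation like this permanently removes a slice of repetitive triage work, which is exactly the kind of weekly toil reduction described above.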
Source: Google Cloud Platform

Want to supercharge your DevOps practice? Research says try SRE

Reliability matters. When users can’t access your application, when it’s slow to respond, or when it behaves unexpectedly, they don’t get the value you intend to provide. That’s why at Google we like to say that reliability is the most important feature of any system. Its impact can be seen all the way to the bottom line, as downtime comes with steep costs: to revenue, to reputation, and to user loyalty.

From the beginning of the DevOps Research and Assessment (DORA) project, we’ve recognized the importance of delivering a consistent experience to users. We measure this with the Four Key metrics: two that track the velocity of deploying new releases, balanced against two that capture the initial stability of those releases. A team that rates well on all four metrics is not only good at shipping code; it’s shipping code that’s good. However, these four signals, which focus on the path to a deployment and its immediate effects, are less diagnostic of subsequent success throughout the lifespan of a release. (A minimal sketch of computing these metrics, alongside a reliability check, appears at the end of this post.)

In 2018, DORA began to study the ongoing stability of software delivered as a service (as typified by web applications), which we captured in an additional metric for availability, to explore the impact of technical operations on organizational performance. This year, we expanded our inquiry into this area, starting by renaming availability to reliability. Reliability (sometimes abbreviated as r9y) is a more general term that encompasses dimensions such as response latency and content validity, as well as availability.

In the 2021 State of DevOps Report’s cluster analysis, teams were segmented into four groups based on the Four Key metrics of software delivery. At first glance, we found that the application of reliability practices is not directly correlated with software delivery performance: teams that score well on delivery metrics may not be the same as those that consistently practice modern operations. In combination, however, software delivery performance and reliability engineering exert a powerful influence on organizational outcomes: elite software delivery teams that also meet their reliability goals are 1.8 times more likely to report better business outcomes.

How Google achieves reliability: SRE

In Google’s early days, we took a traditional approach to technical operations; the bulk of the work involved manual interventions in reaction to discrete problems. However, as our products began to rapidly acquire users across the globe, we realized that this approach wasn’t sustainable. It couldn’t scale to match the increasing size and complexity of our systems, and even attempting to keep up would have required an untenable investment in our operations workforce. So, for the past 15+ years, we’ve been practicing and iterating on an approach called Site Reliability Engineering (SRE).

SRE provides a framework for measurement, prioritization, and information sharing to help teams balance the velocity of feature releases against the predictable behavior of deployed services. It emphasizes the use of automation to reduce risk and to free up engineering capacity for strategic work. This may sound a lot like a description of DevOps; indeed, these disciplines have many shared values. That similarity meant that when Google published the first book on Site Reliability Engineering in 2016, it made waves in the DevOps community as practitioners recognized a like-minded movement. It also caused some confusion: some have framed DevOps and SRE as being in conflict or competition with each other.

Our view is that, having arisen from similar challenges and espousing similar objectives, DevOps and SRE can be mutually compatible. We posited that, metaphorically, “class SRE implements DevOps”: SRE provides a way to realize DevOps objectives. Inspired by these communities’ continued growth and ongoing exchange of ideas, we sought to investigate their relationship further. This year, we expanded the scope of data collection to assess the extent of SRE adoption across the industry, and to learn how such modern operational practices interact with DORA’s model of software delivery performance.

Starting from the published literature on SRE, we added the key elements of the framework as items in our survey of practitioners. We took care to avoid jargon as much as possible, preferring plain language to describe how modern operations teams go about their work. Respondents reported on practices such as: defining reliability in terms of user-visible behavior; using automation to allow engineers to focus on strategic work; and having well-defined, well-practiced protocols for incident response.

Along the way, we found that using SRE to implement DevOps is much more widely practiced than we thought. SRE, and related disciplines like Facebook’s Production Engineering, have a reputation for being niche, practiced only by a handful of tech giants. To the contrary, we found that SRE is used in some capacity by a majority of the teams in the DORA survey, with 52% of respondents reporting the use of one or more SRE practices.

SRE is a force multiplier for software delivery excellence

Analyzing the results, we found compelling evidence that SRE is an effective approach to modern operations across the spectrum of organizations. In addition to driving better business outcomes, SRE helps focus efforts: teams that achieve their reliability goals report that they are able to spend more time coding, as they’re less consumed by reacting to incidents. These findings are consistent with the observation that reliable services can directly impact revenue, as well as offer engineers greater flexibility to use their time to improve their systems rather than simply repairing them.

But while SRE is widely used and has demonstrable benefits, few respondents indicated that their teams have fully implemented every SRE technique we examined. Increased application of SRE has benefits at all levels: within every cluster of software delivery performance, teams that also meet their reliability goals outperform the other members of their cluster on business outcomes.

On the SRE road to DevOps excellence

SRE is more than a toolset; it’s also a cultural mindset about the role of operations staff. SRE is a learning discipline, aimed at understanding information and continuously iterating in response. Accordingly, adopting SRE takes time, and success requires starting small and applying an iterative approach to SRE itself.

Here are some ways to get started:

- Find free books and articles at sre.google
- Join a conversation with fellow practitioners, at all different stages of SRE implementation, at bit.ly/reliability-discuss
- Speak to your GCP account manager about our professional service offerings
- Apply to the DevOps Awards to show how your organization is implementing award-winning SRE practices along with the DORA principles!
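As a concrete companion to the Four Key metrics and reliability goals discussed above, here is a minimal sketch of how a team might compute those metrics from its own deployment and incident records, plus a simple availability check against a reliability target. The record shapes, the 30-day window, and the 99.9% target are illustrative assumptions for this sketch, not DORA’s survey methodology.

```python
# A minimal sketch: Four Key metrics plus an availability check (illustrative data).
from datetime import datetime, timedelta
from statistics import median

# (commit_time, deploy_time, caused_failure) per deployment -- assumed record shape.
deployments = [
    (datetime(2021, 11, 1, 9), datetime(2021, 11, 1, 15), False),
    (datetime(2021, 11, 8, 10), datetime(2021, 11, 8, 13), True),
    (datetime(2021, 11, 15, 11), datetime(2021, 11, 15, 12), False),
]
restore_times = [timedelta(hours=2)]  # time to restore service, one per failure
downtime = sum(restore_times, timedelta())

PERIOD = timedelta(days=30)
SLO_TARGET = 0.999  # 99.9% availability goal (assumption)

deployment_frequency = len(deployments) / PERIOD.days                      # deploys/day
lead_time_for_changes = median(deploy - commit for commit, deploy, _ in deployments)
change_failure_rate = sum(failed for *_, failed in deployments) / len(deployments)
time_to_restore = median(restore_times)
availability = 1 - downtime / PERIOD

print(f"deployment frequency:  {deployment_frequency:.2f} per day")
print(f"lead time for changes: {lead_time_for_changes}")
print(f"change failure rate:   {change_failure_rate:.0%}")
print(f"time to restore:       {time_to_restore}")
print(f"availability:          {availability:.4%} "
      f"({'meets' if availability >= SLO_TARGET else 'misses'} the {SLO_TARGET:.1%} goal)")
```

Tracking numbers like these over time shows whether velocity gains are coming at the expense of stability and reliability, which is exactly the balance the report examines.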
Source: Google Cloud Platform