Enhancing LLM quality and interpretability with the Vertex AI Gen AI Evaluation Service

Developers harnessing the power of large language models (LLMs) often encounter two key hurdles: managing the inherent randomness of their output and addressing their occasional tendency to generate factually incorrect information. Somewhat like rolling dice, LLMs offer a touch of unpredictability, generating different responses even when given the same prompt. While this randomness can fuel creativity, it can also be a stumbling block when consistency or factual accuracy is crucial. Moreover, the occasional “hallucinations” – where the LLM confidently presents misinformation – can undermine trust in its capabilities. The challenge intensifies when we consider that many real-world tasks lack a single, definitive answer. Whether it’s summarizing complex information, crafting compelling marketing copy, brainstorming innovative product ideas, or drafting persuasive emails, there’s often room for multiple valid solutions.

In this blog post and accompanying notebook, we’ll explore how to tackle these challenges with a new workflow: generate a diverse set of LLM responses, then use the Vertex Gen AI Evaluation Service to automatically select the best one and provide associated quality metrics and explanations. This process also extends to multimodal input and output and stands to benefit almost all use cases across industries and LLMs.

Picture this: a financial institution striving to summarize customer conversations with banking advisors. The hurdle? Ensuring these summaries are grounded in reality, helpful, concise, and well-written. With numerous ways to craft a summary, quality varied greatly. Here is how they leveraged the probabilistic nature of LLMs and the Vertex Gen AI Evaluation Service to elevate the quality of LLM-generated summaries.

Step 1: Generate Diverse Responses

The core idea here was to think beyond the first response.
Causal decoder-based LLMs have a touch of randomness built in, meaning they sample each word probabilistically. So, by generating multiple, slightly different responses, we boost the odds of finding a perfect fit. It’s like exploring multiple paths, knowing that even if one leads to a dead end, another might reveal a hidden gem.

For example, imagine asking an LLM, “What is the capital of Japan?” You might get a mix of responses like “Kyoto was the capital city of Japan,” “Tokyo is the current capital of Japan,” or even “Tokyo was the capital of Japan.” By generating multiple options, we increase our chances of getting the most accurate and relevant answer.

To put this into action, the financial institution used an LLM to generate five different summaries for each transcript. They adjusted the LLM’s “temperature,” which controls the randomness of its output, to a range of 0.2 to 0.4 to encourage just the right amount of diversity without straying too far from the topic. This ensured a range of options, increasing the likelihood of finding an ideal, high-quality summary.

Step 2: Find the Best Response

Next came the need to search through the set of diverse responses and pinpoint the best one. To do this automatically, the financial institution applied the pairwise evaluation approach available in the Vertex Gen AI Evaluation Service. Think of it as a head-to-head showdown between responses: we pit response pairs against each other, judging them based on the original instructions and context to identify the response that aligns most closely with the user’s intent.

Continuing the example above, let’s say we have those three responses about Japan’s capital.
We want to find the best one using pairwise comparisons:

Response 1 vs. Response 2: The API favors Response 2, potentially explaining, “While Response 1 is technically correct, it doesn’t directly answer the question about the current capital of Japan.”

Response 2 (best response so far) vs. Response 3: Response 2 wins again! Response 3 stumbles by using the past tense.

After these two rounds of comparison, we conclude that Response 2 is the best answer. In the financial institution’s case, they compared their five generated summaries in pairs to select the best one.

Step 3: Assess if the Response is Good Enough

The workflow then takes the top-performing response (Response 2) from the previous step and assesses it with the pointwise evaluation service. This evaluation assigns quality scores and generates human-readable explanations for those scores across various dimensions, such as accuracy, groundedness, and helpfulness. It not only highlights the best response but also provides insight into why the model generated this response and why it’s considered superior to the other responses, fostering trust and transparency in the system’s decision-making. In the case of the financial institution, they then used the summarization-related metrics in pointwise evaluation on the winning response to obtain an explanation of how the answer is grounded, helpful, and high-quality. We can choose to return just the best response or include its associated quality metrics and explanation for greater transparency.

In essence, the workflow (as illustrated in this blog’s banner) encompasses generating a variety of LLM responses, systematically evaluating them, and selecting the most suitable one—all while providing insights into why that particular response is deemed optimal. Get started by exploring our sample notebook and adapting it to fit your use case.
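The three steps above can be sketched end to end in a single loop. To keep the sketch self-contained, candidate generation and both judges are deliberately simplified stand-ins: in practice, generate() would call an LLM at temperature 0.2–0.4, and the two judge functions would call the Vertex Gen AI Evaluation Service’s pairwise and pointwise metrics through the Vertex AI SDK.

```python
# Stand-ins for real calls; only the control flow mirrors the workflow.
def generate(prompt, n=5):
    # Step 1 stand-in: would sample n diverse responses from an LLM.
    canned = [
        "Kyoto was the capital city of Japan.",
        "Tokyo is the current capital of Japan.",
        "Tokyo was the capital of Japan.",
    ]
    return canned[:n]

def pairwise_judge(prompt, a, b):
    # Step 2 stand-in: toy heuristic preferring present-tense, on-topic answers.
    score = lambda r: ("is the current" in r) + ("Tokyo" in r)
    return a if score(a) >= score(b) else b

def pointwise_judge(prompt, response):
    # Step 3 stand-in: would return per-dimension scores and explanations.
    return {"helpfulness": 0.9, "groundedness": 1.0,
            "explanation": "Directly answers with the current capital."}

def best_response(prompt, n=5):
    candidates = generate(prompt, n)              # Step 1: diverse responses
    best = candidates[0]
    for challenger in candidates[1:]:             # Step 2: pairwise showdowns
        best = pairwise_judge(prompt, best, challenger)
    return best, pointwise_judge(prompt, best)    # Step 3: pointwise scoring

winner, metrics = best_response("What is the capital of Japan?")
print(winner)  # → Tokyo is the current capital of Japan.
```

The tournament keeps a running champion rather than comparing every pair, so selecting from n candidates costs only n-1 pairwise calls.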
You can also reverse the order of pairwise and pointwise evaluations: rank individual responses by their pointwise scores, then conduct pairwise comparisons only on the top candidates. Further, while this example focuses on text, the approach can be applied to any modality and any use case, including but not limited to the question answering and summarization illustrated in this blog. Finally, if you need to minimize latency, both workflows can benefit greatly from parallelizing the various API calls.

Take the next step

By embracing the inherent variability of LLMs and utilizing the Vertex Gen AI Evaluation Service, we can transform challenges into opportunities. Generating diverse responses, systematically evaluating them, and selecting the best option with clear explanations empowers us to unlock the full potential of LLMs. This approach not only enhances the quality and reliability of LLM outputs but also fosters trust and transparency. Start exploring this approach in our sample notebook and check out the documentation for the Vertex Gen AI Evaluation Service.
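One practical note on the latency point above: the candidate-generation calls are independent of one another, so they can be issued concurrently. A minimal sketch with Python’s concurrent.futures, where generate_candidate is a hypothetical stand-in for a single LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_candidate(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for one LLM call at temperature 0.2-0.4.
    return f"candidate {seed} for: {prompt}"

def generate_candidates_parallel(prompt: str, n: int = 5) -> list[str]:
    # LLM calls are I/O-bound, so a thread pool overlaps their latency
    # instead of paying for n calls sequentially.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(generate_candidate, prompt, i) for i in range(n)]
        return [f.result() for f in futures]  # preserves submission order

print(generate_candidates_parallel("Summarize the call transcript", n=3))
```

The same pattern applies to the pointwise-first variant, where all candidates can be scored concurrently before the top few enter pairwise comparison.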
Source: Google Cloud Platform

Embrace the future of container native storage with Azure Container Storage

We are thrilled to announce the general availability of Microsoft Azure Container Storage, the industry’s first platform-managed container-native storage service in the public cloud. With Kubernetes driving cloud evolution, we are witnessing a transformative shift as enterprises move from virtual machines (VMs) to containers, optimizing for scalability, flexibility, and cost efficiency. We built Azure Container Storage to meet these demands, providing best-in-class price-performance for hosting stateful containers on cloud-based storage and delivering the lowest latency on locally attached storage.

Azure Container Storage joins our suite of container services, tightly integrating with Kubernetes and simplifying stateful workload management across Azure’s comprehensive set of storage offerings. Previously, customers needed to retrofit stateful workloads onto VM-centric storage options with scalability constraints or deploy self-managed open-source container storage solutions. Because Azure Container Storage is purpose-built for Azure Kubernetes Service (AKS), it simplifies the process, allowing developers to focus on innovating and running applications without worrying about managing storage. All storage operations can be performed directly through Kubernetes APIs, such as creating persistent volumes and scaling up capacity on demand, eliminating the need to interact with the control-plane APIs of the underlying infrastructure.
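To make the Kubernetes-native workflow concrete, here is a sketch of the kind of request involved: constructing a PersistentVolumeClaim manifest that targets a storage pool’s storage class. The storage class name below is illustrative (actual names depend on how your storage pool is configured), and the manifest is only built as a dict here, not applied to a cluster; in practice you would submit it with kubectl or a Kubernetes client.

```python
def build_pvc(name: str, storage_class: str, size_gib: int) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict.

    storage_class is a placeholder for the class that Azure Container
    Storage exposes for your storage pool (illustrative name below).
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }

# Hypothetical storage pool class name, for illustration only.
pvc = build_pvc("app-data", "acstor-example-pool", size_gib=100)
print(pvc["spec"]["resources"]["requests"]["storage"])  # → 100Gi
```

Scaling up capacity on demand is the same kind of operation: patch the claim’s storage request through the Kubernetes API rather than resizing a disk through the infrastructure’s own control plane.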

Azure Container Storage

Try it today

Azure Container Storage also streamlines storage management across multiple backing storage options. With its general availability, Azure Container Storage supports Ephemeral Disks (local NVMe and temp SSD) and Azure Disks, just the start of our journey to transform the container storage landscape. Support for Ephemeral Disks marks a pivotal moment for container users, providing the most comprehensive volume management for containers on local storage in the cloud. Beyond basic persistent volume (PV) provisioning, Azure Container Storage offers built-in capabilities such as snapshots and autoscaling, capabilities that cannot be found outside of Azure.

During preview, customers have already begun taking advantage of Azure Container Storage to evolve their business-critical, next-generation solutions. Whether it be optimizing Redpanda cluster performance on Ephemeral Disks or scaling past existing persistent volume limits for Postgres workloads on Azure Disks, Azure Container Storage supports a wide range of workloads. For customers building stateful applications on containers, this is just the beginning. Shortly after general availability, we will expand our offerings to include Elastic SAN and, later, options like Azure Blobs and Azure Files for shared storage use cases.

A cloud native solution for all use cases

Azure Container Storage ensures essential resiliency and security for every workload through built-in resiliency design and security enforcements. 

Built-in resiliency: Easily run highly available stateful applications on Azure Container Storage and protect against zonal failures at all levels of the resource hierarchy. You can choose between zone-redundant storage (ZRS) options or multi-zone storage pools on locally redundant storage (LRS) to deliver a highly available solution across zones. For local storage, we optimally place a pod’s persistent volumes on ephemeral disks that exist on the same node as the AKS pod, reducing the number of failure points that could impact your application’s runtime. Moreover, we offer the best balance of availability, cost, and performance—providing the most cost-efficient block storage offering in the cloud with multi-zonal high availability support and sub-millisecond read latency.

Security by default: Security is our top priority. We offer server-side encryption (SSE) with platform-managed keys by default and enforce network security per respective backing storage options. Customers can further enhance security through extensive options, such as SSE with customer-managed keys, per their security standards.

Modernizing existing applications

For any enterprise looking to modernize its applications, Azure Container Storage consolidates management across familiar block storage offerings, simplifies the movement of workloads, and provides continuity in backup and disaster recovery.  

We streamline and consolidate the management experience across our comprehensive portfolio of familiar Azure block storage offerings. Rather than needing to certify and manage multiple container orchestration solutions for each storage resource you deploy, Azure Container Storage efficiently coordinates volume provisioning within a storage pool, a concept we introduce to group storage resources into a unified resource for your AKS cluster. This storage pool can be backed by your preferred storage option, empowering you to choose the most cost-efficient resource tailored to your specific workload performance requirements. For example, Ephemeral Disk, newly introduced as a supported block storage offering for containers, is well-suited for latency-sensitive workloads that benefit from local NVMe or temp SSD storage. KPN, a Dutch telecommunications company, shared their positive experience using Azure Container Storage with local NVMe to host a mail solution on AKS: 

“With Azure Container Storage, we have been able to achieve improved performance in our KPN consumer mail workload by leveraging ephemeral disks and taking advantage of the pooling of resources that Azure Container Storage enables. Instead of the manual configuration of storage, we can focus on running our workloads, and Azure Container Storage will take care of auto-discovering and formatting the NVMe disks, making it simple to use and in line with the Kubernetes way.” 
—Peter Teeninga, Cloud Architect, Personal Cloud by KPN

To make your journey to the cloud as painless as possible, we partnered with CloudCasa, a key player in Kubernetes data mobility, to simplify mission-critical migration to the cloud. To continue supporting your cloud estate, we partnered with Kasten, the leading service for Kubernetes data protection, offering robust backup and disaster recovery capabilities. For more details on the data migration and backup experience provided through our partners, please refer to the section below.

Building cloud native applications

For application developers building solutions in the cloud, Azure Container Storage offers seamless integration with Kubernetes, providing a container-native experience designed for scalability from the ground up. This ensures that your applications can grow easily and cost-efficiently over time. By supporting industry-standard protocols such as NVMe-oF and iSCSI, we simplify interoperability and provide additional performance options. For instance, you can take advantage of the lower persistent volume attach and detach latencies these protocols offer and achieve rapid scale-out and fast failover. Azure Container Storage allows customers to attach more storage resources to a single VM, increasing the limit to 75 volumes for any VM size. The added flexibility increases customers’ ability to optimize Azure resources to meet their cost and performance goals. Sesam, a Norwegian data synchronization and management company, has effectively leveraged this capability to reduce costs by scaling up their persistent volumes more efficiently:

“Azure Container Storage (ACS) has enabled us to achieve lower total cost of ownership in our workload. We have a large number of pods that need their own persistent volumes, and through the use of Azure Disks and storage pool resource pooling, we are able to fulfill this in a more cost-effective manner, without hitting the limits on the number of disks that can be attached to a node. In practice this makes us able to allocate capacity more easily and more efficiently.”
—Geir Ove Grønmo, Product Manager, Sesam.io 

Data migration and backup support through our trusted partners

Highly efficient and operational storage management is the baseline experience Azure Container Storage strives for. Azure Container Storage tightly integrates with two key third-party solutions—CloudCasa and Kasten—to offer you an integrated migrate, backup, and disaster recovery experience for workloads hosted on stateful containers.  

With the ability to automatically recreate an entire cluster, CloudCasa centralizes the management of cluster recovery and migration, making it easy to move your existing Kubernetes workloads to and within AKS. To modernize your existing workloads on Azure, simply take a full backup of the existing storage resources, then set up a restore, indicating Azure Container Storage as the new storage resource for your cluster.

“With Azure Container Storage, Microsoft has removed much of the management burden from Kubernetes storage, allowing development and DevOps teams to focus on their data and applications. This approach enables organizations to more easily operate stateful production applications at scale. We are pleased to have worked with the Azure Container Storage team to certify CloudCasa for backup and recovery of stateful applications running on it, and to provide a jointly tested solution for easy migration to it.”
—Bob Adair, Head of Product Management, CloudCasa By Catalogic

Kasten automates the end-to-end workflow of backup and disaster recovery, protecting your Kubernetes clusters and application operations. When you deploy your storage pool in Azure Container Storage, you can enable Kasten during snapshot setup. Using dynamic policies, Kasten helps you manage backups at scale in a crash-consistent manner. 

“With Azure Container Storage and Kasten by Veeam, organizations can maximize performance, flexibility and resiliency, while protecting their cloud native workloads from ransomware attacks. Kasten by Veeam collaborated with Microsoft Product and Engineering teams to validate provisioning, volume snapshot and restore capabilities on Azure Container Storage to ensure joint Microsoft and Kasten by Veeam customers can backup, protect, and migrate their stateful workloads to Azure Kubernetes Service (AKS). Through our strategic partnership, we simplify organizations’ cloud journeys without sacrificing performance, scalability or resiliency.” 
—Matt Slotten, Principal Solution Architect, Cloud Native Partnerships, Kasten by Veeam

What is new with general availability?

Our announcement builds on the updates we’ve shared throughout our preview, highlighting several differentiated capabilities. We’ve enhanced the resiliency of stateful containers with multi-zone storage pools and volume replication for local NVMe storage pools to protect against availability loss during single node failures. We’ve also added snapshot support across all storage options for backup and disaster recovery. Additionally, we’ve expanded the Ephemeral Disk portfolio from local NVMe to include temp SSD support, enhancing cost efficiency for use cases that can leverage directly attached local storage. With this announcement, we are excited to introduce three new capabilities that will further increase the resiliency and performance of running stateful workloads: 

Enhance the resiliency of your persistent volumes hosted on local NVMe storage (L-series ephemeral disks) with replication support.

Improved persistent volume recovery after a restart of an Azure Kubernetes Service (AKS) cluster.

Customize the performance of your local NVMe storage with new performance tier options. 

Learn more about Azure Container Storage

Get started by installing Azure Container Storage on your AKS cluster! For a comprehensive guide, watch our step-by-step walkthrough video. You can also explore workload samples from our newly launched community repository to create your first stateful application. To learn more, refer to our AKS engineering blog. We encourage everyone to contribute and share insights as you explore our newest storage offering.

If you have any questions, please reach out to AskContainerStorage@microsoft.com. Embrace the future of stateful containers with Azure and unlock new possibilities! 

Explore Azure Container Storage capabilities

The post Embrace the future of container native storage with Azure Container Storage appeared first on Azure Blog.
Source: Azure

Docker Scout Health Scores: Security Grading for Container Images in Your Docker Hub Repo

We are thrilled to introduce Docker Scout health scores, our latest feature designed to make software security simpler and more effective for developers. 

Developer-friendly software security

Docker Scout health scores rate the security and compliance status of container images within Docker Hub, providing a single, quantifiable metric to represent the “health” of an image. This feature addresses one of the key friction points in developer-led software security — the lack of security expertise — and makes it easier for developers to turn critical insights from tools into actionable steps.

How Docker Scout health scores work

Docker Scout health scores utilize an alphabetical grading system to rate images stored in Hub repositories. The scores range from A to F, with A representing the highest overall standing and F the lowest. These health scores are calculated by evaluating images against a set of security and compliance checks based on widely accepted secure supply chain best practices. Factors considered include known vulnerabilities, risky licenses, Software Bill of Materials (SBOM) availability, provenance attestations, freshness of base image, and more. To learn more about these checks and the scoring process, visit our documentation.
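As an illustration of how pass/fail policy checks can roll up into a letter grade, here is a toy sketch. The check names and thresholds below are invented for this example and are not Docker Scout’s actual scoring policy, which is described in the documentation linked above.

```python
# Illustrative only: invented checks and thresholds, not Docker's policy.
CHECKS = [
    "no_high_profile_cves",
    "supply_chain_attestations",   # SBOM and provenance present
    "up_to_date_base_image",
    "non_root_default_user",
    "no_risky_licenses",
]

def health_grade(results: dict[str, bool]) -> str:
    """Map the fraction of passed checks to an A-F grade."""
    passed = sum(results.get(check, False) for check in CHECKS)
    ratio = passed / len(CHECKS)
    for grade, threshold in [("A", 1.0), ("B", 0.8), ("C", 0.6), ("D", 0.4)]:
        if ratio >= threshold:
            return grade
    return "F"

# An image passing only two of the five checks lands near the bottom:
print(health_grade({"no_risky_licenses": True, "up_to_date_base_image": True}))
```

The single quantifiable output is the point: a developer does not need to interpret each supply chain check individually to know whether an image needs attention.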

Note: To maintain the privacy of these assessments, health scores can only be viewed by users who are members of the Docker Hub organization that owns an image repository and have at least “read” access to the repository.

The power of Docker Scout within Docker Hub

Health scores are powered by Docker Scout, our secure software supply chain tool that empowers organizations to strengthen their containerized application security posture via detailed analysis and insights across the software supply chain. Additionally, Docker Scout evaluates container images against detailed policies to ensure compliance with security and licensing standards.

By embedding Docker Scout’s powerful analysis capabilities into Docker Hub, health scores seamlessly fit into developers’ image lifecycle management workflows. Developers visiting hub.docker.com can leverage up-to-date and dependable assessments of their latest and historical images and take proactive measures to prioritize and improve images with lower scores. This capability is crucial for protecting containerized applications from potential security threats.

Figure 1 shows an example of an image with a low health score. The image was awarded a D score because it contains at least one known, high-profile CVE (think Log4Shell), is missing supply chain attestations (like SBOM and provenance), is using an out-of-date base image, and has specified a default root user.

Figure 1: Sample image with a low health score.

Health scores in Docker Hub 

We’ve made it straightforward for developers to leverage health scores. Users can view them directly within the Docker Hub interface by navigating to their organization’s Repositories tab (Figure 2) or from the detailed view for any given repository (Figure 3). 

Figure 2: Repositories tab — health scores per repository.

Figure 3: Repositories details — health scores per tag.

For those seeking more in-depth analysis, enabling Docker Scout for a specific image repository offers easy access to detailed secure software supply chain insights and recommendations for how to address identified issues (Figure 4).

Figure 4: Image details from Docker Scout.

Proactive security through gamification

In addition to making convoluted secure supply chain insights easier to digest, health scores also introduce an element of gamification. Within our own teams at Docker, we are seeing them motivate developers to improve the container images for which they’re responsible. With the clear, quantifiable A to F metric, developers are taking the initiative to pursue higher scores through proactive steps. This process has fostered a culture of continuous improvement, where our developers are self-motivated to prioritize corrective actions and updates to achieve better scores, thus bolstering the security and compliance of our own portfolio.

Conclusion

By leveraging Docker Scout health scores, we aim to encourage organizations to take proactive steps towards better security and compliance management in their containerized environments and increase the overall resilience of their software supply chain. 

The feature is currently available in beta and has been rolled out to a limited number of organizations selected to participate in the early access program. To try out health scores or to give feedback, reach out to our product team on social channels, such as X and Slack.

Learn more

Subscribe to the Docker Newsletter. 

Visit the Docker Scout product page.

Looking to get up and running? Use our Quickstart guide.

Get the latest release of Docker Desktop.

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/