Beyond boundaries: The future of Azure Storage in 2026

2025 was a pivotal year for Azure Storage, and we’re heading into 2026 with a clear focus on helping customers turn AI into real impact. As outlined in last December’s post, Azure Storage innovations: Unlocking the future of data, Azure Storage is evolving into a unified, intelligent platform that supports the full AI lifecycle at enterprise scale with the performance modern workloads demand.

Looking ahead to 2026, our investments span the full breadth of that lifecycle as AI becomes foundational across every industry. We are advancing storage performance for frontier model training, delivering purpose‑built solutions for large‑scale AI inferencing and emerging agentic applications, and empowering cloud‑native applications to operate at agentic scale. In parallel, we are simplifying adoption for mission‑critical workloads, lowering TCO, and deepening partnerships to co‑engineer AI‑optimized solutions with our customers.

We’re grateful to our customers and partners for their trust and collaboration, and excited to shape the next chapter of Azure Storage together in the year ahead.

Extending from training to inference

AI workloads extend from large, centralized model training to inference at scale, where models are applied continuously across products, workflows, and real-world decision making. LLM training continues to run on Azure, and we’re investing to stay ahead by expanding scale, improving throughput, and optimizing how model files, checkpoints, and training datasets flow through storage.

Innovations that helped OpenAI operate at unprecedented scale are now available to all enterprises. Blob scaled accounts allow storage to scale across hundreds of scale units within a region, handling the millions of objects required to use enterprise data as training and tuning datasets for applied AI. Our partnership with NVIDIA DGX on Azure shows how that scale translates into real-world inference: DGX Cloud was co-engineered to run on Azure, pairing accelerated compute with high-performance storage, Azure Managed Lustre (AMLFS), to support LLM research, automotive, and robotics applications. AMLFS provides the best price-performance for keeping GPU fleets continuously fed, and we recently released preview support for 25 PiB namespaces and up to 512 GBps of throughput, making AMLFS a best-in-class managed Lustre offering in the cloud.

As we look ahead, we’re deepening integration with popular first- and third-party AI frameworks such as Microsoft Foundry, Ray, Anyscale, and LangChain, enabling seamless connections to Azure Storage out of the box. Our native Azure Blob Storage integration within Foundry enables enterprise data consolidation into Foundry IQ, making blob storage the foundational layer for grounding enterprise knowledge, fine-tuning models, and serving low-latency context to inference, all under the tenant’s security and governance controls.

From training through full-scale inferencing, Azure Storage supports the entire agent lifecycle: from distributing large model files efficiently, to storing and retrieving long-lived context, to serving data from RAG vector stores. By optimizing for each pattern end to end, Azure Storage has performant solutions for every stage of AI inference.

Evolving cloud native applications for agentic scale

As inference becomes the dominant AI workload, autonomous agents are reshaping how cloud native applications interact with data. Unlike human-driven systems with predictable query patterns, agents operate continuously, issuing an order of magnitude more queries than traditional users ever did. This surge in concurrency stresses databases and storage layers, pushing enterprises to rethink how they architect new cloud native applications.

Azure Storage is building with SaaS leaders like ServiceNow, Databricks, and Elastic to optimize for agentic scale, leveraging our block storage portfolio. Looking forward, Elastic SAN becomes a core building block for these cloud native workloads, starting with the transformation of Microsoft’s own database solutions. It offers fully managed block storage pools that let different workloads share provisioned resources, with guardrails for hosting multi-tenant data. We’re pushing the boundaries on maximum scale units to enable denser packing and building capabilities for SaaS providers to manage agentic traffic patterns.

As cloud native workloads adopt Kubernetes to scale rapidly, we are simplifying the development of stateful applications through our Kubernetes-native storage orchestrator, Azure Container Storage (ACStor), alongside CSI drivers. Our recent ACStor release signals two directional changes that will guide upcoming investments: adopting the Kubernetes operator model to perform more complex orchestration, and open-sourcing the codebase to collaborate and innovate with the broader Kubernetes community.

Together, these investments establish a strong foundation for the next generation of cloud native applications where storage must scale seamlessly and deliver high efficiency to serve as the data platform for agentic scale systems.

Breaking price-performance barriers for mission-critical workloads

In addition to evolving AI workloads, enterprises continue to grow their mission-critical workloads on Azure.

SAP and Microsoft are partnering to expand core SAP performance while introducing AI-driven agents like Joule that enrich Microsoft 365 Copilot with enterprise context. Azure’s latest M-series advancements add substantial scale-up headroom for SAP HANA, pushing disk storage performance to ~780K IOPS and 16 GB/s of throughput. For shared storage, Azure NetApp Files (ANF) and Azure Premium Files deliver the high-throughput NFS/SMB foundations SAP landscapes rely on, while optimizing TCO with the ANF Flexible service level and Azure Files Provisioned v2. Coming soon, we will introduce an Elastic ZRS service level in ANF, bringing zone-redundant high availability and consistent performance through synchronous replication across availability zones, leveraging Azure’s ZRS architecture without added operational complexity.

Similarly, Ultra Disks have become foundational to platforms like BlackRock’s Aladdin, which must react instantly to market shifts and sustain high performance under heavy load. With average latency well under 500 microseconds, support for 400K IOPS, and 10 GB/s of throughput, Ultra Disks enable faster risk calculation, more agile portfolio management, and resilient performance on BlackRock’s highest-volume trading days. When paired with Ebsv6 VMs, Ultra Disks can reach 800K IOPS and 14 GB/s for the most demanding mission-critical workloads. And with flexible provisioning, customers can tune performance precisely to their needs while optimizing TCO.

These combined investments give enterprises a more resilient, scalable, and cost-efficient platform for their most critical workloads.

Designing for new realities of power and supply

The global AI surge is straining power grids and hardware supply chains. Rising energy costs, tight datacenter budgets, and industry-wide HDD/SSD shortages mean organizations can’t scale infrastructure simply by adding more hardware. Storage must become more efficient and intelligent by design.

We’re streamlining the entire stack to maximize hardware performance with minimal overhead. Combined with intelligent load balancing and cost-effective tiering, this puts us in a unique position to help customers scale storage sustainably even as power and hardware availability become strategic constraints. With continued innovations in Azure Boost Data Processing Units (DPUs), we expect step-function gains in storage speeds and feeds at even lower per-unit energy consumption.

AI pipelines can span on-premises estates, neocloud GPU clusters, and the cloud, yet many of these environments are limited by power capacity or storage supply. When those limits become a bottleneck, we make it easy to shift workloads to Azure. We’re investing in integrations that make external datasets first-class citizens in Azure, enabling seamless access to training, fine-tuning, and inference data wherever it lives. As cloud storage evolves into AI-ready datasets, Azure Storage is introducing curated, pipeline-optimized experiences that simplify how customers feed data into downstream AI services.

Accelerating innovations through the storage partner ecosystem

We can’t do this alone. Azure Storage works closely with strategic partners to push inference performance to the next level. In addition to the self-publishing capabilities available in Azure Marketplace, we go a step further, dedicating engineering expertise to co-engineer highly optimized and deeply integrated services with partners.

In 2026, you will see more co-engineered solutions like Commvault Cloud for Azure, Dell PowerScale, Azure Native Qumulo, Pure Storage Cloud, Rubrik Cloud Vault, and Veeam Data Cloud. We will focus on hybrid solutions with partners like VAST Data and Komprise to enable data movement that unlocks the power of Azure AI services and infrastructure—fueling impactful customer AI Agent and Application initiatives.

To an exciting new year with Azure Storage

As we move into 2026, our vision remains simple: help every customer unlock more value from their data with storage that is faster, smarter, and built for the future. Whether powering AI, scaling cloud native applications, or supporting mission critical workloads, Azure Storage is here to help you innovate with confidence in the year ahead.

Source: Azure

Clawdbot with Docker Model Runner, a Private Personal AI Assistant

Personal AI assistants are transforming how we manage our daily lives—from handling emails and calendars to automating smart homes. However, as these assistants gain more access to our private data, concerns about privacy, data residency, and long-term costs are at an all-time high.

By combining Clawdbot with Docker Model Runner (DMR), you can build a high-performance, agentic personal assistant while keeping full control over your data, infrastructure, and spending.

This post walks through how to configure Clawdbot to utilize Docker Model Runner, enabling a privacy-first approach to personal intelligence.

What Are Clawdbot and Docker Model Runner?

Clawdbot is a self-hosted AI assistant designed to live where you already are. Unlike browser-bound bots, Clawdbot integrates directly with messaging apps like Telegram, WhatsApp, Discord, and Signal. It acts as a proactive digital coworker capable of executing real-world actions across your devices and services.

Docker Model Runner (DMR) is Docker’s native solution for running and managing large language models (LLMs) as OCI artifacts. It exposes an OpenAI-compatible API, allowing it to serve as the private “brain” for any tool that supports standard AI endpoints.
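
As a quick illustration of that OpenAI-compatible surface, the sketch below sends a chat completion request straight to DMR with curl. It is a minimal example under a few assumptions: DMR is reachable over TCP at http://localhost:12434/v1 (the default address used in the configuration later in this post), the /v1/chat/completions route is exposed there, and a model named gpt-oss has already been pulled.

curl http://localhost:12434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "Summarize my day in one sentence."}]
  }'

If the endpoint returns a normal OpenAI-style JSON completion, any tool that speaks the OpenAI API, Clawdbot included, can use it as a drop-in backend.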

Together, they create a unified assistant that can browse the web, manage your files, and respond to your messages without ever sending your sensitive data to a third-party cloud.

Benefits of the Clawdbot + DMR Stack

Privacy by Design

In a “Privacy-First” setup, your assistant’s memory, message history, and files stay on your hardware. Docker Model Runner isolates model inference, meaning:

No third-party training: Your personal emails and schedules aren’t used to train future commercial models.

Sandboxed execution: Models run in isolated environments, protecting your host system.

Data sovereignty: You decide exactly which “Skills” (web browsing, file access) the assistant can use.

Cost Control and Scaling

Cloud-based agents often become expensive when they use “long-term memory” or “proactive searching,” which consume massive amounts of tokens. With Docker Model Runner, inference runs on your own GPU/CPU. Once a model is pulled, there are no per-token fees. You can let Clawdbot summarize thousands of unread emails or research complex topics for hours without worrying about a surprise API bill at the end of the month.

Configuring Clawdbot with Docker Model Runner

Modifying the Clawdbot Configuration

Clawdbot uses a flexible configuration system to define which models and providers drive its reasoning. While the onboarding wizard (clawdbot onboard) is the standard setup path, you can manually point Clawdbot to your private Docker infrastructure.

You can define your provider configuration in:

Global configuration: ~/.config/clawdbot/config.json

Workspace-specific configuration: clawdbot.json in your active workspace root.

Using Clawdbot with Docker Model Runner

To bridge the two, update your configuration to point to the DMR server, assuming Docker Model Runner is running at its default address: http://localhost:12434/v1.

Your config.json should be updated as follows:

{
  "models": {
    "providers": {
      "dmr": {
        "baseUrl": "http://localhost:12434/v1",
        "apiKey": "dmr-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-oss:128K",
            "name": "gpt-oss (128K context window)",
            "contextWindow": 128000,
            "maxTokens": 128000
          },
          {
            "id": "glm-4.7-flash:128K",
            "name": "glm-4.7-flash (128K context window)",
            "contextWindow": 128000,
            "maxTokens": 128000
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "dmr/gpt-oss:128K"
      }
    }
  }
}

This configuration tells Clawdbot to bypass external APIs and route all “thinking” to your private models.

Note for Docker Desktop users: Ensure TCP access is enabled so Clawdbot can communicate with the runner. Run the following command in your terminal:

docker desktop enable model-runner --tcp
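
A quick way to confirm the runner is up before starting Clawdbot is sketched below. It assumes the /v1/models route is served at the same default address used in the configuration above; docker model status is the CLI-side check.

docker model status
curl http://localhost:12434/v1/models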

Recommended Models for Personal Assistants

While coding models focus on logic, personal assistant models need a balance of instruction-following, tool-use capability, and long-term memory.

Model | Best For | DMR Pull Command
gpt-oss | Complex reasoning & scheduling | docker model pull gpt-oss
glm-4.7-flash | Fast coding assistance and debugging | docker model pull glm-4.7-flash
qwen3-coder | Agentic coding workflows | docker model pull qwen3-coder

Pulling models from the ecosystem

DMR can pull models directly from Hugging Face and convert them into OCI artifacts automatically:

docker model pull huggingface.co/bartowski/Llama-3.3-70B-Instruct-GGUF

Context Length and “Soul”

For a personal assistant, context length is critical. Clawdbot relies on a SOUL.md file (which defines its personality) and a Memory Vault (which stores your preferences).

If a model’s default context is too small, it will “forget” your instructions mid-conversation. You can use DMR to repackage a model with a larger context window:

docker model package --from llama3.3 --context-size 128000 llama-personal:128k

Once packaged, reference llama-personal:128k in your Clawdbot config to ensure your assistant always remembers the full history of your requests.
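
Before editing the config, it can help to confirm the repackaged model is present locally and responds. This is a minimal sanity check, assuming docker model run accepts an inline prompt as in current DMR CLI releases:

docker model ls
docker model run llama-personal:128k "Say hello in one sentence."

If the model answers, point the dmr provider entry in your Clawdbot config at the new llama-personal:128k id and set it as the primary model.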

Putting Clawdbot to Work: Running Scheduled Tasks 

With Clawdbot and DMR running, you can move beyond simple chat. Let’s set up a “Morning Briefing” task.

Verify the Model: docker model ls (Ensure your model is active).

Initialize the Soul: Run clawdbot init-soul to define how the assistant should talk to you.

Assign a Task: “Clawdbot, every morning at 8:00 AM, check my unread emails, summarize the top 3 priorities, and message me the summary on Telegram.”

Because Clawdbot is connected to your private Docker Model Runner, it can parse those emails and reason about your schedule privately. No data leaves your machine; you simply receive a helpful notification on your phone via your chosen messaging app.

How You Can Get Involved

The Clawdbot and Docker Model Runner ecosystems are growing rapidly. Here’s how you can help:

Share Model Artifacts: Push your optimized OCI model packages to Docker Hub for others to use.

Join the Community: Visit the Docker Model Runner GitHub repo.

Source: https://blog.docker.com/feed/

AWS Transfer Family now supports Amazon FSx for NetApp ONTAP

AWS Transfer Family customers can now access file system data stored in Amazon FSx for NetApp ONTAP over SFTP, FTPS, and FTP.  AWS Transfer Family provides fully managed file transfers over SFTP, FTP, FTPS, AS2, and web browser-based interfaces. With this launch, you can now access FSx for ONTAP file systems over Transfer Family’s supported protocols through S3 Access Points, while maintaining access via native file protocols (NFS/SMB). This allows you to maintain existing file system workflows while adding secure access via industry-standard protocols for external partners and internal users. Access is controlled through standard IAM policies and S3 Access Point configurations, helping you meet data security and compliance requirements. Transfer Family support for FSx for ONTAP is available in select AWS Regions. To get started, visit the AWS Transfer Family console, or use AWS CLI/SDK. To learn more, visit the Transfer Family User Guide.
Source: aws.amazon.com

Amazon Managed Grafana now available in the AWS GovCloud (US) Regions

Amazon Managed Grafana is now available in both AWS GovCloud (US-West) and AWS GovCloud (US-East) Regions, enabling government customers and regulated industries to securely visualize and analyze their operational data while meeting stringent compliance requirements. Amazon Managed Grafana is a fully managed service based on open-source Grafana that makes it easier for you to visualize and analyze your operational data at scale. All Amazon Managed Grafana features are supported in AWS GovCloud (US) Regions except for Enterprise plugins. To get started with Amazon Managed Grafana, visit the AWS Console and Amazon Managed Grafana user guide. To learn more about Amazon Managed Grafana, visit the product page and pricing page.
Source: aws.amazon.com

Amazon Bedrock now supports 1-hour duration for prompt caching

Amazon Bedrock now supports a 1-hour time-to-live (TTL) option for prompt caching for select Anthropic Claude models. With this update, you can extend the persistence of cached prompt prefixes from the default 5 minutes to 1 hour, improving cost efficiency and performance for long-running agentic workflows and multi-turn conversations. Previously, cached content remained active for a fixed 5-minute window and refreshed when reused. With the new 1-hour TTL option, you can maintain context for users who interact less frequently, or for complex agents that require more time between steps—such as tool use, retrieval, and orchestration. The 1-hour TTL is also useful for longer sessions and batch processing where you want cached content to persist across extended periods. 1-hour TTL prompt caching is generally available for Anthropic’s Claude Sonnet 4.5, Claude Haiku 4.5, and Claude Opus 4.5 in all commercial AWS Regions and AWS GovCloud (US) Regions where these models are available. The 1-hour cache is billed at a different rate than the standard 5-minute cache. To learn more, refer to the Amazon Bedrock documentation and Amazon Bedrock Pricing page.
Source: aws.amazon.com

Amazon Lightsail expands blueprint selection with updated support for Node.js, LAMP, and Ruby on Rails blueprints

Amazon Lightsail now offers new Node.js, LAMP, and Ruby on Rails blueprints. These new blueprints have Instance Metadata Service Version 2 (IMDSv2) enforced by default, and support IPv6-only instances. With just a few clicks, you can create a Lightsail virtual private server (VPS) of your preferred size with Node.js, LAMP, or Ruby on Rails preinstalled. With Lightsail, you can easily get started on the cloud by choosing a blueprint and an instance bundle to build your web application. Lightsail instance bundles include instances preinstalled with your preferred operating system, storage, and monthly data transfer allowance, giving you everything you need to get up and running quickly. These new blueprints are now available in all AWS Regions where Lightsail is available. For more information on blueprints supported on Lightsail, see Lightsail documentation. For more information on pricing, or to get started with your free trial, click here.
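
If you prefer the CLI to the console, the sketch below shows one way to launch an instance from a blueprint with the AWS CLI. The blueprint ID, bundle ID, and Availability Zone here are illustrative assumptions; list the values currently offered in your Region with aws lightsail get-blueprints and aws lightsail get-bundles before creating the instance.

aws lightsail get-blueprints
aws lightsail get-bundles
aws lightsail create-instances \
  --instance-names my-node-app \
  --availability-zone us-east-1a \
  --blueprint-id nodejs \
  --bundle-id small_3_0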
Source: aws.amazon.com