Communities in the cloud

The cloud’s very name reflects how many people think of this data storage system: intangible, distant, and disentangled from day-to-day life. But MIT PhD student Steven Gonzalez is reframing the image and narrative of an immaterial cloud. In his research, he’s showing that the cloud is neither distant nor ephemeral: It’s a massive system, ubiquitous in daily life, that contains huge amounts of energy, has the potential for environmental disaster, and is operated by an insular community of expert technicians.

Who’s tending the cloud?

“People so often rely on cloud services,” Gonzalez notes, “but they rarely think about where their data is stored and who is storing it, who is doing the job of maintaining servers that run 24/7/365, or the billions of gallons of water used daily to cool the servers, or the gigawatts of electricity that often come from carbon-based grids.”

The first time Gonzalez walked into a server farm, he was enthralled and puzzled by this giant factory filled with roaring computers and by the handful of IT professionals keeping it all running. At the time, he was working with specialized sensors that measured air in critical spaces, including places like the server farm. But the surreal facility led him back to his undergraduate anthropological training: How do these server spaces work? How has the cloud shaped these small, professional communities?

Gonzalez has been fascinated with visible, yet rarely recognized, communities since his first undergraduate ethnography on bus drivers in the small New Hampshire city of Keene. “In anthropology, everyone is a potential teacher,” he says. “Everyone you encounter in the field has something to teach you about the subject that you’re looking at, about themselves, about their world.”

Server farms are high-stakes environments

Listening, along with a great deal of patience, is how Gonzalez cultivated the technical expertise to understand his subject matter. Cloud communities are built around, and depend upon, the technology they maintain, and that technology in turn shapes their behavior. So far, Gonzalez has completed his undergraduate and master’s research and degrees, and is currently wrapping up PhD coursework en route to his dissertation. He’s visited server farms across North America and in Scandinavia, where farm operators are seeking to go carbon-free in order to cut the cloud’s carbon emissions, which account for up to 3 percent of greenhouse gas emissions, according to Greenpeace.

The server-farm technicians function in an extremely high-stakes world: Not only is a massive amount of energy expended on the cloud, but even a few moments of downtime can be devastating. If the systems go down, companies can lose up to $50,000 per minute, depending on what sector (financial, retail, public sector, etc.) and which server racks are affected. “There’s a kind of existential dread that permeates a lot of what they say and what they do,” Gonzalez says. “It’s a very high-stress, unforgiving type of work environment.”

New technology, old gender inequity

In response to these fears, Gonzalez has noted some “macho” performances in language and behavior by cloud communities. The mostly male cloud workforce “tend to use very sexual language,” Gonzalez observes. For instance, when all the servers are functioning properly it’s “uptime”; “They’ll use sexualized language to refer to how ‘potent’ they are or how long they can maintain uptime.”

The cloud communities aren’t exclusively male, but Gonzalez says visibility for women is a big issue. Women tend to be framed as collaborators, rather than executors. Tied up in this sexist behavior is the decades-old patriarchal stereotype that technology is a male domain in which machines are gendered in a way that makes them subordinate.

Although anthropological research is the focus of his academic work, Gonzalez’s interests at MIT have been expansive. With the encouragement of his advisor, Professor Stefan Helmreich, he’s kept his lifelong interest in music and science fiction alive by singing in the MIT Jazz Choir and Concert Choir and taking coursework in science fiction writing. He has also enjoyed exploring courses in history, documentary making, and technology. Anthropology is the first among several passions he discovered as an undergraduate at Keene State College.

“For me, what makes anthropology so capacious is just the diversity of human experience and the beauty of that,” says Gonzalez. “The beauty of so many different possibilities, different configurations of being, that exist simultaneously.”

The open doors of MIT

Gonzalez was born in Orlando, Florida, to Puerto Rican parents who made sure he always had a connection with the island, where he would spend summers with his grandmother. A first-generation college student, Gonzalez says it was never a given that he would even go to college, let alone earn a doctorate: “I never would have imagined that I would have ended up here. It’s a sad reality that, as a Latino person in this country, I was more likely to end up in prison than in a place like MIT. So I had — and I still do — immense respect and awe for the Institute. MIT has a mystique, and when I first arrived I had to deal with that mystique, getting over the sense that I don’t belong.”

He had big expectations about entering a hugely competitive institution but was surprised to find that, in addition to its competitive edge, the Institute was incredibly supportive. “The thing that surprised me the most was how open everyone’s door was.”

Gonzalez has become increasingly involved in campus life: he’s now a Diversity Conduit for the Graduate Student Council Diversity and Inclusion Initiative and is also part of an MIT student initiative exploring Institute ties to, and possible investments in, the prison-industrial complex.
 

Story prepared by MIT SHASS Communications
Editorial and Design Director: Emily Hiestand
Writer: Alison Lanier
Source: Massachusetts Institute of Technology

Ramping up cloud migration discovery and assessment with StratoZone

Businesses around the world have been impacted by the global pandemic, making the need to migrate to the cloud more urgent than ever. On their own, many organizations have difficulty completing cloud migrations on time and on budget. Among their challenges: unknown capacity needs and sizing; unknown performance thresholds; navigating expensive hardware refresh cycles or contract renewals; understanding licensing and support issues; and compliance and security concerns. At Google Cloud, we believe migrating to the cloud must be simple and provide clear advantages. We recently launched RAMP, our holistic migration program that provides the best of our expertise, services, partners, and tools to enable a simpler, faster path to cloud success. RAMP helps customers unlock their cloud potential through repeatable processes with predictable results for cloud migrations. To accelerate the assessment and planning phases, we acquired StratoZone, a migration discovery and assessment specialist.

Since 2014, StratoZone has helped thousands of companies understand their data center footprint, plan out their migration, and make the jump to the cloud. Now, as part of Google Cloud, their technology and expertise support enterprises that are looking to accelerate their cloud migrations, helping them discover, assess, and understand complex dependencies across workloads moving to the cloud. As part of our cloud migration program, these tools will help you more predictably evaluate your IT landscape and plan what can move, what should move, and in what order. This means a migration program that is:

Fast: Assess your workloads quickly, with a solution that takes very little time to install and run (typically under 45 minutes), and can scale to discover and assess thousands of assets in hours. Initial results are typically available in days.
Easy to use: Discover workloads and assets using an agentless process. No appliances, hardware, or agents to deploy. Discovery is hypervisor and physical/virtual agnostic.
Secure: All data is encrypted in transit and at rest. Customers control data anonymization.
Based on best practices: Recommendations for optimal migration and modernization targets based on existing assets (including estimating TCO), proposing an optimal mix of cloud services as well as migration phases that minimize application disruption.

How to get started with discovery and assessment for your migration

We’re excited to help you make your move to the cloud. To get started, click here to estimate your cloud migration costs with a free assessment, or get hands-on with the tech with this Qwiklab.
Source: Google Cloud Platform

Dataflow Under the Hood: comparing Dataflow with other tools

Editor’s note: This is the third blog in a three-part series examining the internal Google history that led to Dataflow, how Dataflow works as a Google Cloud service, and here, how it compares and contrasts with other products in the marketplace.

To place Google Cloud’s stream and batch processing tool Dataflow in the larger ecosystem, we’ll discuss how it compares to other data processing systems. Each system that we talk about has a unique set of strengths and applications that it has been optimized for. We’re biased, of course, but we think that we’ve balanced these needs particularly well in Dataflow.

Apache Kafka is a very popular system for message delivery and subscription, and provides a number of extensions that increase its versatility and power. Here, we’ll talk specifically about the core Kafka experience. Because it is a message delivery system, Kafka does not have direct support for state storage for aggregates or timers; these can be layered on top through abstractions like Kafka Streams. Kafka does support transactional interactions between two topics in order to provide exactly-once communication between two systems that support these transactional semantics. It does not natively support watermark semantics (though it can support them through Kafka Streams) or autoscaling, and users must re-shard their application in order to scale the system up or down.

Apache Spark is a data processing engine that was (and still is) developed with many of the same goals as Google Flume and Dataflow: providing higher-level abstractions that hide underlying infrastructure from users. Spark has a rich ecosystem, including a number of tools for ML workloads. Spark has native exactly-once support, as well as support for event-time processing. Spark does have some limitations in its ability to handle late data, because its event processing capabilities (and thus garbage collection) are based on static thresholds rather than watermarks. State management in Spark is similar to the original MillWheel concept of providing a coarse-grained persistence mechanism. Users need to manually scale their Spark clusters up and down. One major limitation of structured streaming like this is that it is currently unable to handle multi-stage aggregations within a single pipeline.

Apache Flink is a data processing engine that incorporates many of the concepts from MillWheel streaming. It has native support for exactly-once processing and event time, and provides coarse-grained state that is persisted through periodic checkpointing. The effect of this on the cost of state persistence is ambiguous, since most Flink deployments still write to a local RocksDB instance frequently, and periodically checkpoint this to an external file system. Depending on the frequency of checkpointing, this can increase time to recovery in the case that computation has to be repeated. Flink also requires manual scaling by its users; some vendors are working towards autoscaling Flink, but that would still require learning the ins and outs of a new vendor’s platform.

Finally, a brief word on Apache Beam, Dataflow’s SDK. Given Google Cloud’s broad open source commitment (Cloud Composer, Cloud Dataproc, and Cloud Data Fusion are all managed OSS offerings), Beam is often confused for an execution engine, with the assumption that Dataflow is a managed offering of Beam. That’s not the case: Dataflow jobs are authored in Beam, with Dataflow acting as the execution engine.
The benefits of Apache Beam come from open-source development and portability. Jobs can be written in Beam in a variety of languages, and those jobs can be run on Dataflow, Apache Flink, Apache Spark, and other execution engines. That means you’re never locked into Google Cloud.

This concludes our three-part Under the Hood walk-through covering Dataflow. Check out part 1 and part 2. We’re excited about the current state of Dataflow, and the state of the overall data processing industry. We look forward to delivering a steady “stream” of innovations to our customers in the months and years ahead.
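
To make that relationship concrete, here is a minimal sketch of a Beam pipeline written in Python. It is not taken from the post or the Dataflow documentation; the project, region, and bucket names mentioned in the comments are placeholders you would supply yourself. The point it illustrates is that the same pipeline code runs locally on the DirectRunner or on Dataflow, depending only on the options passed at launch.

```python
# A minimal Apache Beam word-count-style pipeline. The same code runs on
# different execution engines depending on the --runner option.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    # With no flags this uses the local DirectRunner; passing
    # --runner=DataflowRunner --project=<your-project> --region=<region>
    # --temp_location=gs://<your-bucket>/tmp submits the same job to Dataflow.
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(["stream", "batch", "stream"])
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "CountPerWord" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```
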
Source: Google Cloud Platform

Shining a light on Anthos at Next OnAir application modernization week

Welcome to Week 7 of Google Cloud Next ‘20: OnAir! This week’s app modernization track goes live Tuesday, August 25 at 9:00am PT, and is all about containers, serverless, app development, and my personal favorite, Anthos.

App modernization is such a relevant and timely topic. Now more than ever, companies across the globe are looking at their existing applications and considering how new technologies could help modernize them to make their businesses more efficient and streamlined. This week is all about that very challenge. As a Developer Advocate who strives to help businesses understand what app modernization can look like, I’m blocking lots of time on my calendar to watch these sessions this week!

Here are some breakout talks at the top of my watch list:

Hands-on Keynote: Building Trust for Speedy Innovation: I always love a good keynote to help dig into what this week in Next is all about. And a hands-on keynote sounds even better! As with any app modernization effort, this session focuses on the basics (“what are you really trying to accomplish?”) by tying technology innovation back to business goals such as increasing trust and speed.

Getting Started with Anthos: Containers are at the core of many app modernization journeys. With many businesses adopting hybrid- and multi-cloud strategies, they’re looking for tools that can help them create the best environments for their applications, wherever they may be. Anthos is a tool with a lot to offer in these modern, distributed environments.

Modernizing Texas’ Best Retail Chain with Anthos: There’s nothing like a customer story to understand how to use a new tool or technology in practice. Many viewers will be able to relate to how H-E-B grappled with the challenges of transforming traditional systems into modern microservice-style architectures. I’ll be looking to learn about what specific challenges H-E-B faced, why they chose the solutions they did, and what they’re planning to do next.

Mainframe Modernization: Accelerating Legacy Transformation: Mainframes are everywhere, but more and more businesses are working to understand life beyond the mainframe. A whole session addressing challenges and solutions around mainframe-to-container modernization sounds like just what this app modernization expert ordered.

Integrating VM Workloads into Anthos Service Mesh: While containers are a cornerstone of many app modernization efforts, “app modernization” is really all about the apps, and not every app belongs in a container! VM solutions are still popular and sometimes the best tool for the job. But did you know that popular technologies commonly associated with containers, like service mesh, can offer benefits to VM-based applications too?

Evolve to Zero Trust Security Model with Anthos Security: No app modernization journey would be complete without incorporating security. And a key component of modernizing security postures is evolving closer to a zero-trust security model. Google has a wealth of experience to draw on when it comes to modern, zero-trust application architectures, and this session will share that expertise with you and show you the tools to implement it yourself.

These are just a few of the exciting sessions I have on my playlist. Looking through the schedule, I know I’ll be checking out a lot more as the week goes on!
Let’s not forget that Next OnAir is also offering more than just the typical breakout sessions:

Check out the Explore Anthos Demo for a technical demonstration of the tools Anthos uses to support modernized applications.
Get hands-on practice managing traffic routing with Istio and Envoy, and implementing continuous delivery with Jenkins in Google Kubernetes Engine (GKE), during this week’s Cloud Study Jam workshops. You’ll also learn more about Google Cloud’s Professional Cloud DevOps Engineer certification.
Take your app modernization skills to the next level and compete with peers for prizes in this week’s Cloud Hero game.
Recap what you’ve learned, learn about some great content you might have missed, and take the opportunity to ask experts your questions at Cloud Talks by DevRel, live on Friday, August 28.

We hope you have fun at Next OnAir this week! Don’t forget to check out our other breakout talks and register at g.co/cloudnext.
Source: Google Cloud Platform

How Azure Synapse Analytics can help you respond, adapt, and save

Business disruptions, tactical pivots, and remote work have all emphasized the critical role that analytics plays in all organizations. Uncharted situations demand proven performance insights so that businesses can quickly determine what is and is not working. In recent months, the urgency for business-guiding insights has only been heightened, leading to a need for real-time analytics solutions. Equally important is the need to discover and share these insights in the most cost-effective manner.

COVID-19 has not only been a challenge to world health but has also created new economic challenges for businesses worldwide. These challenges have resulted in an increased need for tools that quickly deliver insights to business leaders, empowering informed decisions. This is where Microsoft Azure Synapse Analytics can help.

New circumstances demand new solutions

Azure Synapse Analytics is a new type of analytics platform that enables you to accelerate your time-to-insight with a unified experience and, just as important, save on costs while doing so. It is up to 14 times faster and costs up to 94 percent less than comparable offerings from other cloud providers. Let’s dive into how Azure Synapse can help you respond, adapt, and save.

Respond to disruption and adapt to a new normal

History shows that proven analytics technologies, such as Azure Synapse, have a strong track record of enabling more dynamic and exploratory responses that can guide businesses through difficult times. Traditional data warehouses and reports can’t scale to provide the intelligence and insight that business executives demand in today’s world.

To make good strategic decisions, businesses need to quickly and effectively find new insights in their data. This can only come through more advanced tools and an improved understanding of how to get the most from them.

Each recent global economic crisis has been followed by an increase in data analytics projects, as companies worldwide lean on data analytics to boost their recovery.

To enable teams to collaborate and innovate, they need tools and services that help them discover, explore, and quickly and efficiently find new insights.

Azure Synapse has an intelligent architecture that makes it industry-leading in unifying big data workloads with traditional data warehousing while at the same time encouraging collaboration and reducing costs.

Using Azure Synapse, businesses can empower their teams to collaborate, adapt, and create new strategies that are driven by data. Azure Synapse not only makes it easy to start and scale in the cloud, but it has key security, governance, and monitoring tools that are critical for successful data analytics solutions.

Save on costs with Azure Synapse

The current economic challenges have certainly made us all, individuals and businesses alike, more conscious of our spending. Businesses are looking for new ways to improve productivity and efficiency on limited budgets. Cloud analytics in general, and Azure Synapse in particular, are a great fit for this requirement because they help businesses start small and scale as needed.

Azure Synapse offers a cost-effective service due to its intelligent architecture, which separates storage, compute power, and resources—but makes them seamlessly available when needed. This means that you do not have to keep paying for cloud services if you experience unexpected events that cause business disruptions and tactical pivots. Services can simply be paused to release resources and save costs. You can also scale compute separately from storage, which brings even more cost savings.
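
As a rough illustration of that pause-to-save behavior, here is a minimal Python sketch that calls the Azure Resource Manager pause operation for a dedicated SQL pool in a Synapse workspace. This is not taken from the post: the subscription, resource group, workspace, and pool names are placeholders, and the API version is an assumption to verify against the current Azure documentation.

```python
# Minimal sketch: pause a dedicated SQL pool in an Azure Synapse workspace to
# stop compute billing while the data stays in storage. Names are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"    # placeholder
RESOURCE_GROUP = "<resource-group>"   # placeholder
WORKSPACE = "<synapse-workspace>"     # placeholder
SQL_POOL = "<dedicated-sql-pool>"     # placeholder
API_VERSION = "2021-06-01"            # assumed API version; verify in Azure docs


def pause_sql_pool() -> None:
    # Acquire an Azure Resource Manager token with whatever credential is
    # available locally (CLI login, managed identity, environment variables).
    token = DefaultAzureCredential().get_token(
        "https://management.azure.com/.default"
    ).token
    url = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
        f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Synapse"
        f"/workspaces/{WORKSPACE}/sqlPools/{SQL_POOL}/pause"
        f"?api-version={API_VERSION}"
    )
    resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
    # A 202 Accepted response means the pause was started asynchronously.
    resp.raise_for_status()
    print(f"Pause requested, status {resp.status_code}")


if __name__ == "__main__":
    pause_sql_pool()
```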

Azure Synapse has been found to offer a significantly better price-to-performance ratio when compared with similar services from other cloud providers. This chart from an independent study shows the price-performance comparison (lower is better).

In a recent study, GigaOm, an independent emerging technology research firm, found that Azure Synapse has the best price-to-performance ratio on the market. The study surveyed many services from all of the major cloud providers and took both performance and cost into account. Besides being powerful and cost-effective, Azure Synapse offers industry-leading features when it comes to governance, monitoring, and collaboration that address key challenges for data analytics projects. These features provide businesses with the right tools to control not only costs but also the entire analytics lifecycle, including security, performance, and accuracy.

Learn more

Great leadership, a clear vision, and intelligent data analytics are key components that can help during significant economic and health challenges. Business leaders must lean on the data and insights available and the will, knowledge, and skills of their teams. Empowering your team with the right tools is critical to ensuring they have what they need to effectively collaborate, discover, and work towards recovery.

To learn more about Azure Synapse:

Read the e-book Three Ways Analytics Can Help: Respond, Adapt, and Save.
Get started on Azure Synapse Analytics with an Azure account.
Visit the Azure Synapse documentation webpage for tutorials.
Request a call from an Azure Synapse sales specialist when you’re ready.

Source: Azure

Power mission-critical applications with Azure Database for PostgreSQL

In our current environment, organizations are increasingly looking towards digital solutions to engage their customers and remain competitive. They’re discovering that their customers’ needs can be best met through differentiated, digital experiences delivered by cloud-native applications.

When building a new application, one of the most important decisions to make is where to store the application data. We see tremendous interest in Azure Database for PostgreSQL when it comes to storing relational data in the cloud for mission-critical applications. Here’s why:

Why Azure Database for PostgreSQL?

100 percent open source.

Azure Database for PostgreSQL is built on community edition Postgres, with open extension support so you can leverage valuable PostgreSQL features, including JSONB, geospatial support, and rich indexing.
Our Postgres team at Microsoft is committed to nurturing a culture of contributing to and collaborating with the Postgres community, and we’re excited to welcome Postgres committers to the team. These committers review submitted code for Postgres, “commit” it into the source code repository, and work with other contributors to test, refine, and eventually incorporate it into the next Postgres build. In future blogs, they’ll share what they’re working on when it comes to new versions of Postgres and updates to the Citus open source extension.

Fully managed.

Using a managed Postgres database service on Azure makes your job simpler and allows you to focus on your application, by automating time- and cost-intensive tasks like configuring and managing high-availability, disaster recovery, backups, and data replication across regions.
Azure Database for PostgreSQL has enterprise-grade security and compliance capabilities, such as Azure Advanced Threat Protection, and provides customized performance recommendations, making it suitable for your most mission-critical applications.

High-performance horizontal scaling.

Azure Database for PostgreSQL Hyperscale (Citus) scales out horizontally to hundreds of nodes, with no application rewrites, so you can easily build incredibly scalable applications. This is done using the Citus open source Postgres extension that intelligently and transparently distributes your data and queries across multiple nodes to achieve massive parallelism, along with a much bigger compute, memory, and disk footprint.
In future blogs, we’ll dive deeper into key use cases for Hyperscale (Citus) and share how organizations are building powerful applications using these capabilities. Hyperscale (Citus) enables organizations to scale multi-tenant SaaS applications, build high throughput transactional applications, and more.
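
As a brief, hedged sketch of what that scale-out model can look like in practice, the following Python snippet creates a multi-tenant table (with a JSONB payload column and a GIN index, per the extension support mentioned above) and shards it across worker nodes with the Citus create_distributed_table function. The connection details and table and column names are placeholders, not part of the original post.

```python
# Minimal sketch: connect to a Hyperscale (Citus) server group and distribute a
# multi-tenant table by tenant_id so rows are sharded across worker nodes.
# Connection details and table/column names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="<server-group>.postgres.database.azure.com",  # placeholder host
    dbname="<database>",
    user="<admin-user>",
    password="<password>",
    sslmode="require",
)
conn.autocommit = True

with conn.cursor() as cur:
    # A JSONB column keeps per-event payloads flexible; a GIN index makes them queryable.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            tenant_id bigint NOT NULL,
            event_id  bigserial,
            payload   jsonb,
            PRIMARY KEY (tenant_id, event_id)
        );
    """)
    cur.execute("CREATE INDEX IF NOT EXISTS events_payload_idx ON events USING GIN (payload);")
    # Citus: shard the table across worker nodes on the tenant_id distribution column.
    cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

conn.close()
```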

Key considerations for running modern enterprise applications on Postgres

If you’re new to the world of Postgres and want to understand whether it’s a good fit for building your enterprise applications, check out our new whitepaper, Running Enterprise Applications on PostgreSQL, to learn about:

Benefits of using Postgres for modern, mission-critical applications, and considerations around performance, scalability, security, and more.
Postgres’ extensible design that enables you to future-proof your applications.
Postgres capabilities that empower developers to work more productively.

Announcing our Postgres blog series

We are here to help you at every stage of your journey with Azure Database for PostgreSQL, and will be keeping you up to date with regular posts both here and on our Azure Database for PostgreSQL blog. We’ll be sharing:

Insights and updates directly from Postgres committers Thomas Munro, Jeff Davis, Andres Freund, and David Rowley—you can read the first post in the series, How to securely authenticate with SCRAM in Postgres 13, now.
Key customer use cases, stories, and architectures.
Resources and best practices to help you develop innovative applications to solve business challenges.

Whether you’re new to Postgres, new to Azure, or are already using Azure Database for PostgreSQL to power your enterprise applications, we look forward to supporting your application development journey.

Stay tuned for future blogs here and subscribe to our Azure Database for PostgreSQL blog.

Learn more about Azure Database for PostgreSQL.
Source: Azure

Scaling Docker’s Business to Serve Millions More Developers: Storage

At Docker, our mission is to enable developers worldwide to quickly bring their ideas to life by reducing the complexity of application development. While over 6.5 million registered Docker developers are enjoying the benefits of Docker today, we want to scale Docker’s business to the tens of millions of developers who are just discovering Docker. Offering free tools and services is a cornerstone of our mission, and these are funded by our paid subscription services.

In this blog series, we will deep dive into why and how the recently announced Terms of Service changes were introduced. This blog, Part 1, will explore the inactive image retention policy and how it will impact development teams who use Docker Hub for managing their container images. Part 2 will focus on the new rate limits policies that were introduced for image pulls.

A deeper look at Docker Hub images

Delivering containerized applications in a portable, secure, and resource efficient manner also requires tools and services for securely storing and sharing applications with your development team. Today, Docker is proud to offer the world’s largest container registry, Docker Hub, which is used by over 6.5 million developers around the world. Over 15 PB of container images are stored on Docker Hub today, spanning everything from the world’s most popular in-memory databases to event streaming platforms, to highly curated and trusted Docker Official Images, and to the 150 million images created by the Docker community.

Docker’s internal analytics tools have shown that of the 15 PB of images stored in Docker Hub, over 10 PB of these images have not been accessed for more than 6 months. Digging a little deeper, we found that over 4.5 PB of these inactive images are associated with free accounts. Many of these images were only being used for a short period of time, including images that resulted from CI pipelines configured with Docker Hub where the deletion of these ephemeral images was often overlooked.

With an enormous footprint of inactive data sitting idle in Docker Hub, the team was faced with a difficult question – how do we limit the amount of inactive data that Docker is incurring storage charges for on a monthly basis, while not impacting the overall Docker customer experience?

The guiding principles for addressing this challenge were two-fold:

Continue to offer a compelling suite of free-to-use Docker tools and services that developers – including open source projects – can use to build, share, and run applications.
Ensure Docker can scale to satisfy the demand of new developers by limiting the currently unbounded storage costs – one of the largest operating expenses for Docker Hub.

Helping developers manage inactive images

To help Docker economically scale its infrastructure to support free services for our growing base of users, several updates were announced. First, a new inactive image retention policy was introduced that will automatically delete images hosted in free accounts that have not been used in 6 months. In addition, Docker will also be providing tooling, in the form of a UI and APIs, that will allow users to more easily manage their images. Together, these changes will allow developers to more easily clean up their inactive images and also ensure Docker can economically scale its infrastructure.

With this new policy, starting on November 1, images stored in free Docker Hub repositories that have not had their manifest pushed or pulled in the last 6 months will be removed. This policy does not apply to images stored by paid Docker Hub subscription accounts, Docker Verified Publishers, or Docker Official Images.

Example #1: Molly, a free Docker Hub user, pushed a tagged image molly/hello-world:v1 to Docker Hub on January 1, 2019. The image has not been pulled since it was pushed. This tagged image will be considered inactive beginning November 1, 2020, when the new policy takes effect. The image, and any tag pointing to it, will be subject to deletion on November 1, 2020.

Example #2: Molly has another untagged image molly/myapp@sha256:c0ffee that was first pushed on January 1, 2018. This image was last pulled on August 1, 2020. This image will be considered an active image and will not be subject to deletion on November 1, 2020.
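
For teams that want to audit their own repositories ahead of the change, here is a small, hypothetical sketch that lists tags through the public Docker Hub v2 API and flags any not updated in roughly six months. The namespace and repository are placeholders, and the last_updated field is used only as a rough proxy for the push-or-pull activity the policy actually measures.

```python
# Minimal sketch: list tags in a Docker Hub repository and flag those whose
# last_updated timestamp is older than about six months. Namespace and repository
# are placeholders; last_updated is only a proxy for the real activity check.
from datetime import datetime, timedelta, timezone

import requests

NAMESPACE = "molly"         # placeholder
REPOSITORY = "hello-world"  # placeholder
CUTOFF = datetime.now(timezone.utc) - timedelta(days=182)


def list_stale_tags() -> None:
    url = f"https://hub.docker.com/v2/repositories/{NAMESPACE}/{REPOSITORY}/tags?page_size=100"
    while url:
        page = requests.get(url).json()
        for tag in page.get("results", []):
            updated = datetime.fromisoformat(tag["last_updated"].replace("Z", "+00:00"))
            marker = "STALE" if updated < CUTOFF else "active"
            print(f"{marker:6} {NAMESPACE}/{REPOSITORY}:{tag['name']} (last updated {updated.date()})")
        url = page.get("next")  # the API paginates; 'next' is None on the last page


if __name__ == "__main__":
    list_stale_tags()
```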

Minimizing impact to the developer community

For free accounts, Docker offers free retention of inactive images for six months. For users who need their inactive images retained for longer, Docker also offers unlimited image retention as a feature of the Pro and Team plans. Visit www.docker.com/pricing to view the available plans.

In addition, Docker will be offering a set of tools and services to help developers easily view and manage their images, including the following product updates which will be made available on Docker Hub in the coming months:

Image management dashboard to view and manage images across all repositories within a namespace (roadmap issue #146)
Email notifications for images that are set to expire (roadmap issue #147)
Restoration of recently deleted images (roadmap issue #148)

Finally, as part of Docker’s commitment to the open source community, before November 1 we will be announcing availability of new open source plans. To apply for an open source plan, please complete the short form here.

For more information regarding the recent terms of service changes, please refer to the FAQ.

Stay tuned for email communications regarding any upcoming expiring images in your account or sign up for a Pro or Team plan for unlimited inactive image retention.

While we tried to minimize the impact on developers, it’s possible you have questions or use cases that are not covered. As always, we welcome your questions and feedback at pricingquestions@docker.com.
The post Scaling Docker’s Business to Serve Millions More Developers: Storage appeared first on Docker Blog.
Source: https://blog.docker.com/feed/

Scaling Docker to Serve Millions More Developers: Network Egress

In Part 1 of this blog we went into a deep dive that analyzed all of the images stored in Docker Hub, the world’s largest container registry. We did this to give you a better understanding of how our new Terms of Service updates will impact development teams who use Docker Hub to manage their container images and CI/CD pipelines.

Part 2 of this blog post takes a deep dive into rate limits for container image pulls. This was also announced as part of our updated Docker Terms of Service (ToS) communications. We detailed the following pull rate limits for Docker subscription plans, which will take effect November 1, 2020:

Free plan – anonymous users: 100 pulls per 6 hours
Free plan – authenticated users: 200 pulls per 6 hours
Pro plan – unlimited
Team plan – unlimited

Docker defines pull rate limits as the number of manifest requests to Docker Hub. Rate limits for Docker image pulls are based on the account type of the user requesting the image – not the account type of the image’s owner. For anonymous (unauthenticated) users, pull rates are limited based on the individual IP address. 

We’ve been getting questions from customers and the community regarding container image layers. We are not counting image layers as part of the pull rate limits. Because we are limiting on manifest requests, the number of layers (blob requests) related to a pull is unlimited at this time. This is a change based on community feedback in order to be more user-friendly, so users do not need to count layers on each image they may be using.

A deeper look at Docker Hub pull rates

In determining why rate limits were necessary and how to apply them, we spent considerable time analyzing image downloads from Docker Hub. What we found confirmed that the vast majority of Docker users pulled images at a rate you would expect for normal workflows. However, there is an outsized impact from a small number of anonymous users. For example, roughly 30% of all downloads on Hub come from only 1% of our anonymous users.

The new pull limits are based on this analysis, such that most of our users will not be impacted. These limits are designed to accommodate normal use cases for developers – learning Docker, developing code, building images, and so forth.

Helping developers understand pull rate limits 

Now that we understood the impact and where the limits should land, we needed to define at a technical level how these limits should work. Limiting image pulls to a Docker registry is complicated. You won’t find a pull API in the registry specification – it doesn’t exist. In fact, an image pull is actually a combination of manifest and blob API requests, and these are done in different patterns depending on the client state and the image in question.

For example, if you already have the image, the Docker Engine client will issue a manifest request, realize it has all of the referenced layers based on the returned manifest, and stop. On the other hand, if you are pulling an image that supports multiple architectures, a manifest request will be issued and return a list of image manifests for each supported architecture. The Docker Engine will then issue another specific manifest request for the architecture it’s running on, and receive a list of all the layers in that image. Finally, it will request each layer (blob) it is missing.

So an image pull is actually one or two manifest requests, and zero to infinite blob (layer) requests. Historically, Docker monitored rate limits based on blobs (layers). This was because a blob most closely correlated with bandwidth usage. However, we listened to feedback from the community that this is difficult to track, leads to an inconsistent experience depending on how many layers the image you are pulling has, discourages good Dockerfile practices, and is not intuitive for users who just want to get stuff done without being experts on Docker images and registries.

As such, we are rate limiting based on manifest requests moving forward. This has the advantage of being more directly coupled with a pull, so it is easy for users to understand. There is a small tradeoff – if you pull an image you already have, this is still counted even if you don’t download the layers. Overall, we hope this method of rate limiting is both fair and user-friendly.
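
To illustrate the mechanics described above, here is a minimal sketch of what a single counted "pull" looks like at the HTTP level: an anonymous token request followed by one manifest request against Docker Hub's registry. The image name is just an example, and the rate-limit response headers are read only if the registry happens to return them.

```python
# Minimal sketch: perform the manifest request that counts as one "pull" against
# Docker Hub's registry, using an anonymous pull token for a public image.
import requests

IMAGE = "library/alpine"  # example public image
TAG = "latest"


def fetch_manifest() -> None:
    # Step 1: get an anonymous bearer token scoped to pulling this repository.
    token = requests.get(
        "https://auth.docker.io/token",
        params={"service": "registry.docker.io", "scope": f"repository:{IMAGE}:pull"},
    ).json()["token"]

    # Step 2: the manifest request. This single request is what the rate limit
    # counts; any subsequent blob (layer) downloads are not counted.
    resp = requests.get(
        f"https://registry-1.docker.io/v2/{IMAGE}/manifests/{TAG}",
        headers={
            "Authorization": f"Bearer {token}",
            # Accept both single-arch manifests and multi-arch manifest lists.
            "Accept": "application/vnd.docker.distribution.manifest.v2+json, "
                      "application/vnd.docker.distribution.manifest.list.v2+json",
        },
    )
    resp.raise_for_status()
    print("Manifest media type:", resp.headers.get("Content-Type"))
    # If the registry reports rate-limit headers, show the remaining allowance.
    for header in ("RateLimit-Limit", "RateLimit-Remaining"):
        if header in resp.headers:
            print(f"{header}: {resp.headers[header]}")


if __name__ == "__main__":
    fetch_manifest()
```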

We welcome your feedback

We will be monitoring and adjusting these limits over time based on common use cases to make sure the limits are appropriate for each tier of user, and in particular, that we are never blocking developers from getting work done.

Stay tuned in the coming weeks for a blog post about configuring CI and production systems in light of these changes.

Finally, as part of Docker’s commitment to the open source community, before November 1 we will be announcing availability of new open source plans. To apply for an open source plan, please complete the short form here.

For more information regarding the recent terms of service changes, please refer to the FAQ.

For users that need higher image pull limits, Docker also offers unlimited image pulls as a feature of the Pro and Team plans. Visit www.docker.com/pricing to view the available plans.

As always, we welcome your questions and feedback at pricingquestions@docker.com.
The post Scaling Docker to Serve Millions More Developers: Network Egress appeared first on Docker Blog.
Source: https://blog.docker.com/feed/

10 Years of OpenStack – Alex Xu at Intel

Storytelling is one of the most powerful means to influence, teach, and inspire the people around us. To celebrate OpenStack’s 10th anniversary, we are spotlighting stories from the individuals in various roles from the community who have helped to make OpenStack and the global Open Infrastructure community successful.  Here, we’re talking to Alex Xu from… Read more »
Source: openstack.org