The Livecycle Docker Extension: Instantly Share Changes and Get Feedback in Context

Zevi Reinitz and Roy Razon from Livecycle contributed this guest post.

A collaborative workflow is essential for successful development — developers need a way to quickly and easily share their work, and team members need a quick way to review and provide feedback. The sooner developers can share changes and get clear feedback, the faster the feedback loop can be closed and the new code can be merged to production.

Livecycle’s Docker Extension makes it easy for developers to share their work-in-progress and collaborate with the team to get changes reviewed. With a single click, you can securely share your local development environment, and get it reviewed to ensure your code meets your team’s requirements. In this post, we provide step-by-step instructions for setting up and getting started with the Livecycle Docker Extension.

Meet Livecycle — A fast way for dev teams to collaborate 

Livecycle enables development teams to collaborate faster, and in context. Generally, getting feedback on bug fixes and new features results in multiple iterations and long feedback loops between team members. Dev teams quickly struggle to have detailed discussions out of context, causing frustration and hurting productivity. Livecycle shortens the feedback loop by allowing you to share your work instantly and collect feedback immediately while everyone is still in context. 

Livecycle’s open source tool, Preevy, integrates into your CI pipeline to convert your pull requests into public or private preview environments, provisioned on your cloud provider or Kubernetes cluster. 

And, with the launch of our new Docker Desktop Extension, you can now do the same for your local development environment, by sharing it securely with your team and getting the review and feedback process started much earlier in the development lifecycle (Figure 1).

Figure 1: Livecycle feedback loop — before and after.

Architecture

The Livecycle architecture can be presented as two possible flows — one that works in the CI and the other using the Docker Extension, as follows.

When running a CI build to create a preview environment for a pull request, for example, the Preevy CLI provisions a VM on your cloud provider or a Pod on your Kubernetes cluster, and it runs a Docker server, which hosts your Docker Compose project containers. 

The Preevy CLI also starts a companion container, the Preevy Agent, which creates an SSH connection to the Preevy Tunnel Server. For every published port in your Docker Compose project, an SSH tunnel is created with its own HTTPS URL. When an HTTPS request arrives at the Tunnel Server, it gets routed to your specific service according to the hostname. If the service is defined as private, the Tunnel Server also handles authentication.
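The hostname-based routing described above can be pictured with a small Python sketch. It is an illustration only, not Preevy's implementation: the routing table, the example hostnames, and the route() helper are all hypothetical.

```python
from typing import Optional

# Hypothetical routing table: tunnel hostname -> (local service address, is_private).
# The hostnames and addresses are invented for illustration.
TUNNELS = {
    "web-3000-myenv.example.test": ("127.0.0.1:3000", False),
    "api-8080-myenv.example.test": ("127.0.0.1:8080", True),
}


def route(host_header: str, authenticated: bool = False) -> Optional[str]:
    """Return the upstream address for a request, or None if it must be rejected."""
    # Drop an optional port and normalize case before the lookup.
    hostname = host_header.split(":")[0].lower()
    entry = TUNNELS.get(hostname)
    if entry is None:
        return None  # unknown hostname: no tunnel registered for it
    upstream, is_private = entry
    if is_private and not authenticated:
        return None  # private services require an authenticated caller
    return upstream
```

In this sketch a public service is reachable by anyone who knows its hostname, while a private one is only routed once the caller has been authenticated.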

When using the Livecycle Docker Extension, the same Preevy CLI (bundled in the extension) is used to start the companion Preevy Agent on the local Docker Desktop server. A public or private URL is created for every published port in your Docker Compose project.

The Livecycle architecture is shown in Figure 2.

Figure 2: Livecycle architecture blueprints.

Why run Livecycle as a Docker Extension?

In the context of the development workflow, true collaboration is achieved when dev teams can share changes quickly and collect clear feedback from others on the team. If you can achieve both, you’re in excellent collaborative shape. If either the ability to share quickly or the ability to collect feedback is lacking, your team will not be able to collaborate effectively.

And that’s precisely the benefit of running Livecycle as a Docker Extension — to exploit both of these collaborative opportunities to the fullest extent possible: 

The fastest way to share changes at the earliest possible point: The Livecycle extension shares local containers without the headache of staging environments or CI builds. This is the fastest and earliest way to kick off a collaborative review cycle.

The most convenient way to collect feedback from everyone: The Livecycle extension provides built-in review tools so anyone on the team can give technical or visual feedback in context. 

More developers now see the benefits of a “shift-left” approach, and Docker’s native toolkit helps them do that. Using Livecycle as a Docker extension extends this concept further and brings a truly collaborative review cycle to an earlier part of the software development life cycle (SDLC). And that is something that can save time and also help benefit everyone on the team.

Getting started with the Livecycle Docker Extension

Getting started with the Livecycle Docker Extension is simple once you have Docker Desktop installed. Here’s a step-by-step walkthrough of the initial setup:

1. Installing the extension

Navigate to the Livecycle extension or search for “Livecycle” in the Docker Desktop Extensions Marketplace. Select Install to install the extension (Figure 3).

Figure 3: Install Livecycle extension.

2. Setting up a Livecycle account

Once you have installed the extension and opened it, you will be greeted with a login screen (Figure 4). You can choose to log in with your GitHub account or Google account. If you previously used Livecycle and created an organization, you can log in with your Livecycle account.

Figure 4: Create Livecycle account.

3. Getting shareable URLs

As soon as you log in, you will be shown a list of running Docker Compose applications and all the services that are running in them. To get a shareable URL for every service, turn on the toggle next to the compose application name. After that, you will be prompted to choose the access level (Figure 5).

Figure 5: Share and establish secure tunnel toggle.

You can choose between public and private access. If you choose public access, you will get a public URL to share with anyone. If you choose private access, you will get a private URL that requires authentication and can only be used by your organization members. Then, select Share to get the shareable URL (Figure 6).

Figure 6: Choose access mode.

4. Accessing the shared URL

URLs created by the extension are consistent, shareable, and can be used by a browser or any other HTTP client. Using these URLs, your team members can see and interact with your local version of the app as long as the tunnel is open and your workstation is running (Figure 7).

Figure 7: View and share the custom-generated links.

Private environments require adding team members to your organization, and upon access, your team members will be prompted to authenticate.

5. Accessing the Livecycle dashboard

You can also access the Livecycle dashboard to see the logs and debug your application. Choose Open Link to open the Livecycle dashboard (Figure 8). On the dashboard, you can see all the running applications and services. Like private environments and services, the Livecycle dashboard requires authentication and organization membership.

Figure 8: Navigate to Livecycle logging dashboard.

6. Debugging, inspecting, and logging

Once you have opened the Livecycle dashboard, you will see all the environments and apps that are running. Select the name of the environment you want to examine. You can view the logs, a terminal, and container inspection details for each service (Figure 9).

Figure 9: Livecycle logging and debugging dashboard.

That’s it! You have successfully installed the Livecycle Docker Extension and shared your local development environment with your team.

Flexibility to begin collaborating at any point

Livecycle is flexible by design and can be added to your workflow in several ways, so you can initiate collaborative reviews at any point.

Our Docker extension extends this flexibility even more by enabling teams working on dockerized applications to shift the review process much farther left than ever before — while the code is still on the developer’s machine. 

This setup means that code changes, bug fixes, and new features can be reviewed instantly without the hassle of staging environments or CI builds. It also has the potential to directly impact a company’s bottom line by saving time and improving code quality.

Common use cases

Let’s look at common use cases for the Livecycle Docker Extension to illustrate its benefit to development teams. 

Instant UI reviews: Livecycle enables collaboration between developers and non-technical stakeholders early in the workflow. Using the Livecycle extension, you can get instant feedback on the latest front-end changes on your machine. Opening a tunnel and creating a shareable URL enables anyone on the team to use a browser to access the relevant services securely. Designers, QA, marketing, and management can view the application and use built-in commenting and collaboration tools to leave clear, actionable feedback.

Code reviews and debugging: Another common use case is enabling developers to work together to review and debug code changes as soon as possible. Using the Livecycle extension, you can instantly share any front-end or back-end service running on your machine. Your team can securely access services to see real-time logging, catch errors, and execute commands in a terminal, so you can collaboratively fix issues much earlier in the development lifecycle.

Conclusion

Livecycle’s Docker Extension makes it easy to share your work in progress and quickly collaborate with your team. And tighter feedback loops will enable you to deliver higher quality code faster. 

If you’re currently using Docker for your projects, you can use the Livecycle extension to easily share them without deployment/CI dependencies.

So, go ahead and give Livecycle a try! The initial setup only takes a few minutes, and if you have any questions, we invite you to check out our documentation and reach out on our Slack channel. 

Learn more

Try the Livecycle Docker Extension.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/

How JW Player Secured 300 Repos in an Hour with Docker Scout

In a world where technology continually advances, the demand for reliable, scalable, and up-to-date development environments remains constant. To meet this demand, organizations must invest in tools that offer flexibility, adaptability, and a team capable of maintaining these environments. 

For companies like JW Player, whose core business revolves around streaming, content, and infrastructure, security must be a priority without slowing down delivery or affecting operations. In this article, we’ll share how JW Player uses Docker to help meet such challenges, including how JW Player enabled more than 300 repositories for Docker Scout within just one hour. 

JW Player: Streaming excellence

As a global leader in video streaming, JW Player has long been at the forefront of technological innovation. With a mission to empower their customers through monetization, engagement, and seamless video delivery, JW Player’s services have facilitated the streaming of more than 860 billion videos and counting. However, this remarkable achievement comes with its own set of complex technical challenges.

Operating at this scale, JW Player relies on a multitude of technologies, thousands of nodes, and an extensive fleet of multiple Kubernetes clusters. Docker, a fundamental pillar of JW Player’s workflow, plays an indispensable role in the organization’s daily operations. By leveraging their dev teams’ existing adoption of Docker, JW Player enabled more than 300 repositories for Docker Scout within just one hour. 

JW Player shared their impressive technical accomplishments at DockerCon 2023:

860B+ streamed videos

8.5k containers

300+ repositories enabled for Docker Scout in 1 hour

“In fact, it was so easy, we didn’t have to wait weeks or months for development teams to go update their development pipeline. We were able to get more than 300 repositories running on our first day using the product in under an hour,” says Stewart Powell, Engineering Manager at JW Player.

The Docker difference

Earlier this year, JW Player set out with a clear objective: to implement a comprehensive image vulnerability management program while preserving their core engineering values. Central to this vision was the empowerment of development teams, allowing them to retain full ownership of their build and release pipelines. This seemingly ambitious goal posed a challenge of integrating software supply chains into various build and release pipelines without overburdening the DevOps team.

With Docker Scout, the solution was surprisingly straightforward. By ticking a few boxes in Docker Hub, JW Player provided its developers with a comprehensive software supply chain and image vulnerability management program without adding extra workload to their team. In fact, they achieved this without waiting for weeks or months for development teams to update their pipelines; more than 300 repositories were up and running within an hour.

“With Docker Scout, we were able to go into Docker Hub, check a box, and with virtually no effort from my team whatsoever, were able to provide developers with a comprehensive software supply chain and image vulnerability management program with no effort,” Powell says.

JW Player recognizes the importance of offering their entire engineering organization a unified overview of their security posture. This transparency and accountability are essential for any successful security program. Docker Scout excels in providing a central portal for developers to see all the critical information needed to make decisions about the entire software supply chain. Developers receive near real-time feedback on container health and security, and the security team gets tools to prioritize and remediate areas as needed.

Try Docker Scout

As JW Player’s journey shows, Docker Scout empowers organizations to streamline their development processes, enhance security, and maintain innovation at scale. At an even higher level, Docker Scout makes it even easier for developers to build quality, resilient, and trustworthy solutions. 

Try Docker Scout today.

Learn more

Looking to get up and running? Use our Quickstart guide.

Watch the JW Player presentation during the keynote at DockerCon.

Watch What’s in My Container? Docker Scout CLI and CI to the Rescue (DockerCon 2023).

Watch Docker Scout: Securing The Complete Software Supply Chain (DockerCon 2023).

Vote on what’s next! Check out the Docker Scout public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/

LLM Everywhere: Docker for Local and Hugging Face Hosting

This post is written in collaboration with Docker Captain Harsh Manvar.

Hugging Face has become a powerhouse in the field of machine learning (ML). Their large collection of pretrained models and user-friendly interfaces have entirely changed how we approach AI/ML deployment and spaces. If you’re interested in looking deeper into the integration of Docker and Hugging Face models, a comprehensive guide can be found in the article “Build Machine Learning Apps with Hugging Face’s Docker Spaces.”

The Large Language Model (LLM) — a marvel of language generation — is an astounding invention. In this article, we’ll look at how to use the Hugging Face hosted Llama model in a Docker context, opening up new opportunities for natural language processing (NLP) enthusiasts and researchers.

Introduction to Hugging Face and LLMs

Hugging Face (HF) provides a comprehensive platform for training, fine-tuning, and deploying ML models. LLMs are state-of-the-art models capable of performing tasks like text generation, completion, and classification.

Leveraging Docker for ML

Docker’s containerization technology makes it easier to package, distribute, and run programs. Enclosing ML models within Docker containers helps them run consistently across different environments. This makes results reproducible and resolves the age-old “it works on my machine” issue.

Types of formats

For the majority of models on Hugging Face, two options are available. 

GPTQ (usually 4-bit or 8-bit, GPU only)

GGML (usually 4-, 5-, 8-bit, CPU/GPU hybrid)

GGML and GPTQ are two well-known formats for quantized models. Quantization can be applied either during or after training. By storing model weights at a lower precision, GGML and GPTQ models reduce model size and computational needs.
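To make "reducing weights to a lower precision" concrete, here is a minimal Python sketch of symmetric round-to-nearest quantization. It is a simplified illustration, not the actual GGML or GPTQ algorithm, and the function names quantize() and dequantize() are hypothetical.

```python
def quantize(weights: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Map floats to signed integers using one shared scale factor."""
    qmax = 2 ** (bits - 1) - 1  # largest integer level, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(levels: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer levels."""
    return [level * scale for level in levels]


weights = [1.75, -0.75, 0.3, 0.0]
levels, scale = quantize(weights)       # levels == [7, -3, 1, 0], scale == 0.25
restored = dequantize(levels, scale)    # 0.3 comes back as 0.25
```

Each weight now needs only 4 bits plus a shared scale, at the cost of a small rounding error (0.3 is restored as 0.25 in this example).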

HF models load on the GPU, which performs inference significantly more quickly than the CPU. Generally, the model is huge, and you also need a lot of VRAM. In this article, we will utilize the GGML model, which operates well on CPU and is probably faster if you don’t have a good GPU.

We will also be using transformers and ctransformers in this demonstration, so let’s first understand those: 

transformers: Modern pretrained models can be downloaded and trained with ease thanks to transformers’ APIs and tools. By using pretrained models, you can cut down on the time and resources needed to train a model from scratch, as well as your computational expenses and carbon footprint.

ctransformers: Python bindings for the transformer models developed in C/C++ with the GGML library.

Request Llama model access

We will use the Meta Llama model. Sign up and request access to it.

Create Hugging Face token

To create an Access token that will be used in the future, go to your Hugging Face profile settings and select Access Token from the left-hand sidebar (Figure 1). Save the value of the created Access Token.

Figure 1. Generating Access Token.

Setting up Docker environment

Before exploring the realm of the LLM, we must first configure our Docker environment. Install Docker first, following the instructions on the official Docker website based on your operating system. After installation, execute the following command to confirm your setup:

docker --version

Quick demo

The following command runs a container with the Hugging Face harsh-manvar-llama-2-7b-chat-test:latest image and exposes port 7860 from the container to the host machine. It will also set the environment variable HUGGING_FACE_HUB_TOKEN to the value you provided.

docker run -it -p 7860:7860 --platform=linux/amd64 \
  -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" \
  registry.hf.space/harsh-manvar-llama-2-7b-chat-test:latest python app.py

The -it flag tells Docker to run the container in interactive mode and to attach a terminal to it. This will allow you to interact with the container and its processes.

The -p flag tells Docker to expose port 7860 from the container to the host machine. This means that you will be able to access the container’s web server from the host machine on port 7860.

The --platform=linux/amd64 flag tells Docker to pull and run the linux/amd64 variant of the image. On a host with a different architecture, the container typically runs under emulation.

The -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" flag tells Docker to set the environment variable HUGGING_FACE_HUB_TOKEN to the value you provided. This is required for accessing Hugging Face models from the container.

The app.py script is the Python script that you want to run in the container. This will start the container and open a terminal to it. You can then interact with the container and its processes in the terminal. To exit the container, press Ctrl+C.
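Inside the container, Python code can read the token through os.environ. The following is a minimal sketch; the helper name get_hub_token() is hypothetical and not part of the Space's code.

```python
import os


def get_hub_token(env=None) -> str:
    """Return the Hugging Face token from the environment or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get("HUGGING_FACE_HUB_TOKEN", "").strip()
    # Reject a missing value and the placeholder from the example command.
    if not token or token == "YOUR_VALUE_HERE":
        raise RuntimeError(
            "Set HUGGING_FACE_HUB_TOKEN, for example with `docker run -e ...`."
        )
    return token


# Usage inside the container: token = get_hub_token()
```

Failing early with a clear message is easier to diagnose than an authentication error that appears later during the model download.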

Accessing the landing page

To access the container’s web server, open a web browser and navigate to http://localhost:7860. You should see the landing page for your Hugging Face model (Figure 2).


Figure 2. Accessing local Docker LLM.

Getting started

Cloning the project

To get started, you can clone or download the Hugging Face existing space/repository.

git clone https://huggingface.co/spaces/harsh-manvar/llama-2-7b-chat-test

File: requirements.txt

A requirements.txt file is a text file that lists the Python packages and modules that a project needs to run. It is used to manage the project’s dependencies and to ensure that all developers working on the project are using the same versions of the required packages.

The following Python packages are required to run the Hugging Face llama-2-13b-chat model. Note that this model is large, and it may take some time to download and install. You may also need to increase the memory allocated to your Python process to run the model.

gradio==3.37.0
protobuf==3.20.3
scipy==1.11.1
torch==2.0.1
sentencepiece==0.1.99
transformers==4.31.0
ctransformers==0.2.27

File: Dockerfile

FROM python:3.9
RUN useradd -m -u 1000 user
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
USER user
COPY --link --chown=1000 ./ /code

The following section provides a breakdown of the Dockerfile. The first line tells Docker to use the official Python 3.9 image as the base image for our image:

FROM python:3.9

The following line creates a new user named user with the user ID 1000. The -m flag tells useradd to create a home directory for the user.

RUN useradd -m -u 1000 user

Next, this line sets the working directory for the container to /code.

WORKDIR /code

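Next, the requirements file is copied from the current directory to /code in the container. The first RUN line upgrades the pip package manager, and the second installs the packages listed in requirements.txt without keeping a download cache.

COPY ./requirements.txt /code/requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt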

This line sets the default user for the container to user.

USER user

The following line copies the contents of the current directory to /code in the container. The --link flag tells Docker to copy the files into their own independent layer, so the layer can be reused from the build cache even when earlier layers change. The --chown=1000 flag sets the owner of the copied files to user ID 1000, which is the user created earlier.

COPY --link --chown=1000 ./ /code

Once you have built the Docker image, you can run it using the docker run command. This will start a new container running the Python 3.9 image with the non-root user user. You can then interact with the container using the terminal.

File: app.py

The Python code shows how to use Gradio to create a demo for a text-generation model trained using transformers. The code allows users to input a text prompt and generate a continuation of the text.

Gradio is a Python library that allows you to create and share interactive machine learning demos with ease. It provides a simple and intuitive interface for creating and deploying demos, and it supports a wide range of machine learning frameworks and libraries, including transformers.

This Python script is a Gradio demo for a text chatbot. It uses a pretrained text generation model to generate responses to user input. We’ll break down the file and look at each of the sections.

The following line imports the Iterator type from the typing module. This type is used to represent a sequence of values that can be iterated over. The next line imports the gradio library as well.

from typing import Iterator
import gradio as gr

The following line imports the logging module from the transformers library, which is a popular machine learning library for natural language processing.

from transformers.utils import logging

Next, this line imports the get_input_token_length() and run() functions from the model module. These functions are used to calculate the input token length of a text and generate text using a pretrained text generation model, respectively. The next two lines configure the logging module to print information-level messages and to use the transformers logger.

from model import get_input_token_length, run

logging.set_verbosity_info()
logger = logging.get_logger("transformers")

The following lines define some constants that are used throughout the code. Also, the lines define the text that is displayed in the Gradio demo.

DEFAULT_SYSTEM_PROMPT = """"""
MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DESCRIPTION = """"""

LICENSE = """"""

The first line below logs an information-level message indicating that the code is starting. The clear_and_save_textbox() function then clears the textbox and saves the input message to the saved_input state variable.

logger.info("Starting")

def clear_and_save_textbox(message: str) -> tuple[str, str]:
    return '', message

The following function displays the input message in the chatbot and adds the message to the chat history.

def display_input(message: str,
                  history: list[tuple[str, str]]) -> list[tuple[str, str]]:
    history.append((message, ''))
    logger.info("display_input=%s", message)
    return history

This function deletes the previous response from the chat history and returns the updated chat history and the previous response.

def delete_prev_fn(
        history: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], str]:
    try:
        message, _ = history.pop()
    except IndexError:
        message = ''
    return history, message or ''

The following function generates text using the pre-trained text generation model and the given parameters. It returns an iterator that yields a list of tuples, where each tuple contains the input message and the generated response.

def generate(
    message: str,
    history_with_input: list[tuple[str, str]],
    system_prompt: str,
    max_new_tokens: int,
    temperature: float,
    top_p: float,
    top_k: int,
) -> Iterator[list[tuple[str, str]]]:
    #logger.info("message=%s",message)
    if max_new_tokens > MAX_MAX_NEW_TOKENS:
        raise ValueError

    history = history_with_input[:-1]
    generator = run(message, history, system_prompt, max_new_tokens, temperature, top_p, top_k)
    try:
        first_response = next(generator)
        yield history + [(message, first_response)]
    except StopIteration:
        yield history + [(message, '')]
    for response in generator:
        yield history + [(message, response)]
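The streaming pattern used by generate() can be tried without loading a model. The sketch below replaces run() with a stand-in named fake_run(), which is hypothetical and assumes that each yielded value is the full response text so far.

```python
from typing import Iterator

History = list[tuple[str, str]]


def fake_run(message: str) -> Iterator[str]:
    """Stand-in for model.run(): yields the response text as it grows."""
    text = ""
    for word in ("Hello", "from", "the", "model"):
        text = f"{text} {word}".strip()
        yield text


def stream_chat(message: str, history: History) -> Iterator[History]:
    """Yield a complete chat history for every partial response."""
    produced = False
    for partial in fake_run(message):
        produced = True
        yield history + [(message, partial)]
    if not produced:
        # Mirror generate(): an empty generator still yields one update.
        yield history + [(message, "")]


updates = list(stream_chat("Hi", [("earlier question", "earlier answer")]))
```

Because every update contains the whole history, the UI can simply redraw the chatbot with the latest value instead of merging partial text itself.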

The following function generates a response to the given message and returns the empty string and the generated response.

def process_example(message: str) -> tuple[str, list[tuple[str, str]]]:
    generator = generate(message, [], DEFAULT_SYSTEM_PROMPT, 1024, 1, 0.95, 50)
    for x in generator:
        pass
    return '', x

Here’s the complete Python code:

from typing import Iterator
import gradio as gr

from transformers.utils import logging
from model import get_input_token_length, run

logging.set_verbosity_info()
logger = logging.get_logger("transformers")

DEFAULT_SYSTEM_PROMPT = """"""
MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DESCRIPTION = """"""

LICENSE = """"""

logger.info("Starting")

def clear_and_save_textbox(message: str) -> tuple[str, str]:
    return '', message

def display_input(message: str,
                  history: list[tuple[str, str]]) -> list[tuple[str, str]]:
    history.append((message, ''))
    logger.info("display_input=%s", message)
    return history

def delete_prev_fn(
        history: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], str]:
    try:
        message, _ = history.pop()
    except IndexError:
        message = ''
    return history, message or ''

def generate(
    message: str,
    history_with_input: list[tuple[str, str]],
    system_prompt: str,
    max_new_tokens: int,
    temperature: float,
    top_p: float,
    top_k: int,
) -> Iterator[list[tuple[str, str]]]:
    #logger.info("message=%s",message)
    if max_new_tokens > MAX_MAX_NEW_TOKENS:
        raise ValueError

    history = history_with_input[:-1]
    generator = run(message, history, system_prompt, max_new_tokens, temperature, top_p, top_k)
    try:
        first_response = next(generator)
        yield history + [(message, first_response)]
    except StopIteration:
        yield history + [(message, '')]
    for response in generator:
        yield history + [(message, response)]

def process_example(message: str) -> tuple[str, list[tuple[str, str]]]:
    generator = generate(message, [], DEFAULT_SYSTEM_PROMPT, 1024, 1, 0.95, 50)
    for x in generator:
        pass
    return '', x

def check_input_token_length(message: str, chat_history: list[tuple[str, str]], system_prompt: str) -> None:
    #logger.info("check_input_token_length=%s",message)
    input_token_length = get_input_token_length(message, chat_history, system_prompt)
    #logger.info("input_token_length",input_token_length)
    #logger.info("MAX_INPUT_TOKEN_LENGTH",MAX_INPUT_TOKEN_LENGTH)
    if input_token_length > MAX_INPUT_TOKEN_LENGTH:
        logger.info("Inside IF condition")
        raise gr.Error(f'The accumulated input is too long ({input_token_length} > {MAX_INPUT_TOKEN_LENGTH}). Clear your chat history and try again.')
    #logger.info("End of check_input_token_length function")

with gr.Blocks(css='style.css') as demo:
    gr.Markdown(DESCRIPTION)
    gr.DuplicateButton(value='Duplicate Space for private use',
                       elem_id='duplicate-button')

    with gr.Group():
        chatbot = gr.Chatbot(label='Chatbot')
        with gr.Row():
            textbox = gr.Textbox(
                container=False,
                show_label=False,
                placeholder='Type a message…',
                scale=10,
            )
            submit_button = gr.Button('Submit',
                                      variant='primary',
                                      scale=1,
                                      min_width=0)
    with gr.Row():
        retry_button = gr.Button('Retry', variant='secondary')
        undo_button = gr.Button('Undo', variant='secondary')
        clear_button = gr.Button('Clear', variant='secondary')

    saved_input = gr.State()

    with gr.Accordion(label='Advanced options', open=False):
        system_prompt = gr.Textbox(label='System prompt',
                                   value=DEFAULT_SYSTEM_PROMPT,
                                   lines=6)
        max_new_tokens = gr.Slider(
            label='Max new tokens',
            minimum=1,
            maximum=MAX_MAX_NEW_TOKENS,
            step=1,
            value=DEFAULT_MAX_NEW_TOKENS,
        )
        temperature = gr.Slider(
            label='Temperature',
            minimum=0.1,
            maximum=4.0,
            step=0.1,
            value=1.0,
        )
        top_p = gr.Slider(
            label='Top-p (nucleus sampling)',
            minimum=0.05,
            maximum=1.0,
            step=0.05,
            value=0.95,
        )
        top_k = gr.Slider(
            label='Top-k',
            minimum=1,
            maximum=1000,
            step=1,
            value=50,
        )

    gr.Markdown(LICENSE)

    textbox.submit(
        fn=clear_and_save_textbox,
        inputs=textbox,
        outputs=[textbox, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=display_input,
        inputs=[saved_input, chatbot],
        outputs=chatbot,
        api_name=False,
        queue=False,
    ).then(
        fn=check_input_token_length,
        inputs=[saved_input, chatbot, system_prompt],
        api_name=False,
        queue=False,
    ).success(
        fn=generate,
        inputs=[
            saved_input,
            chatbot,
            system_prompt,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
        ],
        outputs=chatbot,
        api_name=False,
    )

    button_event_preprocess = submit_button.click(
        fn=clear_and_save_textbox,
        inputs=textbox,
        outputs=[textbox, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=display_input,
        inputs=[saved_input, chatbot],
        outputs=chatbot,
        api_name=False,
        queue=False,
    ).then(
        fn=check_input_token_length,
        inputs=[saved_input, chatbot, system_prompt],
        api_name=False,
        queue=False,
    ).success(
        fn=generate,
        inputs=[
            saved_input,
            chatbot,
            system_prompt,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
        ],
        outputs=chatbot,
        api_name=False,
    )

    retry_button.click(
        fn=delete_prev_fn,
        inputs=chatbot,
        outputs=[chatbot, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=display_input,
        inputs=[saved_input, chatbot],
        outputs=chatbot,
        api_name=False,
        queue=False,
    ).then(
        fn=generate,
        inputs=[
            saved_input,
            chatbot,
            system_prompt,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
        ],
        outputs=chatbot,
        api_name=False,
    )

    undo_button.click(
        fn=delete_prev_fn,
        inputs=chatbot,
        outputs=[chatbot, saved_input],
        api_name=False,
        queue=False,
    ).then(
        fn=lambda x: x,
        inputs=[saved_input],
        outputs=textbox,
        api_name=False,
        queue=False,
    )

    clear_button.click(
        fn=lambda: ([], ''),
        outputs=[chatbot, saved_input],
        queue=False,
        api_name=False,
    )

demo.queue(max_size=20).launch(share=False, server_name="0.0.0.0")

The check_input_token_length and generate functions comprise the main part of the code. The generate function is responsible for generating a response given a message, a history of previous messages, and various generation parameters, including:

max_new_tokens: An integer that sets the maximum number of tokens the model may generate for the response.

temperature: A float that controls how random the output is. Higher values (like 1.0) produce more random results, and lower values (like 0.2) produce more predictable ones.

top_p: A float between 0 and 1 that controls nucleus sampling. It sets a cutoff on cumulative probability: only the most likely tokens whose probabilities add up to top_p are considered.

top_k: An integer that sets how many of the most likely next tokens are considered. A greater number allows more varied output, and a smaller number produces more focused output.
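To make the interaction between these parameters concrete, here is a minimal sketch of one sampling step in plain Python. It is an illustration only, not the code path used by the model library: the function name sample_next_token and the order of the filters (temperature, then top-k, then top-p) are assumptions chosen for clarity.

```python
import math
import random


def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=None):
    """Pick one token index from raw logits (illustrative sketch only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Softmax, shifted by the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix whose renormalized
    # cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Draw one token from the survivors, weighted by probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With top_k=1 the function always returns the most likely token, which is why small top_k and top_p values make the output more predictable.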

app.py handles the UI components and runs the API server. In short, app.py is where you initialize the application and its configuration.

File: Model.py

The Python script is a chat bot that uses an LLM to generate responses to user input. The script uses the following steps to generate a response:

It creates a prompt for the LLM by combining the user input, the chat history, and the system prompt.

It calculates the input token length of the prompt.

It generates a response using the LLM and the following parameters:

max_new_tokens: Maximum number of new tokens to generate.

temperature: Temperature to use when generating the response. A higher temperature will result in more creative and varied responses, but it may also result in less coherent responses.

top_p: This parameter controls the nucleus sampling algorithm used to generate the response. A lower top_p value will result in more focused and predictable responses, while a higher value will result in more creative and varied responses.

top_k: This parameter controls the number of highest probability tokens to consider when generating the response. A lower top_k value will result in more predictable and consistent responses, while a higher value will result in more creative and varied responses.

The main function of the TextIteratorStreamer class is to store print-ready text in a queue. This queue can then be used by a downstream application as an iterator to access the generated text in a non-blocking way.

from threading import Thread
from typing import Iterator

#import torch
from transformers.utils import logging
from ctransformers import AutoModelForCausalLM
from transformers import TextIteratorStreamer, AutoTokenizer

logging.set_verbosity_info()
logger = logging.get_logger("transformers")

config = {"max_new_tokens": 256, "repetition_penalty": 1.1,
"temperature": 0.1, "stream": True}
model_id = "TheBloke/Llama-2-7B-Chat-GGML"
device = "cpu"

model = AutoModelForCausalLM.from_pretrained(model_id, model_type="llama", lib="avx2", hf=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

def get_prompt(message: str, chat_history: list[tuple[str, str]],
               system_prompt: str) -> str:
    #logger.info("get_prompt chat_history=%s",chat_history)
    #logger.info("get_prompt system_prompt=%s",system_prompt)
    # Llama 2 chat format: the system prompt is wrapped in <<SYS>> tags.
    texts = [f'<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n']
    #logger.info("texts=%s",texts)
    # The first user input is not stripped; later ones are.
    do_strip = False
    for user_input, response in chat_history:
        user_input = user_input.strip() if do_strip else user_input
        do_strip = True
        texts.append(f'{user_input} [/INST] {response.strip()} </s><s>[INST] ')
    message = message.strip() if do_strip else message
    #logger.info("get_prompt message=%s",message)
    texts.append(f'{message} [/INST]')
    #logger.info("get_prompt final texts=%s",texts)
    return ''.join(texts)

def get_input_token_length(message: str, chat_history: list[tuple[str, str]],
                           system_prompt: str) -> int:
    #logger.info("get_input_token_length=%s",message)
    prompt = get_prompt(message, chat_history, system_prompt)
    #logger.info("prompt=%s",prompt)
    input_ids = tokenizer([prompt], return_tensors='np', add_special_tokens=False)['input_ids']
    #logger.info("input_ids=%s",input_ids)
    # The last dimension of input_ids is the number of tokens in the prompt.
    return input_ids.shape[-1]

def run(message: str,
        chat_history: list[tuple[str, str]],
        system_prompt: str,
        max_new_tokens: int = 1024,
        temperature: float = 0.8,
        top_p: float = 0.95,
        top_k: int = 50) -> Iterator[str]:
    prompt = get_prompt(message, chat_history, system_prompt)
    inputs = tokenizer([prompt], return_tensors='pt', add_special_tokens=False).to(device)

    streamer = TextIteratorStreamer(tokenizer,
                                    timeout=15.,
                                    skip_prompt=True,
                                    skip_special_tokens=True)
    generate_kwargs = dict(
        inputs,
        streamer=streamer,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=top_p,
        top_k=top_k,
        temperature=temperature,
        num_beams=1,
    )
    # Run generation in a background thread so the streamer can be consumed here.
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    outputs = []
    for text in streamer:
        outputs.append(text)
        yield "".join(outputs)
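The streaming pattern in run (a producer thread feeding a queue while the caller iterates over it) can be sketched with the standard library alone. This is a simplified illustration, not the transformers implementation: QueueStreamer, fake_generate, and stream_partial_outputs are hypothetical names, and fake_generate stands in for model.generate.

```python
from queue import Queue
from threading import Thread


class QueueStreamer:
    """Toy stand-in for a text streamer: a producer puts text, a consumer iterates."""

    _END = object()  # sentinel that marks the end of the stream

    def __init__(self, timeout=None):
        self._queue = Queue()
        self._timeout = timeout

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        # Waits for the producer; raises queue.Empty if the timeout elapses.
        item = self._queue.get(timeout=self._timeout)
        if item is self._END:
            raise StopIteration
        return item


def fake_generate(streamer, words):
    # Hypothetical producer standing in for model.generate.
    for word in words:
        streamer.put(word)
    streamer.end()


def stream_partial_outputs(words):
    """Yield the growing response text, mirroring the loop at the end of run."""
    streamer = QueueStreamer(timeout=15.0)
    worker = Thread(target=fake_generate, args=(streamer, words))
    worker.start()
    outputs = []
    for text in streamer:
        outputs.append(text)
        yield "".join(outputs)
    worker.join()
```

Each yielded value is the full response so far, which is what lets the chat UI redraw the message as it grows.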

To import the necessary modules and libraries for text generation, we can use the following code:

from ctransformers import AutoModelForCausalLM
from transformers import TextIteratorStreamer, AutoTokenizer

This imports the ctransformers model loader, which can read GGML files, along with the transformers tokenizer and streamer classes.

To define the model to import, we can use:

model_id = "TheBloke/Llama-2-7B-Chat-GGML"

This step defines the model ID as TheBloke/Llama-2-7B-Chat-GGML, a quantized GGML version of Meta’s Llama 2 7B chat model.

Once you have imported the necessary modules and libraries and defined the model to import, you can load the model and tokenizer using the following code:

model = AutoModelForCausalLM.from_pretrained(model_id, model_type="llama", lib="avx2", hf=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

This will load the model and tokenizer from the Hugging Face Hub. The job of a tokenizer is to prepare the model’s inputs. Tokenizers for each model are available in the library; here, the tokenizer is loaded from the meta-llama/Llama-2-7b-chat-hf repository, as in the listing above.

You need to set the variables and values in config for max_new_tokens, temperature, repetition_penalty, and stream:

max_new_tokens: The maximum number of tokens to generate, not counting the tokens in the prompt.

temperature: The value used to adjust the probabilities of the next tokens.

repetition_penalty: The penalty applied to tokens that have already been generated. A value of 1.0 means no penalty.

stream: Whether to generate the response text in a streaming manner or in a single batch. 
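As an illustration of what a repetition penalty does, the sketch below applies the commonly used formulation, in which the scores of tokens that have already been generated are pushed down. Whether the library used here applies exactly this rule is an assumption, and apply_repetition_penalty is a hypothetical helper name.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Return a copy of logits with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted
```

With penalty=1.0 the logits are returned unchanged, which matches the "no penalty" setting described above.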

You can also create the space and commit files to it to host applications on Hugging Face and test directly.

Building the image

The following command builds a Docker image for the Llama 2 7B chat model on the linux/amd64 platform. The image will be tagged as local-llm:v1.

docker buildx build --platform=linux/amd64 -t local-llm:v1 .

Running the container

The following command will start a new container running the local-llm:v1 Docker image and expose port 7860 on the host machine. The -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" environment variable sets the Hugging Face Hub token, which is required to download the Llama 2 7B chat model files from the Hugging Face Hub.

docker run -it -p 7860:7860 --platform=linux/amd64 -e HUGGING_FACE_HUB_TOKEN="YOUR_VALUE_HERE" local-llm:v1 python app.py

Next, open the browser and go to http://localhost:7860 to see the local LLM Docker container output (Figure 3).

Figure 3. Local LLM Docker container output.
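Because the container may need some time to download the model and start the server, it can help to poll the port before opening the browser. The following is a small sketch using only the Python standard library; wait_for_server and its parameters are hypothetical names, and the five-minute default timeout is an arbitrary assumption.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:7860", timeout_s=300.0, interval_s=2.0,
                    fetch=None, sleep=time.sleep):
    """Poll url until it answers with HTTP 200 or timeout_s elapses."""
    def default_fetch(target):
        with urllib.request.urlopen(target, timeout=5) as response:
            return response.status

    fetch = fetch or default_fetch
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # the server is not accepting connections yet
        sleep(interval_s)
    return False
```

Calling wait_for_server() with no arguments checks the address used in this post; the fetch and sleep parameters exist only so the function can be exercised without a running container.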

You can also view containers via the Docker Desktop (Figure 4).

Figure 4. Monitoring containers with Docker Desktop.

Conclusion

Deploying the LLM GGML model locally with Docker is a convenient and effective way to use natural language processing. Dockerizing the model makes it easy to move it between different environments and ensures that it will run consistently. Testing the model in a browser provides a user-friendly interface and allows you to quickly evaluate its performance.

This setup gives you more control over your infrastructure and data and makes it easier to deploy advanced language models for a variety of applications. It is a significant step forward in the deployment of large language models.

Learn more

Read Build Machine Learning Apps with Hugging Face’s Docker Spaces.

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/

Achieve Security and Compliance Goals with Policy Guardrails in Docker Scout

At DockerCon 2023, we announced the General Availability (GA) of Docker Scout. We built Docker Scout for modern application teams, to help developers navigate the complexities and challenges of the software supply chain through actionable insights. 

The Scout GA release introduced several new capabilities, including a policy-driven evaluation mechanism, aka guardrails, that helps developers prioritize their insights to better align their work with organizational standards and industry best practices. 

In this article, we will walk through how Docker Scout policies enable teams to identify, prioritize, and fix their software quality issues at the point of creation — the developer inner loop (i.e., local development, building, and testing) — so that they can meet their organization’s security and reliability standards without compromising their speed of execution and innovation. 

Prioritizing problems

When implementing software supply chain tools and processes, organizations often encounter a daunting wall of issues in their software. The sheer volume of these issues (ranging from vulnerabilities in code to malicious third-party dependencies, compromised build systems, and more) makes it difficult for development teams to balance shipping new features and improving their product. In such situations, policies play a crucial role in helping developers prioritize which problems to fix first by providing clear guidelines and criteria for resolution. 

Docker Scout’s out-of-the-box policies align with software supply chain best practices to maintain up-to-date base images, remove high-risk vulnerabilities, check for undesirable licenses, and look for other issues to help organizations maintain the quality of the artifacts they’re building or consuming (Figure 1). 

Figure 1: A summary of available policies in Docker Scout.

These policies bring developers critical insights about their container images and enable them to focus on prioritizing new issues as they come in and to identify which pre-existing issues require their attention. In fact, developers can get these insights right from their local machine, where it is much faster and less expensive to iterate than later in the supply chain, such as in CI, or even later in production (Figure 2).

Figure 2: Policy evaluation results in CLI.

Make things better

Docker Scout also adopts a more pragmatic and flexible approach when it comes to policy. Traditional policy solutions typically follow a binary pass/fail evaluation model that imposes rigid, one-size-fits-all targets, like mandating “fewer than 50 vulnerabilities” where failure is absolute. Such an approach overlooks nuanced situations or intermediate states, which can cause friction with developer workflows and become a main impediment to successful adoption of policies. 

In contrast, Docker Scout’s philosophy revolves around a simple premise: “Make things better.” This premise means the first step in every release is not to get developers to zero issues but to prevent regression. Our approach acknowledges that although projects with complex, extensive codebases have existing quality gaps, it is counterproductive to place undue pressure on developers to fix everything, everywhere, all at once.
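The difference between a fixed threshold and a "prevent regression" check can be sketched in a few lines of Python. This is a conceptual illustration only and does not reflect how Docker Scout evaluates policies internally; Finding, threshold_check, and regression_check are hypothetical names, and the identifiers in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A hypothetical policy finding, such as a vulnerability in a package."""
    identifier: str
    severity: str


def threshold_check(findings, limit=50):
    # Binary model: the build fails once the count reaches the limit.
    return len(findings) < limit


def regression_check(previous, current):
    """Return (passed, new_findings, fixed_findings) relative to the last build."""
    new = sorted(set(current) - set(previous), key=lambda f: f.identifier)
    fixed = sorted(set(previous) - set(current), key=lambda f: f.identifier)
    # The build passes as long as nothing new was introduced.
    return (not new, new, fixed)
```

For example, a build that removes placeholder finding "CVE-A" but introduces "CVE-C" passes the threshold check yet fails the regression check, and the result names the new finding to look at first.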

By using Docker Scout, developers can easily track what has worsened in their latest builds (from the website, the CLI and CI pipelines) and only improve the issues relevant to their policies (Figures 3 and 4).

Figure 3: Outcomes driven by Docker Scout Policy.

Figure 4: Pull Request diff from the Scout GitHub Action.

But finding and prioritizing the right problems is only half of the effort. To truly “make things better,” developers must then fix these issues. According to a recent survey of 500 developers conducted by GitHub, the primary areas where development teams spend most of their time include writing code (32%) and identifying and addressing security vulnerabilities (31%). This is far from ideal, as it means that developers are spending less time driving innovation and user value.

With Docker Scout, we aim to address this challenge head-on by providing developers access to automated, in-context remediation guidance (Figure 5). By actively suggesting upgrade and remediation paths, Docker Scout helps to bring teams’ container images back in line with policies, reducing their mean time to repair (MTTR) and freeing up more of their time to create value.

Figure 5: Example scenario for the ‘Base images not up to date’ policy.

While Docker Scout initially helps teams prioritize the direction of improvement, once all the existing critical software issues have been effectively addressed, developers can transition to employing the policies to achieve full compliance. This process ensures that going forward, all container images are free of the specific issues deemed vital to their organization’s code quality, compliance, and security goals.

The Docker Scout team is excited to help our customers build software that meets the highest standards of safety, efficiency, and quality in a rapidly evolving ecosystem within the software supply chain. To get started with Docker Scout, visit our product page today.

Learn more

Visit the Docker Scout product page.

Looking to get up and running? Use our Quickstart guide.

Vote on what’s next! Check out the Docker Scout public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Building Trusted Content with GitHub Actions

As part of our continued efforts to improve the security of the software supply chain and increase trust in the container images developers create and use every day, Docker has begun migrating its Docker Official Images (DOI) builds to the GitHub Actions platform. Leveraging the GitHub Actions hosted, ephemeral build platform enables the creation of secure, verifiable images with provenance and SBOM attestations signed using OpenPubkey and the GitHub Actions OIDC provider.

DOI currently supports up to nine architectures for a wide variety of images, more than any other collection of images. As we increase the trust in the DOI catalog, we will spread out the work over three phases. In our first phase, only Linux/AMD64 and Linux/386 images will be built on GitHub Actions. For the second phase, we eagerly anticipate the availability of GitHub Actions Arm-based hosted runners next year to add support for additional Arm architectures. In our final phase, we will investigate using GitHub Actions self-hosted runners for the image architectures not supported by GitHub Actions hosted runners to cover any outstanding architectures.

In addition to using GitHub Actions, the new DOI signing approach requires establishing a root of trust that identifies who should be signing Docker Official Images. We are working with various relevant communities — for example, the Open Source Security Foundation (OpenSSF, a Linux Foundation project), the CNCF TUF (The Update Framework) and in-toto projects, and the OCI technical community — to establish and distribute this trust root using TUF.

To ensure smooth and rapid developer adoption, we will integrate DOI TUF+OpenPubkey signing and verification into the container toolchain. These pluggable integrations will enable developers to seamlessly verify signatures of DOI, ensuring the integrity and origin of these fundamental artifacts. Soon, verifying your DOI base image signatures will be integrated into the Build and push Docker images GitHub Action for a more streamlined workflow.

What’s next

Looking forward, Docker will continue to develop and extend the TUF+OpenPubkey signing approach to make it more widely useful, enhancing and simplifying trust bootstrapping, signing, and verification. As a next step, we plan to work with Docker Verified Publishers (DVP) and Docker-Sponsored Open Source (DSOS) to expand signing support to additional Docker Trusted Content. Additionally, plans are in place to offer an integration of Docker Hub with GitHub Actions OIDC, allowing developers to push OCI images directly to Docker Hub using their GitHub Actions OIDC identities.

Learn more

OpenPubkey FAQ 

Signing Docker Official Images Using OpenPubkey 

Docker Official Image Signing based on OpenPubkey and TUF

Docker Desktop 4.25: Enhancements to Docker Desktop on Windows, Rosetta for Linux GA, and New Docker Scout Image Analysis Settings

We’re excited to share Docker Desktop’s latest advancements that promise to elevate your experience, enhance productivity, and increase speed. The Docker Desktop 4.25 release supports the GA of Rosetta for Linux, a feature that furthers the speed and productivity that Docker Desktop brings. We’ve also optimized the installation experience on Windows and simplified Docker Scout image analysis settings in this latest Docker Desktop release. Get ready for near-native emulation, seamless updates, and effortless image analysis control. Let’s dive into some of the newest features in Docker Desktop.

Enhanced productivity and speed with Rosetta for Linux GA

We’re thrilled to announce the general availability of Rosetta for Linux, a game-changing Docker Desktop feature that significantly boosts performance and productivity. Here’s what you need to know:

Rosetta for Linux GA: Docker now supports running x86-64 (Intel) binaries on Apple silicon with Rosetta 2. It’s no longer an experimental feature but a seamlessly integrated component of Docker Desktop.

Near-native emulation: The x86_64 emulation performance is now nearly on par with native execution, all thanks to Rosetta 2. This means you can expect near-native speed when running your applications.

Easy activation: Enabling Rosetta for Linux is a breeze. Simply navigate to Docker Desktop Settings > General and toggle it on to take advantage of the enhanced performance.

System requirements: Rosetta for Linux is available on macOS version 13.0 and above, specifically for Apple silicon devices. Notably, it’s enabled by default on macOS 14.1 and newer, making it even more accessible.

Figure 1: Docker Desktop 4.25 User settings displaying the new option to select turning on Rosetta on Apple Silicon.

Customers who used Rosetta for Linux while it was in beta experienced remarkable improvements, particularly when compared to alternatives. Real-world examples:

Database operations: SQL queries are running significantly faster, resulting in notable speed-ups. For instance, tasks like creating databases, running queries, and making updates are showing impressive performance gains ranging from 4% to as high as 91%.

Development efficiency: Customers have reported substantial improvements in their development workflows. Tasks like installing dependencies and building projects are considerably faster, translating to more productive development cycles.

Compatibility: For projects that rely on compatibility with Linux/AMD64 platforms due to binary compatibility issues, Rosetta for Linux ensures a smooth and efficient development process.

With Rosetta for Linux in Docker Desktop, users can look forward to a significant performance boost and increased efficiency.

Enhanced Docker Desktop installation experience on Windows

At Docker, we’re committed to delivering a seamless and efficient Docker Desktop experience for Windows users, irrespective of local settings or privileges. We understand that keeping your WSL (Windows Subsystem for Linux) up to date is crucial for a seamless Docker Desktop experience. With this in mind, we’re pleased to announce a new feature in Docker Desktop that detects the version of WSL during installation and offers automated updates.

When an outdated version of WSL is detected, you now have two convenient options:

Automatic update (default): Allow Docker Desktop to handle the WSL update process seamlessly, ensuring your environment is always up to date without any manual intervention.

Manual update: If you have specific requirements or prefer to manage your WSL updates manually, you can choose to update WSL outside of Docker Desktop. This flexibility allows you to make custom kernel installations and maintain full control over your development environment.

With these enhancements, Docker Desktop on Windows becomes more user-friendly, reliable, and adaptable to your unique needs.

Figure 2: Prompt displaying two new options to finish the installation of Docker Desktop.

Improved Docker Desktop compatibility with Windows 

Docker Desktop’s recent update also includes a change in the minimum supported Windows version, now set at 19044. This update isn’t just about staying in sync with Microsoft’s supported operating systems; it’s about providing you with a seamless Docker Desktop installation experience. By raising the minimum version, we aim to prevent issues tied to older Windows versions, reducing installation failures. 

Figure 3: Alert regarding the installed version of Windows being incompatible with the version of Docker Desktop being installed.

To ensure all Windows users can harness the latest Docker Desktop features and functionalities, we’ve implemented a clear prompt to upgrade Windows versions below 19044.

New Docker Scout settings management in Docker Desktop 4.25

Docker Desktop 4.25 introduces an easy way for users to manage Docker Scout image analysis. Users can now control Docker Scout image indexing from the Docker Desktop general settings panel, with a toggle to enable or disable the analysis of local images.

Administrators can fine-tune access with customized user policies, ensuring precise control of Docker Scout image analysis within their organizations. By specifying an organizational setting in admin-settings.json, administrators can control the Docker Scout image analysis feature for their developers. This enhancement is the first of many to ensure that both user and administrator experiences support personalization.

Figure 4: Docker Desktop 4.25 user settings displaying the new option to turn Scout SBOM indexing on or off at a user settings level. For organizations that have administration, this feature can be restricted per company policies.

Conclusion

The 4.25 release is all about enhancing your Docker Desktop experience. Rosetta for Linux provides remarkable speed and efficiency, optimized installation on Windows ensures seamless updates, and Docker Scout image analysis settings are more easily established.

Update to Docker Desktop 4.25 to empower every user and team to continue to improve productivity and efficiency in developing innovative applications. Do you have feedback? Leave feedback on our public GitHub roadmap, and let us know what else you’d like to see in upcoming releases.

Learn more

Read the Docker Desktop Release Notes.

Get the latest release of Docker Desktop.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Highlights from DockerCon 2023: New Docker Local, Cloud, and AI/ML Innovations

DockerCon 2023 was held October 4-5 in Los Angeles, California, and was a hybrid event, with the first in-person attendance since 2019. The keynotes both days were packed with Docker announcements and demos, plus customers and partners joined us on stage. 

In this post, we round up highlights from DockerCon 2023. Event videos are available on-demand now on the DockerCon site and will be added to YouTube in the coming weeks.

Docker CEO Scott Johnston kicked off DockerCon 2023, celebrating 10 years of Docker. 

“In the last 10 years, we’ve grown to more than 15 million developers, across the globe, and you all collectively — just on Docker Hub alone — have created and shared more than 15 million repos across open source, commercial, and many other communities,” he said. “Now, with your code, with your apps, with your Dockerfiles, your Docker Compose files, tweets, blog posts, YouTube videos, and introducing your colleagues to Docker, you have made it clear that Docker is the way to build, share, and run any application, anywhere.” 

The first-day keynote included product announcements to accelerate the delivery of secure apps. 

“With these products, Docker is clearly making ‘shift-left’ the new standard in developer experience,” writes Zevi Reinitz for Livecycle. “Each of these new tools aims to achieve a singular goal for developers everywhere: combine the responsiveness and convenience of local development with the on-demand resources, connectedness, and collaboration possibilities of the cloud. This combination enables developers to do their best work much earlier in the SDLC than they ever imagined possible.”

The second-day keynote, hosted by Docker CTO Justin Cormack, focused on innovations in artificial intelligence (AI). 

“The critical importance of Docker to the modern development ecosystem cannot be overstated, and the new AI efforts could have a big impact on GenAI development efforts,” writes Sean Michael Kerner in VentureBeat.

Here’s a roundup of the news and announcement buzz at DockerCon:

Docker Desktop 4.24: Improving the developer experience

Prior to the event kickoff, Docker announced the release of Docker Desktop 4.24. This release brings the Docker Compose Watch GA release, a tool to improve the inner loop of application development. Docker Compose Watch enables devs to instantly see the impact of their code changes without manually triggering image builds. Read the Docker Compose Watch GA announcement to learn more. 

Docker Desktop 4.24 also includes the GA release of the Resource Saver performance enhancement feature. This new feature automatically detects when Docker Desktop is not running containers and reduces its memory footprint by 10x, freeing up valuable resources on developers’ machines for other tasks and minimizing the risk of lag when navigating across different applications. 

And with this latest Docker Desktop release, developers can view and manage Docker Engine state directly from the Docker Dashboard, minimizing clicks. Learn more in the Docker Desktop 4.24 announcement.

New local + cloud products

In the first-day keynote, we announced the Docker Scout GA release, next-generation Docker Build, and Docker Debug. The new products bring the power of the cloud to a development team’s “inner-loop” code-build-test-debug process.

The Docker Scout GA release enables developers to evaluate container images against a set of out-of-the-box policies. Scout’s new capabilities strengthen its position as integral to the software supply chain. Read the Docker Scout announcement to learn more. 

Development teams can waste as much as an hour per day per team member waiting for their image builds to finish. To address this, next-generation Docker Build speeds up builds by as much as 39 times by automatically taking advantage of large, on-demand cloud-based servers and team-wide build caching.

Developers can spend as much as 60% of their time debugging their applications. But much of that time is taken by sorting and configuring tools and set-up instead of debugging. Docker Debug provides a language-independent, integrated toolbox for debugging local and remote containerized apps — even when the container fails to launch — enabling developers to find and solve problems faster.

The Mutagen File Sync feature of Docker Desktop takes file sharing to new heights with up to a 16.5x improvement in performance. To give it a try and help influence the future of Docker, sign up for the Docker Desktop Preview Program.

Udemy + Docker

Docker and Udemy announced a partnership to offer developers accessible learning paths to further their Docker education. Read the announcement blog post to learn more.

AI/ML announcements

Docker AI

Docker AI, Docker’s first AI-powered product, boosts dev productivity by generating guidance for developers that follows best practices and helps them select up-to-date, secure images for their applications. Read the press release and “Docker dives into AI to help developers build GenAI apps” on VentureBeat to learn more.

Docker AI is available to sign up for early access now.

New GenAI stack

Docker and partners Neo4j, LangChain, and Ollama launched a new GenAI Stack designed to enable developers to deploy a full GenAI stack in a few clicks. Read the blog post and press release to learn how the GenAI Stack simplifies AI/ML integration. 

The GenAI Stack is available in early access now and is accessible from the Docker Desktop Learning Center or on GitHub. 

OpenPubkey

During DockerCon, we announced our intention to use OpenPubkey, a project jointly developed by BastionZero and Docker and recently open-sourced and donated to The Linux Foundation. Read our blog post to learn about signing Docker Official Images using OpenPubkey.

Hackathon kicks off

A Docker AI/ML Hackathon kicked off the week of DockerCon. The Docker AI/ML Hackathon is open from October 3 – November 7, 2023. Winning projects receive prizes, including Docker swag and up to US$10,000.

Register for the Docker AI/ML Hackathon to participate and to be notified of event activities.

Videos now online

Thank you to the DockerCon attendees, speakers, and sponsors for making the 2023 hybrid event a huge success! And thank you to Docker partners, customers, Docker Captains, and our community for helping make this year happen.

Visit DockerCon.com for on-demand videos from the event, and subscribe to the Docker YouTube channel to be notified as videos are uploaded.

Learn more

DockerCon 2023 videos on-demand

Docker YouTube channel

DockerCon archives on YouTube: 2020, 2021, 2022

Docker Desktop 4.24: Compose Watch, Resource Saver, and Docker Engine

Announcing Docker Compose Watch GA Release

Docker’s Journey Toward Enabling Lightning-Fast Developer Innovation: Unveiling Performance Milestones

What is Resource Saver Mode in Docker Desktop and what problem does it solve?, by Ajeet Raina on Collabnix

Announcing Docker Scout GA: Actionable Insights for the Software Supply Chain

Announcing Udemy + Docker Partnership

Docker dives into AI to help developers build GenAI apps, by Sean Michael Kerner on VentureBeat

Docker Announces Docker AI, Boosting Developer Productivity Through Context-Specific, Automated Guidance

Docker Announces New Local + Cloud Products to Accelerate Delivery of Secure Apps

Docker with Neo4j, LangChain, and Ollama Launches New GenAI Stack for Developers

Signing Docker Official Images Using OpenPubkey

Announcing Docker AI/ML Hackathon 

Register for the Docker AI/ML Hackathon

Docker State of Application Development Survey 2023: Share Your Thoughts on Development

Welcome to the second annual Docker State of Application Development survey!

Please help us better understand and serve the developer community with just 20 minutes of your time. We want to know where developers are focused, what they’re working on, and what is most important to them. Your participation and input will help us build the best products and experiences for you.

For example, in Docker’s 2022 State of Application Development Survey, we found that the task for which Docker users most often refer to support/documentation was creating a Dockerfile (reported by 60% of respondents). Among other improvements, this finding helped spur the innovation of Docker AI.

We also found that 59% of respondents use Udemy for online courses and certifications, so we have partnered with Udemy to make learning and using Docker the best and most streamlined experience possible.

Take the Docker State of Application Development survey now!

By participating in the survey, you will be entered into a raffle for a chance to win* one of the following prizes:

1 laptop computer (an Apple M2 16”)

3 Lego kits: Choose from Ferrari™ Daytona SP3, the HulkBuster™, or The Lord of the Rings: Rivendell

2 game consoles: Choose from a Playstation 5, Xbox Series X, or Nintendo Switch OLED

2 $300 Amazon.com gift cards 

20 Docker swag sets 

The survey is open from October 20, 2023 (7AM PST) to November 20, 2023 (11:59PM PST). 

We’ll choose the winners randomly from those who complete the survey with meaningful answers. Winners will be notified via email on December 11, 2023.

The Docker State of Application Development survey only takes about 20 minutes to complete. We appreciate every contribution and opinion. Your voice counts!

*Docker State of Application Development Promotion Official Rules.


Signing Docker Official Images Using OpenPubkey

At DockerCon 2023, we announced our intention to use OpenPubkey, a project jointly developed by BastionZero and Docker and recently open-sourced and donated to the Linux Foundation, as part of our signing solution for Docker Official Images (DOI). We provided a detailed description of our signing approach in the DockerCon talk “Building the Software Supply Chain on Docker Official Images.” 

In this post, we walk you through the updated DOI signing strategy. We start with how basic container image signing works and gradually build up to what is currently a common image signing flow, which involves public/private key pairs, certificate authorities, The Update Framework (TUF), timestamp logs, transparency logs, and identity verification using OpenID Connect.

After describing these mechanics, we show how OpenPubkey, with a few recent enhancements included, can be leveraged to smooth the flow and decrease the number of third-party entities the verifier is required to trust.

Hopefully, this incremental narrative will be useful to those new to software artifact signing and those just looking for how this proposal differs from current approaches. As always, Docker is committed to improving the developer experience, increasing the time developers spend on adding value, and decreasing the amount of time they spend on toil.

The approach described in this post aims to allow Docker users to improve the security of their software supply chain by making it easier to verify the integrity and origin of the DOI images they use every day.

Signing container images

An entity can prove that it built a container image by creating a digital signature and adding it to the image. This process is called signing. To sign an image, the entity can create a public/private key pair. The private key must be kept secret, and the public key can be shared publicly.

When an image is signed, a signature is produced using the private key and the digest of the image. Anyone with the public key can then validate that the signature was created by someone who has the private key (Figure 1).

Figure 1: An image is signed using a private key, resulting in a signed image. As a next step, the image’s signature is verified using the corresponding public key to confirm its authenticity.
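As a rough illustration of the sign-then-verify flow in Figure 1, here is a minimal Python sketch. It is not how Docker signs images: it uses textbook RSA over two well-known Mersenne primes, with no padding and a tiny key, purely so the example runs with the standard library. The names image_digest, sign, and verify are hypothetical.

```python
import hashlib

# Toy textbook-RSA key pair built from two known Mersenne primes.
# Illustration only: no padding, tiny key, NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (must stay secret)


def image_digest(image_bytes: bytes) -> int:
    """Return the SHA-256 digest of the image as an integer reduced mod N."""
    return int.from_bytes(hashlib.sha256(image_bytes).digest(), "big") % N


def sign(image_bytes: bytes, private_exponent: int) -> int:
    """Produce a signature from the private key and the image digest."""
    return pow(image_digest(image_bytes), private_exponent, N)


def verify(image_bytes: bytes, signature: int, public_exponent: int) -> bool:
    """Check the signature using only the public key (public_exponent, N)."""
    return pow(signature, public_exponent, N) == image_digest(image_bytes)


image = b"example image layers"
signature = sign(image, D)
print(verify(image, signature, E))              # True
print(verify(b"tampered image", signature, E))  # False
```

Note that a successful check only shows that the holder of the private key signed this digest. It says nothing yet about who that holder is.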

Let’s walk through how container images can be signed, starting with a naive approach, building up to the current status quo in image signing, and ending with Docker’s proposed solution. We’ll use signing Docker Official Images (DOI) as part of the DOI build process as our example since that is the use case for which this solution has been designed.

In the diagrams throughout this post, we’ll use colored seals to represent signatures. The color of the seal matches the color of the private key it was signed with (Figure 2).

Figure 2: Two distinct private keys, labeled 1234 (red) and 5678 (yellow), generate corresponding unique signatures.

Note that all the verifier knows after verifying an image signature with a public key is that the image was signed with the private key associated with the public key. To trust the image, the verifier must verify the signature and the identity of the key pair owner (Figure 3).

Figure 3: DOI builder pushing a signed image to the registry and verifier pulling the same image. At this point, the verifier only knows what key signed the image, but not who controls the key.

Identity and certificates

How do you verify the owner of a public/private key pair? That is the purpose of a certificate, a simple data structure including a public key and a name. The certificate binds the name, known as the subject, to the public key. This data structure is normally signed by a Certificate Authority (CA), known as the issuer of the certificate. 

Certificates can be distributed alongside signatures that were made with the corresponding key. This means that consumers of images don’t need to verify the owner of every public key used to sign any image. They can instead rely on a much smaller set of CA certificates. This is analogous to the way web browsers have a set of a few dozen root CA certificates to establish trust with a myriad of websites using HTTPS.

Going back to the example of DOI signing, if we distribute a certificate binding the 1234 public key with the Docker Official Images (DOI) builder name, anybody can verify that an image signed by the 1234 private key was signed by the DOI builder, as long as they trust the CA that issued the certificate (Figure 4).

Figure 4: DOI builder provides proof of identity to a Certificate Authority (CA), which provides a certificate back. DOI builder pushes a signed image and certificate to the registry. The verifier is able to verify the signed image and that image was created by DOI builder.
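The chain of checks in Figure 4 can be sketched as follows. This is a simplified illustration under loud assumptions: the "certificate" is just a JSON document holding a subject and a public key, the keys are toy textbook-RSA keys (insecure, standard library only), and all function names are hypothetical. Real certificates use X.509 and proper signature schemes.

```python
import hashlib
import json


def make_toy_key(p: int, q: int, e: int = 65537) -> dict:
    """Build a toy textbook-RSA key pair from two primes (insecure, demo only)."""
    n = p * q
    return {"n": n, "e": e, "d": pow(e, -1, (p - 1) * (q - 1))}


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, key: dict) -> int:
    return pow(digest(data, key["n"]), key["d"], key["n"])


def verify(data: bytes, signature: int, public: dict) -> bool:
    return pow(signature, public["e"], public["n"]) == digest(data, public["n"])


ca_key = make_toy_key(2**61 - 1, 2**89 - 1)
builder_key = make_toy_key(2**107 - 1, 2**127 - 1)
ca_public = {"n": ca_key["n"], "e": ca_key["e"]}

# The certificate body binds a subject name to the builder's public key.
cert_body = json.dumps(
    {"subject": "DOI builder",
     "public_key": {"n": builder_key["n"], "e": builder_key["e"]}},
    sort_keys=True,
).encode()
cert_signature = sign(cert_body, ca_key)  # issued (signed) by the CA

image = b"example image layers"
image_signature = sign(image, builder_key)


def signer_of(image, image_signature, cert_body, cert_signature, ca_public):
    """Return the certified subject if both signatures check out, else None."""
    if not verify(cert_body, cert_signature, ca_public):
        return None  # certificate was not issued by the trusted CA
    cert = json.loads(cert_body)
    if not verify(image, image_signature, cert["public_key"]):
        return None  # image was not signed by the certified key
    return cert["subject"]


print(signer_of(image, image_signature, cert_body, cert_signature, ca_public))
```

The verifier only needs the CA's public key in advance; the builder's public key arrives inside the certificate.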

Trust policy

Certificates solve the problem of which public keys belong to which entities, but how do we know which entity was supposed to sign an image? For this, we need trust policy, some signed metadata detailing which entities are allowed to sign an image. For Docker Official Images, trust policy will state that our DOI build servers must sign the images.

We need to ensure that trust policy is updated in a secure way, because if a malicious party can change a policy, then they can trick clients into believing the malicious party’s keys are allowed to sign images they otherwise should not be allowed to sign. To ensure secure trust policy updates, we will use The Update Framework (TUF) (specification), a mechanism for securely distributing updates to arbitrary files.

A TUF repository uses a hierarchy of keys to sign manifests of files in a repository. File indexes, called manifests, are signed with keys that are kept online to enable automation, and the online signing keys are signed with offline root keys. This enables the repository to be recovered in case of online key compromise.

A client that wants to download an update to a file in a TUF repository must first retrieve the latest copy of the signed manifests and make sure the signatures on the manifests are verified. Then they can retrieve the actual files.
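The second half of that client flow can be sketched in Python. This sketch assumes the manifest's signatures have already been verified (not shown) and only illustrates checking a downloaded file against the length and digest listed in the manifest. The manifest layout and the name verify_target are hypothetical simplifications; real TUF metadata is split across several roles and is richer than this.

```python
import hashlib

# Hypothetical trust policy file and a manifest entry describing it.
policy_bytes = b'{"allowed_signers": ["doi-builder"]}'
manifest = {
    "trust-policy.json": {
        "length": len(policy_bytes),
        "sha256": hashlib.sha256(policy_bytes).hexdigest(),
    }
}


def verify_target(name: str, content: bytes, manifest: dict) -> bool:
    """Accept a downloaded file only if it matches its manifest entry."""
    entry = manifest.get(name)
    if entry is None:
        return False  # file is not listed in the signed manifest
    if len(content) != entry["length"]:
        return False
    return hashlib.sha256(content).hexdigest() == entry["sha256"]


print(verify_target("trust-policy.json", policy_bytes, manifest))        # True
print(verify_target("trust-policy.json", b'{"allowed_signers": []}', manifest))  # False
```

Because every file is pinned by a digest in signed metadata, the files themselves can travel over an untrusted channel.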

Once a TUF repository has been created, it can be distributed by any means we choose, even if the distribution mechanism is not trusted. We will distribute it using the Docker Hub registry (Figure 5).

Figure 5: TUF repository provides a Trust Policy that says the image should be signed by DOI builder. DOI builder provides proof of identity to a Certificate Authority (CA), which provides a certificate back. DOI builder pushes signed image, certificate from the CA, and TUF policy to the registry. The verifier is able to verify the signed image and that the image was created by the identity defined in the Trust Policy.

Certificate expiry and timestamping

In the preceding section, we described a certificate as simply a binding from an identity to a public key. In reality, certificates do contain some additional data. One important detail is the expiry time. Usually, certificates should not be trusted after their expiry time. Signatures on images (as in Figure 5) will only be valid until the attached certificate’s expiry time. A limited life span for a signature isn’t desirable because we want images to be long-lasting (longer-lasting than a certificate).

This problem can be solved by using a Timestamp Authority (TSA). A TSA will receive some data, bundle the data with the current time, and sign the bundle before returning it. Using a TSA allows anybody who trusts the TSA to verify that the data existed at the bundled time.

We can send the signature to a TSA and have it bundle the current timestamp with the signature. Then, we can use the bundled timestamp as the ‘current time’ when verifying the certificate. The timestamp proves that the certificate had not expired at the time the signature was created. The TSA’s certificate will also expire, at which point all of the signed timestamps it has created will also expire. TSA certificates typically last for a long time (10+ years) (Figure 6).

Figure 6: DOI builder provides proof of identity to a Certificate Authority (CA), which provides a certificate back. DOI builder sends the image signature to the Timestamping Authority (TSA), which provides a signed bundle with the signature and the current time. DOI builder pushes the signed image, certificate from CA, and the bundle signed by the TSA to the registry. The verifier is able to verify the signed image and that the image was created by DOI builder at a specific time.
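The effect of the bundled timestamp on certificate validation can be shown with a small sketch. It assumes the TSA's signature over the bundle has already been verified (not shown); the Certificate class, the valid_at function, and all dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    subject: str
    not_before: datetime
    not_after: datetime


def valid_at(cert: Certificate, when: datetime) -> bool:
    """Return True if the certificate was within its validity period at `when`."""
    return cert.not_before <= when <= cert.not_after


# A short-lived certificate, valid for one day (hypothetical dates).
cert = Certificate(
    subject="DOI builder",
    not_before=datetime(2023, 10, 1, tzinfo=timezone.utc),
    not_after=datetime(2023, 10, 2, tzinfo=timezone.utc),
)

signed_at = datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)  # time bundled by the TSA
verified_at = datetime(2024, 6, 1, tzinfo=timezone.utc)        # when the image is pulled

print(valid_at(cert, verified_at))  # False: the certificate has expired by now
print(valid_at(cert, signed_at))    # True: it was valid when the signature was made
```

Using the TSA's time instead of the wall clock is what lets a signature outlive the certificate it was made with.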

OpenID Connect

Thus far, we’ve ignored how the CA verifies the signer’s identity (the “proof of ID” box in the preceding diagrams). How this verification works depends on the CA, but one approach is to outsource this verification to a third party using OpenID Connect (OIDC).

We won’t describe the entire OIDC flow, but the primary steps are:

The signer authenticates with the OIDC provider (e.g., Google, GitHub, or Microsoft).

The OIDC provider issues an ID token, which is a signed token that the signer can use to prove their identity.

The ID token includes an audience, which specifies the intended party that should use the ID token to verify the identity of the signer. The intended audience will be the Certificate Authority. The ID token must be rejected by any other audience.

The CA must trust the OIDC provider and understand how to verify the ID token’s audience claim.

OIDC ID tokens are signed using the OIDC provider’s private key. The corresponding public key is distributed from a discoverable HTTP endpoint hosted by the OIDC provider.

Signed DOI will be built using GitHub Actions, and GitHub Actions can automatically authenticate build processes with the GitHub Actions OIDC provider, making ID tokens available to build processes (Figure 7).

Figure 7: Using OIDC, DOI builder verifies its identity to GitHub Actions, which provides a token the DOI builder sends to the CA to verify its identity. The CA verifies the token with GitHub Actions and provides a certificate back to the DOI builder.

Key compromise

We mentioned at the start of this post that the private keys must be kept private for the system to remain secure. If the signer’s private key becomes compromised, a malicious party can create signatures that can be verified as being signed by the signer.

Let’s walk through a few ways to mitigate the risk of these keys becoming compromised.

Ephemeral keys

A nice way to reduce the risk of compromise of private keys is to not store them anywhere. Key pairs can be generated in memory, used once, and then the private key can be discarded. This means that certificates are also single-use, and a new certificate must be requested from the CA every time a signature is created.

Transparency logging

Ephemeral keys work well for the signing keys themselves, but there are other things that can be compromised:

The CA’s private key (practically, this cannot be ephemeral)

The OIDC provider’s private key (practically, this cannot be ephemeral)

The OIDC account credentials

These keys/credentials must be kept private, but in case of an accidental compromise, we need to have a way to detect misuse. In this situation, a transparency log (TL) can help.

A transparency log is an append-only, tamperproof data store. When data is written to the log, the operator of the log returns a signed receipt, which can be used as proof that the data is contained in the log. The log can also be monitored to check for suspicious activity.

We can use a transparency log to store all signatures and bundle the TL receipt with the signature. We can only accept a signature as valid if the signature is bundled with a valid TL receipt. Because a signature will only be valid if an entry is in the TL, any malicious party creating fake signatures will also have to publish an entry to the TL. The TL can be monitored by the signer, who can sound the alarm if they notice any signatures in the log they didn’t create (Figure 8). The log can also be monitored by concerned third parties to check for any signatures that don’t look right (Figure 9).

We can also use a transparency log to store certificates issued by the CA. A certificate will only be valid if it comes with a TL receipt. This is also how TLS certificates work — they will only be trusted by browsers if they have an attached TL receipt.

The TL receipts also contain a timestamp, so a TL can completely replace the role of the TSA while also providing extra functionality.
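To make the append-only and monitoring ideas concrete, here is a deliberately simplified sketch. It uses a plain hash chain rather than the Merkle trees that production transparency logs typically use, and its receipts are neither signed nor timestamped. The class and function names are hypothetical.

```python
import hashlib


class TransparencyLog:
    """Toy append-only log: each head hash covers the previous head and the new entry."""

    def __init__(self):
        self.entries = []
        self.heads = []

    def append(self, data: bytes) -> dict:
        previous = self.heads[-1] if self.heads else b""
        head = hashlib.sha256(previous + data).digest()
        self.entries.append(data)
        self.heads.append(head)
        # A real log would sign this receipt and include a timestamp.
        return {"index": len(self.entries) - 1, "head": head.hex()}

    def is_consistent(self) -> bool:
        """Recompute the chain; any rewritten entry breaks a later head."""
        previous = b""
        for data, head in zip(self.entries, self.heads):
            if hashlib.sha256(previous + data).digest() != head:
                return False
            previous = head
        return True


def unexpected_entries(log: TransparencyLog, known: set) -> list:
    """What a monitoring signer does: list entries it did not create itself."""
    return [entry for entry in log.entries if entry not in known]


log = TransparencyLog()
mine = b"signature by DOI builder over image A"
receipt = log.append(mine)
log.append(b"signature by unknown party over image B")

print(unexpected_entries(log, {mine}))  # the entry the signer should investigate
print(log.is_consistent())              # True
log.entries[0] = b"rewritten history"
print(log.is_consistent())              # False
```

The signer cannot stop someone from appending a forged entry, but it can notice the entry and raise the alarm.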

Figure 8: DOI builder sends the signed image and certificate from CA to the Transparency Log (TL), which appends the signature to the TL and returns a receipt for the current time. The monitor is able to observe that the signature was made by the DOI builder at a specific time.

Figure 9: Example of a malicious party signing an image using a fake certificate they received from the CA using hacked OIDC credentials. Monitor is able to discern something is not quite right.

Similar attacks with a stolen private key and a legitimate certificate are also detectable in this way.

A summary of the signing status quo

Everything up to this point describes the status quo in artifact signing. Let’s pull together all of the components described so far to recap (Figure 10). These are:

OIDC provider, to verify the identity of some entity

Certificate authority, to issue certificates binding the identity to a public key

Signer, to sign an image with the corresponding private key

Transparency log (TL), to store signatures and return signed timestamped receipts

TUF repository, to distribute trust policy

Transparency log monitors, to detect malicious behavior

Registry, to store all of the artifacts

Client, to verify signatures on images

Figure 10: Building on all the previous figures, using OIDC the DOI builder identifies itself to GitHub Actions, which provides a token the DOI builder sends to the CA to verify its identity. The CA verifies the token with GitHub Actions and provides a certificate back to the DOI builder. DOI builder sends the signed image and certificate from CA to the Transparency Log (TL), which appends the signature to the TL and returns a receipt for the current time. DOI builder pushes the signed image, the certificate from the CA, and the TL receipt to the registry. The verifier is able to verify the signed image and that the image was created by the identity consistent with trust policy at a specific time. The monitor is able to observe that the signature was made by the DOI builder at a specific time.

The client verifying a signature needs to trust:

The CA

The TL

The OIDC provider (transitively, they need to trust that the CA verifies ID tokens from the OIDC provider correctly)

The signers of the TUF repository

There are many things to trust. Any of these entities being compromised or acting maliciously themselves will compromise the security of the system. Even if such a compromise can be detected by monitoring the transparency log, remediation can be difficult. Removing any of these points of trust without compromising the overall security of the solution would be an improvement.

Docker’s proposed signing solution

Before a CA issues a certificate, it needs to verify control of the private key and control of the identity. In Figure 10, the CA outsources the identity verification to an OIDC provider. We can already use the OIDC provider to verify the identity, but can we use it to verify control of the private key? It turns out that we can.

OpenPubkey is a protocol for binding OIDC identities to public keys. Full details of how it works can be found in the OpenPubkey paper, but below is a simplified explanation. 

OIDC recommends that a unique random number be sent as part of the request to the OIDC provider. This number is called a nonce.

If the nonce is sent, the OIDC provider must return it in the signed JWT (JSON Web Token) called an ID token. We can use this to our advantage by constructing the nonce as a hash of the signer’s public key and some random noise (as the nonce still has to be random). The signer can then bundle the ID token from the OIDC provider with the public key and the random noise and sign the bundle with its private key.

The resulting token (called a PK token) proves control of the OIDC identity and control of the private key at a specific time, as long as a verifier trusts the OIDC provider. In other words, the PK token fulfills the same role as the certificate provided by the CA in all the signing flows up to this point, but does not require trust in a CA. This token can be distributed alongside signatures in the same way as a certificate.
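The nonce construction described above can be sketched in a few lines of Python. This is an illustration of the idea only: the exact encoding used by OpenPubkey is defined in its paper and reference implementation and differs from this. The public key bytes are a placeholder, the OIDC provider is simulated by a dictionary, and the final step of signing the bundle with the private key is not shown.

```python
import hashlib
import secrets


def make_nonce(public_key: bytes, noise: bytes) -> str:
    """Commit to the public key: nonce = SHA-256(public key || fixed-length noise)."""
    return hashlib.sha256(public_key + noise).hexdigest()


def nonce_matches(id_token_claims: dict, public_key: bytes, noise: bytes) -> bool:
    """Verifier side: recompute the commitment and compare it to the nonce claim."""
    expected = make_nonce(public_key, noise)
    return secrets.compare_digest(id_token_claims["nonce"], expected)


public_key = b"placeholder bytes of the signer's public key"
noise = secrets.token_bytes(32)  # keeps the nonce unpredictable

# Simulated provider behavior: the nonce sent in the request is echoed in the ID token.
id_token_claims = {"sub": "doi-builder", "nonce": make_nonce(public_key, noise)}

print(nonce_matches(id_token_claims, public_key, noise))               # True
print(nonce_matches(id_token_claims, b"some other public key", noise))  # False
```

Since the provider signs the ID token, and the token contains the nonce, the provider has in effect vouched for the binding between the identity and this particular public key.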

OIDC ID tokens, however, are designed to be verified and discarded in a short timeframe. The public keys for verifying the tokens are available from an API endpoint hosted by the OIDC provider. These keys are rotated frequently (every few weeks or months), and there is currently no way to verify a token signed by a key that is no longer valid. Therefore, a log of historic keys will need to be used to verify PK tokens that were signed with OIDC provider keys that have been rotated out. This log is an additional point of trust for a verifier, so it may seem we’ve removed one point of trust (the CA) and replaced it with another (the log of public keys). For DOI, we have already added another point of trust with the TUF repository used to distribute trust policy. We can also use this TUF repository to distribute the log of public keys.

Figure 11: Using OIDC the DOI builder identifies itself to GitHub Actions, which provides an ID token that binds the OIDC identity to the public key. DOI builder sends the signed image and PK token to the Transparency Log (TL), which appends the signature and returns a receipt for the current time. DOI builder pushes the signed image, the PK token, and the TL receipt to the registry. The verifier is able to verify the signed image and that the image was created by the identity consistent with trust policy at a specific time. The monitor is able to observe that the signature was made by the DOI builder at a specific time.

OpenPubkey enhancements

As originally formulated, OpenPubkey was not designed to support code signing workflows as we’ve described. As a result, the implementation described here has a few drawbacks. In the following, we discuss each drawback and its associated solution.

OIDC ID tokens are bearer auth tokens

An OIDC ID token is a JWT signed by the OIDC provider that allows the bearer of the token to authenticate as the subject of the token. As we will be publishing these tokens publicly, it means a malicious party could take a valid ID token from the registry and present it to a service to identify as the subject of the ID token.

In theory, this should not be a problem because, according to the OIDC spec, any consumer must check the audience in the ID token before trusting the token (i.e., if the token is presented to Service Foo, Service Foo must check that the token was intended for Service Foo by checking the audience claim). However, there have been issues with OIDC client libraries not making this check.
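The check that Service Foo is expected to make can be sketched as follows. This example only inspects the audience claim of a JWT payload; it does not verify the token's signature, which a real consumer must also do. The token is built locally with a placeholder signature, and the function names are hypothetical.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # JWT segments omit base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def audience_matches(id_token: str, expected_audience: str) -> bool:
    """Return True only if the token's aud claim names this service."""
    payload = json.loads(b64url_decode(id_token.split(".")[1]))
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    return expected_audience in audiences


# Demo token with a placeholder signature segment.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "doi-builder", "aud": "service-foo"}).encode())
token = f"{header}.{payload}.signature-placeholder"

print(audience_matches(token, "service-foo"))  # True
print(audience_matches(token, "service-bar"))  # False: token was meant for someone else
```

A library that skips this comparison would accept a token lifted from a public registry, which is the risk the following paragraph addresses.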

To solve this issue, we can remove the OIDC provider’s signature from the ID token and replace it with a Guillou-Quisquater (GQ) signature. This GQ signature allows us to prove that we had the OIDC provider’s signature without sharing the signed token, and this proof can be verified using the OIDC provider’s public key and the rest of the ID token. More information on GQ signatures can be found in the original paper and in the OpenPubkey reference implementation. We’ve used a similar approach to one discussed in a paper by Zachary Newman.

OIDC ID tokens can contain personal information

For the case where OIDC ID tokens from CI systems such as GitHub Actions are used, it is unlikely that there is any personal information that could be leaked in the token. For example, the full data made available in a GitHub Actions OIDC ID token is documented on GitHub.

Some of this data, such as the repository name and the Git commit digest, is already included in the unsigned provenance attestations that the Docker build process generates. ID tokens representing human identities may include more personal data, but arguably, this is also the kind of data consumers may wish to verify as part of trust policy.

Key compromise

If the signer’s private key is compromised (admittedly unlikely as this is an ephemeral key), it is trivial for an attacker to sign any images and combine the signatures with the public PK token. As mentioned previously, the transparency log can help detect this kind of compromise, but we can go further and prevent it in the first place.

In the original OpenPubkey flow, we create the nonce from the signer’s public key and random noise, then use the corresponding private key to sign the image. If, however, we also include the hash of the image in the nonce, then the image, which we have already signed, is in effect also signed by the OIDC provider. This means the PK token becomes a one-use token that cannot be replayed to sign other images. Thus, compromising the ephemeral private key is no longer useful to an attacker.
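A sketch of this extended commitment, under the same caveats as any illustration here: the encoding is made up for the example (each part is length-prefixed before hashing) and is not the format used by the OpenPubkey implementation, the public key is a placeholder, and the provider is simulated by a dictionary.

```python
import hashlib
import secrets


def make_commitment(public_key: bytes, noise: bytes, image_digest: bytes) -> str:
    """Hash the public key, the noise, and the image digest into one value."""
    h = hashlib.sha256()
    for part in (public_key, noise, image_digest):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguous concatenation
        h.update(part)
    return h.hexdigest()


def token_covers_image(claims: dict, public_key: bytes, noise: bytes, image: bytes) -> bool:
    """Verifier side: the token is only usable for the image it commits to."""
    expected = make_commitment(public_key, noise, hashlib.sha256(image).digest())
    return secrets.compare_digest(claims["nonce"], expected)


public_key = b"placeholder bytes of the signer's public key"
noise = secrets.token_bytes(32)
image = b"example image layers"

# Simulated provider behavior: the commitment comes back inside the signed ID token.
claims = {"nonce": make_commitment(public_key, noise, hashlib.sha256(image).digest())}

print(token_covers_image(claims, public_key, noise, image))               # True
print(token_covers_image(claims, public_key, noise, b"malicious image"))  # False
```

An attacker holding the ephemeral private key could still produce a signature over another image, but no PK token would exist whose commitment matches that image.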

OpenPubkey uses the nonce claim in the ID token

The full OIDC flow isn’t available on GitHub Actions. Instead, a simple HTTP endpoint is provided where a build process can request an ID token with an optional audience (aud) claim. We need to get the OIDC provider to sign some arbitrary data during authentication. We can do this by sending some data to the OIDC provider which will end up in one of the ID token claims, as long as we’re not preventing the claim’s intended use. Because GitHub Actions allows us to set the aud claim to an arbitrary value, we can use it for this purpose.
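As a hedged sketch of what such a request might look like from inside a build process: GitHub's documentation describes two runner-provided environment variables, ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN, and shows the audience being appended as a query parameter. The example below only constructs the request and does not send it; the URL, token, and audience values are placeholders, and build_id_token_request is a hypothetical helper.

```python
import urllib.parse
import urllib.request


def build_id_token_request(request_url: str, request_token: str, audience: str) -> urllib.request.Request:
    """Build (but do not send) a request for an ID token with a chosen aud value."""
    # The runner-provided URL already carries a query string, so "&" is used here.
    url = f"{request_url}&audience={urllib.parse.quote(audience, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"bearer {request_token}"})


# Placeholder values. On a runner these would be read from the environment variables
# ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN.
request = build_id_token_request(
    "https://runner.example/token?api-version=2.0",
    "placeholder-token",
    "a1b2c3",  # stands in for the hash committing to the signer's public key
)
print(request.full_url)
```

The value placed in the audience parameter is the same kind of commitment that the nonce would otherwise carry.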

What’s next?

Docker aims to enable the broader open source community to improve security across the entire software supply chain. We feel strongly that good security requires good, easy-to-use tooling. Or, as Founder and CEO of Bounce Security Avi Douglen more eloquently put it, “Security at the expense of usability comes at the expense of security.” 

The approach explained in this post aims to make signing container images as easy as possible without sacrificing security and trust. By simplifying the overall approach and eliminating complicated infrastructure requirements, our goal is to foster widespread adoption of container signing, in the same way we enabled the widespread adoption of Linux containers a decade ago. 

Open source community and cryptography practitioners: Let us know what you think of this approach to signing. You can review the preliminary implementation across the various repositories in the OpenPubkey GitHub organization. Feel free to open issues in the various repositories or join the discussion in the OpenSSF community. 

We look forward to hearing your feedback and working together to improve the security of the software supply chain!

Learn more

Questions about DOI signing? Check out the DOI signing FAQ.

Use Docker Scout to improve your software supply chain security.

Implementation questions? Check out the code in the OpenPubkey GitHub organization.

Questions about OpenPubkey? See the OpenPubkey FAQ.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Stick Figures image library by Youri Tjang.

Getting Started with JupyterLab as a Docker Extension

This post was written in collaboration with Marcelo Ochoa, the author of the Jupyter Notebook Docker Extension.

JupyterLab is a web-based interactive development environment (IDE) that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It is the latest evolution of the popular Jupyter Notebook and offers several advantages over its predecessor, including:

A more flexible and extensible user interface: JupyterLab allows users to configure and arrange their workspace to best suit their needs. It also supports a growing ecosystem of extensions that can be used to add new features and functionality.

Support for multiple programming languages: JupyterLab is not just for Python anymore! It can now be used to run code in various programming languages, including R, Julia, and JavaScript.

A more powerful editor: JupyterLab’s built-in editor includes features such as code completion, syntax highlighting, and debugging, which make it easier to write and edit code.

Support for collaboration: JupyterLab makes collaborating with others on projects easy. Documents can be shared and edited in real-time, and users can chat with each other while they work.

This article provides an overview of the JupyterLab architecture and shows how to get started using JupyterLab as a Docker extension.

Uses for JupyterLab

JupyterLab is used by a wide range of people, including data scientists, scientific computing researchers, computational journalists, and machine learning engineers. It is a powerful interactive computing and data science tool and is becoming increasingly popular as an IDE.

Here are specific examples of how JupyterLab can be used:

Data science: JupyterLab can be used to explore data, build and train machine learning models, and create visualizations.

Scientific computing: JupyterLab can be used to perform numerical simulations, solve differential equations, and analyze data.

Computational journalism: JupyterLab can be used to scrape data from the web, clean and prepare data for analysis, and create interactive data visualizations.

Machine learning: JupyterLab can be used to develop and train machine learning models, evaluate model performance, and deploy models to production.

JupyterLab can help solve problems in the following ways:

JupyterLab provides a unified environment for developing and running code, exploring data, and creating visualizations. This can save users time and effort; they do not have to switch between different tools for different tasks.

JupyterLab makes it easy to share and collaborate on projects. Documents can be shared and edited in real-time, and users can chat with each other while they work. This can be helpful for teams working on complex projects.

JupyterLab is extensible. This means users can add new features and functionality to the environment using extensions, making JupyterLab a flexible tool that can be used for a wide range of tasks.

Project Jupyter’s tools are available for installation via the Python Package Index, the leading repository of software created for the Python programming language, but you can also get the JupyterLab environment up and running using Docker Desktop on Linux, Mac, or Windows.

Figure 1: JupyterLab is a powerful web-based IDE for data science

Architecture of JupyterLab

JupyterLab follows a client-server architecture (Figure 2) where the client, implemented in TypeScript and React, operates within the user’s web browser. It leverages the Webpack module bundler to package its code into a single JavaScript file and communicates with the server via WebSockets. On the other hand, the server is a Python application that utilizes the Tornado web framework to serve the client and manage various functionalities, including kernels, file management, authentication, and authorization. Kernels, responsible for executing code entered in the JupyterLab client, can be written in any programming language, although Python is commonly used.

The client and server exchange data and commands through the WebSockets protocol. The client sends requests to the server, such as code execution or notebook loading, while the server responds to these requests and returns data to the client.

Kernels are distinct processes managed by the JupyterLab server, allowing them to execute code and send results — including text, images, and plots — to the client. Moreover, JupyterLab’s flexibility and extensibility are evident through its support for extensions, enabling users to introduce new features and functionalities, such as custom kernels, file viewers, and editor plugins, to enhance their JupyterLab experience.

Figure 2: JupyterLab architecture.

JupyterLab is highly extensible. Extensions can be used to add new features and functionality to the client and server. For example, extensions can be used to add new kernels, new file viewers, and new editor plugins.

Examples of JupyterLab extensions include:

The ipywidgets extension adds support for interactive widgets to JupyterLab notebooks.

The nbextensions package provides a collection of extensions for the JupyterLab notebook.

The jupyterlab-server package provides extensions for the JupyterLab server.

JupyterLab’s extensible architecture makes it a powerful tool that can be used to create custom development environments tailored to users’ specific needs.

Why run JupyterLab as a Docker extension?

Running JupyterLab as a Docker extension offers a streamlined experience to users already familiar with Docker Desktop, simplifying the deployment and management of the JupyterLab notebook.

Docker provides an ideal environment to bundle, ship, and run JupyterLab in a lightweight, isolated setup. This encapsulation promotes consistent performance across different systems and simplifies the setup process.

Moreover, Docker Desktop is the only prerequisite to running JupyterLab as an extension. Once you have Docker installed, you can easily set up and start using JupyterLab, eliminating the need for additional software installations or complex configuration steps.

Getting started

Getting started with the Docker Desktop Extension is a straightforward process that allows developers to leverage the benefits of unified development. The extension can easily be integrated into existing workflows, offering a familiar interface within Docker. This seamless integration streamlines the setup process, allowing developers to dive into their projects without extensive configuration.

The only prerequisite for this walkthrough is Docker Desktop.

Working with JupyterLab as a Docker extension begins with opening Docker Desktop. Here are the steps to follow (Figure 3):

Choose Extensions in the left sidebar.

Switch to the Browse tab.

In the Categories drop-down, select Utility Tools.

Find Jupyter Notebook and then select Install.

Figure 3: Installing JupyterLab with Docker Desktop.

A JupyterLab welcome page will be shown (Figure 4).

Figure 4: JupyterLab welcome page.

Adding extra kernels

If you need to work with languages other than Python 3 (the default), you can complete a post-installation step. For example, to add the iJava kernel, launch a terminal and execute the following:

docker exec -ti --user root jupyter_embedded_dd_vm /bin/sh -c "curl -s https://raw.githubusercontent.com/marcelo-ochoa/jupyter-docker-extension/main/addJava.sh | bash"

Figure 5 shows the install process output of the iJava kernel package.

Figure 5: Capture of iJava kernel installation process.

Next, close your extension tab or Docker Desktop, then reopen, and the new kernel and language support will be enabled (Figure 6).

Figure 6: New kernel and language support enabled.

Getting started with JupyterLab

You can begin using JupyterLab notebooks in many ways; for example, you can choose the language at the welcome page and start testing your code. Or, you can upload a file to the extension using the up arrow icon found at the upper left (Figure 7).

Figure 7: Sample JupyterLab iPython notebook.

Import a new notebook from local storage (Figures 8 and 9).

Figure 8: Upload dialog from disk.

Figure 9: Uploaded notebook.

Loading JupyterLab notebook from URL

If you want to import a notebook directly from the internet, you can use the File > Open URL option (Figure 10). This page shows an example for the notebook with Java samples.

Figure 10: Load notebook from URL.

Figure 11 shows the result of loading a notebook from a URL.

Figure 11: Uploaded notebook from URL.

Download a notebook to your personal folder

Just like uploading a notebook, the download operation is straightforward. Select your file name and choose the Download option (Figure 12).

Figure 12: Download to local disk option menu.

A download destination option is also shown (Figure 13).

Figure 13: Select local directory for downloading destination.

A note about persistent storage

The JupyterLab extension has a persistent volume for the /home/jovyan directory, which is the default directory of the JupyterLab environment. The contents of this directory will survive extension shutdown, Docker Desktop restart, and JupyterLab Extension upgrade. However, if you uninstall the extension, all this content will be discarded. Back up important data first.
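One way to back up that directory before uninstalling is to pack it into a single archive from a notebook cell and then download the archive through the file browser. The following is a minimal sketch, not a feature of the extension: the function name and the archive name are hypothetical, and it relies only on the Python standard library.

```python
import shutil
import tempfile
from pathlib import Path


def backup_home(src="/home/jovyan", name="jovyan-backup"):
    """Pack src into <name>.tar.gz, place it inside src, and return its path."""
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    # Build the archive outside src so it does not try to include itself.
    with tempfile.TemporaryDirectory() as tmp:
        archive = shutil.make_archive(
            str(Path(tmp) / name), "gztar", root_dir=str(src_path)
        )
        final = src_path / Path(archive).name
        # Move it into src so it appears in the JupyterLab file browser.
        shutil.move(archive, str(final))
    return final


# Example (inside the extension): backup_home()
```

After the call, the archive can be downloaded like any other file using the Download option described earlier.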

Change the core image

This Docker extension uses the Docker image jupyter/scipy-notebook:lab-4.0.6 (Ubuntu 22.04), but you can choose one of the following available versions (Figure 14).

Figure 14: JupyterLab core image options.

To change the extension image, you can follow these steps:

Uninstall the extension.

Install it again, but do not open it until the following steps are done.

Edit the associated docker-compose.yml file of the extension. For example, on macOS, the file can be found at: Library/Containers/com.docker.docker/Data/extensions/mochoa_jupyter-docker-extension/vm/docker-compose.yml

Change the image name from jupyter/scipy-notebook:ubuntu-22.04 to jupyter/r-notebook:ubuntu-22.04.

Open the extension.

On Linux, the docker-compose.yml file can be found at: .docker/desktop/extensions/mochoa_jupyter-docker-extension/vm/docker-compose.yml
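If you prefer to script the edit rather than change the file by hand, the image swap can be sketched as a small text substitution. This is an illustration under stated assumptions: the function name is hypothetical, the compose file is assumed to contain an unquoted `image:` line with the old name, and you pass in the path to your own copy of docker-compose.yml.

```python
import re


def swap_image(compose_text, old_image, new_image):
    """Replace every `image: old_image` line with new_image.

    Raises ValueError when no matching line exists, so a wrong image
    name is noticed instead of being silently ignored.
    """
    pattern = re.compile(
        r"^([ \t]*image:[ \t]*)" + re.escape(old_image) + r"[ \t]*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + new_image, compose_text)
    if count == 0:
        raise ValueError(f"no 'image: {old_image}' line found")
    return new_text


# Example, with compose_path pointing at the docker-compose.yml named above:
# from pathlib import Path
# text = Path(compose_path).read_text()
# Path(compose_path).write_text(
#     swap_image(text, "jupyter/scipy-notebook:ubuntu-22.04",
#                "jupyter/r-notebook:ubuntu-22.04"))
```

Keeping a copy of the original file before writing makes it easy to revert if the new image does not start.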

Using JupyterLab with other extensions

To use the JupyterLab extension with other extensions, such as the MemGraph database (Figure 15), typical examples only require a minimal change to the host connection option. A sample notebook usually refers to a MemGraph host running on localhost. Because JupyterLab is another extension hosted in a different Docker stack, you have to replace localhost with host.docker.internal, which resolves to the host machine, where the other extension's port is reachable. Here is an example:

URI = "bolt://localhost:7687"

needs to be replaced by:

URI = "bolt://host.docker.internal:7687"

Figure 15: Running notebook connecting to MemGraph extension.
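When a notebook contains several connection strings, the localhost replacement shown above can be applied with a small helper instead of editing each one by hand. The sketch below uses only the Python standard library. The function name and the set of hostnames treated as local are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames that point back at the notebook's own container (assumed set).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def use_docker_host(uri, docker_host="host.docker.internal"):
    """Rewrite a localhost URI so it targets the Docker host instead."""
    parts = urlsplit(uri)
    if parts.hostname not in LOCAL_HOSTS:
        return uri  # already points somewhere else; leave it alone
    netloc = docker_host
    if parts.port is not None:
        netloc += f":{parts.port}"
    if parts.username:
        # Preserve credentials if the URI carries any.
        userinfo = parts.username
        if parts.password:
            userinfo += f":{parts.password}"
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


URI = use_docker_host("bolt://localhost:7687")
print(URI)
```

URIs that already name another host are returned unchanged, so the helper is safe to apply to every connection string in a notebook.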

Conclusion

The JupyterLab Docker extension is a ready-to-run Docker stack containing Jupyter applications and interactive computing tools using a personal Jupyter server with the JupyterLab frontend.

Through the integration of Docker, setting up and using JupyterLab is remarkably straightforward, further expanding its appeal to experienced and novice users alike. 

The following video provides a good introduction with a complete walk-through of JupyterLab notebooks.

Learn more

Get the latest release of Docker Desktop.

Vote on what’s next! Check out our public roadmap.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/