A Quick Guide to Containerizing Llamafile with Docker for AI Applications

This post was contributed by Sophia Parafina.

Keeping pace with the rapid advancements in artificial intelligence can be overwhelming. Every week, new Large Language Models (LLMs), vector databases, and innovative techniques emerge, potentially transforming the landscape of AI/ML development. Our extensive collaboration with developers has uncovered numerous creative and effective strategies to harness Docker in AI development. 

This quick guide shows how to use Docker to containerize llamafile, an executable that bundles all the components needed to run an LLM chatbot into a single file. By the end, you will have a functioning chatbot running locally for experimentation.

Llamafile’s concept of bundling LLMs with local execution has sparked considerable interest in the GenAI space, because it simplifies the process of getting a functioning LLM chatbot running locally.

Containerize llamafile

Llamafile is a Mozilla project that runs open source LLMs, such as Llama-2-7B, Mistral 7B, or any other models in the GGUF format. The Dockerfile builds and containerizes llamafile, then runs it in server mode. It uses Debian trixie as the base image to build llamafile. The final or output image uses debian:stable as the base image.

To get started, copy, paste, and save the following in a file named Dockerfile.

# Use Debian trixie for gcc 13
FROM debian:trixie AS builder

# Set work directory
WORKDIR /download

# Configure build container and build llamafile
RUN mkdir out && \
    apt-get update && \
    apt-get install -y curl git gcc make && \
    git clone https://github.com/Mozilla-Ocho/llamafile.git && \
    curl -L -o ./unzip https://cosmo.zip/pub/cosmos/bin/unzip && \
    chmod 755 unzip && mv unzip /usr/local/bin && \
    cd llamafile && make -j8 LLAMA_DISABLE_LOGS=1 && \
    make install PREFIX=/download/out

# Create container
FROM debian:stable AS out

# Create a non-root user
RUN addgroup --gid 1000 user && \
    adduser --uid 1000 --gid 1000 --disabled-password --gecos "" user

# Switch to user
USER user

# Set working directory
WORKDIR /usr/local

# Copy llamafile and man pages
COPY --from=builder /download/out/bin ./bin
COPY --from=builder /download/out/share ./share/man

# Expose 8080 port.
EXPOSE 8080

# Set entrypoint.
ENTRYPOINT ["/bin/sh", "/usr/local/bin/llamafile"]

# Set default command.
CMD ["--server", "--host", "0.0.0.0", "-m", "/model"]

To build the container, run:

docker build -t llamafile .

Running the llamafile container

To run the container, first download a model such as Mistral-7B-v0.1. The example below assumes the model file is saved in the local model directory, which is bind-mounted into the container as /model.
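If you don’t already have the weights locally, the snippet below sketches one way to fetch them with Python’s standard library. The Hugging Face URL is an assumption (any mirror hosting the GGUF file will do), and the download is several gigabytes, so it may take a while:

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Assumed source for the GGUF weights; substitute any mirror you trust.
MODEL_URL = ("https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF"
             "/resolve/main/mistral-7b-v0.1.Q5_K_M.gguf")

def model_filename(url: str) -> str:
    """Derive the local file name from the last path segment of the URL."""
    return Path(urlparse(url).path).name

def download_model(url: str = MODEL_URL, dest_dir: str = "model") -> Path:
    """Download the model into dest_dir (created if missing); return its path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / model_filename(url)
    if not target.exists():  # skip the download if the file is already present
        urlretrieve(url, target)
    return target
```

The resulting path, model/mistral-7b-v0.1.Q5_K_M.gguf, matches the path mounted in the docker run command.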

$ docker run -d -v ./model/mistral-7b-v0.1.Q5_K_M.gguf:/model -p 8080:8080 llamafile

Once the container is running, open a browser and go to http://localhost:8080 to see the llama.cpp interface (Figure 1).

Figure 1: Llama.cpp is a C/C++ port of Facebook’s LLaMA model by Georgi Gerganov, optimized for efficient LLM inference across various devices, including Apple silicon, with a straightforward setup and advanced performance tuning features.
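The server can take a few seconds to load the model before it starts answering requests. The small sketch below polls the server’s base URL until it responds, which is handy in scripts that query the model right after docker run; the URL and timing values are only illustrative:

```python
import time
import urllib.error
import urllib.request

def server_ready(base_url: str = "http://localhost:8080",
                 timeout: float = 2.0) -> bool:
    """Return True if the llamafile server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or HTTP error: not ready.
        return False

def wait_for_server(base_url: str = "http://localhost:8080",
                    retries: int = 30, delay: float = 1.0) -> bool:
    """Poll until the server is up, or give up after `retries` attempts."""
    for _ in range(retries):
        if server_ready(base_url):
            return True
        time.sleep(delay)
    return False
```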

You can also query the model from the command line through the server’s OpenAI-compatible chat completions endpoint:

$ curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
},
{
"role": "user",
"content": "Compose a poem that explains the concept of recursion in programming."
}
]
}' | python3 -c '
import json
import sys
json.dump(json.load(sys.stdin), sys.stdout, indent=2)
print()
'
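The same endpoint can be called from Python using only the standard library. The sketch below builds the request payload, posts it, and pulls the assistant’s reply out of the response; note that the model field is effectively a placeholder here, since the server answers with whichever model it loaded:

```python
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(system: str, user: str,
                  model: str = "gpt-3.5-turbo") -> dict:
    """Assemble an OpenAI-style chat completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of a chat completions response."""
    return response["choices"][0]["message"]["content"]

def chat(system: str, user: str, url: str = API_URL) -> str:
    """Send one chat turn to the llamafile server and return the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(system, user)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

For example, chat("You are a poetic assistant.", "Compose a poem about recursion.") returns the generated text directly.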

Llamafile has many parameters for tuning the model. You can list them with man llamafile or llamafile --help. Parameters can be set in the Dockerfile’s CMD directive.

Now that you have a containerized llamafile, you can run the container with the LLM of your choice and begin your testing and development journey. 

What’s next?

To continue your AI development journey, read the Docker GenAI guide, review the additional AI content on the blog, and check out our resources. 

Learn more

Read the Docker AI/ML blog post collection.

Download the Docker GenAI guide.

Read the Llamafile announcement post on Mozilla.org. 

Subscribe to the Docker Newsletter.

Have questions? The Docker community is here to help.

New to Docker? Get started.

Source: https://blog.docker.com/feed/

Empowering Developers at Microsoft Build: Docker Unveils Integrations and Sessions

We are thrilled to announce Docker’s participation at Microsoft Build, which will be held May 21-23 in Seattle, Washington, and online. We’ll showcase how our deep collaboration with Microsoft is revolutionizing the developer experience. Join us to discover the newest and upcoming solutions that enhance productivity, secure applications, and accelerate the development of AI-driven applications.

Our presence at Microsoft Build is more than just a showcase — it’s a portal to the future of application development. Visit our booth to interact with Docker experts, experience live demos, and explore the powerful capabilities of Docker Desktop and other Docker products. Whether you’re new to Docker or looking to deepen your expertise, our team is ready to help you unlock new opportunities in your development projects.

Sessions featuring Docker

Optimizing the Microsoft Developer Experience with Docker: Dive into our partnership with Microsoft and learn how to leverage Docker in Azure, Windows, and Dev Box environments to streamline your development processes. This session is your key to mastering the inner loop of development with efficiency and innovation.

Shifting Test Left with Docker and Microsoft: Learn how to address app quality challenges before the continuous integration stage using Testcontainers Cloud and Docker Debug. Discover how these tools aid in rapid and effective debugging, enabling you to streamline the debugging process for both active and halted containers and create testing efficiencies at scale.

Securing Dockerized Apps in the Microsoft Ecosystem: Learn about Docker’s integrated tools for securing your software supply chain in Microsoft environments. This session is essential for developers aiming to enhance security and compliance while maintaining agility and innovation.

Innovating the SDLC with Insights from Docker CTO Justin Cormack: In this interview, Docker’s CTO will share insights on advancing the SDLC through Docker’s innovative toolsets and partnerships. Watch Thursday 1:45pm PT from the Microsoft Build stage or our Featured Partner page. 

Introducing the Next Generation of Windows on ARM: Experience a special session featuring Docker CTO Justin Cormack as he discusses Docker’s role in expanding the Windows on ARM64 ecosystem, alongside a Microsoft executive.

Where to find us

You can also visit us at Docker booth #FP29 to get hands-on experience and view demos of some of our newest solutions.

If you cannot attend in person, the Microsoft Build online experience is free. Explore our Microsoft Featured Partner page.

We hope you’ll be able to join us at Microsoft Build — in person or online — to explore how Docker and Microsoft are revolutionizing application development with innovative, secure, and AI-enhanced solutions. Whether you attend in person or watch the sessions on-demand, you’ll gain essential insights and skills to enhance your projects. Don’t miss this chance to be at the forefront of technology. We are eager to help you navigate the exciting future of AI-driven applications and look forward to exploring new horizons of technology together.

Learn more

Explore our Microsoft Featured Partner page.

New to Docker? Create an account. 

Learn how Docker Build Cloud in Docker Desktop can accelerate builds.

Secure your supply chain with Docker Scout in Docker Desktop.

Start testing with Testcontainers Cloud.

Subscribe to the Docker Newsletter.

Source: https://blog.docker.com/feed/

Amazon EC2 R7i instances are now available in the AWS GovCloud (US-East) Region

Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R7i instances, powered by custom 4th Generation Intel Xeon Scalable processors (code-named Sapphire Rapids), are available in the AWS GovCloud (US-East) Region. These custom processors, available only on AWS, offer up to 15% better performance than comparable x86-based Intel processors used by other cloud providers.
Source: aws.amazon.com