Fine-Tuning Local Models with Docker Offload and Unsloth

I’ve been experimenting with local models for a while now, and the progress in making them accessible has been exciting. Initial experiences are often fantastic: many models, like Gemma 3 270M, are lightweight enough to run on common hardware. This potential for broad deployment is a major draw.

However, as I’ve tried to build meaningful, specialized applications with these smaller models, I’ve consistently encountered challenges in achieving the necessary performance for complex tasks. For instance, in a recent experiment testing the tool-calling efficiency of various models, we observed that many local models (and even several remote ones) struggled to meet the required performance benchmarks. This realization prompted a shift in my strategy.

I’ve come to appreciate that simply relying on small, general-purpose models is often insufficient for achieving truly effective results on specific, demanding tasks. Even larger models can require significant effort to reach acceptable levels of performance and efficiency.

And yet, the potential of local models is too compelling to set aside. The advantages are significant:

Privacy

Offline capabilities

No token usage costs

No more “overloaded” error messages

So I started looking for alternatives, and that’s when I came across Unsloth, a project designed to make fine-tuning models much faster and more accessible. Its growing popularity (star history) made me curious enough to give it a try.

In this post, I’ll walk you through fine-tuning a sub-1GB model to redact sensitive info without breaking your Python setup. With Docker Offload and Unsloth, you can go from a baseline model to a portable, shareable GGUF artifact on Docker Hub in less than 30 minutes. In part 2 of this post, I will share the detailed steps of fine-tuning the model. 

Challenges of fine-tuning models

Setting up the right environment to fine-tune models can be… painful. It’s fragile, error-prone, and honestly a little scary at times. I always seem to break my Python environment one way or another, and I lose hours just wrestling with dependencies and runtime versions before I can even start training.

Fortunately, the folks at Unsloth solved this with a ready-to-use Docker image. Instead of wasting time (and patience) setting everything up, I can just run a container and get started immediately.

Of course, there’s still the hardware requirement. I work on a MacBook Pro, and Unsloth doesn’t support MacBooks natively, so normally, that would be a deal-breaker.

But here’s where Docker Offload comes in. With Offload, I can spin up GPU-backed resources in the cloud and tap into NVIDIA acceleration, all while keeping my local workflow. That means I now have everything I need to fine-tune models, without fighting my laptop.

Let’s go for it.

How to fine-tune models locally with Unsloth and Docker

Can a model smaller than 1GB reliably mask personally identifiable information (PII)?

Here’s the test input:

This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021.

Desired output:

This is an example of text that contains some data. The author of this text is [MASKED] [MASKED], but everybody calls him [MASKED]. His ID number is [MASKED]. He has a son named [MASKED], who was born on [MASKED].

When tested with Gemma 3 270M using Docker Model Runner, the output was:

[PERSON]

Clearly, not usable. Time to fine-tune.
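Before reaching for fine-tuning, it’s worth seeing why simple rules fall short. A sketch of a naive regex-based masker (a hypothetical baseline, not part of the project) catches obvious numeric patterns but has no hope with names:

```python
import re

def naive_mask(text: str) -> str:
    """Rule-based baseline: masks digit runs and dd-mm-yyyy dates, but not names."""
    text = re.sub(r"\b\d{2}-\d{2}-\d{4}\b", "[MASKED]", text)  # dates like 21-07-2021
    text = re.sub(r"\b\d{6,}\b", "[MASKED]", text)             # long ID numbers
    return text

print(naive_mask("His ID number is 123456789. He was born on 21-07-2021."))
# Names like "Ignacio López Luna" pass through untouched -- hence the need for a model.
```

Rules handle structured fields, but recognizing that “Ignasi” is a nickname for a person requires the kind of contextual understanding a fine-tuned model provides.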

Step 1: Clone the example project

git clone https://github.com/ilopezluna/fine-tuning-examples.git
cd fine-tuning-examples/pii-masking

The project contains a ready-to-use Python script to fine-tune Gemma 3 using the pii-masking-400k dataset from ai4privacy.

Step 2: Start Docker Offload (with GPU)

docker offload start

Select your account.

Answer Yes when asked about GPU support (you’ll get an NVIDIA L4-backed instance).

Check status:

docker offload status

See the Docker Offload Quickstart guide.

Step 3: Run the Unsloth container

The official Unsloth image includes Jupyter and some example notebooks. You can start it like this:

docker run -d -e JUPYTER_PORT=8000 \
  -e JUPYTER_PASSWORD="mypassword" \
  -e USER_PASSWORD="unsloth2024" \
  -p 8000:8000 \
  -v $(pwd):/workspace/work \
  --gpus all \
  unsloth/unsloth

Now, let’s attach a shell to the container: 

docker exec -it $(docker ps -q) bash

Useful paths inside the container:

/workspace/unsloth-notebooks/ → example fine-tuning notebooks

/workspace/work/ → your mounted working directory

Thanks to Docker Offload (with Mutagen under the hood), the folder /workspace/work/ stays in sync between cloud GPU and local dev machine.

Step 4: Fine-tune

The script finetune.py is a small training loop built around Unsloth. Its purpose is to take a base language model and adapt it to a new task using supervised fine-tuning with LoRA. In this example, the model is trained on a dataset that teaches it how to mask personally identifiable information (PII) in text.

LoRA makes the process lightweight: instead of updating all of the model’s parameters, it adds small adapter layers and only trains those. That means the fine-tune runs quickly, fits on a single GPU, and produces a compact set of weights you can later merge back into the base model.
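The savings are easy to quantify. For a weight matrix of shape (d_out, d_in), LoRA trains two low-rank factors instead of the full matrix. The numbers below are illustrative only, not the actual Gemma 3 270M configuration:

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank); only these are trained."""
    return rank * d_in + d_out * rank

# Illustrative: a single 640x640 projection at rank 16
full = 640 * 640                             # 409,600 weights in full fine-tuning
lora = lora_trainable_params(640, 640, 16)   # 20,480 weights with LoRA
print(f"LoRA trains {lora / full:.1%} of the layer's parameters")
```

At rank 16 that is 5% of the layer’s weights, which is why the run fits comfortably on a single GPU.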

When you run:

unsloth@46b6d7d46c1a:/workspace$ cd work
unsloth@46b6d7d46c1a:/workspace/work$ python finetune.py
Unsloth: Will patch your computer to enable 2x faster free finetuning.
[…]

The script loads the base model, prepares the dataset, runs a short supervised fine-tuning pass, and saves the resulting LoRA weights into your mounted /workspace/work/ folder. Thanks to Docker Offload, those results are also synced back to your local machine automatically.

The whole training run is designed to complete in under 20 minutes on a modern GPU, leaving you with a model that has “learned” the new masking behavior and is ready for conversion in the next step.

For a deeper walkthrough of how the dataset is built, why it matters, and how LoRA is configured, stay tuned for part 2 of this blog!

Step 5: Convert to GGUF

At this point you’ll have the fine-tuned model artifacts sitting under /workspace/work/.

To package the model for Docker Hub and Docker Model Runner usage, it must be in GGUF format. (Unsloth will support this directly soon, but for now we convert manually.)
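GGUF files begin with the ASCII magic bytes “GGUF” followed by a little-endian uint32 format version, so a quick sanity check after conversion is straightforward. This helper is a hypothetical convenience, not part of Unsloth or llama.cpp:

```python
import struct

def looks_like_gguf(header: bytes) -> bool:
    """Check the 4-byte 'GGUF' magic and the little-endian uint32 version."""
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    version = struct.unpack("<I", header[4:8])[0]
    return version >= 1

# Usage after the conversion step:
# looks_like_gguf(open("work/result.gguf", "rb").read(8))
print(looks_like_gguf(b"GGUF" + struct.pack("<I", 3)))
```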

unsloth@1b9b5b5cfd49:/workspace/work$ cd ..
unsloth@1b9b5b5cfd49:/workspace$ git clone https://github.com/ggml-org/llama.cpp
Cloning into 'llama.cpp'…
[…]
Resolving deltas: 100% (45613/45613), done.
unsloth@1b9b5b5cfd49:/workspace$ python ./llama.cpp/convert_hf_to_gguf.py work/result/ --outfile work/result.gguf
[…]
INFO:hf-to-gguf:Model successfully exported to work/result.gguf

Next, check that the file exists locally (this indicates the automatic Mutagen-powered file sync has finished):

unsloth@46b6d7d46c1a:/workspace$ exit
exit
((.env3.12) ) ilopezluna@localhost pii-masking % ls -alh result.gguf
-rw-r--r--@ 1 ilopezluna staff 518M Sep 23 15:58 result.gguf

At this point, you can stop Docker Offload:

docker offload stop

Step 6: Package and share on Docker Hub

Now let’s package the fine-tuned model and push it to Docker Hub:

((.env3.12) ) ilopezluna@localhost pii-masking % docker model package --gguf /Users/ilopezluna/Projects/fine-tuning-examples/pii-masking/result.gguf ignaciolopezluna020/my-awesome-model:version1 --push
Adding GGUF file from "/Users/ilopezluna/Projects/fine-tuning-examples/pii-masking/result.gguf"
Pushing model to registry…
Uploaded: 517.69 MB
Model pushed successfully

You can find more details on distributing models in the Docker blog on packaging models.

Step 7: Try the results!

Finally, run the fine-tuned model using Docker Model Runner:

docker model run ignaciolopezluna020/my-awesome-model:version1 "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
This is an example of text that contains some data. The author of this text is [GIVENNAME_1] [SURNAME_1], but everybody calls him [GIVENNAME_1]. His ID number is [IDCARDNUM_1]. He has a son named [GIVENNAME_1] [SURNAME_1], who was born on [DATEOFBIRTH_1]

Just compare with the original Gemma 3 270M output:

((.env3.12) ) ilopezluna@F2D5QD4D6C pii-masking % docker model run ai/gemma3:270M-F16 "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
[PERSON]

The fine-tuned model is far more useful, and it’s now published on Docker Hub for anyone to try.
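One property the prompt demands is that all non-PII text be preserved exactly. A small checker can verify this by treating each [LABEL] placeholder as a wildcard and matching the masked output back against the original. This is a hypothetical evaluation helper, not part of the example project:

```python
import re

def masking_preserves_text(original: str, masked: str) -> bool:
    """Check non-PII text survived masking: each [LABEL] acts as a wildcard."""
    pattern = re.escape(masked)
    # Replace the escaped [LABEL_N] placeholders with a non-greedy wildcard
    pattern = re.sub(r"\\\[[A-Z0-9_]+\\\]", ".+?", pattern)
    return re.fullmatch(pattern, original) is not None

orig = "His ID number is 123456789."
print(masking_preserves_text(orig, "His ID number is [IDCARDNUM_1]."))  # True
```

Running a checker like this over a held-out set gives a quick quantitative read on whether the fine-tune keeps the surrounding text intact.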

Why fine-tuning models with Docker matters

This experiment shows that small local models don’t have to stay as “toys” or curiosities. With the right tooling, they can become practical, specialized assistants for real-world problems.

Speed: Fine-tuning a sub-1GB model took under 20 minutes with Unsloth and Docker Offload. That’s fast enough for iteration and experimentation.

Accessibility: Even on a machine without a GPU, Docker Offload unlocked GPU-backed training without extra hardware.

Portability: Once packaged, the model is easy to share, and runs anywhere thanks to Docker.

Utility: Instead of producing vague or useless answers, the fine-tuned model reliably performs one job, masking PII, something that could be immediately valuable in many workflows.

This is the power of fine-tuning models: turning small, general-purpose models into focused, reliable tools. And with Docker’s ecosystem, you don’t need to be an ML researcher with a huge workstation to make it happen. You can train, test, package, and share, all with familiar Docker workflows.

So the next time you think “small models aren’t useful,” remember: with a bit of fine-tuning, they absolutely can be.

This takes small local models from “interesting demo” to practical, usable tools.

We’re building this together!

Docker Model Runner is a community-friendly project at its core, and its future is shaped by contributors like you. If you find this tool useful, please head over to our GitHub repository. Show your support by giving us a star, fork the project to experiment with your own ideas, and contribute. Whether it’s improving documentation, fixing a bug, or a new feature, every contribution helps. Let’s build the future of model deployment together!

Start with Docker Offload for GPU on demand →

Learn more

Check out Model Runner General Availability announcement

Visit our Model Runner GitHub repo!

Learn how Compose makes building AI apps and agents easier

Check out Unsloth documentation for more details on the Unsloth Docker image.

Source: https://blog.docker.com/feed/

From Shell Scripts to Science Agents: How AI Agents Are Transforming Research Workflows

It’s 2 AM in a lab somewhere. A researcher has three terminals open, a half-written Jupyter notebook on one screen, an Excel sheet filled with sample IDs on another, and a half-eaten snack next to shell commands. They’re juggling scripts to run a protein folding model, parsing CSVs from the last experiment, searching for literature, and Googling whether that one Python package broke in the latest update, again.

This isn’t the exception; it’s the norm. Scientific research today is a patchwork of tools, scripts, and formats, glued together by determination and late-night caffeine. Reproducibility is a wishlist item. Infrastructure is an afterthought. And while automation exists, it’s usually hand-rolled and stuck on someone’s laptop.

But what if science workflows could be orchestrated, end-to-end, by an intelligent agent?

What if instead of writing shell scripts and hoping the dependencies don’t break, a scientist could describe the goal, “read this CSV of compounds and proteins, search the literature, predict ADMET properties, and more”, and an AI agent could plan the steps, spin up the right tools in containers, execute the tasks, and even summarize the results?

That’s the promise of science agents. AI-powered systems that don’t just answer questions like ChatGPT, but autonomously carry out entire research workflows. And thanks to the convergence of LLMs, GPUs, Dockerized environments, and open scientific tools, this shift isn’t theoretical anymore.

It’s happening now.

What is a Science Agent?

A Science Agent is more than just a chatbot or a smart prompt generator; it’s an autonomous system designed to plan, execute, and iterate on entire scientific workflows with minimal human input.

Instead of relying on one-off questions like “What is ADMET?” or “Summarize this paper,” a science agent operates like a digital research assistant. It understands goals, breaks them into steps, selects the right tools, runs computations, and even reflects on results.

CrewAI: AI agents framework -> https://www.crewai.com/

ADMET: how a drug is absorbed, distributed, metabolized, and excreted, and its toxicity

Let’s make it concrete:

Take this multi-agent system you might build with CrewAI:

Curator: Data-focused agent whose primary role is to ensure data quality and standardization.

Researcher: Literature specialist. Its main goal is to find relevant academic papers on PubMed for the normalized entities provided by the Curator.

Web Scraper: Specialized agent for extracting information from websites.

Analyst: Predicts ADMET properties and toxicity using models or APIs.

Reporter: Compiles all results into a clean Markdown report.

Each of these agents acts independently but works as part of a coordinated system. Together, they automate what would take a human team hours or even days, now in minutes and reproducibly.
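The role pattern above can be sketched in plain Python: each agent owns one task and hands a shared payload to the next. This is an illustration of the idea only, not the CrewAI API, and the agent tasks here are stand-ins:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    task: Callable[[dict], dict]

def run_pipeline(agents: list, payload: dict) -> dict:
    """Each agent transforms the shared payload and passes it along."""
    for agent in agents:
        payload = agent.task(payload)
        print(f"{agent.name}: done")
    return payload

pipeline = [
    Agent("Curator", lambda d: {**d, "entities": sorted(set(d["raw"]))}),
    Agent("Researcher", lambda d: {**d, "papers": [f"PMID for {e}" for e in d["entities"]]}),
    Agent("Reporter", lambda d: {**d, "report": f"{len(d['papers'])} references found"}),
]
result = run_pipeline(pipeline, {"raw": ["aspirin", "aspirin", "ibuprofen"]})
print(result["report"])
```

In a real framework each step would be an LLM-backed agent with its own tools and memory, but the control flow, plan, delegate, aggregate, is the same shape.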

Why This Is Different from ChatGPT

You’ve probably used ChatGPT to summarize papers, write Python code, or explain complex topics. And while it might seem like a simple question-answer engine, there’s often more happening behind the scenes, prompt chains, context windows, and latent loops of reasoning. But even with those advances, these interactions are still mostly human-in-the-loop: you ask, it answers.

Science agents are a different species entirely.

Instead of waiting for your next prompt, they plan and execute entire workflows autonomously. They decide which tools to use based on context, how to validate results, and when to pivot. Where ChatGPT responds, agents act. They’re less like assistants and more like collaborators.

Let’s break down the key differences:

Interaction
LLMs (ChatGPT & similar): Multi-turn, often guided by user prompts or system instructions.
Science Agents (CrewAI, LangGraph, etc.): Long-running, autonomous workflows across multiple tools.

Role
LLMs: Assistant with agentic capabilities abstracted away.
Science Agents: Explicit research collaborator executing role-specific tasks.

Autonomy
LLMs: Semi-autonomous; requires external prompting or embedded system orchestration.
Science Agents: Fully autonomous planning, tool selection, and iteration.

Tool Use
LLMs: Some tools are used via plugins/functions (e.g., browser, code interpreter).
Science Agents: Explicit tool integration (APIs, simulations, databases, Dockerized tools).

Memory
LLMs: Short- to medium-term context (limited per session or chat, non-explicit workspace).
Science Agents: Persistent long-term memory (vector DBs, file logs, databases; explicit and programmable).

Reproducibility
LLMs: Very limited without the ability to define agents’ roles/tasks and their tools.
Science Agents: Fully containerized, versioned workflows with defined agent roles/tasks.

Try it yourself

If you’re curious, here’s a two-container demo you can run in minutes.

git repo: https://github.com/estebanx64/docker_blog_ai_agents_research

We just have two containers/services for this example:

Prerequisites

Docker and Docker Compose

OpenAI API key (for GPT-4o model access)

Sample CSV file with biological entities

Follow the instructions from README.md in our repo to set up your OpenAI API KEY

Running the workflow with the example included in our repo will cost roughly $1–2 in OpenAI API usage.

Run the workflow.

docker compose up

The logs from docker compose up show how our agents autonomously plan and execute a complete workflow:

1. Ingest CSV file: the agents load and parse the input CSV dataset.

2. Query PubMed: they automatically search PubMed for relevant scientific articles.

3. Generate literature summaries: the retrieved articles are summarized into concise, structured insights.

4. Calculate ADMET properties: the agents call an external API to compute ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) predictions.

5. Compile results into a Markdown report: all findings are aggregated and formatted into a structured report.md.

Output Files

report.md – Comprehensive research report.

JSON files – Contain normalized entities, literature references, and ADMET predictions.

This showcases the agents’ ability to make decisions, use tools, and coordinate tasks without manual intervention.

If you want to explore further, please check the README.md included in the GitHub repository.

Imagine if your lab could run 100 experiments overnight: what would you discover first?

But to make this vision real, the hard part isn’t just the agents, it’s the infrastructure they need to run.

Infrastructure: The Bottleneck

AI science agents are powerful, but without the right infrastructure, they break quickly or can’t scale. Real research workflows involve GPUs, complex dependencies, and large datasets. Here’s where things get challenging, and where Docker becomes essential.

The Pain Points

Heavy workloads: Running tools like AlphaFold or Boltz requires high-performance GPUs and smart scheduling (e.g., EKS, Slurm).

Reproducibility chaos: Different systems = broken environments. Scientists spend hours debugging libraries instead of doing science.

Toolchain complexity: Agents rely on multiple scientific tools (RDKit, PyMOL, Rosetta, etc.), each with their own dependencies.

Versioning hell: Keeping track of dataset/model versions across runs is non-trivial, especially when collaborating.

Why Containers Matter

Standardized environments: Package your tools once, run them anywhere, from a laptop to the cloud.

Reproducible workflows: Every step of your agent’s process is containerized, making it easy to rerun or share experiments.

Composable agents: Treat each step (e.g., literature search, folding, ADMET prediction) as a containerized service.

Smooth orchestration: You can use CrewAI or other frameworks to spin up containers and isolate tasks, such as running or validating generated code, without compromising the host.

Open Challenges & Opportunities

Science agents are powerful, but still early. There’s a growing list of challenges where developers, researchers, and hackers can make a huge impact.

Unsolved Pain Points

Long-term memory: Forgetful agents aren’t useful. We need better semantic memory systems (e.g., vector stores, file logs) for scientific reasoning over time.

Orchestration frameworks: Complex workflows require robust pipelines. Temporal, Kestra, Prefect, and friends could be game changers for bio.

Safety & bounded autonomy: How do we keep agents focused and avoid “hallucinated science”? Guardrails are still missing.

Benchmarking agents: There’s no standard to compare science agents. We need tasks, datasets, and metrics to measure real-world utility.

Ways to Contribute

Containerize more tools (models, pipelines, APIs) to plug into agent systems.

Create tests and benchmarks for evaluating agent performance in scientific domains.

Conclusion

We’re standing at the edge of a new scientific paradigm, one where research isn’t just accelerated by AI, but partnered with it. Science agents are transforming what used to be days of fragmented work into orchestrated workflows that run autonomously, reproducibly, and at scale.

This shift from messy shell scripts and notebooks to containerized, intelligent agents isn’t just about convenience. It’s about opening up research to more people, compressing discovery cycles, and building infrastructure that’s as powerful as the models it runs.

Science is no longer confined to the lab. It’s being automated in containers, scheduled on GPUs, and shipped by developers like you.

Check out the repo and try building your own science agent. What workflow would you automate first?
Source: https://blog.docker.com/feed/

AWS Builder ID now supports Sign in with Google

You can now create an AWS Builder ID using sign in with Google. AWS Builder ID is a personal profile that provides access to AWS applications including Kiro, AWS Builder Center, AWS Training and Certification, AWS re:Post and AWS Startups. AWS Builder ID represents you as an individual and is independent from any credentials and data you may have in existing AWS accounts. Like other personal profiles, AWS Builder ID remains with you as you progress through your personal, educational, and career goals. Sign in with Google offers a convenient way for you to access AWS applications and websites with a single click using your Google account. This eliminates the need for separate credentials, further simplifies the registration process, and reduces the likelihood of forgotten passwords. Returning users will benefit from a frictionless sign-in experience to their AWS applications. Sign in with Google is available to help you get started quickly with any of the applications that support AWS Builder ID.
Source: aws.amazon.com

Amazon GameLift Servers adds ability to view and connect to instances in the console

Today, Amazon GameLift Servers launched new console capabilities that let you view and connect to individual fleet instances. The EC2 and Container Fleet Detail pages have a new Instances tab to see a list of instances associated with a fleet. For each instance, there is an instance details page that displays metadata in a human-readable format (data also available via Amazon GameLift Server APIs). From the list and detail views, you can invoke the connect button, open a modal, and launch AWS CloudShell to start an SSM session directly into that instance. These console improvements give hands-on tools to debug, inspect, and resolve issues faster. Instead of relying on external tooling or guesswork, directly investigate host performance, pull recent game server logs, or diagnose issues such as network configuration and instance health – all from within the Amazon GameLift Servers Console. This reduces turnaround time when troubleshooting and enhances visibility into what’s happening “under the hood” of a game server fleet. SSM in Console is available in Amazon GameLift Servers supported regions, except AWS China. For more information, visit the Amazon GameLift Servers documentation.
Source: aws.amazon.com

Amazon Connect now provides agent time-off balance data in analytics data lake

Amazon Connect now provides agent time-off balance data in analytics data lake, making it easier for you to generate reports and insights from this data. With this launch, you can now access latest and historical agent time-off balances across different time-off categories (paid time-off, sick leave, leave of absence, etc.) in the analytics data lake. In addition to balances, you can also view a chronological list of all transactions that impacted the balance. For example, if an agent starts with 80 hours of paid time-off on January 1, submits a 20-hour request on January 3, and later cancels it, you can see each transaction’s impact on the final 80-hour balance. This launch makes time-off management easier by eliminating the need for managers to manually reconcile balances and time-off transactions, thus improving manager productivity and making it easier for them to respond to agent inquiries. This feature is available in all AWS Regions where Amazon Connect agent scheduling is available. To learn more about Amazon Connect agent scheduling, click here.
Source: aws.amazon.com

AWS Direct Connect announces 100G expansion in Makati City, Philippines

Today, AWS announced the expansion of 10 Gbps and 100 Gbps dedicated connections with MACsec encryption capabilities at the existing AWS Direct Connect location in the ePLDT data center near Makati City, Philippines. You can now establish private, direct network access to all public AWS Regions (except those in China), AWS GovCloud Regions, and AWS Local Zones from this location. The Direct Connect service enables you to establish a private, physical network connection between AWS and your data center, office, or colocation environment. These private connections can provide a more consistent network experience than those made over the public internet.  For more information on the over 146 Direct Connect locations worldwide, visit the locations section of the Direct Connect product detail pages. Or, visit our getting started page to learn more about how to purchase and deploy Direct Connect.
Source: aws.amazon.com

Amazon Connect now supports agent screen recording for ChromeOS

Amazon Connect now provides screen recording for agents using ChromeOS devices making it easier for you to help improve their performance. With screen recording, you can identify areas for agent coaching (e.g., long contact handle duration or non-compliance with business processes) by not only listening to customer calls or reviewing chat transcripts, but also watching agents’ actions while handling a contact (i.e., a voice call, chat, or task). Screen recording on ChromeOS is available in all the AWS Regions where Amazon Connect is already available. To learn more about screen recording, please visit the documentation and webpage. For information about screen recording pricing, visit the Amazon Connect pricing page.
Source: aws.amazon.com