Can AI Agents Safely Become DevOps Engineers?
E9


It's a bit of a shortcut to think AI is non-deterministic, and we need to be deterministic, so we shouldn't use AI for that.

I think the right analogy is AI has the ability to replace human beings on certain tasks, or to augment them.

So it's not about replacing everything that people are doing; in terms of analogy, it's really about building an agent that can behave and think and act exactly like a human being would.

So, for instance, your platform engineer or your DevOps engineer working for you is actually non-deterministic.

Humans are non-deterministic.

Right.

This is the Agentic DevOps Podcast.

I'm your host, Bret Fisher, and today I have my guest, Sam Alba of Mendral, on. Sam goes way back: one of the early devs at Docker, co-founder of Dagger, and now co-founder of Mendral.

So he's been focused on cloud native and now agentic products for 15-plus years at least.

And this is a wide ranging conversation.

We mostly focus on Mendral, the tool.

And the elevator pitch, I guess, that I would give is: this tool currently looks at GitHub. It's focused on GitHub Actions right now, but GitHub as a platform, particularly the Actions workflows and the events that happen there, though not just Actions, and it tries to act like a junior DevOps engineer, essentially.

It's one of the closest things, if not the closest thing, I have had as someone who focuses on GitHub Actions and CI, and on administrating and managing that platform for DevOps teams in particular. It feels like the closest thing to an AI buddy that is constantly looking at my GitHub, finding problems, not necessarily in the code that the application developers are making, but in everything else around it.

The linters, the testing infrastructure, the GitHub Actions workflows and the pipelines that are happening in there, the logging events, anything misconfigured in security, the Dependabot and Renovate stuff, just all the stuff that is focused around the code: managing the platform for storing code, running workflows and automations on that code, and then eventually shipping it.

DevOps teams operate in this middle space between the developers and the production deployments and infrastructure.

And I think that's a sweet spot for me.

Like that's exactly what I
wanted to talk to Sam about.

And we spend quite some time digging
into the use cases for this thing.

I actually run through some of my own experiences, because they onboarded me to the platform over a month ago, so I've been using this thing irregularly, but as a single operator of over a hundred repos now in my own GitHub. Some of those are actual infrastructure things in production for my own use, but also a lot of examples for my courses, a lot of demos and sample tools and sample code that I have to manage.

I treat this like my operations, right?

So I'm doing the dependabot, I'm
doing the security reviews of things.

I'm automating things with GitHub Actions.

So while I'm not necessarily the picturesque large team managing big projects on there, I do operate my own little business of one.

I operate that in a very
similar fashion and I need help.

I need a lot more DevOps help than
I admit, because I need to operate

my business and I don't necessarily
have time to manage the platform

automation stuff around my code.

And that's really the problem I think
that Mendral's trying to go after.

So we break down what it does, what
they're doing as an early stage startup.

They just graduated YC.

And we then get into even more details around AI, because they're building today, having graduated just a few weeks ago from Y Combinator. They are one of these new AI companies that are building with AI.
They're using AI in the product
and their product ships AI

features to us as the users of it.

So they're kind of the
triple threat of AI.

So we lean into that a little bit, talking about what they're using AI for, what they see for the future of their product, and how we're going to experience these things in a future where we're all operating our own AI harnesses to manage our agents like Claude Code and OpenCode and whatnot. How do we use this tool in that future?

Anyway, we get into all that.

It's a great conversation.

I was excited to have it, and we went on so long that at some point I just had to say: okay, we're gonna have to stop talking and make this another episode, because we could have gone for hours, I believe, on this.

So please enjoy this episode
with Sam Alba of Mendral.

Welcome to the show.

Hi Bret.

Thanks for inviting me.

We started this new company, Mendral. We are now building an AI DevOps engineer.

We basically see, you know, the emergence of coding agents and how they are shaping the future of CI/CD and software delivery. And so we're building an agent that can unblock teams.

And now, thanks to AI, we can automate certain things that we could not automate before.

And so today we have an agent that monitors and fixes some of the software delivery issues: flaky tests, slow builds, broken release processes.

So, let's get into it. For the audience: Sam and I talked, I don't know, a couple months ago at least, I think, and we had talked about Mendral.

I'm building GitHub Actions courses and content, and I've been a GitHub Actions consultant for probably half a decade at least, or more.

I think Actions, I'd argue, is the most popular CI platform, certainly for open source. I'm just going to call it an automation platform from now on, because people do a lot more than just CI in there, and deployments and stuff.

Solomon had clued me in to you all because we were staying in touch around Dagger, and that's still going. We started talking about your focus, and it felt like a tool that I should have in my toolbox.

It also felt like something where, typically with large platforms, just like a cloud platform like AWS or Google Cloud, GitHub Actions is a very raw platform to me.

I feel like it's got a lot of features. It's got a lot of sharp edges. But it also stops short of what a team typically needs out of an automation platform.

And we all talk about private runners, and we sometimes talk about things like custom dashboards, or, you know, org-level statistics and awareness, more observability into GitHub Actions. Those are common questions, and I don't often have great stories for that, because historically that market was people trying to build very niche little products, pre-AI, that solved a little pain point for GitHub Actions.

One, GitHub Actions wasn't quite as popular five years ago, right? It hadn't yet risen past Jenkins and Travis and a lot of the other ones. There wasn't the same popularity. And these tools, since they were pre-AI, were very limited, I feel like, in the things they could help with.

And so I saw, personally, little companies starting out that almost felt like little hobby products. They weren't Y Combinator companies trying to come out and be actual full-fledged companies. They were more like side projects, where someone figured out: oh, I could spin up runners faster.

Which is like a whole new segment of the market, where there are now many companies that host your runners for you, and they're faster and cheaper and better.

And then we've had little companies
experiment with like GUIs or

web dashboards that do more than
what you might get out of the

basics of GitHub Actions there.

When we walked through yours, I was very excited about it, because rarely on this podcast, and we're about to hit 200 episodes between this and DevOps and Docker Talk, rarely is there something, especially in Kubernetes land, that I feel like is meant for me.

That's something that solves my
problems, even as a solo developer

and as a consulting DevOps engineer.

So, could you talk a little bit about when you both said: hey, look, we're going to start this whole new company, because we believe this is the right time for this thing. Where was your headspace?

What problems were you trying to solve?

Yeah, so it's a really broad area. You actually mentioned it: it's not exactly just for running your tests. It's more like a workflow engine.

It's a lot about orchestration and automation, and we do a lot of things in CI.

CI was always a bottleneck. Every time, you know, you have teams, as soon as you have some CI, you start with some workflows, like a linter, a builder, on your GitHub Actions or some other CI system.

And CI is always a bottleneck because
first of all, it's a central place

for integration and for running
your tests when you ship your code.

When we started Dagger, we wanted
to help people with this bottleneck,

building programming tools so
engineers could actually solve

their CI issues more efficiently.

When we started this new company, Mendral, Andrew and I saw an opportunity to finally automate certain things that we could not automate before AI.

You know, for instance, there are some release processes that you need to run manually. Every team that's growing has some sort of manual operations in their release process.

And on the other side, they have issues that they don't spend time fixing, because it's never the priority; you want to build your product, you don't want to build your CI.

And the problem is these
problems are piling up.

That bottleneck is getting even bigger with AI, because now you have coding agents that push a lot of code to your CI system. And so this bottleneck that was already a problem is getting worse.

And it's only the beginning.

And so we thought that now the problem is bigger: on one side there is more demand for it, and on the other side, we finally have the tools, thanks to AI, to solve these problems efficiently.

So the idea with Mendral is not
so much to make your CI better and

equip developers, it's actually to
replace that work, to automate that

work entirely so those developers
can focus on their applications.

So that's really the TL;DR of why we started that.

And obviously software delivery and CI/CD in general is really broad. There is security involved. There is, you know, quality control, regression testing; a lot of things go through CI.

And so we started initially by building a data platform, really, that looks at everything that's going on in your CI system: the logs, the code changes, the past incidents, all of the events, the behavior of the team, all of this data. And then we started to build specific agents on top.

That's really what we're
building with Mendral.

You know, when we started YC, we had a working MVP, but we didn't know exactly what would be interesting for people. We ended up spending a lot of time fixing flaky tests and, you know, reliability problems.

We have some teams actually using the agent to improve the performance of their CI. For instance, implementing sharding strategies on top of their pipelines to get some parallelization, so they can ship faster because the CI can complete in a shorter amount of time.
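The sharding Sam describes is a common pattern on GitHub Actions. As a rough sketch (the Jest runner, repo layout, and shard count here are illustrative assumptions, not anything Mendral generates), a matrix can fan a test suite out across four parallel jobs:

```yaml
# Hypothetical sketch: split a test suite across four parallel runners.
# Jest (28+) supports --shard natively; other runners need their own split.
name: tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]   # four concurrent jobs
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Each job runs only its slice of the suite.
      - run: npx jest --shard=${{ matrix.shard }}/4
```

Each job runs a quarter of the suite, so wall-clock CI time drops roughly in proportion to the shard count, at the cost of more concurrent runners.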

There are also some teams using us for security reasons. For instance, our agent is looking at security alerts: it looks at a CVE, sees if it's exploitable in your code base, and auto-remediates if needed.

So these are really things that you would expect a senior DevOps engineer to do for you. Some of those teams actually don't have specific roles for that, so they use the agent as if it's a person.

Yeah.

I think I've been in tech for 30 years,
20 of those years I've been dealing with

some sort of code management system.

And that doesn't always have an automation system built in; we didn't always have something like GitHub, where the automation system and the code storage system were the same thing.

But I'm just thinking back, and I don't know a time where I would agree that the CI or the automation system for code was a first-class citizen on the team.

It always feels like it's, I don't know if this is an American phrase, the redheaded stepchild.

Yeah, it always felt like it was just barely working. And especially when it came to Jenkins, we were never sure; we didn't know how long it would take to recover from a server failure.

Sometimes those servers
are under someone's desk.

Sometimes those servers are unique. Often, especially with Jenkins, not to pick on it, but it was the most popular and it was self-hosted, you had these special snowflake servers that were built typically by developers, not usually by ops, where the professional sysadmins were operating, right?

Where they were the ones creating
systems to manage and control servers.

But often I would walk in and find the
dev team had their special CI thing and

it might be running on someone's machine.

It might be under the desk,
it might be in the closet.

And those days are mostly behind us, I think, because we're pretty much all using at least some sort of cloud API for managing our automation. And we might have runners in different places, but we're leaning into more cloud stuff, although it's amazing how much I still see Jenkins and talk to people that are still using Jenkins.

I often look at this as: we have needed tools like this for so long to help clean up. To me it's like the janitor, because while you're saying you're building the DevOps engineer, I know that every DevOps team wishes they had another DevOps engineer to help them.

And I feel like right now, at least in this moment in AI, management seems to think that they're going to be reducing the number of DevOps engineers, not increasing them.

I just recently launched something called the Agentic DevOps Guild, personal plug. This is a membership program where DevOps engineers are coming in to help accelerate their AI learning and onboard AI tooling.

And I do these onboarding calls, so for the last three weeks I've been having multiple calls with engineers as they come onto the program.

And one theme that's maybe not a majority yet, but is consistent, is that DevOps engineers are worried about unreasonable expectations from their management to do magic, and they don't even know these tools yet, right?

Tools like Mendral that run on top of maybe an AI infrastructure; they don't yet know Claude Code. They're just dipping their toes into Copilot sometimes.

I think operators and DevOps engineers are maybe a little bit behind application software developers in terms of their expectations to onboard development or coding tools for AI.

And of course, just six months ago I was talking to people at KubeCon that were saying: well, we're never going to put AI into infrastructure management, because that's non-deterministic and we need full determinism.

And yet, when I use your tool, I don't actually know your architecture, we'll get into that, but it feels like there's AI in the background.

I'm writing; I'm editing plans for execution with it in human language, right? Like, I'm not checking boxes.

I feel like I'm writing back. I'm not literally chatting, but I'm writing a plan, helping it edit the plan so that it executes properly. I'm writing paragraphs of, sort of, my rules for how it should do certain things in infrastructure.

So it feels very AI-based, even though I'm not literally chatting with a chatbot yet.

But at the same time, I feel like DevOps engineers are just in this unfortunate situation right now. We're getting hit from all sides.

We're expected to keep operating our understaffed, janitored or caretaken infrastructure, right? Our struggling infrastructure.

We're worried that we're going to have less staff here pretty soon.

We haven't yet completely consumed all the AI madness that the app developers have, because we know that basically we're one prompt away from production going down, because we typically have the keys to the kingdom. And a lot of teams have keys like that. You know, I was just talking to one of the engineers onboarding, who said: I have the Terraform keys on my machine. I have all the production AWS and kubectl keys on my machine. I need to learn how to sandbox this AI agent, or if it does one thing wrong, I'm probably losing my job.
Right?

And that just feels much higher
stakes than an app developer that

got the wrong font on something
and has to recommit a new PR.

It just feels higher stakes for
all of us in the operations realm.

So that all being said, it feels like you've kind of nailed it, without me sounding like too much of a fanboy yet, but it feels like you're solving a problem for me.

And so, for the audience: over the last week I've actually been working with the team to try to get more of my problems solved. I mean, I have over 100 repos.

Majority of those are training
repos or sample repos for learning

Docker, Kubernetes, GitHub Actions.

So there's lots of sample code, but it's app code, right? There's also lots of sample GitHub Actions, kubectl, all sorts of various infrastructure stuff.

I have an impossible time keeping up; I cannot keep up with all of it. I can't even keep up with the NPM updates, much less, you know, failed CVE scans, failed linting jobs, failed test runs.

There's just so much stuff that's
happening in the background.

I just basically ignore it until it becomes a problem for my students or somebody bugs me in an issue.

And I feel like finally with Mendral,
from a user's perspective, it's giving

me an opportunity to shortcut that.

It doesn't fully automate everything
for me yet, so I want to ask about

that, like where your vision's at.

But it doesn't fully automate, doesn't solve all my problems automatically. It just feels like it's raising up what's more important for me, so I don't get distracted by the stupid stuff that I might not need to worry about because it's not really an issue. A failed linting job isn't as important as a failed test job. So it's nice that it elevates things that are important, and it also rolls things up.

it's like if you had this issue
a hundred times in a hundred

repos, maybe fix that one first.

And so where do you see all of this going? This thing is, you know, helping; giving intelligence and insight into the failures that I have in my automation platform is what I feel like it's doing.

Is this thing eventually
like learning from me?

And is it going to start solving
some of these automatically?

Like, where do you see that going?

Yeah.

So, yeah, it's an interesting question, and there is a lot to say about what you said earlier about deterministic and non-deterministic, and the use of AI on pipelines that are actually deterministic.

Really briefly, I'll explain, just for context, how Mendral behaves and how the product works.

Usually teams onboard with a single GitHub App install, one click, and then we start ingesting all the CI logs and events on our platform. And we basically run the agent in such a way that it can see everything that's going on in your CI.

It's like someone looking at your logs, your events, your code changes, everything that's going on, and looking for opportunities to be helpful.

You mentioned a linter failing; that's one. It can be one of your pipelines being slower by 30% this week. And it was faster last week. Why?

It can be GitHub is down. Like, what's going on? Everything is broken. The agent is actually able to spot these kinds of problems and tell the team: don't worry, GitHub is down, it will come back. It's not you.

They are looking for many opportunities to be helpful. The problem with most AI tools is that, you know, people tell you, oh, it can do anything, it's very powerful, but you still have to find the right thing to ask the chatbot.

You know, one of the key architecture decisions when we started to build the product was that we didn't want to give people yet another dashboard, yet another chatbot. That's why it's an agent that joins your Slack and starts working for you exactly like a human being would.

One thing that's important to keep in mind: for me, it's a bit of a shortcut to think AI is non-deterministic, and we need to be deterministic, so we shouldn't use AI for that.

I think the right analogy is AI has the ability to replace human beings on certain tasks, or to augment them.

We have some of our customers who already
have DevOps engineers and big platform

teams, and actually they have Mendral
joining that team and augmenting them.

So it's not about replacing everything that people are doing; in terms of analogy, it's really about building an agent that can behave and think and act exactly like a human being would.

So, for instance, your platform engineer or your DevOps engineer working for you is actually non-deterministic.

Humans are non-deterministic.

Right.

And the output and the work they do can be deterministic. Take the example of a linter: the linter breaks. We have to understand the mistake, what the problem is, why it broke. All of that can be done by AI. Then fixing the linter and pushing a PR, that can be totally deterministic.

So the way Mendral works today, we didn't push the cursor too far in terms of automation, because we care a lot about security.

Like we have fairly large teams
using the agent in production.

And so we're very cautious about the kinds of changes we make, because the agent actually has the capability to open PRs and push code to people's repos.

So what we do is the agent will ask every time: when it wants to do something, it will ask for permission. You have to confirm that yes, it can go ahead and implement that; yes, it can go ahead and do this or that.

Over time we have people asking us more and more to actually automate more: if the agent's level of confidence is greater than, let's say, 85%, I want a PR to be opened automatically.

And we actually have a pretty high merge rate on the pull requests that Mendral opens, the pull requests that are accepted by teams.

And the reason for that is that we have fairly long coding sessions. When the agent implements something, it's fairly similar to what you get with Claude Code or Cursor. But the main difference is that it will wait for the CI to complete, wait for the logs to show up, and wait for the confirmation that it actually fixed the problem.

So if it's fixing your linter, the agent itself will wait to get the confirmation that the problem was solved before saying: yes, okay, my PR is ready.

Exactly like a human being would do, right?

And so, that's the main thing.

But yeah, over time we're going to push for more automation, and we're going to do a lot more.

You asked also about the learning phase. That one is actually very interesting.

I don't know if you want to react
to what I just said, or if I should

expand on the learning aspect.

Yeah, let's talk about the learning.

So on the learning side, it's exactly like when someone joins your team: the person doesn't have context. So they start by watching the team, joining the team, making themselves useful, right?

But then over time, the person accumulates knowledge of a lot of things: the problems, the tools, the best practices, and then they get better and better.

Same thing with the agent.

So when the agent sees something that needs to be remediated, or it sees problems, it can identify patterns and maintain a list of what we call insights. They are basically opportunities for the agent to do something: it can be a failure, a performance regression, security alerts, et cetera. All of that is being tracked by the agent and constantly being refreshed with new data.

That's why, for instance, if it spots a problem the first time, the level of confidence in the resolution might be low. Then, as the problem appears several times, the level of confidence gets higher, to the point that we can entirely automate the resolution.

And that happens based on the agent being able to constantly update this living memory and take it into account whenever there is a problem.

So when it sees the problem, it doesn't just look at the problem in the logs; it sees all the context of the problems that happened before. It has the ability to look at, you know, similar issues or similar patterns that it saw in the past.

And I can dig into the implementation of that, because it's actually quite interesting how we architected this agent.

But then also, teams sometimes talk to the agent on Slack and say: whoa, whoa, whoa, you did that, but we actually don't do that. Let's say, you know, we always follow that benchmark, so we always use this tool, or we always do this. Basically people react to the agent and say: hey, keep in mind we do this and not that.

And the agent also has a memory
system and maintains its memory.

You can also review the memories
and edit them, et cetera.

But yeah, the idea is to make that learning part entirely automated, so the team doesn't have to care about what the agent knows and what it does not. And that's really the key part about the non-deterministic question.

I think the right way to frame it is really to think about what a human would do, and about the fact that an LLM can reason and navigate through our problems in the same way. And so that's really what makes this kind of automation possible today; it was not possible before LLMs were good enough.

Yeah, and it's subtle.

The more we lean into just trying to use AI in various scenarios, the more I feel like my mind expands to understand where it actually could apply to things that I didn't even think about.

Like, to me, the premise of Mendral, to a non-AI person, what I would call a blue piller, to keep with the blue pill and red pill Matrix, a 25-year-old dated reference, but the blue pill people would be the ones who are not necessarily anti-AI, but very much not pro-AI.

They're not leaning in hard, they're
like, yeah, yeah, it might help me

with my code completion, but I'm
not looking to put AI everywhere.

And that's fine, there's people like that.

And then there are the red pillers, and I used to consider myself blue, and now I'm basically all red and looking for new opportunities.

In this use case where we're saying, okay, yeah, it's going to be helpful in CI: at first, when you have these three or four years of experience of hallucinating agents and things going crazy, that sounds like a wild premise.

But what I find interesting, and I'm using this also in my own work with things like Claude Code and OpenCode, is learning how skills and other new tools are all patterns that us humans are using to help guide the AI, give it more context, give it more guardrails so that it won't hallucinate.

And, like you said, we get to the 85% trust level, we get to this certain level of trust, particularly with a certain model or a certain harness, and then we start to relax.

And I've noticed recently, like even
my prompts are getting sloppier.

Like, I'm not prompt
engineering anymore, right?

We were consumed with that a year ago, where you had to have the best prompt; the only way you were going to get good, reliable AI was the best prompt. And you'd go to these websites and they'd have all these listed prompts, and now I'm just like: hey, can you fix that?

I'm very vague, I'm very casual
like I would to an employee.

But for me, I know that's because I'm consistently using skills, which are these large documents full of context, and because I'm operating on larger and larger, whether we want to call it context or memory or sessions, whatever.

In the AIs, I continue to use the same session, which compresses, and I expand it and it compresses.

What I'm finding in my AI conversations is that I'm able to stick with the same conversation, the same session, much longer, because now we have Claude Code with a million-token context.

And that just unlocks things, I feel like. Yes, I'm using more tokens because I'm in the same session; I'm not a Ralph Loop fanatic constantly dumping context and starting fresh.

But I find that I'm actually able to
have these conversations that extend for

more than a few days, but into weeks.

And it remembers the thing
that I told it two weeks ago.

And that's where it starts, I think,
to get really, really interesting.

And that feels a little bit like what I'm playing around with with Mendral. I wanted to call out one feature that I just started playing with a couple days ago, and this very specific problem: we've got so much of, what's the term we use for teams that have all the canon inside their brains and it's not documented? I'm trying to think what that is.

It's tribal knowledge.

Tribal knowledge.

Okay.

So, a while ago, and I love that term, tribal knowledge: these are the things where you onboard that new DevOps engineer, and they're not going to just read docs.

they're going to make mistakes.

And then a team member is going to say,
oh, no, no, no, we don't do it like that.

This is what we do.

I ran into that where it wasn't documented in any of my agent files yet. It wasn't anywhere in code, or documented anywhere, what happens when I run Super-Linter, particularly against GitHub Actions, because since I'm teaching GitHub Actions and consulting on GitHub Actions, I'm all into that.

and I've always been very concerned
with the security side of it.

And now we've had a rough year; GitHub Actions has had a rough year. This last year has not been kind to the GitHub team in terms of security attacks.

I'm not even going to call them vulnerabilities. I'm just going to call them sharp edges and misconfigurations, a lot of the time.

Because it turns out they're not hacking GitHub. They're just finding people that didn't configure things correctly on an open source repo.

And this is all happening.

So I'm always scanning my GitHub Actions.

I use actionlint, and I use something called zizmor, I think that's how you say it.

Zizmor, yes, it's zizmor.

And zizmor is very focused on security stuff.

And so down this rabbit hole I'm going to go with you for a second. What was happening was they added a new rule to the linter that says: from now on, all of your actions need to be pinned. Which, for those of you out there, if you're using GitHub Actions, hey, if you don't know about pinning, I've got a bunch of videos and courses on that. You absolutely should be pinning all your actions.

And rightfully so, the linter now warns me when I'm not doing it.
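For listeners who haven't seen pinning: the difference is just what follows the `@` in a `uses:` line. A sketch (the SHA below is a placeholder, not a real commit):

```yaml
# Tag reference: mutable, resolves to wherever the tag points today.
- uses: actions/checkout@v4

# SHA-pinned: immutable, resolves to exactly one commit. The trailing
# comment records the human-readable version for reviewers.
# (Placeholder SHA for illustration only.)
- uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # v4
```

Pinning matters because a compromised tag can silently swap the code a workflow runs; a SHA cannot be repointed.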

And in reusable workflows, there is a pattern that surfaces with teams where, if they're controlling the reusable workflow, they don't tend to pin to the SHA hash from their calling workflows, because they control everything centrally from the reusable workflow. And that's the point of them: to centrally control things so that we don't have a hundred different repos that I have to update every time a workflow changes.
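The pattern described here looks roughly like this; the org, repo, and file names are made up for illustration. The calling workflow references the reusable workflow by a mutable ref, so one change to the reusable workflow updates every caller:

```yaml
# .github/workflows/ci.yml in each calling repo (names are illustrative).
# Pinned by branch, not SHA, on purpose: the team controls
# shared-workflows centrally, so one change there updates all callers.
name: ci
on: [push]
jobs:
  build:
    uses: my-org/shared-workflows/.github/workflows/build.yml@main
    secrets: inherit
```

The trade-off is exactly what the linter flags: a branch ref is mutable, which is acceptable here only because the team owns the referenced repo.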

And so this security tool doesn't
know that, Mendral doesn't know

that that is not documented
anywhere in any of my systems.

It's not in my notion,
it's not in my repos.

So that's tribal knowledge.

So it was flagging a bunch of these things, and Mendral was thinking the fix was to pin the SHA. But on my team of one, the real fix is: no, we just need to write an ignore rule for the zizmor linter.

And that's very specific to
my workflow, not something

that other people would know.
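For reference, zizmor supports a repo-level config file for exactly this kind of exception. The shape is roughly the following, from memory, so check the zizmor docs for the exact file location and schema; the workflow filename here is a made-up example:

```yaml
# .github/zizmor.yml (location and exact schema per the zizmor docs)
rules:
  unpinned-uses:
    ignore:
      # Suppress the finding for the calling workflow that invokes
      # our centrally controlled reusable workflow.
      - ci.yml
```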

And instead of documenting that somewhere and then connecting some MCP in some convoluted way to let Mendral know, it has this memory feature where I can just dump in that tribal knowledge, by copy and paste or by writing stories, basically user stories, that guide how it's going to treat future failures. And sorry for the audience, this is a really long story, but I feel like it's very tactical and relevant as an example.

In that scenario, if I had a junior engineer, I would say: okay, we're going to need to update all of these calling workflows; a hundred repos now need this new line inside the workflow that calls the reusable. And from now on, everyone on the team needs to know we don't SHA-pin in our calling workflows, we only SHA-pin in the reusables.

And with Mendral, I was able to put in one memory. I haven't seen the outcome yet, but I'm going to ask you: in theory, I guess that means that from now on, every time Mendral sees this error, the new PR plan it gives me isn't going to say, hey, I'm just going to replace this with a SHA hash. It's going to say, oh, I'm going to put in a new rule; I'm basically going to ignore this in the zizmor linter, because we know this is okay and we don't need to flag it. Is that kind of the expected outcome?

Yeah, that's right.

There are many ways you can influence the behaviors of the agent today. And yes, indeed, you can actually create a memory. It makes sense when there is a pattern that you want to be widespread and applied everywhere: the sticky note you want to put on the desk of your engineer, that you want to make sure is never forgotten.

Another thing you can do: let's say you have a new linter you want to migrate to from an old one, or you want to implement a new security tool, something like that. The outcome will be deterministic, but getting there will most likely be done in a non-deterministic way.

maybe there are some
things that you don't know.

Like for instance, the agent has the
ability to do a web search or like look

into best practices or things like that.

And so what you can do is also ask
the agent, Hey, this is my plan.

This is what I noticed and this is a
problem for me, I'd like to address it.

The agent will actually create an
insight for it and track it, and

propose an implementation when the
level of confidence is high enough.

And yeah, it also has the ability to update findings. You talked about tribal knowledge; there is also some tribal knowledge, our own experience as engineers, with Andrea and Olivier, baked into the agent itself. We constantly change the way Mendral does stuff, and that's why it feels magical sometimes when people onboard with it: what they don't realize is the agent has our own experience hardcoded into some of the subagents we wrote.

And so you can actually,
influence that too.

And the agent has the ability to update
its own insights based on your input.

So if there are certain things
you disagree with, because let's

say, you know, if you would hire
me as a DevOps engineer, I have

certain things I would like to do.

And you're like, well, no, actually, Sam, I'm paying you, so I need you to do it this way instead. You can ask Mendral the same thing, and it will be fine. So yeah, the customizability is important, because that's the kind of control you need from someone you would hire as well, right? You expect this person to bring their experience, but you also have your own requirements. So yeah, that's something we constantly improve and make available to teams.

Yeah.

And that's the kind of thing where, I mean, testing tools have this problem, linting tools have this problem, really any tool. But I think those are the two areas, and maybe the linters are the worst, where something new gets added, a new rule to a linter. I think I ran into another tool recently, actually. Well, that was a Neovim tool.

Okay.

It's like everybody's refactoring all their apps now with AI. So it broke a lot of things, and I spent an hour on it, and AI was able to solve it so much faster. But that's not related; that was just something that happened yesterday. Because, I mean, if someone's in the trenches as a DevOps engineer, that scenario I gave, while it's just a linter, not inherently high stakes, it wasn't a testing failure or a production deployment failure or anything.

But that's the kind of thing when I work
with DevOps teams, that's the toil, right?

That's the thing that they don't want
to have to go change literally a hundred

repos of microservices and back end
things because they've implemented

some sort of central reusable workflow.

But the crux of all that is they have this thing everywhere, and when something breaks, they don't have the tooling to update it. They basically end up spending days writing scripts to create the PRs in 100 different repos to fix it: automating the creation of the commit, the creation of the PR, the acceptance of the PR, and then the merging of the PR. Because they don't want to literally spend three or four days just mindlessly clicking through 100 repos to do this.
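That script-the-PRs workflow usually boils down to a loop over repos driving the gh CLI. A dry-run sketch that prints the commands rather than running them; the repo list, branch name, and commit message are made-up examples:

```python
# Dry-run sketch: print the gh CLI commands a bulk-fix script would run.
# Repo names, branch, and messages are hypothetical placeholders.
repos = ["acme/service-a", "acme/service-b"]  # ...imagine 100 of these
branch = "fix/pin-reusable-workflow"

def bulk_fix_commands(repo: str) -> list[str]:
    """Commands to patch one repo's calling workflow and open/merge a PR."""
    return [
        f"gh repo clone {repo} && cd {repo.split('/')[1]}",
        f"git switch -c {branch}",
        "# ...edit .github/workflows/*.yml here...",
        "git commit -am 'ci: update calling workflow'",
        f"git push -u origin {branch}",
        f"gh pr create --repo {repo} --fill",
        f"gh pr merge --repo {repo} --squash --auto {branch}",
    ]

for repo in repos:
    print("\n".join(bulk_fix_commands(repo)))
```

The pain Bret describes is everything around this loop: error handling, repos that drift from the pattern, and the days of babysitting.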

And I've lost count of how many times I've watched teams go through this toil of, like, your week is going to be fixing a stupid linter rule across 50 to 100 repos, because the dev teams don't want to do it or whatever.

Or it's now our job to do it for
some reason, even though we're

not the application engineers.

Sometimes I have worked with teams
where it's the DevOps engineers

literally implementing the linting
rules for the software engineers.

I don't know why that happens,
but, sometimes we get saddled

with work that's not ours.

But that automation part, I feel like, is another almost hidden feature. We talk about raising issues, we talk about helping it understand the nuance of things and the patterns you're seeing so that it can be more intelligent. But at the very end, this really is all about, to me, saving the toil of manually checking repos for things. Because you're constantly struggling with workflow failures that aren't rising to the top. When you have a big enough team, you've always got workflows running; you're just constantly inundated with workflows. And the challenge is the alert fatigue in Slack, or whatever tool you're working with. You just can't keep up. So you start to say: okay, linter failures no longer alert in Slack, we're going to remove those from Slack, we're too busy for that now. We're only going to deal with deployment failures, or only testing failures on these particular repos. You end up having to force yourself to ignore a whole series of problems, because you just can't handle it; there's just too much work.

And to me, the more exciting thing is that maybe someday, I don't know if it does it today, it would be able to go and simply apply a hundred PRs for that particular calling workflow, because I name it the same thing everywhere. My hope is that someday, if not today, that could be automated, so that I could just be done, basically a hundred PRs later, and it's taken 30 minutes or something. I didn't have to write a script; I didn't have to worry about nuking or breaking some repos because I wrote the wrong script. I can give that to Mendral, which is exciting for me.

Yeah, I can tell you more actually
about, about that, because there

are a lot of things we're working
on today actually that some of

it actually might be available by
the time this podcast is out, so,

okay.

Yeah, we're moving quite fast on these things, because there is a lot of demand for it. As soon as people onboard the agent, they see the value it can unlock thanks to the data layer and what we usually refer to as the agent harness, which is kind of the combination of the tools, the context, and the way we build that to keep the agent accurate at any given time. That's really the key.

But yeah, in terms of what works today,
so the agent is not stuck with one repo.

Sometimes in CI, when you use products, you realize that a repo maps to a project, which could sometimes map to a team. Some companies do it this way. And that mapping is great and works fine when you're okay with those boundaries. The problem is, most teams are not, and sometimes a repo is just an implementation detail. And exactly like you said, you want to think about something you would apply to all of your repos, all of your code. You need to think about whether this repo behaves like this or like that, etc. TL;DR: Mendral has been designed so it's not tied to a repo. It's actually tied to an organization today, and that can be one or many repos.

We have teams with like a
gigantic monorepo and that's fine.

And usually what they do is
they have this mapping inside

folders and sub directories.

and then you have some teams who have lots
of repos actually, and there is no mapping

whatsoever from a team or role to a repo.

And Mendral doesn't care.

Like insights can be
applied to many repos.

One thing we're working on that I'm actually very excited about: we have this data and knowledge coming in, and the agent running based on events, teams asking for stuff, things appearing, background tasks, all of that. So the agent is constantly working for you and constantly looking at its data.

We are building the ability for
you to have your own agents on top.

So basically you will have a
simple way to define your own

sub agent on top of Mendral.

And so it will really feel like you're giving some specific instruction, almost a mission, to Mendral behind the scenes. In terms of architecture, it's actually a real agent. It's a real agent that's linked to everything else, and it's a sub agent that will be called by the main Mendral agent whenever it needs to be called. So you'll have the ability to say: whenever there is a code change, or, since we'll actually add mappings to other sources of data, whenever there is a new exception on Sentry, for instance, because we are expanding beyond CI logs, I want to do X, Y, or Z, and I want to be notified on Slack or not, you know, because I don't want the noise. So you'll have the ability to build your own. I call that an agent because it's de facto a real agent. Some people might call it agentic workflows or something, but it really depends on what your use case will be.

But yeah, I realized that it's actually a dream to onboard an agent that is doing work for you. And that dream is true today. I'm saying that, you know, not overselling the thing, but it works. That said, every team has a unique CI, right? Every team has a unique set of tools, a unique set of best practices; you are using several external services and infrastructure, data that you are managing somewhere. All of that is unique to a team, and you cannot have an agent that will figure it all out entirely on its own. So we're going to add the ability to plug that data and those integrations into the agent, so it can actually work in a way more specific to your needs. That's really what we are building today.

And there is a lot of demand for it.

Um, actually two things.

There is a lot of demand
for customization.

and plugging to other data
sources and other workflows.

And there is another demand
for, having more automation.

So it's very interesting to me to see, and you said it well earlier, that people actually trust AI tools a lot today. Some people ask us, can you actually, for some of those problems, automatically open and merge the pull requests? And we're like, well, you know, it's going fast.

Well, that's the trust talk; we'd get there.

Yep.

The trust, right?

Like, on the livestream we had a couple weeks ago, I was talking about my local coding agent. I use Opus, I use Sonnet, basically any state-of-the-art model; I use GPT-5. I have all the subscriptions. And I don't remember the last time I would have classified something as a true hallucination. It has been months. Right now it gets things wrong, but it's usually because I was lazy and didn't give it context.

Right.

It just made the wrong choice, because it acted like it was in its first week as an engineer on my team.

And I blame myself for that, right?

it's a me not you problem.

And so that has definitely, for me, established more trust. I might be at that 85 percent level we keep talking about. And if I did this in CI, if I was using Mendral, you know, technically I don't have to know what agent or what models you're using on the back end, right? I just know this thing is able to reliably give me plans that I agree with, and it gives me an implementation plan to fix a problem. And I go, yep, that sounds like a great plan. That's exactly what I would see in a PR from a junior engineer while I'm reviewing it; I just happen to be reviewing this pre-pull-request implementation plan. And if I did that a hundred times over the first couple of months of onboarding a tool, and I never, or rarely, saw anything wrong, and the wrong things weren't that wrong, they were just maybe a preference, I would absolutely trust it more.

And I can imagine myself right now on a team where I'm onboarding an orchestration agent engine like Mendral, and I'm going to tell the team: okay, they have this new automation feature they've just launched, so we can fully allow the AI to make the PRs and then commit them automatically. And maybe it's two different models with two different contexts: one makes it, another one reviews it.

Right?

We've been discussing lately when we're all going to be comfortable with one AI writing the software and a different AI reviewing the software. Does it need to be a different model? Does it need a different system prompt? These are questions that are coming up within the guild meetings we're having every week.

And I can see myself very quickly saying, well, we're going to allow it to auto-merge any linting failure, because it hasn't been wrong in its implementation plans in two months, or whatever. And we're going to tiptoe in with low-stakes stuff, and then maybe we can make a rule for Dependabot. This is another big pain point for me, right? When you have dozens and dozens of repos and Dependabot or Renovate comes out with a minor update. And typically, it's a wonderful life if you're in a monorepo right now, because if you're in a bunch of microservice repos, you just have a sprawl of repos: one JavaScript dependency module update, and suddenly you have 20 PRs to approve, and they're all the same PR. And I don't believe that Dependabot has a "just do this for me in all 20 repos" mode, right? It doesn't automate that process.

So then I'm literally going through and clicking. I think I'm probably at the point of being comfortable with my local AI saying: hey, just use the GitHub command line tool and look through all of my repos. This would be a very long prompt, but: look through all my repos for this one particular Dependabot update. It's got this exact title, because they're all going to have the same title. And if you see it in there, go ahead and accept it and merge it with the GitHub command line tool. I do think I'm at that level. And I'm not even the most aggressive AI person I know; I haven't even installed OpenClaw, right? I haven't installed any of these crazy orchestration engines.

I'm sure there are lots of people that are absolutely comfortable with this, and I am all about it, because there has never been a better time to tell the story of DevOps automation platforms as a thing that I can implement in a reasonable amount of time for a reasonable amount of money. Everyone I know that's struggling with this doesn't want to be entirely invested in a proprietary tool, and sometimes they want to open source everything themselves, which is a tough road for anyone trying to do that. I'm doing this in my courses, where I'm trying to explain how to do all these AI workflows that help you do a lot of automation. And I can tell you that tools like Mendral are the easy button, versus trying to do it yourself with a bunch more workflows that all do the things Mendral's already doing. But we've had all these different automation engines over the years that have tried to become a market-dominant force for easing the toil on DevOps engineers.

And I never feel like
anyone's really cracked it.

I don't go into any shop and find that they're all consistently, or even a majority of them, using one tool beyond GitHub Actions to automate things. And very rarely do I see people do this with GitHub Actions alone. Only the most mature teams I see in GitHub Actions are doing things where they might have a central repository of actions that are aggressively doing things on other repos in an automated way, like checking the security settings across all my org repos to make sure we're not exposing a security risk by allowing, you know, forked pull requests to automatically run actions, for example, which is one that's really biting people right now.

That thing needs to be locked down on every repo, and there's no tool built in to tell you what that setting is, so you have to literally either hand-code something to check it yourself or create actions that do all this automation. And it feels like we're right at the cusp of: I just write a skill for this, or I have a tool that automatically does this for me.
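The hand-coded version of that audit is genuinely small. A sketch of the checking half, where the settings dict mirrors fields returned by GitHub's `GET /repos/{owner}/{repo}/actions/permissions/workflow` endpoint (fetched with `gh api`); treat the field names and the specific policy choices here as assumptions to verify against the current API docs:

```python
# Audit one repo's Actions settings, given the JSON from
# `gh api repos/OWNER/REPO/actions/permissions/workflow`.
# Field names are from the GitHub REST API; verify against current docs.
def audit_workflow_permissions(settings: dict) -> list[str]:
    violations = []
    if settings.get("default_workflow_permissions") != "read":
        violations.append("GITHUB_TOKEN defaults to write; set it to read-only")
    if settings.get("can_approve_pull_request_reviews"):
        violations.append("workflows can approve PRs; disable unless required")
    return violations

# Example: a repo with both risky settings enabled yields two findings.
print(audit_workflow_permissions(
    {"default_workflow_permissions": "write",
     "can_approve_pull_request_reviews": True}
))
```

The fork-PR approval setting Bret mentions lives in a different part of the settings surface, but the shape of the check is the same: fetch, compare against policy, report.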

I do like this agent idea, because I would like to put in something like a DevSecOps agent. That's the next one for me: I want it to go and actually look at these org and repo settings and report back when it finds one that's not set properly.

Maybe something that I don't want to be a linter, but I want it to be, you know, security checks on my repos and on GitHub itself, so that an engineer doesn't have to run that script manually every day and then do all the work of fixing things every day. I'd rather just have an AI do that, 'cause it's a binary thing. Either this setting's checked or it's not, and if it's not checked, you need to check it.

This is a very basic thing.

Yeah.

There are a lot of things like that. For instance, when you start putting together some compliance, like SOC 2, there are a lot of controls you start implementing that are very important and sometimes time-consuming as well. You need a human checking them constantly, or regularly. So yeah, compliance is another thing. And security in general is something we think we can help with.

You mentioned Dependabot and the noise that it causes. We had some early prototypes with rules where, when the rules are met, I don't even want to know about the PR, I just merge the thing. And so we

if it's a point release, yeah, if
it's a patch release, just do it.

Just do it.

Yeah.

Yeah, especially if the build works and the CI passes; I don't want to deal with that. And so yeah, we had some prototypes where we entirely automated some of those use cases.
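That "patch bump plus green CI" rule is easy to state precisely. A sketch, with the caveat that it assumes plain three-part semver strings like those Dependabot PR titles usually contain, and that the function names are mine:

```python
# Decide whether a dependency-update PR is safe to auto-merge:
# only a semver patch bump, and only when CI has passed.
def is_patch_bump(old: str, new: str) -> bool:
    o, n = old.split("."), new.split(".")
    # Same major and minor, strictly larger patch number.
    return len(o) == len(n) == 3 and o[:2] == n[:2] and int(n[2]) > int(o[2])

def should_auto_merge(old: str, new: str, ci_passed: bool) -> bool:
    return ci_passed and is_patch_bump(old, new)

print(should_auto_merge("4.17.20", "4.17.21", ci_passed=True))   # True
print(should_auto_merge("4.17.21", "5.0.0", ci_passed=True))     # False
```

Everything risky (minor and major bumps, red CI) falls through to a human; only the boring case gets automated.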

And from the feedback we're getting from our customers, people are actually getting there; they are getting ready. So yeah, it's very interesting.

And I think very soon you'll have the ability to implement your own DevSecOps agent. We're going to keep adding more agents ourselves too, because people like the magic, plug-and-play experience when they onboard: you don't have a lot of time to evaluate another tool, so you prefer to onboard it, let it run on the side, and see if it's valuable. Then, if it's valuable, you want to invest more in it, which means defining your own agents. And on top of that, we do orchestration across this fleet of agents, so they are called at the right time with the right context.

You mentioned something
also about hallucination.

I think it's very interesting, because I think you're right that LLMs got better recently, in the last few months. They're definitely a lot more powerful in terms of how they think and the kind of mistakes they make. We actually built a lot of engineering around the LLM to deal with this. And we realized that, you know, the era of RAG is kind of over now. You don't need to pull an entire context and try to guide every single thing the LLM should do or consider.

Instead, the prompts are getting
smaller and you put a lot more

intelligence in the tools themselves.

And what I mean by intelligence is, for instance, in the case of Mendral, we do static analysis on the tool calls. So at runtime, when the agent calls some tools, we are able to detect drift from the initial mission. And the nice thing is, when you do that correctly, you have the ability, through the result of the tool call, to actually influence the thinking of the agent. So it's almost like, yeah.

That's great.

The agent starts somewhere and starts doing something, and then at some point it goes off on a tangent, calling useless commands or doing things that are not actually very useful. You can actually steer it back to the plan. So we spot those things in the tool calls and say: no, no, actually don't do that, do this instead. For the agent, it's like, I'm calling a tool, and it's weird, because the tool is telling me to do something else. But that really works.

But the nice thing at the end is that you have almost a dynamic prompt. Instead of having a very long prompt that you pass initially, you let the agent pull the context, and from that context you can dynamically change the prompt at runtime. And that's really what gave us the best results.
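The mechanics Sam describes can be sketched as a wrapper around the tool-call loop. Nothing here is Mendral's actual code; it's just an illustration of steering an agent through the tool result, with made-up tool names:

```python
# Illustrative only: steer an agent back on plan by rewriting tool results.
# `plan_tools` is the set of tools the current mission should need.
def run_tool(name: str, args: dict, plan_tools: set[str]) -> str:
    if name not in plan_tools:
        # Instead of executing, the harness answers through the tool result,
        # which the LLM reads as feedback and folds into its next step.
        return (f"Tool '{name}' is off-plan for this mission. "
                f"Use one of {sorted(plan_tools)} instead.")
    return real_tools[name](**args)

real_tools = {"read_logs": lambda run_id: f"logs for run {run_id}"}

print(run_tool("post_comment", {}, plan_tools={"read_logs"}))
print(run_tool("read_logs", {"run_id": 42}, plan_tools={"read_logs"}))
```

From the model's point of view, the correction arrives as ordinary tool output, which is exactly the "dynamic prompt" effect described above.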

Coupling that with sub agents is actually very important. Claude Code does that really well too. When it explores a codebase, it does it in a sub agent, because you don't need to bring the whole context of the exploration back into the main loop. Once you've got the results, you want just that result in the context of your LLM moving forward. And that's also what we do. So yeah, it's very interesting to see those patterns, and the LLMs, getting better. I think all of that is getting closer and closer to the kind of work that a human could do for you.

And so, yeah, that's really
what motivated us to start this

company in the first place.

Because you can sort of foresee that these things are getting better at a steady pace, and it's not just the models that are getting better; we're understanding better how to use them. We're not just taking a blind model and putting it into a random situation where it needs to write code. To me, it was like bringing in a kid straight out of university who just learned how to program in that language, sitting them in the chair on day one, and saying: okay, now write me some code, commit it to the repo, do all these things, without any context, right?

We just didn't understand
that we needed context.

And of course we had smaller context
windows, so that was also a struggle.

But for me recently, I use OpenCode a lot more than Claude Code. I keep trying to go back to Claude Code; it's got some really cool things that aren't yet in OpenCode. They're back and forth; they're my two favorites. And I talk about this a lot, I think I probably mention it on every show nowadays, because I'm just obsessed with it all day long. But OpenCode started to integrate LSPs, which I think is leveling up code accuracy. We don't really see it happening, but I just feel like my OpenCode is a little better, because LSPs are in there out of the box, whereas with Claude Code, I think you have to add the extensions manually. OpenCode dynamically injects them when it sees a language, in real time. And I don't have a way to prove this theory, but I think it's maybe because of that LSP background, where the tools are constantly helping keep it on the rails, essentially, of how it's writing.

And this is all, I feel, leading us to what some of the experts out there have been saying for over a year. I think last year I saw a great talk from the president of, I think it was Gradle, about how AI is coming for DevOps and operations because it just has to: the software development lifecycle can't be optimized for agentic coding without the rest of the pipeline also improving. If we're going to improve 2x, the entire pipeline, the entire lifecycle, has to improve 2x.

We can't just have developers tripling
their PR rate and then the rest of

us all act like nothing's changed.

We're going to have to accelerate; we're going to have to use AI as well to accelerate, unless we're suddenly going to double the number of ops people, which I don't see any teams doing. They're going to have to use AI. And on that premise, if we consider these agents more like junior engineers than senior engineers, in terms of their overall intelligence and accuracy, then we're going to need better guardrails, more testing, and more rules and guidelines for them to follow.

And then we're also going
to need to remediate faster.

A lot of times when people talk about remediation of failures, or recovery from failures, at least in the Kubernetes world that I live in, they're talking about being able to detect failures in production. Where I live is more in the CI world, and that remediation is more important to me. And I feel like that hasn't been clearly unlocked. It feels like tools like Mendral are the way forward in that regard: they're going to help me recover from failures faster. And, you know, this is all in Git. This entire conversation is about Git. Git is a protocol that allows me to undo mistakes.

And in DevOps, especially in ops, we get so apprehensive about change. We're constantly fighting ourselves: one side of our brain wants to not change anything, because it works right now and we're just fine with it. The other half of our brain is like, this all needs to be better. It could be so much better. Let me fix things, let me improve things. And that tension is just naturally in our brains. And this feels like a way for me to go faster.

Also, understanding that maybe it's not going to get it correct 100 percent of the time; when it fails, it's also going to elevate the failures. It's going to find the failures and fix them faster, so I can go faster. If that means the outage on a test failure is 10 minutes, fine: we broke a test, we fixed it within 15 or 20 minutes, nobody even noticed, we're all fine. So what's the real risk here? If this thing is really just committing PRs against my infrastructure, it doesn't feel like a huge risk, because I'm in a Git repo.

Right.

At least that's how I take it: the stakes are actually a little lower because I'm in Git.

One thing you said that's very interesting is about going faster. I've heard people telling me that the production of code has been solved, it's going to be fine. And I think it's about to be solved, honestly; we use a lot of AI ourselves. But I think there are some physics ruling the world that are not going to change, which means you can write a lot of code in parallel, but there is only one version of your code that goes to production at a given time. And you always need that, no matter what happens. And I'm sure software delivery is going to change. We're going to participate in that change. It has to change, that's for sure.

Even just a detail: when you look at how PRs are being reviewed today, you realize that maybe it's not the right paradigm. It could be done differently, possibly with different tools, actually. But anyway, it's like how we used to do human QA. Now, if you're doing human QA, you're legacy. Like, yeah, this could happen.

You still need to do it, though. It's actually a very good example, because you're right: it's automated, but you still need to do it.

And I think the same thing happens with software delivery. You need an integration loop that's going to make sure you can actually ship the one single version of your software at a given time. You don't have this constraint when you write code, thanks to Git; you can open any branch, any PR, anything. But you still need to integrate them at some point, and I think that's going to stay. So even though all the processes and the way people work are going to be different, yeah, it's actually very interesting to hear, in terms of going faster, what all the things are that are needed to go faster. It's not just about writing code.

Let me ask you real quick about a very specific subject: our friend Viktor Farcic. You might know him, Docker Captain alumnus and YouTuber, who has been doing a lot of AI videos over the last year. We had a conversation recently where he believes, and I totally agree with this statement, that the harness, our local harness, whether that's Claude Code or OpenCode or Copilot in VS Code or however you want to roll, is going to be the way forward, not just for devs, but maybe for DevOps and operators and platform engineers and SREs. This is going to be like our window to the world, and the more context we can stuff into it, the more we can give it.

I learned yesterday from another DevOps
engineer that he's making his own
skills, essentially.

He's not using skills for this particular
thing yet, but he's creating a "me" skill,
and that's how I learned about this. I'm
going to practice it in the next week and
see if it helps. There's a rising theory
that we shouldn't be telling the AI:

Hey, you need to be an expert marketer.

You need to be in this role,
you need to be an expert SRE.

But it's more important that we tell it
about us, how we work, and what we
expect, rather than tell it what it is.

And so one of these engineers in the guild
was saying that he's had a lot better

output of his LLM by describing himself.

And I think he injects it as a command
or something in his harness. I see that
as more of a skill that I just need to
load into each conversation at some
point, so that every session, the AI
knows more about me and my role and how
I want to operate.
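
As a sketch of what that kind of "me" skill could look like, here's a hypothetical example in the Claude Code SKILL.md style; the name, frontmatter, and bullets here are illustrative assumptions, not the actual skill that engineer wrote:

```markdown
---
name: about-me
description: Personal context about the user. Load this at the start of every session.
---

# About me

- Role: DevOps engineer focused on GitHub Actions and CI administration.
- I prefer small, reviewable changes; propose a plan before editing workflows.
- Assume I know Docker and Kubernetes basics; skip beginner explanations.
- When unsure about my intent, ask one clarifying question instead of guessing.
```

The point is exactly what's described above: the file tells the model who I am and how I work, rather than what role it should play.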

And so, getting back to Viktor: Viktor
feels like this harness is the thing
that eventually will know us personally better.

It will have maybe more
docs from the team.

It'll have more access to
Confluence or Jira or Notion or
whatever you might have that gives

it more context about your environment.

So thus, that seems to be the best
way forward for interacting with all

of our systems, not just our code.

That's the theory he has.

I think this feels like a
pattern, a thing that I want.

Do you see Mendral as being
somewhere other than in Slack?

What is the primary chat interface?

You know, obviously there's
this great dashboard.

Probably going to always
have the dashboard.

I don't gravitate to Slack right now.

Like, I'm not someone who jumps into
Slack to have a conversation with AI.

I always think about
it being in my harness.

do you see it being like an MCP or an A2A?

I don't really understand exactly
how agents talk to each other.

I don't actually currently
have any agents of my own.

What does that future
look like for Mendral?

So definitely it's a very interesting
topic, and the short answer is yes.

I do also have a single interface, and I
constantly tweak it and improve

it and customize it to my profile,
my needs, my skills, all of that.

And so, yes.

And today Mendral has a lot of
knowledge, and it's building

fairly large context at a given time
about the state of your software delivery.

And we started to get people asking us,
Hey, can I use that knowledge locally?

Because we are actually doing
a lot of things locally.

You know, I mentioned software
delivery is gonna change.

And I think one of the biggest
changes that's gonna happen in CI/CD is

a lot more things will be done locally
on the machine, that's for sure.

And so, you know, starting with code
reviews, like what Anthropic released,

there are more and more people doing
reviews locally before changes land, and

I think that's only the beginning.
Eventually, we can expect the code that

lands on your CI to be more and more
polished, because there is a lot more

thinking and calls and back and
forth happening on your local machine.

And so, yes, indeed, we want
Mendral to be available locally,

so it can be integrated entirely with
the way you work already, and also

bring this knowledge before code lands
on CI. It kind of sucks today that

there is this big surprise that happens
when you have to kick off a CI runner

to verify certain things that you
cannot verify otherwise.

I think that needs to change.

And so yes, in terms of integration...

Right? Like, this is why you were working
on Dagger for seven years, yeah.

Exactly.

That's exactly right.

And in terms of implementation, you
mentioned MCP and A2A. I think where

most people are moving right now is
toward having really good, well

documented CLIs, 'cause, again, LLMs
actually perform like the human brain:

not at the same level, but very
similar in the way they think.

And so, do you think your software
engineer would behave better with an

API and an MCP server, or with a great
CLI? When you look at Claude and skills, it

actually works much better by using CLIs.

And so I think MCP is good for certain
things, but I see it

exactly the same as an API.
API, MCP, same thing.

It actually works better usually when
you have remote MCP servers, but a remote

API or a remote MCP server, at the end
of the day, it's not very different.

And so I think the best way to
integrate with some of those APIs is

to have a really good CLI that your
harness, as you said, can integrate with.

And so that's what we're planning
to do eventually: have a CLI

that gives you all the capabilities you
could get on Slack or the dashboard.

Those are just front ends to the
engine that we run on the backend.

So yeah.

Short answer, yes, and I'm glad you
asked, because I think it shows also

that you're pretty advanced with
your own harness, because not a lot

of people are like that today.

You know, sometimes it feels, when
we're talking to each other, we're

like, yeah, of course it's obvious.

You know, you need to improve your
skills, your customization, your profile.

And I do that too.

It's just that so many people out
there who are still figuring out what

they should do with AI, you know,

Well, and the reality is that six
months ago I knew none of this, right?

We didn't even have skills
until six months ago.

Like there's just, yeah, there is so much.

It's the only reason, I think, that any
team that is at our sort of level of

maturity or beyond is getting
productive: because the AI is

doing the work.

We're so consumed with
having to learn patterns.

I'm reading constantly.

I've had to adopt Readwise Reader in
the last year as a critical part of my

learning workflow where I just dump every
tweet, every blog post, every YouTube

that I think is interesting around AI
that I think I probably should consume.

I throw it in Reader. Shout out to
Readwise Reader: Readwise is the

company, Reader is the app where
you can consume all these

different types of media in one place.

And I can log, I can have it summarized.

I can tag it, I can do all
these great things with it.

It's kind of, to me, like the graduation
of the old Feedly or Google Reader

or the RSS readers that we used to have.

But it's kind of becoming like my podcast
and YouTube player at this point too.

Instead of using the algorithms, I
just dump things into it that I think

are interesting from the algorithm.

And then when I want to be
focused and learn, I go there.

So that's like a hack for me to
keep up, because, well,

nobody's keeping up.

Like this is a crazy time.

it's insanity.

no one can actually know it all.

No one's an expert yet.

So it's exciting to see that. Because,
like, do you know specifically how we

would implement that, in terms of me
having Claude Code in front of me?

How would that talk to
Mendral, like getting into

your architecture for a second?

Is that an A2A thing? I'm not
smart enough to know.

Is A2A the thing that would allow my
agent to somehow talk to, like, an API

on your system that has an agent?

Do you know anything about this stuff

yet?

So, what I mean by a CLI: I think a
good example of that would be to look

at the difference when you use either
OpenCode or Claude Code, or even Cursor.

Like, look at the difference between
interacting with GitHub with, on one

side, the GitHub MCP server, which is
actually quite good,

and on the other side, the gh CLI.

I don't know if you tried both
actually, but, I invite you to do it.

you'll see that there is a huge
difference between the two.

Basically, the TL;DR is one works really
well and the other is a bit clunky.

I'll let you guess which one.

Well, the CLI is great because when you
look at the agent session and how

it navigates the CLI, it basically
reads the help. It requires

your CLI to have really good error
messages so the agent can actually

react to them. Basically, it's the same
thing as for you: your CLI needs

to be intuitive so a human can use it.

If a human can use it well without
much documentation, it means an LLM

will use it pretty well, because it
can read those messages and can

interact with the CLI. And so, yeah,
we've had a lot more success with

really good CLIs when we integrate
with services than anything else.
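
As a toy illustration of that idea, here's a minimal Python CLI sketch where every argument is documented in --help and failures come back with a recovery hint, which is what lets an agent read its way through the tool; the ci-status name and flags are invented for this example, not Mendral's actual CLI:

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Every argument gets help text, so an agent running `--help` learns the tool.
    parser = argparse.ArgumentParser(
        prog="ci-status",
        description="Show the latest CI run status for a repository.",
    )
    parser.add_argument("repo", help="repository in 'owner/name' form, e.g. acme/api")
    parser.add_argument("--branch", default="main", help="branch to inspect (default: main)")
    return parser


def run(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    if "/" not in args.repo:
        # An actionable error message tells the agent exactly how to recover.
        raise SystemExit(
            f"error: repo '{args.repo}' is not in 'owner/name' form; "
            "try: ci-status acme/api --branch main"
        )
    return f"checking {args.repo}@{args.branch}"


if __name__ == "__main__":
    print(run(["acme/api", "--branch", "main"]))
```

An agent that runs ci-status --help, or hits the malformed-repo error, gets enough text back to correct itself on the next attempt, the same way a human would.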

And then it's obviously your job, I
mean our job for Mendral, to make it

work well with our API and the CLI.
But for both the user and the agent,

your local agent, it doesn't
matter, as long as

the CLI works well, is able to expose
the right context, is able to grab the

right input and output, and has
the right integrations.

All of that is obviously complex,
but it's the problem of the person

building the CLI and the API behind it.

So, yeah, I wouldn't think too much
about the best way to integrate. You

can actually write a skill that says,
hey, you have the CLI X, and here is

what the CLI is for, plus some context.

That's it.

The integration is done, and it will work
much, much better than any MCP server.

Yeah, I just realized, while you're
saying that, maybe a question I should

start asking the products that come on
this Agentic DevOps podcast is: are

you prepared for the AI to sign up and
use your tool instead of the human?

Exactly.

That's like, now we have Stripe doing
this, we have people booking their flights

with AI, and OpenClaw is doing all of this.

And I'm wondering, for companies that
are built in the AI era and are

actually AI centric, right,

are they also thinking, well, let's see
how far Claude Code can get just

signing up and using the tool,
implementing our preferences, since

presumably your local harness knows
more about you and your infrastructure

than Mendral does, on day one at least.

Is that a usage scenario that you're considering?

Yeah.

And I think, you know, there
are even startups right now.

Like, I think there were a few in our
YC batch that are specialized in giving

access to all the services out there,
like booking a flight or anything,

giving agent harnesses access to them,

so, for instance, creating a CLI or an
MCP server to interact with some of the

services that today are only accessible
through a dashboard and some clunky UX.

So yeah, I think the web of tomorrow is
gonna be agentic, and it's very

interesting, because it's not so
important anymore to make an interface

that's really beautiful for humans; it's
more important that it's ergonomic for an agent.

so yeah, definitely.

Right now, for us specifically, Mendral
is giving you a turnkey solution

in a few clicks, but I realize that
integrations with other agents are

gonna be key moving forward, because
what everyone wants is not a single

agent, it's actually a team of agents.

So even when you use Claude Code, you
are actually already using several agents

underneath, locally on your machine.

And when you call remote services,
you already have a team of agents, with

some agents running on your machine and
some other agents running remotely.

And I think that's the future.

That's what people want.

Yeah.

All right, we're going to do
some rapid fires real quick, but

before that, I think I have one more
question on the future of Mendral.

Like we've been talking about GitHub.

Not everybody that listens to this
podcast is maybe using GitHub.

We've even got some people... I'm
hearing news recently that some teams

are leaving GitHub for various reasons.

Where do you see Mendral going for the
rest of 2026?

Do you have plans for other tooling,
other platforms, anything like that?

We're already working on some of it.

Yeah, we started to work with bigger
companies lately and realized

that some of them have different CI
needs. You know, we got

questions about CircleCI, Buildkite,
so yeah, we're gonna support them.

We talked a lot about GitHub because,
based on what we see from people,

it's probably 80% of the demand, at least.

So GitHub is big still, but I do agree
that it's going to disappear over time,

or at least reduce. You know, for me,
GitHub is more like a protocol nowadays,

because you look at people bringing in
a GitHub app that replaces GitHub

Actions with another CI system, or even
replacing the GitHub Actions runners,

or bringing another review tool on top.

Some people migrated to Linear
instead of GitHub Issues.

So yeah, everyone is grabbing a
piece of GitHub and integrating

with GitHub because they have to.

So it became more like an integration
protocol than anything else.

And so, we have to follow that.

And so we made Mendral CI
agnostic by design, from an

architecture point of view.

we haven't built all the integrations
yet because there is still a high demand

for GitHub Actions, but it's gonna come.

Yeah, I secretly wish that they
would just open source the

non open source parts, essentially
the API for GitHub Actions.

That way we could have local runners;
we could do this all locally.

You know, I feel like at some
point the automation

engine behind GitHub Actions is
just going to be a commodity.

And I'm looking forward to that future
because we do have all these rough edges.

It does feel like a very
low level, raw tool to me.

but I, you know, I absolutely love it.

you know, I use it every day.

I make money by selling courses on
it and stuff, so I obviously love it.

It's just that there's a lot there
that could be improved, and, you know,

they would need, like, another
hundred engineers on that Actions

team to move at the pace that I
think it needs right now in AI.

I mean, you talk about this as
like, we need these other tools.

There's like a future
with these other things.

And even GitHub's own former CEO has
left and started a new company, funded

I think $60 million by Microsoft, or
at least partly by Microsoft, to help

solve this agentic coding problem.

They sound like they're going to be a
layer on top of GitHub, which, again,

makes me feel like GitHub is becoming
like a cloud provider in a sense, just

for code. They already have been, but
they're just going to be this

thing that we maybe don't touch that
much. Lots of people will be using

things on top of it but won't ever
actually have to go there, because the

AI is the one submitting. You know, I
don't type Git commands anymore, right?

There's so much I don't do, and the
only reason I think I'm even still

going to GitHub is habit. Like, I got
a lot of bad old habits to break.

I've got a lot of things that I should
be asking my local agent to do, or look

at, or go find out, that I'm manually
going and doing, and I don't know why.

It feels like I need to sometimes just
break myself of the habit and see how

far I can go in the day without ever
actually going to the GitHub website.

Um, because you're right, like the
GitHub CLI is doing a lot more.

And now, I mean, even now we have
a Gmail CLI. Google

Workspace launched a CLI recently,

so you can access all of these Workspace
tools, Google Docs, Gmail, Google

Drive, and all that stuff from a CLI.

So yeah, that, that's awesome.

The CLI future, I feel, is
strong, and I'm here for all of that,

'cause one of my favorite things to
use Go for is to make CLIs

for solving my own little problems.

And I've already got, like most of us,
I think, probably a half dozen

local projects where I'm just making
CLIs for my own use, to feed back

to the AI to do things for me.

So it's just a fun time.

We could talk forever.

I love talking to you about this platform.

But a couple of rapid
fires for the audience.

I think I'm gonna need to start
putting these in my show, just to start

asking the engineers that are on the show.

Current favorite harness?

Oh, I use Claude Code a lot, actually,

for even more than code.

Yeah.

I've started to customize it
for automating some of my non

technical work too, actually.

Okay.

Are you into, like, the new Dispatch
and the computer use yet? Is that

like for personal stuff, or for
things that aren't necessarily code?

Are you leaning into some of
the cloud stuff with Claude?

So usually I use a combination
of a couple of things.

I like a lot of the Anthropic products,
and so I use Claude Code, the CLI,

locally for most of the code.

I started to automate a lot of
my boring work with Claude

Code and some integrations locally.

And so I maintain like a catalog
of personal skills that

I use to automate some of my job.

The problem with Claude Code is
that it's very specific to code.

So I'm aware of that, and sometimes I
go to Claude Desktop, which I think has

a way of managing the context
that's entirely different.

The way it manages memory and
context is very different.

So I like Claude Desktop.

I do not like Claude Cowork, actually.

That's one of the things that people
have started talking about, and,

like, I do not like it, because I
think it's not as advanced as Claude

Code in terms of context management
and multi-agents and all of that.

I think it will get there eventually.

Yeah,

And then lately, and this one might
make you laugh, but it's kind of weird,

I started to use Claude Code in the iOS app.

The Claude app has a Code
tab or something.

You have kind of a tiny version
of Claude Code inside your

mobile, and it's using sandboxes.

And so I started to use that to make
some very simple PRs on the repo.

I use that only for very simple stuff,
you know, like, oh, I need to update

that on the landing page or something.

I do that from my phone.

And that's kind of scary, because it
works quite well for simple stuff.

So it's almost like a preview of what
we're going to be able to do tomorrow,

you know, like you almost just talk to
your phone or something, and work

happens and code gets pushed.

yeah.

so,

I feel like we're so close to, you know,
the Tony Stark Iron Man Jarvis moment.

I feel like we're getting so
close, at least for developers.

Like, I feel like we're the
first wave of the people

that are onboarding with this.

Last night, for the third time in two
weeks, I went and saw the movie Project

Hail Mary, because I read the book.

My wife and I loved it.

We're big Andy Weir fans, and this
new movie is amazing and perfect,

and Ryan Gosling is fantastic.

And we sat next to some people that
were really, really into it.

And so we started talking about the
fact that the movie theater was full.

I'm the kind of person that
loves a full movie theater.

I think the reason you go to movies is
for the experience with other people.

So the guy next to me leans over at the
end, and we started talking about the

movie and how it's a positive

view of the future, where the world's
actually collaborating and working

together to solve esoteric problems.

And he said that there's this
subreddit he's a fan of called,

I think it's called Humanity,

Hell Yeah.

Or Humans, Hell Yeah.

Or something like that.

That's like a post-war look at the
future of civilization where we all

tend to agree, and we're solving bigger
problems, we're dealing with aliens or

whatever, and then we all just hold hands.

And so we start talking back and forth,
and I get home, and I'm walking the dog,

and I want to know more about this subreddit.

So I'm in the OpenAI app,
the ChatGPT app essentially.

And I'm having a spoken
conversation in the audio mode.
And I'm just talking back and forth with
the AI and I'm living in a little bit of

an urban area, so there's other people
walking their dogs and I'm realizing that

people around me are hearing someone.

I sound like the old guy walking around
talking to someone on speakerphone. But I'm
actually just talking to an AI about a

subreddit and sci fi, and I'm realizing
that I'm walking my dog and I'm way

more advanced than everyone around me.

I'm, like, doing sci-fi-level, future
things, where I'm talking to a

robot, but I also look like grandpa
talking to someone on a phone, on

speakerphone, way too loud.

It's a really weird scenario.

So I feel the same way.

Like, I think these things are all
happening so fast, and it's very cool.

Um, okay.

last question.

I'm assuming you're using Opus a lot.
Are you an Opus 4.6 person, or are you
an Opus 4.5 person?

I know some people that didn't jump to 4.6.

I like Opus 4.6.

But I'm actually not using it for everything.

I'm a big believer in the right
model for the right task.

Mendral also is built this way.

We use Anthropic models today, and
we're going to move to multi-model,
multi-provider, soon too.

I believe there are models that are
really good for certain things.

Opus is really good for complex
reasoning, but it's also very slow.

And sometimes you don't need that,
like if you're summarizing an email or

something, or if you want to
rewrite some things that you wrote.

so yeah.

But I'd say if I have to pick one,
yes, that would be my go-to. But,

yeah, usually I go multi-model
based on what I'm doing.

Nice.

thank you so much, Sam, for being here.

We could talk for another hour.

People are already going to go, Bret, like
you can't keep having these multi hour

podcast episodes, but I feel like there's
so much to talk about and it's great to

actually talk to not just founders, but
engineers in the thick of it that are

also living and drinking, you know,
the Silicon Valley Kool-Aid a little bit.

It's great to see the different
levels, or I guess maturity levels, of

everyone that we have on the show,
and to see where everyone is at.

And it's always fun to have people
on the show who are ahead of us.

And I feel like, you know, you're
touching the AI even closer to what we

think is the utopia of this Star Trek
future that I feel like we're in.

So I'm excited that you guys are
progressing so quickly on the

product and I'm looking forward to
using more of it, especially since

it scratches an itch for me.

I'm looking forward to having you back
on the show maybe later this year and

talking through some of the advancements.

We'll probably have new models by then.

Supposedly Claude's going to have this
amazing new model later this year.

We'll see whether it lives up to
the hype, but it'll be interesting

to see what you can do with some
of the new stuff that's coming out.

Yeah, thanks a lot for the opportunity
to share all of that. Very, very cool.

It didn't feel like any time at all.

I'm glad.

I'm glad.

Uh, so yeah, you can find Mendral
at mendral.com, M-E-N-D-R-A-L dot com.

well thanks again, man.

Thanks for joining us, and I'll
see you in the next episode.


Creators and Guests

Bret Fisher
Host
Bret Fisher
Cloud native DevOps Dude. Course creator, YouTuber, Podcaster. Docker Captain and CNCF Ambassador. People person who spends too much time in front of a computer.
Beth Fisher
Producer
Beth Fisher
Producer of the DevOps and Docker Talk and Agentic DevOps podcasts. Assistant producer on Bret Fisher Live show on YouTube. Business and proposal writer by trade.
Cristi Cotovan
Editor
Cristi Cotovan
Video editor and educational content producer. Descript, Camtasia and Riverside coach.
Sam Alba
Guest
Sam Alba
AI DevOps Engineer | Co-founder @ Mendral (YC W26) | ex-Docker (1st hire, VP Eng)