I am a big fan of Marimo and was trying to use it as my agent’s “REPL” a while back, because it’s naturally so good at describing its own current state and structure. It made me think that it would make a better state-preserving environment for the agent to work. I’m very excited to play with this.
I keep getting hung up on securely storing and using secrets with CLI vs MCP. With MCP, you can run the server before you run the agent, so the agent never even has the keys in its environment. That way, if the agent decides to install the wrong npm package that auto-dumps every secret it can find, you are less likely to have it sitting around. I haven’t figured out a good way to guarantee that with CLIs.
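One way to approximate that guarantee for CLI-launched agents is to scrub the environment before spawning the agent process. A minimal sketch (the prefix list and the agent command are assumptions about your setup, not anything standard):

```python
import os
import subprocess

# Assumption: these prefixes cover the secret-bearing variables in your shell.
SENSITIVE_PREFIXES = ("AWS_", "GITHUB_", "OPENAI_")

def scrubbed_env():
    """Copy the current environment, dropping anything that looks like a secret."""
    return {
        k: v
        for k, v in os.environ.items()
        if not k.startswith(SENSITIVE_PREFIXES) and "TOKEN" not in k
    }

# The agent process starts with no secrets in its environment, so a rogue
# npm install script running inside it has nothing to dump.
# subprocess.run(["my-agent-cli"], env=scrubbed_env())  # hypothetical CLI name
```

This only covers environment variables, of course; it does nothing about secrets the agent can read from disk.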
A CLI can just be an RPC call to a daemon; the exact same pattern applies. In fact, my most important CLI-based skills work like this. A CLI by itself is limited in usefulness.
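The CLI-as-RPC shape can be sketched in a few lines: a long-running daemon holds the credential and does the privileged work, and the CLI the agent invokes is just a thin client over a local socket. Everything here (socket path, credential, the "work" itself) is a toy stand-in:

```python
import json
import os
import socket
import threading

SOCKET_PATH = "/tmp/secrets-daemon.sock"  # hypothetical location

def daemon(server_sock):
    """The long-running side: loads the credential once, does privileged work."""
    creds = "s3cr3t"  # in reality: read from secure storage at daemon startup
    conn, _ = server_sock.accept()
    request = json.loads(conn.recv(4096))
    # Privileged work happens here; only the result crosses the socket.
    result = f"did {request['action']} (authenticated: {bool(creds)})"
    conn.sendall(json.dumps({"result": result}).encode())
    conn.close()

def start_daemon():
    """Bind a fresh socket and serve one request in the background."""
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCKET_PATH)
    srv.listen(1)
    threading.Thread(target=daemon, args=(srv,), daemon=True).start()

def cli(action):
    """The thin CLI side: one RPC call; no secret ever enters this process."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCKET_PATH)
        s.sendall(json.dumps({"action": action}).encode())
        return json.loads(s.recv(4096))["result"]
```

The point is that the process the agent can inspect (the CLI) holds nothing worth stealing; the daemon runs under a different lifetime and, ideally, a different user.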
> what stops the agent from echoing the secure storage?
The fact that it doesn't see it and cannot access it.
Here is how this works, highly simplified:
def tool_for_privileged_stuff(context: comes_from_agent):
    creds = _access_secret_storage(framework.config.storage_location)
    response = do_privileged_stuff(context.what_agent_needs, creds)
    return response  # the agent will get this, which is a string
This, in a much more complex form, runs in my framework. The agent gets told that this tool exists. It gets told that it can do privileged work for it. It gets told how `context` needs to be shaped. (when I say "it gets told", I mean the tool describes itself to the agent, I don't have to write this manually ofc.)
The agent never accesses the secrets storage. The tool does. The tool then uses the secret to do whatever privileged work needs doing. The secret never leaves the tool and is never communicated back to the agent. The agent also doesn't need to give the tool a secret to use, and indeed cannot.
And the "privileged work" the tool CAN invoke, does not include talking to the secrets storage on behalf of the agent.
All the info, and indeed the ability to talk to the secrets storage, belongs to the framework the tool runs in. The agent cannot access it.
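The "tool describes itself" part can be sketched as deriving a schema from the tool's signature. This is a toy version of the idea, not the framework described above; the names are made up:

```python
import inspect
from dataclasses import dataclass, fields

@dataclass
class Context:
    """The only thing the agent supplies -- note there is no credential field."""
    what_agent_needs: str

def describe_tool(fn, context_type):
    """Auto-generate the description the agent sees; secrets never appear in it."""
    return {
        "name": fn.__name__,
        "doc": inspect.getdoc(fn),
        "context_shape": {f.name: f.type.__name__ for f in fields(context_type)},
    }

def tool_for_privileged_stuff(context: Context) -> str:
    """Does privileged work on the agent's behalf."""
    ...

# describe_tool(tool_for_privileged_stuff, Context) is what gets sent to the
# agent: the tool's name, its docstring, and the shape of `context` -- nothing
# about where secrets live or how they are fetched.
```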
If the tool fails for some reason, couldn't an overly eager agent attempt to fix what's blocking it by digging into the tool (e.g. attaching a debugger or reading its memory)? I think the distinction here is that skill+tool has a weaker security posture, since it inherently runs in the same namespaces as the agent, whereas MCP can impose additional security boundaries.
I keep wondering this too. It feels like such a self fulfilling prophecy: don’t build new power plants. Don’t build nuclear. Get mad when the grid can’t keep up…it’s defeatist and anti-growth-of-any-sort through a different lens.
To be fair, electricity consumption was mostly flat for decades; there was no need to massively ramp up new generation or distribution. It is only in the last few years that mega-consumers have come online at a scale that requires new development at a frantic pace.
Not true. Electric vehicles have been threatening to collapse residential grids for quite a few years now. The US hasn't been making the necessary infrastructure investments for a long time. See PG&E for example.
For something the size of the electrical grid, you can find regional variations, but the national trend is quite clear. From one report found in a quick search[0]:
> Consumption Growth Acceleration: After 14 years of near-stagnant growth (0.1% annually from 2008-2021), US electricity consumption surged 3.0% in 2024, driven by data centers, electric vehicles, and economic recovery, signaling a new era of demand growth.
I mean one has to also consider the current political _and_ geopolitical landscape now when it comes to energy needs. And given the current outlook and environments even states are now operating in with federal overreach shutting down offshore wind farm efforts and more, it's not hard to do the calculus that lands you squarely in this reality:
- most grids can't sustain the AI energy demands at the moment
- literally no one could tell you if scaling up with clean/renewable energy sources to meet demand is even going to get greenlit right now. it is straight up gambling to try and give a black and white answer to it.
So to a large degree I absolutely understand why a state might pump the brakes. This is increased pressure on a limited resource that is squeezing _the people's_ economic circumstances. Pump the brakes, because no one is talking about how to greenlight it and scale up the right way so it doesn't result in even more financial uncertainty for people who are already financially uncertain. It's absolutely not something I would want to give the go-ahead on without guarantees that renewable energy is going to be the backbone of the increased energy demand.
I appreciate the response, but I don’t think you realize what people are upset about.
This is a security issue, not just a privacy issue.
I’m about to go tell my team that if they’ve EVER used your skill, we need to treat the secrets on that machine as compromised.
Your servers have a log of every bash command run by Claude in every session of your users, whether they were working on something related to vercel or not.
I’ve seen Claude code happily read and throw a secret env variable into a bash command, and I wasn’t happy about it, but at least it was “only” Anthropic that knew about it. But now it sounds like Vercel telemetry servers might know about it too.
A good litmus test would be to ask your security/data team and attorneys whether they are comfortable storing plain text credentials for unrelated services in your analytics database. They will probably look afraid before you get to the part where you clarify that the users in question didn’t consent to it, didn’t know about it, and might not even be your customer.
Well said!
We built protections into multi-user and single-user systems, but now we seem to be relearning them: your agent is not “you” and should probably not run as the same user with the same default permissions as “you”.
Just came to say (since the person you’re responding to has a different view of the world) that I agree with you that this is both a more accurate, and easier way to live.
Assuming malice as the default sounds like a recipe for being very, very unhappy.
I’ve been researching the “best” way to build a little outbound network proxy that replaces credential placeholders with the real secrets. Since this is designed to secure agent workloads, I figured I might as well add domain blocking and other outbound network controls, so I’ve been looking for Little Snitch-like apps to build on.
I’ve been surprised to find that there aren’t a ton of open source tools that filter and potentially block all outbound connections according to rules. This seems like the sort of thing that would be in a lot of Linux admins’ toolkits, but I guess not! I appreciate these guys building and releasing this.
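The placeholder-substitution step itself is small; the proxy plumbing around it is the hard part. A rough sketch of the rewrite, with an assumed placeholder syntax and an in-memory secret store standing in for real secure storage:

```python
import re

# Assumed placeholder syntax, e.g. {{SECRET:GITHUB_TOKEN}} -- the agent only
# ever sees this token, never the real value.
PLACEHOLDER = re.compile(r"\{\{SECRET:([A-Z0-9_]+)\}\}")

def resolve_placeholders(headers: dict, secrets: dict) -> dict:
    """Swap credential placeholders in outbound request headers for real values.

    Runs inside the proxy, just before the request leaves the machine; the
    agent's process never holds the resolved secret.
    """
    def sub(value: str) -> str:
        return PLACEHOLDER.sub(lambda m: secrets[m.group(1)], value)
    return {k: sub(v) for k, v in headers.items()}
```

A domain allowlist check would slot in right next to this, on the same per-request hook.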
Something almost no firewalls get right is pausing connections (NOT rejecting them) until I've decided whether to allow or not. The only firewalls I've seen do this are Little Snitch for Mac, and Portmaster for Windows (before they made it adware / started locking existing local features behind the subscription).
Firewalls don't do this because they are built at the wrong layer to do proper pending calls. It's too narrow of a design space for most firewalls to care.
True, most firewalls aren't built to pause for user input. But then again, that's why almost no firewall software is suitable for this user experience.
I use Portmaster (on Linux) and I have never seen ads (either in the app or apps that get their DNS from Portmaster) on it. About the only thing I saw different between the free version and the base level paid for version was traffic history and weekly reports (and badges on Discord if that's your kind of thing).
Both used to be free. And you may not consider it advertising when unavailable features exist in the free UI just to tell you they're paid, but I do. Especially when they used to be free.
OpenSnitch seems to do this just fine? Unless I’m misunderstanding your point. Connections seem to just block until I take an action on the dialog. Now, if an application itself has specified a short timeout (looking at you, NodeJS-based stuff), that obviously doesn’t help. But for most software it works great.
This is indeed seriously impressive. I keep wanting to keep my entire knowledgebase on a canvas so that I can "think" or navigate spatially. This is neat.
In the main landing page, as I was clicking around, I kept wishing to have a legend to show me either "how deep I am" or "how do I get out of here?", and like someone else commented, I would love an affordance showing me what was clickable/zoomable.
Wow, forking memory along with disk space this quickly is fascinating! That's something that I haven't seen from your competitors.
If the machine can fork itself, it could allow for some really neat auto-forking workflows where you fuzz the UI testing of a website by forking at every decision point. I forget the name of the recent model that used only video as its latent space to control computers and cars, but they had an impressive demo where they fuzzed a bank interface by doing this, and it ended up with an impressive number of permutations of reachable UI states.