You’re midway through a pentesting engagement. Recon’s wrapped, and a couple of privilege escalation paths have already failed. You flip over to ChatGPT hoping for something useful, but it offers the usual: SUID binaries, kernel exploits, and weak folder permissions. It doesn’t know your host, the tools you've used, or what phase of the operation you're in—and that’s the real problem.
We started tinkering with a question: what would it take to make an assistant that thinks more like an operator under pressure? One that tracks what’s actually happening in your shell without having to copy and paste over and over again. It watches the flow of your session, reasons over what you’ve already done, and suggests next steps that are grounded in your actual operation, not pulled from some generic playbook.
This write-up shares what we’ve learned so far, what didn’t work, and where we think things could go. Would love feedback from folks building or breaking in the same space.