In my 1980s version of Empire, all the global variables were kept contiguously in one source file. To save/restore the game, it just took the address of the first one, the address of the last one, and blitted it to a disk file, and blitted it back.
Very fast & easy.
Of course, it broke when COMDATs were introduced.
I did a similar thing with my text editor. The colors were configurable. The usual way was to have a configuration file, which the editor would read upon startup. But floppy disk systems were unbearably slow. So what I did was take the address of the configuration data in the data segment. I'd work backwards to where those bytes were in the EXE file, and patch the EXE file. This worked great!
Until the advent of virus scanners, which broke that. Virus scanners hated self-modifying EXE files.
I love these little "factoid" comments that you hand out from time to time. What would be even better is a book with all the nitty-gritty details of the software you've written, along with all the misadventures in doing so.
You can name it: 'Bright Moments'
I'd be happy to place an advanced order if it leans you in that direction!
See also: Emacs's old unexec() function. How do you speed up a bunch of standard-library Lisp loading and initialization? Just do it once, then write a new exe out with all your state pre-computed. Genius.
I've pre-packed game asset data onto the end of EXE files using this trick, and it seems that newer neural-network virus scanners are the only type of scanner to pick it up. So we may lose the ability to do that soon, once these types of scanners become more popular/prevalent.
This reminded me of the old days working in Windows 3.1 and my first professional project was to write a SOCKS client that could be loaded up and intercept all calls to Winsock's connect() function. It needed to do this without modifying the other programs and it had to happen at the DLL level and not the VxD layer where our IP stack ran.
Turns out there was an undocumented Windows API function along the lines of "AliasCsToDsRegister" or something like that - I've tried to find a reference to it but I can't find it. It allowed me to write into the code segment (the CS was global and read-only, as it was shared among all processes) and replace the first few bytes of the connect function with a jump to my code, which would then put the original bytes back, make the call to the SOCKS server, do some other magic, put my jump hook back in, and return to the caller. Good times!
Kind of surprised I remember this and more so that it actually worked.
Yeah win16 had AllocCStoDS and AllocDStoCS, one was documented the other wasn't. They also had the fabulously named PrestoChangoSelector which toggled the code/data bit in the descriptor table
IIRC there was a documented ChangeSelector function that was actually not implemented, so you had to use the undocumented PrestoChangoSelector instead. Visual Basic used it to implement a direct threaded code interpreter.
Some other functions had similarly great names:
>>Anyway, I am not sure what type of person writes a function called "BozosLiveHere" and puts it into USER.EXE
>This started out life with a non-bozo name as an undocumented function in Windows 3.0.
>Windows 3.1 removed the undocumented function, but we found that some programs were using the undocumented function and started crashing.
>So we reluctantly put the function back, but changed its name to "BozosLiveHere" so that nobody else would use it in the future.
>A similar story exists for "TabTheTextOutForWimps".
I remember doing exactly this to get code injection working on GNU/Linux systems! I made a library injection library in college for some coursework, which involved copying a C function into a code cave in a remote process and getting the remote process to execute it and return. It only works because it's so bare-bones: it doesn't use try/catch, calls to other functions are possible because the function pointers are passed in through registers, and the compiled code is small enough to fit in a single page.
Technically the code being uploaded is compiled from C++ as the rest of the project is written in C++. Kind of similar to the article but in my case all I'm doing is calling a bunch of C library functions from the shellcode.
As you are well aware, calling setjmp and longjmp doesn't make code non-position-independent in the way that Raymond is talking about, because setjmp saves the return address with which it was actually called. It doesn't rely on PC tables in the executable the way that (a common implementation of) C++ exception handling does. I mention this not to inform you of it (you already understand it better than I do) but to keep anyone else reading the thread from being misled.
This was common back in the day with ezines. They would include codes to exploit security bugs in common software but they would have intentional errors so script kiddies couldn't compile them.
Exactly the idea.
If you're a skiddie, you'll copy/paste, look for a minute, say it doesn't work.
If you dive in, you gotta learn about the language, debugging it, the exploit itself and suddenly you're not a script kiddy anymore. It's like a leap of faith.
Really? Compiler error messages are downright friendly compared to networking tools and concepts. When I was 12, I certainly found them to be much easier, even though I completely failed to learn C, Perl, C++, or any of the other popular languages of the time.
Raymond Chen is a great writer. He writes about the driest topics on earth, but still manages to make it entertaining. Definitely one of my favorite blogs.
And he seems to be one of the few who bothers to keep up writing at MS devblogs.
I wonder how it looks internally. Like, can anyone write there? Apart from a university I worked at, management would never let me write any "blogs" about work-related stuff.
I once replied to a flippant comment about the inherent reliability of the cloud with a two-line PowerShell script that would end any 100% cloud-hosted business if executed in the context of an admin account. As in: pack up everything, turn the lights off, lock the doors, and go home because the game is over.
I deliberately obfuscated the script by replacing characters with various Unicode confusables, both the letters and the symbols.
I still felt bad and ended up deleting the post.
Hopefully nobody tried to type it in manually instead of cut & pasting just to see what would happen...
There are cloud-era equivalents that will do a heck of a lot more damage. Think bulk account/subscription deletion with a sprinkling of -force and -purge.
Delete locks and the like will stop this… unless you bulk delete them first.
The admins will get warning emails that they will finish reading in abject terror some time after their entire cloud tenant has gone to heaven.
The modern equivalent of “Operating system not found” is “Click the guides below to get started with your new cloud account.”
# this would wipe all EC2 instances, assuming you are authenticated and nothing else is needed to enumerate them
Get-EC2Instance | Remove-EC2Instance -Force
If there is a way to enumerate objects without explicitly specifying some properties then PS makes it very easy to pipe that output to remove or disable cmdlets.
I never worked with AWS, but if I needed to write such a killer script I would kill instances (as in the example above), remove all S3 objects (quite similar: Get-S3Object | Remove-S3Object), and finally wipe out all IAM roles and users (similar Get/Remove-IAMGroup, ..IAMUser, etc.). As for the subscription/account things @jiggawatts mentions, I'm just not familiar with the AWS lingo. And secrets/creds, of course.
By the time a human starts to investigate why the monitoring went mad (assuming it wasn't hosted with all the other infra on the same account, lol), there would be nothing to do: too many things are gone already, too many more are on the way to being purged, and even if they could be restored there would be too many pieces to tie back together into something resembling a functioning system.
> I pointed out to the customer liaison that what the customer is trying to do is very suspicious and looks like a virus. The customer liaison explained that it’s quite the opposite: The customer is a major anti-virus software vendor! The customer has important functionality in their product that they have built based on this technique of remote code injection, and they cannot afford to give it up at this point.
As an aside, whenever I set up a Windows PC for me or a family member, the first thing I do is uninstall any third-party antivirus that may have come with the computer. I have found that anti-virus software likely makes my computer more insecure by having a big attack surface, not to mention slowing it down.
Actual Chromium developers have similar opinions w.r.t. antivirus vendors.
It's almost like the extensibility necessary to make third-party security products work requires creating entirely new attack surface for those products to work.
You shouldn't trust that uninstalling the AV reliably gets rid of whatever kernel drivers and other detritus it came with. I don't know of any AV examples but plenty of video game anti-cheat software has that problem.
It's not as if the first party antivirus inspires any confidence, either. It still loses in performance to some 3rd party ones, and has had its own list of security issues.
For those who remember, its early versions (before the MS acquisition) required the Visual Basic runtime.
The current Windows Defender is not the crappy old MS AV it used to be. It's gotten drastically better, and for normal home users I always recommend it over installing any 3rd party AV.
I'm a pentester and red teamer, and yes, bypassing Defender isn't all that hard, but neither are most 3rd party AVs, and Defender does not bring in all the instability and additional attack surface. Also I think it can take advantage of newer kernel APIs that non-MS AV can't.
> The customer is a major anti-virus software vendor! The customer has important functionality in their product that they have built based on this technique of remote code injection, and they cannot afford to give it up at this point.
LoadLibrary is not guaranteed to have the same address across processes. As of Win7 you might also have a process link to kernelbase and not kernel32, so it's not even guaranteed to be in the same DLL.
However, it should be possible to use GetModuleHandleEx to find the DLL base address in a remote process, then ReadProcessMemory to implement your own remote GetProcAddress.
You do end up with modules having the same address in every process if untampered, though. This is due to the copy-on-write mechanism Windows implements internally for DLLs to save space. So while it's not guaranteed, on x64 you can be fairly certain that, because of COW, the module will have the same address.
That's not true. You're not considering different virtual addresses backed by the same pages.
Yes, the loader will create file-backed memory mappings and not redundantly store read-only parts. However, it is free to load it at a different address in each process. This can happen via ASLR, or if the mapping is already claimed by the time the module loads.
They may get the same base address repeatedly in multiple processes and work most of the time, but it's not guaranteed.
It's extremely likely for stuff from Kernel32.dll.
> That's not true. You're not considering different virtual addresses backed by the same pages.
technically I suppose, but PEs don't tend to be relocatable, so if it were mapped in at different virtual addresses it would be extremely unlikely to be backed by the same pages, as much of the just-mapped-in code would need relocs
No need to go that far. If you allow an untrusted process to write to your memory, you've already lost. One thing I haven't seen called out yet is that there is security associated with this API call - ordinary users can only call this on processes that they own.
(And as far as the terminology, these aren't "bugs" since they aren't defects in the software.)
I mean sure. If you can do this you can debug the process (how do you think debuggers are implemented?) and ofc once you're debugging a process you can introduce bugs and cause random havoc.
Linux can do this too with ptrace. And mac has something similar I'm sure, you gotta be able to debug.
Now, the target process can implement countermeasures against you, that's what anti-debug is, but it's impossible in the general case to defend yourself against a debugger with the same privilege level as you. (it is an arms race though, so sometimes it's easiest to resort to kernel-mode anti-anti-debug techniques even against pure user-mode anti-debug. If you do that you need to disable KPP, and you can't disable KPP in the supported way because the supported way is to attach a kernel debugger to the kernel, which will make it obvious that someone is debugging!)
You can allocate memory in another process on Unix too: use ptrace to make the other process call malloc (use PTRACE_SETREGS to set PC to malloc and the first argument register to the number of bytes, then intercept the return).
GDB will use this if you tell it something like `p foo("bar")`, as it needs to allocate memory for that string somewhere.
When I took a compiler class back in the early '90s, the project was to write the compiled machine code into an array, then cast the array into a function, and execute it. I and another student were doing it on 68040 NeXT workstations. One other student was doing it on a Mac, one on a VAX, and the rest on PCs (the PC students largely failed!). We were mystified why, when we tried to execute our code, it was as if it wasn't there. Took us a while to realize that the 68040 had separate instruction and data caches, and even more time (and emailing people at NeXT) to determine what the cache flush procedure was.
I wrote a "cd" replacement for cmd[1] a _long_ time ago (I only recently uploaded it to Github).
It uses exactly this technique to run a thread in cmd's process to actually change the directory. It's kept working from XP on up to Windows 11 now. I am always amazed it works, I fully expect it to go boom some day, probably with an error along the lines of "Don't do that, please".
Terrible hacks are an art form that is hugely underrated today, in the name of overengineered best-practice complexity monsters. Sometimes just doing it the stupid way is simpler than doing it properly. Especially when working with proprietary systems ...
I was competing in the Jump Trading programming competition and thought I had a pretty good implementation in AVX asm, but I was still behind one of their engineers, so I asked him after the competition. Turns out he was a Linux kernel committer and wrote a process to spawn multiple threads by copying itself, modifying the parameters, and then setting the offsets directly in the thread table, avoiding all mallocs and thread startup. So basically, his math code was just basic C loops, but his process was complete before my threads even finished allocation.
Forgive me if I got it wrong, I am definitely not a Linux kernel committer.
This reminds me of the time I wanted to run binaries compiled for SSE3 on a system that lacked SSE3. I started writing a tool to emulate this [0], and one thing it could do is rewrite the executable pages with replacement instructions if there was something that would fit (using memcpy(2), naturally).
This harkens back to the days when you could "download" a math coprocessor for your SX system, which was a TSR which likely did the same catching and handling of illegal instructions.
Memcpying and executing code can also surface micro-architectural realities of the underlying CPU and memory subsystem that may need attention from the programmer.
For example:
- On most RISCy Arm CPUs with Harvard-style split instruction and data caches, special architecture-specific actions need to be taken after the memcpy to ensure that any code still lingering in the data cache is cleaned/pushed out to the intended destination memory immediately (instead of at the next cache-line eviction).
- Any stale code that happened to be cached from the destination (either by design or coincidence) needs to be invalidated in the instruction cache.
- Depending on the CPU micro-architecture, speculative prefetching into caches (invisible to the programmer) as a result of the previous two actions may also need attention.
If using paging you may need to invalidate the TLB entry which contains execute permission for the page.
On x86 if using segments, after changing segment attributes you need to reload the segment selectors.
The execution pipeline may need to be flushed, using a serialising instruction.
When modifying code in place that may be being executed by another thread on another core at the same time, some modifications may trigger CPU errata.
On particular CPUs there may be other kinds of caches or state invalidation required, but hopefully the OS provides a "flush I-cache" function that covers all of them.
I've done stuff like this before; it works very well if you know the limitations, and I'd say that it even gives you a better understanding of how things actually work. Of course, don't bother MS or any other "official" vendor if it doesn't work, because you are on your own in debugging it.
Well, it should work in some well-controlled cases, and in those cases, if you're writing this code, you are probably an OS vendor. Examples of legit use cases: an executable loader or some bootstrapping code/bootloader.
When writing code like this, as Chen says, you are bound to the architectural rules regarding how to appropriately locate code and safely invalidate code caches etc.
So if you follow the rules and it doesn't work, then typically I figure you'd take it up with the CPU vendor.
While you could potentially do it and have a good reason to in userspace code, it should be heavily scrutinized because it's so unconventional.
When I read this quote from the customer in the story, I wasn't surprised.
I figured it was likely that code that aggressively scans and modifies other running executables would be written as a kludge, an unorthodox way of abusing the compiler-loader-runtime chain.
The client certainly should have made sure their code was truly position independent.
Also, the client should have embedded their code in the executable file name so they just have to jump to the appropriate offset in argv[0]. This way, future updates just require renaming the file!
At least one additional step which is required on some architectures is you must flush the data cache and invalidate the instruction cache at the location of the new code.
Dynamically loading code is indistinguishable from self-modifying code, and each architecture has special steps you must take in order for it to work.
The "such a bad idea" part of this is that the code being injected is written in C++ rather than assembly. The injector itself is perfectly reasonable.
really? I've not written much shellcode at all, but what I did write wasn't generically-compiled C++ - it was always either C or ASM, specifically because you get to avoid all the platform and position-dependent stuff (except in return-to-libc payloads).
You can definitely use C++, but you need to use specific compiler flags and avoid things like the STL or exceptions. Strings need to be created on the stack, a few other tricks. Then you can extract the .text section assembly of the resulting binary and inject and run it.
That makes sense, but then a C++ without exceptions, without vtables, without the STL - might as well be C, right? There's not much going on there beyond syntax sugar!
You want to use C anyhow as you want to make sure you have control over the code that is output.
For example, with the following code you know what the assembly is going to be:

int strcmp(const char* a, const char* b);

strcmp(str1, str2);

If you do the above as a template you can run into some weird issues that you may not be expecting. So, while tedious, you would need to write your own strcmp (and wcscmp for wide strings). You also have to be very careful not to pull in ANY libraries, since your code needs to be 100% independent and do the loading itself.
C++ exceptions are implemented at the OS level on Windows: C++ exceptions use SEH, and there are also VEH and unhandled-exception filters. You can use SEH from your shellcode; it's just not documented well. But sadly you have to set this up manually by having something like
SetExceptionHandler(curAddr, Handler) // where curAddr can be found by doing something like call $+5, so you remain position-independent
Yeah, it's not super helpful. Maybe slightly smaller code, stricter type checking, possibly a little faster compilation time. But not really a huge benefit over C. Sure beats writing it all in hand-coded assembly though!
Uhh yea, this is Running shellcode 101, works very well. My Red Team stuff at work all starts with a simple loader like this (with some encryption / obfuscation sprinkled in).
When I was first shown this I was like 'What non virus use case does this have!?!?'
Red teamer here too, and this was my exact thought. There's lots of legit uses for DLL injection, but straight up shellcode injection? Shady as hell. So of course it was an AV vendor...
Which is not an end-user advantage but rather a tool to better understand end-user software - equally useful for improving said software or attacking it. Assuming GP meant "virus" as in "malware" then actually this supports his point.
I actually did something like that on Windows x86, and it worked fine. Even I was surprised by that fact :)
I used it to copy out a (forgotten) password from a password input field in another program, which you cannot read remotely (for security reasons). Worked fine for that one use-case, and I haven't used this trick anywhere else since :)
I don't think I have that code around anymore, at least I can't find it now. And it's been a while. Here is what I remember:
Basically you use VirtualAllocEx() to allocate some memory in the remote process. The returned pointers are in the context of the target process.
You can access that remote memory with ReadProcessMemory() and WriteProcessMemory(), which uses those "remote" pointers to copy data to/from your process.
You can then use these memory areas to pass global handles and other stuff around.
For accessing the actual password field data, you use standard Window-Messages with SendMessage() etc.
Some Windows screen readers used, and maybe still use, the same technique to get data out of the SysListView32 common control, since the parameters and results of that control's window messages aren't marshaled by the OS.
This technique is a classic one. I remember learning about it in the late 90's after I got infected with malware that hid itself from Task Manager. That made me write my own Task Manager, too.
To this day (Win10/Win11) you can hide your program from Task Manager using this technique, and any malware that respects itself does it.
I don't know much about Windows GUI programming, but in other GUI toolkits I've used, there's usually some sort of text_field.get_current_value() function you can call. Presumably the parent injected some code that repurposes the callback of a button or something so that when you click it, it calls get_current_value() and then dumps it to console or a log file or something.
Of course all of this is very much undefined behavior in standard C and C++. Some programmers really need to learn that they program the "abstract machine" when they write C or C++.
That is probably the smallest and least interesting of the problems with this. Any method for injecting code into another process is inherently going to be platform-specific and outside the bounds of the C++ abstract machine.
The standard is pointless when what you’re trying to implement is non-portable or architecture-specific by nature. At that point, your compiler implementation and target architecture are what matters.
That's the thing, though. What is considered undefined by the standard is NOT necessarily undefined by the compiler or target architecture. Type punning via union is UB in the standard, but well defined in `gcc`. Signed overflow is UB in the standard, but well defined in `gcc` with `-fwrapv`. Casting a void pointer to a function pointer is UB in the standard, but well defined for pretty much every non-MCU target arch (dlsym relies on it, after all!). Dereferencing a runtime-known NULL pointer is UB in the standard, but is well defined to trigger a segfault in pretty much every arch with an MMU. Etcetera, etcetera, etcetera.
> Dereferencing a runtime-known NULL pointer is UB in the standard, but is well defined to trigger a segfault in pretty much every arch with an MMU.
This is incorrect for C/C++ though. Modern compilers definitely treat null dereferences as UB, with real consequences (e.g. eliminating redundant null pointer checks). The compiler is part of the architecture.
I don't think most of this is actually UB. The cast from a function pointer to `BYTE *` is, but the rest is "sound" (AFAICT) from the C abstract machine's perspective. The reason it fails is basically orthogonal to C.
From the C abstract machine's perspective it is undefined:
1. To subtract the two function pointers after casting to BYTE*.
2. To read the bytes through those pointers.
3. To cast the copied bytes back to function and invoke it.
The article only focuses on 3, and only about how it's undefined because the compiler is not required to generate position independent code. But an optimizing compiler in theory can just optimize the whole function away from looking at the very first undefined line.
The message is that yes, you need to step out of the standard to do stuff like this, but you may have to consult a lot more about your implementation defined behavior than you originally signed up for.
The reason it fails is due to UB. There's not much point in saying that most of it isn't UB since the part that is UB is the part that causes the failure.
The cast isn't the part that causes the failure (C says that function pointers don't have to be safely representable within `void *` or any other fundamental pointer type, but they are on both x86(-64) and Itanium[1]).
The failure happens because the programmer assumed that the code is "self contained" and position-independent, both of which are concepts outside of the C abstract machine.
[1]: Function pointers on Itanium are actually fat, but IIRC most compilers hide this by making the "function pointer" point to some kind of thunk instead.
This is the type of reasoning that leads people to write insecure code, by trying to outguess the compiler with implementation details that are entirely inapplicable. It suggests that if the code were self-contained, or position-independent, or satisfied any property whatsoever, then using memcpy would be a perfectly fine way to copy it. That is simply untrue: it would never be safe to use memcpy on pointers to functions under any circumstance. It's irrelevant whether you are on x86, Itanium, or any other platform; undefined behavior WILL result in incorrect and invalid program semantics and cannot be relied upon to produce consistent results even within the same execution of the same program.
There's literally decades worth of people trying to use undefined behavior in clever ways and ultimately failing and yet here we are...
I think you're reading too far into my reasoning. I'm not a huge fan of UB, and I mostly write Rust these days to get away from cheekiness like this.
My only point was that, for better or worse, UB is not the culprit in this code. C could have well-defined abstract semantics for copying functions or aliasing function pointers through datatype pointers, and this code would still be platform dependent and would still break on different hosts.
This is also untrue, if the standard did mandate that pointers to functions could participate in a memcpy and the behavior was as if the instructions of that function were treated as an array of chars, then C++ compilers would be forced to accommodate that implementation regardless of what platform it ran on, Itanium, x86, even a PDP-11 it wouldn't matter. For example it might mean that implementations must tag or otherwise store a dictionary to keep track of addresses to functions so as to differentiate pointers to functions from pointers to objects and produce whatever appropriate behavior is needed to copy said instructions. An implementation could keep an intermediate representation of any function that can potentially be memcpy'd and then translate that representation at runtime when it participates in a memcpy. Whatever the case may be, if the standard mandates certain behavior then an implementation is required to respect it. While not entirely the same, C++ does do something along these lines in certain situations involving pointers to virtual member functions involved in a diamond inheritance structure.
Saying that C++ (the code in the article is not C and the two languages differ about the treatment of pointers to functions) could have well defined semantics to handle this is entirely moot... it's about as relevant as saying that a C++ program could run on top of the JVM with a garbage collector [1] and hence eliminate all types of memory errors. Even if a C++ program ran on a platform that had guaranteed garbage collection the fact would still remain that C++ as a language does not have well defined semantics for what happens to a dangling reference and as such a C++ compiler is free to exploit that to make very strong assumptions about the runtime behavior of the program for the sake of generating efficient code.
The fact that this is undefined behavior gives a compiler the freedom to perform optimizations under the assumption that runtime behavior will never engender said behavior. Raymond makes use of this property when he discusses COMDAT folding which is a common optimization to elide multiple copies of the same function or to even produce multiple versions of a single function optimized for different scenarios. From the article:
"Even without Profile-Guided Optimization, compile-time optimization may inline some or all of a function, so a single function might have multiple copies in memory, each of which has been optimized for its specific call site."
This property has nothing to do with x86, or Itanium or PDP-11, it's a purely logical optimization permissible only because of the various forms of undefined behavior in C++ with respect to the treatment of function pointers.
> if the standard did mandate that pointers to functions could participate in a memcpy and the behavior was as if the instructions of that function were treated as an array of chars, then C++ compilers would be forced to accommodate that implementation
C/C++ compilers would then not be portable to ISAs with Harvard architecture.
I froze for a moment seeing this article, having worked at a major anti-virus company a long time back and used some low-level Win32 APIs.
Fortunately, I followed some of the techniques from “Programming Applications for Microsoft Windows” book and Detours project to intercept and execute custom code mostly based on loading custom DLL in target remote process and using DllMain() to execute.
Yet copy-on-write works well in the Unix fork/exec() model and helps reduce memory pressure. Presumably the kernel has a mechanism that presents a logically simple "copy" but takes care of the page/pointer/VM bookkeeping.
If you can allocate memory in a foreign process, I'd guess you could also change the permissions on that memory… So write first, then change to executable.
(VirtualProtectEx looks like it would do that. Never used winapi, not sure.)
But the code being memcpy'ed is using the parent process's specific symbol relocations. When a library with PIC is loaded the executable code is copied from the file into RAM to a random offset and all references to structures in the ASM are updated to match their now random offset in memory (simplifying). Say function XYZ in library libfoo is in offsetXYZ, parent process loads libfoo at offset 0xDEADBEEF, injectee process loads libfoo at offset 0xDEC0DE. In Windows the call to function XYZ in the parent process uses the address offsetXYZ+0xDEADBEEF, but the call in the injectee process uses offsetXYZ+0xDEC0DE, causing any reference to the parent process's function to fail. GNU/Linux is very similar but library symbols are found based on an offset to a structure in memory that contains the library symbol metadata, that changes every time the program is loaded.
So actually the opposite is true, if the code wasn't position-independent and was statically located, the assembly code offsets wouldn't need to be updated and you might be able to call a memcpy'd function. Position-independent code could only possibly work if you updated the reference to the symbol metadata structure in the ASM after the memcpy, but at that point you're re-implementing libdl and no longer just using memcpy.
This depends on the libraries/DLLs being used. Windows loads system DLLs at the same location in every process's address space, so you can use process-local offsets in a remote process. For custom libraries of course this wouldn't work. Or if the required system library hasn't been loaded in the remote process.