Going Viral, or an Infectious 🧝
Roses are red,
Violets are blue,
Sugar is sweet,
And viruses are cool.
– Nursery Rhyme
Current events yada yada. We’re going to write a virus today!
In hindsight it’s obvious that our previous adventures led to the magical realm of malware. What other group of software needs to hide in the trenches of an operating system (or even lower), always trying to evade detection? In order to succeed, malware authors need to be well aware of all the low-level details of their target environment. This includes, but is not limited to, ways of hooking into important functionality, injecting into running processes and hiding code inside files by exploiting the way certain formats work.
What better way to learn about those details than to write a tiny piece of malware ourselves? It’s not going to be very practical, but rather educational. We’re not in it for the money. It’s just going to be another step in our journey of learning binary stuff on Linux. As usual we’ll deal with ELF files of the 64-bit variety, but that’s not at all important for the method of infection we’re showing off today.
As already mentioned, we’re going to write a small virus that infects ELF files by prepending itself to those host binaries. There’s a long history of these types of viruses and I think it makes for a great starting point into the subject matter. I hope you enjoy.
What’s a Prepender?
The infection method presented in this article is the simplest one imaginable. Yet, we’re still going over the concept with a manual example, just to be able to visualize it better.
We start with two programs. A host…
… and a virus.
Let’s compile them.
As expected, we get two ELF binaries. Let’s quickly search for the ELF magic number.
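If you’d rather script that search than eyeball a hex dump, here is a minimal Python sketch. The helper name `find_magic_offsets` and the two stand-in byte strings are made up for illustration; only the four-byte magic number itself is real.

```python
ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def find_magic_offsets(blob: bytes, magic: bytes = ELF_MAGIC) -> list:
    """Return every offset at which `magic` occurs in `blob`."""
    offsets = []
    pos = blob.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(magic, pos + 1)
    return offsets


# Two stand-in "binaries": only the magic number is real, the rest is padding.
first = ELF_MAGIC + b"\x00" * 28   # 32 bytes in total
second = ELF_MAGIC + b"\x01" * 12
print(find_magic_offsets(first + second))  # [0, 32]
```

A single ELF file yields one hit at offset 0 (plus any coincidental matches inside its data). Once two files are glued together, the second header shows up as an additional hit.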
Now we’re going to do what viruses do best: Infecting stuff. Let’s prepend the virus binary to the host binary. We do it the other way round, though, by appending the raw host bytes to the virus.
Alright, we’ve got two for one! Running the binary will now simply run the virus code and exit. We note that two concatenated ELF files function as expected. The first one gets run while the second one is still present.
Let’s manually “extract” our host code from the binary. We saw the offset of the second ELF header in the
Okay, our host code apparently starts at offset 16080 decimal. Armed with this knowledge, we can use dd to carve out our host file:
That worked! As we can see, it’s important to know the exact size of the (compiled) virus. Something to keep in mind for later.
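The same carving step can be sketched in Python. This is only an illustration with placeholder data: the `carve` helper and the fake contents are assumptions, not the commands used above.

```python
import os
import tempfile


def carve(path: str, offset: int) -> bytes:
    """Return everything in `path` from `offset` to the end of the file."""
    with open(path, "rb") as f:
        f.seek(offset)  # skip the first `offset` bytes
        return f.read()


# Demo with placeholder data instead of real binaries.
front = b"A" * 100       # stands in for the first program
back = b"HOST-BYTES"     # stands in for the second program
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(front + back)

carved = carve(tmp.name, len(front))
os.unlink(tmp.name)
print(carved)  # b'HOST-BYTES'
```

The offset has to be exact: one byte too few or too many and the carved file no longer starts with a valid header.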
Be Prepared, Not Scared
That’s enough manual labor for a whole week! A computer should have done that. But how does the actual virus behave? Here’s some pseudocode:
There are more nuances than are shown here, but this should suffice for now.
We have a rough road map, so let’s think about our language of choice.
Scary viruses are written in C, right? Yeah well, languages like that are even scarier than the viruses themselves!
So what are our options? Looking at it, that pseudocode is almost a valid Python program. Almost. Let’s just stick with that lovely language for now. Malware written in it seems to be on the rise anyway. We’re all about that cutting-edgyness here!
Without further ado, I proudly present to you Linux.DoomsdayPreppers1:
We start by defining some constants. First off, we definitely want to put our virus under quarantine. The DARK_MARK in line 7 is a new concept. It’s going to help us identify already infected files.
Lastly, we have our hardcoded VIRUS_SIZE. We’ll talk about it in a minute.
Two helper functions (is_infected() being one of them) use the constants to check for the ELF magic and the infection mark respectively.
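Both checks boil down to simple byte comparisons. Below is a hedged sketch of the idea from a scanner’s point of view. The function names and the `MARK` value are placeholders I made up, since the actual constants live in the listing above.

```python
ELF_MAGIC = b"\x7fELF"
MARK = b"hypothetical-mark"  # placeholder value, not the article's DARK_MARK


def starts_with_elf_magic(data: bytes) -> bool:
    """True if the data begins with the four-byte ELF magic number."""
    return data[:4] == ELF_MAGIC


def contains_mark(data: bytes, mark: bytes = MARK) -> bool:
    """True if the mark occurs anywhere in the data."""
    return mark in data


clean = ELF_MAGIC + b"\x00" * 16
tagged = clean + MARK
print(starts_with_elf_magic(clean), contains_mark(clean))    # True False
print(starts_with_elf_magic(tagged), contains_mark(tagged))  # True True
```

Searching the whole file with `in` is convenient but reads everything into memory. That’s fine for small test files and less so for large ones.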
Next up is our infection routine:
We cut the virus bytes from the currently executing binary2 (which could be the virus itself or an already infected binary), prepend them to the not yet infected file’s bytes and overwrite the file (identified by its path) with the whole thing in line 22.
The next function has a little bit of fanciness going on. In order to appreciate it, allow me to make a quick detour:
There are a couple of ways to run the host binary on Linux. We could create a temporary file, write the host bytes into it and execute it via one of the exec() functions. This approach works, but it’s just not fancy enough. While researching, I got the feeling that “dropping a file” is frowned upon. Because we too want to be cool kids, there has to be another way.
Well, of course there is, with over 300 system calls to choose from! One of those is memfd_create(). Let’s have a look at its description:
memfd_create() creates an anonymous file and returns a file descriptor that refers to it. The file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on. However, unlike a regular file, it lives in RAM and has a volatile backing storage. Once all references to the file are dropped, it is automatically released. Anonymous memory is used for all backing pages of the file.
Look at us hot shot virus authors. Being really stealthy and all that! Obviously I jest3. I have no clue about intrusion detection systems and such. But that’s beside the point, because this right here is merely about discovering and trying new things.
But wait a second, we’re getting ahead of ourselves. Are we even able to try it? We’re writing Python, after all. It does have the wonderful os module. But does said module have a wrapper around memfd_create()?
You know, that’s what they call a trick question. Python, of course, has everything. Don’t you ever doubt it again!
So back to our code: In line 25 we ask the memfd_create() wrapper for a file descriptor, use it to write the host code into the in-memory file in line 30 and execute said file with os.execve()4 in line 32.
Who says systems programming is hard? We’re doing it with our comfy
The payload could be anything, but ours draws attention to the modern comforts of toilet paper in these infectious times. Why not start mining cryptocurrency on those poor victim machines? Because we’re altruistic virus writers, that’s why!
Lastly we have our main loop that ties everything neatly together:
We iterate over every entry in our quarantine directory, skipping over other directories (because, again, we’re polite virus writers!) and the executing binary itself. A suitable file is identified via the
In order to not infect a file again, we check for the presence of the DARK_MARK via is_infected(). We don’t have to specifically write the mark to every infected binary, because it’s already statically present in the virus by merely defining it.
If a file is not yet infected, we… you guessed it. After we’re done infecting all the possible files, we want to actually run the host code so that it looks like every infected program gets executed normally. Our simple check in line 59 prevents us from trying to execute non-existing host code in the original virus binary.
Are we done? The attentive reader might have noticed a fatal flaw5: All we’ve talked about so far are ELF files. But we’ve got a Python script at hand. That won’t work, will it?6
Creating a binary from a Python script
Luckily there are a few options for “compiling” our script, each with its own considerations and tradeoffs. At first, I wanted to give a quick overview of the landscape of helpful tools, but that topic has no business taking up much space in this article. Oh dear, taking up very much space it certainly would!
That’s why we keep it brief:
There are three main tools for converting a Python script into an executable:
Nuitka (which I went with for no particular reason)
Let’s create a simple executable:
That works! But what kind of file is it?
Nuitka creates a dynamically linked ELF file. If we check all the required shared libraries with ldd, we see that it depends on a specific version of libpython. Good luck getting a widespread infection with this virus!
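A rough way to tell from the bytes alone whether an executable asks for a dynamic loader is to look for a PT_INTERP entry in its program headers. The sketch below is a heuristic under stated assumptions: it only handles 64-bit little-endian files, and `fake_elf` is a helper invented purely to build tiny test inputs. Shared objects and static-PIE binaries can lack PT_INTERP too, so this is no replacement for ldd.

```python
import struct

PT_LOAD = 1
PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(elf: bytes) -> bool:
    """True if a 64-bit little-endian ELF image has a PT_INTERP program header."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)           # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)  # entry size and count
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def fake_elf(p_types) -> bytes:
    """Build a minimal header plus empty program headers (test helper, not a runnable ELF)."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + b"\x00" * 9
    header = ident + struct.pack(
        "<HHIQQQIHHHHHH", 2, 62, 1, 0, 64, 0, 0, 64, 56, len(p_types), 0, 0, 0
    )
    phdrs = b"".join(struct.pack("<II6Q", t, 0, 0, 0, 0, 0, 0, 0) for t in p_types)
    return header + phdrs


print(has_interpreter(fake_elf([PT_LOAD, PT_INTERP, PT_LOAD])))  # True
print(has_interpreter(fake_elf([PT_LOAD, PT_LOAD])))             # False
```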
What’s that? Create a statically linked binary, you say? That’s a fantastic idea, which will result in a binary that’s between 20 and 30 MB large! Our dynamically linked one is large enough as is (line 15 shows the byte count).
So while I would consider a static Python binary for a post-exploitation framework, because of the sheer comfort of writing it, there’s something distasteful about doing our little experiments with so much stuff attached. We’re still going to test our dynamically linked version, but the virus is called Doomsday Preppers for a reason!
Now that we have a way of compiling our script, let’s just do that and start a controlled outbreak:
After compiling our virus, we copy some system binaries (namely echo) into our quarantine directory. We can see just how small those programs are in lines 16 and 17.
Running our virus binary conveniently gives us feedback about what happened. I wish every virus writer was as polite as we are! According to the output of our virus, everything should have gone as expected. And indeed, checking the file sizes in lines 26 and 27, they’ve grown quite a bit.
Our infection routine seems to be working if we run the original virus, but what about running an infected binary? Will it infect other binaries and execute the host code?
Running the infected ls produces the payload’s output and the correct ls output (line 7). That’s so cool, the whole business of executing the host from memory via memfd_create() does work!
For our last test, we copy a fresh binary into our little zoo:
Running the infected ls again does indeed infect the fresh copy of ps, in addition to running the payload and the host code.
We did it! Our very first, very own, very useless virus. The year of our Lord 2022 will be the year of the virus! Wait…
Now that we’re done with our first little virus, I’m letting you in on a secret. This right here is not purely educational. It’s also a vanity project! I have to at least go down to C level in this article. I want the scene to put some respeck on my name7!
Because we’re not drastically altering the structure of our virus, I think it’s safe to take in all the sights at once:
Now we’re talking! There are a couple of things to note, though. First off, I have no idea what I’m doing with regard to C8! Again, the error handling is missing intentionally for brevity. But how should a virus handle errors, anyway? Probably by being as quiet as possible about it.
Another difference is the way we detect infected files. The Python one-liner for finding our static DARK_MARK was super convenient. Doing a C version of this would be quite a few one-liners. Instead we’re simply appending the DARK_MARK manually to every binary that gets infected. This way we only have to check the last sizeof(DARK_MARK) bytes of every file to see if it’s already infected. What’s that? How do we mark the original virus binary? Well, that’s a build step now! We’ll have a look at it in a moment.
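The trailing-bytes check is easy to prototype in Python before committing to C. This is a sketch under assumptions: `MARK` is a hypothetical 4-byte value chosen so its little-endian bytes spell something readable, and `ends_with_mark` is my own helper name.

```python
import os
import struct
import tempfile

# Hypothetical 32-bit mark. Packed little-endian, 0x4B52414D becomes b"MARK".
MARK = struct.pack("<I", 0x4B52414D)


def ends_with_mark(path: str, mark: bytes = MARK) -> bool:
    """True if the last len(mark) bytes of the file equal the mark."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(mark):      # file is too short to carry the mark
            return False
        f.seek(-len(mark), os.SEEK_END)
        return f.read(len(mark)) == mark


def write_temp(data: bytes) -> str:
    """Write data to a temporary file and return its path (demo helper)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    return tmp.name


plain = write_temp(b"some file contents")
tagged = write_temp(b"some file contents" + MARK)
results = (ends_with_mark(plain), ends_with_mark(tagged))
os.unlink(plain)
os.unlink(tagged)
print(MARK, results)  # b'MARK' (False, True)
```

Only the last few bytes are ever read, so the cost stays constant no matter how large the file is. The byte order matters as soon as the mark is defined as an integer: the bytes on disk are the reverse of how the hex constant reads.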
Next up is a system call that’s super convenient: sendfile(). It copies data between two file descriptors within the kernel, which makes it more efficient than a read()/write() loop because there’s no need for temporary buffers in user space.
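Python exposes the same call as `os.sendfile()`, which makes it easy to try out. The following is a minimal sketch for Linux that copies between two temporary files. The `copy_fd` helper and the 5000-byte payload are assumptions for the demo.

```python
import os
import tempfile


def copy_fd(src_fd: int, dst_fd: int, count: int) -> int:
    """Copy `count` bytes from the start of src_fd to dst_fd inside the kernel."""
    offset = 0
    while offset < count:
        # sendfile may transfer fewer bytes than requested, so loop until done.
        sent = os.sendfile(dst_fd, src_fd, offset, count - offset)
        if sent == 0:  # unexpected end of input
            break
        offset += sent
    return offset


with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
    src.write(b"x" * 5000)
    src.flush()  # make sure the bytes actually reached the file
    copied = copy_fd(src.fileno(), dst.fileno(), 5000)
    result = os.pread(dst.fileno(), 5000, 0)

print(copied, result == b"x" * 5000)  # 5000 True
```

No user-space buffer ever holds the payload during the copy. The only `bytes` object in the demo is the one we read back afterwards to verify the result.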
The last thing I want to highlight is the usage of libelf.h9. It’s completely overkill for this virus, as we could have easily just checked the first four bytes of every file for the ELF_MAGIC. I still wanted to give the library a try, because more sophisticated infection mechanisms than prepending rely on ELF internals, which in turn means we need a way of actually parsing those files in the future.
Finally let’s have a look at our
The -lelf switch tells the compiler to link against libelf. As commented, the echo line appends the dark mark. We have to use the little endian representation of
And just like that, we’re done. Let’s recap what we did:
First we manually prepended a binary to another one. Executing this Frankenstein worked just fine. We then proceeded to write our first virus based on that concept. Doing it in Python gave us all the goodies of a high-level language. But because Python programs usually get interpreted, we needed another build step to compile our script into a native ELF binary. Those can get pretty chunky, however, especially when statically linked.
While certainly a great preparation, it didn’t feel quite right. We’re all about minimalism here and there’s nothing minimalist about a 30 MB prepender virus!
The logical choice was to use C, which was a super fun exercise in reading up on system calls. We learned about manually seeking in files (lseek()), efficiently copying data between file descriptors (sendfile()) and creating ephemeral files in memory (memfd_create()).
Armed with all that knowledge, we actually succeeded in writing a simple prepender virus that doesn’t drop any temporary files. Can I now buy some merch without feeling like an impostor?
As always, if you have any questions or suggestions: Feel free to holla at me. Thanks for reading!
Resources and Acknowledgments
- Unix Viruses by Silvio Cesare, the early ELF virus bible! Our article touches upon approximately 5% of what he had to say ages ago.
- @guitmz’s website with many prepender examples in different languages. Thanks for the inspiration and the
- Himanshu Arora’s decade-old article in the Linux Journal that describes the exact technique we’re using.
- libelf by example by Joseph Koshy, which is a great introduction.
- ELF-VIRUS by Shail Shah for some
2. That’s precisely why we need to keep track of the
5. It’s not the lack of error handling. That’s missing intentionally so that things stay readable.
8. I’m not being overmodest, I really have no clue.