Going Viral, or an Infectious ELF 🧝
Roses are red,
Violets are blue,
Sugar is sweet,
And viruses are cool.
— Nursery Rhyme
Current events yada yada. We’re going to write a virus today!
In hindsight it’s obvious that our previous adventures led to the magical realm of malware. What other group of software needs to hide in the trenches of an operating system (or even lower), always trying to evade detection? In order to succeed, malware authors need to be well aware of all the low-level details of their target environment. This includes, but is not limited to, ways of hooking into important functionality, injecting into running processes and hiding code inside files by exploiting the way certain formats work.
What better way to learn about those details than to write a tiny piece of malware ourselves? It’s not going to be very practical, but rather educational. We’re not in it for the money. It’s just going to be another step in our journey of learning binary stuff on Linux. As usual we’ll deal with ELF files of the 64-bit variety, but that’s not at all important for the method of infection we’re showing off today.
As already mentioned, we’re going to write a small virus that infects ELF files by prepending itself to those host binaries. There’s a long history of these types of viruses and I think it makes for a great starting point into the subject matter. I hope you enjoy.
What’s a Prepender?
The infection method presented in this article is the simplest one imaginable. Yet, we’re still going over the concept with a manual example, just to be able to visualize it better.
Our starting point is two programs. A host…
|
|
… and a virus.
|
|
Let’s compile them.
|
|
As expected, we get two ELF binaries. Let’s quickly search for the ELF magic number.
|
|
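As a rough illustration of that search, here is a minimal Python sketch that lists every offset at which the four ELF magic bytes occur in a byte string. The helper name find_magic_offsets and the synthetic sample buffer are assumptions made for this example, not part of the original listing.

```python
ELF_MAGIC = b"\x7fELF"  # the four bytes every ELF file starts with


def find_magic_offsets(data: bytes, magic: bytes = ELF_MAGIC) -> list:
    """Return all offsets at which `magic` occurs in `data`."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets


# Synthetic buffer: one "header" at offset 0, a second one at offset 16.
sample = ELF_MAGIC + b"\x00" * 12 + ELF_MAGIC + b"\x00" * 4
print(find_magic_offsets(sample))  # [0, 16]
```

Keep in mind that these four bytes can also show up as ordinary data inside a binary, so a second hit is a hint rather than proof of a second ELF header.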
Now we’re going to do what viruses do best: Infecting stuff. Let’s prepend the virus binary to the host binary. We do it the other way round, though, by appending the raw host bytes to the virus.
|
|
Alright, we’ve got two for one! Running the binary will now simply run the virus code and exit. We note that two concatenated ELF files function as expected. The first one gets run while the second one is still present.
Let’s manually “extract” our host code from the binary. We saw the offset of the second ELF header in the xxd output:
|
|
Okay, our host code apparently starts at offset 16080 decimal. Armed with this knowledge, we can use dd to carve out our host file:
|
|
That worked! As we can see, it’s important to know the exact size of the (compiled) virus. Something to keep in mind for later.
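For readers who prefer Python over dd, the same carving step can be sketched as follows. The function name carve_tail and the tiny stand-in files are hypothetical; the real offset would be the size of the first binary, as noted above.

```python
import os
import tempfile


def carve_tail(src_path, dst_path, offset):
    """Copy everything from `offset` to the end of src_path into dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        dst.write(src.read())


with tempfile.TemporaryDirectory() as tmp:
    combined = os.path.join(tmp, "combined.bin")
    carved = os.path.join(tmp, "carved.bin")

    # Stand-ins for "first binary" and "second binary".
    first, second = b"A" * 10, b"\x7fELF-second-part"
    with open(combined, "wb") as f:
        f.write(first + second)

    # Skip exactly len(first) bytes, just like dd's skip option.
    carve_tail(combined, carved, offset=len(first))
    with open(carved, "rb") as f:
        carved_bytes = f.read()

print(carved_bytes == second)  # True
```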
Be Prepared, Not Scared
That’s enough manual labor for a whole week! A computer should have done that. But how does the actual virus behave? Here’s some pseudocode:
|
|
There are more nuances than are shown here, but this should suffice for now.
We have a rough road map, so let’s think about our language of choice.
Scary viruses are written in Assembly or C, right? Yeah well, those languages are even more scary than the viruses themselves!
So what are our options? Looking at it, that pseudocode is almost a valid Python program. Almost. Let’s just stick with that lovely language for now. Malware written in it seems to be on the rise anyway. We’re all about that cutting-edgyness here!
Without further ado, I proudly present to you Linux.DoomsdayPreppers[1]:
|
|
We start by defining some constants. First off, we definitely want to put our virus under quarantine. The DARK_MARK in line 7 is a new concept. It’s going to help us identify already infected files.
Lastly, we have our hardcoded VIRUS_SIZE. We’ll talk about it in a minute.
The two functions is_elf() and is_infected() use the constants to check for the ELF magic and the infection mark respectively.
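The two checks could look roughly like the sketch below. It only assumes what the text states: one function compares the first four bytes against the ELF magic, the other searches for a marker. The DARK_MARK value here is a made-up placeholder, and both functions take raw bytes rather than a path to keep the example self-contained.

```python
ELF_MAGIC = b"\x7fELF"
DARK_MARK = b"example-mark"  # hypothetical placeholder, not the article's value


def is_elf(data: bytes) -> bool:
    """True if the data starts with the four ELF magic bytes."""
    return data[:4] == ELF_MAGIC


def is_infected(data: bytes) -> bool:
    """True if the marker occurs anywhere in the data."""
    return DARK_MARK in data


clean = ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9
marked = clean + DARK_MARK

print(is_elf(clean), is_infected(clean))    # True False
print(is_elf(marked), is_infected(marked))  # True True
print(is_elf(b"#!/bin/sh\n"))               # False
```

A plain substring search like this is the same idea a very naive scanner would use. It can produce false positives if the marker happens to occur in unrelated data.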
Next up is our infection routine:
|
|
We cut the virus bytes from the currently executing binary[2] (which could be the virus itself or an already infected binary), prepend them to the not yet infected file’s bytes and overwrite the file (identified by its path) with the whole thing in line 22.
The next function has a little bit of fancyness going on. In order to appreciate it, allow me to make a quick detour:
There are a couple of ways to run the host binary on Linux. We could create a temporary file, write the host bytes into it and execute it via one of the exec() functions. This approach works, but it’s just not fancy enough. While researching, I got the feeling that “dropping a file” is frowned upon. Because we too want to be cool kids, there has to be another way.
Well, of course there is, with over 300 system calls! One of those is memfd_create(). Let’s have a look at its description:
memfd_create() creates an anonymous file and returns a file descriptor that refers to it. The file behaves like a regular file, and so can be modified, truncated, memory-mapped, and so on. However, unlike a regular file, it lives in RAM and has a volatile backing storage. Once all references to the file are dropped, it is automatically released. Anonymous memory is used for all backing pages of the file.
Look at us hot shot virus authors. Being really stealthy and all that! Obviously I jest[3], I have no clue about intrusion detection systems and such. But that’s beside the point, because this right here is merely about discovering and trying new things.
But wait a second, we’re getting ahead of ourselves. Are we even able to try it? We’re writing Python, after all. It does have the wonderful os module. But does said module have a wrapper around memfd_create()?
You know, that’s what they call a trick question. Python, of course, has everything. Don’t you ever doubt it again!
|
|
So back to our code: In line 25 we ask the memfd_create() wrapper for a file descriptor, use it to write the host code into the in-memory file in line 30 and execute said file with os.execve()[4] in line 32.
Who says systems programming is hard? We’re doing it with our comfy Pythonz…
The payload could be anything, but ours brings to attention the modern comforts of toilet paper in these infectious times. Why not start mining crypto currency on those poor victim machines? Because we’re altruistic virus writers, that’s why!
Lastly we have our main loop that ties everything neatly together:
|
|
We iterate over every entry in our quarantine directory, skipping over other directories (because, again, we’re polite virus writers!) and the executing binary itself. A suitable file is identified via the ELF_MAGIC bytes.
In order to not infect a file again, we check for the presence of the DARK_MARK via is_infected(). We don’t have to specifically write the mark to every infected binary, because it’s already statically present in the virus by merely defining it.
If a file is not yet infected, we… you guessed it. After we’re done infecting all the possible files, we want to actually run the host code so that it looks like every infected program gets executed normally. Our simple check in line 59 prevents us from trying to execute non-existing host code in the original virus binary.
Are we done? The attentive reader might have noticed a fatal flaw[5]: All we’ve talked about so far are ELF files. But we’ve got a Python script at hand. That’s not going to work, is it?[6]
Creating a binary from a Python script
Luckily there are a few options for “compiling” our script, each with their own considerations and tradeoffs. At first, I wanted to give a quick overview of the landscape of helpful tools, but that topic has no business taking up much space in this article. Oh dear, taking up very much space it certainly would!
That’s why we keep it brief:
There are three main tools for converting a Python script into an executable:
- PyInstaller
- PyOxidizer
- Nuitka (which I went with for no particular reason)
Let’s create a simple executable:
|
|
That works! But what kind of file is it?
|
|
By default Nuitka creates a dynamically linked ELF file. If we check all the required shared libraries with ldd, we see that it depends on a specific version of libpython. Good luck getting a widespread infection with this virus!
What’s that? Create a statically linked binary, you say? That’s a fantastic idea, which will result in a binary that’s between 20 and 30 MB large! Our dynamically linked one is large enough as is (line 15 shows the byte count).
So while I would consider a static Python binary for a Post-Exploitation-Framework, because of the sheer comfort of writing it, there’s something distasteful about doing our little experiments with so much stuff attached. We’re still going to test our dynamically linked version, but the virus is called Doomsday Preppers for a reason!
First outbreak
Now that we have a way of compiling our script, let’s just do that and start a controlled outbreak:
|
|
After compiling our virus, we copy some system binaries (namely ls and echo) into our quarantine directory. We can see just how small those programs are in lines 16 and 17.
Running our virus binary conveniently gives us feedback about what happened. I wish every virus writer was as polite as we are! According to the output of our virus, everything should have gone as expected. And indeed, checking the file sizes in lines 26 and 27, they’ve grown quite a bit.
Our infection routine seems to be working if we run the original virus, but what about running an infected binary? Will it infect other binaries and execute the host code?
|
|
Running the infected ls produces the payload’s output and the correct ls output (line 7). That’s so cool: executing the host from memory via memfd_create() really does work!
For our last test, we copy a fresh binary into our little zoo:
|
|
Running the infected ls again does indeed infect the fresh copy of ps, in addition to running the payload and the host code.
We did it! Our very first, very own, very useless virus. The year of our Lord 2022 will be the year of the virus! Wait…
Linux.Doomsday
Now that we’re done with our first little virus, I’m letting you in on a secret. This right here is not purely educational. It’s also a vanity project! I have to at least go down to sea C level in this article. I want the scene to put some respeck on my name[7]!
Because we’re not drastically altering the structure of our virus, I think it’s safe to take in all the sights at once:
|
|
Now we’re talking! There are a couple of things to note, though. First off, I have no idea what I’m doing with regard to C[8]! Again, the error handling is missing intentionally for brevity. But how should a virus handle errors, anyway? Probably by being as quiet as possible about it.
Another difference is the way we detect infected files. The Python one-liner for finding our static DARK_MARK was super convenient. Doing a C version of this would be quite a few one-liners. Instead we’re simply appending the DARK_MARK manually to every binary that gets infected. This way we only have to check the last sizeof(DARK_MARK) bytes of every file to see if it’s already infected. What’s that? How do we mark the original virus binary? Well, that’s a build step now! We’ll have a look at it in a moment.
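The trailing-bytes check is easy to picture in Python as well. The sketch below seeks relative to the end of the file and compares only the final bytes. It assumes a two-byte mark holding 666 in little-endian order; the width of the real DARK_MARK in the C source is not shown here, so treat MARK and has_trailing_mark as illustrative names.

```python
import os
import struct
import tempfile

MARK = struct.pack("<H", 666)  # assumption: two-byte little-endian mark


def has_trailing_mark(path, mark=MARK):
    """True if the file ends with `mark`; reads only the last len(mark) bytes."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(mark):
            return False  # file is too short to carry the mark
        f.seek(-len(mark), os.SEEK_END)
        return f.read(len(mark)) == mark


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.bin")
    marked = os.path.join(tmp, "marked.bin")
    with open(plain, "wb") as f:
        f.write(b"\x7fELF" + b"\x00" * 32)
    with open(marked, "wb") as f:
        f.write(b"\x7fELF" + b"\x00" * 32 + MARK)

    plain_result = has_trailing_mark(plain)
    marked_result = has_trailing_mark(marked)

print(plain_result, marked_result)  # False True
```

Compared with searching the whole file, this check is constant-time, and it doubles as a simple indicator a scanner could look for.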
Next up is a system call that’s super convenient: sendfile(). It copies data between two file descriptors within the kernel, which makes it more efficient because there’s no need for temporary buffers in user space.
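Python exposes the same system call as os.sendfile(), so the idea can be tried without writing any C. The following is a minimal sketch that copies one regular file into another on Linux; copy_with_sendfile is a hypothetical helper, and the loop is there because a single call may transfer fewer bytes than requested.

```python
import os
import tempfile


def copy_with_sendfile(src_path, dst_path):
    """Copy src to dst inside the kernel; returns the number of bytes copied."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:  # unexpected end of input
                break
            offset += sent
            remaining -= sent
    return offset


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "src.bin")
    dst_path = os.path.join(tmp, "dst.bin")
    data = bytes(range(256)) * 400  # 102400 bytes of test data
    with open(src_path, "wb") as f:
        f.write(data)

    copied = copy_with_sendfile(src_path, dst_path)
    with open(dst_path, "rb") as f:
        result = f.read()

print(copied == len(data), result == data)  # True True
```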
The last thing I want to highlight is the usage of libelf.h[9]. It’s completely overkill for this virus, as we could have easily just checked the first four bytes of every file for the ELF_MAGIC. I still wanted to give the library a try, because more sophisticated infection mechanisms than prepending rely on ELF internals, which in turn means we need a way of actually parsing those files in the future.
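To give an impression of what “parsing ELF internals” means at the very first step, here is a small Python sketch that decodes a few fields from the start of an ELF header using only the struct module. It is not how libelf works internally; parse_elf_ident and the synthetic header are assumptions for this example, and only the first 20 bytes are interpreted.

```python
import struct


def parse_elf_ident(header: bytes) -> dict:
    """Decode class, byte order, e_type and e_machine from an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    ei_class, ei_data = header[4], header[5]  # EI_CLASS, EI_DATA
    endian = "<" if ei_data == 1 else ">"     # 1 = little, 2 = big endian
    # e_type and e_machine are two 16-bit fields right after e_ident[16].
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "little_endian": ei_data == 1,
        "e_type": e_type,
        "e_machine": e_machine,
    }


# Synthetic header: 64-bit, little endian, ET_DYN (3), EM_X86_64 (62).
fake = b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\x00" * 8 + struct.pack("<HH", 3, 62)
info = parse_elf_ident(fake)
print(info)
```

Everything beyond these fields (program headers, section headers, symbol tables) is where a proper library starts to pay off.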
Finally let’s have a look at our Makefile:
|
|
The -lelf switch tells the compiler to link against libelf. As commented, the echo line appends the dark mark. We have to use the little endian representation of 666:
|
|
b'\x9a\x02'
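For anyone who wants to double-check that byte sequence, the conversion can be reproduced in a couple of ways. This is just a sanity check with the standard library; the variable name mark is arbitrary.

```python
import struct

# 666 is 0x029a; little endian stores the low byte (0x9a) first.
mark = struct.pack("<H", 666)

print(hex(666))                             # 0x29a
print(mark)                                 # b'\x9a\x02'
print((666).to_bytes(2, "little") == mark)  # True
print(int.from_bytes(mark, "little"))       # 666
```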
Conclusion
And just like that, we’re done. Let’s recap what we did:
First we manually prepended a binary to another one. Executing this Frankenstein worked just fine. We then proceeded to write our first virus based on that concept. Doing it in Python gave us all the goodies of a high-level language. But because Python programs usually get interpreted, we needed another build step to compile our script into a native ELF binary. Those can get pretty chunky, however, especially when statically linked.
While certainly a great preparation, it didn’t feel quite right. We’re all about minimalism here and there’s nothing minimalist about a 30MB prepender virus!
The logical choice was to use C, which was a super fun exercise in reading up on system calls. We learned about manually seeking in files (lseek()), efficiently copying data between file descriptors (sendfile()) and creating ephemeral files in memory (memfd_create()).
Armed with all that knowledge, we actually succeeded in writing a simple prepender virus that doesn’t drop any temporary files. Can I now buy some merch without feeling like an impostor?
As always, if you have any questions or suggestions: Feel free to holla at me. Thanks for reading!
Resources and Acknowledgments
- Unix Viruses by Silvio Cesare, the early ELF virus bible! Our article touches upon approximately 5% of what he had to say ages ago.
- @guitmz’s website with many prepender examples in different languages. Thanks for the inspiration and the memfd_create() call!
- Himanshu Arora’s decade old article in the Linux Journal that describes the exact technique we’re using.
- libelf by example by Joseph Koshy, which is a great introduction.
- ELF-VIRUS by Shail Shah for some C inspiration (mainly sendfile()).
[2] That’s precisely why we need to keep track of the VIRUS_SIZE.
[3] 🎩
[4] Which conveniently lets us specify a file descriptor instead of a path for the file to be executed since Python 3.3.
[5] It’s not the lack of error handling. That’s missing intentionally so that things stay readable.
[6] That doesn’t mean there are no viruses in script form. Have a look at this talk by Ben Dechrai.
[8] I’m not being overmodest, I really have no clue.