Replicating Cobalt Strike's Port Scanner as a BOF for Open-Source C2 Frameworks
TL;DR
Was annoyed open source C2 tools had inferior port scanners compared to Cobalt Strike so I decided to replicate the functionality. Also wrote a ping scanner to help with host discovery. OPSEC may still require some improvements as described throughout the blog but the functionality is there.
The tool is available here: https://github.com/fyxme/portscanbof
On my quest to review random C2s for pure self-enjoyment, I’ve come to realise that very few implement port scanning, for whatever reason.
While I understand that it’s possible to use a SOCKS proxy and perform nmap scans over the proxy using a TCP connect scan (-sT flag), it feels less convenient, generates a lot of unwanted network traffic, requires the beacon to communicate constantly (sleep 0) and may have DNS issues, which made me want to re-create the Cobalt Strike Port Scan functionality.
It’s already possible to run binaries (ie. exe and dll) via other BOFs like noconsolation, to use the dotnet CLR to run C# binaries inline, or to use a PowerShell port scanner via the C2’s PowerShell functionality. However, a lot of these generate more IoCs than BOFs, and their output formats don’t integrate as well with C2s since they were not made for them.
Through reviewing a number of BOFs on Github, looking at custom C2 agents and more, I didn’t find many implementing portscanning up to the level of Cobalt Strike’s Port Scanner. Although, I will give the following honourable mentions:
- rvrsh3ll’s bofportscan BOF which works well but only supports a single host and a single port
- Mythic’s thanatos agent which seemed to be the most complete, supports IP and IP subnets as well as ports and port ranges. While the agent itself is really good and the blog post about it deserves a read, this port scan functionality is only available to users of this specific agent, isn’t portable like a BOF, and still doesn’t support as many arguments or output as Cobalt Strike’s port scanner.
Hence, the goals were set:
- write a portscanner BOF (COFF) that replicates Cobalt Strike’s port scanning functionality, including the wide variety of input parameters, and provides the same amount of information in its output
- write a pingscanner to complement the portscanner and use as a first pass scan
- have some OPSEC considerations or at least describe the OPSEC limitations of the tool.
The initial release will be used to test and improve the BOF in training environments and/or test labs, with future improvements adding functionality and improving OPSEC. We’ll write the BOF in C as it’s the most common language to use for BOFs.
Enough intro, let’s begin.
Parsing Arguments
Cobalt Strike’s port scanner can be used in the following way:
portscan [pid] [arch] [targets] [ports] [arp|icmp|none] [max connections]
- The [pid] and [arch] options are used to “inject into the specified process to run a port scan against the specified hosts”. We’ll ignore these completely as our BOF will just execute within the context of the running agent. Nevertheless, this is good functionality if you want to improve OPSEC, as you may want to choose a process which is likely to perform arbitrary requests to other hosts (eg. a web browser for HTTP/HTTPS).
- The [targets] option is a comma-separated list of hosts to scan. You may also specify IPv4 address ranges (e.g., 192.168.1.128-192.168.2.240, 192.168.1.0/24).
- The [ports] option is a comma-separated list of ports to scan. You may specify port ranges as well (e.g., 1-65535).
- The [arp|icmp|none] target discovery options dictate how the port scanning tool will determine if a host is alive. none assumes that all hosts are alive. Instead of doing this in the port scan BOF itself, I’ve decided to implement the icmp functionality in a different BOF (ie. pingscan) which will be described later on. The arp scan feature will be implemented in a future release. For our portscan, we’ll just assume that all hosts are alive (ie. none) and use timeouts instead to reduce the time it takes to perform a portscan.
- Lastly, the [max connections] option limits how many connections the port scan tool will attempt at any one time. While this is not hard to code, performing synchronous scans is actually already really fast if you set low enough timeouts (which should be fine considering you’d expect low RTT in most internal environments). Nevertheless, this is a good feature that will be added in a future release.
We’ll split our BOF between portscan (ie. TCP connect scan) and pingscan (ie. ICMP echo/reply), with each being used the following way:
portscan [targets] [ports]
pingscan [targets]
Parsing ports
Parsing ports is relatively easy, there are only three (3) options:
- a single port (eg. 22, 445, 80, etc)
- a port range (eg. 1-100, 5000-6000, etc)
- an arbitrary combination of the two above options separated by a comma (eg. 22,5000-9000,445,7)
To improve user experience, we’ll also allow decreasing port ranges (eg. 900-100) for bragging rights on Cobalt Strike’s inferior port scanning argument parser.
The pseudo-code for this is simple:
split input on ","
for each part:
    if it contains a dash ("-"), assume it's a range:
        get both parts of the range using sscanf
        convert each part to an int
        check that each part is between 1 and 65535
    else assume it's a port:
        convert str to int
        check that the int is between 1 and 65535
Assuming the parts are valid and passed our rudimentary checks, we need to store these ports. A design decision was made to use a linked list to store these as it reduces memory usage when large ranges are provided and easily allows arbitrary combinations to be made. To make things easier, we’ll assume that a single port is the same as the range from that port to itself (ie. "80" == "80-80"), which allows us to use the following struct:
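A minimal sketch of what such a struct and parser could look like, following the pseudo-code above. The names port_range, parse_port_token, parse_ports and free_ports are hypothetical, and skipping invalid tokens silently is an assumption rather than the tool's confirmed behaviour:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per range; a single port N is stored as the range N-N. */
typedef struct port_range {
    unsigned short start;
    unsigned short end;
    struct port_range *next;
} port_range;

/* Parse one token ("80" or "1-100") into lo/hi. Returns 0 on success. */
static int parse_port_token(const char *tok, unsigned short *lo, unsigned short *hi) {
    int a, b;
    char extra;
    if (strchr(tok, '-') != NULL) {
        /* exactly two numbers and nothing after them */
        if (sscanf(tok, "%5d-%5d%c", &a, &b, &extra) != 2) return -1;
    } else {
        if (sscanf(tok, "%5d%c", &a, &extra) != 1) return -1;
        b = a;
    }
    if (a < 1 || a > 65535 || b < 1 || b > 65535) return -1;
    if (a > b) { int t = a; a = b; b = t; }  /* accept decreasing ranges such as 900-100 */
    *lo = (unsigned short)a;
    *hi = (unsigned short)b;
    return 0;
}

/* Split input on "," and build the list. Invalid tokens are skipped (assumption). */
port_range *parse_ports(const char *input) {
    port_range *head = NULL, *tail = NULL;
    char *copy = malloc(strlen(input) + 1);
    if (copy == NULL) return NULL;
    strcpy(copy, input);
    for (char *tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ",")) {
        unsigned short lo, hi;
        port_range *node;
        if (parse_port_token(tok, &lo, &hi) != 0) continue;
        node = malloc(sizeof *node);
        if (node == NULL) break;
        node->start = lo;
        node->end = hi;
        node->next = NULL;
        if (tail != NULL) tail->next = node; else head = node;
        tail = node;
    }
    free(copy);
    return head;
}

void free_ports(port_range *head) {
    while (head != NULL) {
        port_range *next = head->next;
        free(head);
        head = next;
    }
}
```

Calling parse_ports("22,5000-9000,445,7") yields four nodes, and the full 1-65535 range still costs a single node, which is where the memory saving comes from.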
Parsing targets
Parsing targets is a lot more involved since there are many available options, however we can split valid input options between the following input types:
- IP Range: token contains a dash (-). Need to validate that each side of the dash is a valid IP, otherwise you might end up matching hosts (eg. asdf-asdf.com).
- CIDR: that one is easy, just match on /, parse the IP and the mask and do calculations to find the IP range, which brings us back to type 1.
- Host: you can use getaddrinfo to do the work for you here as it will find IPs for hostnames for you.
- IP: similarly to hosts, we can use getaddrinfo and validate the IP, which allows us to treat single IPs the same way as single hosts.
- A combination of any of the above input types, split by a comma (,)
In the end, you can categorise each target into one of two categories:
- IP Range: IP ranges, and CIDR Subnet (ie. an IP range in disguise)
- IP/Host: Everything else that was parsed correctly by getaddrinfo
Similarly to port arguments, we’ll use a linked list to store the information as it provides the most flexibility. This results in the following struct definitions:
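As a rough sketch of the range side of this, the snippet below shows one possible node layout and how IP ranges and CIDR subnets both collapse into a start/end pair. The struct layout and every function name here are hypothetical. Anything that is not a range would be handed to getaddrinfo by the caller, which is not shown:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct addrinfo;  /* defined by the sockets headers; only a pointer is stored here */

typedef enum { TARGET_RANGE, TARGET_HOST } target_type;

/* Hypothetical node layout; the tool's real struct may differ. */
typedef struct target {
    target_type type;
    uint32_t start, end;        /* TARGET_RANGE: inclusive bounds, host byte order */
    struct addrinfo *resolved;  /* TARGET_HOST: cached getaddrinfo result */
    struct target *next;
} target;

/* Parse a dotted-quad IPv4 address into host byte order. Returns 0 on success. */
static int parse_ipv4(const char *s, uint32_t *out) {
    unsigned a, b, c, d;
    char extra;
    if (sscanf(s, "%3u.%3u.%3u.%3u%c", &a, &b, &c, &d, &extra) != 4) return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255) return -1;
    *out = (a << 24) | (b << 16) | (c << 8) | d;
    return 0;
}

/* "a.b.c.d-e.f.g.h" -> inclusive range; reversed bounds are swapped. */
int parse_ip_range(const char *s, uint32_t *start, uint32_t *end) {
    char lo[16], hi[16], extra;
    uint32_t a, b;
    if (sscanf(s, "%15[0-9.]-%15[0-9.]%c", lo, hi, &extra) != 2) return -1;
    if (parse_ipv4(lo, &a) != 0 || parse_ipv4(hi, &b) != 0) return -1;
    *start = a < b ? a : b;
    *end = a < b ? b : a;
    return 0;
}

/* "a.b.c.d/n" -> first and last address of the subnet (network and broadcast included). */
int parse_cidr(const char *s, uint32_t *start, uint32_t *end) {
    char ip[16], extra;
    unsigned bits;
    uint32_t addr, mask;
    if (sscanf(s, "%15[0-9.]/%2u%c", ip, &bits, &extra) != 2) return -1;
    if (bits > 32 || parse_ipv4(ip, &addr) != 0) return -1;
    mask = bits == 0 ? 0 : 0xFFFFFFFFu << (32 - bits);
    *start = addr & mask;
    *end = addr | ~mask;
    return 0;
}

/* Fill t from a range or CIDR token. Returns -1 when the token is neither,
   in which case the caller would treat it as an IP/Host. */
int parse_range_target(const char *tok, target *t) {
    int rc = -1;
    if (strchr(tok, '/') != NULL) rc = parse_cidr(tok, &t->start, &t->end);
    else if (strchr(tok, '-') != NULL) rc = parse_ip_range(tok, &t->start, &t->end);
    if (rc == 0) { t->type = TARGET_RANGE; t->resolved = NULL; t->next = NULL; }
    return rc;
}
```

Because each side of the dash has to parse as an IPv4 address, a token like asdf-asdf.com is rejected here and falls through to the IP/Host path.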
Important
OPSEC: The getaddrinfo function returns results for the NS_DNS namespace. The getaddrinfo function aggregates all responses if more than one namespace provider returns information. For use with the IPv6 and IPv4 protocol, name resolution can be by the Domain Name System (DNS), a local hosts file, or by other naming mechanisms for the NS_DNS namespace. (ref)
ie. getaddrinfo will resolve host names to their IPs. This potentially results in network traffic being generated while parsing input. To limit network traffic, we store a pointer to the addrinfo struct populated after running getaddrinfo and we’ll reuse it during scanning operations.
Correct parsing results in the following (with reversible IP ranges to assert dominance on Cobalt Strike’s inferior parser):
ICMP (ping scan)
Windows API provides a very easy to use function to send ICMP echo requests, namely IcmpSendEcho. We can use this function to send ICMP echoes to our targets easily:
// The IcmpSendEcho function sends an IPv4 ICMP echo request and returns any echo response replies. The call returns when the time-out has expired or the reply buffer is filled.
IPHLPAPI_DLL_LINKAGE DWORD IcmpSendEcho(
  [in]           HANDLE                 IcmpHandle,
  [in]           IPAddr                 DestinationAddress,
  [in]           LPVOID                 RequestData,
  [in]           WORD                   RequestSize,
  [in, optional] PIP_OPTION_INFORMATION RequestOptions,
  [out]          LPVOID                 ReplyBuffer,
  [in]           DWORD                  ReplySize,
  [in]           DWORD                  Timeout
);
The function takes in a HANDLE IcmpHandle as its first parameter, which denotes a handle returned by the IcmpCreateFile function (ie. a “function which opens a handle on which IPv4 ICMP echo requests can be issued.”). From testing, it appears we can reuse this handle, allowing us to create it once and pass it to subsequent calls to IcmpSendEcho, resulting in improved scanning speed.
OPSEC
I wanted to see the network traffic difference between the IcmpSendEcho default configuration and the traffic generated from the ping.exe Windows utility. As it turns out, there is quite a bit of a difference:
We notice the following from the screenshot above:
- ping.exe: the ICMP requests and replies from the first four packets (those generated by the ping utility) are all 74 bytes long. The packet’s time to live (ie. TTL) is set to 128 hops.
- IcmpSendEcho: the request packets are 46 bytes long and the reply packets are 60 bytes long. The packet’s TTL defaults to 255 hops.
By comparing the request packets, we notice that the ping utility sends 32 bytes of data as part of the request, namely the following string: abcdefghijklmnopqrstuvwabcdefghi.
We can match this in our function call by passing in RequestData, and adjust the packet’s TTL to 128. This results in the following traffic being generated, which is now identical between ping.exe and our ping scanner:
Note
The ping reply size difference was simply due to the difference in data being sent (since the reply contains the data our packet has sent), hence no further configuration was needed.
With configurations and optimisations via timeout, this results in the following code:
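A hedged sketch of what that call could look like. The helper names build_ping_payload and ping_once are hypothetical, and the Windows-only part is guarded so the payload helper can be compiled anywhere. The handle is assumed to come from IcmpCreateFile once, be reused for every target, and be released with IcmpCloseHandle at the end:

```c
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
#include <iphlpapi.h>
#include <icmpapi.h>
#endif

#define PING_PAYLOAD_LEN 32

/* Fill buf with the 32-byte pattern ping.exe sends: 'a'..'w', then wrapping around. */
void build_ping_payload(char buf[PING_PAYLOAD_LEN]) {
    for (int i = 0; i < PING_PAYLOAD_LEN; i++)
        buf[i] = (char)('a' + i % 23);
}

#ifdef _WIN32
/* Send one echo request. dest is an IPv4 address in network byte order.
   Returns 1 if the host answered within timeout_ms, 0 otherwise. */
int ping_once(HANDLE icmp, IPAddr dest, DWORD timeout_ms) {
    char payload[PING_PAYLOAD_LEN];
    /* the reply buffer needs room for the reply header, the echoed data and 8 extra bytes */
    struct { ICMP_ECHO_REPLY hdr; char data[PING_PAYLOAD_LEN + 8]; } reply;
    IP_OPTION_INFORMATION opts;
    DWORD replies;

    build_ping_payload(payload);
    memset(&opts, 0, sizeof opts);
    opts.Ttl = 128;  /* match ping.exe instead of the 255 default */

    replies = IcmpSendEcho(icmp, dest, payload, PING_PAYLOAD_LEN, &opts,
                           &reply, sizeof reply, timeout_ms);
    if (replies == 0)
        return 0;  /* GetLastError() is 11010 (IP_REQ_TIMED_OUT) on a timeout */
    return reply.hdr.Status == IP_SUCCESS;
}
#endif
```

The timeout is left as a parameter here because it is the main speed lever: lower values scan faster but risk false negatives on slow hosts.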
Danger
OPSEC: Other than the improvements described above, the tool does not currently add any delay between ping requests. This can result in extremely fast consecutive requests, which could lead to Denial of Service (DoS) on fragile hosts/environments if you’re not careful. Further customisation options will be added in a future release to help improve OPSEC, as well as the ability to configure a custom timeout. The defaults will be adjusted to ping.exe’s timeouts.
We can combine this with the parser to have a working pingscanner that supports many input types:
Speed
Surprisingly if you don’t add a delay between the ping requests, it’s actually extremely fast… Who knew…
Without multi-threading or any other optimisations, running it against a /24 subnet (ie. 256 addresses) with low response times, you can scan the whole subnet in about 4 seconds:
Just watch out… If you ping scan too quickly, you get angry neighbours… From ~20ms to over 10 times that for each ping:
From testing, it appears there are two events which can slow down the ping scan:
- A ping request timeout which results in error 11010 aka IP_REQ_TIMED_OUT. This is possible to optimise by setting the request timeout in IcmpSendEcho, although there is a trade-off where you may end up with false negatives (ie. the host exists but took longer than the timeout to respond).
- Supplying non-existent domains will result in a delay from getaddrinfo. Unfortunately, it doesn’t have a timeout, so you’d need to run it in a thread and time it out yourself if you wanted to optimise this. Or implement DNS requests yourself, but I’ll leave that as an exercise for the reader.
Furthermore, getaddrinfo is also extremely slow when provided with invalid input… A stunning 2.4s to say that the host doesn’t exist:
The simplest fix to the above would be to check whether the input is a valid IP or hostname based on a regex, although this is harder than expected because in an internal environment, pretty much anything could be a valid hostname….
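For illustration only, a purely syntactic pre-check could look like the sketch below. The function name looks_like_hostname is hypothetical, and the rules (letters, digits, hyphens, and underscores since they show up in internal environments; labels of 1 to 63 characters; 253 characters overall) are an assumption, not something the tool implements. It only filters out obviously broken input before getaddrinfo is called and cannot tell whether a name actually resolves:

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s is shaped like a hostname or dotted IPv4 address, 0 otherwise.
   Labels must be 1-63 characters of [A-Za-z0-9_-] and may not start or end with '-'. */
int looks_like_hostname(const char *s) {
    size_t len = strlen(s), label = 0;
    if (len == 0 || len > 253) return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '.') {
            /* empty label, or previous label ended with a hyphen */
            if (label == 0 || s[i - 1] == '-') return 0;
            label = 0;
        } else if (isalnum(c) || c == '_' || (c == '-' && label > 0)) {
            if (++label > 63) return 0;
        } else {
            return 0;  /* spaces, slashes, punctuation and so on */
        }
    }
    /* reject a trailing dot or trailing hyphen */
    return label > 0 && s[len - 1] != '-';
}
```

This rejects input such as "bad host!" or "a..b" without touching the network, but a typo like dc0l.corp.local still passes and still pays the getaddrinfo delay.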
OPSEC wise performing a scan this fast really isn’t great and will start flooding the network with too many packets, potentially generating alerts or breaking stuff. But damn it’s fun to see the packets go by at MACH 2 speed!
I’ll add IP/Host validation as an optimisation in the backlog, move on to another project and never implement it. But at least you know it’s there… So yeah, don’t pass stupid data to it and you should be sweet!
TCP (port scan)
Before attempting to write a port scanner, I looked at the video linked on Cobalt Strike’s website about the Port Scanning functionality. From this, I gathered a few things I wanted to include in this port scanner:
- The port scanner shows all open ports (that’s an obvious one)
- For each port/service it finds, if the service supplies a banner (ie. SSH), then it will receive the banner and display it
- The scanner provides additional information for Windows hosts with SMB open (port 445)
This is the functionality I wanted to replicate with this BOF.
Port Scan Function
Note
This is describing a TCP Connect Scan. If you are interested in other types of scans, check out nmap’s book on scan methods.
Port scanning in itself is very simple: you have an IP and a port, and you connect to that address. If the socket connects successfully, the port is open; if the socket errors, the port is closed. Translated to code, this looks something like this (assuming WSA is initialised outside of this function):
Not much going on here, but this would also make a terrible port scanner because it would be terribly slow when you try to scan a port/service that doesn’t exist (ie. you’d get timed out but it would take a while). The screenshot below demonstrates this. It’s pretty much instant when the service exists (Elapsed time: 0.00 seconds) but takes forever when the services do not respond (Elapsed time: 42.00 seconds):
To improve the speed, we can set a timeout when the socket attempts to connect to the remote port/service. On Windows, this means setting the socket to non-blocking mode and using the select function to wait for the socket to successfully connect or time out. This allows us to set a socket timeout and greatly increase the speed of the program:
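A minimal sketch of that approach. The function name tcp_port_open is hypothetical, and the small POSIX fallback is only there so the snippet can be tried outside Windows; it is not part of the tool. On Windows, WSAStartup is assumed to have been called already:

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
typedef SOCKET sock_t;
#define CLOSESOCK closesocket
#else  /* POSIX fallback so the sketch can be tried outside Windows */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define INVALID_SOCKET (-1)
#define CLOSESOCK close
#endif
#include <string.h>

/* Returns 1 if addr accepted a TCP connection within timeout_ms, 0 otherwise. */
int tcp_port_open(const struct sockaddr_in *addr, int timeout_ms) {
    sock_t s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    int is_open = 0;
    if (s == INVALID_SOCKET) return 0;

    /* non-blocking mode: connect() returns immediately instead of waiting */
#ifdef _WIN32
    u_long nonblock = 1;
    ioctlsocket(s, FIONBIO, &nonblock);
#else
    fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);
#endif

    if (connect(s, (const struct sockaddr *)addr, sizeof *addr) == 0) {
        is_open = 1;  /* connected straight away */
    } else {
#ifdef _WIN32
        int pending = WSAGetLastError() == WSAEWOULDBLOCK;
#else
        int pending = errno == EINPROGRESS;
#endif
        if (pending) {
            fd_set wfds, efds;
            struct timeval tv;
            FD_ZERO(&wfds); FD_SET(s, &wfds);  /* writable: connect finished */
            FD_ZERO(&efds); FD_SET(s, &efds);  /* exception: connect failed (Windows) */
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            if (select((int)s + 1, NULL, &wfds, &efds, &tv) > 0 && FD_ISSET(s, &wfds)) {
                int err = 0;
                socklen_t len = sizeof err;
                /* writable alone is not enough on POSIX, so confirm there is no pending error */
                getsockopt(s, SOL_SOCKET, SO_ERROR, (char *)&err, &len);
                is_open = (err == 0);
            }
        }
    }
    CLOSESOCK(s);
    return is_open;
}
```

A filtered port now costs at most timeout_ms instead of the operating system's default connect timeout, which is where the speed-up comes from.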
The code above results in a speed-up of 41 seconds over the old code:
Lastly, since we want to receive the headers from sockets we successfully connect to, we should set a timeout on the socket receive too. We can do this using the setsockopt function prior to connecting to the socket as such:
This is pretty much it in terms of basic setup and allows us to identify open ports at a reasonable speed. We can now move on to more fun stuff like receiving service headers and querying Windows host information.
Service Headers and Information Discovery
In the video posted on the Cobalt Strike website, they show banner information retrieved when port scanning:
Now, the simple headers like SSH are simply sent upon connecting to the service. You just need to receive after connecting to the socket. You don’t even need to send any data:
This is trivial to add to our scanner using the following snippet:
Note: In the above snippet, we are setting the socket back to blocking mode, however if we wanted to keep it asynchronous we could use a similar approach as what we did for connect, check the WSA Error and use select to wait for the recv function to end or time out.
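A sketch of the blocking variant described here, with hypothetical helper names clean_banner and grab_banner. It sets the receive timeout right before recv rather than before connect, which has the same effect for a single read. On Windows, SO_RCVTIMEO takes a DWORD in milliseconds:

```c
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
#endif

/* Trim trailing CR/LF and replace non-printable bytes with '.' so the banner
   prints on one line. buf must have room for len + 1 bytes. Returns the new length. */
size_t clean_banner(char *buf, size_t len) {
    while (len > 0 && (buf[len - 1] == '\r' || buf[len - 1] == '\n')) len--;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c < 0x20 || c > 0x7e) buf[i] = '.';
    }
    buf[len] = '\0';
    return len;
}

#ifdef _WIN32
/* Read whatever the service sends first on an already connected socket.
   Returns the banner length, or 0 if nothing arrived within timeout_ms. */
int grab_banner(SOCKET s, char *buf, int cap, DWORD timeout_ms) {
    u_long blocking = 0;
    int n;
    ioctlsocket(s, FIONBIO, &blocking);  /* back to blocking mode */
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
               (const char *)&timeout_ms, sizeof timeout_ms);
    n = recv(s, buf, cap - 1, 0);        /* leave room for the terminator */
    if (n <= 0) return 0;                /* timeout, error or connection closed */
    return (int)clean_banner(buf, (size_t)n);
}
#endif
```

Services that wait for the client to speak first (HTTP, for example) simply hit the timeout and return no banner.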
I knew you could get computer information from SMB such as hostname and domain but I’ve never had to implement it in code so I wasn’t sure where to get it from. The computer information seemed harder to get from SMB and I thought surely you wouldn’t get that from simply connecting to the socket. Spoiler alert, you don’t…
So I went down the rabbit hole and found some documentation on how to get server information on MS Documentation. I started by using NetServerGetInfo to try and retrieve the same information as in the cobalt strike clip… Turns out it was not that…. I didn’t get much information from it but was on the right track:
Did more digging and found another more promising function, namely NetWkstaGetInfo. And as it turns out, it looks like this is what they are using under the hood. I was able to get the exact same information as Cobalt Strike’s port scanner from that one function call:
Note
You can request different levels of information from both NetWkstaGetInfo and NetServerGetInfo.
Example for NetWkstaGetInfo:
- Anonymous access is always permitted for level 100.
- Authenticated users can view additional information at level 101.
- Members of the Administrators, and the Server, System and Print Operator local groups can view information at level 102.
We’re using level 100 here since we want to do this unauthenticated.
All that remained was adding the following code to our scanner:
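A hedged sketch of what such a level 100 call could look like. The helper names widen_host and print_wksta_info and the output layout are hypothetical and are not meant to reproduce Cobalt Strike's exact format. The Windows part needs to be linked against netapi32:

```c
#include <stddef.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#include <lm.h>
#include <stdio.h>
#endif

/* NetWkstaGetInfo expects a wide-character server name, so widen an ASCII
   host or IP string into dst. Returns 0 on success, -1 if dst is too small. */
int widen_host(const char *host, wchar_t *dst, size_t cap) {
    size_t n = 0;
    while (host[n] != '\0') n++;
    if (n + 1 > cap) return -1;
    for (size_t i = 0; i < n; i++)
        dst[i] = (wchar_t)(unsigned char)host[i];
    dst[n] = L'\0';
    return 0;
}

#ifdef _WIN32
/* Query level 100 workstation info from host and print it. Returns 0 on success. */
int print_wksta_info(const char *host) {
    wchar_t server[256];
    WKSTA_INFO_100 *info = NULL;

    if (widen_host(host, server, 256) != 0) return -1;
    if (NetWkstaGetInfo(server, 100, (LPBYTE *)&info) != NERR_Success) return -1;

    printf("platform: %lu version: %lu.%lu name: %ls domain: %ls\n",
           info->wki100_platform_id,
           info->wki100_ver_major, info->wki100_ver_minor,
           info->wki100_computername, info->wki100_langroup);

    NetApiBufferFree(info);  /* the buffer is allocated by the API */
    return 0;
}
#endif
```

In the scanner this would only be called for hosts where port 445 was found open.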
Final Scanner
After implementing all of the above, we have a scanner that can identify open ports, print Windows host information and display service banners:
However, it’s not a BOF yet…
OPSEC
At the moment the tool scans all selected ports on one (1) IP before moving on to the next, however it might be better to do the opposite and scan one (1) port on all IPs and move on to the next port.
An even better solution might be to randomly choose 1 port and 1 IP and scan that, although if you see random ports and IPs popping up in logs then it also might stand out as weird… More research required.
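To make the random ordering idea concrete, here is a small sketch that enumerates every (IP, port) pair and shuffles them with Fisher-Yates. The probe struct and the shuffled_probes function are hypothetical and not part of the tool. Note that materialising every pair gives up the memory savings of the linked lists for large scans, so this only illustrates the ordering:

```c
#include <stdlib.h>

/* One probe = index into the expanded IP list + index into the expanded port list. */
typedef struct {
    unsigned ip_index;
    unsigned port_index;
} probe;

/* Build all n_ips * n_ports pairs and shuffle them in place.
   Returns a malloc'd array the caller must free, or NULL if empty or out of memory. */
probe *shuffled_probes(unsigned n_ips, unsigned n_ports, unsigned seed) {
    size_t total = (size_t)n_ips * n_ports;
    probe *p;
    if (total == 0) return NULL;
    p = malloc(total * sizeof *p);
    if (p == NULL) return NULL;

    /* host-first order to start with: all ports of IP 0, then IP 1, ... */
    for (size_t i = 0; i < total; i++) {
        p[i].ip_index = (unsigned)(i / n_ports);
        p[i].port_index = (unsigned)(i % n_ports);
    }

    /* Fisher-Yates shuffle; rand() is good enough for a sketch, but it has a
       small range on some platforms and a slight modulo bias */
    srand(seed);
    for (size_t i = total - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        probe tmp = p[i];
        p[i] = p[j];
        p[j] = tmp;
    }
    return p;
}
```

Skipping the shuffle gives the current host-first order, and swapping the division and modulo when filling the array gives the port-first order instead.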
Interestingly, looking at the output from Cobalt Strike’s video demonstrating their port scanner, I noticed two things:
- They are scanning IPs and ports from highest to lowest (probably due to how they are parsing the arguments). You can see the first service to come back is port 5357 on 10.10.10.2222:
And after the video cut, you can see the IPs are decreasing and ports too:
- They perform the NetWkstaGetInfo call on all hosts with port 445 open at the end of the scan, which is why you see all of the port 445 hosts pop up at once:
No idea if their design is better or not. Need to test it inside a lab environment where monitoring tools are in place to gain a better idea. Good follow up research!
Converting to a BOF
A follow up blog will be released highlighting how the scanners were converted to BOFs and will cover the following topics:
- Dynamic Resolution of Win32 APIs and generating function declaration mappings automatically
- Compiling larger BOFs with multiple c files
- Improved BOF output through batching prints
In the meantime, you can enjoy using the tool by downloading it from github: https://github.com/fyxme/portscanbof
Initial release
After converting the application to a BOF and cleaning up the output, we get the following:
The code and usage guide have been released publicly and can be found here: https://github.com/fyxme/portscanbof
The tool may have been updated since this blog was written. See the corresponding GitHub repository for up-to-date information.
Future improvements
- Add async/multi-threading support for TCP scan (this has been added since this blog was written; still need to do more testing but good for now)
- ARP scan
- Add the ability to check if the host is alive before running a TCP scan (similar to how Cobalt Strike does it)
- Additional arguments including timeout, number of threads, only scan 1 IP per host, etc
- Add sleep delay option between ICMP requests to prevent DoS and improve OPSEC (approx 1 second delay from ping.exe)
- Fix code and Makefile for compiling exe’s
- code refactor, and ensure all failure checks are validated (ie. memory allocation failures, etc)
- UDP scan
- IPV6 Support
- Linux support
References
- https://gist.github.com/dascandy/544acdfdc907051bcaa0b51d6d4a334a
- https://github.com/rvrsh3ll/BOF_Collection/blob/master/Network/PortScan/PortScan.c
- https://github.com/tijme/amd-ryzen-master-driver-v17-exploit
- https://trustedsec.com/blog/bofs-for-script-kiddies
- https://blog.cybershenanigans.space/posts/thanatos-agent
- https://blog.cybershenanigans.space/posts/writing-bofs-without-dfr/
- https://www.elastic.co/security-labs/detonating-beacons-to-illuminate-detection-gaps
- https://beej.us/guide/bgnet/html/split/
- https://tangentsoft.com/wskfaq/
- https://blog.cybershenanigans.space/posts/writing-bofs-without-dfr/#intro
- https://github.com/rvrsh3ll/BOF_Collection/blob/master/Network/PortScan/portscan.cna
- https://github.com/wsummerhill/C2_RedTeam_CheatSheets/blob/main/CobaltStrike/BOF_Collections.md
- https://github.com/phra/PEzor/blob/b4e5927775de49735e22dc4b352b7e45d750cb15/bof.cpp#L151
- https://frn.sh/posts/sockets/