Benchmark Python PCAP Parsers: dpkt vs scapy vs pyshark and finding 100x Speed Improvements
I’ve been working on a personal project that requires parsing very large war-driving packet capture (pcap) files, and I wanted to find the fastest Python library for the job. I found a few libraries that provide this functionality; of those, the most robust and complete were the following:
- scapy: the Python-based interactive packet manipulation program & library.
- pyshark: Python wrapper for tshark, allowing python packet parsing using wireshark dissectors.
- dpkt: The dpkt project is a python module for fast, simple packet parsing, with definitions for the basic TCP/IP protocols.
Initial benchmark
The comparison script can create a sample pcap file using scapy, and it compares speed, memory usage and parsing accuracy. A sample size of 100,000 packets gives pretty good estimates. The results are as follows:
- Performance metrics for 100,000 Packets
| Library | Status | Time (s) | Memory (MB) | Packets |
|---|---|---|---|---|
| scapy | ✓ OK | 29.866 | 609.5 | 100000 |
| pyshark | ✓ OK | 59.576 | 2.6 | 100000 |
| dpkt | ✓ OK | 0.229 | 0.0 | 100000 |
- Extracted data for 100,000 Packets
| Library | Beacons | ProbeReq | ProbeResp | SSIDs | BSSIDs |
|---|---|---|---|---|---|
| scapy | 60000 | 20000 | 20000 | 8 | 256 |
| pyshark | 60000 | 20000 | 20000 | 8 | 256 |
| dpkt | 60000 | 20000 | 20000 | 8 | 256 |
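For context, the kind of measurement behind the two tables above can be sketched with the standard library alone. This is not the comparison script itself: `make_sample_pcap`, `count_packets` and `benchmark` are hypothetical names, and the toy parser only walks pcap record headers instead of dissecting 802.11 frames. It simply shows the timing and packet-count check that each library is put through.

```python
import io
import struct
import time

# Classic pcap layout: 24-byte global header, then a 16-byte header per record.
PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")  # magic, major, minor, thiszone, sigfigs, snaplen, linktype
PCAP_RECORD_HEADER = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len


def make_sample_pcap(packet_count, payload=b"\x00" * 64):
    """Build a small little-endian pcap in memory (stand-in for the generated sample file)."""
    buf = io.BytesIO()
    # 0xA1B2C3D4 is the pcap magic number; 127 is LINKTYPE_IEEE802_11_RADIOTAP.
    buf.write(PCAP_GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 127))
    for i in range(packet_count):
        buf.write(PCAP_RECORD_HEADER.pack(i, 0, len(payload), len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf


def count_packets(stream):
    """Hypothetical minimal parser: walk the record headers and count packets."""
    stream.read(PCAP_GLOBAL_HEADER.size)  # skip the global header
    count = 0
    while True:
        header = stream.read(PCAP_RECORD_HEADER.size)
        if len(header) < PCAP_RECORD_HEADER.size:
            break
        _, _, incl_len, _ = PCAP_RECORD_HEADER.unpack(header)
        stream.seek(incl_len, io.SEEK_CUR)  # skip the packet bytes
        count += 1
    return count


def benchmark(name, parse, stream):
    """Time one parser and report how many packets it saw."""
    start = time.perf_counter()
    packets = parse(stream)
    elapsed = time.perf_counter() - start
    return {"library": name, "time_s": elapsed, "packets": packets}


result = benchmark("toy-parser", count_packets, make_sample_pcap(1000))
print(result)
```

Swapping `count_packets` for a function that wraps scapy, pyshark or dpkt gives one row of the performance table; comparing the returned counts across libraries gives the accuracy check.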
As the results above show, dpkt is more than 100x faster than its peers: it takes only 0.229 seconds to parse 100,000 packets, which is roughly 130x faster than scapy and 260x faster than pyshark. Accuracy is identical across the three libraries: they all find and parse every packet correctly, which is expected.
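The speed-up factors follow directly from the timings in the performance table. A quick check, using only the numbers reported above:

```python
# Parse times in seconds for 100,000 packets, taken from the table above.
times_s = {"scapy": 29.866, "pyshark": 59.576, "dpkt": 0.229}

baseline = times_s["dpkt"]
speedups = {name: t / baseline for name, t in times_s.items() if name != "dpkt"}

for name, factor in speedups.items():
    print(f"dpkt is about {factor:.0f}x faster than {name}")
```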
Improved memory benchmark
The memory numbers look odd because the calculation only considers the memory of the current process. pyshark uses tshark under the hood, which runs as its own process and is therefore not included in the calculation. Additionally, the measurement is taken before and after the work is done, which can miss the actual peak memory usage if memory has already been freed or garbage-collected by then; that is the likely reason dpkt reports 0.0 MB.
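The before/after problem is easy to reproduce. The sketch below uses `tracemalloc` rather than process-level memory, which is an assumption made purely for illustration, and `parse_like_workload` is a hypothetical stand-in for a parser that allocates temporary data and releases it before returning.

```python
import gc
import tracemalloc


def parse_like_workload():
    """Stand-in for a parser: allocate a large temporary list, then drop it."""
    packets = [bytes(1024) for _ in range(20_000)]  # roughly 20 MB of temporary data
    count = len(packets)
    del packets
    gc.collect()
    return count


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
parse_like_workload()
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

delta_mb = (after - before) / 1e6  # what a before/after measurement reports
peak_mb = peak / 1e6               # what the workload actually needed at its peak
print(f"before/after delta: {delta_mb:.1f} MB, peak: {peak_mb:.1f} MB")
```

The delta stays close to zero even though the workload briefly held about 20 MB, which is the same effect that makes a streaming parser look free in a before/after comparison.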
We’ll update the script with a small harness that runs each library in its own process and includes the memory usage of child processes to get a more accurate calculation. Again, this shows that dpkt is much more efficient than the alternatives:
- Performance metrics for 100,000 Packets with improved memory profiling
| Library | Status | Time (s) | Memory (MB) | Packets |
|---|---|---|---|---|
| scapy | ✓ OK | 30.992 | 678.9 | 100000 |
| pyshark | ✓ OK | 57.997 | 209.0 | 100000 |
| dpkt | ✓ OK | 0.229 | 19.7 | 100000 |
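One way to build such a per-process harness is sketched below. It is not taken from the final script: `WORKER_TEMPLATE` and `measure_peak_mb` are hypothetical names, the `work` snippet is a placeholder for the real parsing call, and the `resource` module limits this approach to Unix-like systems. Each library runs in a fresh interpreter, and the worker reports its own peak resident set size plus that of any children it waited for (such as a tshark process).

```python
import subprocess
import sys

# Code executed in the fresh worker process. {work} is replaced by the parsing snippet.
WORKER_TEMPLATE = """
import resource
{work}
own = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
children = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(own + children)
"""


def measure_peak_mb(work):
    """Run `work` in its own Python process and return its peak memory in MB."""
    code = WORKER_TEMPLATE.format(work=work)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    raw = int(completed.stdout.strip().splitlines()[-1])
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return raw / divisor


# Placeholder workload: hold about 50 MB of data, as a parser keeping packets in memory would.
peak_mb = measure_peak_mb("data = [bytes(1024) for _ in range(50_000)]")
print(f"peak memory: {peak_mb:.1f} MB")
```

Because the worker starts from a clean interpreter, even a very light workload reports the baseline cost of Python itself, which would be consistent with dpkt moving from 0.0 MB to 19.7 MB in the table above.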
The final benchmarking script can be found here: https://gist.github.com/fyxme/1dc1662fddfa231f5fa4d2bf519ec93d.
Log scale input pcap sizes parsing comparison
To get a better grasp of the parsing speed differences, I wanted to test with files ranging from 1,000 packets to 1,024,000 packets. Since it takes a while to generate each file, I used a log scale for the input sizes so that I don’t need to test every size in between but still get an accurate estimate. I doubled the input size every time, starting with 1,000 packets, then 2,000, then 4,000, then 8,000, and so on, until I reached a pcap file with 1,024,000 packets.
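The doubling schedule described above works out to eleven file sizes, which can be generated in one line:

```python
# Start at 1,000 packets and double ten times: 1,000 * 2**10 = 1,024,000.
sizes = [1_000 * 2**i for i in range(11)]
print(sizes)
```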
The following table holds the results of each run:
Note: scapy ran out of memory (OOM) and was killed by the operating system (OS) when trying to parse the 1,024,000-packet file. That is why you won’t see a value for it below, nor in the table above.
If we plot the data, we can clearly see that on large datasets, dpkt blows both scapy and pyshark out of the water:
Conclusion
This quick analysis already shows how much difference there can be between the libraries/tools we use. In this case, dpkt was much more resource efficient than the two alternatives, providing much faster parsing with a much lower memory profile. I will use dpkt for this project, which I may release in the future; only time will tell.
We didn’t need to in this case, but if we had to improve the memory profile of one of our Python applications, I would highly recommend using memray to perform a much deeper analysis and identify improvements. I’ll write a quick tutorial next time I come across an interesting use case.
Furthermore, the benchmark does not take CPU usage into consideration because I’m only interested in speed and memory usage. However, a more in-depth benchmark should include CPU usage. Additionally, the pcap generation should be randomised further so that the Beacons, ProbeReq, ProbeResp, SSIDs and BSSIDs vary more, to see how this affects resource usage. I’ll leave this as an exercise for the reader.
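As a possible starting point for that exercise, the sketch below draws a randomised frame plan instead of the fixed mix seen in the results (60% Beacons, 20% ProbeReq, 20% ProbeResp, 8 SSIDs, 256 BSSIDs). All names here (`random_bssid`, `random_ssid`, `random_frame_plan`) are hypothetical; the plan would still need to be turned into real frames by the generator, which is not shown.

```python
import random
import string

FRAME_TYPES = ["Beacon", "ProbeReq", "ProbeResp"]
SSID_ALPHABET = string.ascii_letters + string.digits + "_-"


def random_bssid(rng):
    """Random unicast, locally administered MAC address."""
    first = (rng.randrange(256) | 0x02) & 0xFE  # set the local bit, clear the multicast bit
    rest = [rng.randrange(256) for _ in range(5)]
    return ":".join(f"{octet:02x}" for octet in [first, *rest])


def random_ssid(rng, max_len=32):
    """Random SSID of 1 to 32 characters (32 bytes is the 802.11 limit)."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(SSID_ALPHABET) for _ in range(length))


def random_frame_plan(count, seed=0):
    """Return (frame_type, ssid, bssid) tuples with a randomly weighted frame mix."""
    rng = random.Random(seed)  # seeded so a benchmark run can be reproduced
    weights = [rng.random() for _ in FRAME_TYPES]
    return [
        (rng.choices(FRAME_TYPES, weights)[0], random_ssid(rng), random_bssid(rng))
        for _ in range(count)
    ]


plan = random_frame_plan(5, seed=42)
for frame_type, ssid, bssid in plan:
    print(frame_type, ssid, bssid)
```

Seeding the generator keeps runs comparable: every library can be fed the same randomised file, and a different seed produces a different mix.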
Lastly, remember that it’s always important to test your tools, especially when you’re competing against other tools/people (e.g. bug bounties). A lot of tools may work for the job, but it’s always faster to cut a steak with a sharp knife than a blunt fork.
Stay safe out there.
Appendix
Appendix 1: Initial tool output with broken memory calculations
(m) > python3 scripts/compare_pcap_parsers.py sample_wardriving.pcap
Parsing: sample_wardriving.pcap (7.45 MB)
----------------------------------------
Testing scapy... ✓
Testing pyshark... ✓
Testing dpkt... ✓
================================================================================
PCAP PARSER COMPARISON RESULTS
================================================================================
### Performance Metrics
Library Status Time (s) Memory (MB) Packets
------------------------------------------------------------
scapy ✓ OK 29.866 609.5 100000
pyshark ✓ OK 59.576 2.6 100000
dpkt ✓ OK 0.229 0.0 100000
### Extracted Data
Library Beacons ProbeReq ProbeResp SSIDs BSSIDs
-----------------------------------------------------------------
scapy 60000 20000 20000 8 256
pyshark 60000 20000 20000 8 256
dpkt 60000 20000 20000 8 256
### Sample SSIDs (from scapy)
- CoffeeShop_WiFi
- GuestWiFi
- HomeNetwork
- IoT_Network
- OfficeNet
- OpenWiFi
- SecureNet_5G
- TestNetwork
### Summary
Fastest: dpkt (0.229s)
Lowest Memory: dpkt (0.0 MB)
Most BSSIDs: scapy, pyshark, dpkt (256 found)
================================================================================
Appendix 2: Improved tool with fixed memory calculations
(m) > python3 scripts/compare_pcap_parsers.py sample_wardriving.pcap
Parsing: sample_wardriving.pcap (7.45 MB)
----------------------------------------
Testing scapy... ✓
Testing pyshark... ✓
Testing dpkt... ✓
================================================================================
PCAP PARSER COMPARISON RESULTS
================================================================================
### Performance Metrics
Library Status Time (s) Memory (MB) Packets
------------------------------------------------------------
scapy ✓ OK 30.992 678.9 100000
pyshark ✓ OK 57.997 209.0 100000
dpkt ✓ OK 0.229 19.7 100000
### Extracted Data
Library Beacons ProbeReq ProbeResp SSIDs BSSIDs
-----------------------------------------------------------------
scapy 60000 20000 20000 8 256
pyshark 60000 20000 20000 8 256
dpkt 60000 20000 20000 8 256
### Sample SSIDs (from scapy)
- CoffeeShop_WiFi
- GuestWiFi
- HomeNetwork
- IoT_Network
- OfficeNet
- OpenWiFi
- SecureNet_5G
- TestNetwork
### Summary
Fastest: dpkt (0.229s)
Lowest Memory: dpkt (19.7 MB)
Most BSSIDs: scapy, pyshark, dpkt (256 found)
================================================================================
Appendix 3: Example with sample generation
(m) > python3 scripts/compare_pcap_parsers.py --generate-sample --sample-packets 10000
Generated sample pcap: sample_wardriving.pcap
Packets: 10000
Unique SSIDs: 8
Parsing: sample_wardriving.pcap (0.75 MB)
----------------------------------------
Testing scapy... ✓
Testing pyshark... ✓
Testing dpkt... ✓
================================================================================
PCAP PARSER COMPARISON RESULTS
================================================================================
### Performance Metrics
Library Status Time (s) Memory (MB) Packets
------------------------------------------------------------
scapy ✓ OK 3.302 2.5 10000
pyshark ✓ OK 6.065 0.6 10000
dpkt ✓ OK 0.022 0.3 10000
### Extracted Data
Library Beacons ProbeReq ProbeResp SSIDs BSSIDs
-----------------------------------------------------------------
scapy 6000 2000 2000 8 256
pyshark 6000 2000 2000 8 256
dpkt 6000 2000 2000 8 256
### Sample SSIDs (from scapy)
- CoffeeShop_WiFi
- GuestWiFi
- HomeNetwork
- IoT_Network
- OfficeNet
- OpenWiFi
- SecureNet_5G
- TestNetwork
### Summary
Fastest: dpkt (0.022s)
Lowest Memory: dpkt (0.3 MB)
Most BSSIDs: scapy, pyshark, dpkt (256 found)
================================================================================