Benchmarks
The performance of an external text file sort depends on multiple factors - the number of lines, the length of the lines and the available computational resources - CPU, memory, type of disk. Lines cannot be treated as fixed-size records, so when a line has to be moved to another location in the file we cannot simply swap it with another line; large amounts of data have to be moved instead. Another limiting factor is that the external sort contains at least two sequential parts - splitting the work into concurrent tasks and merging the results. It is not obvious that the concurrent part is beneficial at all, since all the data has to be merged into the result file at the end.
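To make the two sequential parts concrete, here is a minimal sketch of an external merge sort in Swift: split the input into sorted runs on disk, then merge the runs into the output file. The type and function names, the run size measured in lines rather than bytes, and the use of URL.lines (which requires macOS 12 or later) are illustrative assumptions, not this project's actual API.

```swift
import Foundation

enum ExternalSort {

    /// Phase 1: stream the input line by line, collect up to `linesPerRun`
    /// lines, sort them in memory and write each sorted run to a temporary file.
    static func splitIntoSortedRuns(input: URL, linesPerRun: Int) async throws -> [URL] {
        var runs: [URL] = []
        var buffer: [String] = []
        for try await line in input.lines {          // requires macOS 12+
            buffer.append(line)
            if buffer.count == linesPerRun {
                runs.append(try writeRun(buffer.sorted()))
                buffer.removeAll(keepingCapacity: true)
            }
        }
        if !buffer.isEmpty { runs.append(try writeRun(buffer.sorted())) }
        return runs
    }

    /// Phase 2: k-way merge. Keep one line iterator per run and repeatedly
    /// emit the smallest head line until every run is exhausted.
    static func mergeRuns(_ runs: [URL], into output: URL) async throws {
        try Data().write(to: output)                 // create or truncate the output
        let out = try FileHandle(forWritingTo: output)
        defer { try? out.close() }

        var iterators = runs.map { $0.lines.makeAsyncIterator() }
        var heads: [String?] = []
        for i in iterators.indices { heads.append(try await iterators[i].next()) }

        while true {
            // Linear scan for the smallest head; a real implementation would use a heap.
            var minIndex: Int? = nil
            for (i, head) in heads.enumerated() where head != nil {
                if minIndex == nil || head! < heads[minIndex!]! { minIndex = i }
            }
            guard let i = minIndex else { break }
            try out.write(contentsOf: Data((heads[i]! + "\n").utf8))
            heads[i] = try await iterators[i].next()
        }
    }

    private static func writeRun(_ sortedLines: [String]) throws -> URL {
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent(UUID().uuidString)
        try (sortedLines.joined(separator: "\n") + "\n")
            .write(to: url, atomically: true, encoding: .utf8)
        return url
    }
}

// Usage:
// let runs = try await ExternalSort.splitIntoSortedRuns(input: inputURL, linesPerRun: 2_000_000)
// try await ExternalSort.mergeRuns(runs, into: outputURL)
```

Even in this simplified form the merge phase is sequential: every line of every run passes through it once more on its way to the output file, which is why splitting the sorting work alone cannot make the whole sort scale linearly with the number of tasks.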
The goal of this benchmark is to verify that there are minimal anomalies when increasing the file size and that adding CPU resources improves performance. The basic unit of work is a text line of approximately 50 bytes, so a file of 1 million lines is about 50 MB. We run the sort on files of increasing length and evaluate the growth in sort duration and the scalability with additional CPU cores.
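For reference, input files of this shape can be produced with a small helper like the sketch below. The exact line format used to generate the benchmark files is an assumption; the only property relied on here is that each line is roughly 50 bytes.

```swift
import Foundation

/// Writes `lineCount` lines of roughly 50 bytes each, e.g. 1_000_000 lines ≈ 50 MB.
/// For the larger benchmark files the text would be written in batches instead of
/// one in-memory string.
func writeTestFile(to url: URL, lineCount: Int) throws {
    var text = ""
    text.reserveCapacity(lineCount * 50)
    for i in 0..<lineCount {
        // A UUID (36 characters) plus a counter gives lines of roughly 50 bytes.
        text += "\(UUID().uuidString) line-\(i)\n"
    }
    try text.write(to: url, atomically: true, encoding: .utf8)
}

// Usage: try writeTestFile(to: URL(fileURLWithPath: "input-1m.txt"), lineCount: 1_000_000)
```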
The benchmark was run on a MacBook Pro, without reaching memory or CPU limits. See system details below.
System Software Overview:
System Version: macOS 12.2.1 (21D62)
Kernel Version: Darwin 21.3.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: *****
User Name: *****
Secure Virtual Memory: Enabled
System Integrity Protection: Enabled
Time since boot: 55 days 21:54
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro18,2
Chip: Apple M1 Max
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 32 GB
From the charts below we can see that when the input is read in large chunks, in our case 100 MB for files ranging from 50 MB to 10 GB, adding CPU tasks to the sorting work significantly decreases the sort duration. However, the gain is not proportional to the number of tasks added. For example, using 8 tasks for a 10 GB file is only 2.5 times faster than using 1 task. This is because a significant part of the time is spent reading and writing all the content for the purpose of the merge. We can also see that merging intermediate results concurrently, before merging them into the final result, further reduces the sort time.
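The following sketch shows one way the chunk sorting can be spread over a bounded number of concurrent tasks with a Swift task group. The function name, the in-memory chunk representation and the maxTasks parameter are illustrative assumptions rather than this project's implementation; in a real external sort each task would write its sorted chunk to an intermediate file instead of returning it, and the same pattern can be applied to merging intermediate runs before the final merge.

```swift
import Foundation

/// Sorts the given chunks concurrently, keeping at most `maxTasks` sorting
/// tasks in flight, and returns the sorted chunks in their original order.
func sortChunksConcurrently(_ chunks: [[String]], maxTasks: Int) async -> [[String]] {
    await withTaskGroup(of: (Int, [String]).self) { group in
        var results = [[String]](repeating: [], count: chunks.count)
        var next = 0

        // Start with at most `maxTasks` tasks in flight.
        while next < chunks.count && next < maxTasks {
            let index = next
            group.addTask { (index, chunks[index].sorted()) }
            next += 1
        }
        // As each task finishes, record its result and start the next chunk,
        // so at most `maxTasks` sorts run at any time.
        for await (index, sortedChunk) in group {
            results[index] = sortedChunk
            if next < chunks.count {
                let nextIndex = next
                group.addTask { (nextIndex, chunks[nextIndex].sorted()) }
                next += 1
            }
        }
        return results
    }
}
```

Because the merge of intermediate results still has to touch every byte of the input, the speedup from adding tasks flattens out, which matches the 2.5x figure observed for 8 tasks on the 10 GB file.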
Configured to read the input in chunks of 10 MB.
Configured to read the input in chunks of 100 MB.
Configured to read the input in chunks of 100 MB.