University of Oulu

Comparing the performance of string operations across programming languages

Author: Pelkonen, Niko [1]
Organizations: [1] University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Information Processing Science, Information Processing Science
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 2.6 MB)
Pages: 73
Language: English
Published: Oulu : N. Pelkonen, 2020
Publish Date: 2020-01-20
Thesis type: Master's thesis
Tutor: Vesanen, Ari; Siirtola, Antti
Reviewer: Vesanen, Ari; Siirtola, Antti; Seppänen, Pertti


In this thesis, the performance of string operations is compared across programming languages. Handling strings efficiently is important, especially when performance is a crucial factor and large strings may occur. Common situations where large strings occur include digitalization of a product, reading string data from a database, reading and handling large CSV and Excel files, converting one file format to another (e.g. CSV to Excel and vice versa), and reading and handling the DOM tree of a website.

There has been a lot of related research in which programming languages are benchmarked, but none of it focuses directly on string operations. The main goal of this thesis is to fill this gap in the literature and to find out which programming languages achieve the best results on string operations in terms of execution time and memory usage (maximum RSS).
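The two metrics named above can be illustrated with a minimal Python sketch. It is not the measurement setup used in the thesis, which this abstract does not describe; the helper name `measure` is hypothetical. It assumes a Unix-like system, because the standard `resource` module is not available on Windows.

```python
import resource
import time


def measure(operation, *args):
    """Run operation(*args) and report its wall-clock time and the
    peak resident set size (maximum RSS) of the current process.

    Hypothetical helper for illustration only; the thesis may have
    measured these values with external tooling instead.
    """
    start = time.perf_counter()
    result = operation(*args)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the peak RSS of the whole process so far, not of
    # this one call. It is reported in kilobytes on Linux and in
    # bytes on macOS.
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, max_rss


if __name__ == "__main__":
    text = "ab a" * 1000
    _, seconds, peak = measure(str.upper, text)
    print(f"elapsed: {seconds:.6f} s, max RSS: {peak}")
```

Because the maximum RSS covers the whole process lifetime, a fair comparison would presumably run each operation in a fresh process.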

The test environment consisted of randomly generated string files ranging in size from ten thousand to 100 million characters. The generated characters were ‘a’, ‘b’, and ‘ ’ (whitespace character). The programming languages selected for this thesis were Python, C, C++, Java, Perl, Ruby, Go, and Swift.
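A test file of this kind could be generated roughly as follows. This is a sketch under stated assumptions, not the generator used in the thesis: the uniform character distribution, the chunked writing, the function name `write_string_file`, and the intermediate sizes (powers of ten) are all assumptions, since the abstract gives only the alphabet and the smallest and largest sizes.

```python
import random

# The three characters named in the abstract.
ALPHABET = "ab "

# Assumed size ladder: only the end points (10,000 and 100,000,000)
# come from the abstract; the powers of ten in between are a guess.
SIZES = [10**n for n in range(4, 9)]


def write_string_file(path, length, seed=None, chunk_size=1_000_000):
    """Write `length` random characters from ALPHABET to `path`.

    Characters are drawn uniformly (an assumption) and written in
    chunks so that the largest files do not need one huge list in
    memory.
    """
    rng = random.Random(seed)
    with open(path, "w", encoding="ascii", newline="") as f:
        remaining = length
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write("".join(rng.choices(ALPHABET, k=n)))
            remaining -= n
```

Calling `write_string_file("strings_10000.txt", SIZES[0], seed=1)` would produce the smallest file; passing a fixed seed makes the content reproducible across runs.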

Go appeared to be the most efficient language in terms of execution time, although it was not the fastest in many individual operations. C used very little memory, but only five operations were implemented in it. Every operation was implemented in Python, which needed memory beyond that used to load the string file in only one operation: sorting a string. Swift performed rather poorly, which could be due to the Linux version of Swift that was used. In regular expressions, Perl and C++ clearly outperformed the other languages. Java used the most memory in every operation.


Copyright information: © Niko Pelkonen, 2020. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.