Saturday, 16 December 2017

SPO600 - Project - Stage 1

For the SPO600 project we are supposed to find an Open Source Software Project and identify a Hash or Checksum Function in its source code.
I have come across and picked the open source project named Hashids.
From its official website:
"Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs."
Hashids is relatively a small-sized project that offers hashing as a method of managing IDs (for example in websites like YouTube that hash their videos' ID's).

Benchmarking:
For the purpose of benchmarking, I have created a simple tester that will encode numbers in 2 different testing methods:
For the first method: 11 different numbers will be encoded manually and then timed together.
The code to trigger the encode() function and run the software is:
clock_t begin = clock();
std::cout << hash.encode({123}) << std::endl;
std::cout << hash.encode({123456}) << std::endl;
....
....
std::cout << "Time: " << (std::clock() - begin) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
These are the results that I got:
The run-times are at best 0.106ms and at worst 0.130ms for 6 tests.

For the second method: Run a for loop that will encode 5,000,000 values.
The code to trigger the encode() function and run the software is:
int i;
clock_t begin = clock();
for (i = 0; i < 5000000; i++)
       hash.encode({i});

std::cout << "Last value (i) is: " << i << std::endl;
std::cout << "Time: " << (std::clock() - begin) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
 These are the results that I got:
The run-times are at best 3730.58ms and at worst 3922.57ms for 7 tests.

Optimisation:
On first glance, I can already spot that the Makefile is using a -O2 flag. Given the minimal use of resources that software uses for operation, the time differences might be small, so we will be using milliseconds to benchmark the current performance times.
In addition to the flag, I will be focusing on the function named encode() that is inside "hashids.h" and will attempt to adjust the foreign function calling processes to make the function more independent. Considering the fact the code is almost entirely designed to be using vectors, it is already well optimised but can hopefully still be made faster with these updates and algorithm improvements.

In stage 2 of the project I will be implementing the changes discussed in this post and will report back with results of the outcome.

No comments:

Post a Comment