Planet CDOT

December 17, 2017

Chun Sing Lam

SPO600 Project – Stage 1 Summary

The first step of stage one of this project is to find an open source software package that implements a hash function. This is difficult because projects rarely advertise that they use a hash function; you have to discover it by reading the source code. Once I found a package with a hash function, my next step was to figure out a way to benchmark its performance. Creating the script to call the three hash functions for benchmarking took a while because I had to deal with a lot of variables. My software, OlegDB, includes hash functions optimized for x86 and x64 systems, so there are three hash function versions to deal with. I decided to benchmark all three so that I can use them in stage two of this project. Since the hash functions are already optimized, there is a lower chance that I will find an optimization opportunity; working with three hash functions therefore increases my chance of successfully optimizing one. I look forward to stage two – the implementation stage!

by chunsinglam at December 17, 2017 05:02 AM

Marco Beltempo

Optimizing Google’s Compression Algorithm (SPO600 Stage 1)

The final project for my software portability and optimization class consists of two stages. The first stage involves finding an open source software package that includes hashing functionality written in a language that compiles to machine code (C, C++, Assembler).

Still learning the ropes of the open source community, I figured this was a great opportunity to use my experiences from past projects to my advantage. Boy, do I wish it was that easy. The difference this time around was that we weren’t necessarily looking to tackle an existing bug or a particular project. It was up to us to dig through source code looking for some sort of hashing functionality.

I figured the best place to start searching was GitHub’s Trending page for popular C/C++ projects. Although there are hundreds of great C/C++ projects, I almost forgot how overwhelming new source code can be. It’s like entering a foreign land where you speak the language but a different dialect.

Projects can contain hundreds of thousands of lines of code, and understanding their entire functionality is next to impossible. Searching for file names on GitHub is relatively simple, but this also returns third-party files and outside sources. I started to download various packages to perform a project-wide search for any reference to hashing algorithms.

I spent hours digging through endless repos and directories trying to find a great function I could optimize for this assignment.

Then I stumbled upon this project from Google and knew it would be a great fit.

The project is named Zopfli: a compression library programmed in C, designed to produce very good, but slow, deflate or zlib compression – it trades speed for compression density. Zopfli will compress to the deflate, gzip and zlib output formats, but it won’t decompress; existing gzip or deflate libraries can decompress the data.

What’s great about this project is that it’s not overly huge, and there’s a variety of different hashing functionality to explore.


For this project I will be focusing on the ZopfliResetHash function, which re-initializes the fields of the hash state before it is used again.

void ZopfliResetHash(size_t window_size, ZopfliHash* h) {
  size_t i;

  h->val = 0;
  for (i = 0; i < 65536; i++) {
    h->head[i] = -1;  /* -1 indicates no head so far. */
  }
  for (i = 0; i < window_size; i++) {
    h->prev[i] = i;  /* If prev[j] == j, then prev[j] is uninitialized. */
    h->hashval[i] = -1;
  }

  for (i = 0; i < window_size; i++) {
    h->same[i] = 0;
  }

  h->val2 = 0;
  for (i = 0; i < 65536; i++) {
    h->head2[i] = -1;
  }
  for (i = 0; i < window_size; i++) {
    h->prev2[i] = i;
    h->hashval2[i] = -1;
  }
}


In terms of performance benchmarking for Zopfli, I plan on isolating a few key features. I’ll be focusing on restructuring the ZopfliResetHash function for better performance at build time and during execution.

As of right now, executing the time make command yields the following results:

  • real: Elapsed real (wall clock) time used by the process, in seconds.
  • user: Total number of CPU-seconds that the process used directly (in user mode), in seconds.
  • sys: Total number of CPU-seconds used by the system on behalf of the process (in kernel mode), in seconds.

real 0m2.613s
user 0m2.322s
sys  0m0.286s

To isolate the function from the rest of the program, I’ve wrapped it in a simple timing block that measures the execution of the function.


clock_t start, end;
start = clock();
ZopfliResetHash(window_size, h);
end = clock();
printf("Number of seconds: %f\n", (end - start) / (double)CLOCKS_PER_SEC);

Wrapping ZopfliResetHash this way consistently yields:

Number of seconds: 0.00231
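A single reset finishes in milliseconds, which is close to the resolution of clock(), so a more stable measurement repeats the call many times and divides. Here is a generic sketch of my own (timed_work() is a stand-in for a call like ZopfliResetHash(window_size, h), since the real ZopfliHash struct lives in Zopfli's hash.h):

```c
#include <stdio.h>
#include <time.h>

#define REPS 1000

static volatile int sink;  /* volatile prevents the loop being optimized away */

/* Stand-in for the call being measured, e.g. ZopfliResetHash(window_size, h). */
static void timed_work(void) { sink++; }

/* Time REPS repetitions and return the average seconds per call. */
static double time_reps(void) {
    clock_t start = clock();
    for (int r = 0; r < REPS; r++)
        timed_work();
    clock_t end = clock();
    return (end - start) / (double)CLOCKS_PER_SEC / REPS;
}
```

Averaging over many repetitions smooths out timer granularity and transient system noise.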


After taking some time to analyze the functionality of the hash.c file, I’ve started to assemble a strategy to optimize the hash functionality. I will work on implementing altered build options in the makefile and improving the loop algorithms to trigger compiler optimizations.
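As a first sketch of the loop changes, the element-wise reset loops in ZopfliResetHash look like candidates for replacement with memset. This is my own illustrative example, not Zopfli code: setting every byte to 0xFF yields -1 in each int on two's-complement machines, and zeroing works for any element size. Whether this beats what gcc -O2 already emits has to be benchmarked.

```c
#include <string.h>

#define N 65536

/* Simplified stand-ins for the h->head and h->same arrays. */
static int head[N];
static unsigned short same[N];

/* Element-by-element reset, as in the original function. */
static void reset_with_loops(void) {
    size_t i;
    for (i = 0; i < N; i++) head[i] = -1;
    for (i = 0; i < N; i++) same[i] = 0;
}

/* Candidate replacement: one memset per array. */
static void reset_with_memset(void) {
    memset(head, 0xFF, sizeof(head)); /* every byte 0xFF => each int is -1 */
    memset(same, 0, sizeof(same));
}
```

Both versions leave the arrays in the same state, so the substitution is behavior-preserving; the only question is speed.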

The post Optimizing Google’s Compression Algorithm (SPO600 Stage 1) appeared first on Marco Beltempo.

by Marco Beltempo at December 17, 2017 04:59 AM

Saeed Mohiti

Final Project Stage 1 #2

I downloaded the file and extracted it on the system.

I ran the commands provided on GitHub:

$ sh
$ ./configure
$ make

But unfortunately, I got an error when I ran ./configure. The error was:

checking build system type… /bin/sh: ./config.guess: No such file or directory

configure: error: cannot guess build type; you must specify one


So I tried to figure out what config.guess is and how I could update it.

I found the latest versions and downloaded them:

$ wget -O config.guess ';a=blob_plain;f=config.guess;hb=HEAD'
$ wget -O config.sub ';a=blob_plain;f=config.sub;hb=HEAD'

After that, the package compiled and installed perfectly.

Now it is time to benchmark, but before that I need to make a few files of different sizes. To do this, I ran the following command, which creates a file of random data with a size of approximately 110MB:

dd if=/dev/random of=sam110m bs=1M count=110 iflag=fullblock

With the following command we can run the program and see the time to completion:

time md5deep sam110m

In addition, to get a very accurate benchmark we need to run the test many times and calculate the average.

Since running the time command 20 or more times by hand is tedious, I wrote this command, which runs the hash a specific number of times:

time for i in {1..10}; do md5deep sam110m; done

I got this result after running the loop of 10 runs under the time command:


real    0m19.818s

user    0m17.109s

sys     0m3.073s

This program is fast for a small file, and the file gets hashed quickly, so I made another file of 500MB and tested it 10 times, getting this result:

real    0m15.028s

user    0m12.143s

sys     0m2.744s

We can see the difference; I can still say it is fast for this size of file.
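Collecting the per-run times and averaging them can also be automated. This is a sketch of my own, not part of md5deep, and it assumes GNU time is available at /usr/bin/time and that the sam110m test file exists:

```shell
# Record each run's wall-clock time (%e) to a file, then average with awk.
rm -f times.txt
for i in $(seq 1 10); do
  /usr/bin/time -f "%e" md5deep sam110m > /dev/null 2>> times.txt
done
awk '{ sum += $1 } END { printf "average: %.3f s\n", sum / NR }' times.txt
```

This gives a per-run average directly instead of a single total for the whole loop.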

The next step is to focus on an area for optimization. After looking at the source code, finding a function that acts on its own, separately, was hard, but luckily, after following different function calls, I reached this:


void sha1_update( sha1_context *ctx, const unsigned char *input, size_t ilen )
{
    size_t fill;
    unsigned long left;

    if( ilen <= 0 )
        return;

    left = ctx->total[0] & 0x3F;
    fill = 64 - left;

    ctx->total[0] += (unsigned long) ilen;
    ctx->total[0] &= 0xFFFFFFFF;

    if( ctx->total[0] < (unsigned long) ilen )
        ctx->total[1]++;

    if( left && ilen >= fill )
    {
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, fill );
        sha1_process( ctx, ctx->buffer );
        input += fill;
        ilen  -= fill;
        left = 0;
    }

    while( ilen >= 64 )
    {
        sha1_process( ctx, input );
        input += 64;
        ilen  -= 64;
    }

    if( ilen > 0 )
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, ilen );
}


It seems there is no obvious way to optimize this function itself. Oh, the frustration, after taking a while to find it! In my next post I will look into the compiler flags: since they use -O2, I might be able to replace it with -O3.


by msmohiti at December 17, 2017 04:58 AM

Chun Sing Lam

SPO600 Project – Stage 1

Benchmark the performance of the hash function

As mentioned in my previous post, my next step is to benchmark the performance of the Murmur3 hash function in OlegDB. On AArchie (an AArch64 system), after I downloaded the OlegDB tarball and extracted it using the “tar -xvzf” command, I found the Murmur3 hash function in a file called murmur3.c. Basically, the entire file is specific to the hash function implementation. In fact, the file contains three Murmur3 hash functions, which are optimized for the x86 and x64 platforms. The first function is optimized for a 32-bit machine and produces a 32-bit output. The second function is also optimized for a 32-bit machine but produces a 128-bit output. The GitHub website for the Murmur3 hash states that the second function takes about 86% more time to run than the first function. The third function is optimized for a 64-bit machine and produces a 128-bit output.

The hash functions accept input from arguments and are not called from within murmur3.c itself, so I cannot simply compile and run murmur3.c and set a timer for each function to determine its execution time. Therefore, I need to benchmark the hash functions by creating and running a program that calls them with the necessary arguments and times how long each one takes to execute. Before I do this, I look at the Makefile, which is used to compile and build the software. Its contents show that the entire OlegDB software is compiled with gcc using the -O2 optimization option and some warning options. In the same directory, I come across the CONTRIBUTORS file, which lists a contributor with a GitHub website with information about a Murmur3 implementation. On that site I find a file called example.c, which is a sample program. When a user runs the program, the user provides a string as the first argument, which will later be converted into a hash. The program uses that string, along with other defined values, as arguments to call the three hash functions, computes the hash value for the string with each, and displays the results on the screen. Each hash function outputs a different hash value.

It will make my life easier to use this sample program for benchmarking. I create a file called benchmark.c in the same directory as the header file murmur3.h and copy all the code from example.c and murmur3.c into it. Only a few changes are needed to use it for benchmarking. Since example.c executes each hash function only once, I change it to execute each function many times in order to evaluate performance accurately. I generate random characters as sample data to be hashed, predefine the total number of function executions, and add a “for” loop around each function call so that each function executes the predefined number of times. Lastly, I use the “clock” function to record the time right before and after each loop to determine the total time spent executing each function. Here is the code from benchmark.c:

/* Murmur3 benchmarking */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include "murmur3.h"

#define SIZE 100000

/* Platform-specific functions and macros */

#ifdef __GNUC__
#define FORCE_INLINE __attribute__((always_inline)) inline
#else
#define FORCE_INLINE
#endif

static FORCE_INLINE uint32_t rotl32 ( uint32_t x, int8_t r )
{
  return (x << r) | (x >> (32 - r));
}

static FORCE_INLINE uint64_t rotl64 ( uint64_t x, int8_t r )
{
  return (x << r) | (x >> (64 - r));
}

#define ROTL32(x,y)     rotl32(x,y)
#define ROTL64(x,y)     rotl64(x,y)

#define BIG_CONSTANT(x) (x##LLU)

/* Block read - if your platform needs to do endian-swapping or can only
 * handle aligned reads, do the conversion here */

#define getblock(p, i) (p[i])

/* Finalization mix - force all bits of a hash block to avalanche */

static FORCE_INLINE uint32_t fmix32 ( uint32_t h )
{
  h ^= h >> 16;
  h *= 0x85ebca6b;
  h ^= h >> 13;
  h *= 0xc2b2ae35;
  h ^= h >> 16;

  return h;
}

/* ---------- */

static FORCE_INLINE uint64_t fmix64 ( uint64_t k )
{
  k ^= k >> 33;
  k *= BIG_CONSTANT(0xff51afd7ed558ccd);
  k ^= k >> 33;
  k *= BIG_CONSTANT(0xc4ceb9fe1a85ec53);
  k ^= k >> 33;

  return k;
}

/* Prototypes for the hash functions defined below main() */
void MurmurHash3_x86_32 ( const void * key, int len, uint32_t seed, void * out );
void MurmurHash3_x86_128 ( const void * key, const int len, uint32_t seed, void * out );
void MurmurHash3_x64_128 ( const void * key, const int len, const uint32_t seed, void * out );

int main() {
  uint32_t hash[4];
  uint32_t seed = 40;
  int i;
  clock_t startTime1, startTime2, startTime3;
  clock_t endTime1, endTime2, endTime3;
  double totalTime1, totalTime2, totalTime3;
  char *samples = (char*)calloc(SIZE, sizeof(char));

  // Generate random sample of characters
  // (caution: fills the whole buffer, leaving no NUL terminator for strlen())
  for (i = 0; i < SIZE; i++) {
    samples[i] = (random() % 26) + 'A';
  }

  startTime1 = clock();  // Start time
  for (i = 0; i < SIZE; i++) {
    MurmurHash3_x86_32(samples, strlen(samples), seed, hash);
  }
  endTime1 = clock();  // End time
  totalTime1 = (double)(endTime1 - startTime1) / CLOCKS_PER_SEC;  // Total execution time
  printf("x86_32:  %08x\n", hash[0]);
  printf("Total time: %lf seconds\n", totalTime1);

  startTime2 = clock();  // Start time
  for (i = 0; i < SIZE; i++) {
    MurmurHash3_x86_128(samples, strlen(samples), seed, hash);
  }
  endTime2 = clock();  // End time
  totalTime2 = (double)(endTime2 - startTime2) / CLOCKS_PER_SEC;  // Total execution time
  printf("x86_128: %08x %08x %08x %08x\n",
         hash[0], hash[1], hash[2], hash[3]);
  printf("Total time: %lf seconds\n", totalTime2);

  startTime3 = clock();  // Start time
  for (i = 0; i < SIZE; i++) {
    MurmurHash3_x64_128(samples, strlen(samples), seed, hash);
  }
  endTime3 = clock();  // End time
  totalTime3 = (double)(endTime3 - startTime3) / CLOCKS_PER_SEC;  // Total execution time
  printf("x64_128: %08x %08x %08x %08x\n",
         hash[0], hash[1], hash[2], hash[3]);
  printf("Total time: %lf seconds\n", totalTime3);

  return 0;
}

void MurmurHash3_x86_32 ( const void * key, int len,
                          uint32_t seed, void * out )
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 4;
  int i;

  uint32_t h1 = seed;

  uint32_t c1 = 0xcc9e2d51;
  uint32_t c2 = 0x1b873593;

  /* body */

  const uint32_t * blocks = (const uint32_t *)(data + nblocks*4);

  for(i = -nblocks; i; i++)
  {
    uint32_t k1 = getblock(blocks,i);

    k1 *= c1;
    k1 = ROTL32(k1,15);
    k1 *= c2;

    h1 ^= k1;
    h1 = ROTL32(h1,13);
    h1 = h1*5+0xe6546b64;
  }

  /* tail */

  const uint8_t * tail = (const uint8_t*)(data + nblocks*4);

  uint32_t k1 = 0;

  switch(len & 3)
  {
  case 3: k1 ^= tail[2] << 16;
  case 2: k1 ^= tail[1] << 8;
  case 1: k1 ^= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  }

  /* finalization */

  h1 ^= len;

  h1 = fmix32(h1);

  *(uint32_t*)out = h1;
}

void MurmurHash3_x86_128 ( const void * key, const int len,
                           uint32_t seed, void * out )
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 16;
  int i;

  uint32_t h1 = seed;
  uint32_t h2 = seed;
  uint32_t h3 = seed;
  uint32_t h4 = seed;

  uint32_t c1 = 0x239b961b;
  uint32_t c2 = 0xab0e9789;
  uint32_t c3 = 0x38b34ae5;
  uint32_t c4 = 0xa1e38b93;

  /* body */

  const uint32_t * blocks = (const uint32_t *)(data + nblocks*16);

  for(i = -nblocks; i; i++)
  {
    uint32_t k1 = getblock(blocks,i*4+0);
    uint32_t k2 = getblock(blocks,i*4+1);
    uint32_t k3 = getblock(blocks,i*4+2);
    uint32_t k4 = getblock(blocks,i*4+3);

    k1 *= c1; k1  = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
    h1 = ROTL32(h1,19); h1 += h2; h1 = h1*5+0x561ccd1b;
    k2 *= c2; k2  = ROTL32(k2,16); k2 *= c3; h2 ^= k2;
    h2 = ROTL32(h2,17); h2 += h3; h2 = h2*5+0x0bcaa747;
    k3 *= c3; k3  = ROTL32(k3,17); k3 *= c4; h3 ^= k3;
    h3 = ROTL32(h3,15); h3 += h4; h3 = h3*5+0x96cd1c35;
    k4 *= c4; k4  = ROTL32(k4,18); k4 *= c1; h4 ^= k4;
    h4 = ROTL32(h4,13); h4 += h1; h4 = h4*5+0x32ac3b17;
  }

  /* tail */

  const uint8_t * tail = (const uint8_t*)(data + nblocks*16);

  uint32_t k1 = 0;
  uint32_t k2 = 0;
  uint32_t k3 = 0;
  uint32_t k4 = 0;

  switch(len & 15)
  {
  case 15: k4 ^= tail[14] << 16;
  case 14: k4 ^= tail[13] << 8;
  case 13: k4 ^= tail[12] << 0;
           k4 *= c4; k4  = ROTL32(k4,18); k4 *= c1; h4 ^= k4;
  case 12: k3 ^= tail[11] << 24;
  case 11: k3 ^= tail[10] << 16;
  case 10: k3 ^= tail[ 9] << 8;
  case  9: k3 ^= tail[ 8] << 0;
           k3 *= c3; k3  = ROTL32(k3,17); k3 *= c4; h3 ^= k3;
  case  8: k2 ^= tail[ 7] << 24;
  case  7: k2 ^= tail[ 6] << 16;
  case  6: k2 ^= tail[ 5] << 8;
  case  5: k2 ^= tail[ 4] << 0;
           k2 *= c2; k2  = ROTL32(k2,16); k2 *= c3; h2 ^= k2;
  case  4: k1 ^= tail[ 3] << 24;
  case  3: k1 ^= tail[ 2] << 16;
  case  2: k1 ^= tail[ 1] << 8;
  case  1: k1 ^= tail[ 0] << 0;
           k1 *= c1; k1  = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  }

  /* finalization */

  h1 ^= len; h2 ^= len; h3 ^= len; h4 ^= len;

  h1 += h2; h1 += h3; h1 += h4;
  h2 += h1; h3 += h1; h4 += h1;

  h1 = fmix32(h1);
  h2 = fmix32(h2);
  h3 = fmix32(h3);
  h4 = fmix32(h4);

  h1 += h2; h1 += h3; h1 += h4;
  h2 += h1; h3 += h1; h4 += h1;

  ((uint32_t*)out)[0] = h1;
  ((uint32_t*)out)[1] = h2;
  ((uint32_t*)out)[2] = h3;
  ((uint32_t*)out)[3] = h4;
}

void MurmurHash3_x64_128 ( const void * key, const int len,
                           const uint32_t seed, void * out )
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 16;
  int i;

  uint64_t h1 = seed;
  uint64_t h2 = seed;

  uint64_t c1 = BIG_CONSTANT(0x87c37b91114253d5);
  uint64_t c2 = BIG_CONSTANT(0x4cf5ad432745937f);

  /* body */

  const uint64_t * blocks = (const uint64_t *)(data);

  for(i = 0; i < nblocks; i++)
  {
    uint64_t k1 = getblock(blocks,i*2+0);
    uint64_t k2 = getblock(blocks,i*2+1);

    k1 *= c1; k1  = ROTL64(k1,31); k1 *= c2; h1 ^= k1;
    h1 = ROTL64(h1,27); h1 += h2; h1 = h1*5+0x52dce729;
    k2 *= c2; k2  = ROTL64(k2,33); k2 *= c1; h2 ^= k2;
    h2 = ROTL64(h2,31); h2 += h1; h2 = h2*5+0x38495ab5;
  }

  /* tail */

  const uint8_t * tail = (const uint8_t*)(data + nblocks*16);

  uint64_t k1 = 0;
  uint64_t k2 = 0;

  switch(len & 15)
  {
  case 15: k2 ^= (uint64_t)(tail[14]) << 48;
  case 14: k2 ^= (uint64_t)(tail[13]) << 40;
  case 13: k2 ^= (uint64_t)(tail[12]) << 32;
  case 12: k2 ^= (uint64_t)(tail[11]) << 24;
  case 11: k2 ^= (uint64_t)(tail[10]) << 16;
  case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
  case  9: k2 ^= (uint64_t)(tail[ 8]) << 0;
           k2 *= c2; k2  = ROTL64(k2,33); k2 *= c1; h2 ^= k2;
  case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
  case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
  case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
  case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
  case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
  case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
  case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
  case  1: k1 ^= (uint64_t)(tail[ 0]) << 0;
           k1 *= c1; k1  = ROTL64(k1,31); k1 *= c2; h1 ^= k1;
  }

  /* finalization */

  h1 ^= len; h2 ^= len;

  h1 += h2;
  h2 += h1;

  h1 = fmix64(h1);
  h2 = fmix64(h2);

  h1 += h2;
  h2 += h1;

  ((uint64_t*)out)[0] = h1;
  ((uint64_t*)out)[1] = h2;
}

Before I run the program, I check who is logged in to AArchie by using the “who” command. There are two users logged in, so it should not affect my program performance. I use the “top” command to view running processes and I only see my processes, so that is good. Now, I compile this program using the same options as in the Makefile by using the command “gcc -Wall -Werror -g3 -O2 -Wstrict-aliasing=2 benchmark.c -o benchmark”. I will run this program ten times and get the average time to produce more accurate results. I run this program and here are the results:

[cslam4@aarchie include]$ ./benchmark

x86_32:  a005988f

Total time: 7.189417 seconds

x86_128: ead5c42f 18504ee9 0f36d6f8 8a268189

Total time: 6.251050 seconds

x64_128: fafd75f5 b84021d9 157f69c4 c52090ae

Total time: 5.295304 seconds

[cslam4@aarchie include]$ ./benchmark

x86_32:  a005988f

Total time: 7.191669 seconds

x86_128: ead5c42f 18504ee9 0f36d6f8 8a268189

Total time: 6.254057 seconds

x64_128: fafd75f5 b84021d9 157f69c4 c52090ae

Total time: 5.289192 seconds

[cslam4@aarchie include]$ ./benchmark

x86_32:  a005988f

Total time: 7.189152 seconds

x86_128: ead5c42f 18504ee9 0f36d6f8 8a268189

Total time: 6.252524 seconds

x64_128: fafd75f5 b84021d9 157f69c4 c52090ae

Total time: 5.289142 seconds

[cslam4@aarchie include]$ ./benchmark

x86_32:  a005988f

Total time: 7.190567 seconds

x86_128: ead5c42f 18504ee9 0f36d6f8 8a268189

Total time: 6.253532 seconds

x64_128: fafd75f5 b84021d9 157f69c4 c52090ae

Total time: 5.291871 seconds

[cslam4@aarchie include]$ ./benchmark

x86_32:  a005988f

Total time: 7.188331 seconds

x86_128: ead5c42f 18504ee9 0f36d6f8 8a268189

Total time: 6.249359 seconds

x64_128: fafd75f5 b84021d9 157f69c4 c52090ae

Total time: 5.291187 seconds

I only ran this program five times since it was producing very consistent results. The three hash functions produce different hashes due to different hash lengths and code that is optimized for a specific platform. However, it is good to see that the same hash has been produced for each function individually for all five runs. The average execution time for the first hash function is 7.1898 seconds. The average execution time for the second hash function is 6.2521 seconds. The average execution time for the third hash function is 5.2913 seconds.

Identify strategies to optimize the hash function

After benchmarking the performance of the current hash functions, the next step is to identify strategies to optimize them. I will start with the easiest method, which is to alter the current build options for OlegDB. Currently, -O2 is the only optimization option used to compile and build OlegDB. I will try other options such as -O3, which enables more optimization flags, or -Ofast, which enables all -O3 optimizations plus optimizations that do not follow strict standards. I will also look at individual optimization flags and other options, and run tests to find out whether they improve the performance of the hash functions.

After looking at the build options, I will analyze the hash function code for changes that allow the compiler to optimize it better. This can include restructuring loops to guide the compiler toward vectorization and the use of SIMD operations, replacing an expensive operation with a cheaper one, or changing how a loop iterates so that it runs more efficiently.

The third approach is to improve on the existing algorithms. This will most likely not happen, since the hash function implementation is already optimized for x86 and x64 systems, bit-shift operations are already in use, and the other operations seem to use the best algorithms possible. The last method is to use inline assembly language, which is also unlikely, since writing assembly code is difficult and I do not yet have enough knowledge to handle the complexity of inline assembly.

As a final note, I have been working with three hash functions instead of one because the OlegDB source code already optimizes the hash function implementation for x86 and x64 systems. The three hash functions are integrated into one file, so it is easier to keep them together than to extract one specific function for optimization. Working with all three also increases my chance of finding optimization opportunities. Now that stage one of this project is complete, I can move on to stage two: implementing optimizations to the hash function.
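To make the "replace an expensive operation with a cheaper one" idea concrete, here is a sketch of my own (not OlegDB or Murmur3 code): the h1*5 mixing steps in MurmurHash3 can in principle be written as a shift and add. Modern compilers usually make this substitution themselves at -O2, so any such variant has to be benchmarked rather than assumed faster.

```c
#include <stdint.h>

/* Two equivalent forms of the Murmur3 mixing step h1 = h1*5 + 0xe6546b64.
 * The shift form is the "cheaper operation" a compiler may already emit. */
static uint32_t mix_mul(uint32_t h1)   { return h1 * 5 + 0xe6546b64; }
static uint32_t mix_shift(uint32_t h1) { return ((h1 << 2) + h1) + 0xe6546b64; }
```

Since (h1 << 2) + h1 equals h1*5 modulo 2^32, the two forms produce identical hashes; only the generated machine code can differ.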

by chunsinglam at December 17, 2017 04:49 AM

Matthew Welke

Beginning my Open Source Contribution (SPO Final Project – Phase 1)

My final project for my SPO600 class is to find a hashing or checksum algorithm in popular open source software and improve it using the optimization techniques I learned in class. We were given the example of MurmurHash, a non-cryptographic hashing function. However, I wasn’t able to find many uses of this online. Where I did find it, it already seemed to be fully optimized.

I decided to think critically about what role hashing plays in computer science. I remembered learning about MongoDB and I liked how it generated the GUIDs for documents as they’re inserted. Each was guaranteed to be unique. I thought that sounded like hashing. So on that topic, I thought about any other database or database-like software that might have this, and a few others came to mind: PostgreSQL, MySQL, Redis. I thought that Redis would be the least likely to have received enough widespread attention that it would already be optimized. So, I checked out its source code.

What I found was pretty interesting. Hashing definitely played a role. I found a hashing function that looks like it has to do with deciding on the key to use as objects are inserted into a hash table data structure in memory. Luckily, the creator of the software and any maintainers wrote very clear code, and it was well-commented, which helped me quickly understand that this might work for my project. The source code file in question is here and this is the function I’ll be studying:

/* Generic hash function (a popular one from Bernstein).
 * I tested a few and this was the best. */
static unsigned int dictGenHashFunction(const unsigned char *buf, int len) {
    unsigned int hash = 5381;

    while (len--)
        hash = ((hash << 5) + hash) + (*buf++); /* hash * 33 + c */
    return hash;
}
I can see that this function is important in the software because it is called in many places in the source code repository, including in the files ‘src/server.c’ (called 6 times), ‘src/dict.c’ (called once), ‘src/latency.c’ (called once), ‘deps/hiredis/async.c’ (called once), and ‘utils/hashtable/rehashing.c’ (called once).

Another reason I think it’s a candidate for my project is the nature of the code. It doesn’t appear to take advantage of any optimization techniques, such as loop unrolling or inline assembly. I will be investigating whether or not these techniques help. I can also investigate what compiler flags the function is compiled with. If I learn that it was not compiled with “-O3” (for maximum optimization at the expense of a larger binary file size), then this will be one of the first things I do to optimize it.
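To make the loop-unrolling idea concrete, here is a sketch of my own (not Redis code) of a 4-way unrolled variant of the Bernstein hash above. It must produce identical hashes to the original; whether it is actually faster on AArch64 is exactly what the benchmarks will have to show.

```c
/* Original Bernstein (djb2) hash, as in Redis's dictGenHashFunction. */
static unsigned int djb2(const unsigned char *buf, int len) {
    unsigned int hash = 5381;
    while (len--)
        hash = ((hash << 5) + hash) + (*buf++); /* hash * 33 + c */
    return hash;
}

/* 4-way unrolled variant: same result, fewer loop-condition checks. */
static unsigned int djb2_unrolled(const unsigned char *buf, int len) {
    unsigned int hash = 5381;
    while (len >= 4) {
        hash = ((hash << 5) + hash) + buf[0];
        hash = ((hash << 5) + hash) + buf[1];
        hash = ((hash << 5) + hash) + buf[2];
        hash = ((hash << 5) + hash) + buf[3];
        buf += 4;
        len -= 4;
    }
    while (len--)  /* remaining 0-3 bytes */
        hash = ((hash << 5) + hash) + (*buf++);
    return hash;
}
```

Because the unrolled body applies the identical update four times per pass, the two functions agree for any input length, including lengths that are not multiples of four.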


In order to establish a baseline as I begin to improve the code, it’s a good idea to benchmark it before I make changes. Since the class focuses on the AArch64 architecture, I’ll be building it on our school’s AArch64 system and will be running my future changes there too. I have a Raspberry Pi 2 (a newer model, also with the AArch64 architecture), so I might test it on there too just for the sake of comparison. I might identify trends that help me understand my changes as I go.

I created the following program to benchmark this function. Note that the timing code only times the hash generation, not the setup portion where I prepare the random data for the test:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_TESTS 10000000
#define STRING_LENGTH 50

static unsigned int dictGenHashFunction(const unsigned char *buf, int len) {
    unsigned int hash = 5381;

    while (len--)
        hash = ((hash << 5) + hash) + (*buf++); /* hash * 33 + c */
    return hash;
}

int main() {

    printf("Generating data for test.\n");

    // Generate NUM_TESTS random strings of length STRING_LENGTH
    char** testStrings = (char**) malloc(NUM_TESTS * sizeof(char*));

    for (int i = 0; i < NUM_TESTS; i++) {
        testStrings[i] = (char*) malloc(STRING_LENGTH * sizeof(char));

        // Fill up new string with random characters
        int j;
        for (j = 0; j < STRING_LENGTH - 1; j++) {
            testStrings[i][j] = (char) (rand() % 255);
        }
        // End with null terminator
        testStrings[i][j] = '\0';
    }

    printf("Beginning test.\n");

    clock_t begin = clock();

    for (int i = 0; i < NUM_TESTS; i++) {
        unsigned int hash = dictGenHashFunction((const unsigned char *) testStrings[i], 42);
        (void) hash;  // result intentionally unused; we only time the calls
    }

    clock_t end = clock();

    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;

    printf("Time spent: %.3f\n", time_spent);

    return 0;
}
Running this test yields the following results on the school’s AArch64 machine:

gcc source.c && ./a.out
Generating data for test.
Beginning test.
Time spent: 2.145

gcc source.c && ./a.out
Generating data for test.
Beginning test.
Time spent: 2.134

gcc source.c && ./a.out
Generating data for test.
Beginning test.
Time spent: 2.150


This means I now have a good base to work on Phase 2 of the project. I’ll be able to measure the improvements I make. I’ll be posting more as I begin this work!

by Matt at December 17, 2017 04:45 AM

Aaron Brooks Patola

Project (Stage 1)

For the SPO600 project we are to find an open source software package that includes a hash function (Murmurhash, SHA, etc) that is implemented in a language that compiles to machine code (C, C++, or Assembler). From there we will benchmark performance on AArch64 systems and then identify one or several potential optimization methods that we will attempt to implement in Stage 2 of the project.

After spending some time searching, to not much avail, for a strong hash function candidate, I reverted to a program I looked into at the start of the semester. My blog post a couple months back focused on building an open source software package, the gnugo game, and after looking into its source code again I found that the game engine uses a hash function. This is what I will be looking into further from now on, unless I find a stronger candidate that may be easier to test optimization with.

The gnugo game engine uses a hashing method called Zobrist hashing, which is used in computer programs that play board games such as Chess and Go to implement transposition tables – a special kind of hash table that is indexed by a board position and used to avoid analyzing the same position more than once.
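In essence, the scheme works like this (a minimal illustrative sketch of my own, not GNU Go’s actual hash.c code; the board size and the use of rand() are assumptions):

```c
#include <stdint.h>
#include <stdlib.h>

#define SQUARES 361   /* 19x19 Go board (assumption) */
#define STATES  3     /* empty, black stone, white stone */

/* One random bit string per (square, state) pair. */
static uint32_t zobrist_table[SQUARES][STATES];

/* Randomly generate the bit strings for each possible board element. */
static void zobrist_init(unsigned int seed) {
    srand(seed);
    for (int sq = 0; sq < SQUARES; sq++)
        for (int st = 0; st < STATES; st++)
            zobrist_table[sq][st] = ((uint32_t) rand() << 16) ^ (uint32_t) rand();
}

/* The hash of a position is the XOR of the entries for every square. */
static uint32_t zobrist_hash(const int board[SQUARES]) {
    uint32_t h = 0;
    for (int sq = 0; sq < SQUARES; sq++)
        h ^= zobrist_table[sq][board[sq]];
    return h;
}
```

The appeal for transposition tables is that placing or removing a stone updates the hash incrementally with two XORs (remove the old state's entry, add the new one) instead of recomputing over the whole board.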

Here is a look at how the code is implemented within the hash.c file of the game engine…

***Zobrist hashing starts by randomly generating bit strings for each possible element of a board game***

[screenshot: the bit-string initialization code in hash.c]

And here we can see how it calculates a transformation invariant hash value…

[screenshot: the transformation-invariant hash calculation in hash.c]

For those curious to look more into the code of the game itself it can be found here.

At the moment I will not be running any performance/optimization tests as I am still pondering the most appropriate way to go about this for efficient results.

One method to optimize the current implementation for AArch64 systems, suggested by my professor, is to use the table lookup instructions (TBL and TBX) to accelerate some parts of this hash. These instructions look up single-byte values in the range 0-127 in a table, fetching the results for up to 16 input values with a single instruction (range 0-63) or two instructions (0-127).

Over the course of the next week I will be looking more into this hashing algorithm and trying to isolate its performance on the AArch64 systems. Stay tuned!

by bpatola at December 17, 2017 04:44 AM

Ronen Agarunov

SPO600 - Project - Stage 1

For the SPO600 project we are supposed to find an Open Source Software Project and identify a Hash or Checksum Function in its source code.
I have come across and picked the open source project named Hashids.
From its official website:
"Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs."
Hashids is a relatively small project that offers hashing as a method of managing IDs (for example, websites like YouTube hash their videos' IDs).

For the purpose of benchmarking, I have created a simple tester that will encode numbers in 2 different testing methods:
For the first method: 11 different numbers will be encoded manually and then timed together.
The code to trigger the encode() function and run the software is:
clock_t begin = clock();
std::cout << hash.encode({123}) << std::endl;
std::cout << hash.encode({123456}) << std::endl;
std::cout << "Time: " << (std::clock() - begin) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
These are the results that I got:
The run-times are at best 0.106ms and at worst 0.130ms for 6 tests.

For the second method: Run a for loop that will encode 5,000,000 values.
The code to trigger the encode() function and run the software is:
int i;
clock_t begin = clock();
for (i = 0; i < 5000000; i++)
    hash.encode({i}); /* encode each value */

std::cout << "Last value (i) is: " << i << std::endl;
std::cout << "Time: " << (std::clock() - begin) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
 These are the results that I got:
The run-times are at best 3730.58ms and at worst 3922.57ms for 7 tests.
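The clock()-based millisecond pattern used in both methods generalizes to any function under test. Here is a self-contained C sketch of it; hash_fn is a placeholder workload I made up for the example, not Hashids code:

```c
#include <time.h>

/* Placeholder workload standing in for whatever function is measured. */
static unsigned hash_fn(unsigned x) {
    return x * 2654435761u;       /* Knuth-style multiplicative hash */
}

/* Time `iterations` calls of hash_fn and return elapsed milliseconds. */
double time_hash_ms(unsigned iterations) {
    volatile unsigned sink = 0;   /* keep the loop from being optimized away */
    clock_t begin = clock();
    for (unsigned i = 0; i < iterations; i++)
        sink += hash_fn(i);
    /* clock() counts CLOCKS_PER_SEC ticks per second, so dividing the
     * tick delta by (CLOCKS_PER_SEC / 1000) yields milliseconds. */
    return (clock() - begin) / ((double)CLOCKS_PER_SEC / 1000.0);
}
```

One caveat with this pattern: clock() measures CPU time, not wall-clock time, and its resolution varies by platform, so very short runs should be repeated in a loop as done above.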

At first glance, I can already see that the Makefile is using the -O2 flag. Given the minimal resources the software uses during operation, the time differences might be small, so we will be using milliseconds to benchmark the current performance times.
In addition to experimenting with the flag, I will be focusing on the encode() function inside "hashids.h" and will attempt to reduce its calls into other functions to make it more independent. Considering that the code is designed almost entirely around vectors, it is already well optimised, but it can hopefully still be made faster with these updates and algorithm improvements.

In stage 2 of the project I will be implementing the changes discussed in this post and will report back with results of the outcome.

by ron-spo at December 17, 2017 03:41 AM

Eric Ferguson

Final Project Part 01 - Final Summary

To end part 1 I will summarize what information I have gathered for part 2:
  • I am optimizing clib, a package manager.
  • After benchmarking, it is clear there is a significant time delay in some advanced calls of the SHA1 function, such as ones that call update many times.
  • To optimize, I am going to add the -O3 flag and remove a loop condition (currently).
Some other observations:
  • This project is relatively small with no ./configure for other platforms.
  • The SHA1 code is unique and does not follow the straightforward SHA-1 algorithm as presented, for example, on Wikipedia.
  • The documentation (i.e. README) is relatively vague about the dependencies. It suggests only syntax that implies installation, and it isn't clear at distinguishing development from published code.

 I have learned a lot getting to this point in part 1. Firstly, I learned that library files can only be linked after running sudo ldconfig, and the files must be in /usr/lib. Secondly, I learned how to alter an advanced Makefile's flags from vi instead of passing a CFLAGS argument to ./configure on the command line. Thirdly, I learned that certain libraries have cflags that must be enabled to compile with that library. In conclusion, this has been very informative and I expect it to be extremely useful in other open source development projects.

by Eric Ferguson at December 17, 2017 01:10 AM

Jiyoung Bae


For the final project, I have been searching, whenever I had time since the project was released, for software that uses checksum or hash functions and is written in C, C++, or assembler. MurmurHash3 was my first target, so I looked at all the open-source projects listed on its wiki page. npm (the Node.js package manager) looked interesting; at first I thought it was exactly what I was looking for, but I found that its use of MurmurHash3 seemed to be fully optimized already. So did the other software listed. That was overwhelming.

Next, I googled keywords such as "free software using/implementing checksum" and "comparison of file verification software", but I found that most of the software is written in Java, C#, Perl, or other languages, or cannot be run in a Linux environment. However, I tried some of the software that was available.

cksum – I realized that it is a command that I can use in Linux without any installation. see also cksum.html

My test to see if I could use this without installation:

// hello.c
#include <stdio.h>

int main() {
     printf("Hello World");
     return 0;
}
[jbae18@aarchie project]$ cksum hello.c
1634081651 59 hello.c
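For illustration, here is a sketch of how a CRC-style checksum like the one behind cksum works. Note that this is the common reflected CRC-32 variant (polynomial 0xEDB88320), not the exact non-reflected CRC that POSIX cksum specifies, so its output will not match the number above:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 (common reflected variant). POSIX cksum uses a
 * different, non-reflected CRC with length post-processing, so this
 * will not reproduce cksum's output; it just illustrates how a CRC
 * folds every input bit into a running remainder. */
uint32_t crc32_bytes(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;   /* final inversion */
}
```

A table-driven version (256 precomputed entries) is the usual production form; the bitwise loop above is the easiest to read.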

RHash – A console utility which supports many checksum and hash functions. I tried to choose one of its hash functions to experiment on and post about later, because the code available on GitHub is huge. After reviewing the code, however, it is hard to extract a single function to experiment with, because they are all dependent on each other, although it is a great package.

pHash – It is used to hash audio, images, and text. The pHash code is also huge like RHash, but I decided to choose this package and experiment with its audio hash, which was familiar from class. I fixed some of the code as shown below, due to compile errors on AArch64, and modified the main code from lab 6. However, I could not get further without fully understanding this code.

/* Each value is computed from successive overlapping frames of the input buffer.
 * The value is based on the bark scale values of the frame fft spectrum. The value
 * computed from temporal and spectral differences on the bark scale.
 * /param buf - pointer to start of buffer
 * /param N   - length of buffer
 * /param sr  - sample rate on which to base the audiohash
 * /param nb_frames - (out) number of frames in audio buf and length of audiohash buffer returned
 * /return uint32 pointer to audio hash, NULL for error
 */

#define SAMPLESIZE 500000000
#define FACTOR 0.75
#define SAMPLERATE 8000

u_int32_t* ph_audiohash(float *buf, int N, int sr, int &nb_frames){

   int frame_length = 4096;//2^12
   int nfft = frame_length;
   int nfft_half = 2048;
   int start = 0;
   int end = start + frame_length - 1;
   int overlap = (int)(31*frame_length/32);
   int advance = frame_length - overlap;
   int index = 0;
   nb_frames = (int)(floor(N/advance) - floor(frame_length/advance) + 1);
   double window[frame_length];
   for (int i = 0;i<frame_length;i++){
       //hamming window
       window[i] = 0.54 - 0.46*cos(2*M_PI*i/(frame_length-1));
   }
   double frame[frame_length];
   //fftw_complex *pF;
   //fftw_plan p;
   //pF = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*nfft);
   double *pF = (double*)malloc(sizeof(double)*nfft);

   double magnF[nfft_half];
   double maxF = 0.0;
   double maxB = 0.0;
   double minfreq = 300;
   double maxfreq = 3000;
   double minbark = 6*asinh(minfreq/600.0);
   double maxbark = 6*asinh(maxfreq/600.0);
   double nyqbark = maxbark - minbark;
   int nfilts = 33;
   double stepbarks = nyqbark/(nfilts - 1);
   int nb_barks = (int)(floor(nfft_half/2 + 1));
   double barkwidth = 1.06;

   double freqs[nb_barks];
   double binbarks[nb_barks];
   double curr_bark[nfilts];
   double prev_bark[nfilts];
   for (int i=0;i< nfilts;i++){
       prev_bark[i] = 0.0;
   }
   u_int32_t *hash = (u_int32_t*)malloc(nb_frames*sizeof(u_int32_t));
   double lof,hif;

   for (int i=0; i < nb_barks;i++){
       binbarks[i] = 6*asinh(i*sr/nfft_half/600.0);
       freqs[i] = i*sr/nfft_half;
   }
   double **wts = new double*[nfilts];
   for (int i=0;i<nfilts;i++){
      wts[i] = new double[nfft_half];
   }
   for (int i=0;i<nfilts;i++){
       for (int j=0;j<nfft_half;j++){
           wts[i][j] = 0.0;
       }
   }
   //calculate wts for each filter
   for (int i=0;i<nfilts;i++){
       double f_bark_mid = minbark + i*stepbarks;
       for (int j=0;j<nb_barks;j++){
           double barkdiff = binbarks[j] - f_bark_mid;
           lof = -2.5*(barkdiff/barkwidth - 0.5);
           hif = barkdiff/barkwidth + 0.5;
           double m = std::min(lof,hif);
           m = std::min(0.0,m);
           m = pow(10,m);
           wts[i][j] = m;
       }
   }

   //p = fftw_plan_dft_r2c_1d(frame_length,frame,pF,FFTW_ESTIMATE);

   while (end < N){
       maxF = 0.0;
       maxB = 0.0;
       for (int i = 0;i<frame_length;i++){
           frame[i] = window[i]*buf[start+i];
       }
       //if (fft(frame, frame_length, pF) < 0){
       //    return NULL;
       //}
       for (int i=0; i < nfft_half;i++){
           //magnF[i] = sqrt(pF[i][0]*pF[i][0] + pF[i][1]*pF[i][1] );
           magnF[i] = abs(pF[i]);
           if (magnF[i] > maxF){
               maxF = magnF[i];
           }
       }

       for (int i=0;i<nfilts;i++){
           curr_bark[i] = 0;
           for (int j=0;j < nfft_half;j++){
               curr_bark[i] += wts[i][j]*magnF[j];
           }
           if (curr_bark[i] > maxB)
               maxB = curr_bark[i];
       }

       u_int32_t curr_hash = 0x00000000u;
       for (int m=0;m<nfilts-1;m++){
           double H = curr_bark[m] - curr_bark[m+1] - (prev_bark[m] - prev_bark[m+1]);
           curr_hash = curr_hash << 1;
           if (H > 0)
               curr_hash |= 0x00000001;
       }

       hash[index] = curr_hash;
       for (int i=0;i<nfilts;i++){
           prev_bark[i] = curr_bark[i];
       }
       index += 1;
       start += advance;
       end   += advance;
   }

   for (int i=0;i<nfilts;i++){
       delete [] wts[i];
   }
   delete [] wts;
   return hash;
}

When I fixed all the errors and compiled it again, I ran into further errors and other issues. After spending a few hours on this, I realized that I should find another, simpler package. The attachment below shows the last issue I got.


I also had a look at clhash and hashcat, but I have not found an easy way in yet.

At this point, my plan is one of the following – I want to do the first if possible.

  1. I will figure out the problem in pHash after getting some help from my professor, and then move forward, because I see a lot of opportunity in the long code.
  2. I will examine clhash or hashcat more.
  3. I will find standalone source code on GitHub, because it looks simpler and easier to work with than a full package.

by Irene Bae at December 17, 2017 12:44 AM

Eric Ferguson

Final Project Part 01 - Benchmarking and how to Optimize the Sha1 Function in clib

To benchmark the sha1 test file, I copied the flags from the standard Makefile of clib and placed them in the local dev Makefile of sha1. After using clock to time the functions in each test, it yielded the following results:

The revised timing code can be found here for personal testing. We can see that the majority of the time is spent in Test 6, so we are going to apply our optimization there and hopefully see a change in the speed of that test. The current cflags of the standard clib Makefile are as follows:

-DCURL_STATICLIB -std=c99 -Ideps -Wall -Wno-unused-function -U__STRICT_ANSI__ $(shell deps/curl/bin/curl-config --cflags)

I plan on adding the -O3 optimization flag here. The challenge will be to either ensure that this does not affect any other dependency, or to make a special case in the Makefile for only the SHA1 code.
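One possible way to scope the flag, assuming clib's build uses GNU Make and the object path is deps/sha1/sha1.o (both assumptions on my part, not verified against the actual Makefile), is a target-specific variable, which appends -O3 only when that one object is compiled:

```make
# Hypothetical sketch: apply -O3 only to the SHA1 object so the
# project-wide CFLAGS stay untouched for every other dependency.
deps/sha1/sha1.o: CFLAGS += -O3
```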

I also plan to take the if condition below out of the sha1.c file and put that case in a separate loop.

 These appear to be the only optimizations I can currently make, due to the uniqueness of the sha1 implementation; if more are determined, they will be applied.

by Eric Ferguson at December 17, 2017 12:35 AM

December 16, 2017

Sofia Ngo-Trong

Finding an Open-Source Project Hash Function

For Part 1 of our term project, we are to find an open-source package that utilizes a checksum or hashing function which we have to benchmark the performance of. The goal is to optimize the piece of code and improve its performance, in Part 2 of the project, and benchmark the optimization to prove it.

According to Wikipedia, a hash function is “any function that can be used to map data of arbitrary size to  data of fixed size”. They are commonly used in databases for rapid data lookup, or in cryptography for assuring integrity of data. There are many famous hashing algorithms, including the SHA family, MD-5, and MurmurHash, which is relatively new, and recommended by our professor for its simplicity and its general-purpose usage.

I would’ve liked this blog to be about Part 1 of the project, but it looks like my Part 1 will constitute many parts. It turns out that simply looking for an open-source project to work with has been fairly challenging. I have spent roughly the equivalent of a whole day to just look for software to work with, and I am still on the hunt.

Nevertheless, here is a whirlwind breakdown of the challenges I faced.

Searching the net for a package

This activity proved a lot tougher than I thought it would be. I literally spent hours and hours looking for a suitable package, and where I did find relevant contenders, I felt the code could not be optimized any further.

An example is this library which is the c++ implementation of this library for C, which is based on the MurmurHash function.

At first, these two libraries were very appealing to me because of their simplicity, and the fact that they were libraries solely to provide the MurmurHash function. However, because that’s their main function, the developers would surely have spent a lot of time to ensure it is optimal, and when I looked at the code, I did not find anywhere I could optimize.

Here is the code for the c++ implementation:

uint32_t MurMurHash::MurMur3_32(std::vector<uint8_t>& input) {
    if (input.empty()) {
        return 0;
    }

    const int nBlocks = input.size() / 4;
    const uint32_t* blocks = reinterpret_cast<const uint32_t*>(&input[0]);
    const uint8_t* tail = &input[nBlocks*4];

    uint32_t hash = 0;

    uint32_t k;
    for (int i = 0; i < nBlocks ; ++i) {
        k = blocks[i];

        k *= c1;
        k = rotl32(k,15);
        k *= c2;

        hash ^= k;
        hash = rotl32(hash,13);
        hash = (hash * m) + n;
    }

    k = 0;
    switch (input.size() & 3) {
    case 3:
        k ^= tail[2] << 16;
    case 2: // intentionally inclusive of above
        k ^= tail[1] << 8;
    case 1:
        k ^= tail[0];
        k *= c1;
        k = rotl32(k,15);
        k *= c2;
        hash ^= k;
    }

    hash ^= input.size();
    hash ^= hash >> 16;
    hash *= final1;
    hash ^= hash >> 13;
    hash *= final2;
    hash ^= hash >> 16;

    return hash;
}

In the first for-loop, we can see that there is a loop-carried variable, k, that is rewritten on every iteration, so that region is not easily parallelizable. Also, the finalization at the end already uses bit-shifting.

Discouraged by the lack of evident room for optimization, I kept on looking.

Some software packages that I found were actually just libraries that an individual put together, implementing his or her own version of a hash function, and they did not contain any configure or make files to build the project. This would be disadvantageous because it would remove one possible optimization option: changing the build options of the package.

An example of such package is this C++ Hashing Library by Stephan Brumme, which was appealing to me, because it was simple, just contained the methods for different hashing functions which you can import in your code, and as a first scan, I felt I had identified room for optimization. In the code for the SHA256 processBlock method, there were some parts of the code that I thought perhaps I could vectorize:

// get last hash

uint32_t a = m_hash[0];
uint32_t b = m_hash[1];
uint32_t c = m_hash[2];
uint32_t d = m_hash[3];
uint32_t e = m_hash[4];
uint32_t f = m_hash[5];
uint32_t g = m_hash[6];
uint32_t h = m_hash[7];


// update hash
m_hash[0] += a;
m_hash[1] += b;
m_hash[2] += c;
m_hash[3] += d;
m_hash[4] += e;
m_hash[5] += f;
m_hash[6] += g;
m_hash[7] += h;

However, I figured that the number of elements was too small and that speed would not improve, due to parallelization overhead. The fact that this was not a package I could build also made this choice less appealing.

As I grew more desperate to find an open-source package, I went to the GNU website, looked at the section with small blurbs about each of their free software packages, and searched for the word "hash". Several programs contained it, one of which was combine. When I looked through its source code, I was happy to find separate hash.h and hash.c files where it implemented its own hashing function. Even better, this program was not mainly about providing a hashing service: it is a program that combines data from two different files using keys within the files, and the hashing is used internally to help it identify files. Alas, I had found the one… or so I thought.

Making the Open-Source Package

Another roadblock for me was simply configuring and building the software packages I had found. One of the interesting packages, mnoGoSearch, which provides search functions for web pages and also implements its own hash function in its own hash.c source file, produced an error when I tried to make it that I was not able to diagnose. For the "combine" package specifically, after downloading and unzipping it on the AArchie server and running "configure" and then "make", I got an error saying that it could not guess the build type of the system. As my professor diagnosed, the reason is that the package is a bit old (last updated in 2013, before AArch64 systems became common). He gave me a fix, which was to issue two commands:

wget -O config.guess ';a=blob_plain;f=config.guess;hb=HEAD'
wget -O config.sub ';a=blob_plain;f=config.sub;hb=HEAD'

As he explained, these two files help the configure script configure for AArch64. After this fix, the program built correctly, and I thought my woes were finally over…

Isolating the hash code to prepare for benchmarking

In order to benchmark the hash function for the “combine” program, I would need to take great care in ensuring that I’m targeting the hash function, and not the whole program. I wanted to extract the hash file and just use it in my own mini test C program.

However, as I tried to write my test program, I soon realized that the hash file had way too many dependencies within the program for me to use it independently in mine.

The hash file only contains three functions. Here are their declarations in “hash.h”:

HASHTYPE calc_hash_key (STRINGTYPE *);
HASHTYPE find_table_entry (STRINGTYPE *, val_entry *, long int);
HASHTYPE create_table_entry (STRINGTYPE *, val_entry *, long int);

The calc_hash_key function was the one I needed and wanted to extract. In its implementation in the c file, the description says that it “Calculates a hash key based on the string passed in. Takes the three least significant bits from each byte in the string up to a maximum that will fit in the hash key.” and the argument passed in is the string that needs to be hashed. Perfect, and simple.

HASHTYPE
calc_hash_key (STRINGTYPE *string)
{
	int i;
	int j;
	HASHTYPE hash_key;
	HASHTYPE temp_key;
	unsigned char *string_tmp;

	j = 0;
	hash_key = 0;

	switch (gi_hashmovement_ind) {
	case hm_binary:
		/* Use all bits, as many at a time as will fit into HASHTYPE. */
		string_tmp = string->string;
		while (string_tmp < string->string + string->length) {
			i = 0;
			temp_key = 0;
			while (i < sizeof (temp_key)
			       && string_tmp < string->string + string->length) {
				temp_key <<= 8;
				temp_key |= *string_tmp++;
				i++;
			}
			temp_key <<= j;
			hash_key = (hash_key ^ temp_key);
		}
		break;
	case hm_binary_long:
		/* Not really binary. This is more a generic string hash. */
		string_tmp = string->string;
		while (string_tmp < string->string + string->length) {
			i = 0;
			temp_key = 0;
			while (i < sizeof (temp_key) * 2
			       && string_tmp < string->string + string->length) {
				temp_key <<= 4;
				temp_key |= (*string_tmp++ & 15);
				i++;
			}
			temp_key <<= j;
			hash_key = (hash_key ^ temp_key);
		}
		break;
	case hm_number:
		hash_key = dstrtonum (string, NULL, 10);
		string_tmp = string->string;
		while (string_tmp < string->string + string->length) {
			hash_key *= 10;
			hash_key += *string_tmp++ - '0';
		}
		break;
	case hm_beginning:
	case hm_end:
		i = 0;
		switch (gi_hashmovement_ind) {
		case hm_beginning:
			string_tmp = string->string;
			while ((i < (sizeof hash_key) * 8 / 3)
			       && string_tmp < string->string + string->length) {
				hash_key <<= 3;
				hash_key |= (*string_tmp++ & 7);
				i++;
			}
			break;
		case hm_end:
			string_tmp = string->string;
			if (string->length > (sizeof hash_key) * 8 / 3) {
				string_tmp += string->length - ((sizeof hash_key) * 8 / 3);
			}
			while ((i < (sizeof hash_key) * 8 / 3)
			       && string_tmp < string->string + string->length) {
				hash_key <<= 3;
				hash_key |= (*string_tmp++ & 7);
				i++;
			}
			break;
		}
		break;
	}

	return hash_key;
}
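Stripped of combine's types and modes, the scheme described above (pack the three least-significant bits of each byte until the key is full) can be sketched on its own. This is my simplified illustration, not the package's actual code:

```c
#include <stdint.h>
#include <stddef.h>

typedef unsigned long long hashtype;   /* stands in for HASHTYPE */

/* Simplified sketch of the described scheme: take the low three bits of
 * each input byte and shift them into the key, stopping once as many
 * 3-bit groups as fit in the key have been consumed. */
hashtype hash3bits(const unsigned char *s, size_t len) {
    hashtype key = 0;
    size_t max_bytes = (sizeof key) * 8 / 3;   /* 3-bit groups that fit */
    for (size_t i = 0; i < len && i < max_bytes; i++) {
        key <<= 3;
        key |= (hashtype)(s[i] & 7);           /* low three bits of the byte */
    }
    return key;
}
```

With a 64-bit key this consumes at most 21 bytes of input, which matches the "up to a maximum that will fit in the hash key" wording in the function's description.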

However, when I tried creating a test program that used this file, I realized that it had too many dependencies on other files. I tried to decouple this file by substituting the macros for their definitions from the other files. For example the macro “HASHTYPE” was defined as “unsigned long long” in the file “df_common.h”, so I could simply substitute that and not use the dependency to the file, but the macro “STRINGTYPE” is defined as another program type called DStr_string_descriptor, which was defined in “dstring.h” as a struct like

struct DStr_string_descriptor {
	size_t length;
	DStr_codes own_string;
	unsigned char *string;
};
which itself contained another program type, an enum, which in turn contained other custom types… And to make matters worse, when I included all of the needed header files in my test program and tried to compile, there was an error reporting a missing header file, "extras.h", which actually belonged to another directory in the package.

In other words, I was going down a rabbit hole, the more I tried to decouple the function.

I figured that this program was too complex in order to benchmark a small function that occupied such an entangled and embedded part of the program.


So… I think I will end up just using one of the simple hash libraries I mentioned in the very beginning… and see what, if any, optimization I could possibly do with it…

by sofiangotrong at December 16, 2017 11:53 PM

Eric Ferguson

Final Project Part 01 - Making and Installing Dependencies for clib

To preface making clib, I will first explain what it is. clib is a package manager, with one major difference: it contains its own dependencies and cannot be used to install anything else. The default clib repository is on git here. You may notice that by default it has no hash or checksum function; to get one, you must import it through clib. To make clib I used these steps:
  1. mkdir test
  2. cd test
  3. git clone
  4. make
  5. clib-install sha1
clib is now successfully made and I have the sha1 function available to test. Inside the sha1 development repository here, I noticed it already had an extensive test file for the function. I presumed getting this to work would be much easier than it actually was. The first thing to note is that the sha1 folder that gets pulled into deps (the dependencies folder) is much different from the development version. The published version has 3 files: package.json, sha1.c, and sha1.h. The development version has 7: .travis.yml, Makefile, README.rst, package.json, sha1.c, sha1.h, and test.c. The Makefile for clib does not allow compilation of the test.c file in the sha1 folder in deps. It will tell you that main has been defined somewhere else (clib.c). This means you have to yank the sha1 folder out of clib entirely to test it.

After pulling the sha1 folder out and compiling test.c, it will likely tell you CUnit/Basic.h is undefined. The test file requires installing CUnit. The biggest problem with CUnit is that it is hosted on SourceForge here, which only allows local downloading, and our wget command does not work there. I was able to download it by using the command below:
This should naturally be done in an empty directory; to get to CUnit you must run the command cd cunit/branches/mingw64. Making and installing is tricky: first, you have to replace the config.guess file in the repository with yours from /usr/share/automake-1.15. This eliminates the error when running make that says it does not know where to build. Then run the commands to install:
  • automake --add-missing
  • autoreconf (these steps fixed some dependancy errors)
  • ./configure --prefix=/usr/lib/ (This is VERY important)
  • make
  • sudo make install
When you try to recompile the test.c file, it will state the error that it cannot read line 0 of By default, without defining the prefix, CUnit installs in /usr/local, which we don't want because the linker cannot find the library there. We want it in /usr/lib, and then we want to run these commands from /usr/lib:
  • sudo cp ./lib/ ./
  • sudo ldconfig (finds new lib)
After this you should now be able to run make on test.c with no errors.

by Eric Ferguson at December 16, 2017 10:51 PM

Final Project Part 01 - Finding an Open Source Package

The final project of this course has been split into 2 major parts, the first part is as follows:
  • Find an open source software package with a hash or checksum function that compiles to machine code.
  • Benchmark the performance on an AArch64 system.
  • Identify how you plan to optimize the function.
 It turns out that finding the open source software package took much longer than expected. Typing some combination of "open source software with hash function" into Google clearly produces nothing. These are the strategies I employed to find my package:
  1.  Reverse search using Wikipedia.
  • Start here at the list of hash functions.
  • Pick one, look at the External links and References sections for an implementation that is open source.

      2.  Install open source packages from the Free Software Directory here.
  • Pick a package and use wget on our AArch64 system if applicable.
  • grep -Ri for hash/checksum and see if it returned anything useful.

      3.  Google search "(hash/checksum) hash c".
  • Search through approximately 5 pages per result to see if any packages appear.

Ironically, although option 1 followed by option 2 sounds the most likely to find a package, option 3 brought me to the package I chose. Wikipedia unfortunately offered only plenty of information about how each hash works or its vulnerabilities, and very little on implementations. Option 2 began to feel hopeless after many packages, since the descriptions give no indication of whether you will find a hash/checksum function inside. After typing in "sha1 hash c" I came across clib here.

by Eric Ferguson at December 16, 2017 09:20 PM

Matthew Marangoni

MurmurHash in NGINX


For the SPO600 class' final project, we're given a choice of optimizing either a hashing or checksum function being used in an open source software project (which is implemented in a language that compiles to machine code such as C, C++, or Assembler). In Stage 1 of this project, we're tasked with identifying and benchmarking the function on an AArch64 system, isolating the hashing function's performance from all other functions existing in the code's test files, and identifying the strategy we'll be using in Stage 2 in attempt to optimize the function for better performance.

Finding an Open Source Software Package with a Hash Function

I've decided to go with NGINX's MurmurHash implementation (they also have other hashing implementations!). Hopefully I'll be able to make some optimizations to it!

First off, let's have a look at the hashing function we'll be working with. You can view the original source code here on GitHub, but I'll include a snippet of it below for convenience's sake.

Benchmarking Performance

With the MurmurHash function isolated, we can create a simple main file which creates sample test data, and then uses the given hashing function to hash the data we just created. I used clock_t variables to get the time before and after hashing is complete, then subtracted the difference to see how much time was spent doing the hashing work. After a compile and a few runs of the program, we'll have our baseline benchmark data (this will be the time to beat after we optimize the function in Stage 2!). In the screenshots below, you'll see my test file as well as my benchmarking results. Before benchmarking, I made sure our AArch64 system (aarchie) was not being used by other users, using the 'who' and 'top' commands.

All clear! Now let's see the results of our test file...

After 4 consecutive runs of the program, we can see that the execution time of the hashing function at best finished in 32.335892 seconds and at worst 32.335972 seconds, so it's safe to say that on average 32.3359 seconds should be our benchmark time.

Optimization Strategy

A quick note on some of the data types used in the program: chars are 8 bits and range between -128 and 127, while u_chars (unsigned chars) range from 0 to 255. uint32_t's, as you may have guessed, are 32 bits, unsigned, ranging from 0 to 4,294,967,295. The code I'll be attempting to optimize is right at the start of the function body:

u_int32_t ngx_murmur_hash2(u_char *data, size_t len)
{
    u_int32_t  h, k;

    h = 0 ^ len;

    while (len >= 4) {
        k  = data[0];
        k |= data[1] << 8;
        k |= data[2] << 16;
        k |= data[3] << 24;
        /* ... remainder of the function omitted ... */
The first parameter 'data' of the hash function is a u_char array; each value is 8 bits (1 byte). In the body of the function, a 32-bit (4-byte) variable 'k' is created and used to store four 8-bit values (data[0], data[1], data[2], data[3]) one at a time, shifting each over to fill all 32 bits. In Stage 2 of the project, my optimization strategy is to read the four data bytes directly into the memory where the 32-bit value k is stored, in lieu of bit-shifting. I look forward to working on this optimization in Stage 2 and seeing if I can gain a performance improvement!
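As a sketch of that strategy (my own illustration, not NGINX code), the byte-by-byte shifted assembly could be replaced with a memcpy into k, which compilers typically lower to a single 32-bit load while staying safe on unaligned pointers. Note that the two forms only agree on little-endian machines, so a portable patch would need an endianness check:

```c
#include <stdint.h>
#include <string.h>

/* Current approach: assemble k from four shifted bytes (endian-neutral). */
uint32_t load_shift(const uint8_t *data) {
    uint32_t k;
    k  = data[0];
    k |= (uint32_t)data[1] << 8;
    k |= (uint32_t)data[2] << 16;
    k |= (uint32_t)data[3] << 24;
    return k;
}

/* Proposed sketch: copy the four bytes straight into k. memcpy avoids
 * unaligned-access traps and usually compiles to one word load, but it
 * only matches load_shift() on little-endian targets. */
uint32_t load_memcpy(const uint8_t *data) {
    uint32_t k;
    memcpy(&k, data, sizeof k);
    return k;
}
```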

by Matthew Marangoni at December 16, 2017 08:48 PM

Shaun Richardson

SPO600 – Assignment Pt2

So after about a day of trying to get the profile generation tools to work, I failed to do so. After some thought about what I could do without in-class assistance, I decided to look at each function in my program's hash.c file and see what I could change or optimize.

The first most obvious part of doing optimizations on this code would be to build it with the -O3 flag. When I get down to actually optimizing the code that will definitely be one of the first things I do.

So lets take a look at each function.

hash_init ()
{
	int i;
	for (i=0; i<256; i++) {
		hash[i] = NULL;
		hash_length[i] = 0;
	}
}
Essentially this function just clears out the hash table. Before profiling, I do not really think this function is worth optimizing, as it would logically be called only once at the start of the program. The only optimization I could see here is separating the hash[i] assignment into a different loop to allow vectorization by the compiler.

unsigned long
hash_stats ()
{
	int i;
	unsigned long total=0;
	for (i=0; i<256; i++) {
		total += hash_length[i];
	}
	return total;
}

Again, another fairly simple function, which totals the number of words stored in the document. There is the possibility of loop unrolling here.
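As a sketch, an unrolled version of that totalling loop might look like the following. This is hypothetical (the real function reads a global table, and the compiler may already do this unrolling itself at -O3), so any gain would have to be confirmed by benchmarking:

```c
/* Hypothetical 4-way unrolled version of the totalling loop. 256 divides
 * evenly by 4, so no remainder loop is needed, and the four independent
 * adds per iteration give the CPU more instruction-level parallelism. */
unsigned long hash_stats_unrolled(const unsigned long hash_length[256]) {
    unsigned long total = 0;
    for (int i = 0; i < 256; i += 4) {
        total += hash_length[i]
               + hash_length[i + 1]
               + hash_length[i + 2]
               + hash_length[i + 3];
    }
    return total;
}
```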

static HashItem *
hashitem_new (char *str)
{
	HashItem *hi;
	unsigned long i;
	hi=(HashItem*) my_malloc(sizeof(HashItem));
	if (!hi)
		error_handler("Out of memory");
	memset ((void*)hi, 0, sizeof (HashItem));
	hi->str = my_strdup(str);
	i = *str;
	if (i=='\\') i=str[1];
	i <<= 24;
	hi->value = i | (hash_value++ & 0xffffff);
	hi->next = NULL;
#if 0
	if (debug_mode) {
		printf ("<!-- storing val %08lx str %s -->\n",
			hi->value, hi->str);
	}
#endif
	return hi;
}

This function could possibly be optimized, but I am not really sure what I would want to do with it in order to optimize.

unsigned long
hash_get_index (char *str)
{
#if 1 /* daved - 0.19.1 */
	unsigned short index;
	unsigned char ch;
#else
	int index;
	char ch;
#endif
	HashItem *hi;
#if 1 /* daved - 0.19.1 */
	ch = (unsigned char)*str;
#else
	ch = *str;
#endif
	if (ch=='\\' && *(str+1))
		ch = *(str+1);
	index = ch;
	hi = hash[index];
	while (hi) {
		if (!strcmp(hi->str,str))
			return hi->value;
		hi = hi->next;
	}
	/* not in hash */
	hi = hashitem_new (str);
	hi->next = hash[index];
	hash [index] = hi;
	++hash_length [index];
	return hi->value;
}

This function searches through the hash table: given a specific string, it returns the word identifier. Again, my knowledge of optimization is limited, so I will possibly need some in-class assistance, or to look into it more, in order to optimize this portion of code.

char *
hash_get_string (unsigned long value)
{
	int index;
	HashItem *hi;
	index = value >> 24;
	hi = hash[index];
	while (hi) {
		if (hi->value == value)
			return hi->str;
		hi = hi->next;
	}
	warning_handler("Word not in hash");
	return NULL;
}

This function is the opposite of the one above: given a value, it returns the string. As a very similar situation to the above, I would not know how to begin optimizing this specific function either.

So where are we at?

So far, then, we have a few functions that I would not know how to start optimizing, and a few functions with very small loops where optimization might have no impact.


So here is one of the issues I may have with trying to optimize this code. I ran some sample data (provided by the program itself). Each timed run of the program took between 0.02 – 0.05 s. Not much time at all. The test data was fairly small, which could account for such a small time. I am a little bit worried that any optimization I perform on the hashing functions would have such a small impact that nothing measurable would be gained. My next goal is to create larger test data (in this case a very large .rtf file) and see if that has any impact on the benchmarking time. My plan is to do that for Monday, and if I still feel the program is "unworthy" I may switch to a different project.


I am a little bit disappointed in my progress so far; not in the amount of work I have done, but rather because at this point I feel my program may not actually be a great candidate. I really liked the program too, as it (mostly) worked but also seemed very simple and easy to work with. I will do some thinking about this and come back on Monday.

by srichardson6blog at December 16, 2017 08:42 PM

Azusa Shimazaki

SPO600 Project Stage 1

For the SPO600 project,
the first thing I needed was to find an open source software package that has a checksum or hash function.

Speaking of open source, GitHub seemed like a good place to find something.
I tried searching on GitHub with keywords such as "C++", "hash", "murmur", and "SHA".
Then I found a simple SHA implementation that looked very nice.

The file structure is simple and the code is also simple.

... but it didn't have any license notation.
 How simple it is! Too bad.
Maybe I should find another one, since there is a possibility I will want to try a pull request...

During the second search,
I found a different one that interests me.
It is named "sha1", written in C, and beautifully simple!
Ok, let's try this one.

I made a clone of the repository on AArch64, and I put a timer inside of the code.

Now, here is the code, sha.c

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

clock_t start, end;

uint32_t H[5] = {
	0x67452301,
	0xefcdab89,
	0x98badcfe,
	0x10325476,
	0xc3d2e1f0
};

uint32_t rotl( uint32_t x, int shift )
{
	return (x << shift) | (x >> (sizeof(x)*8 - shift));
}

uint32_t roundFunc( uint32_t b, uint32_t c, uint32_t d, int roundNum )
{
	if( roundNum <= 19 )
		return (b & c) | ((~b) & d);
	else if( roundNum <= 39 )
		return ( b ^ c ^ d );
	else if( roundNum <= 59 )
		return (b & c) | (b & d) | (c & d);
	return ( b ^ c ^ d );
}

uint32_t kForRound( int roundNum )
{
	if( roundNum <= 19 )
		return 0x5a827999;
	else if( roundNum <= 39 )
		return 0x6ed9eba1;
	else if( roundNum <= 59 )
		return 0x8f1bbcdc;
	return 0xca62c1d6;
}

int pad(uint8_t * block, uint8_t * extraBlock, int blockSize, int fileSize)
{
	int twoBlocks = 0;
	//l is block size in bits
	uint64_t l = (uint64_t)fileSize * 8;
	if(blockSize <= 55)
	{
		block[blockSize] = 0x80;
		int i;
		for( i = 0; i < 8; i++ )
			block[56+i] = (l >> (56-(8*i)));
	}
	else
	{
		twoBlocks = 1;
		if(blockSize < 63)
			block[blockSize] = 0x80;
		else
			extraBlock[0] = 0x80;
		int i;
		for( i = 0; i < 8; i++ )
			extraBlock[56+i] = (l >> (56-(8*i)));
	}
	return twoBlocks;
}

void doSha1(uint8_t * block)
{
	static uint32_t w[80] = {0x00000000};
	int i;
	for( i = 0; i < 16; i++ )
	{
		int offset = (i*4);
		w[i] = block[offset] << 24 |
			block[offset + 1] << 16 |
			block[offset + 2] << 8 |
			block[offset + 3];
	}
	for( i = 16; i < 80; i++ )
	{
		uint32_t tmp = (w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]);
		w[i] = rotl( tmp, 1 );
	}
	uint32_t a = H[0];
	uint32_t b = H[1];
	uint32_t c = H[2];
	uint32_t d = H[3];
	uint32_t e = H[4];
	for( i = 0; i < 80; i++ )
	{
		uint32_t tmp = rotl(a, 5) + roundFunc(b,c,d,i) + e + w[i] + kForRound(i);
		e = d;
		d = c;
		c = rotl(b, 30);
		b = a;
		a = tmp;
		// printf("%d: %x, %x, %x, %x, %x\n", i, a, b, c,d,e);
	}
	H[0] = H[0] + a;
	H[1] = H[1] + b;
	H[2] = H[2] + c;
	H[3] = H[3] + d;
	H[4] = H[4] + e;
}

int main( int argc, char **argv )
{
	if( argc == 2 )
	{
		char * fileName = argv[1];
		//Read the input file 512 bits at a time = 64 bytes = 16 32-bit words
		FILE * file = fopen( fileName, "rb" );
		int fileSize = 0;
		if( file != NULL )
		{
			uint8_t *block = malloc(64);
			size_t readSize = fread( block, 1, 64, file );
			fileSize += readSize;
			start = clock();
			while( readSize == 64 )
			{
				doSha1(block);
				readSize = fread( block, 1, 64, file );
				fileSize += readSize;
			}
			//We need padding
			uint8_t *extraBlock = malloc(64);
			int i;
			for(i = readSize; i < 64; i++)
			{
				block[i] = 0x00;
				extraBlock[i] = 0x00;
			}
			int twoBlocks = pad(block, extraBlock, readSize, fileSize);
			doSha1(block);
			if(twoBlocks == 1)
			{
				doSha1(extraBlock);
			}
			end = clock();
			for( i = 0; i < 5; i++)
				printf("%08x", H[i]);
			printf("\n");
		}
		else
		{
			printf("Input file must exist.\n");
		}
	}
	else
	{
		printf("Usage is sha1 [filename]\n");
	}
	printf("process time millisecond: %f \n\n", (float)(end-start));
	return 0;
}

Execution is very easy: after compiling (gcc -o sha sha.c),
just run ./sha "filename".

Now, let's check the benchmark
I picked "LICENSE" file (GNU General Public License v2.0 document) as an argument, since it was in the same directory.

so, the command is,
./sha LICENSE 

The result is below.

829 milliseconds.

So, how could I optimize?
Maybe some "for" blocks can be optimized, and in-line assembler might be helpful as well.
Anyway, the project instructions suggest:
  • Altered build options
  • Code changes to permit better optimization by the compiler
  • Algorithm improvements
  • In-line assembler
I will give it a try.

On to Stage 2.

by Az Smith at December 16, 2017 03:22 PM

Chun Sing Lam

SPO600 Project – Stage 1

For the first stage of this project, I need to find an open source software package that contains a checksum or hash function that is implemented in a language that compiles to machine code such as C. After finding the software, I need to benchmark the performance of the hash function on an AArch64 system. Lastly, I need to identify a few possible methods that I can use to try to optimize the hash function to improve performance.

Search for an open source software with hash function

The first step is to find an open source software package. I spent a few hours searching for software that contains a hash function I could use for optimization, and it turned out to be a difficult task. The problem is that software usually does not mention whether it uses a hash function, so I need to search the source code to find out, which is time consuming. I tried to locate hash functions by using the "grep -inr" command to search all of the files of a particular software package for the string "hash", "murmur", or "sha". I saw that some software, such as AIDE and GPG, supports hash algorithms, but I was unable to find a hash function in their source code. A lot of software, such as pass, is not written in C but in other languages such as Java and bash, which means I cannot use it. In the end, I was able to find a hash function in OlegDB, which is database software. OlegDB uses the MurmurHash3 hashing algorithm to hash and index keys. Now that I have software to work with, I will need to benchmark the performance of the hash function and come up with a plan to optimize it, which will be discussed in my next post.
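As a toy illustration of that search (the directory layout and file contents below are made up): -i makes the match case-insensitive, -n prints line numbers, and -r recurses into subdirectories.

```shell
# Build a tiny fake source tree to search (hypothetical layout).
mkdir -p demo/src
printf 'uint32_t MurmurHash3_32(const void *key);\n' > demo/src/murmur3.h

# Search every file under demo/ for "murmur", case-insensitively,
# printing file names and line numbers with each match.
grep -inr "murmur" demo
```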

by chunsinglam at December 16, 2017 03:06 PM

Saeed Mohiti

Final Project

Finally, I started to work on the final project, which is like a nightmare to me. For health reasons I started late. Let me give you some quick advice: a herniated disc, a common cause of lower back pain, can result from several factors, one of them being sitting for long periods. So don't stay at your laptop or PC for too long; once in a while change your position and try to walk. Anyway ….

 Honestly, the hardest part is to choose the right function.

The requirement of this project is to find an open source software package that includes a checksum or hash function.

So, I have decided to choose md5deep. md5deep is a software package used in computer security, system administration, and computer forensics. This program is a tool to compute and audit hashes for any number of input files. It is similar to other hashing programs such as md5sum, but md5deep has more options and features. md5deep can also use the SHA family of hashes, and it supports cross-algorithm matching with a variety of algorithms.

md5deep currently supports MD5, SHA-1, SHA-256, Tiger, and Whirlpool.

As per the stage one requirements, after finding the package we have to benchmark the performance of the package on AArch64, then identify the strategy we will use in stage 2.

So, I will take a look at the code and try to understand how I can optimize it and what the possibilities are.

For now, these are my possible strategies for stage 2:

Improving the hash algorithm.

Modifying the source code to work better on AArch64, keeping in mind that the modification should not affect other systems.

Modifying the makefile so the program performs faster.

This is just the beginning of the project blogging; I will update and complete this post tomorrow.


by msmohiti at December 16, 2017 04:04 AM

Michael Pierre

Async Image Load Test and Code Reading

Even after years of coding and software development, I always find there are infinite things to learn when it comes to computers and technology. I wouldn't have thought that a simple image load test would test my understanding of JavaScript as a programmer. This was basically the original testing code given:
var img = document.querySelector('#image-1234');
img.onload = function loaded() {
  // ...
};
img.src = "";
img.src = "";

To me this initially looked like a perfectly fine piece of code that would run fairly straightforwardly. The code is meant to set the source of an image and then, later on, set the source of the image again. Sounds easy; however, after a small modification to check whether the code was setting the image source twice, I noticed that it wasn't happening at all.

Original test code modified:

var img = document.querySelector('#doggo');
var onloadUpdate = document.querySelector('#update');
var onloadCount = 0;
img.onload = function loaded() {
  console.log("FIRE THE ONLOAD!");
  onloadCount = onloadCount + 1;
  onloadUpdate.innerHTML = onloadCount;
  if (onloadCount == 1) { = "url('error.jpg')";
  } else if (onloadCount == 2) { = "url('success.jpg')";
  }
};
img.src = "doggo.jpg";
img.src = "doggo.jpg";

The code above counts how many times the onload event runs and displays it on the screen, as well as changing the background depending on whether the image loaded once or twice, giving an error or success type background. To my dismay, the onload event was only running once, which left me a little confused. This happened across all the browsers I tested: Google Chrome, Internet Explorer, Edge, and Mozilla Firefox. Luckily Professor David Humphrey offered me some assistance, see below:


The problem I was having was a synchronization issue: since I was setting the source twice, one after another, the first setting of the source wouldn't have time to load; it would skip to the second and thus enter the onload function only once. The key was to make sure the initial image was loaded and then load the image again, which made onload fire twice as desired.

Modified Test Code: 

var img = document.querySelector('#doggo');
var onloadUpdate = document.querySelector('#update');
var onloadCount = 0;
img.onload = function loaded() {
  console.log("FIRE THE ONLOAD!");
  onloadCount = onloadCount + 1;
  onloadUpdate.innerHTML = onloadCount;
  if (onloadCount == 1) { = "url('error.jpg')";
    img.src = "doggo.jpg";
  } else if (onloadCount == 2) { = "url('success.jpg')";
  }
};
img.src = "doggo.jpg";

In this modified code the source is only set again once the first source has been loaded, thus eliminating the synchronization issue. It's interesting how JavaScript handles image loading like this, and I would most likely have run into this issue in the future if I hadn't addressed it now. As a programmer I'm so used to reading code top-down (even though I have done extensive work with threads and AJAX) that for some reason the concept actually took me a while to understand, and thankfully Professor Humphrey wasn't hesitant to help explain something that seemed super simple.

Click to run the test for yourself!

Another goal was to find the code behind the image load capability of the browser, which turned out to be a bit more difficult than I initially thought. I had no idea that most browsers are actually written in C/C++, which is really interesting because it's a language we learned to do a lot of logical/mathematical things with, yet in classes we never applied it to much outside of that. It would be interesting to brush up on some C++ and try to understand and improve some of the existing open source browsers like Mozilla Firefox and Chromium.

Anyway, during my search I went through the massive amount of source code in Chromium and Firefox. Not so much to my surprise, searching "image" in the source code gave me around 150,000+ results, which made it very difficult to search through, being not a C++ buff and having no previous experience navigating a browser's source code other than to build it manually. Mostly I was relying on class names like Image.h and Image.cpp; however, when I asked my professor if those files were the ones behind onload, I quickly found out they weren't. This wasn't the end of my search, for being a software developer means having excellent Google skills, so I simply googled the issue I was originally having (image source won't load twice when called twice). I ended up coming across a bug post from 2009 in which a Chromium developer had the same issue, "image onload event does not fire when loading an already loaded image". It was kind of cool to see open source bug collaboration dating back that far, as well as finding roughly what I was looking for (the power of Google!). At the bottom of the post there are some links to what I think is the modified code addressing the issue, which in turn gave me the name of the rendering engine for Chromium as well as the filename for the code that handles the onload functionality.
Another quick Google search and I ended up finding the source code for an image load, which is appropriately called ImageLoader.cpp *facepalm*.

Set image code:
void ImageLoader::setImage(ImageResource* newImage)
{
    // Only consider updating the protection ref-count of the Element immediately before returning
    // from this function as doing so might result in the destruction of this ImageLoader.
    /* ... */
}

void ImageLoader::updatedHasPendingEvent()
{
    // If an Element that does image loading is removed from the DOM the load/error event for the image is still observable.
    // As long as the ImageLoader is actively loading, the Element itself needs to be ref'ed to keep it from being
    // destroyed by DOM manipulation or garbage collection.
    // If such an Element wishes for the load to stop when removed from the DOM it needs to stop the ImageLoader explicitly.
    bool wasProtected = m_elementIsProtected;
    m_elementIsProtected = m_hasPendingLoadEvent || m_hasPendingErrorEvent;
    if (wasProtected == m_elementIsProtected)
        return;

    if (m_elementIsProtected) {
        if (m_derefElementTimer.isActive())
            m_derefElementTimer.stop();
        m_keepAlive = m_element;
    } else {
        m_derefElementTimer.startOneShot(0, FROM_HERE);
    }
}

Overall it was interesting to see how what started as testing a simple image loading example turned into figuring out how browser event loading works and taking a deep dive into the source code that makes the whole system work. In the future I'd like to try some browser coding to improve my C++ and to work on something I haven't really thought of doing before.

by michaelpierreblog at December 16, 2017 03:21 AM

December 14, 2017

Sean Prashad

English 101: Code Literacy

"Code literacy": the ability to read, interpret and understand code is a new phrase that has recently been added to my vocabulary. It's something that I practice daily as a student whether I'm tutoring at Seneca's Learning Centre or a skimming over a main() implementation for an upcoming lab.

This week, in a lecture by our Professor and Mozillian, David Humphrey, we explored code snippets in various browsers, including WebKit Trac, Chromium, and Mozilla DXR. It became apparent that understanding another individual's code exposes you to the author's mindset and also reveals good or bad practices. Up until now, I was unconsciously doing just this, reading code from engineers at Mozilla or Rust to progress on bugs of mine.

In essence, by gaining more and more exposure to the world of Open Source, I'm able to learn from experienced engineers around the world for free.


Our task was simple: test which browsers did not call the onload() event when setting the src of an img element to the same value multiple times.

Now, full disclaimer: I'm not particularly strong at JavaScript, which made this seemingly easy task more difficult than it needed to be (lack of sleep didn't help either). I made use of fellow classmates Michael Pierre, Phil Henning, and Marco Beltempo's code examples to get started on my own.

One thing I ran into was what seemed like an infinite loop (denoted by my MacBook Pro throttling its fans at 150%). With some diagnostic help from our Professor, he identified that with my current logic an infinite loop had been formed as a result of a small misunderstanding:

Incorrect onload() Logic
Incorrect onload() logic
Infinite Loop
I can count this fast

Setting img.src inside img.onload() created an infinite loop: setting the src causes the image to reload, which causes onload() to be called, and the cycle repeats over and over again!

With a list of possible fixes presented, I chose to simply set an if else condition inside the onload() to avoid this problem:

Fixing Infinite Loop Logic
Correct onload() logic


With the simple test case ready, I was able to test it along with other classmates in the following browsers. Here's the results gathered from all of us:

  • PASSED: Google Chrome
  • PASSED: Firefox Quantum
  • PASSED: Internet Explorer 11
  • PASSED: Opera
  • FAILED: Safari

So with all but Safari passing, I knew that we had to search the WebKit code base to figure out where the bug was. Thankfully, Marco had pointed out that the following code snippet from ImageLoader.cpp looked interesting:

Suspicious Code
Is this the bug?

My gut instinct told me that we had to look further - maybe for a specific line of code, maybe even line 354 of ImageLoader.cpp:

My Suspicion
My suspicion

Fate so has it that just at that moment, a wild David Humphrey appeared to nudge us in the right direction:

Professor Aid
There's few things that I would bet money on

And so, David and I set out on the journey with his trusty lldb debugger, webkit debug build and my instructions to steer the ship (no bug stands a chance!).


At first, I thought that line 273 would be a suitable breakpoint to view the value returned by image_content_.Get():

First Step

But as our Professor had an older version of the debug build which wasn't completely mapped with the one I was looking at, he pointed me to the equivalent line of code on his end - line 216 found here:

Old Image Loader Code
Code from David's current build

As promised, David attached his lldb debugger to Safari using the following command:

lldb -p 7373

Where the -p argument stood for the Process ID that we were attaching the debugger to (in this case, 7373 was the Process ID for Safari on David's laptop).

Attaching a Debugger to the Browser
LLDB at work

A tutorial and more documentation for lldb can be found here.

With a breakpoint set at line 214 (b ImageLoader.cpp:214), he continued to execute code until that breakpoint using the c command.

After that, I simply provided my test case repository which was hosted on a gh-pages branch that made it publicly accessible for anyone to view!

The breakpoint at line 214 was triggered when the page loaded for the first time (as expected):

First Breakpoint Hit
Breakpoint hit for the first time!

But it only hits that breakpoint once, despite clicking the button to invoke the img.onload() event again.. interesting!

Second Breakpoint NOT Hit
Same breakpoint not hit a second time!

Using the bt command to visualize the 9 frames helps to see what was going on before the breakpoint was hit.

Stack Track
Each layer that occurred before we got here

David pointed out that the bug is somewhere in the 9 layers of the stack he showed but it wasn't where I thought it was - back to the drawing board!


I knew we were close so I started looking at other parts of the code, more specifically, the function updateFromElement():

My next suspect

I noticed that line 168 returned the current URL of the img element but it ended up not being very useful (we need to find where the src would be set, not retrieved).

Next up, I saw line 197 which referenced another thing that caught my attention:

newImage = document.cachedResourceLoader().requestImage(WTFMove(request)).value_or(nullptr);

but unfortunately David mentioned that it might be used for bypassing cache storage inside Safari and wasn't what we wanted.

But he did mention that "the load event is dispatched in ImageLoader.cpp" and that something "wasn't choosing to call it". He also mentioned that "the dispatch stuff doesn't get called there but only defined". What did this all mean? Basically the event for onload() was not getting dispatched or invoked a second time for some reason!

Feeling frustrated, I took a step back to really read the code. At line 217, the if statement read:

if (newImage != oldImage) {...}

Okay - this was very important but at the time, I was fixated on figuring out what happened before it and not what happened inside it. If we got the same image as the old one, the if statement would be false and skip straight to line 261. And then it hit me:

Is This It?
Wait a second...

And then next thing I know...


So it was hiding under my nose all along - inside the if statement. The bug was that the if statement code for an old image (aka the same URL being passed in) was never called because it was unreachable! Boy, what a great feeling it was to find it after bumping into so many walls!


What was really cool was that David linked us to when the bug was introduced and by whom. It turns out that it's been around for over 12 years!!! See for yourself here and here.

It felt like digging up a dinosaur or making another discovery - now time to delete my Slack posts so my peers have to search for it too :)

December 14, 2017 01:20 PM

Joshua Longhi

Continuing with Mozilla

As I dive deeper into open source I am choosing to stick with Mozilla for a few reasons. Firstly, I like the layout of Bugzilla, which makes it easy to find bugs that are unassigned; secondly, mentored bugs give you a convenient way of finding someone for help; and thirdly, I really like the idea of being a part of Firefox, a browser I use every day.

In order to grow during this release I will attempt to do more bugs than I previously took on. I will also try to branch out and fix bugs in different parts of Firefox, not just the ones I previously worked on.

I considered various Python projects, but after looking through a few of them it seemed that the very active ones were all in machine learning or big data, which was not something I wanted to get into while trying to get a foot in the door of the open source world; perhaps in the future I will visit these projects.

The first bug I have assigned so far is here:

To get at least two more bugs done on top of that one, I plan to do a bug a week starting now, which will give me enough time to get my bugs done by the end of the school semester. To fix the bug above I need to refactor code, which is easy enough, but there are some questionable lines that may or may not need refactoring; to figure this out I will consult the mentor of the bug.

by jlonghiblog at December 14, 2017 03:45 AM

Shaun Richardson

SPO600 -> Start of the Assignment

For SPO600's very large project, we are required to find an open source project that uses some sort of hashing algorithm (e.g. Murmur Hash, PJW Hash, SHA, etc.). We are then tasked with performing some sort of optimization on the code after profiling which functions are the most worthy of optimization (no point in optimizing a function that is only called once!). The first task is to find a project that actually uses a hashing algorithm; let's just say I thought this task would take less time than it actually did.

After (literally) hours of searching around open source package databases and many uses of wget . (insert zipped file here), each candidate was either a little bit too complicated, simply did not have any hash functions, or could not be used for this project (I found an interesting project about nutrition tracking that I got really excited about, but that excitement was quickly removed when I realized the project used Python). I finally found a project that seems worthy.

The open source project I am initially choosing (this may change after I do some actual profiling and benchmarking) is called UnRTF, which can be found here. The program itself seems very simple: it converts RTF formatted documents to HTML, LaTeX, troff macros, and RTF itself. I really don't know what those middle two formats are, but .rtf to HTML seems cool. Here's a Wikipedia article on the LaTeX format and here's a link to the Wikipedia article on troff macros.

Moving on, let's take a look at the actual source code that implements some sort of hash functionality. It turns out there are several hash functions within the hash.h/hash.c files: hash_init() initiates the hash table; hash_stats() returns the number of words stored (all words, including RTF commands); hashitem_new(char *str) creates a new linked-list item for the hash table; hash_get_index(char *str), given a string, returns the index or word identifier; and finally hash_get_string(unsigned long value), given the index, returns the word string. So given all these functions, it basically looks like the code stores the words of the RTF document in a hash table and has the ability to return either the index or the string for an entry in that table.

There isn't going to be much profiling of the individual functions yet, as I need to figure out how that actually works first. But I just wanted to blog about my progress with the project so far, as I have actually found a project that I was able to configure, make, and install that had some sort of hash function in the source code. The amount of time it took to find something that fulfilled those requirements was surprisingly long.

Next, work on the project will involve using profiling tools to determine which function would be the most useful to optimize.

by srichardson6blog at December 14, 2017 02:18 AM

December 13, 2017

Pablo Calderon


So for the SPO600 final project I chose an open source package called GNU Units (which can be found here). I thought choosing an open source package with a hashing function (non-cryptographic, universal, etc.) or checksum function would be one of the easier tasks of this project, but the process took hours. None of the packages are explicit about containing hash functions, and once you do choose one, tracking down the hash code in the source requires some time. Originally, I chose a package called GNU Shogi (if you're curious about it, check it out here); for those who don't know what Shogi is, it is essentially Japanese chess. Like chess, Shogi is riddled with rules, complex strategic plays, and intellectual thought. This game is obviously no easy task to code, but it was done regardless. The Shogi package did contain some hash functions, but it was far too complex to understand what was going on in the code; there were tens of .c files, all connected, doing multiple things at once. It was too overwhelming for me to profile and benchmark, especially at my current level as a software optimizer/tinkerer. I figured I would have a more enjoyable time and learn more about optimization if I chose a package where I could learn the ins and outs of the program and know what parts do what. Thus, I ended up with GNU Units.

The final project is broken down into two parts. The first part requires you to find an open source project (GNU Units) that contains a hash function. The software has to be written in a language that compiles on AArch64 (aka Aarchie), such as C, assembly, etc. Once you isolate the hash function, profile and benchmark the performance of the hash code. Afterwards, come up with a strategy to improve the algorithm of the hash function. For a more in-depth look at the project, the outline is right here.

So to begin…

Similar to our previous labs, I retrieved the package with:

wget .

And to unzip the package:

tar -zxvf units-2.14.tar.gz

Once I had the files, I went straight to the source code of GNU Units, and to my surprise (but what else did I expect?), the units.c file was long, with lots of lines of code, and at times a bit incomprehensible; a clear indication that this is a complex program (more so than I thought). This will take a while to profile and benchmark. For those who don't know what Units is, it's essentially a unit conversion program, but due to its complexity and vast conversion scales, it uses a hashing algorithm.

I didn’t know where to begin to look in the unit.c file for hash functions so I used the following to command to get an idea where to start:

grep 'hash' *

I looked up the word 'hash' in every file, and the image (below) shows what I got.


From the image above, you can see the hashing algorithm for Units is in the file "units.c". But that still wasn't good enough to locate the algorithm; the file has thousands of lines of code, so I used the "grep" command with the "-n" flag, like this:

grep -n 'hash' *


I like how the file name is purple and the line number is green, very easy to read. Going from this information, my algorithm is at line 569.

Going into the units.c file, I finally found it.


Now my next step was to identify what kind of hashing algorithm this is. For this part I took to the internet and began to see if anything resembled this piece of code. Unfortunately, it isn't a well-known hash function like some of the ones out there, such as MurmurHash, xxHash, or PJW hash/Elf hash. However, after a couple hours of exploration, I came across some hash functions that were built in a similar way to the hash function I'm working with. Though it is not an exact match, djb2 comes pretty close. You can read more about it here.

After speaking with my instructor, Chris Tyler, about the hashing function that I'm using, I realized that this hash function was made specifically for this program and that it's not one of the more powerful/popular ones. In fact, for this small conversion program this hashing function does its job, but it would underperform in other, more complex programs. To speak honestly, it is not a very good hashing algorithm, but that just means that it can be heavily optimized (hopefully this theory proves true).


To begin benchmarking/testing the algorithm, I'll have to extract the hashing function into a separate .c file called testing.c. Luckily for me this was not too hard; the algorithm pops right out of GNU Units and does not depend on any other functions to run. To test this hash function I will write a simple program that generates random characters into an array and passes it to the hash function. From there the characters will be mapped and given a key (via hashing). When I call the hashing function in the program that I'll be writing, I'll place a clock before and after the function to measure the speed at which the hashing occurs. This test is very similar to lab 6. Something that I noticed while looking at the source code of GNU Units is that no optimization flags were used, so for the testing I'll be compiling my program with the -O3 flag. This alone should significantly speed up the hashing function, for example through vectorization. Also, to really test the limits of the hashing function, I'll be generating 500,000,000 random characters in the array to be hashed. Lastly, I'll be running the test program ten times to smooth out any discrepancies in the results and arrive at one representative result.

Another thing that I noticed about the hashing function is that it relies on two predefined constants called HASHNUMBER and HASHSIZE. As you can see in the image below, HASHNUMBER is defined as 31 and HASHSIZE as 101.


Again, for the purpose of this experiment, only HASHSIZE will change, to 500,000,000. I'll be making these modifications in my test program.

So the following image is the test program that I built (testing.c):


This is the array generation, with SIZE being 500,000,000; the random characters are stored in *str.

for (int x = 0; x < SIZE; x++) {
    tmp = (u_char)((rand() % 26) + 'a');
    str[x] = tmp;
}

This is where key receives the result of the hash function (uhash), which is passed the 500,000,000-character string as its argument. The call is also being timed:

//testing hash function speed
start = clock();
key = uhash(str);
end = clock();

total = (float)(end - start) / CLOCKS_PER_SEC;

However, I want to make sure that I'm building my array correctly and that it successfully passes through the hash function. So in main I temporarily added the following print statement in a new for loop:

int i;

for (i = 0; i < 10; i++) {
    start = clock();
    key = uhash(str);
    end = clock();
    total = (float)(end - start) / CLOCKS_PER_SEC;
    printf("key = %d, name = %s\n", key, str);
}

I'm obviously not going to print this 500,000,000 times, so I modified HASHNUMBER and SIZE to 10.


As you can see, the hash function maps the name (the random characters) to an integer (the key). Since the name is the same, the key is the same as well; if the name changes, then the key changes too. Ideally, two different names should not end up with the same key; when they do, it's called a "collision", and frequent collisions are the mark of a poor hash function. You can read more about hash functions here.
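To illustrate the idea, here is a toy example. This is a hypothetical sum-of-characters hash of my own, not the Units one, but it shows how two different names can land on the same key when the table only has 101 slots:

```c
#include <stdio.h>

#define HASHSIZE 101

/* Hypothetical toy hash: sum the character codes, mod table size.
 * Anagrams like "ab" and "ba" always collide under it, because
 * addition doesn't care about character order. */
unsigned toy_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h % HASHSIZE;
}
```

Multiplying by a constant on each step, as the Units hash and djb2 do, makes the position of each character matter, which is one reason those functions collide less often than this toy.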

Well, as you can see, my testing program is working perfectly.

Now we can begin testing…

Wait, one more thing. Before we begin, let me show you the difference between the hash function (the one I'm using) compiled without optimization and then with the -O3 flag. (Remember that for the test segment of this project, I'll be using the -O3 flag.)

No optimization – 3.248198 sec:


With optimization – 0.513003 sec:


That's roughly an 84% reduction in run time with the optimization flag ((3.248 - 0.513) / 3.248 ≈ 0.84).

Now I’ll be testing the hash function ten times:


I also wanted to profile further and use gprof to see each function of the GNU Units program and what share of the work each function does, all of it displayed as a flow chart/image, which makes it easier to see which function is doing the most work. So I ran "make clean" to recompile the program and passed the "-pg" flag when executing "./configure", like this:

./configure CFLAGS="-pg" CXXFLAGS="-pg" LDFLAGS="-pg"

This allows me to produce a "gmon.out" file, which I need in order to generate the image of my functions. At first, the "gmon.out" file wasn't being produced, but that's because I hadn't run Units and fed it input to actually process, meaning that none of the functions did any work. Also, when running the program, make sure to quit using Ctrl+D to actually save the data input, because using Ctrl+C will discard the data and the "gmon.out" file won't be produced.

Afterwards, I used this command to produce the image:

gprof units | gprof2dot | dot -Tpng > image.png

Once the image was generated, I used FileZilla and logged in to retrieve the image, which you can see below:


As you can see there are a lot of functions being used, and to be honest I'll dissect this further in Part 2 of the final project. Right now, I just wanted to show you an image of the functions that use the hash function. I'll give a better visual representation of the functions I'll be focusing on, because right now you can't see them very clearly.

I've already identified some of the strategies to optimize the hash function, such as playing with the build options. However, the problem with this is that while the hash function gets optimized, the rest of the program isn't built to keep up at that capacity, which could break the program. I'm also thinking about isolating other functions that heavily rely on the hash function and optimizing them through SIMD vectorization; meaning that instead of having one loop running in series, the work is spread across vector lanes operating simultaneously. The hash function itself is a simple routine, so it can't be modified too heavily to process faster; I think my best option is going after other functions that use the hash function. I already have some candidates based on the gprof image I produced above. Also, I know it's risky (it could cause Units to crash), but I might try some inlining to alter some of these functions; first I have to re-read lab 7.

by pabinski at December 13, 2017 01:57 AM

Henrique Coelho

SPO600 Project - Planning (Stage 1)

As a requirement for my SPO600 course, I need to complete and blog about a project for identifying a checksum/hash algorithm in an open-source project and improving its performance. The full requirements can be found here. This post is the Stage 1 of my project, which is composed of the following steps:

1- Finding an open source package that has a checksum/hash algorithm, implemented in a language that gets compiled to machine code (C, C++, Assembly, etc).

2- Benchmark the performance on AArch64.

3- Identify the strategy to improve the algorithm, for example: altered build options, algorithm optimization, changes in the code to allow better optimization by the compiler, and in-line assembly.

Finding an open source package

One of our professor's recommendations was a hash called MurmurHash. I found the algorithm very interesting: it is simple, very easy to port, and due to its simplicity, improving it can be very challenging. I decided to follow the recommendation, and I found that Nginx uses this algorithm. I don't think it will be an easy task to make it more performant, but I am up for the challenge.

Benchmark the performance on AArch64

Testing the performance of this algorithm should be relatively straightforward: the function can easily be detached from the file, which will allow me to put a wrapper around it specifically made for testing its performance. A few important factors when testing the performance of this algorithm:

  • I do not want other users in the system (since I am using a computer that is shared among several other users) spawning processes that will interfere with the algorithm's performance, so I made sure to run the tests when no other users were using the system.
  • In this case, since I can easily detach the function, I will measure the time right before and right after the function runs, so I will not be measuring anything except for that function in particular.
  • A quick search in the Nginx repository shows that it is built with the -O2 compiler flag for optimization and a few other options. I will assume that this is true for all cases. Because of this, my tests will be run with the -O3 compiler flag.
  • Sometimes, a program will run much faster on the second time it is executed (because of how it is cached). To avoid these possible errors on my measurements, I will run the benchmark 10 times.

So, here is my extracted algorithm:

// murmur_original.c

// The include below is just for me to define a few
// constants, if I ever need it
#include "toolbox.h"

uint32_t original_hash(u_char *data, size_t len)
{
    uint32_t  h, k;

    h = 0 ^ len;

    while (len >= 4) {
        k  = data[0];
        k |= data[1] << 8;
        k |= data[2] << 16;
        k |= data[3] << 24;

        k *= 0x5bd1e995;
        k ^= k >> 24;
        k *= 0x5bd1e995;

        h *= 0x5bd1e995;
        h ^= k;

        data += 4;
        len -= 4;
    }

    switch (len) {
    case 3:
        h ^= data[2] << 16;
        /* fall through */
    case 2:
        h ^= data[1] << 8;
        /* fall through */
    case 1:
        h ^= data[0];
        h *= 0x5bd1e995;
    }

    h ^= h >> 13;
    h *= 0x5bd1e995;
    h ^= h >> 15;

    return h;
}

I duplicated this file and called it murmur_modified.c (changing the name of the function as well). I will use the other file to modify the function, so I can compare them side by side.

All I need to do is make a wrapper and call the function. This is the wrapper I made:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>
#include "murmur_original.c"
#include "murmur_modified.c"

#define DATA_LENGTH 500000002

int main()
{
  uint32_t        result_original;
  uint32_t        result_modified;
  u_char          tmp;
  u_char         *data_o;
  int             i;
  int             are_identical;
  struct timeval  timer_ori_before, timer_ori_after, timer_ori_result,
                  timer_mod_before, timer_mod_after, timer_mod_result;


  // Creating random characters
  data_o = calloc(DATA_LENGTH, sizeof(u_char));
  for (i = 0; i < DATA_LENGTH; i++) {
    tmp = (u_char)((rand() % 26) + 'a');
    data_o[i] = tmp;
  }

  gettimeofday(&timer_ori_before, NULL);
  result_original = original_hash(data_o, sizeof(u_char) * DATA_LENGTH);
  gettimeofday(&timer_ori_after, NULL);

  gettimeofday(&timer_mod_before, NULL);
  result_modified = modified_hash(data_o, sizeof(u_char) * DATA_LENGTH);
  gettimeofday(&timer_mod_after, NULL);

  timersub(&timer_ori_after, &timer_ori_before, &timer_ori_result);
  timersub(&timer_mod_after, &timer_mod_before, &timer_mod_result);

  are_identical = result_original == result_modified;

  printf("Generated %d characters for hash functions\n", DATA_LENGTH);
  if (are_identical)
    printf("SUCCESS! - All hashes are identical for both functions\n");
  else
    printf("!!! ERROR !!! The hashes for both functions do not match\n");

  printf("Time for original: %ld.%06ld\n",
    (long int)timer_ori_result.tv_sec,
    (long int)timer_ori_result.tv_usec);

  printf("Time for modified: %ld.%06ld\n",
    (long int)timer_mod_result.tv_sec,
    (long int)timer_mod_result.tv_usec);


  return 0;
}

Still not perfect, but a good start. This wrapper:

1- Creates a long array of random characters

2- Starts the timer for the original algorithm

3- Sends the array to the first hash function and records the result

4- Stops the timer for the original algorithm

5- Starts the timer for the modified algorithm

6- Sends the array to the second hash function and records the result

7- Stops the timer for the modified algorithm

8- Compares both results. If they are different, my modified hash function is broken and the test is invalid. Otherwise, they are comparable

9- Print the time for the original and the modified function

Here are the results for the first test run (the "modified" version is still the original algorithm):

Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330580
Time for modified: 0.330509
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.333051
Time for modified: 0.332968
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330849
Time for modified: 0.330729
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330410
Time for modified: 0.330330
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330744
Time for modified: 0.330551
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330604
Time for modified: 0.330511
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330427
Time for modified: 0.330301
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330391
Time for modified: 0.330341
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330375
Time for modified: 0.330274
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330385
Time for modified: 0.330292

Not bad. The results seem to be very uniform.

Identifying the strategies

First, about altering build options: Nginx has a very complete build configuration, and I don't think I am comfortable messing with their build options. However, I do know that some compiler flags available for AArch64 may offer hints to the compiler for optimizations. When I investigated the file /proc/cpuinfo for the machine, I saw the following line:

Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

Some AArch64 processors offer features to speed up hashing algorithms. This one in particular seems to have such features for algorithms like SHA-1 and SHA-2, but not for Murmur. I am not particularly optimistic about compiler flags, but it seems like SIMD may be an option for optimization. I will try a few options.

Second, about algorithm optimization: the code is very straightforward and there is not much room for improvement there. It is already using bitwise operations, and it has a very simple structure. I think it is safe to say that I will not find anything to improve in terms of algorithm performance.

Third, about changing the code to allow optimizations by the compiler: I see some room for improvement here. Maybe I can make some changes in the code so the compiler will vectorize (SIMD) a few operations. It may give me some milliseconds.
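One concrete (and hedged) example of the kind of change I have in mind: the four shift-and-OR statements that assemble k in the hash loop could be replaced with a memcpy, which most compilers lower to a single unaligned 32-bit load. Note that this is only equivalent to the original on little-endian machines, so it would need an endianness guard; the helper name below is mine, not Nginx's:

```c
#include <stdint.h>
#include <string.h>

/* Assemble a 32-bit little-endian word from 4 bytes.
 * On little-endian targets (x86-64, typical AArch64 Linux),
 * compilers usually turn the memcpy into one load instruction,
 * replacing the four shift-and-OR statements in the hash loop. */
static inline uint32_t load_le32(const unsigned char *p)
{
    uint32_t k;
    memcpy(&k, p, sizeof k);
    return k;
}
```

Whether this actually helps depends on what the compiler already does with the shift version at -O2/-O3, so it is something to measure rather than assume.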

Lastly, inline assembly: probably not a tool that should be used very often. I find inline assembly really neat, but it has a fatal flaw: it kills portability. If I write inline assembly for AArch64, it will not run on x86-64 at all. It is something I should avoid, but it may be useful if I really can't find anything else to improve and I still want better performance.
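If I do resort to inline assembly, the usual way to contain the portability damage is a compile-time guard with a plain C fallback. A sketch (my own illustrative example, not Nginx code), byte-swapping a word with AArch64's rev instruction:

```c
#include <stdint.h>

/* Illustrative only: use an AArch64 instruction when compiling
 * for AArch64, and fall back to portable C everywhere else. */
static inline uint32_t bswap32(uint32_t x)
{
#if defined(__aarch64__)
    uint32_t r;
    __asm__ ("rev %w0, %w1" : "=r"(r) : "r"(x)); /* byte-reverse */
    return r;
#else
    return  (x >> 24)
         | ((x >>  8) & 0x0000ff00u)
         | ((x <<  8) & 0x00ff0000u)
         |  (x << 24);
#endif
}
```

With a guard like this, the x86-64 build still compiles and produces the same results, so the benchmark comparison across both machines remains valid.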

Other to-dos

  • Investigate how to test with different string lengths
  • Check if the performance for x86-64 is affected by the changes

by Henrique at December 13, 2017 01:12 AM

December 12, 2017

Dan Epstein

Release 0.2 – More Bugs!

The Goal for Release 0.2

Getting near the end of the semester, it can be challenging and almost impossible to complete large bugs, as these take time to get merged. Therefore, my goal for release 0.2 is to finish as many small bugs as possible on any open source project from GitHub. This is still an improvement compared to what I did in my previous release 0.1, because adapting to new open source projects can be rewarding as I learn new environments and languages on the go.

So far I have three bugs on the line, which you can check from the button links on the right. I prefer to work on Mozilla open source project but as I have said, my goal for this release is to adapt to new environments and solve as many issues from different projects.

The plan is simple: in order to finish as many bugs as possible on a tight schedule, I will search for issues on a daily basis. If I get stuck, I will refer to the project's documentation or communicate directly with the bug's reporter/requester.

by Dan at December 12, 2017 04:44 PM

earle white

Bug Release 0.2

With this semester coming to a close, we were given our second and last bug release. My confidence was shattered after the completion of my first bug release. At this point in my computer programming journey, I can be completely honest and say that this is not my cup of tea, due to the fact that I like to know what I am doing before I do it. Most of these bugs focus on a language that is somewhat foreign to me. This time around we are encouraged to find more challenging bugs, but my goal was to try and find something simpler, like erasing some lines or changing a function's name, etc., but no such luck.

I have decided to work on a Mozilla Firefox Developer Tools project that has issues with JavaScript not displaying the styles properly in console logs. I believe it will be a good project to show some growth and help me build back my confidence. I will also create a fail-proof plan just in case this bug doesn't work out.

by ewhite7blog at December 12, 2017 03:41 PM

Svitlana Galianova

Goals for the next month

It's about a month before the due date for the next project for my Open Source course.
So it's time to set some goals and determine how I am going to grow as an open source contributor.

My goals are:

It's quite tricky to work on the assignment at the end of the semester with no plan. And Christmas is coming! I definitely need a plan for how to spend more time with my family and friends, but also do my best on this project.

My plan:
  • Spend 3 hours on fixing bugs every week from Dec 12 till Jan 5
  • Learn Node.js, JavaScript and React if needed (depends on my knowledge and bugs I find)
  • Fix at least 2 bugs
So far I have found only one bug, but I hope to get involved in this project and continue working on it.

by svitlana.galianova ( at December 12, 2017 03:32 PM

Joao Rodrigues

Lab 10 - Getting Started on 0.2

  • How will you try to grow during this release? What are your goals (e.g., work on larger bugs, try to do more bugs, learn technology X, ...)
I want to land a fix, since the one from my first release didn't get accepted. Although it was a very valuable learning experience, which felt pretty close to a victory, it still wasn't the real thing. I want my code to land and make an impact.

  • Which projects did you consider? What made you end up choosing one over the others? How did you decide?
I looked around for a lot of gaming related projects. Eventually I stumbled into Destiny Item Manager, which is an app I have used in the past while playing the game. 

  • Which bugs are you going to work on (include links)? - this was quick work, simply wanted to help out thanks to me knowing Portuguese. I have already submitted a pull request.  - This is an enhancement which requires me not only to add new code, but also work with Destiny's API. I think working on this will be pretty neat.

(Will add more if time permits)

  • Which bugs are you saving as backups in case something goes wrong?
I am saving these as backup:

The reason why these are saved as backup is that the application itself is not yet completely implemented for Windows. Setting up the development and testing environments is proving to be a little tricky, and if there is something I learned from release 0.1, it is that I'd much rather spend more time coding than troubleshooting.

  • How will you use the time you have left in the semester to get this done? You have other courses, holidays, etc, and still need to find time for this. What is your plan?
My plan, which I have been working on until now, was to finish the main bulk of school work from other subjects. Although there is still work left, knowing that I can spend more time trying to fix bugs, rather than catching up on missed material from the strike is pretty great.
  • What do you need to know in order to do this work? Do you have to learn something new? Are there tools you'll need to learn, code you don't understand?
I will have to work in Javascript which is a language I am somewhat familiar with now, a lot of it thanks to 

  • Do a quick survey of your bug and what's going to be involved in fixing it. Sometimes we don't fully understand what a bug is all about until we start trying to work on it. After 30 mins of research you might discover that it's going to be easier or harder than you thought. How will you respond to that?
Working with the Destiny API will for sure prove to be a challenge. I will need to request my very own API key so that I am even able to work on this project. The code seems understandable enough and I have reached out to folks who seem to have some experience with the app.

by João Rodrigues ( at December 12, 2017 01:38 AM

December 11, 2017

Sean Prashad

Fourth Quarter: Everything Counts

As we enter the final stretch of our semester, I'm losing hair, sleep, and the remaining pieces of my social life, but thankfully I've managed to stay somewhat on top of my school work (and still have sheepy). This semester has been a milestone in my academic career; I've been successfully mentoring and tutoring on top of a full course load for just over 4 months now. I've learned more about myself this semester than in any other, and more importantly, that giving back is very rewarding.

But the game isn't over - we still have a few weeks and a few bugs left to defeat before Fall 2017 comes to a conclusion!


I truly believe that all growth, personal or professional, starts small and over time develops into something truly magnificent despite all the hurdles, setbacks and disappointments. Growth is something that we're expected to consider as we officially enter the beginning stages of our second release, Release 0.2.

So this time around I took it upon myself to tackle more bugs (some even more difficult!). See below for the lineup:

You might have noticed that I went back to the same projects that I did for my first revision. To be quite honest, I wanted to branch out into different projects but with the burden of so many assignments and labs soon due (not to mention future tutor and mentee sessions), I couldn't risk venturing into new territory only to find out it's abandoned.

In case I'm not successful with the bugs listed up above, I'll be looking into helping with documentation for Rust or other simple bugs.

The Plan

Time is limited and here's how I plan to make the most of the few weeks left:

  1. Spend 30 minutes a day browsing through the source code for a bug, googling fixes or consulting a mentor on IRC/Slack for insight
  2. Dedicate half a day every Sunday to solve my easier bugs first, then work on the medium difficulty ones
  3. For Rust documentation, I'll have to rewrite much more than I've ever had to. Luckily I can bounce some ideas off Rust documentation guru, Steve Klabnik, to see if I'm on the right track or not
  4. Specifically for writing unit tests in Activity Stream, I'll have to investigate what framework they use and get myself up to speed on using it - should be fun!
  5. For anything that I'm not sure of, I'll refer to documentation whether it's in a Git repo or on a website!

Until next time

I'm excited to see what else I can do in these last few weeks. Funny thing is that I know my journey with Open Source won't end here... so is this really the fourth quarter? Anyhow, adios until next time!

December 11, 2017 05:00 AM

December 09, 2017

Mat Babol

Learning Proper Documentation

Earlier this week I finally finished my JavaScript Node.js package. I finished all the coding and testing, and thought to myself "Okay great job, on to the next project." However, there was one thing left to do: documentation. I know the value of good documentation because I've worked with a few large projects in the past. After coming back to a project after a month or two without any documentation, the code can seem very alien to you, as if it wasn't even you who wrote it. I typically try to comment most of my code; however, my set-up instructions, project descriptions, and generally my ReadMe file have never been up to par. This project has taught me how to write a cleaner, better-looking ReadMe.

GitHub has detailed instructions for creating good documentation that I've been following. It specifies the order and content that a ReadMe generally follows: project name, description, table of contents, installation, usage, contributing, credits, and license. The ReadMe should only contain the necessary information for developers to get started using and contributing to the project, longer documentation can go to the wiki.

There is also plenty of markdown available to make the ReadMe nice and pretty. Markdown ranges from headers and tables to inline code and even emojis. All the markdown syntax is relatively easy to remember; for example, any header is preceded by a number sign (#), with each additional number sign representing a smaller header (# is h1 while ###### is h6). My personal favorite is inline code, and it's very simple to use: simply surround your code with ` to inline it.
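A tiny sample of the syntax just described (the project name and commands here are made up for illustration):

```markdown
# Project Name
###### A very small heading

Install it with `npm install`, then run `npm test`.
```

The first line renders as an h1, the second as an h6, and the backticked fragments render as inline code.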

My ReadMe went from looking like this..

to this.

Finishing my own node package was pretty rewarding. Having a very nice looking ReadMe is just the cherry on top. Good documentation is important not only for me to remember my own code, but also to attract other developers to start contributing. These are the valuable skills that I hope to take away from this project.

by Mat Babol ( at December 09, 2017 10:19 PM

Matthew Marangoni

Inline Assembly Lab

Inline Assembler Lab

Part A:

In part A of this lab, we compare our volume scaling solution from our previous lab, with a new version provided to us by the professor which was built for AArch64. We compare the two programs by timing similar test conditions (scaling and summing the samples) with equal sample sizes. For reference, below is the solution which was provided to us:

// vol_simd.c :: volume scaling in C using AArch64 SIMD
// Chris Tyler 2017.11.29

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <time.h>
#include "vol.h"

int main() {
        clock_t start_t, end_t, total_t;        // timer variables

        int16_t*                in;             // input array
        int16_t*                limit;          // end of input array
        int16_t*                out;            // output array

        // these variables will be used in our assembler code, so we're going
        // to hand-allocate which register they are placed in
        // Q: what is an alternate approach?
        register int16_t*       in_cursor       asm("r20");     // input cursor
        register int16_t*       out_cursor      asm("r21");     // output cursor
        register int16_t        vol_int         asm("r22");     // volume as int16_t

        int                     x;              // array iterator
        int                     ttl = 0;        // array total

        in=(int16_t*) calloc(SAMPLES, sizeof(int16_t));
        out=(int16_t*) calloc(SAMPLES, sizeof(int16_t));

        printf("Generating sample data.\n");
        for (x = 0; x < SAMPLES; x++) {
                in[x] = (rand()%65536)-32768;
        }

// --------------------------------------------------------------------

        in_cursor = in;
        out_cursor = out;
        limit = in + SAMPLES ;

        // set vol_int to fixed-point representation of 0.5
        // Q: should we use 32767 or 32768 in next line? why?
        vol_int = (int16_t) (0.5 * 32767.0);

        printf("Scaling samples.\n");

        // get start time
        start_t = clock();

        // Q: what does it mean to "duplicate" values here?
        __asm__ ("dup v1.8h,w22"); // duplicate vol_int into v1.8h
        while ( in_cursor < limit ) {
                __asm__ (
                        "ldr q0, [x20],#16              \n\t"
                        // load eight samples into q0 (v0.8h)
                        // from in_cursor, and post-increment
                        // in_cursor by 16 bytes

                        "sqdmulh v0.8h, v0.8h, v1.8h    \n\t"
                        // multiply each lane in v0 by v1*2
                        // saturate results
                        // store upper 16 bits of results into v0

                        "str q0, [x21],#16              \n\t"
                        // store eight samples to out_cursor
                        // post-increment out_cursor by 16 bytes

                        // Q: what happens if we remove the following
                        // lines? Why?
                        : "=r"(in_cursor)
                        : "r"(limit),"r"(in_cursor),"r"(out_cursor)
                        );
        }

// --------------------------------------------------------------------

        printf("Summing samples.\n");
        for (x = 0; x < SAMPLES; x++) {
                ttl = (ttl + out[x]) % 1000;
        }
        // get end time
        end_t = clock();
        // calculate time spent
        total_t = (end_t - start_t) / CLOCKS_PER_MS;

        // Q: are the results usable? are they correct?
        printf("Result: %d\n", ttl);
        printf("Time elapsed: %ldms\n", total_t);
        return 0;
}

The performance of the inline assembler solution is slower (~3300ms) than my lab 6 solution from the Algorithm Selection lab, which used bit shifting to do the scaling (~450ms). To make sure it wasn't the platform, I tested on both AArch64 and x86_64, and in both cases the bit-shifting solution was significantly faster. I'll attempt to answer some of the questions embedded in the code above:

Q: What is an alternate approach?
A: The alternate approach is to do the task without designating registers and leaving it up to the compiler to decide which registers should be used.

Q: Should we use 32767 or 32768 in the next line? why?
A: We should use 32767 because int16_t (signed) has a maximum value of 32767 (and a minimum value of -32768).

Q: What does it mean to duplicate values here?
A: The value is being duplicated throughout the register: the value from w22 is used to fill each of the eight 16-bit lanes of v1.8h.

Q: What happens if we remove the following lines? Why?
A: If we remove the lines, the code won't work (we'll get a segmentation fault) because our output value is no longer being written back to 'in_cursor', and we are no longer specifying that 'limit', 'in_cursor', and 'out_cursor' be placed in registers.

Q: Are the results usable? Are they correct?
A: The results are usable and correct.

Part B:

I'll be examining the open source package groonga for any assembly code, and determining its purpose and impact overall. Groonga is an open-source fulltext search engine and column store.

The assembly in this package is mostly present in header files related to nginx (an HTTP server), for various platforms. In most of the locations where assembly is used, it appears with the 'volatile' keyword, to ensure that the compiler does not move the code during optimization.
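The pattern being described looks roughly like this. This is a generic illustration of a compiler-level barrier, similar in shape to what nginx-derived headers do on some platforms, not code copied from groonga:

```c
/* A compiler-level memory barrier: the empty asm statement with a
 * "memory" clobber forbids the compiler from reordering or caching
 * memory accesses across this point. The volatile keyword keeps
 * the statement itself from being moved or optimized away. */
#define compiler_barrier() __asm__ volatile ("" ::: "memory")
```

On ARM-style platforms a real hardware barrier instruction would be placed in the asm string instead of the empty string, which is why the package carries slightly different variants per platform.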

The assembly code appears only in the unix and win32 subfolders, but groonga can be installed on a Windows, OS X, Debian GNU/Linux, Ubuntu, CentOS, Fedora, or Solaris environment.

The purpose of the assembly used in the header files is primarily to define memory barriers; in other C files it is used for memory management. It is rewritten with slight variations across multiple files to be platform specific. On platforms other than the ones mentioned above, the build should simply skip the platform-specific assembler optimizations and execute the portable code instead.
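That per-platform selection is typically done with preprocessor guards. Here is a hedged sketch of the dispatch pattern (the macro names are mine, not groonga's): each inline-assembly variant is guarded by compiler/architecture macros, with a portable fallback elsewhere.

```c
#include <assert.h>

/* Hypothetical dispatch: choose an architecture-specific memory barrier
 * where one is known, otherwise compile a portable no-op fallback. */
#if defined(__GNUC__) && defined(__x86_64__)
#  define PLATFORM_BARRIER() __asm__ __volatile__("mfence" ::: "memory")
#elif defined(__GNUC__) && defined(__aarch64__)
#  define PLATFORM_BARRIER() __asm__ __volatile__("dmb ish" ::: "memory")
#else
#  define PLATFORM_BARRIER() do { } while (0)  /* no barrier available */
#endif

static int barrier_compiles_and_runs(void)
{
    PLATFORM_BARRIER();   /* expands to the variant for this platform */
    return 1;
}
```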

Value vs Complexity:
While I'm sure the use of assembly in this package adds a lot of value by being optimized for various platforms, I feel it is extremely difficult for beginner and intermediate-level programmers to interpret. The majority of the assembly code is not documented well, and its intent/purpose is hard to determine without making some assumptions. Memory management appears to be an issue for this package: the authors recommend only running it in 64-bit environments, since the 32-bit package can easily encounter an out-of-memory error, and all of the inline assembly is related to memory barriers. Although this is an open source package, it would be difficult for developers without a thorough background in assembly to make much sense of it and therefore contribute to it, which in my opinion is counter-intuitive for an open source project. The complexity added by assembly isn't the biggest issue; the lack of comments and documentation is.

by Matthew Marangoni at December 09, 2017 05:16 PM

Oleh Hodovaniuk

Final Project Stage 1


For the final project of this course, I have to find an open source project that contains either a hash or checksum function. I should install and profile the project, as well as benchmark the hash function separately, in order to produce the baseline test information for stage 2 of my project. So let's begin and find an open source project that fulfills the project requirements.

For my final project I chose hashdeep, an open source program to compute, match, and audit hashsets. Hashdeep doesn't only hash files with MD5; it can also compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files. Hashdeep is similar to the md5deep and md5sum programs, however it has some additional features:

  • Recursive operation – hashdeep is able to recursively examine an entire directory tree. That is, compute the MD5 for every file in a directory and for every file in every subdirectory.
  • Comparison mode – hashdeep can accept a list of known hashes and compare them to a set of input files. The program can display either those input files that match the list of known hashes or those that do not match. Hash sets can be drawn from Encase, the National Software Reference Library, iLook Investigator, Hashkeeper, md5sum, BSD md5, and other generic hash generating programs.
  • Time estimation – hashdeep can produce a time estimate when it’s processing very large files.
  • Piecewise hashing – hashdeep can hash input files in arbitrarily sized blocks
  • File type mode – hashdeep can process only files of a certain type, such as regular files, block devices, etc.

All the information about hashdeep is taken from here. OK, let's download and configure our project first.

Hashdeep Installation 

I downloaded the latest version of the hashdeep tarball from GitHub by running wget <link to the tarball>. Once the download was complete, I extracted the tarball files by running tar -zxvf <name of tarball with .tar.gz extension>

OK, now we should be able to move forward and build the package; at least, that's the plan. However, building hashdeep on aarchie (an AArch64 machine) is neither easy nor quick, because hashdeep requires additional dependencies and libraries to be installed. I figured the best way to get rid of all the error messages I got during configuration was the following steps:

  • Install the g++ (c++) library by running sudo dnf install gcc-c++ command
  • Then run autoreconf -ifv, which takes care of running aclocal, autoheader, autoconf and automake in the right order to regenerate the build infrastructure from the input files

We need to run autoreconf because, by default, the ./configure executable has not yet been generated, and you will not be able to run it unless you use the autotools to regenerate this configuration script first. Once this is done, you should be able to proceed to the configuration and installation.

First run ./configure, then make, and then sudo make install. Note that this time we actually install the software via make install, not just build it. The reason we use sudo for make install is that hashdeep requires access to folders and files outside of the server partition allocated to you, meaning you will hit a permission error without sudo. Once the above is done, we can test whether our hash program works by hashing a small file.

[ohodovaniuk@aarchie src]$ md5deep testfile.txt
6169f4d38e109960f41e266ec56ff947 /home/ohodovaniuk/project/md5deep-4.1.1/src/testfile.txt

The highlighted string is the hash key for the file. Now it is time to analyse the source files of our program, find optimization candidates, and benchmark the hash function. Let's profile our program first.

Hashdeep Profiling

In order to observe visible hashdeep run-time improvements, I need to identify and optimize the function that has a major impact on the overall program run time. Regardless of the optimization strategy, it is important to find out how the hashdeep program logic is divided and which functions are used (or do more work) than the others. If a function accounts for, say, 60% of the whole program's work, it makes more sense to improve that function, since doing so will likely improve the overall hashdeep run time. To find out which functions take the majority of the execution time, we need to profile the program. Profiling is done through the gprof tool, which is enabled with the -pg flag. Profiling should be enabled during the program configuration, and to do so we need to run the configure command with the following flags:
           ./configure CFLAGS="-pg" CXXFLAGS="-pg" LDFLAGS="-pg"
The CFLAGS, CXXFLAGS, and LDFLAGS variables are referenced in each hashdeep Makefile and define how the program is compiled and linked.
The above command will not produce the profiling output file by itself; you will need to execute hashdeep first. To test our program and profile it successfully, we need to generate a dummy file big enough to make hashdeep do real work (it will also be used for benchmarking). Command to create a dummy file of a certain size:
            fallocate -l 1G gprof.txt
After a successful configuration and a hashdeep test run, we can finally find gmon.out, the gprof output file. This file contains all the hashdeep profiling information. There are a couple of ways to view the profiling information.

You can generate a readable output file by running the following command:

gprof md5deep > gprof.txt

I personally prefer the visual diagram that can be generated by running:
gprof md5deep | gprof2dot | dot -Tpng > image.png
On aarchie I could not use the display command to view the image, so I simply copied the image from the server to my local Windows machine. And that's when the bad news came in. The graph is huge and it is easy to get lost: there are more than a hundred functions that hashdeep uses. Most of the functions simply call each other and do not contain much of the program logic, but I still had to go through each one of them. The diagram is too big to paste into my blog because it has multiple branches of functions that you can follow. Based on pure numbers, I followed the path that is used the most by hashdeep. That brought me to the MD5Transform function. Here is the snippet of the diagram that helped me identify the branch where the functions with the most optimization potential are located.
Now we can partially view the hierarchy of our program's functions. However, based on the complete profiling output, it looks like hashdeep mostly reuses the sha1 and sha256 cryptographic hash functions such as sha1(256)_update and sha1(256)_finish. So at this point we have two potential candidates for optimization. Even though profiling gave me hints on where to search, I still had to review each function manually to decide whether it has at least relatively complex logic. Some of the functions simply call other functions, so I had to dig deeper to find the function with real optimization potential.
OK, let's review the main of the hashdeep program to confirm that the functions mentioned above are used. Note that most of the functions in hashdeep are only indirectly hash functions: they reuse the hash logic passed from main, but they are not hash functions themselves. The most used functions are the following:

void hash_update_sha1(void * ctx, const unsigned char *buf, size_t len)
{
    sha1_update((sha1_context *)ctx, buf, len);
}

void hash_final_sha1(void * ctx, unsigned char *sum)
{
    sha1_finish((sha1_context *)ctx, sum);
}

OK, so the sha1_update and sha1_finish functions can be found inside the ./src/sha1.c source file. Here is the content of each function:

void sha1_update( sha1_context *ctx, const unsigned char *input, size_t ilen )
{
    size_t fill;
    unsigned long left;

    if( ilen <= 0 )
        return;

    left = ctx->total[0] & 0x3F;
    fill = 64 - left;

    ctx->total[0] += (unsigned long) ilen;
    ctx->total[0] &= 0xFFFFFFFF;

    if( ctx->total[0] < (unsigned long) ilen )
        ctx->total[1]++;

    if( left && ilen >= fill )
    {
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, fill );
        sha1_process( ctx, ctx->buffer );
        input += fill;
        ilen -= fill;
        left = 0;
    }

    while( ilen >= 64 )
    {
        sha1_process( ctx, input );
        input += 64;
        ilen -= 64;
    }

    if( ilen > 0 )
        memcpy( (void *) (ctx->buffer + left),
                (const void *) input, ilen );
}

void sha1_finish( sha1_context *ctx, unsigned char output[20] )
{
    unsigned long last, padn;
    unsigned long high, low;
    unsigned char msglen[8];

    high = ( ctx->total[0] >> 29 )
         | ( ctx->total[1] << 3 );
    low  = ( ctx->total[0] << 3 );

    PUT_ULONG_BE( high, msglen, 0 );
    PUT_ULONG_BE( low, msglen, 4 );

    last = ctx->total[0] & 0x3F;
    padn = ( last < 56 ) ? ( 56 - last ) : ( 120 - last );

    sha1_update( ctx, (const unsigned char *) sha1_padding, padn );
    sha1_update( ctx, msglen, 8 );

    PUT_ULONG_BE( ctx->state[0], output, 0 );
    PUT_ULONG_BE( ctx->state[1], output, 4 );
    PUT_ULONG_BE( ctx->state[2], output, 8 );
    PUT_ULONG_BE( ctx->state[3], output, 12 );
    PUT_ULONG_BE( ctx->state[4], output, 16 );
}
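The padding arithmetic in sha1_finish can be checked in isolation: SHA-1 pads the buffered tail of the message out to byte 56 of a 64-byte block (or byte 56 of the next block), leaving 8 bytes for the encoded bit length. Restated as a standalone helper (same ternary as above, extracted here only for illustration):

```c
#include <assert.h>

/* Given 'last' buffered bytes (0..63), return the padding length so
 * that last + padn + 8 is always a multiple of the 64-byte block size. */
static unsigned long sha1_padn(unsigned long last)
{
    return (last < 56) ? (56 - last) : (120 - last);
}
```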

First of all, I noticed that the hashdeep source code files have very few comments, if any, and little documentation to help understand what each function actually does and how it is related to other functions. The main logic uses multiple functions that start with "hash_", but all of them are only indirectly related to the hash logic. In main, other helper hash functions are called and the hash variables are passed into them. So I could not identify a function that is fully separated from the others, because the hash logic is woven through main. At this point I do have an algorithm that could be optimized in the future, but to be honest I am leaning towards other methods of optimization. Either way, I will decide once I get there. Let's benchmark the sha1 hash function separately from the hashdeep program.

Sha1deep and MD5deep benchmarking

I did all the benchmarking around 12 AM, when aarchie is not busy, to achieve more accurate results. For benchmarking I need two things: a couple of big files and a loop that times each sha1 execution. Let's first compile the sha1 hash function source files and store the result in the executable file sha1deep. Now we need to create two dummy files of different sizes using the commands below:
[ohodovaniuk@aarchie src]$ fallocate -l 100M test1.txt
[ohodovaniuk@aarchie src]$ fallocate -l 1G test2.txt
We will benchmark the performance of sha1deep using a custom command that executes time sha1deep test1.txt multiple times (you do not want to run the command 10 times yourself; let the machine do the work).

Here is the command plus the total benchmark time required for the sha1deep to hash a file with size 100M 10 times in a row.

[ohodovaniuk@aarchie src]$ time for i in {1..10}; do sha1deep test1.txt; done
2c2ceccb5ec5574f791d45b63c940cff20550f9a /home/ohodovaniuk/project/md5deep-4.1.1/src/test1.txt
2c2ceccb5ec5574f791d45b63c940cff20550f9a /home/ohodovaniuk/project/md5deep-4.1.1/src/test1.txt
2c2ceccb5ec5574f791d45b63c940cff20550f9a /home/ohodovaniuk/project/md5deep-4.1.1/src/test1.txt

real 0m16.069s
user 0m15.852s
sys 0m0.541s
[ohodovaniuk@aarchie src]$

As far as I know, the benchmark average is simply the total real time divided by the number of executions. So the average benchmark time of sha1deep on a 100M file is:

Average benchmarking time: 0m1.606s

Here is the command plus the total benchmark time required for the sha1deep to hash a file with size 1G 10 times in a row.

[ohodovaniuk@aarchie src]$ time for i in {1..10}; do sha1deep test2.txt; done
2a492f15396a6768bcbca016993f4b4c8b0b5307 /home/ohodovaniuk/project/md5deep-4.1.1/src/test2.txt
2a492f15396a6768bcbca016993f4b4c8b0b5307 /home/ohodovaniuk/project/md5deep-4.1.1/src/test2.txt

real 2m43.870s
user 2m41.276s
sys 0m6.318s

The benchmarking average time is:
Average benchmarking time: 0m16.387s (163.870 seconds of real time divided by 10 runs)
The above benchmarking results will be a good baseline for stage 2, when we will compare the new benchmarking results against these to find out the difference.
Conclusion and Optimization strategy
It is hard to isolate the hash function from the other functions and test it separately, because in my case the hash logic consists of multiple parts that only work when the most important functions are linked together. It is extremely hard to understand the logic of main, and the hash function depends on other functions and does very little without the rest of the main logic. Because hashdeep is such a big project, I was unable to isolate a single function; instead I isolated a small group of related hash functions.
Also, to have noticeable optimization results we need to work with the functions that do the most work or are used most frequently, but at the same time they have to be hash functions, which makes the analysis harder. However, overall I was able to isolate the hash logic with the most optimization potential. The update and finish functions are used a lot, and any optimization done to them might improve the overall performance.
So here are my optimization options:
  • Use built-in compiler options
  • Source code changes to enable better compiler optimization
  • Algorithm improvements
  • Use in-line assembler

After reviewing the code, I don't think it is a good idea to either improve the algorithm programmatically or use inline assembler, because hashdeep is such a big project with so many functions, helpers and other logic that it requires someone who completely understands how everything works to make reasonable modifications. To make any major code changes I need to understand the code, and because the documentation is so poor it is hard to navigate through the logic. In stage 2 I will optimize hashdeep with compiler options and some code modifications that enable maximum compiler optimization.


by ohodovaniuk at December 09, 2017 02:46 PM

Steven De Filippis

Exploring bugs beyond the horizon

With the upcoming date of Release 0.2 coming close, it is time to explore for a more worthwhile bug. We were encouraged to look for more difficult bugs and also possibly in a project that is outside the scope of Mozilla. I was going through the available bugs for Mozilla products, and noticed the selection is not as extensive as it was for Release 0.1. As a result, I decided to take a look at the libimobiledevice library that is used in many applications across a variety of platforms.

libimobiledevice is a library that enables a developer to link against an open-source re-implementation of Apple’s MobileDevice library. MobileDevice is essentially the underlying library iTunes uses to make a mobile Apple product (iPhone, iPad, iPod) restore, sync, activate, etc. It has been used in many public iOS jailbreaks, as it eases the pain of having to link against a private, closed-source library which sometimes has no publicly exported symbols. This means that if someone wanted to hook a function in the Windows variant of the MobileDevice library, they would have to manually specify the function’s address instead of just calling dlopen()+dlsym(), as the MobileDevice DLL is stripped of any symbols. If one did ship an application with hardcoded function addresses, it would regularly require updates, since iTunes updates shift the function addresses every release.
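The dlopen()+dlsym() approach looks like this in miniature (a sketch of my own: libc's strlen stands in for a MobileDevice export, and the real library and symbol names would differ):

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Resolve a function by name at runtime instead of hardcoding an
 * address. dlopen(NULL, ...) opens the running process itself, so any
 * exported libc symbol (here "strlen") can be looked up by name. */
static size_t strlen_via_dlsym(const char *s)
{
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return strlen(s);                 /* fall back to the direct call */

    size_t (*fn)(const char *) =
        (size_t (*)(const char *))dlsym(handle, "strlen");
    size_t n = fn ? fn(s) : strlen(s);
    dlclose(handle);
    return n;
}
```

With a symbol-stripped DLL, dlsym() returns NULL, which is exactly why hardcoded addresses become necessary in that case.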

libimobiledevice relieves that frustration while also being able to ship the library with the application. You can communicate with Apple’s Mobile Device products while not requiring the user install proprietary software (iTunes). Not to mention this enables cross-platform flexibility along with supporting non-macOS/Windows platforms (i.e Ubuntu). Thus it seems like an overall better choice when requiring a library with such capabilities and long-term support.

That being said, in libimobiledevice’s idevicerestore module, I came across the bug/issue “2nd Generation iPod Touch restore to 4.2.1 fails“. It’s an outstanding issue, as idevicerestore actively supports every other device and firmware combination except the iPod touch 2.

By taking on this bug, I plan on expanding my knowledge of lower-level restores for Apple mobile devices. I decided to take on this bug as I use this very library internally at work for various projects and would benefit from ensuring compatibility is maximized (in case we encounter an iPod touch 2, though this bug may extend beyond this individual model).

In terms of my plan for Release 0.2:

1. How will you use the time you have left in the semester to get this done? You have other courses, holidays, etc, and still need to find time for this. What is your plan?

While there is not much time left in the semester due to the strike, I plan on utilizing the weekend to dig into this Release in-depth. It will likely require a few hours of research to determine the underlying issue, followed by more hours of research in determining the appropriate way to effectively patch it without affecting the integrity of the library.

2. What do you need to know in order to do this work? Do you have to learn something new? Are there tools you’ll need to learn, code you don’t understand?

I will need to know how the various modules interconnect with each other (libirecovery, idevicerestore, usbmuxd). I plan on tracing the bug by grep’ing source files for the error string and following function calls throughout libraries. I will likely need to add debug messages for myself and restore on a working device to see what the expected result should be. There may also be the use of gdb/lldb to check the program state of when the bug occurs and when the bug does not occur.

3. Do a quick survey of your bug and what’s going to be involved in fixing it. Sometimes we don’t fully understand what a bug is all about until we start trying to work on it. After 30 mins of research you might discover that it’s going to be easier or harder than you thought. How will you respond to that?

I feel like the bug is likely not going to be easier than I thought due to the amount of libraries incorporated into idevicerestore. Even if the underlying bug may be easy, it will take more time understanding the code around it before being able to determine the bug and implement a fix. Jumping between the various libraries may be a bit frustrating – but may be easy to resolve by using some IDE and importing all libraries to allow seamless jumping without having to have them opened in a separate window.

That’s all for now.

by Steven at December 09, 2017 05:00 AM

Creating Fancy README files via Markdown Language

README files have evolved quite a bit over the past few decades of software releases. In a world without rich-text READMEs, we were previously used to seeing ASCII art in README files, along with the use of asterisks and slashes to emphasize words. While there was no standard, it was interesting to see the various takes people had on creating README files.

In today’s modern age, Markdown is supported by a variety of popular web-based git portals (GitHub, Bitbucket, GitLab, etc.). Markdown processes formatted text files and enables rich-text rendering for the reader. When incorporating it into README files, one can emphasize words not with asterisks/slashes but with actual bold text. Along with the ability to embed media, Markdown creates a more user-friendly reading experience.

Though I was already familiar with Markdown from experience at work, I did find that making a useful README is harder than it looks. In various examples online, some people use emojis/Unicode to format their README files. An excellent example that demonstrates how far you can push Markdown is image-to-ascii: documentation expands into the data types returned from library functions, and example code is formatted and coloured accordingly. It gave me some ideas for making my own README seem less generic and uninteresting.

You can review my README over here.

by Steven at December 09, 2017 04:16 AM

December 08, 2017

Marco Beltempo

Tips for Creating an Effective README

Markdown Logo


So, you finally have time to work on that program you’ve been setting aside for weeks. You open your project folder, only to find yourself staring dumbfounded at the screen, trying to dissect your own logic and questioning the compiler for successfully building spaghetti code. Many programmers have experienced this, myself included.

As we continue to develop our open source file utility libraries, we’ve shifted focus from continuous integration to understanding the benefits and importance of creating supportive documentation. I believe that being able to effectively explain your program is a great skill in the open source community. We often make assumptions and “forget” to comment our code: if you know the language, you shouldn’t have a problem figuring out what it does. Often this isn’t the case.

Do people actually readme?

Yes. The reality is, documents such as README and CONTRIBUTING are some of the most important files in your project. It’s the first screen users navigate to for insight. Having an effective README is a great way to spark a person’s interest in your project. They understand why this project matters, and how their contributions can benefit its growth.

“Bug free software that nobody knows how to use is just as bad as software that cannot be used because it’s so full of bugs.” Elliot Land

To create a great README, it’s important to understand its purpose. The goal is to enable new users to get set up quickly and easily by providing the purpose of the project, setup and installation instructions, API definitions, and clear examples.


Some tips to consider while writing your README:

  1. Use Markdown for styling and organizing your document
  2. Examine README’s from other projects for ideas and examples
  3. Document new features as they’re added
    • This will not only help other developers better understand the changes but also save time retracing those changes
    • It will also give you a different perspective on the current state of the project
  4. Provide descriptive examples. Utilize images, GIFs, snippets etc.
  5. Clearly label steps and procedures


Below is a README template I created for the fileside project.

This template can be used as a starting point for your project.

The post Tips for Creating an Effective README appeared first on Marco Beltempo.

by Marco Beltempo at December 08, 2017 06:00 AM

Saeed Mohiti

Vectorization Lab(lab5)

Single instruction, multiple data (SIMD) is a class of parallel computing in Flynn’s taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously: a processing architecture in which the same operation is applied to multiple pieces of data at the same time.

With SIMD we have access to vector instructions which use vector registers; the CPU is able to load several elements at the same time and then add those elements with a single instruction. When the compiler generates this automatically, the process is called auto-vectorization.

Let’s make it simple and clear: we know an integer is 32 bits. If we have a register that is 128 bits wide, we can use one instruction to load the first 4 elements into the integer lanes, then with the single instruction mentioned above we can add all 4 elements at the same time.
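The idea can be written out directly using GCC’s portable vector extensions (a sketch of the concept only; the lab itself relies on the compiler’s auto-vectorizer rather than explicit vector types):

```c
#include <assert.h>
#include <string.h>

/* A 128-bit vector type holding four 32-bit int lanes. The single '+'
 * below adds all four lanes at once -- one vector add on SIMD hardware. */
typedef int v4si __attribute__((vector_size(16)));

static void add4(const int *a, const int *b, int *out)
{
    v4si va, vb, vsum;
    memcpy(&va, a, sizeof va);    /* load four lanes from each array */
    memcpy(&vb, b, sizeof vb);
    vsum = va + vb;               /* one vector add, four results */
    memcpy(out, &vsum, sizeof vsum);
}
```

This is exactly what -O3 tries to do automatically for the element-by-element loop in the lab program below.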

For this lab we must “Write a short program that creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000, then sums those two arrays element-by-element to a third array, and finally sums the third array and prints the result.”

let’s look at this program which has been written in C:

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define MAX 1000

int main()
{
    int i;
    int sum = 0;
    int arr1[MAX], arr2[MAX], arrsum[MAX];
    time_t t;

    srand((unsigned) time(&t));

    for (i = 0; i < MAX; i++) {
        arr1[i] = rand() % 2001 - 1000;
        arr2[i] = rand() % 2001 - 1000;
    }

    for (i = 0; i < MAX; i++) {
        arrsum[i] = arr1[i] + arr2[i];
    }

    for (i = 0; i < MAX; i++) {
        sum += arrsum[i];
    }

    printf("the total is %d \n", sum);

    return 0;
}

The program compiled successfully with this command:

gcc lab5.c -o lab5

and I got the result :

the total is: 47799

After that I used objdump -d to see the disassembly of the non-optimized code:

objdump -d  lab5

OK, let's look at the <main> section:


0000000000400684 <main>:

  400684:       d285e010        mov     x16, #0x2f00                    // #12032

  400688:       cb3063ff        sub     sp, sp, x16

  40068c:       a9007bfd        stp     x29, x30, [sp]

  400690:       910003fd        mov     x29, sp

  400694:       b92efbbf        str     wzr, [x29,#12024]

  400698:       910043a0        add     x0, x29, #0x10

  40069c:       97ffff9d        bl      400510 <time@plt>

  4006a0:       97ffffb0        bl      400560 <srand@plt>

  4006a4:       b92effbf        str     wzr, [x29,#12028]

  4006a8:       14000027        b       400744 <main+0xc0>

  4006ac:       97ffffa1        bl      400530 <rand@plt>

  4006b0:       2a0003e1        mov     w1, w0

  4006b4:       529a9c60        mov     w0, #0xd4e3                     // #54499

  4006b8:       72a83000        movk    w0, #0x4180, lsl #16

  4006bc:       9b207c20        smull   x0, w1, w0

  4006c0:       d360fc00        lsr     x0, x0, #32

  4006c4:       13097c02        asr     w2, w0, #9

  4006c8:       131f7c20        asr     w0, w1, #31

  4006cc:       4b000040        sub     w0, w2, w0

  4006d0:       5280fa22        mov     w2, #0x7d1                      // #2001

  4006d4:       1b027c00        mul     w0, w0, w2

  4006d8:       4b000020        sub     w0, w1, w0

  4006dc:       510fa002        sub     w2, w0, #0x3e8

  4006e0:       b9aeffa0        ldrsw   x0, [x29,#12028]

  4006e4:       d37ef400        lsl     x0, x0, #2

  4006e8:       914007a1        add     x1, x29, #0x1, lsl #12

  4006ec:       913d6021        add     x1, x1, #0xf58

  4006f0:       b8206822        str     w2, [x1,x0]

  4006f4:       97ffff8f        bl      400530 <rand@plt>

  4006f8:       2a0003e1        mov     w1, w0

  4006fc:       529a9c60        mov     w0, #0xd4e3                     // #54499

  400700:       72a83000        movk    w0, #0x4180, lsl #16

  400704:       9b207c20        smull   x0, w1, w0

  400708:       d360fc00        lsr     x0, x0, #32

  40070c:       13097c02        asr     w2, w0, #9

  400710:       131f7c20        asr     w0, w1, #31

  400714:       4b000040        sub     w0, w2, w0

  400718:       5280fa22        mov     w2, #0x7d1                      // #2001

  40071c:       1b027c00        mul     w0, w0, w2

  400720:       4b000020        sub     w0, w1, w0

  400724:       510fa002        sub     w2, w0, #0x3e8

  400728:       b9aeffa0        ldrsw   x0, [x29,#12028]

  40072c:       d37ef400        lsl     x0, x0, #2

  400730:       913ee3a1        add     x1, x29, #0xfb8

  400734:       b8206822        str     w2, [x1,x0]

  400738:       b96effa0        ldr     w0, [x29,#12028]

  40073c:       11000400        add     w0, w0, #0x1

  400740:       b92effa0        str     w0, [x29,#12028]

  400744:       b96effa0        ldr     w0, [x29,#12028]

  400748:       710f9c1f        cmp     w0, #0x3e7

  40074c:       54fffb0d        b.le    4006ac <main+0x28>

  400750:       b92effbf        str     wzr, [x29,#12028]

  400754:       14000012        b       40079c <main+0x118>

  400758:       b9aeffa0        ldrsw   x0, [x29,#12028]

  40075c:       d37ef400        lsl     x0, x0, #2

  400760:       914007a1        add     x1, x29, #0x1, lsl #12

  400764:       913d6021        add     x1, x1, #0xf58

  400768:       b8606821        ldr     w1, [x1,x0]

  40076c:       b9aeffa0        ldrsw   x0, [x29,#12028]

  400770:       d37ef400        lsl     x0, x0, #2

  400774:       913ee3a2        add     x2, x29, #0xfb8

  400778:       b8606840        ldr     w0, [x2,x0]

  40077c:       0b000022        add     w2, w1, w0

  400780:       b9aeffa0        ldrsw   x0, [x29,#12028]

  400784:       d37ef400        lsl     x0, x0, #2

  400788:       910063a1        add     x1, x29, #0x18

  40078c:       b8206822        str     w2, [x1,x0]

  400790:       b96effa0        ldr     w0, [x29,#12028]

  400794:       11000400        add     w0, w0, #0x1

  400798:       b92effa0        str     w0, [x29,#12028]

  40079c:       b96effa0        ldr     w0, [x29,#12028]

  4007a0:       710f9c1f        cmp     w0, #0x3e7

  4007a4:       54fffdad        b.le    400758 <main+0xd4>

  4007a8:       b92effbf        str     wzr, [x29,#12028]

  4007ac:       1400000b        b       4007d8 <main+0x154>

  4007b0:       b9aeffa0        ldrsw   x0, [x29,#12028]

  4007b4:       d37ef400        lsl     x0, x0, #2

  4007b8:       910063a1        add     x1, x29, #0x18

  4007bc:       b8606820        ldr     w0, [x1,x0]

  4007c0:       b96efba1        ldr     w1, [x29,#12024]

  4007c4:       0b000020        add     w0, w1, w0

  4007c8:       b92efba0        str     w0, [x29,#12024]

  4007cc:       b96effa0        ldr     w0, [x29,#12028]

  4007d0:       11000400        add     w0, w0, #0x1

  4007d4:       b92effa0        str     w0, [x29,#12028]

  4007d8:       b96effa0        ldr     w0, [x29,#12028]

  4007dc:       710f9c1f        cmp     w0, #0x3e7

  4007e0:       54fffe8d        b.le    4007b0 <main+0x12c>

  4007e4:       90000000        adrp    x0, 400000 <_init-0x4d8>

  4007e8:       91230000        add     x0, x0, #0x8c0

  4007ec:       b96efba1        ldr     w1, [x29,#12024]

  4007f0:       97ffff60        bl      400570 <printf@plt>

  4007f4:       52800000        mov     w0, #0x0                        // #0

  4007f8:       a9407bfd        ldp     x29, x30, [sp]

  4007fc:       d285e010        mov     x16, #0x2f00                    // #12032

  400800:       8b3063ff        add     sp, sp, x16

  400804:       d65f03c0        ret


As we can see, no optimization or vectorization is used, because I did not pass any optimization flags when compiling the code.

Now I compile the code with -O3 to enable the auto-vectorization:

gcc -O3 lab5.c -o lab5

There are no changes in the output, and I still get the random numbers.

So, let’s use objdump -d again, but this time to look at the disassembly of the optimized, auto-vectorized code.

Obviously, the <main> section is shorter.

In the first part of the code, which belongs to the first loop, both calls to rand can be seen:

0000000000400580 <main>:

  400580:       d285e610        mov     x16, #0x2f30        // #12080

  400584:       cb3063ff        sub     sp, sp, x16

  400588:       a9007bfd        stp     x29, x30, [sp]

  40058c:       910003fd        mov     x29, sp

  400590:       910123a0        add     x0, x29, #0x48

  400594:       a90153f3        stp     x19, x20, [sp,#16]

  400598:       529a9c74        mov     w20, #0xd4e3        // #54499

  40059c:       a9025bf5        stp     x21, x22, [sp,#32]

  4005a0:       72a83014        movk    w20, #0x4180, lsl #16

  4005a4:       f9001bf7        str     x23, [sp,#48]

  4005a8:       910143b6        add     x22, x29, #0x50

  4005ac:       913fc3b5        add     x21, x29, #0xff0

  4005b0:       5280fa33        mov     w19, #0x7d1        // #2001

  4005b4:       d2800017        mov     x23, #0x0          // #0

  4005b8:       97ffffd6        bl      400510 <time@plt>

  4005bc:       97ffffe9        bl      400560 <srand@plt>

  4005c0:       97ffffdc        bl      400530 <rand@plt>

  4005c4:       9b347c01        smull   x1, w0, w20

  4005c8:       9369fc21        asr     x1, x1, #41

  4005cc:       4b807c21        sub     w1, w1, w0, asr #31

  4005d0:       1b138020        msub    w0, w1, w19, w0

  4005d4:       510fa000        sub     w0, w0, #0x3e8

  4005d8:       b8376ac0        str     w0, [x22,x23]

  4005dc:       97ffffd5        bl      400530 <rand@plt>

  4005e0:       9b347c01        smull   x1, w0, w20

  4005e4:       9369fc21        asr     x1, x1, #41

  4005e8:       4b807c21        sub     w1, w1, w0, asr #31

  4005ec:       1b138020        msub    w0, w1, w19, w0

  4005f0:       510fa000        sub     w0, w0, #0x3e8

  4005f4:       b8376aa0        str     w0, [x21,x23]

  4005f8:       910012f7        add     x23, x23, #0x4

  4005fc:       f13e82ff        cmp     x23, #0xfa0

  400600:       54fffe01        b.ne    4005c0 <main+0x40>


The second part, which belongs to the second loop, adds the two arrays. As we can see, the code is vectorized.

Clearly, the program loads two vector registers (ldr), adds them with a single vector instruction (add), and stores the result back (str):


  400604:       d283f202        mov     x2, #0x1f90       // #8080

  400608:       8b0203a1        add     x1, x29, x2

  40060c:       d2800000        mov     x0, #0x0        // #0

  400610:       3ce06ac0        ldr     q0, [x22,x0]

  400614:       3ce06aa1        ldr     q1, [x21,x0]

  400618:       4ea18400        add     v0.4s, v0.4s, v1.4s

  40061c:       3ca06820        str     q0, [x1,x0]

  400620:       91004000        add     x0, x0, #0x10

  400624:       f13e801f        cmp     x0, #0xfa0

  400628:       54ffff41        b.ne    400610 <main+0x90>


The third loop is vectorized too:


  40062c:       4f000400        movi    v0.4s, #0x0

  400630:       aa0103e0        mov     x0, x1

  400634:       d285e601        mov     x1, #0x2f30     // #12080

  400638:       8b0103a1        add     x1, x29, x1

  40063c:       3cc10401        ldr     q1, [x0],#16

  400640:       4ea18400        add     v0.4s, v0.4s, v1.4s

  400644:       eb01001f        cmp     x0, x1

  400648:       54ffffa1        b.ne    40063c <main+0xbc>

  40064c:       4eb1b800        addv    s0, v0.4s

  400650:       90000000        adrp    x0, 400000 <_init-0x4d8>

  400654:       91210000        add     x0, x0, #0x840

  400658:       0e043c01        mov     w1, v0.s[0]

  40065c:       97ffffc5        bl      400570 <printf@plt>

  400660:       f9401bf7        ldr     x23, [sp,#48]

  400664:       a94153f3        ldp     x19, x20, [sp,#16]

  400668:       52800000        mov     w0, #0x0           // #0

  40066c:       a9425bf5        ldp     x21, x22, [sp,#32]

  400670:       d285e610        mov     x16, #0x2f30       // #12080

  400674:       a9407bfd        ldp     x29, x30, [sp]

  400678:       8b3063ff        add     sp, sp, x16

  40067c:       d65f03c0        ret


Auto-vectorization does not change the result, but as an optimization it matters: in terms of time consumed, the vectorized program is much faster at run time.

GCC can vectorize many loops, but at first I did not understand why the first loop was not vectorized. The most likely reason is that it calls rand(): a loop containing a call to a function with side effects (rand() updates hidden state on every call) generally cannot be auto-vectorized.

by msmohiti at December 08, 2017 12:49 AM

December 07, 2017

Michael Pierre

Planning For Bug Release 0.2

Ever since I fixed my first bug I have become fascinated with the fact that I can jump onto a random project and start contributing. I have always had a love for music, ever since I was a child and learned how to play piano and guitar and produce music to help feed my passion. On the other end, I also love computers and technology, so I wanted to find a middle ground between developing employable skills like software development and my hobby, music. To do this I searched all around GitHub for music-based projects that I could possibly contribute to and came across a project called APlayer. APlayer is an HTML5 music player programmed in JavaScript. It allows you to play songs locally or across the net, as well as make playlists and add cover art to songs. In my opinion it’s really cool, and after spending some time working with the raw code of the project I can see it has a lot to offer.

When I first landed on the project’s GitHub page I noticed most of the issues and pull requests were rather old and didn’t have a lot of activity, which made me worry initially. I decided to email the author of the project and ask if it was still actively running and if I could work on it to gain experience in school. To my surprise the author “DIYgod” replied in less than a day, saying that he was too busy with his full-time job now to work on the project, but he invited me to join the organization and start contributing instead. After reading the reply I got pretty excited because I’ve always wondered how to join an organization on GitHub and whether I’d ever be invited to one. It’s nice contributing to a project, but being part of the project’s organization while contributing felt a lot better, in my opinion. It gave me a sense of ownership for the bugs I was fixing and the features I was working on, even though it’s as silly as just having a little badge on your GitHub page. Nevertheless, I hope to work on more bugs and features for APlayer, and possibly branch off to other projects in the MoePlayer organization, like DPlayer, an HTML5 video player for the browser, or cPlayer, another web-based media player.

by michaelpierreblog at December 07, 2017 04:26 PM

December 06, 2017

Jiyoung Bae

Review: gprof (Profiling)

Profiling measures how much time each function consumes at execution time. With the information gained from profiling, we can make the program more efficient by rewriting the code.

gprof is the profiling tool provided by GNU. Today, I will review the basic commands for using it, and I will analyze the output in a later posting.

The steps are below.

  1. Enable profiling with the -pg option at compile time.
  2. Execute the program
  3. Find profiling data generated
  4. Run the gprof tool

Let’s have a look at an example; the code was modified from a tutorial:

#include <stdio.h>

void funcTester1(void) {
    printf("\n In funcTester1 \n");
    int i = 0;
    for (i = 0; i < 0x0fffffff; i++);
}

static void funcTester2(void) {
    printf("\n In funcTester2 \n");
    int i = 0;
    for (i = 0; i < 0x0fffffff; i++);
}

int main(void) {
    printf("\n Inside main()\n");
    int i = 0;
    for (i = 0; i < 0x00ffffff; i++);
    funcTester1();
    funcTester2();
    return 0;
}

Since the code is written to loop for a huge amount of time, there will be something to measure. I compile it with the -pg option:

$ gcc -pg gprofTest.c -o gprofTest

Profiling is enabled this time with the -pg option. Next, execute the program so that the profiling data file, gmon.out, is generated, then run gprof on it:

$ ./gprofTest
$ gprof gprofTest gmon.out > analysis.txt

To run the gprof tool, the executable (gprofTest) and the profiling data file (gmon.out) are given as arguments. We can redirect the result to a text file, in this case analysis.txt, which contains the flat profile. If we do not redirect it, the report is simply printed to standard output.

That is easy but the more important part is how to understand the result and how to improve it.


by Irene Bae at December 06, 2017 11:34 PM

Joshua Longhi


My most recent addition to my file utility repo, written in Go, was a README file. My repo can be found here. Adding a README allowed me to convey a lot of information about my project in an organized manner. READMEs are useful for open source projects because they allow people to get familiar with large code bases and look up any relevant information. My README includes instructions on how to build, test, and contribute to the project, as well as how to set up a build environment.

READMEs on GitHub have the file extension .md, which stands for Markdown. Markdown is a way to easily render information. With Markdown you can create headers, tables, links, and many more things that make the presentation of your README easier to understand. I mostly leveraged headers and code blocks to organize my README and display example code. The syntax isn’t too hard to learn and there is great documentation available on it.

by jlonghiblog at December 06, 2017 11:06 PM

Henrique Coelho

SPO600 Lab 7 - More Inline Assembly

A while ago I made two posts: SPO600 - Lab 6 - Algorithm Selection was about different algorithms for scaling digital sound, and Experimenting with Inline Assembly in C talking about how to use Inline Assembly in C. For this lab, I will join both things by making a better algorithm using Inline Assembly.

The algorithm was given to us by the professor, but I wanted to make it look more like the approaches I did before, so I modified it a bit.

Here is the logic behind it: instead of going through every single element in the array, we can use vector registers to operate on 8 elements at a time (8 elements of 16 bits each = 128 bits). To make sure the operation is as efficient as possible, we will implement this with inline assembly.

And here are the changes:

  // This variable will hold the last address of the array
  int16_t* limit;

  // Will be used for the assembler code: the volume will now be
  // stored in register 22, and a cursor for the array (for looping)
  // will be stored in register r20
  // - Question: Is there an alternate approach for this?
  // - Answer: Yes. Instead of naming the registers we are going to use,
  // we can leave it to the compiler. If we do this, the asm() instruction
  // will have to change, since we will not be able to use the name of
  // the register anymore
  register int16_t  volumeInt asm("r22");
  register int16_t *arrCursor asm("r20");

  // First and last address of the array
  arrCursor = arr;
  limit = arr + LENGTH;

Those are the variables I will need. Now, for the algorithm that goes inside the loop:

  // Here I am duplicating the volume factor in vector v1.8
  // - Question: What do you mean by "duplicating"?
  // - Answer: By duplicating, I mean that the volume factor
  // will be repeated through the vector. For example, say the
  // volume factor is "19" and the vector has 8 lanes. The vector
  // register will then be filled as "1919191919191919"
  asm("dup v1.8h, w22");

  // While we did not reach the last element of the array...
  while(arrCursor < limit) {
      asm(
          // Load eight shorts into the vector v0.8h (q0)
          "ldr q0, [x20]                  \n\t"

          // Multiply vector v0.8h by the volume factor and save
          // into v0.8h
          "sqdmulh v0.8h, v0.8h, v1.8h    \n\t"

          // Store the value of v0.8h back in the array (all 8 shorts at once)
          // and advance the cursor to the next 8 shorts (16 bytes)
          "str q0, [x20],#16              \n\t"

          // Using "arrCursor" as input/output
          : "+r"(arrCursor)

          // Specifying that "limit" is a read-only register
          : "r"(limit)
      );
  }
Another question that arises is: do we need the input/output sections in the inline assembly in this case, since we are addressing the registers directly? In theory, no, we wouldn't need it; however, if we do not specify it, the compiler will be confused: the compiler does not know what the asm procedure is doing, so it will see it as a "black box". Since it is a black box, the compiler is unaware that the controls for the loop while(arrCursor < limit) are being changed in it. In other words, the compiler will think that it is actually an infinite loop, and will replace it with a faster alternative. We explicitly tell the compiler that the arrCursor is being modified, so it will be aware that the loop is not infinite, and must be checked every iteration.

The result was slightly better than the previous implementation (which took around 0.22 seconds): the new version finishes in around 0.19 seconds. My guess is that the previous implementation had already been vectorized by the compiler, so the difference was not too significant.

Inline Assembly in mjpegtools

The next part of the lab is to pick an open source library that uses Inline Assembly and determine:

  • How much assembly-language code is present
  • Which platform(s) it is used on
  • Why it is there (what it does)
  • What happens on other platforms
  • Your opinion of the value of the assembler code VS the loss of portability/increase in complexity of the code.

The library I chose is called mjpegtools. According to the documentation:

Programs for MJPEG recording and playback and simple cut-and-paste editting and MPEG compression of audio and video under Linux.

These are the instances of Inline Assembly I found:

file utils/cpuinfo.c

// In the instructions below, the application is moving the values
// from register B to register source index, placing the bytes 0x0f
// and 0xa2, and then exchanging the contents of the two registers.
// I don't know what the instructions from those bytes are, so sadly,
// I can't really tell what these sections are doing or if they could
// be better.
// The platform is obviously x86, but the variation seems to be whether
// the platform is 64-bit or not: the 32-bit version will use the prefix "e",
// while the 64-bit version will use "r" (extended registers).
// What happens on other platforms? If I did not miss anything, there
// are no other platforms supported.

#define CPUID   ".byte 0x0f, 0xa2; "
#ifdef __x86_64__
  asm("mov %%rbx, %%rsi\n\t"
#else
  asm("mov %%ebx, %%esi\n\t"
#endif
      CPUID
#ifdef __x86_64__
      "xchg %%rsi, %%rbx\n\t"
#else
      "xchg %%esi, %%ebx\n\t"
#endif

// ...

#define RDTSC   ".byte 0x0f, 0x31; "
  asm volatile (RDTSC : "=A"(i) : );

file utils/cpu_accel.c

#ifdef HAVE_X86CPU 

/* Some miscelaneous stuff to allow checking whether SSE instructions cause
   illegal instruction errors. */

static sigjmp_buf sigill_recover;

static RETSIGTYPE sigillhandler(int sig )
{
    siglongjmp( sigill_recover, 1 );
}

typedef RETSIGTYPE (*__sig_t)(int);

static int testsseill()
{
    int illegal;
#if defined(__CYGWIN__)
    /* SSE causes a crash on CYGWIN, apparently.
       Perhaps the wrong signal is being caught or something along
       those line ;-) or maybe SSE itself won't work... */
    illegal = 1;
#else
    __sig_t old_handler = signal( SIGILL, sigillhandler);
    if( sigsetjmp( sigill_recover, 1 ) == 0 )
    {
        asm ( "movups %xmm0, %xmm0" );
        illegal = 0;
    }
    else
        illegal = 1;
    signal( SIGILL, old_handler );
#endif
    return illegal;
}

static int x86_accel (void)
{
    long eax, ebx, ecx, edx;
    int32_t AMD;
    int32_t caps;

    /* Slightly weirdified cpuid that preserves the ebx and edi required
       by gcc for PIC offset table and frame pointer */

#if defined(__LP64__) || defined(_LP64)
#  define REG_b "rbx"
#  define REG_S "rsi"
#else
#  define REG_b "ebx"
#  define REG_S "esi"
#endif

#define cpuid(op,eax,ebx,ecx,edx)    \
    asm ( "push %%"REG_b"\n" \
          "cpuid\n" \
          "mov   %%"REG_b", %%"REG_S"\n" \
          "pop   %%"REG_b"\n"  \
     : "=a" (eax),          \
       "=S" (ebx),          \
       "=c" (ecx),          \
       "=d" (edx)           \
     : "a" (op)         \
     : "cc", "edi")

    asm ("pushf\n\t"
         "pop %0\n\t"
         "mov %0,%1\n\t"
         "xor $0x200000,%0\n\t"
         "push %0\n\t"
         "popf\n\t"
         "pushf\n\t"
         "pop %0"
         : "=a" (eax),
           "=c" (ecx)
         :
         : "cc");

    if (eax == ecx)        // no cpuid
    return 0;

    cpuid (0x00000000, eax, ebx, ecx, edx);
    if (!eax)            // vendor string only
    return 0;

    AMD = (ebx == 0x68747541) && (ecx == 0x444d4163) && (edx == 0x69746e65);

    cpuid (0x00000001, eax, ebx, ecx, edx);
    if (! (edx & 0x00800000))    // no MMX
    return 0;

    caps = ACCEL_X86_MMX;
    /* If SSE capable CPU has same MMX extensions as AMD
       and then some. However, to use SSE O.S. must have signalled
       it use of FXSAVE/FXRSTOR through CR4.OSFXSR and hence FXSR (bit 24)
    */
    if ((edx & 0x02000000))
        caps = ACCEL_X86_MMX | ACCEL_X86_MMXEXT;
    if( (edx & 0x03000000) == 0x03000000 )
        /* Check whether O.S. has SSE support... has to be done with
           exception 'cos those Intel morons put the relevant bit
           in a reg that is only accesible in ring 0... doh!
        */
        if( !testsseill() )
            caps |= ACCEL_X86_SSE;

    cpuid (0x80000000, eax, ebx, ecx, edx);
    if (eax < 0x80000001)    // no extended capabilities
        return caps;

    cpuid (0x80000001, eax, ebx, ecx, edx);

    if (edx & 0x80000000)
    caps |= ACCEL_X86_3DNOW;

    if (AMD && (edx & 0x00400000))    // AMD MMX extensions
        caps |= ACCEL_X86_MMXEXT;

    return caps;

/* AltiVec optimized library for MJPEG tools MPEG-1/2 Video Encoder
 * Copyright (C) 2002  James Klicman <>
 * The altivec_detect() function has been moved here to workaround a bug in a
 * released version of GCC (3.3.3). When -maltivec and -mabi=altivec are
 * specified, the bug causes VRSAVE instructions at the beginning and end of
 * functions which do not use AltiVec. GCC 3.3.3 also lacks support for
 * '#pragma altivec_vrsave off' which would have been the preferred workaround.
 * This AltiVec detection code relies on the operating system to provide an
 * illegal instruction signal if AltiVec is not present. It is known to work
 * on Mac OS X and Linux.
 */

static sigjmp_buf jmpbuf;

static void sig_ill(int sig)
{
    siglongjmp(jmpbuf, 1);
}

int detect_altivec()
{
    volatile int detected = 0; /* volatile (modified after sigsetjmp) */
    struct sigaction act, oact;

    act.sa_handler = sig_ill;
    act.sa_flags = 0;

    if (sigaction(SIGILL, &act, &oact))
        return 0;

    if (sigsetjmp(jmpbuf, 1))
        goto noAltiVec;

    /* try to read an AltiVec register */ 

    detected = 1;

noAltiVec:
    if (sigaction(SIGILL, &oact, (struct sigaction *)0))
        return 0;

    return detected;
}

The code above is a bit cryptic, to say the least. Again, the platform is x86, and other platforms don't seem to be supported. What it does is, again, not exactly obvious, but I can guess: the Inline Assembly seems to deal a lot with vector registers and vectorization, so it probably sets up a very efficient way to apply a single instruction to multiple pieces of data (SIMD). This makes sense, since it is a library for MJPEG, dealing with big streams of data.

Since I am not exactly sure what the code does, I can't say I have a strong opinion about it. It does seem like it has a good reason to be there: enforcing vectorization. I am not sure, however, if it really needs to be written in assembly or could be left to the compiler to optimize it. I would say that the loss in portability is definitely a problem, since other architectures are becoming more popular now: it would probably be a good idea to write more cases for the other platforms, or look for an alternative solution.

by Henrique at December 06, 2017 07:28 PM

Matthew Welke

Hacking Some Assembly into C

Today I’ll be talking about hacking your C code by injecting some Assembly code into it.


Alright, maybe this isn’t really “hacking” and isn’t as revolutionary as it might seem. Turns out the GCC compilers have had this functionality for a long time now. Perhaps it even has to do with how Chris Sawyer did his thing with RollerCoaster Tycoon.

I’ve learned by now in my SPO600 class that you can get down into the nitty gritty details of a library, compiling different versions for different platforms to get as much performance as possible for each platform. You can even go beyond different C implementations and get into Assembly implementations (though you couldn’t pay me enough to be an Assembly programmer, to be honest). Now I’ve learned that you can have your cake and eat it too, if you’re willing to use a certain syntax to mix some assembly into your C program just where you need it.


Here’s a quick example. First, a C program that adds two numbers together and displays the result:

#include <stdio.h>

int main() {
    int a = 20;
    int b = 22;

    // The action is here!
    b = a + b;

    printf("The sum is %d.", b);

    return 0;
}

This may be a simple example, in fact so simple that the optimizations done for me by the C compiler may make it as fast as it can possibly be, but let’s throw in some inline assembly to replace the “action” anyways. Here’s the same C program, with the computation replaced by inline assembly (for the x86_64 architecture):

#include <stdio.h>

int main() {
    int a = 20;
    int b = 22;

    // The action is here!
    __asm__("add %1,%0"
            : "+r"(b)
            : "r"(a));

    printf("The sum is %d.", b);

    return 0;
}

In this version of the program, the adding is done by taking the contents of the second register I described (which is linked to the variable “b”), adding the contents of the first register I described (which is linked to the variable “a”), and putting that new sum into the first register I described. Why the odd order? The “asm” function requires me to describe the output registers first.

After the asm part is finished, the output of the assembly program is stored in the variables I described for it for output purposes. In this case, it means that the sum of “42” is stored in variable “b”, and just like with the C program, I can display the sum to the user by reading the contents of this variable.

Real World Usage

So where does this come into play for more real world usage? In my last lab, I examined different algorithms for processing an audio signal. I was able to get the most efficient result by using a lookup table containing the results of multiplying each possible signal value by the volume scaling factor. On the AArch64 platform, unexpectedly, this approach was slower. But in the real world, on most systems, the lookup table should be faster than doing each multiplication operation. Now, armed with the knowledge that you can inject Assembly code into C programs, let’s examine an algorithm that uses this approach to attempt to be more efficient. This code was provided by our professor. I’ve annotated the questions featured in the code comments with their answers.

// set vol_int to fixed-point representation of 0.5
// Q: should we use 32767 or 32768 in next line? why?
// A: 32767 must be used because this is the maximum value of the int16_t data type. The volume factor could go as high as 1.0. So, 1.0 * 32768 would be one too high for this data type.
vol_int = (int16_t) (0.5 * 32767.0);

printf("Scaling samples.\n");

// Q: what does it mean to "duplicate" values here?
// A: This takes the value in the w22 register and, in one operation, copies it into every 16-bit lane of the vector register v1. Filling all lanes lets a single vector multiply scale eight samples at once, which makes it more efficient.
__asm__ ("dup v1.8h,w22"); // duplicate vol_int into v1.8h
while ( in_cursor < limit ) {
    __asm__ (
    "ldr q0, [x20],#16 \n\t"
    // load eight samples into q0 (v0.8h)
    // from in_cursor, and post-increment
    // in_cursor by 16 bytes

    "sqdmulh v0.8h, v0.8h, v1.8h \n\t"
    // multiply each lane in v0 by v1*2
    // saturate results
    // store upper 16 bits of results into v0

    "str q0, [x21],#16 \n\t"
    // store eight samples to out_cursor
    // post-increment out_cursor by 16 bytes

    // Q: what happens if we remove the following
    // lines? Why?
    // A: If the following lines are removed, the constraints aren't set on the values we reference. The compiler would then have no idea that the asm block reads and updates these pointers, so it could cache, reorder, or discard values around the block, causing unintended behavior at runtime.
    : "=r"(in_cursor)
    : "r"(limit),"r"(in_cursor),"r"(out_cursor)
    );
}

Now let’s use the same value for the “number of samples” value as I used in the previous algorithms lab (500000000) and see if this implementation is faster:

Fastest old implementation on AArch64: 3.840 seconds (fixed-point arithmetic)
This implementation: 0.752 seconds

In this case, the assembly optimization proved to be much faster! It’s probably worth it to optimize small parts of your programs this way if it’s a small amount of code that isn’t likely to change over time, and where its benefit can be taken advantage of by other programs that depend on it.

Investigating Popular Software

Now let’s investigate a popular open source software project to see how they use inline Assembly code to speed up their program. For this, I’ll be looking at Blender.

The entire Blender git repo only has inline assembly code in one file called “system.c”. This is an excerpt from this source code file where they use it in one function definition:

int BLI_cpu_support_sse2(void)
{
#if defined(__x86_64__) || defined(_M_X64)
  /* x86_64 always has SSE2 instructions */
  return 1;
#elif defined(__GNUC__) && defined(i386)
  /* for GCC x86 we check cpuid */
  unsigned int d;
  __asm__(
      "pushl %%ebx\n\t"
      "cpuid\n\t"
      "popl %%ebx\n\t"
      : "=d" (d)
      : "a" (1));
  return (d & 0x04000000) != 0;
#elif (defined(_MSC_VER) && defined(_M_IX86))
  /* also check cpuid for MSVC x86 */
  unsigned int d;
  __asm {
    xor     eax, eax
    inc eax
    push ebx
    cpuid
    pop ebx
    mov d, edx
  }
  return (d & 0x04000000) != 0;
#else
  return 0;
#endif
}

The purpose here is to use the “CPUID” instruction to learn about the CPU’s capabilities. Specifically, they’re looking for the SSE2 instruction set, presumably because certain features of Blender require this instruction set. I don’t have time to investigate the whole source code base for Blender, but my assumption is that this instruction set contains instructions that help with rendering things quickly or performing other performance-sensitive work.

When it comes to programs using inline assembly code, there's always a trade-off of portability and simplicity vs. performance. Use too much inline assembly, and your code becomes less portable (are you sure your specific register instructions are supported on the CPU that ends up running this?) and the maintainers need to understand some assembly just to know what your code is doing. Use none at all, and you risk having your program not take full advantage of the hardware it runs on.

I think that in this case, the usage of inline assembly was appropriate. They need to find out what instructions the CPU supports and this is the only way to do it. It’s only a few lines in one source code file, it’s not like they’re sprinkling bits of assembly code everywhere in a disorganized manner. This particular use of assembly code is also simple to understand. As a novice, I was able to learn about it with some quick googling, so I’m sure that anybody wishing to contribute to the Blender project will not be too confused by it.


While I’ve learned that I’m definitely most comfortable with web development and high level languages, it has been fun to investigate this low level coding, seeing what little hacks you can do when you’re willing to micromanage the registers. I feel like there may be a day when I’m coding something and I think “Gee it sure would be simpler to just make some byte arrays and make my own algorithm in C to do this” and I set about making a C extension for my task. At that point, maybe I’ll have a “go big or go home” attitude and if it’s performance critical, be willing to get my hands dirty with the registers. This kind of coding may help me out in a pinch in my future career and I’m glad I’ve had a chance to see it.


by Matt at December 06, 2017 02:02 AM

Jiel Selmani

Good Documentation, Great Project

In my last post, which you can find here, I had been working on a rather simple project where I integrated testing with TravisCI, ESLint, and Jest.  Even though it was a rather easy project, I wanted to become familiar with the testing tools set out for us by my Professor/Open-Source Genius.  Today, I want to discuss the importance of great documentation versus, well, not-so-good documentation.  I'll be diving into a successful project with great docs, what its documentation does for the community that works on the project, and how it influenced me to follow suit.  If you'd like to take a look at my project's documentation, please go ahead and click here.  I am very open to ideas on how we can improve this project even further, so feel free to open an issue and contribute.

"Why is documentation so important?"

I've been asked this before when I write comments in my code for things that seem like they would be common sense, and I hear this when I respond with "it's good practice. You should do it."

I don't want to write comments, this code makes perfect sense!

Like it does for every developer, code always makes sense when you're writing it.  Simply put, you understand the problem at hand and are solving it, so of course you know how to traverse your project.  But what happens when you step away from it to solve new problems on a new project? When someone else wants to get involved?  Without proper documentation, we lose sight of what we set out to accomplish, and when we join new projects it can be quite difficult to get started without some guidance.  With all of this stated, let's move on to a successful project and its documentation.


It goes without saying, I am really intrigued by this repository.  When I first started contributing in the world of open source, I was able to get started right away and hit the ground running.  Was it daunting and rather intimidating at first? Yes, it was.  No doubt about it, but the documentation and community in this project is world-class and beyond welcoming.  You can find the debugger.html repository by clicking here


Right off the bat, you see a paragraph that describes what the project is about, the tools used, and what they have set out to accomplish.  As you continue to read through the README, it explains how to set-up the project for development locally, how to get involved and contribute, a development guide to provide insight into how you should develop for the project and outlines the style/programming guides, and finally documentation that is updated weekly!  Essentially, this project is full of guidance to help anyone learn a few things before diving in and asking questions that are likely answered there.

What does this do for the community?  First, it lets them know that the project is active and being worked on frequently.  For newcomers (like me) to React and Redux, explanations in the development guide supplement the documentation from their respective libraries, giving the developer more insight into how to traverse through the code-base.  Finally, excellent documentation shows the community that you actually want them to read it; the effort put into writing it does not go unnoticed.

So what did I do?  Well, I did what any developer would do. Emulate what people that are smarter than you are doing with regards to context and complexity and learn from them.

Extractor Docs

Writing documentation is fun.  It gives you the opportunity to explain what you did and how others can use what you created for themselves, but my favourite part is working with Markdown.  It's like developing software, except writing it entirely in readable text.  When I was writing my documentation, I enjoyed it for this reason.  I took what I learned from the debugger.html documentation and moved forward from there.

The most important part about my 'doc' was certainly how to test the project with the scripts.  I kept everything consistent throughout the document, but also provided information on what each script call does and why.  Even though there are only three tests that can be run in Extractor, the value here is that I know what each test does and can explain it back to someone who stumbles upon this repository.

Reflection and Understanding

Throughout my entire post-secondary career, my classmates and I have never really been pushed to write documentation.  We've either had to be interested in writing it for ourselves, or continue through our assignments and personal projects not understanding its importance.  With the completion of this project, I see the need for great docs.  Throughout our careers, documentation allows us to keep a history of a point in time (sounds like Git, right?) of what you were thinking and why.  Without it, you could spend hours, and even days, trying to remember why you solved a problem in that particular way.

If you truly want to be a great developer (and colleague), write documentation.  Let others know what you're thinking, and they might just come along and give you a hand.

by Jiel Selmani at December 06, 2017 01:13 AM

December 05, 2017

Joao Rodrigues

Lab 9 - Read me

Adding appropriate documentation to your code is standard practice at any company you work for, whether you are dealing with a big or even a small project. Sometimes code is not immediately apparent to others, and functions that one may think are pretty straightforward actually aren't. Setting up an environment can also be very time-consuming; having that type of information can cut down the time it takes for someone to contribute to your project. As such, I added a Readme section to my project, where labs 6/7 and 8 were. I also added comments to my functions so it is clearer to others what each function is actually doing. I was already aware of the existence of Markdown, thanks to the two terms I have spent as a co-op at two different companies, and it was fun to see how other students set up their Readme sections differently. As for the reviewing part of the lab, I did so for this project, where I created a pull request to fix some typos and to add a link to their License.

by João Rodrigues ( at December 05, 2017 03:57 PM

Saeed Mohiti

Build and Test Glibc (Lab 4 Part 2)

For the second part of Lab 4, we need to find and build the latest version of the GNU Standard C Library (glibc).

To start this lab, I used the commands provided by Chris:

$ mkdir $HOME/src

$ cd $HOME/src

$ git clone git://

$ mkdir -p $HOME/build/glibc

$ cd $HOME/build/glibc

$ $HOME/src/glibc/configure --prefix=/usr

$ make


The "configure" command sets up the package and creates the "Makefile" for my system.

The "--prefix" option tells "configure" to install glibc under a specific path.

But when I tried to configure, I got an error:

“These critical programs are missing or too old: bison”

Now, what is Bison? I started to search, and I found the answer on the official GNU page:

"Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR(1) parser tables. As an experimental feature, Bison can also generate IELR(1) or canonical LR(1) parser tables. Once you are proficient with Bison, you can use it to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages.

Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc should be able to use Bison with little trouble. You need to be fluent in C or C++ programming in order to use Bison. Java is also supported as an experimental feature. “(GNU)

I spent half a day figuring out what Bison is and how to install it, and thankfully Chris helped me solve this issue.

So then I ran ./configure again and it finished perfectly. Next I ran make, and after a long process I got:

make[2]: Leaving directory ‘/home/msmohiti/src/glibc/elf’

make[1]: Leaving directory ‘/home/msmohiti/src/glibc’

It seems it compiled successfully.

After the compilation, I wrote a simple program to test the newly installed glibc.

This code is generating random numbers between 0 and 5.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    int r;
    srand(time(NULL));   /* seed the random number generator */
    r = rand() % 6;      /* random number between 0 and 5 */
    printf("%d", r);     /* no newline, so the shell prompt follows the digit */
    return 0;
}

Let's compile it and run it with the new glibc:

gcc test.c -o test

./ ~/lab4/test

The result is :

5[msmohiti@xerxes glibc]$ ./ ~/lab4/test

5[msmohiti@xerxes glibc]$ ./ ~/lab4/test

3[msmohiti@xerxes glibc]$ ./ ~/lab4/test

And when I tested the program with the system glibc, it also worked perfectly:

5[msmohiti@xerxes lab4]$ ./test

1[msmohiti@xerxes lab4]$ ./test

1[msmohiti@xerxes lab4]$ ./test

Next, I modified my code to test glibc again; this time the program generates random numbers between 0 and 10. Then I had to run "make" again and recompile my code to test.

For testing, I used the same commands as above, and the program worked perfectly.

I haven't seen any difference between the new glibc and the system glibc.


To summarize: we made a directory to store the source code and a separate directory for compiling glibc.

We may need to install some tools, if they are missing or too old, using this command:

sudo dnf install (tool's name)

Then we test the program with both the newly installed glibc and the system glibc.

Override mechanism:

"Overriding is an object-oriented programming feature that enables a child class to provide a different implementation for a method that is already defined and/or implemented in its parent class or one of its parent classes. The overridden method in the child class should have the same name, signature, and parameters as the one in its parent class."

In simple words, we can interpret it as:

replacing or overwriting an implementation that has already been defined for previous uses and that the system supports.

Multiarch mechanism:

"Multiarch is a general mechanism for installing libraries of more than one architecture (ABI) on a system."

“Multiarch is the term being used to refer to the capability of a system to install and run applications of multiple different binary targets on the same system. For example running a i386-linux-gnu application on an amd64-linux-gnu system. This example is the most common case, but many other working combinations are possible, such as armel and armhf.”

by msmohiti at December 05, 2017 03:22 PM

Mat Babol

Setting up a complete CI Solution for a project

With this lab, I continued working on the library that I created. Although my library was originally pure JavaScript/HTML, I changed it over to Node for some new experience. This required me to modify some of my functions.

This lab has taught me how to set up a continuous integration environment. In simpler words, continuous integration is a process of automating the build and testing of code for every new commit. My first experience with automated code testing came when I pushed my first bug fix: I received a message saying my code wasn't accepted because the Travis CI build failed. This time, I got to set up my entire build and test process.

There are a few files that are needed for a continuous integration process. The first thing needed was a way to tell Travis CI how to configure, build, and test my project. To do this, I created a .travis.yml file in my root folder, which holds the language used as well as the appropriate versions. Next, I used the package.json file to automate my scripts, my dependencies, and other information. I love the ability to create these automated scripts. Take the install script, for example: if I were to continue writing this code on a new machine, or if a colleague wanted to contribute, they would need to run all of these commands to get the same working environment as mine. This script automates the entire process, allowing a developer on another machine to focus on the code and not on how to properly set up the environment. This way, there's no need to upload all my extra node_modules; the script will automatically install everything. That's absolutely fantastic!

Now onto creating automated tests. I created a separate directory for my test cases. I had some trouble understanding how to write these test cases: how do they work, do they need to be in the same directory as the source files, and so on. My main problem was that I had all my functions in one main file, file-info.js; once I split the functions up into their own files, the test cases started working properly. In no time I had all my tests passing.

My current progress on the library can be found here. For the next lab, I will be learning different documenting techniques for an open source project.

by Mat Babol ( at December 05, 2017 05:36 AM

Dan Epstein

The Importance of Documenting Open Source Projects

Why to Document?

Documenting is important because it allows your readers to understand how to use your project and, in general, provides useful information relating to it. This week I was documenting my Ruby lib functions on GitHub. Documentation on GitHub is usually in .md format, which is Markdown. I had heard about Markdown before, as most open source projects contain at least one Markdown file. Markdown is very easy to learn, and there are useful tutorials and even online editors to help you get started with it.

Markdown Syntax Example

This week a classmate of mine contributed to my project - special thanks to Sean Prashad. Documenting and writing comments for your code is the start of creating a great project. Think about it: if you don't document anything, how could someone else understand it? How will YOU understand it if you come back to it after a long time away, without any instructions or guides? Therefore, always document your source!

by Dan at December 05, 2017 02:42 AM

December 04, 2017

Svitlana Galianova

Proper documentation is a key to success

This week in my Open Source Development class we were discussing a topic of Documentation.

When (almost) any programmer hears the word "documentation", the first thought is pain.

When you start documenting a production-ready project, you know that the next days/weeks are going to be hell.

It would have been nice to start documenting things from the beginning, but you didn't... You were too excited with coding and solving more significant (as you think) issues.

I have been there. If you are working on your own, the only documentation you might have is comments explaining some hard-to-follow places in your code, and maybe that's fine for now. But what if you come back to this project in a few weeks or even months? You will not remember which component does what.

I have a small rule I follow on every new project: I try to put in meaningful comments, so that later on I can reuse the code without spending a long time figuring out what a function does.

I also like to start working on the README file as soon as I set up the development environment. It's much easier to document things you are currently experiencing: paste in some links/commands to download all the dependencies, or put in a note about an issue you hit and how you overcame it.

Proper documentation will definitely be appreciated by your team, your boss, contributors and even yourself.

So if the question is "do I need documentation?", the answer is always "YES!!!", even if it's a small project living on your local machine.

Tips and useful links:
  • Use this website to get started with Markdown (it stores cookies; if you want to start everything from scratch again, either clear those or access the website in incognito mode)
  • Gitlab Markdown cheat sheet

by svitlana.galianova ( at December 04, 2017 07:00 PM

Sean Prashad

A Programmer's Nightmare: Documentation

"I'm in school to write code, not author it (let alone writing blog posts)..."

Sadly, this is something that a lot of Computer Science students tend to think when an assignment's instructions state "document your code".

As someone who has dipped both feet into the work industry for a bit, I definitely understand why proper documentation is necessary; however, I don't see it happening often enough at Seneca. This is very apparent with any student who sits down with me for tutoring at the Learning Centre, and as such, there's one thing that I drive home:

"Comment what was necessary for resolving your bug(s) and/or any new things that you have learned with me"

Most students exclaim "Yes! Great idea Sean!" then hastily try to backtrack to the respective files within their IDE, leaving me wondering why they have not picked up on the usefulness of writing comments or documenting their code during their time here at Seneca.

And so our Professor and Mozillian, David Humphrey, recently bestowed upon us the task of learning Markdown to properly document one of our previous labs (mine being Fiffy).

Where to Start?

Properly documenting my own project was definitely a new experience, and one which I actually liked... I got the chance to show off what my code was capable of: that unit tests had been written and, more importantly, were reported as successful by TravisCI. Before doing so, I had to search for an existing repo that was doing it right and learn how to write my own. For that, I found that my fellow classmate, Marco Beltempo, had gotten off to a great start on his own project, and as such, I leveraged a similar style going forward (if you're reading this, thanks Marco!). In addition, commenting the source code was somewhat challenging simply because I had to be succinct, but only to the point where I provided enough insight - I think that balance will come with time and experience.

Magnificent Markdown

Thankfully, I had previous experience with Markdown, which helped me feel right at home. The hardest part of writing my README file was ensuring that formatting and consistency were properly maintained. Luckily, the IDE I was using (Atom) has a built-in Markdown previewer. See it in action below:

Atom Markdown Preview
Utilizing Atom's Markdown preview interface!

Sharing Fiffy with the World

With my library newly documented, I thought to myself: "Hey, why don't I try to publish it so that anyone writing Rust can use it?". I remembered that I had actually contributed to a project that hosted "Crates" (or libraries, if you will) for Rust, and returned to

Returning to
Look at which crate was just updated!

Following the steps outlined here, I was able to publish Fiffy publicly:

Fiffy published to
World: Meet Fiffy

Understanding The Need

I have to admit, publishing my own crate was pretty awesome! Knowing that another developer, experienced or not, could come along and integrate my library into their project was eye-opening. Great documentation is reassuring to anyone using your code, as it shows respect for others' time and adherence to a common standard in modern-day open source.

Furthermore, I even had another classmate, Dan Epstein, help improve Fiffy! I completely forgot the best thing about great documentation - it attracts others to do your work for you :)

December 04, 2017 03:30 PM

Lucas Verbeke


In this lab we had to make an array of 500 million 16-bit integers and then scale it by 0.75 using three different methods.

The first method


This method takes each sample and multiplies it by the scaling factor; it is the simplest way of doing it.

Second method


This method creates a table with all possible outcomes and then looks each sample up in it instead of doing the calculation every time. It is faster than the first method as long as the number of samples is larger than the number of possible outcomes.

Third Method


The last method uses a bit shift to perform the operation; it is faster than either of the other two methods regardless of size.


In conclusion, the first method is generally the slowest because performing the multiplication over and over again consumes a lot of resources. The second method can be quicker than the first if there are enough samples to offset the extra time taken to create the lookup table. The third method is simply the quickest all around.

by thelucasexcerpt at December 04, 2017 02:42 AM

Michael Pierre

Learning Markdown and Forming a Proper

On most pages I’ve come across on GitHub, I usually find really well-written and well-constructed READMEs (example) that outline and describe exactly how to use the application. I hadn’t really thought about creating an intricate README like those, and stuck to either the old-fashioned README.txt in the root of an application, or used Microsoft Word to write instructions on how the application works and functions. By using GitHub more and more, it has become clear to me that having a is almost critical to helping a potential user/contributor understand the application, and to keeping them from just clicking away when all they see is the code on your repository page. Spending a lot of time on a repository and then having no users benefit from it isn’t the best of things, especially for a software developer, so I had to learn how to create a README like the ones I’ve been seeing all across GitHub.

The first step was learning how to write markdown, which is a lightweight markup language with plain text formatting syntax. It is used a lot on online forums and to create rich text using a plain text editor. My first question about markdown was why not just use HTML, since it has been around forever and a lot of software developers already know it to a degree. When I asked my professor about this, his exact words were “it’s a lot less dense than HTML, much closer to pure writing.” Now, looking back after learning a decent amount about markdown, I can agree. For example, in markdown, to show an image on the page all you have to write is “![alt text](link/to/image)”, which is a lot easier than the HTML equivalent “<img src=’…’ alt=’…’ width=’…’ height=’…’>”. For the average user it is a lot easier to learn markdown than to write HTML every time you want to post in a discussion forum or make changes to a README. GitHub has a great article on how to master markdown, which helped me a lot with the process of creating my README (link). Since the repository for my README doesn’t hold a lot of code and is more for learning about the functionality of git and GitHub, there wasn’t a lot I could write; however, it did help me develop a good structure for a README, as well as make a, which holds the guidelines for contributing to a repository. To write the markdown for the README I used Visual Studio Code, which offers a feature to preview your markdown while you write it. That really helps, instead of constantly committing new changes to the README like I was originally doing. Having created a proper README, I will most likely do the same with my other repositories, which currently hold just .doc files on how to use them that might not be visible to the average user visiting a repository on GitHub.

by michaelpierreblog at December 04, 2017 12:33 AM

December 03, 2017

Saeed Mohiti

Code Building Lab (Lab 4)

In this lab we’ll try to build an open source package from Free Software Foundation’s GNU Project.

I decided to choose “Gcal”, this program can calculate and print calendars. Let’s go through this simple software package.

The operating system we are currently running already has basic “Gcal” functionality in it, but this package has many extra features, such as displaying eternal holiday lists for many countries around the globe, a very powerful facility for creating fixed date lists that can be used for reminders, and calculating various astronomical data and times of the Sun and the Moon for any location, precisely enough for most civil purposes. In addition, “Gcal” supports some other calendar systems, for example the Chinese and Japanese calendars, the Hebrew calendar, and the civil Islamic calendar, too.

Let’s start to build this package.

I downloaded the latest version (4.1) from one of the links provided on the official website, directly into the current directory, with this command:


then I tried to unzip the file with:

tar -xzf gcal-4.1.tar.gz

Inside the extracted directory were a bunch of files, and I opened the README text file to find out how to install this package. There was nothing related there, but it directed me to the INSTALL text file.

Actually, the process of installation is very simple and easy, but it takes quite a while.

To install the software, the first step is:

I ran the ‘./configure’ command to configure the package for my system.

This command took a long time; a long list of statements scrolled down the screen, with “yes” at the end of each one, showing that something had been successfully checked.

and at the very end I got this funny message:

Bored? Fallen asleep??   All checks done!

Perhaps you might want to inspect the created

Makefiles now for tuning some settings…


So, as it asked I ran the ‘make’ command to compile the package

This step is optional, but to make sure, I ran ‘make check’ to run the self-tests that come with the package, generally using the just-built, uninstalled binaries.

It was a little confusing figuring out how to run this software, but I finally got it: there is a folder named “src”, so I ran

‘cd src’ and then started “Gcal” with the ‘./gcal’ command.


This is the result:

    December 2017

 Su Mo Tu We Th Fr Sa

                 1  2

 $<2> 3$<2> 4  5  6  7  8  9

 10 11 12 13 14 15 16

 17 18 19 20 21 22 23

 24 25 26 27 28 29 30


I started to read the documentation for more options; it was interesting to find so many of them.

Let’s have a look at some of the options:



This option displays all legal holidays and all further memorial days.

./gcal -n

Eternal holiday list:        The year 2017 is NO leap year

It seems I need more options to see some acceptable results:

./gcal -n --christian-holidays


Eternal holiday list:                      The year 2017 is NO leap year

Mary – Blessed Virgin (Chr)              – Sun,  Jan  1st 2017 = -336 days

Epiphany/Three King’s Day (Chr)          – Fri,  Jan  6th 2017 = -331 days

Mary’s Candlemas (Chr)                   – Thu,  Feb  2nd 2017 = -304 days

Septuagesima Sunday (Chr)                – Sun,  Feb 12th 2017 = -294 days

St Valentine’s Day (Chr)                 – Tue,  Feb 14th 2017 = -292 days

Sexagesima Sunday (Chr)                  – Sun,  Feb 19th 2017 = -287 days

Quinquagesima Sunday (Chr)               – Sun,  Feb 26th 2017 = -280 days

Ash Wednesday (Chr)                      – Wed,  Mar  1st 2017 = -277 days

1st Sunday in Lent (Chr)                 – Sun,  Mar  5th 2017 = -273 days

2nd Sunday in Lent (Chr)                 – Sun,  Mar 12th 2017 = -266 days

3rd Sunday in Lent (Chr)                 – Sun,  Mar 19th 2017 = -259 days

St Joseph’s Day (Chr)                    – Sun,  Mar 19th 2017 = -259 days

Mary’s Annunciation Day (Chr)            – Sat,  Mar 25th 2017 = -253 days

4th Sunday in Lent (Chr)                 – Sun,  Mar 26th 2017 = -252 days

Passion Sunday (Chr)                     – Sun,  Apr  2nd 2017 = -245 days

Palm Sunday (Chr)                        – Sun,  Apr  9th 2017 = -238 days

Good Friday (Chr)                        – Fri,  Apr 14th 2017 = -233 days

Good Saturday/Easter Eve (Chr)           – Sat,  Apr 15th 2017 = -232 days

Easter Sunday (Chr)                      – Sun,  Apr 16th 2017 = -231 days

Easter Monday (Chr)                      – Mon,  Apr 17th 2017 = -230 days

Rogation Sunday (Chr)                    – Sun,  May 21st 2017 = -196 days

Christ’s Ascension Day (Chr)             – Thu,  May 25th 2017 = -192 days

Whitsunday/Pentecost (Chr)               – Sun,  Jun  4th 2017 = -182 days

Whit Monday (Chr)                        – Mon,  Jun  5th 2017 = -181 days

Holy Trinity (Chr)                       – Sun,  Jun 11th 2017 = -175 days

Feast of Corpus Christi (Chr)            – Thu,  Jun 15th 2017 = -171 days

Feast of Heart Jesus (Chr)               – Fri,  Jun 23rd 2017 = -163 days

St John’s/Midsummer Day (Chr)            – Sat,  Jun 24th 2017 = -162 days

St Peter and St Paul (Chr)               – Thu,  Jun 29th 2017 = -157 days

Mary’s Visitation (Chr)                  – Sun,  Jul  2nd 2017 = -154 days

St Laurentius Day (Chr)                  – Thu,  Aug 10th 2017 = -115 days

Mary’s Ascension Day (Chr)               – Tue,  Aug 15th 2017 = -110 days

St Bartholomew Day (Chr)                 – Thu,  Aug 24th 2017 = -101 days

Mary’s Nativity (Chr)                    – Fri,  Sep  8th 2017 =  -86 days

Mary’s Name (Chr)                        – Tue,  Sep 12th 2017 =  -82 days

Mary’s Maternity (Chr)                   – Wed,  Oct 11th 2017 =  -53 days

Reformation Day (Chr)                    – Tue,  Oct 31st 2017 =  -33 days

All Saints’ Day (Chr)                    – Wed,  Nov  1st 2017 =  -32 days

All Souls’ Day (Chr)                     – Thu,  Nov  2nd 2017 =  -31 days

Martinimas (Chr)                         – Sat,  Nov 11th 2017 =  -22 days

Mary’s Sacrifice (Chr)                   – Tue,  Nov 21st 2017 =  -12 days

St Andrew’s Day (Chr)                    – Thu,  Nov 30th 2017 =   -3 days

1st Advent (Chr)                         – Sun,  Dec $<2> 3rd$<2> 2017

St Nicholas’ Day (Chr)                   – Wed,  Dec  6th 2017 =   +3 days

Mary’s Immaculate Conception (Chr)       – Fri,  Dec  8th 2017 =   +5 days

2nd Advent (Chr)                         – Sun,  Dec 10th 2017 =   +7 days

3rd Advent (Chr)                         – Sun,  Dec 17th 2017 =  +14 days

Mary’s Expectation (Chr)                 – Mon,  Dec 18th 2017 =  +15 days

4th Advent (Chr)                         – Sun,  Dec 24th 2017 =  +21 days

Christmas Eve (Chr)                      – Sun,  Dec 24th 2017 =  +21 days

Christmas Day (Chr)                      – Mon,  Dec 25th 2017 =  +22 days

Boxing Day (Chr)                         – Tue,  Dec 26th 2017 =  +23 days

Sylvester/New Year’s Eve (Chr)           – Sun,  Dec 31st 2017 =  +28 days


As I mentioned above, the process of building was very simple, just a bit slow. It also required some searching and browsing through the documentation for the available options and commands.

In the second part I must build and test the GNU C library.

by msmohiti at December 03, 2017 07:07 PM

Eric Schvartzman

Open Source Release 0.1

After spending a few weeks learning about open source technology and how to contribute to an open source project, I was finally able to merge a commit to Mozilla's Thimble project. There was a bit of a learning curve with regard to setting up Thimble so that it works in a local environment. Before any code could be written, I had to install the following dependencies:
  • Node.js (version 6.11.1 or later)
  • Brackets (Bramble)
  • Virtualbox (version 5.1 or later)
  • Vagrant (version 1.9 or later)
Thimble works tightly with Brackets, which is an open-source code editor for HTML, CSS and JavaScript. To install Brackets, you first clone the Brackets repository onto a local machine and install all of its dependencies using npm. The next thing to install is Vagrant, a command-line utility for managing the life cycle of virtual machines. Lastly, you need VirtualBox in order to run the virtual machine.

The next bug

The merged commit was related to issue #879, which dealt with a design modification of the inline editor's close button icon. The next bug I am currently working on is focused on stopping the live dev preview in Thimble from reloading when importing files. The goal is to wait until all of the files have completed the import process, and only then reload the live dev preview.

by Eric S ( at December 03, 2017 03:42 PM

Azusa Shimazaki

Algorithm Selection Lab

For the SPO600 Lab 6, I was asked to simulate volume adjustment of digital audio data.

Digital audio samples are represented as signed 16-bit integers, and their volume is changed by a volume factor in the range 0.0000 to 1.0000.

I was given the following tasks:
  A. Create a large (500M?) array of int16_t numbers to represent sound samples.
  B. Scale each sample by the volume factor (0.75). Store the results into the original array or into a separate result array.
  C. Sum the results and display the total (just to keep the optimizer from eliminating the scaling code).
  D. Determine the time taken for step B of each approach. You can add instrumentation to your program or you can use the 'time' command.

I was also asked to try different approaches for B:
  1. Multiply each sample by the floating point volume factor 0.75
  2. Pre-calculate a lookup table (array) of all possible sample values multiplied by the volume factor, and look up each sample in that table to get the scaled values.
  3. Convert the volume factor 0.75 to a fix-point integer by multiplying by a binary number representing a fixed-point value "1". For example, you could use 0b100000000 (= 256 in decimal). Shift the result to the right the required number of bits after the multiplication (>>8 if you're using 256 as the multiplier).

To start, I used Visual Studio 2015 (so I had to use "__int16" instead of "int16_t"), since I'm used to it, and I wrote the following code to meet the requirements.

Method 1: Multiply each sample by the floating point volume factor 0.75.
 ... This is the basic approach.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {

    int size = 50000000;
    __int16* array;
    array = calloc(size, sizeof(__int16));
    int random;
    float sum = 0.0;           // note: a float accumulator loses precision over 50M adds

    clock_t start, end;
    start = clock();

    for (int i = 0; i < size; i++) {
        random = rand() % 65536 - 32768;          // sample in -32768 .. 32767
        float scaledNum = (float)random * 0.75f;  // scale by the volume factor
        array[i] = (__int16)scaledNum;
        sum += array[i];
    }
    end = clock();

    /* clock() returns ticks; convert to milliseconds portably */
    printf("process time millisecond: %f \n\n",
           (float)(end - start) * 1000.0f / CLOCKS_PER_SEC);

    printf("Sum: %f \n", sum);

    return 0;
}


Method 2: Pre-calculate a lookup table (array) of all possible sample values multiplied by the volume factor, and look up each sample in that table to get the scaled values.
... It is like a hash table. It might be fast.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {

    const int range = 65536;   // size of the int16 range, both halves combined
    float volumeValue[65536];  // all possible scaled values for factor 0.75
    int num = -32768;          // minimum sample value, the base of the map

    int size = 50000000;       // total sound length
    int* random;
    random = (int*)calloc(size, sizeof(int));
    __int16* scaled;
    scaled = (__int16*)calloc(size, sizeof(__int16));
    float sum = 0.0;           // note: a float accumulator loses precision over 50M adds

    clock_t start, end;

    // make a map: store the scaled value for indexes 0 to 65535
    for (int i = 0; i < range; i++) {
        volumeValue[i] = (num + i) * .75f;
    }

    // generate a sample sound - random numbers from -32768 to +32767
    for (int i = 0; i < size; i++) {
        random[i] = rand() % 65536 - 32768;
    }

    start = clock();

    // change volume - look each sample up in the map
    for (int i = 0; i < size; i++) {
        scaled[i] = (__int16)volumeValue[random[i] + 32768];
        sum += scaled[i];
    }

    end = clock();

    printf("process time millisecond: %f \n\n",
           (float)(end - start) * 1000.0f / CLOCKS_PER_SEC);

    printf("Sum: %f \n", sum);

    return 0;
}


Method 3: Convert the volume factor 0.75 to a fixed-point integer by multiplying by a binary number representing a fixed-point value "1". For example, you could use 0b100000000 (= 256 in decimal). Shift the result to the right the required number of bits after the multiplication (>>8 if you're using 256 as the multiplier).

...The first time, I got a weird result: the total sum was a positive value, although the other two were negative. The reason is that I had set the result holder, "scaledNum", to "unsigned int" in order to use bit shifting.
It is no wonder the total sum was positive!

But how can I keep the sign for "scaledNum"? It seems I cannot use a normal int....
Oh, how about using __int16? Bit shift... no error. It must be good!
The result was....  

Yahoo! I got a negative number !
But it's different with other two numbers...
Ah! By any chance... the head of numbers was cut by overflow?
Then.. how about to use __int32?

Good! Now the result is same!
the final code is below!
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {

    int size = 50000000;
    __int16* array;
    array = (__int16*)calloc(size, sizeof(__int16));
    int random;
    float sum = 0.0;
    __int32 scaledNum = 0;

    clock_t start, end;
    start = clock();
    int factor = 0.75 * 256; // fixed-point "0.75" with 256 as the multiplier
    for (int i = 0; i < size; i++) {
        random = rand() % 65536 - 32768;
        scaledNum = random * factor;
        array[i] = scaledNum >> 8;
        sum += array[i];
    }
    end = clock();

    printf("process time in milliseconds: %f \n\n", (float)(end - start) * 1000 / CLOCKS_PER_SEC);

    printf("Sum: %f \n", sum);

    return 0;
}

Ok, then, it's time to try it on x86!
I changed "__int16" and "__int32" to "int16_t" and "int32_t".
I compiled and ran the three files.
The result was.... 


The sum results are different.... why?
And the speed balance of the three is totally different compared with Visual Studio!

Ummm, how about running on AArch64?

...Also the same situation: a different sum result.

Oh, is it because the environment is Linux??
Let's print the first 10 results of each.

Apparently, there are small gaps in the first digit.
I guess there is a rounding (or cast?) problem...

Ok, then let's fix the small gap later.
For now, let's compare the speed of each!

Here is the result

Surprisingly, each approach changes the timing differently depending on the environment.
"test2.c" - the map approach - is much faster than the other two, but on x86_64 without optimization there is no difference from "test1.c" - the normal one.
"test3.c" - the bitshift approach - is not as fast as I was expecting. Maybe there is a problem in my code??
Somehow on Visual Studio the programs run super fast, while on AArch64, much slower.
Also, the second approach stands out for speed on AArch64. It's around 5 times faster than the others, and on Visual Studio, 9-10 times faster!
I will definitely choose the map approach to handle the sound control!

by Az Smith ( at December 03, 2017 05:50 AM

December 02, 2017

Saeed Mohiti

YUM, an open source command line tool



What is YUM?

Yum stands for "Yellowdog Updater, Modified". It is an open-source command line tool for package management used in Linux.

This command line tool is compatible with Linux operating systems that support RPM, where the RPM structure is predefined. The main point of designing and creating yum was to find and install the dependencies of packages in Linux software, which are very hard to track down with RPM alone.

For example, when the user encounters RPM dependency hell, this issue can be solved by yum.

This powerful and valuable tool can manage enormous systems without using any RPM commands, in a very simple and fast way.

By using yum, all the packages are stored under the /var/cache/yum path of the root directory.

Let's go back and review the up2date command line tool. It was like yum and was used for updating RPM packages in Linux operating systems; it has since been dropped and can only be found in old versions and distributions. up2date was mostly used on Red Hat Enterprise Linux, CentOS, and Fedora Core. The last release to use up2date was RHEL 4.5, and after that it was no longer supported in distributions.

In other words, yum has completely replaced up2date. The reason for this replacement was the complexity of the up2date syntax.

Now let's compare some of the syntax differences between yum and up2date:

These commands are for updating the RPM packages:

#yum update

#up2date –u

When we want to install a package using yum:

#yum install

And the same installation with up2date is:

#up2date -i

This command with yum can remove a package with all its dependencies:

#yum remove

And the same task with up2date:

#up2date -e

Let's go a little bit further. If we want to update a group of packages using the yum command:

#yum groupupdate

And this is a group update using the up2date command:

#up2date -u @

And the last example for group install with yum:

#yum groupinstall

Guess what up2date uses?

#up2date -i@

I don't want to list all the yum and up2date syntaxes and commands, but as you can see from the commands above, the up2date syntax was more complicated. It is obvious that the yum syntax is much easier to understand, and that was enough reason to replace up2date with yum.





by msmohiti at December 02, 2017 05:49 AM

Jiyoung Bae

Review: Inline Assembly Language

Inline assembly language is embedded within a program written in a higher-level language such as C or Ada. Its advantages include optimization, by performing better on critical algorithms; access to processor-specific instructions, such as those that implement multitasking; and system calls, since higher-level languages do not make direct operating system calls and inline assembly can be used to write wrapper functions for them.

The syntax of inline assembly language is simple. Both asm and __asm__ are acceptable, and a colon (:) is used as a delimiter between the assembler template, the outputs, the inputs, and the clobber list.

asm();   // basic format
int x=10, y;
asm("mov %1, %0" : "=r"(y) : "r"(x) : ); 

To understand the example above, we should at least know that the assignment operator (=) in a constraint marks an output register while an input register goes without it, that the percent sign (%) with a number refers to an operand (so the same number always means the same register), and that the actual registers are allocated by the compiler's decision for the best patterns.

So, in this example, the input value of x, which is 10, is allocated to the register called %1 in the code, the value moves to the register %0, and then the result, which is 10, is returned to y as the output.

To have a specific register allocated, we can declare it like below. Here, the register used is specified as r15.

int x=10;
register int y asm("r15");
asm("mov %1, %0; inc %%r15;" : "=r"(y) : "r"(x) : );

Additionally, we can list clobbered registers in the last part, so we can avoid register conflicts when we mean to overwrite the values in those registers.

int x=10;
register int y asm("r15");
asm("mov %1, %0; inc %%r15;" : "=r"(y) : "r"(x) : "r20");

It is safer to use the keyword volatile so that the asm statement is not moved or optimized away by the compiler.

After this review, I have learned that inline assembly language is not that difficult to understand, and it is very helpful for making my C code more efficient. Also, I believe that I am able to perform lab 7: to examine the source code provided and then analyze another assembler.

by Irene Bae at December 02, 2017 04:05 AM

December 01, 2017

Jiel Selmani

Integrating Travis With My Express Application, And How You Can Too

Back in October, I was introduced to the concept of continuous integration, but I never really understood its importance.  Well, after adding Travis to my file-extraction project, I see its importance vividly.  Continuous integration is something that all developers should be able to comfortably use; it can make or break a project during merges. Like many, when we introduce ourselves to something new, we often walk on eggshells and hope not to break anything along the way; the only problem here is, Travis expects you to run a series of tests so that if something is broken, you know not to merge it.  My advice is to just go crazy and enjoy the process of turning your beloved project into a testing war-zone.

Where Do We Start?

Good question.  First and foremost, you want to head to Travis and create your account.  Naturally, it utilizes your GitHub credentials to log you in, sync your repositories, and give you the opportunity to enable Travis CI integration with the GitHub repositories of your choice.  It's a simple process to get that piece set up, and now comes the fun part (it's not hard).  Toggling is as simple as this GIF:


Next up, we need to set up our .travis.yml file in our project directory. No, the period before travis isn't a typo. If you don't follow that format, Travis won't recognize what you're doing and the build will consequently fail. The contents of my file looked like the following image:


What this file specifies is how Travis will interpret what we tell it to do for our tests.  For this application,  we are telling Travis we are using Node.js (specifically version 8) however we could test for multiple versions should we specify to.  We also tell Travis to not give Super User permissions to the virtual machine to run our tests, and we finally tell it to cache our node_modules directory as it rarely changes and allows us to speed up the builds.  So far, so good, now let's get to testing.
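Based on that description, a minimal .travis.yml along those lines might look like this (a sketch, not the exact file from the image):

```yaml
language: node_js
node_js:
  - "8"        # test against Node.js 8 only
sudo: false    # no superuser permissions; use the container-based environment
cache:
  directories:
    - node_modules  # rarely changes, so caching it speeds up builds
```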

ESLint & Jest

To get started, I decided that I would start with linting my project to make sure I am following industry best practices.  To get going with the ESLint node module, head over to their Getting Started tutorial.  They have great documentation that will help you get through the initialization stage.  Before we continue to configuration, we also need to install Jest.  Jest is a JavaScript testing module that allows you to test your code based on tests you write and requires almost no configuration.  Essentially, it lets you make sure you have a healthy code-base right from the get-go (keep reading to find out more on how to use it).  Jest is a development dependency, so make sure you install it as such from the command line.  Lastly, ESLint requires the Jest plugin to run properly, which is also a development dependency; install that too.  Ok, let's move on to configuring ESLint by checking out the following image:


To explain what I've done here, I decided to make my .eslintrc.js a JavaScript file. You do have options, but that was my choice. Essentially, I am telling ESLint to make sure that it tests for ES6 syntax, Node.js global variables and Node.js scoping, and that our project utilizes Jest, so don't throw undefined errors for functions that are used but never defined elsewhere in the project.

We also tell ESLint that we are using the Jest plugin, and then move on to the rules to ensure proper formatting based on our project's style. We check for indentation of 2 spaces, single quotes rather than double, the use of semi-colons, and we disallow use of the console. On the right-hand side of this image, we also tell ESLint which files to ignore (exactly like .gitignore).

Almost there

Next up, we need to make sure that our tests will run both locally and remotely on Travis, meaning we have to set up our package.json file to have npm run our tests properly when called. Take a look at the image below for more clarification:


I don't think you needed to see the whole thing, but just so you have an idea of my project's structure, I decided to have you see it. As you can tell, I have several script definitions that perform different tasks. Of the six, four of them are relevant to us.

When we run npm run lint, npm run lint-fix, npm run jest, and npm run test, we want to ensure that our scripts run the way they are intended to. Let's start with npm run lint.


As you can tell, the script runs as intended, and returns back to the command line without showing any errors. Great! The project is running the way it should so far. When running npm run lint-fix, essentially ESLint will go through your project and attempt to fix any issues you have in your styling based on the rules you provided in your .eslintrc.js file. Up next, we would like to run npm run jest, but we need to create tests first.

hashContentsFile.PNG hashContentsTest.PNG

I had already created a function that took the digest type of the hash and the data that was required to be hashed and used it in my function, but I wanted to make sure that it always returned the same value for both SHA-1 and MD5 hashing. The second image is where I did just that. I exported the hashContents function as a module to be imported for the test, and then ran it with test cases that made sure the hash value was the same.

What's important here is that if the file contents change by even ONE character, the hash value would fail and tests would not pass. Additionally, if the file was moved to another directory or it could not be opened, the tests would also fail. By running these tests, we ensure the integrity of our code-base. This isn't necessarily the best example, but it does reinforce the concept of testing for appropriate input and output.

The Home Stretch

Alright, let's put our freshly written test to the test. I ran npm run jest...


I could also run npm run test, but decided to let Travis do that. I saved all my files and decided to push everything up to GitHub. I compared everything and performed a Pull Request. After configuring everything properly, I was happy to see that everything passed correctly. You can see the image below:


Reflecting On This Process

For any developer, learning about testing and continuous integration is crucial to making dreams of large, fantastic projects possible and healthy.  I had a great time figuring this out by reading through documentation, asking questions, and of course, Google searches.  Working with tools like this also looks great on a resume, and even better, gives you an opportunity to talk about your experience working with them.  As I continue building and working on new-found ideas and projects, Travis will be my go-to friend... when everything passes.

by Jiel Selmani at December 01, 2017 08:10 PM

Ray Gervais

When Tweaking Doesn’t Fit Anymore

When I was in high school, I remember spending every moment I could on XDA, Reddit, and various other Android tweak-centric mediums, emulating such tweaks and 'optimizations' on my device during breaks.

Throughout most of college, I had done the same, to the degree where I would often end up with a completely new ROM and setup at the end of each month, with minimal effort put toward my homework or social life. It was a mix of utter freedom, similar to driving on an empty highway, and self-inflicted chaos which can only be described as 'Russian roulette with a single player: you'. Still, it was fantastic, until my addiction to tweaking led to two phones being hard-bricked, and the last straw was my device not being able to display the correct time, fetch new emails, or use Bluetooth headphones without causing a spike.

With that, impulse led to a transition to an iPhone 6S Plus straight from Apple. This would, in consequence, reduce what I was able to change on the phone tenfold unless I jailbroke it – which I promised myself I wouldn't do. My daily driver was the average iOS user's daily driver, Facebook and Twitter included.

Jumping forward two years, I decided I wanted to see what Android 8.0 Oreo on a Pixel 2 XL was like, and after establishing that the display wouldn't be the leading factor in regretting said purchase, I learned an interesting fact about myself: I didn't feel any wish to tweak every square inch of the device after configuring it.

Instead, I found myself going for the minimalistic setup that I had always used on an OS where possible (inspired by an article I'd read), which heavily implied a blank canvas without widgets or text, instead just your dock icons and wallpaper. To me, this made much more sense than a screen full of icons ala iOS, or differently styled widgets ala Android. My OCD appreciates the aesthetic.

Perhaps this is from my two years exposed exclusively to iOS, building up the perpetual 'it just works' mantra throughout its usage. Or it could be the maturing of both operating systems compared to previous experience, lending to much more reserved temptations to 'fix' or replace items which annoy me. Realistically, the most common tweaks I used to focus on were the following:

Unified System Theme

Google's introduction of Material Design was an utter mess on Android. Popular applications updated months behind, some only being updated recently. This created quite the dissociation between applications, resulting in a horrible experience and driving me to discover the CyanogenMod / LineageOS theme engine. This engine allowed for system-wide theming, which was utter bliss once a theme was found through the Play Store or XDA forums.

On Android O, or even iOS 11, I would have loved a dark theme built in by default. But alas, no such luck, aside from small 'hacks' or 'tricks' to invert the entire display. Not the best effort, but something nonetheless. While playing with the Pixel, I still yearn for a dark theme to utilize the P-OLED technology, but it's not the same priority it was in the past.

Optimizing CPU / GPU Performance

I am a product of the generation whose entire life has seen the performance increases of the yearly iPhone releases, and envied just how smooth iOS was for the everyday user. This envy derived from Android's early lack of optimizations (which began to be addressed with Project Butter), or its inherent lack of cohesion with the hardware. Indeed, the flaw of open hardware became clear, but that didn't mean a silly high schooler couldn't root his Nexus 5, install new kernels every week, and attempt to boost performance, right?

That is what I attempted, often sacrificing battery life or stability to get that 'buttery smooth' effect on a stock AOSP ROM. This tweaking of CPU / GPU governors led to my first hard-brick, when I stupidly set the CPU max frequency to 1%.

Mimicking Other System Features

I have an unhealthy obsession with those who oppose the norm: BB10's / webOS gesture-based navigation (now found in the iPhone X, funnily enough), a unified messaging application (ala the BB10 Hub), or even Ubuntu Phone's side-dock multitasking system. All of the aforementioned were ideas or attempts which failed horribly, or proved that if I wanted said functionality, I'd have to implement it myself. Though I never did back then, I feel that an implementation in my free time may help more than just myself.

Being Annoyed by Application imperfections

So this one is completely and utterly blown out of proportion, I admit, but it is also one which matters dearly to me and is found throughout the other examples listed above. In my week using Android Oreo, I had already tried out multiple SMS applications because I noticed that the text field in Google's Android Messages lacked the proper padding and height for the font used in that application.


This also made me notice the lazy approach Google had taken on the right side, with the condensed SMS 'send' button, which to me is more of an eyesore than anything else. Not to make this sound like the end of the world, but I realized that by having all the choice in the world when it comes to applications and devices, I will forever be trapped in a spiral of 'try', 'enjoy', and finally 'annoyed' with a multitude of applications.


This entire article may sound like a rant, or even a disapproval of how Android operates as a system, but that was not the purpose of this post. Sometimes I write simply to put jumbled thoughts on a page, attempting to make sense of them through the process. While spending a week with Android Oreo on a Pixel 2 XL, writing this article in the process, I came to similar conclusions or revelations about why, even with an amazing device, I still felt discontent.

Android is an amazing system; likewise, so is iOS. They both have so many unique perspectives and implementations that they often give the end-user all they could ever want. In recent years, feature parity has blurred the differences between the two operating systems, creating a fantastic experience regardless of the chosen device.

In the end, I suppose Android will remain my hobby operating system, simply because it gives me far too much choice for my OCD mind to fathom. I love the choice, but I still found myself tweaking and longing for hours, both recently and in the past. Luckily, choice is still an option, and I have time to continue deducing what is best for myself. I know many who are as happy as can be with choice, and others who treat Android as a defaults-only configuration. It's truly amazing when you think about just how many different types of users there are out there!

As for the Pixel, perhaps it's my lack of discipline that is causing the disconnect; an idea on which only time will tell.

by RayGervais at December 01, 2017 04:32 AM

Fateh Sandhu

Travis Tests

Working with Travis CI was a very fun experience, mostly because it gave us the opportunity to work on something interesting and new. With Travis CI I tested my code to make sure that it was doing what I wanted it to do.

The process was a lot of fun, to be honest, mostly because it was interesting to interact with something that tests your code without you even touching it. I learned how to use this tool more effectively in the future: whether I am building something open source or even something for myself, I can make sure that nothing is broken in the process. It also helps to make sure that consistency is maintained throughout development.

I did not know about this before I started working on the 0.1 release. When I saw and used it the first time, I was very confused and lost as to what was happening. After playing around with it for a while, I have learned much more about it.
Screen Shot 2017-11-30 at 21.31.47.png

When I did my first test it failed miserably and I had no idea what to do. But after I got the hang of it, by googling some information and looking at some of the other projects online, I managed to finish it quite fast.

by firefoxmacblog at December 01, 2017 02:33 AM

Joao Rodrigues

Lab 8 Travis

In this lab we were to connect with Travis CI. I had some experience with Travis thanks to my 0.1 release bug, but it was interesting to read more and actually connect to it from scratch. All of the instructions for Travis are given through a .travis.yml file, where you tell it how to configure, build and test the project. I wrote some tests to check, in an automated way, whether the code I wrote in Python in the previous lab was working as intended. My testing framework was pytest. Everything went along very smoothly, and it didn't take too long to see a very happy sight: a green checkmark on my Travis build.

by João Rodrigues ( at December 01, 2017 01:48 AM

November 30, 2017

Chun Sing Lam

SPO600 Lab 6 Part 2 – Algorithm Selection

After performing all of the tests on Aarchie, I will now do the same on Xerxes to compare the relative performance of the three methods. I compile my program with no optimization and here are the results for all three methods after running the time command:

[cslam4@xerxes lab6]$ gcc lab6a.c -o lab6a
[cslam4@xerxes lab6]$ time ./lab6a
Sum of scaled samples: -115014951
CPU time used to scale samples: 3.730427 seconds

real      0m14.264s
user     0m13.106s
sys       0m1.145s
[cslam4@xerxes lab6]$ gcc lab6b.c -o lab6b
[cslam4@xerxes lab6]$ time ./lab6b
Sum of scaled samples: -115014951
CPU time used to scale samples: 3.501012 seconds

real      0m14.014s
user     0m12.797s
sys       0m1.202s
[cslam4@xerxes lab6]$ gcc lab6c.c -o lab6c
[cslam4@xerxes lab6]$ time ./lab6c
Sum of scaled samples: -302496613
CPU time used to scale samples: 2.712908 seconds

real      0m13.249s
user     0m12.055s
sys       0m1.179s

The sum of all scaled samples is -115014951 for the first two methods and is -302496613 for the third method, which is the same as the results on Aarchie. The first method took 3.730427 seconds to calculate the scaled sample values and store them in another array. The second method took about 6% less time than the first method. The third method took about 27% less time than the first method. For the first method, the real time is 14.264 seconds, the user time is 13.106 seconds and the system time is 1.145 seconds. For the second method, the real time, user time and system time are shorter than the first method. For the third method, the real time, user time and system time are shorter than the first two methods.

Now, I compile my program using -O3 option, which enables a lot of optimization. I perform the same tests above and here are the results for all three methods:

[cslam4@xerxes lab6]$ gcc -O3 lab6a.c -o lab6a
[cslam4@xerxes lab6]$ time ./lab6a
Sum of scaled samples: -115014951
CPU time used to scale samples: 1.587858 seconds

real      0m10.103s
user     0m8.908s
sys       0m1.184s
[cslam4@xerxes lab6]$ gcc -O3 lab6b.c -o lab6b
[cslam4@xerxes lab6]$ time ./lab6b
Sum of scaled samples: -115014951
CPU time used to scale samples: 1.729359 seconds

real      0m10.466s
user     0m9.237s
sys       0m1.219s
[cslam4@xerxes lab6]$ gcc -O3 lab6c.c -o lab6c
[cslam4@xerxes lab6]$ time ./lab6c
Sum of scaled samples: -302496613
CPU time used to scale samples: 1.088714 seconds

real      0m9.609s
user     0m8.451s
sys       0m1.147s

The sum of all scaled samples does not change after using -O3 option. Interestingly, the second method took about 9% more time than the first method to calculate the scaled sample values and store them in another array. For the third method, it is about 46% faster than the first method and 59% faster than the second method. The third method has the shortest real time, user time and system time while the second method has the longest real time, user time and system time. After optimization is enabled using -O3 option, the processing time has been drastically reduced for all three methods. When I compare to no optimization, the first method is 135% faster, the second method is 102% faster and the third method is 149% faster.

Now, I use “/usr/bin/time -v” to find out how much memory is used to run my program. Here are the results for all three methods with no optimization during compilation:

[cslam4@xerxes lab6]$ /usr/bin/time -v ./lab6a
Sum of scaled samples: -115014951
CPU time used to scale samples: 3.212162 seconds
            Command being timed: "./lab6a"
            User time (seconds): 12.59
            System time (seconds): 1.13
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.75
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1954152
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 488338
            Voluntary context switches: 1
            Involuntary context switches: 1323
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0

[cslam4@xerxes lab6]$ /usr/bin/time -v ./lab6b
Sum of scaled samples: -115014951
CPU time used to scale samples: 3.605827 seconds
            Command being timed: "./lab6b"
            User time (seconds): 12.95
            System time (seconds): 1.17
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.15
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1954044
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 488369
            Voluntary context switches: 1
            Involuntary context switches: 1409
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0

[cslam4@xerxes lab6]$ /usr/bin/time -v ./lab6c
Sum of scaled samples: -302496613
CPU time used to scale samples: 2.712172 seconds
            Command being timed: "./lab6c"
            User time (seconds): 12.07
            System time (seconds): 1.16
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.25
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1954152
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 488338
            Voluntary context switches: 1
            Involuntary context switches: 1334
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0

The results show that the first method and third method have the highest peak memory usage (1954152 kilobytes) and the second method has the lowest peak memory usage (1954044 kilobytes). However, the difference in peak memory usage is very small, so it is insignificant.

Here are the results for all three methods using -O3 option during compilation:

[cslam4@xerxes lab6]$ /usr/bin/time -v ./lab6a
Sum of scaled samples: -115014951
CPU time used to scale samples: 1.584637 seconds
            Command being timed: "./lab6a"
            User time (seconds): 8.90
            System time (seconds): 1.17
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.09
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1954204
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 488341
            Voluntary context switches: 1
            Involuntary context switches: 968
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0

[cslam4@xerxes lab6]$ /usr/bin/time -v ./lab6b
Sum of scaled samples: -115014951
CPU time used to scale samples: 1.724290 seconds
            Command being timed: "./lab6b"
            User time (seconds): 9.21
            System time (seconds): 1.23
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.45
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1954104
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 488369
            Voluntary context switches: 1
            Involuntary context switches: 1003
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0

[cslam4@xerxes lab6]$ /usr/bin/time -v ./lab6c
Sum of scaled samples: -302496613
CPU time used to scale samples: 1.088512 seconds
            Command being timed: "./lab6c"
            User time (seconds): 8.44
            System time (seconds): 1.14
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.60
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 1954196
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 488338
            Voluntary context switches: 1
            Involuntary context switches: 965
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0

The results show that the first method has the highest peak memory usage (1954204 kilobytes) and the second method has the lowest (1954104 kilobytes), a difference of only about 100 kilobytes. Peak memory usage is essentially the same as in the runs with no optimization enabled.


From the results above, we see that the third method produces the shortest CPU processing time, which was also true on Aarchie. Without optimization the first method is slower than the second, but with optimization the first method is faster than the second. Comparing the three methods using the real time and user time reported by the time command gives the same ranking. This differs from the results on Aarchie, where the first method is faster than the second regardless of the optimization level. Either way, the third method remains the volume-adjustment approach with the best performance, and in general the results on Xerxes are similar to the results on Aarchie.

by chunsinglam at November 30, 2017 11:49 PM