Planet CDOT

April 13, 2015

Thana Annis


Looking at the program we went over in class: if the compiler can accurately predict that a piece of code will loop a certain number of times, like “for(int i = 0; i < 256; i++)”, and the number of iterations is divisible by 4, then it may choose to use vectorization instructions, provided the optimization settings are high enough at compile time.

Even though the first loop runs a predictable number of times divisible by 4, it is not a candidate for vectorization because rand() needs to be called for each index of a. The program must go through each iteration to populate the array.

In the second loop, however, all the code is doing is reading each value in a and adding it to t. This makes it a good candidate for vectorization because all the memory is already loaded and just needs to be processed. With vectorization, instead of reading and adding 32 bits at a time, it will grab a 128-bit chunk from a and put each 32-bit value into its own “lane” in the register.

|__32bits__|__32bits__|__32bits__|__32bits__| = 128bits of a’s memory

The processor knows that the register holding these 128 bits is a vector register, so when an add is executed it adds each individual lane, keeping a running total per lane. Then it advances the pointer in memory by 128 bits so that it can grab the next chunk of memory to process. Once the end of a is reached and all the lane totals are calculated, an instruction can be called that adds the lanes together and creates a final total to store in t.

The advantage to this is that you are essentially looping only 1/4 of the times you would if you didn’t use vectorization. Done the way it was written in the program, you would be grabbing a 32-bit chunk, adding it to t, moving the pointer over 32 bits, and repeating 256 times. Vectorization can be a powerful tool to decrease the run time of your program.

by bwaffles91 at April 13, 2015 02:00 AM

April 11, 2015

Chris Tyler (ctyler)

Connecting a 96boards HiKey to a Breadboard

I've wanted to experiment with interfacing the 96boards HiKey (8-core, 64-bit ARM development computer) to some devices, but the 2mm 40-pin header is difficult to bring out to a breadboard (except by individual wires). I designed a simple adapter board and had some produced by the awesome OSH Park prototype PCB fabbing service. The boards are back, and I've been assembling a few of them for testing.

The adapter has a 40-pin 2mm male connector that plugs into the 96boards "low-speed" (LS) female header. It brings those signals to a 40-pin 0.1"/2.54mm header that can be used with an insulation displacement connector (IDC) on a ribbon cable. The other end of the ribbon cable can then be connected to a standard solderless breadboard using an adapter (any of the 40-pin IDC-to-breadboard adapters used with the Raspberry Pi A+/B+/2 should work -- for example, the AdaFruit Pi Cobbler Plus (straight) or T Cobbler Plus (T-shaped), KEYES SMP0047 from (U-shaped), or a 40-pin IDC Ribbon Breakout from Creatron in Toronto (straight) -- obviously, the pinouts from the 96boards connector will be different, so ignore or rewrite the pin labels on the adapter board).

These first-try "Alpha 1" boards are not ideal -- they're bigger than they need to be, the board occludes one of the 96boards mounting holes, and the routing is wonky (I let the autorouter do its thing, and ended up with a couple of unnecessary vias and some wild traces). Nonetheless, the board seems to do what it was intended to do. I'm going to work on a second-generation design, but it may be a little while before I get around to it.

If you're interested in using this design, it's licensed under CC-BY-SA and can be downloaded or ordered from the OSH Park sharing site.

You can use this with the 2mm (96boards LS) and 2.5mm (IDC) connectors on the same side of the board, or on opposite sides of the board.

If you place them on the same side of the board (see the populated PCB with no cable attached in the photo -- though the IDC socket should be rotated 180 degrees), the pin numbering on the IDC cable will match the 96boards pin numbering. However, the IDC connector will be really tight to the edge of the board -- you may want to slightly angle the IDC header when you solder it on.

If you place them on the opposite sides of the board (the adapter plugged into the HiKey in the photo), the even and odd numbered pins will be swapped (1<->2, 3<->4).

One final note -- these should work fine with other 96boards devices that comply with the Consumer Edition specification -- and hopefully, it won't be long before the DragonBoard 410c is available so we can test this theory!

(The board in the photo is in one of the very basic acrylic cases I built for the HiKeys).

by Chris Tyler ( at April 11, 2015 05:22 PM

Artem Luzyanin

[SPO600] Disassembling lab

During our last lecture, we were given code to disassemble and analyze. So I copied the tarball to my folder, ran the usual “tar -xvf name” on it, ran “make”, and was ready to go. This is the code I was analyzing:

#include <stdlib.h>
#include <stdio.h>

void main() {
	int a[256], i, t;

	srand(1); // initialize prng

	for (i=0; i<256; i++) {
		a[i]=rand(); // load array randomly
	}

	for (i=0; i<256; i++) {
		t+=a[i]; // sum the array
	}

	printf("%d\n",t); // print the sum
}

As you can see, it’s a pretty easy and straightforward program. After I ran “objdump -d name”, I got the following:

0000000000400500 <main>:
400500: d11003ff sub sp, sp, #0x400
400504: 52800020 mov w0, #0x1 // #1
400508: a9bd7bfd stp x29, x30, [sp,#-48]!
40050c: 910003fd mov x29, sp
400510: f90013f5 str x21, [sp,#32]
400514: 9100c3b5 add x21, x29, #0x30
400518: a90153f3 stp x19, x20, [sp,#16]
40051c: 9110c3b4 add x20, x29, #0x430
400520: aa1503f3 mov x19, x21
400524: 97ffffef bl 4004e0 <srand@plt>
400528: 97ffffe2 bl 4004b0 <rand@plt>
40052c: b8004660 str w0, [x19],#4
400530: eb14027f cmp x19, x20
400534: 54ffffa1 b.ne 400528 <main+0x28>
400538: 4f000400 movi v0.4s, #0x0
40053c: 3cc106a1 ldr q1, [x21],#16
400540: eb15029f cmp x20, x21
400544: 4ea18400 add v0.4s, v0.4s, v1.4s
400548: 54ffffa1 b.ne 40053c <main+0x3c>
40054c: 4eb1b800 addv s0, v0.4s
400550: a94153f3 ldp x19, x20, [sp,#16]
400554: f94013f5 ldr x21, [sp,#32]
400558: 90000000 adrp x0, 400000 <_init-0x460>
40055c: a8c37bfd ldp x29, x30, [sp],#48
400560: 911da000 add x0, x0, #0x768
400564: 0e043c01 mov w1, v0.s[0]
400568: 911003ff add sp, sp, #0x400
40056c: 17ffffe1 b 4004f0 <printf@plt>

Now this code took a bit of figuring out. Breaking it into sections helped:

400524: 97ffffef bl 4004e0 <srand@plt>

This line is our call to srand(); this much is fairly clear.

400528: 97ffffe2 bl 4004b0 <rand@plt>
40052c: b8004660 str w0, [x19],#4
400530: eb14027f cmp x19, x20
400534: 54ffffa1 b.ne 400528 <main+0x28>

This block acts in a pretty interesting way, since it FIRST calls the rand() function and stores the result, and only then checks the exit condition. The “str w0, [x19],#4” line shows that instead of incrementing an index, the compiler decided to increment a pointer into the array; when the pointer reaches the projected end of the array, the loop stops.

400538: 4f000400 movi v0.4s, #0x0
40053c: 3cc106a1 ldr q1, [x21],#16
400540: eb15029f cmp x20, x21
400544: 4ea18400 add v0.4s, v0.4s, v1.4s
400548: 54ffffa1 b.ne 40053c <main+0x3c>

This block is the whole reason for the exercise. Here we touch upon the Single Instruction Multiple Data (SIMD) concept. “movi v0.4s, #0x0” fills a vector register, divided into 4 single-word (32-bit) parts, with zeros, preparing it to be the sum register. “ldr q1, [x21],#16” loads another vector register with the next four values from our initial array. Then, again, the exit condition is checked. Then the loaded values are added to the sum register in their respective positions, and the loop ends or continues depending on the result of the condition check.

40054c: 4eb1b800 addv s0, v0.4s

This line adds up all the parts of the sum vector register into the rightmost position, filling the rest with zeros.

40056c: 17ffffe1 b 4004f0 <printf@plt>

Finally, we are calling printf for the output.

As we can see, by performing a couple of extra steps to work with the vector registers, we save enormous time on the calculations, which now happen four at a time. This is the beauty of vectorization: if you can condense your loop like that, you should!

Unfortunately, there are problems with vectorization that we have to watch out for. First of all, we need to specify the “-O3” gcc option for it to work. This is what the previously vectorized code in my example became when I changed the option to “-O2”:

400538: b8404660 ldr w0, [x19],#4
40053c: eb1302df cmp x22, x19
400540: 0b0002b5 add w21, w21, w0
400544: 54ffffa1 b.ne 400538 <main+0x38>

It reads much more easily, but it does the addition one element at a time. No more vectorization for us. Unfortunately, in some situations running the third level of optimization might be problematic for the rest of the code due to the drastic changes it makes. So one has to choose.

Another limitation shows up when we try to add two arrays into a third one (it would be something like “add v0.4s, v1.4s, v2.4s”). Since the compiler doesn’t know whether the arrays overlap, it plays it safe and assumes that they do, and vectorization will not happen. To fix that, you need to add the C99 “restrict” qualifier (GCC also accepts “__restrict__”) to the pointer declarations to specify that the arrays do not overlap.

Another limitation is that vectorization might have problems doing calculations on misaligned values. Imagine accessing the first element of the first array and the second element of the second array and going from there. There is a “__builtin_assume_aligned” function (can be found here) to tell the compiler how the data is aligned. According to recent changes, this problem is being actively worked on, and it seems the compiler should figure it out on its own by now.

Another limitation is that you cannot vectorize values coming one at a time from a function. In our code, when we do “a[i]=rand();”, the compiler sees that rand() returns only one value at a time, so four separate calls are needed to obtain four values. Hence, no vectorization.

And the last limitation I will mention is that no output, calculation, or condition can be done on the vectorized values.

This lab was very interesting, showing me that sometimes a small change in the code or the optimization level can have a significant impact on performance, if it allows vectorization to kick in!

by lisyonok85 at April 11, 2015 05:15 PM

[SPO600] Project – Part 3 – V2

As I described in my previous post, after hours upon hours of learning how to submit a patch, I finally made it. It took the community only 2 hours to completely destroy my work. The good thing is that I was given two suggestions on how to improve my code. One suggestion was to drastically change my code, while the other was to abandon it and do something else. I decided to… do both. Well, except the abandoning part. First, I localized my code to the proper parts, instead of creating one giant lump at the top. Second, I created the file layout. Third, I committed. Fourth, profit!

by lisyonok85 at April 11, 2015 04:15 PM

April 10, 2015

Danylo Medinski

Project Update 2: Progress and Bugs

Since my last posting, I have received more feedback on the official PHP internals mailing list about my project. The reply discussed how sorting within the function may compromise performance. As this is the second reply to address this issue, I decided to implement the sorting only after I have working code.

After a couple of days of messing around with the C function that I implemented, I managed to get a working binary search algorithm.
At the moment this code only supports the long data type for testing purposes.

static inline void php_binary_search_array(INTERNAL_FUNCTION_PARAMETERS, int behavior) /* {{{ */
{
    zval *value,                /* value to check for */
         *array,                /* array to check in */
         *entry;                /* pointer to array entry */
    zend_ulong num_idx;
    zend_string *str_idx;
    zend_bool strict = 0;       /* strict comparison or not */

#ifndef FAST_ZPP
    if (zend_parse_parameters(ZEND_NUM_ARGS(), "za|b", &value, &array, &strict) == FAILURE) {
        return;
    }
#endif

    HashTable *arr_hash = Z_ARRVAL_P(array);

    int low = 0;
    int high = zend_hash_num_elements(arr_hash);
    int mid;

    zval res;

    /* loop body reconstructed; the original listing was truncated */
    while (low < high) {
        mid = low + (high - low) / 2;
        entry = zend_hash_index_find(arr_hash, mid);
        compare_function(&res, value, entry);
        //php_printf("%ld\n", Z_LVAL(res));
        if (Z_LVAL(res) > 0) {
            low = mid + 1;
        } else if (Z_LVAL(res) < 0) {
            high = mid;
        } else {
            RETURN_TRUE;
        }
    }
    RETURN_FALSE;
}
/* }}} */
After running and examining the results, I discovered a major bug. At first I tested the code with a print statement which outputs the result of the compare. After a couple of runs of the PHP search benchmarker that I used in my previous postings, the results turned out to be pretty favorable.

Linear Search
Linear search time: 0.0007469654083252 seconds
Binary Search
Binary search time: 0.00012421607971191 seconds

Seeing as the binary search was quicker than the linear search, I commented out the print statement and rebuilt PHP. After running the PHP benchmarker again, I noticed a discrepancy in the outputs: the binary search now appears more than an order of magnitude faster than before.

Linear Search
Linear search time: 0.00079488754272461 seconds
Binary Search
Binary search time: 1.0967254638672E-5 seconds

How can that be, I thought; the output of the last test clearly showed the binary search performing quicker than the linear search, but not by this much. After some more testing, it appears that the print statement affects the algorithm's measured speed. Seeing as this is a pretty large (and very strange) bug, it must be addressed before I can proceed to create a pull request on github.

Once the bug is addressed, I plan on adding support for every possible data type to my function. I also plan on adding a sort flag to my function as discussed on the mailing list, but before that implementation I must first determine the crossover point, to check whether performance will be affected by the sorting. Once the crossover point is determined, I can decide if it would be wise to perform a sort within the function. Once these steps are complete, I can proceed to my pull request.

by ddmedinski at April 10, 2015 05:46 AM

April 09, 2015

Thana Annis

Patch Submission Outcome

It’s with a heavy heart that I write this post. The outcome of my patch submission was not good. The community was fantastic in responding within 4 hours of my submission. Unfortunately, they responded with “Thanks for the patch! I thought line 654-656 was already tested by self.assertFailure(‘-L’, ‘en’)”. So obviously my first reaction was to go into the code and check for myself. Sure enough, he was right and my patch wasn’t needed. There wasn’t much point in arguing my case to push the patch through, so all I could really do was thank them for responding so quickly and close the ticket, which is exactly what I did.

At this point in the semester there isn’t much time to devote to solving another issue, but I am going to see if I can make another patch next week. I wish there were more to say in this post, but I think the issue tracker messages speak for themselves.

Here is the link to the issue again: Issue #23807.

by bwaffles91 at April 09, 2015 01:44 PM

April 07, 2015

Jordan Theriault

R6 – Contribution to React-Bootstrap

React is a JavaScript library for UI, developed by Facebook. It’s gaining traction as one of the most popular UI building tools because of its ingenious and easy-to-use coding style. Bootstrap, created by Twitter, is the most popular front-end framework, giving you beautiful components for rapid web development.

React-Bootstrap is a marriage of these two technologies.

I’ve contributed a new component to react-bootstrap. I’ve added the thumbnail component to the project as seen in Bootstrap’s docs which is a great component for featuring images, and is best used within the already existing grid system. Being new to the repository, I followed the coding examples of already existing components to create Thumbnail. It was a straightforward component to code and I am currently working through the responses on my pull request.

by JordanTheriault at April 07, 2015 06:58 PM

Hosung Hwang

cc_xmp_tag Android/Perl Implementation Compatibility

In the last posting, I was developing and testing a Java implementation of the Creative Commons license tagging and reading library for Android. In this posting, I will cover the changes since then and explain the current version.

Source Repository



In the git repository, cc-xmp-tag/java is the Java implementation for Android.
The com directory is the Adobe XMP Toolkit for Java version 5.1.0 source code.
The pixy directory is the most recent version of PixyMeta Android; Wen recently updated it for the png and gif formats. The current version is the merged version.


Andrew Smith wrote a Perl script that uses ExifTool.
cc-xmp-tag/perl/ file is the script.

Compatibility test

All tests worked both for an image that doesn’t have an xmp tag and for an image that already has one. In the latter case, keeping the other tags and adding only the CC license tag is important. Test results for both were the same.

Write in CCXMPTag -> Read in CCXMPTag

  • For all supported image format(jpg, png, gif, tif), works well.

Write in Android CCXMPTag -> Read in

  • worked for all formats.

Write in -> Read in Android CCXMPTag

  • worked for all formats.

Write in Android CCXMPTag -> Read in XnViewMP

  • worked for jpg, png, tif
  • doesn’t show the information in gif

Write in -> Read in XnViewMP

  • worked for jpg, png, tif
  • doesn’t show the information in gif

Write in Android CCXMPTag -> Read in an online metadata viewer (

  • worked for all formats

Write in -> Read in an online metadata viewer (

  • worked for all formats

Write in Android CCXMPTag -> Read in on-line metadata viewer (

  • worked for all formats

Write in -> Read in on-line metadata viewer (

  • worked for jpg, png, tif
  • doesn’t work for gif


  • Between CCXMPTag and, worked well for 4 image formats.
  • XnViewMP doesn’t show XMP in gif format. This is weird because XnViewMP uses ExifTool.
  • Online metadata viewer worked well with both Java and Perl implementation.
  • Another online metadata viewer worked well with Java implementation, but it didn’t read XMP in gif written by Perl version.

by Hosung at April 07, 2015 05:55 PM

Liam Christopher Martin

SPO Phase 2 – Entry #2

For this entry I will be elaborating on python.

I managed to download and install python into a local directory onto both architectures, red and australia.

For comparisons sake I will show the commands used to create the same testing conditions.


tar zxvf Python-3.4.3.tgz

cd Python-3.4.3

./configure

make

make test

mkdir $HOME/bin
DESTDIR=$HOME/bin make install

I used a local directory on advice from my instructor to avoid using sudo, this also seems like a good way to pre-determine testing conditions. To verify accuracy, input these commands and compare.

In order to access this version, if you installed it in a similar way, issue this command


This will work on both architectures if installed in the same way. This will also be the first line of my python scripts.

To test, I created a small hello world.

#! /usr/bin/python3.4

print("Hello World")

In order to execute this script, run chmod u+x on it.


I am still waiting on some feedback from my instructor to finalize a few things. The next post will hopefully detail the Python script I’ve created for my project, as well as its output.
-Liam Martin

by martinliam at April 07, 2015 02:50 PM

Mohamed Baig

Creating new events in the viewer API of PDF.js for better integration with other libraries

I continued to work on PDF.js. This time around I created new events within the viewer API that help better integrate PDF.js with other libraries. The first event that I created is called pageremoved. It was needed so an external library can know exactly when a page is removed from the cache. This can be useful if […]

by mbbaig at April 07, 2015 02:01 PM

April 05, 2015

Artem Luzyanin

[SPO600] Project – Nightmares of patching

So, I’m done coding the patch. At least, so I thought. The next step is to submit the patch and, hopefully, get it accepted. To avoid extra communication I decided to follow the instructions for submitting a patch as closely as I could. That’s where I started having problems. Although the PostgreSQL community claims that anyone can submit a patch, the steps one has to take seemed pretty scary to me.

First of all, I had to read tons of documentation related to submitting a patch. These are some (not all) of the links I had to plow through:

After spending a few hours just reading through those articles, I finally started trying to do something. First of all, I ran “git diff --color s_lock.h s_lock2.h” to see how things would come out (s_lock.h is the original file, while s_lock2.h is my new file). Just as I expected, it was horrible. Tons of whitespace everywhere. The alignment I had done in Visual Studio came out completely screwed up on Linux. Plus, seeing my code in the context of the file showed me how poorly I had organized it. So, back to the drawing board.

After I fixed all the problems and reworked the organization, I ran the command again and checked that everything was, in fact, as I expected it to be. The second round of changes took only a few minutes, and the third run of the command showed that I was finally there.

The biggest question now was “Now what?”. I decided to take it easy and followed these steps:

git clone git://
cd postgresql
git checkout -b my-cool-feature
git commit -a
git diff --patience master my-cool-feature | filterdiff --format=context > ../my-cool-feature.patch

I created a new directory, pulled the repository there, switched in my new file, and ran the commit. It complained that it didn’t know who I am, so I had to run the “git config” command. Trying to commit again didn’t work, so I spent some time trying to understand why. Finally I realized that I needed to set up not only my name, but also my email. The “git config” command fixed the problem. Then it committed fine.

The first warning sign was the message that told me the number of lines that were removed and added. I was surprised, since I had changed far fewer lines than it said. I went into the patch file and realized something terrible: the git repository has a newer version of the file than the one I downloaded from the website. It includes the changes people have made since the last release. So what should I do? Should I use the older version, in which case my patch file has tons of extra lines that aren’t in the git repository, or should I change the new version of the file, which means reworking a whole lot of the code I had written? With a sigh, I went to check what was changed in the new file. Again, I ran the “diff” command to see the changes (it’s a very handy command, I must say!), and started merging my code with the new file.

Some time later I finally created my working patch file. Next step is to send an email to the developers with my patch.

This was the most emotionally complicated task. I sent the email to the developers with the info outlined there. Hopefully, they will take it seriously, not as a lame email from a student! Also, hearing from my classmates that their patches were getting rejected really didn’t help my confidence. In any case, I sent the email, created the post on CommitFest, uploaded my patch to github as requested in the CommitFest post, and…

Now we wait.

by lisyonok85 at April 05, 2015 07:05 PM

April 04, 2015

Andor Salga (asalga)

Normal Mapping using PShaders in Processing.js

Try my normal mapping PShader Demo:

Last year I made a very simple normal map demo in Processing.js and posted it on OpenProcessing. It was fun to write, but something that bothered me about it was that the performance was very slow. The reason for this was that it uses a 2D canvas, so there’s no hardware acceleration.

Now, I have been working on adding PShader support to Processing.js in my spare time, so here and there I’ll make a few updates. After fixing a bug in my implementation recently, I had enough to port my normal map demo over to shaders. So, instead of having the lighting calculations in the sketch code, I could have them in GLSL shader code. I figured this should increase the performance quite a bit.

Converting the demo from Processing/Java code to GLSL was pretty straightforward (except for working out a couple of annoying bugs), and I got the demo to resemble what I originally had a year ago, but now the performance is much, much, much better :) I’m no longer limited to a tiny 256×256 canvas and can use the full client area of the browser. Even with specular lighting, it runs at a solid 60 fps. Yay!

If you’re interested in the code, here it is. It’s also posted on github.

#ifdef GL_ES
precision mediump float;
#endif

uniform vec2 iResolution;
uniform vec3 iCursor;

uniform sampler2D diffuseMap;
uniform sampler2D normalMap;

void main(){
	vec2 uv = vec2(gl_FragCoord.xy / 512.0);
	uv.y = 1.0 - uv.y;

	vec2 p = vec2(gl_FragCoord);
	float mx = p.x - iCursor.x;
	float my = p.y - (iResolution.y - iCursor.y);
	float mz = 500.0;

	vec3 rayOfLight = normalize(vec3(mx, my, mz));
	vec3 normal = vec3(texture2D(normalMap, uv)) - 0.5;
	normal = normalize(normal);

	float nDotL = max(0.0, dot(rayOfLight, normal));
	vec3 reflection = normal * (2.0 * (nDotL)) - rayOfLight;

	vec3 col = vec3(texture2D(diffuseMap, uv)) * nDotL;

	if(iCursor.z == 1.0){
		float specIntensity = max(0.0, dot(reflection, vec3(.0, .0, 1.)));
		float specRaised = pow(specIntensity, 20.0);
		vec3 specColor = specRaised * vec3(1.0, 0.5, 0.2);
		col += specColor;
	}

	gl_FragColor = vec4(col, 1.0);
}

Filed under: gfx, GLSL, JavaScript, Open Source, Processing.js

by Andor Salga at April 04, 2015 04:26 PM

April 03, 2015

James Boyer

Project update

Project Switch
Project Progress: Stack-less Just in time compiler
This is an update about my project progress in SPO600: an area to work on has been found!


After my previous post I began talking to one of the developers of sljit about contributing to the project. He was very helpful, and after some discussion of what would be a good area for me, we settled on one: offering more floating point registers on ARM or x86. Currently the sljit compiler only makes 6 registers available for floating point operations on all CPUs, but it could offer more if the CPU has more available. This will require me to save and restore floating point registers and also to modify, through register mapping, which registers get used as temporary floating point registers.

Moving forward

For now I will be looking at all this in more detail and will provide a more thorough update in the future. I am really glad that I finally have a solid direction to go in, and I look forward to contributing to this project.
Thanks for reading

by James Boyer ( at April 03, 2015 05:00 PM

April 01, 2015

Ryan Dang

Release 0.6

I worked on adding a delete button in the preview app for admins for my release 0.6.
When I was working on the previous release, I noticed that it would be good to have a delete button beside the open app button. I made an issue on github to see if it was a good idea.
The issue was approved later on and a milestone was set for it to be completed.

I used the existing “modalPromt” and “shim” components to get the work done. The pull request is up and waiting to be reviewed.

by byebyebyezzz at April 01, 2015 04:27 AM

March 31, 2015

James Boyer

Switching projects

Project Switch
Project Progress: PERL5 PCRE / Stack-less Just in time compiler
This is an update about my project progress in SPO600: I have switched projects and made some steps forward.


In my previous post I ruled out some areas in Perl5 and was heading towards the regular expression engine as an area to optimize. Well, I got a response from the mailing list saying the regular expression engine should be fine and doesn’t really need any simple optimizations. I was kind of stubborn in thinking I could find something in Perl, and I kept looking when nothing obvious was there. In hindsight, I should have stopped and tried out different packages sooner.

PCRE and sljit

After a suggestion from my professor I looked at PCRE (Perl Compatible Regular Expressions). This small library deals with parsing regular expressions using similar semantics to Perl. I downloaded the source via subversion:
svn co svn:// pcre
and found a folder in the source called sljit. There are many files in that directory with architecture-specific code: some for x86, some for ARM, some for Sparc, and more. This piqued my interest, so I looked into it. I found that sljit is a stack-less just-in-time compiler which is CPU independent (more info here). Specifically, within pcre it is part of a performance project which uses sljit to improve the pattern-matching speed of pcre.

Moving forward

I have just found all this recently, so for now I will be looking into any functions that can be ported over to another architecture, or any areas that look promising for optimization. I hope to soon zero in on one particular location and actually get something done.
Thanks for reading

by James Boyer ( at March 31, 2015 07:02 AM

Yan Song

SPO600 Course Project Phase 2

How we squander our sorrows, gazing beyond them
into the sad wastes of duration,
to see if maybe they have a limit.

Rainer Maria Rilke, Duino Elegies

At this stage, we need to:

  1. Verify the correctness of our » MongoDB C driver after compilation and installation on red (the relatively new AArch64 machine). Specifically, we need to get a » MongoDB server up and running on red.
  2. Direct our project effort in two directions, one experimental, the other theoretical.
  3. In the event that our builds (of the driver and server) are positively verified, follow the standard benchmarking/profiling procedure to get something plausible done. This would essentially reset the last number in the title of this blog to 1. (It might also be desirable to find out some of the performance bottlenecks (if any) via code reading/inspection.)
  4. Evaluate the value of the aforementioned benchmarking/profiling approach, because, though the C driver build is standalone, the testing/benchmarking/profiling would be done in a networking environment where the DB server, the driver client, and the network are (tightly or loosely) coupled. (Even if the server and the driver reside on the same machine, there are still setup/tear-down of TCP/IP connections; see » the MongoDB Wire Protocol.) What would be the effect of this coupling on the performance-related metrics? How could this (theoretical) insight guide the design of our performance measurement?
  5. Be prepared for the worst!

by ysong55 at March 31, 2015 03:12 AM

Liam Christopher Martin

SPO Phase 2 – Entry #1

This entry is to document the second phase of the final project, Optimization.

One of my main objectives in this phase is to test all possible combinations of optimizations for the gcc compiler.

Considering that there are 187 different tests in the distribution of gcc that I am working with, I have begun to work on a script.

My instructor mentioned working with Python, so I have begun to do some research on that. Since I am not as familiar with scripts as I am with other forms of programs, I have come across a few questions I need to solve in order to proceed (and this is where my research is heading):

-How does Python work with specific command-line arguments? Can I just sequentially call certain arguments with some programming logic intertwined?

-How is this implemented on the system?

-Will this script itself be affected by the architecture?

These may be pretty basic questions that will be answered early during research, but they do need to be addressed.

During research I have thought of a few different approaches to the logic of the problem. Separate from how I would syntactically call these arguments, I am figuring out how to test all combinations.

I believe I will create an array (possibly two-dimensional, depending on the exact approach) that is 187 fields long, to represent all the options for the compiler. Each field will contain a boolean value (representing enabled or disabled in the grander scheme) that, if true, will determine that option's inclusion in that particular command.
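As a rough sketch of this boolean-array idea (all names here are hypothetical; the real list would hold the 187 gcc options, and the full space would need pruning since it grows as 2^187):

```python
import itertools

# Hypothetical flag list; the real one would hold all 187 gcc options.
FLAGS = ["-ftree-vectorize", "-funroll-loops", "-fomit-frame-pointer"]

def build_command(mask):
    """Map a tuple of booleans (the enabled/disabled array described
    above) onto a gcc command line."""
    chosen = [f for f, on in zip(FLAGS, mask) if on]
    return ["gcc", "-O2", *chosen, "test.c", "-o", "test"]

# Every combination of enabled/disabled -- 2**len(FLAGS) invocations.
for mask in itertools.product([False, True], repeat=len(FLAGS)):
    cmd = build_command(mask)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # then time the resulting binary
```

Each printed command could then be run and the resulting binary benchmarked, keeping the fastest mask.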

More to come very soon on both of these.

-Liam Martin

by martinliam at March 31, 2015 02:45 AM

Danylo Medinski


Since my last posting, there has been some progress made with the SPO600 project, but at the same time new issues have presented themselves. First, the good news: the progress. I was beginning to wonder if anyone would reply to my posting on the PHP internals mailing list, but after some time I finally received a reply. In it, I was recommended an O(1) lookup-key algorithm as an alternative to O(log n) binary search and O(n) linear search when searching an array. The user also stated that since binary search requires a sorted array, the time it takes to sort the array will cancel out the performance gain, making a sort flag in the function parameters unnecessary.

Based on this feedback, I decided to begin implementing binary search, since I would like to see for myself whether I can optimize searching. As a backup project, I will research searching with a lookup key in case I decide to abandon working on binary_search.
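To see why the suggestion makes sense, here is a quick illustration of the asymptotics (in Python rather than PHP, purely as a sketch) comparing a linear scan, a binary search, and a hash lookup on the same sorted data:

```python
import bisect
import timeit

data = list(range(100_000))   # a sorted array
as_set = set(data)            # hash-based lookup table, O(1) average
needle = 99_999               # worst case for a linear scan

linear = timeit.timeit(lambda: needle in data, number=100)    # O(n)
binary = timeit.timeit(
    lambda: data[bisect.bisect_left(data, needle)] == needle,
    number=100)                                               # O(log n)
hashed = timeit.timeit(lambda: needle in as_set, number=100)  # O(1)

print(f"linear {linear:.4f}s  binary {binary:.4f}s  hash {hashed:.4f}s")
```

The hash lookup wins outright, which is the mailing-list point; binary search only pays off if the array is already sorted.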


My first step was to fork the official PHP source code into my GitHub account, since to contribute to the PHP interpreter we are required to work within the official source code. Instead of downloading and installing the official package from the PHP website, I downloaded just the PHP interpreter source code.

Building php-src is similar to building the official package, but there are some differences. To regenerate the PHP parsers, re2c is required; bison is required to build PHP/Zend from a git checkout; and autoconf is required to generate a configure script. These packages need to be installed before building the source code. After downloading the source code from my GitHub onto the system I was using, I ran the git checkout PHP-5.6 command. Then I ran ./buildconf to generate a ./configure script. At first the ./buildconf command failed, since the system I was using did not include re2c and bison. I quickly fixed this by running the commands sudo dnf install bison and sudo dnf install re2c.

I ran the ./buildconf command again and, success, I now had a ./configure file. From here, configuring and building the source code is the same as building the official package.
During my time looking at the source code, I began to notice some things that would make my task more difficult.

Zend Engine
I expected the PHP source code to consist of native C code, but this expectation was far from reality. It turns out the PHP interpreter uses the Zend Engine to interpret the PHP code before calling the C code responsible for the function behavior. Instead of using chars, ints, arrays, etc., all variables in PHP are represented by one structure, the zval. A zval contains a zvalue_value member named value, a union which can represent all the types a variable may hold.

typedef struct _zval_struct {
    zvalue_value value;        /* variable value */
    zend_uint refcount__gc;    /* reference counter */
    zend_uchar type;           /* value type */
    zend_uchar is_ref__gc;     /* reference flag */
} zval;
typedef union _zvalue_value {
    long lval;                 /* long value */
    double dval;               /* double value */
    struct {                   
        char *val;
        int len;               /* this will always be set for strings */
    } str;                     /* string (always has length) */
    HashTable *ht;             /* an array */
    zend_object_value obj;     /* stores an object store handle, and handlers */
} zvalue_value;

Learning and getting used to this datatype turned out to be a real pain in the ass, and still continues to be, especially since I have to work with a HashTable instead of a regular C/C++ array or a linked list, restricting the code that I can use.


I will first write a separate function instead of directly modifying the php_search_array function. For now I will call my function “binary_search”. I may choose to revise its name later or implement it directly into the php_search_array function, but for now I will keep it as a separate function in order to make benchmarking easier (I will have both functions to compare). It will take the same arguments as the php_search_array function and will return the same boolean datatype.

The binary_search function must first be defined in the php_array.h header. This will cause the text “binary_search” to be recognized as a PHP function or keyword. I just added PHP_FUNCTION(binary_search) into the list.

PHP_FUNCTION(binary_search); //the binary_search function I added

I then added this code into the array.c file located in the same directory.

PHP_FUNCTION(binary_search)
{
    php_search_array(INTERNAL_FUNCTION_PARAM_PASSTHRU, 0);
}

The new function only calls the original php_search_array for now since it was a test to see if my function was recognized.

I ran the make command, and the source compiled successfully. I quickly modified the script I used to benchmark binary search and linear search in the previous part. The only difference is that now, instead of the binarySearch function containing a binary search algorithm, it calls the binary_search function that I created. When I ran the script, I was given an undefined function call error for the new binary_search function.

After doing some research and looking through the PHP source code, I realized that since I was modifying the standard extension, I would also have to modify the basic_functions.c C file. I added the first group of code after the equivalent in_array and array_search code near the top of the file, and added the line PHP_FE(binary_search, arginfo_binary_search) after the equivalent in_array and array_search code.

ZEND_BEGIN_ARG_INFO_EX(arginfo_binary_search, 0, 0, 2)
    ZEND_ARG_INFO(0, needle)
    ZEND_ARG_INFO(0, haystack) /* ARRAY_INFO(0, haystack, 0) */
    ZEND_ARG_INFO(0, strict)
ZEND_END_ARG_INFO()

Now that the binary_search function was defined in basic_functions.c, I was ready to build the source again. This time, after building the source code and running the PHP script, the function was recognized and called. Now that I had a function to work on, I began implementing the binary search algorithm.

In order to better understand zval, I first did some conversions and casting of zval to the relevant C datatypes. I was not, however, able to work with arrays directly, as PHP uses a hashtable (a zval array in the code). After some time I managed to write out some code for the binary search algorithm, to the best of the experience I have with the PHP source code at the moment.

/* {{{ proto bool binary_search(mixed needle, array haystack [, bool strict])
   Checks if the given value exists in the array using binary search */
PHP_FUNCTION(binary_search)
{
    php_binary_search_array(INTERNAL_FUNCTION_PARAM_PASSTHRU, 0);
}
/* }}} */

/* void php_binary_search_array(INTERNAL_FUNCTION_PARAMETERS, int behavior)
 * 0 = return boolean
 * 1 = return key
 */
static inline void php_binary_search_array(INTERNAL_FUNCTION_PARAMETERS, int behavior) /* {{{ */
{
    zval *value,                /* value to check for */
         *array,                /* array to check in */
         *entry;                /* pointer to array entry */
    zval result;                /* result of comparing two zvals */
    zend_ulong num_idx;
    zend_string *str_idx;
    zend_bool strict = 0;       /* strict comparison or not */

#ifndef FAST_ZPP
    if (zend_parse_parameters(ZEND_NUM_ARGS(), "za|b", &value, &array, &strict) == FAILURE) {
        return;
    }
#endif

    HashTable *arr_hash = Z_ARRVAL_P(array);  /* returns the hashtable of the array zval */

    int rc = -1;
    int low = 0;
    int high = arr_hash->nNumOfElements;  /* zend_hash_num_elements(array) also returns nNumOfElements */
    int mid;
    /* char *needle = malloc(Z_STRLEN_P(value)+1); */
    /* char *element = malloc(sizeof(char) * 1024); */  /* test char pointer to store each index */

    while (low <= high) {
        mid = (low + high) / 2;
        entry = zend_hash_index_find(arr_hash, mid);  /* returns zval from the hashtable based on the specified index */
        compare_function(&result, entry, value);      /* compares both zvals */
        if (Z_LVAL(result) > 0) {
            high = mid - 1;
        } else if (Z_LVAL(result) < 0) {
            low = mid + 1;
        } else {
            RETURN_TRUE;
        }
    }
    RETURN_FALSE;
}
/* }}} */

The binary search algorithm does not work at the moment, as it suffers from segmentation faults. When I realized that I was unable to go to a specified index in the HashTable as I would in an array (array[100]), I began to worry a little, since if I could not access specific elements in the array by element number, the algorithm would not work. After some research about PHP HashTables, I discovered the function zend_hash_index_find, which supposedly finds an element based on the specified index and returns that element. If the function behaves as I just described, implementing binary_search should be possible. Even though the function does not quite work yet, it is a good start, as I am able to access most of the variables (the HashTable is an exception and most likely causes the seg fault) and know which functions I should be using in my code.

Next Steps

Now that I have some experience with the PHP source code, I will continue my research into zval and Zend while continuing to work on my optimization. After I receive some more feedback on my post on the mailing list, I will proceed to create an RFC, since my contribution may be considered a large change or a new feature. I hope to have the binary_search function working a couple of days after this post, so that I can continue to refine it before I submit any changes upstream or submit a pull request. Once the function is complete, I will do an update post.

by ddmedinski at March 31, 2015 01:42 AM

Artem Luzyanin

[SPO600] Project – Alternative Path – Update #1

As I’ve mentioned before, I decided to update the spinlock file with a new structure and documentation. As the first few updates showed me, it’s not as easy as one would think. Although quite a bit of code repeats itself (hence the whole idea of restructuring: get rid of the repeating code and make the file easier to read), there are a lot of distinctions between platforms that are hard to address. For instance, one variable (which I assume is used for register definition) is defined as “unsigned int” for one platform, while it is defined as “unsigned char” for another. Also, depending on the result of a particular #ifdef statement, some platforms (for example “arm”) will execute different code for half of the whole cycle, so combining it with other #ifdef statements will be very nested, bulky, and hard to follow.

Documentation also proved to be a slight problem. Different platforms had comments in random places. Not only do I have to preserve each comment’s location, I have to make sure that it stays correctly linked to the correct platform code, which is a bit hard for a platform merge.

Despite these problems, the file becomes shorter and shorter! I hope the community will like it!

by lisyonok85 at March 31, 2015 12:38 AM

March 30, 2015

Hong Zhan Huang

SPO600: Project Perl Phase 2 – Perly Red Hot Ops

Coming off from last time, when I first dipped my toes into Perl for the sake of the course’s final project, I’ve since decided to focus my optimization on the Hot Operations route suggested by the community. This works out well, as it doesn’t require in-depth knowledge of how the Perl interpreter works; rather, it places much of the effort on profiling and benchmarking (which I’ve gotten quite a bit better at thanks to this course). To recap the idea of Hot Op optimization:

Profile for hot ops – There are certain operations in the Perl code that are located in the pp_hot.c file and these operations are the ones that are used most often. To again quote the to-do entry: “As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops, the ops that are most commonly used. The idea is that by grouping them, their object code will be adjacent in the executable, so they have a greater chance of already being in the CPU cache (or swapped in) due to being near another op already in use. Except that it’s not clear if these really are the most commonly used ops. So as part of exercising your skills with coverage and profiling tools you might want to determine what ops I<really> are the most commonly used. And in turn suggest evictions and promotions to achieve a better F<pp_hot.c>. One piece of Perl code that might make a good testbed is F<installman>.”

Now then, how do we go about doing Op-level optimization? Perhaps we first need to do some Op-level profiling.

During my IRC conversation with the community that led me to this route, it was brought up that on CPAN (the Comprehensive Perl Archive Network) I might be able to locate a profiling module that would do Op-level profiling. On that search I came upon the OpProf module, which would apparently do what I was looking for. Unfortunately, I wasn’t able to get it to build correctly with the given installation instructions, so that turned out to be a wash. At that point I decided to go back to my gprof profiling results and saw that I could reuse them for my new purpose.


As can be seen above, the gprof results show which Ops were called when running the installman script, which I still mainly used as my testbed. Some of these Ops are among those in the pp_hot.c file, which houses the most used Ops. By manually recording and reordering the results, I have a baseline on the hotness of each of the Hot Ops with respect to installman. The following is a snippet of what I found, and the full data can be had here.


I ranked the results in two ways: one based on the number of times a Hot Op was called, the other on the total % of execution time a Hot Op took. Looking at the results, it can be seen that not all the Ops in pp_hot.c were utilized during installman. In fact, only 19 of 37 Ops were used, which is a little over half of them. There were also some operations used that aren’t in pp_hot.c. At this point I’ve decided to forgo looking at those operations, but perhaps they could be considered for promotion into pp_hot.c as an alternative optimization route.
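The manual recording and reordering could also be scripted. Here is a rough Python sketch of the idea (the column layout is my assumption about gprof's flat profile, and the op names and sample numbers are made up, not the actual data):

```python
# Assumed gprof flat-profile columns: %time, cumulative s, self s,
# calls, self/call, total/call, name.  HOT_OPS stands in for the set
# of function names found in pp_hot.c.
HOT_OPS = {"Perl_pp_nextstate", "Perl_pp_const", "Perl_pp_entersub"}

def rank_hot_ops(gprof_flat_profile):
    """Return the Hot Ops ranked by %time and by call count."""
    rows = []
    for line in gprof_flat_profile.splitlines():
        parts = line.split()
        if len(parts) >= 7 and parts[-1] in HOT_OPS:
            rows.append((parts[-1], float(parts[0]), int(parts[3])))
    by_time = sorted(rows, key=lambda r: r[1], reverse=True)
    by_calls = sorted(rows, key=lambda r: r[2], reverse=True)
    return by_time, by_calls

sample = (
    "  5.10  1.00  0.51  120000  0.0  0.0  Perl_pp_const\n"
    "  9.30  0.49  0.93   80000  0.0  0.0  Perl_pp_entersub\n"
)
by_time, by_calls = rank_hot_ops(sample)
print(by_time[0][0])   # Perl_pp_entersub (highest %time)
print(by_calls[0][0])  # Perl_pp_const (most calls)
```

The two sorted lists correspond to the two ranking schemes described above.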

Back to what we have, given the above data I can now create two variants of the pp_hot.c file by shuffling the order of the Ops to match what I have. The two variants: hot_pp-callbased.c and hot_pp-timebased.c For comparison this is the original.

With these piping Hot Ops in hand it’s time to build a fresh batch of Perl and test it out. The testing platform this time will be Red and we’ll be using the 5.20.2 stable build of Perl5. Australia will follow in a later post.

hot_pp-callbased.c (gprof result)

What I found is that shuffling the Ops based on the number of times each was called led to a decrease in performance. Originally, the gprof result for installman on Red took a total of 51.84 seconds; using the call-based variant led to a result of 53.25 seconds. That’s a fairly significant slowdown, as it comes out to a 2.6% difference.

hot_pp-timebased.c (gprof result)

On the other hand, with the time-based variant we actually had a very slight increase in performance. The total time it took was 51.15 seconds. Compared to the original, that is about a 1.3% difference, which oddly enough is half of the call-based variant’s. A coincidence, maybe?


All in all, I’m rather glad that I was able to find some evidence that shuffling the Hot Ops can have an effect on performance. It turned out to be a decent proof of concept, I guess, haha. There are a couple of things to note, however:

  • The test bed is currently very narrow. I’m only using installman so the results and therefore the optimization is biased towards installman. This likely will not grant the same improvements to a general perl application.
  • Only about half the Hot Ops were ranked.
  • Only tested on Red.

Moving forward, I’ll need to expand the testbed of software to get a ranking of Hot Ops that better encompasses the needs of a general non-trivial Perl application. This will hopefully lead to a more accurate shuffling of the pp_hot.c file. Additionally, I’ll need to ask the community about the original order of the Hot Ops, to see if there is any significance to it before going full steam ahead.

Beginning training montage while on route to the final boss. Need to better organize my inventory. End log of an SPO600 player until next time~

by hzhuang3 at March 30, 2015 11:53 PM

Hosung Hwang

Creative Commons License Tagging Android Implementation for Image

In the previous posting, I tested the CCL license-tagging Java implementation on a Linux desktop. After several conversations and changes with Wen, I finally got a Java implementation that can extract, manipulate, create, and embed Creative Commons licenses in image files (jpg, png, tif, and gif) on Android.

I wrote a wrapper class, called CCXMPTag, that lets Android developers easily read/create/embed Creative Commons licenses in their projects. This class uses the Adobe XMP Library for Java CS6 and Wen Yu’s PixyMeta Android. This posting is a draft of the tutorial.

CCXMPTag source code will be uploaded to

Setting Creative Commons license

CCXMPTag t1 = new CCXMPTag();
t1.setNewValue(CCXMPTag.TAG_ATTRIBUTIONNAME, "Hosung Hwang");
t1.writeInfo("test.jpg", "test_out.jpg");

By creating a CCXMPTag, an empty CC license is created. Using the setNewValue method, a CC property is set. The following are the CC properties defined as static members. Although the arguments of setNewValue are Strings, using the static members protects against typos. They are defined as TAG_XXXX.

Member XMP Property

For the License property, there are six kinds of licenses, defined as LICENSE_XXXX.

Member License Name URL
LICENSE_BY Attribution 4.0 International
LICENSE_BYNC Attribution-NonCommercial 4.0 International
LICENSE_BYNCND Attribution-NonCommercial-NoDerivs 4.0 International
LICENSE_BYNCSA Attribution-NonCommercial-ShareAlike 4.0 International
LICENSE_BYND Attribution-NoDerivs 4.0 International
LICENSE_BYSA Attribution-ShareAlike 4.0 International

After setting the license information, writeInfo changes the XMP tag in test.jpg and makes a new file named test_out.jpg.

Getting Creative Commons License

CCXMPTag t2 = new CCXMPTag("test.jpg");
String a = t2.getValue(CCXMPTag.TAG_ATTRIBUTIONNAME);
String b = t2.getValue(CCXMPTag.TAG_ATTRIBUTIONURL);
String c = t2.getValue(CCXMPTag.TAG_LICENSE);
String d = t2.getValue(CCXMPTag.TAG_MARKED);

By creating a CCXMPTag with an existing file name, CCXMPTag reads the XMP from the image file. getValue returns the Creative Commons property value as a string.

Changing existing XMP tag

CCXMPTag t1 = new CCXMPTag("test.jpg");
if (t1.getValue(CCXMPTag.TAG_LICENSE).equals("")){
   t1.setNewValue(CCXMPTag.TAG_ATTRIBUTIONNAME, "Hosung Hwang");
   t1.setNewValue(CCXMPTag.TAG_ATTRIBUTIONURL, "");
   t1.writeInfo("test.jpg", "test_out.jpg");
}

By simply reading the XMP information from an existing image file, you can read and change values. Since writeInfo takes two arguments, license values read from one file can be set on another image file.

Other usages:

  • extractInfoFromXML can be used to parse XMP data from a String rather than from an image.
  • existValue can be used to check if it has a CC property.
  • deleteValue deletes an existing CC property.

by Hosung at March 30, 2015 11:08 PM

Jan Ona

SPO600 – Project Phase 2: Progress

To continue with my project, I used gitk to find the files that were previously modified for the issue SERVER-4745.

This issue modifies the following files: (link to git commit)


Based on this commit, I noticed that the getShardsForQuery() function I was talking about in my last post was non-existent in the commit and was the wrong file to modify. So I took a look at the latest MongoDB source to find the files.

Because this particular fix was made on Feb 10, 2012, the file structure of the master branch has changed considerably, with these files being changed by other patches. These changes made it difficult for me to find the proper files to modify.

Moving Forward:

With this problem halting most of my progress, I’ve decided to send a message to the creator of the patch to see if he has any idea where the files currently are. Unfortunately, as things stand, I will have to wait for a response before moving forward with the project.


To make better use of my time while looking for the files, I decided to look at the comments on the issue. From the comments, there seems to be one possible implementation for the patch: create a FieldRangeVector::matchesBetween( start, end ) function which will do the following:

>Iterate through the Chunks
  >For every chunk, do a binary search to each of the range's max and min
    >If the result of the searches is different, that means that there will be a match between them

This possible implementation changes the current complexity from O(R*log(C)+C) to O(C*log(R)), a win in cases where R >> C.
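My reading of that idea, sketched in Python with hypothetical names (bisect standing in for the binary search; the real FieldRangeVector logic would be more involved, and this sketch treats chunk bounds as exclusive):

```python
from bisect import bisect_left, bisect_right

def matches_between(range_bounds, chunk_min, chunk_max):
    """range_bounds is a sorted, flattened list of the [min, max]
    endpoints of R query ranges.  Two binary searches locate where the
    chunk's bounds fall; if they land at different positions, a range
    endpoint lies inside the chunk.  Landing at an odd index means the
    chunk sits wholly inside a range.  Either way: a match."""
    lo = bisect_right(range_bounds, chunk_min)
    hi = bisect_left(range_bounds, chunk_max)
    return lo != hi or lo % 2 == 1

# C chunks, each checked in O(log R): total O(C * log R).
ranges = [(5, 10), (20, 30)]            # sorted, disjoint query ranges
flat = [b for r in ranges for b in r]   # [5, 10, 20, 30]
print(matches_between(flat, 8, 9))      # chunk inside [5, 10] -> True
print(matches_between(flat, 12, 15))    # between the ranges   -> False
```

Iterating this check over all C chunks gives the O(C*log(R)) behaviour the comments describe.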

by jangabrielona at March 30, 2015 09:14 PM

Jordan Theriault

Traversing the Backlog

For this release, I traversed the backlog of Webmaker and added validation for a “counter block” editor. I investigated and marked reported issues, such as this one, that have been resolved. As with most moderately sized open source projects, there are a large number of issues which get resolved through unrelated updates, or are fixed by other contributors who were not aware there was an open issue on GitHub. Sometimes it’s hard to put words to a specific issue, and because of that, it’s difficult to find it among the project’s large list of issues. Sometimes the backlog needs some work to clear out solved issues.

If you’re attempting to get into open source development, confirming that issues do in fact exist is a very valuable contribution. Whether it’s a visual issue or one that only appears on certain devices, there is a lot of value in validating the backlog.

As mentioned, I also added validation to the counter block’s editor. When the user types a min that is larger than the max, both fields show an error underneath. This uses Vue’s v-if directive, which evaluates the “truthiness” of a statement and creates the error div as the user is typing. You can check out the pull request here.

For the next release, I will be focusing on Facebook’s React framework rather than Vue. While Vue is a great library for presentation and synchronous features, it has an API that leaves much to be desired. Further, React has a great deal more traction in today’s front-end development world. I will start by browsing GitHub’s trending section to find a suitable project to contribute to. I looked into contributing to React directly, but at this time all the bugs labelled as beginner bugs are already assigned and in the works. Therefore, beginning with projects that utilize React is the most logical start.

by JordanTheriault at March 30, 2015 08:00 PM

Gideon Thomas

Adapting to browserify

Over the past few days I have been working on fixing a small issue on Filer for release 0.5 for my open source class. I decided to take something tiny due to other course commitments and so that I can work on more fun issues for subsequent releases.

The issue I was working on was related to an incomplete code port we did for Filer when we switched from RequireJS to Browserify. With browserify, we should be able to use plain node.js syntax, and it will convert the code to work appropriately in the browser. Most of this was done a while ago; however, some remnants of RequireJS remained.

When we run tests for Filer, a feature implemented in the RequireJS days was the ability to specify, via the query string, the Storage Provider to use for the tests. Obviously this is only possible in the browser. I rewrote the code which does this so that we could use node’s URL module to extract the needed info. Unfortunately, it did not work. After some deep investigation, I found that browserify was not changing `global` to `window` for the browser build. That seemed strange, and I spent hours going through browserify documentation to figure out why.

Turns out I was being silly and didn’t look carefully enough at the file. It was still using RequireJS syntax, and the module was being passed `global` as a parameter, so browserify considered each reference to `global` inside that module as a reference to the parameter that was passed in.

In short, to fix my main problem, I had to change:

(function(global) {
  // code

to just

// code

by Gideon Thomas at March 30, 2015 04:31 PM

Yan Song

SPO600 Course Project Phase 1: Update Cont.

In » this post, we talked about the failed tests that occurred when our new MongoDB C driver installs were passed through make test:

2015/03/29 21:08:23.0924: [27881]:  WARNING:       client: Failed to connect to: ipv4, error: 111, Connection refused

2015/03/29 21:08:23.0924: [27881]:  WARNING:       client: Failed to connect to: ipv4, error: 111, Connection refused


Assert Failure: 0 == 1
tests/test-mongoc-client.c:97  test_mongoc_client_authenticate()
/bin/sh: line 1: 27881 Aborted                 ./$TEST_PROG -f -p -F test.log
Makefile:4209: recipe for target 'test' failed
make: *** [test] Error 134

From the error message, can we conclude that the C driver was expecting a running instance of the MongoDB server listening on port 27017 on localhost?

Using the precompiled, Linux 64-bit legacy version (why?), we did a couple of experiments on an x86_64 machine running Fedora 21. These experiments confirm that a running instance of the MongoDB server is sufficient to scaffold the tests.
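A quick way to check that assumption before running make test is to probe the default MongoDB port; here is a small Python sketch (the host/port defaults are our assumption from the error message above):

```python
import socket

def mongod_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if something is accepting TCP connections on the
    given port -- e.g. a running mongod instance."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ECONNREFUSED (errno 111, as in the log)
        return False

print(mongod_listening())
```

If this prints False, the test suite's connection attempts will fail with the same "Connection refused" warnings shown in the log.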

Unfortunately, this version of the MongoDB server is an x86_64-targeted build, so it’s not an option on red; it simply won’t run, because red is, as we know, an AArch64 machine. (In fact, when we try, we get “Exec format error.”) We need to find an alternative.

by ysong55 at March 30, 2015 02:06 AM

March 29, 2015

Justin Grice

SPO600 Update – Writing Assembly and Patches

As part of my SPO600 course, I am working on optimizing a portion of the PHP code base. As described in my previous posts on optimizing the zend_operators.h code, I’ve decided to write assembly code to improve two of its functions: fast_increment_function and fast_decrement_function.

I started by trying to figure out the purpose of the current x86 inline assembly code. The following is the inline code for the x86_64 architecture in fast_increment_function, with comments added to describe its functionality:

   "incq (%0)\n\t"                     //Increment a register
   "jno 0f\n\t"                        //Jump to flag '0' if not overflow
   "movl $0x0, (%0)\n\t"               //Move 0 into register
   "movl $0x43e00000, 0x4(%0)\n\t"     //Move 0x43e00000 into offset register
   "movb %1, %c2(%0)\n"                //Move register offset by ZVAL_OFFSETOF_TYPE into register
   "0:"                                //Flag '0'
   : "r"(&op1->value),
   : "cc");

After spending some time figuring out what the assembly instructions meant, I then had to find the similar instructions in AArch64 assembly. This proved more difficult than I originally thought, because almost every instruction has different syntax between the two types of assembly. It got even more difficult when I realized there are two different assembly formats just for x86: Intel and AT&T (GAS). This, by far, took up the majority of my time on this project.

After getting a rough idea of the ARM assembly language I proceeded to start porting the existing x86_64 assembly to AArch64 as shown below with comments describing functionality.

#elif defined(__aarch64__) && defined(__GNUC__) //Checks for AArch64 
        "add %0, %0, #1\n\t"          //Adds 1 to register
        "bvc 0f\n\t"                  //Jumps to flag '0' if not Overflow
        "mov %0, #0x0\n\t"            //Move 0 into register
        "ldr x3, [%0, 0x4]\n\t"       //Load value offset by 4 into register 3
        "mov x3, #0x43e00000\n\t"     //Move 0x43e00000 into register 3
        "ldr x3, [%0, %c2]\n\t"       //Load value offset by ZVAL_OFFSETOF_TYPE into register 3
        "mov x3, %1\n\t"              //Move value register into register 3
        "0:"                          //Flag '0'
        : "r"(&op1->value),
        : "cc",

There were a couple of key differences between the two. There is no increment instruction in ARM assembly, so I was forced to use add. There also didn’t seem to be a way to easily move an immediate value into an offset register in ARM, so I had to load the offset into another register and then move the value into that register. After seeing this, I’m beginning to think a performance increase may not be as noticeable over C code.

I repeated this code for the fast_decrement_function, with a few minor changes to decrement the value. I then proceeded to build the PHP package using make. The make process built all the files with no issue until it came to generating phar.phar, at which point it failed.

This issue has delayed my patch significantly as I would not want to introduce code that can’t even build correctly. I hope to get this issue resolved within the next day so that I can ensure performance improvements and proceed with the PHP RFC process to submit the patch.

A full copy of my work so far can be found on my GitHub, and the original file can be found on PHP’s GitHub. I will follow up with another post on the resolution to the memory issue.

by jgrice18 at March 29, 2015 09:25 PM

Liam Christopher Martin

SPO600 Project Phase 1 – PHP Project Selection & Analysis

This post documents Phase 1 of SPO600’s final project. There are three phases in total:

1. Identifying a Possible Optimization in the LAMP stack

2. Optimizing

3. Committing the Changes Upstream

Areas to Optimize

For Phase 1, I will be discussing my chosen topic, PHP.

This link is where I downloaded the PHP mirror from, if you want to download it through the website:

Commands used to build the version of PHP I will be testing on:

wget -P PHP/

tar xvjf php-5.6.7.tar.bz2

scp /home/lcmartin2/PHP/php-5.6.7.tar.bz2

./configure --prefix=/wwwroot --enable-so


make test

After being notified about a lack of architecture-specific code for AArch64 in many files that do contain that type of code for x86_64, I have decided to test with the gcc -Q option to determine whether any combination of compiler optimization flags can improve performance, or to detect a certain area in the code that would greatly benefit from some platform-specific code.

This brought me to comparing the gcc -Q -O# --help=optimizers command on both architectures to see if the results differed.

(The command used here searches through different optimization levels, counting the number of enabled and disabled optimizations when compiling.)

gcc -Q -O3 --help=optimizers | grep -c -F "[disabled]" > test2
gcc -Q -O3 --help=optimizers | grep -c -F "[enabled]" > test

// Results of Above Statements
[lcmartin2@red php-5.6.7]$ cat test
[lcmartin2@red php-5.6.7]$ cat test2
[lcmartin2@australia php-5.6.7]$ cat test
[lcmartin2@australia php-5.6.7]$ cat test2
gcc -Q -O2 --help=optimizers | grep -c -F "[enabled]" > ../ProjResults/O2ena
gcc -Q -O2 --help=optimizers | grep -c -F "[disabled]" > ../ProjResults/O2dis

//Results of above statements
[lcmartin2@australia php-5.6.7]$ cat ../ProjResults/O2ena
[lcmartin2@australia php-5.6.7]$ cat ../ProjResults/O2dis

[lcmartin2@red php-5.6.7]$ cat ../ProjResults/O2ena
[lcmartin2@red php-5.6.7]$ cat ../ProjResults/O2dis

gcc -Q -O1 --help=optimizers | grep -c -F "[enabled]" > ../ProjResults/O1ena
gcc -Q -O1 --help=optimizers | grep -c -F "[disabled]" > ../ProjResults/O1dis

//Results for above tests
[lcmartin2@australia php-5.6.7]$ cat ../ProjResults/O1ena
[lcmartin2@australia php-5.6.7]$ cat ../ProjResults/O1dis
[lcmartin2@red php-5.6.7]$ cat ../ProjResults/O1ena
[lcmartin2@red php-5.6.7]$ cat ../ProjResults/O1dis

gcc -Q -O0 --help=optimizers | grep -c -F "[enabled]" > ../ProjResults/O0ena
gcc -Q -O0 --help=optimizers | grep -c -F "[disabled]" > ../ProjResults/O0dis

[lcmartin2@red php-5.6.7]$ cat ../ProjResults/O0ena
[lcmartin2@red php-5.6.7]$ cat ../ProjResults/O0dis

[lcmartin2@australia php-5.6.7]$ cat ../ProjResults/O0ena


[lcmartin2@australia php-5.6.7]$ cat ../ProjResults/O0dis



You will notice that in every case “australia” has two more flags disabled than red, except with -O0, where it has only one extra. This difference leads me to believe that, with further tinkering with these options, a more effective set can be discovered. For phase two, combinations of flags will be tested against each other in a script to measure their effect.
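Counts alone won’t say which flags actually differ between the two machines. A small helper (hypothetical file names and function, not from the post) can list the flags disabled in one saved dump of gcc -Q -O3 --help=optimizers but not the other:

```shell
# Hypothetical helper: given two saved dumps of
#   gcc -Q -O3 --help=optimizers
# (one per machine), print the flags disabled in the second but not the first.
flags_disabled() { awk '$2 == "[disabled]" {print $1}' "$1" | sort; }

flag_diff() {
    # $1 = dump from machine A (e.g. red), $2 = dump from machine B (e.g. australia)
    flags_disabled "$1" > "${TMPDIR:-/tmp}/a.$$"
    flags_disabled "$2" > "${TMPDIR:-/tmp}/b.$$"
    comm -13 "${TMPDIR:-/tmp}/a.$$" "${TMPDIR:-/tmp}/b.$$"   # only in B
    rm -f "${TMPDIR:-/tmp}/a.$$" "${TMPDIR:-/tmp}/b.$$"
}
```

This would name the “two extra flags” directly, which is more useful than the grep -c counts when deciding what to tune in phase two.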

These experiments may instead lead me to the conclusion that, as a whole, the optimization flags are already as good as they need to be; in that instance I will research further into determining which specific functions may benefit from architecture-specific designs.

PHP offers a handy testing suite, so changes of this type can be made without concern of accidentally breaking things, allowing for a trial and error approach much appreciated when dealing with broad changes and a large platform. This will also be very beneficial for testing all optimization flags on all possible tests.

Why PHP?

PHP is an extremely established language, and finding information on it is not an unreasonable request, as it is well documented and well discussed in the community. Tutorials on installation and usage, as well as easily accessible and well-laid-out source code on Git, cut down on sifting through files to find what you are looking for. Additionally, PHP is a developing language, with new release versions coming out often, so some oversights in well-established functionality or new additions may have happened. More personally, PHP is a language that I foresee myself working with more in my career as a programmer, and knowing more about its inner workings could be beneficial to me.

Plans To Upstream

A drawback of what I said about PHP before: as a well-established language, contributing will be a bit of a hard task. With many people wishing to make contributions, feedback will be slow, if any is given at all. Even if my results are as I expect, with the possibility of making a couple of simple changes or implementing some platform-specific code that would improve optimization in a meaningful way, I would still have to work hard to get the contribution noticed. That being said, a contribution upstream may take several months, so it may not be accepted until after the conclusion of the course. Ideally I would like to have a well-documented submission and get at least some feedback before the end of the semester.

Full contribution guidelines are here:

-Liam Martin


by martinliam at March 29, 2015 09:14 PM

Thana Annis

Patch Submitted!

As is true with anything you try to do as a programmer, something always comes up at the last minute. In this case, as I was preparing to create my patch, I noticed that someone had come before me and fixed one of the lines that I had already fixed. It wasn’t included in the file yet, but the issue was open. So I had to remove one of my changes; luckily I still had something left that I could use.

My patch file is test_calendar.patch. What it does is test that a user can’t use the --locale option without --encoding specified. This was something that wasn’t covered when I ran the coverage tests in my previous post. It’s a small change, but at this point finding anything that I could change at all is like striking gold.

After I made my patch file using “hg diff > test_calendar.patch” and ensured that all the tests still passed, I ran the built-in command “make patchcheck” which checks if all the required files are present in your patch. I passed this too and the next step was to create an issue on their bug tracker. You can see my issue at Issue 23807. In addition to creating the issue I had to sign a contributor form to consent to the licensing agreement, which may take a few days to become active on my account. Right now, though, I’m playing the waiting game for my patch to get reviewed. Since it’s such a small change, I’m hoping it will be looked at soon. If nothing happens in the next few days I might check out IRC and ask someone to take a look. There was a developer that reviewed the patch that caused me a bit of trouble earlier in my post, so she might be a good person to ask, but I will leave that for another day.

by bwaffles91 at March 29, 2015 04:10 PM

Danylo Medinski

Quick Project Update

As my research continued, I began to speculate whether my decision to implement a binary search algorithm in the PHP search function included in array.c was wise. This is because a binary search algorithm requires the array being searched to be sorted. That is not a problem if the array is already sorted, but if we need to sort the array before the search, it may cancel out the potential performance gain of the function.

After a meeting with my professor, he mentioned that it might be possible to optimize the function using a binary search algorithm under certain circumstances. He recommended that I look for a ‘crossover point’, a term used to describe the point at which algorithms should be switched for performance. We also discussed the possibility of performing the sort within the binary search function in order to have more adaptable code. Though this may reduce performance, as mentioned before, we can avoid sorting every time the function is called by using a boolean argument that specifies whether the array should be sorted or not.
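A rough sketch of that crossover idea, in C since array.c is C. All names here (find_long, CROSSOVER) are illustrative, not PHP’s actual code, and the crossover value would have to be measured, not guessed:

```c
#include <stdio.h>
#include <stdlib.h>

/* Below this size a linear scan tends to beat binary search; the real
 * value is a tuning parameter to be found by benchmarking. */
#define CROSSOVER 32

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Returns the index of needle, or -1 if absent.  If do_sort is nonzero
 * the array is sorted in place first, so repeated calls on the same
 * array can pass do_sort = 0 and skip the sort cost. */
long find_long(long *arr, size_t n, long needle, int do_sort)
{
    if (n < CROSSOVER) {               /* crossover: linear scan for small n */
        for (size_t i = 0; i < n; i++)
            if (arr[i] == needle)
                return (long)i;
        return -1;
    }
    if (do_sort)
        qsort(arr, n, sizeof *arr, cmp_long);
    long lo = 0, hi = (long)n - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (arr[mid] == needle) return mid;
        if (arr[mid] < needle)  lo = mid + 1;
        else                    hi = mid - 1;
    }
    return -1;
}
```

Note that sorting returns indexes into the sorted order, which matters for a PHP in_array/array_search-style function that must report the original position; that bookkeeping is omitted here.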

It was also recommended that I get in touch with the PHP community through a bug tracker or mailing list, where I can get more feedback about my possible optimization. I have since started a discussion on the mailing list, but I have yet to receive any feedback; hopefully some will arrive soon.

by ddmedinski at March 29, 2015 07:18 AM

Yan Song

SPO600 Course Project Phase 1: Update

It turns out that I’m building (on red, an AArch64 machine running Fedora 21; currently the machine is running in little-endian mode) this single piece of software, failure by failure. Worse, I’m doing the whole thing mostly based on superstition. What? Superstitious software building? I know, after viewing the config.log files line by line zillions of times, that this type of build is backed by autotools: autoconf, automake, and libtool. Does that mean there’s no room for superstition?

Actually, “as far as my eyes can see,” superstition built up while I was trying to get a successful build. The buildup came to a climax after I got an error-free build the other day, using the following bash script:

# A bash script to automate
# the build of mongo-c-driver 1.1.2 from
# <...>
release=mongo-c-driver-1.1.2    # extracted directory name, implied by the checks below
cd $HOME/tmp
rm -rf *

wget ...    # the tarball URL did not survive aggregation
if [ $? -ne 0 ]; then
        echo "Download failed" 1>&2
        exit 1
fi
tar xvzf *
if [ ! -d $release ]; then
        echo "tar: failed" 1>&2
        exit 1
fi
if [ ! -d $HOME/$release ]; then
        mkdir $HOME/$release
fi
rm -rf $HOME/$release/*
cd $release
./configure --prefix=$HOME/$release

Everyone can see the script follows the standard procedure:

./configure --prefix=PREFIX

No tweak, no twist! And after running this same script seven times, I’ve gotten seven error-free builds in a row, back to back! You may ask, “Why seven?” Because it’s a prime number? Maybe. But I’m sure of the reason why seven: it’s my lucky number, and more importantly I like it because my birthdate has a 7 in it.

To be fair, there really is something rational behind the foggy superstition, and something interesting behind the familiarity of the script. I, somehow, decided to put the wget command in the script (see line 8 in the script); my reasoning was that I might be better off getting a fresh copy of the tarball. (I had used the same download before that.) But why did a new download work? Was I using a bad download, corrupted by, say, a cosmic ray hitting my WAP while I was running the very first wget? I should’ve verified my downloads, right? But how? There’s no MD5 checksum, SHA1 checksum, or anything like that on the download page. (It turns out they do have the SHA256 checksums here.)
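Given a published SHA256, future downloads can be verified before building. A small helper along these lines (the function name is mine, not from the post):

```shell
# Hypothetical helper: succeed only if the file's SHA256 matches the
# published value (second argument).
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}
```

Used in the build script right after the wget, e.g. `verify_sha256 mongo-c-driver-1.1.2.tar.gz "$published_checksum" || exit 1`, it would have answered the cosmic-ray question directly.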

So am I in good shape now and ready to move on? Guess what? With the intention of beefing up my confidence, I invoked the make test command, after a mysterious twitching of my fingers. Guess what? It reports:


Assert Failure: 0 == 1
tests/test-mongoc-client.c:97  test_mongoc_client_authenticate()
/bin/sh: line 1:  4685 Aborted                 ./$TEST_PROG -f -p -F test.log
Makefile:4209: recipe for target 'test' failed
make: *** [test] Error 134

The test failed! Now, the sore point is: how can I know, for sure, whether my error-free (according to make) builds are good or not?

by ysong55 at March 29, 2015 03:34 AM

March 28, 2015

Artem Luzyanin

[SPO600] Project, step 2, update #2

As I mentioned in my previous post, my plans to optimize the spinlocks of PostgreSQL for aarch64 were ruined by whoever had already implemented them. At the same time, I spent a long time trying to understand the file and see what was already implemented, and how. The file has a lot of repeating code, and not enough documentation describing the platforms and compilers already supported. As a result, I will be reworking the file with a new structure and documentation!

Example of what I mean: Let’s compare two platform-specific implementations:


#ifdef __i386__ /* 32-bit i386 */

typedef unsigned char slock_t;

#define TAS(lock) tas(lock)

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    __asm__ __volatile__(
        " cmpb $0,%1 \n"
        " jne 1f \n"
        " lock \n"
        " xchgb %0,%1 \n"
        "1: \n"
        : "+q"(_res), "+m"(*lock)
        : /* no inputs */
        : "memory", "cc");
    return (int) _res;
}

#define SPIN_DELAY() spin_delay()

static __inline__ void
spin_delay(void)
{
    __asm__ __volatile__(
        " rep; nop \n");
}

#endif /* __i386__ */


#ifdef __x86_64__ /* AMD Opteron, Intel EM64T */

typedef unsigned char slock_t;

#define TAS(lock) tas(lock)

#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    __asm__ __volatile__(
        " lock \n"
        " xchgb %0,%1 \n"
        : "+q"(_res), "+m"(*lock)
        : /* no inputs */
        : "memory", "cc");
    return (int) _res;
}

#define SPIN_DELAY() spin_delay()

static __inline__ void
spin_delay(void)
{
    __asm__ __volatile__(
        " rep; nop \n");
}

#endif /* __x86_64__ */

As you can see, pretty much the only difference is an extra line for the TAS_SPIN(lock) definition and some difference in the assembly of the tas(lock) implementation. With the majority of platform-specific implementations being so much alike, I would like to shorten the code by about 70% and have the checks made at the time they are needed. This will also make the flow of the code much easier to read, as there will be mostly one path, as opposed to multiple paths, which are a lot harder to follow.
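One way to collapse most of these near-identical blocks, on compilers that support it, is GCC’s legacy atomic builtins, which the compiler lowers to the right instruction on each platform (xchgb on x86, load/store-exclusive on aarch64). A hedged sketch, not the actual PostgreSQL patch:

```c
/* Sketch: a single intrinsic-based test-and-set path that could replace
 * many of the per-platform assembly blocks when __GNUC__ is defined.
 * Names loosely mirror s_lock.h; this is illustrative only. */
typedef unsigned char slock_t;

static inline int
tas(volatile slock_t *lock)
{
    /* __sync_lock_test_and_set returns the previous value:
     * 0 means we acquired the lock, 1 means it was already held. */
    return __sync_lock_test_and_set(lock, 1);
}

static inline void
s_unlock(volatile slock_t *lock)
{
    __sync_lock_release(lock);   /* store 0 with release semantics */
}
```

The hand-written assembly would then only need to survive for compilers and platforms without these builtins, which is exactly the fallback structure the file already has.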

Another thing I want to do for this file is add information to the documentation. As a student/developer, I would find it very convenient to read, right at the beginning of the file, what platforms the code has already been optimized for, what compilers are supported, and where compiler intrinsics versus platform-specific assembler are used. This allows for a quick read to see if something is missing or can be further improved.

Wait for “[SPO600] Project, step 2, update #3″ to see what I have done to the file!

by lisyonok85 at March 28, 2015 11:52 PM

[SPO600] Project, step 2, update #1

As I mentioned in step 1 of my project, I was planning to optimize the spinlock function of PostgreSQL with platform-specific code (in particular for aarch64). Unfortunately, due to the hard-to-read code, I hadn’t noticed the existing aarch64 implementation. Possibly this calls for a file documentation update, which I will discuss in a later update.

After a lecture, given on spinlocks, compiler intrinsics and atomic libraries, I was given an idea to rewrite the file entirely, to avoid the use of platform-specific spinlocks, and to switch them to compiler intrinsics. Further digging through the code showed that currently the code is checking whether it’s possible to use the compiler intrinsics on a specific platform, using them if it’s possible and if not, invoking platform specific code. If no code was written for a specific platform, a general code is used.

So the structure of the spinlock file is:

  •  check if spinlocks were requested at all: “#ifdef HAVE_SPINLOCKS”
    • if yes, check if gcc compiler will be used: “#if defined(__GNUC__) || defined(__INTEL_COMPILER)”
      • If yes, check the platform: “#ifdef __i386__”
        • Now, if the platform is expected to support (or not support) the compiler intrinsics, the code will only have one version of each of the two implemented functions (tas(lock) and spin_delay(lock)). In case there are options, it will check to make sure that the right one is used: “#ifndef __INTEL_COMPILER” or “#ifdef HAVE_GCC_INT_ATOMICS”.
      • After finishing with one platform, move to another. Currently supported hardware-specific implementations are for: i386, x86_64, ia64, arm, aarch64, s390 and s390x, sparc, powerpc, mc68000 and m68k (on Linux), vax, alpha, mips and sgi, m32r, sh.
    • If another compiler is used, go through the list of supported compilers and use the compiler intrinsics if it’s available. List of additional implemented compilers: UNIVEL, alpha, hppa, non gcc HPUX on IA64, AIX, SUNPRO, WIN32 and WIN64 compilers.
    • If no compilers can be used, it will stop with an error saying that “PostgreSQL does not have native spinlock support on this platform.  To continue the compilation, rerun configure using --disable-spinlocks.  However, performance will be poor.” So, essentially, we can’t use spinlocks on this machine.
  • If spinlocks weren’t requested, create fake implementations, which shouldn’t be called anyway since, again, no spinlocks were requested. My assumption is that they are created to keep the methods generally implemented.
  • And finally, if any required methods weren’t implemented in the above steps, they are assigned default values at the bottom of the file.

So the structure of the file is: 1) check the possibility of using compiler intrinsics as an alternative to spin-lock code; 2) if none are available, use a platform-specific inline-assembler spin-lock implementation; 3) if neither can be done, fall back to a generic implementation. Considering that this structure always utilizes the best available option (intrinsics, then platform-specific assembly, then generic), I have a strong opinion that this file cannot be reworked into a better spin-lock/alternative implementation; therefore the optimization of this file is already complete.
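The decision tree above can be summarized as a preprocessor skeleton (structure only; comments stand in for the real implementations, and only two platforms are shown):

```c
#ifdef HAVE_SPINLOCKS
#if defined(__GNUC__) || defined(__INTEL_COMPILER)
#if defined(__i386__)
/* i386: inline-assembly tas() and spin_delay() */
#elif defined(__aarch64__)
/* aarch64: tas() via gcc atomics when HAVE_GCC_INT_ATOMICS */
#endif
/* ...one block per supported platform... */
#else
/* non-gcc compilers: per-compiler intrinsics, or the
 * "rerun configure using --disable-spinlocks" #error */
#endif
#else
/* spinlocks not requested: stub implementations */
#endif
/* anything still undefined gets default values at the bottom of the file */
```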

Having said that, I still need to submit something for step 3 of the SPO600 project. What I will do, I will explain in my next post, “[SPO600] Project, step 2, update #2″.

by lisyonok85 at March 28, 2015 11:19 PM

Thana Annis

Change of Plans…

This post is coming a little late, but there has been a change of course for my project. A few days ago I started looking through the build files, and as I was reading I started to get this feeling of dread in the pit of my stomach. I think I was really naive in choosing my project, because I didn’t realize just how complex building a project the size of Python is. Not only is it very complex, it’s largely written in scripts, which is something I am a real beginner with. So it dawned on me that making any meaningful change to the build options would take an enormous amount of time, which is something I am running out of on this project. I am also worried that, with something so complex and with me being a completely new Python developer, there is an extremely low chance of any change ever being accepted upstream.

With the deadline looming over me, I have decided to go to Plan C in an effort to produce something at all to be pushed upstream. If you don’t remember from my first project post, Plan C is improving coverage of the unit tests. I know that this isn’t exactly optimization, but at this point it is the only thing I will be able to come up with.

The first step to doing this is to get an idea of the current coverage for Python. There is a coverage tool that will go through each library file and point out which lines have yet to be covered by the tests. It creates an HTML report where you can see a list of all the files and the details of the lines that need to be tested. The module I have decided to work on is calendar, which has a test coverage of 86%, with 54 lines not included in the testing. As you can probably guess, calendar will create a calendar, with options for language, encoding, months, and days, and can produce text or HTML output. The corresponding test file is test_calendar, and it is where I will be making the changes. Now, not having written unit tests before, or written in Python at all, I spent several hours going through the files and figuring out what everything is supposed to do. The coverage tool is nice, but I don’t think it’s entirely accurate, and I was only able to find a place to add 2 tests. Adding them reduced the number of uncovered lines to 53. A lot of the missing lines are part of main, and I don’t think they’re being read in properly to show that they’re being tested.

I have the final version of my file done and all the tests still work so I think it’s something that can be accepted by the community. The next step is getting a patch file out of it and making a new issue on their issue tracker to be reviewed. I’ll be working on that tonight and tomorrow, so stay tuned for my next post tomorrow night where I’ll let you know how it went and will also include the patch file so you can see what changes I’ve made.

by bwaffles91 at March 28, 2015 08:32 PM

March 25, 2015

Thana Annis

Control Benchmarks for Build Optimization

So after many hours trying to get The Grand Unified Python Benchmark Suite working, I kind of got it. On both the x86 and the aarch64 machine, I was getting a permission denied error when I tried to run the universal command. I decided to abandon ship and see if I could get any benchmarking tests to work. I dug down further and found some single benchmarking tests that I can run. The only advantage I see to running the command that I couldn’t get to work is that it can run all the tests and compare 2 different python installations with one command. I will be doing it manually since I don’t have much time to spend on this anymore.

So without further ado, here are the results of my benchmarking tests for my control python installation on both architectures. I ran each test 10 times on each system.

Control Benchmarks

All these results indicate that the aarch64 is performing slower than the x86 system. Now that I have some benchmarks to compare my results to I will start trying to optimize the build options. My first step will be to see the difference between the build on both systems. The aarch64 seemed to take less time to build for some reason so that’ll be a good place to start. I’ll have another python installation set up for my changes and I’ll run all the same benchmark tests after each build change to see if the aarch64 has improved and that the x86 hasn’t gotten worse.

by bwaffles91 at March 25, 2015 11:07 PM

Mohamed Baig

Working on PDF.js

So last week I started working on a bug for PDF.js. I spoke to the developers of the library, on IRC, and they advised me to concentrate on a different bug because this particular one was an edge case and not easily implemented. So they asked me to focus on something that would be immediately […]

by mbbaig at March 25, 2015 08:52 PM

Justin Grice

SPO600 Update – Removing Inline Assembly Tests

As part of the SPO600 project and my decision to optimize PHP’s zend_operators.h file, I wanted to get some accurate benchmarks of existing performance. This was accomplished through a mix of bench.php and micro_bench.php.

I initially wanted to test whether removing the inline assembly code for the zend functions outlined in my previous post affected performance. I did this by installing PHP 5.6.5 into a local directory on australia and running bench.php. After running the benchmark 20 times, it resulted in an average time of 3.098s to complete. I then modified the PHP zend_operators file by removing the inline assembly and rebuilt the files. When I repeated the tests with the new source, the average was 3.106s. The two values seemed close enough to be within the margin of error, so I wanted to test using micro_bench.php as well. With micro_bench.php the results were clearer: removing the inline assembly increased the total time by about 0.4s. This seems to indicate that performance decreases when the inline assembly is removed.

Overall, I’ve decided to approach the issue differently and add inline assembly code for AArch64 systems that reproduces what is currently written for x86 systems. Ideally, this should improve performance on those systems by about the same amount that removing the inline assembly decreased it on x86.

by jgrice18 at March 25, 2015 06:28 PM

Catherine Leung

On Code Documentation

There are two major problems that I see when I look at my students’ documentation for their code.

  1. no documentation … we are talking about not even having a header with their name on it here for assignments
  2. too much documentation: repeating all the code in comments!

Both of these are problems.  The second may not seem like a problem, but it in fact is.

The complete lack of documentation is just a matter of incompleteness.  Most students understand that when they don’t do it, they will lose marks.  They know they should, but for whatever reason they didn’t.  It’s like knowing you should exercise but you don’t… This post isn’t about that.  The complete lack of documentation is relatively simple to address.

This post is addressed to those that document everything.  I know students who do this are very proud of their work and they put a lot of effort in it.  I get that.  I’m not knocking that.  However, I would like to ask you to please please please please please stop doing it.  Don’t do it.  It isn’t better.  A bad comment is worse than no comments at all.

Here are a few commenting guidelines:

  1. please don’t repeat your code in your comments… your code is there… anyone reading it sees what it does if they need to know it.  “this is a for loop” is not useful.  “this loop runs 10 times” is completely silly.  “assign 5 to x”… please stop. please stop.
  2. I have found that comments become sacred over time.  Programmers do not hesitate to alter code… and yet comments… well, you can’t touch those (sarcasm).  So, with this in mind, I encourage you to write as few comments in the body of your code as possible!  Your code should be clear enough to read.  Unless you are doing something very, very tricky, don’t put a comment in the body of the code.
  3. Comments that are useful explain reasoning that may not be immediately obvious.  In my first programming job I was trying to hunt down a bug in a program and I saw this comment: “find first sunday in april and last sunday in october”… I could tell that’s what the code was doing… that wasn’t hard.  I then spent a day trying to figure out why they needed to do this and whether or not it was the cause of the bug.  What they did and how they did it was clear… the code said as much.  I couldn’t figure out why it was something that needed doing.  Finally, I realized the why and changed the comment to “adjust time calculations for daylight savings time”… to me this makes way more sense.  It doesn’t say how to do it (and the how has changed since then, as the time of year that daylight savings time comes into effect has changed).
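That anecdote can be compressed into a before/after sketch (hypothetical code, not the program from that job; the old pre-2007 North American DST rules are assumed):

```c
#include <stdbool.h>

/* Before: the comment restated the code.
 *     // find first sunday in april and last sunday in october
 * After: the comment states the reason the code exists. */

/* Adjust time calculations for daylight saving time: under the rules in
 * effect when this was written, DST runs from the first Sunday in April
 * up to (but not including) the last Sunday in October. */
static bool in_dst_period(int month, int day, int weekday)
{
    /* month: 1-12, day: 1-31, weekday: 0 = Sunday */
    if (month > 4 && month < 10)
        return true;
    if (month == 4)
        return day - weekday > 0;     /* on or after the first Sunday */
    if (month == 10)
        return day - weekday <= 24;   /* before the last Sunday (dates 25-31) */
    return false;
}
```

The function body is as readable either way; only the “why” comment saves the next maintainer a day of digging.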

Here are the three things you should document:

  1. Have a header!  State who wrote the code, major modification dates, versions, etc.
  2. State the intention of each function (not how, but what it is supposed to do), its parameters, expected return value, and restrictions.
  3. Document unclear variables, units of measurement, etc.

That’s it.  Let the code speak for itself.

by Cathy at March 25, 2015 05:04 PM

March 24, 2015

James Boyer

Project updates

Project Progress
Project Progress: PERL5
This is an update about my project progress in SPO600

Ruled out

In my previous post I mentioned three areas: tail call optimization, the regex super-linear cache, and inline assembly. After contacting the perl5 community on IRC, it seems that the tail call optimization portion should not be there, so it has been ruled out. I emailed the mailing list for suggestions regarding the inline assembly portion and any other ideas they had, but I have not received word back as of now. I have tentatively ruled out the assembly code unless further information comes up to suggest it has potential.

Moving on

In proceeding with this project I have not narrowed things down or progressed as much as I would have liked by this time; I am just spinning my wheels. For now I will work on understanding how I could improve the regex engine, while also exploring other packages to see if I can find some more straightforward things to accomplish. Ideally I would like to find places where I could use inline assembly optimizations or compiler intrinsics, both of which we have been looking at recently in the SPO600 course.

by James Boyer at March 24, 2015 09:56 PM

Maxwell LeFevre

Getting httpd Up and Running

In this post I will discuss the process I went through to get httpd up and running on our AArch64 and x86_64 machines. It turned out to be a little more challenging than I was expecting. I learned a bit about the default ./configure command, as well as the default firewall settings in Fedora.

Build and Install Process

The build and install process was identical for Red (AArch64) and Australia (x86_64). I installed httpd in a subfolder of my home directory. To do this I did a little reading about the configure command. It turns out that it has a default option that allows the user to select the install directory: --prefix=<path to desired directory>. Using this option generated a makefile with the appropriate install path fairly smoothly. The makefile for httpd was not as straightforward as most of the other makefiles I have worked with. It was lacking obvious places to add compiler and linker flags for enabling gprof, so I ended up wrapping gcc in a simple bash script as a brute-force way of appending the -pg flag to all compile commands.

#!/bin/bash
/usr/bin/gcc -pg -g "$@"

I then built httpd with make -j 8 install. The -j 8 option tells make to use 8 threads and speeds up the build time significantly. This configure/build/install process was effective and produced immediate results on my first try. Where I ran into trouble was trying to connect to httpd from outside of the local host, for the purpose of creating a load on the server so I could profile it.
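The wrapper trick above generalizes to any compiler and flag set. A hypothetical helper (not from the post) that writes such a wrapper:

```shell
# Hypothetical helper: generate a wrapper script that prepends extra
# flags before delegating every call to the real compiler.
make_cc_wrapper() {
    # $1 = wrapper path, $2 = real compiler, $3 = extra flags
    printf '#!/bin/sh\nexec %s %s "$@"\n' "$2" "$3" > "$1"
    chmod +x "$1"
}
```

The post’s script is equivalent to `make_cc_wrapper ~/bin/gcc-pg /usr/bin/gcc '-pg -g'`, after which configure can be pointed at the wrapper with `CC=~/bin/gcc-pg ./configure ...`.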

Connection Issues

Having compiled, installed, and run httpd on both Red (AArch64) and Australia (x86_64), both of which produced a gmon.out file, I thought I was ready to start profiling. I picked a website template from here, based on it having a variety of text, images, and links representative of a standard website, and downloaded it into /var/www/html/. Next, I naively opened my web browser and typed in Australia’s web address, expecting my site to show up. Instead I received a “Server not found” error. Unsure where to start troubleshooting, I decided to try to connect to the httpd server locally from the command line. My experience with dealing with web pages from the command line is limited to curl on a Mac and wget on Unix-type machines. I decided to try wget http://localhost/, which didn’t work: connection rejected. “Connection rejected” was a little more information than I had previously received, but I was unsure how helpful it was, because I am not familiar with what I can and can’t use wget for.

A quick web search showed that there are command-line web browsers, one of which is lynx. I installed it on Australia using sudo yum install lynx. Then I used it to try to connect to localhost again: lynx http://localhost/. It worked; I got the default web page, but not the one I was expecting. The page I did receive said it was there to confirm that the install worked and that I needed to configure httpd.conf. After taking a look at httpd.conf I couldn’t see anything that looked like it needed changing. Going back and reading the whole web page, I realized that my website was in the wrong place, so I moved it to /var/www/html/, and now lynx http://localhost/ shows what I expect. lynx http://urlThatIAmNotSharing/ also showed the correct page. Trying the web browser at home again still ended in rejection, though.

My experience with the Internet and failed connections is that you can almost always blame the firewall, so this would be my next area of exploration. Never having worked with networking in Fedora from the command line, the first thing I did was look on the Internet for ways to check whether a port is open. The first thing I tried was netstat -a, which gave me lots of information about connected and listening ports but was lacking anything obviously related to port 80. Next, the Internet told me to refine my netstat search with the options netstat -tulpn. From the netstat man page I now know that:

  • t – show TCP sockets
  • l – show only listening sockets
  • p – show the PID and name of the program each socket belongs to
  • n – show numeric addresses and ports instead of resolving names

The ‘u’ option shows UDP sockets, which I don’t need for httpd, so I dropped it. The results of running sudo netstat -tlpn are:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0    *               LISTEN      1345/sshd
tcp6       0      0 :::9090                 :::*                    LISTEN      1/systemd
tcp6       0      0 :::80                   :::*                    LISTEN      29325/httpd
tcp6       0      0 :::22                   :::*                    LISTEN      1345/sshd

The third line shows that httpd is up and listening on the right port, so now I am starting to feel confident that this is a firewall issue.

The Solution

After reading a few more questions on Stack Overflow about firewalls on Fedora, I came across the command iptables -L. The -L option lists all the rules in the firewall. I have extracted the relevant portion of the output from sudo iptables -L, which can be seen below.

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
INPUT_direct all -- anywhere anywhere
INPUT_ZONES_SOURCE all -- anywhere anywhere
INPUT_ZONES all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere
DROP all -- anywhere anywhere ctstate INVALID
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

The problematic line is the third one, “ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED”. The section that says “ctstate RELATED,ESTABLISHED” means that the firewall will only allow connections that are related to existing connections or are pre-existing connections. More information about iptables can be found here. To allow an HTTP connection on port 80, an exception to this rule needs to be added. This is done with the command

sudo iptables -I INPUT 4 -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT

This adds a new rule at position 4 allowing NEW TCP connections on port 80 (http). After running it, the table looks like this:

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
INPUT_direct all -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:http    <-- this is the line I added
INPUT_ZONES_SOURCE all -- anywhere anywhere
INPUT_ZONES all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere
DROP all -- anywhere anywhere ctstate INVALID
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

With this line added the http server is now accessible from outside of the local machine and I can move on with my profiling. The results of which will be the topic of my next post.


Website templates:
About iptables:

by maxwelllefevre at March 24, 2015 08:22 PM

Ryan Dang

Release 0.5

For release 0.5, I have been working on the issue ‘“Show in discovery” should be in settings panel’ for the mobile-appmaker project.
My task is to add a check box to enable or disable showing the app in the discovery tab. I used the existing toggle component to add the check box. The check box should be initialized depending on the “isDiscoverable” property of the app object. The app is updated if the user clicks Save, or reset to the initial state if the user clicks Cancel.

One of the issues I had was that the check box was not resetting to the initial state when the Cancel button was clicked. This happens because the settings panel is displayed using “v-show”: the page isn’t reloaded, so the toggle component never updated its data. I added a new function to the toggle component to update its data when the settings panel is shown.
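The idea behind the fix can be sketched roughly as follows. These names (createToggle, reset, save) are hypothetical, not the actual mobile-appmaker code, but they show the pattern: re-read the app's “isDiscoverable” property whenever the panel is shown, since “v-show” never triggers a reload.

```javascript
// Hypothetical sketch: a toggle whose state is re-synced from the
// app object instead of relying on a page reload.
function createToggle(app) {
  return {
    checked: app.isDiscoverable,
    // Called whenever the settings panel is shown (and on Cancel).
    reset() {
      this.checked = app.isDiscoverable;
    },
    // Called when the user clicks Save.
    save() {
      app.isDiscoverable = this.checked;
    },
  };
}

const app = { isDiscoverable: false };
const toggle = createToggle(app);

toggle.checked = true; // user ticks the box...
toggle.reset();        // ...then clicks Cancel
console.log(toggle.checked);     // false — back to the initial state

toggle.checked = true;
toggle.save();         // user clicks Save instead
console.log(app.isDiscoverable); // true
```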

The pull request is up here Add show in discovery in settings panel

by byebyebyezzz at March 24, 2015 11:54 AM

March 23, 2015

Gideon Thomas

Venturing into new territories

For our release 0.4 in our open source class, we decided to keep it short and abstract as a consequence of our unfortunate but inevitable decision to abandon (at least for now) our pursuit of embedding XMP metadata into images in Android. The goal of this release – to come up with a plan for the next few releases we have left for this class.

The task was surprisingly not easy. I initially intended to find a Java project on Github that I could contribute to so as to expand my Github repertoire with respect to programming languages. My efforts seemed to be futile. I searched for almost 4 days to find projects in Java that I possibly could contribute to…however, I came across many issues – either the project did not encourage contributors (many didn’t even have a contributing file) or the project had only a few bugs left (all assigned to core project personnel) or the issues were too complicated to qualify for beginner (or even intermediate imho) bugs or the project was an Android project (just NO!). Granted, I had a double agenda of contributing to high profile projects by companies like Facebook, Twitter, Google, etc., but I didn’t think it was going to be this challenging.

In the end, I was able to find issues for 2 of my 3 releases so far. For my first release, I decided to play it safe and work on a Filer issue. I still need to get permission from a fellow coworker of mine (who is currently assigned to fix the issue) to steal it from him. For my next release, I decided to choose something that was even more cutting edge. I will be working on sweet.js, a library by Mozilla, another high profile company. Sweet.js is a JavaScript library that lets you make use of macros, a feature JavaScript itself does not provide. I will be working on this issue that seems to indicate that there is a problem with operator precedence in macro substitution. I already have been given the green light to work on this :)

I still hope to find a Java project for my final release…hopefully it won’t be too hard. I might just ask around for suggestions which might lead me in the right direction!

by Gideon Thomas at March 23, 2015 03:59 AM

Klever Loza Vega

The Beginning of Something Awesome

New Task

Having completed our Brackets HTMLHinter extension, our team has moved on to a new task. The task involves stripping out the default editor from Thimble and replacing it with Brackets. Doing so gives us a lot more flexibility and enhancements than what Thimble currently provides.


For instance, we would have all the enhancements that Brackets currently has: inline colour preview, inline colour picker, inline image preview, element highlight, among other features. Adding a new feature would be easy, as it can be done via an extension. Our goal is to modify the least amount of Brackets source code possible. This way, if there is an update to Brackets, we can pull the latest changes and greatly reduce the risk of conflicts. If Brackets introduces a new feature, we would get it for free. Another benefit is the fact that there will be no need to download or install anything; everything would just work in the browser. In fact, it would work on any device that has a browser, including tablets and mobile devices.

Work So Far

The first part of the project is called Bramble and it’s a branch of the original Brackets code. Essentially, Bramble is a stripped down version of Brackets. One of the first things we did was to strip most of the UI from Brackets and leave just the editor. We created an extension (hideUI) to do this, again, to keep the original code intact. We also moved from using a web socket connection to using a postMessage one. We made two extensions (thimbleProxy and brackets-browser-livedev) that use postMessage connections. There was also back-end work done to have Bramble talk to Thimble and vice versa.

Work In Progress

An ongoing task is to try and make the project as small as possible. That way it would load faster and provide a smoother experience, especially for people with slower internet connections. Another work in progress is getting multiple files to preview properly. Currently, one HTML file previews properly. However, when more files are introduced some bugs are exposed. This is especially true with CSS and JavaScript files.

Moving Forward

Our team successfully replaced the default editor in Thimble for our Bramble project. The product is called Super Thimble for the time being. It’s currently being tested by our team. We hope to have Mozilla pick this up and possibly replace the current Thimble in the future.

by Klever Loza Vega at March 23, 2015 03:13 AM

March 21, 2015

Bradly Hoover

On the hunt

I have been on the hunt for the past few weeks, trying to find a piece of open source software to tinker with and increase its performance. Well, when I say the last few weeks, I mean I should have spent some time on it, but I did not. With the major project for CERN, and a presentation on that project coming up in the next 2 weeks, I have not put the time into my other classes, and my marks are going to suffer. That said, I have finally figured out what I am going to work on.

Createrepo is used to create the repository metadata for package managers that utilize RPM packages, such as YUM. It takes thousands of packages, goes into them, and pulls out the metadata for dependencies and such, so that when you run yum install package_name it installs all the dependencies you need in order to properly run that software. The reason I chose this package is that every time a repository must be created for a distro, or for a new platform, createrepo has to be run. Since it utilizes multiple cores and can be used with thousands and thousands of packages, it can cause your system to grind to a halt until it’s done.

Having downloaded, tested, and profiled createrepo with around 900 packages, I have found an area of focus that I think would benefit from some rewriting.

The results of the benchmarking were the following:

50703 function calls (149742 primitive calls) in 11.240 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2768    3.840    0.001    3.840    0.001 :0(write)

As you can see, this program writes to files. A lot. Roughly 3 times for each package being scanned.

This code snippet is one possible area for improvement.

for fh, buf in zip(files, xml):

So this is starting to look promising. I have picked a project, and picked an area of it to start to work on.
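As a rough illustration of the kind of rewrite that might cut down the write() count, the fragments destined for each file could be accumulated and flushed with a single write() call instead of one call per fragment. This is only a sketch under my own assumptions — the helper name and structure are mine, not createrepo's:

```python
import io

def write_fragments_batched(files, xml_fragments):
    """Group XML fragments by destination file and write each
    group with one write() call instead of one call per fragment."""
    buffers = {}
    for fh, buf in zip(files, xml_fragments):  # same pairing as the loop above
        buffers.setdefault(fh, []).append(buf)
    for fh, parts in buffers.items():
        fh.write("".join(parts))

# Illustrative usage with in-memory files standing in for the real ones:
primary, other = io.StringIO(), io.StringIO()
write_fragments_batched([primary, other, primary],
                        ["<pkg1/>", "<meta/>", "<pkg2/>"])
print(primary.getvalue())  # <pkg1/><pkg2/>
```

Fewer, larger writes generally mean fewer syscalls, which is exactly where the profile says the time is going.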

What is not promising is the community for createrepo. There does not seem to be one. The ticket system is full of spam. It looks like the way I am going to get some feedback from the maintainer is to email them directly.







by Brad at March 21, 2015 09:11 PM

Mohamed Baig

Moving on to something new. First Step PDF.js

So XMP part of the image embedding library did not work after several attempts with several different technologies by several different people. Now that project is abandoned for a newer technology that will be used later on. That technology will not be available for now so now I will be moving to solve bugs or […]

by mbbaig at March 21, 2015 03:45 AM

March 20, 2015

Andrew Benner

Long overdue Bramble update

Hello everyone! I realized today that I haven’t written a blog post in forever. So what the heck has the Bramble team been up to, you ask?! Well, since my last post we have been able to spawn an iframe beside the Brackets text editor pane to show a live preview of the rendered code. Now the editor and preview pane are 100% Brackets. This is what it looks like now.

Screen Shot 2015-03-19 at 7.54.04 PM

At the moment it’s running via Heroku and can be found at this link. If you’re interested, you can play around with this project in its current state and file any bugs in this repo.

Once this task was completed the real fun began. Up to this point, we were just trying to get a minimum viable product (MVP) up and running so we weren’t worried about the size/loading time. Since we had completed this goal it was time to trim the “fat”. We were loading multiple versions of the same library, unnecessary libraries, and web fonts were taking up an unreasonable amount of space. We went through the code and removed extra/unnecessary libraries and began planning our approach to remove web fonts. To help us accomplish this task the lead of the Bramble project, David Humphrey, has been interacting with the Brackets team via Github. They’re aware of the project we’re working on and have been able to give us some helpful hints/tips for our use case. How does this tie into removing web fonts?

Well, the Brackets team has been nice enough to take a patch that will help us swap out the web fonts without changing a bunch of the Brackets code. This helps Bramble maintain parity with Brackets. Here’s an example.

Screen Shot 2015-03-19 at 8.20.12 PM

Since Brackets accepted this change, to remove the declaration of the fonts and replace it with a variable, it makes it easy for us to just change the value of the variable instead of having to go through all of the code and find each instance you want to change.

This was a valuable lesson for me about Open Source projects. Most people that are part of the Open Source community are very friendly and willing to help. David took a chance by reaching out to the Brackets team and asking for help to make our project easier. They were super supportive, and the change they accepted into their code base wasn’t necessary for their project, but they took it anyway to help us out.

I’d like to give a big shout out and say thank you to the Brackets team for that! It just goes to show it’s worth reaching out and asking for help, the worst thing that can happen is they don’t respond or refuse to help. In that case, you’re still in the same place you were when you started so you have nothing to lose.

We have also been able to get the undo, redo, and size buttons hooked up to Bramble. Also, a neat tutorial feature that was previously a part of Thimble is now in working order. Here’s a sample of the tutorial feature implemented on Thimble.

Current things we’re working on are getting the publish feature to work and compatibility for multiple files. As you can tell, there was a lot to talk about since my last blog post ages ago. Thanks for reading and stay tuned for what comes next from us; I won’t wait another month before updating you :P

by ajdbenner at March 20, 2015 12:50 AM

Ryan Dang

The beginning of a new chapter

Previously, our team had been working on a Creative Commons project which would make embedding Creative Commons (or any other) metadata into images easy. However, the project did not go as well as we had expected. The technology we were trying to use is outdated and is not compatible with current technologies. We had to take a step back to think about how we should approach this project. Because of the time constraints, our team can no longer continue working on it while it waits on upper management for new direction. This project will be continued by Seneca CDOT in the near future.

For the remainder of the semester, I will continue working on the mobile app maker project which I had previously worked on.
The first issue I am trying to tackle is “Show in discovery” should be in settings panel #1267

I also made a suggestion for adding a delete button, Have a delete button beside open my app. I will work on it as my second picked issue if it gets approved

My last issue will be System back button should only exit the app editor if there’s no other state to dismiss

by byebyebyezzz at March 20, 2015 12:36 AM

March 19, 2015

Maxwell LeFevre

Selecting a Final Project for SPO600

The next project in SPO600 is to select a portion of the LAMP stack (Linux, Apache, MySQL, PHP) and improve its performance on AArch64 systems without reducing its performance on any other system. Having looked at the various options, I have decided to work with Apache, more specifically the HTTP Server Project. I also considered the ActiveMQ project, because I have some experience working with ActiveMQ, and I considered Python for the same reason. Originally I was planning on working with Python, but its community is much smaller and less responsive; I had gone as far as building Python on both systems with options enabled for profiling before changing my mind. I decided on HTTP Server because it is larger and because its documentation specifically states that they are looking for people to contribute portability patches, so I felt that I would have a higher chance of having my changes implemented quickly. Also, with the larger community, it should be easier to get help and feedback on my work.

Isolating a Section for Improvement

To select a section to try and improve, I was planning on installing HTTP Server on both Australia and Red, profiling them, and then comparing their results, looking for areas with a significant difference in performance between the two machines. Unfortunately, Red has been down, so I have been unable to build or profile anything on it since last week. As an alternative, I have been going through bug reports on the Apache Bugzilla page looking for opportunities to fix bugs relating to portability. Unfortunately I haven’t been able to find any optimization bugs; all the bugs relate to problems that break the program.

Potential Issues

One issue I have considered is that profiling might be a challenge because I will have to load the machine via the Internet, and I am not sure how to redirect HTTP requests to a specific version of the HTTP server, or whether that is even possible. I might have to stop the system HTTP server while I test mine, and this might interfere with the work of others. I am also not sure if I will be able to access Red via a web interface; I will have to talk to my professor about this.


I was really hoping to have completed some profiling work and isolated a section of the code to work with, but at this point I have been unable to do any meaningful work on our AArch64 system. Hopefully Red will be back up soon and my next post will have a lot more detail. In the meantime I will be collecting a variety of data on the x86 system so that I have something to compare my data from Red against.


Apache HTTP Server Project home page:

Apache HTTP Server Project patching info:

Apache Bugzilla page:


by maxwelllefevre at March 19, 2015 11:14 PM

Andor Salga (asalga)



I wanted to learn the basics of require.js and pixi.js, so I thought it would be fun to create a small game to experiment with these libraries. I decided to make a clone of a game I used to play: Nibbles. It was a QBASIC game that I played on my 80386.

Getting started with require.js was pretty daunting; there’s a bunch of documentation, but I found much of it confusing. Examples online helped some, but experimenting to see what worked and what didn’t helped me the most. On the other hand, pixi.js was very, very similar to Three.js, so much so that I found myself guessing the API and was, for the most part, right. It’s a fun 2D WebGL rendering library that has a canvas fallback. It was overkill for what I was working on, but it was still a good learning experience.

Filed under: 1GAM, Game Development, Games, JavaScript, Open Source

by Andor Salga at March 19, 2015 09:09 PM

Alfred Tsang

Danylo Medinski

Optimizing the Lamp Stack: Part 1

The Task

As part of the SPO600 curriculum, I have been assigned the task of identifying and implementing a possible optimization of the LAMP stack. The LAMP stack is a collection of software usually composed of a Linux operating system, an Apache web server, MySQL, and PHP. Since I have spent some time in the past messing around with PHP, I thought it would be a great idea to try to contribute there.


At first, finding an optimization proved to be difficult and time consuming. First I tried going back to the category of high-resolution counters by looking for a possible optimization of the PHP microtime function. The idea was that if I increased the resolution of the timer to nanoseconds, the microtime function would become more accurate, but this was not the case. Firstly, it wouldn’t really suit a function called microtime to record time in nanoseconds; secondly, a faster timer does not necessarily mean more reliable timestamps.

I was about to rethink which portion of the LAMP stack I would optimize when I remembered that poor or slow performance is often caused by too many iterations or recursions in loops.

Functions that perform operations relying on a loop, such as searching or sorting, can create slowdown if the supplied array is very large. Though it is impossible for loops to be consistently faster at every array length, it is entirely possible to have these loops find their target in fewer iterations, thus improving performance. This can be accomplished in many ways, such as culling, unrolling, etc.

I began to analyse the PHP array search functions, “in_array” and “array_search”. Both these functions call the php_search_array function located in the “php-5.6.6/ext/standard” directory in the official PHP source code.

static void php_search_array(INTERNAL_FUNCTION_PARAMETERS, int behavior) /* {{{ */
{
	zval *value,				/* value to check for */
	     *array,				/* array to check in */
	     **entry,				/* pointer to array entry */
	     res;				/* comparison result */
	HashPosition pos;			/* hash iterator */
	zend_bool strict = 0;			/* strict comparison or not */
	int (*is_equal_func)(zval *, zval *, zval * TSRMLS_DC) = is_equal_function;

	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "za|b", &value, &array, &strict) == FAILURE) {
		return;
	}

	if (strict) {
		is_equal_func = is_identical_function;
	}

	zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(array), &pos);
	while (zend_hash_get_current_data_ex(Z_ARRVAL_P(array), (void **)&entry, &pos) == SUCCESS) {
		is_equal_func(&res, value, *entry TSRMLS_CC);
		if (Z_LVAL(res)) {
			if (behavior == 0) {
				RETURN_TRUE;
			} else {
				zend_hash_get_current_key_zval_ex(Z_ARRVAL_P(array), return_value, &pos);
				return;
			}
		}
		zend_hash_move_forward_ex(Z_ARRVAL_P(array), &pos);
	}

	RETURN_FALSE;
}

I noticed that in this function, the target is searched for using the linear/sequential search algorithm. This means that the loop iterates through every index in sequence until the end of the array.

The linear search algorithm is not bad in terms of performance when searching a relatively small array, but performance drops when the target is searched for in a very large array.

Introducing Binary Search

To avoid going through every index in the array, we can halve the search space at each step based on a key value. This is called binary search, and it is a great algorithm for searching arrays that would otherwise hurt performance if searched linearly.

Binary search requires the array to first be sorted, in ascending or descending order. Then, instead of iterating through every index, we begin at the middle index of the array and compare that element with our key value. Based on this comparison we can eliminate the half of the indexes whose values are all less than, or all greater than, the target. This step is repeated until the remaining element is the value we are searching for.

For example, if we have an array of 100 indexes holding the numbers 1-100 and we want to search for 89, binary search goes to index 50 first. The program checks whether 89 is larger than the value there, and since it is, it eliminates indexes 0-49. This repeats with the middle now being 75; the search removes indexes 50-75. This continues until the remainder is either 89 or nothing. This effectively reduces the number of times the function needs to iterate, resulting in a quicker search.
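The difference in work can be made concrete by counting comparisons. A small sketch (in Python, purely for illustration) counts how many comparisons each strategy makes on a sorted million-element array:

```python
def linear_steps(target, data):
    """Comparisons a linear scan performs before hitting target."""
    for steps, value in enumerate(data, start=1):
        if value == target:
            return steps
    return len(data)

def binary_steps(target, data):
    """Comparisons binary search performs on sorted data."""
    left, right = 0, len(data) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if data[mid] == target:
            return steps
        elif data[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return steps

data = list(range(1, 1_000_001))
print(linear_steps(910_000, data))  # 910000
print(binary_steps(910_000, data))  # around 20, since log2(1,000,000) is about 19.9
```

The logarithmic comparison count is exactly why the timing gap below grows with array size.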

To benchmark the performance of binary search, I wrote a PHP script that tests the performance of both a simple linear search and a binary search. An array of 1,000,000 indexes containing the numbers 1 to 1,000,000 is first allocated and passed into the search functions. Then both search functions are called to search for the number 910000, with the microtime function recording the execution time.

function binarySearch($toSearch, $arrayToSearch) {
    $time_start = microtime(true);

    $left = 0;
    $right = count($arrayToSearch) - 1;

    while ($left <= $right) {
        $mid = (int)(($left + $right) / 2);

        if ($arrayToSearch[$mid] == $toSearch) {
            echo "Found\n";
            break;
        } elseif ($arrayToSearch[$mid] > $toSearch) {
            $right = $mid - 1;
        } elseif ($arrayToSearch[$mid] < $toSearch) {
            $left = $mid + 1;
        }
    }

    $time_end = microtime(true);
    return $time_end - $time_start;
}

function linearSearch($toSearch, $arrayToSearch) {
    $time_start = microtime(true);
    $len = count($arrayToSearch);
    for ($i = 0; $i < $len; $i++) {
        if ($toSearch == $arrayToSearch[$i]) {
            echo "Found\n";
            break;
        }
    }
    $time_end = microtime(true);
    return $time_end - $time_start;
}

After running the script multiple times for consistency, this was the result.

Linear Search
Linear search time: 0.067621946334839 seconds

Binary Search
Binary search time: 0.000121831893920 seconds

We can see that the binary search performed much faster than the linear search. On smaller arrays the difference in execution time is not noticeably large, but on large arrays binary search is clearly preferable to linear search.

For more information on binary search, check out this link.

A Course of Action

I will attempt to optimize how PHP searches arrays by replacing the original search algorithm of php_search_array with the binary search algorithm. One thing to keep in mind is that binary search only works on sorted data, so a linear fallback would still be needed for unsorted arrays. Worst case, if I am unable to contribute to the current php_search_array function, I will attempt to add my code as a new feature in array.c. I will essentially retain most of the original code alongside the implementation of a faster search algorithm. This should hopefully result in a performance increase.

Before I can begin coding and optimizing, there are certain steps I must first take. In one of my previous posts, where I analysed two platform-specific packages, I mentioned the rules for contributing to PHP. Since this may be a major change or a potential new feature, I will require:

  • An RFC and an authorized Git account
  • A fork of the GitHub repository
  • A bug-tracker entry for the patch/change
  • A discussion on the mailing list

Once these steps are complete, I shall start the implementation.


by ddmedinski at March 19, 2015 03:07 AM

March 18, 2015

Hosung Hwang

CCL license tagging Java Implementation for Image

In the previous posting, I did some research about XMP implementations. The Adobe XMP Toolkit was huge and hard to port to Android. As a pure Java implementation, the iCafe library was a good solution. However, it supports only extracting and inserting a stream at the tag position of XMP, EXIF, etc. To parse the stream, Adobe XMPCore was the standard solution.

iCafe library to PixyMeta

The iCafe library is an image processing library. As part of its functionality, there are image metadata handling features. I emailed the author of iCafe and asked him to make another repository containing only the metadata part. Thankfully, he sliced out the metadata-related part and made another GitHub repository called PixyMeta.

Creative Commons XMP License Specification

Creative Commons uses following properties in the XMP rights management schema.

  • xmpRights:Marked : False if Public Domain, True otherwise
  • xmpRights:WebStatement : URL containing metadata about the XMP-embedded file
  • xmpRights:UsageTerms : An optional field describing legal terms of use
  • license : The license URL
  • morePermissions : URL where additional permissions beyond the license can be obtained
  • attributionURL : URL to use when attributing this work
  • attributionName : creator’s preferred name to use

Inserting CCL license into JPEG XMP tag

I decided to make a sample that runs on the desktop first. The sample works in the following order.

  1. Extract XMP tag from image file
  2. Parse XMP
  3. Read CCL related information
  4. Add CCL license information
  5. Make XMP
  6. Put XMP to the image file

I mainly used the PixyMeta library’s test source for testing. After reading the XMP tag from a JPEG image, I parsed it using the Adobe XMP Core library. The sample image already had the xmpRights:Marked=False property, which means that it is “Public Domain”. Then I added more properties using Adobe XMP Core and put the tag back into the image using the PixyMeta library. The following is the core part of the source code:

FileInputStream fin = null;
FileOutputStream fout = null;

// PixyMeta part
if (metadataMap.get(MetadataType.XMP) != null) {
    XMP xmp = (XMP) metadataMap.get(MetadataType.XMP);
    Document xmpDoc = xmp.getXmpDocument();

    fin = new FileInputStream("images/bedroom_arithmetic.jpg");
    String str = DocumentToString(xmpDoc);

    // Adobe XMP Core part
    XMPMeta meta = XMPMetaFactory.parseFromString(str);

    XMPSchemaRegistry registry = XMPMetaFactory.getSchemaRegistry();
    registry.registerNamespace(XMPConst.NS_XMP_RIGHTS, "xmpRights");
    meta.setProperty(XMPConst.NS_XMP_RIGHTS, "xmpRights:WebStatement", "");
    meta.setProperty(XMPConst.NS_XMP_RIGHTS, "xmpRights:UsageTerms", "This work is licensed to the public under the Creative Commons Attribution-ShareAlike license verify at");
    meta.setProperty(XMPConst.NS_XMP_RIGHTS, "license", "");
    meta.setProperty(XMPConst.NS_XMP_RIGHTS, "attributionName", "Hosung Hwang");

    String strMeta = XMPMetaFactory.serializeToString(meta, new SerializeOptions().setOmitPacketWrapper(true));
    xmpDoc = stringToDom(strMeta);

    fout = new FileOutputStream("output.jpg");

    // PixyMeta part
    if (xmp.getExtendedXmpDocument() == null) {
        Metadata.insertXMP(fin, fout, XMLUtils.serializeToStringLS(xmpDoc, xmpDoc.getDocumentElement()));
    } else {
        Document extendedXmpDoc = xmp.getExtendedXmpDocument();
        JPEGMeta.insertXMP(fin, fout, XMLUtils.serializeToStringLS(xmpDoc, xmpDoc.getDocumentElement()), XMLUtils.serializeToStringLS(extendedXmpDoc));
    }
}

It worked well.
The following is the original image’s XMP information:
Screenshot from 2015-03-18 14:00:48
The following is the changed image’s XMP information:
Screenshot from 2015-03-18 14:01:28


  • The PixyMeta + Adobe XMPCore combination worked well on the desktop
  • The same test needs to be done on Android
  • PixyMeta includes the iCafe library; its size is 5.1 MB. A big portion of the library is CMYK/RGB image profile data, which I don’t think is necessary for metadata manipulation. After discussing with the author, that could be removed.
  • For CCL license manipulation purposes, a wrapper class that uses PixyMeta and XMPCore may be needed

by Hosung at March 18, 2015 06:17 PM

March 17, 2015

Hosung Hwang

[SPO600] Optimizing Python for AArch64 1 – Project Selection

This posting is about the selection and first stage of the SPO600 project. I chose Python.

Why Python?

My first profiling target was PHP. For PHP, it was hard to find the right way to profile; for example, gprof profiling was not easy. I tried to find a typical approach to performance profiling, but couldn’t find one, and I couldn’t get an answer on the mailing list.

Python is a widely used high-level language, used on both clients and servers. The language’s intention of being highly readable was attractive; it is called one of the easiest programming languages to learn. I wanted to look at how this language was designed.

In terms of contributor support, Python has tons of resources on this site. Also, for performance benchmarking, it has The Grand Unified Python Benchmark Suite, which is a collection of benchmarks for all Python implementations. Analysing this suite itself will be a helpful start to benchmarking, and using it together with gprof should be helpful.

gprof profiling

To do gprof profiling, configuration with profiling option is needed.

./configure  --disable-shared --enable-profiling

This option automatically adds the -pg option to the Makefile. At the end of the make it produces gmon.out; I guess it performs a test run. The following is the image generated from the test profiling.


The Grand Unified Python Benchmark Suite

For standard profiling, what I found was “The Grand Unified Python Benchmark Suite”. I didn’t fully understand how to use it, so I tried running one of its scripts with the python executable that was already installed. The performance output is:

benchmarks-1bd1437ea49b/performance/pybench$ python
* using CPython 2.7.6 (default, Mar 22 2014, 22:59:56) [GCC 4.8.2]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 2.390 seconds.
* Round 2 done in 2.340 seconds.
* Round 3 done in 2.280 seconds.
* Round 4 done in 2.317 seconds.
* Round 5 done in 2.277 seconds.
* Round 6 done in 2.300 seconds.
* Round 7 done in 2.436 seconds.
* Round 8 done in 2.364 seconds.
* Round 9 done in 2.355 seconds.
* Round 10 done in 2.702 seconds.

Benchmark: 2015-03-17 00:38:58

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Linux-3.13.0-46-generic-x86_64-with-Ubuntu-14.04-trusty
       Processor:      x86_64

       Implementation: CPython
       Executable:     /usr/bin/python
       Version:        2.7.6
       Compiler:       GCC 4.8.2
       Bits:           64bit
       Build:          Mar 22 2014 22:59:56 (#default)
       Unicode:        UCS4

Test                             minimum  average  operation  overhead
          BuiltinFunctionCalls:     47ms     50ms    0.10us    0.066ms
           BuiltinMethodLookup:     33ms     35ms    0.03us    0.077ms
                 CompareFloats:     44ms     47ms    0.04us    0.088ms
         CompareFloatsIntegers:     56ms     58ms    0.06us    0.067ms
               CompareIntegers:     44ms     47ms    0.03us    0.133ms
        CompareInternedStrings:     34ms     35ms    0.02us    0.335ms
                  CompareLongs:     34ms     35ms    0.03us    0.077ms
                CompareStrings:     31ms     32ms    0.03us    0.224ms
                CompareUnicode:     40ms     41ms    0.05us    0.170ms
    ComplexPythonFunctionCalls:     42ms     43ms    0.22us    0.111ms
                 ConcatStrings:     47ms     53ms    0.11us    0.133ms
                 ConcatUnicode:     43ms     50ms    0.17us    0.093ms
               CreateInstances:     39ms     44ms    0.40us    0.094ms
            CreateNewInstances:     31ms     34ms    0.40us    0.091ms
       CreateStringsWithConcat:     42ms     44ms    0.04us    0.221ms
       CreateUnicodeWithConcat:     29ms     30ms    0.08us    0.088ms
                  DictCreation:     32ms     34ms    0.08us    0.088ms
             DictWithFloatKeys:     43ms     45ms    0.05us    0.167ms
           DictWithIntegerKeys:     43ms     45ms    0.04us    0.223ms
            DictWithStringKeys:     35ms     39ms    0.03us    0.221ms
                      ForLoops:     24ms     28ms    1.11us    0.019ms
                    IfThenElse:     32ms     33ms    0.02us    0.168ms
                   ListSlicing:     46ms     47ms    3.33us    0.025ms
                NestedForLoops:     34ms     36ms    0.02us    0.010ms
      NestedListComprehensions:     39ms     40ms    3.33us    0.023ms
          NormalClassAttribute:     37ms     41ms    0.03us    0.113ms
       NormalInstanceAttribute:     28ms     31ms    0.03us    0.111ms
           PythonFunctionCalls:     37ms     38ms    0.11us    0.066ms
             PythonMethodCalls:     40ms     42ms    0.19us    0.035ms
                     Recursion:     49ms     50ms    1.00us    0.113ms
                  SecondImport:     44ms     45ms    0.45us    0.044ms
           SecondPackageImport:     40ms     42ms    0.42us    0.044ms
         SecondSubmoduleImport:     55ms     57ms    0.57us    0.044ms
       SimpleComplexArithmetic:     35ms     36ms    0.04us    0.088ms
        SimpleDictManipulation:     41ms     42ms    0.04us    0.120ms
         SimpleFloatArithmetic:     41ms     42ms    0.03us    0.135ms
      SimpleIntFloatArithmetic:     34ms     34ms    0.03us    0.149ms
       SimpleIntegerArithmetic:     34ms     34ms    0.03us    0.132ms
      SimpleListComprehensions:     34ms     34ms    2.84us    0.023ms
        SimpleListManipulation:     30ms     31ms    0.03us    0.144ms
          SimpleLongArithmetic:     29ms     30ms    0.05us    0.074ms
                    SmallLists:     37ms     38ms    0.06us    0.088ms
                   SmallTuples:     37ms     38ms    0.07us    0.100ms
         SpecialClassAttribute:     38ms     39ms    0.03us    0.110ms
      SpecialInstanceAttribute:     60ms     61ms    0.05us    0.113ms
                StringMappings:     52ms     56ms    0.22us    0.111ms
              StringPredicates:     46ms     50ms    0.07us    0.648ms
                 StringSlicing:     23ms     25ms    0.04us    0.188ms
                     TryExcept:     28ms     31ms    0.01us    0.166ms
                    TryFinally:     31ms     34ms    0.21us    0.089ms
                TryRaiseExcept:     28ms     31ms    0.48us    0.088ms
                  TupleSlicing:     41ms     45ms    0.17us    0.016ms
               UnicodeMappings:     31ms     33ms    0.92us    0.143ms
             UnicodePredicates:     34ms     37ms    0.07us    0.781ms
             UnicodeProperties:     43ms     47ms    0.12us    0.651ms
                UnicodeSlicing:     35ms     41ms    0.08us    0.166ms
                   WithFinally:     48ms     53ms    0.33us    0.090ms
               WithRaiseExcept:     60ms     65ms    0.81us    0.111ms
Totals:                           2245ms   2376ms

This output seems useful. However, the README.txt describes this benchmark directory (pybench) as an “unreliable, unrepresentative benchmark”.
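Since pybench's own README flags it as unreliable, a quick cross-check with the standard library's timeit module can be useful. The following is my own sketch, not part of the benchmark suite; the statement being timed is only an illustrative stand-in for one of the pybench micro-tests:

```python
import timeit

# Statement standing in for a pybench-style micro-test
# (roughly the spirit of SimpleIntegerArithmetic).
stmt = "a = 2; b = 3; c = a + b * a - b"

# repeat() returns one total per round; the minimum is the usual
# number to report, since it is the least disturbed by other load.
rounds = timeit.repeat(stmt, repeat=5, number=100000)
print("best of 5 rounds: %.4f s" % min(rounds))
```

Running the same snippet against both the system python and a freshly built one would give a rough comparison even when the full suite refuses to run.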

I then tried to use the python executable I built with the profiling option. However, this script didn't work with the latest version of Python. I tried to fix the script, but there were too many syntax errors; I suspect much of the syntax used in the script has since been deprecated or removed. Perhaps the python executable could offer an option to accept deprecated syntax.

The version difference is:

$ python --version
Python 2.7.6
$ /home/hosung/python/cpython/python --version
Python 3.5.0a2+


So far, I have chosen the project (Python), built it, and built it again with profiling enabled. I have not yet decided which part I am going to optimize. The next steps will be:

  • Figuring out which version of the source I need to use
  • Figuring out which profiling script I need to use
  • Getting profiling results from the AArch64 machine
  • Picking the part I am going to optimize

As a profiling tool, gprof seems appropriate because it is built into the configure options and I could easily get output from it.

by Hosung at March 17, 2015 02:44 PM

Thana Annis

Optimizing Python

For my project this semester I am planning on tackling Python. More specifically, I am planning on attempting to optimize the build flags for the aarch64 platform.

The reason I chose to work on Python was because I wanted to work on the “P” part of the LAMP stack, I think it is where I feel most comfortable (although I am not entirely comfortable in any of the sections since I don’t have too much experience). Inside the “P” part of the LAMP stack, I decided to work with Python because it’s a popular language with an active community, it’s written in C which I feel okay writing in and understanding, and because I’m curious about the language itself. And finally, I chose to work on the build flags mainly because it’s something I’ve never done before and it looks like it could be a potential spot for improvement. With any luck I’m not going down the wrong path and I’ll be able to learn a lot about how large projects are built.

In terms of benchmarking, I did some testing on both an aarch64 system and an x86_64 system. Unfortunately, the aarch64 server is not up right now, and I haven't seen it up all weekend, so I can't go into too many specifics. What I found after running 3 different scripts several times on each server was that the aarch64 system takes approximately 0.030s longer for each script that I ran. The aarch64 system was very consistent in its results, with little to no deviation on each run, while the x86_64 was faster but had a greater deviation of approximately +/-0.010s. Because of the consistent difference in performance, I'm going to look closer at the build options for aarch64 to see if anything can be done to increase performance. It was suggested in class, and it is something in the back of my mind while I work on this, that the difference in performance could simply be a result of the x86_64 machine being faster, and have nothing to do with the platform or build options. I will keep this in mind while I execute my game plan.
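The run-to-run deviation described above can be quantified with a short script. Here is a sketch using only the standard library; the workload function is a placeholder standing in for the real test scripts:

```python
import statistics
import timeit

def workload():
    # Placeholder standing in for one of the benchmark scripts.
    return sum(i * i for i in range(50000))

# One total per run; several runs expose how consistent the machine is.
samples = [timeit.timeit(workload, number=10) for _ in range(8)]

print("mean:  %.4f s" % statistics.mean(samples))
print("stdev: %.4f s" % statistics.stdev(samples))
```

A low stdev with a higher mean (like the aarch64 box) versus a larger stdev with a lower mean (like the x86_64 box) would show up directly in these two numbers.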

The Game Plan
My game plan right now is to read through the makefile more closely and try to get a stronger handle on the sections I need to work on. Then, as was suggested in class, I will try to create a script that changes the build options and runs benchmarking tests to see if I can find a good combination. Python has some official benchmarks and benchmarking tools that I will use for this, since I think the maintainers will be more open to accepting a patch that was proven using their own tools. My backup plan, in case I find out that the build options are perfect as is, is to scour the issue tracker and see if anything jumps out as an aarch64 problem, and look into solving that. My backup backup plan is to create a new test to help Python reach 100% test coverage, specifically testing some platform-specific code.
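The build-option sweep in this game plan could be driven by a small script along these lines. The configure/make calls are commented out because the flag names and the benchmark script are assumptions for illustration, not tested values:

```python
import statistics
import subprocess
import time

def time_command(cmd, runs=3):
    """Run cmd several times; return (mean, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical sweep: rebuild with each candidate flag set, then time
# a benchmark script with the freshly built interpreter.
for flags in ["-O2", "-O3", "-O3 -mcpu=native"]:   # illustrative CFLAGS only
    # subprocess.run(["./configure", "CFLAGS=" + flags], check=True)
    # subprocess.run(["make", "-j4"], check=True)
    # print(flags, time_command(["./python", "benchmark.py"]))
    pass
```

Recording mean and deviation per flag set would make it easy to see whether any combination beats the default on the aarch64 box.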

The Python Community
The community for Python is very active; there are nearly 15 issues that have been responded to in the last 24 hours. There is a lot of documentation to help get a change pushed upstream; the main document is linked here, and I followed the instructions there for building the source code. Basically, I will be pulling the source code using Mercurial, making the change, and then creating an issue, if one does not already exist, on their issue tracker. After that I'll need to wait for a core developer to review my change and decide if it's worthy of being promoted. There is an IRC channel and a mailing list for developers where I can ask for help or try to get my issue looked at more quickly. When my patch is accepted, it will either be added to the enhancements for the next release and/or pushed to previous releases to improve the performance of older builds that are still in use.

I’m looking forward to working on the code since contributing to an open source project is something I’ve wanted to do for a while now but haven’t really known where to start. Hopefully, I’ll learn a lot and get my changes pushed upstream.

by bwaffles91 at March 17, 2015 03:41 AM

Yan Song

SPO600 Course Project Phase 1

MongoDB, part of the ‘M’ layer of the LAMP stack, is interesting for its “document-oriented” approach to data storage and retrieval. The MongoDB ecosystem includes major pieces such as the core MongoDB server and drivers in various programming languages, C among them. In seeking a course project it is appropriate to take one's own skill set into account so that the choice is plausible and viable. Indeed, that is the reason why I have decided to work on some aspects of the MongoDB C driver.

After a few “false starts,” I hit upon an open issue on the Jira system for the MongoDB C Driver project. That issue reports a build failure on an ARMv7l system. (That's a lowercase ell; according to sources on the Web, ARMv7 is a 32-bit architecture.) The issue was reported by one of the (four) “core developers” on the MongoDB C driver project. This is awfully exciting: it could set off a chain reaction through the whole life-cycle of the C driver on our AArch64 system, from porting to optimization and so forth. Therefore, as my course project, I would like to focus on porting the MongoDB C driver to our ARMv8 system.

The community around the MongoDB C driver is fairly small: the project team consists of four core developers, one of them the project lead; there are also 27 contributors to the GitHub repo. No IRC handles for the core developers are available on their Jira system, so it seems email would be the preferred communication channel. In addition, no detail about the inner workings of processes like code review and patch acceptance is available publicly; as a result, one might safely assume that one can go the standard “pull request” way.

Currently, we know for sure that the identifier bson_atomic_int64_add appears in the header file mongoc-counters-private.h, whereas we don't know where the “offending” undefined reference to that identifier is. So we need to find that out as soon as our Red machine is up and running. Moreover, it would be desirable to build rapport with the core developers and/or the contributors on GitHub.


by ysong55 at March 17, 2015 03:12 AM

Hosung Hwang

SIRIUS – Open-Source Digital Assistant


Using SIRIUS, we can build our own Siri, Google Now or Microsoft Cortana. SIRIUS is an open-source, speech- and vision-based intelligent personal assistant service.

It recognizes speech and images, and gives answers based on web search.


Possible idea using SIRIUS

  • Using the image matching part for Creative Commons image searching (my current project)
  • Building it on a Raspberry Pi that has a video camera -> almost a Google Glass
  • A desktop application as an assistant (using voice recognition and screen capture)
  • A browser extension that calls SIRIUS to get information from an image or text

by Hosung at March 17, 2015 02:06 AM

March 16, 2015

James Boyer

Project Progress

Project Progress: PERL5
This post is about a project I am starting in my SPO600 class that requires me to optimize a portion of the LAMP stack.

Areas to optimize

I have chosen Perl 5 as the package that I will be working on and have found a few areas that may be good candidates for optimization.
I found the first two areas using the Perl todo list.
The first was tail call optimization:
Seen at line 1130.
This would essentially have me find places where tail call optimization is possible and rewrite them to implement it. Here is a link that explains what tail call optimization is: TCO

The second area I found concerns Perl's regular expression engine.
Seen at line 1154.
In their engine, certain regular expressions end up taking exponential time. They have a workaround for this called the super-linear cache, but they say the code has not been well maintained and could use improvement. I found the location of this problem in the source by grepping for the keyword 'super-linear', which led me to regexec.c.
This could be an area for optimization, although I am not very confident about how I would attempt it or what I would change, because I do not have a strong knowledge of how a regular expression engine works.

I found the final one by looking through the Perl 5 git repository (instructions here). Grepping for the keyword 'asm' with 'grep -r asm ./*' turned up some sections of inline assembly in a file called os2.c:
These functions could potentially be ported to AArch64 syntax. I am a bit uncertain about this area because I am not sure what this code does or whether it is important.

Why Perl?

I chose Perl for my project because the community seems really clear and organized. They have a todo list with various tasks, which is very useful; as you can see above, it helped me a lot with finding areas to work on. They also have a very active community: in their mailing list archive there are daily messages, which makes me confident that if I need help or have a question I won't be waiting for extended periods of time.


Looking at my 3 options, I believe the tail call optimizations might have a large impact, depending on how many areas I can find. I would like to implement some code involving the aarch64 platform, since that relates most to the SPO600 course, but I am uncertain about the inline assembly code I have found so far. The regular expression area seems really interesting, but I am afraid it would not be feasible in the time I have; it is something I will definitely consider if my project doesn't go as planned.
Proceeding with this project, I plan to work out how to apply the tail call optimizations while engaging with the upstream community about which direction is best for them and for me. I also plan on benchmarking Perl on x86 and aarch64 to see if I can find any further areas or functions that may allow a platform-specific optimization.

Perl Upstream

Perl has a relatively straightforward guide on their website here.
To summarize: if you have a patch, either use perlbug or send it in by email. Once the patch has been processed it will be posted on the mailing list for discussion; you are encouraged to join the discussion and promote your patch. They recommend using git. You can get the source with 'git clone git:// perl'. Once you make changes you can use git diff to make a patch, which compares your branch with the main branch.


This project has made me more nervous than any project I have had so far. It is filled with uncertainties: a couple of weeks ago I was not sure I would even find anything to work on, but eventually I did. Now I am uncertain about which direction to go and whether or not my contributions will be accepted. Regardless of what happens, it is a great learning experience, and I now appreciate the complexity of large projects like Perl and the other packages in the LAMP stack.

by James Boyer ( at March 16, 2015 10:54 PM

Hong Zhan Huang

SPO600: Project Perl Phase 1 – Project Selection and Analysis

We’ve come to a turning point in the SPO600 course, a sharp redirection straight into the course’s final project. There are three major components to this project:

  1. Identifying a possible optimization in the LAMP Stack
  2. Implementing said optimization
  3. Finally committing that change upstream

To put it in a few words, this is a challenging project!

Setting aside the onset of fear, anxiety and typical cold symptoms, a choice must be made. The choice of what exactly will I work with in order to get started with this project. After peeking into each letter that composes LAMP, I settled on the P. P for Perl to be exact.

Why Perl?

In my initial research into the software I looked into their to-do list of improvements and changes and was greeted with:

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

Immortality? Sign me up.

Amusing quips in the to-do and Tolkien quotes in their makefiles aside, I found that the documentation on Perl to be plentiful and detailed. Their online documentation contains tutorials and other helpful material to help those new to the language to get started. Most relevant to me is their perlhack section which details how the process of hacking Perl begins and the standard processes of doing so. There are also articles that explain how the interpreter works and a tutorial dealing with the creation of a simple patch for the core portions of Perl which are written in C. I am still in the middle of reading all this reference material but it is giving me a better idea on how to tackle the creation of an optimization patch.

Augmenting the well organized documentation is a seemingly active community of Perl hackers and maintainers. I have signed up for their mailing lists and have seen new posts and comments every day. Having also joined their #p5p IRC channel, I've seen regular conversations on the goings-on with the code base. I hope soon to participate in these back-and-forth chats as well.

About the patching

As this is what I'm trying to successfully do by the end of this project, let's briefly go over what it entails with regard to Perl (the full documentation on patching can be found here):

  • Patches can be submitted through the perlbug tracker or sent in by email.
  • After the submission goes through, it is placed on the mailing list for further discussion and review. Simple, non-controversial patches will likely be accepted without question, but otherwise more discussion and work on the patch will likely be required.
  • Patches need to follow the accepted Perl style conventions, which include rules such as “No C++ style (//) comments”.
  • Testing is required, typically using one of the testing suites already available in Perl.
  • Lastly, it is recommended to use git when creating a patch for Perl, as there is functionality for producing a patch specifically for Perl: “git format-patch will produce a patch in a style suitable for Perl. The format-patch command produces one patch file for each commit you made. If you prefer to send a single patch for all commits, you can use git diff.”

So what about Perl do we try to optimize?

The Perl to-do list provided a few opportunities explore in terms of optimization:

  1. Tail-call optimization – The idea is to replace the outer return call in situations such as return foo(…) with a goto to reduce the amount of overhead on the call and return which is apparently of a higher cost in Perl than in C. The proposed solution in the entry is to create a new pp_tailcall function which would handle these particular situations.
  2. Revisiting the regex super linear cache code – Look for cases where this code fails to match certain regex patterns. To quote the to-do entry “Perl executes regexes using the traditional backtracking algorithm, which makes it possible to implement a variety of powerful pattern-matching features (like embedded code blocks), at the cost of taking exponential time to run on some pathological patterns. The exponential-time problem is mitigated by the super-linear cache, which detects when we’re processing such a pathological pattern, and does some additional bookkeeping to avoid much of the work. However, that code has bit-rotted a little; some patterns don’t make as much use of it as they should. The proposal is to analyze where the current cache code has problems, and extend it to cover those cases.”
  3. Profile for hot ops – There are certain operations in the Perl code that are located in the pp_hot.c file and these operations are the ones that are used most often. To again quote the to-do entry: “As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops, the ops that are most commonly used. The idea is that by grouping them, their object code will be adjacent in the executable, so they have a greater chance of already being in the CPU cache (or swapped in) due to being near another op already in use. Except that it’s not clear if these really are the most commonly used ops. So as part of exercising your skills with coverage and profiling tools you might want to determine what ops I<really> are the most commonly used. And in turn suggest evictions and promotions to achieve a better F<pp_hot.c>. One piece of Perl code that might make a good testbed is F<installman>.”
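Option 1's idea (replacing a tail call with a jump back to the top of the function) is not specific to Perl. A sketch in Python, which like Perl does not do this automatically, shows the transformation; this illustrates only the general concept, not the proposed pp_tailcall function:

```python
# A tail-recursive function: the recursive call is the very last action,
# so its stack frame is never needed again.
def count_down(n):
    if n == 0:
        return "done"
    return count_down(n - 1)      # tail call: one new frame per step

# Tail-call optimization replaces that call with a jump ("goto"),
# which in Python terms is just a loop:
def count_down_tco(n):
    while True:                   # the jump back to the top
        if n == 0:
            return "done"
        n -= 1

# count_down(100000) would exhaust the recursion limit;
# count_down_tco(1000000) runs in constant stack space.
print(count_down_tco(1000000))
```

The saving is exactly the call/return overhead per step, which is why the to-do entry notes it matters more in Perl than in C.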

Of the three options above, I feel that 1 and 3 are the most feasible challenges to attempt (however, the tail-call entry was listed later in the list, and entries near the end of their sections are considered harder problems than the earlier ones). While I understand the idea behind the tail call optimization, finding the areas where it may be applied might take some looking. Profiling for hot ops seems the most straightforward, but finding operations that aren't already in the pp_hot.c file could prove to be an ordeal, as I'm still inexperienced in how the interpreter works. Optimizing code that involves regex seems to be far beyond my scope.

Profiling Perl

To be honest, I'm not certain how stable the above options are for use in this project or whether I'll be able to complete them. Given that, I set out to profile Perl from both a C and a Perl perspective to see if I could find additional pain points and further increase my pool of options.

From a C perspective:

Much like Lab 3, where I profiled Python, the exact same procedure is used to profile Perl. By building the binaries with an additional -pg option in the compiler and linker flags, we produce a gmon.out file which is then used with gprof to create profiling data. I used the installman Perl script (it installs the man pages for a build of Perl) as the basis for this profiling, as it is a non-trivial piece of code. Looking through the profiling data of this script on both Australia and Red, I found one section that had a disproportionate difference in the amount of time taken.

The image on the left is from Australia and the one on the right is from Red. As can be seen, Red's Perl_leave_scope shows a difference of around 4.26% in how much of the total time was spent in that particular portion of the code when running the installman script. This looks to be a potential point of interest when looking for places where some platform-specific optimization can be had. The portion of code where the Perl_leave_scope function is located is here. The full gprof profiling data for both Australia and Red can be found here. At the time of this writing I haven't fully grasped exactly how this function works.

[Images: Aus-Installman-Portion, Red-Installman-Portion]

From a Perl perspective:

Perl has a robust profiling tool called NYTProf, which I used to profile the installman script from the Perl side of the code to see what opportunities there are from this end. NYTProf analyzes the data and produces the results in HTML format for easy viewing. At the time of this writing I have not fully compared the results of the profiling between Australia and Red, but I'll make the profiling data available here for viewing. Very likely I'll make a follow-up post on this, but I'm not certain I will pursue this route for optimization even if the opportunity exists, as it would lean on more Perl knowledge than I presently have.
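For comparison, the Python standard library's cProfile does the same basic job as NYTProf or gprof: rank functions by the time spent in them. A minimal, self-contained sketch, with a workload invented purely for illustration:

```python
import cProfile
import io
import pstats

def hot(n):
    # Deliberately expensive: this should dominate the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total

def cold(n):
    return n + 1

def workload():
    for _ in range(50):
        hot(10000)
    cold(1)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the five functions with the highest self ("tottime") cost,
# analogous to the top of a gprof flat profile.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(5)
print(out.getvalue())
```

Whatever the tool, the output answers the same question being asked of Perl here: which functions are worth optimizing at all.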

Going forward

At present I think I have a few avenues of work to pursue, and potentially even more after analyzing the NYTProf results. I also intend to do another gprof-style profiling run with other scripts to see if there are pain points not present in the installman profile. The to-do list options are also promising and already desired by the community. In short, I think now is the time to pick the brains of the Perl community, see what they think about all this, and perhaps have them lead me down the best path.

There is still a good deal of reading to be done to get started, as a lot of the code in the guts of Perl is akin to mystic runes due to the Perl-specific conventions, functions and data structures. I think I will align with the optimization opportunities that involve C rather than not.

Quest accepted. On route to the final boss. Need to gather allies on irc. End log of an SPO600 player until next time~

by hzhuang3 at March 16, 2015 07:13 PM

Jan Ona

SPO600 – Project Phase 1: Identifying a possible Optimization

For the main project of my SPO600 course, each student was given the task of finding a possible optimization for software within the LAMP stack. For this, I've decided on MongoDB.

From among the possible software bundles, I found MongoDB's developer site to be really accessible; you can browse known issues with details pertaining to each one. An added benefit is that the software is mostly written in C++ (79.7%).

Initial Optimization: Optimize btree key generation:
This particular issue was raised to the community quite a while ago. It focuses on a problem where extracting a large amount of content (>10,000 elements) from a nested object field takes a performance hit.

After looking into this issue, I realized that it was set to be fixed in the March 16th, 2015 update, with a member already assigned to it. With less than a day to make the optimization and another user working on the problem, I decided to use my secondary option.

Optimization Option #2: ISSUE 4960: getShardsForQuery():
This optimization focuses on further optimizing an updated version of getShardsForQuery(). The issue seems to have been brought up because the patch creator thought there were a number of improvements it could undergo. The issue's comment list also contains some valuable information that might help in understanding how certain pieces work.

Moving Forward:

The biggest priority for me right now is to understand the pieces of the software and how they interact. Once I have a better understanding on how the software works, I plan on contacting the user that made the initial patch to get more information about the issue.

Upstream Community:

Details with the guidelines here.

There are 2 requirements for sending in a patch:
#1: Patches are sent via pull requests in Github. This pull request will be named after the SERVER ticket of the issue being patched.
#2: A contributor agreement must be signed. This should also include the Github username.


Due to the uncertainties of the project, I found this phase really difficult: it was hard to find software that I could handle. The learning curve of any given software is also very unpredictable; it is hard to know how much of it needs to be understood to properly make a contribution.

Much like with my initial optimization decision, the possibility that the community might take up the optimization before I make my own improvement is very concerning. But for the duration of the course, I hope to gain learning experience from this project.

by jangabrielona at March 16, 2015 05:29 AM

March 15, 2015

Catherine Leung

Using git and github in your classroom

I wrote this for the ACSE mailing list about a year ago and have passed it on to colleagues when they ask about using github in their classrooms. At the encouragement of another colleague, David Humphrey, I'm posting it to my blog to make it more accessible.

Using git and github for your classes

Firstly it is important to distinguish between git and github.

git is the open source distributed revision control system used by the Linux project. It lets you create a history of all your code. You can send git repositories around as zip files if you like.

github is a company that creates a nice place for you to put these repositories into the cloud so that you can much more easily sync and share these repositories without having to set up servers or send around giant archives.  It also provides things like access control, issue tracking, wikis, activity graphs and so on.  Very handy.

As this post is long, I’ll start with some links for quick reference on where to find/get stuff.  After that I have a section specifically for dealing with github, working together and git itself.

git client download:
git tutorial:
github gui:
tortoise git:

About Github

github has 2 types of accounts: individual and organization.
As a teacher, you will need to make yourself an individual account and an organization account. Any individual account can belong to zero or more organizations. For myself, I have quite a few different organization accounts: one for each of my courses and one for my research group.

It should be noted that both types of accounts are free to use if your repositories are open source. I have been using github as a place to put my in-class examples for students, and since that is open source I don't have to pay for it. I use the repository itself for code examples, and the wiki for assignments and course documents (minus the notes… I prefer gitbooks for those). I have not yet tried to do this consistently, but I think the issue tracker could also serve as a notification and discussion board. You can check out the repo for my data structures and algorithms course here:

github makes money by essentially charging for privacy, so under normal circumstances they would charge some amount for some number of private repositories. You can see their pricing plans here:

However, as an educator, what you can do is request an organization account with X number of private repositories at some discount. You can do so at this site:

Typically they will want info about your school, number of students and so on.  The number of private repositories depends on how many students and how you plan to use the repositories.  So for example, if you have 20 students and they work in groups of 4, then you will need 5 private repositories.  However, if you have 20 students and they are working individually you will want 20 private repositories.   If you want both team and individual repos for some group work and some individual work, then you would need 25.    I also suggest asking for a couple of extras for yourself or for class material that you don’t want to open source.  In any case what you want to do is set it up so that your organization can have some number of private repositories that will suit your needs.

Aside from setting up your own account, each of your students must create a GitHub account.  Once that is done, have them send you their GitHub IDs.  A free account is all they need; they don’t have to pay, because the private repos will be made available to them through your organization account.

Once you have created your organization, go to your organization settings and create teams.  A team consists of one or more individuals.  As the administrator of the organization you will be able to see all the repos, so if you forget to add yourself it’s no big deal.  Each team can also have different access permissions; typically I set up student teams with write access but not admin access.  If you are only planning on doing group work, set up one team per group with all of its members.  If you plan to do individual work, or a mix of team and individual work, you will also want a team for each student.

Once you have the teams set, you can create the private repositories.  You can initialize them with an empty README (or push some code up by following the instructions GitHub shows you).  Once those repos exist, you can give one or more teams access through the collaborators section of the settings tab.
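The “push some code up” step looks roughly like this.  This is a sketch: the organization and repository names (my-org, assignment1) are placeholders, and a local bare repository stands in for the GitHub remote so the commands can be run anywhere.

```shell
# Sketch of pushing initial code to a newly created private repo.
# "assignment1" is a placeholder name; a local bare repo stands in
# for https://github.com/my-org/assignment1.git so this runs anywhere.
set -e
remote=$(mktemp -d)/assignment1.git
git init -q --bare "$remote"

work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master      # use "master" regardless of git defaults
echo "# Assignment 1" > README.md
git add README.md
git -c user.name=Instructor -c user.email=instructor@example.com \
    commit -q -m "Initial commit"
git remote add origin "$remote"              # in real use: the github URL
git push -q -u origin master
```

With a real GitHub repo, the only change is that `origin` points at the repository URL your organization created.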
Only teams with access to a private repo can see that repo, so it’s a great way to let a group contribute to a shared project while limiting who sees what.  As a teacher, if each student actually commits their own work (rather than emailing it around and having one person push it), you can see their commits down to the individual line through the blame button.

About collaboration

Getting students to unlearn the code-collaboration habits they have built up over years is a definite challenge, so putting some best practices in place is a good idea.  Here is something I wrote up for the students on my research team to help them do this:

To ensure that we do not step all over each other’s code, these are the steps we use when putting code into the repository:

  1. All commits should be accompanied by an issue. If there is no issue for your commit, please create one first!
  2. All commits should go into a branch named after the issue number. Thus, if the problem the code solves is described in issue 2, the branch should be named issue2.
  3. Before pushing your code to GitHub, merge the most recent code on master into your branch and resolve any conflicts.
  4. Push your code to the remote issue branch. (If you prefer to work off your own fork, that’s fine, but use an issue branch on your fork.)
  5. Submit a pull request for the code you have pushed, but DO NOT merge that code yourself.
  6. Assign someone else to review your pull request.
  7. If you are assigned a pull request, test it on your local machine BEFORE accepting it. Do not just click the button. If there are any problems, comment to let the submitter know and have them fix it. If all goes well, accept the pull request to merge into master.
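The branch-and-merge part of the steps above (2 and 3) can be sketched as follows.  The issue number and file names are hypothetical, and a throwaway local repository is used so the commands are runnable as-is; step 4 would additionally need a real remote.

```shell
# Sketch of steps 2-3 above, in a throwaway local repo.
# The issue number (2) and the file being changed are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master      # use "master" regardless of git defaults
git -c user.name=Student -c user.email=student@example.com \
    commit -q --allow-empty -m "initial commit on master"

git checkout -q -b issue2                    # step 2: branch named after the issue
echo "fix for issue 2" > fix.txt
git add fix.txt
git -c user.name=Student -c user.email=student@example.com \
    commit -q -m "Fix described in issue 2"

git merge -q master                          # step 3: merge in the latest master
# step 4 would be: git push origin issue2    (needs a real remote)
git log --oneline                            # the issue branch now has both commits
```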

There are other methods out there, but you will probably need to invest some time teaching whichever method you choose.

About git and git clients

git is a revision control system and it is part of every Linux distribution, so if you are running Linux, you already have git installed (unless it is a really, really old release).  On OS X, git is installed as part of the Xcode command line tools (if I remember correctly, the first time I typed git at the command prompt on Mavericks, it installed those tools for me).  For Windows, you can download it from:

The above are command line tools for using git, and they are all you need for git and GitHub.
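Once installed, a quick sanity check plus the usual one-time identity setup looks like this (the name and email values are placeholders):

```shell
# Confirm git is on the PATH, then do the one-time identity setup.
# The name/email values below are placeholders.
git --version
git config --global user.name  "Your Name"
git config --global user.email "you@example.com"
git config --global user.name                # verify the setting took
```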

You can learn to use git with this interactive tutorial:

Everyone tells me that GUIs are awesome, so it is worth noting that GitHub provides its own GUI clients for git.

TortoiseGit is what one of my students used to swear by, so that may also be worth looking at.

I tend to prefer the command line myself, so I can’t speak to either of them, unfortunately.

Anyhow, I hope this is useful to others trying to figure this out.  Let me know what would make it even more useful for you.

by Cathy at March 15, 2015 06:52 PM

Justin Grice

SPO600 Project Part 1 – Initial Testing

As part of the SPO600 course, we were given the task of choosing a part of the LAMP stack that could be optimized and improving it. I’ve decided to review the PHP portion of the stack.

I initially decided to look for PHP code that contained optimizations for x86 architectures but none for ARM and AArch64. To do this I first had to retrieve the PHP source code. Following the PHP development documentation, I cloned the project’s git repository using ‘git clone'. I then built the source using ./buildconf, ./configure, and make. After spending some time resolving various build issues, I finally managed to install a working build into a test directory.

At this point I decided the easiest way to find existing x86 optimizations was to search for inline assembly code. Running grep -r asm ./* in the php-src directory produced a list of files containing some inline assembly. I decided to look at the Zend/zend_operators.h file, which contains a number of functions for performing math operations. The four functions containing inline assembly that I decided to work with are:

  • fast_increment_function
  • fast_decrement_function
  • fast_add_function
  • fast_sub_function

These functions perform either increment/decrement or addition/subtraction on values. By default they perform their operations on double-sized variables, but they contain inline assembly for the x86_64 and i386 architectures that operates on long-sized values. I decided that implementing this functionality in inline assembly for ARM architectures would be an ideal place to improve.

When researching the zend_operators file in the PHP bug tracker, I came across another person suggesting a similar improvement to this file two years earlier, but no changes had resulted from the discussion. Some community members had suggested removing the x86 inline code entirely. This led me to the alternate optimization of removing the inline assembly and testing whether the compiler’s own optimizations can do the job. I plan on using this method if writing inline assembly doesn’t net a performance increase for ARM systems.

My next steps are to write the inline assembly and benchmark it to see whether performance improves. If there are no improvements, I will start looking into build optimizations.  I also plan on communicating with upstream to see if they have any suggestions.

by jgrice18 at March 15, 2015 04:54 PM