Planet CDOT

July 28, 2016

Laily Ajellu

Using Multiple classNames for One Element - React

Adding several CSS classes to one element can be very handy: you can combine styles without duplicating code. For example, how can you implement a button that needs two classes? There are three methods:
  1. Using className={} and style={}

  2. Using cx(), an old React function that's now deprecated

  3. Using classNames(), a new function by JedWatson recommended by React

Using Only one className

You can use one className and add more CSS using the style attribute. But this puts CSS in two different places, which gets messy.

Using React's classSet - Now Deprecated

Although this version is deprecated, it's interesting to investigate how it works because it's still in some "legacy" code. (And by legacy I mean just a few years ago.)

We import the function cx, which takes as many class names as you would like to add. Simply pass the class names to the function.

What is 'styles'?

Notice we've also imported styles from ./styles.css.
This imports the styles object from a CSS file you've written. The styles object doesn't need to be explicitly declared. It contains all the classes declared in the CSS file, without needing to wrap them in anything.

If we use console.log, we find that styles.zoomForm is a long string that represents the path to the style.

We can even explicitly use the string to prove that it will work.

Using JedWatson's classNames - Recommended by React

As mentioned in the: React Documentation

This package is the most recent solution for class names and should be used by all React developers. Its usage is very similar to cx. Find the usage here: JedWatson classNames

The great thing about this is that you can use conditional classNames by passing an object (using key-value syntax) in which each key is a class name and each value is a bool.
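Here is a minimal sketch of the conditional form in plain JavaScript. To keep it runnable without installing anything, classNamesSketch below is a simplified stand-in written for this illustration; it is not the real package, which accepts more kinds of input. The class names button, active and disabled are hypothetical.

```javascript
// Simplified stand-in for the classnames package (illustration only).
// Accepts strings and { className: boolean } objects.
function classNamesSketch(...args) {
  const result = [];
  for (const arg of args) {
    if (!arg) continue; // skip undefined, null, '' and false
    if (typeof arg === 'string') {
      result.push(arg);
    } else if (typeof arg === 'object') {
      for (const key of Object.keys(arg)) {
        if (arg[key]) result.push(key); // keep the class only when its value is truthy
      }
    }
  }
  return result.join(' ');
}

const isActive = true;
const isDisabled = false;

const buttonClasses = classNamesSketch('button', { active: isActive, disabled: isDisabled });
console.log(buttonClasses); // "button active"
```

With the real package the call has the same shape, for example classNames('button', { active: isActive }).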

You can even store classnames in a prop and dynamically add them!

Here we're defining two custom props, color and circle, using propTypes and defaultProps.

In _getClassNames(), we first extract color and circle from this.props (so that we don't have to write this.props.color).

Then we define an object propClassNames with two attributes.

The first attribute is a css class styles.default (resolved from styles[color] - see defaultProps), set to true.

The second attribute is a css class, set to false by default (see defaultProps)

In render(), we extract className from this.props.

className is not a custom prop, it's an attribute that can be used on any tag like so.

Back in render(), we then return our component and pass three classnames, two from this._getClassNames() (default & circle) and one from className (prevSlide in this case). Giving our element all three styles!
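The walkthrough above can be sketched in plain JavaScript, without React, roughly as follows. Everything here is an assumption made for illustration: the styles object stands in for the one imported from ./styles.css, its generated class names are invented, and getClassNames and renderClassAttribute are hypothetical helpers that mirror _getClassNames() and the class handling in render().

```javascript
// Hypothetical stand-in for the object imported from ./styles.css.
const styles = { default: 'btn-default', circle: 'btn-circle', prevSlide: 'prev-slide' };

// Default values, mirroring what defaultProps would supply.
const defaultProps = { color: 'default', circle: false };

// Builds the { className: boolean } object described above.
function getClassNames(props) {
  const { color, circle } = props;
  const propClassNames = {};
  propClassNames[styles[color]] = true;   // always applied
  propClassNames[styles.circle] = circle; // applied only when circle is true
  return propClassNames;
}

// Joins the enabled classes with the className passed in from outside.
function renderClassAttribute(userProps) {
  const props = Object.assign({}, defaultProps, userProps);
  const propClassNames = getClassNames(props);
  const enabled = Object.keys(propClassNames).filter((name) => propClassNames[name]);
  if (props.className) enabled.push(props.className);
  return enabled.join(' ');
}

console.log(renderClassAttribute({ circle: true, className: styles.prevSlide }));
// "btn-default btn-circle prev-slide"
console.log(renderClassAttribute({ className: styles.prevSlide }));
// "btn-default prev-slide"
```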

For more tutorials of the different methods, see: Eric Binnion's Post and Randy Coulman's Post

by Laily Ajellu at July 28, 2016 05:32 PM

Introduction to ARIA for HTML

Why care about Accessibility?

Have you ever tried to use a website with your eyes closed, or with the screen turned off? You have no context of what is going on or what you’ve clicked. People with disabilities use screen readers - apps that read out the screen to you.

In the beginning it can be a nightmare of overlapping words and vague descriptions like “button”, leaving you with no idea what the button does. But a properly coded website labels its buttons and other components so you hear something like: “signout button” instead.

Isn’t that clearer?

Who is the Target Audience?

  • vision-impaired users

  • dexterity-impaired users

  • users with cognitive or learning impairments

How do I Start Coding?

ARIA - Accessible Rich Internet Applications provides a syntax for making HTML tags (and other markup languages) readable by screen readers.

The most basic aria syntax is using roles. Roles tell the screen reader what category the tag belongs to - eg. a button, menu or checkbox.

Using Roles

In HTML, use elements with a specific purpose. Don’t just use a div if you need a checkbox; a native element already has some accessibility features built into it.
eg. <input type="checkbox" role="checkbox">
not <div role="checkbox">


  • The role of the element is more important than the HTML tag it’s on

  • Do not change a role dynamically once you set it; this will just confuse your users

What’s Next? Establish Relationships

These are the aria attributes that establish relationships between different tags:
  1. Aria-activedescendant
  2. Aria-controls
  3. Aria-describedby
  4. Aria-flowto
  5. Aria-labelledby
  6. Aria-owns
  7. Aria-posinset
  8. aria-setsize

Aria-describedby & Aria-labelledby

  • Explains the purpose of the element it’s on
  • Most commonly used, and most useful
  • Create a paragraph tag with the label/description info and place it somewhere off the page

CSS recommended:

Code Example


Aria-activedescendant

  • Shows which child is active
  • Must be on a visible element
  • Must be someone’s descendant
  • Or must be owned by another element using aria-owns

eg. on a textbox inside combo-box

Code Example


Aria-controls

  • If you click or change the value of one element, it will affect another

Eg. If you click a button Add, number will be increased by 10

Code Example


Aria-flowto

  • Indicates which element to look at/read next
  • Doesn’t affect tab order
  • Only supported by Firefox and IE
  • Reads flowto only when you press the = key, so it’s not very useful
  • Can flow to more than 1 element

Code Example


Aria-owns

  • Indicates who the parent of a child is
  • Do not use if parent/child relationship is in DOM
  • A child can only have 1 parent

Code Example

Aria-posinset & Aria-setsize

  • Aria-posinset indicates the position of an item in a set
  • Aria-setsize gives the number of items in the whole set
  • Don’t use them if all the items of the set are already present (the browser calculates the values)

Code Example
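As a rough sketch of when these two attributes are needed, suppose only one page of a longer list is rendered, so the browser cannot work out the positions by itself. The helper withSetPosition and the slide labels are hypothetical, invented for this illustration; the helper only computes the attribute values and leaves rendering to you.

```javascript
// Returns ARIA attributes for each rendered item when only part of a
// larger set is present in the DOM.
// items:  labels of the items currently rendered
// offset: 0-based index of the first rendered item within the whole set
// total:  size of the whole set
function withSetPosition(items, offset, total) {
  return items.map((label, i) => ({
    label,
    'aria-posinset': offset + i + 1, // positions are 1-based
    'aria-setsize': total,
  }));
}

// Two slides are shown out of 40, starting after the first 10.
const visible = withSetPosition(['Slide 11', 'Slide 12'], 10, 40);
console.log(visible[0]['aria-posinset'], visible[0]['aria-setsize']); // 11 40
```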

Change Aria Properties Dynamically (except Roles!)

  • Eg. Aria-checked on the chosen checkbox
  • Keeps the user up to date with page changes

Make Keyboard Navigation Intuitive

  • Enable navigation using up and down arrow keys
  • Enable select with space and enter

Review of Aria Process

  1. Choose HTML tags that are more specific to your needs
  2. Find the right roles
  3. Look for groups and build relationships
  4. Use states and properties in response to events
  5. Make Keyboard Navigation Intuitive


by Laily Ajellu at July 28, 2016 03:50 PM

Matthew Marangoni

Accessibility and Key Handle Events

In many cases for both accessibility and convenience, a user should be able to navigate through a menu using only the keyboard. To meet ARIA specifications, this is a requirement as not all users are capable of using a mouse to make selections.

As stated in the ARIA documentation:

 "Navigating within large composite widgets such as tree views, menubars, and spreadsheets can be very tedious and is inconsistent with what users are familiar with in their desktop counterparts. The solution is to provide full keyboard support using the arrow keys to provide more intuitive navigation within the widget, while allowing Tab and Shift + Tab to move focus out of the widget to the next place in the tab order.
A tenet of keyboard accessibility is reliable, persistent indication of focus. The author is responsible, in the scripts, for maintaining visual and programmatic focus and observing accessible behaviour rules. Screen readers and keyboard-only users rely on focus to operate rich internet applications with the keyboard."

Within the settings modal, we use a menu that is always displayed to the user, and each submenu is set as an ARIA menuitem by giving it the attribute role='menuitem'. Within this menu, the arrow keys, spacebar and Enter do not function by default, so key handler methods must be added manually and called using onKeyDown within your menu list items.

This requires keeping track of two variables: the active menu and the menu currently in focus. The expected behaviour is for the down arrow to shift focus to the next submenu, the up arrow to shift focus to the previous submenu, and the spacebar/Enter keys to set whichever submenu is currently in focus as the active menu. Each time an up or down arrow is pressed, the focus menu variable must be incremented or decremented, and it is important to keep these variables within the bounds of the menu: you don't want to decrement the variable when up is pressed at the beginning of the menu, and similarly you don't want to increment it when down is pressed at the end of the menu. Instead, logic must be added so that the user cycles around to the other end of the menu.

Another thing to keep in mind when using both Tab and the arrow keys together is that the variables tracking the focus position must not fall out of sync. Tab automatically shifts focus to the next element in DOM order (or whatever is set by tabIndex), so unlike the down arrow handler it needs no extra code to move the focus itself. It is required, however, to add logic so that pressing Tab within the menu increments (or Shift+Tab decrements) the focus menu variable. Additionally, the user can tab out of the menu, whereas with the arrows they cannot. Finally, logic must be added to handle the case where the user tabs all the way through the settings modal and back to the beginning of the menu (or Shift+Tabs to the end of the menu); the focus variable should then be reinitialised to the start or end of the menu.
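The arrow and spacebar/Enter behaviour described above might be sketched like this. It is only an illustration of the index bookkeeping, not the code from the settings modal: createMenuState and handleKey are hypothetical names, the key strings follow the standard KeyboardEvent.key values, and Tab handling is left out.

```javascript
// Tracks the two variables: which submenu has focus and which is active.
function createMenuState(length) {
  return { length, focusIndex: 0, activeIndex: 0 };
}

// Returns the next state for a key press; unknown keys leave the state as is.
function handleKey(state, key) {
  const { length, focusIndex } = state;
  switch (key) {
    case 'ArrowDown':
      // wrap from the last submenu back to the first
      return { ...state, focusIndex: (focusIndex + 1) % length };
    case 'ArrowUp':
      // wrap from the first submenu to the last
      return { ...state, focusIndex: (focusIndex - 1 + length) % length };
    case 'Enter':
    case ' ':
      // spacebar or Enter makes the focused submenu the active one
      return { ...state, activeIndex: focusIndex };
    default:
      return state;
  }
}

let menu = createMenuState(3);
menu = handleKey(menu, 'ArrowUp'); // at the start, so focus wraps to the end
console.log(menu.focusIndex);      // 2
menu = handleKey(menu, 'Enter');
console.log(menu.activeIndex);     // 2
```

In an onKeyDown handler you would pass event.key to handleKey and then move DOM focus to the submenu at focusIndex.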

by Matthew Marangoni at July 28, 2016 03:03 AM

July 20, 2016

Laily Ajellu

How to pass Value to an Event Handler (React.js)


If you have a drop down menu and you want to call a function when a different option is selected (eg. slide 2), you can use the onChange attribute.

If you then want to pass that option's value (eg. 2), you don't actually have to pass it as a parameter. The value is passed automatically in an object called event.

event is the object that has all the information about the event, ie. the user chose another option. event has a property called target, which returns the element that triggered the event.

In this case, target refers to the select tag, so event.target.value gives you the value that was chosen (eg. 2)


You may be wondering what bind does and why we need it. bind ties together this and the skipToSlide method, so that this (when inside the method) always refers to the object that the method belongs to, no matter how the handler ends up being called.
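A small sketch in plain JavaScript, without React, shows both pieces: reading the chosen value from event.target, and binding the handler. The class SlideControls and the hand-built event object are assumptions made for this illustration; in the browser the event object is created for you.

```javascript
class SlideControls {
  constructor() {
    this.currentSlide = 1;
    // Fix `this` so the method still sees the instance when it is
    // passed around as a bare function.
    this.skipToSlide = this.skipToSlide.bind(this);
  }

  skipToSlide(event) {
    // event.target is the element that fired the event; for a select tag,
    // its value is the chosen option's value (always a string).
    this.currentSlide = parseInt(event.target.value, 10);
  }
}

const controls = new SlideControls();
const onChange = controls.skipToSlide; // what you would hand to onChange
onChange({ target: { value: '2' } });  // hand-built stand-in for the event
console.log(controls.currentSlide);    // 2

// Without bind, the bare function has no `this` and the assignment fails.
const unbound = SlideControls.prototype.skipToSlide;
let unboundFailed = false;
try {
  unbound({ target: { value: '3' } });
} catch (err) {
  unboundFailed = true;
}
console.log(unboundFailed); // true
```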

by Laily Ajellu at July 20, 2016 09:28 PM

July 18, 2016

Milton Paiva

Missing NTSYSV command

I am looking for alternatives to the ntsysv command, since after the introduction of systemd on Fedora, ntsysv doesn’t do the whole job anymore.

Suggestions are welcome.

by miltonpaiva at July 18, 2016 12:30 PM

July 14, 2016

Andrew Smith

Awesome Student Checklist

I vaguely remember there was a time when I had a one-page resume. I also remember that time didn’t last long. I’m baffled when I get resumes how many of them are two bullets long, especially in our industry (software).

Lots has been said and written about how it helps your professional career to work on open source (especially when you’re just getting started), so I won’t spend any time on that. What I want to rant about here is more broad than that.

How is it possible that someone (presumably at least 20 years old) has spent their entire life without doing any of the following?

  • Work on some interesting personal projects.
  • Join or start a club.
  • Participate in an interesting online community.
  • Volunteer.
  • Learn something you weren’t told to learn.
  • Try something you weren’t told to try.

I understand that as a student you don’t have relevant paid work experience. Of course you don’t, that’s why you’re a student. But really, you’ve done none of the above? And you expect me to give you a job? No thanks, I’d spend less time doing the work myself than I would holding your hand and telling you what to do.

Given the number of empty resumes I receive I have to wonder how many of them are actually good candidates who have done all kinds of interesting stuff but were told not to put any of it on their resume, cause it wasn’t a paid job. Let me tell you something – I don’t give a rat’s ass about how much money you’ve made in the past. I’m not hiring a CEO. I’m hiring engineers, who need to have interest in their field. I am looking for people who can think for themselves. I’m looking for evidence that you want to do this kind of work, and you have at least tried to do something independently. If you have that – at least there’s some hope that you’ll do well on my team. Without that – forget it, don’t bother applying.

As an example, here’s the type of resume I get excited about. Yours might be a page and a half instead of 4. But notice even with a decade of experience how much of my resume is various unpaid work. It’s all relevant! If from all that there are two things that jump out at an employer – that will put you two steps ahead of someone else.

There are lots of students out there. Show me what makes you awesome, and don’t pay attention to people who tell you to have a one-page resume. Those people don’t do any hiring. If they did – they would punch themselves in the face for giving such terrible advice.

by Andrew Smith at July 14, 2016 05:50 PM

July 12, 2016

Yunhao Wang

Pure CSS Cube




#wrap {


border:2px solid black;

transform:translateZ(-100px) rotateY(180deg);
transform:translateX(-100px) rotateY(-90deg);
transform:translateX(100px) rotateY(90deg);
transform:translateY(-100px) rotateX(90deg);
transform:translateY(100px) rotateX(-90deg);

@keyframes cube{

by yunhaowang at July 12, 2016 01:07 AM

July 07, 2016

Jaeeun Cho

Dropdown list for setting menu

I'm working on a dropdown list for the settings menu.
When a user clicks the settings button, the menu will be shown as a dropdown list, not a modal window.

So I googled how to do this in React and looked for examples.
I found lots of examples, and most of them are provided as packages like react-menu, react-dd-menu, rc-menu, and others.
I downloaded them to check how they work.
But every one that I set up on my test server gave me an error.
So I couldn't use them.

That time was a total waste.

Somehow, I did manage to show the dropdown when I click the button.
But the dropdown is shown at the bottom of the window.
So I'm studying animation in React
and figuring out the CSS with an example.

by Jaeeun(Anna) Cho at July 07, 2016 10:19 PM

Matthew Marangoni

Keyboard Navigation in Settings Modal using React

I've been working on the setting Modal to enable keyboard controlled navigation in the event a user cannot use a mouse. Tabbing controls have already been completed, but other key functions are still a work in progress. A user should be able to cycle through the list of menus using the arrow keys, and select that menu by then pressing the 'enter' or 'spacebar' keys.

At this time I am able to cycle through the menu with arrows, but this currently also sets the menu it cycles to as the active menu at the same time. I'm having difficulty making the arrows only focus on the menu list elements rather than make them active, which is a result of my limited React experience. Additionally, the spacebar and enter keys currently only have limited functionality in the settings modal - they work everywhere except the menu list.

I am currently studying the React docs to determine the best way to implement making these keys functional. The documents I have found to be most helpful so far are listed below:

by Matthew Marangoni at July 07, 2016 09:13 PM

June 30, 2016

Jaeeun Cho

Complete my PR and logout confirmation modal.

My PR was finally merged into the git repository yesterday. To my shame, I didn't know how to implement the code simply and precisely. I also made mistakes like missing spaces or semicolons, even though I checked my code with lint. I should learn by practising coding more and by reading others' code. And I should be more methodical in my work.

I implemented a confirmation modal for logout.
In the current code, a modal is already used for the Settings menu, so I tried to reuse the same function and style. However, the confirmation modal was shown behind the Settings menu, not in front of it, and I couldn't click any button on it. So I implemented another modal for confirmation.

This modal is shown when the Leave Session menu is double-clicked. The Settings menu items come from an array (submenus), so I compared by the class name of the menu, not by index, because the menu order can change.

Although my PR was merged, I'm going to keep testing my logout process.

by Jaeeun(Anna) Cho at June 30, 2016 09:38 PM

Matthew Marangoni

Style Conflicts with Screen Readers

As the settings modal nears completion, a few changes were made to better adapt to screen reader users. While adding screen reader descriptions to all the elements, it became apparent that some of the interact-able portions of the settings modal were redundant and could be removed. Some of these elements included the "Done" button of the Video menu (it served no apparent purpose - same function as other "Done" button in the menu), and the lock layout option in the Participants menu (the HTML5 client will not support user adjustable layouts).

The font size adjustment bar in the Application menu had to be reworked - the way it was currently styled was causing the font-bar to not fit perfectly within the Application submenu and added unnecessary scrolling as a result. The font-bar still has an issue with the way the + and - resize buttons are being styled, and it appears this issue can only be fixed by fixing the icon itself as well as the way the button element handles those icons. Currently unnecessary padding is being added around the span that contains the icon as this is the default behavior for all other buttons, but in this case does not match the design styling. See below:

 The menu options on the left of the settings modal, as well as some elements in the right menus require being reworked as well. Currently, they are written as unordered lists, with list items. Normally this would be fine, but because a screen reader will detect these list elements they will read out to a blind user as a list, which may confuse the user. As a result, any list items that are not actually meant to be lists must be converted into styled div containers instead. This is much more tedious than making a list, but allows for more control over the element behavior.

by Matthew Marangoni at June 30, 2016 03:57 PM

June 23, 2016

Matthew Marangoni

First week at CDOT

During my first week at CDOT, I completed various setup tasks. Initially, I had to salvage computer parts from other unused machines and assemble my own. This was fairly easy; the only difficulty was finding an adequate graphics card, as most machines were already missing them. Once I had my machine built, I had to format the HDDs and install Windows 7 (this took a considerable amount of time, as there seemed to be endless Windows updates, driver updates, and reboots), but it did eventually finish the following morning.

Once complete I was able to begin the initial setup of my VM environment, which involved tweaking a few NAT configurations and then installing Ubuntu 14.04 in the VM. All of this seemed to work just fine, so I proceeded with the BigBlueButton install. This install went reasonably smoothly but did take a day or so to complete. I followed the instructions from the BBB docs step by step, but there were a few slightly outdated steps which gave me unexpected results that were known to my colleagues, but not yet documented. Some portions of the BBB install took longer than others, so I used this time to review other documents (BBB HTML5 design, development, etc) to better familiarize myself with the BigBlueButton project. Following this, I then proceeded to set up the BBB developer environment with no issues.

Currently I am working on a few tutorials to get up to speed with the BBB project and working environments. I just completed the Meteor tutorial and am about to begin reading over a bit of the Meteor documentation. Later I will move on to the React, ES6, and Mongo tutorials and documentation.

by Matthew Marangoni at June 23, 2016 03:58 PM

Screen Readers & Browser Compatibility

After beginning to add ARIA labels and descriptions to the settings menu content, I soon ran into a problem where although all my attributes were being added to elements correctly and could be seen in the DOM, at best only partial information was being read back by the screen reader (at the time, I was debugging in Chrome using ChromeVOX). I decided to start with the bottom layer of the application which contained the settings button that opens the modal as it had the least amount of complexity. I was able to add ARIA features to this button easily, but any features I added into the settings modal were not being spoken back to the user.

My first thought was that aria-labelledby was working better than aria-describedby and that I could use one over the other (they function almost the same), but both would be needed regardless and still both were not working in a few places so this was not a solution. Later I thought perhaps my content wasn't being read because the referenced div elements were not in the right locations and weren't being seen, so I moved the containers around in various places through various files again to no avail. I then tried changing the CSS class for the containers which were to hold aria labels and descriptions to see if perhaps the way I was hiding these elements from view was causing it to go undetected by the screen readers. Still this did not fix the problem, although I was able to find a better method to hide content using CSS. That information can be found here:  and the CSS alone is:

.ariaHidden {

After days of rewriting my code and searching for solutions and best practices, I came to the conclusion that my code was not the issue, everything was following the aria standard guidelines correctly. I decided to try debugging using a different browser and different screen reader to isolate the issue and lo and behold, everything worked as intended on Firefox with NVDA. The issue all along was with the screen reader and browser compatibility.


As it turns out, Chrome has the worst ARIA support compared to Firefox, IE, and Safari (I'll be debugging in Firefox from now on), and while the ChromeVOX extension is nice, it's still very much a work-in-progress and falls short of other screen readers like NVDA and JAWS. If you'd like to see which browsers have the best ARIA implementation, this document does a good job of detailing and visualizing that.

Now that my content was being read back to the user, I could finally start making some progress with accessibility. I soon ran into a new issue however. Certain elements within the settings modal were reading out the word "section" multiple times in quick succession. I couldn't determine which elements these were coming from as I hadn't added the word section to any elements so I deduced that the screen reader was reading out empty div containers that were being used for styling only.

According to ARIA spec, the proper way to hide an element from the screen reader is to use the property aria-hidden="true". I tried implementing this in every surrounding div container I could think of where I was experiencing the issue, but once again nothing was solving the problem. Luckily I found an article that described the exact issue I was experiencing, along with the solution. Once again, this comes down to the issue where ARIA is not equally supported across all screen readers and browsers, and using Firefox with NVDA does not support the aria-hidden="true" attribute (ironically it would have worked fine with Chrome and ChromeVOX and I would have never realized this was an issue were I still debugging with it). The alternative is to set the role to presentation as such: role="presentation". ARIA describes the presentation role attribute as:

presentation (role): An element whose implicit native role semantics will not be mapped to the accessibility API.
The intended use is when an element is used to change the look of the page but does not have all the functional, interactive, or structural relevance implied by the element type, or may be used to provide for an accessible fallback in older browsers that do not support WAI-ARIA

This method also does not work for every browser and screen reader combination, so the best solution is to include both aria-hidden="true" and role="presentation" anywhere you will be using an element whose only purpose is to style the page. The article which details this problem and solution further, and provides many test cases can be found here:
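One way to avoid forgetting either attribute is to keep the pair in one place and merge it into whatever other attributes a styling-only element needs. This is just a sketch of that idea: DECORATIVE and decorative are hypothetical names, and the class name spacer is invented for the example.

```javascript
// The two attributes that together hide a styling-only element
// from screen readers across more browser combinations.
const DECORATIVE = Object.freeze({ 'aria-hidden': 'true', role: 'presentation' });

// Merges the decorative attributes into any other attributes for the element.
function decorative(attrs = {}) {
  return { ...attrs, ...DECORATIVE };
}

const spacer = decorative({ className: 'spacer' });
console.log(spacer['aria-hidden'], spacer.role); // true presentation
```

In a React component the result could be spread onto the element, for example a div written with {...decorative()} among its attributes.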

I will continue working on making the settings modal accessible, and documenting any issues I come across along the way.

by Matthew Marangoni at June 23, 2016 03:55 PM

Jaeeun Cho

My first pull request at git repository.

I have been working on the HTML5 client logout since the second week of June.

After finishing the implementation, I tried to send a pull request to the git repository last week.

However, I had a problem with the login part after I did git stash, fetch, and merge from the upstream/master branch.

According to the console log, the meeting room could not be created and the 'validated' column of Users was set to false. When a user logged out, the logout did not complete normally.

At first I could not find where the error was. However, when I compared against the files on my laptop, I found it. The error was in eventHandler.js under meeting.

Before merging with upstream/master, the code was:

eventEmitter.on('get_all_meetings_reply', function (arg) {

After merging with upstream/master, the code was changed to:

eventEmitter.on('get_all_meetings_reply_message', function (arg) {

This message name comes from MessageNames in akka-bbb-apps; the version on my machine and the version on the server were different.

After checking everything, I sent a pull request to the git repository for the first time.
I got a lot of comments on my PR and I fixed everything.

One of them was that I had created files unnecessarily.
I had split every function into a different file, especially the functions that clear the session and set the location when a user logs out.
So I put them together in Auth.

I also set up dev_1.1 of the bigbluebutton development environment.
And I'm implementing a confirmation box that opens when the user double-clicks "Leave session".

by Jaeeun(Anna) Cho at June 23, 2016 02:51 AM

June 15, 2016

Laily Ajellu

Adding Interactivity to React


When an emoji is chosen, it must be set to the user dynamically. Because it needs to be dynamic, we need to hook into the React lifecycle so that whenever there’s a change it can be updated automatically.

My class structure:

Menu - generic
        MenuItem - generic
EmojiMenu extends Menu
        EmojiMenuItem extends MenuItem

Problem 1 - Whose State? :

Whose state should be changed?
EmojiMenu (parent) or EmojiMenuItem (child)?
Initially I thought it should be EmojiMenuItem’s state (a bool, isChosen) so that it can manage its own resources. To change its state from within its parent’s method you need to use refs.

Solution 1 - It should be Parent’s State:

After reading this React Doc - More About Refs

“If you have not programmed several apps with React, your first inclination is usually going to be to try to use refs to "make things happen" in your app. If this is the case, take a moment and think more critically about where state should be owned in the component hierarchy. Often, it becomes clear that the proper place to "own" that state is at a higher level in the hierarchy . Placing the state there often eliminates any desire to use refs to "make things happen" – instead, the data flow will usually accomplish your goal.”

I realized that my initial choice was probably the wrong one. I could easily set the state for the parent ( theChosenEmoji : “nameOfChosenEmoji” ), simplifying the code.
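The parent-owned version can be sketched without React as follows. This is an illustration under assumptions, not the code from the project: EmojiMenuSketch, chooseEmoji and renderItems are hypothetical names, and the emoji names are invented. The point is that each item's isChosen flag is derived from the parent's single theChosenEmoji value, so no child needs its own state.

```javascript
class EmojiMenuSketch {
  constructor(emojiNames) {
    this.emojiNames = emojiNames;
    this.state = { theChosenEmoji: null }; // owned by the parent
  }

  // Click handler: the parent records which child was chosen.
  chooseEmoji(name) {
    this.state = { theChosenEmoji: name };
  }

  // Each item's props are derived from the parent's state on every render.
  renderItems() {
    return this.emojiNames.map((name) => ({
      name,
      isChosen: name === this.state.theChosenEmoji,
    }));
  }
}

const emojiMenu = new EmojiMenuSketch(['happy', 'sad', 'raiseHand']);
emojiMenu.chooseEmoji('happy');
emojiMenu.chooseEmoji('sad'); // choosing another emoji replaces the first
console.log(emojiMenu.renderItems().filter((item) => item.isChosen).map((item) => item.name));
// [ 'sad' ]
```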

Problem 2 and Solution 2 - Use refs this time:

Now I also wanted to add the attribute aria-checked to each EmojiMenuItem for accessibility. In this case, it was clear that refs need to be used because aria-checked is an attribute of the EmojiMenuItem tag.

In my render() function:

In my click handler:

by Laily Ajellu at June 15, 2016 08:35 PM

Matthew Marangoni

Settings Accessibility Features

I've been studying the various accessibility requirements - ARIA-related and otherwise, and have begun implementing some of these features into the settings modal. The main accessibility features that must be included are simple keyboard navigation, tooltips, and descriptions for each element that is detectable by screen readers (this also requires modifying a few elements to include ARIA menu attributes).

Initially I had thought navigating the options sub-menu with arrow keys would be the easiest method, however I later realized that this would also require the user to have good vision thereby making it no longer accessible. Instead the entire menu can be now accessed via the Tab key. I am currently working on an issue where the tab key is not ignoring the background elements and hope to have that fixed soon (I have isolated this to an issue with circle buttons only). Another issue that will be addressed is that the sub-menu list cannot currently be clicked via spacebar (however the contents can).

Once the above is complete, I will be adding descriptions for each options element that will provide a clear understanding of the elements' function for users who require a screen reader. The implementation for this requires adding aria-describedby and aria-labelledby attributes, and positioning the descriptive elements off-screen so they are not visible (this seems messy, but it appears there is no better alternative). Additionally, I plan for each element to have a tooltip shown on-focus and on-hover so that it will be displayed to both mouse and keyboard users simultaneously.

by Matthew Marangoni at June 15, 2016 06:53 PM

June 14, 2016

Jaeeun Cho

Working with a new computer, three weeks into CDOT.

Tuesday : I got a new desktop computer from my professor last Tuesday and started to assemble it. I assembled the computer and tried to install Windows, but it got stuck at the first step of the installation. I thought the computer was very slow at installing Windows, so it seemed to have some problems.

Wednesday : I reassembled my computer with parts from another computer and tried to install Windows again. Fortunately, it worked, and it spent the whole day checking for updates.

Thursday : My computer still hadn't finished checking for Windows updates when I turned it on, so I waited for it to finish.

Friday : I started to set up the BBB development environment on my new computer.
I changed "sudo npm install grunt-cli" in package.json.
Unfortunately, it showed an error when I ran "./" in the bigbluebutton-html5 directory.

Error was ..

  "react" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/user-list/chat-list-item/component.jsx (web.browser)
  "load-grunt-tasks" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/Gruntfile.js (web.browser)
  "react-dom" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/chat/message-list/component.jsx (web.browser)
  "react-router" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/user-list/chat-list-item/component.jsx (web.browser)
  "history" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/startup/client/routes.js (web.browser)
  "classnames" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/user-list/chat-list-item/component.jsx (web.browser)
  "underscore" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/button/component.jsx (web.browser)
  "react-intl" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/client/main.jsx (web.browser)
  "react-addons-css-transition-group" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/whiteboard/default-content/component.jsx (web.browser)
  "react-modal" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/modals/settings/submenus/SessionMenu.jsx (web.browser)
  "react-autosize-textarea" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/chat/message-form/component.jsx (web.browser)
  "classnames/bind" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/imports/ui/components/user-list/user-list-item/component.jsx (web.browser)
  "react-intl/locale-data/en" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/client/main.jsx (web.browser)
  "react-intl/locale-data/es" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/client/main.jsx (web.browser)
  "react-intl/locale-data/pt" in /home/firstuser/dev/bigbluebutton/bigbluebutton-html5/client/main.jsx (web.browser)

I then cloned my git repository again, and my console showed another error when I ran "npm install" as the instructions say.

Error was ..
  npm ERR! Error: EACCES, mkdir '/home/firstuser/tmp/npm-112557-pCIYTMRR'
  npm ERR!  { [Error: EACCES, mkdir '/home/firstuser/tmp/npm-112557-pCIYTMRR']
  npm ERR!   errno: 3,
  npm ERR!   code: 'EACCES',
  npm ERR!   path: '/home/firstuser/tmp/npm-112557-pCIYTMRR' }
  npm ERR!
  npm ERR! Please try running this command again as root/Administrator.
  npm ERR! System Linux 4.2.0-27-generic
  npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install"
  npm ERR! cwd /home/firstuser/dev/bigbluebutton/bigbluebutton-html5
  npm ERR! node -v v0.10.25
  npm ERR! npm -v 1.3.10
  npm ERR! path /home/firstuser/tmp/npm-112557-pCIYTMRR
  npm ERR! code EACCES
  npm ERR! errno 3
  npm ERR! stack Error: EACCES, mkdir '/home/firstuser/tmp/npm-112557-pCIYTMRR'
I tried to set up the environment on my laptop on Sunday.
My laptop had the same problem, so I googled for a solution.

Finally, I found the solution.
"npm install" was not correct.
"sudo npm install" is correct.

How stupid I am!! OMG!!

by Jaeeun(Anna) Cho at June 14, 2016 03:14 PM

softlock problem in VMware

When I turned on VMware, it showed "soft lockup - CPU#1 stuck for 23s".

I forced a shutdown, and after that VMware seemed to work correctly.
Unfortunately, my VMware could not find an IP address for my remote server.

I tried

sudo service networking restart
sudo /etc/init.d/network restart
sudo service network-manager restart

sudo ifdown eth0 && sudo ifup eth0

But this command showed "No DHCPOFFERS received." and "No working leases in persistent database - sleeping."

Finally I found the solution.
apt-get -o Acquire::ForceIPv4=true update

by Jaeeun(Anna) Cho at June 14, 2016 03:09 PM

SVG and Canvas in HTML5

SVG

  • It stands for Scalable Vector Graphics.
  • It is used to define graphics for the web.
  • It defines graphics in XML format, so users can edit SVG images in a text editor after creating them.
  • It is built into the document using elements, attributes and styles.
  • While SVG can be delivered as a standalone file, the initial focus is on its natural integration with HTML.
<svg height="100" width="100">
  <circle cx="50" cy="50" r="40" stroke="black" stroke-width="3" fill="red" />
</svg>

    The height and width attributes define the height and width of the <svg> element.
    The <circle> element draws a circle.
    The cx and cy attributes are the x and y coordinates of the center of the circle.
    The r attribute is the radius of the circle.
    The stroke attribute is the color of the circle's outline, and stroke-width is the thickness of that outline.
    The fill attribute is the color of the inside of the circle.

    Canvas

    • It is used to draw graphics.
    • It presents bitmap (pixel-based) images drawn with JavaScript.
    • It was introduced by Apple for Safari and other graphical widgets.
    • Canvas does not keep track of an image after it has been rendered to the browser. If the image changes, the entire scene has to be drawn again.

    <canvas id="myCanvas" width="200" height="100" style="border:1px solid black;"> </canvas>
    <script>
    var c = document.getElementById("myCanvas");
    var ctx = c.getContext("2d");
    </script>
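Because a canvas only keeps pixels, any change means drawing the whole scene again. Here is a minimal sketch of that idea. The function name drawScene and the shape of the circle object are my own assumptions for illustration; the context methods it calls (clearRect, beginPath, arc, fill, stroke) are standard 2D canvas calls.

```javascript
// Redraw the whole scene from scratch. The canvas does not remember the
// previous circle, so we wipe it and draw the new state.
// `drawScene` and the `circle` object shape are hypothetical.
function drawScene(ctx, width, height, circle) {
  ctx.clearRect(0, 0, width, height);   // erase the previous frame
  ctx.beginPath();
  ctx.arc(circle.x, circle.y, circle.r, 0, 2 * Math.PI);
  ctx.fillStyle = circle.fill;          // inside color
  ctx.fill();
  ctx.lineWidth = 3;                    // outline thickness
  ctx.strokeStyle = 'black';            // outline color
  ctx.stroke();
}

// In a browser, with a 200x100 canvas whose id is "myCanvas":
//   var ctx = document.getElementById('myCanvas').getContext('2d');
//   drawScene(ctx, 200, 100, { x: 50, y: 50, r: 40, fill: 'red' });
//   // The circle moved, so the entire scene is drawn again:
//   drawScene(ctx, 200, 100, { x: 120, y: 50, r: 40, fill: 'red' });
```

With SVG the same change would only need the cx attribute of the <circle> element to be updated, since the element stays in the document.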

    by Jaeeun(Anna) Cho at June 14, 2016 03:08 PM

    June 10, 2016

    Kezhong Liang

    Migrate projects from the old GitLab servers to the new one

    My company wanted me to move the projects from the old GitLab servers to the new one. There are many methods described on the internet. I tried several of them and finally found that this one worked well.

    Download projects from the old server, for example:
    # git clone --mirror

    Create new projects on the new GitLab server through the website

    Modify the config file, change the url to the new one
    # cd project1.git/
    # vi config
    [core]
    repositoryformatversion = 0
    filemode = true
    bare = true
    [remote "origin"]
    url =
    fetch = +refs/*:refs/*
    mirror = true

    Upload the project to the new GitLab server
    # git push origin --mirror


    Import an existing git project into GitLab?

    by kezhong at June 10, 2016 06:56 PM

    June 09, 2016

    Kyle Klerks

    Building An API: Working With Moving Pieces

    This has been a fun week. Our team is still aggressively working on the API for our project, and with every change comes new bugs because of how a lot of the pieces work off of each other. A lot of back and forth between members, a lot of merging and copy/pasting, etc. etc.

    Doing this has really made me appreciate the forethought we had in a lot of areas in terms of working on documentation and expected results before starting with the code. For example, knowing that if I were to make a request to /api/…/?… I will get back a double array full of data, it makes it a lot easier to plan ahead on what to do with the return from that request, even though that function hasn’t even been built/finished yet. It makes life a lot easier in the long run.

    I’ve also had some experience with why hard-coding numbers into your work is a horrible idea. You should use data models for things like a switch between 0 being UNVERIFIED and 1 being VERIFIED in a database, because if those numbers ever change, for whatever reason, you’re going to have a lot of broken code that you’ll have to go back and fix anyway. You might as well do it right the first time, even though it is slightly more tedious.

    On top of that, it’s a lot easier to read a data model and understand VERIFIED = 1 than it is to read code accessing a database saying “STATUS = 1”. Life becomes much easier with some structure, especially working on something of this level of complication. Something as simple as require('../statusModel'); can save you huge headaches.
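A minimal sketch of what such a status model could look like. The contents of the model and the isVerified helper are assumptions for illustration, not our project's actual code; only the names UNVERIFIED, VERIFIED and statusModel come from the text above.

```javascript
// statusModel.js (hypothetical contents)
// The numeric values live in exactly one place.
const Status = Object.freeze({
  UNVERIFIED: 0,
  VERIFIED: 1,
});

// In a real module this would end with: module.exports = Status;
// and other files would start with: const Status = require('../statusModel');

// Code that checks a status reads the name, never the raw number.
// `isVerified` is a hypothetical helper.
function isVerified(user) {
  return user.status === Status.VERIFIED;
}

console.log(isVerified({ status: 1 })); // true
console.log(isVerified({ status: 0 })); // false
```

If the database values ever change, only the model has to be edited, and every check that uses Status.VERIFIED keeps working.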

    by kyleklerks at June 09, 2016 08:53 PM

    June 08, 2016

    Laily Ajellu

    Accessibility for HTML

    Roles and Their Different Types

    Roles are very important when designing webpages that are accessible because they are read out by a screen reader for the user. For example, the mistake I had made with my previous emoji design was to use buttons rather than menu items for each emoji choice. The problem with this is that buttons are generally able to be clicked anytime, but menu items are only available when the menu is open. Misusing the roles would confuse your user.

    Below is a diagram of the relationships of the different roles used by Screen Readers and other accessibility devices that read your code found here.

    I used the categorizations listed on the WAI ARIA website to colour code each role to offer a different perspective.

    See the definitions for all the roles here.

    Characteristics of Roles

    You do not define these characteristics yourself; they describe how you're supposed to use each role.

    Characteristic 1 - relatedConcept


    Different roles whose concepts are related (like the progressbar and the status roles) have this property. This gives the user a better understanding of the role. Keep in mind that they don’t inherit properties from each other, they’re still separate roles.

    Characteristic 2 - baseConcept


    If the role is based on a specific tag, it has the baseConcept property. Keep in mind that they don’t inherit properties from each other either. Example: the textbox role is based on the textbox tag.

    Characteristic 3 - mustContain


    Example 1: if an HTML tag has a role of list, then it must contain at least one HTML tag with the role: listitem or the role: group

    Example 2: if an HTML tag has a role of menu, then it must contain at least one HTML tag with the role: menuitem , menuitemcheckbox, or menuitemradio

    If the required child tags are loaded using another script (not hard coded inside the parent) you must set aria-busy="true" and then set it back to false once the script is done executing.
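A rough sketch of the aria-busy step, assuming the children are added by an asynchronous loader. The names fillList and loadChildren are hypothetical; setAttribute is the standard DOM call.

```javascript
// Mark the parent as busy while a script loads its required children,
// then clear the flag once loading is done (even if loading fails).
// `fillList` and `loadChildren` are hypothetical names.
async function fillList(listEl, loadChildren) {
  listEl.setAttribute('aria-busy', 'true');
  try {
    // For example: append elements with role="listitem" to listEl.
    await loadChildren(listEl);
  } finally {
    listEl.setAttribute('aria-busy', 'false');
  }
}

// In a browser this might be called as:
//   fillList(document.getElementById('myList'), fetchAndAppendItems);
// where 'myList' and fetchAndAppendItems are placeholders.
```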

    Example 1:

    Notes: 1. Just because a parent tag mustContain a certain child tag, doesn’t mean that child tag must be contained in that specific parent.
    2. If a parent tag mustContain a certain child tag, it must contain that exact tag, not a subclass of it.
    Example: list mustContain a tag with the role group. Even though row is a subclass of group, you can’t use row here.

    Characteristic 4 - scope


    This is pretty much the opposite of mustContain. Scope defines what the role must be contained in.

    Example 1: if an HTML tag has a role of listitem , then it must be contained in an HTML tag with the role: list.

    by Laily Ajellu at June 08, 2016 05:57 PM

    Matthew Marangoni

    BBB HTML5 Settings Progress

    The settings user interface is now looking much better - the entire interface closely matches what is shown in the design screens which also includes all the submenu content.

    The audio submenu now contains placeholder select lists for the microphone and speaker sources, and a button for testing audio. As the audio codec/plugin is not yet implemented, these dropdowns are not actually connected to any microphone or speaker sources, but they will at least require minimal changes once it is ready.

    Like the audio submenu, the video submenu's select list elements are currently just placeholders. Once the video codec/plugin is implemented, we can use it to populate the select lists with the user's camera sources.

    The application submenu requires only a few changes now. The checkbox elements will eventually be displayed as an ON/OFF slider (iOS style), and the font size at the bottom needs to be represented as a number rather than extra small/small/medium/large/extra large. The two + and - resize buttons also have extra padding outside the border of the button. It seems this can only be resolved by modifying the icon files, since they contain unnecessary whitespace surrounding the button.

    Just as in the application submenu, the checkboxes here will eventually be styled to show ON/OFF sliders and checked/unchecked icons for the corresponding submenu options.

    I now need to do some research on implementing ARIA which will allow a screen reader to recognize all of the elements in the settings menu in order to make the content accessible.

    by Matthew Marangoni at June 08, 2016 02:56 PM

    June 03, 2016

    Laily Ajellu

    Creating an Emoji Button

    Creating a button that opens up emojis in a stack


    1. Adding the button in the right spot among the existing elements on the page
    2. Opening up the stack of emoji images on click
    3. Overlaying the images on the existing elements instead of pushing them aside


    1. Adding the button in the right spot:

    The solution to this was re-rendering the button that opens the emoji choices along with the emoji choices themselves so that they can be placed relative to each other.
    This solution replaces rendering only the emoji choices above the button when it's clicked. (For that approach you need the id of the element you want to nest the choices in, but they had to overlap multiple different elements.)

    2. Opening up the stack:

    I replaced the generic button in the button's parent div (where it was originally being shown on the page) with my created Component EmojiList, instantiating it without any properties (ie. props):

    export default class ActionBar extends Component {
      render() { return <EmojiList />; }
    }

    Which triggered the React lifecycle:

    Image by A. Sharif

    And finally, in the EmojiList component's render I used a regular button, so that it could use its button-like properties while keeping it wrapped with the states and properties I needed:

    export default class EmojiList extends Component {
      render() { return <button onClick={this.clickHandler} ... />; }
    }

    3. Overlaying the emoji choices over the rest of the page

    This was probably the most time consuming of my challenges, but all it needed was:

    position: absolute;

    in the css.

    Next Steps:

    Tomorrow I'll be looking into using the React - Animation package for time delay animation so that each emoji choice is shown a split second after the previous one, giving a fluid look to it.

    by Laily Ajellu at June 03, 2016 03:06 AM

    June 01, 2016

    Matthew Marangoni

    Settings Modal Update

    The following changes have been made to the settings modal:

    • added menu borders matching the bbb settings mockup and removed horizontal rule
    • implemented global color scheme to header, submenu and contents
    • added selected submenu highlighting
    • added scalability to settings modal - this was particularly tricky. This required restructuring some of the elements to be wrapped in nested containers with unique properties, having the settings overlay be scalable to 50% screen with a minimum height of 250px
    • added scrollability to left and right submenus
    • added icons to left submenu and made it so they are properly center aligned with the submenu text

    To be added to settings modal:
    • global buttons with custom styling (temporary custom buttons are currently in place)
    • right submenu content to match design mockup
    • replace submenu buttons with proper +- resizing icons
    • icon (3 dots) to replace current button to display settings overlay
    • transition animations when clicking on submenu items

    by Matthew Marangoni at June 01, 2016 04:21 PM

    May 27, 2016

    Laily Ajellu

    Creating Github Markdown pages - BBB Documentation

    My newest challenge was to create markdown pages detailing HTML5 Coding Practices and HTML5 Project Structure. Most of the content for the documentation was provided by Daniel Perrone (Really smart guy! Click his name to checkout his Github).


    My first challenge was to install Ruby 2.2.4p230 on my existing VM (using VMware). The problem was that certain components of BigBlueButton are dependent on Ruby, and updating Ruby meant uninstalling using:
    sudo apt-get purge ruby

    (Aside: If you do know how to update to ruby 2.2.4 on ubuntu 14.04 - without uninstalling -, please post a comment below!)

    When I ran the command I was prompted to remove the dependent BBB components, which I didn't want to do. Solving this problem was probably the most time consuming of the whole task.


    So after looking for ways to update vs. uninstall (and failing), I decided to create a VM from a backup. This backup had Ubuntu but no BBB.

    Once Ruby was installed I used
    bundle install
    and finally
    jekyll serve -H
    to have it running on something other than my localhost.

    At this point I was able to preview my changes, so all I needed to do was convert the Google Doc to Github Markdown.
    First I looked for automatic converters to speed up the process and found 2 that work together:
    1. Writage
    2. Pandoc

    The syntax wasn't quite right, but it did format some things nicely for me like bullets, spaces, and special characters.

    Committing, Pushing and Merging

    Once the syntax was ready, I created symbolic links to both new pages, and followed these steps to update it on the official BBB github
    1. git add ...
    2. git config --global
    3. git config --global
    4. git commit
    5. git push

    Now that the changes were pushed to my own fork, it was time to submit a pull request (Compare and Pull Request Button) and finally merge the changes.

    And voila! My first commit. What an amazing day!

    by Laily Ajellu at May 27, 2016 09:03 PM

    May 26, 2016

    Kyle Klerks

    ABCs of Express.js APIs

    Now that the pieces of the puzzle are starting to be finished, it’s time to start putting some parts together. I previously haven’t had any experience working with express.js; that portion of the project was mostly left to my colleague. That said, it didn’t take long to wrap my head around the concept and get to work on gluing the code pieces together under the express framework.

    An example of what an express function might look like is this;

    app.get('/api/', function(req, res) {
      // handle the request here
    });

    So essentially, a client that accesses the server at the URL “some.random.url/api/?id=6” is routed to this piece of code, which then executes function(req, res).

    app.get('/api/', function(req, res) {
      res.send('The ID you entered: ' + req.query.id);
    });

    The above code will return “The ID you entered: 6” to the client. Let’s look at how that happens, piece by piece. The “req” variable means “request”. This represents any data coming from the user. In this case, “req” is holding any additional information tacked onto the URL after ‘/api/’. It works very much the same as GET in PHP does, in that you can reference that variable directly, and you will get the value the client included in the request: in this case, ?id=6. The “res” variable stands for “response”. This is what you return to the browser. By calling res.send, you send “The ID you entered: 6” back to the client as the body of the response.

    Alternatively, the function might take the form of:

    app.get('/api/:id/', function(req, res) {
      res.send('The ID you entered: ' + req.params.id);
    });

    In this case, we get the same result, except instead of the client saying “some.random.url/api/?id=6”, the client says “some.random.url/api/6”. The only difference server-side in the actual implementation is saying req.params.id instead of req.query.id.
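To try the two forms side by side without starting a server, here is a small sketch that writes the handlers as plain functions and calls them with stand-in objects. It assumes the ID is read through Express's req.query and req.params objects; the names byQuery, byParam and fakeRes are made up for this example and are not part of Express.

```javascript
// Handler for GET /api/?id=6: the value comes from the query string.
function byQuery(req, res) {
  res.send('The ID you entered: ' + req.query.id);
}

// Handler for GET /api/6 on the route '/api/:id/': the value is a
// route parameter.
function byParam(req, res) {
  res.send('The ID you entered: ' + req.params.id);
}

// With Express these would be registered as:
//   app.get('/api/', byQuery);
//   app.get('/api/:id/', byParam);

// Hypothetical stand-in for the response object, so the handlers can
// be called directly. It only records what was sent.
function fakeRes() {
  return { body: null, send(text) { this.body = text; } };
}

const res1 = fakeRes();
byQuery({ query: { id: '6' } }, res1);

const res2 = fakeRes();
byParam({ params: { id: '6' } }, res2);

console.log(res1.body); // The ID you entered: 6
console.log(res2.body); // The ID you entered: 6
```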

    by kyleklerks at May 26, 2016 08:51 PM

    May 25, 2016

    Laily Ajellu

    Differences between EcmaScript5 and EcmaScript6

    I have been reading the text "Exploring EcmaScript 6" by Dr. Axel Rauschmayer and have found a lot of useful new syntax. Here is a sampling:
    Source: Exploring ES6

    Scoping of 'this'

    In ES5:
    When you have embedded functions, 'this' inside the inner function is not the same 'this' as in the outer function, because every function gets its own 'this'.
    But sometimes we want 'this' to refer to the outer scope. So we assign: var _this = this.


    In ES6:
    Arrow functions don't shadow 'this' in their scopes, so you don't have to assign _this = this.
    You can simply use 'this' on its own.
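A small sketch of the difference. The counter objects are made up for illustration and are not taken from the book.

```javascript
// ES5: the callback gets its own `this`, so the outer one is saved first.
function CounterES5() {
  this.count = 0;
  var _this = this;
  this.addAll = function (values) {
    values.forEach(function (v) {
      _this.count += v; // plain `this` would not be the counter here
    });
  };
}

// ES6: the arrow function uses the `this` of the enclosing method.
class CounterES6 {
  constructor() {
    this.count = 0;
  }
  addAll(values) {
    values.forEach((v) => {
      this.count += v;
    });
  }
}

var a = new CounterES5();
a.addAll([1, 2, 3]);

const b = new CounterES6();
b.addAll([1, 2, 3]);

console.log(a.count, b.count); // 6 6
```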

    Source: Exploring ES6 - Ch. 4.4

    Concat using spread Operator

    In ES5:
    We have to use the concat function on the first array and pass the rest as arguments.

    But in ES6, we can use the spread operator:
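Both versions side by side, as a minimal sketch with made-up arrays:

```javascript
const arr1 = ['a', 'b'];
const arr2 = ['c'];
const arr3 = ['d', 'e'];

// ES5: call concat on the first array and pass the rest as arguments.
const es5 = arr1.concat(arr2, arr3);

// ES6: spread each array into a new array literal.
const es6 = [...arr1, ...arr2, ...arr3];

console.log(es5); // [ 'a', 'b', 'c', 'd', 'e' ]
console.log(es6); // [ 'a', 'b', 'c', 'd', 'e' ]
```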

    Source: Exploring ES6 - Ch. 4.11

    C++ style methods

    In ES5:
    Methods are just properties, but their value is a function.

    In ES6:
    Functions can be written directly inside an object literal or class body with a shorthand syntax, and they are understood as methods.
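For example (the object and method names are made up for illustration):

```javascript
// ES5: a method is a property whose value is a function.
var objES5 = {
  name: 'es5',
  describe: function () {
    return 'I am ' + this.name;
  }
};

// ES6: method definition shorthand, with no colon and no `function` keyword.
const objES6 = {
  name: 'es6',
  describe() {
    return 'I am ' + this.name;
  }
};

console.log(objES5.describe()); // I am es5
console.log(objES6.describe()); // I am es6
```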

    Source: Exploring ES6 - Ch. 4.12

    Accessing a Base Class

    In ES5:
    You have to go through the 'prototype' property of the Base class in order to access its methods, and use the method 'call' on the Base class to set its member variables:

    In ES6:
    All you need to do is use the keyword 'super' for both cases:
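A sketch of both styles. The Person and Employee classes are illustrative names of my own, not code copied from the book.

```javascript
// ES5: constructor functions and prototypes.
function PersonES5(name) {
  this.name = name;
}
PersonES5.prototype.describe = function () {
  return 'Person ' + this.name;
};

function EmployeeES5(name, title) {
  PersonES5.call(this, name); // 'call' sets the base member variables
  this.title = title;
}
EmployeeES5.prototype = Object.create(PersonES5.prototype);
EmployeeES5.prototype.constructor = EmployeeES5;
EmployeeES5.prototype.describe = function () {
  // 'prototype' gives access to the base method
  return PersonES5.prototype.describe.call(this) + ' (' + this.title + ')';
};

// ES6: 'super' covers both cases.
class Person {
  constructor(name) {
    this.name = name;
  }
  describe() {
    return 'Person ' + this.name;
  }
}

class Employee extends Person {
  constructor(name, title) {
    super(name); // base constructor
    this.title = title;
  }
  describe() {
    return super.describe() + ' (' + this.title + ')'; // base method
  }
}

console.log(new EmployeeES5('Jane', 'CTO').describe()); // Person Jane (CTO)
console.log(new Employee('Jane', 'CTO').describe());    // Person Jane (CTO)
```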

    Reasons for Reading this Text

    I'm currently working on a tutorial by Meteor which uses the React framework and ES6.

    You can find it here: Meteor using React Tutorial

    And although it's great, the learning curve is very steep for those who only know Javascript, not just because of the concepts, but because of the cool new syntax that's been added in the new release.
    My hope is to bridge the gap using the text.

    by Laily Ajellu at May 25, 2016 07:55 PM

    May 23, 2016

    Andrew Smith

    Test/Exam clock

    Sometimes I don’t understand the world. One of the things I couldn’t understand is why it is so difficult to find an online clock that is simple and can be resized. For years I would google “clock” and variations of that, and all I would get back was crappy small digital clocks on pages 80%-full of ads.

    Why is this so complicated? I don’t know. Mozilla had a Canvas demo of how to make a clock, but even that one wasn’t very good, it was not resizeable. So I made my own, and our webmaster added a bunch of JavaScript library garbage, because I guess that’s what web programmers do these days by default.

    Anyway, it still works and here it is: Seneca ICT Exam Clock

    by Andrew Smith at May 23, 2016 05:11 PM

    May 19, 2016

    Kyle Klerks

    Node.js & MySQL – Prepared Statements

    What an exciting first few weeks this has been so far. I’m already learning so much about JavaScript and node.js through working with my project team.

    The task I was most recently tackling was writing an SQL query generator that receives a JSON object filled with tokens and then builds a query around those parameters. Since our code is meant to run on a node.js server, we’re using the mysql module located here.

    Initially when I wrote this function, all it did was parse the data and spit out a string meant to be run as a query. This was kind of an oversight, because the proper way to do it when security is a concern is to use prepared statements. The short version of prepared statements is that you use a ? in place of what would normally be filled in by a parameter. So, instead of:

    "SELECT " + field + " FROM " + table + " WHERE id = " + id;

    You prepare it as:

    "SELECT ?? FROM ?? WHERE id = ?"

    Essentially, the function was returning something like:

    "SELECT column1 FROM exampleTable WHERE id = 1"

    It SHOULD be formatted like this:

    "SELECT ? FROM ? WHERE id = ?"

    So, after some tweaking, I got my function to return an array with the query in the first index, followed by a list of parameters to fill in the gaps. I ended up getting objects like this:

    ["SELECT ? FROM ? WHERE id = ?", "column1", "exampleTable", 1]

    Once you’ve got an object like this, you can:

    connection.query("SELECT ? FROM ? WHERE id = ?", ["column1", "exampleTable", 1], function(error, results, fields) { ….. });

    The query function in this case has 3 parameters; the first being the statement to be prepared, the second an array of parameters to fill in your ?’s, and a function for handling the results of the query.

    I tested the above statement and… oh fun, an error. “‘exampleTable’ is invalid syntax”, the log tells me. What? How is that possible? I log into the database directly and test the statement. No problems.

    After a moment of disbelief and some more tests, I decided to give the documentation for the mysql module another once-over. It didn’t take long to realize my mistake. Glancing over the prepared statements again, I noticed this;

    var sql = "SELECT * FROM ?? WHERE ?? = ?";

    Notice that some of the parameters are represented as ?? and not ?. This is because ?? is used when referencing a table or field, and ? is used when referencing a value. Oops.
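Putting that together, a generator along these lines could return the statement and its parameters separately. This is only a sketch: the function name buildSelect and the token shape ({ field, table, id }) are assumptions for illustration, not our actual generator.

```javascript
// Build a placeholder statement plus its parameter list from a token object.
// `buildSelect` and the { field, table, id } shape are hypothetical.
function buildSelect(tokens) {
  // ?? marks identifiers (field and table names); ? marks values.
  const sql = 'SELECT ?? FROM ?? WHERE id = ?';
  const params = [tokens.field, tokens.table, tokens.id];
  return [sql, params];
}

const [sql, params] = buildSelect({
  field: 'column1',
  table: 'exampleTable',
  id: 1,
});

console.log(sql);    // SELECT ?? FROM ?? WHERE id = ?
console.log(params); // [ 'column1', 'exampleTable', 1 ]

// With the mysql module the pair would then be handed to the query function:
//   connection.query(sql, params, function (error, results, fields) { ... });
```

Keeping the parameters out of the string means the generator never concatenates user input into the SQL itself.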

    So after that blunder, I tested it again and magically I’ve got a big list of results being spit out by the query function. Awesome! Mission Accomplished. Time to go get some coffee.

    by kyleklerks at May 19, 2016 07:46 PM

    May 18, 2016

    Laily Ajellu

    Day 2 at CDOT

    Installing Windows - Day 2 - BBB


    Error 1: No drives were found
    Error 2: Bootmgr is missing Press Ctrl+Alt+Del to restart


    1. For my motherboard, I needed to use the AHCI SATA port type
    2. The SATA ports my hard drives were physically connected to weren't enabled in the BIOS settings, so it wasn't recognizing the hard drives

    My Process:

    Windows installation started (but with no option to start from USB/disk). I chose English as the language.

    Gave me the error: No drives were found


    Following the instructions, I cleaned Disk 0

    But then I found that it was actually the USB drive that I had erased (the one containing the .iso image for Windows 7)! So I had to write the .iso onto the USB drive again using Rufus, this time with FAT32, because you need to use FAT32 rather than NTFS when using a UEFI BIOS.

    This time it gave me the error:
    Bootmgr is missing Press Ctrl+Alt+Del to restart

    So I went into BIOS settings by mashing F2, F8, F10, F12 as soon as I turned on the computer.
    In BIOS settings:

    1. Made sure that USB is the primary method of installation
    2. Checked the RAM using the BIOS check (returned okay)
    3. Quick checked the HardDrive using the BIOS check (returned okay)

    Finally I found this link that gave me the solution:

    After I enabled all the SATA ports in the BIOS and changed the SATA port type to AHCI, it worked! The next time I booted up the computer it asked me to press any key to start from USB, and once I started the Windows installation it was able to find the hard drives. The reasons for the fix were:
    1. For my motherboard, I needed to use the AHCI SATA port type
    2. The SATA ports my hard drives were physically connected to weren't enabled in the BIOS settings, so it wasn't recognizing the hard drives

    by Laily Ajellu at May 18, 2016 01:15 PM

    My first day at CDOT

    When I first came in to CDOT today I wasn't expecting to start working immediately, but I was thrilled nevertheless.

    First thing on the agenda, make my computer work.

    The computer I was initially assigned would turn off periodically, for unknown reasons (at the time) so CDOT decided to give me another CPU. Turns out, it was one that used to be used as a server and so it didn't have a graphics card (It's a Z440 HP). So we took out the graphics card of the faulty computer and seated it into the server, hooked everything up and booted it up.

    Lo and behold, the monitors didn't connect, and after a few seconds of booting the computer started sounding 6 BIOS beeps and showing 6 red flashes on the motherboard. I got on Google trying to figure out what the problem could be.

    First I found this link:

    HP support

    Which was a nice first attempt, but no cigar.

    Next, I hit the jackpot:

    BIOS Beep Codes

    Using CTRL-F to look for "6"...

    And fixed!

    Such a great day...

    by Laily Ajellu at May 18, 2016 01:11 PM

    April 22, 2016

    Berwout de Vries Robles

    Update on phase 3

    Quick update:

    I got a reply to my github pull request. Sadly it was not what I had hoped for. The package manager replied that they want to give users full control of their compiler optimizations and not force anything on them.

    As there is theoretically no downside to applying at least a base level of optimization, this argument seems a little bit strange to me.

    The way I would do it is to give the user a base level of optimization, so that users who have no knowledge of compiler optimizations get the benefit of the significant speedup without having to do anything. Advanced users could then opt out of the optimization if they want to. I realize that this is more work for advanced users. However, I think the situation where the user opts out of optimizations would come up very rarely, whereas the situation where the user should have applied some degree of optimization, but did not because they didn't know they should have, is more prevalent.

    by Berwout de Vries Robles at April 22, 2016 05:58 AM

    Andrei Topala

    SPO600 Project: Phase Three and Conclusion

    There is a saying attributed to Edison that goes something along the lines of “I have not failed–I have just found 10,000 ways that don’t work.”

    The goal of the project was to optimize a selected open-source compression or checksum package and produce a meaningful improvement in performance. I chose lrzip, partly because it seemed interesting and partly because its creator is a physician, which is a career that I had at length considered myself back in high school before deciding that hospitals and life-and-death situations are scary. This might not have been the best choice (nor, I guess, the best reason to make a choice) because, well, lrzip looks pretty good. I could find no area of optimization that would amount to a substantive improvement. Whenever I thought I had come across an opportunity, it turned out not to be the case at all.

    Optimization, I’ve come to realize, is pretty hard. Conversely, making a program run worse, even unintentionally, is very easy.

    In phases one and two of this project I noted that lrzip crashed on AArch64 platforms when compressing a file with ZPAQ due to ZPAQ’s use of x86-specific JIT code. I thought to translate the x86 instructions into ARM instructions, so that JIT code could be used on both architectures, but that was too tall an order. I chose instead to focus on the C++ code that is used when JIT is unavailable, to try to optimize it and bring its runtime closer to that of the JIT version of the program. Also, I examined the LZMA code in the package (as LZMA is the default compression algorithm lrzip uses) and tried to optimize that as well.

    Now, let me count the 10,000 ways.

    The first order of business was to profile the program in order to see which functions took up the majority of the execution time. gprof was the tool for this. First I built the package with the -pg flag, which generates extra code to write profile information. This I did by setting it as one of the C, C++, and linker flags when calling the configure script:

    [andrei@localhost lrzip-master]$ ./configure CFLAGS="-pg -O2" CXXFLAGS="-pg -O2" LDFLAGS="-pg"

    Then I built the package using make (made the package?) and then ran lrzip on a 500MB text file.

    [andrei@localhost lrzip-master]$ ./lrzip text
    Output filename is: text.lrz
    text - Compression Ratio: 3.937. Average Compression Speed:  2.041MB/s.
    Total time: 00:04:04.53

    This generated a gmon.out file in the directory, which lets us examine lrzip using the gprof tool. I used the following command to create an image from the profile information:

    [andrei@localhost lrzip-master]$ gprof lrzip | gprof2dot | dot -Tpng > image.png


    We can see from the graph that the program spent the vast majority of its time in the function GetMatchesSpec1, which is defined in the file lzma/C/LzFind.c. The graph tells us that the function was called 200 million times, so I figured that any marginal improvement I could make to the function would result in a significant gain in performance.
    The function uses two Byte pointers to move along the stream of data read from the text file, and two UInt32 pointers (typedef’d as CLzRef) to keep track of matches from the data in a circular buffer.

    CLzRef *ptr0 = son + (_cyclicBufferPos << 1) + 1;
    CLzRef *ptr1 = son + (_cyclicBufferPos << 1);

    Since the two pointers point to adjacent data locations I tried to define one in terms of the other (removing the “+1” part of ptr0’s definition then setting ptr1=ptr0++), but it had no effect, and I think the machine code generated would be the same regardless.

    if (++len != lenLimit && pb[len] == cur[len])
              while (++len != lenLimit)
                if (pb[len] != cur[len])
                  break;

    This I rewrote as a single while loop, but as I said in phase two it had no (or a negative) effect on the executable. The objdump revealed that both this code and my edit resulted in identical instructions with only a slight change in layout.
    I also rewrote this section of the code entirely to check multiple bytes at a time by XORing chunks of data from pb and cur, but even before fine-tuning the code to set len to the proper value after the first different chunk of data was found, I realized that the overhead of doing all this made the program significantly slower, so I abandoned that course of action. (For the text file, the loop only ran a few times with every function call, so checking very large chunks of data would be counter-productive).
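
    To make the two approaches concrete, here is a minimal sketch of a byte-at-a-time match counter next to a version that XORs eight bytes at a time. The function names match_len_bytes and match_len_xor are made up for illustration; this is not the code from LzFind.c, nor the exact edit I tried, only the general shape of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time comparison: returns how many leading bytes of a and b
   are equal, up to limit. */
size_t match_len_bytes(const unsigned char *a, const unsigned char *b,
                       size_t limit)
{
    assert(a != NULL && b != NULL);
    size_t len = 0;
    while (len != limit && a[len] == b[len])
        len++;
    return len;
}

/* Chunked comparison: XOR eight bytes at a time and stop at the first
   non-zero result, then finish byte by byte inside that chunk. */
size_t match_len_xor(const unsigned char *a, const unsigned char *b,
                     size_t limit)
{
    assert(a != NULL && b != NULL);
    size_t len = 0;
    while (limit - len >= sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, a + len, sizeof x);   /* memcpy avoids unaligned loads */
        memcpy(&y, b + len, sizeof y);
        if ((x ^ y) != 0)
            break;                       /* the mismatch is in this chunk */
        len += sizeof(uint64_t);
    }
    /* Resolve the exact position; also handles a tail shorter than 8. */
    while (len != limit && a[len] == b[len])
        len++;
    return len;
}
```

    When most matches are only a few bytes long, the chunked version pays for the two copies and the XOR and then falls back to the byte loop anyway, which fits the slowdown described above.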

    After that, I tried to see if I could find something to optimize in LzmaEnc_CodeOneBlock, which is in lzma/C/LzmaEnc.c. I couldn’t, at least not beyond that failed experiment with memmove that I wrote about in phase two.

    So, LZMA I couldn’t make better. I then went back to ZPAQ. Here’s the profile information for lrzip when it’s invoked with the -z flag after having been compiled with -DNOJIT to turn off the JIT code:


    Both update0 and predict0 are structured as a for loop that uses a switch statement to check the value of an element of an array, which gets its value from an element in the block header array of a ZPAQL object, which represents ZPAQL instructions. I couldn’t find any way to optimize the functions without breaking the logic of the algorithm. The only thing I could do is replace some multiplication operations with bitshifts, but that didn’t affect the runtime (I think the compiler does that as well, if it can).
    The execute function, which consists of a switch statement of over 200 cases, only takes up 3% of the execution time, and as much as I tried to optimize it I couldn’t get it to run more quickly by any measurable magnitude, so that didn’t work out either.
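
    The multiplication-to-bitshift replacement mentioned above can be sketched as follows. The function names are hypothetical and the constant 8 is only an example, not a value taken from libzpaq. Optimizing compilers typically perform this rewrite on their own, which would explain why doing it by hand changed nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication by a power of two, written the obvious way. */
uint32_t scale_mul(uint32_t x)
{
    return x * 8u;
}

/* The same operation written as a shift: 8 == 1 << 3. */
uint32_t scale_shift(uint32_t x)
{
    return x << 3;
}

/* Check that both forms agree for every value in [0, n). */
void check_equivalent(uint32_t n)
{
    for (uint32_t x = 0; x < n; x++)
        assert(scale_mul(x) == scale_shift(x));
}
```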

    My attempts to edit the source code having been unsuccessful, I tried to see if any compiler options would make the program go faster, and, indeed, -Ofast (which enables -ffast-math) created an executable whose LZMA compression consistently outperformed the default -O2 executable by about 1% without introducing any errors into the compressed file. -ffast-math, though, is probably not something one would want in a compression tool, and there might be potential for mathematical errors somewhere in the code. Besides, the 1% difference, though I tested it quite a few times, might just have been a coincidence.

    In the end, all I could do is add a small preprocessor condition in the libzpaq/libzpaq.h file to check for an x86 architecture and disable the JIT code if none is found. I created a patch file but I am not sure if it’s worth submitting since 1) it is such an insignificant change and 2) it only checks for GCC and Visual Studio macros, and if the program is compiled on an x86 processor with a different compiler then the JIT code will be deactivated, which will have a significant negative impact on performance. Here is the patch file for, at least, posterity:

    From 3976cdc2640adbe593a09bba010130fcf74ef809 Mon Sep 17 00:00:00 2001
    From: Andrei Topala <>
    Date: Thu, 21 Apr 2016 17:56:34 -0400
    Subject: [PATCH] Added a preprocessor check for an x86 architecture in
     libzpaq.h as a requisite for enabling JIT code.
     libzpaq/libzpaq.h | 5 +++++
     1 file changed, 5 insertions(+)
    diff --git a/libzpaq/libzpaq.h b/libzpaq/libzpaq.h
    index 93387da..e2b5058 100644
    --- a/libzpaq/libzpaq.h
    +++ b/libzpaq/libzpaq.h
    @@ -25,6 +25,11 @@ comprises the reference decoder for the ZPAQ level 2 standard.
     #ifndef LIBZPAQ_H
     #define LIBZPAQ_H
    +// If we're not on an x86 processor, disable JIT
    +#if !defined(i386) && !defined(_M_IX86) && !defined(__x86_64__) && !defined(_M_X64)
    +#define NOJIT
    +#endif
     #ifndef DEBUG
     #define NDEBUG 1
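
    As a rough sketch of how the check could cover a few more spellings of the x86 macros, something like the following might work. DEMO_TARGET_IS_X86, demo_jit_enabled and demo_check_jit_config are names I made up for illustration and are not part of libzpaq; the macro list is an assumption about what GCC, Clang and Visual Studio predefine, and other compilers may use different names.

```c
#include <assert.h>

/* Hypothetical helper macro (not part of libzpaq): 1 when the target looks
   like x86 or x86-64, judged by macros predefined by GCC, Clang and MSVC. */
#if defined(__i386__) || defined(__i386) || defined(i386) || \
    defined(__x86_64__) || defined(__amd64__) || \
    defined(_M_IX86) || defined(_M_X64)
#define DEMO_TARGET_IS_X86 1
#else
#define DEMO_TARGET_IS_X86 0
#endif

/* Mirror the patch: turn the JIT off everywhere except x86. */
#if !DEMO_TARGET_IS_X86 && !defined(NOJIT)
#define NOJIT
#endif

/* Report the decision at run time: 1 if JIT code would be compiled in. */
int demo_jit_enabled(void)
{
#ifdef NOJIT
    return 0;
#else
    return 1;
#endif
}

/* Sanity check; holds unless NOJIT was also defined on the command line. */
void demo_check_jit_config(void)
{
    assert(demo_jit_enabled() == DEMO_TARGET_IS_X86);
}
```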

    That concludes this chapter in our journey through the wide waters of software portability and optimization. SPO600 was certainly the most challenging and most interesting course I’ve taken at Seneca, and I am sure I learned very much. I do wish I’d had more time to blog about it, though. That is my only regret, and also the 10,000 failures.

    by andrei600 at April 22, 2016 03:58 AM

    Tom Ng

    The End of an SPO600 Semester

    Finally done SPO600; and what a ride it was.

    SPO600 was not what I expected in the beginning but as I progressed through the semester, the course developed into its own kind of charm. The course really demystifies a lot of what happens in the background. It explains what is normally abstracted away from or taken for granted by the programmer or really, most people.

    The topics covered in SPO600 are very specialized. It really is hard to say accurately what this course is about, and “Software Portability and Optimization” is only the tip of the iceberg; the title is almost misleading about what the course has to teach you. The course covers a broad spectrum of topics, from compilers to architectures to optimizations. Sometimes it even covers things that may not have been course topics in the first place, such as useful Linux commands and programs.

    This course is also unlike any other I’ve taken before. I’ve never taken a course where blogs and the zenit wiki are used to communicate. There was also an emphasis on doing things and reporting the results, as well as a bit more leeway in how to approach a given problem. Other courses are more direct in what they have to teach and offer, but the approach used in SPO600 required you to think a bit more.

    Unfortunately I was unable to take in everything that the course tried to teach, but I still feel I learned very much from it. I was initially curious only about a few things here and there relating to the topics of the course, and the course gave me a lot more than I bargained for. The course opens your eyes just enough so that you realize you actually know absolutely nothing. Computing turned out to be a pretty complicated subject, and for the various topics SPO600 covered, it probably only scratched the surface. Scratching that surface, though, felt like gaining some deep knowledge on the subject even if it was only the bare-bones basics. Regretfully I was unable to find an optimization for my project, but I was still able to take away some further knowledge about the make system and other compiler options from doing the project. There really is no other course like SPO600, and even if I bit off a bit more than I could chew, I’m really glad I took this course.

    by tng23spo600 at April 22, 2016 03:54 AM

    Lab6: Auto-Vectorization of Loops

    In this lab, I got to explore the auto-vectorization features of gcc. I started by writing a small program that creates two 1000-element int arrays filled with random numbers and a third array that stores the element-wise sums of the other two; the program prints the sum of all numbers in the third array before exiting. The code is shown below:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main() {

        int array1[1000], array2[1000], array3[1000];
        long int sum = 0;
        int i;

        srand(time(NULL)); /* seed rand(); assumed from the time.h include */

        /* Fill array 1 with random numbers */
        for (i = 0; i < 1000; i++) {
            array1[i] = rand();
        }

        /* Fill array 2 with random numbers */
        for (i = 0; i < 1000; i++) {
            array2[i] = rand();
        }

        /* Sum contents of arrays 1 and 2 into 3 */
        for (i = 0; i < 1000; i++) {
            array3[i] = array1[i] + array2[i];
        }

        /* Sum contents of array 3 into long int */
        for (i = 0; i < 1000; i++) {
            sum += array3[i];
        }

        printf("Sum is: %ld\n", sum);

        return 0;
    }


    I made two object dumps of this program to compare with by compiling with two different compiler options. The first version was made with only -g to turn on debugging symbols: gcc -g arraysum.c -o arraysum.out. The second version was made with: gcc -g -O3 -ftree-vectorizer-verbose=1 arraysum.c -o arraysum.out. The -O3 turns on more optimizations including vectorization while -ftree-vectorizer-verbose=1 prints out what was vectorized and not. The messages printed out when attempting to vectorize were:

    Analyzing loop at arraysum.c:29
    arraysum.c:29: note: vect_recog_widen_sum_pattern: detected: patt_7 = _31 w+ sum_50;
    Vectorizing loop at arraysum.c:29
    arraysum.c:29: note: LOOP VECTORIZED.
    Analyzing loop at arraysum.c:24
    Vectorizing loop at arraysum.c:24
    arraysum.c:24: note: LOOP VECTORIZED.
    Analyzing loop at arraysum.c:19
    Analyzing loop at arraysum.c:14
    arraysum.c:5: note: vectorized 2 loops in function.

    Of the four loops in the program, gcc was able to vectorize two of them: the loop where the third array is filled with the sums, and the loop where the sum of all values in the third array is calculated. I initially used -ftree-vectorizer-verbose=2, but the messages were quite long, so I used verbose=1 instead. The reason the first two loops were not vectorized is shown with verbose=2 but not verbose=1. The likely issue is the calls to the rand() function, which the vectorizer cannot analyze. A snippet of the verbose=2 messages concerning the first two loops is below:

    arraysum.c:19: note: not vectorized: loop contains function calls or data references that cannot be analyzed
    arraysum.c:19: note: bad data references.
    Analyzing loop at arraysum.c:14

    arraysum.c:14: note: not vectorized: loop contains function calls or data references that cannot be analyzed
    arraysum.c:14: note: bad data references.
    arraysum.c:5: note: vectorized 2 loops in function.

    The assembly for the vectorized loop that fills the third array is:

    4005a8: 910083a3 add x3, x29, #0x20
    4005ac: 8b000062 add x2, x3, x0
    4005b0: 913f03a3 add x3, x29, #0xfc0
    4005b4: 8b000061 add x1, x3, x0
    4005b8: 4c407841 ld1 {v1.4s}, [x2]
    4005bc: d283ec02 mov x2, #0x1f60 // #8032
    4005c0: 4c407820 ld1 {v0.4s}, [x1]
    4005c4: 8b1d0042 add x2, x2, x29
    4005c8: 8b000041 add x1, x2, x0
    4005cc: 4ea08420 add v0.4s, v1.4s, v0.4s
    4005d0: 91004000 add x0, x0, #0x10
    4005d4: 4c007820 st1 {v0.4s}, [x1]
    4005d8: f13e801f cmp x0, #0xfa0
    4005dc: 54fffe61 b.ne 4005a8 <main+0x58>

    The assembly for the vectorized loop that sums up the elements of the third array is:

    4005f0: 4cdf7800 ld1 {v0.4s}, [x0], #16
    4005f4: 0f20a402 sxtl v2.2d, v0.2s
    4005f8: 4ee18441 add v1.2d, v2.2d, v1.2d
    4005fc: 4f20a400 sxtl2 v0.2d, v0.4s
    400600: eb01001f cmp x0, x1
    400604: 4ee18401 add v1.2d, v0.2d, v1.2d
    400608: 54ffff41 b.ne 4005f0 <main+0xa0>

    When comparing the object dump for the version of the program that had vectorization with the one that didn’t, the vectorized loops contained fewer instructions inside the body than the build where vectorization wasn’t enabled. There were also instructions specific to vectorization, such as ld1, which loads one or more elements from memory into a vector register. For example, in the first dump above, the instruction at 4005b8 is: “ld1 {v1.4s}, [x2]”. v1 is the vector register, .4s means the register is treated as four 32-bit lanes, and x2 is the 64-bit general-purpose register holding the address to load from.
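
    To show what one pass of the vectorized add loop does, here is a small sketch written with the GCC/Clang vector extension. The type v4si and the function add_arrays_4s are names chosen for illustration, and the comments pairing each line with an instruction from the dump are my reading of it, not something the compiler guarantees to emit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints in one 128-bit value,
   the same shape as an AArch64 "v0.4s" register operand. */
typedef int v4si __attribute__((vector_size(16)));

/* dst[i] = a[i] + b[i], four elements per step; n must be a multiple of 4. */
void add_arrays_4s(int *dst, const int *a, const int *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4si va, vb, vsum;
        memcpy(&va, a + i, sizeof va);       /* like ld1 {v1.4s}, [x2] */
        memcpy(&vb, b + i, sizeof vb);       /* like ld1 {v0.4s}, [x1] */
        vsum = va + vb;                      /* like add v0.4s, v1.4s, v0.4s */
        memcpy(dst + i, &vsum, sizeof vsum); /* like st1 {v0.4s}, [x1] */
    }
}
```

    The loop counter advances by four elements (16 bytes) per iteration, which matches the “add x0, x0, #0x10” in the dump.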

    Vectorization can also be implemented in the volume sampling lab as it also contains many loops. The loops are used to fill up the lookup table and vectorization should be able to speed up the process. The scale function which is in a loop might be the only issue as it may be too complicated for gcc to analyze and determine if the loop can be vectorized. Perhaps this problem can be alleviated by inlining it inside of the loop or by rewriting it more appropriately (I believe my implementation of the scale function could be a bit less messy than the way I wrote it).
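
    A minimal sketch of the inlining idea, assuming 16-bit signed samples and a volume factor between 0 and 1. The names scale_sample, build_volume_table and TABLE_SIZE are hypothetical and this is not my actual lab code; whether gcc vectorizes the loop still depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 65536   /* one entry per possible 16-bit sample */

/* Hypothetical scale function: small and static inline, so the compiler
   can see its body when deciding whether the calling loop vectorizes. */
static inline int16_t scale_sample(int16_t sample, float volume)
{
    return (int16_t)(sample * volume);
}

/* Fill table[s + 32768] with the scaled value of sample s. */
void build_volume_table(int16_t table[TABLE_SIZE], float volume)
{
    assert(volume >= 0.0f && volume <= 1.0f);
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i] = scale_sample((int16_t)(i - 32768), volume);
    }
}
```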

    by tng23spo600 at April 22, 2016 03:50 AM

    Nina Wip

    Trying to get my changes upstream

    As noted before, my optimization was to change the compiler flag from -O2 to -O3. This increased the speed by 11% on x86_64. During the test phase of this optimization I changed the compiler flags in the Makefile itself. If I wanted this change to be accepted by the community, I'd have to find the place where the Makefiles are generated.

    This would normally be the configure file. However I could not find where the flags were set for the makefiles which I thought was very strange because I am very sure that when you configure and make the program, it compiles with the -O2 flag.

    I used grep to find files where -O2 would be used, and the only file it found was an instructions file of how you can manually add -O2 while configuring, not as a standard.

    Then I tried using grep to find CFLAGS and where they would be defined. What I discovered is that they use a pthread macro which helps find the right flags for compilation. I quoted this from the md5deep documentation:
    #   This macro figures out how to build C programs using POSIX threads. It
    #   sets the PTHREAD_LIBS output variable to the threads library and linker
    #   flags, and the PTHREAD_CFLAGS output variable to any special C compiler
    #   flags that are needed.
    I did not know how to manipulate the pthread configuration so it would always use -O3. I did read in their instructions that they had an email address for questions or remarks on the compiler options in the README file. It was however not in the file, so I could not contact them personally either.

    On that note I'm sad to share that I could not get the right flags in the configure step. This means I could not commit anything to the github project because I could not contact them and ask for help or an explanation.

    by Nina at April 22, 2016 03:36 AM

    Vishnu Santhosh Kumar

    Lab 6 – Vectorization

    Vectorization is the process of rewriting a loop so that instead of processing a single element, it can process multiple elements simultaneously.

    To show this in action I used the following code:

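    #include <stdio.h>
    #include <stdlib.h>
    #define NUM 1000

    int main()
    {
        int a1[NUM], a2[NUM];
        int i = 0;

        /* Fill both input arrays with random values in 0..99 */
        for (i = 0; i < NUM; i++) {
            a1[i] = rand() % 100;
        }
        for (i = 0; i < NUM; i++) {
            a2[i] = rand() % 100;
        }

        long int total = 0;
        int a3[NUM];

        /* Element-wise sum; index with i, since a3[NUM] is out of bounds */
        for (i = 0; i < NUM; i++) {
            a3[i] = a1[i] + a2[i];
        }
        for (i = 0; i < NUM; i++) {
            total += a3[i];
        }

        printf("Total: %ld\n", total);
        return 0;
    }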


    The code above creates two 1000-element integer arrays and fills them with random numbers, then sums those two arrays to a third array, and finally sums the third array to a long int and prints the result.

    The code is compiled with the -O3 option, which enables the vectorization optimizations.

    402388: 91420222  add x2, x0, #0x80, lsl #12
    40238c: 3c407c00  ld1 {v0.2d}, [x0]
    402390: 3cdf7c21  ld1 {v1.2d}, [x1], #16
    402394: 3e61d400  fadd v0.2d, v0.2d, v1.2d
    402398: 4c9f2c00  st1 {v0.2d}, [x0], #16
    40239c: eb02001f  cmp x0, x2

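    Here, ld1 loads elements from memory into a vector register, and st1 stores the elements of a vector register back to memory. In this listing each register is used as two 64-bit lanes (.2d), so each instruction handles two elements at once. These two instructions are what make the vectorized loop possible.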


    by Vishnu at April 22, 2016 02:51 AM

    Berwout de Vries Robles

    Phase 3 Pushing changes upstream

    To get my changes accepted upstream, the first thing I needed to figure out was where in the make / build process to insert my change, since in the phase 2 optimization testing I had only edited the Makefile directly to see the changes.

    It turns out ncompress uses a Makefile.def, which contains the definitions to be used for the make call. The package is not built with ./configure like most, but instead, after cloning you can directly make install. This means I had to add the following 2 lines to the Makefile.def:
    #Compiler optimization flags
    I tested whether the package still built correctly with my change to the Makefile.def and it built as it should with the -O3 optimization turned on.

    Next I had to figure out how to contribute to the package. Luckily since the package is available on github I can fork the package. So that is what I did, I forked the package, made the change and committed it to my branch. I then created the following pull request:

    I've added a link to my blog and a short description in the pull request, since there is no other means of communication described on the project page. Now all we can do is wait for a response!

    by Berwout de Vries Robles at April 22, 2016 01:46 AM

    Vishnu Santhosh Kumar

    Final Project : Phase 3.4 (last)

    Software Package : LZ4

    Official Website :

    Git Hub :

    Final Aim: Optimize and improve the performance of the software on AArch64 and x86_64 systems.

    Phase 3.4 –Final Patches Submitted and Project Conclusion

    Date : April 6,2016

    April 6,2016

    Submitted the final patches to the LZ4 community

    Details :

    The final patches of the code have been submitted to the LZ4 community, and I am waiting for a response. All the changes made in the previous phases of the project have been added to the final code. The community is very active, so I’m expecting a reply soon.

    April 7, 2016

    Response from LZ4 community.

    Details  :

    The community has responded to my pull requests on GitHub. They analysed my code and gave the following feedback.
    The community replied that the function “LZ4IO_compressFilename_extRess()” was not the right place for one of my decompression optimizations; making similar changes in a function named “LZ4IO_GetBlockSize_FromBlockId()” could have improved the program much more. The community was satisfied with the optimizations I made to improve performance on x86 machines, but they raised concerns about the performance of the code on some other platforms. They also told me that the compiler options I set for the platform-specific flag won’t work under certain conditions or on new platforms. The project owner also pointed out a mistake in my pull request regarding the cache size. When I examined the oldest versions of the LZ4 code, I noticed an option to set the cache size of the compression block, and I increased this cache size to see how the compression speed changed. My assumption was right and the compression speed increased. However, I could not find this option in the latest version of the code. I mentioned this in the pull request, and the community told me that it is still available but has been moved to a different file. The project owner also thanked me for trying to optimize lz4 and told me that he had gone through my blog.


    The SPO course and this final project have greatly improved my coding and other skills. This course helped me explore the world of open source and many other new things. One such new thing is the habit of blogging. I never had a blog before, and now I’m really interested in blogging and sharing my research. The Professor was very helpful and inspiring. He made the labs fun and helped us improve our team skills by having us work in groups for all labs. He explained things very well and gave us a lot of references to follow. My classmates were enthusiastic and energetic. This was one of the best courses I have attended, and I took away many new things to continue my research. I once again thank my Professor and my classmates for being with me and supporting me in this course for the whole semester.


    by vishnuhacks at April 22, 2016 01:34 AM

    April 21, 2016

    Berwout de Vries Robles

    Project Phase 2

    After last week, when we ended our project selection with a somewhat hasty switch to another project, I am happy to be able to just dig into this package a little bit. Sadly I have been sick half of this week (still am), so the progress is slower than I would have hoped.

    In my project selection blog I mentioned that I wanted to look at vectorization options the compiler can do for us if we make some small changes to the code. However before looking into the code I went to take a look at the makefile to see if there are any vectorization flags currently turned on.

    This is what the compiler options line looks like on both betty and my own x86 machine:
    What I found out is that CFLAGS, CPPFLAGS and LDFLAGS are all undefined in this case. What I also found strange is that these options define a set memory for the program to use and a number of registers the program can use.

    The most glaring lack here is that there is not even a base level of optimization applied, so I set out to test that first. To verify that the code has not been changed in a way that it now has a different meaning, we will be using md5 checksum to see if our compressed and uncompressed files remain the same. After slowly ramping up the optimization level, it turns out that the program works on O3, without it causing any problems.

    During benchmarks I first encountered a couple of runs where it looked like uncompressing the files was slower than before applying the optimization, however after running some more benchmarks, it seems to me that it was probably background noise from other processes running on my system.
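
    One simple way to keep a single noisy run from skewing a comparison is to repeat each benchmark several times and compare medians instead of individual timings. The sketch below is only an illustration of that idea; median_seconds is a hypothetical helper, not something I used for the numbers in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timings in seconds; sorts the array in place. */
double median_seconds(double *runs, size_t n)
{
    assert(n > 0);
    qsort(runs, n, sizeof runs[0], cmp_double);
    if (n % 2 == 1)
        return runs[n / 2];
    return (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
}
```

    With made-up timings of 9.5, 9.4, 12.0, 9.6 and 9.3 seconds, the median is 9.5, so the 12.0 outlier caused by background activity does not distort the result the way it would distort an average.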

    After the -O3 optimization level had been applied we see the following results in the benchmarks:
    aarch64 testserver Betty:
    compress:
    2545934560(~2.4GB) textfile:
    real:    64.467s
    user:   63.280s
    sys:     1.180s

    1073741824(1GB) integerstream:
    real:    9.413s
    user:   8.980s
    sys:     0.430s

    uncompress:
    2545934560(~2.4GB) textfile:
    real:    10.882s
    user:   9.510s
    sys:     1.370s

    1073741824(1GB) integerstream:
    real:    4.734s
    user:   3.920s
    sys:     0.700s

    my own x86_64 Fedora 23 installation:
    compress:
    2545934560(~2.4GB) textfile:
    real:    34.812s
    user:   18.821s
    sys:     4.715s

    1073741824(1GB) integerstream:
    real:    13.274s
    user:   3.789s
    sys:     1.651s

    uncompress:
    2545934560(~2.4GB) textfile:
    real:    45.669s
    user:   5.779s
    sys:     2.339s

    1073741824(1GB) integerstream:
    real:    17.713s
    user:   2.814s
    sys:     1.107s

    If we compare that with last week's results we find that in all of the cases the compression was faster with the optimization turned on. However oddly enough on my local machine the uncompress was notably slower! This could also be explained by the amount of possible disturbances in the background processes of my machine since I am running a Virtual machine to test with. However I have tested extensively and have yet to see the optimized version of uncompress not do worse.

    To go about implementing this optimization in the actual program I will have to add it to the Makefile recipe, for the testing I have just been editing the Makefile itself. I have also been looking for a reason as to why compression on the aarch64 server betty is so terribly slow, while uncompress is very fast, but I have yet to find the answer.

    by Berwout de Vries Robles at April 21, 2016 10:09 PM

    Tom Ng

    The results of Optimizing xzutils with -O3

    Ever since I wrote my last post on my attempt to optimize xzutils, I have been compiling different builds of xzutils with different compiler flags. The results of my first two benchmarks showed that -O3 ran slower than the default compilation, which used -O2. These results surprised me, so I wanted to figure out how a program could run slower with -O3 instead of -O2.

    -O3 was simply -O2 with 9 additional optimizations turned on. The 9 optimizations are: -fgcse-after-reload, -finline-functions, -fipa-cp-clone, -fpredictive-commoning, -ftree-loop-distribute-patterns, -ftree-partial-pre, -ftree-vectorize, -funswitch-loops and -fvect-cost-model. This resulted in me creating 9 builds of xzutils, each with one of those optimizations passed into the configure script and compiled. The results of these builds are as shown in the picture below:

    (Picture: results)

    The picture shows the times recorded by the time command when compressing and decompressing the test files. Highlighted in green are times where an improvement relative to the executable with the default optimizations (-g -O2) was found.

    Overall, the results show that compiling with -O2 and one individual optimization that -O3 turns on has an overall negative impact on the running speed of the program since the times to complete operations have mostly increased. Savings if any tend to happen more during the decompression operations than the compression operations. All the individual optimizations were also slower compared to the -O3 version of the executable which was slower than the original executable to begin with.

    Furthermore, any reductions in time that a build may have achieved for a section were mostly insignificant and potentially false. For example, a time savings in the user section may not have been a real savings, since the time “saved” was actually moved onto the kernel (sys) time, resulting in no net gain.

    Of the 9 optimizations tested, -finline-functions seems to be a bit of an outlier. The executable compiled with this optimization is still slower than -O2 and -O3, but its timings are notably better than those of the other executables with the individual -O3 optimizations. In fact, its times are similar to those of the -O3 executable, but it still stands that adding -finline-functions to -O2 produces a slower running executable. In other words, -finline-functions adds considerably less runtime compared to the other optimizations, but the executable is still better off without it.

    It was surprising to me that each individual -O3 optimization made the program slow but combined (-O3), the times were better (though still slower than just -O2). This suggests that combining optimizations may yield a speed boost. It’s hard to determine which optimizations could work with each other but through the names alone, I tried -ftree-loop-distribute-patterns, -ftree-partial-pre and -ftree-vectorize. The build with these optimizations though, did not yield favorable results. Much like the other tests with the individual optimizations, it was slower than the -O2 version and the -O3 version as well.

    In the end, I was unable to optimize xzutils through the adding of the -O3 compiler optimizations. None of the individual -O3 optimizations result in any sort of meaningful speed improvement and overall slowed the program down. It seems -O3 isn’t used all the time because it may actually slow down programs and not just because it can break something as I had initially thought. The xzutils devs are in the right for not using -O3 as enabling -O2 without any -O3 optimizations will produce the faster running xzutils.

    by tng23spo600 at April 21, 2016 09:39 PM

    Lab7: Looking at Inline ASM in libmlx4-1.0

    In this lab, I chose a package whose source includes inline assembly and attempted to locate it and find out what it does. In the source, I found only three lines of assembly. The assembly was used exclusively inside a series of if-else directives for the C preprocessor, meaning only one or none of the assembly statements will be compiled in. The assembly specifically targeted the i386, x86_64 and ia64 platforms.

    The file mlx4.h contained the 3 asm statements found in the source code. The section where the assembly was found is an if-else block for the C preprocessor that defines wc_wmb(). Depending on the platform, wc_wmb() is defined as a single line of assembly or as wmb() instead. The input and output operands were empty. Declaring the asm volatile and putting “memory” in the clobber list ensures the statement doesn’t get optimized away or moved by the compiler. The assembly for all three sections seems to be an implementation of a memory barrier, which prevents the compiler from compromising the order of execution through optimization. Throughout the qc.c file, wmb() would be called in different parts of the source.

    A single line of assembly would be used if it was determined that the architecture being targeted was either i386, x86_64 or ia64. If the target architecture was none of these platforms, the else section defines wc_wmb() as wmb() without any assembly. I could not find wmb() being defined anywhere else in the program or redefined as something else elsewhere so it is possible that the call to wmb() may actually do nothing but let the compiler compile the program. If it does do nothing then it means the memory barrier for a section won’t be implemented and the program may suffer performance-wise after optimization because of this.
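
    The general pattern can be sketched like this. The macro demo_wc_wmb and the function demo_publish are hypothetical names, and the instructions chosen here are illustrative and may differ from what the package actually uses. The fallback branch uses the GCC/Clang builtin __sync_synchronize() so the sketch stays self-contained on other architectures; that is my substitution, not what mlx4.h does.

```c
#include <assert.h>

/* Hypothetical macro following the pattern described above: one line of
   inline assembly on known platforms, a generic fallback otherwise. */
#if defined(__x86_64__)
#define demo_wc_wmb() __asm__ volatile("sfence" ::: "memory")
#elif defined(__i386__)
#define demo_wc_wmb() __asm__ volatile("lock; addl $0,0(%%esp)" ::: "memory")
#else
/* No hand-written assembly: fall back to a GCC/Clang builtin full barrier. */
#define demo_wc_wmb() __sync_synchronize()
#endif

/* Write the payload, then the flag. The "memory" clobber keeps the compiler
   from moving either store across the barrier. */
void demo_publish(int *payload, volatile int *ready, int value)
{
    assert(payload != 0 && ready != 0);
    *payload = value;
    demo_wc_wmb();
    *ready = 1;
}
```

    The empty input and output operand lists and the “memory” clobber are the same ingredients noted above: the statement takes no C values in or out, but the compiler must assume it touches memory.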

    Only a single line of assembler is executed if one of three platforms are detected and none at all if the platform isn’t recognized. The increase in complexity of the code is insignificant and only serves to benefit the platforms that it does support. No portability is lost, just unsupported platforms may not run as well. For a series of small checks, it is probably worth having the assembler in the source code.

    by tng23spo600 at April 21, 2016 08:59 PM

    Giuseppe Ranieri

    Providing for the Upstream

    I have submitted my work to the upstream through github.

    I thought it would be best practice to first show my findings to the community so they could comment on benefits and drawbacks that I don't know about. Unfortunately, even 4 days after posting, no one has responded. I assume that even if I had made a pull request, the same thing would have happened, as no new pull requests have been answered either.

    For the sake of time though, if the work had been accepted, I would have had to go through their guidelines for contributing, which require signing a Google Individual Contributor License Agreement (CLA). The following is stated:

    Before you contribute

    Before we can use your code, you must sign the Google Individual Contributor License Agreement(CLA), which you can do online. The CLA is necessary mainly because you own the copyright to your changes, even after your contribution becomes part of our codebase, so we need your permission to use and distribute your code. We also need to be sure of various other things—for instance that you'll tell us if you know that your code infringes on other people's patents. You don't have to sign the CLA until after you've submitted your code for review and a member has approved it, but you must do it before we can put your code into our codebase. Before you start working on a larger contribution, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you. Coordinating up front makes it much easier to avoid frustration later on.

    Code reviews

    All submissions, including submissions by project members, require review. We use Github pull requests for this purpose.

    The small print

    Contributions made by corporations are covered by a different agreement than the one above, the Software Grant and Corporate Contributor License Agreement.

     I followed their suggestion of:

    Before you start working on a larger contribution, you should get in touch with us first through the issue tracker with your idea so that we can help out and possibly guide you. Coordinating up front makes it much easier to avoid frustration later on.

    But since they have not answered me yet, I was not able to continue with the signing of the CLA.

    by JoeyRanieri at April 21, 2016 05:55 PM

    Vishnu Santhosh Kumar

    Tips For Future DATA SCIENTISTs

    My Tips for Future Data Scientists

    • Be flexible and adaptable – There is no single tool or technique that always works best.
    • Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
    • It's not all model building – As the previous tip suggests, you must have skills beyond just model building.
    • Know the fundamentals of structuring data – Gain an understanding of relational databases. Also, learn how to collect and store good data. Not all data is useful.
    • Document what you do – This is important for others and your future self. As a related sub-tip, learn version control.
    • Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
    • Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
    • Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
    • Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
    • Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
    • Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
    • Be a polymath – It helps to be a person with a wide range of knowledge.

    by vishnuhacks at April 21, 2016 02:33 PM

    April 20, 2016

    David Wesley Au

    [Lab 3] Assembly Lab



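     /* AArch64 version; start, max, msg and len are defined elsewhere in the source (not shown here) */
     mov x3,start               /* x3 = loop counter */
    loop:
     mov x10,10
     adr x11,msg
     udiv x4,x3,x10             /* x4 = counter / 10 (tens digit) */
     msub x5,x10,x4,x3          /* x5 = counter - 10 * x4 (ones digit) */
     cmp x4,0                   /* result is not used in this excerpt */
     add x4,x4,48               /* convert both digits to ASCII */
     add x5,x5,48
     strb w4,[x11,5]            /* patch the digits into the message */
     strb w5,[x11,6]
     mov x0,1                   /* write(1, msg, len) */
     adr x1,msg
     mov x2,len
     mov x8,64
     svc 0
     add x3,x3,1
     cmp x3,max
     b.ne loop                  /* repeat until the counter reaches max */
     mov x0,0                   /* exit(0) */
     mov x8,93
     svc 0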


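     /* x86_64 version; start, max, msg and len are defined elsewhere in the source (not shown here) */
     mov $start,%r15            /* r15 = loop counter */
    loop:
     mov $10,%r10
     mov %r15,%rax
     mov $0,%rdx
     div %r10                   /* rax = quotient (tens digit), rdx = remainder (ones digit) */
     cmp $0,%rax                /* result is not used in this excerpt */
     mov %rax,%r14
     add $48,%r14               /* convert both digits to ASCII */
     mov %rdx,%r13
     add $48,%r13
     movb %r14b,msg+5           /* patch the digits into the message */
     movb %r13b,msg+6
     mov $len,%rdx              /* write(1, msg, len) */
     mov $msg,%rsi
     mov $1,%rdi
     mov $1,%rax
     syscall
     inc %r15
     cmp $max,%r15
     jne loop                   /* repeat until the counter reaches max */
     mov $0,%rdi                /* exit(0) */
     mov $60,%rax
     syscall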

    There is not much of a difference between the two. The methods may differ somewhat, but the logic is ultimately the same.

    by David Au at April 20, 2016 07:43 PM

    Project Phase 3

    Since Phase II, I have tried other possible optimizations, but none of them succeeded. As I have said before, after searching through the pxz.c file on several occasions, I cannot find another shortcut or optimization that could be made to the application. Between the two test files, which differ by 1 MB, there is only a negligible difference in time. From my understanding of PXZ so far, although the latest patch to the software is almost a year old, it already seems perfect in terms of code structure and speed. In the tests I have done, there was almost no difference in time/speed between our SPO600 servers.

    For four years or so after PXZ was made (2009), a number of "patches" were applied to the compression tool. Judging by the benchmarks the PXZ community has run, PXZ is meant for multiple cores and processors, which improve its compression time. The compression time of single-threaded PXZ is equivalent to that of the ordinary XZ compression tool.

    Although my optimization attempts have ended in failure, I have learned a lot during this project. I learned about how compression works in code, and how everything works together at the assembly level. In the future, I will attempt to contribute to other parts of the PXZ utility, be it the assembly, makefile, manpage or documentation.

    by David Au at April 20, 2016 07:16 PM

    [Lab 7] Inline Assembly

    The Open Source Package I have chosen to blog about is called "Mosh". Mosh is a mobile shell, a remote terminal application that allows roaming, supports intermittent connectivity and provides line echo. It also has line editing of user keystrokes. Mosh can be used as a replacement for SSH; it is free to use and available across many platforms.

    1. Mosh has 14 different MAKEFILES

    2. Mosh is available on Windows, OS X, Android, Chrome OS, Linux, BSD, and Solaris.

    3. Mosh reduces the median keystroke response time to nearly instantaneous time. This makes remote servers feel like they are "local" machines.


    5. When I tried to configure the package, I ran into this problem: configure: error: cannot find protoc, the Protocol Buffers compiler. This stopped me from attempting to build the package.

    by David Au at April 20, 2016 03:16 PM

    [Lab 2] Code Building

    1. hello-2.10

    I used these commands to build the software:


    I had no problems with either, and after each command was run, more files were available in the hello-2.10 directory.

    These are the files present after make:

    @aarchie hello-2.10]$ ls
    ABOUT-NLS  COPYING    ChangeLog.O  INSTALL  NEWS    README-dev      THANKS  aclocal.m4  config.h   config.log     configure     contrib  hello    lib  po   stamp-h1
    AUTHORS    ChangeLog  GNUmakefile  Makefile  README  README-release  TODO    build-aux  config.status  doc      hello.1  m4   man       src  tests

    2. ghostscript

    tar -xvf gnu-ghostscript-9.14.0.tar.xz


    Since there is no MAKEFILE, and it is suggested that I do not install on the SPO600 servers, there is nothing else I can do after configure. These are the options available after ./configure.

    @aarchie gnu-ghostscript-9.14.0]$ ls
    AUTHORS    DroidSansFallback.NOTICE  Resource    config.log  devices   expat                    ghostscript_rt.vcxproj  install-sh  libtool    missing   toolbin
    COPYING    LICENSE                   NEWS         aclocal.m4  base          config.sub  contrib       doc       ghostscript-ufst.vcproj  iccprofiles             jbig2dec  openjpeg  trio
    ChangeLog  LICENSE.Unicode           README       arch        config.guess  configure   cups          examples  ghostscript.vcproj       ijs                     lib         man        psi

    by David Au at April 20, 2016 02:58 PM

    April 19, 2016

    David Wesley Au

    [Lab 5] - Algorithm Selection Lab

    The first one should scale a signed 16-bit integer by multiplying it by a volume scaling factor expressed as a floating point number in the range of 0.000-1.000. This should be implemented as a function that accepts the sample (int16) and scaling factor (float) and returns the scaled sample (int16).
    The second version of the function should do the same thing, using a lookup table (a pre-computed array of all 65536 possible values). The lookup table should be initialized every time a different volume factor is observed. This should be implemented as a drop-in replacement for the function above (same parameters and return value).


    #include <sys/time.h>   // gettimeofday()
    #include <time.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    #define MAX 250000000
    #define FACTOR 0.500
    int16_t method1(int16_t, float);
    int16_t method2(int16_t, float);
    int16_t* sample;   // lookup table indexed by |s|, so it needs 32769 entries (0..32768)
    int main(){
        int16_t* meth1;
        int16_t* meth2;
        meth1 = (int16_t*)malloc(MAX * sizeof(int16_t));
        meth2 = (int16_t*)malloc(MAX * sizeof(int16_t));
        int i;
        struct timeval stop, start;
        double diff;
        sample = (int16_t*)malloc(32769 * sizeof(int16_t));

        srand(time(NULL)); // generate sound sample by using random number generator
        int16_t* sound;
        sound = (int16_t*)malloc(MAX * sizeof(int16_t));
        for (i = 0; i < MAX; i++) {
            sound[i] = rand() % 65536 - 32768;
        }
        for (i = 0; i < MAX; i++){
            printf("Value of sound sample: %d\n", sound[i]);
        }

        // Calculating time taken for method1
        gettimeofday(&start, NULL);
        for (i = 0; i < MAX; i++){
            meth1[i] = method1(sound[i], FACTOR);
        }
        gettimeofday(&stop, NULL);
        printf("took %ld\n", (long)(stop.tv_sec - start.tv_sec));

        // Calculating time taken for method2 (includes building the lookup table)
        gettimeofday(&start, NULL);
        for (i = 0; i < 32769; i++){
            sample[i] = i * FACTOR;
        }
        for (i = 0; i < MAX; i++){
            meth2[i] = method2(sound[i], FACTOR);
        }
        gettimeofday(&stop, NULL);
        printf("took %ld\n", (long)(stop.tv_sec - start.tv_sec));
        return 0;
    }

    // Method 1: multiply the sample by the volume factor directly
    int16_t method1(int16_t s, float f){
        return s*f;
    }

    // Method 2: look the scaled value up in the pre-computed table
    int16_t method2(int16_t s, float f){
        int16_t result;
        if (s >= 0){
            result = sample[s];
        } else {
            result = -sample[-s];
        }
        return result;
    }

    aarchie: Method1 took 215,498 ms || Method2 took 168,443 ms
    x86_64: Method1 took 20,429 ms || Method2 took 23,564 ms

    by David Au at April 19, 2016 08:33 PM

    [Lab 4]

    #include <stdio.h>
    int main(){
          printf("Hello World!\n");
          return 0;
    }

    Compiler Options:
    -g               # enable debugging information
    -O0              # do not optimize (that's a capital letter and then the digit zero)
    -fno-builtin     # do not use builtin function optimizations

    -static option:

    -It statically links the C library code the program needs into the executable instead of using the shared library at run time, which makes the file much larger than the original file.

    Without "-fno-builtin" option:

    -Total file size is reduced.

    Without "-g" option:

    -Total file size is reduced.

    Additional printf statements:

    -The values are moved into multiple registers and pushed onto the stack.

    Move printf to a separate function (output()):

    -It will call output() from main, and move $0x0 to %eax to empty the register before each function call.

    by David Au at April 19, 2016 08:32 PM

    Andrei Topala

    Inline Assembler Lab

    Let’s look at the use of inline assembler in a C program.

    Assembly-language code can be embedded into a C (or other high-level language) source file in order to optimize a certain part of the program by using processor-specific instructions that are faster than what the compiler would create out of functionally-equivalent C code. Moreover, inline assembly code can be used for hardware-specific tasks that would be impossible with high-level code.

    An example of inline assembly code can be found in K9Copy, a DVD backup tool. In the file src/dvdread/md5.h, there is the following piece of code:

    #if defined __GNUC__ && defined __i386__
    static md5_uint32
    rol(md5_uint32 x, int n)
    {
      __asm__("roll %%cl,%0"
              :"=r" (x)
              :"0" (x),"c" (n));
      return x;
    }
    #else
    # define rol(x,n) ( ((x) << (n)) | ((x) >> (32-(n))) )
    #endif

    If the program is compiled with a GNU compiler and on an i386 processor, the function rol(md5_uint32 x, int n) is defined. It contains the assembly instruction roll (rotate left long), which rotates the 32-bit integer x to the left by n places, bringing bits that fall off the end back to the beginning. If the program is compiled with a different compiler or on a different processor, the function is instead replaced with a macro function that achieves the same result through bit shifts; it would use multiple instructions whereas the version using inline assembly uses only one.

    This, I think, is a good use of inline assembly: it’s there to be used if it can make the executable faster, and if it can’t be used then there is an alternative. Unless there are machines that have the __i386__ macro defined but no rol function available (which I don’t think would be the case?), there are no problems with portability. The concept is sound. However, this particular source file was written in 1999. Modern compilers would almost certainly recognize the second function (that is, the macro function using bit shifts) and replace it with a rol instruction if possible. Using inline assembly for optimization is not so (relatively) straightforward a job as one might think, or as it would have been 20 years ago. Most attempts at optimization with inline assembly would probably be surpassed by the compiler’s optimizations; at worst, inline assembly might even impede the compiler and result in an executable that performs worse than it would have had the compiler been left to its own devices. There are, I’m sure, exceptions, and in some situations it could be the best course of action to use inline assembly in an optimizing capacity, but it probably shouldn’t be the first resort.
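    To make the point about modern compilers concrete, here is a minimal sketch (not taken from K9Copy) of a portable rotate-left written with shifts only. The name rotl32 is hypothetical, and masking n is an added assumption so that a rotation by 0 or 32 stays well defined in C. Whether a given compiler turns this into a single rotate instruction depends on the compiler and the optimization level.

```c
#include <stdint.h>

/* Rotate the 32-bit value x left by n bit positions using only shifts.
 * n is reduced modulo 32 so that neither shift count ever reaches 32,
 * which would be undefined behaviour for a 32-bit operand. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

    For example, rotl32(0x80000001u, 1) yields 0x00000003. Compiling with gcc -O2 -S and reading the generated assembly is one way to check which instruction was actually chosen.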

    Inline assembly code should, of course, still be used for certain platform-specific operations or for tasks that cannot be done with high-level languages–those things, at least for the moment, lie beyond the powers of the compiler.

    by andrei600 at April 19, 2016 01:51 AM

    April 18, 2016

    Tom Ng

    Lab4: Compiler Options and their Impact

    Lab 4 was about observing the impact of compiler options on a simple program. This simple program was nothing more than a call to printf saying “Hello World”. In this lab, I compiled 7 different versions of the same program, each with either different compiler options or a slight difference in the source code that accomplishes (mostly) the same result.

    Before experimenting with compiler options, it is necessary to see how the compiler normally compiles a program without any options. The compile command would simply be: gcc hello.c. The compiler compiled the source to a binary that was 8190 bytes.

    The first variation of the program was compiled with the options -g -O0 -fno-builtin, resulting in the command: gcc -g -O0 -fno-builtin hello.c. The -g option enables debugging symbols, -O0 sets the optimization level to 0, which means no optimizations, and -fno-builtin disables gcc's built-in function optimizations. The resulting file was larger, at 9184 bytes. When examining the object dump (objdump -d) of the resulting executable and comparing it with the executable built without options, there was only one change: in the version built without options, the call to the printf function was changed to a call to the function puts instead.

    The second variation of the program was compiled with the -static option added in. This option links statically instead of dynamically. The executable produced with this option shot up in size to 681699 bytes, roughly 74 times the 9184 bytes of the first variation. The object dump also showed a massive number of added instructions. While main had the same number of lines, start_main, which was only 4 lines in the dynamically linked version, had many more; possibly too many for objdump to show, as the end of the header showed “…”, suggesting that further lines may have been truncated.

    The third variation was compiled without -fno-builtin. The filesize became slightly smaller at 7149 bytes. The object dump reveals the same change as the first variation where the printf calls were replaced by puts instead.

    The fourth variation was compiled without -g. Comparing to the first variant, this resulted in no change to the file size or the assembly code. However doing a hash check on both executables resulted in different values so there is some difference here not visible in the file size or when calling objdump -d.

    The fifth experiment is different in that this time it is a source code change rather than a compiler option change. The original program simply calls the printf function and passes it one argument: the string containing “Hello world”. The change here is to add additional arguments. The additional arguments are integers meant to be printed with the first string. I made 10 sub-variations of this program, each with a different number of parameters passed to printf (1 to 10). When comparing the object dumps of the sub-variations, there were more mov instructions in the main header as the number of parameters passed increased. There were also more instructions for pushing values onto the stack in that header. There were other changes throughout the object dumps, but they were only address changes.

    The sixth experiment was much like the fifth in that the source code was changed rather than the compiler options. The original program simply calls the printf function inside main and then exits. Here a call to the function output() was made in main, and output() contains the call to printf. In the disassembly, this was reflected by a new header, output, being created that contained many of the instructions that were in the main header. The main header was also reduced by one instruction.

    The seventh and final experiment was setting the optimization level to the highest, which meant changing -O0 to -O3. Here the filesize went up from 9184 bytes to 10608 bytes. There were many changes to the assembly, as revealed in the object dump. Some headers, like main, were moved to other parts of the section. The main header also contained fewer lines of assembly to execute.

    by tng23spo600 at April 18, 2016 04:22 AM

    April 17, 2016

    Tom Ng

    On two Optimizations: -fmerge-constants and -fpeel-loops

    Recently, I had the opportunity to spend time to examine two optimizations available in gcc: -fmerge-constants and -fpeel-loops. For each optimization, I examined what it did and tried to write code that would show the effect of the optimization.

    -fmerge-constants is an optimization where, as its name suggests, the compiler attempts to merge all constants that have the same value. The idea is that if several variables hold the same value and it is assured that they will not change (they are constant), the same variable can be referenced instead of using multiple variables. This can help create smaller executables since there are fewer variables. This optimization even spans compilation units, meaning it’s not limited to the variables of one specific file in the source code.

    The simple program I made to highlight the optimization simply prints constant variables of the same values in main and includes the header for another function that does the exact same thing with the exact same values. I compiled two versions of the program: one with the optimization and one without. When comparing the disassembly of the two executables, the only things that changed were the addresses in the mov instructions. There was no increase or decrease in instructions, yet the file size of the optimized executable was smaller. The savings came from the .rodata section where, for the optimized executable, the variable and its value were only declared once instead of multiple times. -fmerge-constants is enabled at optimization levels -O, -O2, -O3 and -Os, and rightfully so, because it should always be used. There are essentially no downsides to using this optimization unless your code calls for violating the constness of one of the variables you declared.
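    A rough way to observe the effect from inside a program is sketched below. This is not the test program described above; the function names are hypothetical, and the outcome of the pointer comparison is deliberately left open, because the C language allows identical string constants to be stored either once or separately and the result depends on the compiler and flags used.

```c
#include <string.h>

/* Two functions that each refer to their own copy of an identical
 * read-only string constant. */
const char *greeting_a(void) { return "identical constant"; }
const char *greeting_b(void) { return "identical constant"; }

/* Returns 1 if both functions hand back the same address, meaning the
 * toolchain stored the string once, and 0 if two copies exist.
 * Either result is permitted. */
int constants_were_merged(void)
{
    return greeting_a() == greeting_b();
}

/* The contents compare equal regardless of whether merging happened. */
int constants_have_same_value(void)
{
    return strcmp(greeting_a(), greeting_b()) == 0;
}
```

    Building this once with -fmerge-constants and once with -fno-merge-constants, then comparing the .rodata sections (for example with objdump -s -j .rodata), is one way to look for the difference described above.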

    -fpeel-loops is an optimization where either the first or last iteration of the loop can be hoisted out, which can potentially eliminate a variable or operation inside the loop. It is possible for the first or last iteration of a loop to be a special case that differs from the rest of the iterations. Handling this case tends to require extra code that ends up inside the loop. If this optimization is enabled, the compiler can move the special case out of the loop by handling it before or after the loop, and the loop can then be adjusted to fit the other cases while eliminating a variable or operation. This can lead to further loop optimizations: for example, if the compiler determines the number of iterations is small enough, it may completely unroll the loop, eliminating it.

    Unfortunately I don’t have any working code for this optimization but here is a trivial example on what would happen with this optimization enabled. Given:

    int x = 2000;

    int y;

    for (y = 0; y < 10; y++) {


         x = y;



    The loop will print the squares of x and increment x by 1. Notice, however, that the way this code was written makes the first iteration an anomaly in which x starts at a completely different number before becoming what would be normal for the loop. What the compiler can do is hoist the first iteration out and put it before the loop:



    for (…


    Following this, the loop can be adjusted like:


    for (x = 1; x < 10; x++) {




    The end result looks like:


    int x = 2000;


    for (x = 1; x < 10; x++) {




    Notice the entire elimination of the y variable in the snippet. Furthermore, an entire operation inside the loop was eliminated (x = y). In this particular example, the number of lines of code was reduced but that does not always happen as it depends on the operation that occurs inside the loop and additional logic would be needed to handle the special case if the exit condition of the for loop is not constant.
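    Since the snippets above are incomplete, here is a self-contained sketch in the same spirit, written by hand rather than produced by gcc. The function names are hypothetical, and the loop accumulates the squares into a total instead of printing them so that the two forms can be compared. The first function keeps the special starting value inside the loop; the second peels the first iteration off, which removes both the y counter and the x = y assignment.

```c
/* Original form: x carries a special starting value (2000) into the
 * first iteration and is then overwritten with the loop counter y. */
long sum_original(int n)
{
    long total = 0;
    int x = 2000;
    int y;
    for (y = 0; y < n; y++) {
        total += (long)x * x;
        x = y;
    }
    return total;
}

/* Hand-peeled form: the special first iteration is handled before the
 * loop, so the loop can count with x directly. The remaining n - 1
 * iterations use x = 0, 1, ..., n - 2, matching the original. */
long sum_peeled(int n)
{
    long total = 0;
    int x;
    if (n > 0) {
        total += 2000L * 2000L;      /* peeled first iteration */
        for (x = 0; x < n - 1; x++)
            total += (long)x * x;
    }
    return total;
}
```

    Both functions return the same value for every n; for n = 10 that is 2000*2000 plus the squares of 0 through 8. The guard on n > 0 is the extra logic needed when the trip count is not a known constant.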

    by tng23spo600 at April 17, 2016 01:24 AM

    April 16, 2016

    Tom Ng

    Lab3: Programming in Assembly

    Lab 3 was a fun lab about learning assembly. Assembly is certainly very different from higher level languages but it has its charms. Programming in assembly felt like doing a time consuming puzzle that you know you can eventually figure out and when you do, it feels great. Assembly brings back the joy of printing your first variable or making your first working loop which is probably taken for granted when you’ve programmed enough. The fun part or challenge of assembly comes from its restrictions: there is a limited set of instructions which must be combined in order to do something that may have taken only one line of code in another language. Assembly also uses a limited number of registers which are used to store values and must be used sparingly.

    I was tasked with coding a loop that would simply print the numbers from 0-9. Sounds simple enough. That would be two or three lines of code in another language, but not in assembly. To start, I copied the “Hello World!” example and tried to compile it. It worked, but when I tried to execute it I got an exec format error. It took a while before I figured out what was wrong, because I thought it was something like a syntax error, but the message did not contain any line numbers for where an error could have been. It turned out I had been trying to execute the object file all this time, and that the assembler (as) doesn’t link by default. I thought that was weird because when using gcc to compile C or C++, it links by default unless the -c flag is used. I then had to actually call the linker ld, whereas when linking C or C++ I would use gcc and it would use ld. I guess I should have realized sooner, since I had to keep typing chmod +x before attempting to execute what I thought was the compiled executable. Well, now I could start making loops.

    It turned out the loop was actually already given to me, and what I had to do instead was code what the loop was supposed to do. The requirement was to print “Loop: ” and then the current iteration of the loop on one line. I thought of printing “Loop: ” and then printing the number, but it turned out you can actually modify the string by writing to the address of the string, offset by the desired amount. This required declaring the string with extra spaces at the end: “Loop: “. It was also necessary to add 48 to the number in order to print its ASCII equivalent. The next part of the lab expanded on the previous part by printing the numbers up to 30. This meant having to print numbers up to two digits long, but only a single digit could be printed at a time. Printing each digit meant breaking up the number with the help of division and remainders, which the div instruction conveniently provides. Leading zeroes also had to be suppressed, which was simple enough since individual digits were being printed anyway. To suppress the zero, just before the code prints the digit, it checks whether the digit is zero and prints it only if it is not. If it is zero, it jumps past where it would have printed the digit. The code I wrote for the lab only allows numbers with two digits to be printed, but the logic for printing larger numbers is there, probably at the cost of using more registers.
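    The digit handling described above can be sketched in C as follows. This is only an illustration of the arithmetic, not the assembly from the lab: the function name and buffer layout are hypothetical, and replacing a zero tens digit with a space is an assumption standing in for the jump that skips printing it.

```c
/* Fill out[] with the text "Loop: NN\n" for a counter n in 0..99.
 * The tens digit is replaced by a space when it is zero. */
void format_loop_line(unsigned n, char out[10])
{
    const char prefix[] = "Loop: ";
    unsigned tens = n / 10;   /* quotient, as produced by div/udiv */
    unsigned ones = n % 10;   /* remainder */
    int i;

    for (i = 0; i < 6; i++)
        out[i] = prefix[i];
    out[6] = tens ? (char)(tens + 48) : ' ';   /* 48 is ASCII '0' */
    out[7] = (char)(ones + 48);
    out[8] = '\n';
    out[9] = '\0';
}
```

    Calling format_loop_line(7, buf) leaves "Loop:  7" followed by a newline in buf, and format_loop_line(30, buf) leaves "Loop: 30" followed by a newline.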

    Finally I had to replicate the x86_64 assembly and port it to aarch64 assembly. Another deceptively hard task. I started by copying the code from the x86_64 system to the aarch64 system and recompiling it. There were enough errors that I decided to start from scratch, and upon further reading on the wiki about aarch64 assembly, it was different enough that I had to rewrite the program from scratch anyway. The first interesting part that I had to change was the code to divide the number the loop was on. The x86_64 assembly had a lot of mov instructions because some instructions, like div, required certain numbers to be in certain registers. Aarch64 turned what was 6 lines of code in x86_64 into just 2 lines, since the (u)div instruction for aarch64 did not require the numbers it worked with to be in specific registers. The block of code used to print the message was also shorter by 3 lines because x86_64 assembly again required certain data to be in certain registers. Most of the code removed from the aarch64 version was, in fact, mov. But aarch64 assembly also has some downsides compared to x86_64 assembly. Some instructions such as udiv only accepted registers as their arguments, whereas div on x86_64 could also accept numbers, so a few mov instructions were also required to use some aarch64 instructions. Overall though, aarch64 required fewer lines of code than x86_64 assembly.

    Doing lab 3 has taught me quite a bit about assembly, but I’m definitely still in the realm of a beginner. Probably the most important thing I’ve learned about assembly is that there are different kinds of assembly for different architectures. I always thought assembly was one language, but someone who knows x86_64 assembly may not necessarily know aarch64 assembly. Assembly is some hard stuff, and it’s kind of amazing knowing there are people who work with it every day.

    by tng23spo600 at April 16, 2016 07:45 PM

    April 14, 2016

    Vishnu Santhosh Kumar

    Final Project : Phase 3.3

    Software Package : LZ4

    Official Website :

    Git Hub :

    Final Aim: Optimize and improve the performance of the software on AArch64 and x86_64 systems.

    Phase 3.3 –  Interactions with the lz4  community in GitHub.

    Date : March 29,2016 – March 30,2016

    March 29,2016

    Opened an issue to improve the documentation for the LZ4 compression program.

    Details :

    I opened an issue on GitHub to improve the documentation of the project. The information provided in the present documentation is not enough to understand how the program works or how it flows. No documentation describes the purpose of the respective directories and the files in them. I believe that better documentation would really help beginners understand the program and contribute to its development, so I opened this issue on GitHub.

    I got the reply that the point of a format specification is to allow any kind of implementation to exist and still remain compatible with the others. This is a good thing, as different environments come with different restrictions and strengths, so a single algorithm wouldn’t fit them all. He also encouraged me to understand the format and then create my own program to read it, which would give better insight and, as a result, a better chance of understanding the reference source code. I found a similar issue opened on GitHub, and the response from the owner was that there is constantly a need to balance between “simple usage for a beginner” and “hardcore total-control experience” where little details matter a lot. Also, the community has been working for a very long time (17 years), and many patches were merged without proper documentation. He said that he will try to convert the comments to Doxygen, a documentation generator for writing software reference documentation.


    March 30,2016

    Opened an issue regarding the increase in the compressed file

    Details : 

    I compressed a file containing only unique characters to check the performance of the code. The size of the file increased by nearly 150%, which I believe happened because of the encoding of the non-matching literals. So I opened an issue proposing a change to the algorithm: encode the literals only if a match has been found and only if the encoded size is smaller than the uncompressed data. That way, unmatched literals would keep their original size, and the worst-case file size could be reduced.
    The response from the community was that the default minimum size of the header is 15 bytes. Even though my suggestion could improve the worst case, the command line utility cannot do that, because it must generate a result decodable on many different systems and for different use cases, so it needs the frame headers to ensure decoding success. Also, to compress data without a header, a function called ‘LZ4_compress()’ can be used directly at the API level.



    by vishnuhacks at April 14, 2016 05:52 PM

    Final Project : Phase 3.2

    Software Package : LZ4

    Official Website :

    Git Hub :

    Final Aim: Optimize and improve the performance of the software on AArch64 and x86_64 systems.

    Phase 3.2 –  Interactions with the lz4 community in  google forum.

    Date : March 28,2016


    March 28,2016

    Post : Which is the default format used, Frame format or Block format?

    Details : 

    While I was working on my Phase 2.1, I came across two different formats used by the compression code to decompress the literals. The format means the structure of the compressed block of literals, which I explained in Phase 1.1. The two formats are the frame format and the block format, which have different structures. The default format used by the code was not mentioned in any documentation, so I asked my question in the community forum.

    I got the reply that the “block format” is used for compression in LZ4. Also, there is no default compression format; the frame format is used for testing and debugging. I also understood that the frame format is easy to use, while the block format is used to write advanced applications that have specific limitations (e.g. tight working memory/bandwidth/size/speed budget). I decided to make changes in the block format to improve the optimization.

    The Block format is used in compression and the Frame format is used in testing and debugging. Also, there is no default format used for compression.

    by vishnuhacks at April 14, 2016 04:27 PM

    Final Project : Phase 3.1

    Software Package : LZ4

    Official Website :

    Git Hub :

    Final Aim: Optimize and improve performance of the software on AArch64 and x86_64 systems.

    Phase 3.1 – Interactions with the lz4 maintainers in the Google community blog.

    Date : March 22,2016  –  March 26,2016

    March 22,2016

    Post:  Difference between ‘lib’ directory and ‘programs’ directory?

    Details :

    During Phase 1.1 of my final project, I found it really hard to understand the purpose of the various files in the different directories. One of the most confusing pairs was the ‘lib’ directory and the ‘programs’ directory. The files in each directory had similar names, and the code was confusing (e.g. a lot of macros), so I was unable to understand the purpose of these files. I posted my question in the Google community forum.

    The first reply said that the “lib” directory contains LZ4’s library (core) part, while the “programs” directory contains application programs which use the LZ4 library: the LZ4 command line utility, the I/O submodule and test tools (e.g. benchmark). I understood that optimizations for the I/O module, like increasing buffer sizes or multi-threading the input/output, can be made in the “programs” directory. It contains the programs that use the files in “lib”, which holds the code for compression and decompression, the block formats, etc. Any optimizations to compression/decompression should be made in the “lib” directory files.

    The ‘lib’ directory holds the core library, and the ‘programs’ directory holds the supporting programs for the user interface, input and output operations, buffer management, etc.


    March 26,2016

    Post:  How to determine maxOriginalSize and correctly allocate a buffer for decompression?

    Details :

    While I was analyzing the decompression code, I noticed that the buffer size used to decompress a file is fixed (128 KB). I thought it would be better to include the original size of the file in the header of the compressed file, so that a better-fitting decompression buffer size could be read from the header. This would help improve the speed of the decompression code. I posted my question on the community blog.

    I got my first reply the same day. The owner of the blog, a profile named “Cyan”, replied that a compression block uses a maximum of 64 KB, so the 128 KB buffer in the decompression code is enough to decompress a file of any size. He also said that future extensions of the compression format will allow embedding the file size in the header. Two other users posted their opinions after this, supporting “Cyan’s” post.

    Version 6 of the LZ4 software is planned to add the file size to the header. The decompression code can then use it to achieve the maximum possible decompression speed.


    by vishnuhacks at April 14, 2016 03:59 PM

    April 11, 2016

    Andrei Topala

    Project: Improving lrzip; Phase Two

    In my last post, I mentioned that lrzip, when given the -z option, complains of an illegal instruction and crashes on AArch64. I did some debugging with GDB, hoping to find the cause of this, but the situation seemed to be that the program just sort of lost itself during execution:

    Program received signal SIGILL, Illegal instruction.
    0x000003ff7b000000 in ?? ()

    Even though the package had been compiled with -g (to produce debugging information) and we’d downloaded all the separate debuginfos, the program still lost its way, and it couldn’t tell us where it was.

    The reason for this turned out to be very simple. I took a look at the files in the libzpaq folder and found, in libzpaq.h, the following comment:

    “By default, LIBZPAQ uses JIT (just in time) acceleration. This only
    works on x86-32 and x86-64 processors that support the SSE2 instruction
    set. To disable JIT, compile with -DNOJIT. To enable run time checks,
    compile with -DDEBUG. Both options will decrease speed.”

    JIT code is created during execution time and then executed. libzpaq.cpp contained several functions that filled various arrays with hex numbers representing machine instructions. Those arrays were then written to memory and executed as code. The problem is that those machine instructions are x86 instructions. So, on our AArch64 machines, the program was trying to execute data it had no business executing, and that’s why it crashed.

    libzpaq.cpp has a check to make sure that the platform it’s being compiled on is x86, but it isn’t a very good check:

    // x86? (not foolproof)
      const int S=sizeof(char*);      // 4 = x86, 8 = x86-64
      U32 t=0x12345678;
      if (*(char*)&t!=0x78 || (S!=4 && S!=8))
        error("JIT supported only for x86-32 and x86-64");

    That snippet of code checks that the system uses little-endian format (which x86 does) and that a pointer is either 4 or 8 bytes long. AArch64 also uses little-endian format (AArch64 is bi-endian in regards to reading data, but by default it is little-endian) and has same-sized pointers, so it passes the test, which is something that shouldn’t happen.
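
    To see why the check lets AArch64 through, here is a minimal sketch in C that splits the same test into its two conditions. The function names are hypothetical and not part of libzpaq; only the logic mirrors the snippet above. The __AARCH64EL__ macro is an assumption about GCC and Clang targets. Neither condition is specific to x86, so any little-endian target with 4- or 8-byte pointers passes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 1 if the lowest-addressed byte of a 32-bit value
   is its least significant byte (little-endian byte order). */
int is_little_endian(void) {
    uint32_t t = 0x12345678;
    return *(unsigned char *)&t == 0x78;
}

/* Hypothetical helper: 1 if pointers are 4 or 8 bytes wide. */
int pointer_size_ok(void) {
    return sizeof(char *) == 4 || sizeof(char *) == 8;
}

/* Same logic as the libzpaq check: both conditions must hold. */
int passes_x86_heuristic(void) {
    return is_little_endian() && pointer_size_ok();
}

/* Self-check: genuine x86 targets pass, but so does little-endian
   AArch64, which is the false positive described in the post. */
void heuristic_self_check(void) {
#if defined(__x86_64__) || defined(__i386__) || \
    (defined(__aarch64__) && defined(__AARCH64EL__))
    assert(passes_x86_heuristic());
#endif
}
```

    Calling heuristic_self_check() on either an x86_64 or an AArch64 machine succeeds, which is exactly why the runtime test cannot tell the two apart.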

    It seems that this is a known problem. An issue was submitted to lrzip’s GitHub page a little over a month ago, but no solution was proposed beyond that mentioned in the libzpaq.h file (i.e. passing the -DNOJIT flag to the C++ compiler when compiling the code).

    I first thought to translate the encoded x86 instructions into the equivalent AArch64 instructions . . . but there are a lot of instructions, and I don’t think I could get it done in time (and I am sure I would run into trouble concerning the handling of things like x86’s variable-length vs AArch64’s 32-bit instructions, etc.)

    I wanted to see how big of a difference the JIT code makes, so I tried running the ZPAQ algorithm on x86_64 with and without the -DNOJIT flag.

    Here is the result of running lrzip -z normally:

    [andrei@localhost lrzip-0.621]$ ./lrzip -z text 
    Output filename is: text.lrz
    text - Compression Ratio: 4.810. Average Compression Speed:  1.355MB/s. 
    Total time: 00:06:08.42

    To compile the program with the -DNOJIT flag, we just need to add it to the C++ flags when we run the configure script:

    [andrei@localhost lrzip-master]$ ./configure CXXFLAGS="-g -O2 -DNOJIT"

    The -g and -O2 flags are set by automake (which lrzip uses to create its configure script and makefiles) if CXXFLAGS isn’t defined by the user, so if we are defining CXXFLAGS we need to set them as well, for consistency’s sake.

    Now, here’s the same operation without the JIT code:

    [andrei@localhost lrzip-0.621]$ ./lrzip -z text 
    Output filename is: text.lrz
    text - Compression Ratio: 4.810. Average Compression Speed:  0.919MB/s. 
    Total time: 00:09:04.74

    Without the JIT code, the program takes 3 extra minutes, or roughly 48% more time (the average compression speed drops by slightly over 30%).
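
    As a quick check of the arithmetic, here is a small C sketch that converts the two "Total time" values reported above to seconds and compares them, along with the two reported speeds. The helper names are made up for illustration. Measured this way, the run time grows by about 48% while the throughput falls by about 32%.

```c
#include <assert.h>

/* Convert an hh:mm:ss.ss duration to seconds. */
double to_seconds(int hours, int minutes, double seconds) {
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* Percentage by which `value` differs from `base` (negative if smaller). */
double percent_change(double base, double value) {
    return (value - base) / base * 100.0;
}

/* Self-check using the figures from the two lrzip runs. */
void timing_self_check(void) {
    double with_jit = to_seconds(0, 6, 8.42);     /* 368.42 s */
    double without_jit = to_seconds(0, 9, 4.74);  /* 544.74 s */

    double extra_time = percent_change(with_jit, without_jit); /* about +48 */
    double speed_delta = percent_change(1.355, 0.919);         /* about -32 */

    assert(extra_time > 47.0 && extra_time < 49.0);
    assert(speed_delta < -31.0 && speed_delta > -33.0);
}
```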

    Chris suggested that such a large discrepancy between the JIT code and the compiled C code might indicate that the C code isn’t quite optimized as well as it could be. I looked at the C code and, while there are some small pieces I could try to optimize, I do not really understand the algorithm at all, and the code is difficult to read. I’ll try to keep going but I don’t have anything to show for my efforts here.

    Anyway, at least I edited libzpaq.cpp. I added the following preprocessor directives near the top of the file:

    // GNU
    #ifdef __arm__
    #define NOJIT 1
    #endif
    #ifdef __aarch64__
    #define NOJIT 1
    #endif

    NOJIT, which is supposed to be set with the -DNOJIT flag, is checked in every function that uses JIT code, and if it is defined the program uses regular C code instead. So, with this change, if the preprocessor detects that we’re on an ARM machine, it just sets NOJIT and the other conditionals take care of the rest. It’s an inelegant solution (and I suppose it would make more sense to check instead that the program is running on an x86 architecture and enable NOJIT otherwise) but it works on aarchie and betty, and the ZPAQ algorithm defaults to using the C code. I’ve only tested this on gcc, though; other compilers have different (or no?) macros to check for ARM architectures, so this is not the best solution. I’ll try to refine it but I don’t know if it’ll be worth submitting to the community, since it is such a simple thing and could be achieved just as easily by compiling the code with the -DNOJIT flag.
    Modifying the configure script to check the architecture and add -DNOJIT to CXXFLAGS accordingly also works, but I think it’s better just to have a safeguard in the libzpaq.cpp source file itself, because adding an extra C++ flag by default on non-x86 machines is not really expected behavior.
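
    The alternative mentioned above (checking for x86 and enabling NOJIT everywhere else) could look roughly like the sketch below. It assumes the predefined macros __i386__ and __x86_64__ (GCC and Clang) and _M_IX86 and _M_X64 (MSVC). The jit_enabled() helper is hypothetical and exists only to make the result observable; none of this is code from libzpaq.

```c
#include <assert.h>

/* Leave the JIT on only for targets the compiler identifies as x86.
   A NOJIT passed on the command line (-DNOJIT) is respected as-is. */
#if !defined(NOJIT)
#  if !(defined(__i386__) || defined(__x86_64__) || \
        defined(_M_IX86) || defined(_M_X64))
#    define NOJIT 1
#  endif
#endif

/* Hypothetical helper: reports which code path was selected. */
int jit_enabled(void) {
#ifdef NOJIT
    return 0;
#else
    return 1;
#endif
}

/* Self-check: ARM targets must end up on the plain C path. */
void nojit_self_check(void) {
#if defined(__arm__) || defined(__aarch64__)
    assert(!jit_enabled());
#endif
}
```

    Written this way, the guard does not need a separate entry for each non-x86 architecture, at the cost of depending on each compiler's x86 macros being listed.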

    I took a break from the ZPAQ algorithm and tried to find something to optimize in the regular LZMA code (which is the path taken by default when lrzip is called without specifying an algorithm), but this, too, proved fruitless (although the code is much easier to read). Two functions take up 90% of the execution time, and I couldn’t improve either of them.

    Two pieces of code in particular (the first from LzmaEnc.c and the second from LzFind.c) I tried to wrest some improvement in performance from.

    p->reps[3] = p->reps[2];
    p->reps[2] = p->reps[1];
    p->reps[1] = p->reps[0];
    p->reps[0] = pos;

    Here I tried to use memmove to shift the array (it’s a UInt32 array) forward all at once, but it didn’t have any effect on performance.
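
    For reference, the two variants can be sketched side by side as below. This is an illustration only: the function names are hypothetical, and uint32_t stands in for the UInt32 type used in the LZMA sources. With only four elements, a compiler may well turn both versions into much the same handful of moves, which would be consistent with the lack of any measurable difference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REPS 4

/* Original form: four explicit assignments. */
void push_rep_assign(uint32_t reps[NUM_REPS], uint32_t pos) {
    reps[3] = reps[2];
    reps[2] = reps[1];
    reps[1] = reps[0];
    reps[0] = pos;
}

/* memmove form: shift the first three entries up by one slot, then
   store the new position. memmove is required (not memcpy) because
   the source and destination ranges overlap. */
void push_rep_memmove(uint32_t reps[NUM_REPS], uint32_t pos) {
    memmove(&reps[1], &reps[0], (NUM_REPS - 1) * sizeof(uint32_t));
    reps[0] = pos;
}

/* Self-check: both forms leave the array in the same state. */
void reps_self_check(void) {
    uint32_t a[NUM_REPS] = {10, 20, 30, 40};
    uint32_t b[NUM_REPS] = {10, 20, 30, 40};
    push_rep_assign(a, 5);
    push_rep_memmove(b, 5);
    assert(memcmp(a, b, sizeof a) == 0);
}
```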

    if (++len != lenLimit && pb[len] == cur[len])
              while (++len != lenLimit)
                if (pb[len] != cur[len])
                  break;

    This I tried to condense to a single while loop (or at least one while loop and a single if statement) in order to reduce the number of branches and operations, but that actually made the code slower. It might be the case here that the compiler optimizes this particular logic layout best.
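
    A rough sketch of what such a condensed version might look like, next to the original layout, is shown below. The function names and the size_t/unsigned char types are assumptions made for a standalone example, not the actual LzFind.c declarations, and both functions assume len is less than lenLimit on entry. The two forms return the same length, so any speed difference comes down to how the compiler lays out the branches.

```c
#include <assert.h>
#include <stddef.h>

/* Layout from the snippet above: one up-front test, then a loop. */
size_t extend_match_nested(const unsigned char *pb, const unsigned char *cur,
                           size_t len, size_t lenLimit) {
    if (++len != lenLimit && pb[len] == cur[len])
        while (++len != lenLimit)
            if (pb[len] != cur[len])
                break;
    return len;
}

/* Condensed form: a single while loop with a combined condition. */
size_t extend_match_single_loop(const unsigned char *pb,
                                const unsigned char *cur,
                                size_t len, size_t lenLimit) {
    ++len;
    while (len != lenLimit && pb[len] == cur[len])
        ++len;
    return len;
}

/* Self-check: both forms stop at the first mismatch or at the limit. */
void match_self_check(void) {
    const unsigned char a[] = "abcdefgh";
    const unsigned char b[] = "abcdxfgh";
    assert(extend_match_nested(a, b, 0, 8) == 4);
    assert(extend_match_single_loop(a, b, 0, 8) == 4);
    assert(extend_match_nested(a, a, 0, 8) == 8);
    assert(extend_match_single_loop(a, a, 0, 8) == 8);
}
```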

    So, that’s where things stand at the moment. It still looks like the ZPAQ C code might be the most promising venture, if only because I can’t seem to optimize the LZMA code at all. I’ll keep trying both options, though. I also haven’t looked into other compiler options yet, so that’s also still a possibility.

    by andrei600 at April 11, 2016 06:46 AM

    Giuseppe Ranieri

    The Trials and Tribulations of Optimizing Zopfli

    What Didn't Work

    When I first went about doing this, my testing methods were incorrect and there was a lot of research I should have done more carefully.

    For example, after reading more about zopfli compression, I learned that it is slow by design. It was meant to be 80x slower than gzip while providing only up to an 8% boost in compression. So it became a tall order to try to better it.

    Here's a comparison of the size differences between the two.


    As you can see, in this case for the 10mb file made with the command:

    base64 /dev/urandom | head -c 10000000 > 10mb.txt

    it can only provide a 0.6% improvement. With less random text, it can probably do better. Still, we're dealing with razor-thin margins.

    What I first tried to do was follow this guide, but I ended up getting even worse performance than when building with the makefile. It wasn't a permanent solution either, as I wouldn't be able to port the exe file to Linux.

    Next I tried changing the code anywhere I could to see even just a bit of a boost, changing any data types where I thought it could help. The only problem was that it wouldn't compile correctly. The code was so advanced and precise that it was difficult to pinpoint where and what to change. So I thought it was a bust to even attempt this.

    What Did Work 

    So I thought my only chance was to change the compiler options around to see what sticks. The makefile and the documentation suggest compiling the program with an optimization level of -O2, along with some other options that allow it to compile correctly. I thought this was odd: why not go for the full -O3?

    Here are the results when comparing -O2 and -O3 on the xerxes server.

    real    0m29.852s
    real    0m29.847s

    As you can see, there is no difference that can really be attributed to the higher optimization level. I ran it many times at both levels, with -O2 sometimes being faster than -O3 and vice versa. I can only assume the documentation suggests -O2 over -O3 because it compiles quicker, and maybe uses less memory, although I did not test for this.

    So again I was stumped, as that means the optimization levels won't matter, until I remembered that I should try it on the aarchie server! And the results were terrific.

    real    3m56.909s
    real    3m50.971s

    I tested multiple times at both levels to make sure it wasn't just a fluke, but nope, results were always within +/- 1 second of their respective levels. This means that changing between the levels can provide about a 2.5% difference in speed! And when something takes this long to compress, that is noticeable when you're working with things that could take hours.
    Testing with larger files on the xerxes server at different optimization levels showed the same speed across all of them, unlike on ARM64, where there was a difference. This means that changing the level is an ARM64-only improvement, which is probably why it was not noticed or included in the makefile.

    What Now?

    Before submitting these findings upstream, I'll have to do more research into why this happens. There must be only one or two flags in the -O3 level that help only on the ARM64 platform. Once I find them, I could push an ARM64-only makefile so others could see the same performance boost.

    by JoeyRanieri at April 11, 2016 03:28 AM