Planet CDOT

February 28, 2015

Maxwell LeFevre

Writing Assembly for GAS on AArch64 and x86_64 (Lab 5)

In this post I will be looking at writing a simple loop in assembly, for the GNU assembler (gas) on x86_64, that prints the numbers 1 – 30, formatted as follows:

Loop:  1
Loop:  2
Loop:  3
... (Lines removed to save space) ...
Loop: 28
Loop: 29
Loop: 30

This started out as a group task in class and I expanded on it later on my own. I have two different solutions to the problem; one is much shorter but was more challenging to write. The first solution can be found here, loopnozero.docx. If you look at the comments you’ll notice that I decided to print each part (‘Loop:’, number, line break) separately. This was not the most efficient or smallest code I could have written. I was constrained by the time limit set by the end of class, so I chose this method because it was faster to write and didn’t involve any instructions not included in the lab guidelines.

The next part of the lab was to write the same program but on the AArch64 system. Because I was going to have to rewrite it I decided to modify my program on the x86_64 platform to be shorter and more efficient so that it would be less work to port over. The short version (47 lines instead of the original 82), loopshort.docx, is the one that I will be looking at in detail.

Assembly on x86_64

The first section of the loopShort.s file tells the assembler that this code goes in the .text section and declares _start, which is the equivalent of int main() in C. The first line after the _start label just moves the number 1 into the register that I used for the loop index.

.text
.globl    _start

_start:
    mov     $1,%r15         /* loop index */

The next block of code starts with the label that marks the top of the main loop.

loop:
    /* Divide number */
    movq    %r15,%rax       /* copy the loop index to register for division */
    movq    $10,%r12        /* write 10 to register */
    movq    $0,%rdx         /* clear rdx (upper half of the dividend) */
    div     %r12            /* divide by 10 */
    movq    %rax,%r11       /* move result quotient to register 11 */
    movq    %rdx,%r13       /* move remainder to number register 13 */

The first thing that happens in the loop is the division of the loop index by 10. There are six separate steps involved in doing this. Three registers are used to do division: rax must contain the number to be divided, rdx receives the remainder, and any other register can be used to hold the divisor (r12 in this case). The third move statement, which overwrites rdx with 0, is required because div actually divides the 128-bit value held in rdx:rax, with rdx supplying the upper 64 bits. I was getting wrong results from leftover data in rdx, so I clear it before the divide. The div instruction tells the processor to divide by r12 and store the quotient in rax and the remainder in rdx.
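To see why stale data in rdx matters, here is a minimal C++ sketch of how div forms its dividend. It models the 32-bit form of the instruction (edx:eax divided by a 32-bit operand) so that it stays within standard C++ integer types; the 64-bit form used in this post works the same way with rdx:rax. The names DivResult and div32 are made up for this illustration.

```cpp
#include <cstdint>

// Result of an unsigned divide: the quotient lands in eax, the remainder in edx.
struct DivResult {
    std::uint32_t quotient;
    std::uint32_t remainder;
};

// Models the 32-bit form of `div`: the dividend is the 64-bit value edx:eax
// (edx supplies the high half), not eax alone.
// Assumes divisor != 0 and that the quotient fits in 32 bits; the real
// instruction raises a divide error in those cases.
DivResult div32(std::uint32_t edx, std::uint32_t eax, std::uint32_t divisor) {
    const std::uint64_t dividend = (static_cast<std::uint64_t>(edx) << 32) | eax;
    return DivResult{static_cast<std::uint32_t>(dividend / divisor),
                     static_cast<std::uint32_t>(dividend % divisor)};
}
```

With the high half cleared, div32(0, 25, 10) gives quotient 2 and remainder 5. With a leftover 3 in the high half, div32(3, 25, 10) divides a much larger number and the quotient is nowhere near 2.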

The next step is to move the results into the message that will eventually be printed. The message string is stored at the label msg, which I have created at the end of the file:

.section .data
msg:        .ascii          "Loop:   \n"
            msgLen = . - msg

I have left three spaces between the colon and the newline character to allow one space for formatting and two digits. Before I can insert the numbers into the string I have to convert them to ASCII characters. To do this I add 48 to them because 48 is the ASCII code for ‘0’. Then I can copy them into the message using a single-byte move. The ‘b’ after ‘%r13’ selects the low byte of the register, and the ‘+7’ means store it at offset 7 from the start of ‘msg’ (the eighth byte, since offsets count from zero). To avoid displaying a leading zero for numbers less than ten I used cmp to compare the quotient to zero. The compare instruction sets flags in the processor so that when the jump if equal (je) instruction runs in the next step it knows whether the jump should be taken.

    /* Add ones to message */
    add     $48,%r13        /* convert to ascii by adding 48 */
    mov     %r13b,msg+7     /* copy one byte into msg at location 7 */

    cmp     $0,%r11         /* check if quotient is 0 */
    je      skipTens        /* if it is jump to skipTens label */

    /* Add tens to message */
    add     $48,%r11        /* convert to ascii by adding 48 */
    mov     %r11b,msg+6     /* copy one byte into msg at location 6 */

skipTens:
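The digit handling above can be sketched in C++ as follows. This is only an illustration of the same steps (split into tens and ones, add 48, skip the tens digit when it is zero); format_line is a hypothetical helper, not part of the lab code.

```cpp
#include <string>

// Hypothetical helper: builds one output line for n in the range 0..99.
std::string format_line(unsigned n) {
    std::string msg = "Loop:   \n";          // offsets 6 and 7 are the digit slots
    const unsigned tens = n / 10;            // quotient (rax after div)
    const unsigned ones = n % 10;            // remainder (rdx after div)
    msg[7] = static_cast<char>(ones + 48);   // 48 is the ASCII code for '0'
    if (tens != 0) {                         // the cmp / je skipTens pair
        msg[6] = static_cast<char>(tens + 48);
    }
    return msg;
}
```

Unlike the assembly version, this sketch starts from a fresh copy of the message on every call, so a tens digit from an earlier line can never linger.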

The je line tells the processor to jump to the label ‘skipTens’ if the compared values were equal, that is, if the quotient was zero. This jumps over the section where the quotient is written into byte 6 of the message. The third-last section is where the message gets written to stdout with a system call. Four registers must be set up for this: the address of the message goes into rsi, the file descriptor (stdout = 1) into rdi, the number of the syscall we want into rax (1 for sys_write), and the length of the message into rdx.

    /* print message */
    movq    $msg,%rsi        /* pass the address of msg to rsi */
    movq    $1,%rdi          /* file descriptor stdout */
    movq    $1,%rax          /* syscall sys_write */
    movq    $msgLen,%rdx     /* set length to the length of msg */
    syscall

    /* loop checking */
    inc     %r15             /* increment index */
    cmp     $31,%r15         /* see if we're done */
    jne     loop             /* loop if we're not */

    mov     $0,%rdi          /* exit status */
    mov     $60,%rax         /* syscall sys_exit */
    syscall

The syscall line actually makes the call and writes the line. The final bit of code increments (inc) the loop index, compares it to 31, and if it’s not there yet jumps back up to the top. The last three lines are a standard exit syscall.

Differences between x86_64 and AArch64

When converting my program from x86_64 to AArch64 there were a number of significant changes, some easier to deal with than others. One of the obvious ones that was really easy to fix is that on AArch64 registers and immediate values are not prefixed with ‘%’ or ‘$’. The other obvious but slightly annoying one was that the source and destination operands are switched in every instruction. On x86_64 mov %r12,%r15 means copy r12 into r15, but on AArch64 mov x12,x15 means copy x15 into x12. To me the x86 version is the more natural way to read it. As an example you can see the difference in the first line of each program (AArch64 first, then x86_64):

    mov    x15,1            /* set loop index (register 15) to 1 */
    mov     $1,%r15         /* loop index */

The first major difference I came across was trying to divide the loop index.

/* AArch64 */
mov    x12,10            /* put number 10 in register 12 */
udiv   x11,x15,x12       /* quotient = loop index / divisor eg. x11 = x15 / x12 */
msub   x13,x12,x11,x15   /* remainder = loop index - (divisor * quotient) */
                         /* x13 = x15 - (x12 * x11) */

/* x86_64 */
movq    %r15,%rax       /* copy the loop index to register for division */
movq    $10,%r12        /* write 10 to register */
movq    $0,%rdx         /* clear rdx (upper half of the dividend) */
div     %r12            /* divide by 10 */
movq    %rax,%r11       /* move result quotient to register 11 */
movq    %rdx,%r13       /* move remainder to number register 13 */

Although the AArch64 code is 2 lines shorter I found it much more awkward to work with. The udiv instruction (unsigned division) does not provide a remainder. The remainder must be calculated in an additional step using the formula ‘remainder = loop index – (quotient * divisor)’ with the msub instruction.
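As a rough C++ sketch of that two-step calculation (remainder_via_msub is a name invented for this example): msub computes its last operand minus the product of the two before it, so the remainder is the loop index minus quotient times divisor.

```cpp
#include <cstdint>

// Mirrors `udiv x11,x15,x12` followed by `msub x13,x12,x11,x15`.
// Assumes divisor != 0.
std::uint64_t remainder_via_msub(std::uint64_t index, std::uint64_t divisor) {
    const std::uint64_t quotient = index / divisor;   // udiv: quotient only
    return index - quotient * divisor;                // msub: a - (n * m)
}
```

For example, remainder_via_msub(27, 10) works out as 27 - (2 * 10) = 7.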

When it comes to placing the number into the message, AArch64 requires the additional step of loading the address of msg into a register using the adr instruction before it can store the number. I do like the way that the add instruction on AArch64 allows you to specify the register for the result so you don’t always destroy the number you are adding to.

/* AArch64 */
add    x13,x13,48        /* convert number to char: add 48 to remainder (x13) and store result in x13 */
adr    x14,msg           /* load the address of msg into x14 */
strb   w13,[x14,7]       /* put the remainder (x13) into msg (x14) */
                         /* write one byte from x13 to offset 7 from the address in x14 */

/* x86_64 */
add     $48,%r13        /* convert to ascii by adding 48 */
mov     %r13b,msg+7     /* copy one byte into msg at location 7 */

Printing the message works the same way, but AArch64 passes the arguments in numbered registers (x0, x1, x2) and the syscall number in x8 instead of using rdi, rsi, rdx and rax; write is syscall 64 instead of 1; and the syscall is executed with svc 0, not syscall. AArch64 has no inc instruction, so to increment the loop count I just added 1 to it. The final difference I noticed was that AArch64 calls jumps ‘branches’, so the instruction is bne (branch if not equal) instead of jne.


Debugging was challenging because I am unfamiliar with the language and there is a distinct lack of meaningful variable names, so it is easy to lose track of what is in each register. I found it very helpful to keep a text document open that I updated with the contents of each register when I started using a new one. The program didn’t require much debugging because each instruction was so simple it was easy to predict the results. The only situation that gave me pause was when unexpected characters were printed. Our professor pointed out fairly quickly that it was happening because I was relying on a length value stored in a register that was getting clobbered. His help saved us from having to debug that particular error.


Writing this program was an interesting task and did not take as long as I expected it to. That being said, it still took much longer and was much more work than writing the same program in C, which would have taken less than a minute and only 4-6 lines of code. The syntax for assembly on x86_64 felt more natural to read and write and took less effort to understand, but AArch64 had a number of features, like significantly more general-purpose registers and the ability to select the output register for math instructions, that I think would be very handy for more complex applications.
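For comparison, here is roughly what the high-level version looks like, written as a small C++ sketch rather than plain C. The helper name loop_lines is made up for this example, and it returns the text instead of printing it so that the result is easy to check; the "%2d" conversion does the padding that the assembly version handles by skipping the tens digit.

```cpp
#include <cstdio>
#include <string>

// Builds the lines "Loop:  1" through "Loop: <last>", one per line.
std::string loop_lines(int last) {
    std::string out;
    char line[32];
    for (int i = 1; i <= last; ++i) {
        // %2d right-aligns the number in a two-character field
        std::snprintf(line, sizeof line, "Loop: %2d\n", i);
        out += line;
    }
    return out;
}
```

Calling loop_lines(30) and writing the result to stdout should produce the same 30 lines as the assembly programs.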


I didn’t have to do much research for this post. The only resources I used are from the lab or in the resources section of these three pages:
Assembler Basics: Assembler_Basics
x86_64 Registers: X86_64_Register_and_Instruction_Quick_Start
AArch64 Registers: Aarch64_Register_and_Instruction_Quick_Start

The following is a zip file containing the source code for the various loop programs I wrote for this lab:

Optional Challenge

This lab had an optional challenge to write a program to print out the times tables up to 12. At the time of publishing, my program is incomplete: it uses nested loops to print the left side of the equation, but I have not added the functionality to do the actual multiplication. If I can find the time to finish it I will update this section.

by maxwelllefevre at February 28, 2015 11:23 PM

Hong Zhan Huang

SPO600: Page Size Presentation

This time’s activity was an interlude between the lab work, where we presented on one of several topics dealing with platform-specific code and the assumptions that are sometimes made about said code. The topic I covered was page sizes. To get a better grasp of what the differences between page sizes mean, I first had to get an idea of which page sizes are currently in use and which are supported in general. To that end I looked up some information on page sizes on the X86 and Arm architectures to get an idea of what the baseline looks like and to compare the two.

Architecture   Page Size     Huge Page Size
X86_64*        4KiB          2MiB, 4MiB, 1GiB
Arm v5         1KiB, 4KiB    64KiB
Arm v7**       4KiB          64KiB, 1MiB, 2MiB, 1GiB

*2MiB page size is achieved through a combination of Physical Address Extension (PAE) and Page Size Extension (PSE), 4MiB through just PSE and 1GiB through enabling Long mode (ie a 64bit OS utilizing 64bit instructions and registers)
**Page sizes greater than 64KiB are only available with LPAE implementation

Playing off the initial idea of assumptions, let’s make one and see where it will take us in understanding page sizes.

The assumption: Page sizes are consistent throughout architectures and use the traditional 4KiB page size.

The question: Is this true? If so what are the problems associated with it?

The problem: Assuming that the above is true, there are a few problems that arise with such a small page size. To expose the one with the biggest performance impact we will need some background information on a term related to page-based memory management.

Translation Lookaside Buffer (TLB) – A cache that stores recently accessed virtual memory to physical memory translations, ie page table entries. When some request for memory occurs the Memory Management Unit (MMU) will first look into the TLB to see if that mapping is already there so it can reduce translation time and quickly satisfy the request. If it finds what it’s looking for, that is referred to as a TLB hit; otherwise it is a miss. In the case of a miss, the MMU must traverse the page tables to find the correct entry, which is a much more time-consuming process. After finding that entry the TLB is updated with it (if the TLB was full, an existing entry needs to be evicted first). The TLB has a limited number of slots for these entries in order to keep access to this cache fast.

Let’s take the following situation:

Situation: A program requests a 1 MiB block of test data. With 4 KiB pages the operating system needs to map 256 pages (1024 KiB / 4 KiB) to satisfy the request.

Issue: Each of those 256 pages needs its own TLB entry as it is touched. Given that the TLB is limited in its capacity, we can end up cluttering it with many entries for 4KiB pages. Once the TLB is full, older entries are evicted and later accesses to those pages miss, which causes a rather large performance drop. Had all of those pages been a single larger page, it would have taken up only one entry in the TLB, keeping the cache fast and reducing the need to walk the page tables and evict entries to make way for more recent ones.
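The effect can be illustrated with a toy model. The sketch below is not how real hardware works in detail: it assumes a made-up TLB with a fixed number of entries and simple first-in-first-out eviction, and it touches the region once every 4 KiB for a given number of passes. The function name and parameters are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_set>

// Counts TLB misses for a toy FIFO TLB while walking a memory region.
// Assumes tlb_entries > 0 and page_bytes > 0.
std::size_t count_tlb_misses(std::uint64_t region_bytes, std::uint64_t page_bytes,
                             std::size_t tlb_entries, int passes) {
    std::deque<std::uint64_t> order;            // insertion order, oldest first
    std::unordered_set<std::uint64_t> cached;   // page numbers currently in the TLB
    std::size_t misses = 0;
    const std::uint64_t stride = 4096;          // touch one address per 4 KiB
    for (int pass = 0; pass < passes; ++pass) {
        for (std::uint64_t addr = 0; addr < region_bytes; addr += stride) {
            const std::uint64_t page = addr / page_bytes;
            if (cached.count(page) != 0) {
                continue;                       // TLB hit
            }
            ++misses;                           // TLB miss: walk the page tables
            if (cached.size() == tlb_entries) { // full: evict the oldest entry
                cached.erase(order.front());
                order.pop_front();
            }
            cached.insert(page);
            order.push_back(page);
        }
    }
    return misses;
}
```

With a hypothetical 64-entry TLB and two passes over 1 MiB, 4 KiB pages miss on every one of the 512 touches, while a single 1 MiB page misses only once.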

The solution: As alluded to above, the solution is perhaps to make use of larger page sizes. However, simply doing that has its own problems, the most obvious being that larger pages trade off against wasted space. Suppose we wanted a 1025KiB block of data on a system where the page size was 1024KiB. The first page would be filled entirely, but the second page, needed for that last 1KiB, would waste the remaining 1023KiB.
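The arithmetic behind both examples (256 pages for 1 MiB, and 1023KiB wasted for a 1025KiB request) can be written out as a short C++ sketch. PageUsage and pages_for are names made up for this illustration.

```cpp
#include <cstdint>

// Number of pages needed for a request and the space left unused in the last page.
struct PageUsage {
    std::uint64_t pages;
    std::uint64_t wasted_kib;
};

// Assumes page_kib > 0. Sizes are in KiB to match the examples in the text.
PageUsage pages_for(std::uint64_t request_kib, std::uint64_t page_kib) {
    const std::uint64_t pages = (request_kib + page_kib - 1) / page_kib;  // round up
    return PageUsage{pages, pages * page_kib - request_kib};
}
```

Here pages_for(1024, 4) gives 256 pages with nothing wasted, pages_for(1025, 1024) gives 2 pages with 1023 KiB wasted, and the same 1025 KiB request with 4 KiB pages needs 257 pages but wastes only 3 KiB.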

If a system were to make use of multiple page sizes, that might be what the doctor ordered. Pages that are greater than the traditional 4KiB are called huge pages. Having access to huge page support as well as the standard 4KiB pages would give the best of both worlds, so to speak. Being able to allocate the best size for the situation at hand would more effectively utilize the TLB cache for large allocations while keeping wasted memory on a reasonable scale for smaller allocations.

While support for huge pages is available in most contemporary processors and architectures, they aren’t in everyday use. Typically they require enabling some option, flag or advanced setting. There is however something called Transparent Huge Pages, featured in some contemporary Linux distributions such as Red Hat, that manages these varying page sizes under the hood (with some limitations as to what regions of memory it can map).

The solution for a programmer? Originally in the presentation I suggested the use of system calls such as mmap on Linux-based platforms, and equivalent memory mapping functions elsewhere, to produce a ‘best fit’ use of memory and avoid the stated issues of large or small page sizes, but it turns out that this was rather irrelevant.

My professor Chris Tyler stated that we probably don’t have to care so much about page sizes in our code and just let the underlying operating system or hardware handle it which seems sensible enough to me.
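For the rare case where code does need to know the page size, one way to avoid hard-coding 4KiB is to ask the operating system at run time. This is a minimal sketch that assumes a POSIX system (such as Linux), where sysconf and _SC_PAGESIZE are available; runtime_page_size is just a wrapper name chosen for this example.

```cpp
#include <unistd.h>

// Returns the base page size in bytes as reported by the OS,
// or -1 if the value cannot be determined (POSIX systems only).
long runtime_page_size() {
    return sysconf(_SC_PAGESIZE);
}
```

The value reported is the base page size only; it says nothing about which huge page sizes the system supports.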


  • I found it rather difficult to find code that involved page sizes, and I suppose the fact that we shouldn’t care too much about them is perhaps why.
  • While I wasn’t too successful in finding any platform specific code in this research, I did get to see the difference in page size support for Arm systems vs X86, which was rather interesting.
  • I also learned a bit about how page sizes larger than the traditional 4KiB are set up and how different page sizes can net different gains.
  • This time around I don’t feel entirely certain on the things that I’ve written but hopefully any readers can glean some useful information from my bumbling research


Quest completed! Page size enemies cause confusion. End log of an SPO600 player until next time~

by hzhuang3 at February 28, 2015 08:11 PM

February 26, 2015

Thana Annis

Atomic Operations

An atomic operation is an operation that is executed instantaneously (or at least appears to the rest of the system to happen instantaneously). What does that mean? Well, let’s take a look at some sample code and explore this definition.

uint64_t x = 0;

void Store() {
    x = 0x100000002;
}

This function simply stores a 64bit value into a variable. When it is compiled and converted to machine code instructions, in some cases (some ARMv7 or 32bit x86 processors) a single store is split into 2 instructions.

mov  DWORD PTR x, 2     -> Assigns 2 to the 32 bits at x

mov  DWORD PTR x+4, 1   -> Assigns 1 to the 32 bits at x+4(4 bytes over from the start of x)

*DWORD PTR indicates a size of 32 bits

This code is non-atomic because the store does not happen in a single step. In a multi-threaded application it is possible for one thread to execute the first instruction and for another thread to read the value at x before the second instruction has executed. The second thread then sees a value that is half old and half new. This is known as a Torn Write. A Torn Read works the same way, except that the read is split into two instructions and a write lands between them. When this happens it can be really hard to debug and can cause all sorts of problems for your application.
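To make the half-old, half-new value concrete, the following C++ sketch computes what another thread would see if it read x between the two 32-bit stores shown above. It is a single-threaded illustration of the arithmetic only, and torn_value is a name invented for it.

```cpp
#include <cstdint>

// Value visible after only the low 32 bits of new_value have been stored:
// the upper half still comes from old_value.
std::uint64_t torn_value(std::uint64_t old_value, std::uint64_t new_value) {
    const std::uint64_t low_mask = 0xFFFFFFFFull;
    return (old_value & ~low_mask) | (new_value & low_mask);
}
```

Starting from 0 and storing 0x100000002, the intermediate value is 0x2, which is neither the old value nor the new one.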

Now you may be wondering what you can do to protect against this, and luckily there are many solutions available to you. In C++11 there is an atomic operations library (C11 has an equivalent in <stdatomic.h>) that guarantees other threads never observe a partially completed load or store. Here is the same code from earlier written with this new library.


#include <atomic>

std::atomic<uint64_t> x(0);

void Store() {
    x.store(0x100000002, std::memory_order_relaxed);
}


There isn’t too much extra work needed to be sure that your operations are atomic. Include the library, declare your variable as an atomic and then use the library’s store function to assign your values. There are a lot of user-made atomic libraries that you can use for other languages.

One technique that can be used to make an update atomic is called Compare and Swap. Say you’re updating a variable: you read the original value and compute the modified value from it. The compare-and-swap operation then writes the modified value back only if the variable still holds the value you originally read, and it performs that check and write as a single atomic step. If the values match, the new value is written; if they don’t match, another thread changed the variable in the meantime and you try the whole process again.
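A compare-and-swap retry loop might look like the sketch below, which uses compare_exchange_weak from the C++11 <atomic> library. Doubling the value is just an arbitrary update chosen for the example, and atomic_double is a hypothetical name.

```cpp
#include <atomic>
#include <cstdint>

// Atomically doubles the stored value and returns the new value.
std::uint64_t atomic_double(std::atomic<std::uint64_t>& value) {
    std::uint64_t expected = value.load();
    // compare_exchange_weak writes expected * 2 only if value still equals
    // expected; on failure it reloads expected with the current value,
    // so the loop recomputes the update and tries again.
    while (!value.compare_exchange_weak(expected, expected * 2)) {
        // retry with the refreshed value of expected
    }
    return expected * 2;
}
```

If another thread changes the variable between the load and the exchange, the exchange fails and the loop simply repeats with the value that thread wrote.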

Another technique is the Fetch and Add. This reads the original value, adds to it and writes the new value back as one atomic operation, so no other thread can see or modify the variable part-way through.
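In C++11 this is available as fetch_add on std::atomic. The sketch below uses it to hand out increasing ticket numbers; next_ticket is a name made up for the example.

```cpp
#include <atomic>
#include <cstdint>

// Returns the current counter value and increments the counter,
// as a single atomic read-modify-write.
std::uint64_t next_ticket(std::atomic<std::uint64_t>& counter) {
    return counter.fetch_add(1, std::memory_order_relaxed);
}
```

Because the read and the increment cannot be separated, two threads calling next_ticket on the same counter never receive the same number.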

If you’re writing code that has multiple threads, always keep in mind that something that looks simple like a single store can cause problems at the lower level if you don’t prepare for it.

For more information on this topic:


by bwaffles91 at February 26, 2015 02:54 PM

Artem Luzyanin

Lab5 or First Assembly Programs

For Lab5 of SPO600 we have to write a few pieces of assembler code. One loops through the numbers 0-9 and outputs them to standard output. Another does the same, but for the numbers 0-30. Both programs have to be written in aarch64 and x86_64 assembly languages. For ease of following, I will first talk about all programs in one language, and then in the other.

x86_64 : loop 0-9

We start with section “text”:

.text

Defining the global label:

.globl    _start

Starting value for our loop (optional, can be hard coded later, but it’s better to use it):

start = 0

Exit condition for loop, 1 greater than the max value needed to be printed (optional, can be hard coded later, but it’s better to use it):

max = 10

Location of “_start” label:

_start:

Saving the starting index in register 15:

mov     $start,%r15

Location of “loop” label:

loop:

Storing the message length in register rdx:

mov     $len,%rdx

Storing the message location in source index register:

mov     $msg,%rsi

Defining the message destination as “1”(standard output):

mov     $1,%rdi

Storing syscall number for writing in syscall register:

mov     $1,%rax

Doing syscall:

syscall

Incrementing the value to be printed by 1:

inc     %r15

Storing the new number in a new register:

mov     %r15,%r14

Adding a character value of zero to the register to make a character out of the number:

add     $'0',%r14

Storing in a new register the location of a number inside our string:

mov     $num,%r13

Storing one byte value (size of a character) value from the character version of the new number to the location of that number in the string:

mov     %r14b,(%r13)

Comparing if we reached the exit condition (1 over the max number to be printed):

cmp     $max,%r15

If not, jump back to the “loop” label:

jne     loop

If yes, store an exit status in the first register for syscall arguments:

mov     $0,%rdi

Storing exit code in syscall register:

mov     $60,%rax

Doing the syscall:

syscall

Now we fill the section “data”:

.section .data

Storing the message to be printed in ascii format:

msg: .ascii  "Loop: 0\n"

Calculating the length of the string (by subtracting the location of the beginning of the string from the current location):

len = . - msg

Storing the location of a number inside the string:

num = msg + 6

x86_64 : loop 0-30

Although the program can be done using functions, I will keep it simpler, so it’s easier to see the change from the “x86_64 : loop 0-9” program, and comment only on the differences:

.text

.globl  _start

start = 0

max = 31

Divisor to be used in the program:

ten = 10

_start:

mov     $start,%r15

Storing the divisor in a register:

mov     $ten,%r14

loop:

mov     $len,%rdx

mov     $msg,%rsi

mov     $1,%rdi

mov     $1,%rax

syscall

inc     %r15

Making “rdx” register equal to zero, so we could use it later for “div” operation:

mov     $0,%rdx

Storing the current number in “rax” register to be used for “div” operation:

mov     %r15,%rax

Dividing content of “rax” register (current number) by 10:

div     %r14

Storing in a new register the location of a number inside our string:

mov     $num,%r12

Storing the quotient of the division in a register:

mov     %rax,%r13

Checking if the quotient is a zero, so our number is one digit only:

cmp     $0,%r13

If it is, skipping to “skip” label:

je      skip

If it’s not, adding a character value of zero to the register to make a character out of the number:

add     $'0',%r13

Storing one byte value (size of a character) value from the character version of the first digit of a new number to the location of that digit in the string:

mov     %r13b,(%r12)

Location of “skip” label:

skip:

Going to the next character in the string, which is the second digit position:

inc     %r12

Storing the remainder of the division in a register:

mov     %rdx,%r13

Adding a character value of zero to the register to make a character out of the number:

add     $'0',%r13

Storing one byte value (size of a character) value from the character version of the second digit of a new number to the location of that digit in the string:

mov     %r13b,(%r12)

cmp     $max,%r15

jne     loop

mov     $0,%rdi

mov     $60,%rax

syscall

.section .data

msg: .ascii  "Loop:  0\n"

len = . - msg

num = msg + 6

aarch64 : loop 0-9

This program is mostly the same as “x86_64 : loop 0-9”, just with different syntax but very similar logic, so I will comment only on the differences from “x86_64 : loop 0-9” (different syntax is implied):

.text

.globl    _start

start = 0

max = 10

_start:

mov     x19, start

loop:

adr     x1, msg

mov     x2, len

mov     x0, 1

mov     x8, 64

svc     0

Since aarch64 assembly language doesn’t have “inc” command, we have to use “add” as a substitute:

add     x19, x19, 1

mov     x20, x19

add     x20, x20, '0'

Since aarch64 store instructions can address memory as a base register plus an offset, we are storing the beginning of the string in a register, instead of the location of the number in it:

adr     x21, msg

Writing the new character version of a digit directly to the location of it inside the string:

strb    w20, [x21,6]

cmp     x19, max

bne     loop

mov     x0, 0

mov     x8, 93

svc     0

.data

msg: .ascii  "Loop: 0\n"

len = . - msg

Note that we don’t need to store the memory location of the number inside the string, due to “strb” command.

aarch64 : loop 0-30

Again, it’s almost the same program as “x86_64 : loop 0-30”, just with different syntax and a few small changes (and again, I’ll comment only on the differences, not including syntax):

.text

.globl    _start

start = 0

max = 31

ten = 10

_start:

mov     x19, start

mov     x22, ten

loop:

adr     x1, msg

mov     x2, len

mov     x0, 1

mov     x8, 64

svc     0

Since aarch64 store instructions can address memory as a base register plus an offset, we are storing the beginning of the string in a register, instead of the location of the number in it:

adr     x23, msg

Since aarch64 assembly language doesn’t have “inc” command, we have to use “add” as a substitute:

add     x19, x19, 1

Since aarch64 assembly language doesn’t have a command that would provide us with both quotient and remainder at the same time, we have to use two commands. Here we are storing the quotient of the division in a register:

udiv    x20, x19, x22

Calculating and storing a remainder in another register:

msub    x21, x22, x20, x19

cmp     x20, 0

beq     skip

add     x20, x20, '0'

Writing the new character version of a digit directly to the location of it inside the string:

strb    w20, [x23,6]

skip:

add     x21, x21, '0'

Writing the new character version of a digit directly to the location of it inside the string:

strb    w21, [x23,7]

cmp     x19, max

bne     loop

mov     x0, 0

mov     x8, 93

svc     0

.data

msg: .ascii  "Loop:  0\n"

len = . - msg

Again, note that we don’t need to store the memory location of the number inside the string, due to “strb” command.


Well, it was one heck of a ride! Only a few hours ago I was completely clueless about the logic flow of an assembly program, and here I am now, blogging about four programs. Programming in assembly was very different from programming in any other language, and yet there were some similarities. The biggest breakthrough happened when I realized that immediate values in assembler are like constants in C, while registers are like variables. The rest is just a difference in available commands and a very specific way of outputting and exiting. Also, we were taught not to use GO TO commands, while here we had to use them a lot. But again, with some practice, assembly becomes just another language to learn…
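That analogy can be pushed a little further with a C++ sketch of the 0-9 loop that keeps the assembly’s shape: constants for the immediate values, a variable standing in for the register, and a goto standing in for the conditional jump. The function name loop_with_goto is invented for this example, and it collects the output in a string instead of making a write syscall.

```cpp
#include <string>

// Mirrors the structure of the 0-9 assembly loop; returns what would be printed.
std::string loop_with_goto() {
    const int start = 0;             // like `start = 0`, an assemble-time constant
    const int max = 10;              // exit condition, one past the last number
    std::string out;
    std::string msg = "Loop: 0\n";   // the message in the data section
    int index = start;               // plays the role of the index register
loop:
    out += msg;                                // the write syscall
    ++index;                                   // inc
    msg[6] = static_cast<char>('0' + index);   // patch the digit in place
    if (index != max) goto loop;               // cmp + jne loop
    return out;
}
```

As in the assembly version, the message is printed first and patched afterwards, so the final patch (for index 10) is written but never printed.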

Differences and preferences:

After coding in both x86_64 and aarch64 languages, I must say that I prefer aarch64. First of all, I don’t have to keep track of “$” and “%” as in x86_64, which is a good thing (my first program didn’t assemble, because I mixed those two up in one place). Second, I like that memory locations can be accessed with a base address and an offset. That sounds very useful, and saves a lot of headache. Also, the fact that the destination is usually on the left-hand side feels more familiar to me, since it’s closer to the C language. And finally, it’s much easier to remember and work with the registers, as they are numbered sequentially and always start with a size-defining prefix.

by lisyonok85 at February 26, 2015 05:23 AM

Andrew Smith

Using libXMP with the NDK in an Android app

I’ve done this work to help out with the open source programming course at Seneca (DPS911).

The goal: see if it’s possible (and realistic) to use XMP in an Android app.

I’ve spent about 20 hours working on it, mostly going round in circles. The XMP library is shit developed by idiots and Android Studio is a pain in the neck because if Apple can be, so can Google.

I can’t possibly log everything I’ve done in this post. The problem itself is complex to begin with: cross-compile a major library, and make it usable (via JNI) in an Android app. I was expecting that figuring out cross-compilers, toolchains, ABIs, and APIs would be the hard part (and it is hard), but what made it nearly impossible was Android Studio with its Gradle nonsense.

Every time I thought I solved a major problem another one took its place. But enough whining, here’s what I found:

janrueegg’s Android port of the library works. I’ve forked it on github just so I know which version that was. Compiling it is simple, from the build directory:

export ANDROID_NDK=~/programs/android-ndk-r10d/
make StaticReleaseAndroid_armeabi-v7a

Note the armeabi-v7a part: you’ll need to build several versions if you’re actually planning to release an app using it or if you want to run it in an x86 emulator. The result is two .ar files which I need to turn into one .a file (for a long time, because of poor Android documentation, I was trying to make a .so file from it, but that turned out to be unnecessary):

ar xv
ar xv
ar cr libXMP.a *.o
cp libXMP.a ~/temp/MyApplication/app/src/main/prebuilt/

The following steps were so many and confusing I’ve simply posted the resulting sample application on github. Pull it to better understand what I’m talking about.

I added my libXMP.a to app/src/main/prebuilt so I can build it in.

I needed a .cpp file to do the JNI (app/src/main/jni/xmp-jni.cpp). That file needs to include the headers for the library. I’ve dumped all the contents of the directory xmp/public/include from the library into app/src/main/jni. is the glue between the JNI code and whatever wants to use it in the rest of my java.

The best for last: the build part. After spending many hours trying to get it to work with Gradle (first with a static version of the library, then a dynamic one) I figured out that it’s just not going to work; I found snippets of Google employees saying online that they won’t even try to fix it. Instead I configured Gradle to call ndk-build manually, which uses my own and

I’ll just post the files here with a couple of notes, explaining it all would take too long.


apply plugin: ''

android {
    compileSdkVersion 21
    buildToolsVersion "21.1.2"

    defaultConfig {
        applicationId "com.example.andrew.myapplication"
        minSdkVersion 15
        targetSdkVersion 21
        versionCode 1
        versionName "1.0"

        ndk {
            moduleName "xmp-jni"
            /*
             * Need this so that I get STL support, needed by XMP.incl_cpp
             * Sadly I think this makes the apk much bigger because
             * of the included
             */
            stl "gnustl_shared"
            /*
             * gnustl_shared has exceptions turned off by default, but I need for XMP.
             */
            cFlags "-fexceptions"
        }
    }

    sourceSets.main {
        // Uncomment this line to use instead
        //jni.srcDirs = ["src/main/jni"]
        jni.srcDirs = []
        jniLibs.srcDir "src/main/jniLibs"
    }

    // call regular ndk-build(.cmd) script from app directory
    task ndkBuild(type: Exec, description: 'Build XMP JNI object files') {
        commandLine '/home/andrew/programs/android-ndk-r10d/ndk-build',
                '-C', file('src/main/jni').absolutePath,
    }

    tasks.withType(JavaCompile) {
        compileTask -> compileTask.dependsOn ndkBuild
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), ''
        }
    }
}

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    compile ''
}

Note in particular the ndkBuild with its hard-coded paths.

# This makefile is only used if the magic string is uncommented in build.gradle

LOCAL_PATH := $(call my-dir)

# static library info
LOCAL_SRC_FILES := ../prebuilt/libXMP.a

# wrapper info
include $(CLEAR_VARS)
LOCAL_MODULE    := xmp-jni
LOCAL_SRC_FILES := xmp-jni.cpp
LOCAL_CFLAGS    += -fexceptions

APP_STL := gnustl_static
APP_PLATFORM := android-21


Note that here also the paths need to be changed to work on your machine.

I wish I could say the exercise was a huge success. I got everything I planned to working, but it was really, really hard, and even though I managed to call one XMP function from Java, that was the end of my patience. See the comment in the xmp-jni.cpp below:

#include <jni.h>

#define UNIX_ENV 1 // for XMP_Environment.h

#include <string>
#define XMP_INCLUDE_XMPFILES 1 //if using XMPFiles
#define TXMP_STRING_TYPE std::string
/*
 * Clients must compile XMP.incl_cpp to ensure that all client-side glue code is generated. Do this by
 * including it in exactly one of your source files.
 * This has to happen before the #include "XMP.hpp" below
 */
#include "XMP.incl_cpp"

#include "XMP.hpp"

const char* doSomeXMPStuffInCPP()

extern "C" {

    jstring
    Java_com_example_andrew_myapplication_XMPlib_doSomeXMPStuff( JNIEnv* env,
                                                                 jobject thiz )
    {
        if (!SXMPMeta::Initialize())
            return env->NewStringUTF("Could not initialize SXMPMeta");
        // This returns true but causes the app to crash like this:
        // 02-25 18:27:12.852    2960-2960/com.example.andrew.myapplication A/libc﹕ Fatal signal 6 (SIGABRT), code -6 in tid 2960 (w.myapplication)
        /*if (!SXMPFiles::Initialize())
            return env->NewStringUTF("Could not initialize SXMPFiles.");
        */

        // Do XMP stuff here

        return env->NewStringUTF("XMP toolkit seems to be working!");
    }
}

I hope for your sake you don’t have to work with this crap, but if you do, here again are the URLs for the two projects:

by Andrew Smith at February 26, 2015 01:25 AM

February 25, 2015

Hosung Hwang

CentOS minimum installation : network setup, GNOME Desktop Install

Network Setup

There is a difficult way and an easier way to set up the network from the command line. Editing /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/resolv.conf by hand is the difficult way.

The easier way is to use one of these commands :

$ nmcli
$ nmtui

nmtui gives you a text-based UI where you can easily set a static address, see the list of Wi-Fi access points, and connect with a password.

GNOME Desktop Install

  1. Check installation
    $ sudo yum check-update
  2. Install gnome package
    $ sudo yum groupinstall "GNOME Desktop" "Graphical Administration Tools"
  3. Make the system start with GUI
    $ sudo ln -sf /lib/systemd/system/ /etc/systemd/system/


by Hosung at February 25, 2015 07:43 PM

Alfred Tsang

Assembler Lab

Working in assembler is harder than working in most programming languages, such as Java. In a high-level language, a single statement does a lot of work and is easy to write. In assembler, each instruction is low level, so it takes many more of them to get the same thing done. Assembler works directly with the hardware (registers and memory), while high-level languages work through layers of software.

Debugging in a high-level language like Java is very easy compared to assembler. In assembler, the syntax is harder to understand. High-level languages have a built-in way of writing to the screen, whereas in assembler you have to go through a series of steps.

x86_64 and AArch64 assembler have very similar syntax, and the commands are very similar.

The programs I wrote were difficult to debug. There were segmentation faults that I could not fix.

by kaputsky263 at February 25, 2015 07:13 PM

David Humphrey

Repeating old mistakes

This morning I've been reading various posts about the difficulty ahead for the Pointer Events spec, namely, Apple's (and by extension Google's) unwillingness to implement it. I'd encourage you to read both pieces, and whatever else gets written on this in the coming days.

I want to comment not on the spec as such, but on the process on display here, and the effect it has on the growth and development of the web. There was a time when the web's course was plotted by a single vendor (at the time, Microsoft), and decisions about what was and wasn't headed for the web got made by employees of that corporation. This story is so often retold that I won't pretend you need to read it again.

And yet here we are in 2015, where the web on mobile, and therefore the web in general, is effectively in the control of one vendor; a vendor who, despite its unmatched leadership and excellence in creating beautiful hardware, has shown none of the same abilities in its stewardship and development of the web.

If the only way to improve and innovate the web is to become an employee of Apple Inc., the web is in real danger.

I think that the larger web development community has become lax in its care for the platform on which it relies. While I agree with the trend toward writing JS to supplement and extend the browser, I think that it also tends to lull developers into thinking that their job is done. We can't simply write code on the platform, and neglect writing code for the platform. To ignore the latter, to neglect the very foundations of our work, is to set ourselves up for a time when everything collapses into the sand.

We need more of our developer talent and resources going into the web platform. We need more of our web developers to drop down a level in the stack and put their attention on the platform. We need more people in our community with the skills and resources to design, implement, and test new specs. We need to ensure that the web isn't something we simply use, but something we are active in maintaining and building.

Many web developers that I talk to don't think about the issues of the "web as platform" as being their problem. "If only Vendor X would fix their bugs or implement Spec Y--we'll have to wait and see." There is too often a view that the web is the problem of a small number of vendors, and that we're simply powerless to do anything other than complain.

In actual fact there is a lot that one can do even without the blessing or permission of the browser vendors. Because so much of the web is still open, and the code freely available, we can and should be experimenting and innovating as much as possible. While there's no guarantee that code you write will be landed and shipped, there is still a great deal of value in writing patches instead of just angry tweets: it is necessary to change people's minds about what the web is and what it can do, and there is no better way than with working code.

I would encourage the millions of web developers who are putting time into technologies and products on top of the web to also consider investing some time in the web itself. Write a patch, make some builds, post them somewhere, and blog about the results. Let data be the lever you use to shift the conversation. People will tell you that something isn't possible, or that one spec is better than another. Nothing will do more to convince people than a working build that proves the opposite.

There's no question that working on the web platform is harder than writing things for the web. The code is bigger, older, more complicated, and requires different tooling and knowledge. However, it's not impossible. I've been doing it for years with undergraduate students at Seneca, and if 3rd and 4th years can tackle it, so too can the millions of web developers who are betting their careers and companies on the web.

Having lived through and participated in every stage of the web's development, it's frustrating to see that we're repeating mistakes of the past, and allowing large vendors to control too great a stake of the web. The web is too important for that, and it needs the attention and care of a global community. There's something you can do, something I can do, and we need to get busy doing it.

by David Humphrey at February 25, 2015 02:51 PM

February 24, 2015

Hong Zhan Huang

SPO600: The fourth lab… C Compiler Options

The 4th lab involves exploring a handful of the many options that can be applied when using the gcc compiler. We will be doing all of this lab on the Australia machine, so the results will reflect an x86_64 system. To begin, let us create some C code to be our test subject in this nefarious experimentation. We’ll be using a fundamental Hello World program called hello0.c as our base:

#include <stdio.h>

int main() {
    printf("Hello World!\n");
}

For the base case we’ll be compiling using these three options:

-g               # enable debugging information
-O0              # do not optimize (that's a capital letter and then the digit zero)
-fno-builtin     # do not use builtin function optimizations
eg. gcc -g -O0 -fno-builtin hello0.c -o hello0

By using these options we’ll create a binary that is an ELF (Executable and Linkable Format) file. It contains various information about the program, such as program data (the variables the program uses), metadata about the program, debugging symbols… and possibly more. In order to read this abundance of information we will be using the objdump program which, as it sounds, dumps information about an object file. For our purposes it’ll be used as a disassembler to view executable code in assembly form. There are some rather useful options that objdump has that can be employed (for the most part I’ve used the -d option for this lab):

-f          # display header information for the entire file
-s          # display per-section summary information
-d          # disassemble sections containing code
--source    # (implies -d) show source code, if available, along with disassembly
eg. objdump -d hello0 > hello0.txt

The amount of information dumped into my hello0.txt is quite a bit so I’ll keep the focus of it on the portion of information that shows what’s going on in the main function of hello0.c on the assembler level and if there are other interesting bits of information in different sections it will be noted accordingly.

hello0.c (Our base case with the above compiler options):

0000000000400536 <main>:
  400536:    55                       push   %rbp
  400537:    48 89 e5                 mov    %rsp,%rbp
  40053a:    bf e0 05 40 00           mov    $0x4005e0,%edi
  40053f:    b8 00 00 00 00           mov    $0x0,%eax
  400544:    e8 c7 fe ff ff           callq  400410 <printf@plt>
  400549:    5d                       pop    %rbp
  40054a:    c3                       retq   
  40054b:    0f 1f 44 00 00           nopl   0x0(%rax,%rax,1)

Examining the above we can see what occurs when our main is executed:

  1. First it does a push of %rbp (the base pointer register), which saves the caller’s base pointer on the stack.
  2. Next it copies %rsp (the stack pointer, i.e. the current top of the stack) into %rbp, which sets up the stack frame for main.
  3. Following that is another move that puts the value $0x4005e0 into the %edi register. This is the address of our string, and %edi (the low half of %rdi) carries the first argument to a function.
  4. After that is yet another move that puts $0x0 into the %eax register. For a function with variable arguments like printf, %al tells the callee how many vector registers hold arguments; here that is none.
  5. Now we finally get to the part where we call our printf function. However it isn’t a direct call to the function: callq goes through printf’s entry in the PLT, the procedure linkage table, which is how calls into a dynamically linked library are resolved.
  6. Pop restores the saved %rbp from the stack.
  7. Retq returns to whatever called main (the C runtime startup code), which then exits back to the operating system.
  8. Lastly nopl is padding that doesn’t do any particular operation. It is there so that whatever follows main starts on an aligned boundary.
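
As a small worked example of step 5, the bytes of the callq instruction can be decoded by hand. The sketch below is in Python and uses a helper name, rel32_target, that is my own and not part of objdump or any other tool. It assumes the usual x86_64 encoding, where the opcode e8 (call) or e9 (jmp) is followed by a 32-bit little-endian signed displacement measured from the address of the next instruction.

```python
import struct


def rel32_target(address, raw_bytes):
    """Decode a 5-byte x86_64 call/jmp rel32 and return the target address.

    address   -- address of the instruction, as shown in the objdump listing
    raw_bytes -- the instruction bytes as a hex string, e.g. "e8 c7 fe ff ff"
    (hypothetical helper, for illustration only)
    """
    code = bytes.fromhex(raw_bytes.replace(" ", ""))
    if len(code) != 5 or code[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte call (e8) or jmp (e9) rel32")
    # Little-endian signed 32-bit displacement after the opcode byte.
    (displacement,) = struct.unpack("<i", code[1:])
    # The displacement is relative to the address of the next instruction.
    return address + len(code) + displacement


if __name__ == "__main__":
    # The callq at 0x400544 in the base case listing.
    print(hex(rel32_target(0x400544, "e8 c7 fe ff ff")))  # 0x400410
```

The result matches the 400410 <printf@plt> that objdump prints next to the instruction.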

Interestingly the section that actually holds our string “Hello World” that is printed by our program is located elsewhere:

Contents of section .rodata:
 4005d0 01000200 00000000 00000000 00000000  ................
 4005e0 48656c6c 6f20576f 726c6421 0a00      Hello World!..
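
The hex columns of that dump can be turned back into text to confirm that they really are our string. A minimal sketch in Python, where the hex digits are copied from the 4005e0 line above and the function name decode_rodata is only illustrative:

```python
def decode_rodata(hex_words):
    """Turn the space-separated hex columns of a section dump line into bytes."""
    return bytes.fromhex(hex_words.replace(" ", ""))


raw = decode_rodata("48656c6c 6f20576f 726c6421 0a00")
# The string ends with a newline (0a) and the terminating NUL byte (00).
print(repr(raw))  # b'Hello World!\n\x00'
```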

Now that we’ve established what the base case is like, let us go through each of our 6 other variations and see what the results of using different compiler options are:

hello1.c (added the -static option to the compiler)

0000000000400b5e <main>:
  400b5e:    55                       push   %rbp
  400b5f:    48 89 e5                 mov    %rsp,%rbp
  400b62:    bf 10 09 49 00           mov    $0x490910,%edi
  400b67:    b8 00 00 00 00           mov    $0x0,%eax
  400b6c:    e8 0f 0b 00 00           callq  401680 <_IO_printf>
  400b71:    5d                       pop    %rbp
  400b72:    c3                       retq   
  400b73:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  400b7a:    00 00 00 
  400b7d:    0f 1f 00                 nopl   (%rax)

Looking at the portion that holds the main, it seems that it is relatively similar to the base case with the exception of the callq making a direct call to printf (shown as _IO_printf) instead of going through the procedure linkage table. However an obvious difference can be seen in the sizes of the base case’s resulting executable (10KB) vs hello1.c’s (809KB). That’s an 80x difference. The objdump file is also very large, totaling a whopping 10000KB. So what happens when we use the -static option? Well, we enable static linking of our libraries rather than dynamic linking. In other words we include all the necessary library code in the executable rather than linking to it at run time. This causes our executable size to bloat significantly. In exchange for the extra size, however, it is supposedly a faster process, as well as being more portable (since all the required libraries are in the executable, if it is moved to a different system there wouldn’t be any need to have those libraries available on the new system).

hello2.c (removed the -fno-builtin option)

0000000000400536 <main>:
  400536:    55                       push   %rbp
  400537:    48 89 e5                 mov    %rsp,%rbp
  40053a:    bf e0 05 40 00           mov    $0x4005e0,%edi
  40053f:    e8 cc fe ff ff           callq  400410 <puts@plt>
  400544:    5d                       pop    %rbp
  400545:    c3                       retq   
  400546:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  40054d:    00 00 00

Again it is mostly a mirror of the base case. However instead of printf in the function call we have puts. Having removed -fno-builtin we’ve allowed the built-in function optimizations to occur when we compile our program. In this case the compiler saw that the string we were printing, “Hello World!\n”, is just a simple string without any format specifiers that ends in a newline. printf allows us to format our strings but we’re not making any use of that functionality, so the compiler optimizes our program by using the puts function, which I assume is less costly than printf. The mov $0x0,%eax from the base case is gone too, since puts does not take variable arguments.
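
To make the substitution concrete, here is a simplified model of the decision, written in Python. It is only a sketch of the rule as I understand it (no format specifiers and a trailing newline means puts can be used) and is not GCC's actual logic. The function name choose_call is made up for the illustration.

```python
def choose_call(format_string):
    """Simplified model of the printf -> puts rewrite (not GCC's real logic).

    Returns (function_name, string_argument).
    """
    has_specifier = "%" in format_string
    if not has_specifier and format_string.endswith("\n"):
        # puts() appends its own newline, so the trailing one is dropped.
        return ("puts", format_string[:-1])
    return ("printf", format_string)


if __name__ == "__main__":
    print(choose_call("Hello World!\n"))     # ('puts', 'Hello World!')
    print(choose_call("Hello World %d\n"))   # ('printf', 'Hello World %d\n')
```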

hello3.c (remove the -g option)

0000000000400536 <main>:
  400536:    55                       push   %rbp
  400537:    48 89 e5                 mov    %rsp,%rbp
  40053a:    bf e0 05 40 00           mov    $0x4005e0,%edi
  40053f:    b8 00 00 00 00           mov    $0x0,%eax
  400544:    e8 c7 fe ff ff           callq  400410 <printf@plt>
  400549:    5d                       pop    %rbp
  40054a:    c3                       retq   
  40054b:    0f 1f 44 00 00           nopl   0x0(%rax,%rax,1)

Absolutely no change in the main compared to the base case. Removing the -g option disables the debugging and this is reflected in the dump file where compared to the base case’s dump we are missing some fifty lines of the debugging symbols and information such as:

Contents of section .debug_aranges:
 0000 2c000000 02000000 00000800 00000000  ,...............
 0010 36054000 00000000 10000000 00000000  6.@.............
 0020 00000000 00000000 00000000 00000000  ................

Without the extra debugging information in our executable, its size also dropped a bit compared to the base case (9.4KB vs 8.4KB).

hello4.c (add additional arguments to printf)

0000000000400536 <main>:
  400536:    55                       push   %rbp
  400537:    48 89 e5                 mov    %rsp,%rbp
  40053a:    6a 0c                    pushq  $0xc
  40053c:    6a 0b                    pushq  $0xb
  40053e:    6a 0a                    pushq  $0xa
  400540:    6a 09                    pushq  $0x9
  400542:    6a 08                    pushq  $0x8
  400544:    6a 07                    pushq  $0x7
  400546:    6a 06                    pushq  $0x6
  400548:    6a 05                    pushq  $0x5
  40054a:    41 b9 04 00 00 00        mov    $0x4,%r9d
  400550:    41 b8 03 00 00 00        mov    $0x3,%r8d
  400556:    b9 02 00 00 00           mov    $0x2,%ecx
  40055b:    ba 01 00 00 00           mov    $0x1,%edx
  400560:    be 00 00 00 00           mov    $0x0,%esi
  400565:    bf 10 06 40 00           mov    $0x400610,%edi
  40056a:    b8 00 00 00 00           mov    $0x0,%eax
  40056f:    e8 9c fe ff ff           callq  400410 <printf@plt>
  400574:    48 83 c4 40              add    $0x40,%rsp
  400578:    c9                       leaveq 
  400579:    c3                       retq   
  40057a:    66 0f 1f 44 00 00        nopw   0x0(%rax,%rax,1)

For this variation I added some additional arguments to the printf. Specifically I added 13 additional integer arguments sequentially, i.e. printf(“Hello World %d %d %d……”, 0, 1, 2, 3…. 12). Compared to the base case, here we see that the first arguments were put into registers in order: the format string in %edi, then 0 through 4 in %esi, %edx, %ecx, %r8d and %r9d. Once we get past the r9d register each remaining argument (5 through 12) is placed on the stack, pushed in reverse order so that 5 ends up on top. So only the first six arguments are placed in registers, and the add $0x40,%rsp after the call removes the eight 8-byte stack arguments again.
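
The placement seen in the hello4 listing can be modelled in a few lines of Python. This is a sketch under the assumption that every argument is an integer or pointer taking one 8-byte slot, following the System V AMD64 calling convention used on Linux x86_64. The function name place_arguments and the "fmt" placeholder for the format string are my own.

```python
# Integer/pointer argument registers, in order, under the System V AMD64
# calling convention used by Linux on x86_64.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args):
    """Split integer arguments into register and stack arguments.

    Returns (register_map, push_order, stack_bytes). Simplified model:
    integer or pointer arguments only, one 8-byte slot each.
    """
    in_registers = dict(zip(ARG_REGISTERS, args))
    on_stack = list(args[len(ARG_REGISTERS):])
    # Stack arguments are pushed right to left, so the last one goes first.
    push_order = list(reversed(on_stack))
    return in_registers, push_order, 8 * len(on_stack)


if __name__ == "__main__":
    call_args = ["fmt"] + list(range(13))  # format string plus 0..12
    regs, pushes, stack_bytes = place_arguments(call_args)
    print(regs)
    print(pushes, hex(stack_bytes))
```

With the format string plus the values 0 to 12, the model puts 0 to 4 in %rsi through %r9, pushes 12 down to 5, and reports 0x40 bytes of stack arguments, which lines up with the pushq sequence and the add $0x40,%rsp in the listing.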

hello5.c (move the printf to another function output() and call that from main)

0000000000400536 <output>:
  400536:    55                       push   %rbp
  400537:    48 89 e5                 mov    %rsp,%rbp
  40053a:    bf f0 05 40 00           mov    $0x4005f0,%edi
  40053f:    b8 00 00 00 00           mov    $0x0,%eax
  400544:    e8 c7 fe ff ff           callq  400410 <printf@plt>
  400549:    5d                       pop    %rbp
  40054a:    c3                       retq  

000000000040054b <main>:
  40054b:    55                       push   %rbp
  40054c:    48 89 e5                 mov    %rsp,%rbp
  40054f:    b8 00 00 00 00           mov    $0x0,%eax
  400554:    e8 dd ff ff ff           callq  400536 <output>
  400559:    5d                       pop    %rbp
  40055a:    c3                       retq   
  40055b:    0f 1f 44 00 00           nopl   0x0(%rax,%rax,1)

By moving the printf to another function called output() we now have additional code in our dump file. main now makes a call to output, which then executes printf. After that, execution returns to main to continue the rest of the program. In this case there is additional overhead in calling output in order to call printf.

hello6.c (remove the -O0 option and add the -O3 option)

0000000000400440 <main>:
  400440:    bf f0 05 40 00           mov    $0x4005f0,%edi
  400445:    31 c0                    xor    %eax,%eax
  400447:    e9 c4 ff ff ff           jmpq   400410 <printf@plt>

The main becomes only 3 lines with the -O3 optimization option turned on! All of the stack frame set up from the base case is stripped entirely. I would suppose that as main has no local variables to keep on the stack in our simple hello world program, all that initial code is rather unnecessary. The mov $0x0,%eax has become xor %eax,%eax, a shorter way of zeroing the register. The callq is now a jmpq, which is something like a goto statement: because the printf call is the last thing main does, the compiler jumps straight to printf and lets printf’s own return go back to main’s caller (a tail call). However much speed is gained through these optimizations, there is a size trade off, as our executable is now 11KB compared to the base case’s 9.4KB, and the dump file is 25KB vs 19KB.

Now that we’ve concluded all seven variations of experimenting with the gcc compiler options, let’s sum up what we’ve discovered:

  • There are a lot of compiler options one can use.
  • The use of any one option can have significant effects in terms of memory, speed or portability, usually as a trade off of one aspect in exchange for another.
  • By disassembling we can gain a better understanding of how the code is seen on the low level, as well as insight on what may be the best compiler optimizations we can employ on a particular piece of code.

Quest completed! Sifted through a bag of compiler options and organized gcc inventory. End log of an SPO600 player until next time~

by hzhuang3 at February 24, 2015 05:33 AM

Cha Li

Memory Barrier

What are memory barriers?

Memory barriers are used to provide control over the order of memory accesses.

A memory barrier that affects both the compiler and the processor is a hardware memory barrier.

A memory barrier that only affects the compiler is a software memory barrier.

A memory barrier that affects both reads and writes is called a full memory barrier.

Understand memory barriers

void init() {
    _data = 42;
    _initialized = true;
}

Compiler order

void init() {
    _initialized = true;
    _data = 42;
}

When we have a multi-threaded program, we might face a huge problem if the compiler (or the processor) switches the order like this.

Suppose the following two functions run on different threads. If _initialized = true is processed first, print() running at the same time can see _initialized as true while _data is still 0.

using System;

public class Datainit {
    private int _data;
    private bool _initialized;

    void init() {
        _data = 42;
        _initialized = true;
    }

    void print() {
        if (_initialized) {
            Console.WriteLine(_data);
        } else {
            Console.WriteLine("error, not initialized");
        }
    }
}

How to fix

using System;
using System.Threading;

public class Datainit {
    private int _data;
    private bool _initialized;

    void init() {
        _data = 42;
        Thread.MemoryBarrier();     // 1
        _initialized = true;
        Thread.MemoryBarrier();     // 2
    }

    void print() {
        Thread.MemoryBarrier();     // 3
        if (_initialized) {
            Thread.MemoryBarrier(); // 4
            Console.WriteLine(_data);
        } else {
            Console.WriteLine("error, not initialized");
        }
    }
}

Memory barriers 1 and 4 prevent this example from writing “0”.

Memory barriers 2 and 3 provide a freshness guarantee: if print() runs after init(), it sees _initialized as true.
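
To see why the order of the two stores matters, here is a toy model in Python that enumerates what print() could show. It is only a sketch: it does not use real threads or real barriers, it assumes print() runs as a single step at some point relative to the two stores in init(), and it ignores reordering of print()'s own loads. The function name possible_outputs is made up for the illustration.

```python
def possible_outputs(stores_may_reorder):
    """Enumerate what print() could show in a toy model of the example.

    Assumption of the model: print() runs as one step, either before any
    store in init(), between the two stores, or after both of them.
    """
    store_orders = [["_data", "_initialized"]]
    if stores_may_reorder:
        store_orders.append(["_initialized", "_data"])

    outputs = set()
    for order in store_orders:
        for stores_done in range(len(order) + 1):
            memory = {"_data": 0, "_initialized": False}
            for name in order[:stores_done]:
                memory[name] = 42 if name == "_data" else True
            if memory["_initialized"]:
                outputs.add(str(memory["_data"]))
            else:
                outputs.add("error, not initialized")
    return outputs


if __name__ == "__main__":
    print(sorted(possible_outputs(stores_may_reorder=False)))
    print(sorted(possible_outputs(stores_may_reorder=True)))
```

In this model "0" only shows up when the stores are allowed to swap, which is the case that barrier 1 rules out on the writing side.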

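Types of memory barrier

if (isPublished) {
    // a LoadLoad barrier goes here, between the two loads
    return value;
}

A LoadLoad barrier prevents reordering of loads performed before the barrier with loads performed after the barrier.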



A StoreStore barrier effectively prevents reordering of stores performed before the barrier with stores performed after the barrier.



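A LoadStore barrier limits the processor’s ability to juggle instructions: loads performed before the barrier cannot be reordered with stores performed after it.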



A StoreLoad barrier ensures that all stores performed before the barrier are visible to other processors, and that all loads performed after the barrier receive the latest value.

by lifuzhang1991 at February 24, 2015 04:21 AM

Hosung Hwang

Continuous Integration with Jenkins


What is Continuous Integration (CI) ?

When I first saw Jenkins, I thought about “The Joel Test” from the book “Joel on Software”. It is a list of 12 questions to rate the quality of a software team. By using Jenkins we can cover 4 of the 12 in the list : 1. Do you use source control?, 2. Can you make a build in one step?, 3. Do you make daily builds?, 5. Do you fix bugs before writing new code?

Nowadays many software projects are developed by many developers, and building the source code takes a long time. When there is a compile conflict or test conflict, without a CI tool, the development process is delayed. CI enables the project to always have a clean build.

Continuous Integration starts from the following requirements :

  1. The system must be able to be built and tested automatically
  2. The system must be under configuration management
  3. Every developer in the team commits their changes frequently.
  4. Upon commit, the system is immediately and automatically “integrated” : Build, Test, Notify
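
The build, test, notify cycle in the last requirement can be sketched in a few lines of Python. This is not how Jenkins is implemented. It is a minimal illustration with made-up names (integrate, steps, notify), and the commands are placeholders standing in for a real build and test suite.

```python
import subprocess
import sys


def integrate(steps, notify):
    """Run named build/test commands in order; notify and stop on first failure.

    steps  -- list of (name, argv) pairs; the commands are placeholders
    notify -- callable taking a message string (e.g. something that sends email)
    Returns True if every step succeeded.
    """
    for name, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            notify(f"{name} failed with exit code {result.returncode}")
            return False
    notify("all steps passed")
    return True


if __name__ == "__main__":
    # Placeholder command standing in for a real build and a real test run.
    ok = [sys.executable, "-c", "print('ok')"]
    integrate([("build", ok), ("test", ok)], print)
```

A CI server runs something like this on every commit, so a broken build or failing test is reported right away instead of being found days later.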

By using CI, we can get the following benefits:

  1. Reduced risk : the system always builds, and if it does not, developers know right away
  2. Bugs and problems are discovered quickly
  3. Deployable more frequently, with more user feedback

What is Jenkins?

  • Jenkins is a very popular open source Continuous Integration tool that works as a server.
  • It provides 1009 plugins to support building and testing virtually any project.
  • It supports every build/test project that can be run from the command line.
  • It uses the power of bash.
  • Jenkins is often called the easiest and most flexible CI tool.

History of Jenkins

Jenkins started out under the name “Hudson”, developed by developers at Sun Microsystems in 2004. It evolved and was used by developer teams inside Sun. In 2009, Oracle agreed to purchase Sun (the deal closed in 2010), and they wanted strictly controlled development, whereas the core developers wanted open, flexible, fast-paced, community-focused development as they were used to. In 2011, most of the core Hudson developers and contributors forked Hudson under the name Jenkins, and 75% of Hudson users switched to Jenkins. In 2015, Jenkins celebrates 100K installations, 1000 plugins, and 10 years of Jenkins.

What Jenkins does

  • Runs as a server : can be installed in Windows, Linux, Mac OS, Solaris
  • Define Jobs
    Register SCM : CVS, Subversion, Git, etc.
    Schedule to pull the source using “cron” syntax
    Test method : JUnit, CPPUnit etc.
    Deploy using ssh
  • Run Jobs : build, run unit tests, deploy, and run post-build scripts automatically
  • Notify if there is an issue by email


  • Automatic building, testing
  • Distributed building : Master/Slave strategies using SSH – building in various build environment using jenkins plugin (Mac, Linux, and Windows for iOS app, Android app, and Windows Phone app)
  • Profiling : by adding bash script or using
  • Issue tracking using JIRA plugin.
  • Plenty using plugins :

Who is using Jenkins

  • Companies : Dell, eBay, Facebook, GitHub, LinkedIn, NASA, Netflix, Sony, Yahoo, etc.
  • Open Source Projects : AngularJS, Apache, Bazaar, cocos2d-x, Jenkins, KDE, OpenSUSE, Linaro, Ubuntu, Mozilla, etc.


by Hosung at February 24, 2015 12:56 AM

February 23, 2015

Hosung Hwang

[SPO600] Compiled C Lab – Compiler option and assembly

This posting investigates how simple C code is translated to assembly language, as part of the Compiled C Lab in the SPO600 course at Seneca College. I especially focused on the “(5) Move the printf() call to a separate function named output(), and call that function from main(). Explain the changes in the object code.” part of the lab.

This lab is performed on an x86_64 machine :

First Sample : hello0.c

#include <stdio.h>

void output(char *c)
{
  printf(c);
}

int main()
{
  printf("Print in main\n");
  output("First Param\n");
}

After printing “Print in main”, the main function calls the output() function with one char array. I compiled with the -O0 optimization option : no optimization. For easy analysis, the assembly code is generated using the -S option, and the object dump file is generated using objdump.

  • compile : g++ -O0 -o hello0 hello0.c
  • generate assembly : g++ -O0 -S -o hello0.asm hello0.c
  • object dump : objdump -d hello0 > hello0.dump

The following is the output and main function part of hello0.dump :

0000000000400646 <_Z6outputPc>:
  400646:   55                      push   %rbp
  400647:   48 89 e5                mov    %rsp,%rbp
  40064a:   48 83 ec 10             sub    $0x10,%rsp
  40064e:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
  400652:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  400656:   48 89 c7                mov    %rax,%rdi
  400659:   b8 00 00 00 00          mov    $0x0,%eax
  40065e:   e8 ad fe ff ff          callq  400510 <printf@plt>
  400663:   c9                      leaveq 
  400664:   c3                      retq   

0000000000400665 <main>:
  400665:   55                      push   %rbp
  400666:   48 89 e5                mov    %rsp,%rbp
  400669:   bf 20 07 40 00          mov    $0x400720,%edi
  40066e:   e8 bd fe ff ff          callq  400530 <puts@plt>
  400673:   bf 2e 07 40 00          mov    $0x40072e,%edi
  400678:   e8 c9 ff ff ff          callq  400646 <_Z6outputPc>
  40067d:   b8 00 00 00 00          mov    $0x0,%eax
  400682:   5d                      pop    %rbp
  400683:   c3                      retq   
  400684:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40068b:   00 00 00 
  40068e:   66 90                   xchg   %ax,%ax

The same part of hello0.asm is :

    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movq    %rdi, -8(%rbp)
    movq    -8(%rbp), %rax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    .cfi_def_cfa 7, 8
    .size   _Z6outputPc, .-_Z6outputPc
    .section    .rodata
    .string "Print in main"
    .string "First Param\n"
    .globl  main
    .type   main, @function
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC0, %edi
    call    puts
    movl    $.LC1, %edi
    call    _Z6outputPc
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8

The two lines push %rbp and mov %rsp,%rbp at the start of every function are not significant here: they are the typical code that saves the caller’s base pointer and sets up a stack frame for the function call.

In main(), the address of the “Print in main” string is moved to the %edi register (movl $.LC0, %edi) and the puts function is called (call puts). puts is used instead of printf because the string has no format specifiers and ends with a newline, so the compiler can use the simpler function. Then the address of the “First Param” string is moved to %edi (movl $.LC1, %edi), and the output() function is called (call _Z6outputPc). In both cases, the %edi register is used to pass the parameter.

In output(), the two lines mov %rdi,-0x8(%rbp) and mov -0x8(%rbp),%rax store the incoming argument in the function’s stack frame and load it back. The value is then copied from %rax to %rdi to pass it as the argument to printf().

Second sample : hello2.c

#include <stdio.h>

void output(char *c, char *d, char *e, char *f, char *g)
{
  printf("text %s, %s, %s, %s, %s", c, d, e, f, g);
}

int main()
{
  printf("Print in main\n");
  output("First Param\n", "Second param\n", "Third param\n", "Forth param\n", "Fifth param\n");
}

This sample sends 5 arguments to the output function, and the output function calls the printf function with 6 arguments.

The following is the object dump file :

0000000000400646 <_Z6outputPcS_S_S_S_>:
  400646:   55                      push   %rbp
  400647:   48 89 e5                mov    %rsp,%rbp
  40064a:   48 83 ec 30             sub    $0x30,%rsp
  40064e:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
  400652:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
  400656:   48 89 55 e8             mov    %rdx,-0x18(%rbp)
  40065a:   48 89 4d e0             mov    %rcx,-0x20(%rbp)
  40065e:   4c 89 45 d8             mov    %r8,-0x28(%rbp)
  400662:   48 8b 7d d8             mov    -0x28(%rbp),%rdi
  400666:   48 8b 75 e0             mov    -0x20(%rbp),%rsi
  40066a:   48 8b 4d e8             mov    -0x18(%rbp),%rcx
  40066e:   48 8b 55 f0             mov    -0x10(%rbp),%rdx
  400672:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  400676:   49 89 f9                mov    %rdi,%r9
  400679:   49 89 f0                mov    %rsi,%r8
  40067c:   48 89 c6                mov    %rax,%rsi
  40067f:   bf 60 07 40 00          mov    $0x400760,%edi
  400684:   b8 00 00 00 00          mov    $0x0,%eax
  400689:   e8 82 fe ff ff          callq  400510 <printf@plt>
  40068e:   c9                      leaveq 
  40068f:   c3                      retq   

0000000000400690 <main>:
  400690:   55                      push   %rbp
  400691:   48 89 e5                mov    %rsp,%rbp
  400694:   bf 78 07 40 00          mov    $0x400778,%edi
  400699:   e8 92 fe ff ff          callq  400530 <puts@plt>
  40069e:   41 b8 86 07 40 00       mov    $0x400786,%r8d
  4006a4:   b9 93 07 40 00          mov    $0x400793,%ecx
  4006a9:   ba a0 07 40 00          mov    $0x4007a0,%edx
  4006ae:   be ad 07 40 00          mov    $0x4007ad,%esi
  4006b3:   bf bb 07 40 00          mov    $0x4007bb,%edi
  4006b8:   e8 89 ff ff ff          callq  400646 <_Z6outputPcS_S_S_S_>
  4006bd:   b8 00 00 00 00          mov    $0x0,%eax
  4006c2:   5d                      pop    %rbp
  4006c3:   c3                      retq   
  4006c4:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4006cb:   00 00 00 
  4006ce:   66 90                   xchg   %ax,%ax

The following is the generated assembly code:

    .string "text %s, %s, %s, %s, %s"
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $48, %rsp
    movq    %rdi, -8(%rbp)
    movq    %rsi, -16(%rbp)
    movq    %rdx, -24(%rbp)
    movq    %rcx, -32(%rbp)
    movq    %r8, -40(%rbp)
    movq    -40(%rbp), %rdi
    movq    -32(%rbp), %rsi
    movq    -24(%rbp), %rcx
    movq    -16(%rbp), %rdx
    movq    -8(%rbp), %rax
    movq    %rdi, %r9
    movq    %rsi, %r8
    movq    %rax, %rsi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf
    .cfi_def_cfa 7, 8
    .size   _Z6outputPcS_S_S_S_, .-_Z6outputPcS_S_S_S_
    .section    .rodata
    .string "Print in main"
    .string "Fifth param\n"
    .string "Forth param\n"
    .string "Third param\n"
    .string "Second param\n"
    .string "First Param\n"
    .globl  main
    .type   main, @function
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC1, %edi
    call    puts
    movl    $.LC2, %r8d
    movl    $.LC3, %ecx
    movl    $.LC4, %edx
    movl    $.LC5, %esi
    movl    $.LC6, %edi
    call    _Z6outputPcS_S_S_S_
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8

The important part is the following:

    movl    $.LC2, %r8d     //Fifth param
    movl    $.LC3, %ecx     //Fourth param
    movl    $.LC4, %edx     //Third param
    movl    $.LC5, %esi     //Second param
    movl    $.LC6, %edi     //First param
    call    _Z6outputPcS_S_S_S_
    movq    %rdi, -8(%rbp)  //First param
    movq    %rsi, -16(%rbp) //Second param
    movq    %rdx, -24(%rbp) //Third param
    movq    %rcx, -32(%rbp) //Fourth param
    movq    %r8, -40(%rbp)  //Fifth param
    movq    -40(%rbp), %rdi //Fifth param (moved to r9 below)
    movq    -32(%rbp), %rsi //Fourth param (moved to r8 below)
    movq    -24(%rbp), %rcx //Third param, now printf's fourth argument
    movq    -16(%rbp), %rdx //Second param, now printf's third argument
    movq    -8(%rbp), %rax  //First param (moved to rsi below)
    movq    %rdi, %r9       //Fifth param, now printf's sixth argument
    movq    %rsi, %r8       //Fourth param, now printf's fifth argument
    movq    %rax, %rsi      //First param, now printf's second argument
    movl    $.LC0, %edi     //"text %s, %s, %s, %s, %s"
    movl    $0, %eax
    call    printf

In main, before calling output, the 5th, 4th, 3rd, 2nd, and 1st arguments are placed in registers r8d, ecx, edx, esi, and edi, respectively. Then, in output(), rdi, rsi, rdx, rcx, and r8 are stored in the function's local stack memory, and the values are loaded back into registers, each shifted by one argument position to make room for the format string. After the format string (“text %s, %s, %s, %s, %s”) is loaded into edi as the first argument, printf is called.
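The register choices above match the System V AMD64 calling convention, which passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8, and r9, in that order. The lookup below is only a reading aid for the disassembly; the helper name arg_register is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Integer/pointer argument registers of the System V AMD64 calling
 * convention, in argument order (first argument first). */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: returns the register that carries the given
 * 1-based integer argument, or NULL when the argument would be
 * passed on the stack instead. */
const char *arg_register(int position)
{
    assert(position >= 1);
    if (position > 6)
        return NULL;
    return INT_ARG_REGS[position - 1];
}
```

With this table, output() receives its five pointers in arg_register(1) through arg_register(5), and printf receives the format string in rdi followed by the same five pointers in rsi through r9. The movl $0, %eax before the call follows the same convention: for a variadic function, al carries an upper bound on the number of vector registers used, which is zero here.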

Debugging, call stack analysis

The following are debugging screenshots:
Screenshot from 2015-02-22 17:42:01

Screenshot from 2015-02-22 18:02:25

While main() was executing (code starting from 0x4005d0), the base pointer (rbp) was 0x7fffffffdc70 and the stack pointer (rsp) was also 0x7fffffffdc70; they are probably the same address because main() has no command-line arguments or local variables. Then, while output() was executing, the base pointer was 0x7fffffffdc60 and the stack pointer was 0x7fffffffdc30.
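These addresses can be checked by hand against the prologue in the object dump: the call pushes an 8-byte return address, push %rbp pushes another 8 bytes, and sub $0x30,%rsp reserves the local frame. The following is a small sketch of that arithmetic; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value of rbp inside the callee after "call", "push %rbp" and
 * "mov %rsp,%rbp". caller_rsp is the caller's stack pointer just
 * before the call instruction runs. */
uint64_t callee_rbp(uint64_t caller_rsp)
{
    assert(caller_rsp >= 16);
    uint64_t rsp = caller_rsp;
    rsp -= 8;   /* call pushes the 8-byte return address */
    rsp -= 8;   /* push %rbp saves the caller's base pointer */
    return rsp; /* mov %rsp,%rbp copies this value into rbp */
}

/* Value of rsp after "sub $frame_size,%rsp" in the callee prologue. */
uint64_t callee_rsp(uint64_t rbp, uint64_t frame_size)
{
    assert(rbp >= frame_size);
    return rbp - frame_size;
}
```

Starting from main's stack pointer 0x7fffffffdc70, callee_rbp gives 0x7fffffffdc60, and subtracting the 0x30-byte frame gives 0x7fffffffdc30, which agrees with the values observed in the debugger.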

The stack looks like this:

Screenshot from 2015-02-22 20:39:21

by Hosung at February 23, 2015 01:42 AM

February 22, 2015

Yan Song

Gulliver in Byteland

For program objects that cannot fit in one byte, such as ints, two conventions have to be established:

  • The addressing convention; that is, what is the address of a multibyte object?
  • The byte ordering (aka endianness) convention; that is, how are the bytes ordered in memory?

On virtually every machine, a contiguous sequence of bytes is used to represent a multibyte object, and the object's address is the lowest memory address of the bytes allocated to it. For example, suppose the variable x = 0x11223344U is of type unsigned int (and an unsigned int spans four bytes); then &x is the smallest of the four byte addresses.


As for the byte ordering problem, there are two common approaches. The first convention is referred to as little endian: the least-significant byte comes first in memory and the most-significant byte comes last. Using our unsigned int x above (with a standing for the address of x), the memory layout would be:

    address:  a      a+1    a+2    a+3
    byte:     0x44   0x33   0x22   0x11


This convention is followed by most Intel-compatible machines.

The second convention is referred to as big endian: the most-significant byte comes first in memory and the least-significant byte comes last. Following this convention, our unsigned int x (again with a standing for its address) would look like this in memory:

    address:  a      a+1    a+2    a+3
    byte:     0x11   0x22   0x33   0x44


The big-endian convention is followed by most machines manufactured by IBM.

Some recent machines are bi-endian, in the sense that they can operate in either little-endian mode or big-endian mode.

For application developers programming in higher-level languages, the problems of addressing and endianness are usually not visible. But in two major cases these issues do show up. The first case is network programming, where data are routinely exchanged between machines that follow different endian conventions. The second case is, of course, reading assembly code generated by a disassembler.

The following simple C code lets us confirm that our server australia is operating in little-endian mode:

/* byteorder.c */
#include <stdio.h>

int main(void) {
        unsigned int x;
        unsigned char *p;
        size_t i;

        x = 0x11223344U;
        p = (unsigned char *) &x;
        /* Walk the bytes of x from the lowest address upward. */
        for (i = 0; i < sizeof(unsigned int); i++) {
                printf("%p: %02x\n", (void *) p, *p);
                p++;
        }
        return 0;
}

In this program, we cast &x to unsigned char * so that we can look at the individual bytes used to represent x. Because the addressing convention makes &x the lowest of the four byte addresses, incrementing the pointer in the for loop enumerates the bytes in increasing address order.
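For the network programming case mentioned above, the usual remedy is to convert values to network byte order (big endian) before sending them. The sketch below assumes a POSIX system that provides htonl in arpa/inet.h; the helper names to_network_bytes and host_is_little_endian are made up for illustration.

```c
#include <arpa/inet.h>  /* htonl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies value into out[] in network (big-endian) byte order,
 * whatever the byte order of the host is. */
void to_network_bytes(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    uint32_t net = htonl(value);    /* host order -> network order */
    memcpy(out, &net, sizeof net);  /* bytes exactly as they sit in memory */
}

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* the byte at the lowest address */
    return first == 1;
}
```

For 0x11223344U, to_network_bytes always produces the bytes 11 22 33 44 in increasing address order, which is why both ends of a connection can agree on the value regardless of their own endianness.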

by ysong55 at February 22, 2015 04:09 AM

February 21, 2015

Hosung Hwang

HTTP/2.0 is coming


A couple of days ago, the HTTP/2 specification was approved as a standard. It has been 16 years since HTTP/1.1, and the jump in version number says there are significant differences.

Apparently, HTTP/2.0 is based on SPDY, a protocol made by Google and implemented in the Chrome browser to boost speed. In those 16 years the internet world changed: Google rules the internet, and they changed the HTTP standard.

What is the difference?

  1. Multiple connections -> Multiplexed streams over a single connection
    In HTTP/1.x, even when a client opens multiple connections to load resources, the server has to answer the requests on each connection in order. In HTTP/2.0, multiple bidirectional streams are multiplexed over a single TCP connection. The specification recommends that the limit on concurrent streams per connection be no smaller than 100.

  2. Text protocol -> Binary protocol
    This looks weird because HTTP is the HyperText Transfer Protocol; the name itself implies a text protocol. Anyway, it is now a binary protocol. Probably we will not need things like base64 encoding and decoding. Maybe another media streaming protocol could live inside HTTP itself.

  3. Uncompressed text-based header -> HPACK compressed header
    There is an additional RFC for HPACK, which is header compression for HTTP/2. It compresses HTTP headers securely.

  4. Server push
    HTTP/2.0 allows servers to push streams to clients without clients having to make the initial request. This seems likely to change web programming techniques such as Ajax.
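To make the "binary protocol" and "streams" points concrete: every HTTP/2 frame starts with a fixed 9-byte header holding a 24-bit payload length, an 8-bit type, 8 bits of flags, and a 31-bit stream identifier (preceded by one reserved bit). The parser below is a minimal sketch of that layout; the struct and function names are invented for this example and are not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the fixed 9-byte header that starts every HTTP/2 frame. */
struct h2_frame_header {
    uint32_t length;    /* 24-bit payload length */
    uint8_t  type;      /* e.g. 0x0 DATA, 0x1 HEADERS, 0x4 SETTINGS */
    uint8_t  flags;
    uint32_t stream_id; /* 31 bits; stream 0 is the connection itself */
};

/* Decodes the 9 header bytes, which are sent in network byte order. */
struct h2_frame_header h2_parse_header(const uint8_t buf[9])
{
    assert(buf != NULL);
    struct h2_frame_header h;
    h.length = ((uint32_t)buf[0] << 16) | ((uint32_t)buf[1] << 8) | buf[2];
    h.type = buf[3];
    h.flags = buf[4];
    /* Mask off the reserved top bit of the stream identifier. */
    h.stream_id = (((uint32_t)buf[5] & 0x7f) << 24) |
                  ((uint32_t)buf[6] << 16) |
                  ((uint32_t)buf[7] << 8) |
                  buf[8];
    return h;
}
```

Because each frame names its stream, frames that belong to different requests can be interleaved on one TCP connection, which is what makes the multiplexing in item 1 possible.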

When can I see the difference?

  1. Clients need to support it.
    Chrome already has these features. This link says we can run the Chrome browser in HTTP/2.0 mode by using the “--enable-spdy4” flag. I tried it in my Chrome 40 for Linux, but there was no visible difference. Chrome Canary doesn’t support Linux. Do I have to build it myself? Maybe.
    Firefox says support is under development and that we can try it on the nightly channel. This link explains how to enable HTTP/2.0 in Firefox; I tried it. However, I couldn’t see any difference. Maybe I need to capture packets to see it.

  2. Servers need to support it.
    Even if a client supports HTTP/2.0, it is useless if the server doesn’t. I guess Google has already finished developing the server side. Is it implemented on Linux? I don’t know; it should be checked. If there is a server that supports HTTP/2.0 and we have a browser that does too, testing will be possible. This link says the Apache project has had mod_spdy since last year.

  3. Networking devices need to support it.
    Routers, IPSs, firewalls, and so on. A few years ago, I came across a network that didn’t support HTTP/1.1 range requests. The conclusion was that there was a network device that supported only HTTP/1.0. It will take time before we meet networks that fully support the new protocol.

This link lists HTTP/2.0 implementations for clients, servers, proxies, etc.


  1. Client implementation
    Not only browsers but other client programs will need to implement HTTP/2.0. C++ libraries and many other libraries will implement it.

  2. Requires more servers
    The fact that a client can make more requests in parallel means the server must serve more of them at the same time. Clients will benefit in terms of speed, but more server power is necessary to deliver that.


by Hosung at February 21, 2015 07:06 PM

Mohamed Baig

Progress on CC image library – Release 0.2

Switched focus from Qt to XMP. Getting to work on Android. Working on a sample app now.

by mbbaig at February 21, 2015 08:24 AM

Ryan Dang

CCImage Release 0.2


Below are the API and mock-up methods that will be implemented for the CCImage library

package ccimage;
import java.util.*;
import android.net.Uri;   // assumption: Uri is Android's android.net.Uri
import imagemetadata.*;   // renamed from image-metadata: hyphens are not legal in Java package names
import constants.*;
import licenses.*;

public class CCImage {

	private Constants constants;
	private CCImageMetadata metadata;

	public CCImage() {}

	// check if the image already has metadata
	private boolean hasMetadata(Uri uri) { return false; /* stub */ }

	// check if the image already has a license
	private boolean hasLicense(Uri uri) { return false; /* stub */ }

	public void addImageMetadata (CCImageMetadata metadata, Uri uri,  boolean overWrite, Uri output) {}
	public void addPNGImageMetadata (CCPNGImageMetadata metadata, Uri uri,  boolean overWrite, Uri output) {}
	public void addJPEGImageMetadata (CCJPEGImageMetadata metadata, Uri uri,  boolean overWrite, Uri output) {}

	public void updateAuthorName(String name, Uri uri) {}
	public void updateLicense(License license, Uri uri) {}

	// extract metadata from image uri
	public CCImageMetadata extractmetadata(Uri uri) { return null; /* stub */ }

	public String getImageType(Uri uri) { return null; /* stub */ }
}


package imagemetadata;
import java.util.*;
import constants.*;
import licenses.*;

public class CCImageMetadata {

	private String author;
	private Date creationDate;
	private License license;

	public CCImageMetadata() {}
	public CCImageMetadata(String author, License license, Date creationDate) {}
	//Get methods
	public String getAuthor() { return author; }
	public License getLicense() { return license; }
	public Date getCreationDate() { return creationDate; }

	//Set methods
	public void setAuthor(String author) {}
	public void setLicense(License license) {}
	public void setDate(Date date) {}
}


package imagemetadata;
import java.util.*;
import constants.*;

public class CCJPEGImageMetadata extends CCImageMetadata {
}


package imagemetadata;
import java.util.*;
import constants.*;

public class CCPNGImageMetadata extends CCImageMetadata {
}


I created a license package that includes all the Creative Commons licenses


package licenses;

public class License {
	public static final String name = "";
	public static final String licenseUrl = "";
}

package licenses;

public class AttributionNoDerivs extends License {
	public static final String name = "Attribution-NoDerivs";
	public static final String licenseUrl = "";
}

package licenses;

public class Attribution extends License {
	public static final String name = "Attribution";
	public static final String licenseUrl = "";
}

package licenses;

public class AttributionNonCommercial extends License {
	public static final String name = "Attribution-NonCommercial";
	public static final String licenseUrl = "";
}

package licenses;

public class AttributionNonCommercialNoDerivs extends License {
	public static final String name = "Attribution-NonCommercial-NoDerivs";
	public static final String licenseUrl = "";
}

package licenses;

public class AttributionNonCommercialShareAlike extends License {
	public static final String name = "Attribution-NonCommercial-ShareAlike";
	public static final String licenseUrl = "";
}

package licenses;

public class AttributionShareAlike extends License {
	public static final String name = "Attribution-ShareAlike";
	public static final String licenseUrl = "";
}

I also created a simple unit test for some of the methods


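package test;
import java.util.*;
import android.net.Uri;   // assumption: Uri is Android's android.net.Uri
import ccimage.CCImage;
import constants.*;

// Note: hasLicense and hasMetadata are declared private in CCImage,
// so they need to be made accessible before these tests can compile.
public class CCImageTest {

	private CCImage ccimage;

	public CCImageTest() {
		ccimage = new CCImage();
	}

	public void testCCImageHasLicense() {
		assert ccimage.hasLicense(Uri.parse("testImages/imageWithLicense.jpeg")) == true;
	}

	public void testCCImageHasNoLicense() {
		assert ccimage.hasLicense(Uri.parse("testImages/imageWithNoLicense.jpeg")) == false;
	}

	public void testCCImageHasMetadata() {
		assert ccimage.hasMetadata(Uri.parse("testImages/imageWithMetadata.jpeg")) == true;
	}

	public void testCCImageHasNoMetadata() {
		assert ccimage.hasMetadata(Uri.parse("testImages/imageWithNoMetadata.jpeg")) == false;
	}

	public void testCCImageGetImageTypePNG() {
		// compare strings with equals(), not ==
		assert "png".equals(ccimage.getImageType(Uri.parse("testImages/imageWithNoMetadata.png")));
	}

	public void testCCImageGetImageTypeJPEG() {
		assert "jpeg".equals(ccimage.getImageType(Uri.parse("testImages/imageWithNoMetadata.jpeg")));
	}

	public void runAllTests(){
		testCCImageHasLicense();
		testCCImageHasNoLicense();
		testCCImageHasMetadata();
		testCCImageHasNoMetadata();
		testCCImageGetImageTypePNG();
		testCCImageGetImageTypeJPEG();
	}
}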

A constants class stores all the constant values that can be shared with other classes

package constants;

public final class Constants {
    private Constants() {}  // Prevents instantiation

    public static final int ATTRIBUTION = 1;
    public static final int ATTRIBUTION_SHAREALIKE = 2;
    public static final int ATTRIBUTION_NODERIVS = 3;
    public static final int ATTRIBUTION_NONCOMMERCIAL = 4;
    public static final int ATTRIBUTION_NONCOMMERCIAL_SHAREALIKE = 5;
    public static final int ATTRIBUTION_NONCOMMERCIAL_NODERIVS = 6;
    public static final String ATTRIBUTION_LINK = "";
    public static final String ATTRIBUTION_SHAREALIKE_LINK = "";
    public static final String ATTRIBUTION_NODERIVS_LINK = "";
    public static final String ATTRIBUTION_NONCOMMERCIAL_LINK = "";
    public static final String ATTRIBUTION_NONCOMMERCIAL_SHAREALIKE_LINK = "";
    public static final String ATTRIBUTION_NONCOMMERCIAL_NODERIVS_LINK = "";
    public static final String IMAGE_TYPE_PNG = "png";
    public static final String IMAGE_TYPE_JPEG = "jpeg";
}

by byebyebyezzz at February 21, 2015 04:55 AM

February 20, 2015

Jordan Theriault

Scheme for adding a valid license to an image using Adobe XMP Toolkit

In continuation of my previous post, I’ve created a sample which stores the licenses, uses the XMP Toolkit to create a valid XMP packet, and applies it to a file. The real take-away is the library function, which allows easy creation of XMP packets and can be used now in C++ programs. However, the XMP Toolkit is large: it still needs its dense and irrelevant code whittled away, or it needs to be integrated into wrappers like the Android XMP kit that my fellow team members are working on. The following code is adapted from code released by my colleague Gideon Thomas.

I am using an XMP metadata packet instance and enums for the function parameters. Using enums allows coders to easily reference a CC license without the use of “magic numbers”. These enums are used to access a map which contains instances of license objects, which in turn contain the license information.

AddLicense(MyXMPPacket, CC_ATTRIBUTION) is the function call, where MyXMPPacket is an XMP packet that is either new or contains a previous license, and CC_ATTRIBUTION is an enum value that could be any of the 6 Creative Commons licenses.
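The enum-to-license lookup described above can be sketched in plain C as an enum that indexes a constant table. This is not the released C++ code: only CC_ATTRIBUTION appears in the post, and the other enum names, the struct, the function name, and the choice of the 4.0 license URLs are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* CC_ATTRIBUTION comes from the post; the other names are hypothetical. */
enum cc_license {
    CC_ATTRIBUTION,
    CC_ATTRIBUTION_SHAREALIKE,
    CC_ATTRIBUTION_NODERIVS,
    CC_ATTRIBUTION_NONCOMMERCIAL,
    CC_ATTRIBUTION_NONCOMMERCIAL_SHAREALIKE,
    CC_ATTRIBUTION_NONCOMMERCIAL_NODERIVS,
    CC_LICENSE_COUNT
};

struct license_info {
    const char *name;
    const char *uri;   /* the value that would go into WebStatement */
};

/* The table is indexed by the enum, so callers never see magic numbers.
 * The 4.0 URLs are an assumption; any license version could be used. */
static const struct license_info LICENSES[CC_LICENSE_COUNT] = {
    [CC_ATTRIBUTION] =
        {"Attribution", "https://creativecommons.org/licenses/by/4.0/"},
    [CC_ATTRIBUTION_SHAREALIKE] =
        {"Attribution-ShareAlike", "https://creativecommons.org/licenses/by-sa/4.0/"},
    [CC_ATTRIBUTION_NODERIVS] =
        {"Attribution-NoDerivs", "https://creativecommons.org/licenses/by-nd/4.0/"},
    [CC_ATTRIBUTION_NONCOMMERCIAL] =
        {"Attribution-NonCommercial", "https://creativecommons.org/licenses/by-nc/4.0/"},
    [CC_ATTRIBUTION_NONCOMMERCIAL_SHAREALIKE] =
        {"Attribution-NonCommercial-ShareAlike", "https://creativecommons.org/licenses/by-nc-sa/4.0/"},
    [CC_ATTRIBUTION_NONCOMMERCIAL_NODERIVS] =
        {"Attribution-NonCommercial-NoDerivs", "https://creativecommons.org/licenses/by-nc-nd/4.0/"},
};

/* Returns the table entry for the given license id. */
const struct license_info *license_lookup(enum cc_license id)
{
    assert((int)id >= 0 && (int)id < CC_LICENSE_COUNT);
    return &LICENSES[id];
}
```

A caller would pass license_lookup(CC_ATTRIBUTION)->uri as the WebStatement value when building the packet.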

In order to determine how to add the specific properties using the API, I referenced this list, as well as this.

Snippets from the code

Note: These snippets are condensed; for proper usage, refer to the files.

The code which adds a license to correct properties is:
“URI” is where the license URI is placed as a string.
“Description” is where description for Usage Terms would be placed.
meta->SetProperty(kXMP_NS_XMP_Rights, "Marked", "True", 0);
meta->SetProperty(kXMP_NS_XMP_Rights, "WebStatement", "URI", 0);
meta->SetProperty(kXMP_NS_XMP_Rights, "UsageTerms", "Description", 0);

kXMP_NS_XMP references the namespace “” which is defined in XMP_Toolkit/public/include/XMP_Const.h (a great resource for those working with the XMP Toolkit for any reason).

To affix the code to the file:
myFile.OpenFile(filename, kXMP_UnknownFile, opts);

This release contains the new license class and the library code, together with the test code. In order to compile it, you will also need the aforementioned Adobe XMP library, repaired for OS X compilation by my colleague Gideon Thomas.

Screenshot 2015-02-20 17.28.37

1. C++ Files
2. Image with license added

by JordanTheriault at February 20, 2015 11:02 PM

Hosung Hwang

[SPO600] Platform Specific Code : Array/Pointer Access Analysis

In the last posting, I compared the time taken by big[i] = i; and *p++ = i;. The interesting point was that the gap between the two methods on AArch64 was much smaller than the gap on X86. In this posting, I look at the assembly code for each.

Array element calculation

    for(int i = 0; i < ARR_SIZE; i++){
        big[i] = i;
    }

This code is translated to the following assembly code on AArch64:

        ldrsw   x2, [x29,60]
        adrp    x0, big
        add     x0, x0, :lo12:big
        ldrsw   x1, [x29,60]
        str     x2, [x0,x1,lsl 3]
        ldr     w0, [x29,60]
        add     w0, w0, 1
        str     w0, [x29,60]

The big[i] = i; part is the instruction str x2, [x0,x1,lsl 3].
“lsl” is “logical shift left”; here it means << 3.
For example, binary 00000001 (1) becomes 00001000 (8).
big[] is an array of long, and a long is 8 bytes on this machine, so shifting the index left by 3 multiplies it by the element size, which makes sense.
I am not sure how the rest of the code works, but it seems clear that a shift operation is used to calculate the array element position.

Whereas, in X86 64 :

        movl    -4(%rbp), %eax
        movslq  %eax, %rdx
        movl    -4(%rbp), %eax
        movq    %rdx, big(,%rax,8)
        addl    $1, -4(%rbp)

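The big[i] = i; part is the instruction movq %rdx, big(,%rax,8).
According to googling, big(,%rax,8) means something like M[0x402680 + (8 * RAX)], which means it uses multiplication to calculate the position.

It is generally said that a shift operation is faster than multiplication at the CPU level, so I guess the speed difference between AArch64 and X86 comes from this. It is only a guess, though: the x86 scale factor can only be 1, 2, 4, or 8 and is applied as part of the address calculation rather than by a separate multiply instruction.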
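Both addressing forms compute the same address: the base of the array plus the index times the 8-byte element size. The sketch below spells out the two calculations side by side; it uses int64_t so that the element size is 8 bytes on every platform (big in the post is an array of long), and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of element i computed with a shift, like AArch64's
 * [x0,x1,lsl 3] addressing mode. */
uintptr_t element_address_shift(const int64_t *base, size_t i)
{
    assert(base != NULL);
    return (uintptr_t)base + ((uintptr_t)i << 3);
}

/* Address of element i computed with a scale factor of 8, like
 * x86's big(,%rax,8) addressing mode. */
uintptr_t element_address_scale(const int64_t *base, size_t i)
{
    assert(base != NULL);
    return (uintptr_t)base + (uintptr_t)i * sizeof(int64_t);
}
```

Since multiplying by 8 and shifting left by 3 are the same operation on an unsigned integer, the two functions always agree; any timing difference between the machines would have to be measured rather than read off the notation.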

Pointer enumeration

    for(int i = 0; i < ARR_SIZE; i++){
        *p++ = i;
    }

This piece of code is translated to the following assembly code on AArch64:

        ldr     x0, [x29,48]
        add     x1, x0, 8
        str     x1, [x29,48]
        ldrsw   x1, [x29,44]
        str     x1, [x0]
        ldr     w0, [x29,44]
        add     w0, w0, 1
        str     w0, [x29,44]

And in X86 :

        movq    -16(%rbp), %rax
        leaq    8(%rax), %rdx
        movq    %rdx, -16(%rbp)
        movl    -20(%rbp), %edx
        movslq  %edx, %rdx
        movq    %rdx, (%rax)
        addl    $1, -20(%rbp)

On both machines there was no significant difference: most of the instructions were moves and adds.


  • Code like *pointer++ = i generates the same kind of assembly code on both AArch64 and X86.
  • Code like array[i++] = j is generated using a shift operation on AArch64 and a scaled (multiplied) index on X86. This may be the cause of the speed difference.


Tips for compiling/gdb for assembly

  • Generating assembly code : $ g++ -S -o ar2.a array2.c
  • Building for debugging : g++ -g -o ar2 array2.c
  • Assembly mode in gdb : (gdb)layout asm
  • View registers : (gdb)info registers
  • View a register : (gdb)i r x1

by Hosung at February 20, 2015 09:37 PM

February 19, 2015

Maxwell LeFevre

FPU Rounding in Different Architectures (Presentation)

My chosen topic is FPU Rounding in different architectures. There are a few different rounding modes according to the gnu web page. They include round to nearest, round toward plus infinity, round toward minus infinity, and round toward zero. Before I explain them I am going to talk about how floating point calculations work.

Precision and Rounding

There are two methods for calculating a floating point number in the x86 architecture. The first is extended precision mode, which keeps intermediate results in a wider 80-bit format before rounding them down to fit the memory allocated for the variable, for example a float (single precision: 32 bits, with 1 sign bit, 8 exponent bits, and a 23-bit significand), read more here. The other is double precision mode, which stores the value in 64 bits (1 sign bit, 11 exponent bits, 52-bit significand), read more here. Single and double precision numbers are declared in C as float and double, respectively. I used part of the value of pi to demonstrate the difference between single and double precision.


#include <stdio.h>
int main() {
  float  piFloat  = 3.14159265358979323846;
  double piDouble = 3.14159265358979323846;

  printf("\n\\****   Double vs Float   ****\\\n");
  printf("Pi as entered:  3.14159265358979323846\n");
  printf("Pi in a double: %.18f\n", piDouble);
  printf("Pi in a float:  %.10f\n", piFloat);

  return 0;
}


\****   Double vs Float   ****\
Pi as entered:  3.14159265358979323846
Pi in a double: 3.141592653589793116  <-- 15 decimal places (116 is garbage)
Pi in a float:  3.1415927410 <--  7 decimal places (410 is garbage)
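The single-precision layout described above (1 sign bit, 8 exponent bits, 23 significand bits) can be inspected directly by copying a float's bytes into a 32-bit integer. This is a small sketch that assumes float is the 32-bit IEEE 754 format, which is the case on both x86 and ARM; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision number. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits of the significand */
};

/* Splits a float into its sign, exponent and fraction bits. */
struct float_fields decompose_float(float value)
{
    uint32_t bits;
    struct float_fields f;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);  /* reinterpret without aliasing issues */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xffu;
    f.fraction = bits & 0x7fffffu;
    return f;
}
```

For pi stored in a float, this gives sign 0, exponent 128 (that is, 2 to the power 1 after removing the bias), and fraction 0x490FDB. Only those 23 fraction bits are kept, which is why the digits after about the seventh decimal place in the output above are not meaningful.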

The operating system declares which of these two methods is used by default. Most operating systems use double precision, but GNU/Linux uses extended precision. Extended precision stores the number in 80 bits (1 sign bit, 15 exponent bits, 64-bit significand) while calculations are made and then rounds it down to fit in the specified type (float or double). The mode can be overridden, though, by specifying it in your program (more on this further down).

Round to nearest: Normal rounding
Round toward plus infinity: Always round to the next higher number (ceil)
Round toward minus infinity: Always round to the next lowest number (floor)
Round toward zero: Negative rounds up, positive rounds down.

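ARM processors use ’round to nearest’ for FPU rounding. I was unable to find the x86 default documented while preparing this, but the x87 control word and the SSE MXCSR register both start out in round to nearest as well.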

From :

Traditional x87 FPU

The characteristics of the x87 FPU are:

  • Different instructions for loading and storing different types of floating point numbers from and to memory.
  • A single instruction set for performing calculations.
  • An internal precision with which all calculations are done.
  • Instructions to change the internal precision
  • Support for single, double and double-extended precision.

MMX/SSE instructions

The characteristics of the MMX/SSE instructions are:

  • Different instructions for loading and storing different types of floating point numbers from and to memory.
  • Different instructions for performing calculations depending on the data types in the registers.
  • Calculation precision is dependent on operand types.
  • Supports only single and double precision
  • Support for parallel computation.


According to the Documentation (see: Book I: PowerPC User Instruction Set Architecture) and some tests done by Scott MacVicar, the following holds for PPC:

  • Different instructions for loading and storing different types of floating point numbers from and to memory.
  • A single instruction set for performing calculations.
  • All calculations are done in double precision.
  • Supports single and double precision, but not double-extended precision
  • At least the Power4 processor (integer-64bit PPC) also supports quad precision



Johannes Schlüter did some tests on UltraSPARC IIIi platform which yielded the following results:

  • Different instructions for loading and storing different types of floating point numbers from and to memory.
  • Different instructions for performing calculations depending on the data types in the registers.
  • Calculation precision is dependent on operand types.
  • Supports single, double and quad precision

Control Rounding in C

Controlling how rounding happens in C/C++ is fairly simple. It involves using the fenv.h library. The example below is from this question on

#include <fenv.h>

const int originalRounding = fegetround( );  // store the original rounding mode
fesetround(FE_TOWARDZERO);  // establish the desired rounding mode
// do whatever you need to do ...
fesetround(originalRounding);   // ... and restore the original mode afterwards

rounding mode    C name
to nearest       FE_TONEAREST   
toward zero      FE_TOWARDZERO  
to +infinity     FE_UPWARD      
to -infinity     FE_DOWNWARD    


For this presentation we were supposed to find examples of code that doesn’t work, and then explain why it doesn’t work and how to fix it. I was unable to come up with or find a current example of code that actually shows different results on the two different systems. Many of the cases people thought were rounding errors were just the result of using the wrong-sized variable and printing garbage after their number. There were also a number of outdated examples that have since been fixed. There was a lot more information available about x86 than ARM. One important thing I found was that if you absolutely need consistency you should use double precision, because it is specified in the IEEE 754 standard and so should be the same across all compliant hardware/software setups.


About rounding modes:
Precision modes:
FPU information on different architectures:
Slideshow from presentation: FPURounding
Fenv man page:
Fenv.h example:
IEEE standard for FPU:

by maxwelllefevre at February 19, 2015 04:00 PM

February 17, 2015

Alfred Tsang

Summary of Presentation

The presentation that I mentioned covered how different architectures (e.g. x86 and iSeries) manipulate data types in different ways, and how this can affect the result of a program.  It also covered the consequences that dividing two integers can have if the architecture is not taken into consideration.

Code in any programming language that divides two integers relies on the architecture it runs on to do so correctly.  Each architecture is different, so the same program run on two computers with two different architectures may crash on one and continue on the other.

by kaputsky263 at February 17, 2015 09:51 PM

Hosung Hwang

[SPO600] Platform Specific Code : Array Access Optimization

In the last posting, I examined how arrays are arranged in memory. In this posting, I test how speed depends on the array access method.

Sample Code

I wrote simple test code :

#include <stdio.h>
#include <time.h>

#define ARR_SIZE 400000000

long big[ARR_SIZE];

int main()
{
    clock_t start, end;   /* clock() returns clock_t, not time_t */

    start = clock();
    for(int i = 0; i < ARR_SIZE; i++){
        big[i] = i;
    }
    end = clock();
    printf("Delay time : %f\n", (double)(end - start)/CLOCKS_PER_SEC);

    start = clock();
    long *p = big;
    for(int i = 0; i < ARR_SIZE; i++){
        *p++ = i;
    }
    end = clock();
    printf("Delay time : %f\n", (double)(end - start)/CLOCKS_PER_SEC);

    return 0;
}

The source code creates an array of 400 million longs; a long on these 64-bit machines is 8 bytes.
The first loop sets the array from big[0] to big[399999999] in order. The second loop does the same thing using pointer increment.

Result in AArch64 and X86

I ran the test 10 times on both machines. The following are the results (elapsed time in seconds):

ARM64 Array   ARM64 Pointer   X86 Array   X86 Pointer
2.540624      1.835648        3.344561    1.466308
2.541102      1.835669        3.383403    1.470134
2.541190      1.835670        3.323564    1.490613
2.541096      1.835970        3.339740    1.500834
2.541325      1.837497        3.362360    1.315499
2.540733      1.835697        3.400443    1.483274
2.541273      1.835647        3.382420    1.490314
2.541166      1.835666        3.286187    1.285057
2.540958      1.835688        3.274380    1.490684
2.541311      1.835644        3.331944    1.455013

When I draw a graph :

AArch64 :

X86 :


  • In this case, pointer access is faster than array access.
  • On the AArch64 machine, pointer access was around 38% faster than array access (mean 2.54 s vs 1.84 s).
  • On the X86 machine, pointer access was more than 100% faster than array access (mean 3.34 s vs 1.44 s).
  • Interestingly, the AArch64 machine’s speed gap was smaller than the X86 machine’s: its pointer access was slower than X86 and its array access was faster than X86.
  • In this case, changing from array access to pointer access can be an optimization.
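The percentages can be reproduced from the table by averaging each column and comparing the means. The two helpers below are a sketch of that arithmetic; their names are invented, and "faster" is taken here to mean the ratio of the slow time to the fast time, minus one.

```c
#include <assert.h>

/* Arithmetic mean of n timing samples. */
double mean(const double *samples, int n)
{
    assert(samples != 0 && n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

/* Percentage by which the faster loop outpaces the slower one,
 * given the elapsed seconds of each: 100 * (slow / fast - 1). */
double speedup_percent(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return 100.0 * (slow_seconds / fast_seconds - 1.0);
}
```

The column means are roughly 2.5411 s and 1.8359 s on AArch64 and 3.3429 s and 1.4448 s on X86, which gives a speedup of about 38% on AArch64 and about 131% on X86.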

Presentation for this topic

by Hosung at February 17, 2015 12:27 AM

February 16, 2015

Hosung Hwang

[SPO600] Platform Specific Code : Layout of arrays in memory

In this post, I will test what arrays look like in memory. 1-dimensional, 2-dimensional, and 3-dimensional arrays will be tested, with both char and int element types.
The test is performed on both an x86 machine and an AArch64 machine.

Sample code

I wrote simple C test code:

#include <stdio.h>

int main()
{
    printf("SIZE : char(%d), int(%d), long(%d)\n", (int)sizeof(char), (int)sizeof(int), (int)sizeof(long));

    {
        char x[10] = {65, 66, 67, 68, 69, 70, 71, 72, 73, 74}; //A ~ J
        char y[2][5] = {{65, 66, 67, 68, 69}, {70, 71, 72, 73, 74}};
        char z[2][2][3] = {{{65, 66, 67}, {68, 69, 70}}, {{71, 72, 73}, {74, 75, 76}}};

        // Note: %x expects an unsigned int, so only the low 32 bits of each
        // address appear in the output; %p with a (void *) cast is the portable form.
        printf("=== 1-D char x[10] ===\n");
        for (int i = 0; i < 10; i++)
            printf("0x%x : x[%d] : 0x%x\n", &(x[i]), i, x[i]);

        printf("=== 2-D char y[2][5] ===\n");
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 5; j++)
                printf("0x%x : y[%d][%d] : 0x%x\n", &(y[i][j]), i, j, y[i][j]);

        printf("=== 3-D char z[2][2][3] ===\n");
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 3; k++)
                    printf("0x%x : z[%d][%d][%d]  0x%x\n", &(z[i][j][k]), i, j, k, z[i][j][k]);
    }
    {
        int x[10] = {0,1,2,3,4,5,6,7,8,9};
        int y[2][5] = {{0,1,2,3,4},{5,6,7,8,9}};

        printf("=== 1-D int x[10] ===\n");
        for (int i = 0; i < 10; i++)
            printf("0x%x : 0x%08x\n", &(x[i]), x[i]);

        printf("=== 2-D int y[2][5] ===\n");
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 5; j++)
                printf("0x%x : 0x%08x\n", &(y[i][j]), y[i][j]);
    }
    return 0;
}

There are three char arrays: 1-dimensional, 2-dimensional, and 3-dimensional. Looking at the printed addresses, we can check whether the elements are contiguous even in the multi-dimensional arrays.
I also tested two int arrays to see arrays of 4-byte data.
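C stores multi-dimensional arrays in row-major order, so the position of z[i][j][k] inside z[2][2][3] can be predicted before looking at the addresses. The helper below is a sketch of that index arithmetic; the name row_major_offset is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major element offset inside an array declared as T a[d0][d1][d2].
 * The first dimension d0 is not needed for the computation. */
size_t row_major_offset(size_t d1, size_t d2, size_t i, size_t j, size_t k)
{
    assert(j < d1 && k < d2);
    return (i * d1 + j) * d2 + k;
}
```

For char z[2][2][3], z[1][0][2] is at offset (1 * 2 + 0) * 3 + 2 = 8 and z[1][1][2] is at offset 11, so they should sit 8 and 11 bytes after z[0][0][0]. For an int array, the byte distance is the offset multiplied by sizeof(int).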

Debugging and Execution result of AArch64 machine

The following is the binary data in memory while debugging.

(gdb) x/10bt x
0x3fffffff310:  01000001    01000010    01000011    01000100    01000101    01000110    01000111    01001000
0x3fffffff318:  01001001    01001010
(gdb) x/10bt y
0x3fffffff300:  01000001    01000010    01000011    01000100    01000101    01000110    01000111    01001000
0x3fffffff308:  01001001    01001010
(gdb) x/12bt z
0x3fffffff2f0:  01000001    01000010    01000011    01000100    01000101    01000110    01000111    01001000
0x3fffffff2f8:  01001001    01001010    01001011    01001100

(gdb) x/10w x
0x3fffffff2a0:  00000000000000000000000000000000    00000000000000000000000000000001    00000000000000000000000000000010    00000000000000000000000000000011
0x3fffffff2b0:  00000000000000000000000000000100    00000000000000000000000000000101    00000000000000000000000000000110    00000000000000000000000000000111
0x3fffffff2c0:  00000000000000000000000000001000    00000000000000000000000000001001
(gdb) x/10w y
0x3fffffff2c8:  00000000000000000000000000000000    00000000000000000000000000000001    00000000000000000000000000000010    00000000000000000000000000000011
0x3fffffff2d8:  00000000000000000000000000000100    00000000000000000000000000000101    00000000000000000000000000000110    00000000000000000000000000000111
0x3fffffff2e8:  00000000000000000000000000001000    00000000000000000000000000001001

The following is the printed output:

SIZE : char(1), int(4), long(8)
=== 1-D char x[10] ===
0xfffff310 : x[0] : 0x41
0xfffff311 : x[1] : 0x42
0xfffff312 : x[2] : 0x43
0xfffff313 : x[3] : 0x44
0xfffff314 : x[4] : 0x45
0xfffff315 : x[5] : 0x46
0xfffff316 : x[6] : 0x47
0xfffff317 : x[7] : 0x48
0xfffff318 : x[8] : 0x49
0xfffff319 : x[9] : 0x4a
=== 2-D char y[2][5] ===
0xfffff300 : y[0][0] : 0x41
0xfffff301 : y[0][1] : 0x42
0xfffff302 : y[0][2] : 0x43
0xfffff303 : y[0][3] : 0x44
0xfffff304 : y[0][4] : 0x45
0xfffff305 : y[1][0] : 0x46
0xfffff306 : y[1][1] : 0x47
0xfffff307 : y[1][2] : 0x48
0xfffff308 : y[1][3] : 0x49
0xfffff309 : y[1][4] : 0x4a
=== 3-D char z[2][2][3] ===
0xfffff2f0 : z[0][0][0]  0x41
0xfffff2f1 : z[0][0][1]  0x42
0xfffff2f2 : z[0][0][2]  0x43
0xfffff2f3 : z[0][1][0]  0x44
0xfffff2f4 : z[0][1][1]  0x45
0xfffff2f5 : z[0][1][2]  0x46
0xfffff2f6 : z[1][0][0]  0x47
0xfffff2f7 : z[1][0][1]  0x48
0xfffff2f8 : z[1][0][2]  0x49
0xfffff2f9 : z[1][1][0]  0x4a
0xfffff2fa : z[1][1][1]  0x4b
0xfffff2fb : z[1][1][2]  0x4c
=== 1-D int x[10] ===
0xfffff2a0 : 0x00000000
0xfffff2a4 : 0x00000001
0xfffff2a8 : 0x00000002
0xfffff2ac : 0x00000003
0xfffff2b0 : 0x00000004
0xfffff2b4 : 0x00000005
0xfffff2b8 : 0x00000006
0xfffff2bc : 0x00000007
0xfffff2c0 : 0x00000008
0xfffff2c4 : 0x00000009
=== 2-D int y[2][5] ===
0xfffff2c8 : 0x00000000
0xfffff2cc : 0x00000001
0xfffff2d0 : 0x00000002
0xfffff2d4 : 0x00000003
0xfffff2d8 : 0x00000004
0xfffff2dc : 0x00000005
0xfffff2e0 : 0x00000006
0xfffff2e4 : 0x00000007
0xfffff2e8 : 0x00000008
0xfffff2ec : 0x00000009

Debugging and Execution result of X86 machine :

Binary data while debugging :

(gdb) x/10bt x
0x7fffffffe440: 01000001    01000010    01000011    01000100    01000101    01000110    01000111    01001000
0x7fffffffe448: 01001001    01001010
(gdb) x/10bt y
0x7fffffffe430: 01000001    01000010    01000011    01000100    01000101    01000110    01000111    01001000
0x7fffffffe438: 01001001    01001010
(gdb) x/12bt z
0x7fffffffe420: 01000001    01000010    01000011    01000100    01000101    01000110    01000111    01001000
0x7fffffffe428: 01001001    01001010    01001011    01001100

(gdb) x/10wt x
0x7fffffffe3c0: 00000000000000000000000000000000    00000000000000000000000000000001    00000000000000000000000000000010    00000000000000000000000000000011
0x7fffffffe3d0: 00000000000000000000000000000100    00000000000000000000000000000101    00000000000000000000000000000110    00000000000000000000000000000111
0x7fffffffe3e0: 00000000000000000000000000001000    00000000000000000000000000001001
(gdb) x/10wt y
0x7fffffffe3f0: 00000000000000000000000000000000    00000000000000000000000000000001    00000000000000000000000000000010    00000000000000000000000000000011
0x7fffffffe400: 00000000000000000000000000000100    00000000000000000000000000000101    00000000000000000000000000000110    00000000000000000000000000000111
0x7fffffffe410: 00000000000000000000000000001000    00000000000000000000000000001001

The following is the printed output:

SIZE : char(1), int(4), long(8)
=== 1-D char x[10] ===
0xffffe440 : x[0] : 0x41
0xffffe441 : x[1] : 0x42
0xffffe442 : x[2] : 0x43
0xffffe443 : x[3] : 0x44
0xffffe444 : x[4] : 0x45
0xffffe445 : x[5] : 0x46
0xffffe446 : x[6] : 0x47
0xffffe447 : x[7] : 0x48
0xffffe448 : x[8] : 0x49
0xffffe449 : x[9] : 0x4a
=== 2-D char y[2][5] ===
0xffffe430 : y[0][0] : 0x41
0xffffe431 : y[0][1] : 0x42
0xffffe432 : y[0][2] : 0x43
0xffffe433 : y[0][3] : 0x44
0xffffe434 : y[0][4] : 0x45
0xffffe435 : y[1][0] : 0x46
0xffffe436 : y[1][1] : 0x47
0xffffe437 : y[1][2] : 0x48
0xffffe438 : y[1][3] : 0x49
0xffffe439 : y[1][4] : 0x4a
=== 3-D char z[2][2][3] ===
0xffffe420 : z[0][0][0]  0x41
0xffffe421 : z[0][0][1]  0x42
0xffffe422 : z[0][0][2]  0x43
0xffffe423 : z[0][1][0]  0x44
0xffffe424 : z[0][1][1]  0x45
0xffffe425 : z[0][1][2]  0x46
0xffffe426 : z[1][0][0]  0x47
0xffffe427 : z[1][0][1]  0x48
0xffffe428 : z[1][0][2]  0x49
0xffffe429 : z[1][1][0]  0x4a
0xffffe42a : z[1][1][1]  0x4b
0xffffe42b : z[1][1][2]  0x4c
=== 1-D int x[10] ===
0xffffe3c0 : 0x00000000
0xffffe3c4 : 0x00000001
0xffffe3c8 : 0x00000002
0xffffe3cc : 0x00000003
0xffffe3d0 : 0x00000004
0xffffe3d4 : 0x00000005
0xffffe3d8 : 0x00000006
0xffffe3dc : 0x00000007
0xffffe3e0 : 0x00000008
0xffffe3e4 : 0x00000009
=== 2-D int y[2][5] ===
0xffffe3f0 : 0x00000000
0xffffe3f4 : 0x00000001
0xffffe3f8 : 0x00000002
0xffffe3fc : 0x00000003
0xffffe400 : 0x00000004
0xffffe404 : 0x00000005
0xffffe408 : 0x00000006
0xffffe40c : 0x00000007
0xffffe410 : 0x00000008
0xffffe414 : 0x00000009


I looked at memory while debugging to see whether there was any difference between the AArch64 and x86 architectures. There was none. Note that gdb shows the full 64-bit addresses (for example 0x3fffffff310), while the program output shows only the low 32 bits (0xfffff310) because the addresses were printed with the %x format.

The results show:
* the layout of a char array is a contiguous block of bytes
* 1-D, 2-D, and 3-D arrays are all laid out the same way (contiguously)
* the dimensions of an array are a logical way of accessing memory, not a physical layout
* an int array is a contiguous block of 4-byte elements
* in terms of array layout, there is no difference between the AArch64 and x86 architectures
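
Because the layout is contiguous, the position of an element can be computed from its indices alone. The sketch below is a minimal illustration in C, assuming the row-major order that C uses for arrays. The helper names flat_index_2d and flat_index_3d are hypothetical and are not part of the original test program.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in elements) of a[i][j] from the start of an array declared as
   T a[rows][cols]. Row-major order: a whole row is stored before the next. */
size_t flat_index_2d(size_t i, size_t j, size_t cols) {
    assert(j < cols);
    return i * cols + j;
}

/* Offset (in elements) of a[i][j][k] for an array declared as T a[d1][d2][d3]. */
size_t flat_index_3d(size_t i, size_t j, size_t k, size_t d2, size_t d3) {
    assert(j < d2 && k < d3);
    return (i * d2 + j) * d3 + k;
}
```

For char z[2][2][3], flat_index_3d(1, 0, 2, 2, 3) gives 8, which agrees with z[1][0][2] sitting 8 bytes after z[0][0][0] in the output above (0xfffff2f8 versus 0xfffff2f0). For an int array, the byte offset is the element offset multiplied by sizeof(int).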

by Hosung at February 16, 2015 11:27 PM

Artem Luzyanin

Assumptions about the system which can vary between architectures: Default size of variable types

One of the ways computer architectures differ from each other is in the size of certain variable types. For example, this article discusses the difference in variable sizes between 16-, 32-, and 64-bit UNIX platforms. The 16-bit architectures used 16-bit integers and pointers. They used the so-called “IP16” data model, where “I” stands for “integer”, “P” stands for “pointer”, and “16” is their size in bits. When 32-bit architectures arrived, they had 32-bit integers and pointers but kept a 16-bit integer type, calling it “short”. The data model in use was now “ILP32”, where the added “L” stands for “long”. Years later, 64-bit architectures appeared. One might expect that they would simply increase the default size of an integer, a long, and a pointer, and give the 32-bit counterparts new names. Instead, the designers came up with new models that offered them more benefits. Their choices were limited, though, and the reason for that is the same reason why differences in the default sizes of variables are a problem in general.
It is very hard to introduce a new data type into the C language. To make the integer a 64-bit data type, they would have to either change the existing data types, along with all the code that relies on a specific data type size, or add new types and change the code to use the correct ones. Either solution requires immense changes. Consider the following simple piece of code: “int x = sizeof(int);”. The value of x will differ depending on the data model used. Hence, differing default sizes have a large impact on any code that treats them as fixed values. The code above might seem useless (why not just write “int x = 4”?), but using the data type size is sometimes necessary, as in “malloc(sizeof(int))”.
Another problem arises when you use bit shift operations on a data type. For example, suppose you encrypt a number and expect “x << y” to produce a certain result, and the encryption was done on a system with the ILP32 model while the decryption was done on a system with the “IP16” model. The number of bits the two systems process will differ, and as a result the message will not be decrypted properly.
Last but not least, the size determines how much information one variable can store. The largest number a signed 16-bit integer can store is 32,767 (65,535 if unsigned), while a signed 32-bit integer can store up to 2,147,483,647. So a line like “int x = 100000;” will not work as intended on a 16-bit system, where the value does not fit, while it works fine on a 32-bit one.
Now back to the history lesson. While the data type sizes were stable on 16-bit systems, on 32-bit and 64-bit systems they became a mess. Since the C language adheres to this rule: “sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) = sizeof(size_t)” (where size_t is the largest unsigned supported size), there are many possible implementations of the data types on the same architecture. Consider this table of data types and their sizes in bits:
Data type (bits)  | LP32 | ILP32 | ILP64 | LLP64 | LP64
char              | 8    | 8     | 8     | 8     | 8
short             | 16   | 16    | 16    | 16    | 16
int32             |      |       | 32    |       |
int               | 16   | 32    | 64    | 32    | 32
long              | 32   | 32    | 64    | 32    | 64
long long (int64) |      |       |       | 64    |
pointer           | 32   | 32    | 64    | 64    | 64
All of these models have been implemented, and, as you can guess, having three separate data models implemented on the same 64-bit architecture creates problems. For example, code that relies on int, long, and pointer all being the same size will not work properly on, say, the LLP64 data model.
Now that the problem is defined, what could be a solution? The most straightforward answer: don’t rely on the sizes of data types, and assume that they will differ. If your code needs “sizeof(int)” and later uses it as a number, use a switch statement to check for the possible values. If you need to store a value that might not fit on an older architecture, it might be better to use a data type whose size has stayed the same: double. It is 8 bytes (64 bits) on virtually all current platforms, so the value will stay the same. Generally speaking, technology advances quickly and new architectures are likely to emerge, so expecting data types to be a certain size can make your code non-portable. Don’t rely on assumptions, and try to use data types and code structures that do not change over time.
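
One way to act on this advice in C is to check ranges explicitly and to use the fixed-width types from <stdint.h>, which the post itself does not mention. The following is a minimal sketch under that assumption; the function names fits_in_int, to_int_checked, and sample_value are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True when value can be stored in a plain int under the current data model. */
bool fits_in_int(long long value) {
    return value >= INT_MIN && value <= INT_MAX;
}

/* Converts to int, stopping the program if the value would not fit. */
int to_int_checked(long long value) {
    assert(fits_in_int(value));
    return (int)value;
}

/* int32_t is exactly 32 bits on every platform that provides it,
   so 100000 fits regardless of how wide plain int is. */
int32_t sample_value(void) {
    return INT32_C(100000);
}
```

On a data model with a 16-bit int, fits_in_int(100000) would return false, while sample_value() still holds the full value because its type does not depend on the width of int.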

Special thank you to this article for providing most of my blog information:

by lisyonok85 at February 16, 2015 05:42 PM

February 15, 2015

Chris Tyler (ctyler)

Initial Current and Temperatures on the HiKey from 96Boards

I was fortunate to receive an early access HiKey board from the 96Boards project at Linaro Connect last week.

This board is powered by an 8-core, 64-bit Cortex-A53 ARMv8-A Kirin 620 SOC from HiSilicon with 1GB of LPDDR3 RAM, a Mali 450MP4 GPU, dual USB, eMMC and micro-SD storage, 802.11g/n, and high- and low-speed expansion connectors with I2C, SPI, DSI, GPIO, and USB interfaces.

So far, this has been an incredible board to work with, despite some teething pains with the pre-release/early access software and documentation (and a few minor quibbles with the design decisions behind the 96Boards Consumer Edition spec and this first board). It's not in the same performance class as the ARMv8 server systems that we have in the EHL at Seneca, but it's a very impressive board for doing ARMv8 porting and optimization work -- which is its intended purpose, along with providing a great board for hacker and maker communities.

I experimented with the board last week and took some readings at home today, and thought I'd share some of my findings on board current draw and temperatures, because it may be useful to those planning alternate power supplies and considering temperatures and airflows for cases:

  • Current consumption: The board draws ~120 mA at idle (Linux login prompt) with nothing connected, and about 150-155 mA with a basic USB fast ethernet adapter connected. With ethernet attached and 8 cores doing busy-work (compressing /dev/urandom to /dev/null), current consumption rises to just over 300 mA (297-320). All of these readings are at 12+/-0.25 vdc, so that's under 4W including the USB ethernet. Note that the GPU was basically idle during these tests.
  • Temperature: In a room with an ambient temperature of ~21C, with all 8 cores doing busy work (8 processes gzipping /dev/urandom to /dev/null, and top reporting 0.0% idle), the temperature on the SOC heatsink rose fairly quickly to ~48C, and eventually reached 52C, measured using an infrared temperature reader (accuracy of +/- <2C).

A couple of other random observations about the board:

  • The board mounting holes accommodate M2.5 screws. Basic hardware stores, including Home Depot (at least in Canada), do not carry M2.5 screws, so I've been thwarted in my efforts to mount this onto an acrylic plate so far (cases will eventually follow, I'm sure, but I always prefer to have boards on/in something and not sitting directly on my desk). I'm sticking some silicone feet on the bottom as an interim measure.
  • There is a "USERDATA" partition on the eMMC which is not used by the initial software image. Be sure to format and mount that partition to gain an additional 1.5 GB of space if you're running from eMMC.

I'm looking forward to the release of WiFi drivers and UEFI bootloader support soon, as promised by the 96Boards project.

More notes to follow...

by Chris Tyler at February 15, 2015 02:39 AM

February 14, 2015

Yan Song

Wanna Be a Software Reverse Engineer, Hu

For the Compiled C Lab, we’re expected to get started with the Intel architecture and assembly language, using the classic C program:

/* hello.c */
#include <stdio.h>

int main(void) {
    printf("hello, world\n");
    return 0;
}

For the purpose of anti-argument, the above file was compiled on australia like so:

gcc -g -O0 -fno-builtin -o hello hello.c

Now it’s our prof’s turn:

Using objdump, find the answers to these questions: (i) Which section contains the code you wrote? (ii) Which section contains the string to be printed?

Of course, our main function must be in the .text section. This can be verified by the following command (on australia):

objdump -d -j .text hello

The most relevant part of the output

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       bf e0 05 40 00          mov    $0x4005e0,%edi
  40053f:       b8 00 00 00 00          mov    $0x0,%eax
  400544:       e8 c7 fe ff ff          callq  400410 <printf@plt>
  400549:       b8 00 00 00 00          mov    $0x0,%eax
  40054e:       5d                      pop    %rbp
  40054f:       c3                      retq

tells us that the first instruction of (this ELF version of) our main starts at the address 0x400536. Also note that the mov instruction on line 4 is loading the address 0x4005e0 into register EDI. Recall from C that what gets passed into printf() is the address of the first character of the string "hello, world\n".

And, again, recall from C that the string literal "hello, world\n" usually goes into the .rodata section. Using objdump, that can be verified using the following command:

objdump -s -j .rodata hello

which produces the output below:

hello: file format elf64-x86-64

Contents of section .rodata:
 4005d0 01000200 00000000 00000000 00000000 ................
 4005e0 68656c6c 6f2c2077 6f726c64 0a00 hello, world..

From here, we can see that the ‘h’ really is at 0x4005e0 within the .rodata section. And remember that 0x0a is '\n' and 0x00 is '\0', the quintessential null terminator.
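
The byte count in the dump can be cross-checked from C itself. The sketch below is an illustration only, assuming an ASCII execution character set; the array name greeting and the three helper functions are hypothetical and do not appear in hello.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The same literal as in hello.c, stored as an array so sizeof sees every byte. */
static const char greeting[] = "hello, world\n";

/* Total bytes occupied by the literal, including the terminating '\0'. */
size_t greeting_size(void) {
    return sizeof greeting;
}

/* Number of characters before the terminator, as strlen() counts them. */
size_t greeting_length(void) {
    return strlen(greeting);
}

/* Byte at a given offset, as an unsigned value comparable with the hex dump. */
unsigned char greeting_byte(size_t offset) {
    assert(offset < sizeof greeting);
    return (unsigned char)greeting[offset];
}
```

The 14 bytes reported by greeting_size() correspond to the 14 bytes shown at 0x4005e0 in the dump: 13 characters followed by the terminator.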

Next come some variations on the same theme:

(1) Add the compiler option -static. Note and explain the change in size, section headers, and the function call.

With the -static option, the compiler driver gcc performs static linking, as part of which code and data from some library modules would be copied into the resulting executable, which would in turn account for the dramatic increase in file size.

The table below collects section header information contained in the two executables (as reported by objdump -h):

   statically linked              dynamically linked
   =================              ==================
0 .note.ABI-tag                  .interp
1             .note.ABI-tag
2 .rela.plt            
3 .init                          .gnu.hash
4 .plt                           .dynsym
5 .text                          .dynstr
6 __libc_thread_freeres_fn       .gnu.version
7 __libc_freeres_fn              .gnu.version_r
8 .fini                          .rela.dyn
9 .rodata                        .rela.plt
10 .stapsdt.base                 .init
11 __libc_thread_subfreeres      .plt
12 __libc_subfreeres             .text
13 __libc_atexit                 .fini
14 .eh_frame                     .rodata
15 .gcc_except_table             .eh_frame_hdr
16 .tdata                        .eh_frame
17 .tbss                         .init_array
18 .init_array                   .fini_array
19 .fini_array                   .jcr
20 .jcr                          .dynamic
21                  .got
22 .got                          .got.plt
23 .got.plt                      .data
24 .data                         .bss
25 .bss                          .comment
26 __libc_freeres_ptrs           .debug_aranges
27 .comment                      .debug_info
28 .note.stapsdt                 .debug_abbrev
29 .debug_aranges                .debug_line
30 .debug_info                   .debug_str
31 .debug_abbrev
32 .debug_line
33 .debug_str

The 0th section in the dynamically linked version is .interp, which in our case contains the pathname /lib64/, which in turn is a symbolic link to /usr/lib64/

The function calls in these two versions are also different. For example, in <main> of the dynamic-linking version above, the call instruction:

400544:       e8 c7 fe ff ff          callq  400410 <printf@plt>

refers to an instruction that resides in a different section, .plt (short for procedure linkage table), whereas the call instruction (of the static-linking version) below:

  400b5e:       55                      push   %rbp
  400b5f:       48 89 e5                mov    %rsp,%rbp
  400b62:       bf 10 09 49 00          mov    $0x490910,%edi
  400b67:       b8 00 00 00 00          mov    $0x0,%eax
  400b6c:       e8 0f 0b 00 00          callq  401680 <_IO_printf>
  400b71:       b8 00 00 00 00          mov    $0x0,%eax
  400b76:       5d                      pop    %rbp
  400b77:       c3                      retq
  400b78:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  400b7f:       00

refers to an instruction in the same section as <main> does.


(2) Remove the compiler option -fno-builtin. Note and explain the change in the function call.

With that option removed, gcc is free to treat printf as a builtin. Because the format string contains no conversion specifiers and ends in a newline, the call to printf is replaced with a call to puts, as the callq instruction below shows:

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       bf e0 05 40 00          mov    $0x4005e0,%edi
  40053f:       e8 cc fe ff ff          callq  400410 <puts@plt>
  400544:       b8 00 00 00 00          mov    $0x0,%eax
  400549:       5d                      pop    %rbp
  40054a:       c3                      retq
  40054b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)


(3) Remove the compiler option -g. Note and explain the change in size, section headers, and disassembly output.

Without the -g option, the following debugging-related sections are absent from this version:

.debug_aranges, .debug_info, .debug_abbrev, .debug_line, .debug_str

which also accounts for the decrease in the file size.


(4) Add additional arguments to the printf() function in your program. Note which register each argument is placed in. (Tip: Use sequential integer arguments after the first string argument. Go up to 10 arguments and note the pattern).

For this variation, we used the simple C code below:

/* hello4.c */
#include <stdio.h>

int main(void) {
        /* printf(...) call with the integer arguments 1 to 10 goes here */
        return 0;
}

From the part of the output of the objdump -d -j .text

0000000000400536 <main>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       48 83 ec 08             sub    $0x8,%rsp
  40053e:       6a 0a                   pushq  $0xa
  400540:       6a 09                   pushq  $0x9
  400542:       6a 08                   pushq  $0x8
  400544:       6a 07                   pushq  $0x7
  400546:       6a 06                   pushq  $0x6
  400548:       41 b9 05 00 00 00       mov    $0x5,%r9d
  40054e:       41 b8 04 00 00 00       mov    $0x4,%r8d
  400554:       b9 03 00 00 00          mov    $0x3,%ecx
  400559:       ba 02 00 00 00          mov    $0x2,%edx
  40055e:       be 01 00 00 00          mov    $0x1,%esi
  400563:       bf 10 06 40 00          mov    $0x400610,%edi
  400568:       b8 00 00 00 00          mov    $0x0,%eax
  40056d:       e8 9e fe ff ff          callq  400410 <printf@plt>
  400572:       48 83 c4 30             add    $0x30,%rsp
  400576:       b8 00 00 00 00          mov    $0x0,%eax
  40057b:       c9                      leaveq
  40057c:       c3                      retq
  40057d:       0f 1f 00                nopl   (%rax)

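it seems that the format string and the first five integer arguments went into six registers (EDI, ESI, EDX, ECX, R8D, and R9D), while arguments 6 through 10 were pushed onto the stack in reverse order. To my eyes, the choice of registers looks haphazard.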
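
The register order is in fact fixed by the System V AMD64 calling convention that Linux uses on x86_64: integer and pointer arguments go in RDI, RSI, RDX, RCX, R8, and R9, in that order, and any further ones go on the stack. The lookup below is a small sketch of that rule in C; the function names integer_arg_location and passed_in_register are hypothetical, and the sketch ignores floating-point arguments, which use a separate set of registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Where the n-th integer or pointer argument (0-based) is passed under the
   System V AMD64 calling convention used by Linux on x86_64. */
const char *integer_arg_location(int index) {
    static const char *const registers[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    assert(index >= 0);
    return index < 6 ? registers[index] : "stack";
}

/* True when the argument travels in a register rather than on the stack. */
bool passed_in_register(int index) {
    return strcmp(integer_arg_location(index), "stack") != 0;
}
```

In the printf call above, index 0 is the format string (RDI, shown as EDI because only the low 32 bits are written), indices 1 to 5 are the integers 1 to 5, and indices 6 to 10 land on the stack, which matches the five pushq instructions.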


(5) Move the printf() call to a separate function named output(), and call that function from main(). Explain the changes in the object code.

For this variation, we used the simple C code below:

/* hello5.c */

#include <stdio.h>

void output(void) {
        printf("hello, world\n");
}

int main(void) {
        output();
        return 0;
}

Of course, gcc has added another new part to the .text section:

0000000000400536 <output>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       bf f0 05 40 00          mov    $0x4005f0,%edi
  40053f:       b8 00 00 00 00          mov    $0x0,%eax
  400544:       e8 c7 fe ff ff          callq  400410 <printf@plt>
  400549:       5d                      pop    %rbp
  40054a:       c3                      retq

And, finally—

(6) Remove -O0 and add -O3 to the gcc options. Note and explain the difference in the compiled code.

The dump of this version is different from the original one (see above).

0000000000400440 <main>:
  400440:       48 83 ec 08             sub    $0x8,%rsp
  400444:       bf f0 05 40 00          mov    $0x4005f0,%edi
  400449:       31 c0                   xor    %eax,%eax
  40044b:       e8 c0 ff ff ff          callq  400410 <printf@plt>
  400450:       31 c0                   xor    %eax,%eax
  400452:       48 83 c4 08             add    $0x8,%rsp
  400456:       c3                      retq

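It seems the -O3 version adjusts the stack register RSP directly with sub and add instead of relying on the implicit effect of push and pop, no longer saves the frame pointer RBP, and zeroes EAX with the shorter xor %eax,%eax instead of mov $0x0,%eax.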

by ysong55 at February 14, 2015 11:41 PM

Bruno Di Giuseppe

Flight Paths with ThreeJS


I was given the task of making a flight-path-like visualization in 3D using ThreeJS.

One of the requirements is that the city location info should be in a latitude-longitude system, because it’s universal, precise, and easy to modify. So it’s both dynamic and accurate.

The question is: how do you place a map-like coordinate system on a 3D sphere? Once again our Math Masters explained it to us in no time. Longitude/latitude is a spherical (polar) coordinate system, and to place any given point on the sphere we need to translate those coordinates to the Cartesian system. More on coordinate systems.

Getting on the nitty gritty of stuff

I’m not going to pretend I understand the whole thing, so I’m going to say what I understood from the solution:

    var phiFrom = * Math.PI / 180;
    var thetaFrom = (dataRecord.from.lon + 90) * Math.PI / 180;

    //calculates "from" point
    var xF = radius * Math.cos(phiFrom) * Math.sin(thetaFrom);
    var yF = radius * Math.sin(phiFrom);
    var zF = radius * Math.cos(phiFrom) * Math.cos(thetaFrom);

By converting the latitude and longitude of a point into angles in our system (in radians), then multiplying the sphere’s radius by our beloved trigonometric functions of those angles, you get the points in your 3D world. MathMagic.

Good enough. So, once you set up both “from” and “to” variables to get the points, it’ll be easier if you set vectors for them.

 //Sets up vectors
 var vT = new THREE.Vector3(xT, yT, zT);
 var vF = new THREE.Vector3(xF, yF, zF);

Vectors are just points in the 3D world. Unless stated otherwise, their origin is at the scene’s origin [0,0,0].
Now, this should give you the starting and ending points for the path. How do we make a nice path that goes from one point to another?
Now that we have the points set correctly on the globe, we should be able to make a nice path between them. I used Bezier curves for the task, since they are rather easy to use. In ThreeJS, you can use cubic or quadratic Bezier curves. A quadratic curve is built on three control points, while a cubic one uses four. After trying the quadratic, I decided to go with the cubic, because the extra control point gives the paths a better-defined shape.

So, you have your starting and ending points, but how do you set your middle control points?

There are a few things we need, but first let’s find the midpoint between the starting and ending points, as well as the distance between those two points. It’s also a good idea to get started on the two control points by cloning the starting and ending vectors.


// vF = vector From
// vT vector To
var dist = vF.distanceTo(vT);

// here we are creating the control points for the first ones.
// the 'c' in front stands for control.
var cvT = vT.clone();
var cvF = vF.clone();

// then you get the half point of the vectors points.
var xC = ( 0.5 * (vF.x + vT.x) );
var yC = ( 0.5 * (vF.y + vT.y) );
var zC = ( 0.5 * (vF.z + vT.z) );

// then we create a vector for the midpoints.
var mid = new THREE.Vector3(xC, yC, zC);

Now we have the centre points of the paths.
Let’s now get in the math for establishing the points to draw the path once and for all.

    var smoothDist = map(dist, 0, 10, 0, 15/dist );
    mid.setLength( radius * smoothDist );
    cvT.setLength( radius * smoothDist );
    cvF.setLength( radius * smoothDist );

Here I am using a map function, which you can find with a quick Google search. Then we set the length of each vector to make it longer than the sphere’s radius, relative to the inverse of the distance, starting with the midpoint. Then we add the midpoint to the control vectors to bring them closer together, and set the length of the final vectors once more to make them even longer.
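
The post does not show which map function was used. A common definition is the linear re-mapping found in Processing and Arduino, and the sketch below assumes that one. It is written in C, but the arithmetic is the same in JavaScript. The name map_range is hypothetical.

```c
#include <assert.h>

/* Linearly re-maps value from the range [in_min, in_max] to [out_min, out_max]. */
double map_range(double value, double in_min, double in_max,
                 double out_min, double out_max) {
    assert(in_max != in_min);  /* avoid dividing by zero */
    return (value - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}
```

With this definition, map_range(5, 0, 10, 0, 100) gives 50. Under the same assumption, the call map(dist, 0, 10, 0, 15/dist) works out to dist * (15/dist) / 10, which is 1.5 for any non-zero dist, so it is worth checking what value smoothDist actually takes with the map function you pick.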

Finally, we create the path and add it to the scene.

var curve = new THREE.CubicBezierCurve3( vF, cvF, cvT, vT );

    var geometry2 = new THREE.Geometry();
    geometry2.vertices = curve.getPoints( 50 );

    var material2 = new THREE.LineBasicMaterial( { color : 0xff0000 } );

    // Create the final Object3d to add to the scene
    var curveObject = new THREE.Line( geometry2, material2 );


Getting a sphere to follow the path is rather easy. You just need to create a sphere on the path and then, in the render() function, step through the points of the path, one per frame, and update the sphere’s position to that point.
Because I have multiple paths and spheres, I created arrays to contain them and control them individually.

    for( var i = 0; i < movingGuys.length; i ++ ) {
      pt = paths[i].getPoint( t );
      movingGuys[i].position.set( pt.x, pt.y, pt.z );
      t = (t >= 1) ? 0 : t + 0.002;
    }

And this is how you make them move. Nicely enough, ThreeJS has a method .getPoint() that returns the position at a given point along the path (from 0 to 1, like 0% to 100%) as a vector, and you can just set it as the position of your object.
I hope you liked this post. Feel free to comment and even help me; I suspect this is not the best solution for setting up the control points.

Filed under: 3D programming, CDOT, Javascript

by brunodigiuseppe at February 14, 2015 12:00 AM

February 13, 2015

Andrew Benner

Bramble — Update

Now that we are able to have Brackets live in Thimble, we are continuing to push forward with what we envision for the finished product. Currently this is how Thimble looks.

Screen Shot 2015-02-10 at 1.29.32 PM

We are working to replace the preview pane on the right with a preview pane using Brackets. The way we are going to implement this is by using the Live Preview feature from Brackets. We are going to have Brackets spawn an iframe to the right of the editor pane then use Brackets Live Preview feature to render the code in the iframe.

The Live Preview feature has a couple different layers that allow it to communicate between the browser and the editor. Two of the main layers are the transport and protocol. Each of these two layers has a remote and non-remote side to them. The remote side is the browser/rendered side and the non-remote is the Brackets editor. The portion I was responsible for is the transport layer on the remote side. Here’s a diagram that may help you understand the architecture.

Screen Shot 2015-02-13 at 2.38.29 PM

Currently, Brackets uses websockets for the transport layer between the remote and non-remote. What we needed to do was replace the websocket portion with a postMessage to the iframe. To do this I had to investigate the remote protocol to see what it expects to receive from the transport. This way I know how to rewrite the functions for the postMessage implementation. Essentially, I took the existing websocket code and used it as a template for my code.

by ajdbenner at February 13, 2015 07:42 PM

Hosung Hwang

Crosswalk – Cordova Command Line Compatibility

The Cordova Command-Line Interface is built around the “cordova” command, a node script installed on the host machine. In my case it is at /usr/local/bin/cordova.

The cordova command relies on the directory structure that Cordova generates, so it doesn’t work with the Crosswalk directory structure. To build and run, instead of the cordova command, we need to use the scripts inside the projectname/cordova directory, which in a Cordova project would be at projectname/platforms/android/cordova. To create a Crosswalk project, there is a new node script made by the Crosswalk team: create. And for managing Cordova plugins, the low-level plugin utility plugman is used.

The following table matches cordova functionality to its Crosswalk equivalent. The ./create script is in the bin directory of the Crosswalk-Cordova distribution, and the other commands are executed in the project directory. For example, ./cordova/run is the file projectname/cordova/run.

function | Cordova | Crosswalk | explanation
create project | $cordova create hello com.example.hello HelloWorld | $./create hello com.example.hello HelloWorld | ./create is inside the bin directory of the Crosswalk-Cordova distribution
build: prepare | $cordova prepare | NA | copies resources from the www directory to the platform directory
build: compile | $cordova compile | NA | compiles without copying
build: build | $cordova build | $./cordova/build | prepare + compile; options are the same, e.g. --debug, --release
run | $cordova run | $./cordova/run | build + run; options are the same, e.g. --nobuild, --emulator
version | $cordova --version | $./cordova/version |
project info | $cordova info | NA |
platform | $cordova platform add/ls/update | NA | Crosswalk-Android is only for Android and it already contains a skeleton Android project

The following are the plugin management functions. plugman is used instead of cordova plugin.
plugman is basically a tool to create, manage, and upload Cordova plugins.
Full functionality: link.

function | Cordova | Crosswalk
adding a plugin | $cordova plugin add cordova-plugin-name | $plugman install --platform android --project . --plugin cordova-plugin-name
deleting a plugin | $cordova plugin rm cordova-plugin-name | $plugman uninstall --platform android --project . --plugin cordova-plugin-name
listing plugins | $cordova plugin ls | NA
searching for a plugin | $cordova plugin search plugin-keyword | $plugman search plugin-keyword

The following scripts are supported by Crosswalk and are in the bin directory of the Crosswalk-Cordova distribution.

command | explanation
$ ./create options | used to create a Crosswalk project
$ ./update project_directory | calls the create script again and updates the project to the newest version
$ ./check_reqs | checks that there is no problem in the Crosswalk build environment; on success it shows “Looks like your environment fully supports cordova-android development!”
$ ./android_sdk_version | shows the installed Android SDK version


by Hosung at February 13, 2015 12:05 AM

February 12, 2015

Hosung Hwang

Markdown format

Recently, I started to use the Markdown format for my personal memos. It is exactly what I was looking for. I always wanted a text format that can be easily converted to HTML, is easy to write, and is readable as a plain text file. Markdown has all of those features and is already widely used.
I had been thinking about using a wiki format. There is a local wiki editor and viewer called Zim Desktop Wiki, and Emacs also has a wiki mode. However, sharing with other devices was painful.

About Markdown

Markdown was developed in 2004 by John Gruber “to write using an easy-to-read, easy-to-write plain text format, and optionally convert it to structurally valid XHTML”.



Benefit in the local memo file with emacs

To record everyday activities, I was using text files that emacs generates daily (link). That worked well, but to share them I had to copy them to Evernote or something similar.
With emacs’ Markdown mode, I can see the text with a little bit of formatting using color and bold. It also gives a fast way to insert Markdown formatting keywords.
Now all my text files have the .md extension and emacs generates them automatically.

Benefit in mobile sharing with dropbox

When I put a link to the diary directory in the Dropbox sync directory, the files are automatically synchronized as soon as I edit them.
There is an Android app called “Draft”. It supports Dropbox synchronization for text and Markdown files, shows formatted Markdown, and of course has an editing feature.
As a result, when I add or edit a text in the Android Draft app, it is synchronized to my laptop’s diary directory. Now I do not need Evernote.

Benefit with github

GitHub uses Markdown for readme files and all kinds of text files. So, if I write my local files in Markdown, they can be used directly as GitHub text.

Benefit with wordpress

WordPress supports Markdown. This post and the previous one were written in Markdown. It is much faster than typing HTML tags to make lists and tables. Also, my Markdown memos can be posted directly as WordPress posts.

Chrome Extension

There is a Chrome extension called Markdown Preview Plus. If a Markdown file is opened in Chrome using “file:///home/hosung/diary/” or by drag and drop, it shows the generated HTML on the page using CSS for the GitHub style or another style. Optionally, the page can reload a changed file at an interval of 1 or more seconds.

Emacs Integration

I wrote a simple Lisp function to add to the .emacs file. With it, the file currently being edited can be quickly opened in Chrome by typing M-x chrome or Ctrl-Shift-c.

(defun chrome ()
  (interactive)  ; required so that M-x chrome works
  (shell-command ; run the command instead of only building the string
   (format "google-chrome --activate-existing-profile-browser %s" buffer-file-name)))
(global-set-key (kbd "C-S-c") 'chrome)


by Hosung at February 12, 2015 02:13 AM

Crosswalk – Cordova Plugins Compatibility 2

In the previous post, “Crosswalk-Cordova Plugins Compatibility”, I introduced a compatibility table. However, that table was for Crosswalk 7, and the stable Crosswalk version when I started to use Crosswalk was 9. Because Crosswalk had not published a newer compatibility table, I had to check whether the current Cordova plugins work properly in a Crosswalk-Android application.
I tested the major plugins that are necessary to make Cordova apps.

Test Order

  1. Choosing a Cordova Plugin from
  2. Testing the plugin at a Cordova app
    • Creating Cordova App using : cordova create testapp com.cordova.testapp testapp
    • Adding the plugin using : cordova plugin add org.apache.cordova.pluginname
    • Making a sample HTML/JavaScript app using sample code from the Cordova plugin page, the Adobe PhoneGap API Reference, and internet search results.
    • Adding permission to : platforms/android/AndroidManifest.xml
    • Testing it on the phone using : cordova run android
  3. Testing the plugin at a Crosswalk app
    • Creating Crosswalk App using : ./create testapp com.crosswalk.testapp testapp
    • Adding the plugin using : plugman install --platform android --project . --plugin gitrepo
    • Copying the sample app that worked on Cordova to project_directory/assets/www
    • Adding permission to : project_directory/AndroidManifest.xml
    • Testing it on the phone using : ./cordova/run

Version of tested Cordova/Crosswalk-Cordova Issue

  • Cordova : 3.6.4
  • Crosswalk-Cordova :
    • Cordova used by Crosswalk : 3.5.1
  • Crosswalk-Cordova :
    • Cordova used by Crosswalk : 3.6.3

When I tested the File-Transfer plugin, this caused an annoying problem. The code that worked on Cordova failed with an error code in the Crosswalk app. By using the recently updated Crosswalk-Cordova stable version(, I could get the same result.

Test Result

Simply put, every plugin I tested worked well in Crosswalk.

Plugin | Installation | Work O/X
Device | $ cordova plugin add org.apache.cordova.device | O
 | $ plugman install --platform android --project . --plugin | O
Console | $ cordova plugin add org.apache.cordova.console | O
 | $ plugman install --platform android --project . --plugin | O
Battery Status | $ cordova plugin add org.apache.cordova.battery-status | O
 | $ plugman install --platform android --project . --plugin | O
Camera | $ cordova plugin add | O
 | $ plugman install --platform android --project . --plugin | O
 | note: taking a picture and selecting from the local gallery worked well |
Contacts | $ cordova plugin add org.apache.cordova.contacts | O
 | $ plugman install --platform android --project . --plugin | O
 | note: listing contacts and adding a new entry worked well |
Device Motion (Accelerometer) | $ cordova plugin add org.apache.cordova.device-motion | O
 | $ plugman install --platform android --project . --plugin | O
Device Orientation (Compass) | $ cordova plugin add org.apache.cordova.device-orientation | O
 | $ plugman install --platform android --project . --plugin | O
Dialogs | $ cordova plugin add org.apache.cordova.dialogs | O
 | $ plugman install --platform android --project . --plugin | O
FileSystem | $ cordova plugin add org.apache.cordova.file | O
 | $ plugman install --platform android --project . --plugin | O
 | note: file/folder listing/reading/writing worked well |
FileTransfer | $ cordova plugin add org.apache.cordova.file-transfer | O
 | $ plugman install --platform android --project . --plugin | O
 | note: fileEntry.fullPath didn’t work for both. cordova.file.dataDirectory is the way to get a downloadable path. |
 | note: in version, method failed with error code 3(CONNECTION_ERR), it worked in |
 | note: I didn’t test file uploading. |
Geolocation | $ cordova plugin add org.apache.cordova.geolocation | O
 | $ plugman install --platform android --project . --plugin | O
Media | $ cordova plugin add | O
 | $ plugman install --platform android --project . --plugin | O
 | note: remote audio file playback/pause/stop worked well |
Network Information | $ cordova plugin add | O
 | $ plugman install --platform android --project . --plugin | O
 | note: connection type 3G/4G/Wifi worked well |
Vibration | $ cordova plugin add org.apache.cordova.vibration | O
 | $ plugman install --platform android --project . --plugin | O

Permissions used in AndroidManifest.xml

    <uses-permission android:name="android.permission.INTERNET" />
    <uses-feature android:name="" />
    <uses-permission android:name="android.permission.CAMERA" />
    <uses-permission android:name="" />
    <uses-permission android:name="android.permission.BATTERY_STATS" />
    <uses-permission android:name="android.permission.VIBRATE" />
    <uses-permission android:name="android.permission.ACCESS_COARSE_LOCATION" />
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
    <uses-permission android:name="android.permission.ACCESS_LOCATION_EXTRA_COMMANDS" />
    <uses-permission android:name="android.permission.READ_PHONE_STATE" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECEIVE_SMS" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
    <uses-permission android:name="android.permission.READ_CONTACTS" />
    <uses-permission android:name="android.permission.WRITE_CONTACTS" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.GET_ACCOUNTS" />


  • All the plugins I tested worked the same way in both Cordova and Crosswalk.
  • When I installed the plugins, I didn’t specify a plugin version, because the aim of this testing was to find out whether the current versions of the plugins work on the current version of Crosswalk. However, the current Cordova version and the version of the Cordova module used in stable Crosswalk were different.
  • When an application uses a plugin and it doesn’t work properly, check whether it works properly in a Cordova application that uses the same Cordova module version.
  • The FileTransfer plugin worked differently depending on the Cordova version; in the most recent version it works well. According to web search results, there were several reports about error code 3. One solution is adding the header (Connection: “close”) from this link.
  • I didn’t test every property/method of the plugins, so there might be some functions that work differently.
  • There are 740 plugins so far, and testing is necessary before using one. I tested only 14 plugins.

by Hosung at February 12, 2015 12:55 AM

February 10, 2015

Kenny Nguyen

Making a patch for brackets-browser-livedev

This blog relates to this github repo:


So I'd say a week ago I finished this:


I think I've elaborated on this before, but I'll explain again. Basically we want to create an iFrame within Brackets itself so that we can use it with Brackets' "Multibrowser Live Development" module.
Presently with Live Development, Brackets will open up a new Chrome instance so you can see your changes to an html file, live. The idea behind "Multibrowser Live Development" is that you'd run a server which would serve html to multiple browsers. What we want to do is fake the server, and use http post methods to talk to the iFrame I've created instead of to a web browser.
My method is kind of rough, but basically I call the command "Create Split View". Then I find the html element with the id of Second-Pane, .empty() it, and fill it with the iframe.
Not necessarily the best method, but presently this works in desktop Brackets and when using Brackets on the web.

Note: This is a dramatization

I just made this story for fun, don't take it too seriously
Skip down to the TL:DR section for a more informative section

On the morrow of Monday, I was ready to finally finish my extension, in my hubris I spoke a lot to my co-worker Andrew Benner that I'd have this extension finished Lickety-split! I came in, sat down and worked on it for an hour and hastily proclaimed that I had finished what I had set out to do! For this would be my downfall, for as confidence welled up inside me so very rapidly, so soon did it flee my very existence when I received notice of my failings.

Like thunder at the crack of dawn, my overlord Sir David Humphrey had doled upon my very existence, 10 "suggestions" that I need to expunge immediately. So off I went, into the abyss that is my code, once thought of as a shining example of grace, turned to nothing more than a hazy mess of slapped together words merely reaching meaninglessly towards greatness.

I did not falter though, I ran towards the code with all my might, and came back seemingly victorious. Thus I send another PR

PR means Pull Request: basically what you send when you want to add your code to someone else's main code, for them to accept

towards my overlord, but nay! It was not the end, another 6 "suggestions" flung my way, had I been of a lower caste I may have fallen over and given up, but Nay! I say! I am stronger, I shall persevere! Thus once more I picked myself up, and ran at the beast with all my strength. And yet again, I had finally applied everything my overlord had taught me up till this point, I had attained it, the end I sought.

Thus I submitted the PR once more, my hubris restored, I let down my guard, and once more my overlord threw at me 3 "suggestions". From this I had fallen, seemingly unable to get up, but somewhere deep inside I realized I was getting close, my failings up till this point were not the world's fault, but my own. If I had taken the time to slowly beat this monster from the very start I may have saved myself this arduous journey. Once more I'd fight the beast.

But nay! This time I'd do it differently. I'd fight the monster slowly, going over every little piece of it, make sure everything is spotless and in order. Not allowing even a fraction of weakness shine through. For you are only as strong as your greatest weakness.

And finally, at the end I submitted my PR, not with hubris, but with pride, care, and attentiveness.

And overlord David Humphrey, finally, and truly, accepted my PR into the master branch.


TL:DR

  • Well, this is the first time in a while I've had my code scrutinized so thoroughly
What I learned:
  1. Remove Useless Modules
  2. Fix Comments
  3. Remove Useless Code
  4. Make sure to understand your spec 100%
  5. Try to make your code understandable to everyone, not just yourself
  6. Be extra careful about spacing
  7. Follow the code practices of the environment you're coding for
  8. Reduce redundant code
  9. Make sure your variable names make sense

    Another note is that an incorrectly explained comment is worse than no comment at all, because it could mislead someone who's reading your code, so make sure to be specific and factual in your comments.

    Long Version

    I started out with this on Monday morning: iframe-browser.js. In it you can see, on lines 7 to 11, a few imports that don't really do anything, and _panel being a global even though we don't need it globally.

    Browse() is super overcomplicated; I guess I was trying to cover all possible use case scenarios in 1 function. What browse does in this instance is take in an html file and a true or false variable. It uses these to create the iFrame I displayed earlier in either vertical or horizontal form.

    Next up is update(). This is where I make the iFrame; you can see some redundant code and some literally wrong code. This code simply needed a rewrite.

    In the end I ended up with this: iframe-browser.js. In it I've added many new methods.

    • Browse has been simplified: it checks to see if the split exists, and calls _update to make the iframe
    • setOrientation/show basically split the showing of the panel into 2 methods: one is a set/get that allows you to specify the orientation beforehand, and show displays the panel
    • _update now works properly with URLs, and a lot of redundant code has been removed
    • Added a method to get the iFrame itself
    • Added many more public APIs to interact with
    • Updated most of the comments so that they're easier to understand

by Kenny Nguyen at February 10, 2015 04:04 PM

Koji Miyauchi

Geolocation Data for Visualization.

In our project, we are looking for some information that can be plotted on a 3D globe. To do that, we will need geolocation data (latitude, longitude) and some values. I was looking for simple world population data (expecting .csv or Excel format); however, there is not much population data that includes geolocation.
The data with geolocation that I did find was in a GIS file format.

GIS (Geographic Information System)

To work on data visualization using geolocation data, knowing about GIS is very important.

GIS stands for Geographic Information System, which is a system or an application designed to visualize many types of information on a map and use them for analysis.
For example, Google Earth is one of the popular GIS applications. It allows users to plot many kinds of data on a globe and visualize them nicely.

Information is represented as layers, and each layer is overlaid on a map. This diagram explains very well how a GIS plots different kinds of information on a map.

[ Source: Geographic Information System Basics v1.0, Stanford University Library ]

GIS file formats

GIS file formats are standards for encoding geographical information into a file for use in a GIS.
Because the information a GIS handles is diverse, there are many types of GIS file formats.

GIS file formats for web.

Among the various formats, there are three GIS file formats that are web friendly.

  • GeoJSON – a lightweight format based on JSON, used by many open source GIS packages
  • Geography Markup Language (GML) – XML based open standard (by OpenGIS) for GIS data exchange
  • Keyhole Markup Language (KML) – XML based open standard (by OpenGIS) for GIS data exchange

GeoJSON is the only JSON-based format of the three. Because it’s JSON, it is much easier to parse in a web application. The only disadvantage is that it is not maintained by a formal standards organization.

GML is an XML grammar file format defined by the OGC (Open Geospatial Consortium).

KML is also an XML grammar file format (it shares some grammar with GML), developed by Google, Inc. and used for Google Earth and Google Maps. KML is now an international standard maintained by the OGC.
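
To show why a JSON-based format is convenient for this kind of plotting, here is a minimal sketch in Python that pulls (latitude, longitude, value) triples out of a GeoJSON FeatureCollection. The sample data, the "value" property name, and the extract_points helper are all made up for illustration; only the Feature/Point layout and the longitude-first coordinate order come from the GeoJSON format itself.

```python
import json

# Hypothetical sample data: two point features with an invented "value" property.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-79.38, 43.65]},
     "properties": {"name": "Sample A", "value": 100}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [139.69, 35.69]},
     "properties": {"name": "Sample B", "value": 200}}
  ]
}
"""


def extract_points(geojson_text, value_key):
    """Return (latitude, longitude, value) tuples for every Point feature."""
    data = json.loads(geojson_text)
    points = []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # lines, polygons, etc. are skipped in this sketch
        lon, lat = geometry["coordinates"][:2]  # GeoJSON stores longitude first
        value = (feature.get("properties") or {}).get(value_key)
        points.append((lat, lon, value))
    return points


if __name__ == "__main__":
    for lat, lon, value in extract_points(SAMPLE, "value"):
        print(lat, lon, value)
```

The resulting triples are the kind of input a globe visualization would need: a position plus one number to plot at that position.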

by koji miyauchi at February 10, 2015 01:00 PM

February 09, 2015

Anderson Malagutti

Get current time in JAVA.

This post shows a very simple way to get the current time in Java.
Our code will have two functions.

    1. formatTime() – receives a long value that represents the current time, and returns it as a “human readable” String.

    2. getCurrentTime() – calls the formatTime function, passing the current system time in milliseconds, which is provided by System.currentTimeMillis().

So, let’s see the code for these functions:

import java.text.SimpleDateFormat;

public static String formatTime(long time) {
    // Pattern: year-month-day|hours:minutes:seconds:milliseconds
    SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd|HH:mm:ss:SSS");
    String currentTime = simpleDateFormat.format(time);
    return currentTime;
}

public static String getCurrentTime() {
    return formatTime(System.currentTimeMillis());
}

Using these functions, you should be able to get a current time formatted like this:


This String represents year-month-day|hours:minutes:seconds:milliseconds.

Thank you! :)

by andersoncdot at February 09, 2015 05:03 PM

Koji Miyauchi

Comparing D3.js and Raphael.js

In order to investigate the performance of both d3.js and raphael.js for our project, I created two different examples:
one is a particle animation example, and the other is an interactive example (with mouseover and touch events).
I ran these scripts on a PC and in smartphone environments (Cordova, Crosswalk).

Animation Performance

This example creates a number of particles (such as 500), and each particle moves around randomly.
I created the same example for both d3.js and raphael.js, using each library’s features and syntax.


Platform | D3.js | Raphael.js
PC Performance | 500 particles (23-60fps) | 500 particles (19-30fps)
Cordova Performance | 50 particles (8-21fps) | 50 particles (6-14fps)
Crosswalk Performance | 50 particles (23-37fps) | 50 particles (13-16fps)

There was a significant performance difference between D3.js and Raphael.js for this type of animation.

Interactivity Performance

This example creates a bar graph and adds event handlers (mouseover, mouseout, touch) to examine performance.


Platform | D3.js | Raphael.js
PC Performance | 59-61fps | 59-61fps
Cordova Performance | initialize (56fps), touch (60fps) | initialize (50fps), touch (60fps)
Crosswalk Performance | initialize (56fps), touch (60fps) | initialize (55fps), touch (60fps)

There was no significant difference between D3.js and Raphael.js performance on interactive events.
However, when I added an alpha transition to the tooltip popup in the examples, the frame rate drops 10fps when a touch interaction occurs.
Thus, it is important to be careful about using redundant animation in the smartphone version of charts.


Both libraries are using SVG for rendering. Because of that it is easy to create smartphone-optimized charts. Also all event handlers for smartphones are available to use.

High level libraries C3.js and gRaphael.js

There are some high level libraries built on top of D3.js and Raphael.js: C3.js and gRaphael.js. I coded examples that use the same data and create a chart with each library's features.


Platform | C3.js | gRaphael
PC Performance | initialize (56fps) | initialize (49fps)
Cordova Performance | initialize (4fps) | initialize (38fps)
Crosswalk Performance | initialize (4fps) | initialize (36fps)
Easiness to code | 7 lines | 6 lines

There is a difference in initialization performance.
C3.js runs at a very low fps when it creates a chart. This is because the c3 library adds many fancy transitions to the graph. It might be good if that were customizable for smartphones.

by koji miyauchi at February 09, 2015 03:00 PM

February 07, 2015

Justin Grice

Lab 4- Disassembling a Simple C Program

In this lab, we will be disassembling a simple C program using objdump and comparing various compile options.

We initially created a simple C program that prints the “Hello World!” string, as shown below.
#include <stdio.h>
int main() {
    printf("Hello World!\n");
}

We then compiled it using gcc with the following settings:
gcc -g -O0 -fno-builtin test.c

Disassembling the Program
This resulted in an ELF file with a size of 9528 bytes. When we examine the disassembled section containing our main function we see the following:
push %rbp
mov %rsp,%rbp
mov $0x4005e0,%edi
mov $0x0,%eax
callq 400410 <printf@plt>
pop %rbp

Upon examination of the assembly instructions we can see that it does some stack management, loads the address of the string (0x4005e0) into %edi, and then calls the standard library printf function through the entry at the 400410 memory location. The @plt after printf indicates that it is using the function through the procedure linkage table.

Alternate Compilation Methods:
After examining the initial program's compilation, we proceeded to try some other arguments for gcc. Adding the -static argument caused a significant increase in compiled file size: the file was 828073 bytes, roughly 87 times bigger. When examining the disassembly, the cause was discovered to be that the library code for printf (and everything it depends on) was being included in the compiled program, instead of being dynamically linked through the PLT.
When we removed the -fno-builtin option, there was a negligible decrease in file size to 9510 bytes. The major noticeable change was that instead of calling printf@plt it was calling puts@plt. This is because, by default, the compiler knows a list of substitutions it can use to optimize your code. When printf is called with only one argument, a plain string ending in a newline, the compiler emits a call to puts instead, because puts is faster and produces the same result. This only works when printf has one argument and -fno-builtin is not passed to gcc.
Removing the -g option resulted in a decrease of file size to 8504 bytes. When examining the compiled code, we also noticed a lack of debugging sections, leading to the conclusion that -g causes debugging information to be added during compilation.
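
As a quick check on the size comparisons above, the following Python sketch computes the ratio and percentage change of each build relative to the first one. The byte counts are the ones reported in this post; the SIZES table and the compare helper are only illustrative names.

```python
# Executable sizes in bytes, as reported in this post for each set of gcc flags.
SIZES = {
    "-g -O0 -fno-builtin": 9528,
    "-g -O0 -fno-builtin -static": 828073,
    "-g -O0": 9510,
    "-O0 -fno-builtin": 8504,
}


def compare(baseline, other):
    """Return (ratio, percent_change) of `other` relative to `baseline`."""
    ratio = other / baseline
    percent_change = (other - baseline) / baseline * 100
    return ratio, percent_change


if __name__ == "__main__":
    base = SIZES["-g -O0 -fno-builtin"]
    for flags, size in SIZES.items():
        ratio, pct = compare(base, size)
        print(f"{flags:30s} {size:7d} bytes  x{ratio:6.2f}  {pct:+8.1f}%")
```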
Adding additional arguments to printf()
When we changed the printf call to contain additional arguments (printf("Hello World\n", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);) we noticed a change in the disassembly. The last 5 arguments were pushed onto the stack with pushq, and the first 6 arguments (the format string and the integers 1 to 5) were moved into registers with mov: edi, esi, edx, ecx, r8d, and r9d. The eax register was also set; in a variadic call it holds the number of vector registers used.
printf() to separate function
We then changed the code to resemble the following:
void output(char * text) {
    printf("%s", text); /* body missing in the original post; assumed to print the text */
}
int main() {
    output("Hello World!\n");
}

This resulted in two separate functions in the object code, main and output, with the value to print being passed between them in a register.
-O0 and -O3
When we changed the optimization level to -O3, the file size increased to 10968 bytes. The most likely cause is that higher optimization levels trade file size for performance. On inspection of the object code, the compiler appears to have removed all the extra instructions related to stack management in main. Main now consists of only moving a value into a register, a XOR operation, and a JMPQ operation (originally a CALLQ operation).

by jgrice18 at February 07, 2015 04:44 PM

February 06, 2015

Artem Luzyanin

Compilation and assembler (a.k.a. Lab 4)

For the next lab of SPO600 we were playing around with different compiler options. After I disassembled the compiled code for several different combinations of program code and compiler options, I compared the results and noted the differences.

The first run:

Code (size: 65b):

#include <stdio.h>


int main() {
    printf("Hello World!\n");
}


I used the following command to compile (compiled size: 9550b):

gcc -g -O0 -fno-builtin FileName
Using “readelf -p .rodata RunnableFileName” I fetched the dump of the “.rodata” section, which contains the string that I used in my application: “Hello World!”.

Using objdump -f -s --source I got the disassembled information about the runnable file.
A notable line in the <_start> function is the call to <__libc_start_main@plt>. It calls the C library's startup routine through the Procedure Linkage Table (PLT); that routine is what eventually calls <main>.

One notable thing in the <main> function is that it calls the printf procedure through the Procedure Linkage Table. Another is that it uses a call instruction, so there is extra overhead for saving the return address and returning to <main> afterwards. Also, main contains a few lines of code that work with the stack. This function contains the code for my “main” function and for the “printf” command.

The second run:

Code (size: 65b):

#include <stdio.h>


int main() {
    printf("Hello World!\n");
}


I used the following command to compile (compiled size: 828kb):

gcc -g -O0 -fno-builtin -static FileName

Using “readelf -p .rodata RunnableFileName” I fetched the dump of the “.rodata” section, which contains the string that I used in my application: “Hello World!”.

Using objdump -f -s --source I got the disassembled information about the runnable file.
A notable line in the <_start> function is the call to <__libc_start_main>. This time around it is not calling the library through the Procedure Linkage Table; instead, the library code is linked directly into the executable. This increases the processing speed, since the overhead of going to the PLT and then to the library is removed. It added a new procedure to the code called <__libc_start_main>, which appeared right after <main>. This comes at the cost of a significant (roughly 87x) increase in the size of the executable.
The <main> function is now calling <_IO_printf> directly, and the code for <_IO_printf> is now included in the program. “callq” is still used for “printf”, so there is still the overhead of the call and return. There is also overhead for stack setup.

The “-static” compiler option is a good choice if there is a need to avoid using the PLT as much as possible. It will increase runtime speed, which makes it a good choice for smaller libraries and procedures. At the same time, because of the very significant increase in the size of the executable, this compiler option should be used with caution: with a large number of inclusions, the size of the program might get out of control.

The third run:

Code (size: 65b):

#include <stdio.h>


int main() {
    printf("Hello World!\n");
}


I used the following command to compile (compiled size: 9532b):

gcc -g -O0 FileName

Using “readelf -p .rodata RunnableFileName” I fetched the dump of the “.rodata” section, which contains the string that I used in my application: “Hello World!”.

Using objdump -f -s --source I got the disassembled information about the runnable file.

The <_start> function is identical to the first run. The library is called through the PLT again.
The <main> function is now calling <puts@plt> although I used “printf” in my code. The reason is that without the “-fno-builtin” option the compiler tries to recognize some commonly used functions and use a shortcut for them. This way, the “printf” call was replaced by “puts”. This happened because I use “printf” with only one argument, the string “Hello World!”, so instead of the more complicated “printf” the compiler uses the simpler “puts”, which outputs the string directly.

The “-fno-builtin” compiler option turns off an optimization that increases processing speed and slightly decreases the size of the runnable by using shortcuts for some commonly used code. The pitfall of not using this option is that you can’t put a breakpoint in the shortcut, so it becomes harder to debug. Also, since the compiler will treat the changed function as if it were using the standard library, you can’t override the behaviour of that function by using another library.

The fourth run:

Code (size: 65b):

#include <stdio.h>


int main() {
    printf("Hello World!\n");
}


I used the following command to compile (compiled size: 10990b):

gcc -g -O3 -fno-builtin FileName

Using “readelf -p .rodata RunnableFileName” I fetched the dump of the “.rodata” section, which contains the string that I used in my application: “Hello World!”.

Using objdump -f -s --source I got the disassembled information about the runnable file.
While the <_start> function didn’t change much, <main> changed significantly, as it went down from 8 lines to 3. There is no stack setup anymore, and the overhead of calling into and returning from “printf” was removed by using a jump (a kind of “goto”) instead of a call.

A higher level of optimization can save a significant amount of processing time by removing overhead as well as unnecessary commands where possible. This comes at the price of relying on the judgement of the compiler, which might not always be wise. In its attempts to create a faster version of the code (not necessarily a smaller one; in fact the size, as in our case, might go up), it might create errors, or simply change the way the code was intended to behave.

The fifth run:

Code (size: 65b):

#include <stdio.h>


int main() {
    printf("Hello World!\n");
}


I used the following command to compile (compiled size: 8510b):

gcc -O0 -fno-builtin FileName

Using “readelf -p .rodata RunnableFileName” I fetched the dump of the “.rodata” section, which contains the string that I used in my application: “Hello World!”.
I ran the “gdb ./RunnableFileName” command for my very first program as well as for this one. The debugger complained in the latter case that “no debugging symbols found”, but that was almost the entire difference. Code-wise, without the “-g” (debugging) option the compiler didn’t put pieces of the actual source code in the object, which would make it harder or impossible to debug. The benefit of leaving this compiler option out is saving space (around 11%, from 9550b down to 8510b).

The sixth run:

Code (size: 255b):

#include <stdio.h>


int main() {
    printf("%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d\n",
           1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30);
}


I used the next command to compile (compiled size: 9550b):

gcc -g -O0 -fno-builtin FileName

Using objdump -f -s --source I got the disassembled information about the runnable file.

While the <_start> function is exactly the same as during the first run, <main> is completely different. Because of the large number of arguments, the values were handled sequentially from the highest number to the lowest: those that didn’t fit in registers were placed on the stack, and the remaining ones went into “%r9”, “%r8”, “%ecx”, “%edx”, “%esi”.

When specifying multiple parameters for the “printf” function, one should keep in mind that only a limited number of them can be stored in registers. Each parameter beyond that goes on the stack, which adds to the processing time.
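The register order seen in the disassembly can be written down as a small lookup. This is only a sketch of the integer-argument rule of the x86_64 Linux calling convention; the helper name int_arg_location is made up for illustration, and floating-point arguments (which use different registers) are ignored.

```c
#include <stddef.h>

/* Integer argument registers in the order the x86_64 Linux calling
   convention assigns them. Position 0 is the first argument (for
   printf, the format string). */
static const char *const INT_ARG_REGS[] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Hypothetical helper: where does the integer argument at this
   zero-based position travel? */
const char *int_arg_location(size_t position)
{
    size_t count = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];
    return position < count ? INT_ARG_REGS[position] : "stack";
}
```

For the sixth run the format string takes position 0 and the thirty numbers take positions 1 to 30, so under this rule five numbers travel in registers and the other twenty-five go on the stack.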

The seventh run:

Code (size: 96b):

#include <stdio.h>

void output() {
    printf("Hello World!\n");
}

int main() {
    output();
}

I used the next command to compile (compiled size: 9637b):

gcc -g -O0 -fno-builtin FileName

Using “readelf -p .rodata RunnableFileName” I fetched the dump of the “.rodata” section, which contains the string that I used in my application: “Hello World!”.

Using objdump -f -s --source I got the disassembled information about the runnable file.
While the <_start> code didn’t change, the call to the “printf” function has moved to <output>, which has its own stack setup code. <main> kept a large portion of its code, lacking only the “mov $0x4005f0,%edi” and “callq 400410 <printf@plt>” lines; in their place it has to make a call to <output>. Effectively, moving “printf” out into a separate function just increased the amount of code and added the overhead of extra save and restore points. On the other hand, if the function were used heavily throughout the application, there would be a point in factoring it out.

by lisyonok85 at February 06, 2015 10:41 PM

Hosung Hwang

How to change VirtualBox HDD size on Linux

The storage in my Windows 7 VM on VirtualBox ran short; the HDD size was 29GB. I had forgotten how to extend it, so I am writing down how to do it here after doing it.

My system is Ubuntu 14.04 LTS and VirtualBox 4.3.20

The command is “VBoxManage modifyhd” and the vdi file is in “~/VirtualBox VMs/Win7/Win7.vdi”

$ sudo VBoxManage modifyhd ~/VirtualBox\ VMs/Win7/Win7.vdi --resize 45000
[sudo] password for hosung:

Now in “Disk Management” I can see that there is 14.65GB of unallocated space. Select “Extend Volume” as follows:

Screenshot from 2015-02-06 16:57:28

Now C drive size is 43.84 GB.
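The number passed to --resize is a size in megabytes, which is why it does not look like the gigabyte figures Windows reports. A minimal sketch of the conversion, assuming 1 GB is counted as 1024 MB (the function name is made up for illustration):

```c
/* Convert a size in megabytes to gigabytes, counting 1 GB as 1024 MB
   (an assumption about how both tools report sizes). */
double megabytes_to_gigabytes(double megabytes)
{
    return megabytes / 1024.0;
}
```

With this conversion 45000 MB comes to about 43.95 GB, and 15000 MB comes to about 14.65 GB, which matches the unallocated space shown in Disk Management. That the disk was 30000 MB before the resize is my inference from those numbers, not something the tool printed.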

Screenshot from 2015-02-06 17:00:03

by Hosung at February 06, 2015 10:16 PM

Alfred Tsang

Conflict dividing two integers based on two different architectures

This project focuses on the differences between an int and a long on two different computer architectures. The sizes of an int and a long vary from one computer architecture to another, and this variation may cause division calculations to fail, because when the result of a calculation goes over the maximum, a run-time error will be generated.

The size of an int is 4 bytes on an IBM. On an IBM architecture there is no long data type, which means that the int plays the role of both int and long. On a Windows architecture, the size of an int is 4 bytes and a long is 8 bytes, and int and long are treated as separate data types. Programmers write code this way because they want to manipulate ints and longs to get other ints and longs.

A problem with this is that the result of dividing two integers will vary based on the architecture used. On an IBM machine, the result is truncated and an int is returned. On a Windows architecture, dividing two integers will result in either an int or a double depending on how the two integers were stored. A program that divides two integers can therefore give varying results based on the architecture. One way of solving this problem is to typecast both integers into a double (or maybe a float). Also, when dividing two integers, make sure that the second integer is NOT zero, and if it is, terminate the program.
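The two suggestions above (cast to double, and check for a zero divisor first) can be sketched in C. The function names are hypothetical, and returning an error code instead of terminating is a choice made for this sketch so that the caller decides what to do.

```c
#include <stdio.h>

/* Divide a by b as real numbers. Returns 0 on success and stores the
   quotient in *result; returns -1 without dividing when b is zero. */
int divide_as_double(int a, int b, double *result)
{
    if (b == 0) {
        return -1;          /* caller decides whether to terminate */
    }
    *result = (double)a / (double)b;
    return 0;
}

/* Print the sizes the current compiler and platform actually use. */
void print_integer_sizes(void)
{
    printf("int: %zu bytes, long: %zu bytes\n", sizeof(int), sizeof(long));
}
```

In C, 7 / 2 with two ints gives 3, while divide_as_double(7, 2, &r) stores 3.5 in r. Running print_integer_sizes() on each machine is a quick way to check the sizes in use rather than assuming them.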

The presentation that I mentioned taught how different architectures (e.g. x86 and iSeries) have different ways of manipulating datatypes, and how this can affect the result of a program.  It also taught about the consequences that dividing two integers can have if the architecture is not taken into consideration.

In any programming language, whether code divides two integers correctly depends on the architecture it is running on.  Each architecture is different, and this may cause the same program to crash on one computer and continue on another when the two have different architectures.

by kaputsky263 at February 06, 2015 08:50 PM

Maxwell LeFevre

GCC Compiler Options (Lab 4, part 2)

This is the second part of my lab 4 post for SPO600. The first part can be found here. In that section I go over the basic parts of an ELF file and discuss a little bit about assembly language. The information in that post is assumed to be known in this post and I am just going to jump right in.

Compiling With Different flags

Before I go any further I am going to post a table with a list of a.out files, the flags they were compiled with, their sizes, and links to objdump text files. I will be referencing this information throughout the final section of this post.

Binary Name  Flags                        Size  objdump -d  objdump -s
aBase.out    -g -O0 -fno-builtin          9.4K  aBase-d     aBase-s
a1.out       -g -O0 -fno-builtin -static  809K  a1-d        a1-s
a2.out       -g -O0                       9.3K  a2-d        a2-s
a3.out       -O0 -fno-builtin             8.4K  a3-d        a3-s
a4.out       -g -O0 -fno-builtin          9.4K  a4-d        a4-s
a5.out       -g -O0 -fno-builtin          9.4K  a5-d        a5-s
a6.out       -g -O3 -fno-builtin          11K   a6-d        a6-s

GCC and the -static Option (build a1.out)

The first thing to notice is that the file is 86 times larger when the -static flag is added. The reason for this is that the compiler has included all the default libraries in the .out file. The most obvious difference is in the .text section of the file, which has gone from 137 lines to 146,627 lines. It looks like a lot of functions have been added. To find out exactly what was added have a look at the output of ldd aBase.out vs ldd a1.out:

ldd aBase.out
 => (0x00007fff023d5000)
 => /lib64/ (0x00007fb6c9a73000)
/lib64/ (0x00007fb6c9e42000)

ldd a1.out
not a dynamic executable

The ldd command prints the libraries that the binary is linked to. The three libraries that are linked to aBase.out are built into a1.out.

There are also important differences in other sections though. In the <_init> section the third line, <_DYNAMIC+0x1d0>, has been replaced with <__stack_prot+0x10>. Something that was dynamic now looks like it is part of a protected area of the stack (my guess at what prot stands for). Another change, expected but still surprising, is in the .plt section. It has changed significantly but not in the way I would have anticipated. There are no longer references to printf, as expected, because the compiled program no longer needs an entry in the procedure linkage table; it can call printf directly with callq 401680 <_IO_printf>. The surprising thing is that the .plt section isn’t empty. Why do we need a PLT if all the procedures are already stored in the file and called directly by the program? The small amount of research I did suggests that it is kept for performance reasons: the PLT is used to make sure that the proper version of a requested function is called for the specific CPU the program is running on, even if that CPU is different from the one it was compiled on.

GCC and the -fno-builtin Option (build a2.out)

The -fno-builtin option prevents the compiler from replacing calls to standard library functions with its own built-in versions or with cheaper equivalents. a2.out was built without it. In this program printf() is called with a plain string and no further arguments, so the compiler knows that it doesn’t need printf(); it can use puts() instead and get the same result with less resource usage. The compiler also leaves out the mov $0x0,%eax line. puts() is not a variadic function, so %eax no longer has to carry the number of vector registers used for arguments. It also uses nopw (nop with a length of word) instead of nopl for padding. As expected the two references to printf in the .plt have been changed to puts as well. The difference in file size for this change is negligible.
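The condition under which printf() can be swapped for puts() can be modelled in a few lines. This is a simplified sketch of the rule as I understand it, not GCC's actual implementation, and the function name puts_equivalent is made up: the format must contain no conversion and must end in a newline, because puts() adds the newline itself.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model: returns 1 and writes the puts() argument into out
   when the format is a plain string ending in '\n'; returns 0 when the
   call would have to stay a printf(). */
int puts_equivalent(const char *format, char *out, size_t out_size)
{
    size_t len = strlen(format);

    if (len == 0 || format[len - 1] != '\n') return 0;
    if (strchr(format, '%') != NULL) return 0;
    if (len > out_size) return 0;   /* len - 1 characters plus the NUL */

    memcpy(out, format, len - 1);   /* drop the trailing newline */
    out[len - 1] = '\0';
    return 1;
}
```

For "Hello World!\n" the model yields the string "Hello World!", which is what puts() would be handed in a2.out. A format such as "%d\n" is rejected, which is why builds that pass extra arguments keep calling printf().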

GCC and the -g Option (build a3.out)

-g is the debug option, so I can expect a smaller file when it is removed because the compiler will not include information for debugging purposes. I can expect all five .debug sections to be missing entirely. Comparing aBase-s.txt and a3-s.txt from the table, this is exactly what I see. The debug sections amount to an increase of about 10% in total file size. It is also important to note that the ELF file does not contain the source code; instead it references the original file.

Additional printf Arguments (build a4.out)

For this section I added 10 arguments to the printf() function to see how the compiled binary changes. The new <main> function contains the following additional lines:

40053a: 48 83 ec 08       sub $0x8,%rsp
40053e: 6a 0a             pushq $0xa
400540: 6a 09             pushq $0x9
400542: 6a 08             pushq $0x8
400544: 6a 07             pushq $0x7
400546: 6a 06             pushq $0x6
400548: 41 b9 05 00 00 00 mov $0x5,%r9d
40054e: 41 b8 04 00 00 00 mov $0x4,%r8d
400554: b9 03 00 00 00    mov $0x3,%ecx
400559: ba 02 00 00 00    mov $0x2,%edx
40055e: be 01 00 00 00    mov $0x1,%esi

There are 11 additional lines but I only added 10 arguments. The first line subtracts 8 from the pointer to the top of the stack. The sub command is used because on x86_64 the stack grows down, so subtracting reserves space; here the extra 8 bytes appear to act as padding that keeps the stack pointer aligned to a 16-byte boundary at the call. Looking at the next 10 lines it is interesting to note that it works backwards from the last value (10 or a) up to the first (1). Pushing the stack arguments in reverse order leaves the earliest of them closest to the top of the stack, so the called function can find them at fixed offsets without having to move around inside the stack. The other thing to note is that only 5 of the 10 values were pushed (the 5 pushq lines) because the other 5 were stored in registers. Each pushq takes 8 bytes, so together with the 8 bytes of padding the call uses 48 bytes of stack, and these are released after the printf() call with the line 48 83 c4 30 add $0x30,%rsp. Again, because the stack grows down, the add command is used to shrink it.
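The 0x30 in the add instruction can be reproduced with a little arithmetic. The sketch below assumes six integer argument registers, 8 bytes per stack slot, and a stack pointer that is 16-byte aligned before the arguments are pushed; the function name is made up for illustration.

```c
#include <stddef.h>

static const size_t INT_ARG_REGISTERS = 6;   /* %rdi, %rsi, %rdx, %rcx, %r8, %r9 */
static const size_t SLOT_BYTES = 8;          /* one pushq */
static const size_t STACK_ALIGNMENT = 16;

/* Bytes of stack used to pass total_args integer-sized arguments,
   including padding that rounds the total up to a multiple of 16.
   Assumes the stack pointer is 16-byte aligned before the arguments
   are pushed. */
size_t stack_bytes_for_args(size_t total_args)
{
    size_t on_stack = total_args > INT_ARG_REGISTERS
                          ? total_args - INT_ARG_REGISTERS
                          : 0;
    size_t bytes = on_stack * SLOT_BYTES;
    size_t remainder = bytes % STACK_ALIGNMENT;

    return remainder == 0 ? bytes : bytes + (STACK_ALIGNMENT - remainder);
}
```

The call in a4.out has 11 arguments (the format string plus 10 numbers), so 5 go on the stack: 5 x 8 = 40 bytes, padded to 48, which is 0x30.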

GCC and Functions (build a5.out)

For the fifth version I moved the printf() function to a new function called ‘output’. The new disassembled code looks like this:

0000000000400536 <output>:
400536: 55             push %rbp
400537: 48 89 e5       mov %rsp,%rbp
40053a: bf f0 05 40 00 mov $0x4005f0,%edi
40053f: b8 00 00 00 00 mov $0x0,%eax
400544: e8 c7 fe ff ff callq 400410 <printf@plt>
400549: 5d             pop %rbp
40054a: c3             retq 

000000000040054b <main>:
40054b: 55                   push %rbp
40054c: 48 89 e5             mov %rsp,%rbp
40054f: 5d                   pop %rbp
400550: c3                   retq
400551: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
400558: 00 00 00
40055b: 0f 1f 44 00 00       nopl 0x0(%rax,%rax,1)

As would be expected there is a new section called <output> that now contains almost all the same code as the old <main> because it now has all the functionality. The <main> section is slightly smaller but not by much. It still contains all its own initialization code. The thing I found interesting was that there is no call to the new function: <main> sets up its stack frame, pops the old base off the stack in the line 40054f: 5d pop %rbp, and returns. pop %rbp only restores the caller’s base pointer and does not transfer control, so in this listing <output> is never reached from <main>. I was unable to verify why the call is missing.

GCC and the -O Option (build a6.out)

The -O option is used to specify the desired level of optimization. -O0 is no optimization, -O1 is some optimization, -O2 is as much optimization as possible without increasing the size of the binary, -O3 is optimization at the cost of file size, and -Os is optimization for file size at the cost of efficiency. All of these options are fairly safe and will work in almost all cases. In this case the file size went up from 9.4K to 11K in order to make the program more efficient. Looking specifically at this example the most obvious thing is that the length of main has been shortened drastically.

0000000000400440 <main>:
400440: bf e0 05 40 00  mov $0x4005e0,%edi
400445: 31 c0           xor %eax,%eax
400447: e9 c4 ff ff ff  jmpq 400410 

Now only three lines, the main function still copies the address of the string into a register, but instead of doing all the initialization and cleanup it just jumps (not calls) to the printf() function. The jump means that it doesn’t plan on coming back so nothing is required after that line. Although all the same functions are in the .text sections of both files, <main> has been moved to the top in a6.out. Interestingly the increase in file size comes exclusively from including more information in the .debug sections. With this slight change to the code it looks like generating debug information becomes more complex.
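Replacing a call with a jump is possible when the call is the very last thing a function does, which is usually called a tail call. A minimal sketch in C of a call that is in tail position and one that is not (both function names are made up for illustration; whether the compiler actually emits a jump depends on the optimization level and is not guaranteed):

```c
#include <stdio.h>

/* The call to puts() is the last action and its result is returned
   unchanged, so it is in tail position: an optimizing compiler may
   jump to puts() instead of calling it and returning. */
int say_hello(void)
{
    return puts("Hello World!");
}

/* Not a tail call: the result of puts() is still needed after the
   call returns, so control has to come back here. */
int say_hello_checked(void)
{
    int rc = puts("Hello World!");
    return rc >= 0 ? 1 : 0;
}
```

Compiling both with -O0 and then with a higher level and comparing the objdump -d output is one way to see whether the callq in the first function turns into a jmpq.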


At the beginning of this lab I found it challenging to understand what was going on in the disassembled code but by the end of the lab it became much more natural to read and I found I did not have to look up what was going on. This simple program uses only a few parts of the x86 instruction set and the various compiler options did not have a very large effect on the program itself because with only one function there is not a lot that can be optimized or debugged. I enjoyed doing the research involved in answering the questions posed in this lab but it did take quite a while and I decided to split this into two posts to make it easier to read and have the topics more clearly divided.


Part 1

by maxwelllefevre at February 06, 2015 06:24 AM

Exploring ELF Files and Compiler Options (Lab 4, part 1)

The fourth lab in SPO600 is focused on understanding what happens to our code when we send it off to the compiler, specifically gcc for x86_64. In this post I will be looking at the contents of the output file and how it changes when different options are used or the code is changed slightly. This is a pretty broad topic and as a result I have divided it into two separate posts: this one, which covers ELF file structure and assembly language, and a second one, exploring a selection of gcc flags and their effect on the compiled ELF file.

The program I am working with is a basic Hello World! program in C. The code is below.

#include <stdio.h>
int main() {
   printf("Hello World!\n");
}

This is a simple program that does nothing except output the string to stdout.

ELF Files

Before I start getting into detail about the contents of the file created by the compiler I am going to go over its basic structure. For more detailed information have a look at this ELF format specification document. ELF stands for ‘Executable and Linkable Format’. For now, the sections of an ELF file that are important to this lab are:

Section Name  Contains…
.data/.data1  Initialized Data
.debug        Debug Info
.dynamic      Dynamic Linking Information
.dynstr       Strings for .dynamic
.dynsym       Symbols for Symbol Table
.fini         Code run on successful program exit
.init         Code that runs before the main program
.line         Line number information for debugging
.plt          Procedure Linkage Table
.rodata       Read-only data from the program
.text         Executable code for the program

There are two commands to view the contents of an ELF file, objdump and readelf. For this lab I used objdump almost exclusively. I also used a tool called FileMerge, which is part of the developer kit that comes with Xcode, to compare the output of the objdump command when used on different executables.

What parts of an ELF file are mine?

The compile command for the first build is $gcc -g -O0 -fno-builtin -o aBase.out hello.c. There are a number of options selected:

-g # enable debugging information
-O0 # do not optimize (capital letter 'o' and zero)
-fno-builtin # do not use built-in function optimizations

There were two questions posed at the beginning of the lab: “Which section contains the code [I] wrote?” and “Which section contains the string to be printed?” The answer to these questions is in the table above but it can also be found by looking at the output of objdump. The command objdump has a number of arguments that can be used to obtain different types of information but for my purposes I will mainly be using -d and -s. objdump -d disassembles the portions of the file containing code and displays them in a somewhat human readable format. objdump -s displays the full contents of each section. In the output of objdump -s aBase.out we can see the lines:

Contents of section .rodata:
4005d0 01000200 00000000 00000000 00000000 ................
4005e0 48656c6c 6f20576f 726c6421 0a00 Hello World!..

This tells me that the string “Hello World!” must be contained in the .rodata section. There is nothing in this set of output that is easily identified as part of the code though. For that I will use the objdump -d aBase.out command, which provides information divided into three columns. The first column is the memory address and won’t help us much. The second column is the hex code the processor uses; good to know, but it won’t help us identify our code. The third column is human readable assembly language, and this is what I want to have a look at. Reading down the assembly column I will eventually get to the line:

 400544: e8 c7 fe ff ff callq 400410 <printf@plt>

This line contains the printf function call that was used in the program. Furthermore it is in a subsection titled 0000000000400536 <main>: in the section .text so I can be reasonably sure this is the function I used in main().
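The second column of that line can be decoded by hand. The byte e8 is the opcode for a near call, and the four bytes after it are a signed 32-bit offset, stored least significant byte first, that is added to the address of the next instruction. A small sketch of that calculation (the function name rel32_target is made up for illustration):

```c
#include <stdint.h>

/* Target of a 5-byte near call/jump: one opcode byte followed by a
   little-endian signed 32-bit displacement, measured from the address
   of the next instruction. */
uint64_t rel32_target(uint64_t insn_addr, const uint8_t insn[5])
{
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;

    /* Interpret the 32 raw bits as a two's complement signed value. */
    int64_t displacement = raw >= 0x80000000u
                               ? (int64_t)raw - 0x100000000LL
                               : (int64_t)raw;

    return (uint64_t)((int64_t)insn_addr + 5 + displacement);
}
```

For the bytes e8 c7 fe ff ff at address 400544, the displacement is -0x139 and the next instruction is at 400549, so the target is 400410, the address shown next to <printf@plt>.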

That third column…

I described the third column of objdump -d aBase.out as ‘human readable’ and you have to admit it is way better than the second column but it is still far from a higher level language like C. There are a few commands that seem to be repeated more frequently than others; push, mov, callq, and pop. Some of these seem pretty obvious but I am going to define them anyways.

  • push is used to push (write) data onto the stack (memory used for storing a program’s variables, managed by the cpu, different from the heap which is managed by the program for the most part).
  • pop is used to get, or pop, data off the stack.
  • mov copies data from one location to another.
  • callq is a call to the defined procedure, in this case that is printf in the Procedure Linkage Table (plt).

A complete listing of assembly commands for x86_64 can be found here, it is a Wikipedia page but it is much easier to read than the source documents, and they can be found at the bottom of the wiki page if you are interested.

There is a second part to the assembly column that contains the details of what is happening. This part is not quite as obvious and requires a little bit of research to understand. There are two basic patterns here, a $ followed by a hex number, and a % followed by three alphanumeric characters. Anything prefixed by a ‘$’ is a constant and anything that has a ‘%’ is a register. In the x86_64 architecture some of the registers have fixed functions and some are multipurpose. The table below lists some of the more common ones.

Register Name  Purpose
%rbp           points to the base of the current stack frame
%rsp           points to the top of the current stack frame
%eax           holds a function’s return value (for main, the program’s exit code)
%r?x           64-bit register, for example %rax
%e?x           lower half of the corresponding %r?x register (32 bits)
%?x            lower half of the corresponding %e?x register (16 bits)

Have a look here for more details on register addresses on x86.
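The relationship between the 64-bit, 32-bit and 16-bit names in the table can be pictured as masking off the low bits of one value. This is only an illustration of which bits each name refers to; the function names are made up, and it is not how registers are accessed from C.

```c
#include <stdint.h>

/* The bits that a 32-bit name such as %eax covers within %rax. */
uint32_t low_32_bits(uint64_t r)
{
    return (uint32_t)(r & 0xFFFFFFFFu);
}

/* The bits that a 16-bit name such as %ax covers within %rax. */
uint16_t low_16_bits(uint64_t r)
{
    return (uint16_t)(r & 0xFFFFu);
}
```

If %rax held 0x1122334455667788, then %eax would read as 0x55667788 and %ax as 0x7788.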

What actually happens…

This is a line by line breakdown of what happens when the main function I wrote runs. First, this is the assembly code for main:

0000000000400536 <main>:
(1) 400536: 55 push %rbp
(2) 400537: 48 89 e5 mov %rsp,%rbp
(3) 40053a: bf e0 05 40 00 mov $0x4005e0,%edi
(4) 40053f: b8 00 00 00 00 mov $0x0,%eax
(5) 400544: e8 c7 fe ff ff callq 400410 <printf@plt>
(6) 400549: 5d pop %rbp
(7) 40054a: c3 retq 
(8) 40054b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Line one is push %rbp so the program is taking the current value of %rbp and pushing it onto the stack. In the second line, mov %rsp,%rbp, the program then copies the pointer to the current top of the stack (%rsp) into the pointer to the base of the stack (%rbp), defining the start of the stack frame for ‘main’. In line 3, mov $0x4005e0,%edi copies the location of the “Hello World!” string into the register %edi, which carries the first argument of a function call. We know this because a quick search of the output of objdump -s aBase.out for that memory location (4005e0) returns the line 4005e0 48656c6c 6f20576f 726c6421 0a00 Hello World!... The fourth line, mov $0x0,%eax, puts the value ‘0’ into %eax. It looks like it is setting up a return value, but on x86_64 a call to a function with a variable number of arguments, such as printf, uses this register to say how many vector registers hold arguments, and here there are none. Line 5 is the only line of code actually written by me. It is the call to the printf function, callq 400410 <printf@plt>. A search of the output of objdump -d aBase.out for memory location 400410 returns the following lines:

0000000000400410 <printf@plt>:
400410: ff 25 02 0c 20 00 jmpq *0x200c02(%rip)

This is a jump to the location of printf in the C library (stdio.h only declares it). printf then reads the address of the string to be printed from %rdi, the 64-bit register whose lower half is the %edi that line 3 filled in, displays the string, then returns to the main program. The next two lines, 6 and 7, are just cleanup code. Line 6, pop %rbp, pulls the old %rbp value off the stack and overwrites the current value of %rbp with it, and line 7, retq, tells the computer to jump back to the return address. Line 8, nopl 0x0(%rax,%rax,1), was a confusing line for me because ‘nopl’ is defined as ‘no operation’ with a size of long, so it does nothing. Because it sits after retq it is never executed as part of main; it is padding that fills the space up to the start of the next function.

Break Time

This wraps up the first part of lab four. You can find the second part here. In the second part the focus will be on discerning the effects of various compiler options.


ELF format specification document

x86 Instruction Set

x86 Registries

by maxwelllefevre at February 06, 2015 06:19 AM

Gabriel Castro

Keeping a debug key in your android project.

Android security is based on a package signature structure. Every application must have a unique identifier called an applicationID, and a signature that cannot change without the application being fully removed.

An applicationID takes the form of a Java package name, which is a string built from the reverse of the application’s domain name.

For example if you are working for a company called
creating a music player, the package name should be something along the lines of
As long as people only use domains that they own, this method ensures that it's not very hard to find an unused applicationID. The actual structure of the applicationID is not enforced by the system, so an application with the id "dsadawrew.dasdeqwade" would install without issues.

The signature requirement can cause issues when developing an application on more than one computer.
This is because the Android SDK on each computer automatically generates its own signing key in ~/.android/debug.keystore, which is used when installing a debug build of an application, so debug builds from different computers carry different signatures.

Luckily android's gradle build system lets us easily change what key to use for signing. So we can keep one in the git repository for everyone to use.

  1.  create file called debug_keystore.gradle in the root of the project with the following contents.
    android {
        signingConfigs {
            debug {
                storeFile project.file(new File(project.rootDir, 'debug.keystore'))
                storePassword 'android'
                keyAlias 'androiddebugkey'
                keyPassword 'android'
            }
        }
    }
  2. copy the debug key from ~/.android/debug.keystore to the root of the project, keeping the name 'debug.keystore'
  3. add the following to any applications build.gradle
    apply from: '../debug_keystore.gradle'
Anyone who now clones the repo will be able to build and install the application on any device without having to uninstall to move between computers.

by Gabriel Castro at February 06, 2015 04:30 AM

Glaser Lo

Experiment on XMPToolkit with Android NDK

Last week, I had an adventure compiling XMPToolkit with the NDK. It was a long and terrible trial. In order to compile it with the NDK, there were a few existing resources to use:


Exempi is a library (XMPToolkit included) for writing/reading information to/from image files, designed to work on Linux. However, while I assumed it might be easy to port to Android, I realized the developers are still trying to include the 2013 version of XMPToolkit in future updates. An old version of XMPToolkit definitely wouldn’t be the thing we are looking for.

Official XMPToolkit

Using the official XMPToolkit is definitely the way to go, though it could be hard to compile with the NDK. I ran into a lot of issues when I tried using the library with the NDK.

IDE issues

It is probably bad timing to use the NDK for our project. The reason is that Google presents Android Studio as the main IDE for Android development while, at the same time, the NDK is expected to be replaced in the near future. Therefore, there are only two strange choices left for us to use: “NDK with deprecated Eclipse ADT” or “unsupported NDK with Android Studio hacks”

Modified version

No matter what, I still wanted to try out any methods available. The modified version of XMPToolkit can be compiled into a static library for Android. After going through several blog posts, the way to include libraries is:

  • add ndk path to
  • put files with the name of libxxx.a into the src/main/jniLibs/(arch) folder (arch: x86, armeabi, armeabi-v7a, mips …)
  • modify build.gradle to include compiling options

However, it didn’t work for me. Android Studio wasn’t able to detect any libraries. XMPToolkit couldn’t be compiled as a shared library either, because the cmake/make files were modified for static linking only.


Another approach

The focal app is another good resource, because the XMPToolkit source code is included in the focal app with (Android make file) file. Though, there are no instructions for adding that external source code to an app project and file is used only for Eclipse. Therefore, I took it as a reference and tried compiling XMPToolkit inside Android Studio. After two days of struggling, it finally compiled successfully without all the useless files. Well, it turns out it is not a good result:

  • it takes 5 mins or more to compile. Not very long but could be very annoying.
  • the compile sometimes fails randomly because of multiple definitions of certain objects/variables
  • there are compile errors caused by compiling XMPCore and XMPFiles together (it seems there is no way to compile them separately)



After trying several methods, using XMPToolkit with the NDK is harder than I could imagine. It requires an understanding of several things: the gcc compiler, the NDK, definitions, and the gradle build system. Mohamed’s method is probably the better and simpler choice for us.

UPDATE: this might be helpful!

Experiment on XMPToolkit with Android NDK was originally published by Glaser Lo at Illusion Village on February 06, 2015.

by Glaser Lo at February 06, 2015 12:00 AM

February 05, 2015

Hong Zhan Huang

SPO600: The third lab… Profiling PH… ython

The 3rd lab plays off of the previous benchmarking lab where I benchmarked a build of PHP 5.6.4 on both the Australia and Red platforms. This next step is to profile PHP to get a better idea of how and where it is spending its time while doing its work.

In order to do that we will be using the gprof tool. A tutorial for how to operate gprof can be found here.

1. To set up profiling we have to add the -pg option to the applicable compiler and linker flags in the Makefile for PHP. To do this I manually inserted the option into the Makefile with vim, but another way to do it is to use environment variables with the configure tool to set the flags when you configure the build. After that we can use the make command to build the binary and make test to test that it was all done properly. An example of the commands to do this would be:

./configure CFLAGS=-pg LDFLAGS=-pg
make -j15
make test

2. Now that it’s all set up, what is supposed to occur is that running the PHP executable on some php script (in this case we would be using the previously mentioned benchmarking script) produces a gmon.out file in the current working directory. This gmon file contains profile data which one can then use with gprof like this:

gprof sapi/cli/php gmon.out > Profiling.txt

The Profiling.txt file will contain the results of the profiling in the form of a flat profile and a call graph. The former shows how much time a particular function takes up and how many times it is called. The latter shows the call tree (i.e. which functions called which child functions) and the time spent/times called breakdown for each function and its children.

Initially what occurred was that I wasn’t able to produce a gmon file when running the benchmarking script but was able to produce one when building the binary. I looked for flags that needed to be set, checked whether the configure tool had a profiling option to enable, tried setting the flags via environment variables, and lastly looked for child makefiles in subdirectories. In the end I wasn’t able to get anywhere. It was a rather perplexing problem that had me stumped until I read a post by fellow SPO600 classmate Max Lefevre. He documents his process of attacking the issue, which leads to something that circumvents the problem of being unable to produce a gmon file with PHP.  It appears that enabling the profiling -pg option is not so simple and may cause rather odd side effects (in this case it causes a fatal execution error).

3. The Profiling.txt file that was produced can be a bit much to read through so we will be using two tools that will help us visualize the results of the profiling. Those two tools are gprof2dot and dot. The command to use these tools in conjunction with gprof is as follows:

gprof sapi/cli/php gmon.out | gprof2dot | dot -Tpng > Results.png

The above command will pipe the results of the profiling into dot through gprof2dot and output the resulting graphic as a png file. That png file looks like this:

A rather elaborate red web appears. Somehow it doesn’t seem easier to read nor does it seem to be a result we can analyze. The red colour indicates that each process takes up 100% of the execution time however when we look at a short excerpt of the Profiling.txt file:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

 % cumulative self self total 
 time seconds seconds calls Ts/call Ts/call name 
 0.00 0.00 0.00 78920033 0.00 0.00 ZEND_JMP_SPEC_HANDLER
 0.00 0.00 0.00 47270005 0.00 0.00 ZEND_IS_SMALLER_SPEC_CV_CV_HANDLER
 0.00 0.00 0.00 46000026 0.00 0.00 ZEND_JMPZ_SPEC_TMP_HANDLER
 0.00 0.00 0.00 38000000 0.00 0.00 ZEND_PRE_INC_SPEC_CV_HANDLER
 0.00 0.00 0.00 36915769 0.00 0.00 _zend_mm_alloc_int
 0.00 0.00 0.00 36915766 0.00 0.00 _zend_mm_free_int
 0.00 0.00 0.00 36915752 0.00 0.00 _efree
 0.00 0.00 0.00 33395707 0.00 0.00 _emalloc
 0.00 0.00 0.00 28270004 0.00 0.00 ZEND_JMPZNZ_SPEC_TMP_HANDLER
 0.00 0.00 0.00 27000000 0.00 0.00 ZEND_IS_EQUAL_SPEC_CV_CONST_HANDLER
 0.00 0.00 0.00 10410289 0.00 0.00 zend_hash_get_current_data_ex

There is apparently no time accumulated at all. Without some measure of time it is difficult to analyze the results. This leads me to think we still haven’t been able to resolve the issue of profiling with PHP.

4. Well, now what? Feeling rather discontent with PHP at this point after spending a non-trivial amount of time working with it, I decided to go with another open source package to profile for the sake of having something that can be more easily analyzed for possible points of optimization. For that purpose I decided to repeat this lab with Python 3.4.2.

Cutting quickly to the chase, obtaining Python was done via wget in the same manner as with PHP. More importantly, the method to enable profiling was far simpler as there was an easy to spot profiling option in Python’s configure tool.

./configure --enable-profiling

After building the package I was able to quickly produce the profiling data. With Python I profiled the compiling of some code. That code was just a simple hello world program:

print ("Hello world");

The following is an excerpt of the flat profile text file created by the gprof tool:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
50.00      0.01     0.01    17545     0.00     0.00  lookdict_unicode
50.00      0.02     0.01     7216     0.00     0.00  PyObject_GenericGetAttr
 0.00      0.02     0.00    64611     0.00     0.00  visit_decref
 0.00      0.02     0.00    52429     0.00     0.00  visit_reachable
 0.00      0.02     0.00    47084     0.00     0.00  lookdict_unicode_nodummy

Perhaps a hello world program is too trivial in nature, but it does produce a result that can be examined. So, without further ado, here’s the same thing in graphical form:

At a cursory glance, it seems as though all the functions not coloured dark blue are where we should look for pain points. The first place I would look is the lookdict_unicode function, one of the lowest leaves in this tree: it consumes a good deal of the overall time while calling no child functions of its own. This function is part of dictobject.c, and I believe its purpose is to look up keys in a table.

To conclude, despite being unable to properly profile PHP (something to look into later) and having a rather trivial sample for profiling Python, I believe I now have a good grasp of how the profiling process works and can put it to use in the future.
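The "where to look first" step can also be done without the graph. The following is only a minimal sketch in Python, assuming every data row of the flat profile has the seven columns seen in the excerpt (real gprof output can leave some columns empty). The helper names parse_flat_profile and hot_spots are made up for this illustration and are not part of gprof.

```python
def parse_flat_profile(text):
    """Return one dict per data row of a gprof flat profile."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: every data row has all seven columns, as in the excerpt.
        if len(parts) != 7:
            continue
        try:
            percent = float(parts[0])
            self_seconds = float(parts[2])
            calls = int(parts[3])
        except ValueError:
            continue  # the column-header line also has seven words
        rows.append({"name": parts[6], "percent": percent,
                     "self_seconds": self_seconds, "calls": calls})
    return rows


def hot_spots(rows):
    """Rows with a non-zero share of the run time, largest share first."""
    timed = [row for row in rows if row["percent"] > 0]
    return sorted(timed, key=lambda row: row["percent"], reverse=True)


# A few rows copied from the Python excerpt above.
SAMPLE = """\
 time seconds seconds calls ms/call ms/call name
 50.00 0.01 0.01 17545 0.00 0.00 lookdict_unicode
 50.00 0.02 0.01 7216 0.00 0.00 PyObject_GenericGetAttr
 0.00 0.02 0.00 64611 0.00 0.00 visit_decref
"""

for row in hot_spots(parse_flat_profile(SAMPLE)):
    print(row["name"], row["percent"], row["calls"])
```

Run against the PHP excerpt, the same filter would return nothing, which matches the "no time accumulated" problem described earlier.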

Quest completed! Dropped PHP and picked up a python. End log of an SPO600 player until next time~

by hzhuang3 at February 05, 2015 08:16 PM

Anil Santokhi

SPO600 Lab 1

The two open source software projects I have chosen to research are Drupal and Android.

Drupal uses the GNU General Public License, version 2 or later. Code is submitted through patches. One must obtain git access in order to contribute code. To add a patch to Drupal, one picks an issue on the Drupal project one intends to contribute to and pulls the code into a new branch with the naming scheme “module-or-core-name-version-dev”. All of the work should be done in this branch.

After they believe they have solved the issue, they should commit the code. Once the code has been committed to the newly created branch, the developer must then create a patch file and upload it. Drupal suggests this format:

git diff 7.x-1.x > [project_name]-[short-description]-[issue-number]-[comment-number].patch
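As a small illustration of that naming format, here is a sketch of a helper that assembles the file name from its four parts. The function name patch_filename and the sample values are hypothetical, and joining the words of the description with underscores is my own assumption, made so that hyphens only separate the four fields.

```python
def patch_filename(project_name, short_description, issue_number, comment_number):
    """Build a patch file name in the format Drupal suggests."""
    # Assumption: words in the description are joined with underscores so that
    # the hyphens only separate the four fields.
    slug = "_".join(short_description.lower().split())
    return f"{project_name}-{slug}-{issue_number}-{comment_number}.patch"


# Hypothetical values, for illustration only.
print(patch_filename("views", "Fix pager offset", 1234567, 3))
```

The resulting name is what would follow the > in the git diff command above.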

The patch review process for Drupal is pretty simple, as explained in the diagram below (taken from the “Novice code contribution guide”):

patch process

Essentially you check the module or Drupal core for issues on the git repository, solve the issue to the best of your ability, create a patch, upload the patch for review and if everything checks out, the issue is marked as fixed and the patch is applied to the project.

An example of this process in its entirety can be seen here.

In this issue three people were involved – tadityar, pwolanin and Wim Leers. The patch was initially submitted by Wim Leers and it failed the test, therefore needed to be resubmitted. It was then fixed by pwolanin, which caused the code to pass the test. Finally tadityar fixed some styling issues with the code, primarily indentation and the patch was finally marked as complete. The whole process took two days. While only three people contributed actual code in the end, the comment section for the issue had a variety of users discussing the issue itself, other “drupalisms”, related issues and finally solutions. It was a collaborative process.

Drupal’s review process seems to be community based. There are site admins who are able to mark an issue as complete as well as a bot who tests the patch against the current code. Essentially the community works with itself to hammer out working code which is then tested by a bot and finalized into the project.

The advantage to this is that the code as a whole is looked over by many people and has many chances to improve. Furthermore, it is easy to jump in on a patch and help someone out. The code is tested automatically. As a result, code that does not work cannot be added to the project, which filters out poor work.

The downside to this is that anyone can jump in and help which may not always be a good thing, especially if the author is trying to solve an issue which may have just been solved by someone else who intended to help.

Android uses the Apache Software License, Version 2.0. Code is submitted through patches using Google’s Gerrit server. To add a patch to Android the author must pull the code from the project they wish to contribute to, add/remove code in the project and submit the changes via Gerrit for review. The code is then tested locally for functionality and styling. If it passes the reviewer(s)’ criteria, the code is then merged to the project’s public repository. If the code is successfully merged, it is then submitted and successfully added.

If at any point the code does not pass the styling, functional or merging criteria of the reviewer(s) as well as the automated Gerrit server, the code is sent back to the author for revision.

Below is a more detailed look at the actual code submission process.

workflow diagram

An example of this process can be seen here.

In this example, Fredrik Roubert submitted a patch to the Gerrit server. The code was reviewed by Neil Fuller and Narayan Kamath. Neil then verified the code and merged it into the project. The Gerrit server confirmed this and the process was concluded. The whole process took 13 minutes on the developer’s side. However the buildbot needed 10 hours to conclude the issue. There was no visible discussion beyond the bare minimum of what needed to be discussed for the actual submission.

Android’s review process is very community based. Developers come together and review and verify each other’s code before submitting it to the server. The server has a bot which will compile the code and ensure it is actually working.

The advantage of this is the many people involved in the submission process. The code must be verified by an actual human and it is easy to see who is working on what at any given time. The code is then tested automatically after the merge to ensure everything is working correctly.

The disadvantage to this approach is that the entire process takes a while to be finalized by the server’s buildbot.

by adsantokhi at February 05, 2015 04:03 AM

February 04, 2015

Bruno Di Giuseppe

Approaching Data Visualization from all sides

The preliminaries

So I was given this task: find out which data visualization tool was best for the project. It needed to be responsive, perform well across platforms (it should work flawlessly on mobile, desktop, tablets, fridges, etc.), be interactive, be easy to implement, and be able to draw multiple chart and graph styles. So, the first step: research.

So, head to your favourite research engine and use all of your research skills. After a quick one (pun intended), a friend helped me find this nice table. It worked very nicely as a first triage. Since some of our colleagues knew D3.js and the table showed it to be pretty useful for us, that’s the one I started with.

The first look at D3

So, D3 here we go. One of the hardest examples (as we’d planned thus far) to build was an interactive donut chart. By that, I mean a donut chart where you click on a slice and it slides out along its normal. Let the image below explain it for me:

That’s pretty much it

After some doodling around, I managed to build a donut chart. The code for it is a little messy at first, but simple too. You start by setting up your “canvas” size. Just remember that D3 works with SVG, not actual canvas. The thing about SVG is that because it’s Scalable Vector Graphics, it can be scaled to any size you want with no distortion, unlike drawing pixels to the screen.

var width = 960,
    height = 700,
    outerRadius = Math.min(width*0.6, height*.6) * .5 - 10,
    innerRadius = outerRadius * .6;

var n = 4,
    data0 = [30,20,40,10];

var desc = ["Active", "Almost Active", "Low Active", "Sedentary"];
var color = ["green", "yellow", "orange", "red"];

var arc = d3.svg.arc();

var pie = d3.layout.pie()
var p;

var toolTip =;body").append("div")
                .style("padding","2px 10px")
                .style("background", "#cccbbb")

var svg =;body").append("svg")
    .attr("width", width)
    .attr("height", height);

    .attr("class", "arc")
    .attr("x", width/2)
    .attr("y", height/2)
    .attr("transform", "translate(" + width / 2 + "," + height / 2 + ")")
    .attr("desc", function(d,i){return desc[i];})

    .on("mouseover", function(d,i){
              .style("display", "block");
      toolTip.html(;desc") + " : " + + "%")
              .style("left", (d3.event.pageX + 5) + "px")
              .style("top", (d3.event.pageY + 5) + "px");

    .on("mouseout", function(d,i){
              .style("display", "none");

    .attr("fill", function(d, i) { return color[i]; })
    .attr("d", arc)

function arcs(data0) {
  var arcs0 = pie(data0),
      i = -1,
      arc; // local, so the global arc generator is not overwritten
  while (++i < n) {
    arc = arcs0[i];
    arc.innerRadius = innerRadius;
    arc.outerRadius = outerRadius;
  }
  return arcs0;
}

So this code is for simply creating the donut. 71 Lines to set up some stuff and generate the chart.
Now, for the interactivity part:


D3 offers a good deal of events and interaction. As you can already see in the code above, it’s got some mouseover and mouseout events, but that’s for starters. We need it to move. We need to Frankenstein it.

.on("click", function (d, i) {
 p =;

 .attr("transform","translate("+ (width/2) +","+(height/2)+")");

var cumulatedAngle = 0;
 for( var j = 0 ; j < (i) ; j++){
 cumulatedAngle += chartValues[j];
 }
 cumulatedAngle += 0.5 * chartValues[i];
 var angle = ( (cumulatedAngle * 2.0 * Math.PI) / total )

 var ang = angle - Math.PI * 0.5;

 var _x = Math.cos(ang) * 100;
 var _y = Math.sin(ang) * 100;

if(p.attr("toggle") == "true"){
 .attr("transform","translate("+ (_x + width/2) +","+(_y + height/2)+")");

How did we do it? Pretty simple. We have an applied math masters on the team. That’s how. Well, I managed to get it started, but he’s the one who got it working in the end. Here we get the total of the data values, do some math to get the sine and cosine of the angle at the centre of the slice, and move the slice along its normal. This post is for comparing the functionality of the libraries, not for explaining how the code above works; if you have any doubts about it, feel free to ask me in the comments.

What I want to highlight here is the lines that involve D3 code. See how the event listener is created, just like before, with .on("identifier", function(d, i){}), and how you can select the clicked piece and, at the same time, all of the slices. D3 has some very good tools for controlling DOM elements. The Document Object Model is, putting it simply, all of the elements on the screen.

This aspect of D3 is just what we need. It creates everything we need. Bar and line graphs are pretty straightforward compared to this. It’s got labels, interaction, the visuals, everything. D3 is then a good candidate for our task.
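For anyone who wants to check the math outside the browser, here is a small Python sketch of the same calculation. It assumes, like the click handler, that slices start at 12 o'clock and run clockwise, and that y grows downward as it does in SVG. The function name slice_offset and its distance parameter are mine, not part of D3.

```python
import math


def slice_offset(values, index, distance=100.0):
    """Return the (x, y) translation that pushes one slice out along its normal."""
    total = sum(values)
    # Everything before the clicked slice, plus half of the slice itself,
    # gives the position of the slice's centre line.
    cumulated = sum(values[:index]) + 0.5 * values[index]
    angle = cumulated * 2.0 * math.pi / total
    # The chart starts at 12 o'clock, so rotate back by a quarter turn.
    ang = angle - math.pi * 0.5
    return math.cos(ang) * distance, math.sin(ang) * distance


# Same data as the donut above.
for i in range(4):
    print(i, slice_offset([30, 20, 40, 10], i))
```

With four equal slices the first one moves up and to the right, and the second one down and to the right, which is what you would expect from looking at the chart.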

But wait…

There’s more. There’s this library called RaphaelJS. It also works with SVG, it’s got some interactivity, it builds many kinds of graphs…. huh. Nice. Let’s take a look at it.

Raphael.fn.donutChart = function (cx, cy, r, rin, values, labels, stroke) {
    var paper = this,
        rad = Math.PI / 180,
        chart = this.set();
    // Draw one donut sector between two angles (in degrees).
    function sector(cx, cy, r, startAngle, endAngle, params) {
        var x1 = cx + r * Math.cos(-startAngle * rad),
            x2 = cx + r * Math.cos(-endAngle * rad),
            y1 = cy + r * Math.sin(-startAngle * rad),
            y2 = cy + r * Math.sin(-endAngle * rad),
            xx1 = cx + rin * Math.cos(-startAngle * rad),
            xx2 = cx + rin * Math.cos(-endAngle * rad),
            yy1 = cy + rin * Math.sin(-startAngle * rad),
            yy2 = cy + rin * Math.sin(-endAngle * rad);
        return paper.path(["M", xx1, yy1,
                           "L", x1, y1,
                           "A", r, r, 0, +(endAngle - startAngle > 180), 0, x2, y2,
                           "L", xx2, yy2,
                           "A", rin, rin, 0, +(endAngle - startAngle > 180), 1, xx1, yy1, "z"]
        ).attr(params);
    }

    var angle = 0,
        total = 0,
        start = 0,
        process = function (j) {
            var value = values[j],
                angleplus = 360 * value / total,
                popangle = angle + (angleplus / 2),
                color = Raphael.hsb(start, .75, 1),
                ms = 500,
                delta = 30,
                bcolor = Raphael.hsb(start, 1, 1),
                p = sector(cx, cy, r, angle, angle + angleplus, {fill: "90-" + bcolor + "-" + color, stroke: stroke, "stroke-width": 3}),
                txt = paper.text(cx + (r + delta + 55) * Math.cos(-popangle * rad), cy + (r + delta + 25) * Math.sin(-popangle * rad), labels[j]).attr({fill: bcolor, stroke: "none", opacity: 0, "font-size": 20});
            angle += angleplus;
            start += .1;
        };
    for (var i = 0, ii = values.length; i < ii; i++) {
        total += values[i];
    }
    for (i = 0; i < ii; i++) {
        process(i);
    }
    return chart;
};

var values = [],
    labels = [];
$("tr").each(function () {
    values.push(parseInt($("td", this).text(), 10));
    labels.push($("th", this).text());
});
Raphael("holder", 700, 700).donutChart(350, 350, 200, 50, values, labels, "#fff");

To begin with, Raphael doesn’t have direct support for donut charts. You have to create that function yourself: you sort of extend Raphael’s functionality by adding a donutChart() function through Raphael.fn.

The thing about Raphael is, because it’s not data-driven like D3, you have to set up more stuff. With D3, pretty much all you need to provide is the data, but with Raphael, you need to actually create the points for the vector drawing.
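To show what "creating the points" means, here is a sketch in Python of the same geometry the sector() function computes: the four corner points of one donut sector and the SVG path string that joins them. It mirrors the JavaScript above rather than any Raphael API. The names donut_sector_path and _fmt are made up for this illustration.

```python
import math


def _fmt(value):
    """Print floats with two decimals and leave integers (flags, radii) alone."""
    return f"{value:.2f}" if isinstance(value, float) else str(value)


def donut_sector_path(cx, cy, r, rin, start_angle, end_angle):
    """Return the SVG path data for one donut sector (angles in degrees)."""
    rad = math.pi / 180
    # Corner points on the outer radius.
    x1 = cx + r * math.cos(-start_angle * rad)
    y1 = cy + r * math.sin(-start_angle * rad)
    x2 = cx + r * math.cos(-end_angle * rad)
    y2 = cy + r * math.sin(-end_angle * rad)
    # Corner points on the inner radius.
    xx1 = cx + rin * math.cos(-start_angle * rad)
    yy1 = cy + rin * math.sin(-start_angle * rad)
    xx2 = cx + rin * math.cos(-end_angle * rad)
    yy2 = cy + rin * math.sin(-end_angle * rad)
    # SVG needs to know whether the arc covers more than half a circle.
    large_arc = int(end_angle - start_angle > 180)
    parts = ["M", xx1, yy1,
             "L", x1, y1,
             "A", r, r, 0, large_arc, 0, x2, y2,
             "L", xx2, yy2,
             "A", rin, rin, 0, large_arc, 1, xx1, yy1, "z"]
    return " ".join(_fmt(part) for part in parts)


print(donut_sector_path(350, 350, 200, 50, 0, 90))
```

All of this is work that D3's arc generator does for you once it has the data.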

It’s a little more cumbersome and harder to understand, with a lot more variables and math involved. I got it from an online source, but I forgot to save the link, so this code is not entirely mine.

() {
 p.stop().animate({transform: "t100,100"}, ms, "elastic");
 txt.stop().animate({opacity: 1}, ms, "elastic");

var i =;
 var cumulatedAngle = 0;

 for( var j = 0 ; j < (i) ; j++){
 cumulatedAngle += values[j];
 }
 cumulatedAngle += 0.5 * values[i];
 var angle = ( (cumulatedAngle * 2.0 * Math.PI) / total )

 var _x = Math.cos(angle) * 100;
 var _y = -Math.sin(angle) * 100;

if(p.attr("toggle") == "true"){
 p.stop().animate({transform: "t"+_x+","+_y+""}, ms, "elastic");


As for the interactivity part, it follows the same principles, since math is math no matter the language. But note especially how the animation part works differently from D3. With Raphael I couldn’t get the slices to move back; I couldn’t find a way to make it work as intended, like D3 did.

Once it’s gone it’s gone

But wait again…

A friend of mine, the same math masters, comes to me saying that he found out about this other library. It’s called C3. And guess what, it’s built on top of D3. It’s meant to be easier than D3 and have all of its functionality. I started off with the example C3 offers on the site, you can see it’s so simple that it gets to the point of being dumb.

var chart = c3.generate({
    data: {
        columns: [
            ['data1', 30],
            ['data2', 120],
        ],
        type : 'donut',
        onclick: function (d, i) { console.log("onclick", d, i); },
        onmouseover: function (d, i) { console.log("onmouseover", d, i); },
        onmouseout: function (d, i) { console.log("onmouseout", d, i); }
    },
    donut: {
        title: "Iris Petal Width"
    }
});

15 lines of code and you have your little donut chart. Pretty neat, eh? And try hovering the mouse over it. Clicking on the chart legend. So amazing. Isn’t it awesome? No.

Because it’s got so many built in functionalities, you have to take it all in. You can switch some toggles, deactivate some stuff, but you don’t have complete control over it.

This is my code for the solution:

var columnsNo = 4;
var p;
var h = 600;
var w = 1000;
var chartValues = [30, 25, 10, 22];
var total = 0;
for(var i = 0; i < columnsNo; i++)
    total += chartValues[i];
var das;
var chart = c3.generate({
data: {
selection: {
enabled: false

labels: false,
order: null,
columns: [
['Active', chartValues[0]],
['Almost Active', chartValues[1]],
['Not Very Active', chartValues[2]],
['Sedentary', chartValues[3]]
type : 'donut',
// onmouseover: function (d, i) {event.preventDefault(); return;},
// onmouseout: function (d, i) {event.preventDefault(); return;},
onclick: function (d, i) {
p =;

das = d3.selectAll("path");
for( var j = 0 ; j < columnsNo; j++ ){
if([0][j]).attr("toggle") == "true" ){[0][j])
.attr("transform","translate("+ (0) +","+(0)+")")
.attr("toggle", "false");

var cumulatedAngle = 0;
for( var j = 0 ; j < (d.index) ; j++){
cumulatedAngle += chartValues[j];
}
cumulatedAngle += 0.5 * chartValues[d.index];
var angle = ( (cumulatedAngle * 2.0 * Math.PI) / total )

var ang = angle - Math.PI * 0.5;

var _x = Math.cos(ang) * 100;
var _y = Math.sin(ang) * 100;

if(p.attr("toggle") == "true"){
.attr("transform","translate("+ (_x) +","+(_y)+")");
tooltip: {
show: true
size: { height: h, width: w },
colors: { data1: 'green',
data2: 'yellow',
data3: 'orange',
data4: 'red'
donut: {
label : { show:false },
title: "Active Population",
width: 100,


See how I had to try to deactivate the mouse functions manually. But that didn’t work. See how I had to think of a different way to get the pieces to explode individually. But you can still see how easy and quick it is to set up compared to the former 2 examples, and in the onclick handler I’m able to use D3’s selecting and manipulating functionality to play with the DOM elements.

Looks nice

So, with C3, I’m able to create charts very easily, pretty charts actually, I can apply the interactivity I need and….. it “works”. It “works” because it’s not entirely under my control. I believe that, because of some of its built-in events, I get this bug: if I move the mouse away while the slice is still sliding, the slice freezes in place.

Check this out

That’s something for later. I have to check some other stuff, work my way around and ask the forums, that kind of thing.

Wrapping Up

Raphael is more focused on design and art than on data visualization, so it’s not as easy as D3 for accomplishing this task. At least for now, let’s put Raphael aside. Raphael is nice, it is good, but it’s not the tool for the job. The important thing I’m trying to show here is not the code itself, but the fact that you can accomplish the same task in a million different ways. There are a lot of different tools for different tasks. You should first analyse what you want to do, so you know what you are looking for in a tool and what you don’t need, because more than you need can become overkill and do more harm than good. C3 has more than I need, but in a good way: it is exactly what I need, it’s easy, responsive and interactive, and it will cut a lot of the monkey work. Although that bug is getting in the way, the choice is worth the trouble of finding a solution for it.

Filed under: CDOT, Javascript

by brunodigiuseppe at February 04, 2015 10:17 PM

February 03, 2015

Ryan Dang

CCimage Release 0.1

I’ve been working on the API for adding a CC license as my first release for this project.

The idea is to keep the API simple enough that developers can easily use it. The two public methods that the library should have, and that developers will mostly use, are addLicense and extractLicense.


The addLicense method takes 2 parameters. The first is the type of license that will be embedded in the image. The second is the Uri of the image that was taken.

The extractLicense method takes 1 parameter, the Uri of the image from which to extract the license data. It returns a URL string pointing to the CC license.

With these 2 methods, developers will be able to quickly integrate the CCimage library into their apps.


For Android developers, there are 3 possible ways to add a CC license to an image.

If the developers create the camera app using an Intent, they can use our CCimage library by calling it inside the overridden onActivityResult method


If the developers create the camera app using the Camera API (deprecated), they can use our CCimage library by calling it inside the Camera.PictureCallback. A simple example is


If the developers create the camera app using the android.hardware.camera2 API and the file is converted to .dng format,

they can use our CCimage library inside one of these 3 public methods:





by byebyebyezzz at February 03, 2015 10:13 PM

Kieran Sedgwick

Demo: Thimble just levelled up

Ever since Mozfest of 2013, CDOT has wanted to power up Webmaker’s Thimble code editor by combining it with Adobe Brackets, a fully featured desktop code editor we’d hacked to work in a browser. Now, we have our first prototype of this combination in action!

Screen Shot 2015-02-03 at 1.55.47 PM

Here’s the quick tl;dr:

Takeaway 1: Much features, very free

Swapping out Thimble’s analogue of a combustion engine for the analogue of space wizardry in the form of Brackets means we get all of the Brackets functionality we want with near-zero development cost. Want an example? How about inline colour previews of CSS on hover?

Screen Shot 2015-02-03 at 1.29.55 PM

You want MORE you say? Then how about an inline colour picker for that CSS?



Inline preview of img tags? Sure!

Screen Shot 2015-02-03 at 1.29.42 PM

Takeaway 2: I hear you like extensions, so have some

Brackets comes with a built-in extension system. Do we need to add a feature we always wanted? Or one we lost in the transition to using Brackets? Build it as an extension. In fact, that’s what we’re doing next – replacing the features we love about Thimble with extension equivalents.

Takeaway 3: Oh, that old UI element? Totally optional

End game is a Thimble that can expose new tools to users layer by layer as they progress as developers. Maybe they don’t need a tool bar to start! Disable it. Maybe they need to use a full file tree instead of a single html file. Enable it! Linting code, multi-file projects, all of these are possibilities.

What do you think? Around the office, we gave it a good ole’

Screen Shot 2015-02-03 at 1.26.50 PM

by ksedgwick at February 03, 2015 06:57 PM

Andrew Benner

Bramble — thimbleProxy Extension

The search continues as we try to disable user interface functionality. We have been able to disable the sidebar, menu, and toolbar. The challenge we are facing right now is hiding that functionality when the Brackets editor starts up. Right now the entire editor is rendered, then the extension kicks in and hides everything. This displays a flash of the full editor followed by the editor layout we are trying to achieve. We are working to eliminate that flash of the full editor. We can achieve this goal if we alter the Brackets code, but our overall goal is to leave the Brackets code alone and incorporate the features we want through extensions. Brackets seems to load the entire editor first and only then load the extensions. We are trying to figure out a workaround.

I have shifted to a new piece of work. The user interface work is being taken over by two other members of the team. I am working on an extension with another team member. The extension will be able to take the code in the Brackets text editor and send it through a postMessage to its parent. This extension will allow Brackets to communicate with Thimble.

The extension is called thimbleProxy and can be viewed on GitHub. The first thing that needed to happen is to load all the dependencies, set our variables, and get a reference to the iframe parent.

Screen Shot 2015-02-03 at 11.24.14 AM

Then we must force Brackets to open up a new document so there is something to communicate to Thimble. When it opens we write a file with the initial source code.

Screen Shot 2015-02-03 at 11.25.13 AM

To force the Brackets code to enter the if statement necessary to open a new file we must run these statements.

Screen Shot 2015-02-03 at 11.24.40 AM

Then we call getActiveEditor on the Brackets editor so we can retrieve the source code. Once we have the reference to the iframe’s parent window, we send the source code to Thimble.

Screen Shot 2015-02-03 at 11.24.52 AM

by ajdbenner at February 03, 2015 06:01 PM

Koji Miyauchi

Hello world!

This is your very first post. Click the Edit link to modify or delete it, or start a new post. If you like, use this post to tell readers why you started this blog and what you plan to do with it.

Happy blogging!

by kojimiyauchi at February 03, 2015 05:40 PM

Alfred Tsang

Compiled Hello.c Lab 4

I added the -static option to the gcc command. With the first three options (-g -O1 -fno-builtin) plus -static, the file was nearly 1000 times larger than without -static. The output without -static is approximately 8248 bytes, so the statically linked file is roughly 8,248,000 bytes.

I removed the -fno-builtin option from the compilation of the Hello.c file. The file compiled without -fno-builtin was 3 bytes larger than the file compiled with it.

I removed the -g option from my initial compilation command.  There was nothing when I displayed the contents.

I added two arguments to the printf function in the Hello.c file and I got a couple of moves.  It does not matter how many arguments are put in- the first one goes in EDI, the other ones are rearranged.

I put the printf statement in the output() function.  This resulted in several moves and function calls being placed in a different order.

I removed the -O0 option and added the -O3 option to the command and it resulted in the file being processed faster than usual.

by kaputsky263 at February 03, 2015 01:20 PM

Jordan Theriault

Taxonomy of Extensible Metadata Platform (XMP) with a Creative Commons License


Adobe’s Extensible Metadata Platform is capable of the “creation, processing, and interchange of metadata” (Adobe, 2) and most resembles the well-known XML format. XMP packets are already widely used in image files: Adobe has integrated XMP into the best-known names in photo editing (Photoshop), 2D graphics creation (Illustrator), video editing (Premiere) and photo management (Bridge), making it the most logical choice for globally defining image metadata standards. This includes licensing information, but XMP is mostly used for last-modified date, author, and other common metadata. The XMP library is available from Adobe in C++ and can be downloaded here. Among a long list of functions, it allows the programmatic reading and writing of XMP packets.

Creative Commons provides simple, easy licenses that anyone can attribute to their works. These licenses have a wide variety of specifications that include allowing for share-alike, commercial and noncommercial uses. The fact that these can so easily be changed and still hold their legal meaning makes the Creative Commons licenses a fantastic choice for those working with images.

XMP Properties – Content Analysis

Among the large number of fields available in the XMP protocol, there are three fields which will be useful for adding the license: “Copyright Notice” (string), “Copyright Status” (boolean), and “Rights Usage Terms” (string). For proper attribution, it is also useful to include the option to name an author for the work, but this is not necessary in the basic case.

Since XMP is derived from XML, the length of these fields is not limited by anything other than what the image format can hold. However, a large field may bloat the image file unnecessarily. The full text of a Creative Commons license varies slightly depending on the particular customizations but is approximately 1600 words. Since these licenses are meant to end up in most image files, including full licenses could cause a great deal of clutter on the Internet. Therefore, it would be preferable to include a uniform resource identifier (URI), which Creative Commons provides for all license variations. Inserting this into the “Rights Usage Terms” field would be sufficient for providing these terms.
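To make the idea concrete, here is a rough Python sketch of a packet that carries only those three fields, with the license URI as the usage terms. It assumes the usual mapping of the three labels to XMP properties as I understand it (“Copyright Notice” to dc:rights, “Copyright Status” to xmpRights:Marked, “Rights Usage Terms” to xmpRights:UsageTerms) and leaves out the xpacket wrapper that a real embedded packet has. The function names and the sample author are hypothetical, and this uses Python's standard XML module rather than Adobe's XMP library.

```python
import xml.etree.ElementTree as ET

X = "adobe:ns:meta/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XMP_RIGHTS = "http://ns.adobe.com/xap/1.0/rights/"
XML = "http://www.w3.org/XML/1998/namespace"

# Use the conventional prefixes when the tree is written out.
for prefix, uri in (("x", X), ("rdf", RDF), ("dc", DC), ("xmpRights", XMP_RIGHTS)):
    ET.register_namespace(prefix, uri)


def add_lang_alt(parent, tag, text):
    """Add a language-alternative property with a single x-default entry."""
    prop = ET.SubElement(parent, tag)
    alt = ET.SubElement(prop, f"{{{RDF}}}Alt")
    item = ET.SubElement(alt, f"{{{RDF}}}li", {f"{{{XML}}}lang": "x-default"})
    item.text = text


def build_license_xmp(copyright_notice, license_uri):
    """Return a minimal XMP document holding the three rights fields."""
    root = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    add_lang_alt(desc, f"{{{DC}}}rights", copyright_notice)       # Copyright Notice
    ET.SubElement(desc, f"{{{XMP_RIGHTS}}}Marked").text = "True"  # Copyright Status
    add_lang_alt(desc, f"{{{XMP_RIGHTS}}}UsageTerms", license_uri)  # Rights Usage Terms
    return ET.tostring(root, encoding="unicode")


# Hypothetical author; the URI is the Creative Commons Attribution 4.0 license.
packet = build_license_xmp("(c) 2015 Example Author",
                           "https://creativecommons.org/licenses/by/4.0/")
print(packet)
```

The whole packet stays at a few hundred bytes, compared with the roughly 1600 words of a full license text.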

Comparing Packets

Creative Commons provides a tool for generating an XMP packet which can be found here.

The packet generated by the tool available on the Creative Commons website does not make use of “Rights Usage Terms” and instead uses a description, with an XML namespace named “cc”. Although it has proper XML syntax and does not break XMP syntax, this does not closely follow the XMP specification as per Adobe’s documentation. It may be possible to include the namespace if there is a specific reason for it, but I can’t see an obvious one. This packet does include the “Copyright Notice” and “Copyright Status”, which are correct. However, the “Copyright Notice” is repeated for the languages “en”, “en_US” and “x-default”, which could be offered as an optional feature within the library.

I used Adobe Bridge to produce an XMP packet using the terms from the Creative Commons generated packet, to compare. The one produced was very similar, but excluded the aforementioned XML namespace and did not repeat the notice for each language option. In short, the packet generated by Adobe Bridge seems to be a leaner XMP packet and conforms perfectly to the XMP specification while providing a fully valid license.

Take a look at the packets. The first is the one generated by the Creative Commons XMP packet tool, and the second is the one I generated using Adobe Bridge.

  1. Creative Commons Generated XMP Packet
  2. Adobe Bridge Generated XMP Packet

I have attempted to contact Creative Commons Metadata team, and I am awaiting correspondence to inquire about the XMP formatting and other questions regarding embedding metadata. If you have any comments or questions regarding my findings, you can find my contact information under “Contact” on my website.

by JordanTheriault at February 03, 2015 03:36 AM

Hosung Hwang

Crosswalk – Cordova plugins compatibility

Adding Cordova Plugin to the Cordova project.

$ cordova create Testproject com.test Testproject
$ cordova platform add android
$ cordova plugin add

After making the Cordova project, the Android platform needs to be added, and then plugins are added.
All of these steps work with the cordova command.

Adding Cordova Plugin to the Crosswalk-Cordova project

$ ./create Testproject org.crosswalkproject.sample Testproject
$ sudo npm install -g plugman
$ plugman install --platform android --project . \

The first command creates a Crosswalk-Cordova project for Android.
“platform add android” is not needed, because this distribution is Android-only and already contains an Android project.
According to the instructions, another tool called “plugman” is used to add a Cordova plugin to a Crosswalk project. The second command installs it using npm (the Node.js package manager).
The third command uses plugman to add a particular version of a Cordova plugin. However, plugman comes from the Cordova project and is basically the same as “cordova plugin add”. Although plugman is a plugin management tool and seems to have more functionality, I expect there is no difference when it comes to adding a plugin.

Cordova – Crosswalk plugin compatibility
Technically, Cordova plugins should work in Crosswalk Cordova. However, Crosswalk publishes plugin compatibility lists, which means each Crosswalk distribution is tested against particular plugin versions.
Crosswalk Cordova for Android: plugin compatibility lists

Following table is for Crosswalk Cordova 7; Current stable version is

API                                 Installation URL    Notes
Battery Status
Device Motion (Accelerometer)
Device Orientation (Compass)
File Transfer
Geolocation                                             Recommend using Crosswalk’s geolocation API
Media                                                   Install the File plugins before this one
Media Capture                                           Install the File plugins before this one
Network Information (Connection)

Next Step
– Test plugins other than the core plugins in the compatibility list. (709 plugins)
This page says the AdMob plugin and the IAP plugin are also used in Crosswalk Android. What makes that possible needs to be examined.
– The compatibility list only goes up to Crosswalk 7, while the current stable version is 10. The most recent compatibility list needs to be checked.
– Check compatibility between the Cordova command line interface and the Crosswalk command line (a list is needed).

by Hosung at February 03, 2015 02:22 AM

James Boyer

Lab 3

Lab 3, Python Profiling

We decided not to continue using MySQL and chose to download Python instead. We thought it would be a good idea to test it on both servers: australia (64-bit x86) and red (64-bit ARM).
We used Python 3.4, using this link and wget to get it onto the server. After we un-tarred it (tar -xvf [filename]) we ran ./configure, which gave us a makefile, and then we made a few changes.
We changed line 72 in the makefile from
BASECFLAGS=         -Wno-unused-result
to
BASECFLAGS=         -Wno-unused-result -pg

This makes Python write a gmon.out file when it executes; the file contains all the profiling information.
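The same Makefile change can be scripted instead of edited by hand. The following is a minimal sketch, assuming the variable sits on a single line as shown above; the function name add_profiling_flag is hypothetical, and the script works on the text of the Makefile rather than on any particular line number.

```python
import re


def add_profiling_flag(makefile_text, variable="BASECFLAGS", flag="-pg"):
    """Append `flag` to the line that assigns `variable`, if it is missing."""
    pattern = re.compile(r"^%s\s*=(.*)$" % re.escape(variable))
    result = []
    for line in makefile_text.splitlines():
        match = pattern.match(line)
        # Only touch the assignment line, and only if the flag is absent.
        if match and flag not in match.group(1).split():
            line = line.rstrip() + " " + flag
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    original = "BASECFLAGS=         -Wno-unused-result\nOPT=-O2\n"
    print(add_profiling_flag(original), end="")
```

Running the function twice leaves the text unchanged the second time, so it is safe to re-run after ./configure regenerates the Makefile.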
We now needed to run something with Python so that we would have a valid gmon.out file.
We found this resource: tests, which showed us how to run the Python tests using the command
./python -m test

On Australia:
This went through 389 tests, and when it completed we had a gmon.out file.
To make the profile easier to read, our professor suggested using gprof2dot to generate a PNG image of the profile. The command is:
gprof ./python | gprof2dot | dot -Tpng > Profile.png  (full image)

On Red: 
This went through most of the tests but stalled at the last one with a message:
_mcleanup: gmon.out: No such file or directory 
(I later realized that this message popped up a lot during the tests, which suggests that it is not related to the stalling.)
After waiting a long time I used Ctrl-C to get out of the test run, and I still had a gmon.out covering 388 tests. Again, I used gprof2dot to get an image of the profile (full image).

Although the two pictures appear very different, I looked at the function calls and they are the same; it is only the layout of the picture that changes drastically.

One function stands out on both systems: PyEval_EvalFrameEx, which takes up 12% on x86 and 14% on ARM. I looked up the source code here, and the function itself is around 2500 lines long, so it is safe to say I'm not sure what it is doing. It does have an interesting comment section at line 827 about the optimizations this function uses, which essentially try to avoid leading the CPU down a mispredicted branch. This relates to what Chris Tyler talked about in our class (on 1/29/2015) when he mentioned CPUs guessing the correct path to follow.

In the end I think red's profile is hard to compare directly to australia's because of the difference in layout between the two images, but PyEval_EvalFrameEx takes up the most time on both systems. The developers seem to be aware of this, given how extensively it has been optimized.

Thanks for reading

by James Boyer at February 03, 2015 01:27 AM

Lab 2

Lab 2, Benchmarking Mysql

For lab 2 we had to benchmark a software package. We chose MySQL and found the download link here; under the selected platform, choose Source Code. We were using a 64-bit x86 machine running Fedora 21, and there was no version for Fedora, so we chose the generic Linux option at the very bottom. We copied that link and used the wget command to download it from the command line. Once we un-tarred it (tar -xvf [filename]) we started looking for the configure script, but it did not have one: it used cmake, not configure. So we installed cmake on the server and ran it, and it gave us a makefile. We ran make without the useful -j option, which lets it take advantage of multiple cores, so we ended up waiting a long time. Once it was built we needed to benchmark it, and eventually we found out how to use the benchmark suite through this tutorial. We were running low on time in class so we stopped it after the first test, but it gave us this results file:

 per operation:
Operation                 seconds     usr     sys     cpu   tests
alter_table_add            100.00    0.02    0.00    0.02     100
alter_table_drop           101.00    0.01    0.00    0.01      91
create_index                 2.00    0.00    0.00    0.00       8
create_table                11.00    0.00    0.00    0.00      28
drop_index                   2.00    0.00    0.00    0.00       8
drop_table                  14.00    0.00    0.00    0.00      28
insert                     340.00    0.37    0.20    0.57    9768
select_distinct              2.00    0.19    0.00    0.19     800
select_group                 1.00    0.24    0.02    0.26    2800
select_join                  0.00    0.06    0.00    0.06     100
select_key_prefix_join       1.00    0.34    0.00    0.34     100
select_simple_join           1.00    0.08    0.00    0.08     500
TOTALS                     575.00    1.31    0.22    1.53   14331

I'm not quite sure I understand this table, particularly why there are so many seconds when so little time was taken on the usr or sys side. My theory is that the usr, sys and cpu columns are the percentage of CPU being used at the time, but I could be wrong.
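Another reading that would explain the gap, offered here as an assumption rather than something confirmed against the benchmark suite's documentation: the seconds column could be elapsed wall-clock time, while usr, sys and cpu could be CPU seconds used by the benchmark client itself, which spends most of its time waiting for the database server. The Python sketch below only illustrates the general difference between the two kinds of time; the function names are hypothetical and it does not touch MySQL at all.

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent while running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def waiting():
    # Stands in for a client waiting on a server: time passes, CPU is idle.
    time.sleep(0.2)


def computing():
    # Stands in for work done inside the measured process itself.
    total = 0
    for i in range(2000000):
        total += i * i
    return total


if __name__ == "__main__":
    for name, work in (("waiting", waiting), ("computing", computing)):
        wall, cpu = measure(work)
        print("%-10s wall=%.3fs cpu=%.3fs" % (name, wall, cpu))
```

For the waiting case the wall time is much larger than the CPU time, which is the same shape as the insert row in the table (340 seconds elapsed, 0.57 seconds of CPU).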

I think this lab was quite useful: we learned how to use cmake and how to speed up the make process with -j, and we did get results. Although I might not understand them yet, they're still results!

Thanks for reading.

by James Boyer at February 03, 2015 01:26 AM

February 02, 2015

Gideon Thomas

C++ overload – How I struggled to build Adobe’s XMP library

So our group of 5 members + Dave Humphrey has been tasked with creating an open source library that can inject licenses into image metadata as part of the “Open Source Project” course. After some considerable research we decided to use Adobe’s XMP Toolkit (which btw is written in C++) for our purposes.

Now, one of the first programming languages that I learned was C/C++. However, since then, it has been my least proficient and least liked language. Verbose code is not something that I am fond of (I am very lazy). But we had no better choice, we had to work with C++ (for now at least).

My first task was to go through Adobe’s documentation for using their toolkit. Well written it may be, but it was also quite long. That took me a while, and then came the moment to build the Toolkit from source. The documentation explains this to be fairly straightforward; and it is…for the first couple of steps. Then comes the build step, which requires us to use Xcode (I’m on a Mac). This part was not so straightforward. In fact it took me a really, really long time to build both the Toolkit and the sample code.

I believed that it would simply require me to click ‘Build’, and everything would just magically work. Silly me! I saw numerous compile-time errors that I was not sure what to do about. My first instinct was to Google them. For the first part, building the Toolkit itself, this did not work so well, as all I found were generic answers. So I carefully examined the errors and decided to fix them manually (i.e. change Adobe’s code :-o). I had to change some library references here and there (from what I recall, one was changing an include of tr1/memory to just memory) and fix some casting problems. Voila…it worked.

Unfortunately the same strategy could not be applied to building the sample code. Several “architecture” errors popped up in Xcode which baffled me. But after some deep digging, I was able to find one answer that solved all my problems. This thread possibly even explains why I was encountering problems with the Toolkit build – “The SDK has been tested and released for Mac 10.8.5 and xcode 5.0.2. ”

So now that that is out of the way, time to actually use this code in a project…let’s see how that turns out!

by Gideon Thomas at February 02, 2015 10:31 PM

Christopher Markieta

Progress Report

Calculating Compass Bearing

I have been using the GeoPy library for most of my geographical calculations. However, it does not provide a method to calculate the bearing between two points. For this I have referred to jeromer's code found on GitHub.

For some strange reason, whenever the testing device is traveling in the direction of the 4th quadrant, the bearing is not reported correctly; it should be between 180° and 270° (Southeast).
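For reference while debugging, here is a minimal sketch of the standard initial-bearing formula on a sphere. It is not the code referred to above; the function name initial_bearing is hypothetical, and it assumes latitudes and longitudes in decimal degrees. In the compass convention used here, 0° is north, 90° east, 180° south and 270° west, so a southeast heading falls between 90° and 180° and a southwest heading between 180° and 270°, which may be worth checking against the expected range.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Compass bearing in degrees [0, 360) from point 1 toward point 2."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)

    # East-west and north-south components of the direction of travel.
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))

    # atan2 returns (-180, 180]; shift into the compass range [0, 360).
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


if __name__ == "__main__":
    origin = (0.0, 0.0)
    targets = {
        "north": (1.0, 0.0),
        "east": (0.0, 1.0),
        "southeast": (-1.0, 1.0),
        "southwest": (-1.0, -1.0),
    }
    for name, (lat, lon) in targets.items():
        print("%-10s %.1f" % (name, initial_bearing(origin[0], origin[1], lat, lon)))
```

The final modulo step is where quadrant mistakes often hide: skipping it leaves western headings as negative angles.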

2D Animation using PyGame or Pyglet

In order to visualize all of the abstract math and code, it would be useful to demonstrate an animation of what we are trying to implement. Since Python has many powerful graphics libraries such as PyGame, Pyglet, Tkinter, PyGTK, PyCairo, wxPython, PyQt, and turtle, I think it would be a good use of our time to create a simple debugging interface that may save us a lot of time in the long run. PyGame and Pyglet are my top choices, as they are pretty popular and developing our animation will be pretty simple with the tools provided. It will also serve as a great demonstration tool for others to understand our project.

Testing Scenarios

If the team and our equipment are ready by the 2nd week of February, we will begin our real-world testing of the app and the server. This is far from the stable release of our software, but it should be enough to get some basics worked out.

We would like to come up with at least 4 different scenarios to test. For starters, we will have 4 vehicles with a passenger in each.

by Christopher Markieta at February 02, 2015 08:20 PM

Hosung Hwang

[SPO600] Profiling Lab

Lab : SPO600 Profiling Lab

In the previous lab post, I built PHP and benchmarked it using a test script.
Today, I tried to profile it with the gprof tool on the RED server, using the same PHP script.

gprof tutorial :

1. Generating gmon.out
I modified the Makefile to add the -pg flag to the compiler and linker flags. After building it, I ran it with bench.php.

2. Getting graph image from gmon.out

$ gprof ./php | ./ >
$ gprof ./php | ./ | dot -Tpng > phpgprof.png

The first command generates a .dot file from gmon.out. This file can be opened in tools like xdot and ZGRViewer; ZGRViewer is useful for navigating a huge diagram.
The second command generates a PNG image from the .dot format.
3. Getting Flat profile

$ gprof ./php > analysis.txt
$ gprof -a ./php > analysis.txt
$ gprof -a -p ./php > analysis.txt

The last command generates the following flat profile.

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00    19286     0.00     0.00  zend_parse_ini_string
  0.00      0.00     0.00    18224     0.00     0.00  _safe_emalloc
  0.00      0.00     0.00     9952     0.00     0.00  lex_scan
  0.00      0.00     0.00     9358     0.00     0.00  zend_function_dtor
  0.00      0.00     0.00     6775     0.00     0.00  zend_initialize_class_data
  0.00      0.00     0.00     5422     0.00     0.00  zend_is_compiling
  0.00      0.00     0.00     4470     0.00     0.00  _zend_hash_quick_add_or_update
  0.00      0.00     0.00     4091     0.00     0.00  zend_ensure_fpu_mode
  0.00      0.00     0.00     3495     0.00     0.00  zend_stack_destroy
  0.00      0.00     0.00     3334     0.00     0.00  file_handle_dtor
  0.00      0.00     0.00     3012     0.00     0.00  zend_binary_strcmp
  0.00      0.00     0.00     2727     0.00     0.00  zend_is_auto_global_quick
  0.00      0.00     0.00     2717     0.00     0.00  zend_hash_exists
  0.00      0.00     0.00     2666     0.00     0.00  zend_binary_strncmp
  0.00      0.00     0.00     2666     0.00     0.00  zend_get_zval_ptr
  0.00      0.00     0.00     2617     0.00     0.00  zend_stack_del_top
  0.00      0.00     0.00     2597     0.00     0.00  zend_stack_base
  0.00      0.00     0.00     2401     0.00     0.00  zend_memory_usage
  0.00      0.00     0.00     2227     0.00     0.00  function_add_ref
  0.00      0.00     0.00     1848     0.00     0.00  zend_hash_func
  0.00      0.00     0.00     1636     0.00     0.00  _zval_internal_dtor
  0.00      0.00     0.00     1410     0.00     0.00  get_binary_op
  0.00      0.00     0.00     1264     0.00     0.00  zend_llist_prepend_element
  0.00      0.00     0.00     1262     0.00     0.00  zend_llist_copy
  0.00      0.00     0.00     1238     0.00     0.00  _zend_hash_add_or_update
  0.00      0.00     0.00     1131     0.00     0.00  zend_do_bind_traits
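A flat profile like the one above can be ranked by call count with a short script instead of by eye. The following is a minimal sketch that assumes the seven-column layout shown here (percent time, cumulative seconds, self seconds, calls, two per-call columns, name); rows without a call count are skipped. The function names are hypothetical.

```python
def parse_flat_profile(text):
    """Return a list of (function_name, calls) from gprof flat-profile text."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7:
            continue  # header fragments or rows without a call count
        try:
            calls = int(parts[3])
        except ValueError:
            continue  # the column-title line ("calls" is not a number)
        rows.append((parts[6], calls))
    return rows


def most_called(text, count=5):
    """Return the `count` most frequently called functions."""
    rows = parse_flat_profile(text)
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


if __name__ == "__main__":
    sample = """\
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00    18224     0.00     0.00  _safe_emalloc
  0.00      0.00     0.00    19286     0.00     0.00  zend_parse_ini_string
  0.00      0.00     0.00     9952     0.00     0.00  lex_scan
"""
    for name, calls in most_called(sample, 2):
        print("%8d  %s" % (calls, name))
```

In practice the text would come from the analysis.txt file written by the gprof commands above rather than from an inline sample.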

4. Analyzing data
The most frequently called function was zend_parse_ini_string. This function is in Zend/zend_ini_parser.c and is part of the bison parser.

ZEND_API int zend_parse_ini_string(char *str, zend_bool unbuffered_errors, int scanner_mode, zend_ini_parser_cb_t ini_parser_cb, void *arg TSRMLS_DC)
{
	int retval;
	zend_ini_parser_param ini_parser_param;

	ini_parser_param.ini_parser_cb = ini_parser_cb;
	ini_parser_param.arg = arg;
	CG(ini_parser_param) = &ini_parser_param;

	if (zend_ini_prepare_string_for_scanning(str, scanner_mode TSRMLS_CC) == FAILURE) {
		return FAILURE;
	}

	CG(ini_parser_unbuffered_errors) = unbuffered_errors;
	retval = ini_parse(TSRMLS_C);

	if (retval == 0) {
		return SUCCESS;
	} else {
		return FAILURE;
	}
}
Parsing and memory functions were called frequently. The parsing source code is a bison parser, and I am not sure whether it can be optimized.

The second most frequently called function was _safe_emalloc, which is in zend_alloc.c.

#elif defined(__GNUC__) && defined(__arm__)

static inline size_t safe_address(size_t nmemb, size_t size, size_t offset)
        size_t res;
        unsigned long overflow;

        __asm__ ("umlal %0,%1,%2,%3"
             : "=r"(res), "=r"(overflow)
             : "r"(nmemb),

        if (UNEXPECTED(overflow)) {
                zend_error_noreturn(E_ERROR, "Possible integer overflow in memory allocation (%zu * %zu + %zu)", nmemb, size, offset);
                return 0;
        return res;
#elif defined(__GNUC__) && defined(__aarch64__)
static inline size_t safe_address(size_t nmemb, size_t size, size_t offset)
        size_t res;
        unsigned long overflow;

        __asm__ ("mul %0,%2,%3\n\tumulh %1,%2,%3\n\tadds %0,%0,%4\n\tadc %1,%1,xzr"
             : "=&r"(res), "=&r"(overflow)
             : "r"(nmemb),

        if (UNEXPECTED(overflow)) {
                zend_error_noreturn(E_ERROR, "Possible integer overflow in memory allocation (%zu * %zu + %zu)", nmemb, size, offset);
                return 0;
        return res;

ZEND_API void *_safe_emalloc(size_t nmemb, size_t size, size_t offset ZEND_FILE_LINE_DC ZEND_FILE_LINE_ORIG_DC)
	return emalloc_rel(safe_address(nmemb, size, offset));

The code already seems to be optimized per architecture using assembly code.
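To make the AArch64 sequence above easier to follow, here is a Python sketch that emulates what the four instructions appear to compute: the 128-bit value nmemb * size + offset, with the upper 64 bits used as the overflow indicator. This is an illustration of the arithmetic only, assuming 64-bit unsigned inputs; it is not PHP code, and the Python exception stands in for the zend_error_noreturn call.

```python
MASK64 = (1 << 64) - 1


def safe_address(nmemb, size, offset):
    """Return nmemb * size + offset, or raise if it does not fit in 64 bits."""
    for value in (nmemb, size, offset):
        if not 0 <= value <= MASK64:
            raise ValueError("arguments must fit in an unsigned 64-bit word")

    product = nmemb * size
    res = product & MASK64       # mul:   low 64 bits of the product
    overflow = product >> 64     # umulh: high 64 bits of the product

    total = res + offset         # adds:  add offset, remembering the carry
    res = total & MASK64
    carry = total >> 64
    overflow += carry            # adc:   fold the carry into the high word

    if overflow:
        raise OverflowError(
            "Possible integer overflow in memory allocation (%d * %d + %d)"
            % (nmemb, size, offset))
    return res


if __name__ == "__main__":
    print(safe_address(2, 3, 4))
    try:
        safe_address(1 << 32, 1 << 32, 0)
    except OverflowError as error:
        print(error)
```

Doing the check this way costs a handful of instructions and no division, which is presumably why each architecture gets its own inline assembly variant.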

So far, I am not sure what kind of code can be optimized. Finding it will take more time.

by Hosung at February 02, 2015 07:07 AM

Cha Li

lab2 benchmarking

Our group decided to benchmark PHP.

We got PHP from the website:

wget -O filename mirrorUrl

Unarchived the file:

tar -zxvf php-5.6.5.tar.gz

Configured the build:

./configure \
--prefix=/usr/local/php \
--enable-mbstring \
--with-curl \
--with-openssl \
--with-xmlrpc \
--enable-soap \
--enable-zip \
--with-gd \
--with-jpeg-dir \
--with-png-dir \
--with-mysql \
--with-pgsql \
--enable-embedded-mysqli \
--enable-intl

Then run these two commands:

make

make install

The environment:

Name : red

Cpu : 7

in use:1


width:64 bits


Test commands:

time php -r 'phpinfo();'

time php -r 'for($i = 0; $i < 1000; $i++){echo "hello";}'

by lifuzhang1991 at February 02, 2015 05:20 AM

Thana Annis

Building and Benchmarking Python v 2.7.9


To build Python, navigate to the Python download site and look for the gzipped source tarball link. Right click and copy the link. Run wget <link> to download the source. Unpack the file with tar xvf.

Inside the new directory you want to configure the install for your current platform by running ./configure.

We will be building Python with profiling enabled, so you will need to modify the Makefile. Add -pg to the CC variable in the Makefile.

On the command line, run make -j15 2>&1 | tee buildResults.txt. This builds Python with -j15 to use multiple cores for more speed, and copies the build output into a text file for you to look at later.

If the build completes successfully then you’ll want to run make test to ensure the install was clean.


Since I’m not very familiar with the Python language, the script I used to benchmark is a simple inline command to test the print function.

 for((i = 0; i < 10; i++)) ; do time python -c 'print 2*125' ; done;

The results were consistently

real    0m0.024s
user    0m0.020s
sys     0m0.000s

The most it deviated was +/-0.01s real and -0.010s user. Changing what was printed didn’t seem to have any effect on the completion time, so it seems as though this function’s run time is fairly constant.
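With run times this short, most of what is measured is likely interpreter start-up rather than the print itself, so it can help to collect the timings and look at their spread. Below is a minimal sketch of such a harness, assuming it is acceptable to time the whole process from the outside with a wall clock; the function name time_command is hypothetical, and the example times whichever Python interpreter runs the script rather than the freshly built one.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10):
    """Run argv `runs` times and return the wall-clock duration of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so printing does not skew the result.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    command = [sys.executable, "-c", "print(2*125)"]
    timings = time_command(command)
    print("min    %.3fs" % min(timings))
    print("mean   %.3fs" % statistics.mean(timings))
    print("stdev  %.3fs" % statistics.stdev(timings))
```

To time the built interpreter instead, the first element of the command list would be replaced with the path to that binary.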

The environment I used in testing this is:

ARMv8 AArch64 machine running Fedora release 21.

by bwaffles91 at February 02, 2015 04:34 AM

Cha Li

Lab 1

GNU development

License: GNU General Public License

Aspell is one of the open source projects under GNU. It is designed to be a spell-checking tool to replace Ispell. The Aspell community uses e-mail to exchange information. Aspell has two goals. The first is to be able to suggest possible replacements for a misspelled word. The second is to become the standard system spell checker for the GNU/Linux operating system. The current maintainer is Kevin Atkinson.

Reviewing source code involves several steps.

  • Update project information on Savannah
  • Turn to the mailing lists
  • Check for existing bug reports
  • Contact distro packagers
  • Use the software
  • Focus on fixing outstanding bugs

Apache OpenOffice

Apache 2.0 License

OpenOffice is a large, complex open source project. To get started with Apache OpenOffice development, you first need to join the mailing lists. A mailing list is a useful way to share information with community members, since it forwards each mail to everyone subscribed to that list. The code you contribute to the community must be under the Apache License 2.0 and must be written by yourself. For a small bug fix, it is best to submit a patch attached to a Bugzilla issue. For a large contribution, you must also submit an Individual Contributor License Agreement (ICLA) form. The Apache open source community uses Apache Subversion for version control.

$ svn co excalibur-trunk

$ cd excalibur-trunk

$ echo "test" > test.txt

$ svn add test.txt

$ svn commit --username your-name --password your-password \

The newest version of Apache OpenOffice is 4.1.1. It is a small update with multiple bug fixes, such as text not being right-aligned in Hebrew/Arabic and pictures being lost on save. However, it still has lots of errors that need to be fixed.

Conclusion: e-mail seems to be an effective tool for communicating in an open source project, since people may not be in the same time zone. In addition, getting useful feedback and bug reports is very important in an open source project.

by lifuzhang1991 at February 02, 2015 04:31 AM