Planet CDOT

February 04, 2016


Andrew Smith

WTFPL

Hah! I didn’t even make that one up: Do What the Fuck You Want to Public License. It should have been “whatever,” not “what,” which kind of sucks; it makes it look like an illiterate 15-year-old American came up with it.

I found it because uMap (an awesome piece of software I’m sure I’ll write about one day) is licenced under it.

It’s interesting: for most of my life I used the GPL for my software, and if ever I considered a BSD/MIT-style licence I was confused about the need for a licence/disclaimer at all, given that you’re free to do whatever you want.

If I had to decide today – I don’t know what I’d do. I still think it’s kind of retarded to claim that just because you got some software for free on the internet the author of the software owes you a warranty, but I’d need to look deeper into why the warranty disclaimers are so prevalent.

by Andrew Smith at February 04, 2016 04:43 AM

February 01, 2016


Kenny Nguyen

SPO600 Lab 1

In progress
Link To Lab

Alright, the general mission for this lab is to find open source projects and study their patch submission processes. For my first project I tried shopping around to find a good one; I started off looking at a few projects, namely grep, barcode, and mpv.

Nano

Link to Application

License: GNU General Public License

Patch that will be Examined

For my first patch I decided to choose "patch #7082: adding a noread option". I decided on this one because the original submitter has no direct connection to GNU nano. The submitter posted the patch to Savannah, which appears to be a bug tracker where people can submit code and patches, and outlined what he was submitting and why. From there we can see that someone from the project acknowledged the patch but delayed its implementation due to scheduling. A few months passed, a different member of the community brought up whether the patch should still be implemented, they agreed to accept it, and some time later it was finally accepted into a branch of the SVN repository.

Brackets

Link to Application

License: Adobe

Patch that will be Examined

by Kenny Nguyen at February 01, 2016 04:31 AM


Berwout de Vries Robles

SPO600 Lab 4

In our fourth lab we looked at some of the different compiler options and the differences they made in the assembly output. In order to look at the differences we wrote a simple hello world program that was compiled with the different options.

I will go by the options we went over one by one below.

0. Original.
The original command we used was gcc -g -O0 -fno-builtin.
This version is used to compile a file with debugging information, no optimization and without use of builtin functionality. 
 
1. Static.
Adding -static means the compiler pulls in the entire library containing the functions you are using. In this case it pulled in the library that contains the printf function we used to print our message. This results in about a 10 times increase in file size in this case. The advantage of using static is that no matter what system you go to, you can guarantee that the functions you are using will be present, since you include the entire library. The disadvantages are the larger file size and, if you use static linking a lot, that many of your programs will carry copies of the same libraries, possibly at different versions, which makes it hard to keep track of security flaws and updates.
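If you want to see the difference for yourself, one way (assuming the source file is called hello.c, as in the lab) is to build it both ways and compare the sizes:

gcc -g -O0 -fno-builtin hello.c -o hello-dynamic
gcc -g -O0 -fno-builtin -static hello.c -o hello-static
ls -l hello-dynamic hello-static

The exact numbers depend on the system, but the static binary should come out roughly an order of magnitude larger.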

2. Removing -fno-builtin.
-fno-builtin asks the compiler not to use its builtin function optimizations. With the option removed, the compiler recognizes that it has a more efficient way to handle this printf call because it only has one argument: it can substitute the builtin function puts instead.
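As a rough sketch (not the lab code itself), the substitution only applies when printf is handed a plain string with no format specifiers; compiling something like the following without -fno-builtin and checking it with objdump -d should show one call going through puts@plt and the other staying with printf@plt:

#include <stdio.h>

int main() {
    printf("Hello World!\n");    /* no format specifiers: gcc may emit a call to puts */
    printf("Count: %d\n", 42);   /* contains %d, so it stays a call to printf */
    return 0;
}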

3. Removing -g.
By removing -g from the command we compile the file without debug information; this makes the file smaller, but also harder to read. The objdump --source command no longer shows you which part of your code the compiler turned into which instructions.

4. Adding additional arguments to the printf.
On aarch64, after assigning 7 arguments to different registers, the rest of the arguments get put on the stack through r1. On x86_64 only 5 arguments were assigned to registers before the rest went onto the stack. This makes sense because x86_64 has fewer registers to work with. Interestingly enough, on the aarch64 side the assembly used str (store) instructions rather than push instructions to place the arguments on the stack.
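A small test program along these lines (the values are just made up for illustration) makes the register/stack split easy to spot in the disassembly on either architecture:

#include <stdio.h>

int main() {
    /* ten integer arguments after the format string: the first several end up
       in registers, and the remainder are written to the stack before the call */
    printf("%d %d %d %d %d %d %d %d %d %d\n",
           1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    return 0;
}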

5. Moving printf to a separate function called output and calling that from main.
With this one I thought the compiler would recognise that it is just calling printf and that the function did not really have any use, but I was wrong. The assembly calls output from main, and inside output is the same section of code that was previously in main. If you pass the string in as an argument, it also adds some overhead on the stack.

6. Replacing -O0 with -O3.
This option asks the compiler to optimize aggressively. It will actually move some of your code around to do so, so if you compile something with -O3 you always have to check whether the functionality of your program remains the same. One of the differences in the output is that instead of returning to main after calling printf, the compiler realized that printf was the last statement and used an unconditional branch to tell the program not to bother coming back to main. Aggressive optimization also deleted the stack-frame setup in this case, because this program makes no use of it.

So this week we saw a small part of what the compiler can do for us and what we can ask it to do. I found it very interesting to finally see what makes it so that any code we write can actually run. I think understanding how the compiler turns our code into assembler can really help us decide what options to use when we want to go for maximum efficiency in our code.

by Berwout de Vries Robles (noreply@blogger.com) at February 01, 2016 04:26 AM


Andrei Topala

Compiled C Lab

So it was that we wrote a simple Hello World program. And it was good. Then we compiled it:

gcc -g -O0 -fno-builtin hello.c -o hello

And it did compile. Now, the -g enables debugging information, the -O0 tells the compiler not to optimize (numbers greater than zero would have optimized the program: -O1 would do basic optimization, -O2 would do all the safe optimizations, and -O3 would perform aggressive optimization), and the -fno-builtin tells it not to use builtin function optimizations. So, we got an unoptimized binary ELF file with debugging information. Let’s examine it!

We were told to use objdump with the -f, -s, -d, or --source options in order to examine the file. We did. It was a good file.
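For reference, the invocations look something like this (hello being the compiled binary): -f prints the file header, -s dumps the section contents, -d disassembles the executable sections, and --source interleaves the C source when debug information is present.

objdump -f hello
objdump -s hello
objdump -d hello
objdump --source hello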

Now, we were to use objdump to examine the changes that would occur for the following:

  1. Adding the compiler option -static:
    The resulting binary file was much larger than the original. The cause of this was that the -static option linked all the library code the program needs directly into the binary rather than linking to it dynamically.
  2. Removing the compiler option -fno-builtin:
    Most noticeable was that calls to the printf function in the ELF file were instead calls to the puts function. This is a function optimization based on the fact that the puts function is faster than printf. When printf is supposed just to output a string without any variables, the compiler calls puts instead, which skips past the whole hassle of formatting the output string and just puts it there.
  3. Removing the compiler option -g:
    Without debugging information, the resultant binary file was significantly smaller. The ELF did not have the .debug_ sections that the first file had.
  4. Adding additional arguments to the printf() function:
    The first few arguments are stored in the registers while the rest are stored on the stack.
  5. Moving the printf() call to a separate function called output():
    We saw that the <main> section was smaller and had only the call to output. The <output> section was identical to the original <main> section, that is, it stored the “Hello World” string in memory and called the printf() function.
  6. Removing -O0 and adding -O3:
    The binary file was even larger. We are optimizing for speed, then, not size. The disassembled code was quite different. The compiler performed several optimizations to make the code execute faster. For example, the unoptimized ELF uses mov $0x0,%eax to clear the register %eax, while the optimized file does xor %eax,%eax.

by andrei600 at February 01, 2016 04:04 AM


Kenny Nguyen

SPO600 Lab4

Not Done yet >.>
Link to lab

For this lab we're tasked with compiling C code with various compilation options. From this point on I'll simply list each compilation and go over what it does.

Legend: the C file that will be compiled is named lab4.c

The contents of lab4.c are as follows:

//Lab4.c
#include <stdio.h>
int main() {
    printf("Hello World!\n");
}

1.

gcc -o lab4 lab4.c

This compilation is simple: it just compiles the code with all the default options into an executable called lab4, which we can run.

Additional notes: -o denotes output and takes one argument, in this case lab4, so the output file will be named lab4 instead of the system default (usually a.out).

2.

gcc -g -fno-builtin -static -o lab4 lab4.c

For this compilation we're focusing on -static. If we compare the file sizes after compiling, we notice that the newly compiled binary is much larger than the original: it ballooned to approximately 10x its original size.

8.4KB vs 836KB

Examining the file further with:

objdump -d

We can peer further into the compiled code and find out what it's doing. I think I'm starting to ramble so I'll make this short and sweet. With the command listed above we discover that the disassembly has become really, really long. This is to be expected: the -static option does not dynamically link against the stdio library but pulls that code into the compiled binary. Thus, inside main, instead of dynamically calling the puts@plt function, it calls a function that was pulled into the compilation named _IO_printf.
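One quick way to pull just <main> out of the (very long) static disassembly is something like:

objdump -d lab4 | grep -A 15 '<main>:'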

Additional notes:
-g enables debugging information
-fno-builtin tells the compiler to not use the builtin function optimizations

3.

gcc -g -o lab4 lab4.c

Didn't notice anything obvious on my ThinkPad 420s with Fedora; will return to note changes later.

4.

gcc -fno-builtin -o lab4 lab4.c

Quick notes: same file size; printf@plt instead of puts@plt.

5.

Standard compilation, but lab4 has been modified

//Lab4Modified
#include <stdio.h>
int main() {
    /* adjacent string literals are concatenated at compile time */
    printf("Hello World!\n" "Test" "test");
}

6.

Standard Compilation but lab4 is modified again

//Lab4Modified
#include <stdio.h>
void output(void);
int main() {
    output();
}
void output() {
    printf("Hello World!\n");
}

Instead of calling puts@plt directly, main calls the output function, which then calls puts@plt; not much of a change.

7.

gcc -g -fno-builtin -O0 -o lab4 lab4.c
gcc -g -fno-builtin -O3 -o lab4 lab4.c

For this examination we'll compare -O0 (no optimization) against -O3 (aggressive optimization).

by Kenny Nguyen at February 01, 2016 03:58 AM


Berwout de Vries Robles

SPO600 Lab 3


For our third lab we split up into groups to write some assembly. We started by looking at some x86_64 assembler, after which we looked at the differences between that and aarch64 assembler.

Our first assignment was to build a simple loop that loops from 0-9 and displays "Loop: [the current number]\n" on each iteration. In order to convert our loop number to an ASCII character we have to add 48 to it, because that is the offset where the digit characters start in ASCII. We did this by moving 48 into an empty register that doesn't get trampled at the start of the program. Then, in the loop, we moved the loop number into an empty register, added the 48 from the previous register, and wrote the result into the rsi register (the message location register).
At first we had our string in a .rodata section; the "ro" stands for read-only, so when we tried to write into it, nothing happened. We fixed this by removing the "ro" to spell .data.
Next, we did not realise that mov moves a full 64 bits, even if you only have 1 byte of data.
This caused our number's move to overwrite the newline character at the end of the original string. This is fixed by adding the b suffix to mov and to the register you wish to move from, so it becomes something like: movb %r13b,(msg+6).
On aarch64 moving into the message with an offset was a little bit different, but otherwise the instructions were similar.

Next we changed it so the loop would print from 00-30. This used the div instruction, which takes the rax register and divides it by a chosen register. It stores the quotient into rax and the remainder into rdx. We divided the loop number by 10, added our offset of 48 to rax and rdx, and moved them into the string as we had done in the previous assignment.
One thing to pay attention to: before using the div operation, the rdx register has to be set to 0!
On aarch64 this was similar, but you had to use msub to calculate the remainder.

The last assignment was suppressing the first digit when the number was below 10. We did this by checking if the quotient was zero; if it was, we used je (jump if equal) on x86_64 or beq (branch if equal) on aarch64 to jump to a label past the mov instruction that moves the first digit into the string.

Tips and tricks:
as -g -o filename.o filename.s; ld -o filename filename.o; ./filename assembles, links, and executes a source file in one line.
Make sure you know whether you are on least significant byte first architecture or most significant byte architecture as it could heavily influence the result of moving a specific byte somewhere.
Copy your files regularly so if you add to a file and you break it, you still have a working copy to try something new on.

Observations about the differences between aarch64 and x86_64 assembler:
- If you need just the remainder on aarch64 you still have to divide and then compute it (with msub), whereas on x86_64 you get the remainder for free in rdx.
- When moving just a byte on aarch64 you specify a 32-bit-wide register and work with an offset, whereas on x86_64 you have the option to specify a byte-sized register.

by Berwout de Vries Robles (noreply@blogger.com) at February 01, 2016 03:56 AM


Andrei Topala

Assembler Lab

There are three servers that we will use for the course. The first two, aarchie and betty (soon also jughead?), are ARMv8 AArch64 systems. The third, xerxes (the Great?), is an x86_64 server. In groups, we examined the result of compiling C versions and assembly language versions of a “Hello World” program on each of the two different architectures.

We used objdump -d hello to examine the instructions of the binary files created from the C source. Both versions of the program–the one compiled on x86_64 and the one compiled on aarch64–were fairly similar. There was a large amount of code, and the <main> sections, with which we were concerned, were about the same. The mnemonics were different, and so were some of the instructions, but the gist of it was that there was a location in memory in which the message was stored, and then a call or branch to <printf@plt>.

For the assembler source, we compiled the files on both architectures, as before, and examined them with objdump -d hello. Instead of a whole bunch of code, like we had with the compiled C programs, we saw only a very small amount. aarchie’s file had the format elf64-littleaarch64, while xerxes’ had elf64-x86-64. Then there was a disassembly of section .text; both files had their own instructions.

Now, having examined the structure of the assembler code, we were given the task of creating a simple loop that would output the word “Loop” and each number from 1 to 9 (i.e. “Loop: 1,” “Loop: 2,” etc.). Printing the word was simple, and printing the number was simple, but where our group ran into trouble was printing the word and the number together. We had the string in one register and the ASCII number in another, and we were trying to combine the two. Our teacher, however, told us that we could simply write the ASCII number to the location in memory where the string was stored, plus the proper offset. Once set on the right track, we finished the project without trouble.

Looping it up to 30 was also relatively simple. To suppress the tens digit for the first 10 numbers, we moved the iterator to a new register and divided the number by 10. Then we checked the tens digit, compared it to 0, and, if it was 0, jumped over the part of the code that would have printed it.

Writing in assembler is, I feel, more difficult than writing in higher-level programming languages because of the need to manipulate and keep track of the registers, and because we are dealing with specific locations in memory. Something as simple as printing a counting loop, which would be trivial in most high-level programming languages, takes a decent amount of work and requires attention.

The precision and the level of control the programmer has, however, seems like a serious advantage, and I am sure there are techniques for optimization that cannot be done in a high-level language, and that require the use of assembler code.

 


by andrei600 at February 01, 2016 03:08 AM

January 31, 2016


Giuseppe Ranieri

Compiling a C Program

In this lab, we investigated the source code of a C program against the output of the C compiler. We first started off with compiling hello.c as is.
gcc -o hello hello.c -g -O0 -fno-builtin

The options do the following:
  • -g:                Enables debug information
  • -O0:               Disables GCC optimizations
  • -fno-builtin:      Disables builtin function optimizations

The resulting binary file was just a few kilobytes in size. In the lab we were to recompile the code with the following changes:
  1. Add the compiler option -static
  2. Remove the compiler option -fno-builtin
  3. Remove the compiler option -g
  4. Add additional arguments to the printf()
  5. Move the printf() call to a separate function named output(), and call that function from main()
  6. Remove the -O0 and add -O3 to the gcc options

 

Add the Compiler Option -static

The static option causes all of the program's dependencies to be built into the binary. Everything is included and compiled along with the source code. This means the program does not have to resolve any dynamic links at run time, but it increases the size of the binary. Without the option the program will look up the linked libraries dynamically.

 

Remove the Compiler Option -fno-builtin

The -fno-builtin option tells the compiler to avoid using any builtin optimization techniques. When the option is removed, the compiler changes the printf() call into a puts() call. The difference between the two is that printf() scans the format string for conversion specifiers, while puts() simply writes the string to the output buffer and onto the screen without any of that processing. Other functions that are being used in the background are also changed, but I am unable to tell why and for what purpose. I assume most of the replacements are faster and more direct, just like printf() and puts().

 

Remove the Compiler Option -g

The -g option includes debugging information in the binary. Without it, debugger output is much less human readable and source-level information is not available. The -g option can make the job of figuring out problems easier for the developer, but the public binary should not be released with it.

 

Add Additional Arguments to the printf()

When additional arguments are added to printf(), depending on the platform, the first few values are loaded into registers while the others are pushed onto the stack.

int main() {
  400500:       55                      push   %rbp
  400501:       48 89 e5                mov    %rsp,%rbp
  400504:       48 83 ec 30             sub    $0x30,%rsp
    printf("Hello World!\n%d%d%d%d%d%d%d%d%d%d%d", 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100);
  400508:       c7 44 24 28 64 00 00    movl   $0x64,0x28(%rsp)
  40050f:       00
  400510:       c7 44 24 20 5a 00 00    movl   $0x5a,0x20(%rsp)
  400517:       00
  400518:       c7 44 24 18 50 00 00    movl   $0x50,0x18(%rsp)
  40051f:       00
  400520:       c7 44 24 10 46 00 00    movl   $0x46,0x10(%rsp)
  400527:       00
  400528:       c7 44 24 08 3c 00 00    movl   $0x3c,0x8(%rsp)
  40052f:       00
  400530:       c7 04 24 32 00 00 00    movl   $0x32,(%rsp)
  400537:       41 b9 28 00 00 00       mov    $0x28,%r9d
  40053d:       41 b8 1e 00 00 00       mov    $0x1e,%r8d
  400543:       b9 14 00 00 00          mov    $0x14,%ecx
  400548:       ba 0a 00 00 00          mov    $0xa,%edx
  40054d:       be 00 00 00 00          mov    $0x0,%esi
  400552:       bf 00 06 40 00          mov    $0x400600,%edi
  400557:       b8 00 00 00 00          mov    $0x0,%eax
  40055c:       e8 7f fe ff ff          callq  4003e0 <printf@plt>
}
  400561:       c9                      leaveq
  400562:       c3                      retq
  400563:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40056a:       00 00 00
  40056d:       0f 1f 00                nopl   (%rax)

 

Move the printf() Call to a Separate Function Named output(), and Call that Function From main()

Moving the function for this is straightforward. All that seems to happen is that the output() function gets its own section, which main calls.

 

Remove the -O0 and Add -O3 to the GCC options

Any -OX option applies optimization to the program, X being the level of optimization, where 0 is the minimal level and 3 is the most aggressive. -O3 has the potential to break your program, as it goes above and beyond to make sure it is optimized. From -O0 to -O3, different options are turned on automatically for the developer, but they can also be enabled individually for better control and choice.
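If you're curious which individual optimization flags a given -O level turns on, recent versions of GCC can list them, for example:

gcc -Q --help=optimizers -O3 | less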

Here are the important differences in main(-O3 vs -O0):
int main() {
    printf("Hello World!\n");
  400410:       bf a0 05 40 00          mov    $0x4005a0,%edi
  400415:       31 c0                   xor    %eax,%eax
  400417:       e9 c4 ff ff ff          jmpq   4003e0 <printf@plt>
and
int main() {
  400500:       55                      push   %rbp
  400501:       48 89 e5                mov    %rsp,%rbp
    printf("Hello World!\n");
  400504:       bf b0 05 40 00          mov    $0x4005b0,%edi
  400509:       b8 00 00 00 00          mov    $0x0,%eax
  40050e:       e8 cd fe ff ff          callq  4003e0 <printf@plt>
}
  400513:       5d                      pop    %rbp
  400514:       c3                      retq
  400515:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40051c:       00 00 00
  40051f:       90                      nop

As you can see, -O3 accomplishes the same task in less code than -O0.
 

by JoeyRanieri (noreply@blogger.com) at January 31, 2016 11:02 PM


Andrei Topala

Code Building Lab

Now we’ll try to build two packages that have different licenses.

Let’s take a look at the process of building a simple software package from the Free Software Foundation’s GNU Project. Going through the blurbs for all the official GNU packages, I come across something entitled “Gcal,” a program that can calculate and print calendars. Our current operating system, Fedora 23, does come installed with the standard cal program available on most Unix machines, but gcal has several extended features, including the ability to calculate astronomical data such as the phases of the moon, which is information that can be used to plan the harvest season and to perform magick rituals—a must for any computer programmer. Let’s go ahead and download the gcal package.

In the downloads directory, we have now a file labeled gcal-4.tar.gz. The .tar extension indicates that it is a tarfile (“tar” being short for tape archive), that is, a group of files bundled together with the tar command; and the .gz extension indicates that it has been compressed with the gzip command. To build the package, first we must unpack the archive by using the tar command. Let’s run the following command:

tar -xzf gcal-4.tar.gz

The x option tells the tar command that we need to extract something, the z option tells it that we need to filter the archive through gzip, and the f option tells it that we are using a file, specifically the gcal-4.tar.gz file. Running the command creates and populates a new folder by the name of gcal-4. (This name, by the way, follows the naming convention common to most GNU packages: package_name-version.)
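If you just want to peek at what is inside the archive before unpacking it, the t option lists the contents instead of extracting them:

tar -tzf gcal-4.tar.gz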

Inside the folder, there is an INSTALL file which tells us that the ‘configure’ shell script will try to find correct values for system-dependent variables that are to be used during compilation, and will create a ‘Makefile’ which we may then use to compile the package. Let’s go ahead and run the script.

./configure

Alas, we are accosted by an error:

configure: error: no acceptable C compiler found in $PATH

It seems our newly-installed operating system is missing a C compiler. It’s simple enough to install one:

sudo dnf install gcc

GCC is the GNU Compiler Collection; it should give us what we need. Let’s run the configure script again:

./configure

A long list of statements cascades across the terminal, each announcing that it has checked something. A few seconds pass and we are told that all the checks are done. We find now in the same folder a Makefile. We’ll invoke the make command, which will use the makefile to convert the source code of the package into executable binaries:

make

Another list of statements informs us of the steps taken to build the package, and at the end of it we find that there is a new gcal file inside the src directory. Let’s try to run it.

./gcal

[screenshot of gcal's calendar output]

The program performs as advertised!

Unfortunately, the documentation for gcal is a little overwhelming at the moment, so I won’t explore all of its functions in this post, but here’s a list of this year’s astronomical events (using GMT; this can be configured):

[andrei@localhost src]$ ./gcal -n --astronomical-holidays
Eternal holiday list: The year 2016 is A leap year
New Moon 01:30 (Ast) - Sun, Jan 10th 2016 = -19 days
Waxing Half Moon 23:26 (Ast) - Sat, Jan 16th 2016 = -13 days
Full Moon 01:46 (Ast) - Sun, Jan 24th 2016 = -5 days
Waning Half Moon 03:28 (Ast) - Mon, Feb 1st 2016 = +3 days
New Moon 14:39 (Ast) - Mon, Feb 8th 2016 = +10 days
Waxing Half Moon 07:46 (Ast) - Mon, Feb 15th 2016 = +17 days
Full Moon 18:20 (Ast) - Mon, Feb 22nd 2016 = +24 days
Waning Half Moon 23:11 (Ast) - Tue, Mar 1st 2016 = +32 days
New Moon 01:54 (Ast) - Wed, Mar 9th 2016 = +40 days
Solar Eclipse/Total 01:57 (Ast) - Wed, Mar 9th 2016 = +40 days
Waxing Half Moon 17:03 (Ast) - Tue, Mar 15th 2016 = +46 days
Equinox Day 04:30 (Ast) - Sun, Mar 20th 2016 = +51 days
Full Moon 12:01 (Ast) - Wed, Mar 23rd 2016 = +54 days
Lunar Eclipse/Penumbral 11:47 (Ast) - Wed, Mar 23rd 2016 = +54 days

And then, inside the data/dates directory, we find some more important information:

[screenshot of the files in the data/dates directory]

Let’s try to build something a little more complicated now. We’ll try to build NetHack, an ASCII dungeon-crawling game (which, coincidentally, just last month released its first major update in twelve years). NetHack is a popular Roguelike (meaning like Rogue) game that we can run in our terminal.

We download the file nethack-360-src.tgz from the official site. The .tgz extension is the same as the tar.gz extension we dealt with earlier. We extract the tarball using the same command.

tar xzf nethack-360-src.tgz

Inside the folder in which the archive has been extracted, there is a README file, which directs us to the sys/unix/Install.unx file, which then directs us to the newer sys/unix/NewInstall.unx. There, we learn that the sys/unix/hints directory contains a file which will help us configure the package. From the sys/unix directory, we run the command

sh setup.sh hints/linux

and everything seems to be okay. Nothing is printed to the terminal, which is usually a good sign. Next, we’re to go to the top directory and run the command

make all

Note that we’re doing this without having installed any of the package’s dependencies. We’ll see where it gets us and we’ll deal with trouble as it comes.

Several files seem to compile properly, but then we’re met with the following error:

../win/tty/termcap.c:852:20: fatal error: curses.h: No such file or directory
compilation terminated.

The curses.h file can be found in the ncurses-devel package, which is a library containing developer’s tools for ncurses, which, according to the GNU website, “is software for controlling writing to the console screen under Unix, Linux and other operating systems.” Let’s install ncurses-devel:

sudo dnf install ncurses-devel

Now, if we run sudo find / -name curses.h (sudo is needed in order to traverse directories forbidden to us), we see that we now have multiple curses.h files:

find: /usr/include/ncursesw/curses.h
/usr/include/curses.h
/usr/include/ncurses/curses.h

Let’s try to compile NetHack again. The NewInstall.unx file told us to remove all the generated files before retrying to build the package, so from the top level directory (i.e. the one into which the tarball had been extracted) we run this command:

make spotless

And then:

make all

Another error:

yacc -d dgn_comp.y
make[1]: yacc: Command not found
Makefile:327: recipe for target 'dgn_yacc.c' failed
make[1]: *** [dgn_yacc.c] Error 127
make[1]: Leaving directory '/home/andrei/Downloads/nethack-3.6.0/util'
Makefile:191: recipe for target 'dungeon' failed
make: *** [dungeon] Error 2

Yacc is a parser generator which, according to the Install.unx file, is being used here to produce dungeon and level compilers. A GNU tool called Bison is compatible with Yacc, and can do the same thing. Let’s install Bison:

sudo dnf install bison

We’ll also need the program flex, which is a lexical analyzer that can tokenize input data to be used by Bison. (There is a lot more to yacc and lex (and Bison and flex), but for the purposes of building the NetHack package, this is all we need to know.)

sudo dnf install flex

We must also edit the sys/unix/Makefile.utl file:

# yacc/lex programs to use to generate *_comp.h, *_lex.c, and *_yacc.c.
# if, instead of yacc/lex you have bison/flex, comment/uncomment the following.
YACC = yacc
LEX = lex
# YACC = bison -y
# YACC = byacc
# LEX = flex

Let’s comment out the yacc and lex lines and uncomment the bison -y and flex lines.
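After the edit, that part of sys/unix/Makefile.utl should look roughly like this:

# YACC = yacc
# LEX = lex
YACC = bison -y
# YACC = byacc
LEX = flex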

Now, since we’ve edited Makefile.utl, we need to run the setup script again. In sys/unix:

sh setup.sh hints/linux

And then, in the top directory:

make all

This time, there are no errors; we are told that everything is done. Indeed, if we look inside the src directory, we’ll find a new executable file entitled nethack. Let’s run it.

./nethack
/home/andrei/nh/install/games/lib/nethackdir: No such file or directory
Cannot chdir to /home/andrei/nh/install/games/lib/nethackdir.

It seems it wants to be installed. Let’s run the following command to see what would happen if we installed it:

make -n install

This prints all the steps that make install would go through, but it doesn’t actually execute them. It appears that the installation would be wholly contained inside the ~/nh/install/games/ directory. I spent a few minutes creating that directory and copying the necessary files into it by hand, and I was able to run the program, but there’s really no need to do that—nor do I think that anyone would be interested in the (very straightforward) process. Let’s just go ahead and install the package:

make install

Now, in the ~/nh/install/games directory there is an executable file called nethack. Let’s go there and run it:

./nethack

[screenshot of NetHack starting up]

It works!

[screenshot of a NetHack game in progress]

It is a hard game.


by andrei600 at January 31, 2016 10:52 PM


Giuseppe Ranieri

Assembly

During our Lab 3 in class we were assigned the task of learning more about how assemblers work and what they are used for. First, though, we had to learn the differences between the x86_64 registers and the aarch64 registers. It's interesting to see the different design philosophies each platform decided on.

An important thing to note is the differences and the similarities. For example, ARM instructions follow a RISC (reduced instruction set computer) design. The benefit is that instructions are all very simple and fixed length. x86 has more complicated, variable-length instructions, as it is a CISC (complex instruction set computer) architecture.

For the lab itself, the task required building a simple program that ran on x86_64 and aarch64. The program first looped from 0 to 9, but by the end it had to go from 0 to 30. I'm just going to focus on the x86_64 portion of the lab, as the logic for one was the same as for the other, with just minor tweaks I'll elaborate on later.

The main differences between assembly and higher level languages for me are:
  1. Symbols
  2. Keeping Track of Registers
  3. A Different Way of Thinking

 

Symbols

Although the program is simple in nature, it still taught me the basic building blocks that more complicated programs will ask for. It is important to lay out memory so that you don't have to move things around later. Symbols make sure all of this corresponds to the correct memory addresses, registers, or other values.


Keeping Track of Registers

I found it very interesting how registers are handled. They speed up processor operations by acting as internal storage locations. The only problem and challenge is that there are so few registers to use. I am assuming that later on you will need to point to memory instead of just using registers, but that is still yet to be seen. It will be interesting to compare these thoughts later down the road as I learn more in SPO600.


A Different Way of Thinking

If the task were asked for in a higher-level language, it could be finished in a few lines, as this pseudocode suggests:
for count in range(0,30):
        print "Loop:" + count

But this was a bit more complicated. For example, although counting out loud in your head past 10 is simple, doing this in ASCII in assembly isn't. Instead we had to take the number, divide it by 10 using an instruction, and then take the quotient and the remainder and display them as characters.
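Just to illustrate the logic (in C, not the assembly we actually wrote), the divide-by-10 trick amounts to something like this:

#include <stdio.h>

int main() {
    for (int count = 0; count <= 30; count++) {
        char tens = '0' + count / 10;   /* quotient of the divide-by-10 */
        char ones = '0' + count % 10;   /* remainder */
        printf("Loop: %c%c\n", tens, ones);
    }
    return 0;
}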


by JoeyRanieri (noreply@blogger.com) at January 31, 2016 09:54 PM


Nina Wip

Lab 3 - Code Building

Time to write some assembly! We did this lab in groups so we could put our heads together to write some good assembly code.

First of all we had to download the examples to our directories on the servers and unpack them. This was done with the tar and make commands.
Once we had the example files we could take a look at the differences between dumping the binary built from the C file and the one built from the assembler file. When dumping the C version you get a bunch of information and a lot of extra code. When dumping the assembler version you only get the instructions.

Now it was time to make our own little program. We needed to make a simple looper that goes from 0-9 and prints out the number.
After making our own file with the simple Hello World looper we needed to compile the assembler code using the following commands:
       Run the assembler: as -g -o test.o test.s
       Run the linker: ld -o test test.o
You're going to use these commands a lot, so it's easier to write them as one line and re-use it.
       as -g -o test.o test.s; ld -o test test.o; ./test

If you do not have the right permissions, use the following command:
       chmod 700 filename

Now to edit some assembler code. 
To print a number in assembler you need to work with ASCII codes. In our case we needed to add 48 to the number because 48 is the ASCII code for the character '0'. We added this offset to an empty register that would not get trampled at the start of the program. In the loop itself we added the offset to the number it's supposed to be (0-9) and set the result in the rsi register. This is the message location register.
In theory this was supposed to work, but we made a little mistake in the .data section. We had set it to .rodata, which means it was read-only. After changing this to .data it did print all the numbers, but the newline character in our string did not seem to work. Apparently the mov command moves a full 64 bits, even if you only have 1 byte of data. This caused the number to overwrite the newline character. This was an easy fix: add the b suffix to mov and to the register you move from.

The next step was to loop from 00-30. 
To get this to work we divided the loop number by 10 and kept track of our remainders. To do this you use the div instruction, which takes the rax register and divides it by a chosen register. It stores the quotient into rax and the remainder into rdx. The rdx register has to be set to 0 before you use div. As before, we added our offset of 48 to rax and rdx and moved these into the string.

The last step was to suppress the leading zero when the number is below 10.
We did this by checking if the quotient was zero. If it was, we used the je (jump if equal) instruction to jump to a label. This label would be after the mov command that moves the first digit to the string, so the first digit would not get printed if the jump occurred.

by Nina (noreply@blogger.com) at January 31, 2016 07:33 PM

Lab 4 - Compiled C

This lab was about compiling code with different arguments. We'll be discussing 6 changes. Each change was done by a different group. My group did change 5, so that one is more extensive.

Starting code
We used the simple hello world code for this lab:
#include <stdio.h>

int main() {
printf("Hello World!\n");
}

We compiled this code using the GCC compiler with the following arguments.
-g               # enable debugging information
-O0 # do not optimize (that's a capital letter and then the digit zero)
-fno-builtin # do not use builtin function optimizations
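Put together (with the source saved as hello.c), the full command is the same one used in the other lab posts:

gcc -g -O0 -fno-builtin hello.c -o hello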

In the objdump you could see that <main> calls <printf> to print Hello World.

1. Add the compiler option -static. Note and explain the change in size, section headers, and the function call.
When you add the option -static, the compiler imports the entire library when it only needs one function. This causes the file size to be much larger. 

2. Remove the compiler option -fno-builtin. Note and explain the change in the function call.
<printf> replaced with <puts>

3. Remove the compiler option -g. Note and explain the change in size, section headers, and disassembly output.
When you enter the command objdump --source you normally get the written code together with the assembler code. This is handy when you want to see which assembler code belongs to which C code. When you remove the option -g you remove all the debug information. This also means that when you enter the command objdump --source, you do not get the C code in the file, because this is seen as debug information.

4. Add additional arguments to the printf() function in your program. Note which register each argument is placed in. 
In the AArch64 architecture 7 arguments get saved in registers, and the overflow gets pushed onto the stack.
In the Intel architecture only 5 arguments get stored in registers, and the overflow also gets pushed onto the stack.

5. Move the printf() call to a separate function named output(), and call that function from main(). Explain the changes in the object code.
5.1
Our task was to move the printf() function to a separate function output().
#include <stdio.h>

void output(){
printf("Hello World!\n");
}

int main() {
output();
}

We compiled this with the same arguments stated above. The objdump now shows:
<main> calls <output> and that calls <printf>
<output> looks the same as the original <main> did before we introduced the output() function.

5.2
We now compiled the following code to inspect differences when you add parameters.
#include <stdio.h>

void output(char a[]){
printf(a);
}

int main() {
output("Hello World!\n");
}

In the objdump you see that it loads the given parameter into the x0 register.
In <main>, a pointer to the argument is placed in register x0 before calling <output>.
<output> then takes the x0 parameter and puts it in the x1 register as argument 2 for printf.

File size increased slightly with each change.

6. Remove -O0 and add -O3 to the gcc options. Note and explain the difference in the compiled code.
These options have to do with optimizations. The compiler gets 'smarter' and deletes lines of code it doesn't need. For example, with -O0 it sets up the stack frame but never uses it after that; those lines are deleted with -O3. Another example is that after <printf> is done, it doesn't return to main; it goes back to whatever called <main>.

by Nina (noreply@blogger.com) at January 31, 2016 07:32 PM

Lab 1 - Code Review

The purpose of this lab is to explore code review processes used in open source projects. We had to select two open source project that have different licenses.

Amarok
The first project I looked at was called Amarok. This is an iTunes-like music player for Linux, Unix and Windows. https://amarok.kde.org/. This project is licensed under the GNU General Public License.

There are a few ways you can contribute to this project:
  • Create artwork for the application
  • Write code in C++ using the Qt and KDELibs toolkits
  • Write scripts that expand the application
To start contributing they recommend you take a look at the bugs and see if you can help with any of them. Their bug tracking system can be found at: https://bugs.kde.org/buglist.cgi?quicksearch=amarok

All the code that you write must be submitted to their reviewboard which can be found at https://git.reviewboard.kde.org/groups/amarok/?show-closed=1&page=1.
On this reviewboard all the members can comment on your contribution. However there are only a few members with commit rights. So when members comment that the code looks good and would be a nice addition, they'll most likely comment that someone with commit access should push this to the master branch. An example I found: https://git.reviewboard.kde.org/r/120930/#review69630

A more recent example of an actual submission: https://git.reviewboard.kde.org/r/2/
Here you can see that normal members comment on it and say to ship it, and a member with commit access commits the code. 

Chromium
The second project I looked at was Chromium, the open-source browser behind Google Chrome: http://www.chromium.org/Home. This project is licensed under the Creative Commons Attribution 2.5 License.

There are three major paths you can take to contribute to Chromium:
  • You have a new idea
  • You want to change existing code
  • You want to work on an existing bug
When you have a new idea you want to add to Chromium, you first have to post your idea to a discussion group. In this group they'll discuss whether your idea is worth adding to the application. I couldn't find an example of this, because their discussion group is also used for many other topics. It can be found here: https://groups.google.com/a/chromium.org/forum/#!forum/chromium-discuss

If you want to change existing code, you have to contact the person who originally wrote the code and ask if your idea would be good to add. 

Not all bugs in their system are assigned, so if a bug is not assigned you're allowed to pick it up and assign it to yourself. If the bug you want is already assigned you can contact the person it's assigned to and ask if you could help or take over. It's also possible to file your own bug and work on it, but these bugs cannot be whole new ideas; they have to be things like simple cleanups and style fixes. The listed bugs can be found here: https://code.google.com/p/chromium/issues/list

To submit your code you have to add it to the code review site: https://codereview.chromium.org/ 
Your code will be reviewed within 24-48 hours if the patch is not too big. All members can comment on your code and give you feedback. Only some members have commit access. An example I found: https://codereview.chromium.org/1650283002/

Overall Chromium takes communication very seriously; if you're working on something you need to keep communicating with the relevant people, otherwise you might have a hard time getting your code reviewed.


by Nina (noreply@blogger.com) at January 31, 2016 07:32 PM

Lab 2 - Code Building

GNUChess
https://www.gnu.org/software/chess/
As recommended, I went to the GNU website to find some projects to install on my Fedora installation. There was a whole list of projects and I decided to go with a game. One game I chose was gnuchess.
Since this was my first time installing an open source project on a Linux device, I had to find some instructions on how to install the game. Luckily the package had an install guide in it, which told me to run the commands ./configure, make, and make install. The command ./configure did work, but I got errors when I tried to execute the make commands. Then I just tried to run the gnuchess file and it asked if I wanted to install the packages, which I said yes to. When I ran the gnuchess file again, the game started in the terminal.


I could not yet install a non-GNU project. The projects I found were too big, and I could not install all the needed components. I will try again later.

by Nina (noreply@blogger.com) at January 31, 2016 07:30 PM

January 30, 2016


Yunhao Wang

SPO600 – Lab4

In this lab, a simple Hello World program is used to demonstrate three different
GCC compiler options and the way they affect the code in an x86_64 environment:
Hello World:
#include <stdio.h>
int main() {
    printf("Hello World!\n");
}
GCC Compile Option:
-g # enable debugging information 
-O0 # do not optimize (that's a capital letter and then the digit zero) 
-fno-builtin # do not use builtin function optimizations
(1) gcc -g -O0 -fno-builtin -o: this is the original build, used to compare with the others
Size: 9,536 bytes

0000000000400536 <main>:
#include <stdio.h>
int main() {
 400536: 55 push %rbp
 400537: 48 89 e5 mov %rsp,%rbp
 printf("Hello World!\n");
 40053a: bf e0 05 40 00 mov $0x4005e0,%edi
 40053f: b8 00 00 00 00 mov $0x0,%eax
 400544: e8 c7 fe ff ff callq 400410 <printf@plt>
 400549: b8 00 00 00 00 mov $0x0,%eax
}
 40054e: 5d pop %rbp
 40054f: c3 retq

0000000000400410 <printf@plt>:
 400410: ff 25 02 0c 20 00 jmpq *0x200c02(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x18 >
 400416: 68 00 00 00 00 pushq $0x0
 40041b: e9 e0 ff ff ff jmpq 400400 <_init+0x20>
(2) gcc -g -O0 -fno-builtin -static -o: add the compiler option -static
Size: 855,736 bytes
With the -static option the compiler includes the whole stdio library (not only the function used by the hello world program)
in the binary itself, which makes the file size much larger than the original version.

The differences between the original version and the static version are highlighted in red.
It uses an _IO_printf function call instead of the printf@plt function call.
There is also one additional 'nopl' instruction, which is a multi-byte 'do nothing' operation used for padding.
000000000040095e <main>:
 40095e: 55 push %rbp
 40095f: 48 89 e5 mov %rsp,%rbp
 400962: bf 10 77 49 00 mov $0x497710,%edi
 400967: b8 00 00 00 00 mov $0x0,%eax
 40096c: e8 3f 0b 00 00 callq 4014b0 <_IO_printf>
 400971: b8 00 00 00 00 mov $0x0,%eax
 400976: 5d pop %rbp
 400977: c3 retq
 400978: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
 40097f: 00

Here is _IO_printf. Unlike printf@plt in the original version,
it contains the whole routine for displaying
on the screen. In the original version, printf@plt jumps to the actual display function in the dynamically linked library.
00000000004014b0 <_IO_printf>:
 4014b0: 48 81 ec d8 00 00 00 sub $0xd8,%rsp
 4014b7: 84 c0 test %al,%al
 4014b9: 48 89 74 24 28 mov %rsi,0x28(%rsp)
 4014be: 48 89 54 24 30 mov %rdx,0x30(%rsp)
 4014c3: 48 89 4c 24 38 mov %rcx,0x38(%rsp)
 4014c8: 4c 89 44 24 40 mov %r8,0x40(%rsp)
 4014cd: 4c 89 4c 24 48 mov %r9,0x48(%rsp)
 4014d2: 74 37 je 40150b <_IO_printf+0x5b>
 4014d4: 0f 29 44 24 50 movaps %xmm0,0x50(%rsp)
 4014d9: 0f 29 4c 24 60 movaps %xmm1,0x60(%rsp)
 4014de: 0f 29 54 24 70 movaps %xmm2,0x70(%rsp)
 4014e3: 0f 29 9c 24 80 00 00 movaps %xmm3,0x80(%rsp)
 4014ea: 00
 4014eb: 0f 29 a4 24 90 00 00 movaps %xmm4,0x90(%rsp)
 4014f2: 00
 4014f3: 0f 29 ac 24 a0 00 00 movaps %xmm5,0xa0(%rsp)
 4014fa: 00
 4014fb: 0f 29 b4 24 b0 00 00 movaps %xmm6,0xb0(%rsp)
 401502: 00
 401503: 0f 29 bc 24 c0 00 00 movaps %xmm7,0xc0(%rsp)
 40150a: 00
 40150b: 48 8d 84 24 e0 00 00 lea 0xe0(%rsp),%rax
 401512: 00
 401513: 48 89 fe mov %rdi,%rsi
 401516: 48 8d 54 24 08 lea 0x8(%rsp),%rdx
 40151b: 48 8b 3d 7e bb 2b 00 mov 0x2bbb7e(%rip),%rdi # 6bd0a0 <_IO_stdout>
 401522: 48 89 44 24 10 mov %rax,0x10(%rsp)
 401527: 48 8d 44 24 20 lea 0x20(%rsp),%rax
 40152c: c7 44 24 08 08 00 00 movl $0x8,0x8(%rsp)
 401533: 00
 401534: c7 44 24 0c 30 00 00 movl $0x30,0xc(%rsp)
 40153b: 00
 40153c: 48 89 44 24 18 mov %rax,0x18(%rsp)
 401541: e8 fa 41 01 00 callq 415740 <_IO_vfprintf>
 401546: 48 81 c4 d8 00 00 00 add $0xd8,%rsp
 40154d: c3 retq
 40154e: 66 90 xchg %ax,%ax
(3) gcc -g -O0 -o: remove the compiler option -fno-builtin
Without the -fno-builtin option, the GCC compiler will use builtin functions to optimize the code.
The difference between the original version and the builtin version is highlighted in red.

It uses puts@plt instead of printf@plt.
0000000000400536 <main>:
 400536: 55 push %rbp
 400537: 48 89 e5 mov %rsp,%rbp
 40053a: bf e0 05 40 00 mov $0x4005e0,%edi
 40053f: e8 cc fe ff ff callq 400410 <puts@plt>
 400544: b8 00 00 00 00 mov $0x0,%eax
 400549: 5d pop %rbp
 40054a: c3 retq
 40054b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

The functionality of puts@plt and printf@plt is the same when we only have one parameter in the printf function call.

If we have more than one parameter, the GCC compiler will use printf@plt in both situations.
0000000000400410 <puts@plt>:
 400410: ff 25 02 0c 20 00 jmpq *0x200c02(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
 400416: 68 00 00 00 00 pushq $0x0
 40041b: e9 e0 ff ff ff jmpq 400400 <_init+0x20>

puts is a straightforward function call to display a string on the screen,

while printf will scan the parameters and the '%' conversions to determine the output, which takes extra time.
(4) gcc -O0 -fno-builtin -o: remove the compiler option -g
Size: 8,504 bytes
The difference between the original version and the no-debug-information version is highlighted in red.
Because the debugging information is not included without the -g option, the total file size is reduced.
0000000000400536 <main>:
#include <stdio.h>
int main() {
 400536: 55 push %rbp
 400537: 48 89 e5 mov %rsp,%rbp
 printf("Hello World!\n");
 40053a: bf e0 05 40 00 mov $0x4005e0,%edi
 40053f: b8 00 00 00 00 mov $0x0,%eax
 400544: e8 c7 fe ff ff callq 400410 <printf@plt>
 400549: b8 00 00 00 00 mov $0x0,%eax
}
 40054e: 5d pop %rbp
 40054f: c3 retq
(5) gcc -g -O0 -fno-builtin -o: add additional arguments to printf()

#include <stdio.h>
int main() {
 int a=10,b=20,c=30,d=40,e=50,f=60,g=70,h=80,i=90,j=100;
 printf("Hello World!\n%d %d %d %d %d %d %d %d %d %d",a,b,c,d,e,f,g,h,i,j);
}
The first five values declared in the Hello World program are moved into five different registers: %eax, %edx, %ecx, %edi, and %r8d.

The last five values declared in the program are moved through the register %esi and pushed onto the stack.

0000000000400536 <main>:
 400536: 55 push %rbp
 400537: 48 89 e5 mov %rsp,%rbp
 40053a: 48 83 ec 30 sub $0x30,%rsp
 40053e: c7 45 fc 0a 00 00 00 movl $0xa,-0x4(%rbp)
 400545: c7 45 f8 14 00 00 00 movl $0x14,-0x8(%rbp)
 40054c: c7 45 f4 1e 00 00 00 movl $0x1e,-0xc(%rbp)
 400553: c7 45 f0 28 00 00 00 movl $0x28,-0x10(%rbp)
 40055a: c7 45 ec 32 00 00 00 movl $0x32,-0x14(%rbp)
 400561: c7 45 e8 3c 00 00 00 movl $0x3c,-0x18(%rbp)
 400568: c7 45 e4 46 00 00 00 movl $0x46,-0x1c(%rbp)
 40056f: c7 45 e0 50 00 00 00 movl $0x50,-0x20(%rbp)
 400576: c7 45 dc 5a 00 00 00 movl $0x5a,-0x24(%rbp)
 40057d: c7 45 d8 64 00 00 00 movl $0x64,-0x28(%rbp)
 400584: 44 8b 45 ec mov -0x14(%rbp),%r8d
 400588: 8b 7d f0 mov -0x10(%rbp),%edi
 40058b: 8b 4d f4 mov -0xc(%rbp),%ecx
 40058e: 8b 55 f8 mov -0x8(%rbp),%edx
 400591: 8b 45 fc mov -0x4(%rbp),%eax
 400594: 48 83 ec 08 sub $0x8,%rsp
 400598: 8b 75 d8 mov -0x28(%rbp),%esi
 40059b: 56 push %rsi
 40059c: 8b 75 dc mov -0x24(%rbp),%esi
 40059f: 56 push %rsi
 4005a0: 8b 75 e0 mov -0x20(%rbp),%esi
 4005a3: 56 push %rsi
 4005a4: 8b 75 e4 mov -0x1c(%rbp),%esi
 4005a7: 56 push %rsi
 4005a8: 8b 75 e8 mov -0x18(%rbp),%esi
 4005ab: 56 push %rsi
 4005ac: 45 89 c1 mov %r8d,%r9d
 4005af: 41 89 f8 mov %edi,%r8d
 4005b2: 89 c6 mov %eax,%esi
 4005b4: bf 60 06 40 00 mov $0x400660,%edi
 4005b9: b8 00 00 00 00 mov $0x0,%eax
 4005be: e8 4d fe ff ff callq 400410 <printf@plt>
 4005c3: 48 83 c4 30 add $0x30,%rsp
 4005c7: b8 00 00 00 00 mov $0x0,%eax
 4005cc: c9 leaveq
 4005cd: c3 retq
 4005ce: 66 90 xchg %ax,%ax

(6) gcc -g -O0 -fno-builtin -o: move the printf() call to a separate function named output()
#include <stdio.h>
void output(){
 printf("Hello World!\n");
}
int main() {
 output();
}

The program calls output from main and calls printf@plt
from output, and moves $0x0 into %eax to empty the register before each function call.

000000000040054c <main>:
 40054c: 55 push %rbp
 40054d: 48 89 e5 mov %rsp,%rbp
 400550: b8 00 00 00 00 mov $0x0,%eax
 400555: e8 dc ff ff ff callq 400536 <output>
 40055a: b8 00 00 00 00 mov $0x0,%eax
 40055f: 5d pop %rbp
 400560: c3 retq
 400561: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
 400568: 00 00 00
 40056b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

0000000000400536 <output>:
 400536: 55 push %rbp
 400537: 48 89 e5 mov %rsp,%rbp
 40053a: bf 00 06 40 00 mov $0x400600,%edi
 40053f: b8 00 00 00 00 mov $0x0,%eax
 400544: e8 c7 fe ff ff callq 400410 <printf@plt>
 400549: 90 nop
 40054a: 5d pop %rbp
 40054b: c3 retq
(7) gcc -g -O3 -fno-builtin -o: change -O0 to -O3
The compiler removed the two stack-frame setup instructions at the start of main and cleared %eax
with xor %eax,%eax instead of mov $0x0,%eax, as the optimized main below shows.

Original one:
 400536: 55 push %rbp
 400537: 48 89 e5 mov %rsp,%rbp
 40053a: bf e0 05 40 00 mov $0x4005e0,%edi
 40053f: b8 00 00 00 00 mov $0x0,%eax
 400544: e8 c7 fe ff ff callq 400410 <printf@plt>
 400549: b8 00 00 00 00 mov $0x0,%eax
 40054e: 5d pop %rbp
 40054f: c3 retq

0000000000400440 <main>:
 400440: 48 83 ec 08 sub $0x8,%rsp
 400444: bf f0 05 40 00 mov $0x4005f0,%edi
 400449: 31 c0 xor %eax,%eax
 40044b: e8 c0 ff ff ff callq 400410 <printf@plt>
 400450: 31 c0 xor %eax,%eax
 400452: 48 83 c4 08 add $0x8,%rsp
 400456: c3 retq
 400457: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
 40045e: 00 00

by yunhaowang at January 30, 2016 12:00 AM

January 29, 2016


Andrei Topala

Code Review Lab

Hello, friends! In this post I will examine two open-source software packages that have different licenses.

First, let’s look at what exactly is meant by “open-source software.” In the early years of digital computers, software was generally an academic concern. The focus was on research and problem-solving. To this end, software was usually distributed alongside its source code in order to allow users—who were, at the time, largely scientists and researchers—to examine and, if need be, to modify the software so that it would run on their particular systems.

As the market for computers grew, however, the nascent software development industry began to shift away from the open-source model, giving rise to proprietary, or “closed-source” (or “non-free”), software, which soon established itself as the norm. The reasons for this paradigm shift in software distribution are simple: if users must pay to purchase or to distribute software, then developers will have a strong monetary incentive to create better products, and so will trend to excellence—it’s the American dream and the very heart of capitalism. The dream was realized, ubiquitous giants such as Microsoft and Facebook and Google took form, and, as of today, proprietary software is, at least in the view of the laity, the standard.

But there have always existed developers and hobbyists who have shared their code, and who believed that a free and open approach to software development can be mutually beneficial to both developers and users; and, with the advent of the Internet, this community expanded and established channels of communication to foster a growing free software movement. There exist now a multitude of powerful and important open-source projects that offer an unrestrictive alternative to closed-source projects; moreover, there is some open-source software that has no proprietary equivalent.

A quick point must be made, I think, about the distinction between “open-source” software and “free software.” The first term is practical; the latter is political. We’re not about politics right now, but this leads us into the concept of licenses. Open-source software is released under a license that delineates what a user may or may not do with it. The purpose of licenses is twofold. Firstly, licenses can allow software to be used, manipulated, and distributed freely, effectively releasing it from the confines of its automatic copyright (copyright is, in general, a good thing by most accounts, but one must wonder what it says about us that freedom requires a license; imagine here Bond brandishing a license to free). Secondly, licenses can protect the freedom granted—for instance, some licenses specify that the software can be freely manipulated but also that programs derived from that software must be subject to the same condition.

Well, with that out of the way, we can move onto the meat and potatoes of this post: the examining of two open-source software packages that have different licenses.

The first open-source program we’ll look at is Firefox, a web browser named after an animal that is neither on fire nor a fox.

320px-redpandafullbodyIt’s a red panda.

Firefox was developed by the Mozilla Foundation, a non-profit organization that spearheads the Mozilla project, of which Firefox is a part. It is released under the Mozilla Public License.

The Firefox source code is hosted on Mozilla’s Mercurial (a distributed revision control tool) code repository, from where developers can download and build the Firefox package to analyze the workings of the program.

The process of contributing code for Firefox centers on Bugzilla, a web-based bug-tracking system to which users submit ‘bugs’ (which may be errors in the program’s functioning or simply features that the user would like to see implemented). Mozilla uses Bugzilla (if you’re curious about the -zilla, it does come from Godzilla, and it’s supposed to denote size and power and all things that are -zilla (but probably not the nuclear war metaphor)) for the development of many of its products: alongside Firefox are Thunderbird, SeaMonkey, etc. Each product has its own section, and all changes to the code are tracked by bugs in Bugzilla.

Each bug is submitted with various information specifying its current status, the product to which it is related, the relevant component of that product, etc. Moreover, each bug may be assigned to a person, who would then be responsible for work done on the bug. Most bugs are initially unassigned. To have a bug assigned to oneself, a person must post a message in the bug, declaring his or her intention to work on it and asking someone with bug-editing privileges (usually an experienced and trusted developer) to assign the bug. If a bug is already assigned, one may email the current assignee and cooperate on the project.

Code must be submitted in the form of a patch, which represents a single complete change. A user can attach a patch to a bug on Bugzilla and request that it be reviewed. As a module of the Mozilla project, Firefox is supervised by a module owner. For code to be accepted and integrated into the Firefox module, it must be reviewed and approved by the module owner or one of the owner’s designated peers. The user who develops the patch can upload it and flag it for feedback. Once enough feedback has been given and taken into account, and the user feels that the patch is ready to be implemented, he or she can attach it and flag it for review. A reviewer will then either accept or reject the patch. If the reviewer rejects it, he or she will provide further feedback and elaborate on what must be done in order to have the patch accepted. The user can then submit a modified patch and ask for another review, and this cycle continues until a patch is accepted. If the patch affects the Firefox UI, it must be reviewed further by a member of the Mozilla UX (User eXperience) team. Finally, if all is good, the patch is tested through a “try server,” and, if all is still good, the user can denote that the patch is ready to commit. Then, a user with commit access will push the patch to the repository, and, if it passes all the automated tests, it will be merged into the main branch and be released in the Nightly build, which is used for testing and which will eventually be finalized and released as an official build. Experienced contributors are encouraged to use a repository-based code-review system called MozReview to submit patches; the system is integrated with Bugzilla and simplifies the task of patching and committing for trusted users.

What stands out most about this process, I think, is its efficiency in consolidating code and ideas, and its simplicity. An issue can be brought up by one person, solved by another, and reviewed by yet another. The famous slowness of bureaucracy doesn’t exist here. Now, the system isn’t perfect—some issues are abandoned halfway through, some unresolved bugs date back years, and so on, but these are, by and large, limited to small, inconsequential features. Most important issues are either solved or being worked on.

Now let’s see how another side does it. Let’s look at GIMP, the GNU Image Manipulation Program, which is released under the GNU Public License. GIMP uses git (another distributed version control system), and its source code is hosted on the GNOME git repository.

GIMP uses a few mailing lists and an IRC channel to facilitate communication between developers. A new developer is encouraged to introduce himself or herself on IRC or on one of the mailing lists and to announce the proposed changes he or she is set to make. Then, after the developer makes those changes, he or she must generate a patch file using the git format-patch command. Afterwards, the patch must be sent to the mailing list along with a summary of what it does. The patch will then be reviewed and, if someone with sufficient administrator privileges says it’s okay, it will be accepted and merged into the code base.
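
In concrete terms, the patch-generation step looks roughly like this (a sketch, assuming the change has already been committed locally; the mailing-list address is my assumption of the GIMP developer list, not something confirmed above):

git format-patch -1 HEAD                                          # writes 0001-<subject>.patch for the most recent commit
git send-email --to=gimp-developer-list@gnome.org 0001-*.patch    # list address assumed; the patch can also be attached to Bugzilla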

The process is quite similar to that of Firefox developers. Indeed, GIMP also uses Bugzilla to keep track of bugs, and a developer may choose simply to submit his or her patch to Bugzilla rather than to the mailing list. The differences in their use of Bugzilla are only in the smaller details. Unlike Firefox bugs, GIMP bugs are not assigned to a particular person but to the GIMP section in general, and the history of a bug and the various stages in its fixing seem to be less strictly organized. This, I think, results from the size of each project and that of its respective development team. The GIMP team is smaller. Firefox, with its larger team and Mozilla’s financial backing, can afford the sort of meticulous precision that characterizes its patching cycle.

At the most basic level, the submission of code into an open-source project is the same regardless of the project or (at least in these two cases) the license. A developer writes code that is necessary or desirable, and submits it for consideration; someone in a position of authority reviews the code; and the code, if accepted, is merged into the software package.


by andrei600 at January 29, 2016 04:19 AM

January 27, 2016


Berwout de Vries Robles

SPO600 Lab 2

For this week's lab we will be looking at two Open Source packages from the Free Software Foundation and installing them.
In order to find them I will browse through the GNU project's free software directory at:
http://directory.fsf.org/wiki/GNU

My first choice is GNUstep, a graphical development environment for Objective-C that plans to add support for C++ in the future. We will look at the documentation offered through
http://www.gnustep.org/resources/documentation/User/GNUstep/userfaq_toc.html
 in order to install the package.

On Fedora, most of the required packages for this installation are included in the "Development tools" group, so it is recommended to install those first.

If you are not on Fedora, the packages can be found in the installation tutorial and each one has a specific use, you can opt-out of most of them if you do not require the functionality so read carefully what each one does.
http://www.gnustep.org/resources/documentation/User/GNUstep/gnustep-howto_2.html#Preliminaries

I tried installing the different packages and running the application, but my first attempt did not work: I accidentally installed it through the install command, thinking that would work as well, when I should have compiled it from source instead.

After my first attempt I went to try a simpler program to see if I could get that to install.
I chose http://directory.fsf.org/wiki/Hello which is a modification of the Hello World program everyone writes when they just start programming. It is a demonstrative program meant to educate you. We install it by downloading the source from the provided link. This is a .tar.gz file, which means it is in compressed form.
We can decompress it by using the command
tar -xzf filelocation/filename.tar.gz
After which we install it by executing the following commands:
./configure
make

We can now run our Hello program by typing hello in the appropriate directory.

This is the standard procedure for installing source packages. Sometimes the names will be slightly different, for instance ./configuration; view the file names inside the package to see what your specific package requires.

After my success with the Hello program, I downloaded the tar.gz files for GNUstep and used the tar -xvf command to extract them.
After installing the gnustep-make package using ./configure and make, we have to run
 . /usr/local/share/GNUstep/Makefiles/GNUstep.sh
Then we install the other packages the same way we installed the make package.
We are now ready to run GNUstep, before we do though we have to make sure our environment is set up correctly by running the GNUstep.sh file again with the sh command.

After the compilation we can run the program from the commandline. In class I learned that we should not install the files with root privilege, because it could overwrite the packages already present in Fedora. Instead we can just run the programs from the directories in which we installed them.
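
Pulled together, the sequence looked roughly like this (a sketch; the version number in the archive name is just an example):

tar -xzf gnustep-make-2.6.7.tar.gz      # version number is only an example
cd gnustep-make-2.6.7
./configure
make
. /usr/local/share/GNUstep/Makefiles/GNUstep.sh    # set up the GNUstep environment
# repeat the configure/make steps for gnustep-base, gnustep-gui and gnustep-back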




by Berwout de Vries Robles (noreply@blogger.com) at January 27, 2016 05:03 PM

January 26, 2016


Vishnu Santhosh Kumar

Assembly Language (lab-3)

This lab session was a challenging and fun experience in understanding low-level programming. This lab was done in a group.

Stage :1

In the first part we built and ran a "hello world" assembler program on the xerxes (x86_64) and betty (aarch64) servers. This helped us to understand the basic difference between assembly code written for different computer architectures, in this case x86_64 and aarch64. The mnemonics used in the two architectures were entirely different, but the logical idea was the same. I understood the concept of interacting with the registers and the functions of various registers.

Stage :2

Part A.

Compared with Stage 1, this part was a little more challenging. In this part I had to deal with conditional statements, loops, and more advanced ways of using registers to get the desired output. This entire part of the assembly code was done on the x86_64 machine. The first obstacle was to integrate the concepts of the Stage 1 program into the given code to display a loop count along with a message string. We successfully displayed the message string, but the hardest part was to include the loop count in the display. We used mnemonics like 'add' to concatenate two pieces of string (message string + loop count), but that led to memory overflows and we got segmentation faults.
Prof. Chris Tyler gave us hints about using the 8-bit, 16-bit, or 32-bit portion of a 64-bit register with 'mov' mnemonics and register names with certain suffixes. We then came up with the idea of replacing a character of the message string, at a known index, with the loop count stored in the register 'r15'. This is accomplished by taking 1 byte (8 bits) from 'r15' and writing it over that character of the message string with the 'movb' mnemonic.
ex: movb %al,(disp+5), where 'al' is the low 8 bits of the 'rax' register, which contains the ASCII value of the loop count, and disp+5 is the character at index 5 of disp.

Part B.

In this part of Stage 2 we have to print the loop count from 0 to 30, and the main task was to eliminate the zero at the front of the loop count.
ex. loop: 01
loop: 02
…
loop: 30
When we first printed the values, the display showed some undesired output after loop count 9. From this we understood that a single character position can only hold values less than 10. We then used the 'div' instruction to get the quotient and the remainder of the loop count and integrate them together in the message to show values of 10 and above. The 'div' instruction stores the quotient in the 'rax' register and the remainder in the 'rdx' register; the divisor is given along with the instruction. In order to eliminate the leading zero in the loop count, we had to use a compare and a jump on the value of the quotient.

This is the resultant code:

.text
.globl    _start
start =1                      /* starting value for the loop index; note that this is a symbol */
max = 31                       /* loop exits when the index hits this number  */
_start:
movq $start,%r15  /* loop index */
movq $10,%r10
loop:
movq %r15,%rax
movq $0,%rdx
div %r10
cmp $0,%rax
je loop2
add $48,%rax
movb %al,(disp+5)
loop2:
movb %dl,(disp+6)

movq $disp,%rsi
movq $len,%rdx
movq $1,%rax
movq $1,%rdi
syscall
 inc     %r15                /* increment index */
    cmp     $max,%r15           /* see if we’re done */
    jne     loop                /* loop if we’re not */
    movq     $0,%rdi             /* exit status */
    movq     $60,%rax            /* syscall sys_exit */
    syscall
.data
disp:  .ascii "Loop:  \n"
 len =  . - disp
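
As a plain-shell illustration of what the div-based digit split above is doing (the loop value 23 is just an example):

i=23                      # example loop count
q=$(( i / 10 ))           # quotient  -- what ends up in %rax after div
r=$(( i % 10 ))           # remainder -- what ends up in %rdx after div
echo "Loop: $q$r"         # the assembly adds 48 to each digit to get its ASCII character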

 

Stage :3

 

// TODO>>>>>>>>>>>>>>


by vishnuhacks at January 26, 2016 11:43 PM

January 23, 2016


Yunhao Wang

SPO600 Lab3

This is the first time I have stepped into assembly language, which is so different from the other programming languages I had learned. In this lab, we build a simple loop program to get our feet wet in both the AArch64 and x86_64 environments.

The lab required us to build a simple loop program to display as below:

Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9
Loop:10
and so on

Here is the code writing on x86_64.

.text
.globl _start

start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */

_start:
 mov $start,%r15 /* loop index */
 mov $10,%r10
loop:
 /* ... body of the loop ... do something useful here ... */
 mov %r15,%rax
 mov $0,%rdx
 div %r10      
 cmp $0,%rax
 je onedigit
 mov %rax,%r14
 add $48,%r14
onedigit:
 mov %rdx,%r13
 add $48,%r13
 movb %r14b,msg+5
 movb %r13b,msg+6
 mov $len,%rdx
 mov $msg,%rsi
 mov $1,%rdi
 mov $1,%rax
 syscall
 inc %r15 /* increment index */
 cmp $max,%r15 /* see if we're done */
 jne loop /* loop if we're not */

 mov $0,%rdi /* exit status */
 mov $60,%rax /* syscall sys_exit */
 syscall

.data

msg: .ascii "Loop: \n"
.set len , . - msg

In this code, the value is converted to an ASCII character by adding 48, the 2-digit number is produced with the div instruction (quotient in rax, remainder in rdx), the leading 0 is suppressed with the cmp/je pair, and the digits are placed into the "Loop: " string with the movb instructions into msg+5 and msg+6.

Here is the code writing on aarch64 with similar logic:

.text
.globl _start

start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */

_start:
 mov x3,start /* loop index */
 mov x10,10
loop:
 adr x11,msg
 /* ... body of the loop ... do something useful here ... */
 udiv x4,x3,x10
 msub x5,x10,x4,x3
 cmp x4,0
 b.eq onedigit
 add x4,x4,48
onedigit:
 add x5,x5,48
 strb w4,[x11,5]
 strb w5,[x11,6]
 mov x0,1
 adr x1,msg
 mov x2,len
 mov x8,64
 svc 0

 add x3,x3,1   /* increment index */
 cmp x3,max    /* see if we're done */
 b.lt loop     /* loop if we're not */

 mov x0,0 /* exit status */
 mov x8,93 /* syscall sys_exit */
 svc 0

.data

msg: .ascii "Loop: \n"
len= . - msg

In this code, the same methods appear: the value is converted to an ASCII character by adding 48, the 2-digit number is produced with udiv and msub (quotient and remainder), the leading 0 is suppressed with the cmp/b.eq pair, and the digits are placed into the "Loop: " string with the strb instructions into msg+5 and msg+6.

Because the syntax is totally different between aarch64 and x86_64, the methods look different, but the logic behind the code is the same.


by yunhaowang at January 23, 2016 09:24 PM

January 22, 2016


Vishnu Santhosh Kumar

Building a software package (Lab-2)

1.Dico

A dictionary server operates on a set of databases. Each database contains a set of headwords with corresponding articles, therefore it can be regarded as a dictionary, in which articles supply definitions (or translations) for headwords.
I downloaded the software from the GNU website. The install process was pretty easy and it was clearly described in the documentation. After the files are downloaded using wget, the archive is unpacked using tar and configured using './configure'. The software package is compiled with the 'make' command and then 'make install' installs it. The steps were pretty easy and I didn't run into any trouble with this one.
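
As a rough sketch of that sequence (the URL and version number are examples, not necessarily the exact ones used):

wget https://ftp.gnu.org/gnu/dico/dico-2.2.tar.gz    # URL/version are examples
tar -xzf dico-2.2.tar.gz
cd dico-2.2
./configure
make
make install      # installs into the configured prefix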

2.Nettle

Nettle is a cryptographic library that is designed to fit easily in more or less any context: In crypto toolkits for object-oriented languages (C++, Python, Pike, …), in applications like LSH or GNUPG, or even in kernel space. In most contexts, you need more than the basic cryptographic algorithms, you also need some way to keep track of available algorithms, their properties and variants. You often have some algorithm selection process, often dictated by a protocol you want to implement.
The build process for Nettle was very similar to that of Dico. The software is downloaded using the wget command and unpacked using 'tar'. The package is then configured and compiled, and finally the software is installed.


by vishnuhacks at January 22, 2016 04:03 AM

January 19, 2016


Gideon Thomas

#DatatypeCastingProblems

Warning: This post is an informative programming excerpt

Changing the datatype of a variable from one form to another can often be tricky. This is especially true for dynamic languages like JavaScript.

Recently, I had to run a DB migration that was meant to convert existing text dates (e.g. ‘2016-01-18T19:34:56.264Z’) in the table to actual DB date objects. The migration to do this was written in JS.

Unfortunately, the text dates could either be ISO formatted or a timestamp (Epoch-based). You would think that doing something like:

var myDateObj = new Date(myDateString);

would just simply work. Well, unfortunately this is the case:

function getDateObj(date) {
    return new Date(date);
}

getDateObj('2016-01-18T19:34:56.264Z'); // works fine
getDateObj('1453145696264'); // throws an exception

Well, annoying as it is, the solution seems quite straightforward right – try to cast whatever I get in getDateObj to an integer. If it throws, use the string, if not, use the parsed integer.

function getDateObj(date) {
    var validDate;
    try {
        validDate = parseInt(date);
    } catch(e) {
        validDate = date;
    } finally {
        return new Date(validDate);
    }
}

Makes sense doesn’t it? In fact, for both of the above data sets, you get Date objects without an exception.

Except, the Date object created from the ISO string has the wrong date! Wtf???

The Date object will represent a date that is 2016ms from epoch time. Why is that?

Well, turns out the issue is with our faulty assumption that parseInt will throw (something that I have become accustomed to from Java and C++). According to MDN Docs:

“If parseInt encounters a character that is not a numeral in the specified radix, it ignores it and all succeeding characters and returns the integer value parsed up to that point.”

So, an ISO date string will be parsed up to the first hyphen it encounters. Hence the catch block is never entered :(

To fix the issue, I was told to use Date.parse instead and test for NaN (which will happen for a value like ‘1453145696264’). If it is NaN use parseInt, otherwise use the string as is.

Well, that was an interesting experience (especially because I nearly wrecked a production database because of this :P).
Moral – Tread carefully when attempting to make a dynamically-typed language behave like a statically-typed one.


by Gideon Thomas at January 19, 2016 05:11 PM


Kenny Nguyen

SPO600 Lab 2

Link To Lab

Tasks

  1. Do not build or install this software as the root user.

    Do not build the software as root, and do not install the software into the system directories. Doing so may cause conflicts with other software on the system and/or may leave your system in an unusable state, and may be very difficult to reverse.
    
  2. Select a software package from the Free Software Foundation's GNU Project.

  3. Build the software. You may need to install build dependencies (e.g., compilers, tools, and libraries); you can do this (and only this) as the root user.
  4. Test that it works.
  5. Select a second, open source software package that has a different license, and repeat the process with that software.
  6. Blog about the process, your results, your observations, and what you learned.

Wget

Link to software

License: GNU General Public License

The first software I decided to install on my Macbook Pro is wget; it's a simple program that simply isn't installed on OS X (although curl serves basically the same purpose). It is also quite simple to test.

To start off I find the latest source of wget I can get. I proceed to:
http://ftp.gnu.org/gnu/wget/
to get the source and run:

curl -O http://ftp.gnu.org/gnu/wget/wget-1.17.tar.xz

That gives me the compressed file, which I unpack with:

xz -dc wget-1.17.tar.xz | tar xopf -

It'll simply extract whatever's in there to a folder with the file's name.

To install I simply followed the instructions given to me by the wget wiki, minus the install step:

$ gunzip < wget-1.17.tar.gz | tar -xv
$ cd wget-1.12
$ ./configure
$ make
# make install

Although, since I'm running on OS X, I did run into some dependency issues when it came to ./configure. My machine didn't have gnutls, which from a cursory glance handles SSL connections, so I quickly installed the dependencies on my Mac by entering this into the terminal:

brew install gnutls

This is simply a package manager I installed separately that works somewhat like apt-get on Ubuntu; that command got gnutls as well as a few of the dependencies that gnutls needs.

After installing the dependencies I needed, ./configure worked, make worked, and I proceeded into the src folder in order to run the application.

In my test run I will download a gif off my blog to the present directory and display that gif in order to show that the application has been installed properly. Below is a gif of my run:
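
The commands for that test look roughly like this (a sketch; the URL is a placeholder for the gif on my blog):

cd src
./wget http://example.com/some-image.gif    # placeholder URL; fetches the gif into the current directory
open some-image.gif                         # OS X: open the downloaded file to check it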

MPV

Link to software

License: GPLv2

Alright, for my next application I've decided to install mpv. It's a media player based on MPlayer/mplayer2 that's focused on efficient media playback. It's available at https://www.mpv.io

The simple steps to get it are:

 curl -O https://github.com/mpv-player/mpv/archive/v0.15.0.tar.gz

gunzip -c mpv-0.15.0.tar.gz | tar xopf -

cd mpv-0.15.0

./bootstrap.py // This downloads the latest build tools

./waf configure //Pretty much the same as ./configure from the previous application

./waf build

From there, assuming there were no dependency issues, the software is ready to use, compiled as mpv inside the build folder.

For my test I'll play a video in my Documents folder and add an additional option to make the video start playing 8 seconds into the video. The gif provided below displays my test.
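
Roughly, the command looks like this (a sketch; the filename is a placeholder):

./build/mpv --start=8 ~/Documents/some-video.mp4    # filename is a placeholder; --start seeks to 8 seconds before playback begins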

Closing Notes:

I can't say that I'm all that unfamiliar with building software from source. In the past I have run into issues with dependencies that didn't arise when I tried installing these 2 packages, but I feel that I'm slowly starting to understand how dependencies work.

by Kenny Nguyen at January 19, 2016 04:57 PM


Tom Ng

Lab 2: Compiling My Everyday Tools from Source

So I downloaded the gcc source. The thought of compiling a compiler was interesting because I use compilers all the time, and if a compiler is just a program then it too was probably compiled by another compiler. When I went to GCC's website to get gcc, there was a warning saying that there was probably no reason to download the source without a C compiler. That made me wonder: who made the first compiler? Sounds like one of those chicken-and-egg things.

I downloaded a tarball from an ftp mirror. It turned out to be larger than I thought, exceeding 100 MB. After a bit of Googling, I extracted the files from the tarball. This took longer than the example done in class with gzip, which makes sense considering that source was also much smaller. I ran the configuration script but it ended prematurely because it could not find GMP 4.2+, MPFR 2.4.0+ and MPC 0.8.0+. This sort of surprised me because I thought such a vital piece of software would not have dependencies other than the C compiler used to compile it. I also noticed that the script outputted that the build system type, the host system type and the target system type were all “x86_64-unknown-linux-gnu”. I was running this off of a live USB for another class (MAP524) which is based off of Linux Mint 17.3 and I wondered if it had anything to do with the configure script being unable to detect any of the types.

Because I did not have root access for the operating system (the password given to me did not work), I could not install the other dependencies so I decided compiling gcc was for another day and went to compile something else.
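
(For reference, on a Debian-based system like that Mint image those missing prerequisites would normally come from the package manager, roughly like this; the package names are my assumption and installing them needs root:)

sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev    # package names assumed for Mint/Ubuntu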

From the long list of things under software on GNU's website, I chose to download less. I use less a lot more than cat or vi, and it is my favorite utility for viewing files that aren't source code. less took (literally) less time to download, being almost instant, and the tarball was a mere 300 KB. The tarball too extracted almost instantly and contained no directories. Unfortunately the configuration script failed this time because it could not find terminal libraries. Perhaps I should compile on another operating system…

Finally I decided to try matrix. less was ftp'd without issue, but the tarball for gcc couldn't be uploaded properly, so instead I downloaded the emacs source, which was roughly half the size of gcc, and uploaded that tarball instead. The configuration script ran without issue and created a makefile. The makefile also ran without issue and produced the less executable in the same directory. It finally worked!

Now on to emacs… Turns out extracting the tarball exceeded my data quota on matrix… maybe that's why gcc couldn't be uploaded. Fortunately the tmp directory doesn't seem to have a quota so I extracted it there. When running the configure script, it ended prematurely because the package libungif was missing. The configuration script however allowed for --with-gif=no to be used as a parameter, and then the script ran fine. It took much longer than the script for less but it was finally done and a makefile was produced. Running make filled the screen with commands spanning multiple rows, mostly because of the libraries that had to be linked. Many compile commands took at least one second to complete. After about five minutes it was finally done. Strangely neither an executable nor a bin directory was created. After reading the installation notes it turned out the executable is created in the src directory, which I thought was unusual. The executable works however, and was four years and one version more recent than the one on matrix… though without gif support.
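
The working configure invocation was essentially this (a sketch, run from the extracted emacs source directory):

./configure --with-gif=no    # skip gif support since libungif is missing
make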


by tng23spo600 at January 19, 2016 04:35 AM


Kenny Nguyen

First blog post

Alright well I finally got my blog up, running and hopefully setup properly.

In the past, maybe half a year ago I had a blog up for my coop term at CDOT, but well it kind of exploded along with my windows 8.1 Server so there's no helping that.
I guess I can talk about my old server. It was sort of a horrifying chimera of software.

A basic rundown of what it was would be:

  • Windows 8.1 Home Edition
  • Ghost Blog running inside cmd, not automated at all so I had to open it manually every time the machine crashed
  • Ftp server for no apparent reason
  • Samba Server with 5 Hard Disks
  • Plex Server Software
  • And to top it all off, absolutely, ZERO redundancy

So yeah, not the most reliable server possible. With this new blog I've decided to host it with ghost again, I know I could just jump over to wordpress or any of the other numerous free blog sites, but there is a certain element to it that makes me want to self host.

With this blog I've put my faith in cloudatcost vps, specs seen below:

Along with a free dns, which is duckdns.

I've outfitted the vps with docker-64bit which is what I assumed was ubuntu with docker installed but... I don't know if I'm inexperienced or not, but I did run into a particular issue.

I had setup the VPS properly with duckdns, logged in via putty. Went to finally install the ghost docker package when this error popped up:

It may be my inexperience, but I'm pretty quick to just assume that something's wrong on my VPS host's end.

Well I think this is a good enough point to stop my first blog post.

by Kenny Nguyen at January 19, 2016 12:09 AM

January 18, 2016


Giuseppe Ranieri

Code Building

GNU Units

The first software I will be compiling is GNU Units, and it uses the GNU License. After I downloaded the file through FTP, unzipped and untarred it, I found there was no makefile at all in the directory. I wasn't sure what the issue was:

I thought maybe I had to do ./install-sh but was perplexed as to why it wouldn't work (I assumed that should have at least done something, as it was an executable). I had then noticed the INSTALL readme file and opened it up to see what to do.

After reading through the documentation everything worked and I was able to run units correctly. The configuration and make both completed very quickly, in less than a few seconds, even though the documentation said otherwise.

LUA

The next software I installed and compiled is Lua, and it uses the MIT License. The installation method was easier as the makefile was already inside the directory. The compiling had some issues.


After googling the fatal error I discovered the problem was that I had to install the readline development library. After doing so, the project was able to install correctly.
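
Roughly what that fix looked like (a sketch; the package name is the Fedora one, and the Makefile target assumes a Linux build):

sudo dnf install readline-devel    # Fedora package name; on Debian/Ubuntu it would be libreadline-dev
make linux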


by JoeyRanieri (noreply@blogger.com) at January 18, 2016 07:41 AM

Open Source & Contributing To Software

This post will briefly speak of two open source software packages that anybody can contribute to and how it can be done.

parsedatetime Library

The parsedatetime library is an open source Python library able to parse human-readable date/time strings and can be obtained at https://github.com/bear/parsedatetime. It supports many different types of date formats. I've used the library in the past for personal projects and I've always found that no matter the string, it's able to come out ahead with the correct parse. The license for the project is the Apache License.
  1. Go to the issues tab on GitHub to see any problems with ongoing development. https://github.com/bear/parsedatetime/issues
  2. To contribute a patch, make a "pull request" through git for the issue. 
  3. Wait for the author to comment and review your proposal.
  4. If thoroughly reviewed and approved, the project author will merge the pull request.
The author responds very quickly and the pull request can be merged quickly if he approves, sometimes even the same day. Other times, bugs or problems will be cataloged for others to go through and fix themselves. Although the project has become very thorough over the years, there are still edge cases here and there that the author needs help fixing.

jQuery

jQuery is a JavaScript library designed to simplify the client-side scripting of HTML. jQuery is the most popular JavaScript library in use today. I have personally used it many times while working for clients who required intricate details for their web projects. It can be obtained at https://jquery.com/. It uses the MIT License. To contribute:
  1. Go to https://contribute.jquery.org/ 
  2. Through the side menu decide whether you want to contribute to:
    • Bug Triage
    • Code
    • Community
    • Documentation
    • Support
  3. You'll need to establish the need for a fix or feature.
  4. Discuss the ticket with the team
  5. Create a pull request
  6. Respond to code reviews and wait for it to be accepted.
    When contributing, it's important to follow the style guide found here https://contribute.jquery.org/style-guide/ and to make sure unit tests are added. The community itself has many ways to support through forums or even Meetups. This is a quick and easy way to join an open source community.

    by JoeyRanieri (noreply@blogger.com) at January 18, 2016 06:56 AM

    January 16, 2016


    Andrew Smith

    You try to give Microsoft a break but no…

    I feel like Microsoft hates me. It’s not just that it doesn’t care – it is going out of its way to keep me off its operating system, its ecosystem, everything it owns or controls.

    A few weeks ago I decided to accept the inevitable and “upgrade” from Windows 7 to Windows 10. I know about all the privacy problems and the forced automatic updates, but I figured might as well get used to it, in a couple of years there will be no old-school Windows to use.

    With that (pretty good) mindset I upgraded the OS. At first I thought it looked pretty nice, considering how terrible Windows 8 was. They realised people do still want to use a keyboard and mouse, and put some effort into making them work properly.

    I saw the automatic updates too, kind of annoying that they can’t be turned off but I was going to accept the future so I accepted that. I tried to play some games, most of them didn’t work, but again, I was trying to accept the future.

    Then I started playing Batman Arkham City. I would get to where I kick Mr Fries’s ass and the game would crash. Weird, old game, or something… but I tracked the problem down to the video driver. The newer video drivers are incompatible with this older game.

    Nvidia knew about it but decided not to fix the problem, which I understand, their job must be incredibly difficult and it’s awesome that they are as good as they are with compatibility. I found an older Nvidia driver, uninstalled the new one, installed the old one, and played the game – good times.

    At some point I noticed (the screen flickered) that my video driver got updated… eh.. I downgraded it on purpose, but I guess I knew Windows 10 does that. I managed to finish playing the game, the driver bug only affected that one place.

    Then I played the game again from the start and of course ran into this problem again. I already had the old driver handy, knew what to do, uninstalled the new one, rebooted, all good.

    Went back to the game and I shit you not – Windows downloaded and upgraded my video driver while I was in the middle of playing a game. Not only does that suck generally (the game will crash when the video driver is modified), but in this particular case it caused more than a minor inconvenience – Windows prevented me from doing the only thing that I needed Windows for in the first place!

    What am I supposed to do now? I found a tutorial to disable driver updates – but it said right at the top it may not work (it didn’t). What other options are there? Screw around with random registry keys? Pull out the network cable and reinstall windows? Go back to Windows 7? Stop using Windows altogether?

    I’m so annoyed I may go for the last option. I did my best and gave Microsoft an honest chance, and they fucked it up. Even my legendary patience has limits. Maybe I’ll buy a console, as long as I can find one that doesn’t require online activation of games that I bought and are supposed to be mine for as long as I keep the disks.

    by Andrew Smith at January 16, 2016 07:32 AM

    January 15, 2016


    Tom Ng

    Lab 1: Different Licenses, Similar Procedures

    So I had to choose two open source programs and see how they managed their code. I started with Audacity because it was open on my screen at the time.

    Audacity is an audio editor and is licensed under the GNU GPL v2. After poking around on their website, they seem to have a wiki, a forum for general users and a bugzilla. The bugzilla required an account to submit bugs or requests and an account can only be created by requesting for one via e-mail. However I was able to view a list of bugs without an account. The bugs all have a status (NEW, RESOLVED, DEVEL – FIX MADE or REOPENED) and a resolution (—, FIXED, DUPLICATE, QUICKFIXED, WONTFIX, WORKSFORME, INVALID). When sorted by date/time, most of the recent bugs were of NEW status and — resolution. Bugs dating back from May had more varying statuses and resolutions but there were still bugs that were NEW yet had no resolution.

    I decided to read some of these entries to see how they were dealt with. The first thing everyone seems to do is see if the bug is reproducible. If it is then a link to a fix on GitHub is posted by a developer. If the fix works, the user who filed the bug then confirms it in a reply. Then the bug receives the RESOLVED status and either a FIXED or QUICKFIXED resolution. The number of participants for entries varies. Some bugs can have 0 participants (especially the ones that are NEW and without resolution), most have one developer and one or two users and a few have more. The response time for the developers often takes a few days and code changes can take as long as a few months after the bug was first reported.

    Looking for a second program to analyze took a lot more time than I thought since I had to find one with a different license. It seemed like every open source software I had was licensed under some form of the GPL (either v2, v3 or LGPL). It took a while but I finally found SFML on my computer.

    SFML (Simple and Fast Multimedia Library) is a library for developing software with input and media. It uses its own license and contains the licenses for other external libraries it depends on. On the website, there were two different links, a bug report link to the help section of their forums and an issue tracker to their GitHub. The GitHub contains an issues page which contains bugs and merge requests from other contributors.

Much of the discussion and bug fixing happens in a manner similar to the way Audacity does it. A user describes a problem, and other users and developers try to replicate the problem. If a problem is indeed present, fixes are posted in the replies and, if approved by a developer, committed to the project. Unlike Audacity, most of the issues seem to be resolved or in the process of being looked at, though this may be attributed to the fact that this project is not as widely known as Audacity.


    by tng23spo600 at January 15, 2016 05:42 AM

    January 13, 2016


    Vishnu Santhosh Kumar

    vishnuhacks

    Some Open Source Projects


    Fedora

    Fedora is a Linux-based operating system, launched by Red Hat in 2003. There are different ways to contribute to the Fedora open source project; instructions on how to join the Fedora community are described on the community wiki page.

    How To Contribute ?
    • The first important step towards becoming a contributor is to choose the category in which we wish to work; some of the options include OS developer, content writer, designer, etc.
    • Then create a fedora account.
    • Subscribe to the mailing list of the category we have chosen to contribute to.
    • Send an introductory email to the group with the necessary information prescribed on the community wiki page.
    • Instead of waiting for getting-started help from group members, we can start attending meetings on IRC [Internet Relay Chat] and introducing ourselves to the group.
    • Get membership on any of the groups in our category and learn more about their projects.
    • Attend the meetings on IRC regularly; contributions are accepted through the mailing list.
    Example of a Patch Submitted on the fedora open source community in infrastructure category.

    Anitya
    Anitya is a release-monitoring project developed within Fedora. There are 19 contributors to the project, which started in November 2013. One of the changes added to the project was a test check button that lets you see what things would look like in the database. Some of the important issues discussed in this project include creating a user-friendly interface for editing or creating new projects and reporting them to Fedora Bugzilla.

    MySQL

    MySQL is an open source Relational Database Management System (RDBMS).

    The primary step in contributing to the MySQL software is to participate in the internal mailing lists and discussions. In order to contribute, we have to sign the Oracle Contributor Agreement. Find a project of interest using the mailing list or the bug tracking system. To submit a patch, we have to include the patch in a bug report and send the report to the mailing list or the bug tracking system. Explicitly mentioning the impact of the patch is really appreciated by the community.

    One of the patches made in the community was to fix the problem of 'Starting MySQL 5.7 on Windows 10'.
    The bug was reported on 4 January and closed on 11 January. The community participated actively in this issue, and many people contributed to it. The discussion lasted almost a week and ended up producing small changes in the code of the MySQL 5.7 server. The problem occurred due to some configuration problems within the installation on Windows 10.

     

     


    by vishnuhacks at January 13, 2016 06:33 PM

    January 12, 2016


    Berwout de Vries Robles

    Welcome to my blog

     I am Berwout de Vries Robles, a Software Engineering student in my sixth semester. I am currently enrolled in the CPA program at Seneca College as an exchange student. My home country is the Netherlands, I will be in Canada for my minor semester after which I will return home. I will use this blog to publish course assignments and thoughts regarding Software Portability and Optimization. I hope that it will provide a thorough understanding of my work process and lead to a meaningful contribution for the large open source project we will be working on.

    by Berwout de Vries Robles (noreply@blogger.com) at January 12, 2016 07:03 PM

    January 04, 2016


    David Humphrey

    On Sabbatical

    As I begin the new year, I'm trying something new. I'll be taking a sabbatical for all of 2016. Normally at this time of year I'm starting new courses and beginning research projects that I'll lead in the spring/summer semester. I've been following that rhythm now for over 16 years, really giving it all I have, and I'm in need of a break. Coincidentally, I'll be turning 40 in a few months, and that means I've been in the classroom, in one form or another, for the past 35 years.

    While I wouldn't count myself among those who enjoy 'new' for newness sake (give me routine, discipline, and a long road!), I'm actually very grateful for the opportunity provided by my institution. Historically a sabbatical is something you do every seventh year, just as God rested on the seventh day and later instructed Moses to rest every seventh year. I'm aware that this isn't so common in private industry, and I think it means you end up losing good people to burn out with predictable regularity. At Seneca you can apply for one every 7 years (no guarantees), and I was lucky enough to have my application accepted.

    The sabbatical has lots of conditions: first, I make only 80% of my salary; second, it's meant for people to use in order to accomplish some kind of project. For some, this means upgrading their education, writing a book, doing research, or developing curriculum. While I'm still figuring out a schedule, I do have a few plans for my year.

    First, I need to retool. It's funny because I've done nothing but live at the edge of web technology since 2005, implementing web standards in Gecko, building experimental web technologies and libraries, and trying to port desktop-type things to the web. Yet in that same period things have changed, and changed radically. I've realized that I can't learn and understand everything I need to simply by adding it bit by bit to what I already know. I need to unlearn some things. I need to relearn some things. I need to go in new directions.

    I'm the kind of person who likes being good at things and knowing what I'm doing, so I find this way of being a hard one. At the same time, I'm keenly aware of the benefits of not becoming (or remaining) comfortable. There's a lot of talk these days about the need for empathy in the workplace, and in the educational context, one kind of empathy is for a prof to really understand what it feels like to be a student who is learning all the time. I need to be humbled, again.

    Second, I want to broaden not only my technological base, but also my community involvement, peers, and mentors. I've written before about how important it is to me to work with certain people vs. on certain problems; I still feel that way. During the past decade open source won. Now, everyone, everywhere does it, and we're better off for this fact. But it's also true that what we knew--what I knew--about open source in 2005 isn't really true anymore. The historical people, practices, and philosophies of 'open' have expanded, collapsed, shifted, and evolved. Even within my own community of Mozilla, things are totally different (for example, most of the people who were at Mozilla when I started aren't there today). I'm still excited about Mozilla, and plan to do a bunch with them (especially helping Mark build his ideas, and helping to maintain and grow Thimble); but I also need to go for a bunch of long walks and see who and what I meet out amongst other repos and code bases.

    I'm going to try and write more as I go, so I'll follow this blog post with other such pieces, if you want to play along from home. I'm less likely to be on irc in 2016, but you can always chat with me on Twitter or send me mail. I'd encourage you to reach out if you have something you think I should be thinking about, or if you want to see if I'd be interested in working on something with you.

    Happy New Year.

    by David Humphrey at January 04, 2016 09:17 PM

    December 19, 2015


    Anderson Malagutti

    [SOLVED] Ubuntu WiFi does not work – HP Envy

    Hello everybody.

    A couple of days ago I got an HP Envy (17-inch) laptop, and I decided to install Ubuntu as my OS on this machine.

    Everything was great, the installation was pretty easy and fast…

    The problem was when I started the OS itself… I could not use Wifi, and couldn’t connect to a wireless router and use an internet connection.

    After going on the web and searching for it, I found the solution, and I guess it’s worth sharing in here.

    So let’s go :D

    You won’t believe how simple it was…

    Basically, you will have to run three commands on your terminal.

    A very important step: as I said before, I couldn't use a wireless router, so to run these commands you must connect your computer with a network cable; after that, make sure you have a connection.

    As we are gonna use the ‘sudo’ keyword, you might have to enter your password to make it work.

     

    Commands:

    $ sudo apt-get update

    $ sudo apt-get install bcmwl-kernel-source

    $ sudo modprobe wl

     

    It was this simple, and then I could see the wireless networks around me, and could successfully use a wifi connection.

     

    Thank you.


    by andersoncdot at December 19, 2015 04:06 PM


    Dmytro Yegorov

    Just SPO600

    So this semester I had the pleasure of taking SPO600, which is about software portability and optimization. In other words, if you have a piece of software, this course teaches you how to make that software work, work with increased performance, and work in different environments (CPU architecture, system requirements, etc.). I would consider this course as something that lies between programming and what is known by the term IT – you do not create software, but you work with the code using some hardware-specific methods.

    During this course I learned about CPUs, how they work, how to make them perform specific instruction, how to program on CPU level and so on. I already knew about CPUs, different processor architectures, but my knowledge was very basic, and now I got a chance to learn in-depth about it and apply my knowledge in practice.

    I really liked how this course was structured, as there were, pretty much, no obligations, due dates or whatsoever – it was very informal and friendly, which makes this course different (in a good way for sure) from any other courses I’ve had in Seneca. Also, we got introduced to Open Source community and were provided with lots of opportunities to visit different events, such as FSOSS and X-Windows conference – another way to get in touch with Open Source communities and meet many interesting people. The only downside of this course was that some questions in quizzes we had were pretty unexpected, and when the professor was answering those questions, you would sit like “What is this? I see it for the first time”. However, if you look at “Additional materials” links at the bottom of each week’s wiki-page, you would find many useful resources and everything gets clear after that. Also, even if you did not know answers for some questions, you still could use logic, and based on the knowledge you already had, guess the right answer – for me it worked 90% of the time :)

    Another reason I found this course very useful for myself is that I am looking for a job in exactly this field. I do like to work with code, but I am not the kind of person who can spend 10 hours a day (and more) writing code; at the same time, I do not feel like working in Tech Support or System Administration, and this course deals with issues that lie exactly somewhere between these two areas. This course's topic is exactly what I'd like to be doing in my workplace. Hopefully, I will be able to find this kind of job soon.

    It was exciting adventure taking this course, and very useful one for sure. I would definitely recommend this course to people that want to understand how exactly processors work and get to know low-level programming.

    Cheers!

    see-you-soon1


    by dyegorov at December 19, 2015 07:25 AM


    Yehoshua Ghitis

    SPO600 Stage III project post

    The build framework was a success for Python-3.5. I was able to see the differences in performance just by changing a few compilation flags, and how those flags affect performance differently based on the architecture the program may be running on. I feel like the Python dev team should work on a build test of their own and incorporate it into a make check to facilitate the job of developers. And if some utility of the sort already exists, they should make it easier to find in their documentation, which I've seen people complain about in the past already.

    Between architectures there was a noticeable gap in performance which can suggest that further development may be required for the interpreter to better optimize it for aarch64 machines.




    It was incredibly helpful to learn about a few tools Linux has to offer for building projects, particularly the usefulness of Make. Along with that, a tool that I have not stopped using (and do not plan to stop using) since I learned it is the screen command, which lets me run a command and then leave my machine alone as the server continues to process it while I'm away. It means I can actually shut down, leave something running overnight, and be able to come back and see the progress the next day with little to no trouble.

    In the future I would like to look into further developing this framework we worked on and possibly improving upon it: maybe add more statistics, or make it easier to use by detecting the different build tools needed to build a certain project (searching for a configure script, a makefile, etc.) and generating or substituting a build plugin as needed, and, in the absence of the proper tools, asking the user to provide them. The same goes for the test tools and benchmarking. I feel this project could do more and be much more useful for users with a little more work outside of the constraints that an educational environment gives it.

    by the Krev (noreply@blogger.com) at December 19, 2015 05:27 AM

    Building and testing the Python interpreter

    Python is an interpreted programming language. This means that instead of compiling source files for a program, Python reads in the source code and runs it on the fly. With this in mind, for my project I have written a Fibonacci calculator to run on Python. Before running the calculator, I intend to run a very simple test script, also in Python, just to make sure that the installation of the interpreter went as planned.

    The test script: (Keep in mind this is using Python 3.5)

    print("hello")

    Not much going on as you can see, but that's not important. What matters is that the script can run successfully. That will show me that Python is installed and working.

    After the test script works, I have the benchmark program time the following (relatively simple) code:


    def fib(n):
        x = 1
        y = 1
        temp = 0
        i = 0
        while i < n:
            print(x)
            temp = x
            x = y
            y += temp
            i = i + 1

    fib(5000)
    The other plugins have relatively few changes, mainly just to run the new plugins using the installation of Python like so:

    Python-[0-9]*/python ./fib.py

    The code for the different plugins is as follows:

    m10k_build_plugin
      #!/bin/bash
    #m10k_build_plugin test file

    # Unpack source
    tar xf Python*t*z

    # Goes into the project directory
    cd Python-[0-9]*

    # Read configuration to be tested
    read PERMUTATION

    # Set FLAGS to the options from permutation
    export CFLAGS="$PERMUTATION"

    ./configure || exit $?

    make -j8

    m10k_test_plugin

      #!/bin/bash

    Python-[0-9]*/python ./test.py

    m10k_benchmark_plugin

      #!/bin/bash

    START=$(date +%s.%N)
    Python-[0-9]*/python ./fib.py
    END=$(date +%s.%N)
    SCORE=$(echo "($END - $START)*100"|bc|sed "s/\.[0-9]*//")
    echo "$SCORE"


    The best scoring results (by memory score) from building on an X86 test server




    Unique Id | Permutation Unique Id | Given Id | Speed Score | Memory Score
    40        | 13                    | 4        | 11          | 7984
    58        | 18                    | 2        | 11          | 7984
    31        | 11                    | 3        | 12          | 7988
    20        | 8                     | 4        | 12          | 7992
    51        | 16                    | 3        | 12          | 7992
    6         | 5                     | 2        | 11          | 7996
    10        | 6                     | 2        | 11          | 7996
    21        | 9                     | 1        | 11          | 7996

    The highest score for memory score was

    11|6|3|12|8120

    The speed score is nearly identical for each result but the memory score does differ (slightly) each time, with the lowest memory score being on row 21 with 8120, and the highest being row 40 with 7940.

    Speed score here has very negligible differences, almost all ranging from 11 to 14, so it can easily be said that row 40 also has the lowest speed score. Meanwhile, row 36 counts with the highest (and only score over 14) of 16.

    36|12|4|16|8092


    The results are much different after running the same test on the AArch64 test server:

    Unique Id | Permutation Unique Id | Given Id | Speed Score | Memory Score
    1         | 1                     | 1        | 236         | 8704
    4         | 1                     | 4        | 236         | 8704
    5         | 2                     | 1        | 243         | 8704
    6         | 2                     | 2        | 235         | 8704
    9         | 3                     | 1        | 243         | 8704

    Most of these scored very similar memory scores, in fact there are only two different memory scores throughout the 64 results given. Those being 8704 and 8768.
    Last few results of the same test:

    Unique Id | Permutation Unique Id | Given Id | Speed Score | Memory Score
    57        | 14                    | 4        | 235         | 8678
    60        | 15                    | 4        | 235         | 8678
    64        | 16                    | 4        | 235         | 8678

    After this test, there is a notable difference between speed scores, especially considering how they are each over twelve times higher than the speed scores of the x86 system. The memory score is slightly higher in each one with similar variations between each iteration, but a score of 8320 appearing much more often now.

    As for the speed scores on the AArch64 run, almost every single one scored 235 with the notable exceptions of

    Unique Id | Permutation Unique Id | Given Id | Speed Score | Memory Score
    1         | 1                     | 1        | 236         | 8704
    2         | 1                     | 2        | 236         | 8768
    4         | 1                     | 4        | 236         | 8704
    14        | 4                     | 2        | 236         | 8768
    37        | 10                    | 1        | 236         | 8704
    10        | 3                     | 2        | 237         | 8704
    8         | 2                     | 4        | 239         | 8768
    7         | 2                     | 3        | 240         | 8768
    12        | 3                     | 4        | 241         | 8704
    5         | 2                     | 1        | 243         | 8704
    9         | 3                     | 1        | 243         | 8704
    44        | 11                    | 4        | 251         | 8704


    There were no permutations with a unique Id of 3 or 8 that yielded a score of 8768. Also, the only unique Ids to give a speed score result that was not 235 were 1, 2, 3, 4, 10, and 11.

    My biggest recommendation to the Python devs would be to include a better way to test that there is a working installation of Python. Maybe include a make check option in their make file.


    This project tested Python-3.5 with the m10k framework, with the purpose of comparing its benchmarks after being compiled with multiple option sets on two different architectures (x86 and AArch64).

    by the Krev (noreply@blogger.com) at December 19, 2015 04:54 AM


    Miguel Dizon

    Final Thoughts on SPO600

    SPO600 is one of the more interesting courses that I took at Seneca. Unlike the other programming courses, SPO600 talks about CPU architectures, assembly code and compiler design. A lot of my courses are about learning concepts or software creation, but this course covers the intricate, low-level details of working with computers. Some of the course content was hard to digest, and I’ll admit that I had a hard time following my professor while he was lecturing about the components of a CPU or how a program accesses memory, but overall it was still worthwhile enrolling in this course. Chris Tyler is also one of the friendliest and most knowledgeable teachers I’ve had.

    Assembly code is very complex compared to high-level programming languages. There was a lot more to think about when I tried to write some assembly code, such as registers and opcodes.

    Learning more about compilers changed how I thought about my code being compiled, since I never thought that they could change my code to perform better while still keeping it ‘correct’.

    The Active Learning Classroom was something unique I haven’t experienced before, and it certainly reflected what I thought about this course. It was nice having multiple screens around the room to look at, but I didn’t like having to use my old Android tablet to do in-class work.

    I’m not sure if the knowledge that I gained from the course can be applied to future studies or my work, but it’s always good to learn about these topics in-depth, since I work with computers everyday and it makes me appreciate it more now that I know what goes on behind the scenes. I would recommend this course to anyone in the CPA or CTY program.


    by madizonseneca at December 19, 2015 04:50 AM


    Nitish Bajaj

    SPO600 Review

    It's been an interesting semester. Like any other course, this course has its ups and downs. There are some great things about this course: the things you learn about the compiler, how it works and all. The bad: it can get a little tiring and boring when looking at assembly. Thankfully, we had one of the best professors to get us through this course. He's friendly, eager to help and very knowledgeable about what he does/teaches. He also understands that we are all taking more than one class and sometimes have difficulty finishing labs on time, and he's very understanding about that, as long as you understand the material he teaches in class. He's probably the quickest teacher I've seen when it comes to replying to emails, which is extremely helpful. Regarding the course, it's good the way it's set up: you play around with the compilers using, well, "Hello World!" (Lab 2) and other stuff as you go on. Also, the room was different from the usual room, with TVs on each wall, which is understandable because we need to see what the professor is doing while he teaches. I did have problems understanding the framework in the beginning, especially when we were in the planning stages, but I'm finally through that now.

    by Nitish (noreply@blogger.com) at December 19, 2015 04:25 AM

    Lab 5

    What is Vectorization?
    In simple words, it is taking a loop that either has multiple loops within it or is doing a lot of data processing and, instead of processing one element at a time, trying to process multiple elements at a time.

    The example I will be using can be found here:

    https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookWritingAutovectorizableCode

    The code I used: 
    const unsigned int ArraySize = 10000000;
    float* a = new float[ArraySize];
    float* b = new float[ArraySize];
    float* c = new float[ArraySize];
    for (unsigned int j = 0; j < 200; j++) {    // some repetitions
        for (unsigned int i = 0; i < ArraySize; ++i)
            c[i] = a[i] * b[i];
    }

    The time it took to run ./a.out without -O3 (vectorization is turned on when the -O3 flag is used): 
    real    0m17.581s
    user    0m17.570s
    sys     0m0.000s


    The time it took to run ./a.out with -O3:
    real    0m1.226s
    user    0m1.220s
    sys     0m0.000s


    As you can see, there is a huge difference. Vectorization can increase the processing speed of your program by a big margin (especially if your code does large amounts of data processing within loops). This was popularized by Intel, which added Single-Instruction-Multiple-Data (SIMD) machine instructions to their CPUs. These instructions let you apply the math operators you would use in a for loop, such as addition and multiplication, to multiple values at a time.
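    To illustrate what one SIMD operation does, here is a minimal sketch (not the benchmark code above, just an illustration of roughly what -O3 auto-vectorization produces) using GCC's vector extension, where a single multiply handles four floats at once:

    #include <stddef.h>
    #include <string.h>

    /* Illustration only: a 4-wide float vector type via GCC's vector extension. */
    typedef float v4sf __attribute__((vector_size(16)));

    void multiply(const float *a, const float *b, float *c, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            v4sf va, vb, vc;
            memcpy(&va, a + i, sizeof va);    /* load four floats */
            memcpy(&vb, b + i, sizeof vb);
            vc = va * vb;                     /* one SIMD multiply, four results */
            memcpy(c + i, &vc, sizeof vc);    /* store four floats */
        }
        for (; i < n; ++i)                    /* scalar tail for leftover elements */
            c[i] = a[i] * b[i];
    }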

    This lab was performed on Bertty, rather than Aarchie. I had trouble logging into Aarchie.

    by Nitish (noreply@blogger.com) at December 19, 2015 04:13 AM


    Jayme Laso-Barros

    Trip Report: FSOSS 2015

    By the recommendation of my professor, I attended the Free Software and Open Source Symposium hosted at my school’s campus. The conference lasted for two days and during that time I attended a few presentations. I will talk about the two presentations that have interested me the most.

    Thursday afternoon’s keynote was presented by Ruth Suehle, the community marketing manager of the Open Source and Standards Group at Red Hat and a senior editor at GeekMom. The keynote revolved around two things: makers and openness. Ruth began her presentation with a small history lesson about how makers have existed since the prehistoric era, explaining how our ancestors went from making tools out of sticks and stones to wall paintings, all the way to more complex contraptions like vehicles and rocket ships. She went into detail about open source and the concerns of those outside its community, such as profitability. This was countered by showing how profits for Raspberry Pi remained stable despite their hardware being cloned multiple times by others. I enjoyed the examples she gave of open-source projects benefiting society, such as the Open Geiger Project in Japan.

    Ruth believes that openness matters and makers are innovators at heart. I found her to be incredibly positive throughout the presentation, but it left me wondering if there were any negatives at all to being more open. After the presentation, I asked her directly if she thought there were any possible downsides. She said there can be, but of course nothing in the world is perfect; the positives of open source greatly outweigh the negatives. What I enjoy about the keynote is that you don’t necessarily have to be tech-savvy to enjoy it. Ruth’s enthusiasm and use of examples outside technology make this a great presentation that can be shown to just about anyone.

    Another presentation I’ll talk about is Optimizing Open Source Projects for Contribution by Joshua Matthews, a Platform Engineer at Mozilla. Joshua highlighted how important it is to document as much as you can, wherever you can. From basic things like README files, where to find support, and how to contribute, to more advanced content such as posting release notes, FAQs, quickstart guides, and changelogs. These can all look like separate tasks that could take time away from actually working on the software itself, but in the long term it helps to attract newcomers to your project and thus allows the community to grow. One cool example given was Dolphin, which is a GameCube and Wii emulator. Every month, changelogs are posted on the front page of the website in the form of a progress report, which lists notable changes made to the project (with images highlighting the issue or change) by the contributors.

    Joshua also advises to avoid information overload; try to keep things brief but detailed, so that people don’t feel overwhelmed just by reading. He explains how important it is to make a good first impression for newcomers, and to also follow through when a contribution is made. Respond to contributors appropriately and with good manners while minimizing feedback time. Recognize their contributions and suggest what should be done next. He believes that any contributor has the potential to end up becoming a long-term maintainer for the project, especially if you treat them right. Overall, Joshua’s presentation shows that documentation for your project is highly important, but so is treating any contributors with openness and respect.


    by Jayme LB at December 19, 2015 04:00 AM


    Miguel Dizon

    Optimizing GNU Make

    As I wrote about in my previous post, I’m using the M10k framework to help me benchmark Make. This framework parses and permutes a configuration file containing groups of compiler flags, builds the programs with the flags, then tests and runs a benchmark on it. I was able to use it for my use case by just modifying its plugins.

    My benchmark basically runs a Makefile 200 times. My professor suggested running the benchmark numerous times, since otherwise a run would only take around two seconds and wouldn’t provide accurate results. The Makefile being used is this. It takes a really long string and manipulates it using some of Make’s text functions. It also links and creates a lot of files.

    COUNT=0
    cp make-4.1/make .
    START=$(date +%s.%N)
    while [ $COUNT -lt 200 ]; 
    do
    	./make --silent
    	./make clean --silent
    	let COUNT=COUNT+1
    done
    END=$(date +%s.%N)
    SCORE=$(echo "($END - $START)*100"|bc|sed "s/\.[0-9]*//")
    # report the score to the framework (the other benchmark plugins end the same way)
    echo "$SCORE"

    My benchmark plugin

    My main goal with this project is to find a set of compiler flags that would help Make perform better on x86_64 and AArch64. These are the compiler flags that were used for my benchmarking. The list was originally provided by my SPO600 class, but I had to remove a few options since the compiler was giving me warnings with them and they were making the self-tests fail. The AArch64 benchmark ran on the same machine as in my previous post, while the x86_64 benchmark ran on an Ubuntu virtual machine on my desktop computer.

    Results

    Permutations

    x86_64 Results:

    [screenshot of the x86_64 benchmark results]

    AArch64 Results:

    [screenshot of the AArch64 benchmark results]

    The surprising thing about these results was that the memory score was consistent across the board on both architectures. As far as speed score goes, the differences between how fast each permutation ran were mostly very marginal. For x86_64, permutation 6 was consistently faster than the others by a few seconds (around 3% on average).

    Conclusion and Recommendations

    The anomalies in my results might mean that, due to the nature of Make, it’s hard to benchmark it. But I don’t think that’s really important. Most of the time, Make is used to execute shell commands to compile, install, and run programs. Most of the time spent processing the Makefiles I’ve used is while a time-consuming process like GCC or its equivalents is executing. If I had more time, I would try to write a benchmark that focuses more on parsing Makefiles and on how Make checks whether files are updated.

    Since GNU Make is very old software that’s part of the GNU build system (which also includes GCC) and it’s very rarely updated, I’d assume that its authors have already refined it a great deal over the years.


    by madizonseneca at December 19, 2015 03:58 AM


    Joseph Jaku

    Final Post for Now

    SPO600 was one of the more interesting classes I've taken during my time at Seneca. When I was selecting my timetable in August, I really didn't have many options. But after hearing some kind words about Chris and the interesting subject material we'd be looking at, I knew that it would be worth it.

    I was a little wary at first because I was one of the few CTY students enrolled in the class, which made me think we'd be doing a lot of programming. I was able to cope in IPC144 and my other language-related courses, but I was definitely rusty this semester. I struggled a bit in the coding that we did do, mostly on lab 3 (the algorithm selection lab) and when trying to contribute m10k_permute to our framework. It would have been nice to spend some more time on these two pieces of the class, but the semester really was a busy one.

    When I first came in there was so much I didn't know. Every class my mind was getting blown by the new content. When we looked at our first objdump, I was amazed to see the effects of compiling source code, I wasn't aware of all the added code just to get a simple "Hello World" script working. Then when we took a look at assembly I was finally able to understand the way a processor interprets code. Up until now, there was a gap in my knowledge between my code and what the processor does with it. After completing SPO600 the puzzle has become more clear. I also had no idea about the Aarch64 architecture and how it differed from the processors I was used to.

    I had a great time in our lecture periods; it was a nice change from the usual class structure. I found it a lot easier to pay attention and learn in SPO, mostly due to the group environment and interesting topics. In a program with such small class sizes it was nice to get a chance to work and talk with other people, so I knew I wasn't the only one who felt clueless. I think the project ended up taking a bit longer than anyone expected, so starting a bit earlier on it wouldn't have hurt, but with a new curriculum it's obviously difficult to plan ahead.

    In the future I'll definitely keep what I've learned this semester in SPO in mind. Before this class I had no idea what effects an over-complicated line of code could have. A few milliseconds of optimization in one place can add up to minutes down the road. I'll be sure to keep things short and sweet and won't be afraid to try different approaches to see which one gives the best overall performance. I'll also be sure to plan well ahead when working on any project, because you never know what could come up.

    by Joseph J (noreply@blogger.com) at December 19, 2015 03:07 AM


    Gaurav Patel

    Software Portability and Optimization course review


    The Software Portability and Optimization course has been rough; it is full of a lot of information. Everything that I learned was new to me and most of it was interesting, and Chris Tyler additionally told us about some handy tools beyond the course material. In the beginning I regretted taking this course, but it turned out to be an interesting class, and when you email Chris with a problem he replies within a day to keep us moving so we don't get stuck on our work.

    The things that I liked most were working with shell scripting, learning new things about Linux, and improving programs using algorithms and the compilation process (compiler options). Chris gave and showed samples of how he did the work, which got us started. Before this course, when I wrote code I'd try to improve it by removing unnecessary variables and code, trying to consume as little memory as possible, but these things never made a difference because I learned that the compiler rearranges the code to improve it. The things that I did to improve the code, the compiler does automatically; therefore, from now on I'll work with compiler options to improve my code rather than doing it manually.

    by Gaurav (noreply@blogger.com) at December 19, 2015 02:58 AM

    Vectorizing

    Everyone wants to write code that gives better performance, and one of the things that slows program performance down is loops. If you are writing a loop that goes through each element of one or more arrays and you want to optimize it, you could use vectorization to get the best performance out of the loop. Vectorization is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.

    Ex:
    void copy(char* dest, char* source, long size) {
        int i;
        for (i = 0; i < size; i++)
            dest[i] = source[i];
    }
    The code above copies data from one location to another. The performance wouldn't matter if the size is small (say 100 - 1000), but if the size is high (millions) then you can see the effect. By using vectorization, the program copies N elements (let's say 4) at the cost of one element copy in the code above, but only if the memory regions of destination and source aren't overlapping. To apply vectorization to your program, you can compile it with the -O3 flag. If the loop is guaranteed to operate on non-overlapping memory regions, then you can add “#pragma ivdep” on top of the loop, which informs the compiler to ignore vector dependencies; vectorizing will then be applied to your loop with even better performance than plain -O3, because the vectorization checks will not run. But if the loop operates on overlapping memory, your program will give false results.
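    As a minimal sketch of that (assuming dest and source never overlap, and noting that with GCC the pragma is spelled "#pragma GCC ivdep"), the copy loop above could be marked like this and compiled with -O3:

    void copy_noalias(char* dest, char* source, long size) {
        long i;
        /* Skip the vectorizer's dependency checks for this loop; only safe
           because we promise that dest and source do not overlap. */
        #pragma GCC ivdep
        for (i = 0; i < size; i++)
            dest[i] = source[i];
    }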
    The difference between loop unrolling and vectorization is that loop unrolling will do 4 (as an example) separate operations per index increment, while vectorization will copy 4 (as an example) pieces of data and place them in the destination in one operation. It's easier to show in code, so here it is:

    Loop Unrolling
    for (int i=0; i < size; i += 4){
           dest[i] = source[i];
           dest[i + 1] = source[i + 1];
           dest[i + 2] = source[i + 2];
           dest[i + 3] = source[i + 3];
    }

    Vectorization
    for (int i=0; i < size; i += 4){
           copyFourThingsAtOnce(&dest[i], &source[i]);
    }
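    copyFourThingsAtOnce() above is just a name made up for illustration; one possible sketch of it, moving four chars with a single 4-byte load and store, could look like this:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: copy four chars using one 4-byte load and one 4-byte
       store. memcpy lets the compiler deal with alignment; with optimization it
       compiles down to a single load/store pair. */
    static inline void copyFourThingsAtOnce(char* dest, const char* source) {
        uint32_t chunk;
        memcpy(&chunk, source, sizeof chunk);
        memcpy(dest, &chunk, sizeof chunk);
    }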

    by Gaurav (noreply@blogger.com) at December 19, 2015 02:57 AM


    Jayme Laso-Barros

    A Reflection on SPO600: Software Portability and Optimization

    Seneca’s Fall 2015 semester is coming to a close this week, and with that I have some brief thoughts and feedback for SPO600.

    I must give credit where credit is due: Chris Tyler is one of the most knowledgeable, friendly, and helpful professors I’ve ever had at Seneca. It goes without saying, but I would definitely attend a course with him teaching again. My only gripe is that I wish I could have viewed my marks as the semester progressed. As of typing this, Blackboard has no grades posted for this course. However, I recall Chris telling us once not to worry so much about the grades, as long as we’ve shown we’ve learned something.

    I also want to talk about the learning environment for this course. This is the first course I attended that uses the school’s Active Learning Classrooms. I can’t say I’m a fan of this approach, however, mostly because a laptop is mandatory. It could be argued that yes, one should have a laptop with them anyway for software courses, but to me class sessions in a computer lab are still great for this. The laptop I used this semester is a bulky, four-year-old ASUS that’s borrowed from a family member and always needs to be plugged in because the battery only lasts for two minutes. And no, I don’t have the money to get myself a new one, otherwise I wouldn’t be complaining. I don’t want to come off as completely negative though, so I will say that I liked the LCD screens all around the walls. It was great for lectures since you didn’t have to crane your neck around to view a single screen, unless Chris started writing on the whiteboard that is. The ability to plug your device into one of the screens is also a plus, since you can display your work from wherever you’re sitting.

    Regarding my project, I definitely regret not putting much more time into figuring out the ins and outs of the class framework, cmake, and the benchmarking tools for MariaDB. If I could rewind time I would definitely ask my professor more questions about things such as why a software package would fail to run make when using gcc instead of c++ as the C++ compiler. While I did have results for building the software with different compiler groups, I couldn’t show anything about how it performs.

    A whole lot of the course material, from the architecture of the servers we used to coding a “Hello World” program in assembly, was completely new to me. I learned a lot more about compiling than I have in the other programming courses combined. To be honest though, I’m not sure where I could apply this knowledge in the future. Nevertheless, it was definitely an interesting course, and I would recommend it to anyone who is interested in either optimizing their C/C++ code better or working with open source software.


    by Jayme LB at December 19, 2015 01:59 AM


    Nitish Bajaj

    OpenSSL

    This was a presentation I had made regarding the project. Earlier in the semester we were told to get to know the program we picked, and I decided to make a presentation (I thought we were presenting, but we weren't). So I'll talk more about this program.
    OpenSSL is an open-source program/toolkit that uses SSL and TLS to make communication and file transfers safe. SSL is a cryptographic protocol, the standard security that makes it possible to have an encrypted link between a client and a server. It ensures no other person can eavesdrop on or interfere with your connection. TLS is the successor to SSL.

    How does SSL Work?
    • SSL Certificate
      • Has a key pair: a public and a private key
      • The certificate also contains a "subject": the identity of the owner
    • Public Key
      • It uses encryption algorithms such as RSA & Elliptic Curve Cryptography (ECC), which allow creating both public and private keys
      • RSA is the initials of the people who first described the algorithm: Ron Rivest, Adi Shamir and Leonard Adleman
    TLS Handshake protocol: Web browser & server 
    1. Browser requests Secure Socket (ask the server to identify itself) 
    2. Server responds with the SSL certificate (server sends a copy of the SSL certificate) 
    3. Session key seed is encrypted with SSL public key and sent to server (the browser checks whether  it trusts the SSL certificate) 
    4. Server indicates that all future communication for the session is to be encrypted 
    5. Browser and Server can send encrypted data between each other.
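    Since this post is about OpenSSL, here is a rough sketch of the client side of that handshake using OpenSSL's C API (1.0.x era). The socket setup, error handling and certificate verification are all omitted, and the function name is made up for illustration:

    #include <openssl/ssl.h>

    /* "fd" is assumed to be a TCP socket already connected to the server. */
    int tls_connect(int fd) {
        SSL_library_init();                                  /* load algorithms */
        SSL_CTX *ctx = SSL_CTX_new(SSLv23_client_method());  /* negotiate SSL/TLS version */
        SSL *ssl = SSL_new(ctx);
        SSL_set_fd(ssl, fd);
        int ok = SSL_connect(ssl);   /* steps 1-4: certificate exchange and key agreement */
        /* step 5: SSL_read()/SSL_write() now carry the encrypted application data */
        return ok == 1;
    }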
    Cipher suite: a combination of authentication, encryption, message authentication code (MAC) and key exchange algorithms, used to negotiate the security settings for the connection.

    Here is one way to check if your connection to the server is encrypted: click on the lock beside the back button and, depending on the version of Firefox you are currently on, it might say "More Information" or give you an error and then more information.
     

    Has it been broken/cracked? 
    • There are reports that say the NSA has been working on breaking encryption such as SSL, VPN, etc. All of these have one thing in common: AES-256.
    • It has also been estimated that a brute-force attack on a message encrypted with AES-256 would take longer than the universe has existed, even for a supercomputer.

    by Nitish (noreply@blogger.com) at December 19, 2015 01:39 AM

    OpenSSL Framework Project

    I decided to do my framework project/benchmarking on OpenSSL. OpenSSL is an open-source program/toolkit that uses SSL and TLS to make communication and file transfers safe. SSL is a cryptographic protocol, the standard security that makes it possible to have an encrypted link between a client and a server. It can be used for other purposes as well, such as encrypting a file on your local machine, which is what I've done in my project.
    Getting an understanding of this framework was a little annoying and tedious for me. I had problems understanding it in the beginning, like many others did, but got the hang of it after my colleagues helped me understand it better. The only real problem I had was an error regarding the file "m10k_cogitate"; I have no idea what caused it, but I was informed by one of my colleagues that running my benchmarks on xerxes would not cause this problem, and it didn't.

    So for my benchmarking, I decided to encrypt the professor's gzip file; I used -O2 and -O3, on xerxes (x86_64).

    Permutation columns: uniqueId, givenId, flags, buildExit, buildWallTime, buildUserSystemTime, buildSize, testExit
    Benchmark columns: uniqueId, permutationUniqueId, givenId, speedScore, memoryScore

    (1, 1, '-O2', 0, 131.62, 172.21, 0, 0)
            (1, 1, 1, 12753, 3548)
            (2, 1, 2, 317, 3504)
            (3, 1, 3, 226, 3536)
            (4, 1, 4, 219, 3536)

    (2, 2, '-O3', 0, 126.9, 173.32, 0, 0)
            (5, 2, 1, 2390, 3436)
            (6, 2, 2, 349, 3536)
            (7, 2, 3, 284, 3456)
            (8, 2, 4, 284, 3488)


    Timing was an issue since it stopped many times and asked me to enter a password to encrypt the file.

    /***************/
    m10k_build_plugin:
    #!/bin/bash
    #m10k_build_plugin test file

    # Unpack source
    tar xzf openssl*t*z

    # Go into the project directory
    cd openssl-1.0.0t

    # Read configuration to be tested
    read PERMUTATION

    # Set FLAGS to the options from permutation
    export CFLAGS="$PERMUTATION"

    #./config || exit $?
    ./config --openssldir=/home/nsbajaj/project/openssl-install || exit $?
    make -j8

    make install


    /********************/
    m10k_benchmark_plugin:
    #!/bin/bash

    cd ~/project/openssl-install/bin
    START=$(date +%s.%N)
    #./gunzip -v <../gutenberg-10M.txt.gz|gzip -v >/dev/null
    rm encryptedFile
    ./openssl aes-256-cbc -a -salt -in ~/project/gzip-1.6.tar.gz -out encryptedFile
    END=$(date +%s.%N)
    SCORE=$(echo "($END - $START)*100"|bc|sed "s/\.[0-9]*//")
    echo "$SCORE"

    /**************/
    I decided to use AES-256 encryption, just because that is the highest AES (Advanced Encryption Standard) key size available and is pretty much impossible to crack. I added salt; -in specifies the input file (the professor's gzip), and once it's done encrypting, it outputs an encrypted file with the name "encryptedFile". 
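    For reference, here is a rough sketch (an illustration only, not part of the plugin) of what that command does internally, expressed with OpenSSL's EVP API from the 1.0.x era. Deriving the key and IV from the password prompt is omitted; they are assumed to be filled in already, and the function name is made up:

    #include <openssl/evp.h>

    int encrypt_buffer(const unsigned char *in, int in_len,
                       unsigned char *out, int *out_len,
                       const unsigned char key[32], const unsigned char iv[16]) {
        EVP_CIPHER_CTX ctx;
        int len1 = 0, len2 = 0;
        EVP_CIPHER_CTX_init(&ctx);
        EVP_EncryptInit_ex(&ctx, EVP_aes_256_cbc(), NULL, key, iv);  /* AES-256 in CBC mode */
        EVP_EncryptUpdate(&ctx, out, &len1, in, in_len);             /* encrypt the data */
        EVP_EncryptFinal_ex(&ctx, out + len1, &len2);                /* write the final padded block */
        EVP_CIPHER_CTX_cleanup(&ctx);
        *out_len = len1 + len2;
        return 1;
    }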

    /**************/
    m10k_test_plugin:
    #!/bin/bash

    cd openssl-[0-9]*
    make test


    This benchmarking was performed on the OpenSSL 1.0.1k-fips 8 Jan 2015.



    by Nitish (noreply@blogger.com) at December 19, 2015 01:17 AM


    Jayme Laso-Barros

    Recommendations for the MariaDB source code and cmake

    Our project for the SPO600 course requires each student to build, test, and benchmark an open-source software package (that benefits from GCC compiler flags), and then communicate the results upstream. Our professor provided a list of packages to choose from, and for my project I chose MariaDB (version 10.1) simply because database software was the most familiar to me compared to other options I was able to choose from. The project is divided into three stages: baselines, results, and recommendations.

    This post will cover stage three: recommendations to the upstream project based on the results of stage two.

    Click here for stage one of this project.

    Click here for stage two of this project.

    To the folks who handle MariaDB’s source code, I would say when building the Release With Debug Info version of MariaDB, continue compiling the C++ code using…well, the c++ command. gcc obviously causes issues when building from source so it’s always best to stick with what works. On the other hand, there could be some value with using gcc for the C compiler instead of cc, or at the very least changing the flag options. As explained in stage two, build times and memory usage are reduced by a fair amount when not using the default compiler flags for both the x86_64 and AArch64 systems. The installation time is slightly shorter for AArch64 as well.

    The documentation for building and installing is superb, but I do have one nitpick. The build instructions do say that all cmake configuration options can be listed with ‘cmake . -LH’, but there are two problems with that: first, you still need to run cmake in order to actually see what those options are; second, any custom options that are used in the source package are not displayed. I didn’t know they existed until I stumbled upon the CMakeCache.txt file, which again can only be seen after running cmake.

    For MariaDB’s own benchmark tools, I would have enjoyed it more if the documentation for benchmarking were as robust and user-friendly as the documentation for getting started. Most of the information on configuration variables and whatnot is within the config files themselves. There is at least a page that shows you info on the perl script that automates the benchmark, but the conf files used have little to no information. Granted, when viewing the conf files, the variables are described well and it’s clearly outlined what’s changeable and what should be adjusted to the needs of your system. It would still be nice to have that info in the online documentation, though. Running the benchmark script requires linking to the bzr repository (i.e. Launchpad) of the MariaDB tree to use and compile, which could be a problem since newer versions are hosted on GitHub.

    Aside from my gripes with their benchmarking tool, MariaDB’s source code and documentation are top notch, which makes it a worthy alternative to MySQL.


    by Jayme LB at December 19, 2015 12:28 AM

    December 18, 2015


    Jayme Laso-Barros

    My Experience with Optimizing the MariaDB 10.1 Source Package

    Our project for the SPO600 course requires each student to build, test, and benchmark an open-source software package (that benefits from GCC compiler flags), and then communicate the results upstream. Our professor provided a list of packages to choose from, and for my project I chose MariaDB (version 10.1) simply because database software was the most familiar to me compared to other options I was able to choose from. The project is divided into three stages: baselines, results, and recommendations.

    This post will cover stage two: determining the best GCC compiler flags for optimizing the package on either an x86_64 or AArch64 server, and then analyzing the results.

    Click here for stage one of this project.

    Click here for stage three of this project.

    To help with this stage of the project, the professor and all students of SPO600 collaborated to create a framework (link to GitHub repo) that will loop through different groups of GCC compiler options and then build, test, benchmark, and analyze the software for each one. However, since each software package is obviously built differently, changes had to be made to the following:

    • Source Tarball
    • Makefile
    • Build Plugin
    • Test Plugin
    • Benchmark Plugin

    I have forked the framework to my own GitHub account so you may view the changes I made for my project here.

    Starting off with the source tarball, I created one using this article as a guideline. However, I later found out that in order for the framework to function properly it requires the monkeys10k.config file and the plugins inside it as well. Also, MariaDB’s own benchmarking tools are not included in the source code, so according to this article I needed to grab those tools from Launchpad and place them into the root of the source directory. Afterwards, I basically just used tar to unzip the tarball, drop the extra files in the folder, and then zip it back up again.

    Next was modifying the Makefile to copy the modules and my tarball to a new directory, .m10k, for the purposes of running the framework. You can view my Makefile here, but for convenience here’s what it looks like as of this post:

    all:
            echo "Nothing to do"
    
    install:
            mkdir -p ~/.m10k/
            mkdir -p ~/bin
            cp m10k_* ~/.m10k/ -v
            cp m10k ~/bin
    
    test:
            cd test-data; tar cvzf ../test-mariadb-10.1.9.tgz .
            mkdir -p ~/.m10k/
            cp m10k_* ~/.m10k/ -v
            ./m10k test-mariadb-10.1.9.tgz

    Next is the build plugin. In my previous post for this project I said that the compiler flags are set in CMakeLists.txt. As it turns out, the compiler flags can actually be set when running cmake. The only problem is that I couldn’t find out what options there are to use until after cmake is run, so that it generates the CMakeCache.txt file with the options. I ran cmake once without setting any compiler flags to see what the default values were, and found this in the generated CMakeCache.txt file:

    //Choose the type of build, options are: None(CMAKE_CXX_FLAGS or
    // CMAKE_C_FLAGS used) Debug Release RelWithDebInfo MinSizeRel
    CMAKE_BUILD_TYPE:STRING=RelWithDebInfo
    
    //CXX compiler.
    CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++
    
    //Flags used by the compiler during all build types.
    CMAKE_CXX_FLAGS:STRING=
    
    //Flags used by the compiler during debug builds.
    CMAKE_CXX_FLAGS_DEBUG:STRING=-g
    
    //Flags used by the compiler during release builds for minimum
    // size.
    CMAKE_CXX_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG
    
    //Flags used by the compiler during release builds.
    CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
    
    //Flags used by the compiler during release builds with debug info.
    CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
    
    //C compiler.
    CMAKE_C_COMPILER:FILEPATH=/usr/bin/cc
    
    //Flags used by the compiler during all build types.
    CMAKE_C_FLAGS:STRING=
    
    //Flags used by the compiler during debug builds.
    CMAKE_C_FLAGS_DEBUG:STRING=-g
    
    //Flags used by the compiler during release builds for minimum
    // size.
    CMAKE_C_FLAGS_MINSIZEREL:STRING=-Os -DNDEBUG
    
    //Flags used by the compiler during release builds.
    CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
    
    //Flags used by the compiler during release builds with debug info.
    CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG

    So which cmake variables are used in the build depends on the build type. In this case, I am using RelWithDebInfo, short for Release With Debug Info, which can come in handy for benchmarking. Looking through the MariaDB documentation, I also found this article that explains how to build specifically for benchmarks, so I based my build plugin around that, like so (link to GitHub file):

    #!/bin/bash
    #m10k_build_plugin test file
    
    # Unpack source
    tar xf mariadb*t*z
    
    # Go into the project directory
    cd mariadb-[0-9]*
    
    # Create a new directory for an out-of-source build (recommended when using cmake)
    rm -rf build
    mkdir build
    cd build
    
    # Read configuration to be tested
    read PERMUTATION
    
    # Set the install directory.
    INSTDIR=$HOME/bin/mariadb
    
    # Set FLAGS to the options from permutation
    CFLAGS="$PERMUTATION -g -DNDEBUG"
    
    # Set the installation layout options
    CMAKE_LAYOUT_OPTS="-DCMAKE_INSTALL_PREFIX=$INSTDIR -DMYSQL_DATADIR=$INSTDIR/data"
    
    # Set the feature options
    CMAKE_FEATURE_OPTS="-DWITH_READLINE=1 -DWITHOUT_OQGRAPH_STORAGE_ENGINE=1"
    
    # Set the build options
    CMAKE_BUILD_OPTS="-DCMAKE_BUILD_TYPE=RelWithDebInfo"
    
    # Run cmake in the build folder
    CC=gcc cmake .. $CMAKE_BUILD_OPTS \
    $CMAKE_LAYOUT_OPTS \
    $CMAKE_FEATURE_OPTS \
    -DCMAKE_C_FLAGS_RELWITHDEBINFO="$CFLAGS"
    
    make -j13 && make install

    You may have noticed that I am only optimizing for the C code instead of both C and C++. The reason for that is whenever I try setting the C++ compiler from c++ to gcc, make will fail on both the x86_64 and AArch64 servers.

    [screenshot: CC_CXX_cmake_make_attempt] This is my result when running make after using “CC=gcc CXX=gcc cmake”.
    [screenshot: CXX_cmake_make_attempt] I also tried running make after just “CXX=gcc cmake”, but it only seems to fail much sooner.

    Looking at where it crashed, it most likely has something to do with libgroonga. Unfortunately I was unable to find a way to get gcc to work with C++ so I will have to make do with just optimizing the C code.

    The test plugin was incredibly easy, as MariaDB comes with its own make test that can be run after cmake and even before make install. I just had to make sure I was in the correct directory (link to GitHub file):

    #!/bin/bash
    
    cd mariadb-[0-9]*/build
    make test

    Now, at this point in the project I hit a few brick walls. I noticed that the permute module would loop endlessly, building with the same set of flags. Also, one or more groups of flag options would cause make to fail partway through, and those groups were different depending on the system I ran it on. In other words, I couldn’t get the framework to work properly for my software package. Finally, I couldn’t figure out how to get the benchmarking tools to run (this is explained further in my stage three post).

    I don’t want to come out of this empty handed though, so I want to at least show the run times of building and installing MariaDB on an x86_64 server and two different AArch64 servers. Here’s how I did it:

    1. From the MariaDB source directory, create a ‘build’ folder to use as an out-of-source build, which is recommended when using cmake.
    2. From the build folder, run this cmake command:
      CC=gcc cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=$HOME/bin/mariadb -DMYSQL_DATADIR=$HOME/bin/mariadb/data -DWITH_READLINE=1 -DWITHOUT_OQGRAPH_STORAGE_ENGINE=1 -DCMAKE_C_FLAGS_RELWITHDEBINFO=""

      Where -DCMAKE_C_FLAGS_RELWITHDEBINFO contains the first set of a group of flag options from the config file. I.e. Running cmake with group2 would look like this:

      -DCMAKE_C_FLAGS_RELWITHDEBINFO="-falign-functions -falign-jumps -falign-labels -falign-loops -fcombine-stack-adjustments"
    3. Run make with the -j option to speed up the process, while timing it with /usr/bin/time:
      /usr/bin/time -v -o ~/buildtimes/btgX_Ymake.txt make -j13

      Where X is the group number and Y is the run number for the group. I.e. btg2_3make.txt would be the 3rd run for group2 of the compile flags.

    4. Run ‘make test’ just to check that everything passes.
    5. Similar to step 3, but this time with make install:
      /usr/bin/time -v -o ~/buildtimes/btgX_Yinstall.txt make install
    6. Use ‘rm -rf’ to remove the out-of-source build folder and the installation directory so that the next build and installation starts fresh.
    7. Repeat the above steps until each group of compiler flags have been run at least three times to get a good average.

    I also ran that cmake command without -DCMAKE_C_FLAGS_RELWITHDEBINFO a couple times so that I could time the build and installation using the default compiler flags, which happen to be ‘-O2 -g -DNDEBUG’.

    The software was built and installed on an x86_64 system and two different AArch64 systems, though not all groups of compile flags would work. For the x86_64 system, group1 of the config file would cause make to fail, while for both AArch64 systems group4 was the only group where make did not fail.

    Here are the results for the x86_64 system:

    == 'make -j13' with default flags, 1st run ==
    Command being timed: "make -j13"
            User time (seconds): 2017.21
            System time (seconds): 190.00
            Percent of CPU this job got: 387%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 9:30.02
            Maximum resident set size (kbytes): 602208
    
    == 'make install' with default flags, 1st run ==
    Command being timed: "make install"
            User time (seconds): 5.71
            System time (seconds): 6.06
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.88
            Maximum resident set size (kbytes): 15516
    
    == 'make -j13' with default flags, 2nd run ==
    Command being timed: "make -j13"
            User time (seconds): 2014.80
            System time (seconds): 189.10
            Percent of CPU this job got: 376%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 9:45.62
            Maximum resident set size (kbytes): 602140
    
    == 'make install' with default flags, 2nd run ==
    Command being timed: "make install"
            User time (seconds): 5.65
            System time (seconds): 6.22
            Percent of CPU this job got: 98%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.07
            Maximum resident set size (kbytes): 15604
    
    == 'make -j13' with default flags, 3rd run ==
    Command being timed: "make -j13"
            User time (seconds): 2022.79
            System time (seconds): 188.99
            Percent of CPU this job got: 386%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 9:31.69
            Maximum resident set size (kbytes): 602176
    
    == 'make install' with default flags, 3rd run ==
    Command being timed: "make install"
            User time (seconds): 5.79
            System time (seconds): 5.93
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.83
            Maximum resident set size (kbytes): 15536
    
    == 'make -j13' with group2 flags, 1st run ==
    Command being timed: "make -j13"
            User time (seconds): 1594.96
            System time (seconds): 173.55
            Percent of CPU this job got: 385%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:38.68
            Maximum resident set size (kbytes): 412616
    
    == 'make install' with group2 flags, 1st run ==
    Command being timed: "make install"
            User time (seconds): 5.73
            System time (seconds): 6.01
            Percent of CPU this job got: 98%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.86
            Maximum resident set size (kbytes): 15576
    
    == 'make -j13' with group2 flags, 2nd run ==
    Command being timed: "make -j13"
            User time (seconds): 1596.51
            System time (seconds): 173.11
            Percent of CPU this job got: 382%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:42.87
            Maximum resident set size (kbytes): 412552
    
    == 'make install' with group2 flags, 2nd run ==
    Command being timed: "make install"
            User time (seconds): 5.69
            System time (seconds): 5.78
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.58
            Maximum resident set size (kbytes): 15512
    
    == 'make -j13' with group2 flags, 3rd run ==
    Command being timed: "make -j13"
            User time (seconds): 1601.42
            System time (seconds): 173.76
            Percent of CPU this job got: 382%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:43.93
            Maximum resident set size (kbytes): 412560
    
    == 'make install' with group2 flags, 3rd run ==
    Command being timed: "make install"
            User time (seconds): 5.68
            System time (seconds): 5.84
            Percent of CPU this job got: 98%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.66
            Maximum resident set size (kbytes): 15528
    
    == 'make -j13' with group3 flags, 1st run ==
    Command being timed: "make -j13"
            User time (seconds): 1596.67
            System time (seconds): 173.29
            Percent of CPU this job got: 382%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:42.57
            Maximum resident set size (kbytes): 413336
    
    == 'make install' with group3 flags, 1st run ==
    Command being timed: "make install"
            User time (seconds): 5.63
            System time (seconds): 5.83
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.55
            Maximum resident set size (kbytes): 15620
    
    == 'make -j13' with group3 flags, 2nd run ==
    Command being timed: "make -j13"
            User time (seconds): 1594.73
            System time (seconds): 172.97
            Percent of CPU this job got: 382%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:42.04
            Maximum resident set size (kbytes): 413132
    
    == 'make install' with group3 flags, 2nd run ==
    Command being timed: "make install"
            User time (seconds): 5.74
            System time (seconds): 5.72
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.58
            Maximum resident set size (kbytes): 15612
    
    == 'make -j13' with group3 flags, 3rd run ==
    Command being timed: "make -j13"
            User time (seconds): 1608.12
            System time (seconds): 173.75
            Percent of CPU this job got: 337%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 8:48.01
            Maximum resident set size (kbytes): 413424
    
    == 'make install' with group3 flags, 3rd run ==
    Command being timed: "make install"
            User time (seconds): 6.32
            System time (seconds): 6.76
            Percent of CPU this job got: 76%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:17.18
            Maximum resident set size (kbytes): 15580
    
    == 'make -j13' with group4 flags, 1st run ==
    Command being timed: "make -j13"
            User time (seconds): 1602.99
            System time (seconds): 173.75
            Percent of CPU this job got: 382%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:44.71
            Maximum resident set size (kbytes): 412768
    
    == 'make install' with group4 flags, 1st run ==
    Command being timed: "make install"
            User time (seconds): 5.95
            System time (seconds): 6.45
            Percent of CPU this job got: 98%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.58
            Maximum resident set size (kbytes): 15620
    
    == 'make -j13' with group4 flags, 2nd run ==
    Command being timed: "make -j13"
            User time (seconds): 1605.84
            System time (seconds): 173.78
            Percent of CPU this job got: 379%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 7:48.38
            Maximum resident set size (kbytes): 412712
    
    == 'make install' with group4 flags, 2nd run ==
    Command being timed: "make install"
            User time (seconds): 6.34
            System time (seconds): 7.04
            Percent of CPU this job got: 76%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:17.58
            Maximum resident set size (kbytes): 15576
    
    == 'make -j13' with group4 flags, 3rd run ==
    Command being timed: "make -j13"
            User time (seconds): 1618.49
            System time (seconds): 176.42
            Percent of CPU this job got: 321%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 9:17.99
            Maximum resident set size (kbytes): 412916
    
    == 'make install' with group4 flags, 3rd run ==
    Command being timed: "make install"
            User time (seconds): 5.69
            System time (seconds): 5.80
            Percent of CPU this job got: 99%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.60
            Maximum resident set size (kbytes): 15556

    Compared to the default flags, the max resident set size is reduced by about 32% across all the config groups. The run time for building the software is also reduced by a couple minutes across those groups. There is no major difference found for the install time. It’s close, but I would say that group2 would be the best choice for compiling. While the build times are similar, group3 and group4 slowed down noticeably by the 3rd run, while group2 remained steady.

    At the request of my professor, I am unable to display the absolute build and install times for the other two servers. However, I can say that group4 absolutely came out on top for both building and installing the package compared to the default settings.

    When I did a generic build on an AArch64 server back in stage one, the typical runtime for building MariaDB was rather long. With the -j13 option for make, the runtime is more satisfactory (a near 80% difference!) but still longer than the other two systems. Compared to the results above, a lot more RAM was used, but even that’s cut down to nearly half the amount when group4 is used to compile. The results for the third default run got a bit skewed due to running it while a couple other students were building their software as well. Regardless, a good 25% of build time is shaved off when using the group4 compiling options. Install times are reduced by 20% as well.

    The other AArch64 system is much faster, almost at the same level as the x86_64 system. Once again, memory usage and build times are cut by roughly half when compiling with the group4 options.


    by Jayme LB at December 18, 2015 11:50 PM


    Kirill Lepetinskiy

    M10K: Building and benchmarking Perl

    For the 10k Monkeys project I picked Perl as the package that I would analyze. Our main goal is to experiment with different compiler optimizations and explore their effectiveness on the AArch64 platform.
    This choice was mostly random, but I also wanted to pick something that I thought would build and benchmark in a reasonable time, so that I could iterate quickly if I encountered some problem or wanted to delve deeper into some combination of flags.

    Building
    Configuring Perl to build with the desired flags was fairly easy; one only has to pass the desired options into the Configure script with the -Doptimize parameter. I used this configuration file as input for the permutations. It has 3 groups with 2 options each and one group with one option, which results in 9 possible permutations.

    The build itself turned out to be fairly quick, only taking a few minutes, even the configure script took longer than the actual compilation.

    Testing
    As Chris suggested, I decided to forgo using the make test provided by the package for the purposes of performing a sanity check, as it can be put to better use as a benchmark.

    Benchmarking
    As mentioned above, there would be 9 permutations to go through, with 4 benchmark runs each. A little problem arose here as I quickly discovered that a single run of make test takes about 20 minutes. Doing some arithmetic will tell you that a full run will take at least 12 hours.
    This is when my procrastination got the better of me... again. I left it running overnight and decided to look at the results before the full script was finished. I could get at the results by finding the correct directory in /tmp/ and looking at analysis.txt.
    Quick aside: I think it would have been better to not wipe out analysis.txt or any other intermediate files when the script exits, as they could be valuable for salvaging failed runs, especially when it could take a very long time. It would be terrible to find out that something was wrong after letting a script run for a day and having nothing to use for diagnostics and debugging.


    Here are the results, lower speed score is better:
    Tarball including Perl source, config, and plugins
    Sqlite3 Database with results
    Permutations
    Averages for benchmark runs (check the id to see what the flags were)

    The quickest conclusion that is visible right away is that the memory usage remains almost constant.
    I'm not sure how to feel about this because we've learned during the course that there is often a trade off between memory usage and processing time for some optimizations (e.g. loop unrolling).

    As far as speed score goes, things weren't quite as static but the difference was still quite small. The gap between the best and worst scores is about 5%.
    The winner is permutation number 6, which has the following config:

    Group2: off
    Group3: on
    Group4: on

    5% might be a huge win if you're constantly running your application, but I am also hesitant about this result because the experimental conditions were far from rigorous. The tests ran over a period of 12 hours, and someone else could easily have logged on and performed their own tests during that time.

    Conclusion
    The results of this first attempt were inconclusive, but I think it is not unreasonable to assume that there could be something here after all.

    Here are my recommendations for future research:
    • Review memory score algorithm, there was almost no variation here which leads me to believe that the testing methodology might be ineffective.
    • Increase benchmark sample size. 4 runs just isn't enough.
    • Look into shortening the benchmark, while 20 minutes is very thorough, it might be overkill and the increased speed of iteration might be worth the reduction in length.
    • Improve test environment rigor. The benchmarks are invalidated if each one is not run in the exact same environment, and that can't be guaranteed when the system is used by the entire class.

    by Kirill (noreply@blogger.com) at December 18, 2015 11:40 PM


    Gaurav Patel

    Assembly Language

    Writing assembler code is very difficult: you are writing in the lowest-level programming language, and you have to write code specific to the architecture of the machine (32-bit or 64-bit).

    Take a look at the following assembly code written for an x86_64 machine. This code loops until start reaches the value in max and prints out each value.

    .globl    _start    /* must be declared so the program knows where to start; it just says which function is the main one */

    .set     start, 0     /* starting value for the loop index */
    .set     max, 10    /* loop exits when the index hits this number (loop condition is i<max) */
    _start:
    mov     $start,%r15    /* loop index */
    loop:
    mov     %r15,%r14     /* copy to r14 for ascii conversion */
    mov     $'0',%r13        /* load the ASCII character '0' into r13 for digit conversion */
    add      %r13,%r14     /* add r13 and r14; the result is stored in r14 */
    movq   $digit,%rsi       /* location of the digit within the message, rsi is register source index */
    movb   %r14b,(%rsi)  /* write byte to start of message */
    movq   $len,%rdx       /* message length, rdx is register d extended */
    movq   $msg,%rsi      /* message location, rsi is register source index */
    movq   $1,%rdi          /* file descriptor stdout, rdi is register destination index */
    movq   $1,%rax         /* syscall, sys_write, rax is register a extended */
    syscall

    inc       %r15              /* increment register 15 */
    cmp     $max,%r15    /* see if we're done */
    jne       loop               /* loop if we are not */

    mov     $0,%rdi         /* exit status */
    mov     $60,%rax      /* syscall sys_exit */
    syscall 

    .data
    msg:    .ascii    "Loop: #\n"   /* message text - # is number of index */
    digit = . - 2                           /* memory address for digit */
    len = . - msg                         /* length of message */

    The code belongs to Chris Tyler, and some of the comments belong to him as well; it's quite difficult to understand some of the steps. This code can only handle printing a single-digit value. All the registers used in the code above are only for an x86_64 machine; for AArch64 there is a different set of registers. Take a look at the following comparison for some basic differences between x86_64 and AArch64 machines.


    x86_64 vs AArch64:

    • General registers — x86_64: r8 to r15. AArch64: r0 to r30.
    • Specialized registers — x86_64: rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi. AArch64: none of these.
    • Register naming — x86_64: 64-bit registers use the 'r' prefix (rax, r15); 32-bit registers use the 'e' prefix (original registers: e_x) or a 'd' suffix (added registers: r__d), e.g. eax, r15d; 16-bit registers use no prefix (original registers: _x) or a 'w' suffix (added registers: r__w), e.g. ax, r15w; 8-bit registers use an 'h' suffix ("high byte" of 16 bits, original registers, bits 8-15: _h), e.g. ah, bh, or an 'l' suffix ("low byte" of 16 bits, bits 0-7: _l) or 'b' suffix (added registers: r__b), e.g. al, bl, r15b. AArch64: x0 through x30 for 64-bit-wide access and w0 through w30 for 32-bit-wide access.
    • Arithmetic operations — x86_64: operations like add, divide and multiply take two parameters, and the result is stored in the second parameter. AArch64: operations like add, divide and multiply take three parameters; the result is stored in the first parameter and the other two hold the operand values.
    • ...and many more differences on both sides.

    One of the good things about writing assembly code is that when you compile it and analyze the executable using the objdump command, you will not see any code other than what you have written. When you write C/C++ code and analyze its executable, there is a lot of other code that you didn't write; that code comes from your includes. Since you don't need to include anything in assembly code, what you write is what you get. The assembly program is written in a file with the .s extension, and you just run:
    ls -o name.o file.s
    ld -o executable-name name.o

    You can also write/add assembly code in your C/C++ program using the asm statement, e.g. asm("add $1, %0"), and to compile it you can just use the gcc or g++ compiler.
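    As a small sketch of how this looks with GCC's extended inline assembly (x86_64 AT&T syntax; the variable name and value are just for the example), the operand constraint tells the compiler which C variable the %0 placeholder refers to:

    /* inline_asm.c - GCC extended inline assembly example (x86_64);
       compile with: gcc inline_asm.c -o inline_asm */
    #include <stdio.h>

    int main(void)
    {
        long x = 41;

        /* "+r" says x is both read and written and lives in a general register */
        asm("add $1, %0" : "+r"(x));

        printf("x = %ld\n", x);   /* prints 42 */
        return 0;
    }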

    by Gaurav (noreply@blogger.com) at December 18, 2015 11:26 PM


    Kirill Lepetinskiy

    SPO600 4eva

    This course has definitely been a ride.

    The most interesting part of the course has to be its unorthodox structure. Most other courses are set up well in advance and go over the same material semester after semester. SPO600 is different because we had a much more flexible structure and we worked on a genuinely interesting project. This project was not only stimulating intellectually, but was also real work being done in the open source community, which is a departure from the norm of working on artificial projects that are only there to evaluate us.

    Aside from this project, we've delved into the lowest depths of computers to figure out what makes them tick, and what techniques are used to make them tick faster and more efficiently. Just like learning C helped me think about the computer's memory, learning assembler helped me think about how the computer works on a very base level, which was highly rewarding and made me feel like a wizard.

    The only prior knowledge that was useful in this class came from the topics I was covering at the same time in my Parallel Programming course (GPU610). That course also deals with hardware and optimization, but on the graphics card, which turned out to be an excellent complement to what we learned about the CPU.

    There was another "stealth" topic that I learned much about that isn't discussed on the syllabus: development in a Linux environment. This was thanks to Chris doing all of the work right in front of us on the screens in the classroom. I picked up a lot of new techniques related to simply using Linux in more powerful ways, shell scripting, automating builds, collecting and analyzing data acquired from our experiments, and so on.

    Additionally, Chris provided us with some very interesting insider knowledge about cutting-edge developments in processor design. I enjoyed taking a peek into the future where we'll all be using many-core, low-power processors that will greatly increase the available processing power, even if the community is only starting to think about writing programs to take advantage of this new paradigm.

    In conclusion, I have no doubt that I will make use of what I learned in SPO600, even if I don't touch any low level code again. I found the topics presented to be fascinating, and they sparked an interest in me. We live during an exciting time in computing, and there are several emerging fields that need tons of research, like massively parallel computing, machine learning, and virtual reality. What they have in common is that we'll need to squeeze as much performance as we can out of the hardware we have, and to do this we'll want to work on open platforms and share our findings so that we can continue standing on the shoulders of giants like all that came before us.

    by Kirill (noreply@blogger.com) at December 18, 2015 08:30 PM


    Dmytro Yegorov

    Recommendations for Apache

    Considering the results received from the benchmarking runs (discussed in the previous post), the following changes to the source code/makefile could be made by the Apache developers:

    1. Developers could add extra logic to the makefile that looks at the resources available (free memory, CPU load) and determines which options to use during compilation: if the CPU cannot be used at maximum, we would use group 6 or group 9; if we are running low on memory (say, most of memory is already in use), we would use group 4 or group 6. If both CPU and memory are free, we would use group 3 for the ARM architecture and group 8 for x86_64 (a rough sketch of such a probe follows this list).
    2. Developers could change their source code to fit each architecture directly: as there is almost no difference in performance on the ARM architecture, we could reduce the compiler's work to a minimum by applying the optimization changes manually (reordering loops and functions, changing variable types, etc.) to decrease build time. For x86_64 we could create separate instances of the source code, so that each of them is either faster, uses less memory, or builds faster; that way we would not need to determine the user's system characteristics, and the user could choose the build that best suits their needs.
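    A minimal C sketch of the kind of resource probe described in point 1 is shown below. This is not part of Apache's build system; the thresholds, group names, and overall shape are illustrative assumptions only.

    /* probe.c - toy resource probe: print which flag group to use.
       Thresholds and group names are made up for this example. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        double load[1];
        long ncpu  = sysconf(_SC_NPROCESSORS_ONLN);
        long page  = sysconf(_SC_PAGESIZE);
        long freep = sysconf(_SC_AVPHYS_PAGES);
        double free_mb = (double)freep * page / (1024 * 1024);

        if (getloadavg(load, 1) != 1)
            load[0] = 0.0;

        int cpu_busy = load[0] > ncpu * 0.75;  /* CPU can't be used at maximum */
        int low_mem  = free_mb < 512;          /* "low memory" cut-off, example only */

        if (cpu_busy)
            puts("group6");        /* or group9 */
        else if (low_mem)
            puts("group4");        /* or group6 */
        else
    #if defined(__aarch64__)
            puts("group3");
    #else
            puts("group8");
    #endif
        return 0;
    }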



    by dyegorov at December 18, 2015 10:16 AM

    Testing Apache

    Finally we got our main config file working, and now we can test our framework with Apache to see how the optimization options affect Apache's build and performance.

    The testing was performed on two processor architectures: x86_64 and aarch64.

    Aarch64:

    Strangely, the benchmarking tests barely differed in terms of performance: the speed score was 14-15 and the memory score was 1112-1128. Another criterion we can look at to compare the influence of the optimization options is build time: the fastest build was achieved with group 3, saving up to 5 seconds of wall time compared to any other optimization options.

     

    x86_64:

    On this machine every combination of optimization options gave different results. The fastest build was achieved with group 8, the best memory score with groups 4 and 6, and the best speed score with groups 9 and 6. Interestingly, group 9 showed 20x better performance in terms of speed and lost to groups 4 and 6 by less than 1% on memory score; group 6 showed not only the best memory score but also 10x faster performance.

     

     

    Groups mentioned (the flag listings were shown as images in the original post):

    Aarch64 group 3
    x86_64 group 8
    x86_64 group 9
    x86_64 groups 4 and 6


    by dyegorov at December 18, 2015 09:19 AM


    Kirill Lepetinskiy

    Lab 4: Writing assembly

    We're finally here, at the lowest level of programming: assembly. I think that going any lower would involve composing the binaries by hand in a hex editor.

    The lab involved a fairly simple task: display numbers 0 through 30 on the console while suppressing the leading zero for numbers less than 10.
    For comparison, doing this in C fits comfortably on one line:
    for (int i = 0; i <= 30; i++) printf("Loop: %d\n", i);
    while in assembly my solution came out to more than thirty lines.
    Refer to this page to see the source code for my solution.

    Coding in assembly is hard, which is why it's almost never done any more outside of a few specific circumstances. I like to think that writing software in a high level language like python or C# is very abstract. You pull on some levers and things happen to the data you pass around.
    Coding in C feels more like dealing with a computer, you have to be careful with memory, but in general it's still similar to higher level languages.
    However, everything is different in assembly. You're zoomed in so far that you lose sight of the abstract and only concern yourself with instructions and registers. While I've always been aware that everything in a computer is numbers, assembly makes it much more obvious. Even printing out a number to the console requires you to pull out the ASCII table.
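    For reference, here is a small C sketch of the same digit-to-ASCII work the assembly version has to do by hand (this is not the lab solution, just an illustration of the conversion and the leading-zero suppression):

    /* ascii_digits.c - print "Loop: N" for 0..30, suppressing the leading zero */
    #include <stdio.h>

    int main(void)
    {
        for (int i = 0; i <= 30; i++) {
            char tens = '0' + i / 10;   /* tens digit as an ASCII character */
            char ones = '0' + i % 10;   /* ones digit as an ASCII character */
            if (i < 10)
                printf("Loop: %c\n", ones);          /* single digit: no leading zero */
            else
                printf("Loop: %c%c\n", tens, ones);
        }
        return 0;
    }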

    If you looked through the source you'll find that my x86 version isn't complete, as it does not suppress the leading zero. I've commented out lines 14 and 15, which caused some really bizarre things to happen: the jump just didn't work at all, and the value of RAX would somehow be wiped out by the CMP and JL instructions. Needless to say, I gave up; googling around for any kind of help was very difficult.

    By contrast, the aarch64 assembly worked like a charm. There were no magic registers or side effects, you simply provide the input and destination, no problem.

    In conclusion, I've found the experience educational and a neat brain teaser. Aarch64 was a breeze and while I was frustrated with x86, I think that after some practice I could do much better.

    by Kirill (noreply@blogger.com) at December 18, 2015 07:31 AM


    Gaurav Patel

    Benchmarking MPICH package

    In this post I'm going to write about benchmarking the MPICH software using the framework we designed in the Software Portability and Optimization class. Here is the link to the framework: https://github.com/pk400/SPO600-Build-Framework. MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard; it is used on nine of the top 10 supercomputers.
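    As a quick illustration of what an MPI program looks like (a minimal sketch written for this post, not part of the MPICH test suite), the classic hello-world can be built with mpicc and launched with mpiexec once MPICH is installed:

    /* hello_mpi.c - minimal MPI example
       build: mpicc hello_mpi.c -o hello_mpi
       run:   mpiexec -n 4 ./hello_mpi */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut the runtime down */
        return 0;
    }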

    I have spent a lot of time getting this test framework to run with my MPICH package, and I faced many problems along the way. First, when I edited the build plugin to change CFLAGS to use the compiler options selected by the framework, the build failed and I didn't know why, until the professor (Chris Tyler) looked at it and told me that exporting the CFLAGS variable was breaking the build. I looked into it and still don't know why that caused the failure; I checked that both the configure script and the makefile use the CFLAGS variable. I found a way around it by setting the compiler options directly in the variable used for the build, but first I had to unset CFLAGS after calling configure, because configure was adding compiler options to CFLAGS and make then copies CFLAGS into the variable that holds the flags for building the package, MPICH_MAKE_CFLAGS. While figuring this out I found that the MPICH library was also being built from CFLAGS, with other compiler options appended in the configure file as well. By unsetting CFLAGS after running configure I was able to build the MPICH package and its libraries with the compiler flags chosen by the permutation, which is what I wanted.

    For testing I used -O2 and -O3 as a trial run for my package. The results on the x86_64 system are:

    Permutation columns: (uniqueId, givenId, flags, buildExit, buildWallTime, buildUserSystemTime, buildSize, testExit)
    Benchmark columns: (uniqueId, permutationUniqueId, givenId, speedScore, memoryScore)

    (1, 1, '-02', 0, 274.37, 703.28, 0, 0)
        (1, 1, 1, 702, 17016)
        (2, 1, 2, 702, 18736)
        (3, 1, 3, 704, 18796)
        (4, 1, 4, 701, 18796)

    (2, 2, '-03', 0, 224.73, 598.86, 0, 0)
        (5, 2, 1, 819, 18796)
        (6, 2, 2, 825, 18776)
        (7, 2, 3, 820, 18764)
        (8, 2, 4, 820, 18784)

    The results with the test framework's compiler options are below.
    There was an issue with the first group: it contains all the options that stay on at every O-level as well as the options that are kept off at every level (-O1, -O2, -O3), so there was no delimiter, which caused the permutation to fail after the first group. To get results from all the groups I had to run the framework with group one only, then remove group one from the config file so the other groups could be used to build the package with different sets of compiler options. In the benchmarking results I'll only give the group name because a lot of flags are being used; to see which compiler options each group contains, go to https://github.com/pk400/SPO600-Build-Framework and look at the monkeys10k.config file.

    (1, 1, 'group1', 0, 256.45, 697.5, 0, 0)
        (1, 1, 1, 702, 18760)
        (2, 1, 2, 703, 18888)
        (3, 1, 3, 701, 18796)
        (4, 1, 4, 704, 18720)

    (1, 1, 'group2 permute: 1', 0, 487.48, 729.73, 0, 0)
        (1, 1, 1, 1373, 18804)
        (2, 1, 2, 1424, 18800)
        (3, 1, 3, 1405, 18780)
        (4, 1, 4, 1361, 7356)

    (2, 2, 'group2 permute: 2', 0, 470.55, 630.61, 0, 0)
        (5, 2, 1, 1248, 14856)
        (6, 2, 2, 1267, 16396)
        (7, 2, 3, 1268, 18768)
        (8, 2, 4, 1338, 17056)

    (1, 1, 'group3 permute: 1', 0, 550.99, 734.2, 0, 0)
        (1, 1, 1, 1555, 16392)
        (2, 1, 2, 1660, 14876)
        (3, 1, 3, 1659, 18700)
        (4, 1, 4, 1653, 18764)

    (2, 2, 'group3 permute: 2', 0, 435.94, 621.67, 0, 0)
        (5, 2, 1, 703, 18804)
        (6, 2, 2, 704, 18820)
        (7, 2, 3, 699, 18836)
        (8, 2, 4, 707, 18772)

    (1, 1, 'group4 permute: 1', 0, 368.79, 714.64, 0, 0)
        (1, 1, 1, 1292, 18776)
        (2, 1, 2, 1303, 18744)
        (3, 1, 3, 1266, 18816)
        (4, 1, 4, 767, 18776)

    (2, 2, 'group4 permute: 2', 0, 321.62, 616.49, 0, 0)
        (5, 2, 1, 1409, 18800)
        (6, 2, 2, 1328, 18768)
        (7, 2, 3, 1337, 17272)
        (8, 2, 4, 1302, 18812)

    These measures are a combination of the time it takes to configure, make, run make check, and run the benchmark on the package. In the results above, group 1 (the flags which are always on or off at all levels) gave the fastest test in wall time, while group 4's "off" options were fastest in user-system time. We don't care much about user-system time, though, because we want to measure the time a person with a stopwatch would see, not the computer's CPU cycles. This suggests that MPICH compiles fastest with the defaults, without any additional compiler flags, at least on this x86_64 machine. To benchmark it more deeply I would like to add the compiler options of the other groups to group one, one by one, and see how they affect MPICH's performance.

    by Gaurav (noreply@blogger.com) at December 18, 2015 07:16 AM


    Dmytro Yegorov

    Testing framework for apache

    During our SPO600 course we built a framework that is supposed to work with pretty much any Linux package: it installs it, tests it, runs it and benchmarks it with different optimization options. At the end, the framework provides us with results, so we know how the source code could be changed to improve performance. I decided to work on the apache 2.4 package. As our reference package for the framework was gzip, only a few things needed to change in the plugins for the framework to work with apache: the build and benchmark plugins. For the build, we need to consider apache's dependencies, like apr and apr-util. I decided to stick both directories in the source archive, where they're expected to be during configure, so at the configure step we can just say --with-included-apr and the user does not need to worry about this. Another option we want to use with configure is --prefix=…, as we do not want the user to need sudo rights while running our program, so we change the install directory to a temporary directory inside the user's home.

    (screenshot of the build plugin)

    For the benchmark plugin it gets a little trickier. By default, apache runs on port 80; this is a reserved port and it might already be in use by another web server on the user's machine. We need to tell apache to start on a random unreserved port (1024+). Moreover, we need to check whether that port is open, as it could be in use by another application (this logic can be added directly to the plugin just to determine the port number). After we get our port number, we need to change apache's httpd.conf file. The only way to do this I can think of is to generate a new httpd.conf with the same settings as the default one except the port number (the "Listen" statement) and then replace the default file (or just rewrite it with the new data). After we do this, we need to start apache, run the apache benchmark (ab) to simulate some load and record requests per second, and stop apache when we're done. As we did not have enough time, I benchmarked on port 80 (we had sudo access and there were no other web services running on the machines). If we had more time, the extra logic for generating a random port, checking whether it's taken (which can be done with a simple nmap command and then grepping the output), and pasting the new conf file over httpd.conf could and will be added during the holidays, after the end of the semester. For now, the benchmarking plugin looks like this:

    (screenshot of the benchmark plugin)
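    As a rough sketch of the port-probing idea mentioned above (this is not the actual plugin; the default port and function names are placeholders), one way to check whether a candidate port is free is simply to try binding to it:

    /* port_free.c - exits 0 if the given TCP port can be bound, 1 otherwise */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    static int port_is_free(int port)
    {
        struct sockaddr_in addr;
        int ok = 0;
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0)
            return 0;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        /* if bind succeeds, nothing else is listening on this port */
        ok = (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0);
        close(s);
        return ok;
    }

    int main(int argc, char *argv[])
    {
        int port = (argc > 1) ? atoi(argv[1]) : 8080;
        int is_free = port_is_free(port);
        printf("port %d is %s\n", port, is_free ? "free" : "in use");
        return is_free ? 0 : 1;
    }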

    By performing these tests, I could run our framework with apache and benchmark it with the O1, O2 and O3 options. The real config file with specific single options didn't work with our framework; probably some of the options were causing the build to crash.

    When we’ll fix config file, real test will be performed and results fetched. O1, O2 and O3 were used only for testing purposes.


    by dyegorov at December 18, 2015 05:56 AM


    Joel Aro

    Benchmarking SDL: Results

    In my previous blog post, Benchmarking Simple Direct Media Layer, I set out to benchmark the Simple DirectMedia Layer (SDL for short) package. I provided my simple benchmark source code, which renders two images 2000 times, alternating between them. For the past few days I worked on making the benchmark run under our Benchmarking Suite, which I will refer to as M10k (10,000 Monkeys is the name we decided on). I was able to get some results, and in this post I will share them with you.
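    For readers who haven't seen the previous post, a benchmark of that shape might look roughly like the sketch below. This is a minimal SDL2 version written for this post, not my actual source; the image filenames are placeholders and error handling for the loads is omitted.

    /* sdl_bench.c - sketch: alternate two textures for 2000 frames and time it
       build: gcc sdl_bench.c -o sdl_bench $(sdl2-config --cflags --libs) */
    #include <SDL2/SDL.h>
    #include <stdio.h>

    int main(void)
    {
        if (SDL_Init(SDL_INIT_VIDEO) != 0) {
            fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
            return 1;
        }

        SDL_Window *win = SDL_CreateWindow("bench", SDL_WINDOWPOS_UNDEFINED,
                                           SDL_WINDOWPOS_UNDEFINED, 640, 480, 0);
        SDL_Renderer *ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);

        SDL_Surface *s1 = SDL_LoadBMP("one.bmp");   /* placeholder image names */
        SDL_Surface *s2 = SDL_LoadBMP("two.bmp");
        SDL_Texture *t1 = SDL_CreateTextureFromSurface(ren, s1);
        SDL_Texture *t2 = SDL_CreateTextureFromSurface(ren, s2);
        SDL_FreeSurface(s1);
        SDL_FreeSurface(s2);

        Uint32 start = SDL_GetTicks();
        for (int i = 0; i < 2000; i++) {
            SDL_RenderClear(ren);
            SDL_RenderCopy(ren, (i % 2) ? t1 : t2, NULL, NULL);  /* alternate images */
            SDL_RenderPresent(ren);
        }
        printf("2000 frames in %u ms\n", SDL_GetTicks() - start);

        SDL_DestroyTexture(t1);
        SDL_DestroyTexture(t2);
        SDL_DestroyRenderer(ren);
        SDL_DestroyWindow(win);
        SDL_Quit();
        return 0;
    }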

    Xenon and AArchie

    Before I present the results, I will first talk briefly about the two systems I used for benchmarking SDL. The first is my own personal machine using the x86 architecture, let’s call it Xenon, and the other is a system supplied by the school running with an AArch64 architecture, named AArchie. We were also supplied with two other systems: Xerxes(x86) and Betty(AArch64), but I only used two just to see how the benchmark performs on the two different architectures. While the benchmarks were running there were no other processes running, and no users other than myself using the systems.

    Problems During Benchmarking

    M10k, as it currently stands, only works with one package: gzip. The plugins found in the test-data folder have been programmed to work with, and only with, the gzip package. To allow M10k to work with the SDL package, a few changes need to be made.

    1. Move the monkeys10k.config file from the root folder to the test-data directory.
    2. Replace the gzip build, test, and benchmark plugins with the custom SDL plugins.
    3. Add the benchmark source code to test-data.

    Simple enough right? Well, to be able to do the above steps, you would first have to know how M10k works, as well as knowing a couple Linux commands, a scripting language, and how to build a package in general. I ran into a couple problems when trying to make M10k work with SDL, and I will share some with you:

    The first is that SDL does not provide a test target the same way that gzip does. For gzip, you simply call make check and it instantly runs the tests for you. In the case of SDL, the tests are located in a directory called tests, and to use them you first need to build them with ./configure and make.

    I then ran into a big problem, which I should have been aware of when I picked the package: AArchie does not have a graphical interface at all. Which is a problem because one of SDL’s main uses is to provide developers an easy to use library for graphics, audio, etc. I spent quite a bit of time wondering how I would benchmark a graphics library without a graphical interface. I was then informed that I can still run my benchmarks but I would have to use a few linux tools. Xvfb and xvfb-run are two commands that provide a dummy graphical interface that the system can use. Nothing will be displayed when using these commands, but they will still be processed. First I had to run:

    Xvfb -ac :21 &

    which will start an Xvfb session on display :21 and put it in the background. Once the Xvfb session is up and running I can use:

    xvfb-run -n 21 ./run_benchmarks

    to perform the benchmarks against Xvfb display 21.

    The last issue I encountered was that the monkeys10k.config in the root directory did not contain a comma (,) for group1. It took me a while to figure out why the build was failing, and for some reason I could not build the package without a comma somewhere in the group. I confirmed this when I swapped monkeys10k.config with another version called monkeys10k-small.config, also in the root directory, and found that the package built successfully. I suspect the reason lies in our permutation source code: the permutation looks for delimiters in each group, and I am guessing it could not find one in group1, which caused a conflict in other sections of M10k.

    Results

    Finally, I will share the results from the benchmarks. M10k puts the results of the benchmarks into a database which can then be queried. I compiled the results into a spreadsheet, but for certain reasons I cannot share the results for the AArch64 machine, so I will describe them relative to Xenon. The x86 results can be found here.

    Unfortunately, M10k failed after the 16th permutation, and due to time constraints I was not able to fix this problem. So, given the results on the spreadsheet:

    Build Times

    Xenon built SDL in 15-29 seconds of wall time and 5-65 seconds of user-system time.

    The fastest build time, at 15.27 seconds, went to the 15th permutation:

    -fno-sched-spec-load-dangerous -fno-sched-stalled-insns -fsched-stalled-insns-dep -fno-sched2-use-superblocks -fno-schedule-insns -fno-section-anchors -fno-sel-sched-pipelining -fno-sel-sched-pipelining-outer-loops -fno-sel-sched-reschedule-pipelined -fno-selective-scheduling -fno-selective-scheduling2 -fno-short-double -fshort-enums -fno-short-wchar -fno-signaling-nans -fsigned-zeros -fno-single-precision-constant -fsplit-ivs-in-unroller -fno-strict-enums -fno-threadsafe-statics -ftoplevel-reorder -ftrapping-math -fno-trapv -ftree-coalesce-vars -ftree-cselim -ftree-forwprop -fno-tree-loop-distribution -ftree-loop-if-convert -fno-tree-loop-if-convert-stores -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -fno-tree-lrs -ftree-phiprop -ftree-reassoc -ftree-scev-cprop -fno-tree-vectorize -funit-at-a-time -fno-unroll-all-loops -fno-unroll-loops -fno-unsafe-loop-optimizations -fno-unsafe-math-optimizations -fno-unwind-tables -fvar-tracking -fvar-tracking-assignments -fno-var-tracking-assignments-toggle -fno-var-tracking-uninit -fno-variable-expansion-in-unroller -fno-vpt -fweb -fno-whole-program -fno-wrapv -fno-align-functions -fno-align-jumps -fno-align-labels -fno-align-loops -fno-combine-stack-adjustments -fno-inline-functions -fno-inline-functions-called-once -fno-inline-small-functions -fno-reorder-blocks -fno-reorder-blocks-and-partition -fno-reorder-functions -fno-ipa-cp -fno-ipa-cp-clone -fno-ipa-profile -fno-ipa-pure-const -fno-ipa-reference -fno-ipa-sra -fno-move-loop-invariants -fno-reorder-blocks -fno-reorder-blocks-and-partition -fno-reorder-functions -fno-thread-jumps -fno-unswitch-loops -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-loop-distribute-patterns -ftree-partial-pre -ftree-pre -ftree-pta -ftree-sink -ftree-slp-vectorize -ftree-slsr -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter -ftree-vrp -fguess-branch-probability -fhoist-adjacent-loads -fbranch-count-reg -fcaller-saves -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop -fdevirtualize -fexpensive-optimizations -fforward-propagate -fgcse -fgcse-after-reload -fif-conversion -fif-conversion2 -fmerge-constants -foptimize-sibling-calls -foptimize-strlen -fpeephole2 -fpredictive-commoning -frerun-cse-after-loop -fschedule-insns2 -fshrink-wrap -fsplit-wide-types -fstrict-aliasing

    On AArchie, permutations 9, 10, 11, 12, 13, 15 and 16 were very close, with almost a 1% difference between them. Permutations 1-8 and 14 showed more variation; none of them were faster than the other permutations, and the gap between them ranged from approximately 20% to 80%.

    Benchmark Performance

    When looking at the benchmark performances, Xenon outperformed AArchie overall. The average speed score for Xenon was about 4 to 4.5 seconds, while the memory score was between 16,000 and 21,000.

    The fastest in terms of speed went to the 4th permutation's 2nd run, with a 4.360379604 second runtime.

    The largest memory score went to the 3rd permutation's 3rd run, with a 21,148 total memory score.

    On AArchie, the difference in speed scores between the benchmarks ranged from 1-1.2 seconds, while its memory scores were very stable. For most of the benchmarks the memory score remained constant, except for 12 benchmark runs where it increased by about 64 points.

    Conclusion

    To compare, in my previous post I posted these two images:

    (the two images from the previous post)

    Ignoring the Xerxes side of the images, AArchie configured in about 16 seconds and built in 2 seconds, which is significantly less time than any of the builds with the compiler flags that M10k tested. From the benchmarks I can see that the fastest build time on AArchie is 400% slower than what SDL is currently built with. This means that, based on the results from the benchmark, SDL is already built using the most efficient compiler flags for the AArch64 architecture.


    by joelaro at December 18, 2015 04:32 AM

    December 17, 2015


    Ramanan Manokaran

    Lab 4 assembly language

    The required project for this lab was to write an assembly program that prints the numbers 1 to 30 to the console.

     

    The following image shows how assembly code is written for an x86 machine on the left and Aarch64 on the right.

    (image: the x86_64 version on the left, the Aarch64 version on the right)

    Code belongs to Chris Tyler

    The differences

    The x86_64 assembly uses registers whose names start with the letter 'r', which indicates the register is being accessed as 64 bits wide.

    For Aarch64 we used 'x' to indicate we are working with a 64-bit-wide access.

    The operand order of the instructions is reversed between the two; for example, the compare of the max value and the loop counter, which indicates whether we are done:

    • x86_64:    cmpq $max, %r15
    • Aarch64:   cmp x3, max

    The labels also differ: for x86 we used notzero, and for AArchie we labeled it writeit.

     


    by ramanan136 at December 17, 2015 10:07 PM

    December 16, 2015


    Donald Nguyen

    The Final Stages: Benchmarking TCL


    Continuing work with TCL involved building and benchmarking the package. Using the framework that our class developed together, we were able to build, test, and benchmark different options quickly. The framework works by reading a file with a list of compiler options, generating permutations of those options, and attempting to build the package with each permutation. For every successful build, the framework runs a test and a benchmark.
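    The idea behind the permutations is roughly the following. This is a toy C illustration of enumerating flag combinations, not the framework's actual code (which also handles groups, delimiters, and storing results); the flags listed are arbitrary examples.

    /* permute.c - toy illustration: print every on/off combination of a few flags */
    #include <stdio.h>

    int main(void)
    {
        const char *flags[] = { "-O2", "-funroll-loops", "-ftree-vectorize" };
        const int n = sizeof(flags) / sizeof(flags[0]);

        for (int mask = 0; mask < (1 << n); mask++) {   /* 2^n combinations */
            printf("CFLAGS:");
            for (int i = 0; i < n; i++)
                if (mask & (1 << i))
                    printf(" %s", flags[i]);
            printf("\n");   /* the real framework would configure, build,
                               test and benchmark with this flag set */
        }
        return 0;
    }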

    The compiler options I used were O1, O2 and O3. By default, the TCL package compiles with O3, and I wanted to see if using fewer options would affect performance in any way.

    For the test, I used the tests that are included in the TCL package. TCL ships many short test scripts that ensure the functionality of the package; for a programming language, I'd imagine that's very important.

    CFLAG    Wall     UserSystem
    O1       87.01    141.86
    O2       78.70    75.52
    O3       78.02    75.57

    These measures are the combination of the time it takes to configure and make the package, multiplied by the number of runs. So if we take the O1 wall time and divide it by four (87.01 / 4 ≈ 21.8 seconds), we get an average build time per run. The results here do not surprise me: building with O2 and O3 is faster, at the cost of other resources not measured at this stage. The difference between O2 and O3 is negligible.
    Run    CFLAG    SpeedScore    MemoryScore
    1      O1       31637         380224
    2               31273         380160
    3               30998         380224
    4               31428         380160
    1      O2       31859         380224
    2               31222         380224
    3               31070         380224
    4               31203         380224
    1      O3       31254         380224
    2               31338         380224
    3               31134         380288
    4               31499         380288

    The results were given in seconds times 100, so 31637 is more like 316 seconds. During the first runs with the O1 and O2 flags, the scores were high compared to the subsequent runs. The O3 runs didn't follow that pattern; the cause could be anything from fluctuating system activity to an option in the O3 group. Regardless, the difference between the fastest and slowest times is only about 10 seconds. I did not observe any significant improvement across the optimization groups.

    The purpose of the benchmark is to determine whether the options provided by the package are the best ones. The default CFLAG for TCL is O3; that group has its own set of options and extends the O1 and O2 flags. Based on these results, the default options should be used. However, since TCL is a programming language, its use case is wider than that of packages like gzip. How do you measure the performance of a tool whose use cases are as wide as those of languages like C and C++?

    by Donald Nguyen (noreply@blogger.com) at December 16, 2015 07:50 AM

    A Simple Look at Vectorization with GCC


    When I was first introduced to vectorization in class, I got the idea that it's like multi-threading, where two or more operations can be executed during the same cycle. I had to refer to Wikipedia to get a more precise understanding of where the compiler can transform a piece of code to operate on vectors instead of in scalar fashion. I also spent a lot of time playing "citation needed" with myself over a handful of terms that were new to me.

    Anyways, on an Aarch64 system, I used this guide to write a simple program to observe vectorization.


    /*
     * Purpose: To observe GCC vectorization optimization options
     */

    #include <stdio.h>
    #include <math.h>

    #define SIZE (1L << 16)

    void foo(double * restrict a, double * restrict b)
    {
        int i;

        /*
         * The use of restrict and __builtin_assume_aligned eliminates the
         * extra instructions that check the arrays for overlap, and hints
         * to the compiler that these arrays are 16-byte aligned. Because
         * the compiler is told they are aligned, it will not check whether
         * they actually are.
         */
        double *x = __builtin_assume_aligned(a, 16);
        double *y = __builtin_assume_aligned(b, 16);

        /* Run to observe vectorization */
        for (i = 0; i < SIZE; i++)
        {
            x[i] += y[i];
        }
    }

    /* The arrays are declared at file scope so they really hold SIZE elements;
       passing the address of a single double here would overrun it in foo's loop. */
    static double a[SIZE] __attribute__((aligned(16)));
    static double b[SIZE] __attribute__((aligned(16)));

    int main(void)
    {
        long i;
        for (i = 0; i < SIZE; i++)
        {
            a[i] = 3.0;
            b[i] = 2.0;
        }
        foo(a, b);
        return 0;
    }
    For this example, I've compiled with the following command:

    gcc -std=c99 -O3 -o l6 l6.c

     -std=c99 - allows the use of the restrict keyword
    -O3 - optimization group that contains options pertaining to vectorization.

    Here is the objdump of the function I wanted to analyze:

    0000000000400588 <foo>:
    400588: 91420002 add x2, x0, #0x80, lsl #12
    40058c: 4c407c00 ld1 {v0.2d}, [x0]
    400590: 4cdf7c21 ld1 {v1.2d}, [x1], #16
    400594: 4e61d400 fadd v0.2d, v0.2d, v1.2d
    400598: 4c9f7c00 st1 {v0.2d}, [x0], #16
    40059c: eb02001f cmp x0, x2
    4005a0: 54ffff61 b.ne 40058c <foo+0x4>
    4005a4: d65f03c0 ret

    Looking at the objdump, the instructions new to me are ld1 and st1. According to the ARM software development tools guide, ld1 is the vector instruction that loads a single-element structure into one lane of a register, and st1 is similar but stores the single-element structure. I can see the process of vectorization occurring in the two ld1 instructions.

    While doing this lab, I couldn't help but be reminded of the time I wrote a program in C++ that used multithreading techniques. With this simple experiment, even without measuring performance, I can see how techniques for hinting the compiler to perform vectorization can be useful for more complex programs. Vectorization and parallel computing are definitely areas of interest for me now.

    Source:
    http://zenit.senecac.on.ca/wiki/index.php/SPO600_Vectorization_Lab 
    http://locklessinc.com/articles/vectorize/
    https://en.wikipedia.org/wiki/Automatic_vectorization
    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/LD1_advsimd_sngl_vector.html

    by Donald Nguyen (noreply@blogger.com) at December 16, 2015 07:50 AM