Planet CDOT

July 30, 2015

Anderson Malagutti

MAMP sysvshm module – Install PHP modules on MAMP

Hello everybody,

I’ve been developing some PHP applications, and as a Mac user I’ve been using MAMP in my development environment. MAMP worked just fine until I had to use System V Shared Memory in my PHP code.

I couldn’t run my PHP code because it required some PHP modules (sysvshm, sysvsem and pcntl), and MAMP does not include them by default. Therefore, my solution was to install these modules myself.
I thought it might help someone else, so I will give a little step-by-step on how you can add your own module into your MAMP.

Let’s start:

You will need Homebrew and autoconf installed, as well as Xcode (which you probably already have).
You can easily install Homebrew with the command provided on its website:

ruby -e "$(curl -fsSL"

With Homebrew working on your system, you can install autoconf easily with the command:

 brew install autoconf

As I said earlier, MAMP does not include the PHP modules that you need, so the best solution for now is to download the PHP version that you need from the official PHP website.
I will use php5.6.10 as an example; I believe it should work on other versions as well.

After you have downloaded the PHP source code onto your Mac, just move it to:

mv your-new-php-directory /Applications/MAMP/bin/php/php5.6.10/include/

Note: you may not have the “include” folder. That’s not a problem; you can create this new directory by running this command:

mkdir /Applications/MAMP/bin/php/php5.6.10/include/

After you have copied the original source code into the correct location, you should rename the folder (from phpX.X.X to php), so your path will look like this:

mv /Applications/MAMP/bin/php/php5.6.10/include/php5.6.10 /Applications/MAMP/bin/php/php5.6.10/include/php

Now we will finally start to have some fun :)

Let’s configure the original source code on our system.

 cd /Applications/MAMP/bin/php/php5.6.10/include/php


Then go to the folder of the module that you need (in this case the module is sysvshm):

 cd /Applications/MAMP/bin/php/php5.6.10/include/php/ext/sysvshm

Let’s compile our module with MAMP’s phpize. Inside your module’s directory run:

/Applications/MAMP/bin/php/php5.6.10/bin/phpize

After that, let’s configure and make :)

./configure && make

Copy the new module to your MAMP directory:

cp /Applications/MAMP/bin/php/php5.6.10/include/php/ext/sysvshm/modules/sysvshm.so /Applications/MAMP/bin/php/php5.6.10/lib/php/extensions/no-debug-non-zts-20131226/

Then in your php.ini file (/Applications/MAMP/bin/php/php5.6.10/conf/php.ini) you should add a line for the module you want:

extension=sysvshm.so

Just restart PHP and have fun :)

I’ve compiled some other modules (sysvsem and pcntl) using this same method, and it has worked as well!
I used the free version of MAMP 3.3 on OS X 10.10.4 (Yosemite).

Thanks! Hope it has helped you.

by andersoncdot at July 30, 2015 09:48 PM

Barbara deGraaf

Callback hell and promises

While creating the GUI for my project I ran into a tiny bit of a callback hell, so I will talk a little about promises and how they are useful.

While making the GUI I used jQuery to populate the camera and lens select boxes. The important thing about jQuery for this post is that its requests are async; this means that the rest of the code block will run while the jQuery request is being made. In my code the select boxes populate and change appropriately, and to do this I needed to use callbacks to manage this async nature.

This led to callback hell, where my callbacks needed to be nested and I ended up with the “triangle of doom.” Callback hell is such a widely known occurrence with async JavaScript that there is even a website devoted just to callback hell.

There were two things I did to clean up the code a bit: promises and modules.

First, a bit about promises. There is a really good webpage that a coworker directed me to that I will post here. It’s worth pointing out that jQuery actually has “Deferreds”; these are a little different from promises, but there is a way to convert them to JavaScript promises with:

var thisPromise = Promise.resolve($.ajax('jsonFile.json'));

You can go to jQuery’s website to read more about Deferreds and how they can be used. Promises can be chained together to make a sort of queue that the statements run in. This is done with .then; I will show a basic example to make it easier to understand.

thisPromise
  .then(function(data0) {
    //do what you want here
    return anotherAsyncCall(data0);
  })
  .then(function(data1) {
    //do more
    //can return a third async call here
  })
  .catch(function(err) {
    //catch any error that occurred
  });

The second thing I did to make the code easier to read was to modularize the two parts that are async. Basically, the lens and camera selects became separate modules:

privateMethods.LensSelect = function (lensfolder) {
  //the async code here with promises
};

And then in the method where the GUI is being made call the above method like so

var lensfolder = this.gui.addFolder("Lens");
privateMethods.LensSelect.call(this, lensfolder);

Using these two concepts, promises and modules, my code has less chance of turning into spaghetti.
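To show how the two ideas combine, here is a minimal self-contained sketch of mine (the helper names loadLensData and populateSelectBox are hypothetical, not the project’s actual code): a module method returns one flat promise chain instead of nested callbacks.

```javascript
// Hypothetical stand-ins for the real async calls:
function loadLensData() {
  return Promise.resolve(['24mm', '50mm', '85mm']);
}
function populateSelectBox(folder, lenses) {
  return lenses.length; // pretend we filled the select box
}

var privateMethods = {};

// The module keeps the whole promise chain in one place,
// so the GUI code only makes a single call.
privateMethods.LensSelect = function (lensfolder) {
  return loadLensData()
    .then(function (lenses) {
      return populateSelectBox(lensfolder, lenses);
    })
    .catch(function (err) {
      console.error('lens select failed:', err);
    });
};

privateMethods.LensSelect('lensFolder').then(function (count) {
  console.log(count); // → 3
});
```

Because LensSelect returns the promise, the caller can chain onto it without knowing anything about the async steps inside.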

In the future the GUI is going to be made in a different way using Bootstrap, so look forward to that.



by barbaradegraafsoftware at July 30, 2015 06:20 PM

July 29, 2015

LEAP Project

Installations – Install Trees and Composing: Lorax

Greetings once again fellow LEAP users, continuing on with the Installation series, this time I’ll be speaking about the composing process. Composing refers to the creation of the image files, EFI bootloader files and other assorted bits that are pieces of our install-tree. One can then use the install-tree to install the distribution. The tool we use to do this is called Lorax. Lorax is a tool for creating anaconda install images (which are used by the Anaconda installer, a graphical system installer used in Fedora and also leveraged by us in LEAP). Today I will do a brief overview of install-trees and how to use this tool.

To have an idea of what a completed install-tree is, we can quickly take a look at our official LEAP install-tree. The pieces that are generated by Lorax are the EFI, LiveOS and images directories (it also creates a couple files .treeinfo and .discinfo which hold some information about the tree and disc).

The EFI directory (which only has a BOOT directory) contains files that are mainly involved with the bootloader portion of an installation:

  • BOOTAA64.EFI – An EFI executable file that handles the booting process. Files named BOOTX.EFI are typically for booting operating systems that are not permanently installed, i.e. hot-pluggable media such as live images and other installation media.
  • MokManager.EFI – An EFI executable file that manages Machine Owner Keys which is involved with Secure Boot (We don’t currently make use of Secure Boot on LEAP)
  • grub.cfg –  A grub configuration file that specifies what the grub menu consists of (In LEAP’s case it’s less of a menu and more of a specification of the boot for a PXE context)
  • grubaa64.efi – An EFI executable file, much like BOOTAA64.EFI, that handles the booting process. Unlike the BOOTX.EFI files, these are usually used for booting permanently installed operating systems.
  • fonts – As the name implies, fonts used by the boot menu.

The LiveOS directory contains only one file, squashfs.img, which is the image containing the filesystem used by the live OS.

Finally, the images directory contains the images that are used to boot the system:

  • boot.iso – An image containing a minimal boot of the operating system.
  • efiboot.img – A UEFI boot image which really contains what was in the above EFI directory.
  • pxeboot – A directory that contains vmlinuz and initrd.img. vmlinuz is the executable file that contains the kernel for the distribution, and initrd.img is the initial ramdisk for the system.

Now that we have an idea of what Lorax generates let’s move on to using Lorax.


There are a few things to note when using Lorax:

  1. You need to have root access or run Lorax with sudo.
  2. selinux needs to be disabled.
  3. Lorax 22 is recommended.

Using Lorax

An excerpt from the Lorax manpage:

-p STRING, --product=STRING
  product name

-v STRING, --version=STRING
  version identifier

-r STRING, --release=STRING
  release information

-s STRING, --source=STRING
  source repository (may be listed multiple times)

A sample of using lorax could be:

sudo lorax --product=LEAP --version=7 --release=lp7 --source=

The above command will use the source repository to generate the EFI, LiveOS and images directories and their files, using the product, version and release information supplied.

Some additional optional arguments that one might find rather useful are:

  • --excludepkgs=STRING – This option can be used to exclude specific package globs from the compose. An example would be --excludepkgs=fedora-productimg-workstation, which would exclude packages that are part of the set that makes up the Fedora Workstation package set.
  • --isfinal – This option signifies that the compose being created is for a stable, release-ready distribution. Having this option will create an isfinal environment variable in the buildstamp (which is located in the initrd.img). This variable is then checked by the Anaconda installer to see if this is a stable release. If it isn’t a stable release, the Anaconda installer will show various “This is pre-release software, install at your own risk” types of messages and graphics.
  • --buildarch=STRING – This will set the architecture that this installation is for.

Hopefully that brief overview was informative for those unfamiliar with the composing process, which is a step in creating the install-tree used to install distributions like LEAP. While Lorax is the tool we use to compose, there are a variety of others that also facilitate this process. Largely the process is rather simple, but next time I’ll cover some additional aspects of composing in a testing context.

Until next time, happy leaping everyone. If there are any questions or comments, please hit us up on irc or in the comments section below.

by hzhuang3 at July 29, 2015 09:34 PM

July 28, 2015

LEAP Project

Installation Tarball fixed

Greetings again to all you LEAP users,

I recently discovered that there was a small typo in the grub.cfg file in the installation tarball. The exact typo was:

initrd (tftp)/leap/EFI/leap/pxeboot/initrd.img

This would have caused installations to fail, as the pxeboot portion does not exist in the directory structure. One can fix this by simply removing the pxeboot part of the line:

initrd (tftp)/leap/EFI/leap/initrd.img

or you may download the updated installation tarball.

I hope this issue didn’t cause too much grief. Apologies once again, and hopefully this allows for smooth installations from here on out. If any other issues crop up, please drop by the irc channel at: irc:// or our bugzilla and let us know what’s going on.

by hzhuang3 at July 28, 2015 06:12 PM

July 27, 2015

Anderson Malagutti

JAVA – SSL exception Could not generate DH pair

Hello everybody.

It’s been a long time since I posted something here…

Today I had a little trouble with an SSL exception. It was driving me crazy! hahaha I wished there was a post about it somewhere, so I decided to publish one myself… Hopefully it helps someone else having problems with it. :)

Short history: after playing with Titanium Studio for a little while on Mac OS 10.10.4 (Yosemite), the IDE stopped working, and I was no longer able to log in and make changes to my code.

I decided to try to launch the IDE through the command line, and then the SSL exception appeared…

Solution: Download these two jars:

1. bcprov-jdk15on-152.jar

2. bcprov-ext-jdk15on-152.jar

After you have these files on your computer, you simply have to move them into: YOUR_JAVA_HOME/lib/ext/

Also, edit the file located in: YOUR_JAVA_HOME/lib/security/

You should add: security.provider.1=org.bouncycastle.jce.provider.BouncyCastleProvider close to the other security providers.

You might want to replace “security.provider.1” with “security.provider.X”, where X is another number, as overriding provider 1 may cause you some problems…

Hope it helps somehow. :)

by andersoncdot at July 27, 2015 08:18 PM

Anna Fatsevych

July 24, 2015

Ali Al Dallal

Localizing React app using React-Router with React-Intl

In the past, when we did localization work we would usually write our own library; one example is node-webmaker-i18n.

Recently, I had to localize our React application for Webmaker, and this time our team thought we might change our approach: let’s not write our own, but use something that’s already out there. So I did some research on different internationalization libraries for React applications and ended up using React-Intl by Yahoo.

Obviously, there are some good reasons why we didn’t write our own this time and why we ended up using React-Intl instead of other options. One reason is simply that we didn’t want to reinvent the wheel, and helping the community by contributing to an existing library is also a good idea. I found that React-Intl had a lot in common with what we needed, and its maintainers were very responsive and helpful when we had problems using the library.

Now, let's get started on how to integrate React-Intl with React app that's using React-Router for handling routes.

NOTE: We're using Webpack.js to handle our modules bundling.


// entry.jsx
import React from 'react';  
import Router from 'react-router';  
import routes from './routes.jsx';  
import messages from './messages';

var locale = navigator.language.split('-')  
locale = locale[1] ? `${locale[0]}-${locale[1].toUpperCase()}` : navigator.language

var strings = messages[locale] ? messages[locale] : messages['en-US']  
strings = Object.assign(messages['en-US'], strings);

var intlData = {  
    locales : ['en-US'],
    messages: strings
};

Router.run(routes, Router.HistoryLocation, function (Handler, state) {
  React.render(<Handler {...intlData} />, document.querySelector("#my-app"));
});


// routes.jsx
import React from 'react';  
import { Route } from 'react-router';

var routes = (  
    <Route name="home" path="/" handler={require('./home.jsx')} />
);

module.exports = routes;  


// messages.js
// This is essentially bulk require
var req = require.context('../locales', true, /\.json.*$/);  
var exports = {};

req.keys().forEach(function (file) {  
  var locale = file.replace('./', '').replace('.json', '');
  exports[locale] = req(file);
});

module.exports = exports;

Just to explain a bit what's going on here in each file.

  1. messages.js is basically just to pre-load all the locale files, so that you don't have compile time error with webpack.

  2. routes.jsx this file is pretty straightforward since it's just the normal way of declaring your routes in react-router.

  3. entry.jsx this is where it gets a bit complicated. First thing first let's talk about this line

var locale = navigator.language.split('-')  
locale = locale[1] ? `${locale[0]}-${locale[1].toUpperCase()}` : navigator.language  

This basically just extracts the language code from the browser using navigator.language, then rewrites the string to match what we have in our dictionary stored in the messages.js file. The reason I have to do toUpperCase() here is that Safari will return en-us, whereas Firefox and Chrome will return en-US.
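That normalization could also be pulled into a small helper (the function name is my own, not part of the project) to make the browser difference explicit:

```javascript
// Safari reports "en-us" while Firefox/Chrome report "en-US";
// normalize to the dictionary key form used in messages.js.
function normalizeLocale(language) {
  var parts = language.split('-');
  return parts[1] ? parts[0] + '-' + parts[1].toUpperCase() : language;
}

console.log(normalizeLocale('en-us')); // → "en-US"
console.log(normalizeLocale('th'));    // → "th"
```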

var strings = messages[locale] ? messages[locale] : messages['en-US'];  

This one is pretty simple since we are just trying to retrieve the strings from our dictionary and if we can't find that locale then just fallback to en-US.

strings = Object.assign(messages['en-US'], strings);  

Sometimes we will include a language with a partial translation, and we need to make sure the object that we pass to intlData contains all keys based on the en-US messages; otherwise React-Intl will throw.

Now, let's look at home.jsx where we will use React-Intl to dynamically change the string based on our language.


// home.jsx
import React from 'react';

module.exports = React.createClass({
  mixins: [require('react-intl').IntlMixin],
  render() {
    return (
      <div>{this.getIntlMessage('hello_world')}</div>
    );
  }
});

And the locale files (for example locales/en-US.json and locales/th-TH.json) look like:

{
  "hello_world": "Hello World"
}

{
  "hello_world": "สวัสดีชาวโลก"
}

So, I think that’s it! We have now fully localized our React app using React-Intl :)
If you find any problem in my code or have any question don't forget to leave a comment down below or tweet me @alicoding and I'm happy to answer :)

by Ali Al Dallal at July 24, 2015 04:19 PM

July 23, 2015

LEAP Project

Installations – Kickstart Options: Users, LVM, RAID

Greetings to potential or current users of LEAP. My name is Michael and on the LEAP team I was working on various pieces of testing and in particular getting the installer for our distribution working. Today I’d like to start a series of short talks about the various aspects of installing LEAP. The topic this time is Kickstart options. They can help to enhance your automated installations for those who are new to the whole thing (much like I was when I first joined the project).

When installing LEAP one has the option of using the graphical VNC installer or the text-mode installer. While there is a certain allure to being able to see more than a terminal window when doing an installation, it will likely become a cumbersome activity if there are a number of systems to deploy LEAP to or if re-installations are frequent occurrences. Kickstart installations allow all the installation configuration to be set in a Kickstart file, which is then used to bypass the configuration portion of the installation for the user. For our release we have provided a rather simple and generic Kickstart file for users, but it doesn’t make use of certain options that might be useful. Let’s look at a few of those.

User Creation

The default Kickstart does not have any users created but should one want to create one it is rather simple. Simply use the ‘user’ option:

user --name=leapuser --groups=wheel --password=leapfrog

Here we create the leapuser account, add it to the wheel group and give it a password of leapfrog. There are additional flags one can use with the user option, such as the --homedir= flag to set the default home directory for that user (otherwise a default under /home/ is used).

LVM Partitioning

The portion of the default Kickstart file that is relevant to storage partitioning is as follows:

bootloader --location=mbr --boot-drive=sda
zerombr
clearpart --all --initlabel 
part  /boot/efi --fstype efi --ondisk sda --size 200
part  /boot --fstype xfs --ondisk sda --size 200
part  swap --ondisk sda --size 4000
part  / --fstype xfs --ondisk sda --size 10000 --label=root --grow

At the first line we state that we want to install the bootloader into the sda drive at the master boot record. zerombr refers to initializing any disks whose formatting is unrecognized by the installer, that is to say those disks will be wiped clean. The clearpart option removes partitions from the system prior to creating new partitions. We remove all partitions in our kickstart with the –all flag and initialize the disks with –initlabel. Following those instructions we specify the new partitions we want to create on the system with flags that decide their file system type, label, size and disk to be created on. As can be seen, it is a rather simplistic file system that is done with standard partitioning and it takes up all of the sda disk.

To make use of LVM partitioning in the Kickstart we make use of the volgroup and logvol options:

bootloader --location=mbr --boot-drive=sda
clearpart --all --initlabel
part  /boot/efi --fstype=efi  --ondisk=sda  --size=200
part  /boot   --fstype=xfs --ondisk sda  --size=200
part pv.01 --size=1000 --grow --ondisk=sda
volgroup rootvg01 pv.01
logvol / --vgname=rootvg01 --name=lv_root --fstype=ext4 --size=1024 --grow
logvol swap --vgname=rootvg01 --name=lv_swap --size=4000

Up to the /boot partition it is still the same as before. After that point we create a partition which will be used for LVM (these partitions all start with the pv prefix). We then create a volume group on top of that partition with the volgroup command. Then we can create as many logical volumes on the volume group as we like. The syntax is largely similar to the existing part lines, with two additional flags that specify the volume group name and the name given to the logical volume being created.


RAID Partitioning

Setting up a RAID with Kickstart options is also relatively easy, and it follows a similar process to the previous partitioning schemes:

part raid.11 --fstype=raid --size=200 --ondisk=sda
part raid.12 --fstype=raid --size=200 --ondisk=sda
part raid.13 --fstype=raid --size=10000 --ondisk=sda
part raid.21 --fstype=raid --size=200 --ondisk=sdb
part raid.22 --fstype=raid --size=200 --ondisk=sdb
part raid.23 --fstype=raid --size=10000 --ondisk=sdb
raid /boot/efi --fstype=efi --device=md0 --level=1 raid.11 raid.21
raid /boot --fstype=ext4 --device=md1 --level=1 raid.12 raid.22
raid pv.01 --device=md2 --level=1 raid.13 raid.23
volgroup sysvg pv.01
logvol / --vgname=sysvg --name=lv_root --fstype=ext4 --size=8000 --grow
logvol swap --vgname=sysvg --name=lv_swap --grow --size=1024 --maxsize=2048

We once again create partitions that will be used for the RAID. For this setup we’ll be doing a RAID 1 with two disks, sda and sdb. Using the part command with the raid prefix will create partitions to be used for RAID, and the fstype is also raid. Following the creation of these partitions we use the raid command to create the RAID device. The command is again similar in syntax to the part command, except with the --device flag that specifies the name of the device, the level of the RAID and the partitions that make up the device. We create a device for the EFI partition, the boot partition and also a physical volume for LVM. Then we do a similar setup to the previous LVM example.

Hopefully that was a somewhat informative post about how you can do commonplace configurations such as user creation and LVM and RAID partitioning with Kickstart options. This is only a fraction of what’s available; for further reading please refer to the github page for pykickstart, which outlines all the options as well as giving a comprehensive overview of how Kickstart installations work.

Happy installations everyone and we’ll see you next time.

by hzhuang3 at July 23, 2015 05:31 PM

Barbara deGraaf

A short post on debugging three.js shaders

Just a very small update on how it was to create the shader I made before and how I went about debugging it.

Debugging shaders is known to be very hard, and there are a couple of ways to debug your code. The first way is to write your intermediate values to gl_FragColor and compare the rendered output with the values you expect.

There is some software that people have released to debug WebGL that I didn’t use but that may be useful for some people. One of them is WebGL Inspector. There is also a Firefox WebGL shader editor, which allows you to edit the shader code in real time and mouse over to see its effect in the scene. And if you prefer Chrome, there are some Chrome canvas inspection dev tools that allow you to capture frames and debug code as well.

Of course, if you don’t feel like downloading something or using a different browser, you could do some rubber-duck debugging, but I would strongly recommend the programs and techniques above.

That’s all for now, stay tuned for a brief post on promises.



by barbaradegraafsoftware at July 23, 2015 04:39 PM

July 22, 2015

Armen Zambrano G. (armenzg)

Few mozci releases + reduced memory usage

While I was away, adusca shipped a few releases of mozci.

From the latest release I want to highlight that we’re replacing the standard json library with ijson, since it solves some memory-leak issues we were facing in pulse_actions (bug 1186232).

This was important to fix since our Heroku instance for pulse_actions has an upper limit of 1GB of RAM.

Here are the release notes and the highlights of them:

  • 0.9.0 - Re-factored and cleaned-up part of the modules to help external consumers
  • 0.10.0:
    • --existing-only flag prevents triggering builds that are needed to trigger test jobs
    • Support for pulse_actions functionality
  • 0.10.1 - Fixed KeyError when querying for the request_id
  • 0.11.0 - Added support for using ijson to load information, which decreases our memory usage

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

by Armen Zambrano G. ( at July 22, 2015 02:21 PM

July 20, 2015

Anna Fatsevych

Inside pHash

I had written in my previous post about the functions that make up pHash. This time, I have compiled my own program (which runs the functions from pHash and saves the intermediate results as separate images) to visually see how the image changes, in hopes of writing the JavaScript implementation.

First, you will need the CImg library, which can be downloaded from source here, or:

sudo apt-get install cimg-dev
sudo apt-get install imagemagick # needed to work with the images

To compile, just run this command (from the official CImg docs) on Linux:

g++ -o myprogram.exe myprogram.cpp -O2 -L/usr/X11R6/lib -lm -lpthread -lX11 

Here is the c++ code and, respectively, the result:

    CImg<float> meanfilter(7,7,1,1,1);
    CImg<float> img;
    if (src.spectrum() == 3){
        img = src.RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else if (src.spectrum() == 4){
        int width = img.width();
        int height = img.height();
        int depth = img.depth();
        img = src.crop(0,0,0,0,width-1,height-1,depth-1,2).RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else {
        img = src.channel(0).get_convolve(meanfilter);
    }

Here is the result after the greyscale and convolve filters are applied:


The rest of the code here will resize the image, apply the DCT, then take a run of the pixels and hash them to 1 or 0 bits depending on the median value.

    CImg<float> *C  = ph_dct_matrix(32);
    CImg<float> Ctransp = C->get_transpose();
    CImg<float> dctImage = (*C)*img*Ctransp;
    CImg<float> subsec = dctImage.crop(1,1,8,8).unroll('x');
    float median = subsec.median();
    ulong64 one = 0x0000000000000001;
    hash = 0x0000000000000000;
    for (int i=0;i<64;i++){
        float current = subsec(i);
        if (current > median)
            hash |= one;
        one = one << 1;
    }

Subsequently here are the images:

Resize (32x32 pixels): hash

dctHash (32x32 pixels): dctimg

Final sub-section upon which the dct hash is based (64x1 pixels) : subsec

This is the visualization of the dct pHash process.
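Since the end goal is a JavaScript port, the median-threshold loop in the C++ code above might translate roughly like this. This is my own sketch, using the middle element of the sorted array as the median, which may differ slightly from CImg's median() for even-length input:

```javascript
// Turn DCT coefficients into hash bits: 1 where the value
// is strictly greater than the median, 0 otherwise.
function hashBits(coeffs) {
  var sorted = coeffs.slice().sort(function (a, b) { return a - b; });
  var median = sorted[Math.floor(sorted.length / 2)];
  return coeffs.map(function (c) { return c > median ? 1 : 0; });
}

console.log(hashBits([0, 1, 2, 3, 4, 5, 6, 7]).join('')); // → "00000111"
```

In the real implementation the input would be the 64 coefficients from the cropped 8x8 DCT sub-section, and the bits would be packed into a 64-bit hash.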



by anna at July 20, 2015 09:18 PM

July 17, 2015

Anna Fatsevych

pHash in JavaScript

I have been taking a closer look at the pHash dct_hash algorithm in order to recreate it on the client side. I have also looked into the possibilities of running compiled C++ programs on the client side with tools like Native Client and Emscripten.

I decided to write my own JavaScript pHash implementation, as the pHash.js that I had previously been testing has continuously produced large Hamming distances and is not identical to the pHash dct hash algorithm. I have contacted the author, but in the meantime I will be writing my own.
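For comparing the resulting 64-bit hashes, the Hamming distance can be computed in JavaScript on two 32-bit halves, since JS bitwise operators only work on 32 bits. This is my own sketch, not code from pHash.js:

```javascript
// Count set bits in a 32-bit unsigned integer.
function popcount32(x) {
  var count = 0;
  x = x >>> 0;
  while (x) {
    x = (x & (x - 1)) >>> 0; // clear the lowest set bit
    count++;
  }
  return count;
}

// Hamming distance between two 64-bit hashes, each given as
// a [high 32 bits, low 32 bits] pair.
function hammingDistance(a, b) {
  return popcount32(a[0] ^ b[0]) + popcount32(a[1] ^ b[1]);
}

console.log(hammingDistance([0x0, 0xF], [0x0, 0x0])); // → 4
```

A small Hamming distance between two hashes means the images are perceptually similar; identical images give a distance of 0.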

Here is the main ph_dct_imagehash function that hashes the images:

int ph_dct_imagehash(const char* file,ulong64 &hash){

    if (!file){
        return -1;
    }
    CImg<uint8_t> src;
    try {
        src.load(file);
    } catch (CImgIOException ex){
        return -1;
    }
    CImg<float> meanfilter(7,7,1,1,1);
    CImg<float> img;
    if (src.spectrum() == 3){
        img = src.RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else if (src.spectrum() == 4){
        int width = img.width();
        int height = img.height();
        int depth = img.depth();
        img = src.crop(0,0,0,0,width-1,height-1,depth-1,2).RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else {
        img = src.channel(0).get_convolve(meanfilter);
    }

    img.resize(32,32);

    CImg<float> *C  = ph_dct_matrix(32);
    CImg<float> Ctransp = C->get_transpose();

    CImg<float> dctImage = (*C)*img*Ctransp;

    CImg<float> subsec = dctImage.crop(1,1,8,8).unroll('x');

    float median = subsec.median();
    ulong64 one = 0x0000000000000001;
    hash = 0x0000000000000000;
    for (int i=0;i<64;i++){
        float current = subsec(i);
        if (current > median)
            hash |= one;
        one = one << 1;
    }
    delete C;

    return 0;
}

The pHash C++ code does not seem so large at first glance, but it also incorporates the CImg library, in particular the following functions:

in this line

img = src.RGBtoYCbCr().channel(0).get_convolve(meanfilter);

  • RGBtoYCbCr() - converts the image to the YCbCr colour space (the Y component is the greyscale image)
  • channel(0) - returns the specified image channel
  • get_convolve(meanfilter) - convolves the image with a mask; this function subsequently calls
  • get_correlate(mask) - correlates the image with a mask: res(x,y,z) = sum_{i,j,k} (*this)(x + i,y + j,z + k)*mask(i,j,k), which performs the math as well as calling
  • magnitude() - returns the normalized image represented in matrix form

These lines of code follow right after

CImg<float> *C  = ph_dct_matrix(32);
CImg<float> Ctransp = C->get_transpose();

resize() is fairly easily replicated
ph_dct_matrix(32) - here is the function:

CImg<float>* ph_dct_matrix(const int N){
    CImg<float> *ptr_matrix = new CImg<float>(N,N,1,1,1/sqrt((float)N));
    const float c1 = sqrt(2.0/N); 
    for (int x=0;x<N;x++){
        for (int y=1;y<N;y++){
            *ptr_matrix->data(x,y) = c1*cos((cimg::PI/2/N)*y*(2*x+1));
        }
    }
    return ptr_matrix;
}

get_transpose() then calls get_permute_axes("yxzc"), which permutes the axes order.

After that, the image is cropped (8x8), unrolled (all the pixels on the x-axis), and consequently hashed:

    CImg<float> dctImage = (*C)*img*Ctransp;
    CImg<float> subsec = dctImage.crop(1,1,8,8).unroll('x');

    //pHash code 

So far, the pixel manipulations seem feasible in canvas. I will be posting updates on my implementation.
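As a first step toward that port, here is how ph_dct_matrix might look in JavaScript (a sketch of mine, assuming the same constants as the C++ version and representing the matrix as an array of arrays indexed [x][y]):

```javascript
// DCT coefficient matrix, mirroring pHash's ph_dct_matrix(N):
// every entry starts at 1/sqrt(N); entries with y >= 1 get cosine terms.
function phDctMatrix(N) {
  var c0 = 1 / Math.sqrt(N);
  var c1 = Math.sqrt(2 / N);
  var m = [];
  for (var x = 0; x < N; x++) {
    m[x] = [];
    for (var y = 0; y < N; y++) {
      m[x][y] = y === 0 ? c0 : c1 * Math.cos((Math.PI / 2 / N) * y * (2 * x + 1));
    }
  }
  return m;
}

var C = phDctMatrix(32);
console.log(C[0][0].toFixed(4)); // → "0.1768"
```

The remaining pieces (matrix multiply, crop, median) are plain array operations, so nothing here depends on CImg.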



by anna at July 17, 2015 07:12 PM

LEAP Project

LEAP is Officially Released!

After months of preparation, we are pleased to release LEAP: Linux for Enterprise ARM Platforms.

LEAP is our software distribution for testing and evaluating 64-bit ARM enterprise computing platforms. It is based on the CentOS 7.1 sources, with package upversioning, optimization, and fixes, plus additional packages. We will be continuously updating LEAP over the coming months to support additional ARM64 platforms and improve performance.

What are you waiting for? Head on over to the LEAP Homepage for all the details!

by christophertyler at July 17, 2015 06:46 PM

July 16, 2015

Cong Wang

Address to coordinates transformation through OpenStreetMap

Search through the above URL and the response is in XML format. I have written a Java app to convert an address to coordinates; here is my code:

package javaconnection;

import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class JavaConnection {

    public static void main(String[] args) throws MalformedURLException, IOException, SAXException, ParserConfigurationException, Exception {
        String roadnumber = "135";
        String AvenuneName = "pilkington";
        String avenue = "avenue";
        String Country = "birmingham";
        String BaseURL = "";
        String uri = BaseURL + roadnumber + "%20" + AvenuneName + "%20" + avenue + "," + "%20" + Country + "?format=xml&point=1&addressdetails=1";
        URL url = new URL(uri);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/xml");
        InputStream xml = connection.getInputStream();
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(xml);
        prettyPrint(doc);
    }

    public static final void prettyPrint(Document xml) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        Writer out = new StringWriter();
        tf.transform(new DOMSource(xml), new StreamResult(out));
        for (String coordinates : out.toString().split(" ")) {
            if (coordinates.contains("lon=")) System.out.println(coordinates);
            if (coordinates.contains("lat=")) System.out.println(coordinates);
        }
        // System.out.println(out.toString());
    }
}

by wang cong ( at July 16, 2015 08:00 PM

July 15, 2015

Hosung Hwang

libstdc++ library mismatch problem and solution


When I tried to run an executable that had been built on another machine, it showed the following error:

$ ./regdaemon
./regdaemon: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./regdaemon)



The reason for this error was that the version of the dynamically linked library on this machine was lower than the library version used on the build machine.

On the build machine, the library looks like the following:

/usr/lib/x86_64-linux-gnu$ ll libstdc*
lrwxrwxrwx 1 root root      19 Nov  4  2014 libstdc++.so.6 -> libstdc++.so.6.0.20
-rw-r--r-- 1 root root 1011824 Nov  4  2014 libstdc++.so.6.0.20

This means that the library actually used by the executable is libstdc++.so.6.0.20, and libstdc++.so.6 links to it. This library is installed with a new gcc.

On the other machine that showed the error, the library was like the following:

/usr/lib/x86_64-linux-gnu $ ll libstdc*
lrwxrwxrwx 1 root root     19 May 14 14:11 libstdc++.so.6 -> libstdc++.so.6.0.19
-rw-r--r-- 1 root root 979056 May 14 14:41 libstdc++.so.6.0.19

Here, libstdc++.so.6 links to libstdc++.so.6.0.19, which is an older version than the one on the build machine.



Since the machine was running Linux Mint, which is Ubuntu-based, the newest gcc could be installed with the following commands:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install g++-4.9

Then the library is updated like this:

/usr/lib/x86_64-linux-gnu $ ll libstdc*
lrwxrwxrwx 1 root root      19 Apr 23 13:00 libstdc++.so.6 -> libstdc++.so.6.0.20
-rw-r--r-- 1 root root 1541600 Apr 23 13:23 libstdc++.so.6.0.20

Now, because the installed library was no older than the one on the build machine, the executable worked well.

The other solution would be to link the C++ runtime statically by adding the <code>-static-libstdc++</code> (and <code>-static-libgcc</code>) options.

Additional information

Which files (regular files, sockets, etc.) a process has open can be seen using the "lsof" utility.

hosung@hosung-Spectre:~$ lsof -p 6002
COMMAND    PID   USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
regdaemon 6002 hosung  cwd    DIR                8,2     4096 2589221 /home/hosung/cdot/ccl/regdaemon/Debug
regdaemon 6002 hosung  rtd    DIR                8,2     4096       2 /
regdaemon 6002 hosung  txt    REG                8,2  1066943 2545008 /home/hosung/cdot/ccl/regdaemon/Debug/regdaemon
regdaemon 6002 hosung  mem    REG                8,2    47712 2117917 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2    14664 2117927 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2   100728 2101352 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1071552 2117915 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  3355040 6921479 /usr/lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1840928 2117938 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2    92504 2097171 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1011824 6846284 /usr/lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1112840 6830431 /usr/lib/
regdaemon 6002 hosung  mem    REG                8,2   141574 2117939 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2   149120 2117935 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung    0u   CHR             136,23      0t0      26 /dev/pts/23
regdaemon 6002 hosung    1u   CHR             136,23      0t0      26 /dev/pts/23
regdaemon 6002 hosung    2u   CHR             136,23      0t0      26 /dev/pts/23
regdaemon 6002 hosung    3u  IPv4              63342      0t0     TCP localhost:60563->localhost:mysql (ESTABLISHED)
regdaemon 6002 hosung    4u  unix 0x0000000000000000      0t0  114861 /tmp/cc.daemon.sock

by Hosung at July 15, 2015 10:07 PM

Anna Fatsevych

Images with HTML5 Canvas

The HTML5 canvas comes with an array of possibilities for pixel manipulation. I will be exploring more of its functionality in order to recreate the pHash DCT hash on the client side with JavaScript.

Here is a simple code to create an image:

<canvas id="myCanvas" width="100" height="100">
Your browser does not support the HTML5 canvas tag.
</canvas>

var c = document.getElementById("myCanvas");
var ctx = c.getContext("2d");
var imgData = ctx.createImageData(100, 100);

var i;
for (i = 0; i < imgData.data.length; i += 4) {
  imgData.data[i+0] = 255;
  imgData.data[i+1] = 0;
  imgData.data[i+2] = 255;
  imgData.data[i+3] = 125;
}

ctx.putImageData(imgData, 0, 0);

First we create the canvas tag with a width and height of 100 pixels.
Then we create the ImageData object, also with a width and height of 100 pixels.

In the for loop we assign the values for each pixel: there are four values per pixel, one for each channel (Red, Green, Blue, and Alpha).
Each channel holds an intensity value from 0 to 255. Red (R), Green (G), and Blue (B) are the primary colours, and all other colours can be created by combining them at various strengths (intensities). The fourth channel is the alpha channel, which represents transparency: 0 is fully transparent and 255 is fully opaque.

To begin the pHash process, the image first needs to be converted to greyscale. In this wikipedia article there is a formula for combining the RGB channels to achieve this result.

It recommends the following formula: greyscale = R * 0.299 + G * 0.587 + B * 0.114.
So for each pixel, red is weighted at about 30%, green at about 59%, and blue at about 11%.
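A minimal sketch (my own, not part of the original post) of applying these luma weights (R × 0.299, G × 0.587, B × 0.114) to a flat RGBA pixel array such as imgData.data:

```javascript
// Convert an RGBA pixel array to greyscale in place using the
// luma weights R*0.299 + G*0.587 + B*0.114.
function toGreyscale(data) {
  for (var i = 0; i < data.length; i += 4) {
    var grey = Math.round(data[i] * 0.299 + data[i + 1] * 0.587 + data[i + 2] * 0.114);
    data[i] = data[i + 1] = data[i + 2] = grey; // leave alpha (data[i+3]) untouched
  }
  return data;
}
```

Running this over the array from ctx.getImageData() and calling ctx.putImageData() again would render the greyscale version on the canvas.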

The next step will be applying filters and masks; this will be discussed in detail in the next post.



by anna at July 15, 2015 11:23 AM

July 14, 2015

Fardad Soleimanloo

Justin Flowers

Configuring Windows 7 Vagrant Base Boxes with SSH

With Vagrant it can be quite frustrating setting up Windows base boxes using WinRM. I have never had any success myself using the Vagrant WinRM method. While I gawk in amazement at pre-built boxes which have WinRM control, there seems to be no complete documentation anywhere on how to set it up. In fact, the Vagrant page describing how to set up Windows base boxes has formatting issues which make its Windows code blocks nearly unreadable. Add on top of that the fact that their (and others’) instructions are either missing steps or outright wrong, and you end up where I was three weeks ago: falling back on using the SSH method to connect and provision with Vagrant. While Vagrant does not offer many built-in automatic provisioning features for Windows over SSH, you can still do manual provisioning, using the modifyvm customization in the Vagrantfile to configure what you need.

Step 1: Create vagrant user

The first important step is to create the vagrant user on Windows. Make sure the account’s username and password are “vagrant” and that it is an administrator. Then log into this new account.

Step 2: Install any provider specific requirements with Cygwin and OpenSSH

Next, from the vagrant user’s account, install any provider-specific required software (like VBox’s Guest Additions) and then install Cygwin and OpenSSH using these great instructions from Oracle.

At the end of this section, you should be able to SSH to localhost from a Cygwin terminal by running:

ssh localhost

Step 3: Configure Windows Firewall

You’ll need to add some entries to the firewall to allow communication through port 22 so that Vagrant can communicate with the base box.

  1. Go to “Windows Firewall with Advanced Security” in the start menu.
  2. Go to “Inbound Rules”
  3. Hit “New Rule”
  4. Select “Port” based rule
  5. Select “TCP”
  6. Select “Specified local ports” and enter 22
  7. Select “Allow the connection”
  8. Select all check boxes for “When does this rule apply?”
  9. For the name, make it something along the lines of “Allow SSH Access”

You may need to add an outbound rule as well with the same setup to explicitly allow outbound connections over 22, but most likely that is not necessary.

Step 4: Package up base box

Now to package up the base box. Create a folder called “vagrant_win7”, change directory into it, and run:

vagrant package --base "VM_NAME_HERE" --output "vagrant_win7.box"

substituting “VM_NAME_HERE” with the name of your VM in your respective provider. This will take a while and will create a file called “vagrant_win7.box” in the current folder.

Step 5: Configure Vagrantfile

In order for Vagrant to even add this new base box to a provider it needs a Vagrantfile. In our case, we’re using:

  1. A Windows guest OS
  2. SSH to a Windows guest OS
  3. Password protection

All of these go against Vagrant’s default functionality. Hence, we need our Vagrantfile to reflect that. We must also disable the default shared Vagrant folder, because it does not set itself up correctly over SSH. Here’s an example of the Vagrantfile I used to create my Windows base box:

Vagrant.configure(2) do |config|
  config.vm.box = "vagrant_win7"

  config.vm.provider "virtualbox" do |v|
    v.name = "vagrant_win7"
    v.customize ["modifyvm", :id, "--nic2", "hostonly"]
  end

  config.vm.synced_folder '.', '/vagrant', disabled: true
  config.vm.guest = "windows"

  config.ssh.insert_key = false
  config.ssh.username = "vagrant"
  config.ssh.password = "vagrant"
end

With this configuration I set the name of the box in Vagrant and VirtualBox. I also set up a host-only adapter using the modifyvm parameter with v.customize. I then disable the automatic synced folder and explicitly tell Vagrant it’s using a Windows guest OS. Finally, I tell Vagrant not to use a private/public key pair with SSH, and give it the username and password to use to connect.

Step 6: Add box to Vagrant and vagrant up

Finally, now that you have a working base box and Vagrantfile, it’s time to add your box and vagrant up! From the “vagrant_win7” folder simply run:

vagrant box add vagrant_win7 vagrant_win7.box

Once it’s finished adding your box you can run:

vagrant up

to turn it on and:

vagrant ssh

to create an SSH connection!


And that’s it! As you can see, the SSH method is a little complex and leaves you without some of Vagrant’s automatic provisioning features. However, if you simply need a Windows box working quickly and cannot get the WinRM route working, this is a functional alternative.

by justin at July 14, 2015 02:35 AM

July 03, 2015

Koji Miyauchi

Using Box APIs

This week, I have been working on researching Box APIs in order to use the service in our application.

Box is one of the popular cloud storage services, and we decided to use it as the data source for our data visualization application project. The idea is that the user can log in to their Box account and choose one of their files for data visualization.

Get started

These links will be good resources to get started.

CORS support

The tutorial from the above link is quite straightforward.

But when you try to use any kind of API from JavaScript, cross-origin errors are always a problem, because web browsers generally do not allow cross-domain access from JavaScript.

Since our application consists mostly of client-side script, I needed to consider the cross-origin issue; fortunately, Box has CORS (cross-origin resource sharing) support for their API.

In order to do cross-origin access with the Box APIs, you need to contact Box API support and let them know which URLs you want to use for your application. They will set up the Access-Control-Allow-Origin header for your application for you.

API request samples

After Box has contacted you and everything is set up, you can try this example.

Also, here are some examples that I tried for a few of the APIs.

User login popup: the first step of user login. It pops up a window and lets the user authorize our application.

var loginForm = window.open("https://app.box.com/api/oauth2/authorize?response_type=code&client_id=CLIENT_ID",
                            "LoginWindow", "width=400, height=600");

// Keep polling the popped-up window until authentication has been done.
var timer = window.setInterval(function(){
  var code = "";
  try {
    code = loginForm.document.URL; // throws while the popup is still on Box's domain
  } catch (e) {}

  // When the popup has redirected back to a page carrying the client code.
  if (code != "" && code.indexOf("code=") != -1) {
    var reg = /code=(.*).$/g;
    console.log(reg.exec(code)[1]); // The code you need for the OAuth request
    window.clearInterval(timer);
  }
}, 100);

OAuth: after you get the code from the previous example, you can make the OAuth token request.

var form = new FormData();
form.append('grant_type', "authorization_code");
form.append('code', code);
form.append('client_id', "CLIENT_ID");
form.append('client_secret', "CLIENT_SECRET");

$.ajax({
  url: "https://api.box.com/oauth2/token",
  type: 'POST',
  contentType: false,
  processData: false,
  data: form
}).complete(function (data){
  var json = JSON.parse(data.responseText);
  console.log(json.access_token); // This is the token that you can use for your API accesses.
});

API request
Now you can use that access token to make requests to many different API endpoints.

var headers = {
  Authorization: 'Bearer ' + token
};

$.ajax({
  url: "https://api.box.com/2.0/folders/0", // e.g. list the root folder
  type: 'GET',
  headers: headers,
  contentType: false,
  processData: false
}).complete(function (data){
  console.log(data.responseText);
});

Solution for our application

After I tried a couple of those examples, I found out that Box provides some widgets that you can simply embed and use.

This will create a “Select from Box” button, and it will pop up the login form when you click it.

Screen Shot 2015-07-03 at 5.43.16 PM

Screen Shot 2015-07-03 at 5.42.45 PM

Once you log in, you can explore your folders and choose file(s) to use in your application.
Screen Shot 2015-07-03 at 5.43.11 PM

In the end, this is all I need.

by koji miyauchi at July 03, 2015 09:51 PM

Glaser Lo

Koji build system overview

As Pidora's maintainer this past winter, I spent most of my time on Koji, a package building system created by Fedora engineers. Building packages at the scale of an entire repository could take a huge amount of time, and it is unrealistic to build them one by one on your own computer. That is why Koji was created: to be a multi-user, reusable, scalable package building system. There are a few components in the Koji system:

Koji build system graph

  • Koji-hub - The central service of koji. It receives commands from the client and assigns/passes tasks to other components, based on httpd and XML-RPC calls.

  • Builders/Hosts (kojid) - machines that are used only for building packages. Once the kojid daemon on a builder receives a task from koji-hub, the builder creates a mock chroot and builds the packages in it.

  • Storage - Simply network-attached storage mounted at /mnt/koji on all koji machines, storing all the build logs and package files.
    • Koji repositories - collections of packages that have already been built on koji. Since a koji repo is basically a yum repo, testing it is easy and convenient.
  • Kojira - In order to make a built package available for building other packages, the koji repository needs to be regenerated. Kojira is a daemon that automatically detects any new changes and creates a repo regeneration task.

  • PostgreSQL database - the koji user and package database

  • koji web - A web interface that lets users quickly check current tasks, build status, host status, reports, etc.

  • koji client - A workstation containing a koji user certificate. Once the client is authenticated, the user can manage packages on the koji server using koji CLI commands.

Koji build system overview was originally published by Glaser Lo at Illusion Village on July 03, 2015.

by Glaser Lo ( at July 03, 2015 09:07 PM

Hosung Hwang

CC Image License Search Engine API Implementation


CC - New Page


Previously, my colleague Anna made a page that searches for similar images, either by uploading an image or from a link. This UI page can live either inside or outside the server. It uses only the PHP API, without accessing the Database directly.


This is an open API that provides Add, Delete, and Match functions for images. It can be accessed by anyone who wants this functionality. The UI page, or a client implementation such as a browser extension, uses this API. The matching result is in JSON format.
This API page Adds/Deletes/Matches by asking the “C++ Daemon”, without changing the Database itself.
Only read-only access to the Database is permitted.

C++ Daemon

All adding/deleting operations are done in this daemon. By doing so, we remove the problem of synchronization between the database and the matching index, because this daemon keeps the content index in memory all the time for fast matching.
Because this daemon is active all the time, it works as a domain socket server to receive requests from the “PHP API” and return results. The PHP API sends its requests over the domain socket.


The Database contains all the metadata about CC license images, along with the thumbnail paths that are used to show previews in the matching results.

by Hosung at July 03, 2015 03:16 PM

July 02, 2015

Chris Tyler (ctyler)

The OSTEP Applied Research Team

I haven't introduced my research team for quite a while, and it has changed and grown considerably. Here is the current Open Source Technology for Emerging Platforms team working with me at Seneca's Centre for Development of Open Technology. From left to right:

  • (me!)
  • Michael (Hong) Huang (front)
  • Edwin Lum (rear)
  • Glaser Lo
  • Artem Luzyanin (front)
  • Justin Flowers (rear)
  • Reinildo Souza da Silva
  • Andrew Oatley-Willis

Edwin and Justin work with me on the DevOps project, which is applying the techniques we've learned and developed to the software development
processes of a local applied research partner.

Michael, Glaser, Artem, Reinildo, and Andrew work with me on the LEAP Project. Recently (since this photo was taken), Reinildo returned to Brazil, and has been replaced by Christopher Markieta (who has previously worked with this project).

I'm dying to tell you the details of the LEAP project, so stay tuned for an announcement in the next week!

by Chris Tyler ( at July 02, 2015 08:51 PM

Anna Fatsevych

Flickr API Woes

My genius Flickr downloader was chugging along and downloading images with all the required licensing and author information, and everything seemed fine until yesterday, when I ran into an interesting issue: the images kept duplicating themselves after the folder reached 4,497 files. I ran the program again and (after a few hours, mind you) the issue reappeared. After I had exhausted all the possibilities of errors on my end (code, maximum directory size, etc.), I began an investigation of the Flickr API that yielded no results. Today I ran the program a few more times, on various dates, and alas, it capped out at 4,500 images on the dot each time.

The only limit ever mentioned in the official Flickr API documentation is the 3,600-API-calls-per-hour throttling cap; nothing is documented about the maximum number of results returned by a search. I dug up a StackOverflow article that mirrors my issue, the only difference being that it states the cap to be 4,000 search results, whereas I found it to be 4,500.

I am now testing the new downloader with more frequent time increments, so that each search yields fewer results than the allowable maximum.
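A sketch of that windowing idea (a hypothetical helper of mine, not the actual downloader code): split one day into 24 hourly min/max ranges so each search stays well under the cap.

```javascript
// Build 24 hourly search windows for one day, in the MySQL datetime
// format that the Flickr API accepts for min/max upload dates.
function hourlyWindows(day) { // day like "2015-03-20"
  var windows = [];
  for (var h = 0; h < 24; h++) {
    var hh = (h < 10 ? "0" : "") + h;
    windows.push({
      min_upload_date: day + " " + hh + ":00:00",
      max_upload_date: day + " " + hh + ":59:59"
    });
  }
  return windows;
}
```

Each window would then be passed to flickr.photos.search as a separate request.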

by anna at July 02, 2015 01:02 AM

June 29, 2015

Hosung Hwang

Pastec Test for Performance

So far, I have tested Pastec in terms of the quality of image matching. In this posting, I tested the speed of adding and searching.

Adding images to index

First I added 100 images; adding them took 48.339 seconds. Then I added all the directories from 22 to 31. Those images were uploaded to wikimedia commons from 2013.12.22 to 2013.12.31.

Directory Start End Duration Count Average
22 17:32:42 18:43:50 01:11:08 8785 00:00.49
23 18:43:50 19:42:03 00:58:13 7314 00:00.48
24 19:42:03 20:28:56 00:46:53 6001 00:00.47
25 20:28:57 21:28:02 00:59:05 7783 00:00.46
26 21:28:02 22:41:12 01:13:10 9300 00:00.47
27 22:41:19 23:54:28 01:13:09 9699 00:00.45
28 00:54:28 01:53:23 00:58:55 7912 00:00.45
29 00:53:23 02:27:42 01:34:19 11839 00:00.48
30 02:27:42 03:31:48 01:04:06 8827 00:00.44
31 03:31:48 04:23:15 00:51:27 6880 00:00.45

The average time for adding an image was around 0.46 second, and it didn’t increase as the index grew. Most of the time for adding an image is spent extracting features.
I saved the index files for 100 images, for directories 22 to 26, and for 22 to 31. The sizes were 8.7 MB, 444.1 MB, and 935.8 MB respectively.


Searching images

I loaded the index file for the 100 images, and searched for all 100 images that had been used for adding.

Directory Start End Duration Count Average
22 00:01:14 100 00:00.74

Searching took 1m14.781s. Since there were 100 images, the average time to search for one image was 0.74 second.

Then I loaded the index file that contains index for 39,183 images in the directory from 22 to 26.

Directory Start End Duration Count Average
22 09:00:05 11:21:06 02:21:01 8785 00:00.96
23 11:21:06 13:13:52 01:52:46 7314 00:00.93
24 13:13:52 14:48:26 01:34:34 6001 00:00.95
25 14:48:26 16:48:44 02:00:18 7783 00:00.93
26 16:48:44 19:13:11 02:24:27 9300 00:00.93

This time, average time for searching one image was 0.95 second.

Then I loaded the index file that contains index for 84,340 images that are in the directory from 22 to 31.

Directory Start End Duration Count Average
22 19:32:54 22:44:09 03:11:15 8785 00:01.31
23 20:44:09 23:16:59 02:32:50 7314 00:01.25
24 01:16:59 03:24:52 02:07:53 6001 00:01.28
25 03:24:52 06:11:33 02:46:41 7783 00:01.28
26 06:11:33 09:30:53 03:19:20 9300 00:01.29

The search was performed on the same images, from 22 to 26. The average search time was 1.3 seconds.


  • Adding an image took about 0.47 second.
  • Adding time didn’t vary with index size.
  • Search time for an image varied with index size.
  • When the index contained 100, 39,183, and 84,340 images, search time was 0.74, 0.95, and 1.3 seconds, respectively.
    Screenshot from 2015-06-28 23:14:15
    In the chart, the y-axis is time in milliseconds. Around 0.6 second is likely spent reading an image and extracting features, and search time increases in proportion to the size of the index.

by Hosung at June 29, 2015 03:28 AM

June 26, 2015

Barbara deGraaf

The thrilling saga on shaders continues

In my last post I detailed some basics of creating a shader and in this post I will be focusing on how to create a depth of field shader.

There are a couple of files that need changing, including the shader file and the main js file. I am going to start off with the shader file and mention the js file later.

As I stated in the last post, the depth of field shader only changes the fragment shader, so the vertex shader will be the same as the one I posted there.

So this post will mainly focus on the fragment shader. I was going to walk through the code in the shader, but that made the post too long, so instead I will talk about the main concept of creating depth of field, which is as follows. First, create a texture containing the depth map. Then grab the value from the depth texture to figure out how far away from the camera the pixel is. Using the inputs from the camera, find out where the near and far limits of the in-focus region are. We can then compare the depth of the pixel to those near and far limits to find out how blurry it should be. Finally, we do something called image convolution: this process grabs the colours of the pixels around a given pixel and adds them together, so that the final pixel is a mix of all the colours around it.
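To make the compare-the-depth step concrete, here is a rough JavaScript sketch (my own illustration with made-up names, not code from the actual shader): pixels inside the in-focus region get no blur, and blur ramps up with distance outside it.

```javascript
// Given a pixel's depth and the near/far limits of the in-focus region,
// return 0 for a sharp pixel and up to 1 for a fully blurred one.
function blurAmount(depth, nearFocus, farFocus, falloff) {
  if (depth >= nearFocus && depth <= farFocus) return 0; // inside the depth of field
  var dist = depth < nearFocus ? nearFocus - depth : depth - farFocus;
  return Math.min(dist / falloff, 1); // ramp to full blur over `falloff` units
}
```

In the real fragment shader the same comparison happens per pixel in GLSL, and the resulting amount drives how widely the convolution samples neighbouring pixels.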

To work with your shaders, Three.js provides the EffectComposer and ShaderPass. In rough form, this is done as follows:

composer = new THREE.EffectComposer( renderer );
composer.addPass( new THREE.RenderPass( scene, camera ) );

var Effect1 = new THREE.ShaderPass( shadername, textureID );
Effect1.uniforms[ 'value1' ].value = 2.0 ;
Effect1.renderToScreen = true; // the last shader pass you make needs to set renderToScreen = true
composer.addPass( Effect1 );

Then to get this to work you need to call composer.render() in the render loop instead of the normal renderer.render().

I will end here for this post; if need be, I will wrap up some minor things about shaders in the next post. As well, once the scene is set up nicely and the GUI works with real-world cameras/lenses, I will put up a post with a survey to see which shader produces the best results and where it can be improved.



by barbaradegraafsoftware at June 26, 2015 02:04 PM

Hosung Hwang

scp/sftp through an ssh tunnel

SSH Tunneling

Machine CC can be connected to from another machine called zenit.
To scp to CC through zenit, the following command establishes an ssh tunnel to CC.

ssh -L 9999:[address of CC known to zenit]:22 [user at zenit]@[address of zenit]
in my case,
ssh -L 9999:

Now port 9999 of localhost (127.0.0.1) is a tunnel to CC through zenit.
This session needs to stay alive for all of the following steps.


SCP through the SSH Tunnel

Then these commands scp the local test.png file to CC:~/tmp, and copy CC:/tmp/test.png to the current directory.

scp -P 9999 test.png ccuser@127.0.0.1:~/tmp
scp -P 9999 ccuser@127.0.0.1:/tmp/test.png .


Making it easy

Typing those long commands is not a good idea, so I added an alias to .bashrc.

alias ccturnnel='ssh -L 9999:'

Then wrote two simple bash script.

This is cpfromcc.

remote=$(echo $1 | sed 's/\/home\/hosung/~/g')
scp -P 9999 ccuser@127.0.0.1:$remote $2

This is cptocc.

i=0
for var in "$@"; do
    i=$((i+1))
    if [ $i -ne $# ]; then
        values="$values $var"
    else
        remote=$(echo $var | sed 's/\/home\/hosung/~/g')
    fi
done
scp -P 9999 $values ccuser@127.0.0.1:$remote

The reason I use sed on the remote path is that bash expands ~ to my local home directory.
Now I can establish the ssh tunnel by typing ccturnnel.
Then I can do scp from my machine to CC using :

cptocc test.jpg test2.jpg ~

And I can do scp from CC to my machine using :

cpfromcc ~/remotefile.txt .


Making it convenient using sftp

When the tunnel is established, sftp works the same way.

$ sftp -P 9999 ccuser@127.0.0.1


Making it more convenient using Krusader

By typing sftp://ccuser@127.0.0.1:9999 in the URL bar of Krusader, and then adding the place to the bookmarks, the remote machine’s file system is easily accessed.

Screenshot from 2015-06-26 10:23:39

Mounting it using sshfs would also be possible.

by Hosung at June 26, 2015 03:54 AM

June 24, 2015

Anna Fatsevych

Flickr API – Date Time

The Flickr API has a funny way with dates; I am in the middle of discovering how it really works. Before, I was sending the date as a string "YYYY-MM-DD" and setting a difference of one day, i.e. "2015-03-20 2015-03-21", and I was getting only about 1,000 images per day (on average).

I dug deeper into the API and realized that it accepts both a UNIX timestamp and a MySQL datetime. In my PHP code I set the default timezone to Greenwich and then set the date as a MySQL datetime like this:

min_upload_date: "2015-03-20 00:00:00"
max_upload_date: "2015-03-20 23:59:59"

And now I get on average 200,000 results per day (Licenses 1 through 7).
This is great news. There are still some grey areas that I need to research further: how exactly Flickr compares dates, and with what precision, round-off, or truncation.
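Since the API also accepts UNIX timestamps, one way to sidestep any ambiguity in how Flickr parses datetime strings is to compute the same one-day window as epoch seconds in UTC. A quick sketch (mine, not from the post):

```javascript
// Return the min/max upload range for one UTC day as UNIX timestamps,
// covering 00:00:00 through 23:59:59 inclusive.
function dayRange(day) { // day like "2015-03-20"
  return {
    min_upload_date: Date.parse(day + "T00:00:00Z") / 1000,
    max_upload_date: Date.parse(day + "T23:59:59Z") / 1000
  };
}
```

Passing integers instead of strings leaves no room for timezone or precision differences on Flickr's side.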

More to come as I am still researching and running tests.



by anna at June 24, 2015 09:47 PM

Hosung Hwang

Pastec Test for real image data

In the previous test of Pastec, I used 900 jpeg images that were mainly computer-generated. This time, I tested images from the WikiMedia Commons Archive of CC License Images that were uploaded from 2013-12-25 to 2013-12-30. The zip files are 17 GB to 41 GB each, and each contains around 10,000 files, including jpg, gif, png, tiff, ogg, pdf, djvu, svg, and webm. Before testing, I deleted the xml, pdf, djvu and webm files, leaving 55,643 images.


Indexing 55,643 images took around 12 hours, and the index file was 622 MB. At first, I made separate index files for each day. However, Pastec can load only one index file, so I added all 6 days’ images and saved them to one index file.

While indexing, there were some errors.

  1. Pastec uses OpenCV, and OpenCV doesn’t support gif and svg; it could not open these two formats.
  2. Pastec only adds images that are bigger than 150×150 pixels.
  3. There were zero-byte images: 153 files out of the 55,643. However, on the wikimedia web pages the images are valid. Either way, they caused errors.
  4. One tiff image caused a crash inside Pastec. It needs debugging.


After loading the 622 MB index file, images can be searched. Searching for all 55,643 images took around 15 hours. In every search, Pastec extracts features before searching; therefore, searching takes more time.

Search result

Among the 55,643 images, 751 (1.43%) were smaller than 150×150 pixels, so they were not added. 51,479 images were of a proper size and format for OpenCV; these were indexed and can be searched.

  • 42,931 (83%) images matched only themselves (exactly the same image)
  • 8,459 (15%) images matched more than one image
  • 90 (0.17%) images did not match any image, even themselves.

Images didn’t match with any images

These 90 images were properly indexed, but didn’t match even themselves.

  • 55 images were png images with transparency; apart from this case, all were jpg images
  • 14 images were long panorama images like the following:


  • 6 images were simple images like the following:

__Amore_2013-12-30_14-18 __Bokeh_(9775121436) __Bokeh_(9775185973) __Hmm_2013-12-30_16-54 __Moon_early_morning_Shot

  • 8 vague images: images whose lines are not clear, and photographs that are out of focus

__20131229141153!Adrien_Ricorsse SONY DSC __Llyn_Alwen,_Conwy,_Cymru_Wales_21 __Minokaya_junior_high_school __Moseskogbunn __Nella_nebbia._Franco_Nero_e_Valeria_Vaiano_in_Mineurs_-_Minatori_e_minori SONY DSC SONY DSC

  • Other cases
    __Brännblåsa_på_fingret_2013-12-26_13-40 __Pottery_-_Sonkh_-_Showcase_6-15_-_Prehistory_and_Terracotta_Gallery_-_Government_Museum_-_Mathura_201d247a1ec8535aec4f9bf86066bd10dd
    These two images are a bit out of focus.

__Jaisen_Wiki_Jalayathra_2013_Alappuzha_Vembanad_Lake26 __Jaisen_Wiki_Jalayathra_2013_Alappuzha_Vembanad_Lake41 __Jaisen_Wiki_Jalayathra_2013_Alappuzha_Vembanad_Lake42

The original size of this image is 150×150 pixels. Maybe it is too small and simple.

Images matched with more than one image

8,459 images matched more than one image. To compare the results, I generated an html file that shows all match results, like the following:
Screenshot from 2015-06-24 16:29:49

I converted all images to 250×250 pixels using the convert -resize 250x250 filename command so they could be shown on one page. The html file was 6.8 MB and shows 64,630 images.

As I mentioned in my previous blog, Pastec is good at detecting rotated/cropped images.
Almost all matches were reasonable (similar). The following are notable matches:
20131225102452!Petro_canada Petro_canada_info

20131225193901!Bitlendik Bitlendik-avatar

In these two cases, the logo was matched.

20131225212947!June_Allyson_Dick_Powell_1962 June_Allyson_Dick_Powell_1962 Aveiro_342

This match looks like a false positive.

Buddhapanditsabhacandapuri Aveiro_144

This match is also a false positive.

NZ-SH45_map NZ-SH30_map NZ-SH38_map

In this case, the map is shifted.

PK-LJH_(Boeing_737-9GP)_from_Lion_Airlines_(9722507259) Blick über die Saalenbergkapelle in Sölden zum Kohlernkopf

This is an obvious false positive; maybe the sharp part of the airplane was matched with the edge of the roof.

From my observation, there were fewer than 50 obvious false positive matches that don’t share any object, which is 0.08%. Wrong matches usually occurred when the image contained graphs or documents. When the image was a normal photograph, the results were very reliable.

by Hosung at June 24, 2015 09:41 PM

Anna Fatsevych

Curl and wget

When downloading images using PHP (with curl or file_put_contents), I have run into issues with download sizes, possible interruptions, and memory usage, all of which can and have to be changed in your php.ini file.

Then I came across a comparative article about wget and curl (curl vs. wget) and decided to give wget a try, as it does not seem to have those limitations out of the box and has the ability to continue downloading even after an interruption, making the case for it as the preferred download method in our case.

Curl relies heavily on php.ini settings and is incorporated into my PHP program, whereas wget is executed as a command-line tool and downloads independently of the PHP settings, so it might be more suitable for making a portable downloader that requires minimal configuration changes.

I did not have to install a wget package on Linux Mint Cinnamon and can just invoke the executable from my PHP code like this:

exec("wget http://your/url");
exec("wget ".$urlToDownload);

or you can specify the download directory and filename with wget:

exec("wget https://your/url -O /your/dir/filename.jpg");


I have run more tests, because at times wget would report a 100% downloaded message while the resulting file was 0 bytes. This alarmed me because the error was not caught: it was caused by a redirect, which curl handles automatically. I am still looking into this issue, but in the meantime I ran some tests, and these are my results:

280 images – CURL: 422 seconds, No Errors
WGET: 703 seconds, No Errors

350 images – CURL: 475 seconds, No Errors
WGET: 821 seconds, No Errors

450 images – CURL: 541 seconds, No Errors
WGET: 1008 seconds, 3 Errors – Images size 0

In regards to file storage, I have found that an NTFS directory can hold a sufficient number of image files for our purposes, so a single directory would be enough to store the images, as opposed to storing them as BLOBs in the MySQL database.

More to come on this topic,


by anna at June 24, 2015 02:57 PM

June 23, 2015

Andrew Smith

Using ImageMagick without running out of RAM

For our research project we needed to use pHash to do some operations on a lot (tens of thousands) of image files. pHash uses ImageMagick internally, probably for simple operations such as resizing and changing the colour scheme.

I am pretty familiar with errors such as these coming from convert or mogrify:

convert.im6: no decode delegate for this image format `Ru-ей.ogg' @ error/constitute.c/ReadImage/544.
convert.im6: no images defined `pnm:-' @ error/convert.c/ConvertImageCommand/3044.
sh: 1: gm: not found

[CImg] *** CImgIOException *** [instance(0,0,0,0,(nil),non-shared)] CImg<unsigned char>::load(): Failed to recognize format of file 'Ru-ей.ogg'

What I wasn’t expecting was to get such errors in one of my own applications that uses a library (pHash) that uses another library (ImageMagick). What moron prints error messages to stdout from inside a library? Seriously!!??

But it gets worse. As soon as I put this code in a loop it quickly found a reason (the first was a .djvu file) to eat up all my RAM and then start on the swap. Crappy code, but it’s a complex codebase, I can forgive them. I figured I’d just set my ulimit to not allow any program to use over half a gig of RAM with “ulimit -Sv 500000” and ran my program again:

[CImg] *** CImgInstanceException *** [instance(0,0,0,0,(nil),non-shared)] CImg<float>::CImg(): Failed to allocate memory (245.7 Mio) for image (6856,9394,1,1).
terminate called after throwing an instance of 'cimg_library::CImgInstanceException'
  what():  [instance(0,0,0,0,(nil),non-shared)] CImg<float>::CImg(): Failed to allocate memory (245.7 Mio) for image (6856,9394,1,1).

Aborted? What sort of garbage were these people smoking? You don’t bloody abort from a library just because you ran out of memory, especially in a library that routinely runs out of memory! Bah. Anyway, I found a way to make sure it doesn’t abort: I set ulimit back to unlimited and instead created a global ImageMagick configuration file /usr/share/ImageMagick-6.7.7/policy.xml:

  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="map" value="512MiB"/>

Now no more aborts and no more running out of memory. Good. Until I got to about file number 31000 and my machine ground to a halt again, as if out of RAM and swapping. What this time? Out of disk space of course, why not!

I’ve already set ImageMagick in my program to use a specific temporary directory (export MAGICK_TMPDIR=/tmp/magick1 && mkdir -p $MAGICK_TMPDIR) so that my program, after indirectly using the ImageMagick library, can run system("rm -f /tmp/magick?/*"); because, you know, it’s too much to ask ImageMagick to clean up after itself. Barf… But it even got around that. For a single PDF file it used over 65GB of disk space in /tmp.
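For what it’s worth, the per-run temporary directory plus forced cleanup can be wrapped in a few lines of Python (a sketch assuming the convert binary is on the PATH; not the actual research harness):

```python
import os
import shutil
import subprocess
import tempfile

def convert_with_cleanup(src, dst):
    # Point ImageMagick's scratch files at a private directory via
    # MAGICK_TMPDIR, and always delete that directory afterwards,
    # whether the conversion succeeded or not.
    tmp = tempfile.mkdtemp(prefix="magick-")
    env = dict(os.environ, MAGICK_TMPDIR=tmp)
    try:
        rc = subprocess.call(["convert", src, dst], env=env)
        return rc == 0
    except OSError:
        # convert binary not installed
        return False
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```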

And if at least they said they’re using other people’s libraries, it’s not their fault, and so on and so forth, maybe I wouldn’t be so pissed, but instead they give me bullshit like “oh, what’s a lot of resources to you is nothing to someone else, we have 1TB of RAM, bla bla”.

Piss off, I’m going to find another solution that doesn’t involve using this garbage.

by Andrew Smith at June 23, 2015 02:47 AM

June 19, 2015

Barbara deGraaf

An Introduction to shaders

For our project we are using shaders to replicate a camera’s depth of field. The shaders available online certainly work, but I was not happy with the lack of explanation of the procedure within those shaders, so I have decided to write my own to replicate depth of field.

Within this post I am just going to explain some introductory concepts about using shaders in Three.js and lead up to the final shader results in later posts.

Before going into details about the shaders, I am going to talk a bit about the rendering pipeline and then jump back. The rendering pipeline is the series of steps that OpenGL (the API that renders 2D and 3D vector graphics) takes when rendering objects to the screen.


This image was taken from the OpenGL rendering pipeline page here.

Glossing over some details, there are basically two stages. First the pipeline deals with the vertex data: the vertex shader is responsible for turning those 3D vertices into 2D coordinate positions for your screen (it determines where objects end up on the screen). After a few intermediate steps, rasterization occurs, which produces fragments (potential pixels) from the assembled primitives. Then the fragment shader runs; it is responsible for the colour each fragment/pixel on screen gets.

This whole pipeline runs on the GPU and the only two parts of this pipeline that are programmable by a user are the vertex shader and the fragment shader. Using these two shaders we can greatly alter the output on the screen.

For Three.js/WebGL the shaders are written in GLSL (with three.js simplifying things for us a little bit) which is similar to C. This shader file can be separated into three main parts: uniforms, vertex shader, and the fragment shader.

The first part, the uniforms, contains all the values passed in from the main JS file. I’ll talk about passing in values in a later post. A basic example is:

uniforms: {
"tDiffuse": { type: "t", value: null },
"value1": { type: "f", value: 1.2 }
},
tDiffuse is the texture passed in from the previous shader, and this name is always the same for three.js. Many types can occur in the uniforms, but some of the basic ones are: i = integer, f = float, c = colour, t = texture, v2 = vector2 (v3 and v4 also exist), m4 = matrix4, etc.

The next part is the vertex shader. Because of what I want to do (change the colour of pixels to create a blurring effect) I don’t need to change anything here, but it is still required in the shader file: if you write one shader you must write the other as well.

vertexShader: [

  "varying vec2 vUv;",
  "void main() {",
    "vUv = uv;",
    "gl_Position = projectionMatrix * modelViewMatrix * vec4( position, 1.0 );",
  "}"

].join("\n"),

Varying means that the value changes for each pixel being processed. Here we have vUv, a vector that holds the UV (screen coordinates) of the pixel, which is automatically passed in by three.js. The next line takes the 3D coordinates and projects them onto the 2D coordinates of your screen. I am going to skip the explanation of why this works as it is not important here; look it up or ask me if you really want to know.

Now for the important one, the fragment shader:

fragmentShader: [

"uniform sampler2D tDiffuse;",
"varying vec2 vUv;",

"void main() {",
  "vec4 color = texture2D(tDiffuse, vUv);",
  "gl_FragColor = color;",
"}"

].join("\n")

Here vUv is the same as in the vertex shader, and tDiffuse is the texture that was passed in (declared as a sampler2D). In the main function we grab the RGBA value from the passed-in texture at coordinate vUv and assign it to the output pixel.

This is the shader I will be using to create a depth of field and for the rest of the posts I will be looking at this shader only.

That’s it for the introduction, next post I will start to get into the fragment shader and image convolution.



by barbaradegraafsoftware at June 19, 2015 07:48 PM

Dmitry Yastremskiy

Hello Data!

I’m working on a 3D data visualization project. This is an emerging field that is getting popular these days, especially where lots of data is generated and needs to be interpreted so that humans can read it and learn something from it. The goals of this project are to be able to take pretty much any data and visualize it, taking advantage of the third dimension Z where two dimensions X and Y are just not enough. To make the app extensible and give it a long, happy life, we are structuring it so that people will be able to add their own templates and sources of data; it is not wired to particular data sources or visualizations. On the technical side, the tools we are using are Three.js for WebGL, Backbone.js for the MVC pattern, Require.js for dynamic script loading, and pure vanilla JavaScript for the rest. You can see our first steps here: We will be happy for any feedback or advice. Feel free.

by hamabama at June 19, 2015 07:03 PM

June 18, 2015

Hosung Hwang

Pastec analysis

Pastec works in the following order :

  1. Load visual words : the visualWordsORB.dat file contains them; its size is 32,000,000 bytes. Loading the file takes around 1 second.
  2. Build the word index : using the visual words, Pastec builds the word index; this takes around 13 seconds.
  3. Now a previously saved index file can be loaded, or an image can be added to the index.
  4. Using an image file, similar images that contain similar word indexes can be searched for.
  5. The index in memory can be written to a file.

Adding a new image to the index works in the following order :

  1. Using OpenCV, ORB features are extracted.
  2. Matching visual words are searched.
  3. Matching visual words are indexed in memory.

When I added 900 images, the size of index file was 16,967,440 bytes.

By changing the source code, I saved the matching visual word list to a text file for each image. Each word match is stored using this struct :

struct HitForward
{
    u_int32_t i_wordId;
    u_int32_t i_imageId;
    u_int16_t i_angle;
    u_int16_t x;
    u_int16_t y;
};

Each word match has a word id, image id, angle, and x/y coordinates. The saved file looks like this (in the order ImageID,Angle,x,y,WordId) :


It contains 1593 lines, which means it has 1593 matching words. Image id 469 was Jánské.jpg, and the image looks like this :
The size of this image is 12.8 MB. Like other HDR images, it contains lots of features; it also has the largest number of matching words among the 900 images. Written out as a text file, the size was 39,173 bytes, which would be the worst case. When an image is simple, only a few words are matched. The full size of the matching word text files for all 900 images was 20.9 MB.

To reduce this, I made a simple binary format. Since the image id is the same for every word of an image, I write it once, followed by a 4-byte count. Then every word is written as a 4-byte word id, 2-byte angle, 2-byte x, and 2-byte y.

4 bytes - id
4 bytes - count
4,2,2,2 (10 bytes) *  count

In the case of the id 469 image, the size is 15,938 bytes. And the file looks like this :

00000000: d501 0000 3906 0000 e282 0100 dcd9 a101  ....9...........
00000010: 6f00 a2fc 0300 10b4 a801 c501 889c 0000  o...............
00000020: 9610 6203 0901 f2b1 0900 00ad 5703 2701  ..b.........W.'.
00000030: 9b70 0000 0ee7 df02 0c01 4d20 0200 ee30  .p........M ...0
00000040: 1102 7000 9ba0 0200 e130 f401 2700 3b68  ..p......0..'.;h
00000050: 0400 a2bd 6702 3b00 b094 0800 c64c 5f02  ....g.;......L_.

0x1d5 is 469 and 0x639 is 1593.
In this case, the size was 15,938 bytes (about 15 KB), around 40% of the text format (39 KB).
Since this image is the worst case, storing the binary index in a database field for every image record is realistic.
The full size for all 900 images was 8.5 MB (the text files totalled 20.9 MB).
Interestingly, this is even smaller than the index file for the 900 images (16.2 MB).
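This layout maps directly onto Python’s struct module. A minimal packing sketch (little-endian, unsigned fields assumed):

```python
import struct

HEADER = struct.Struct("<II")   # 4-byte image id, 4-byte word count
WORD = struct.Struct("<IHHH")   # 4-byte word id; 2-byte angle, x, y

def pack_words(image_id, words):
    # words: list of (word_id, angle, x, y) tuples for one image
    data = HEADER.pack(image_id, len(words))
    for w in words:
        data += WORD.pack(*w)
    return data
```

For the worst-case image (id 469 with 1593 words) this gives 8 + 1593 × 10 = 15,938 bytes.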


I was considering saving the index file. However, saving the word list for each image is the better solution: in binary format it consumes less storage, adding it back to the index is very fast, and when it is stored as a database field, synchronization between the index and the database is not a problem.

by Hosung at June 18, 2015 09:58 PM

June 17, 2015

Hosung Hwang

How to import CMake project in Eclipse CDT4

Currently I am analysing Pastec, which uses CMake as its build system. To dig into the code, I wanted to analyse it using the functionality of Eclipse.

Pastec can be built with the following commands.

$ git clone
$ mkdir build
$ cd build
$ cmake ../
$ make

To build Pastec in Eclipse CDT, instead of running “cmake ..”, the following needs to be done (for a Debug build):

$ cd build
$ cmake -G"Eclipse CDT4 - Unix Makefiles" -D CMAKE_BUILD_TYPE=Debug ..

Then, it can be imported into Eclipse:

  1. Import the project using the menu File->Import.
  2. Select General->Existing Projects into Workspace.
  3. Browse to where your build tree is and select the root build tree directory (pastec/build). Keep “Copy projects into workspace” unchecked.
  4. You get a fully functional Eclipse project.


by Hosung at June 17, 2015 03:53 PM

June 16, 2015

Anna Fatsevych

Wiki Commons API

I have been working on downloading metadata for the images found in the Wiki image dumps. I am using the Commons Tools API to gather licensing data and author information.

The fact that anybody can edit information on the Wiki is great for many reasons, but it can produce unexpected, and sometimes totally unreadable, results when parsing the XML returned from the call.

Here is a code snippet; while the image name is unique and stays unchanged, the author name, license, description, and even the template itself can be changed and edited by the user.

 [file] => SimpleXMLElement Object
            [name] => QuezonNVjf181.JPG
            [title] => File:QuezonNVjf181.JPG
            [urls] => SimpleXMLElement Object
                    [file] =>
                    [description] =>

            [size] => 6480788
            [width] => 4608
            [height] => 3456
            [uploader] => Ramon FVelasquez
            [upload_date] => 2013-12-29T09:28:24Z
            [sha1] => 8646ca2be96f423faa2c33da1f2bbddbeee454c8
            [date] => 
            [author] => a href="" title="User:Ramon FVelasquez">Ramon FVelasquez SimpleXMLElement Object

As you can see, the author tag here contains an HTML tag, but it can sometimes be just plain text; I am parsing the “title” attribute and storing its contents, which prove to be erroneous at times. As far as licensing is concerned, things are usually much clearer, as the pre-set Creative Commons licenses are mostly used and thus provide easier-to-parse fields:
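One tolerant approach is to prefer the link’s “title” attribute and fall back to stripping markup: a rough Python sketch (a hypothetical helper, not the exact parsing code used here):

```python
import re

def author_name(raw):
    # Prefer the <a> tag's title attribute, dropping the "User:"
    # namespace prefix; otherwise strip any markup and return the text.
    m = re.search(r'title="(?:User:)?([^"]*)"', raw)
    if m and m.group(1):
        return m.group(1)
    return re.sub(r"<[^>]+>", "", raw).strip()
```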

    [licenses] => SimpleXMLElement Object
            [@attributes] => Array
                    [selfmade] => 1

            [license] => SimpleXMLElement Object
                    [name] => CC-BY-SA-3.0
                    [full_name] => Creative Commons Attribution Share-Alike V3.0
                    [attach_full_license_text] => 0
                    [attribute_author] => 1
                    [keep_under_same_license] => 0
                    [keep_under_similar_license] => 1
                    [license_logo_url] =>
                    [license_info_url] =>
                    [license_text_url] =>


I am using this Commons tool to get the information for already downloaded images. I had also been checking first whether I have the complete information in the XML file dumps, but I have now decided to bypass that check and just use the API, as I think it will provide the newly updated information, with less possibility of an outdated or corrupt XML file.



by anna at June 16, 2015 08:58 PM

June 15, 2015

Hosung Hwang

Pastec test method and result


Pastec was mentioned in my previous post about Content-Based Image Retrieval (CBIR). It extracts features using ORB and visual words.

Pastec offers a visual word data file, visualWordsORB.dat, which is 10.5 MB. The Pastec program loads the visual word data initially and then loads an index data file; after that, it can be searched. Today I am going to write about the test results for the same 900 images I used before. Performance and source code analysis will come later.

Test Method

The full API is described on this page.
Pastec runs as an HTTP server with a RESTful API. It can be started with the following command :

./pastec visualWordsORB.dat

I added all the JPEG images in the directory to the index with this script :

i=0
for F in /home/hosung/cdot/ccl/hashtest/all-images/*.jpg; do
    curl -X PUT --data-binary @"${F}" http://localhost:4212/index/images/$i;
    i=$((i+1))
done

Then each image is searched with this script :

i=0
for F in /home/hosung/cdot/ccl/hashtest/all-images/*.jpg; do
    echo $i,"${F}"
    curl -X POST --data-binary @"${F}" http://localhost:4212/index/searcher;
    i=$((i+1))
done

These generate output like the following :

2,/home/hosung/cdot/ccl/hashtest/all-images/05 0751 DOE NamUs UP 345 Reconstruction 001a.jpg
3,/home/hosung/cdot/ccl/hashtest/all-images/0514-80 Reconstruction 002b.jpg
70,/home/hosung/cdot/ccl/hashtest/all-images/A 3D Object design using FreeCad Software.jpg

Since the response is JSON data, I had to parse it again. So I wrote a simple Python script, because JSON parsing is easy in Python.

import json

id = 0
file = "nofile"
error = 0
notfound = 0
found = -1
moreThanOne = 0
onlyOne = 0

with open("search2.txt", "r") as f:
    for line in f:
        if line[0] != '{':
            line1 = line.split(',')
            id = int(line1[0])
            file = line1[1]
        else:
            j = json.loads(line)
            if j["type"] == "SEARCH_RESULTS":
                ids = j["image_ids"]
                if len(ids) == 0:
                    notfound += 1
                if len(ids) == 1:
                    found = ids.index(id)
                    onlyOne += 1
                if len(ids) > 1:
                    moreThanOne += 1
                    print str(id) + " : ",
                    print ids,
                    print file
            else:
                print str(id) + " : " + j["type"],
                print " : " + file
                error += 1

print "Error : " + str(error)
print "NotFound : " + str(notfound)           
print "Match Only One : " + str(onlyOne)
print "Match More Than One : " + str(moreThanOne)

I printed only the results that include more than one match. The following is the output of the previous Python script:

22 : [22, 835] /home/hosung/cdot/ccl/hashtest/all-images/1992-06560 Reconstruction 002.jpg
23 : [23, 835] /home/hosung/cdot/ccl/hashtest/all-images/1992-06614 Reconstruction 002.jpg
28 : [28, 29, 30] /home/hosung/cdot/ccl/hashtest/all-images/20131017 111028 green spiral ornament with Purple background.jpg
29 : [29, 30, 28] /home/hosung/cdot/ccl/hashtest/all-images/20131017 111122 Fairest wheel ornament with wall as background.jpg
30 : [30, 29] /home/hosung/cdot/ccl/hashtest/all-images/20131017 111143 - White Feerest wheel ornament with plywood background.jpg
70 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/A 3D Object design using FreeCad Software.jpg
77 : [77, 78] /home/hosung/cdot/ccl/hashtest/all-images/Alaska Hitchhiker Skull (Moustache Hair Eyepatch).jpg
78 : [78, 77] /home/hosung/cdot/ccl/hashtest/all-images/Alaska Hitchhiker Skull (Moustache Hair).jpg
90 : [90, 91] /home/hosung/cdot/ccl/hashtest/all-images/Anisotropic filtering en.jpg
91 : [91, 90] /home/hosung/cdot/ccl/hashtest/all-images/Anisotropic filtering pl.jpg
175 : [175, 180] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light10.jpg
176 : [176, 177] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light2.jpg
177 : [177, 176] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light3.jpg
178 : [178, 181] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light4.jpg
180 : [180, 175] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light6.jpg
193 : [193, 195] /home/hosung/cdot/ccl/hashtest/all-images/Circle reflect wikipedia 2.jpg
195 : [195, 193] /home/hosung/cdot/ccl/hashtest/all-images/Circle reflect wikipedia sky.jpg
204 : [204, 205] /home/hosung/cdot/ccl/hashtest/all-images/Computer generated image of the M챈rsk Triple E Class (1).jpg
205 : [205, 204] /home/hosung/cdot/ccl/hashtest/all-images/Computer generated image of the M챈rsk Triple E Class (cropped).jpg
207 : [207, 367, 772] /home/hosung/cdot/ccl/hashtest/all-images/Copper question mark 3d.jpg
211 : [211, 210] /home/hosung/cdot/ccl/hashtest/all-images/Cro-Magnon man - steps of forensic facial reconstruction.jpg
216 : [216, 217] /home/hosung/cdot/ccl/hashtest/all-images/CTSkullImage - cropped.jpg
217 : [217, 216] /home/hosung/cdot/ccl/hashtest/all-images/CTSkullImage.jpg
220 : [220, 222] /home/hosung/cdot/ccl/hashtest/all-images/Cubic Structure.jpg
222 : [222, 220] /home/hosung/cdot/ccl/hashtest/all-images/Cubic Structure with Shallow Depth of Field.jpg
237 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Dimens찾o Fractal.jpg
251 : [251, 252] /home/hosung/cdot/ccl/hashtest/all-images/Earthrelief.jpg
252 : [252, 251] /home/hosung/cdot/ccl/hashtest/all-images/Earthrelief mono.jpg
266 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/ENIGMA Logo.jpg
281 : [281, 282] /home/hosung/cdot/ccl/hashtest/all-images/Flower And Vase (Graphic).jpg
282 : [282, 281] /home/hosung/cdot/ccl/hashtest/all-images/Flower And Vase Ver.02.jpg
337 : [337, 338] /home/hosung/cdot/ccl/hashtest/all-images/Frankfurt Skyline I - HDR (14196217399).jpg
338 : [338, 337] /home/hosung/cdot/ccl/hashtest/all-images/Frankfurt Skyline II - HDR (14391360542).jpg
350 : [350, 352, 351] /home/hosung/cdot/ccl/hashtest/all-images/Glass ochem dof2.jpg
351 : [351, 350, 352] /home/hosung/cdot/ccl/hashtest/all-images/Glass ochem dof.jpg
352 : [352, 350, 351] /home/hosung/cdot/ccl/hashtest/all-images/Glass ochem.jpg
356 : [356, 357] /home/hosung/cdot/ccl/hashtest/all-images/GML-Cave-Designer (1).jpg
357 : [357, 356] /home/hosung/cdot/ccl/hashtest/all-images/GML-Cave-Designer.jpg
358 : [358, 359] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Cathedral (1).jpg
359 : [359, 358] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Cathedral.jpg
360 : [360, 361] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Window-Thickness (1).jpg
361 : [361, 360] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Window-Thickness.jpg
362 : [362, 363] /home/hosung/cdot/ccl/hashtest/all-images/GML-Stuhl-Template (1).jpg
363 : [363, 362] /home/hosung/cdot/ccl/hashtest/all-images/GML-Stuhl-Template.jpg
364 : [364, 365] /home/hosung/cdot/ccl/hashtest/all-images/GML-Voronoi-Diagram (1).jpg
365 : [365, 364] /home/hosung/cdot/ccl/hashtest/all-images/GML-Voronoi-Diagram.jpg
367 : [367, 207, 772] /home/hosung/cdot/ccl/hashtest/all-images/Gold question mark 3d.jpg
377 : [377, 378] /home/hosung/cdot/ccl/hashtest/all-images/Griffith Park Jane Doe Reconstruction 9b.jpg
378 : [378, 377] /home/hosung/cdot/ccl/hashtest/all-images/Griffith Park Jane Doe Reconstruction 9d.jpg
423 : [423, 424] /home/hosung/cdot/ccl/hashtest/all-images/Hall effect A.jpg
424 : [424, 423] /home/hosung/cdot/ccl/hashtest/all-images/Hall effect.jpg
435 : [435, 815, 814] /home/hosung/cdot/ccl/hashtest/all-images/HDR The sound of silence (The road to Kamakhya).jpg
436 : [436, 837] /home/hosung/cdot/ccl/hashtest/all-images/HEAD inline.jpg
448 : [448, 449] /home/hosung/cdot/ccl/hashtest/all-images/Homo erectus pekinensis
449 : [449, 448] /home/hosung/cdot/ccl/hashtest/all-images/Homo erectus pekinensis.jpg
453 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/HrdiBloomExample.jpg
457 : [457, 458] /home/hosung/cdot/ccl/hashtest/all-images/Ilame In Tengwar Ver.01-2.jpg
458 : [458, 457] /home/hosung/cdot/ccl/hashtest/all-images/Ilam챕 (Name) In Tengwar.jpg
487 : [487, 488] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C1.jpg
488 : [488, 487] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C2.jpg
489 : [489, 490] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C3.jpg
490 : [490, 489] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C4.jpg
494 : [494, 495] /home/hosung/cdot/ccl/hashtest/all-images/KrakowHDR pics.jpg
495 : [495, 494] /home/hosung/cdot/ccl/hashtest/all-images/KrakowHDR slides.jpg
512 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/LOD Example.jpg
521 : [521, 524, 523] /home/hosung/cdot/ccl/hashtest/all-images/Lync02.jpg
523 : [523, 524, 521] /home/hosung/cdot/ccl/hashtest/all-images/Lync04.jpg
524 : [524, 523, 521] /home/hosung/cdot/ccl/hashtest/all-images/Lync05.jpg
586 : [586, 593] /home/hosung/cdot/ccl/hashtest/all-images/Mount Vernon
610 : [610, 611] /home/hosung/cdot/ccl/hashtest/all-images/Obsidian Soul 1.jpg
611 : [611, 610] /home/hosung/cdot/ccl/hashtest/all-images/Obsidian Soul 2.jpg
617 : [617, 618] /home/hosung/cdot/ccl/hashtest/all-images/Oren-nayar-vase1.jpg
618 : [618, 617] /home/hosung/cdot/ccl/hashtest/all-images/Oren-nayar-vase2.jpg
667 : [667, 668] /home/hosung/cdot/ccl/hashtest/all-images/Radiosity Comparison.jpg
668 : [668, 667] /home/hosung/cdot/ccl/hashtest/all-images/Radiosity scene.jpg
676 : [676, 677, 678] /home/hosung/cdot/ccl/hashtest/all-images/Rauzy2.jpg
677 : [677, 678, 676] /home/hosung/cdot/ccl/hashtest/all-images/Rauzy3.jpg
678 : [678, 677, 676] /home/hosung/cdot/ccl/hashtest/all-images/Rauzy4.jpg
721 : [721, 724, 722, 723] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.12 PM Meshlab.jpg
722 : [722, 721, 723, 724] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.26 PM meshlab.jpg
723 : [723, 722, 721, 724] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.37 PM meshlab.jpg
724 : [724, 721, 722, 723] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.49 PM meshlab.jpg
725 : [725, 726, 731, 730, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.09.42 PM blender.jpg
726 : [726, 725, 731, 730, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.11.32 PM blender.jpg
727 : [727, 725, 726, 731, 730, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.11.42 PM blender.jpg
728 : [728, 729, 727, 726, 725, 731, 730] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.13.32 PM blender.jpg
729 : [729, 726, 731, 727, 725, 730, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.14.07 PM blender.jpg
730 : [730, 731, 726, 725, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.14.11 PM blender.jpg
731 : [731, 730, 726, 725, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.14.15 PM blender.jpg
734 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Scupltris logo.jpg
763 : [763, 764] /home/hosung/cdot/ccl/hashtest/all-images/Snapshot12.jpg
764 : [764, 763] /home/hosung/cdot/ccl/hashtest/all-images/Snapshot13.jpg
772 : [772, 207, 367] /home/hosung/cdot/ccl/hashtest/all-images/Spanish Question mark 3d.jpg
790 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Sterling2 icon SterlingW2589.jpg
799 : [799, 800] /home/hosung/cdot/ccl/hashtest/all-images/Synagoge Weikersheim innen 01.jpg
800 : [800, 799] /home/hosung/cdot/ccl/hashtest/all-images/Synagoge Weikersheim innen 02.jpg
814 : [814, 435, 815] /home/hosung/cdot/ccl/hashtest/all-images/The Sound of Silence -2EV.jpg
815 : [815, 435] /home/hosung/cdot/ccl/hashtest/all-images/The Sound of Silence Resulting HDR.jpg
835 : [835, 22, 23] /home/hosung/cdot/ccl/hashtest/all-images/UP 3773 and UP 3774 (1400UMCA and 1397UMCA) Reconstruction 001.jpg
837 : [837, 436] /home/hosung/cdot/ccl/hashtest/all-images/UPPER inline.jpg
844 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Valentine Doe 1993 Scaled.jpg
852 : [852, 854] /home/hosung/cdot/ccl/hashtest/all-images/ViewFrustum.jpg
854 : [854, 852] /home/hosung/cdot/ccl/hashtest/all-images/ViewWindow2.jpg
876 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Woman in bra staring.jpg
882 : [882, 883] /home/hosung/cdot/ccl/hashtest/all-images/WP VS 1 rel(dachris).jpg
883 : [883, 882] /home/hosung/cdot/ccl/hashtest/all-images/WP VS 2 rel(dachris).jpg
898 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Zoomin.jpg

Error : 10
NotFound : 10
Match Only One : 783
Match More Than One : 97

Test Result

The result says that 10 images were not added. The reason was ‘IMAGE_SIZE_TOO_SMALL’: according to the source code, when an image’s width or height is smaller than 150 px, it is not added to the index. Since those 10 images were not in the index, there were 10 images that were not found when searching.
783 images matched only the identical image.
And 97 images matched more than one image.
Therefore, there were no false negative results.
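As a quick sanity check, the four buckets in the counts above account for all 900 images:

```python
only_one = 783      # matched only the identical image
more_than_one = 97  # matched additional images as well
not_found = 10      # searches that returned empty results
too_small = 10      # rejected with IMAGE_SIZE_TOO_SMALL
assert only_one + more_than_one + not_found + too_small == 900
```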

The following are some meaningful matches.

Cropped image

This : 1992-06560 Reconstruction 002, and this : 1992-06614 Reconstruction 002, match this : UP 3773 and UP 3774 (1400UMCA and 1397UMCA) Reconstruction 001

This means the algorithm detects when an image is part of another image. The following are similar results :

Computer generated image of the Mærsk Triple E Class (1) Computer generated image of the Mærsk Triple E Class (cropped)

Cro-Magnon man rendered Cro-Magnon man - steps of forensic facial reconstruction

Frankfurt Skyline I - HDR (14196217399) Frankfurt Skyline II - HDR (14391360542)

Hall effect Hall effect A

Homo erectus pekinensis Homo erectus pekinensis, forensic facial reconstruction

Oren-nayar-vase1 Oren-nayar-vase2

Moving and similar images

20131017 111028 green spiral ornament with Purple background 20131017 111122 Fairest wheel ornament with wall as background 20131017 111143 - White Feerest wheel ornament with plywood background

Mount Vernon, NYJane Doe facial reconstruction NamUs 3123 Reconstruction 001

This result is a bit strange. The faces of the two people resemble each other; however, this seems to be a false positive result.

Synagoge Weikersheim innen 01 Synagoge Weikersheim innen 02

Changing colours and rotation

Copper question mark 3d Gold question mark 3d Spanish Question mark 3d

The other cases

KrakowHDR pics KrakowHDR slides

The three images’ positions were changed.

Obsidian Soul 1 Obsidian Soul 2

Rauzy2 Rauzy3 Rauzy4

This result is a bit strange; another false positive.

Snapshot12 Snapshot13

This result is interesting because the rotated object was detected. Meanwhile, the similar images (Snapshot00, 01, 02 ~ 14.jpg), which produced a lot of false positives in pHash, did not match each other here.


  • Pastec ignores images whose width or height is smaller than 150 px. This should be considered.
  • Rotated and cropped images can be detected.
  • Compared to the DCT/MH hashes in pHash, there were far fewer false positives.
  • All in all, the results for the 900 images were more reliable than pHash’s.
  • Hashing/indexing and searching seem to be quite fast; however, a performance test should be performed.
  • The hash size and the indexing/searching mechanism should be analysed so they can be customized for our server system.


by Hosung at June 15, 2015 09:53 PM

Ali Al Dallal

Simple React Webpack and Babel Starter Kit

At the Mozilla Foundation, we're starting to use React, mainly to create our web applications, and most of the time writing React without Webpack and Babel can be a bit annoying, or even really hard.

When you look for an example of a React app built with Webpack and Babel, you often get tons of stuff that you don't want or don't care about, and while removing it yourself you'll either create bugs or spend more time fixing things you broke than actually coding. So I created this simple repo with just the bare minimum you need to get started.

React Webpack and Babel
Simple React Webpack Babel Starter Kit

This is a simple React, Webpack and Babel application with nothing else in it.

What's in it?

Just a simple index.jsx, webpack.config.js and index.html file.

To run

You can simply run webpack build using this command:

> $ npm run build

If you want to run with webpack-dev-server simply run this command:

> $ npm run dev

Please contribute to the project if you think this can be done better in any way, even the README :)

by Ali Al Dallal at June 15, 2015 02:37 PM

June 12, 2015

Anna Fatsevych

Flickr API in PHP

In one of my previous posts, I wrote a Python program to download images using Flickr API.

Now I have written it in PHP using the phpFlickr API, which is quite easy to use and understand. For our purposes, the program downloads all the images uploaded on a specific date; it makes one API call per image, hashes the images, and stores them in a MySQL database.

Here is a code snipped to see how easy it is to make an API call and set the required parameters:

$f = new phpFlickr("YOUR API KEY HERE");
$photos = $f->photos_search(array("tags"=>"car","per_page"=>"500",
          "license"=>"3", "extras"=>"url_o,owner_name, license"));

More details on Flickr API queries and limitations are in my previous post here. The PHP program is available on GitHub.



by anna at June 12, 2015 07:24 PM

June 11, 2015

Hong Zhan Huang

OSTEP – The City of ARMs – Tools of the trade 2: tmux

The tool of the trade featured in this post is the terminal multiplexer known as tmux. A terminal multiplexer is a tool that allows a user to create, access and manage a number of terminals, all within the confines of one screen. tmux can also detach from the screen and continue running in the background, to be reattached later when one wishes to continue from where the session was left off. The tmux manual offers encompassing literature on the workings of the program for those interested.

In this post I’ll be expounding upon my experience in setting up and using tmux.

The work that I’m doing at CDOT for the OSTEP team involves ssh’ing into a variety of machines (mainly housed in our EHL server cabinet) on a daily basis. After a certain point it becomes difficult to manage each connection with just a regular terminal. There’s also the inability to continue from the point where you had left off, the next time you want to return to work. After seeing my coworkers making use of tmux in their work processes, I endeavored to attempt to do the same.

tmux vs screen

Before we get into the basics of tmux, we should perhaps compare it with another terminal multiplexer: GNU Screen. I'm no expert on Screen, but the gist of the comparison seems to be that tmux is a more modern, improved version of Screen that is still actively supported. The reasons why can be read in this FAQ. For myself, as a new tmux user who has only dabbled a little with Screen, tmux does seem to be the better tool so far.


After installing tmux onto your system, to use it you’ll need to start a new session of tmux. This can be done through this command:

tmux new -s Demo

This will create a new session named Demo that has a single window and display it on the screen. You’ll also notice that in this window there is a status line at the bottom of the screen that will show information about the current session as well as being the location to input tmux commands.

A basic tmux session with one window

From here we can begin using tmux’s features and functionality to modify our terminal work space to suit our liking.

tmux prefix

The prefix or escape function is the key combination that allows the user to exit normal input and enter tmux commands or shortcuts. The prefix in tmux is ctrl-b or in other words ctrl plus b together. Following this input you may press any key that has a bound functionality to it (ctrl-b c will create a new window for example) or press the colon key to enter the tmux command prompt where you can type out the command you wish to execute manually. You can find a list of all the currently assigned bindings with ctrl-b then question mark (ctrl-b ?). Now with the knowledge of the prefix let’s go and play around.

We'll start by creating three more windows in our session:

ctrl-b c x3 or new-window in the tmux command prompt

In our first window we’ll split the window into three panes by first splitting the window in half vertically:

ctrl-b % or split-window -h (ctrl-b % splits the window left/right, which tmux calls a horizontal split; use ctrl-b " or split-window -v for a top/bottom split)

Lastly we'll rename the current window to "A Distant Place" (tmux has a search function for window names, so you can easily find a window among many if you have named it):

ctrl-b , or command-prompt -I #W "rename-window '%%'"

Now our session looks like this:

We have four windows as shown in the status line, and our first window, now named A Distant Place, has a two-pane split. These are just some of the basic options for creating a work-space to your liking.


One of the pros of using terminal multiplexers like tmux is the ability to start a task, walk away and come back to it later. The process to do this is to detach the session:

ctrl-b d or detach-client

and then when you wish to return to your session:

tmux attach -t Demo

Sessions end when all windows of a session are exited. My typical usage of tmux so far is to have my workstation start the session and thus act as the tmux server. I can then remotely access my workstation from a laptop when I'm not on site and continue using the session for as long as it exists. With tmux I can easily maintain a constant terminal environment with all my ssh or serial connections.


I said earlier that tmux is quite customizable. You can change the key bindings for tmux commands or create new ones to your own preference. You can also change the visual aspects of tmux, such as the colours of the status bar items, and add items of your choice to the status bar, such as up-time, the number of users currently in the session, or your laptop's battery life. Mouse support also exists should you want it. Suffice to say there is a lot of customization you can do with tmux. I'll share the .tmux.conf file with all the configurations I've been using so far (comments are prefixed with the # sign):

#Start numbering at 1
set -g base-index 1
set -g pane-base-index 1

#Set status bar for cleaner look
set -g status-bg black
set -g status-fg white
set -g status-left '#[fg=green]#H'

#Highlight active window
set-window-option -g window-status-current-bg red
set-window-option -g window-status-activity-style "fg=yellow"

#Show number of current users logged in and average loads for the computer
set -g status-right '#[fg=yellow]#(uptime | cut -d "," -f2-)'

#Set window notifications
setw -g monitor-activity on
set -g visual-activity on

#Automatically set window title
setw -g automatic-rename

#Rebind split window commands
unbind % #Remove the default binding for split-window -h
bind | split-window -h
bind - split-window -v

#Less input delay in command sequences, e.g. C-b n
set -s escape-time 0

#Mouse support
set -g mode-mouse on
set -g mouse-resize-pane on
set -g mouse-select-pane on
set -g mouse-select-window on

#Allow for aggressive resizing of windows (not constrained by smallest window)
setw -g aggressive-resize on

#pane traversal bindings
bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R

# reload config
bind r source-file ~/.tmux.conf \; display-message "Config reloaded..."

#COLOUR (Solarized 256)

#default statusbar colors
set-option -g status-bg colour235 #base02
set-option -g status-fg colour136 #yellow
set-option -g status-attr default

#default window title colors
set-window-option -g window-status-fg colour244 #base0
set-window-option -g window-status-bg default
set-window-option -g window-status-attr dim

#active window title colors
set-window-option -g window-status-current-fg colour166 #orange
set-window-option -g window-status-current-bg default
set-window-option -g window-status-current-attr bright

#pane border
set-option -g pane-border-fg colour235 #base02
set-option -g pane-active-border-fg colour136 #base01

#message text
set-option -g message-bg colour235 #base02
set-option -g message-fg colour166 #orange

#pane number display
set-option -g display-panes-active-colour colour33 #blue
set-option -g display-panes-colour colour166 #orange

set-window-option -g clock-mode-colour colour64 #green

# status bar
set-option -g status-utf8 on

So that about wraps up an introductory bit about tmux’s utility and a brief on how you can go about using it. I think it is a really useful tool for those who are regularly using remote machines through ssh and I’ll likely be using it all the time from here on out. There are many features and items I didn’t touch on such as tmux’s copy mode, multi-user sessions and more. If you’re so interested in learning more about tmux, please refer to their official manual.

by hzhuang3 at June 11, 2015 05:36 PM

Hosung Hwang

MH Hash, MVP-Tree indexer/searcher for MySQL/PHP

The current development server runs on the LAMP stack. Anna is working on a Creative Commons image crawler and user interface using PHP/MySQL. For a prototype that works with the PHP UI code and MySQL database, I made an indexer and a searcher.


The database contains lots of records holding image URLs, licenses, and hash values, and it is populated by a crawler written in PHP.


Source code :

Description :

$ ./mhindexer
Usage :
     mhindexer hostName userName password schema table key value treeFilename
     hostName : mysql hostname
     userName : mysql username
     password : mysql password
     schema : db name
     table : table name
     key : image id field name in the table
     value : hash field name in the table
     treeFilename : mvp tree file name
Output :

The program takes the MySQL connection information (hostname, username, password) and the database information (schema, table, key, value). After connecting with that information, it reads all 'key' and 'value' fields from the 'table'. 'key' is a unique key that points to the DB record containing the image information (filename, URL, hash value, etc.); 'value' is a hash value used to calculate hamming distance.

After connecting to the database, the program reads all records that contain hash values and adds them to the MVP-tree. When the tree is built, it is written to the 'treeFilename' file.

I made a simple bash script that runs mhindexer with these parameters. The output is:

$ ./,784,0.035845

From the hashes in the database, the tree was written to a file; it holds 784 nodes, and building it took 0.035845 seconds.


Source code :

Description :

Usage :
    mhsearcher treeFilename imageFilename radius
    eg : mhsearcher ./test.jpg 0.0005
output : 0-success, 1-failed
    success : 0,count,id,id,id,...
      eg : 0,2,101,9801 
    failed : 1,error string
      eg : 1,MVP Error

For now, the searcher reads the tree file (treeFilename) to rebuild the tree structure, extracts the MH hash from the input file (imageFilename), and then searches for the hash value in the tree using 'radius'.

The output is consumed by a PHP script. When the first comma-separated field is 0, there was no error and the result is meaningful: the second field is the count of detected hashes, and the following fields are the ids of the hashes. Using the ids, the PHP script can fetch the image information from the database.
When the first field is 1, the following field is the error message.
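To make that contract concrete, here is a small sketch of parsing the comma-separated output. It is written in JavaScript purely for illustration (the real consumer is a PHP script), and the function name is hypothetical:

```javascript
// Hypothetical parser for the mhsearcher output format described above:
//   success: "0,count,id,id,..."     failure: "1,error string"
function parseSearcherOutput(line) {
  var fields = line.trim().split(',');
  if (fields[0] === '0') {
    var count = parseInt(fields[1], 10);          // second field: number of hits
    return { ok: true, ids: fields.slice(2, 2 + count) };
  }
  // error case: everything after the leading "1" is the error message
  return { ok: false, error: fields.slice(1).join(',') };
}
```

For example, `parseSearcherOutput('0,2,101,9801')` yields the two ids `101` and `9801`, which the UI code could then look up in the database.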

To test it, I randomly chose an image that is in the database.
Example output is :

$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.001
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.1
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.2
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.3
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.44

For performance-statistics purposes, I appended the radius, the calculation count, and the extraction time to the end of the result.
In this image's case, a matching image was found when the radius was 0.2, and when the radius was 0.44 there were 5 results.


  • These utilities work well with MySQL and PHP.
  • Because of the characteristics of the tree search algorithm, the searcher can internally repeat the search from a radius of 0.001 up to 0.5 to get a fast and reliable result.
  • Later, the indexer and searcher could be turned into Linux daemon processes that keep the tree in memory for fast searching.
  • When the number of database records is enormous (millions to billions), the tree can be divided into several sections in the database.

by Hosung at June 11, 2015 04:30 AM

June 10, 2015

Koji Miyauchi

Heroku with node.js & mongoDB


The goal of our project over the last two weeks was to put our application onto GitHub Pages. In order to do that, we had to host our server-side APIs somewhere accessible.
After some discussions with our clients, we decided to host the server-side code on Heroku.

Heroku is one of the popular cloud application platforms (alongside AWS, DigitalOcean and Engine Yard) that can host your web application. A good thing about Heroku is that the initial cost is free.

This service is very easy to use.
Basically, all you need to do is this:

  1. Have your git repository for the app.
  2. Proper configuration in your project.
    In our case we use Node.js, so we configure the application's dependencies and start-up file in package.json
  3. Push the repository to Heroku

After you push your application to Heroku's master repository, it will automatically install all the dependencies your app needs and run it.

Deploy your application to Heroku

Here is a good instruction on how to deploy your Node.js application onto Heroku. Setting up is very straightforward.

Install mongoDB Add-on

In order to use mongoDB on Heroku after setting up your application, you need to install an add-on called mongoLab or Compose MongoDB. I used mongoLab this time.

Installing an add-on is also quite easy. Just type:

heroku addons:create mongolab -a <application>

and it will install the add-on for your application.
All the configuration of your DB is available from Heroku's web console.
mongoLab offers 500MB of storage for free.
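When the app boots on Heroku, the mongoLab add-on exposes the database location through the MONGOLAB_URI environment variable, and Heroku sets PORT for the web process. A hedged Node.js sketch of picking those up, with local-development fallbacks (the fallback values and function name are assumptions, not part of the platform):

```javascript
// Sketch: read Heroku-provided settings with local fallbacks.
// MONGOLAB_URI is set by the mongoLab add-on; PORT is set by Heroku itself.
// The localhost fallbacks are assumptions for local development.
function herokuConfig(env) {
  return {
    mongoUri: env.MONGOLAB_URI || 'mongodb://localhost:27017/dev',
    port: parseInt(env.PORT, 10) || 3000
  };
}

// A driver such as mongoose would then connect with something like:
//   mongoose.connect(herokuConfig(process.env).mongoUri);
```

This keeps the same code path working both locally and on Heroku, with the platform injecting the real values in production.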


Heroku accepts many types of applications: Node.js, Ruby on Rails, PHP, Java, Python and so on.
It allows users to deploy an application very quickly, and it automatically sets up the infrastructure for you, saving you time as well.

by koji miyauchi at June 10, 2015 09:11 PM

Anna Fatsevych

Wiki Parser and User Interface

As I mentioned in the last post, I was writing a "parser" of sorts to go through the XML files located in the Wiki Image Grab along with the corresponding images.

I now have a PHP program that gets the image name from the list file and then uses the wiki API to get the latest data (author, license, and whether the image still exists). The program is available on GitHub.

I have also written a user interface in PHP that allows images to be compared, either downloaded or via URL. Here is a preview of it.


Here is the link to this code on GitHub. This is a quick demo for now, using jQuery and Bootstrap – and the PHP code will be re-factored and cleaned up.

by anna at June 10, 2015 09:02 PM

Hosung Hwang

MVP Tree with MH Hash for Image Search

The MH image hash in the pHash project generates 72-byte hash values. Despite its weakness of producing false positives for simple images, it has the benefit that it can be used with an MVP-tree implementation.

Sample program

I wrote a sample utility in C++ to test real samples.
The source code (which may change later) is:

This program works like following usage :

Usage :
    MHHashTree directory filename radius
      directory : a directory that contains .hashmh files that will be in the MVP-tree
      filename : a .hashmh file to search from the tree
      radius : radius to search eg. 0.0001, 0.1, 1.0, 4.0
    MHHashTree directory filename radius BranchFactor PathLength LeafCap
      BranchFactor : tree branch factor - default 2
      PathLength : path length to use for each data point - default 5
      LeafCap : leaf capacity of each leaf node - maximum number of datapoints - default 25

Test Result 1

The sample directory contains 900 image hashes extracted from images. I picked an image that has 1 similar image:
Ch Light6

$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Ch Light6.jpg.hashmh" 0.001
(*) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
------------------Results 1 (9 calcs) (0.000011 secs)---------
(0) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Ch Light6.jpg.hashmh" 0.1
(*) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
------------------Results 2 (738 calcs) (0.002161 secs)---------
(0) Ch Light10.jpg.hashmh   : ff43e93158c7740000949690aecc3100e7e0b2a5493b6fa5263444bad9c16930224891f9b2fc300bc1f0fc7e392436c7e7f1ffb40c04e07030fc7e3f038fc7000000000000000000
(1) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Ch Light6.jpg.hashmh" 0.4
(*) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
------------------Results 11 (897 calcs) (0.000733 secs)---------
(0) Ch Light10.jpg.hashmh   : ff43e93158c7740000949690aecc3100e7e0b2a5493b6fa5263444bad9c16930224891f9b2fc300bc1f0fc7e392436c7e7f1ffb40c04e07030fc7e3f038fc7000000000000000000
(1) Metaball3.jpg.hashmh   : 0000000000000000000002b9c0400fc4620000f6e4a77c7b877e242496ec45b978d848db24b5254f99b97cdcdb2076ccdfefcd6de42400447e2a0203381e00000000000000000000
(2) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
(3) Orx-logo.jpg.hashmh   : 000000000000000000000000000000000000000063f1b9ef2fb200006b897da4b194020000226c5098e17dea00000037fbf6f92dfe00000000000604000000000000000000000000
(4) Snapshot10pointcloud.jpg.hashmh   : 0000000000000000000000001fcfc000000000000012cdb0000000000000124db0000000000000161db00000000000001228d8000000000000027028000000000000000000000000
(5) Snapshot05.jpg.hashmh   : 00000000000000000000000000afc80000000000001a4db0000000000000122da800000000000016cb680000000000001263200000000000001b72e0000000000000000000000000
(6) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
(7) Alaska Hitchhiker Skull (Moustache Hair).jpg.hashmh   : 000080003007bc0000f702c50c4d773389448fd50e3fcf81399c0400d2c483b1f88f96d220b4a4ea6ba81e4b2223d300e1e8a81f2883000000000000000000000000000000000000
(8) Alaska Hitchhiker Skull (Moustache Hair Eyepatch).jpg.hashmh   : 00008000700fbc0000ff00439c4c4704683c8fd51e7f4781399d8c0095dce391f84f079220eda6eb69a9e64b2623d300e1e8a81d2a83000000000000000000000000000000000000
(9) Snapshot04.jpg.hashmh   : 00000000000000000000000000a1880000000000001a424800000000000012292800000000000016cfb0000000000000092bf00000000000001b7078000000000000000000000000
(10) Snapshot07.jpg.hashmh   : 00000000000000000000000000a1c80000000000001a4df0000000000000126db8000000000000124db00000000000001244d8000000000000137270000000000000000000000000

When the radius was 0.001 or 0.01, the calculation count was 9 and the result was only the 1 exactly matching image. The time was 0.000011 secs.
When the radius was 0.1, the calculation count was 738 and the result was 2 images; this took more time than the 9-calculation case. The newly added image (Ch Light10.jpg.hashmh) was this:
Ch Light10
When the radius was 0.3, the result was the same as at 0.1.
When the radius was 0.4, the calculation count was 897 and there were 11 results. The result images are:
Snapshot07Snapshot04Alaska Hitchhiker Skull (Moustache Hair Eyepatch)Alaska Hitchhiker Skull (Moustache Hair)Snapshot01Snapshot05Snapshot10pointcloudOrx-logoMetaball3

Test Result 2

This time I picked an image that has a white background and more similar images: Snapshot01.jpg.

$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Snapshot01.jpg.hashmh" 0.01
(*) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
------------------Results 1 (21 calcs) (0.000073 secs)---------
(0) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000

$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Snapshot01.jpg.hashmh" 0.1
(*) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
------------------Results 10 (152 calcs) (0.000435 secs)---------
(0) Snapshot06.jpg.hashmh   : 00000000000000000000000000000000000000000000afc00000000000001244d0900000000000124ff4000000000000040200000000000000000000000000000000000000000000
(1) Snapshot09.jpg.hashmh   : 00000000000000000000000000aee01000000000001642e37c00000000001244940000000000000b6db80000000000000d92d0000000000000088100000000000000000000000000
(2) Snapshot02.jpg.hashmh   : 00000000000000000000000000a1980000000000000929f80000000000001b63780000000000001b6b280000000000000922580000000000000882f0000000000000000000000000
(3) Snapshot05.jpg.hashmh   : 00000000000000000000000000afc80000000000001a4db0000000000000122da800000000000016cb680000000000001263200000000000001b72e0000000000000000000000000
(4) Snapshot03.jpg.hashmh   : 0000000000000000000000000020200000000000001253b00000000000001262500000000000001a62480000000000001b6b08000000000000040c00000000000000000000000000
(5) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
(6) K-3D logo.jpg.hashmh   : 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(7) Snapshot04.jpg.hashmh   : 00000000000000000000000000a1880000000000001a424800000000000012292800000000000016cfb0000000000000092bf00000000000001b7078000000000000000000000000
(8) Snapshot07.jpg.hashmh   : 00000000000000000000000000a1c80000000000001a4df0000000000000126db8000000000000124db00000000000001244d8000000000000137270000000000000000000000000
(9) Snapshot00.jpg.hashmh   : 00000000000000000000000000000000000000000000000000000000000016d998000000000000126fb0000000000000040380000000000000000000000000000000000000000000

When the radius was 0.01, the result was only the 1 exact match, after 21 calculations.
When the radius was 0.1, after 152 calculations, there were 10 similar results:
Snapshot07Snapshot06Snapshot05Snapshot04Snapshot03Snapshot02Snapshot01Snapshot00K-3D logo


  • When the radius was smaller than 0.01, only a few calculations were done in the tree, and the result was the exact match.
  • When the radius was 0.1, the calculation count was greater, and the results were similar images.
  • When the radius was 0.4, the calculation count was almost the same as the number of samples.
  • The radius is the distance in the tree that expresses similarity; it is the same as the hamming distance.
  • The MH hash generates lots of zeros when the image contains a solid background colour. Consequently, the hash value of an all-black image is all zeros.
  • For the BranchFactor, PathLength, and LeafCap parameters used to build the MVP-tree, I used the default values of 2, 5, and 25 respectively. Tests with various values still need to be done.

by Hosung at June 10, 2015 08:22 PM

MVP Tree for similarity search

For several days, I analysed and implemented a C++ utility that interacts with perceptual hashes from the database. In this post, I will give a general analysis of the MVP-tree.

MVP Tree

The following two papers give details about the VP-tree and MVP-tree for similarity search:

“In vp-trees, at every node of the tree, a vantage point is chosen among the data points, and the distances of this vantage point from all other points (the points that will be indexed below that node) are computed. Then, these points are sorted into an ordered list with respect to their distances from the vantage point. Next, the list is partitioned to create sublists of equal cardinality. The order of the tree corresponds to the number of partitions made. Each of these partitions keep the data points that fall into a spherical cut with inner and outer radii being the minimum and the maximum distances of these points from the vantage point. The mvp-tree behaves more cleverly in making use of the vantage-points by employing more than one at each level of the tree to increase the fanout of each node of the tree.” [Bozkaya & Ozsoyoglu 2]

[Screenshot: figure from Bozkaya & Ozsoyoglu, p. 9]

[Screenshot: figure from Bozkaya & Ozsoyoglu, p. 10]

MVP Tree implementation

The source code I used was the one introduced on this page.

The following are the major APIs.

MVPTree* mvptree_alloc(MVPTree *tree,CmpFunc distance, unsigned int bf,unsigned int p,unsigned int k);
typedef float (*CmpFunc)(MVPDP *pointA, MVPDP *pointB);

mvptree_alloc allocates memory to store the MVP-tree structure. CmpFunc is the comparison function used to calculate the hamming distance between the two hash values inside the MVPDP structs, both when a new data point is added and when a search happens.
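To make the role of CmpFunc concrete, here is a hedged sketch, in JavaScript rather than C, of the kind of distance such a function computes over the hash bytes (the actual CmpFunc is a C function over MVPDP structs, and its exact normalization is not shown here, so treat this purely as an illustration):

```javascript
// Illustrative hamming distance between two equal-length byte arrays,
// mirroring what a CmpFunc for MH hashes computes (a sketch, not pHash's code).
function hammingDistance(a, b) {
  var bits = 0;
  for (var i = 0; i < a.length; i++) {
    var x = a[i] ^ b[i];                 // differing bits within this byte
    while (x) { bits += x & 1; x >>= 1; }
  }
  return bits;
}

// Dividing by the total bit count gives a radius-like value in [0, 1]:
function normalizedHamming(a, b) {
  return hammingDistance(a, b) / (a.length * 8);
}
```

With a distance in this form, a small radius such as 0.01 only admits near-identical hashes, which matches the search behaviour described below.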

MVPError mvptree_add(MVPTree *tree, MVPDP **points, unsigned int nbpoints);

This function adds data points to the tree; it accepts either an array of data points or a single one. While a node is added, the tree is formed through comparisons made with CmpFunc.

MVPError mvptree_write(MVPTree *tree, const char *filename, int mode);
MVPTree* mvptree_read(const char *filename, CmpFunc fnc, int branchfactor, int pathlength, int leafcapacity, MVPError *error);

Using these functions, the tree structure can be written to a file and later loaded without rebuilding the tree.

MVPDP** mvptree_retrieve(MVPTree *tree, MVPDP *target, unsigned int knearest, float radius, unsigned int *nbresults, MVPError *error);

This function retrieves similar hash results based on the radius. The bigger the radius, the more times the comparison function is called.

Sample program results

Using 100 samples of 10-byte random binaries, with the radius changed from 0.01 to 3.0 and then 5.0, the results are:

radius : 0.01
------------------Results 1 (7 calcs)---------
(0) point101

------------------Results 3 (18 calcs)---------
(0) point108
(1) point101
(2) point104

------------------Results 10 (24 calcs)---------
(0) point102
(1) point103
(2) point105
(3) point107
(4) point108
(5) point101
(6) point104
(7) point106
(8) point109
(9) point110

When the radius was 0.01, there were 7 calculations while going through the tree. When the radius was 5.0, there were 24 calculations. When I changed the sample size from 10 bytes to 72 bytes, the size of the MH hash, the comparison count was more than the number of samples.


The sample program generates random values instead of using real image hashes. Since random values hardly have any similarity between them, when the radius was less than 0.1 there was only one result, the exactly matching value. To get more results, the radius had to be at least 3; in that case the calculation count was almost the same as the number of values.
When I used real image hash values, the search results were quite impressive. That will be covered in the next post.

by Hosung at June 10, 2015 03:36 PM

Barbara deGraaf

What’s in a GUI

In this post I am going to talk about adding a GUI to a test scene so that a user can change values. I meant to put this up earlier but got side-tracked watching Hannibal's season 3 premiere, which has some of the most breathtaking cinematography I have seen; if you want to see what sort of results a cinematographer can create, that would be the show to watch.

So, back to the GUI: for THREE.js there is a library file called dat.gui, which you can grab from its Google Code page. Within your JavaScript file you can start making the GUI with:

  var gui = new dat.GUI();

I also recommend creating an object to hold all the values that are going to be used in the GUI, so in this case:

var params = {
  focallen: 100,
  // ...(all other camera and lens properties)
};

If, after you have made the GUI, you want to add folders, you can add them with:

var camfolder = gui.addFolder('Camera');

var lenfolder = …

After you make all the folders you want, you can start adding variables to a folder with:

var foc = lenfolder.add(params, 'focallen');

The dat.gui library will add a text box or a slider depending on whether the value in params is text or a number. For number values, we can give the user a lower and upper limit and change the slider's increment by using this line instead:

var foc = lenfolder.add(params, 'focallen', 10, 200).step(4).name('focal length');

The other type of input was a select menu for the camera/lens. To build it, the first step is to store the information about the cameras/lenses in a JSON file. Once we have the file we can use jQuery:


// The inner workings may change depending on how the JSON file was set up, but you are
// going to use $.each to loop through the JSON file, getting each entity and grabbing the
// value you want. For this example I looped and grabbed the format value and then added
// it to an array of cameras (listcams).

After looping with $.each, we can use this list of camera formats as the options for the menu with:

var cam = camfolder.add(params, 'format', listcams);

After having the GUI working, we want it to do something when we change values, so we can attach an onChange handler:

foc.onChange(function(value) {
  params.focallen = value;
});


We can do this for all values to continuously update params. If you are running into issues with the JSON and storing the values gathered from the JSON file, just remember that jQuery is async, so do the onChange within the $.getJSON callback above.

If you want to add a button to the GUI, the best way to do that is:

var obj = { submit: function() {
  // logic that occurs when pressed goes here
  // I did calculations of hyperfocal distance and depth of field here
}};

gui.add(obj, 'submit');

So this is basically all we need in terms of making and changing the GUI. The next step that my partner and I worked on was depth of field using shaders, so in the next blog post I will talk about shaders before going into depth about using them for depth of field.

Have a good night everyone.



by barbaradegraafsoftware at June 10, 2015 01:58 AM

June 09, 2015

Hong Zhan Huang

OSTEP – The City of ARMs – Tools of the trade 1: iperf

In the short time that I've been working on the OSTEP team at CDOT there's been much to take in and learn. In this Tools of the Trade series of posts I'll describe a tool I have been making use of in my work.


iperf is a network performance measuring tool that I have been using to do some testing with the Ethernet ports of some of our ARM machines. I required a tool that would be able to measure the maximum performance of these ports while bypassing intermediate mediums that could obfuscate the results (such as the write speeds of a hard drive). iperf seemed to be a tool that would meet all my needs and more.

To quote the features of iperf from their official site:

  • TCP
    • Measure bandwidth
    • Report MSS/MTU size and observed read sizes.
    • Support for TCP window size via socket buffers.
    • Multi-threaded if pthreads or Win32 threads are available. Client and server can have multiple simultaneous connections.
  • UDP
    • Client can create UDP streams of specified bandwidth.
    • Measure packet loss
    • Measure delay jitter
    • Multicast capable
    • Multi-threaded if pthreads are available. Client and server can have multiple simultaneous connections. (This doesn’t work in Windows.)

There’s quite a bit that iperf is able to do, but for my purposes the TCP functionality with one client and one server suits me fine.

Using iperf

As alluded to earlier, iperf operates on a client and server model, where the server artificially serves a file to the client, and from that interaction iperf measures the performance of the transfer between the two machines.

The steps to start up a basic testing process are as follows:

  1. Start iperf on the machine that will act as the server with: iperf -s
  2. On the other machine, start it up as the client with: iperf -c {IP of the Server}

And that’s it for basic operation! Following the completion of that run you will see the results of the test on both the server and the client machines, and it’ll look something like:

Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
[852] local port 5001 connected with port 33453
[ ID]   Interval          Transfer       Bandwidth
[852]   0.0-10.6 sec   1.26 MBytes   1.03 Mbits/sec

Again, this is the most basic usage of iperf, which uses the default window size, port, protocol (TCP is default), unit of measurement (Mbits/sec is default) and other options. For my use I only added the -f option, which lets the user choose what unit of measurement the results should be formatted in (in my case I used -f g, which gives the results in Gbits/sec). On the chance you’d like to access iperf’s other features, this guide is what I read to get an understanding of how to operate this tool.

To make my life a little easier I wrote one bash script to automate the process of doing the iperf tests and recording their results as well as another to more easily parse the resulting logs.

test script:


echo "Beginning tests"

if [ "$1" = "" ] || [ "$2" = "" ]; then
  echo "Requires IP of the iperf server and output file name."
  exit 1
fi

touch ./$2

for i in `seq 1 10`; do
  iperf -c "$1" -f g >> $2
done

echo "Finished the tests"

The test script is meant to be used on the client machine as follows: test {IP of Server} {Filename of log}

parse script:


echo "The file begin parsed is $1:"

echo "`grep "Gbits/sec" $1`\n"

AVG=$(grep -o "[0-9].[0-9][0-9] Gbits/sec" $1 |
  tr -d "Gbits/sec" |
  awk '{ SUM += $1} END { print SUM }' |
  awk '{ AVG = $1 / 10 } END { print AVG }')

MAX=$(grep -o "[0-9].[0-9][0-9] Gbits/sec" $1 |
  tr -d "Gbits/sec" | sort -n | tail -1)

MIN=$(grep -o "[0-9].[0-9][0-9] Gbits/sec" $1 |
  tr -d "Gbits/sec" | sort -n | head -1)

echo "The average rate of transfer was:  $AVG"
echo "The max rate was: $MAX"
echo "The min rate was: $MIN"

echo "Finished parsing."

The parse script is again used on the client in the following manner: parse {Filename of log}

And that about wraps up iperf in brief. The only thing to note is that you may need to open the relevant ports for iperf to work.

by hzhuang3 at June 09, 2015 06:00 PM

June 08, 2015

Barbara deGraaf

First up a test scene

Most feedback I got from the last post was that it was too mathy and I promise this one will have 100% less math than the last one.

The first thing done in the project was to make a test scene to work with. This will allow us to try different techniques and see if the outcome is as expected.

The first part of making the test scene was to make walls and a ground. Using the box geometry or plane geometry in THREE.js, it is very easy to make a wall or ground of the desired size. Adding all the walls and the ground to a single Object3D lets us move the whole scene around if we want the walls and ground to be in a different place.

To help measure units in the scene better a black and white checkerboard pattern was added to the wall and ground. The best way to do this is to have a small texture pattern of the checkerboard and to set texture.wrapS and texture.wrapT to THREE.RepeatWrapping and then use texture.repeat.set(x,x) where x is half the length/width of the geometry used above. Basically these three lines will cause the small checkerboard texture to appear on the whole wall/ground.

After having the basic walls and ground of the scene set up, the next part is to add some detailed objects to the scene. Instead of boxes and spheres we need something with more definition, and I decided to use humanoid models. There are a couple of different ways to add external models to the scene. The way I did it was to use the MakeHuman software, which allows you to easily make models and use them under the CC0 license. Exporting the created model to obj/mtl files allows easy use in THREE.js. You can also use other modelling software to make an object and export it to the file type you want.

To load the model, THREE.js has an obj/mtl loader. The THREE.js website has excellent documentation on how to use it, so check that out if you need to. After the model is loaded you can make as many meshes of the model as you want to put in the scene, and the models can easily be scaled for accurate dimensions. By defining 1 THREE.js unit as representing 1 foot we can resize the models: using a box of dimensions 6x2x1 I can resize the human model to fit inside the box and therefore be accurate. I also added all the humans to a single Object3D so that all of them can be moved at once. For my scene I ended up putting 5 human models in the scene, spaced evenly apart from each other.

With these elements we have a scene that can be customized for any dimensions or distances we may want to test depth of field or field of view.

I was going to talk about adding the GUI here but I think instead I will make a separate post talking about the GUI so I can mention some specific points in creating it. So look forward to that next.


by barbaradegraafsoftware at June 08, 2015 01:57 AM

June 04, 2015

Hosung Hwang

Eclipse CDT Cross GCC Linker Option


I am testing an algorithm called MVP Tree. The source code is written in C and uses a Makefile. To analyse it I wanted to see the actual values in the nodes and the memory state as the tree forms. Debugging with gdb in the console was painful, so I moved the project to Eclipse CDT. However, in the linking step, Eclipse showed the following errors.

MHMvcTree.c:50: undefined reference to `exp'
MHMvcTree.c:106: undefined reference to `sqrt'


Adding the -lm linker option in
Project -> Properties -> Cross GCC Linker -> Miscellaneous -> Other objects
solved the problem.


Now I am happy.

by Hosung at June 04, 2015 08:15 PM

June 02, 2015

Anna Fatsevych

Wikimedia Commons

Wikimedia Commons is a media file repository containing millions of images. I have been working with their Wikimedia image grabs to get the author, title, and licensing information.

The files gathered from the Wiki Commons grabs come with an XML file that provides the author information in a wiki template ({{Information}}, to be exact), along with the {{Credit Line}} template.

I have been looking into a few Wikimedia template parsers – many of them are not updated, and many just parse the text into HTML or wikitext, ignoring the various available templates, which are unfortunately exactly what I need. My goal is to get the information I need from the XML files I already have without calling the Wikimedia API on each image – i.e. without using the network. Here is the list of the Alternate Parsers.

I am currently writing one in PHP. So far, I have used wiki_parser.php to attempt to parse the Information template into key-value pairs, but it looks like it only succeeds in parsing categories, so I will have to write a parser myself. Here is my code so far:


$xmlfile = 'test.xml';
$fp = fopen($xmlfile, 'r');
$xmldata = fread($fp, filesize($xmlfile));

$xml = simplexml_load_string($xmldata);

// count number of revisions, get the latest one;
$numrevisions = count($xml->page->revision)-1;

// get the text part of the Information - isolate text to parse;
$text = $xml->page->revision[$numrevisions]->text;
$try = (string)$text[0];

// so far successful at parsing "categories"
$wikipedia_syntax_parser = new Jungle_WikiSyntax_Parser($try);

I use SimpleXML to parse the XML file into elements, and also to isolate the “Information” part, which is in a text tag.

Here is the output:

|Description=[[:en:Ford Galaxie|1967 Ford Galaxie GT]] photographed in [[:en:Laval, Quebec|Laval]], [[:en:Quebec|Quebec]], [[:en:Canada|Canada]] at Auto classique VACM Laval.[[Category:Auto classique VACM Laval 2013]]
[[Category:Ford Galaxie]]
[[Category:1967 Ford automobiles]]
|Permission=All Rights Released.
== {{int:license-header}} ==

To be continued with the parser code,


by anna at June 02, 2015 08:24 PM

Hosung Hwang

Performance Test of pHash : MH Image Hash

In another posting, I did a performance test of the DCT image hash in pHash. Today, I did the same test for the MH image hash.

The test machine is the same machine I used before, and this time the test was performed only on the internal SSD. In a benchmark using the dd command, the read speed was 440 MB/s.

Sample images are 900 jpg files, with sizes varying from 1.3 KB to 50.4 MB; the total size of the sample files was 805.4 MB. The function for MH image hash is ph_mh_imagehash(). This function allocates a memory block for the output hash value, and the hash value is 72 bytes, which is much bigger than the DCT hash result (8 bytes). I wrote another C++ program to calculate the hash using this function for all images in the given directory. To measure only the hashing time, the hashes were neither printed nor stored.

Test results are :

$ for F in {0..7}; do ./phashbenchmh /home/hosung/cdot/ccl/hashtest/all-images; done
Elapsed secs to hashing 900 files : 246.779554
Elapsed secs to hashing 900 files : 242.660379
Elapsed secs to hashing 900 files : 242.693598
Elapsed secs to hashing 900 files : 242.494878
Elapsed secs to hashing 900 files : 243.201334
Elapsed secs to hashing 900 files : 242.948810
Elapsed secs to hashing 900 files : 243.554532
Elapsed secs to hashing 900 files : 243.004734

Interestingly, MH hashing was faster than DCT hashing: DCT hash took around 291 seconds on the internal SSD, whereas MH hash took around 243 seconds.

by Hosung at June 02, 2015 08:20 PM

MH Image Hash in pHash 2 : test result

900 Sample image test

I used the same samples that were used in Andrew Smith’s test. Although the benchmark from the pHash team says that when the hamming distance is smaller than 0.3 the images are similar, in the range of 0.2 ~ 0.3 there are many false positive results. There were 175 pairs of false positives with distance less than 0.2, and 63 with less than 0.1; however, many of those look genuinely similar. Some strange results also appeared in the range up to 0.2.

Firstly, an image filled with black colour matches some simple images.

K-3D logo

The hamming distances between this black image and the following 10 images were less than 0.1:

Snapshot07 Snapshot10pointcloud Snapshot09 Snapshot06

Snapshot05 Snapshot04 Snapshot03 Snapshot02

Snapshot01 Snapshot00

I have no idea why the black image matches so many images. In the 0.2 range, there are even more matches.

The following image matchings seem to be true positives, although they could be controversial.

Selection_046 Selection_047 Selection_048 Selection_049 Selection_050 Selection_051 Selection_053 Selection_055 Selection_060

In terms of false positive results in the range of 0.0 ~ 0.1, all the matches other than the black image look like the following.


The following image matchings seem to be false positive results in the range of 0.1 ~ 0.2.

Selection_062Selection_059 Selection_058 Selection_057 Selection_056 Selection_054 Selection_052

Something common to those matches is the fact that the images contain a wide area of solid background colour.

Font image test

I used the same image set as in the DCT hash test from my previous posting.

Rotation 2 degrees

Intra Distance
Inter Distance

Rotation 4 degrees

Intra Distance
Inter Distance

Rotation 45 degrees

Intra Distance
Inter Distance

In terms of rotation, when the rotation is up to 4 degrees, the images show hamming distances from 0.1 to 0.35, while the inter-distance comparison gives results around 0.2 ~ 0.5. When the rotation was 45 degrees, the intra-distance range was the same as the inter distance.

Adding a dot

Intra Distance

Interestingly, when the image has an additional dot, all of the hamming distances were less than 0.1.

Moving 2%

Intra Distance

Moving 5%

Intra Distance

The results show that MH hash also cannot detect a match when the content has been moved.

Arial Bold Font

Intra Distance

Arial Italic Font

Intra Distance

Georgia Font

Intra Distance

Times New Roman Font

Intra Distance

In terms of font change, the distance was in the range of 0.1 ~ 0.5.


  • From the sample image test, there are some false positive results for simple images.
  • When the sample image is complex or doesn’t have a solid background colour, there were no false positive results.
  • As for the font images, in terms of rotation change, MH hash seems to detect more than DCT hash.
  • Adding a little dot doesn’t cause a big hamming distance, compared to DCT hash.

by Hosung at June 02, 2015 06:06 PM

June 01, 2015

Hosung Hwang

MH Image Hash in pHash

So far, what we have tested with pHash was all about the DCT (Discrete Cosine Transform) image hash algorithm. According to the pHash design website, the MH image hash method was recently added to pHash, and it has better results.

Difference between MH and DCT

  • MH hash size is 72 bytes : DCT is 8 bytes
  • MH takes more time on hashing and computing hamming distance than DCT
    -> Speed test will be performed later
  • MH is stronger against attacks
  • MH can be used with MVP tree indexing structure for fast retrieving
  • Hamming distance of MH is calculated by binary quantization, whereas DCT uses XORing; binary quantization must be slower than XORing.

This chart shows the benchmark result of MH hash. According to this result, when the hamming distance is bigger than 0.3 the images are different; and if the distance is smaller than 0.3 the images are similar.

This chart shows the result of DCT hash.

MH Hash implementation

I wrote two simple cpp programs: one to hash and one to calculate the hamming distance.

phashmp.cpp source code

int alpha = 2;
int level = 1;
const char *filename = argv[1];
int hashlen = 0;
uint8_t* hash1 = ph_mh_imagehash(filename, hashlen, alpha, level);
for (int i = 0; i < hashlen; i++)
    printf("%02x", hash1[i]);

Default values of the alpha and level parameters are 2 and 1, respectively; I used the defaults. The third parameter is a reference that receives the size of the generated hash value; if the function succeeds, it is always 72 (bytes). The return value is a byte array (uint8_t is a byte) holding the 72-byte hash value.

hammingmh.cpp source code

#define HASHLEN 72
    uint8_t hash1[HASHLEN], hash2[HASHLEN];
    unsigned int byte;
    for (int i = 0; i < HASHLEN; i++) {
        sscanf(argv[1] + 2*i, "%02x", &byte);
        hash1[i] = (uint8_t)byte;
    }
    for (int i = 0; i < HASHLEN; i++) {
        sscanf(argv[2] + 2*i, "%02x", &byte);
        hash2[i] = (uint8_t)byte;
    }
    double dist = ph_hammingdistance2(hash1, HASHLEN, hash2, HASHLEN);
    printf("%lf\n", dist);

In the case of MH hash, the hamming distance is calculated by binary quantization and returned as a double value.

Test method

I wrote some bash scripts to generate hashes for all jpg files and to gather all hamming distances into a file, then sorted them by hamming distance. To compare the images, I wrote two different kinds of bash script.

while read line
do
    num=$(echo $line | cut -d , -f 1)
    filename=$(echo $line | cut -d , -f 2)
    filename2=$(echo $line | cut -d , -f 3)
    eog "$filename" "$filename2"
done < $1

This script shows two images using gnome image viewer line by line.

while read line
do
    num=$(echo $line | cut -d , -f 1)
    filename=$(echo $line | cut -d , -f 2)
    filename2=$(echo $line | cut -d , -f 3)
    base1=$(basename "$filename");
    base2=$(basename "$filename2");
    r=$(( $RANDOM % 1000000 ));
    ln -s "$filename" "$linkpath1";
    ln -s "$filename2" "$linkpath2";
done < $1

This script makes a soft link for every image into another directory, with link names built from the hamming distance, a random value and the file name, so I can browse the pairs of images in an image viewer like XnView.

Test result will be posted soon.

by Hosung at June 01, 2015 09:45 PM

May 29, 2015

David Humphrey

Messing with MessageChannel

We're getting close to being able to ship a beta release of our work porting Brackets to the browser. I'll spend a bunch of time blogging about it when we do, and detail some of the interesting problems we solved along the way. Today I wanted to talk about a patch I wrote this week and what I learned in the process, specifically, using MessageChannel for cross-origin data sharing.

Brackets needs a POSIX filesystem, which is why we spent so much time on filer.js, which is exactly that. Filer stores filesystem nodes and data blocks in IndexedDB (or WebSQL on older browsers). Since this means that filesystem data is stored per-origin, and shared across tabs/windows, we have to be careful when building an app that lets a user write arbitrary HTML, CSS, and JavaScript that is then run in the page (did I mention we've built a complete web server and browser on top of filer.js, because it's awesome!).

Our situation isn't that unique: we want to allow potentially dangerous script from the user to get published using our web app; but we need isolation between the web app and the code editor and "browser" that's rendering the content in the editor and filesystem. We do this by isolating the hosting web app from the editor/browser portion using an iframe and separate origins.

Which leads me back to the problem of cross-origin data sharing and MessageChannel. We need access to the filesystem data in the hosting app, so that a logged in user can publish their code to a server. Since the hosted app and the editor iframe run on different origins, we have to somehow allow one to access the data in the other.

Our current solution (we're still testing, but so far it looks good) is to put the filesystem (i.e., IndexedDB database) in the hosting app, and use a MessageChannel to proxy calls to the filesystem from the editor iframe. This is fairly straightforward, since all filesystem operations were already async.

Before this week, I'd only read about MessageChannel, but never really played with it. I found it mostly easy to use, but with a few gotchas. At first glance it looks a lot like postMessage between windows. What's different is that you don't have to validate origins on every call. Instead, a MessageChannel exposes two MessagePort objects: one is held onto by the initiating script; the other is transferred to the remote script.

I think this initial "handshake" is one of the harder things to get your head around when you begin using this approach. To start using a MessageChannel, you first have to do a regular postMessage in order to get the second MessagePort over to the remote script. Furthermore, you need to do it using the often overlooked third argument to postMessage, which lets you include Transferable objects. These objects get transferred (i.e., their ownership switches to the remote execution context).

In code you're doing something like this:

// In the hosting app's js
var channel = new MessageChannel();  
var port = channel.port1;


// Wait until the iframe is loaded, via event or some postMessage
// setup, then post to the iframe, indicating that you're
// passing (i.e., transferring) the second port over which
// future communication will happen.
// (the second argument is the iframe's origin -- "editorOrigin" is an
// assumed variable holding it -- and the third transfers the port)
iframe.contentWindow.postMessage("here's your port...",
                                 editorOrigin, [channel.port2]);

// Now wire the "local" port so we can get events from the iframe
function onMessage(e) {  
  var data = e.data;
  // do something with data passed by remote
}
port.addEventListener("message", onMessage, false);

// And, since we used addEventListener vs. onmessage, call start()
port.start();


// Send some data to the remote end.  
var data = {...};  
port.postMessage(data);
I'm using a window and iframe, but you could also use a worker (or your iframe could pass along to its worker, etc). On the other end, you do something like this:

// In the remote iframe's js

var port;

// Let the remote side know we're ready to receive the port
parent.postMessage("send me the port, please", "*");

// Wait for a response, then wire the port for `message` events
function receivePort(e) {  
  removeEventListener("message", receivePort, true);

  if (e.data === "here's your port...") {
    port = e.ports[0];

    function onMessage(e) {
      var data = e.data;
      // do something with data passed by remote
    }

    port.addEventListener("message", onMessage, false);
    // Make sure you call start() if you use addEventListener
    port.start();
  }
}
addEventListener("message", receivePort, true);


// Send some data to the other end
var data = {...};  
port.postMessage(data);

Simple, right? It's mostly that easy, but here's the fine print:

  • It works today in every modern browser except IE 9 and Firefox, where it's awaiting final review and behind a feature pref. I ended up using a slightly modified version of MessageChannel.js as a polyfill. (We need this to land in Mozilla!)
  • You have to be careful with event handling on the ports, since using addEventListener requires an explicit call to start which onmessage doesn't. It's documented, but I know I wasted too much time on that one, so be warned.
  • You can safely pass all manner of data across the channel, except for things like functions, and you can use Transferables once again, for things that you want to ship wholesale across to the remote side.
  • Trying to transfer an ArrayBuffer via postMessage doesn't work right now in Blink.

I was extremely pleased to find that I could adapt our filesystem in roughly a day to work across origins, without losing a ton of performance. I'd highly recommend looking at MessageChannels when you have a similar problem to solve.

by David Humphrey at May 29, 2015 08:02 PM

Justin Flowers

Starting out with Ansible in CentOS6

Ansible is an incredibly powerful automation tool: it lets you connect to a VM and control configuration and installation of programs simply. Here’s what I did to get it working for the first time:

Step 1: Get client machines installed with basic requirements

Important steps here are to make sure that:

  • You have configured your SSH RSA keys for the accounts which will be connecting (check out here for a great tutorial)
  • Your client machines for Ansible have Python installed
  • Your client machines for Ansible have libselinux-python installed
  • You have a supported OS installed for the Ansible control machine

If you can SSH to your machine without using a password then you should be fine with the RSA keys here.

Step 2: Install Ansible on control machine

If you’re on Fedora, you likely have the ansible package in your repositories already. Otherwise, on CentOS you can install Ansible by first installing epel-release, using these commands:

sudo yum install epel-release
sudo yum install ansible

Step 3: Configure Ansible hosts

If you open up /etc/ansible/hosts you can add and modify groups of hosts. There are many options for configuration in this file, but suffice it to say you can declare a group with square brackets and then write either hostnames or IPs below the square brackets to add machines to it. For example, I defined an IP on my local host-only vbox network to be in the logstash-host group.
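An entry of that shape in /etc/ansible/hosts looks like the following sketch (the group name comes from the post; the IP address is an illustrative placeholder on a typical host-only VirtualBox subnet):

```ini
[logstash-host]
192.168.56.101
```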


Step 4: Write a playbook and test

This is the hard part. There are many examples on the internet of how to write this kind of file, but essentially you can see it as defining a group of hosts to work on, the user to connect remotely as, and then a list of the tasks the playbook should perform.
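As a sketch of that shape (the group name matches the hosts example from the post; the user and the yum task here are illustrative assumptions, not from the post):

```yaml
# runs against the [logstash-host] group from /etc/ansible/hosts
- hosts: logstash-host
  remote_user: root
  tasks:
    - name: ensure logstash is installed
      yum: name=logstash state=present
```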

Each task is made up of a name and command. The name is essentially what will be shown to you when it attempts to perform the given command. The command is a specific action to be performed by Ansible. For example, one of the tasks I used in my playbook was this:

 - name: send logstash input configuration
   copy: src=~/elk/logstash-input.conf dest=/etc/logstash/conf.d/

This command copied the file logstash-input.conf on the control machine to /etc/logstash/conf.d/ on the client machine. If you need help finding what command to use or how to use it, googling your issue followed by “ansible” is usually enough to get you a StackOverflow answer or take you right to the Ansible documentation for what you need.

Finally, to test, simply run:

ansible-playbook logstash_playbook.yml

Substituting logstash_playbook.yml for the name of the playbook you made.

by justin at May 29, 2015 06:57 PM

Mohamed Baig

How to setup Hypothesis locally for development on Arch Linux

What is Hypothesis Hypothesis is a web annotator. It allows users to create notes on any web page and PDF. As it says on the project website: Our team is building an open platform for discussion on the web. It leverages annotation to enable sentence-level critique or note-taking on top of news, blogs, scientific articles, […]

by mbbaig at May 29, 2015 05:33 AM

Barbara deGraaf

Everything you wanted to know about cameras

In this post I will detail the main points related to the image that is produced in a camera.

Without going into too much detail on how cameras and lenses work there are three main things that the final image can differ in:

1) Field of view

The field of view is how much of the area in front of you will be in the final image taken by the camera. While field of view and angle of view tend to be used interchangeably, they are different: field of view refers to the distances in real life that are being placed on the final image, while angle of view refers to the angle from top to bottom that extends out from the camera.

To find the angle of view you need to know the focal length of the lens and the size of the film or sensor used in the camera. The following image by Moxfyre at English Wikipedia (under CC BY-SA 3.0) is the best illustration of this concept.


Optics of a Camera

In this, S1 is the distance from the lens to the object, S2 is the distance from the lens to the sensor or film, and F is the focal length of the lens. You can see that as you increase the focal length while keeping the film size fixed, the angle gets smaller; if you keep the focal length the same but increase the film size, the angle gets bigger. The equation for the angle of view is easy enough to derive with trigonometry from the figure above (with the assumption that S2 = F, which is not valid for macro situations but is valid for distant objects) and is

α = 2*arctan(d/2f)                                                                                                  (1)

The above can be looked at as if viewed from the top down or from the side; in fact, the angle of view tends to differ between horizontal and vertical, as the film size is different in these two dimensions.

Therefore field of view depends on film/sensor size, a property of the camera chosen, and focal length, which is a property of the lens chosen.
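Equation (1) is easy to sanity-check in code. The numbers below are assumed examples, not values from the post: 36 mm is the horizontal width of a full-frame sensor, paired with a 50 mm lens.

```javascript
// Equation (1): angle of view from film/sensor size d and focal length f,
// both in mm, returned in degrees.
function angleOfViewDeg(d, f) {
  return 2 * Math.atan(d / (2 * f)) * 180 / Math.PI;
}

console.log(angleOfViewDeg(36, 50)); // ≈ 39.6 degrees
```

Doubling the focal length to 100 mm roughly halves the angle, matching the intuition above.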

2) Depth of field

Depth of field may be a little harder to understand; it refers to the area that will be sharp, or acceptably sharp, in the final image. The first thing to do is to find the hyperfocal distance. This is the distance at which objects will be in focus from half the hyperfocal distance up to infinity (and focusing anywhere past it keeps everything out to infinity in focus).

For example if the hyperfocal distance is 20m and you decide to focus on an object 25m away then the image will be in focus from 10m to infinity. If you focus on something 15m away(<H) then you have a finite depth of field which you will have to calculate.

First the equation for the hyperfocal distance, at the risk of being too mathy I will leave out the derivation (which can be found with geometry)

H = F^2/(N*C)                                                                                                      (2)

Where F is the focal length, N is the f-stop and C is the circle of confusion. The f-stop is the aperture setting and is the ratio of focal length to the diameter of the entrance pupil. The circle of confusion is a property of the lens: the blur spot formed where light does not come to perfect focus.

After finding the hyperfocal distance the near and far depth limits can be found after knowing the focus distance (which is something the cinematographer picks.)

DNear = H*S / (H + (S - F))                                                                                         (3)

DFar = H*S / (H - (S - F))                                                                                           (4)

Where H is the hyperfocal distance and S is the focus distance.

For a good explanation of how depth of focus works go to this page.

Therefore the depth of field depends on the focal length and circle of confusion, which are properties of the lens, on the aperture, which is chosen by the user, and on the focus/subject distance, which is also chosen by the user.

3) Exposure

This last one I will mention but not go into detail on. Exposure refers to the amount of light entering the camera and how bright the picture will be. It depends on many things, like aperture, shutter speed, and the lights placed in the scene.

Therefore the main things that a user should be able to pick are the type of camera, type of lens, aperture setting, focus distance, and maybe the focal length of the lens if it is a zoom lens.

Stay tuned for the adventure of making a test scene to use and verify our cameras in.

by barbaradegraafsoftware at May 29, 2015 03:37 AM

May 28, 2015

Anna Fatsevych

MySQL Tests Continued

The Wikimedia Commons image dumps include files of .svg,.pdf, .ogg, and .djvu formats. SVG is an image file, whereas .pdf’s were mostly books, ogg’s were video/audio and .djvu’s were scanned books/texts.

Hashing these with pHash and BlockHash was a challenge, because they do not always throw an error (e.g. when trying to hash a PDF), so some issues took longer to discover; the others (svg, ogg, and djvu) simply cannot be hashed.

When dealing with file names containing characters from various languages, some exceptions arose – the PHP function addcslashes($string, “list of characters to escape”) comes in handy, as do the ones mentioned in the previous post.

I ran my PHP program on 6,953 files, of which 6,308 were processed (the rest failed, some due to format and some due to file-naming errors). It took 2.41 hours to pHash and blockHash each of the 6,308 images and store the values in the database. Granted, hashing took most of the time, as the dummy-data results averaged 1,000 INSERTs per minute.

I ran SELECT tests on my 6,308 records and discovered that ‘SELECT *’ and ‘SELECT … WHERE’ based on hashes were quite fast, with the longest query taking 3.2 seconds for select all. Granted, there were many duplicates in this database (the same hash was applied to erroneous files), which will not be the case in practice.

Part Two (May 29, 2015):

I have run more tests on MySQL. Here is an overview:


To time the ‘SELECT *’ statement I am running this command on the shell:

time mysql --user=root --password='password' myDB < selectall.sql > ~/temp

In “selectall.sql” I have the following statement:

select * from MyTable;

And for the 105,000 entries, here is the time:

real    0m1.069s
user    0m0.887s
sys     0m0.084s

When timed in my PHP code, SELECT *, and SELECT on phash or bhash, took 0.2 and 0.1 seconds respectively.

Here is a snippet of my PHP code showing how I timed the queries:

// Start timing
$time_pre = microtime(true);

$sql = "SELECT * FROM IMG where phash=15980123629327812403";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
    // you can output data of each row
    while ($row = $result->fetch_assoc()) {
    }
} else {
    echo "0 results";
}

// Stop timing
$time_post = microtime(true);
$exec_time = $time_post - $time_pre;

To conclude here are the results of my queries, without printing to screen:

INSERT times were constant at approximately 16 inserts per second;
SELECT * was timed at 0.01 seconds (system) for 105,000 records, and at worst 3.02 seconds for 6,308 records (with BLOBs);
SELECT WHERE (search on hash value) averaged 0.04 seconds in general. There were many duplicate hashes, as generating unique values proved very time consuming; I plan to run a SELECT test on unique values in the near future.
VARCHAR(73) and CHAR(73) were both tested for efficiency, and there was no difference in the 5,000-record tests.

More to come on this topic,


by anna at May 28, 2015 07:17 PM

Hosung Hwang

Performance test of pHash

I performed a performance test of a perceptual hash algorithm: pHash.

Following is the specification of the test machine.

OS : Ubuntu 14.04 LTS
Processor : Intel Core i7-3517U CPU @ 1.90GHz x 4
OS type : 64-bit
Memory : 7.7 GiB
Disk : 117.6 GB SSD

Tests were performed from the internal SSD (mSATA) drive and from an external USB hard drive.

Read/Write benchmarking

Before the actual hashing test, I performed a simple read/write benchmark using the dd command. It writes and then reads an 8.2 GB file:

time sh -c "dd if=/dev/zero of=/media/hosung/BACKUP/test.tmp bs=4k count=2000000 && sync"
time sh -c "dd if=/media/hosung/BACKUP/test.tmp of=/dev/null bs=4k"

Each job was performed 5 times. The following are the average values.

Condition                            Speed
Internal SSD Write                   245 MB/s
Internal SSD Read                    440 MB/s
External HDD Write through USB 3.0   109 MB/s
External HDD Read through USB 3.0    122 MB/s
External HDD Write through USB 2.0   109 MB/s
External HDD Read through USB 2.0    129 MB/s
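Averaging the five runs for each condition can be done with a quick awk one-liner; the numbers below are illustrative stand-ins, since the individual dd readings are not listed:

```shell
# Average repeated dd throughput readings (values are illustrative).
awk 'BEGIN {
    n = split("245 246 244 247 243", v, " ")   # five runs, MB/s
    for (i = 1; i <= n; i++) sum += v[i]
    printf "avg: %.1f MB/s\n", sum / n
}'
# prints: avg: 245.0 MB/s
```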

USB 3.0 read speed was slightly faster than USB 2.0, and the internal SSD was about 4 times faster than either USB connection.

pHash performance test

The sample images were 900 jpg files, varying in size from 1.3 KB to 50.4 MB; the full set of sample files was 805.4 MB. For the test, I wrote C++ code that extracts hash values from all jpg images in a directory using the ph_dct_imagehash() function in pHash. The reason for writing a dedicated program was to avoid the process-startup time incurred when a shell script is used. Every test was performed 8 times, after rebooting.

Internal SSD

hosung@hosung-Spectre:~/cdot/ccl/PerHash/pHash/pHash-0.9.6/phash-test$ for F in {0..7}; do ./phashbench /home/hosung/cdot/ccl/hashtest/all-images; done
Elapsed secs to hashing 900 files : 292.419326
Elapsed secs to hashing 900 files : 290.789127
Elapsed secs to hashing 900 files : 291.163042
Elapsed secs to hashing 900 files : 290.769897
Elapsed secs to hashing 900 files : 290.710176
Elapsed secs to hashing 900 files : 290.940988
Elapsed secs to hashing 900 files : 290.880126
Elapsed secs to hashing 900 files : 290.766687

External HDD through USB 3.0 port

hosung@hosung-Spectre:~/cdot/ccl/PerHash/pHash/pHash-0.9.6/phash-test$ for F in {0..7}; do ./phashbench /media/hosung/BACKUP/all-images; done
Elapsed secs to hashing 900 files : 293.422019
Elapsed secs to hashing 900 files : 293.145768
Elapsed secs to hashing 900 files : 292.828859
Elapsed secs to hashing 900 files : 292.591345
Elapsed secs to hashing 900 files : 292.631436
Elapsed secs to hashing 900 files : 292.811508
Elapsed secs to hashing 900 files : 292.898119
Elapsed secs to hashing 900 files : 292.607773

External HDD through USB 2.0 port

hosung@hosung-Spectre:~/cdot/ccl/PerHash/pHash/pHash-0.9.6/phash-test$ for F in {0..7}; do ./phashbench /media/hosung/BACKUP/all-images; done
Elapsed secs to hashing 900 files : 294.008601
Elapsed secs to hashing 900 files : 292.954135
Elapsed secs to hashing 900 files : 292.275561
Elapsed secs to hashing 900 files : 292.255697
Elapsed secs to hashing 900 files : 292.459464
Elapsed secs to hashing 900 files : 292.737186
Elapsed secs to hashing 900 files : 292.803859
Elapsed secs to hashing 900 files : 292.605617


  • The USB 3.0 and USB 2.0 ports seem to make no difference.
  • Even when the test was performed from the internal SSD, the speed was only slightly faster.
  • In spite of a 4-times-faster read speed, hashing from the internal SSD was less than 1% faster than from the USB drive.
  • Therefore, in terms of hashing, CPU performance seems to be more important than I/O performance.
  • The other method in pHash, ph_mh_imagehash(), should be tested later.

by Hosung at May 28, 2015 02:34 PM

May 27, 2015

Justin Flowers

Working with Host-Only VBox Networks in CentOS6

In order to communicate between VMs, a simple alternative to fancy port forwarding is to set up a host-only network joining them. This is my go-to solution for test machines that need to talk to other machines. In CentOS 6 this can be quite difficult to figure out on your own, so here I’ll discuss how to set up your machines fully, in simple terms.

Step 1: Create the network in VirtualBox preferences

Before we can begin configuring boxes to be on this host-only network, we’ll need to make it first. This is relatively easy, thankfully. Simply go to VBox’s network preferences through File->Preferences->Network and hit the + button at the top right of the page to add a new host-only network.

Step 2: Connect VMs to host-only network

Note: to do this part, your VMs must be powered down.
Next we need to give the VMs access to this network. Go to the VM’s network settings by right-clicking on the machine and choosing Settings->Network. Once there, add a new adapter by clicking on one of the tabs at the top and checking Enable Network Adapter. Then simply pick Host-only Adapter; it should automatically select the first network in the list. Do this for all machines you want to communicate via the host-only network.

Step 3: Configure adapters on VMs

This is the hardest step, and it took me the longest to figure out. Begin by running:

ifconfig -a

This will show you a list of all adapters present on your machine. The one you’re looking for will match the hardware address created for each network adapter in step 2, although usually it’s the last Ethernet one in the list. Once you have the name of your adapter (likely eth1 if you only had a NAT adapter before), you can begin configuring it with:

sudo vi /etc/sysconfig/network-scripts/ifcfg-eth1

Substituting eth1 with the name of the adapter you found with ifconfig. In this new file, copy and paste:
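A minimal static configuration along these lines might look like the following (the 192.168.56.x address is only an example; the eth1 and adapter_mac placeholders are the substitutions explained next):

```
DEVICE=eth1
HWADDR=adapter_mac
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
```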


Again, substitute eth1 with whatever the name of your host-only adapter is, adapter_mac with the MAC address of your host-only adapter (which can be found with ifconfig -a or on the VBox machine network settings page), and the IP address with whichever one you want the machine to have.

Save that file and then run:

ifup eth1

Alternatively, if you know you will be destroying the machine soon and wish to configure this quickly, simply run:

sudo ifconfig eth1 <ip-address> netmask <netmask> up

However, the above command will not persist through reboots and will not keep your IP static, meaning your machine could be leased a new one if you’re working for a longer period of time.

If you’ve followed all the steps correctly you should now see your connection to the host-only network! Test it out with a ping to see if it works.

by justin at May 27, 2015 07:51 PM

Anna Fatsevych

MySQL Stress Test

I ran various commands through MySQL database testing for speed and size variations.

To check for size of the database I used this code:

SELECT table_schema "Data Base Name",
    sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB",
    sum( data_free ) / 1024 / 1024 "Free Space in MB"
FROM information_schema.TABLES
GROUP BY table_schema;

I am running a PHP file, and the MySQL commands are in a loop. Here is the code snippet showing how the processing time is calculated.

       // Start timing
       $time_pre = microtime(true);

        for ($x = 0; $x <= 50000; $x++) {
            $DB->query("INSERT INTO IMG(phash, bhash,author,license,directory) VALUES($hash,'{$bhash}','author','license','{$directory}')");
        }

        // Stop timing
        $time_post = microtime(true);
        $exec_time = $time_post - $time_pre;
        echo "\n\n TIME: ".$exec_time."\n";

The Results:


The difference in INSERT time between BLOB and VARCHAR(1024) was relatively small (52.54 seconds per 1,000 records for VARCHAR, versus 58.22 for BLOB). But the most significant difference was in size:


by anna at May 27, 2015 07:16 PM