Planet CDOT

August 28, 2015

Catherine Leung

End of Summer – Thank you!

This summer was a big one for CDOT (Centre for Development of Open Technology).  We had almost 30 RAs working on many different types of projects.  Yesterday was our final CDOT presentation day for the summer.

For those unfamiliar, CDOT is an applied research group within Seneca College.  Over the years we have worked on many different projects related to open technology in various areas.  I am one of the faculty members involved with CDOT.  Projects tend to run over the course of a semester or two.  Students are hired either while they are studying or just after they graduate.

My team worked on projects related to 3D web graphics.  This summer I had two projects.  The first was a project with Box to implement a 3D data visualization tool.  The second was with Gorilla Productions: implementing realistic cameras for three.js (cameras that behave as real-world cameras do).  In total I had 4 awesome research assistants working for me (Andrii, Barbara, Dmitry and Koji).  They have done a fantastic job on their projects.

However, aside from my two projects, CDOT also hosted many other projects.  One of the reasons that CDOT exists is to provide experiential learning opportunities to our students: a chance for them to work on real projects and develop skills.  One of those skills is the ability to talk about their work.  Thus, each Thursday we have “Demos”.  During that time, the teams talk about their projects: what they have been working on, problems they encountered, and solutions they found.  Yesterday was the last day, and everyone had a chance to share their thoughts on their summer’s work.

Generally the comments were all very positive, and a few themes were shared by multiple students.  These were:

  1. Being surrounded by other very smart people working on interesting projects has helped them develop as programmers
  2. Demo days helped them learn from other teams even when they were not working on remotely the same thing.
  3. Demo days made them less nervous about speaking in front of crowds
  4. CDOT provided opportunities to work on something important with a great amount of freedom to design and implement their work
  5. The CDOT experience has either helped them find a job (a couple of students will start in new positions right after Labour Day) or given them the confidence that they will be able to find one shortly.

I truly believe that CDOT is one of those places that are unique and special.  We have students who do very interesting work on a wide variety of projects.  Our students are very bright, and I have always admired their ability to exceed my expectations.  Many of you have taught me things I did not know.  To all the RAs at CDOT, know that you are the ones who make CDOT special.  When I see what you do, when I see what you have accomplished, I am reminded of why I teach.   Thank you!

by Cathy at August 28, 2015 10:35 PM

August 26, 2015

Justin Flowers

Performing Bulk Operations on Elasticsearch Databases

Elasticsearch is a powerful database technology that exposes its queries through a RESTful API. However, when it comes to updating and reindexing, Elasticsearch has no built-in functionality for performing these operations in bulk. This means that to do tasks like these, one must write a script composed of various API requests. Additionally, making requests is slow, so we want to reduce the number of requests as much as possible. The Elasticsearch team recognized this and included the scan and scroll options and the bulk API for such situations. This post will discuss performing mass updates and reindexing using both of these APIs. We will use Python here for simplicity, but this logic can be applied to any language.

Scan and Scroll

Before discussing the update or bulk API calls we should focus on the scan and scroll options in Elasticsearch. Including a scan search type parameter in your search API call makes Elasticsearch ignore sorting and rankings when returning the results of your query. This speeds up the whole process by allowing Elasticsearch to simply dump all results in no particular order. The scroll option (which is mandatory for scan search types) defines how long Elasticsearch should wait for the next scroll request to come in, as it internally keeps track of the last batch of results returned. So a scroll of 1m means it should wait at most 1 minute before clearing data about this search. An example API endpoint using scan and scroll (the same one used in the code below) would be:

http://localhost:9200/index/_search?search_type=scan&scroll=1m

The first call to that endpoint with a POSTed query will return a “_scroll_id” field (among other data regarding the query, such as total hits). It will not, however, return any of the actual hits of that query. To start receiving hits one must extract the “_scroll_id” field and send it as POST data to the scroll endpoint (http://localhost:9200/_search/scroll?scroll=1m). Then you will start receiving lists of hits matching that query. Performing an operation like this in Python would look like:

import requests
import json

# Define url and query (swap “index” for whatever index you're working with)
url = 'http://localhost:9200/index/_search?search_type=scan&scroll=1m&pretty'
# Example query that gets results between Aug 20 and 22
# Note that we define what fields to grab; this is important
query = """
{
    "fields": ["_id", "_type", "_index"],
    "query": {
        "range" : {
             "@timestamp" : {
                 "gte" : "2015-08-20T00:00:00.000Z",
                 "lte" : "2015-08-22T00:00:00.000Z"
             }
        }
    }
}
"""

# Make request posting query data
r = requests.post(url, data=query)

# Extract scroll ID
data = json.loads(r.text)
scrollId = data["_scroll_id"]

# Make a new request using the scroll ID
url = 'http://localhost:9200/_search/scroll?scroll=1m'
r = requests.post(url, data=scrollId)

# And extract hits
data = json.loads(r.text)
hits = data["hits"]["hits"]

This example will make the first request to get the scroll ID, extract the scroll ID, and then make another request to actually begin getting hits. Note that in the query we use the “fields” array to grab only the fields we need in the query; this is important, as you want to reduce unnecessary JSON parsing as much as possible. Now we can loop around this request extracting results until we receive no more hits.
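
The "loop until we receive no more hits" step can be sketched as a small generator. This is only an illustration: fetch_page is a hypothetical callable standing in for the scroll request (it takes a scroll ID and returns the decoded JSON response), and the canned pages below stand in for a real Elasticsearch server.

```python
def iter_scroll_hits(fetch_page, scroll_id):
    """Yield every hit from a scan/scroll search, one batch at a time.

    fetch_page is a hypothetical callable: it takes a scroll ID and returns
    the decoded JSON response of one scroll request.
    """
    while True:
        page = fetch_page(scroll_id)
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty batch means the scroll is exhausted
        for hit in hits:
            yield hit
        # Each response may carry a fresh scroll ID; always use the newest.
        scroll_id = page.get("_scroll_id", scroll_id)


# Canned responses standing in for Elasticsearch (assumed shape)
pages = {
    "s1": {"_scroll_id": "s2", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    "s2": {"_scroll_id": "s3", "hits": {"hits": [{"_id": "3"}]}},
    "s3": {"_scroll_id": "s3", "hits": {"hits": []}},
}
ids = [hit["_id"] for hit in iter_scroll_hits(pages.__getitem__, "s1")]
print(ids)
```

Keeping the HTTP call behind a callable makes the loop easy to try out without a running cluster; in real use fetch_page would wrap the scroll request shown above.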

The Bulk API

Elasticsearch’s bulk API is relatively easy to learn. Essentially you compile a list of JSON objects containing the type of operation and what to perform it on separated by newlines. If an operation would require more data than what to do and on what document (for example, an update) then you include this data as a line under the original operation. This means an update would look like:

{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }

This translates to updating the field “field2” in the document which has an ID of “1”, a type of “type1”, and the index “index1”. The update API will either add “field2” if it does not exist or modify its contents if it does already. Any more operations we need would be appended to this list. For example, two updates would look like:

{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field3" : "value3"} }

We would then post this string to:

http://localhost:9200/_bulk

In return we receive a list of results, one for each operation (success, failure, etc). To emulate this sort of functionality in code we would use:

import requests

# Example hit as returned by a scroll request (placeholder values)
result = {"_index": "index1", "_type": "type1", "_id": "1"}
poststring = ""

# Generate update string
updatecall = '{ "update" : { "_index" : "' + result["_index"] + '", "_type" : "' + result["_type"] + '", "_id" : "' + result["_id"] + '" } }'

# Generate update data
updatedata = '{ "doc" : {"field" : "test"} }'

# Append to poststring
poststring += updatecall + "\n" + updatedata + "\n"

# Send update request to bulk api
url = 'http://localhost:9200/_bulk'
r = requests.post(url, data=poststring)
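
As an alternative sketch (not the code used in this post), the bulk body can be built with json.dumps instead of string concatenation, which avoids quoting problems when an ID or value contains a quote character. The function names here are made up for illustration, and failed_items assumes the bulk response carries an "items" list in which each entry is keyed by the action name and holds a "status" code.

```python
import json


def bulk_update_body(hits, doc):
    """Build a bulk API request body that applies `doc` to every hit."""
    lines = []
    for hit in hits:
        action = {"update": {"_index": hit["_index"],
                             "_type": hit["_type"],
                             "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": doc}))
    # One JSON object per line, each line terminated by a newline.
    return "".join(line + "\n" for line in lines)


def failed_items(bulk_response):
    """Return (action, result) pairs whose status code is not 2xx."""
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if not 200 <= result.get("status", 0) < 300:
                failures.append((action, result))
    return failures


# Placeholder data for illustration
sample_hits = [
    {"_index": "index1", "_type": "type1", "_id": "1"},
    {"_index": "index1", "_type": "type1", "_id": "2"},
]
body = bulk_update_body(sample_hits, {"field": "test"})
sample_response = {"items": [
    {"update": {"_id": "1", "status": 200}},
    {"update": {"_id": "2", "status": 404}},
]}
print(body)
print(failed_items(sample_response))
```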

Putting it all Together

So now that we know how the scan-scroll search type, the bulk API, and the update command for the bulk API work, we can put it all together to create a loop which takes in results and updates data accordingly. For example:

import requests
import json

# Define url and query (swap “index” for whatever index you're working with)
url = 'http://localhost:9200/index/_search?search_type=scan&scroll=1m&pretty'

# Example query that gets results between Aug 20 and 22
# Note that we define what fields to grab; this is important
query = """
{
    "fields": ["_id", "_type", "_index"],
    "query": {
        "range" : {
             "@timestamp" : {
                 "gte" : "2015-08-20T00:00:00.000Z",
                 "lte" : "2015-08-22T00:00:00.000Z"
             }
        }
    }
}
"""

# Make request posting query data
r = requests.post(url, data=query)

# Extract scroll ID
data = json.loads(r.text)
scrollId = data["_scroll_id"]

# Make a new request using the scroll ID
url = 'http://localhost:9200/_search/scroll?scroll=1m'
r = requests.post(url, data=scrollId)

# And extract hits
data = json.loads(r.text)
hits = data["hits"]["hits"]

# Loop for all hits
poststring = ""
while len(hits) > 0:
    for result in hits:
        # Generate update string
        updatecall = '{ "update" : { "_index" : "' + result["_index"] + '", "_type" : "' + result["_type"] + '", "_id" : "' + result["_id"] + '" } }'

        # Generate update data
        updatedata = '{ "doc" : {"field" : "test"} }'

        # Append to poststring
        poststring += updatecall + "\n" + updatedata + "\n"

    # Send update request to bulk api
    url = 'http://localhost:9200/_bulk'
    r = requests.post(url, data=poststring)

    # Request next set of data, using the most recent scroll ID
    scrollId = data["_scroll_id"]
    url = 'http://localhost:9200/_search/scroll?scroll=1m'
    r = requests.post(url, data=scrollId)

    # Extract hits and reset poststring
    data = json.loads(r.text)
    hits = data["hits"]["hits"]
    poststring = ""

This program loops over the hits returned from the search, generates an update call for each, posts the changes to the bulk API, and requests the next list of hits until there are no more results to process. This demonstrates the basic logic of working with Elasticsearch changes in bulk. A similar strategy can be used for reindexing: simply take a result, delete it with the bulk API delete command, and then insert the same document into a different index, repeating this for all results. I hope this has helped you understand Elasticsearch in greater depth!
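
The reindexing idea can be sketched the same way. This is a rough illustration under a few assumptions: the function name and the index names are invented, and each hit must include its "_source", so the search would have to return full documents rather than only the metadata fields requested above. The sketch writes the copy before the delete so a document is never removed ahead of its replacement being queued.

```python
import json


def bulk_reindex_body(hits, new_index):
    """Build a bulk body that copies each hit to new_index, then deletes the old copy."""
    lines = []
    for hit in hits:
        target = {"_index": new_index, "_type": hit["_type"], "_id": hit["_id"]}
        original = {"_index": hit["_index"], "_type": hit["_type"], "_id": hit["_id"]}
        # "index" needs a second line holding the document itself...
        lines.append(json.dumps({"index": target}))
        lines.append(json.dumps(hit["_source"]))
        # ...while "delete" is a single line with no payload.
        lines.append(json.dumps({"delete": original}))
    return "".join(line + "\n" for line in lines)


# Placeholder hit for illustration
old_hits = [{"_index": "logs-old", "_type": "type1", "_id": "1",
             "_source": {"field2": "value2"}}]
reindex_body = bulk_reindex_body(old_hits, "logs-new")
print(reindex_body)
```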

by justin at August 26, 2015 01:51 PM

August 23, 2015

Hosung Hwang


pHash performs its mathematical operations on every pixel of the image at its original size. Therefore, when an image is resized, the result is slightly different depending on the image size. My assumption is that if every image bigger than a certain size is resized to that size before hashing, the general matching quality will be better.
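
The normalization rule described above (shrink only when the image is wider than the chosen size) can be sketched as follows. The function name is made up for illustration, and the sketch only computes the target dimensions; the actual resizing and hashing are left to whatever image library is in use.

```python
def normalized_size(width, height, max_width):
    """Return the (width, height) to hash at: shrink wide images, never enlarge."""
    if width <= max_width:
        return width, height  # already small enough, hash as-is
    scale = max_width / width
    # Keep the aspect ratio; never let the height collapse to zero.
    return max_width, max(1, round(height * scale))


print(normalized_size(4000, 3000, 2000))
print(normalized_size(800, 600, 2000))
```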

I tested the same set of image samples as in the previous post; however, because of the time the processing takes, the comparison was performed on 3644 images.

To find which size is good for normalization, I resized the images to widths of 2000, 1500, and 1000 pixels. I then calculated the Hamming distance for images resized from 90% down to 10%, as in the previous test.


Hamming distance greater than 4

normalization size 2000

        Image count                                     |  Percentage of images (%)
Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000  |  5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
100%         0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
90%          0       0       0       0       6      17  |    0.00    0.00    0.00    0.00    0.08    0.23
80%          0       0       0       1      12      19  |    0.00    0.00    0.00    0.01    0.16    0.25
70%          0       0       0       1      18      36  |    0.00    0.00    0.00    0.01    0.24    0.48
60%          0       0       0      12      48      87  |    0.00    0.00    0.00    0.16    0.64    1.16
50%          0       0       3      26      77     141  |    0.00    0.00    0.04    0.35    1.03    1.89
40%          0       0       9      62     172     272  |    0.00    0.00    0.12    0.83    2.30    3.64
30%          1      12      54     156     333     475  |    0.01    0.16    0.72    2.09    4.45    6.35
20%         27      99     246     424     693     851  |    0.36    1.32    3.29    5.67    9.27   11.38
10%        163     360     753    1093    1442    1636  |    2.18    4.82   10.07   14.62   19.29   21.89

normalization size 1500

        Image count                                     |  Percentage of images (%)
Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000  |  5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
100%         0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
90%          0       0       0       0       2      13  |    0.00    0.00    0.00    0.00    0.03    0.17
80%          0       0       0       0       7      14  |    0.00    0.00    0.00    0.00    0.09    0.19
70%          0       0       0       1      15      33  |    0.00    0.00    0.00    0.01    0.20    0.44
60%          0       0       0       2      25      64  |    0.00    0.00    0.00    0.03    0.33    0.86
50%          0       0       0       7      46     110  |    0.00    0.00    0.00    0.09    0.62    1.47
40%          0       0       4      25     123     223  |    0.00    0.00    0.05    0.33    1.65    2.98
30%          0       0      18      86     247     389  |    0.00    0.00    0.24    1.15    3.30    5.20
20%          6      27     116     257     520     678  |    0.08    0.36    1.55    3.44    6.96    9.07
10%        137     308     654     969    1313    1507  |    1.83    4.12    8.75   12.96   17.57   20.16

normalization size 1000

        Image count                                     |  Percentage of images (%)
Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000  |  5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
100%         0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
90%          0       0       0       0       0      11  |    0.00    0.00    0.00    0.00    0.00    0.15
80%          0       0       0       0       0       7  |    0.00    0.00    0.00    0.00    0.00    0.09
70%          0       0       0       0       5      23  |    0.00    0.00    0.00    0.00    0.07    0.31
60%          0       0       0       0       6      45  |    0.00    0.00    0.00    0.00    0.08    0.60
50%          0       0       0       0      26      90  |    0.00    0.00    0.00    0.00    0.35    1.20
40%          0       0       0       3      56     156  |    0.00    0.00    0.00    0.04    0.75    2.09
30%          0       0       2      17     132     274  |    0.00    0.00    0.03    0.23    1.77    3.67
20%          0       4      39     122     354     512  |    0.00    0.05    0.52    1.63    4.74    6.85
10%         61     161     406     679     999    1193  |    0.82    2.15    5.43    9.08   13.36   15.96

Hamming distance greater than 6

normalization size 2000

        Image count                                     |  Percentage of images (%)
Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000  |  5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
100%         0       0       0       0       0       0  |    0.00    0.00    0.00    0.00    0.00    0.00
90%          0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
80%          0       0       0       1       2       3  |    0.00    0.00    0.00    0.01    0.03    0.04
70%          0       0       0       0       4      11  |    0.00    0.00    0.00    0.00    0.05    0.15
60%          0       0       0       0       8      21  |    0.00    0.00    0.00    0.00    0.11    0.28
50%          0       0       0       6      20      46  |    0.00    0.00    0.00    0.08    0.27    0.62
40%          0       0       4      21      46      94  |    0.00    0.00    0.05    0.28    0.62    1.26
30%          0       0      11      45     106     175  |    0.00    0.00    0.15    0.60    1.42    2.34
20%          4      14      63     142     286     381  |    0.05    0.19    0.84    1.90    3.83    5.10
10%         59     153     347     539     752     869  |    0.79    2.05    4.64    7.21   10.06   11.63

normalization size 1500

        Image count                                     |  Percentage of images (%)
Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000  |  5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
100%         0       0       0       0       0       0  |    0.00    0.00    0.00    0.00    0.00    0.00
90%          0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
80%          0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
70%          0       0       0       0       2       9  |    0.00    0.00    0.00    0.00    0.03    0.12
60%          0       0       0       0       6      19  |    0.00    0.00    0.00    0.00    0.08    0.25
50%          0       0       0       1      10      36  |    0.00    0.00    0.00    0.01    0.13    0.48
40%          0       0       0       8      28      76  |    0.00    0.00    0.00    0.11    0.37    1.02
30%          0       0       3      26      81     150  |    0.00    0.00    0.04    0.35    1.08    2.01
20%          1       4      30      88     221     316  |    0.01    0.05    0.40    1.18    2.96    4.23
10%         39      99     257     433     639     756  |    0.52    1.32    3.44    5.79    8.55   10.11

normalization size 1000

        Image count                                     |  Percentage of images (%)
Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000  |  5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
100%         0       0       0       0       0       0  |    0.00    0.00    0.00    0.00    0.00    0.00
90%          0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
80%          0       0       0       0       0       1  |    0.00    0.00    0.00    0.00    0.00    0.01
70%          0       0       0       0       1       8  |    0.00    0.00    0.00    0.00    0.01    0.11
60%          0       0       0       0       2      15  |    0.00    0.00    0.00    0.00    0.03    0.20
50%          0       0       0       0       9      35  |    0.00    0.00    0.00    0.00    0.12    0.47
40%          0       0       0       0      14      62  |    0.00    0.00    0.00    0.00    0.19    0.83
30%          0       0       0       4      35     104  |    0.00    0.00    0.00    0.05    0.47    1.39
20%          0       0      11      39     138     233  |    0.00    0.00    0.15    0.52    1.85    3.12
10%         14      38     135     270     449     566  |    0.19    0.51    1.81    3.61    6.01    7.57




According to the test results, resizing before hashing gives better matching percentages, so it can be a solution for better matching. However, the false-positive matching percentage is also important and has to be taken into account.

by Hosung at August 23, 2015 03:10 AM

August 22, 2015

Hosung Hwang

DCT Hash matching quality for resized images

The DCT hash in pHash was selected as the image similarity search algorithm for the Creative Commons image license search. Recently, we found that some images are not matched when they are resized, so I tested this with Flickr CC images.

First, I resized each image to 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, and 10% of its original size. Each resized image was hashed, and its Hamming distance from the hash of the 100% image was calculated. Since image size matters, I categorized the images by original width: bigger than 5000 pixels, 4000~5000, 3000~4000, 2000~3000, 1000~2000, and smaller than 1000 pixels.

Total image count was 7475 images.
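
The measurement behind the tables below can be sketched in a few lines. This is only an illustration, assuming each hash is held as a non-negative integer (pHash's DCT hash is a 64-bit value); the function names and the sample distances are invented.

```python
def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def count_over_threshold(distances, threshold, total):
    """Return (count, percentage of total) of distances greater than threshold."""
    over = sum(1 for d in distances if d > threshold)
    return over, round(100.0 * over / total, 2)


# Two made-up hashes that differ in exactly two bits
print(hamming_distance(0b1011, 0b0001))

# Made-up distances for five images, checked against a threshold of 4
print(count_over_threshold([0, 2, 5, 7, 4], 4, 5))
```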

Number of images whose Hamming distance is greater than 4

Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
90%          5      21      31      39      53      71
80%          8      28      46      58      80      97
70%         13      32      60      85     128     172
60%         23      71     123     170     244     322
50%         30      97     173     246     359     491
40%         65     182     344     490     712     908
30%        125     339     626     861    1217    1519
20%        236     577    1079    1472    2012    2349
10%        505    1080    1983    2698    3419    3823

Percentage of images whose Hamming distance is greater than 4

Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
90%       0.07    0.28    0.41    0.52    0.71    0.95
80%       0.11    0.37    0.62    0.78    1.07    1.30
70%       0.17    0.43    0.80    1.14    1.71    2.30
60%       0.31    0.95    1.65    2.27    3.26    4.31
50%       0.40    1.30    2.31    3.29    4.80    6.57
40%       0.87    2.43    4.60    6.56    9.53   12.15
30%       1.67    4.54    8.37   11.52   16.28   20.32
20%       3.16    7.72   14.43   19.69   26.92   31.42
10%       6.76   14.45   26.53   36.09   45.74   51.14

Number of images whose Hamming distance is greater than 6

Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
90%          0       1       1       1       2       4
80%          2       3       7      10      12      17
70%          4       6      14      20      27      38
60%          6      13      23      35      50      76
50%         11      22      42      59      83     129
40%         19      58      99     140     207     297
30%         27     102     195     286     425     577
20%         79     227     441     612     896    1091
10%        249     579    1064    1475    1907    2159

Percentage of images whose Hamming distance is greater than 6

Scale   5000 <  4000 <  3000 <  2000 <  1000 <  < 1000
90%       0.00    0.01    0.01    0.01    0.03    0.05
80%       0.03    0.04    0.09    0.13    0.16    0.23
70%       0.05    0.08    0.19    0.27    0.36    0.51
60%       0.08    0.17    0.31    0.47    0.67    1.02
50%       0.15    0.29    0.56    0.79    1.11    1.73
40%       0.25    0.78    1.32    1.87    2.77    3.97
30%       0.36    1.36    2.61    3.83    5.69    7.72
20%       1.06    3.04    5.90    8.19   11.99   14.60
10%       3.33    7.75   14.23   19.73   25.51   28.88



The results show that when an image is resized, some images can no longer be detected. A possible solution is to resize an image to a certain size before hashing whenever it is bigger than that size. I tested this with widths of 2000, 1500, and 1000.

by Hosung at August 22, 2015 01:30 PM

August 21, 2015

Justin Flowers

Optimizing Logstash Performance

Logstash has a wide array of powerful filters, from the ones shipped with it to community-maintained plugins. However, Logstash’s speed can be compromised when these filters are not used properly. I encountered this problem while performance testing a setup we’ve been working on for a CRM company. During times of heavy load Logstash was capable of handling our approximately 40,000 logs per minute (roughly 2.4 million per hour). During heavy loads with backlogs, however, Logstash was not able to keep up. It would take hours for it to catch up, which was simply not acceptable. Here are some tips I have for optimizing Logstash performance.

1. Isolate performance problems

Before one can go about solving a performance problem, one must figure out where the problem is in the first place. I knew that there were 3 possible problems with our Logstash installation:

  • Input plugins were blocked and could not take in logs any faster
  • Filters were using too much CPU and taking too long to handle
  • Outputs were blocked due to I/O speeds or Elasticsearch

To identify what the problem was, I used three main tools:

  • strace to follow the Logstash input, filter, and output threads
  • htop to view the CPU usage of inputs, filters, and outputs
  • iotop to view the I/O of the system during these backlogs

You can find out more about these tools online, but suffice it to say you can learn quite a bit about your system with them. The strategy that I used was to remove plugins one by one while monitoring stats. If, after removing a plugin, I still saw the performance hit, then I knew it was one of the remaining plugins. Using these monitoring tools with this strategy I learned two things: that the bulk of the performance problem was in one type of filter, the multiline filter, and that Logstash was only using one thread to perform all filtering. I needed multiline because I needed to concatenate Java exceptions into one log. What I learned from this was…

2. Never, ever, ever use the multiline plugin on the Logstash side

The multiline plugin requires that Logstash only use one worker thread for filtering. This is because order is very important when using the multiline filter; to concatenate a log with the last one that came in, you must know exactly what the last log was. So, whenever possible, avoid using this plugin. If you really need the multiline filter, then consider using Jason Wood’s Log-Courier, a modification of Logstash Forwarder which includes plugin support to perform multiline operations on the forwarding side, not on the Logstash side. This allows you to increase Logstash’s worker threads while still having multiline filters performed on incoming logs.

3. Drop needless logs

Another valuable way to increase Logstash performance is to preemptively drop needless logs. This can be done via the “drop{}” plugin on the Logstash side, or via Log-Courier’s “filter” codec on the forwarding side. This is a pretty simple idea to grasp: send fewer logs to your centralizer and it will have less work to do in the long run. Remember, working with Logstash is a numbers game. Doing less work for each log, even if the saving is minimal, will have significant results in the long run.

4. Increase worker threads

Worker threads are allocated for performing filters. You can control this setting by adding the “-w” parameter to LS_OPTS in the Logstash service file located at /etc/init.d/ on Red Hat based systems. Explicitly telling Logstash to split up work among multiple threads will increase speed a lot. Obviously you should not let Logstash have so many threads that your OS comes to a grinding halt, so work on modulating this value to the specifications of your system.

5. Optimize regular expressions

Again, working with Logstash filters is a numbers game. Even a slight reduction in regular expression processing will increase your speed significantly when you’re taking in 40,000 logs per minute. I suggest using a regex testing tool to figure out how efficient your current regex is and how to optimize it further. I also suggest reading up on how to optimize regex.

6. Selectively apply regular expressions

Technically this idea should be in the last section, but it had such a great impact on my system that I felt it should be emphasized on its own. Selectively applying regex based on the size of a log is a great way to reduce your overall processing needs. In my case, 90% of logs were less than 400 characters long. 10%, however, were quite long: in the 1000-35,000 character range. I knew that any log above 1000 characters in length was either a long running query (LRQ) or an exception. And given that, at the top level of our filter logic, a log would have to fail 9 steps of regex before being marked as generic, I knew it would be helpful to have these longer logs skip all but the regular expressions for parsing LRQs and exceptions. I therefore used a range filter to determine the size of a log and give long logs a “long” tag. Based on whether the “long” tag existed I could selectively apply regex. Using this method increased the speed of our setup dramatically, and it teaches a good lesson: try to apply only as much regex as necessary. You can use size, type, host, or any of your own fields to selectively apply regex, but it’s important that you have some method of limiting how many regular expressions each log has to go through in your filters.
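
To make the idea concrete outside of Logstash configuration, here is a small Python illustration of the same logic. It is not the filter configuration we used: the patterns, names, and sample logs are invented, and only the 1000-character cut-off comes from the description above.

```python
import re

LONG_LOG_THRESHOLD = 1000  # characters; the cut-off described above

# Hypothetical patterns, for illustration only.
LONG_PATTERNS = [
    ("exception", re.compile(r"^\S+(Exception|Error)\b")),
    ("long_running_query", re.compile(r"^LRQ\b")),
]
SHORT_PATTERNS = [
    ("login", re.compile(r"^LOGIN user=(?P<user>\S+)")),
    ("request", re.compile(r"^(GET|POST) (?P<path>\S+)")),
]


def classify(log):
    """Try only the patterns that can apply to a log of this length."""
    if len(log) > LONG_LOG_THRESHOLD:
        patterns = LONG_PATTERNS   # the "long" tag: skip everything else
    else:
        patterns = SHORT_PATTERNS
    for name, pattern in patterns:
        if pattern.match(log):
            return name
    return "generic"


print(classify("GET /index.html"))
print(classify("java.lang.NullPointerException: " + "x" * 2000))
```

A long log is checked against two patterns instead of the whole chain, which is where the saving comes from when most of the expensive failures are on long lines.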

And that’s it! Those were the most powerful methods I found of optimizing our Logstash performance with backlogs. Remember, getting Logstash to run quickly is a numbers game: an optimization seemingly small when viewed one log at a time means huge returns in the long run.

by justin at August 21, 2015 01:50 AM

August 19, 2015

LEAP Project

Installations – Install Trees and Composing 2: Overriding and the Anaconda Galaxy

Continuing from the last post on Install Trees and composing, this time we’ll talk about a related idea and an activity I regularly engaged in while getting the installer to work as it should for LEAP.

The idea of Overriding, as it relates to composing images, was to use the --source option in Lorax to specify additional repositories beyond the base repository to be included in the composition. These additional repos would contain packages that may override packages in our base repo because their versions are newer. While this option is normally used simply to expand the number of repos included in the install tree, I was using it as a point of testing. To better illustrate why we did this, let’s delve into the journey of taming the Anaconda installer to work for us.

Prior to me joining the LEAP team, the installer had been made to work in text mode and with kickstart installations. The graphical side wasn’t quite functioning yet. LEAP is built off CentOS sources, so naturally we had attempted to use the Anaconda (19.31.123) from their distribution. For reasons we did not yet understand at the time, we were having issues getting the installer to do its job. To provide this necessary component for our earlier alpha release, we used a version of Anaconda that was more recent than the one from CentOS and crafted the packages that this new Anaconda depended on to make it usable. We placed this new Anaconda and its dependencies in another repository, which we dubbed the Override repo, as these packages would override equivalent but older packages in our base repository.

This was a rather trial-and-error sort of process: we would start with the Anaconda packages in the Override and then attempt to use lorax to compose, only to discover that we needed yet more dependencies or that there were error codes to decipher. Eventually we got to the point where we had text mode. When I was put onto this aspect of the project to get the graphical mode working, we unfortunately had an accident where we unknowingly wiped out the work that had previously been done with the Override. So I had to retrace the steps taken months earlier by my coworker Andrew Oatley Willis, who had done the groundwork.

Thus began my travels in the Anaconda galaxy:



Knowing about the team’s efforts with 19.31, I decided to press on from that and start with a Fedora 21 version of Anaconda as the base. I quickly realized that 21.15 was not a good starting point, as that version did not have support for Kickstart installations. Moving on from there I attempted to use 21.35, which was the latest version in the 21.X line before there would be dependency conflicts between its requirements and the rest of our main LEAP repo. Here we hit an error with Metacity (a window manager) that we weren’t able to resolve. Initially the Metacity package had yet to be built for our repo, so we had thought that to be the issue, but even after it became part of our package set the issue did not disappear. Andrew then suggested I attempt a Fedora 22 version of Anaconda and work from there.


With 22.4 I reached some degree of success. We had gotten past the Metacity error that had us stumped, but now we had a bootloader error. It seemed as though the bootloader was not being installed to the right location. Luckily, fixing this only required a simple patch that changed the location of the bootloader directory to where we expected things to be installed. After this, the text-based kickstart installations began to work. We had finally retraced the steps to where we had left off before. Despite all that, VNC was still not all there. Errors regarding gtk and gdk had started cropping up. To bypass these issues I attempted to override the relevant gtk packages in the LEAP repo with ones that would supposedly make it work. After numerous tries at circumventing dependency issues we somehow managed to get the graphical installer to at least run, but it still wasn’t able to install, as it could not pick up any installation sources.

I tried even more different releases of Fedora 22 Anaconda after this point to see if we could get it done as it seemed we were just so close to the goal. In the end it was a fruitless road.


So where does one go now, having traveled the dark reaches of space with nothing to show for it? The place I went was back to the beginning of this expedition. We returned to Anaconda 19.31. Having done so much trial and error, I thought I would return to the start of things to reexamine why the original sources just wouldn’t work for us. I started to delve through the source code, and the conclusion was that it should have all the necessary aarch64 support to work on our platform. In short, it should be working. So why wasn’t it? To find out I stripped out our Override repo, did a new compose and started a PXE boot. To my surprise it looked as though it was working both in text mode and graphically. Only one error came up, which was the bootloader error from before. After applying the same patch as earlier, everything just worked.

It turns out that the reason we were unable to leverage the graphical portion of Anaconda in earlier attempts was that we had yet to complete the building of the full package set at the time. Now that we had all the pieces we had been missing, a simple patch was all that was needed to get a functional installer. Funny how these things can take you on a journey only for the end point to be where we started all along. A story for the next generation of LEAP developers.

Back to the point where this post started: by using the --source option with lorax you can add additional repositories to be included in the compose. We used it to do a fair amount of testing of individual packages to get our installer working by slowly including extra packages (though by the end of it we didn’t need to override very much at all). While the methodology was very much trial and error, it allowed me to learn a variety of things about installers. Hopefully you don’t have to retrace my steps though.
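
As a rough sketch of what such a compose invocation looks like, the helper below only assembles a lorax command line with the base repo plus any number of override repos, each passed with its own --source. The helper name, product details, repository URLs, and output directory are all placeholders, and the -p, -v, and -r options are written from my understanding of lorax’s command line, so check lorax --help for your version.

```python
def lorax_command(product, version, release, base_repo, override_repos, output_dir):
    """Assemble a lorax argument list; every value passed in is a placeholder."""
    cmd = ["lorax", "-p", product, "-v", version, "-r", release,
           "--source", base_repo]
    # Each extra repository gets its own --source option.
    for repo in override_repos:
        cmd += ["--source", repo]
    cmd.append(output_dir)
    return cmd


cmd = lorax_command(
    "LEAP", "7", "1",
    "http://example.com/leap/base/",          # hypothetical base repo
    ["http://example.com/leap/override/"],    # hypothetical Override repo
    "/tmp/leap-compose",
)
print(" ".join(cmd))
```

The list form can be handed to subprocess.run directly once the placeholder values are replaced with real ones.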

by hzhuang3 at August 19, 2015 08:33 PM

August 18, 2015

Barbara deGraaf

Designing the UI-The colour scheme

For the past while my focus has shifted to work on the UI while my co-worker works on the shader file.

The first issue I came up with was what colours to use for the UI. So this is a short post on my thought process behind picking colours.

Even though blues are a popular choice for UI, I wanted to use a different colour scheme. I started by taking inspiration from the colours commonly found in cameras. The colours I found in a lot of different cameras were blacks, dark greys and silvers. For the UI I liked the idea of using silver, so I found a nice sleek silver tone I liked, #D7DBE1, and then created the UI with that.

For the splash screen it was stated that a blackish background with a light font was wanted, so I went from there. I know that straight black and white is too harsh/jarring for users to look at, so I went with a dark grey for the background and a light grey for the font. The background grey I used was #2D2F31; my hope with this colour is that it reminds someone of the colour of the body of a camera (a really dark grey). The colour I used for the font was #DCD8D6; to find this one I went online and looked for light greys that would go well with the dark grey I used as the background.

The only other thing to mention in terms of design for the UI is the font. In order to find this I went on bootstrap theme pages and design websites and found a sleek typography that I thought suited the UI. One that I liked was the Cosmo theme on Bootswatch. This theme uses the Source Sans Pro font found on Google Fonts.

That’s it mostly for the design concepts that went into the UI. Next up I will talk about the scripting behind the UI.



by barbaradegraafsoftware at August 18, 2015 07:09 PM

August 14, 2015

Ali Al Dallal

JSConf Training Track - Testing IE on Linux or Mac

I know this is probably a bit old now, but I have to share this post anyway, since it has been sitting in my drafts for a while and I think it is a good post, and good for my own record if I want to look back at something :)

One of the tracks at JSConf 2015 was how to test IE on Linux or Mac, by Rey Bango, which I found very useful, especially now that Microsoft is starting to get more attention and is doing really well in terms of supporting the open web. Many people might still disagree on this point, but Rey also made a good point in his session: the more different browsers we use and support in our apps, the better it will be for us as consumers and developers.

NOTE: To be exact, what Rey actually said is that this is so that no single engine dominates the web; the above is my interpretation :P

There are different ways to test IE on Mac or Linux. Most of the stuff you can look from or

There is also IE Compatibility Cookbook

Well, there is another question you might ask before going any further: why should I test my web application on IE? Simple answer... 55% of the users online still use IE!

Also, why don't big corporations upgrade from IE 6, 7 or 8 to IE 11 or Edge? Obviously it's a large investment for them to upgrade, and they have made their applications very specific to one browser version. I'm sure you don't want your bank to upgrade its software without good testing, right?

So, let's continue: how do I test Internet Explorer and Microsoft Edge on Linux or Mac?

In the session Rey introduced us to Microsoft Remote Desktop (Mac download link) and to using ngrok to tunnel your application from localhost.

That's pretty much it! You just need Microsoft Remote Desktop and ngrok: tunnel your application with ngrok, take the URL it gives you, open it in the Remote Desktop client, and you have IE on your Mac or Linux machine to test your app!

If you have any trouble just hit me up on Twitter @alicoding and I'm happy to help :)

by Ali Al Dallal at August 14, 2015 06:58 PM

August 13, 2015

Anna Fatsevych

pHash in JavaScript Tests

There are a few differences in my implementation of pHash on the client in JavaScript (here) and the original

First, I use HTML5’s FileReader and Canvas APIs to retrieve and decode image data.
CImg stores images in a 4-dimensional array: width,height,depth,dimension (x,y,z,c), whereas Canvas produces pixel data in a linear array, where the values for the Red, Green, Blue and Alpha channels are stored one after another for each pixel. After getting the image data from canvas, my implementation creates a 2-dimensional array indexed by x and y, which is then filled with the greyscaled image; since the Red, Green and Blue channels of a greyscale image all have the same value, only one channel needs to be kept.

The source values for each pixel (R,G,B) sometimes differ from those decoded by CImg: visually the image looks the same, but the B (Blue) channel value can differ by 0-25.

After applying the CImg grey-scale formula to the collected data, with 100 images tested, the values were matching, and the source pixel variations did not play a role.
This is the formula:

greyscale = (66*(data[i]) + 129*data[i+1] + 25*data[i+2] + 128)/256 + 16;
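
As a rough illustration of this step, here is a minimal sketch of turning Canvas pixel data into a 2-D greyscale array with the formula above. The function name, the row-first indexing and the decision to leave the result unrounded are assumptions made for the sketch, not details taken from my actual implementation.

```javascript
// Convert Canvas RGBA pixel data into a 2-D greyscale array.
// `data` is the flat array from ImageData.data: [R, G, B, A, R, G, B, A, ...].
// Returns grey[y][x]; indexing by row first is a choice made for this sketch.
function toGreyscale(data, width, height) {
  var grey = [];
  for (var y = 0; y < height; y++) {
    var row = [];
    for (var x = 0; x < width; x++) {
      var i = (y * width + x) * 4; // 4 values per pixel; alpha (i + 3) is ignored
      // Formula quoted in the post; the result is left unrounded here (assumption).
      row.push((66 * data[i] + 129 * data[i + 1] + 25 * data[i + 2] + 128) / 256 + 16);
    }
    grey.push(row);
  }
  return grey;
}
```

In a browser, `data` would typically come from `context.getImageData(0, 0, width, height).data` after drawing the image onto a canvas.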

After the greyscale pixels are initialized in the 2-D array, the convolve (get_correlate()) filter is then applied with a mask of 7×7 pixels, with the value of 1 (more on correlate here).

With around 100 images tested, this produced accurate results that were matching with those of CImg’s get_correlate() function.
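
To make the filter step more concrete, the following sketch sums each pixel's 7×7 neighbourhood, which is what a correlation with a mask of all 1s amounts to. How the edges of the image are handled is an assumption here (coordinates are clamped to the nearest edge pixel); CImg's own boundary behaviour may differ, and the function name is made up for the example.

```javascript
// Sum each pixel's 7x7 neighbourhood (a correlation with a mask of all 1s).
// `grey` is an array of rows (grey[y][x]).
// Edge handling is an assumption: coordinates outside the image are clamped
// to the nearest edge pixel.
function correlate7x7(grey) {
  var height = grey.length;
  var width = grey[0].length;
  var half = 3; // a 7x7 mask reaches 3 pixels in each direction
  var out = [];
  for (var y = 0; y < height; y++) {
    var row = [];
    for (var x = 0; x < width; x++) {
      var sum = 0;
      for (var dy = -half; dy <= half; dy++) {
        for (var dx = -half; dx <= half; dx++) {
          var sy = Math.min(height - 1, Math.max(0, y + dy));
          var sx = Math.min(width - 1, Math.max(0, x + dx));
          sum += grey[sy][sx]; // every mask value is 1, so no multiplication is needed
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}
```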

The next step is the resize() function, which turned out to be simpler than I had anticipated, and is based on the proportional re-size (oldDimension/newDimension).
This function has produced accurate pixel value results when tested.
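
One way a proportional resize based on the oldDimension/newDimension ratio can look is nearest-neighbour sampling, sketched below. This is only an illustration under that assumption; it is not confirmed to match CImg's resize() or my own code, and the function name is hypothetical.

```javascript
// Nearest-neighbour resize of a 2-D array (src[y][x]) to newWidth x newHeight.
// Each output pixel copies the source pixel found by scaling its coordinates
// with the ratio oldDimension / newDimension.
function resizeNearest(src, newWidth, newHeight) {
  var oldHeight = src.length;
  var oldWidth = src[0].length;
  var xRatio = oldWidth / newWidth;
  var yRatio = oldHeight / newHeight;
  var out = [];
  for (var y = 0; y < newHeight; y++) {
    var row = [];
    var sy = Math.min(oldHeight - 1, Math.floor(y * yRatio));
    for (var x = 0; x < newWidth; x++) {
      var sx = Math.min(oldWidth - 1, Math.floor(x * xRatio));
      row.push(src[sy][sx]);
    }
    out.push(row);
  }
  return out;
}
```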

After that, two matrices are created 32×32 pixels each. The values are constant and identical to those of pHash function.

Next, the resized image is multiplied by the matrices consecutively. CImg uses floats, and the results do not match completely: variances remain when tested with various float precisions in JavaScript. This is where the main variance appears, producing hash hamming distances of 2, and sometimes 4, in about 1% of tested images.
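
For readers unfamiliar with the term, the hamming distance between two hashes is simply the number of bit positions in which they differ. A minimal sketch, assuming both hashes are available as strings of 1's and 0's (the function name is illustrative):

```javascript
// Count the positions at which two equal-length bit strings differ.
function hammingDistance(bitsA, bitsB) {
  if (bitsA.length !== bitsB.length) {
    throw new Error('Hashes must have the same length');
  }
  var distance = 0;
  for (var i = 0; i < bitsA.length; i++) {
    if (bitsA.charAt(i) !== bitsB.charAt(i)) {
      distance++;
    }
  }
  return distance;
}
```

A distance of 0 means the two hashes are identical; a distance of 2 means they differ in exactly two bits.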

Testing: Methods and Troubles

To test the JS implementation, I fed it the greyscale pixel values taken directly from the original pHash implementation, to make sure the algorithm itself worked correctly and that any remaining issue lay only in decoding the pixel values.

This produced great results: I discovered that the hashes were matching, with only slight variances in the ending digits, producing distances of 6, 7, and 8. This led me to reexamine my code, and also take a closer look at the hash results. pHash’s dct_imagehash() function produces a 64-bit hash containing 1’s and 0’s based on the median variance. It looks something like this:


This is then converted to a 64-bit unsigned integer, and the result is returned.
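
As a sketch of how a string of 1's and 0's can be derived from a median, the snippet below emits a 1 for every value above the median and a 0 otherwise. Both the comparison rule and the way the median is computed for an even number of values (the mean of the two middle values) are assumptions made for illustration, not a statement of what pHash does internally.

```javascript
// Turn a list of numbers into a bit string: '1' where the value is above
// the median, '0' otherwise. For an even count, the median is taken as the
// mean of the two middle values (assumption).
function medianHashBits(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  var median = sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
  return values.map(function (v) { return v > median ? '1' : '0'; }).join('');
}
```

With 64 input values this yields a 64-character string of the same shape as the hash described above.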

My JS implementation produced the same 64-digit string of 1’s and 0’s, which was then sent via an AJAX request to the server, where I used PHP’s bindec() function to convert it to decimal. Upon further inspection and research, and using this decimal to binary converter, I realized that the sequences of 1’s and 0’s were matching, and that the differences were caused by the decimal conversions. This makes sense, seeing as PHP supports signed 64-bit integers only, NOT unsigned.

My colleague Ho Sung has a blog post about the solution to this issue, which is to convert the results to a hexadecimal string, which is possible in both JS and C++.

The following images were tested:

  • 1,000 images sized 32×32 were tested, producing one error – JavaScript hashed a completely black image with a hash value different from pHash. A solution for this will be implemented.
  • 1,000 images sized 250 pixels were tested, with no distance over 4, and 8 with distance of 2, the remaining distances were all 0 – a complete match.
  • 1,000 images of various sizes were tested, with 8 distances of 2, and the rest at 0.

Google Chrome was crashing, as it was taking up all the CPU power on images sized 7MB or more. This issue was resolved by combing through my code using the JS alert() method, which pauses the script and allows for further debugging, along with the console.log() method.

  • 20 images sized 15-20 MB were tested successfully, producing no crash if hashed individually, and hamming distance of 0.
  • 15MB images averaged 150 seconds = 2.5 minutes
  • 20MB images averaged 200 seconds = 3.33 minutes

When trying to hash multiple large images, Chrome crashed. There is more testing to be carried out to optimize this process.

One solution is to resize the image with canvas and then hash it. A handful of images were tested using this method, with unsatisfactory results, as the distances increased (8-12).
A second solution would be to resize using the CImg method; this has not been tested yet.

More on the testing and implementation in near future.



by anna at August 13, 2015 10:16 AM

August 11, 2015

Hosung Hwang

64bit unsigned long long type transfer between Javascript and C++ Daemon

Currently, the APIs to add and match image licenses take a pHash value that is extracted from the image. This hash value is 64 bits of binary data. For fast processing, the database and the C++ daemon handle it as an unsigned long long. However, while Anna was developing the Javascript pHash module, a problem appeared: when the Javascript code printed the output hash value, the last 4 or 5 characters were wrong. That was because Javascript numbers can only represent integers exactly up to 2^53.

  • Max value of integer in Javascript :
    2^53 : 9007199254740992 : 0x20000000000000

  • Max value of unsigned long long :
    2^64 - 1 : 18446744073709551615 : 0xFFFFFFFFFFFFFFFF
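
The limit is easy to reproduce in a Javascript console. The following small sketch (variable names are only illustrative) shows that integers stop being distinguishable above 2^53, and that a 64-bit value of all 1's does not survive a round trip through a Javascript number:

```javascript
// Above 2^53, neighbouring integers collapse onto the same value.
var maxExact = Math.pow(2, 53);
var collides = (maxExact + 1 === maxExact);

// A 64-bit hash of all 1's (2^64 - 1) cannot be stored exactly:
// parsing it gives the nearest representable number, which is 2^64.
var allOnes = new Array(65).join('1'); // a string of 64 '1' characters
var asNumber = parseInt(allOnes, 2);
var roundTrip = asNumber.toString(2);  // no longer equal to allOnes
```

Here `collides` is true, and `roundTrip` is a 1 followed by 64 zeros rather than the original 64 ones, which is the same kind of damage that showed up in the last characters of the printed hash.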

There are two solutions:

  1. Using Big integer library like
  2. Using Hexadecimal String for output

The first solution has one benefit: the other modules do not have to be changed. The second solution’s benefit is that it doesn’t need an additional Javascript library.
We decided to use solution 2, because

  1. hash value is used only to be sent to php API page
  2. do not need calculation
  3. later, when another hash algorithm is used, it can be much longer
  4. when additional Javascript library is used, client implementation will be slower.

After adopting this solution, the following modules were affected.

  • javascript : added code to change from binary string to hexadecimal string

  • phash : hash generator from image
    I changed the code from generating integer string to generating hexadecimal string.

    //printf("%llu\n", tmphash);
    printf("%016llX\n", tmphash);
  • hamming : hamming distance calculator from two hash values
    I changed it to get hexadecimal string :
//    ulong64 hash1 = strtoull(argv[1], NULL, 10);
//    ulong64 hash2 = strtoull(argv[2], NULL, 10);
    ulong64 hash1 = strtoull(argv[1], NULL, 16);
    ulong64 hash2 = strtoull(argv[2], NULL, 16);
  • regdaemon : C++ daemon
    I changed add/match command so it gets hexadecimal string.
//uint64_t uiHash = std::stoull(strHash);
uint64_t uiHash = std::stoull(strHash, 0, 16);
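
For the Javascript side, the conversion from a binary string to a hexadecimal string can be done four bits at a time, so that no step ever needs a number larger than 15. This is a minimal sketch of that idea rather than the exact code added to the module; the function name is made up, and the output is upper-cased only to match the %016llX format used above.

```javascript
// Convert a string of 1's and 0's to a hexadecimal string, 4 bits per digit.
// Working in 4-bit groups avoids ever building a number above 2^53.
function binaryToHex(bits) {
  if (bits.length % 4 !== 0) {
    throw new Error('Bit string length must be a multiple of 4');
  }
  var hex = '';
  for (var i = 0; i < bits.length; i += 4) {
    hex += parseInt(bits.slice(i, i + 4), 2).toString(16);
  }
  return hex.toUpperCase();
}
```

A 64-character bit string always produces 16 hexadecimal digits, leading zeros included, so the result can be passed to strtoull or std::stoull with base 16 as shown above.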

The PHP API doesn’t have to be changed because it simply passes the value along, base64-encoded.

For the MySQL database field, we decided to keep the 64-bit unsigned integer type for the DCT hash value. That way the value does not need to be converted from a string to a number when it is loaded into memory for indexing.

by Hosung at August 11, 2015 05:27 PM

August 07, 2015

Anderson Malagutti

How to resize windows on Mac OS X

I’ve been using a MacBook Pro for a little while, and I’ve been a little disappointed with the fact that resizing windows on Apple’s system isn’t that great! (You probably know what I’m talking about)

What I want to say is that if you came from another system (Ubuntu, Fedora, Windows…) you might find it a little painful to resize windows manually (dragging the borders with your mouse cursor).

I tried to survive doing this, but recently I decided that I would even buy a little app to make it better…

I found the BetterSnapTool app, which costs around $2 and promises to resize windows on Mac as you can on Microsoft Windows.

However, I found a little app (FREE and OPEN SOURCE) to make my life easier :)

This app is called “Spectacle“, and this is its official website:

You can download it there, and follow its wizard with the instructions to allow the app on your OS X.

It won’t take you more than a minute.

After that you can use its key combinations (or create your own) to manage your windows however you wish :)


Hope it helps. Thanks!


by andersoncdot at August 07, 2015 06:17 PM

August 04, 2015

LEAP Project

Tales of LEAP: Schrodinger’s Linux Distro

This is the beginning of our Tales series of posts: snippets of the stories and activities we, the LEAP team, have had in our adventures developing this distribution. We hope you enjoy reading about our experiences, which are at times comedy, mystery or something else entirely.

At some point in time, at some place in this world, we encounter strange situations. Be they a manifestation of our fatigue, oddities lurking in the depths of our kernel or truly phantasms beyond our ken, we at times stand to meet these ghosts in the shell.

On a seemingly regular day at the LEAP team offices, my coworker Artem and I were taking apart some of our ARM machines. We were doing this with the intent of transplanting them into 1U and 2U chassis which had some other hardware, so we could properly place them into our Enterprise Hyperscale Lab. The machine I had been working on we ended up having to put aside due to a faulty power supply with no replacement on hand. Artem’s machine (let’s call it Bob), however, we were able to move into its new chassis without issue. Prior to this body exchange, Bob had been running LEAP and was one of our more stable machines. With everything in place we inserted Bob into the EHL and connected it to our network. When we booted it up and took a peek through the serial connection we were greeted with the UEFI shell.

Seeing the shell at this point in time meant that for some reason we were not able to boot into LEAP, as that is the first option in the boot order. Since we had done nothing to the machine beyond switching the case, nothing should have been amiss. We then took a look at the FS0:/ partition in the UEFI shell to see if there was indeed a LEAP directory in the boot partition. What we ended up finding was a Fedora directory. This was rather surprising, as I had personally deployed LEAP to all the machines we have in use, so there certainly shouldn’t have been a Fedora boot partition anywhere at this point. We weren’t able to boot into Fedora either, so seemingly we only had a boot partition.

Artem and I were confused and began to ponder what could possibly have gone wrong with this simple task:

  1. The most obvious mistake: perhaps we had placed the wrong hard drive into that 1U chassis? This possibility seemed rather unlikely, as the drive that came with Bob’s original casing was an SSD and the 1U chassis had an HDD. As I was holding the HDD in my hands there was certainly no chance of such a mismatch (although we had begun to doubt ourselves at one point).
  2. There might have been an error in some of the connections to the SATA ports on Bob.
  3. As proposed by the DevOps team of our office, “LEAP is actually installed into the chassis and not the disc drives”. I had to agree it was the most logical answer.

Joking aside, we decided to take Bob out of the EHL and took some time to examine its connections once again to see if something wasn’t in its proper place. After a bit of poking around, I remembered that on these particular ARM boards the choice of SATA port did matter, and only a particular one would allow a link to be established. We rectified that connection, closed up the case and LEAP was able to boot. Schrodinger’s Linux Distro turned out to be just a faulty connection?

However, this still wouldn’t explain that mysterious Fedora boot partition that shouldn’t exist. And where did it go? Opening and closing the box didn’t give us any cats (which might have been more fun). Not satisfied with this conclusion, we pondered some more. Amidst this, Artem noticed that there was a curious little SD card inserted into Bob… It turns out that inside that SD card were the Fedora directories that had very magically come and gone in our efforts this day. When the SATA link to the SSD wasn’t able to establish itself due to the mismatched SATA port, the SD card was probably mapped to FS0:/ in the UEFI shell as the next in line.

The final question would be: why was this SD card there in the first place? The answer to that was myself. After this wild goose chase I remembered that one month prior I had been doing some PXE installation tests with Bob, and for some reason there was an issue with getting the boot partition for F21 installed properly, so I put it onto an SD card as a workaround. We all had a good laugh at how things came together in the end.

That’s just a day in the life of the LEAP team.

by hzhuang3 at August 04, 2015 11:16 PM

July 31, 2015

Justin Flowers

Manually Copying User Profiles in Windows

While working on generating Windows base boxes with Vagrant, I encountered a problem early on. I had created an account with my name by accident to install all the requisite software I wanted to bundle with the base box. This proved annoying, as I would need the exact same setup of this account for a “vagrant” user. At first I thought a simple user name change would work, but unfortunately that proved to be cosmetic at best. My home folder and login credentials were still “jflowers” after I had changed the user name to “vagrant”. I ended up realizing I would need to copy the “jflowers” user profile to a new user named “vagrant”. In fact, Windows includes a tool accessible through Advanced Settings which seems to allow one to copy user profiles. Unfortunately, I couldn’t seem to get it to work. The “Copy Profile” button was grayed out no matter what account I logged in from or what other system settings I changed. That’s when I decided to do a manual user profile copy, outlined below.

1. Create the user account you plan on copying to

If you don’t already have the two accounts you want to copy from and to, you’ll need to make a second with the right name. In my case, this was making a “vagrant” account.

2. Create another administrative account to work from

It’s dangerous to try to perform these changes while logged into either account, so you should make a third administrative account (if you don’t already have one) to work from for this.

3. Log into the account made in step 1, then log out

This is merely to make sure Windows has created and configured the user profile for the new account.

4. Log into the administrative account made in step 2

This is the account you will work from while changing the user profiles.

5. Go to C:\Users

This is where the user profiles exist on your computer.

6. Back up anything important from the user you’ll be copying to

In the case that you already had the account you wanted to copy to, this is the step where you’ll want to back up any data in it before continuing. This method WILL delete all data in the second user’s Libraries (Documents, Pictures, etc).

7. Copy original user

Create a copy of the folder for the User Profile you’ll be moving to another account in C:\Users. Simply copy and paste its folder in the same directory. If you get any warnings here about Windows not being able to copy files, don’t worry about it. These are usually temporary and pre-generated files which Windows will recreate at start up.

8. Delete folder created for user from step 1

Delete the profile folder generated for the user created in step 1. Then rename the copy you made in the last step to the name of the folder you just deleted. Essentially, you’re deleting the new account’s freshly generated folder and substituting it with the user profile you want to keep.

9. Log into account made in step 1

Now if you log out of your administrative working account and log into the account made in step 1, you should see all the icons and settings of the original account you copied. Additionally, all Library items from the original account will be located in this account now.

10. Delete unnecessary users

Now for cleanup. Here you can delete the other two accounts (the original we copied and the administrative we used for work) if you want, or leave them! I deleted these accounts to remove clutter and make it clear to anyone using my base box what account they were meant to use.

And that’s it! You should now have copied all settings and files from one user profile to another. While this process is tedious, it’s rather simple to understand. Why Windows refuses to do it in its GUI when it’s such a simple process is rather strange, but for now this method should do.

by justin at July 31, 2015 10:10 PM

July 30, 2015

Anderson Malagutti

MAMP sysvshm module – Install PHP modules on MAMP

Hello everybody,

I’ve been developing some PHP applications, and as a Mac user I’ve been using MAMP for my development environment. MAMP worked just fine until I had to use System V shared memory in my PHP code.

I couldn’t run my PHP code because it required some PHP modules (sysvshm, sysvsem and pcntl), and MAMP by default does not include them. Therefore, my solution was to install these modules myself.
I thought it might help someone else, so I will give a little step by step on how you can add your own module to your MAMP.

Let’s start:

You will need to have Homebrew and autoconf installed, as well as Xcode (which you probably already have).
You can easily install Homebrew with the command provided on its website:

ruby -e "$(curl -fsSL"

With homebrew working on your system, you can install autoconf easily with the command:

 brew install autoconf

As I said earlier, MAMP does not include the PHP modules that you need, so the best solution for now is to download the source code of the PHP version that you need from the official PHP website.
I will use php5.6.10 as an example. I believe it should work on other versions as well.

After you have downloaded the PHP source code onto your Mac, just move it to:

mv your-new-php-directory /Applications/MAMP/bin/php/php5.6.10/include/

Note: You may not have the “include” folder. That’s not a problem; you can just create the directory by running this command:

mkdir /Applications/MAMP/bin/php/php5.6.10/include/

After you have copied the original source code to the correct location, you should rename the folder (from phpX.X.X to php) with a command like this:

mv /Applications/MAMP/bin/php/php5.6.10/include/php5.6.10 /Applications/MAMP/bin/php/php5.6.10/include/php

Now we will finally start to have some fun :)

Let’s configure the original source code on our system.

 cd /Applications/MAMP/bin/php/php5.6.10/include/php


Then go to the folder of the module that you need (in this case the module is sysvshm):

 cd /Applications/MAMP/bin/php/php5.6.10/include/php/ext/sysvshm

Let’s prepare our module for compilation with MAMP’s phpize. Inside your module’s directory, run:


After that, let’s configure and make :)

./configure && make

Copy the new module to your MAMP directory:

cp /Applications/MAMP/bin/php/php5.6.10/include/php/ext/sysvshm/modules/ /Applications/MAMP/bin/php/php5.6.10/lib/php/extensions/no-debug-non-zts-20131226/

Then, in your php.ini file (/Applications/MAMP/bin/php/php5.6.10/conf/php.ini), you should enable the module you want:

Just start PHP again and have fun :)

I’ve compiled some other modules (sysvsem and pcntl) using this same method, and it has worked for me as well!
I have used MAMP free version 3.3 on OS X 10.10.4 (Yosemite)

Thanks! Hope it has helped you.

by andersoncdot at July 30, 2015 09:48 PM

Barbara deGraaf

Callback hell and promises

While creating the GUI for my project I ran into a tiny bit of callback hell, so I will talk a little about promises and how they are useful.

While making the GUI I used jQuery to populate the camera and lens select boxes. The main thing about jQuery’s requests that is important for this post is that they are async; this means that the rest of the code block will run while the jQuery request is being made. In my code the select boxes populate and change appropriately, and in order to do this I needed to use callbacks to manage the async behaviour.

This led to callback hell, where my callbacks needed to be nested and I ended up with the “triangle of doom.” Callback hell is such a widely known occurrence with async javascript that there is even a website devoted just to it.

There were two things I did to clean up the code a bit: using promises and making modules.

First, a bit about promises. There is a really good webpage that a coworker directed me to that I will post here. It’s worth pointing out that jQuery actually has “Deferreds”. These are a little different from promises, but there is a way to cast them to javascript promises:

var thisPromise = Promise.resolve($.ajax('jsonFile.json'));

You can go to jQuery’s website to read more about Deferreds and how they can be used. Promises can be chained together to make a sort of queue that the statements run in. This can be done with .then; I will show a basic example to make it easier to understand.

thisPromise
.then(function(data0) {
  //do what you want here
  return anotherAsyncCall(data0);
})
.then(function(data1) {
  //do more
  //can return a third async call here
})
.catch(function(err) {
  //catch any error that occurred
});
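
To see a chain like this run end to end, here is a small self-contained sketch. The two loader functions are hypothetical stand-ins for the real requests (they resolve immediately with made-up data), so none of the names below come from my project.

```javascript
// Hypothetical stand-ins for the two AJAX requests; they resolve with
// made-up data so the sketch runs without a server.
function loadCameras() {
  return Promise.resolve(['Camera A', 'Camera B']);
}
function loadLenses(camera) {
  return Promise.resolve([camera + ' wide lens', camera + ' zoom lens']);
}

// Each .then receives the previous step's result, so the steps read
// top to bottom instead of nesting inside one another.
function populateSelects() {
  var result = {};
  return loadCameras()
    .then(function (cameras) {
      result.cameras = cameras;
      return loadLenses(cameras[0]); // returning a promise makes the chain wait for it
    })
    .then(function (lenses) {
      result.lenses = lenses;
      return result;
    })
    .catch(function (err) {
      // A rejection in either step ends up here.
      console.error('Could not populate the select boxes:', err);
      throw err;
    });
}
```

Calling `populateSelects().then(function (result) { ... })` hands over both lists once the second request has finished, without any nested callbacks.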

The second thing that I did to make the code easier to read was to modularize the two parts that are async. Basically, the lens select and the camera select each became a separate module:

privateMethods.LensSelect = function (lensfolder) {
  //the async code here with promises
};

And then in the method where the GUI is being made call the above method like so

var lensfolder = this.gui.addFolder("Lens");, lensfolder);

Using these two concepts, promises and modules, my code has less chance of being spaghettied.

In the future the GUI is going to be made in a different way using Bootstrap, so look forward to that.



by barbaradegraafsoftware at July 30, 2015 06:20 PM

July 29, 2015

LEAP Project

Installations – Install Trees and Composing: Lorax

Greetings once again fellow LEAP users, continuing on with the Installation series, this time I’ll be speaking about the composing process. Composing refers to the creation of the image files, EFI bootloader files and other assorted bits that are pieces of our install-tree. One can then use the install-tree to install the distribution. The tool we use to do this is called Lorax. Lorax is a tool for creating anaconda install images (which are used by the Anaconda installer, a graphical system installer used in Fedora and also leveraged by us in LEAP). Today I will do a brief overview of install-trees and how to use this tool.

To have an idea of what a completed install-tree is, we can quickly take a look at our official LEAP install-tree. The pieces that are generated by Lorax are the EFI, LiveOS and images directories (it also creates a couple files .treeinfo and .discinfo which hold some information about the tree and disc).

The EFI directory (which only has a BOOT directory) contains files that are mainly involved with the bootloader portion of an installation:

  • BOOTAA64.EFI – An EFI executable file that handles the booting process. Files named BOOTX.EFI are typically for booting operating systems that are not permanently installed, i.e. hot-pluggable media such as live images and other installation media.
  • MokManager.EFI – An EFI executable file that manages Machine Owner Keys, which are involved with Secure Boot (we don’t currently make use of Secure Boot on LEAP).
  • grub.cfg – A grub configuration file that specifies what the grub menu consists of (in LEAP’s case it’s less of a menu and more of a specification of the boot for a PXE context).
  • grubaa64.efi – An EFI executable file that, much like BOOTAA64.EFI, handles the booting process. Unlike the BOOTX.EFI files, these are more usually used for booting permanently installed operating systems.
  • Fonts – As it implies it’s a font.

The LiveOS directory contains only one file, squashfs.img, which is the image that contains the filesystem used for LiveOS.

Finally, the images directory contains the images that will be used to boot the system:

  • boot.iso – An image containing a minimal boot of the operating system.
  • efiboot.img – A UEFI boot image which really contains what was in the above EFI directory.
  • pxeboot – A directory that contains vmlinuz and initrd.img. vmlinuz is an executable file that contains the kernel for the distribution and initrd.img is the initial ramdisk for the system.

Now that we have an idea of what Lorax generates let’s move on to using Lorax.


There are a few things to note when using Lorax:

  1. You need to have root access or run Lorax with sudo.
  2. SELinux needs to be disabled.
  3. Lorax 22 is recommended.

Using Lorax

An excerpt from the Lorax manpage:

-p STRING, --product=STRING
  product name

-v STRING, --version=STRING
  version identifier

-r STRING, --release=STRING
  release information

-s REPOSITORY, --source=REPOSITORY
  source repository (may be listed multiple times)

A sample of using lorax could be:

sudo lorax --product=LEAP --version=7 --release=lp7 --source=

The above command will then use the source repository to generate the EFI, LiveOS and images directories with their files, using the product, version and release information supplied.

Some additional optional arguments that one might find rather useful are:

  • --excludepkgs=STRING – This option can be used to exclude specific package globs from the compose. An example would be --excludepkgs=fedora-productimg-workstation, which would exclude packages that are part of the fedora workstation package set.
  • --isfinal – This option signifies that the compose being created is for a stable, release-ready distribution. Having this option will create an isfinal environment variable in the buildstamp (which is located in the initrd.img). This variable is then checked by the Anaconda installer to see if this is a stable release. If it isn’t a stable release, the Anaconda installer will show various “This is pre-release software, install at your own risk” type messages and graphics.
  • --buildarch=STRING – This will set the architecture that this installation is for.

Hopefully that brief overview was informative for those unfamiliar with the composing process that is a step in creating the install tree which is used to install distributions like LEAP. While Lorax is the tool that we used to compose, there are a variety of others that also facilitate this process. Largely the process is rather simple but next time I’ll cover some additional aspects that involve composing when it comes to a testing context.

Until next time, happy leaping everyone. If there are any questions or comments please hit us up on irc or in the below comments sections.

by hzhuang3 at July 29, 2015 09:34 PM

July 28, 2015

LEAP Project

Installation Tarball fixed

Greetings again to all you LEAP users,

I recently discovered that there was a small typo in the grub.cfg file in the installation tarball. The exact typo was:

initrd (tftp)/leap/EFI/leap/pxeboot/initrd.img

This would have caused the installations to fail as the pxeboot portion does not exist in the directory structure. One can fix this through just the removal of that pxeboot part of the line:

initrd (tftp)/leap/EFI/leap/initrd.img

or you may download the updated installation tarball.

I hope this issue didn’t cause too much grief. Apologies once again, and hopefully this allows for smooth installations from here on out. If any other issues crop up please drop by the irc channel at: irc:// or our bugzilla and let us know what’s going on.

by hzhuang3 at July 28, 2015 06:12 PM

July 27, 2015

Anderson Malagutti

JAVA – SSL exception Could not generate DH pair

Hello everybody.

It’s been a long time since I posted something here…

Today I had a little trouble with an SSL exception. It was driving me crazy! hahaha I wish I had found a post about it somewhere, so I decided to publish one myself… Hopefully it helps someone else having problems with it. :)


After playing with Titanium Studio for a little while on Mac OS 10.10.4 (Yosemite), the IDE stopped working, and I was no longer able to log in and make changes to my code.

I decided to try to launch the IDE through the command line, and then the SSL exception appeared…


Download these two jars:

1. bcprov-jdk15on-152.jar
2. bcprov-ext-jdk15on-152.jar

After you have these files on your computer, you simply have to move them into:


Also, edit the file located in:


You should add the following, close to the other security providers:


You might want to replace “security.provider.1” with “security.provider.X”, where X is another number, as using 1 may cause you some issues in the future…

Hope it helps somehow. :)

by andersoncdot at July 27, 2015 08:18 PM

Anna Fatsevych

Inside pHash Continued

There is one thing standing between me and a completely functioning client-side JavaScript pHash: the CImg resize() function. I have bypassed the resizing for now and gone ahead and written a semi-functional pHash in JavaScript (here). It works and produces decent results (over 90% accuracy on 976 images tested), but ONLY if your images are 32×32 pixels; in other words, it is not very useful right now.

I was quietly hoping I could resize with the help of the native HTML5 Canvas, but it is not that easy, and I am now rewriting the CImg library’s resizing algorithm in JavaScript as the final step to completing the hash.

On a side note, the remaining inaccuracy is due to no precision being set for the decimal places yet:
Of almost 1000 images, 24 are erroneous, and 12 of those (half) have a Hamming distance of 5 (the allowable distance is 4 bits). I have not set any precision yet, since I am continuously testing images, but it is easily fixable in my opinion.
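
The allowance mentioned above is a Hamming distance between two 64-bit hashes. As a minimal sketch of that comparison, assuming the hashes are held as BigInt values (the names hammingDistance and isMatch are illustrative and not part of pHash):

```javascript
// Count the bits that differ between two 64-bit hashes stored as BigInt.
function hammingDistance(hashA, hashB) {
  let diff = BigInt.asUintN(64, hashA ^ hashB); // 1 bits mark positions that differ
  let count = 0;
  while (diff > 0n) {
    count += Number(diff & 1n);
    diff >>= 1n;
  }
  return count;
}

// Treat two images as the same when at most `threshold` bits differ
// (4 is the allowance quoted in the post).
function isMatch(hashA, hashB, threshold = 4) {
  return hammingDistance(hashA, hashB) <= threshold;
}
```

With this, a pair of hashes that differ in 5 bits would be reported as a mismatch, which is the borderline case described above.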

Here is the output:


More details on the resize soon,


by anna at July 27, 2015 06:02 AM

July 24, 2015

Ali Al Dallal

Localizing React app using React-Router with React-Intl

In the past, when we did localization work we would usually write our own library, one example being node-webmaker-i18n.

Recently, I had to localize our React application for Webmaker, and this time our team decided to change our approach: instead of writing our own library, we would use something that's already out there. I researched different internationalization libraries for React applications and ended up using React-Intl by Yahoo.

There are some good reasons why we didn't write our own this time and why we chose React-Intl over the other options. We don't want to reinvent the wheel, and helping the community by contributing to an existing library is also a good idea. I find that the React-Intl authors have a lot in common with us in terms of what they need from the library they wrote, and they were very responsive and helpful when we had problems using it.

Now, let's get started on how to integrate React-Intl with a React app that uses React-Router for handling routes.

NOTE: We're using Webpack to handle our module bundling.


// entry.jsx
import React from 'react';
import Router from 'react-router';
import routes from './routes.jsx';
import messages from './messages';

var locale = navigator.language.split('-');
locale = locale[1] ? `${locale[0]}-${locale[1].toUpperCase()}` : navigator.language;

var strings = messages[locale] ? messages[locale] : messages['en-US'];
strings = Object.assign(messages['en-US'], strings);

var intlData = {
    locales : ['en-US'],
    messages: strings
};

Router.run(routes, Router.HistoryLocation, function (Handler, state) {
  React.render(<Handler {...intlData} />, document.querySelector("#my-app"));
});


// routes.jsx
import React from 'react';
import { Route } from 'react-router';

var routes = (
    <Route name="home" path="/" handler={require('./home.jsx')} />
);

module.exports = routes;


// messages.js
// This is essentially a bulk require
var req = require.context('../locales', true, /\.json.*$/);
var exports = {};

req.keys().forEach(function (file) {
  var locale = file.replace('./', '').replace('.json', '');
  exports[locale] = req(file);
});

module.exports = exports;

Just to explain a bit what's going on here in each file.

  1. messages.js basically just pre-loads all the locale files, so that you don't get a compile-time error with webpack.

  2. routes.jsx is pretty straightforward, since it's just the normal way of declaring your routes in react-router.

  3. entry.jsx is where it gets a bit complicated. First, let's talk about these lines:

var locale = navigator.language.split('-')  
locale = locale[1] ? `${locale[0]}-${locale[1].toUpperCase()}` : navigator.language  

This extracts the language code from the browser using navigator.language, then rewrites the string to match the keys of the dictionary built in messages.js. The reason I call toUpperCase() here is that Safari returns en-us, whereas Firefox and Chrome return en-US.

var strings = messages[locale] ? messages[locale] : messages['en-US'];  

This one is pretty simple: we try to retrieve the strings from our dictionary, and if we can't find that locale we just fall back to en-US.

strings = Object.assign(messages['en-US'], strings);  

Sometimes we will include a language with a partial translation, and we need to make sure the object that we pass to intlData contains all the keys from the en-US messages; otherwise React-Intl will throw.
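
The locale handling described above can be pulled into two small helpers. This is only a sketch: normalizeLocale and resolveMessages are hypothetical names, not part of React-Intl, and unlike the line shown in the post this version copies into a fresh object so that the en-US dictionary itself is not modified.

```javascript
// Normalize a browser language tag such as "en-us" to "en-US".
function normalizeLocale(tag) {
  const parts = tag.split('-');
  return parts[1] ? `${parts[0]}-${parts[1].toUpperCase()}` : tag;
}

// Pick the strings for `locale`, filling any missing keys from the fallback.
// A missing locale is simply skipped by Object.assign, leaving the fallback strings.
function resolveMessages(messages, locale, fallback = 'en-US') {
  return Object.assign({}, messages[fallback], messages[locale]);
}
```

For example, resolveMessages(messages, normalizeLocale(navigator.language)) would return a complete set of strings even for a partially translated locale.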

Now, let's look at home.jsx where we will use React-Intl to dynamically change the string based on our language.


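// home.jsx
import React from 'react';

// IntlMixin needs React.createClass; ES6 classes do not support mixins
module.exports = React.createClass({
  mixins: [require('react-intl').IntlMixin],
  render() {
    return (
      <h1>{this.getIntlMessage('hello_world')}</h1>
    );
  }
});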


  'hello_world': 'Hello World'


  'hello_world': 'สวัสดีชาวโลก'

So, I think that's it! We have now fully localized our React app using React-Intl :)
If you find any problems in my code or have any questions, don't forget to leave a comment down below or tweet me @alicoding and I'll be happy to answer :)

by Ali Al Dallal at July 24, 2015 04:19 PM

July 23, 2015

LEAP Project

Installations – Kickstart Options: Users, LVM, RAID

Greetings to potential or current users of LEAP. My name is Michael, and on the LEAP team I have been working on various pieces of testing, in particular getting the installer for our distribution working. Today I’d like to start a series of short talks about the various aspects of installing LEAP. The topic this time is Kickstart options, which can enhance your automated installations and may be helpful for those who are new to the whole thing (much like I was when I first joined the project).

When installing LEAP one has the option of using the graphical VNC installer or the text mode installer. While there is a certain allure to being able to see more than a terminal window when doing an installation, it will likely become cumbersome if there are a number of systems to deploy LEAP to or if re-installations are a frequent occurrence. Kickstart installations allow all the installation configuration to be set in a Kickstart file, which is then used to bypass the configuration portion of the installation for the user. For our release we have provided a rather simple and generic Kickstart file, but it doesn’t make use of certain options that might be useful. Let’s look at a few of those.

User Creation

The default Kickstart does not have any users created but should one want to create one it is rather simple. Simply use the ‘user’ option:

user --name=leapuser --groups=wheel --password=leapfrog

Here we create the leapuser account, add it to the wheel group and give it a password of leapfrog. There are additional flags one can use with the user option, such as the --homedir= flag to set the home directory for that user (otherwise /home/ is used).

LVM Partitioning

The portion of the default Kickstart file that is relevant to storage partitioning is as follows:

bootloader --location=mbr --boot-drive=sda
clearpart --all --initlabel 
part  /boot/efi --fstype efi --ondisk sda --size 200
part  /boot --fstype xfs --ondisk sda --size 200
part  swap --ondisk sda --size 4000
part  / --fstype xfs --ondisk sda --size 10000 --label=root --grow

In the first line we state that we want to install the bootloader into the master boot record of the sda drive. zerombr refers to initializing any disks whose formatting is unrecognized by the installer; that is to say, those disks will be wiped clean. The clearpart option removes partitions from the system prior to creating new partitions. We remove all partitions in our Kickstart with the --all flag and initialize the disk labels with --initlabel. Following those instructions we specify the new partitions we want to create on the system, with flags that decide their file system type, label, size and the disk to create them on. As can be seen, it is a rather simple layout that uses standard partitioning and takes up all of the sda disk.

To make use of LVM partitioning in the Kickstart we make use of the volgroup and logvol options:

bootloader --location=mbr --boot-drive=sda
clearpart --all --initlabel
part  /boot/efi --fstype=efi  --ondisk=sda  --size=200
part  /boot   --fstype=xfs --ondisk sda  --size=200
part pv.01 --size=1000 --grow --ondisk=sda
volgroup rootvg01 pv.01
logvol / --vgname=rootvg01 --name=lv_root --fstype=ext4 --size=1024 --grow
logvol swap --vgname=rootvg01 --name=lv_swap --size=4000

Up to the /boot partition it is still the same as before. After that point we create a partition which will be used for LVM (these partitions all start with the pv prefix). We then create a volume group on top of that partition with the volgroup command. Then we can start creating as many logical volumes on the volume group as we like. The syntax is largely similar to the existing lines with the part command, with two additional flags that refer to the volume group name and the name given to the logical volume being created.


Setting up a RAID with the Kickstart options is also relatively easy and it follows a similar process as with the previous partitioning schemes:

part raid.11 --fstype=raid --size=200 --ondisk=sda
part raid.12 --fstype=raid --size=200 --ondisk=sda
part raid.13 --fstype=raid --size=10000 --ondisk=sda
part raid.21 --fstype=raid --size=200 --ondisk=sdb
part raid.22 --fstype=raid --size=200 --ondisk=sdb
part raid.23 --fstype=raid --size=10000 --ondisk=sdb
raid /boot/efi --fstype=efi --device=md0 --level=1 raid.11 raid.21
raid /boot --fstype=ext4 --device=md1 --level=1 raid.12 raid.22
raid pv.01 --device=md2 --level=1 raid.13 raid.23
volgroup sysvg pv.01
logvol / --vgname=sysvg --name=lv_root --fstype=ext4 --size=8000 --grow
logvol swap --vgname=sysvg --name=lv_swap --grow --size=1024 --maxsize=2048

We once again create partitions that will be used for the RAID. For this setup we’ll be doing a RAID 1 with two disks, sda and sdb. Using the part command with the raid prefix creates partitions that will be used for RAID, and their fstype is also raid. Following the creation of these partitions we use the raid command to create the RAID devices. The command is again similar in syntax to the part command, except that it takes the --device flag that specifies the name of the device, the level of the RAID and the partitions that make up the device. We create a device for the EFI partition, one for the boot partition and also a physical volume for LVM. Then we do a similar setup to the previous LVM example.

Hopefully that was a somewhat informative post about how you can do commonplace configurations such as user creation, LVM and RAID partitioning with Kickstart options. This is only a fraction of what’s available; for further reading please refer to the GitHub page for pykickstart, which outlines all the options and gives a comprehensive overview of how Kickstart installations work.

Happy installations everyone and we’ll see you next time.

by hzhuang3 at July 23, 2015 05:31 PM

Barbara deGraaf

A short post on debugging three.js shaders

Just a very small update on how it was to create the shader I made before and how I went about debugging it.

Debugging shaders is known to be very hard, but there are a couple of ways to debug your code. The first way is to write the value you want to inspect to gl_FragColor and compare the rendered output with the values you expect.
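
As a rough illustration of that first technique, the sketch below holds a fragment shader that paints a single value as a grey level. The varying vUv and the choice of vUv.x as the inspected value are assumptions made for the example; they are not taken from the shader in the earlier post.

```javascript
// Fragment shader source that shows a value under inspection as a colour.
// `vUv` is assumed to be a varying written by the vertex shader.
const debugFragmentShader = `
  varying vec2 vUv;
  void main() {
    float value = vUv.x;                   // replace with the value to inspect
    gl_FragColor = vec4(vec3(value), 1.0); // black = 0.0, white = 1.0
  }
`;

// With three.js loaded, this string could be passed as the fragmentShader
// option of a THREE.ShaderMaterial together with your own vertex shader.
```

Colour components are clamped to the 0.0 to 1.0 range when drawn to a normal canvas, so values outside that range usually need to be scaled before they are written out.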

There is some software that people have released for debugging WebGL that I didn’t use but that may be useful for some people. One is WebGL Inspector. There is also the Firefox WebGL Shader Editor, which allows you to edit the shader code in real time and mouse over to see its effect in the scene. And if you prefer Chrome, there are some Chrome canvas inspection dev tools that allow you to capture frames and debug code as well.

Of course if you don’t feel like downloading something or using a different browser you could do some rubber duck debugging but I would strongly recommend using the programs or techniques above.

That’s all for now, stay tuned for a brief post on promises.



by barbaradegraafsoftware at July 23, 2015 04:39 PM

July 22, 2015

Armen Zambrano G. (armenzg)

Few mozci releases + reduced memory usage

While I was away, adusca published a few releases of mozci.

From the latest release I want to highlight that we're replacing the standard json library with ijson, since it solves some memory leak issues we were facing with pulse_actions (bug 1186232).

This was important to fix since our Heroku instance for pulse_actions has an upper limit of 1GB of RAM.

Here are the release notes and the highlights of them:

  • 0.9.0 - Re-factored and cleaned-up part of the modules to help external consumers
  • 0.10.0:
    • --existing-only flag prevents triggering builds that are needed to trigger test jobs
    • Support for pulse_actions functionality
  • 0.10.1 - Fixed KeyError when querying for the request_id
  • 0.11.0 - Added support for using ijson to load information, which decreases our memory usage

Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

by Armen Zambrano G. at July 22, 2015 02:21 PM

July 20, 2015

Anna Fatsevych

Inside pHash

I had written in my previous post about the functions that make up the pHash. This time, I have compiled my own program (that runs the functions from pHash and saves the results as separate images) to visually see the change process of the image in hopes of writing the JavaScript implementation.

First, you will need the CImg library, which can be downloaded from source here, or installed with:

sudo apt-get install cimg
sudo apt-get install imagemagick  # needed to work with the images

To compile, just run this command (from the official CImg documentation) for Linux:

g++ -o myprogram.exe myprogram.cpp -O2 -L/usr/X11R6/lib -lm -lpthread -lX11 

Here is the c++ code and, respectively, the result:

    CImg<float> meanfilter(7,7,1,1,1);
    CImg<float> img;
    if (src.spectrum() == 3){
        img = src.RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else if (src.spectrum() == 4){
        int width = img.width();
        int height = img.height();
        int depth = img.depth();
        img = src.crop(0,0,0,0,width-1,height-1,depth-1,2).RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else {
        img = src.channel(0).get_convolve(meanfilter);
    }

Here is the result after the greyscale and convolve filters are applied:


The rest of the code resizes the image, applies the DCT, then takes a run of 64 values and hashes each one to a 1 or 0 bit depending on whether it is above the median value.

    img.resize(32,32);
    CImg<float> *C  = ph_dct_matrix(32);
    CImg<float> Ctransp = C->get_transpose();
    CImg<float> dctImage = (*C)*img*Ctransp;
    CImg<float> subsec = dctImage.crop(1,1,8,8).unroll('x');
    float median = subsec.median();
    ulong64 one = 0x0000000000000001;
    hash = 0x0000000000000000;
    for (int i=0;i< 64;i++){
        float current = subsec(i);
        if (current > median)
            hash |= one;
        one = one << 1;
    }

Subsequently here are the images:

Resize (32x32 pixels): hash

dctHash (32x32 pixels): dctimg

Final sub-section upon which the dct hash is based (64x1 pixels) : subsec

This is the visualization of the dct pHash process.
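
Since the goal is a JavaScript implementation, here is a minimal sketch of the final hashing step only, assuming the 64 unrolled DCT values are already available as a plain array. The helper names are illustrative, and the median convention for an even count (the mean of the two middle values) is an assumption about how the C++ median() behaves.

```javascript
// Median of a list of numbers; for an even count, the mean of the two middle values.
function median(values) {
  const sorted = Array.from(values).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Build a 64-bit hash as a BigInt: bit i is set when coefficient i is above the median.
function hashFromCoefficients(coefficients) {
  const med = median(coefficients);
  let hash = 0n;
  let one = 1n;
  for (let i = 0; i < 64; i++) {
    if (coefficients[i] > med) hash |= one;
    one <<= 1n;
  }
  return hash;
}
```

BigInt is used because JavaScript's bitwise operators on ordinary numbers only work on 32 bits.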



by anna at July 20, 2015 09:18 PM

July 17, 2015

Anna Fatsevych

pHash in JavaScript

I have been taking a closer look at the pHash dct_hash algorithm in order to recreate it on the client side. I have also looked into the possibilities of running compiled C++ programs on the client side with such tools like Native Client and Emscripten.

I decided to write my own JavaScript pHash implementation, as the pHash.js that I have been previously testing on has continuously produced a large hamming distance as a result and is not identical to the pHash dct hash algorithm – I have contacted the author, but in the meantime I will be writing my own.

Here is the main ph_dct_imagehash function that hashes the images:

int ph_dct_imagehash(const char* file,ulong64 &hash){

    if (!file){
        return -1;
    }
    CImg<uint8_t> src;
    try {
        src.load(file);
    } catch (CImgIOException ex){
        return -1;
    }
    CImg<float> meanfilter(7,7,1,1,1);
    CImg<float> img;
    if (src.spectrum() == 3){
        img = src.RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else if (src.spectrum() == 4){
        int width = img.width();
        int height = img.height();
        int depth = img.depth();
        img = src.crop(0,0,0,0,width-1,height-1,depth-1,2).RGBtoYCbCr().channel(0).get_convolve(meanfilter);
    } else {
        img = src.channel(0).get_convolve(meanfilter);
    }

    img.resize(32,32);
    CImg<float> *C  = ph_dct_matrix(32);
    CImg<float> Ctransp = C->get_transpose();

    CImg<float> dctImage = (*C)*img*Ctransp;

    CImg<float> subsec = dctImage.crop(1,1,8,8).unroll('x');
    float median = subsec.median();
    ulong64 one = 0x0000000000000001;
    hash = 0x0000000000000000;
    for (int i=0;i< 64;i++){
        float current = subsec(i);
        if (current > median)
            hash |= one;
        one = one << 1;
    }
    delete C;

    return 0;
}

The pHash C++ code does not seem large at first glance, but it also relies on the CImg library, in particular the following functions.

In this line:

img = src.RGBtoYCbCr().channel(0).get_convolve(meanfilter);
  • RGBtoYCbCr() - converts the image to the YCbCr colour space (the Y component is the greyscale image)
  • channel(0) - returns the specified image channel
  • get_convolve(meanfilter) - convolves the image with a mask; this function subsequently calls
  • get_correlate(mask) - correlates the image with a mask: res(x,y,z) = sum_{i,j,k} (*this)(x + i,y + j,z + k)*mask(i,j,k), which performs the maths and also calls
  • magnitude() - returns normalized image represented in matrix form

These lines of code follow right after

img.resize(32,32);
CImg<float> *C  = ph_dct_matrix(32);
CImg<float> Ctransp = C->get_transpose();

resize() is fairly easily replicated
ph_dct_matrix(32) - here is the function:

CImg<float>* ph_dct_matrix(const int N){
    CImg<float> *ptr_matrix = new CImg<float>(N,N,1,1,1/sqrt((float)N));
    const float c1 = sqrt(2.0/N);
    for (int x=0;x<N;x++){
        for (int y=1;y<N;y++){
            *ptr_matrix->data(x,y) = c1*cos((cimg::PI/2/N)*y*(2*x+1));
        }
    }
    return ptr_matrix;
}
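
For the JavaScript side, the same matrix could be built roughly as follows. This is a sketch that assumes the usual DCT-II layout: row 0 keeps the 1/sqrt(N) fill value and rows 1 to N-1 hold the cosine terms. The function name dctMatrix and the array-of-rows representation (matrix[y][x]) are choices made for the example.

```javascript
// Build the N x N DCT matrix as an array of rows.
// Every entry of row 0 is 1/sqrt(N); the other rows hold the cosine terms.
function dctMatrix(N) {
  const c0 = 1 / Math.sqrt(N);
  const c1 = Math.sqrt(2 / N);
  const matrix = [];
  for (let y = 0; y < N; y++) {
    const row = new Array(N);
    for (let x = 0; x < N; x++) {
      row[x] = y === 0 ? c0 : c1 * Math.cos((Math.PI / 2 / N) * y * (2 * x + 1));
    }
    matrix.push(row);
  }
  return matrix;
}
```

A quick sanity check for such a matrix is that its rows are orthonormal: each row has length 1 and different rows have a dot product of 0.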

get_transpose() then calls
get_permute_axes("yxzc") which permutes the axes order

After that, the DCT result is cropped to an 8x8 block, unrolled so that all the pixels lie along the x-axis, and then hashed:

    CImg<float> dctImage = (*C)*img*Ctransp;
    CImg<float> subsec = dctImage.crop(1,1,8,8).unroll('x');

    //pHash code 

So far, the pixel manipulations seem feasible in canvas. I will be posting updates on my implementation.



by anna at July 17, 2015 07:12 PM

LEAP Project

LEAP is Officially Released!

After months of preparation, we are pleased to release LEAP: Linux for Enterprise ARM Platforms.

LEAP is our software distribution for testing and evaluating 64-bit ARM enterprise computing platforms. It is based on the CentOS 7.1 sources, with package upversioning, optimization, and fixes, plus additional packages. We will be continuously updating LEAP over the coming months to support additional ARM64 platforms and improve performance.

What are you waiting for? Head on over to the LEAP Homepage for all the details!

by christophertyler at July 17, 2015 06:46 PM

July 16, 2015

Cong Wang

address to coordinates transformation through openstreetmap

Search through the above URL and the response comes back in XML format. I have written a Java app to convert an address to coordinates; here is my code:

package javaconnection;

import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class JavaConnection {

    public static void main(String[] args) throws MalformedURLException, IOException, SAXException, ParserConfigurationException, Exception {
        String roadnumber = "135";
        String AvenuneName = "pilkington";
        String avenue = "avenue";
        String Country = "birmingham";
        String BaseURL = "";
        String uri = BaseURL + roadnumber + "%20" + AvenuneName + "%20" + avenue + "," + "%20" + Country + "?format=xml&point=1&addressdetails=1";

        // Request the search result as XML
        URL url = new URL(uri);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/xml");
        InputStream xml = connection.getInputStream();

        // Parse the response into a DOM document
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(xml);
        prettyPrint(doc);
    }

    public static final void prettyPrint(Document xml) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "yes");
        Writer out = new StringWriter();
        tf.transform(new DOMSource(xml), new StreamResult(out));

        // Print only the tokens that carry the lon/lat attributes
        for (String coordinates : out.toString().split(" ")) {
            if (coordinates.contains("lon="))
                System.out.println(coordinates);
            if (coordinates.contains("lat="))
                System.out.println(coordinates);
        }
        // System.out.println(out.toString());
    }
}
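
The same two steps, building the request URL and picking the coordinates out of the XML, can be sketched in JavaScript as below. The function names are hypothetical, baseUrl is a placeholder for the search endpoint, and the query string simply mirrors the one in the Java code; the regular expressions assume the response carries lat="..." and lon="..." attributes, as the Java filter does.

```javascript
// Build the search URL from address parts. `baseUrl` is a placeholder for the
// search endpoint; the query parameters mirror the Java code above.
function buildSearchUrl(baseUrl, street, city) {
  const query = encodeURIComponent(`${street}, ${city}`);
  return `${baseUrl}${query}?format=xml&point=1&addressdetails=1`;
}

// Pull the first lat/lon attribute pair out of the XML text, or null if absent.
function extractCoordinates(xml) {
  const lat = /\blat=["']([^"']+)["']/.exec(xml);
  const lon = /\blon=["']([^"']+)["']/.exec(xml);
  return lat && lon ? { lat: Number(lat[1]), lon: Number(lon[1]) } : null;
}
```

Matching the attributes directly avoids depending on where the pretty-printer happens to put spaces, which the split(" ") approach relies on.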

by wang cong at July 16, 2015 08:00 PM

July 15, 2015

Hosung Hwang

library mismatch problem and solution

When I tried to run an executable that had been built on another machine, it showed the following error:

$ ./regdaemon
./regdaemon: /usr/lib/x86_64-linux-gnu/ version `GLIBCXX_3.4.20' not found (required by ./regdaemon)



The reason for this error was that the dynamically linked library's version was lower than the library version on the build machine.

On the build machine, the library looks like this:

/usr/lib/x86_64-linux-gnu$ ll libstdc*
lrwxrwxrwx 1 root root      19 Nov  4  2014 ->
-rw-r--r-- 1 root root 1011824 Nov  4  2014

This means that the library that is actually used by the executable is and links to it. This library is installed with the newer gcc.

On the other machine, which showed the error, the library looked like this:

/usr/lib/x86_64-linux-gnu $ ll libstdc*
lrwxrwxrwx 1 root root     19 May 14 14:11 ->
-rw-r--r-- 1 root root 979056 May 14 14:41

links to and it is an older version than on the build machine.



Since the machine ran Linux Mint, which is Debian-based, the newest gcc can be installed with the following commands:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install g++-4.9

Then the library is updated like this :

/usr/lib/x86_64-linux-gnu $ ll libstdc*
lrwxrwxrwx 1 root root      19 Apr 23 13:00 ->
-rw-r--r-- 1 root root 1541600 Apr 23 13:23

Now, because the installed library was newer than the one on the build machine, the executable worked well.

Another solution is to link the C++ runtime statically by adding the -static-libstdc++ option (-static-libgcc on its own only covers libgcc).

Additional information

The files (regular files, sockets, etc.) opened by a process can be listed with the "lsof" utility.

hosung@hosung-Spectre:~$ lsof -p 6002
regdaemon 6002 hosung  cwd    DIR                8,2     4096 2589221 /home/hosung/cdot/ccl/regdaemon/Debug
regdaemon 6002 hosung  rtd    DIR                8,2     4096       2 /
regdaemon 6002 hosung  txt    REG                8,2  1066943 2545008 /home/hosung/cdot/ccl/regdaemon/Debug/regdaemon
regdaemon 6002 hosung  mem    REG                8,2    47712 2117917 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2    14664 2117927 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2   100728 2101352 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1071552 2117915 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  3355040 6921479 /usr/lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1840928 2117938 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2    92504 2097171 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1011824 6846284 /usr/lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2  1112840 6830431 /usr/lib/
regdaemon 6002 hosung  mem    REG                8,2   141574 2117939 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung  mem    REG                8,2   149120 2117935 /lib/x86_64-linux-gnu/
regdaemon 6002 hosung    0u   CHR             136,23      0t0      26 /dev/pts/23
regdaemon 6002 hosung    1u   CHR             136,23      0t0      26 /dev/pts/23
regdaemon 6002 hosung    2u   CHR             136,23      0t0      26 /dev/pts/23
regdaemon 6002 hosung    3u  IPv4              63342      0t0     TCP localhost:60563->localhost:mysql (ESTABLISHED)
regdaemon 6002 hosung    4u  unix 0x0000000000000000      0t0  114861 /tmp/cc.daemon.sock

by Hosung at July 15, 2015 10:07 PM

Anna Fatsevych

Images with HTML5 Canvas

HTML5 Image canvas comes with an array of possibilities in terms of pixel manipulation. I will be exploring more of its functionality in order to recreate pHash dct hash on the client side with JavaScript.

Here is some simple code to create an image:

<canvas id="myCanvas" width="100" height="100">
Your browser does not support the HTML5 canvas tag.
</canvas>

<script>
var c = document.getElementById("myCanvas");
var ctx = c.getContext("2d");
var imgData = ctx.createImageData(100, 100);

var i;
for (i = 0; i < imgData.data.length; i += 4) {
    imgData.data[i+0] = 255;
    imgData.data[i+1] = 0;
    imgData.data[i+2] = 255;
    imgData.data[i+3] = 125;
}

ctx.putImageData(imgData, 0, 0);
</script>

First we create the canvas tag with a width and height of 100 pixels.
Then we create the image data, also with a width and height of 100 pixels.

In the for loop we assign the values for each pixel: every pixel has four values, one per channel (Red, Green, Blue, and Alpha).
Each channel is a greyscale representation of the intensity of one component (R, G, B, or Alpha), with values from 0 to 255. Red (R), Green (G), and Blue (B) are the primary colours, and all other colours can be created by combining them in various strengths (intensities). The fourth channel is the alpha channel, which represents transparency: 0 (black in the greyscale representation) means completely transparent and 255 means fully opaque.

To begin the pHash process the image needs to be turned to greyscale first. In this wikipedia article there is a formula for combining the RGB channels to achieve this.

It recommends the following weighting: greyscale = R * 0.299 + G * 0.587 + B * 0.114;
So for each pixel, red contributes about 30%, green about 59%, and blue about 11% of the grey value.
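
A minimal sketch of that conversion, using the usual luma weights (0.299 for red, 0.587 for green, 0.114 for blue) on RGBA pixel data laid out like ImageData.data. The function name toGreyscale is illustrative.

```javascript
// Convert RGBA pixel data (as found in ImageData.data) to greyscale in place.
function toGreyscale(data) {
  for (let i = 0; i < data.length; i += 4) {
    const grey = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
    data[i] = data[i + 1] = data[i + 2] = grey; // alpha (i + 3) is left as is
  }
  return data;
}
```

In the browser this could be applied to the array returned by ctx.getImageData(...).data before calling ctx.putImageData to draw the result back.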

The next step will be applying filters and masks, which will be discussed in detail in the next post.



by anna at July 15, 2015 11:23 AM

July 14, 2015

Fardad Soleimanloo

Justin Flowers

Configuring Windows 7 Vagrant Base Boxes with SSH

With Vagrant it can be quite frustrating to set up Windows base boxes using WinRM. I have never had any success with the Vagrant WinRM method myself. While I gawk in amazement at pre-built boxes which have WinRM control, there seems to be no complete documentation anywhere on how to set it up. In fact, the Vagrant page describing how to set up Windows base boxes has formatting issues which make its Windows code blocks nearly unreadable. Add to that the fact that their (and others’) instructions are either missing steps or outright wrong, and you end up where I was three weeks ago: falling back on the SSH method to connect and provision with Vagrant. While Vagrant does not have many built-in automatic provisioning features for Windows in SSH mode, you can still do manual provisioning using the modifyvm command in the Vagrantfile to configure what you need.

Step 1: Create vagrant user

The first important step is to create the vagrant user on Windows. Make sure the account’s username and password are both “vagrant” and that it is an administrator. Then log into this new account.

Step 2: Install any provider specific requirements with Cygwin and OpenSSH

Next, from the vagrant user’s account, install any provider specific required software (like VBox’s Guest Additions) and then install Cygwin and OpenSSH using these great instructions from Oracle.

At the end of this section, you should be able to SSH to localhost from a Cygwin terminal by running:

ssh localhost

Step 3: Configure Windows Firewall

You’ll need to add some entries to the firewall to allow communication through port 22 so that Vagrant can communicate with the base box.

  1. Go to “Windows Firewall with Advanced Security” in the start menu.
  2. Go to “Inbound Rules”
  3. Hit “New Rule”
  4. Select “Port” based rule
  5. Select “TCP”
  6. Select “Specified local ports” and enter 22
  7. Select “Allow the connection”
  8. Select all check boxes for “When does this rule apply?”
  9. For name make it something along the lines of “Allow SSH Access”

You may need to add an outbound rule as well with the same setup to explicitly allow connections outbound over 22 but most likely that is not necessary.

Step 4: Package up base box

Now to package up the base box. Create a folder called “vagrant_win7”, change dir into it and run:

vagrant package --base "VM_NAME_HERE" --output ""

Substitute “VM_NAME_HERE” with the name of your VM in your respective provider. This will take a while and will create a file called “” in the current folder.

Step 5: Configure Vagrantfile

In order for Vagrant to even add this new base box to a provider it needs a Vagrantfile. In our case, we’re using:

  1. A Windows guest OS
  2. SSH to a Windows guest OS
  3. Password protection

All of which go against Vagrant default functionality. Hence, we need our Vagrantfile to reflect that. We must also disable the default shared Vagrant folder because it does not set up correctly automatically over SSH. Here’s an example of the Vagrantfile I used to create my Windows base box:

Vagrant.configure(2) do |config|
  config.vm.box = "vagrant_win7"

  config.vm.provider "virtualbox" do |v|
    v.name = "vagrant_win7"
    v.customize ["modifyvm", :id, "--nic2", "hostonly"]
  end
  config.vm.synced_folder '.', '/vagrant', disabled: true
  config.vm.guest = "windows"

  config.ssh.insert_key = false
  config.ssh.username = "vagrant"
  config.ssh.password = "vagrant"
end

With this configuration I set the name of the box in Vagrant and VirtualBox. I also set up a host-only adapter using the modifyvm parameter with v.customize. I then disable the automatic synced folder and explicitly tell Vagrant it’s using a Windows guest OS. Finally I tell Vagrant not to use a private/public key pair with SSH and give it the username and password to connect with.

Step 6: Add box to Vagrant and vagrant up

Finally, now that you have a working base box and Vagrantfile, it’s time to add your box and vagrant up! From the “vagrant_win7” folder simply run:

vagrant box add vagrant_win7

Once it’s finished adding your box you can run:

vagrant up

to turn it on and:

vagrant ssh

to create an SSH connection!


And that’s it! As you can see, the SSH method is a little complex, and leaves you without some of Vagrant’s automatic provisioning features. However, if you simply need a Windows box working quickly and cannot get the WinRM route working, then this is a functional alternative.

by justin at July 14, 2015 02:35 AM

July 03, 2015

Koji Miyauchi

Using Box APIs

This week, I have been working on researching Box APIs in order to use the service in our application.

Box is one of the popular cloud storage services, and we decided to use it as the data source for our data visualization application project. The idea is that users can log in to their Box account and choose one of their files for data visualization.

Get started

These links will be good resources to get started.

CORS support

The tutorial from the above link is quite straightforward.

But when people try to use any kind of API through JavaScript, cross-origin errors are a common problem,
because web browsers generally do not allow cross-domain access from JavaScript.

Since our application consists mostly of client-side script, I needed to consider the cross-origin issue. Fortunately, Box has CORS (cross-origin resource sharing) support for their API.

In order to do cross-origin access with the Box APIs, you need to contact Box API support and let them know what URLs you want to use for your application. They will set up Access-Control-Allow-Origin for your application.

API request samples

After Box contacts you and everything is set up, you can try this example.

Here are also some examples that I have tried for some of the APIs.

User login popup: The first step of user login. It opens a popup window and lets the user authorize our application.

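var loginForm = window.open("", "LoginWindow", "width=400, height=600");

//Keep polling the popped-up window until authentication has been done.
var timer = window.setInterval(function(){
  var code = "";
  try {
    //Assumed: the original expression was lost. The popup URL is readable once it is back on our origin.
    code = loginForm.location.href;
  } catch (e) {}

  //When the popup has been redirected to a page with the client code.
  var match = /code=(.*).$/.exec(code);
  if(match){
    console.log(match[1]) // The code you need for OAuth request
    window.clearInterval(timer);
  }
}, 100);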
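
A minimal sketch of an alternative way to pull the code out of the redirect URL, using the standard URL class instead of a regular expression. The function name extractAuthCode and the example.com URL are hypothetical, and it assumes the code arrives as a "code" query parameter, as the regular expression above suggests.

```javascript
// Hypothetical helper: read the OAuth "code" query parameter from a redirect URL.
function extractAuthCode(redirectUrl) {
  const url = new URL(redirectUrl);
  // searchParams.get returns null when the parameter is absent.
  return url.searchParams.get("code");
}

// Example with a made-up redirect URL.
console.log(extractAuthCode("https://example.com/callback?state=abc&code=XYZ123"));
```

Parsing the query string this way does not depend on where the parameter sits in the URL or on what follows it.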

OAuth: After you get the code from the previous example, you can make a request to OAuth.

var form = new FormData();
form.append('grant_type', "authorization_code");
form.append('code', code);
form.append('client_id', "CLIENT_ID");
form.append('client_secret', "CLIENT_SECRET");

$.ajax({
  url: "",
  type: 'POST',
  contentType: false,
  processData: false,
  data: form
}).complete(function (data){
  var json = JSON.parse(data.responseText);
  console.log(json.access_token) //This is the token that you can use for your API accesses.
});

API request
Now you can use that access token to make requests to many different APIs.

var headers = {
   Authorization: 'Bearer ' + token
};

$.ajax({
  url: "curl",
  type: 'GET',
  headers: headers,
  contentType: false,
  processData: false
}).complete(function (data){
  //data.responseText holds the API response.
});

Solution for our application

After I tried a couple of those examples, I found out that Box provides some widgets that you can simply embed and use.

This creates a “Select from Box” button, which pops up a login form when you click it.

Screen Shot 2015-07-03 at 5.43.16 PM

Screen Shot 2015-07-03 at 5.42.45 PM

Once you log in, you can browse your folders and choose the file(s) to use in your application.
Screen Shot 2015-07-03 at 5.43.11 PM

In the end, this is all I need.

by koji miyauchi at July 03, 2015 09:51 PM

Glaser Lo

Koji build system overview

As Pidora's maintainer over the winter, I spent most of my time on Koji, a package build system created by Fedora engineers. Building a repository-scale set of packages can take a huge amount of time, and it is unrealistic to build them one by one on your own computer. That is why Koji was created: its purpose is to provide a multi-user, reusable, scalable package build system. There are a few components in the Koji system:

Koji build system graph

  • Koji-hub - the central service of Koji. It receives commands from clients and assigns/passes tasks to the other components. It is based on httpd and XML-RPC calls.

  • Builders/Hosts (kojid) - machines that are used only for building packages. Once the kojid daemon on a builder receives a task from koji-hub, the builder creates a mock chroot and builds the packages in it.

  • Storage - simply network-attached storage mounted at /mnt/koji on every Koji machine, storing all the build logs and package files.
    • Koji repositories - collections of packages that have already been built on Koji. Since a Koji repo is basically a yum repo, testing it is easy and convenient.
  • Kojira - to make a built package available for building other packages, the Koji repository needs to be regenerated. Kojira is a daemon that automatically detects new changes and creates a repo regeneration task.

  • PostgreSQL database - the Koji user and package database.

  • koji web - a web interface that lets users quickly check current tasks, build status, host status, reports, etc.

  • koji client - a workstation that holds a Koji user certificate. Once the client is authenticated, the user can manage packages on the Koji server using koji CLI commands.

Koji build system overview was originally published by Glaser Lo at Illusion Village on July 03, 2015.

by Glaser Lo at July 03, 2015 09:07 PM

Hosung Hwang

CC Image License Search Engine API Implementation


CC - New Page


Previously, my colleague Anna made a page that searches for similar images, either from an uploaded file or from a link. This UI page can live either on the server or outside the server. It uses only the PHP API and does not access the database directly.


This is an open API that provides functions for adding, deleting, and matching images. It can be accessed by anyone who wants this functionality. The UI page, or a client implementation such as a browser extension, uses this API. The matching result is in JSON format.
The API page performs Add/Delete/Match by asking the “C++ Daemon”, without changing the database itself.
It is permitted only read-only access to the database.

C++ Daemon

All add/delete operations are done in this daemon. This removes the problem of synchronizing the database with the index used for matching, because the daemon keeps the content index in memory all the time for fast matching.
Because the daemon is always running, it works as a domain socket server to receive requests from, and return results to, the “PHP API”. The PHP API sends its requests over the domain socket.


The database contains all the metadata about the CC-licensed images, along with the thumbnail paths that are used to show previews in the matching results.

by Hosung at July 03, 2015 03:16 PM

July 02, 2015

Chris Tyler (ctyler)

The OSTEP Applied Research Team

I haven't introduced my research team for quite a while, and it has changed and grown considerably. Here is the current Open Source Technology for Emerging Platforms team working with me at Seneca's Centre for Development of Open Technology. From left to right:

  • (me!)
  • Michael (Hong) Huang (front)
  • Edwin Lum (rear)
  • Glaser Lo
  • Artem Luzyanin (front)
  • Justin Flowers (rear)
  • Reinildo Souza da Silva
  • Andrew Oatley-Willis

Edwin and Justin work with me on the DevOps project, which is applying the techniques we've learned and developed to the software development processes of a local applied research partner.

Michael, Glaser, Artem, Reinildo, and Andrew work with me on the LEAP Project. Recently (since this photo was taken), Reinildo returned to Brazil, and has been replaced by Christopher Markieta (who has previously worked with this project).

I'm dying to tell you the details of the LEAP project, so stay tuned for an announcement in the next week!

by Chris Tyler at July 02, 2015 08:51 PM

Anna Fatsevych

Flickr API Woes

My genius Flickr Downloader was chugging along and downloading images with all the required licensing and author information and everything seemed fine, until yesterday when I ran into an interesting issue. The images kept duplicating themselves after the folder was at 4,497 files. I ran the program again and (after a few hours, mind you) the issue reappeared. After I had exhausted all the possibilities of errors on my end (code, maximum dir size capability, etc), I began an investigation on Flickr API that yielded no results. Today I ran the program a few different times, on various dates, and alas, it capped out at 4,500 images on the dot each time.

The only limit ever mentioned in the official Flickr API documentation is the throttling cap of 3600 API calls per hour, and nothing is documented about the maximum number of results returned by a search. I dug out this StackOverflow article that mirrors my issue; the only difference is that it states the cap to be 4,000 search results, whereas I found it to be 4,500.

I am now testing the new downloader with more frequent time increments that yield search results that are less than the allowable max.
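
A minimal sketch of the idea of using smaller time increments, assuming the search takes UNIX timestamps for its date bounds and treats both bounds as inclusive (neither is confirmed here). The function name splitWindow is hypothetical.

```javascript
// Split the half-open range [startSec, endSec) into `parts` consecutive
// windows of UNIX timestamps. Each window has inclusive min/max bounds,
// and windows do not overlap.
function splitWindow(startSec, endSec, parts) {
  const windows = [];
  const step = (endSec - startSec) / parts;
  for (let i = 0; i < parts; i++) {
    const min = Math.floor(startSec + i * step);
    const max = i === parts - 1
      ? endSec - 1
      : Math.floor(startSec + (i + 1) * step) - 1;
    windows.push({ min, max });
  }
  return windows;
}

// Example: one UTC day split into 24 one-hour windows.
const dayStart = Date.UTC(2015, 2, 20) / 1000;
const hourly = splitWindow(dayStart, dayStart + 86400, 24);
console.log(hourly.length, hourly[0], hourly[23]);
```

If a window still returns a result count at the cap, the same function can be applied to that window again to split it further.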

by anna at July 02, 2015 01:02 AM

June 29, 2015

Hosung Hwang

Pastec Test for Performance

So far, I have tested Pastec in terms of image matching quality. In this post, I test the speed of adding and searching.

Adding images to index

First I added 100 images, which took 48.339 seconds. Then I added every directory from 22 to 31. Those images were uploaded to Wikimedia Commons from 2013.12.22 to 2013.12.31.

Directory  Start     End       Duration  Count  Average
22         17:32:42  18:43:50  01:11:08   8785  00:00.49
23         18:43:50  19:42:03  00:58:13   7314  00:00.48
24         19:42:03  20:28:56  00:46:53   6001  00:00.47
25         20:28:57  21:28:02  00:59:05   7783  00:00.46
26         21:28:02  22:41:12  01:13:10   9300  00:00.47
27         22:41:19  23:54:28  01:13:09   9699  00:00.45
28         00:54:28  01:53:23  00:58:55   7912  00:00.45
29         00:53:23  02:27:42  01:34:19  11839  00:00.48
30         02:27:42  03:31:48  01:04:06   8827  00:00.44
31         03:31:48  04:23:15  00:51:27   6880  00:00.45

The average time for adding an image was around 0.46 second, and it did not increase as the index grew. Most of the time for adding an image is spent extracting features.
I saved the index file for 100 images, for directories 22 to 26, and for directories 22 to 31. The sizes were 8.7mb, 444.1mb, and 935.8mb respectively.
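
As a check on the overall average, the durations and counts from the table above can be totalled. This is only a small sketch; the helper name toSeconds is hypothetical and the rows are copied from the table.

```javascript
// Convert an "HH:MM:SS" duration into seconds.
function toSeconds(hms) {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Rows copied from the table above: [directory, duration, image count].
const rows = [
  [22, "01:11:08", 8785],
  [23, "00:58:13", 7314],
  [24, "00:46:53", 6001],
  [25, "00:59:05", 7783],
  [26, "01:13:10", 9300],
  [27, "01:13:09", 9699],
  [28, "00:58:55", 7912],
  [29, "01:34:19", 11839],
  [30, "01:04:06", 8827],
  [31, "00:51:27", 6880],
];

const totalSeconds = rows.reduce((sum, r) => sum + toSeconds(r[1]), 0);
const totalImages = rows.reduce((sum, r) => sum + r[2], 0);
const averagePerImage = totalSeconds / totalImages;

// prints: 84340 0.46
console.log(totalImages, averagePerImage.toFixed(2));
```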


Searching images

I loaded the index file for 100 images and searched for all 100 images that had been added.

Directory  Start  End  Duration  Count  Average
22         -      -    00:01:14    100  00:00.74

Searching took 1m14.781s. Since there were 100 images, the average time to search for one image was 0.74 second.

Then I loaded the index file that contains the index for the 39,183 images in directories 22 to 26.

Directory  Start     End       Duration  Count  Average
22         09:00:05  11:21:06  02:21:01   8785  00:00.96
23         11:21:06  13:13:52  01:52:46   7314  00:00.93
24         13:13:52  14:48:26  01:34:34   6001  00:00.95
25         14:48:26  16:48:44  02:00:18   7783  00:00.93
26         16:48:44  19:13:11  02:24:27   9300  00:00.93

This time, the average time for searching for one image was 0.95 second.

Then I loaded the index file that contains the index for the 84,340 images in directories 22 to 31.

Directory  Start     End       Duration  Count  Average
22         19:32:54  22:44:09  03:11:15   8785  00:01.31
23         20:44:09  23:16:59  02:32:50   7314  00:01.25
24         01:16:59  03:24:52  02:07:53   6001  00:01.28
25         03:24:52  06:11:33  02:46:41   7783  00:01.28
26         06:11:33  09:30:53  03:19:20   9300  00:01.29

The search was performed for the same images, from directories 22 to 26. The average search time was 1.3 seconds.


  • Adding an image took around 0.46 second.
  • Adding time did not vary with index size.
  • Search time varied with index size.
  • When the index size was 100, 39,183, and 84,340 images, the search time was 0.74, 0.95, and 1.3 seconds, respectively.
    Screenshot from 2015-06-28 23:14:15
    In the chart, the y-axis is time in milliseconds. Around 0.6 second is likely spent reading the image and extracting features, and the search time then increases in proportion to the size of the index.

by Hosung at June 29, 2015 03:28 AM

June 26, 2015

Barbara deGraaf

The thrilling saga on shaders continues

In my last post I detailed some basics of creating a shader and in this post I will be focusing on how to create a depth of field shader.

A couple of files need changing, including the shader file and the main js file. I am going to start off with the shader file and mention the js file later.

As I stated in the last post, the depth of field shader only changes the fragment shader, so the vertex shader will be the same as the one I posted there.

So this post will mainly focus on the fragment shader. I was going to walk through the code in the shader, but that made the post too long, so I will describe the main concept of creating depth of field instead. It goes as follows: create a texture containing the depth map. Then grab the value from the depth texture to figure out how far away from the camera the pixel is. Using the inputs from the camera, find out where the near and far limits of the depth of field are. We can then compare the depth of the pixel to the near and far limits to find out how blurry it should be. We then do something called image convolution. This process grabs the colours of the pixels around a given pixel and adds them together, so that the final pixel is a mix of all the colours around it.
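
To make the convolution step concrete, here is a minimal CPU-side sketch of a box blur on a small grayscale image. It is an illustration only, not the shader from this project: the function name boxBlur is hypothetical, the blur radius is fixed rather than derived from the depth comparison, and edge pixels are handled by clamping.

```javascript
// Average each pixel with its neighbours inside a (2*radius+1) x (2*radius+1)
// window. `image` is a 2D array of grayscale values; coordinates outside the
// image are clamped to the nearest edge pixel.
function boxBlur(image, radius) {
  const h = image.length;
  const w = image[0].length;
  const out = [];
  for (let y = 0; y < h; y++) {
    const row = [];
    for (let x = 0; x < w; x++) {
      let sum = 0;
      let count = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          const yy = Math.min(h - 1, Math.max(0, y + dy));
          const xx = Math.min(w - 1, Math.max(0, x + dx));
          sum += image[yy][xx];
          count++;
        }
      }
      row.push(sum / count);
    }
    out.push(row);
  }
  return out;
}

// A single bright pixel gets spread over its neighbours.
console.log(boxBlur([[0, 0, 0], [0, 9, 0], [0, 0, 0]], 1));
```

In a depth of field shader the same averaging would run per fragment, with a larger radius for pixels further outside the in-focus range.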

To get the shader to work, Three.js has something called the effect composer and shader pass to work with your shaders. This is done in rough form as follows:

composer = new THREE.EffectComposer( renderer );
composer.addPass( new THREE.RenderPass( scene, camera ) );

var Effect1 = new THREE.ShaderPass( shadername, textureID );
Effect1.uniforms[ 'value1' ].value = 2.0 ;
Effect1.renderToScreen = true; //the last shader pass you make needs to say rendertoscreen = true
composer.addPass( Effect1 );

Then to get this to work you need to call composer.render() in the render loop instead of the normal renderer.render().

I will end here for this post. If need be, I will wrap up some minor things about shaders in the next post. Also, once the scene is nicely set up and the GUI works with real world cameras/lenses, I will publish a post with a survey to see which shader produces the best results and where it can be improved.



by barbaradegraafsoftware at June 26, 2015 02:04 PM

Hosung Hwang

scp/sftp through ssh tunnel

SSH Tunneling

Machine CC can be reached from another machine called zenit.
To scp to CC through zenit, the following command establishes an ssh tunnel to CC.

ssh -L 9999:[address of CC known to zenit]:22 [user at zenit]@[address of zenit]
in my case,
ssh -L 9999:

Now port 9999 of localhost is the tunnel to CC through zenit.
This session needs to stay alive for all of the following steps.


SCP through the SSH Tunnel

Then these commands copy the local test.png file to CC:~/tmp and copy CC:/tmp/test.png to the current directory (.).

scp -P 9999 test.png ccuser@
scp -P 9999 ccuser@ .


Making it easy

Typing those long commands every time is not a good idea.
I added an alias to .bashrc.

alias ccturnnel='ssh -L 9999:'

Then I wrote two simple bash scripts.

This is cpfromcc.

var=$(echo $1 | sed 's/\/home\/hosung/~/g')
remote="localhost:$var"   # assumed: this line was lost; localhost is the local end of the tunnel
scp -P 9999 ccuser@$remote $2

This is cptocc.

i=0
for var in "$@"; do
    i=$((i+1))
    if [ $i -ne $# ]; then
        values="$values $var"
    else
        remote="localhost:$(echo $var | sed 's/\/home\/hosung/~/g')"   # "localhost" is assumed
    fi
done
scp -P 9999 $values ccuser@$remote

I use sed on the remote path because bash expands ~ to my local home directory.
Now I can establish the ssh tunnel by typing ccturnnel.
Then I can scp from my machine to CC using:

cptocc test.jpg test2.jpg ~

And I can scp from CC to my machine using:

cpfromcc ~/remotefile.txt .


Making it convenient using sftp

Once the tunnel is established, sftp works the same way.

$ sftp ccuser@


Making it more convenient using Krusader

By typing sftp://ccuser@ in the URL bar of Krusader and then adding the place to the bookmarks, the remote machine’s file system can be accessed easily.

Screenshot from 2015-06-26 10:23:39

Mounting it using sshfs should also be possible.

by Hosung at June 26, 2015 03:54 AM

June 24, 2015

Anna Fatsevych

Flickr API – Date Time

The Flickr API has a funny way with dates, and I am in the middle of discovering how it really works. Before, I was sending the dates as “YYYY-MM-DD” strings with a difference of one day, i.e. “2015-03-20 2015-03-21”, and I was getting only about 1,000 images per day (on average).

I dug deeper into the API and realized that it accepts a UNIX timestamp or a MySQL datetime. In my PHP code I set the default timezone to Greenwich and then set the dates in MySQL datetime format like this:

min_upload_date: “2015-03-20 00:00:00”
max_upload_date: “2015-03-20 23:59:59”

Now I get an average of 200,000 results per day (Licenses 1 through 7).
This is great news, although there are some grey areas that I need to research further: how exactly Flickr compares dates, and with what precision, rounding, or truncation.
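
A small sketch of building those two bounds for any given day, shown in JavaScript rather than the PHP of the downloader. The function name dayWindow is hypothetical; it assumes the day is meant in UTC (Greenwich), and it also returns the equivalent UNIX timestamps for comparison.

```javascript
// Build the upload-date bounds covering one UTC day, both as
// "YYYY-MM-DD HH:MM:SS" strings and as UNIX timestamps in seconds.
function dayWindow(year, month, day) {
  const pad = (n) => String(n).padStart(2, "0");
  const date = `${year}-${pad(month)}-${pad(day)}`;
  return {
    min_upload_date: `${date} 00:00:00`,
    max_upload_date: `${date} 23:59:59`,
    // Date.UTC takes a zero-based month.
    min_timestamp: Date.UTC(year, month - 1, day, 0, 0, 0) / 1000,
    max_timestamp: Date.UTC(year, month - 1, day, 23, 59, 59) / 1000,
  };
}

console.log(dayWindow(2015, 3, 20));
```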

More to come as I am still researching and running tests.



by anna at June 24, 2015 09:47 PM

Hosung Hwang

Pastec Test for real image data

In the previous test of Pastec, I used 900 JPEG images that were mainly computer-generated. This time, I tested images from the WikiMedia Commons archive of CC-licensed images uploaded from 2013-12-25 to 2013-12-30. Each archive is a zip file of 17GB to 41GB containing around 10,000 files, including jpg, gif, png, tiff, ogg, pdf, djvu, svg, and webm. Before testing, I deleted the xml, pdf, djvu and webm files. That left 55,643 images.


Indexing 55,643 images took around 12 hours, and the index file was 622mb. At first, I made separate index files for each day. However, Pastec can load only one index file, so I added all six days’ images and saved them to one index file.

While indexing, there were some errors.

  1. Pastec uses OpenCV, and OpenCV doesn’t support gif and svg. Files in these two formats could not be opened.
  2. Pastec only adds images that are bigger than 150×150 pixels.
  3. There were zero-byte images: 153 files out of 55,643. On the Wikimedia web page, however, these are valid images. In any case, they cause an error.
  4. One tiff image caused a crash inside Pastec. It needs debugging.


After loading the 622 mb index file, images can be searched. Searching for all 55,643 images took around 15 hours. Every search extracts the image's features before searching the index, so searching takes more time than indexing.

Search result

Among the 55,643 images, 751 images (1.43%) are smaller than 150×150, so they were not added. 51,479 images were of a suitable size and in a format OpenCV supports; they were indexed and can be searched.

  • 42,931 (83%) images matched only themselves (exactly the same image)
  • 8,459 (16%) images matched more than one image
  • 90 (0.17%) images did not match any image, not even themselves.
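
The three categories above can be counted with a short script. This is a sketch under assumptions: the results are taken to be an object mapping each searched image id to the list of image ids the search returned, which is a hypothetical structure and not Pastec's actual output format, and classifyMatches is a made-up name.

```javascript
// Count how many searched images matched only themselves, matched more
// than one image, or matched nothing. A single match that is not the
// image itself is counted separately as otherSingle.
function classifyMatches(results) {
  const counts = { selfOnly: 0, multiple: 0, none: 0, otherSingle: 0 };
  for (const [id, matches] of Object.entries(results)) {
    if (matches.length === 0) {
      counts.none++;
    } else if (matches.length > 1) {
      counts.multiple++;
    } else if (String(matches[0]) === id) {
      counts.selfOnly++;
    } else {
      counts.otherSingle++;
    }
  }
  return counts;
}

// Made-up sample data for illustration.
console.log(classifyMatches({ 1: [1], 2: [2, 7], 3: [], 4: [4] }));
```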

Images that didn’t match any image

These 90 images were properly indexed, but didn’t match even themselves.

  • 55 images were PNG images that include transparency. The remaining cases below were JPEG images.
  • 14 images were long panorama images like the following


  • 6 images were simple images like the following

__Amore_2013-12-30_14-18 __Bokeh_(9775121436) __Bokeh_(9775185973) __Hmm_2013-12-30_16-54 __Moon_early_morning_Shot

  • 8 vague images: the lines are not clear, or the photographs are out of focus

__20131229141153!Adrien_Ricorsse SONY DSC __Llyn_Alwen,_Conwy,_Cymru_Wales_21 __Minokaya_junior_high_school __Moseskogbunn __Nella_nebbia._Franco_Nero_e_Valeria_Vaiano_in_Mineurs_-_Minatori_e_minori SONY DSC SONY DSC

  • Other cases
    __Brännblåsa_på_fingret_2013-12-26_13-40 __Pottery_-_Sonkh_-_Showcase_6-15_-_Prehistory_and_Terracotta_Gallery_-_Government_Museum_-_Mathura_201d247a1ec8535aec4f9bf86066bd10dd
    These two images are a bit out of focus.

__Jaisen_Wiki_Jalayathra_2013_Alappuzha_Vembanad_Lake26 __Jaisen_Wiki_Jalayathra_2013_Alappuzha_Vembanad_Lake41 __Jaisen_Wiki_Jalayathra_2013_Alappuzha_Vembanad_Lake42

The original size of this image is 150×150 pixels. Maybe it is too small and simple.

Images matched with more than one image

8,459 images matched more than one image. To compare the results, I generated an html file that shows all the match results, like the following:
Screenshot from 2015-06-24 16:29:49

I converted all the images to 250×250 pixels using the convert -resize 250x250 filename command so that they could be shown on one page. The html file was 6.8 mb and shows 64,630 images.

As I mentioned in my previous blog post, Pastec is good at detecting rotated/cropped images.
Almost all of the matches were reasonable (similar). The following are the notable matches:
20131225102452!Petro_canada Petro_canada_info

20131225193901!Bitlendik Bitlendik-avatar

In these two cases, the logo was matched.

20131225212947!June_Allyson_Dick_Powell_1962 June_Allyson_Dick_Powell_1962 Aveiro_342

This match looks like a false positive.

Buddhapanditsabhacandapuri Aveiro_144

This match is also a false positive.

NZ-SH45_map NZ-SH30_map NZ-SH38_map

In this case, the map is shifted.

PK-LJH_(Boeing_737-9GP)_from_Lion_Airlines_(9722507259) Blick über die Saalenbergkapelle in Sölden zum Kohlernkopf

This is an obvious false positive; maybe a sharp part of the airplane was matched with part of the roof.

From my observation, there were fewer than 50 obvious false positive matches that don’t share any object, which is 0.08%. Wrong matches usually occurred when the image contained graphs or documents. When the image was a normal photograph, the result was very reliable.

by Hosung at June 24, 2015 09:41 PM

Anna Fatsevych

Curl and wget

When downloading images using PHP (using curl or file_put_contents), I have run into issues with download sizes, possible interruptions, and memory usage, all of which can and have to be changed in your php.ini file.

Then I came across this comparative article about wget and curl, curl vs. wget, and decided to give wget a try, as it does not initially seem to have those limitations and can continue downloading even after an interruption, which makes the case for it as the preferred download method for us.

Curl relies heavily on php.ini settings and is incorporated into my PHP program, whereas wget is executed as a command line and downloads independently of the PHP settings, so it might be more beneficial for making a portable downloader that requires minimal configuration changes.

I did not have to install the wget package on the Linux Mint Cinnamon OS and can just run the executable from within my PHP code like this:

exec("wget http://your/url");
exec("wget ".escapeshellarg($urlToDownload)); // escapeshellarg quotes the URL safely for the shell

or you can choose to specify the destination file of the download with wget

exec("wget https://your/url -O /your/dir/filename.jpg");


I have run more tests, as at times wget would give me a 100% downloaded message but the file was 0 bytes in size. This alarmed me, as the error was not caught; it was caused by a redirect, which is automatically handled by curl. I am currently looking more into this issue, but in the meantime I have run some tests, and these are my results:

280 images – CURL: 422 seconds, No Errors
WGET: 703 seconds, No Errors

350 images – CURL: 475 seconds, No Errors
WGET: 821 seconds, No Errors

450 images – CURL: 541 seconds, No Errors
WGET: 1008 seconds, 3 Errors – Images size 0
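
Since the zero-byte downloads were not reported as errors, one way to catch them is to scan the download directory afterwards. The sketch below is in JavaScript (Node.js) rather than PHP, and the function name findEmptyFiles and the directory path in the example are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Return the names of regular files in `dir` whose size is 0 bytes.
function findEmptyFiles(dir) {
  return fs.readdirSync(dir).filter((name) => {
    const stat = fs.statSync(path.join(dir, name));
    return stat.isFile() && stat.size === 0;
  });
}

// Example usage with a hypothetical download directory:
// console.log(findEmptyFiles("/your/dir"));
```

The files it reports could then be queued for another download attempt.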

In regards to file storage, I have found out that a directory in NTFS format can store a sufficient number of image files for our purposes, and therefore one directory would be enough to store the images that way, as opposed to having them stored as blobs in the MySQL database.

More to come on this topic.


by anna at June 24, 2015 02:57 PM

June 23, 2015

Andrew Smith

Using ImageMagick without running out of RAM

For our research project we needed to use pHash to do some operations on a lot (tens of thousands) of image files. pHash uses ImageMagick internally, probably for simple operations such as resizing and changing the colour scheme.

I am pretty familiar with errors such as these coming from convert or mogrify:

convert.im6: no decode delegate for this image format `Ru-ей.ogg' @ error/constitute.c/ReadImage/544.
convert.im6: no images defined `pnm:-' @ error/convert.c/ConvertImageCommand/3044.
sh: 1: gm: not found

[CImg] *** CImgIOException *** [instance(0,0,0,0,(nil),non-shared)] CImg<unsigned char>::load(): Failed to recognize format of file 'Ru-ей.ogg'

What I wasn’t expecting was to get such errors in one of my own applications that uses a library (phash) that uses another library (imagemagick). What moron prints error messages to stdout from inside a library? Seriously!!??

But it gets worse. As I put this code in a loop it quickly found a reason (the first was a .djvu file) to eat up all my ram and then start on the swap. Crappy code, but it’s a complex codebase, I can forgive them. I figured I’ll just set my ulimit to not allow any program to use over half a gig of RAM with “ulimit -Sv 500000” and ran my program again:

[CImg] *** CImgInstanceException *** [instance(0,0,0,0,(nil),non-shared)] CImg<float>::CImg(): Failed to allocate memory (245.7 Mio) for image (6856,9394,1,1).
terminate called after throwing an instance of 'cimg_library::CImgInstanceException'
  what():  [instance(0,0,0,0,(nil),non-shared)] CImg<float>::CImg(): Failed to allocate memory (245.7 Mio) for image (6856,9394,1,1).

Aborted? What sort of garbage were these people smoking? You don’t bloody abort from a library just because you ran out of memory, especially in a library that routinely runs out of memory! Bah. Anyway, I found a way to make sure it doesn’t abort. I set ulimit back to unlimited and instead created a global ImageMagick configuration file, /usr/share/ImageMagick-6.7.7/policy.xml:

  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="map" value="512MiB"/>

Now no more aborts and no more running out of memory. Good. Until I got to about file number 31000 and my machine ground to a halt again, as if out of RAM and swapping. What this time? Out of disk space of course, why not!

I’ve already set ImageMagick in my program to use a specific temporary directory (export MAGICK_TMPDIR=/tmp/magick1 && mkdir -p $MAGICK_TMPDIR) so that my program, after indirectly using the imagemagick library, can run system("rm -f /tmp/magick?/*"); because, you know, it’s too much to ask ImageMagick to clean up after itself. Barf… But it even got around that. For a single PDF file it used over 65GB of disk space in /tmp.

And if at least they said they’re using other people’s libraries it’s not their fault and so on and so forth maybe I wouldn’t be so pissed, but instead they give me bullshit like “oh what’s a lot of resources to you is nothing to someone else, we have 1TB of RAM, bla bla”.

Piss off, I’m going to find another solution that doesn’t involve using this garbage.

by Andrew Smith at June 23, 2015 02:47 AM

June 19, 2015

Barbara deGraaf

An Introduction to shaders

For our project we are using shaders to replicate the depth of field of the camera. The shaders available online certainly work, but I was not happy with the lack of explanation of the procedure within those shaders, so I have decided to make my own to replicate depth of field.

In this post I am just going to explain some introductory concepts about using shaders in Three.js and lead up to the final shader results in later posts.

Before going into details about the shaders I am going to talk a bit about the rendering pipeline and then jump back. The rendering pipeline is the sequence of steps that OpenGL (the API that renders 2D and 3D vector graphics) takes when rendering objects to the screen.


This image was taken from the OpenGL rendering pipeline page here.

Glossing over some things a bit, there are basically two things happening. First the pipeline deals with the vertex data: the vertex shader is responsible for turning those 3D vertices into 2D coordinate positions for your screen (it decides where objects are located on the screen). After some other stages, rasterization occurs, which turns the primitives (triangles) formed by these vertices into fragments. After this the fragment shader runs. The fragment shader is responsible for what colour each fragment/pixel on screen has.

This whole pipeline runs on the GPU, and in WebGL the only two parts of it that are programmable by a user are the vertex shader and the fragment shader. Using these two shaders we can greatly alter the output on the screen.

For Three.js/WebGL the shaders are written in GLSL (with three.js simplifying things for us a little bit) which is similar to C. This shader file can be separated into three main parts: uniforms, vertex shader, and the fragment shader.

The first part, the uniforms, holds all the values passed in from the main JS file. I’ll talk about passing in values in a later post. A basic example is:

uniforms: {
"tDiffuse": { type: "t", value: null },
"value1": { type: "f", value: 1.2 }
},

tDiffuse is the texture that was passed from the previous shader, and this name is always the same for three.js. Many types can occur in the uniforms, but some of the basic ones are i = integer, f = float, c = colour, t = texture, v2 = vector2 (3 and 4 also exist), m4 = matrix4, etc.

The next part is the vertex shader. Because of what I want to do (change the colour of the pixel to create a blurring effect) I don’t need to change anything in here, but it is still required to write this in the shader file. If you want to code one you must code the other as well.

vertexShader: [

  "varying vec2 vUv;",
  "void main() {",
    "vUv = uv;",
    "gl_Position = projectionMatrix * modelViewMatrix * vec4( position, 1.0 );",
  "}"

].join("\n"),

Varying means that the value is passed from the vertex shader to the fragment shader and interpolated, so it changes for each pixel being processed. In this one we have vUv, a vector that holds the UV coordinates of the pixel (for a full-screen pass these correspond to screen coordinates); the uv attribute it is copied from is passed in automatically by three.js. The next line just takes the 3D coords and projects them onto the 2D coords on your screen. I am going to skip the explanation of why this works as it is not important; just look it up or ask me if you really want to know.

Now for the important one, the fragment shader:

fragmentShader: [

"uniform sampler2D tDiffuse;",
"varying vec2 vUv;",

"void main() {",
  "vec4 color = texture2D(tDiffuse, vUv);",
  "gl_FragColor = color;",
"}"

].join("\n")

Here vUv is the same as in the vertex shader, and tDiffuse is the texture that was passed in (declared as sampler2D here). In this main function we grab the RGBA value from the passed-in texture at coordinate vUv and then assign it to the output pixel.

This is the shader I will be using to create a depth of field, and for the rest of the posts I will be looking at this shader only.

That’s it for the introduction. In the next post I will start to get into the fragment shader and image convolution.



by barbaradegraafsoftware at June 19, 2015 07:48 PM

Dmitry Yastremskiy

Hello Data!

I’m working on a 3D data visualization project. This is an emerging field that is getting popular these days, especially where lots of data gets generated and people need ways to interpret it, read it, and learn something from it. The goals of this project are to be able to grab pretty much any data and visualize it, taking advantage of the third dimension, Z, where the two dimensions X and Y are just not enough. To make the app extensible and let it live a happy life, we are structuring it so that people will be able to add their own templates and data sources; it is not wired to particular data sources or visualizations. On the technical side, the tools we are using are Three.js for WebGL, Backbone.js for the MVC pattern, Require.js for dynamic script loading, and pure vanilla JavaScript for the rest. You can see our first steps here: We will be happy to receive any feedback or advice. Feel free.

by hamabama at June 19, 2015 07:03 PM

June 18, 2015

Hosung Hwang

Pastec analysis

Pastec works in the following order:

  1. Load visual words: the visualWordsORB.dat file contains them; its size is 32,000,000 bytes. Loading the file takes around 1 second.
  2. Build the word index: using the visual words, Pastec builds the word index; this takes around 13 seconds.
  3. Now a previously saved index file can be loaded, or an image can be added to the index.
  4. Given an image file, similar images that contain similar word indexes can be searched for.
  5. The index in memory can be written to a file.

Adding a new image to the index works in the following order:

  1. Using OpenCV, ORB features are extracted.
  2. Matching visual words are searched for.
  3. Matching visual words are indexed in memory.

When I added 900 images, the size of the index file was 16,967,440 bytes.

By changing the source code, I saved the list of matching visual words to a text file for each image. Each word match is stored using this struct:

struct HitForward
{
    u_int32_t i_wordId;
    u_int32_t i_imageId;
    u_int16_t i_angle;
    u_int16_t x;
    u_int16_t y;
};

Each word match has a word id, image id, angle, and x/y coordinates. The saved file looks like this (in the order ImageID,Angle,x,y,WordId):


The file for image id 469 contains 1,593 lines, which means it has 1,593 matching words. Image id 469 was Jánské.jpg, and the image looks like this:
The size of this image is 12.8 mb. Like other HDR images, it contains lots of features. It also has the largest number of matching words among the 900 images. When the data was written to the text file, the size was 39,173 bytes, which would be the worst case. When the image is simple, only a few words are matched. The total size of the matching word text files for the 900 images was 20.9 mb.

To reduce it, I made a simple binary format. Since the image id is the same for every match in an image, I wrote it once, followed by a 4-byte count. Then every word is written as a 4-byte word id, 2-byte angle, 2-byte x, and 2-byte y.

4 bytes - id
4 bytes - count
4,2,2,2 (10 bytes) *  count

For the image with id 469, the size is 15,938 bytes (8 + 10 × 1593), and the file looks like this:

00000000: d501 0000 3906 0000 e282 0100 dcd9 a101  ....9...........
00000010: 6f00 a2fc 0300 10b4 a801 c501 889c 0000  o...............
00000020: 9610 6203 0901 f2b1 0900 00ad 5703 2701  ..b.........W.'.
00000030: 9b70 0000 0ee7 df02 0c01 4d20 0200 ee30  .p........M ...0
00000040: 1102 7000 9ba0 0200 e130 f401 2700 3b68  ..p......0..'.;h
00000050: 0400 a2bd 6702 3b00 b094 0800 c64c 5f02  ....g.;......L_.

0x1d5 is 469 and 0x639 is 1593.
In this case, the size was 15,938 bytes, about 15 KB, which is around 41% of the text format (39 KB).
Since this image is the worst case, storing the binary index in the database for every image record is realistic.
The total size for all 900 images was 8.5 MB (the text files were 20.9 MB).
Interestingly, this is smaller than the index file for the 900 images (16.2 MB).


I was thinking of saving the index file. However, saving the word list for each image is the better solution: in binary format it consumes less storage, and adding it to the index is very fast. Also, when it is stored as a database field, synchronization between the index and the database is not a problem.

by Hosung at June 18, 2015 09:58 PM

June 17, 2015

Hosung Hwang

How to import CMake project in Eclipse CDT4

Currently I am analysing Pastec, which uses CMake as its build system. To dig into the code, I wanted to analyse it using the functionality of Eclipse.

Pastec can be built in the following order:

$ git clone
$ mkdir build
$ cd build
$ cmake ../
$ make

To build Pastec in Eclipse CDT, instead of running “cmake ..”, do the following (Debug build):

$ cd build
$ cmake -G"Eclipse CDT4 - Unix Makefiles" -D CMAKE_BUILD_TYPE=Debug ..

Then, it can be imported into Eclipse:

  1. Import the project using the menu File->Import
  2. Select General->Existing projects into workspace
  3. Browse to where your build tree is and select the root build tree directory (pastec/build). Keep “Copy projects into workspace” unchecked.
  4. You get a fully functional Eclipse project


by Hosung at June 17, 2015 03:53 PM

June 16, 2015

Anna Fatsevych

Wiki Commons API

I have been working on downloading metadata for the images found in the Wiki Image Dumps. I am using the Commons Tools API to gather licensing data and author information.

The fact that anybody can edit information on the Wiki is great for many reasons, but it can produce unexpected, and sometimes totally unreadable, results when trying to parse the XML returned from the call.

Here is a code snippet. While the image name is unique and stays unchanged, the author name, license, description, and even the template itself can be changed and edited by the user.

 [file] => SimpleXMLElement Object
            [name] => QuezonNVjf181.JPG
            [title] => File:QuezonNVjf181.JPG
            [urls] => SimpleXMLElement Object
                    [file] =>
                    [description] =>

            [size] => 6480788
            [width] => 4608
            [height] => 3456
            [uploader] => Ramon FVelasquez
            [upload_date] => 2013-12-29T09:28:24Z
            [sha1] => 8646ca2be96f423faa2c33da1f2bbddbeee454c8
            [date] => 
            [author] => a href="" title="User:Ramon FVelasquez">Ramon FVelasquez SimpleXMLElement Object

As you can see, the author tag contains an HTML tag, though sometimes it is just plain text. I am parsing the “title” tag and storing its contents, which prove to be erroneous at times. Licensing, on the other hand, is usually much clearer, as the pre-set Creative Commons licenses are mostly used and thus provide more easily parseable fields:

    [licenses] => SimpleXMLElement Object
            [@attributes] => Array
                    [selfmade] => 1

            [license] => SimpleXMLElement Object
                    [name] => CC-BY-SA-3.0
                    [full_name] => Creative Commons Attribution Share-Alike V3.0
                    [attach_full_license_text] => 0
                    [attribute_author] => 1
                    [keep_under_same_license] => 0
                    [keep_under_similar_license] => 1
                    [license_logo_url] =>
                    [license_info_url] =>
                    [license_text_url] =>


I am using this Commons Tool to get the information for already downloaded images. I had been checking first whether the XML file dumps contained the complete information, but I have now decided to bypass that check and just use the API, as I think it will provide us with the most recently updated information, with less possibility of an outdated or corrupt XML file.
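Because the author field may hold either plain text or an HTML anchor, one way to normalize it is to keep only the text nodes. The sketch below is written in Python rather than the PHP used by the crawler, and the helper name plain_author is hypothetical; it only illustrates the idea.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text nodes and discard any tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_author(raw):
    """Return the author as plain text, whether or not `raw` contains HTML."""
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    # collapse runs of whitespace left behind by removed tags
    return " ".join("".join(parser.parts).split())
```

A value that is already plain text passes through unchanged, so the same call can be applied to every record.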



by anna at June 16, 2015 08:58 PM

June 15, 2015

Hosung Hwang

Pastec test method and result


Pastec was mentioned in my previous post about Content Based Image Retrieval (CBIR). It extracts features using ORB and visual words.

Pastec offers a visual word data file, visualWordsORB.dat, that is 10.5MB. The Pastec program loads the visual word data first and then loads the index data file. After that, searches can be run. Today, I am going to write about the test results for the same 900 images I used before. Performance and source code analysis will be done later.

Test Method

The full API is on this page.
Pastec runs as an HTTP server that exposes a RESTful API. It can be started using the following command:

./pastec visualWordsORB.dat

I added all JPEG images in the directory to the index by writing this script:

i=1  # the starting id is an assumption
for F in /home/hosung/cdot/ccl/hashtest/all-images/*.jpg; do
    curl -X PUT --data-binary @"${F}" http://localhost:4212/index/images/$i; i=$((i+1))
done

Then each image is searched with this script:

i=1  # must match the ids used when indexing
for F in /home/hosung/cdot/ccl/hashtest/all-images/*.jpg; do
    echo $i,"${F}"
    curl -X POST --data-binary @"${F}" http://localhost:4212/index/searcher; i=$((i+1))
done

This generates output like the following:

2,/home/hosung/cdot/ccl/hashtest/all-images/05 0751 DOE NamUs UP 345 Reconstruction 001a.jpg
3,/home/hosung/cdot/ccl/hashtest/all-images/0514-80 Reconstruction 002b.jpg
70,/home/hosung/cdot/ccl/hashtest/all-images/A 3D Object design using FreeCad Software.jpg

Since the response is JSON data, I had to parse it again. I wrote a simple Python script, because JSON parsing is easy in Python.

import json

img_id = 0
filename = "nofile"
error = 0
notfound = 0
found = -1
more_than_one = 0
only_one = 0

with open("search2.txt", "r") as f:
    for line in f:
        if line[0] != '{':
            # "id,path" line echoed before each search; a comma inside
            # the file name cuts the name short here
            fields = line.split(',')
            img_id = int(fields[0])
            filename = fields[1].strip()
        else:
            # JSON response returned by the searcher
            j = json.loads(line)
            if j["type"] == "SEARCH_RESULTS":
                ids = j["image_ids"]
                if len(ids) == 0:
                    notfound += 1
                elif len(ids) == 1:
                    # raises ValueError if the only match is another image
                    found = ids.index(img_id)
                    only_one += 1
                else:
                    more_than_one += 1
                    print("{} : {} {}".format(img_id, ids, filename))
            else:
                # any other response type is counted as an error
                print("{} : {} : {}".format(img_id, j["type"], filename))
                error += 1

print("Error : " + str(error))
print("NotFound : " + str(notfound))
print("Match Only One : " + str(only_one))
print("Match More Than One : " + str(more_than_one))

I printed only the results that include more than one match, plus the errors. The following is the output of the Python script above:

22 : [22, 835] /home/hosung/cdot/ccl/hashtest/all-images/1992-06560 Reconstruction 002.jpg
23 : [23, 835] /home/hosung/cdot/ccl/hashtest/all-images/1992-06614 Reconstruction 002.jpg
28 : [28, 29, 30] /home/hosung/cdot/ccl/hashtest/all-images/20131017 111028 green spiral ornament with Purple background.jpg
29 : [29, 30, 28] /home/hosung/cdot/ccl/hashtest/all-images/20131017 111122 Fairest wheel ornament with wall as background.jpg
30 : [30, 29] /home/hosung/cdot/ccl/hashtest/all-images/20131017 111143 - White Feerest wheel ornament with plywood background.jpg
70 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/A 3D Object design using FreeCad Software.jpg
77 : [77, 78] /home/hosung/cdot/ccl/hashtest/all-images/Alaska Hitchhiker Skull (Moustache Hair Eyepatch).jpg
78 : [78, 77] /home/hosung/cdot/ccl/hashtest/all-images/Alaska Hitchhiker Skull (Moustache Hair).jpg
90 : [90, 91] /home/hosung/cdot/ccl/hashtest/all-images/Anisotropic filtering en.jpg
91 : [91, 90] /home/hosung/cdot/ccl/hashtest/all-images/Anisotropic filtering pl.jpg
175 : [175, 180] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light10.jpg
176 : [176, 177] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light2.jpg
177 : [177, 176] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light3.jpg
178 : [178, 181] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light4.jpg
180 : [180, 175] /home/hosung/cdot/ccl/hashtest/all-images/Ch Light6.jpg
193 : [193, 195] /home/hosung/cdot/ccl/hashtest/all-images/Circle reflect wikipedia 2.jpg
195 : [195, 193] /home/hosung/cdot/ccl/hashtest/all-images/Circle reflect wikipedia sky.jpg
204 : [204, 205] /home/hosung/cdot/ccl/hashtest/all-images/Computer generated image of the Mærsk Triple E Class (1).jpg
205 : [205, 204] /home/hosung/cdot/ccl/hashtest/all-images/Computer generated image of the Mærsk Triple E Class (cropped).jpg
207 : [207, 367, 772] /home/hosung/cdot/ccl/hashtest/all-images/Copper question mark 3d.jpg
211 : [211, 210] /home/hosung/cdot/ccl/hashtest/all-images/Cro-Magnon man - steps of forensic facial reconstruction.jpg
216 : [216, 217] /home/hosung/cdot/ccl/hashtest/all-images/CTSkullImage - cropped.jpg
217 : [217, 216] /home/hosung/cdot/ccl/hashtest/all-images/CTSkullImage.jpg
220 : [220, 222] /home/hosung/cdot/ccl/hashtest/all-images/Cubic Structure.jpg
222 : [222, 220] /home/hosung/cdot/ccl/hashtest/all-images/Cubic Structure with Shallow Depth of Field.jpg
237 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Dimensão Fractal.jpg
251 : [251, 252] /home/hosung/cdot/ccl/hashtest/all-images/Earthrelief.jpg
252 : [252, 251] /home/hosung/cdot/ccl/hashtest/all-images/Earthrelief mono.jpg
266 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/ENIGMA Logo.jpg
281 : [281, 282] /home/hosung/cdot/ccl/hashtest/all-images/Flower And Vase (Graphic).jpg
282 : [282, 281] /home/hosung/cdot/ccl/hashtest/all-images/Flower And Vase Ver.02.jpg
337 : [337, 338] /home/hosung/cdot/ccl/hashtest/all-images/Frankfurt Skyline I - HDR (14196217399).jpg
338 : [338, 337] /home/hosung/cdot/ccl/hashtest/all-images/Frankfurt Skyline II - HDR (14391360542).jpg
350 : [350, 352, 351] /home/hosung/cdot/ccl/hashtest/all-images/Glass ochem dof2.jpg
351 : [351, 350, 352] /home/hosung/cdot/ccl/hashtest/all-images/Glass ochem dof.jpg
352 : [352, 350, 351] /home/hosung/cdot/ccl/hashtest/all-images/Glass ochem.jpg
356 : [356, 357] /home/hosung/cdot/ccl/hashtest/all-images/GML-Cave-Designer (1).jpg
357 : [357, 356] /home/hosung/cdot/ccl/hashtest/all-images/GML-Cave-Designer.jpg
358 : [358, 359] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Cathedral (1).jpg
359 : [359, 358] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Cathedral.jpg
360 : [360, 361] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Window-Thickness (1).jpg
361 : [361, 360] /home/hosung/cdot/ccl/hashtest/all-images/GML-Gothic-Window-Thickness.jpg
362 : [362, 363] /home/hosung/cdot/ccl/hashtest/all-images/GML-Stuhl-Template (1).jpg
363 : [363, 362] /home/hosung/cdot/ccl/hashtest/all-images/GML-Stuhl-Template.jpg
364 : [364, 365] /home/hosung/cdot/ccl/hashtest/all-images/GML-Voronoi-Diagram (1).jpg
365 : [365, 364] /home/hosung/cdot/ccl/hashtest/all-images/GML-Voronoi-Diagram.jpg
367 : [367, 207, 772] /home/hosung/cdot/ccl/hashtest/all-images/Gold question mark 3d.jpg
377 : [377, 378] /home/hosung/cdot/ccl/hashtest/all-images/Griffith Park Jane Doe Reconstruction 9b.jpg
378 : [378, 377] /home/hosung/cdot/ccl/hashtest/all-images/Griffith Park Jane Doe Reconstruction 9d.jpg
423 : [423, 424] /home/hosung/cdot/ccl/hashtest/all-images/Hall effect A.jpg
424 : [424, 423] /home/hosung/cdot/ccl/hashtest/all-images/Hall effect.jpg
435 : [435, 815, 814] /home/hosung/cdot/ccl/hashtest/all-images/HDR The sound of silence (The road to Kamakhya).jpg
436 : [436, 837] /home/hosung/cdot/ccl/hashtest/all-images/HEAD inline.jpg
448 : [448, 449] /home/hosung/cdot/ccl/hashtest/all-images/Homo erectus pekinensis
449 : [449, 448] /home/hosung/cdot/ccl/hashtest/all-images/Homo erectus pekinensis.jpg
453 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/HrdiBloomExample.jpg
457 : [457, 458] /home/hosung/cdot/ccl/hashtest/all-images/Ilame In Tengwar Ver.01-2.jpg
458 : [458, 457] /home/hosung/cdot/ccl/hashtest/all-images/Ilamé (Name) In Tengwar.jpg
487 : [487, 488] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C1.jpg
488 : [488, 487] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C2.jpg
489 : [489, 490] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C3.jpg
490 : [490, 489] /home/hosung/cdot/ccl/hashtest/all-images/King's Cross railway station MMB C4.jpg
494 : [494, 495] /home/hosung/cdot/ccl/hashtest/all-images/KrakowHDR pics.jpg
495 : [495, 494] /home/hosung/cdot/ccl/hashtest/all-images/KrakowHDR slides.jpg
512 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/LOD Example.jpg
521 : [521, 524, 523] /home/hosung/cdot/ccl/hashtest/all-images/Lync02.jpg
523 : [523, 524, 521] /home/hosung/cdot/ccl/hashtest/all-images/Lync04.jpg
524 : [524, 523, 521] /home/hosung/cdot/ccl/hashtest/all-images/Lync05.jpg
586 : [586, 593] /home/hosung/cdot/ccl/hashtest/all-images/Mount Vernon
610 : [610, 611] /home/hosung/cdot/ccl/hashtest/all-images/Obsidian Soul 1.jpg
611 : [611, 610] /home/hosung/cdot/ccl/hashtest/all-images/Obsidian Soul 2.jpg
617 : [617, 618] /home/hosung/cdot/ccl/hashtest/all-images/Oren-nayar-vase1.jpg
618 : [618, 617] /home/hosung/cdot/ccl/hashtest/all-images/Oren-nayar-vase2.jpg
667 : [667, 668] /home/hosung/cdot/ccl/hashtest/all-images/Radiosity Comparison.jpg
668 : [668, 667] /home/hosung/cdot/ccl/hashtest/all-images/Radiosity scene.jpg
676 : [676, 677, 678] /home/hosung/cdot/ccl/hashtest/all-images/Rauzy2.jpg
677 : [677, 678, 676] /home/hosung/cdot/ccl/hashtest/all-images/Rauzy3.jpg
678 : [678, 677, 676] /home/hosung/cdot/ccl/hashtest/all-images/Rauzy4.jpg
721 : [721, 724, 722, 723] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.12 PM Meshlab.jpg
722 : [722, 721, 723, 724] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.26 PM meshlab.jpg
723 : [723, 722, 721, 724] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.37 PM meshlab.jpg
724 : [724, 721, 722, 723] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.00.49 PM meshlab.jpg
725 : [725, 726, 731, 730, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.09.42 PM blender.jpg
726 : [726, 725, 731, 730, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.11.32 PM blender.jpg
727 : [727, 725, 726, 731, 730, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.11.42 PM blender.jpg
728 : [728, 729, 727, 726, 725, 731, 730] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.13.32 PM blender.jpg
729 : [729, 726, 731, 727, 725, 730, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.14.07 PM blender.jpg
730 : [730, 731, 726, 725, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.14.11 PM blender.jpg
731 : [731, 730, 726, 725, 727, 729, 728] /home/hosung/cdot/ccl/hashtest/all-images/Screen Shot 2013-10-27 at 2.14.15 PM blender.jpg
734 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Scupltris logo.jpg
763 : [763, 764] /home/hosung/cdot/ccl/hashtest/all-images/Snapshot12.jpg
764 : [764, 763] /home/hosung/cdot/ccl/hashtest/all-images/Snapshot13.jpg
772 : [772, 207, 367] /home/hosung/cdot/ccl/hashtest/all-images/Spanish Question mark 3d.jpg
790 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Sterling2 icon SterlingW2589.jpg
799 : [799, 800] /home/hosung/cdot/ccl/hashtest/all-images/Synagoge Weikersheim innen 01.jpg
800 : [800, 799] /home/hosung/cdot/ccl/hashtest/all-images/Synagoge Weikersheim innen 02.jpg
814 : [814, 435, 815] /home/hosung/cdot/ccl/hashtest/all-images/The Sound of Silence -2EV.jpg
815 : [815, 435] /home/hosung/cdot/ccl/hashtest/all-images/The Sound of Silence Resulting HDR.jpg
835 : [835, 22, 23] /home/hosung/cdot/ccl/hashtest/all-images/UP 3773 and UP 3774 (1400UMCA and 1397UMCA) Reconstruction 001.jpg
837 : [837, 436] /home/hosung/cdot/ccl/hashtest/all-images/UPPER inline.jpg
844 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Valentine Doe 1993 Scaled.jpg
852 : [852, 854] /home/hosung/cdot/ccl/hashtest/all-images/ViewFrustum.jpg
854 : [854, 852] /home/hosung/cdot/ccl/hashtest/all-images/ViewWindow2.jpg
876 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Woman in bra staring.jpg
882 : [882, 883] /home/hosung/cdot/ccl/hashtest/all-images/WP VS 1 rel(dachris).jpg
883 : [883, 882] /home/hosung/cdot/ccl/hashtest/all-images/WP VS 2 rel(dachris).jpg
898 : IMAGE_SIZE_TOO_SMALL : /home/hosung/cdot/ccl/hashtest/all-images/Zoomin.jpg

Error : 10
NotFound : 10
Match Only One : 783
Match More Than One : 97

Test Result

The results show that 10 images were not added. The reason was ‘IMAGE_SIZE_TOO_SMALL’. According to the source code, when an image’s width or height is smaller than 150 px, it is not added to the index. Since those 10 images weren’t added to the index, there were 10 images that were not found in the search.
783 images matched only themselves.
And 97 images matched more than one image.
Therefore, there were no false negative results.

The following are some meaningful matches.

Cropped image

This: 1992-06560 Reconstruction 002, and this: 1992-06614 Reconstruction 002 match this: UP 3773 and UP 3774 (1400UMCA and 1397UMCA) Reconstruction 001

This means the algorithm detects when an image is part of another image. The following are similar results:

Computer generated image of the Mærsk Triple E Class (1) Computer generated image of the Mærsk Triple E Class (cropped)

Cro-Magnon man rendered Cro-Magnon man - steps of forensic facial reconstruction

Frankfurt Skyline I - HDR (14196217399) Frankfurt Skyline II - HDR (14391360542)

Hall effect Hall effect A

Homo erectus pekinensis Homo erectus pekinensis, forensic facial reconstruction

Oren-nayar-vase1 Oren-nayar-vase2

moving and similar images

20131017 111028 green spiral ornament with Purple background 20131017 111122 Fairest wheel ornament with wall as background 20131017 111143 - White Feerest wheel ornament with plywood background

Mount Vernon, NYJane Doe facial reconstruction NamUs 3123 Reconstruction 001

This result is a bit strange. The faces of the two people resemble each other; however, this seems to be a false positive result.

Synagoge Weikersheim innen 01 Synagoge Weikersheim innen 02

changing colours and rotation

Copper question mark 3d Gold question mark 3d Spanish Question mark 3d

The other cases

KrakowHDR pics KrakowHDR slides

The positions of the three images were changed.

Obsidian Soul 1 Obsidian Soul 2

Rauzy2 Rauzy3 Rauzy4

This result is a bit strange; another false positive.

Snapshot12 Snapshot13

This result is interesting because rotated objects are detected. In contrast, the similar images (Snapshot00, 01, 02 ~ 14.jpg) that gave a lot of false positive results in pHash didn’t match each other.


  • Pastec ignores images whose width or height is smaller than 150px. This should be taken into account.
  • Rotated and cropped images can be detected.
  • Compared to the DCT/MH hashes in pHash, there were far fewer false positive results.
  • All in all, the results for the 900 images were more reliable than pHash.
  • Hashing/indexing and searching seem to be quite fast. However, a performance test should be performed.
  • The hash size and the indexing/searching mechanism should be analysed in order to customize them for our server system.


by Hosung at June 15, 2015 09:53 PM

Ali Al Dallal

Simple React Webpack and Babel Starter Kit

At Mozilla Foundation, we're starting to use React as our main way to create Web applications, and most of the time writing React without Webpack and Babel can be a bit annoying, or really hard I would say.

When you look for an example of creating a React app with Webpack and Babel, you sometimes get tons of stuff that you don't want or don't care about. If you remove that stuff yourself, you'll either create bugs or find yourself spending more time fixing things that you broke than starting to code, so I created this simple repo with just the simple stuff you need to get started.

React Webpack and Babel
Simple React Webpack Babel Starter Kit

This is a simple React, Webpack and Babel application with nothing else in it.

What's in it?

Just a simple index.jsx, webpack.config.js and index.html file.

To run

You can simply run webpack build using this command:

> $ npm run build

If you want to run with webpack-dev-server simply run this command:

> $ npm run dev

Please contribute to the project if you think this can be done better in any way, even for the README :)

by Ali Al Dallal at June 15, 2015 02:37 PM

June 12, 2015

Anna Fatsevych

Flickr API in PHP

In one of my previous posts, I wrote a Python program to download images using the Flickr API.

Now I have written it in PHP using the phpFlickr API, which is quite easy to use and understand. For our purposes, my program downloads all the images uploaded on a specific date. It makes one API call per image; it also hashes the images and stores them in a MySQL database.

Here is a code snippet to show how easy it is to make an API call and set the required parameters:

$f = new phpFlickr("YOUR API KEY HERE");
$photos = $f->photos_search(array("tags"=>"car","per_page"=>"500",
          "license"=>"3", "extras"=>"url_o,owner_name, license"));

More details on Flickr API queries and limitations are in my previous post here. The PHP program is available on GitHub.



by anna at June 12, 2015 07:24 PM

June 11, 2015

Hong Zhan Huang

OSTEP – The City of ARMs – Tools of the trade 2: tmux

The tool of the trade featured in this post is the terminal multiplexer known as tmux. A terminal multiplexer is a tool that allows a user to create, access and manage a number of terminals, all within the confines of one screen. tmux can also be detached from the screen, continue running in the background, and later be reattached when one wishes to continue the work from where the session was left off. The tmux manual offers comprehensive documentation on the workings of the program for those interested.

In this post I’ll be expounding upon my experience in setting up and using tmux.

The work that I’m doing at CDOT for the OSTEP team involves ssh’ing into a variety of machines (mainly housed in our EHL server cabinet) on a daily basis. After a certain point it becomes difficult to manage each connection with just a regular terminal. There’s also the inability to continue from the point where you had left off, the next time you want to return to work. After seeing my coworkers making use of tmux in their work processes, I endeavored to attempt to do the same.

tmux vs screen

Before we get into the basics of tmux, we should perhaps compare it with another terminal multiplexer: GNU’s Screen. I’m no expert on Screen, but the gist of the comparison seems to be that tmux is a more modern and better version of Screen and is still actively supported. The reasons why are covered in this FAQ. For myself, as a new tmux user with only a little bit of dabbling in Screen, tmux does seem to be the better tool so far.


After installing tmux onto your system, to use it you’ll need to start a new session of tmux. This can be done through this command:

tmux new -s Demo

This will create a new session named Demo that has a single window and display it on the screen. You’ll also notice that in this window there is a status line at the bottom of the screen that will show information about the current session as well as being the location to input tmux commands.

A basic tmux session with one window

From here we can begin using tmux’s features and functionality to modify our terminal work space to suit our liking.

tmux prefix

The prefix or escape function is the key combination that allows the user to exit normal input and enter tmux commands or shortcuts. The prefix in tmux is ctrl-b or in other words ctrl plus b together. Following this input you may press any key that has a bound functionality to it (ctrl-b c will create a new window for example) or press the colon key to enter the tmux command prompt where you can type out the command you wish to execute manually. You can find a list of all the currently assigned bindings with ctrl-b then question mark (ctrl-b ?). Now with the knowledge of the prefix let’s go and play around.

We’ll start by creating three more windows in our session:

ctrl-b c x3 or new-window in the tmux command prompt

In our first window we’ll split the window into two panes by splitting it in half vertically:

ctrl-b % or split-window -h (in tmux’s naming, -h puts the new pane beside the current one and -v puts it below)

Lastly we’ll rename the current window to “A Distant Place” (tmux has a search function for window names, so a named window is easy to find when you have many running):

ctrl-b , or command-prompt -I #W "rename-window '%%'"

Now our session looks like this:

We have four windows as shown in the status line and our first window now named A Distant Place has a two pane split. These are just some of the basic options to creating a work-space to your liking.
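The same layout can also be built non-interactively. The sketch below only assembles the tmux command lines and prints them; the function name is hypothetical, and it assumes the default base-index of 0 for the first window (with the base-index 1 setting in the configuration further down, the target would be Demo:1 instead).

```python
import shlex


def demo_session_commands(session="Demo", title="A Distant Place"):
    """Return the tmux invocations for the layout above, as argv lists."""
    # start the session detached so the following commands can target it
    cmds = [["tmux", "new-session", "-d", "-s", session]]
    # three more windows, for four in total
    cmds += [["tmux", "new-window", "-t", session + ":"] for _ in range(3)]
    first = session + ":0"  # assumption: default base-index 0
    # -h places the new pane beside the current one (the ctrl-b % split)
    cmds.append(["tmux", "split-window", "-h", "-t", first])
    cmds.append(["tmux", "rename-window", "-t", first, title])
    return cmds


if __name__ == "__main__":
    for argv in demo_session_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed lines can be pasted into a shell, after which tmux attach -t Demo shows the session.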


One of the pros of using terminal multiplexers like tmux is the ability to start a task, walk away and come back to it later. The process to do this is to detach the session:

ctrl-b d or detach-client

and then when you wish to return to your session:

tmux attach -t Demo

Sessions are ended when all windows of a session are exited. My typical usage of tmux so far is to have my workstation start the session and thus become the tmux server. I can then remotely access my workstation via a laptop when I’m not on site and can continue using my session for as long as it exists. By using tmux I can maintain  a constant terminal environment with all the ssh or serial connections easily.


I said earlier that tmux is quite easily customizable. You can change the key bindings for tmux commands or create new ones to suit your own preferences. You may also change the visual aspects of tmux, such as the colours of the status bar items. You can add items of your choice to the status bar, such as up-time, the number of users currently using the session, or the battery life of your laptop. Mouse support also exists for tmux should you want it. Suffice it to say, there is a lot of customization you can do with tmux. I’ll share the .tmux.conf file that has all the configuration I’ve been using so far (comments are prefixed with the # sign):

#Start numbering at 1
set -g base-index 1
set -g pane-base-index 1

#Set status bar for cleaner look
set -g status-bg black
set -g status-fg white
set -g status-left '#[fg=green]#H'

#Highlight active window
set-window-option -g window-status-current-bg red
set-window-option -g window-status-activity-style "fg=yellow"

#Show number of current users logged in and average loads for the computer
set -g status-right '#[fg=yellow]#(uptime | cut -d "," -f2-)'

#Set window notifications
setw -g monitor-activity on
set -g visual-activity on

#Automatically set window title
setw -g automatic-rename

#Rebind split window commands
unbind % #Remove the default binding for split-window -h
bind | split-window -h
bind - split-window -v

#Less input delay in command sequences ie C-a n
set -s escape-time 0

#Mouse support
set -g mode-mouse on
set -g mouse-resize-pane on
set -g mouse-select-pane on
set -g mouse-select-window on

#Allow for aggressive resizing of windows (not constrained by smallest window)
setw -g aggressive-resize on

#pane traversal bindings
bind h select-pane -L
bind j select-pane -D
bind k select-pane -U
bind l select-pane -R

# reload config
bind r source-file ~/.tmux.conf \; display-message "Config reloaded..."

#COLOUR (Solarized 256)

#default statusbar colors
set-option -g status-bg colour235 #base02
set-option -g status-fg colour136 #yellow
set-option -g status-attr default

#default window title colors
set-window-option -g window-status-fg colour244 #base0
set-window-option -g window-status-bg default
set-window-option -g window-status-attr dim

#active window title colors
set-window-option -g window-status-current-fg colour166 #orange
set-window-option -g window-status-current-bg default
set-window-option -g window-status-current-attr bright

#pane border
set-option -g pane-border-fg colour235 #base02
set-option -g pane-active-border-fg colour136 #base01

#message text
set-option -g message-bg colour235 #base02
set-option -g message-fg colour166 #orange

#pane number display
set-option -g display-panes-active-colour colour33 #blue
set-option -g display-panes-colour colour166 #orange

set-window-option -g clock-mode-colour colour64 #green

# status bar
set-option -g status-utf8 on

So that about wraps up this introduction to tmux’s utility and a brief look at how you can go about using it. I think it is a really useful tool for those who regularly use remote machines through ssh, and I’ll likely be using it all the time from here on out. There are many features I didn’t touch on, such as tmux’s copy mode, multi-user sessions and more. If you’re interested in learning more about tmux, please refer to the official manual.

by hzhuang3 at June 11, 2015 05:36 PM

Hosung Hwang

MH Hash, MVP-Tree indexer/searcher for MySQL/PHP

The current development server runs on the LAMP stack. Anna is working on the Creative Commons image crawler and the user interface using PHP/MySQL. For a prototype that works with the PHP UI code and the MySQL database, I made an indexer and a searcher.


The database contains lots of records, each containing an image URL, license, and hash values. It is built by the crawler written in PHP.


Source code :

Description :

$ ./mhindexer
Usage :
     mhindexer hostName userName password schema table key value treeFilename
     hostName : mysql hostname
     userName : mysql username
     password : mysql password
     schema : db name
     table : table name
     key : image id field name in the table
     value : hash field name in the table
     treeFilename : mvp tree file name
Output :

The program takes the MySQL connection information (hostname, username, password) and the database information (schema, table, key, value). After connecting with this information, it reads all ‘key’ and ‘value’ fields from the ‘table’. ‘key’ is a unique key that points to the db record containing the image information: filename, url, hash value, etc. ‘value’ is a hash value that is used to calculate the Hamming distance.
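For readers unfamiliar with the distance measure, here is a minimal sketch of a Hamming distance between two hashes. It assumes the hashes are available as equal-length byte strings and normalizes the result to the range 0 to 1, which would be consistent with the fractional radius values passed to the searcher; the function name is hypothetical and the actual pHash implementation may differ in detail.

```python
def hamming_distance(hash_a, hash_b):
    """Fraction of differing bits between two equal-length byte strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    if not hash_a:
        return 0.0
    # XOR leaves a 1 bit wherever the two hashes disagree
    differing = sum(bin(a ^ b).count("1") for a, b in zip(hash_a, hash_b))
    return differing / (8 * len(hash_a))
```

Identical hashes give 0.0 and hashes that differ in every bit give 1.0, so a small radius only accepts near-identical hashes.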

After connecting to the database, the program reads all records that contain hash values and adds them to the MVP-tree. When the tree is built, it is written to the ‘treeFilename’ file.

I made a simple bash script that runs mhindexer with parameters. The output is:

$ ./,784,0.035845

From the hashes in the database, the tree is written to and there are 784 nodes and it took 0.035845 seconds.


Source code :

Description :

Usage :
    mhsearcher treeFilename imageFilename radius
    eg : mhsearcher ./test.jpg 0.0005
output : 0-success, 1-failed
    success : 0,count,id,id,id,...
      eg : 0,2,101,9801 
    failed : 1,error string
      eg : 1,MVP Error

For now, the searcher reads the tree file (treeFilename) to rebuild the tree structure, extracts the MH hash from the input file (imageFilename), and then searches for that hash value in the tree using ‘radius’.

The output is consumed by a PHP script. The fields are separated by commas. When the first field is 0, there was no error and the result is meaningful: the second field is the count of detected hashes, and the following fields are the ids of those hashes. Using the ids, the PHP script can get the image information from the database.
When the first field is 1, the following field is the error message.
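
As an illustration of the output format described above, here is a minimal Python sketch of how a consumer could split one result line. The consumer in this post is a PHP script that is not shown, so the function name parse_searcher_output is hypothetical, and the ids are kept as strings because their type is not stated.

```python
def parse_searcher_output(line):
    """Parse one line of mhsearcher output.

    Returns the list of matching ids on success and raises RuntimeError
    carrying the error string on failure.
    """
    fields = line.strip().split(",")
    if fields[0] == "0":
        count = int(fields[1])
        # Take exactly `count` ids; anything after them is ignored.
        return fields[2:2 + count]
    # Status 1: the rest of the line is the error message.
    raise RuntimeError(",".join(fields[1:]))


print(parse_searcher_output("0,2,101,9801"))  # ['101', '9801']
```

Slicing by the count field means that any extra fields appended after the ids do not end up in the id list.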

To test it, I randomly chose an image that is in the database.
Example output is :

$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.001
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.1
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.2
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.3
$ ./mhsearcher WTW_Nov_2013_Tumanako_023.JPG 0.44

For performance statistics, I added the radius, calculation count and extraction time to the end of the result.
For this image, the matching image was found when the radius was 0.2, and there were 5 results when the radius was 0.44.


  • These utilities work well with MySQL and PHP.
  • Because of the characteristics of the tree search algorithm, the searcher can repeat the search internally with a radius growing from 0.001 to 0.5 to get a fast and reliable result.
  • Later, the indexer and searcher can be turned into a Linux daemon process that keeps the tree in memory for fast searching.
  • When the number of database records is enormous (millions to billions), the tree can be divided into several sections in the database.
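
The repeated search with a growing radius mentioned in the list above could look roughly like the following Python sketch. It is not part of the searcher: the radius steps are an assumption, and fake_search is a hypothetical stand-in for a real tree lookup.

```python
def search_with_growing_radius(search, radii=(0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Call search(radius) with increasing radii and stop at the first hit.

    Returns (radius, ids) for the first non-empty result, or (None, [])
    when nothing is found even at the largest radius.
    """
    for radius in radii:
        ids = search(radius)
        if ids:
            return radius, ids
    return None, []


def fake_search(radius):
    # Hypothetical stand-in: pretends the only match (id "101") lies at distance 0.2.
    return ["101"] if radius >= 0.2 else []


print(search_with_growing_radius(fake_search))  # (0.2, ['101'])
```

Small radii need only a few distance calculations, so starting small keeps the common case cheap and only widens the search when nothing is found.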

by Hosung at June 11, 2015 04:30 AM

June 10, 2015

Koji Miyauchi

Heroku with node.js & mongoDB


The goal of our project for the last two weeks was to put our application onto GitHub Pages. In order to do that, we had to host our server-side APIs somewhere accessible.
After some discussions with our clients, we decided to host the server-side code on Heroku.

Heroku is one of the popular cloud application platforms (others include AWS, DigitalOcean and Engine Yard) that can host your web application. The good thing about Heroku is that the initial cost is free.

This service is very easy to use.
All you need to do is basically this:

  1. Have a git repository for the app.
  2. Configure your project properly.
    In our case, we use Node.js, so we configure the application’s dependencies and start-up file in package.json.
  3. Push the repository to Heroku.

After you deploy your application to Heroku’s master repository, it will automatically install all the dependencies your app needs and run it.

Deploy your application to Heroku

Here is a good set of instructions on how to deploy your Node.js application to Heroku. Setting it up is very straightforward.

Install mongoDB Add-on

In order to use mongoDB on Heroku after setting up your application, you need to install an add-on called mongoLab or Compose MongoDB. I used mongoLab this time.

Installing an add-on is also quite easy to do. Just type

heroku addons:create mongolab -a <application>

and it will install the add-on for your application.
All the configuration of your DB is available from Heroku’s web console.
mongoLab provides 500MB of storage for free.


Heroku accepts many types of applications, such as Node.js, Ruby on Rails, PHP, Java, Python and so on.
It allows users to deploy an application very quickly, and it automatically sets up the infrastructure for you, so you save time as well.

by koji miyauchi at June 10, 2015 09:11 PM

Anna Fatsevych

Wiki Parser and User Interface

As I mentioned in the last post, I was writing a “parser” of sorts to get through the xml files that are located in the Wiki Image Grab along with the corresponding images.

I now have a PHP program that gets the image name from the list file and then uses the wiki API to get the latest data (author, license, and existence status). The program is available on GitHub.

I have also written a user interface in PHP that allows comparison of images, either downloaded or via URL. Here is a preview of it.


Here is the link to this code on GitHub. This is a quick demo for now, using jQuery and Bootstrap – and the PHP code will be re-factored and cleaned up.

by anna at June 10, 2015 09:02 PM

Hosung Hwang

MVP Tree with MH Hash for Image Search

The MH image hash in the pHash project generates 72-byte hash values. Despite its weakness of false positive results for simple images, it has the benefit that it can be used with the MVP-tree implementation.

Sample program

I wrote a sample utility in C++ to test real samples.
The source code is (can be changed later) :

This program is used as follows :

Usage :
    MHHashTree directory filename radius
      directory : a directory that contains .hashmh files that will be in the MVP-tree
      filename : a .hashmh file to search from the tree
      radius : radius to search eg. 0.0001, 0.1, 1.0, 4.0
    MHHashTree directory filename radius BranchFactor PathLength LeafCap
      BranchFactor : tree branch factor - default 2
      PathLength : path length to use for each data point - default 5
      LeafCap : leaf capacity of each leaf node - maximum number of datapoints - default 25

Test Result 1

The sample directory contains 900 image hashes extracted from images. I picked an image that has 1 similar image :
Ch Light6

$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Ch Light6.jpg.hashmh" 0.001
(*) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
------------------Results 1 (9 calcs) (0.000011 secs)---------
(0) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Ch Light6.jpg.hashmh" 0.1
(*) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
------------------Results 2 (738 calcs) (0.002161 secs)---------
(0) Ch Light10.jpg.hashmh   : ff43e93158c7740000949690aecc3100e7e0b2a5493b6fa5263444bad9c16930224891f9b2fc300bc1f0fc7e392436c7e7f1ffb40c04e07030fc7e3f038fc7000000000000000000
(1) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Ch Light6.jpg.hashmh" 0.4
(*) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
------------------Results 11 (897 calcs) (0.000733 secs)---------
(0) Ch Light10.jpg.hashmh   : ff43e93158c7740000949690aecc3100e7e0b2a5493b6fa5263444bad9c16930224891f9b2fc300bc1f0fc7e392436c7e7f1ffb40c04e07030fc7e3f038fc7000000000000000000
(1) Metaball3.jpg.hashmh   : 0000000000000000000002b9c0400fc4620000f6e4a77c7b877e242496ec45b978d848db24b5254f99b97cdcdb2076ccdfefcd6de42400447e2a0203381e00000000000000000000
(2) Ch Light6.jpg.hashmh   : ff43e93178c77400008696922ecc3100efe2b2a5493b6fa72524409aac816330204898fcb2fc300bc9f0fc7e392436c7e3f1ffb40c04e07030fc7e3f038fc7000000000000000000
(3) Orx-logo.jpg.hashmh   : 000000000000000000000000000000000000000063f1b9ef2fb200006b897da4b194020000226c5098e17dea00000037fbf6f92dfe00000000000604000000000000000000000000
(4) Snapshot10pointcloud.jpg.hashmh   : 0000000000000000000000001fcfc000000000000012cdb0000000000000124db0000000000000161db00000000000001228d8000000000000027028000000000000000000000000
(5) Snapshot05.jpg.hashmh   : 00000000000000000000000000afc80000000000001a4db0000000000000122da800000000000016cb680000000000001263200000000000001b72e0000000000000000000000000
(6) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
(7) Alaska Hitchhiker Skull (Moustache Hair).jpg.hashmh   : 000080003007bc0000f702c50c4d773389448fd50e3fcf81399c0400d2c483b1f88f96d220b4a4ea6ba81e4b2223d300e1e8a81f2883000000000000000000000000000000000000
(8) Alaska Hitchhiker Skull (Moustache Hair Eyepatch).jpg.hashmh   : 00008000700fbc0000ff00439c4c4704683c8fd51e7f4781399d8c0095dce391f84f079220eda6eb69a9e64b2623d300e1e8a81d2a83000000000000000000000000000000000000
(9) Snapshot04.jpg.hashmh   : 00000000000000000000000000a1880000000000001a424800000000000012292800000000000016cfb0000000000000092bf00000000000001b7078000000000000000000000000
(10) Snapshot07.jpg.hashmh   : 00000000000000000000000000a1c80000000000001a4df0000000000000126db8000000000000124db00000000000001244d8000000000000137270000000000000000000000000

When the radius was 0.001 or 0.01, the calculation count was 9 and the only result was the 1 image that is exactly the same. The time was 0.000011 secs.
When the radius was 0.1, the calculation count was 738, and the result was 2 images. It took more time than the search with 9 calculations. The newly added image (Ch Light10.jpg.hashmh) was this :
Ch Light10
When the radius was 0.3, the result was the same as for 0.1.
When the radius was 0.4, the calculation count was 897 and there were 11 results. The result images are :
Snapshot07, Snapshot04, Alaska Hitchhiker Skull (Moustache Hair Eyepatch), Alaska Hitchhiker Skull (Moustache Hair), Snapshot01, Snapshot05, Snapshot10pointcloud, Orx-logo, Metaball3

Test Result 2

This time I picked an image that has a white background and more similar images : Snapshot01.jpg.

$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Snapshot01.jpg.hashmh" 0.01
(*) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
------------------Results 1 (21 calcs) (0.000073 secs)---------
(0) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000

$ ./MHHashTree /home/hosung/cdot/ccl/hashtest/all-images "Snapshot01.jpg.hashmh" 0.1
(*) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
------------------Results 10 (152 calcs) (0.000435 secs)---------
(0) Snapshot06.jpg.hashmh   : 00000000000000000000000000000000000000000000afc00000000000001244d0900000000000124ff4000000000000040200000000000000000000000000000000000000000000
(1) Snapshot09.jpg.hashmh   : 00000000000000000000000000aee01000000000001642e37c00000000001244940000000000000b6db80000000000000d92d0000000000000088100000000000000000000000000
(2) Snapshot02.jpg.hashmh   : 00000000000000000000000000a1980000000000000929f80000000000001b63780000000000001b6b280000000000000922580000000000000882f0000000000000000000000000
(3) Snapshot05.jpg.hashmh   : 00000000000000000000000000afc80000000000001a4db0000000000000122da800000000000016cb680000000000001263200000000000001b72e0000000000000000000000000
(4) Snapshot03.jpg.hashmh   : 0000000000000000000000000020200000000000001253b00000000000001262500000000000001a62480000000000001b6b08000000000000040c00000000000000000000000000
(5) Snapshot01.jpg.hashmh   : 00000000000000000000000000a1980000000000001b4308000000000000112928000000000000136ba80000000000000922400000000000000c0378000000000000000000000000
(6) K-3D logo.jpg.hashmh   : 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(7) Snapshot04.jpg.hashmh   : 00000000000000000000000000a1880000000000001a424800000000000012292800000000000016cfb0000000000000092bf00000000000001b7078000000000000000000000000
(8) Snapshot07.jpg.hashmh   : 00000000000000000000000000a1c80000000000001a4df0000000000000126db8000000000000124db00000000000001244d8000000000000137270000000000000000000000000
(9) Snapshot00.jpg.hashmh   : 00000000000000000000000000000000000000000000000000000000000016d998000000000000126fb0000000000000040380000000000000000000000000000000000000000000

When the radius was 0.01, the only result was the 1 exact match, found with 21 calculations.
When the radius was 0.1, after 152 calculations, there were 10 similar results :
Snapshot07, Snapshot06, Snapshot05, Snapshot04, Snapshot03, Snapshot02, Snapshot01, Snapshot00, K-3D logo


  • When the radius was 0.01 or smaller, the calculation count in the tree was very small, and the only result was the exact match.
  • When the radius was 0.1, the calculation count was greater, and the results were similar images.
  • When the radius was 0.4, the calculation count was almost the same as the number of samples.
  • The radius is the distance in the tree that expresses similarity, and it is the same as the hamming distance.
  • MH hash generates lots of 0s when the image contains a solid background colour. Therefore, the hash value of an image that has only black colour is all 0.
  • For the BranchFactor, PathLength, and LeafCap parameters that are used for making the MVP-tree, I used the default values: 2, 5, and 25 respectively. Tests with various values still need to be done.
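
To make the link between radius and hamming distance concrete, here is a minimal Python sketch that compares two hex-encoded hashes. It assumes the distance is normalized to the fraction of differing bits, which fits the 0 to 0.5 radii used above but is not confirmed by this post; the function name is hypothetical.

```python
def normalized_hamming_distance(hex_a, hex_b):
    """Return the fraction of bits that differ between two equal-length hex hashes."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing_bits / (len(a) * 8)


print(normalized_hamming_distance("f0", "ff"))  # 0.5
```

Under this assumption an all-zero hash, such as the one produced for a plain black image, is close to any other hash that is mostly zeros, which matches the false positives seen for images with solid backgrounds.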

by Hosung at June 10, 2015 08:22 PM

MVP Tree for similarity search

For several days, I analysed and implemented a C++ utility that interacts with perceptual hashes from the database. In this post, I will introduce a general analysis of the MVP-tree.

MVP Tree

The following two papers give details about the VP-Tree and MVP-Tree for similarity search :

“In vp-trees, at every node of the tree, a vantage point is chosen among the data points, and the distances of this vantage point from all other points (the points that will be indexed below that node) are computed. Then, these points are sorted into an ordered list with respect to their distances from the vantage point. Next, the list is partitioned to create sublists of equal cardinality. The order of the tree corresponds to the number of partitions made. Each of these partitions keep the data points that fall into a spherical cut with inner and outer radii being the minimum and the maximum distances of these points from the vantage point. The mvp-tree behaves more cleverly in making use of the vantage-points by employing more than one at each level of the tree to increase the fanout of each node of the tree.” [Bozkaya & Ozsoyoglu 2]

Screenshot from 2015-06-10 10:39:47
[Bozkaya & Ozsoyoglu 9]

Screenshot from 2015-06-10 10:40:18
[Bozkaya & Ozsoyoglu 10]
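
The idea in the quoted passage can be sketched in a few lines of Python. This is a simplified binary vp-tree (one vantage point per node, two partitions), not the MVP-tree library used in this post; all names are hypothetical, and the first point of each list is used as the vantage point purely to keep the example deterministic.

```python
class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build_vp_tree(points, dist):
    """Recursively build a binary vp-tree from a list of points."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    pairs = [(dist(vantage, p), p) for p in rest]
    median = sorted(d for d, _ in pairs)[(len(pairs) - 1) // 2]
    inside = [p for d, p in pairs if d <= median]
    outside = [p for d, p in pairs if d > median]
    return VPNode(vantage, median,
                  build_vp_tree(inside, dist), build_vp_tree(outside, dist))


def range_search(node, target, radius, dist, results):
    """Append every point within `radius` of `target` to `results`.

    Returns the number of distance calculations that were needed.
    """
    if node is None:
        return 0
    d = dist(node.point, target)
    calcs = 1
    if d <= radius:
        results.append(node.point)
    # Triangle inequality: only descend into a side the search ball can reach.
    if d - radius <= node.radius:
        calcs += range_search(node.inside, target, radius, dist, results)
    if d + radius > node.radius:
        calcs += range_search(node.outside, target, radius, dist, results)
    return calcs


points = list(range(0, 100, 5))
tree = build_vp_tree(points, lambda a, b: abs(a - b))
found = []
range_search(tree, 41, 2, lambda a, b: abs(a - b), found)
print(sorted(found))  # [40]
```

The larger the search radius, the fewer subtrees can be skipped, so the number of distance calculations approaches the number of stored points.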

MVP Tree implementation

The source code that I used is the one introduced on this page.

The following are the major APIs.

MVPTree* mvptree_alloc(MVPTree *tree,CmpFunc distance, unsigned int bf,unsigned int p,unsigned int k);
typedef float (*CmpFunc)(MVPDP *pointA, MVPDP *pointB);

mvptree_alloc allocates memory to store the MVP-tree structure. CmpFunc is the comparison function that is used to calculate the hamming distance between two hash values inside the MVPDP struct when a new data point is added and when a search is performed.

MVPError mvptree_add(MVPTree *tree, MVPDP **points, unsigned int nbpoints);

This function adds data points to the tree. It can take an array of data points or a single data point. While the nodes are added, the tree is formed by comparing them using CmpFunc.

MVPError mvptree_write(MVPTree *tree, const char *filename, int mode);
MVPTree* mvptree_read(const char *filename, CmpFunc fnc, int branchfactor, int pathlength, int leafcapacity, MVPError *error);

Using these functions, the tree structure can be written to a file and loaded later without building the tree again.

MVPDP** mvptree_retrieve(MVPTree *tree, MVPDP *target, unsigned int knearest, float radius, unsigned int *nbresults, MVPError *error);

This function retrieves similar hash results based on the radius. The bigger the radius, the more comparisons are done.

Sample program results

When I used 100 samples of 10-byte random binaries and changed the radius from 0.01 to 3.0 and 5.0, the results were :

radius : 0.01
------------------Results 1 (7 calcs)---------
(0) point101

radius : 3.0
------------------Results 3 (18 calcs)---------
(0) point108
(1) point101
(2) point104

radius : 5.0
------------------Results 10 (24 calcs)---------
(0) point102
(1) point103
(2) point105
(3) point107
(4) point108
(5) point101
(6) point104
(7) point106
(8) point109
(9) point110

When the radius was 0.01, there were 7 calculations while going through the tree. When the radius was 5.0, there were 24 calculations. When I changed the sample size from 10 bytes to 72 bytes, the size of an MH hash, the comparison count was greater than the number of samples.


The sample program generates random values instead of using real image hashes. Since random values hardly have any similarity between them, when the radius was less than 0.1 the only result was the one exactly matching value. To get some results, the radius should be at least 3. In this case, the calculation count was almost the same as the number of values.
When I used real image hash values, the search results were quite impressive. That will be covered in the next post.

by Hosung at June 10, 2015 03:36 PM

Barbara deGraaf

What’s in a GUI

In this post I am going to talk about adding a GUI to a test scene so that a user can change values. I meant to put this up earlier but got sidetracked because I was watching Hannibal’s season 3 premiere, which has some of the most breathtaking cinematography I have seen, so if you want to see just what sort of results a cinematographer can create, that would be the show to watch.

So back to the GUI. For use with THREE.js there is a library file called dat.gui which you can grab from its Google Code page. Within your javascript file you can start making the GUI with:

  var gui = new dat.GUI();

I also recommend creating something to hold all the values that are going to be used in the GUI so in this case

var params = {
  focallen: 100,
  // …(all other camera and lens properties)
};


If after you made the GUI you want to add folders you can add them with;

var camfolder = gui.addFolder('Camera');

var lenfolder=…

After you make all the folders you want you can start adding variables to the folder with;

var foc = lenfolder.add(params, 'focallen');

The dat.gui library will add text boxes or sliders depending on whether the value in params is text or a number. For number values we can give the user a lower and upper limit for the value and change the increment of the slider by using this line instead:

var foc = lenfolder.add(params, 'focallen', 10, 200).step(4).name('focal length');

The other type of input was a select menu for camera/lens. In order to do this the first step is to store the information about the camera/lens into a JSON file. After having the file we can use  jQuery;


// The inner workings may change depending on how the JSON file was set up, but you are
// going to use $.each to loop through the JSON file, getting each entity and grabbing the
// value you want. For this example I looped and grabbed the format value and then added
// it to an array of cameras (listcams).

After looping with $.each we can use this list of camera formats as the options for the menu with

var cam = camfolder.add(params, 'format', listcams);

After having the GUI working we want it to do something when we change values so we can use


foc.onChange(function(value){
  params.focallen = value;
});


We can do this for all values to continuously update the params. If you are running into issues with the JSON and with storing the values gathered from the JSON file, just remember that $.getJSON is asynchronous and do the onChange within the $.getJSON above.

If you want to add a button to the GUI the best way to do that is;

var obj = { submit: function() {
  // logic that occurs when pressed here
  // I did calculations of hyperfocal distance and depth of field here
}};

gui.add(obj, 'submit');
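
The calculations behind the button are not shown in this post. As a rough idea of what such a handler might compute, here is a Python sketch using the standard thin-lens approximations for hyperfocal distance and depth-of-field limits. The f-number and circle-of-confusion parameters, and all function names, are assumptions for illustration.

```python
import math


def hyperfocal_distance(focal_mm, f_number, coc_mm):
    """Hyperfocal distance H = f^2 / (N * c) + f, in millimetres."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm


def depth_of_field(focal_mm, f_number, coc_mm, subject_mm):
    """Return (near, far) limits of acceptable sharpness in millimetres."""
    h = hyperfocal_distance(focal_mm, f_number, coc_mm)
    near = subject_mm * (h - focal_mm) / (h + subject_mm - 2 * focal_mm)
    if subject_mm >= h:
        # Focusing at or beyond the hyperfocal distance keeps infinity sharp.
        far = math.inf
    else:
        far = subject_mm * (h - focal_mm) / (h - subject_mm)
    return near, far


# Example values (assumed): a 50 mm lens at f/8 with a 0.03 mm circle of confusion.
h = hyperfocal_distance(50, 8, 0.03)
near, far = depth_of_field(50, 8, 0.03, 3000)
print(h, near, far)
```

Focusing exactly at the hyperfocal distance gives a near limit of half that distance and a far limit of infinity, which is a handy sanity check for any implementation.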

So this is basically all we will need to work with in terms of making and changing the GUI. The next step that my partner and I worked on was the depth of field using shaders. So next blog post I will talk about shaders before going into depth about them with depth of field.

Have a good night everyone.



by barbaradegraafsoftware at June 10, 2015 01:58 AM

June 09, 2015

Hong Zhan Huang

OSTEP – The City of ARMs – Tools of the trade 1: iperf

In the short time that I’ve been working on the OSTEP Team at CDOT there’s been much to take in and learn. In this Tools of the trade series of posts I’ll be describing the tools I have been making use of in my work.


iperf is a network performance measuring tool that I have been using to do some testing with the Ethernet ports of some of our ARM machines. I required a tool that would be able to measure the maximum performance of these ports while bypassing intermediate mediums that could obfuscate the results (such as the write speeds of a hard drive). iperf seemed to be a tool that would meet all my needs and more.

To quote the features of iperf from their official site:

  • TCP
    • Measure bandwidth
    • Report MSS/MTU size and observed read sizes.
    • Support for TCP window size via socket buffers.
    • Multi-threaded if pthreads or Win32 threads are available. Client and server can have multiple simultaneous connections.
  • UDP
    • Client can create UDP streams of specified bandwidth.
    • Measure packet loss
    • Measure delay jitter
    • Multicast capable
    • Multi-threaded if pthreads are available. Client and server can have multiple simultaneous connections. (This doesn’t work in Windows.)

There’s quite a bit that iperf is able to do, but for my purposes the TCP functionality with one client and one server suits me fine.

Using iperf

As alluded to earlier, iperf operates in a client and server model: by default the client sends generated test data to the server (nothing is read from or written to disk), and from that interaction iperf measures the performance of the transfer between the two machines.

The steps to start up a basic testing process are as follows:

  1. Start iperf on the machine that will act as the server with: iperf -s
  2. On the other machine, start it up as the client with: iperf -c {IP of the Server}

And that’s it for basic operation! Once that run completes you will see the results of the test on both the server and the client machines, and they’ll look something like:

Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
[852] local port 5001 connected with port 33453
[ ID]   Interval          Transfer       Bandwidth
[852]   0.0-10.6 sec   1.26 MBytes   1.03 Mbits/sec

Again this is the most basic usage of iperf, which uses the default window sizes, ports, protocol (TCP is default), units of measurement (Mbits/sec is default) and other options. For my use I only made use of the -f option, which allows the user to choose the unit of measurement the results are formatted in (in my case I used -f g, which gives me the results in Gbits/sec). If you’d like to access iperf’s other features, this guide is what I read to get an understanding of how to operate this tool.

To make my life a little easier I wrote one bash script to automate the process of doing the iperf tests and recording their results as well as another to more easily parse the resulting logs.

test script:


echo "Beginning tests"

if [ "$1" = "" ] || [ "$2" = "" ]
then
  echo "Requires IP of the iperf server and output file name."
else
  touch "./$2"
  for i in `seq 1 10`;
  do
    iperf -c "$1" -f g >> "$2"
  done
fi

echo "Finished the tests"

The test script is meant to be used on the client machine as follows: test {IP of Server} {Filename of log}

parse script


echo "The file being parsed is $1:"

grep "Gbits/sec" "$1"
echo

AVG=$(grep -o "[0-9].[0-9][0-9] Gbits/sec" $1 |
  tr -d "Gbits/sec" |
  awk '{ SUM += $1} END { print SUM }' |
  awk '{ AVG = $1 / 10 } END { print AVG }')

MAX=$(grep -o "[0-9].[0-9][0-9] Gbits/sec" $1 |
  tr -d "Gbits/sec" | sort -n | tail -1)

MIN=$(grep -o "[0-9].[0-9][0-9] Gbits/sec" $1 |
  tr -d "Gbits/sec" | sort -n | head -1)

echo "The average rate of transfer was:  $AVG"
echo "The max rate was: $MAX"
echo "The min rate was: $MIN"

echo "Finished parsing."

The parse script is again used on the client in the following manner: parse {Filename of log}
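
For comparison, the same summary can be computed in a few lines of Python. This is only a sketch of an alternative, not the script used above; the function name is hypothetical and the sample log lines are made-up values in the usual iperf layout.

```python
import re

# Matches the bandwidth column, for example "0.94 Gbits/sec".
RATE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s+Gbits/sec")


def summarize_iperf_log(log_text):
    """Return the average, maximum and minimum Gbits/sec found in an iperf log."""
    rates = [float(value) for value in RATE_PATTERN.findall(log_text)]
    if not rates:
        raise ValueError("no Gbits/sec results found")
    return {
        "avg": sum(rates) / len(rates),
        "max": max(rates),
        "min": min(rates),
    }


# Made-up sample lines, for illustration only.
sample_log = (
    "[  3]  0.0-10.0 sec  1.10 GBytes  0.94 Gbits/sec\n"
    "[  3]  0.0-10.0 sec  1.07 GBytes  0.92 Gbits/sec\n"
)
print(summarize_iperf_log(sample_log))
```

Unlike the shell version, this divides by the number of results actually found instead of a fixed 10, so it still gives a correct average if a run fails or the loop count changes.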

And that about wraps up iperf in brief. The only thing to note is that you may need to open the relevant ports for iperf to work.

by hzhuang3 at June 09, 2015 06:00 PM

June 08, 2015

Barbara deGraaf

First up a test scene

Most feedback I got from the last post was that it was too mathy and I promise this one will have 100% less math than the last one.

The first thing to be done in the project was to make a test scene to work with. This allows us to try different techniques and see if the outcome is what we expected.

The first part of making the test scene was to make walls and a ground. By using the box geometry or plane geometry in THREE.js it is very easy to make a wall or ground of the size wanted. Adding all the walls and the ground to a single Object3D allows us to move the whole scene around if we want the walls and ground to be in a different place.

To help measure units in the scene better a black and white checkerboard pattern was added to the wall and ground. The best way to do this is to have a small texture pattern of the checkerboard and to set texture.wrapS and texture.wrapT to THREE.RepeatWrapping and then use texture.repeat.set(x,x) where x is half the length/width of the geometry used above. Basically these three lines will cause the small checkerboard texture to appear on the whole wall/ground.

After having the basic walls and ground of the scene set up, the next part is to add some detailed objects to the scene. Instead of boxes and spheres we need something with more definition, and I decided to use humanoid models. There are a couple of different ways to add external models to the scene. The way I did it was to use the MakeHuman software, which allows you to easily make models and use them under the CC0 license. Exporting the created model to obj/mtl files allows easy use in THREE.js. You can also use to make a object and export them to the file type you want.

To load the model THREE.js has an obj/mtl loader. The THREE.js website has excellent documentation on how to use it, so check that if you need to. After the model is loaded you can make as many meshes of the model as you want to put in the scene. The models can be easily scaled for accurate dimensions. By defining 1 THREE.js unit as representing 1 foot we can resize the models. Using a box of dimension 6x2x1 I can resize the human model to fit inside the box and therefore be accurate. I also added all the humans to a single Object3D so that all humans can be moved at once. For my scene I ended up putting 5 human models in the scene, spaced evenly apart from each other.
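
The resizing step comes down to one uniform scale factor. Here is a small Python sketch of that arithmetic, assuming the box is 2 wide, 6 tall and 1 deep and that the model’s bounding-box size is already known; the function name and the example model size are hypothetical.

```python
def fit_scale(model_size, box_size=(2.0, 6.0, 1.0)):
    """Return the largest uniform scale that keeps the model inside the box.

    Both sizes are (width, height, depth) tuples in the same axis order.
    """
    return min(box / model for box, model in zip(box_size, model_size))


# Hypothetical model bounding box of 4 x 12 x 2 units.
print(fit_scale((4.0, 12.0, 2.0)))  # 0.5
```

The resulting factor would then be applied equally to all three axes of the mesh so that the model keeps its proportions.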

With these elements we have a scene that can be customized for any dimensions or distances we may want to test depth of field or field of view.

I was going to talk about adding the GUI here but I think instead I will make a separate post talking about the GUI so I can mention some specific points in creating it. So look forward to that next.


by barbaradegraafsoftware at June 08, 2015 01:57 AM