Planet CDOT

October 20, 2016

Matt Welke

Multiple Front End Tools

Today we had a rather productive meeting with the client, where we got a better idea of what they want when it comes to the front end API. We originally envisioned two options for tools: one would be a simple collection of prepared queries, the other a powerful graphical query builder. It turns out they don't really care for a powerful query builder.

They're more interested in the answers those prepared queries would provide, provided we spend a lot of time carefully crafting them so that the tool stays useful for them long term. They also want a way to analyze the data we collect and calculate information that could be used to categorize visitors. For example, an affinity score might describe how much a certain user likes a certain category. Those variables can be integrated into their Elasticsearch system, which augments their recommender engine.

This two-pronged approach, actively getting analytical answers and passively augmenting their recommender engine, is definitely the best way we can use the data we collect. The scope of this project is huge in terms of the number of applications we'll be making; it seems there's a new API being proposed every few weeks. But then again, splitting our application into multiple parts like this is probably a good way to organize things. It should keep the work flexible and easier to revisit later, when it comes time to take the useful pieces we've built and consider publishing them as open source software. All these APIs will be worth it!


by Matt at October 20, 2016 08:42 PM

Laily Ajellu

Reduce Flashing Lights - Get 'A' Certified

Flashing images and blinking lights can be very dangerous for some users, and uncomfortable for others. This post will help you follow accessibility guidelines to get your web app 'A' certified.

Who does this feature affect?

  • People with photosensitive (light-sensitive) seizure disorders
  • People with migraine headaches
  • Everyone (Remember, accessibility features create a better experience for all)

How to implement:

  1. Content on the page should not change or flash more than 3 times per second.
  2. If it is necessary for the content to change more than 3 times per second:
    • Show the flashes on a small part of the screen (an area of less than 21,824 square pixels)
    • The above area is for the average screen, but if you know the size of the screen your content will be displayed on, there's a formula you can use to calculate the safe area (see the note after this list)
    • This should be the only flashing area on the page
  3. Reduce the colour contrast for flashing content
  4. Reduce the screen light contrast for flashing content
  5. Don’t use fully-saturated red colour for flashing content
  6. Use analysis tools to check whether your webpage passes.
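
Where does the 21,824 figure come from? On an average 1024 x 768 screen viewed from a typical distance, the central 10 degrees of vision covers roughly a 341 x 256 pixel block, and the guidelines treat about a quarter of that block as the largest safe flashing area: 341 x 256 x 0.25 ≈ 21,824 square pixels. If your screen size or viewing distance differs, the same idea applies with your own numbers.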

How Not to implement:

  • It’s not enough to have a button for the user to stop the flashing, because a seizure or migraine can be induced extremely fast. The user wouldn’t have enough time to press the button.
  • Do not rely on a seizure warning alone:
    • people may miss it
    • children may not be able to read it

Examples of When an Application might use Flashing lights

  • A slide in a presentation changing
  • A video of explosions going off
  • A video of a concert with strobe lights
  • A video of lightning flashing

Interesting facts:

These guidelines were originally for TV programs, but now they’ve been adapted to computer screens taking into consideration:
  • The shorter distance between the screen and the eyes
  • That the computer screen takes up more of our field of vision when we’re looking at it
  • An average computer screen resolution of 1024 x 768


Reference: Three Flashes or Below Threshold (WCAG 2.3.1)

Image References:

Lady with a migraine
Child on the computer
Fully saturated Red

by Laily Ajellu at October 20, 2016 02:59 PM

October 19, 2016

Matt Welke

Reacting to Our Need for React

Today I began work on my third mockup, which is when we decided to dive into a heavier front end JavaScript framework. We studied the idea of building our thorough dynamic UI for querying our back end API and decided that even though it would be difficult to implement, it was feasible. However, we definitely need something more advanced than vanilla JavaScript or even jQuery. We're going to end up with many HTML elements, many of which will have events associated with them so that they can manipulate other elements, talk to servers, and so on. The biggest problem with this is that if you have a client side app that's removing things from the DOM and adding them back in, you need to re-attach all your event handlers to those DOM elements when they're re-added. This would be a nightmare for an app of this scope.

When it comes to modern front end JavaScript frameworks, ReactJS (aka React) is probably the best bet for us. It has an ingenious system for bubbling events up to one root element which then interprets the events and decides what to do. Therefore, transient DOM elements won’t be a concern to us. This pattern is called “event delegation” and the idea has been around for a while, even before React implemented it under the hood. React also lets you create your own “components” which contain a combination of state and presentation, where the presentation is whatever HTML representation of your metaphorical component you can dream of.
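
To make the idea concrete, here is a rough sketch of event delegation on its own, outside of React (the element ID, class name, and data attribute are hypothetical): one listener on a root element handles clicks for any descendant, even ones added to the DOM later.

const root = document.getElementById('query-builder');

root.addEventListener('click', event => {
    // Work out which descendant was actually clicked, if any
    const button ='.add-filter');

    if (button) {
        // The button may have been added to the DOM seconds ago;
        // we never had to attach a listener to it directly
        console.log('Add a filter for', button.dataset.field);
    }
});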

This way of programming brings me back to my days of learning GUIs with Windows Forms. I let that way of thinking rot when I became immersed in web development, having been convinced of the merits of RESTful applications that aren't necessarily event-driven. It's funny how things tend to converge over time. I'm open minded, and I can already tell that React is a powerful tool, so I'm looking forward to learning it, and I know we can produce a great visual MongoDB query builder for this project.

by Matt at October 19, 2016 08:38 PM

October 18, 2016

Matt Welke

Investigating MongoDB Aggregation

Today we decided that it would be a good idea to be ambitious and go for my original "plan A" front end API. This is the one that would dynamically read the schema as the user builds queries, using a GUI to guide them along and let them create any MongoDB query possible. Investigating this route is a good idea because it would provide the client with a tool that would stand the test of time. It's a much better long term solution than a tool that only has a handful of queries we've written today against the current database schema.

The main challenge with creating this front end API is learning what MongoDB code would map to the things the user would click on and the things the user would type in. MongoDB calls this their “Aggregation Framework”. Because MongoDB lacks many of the features that relational databases have (like joins), they have provided their own ways of doing complex analytical queries. For example:

  • SQL has “SELECT” to restrict what you get back from what you match. MongoDB has “$project” for this.
  • SQL has “WHERE” to restrict what you’re matching. MongoDB has “$match” for this.
  • SQL has "INNER JOIN" to relate one dataset to another in a normalized system ("show me the users who visited at least 5 times today"). MongoDB can embed the hits into the users in this example with "$lookup" (see the sketch after this list).
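
As a rough sketch of how these stages fit together (the collection and field names here are hypothetical, not our actual schema), the "users who visited at least 5 times today" example might become a pipeline like this:

const MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/analytics', (err, db) => {
    if (err) throw err;

    const startOfToday = new Date(new Date().setHours(0, 0, 0, 0));

    db.collection('hits').aggregate([
        // WHERE -> $match: only today's hits
        { $match: { createdAt: { $gte: startOfToday } } },

        // GROUP BY user, counting hits
        { $group: { _id: '$userId', hitCount: { $sum: 1 } } },

        // HAVING -> another $match: at least 5 hits
        { $match: { hitCount: { $gte: 5 } } },

        // INNER JOIN -> $lookup: pull in the matching user documents
        { $lookup: { from: 'users', localField: '_id',
                     foreignField: '_id', as: 'user' } },

        // SELECT -> $project: keep only what we need
        { $project: { hitCount: 1, 'user.name': 1 } }
    ]).toArray((err, results) => {
        if (err) throw err;
        console.log(results);
        db.close();
    });
});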

From the above, the individual pieces line up fairly simply… but the complexity comes from arranging the MongoDB query the right way. It's arranged differently from the SQL queries we're used to writing. And even though we can use $lookup in place of some joins, memory usage is a concern. MongoDB has a "working set", which is the data it's looking at at a particular moment during the execution of a query. In the above example, finding hits and users and relating them, all of the hits and all of the users might have to be held in memory. At the scale we're dealing with, this may be a problem. Additionally, you can't shard your database when you use $lookup, and sharding is key to scaling our solution for the long term.

We have more research to do to make sure what we do will be a good, long term solution, and that we even have time to implement this complex querying and data analysis tool.

by Matt at October 18, 2016 09:33 PM

October 17, 2016

Matt Welke

UI Mock Ups

Today I continued work on my front end API user interface mock ups. In my previous blog post I described two ideas: one was a very heavy duty UI that would adapt to our MongoDB schema, and the other was a simpler collection of prepared queries arranged for the client to use. I worked on the second option today. I figured the best way to show the mock up would be to get a quick little prototype running, so I made it with Bootstrap and JavaScript. It'll end up doing a lot in the front end with JavaScript anyway (using jQuery to manipulate the DOM), and Bootstrap is quick and easy to use, so it didn't slow down the design process. I have elements responding to clicks and a simple query filtering system almost working. This will work quite well to show my team lead the idea, and maybe show the staff soon too for feedback if they ask to see it.

by Matt at October 17, 2016 09:33 PM

Henrique Coelho

Making a User Control module for DotNetNuke

I had to take a small break from JavaScript and Node.js in the past few days and work with ASP.NET: our client module in the front-end needed some information from the server, but there was no way we could retrieve it from the DOM, so I had to develop a small User Control module for DotNetNuke (an ASP.NET framework) that passes this information to our module in the frontend. A User Control module is a module that can be embedded in the skin of the website, so it can be called for every page.

This is how it works: in the backend, we get the information, put it in a stringified JSON, and include it in the DOM of the webpage. For instance:

<%= getServerInfo() %>

String getServerInfo() {
    // In the real module this is our stringified JSON; "1" is just a placeholder value
    return "<script>window.info = 1;</script>";
}

With this, the window instance will have an “info” object, which can be accessed by the JavaScript in the frontend:

    <script>window.info = 1;</script>

This is very simple, but my C# skills were a bit rusty, and I’ve never worked with DotNetNuke before. These are the files (with pseudocode, of course) that I had to create in order to get a module running:

<%@ Control Language="C#" CodeBehind="View.ascx.cs" Inherits="MyClass.Controller" %>
<asp:Literal ID="placeholder" runat="server" />

The file above is responsible for creating the View of the module, as well as linking it to its “codebehind” (the logic behind the view), specifying which classes it implements, and making a placeholder for our JSON to be inserted.

using ...;

namespace MyClass {
    public partial class Controller : System.Web.UI.UserControl {
        protected void Page_Load(object sender, EventArgs e) {
            // Inject the script tag (built from our stringified JSON) into the placeholder
            placeholder.Text = "<script>window.info = 1;</script>";
        }
    }
}

The file above is the “codebehind” of the module – as soon as the page loads, it will replace our placeholder.

    <package name="View" type="SkinObject">
      <friendlyName>My Module</friendlyName>
      <components>
        <component type="SkinObject"> ... </component>
        <component type="Assembly"> ... </component>
        <component type="File"> ... </component>
      </components>
    </package>

The file above is responsible for defining the module: it tells DotNetNuke what it is.

<%@ Register TagPrefix="dnn" TagName="MYCLASS" Src="~/DesktopModules/MyClass/View.ascx" %>
<dnn:MYCLASS ID="dnnMyClass" runat="server" />

The snippet above is inserted in the skin of the portal, so it can be called in all pages.


After the module is compiled into a DLL, it can be used by the website.

by henrique coelho at October 17, 2016 09:06 PM

October 14, 2016

Matt Welke

Beginning Work on the Front End API

Today was mostly a research-oriented day for me. I need to look into creating a front end API with a nice user interface for the client to use to query our back end API. I have two main ideas right now:

A) A very powerful UI that would learn about the schema as the user clicks on things, giving them buttons and other controls for every possible Mongo query they could do. For example, if they clicked "Hit", they'd get a few buttons popping up below listing its attributes, or "Session", because there's a many-to-one relationship between Hit and Session. This tool would keep learning forever and always work for them, even as they add more data to the schema.

B) A simple UI with as many queries as we can come up with based on our current schema. They would be sorted into categories so the layout makes sense, but because it's just a set of queries we've prepared, it wouldn't grow in the future.

A is extremely hard. I practiced making a mock up in HTML (no functionality) and I still couldn’t even wrap my mind around how it would all connect together. I looked online for some open source libraries in case anybody had already created this “visual query” tool, and I couldn’t find any.

B is more feasible. My team mate agrees with me on this so far.

I began looking into an ODM (object document mapper) called Mongoose to make our lives easier as we build this. ODMs let you use programming-like code to traverse the relationships among your models. Instead of making a query to get the id of a certain session, and then finding all hits based on that id (two separate queries, lots of code), we can do something like findTheSession().hits. Done. Boom.
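
Here's a rough sketch of what that looks like with Mongoose (the model and field names are hypothetical, not our real schema): hits reference their session, and populate() traverses the relationship for us.

const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost:27017/analytics');

const Session = mongoose.model('Session', new mongoose.Schema({
    startedAt: Date
}));

const Hit = mongoose.model('Hit', new mongoose.Schema({
    url: String,
    session: { type: mongoose.Schema.Types.ObjectId, ref: 'Session' }
}));

// Instead of one query for the session id and a second query for its hits,
// we let the ODM walk the relationship:
Hit.find()
    .populate('session')
    .exec((err, hits) => {
        if (err) throw err;
        hits.forEach(hit => console.log(hit.url, hit.session.startedAt));
    });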

I love these mappers. I've used ActiveRecord with Rails and Entity Framework with .NET, so this feels familiar. Once I get past the growing pains of learning how to set it up, I think it will speed us up a lot. We didn't need to use an ODM for the back end API because it wasn't doing any reading of our database, just taking in simple objects and inserting them into the database.

Over the next few days I’m going to continue playing with Mongoose, but also brainstorm types of queries to help analyze their data, and think about the UI we’ll create.

by Matt at October 14, 2016 10:30 PM

October 13, 2016

Matt Welke

We’ve Got the Data, Now What?

Our demo went well! The client's staff seem impressed by our work so far. There will be more refinement to do on our back end API, which is in charge of receiving visitor data, but we can also start thinking about the other half of our project at this point. We will need to present the information we've collected in a useful way they can use to learn about their users. We also need to link the data into their existing system. If their existing system, which uses Elasticsearch as a recommender engine for example, can use our data to augment their recommendations, they will be able to grow.

This means a front end API (which we’re also referring to as the GET API) which is able to query our back end API and send responses in two main forms. The first form will likely be JSON, or some other useful, stable, powerful, machine-readable language. This can be used by Elasticsearch or the rest of their system to augment their recommendations. The second form will likely be the V in MVC. It will be an HTML view or a React or Angular web app. It will be something with an interface they can use to build queries and see data presented in a nice clean, pretty way. My role on the team right now is to investigate creating this front end API while my team mate looks into ways to make our back end API collect even more useful information (likely involving hooks with their CMS).

So far this has been a pretty interesting project, but really, we’re only getting started.

by Matt at October 13, 2016 09:21 PM

October 12, 2016

Henrique Coelho

Deployment of containers on AWS

We spent the past days reading about AWS in order to deploy the 2 containers we developed: one container only has a “Dockerfile” with our database (MongoDB) and the other container has the API to insert data into this database (Node.js). In this post, I’ll describe some things I would like to have known before I started the deployment; it was a very frustrating process, but after you learn how everything works, it becomes really easy.

First of all, the deployment is not a linear process: you will have to know some details about your application before you start the process; these details, however, will not be obvious for you if you haven’t used AWS before: this is one of the reasons why it was a slow, painful process for us.

Looking back, I think the first step to deploy these containers is to upload the repositories, even though they are not properly configured yet: you need the repositories there to have a better perspective on what to do. So, first step: push the docker images to EC2 Container Registry. The process is simple, it only takes 4 steps (3 steps, after the first push), which are just copying and pasting commands in the command line.

After the containers are uploaded, we should choose a machine that will run Docker with the containers, and here is the catch: we need to choose a machine that is already optimized to be used for Container Service, otherwise it will not be a valid Container Instance and you would have to configure it yourself. To find machines that are optimized for ECS, we search for “ecs” on the custom instances. After the machine was chosen, we select the other specifications we’ll need, such as storage, IPs, and so on – but nothing too special here.

With the right machine instance, a default Cluster will be created in the Container Service. Here is the interesting part: the cluster is a set of services, which are responsible for (re)starting a set of tasks, which are groups of docker containers to be used by the machine. Instead of starting from the service, we should now start from the task, adding its containers, and work back to the service; then the deployment will be complete.

To create a task is simple: we give it a name and a list of the repositories (the ones that we uploaded in the beginning), but we also have to set how the containers are going to interact with each other and with the devices outside. There are two special settings we had to do:

1- The MongoDB container should be visible to the API. This can be done by linking them together: on the container for the API, we map the name of the database container to an alias (for instance: Mongo:MongoContainer); with this, the container of the API will receive some environment variables, such as MONGOCONTAINER_PORT, with the address and port of the other container. We can use these to make the API connect to the database (the source code would probably have to be modified to read them; see the sketch after these two points).

2- The MongoDB container should use an external drive for storage, otherwise, its data will be lost when the container is restarted. For this, we map the external directory (that we want the data to be stored into) to an internal directory, which is used by the database (for instance, /usr/mongodb:/mongo/db). Since we wanted to use an external device, we also had to make sure the device would be mounted when the machine was started.
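
To make the first point concrete, here is a minimal sketch of how the API container could pick up those link variables (the alias "MongoContainer", the exact variable names, and the database name depend on how the link is configured, so treat them as assumptions):

const MongoClient = require('mongodb').MongoClient;

// When the containers are linked, Docker/ECS injects variables such as
// MONGOCONTAINER_PORT_27017_TCP_ADDR and MONGOCONTAINER_PORT_27017_TCP_PORT
const host = process.env.MONGOCONTAINER_PORT_27017_TCP_ADDR || 'localhost';
const port = process.env.MONGOCONTAINER_PORT_27017_TCP_PORT || 27017;

MongoClient.connect('mongodb://' + host + ':' + port + '/myDatabase', (err, db) => {
    if (err) throw err;
    console.log('API connected to the linked MongoDB container');
});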

After the task is set up, we make the service for the cluster: the service, in this case, contains the only task that we made. With the service properly configured, it will start and restart the tasks automatically: the deployment should now be ready.

It's easy to understand why we spent so much time trying to make it work (given the amount of details and steps), but looking back, this modularity makes a lot of sense. The learning curve is very steep, but I am very impressed by how powerful this service is. I am very inclined to start using it for deploying my own projects.

by henrique coelho at October 12, 2016 10:52 PM

Matt Welke

Demo is Ready!

Today we continued to prepare our demo for the client. We finally got it running on AWS (Amazon Web Services), using their ECS (Elastic Container Service). It was tricky getting a good mental grip on how everything fits together on AWS, but when you get used to it, it isn't so bad.

Our setup is basically an ECS cluster (of containers) consisting of two Docker containers (one for our Mongo database and one for our Node.js back end API), set up with one ECS-prepared EC2 (Elastic Compute Cloud) instance to run it all. The instance uses EBS (Elastic Block Storage) to store the data so that it’s persisted when the instances are stopped.

In English: our apps are running in containers (lightweight virtual machines) inside a virtual machine, and it's set up in a way that reboots them automatically if they crash, and they can also be scaled in the future.

The demo will consist of using a web page with some dummy content and our front end script (which looks for events like scrolling to the bottom of the page) to show the staff tomorrow that it logs the data when those events happen. We have a tool called adminMongo, which can best be described as the MongoDB equivalent of phpMyAdmin. We'll use it to show the data in the database after we've clicked on a few things and scrolled around. And it's all stored in the cloud.🙂

AWS is a beast, but once tamed, it’s a powerful beast.

p.s. I should make an Amazon acronym cheatsheet sometime. Perhaps I’ll call it the AAC.

by Matt at October 12, 2016 09:42 PM

Laily Ajellu

Time Limits - Get 'A' Certified

Time Limits Should be Adjustable

People need varying times to complete tasks.
Follow these guidelines to give enough time to your users, and make your Web App 'A' Certified.

UX examples of where it's needed:

  • Situation:
    Text is scrolling across the screen
    Add a pause button

  • Situation:
    User is taking an Online test
    A moderator should be able to extend the time to complete the test

  • Situation:
    User is using Online Banking
    If the user is inactive for a while, give them 20 seconds to press any key to extend the session.

  • Situation:
    User is trying to buy a concert ticket online (with limited tickets)
    Warn the user 20 seconds before the time limit is going to end. Also let the user input all of their personal and banking information before the time limit starts. It would be unfair to allow time extensions in this case.

The Checklist - One of the below should be true:

  1. Turn off:
    The user can turn off the time limit before it even starts
  2. Adjust:
    The user can adjust the time limit to at least ten times the default length
    (The factor of ten comes from clinical experience.)

    Essential Exception:
    Where the time limit is essential, e.g. granting double time on a test.
    If letting all users change the time limit would invalidate the outcome, a moderator (e.g. the teacher) must be able to change it for them.
  3. Extend:
    If the time limit is about to end, give the user 20 seconds to do something simple (like pressing the space bar) to extend it. The user should be able to extend the time limit at least 10 times. (A sketch of this appears after the exceptions below.)

Other Exceptions

  1. Real-time Exception:
    The time limit is a required part of the activity (e.g. an auction), and it would be unfair to give more time to some users and not others
  2. 20 Hour Exception:
    If the default time limit is longer than 20 hours, you don't have to have any of the above features.
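
As a rough sketch of the "Extend" option from the checklist above (the timing values and element ID here are made up for illustration): warn the user 20 seconds before the limit ends and let a simple key press extend it, up to 10 times.

const SESSION_LIMIT_MS = 2 * 60 * 1000; // a 2-minute limit, for illustration
const WARNING_MS = 20 * 1000;           // warn 20 seconds before the end
let extensionsLeft = 10;
let warningTimer;

function startSessionTimer() {
    clearTimeout(warningTimer);
    warningTimer = setTimeout(showWarning, SESSION_LIMIT_MS - WARNING_MS);
}

function showWarning() {
    const warning = document.getElementById('timeout-warning');
    warning.hidden = false;

    // Any key press during the warning period extends the session
    document.addEventListener('keydown', function extend() {
        document.removeEventListener('keydown', extend);
        warning.hidden = true;
        if (extensionsLeft-- > 0) {
            startSessionTimer();
        }
    });
}

startSessionTimer();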

Who this Benefits:

  • People who are reading/listening to content that is not in their first language
  • People with physical disabilities often need more time to react
  • People with low vision need more time to locate things on screen and to read
  • People with blindness using screen readers may need more time to understand screen layouts
  • People who watch sign language interpreters

References: Time Limits Required Behaviours

by Laily Ajellu at October 12, 2016 03:21 AM

October 11, 2016

Matt Welke

Continuing Work on the Demo

Today we continued work on getting our application hosted on Amazon Web Services (AWS). This ended up being a lot more challenging than we expected. It's difficult to navigate the system when we're restricted by needing to manually request permissions (Amazon's way of limiting what some users can do) from the client to start or stop services, create instances, etc. It is good practice, of course, to be conservative with these permissions, and we empathize with them wanting to do things this way.

Our main problems seem to be with Docker crashing when it begins to run on the AWS instance we created. We also sometimes can’t even get the instance to be registered with the container service. We will continue work on this tomorrow. If for some reason we can’t get the Docker setup working before our demo on Thursday, we will likely resort to just manually using instances without any container service managing them.

by Matt at October 11, 2016 09:32 PM

Mohamed Baig


A process for small teams for managing their git repositories.

by mbbaig at October 11, 2016 05:08 PM

October 08, 2016

Andrew Smith

On the reception and detection of pseudo-profound bullshit

I first thought this was a joke, but it isn’t. Someone actually ran a study about how well people can detect bullshit. And the result is a wonderfully-written paper (which I’ll copy here, because the original will probably get its URL changed).

I haven’t finished reading it yet (my attention span isn’t what it used to be) but I’m posting it here because it’s clearly an impressive piece of work and is so relevant today, when it seems that you bump into idiots professing the truth no matter where you turn your ears.

Thanks to the authors for getting it done! I’m sure they had to jump through many hoops to have their project approved.

by Andrew Smith at October 08, 2016 09:38 AM

October 07, 2016

Matt Welke

Preparing for the Demo

Today I put some finishing touches on the client code before looking into hosting the back end push API and the client code as we prepare for our Tuesday demo.

We unfortunately don't yet have enough permissions on the client's Amazon Web Services account, so I couldn't get it fully hosted. But I began to investigate this on my own, using my own AWS account's free tier. I was able to set up an IAM user for myself and give myself the needed permissions (which ended up being quite a few… Docker containers on AWS need a lot of supporting services to work). I wasn't quite able to get the ECS (Elastic Container Service) task working. We'll look at it in the morning next work day to get it running. Then, once we get the proper permissions from the client, we can get it running on their account, knowing exactly what to do.

For the client side of things, we were able to access their CMS server, and add some JavaScript that would run on all pages. It was just a development environment there, so we don’t need to worry about breaking things for millions of visitors (phew…). That was pretty simple, so I’m betting the AWS is going to be the trickier part of this.

My team mate worked on the code while I investigated deployment like I described above. He secured the connections with HTTPS and WSS (WebSocket Secure) instead of our old HTTP and WS.

by Matt at October 07, 2016 09:27 PM

October 06, 2016

Matt Welke

I See You, User

Today we worked on the client side/browser hook code. We paid close attention to making sure it wouldn’t block the browser. We’re preparing a ready version to link to the main system (or at least a dev version of it) completely cloud-hosted, so we wanted to code these best practices ahead of time to make sure what we create is good enough to use in production.

I specifically worked on creating client side event handlers that log when a user begins to fill out a form, and also when that user submits the form. One thing we have to be careful about is treating each group of form inputs as its own entity. If there's more than one form on the page (perhaps an email sign up and a comment area), we want to track when visitors start to fill these out and/or submit them separately. See the example below. We listen for "click" on radio buttons and checkboxes, and "keydown" on text inputs, so that we're able to track them all:

// Find all form elements
const forms = document.getElementsByTagName('form');

// For each of them...
for (let i = 0; i < forms.length; i++) {
    // Each set of inputs gets its own group of event handlers
    // (inputs for one form don't interfere with inputs of another form)
    // ...prepare an array of child input elements
    const children = forms[i].children;

    // Filter for just the inputs
    const inputs = [];
    for (let j = 0; j < children.length; j++) {
        const child = children[j];
        if (child.tagName.toLowerCase() === 'input') {
            inputs.push(child);
        }
    }

    // Listener only fires for the first form element changing
    let triggered = false;

    // For each of them, assign the appropriate event listener
    inputs.forEach(input => {
        switch (input.type) {

            case 'text':
                input.addEventListener('keydown', () => {
                    if (!triggered) {
                        triggered = true;

                        // log the data
                    }
                });
                break;

            case 'checkbox':
            case 'radio':
                input.addEventListener('click', () => {
                    if (!triggered) {
                        triggered = true;

                        // log the data
                    }
                });
                break;
        }
    });
}
by Matt at October 06, 2016 09:16 PM

Henrique Coelho

JavaScript and Non-blocking functions

One of the most interesting features of JavaScript must be its event-driven and asynchronous nature: operations can, but don't need to, block the next operation from being executed before the current one is done. For instance, the following snippet follows a very logical sequence:


console.log(1);
console.log(2);
console.log(3);
console.log(4);

// The output is: 1 2 3 4

However, we can make these functions execute in a different sequence by setting timeouts for them:

setTimeout(() => console.log(1), 75);
setTimeout(() => console.log(2), 0);
setTimeout(() => console.log(3), 50);
setTimeout(() => console.log(4), 25);

// The output is: 2 4 3 1

Why is this useful? Suppose we have operations that are costly to perform but don't have a high priority. Normally, they would block other operations that are more important and don't depend on them:

// Not very important operation
for (let i = 0; i < 1000000000; i++);
console.log('Not very important operation is done!');

// Very important operations
console.log('Super important operation');
console.log('This operation is also very important');

/* Output:
Not very important operation is done!
Super important operation
This operation is also very important
*/

In this case, we could simply move the costly and unimportant operation to the end of the file (since nothing else depends on it, after all), but real life is not that easy: although we should prioritise interaction with the user while leaving costly and unimportant operations for last, interactions with the user are not predictable; we cannot create one logical sequence that covers all the cases. However, we can use the setTimeout function with a 0 (zero) timeout for a procedure: the operation will be sent to the back of the queue of operations to perform. Like in this case:

// Not very important operation
setTimeout(() => {
    for (let i = 0; i < 1000000000; i++);
    console.log('Not very important operation is done!');
}, 0);

// Very important operations
console.log('Super important operation');
console.log('This operation is also very important');

/* Output:
Super important operation
This operation is also very important
Not very important operation is done!
*/

With this in mind, I started experimenting to find the best combination for a script that does the most vital (and cheap) operations as soon as possible, but leaves the ones that would affect the user experience for last.

First, I made a simple webpage like this one (the script in the header is 1.js and the one at the end of the body is 2.js):

<script src="1.js"></script>
<div id="overall">
  ...around 100,000 auto-generated HTML elements here...
</div>
<script src="2.js"></script>

alert('Script 1 ' + document.getElementById('overall').childNodes.length);
for (let a = 0; a < 1000000000; a++);
alert('Script 1 done');

alert('Script 2 ' + document.getElementById('overall').childNodes.length);
for (let b = 0; b < 1000000000; b++);
alert('Script 2 done');

(I will change the file 1.js during this post, but 2.js and page.html will stay the same)

The idea is simple: a very heavy webpage with a script in the header, and a script at the end of the DOM; these scripts are just alerts saying how many elements are there in the DOM. This was the order of what happened while loading the page:

1- Script 1, 0 (alert in a blank page)
2- A few seconds of a blank page
3- Script 1 done
4- Dom is loaded
5- Script 2, 100002 (alert in a fully-loaded page)
6- A few seconds of loading, but with the page fully functional
7- Script 2 done

* the first childNodes.length actually throws an error because the #overall element doesn't exist yet, but the moral is: the DOM is not loaded

This is why it is recommended to put your script at the end of the page: it will not block your DOM from rendering. On top of that, if you are planning to do some DOM manipulation, you have to wait for it anyway, otherwise there won’t be anything to manipulate (duh).

However, it has a drawback: your script will only be called after the DOM is already rendered. For our case, we want to know how much time it took for the DOM to load, so this is not an acceptable alternative. What we can do in this case is use an event to see when the page gets loaded, and then we execute the script:


alert('Doing some very fast and important work here...');
document.addEventListener("DOMContentLoaded", function () {
  alert('Script 1 ' + document.getElementById('overall').childNodes.length);
  for (let a = 0; a < 1000000000; a++);
  alert('Script 1 done');
});

With this, the order of execution becomes:

1- Doing some very fast and important work here
2- Dom is loaded
3- Script 2, 100002 (alert in a fully-loaded page)
4- A few seconds of loading, but with the page fully functional
5- Script 2 is done
6- Script 1, 100002 (alert in a fully-loaded page)
7- A few seconds of loading, but with the page fully functional
8- Script 1 done

Now another problem arises: what if there are several costly, but less important, functions inside that one? Say this is our 1.js now:


alert('Doing some very fast and important work here...');
document.addEventListener("DOMContentLoaded", function () {
  alert('Doing not very important operation...');
  for (let a = 0; a < 1000000000; a++);
  alert('Not very important operation done');

  alert('Super important operation');
  alert('This operation is also very important');
});

The order of operations would be:

1- Doing some very fast and important work here
2- Dom is loaded
3- Script 2, 100002 (alert in a fully-loaded page)
4- A few seconds of loading, but with the page fully functional
5- Script 2 done
6- Doing not very important operation…
7- Not very important operation done
8- Super important operation
9- This operation is also very important

Can we send the “Not very important operation” to the back of the queue again? Yes we can. By using the setTimeout function I described before:

alert('Doing some very fast and important work here...');
document.addEventListener("DOMContentLoaded", function () {
  setTimeout(() => {
    alert('Doing not very important operation...');
    for (let a = 0; a < 1000000000; a++);
    alert('Not very important operation done');
  }, 0);

  alert('Super important operation');
  alert('This operation is also very important');
});

This is the order of operations we would get:

1- Doing some very fast and important work here
2- Dom is loaded
3- Script 2, 100002 (alert in a fully-loaded page)
4- A few seconds of loading, but with the page fully functional
5- Script 2 done
6- Super important operation
7- This operation is also very important
8- Doing not very important operation…
9- Not very important operation done

By using some timeouts and some events, I'm confident we will be able to make a client module that executes at the right time: without interfering with the user experience, but still doing the right operations at the right time.

by henrique coelho at October 06, 2016 07:55 PM

Laily Ajellu

Accessible Websites are Like Essays - A Memory Aid

Creating accessible pages is like writing an essay: you must have all the necessary organizational structures.

Both Must Have:

  1. Main headings
  2. Sub-headings
  3. Text alternatives for all images and for any meaningful colour usage in your UI (the text can be screen-reader-only, or available both to screen readers and visually)
  4. Labels positioned to maximize predictability of relationships
  5. Page numbers for PDF documents

Reference: Content Structure Separation: Programmatic

by Laily Ajellu at October 06, 2016 02:09 AM

Required Fields - Get 'A' Certified


If you're using an asterisk to indicate a required field:
  1. Use the aria-required attribute on its input field.
  2. Instructions for the form should say:
    "All required fields are displayed in red and marked with an asterisk (*)."
If you're using colour (like red) to indicate a required field:
  1. Use the aria-required attribute on its input field.
  2. Instructions for the form should explain:
    "Required fields are labeled with red text."

Code Example
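
As a rough JavaScript sketch of the idea (the element IDs and class name are hypothetical): mark the field as required for assistive technology, not just visually.

// Tell screen readers the field is required
const emailInput = document.getElementById('email');
emailInput.setAttribute('aria-required', 'true');

// Keep the visual indicator (the red asterisk) in the label,
// and let the form instructions explain what it means
const emailLabel = document.querySelector('label[for="email"]');
emailLabel.insertAdjacentHTML('beforeend', ' <span class="required-marker">*</span>');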

What it looks like with some CSS


by Laily Ajellu at October 06, 2016 01:57 AM

October 05, 2016

Matt Welke

Switching to Websockets and Cleaning Up

Today we worked on completing the switch from AJAX to websockets for all of the code in our back end and browser hook that logs the information. It was a slow and steady process of editing both ends to move everything over. We tested as we went, using both unit tests and a little HTML page with inline JavaScript to play with it and see how the new method gets the data into the database.

The funny thing is we noticed as we did this that we don’t even need Express.js anymore at this point. We do need a simple HTTP server running just to interface with Amazon Web Services, and for performing a few development administrative tasks (like manually clearing the database and creating tables), but the actual work is done with just websockets. So we refactored our back end by removing Express.js and replacing it with just the websocket server code, using that “ws” Node.js library. Less is more is our opinion. I joked that I like removing code more than I like making code, and I think that philosophy holds merit.

Our next steps are to identify any other information we want to log last minute before we send this prototype off to, and also to investigate using encryption (using WebSocket Secure aka wss:// instead of WebSocket aka ws://) to make our app more robust once it hits production.

by Matt at October 05, 2016 09:54 PM


What a surprisingly productive day! We started off the morning by beginning to investigate using the WebSocket protocol to implement tracking one of the things we were interested in.

Some background:

Most of the things we track can be accomplished by running some JavaScript on the browser to get that information and send it to us using AJAX. Where this becomes insufficient is tracking how long they were on the page before closing it. If we have an AJAX request fire when they close it, there’s no guarantee that the request will complete (logging the data in our database), because there’s a chance the closing of the tab or browser will stop the request.

One solution we thought of was to store the time they closed the page in a cookie or other form of local storage in the browser. Then, the next time they visited, we could see that data in the local storage, and know how long they were on the page before. But this is a problem if the user never returns, and it ruins the simplicity of our logging scheme, where we can completely log something in just one request, without needing to revisit it and modify the data.

We thought of using AJAX to poll the server ("Hey server, web browser here, I'm still viewing the article. I'll let you know if I still am in another second!") but there would be way too much overhead associated with doing that; this overhead can be anywhere from about 900 bytes up to kilobytes, depending on cookie information. We thought we'd investigate websockets as the next technology choice to accomplish this. This technology allows for a persistent TCP connection between client and server where each "message" exchanged has an overhead of only 2 bytes. The overhead is just a character saying "here comes a message!" and then, after the message, a character saying "k, done". If a poll is just us knowing whether or not the client is still viewing the page, we just need one character to be that message. Any character will do. Therefore, each message sent by visitors' browsers as they poll our server would consist of only 3 bytes.

So fast forward to this morning when we seriously took a look at it and tinkered with some code… We looked into using, a powerful, easy to use JavaScript library that would run in the browser. It was lightning fast to try out… I was able to get a working example running in minutes following their instructions. But the only catch was that the library’s client side that would run in the browser is 300 KB. We found out that just using websockets by itself without any library in the browser does the job. It’s a bit less powerful, but we don’t need to use it to do anything complicated. We still need to use Node.js’s ws library in our server, but we’re not concerned with server side library size, just client side. We’re quite happy with 0 KB as the additional amount of JavaScript needed on the browser side.🙂

We had a working prototype at the end of the day. This meant removing the HTTP routes from our Express.js back end. We replaced them with middleware that would connect to the MongoDB database. This middleware was called by a simple switch statement in our app.js. Our app now consists of a websocket server starting up, waiting for connections, and then listening for messages from anyone who connects. It calls the appropriate middleware based on the event name component of the message. For example, if the message was a JSON object with the "event" property of "receiveHit", the websocket server interprets that message as the equivalent of a POST request for accepting a Hit model. We decided to replace the old HTTP routes with websocket server handlers because we may as well keep things simple! There's no need to use HTTP and AJAX for everything except logging the page view time. We may as well just use websockets for everything.
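
A stripped-down sketch of that dispatch idea looks something like this (the "receiveHit" event name comes from the example above; the saveHit middleware and the port are hypothetical stand-ins for our real code):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// Hypothetical middleware stub: in the real app this inserts a Hit into MongoDB
function saveHit(payload) { /* ... */ }

wss.on('connection', socket => {
    socket.on('message', raw => {
        const message = JSON.parse(raw);

        // Route each message to the right handler based on its event name,
        // much like HTTP routes map URLs to controllers
        switch (message.event) {
            case 'receiveHit':
                saveHit(message.payload);
                break;
            default:
                console.log('Unknown event:', message.event);
        }
    });
});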

I think this technology goes beyond just letting us log more types of data. It will have the bonus effect of reducing our number of HTTP queries per Hit (page view) from 10-30 (depending on what the visitor does on the page) down to just one. It will save massive amounts of execution time and data transfer on the server, and even help save data transfer on the browser side. If they’re using a mobile phone, that means saving data and therefore even extending battery life. This technology is amazing… I love that I stumbled into this while working on this project. I think websockets are going to revolutionize the kind of web apps we start to see.

by Matt at October 05, 2016 12:14 AM

October 04, 2016

Henrique Coelho

Approaches for database connection with Express.js

While experimenting with different databases and Node.js, I saw several different approaches for pooling connections from the database and ensuring they can handle a lot of traffic.

When using MySQL, for instance, the simplest way to make a connection would be like this:

const mysql = require('mysql');
const express = require('express');
const app = express();

const dbSettings = {
    host    : 'localhost',
    user    : 'root',
    password: 'password',
    database: 'myDatabase'
};

// Index route
app.get('/', (req, res) => {

    const connection = mysql.createConnection(dbSettings);
    connection.connect();

    // Here we get all rows from a table, end the connection,
    // and respond
    connection.query('SELECT * FROM Table', (err, rows) => {
        connection.end();

        if (err) {
            res.status(500).send(err);
        } else {
            res.json(rows);
        }
    });
});

app.listen(3000);
This is a very simple way to connect to a database: for every request that we get, we connect to the database, fetch the results, end the connection, and respond. But it has a drawback: it is slow and does not support several connections at the same time.

To solve this problem, we can use a pool of connections – a cache of database connections that can be reused and can handle several connections at the same time. This is how it would look using a pool:

const mysql = require('mysql');
const express = require('express');
const app = express();

const dbSettings = {
    connectionLimit : 100,

    host    : 'localhost',
    user    : 'root',
    password: 'password',
    database: 'myDatabase'
};

const pool = mysql.createPool(dbSettings);

// Index route
app.get('/', (req, res) => {

    pool.getConnection((err, connection) => {
        if (err) {
            return res.status(500).send(err);
        }

        // Here we get all rows from a table, release the connection
        // back to the pool, and respond
        connection.query('SELECT * FROM Table', (err, rows) => {
            connection.release();

            if (err) {
                res.status(500).send(err);
            } else {
                res.json(rows);
            }
        });
    });
});

app.listen(3000);
This is a much better way to handle connections in production.

For NoSQL, however, the pattern that I found was a bit different: instead of starting the server before anything, we first connect to the database, and then we start the server. The connections will be kept alive until the application is terminated:

const mongodb = require('mongodb').MongoClient;
const express = require('express');
const app = express();

// Index route
app.get('/', (req, res) => {

    // Here we just get any document from a collection and respond
    app.locals.db.collection('myCollection').findOne({}, (err, doc) => {
        if (err) {
            res.status(500).send(err);
        } else {
            res.json(doc);
        }
    });
});

// Connecting to MongoDB before starting the server
mongodb.connect('mongodb://localhost:27017/myDatabase', (err, db) => {

    // Aborting in case of error
    if (err) {
        console.log('Unable to connect to Mongo.');
        return;
    }

    // Making the connection available to the application instance
    app.locals.db = db;

    // After the connection has been established, we listen for connections
    app.listen(3000, () => console.log('Listening on port 3000'));
});

by henrique coelho at October 04, 2016 08:15 PM

October 03, 2016

Matt Welke

Beginning the Client Side Hook

Today we worked on creating the client side JavaScript code that would be inserted into every article the client's readers view. They already inject JavaScript in a similar way for their own needs with every response, so we would simply be adding another hook. This code would use AJAX to send POST requests to the back end API that we've now finished creating and testing.

When you want to use AJAX to send HTTP requests, the obvious choice is jQuery. There's a lot of documentation on using jQuery for AJAX, and it's reliable. However, we wanted to minimize the amount of JavaScript we would be injecting with our hook, so we used a library called axios. This library got us down to less than 20 KB of JavaScript code going to the reader when they load an article. Its syntax for making requests is pretty similar to jQuery's, so learning how to use it isn't hard.
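
For a sense of what the hook actually does, a single logging call with axios is tiny; something like this (the endpoint URL and payload fields are hypothetical):

// axios is available as a global once the hook's script bundle is loaded'', {
    resource: window.location.pathname,
    referrer: document.referrer,
    userAgent: navigator.userAgent
});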

Creating our client side hook basically involves brainstorming to find out as much information as we can possibly capture. For example, we're going to use the "scroll" event of the DOM to detect when the visitor has scrolled to the bottom of the web page, because we want to know which articles visitors are interested in enough to finish reading. Some pseudocode for what we created to do this would be:

window.addEventListener('scroll', () => {
    if (/* the user reached the bottom of the page */) {
        // hitId is how the client side hook knows what "Hit" produced this action
        // ...log the event against hitId
    }
});
For some things we want to track, like when a visitor closes the web page/tab/browser or loses power, we need something reliable. We can't rely on an AJAX query finishing as the page closes, and we can't block the page with synchronous JavaScript. We believe our solution to this is websockets, likely using the JavaScript library. Summarized, websockets allow you to maintain a persistent full duplex connection between a web server and a client after starting an HTTP request. There's no need for HTTP polling, it's very quick, and the overhead is very low. We can use this to continuously check whether the visitor is still "alive". (We will poll, which is technically an anti-pattern for websockets, but we need to do this, and the low overhead makes it workable for us.)

by Matt at October 03, 2016 10:42 PM

Unit Testing

On Friday, we finished creating the first version of the back end API and created unit tests for it. In this testing environment, it was running through our IDEs (WebStorm) and connecting to a MongoDB database running on a server on the LAN, sitting on our desks. That server was running Fedora. We used the Mocha JavaScript unit testing framework.

We basically had the following pattern: “Send some information in via a POST HTTP request, then use the MongoDB Node.js driver to examine the contents of the database. If what was expected was there, pass, else fail.” An example would be doing a POST request to add a Hit (representing a visitor browsing a page on the site). The Hit model contained the Platform model and Resource model, which held information about the visitor’s web browser, device, which article or other site resource they were viewing, etc.

We had a very liberal view about what information we would accept. Even if the Hit is missing some fields, perhaps because the visitor's privacy settings didn't allow that information to come through, we still want to take it. So we said the unit test would pass if the Hit and its Platform and Resource models were created and had *any* information (see the sketch below).
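
The pattern itself looks roughly like this in Mocha (the route, payload fields, and database names here are hypothetical): POST something, then peek into MongoDB and apply the liberal check described above.

const assert = require('assert');
const axios = require('axios');
const MongoClient = require('mongodb').MongoClient;

describe('POST /hits', () => {
    it('stores a Hit with whatever fields were provided', done => {'http://localhost:3000/hits', { platform: { browser: 'Firefox' } })
            .then(() => MongoClient.connect('mongodb://localhost:27017/analytics'))
            .then(db => db.collection('hits').findOne({}))
            .then(hit => {
                // Liberal check: the Hit exists and carries *some* information
                assert.ok(hit);
                done();
            })
            .catch(done);
    });
});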

Other aspects of the tests were more conservative, like the data types. We looked for IP addresses to specifically be strings, and for certain ids to be numbers (while most ids were strings), because that's how the client's system stores those ids.

At the end of the day, we were pretty confident we had a working back end that was ready to start logging information.

by Matt at October 03, 2016 01:36 PM

September 29, 2016

Henrique Coelho

Thoughts on API design

Captain's Log, Stardate 29092016.5. For the past few days, we have been working on the API that will be responsible for pushing the data sent by the client into the database. It is all very simple in theory, but it is always a challenge to come up with a good, consistent flow for an application: when it is just a simple task, such as pushing data, it is all very straightforward, but when it depends on foreign keys and relations, then we start having problems.

For instance, it is easy to receive some information about a user (like login, password, and email) and just put it in the database; but if the information is stored in multiple tables (like login, password, email, list of friends, list of messages), we need to be more careful: what if one of them fails? Which one should we do first? Ensuring the correct flow for the application, while keeping modularity and recording and treating errors correctly, suddenly becomes very hard.


We are talking about APIs, not OOP, but you got the idea

The problem is common and not easy to avoid, but it gets much easier to deal with when the parts require minimal effort to use and are powerful enough to perform operations consistently.

It is tempting to make APIs that are very simple: they just put/get/delete a little piece of information, and we are done. It’s easy and quick to get this job done, easy to test, easy to understand, and simplicity is good. But as a consequence, the code outside of it will pay the price: all the logic that you saved in the API may have to go somewhere else. For instance: say you need to design an API that logs items into the database. Easy, just design something like this:

It takes one item and records it. Done. The problem comes when you realize that you may receive a list of items to insert. What do you do then? You will have to make a loop that does this call over and over again; if it is an ajax request, you will have to use promises or nest several callbacks just to insert several items. It would have been much better to design an API capable of accepting a comma-separated list of items after all, something like ?items=myItem1,myItem2,myItem3

If you are using an SQL database, it is easy to insert items in bulk; if you are using NoSQL, just split the string and you’ll have an array ready to be inserted.
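
A rough sketch of the friendlier design (the route, parameter, and collection names are hypothetical): accept the comma-separated list and insert it in a single call.

const express = require('express');
const mongodb = require('mongodb').MongoClient;
const app = express();

app.get('/insertItems', (req, res) => {
    // "?items=a,b,c" -> [{ name: 'a' }, { name: 'b' }, { name: 'c' }]
    const items = req.query.items.split(',').map(name => ({ name }));

    // MongoDB can insert the whole array in one call
    app.locals.db.collection('items').insertMany(items, err => {
        if (err) {
            return res.status(500).send(err);
        }
        res.send('Inserted ' + items.length + ' items');
    });
});

mongodb.connect('mongodb://localhost:27017/myDatabase', (err, db) => {
    if (err) throw err;
    app.locals.db = db;
    app.listen(3000);
});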

Simplicity is nice and should always be present, but we also have to make sure that what we design fulfills the scenarios we are expecting: if you expect to receive several items, don't design something that only receives one and has to be called over and over again. People try to save time by making something very simple, but when it is way too simple, you are likely to pay the price later. It takes more time to build this base, but you will be able to code the rest of the application in a breeze.

by henrique coelho at September 29, 2016 10:10 PM

Matt Welke


Today we continued work on the back end API. The client is going to hand over credentials with limited access to their Amazon Web Services account soon so we can begin testing it on there, so our goal is to have a simple version of the back end API created by Tuesday, preferably also with some client side code they can put into their CMS system to call the back end API.

I had to get accustomed to a new Node.js concept today. I was vaguely aware of the concept of "middleware", where a chain of functions runs as Node.js handles an HTTP request, but until now I hadn't had to add my own middleware. Now that we're coding our back end API to be quicker and more efficient with data transfer, I need to get comfortable with it.

Here’s how things are done without middleware:

Normally, HTTP requests come in from the browser to the server and the server sends a response. I'll call this a "complete request". If you need to do some work in the web server during that request, you might choose to spawn a new HTTP request just to do that work (like getting the data). So for example, if a complete request involves getting a fruit basket, and that fruit basket needs an apple and a banana, the "get fruit basket" request won't be complete until it starts and receives responses from a "get apple" request and a "get banana" request, both of which would be considered complete requests before the "get fruit basket" request is complete. The client connecting in would only be executing one "get fruit basket" request; the requests it spawns are abstracted away.

For a more technical, web development type example, it might be like a weather webapp responding to a “get the weather” request with “the weather” in the response, but behind the scenes, it needs to start and receive responses from a “get temperature” request and a “get air pressure” request. Perhaps to get that information, it sends those http requests to weather stations located who knows where.

And here’s how we’re using middleware:

We could do that for our application, but we want to minimize the number of requests the client side sends via AJAX, for multiple reasons. So our client side code will do a “send hit” request into our back end API, which will then break that hit model into the various little models we’re storing in our MongoDB database. The middleware is those requests. Like the “get apple” example above, we have “put platform” (which refers to info about the user agent), “put session” (if they’re starting a new session of hits that we need to track), etc. The client only does a “put hit” request to accomplish this.

And this is okay for us to do because we’re not talking about reaching out to other web apps, to other APIs, to do those middleware steps. We’re staying inside our web app. So there’s no need to go through all the trouble (and added overhead) of doing http requests. We just call functions in our web app.

To implement it, we just add the JavaScript functions doing all those little things into a row of functions the main request calls before it is ready to respond. A little care is taken to make everything happen in the right order, since it is asynchronous with callbacks, but it seems to be working well so far (see the sketch below).
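
Here's a minimal sketch of that row of functions using Express (the function names, route, and payload fields are hypothetical placeholders for our real middleware):

const express = require('express');
const bodyParser = require('body-parser');
const app = express();

app.use(bodyParser.json());

function putPlatform(req, res, next) {
    // ...store info about the user agent from req.body, then hand off
    next();
}

function putSession(req, res, next) {
    // ...create or look up the visitor's session
    next();
}

function putHit(req, res, next) {
    // ...store the hit itself, now that its pieces exist
    next();
}

// The client only ever makes this one "put hit" request'/hits', putPlatform, putSession, putHit, (req, res) => {
    res.sendStatus(201);
});

app.listen(3000);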


by Matt at September 29, 2016 08:40 PM

September 28, 2016

Matt Welke

Working on API and AWS Management

Over the past few days, I’ve been working on the back end API along with my team mate. There’s nothing terribly exciting to report there. It’s just making the C part of an MVC app. However, what’s more “exciting” to report is the fact that I was unexpectedly charged about $27 on the AWS account we’re using for research and testing. We try to test everything locally, seldom running things on the cloud, but we needed to set up a little bit of stuff to explore it and learn how it all fits together.

We accidentally provisioned too much throughput for DynamoDB, more than the free tier allows for. This was what caused the charges. I looked online for instances of this happening to other people to see how Amazon support would react in this situation, and I found forum threads where Amazon support mentions their willingness to work with people over crazy high bills. This gives me hope that I’ll have my charges easily reversed.

I submitted a case and we’ll see what happens. In the meantime, I’m now looking more into AWS user privileges and the like to make sure it doesn’t happen again. I set up alerts, and I’ll soon be switching to an IAM account where I can limit myself to only certain access. Their IAM user setup is similar to how Linux prevents damage to systems by not giving everyday users root privileges. While Linux makes you use sudo, the AWS IAM accounts won’t even be able to use sudo. It’ll be like setting up strict parental controls.

Ironically, this is the exact sort of thing I needed to research today anyways as we prepare to gain access to’s AWS account.

And I’d rather not charge them $27, so I best be careful. >.>

by Matt at September 28, 2016 08:42 PM

September 27, 2016

Matt Welke

Implementing Mongo and Docker

Today I spent the morning comparing the economic feasibility of using DynamoDB vs. MongoDB for our project. I wanted to be very thorough just in case using our preferred technology (MongoDB) would be too expensive. After carefully looking at three cost categories for each technology (compute cost, bandwidth cost, and storage cost), it turns out MongoDB wins in each of them. Here’s a breakdown of the cost we could expect for our particular use case. For confidentiality reasons, I won’t reveal the details for the number of hits expects per month according to their last year of logs, but we used that “number of hits per month” number along with the number of transactions per hit we expect (where a transaction is an ajax query our code will fire against our back end) to come up with these numbers:


DynamoDB

Per month as of the 13th running month:

Compute: $0 *They don’t charge for compute time just to take in data
Transfer: $21.67 *They charge $0.0065 per 10 “units” where a unit is one 1 KB transaction per second
Storage: $343.32 *They charge $0.25 per GB per month

Cumulative after 12 months of running: $2,491.60

MongoDB on Elastic Compute Cloud (EC2) with Elastic Block Storage (EBS)

Per month as of the 13th running month:

Compute: $18.99 *They charge $0.026 per hour for the t2.small instances
Transfer: $0 *They don’t charge for data coming into your web app
Storage: $137.33 *They charge $0.10 per GB per month

Cumulative after 12 months of running: $1,120.56 // winner

If you haven’t noticed by now, MongoDB having such a low price per GB stored each month is going to make it excellent for our application. And that’s exactly what the results of my analysis show. We’re definitely going to want to go with MongoDB for this. It will be open source, making it extensible and free for in the future, and it’s cheaper. Maybe they can give me and my team mate bonuses!

if (they.likeUs()) {
  they.giveBonuses(); // one can hope
}

The rest of our day was spent switching the first version of the back end API that we made weeks ago to complete the route actions by talking to MongoDB and getting it “Dockerized”. My team mate worked on that while I continued to look into the logistics of using MongoDB on AWS. It turns out it’s a heavily studied subject. I was able to find some very useful resources on combining these two technologies:

by Matt at September 27, 2016 02:54 AM

September 26, 2016

Kezhong Liang

Install Snort on CentOS 5.8 (X86_64)

Snort is a free, lightweight network intrusion detection system (NIDS). The following steps are how I installed Snort on my CentOS 5.8 server.

Install CentOS 5.8 (X86_64)
When I installed the operating system, I selected MySQL, HTTP, Development Tools and Development Libraries, and then updated it to the latest packages.

Install the necessary packages
# yum install mysql-bench mysql-devel php-mysql gcc php-gd gd glib2-devel gcc-c++

Download snort and its dependent packages
# mkdir /root/snortinstall
# cd /root/snortinstall
# wget
# wget
# wget
# wget
# wget
# wget

Install snort and its dependent packages
# tar xvzf daq-0.6.2.tar.gz
# tar xvzf libdnet-1.12.tgz
# tar xvzf libpcap-1.2.1.tar.gz
# tar xvzf pcre-8.30.tar.gz
# tar xvzf snort-
# tar xvzf tcpdump-4.2.1.tar.gz

# cd libpcap-1.2.1
# ./configure
# make
# make install
# cd /usr/lib64/
# rm
# rm
# ln -s /usr/local/lib/ /usr/lib64/
# ln -s /usr/lib64/ /usr/lib64/
# ln -s /usr/lib64/ /usr/lib64/

# cd /root/snortinstall/
# cd daq-0.6.2
# ./configure
# make
# make install

# cd ../pcre-8.30
# ./configure
# make
# make install

# cd ../libdnet-1.12
# ./configure
# make
# make install

# cd ../snort-
# ./configure --with-mysql-libraries=/usr/lib64/mysql/ --enable-dynamicplugin --enable-zlib --enable-ipv6 --enable-sourcefire
# make
# make install

# groupadd snort
# useradd -g snort snort -s /sbin/nologin
# mkdir /etc/snort
# mkdir /etc/snort/rules
# mkdir /etc/snort/so_rules
# mkdir /etc/snort/preproc_rules
# mkdir /var/log/snort
# chown snort:snort /var/log/snort
# mkdir /usr/local/lib/snort_dynamicrules
# cd etc/
# cp * /etc/snort/

Register on Snort official web site and download rules
# cd /root/snortinstall/
# tar xvzf snortrules-snapshot-2921.tar.gz
# cd rules/
# cp * /etc/snort/rules
# cp ../so_rules/precompiled/Centos-5-4/x86-64/* /etc/snort/so_rules
# cp ../preproc_rules/* /etc/snort/preproc_rules

Edit /etc/snort/snort.conf file
1. Change "var RULE_PATH ../rules" to "var RULE_PATH /etc/snort/rules", change "var SO_RULE_PATH ../so_rules" to "var SO_RULE_PATH /etc/snort/so_rules", and change "var PREPROC_RULE_PATH ../preproc_rules" to "var PREPROC_RULE_PATH /etc/snort/preproc_rules"
2. Comment out the whole "Reputation preprocessor" section, because we don't have a whitelist file
3. Find the "Configure output plugins" section and add the line "output unified2: filename snort.log, limit 128"

Setup MySQL Database
# echo "SET PASSWORD FOR root@localhost=PASSWORD('yourpassword');"|mysql -u root -p
# echo "create database snort;"|mysql -u root -p
# cd /root/snortinstall/
# cd snort-
# mysql -u root -p -D snort < schemas/create_mysql
# echo "grant create, insert on root.* to snort@localhost"|mysql -u root -p
# echo "SET PASSWORD FOR snort@localhost=PASSWORD('yourpassword');"|mysql -u root -p
# echo "grant create,insert,select,delete,update on snort.* to snort@localhost"|mysql -u root -p

Download and install ADODB and BASE
# yum -y install php-pear
# pear upgrade --force
# pear install Numbers_Roman
# pear install channel://
# pear install Image_Color
# pear install channel://
# pear install channel://
# cd /root/snortinstall/
# wget
# wget
# cd /var/www
# tar xvzf /root/snortinstall/adodb511.tgz
# mv adodb5/ adodb/
# cd html/
# tar xvzf /root/snortinstall/base-1.4.5.tar.gz
# mv base-1.4.5/ base/
# cd base/
# cp base_conf.php.dist base_conf.php

Edit base_conf.php file, change parameters as below
$BASE_urlpath = '/base';
$DBlib_path = '/var/www/adodb';
$DBtype = 'mysql';
$alert_dbname   = 'snort';
$alert_host     = 'localhost';
$alert_port     = '';
$alert_user     = 'snort';
$alert_password = 'yourpassword';

Create BASE AV
# service httpd restart
Launch the web browser, https://yourip/base, click on “Setup Page” and then click “Create BASE AV”

Secure the BASE
# mkdir /var/www/passwords
# /usr/bin/htpasswd -c /var/www/passwords/passwords base

Edit the file /etc/httpd/conf/httpd.conf, and add the following lines
<Directory "/var/www/html/base">
    AuthType Basic
    AuthName "SnortIDS"
    AuthUserFile /var/www/passwords/passwords
    Require user base
</Directory>

# touch /var/www/html/index.html
# service httpd restart

Install Barnyard2
# cd /root/snortinstall/
# wget
# tar xvzf barnyard2-1.8.tar.gz
# cd barnyard2-1.8
# ./configure --with-mysql-libraries=/usr/lib64/mysql/
# make
# make install
# cp etc/barnyard2.conf /etc/snort/
# mkdir /var/log/barnyard2
# chmod 666 /var/log/barnyard2
# touch /var/log/snort/barnyard2.waldo

Edit the file /etc/snort/barnyard2.conf:
change "config hostname: thor" to "config hostname: localhost"
change "config interface: eth0" to "config interface: eth2"
add this line at the end of the file: "output database: log, mysql, user=snort password=yourpassword dbname=snort host=localhost"
Note: my eth0 is used to access the BASE web page; my eth2 (no IP set) is a Myricom 10GbE card used by Snort

# /usr/local/bin/snort -u snort -g snort -c /etc/snort/snort.conf -i eth2
If it prints "Initialization Complete", it is working.

Make Snort and Barnyard2 boot up automatically
Edit the file /etc/rc.local, add the below lines
/sbin/ifconfig eth2 up
/usr/local/bin/snort -D -u snort -g snort -c /etc/snort/snort.conf -i eth2
/usr/local/bin/barnyard2 -c /etc/snort/barnyard2.conf -d /var/log/snort -f snort.log -w /var/log/snort/barnyard2.waldo -D

Reboot and enjoy watching my pig snort
# reboot


by kezhong at September 26, 2016 09:55 PM

Henrique Coelho

Cookies, Third-Party Cookies, and Local/Session Storage

In this post I will make a brief introduction to cookies, but more importantly, I want to talk about third-party cookies: What are they? Where do they live? How are they born? What do they eat? And what are the alternatives?

Due to the nature of HTTP (based on request and response), we don’t really have a good way to store sessions (a fixed, persistent memory between visits to a webpage). This is solved by using cookies: the website creates a small text file on the client’s computer, and this cookie can be accessed again by the same website. This solves the problem of having to ask for the client’s username and password on every single page, for instance, or of storing information about their “shopping cart” (when not using databases). There is also a security feature: a website cannot access cookies that it didn’t create; in other words, a cookie is only available to the domain that created it.
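
As a minimal sketch of the client-side flavour of this (the server-side flavour, with PHP for instance, is mentioned just below), setting and reading a cookie in the browser looks something like this; the key names are just examples:

// Set a cookie that lasts 30 days; the browser will send it back to this
// domain with every subsequent request.
document.cookie = 'username=matt; max-age=' + (60 * 60 * 24 * 30) + '; path=/';

// document.cookie returns every cookie for this domain as one string,
// e.g. "username=matt; theme=dark", so reading one back means parsing it.
const username = document.cookie
  .split('; ')
  .find(pair => pair.startsWith('username='))
  .split('=')[1];
console.log(username); // "matt"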

According to scientists, this is what a cookie looks like

HTTP cookies first appeared in the Mosaic Netscape browser (the first version of Netscape), followed by Internet Explorer 2. They can be set either on the server side (with PHP, for instance) or on the client side, using JavaScript. There are several types of cookies:

  • Session cookies are deleted when the browser is closed
  • Persistent cookies are not deleted when the browser is closed, but expire after a specific time
  • Secure cookies can only be transmitted over HTTPS
  • HttpOnly cookies cannot be accessed on the client side (JavaScript)
  • SameSite cookies can only be sent when originating from the same domain as the target domain
  • Supercookies are cookies set on a “top level” domain, such as .com, and are accessible to all websites within that domain
  • Zombie cookies get automatically recreated after being deleted
  • Third-party cookies (I will talk about them now)

Third-party cookies

Cookies are a lot more powerful than they seem, despite the obvious limitation of only being available to their own domain: let’s suppose I have a website and I decide to put some ads on other websites. Instead of simply offering these websites a static image with my advertisement, I could pass a PHP file that generates an image:

<a href="..."><img src="" /></a>

This PHP file would just generate an image dynamically, but it could do more: it could send JavaScript to the clients and set cookies on the websites where I advertise. When someone accesses one of those websites, my script could detect the website address using JavaScript (window.location) and record this information in a cookie. When the user navigates to other websites carrying my ads, my script would repeat the process. This information would be accessible to me: I would know exactly which websites the user visited.
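
For illustration, here is a minimal sketch of one common way the ad server’s side of this is implemented – a “tracking pixel” endpoint that records the referring site in a cookie scoped to the ad domain. It is written in Node.js rather than the PHP mentioned above, and every name in it is made up:

const http = require('http');

// A 1x1 transparent GIF, base64-encoded.
const PIXEL = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7', 'base64');

http.createServer((req, res) => {
  // The Referer header tells us which site embedded the image.
  const referrer = req.headers.referer || 'unknown';

  // Read the visitor's existing history cookie, if any, and append to it.
  const cookies = (req.headers.cookie || '').split('; ');
  const prev = cookies.find(c => c.startsWith('visited='));
  const history = prev ? decodeURIComponent(prev.split('=')[1]).split(',') : [];
  if (!history.includes(referrer)) history.push(referrer);

  // Because this response comes from the ad domain, the cookie is a
  // "third-party" cookie from the embedding site's point of view.
  res.writeHead(200, {
    'Content-Type': 'image/gif',
    'Set-Cookie': 'visited=' + encodeURIComponent(history.join(',')) +
                  '; Max-Age=31536000; Path=/'
  });
  res.end(PIXEL);
}).listen(8080);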

Image courtesy of Wikipedia


Problems with privacy and blocking

Needless to say, people who are slightly concerned with privacy do not like cookies; especially for non-techie people, this is a very convenient witch to hunt – I am surprised that magazines and newspapers are not abusing it. Most modern web browsers can block third-party cookies, which is a concern if you are planning a service that relies entirely on this feature.

It’s not easy to find statistics about cookie usage, but I got one from Gibson Research Corporation:


Browser usage


Cookie configuration by browser, where: FP = First-Party cookie and TP = Third-Party cookie.

It seems that third-party cookies are disabled on Safari by default, while other web browsers are also getting stricter about them. Despite still being used, this practice seems to be reaching a dead end. On top of that, cookies are also not able to track users across different devices.


Alternative: Local/Session Storage

Apparently, cookies are dying. It may be a little too early to say this, but we don’t want to create something that will be obsolete in 5 years, so it is a good idea to plan ahead. What is the future, then?

Probably the most promising tool is Local and Session Storage, which also seems to be supported in the newest browsers:

Compatibility for Local and Session storage

The way Local and Session Storage work is very simple: they behave as a little database in the browser, storing key-value pairs of plain text. While Local Storage is persistent (does not get deleted), Session Storage lasts only while the browser is open (it is deleted when the browser is closed, but not when the page is refreshed). They are great for storing persistent, non-sensitive data, but they are not accessible from the server: the storage is only available on the client side – if the server must have access to it, the data must be sent manually.
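
A minimal sketch of the Web Storage API (browser-side JavaScript; the key names here are just examples):

// localStorage persists across visits; sessionStorage lasts only for the tab.
localStorage.setItem('preferences', JSON.stringify({ theme: 'dark' }));
sessionStorage.setItem('visitStart', new Date().toISOString());

// Values come back as plain strings, so structured data is usually JSON.
const preferences = JSON.parse(localStorage.getItem('preferences')); // { theme: 'dark' }

// Nothing is sent to the server automatically; if the server needs a value,
// the client has to include it in a request explicitly, for example:
// fetch('/collect', { method: 'POST', body: JSON.stringify(preferences) });

localStorage.removeItem('preferences'); // explicit cleanup when no longer needed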

Using the local storage, it is possible to build a similar system to Third-Party cookies, with methods similar to the ones I explained. Here is an article on how to do this: Cross Domain Localstorage.



by henrique coelho at September 26, 2016 02:38 PM

September 25, 2016

Matt Welke

MongoDB Pretty Dynamite After All

Alright, that’s it for the DynamoDB puns, I swear.🙂

I feel like on Friday I finally had a breakthrough while investigating technologies to use for the project. My team mate built a prototype of our back end and the front end we will need to create in a few months, with Node.js and DynamoDB for the back end, so that we could demo it after the weekend. Meanwhile, my job for the day was to take one more stab at finding an open source alternative to DynamoDB. I looked further into using MongoDB.

MongoDB Atlas is limited to 10 TB if you use it as a service. This isn’t good enough, but luckily, you can use Amazon Elastic Compute Cloud (EC2) to provision your own instances to run your web apps, and this includes choosing instances that have up to 48 TB of cheap, magnetic storage per instance. Eureka! We were so obsessed with looking at X-as-a-Service that we forgot about the DIY options. We can still use Docker to make things easier for the client, reducing the maintenance and tech knowledge needed to migrate the system we build in the future. So this may be an amazing sweet spot: an EC2 instance running our Node.js back end in a Docker container, and another EC2 instance with tons of storage running MongoDB in a Docker container, to accept and store the data. The latter can scale to multiple EC2 instances all running MongoDB (sharded) to store more and more data as grows. And it’s all open source.🙂

After this discovery, I created a small prototype of a Node.js and MongoDB back end, running in two separate Docker containers locally, to get a sense of how it would fit together. Docker ended up being very intuitive. I think it’s going to be a popular tool over the next few years. My prototype worked. My team mate’s prototype worked. Mission accomplished!

by Matt at September 25, 2016 08:17 PM

Andrew Smith

DIY portable socket organizer

I wanted to build this for a long time. I hate looking through random boxes for a socket that’s 1mm larger or smaller than another socket which almost fits the nut I’m taking off :)


Part of the problem is that all the large socket sets you can buy have some sizes missing. Even the expensive socket sets. So it took me a while to assemble this set. It includes:

  • Most of the black short and deep sockets are from the original set, which I bought used (I was told it was an impact socket set but now I think that was a lie).
  • A couple of the short sockets are Mastercraft Maximum.
  • One of the short sockets and three of the deep sockets are from Princess Auto.
  • One of the deep sockets is from Amazon.

Crazy, yeah? I also have in here:

  • A nice short 3/8″ Ingersoll Rand air ratchet with a swivel adapter from Lowe’s. My IR impact wrench wouldn’t fit in here with everything else.
  • A wobble extension bar set from Canadian Tire.
  • A full adapter set from Princess Auto.
  • And three ratchets and a screwdriver bit set from I don’t even remember where.

The box I found at the curb – someone threw it out. It was lined with foam with indentations for what looked like ceremonial spear heads – that’s my best guess. Something fancy or other that broke or was no longer loved. But it worked great for this purpose.

The holes in the polystyrene (not anything good, just from packing material) were cut with a fret saw. The grooves on the wood I cut with a mitre saw because I didn’t yet have a table saw at that time. The dado on the big piece of wood was also made with the mitre saw (mine has a depth adjustment).

Everything except the big wood pieces is held together with hot glue. I was originally planning to make this modular so that I can remove and insert pieces as time goes by. I had to give up on that idea – it was too complicated, but I figured hot glue will be good enough anyway.

I just finished this socket organizer but it seems like it’s going to hold. The flimsiest part is the polystyrene, but I wasn’t going to spend $20 on a sheet of better quality stuff from an arts supplier, and I didn’t have enough drill bits to match every socket size in wood. I need a bit of friction so they don’t fall out.

Looks great, works great. Weighs a lot but it’s a good solid box that I don’t intend to bang around. Now I just have to figure out a good, portable way to organize my wrench set :)

by Andrew Smith at September 25, 2016 05:43 PM

September 24, 2016

Eric Brauer

First Post!

Hello, this is my first post for this work blog. I’ve been at CDOT for nine months, but only now am I attempting that feat that some call: ‘blogging on a consistent basis.’ Many have tried to reach that summit, many have failed. (Ha!)

In the coming weeks, I hope to have some interesting updates on some of our work at CDOT. I’ll confess to sometimes being at a loss in what to write about/talk about when pressed for content. The fantastic thing about working at CDOT is that I’m constantly encountering new concepts and technologies. Yet I’m all too aware that for many among the prospective audience, these concepts are familiar. I’ll try not to be tedious..!

As for my role at CDOT: I’m coming from a computer engineering program, which basically means that I end up being the go-to guy whenever a computer is being asked to play nice with external bursts of electrical current. We always hope that these are useful bursts of electrical current. Hopefully they are coming from a sensor or a switch, or going to a light or an external device.

I expect to be back in a week’s time with something interesting to discuss.

by ebraublog at September 24, 2016 02:50 PM

September 23, 2016

Matt Welke

MongoDB kinda Dynamite

Well, where to start… This was a busy few days!

To start with, I should apologize for the previous blog post bashing DynamoDB before Amazon gets sad. It turns out my team mate and I were completely wrong about its abilities. We jumped to the conclusion that it could not support queries without specifying the primary key. It turns out it can. And it does this through the use of indexes that you manually specify (which you can do after the table has been created too). These indexes aren’t quite like indexes in relational databases though. They’re hashes. And given DynamoDB’s price competitiveness, we’re pretty happy looking at it as an option. Tl;dr we can use DynamoDB basically just as feasibly as any other NoSQL database. For the non-tl;dr, see my team mate’s blog post where he gets into the gritty details about DynamoDB and hash indexes.

Now on to the rest! We’re still not ready to start building because we’re still stuck at the stage where we decide on technology. Technologically, DynamoDB is good enough for us. However, I don’t work at the Seneca College Centre for Development of Proprietary Technology. We breathe open source. If we can find an open source alternative to DynamoDB that is comfortable with, we can avoid coupling them too tightly to a proprietary technology. Freedom is nice.

So open source is good, NoSQL is necessary. All aboard the hype train, next stop MongoDB? I’ve known about MongoDB for years, and to my knowledge, it’s past its hyped up evangelized stage and if it’s still around, it must be good. I researched it. Turns out it’s massively scale-able, provided you have the metal for it, or if you want to go on the cloud, the database’s creator even offers MongoDB Atlas, a Database-as-a-Service (DBaas). Atlas sounds like what we need, but it better be scaleable. That’s the problem we keep running into with these cloud services. They’re convenient but limited. The main reason we need NoSQL at this point is because all of the relational database services on Amazon Web Services have a 6 TB limit, and we know we’ll need much more space than that if wants to run our creation for years to come. From what I gathered reading Atlas’s whitepaper and website pricing info, we should be able to get at least 10 TB from their cloud service. Better… But I sent them an email requesting more information and guidance before we take it too seriously.

Now here’s where the drama starts again. I researched how accurate MongoDB was. That is, is it just as strong as a traditional relational database? Is it ACID-compliant? Will it explode?



Our database for the project, maybe.

Yes. MongoDB’s criticism after its initial hype was warranted after all. It is absolutely garbage for relational data, *if* you need that data to be accurate. It isn’t ACID-compliant on the transactional level, so if you have a power outage, say goodbye to the usefulness of your data. Now for our project, this may be okay, because we don’t need the data to be super accurate. If things get disjointed after being denormalized (as anything must be to fit into a NoSQL database), it just slightly reduces the amount of useful analytics data we would have to work with. And having 99.99% of the analytics data we mined available to us is still completely fine. However, I would definitely not use MongoDB in the real world for anything involving important information, especially e-commerce etc. If this paragraph went over your head or bored you, you’d probably enjoy reading the use case blog post I read to discover this. In it, American developer Sarah Mei describes how using MongoDB during the launch of Diaspora almost destroyed the project, and why they ultimately had to retreat back to relational databases.

So what does this mean? We’re closer, but we need to triple check the safety of using MongoDB for this project (and all NoSQL databases like DynamoDB for that matter!) before finally getting started building.


by Matt at September 23, 2016 02:37 AM

September 21, 2016

Henrique Coelho

Getting acquainted with DynamoDB

In my previous post about dynamodb I explained some limitations and quirks of DynamoDB, but I feel I focused too much on the negative sides, so this time I will focus on why it is actually a good database and how to avoid its limitations.

DynamoDB is not designed to be as flexible as common SQL databases when it comes to making joins, selecting anything you want, and creating arbitrary indexes: it is designed to handle big data (hundreds of gigabytes or terabytes), so it is natural that operations like “select all the movies from 1993, 1995 and 1998” would be discouraged – the operation would just be too costly. You can still do them, but it would involve scanning the whole database and filtering the results. With this in mind, DynamoDB appears to be useful only if you are working with big data; if not, you’ll probably be better off with a more conventional database.

So, what is the deal with queries and secondary indexes, exactly (I mentioned them in my previous post)? To explain this, it is good to understand how indexes work in DynamoDB; this way we can understand why they are so important.

Suppose we have this table, where id is a primary key:

id (PK)  title                      year  category
1        The Godfather              1972  1
2        GoldenEye                  1995  1
3        Pirates of Silicon Valley  1999  2
4        The Imitation Game         2014  2

In this case, we could search for “movies whose id is 3”, but not for movies whose id is less than 3, more than 3, different from 3, or between 1 and 3 – this is because the primary key must always be a hash. Although it is a number, the way this ID gets indexed (hashed rather than sorted) makes it impossible to search by criteria that require sorting; it can only be matched against an exact value.

Now, I already explained that in order to make queries, we always need to use the primary key. This is true, but not entirely: you can create “secondary primary keys” (global secondary indexes) and search based on them – and these secondary indexes do not have to be unique. I will explain what “local secondary indexes” are later; for now I’ll focus on global indexes. We could make a global secondary index on the category of the movie:

id (PK)  title                      year  category (GSIH)
1        The Godfather              1972  1
2        GoldenEye                  1995  1
3        Pirates of Silicon Valley  1999  2
4        The Imitation Game         2014  2

Where GSIH = Global secondary index, hash. Indexes need a name, so I will call this one “CategoryIndex”.

Now that we have a secondary index, we can use it to make queries:

TableName : "Movies",
IndexName: "CategoryIndex",
ProjectionExpression:"id, title, year",
KeyConditionExpression: "#cat = :v",
ExpressionAttributeNames: { "#cat": "category" },
ExpressionAttributeValues: { ":v": 2 }

This will get us the movies Pirates of Silicon Valley and The Imitation Game (the ones in category 2). The attribute “category”, however, is still a hash, and this means we can only search it with exact values.
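
For completeness, here is a hedged sketch of how the parameter object above could actually be sent using the AWS SDK for JavaScript’s DocumentClient (the region is an assumption, and “year” is aliased in the projection because it is a DynamoDB reserved word):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

docClient.query({
  TableName: 'Movies',
  IndexName: 'CategoryIndex',
  ProjectionExpression: 'id, title, #ye', // "year" is reserved, so alias it
  KeyConditionExpression: '#cat = :v',
  ExpressionAttributeNames: { '#cat': 'category', '#ye': 'year' },
  ExpressionAttributeValues: { ':v': 2 }
}, (err, data) => {
  if (err) console.error('Query failed:', err);
  else console.log(data.Items); // the movies in category 2
});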

Not very intuitively, indexes can actually have two fields, the second one being optional: a hash (in the examples I showed, id and category) and a range. Ranges are stored sorted, meaning that we can perform searches with operators such as larger than, smaller than, between, etc. – but you still need to use the hash in the query. For instance, if we wanted to get the movies from category 2 released from 1995 to 2005, we could turn the attribute year into a range belonging to the index CategoryIndex:

id (PK)  title                      year (GSIR)  category (GSIH)
1        The Godfather              1972         1
2        GoldenEye                  1995         1
3        Pirates of Silicon Valley  1999         2
4        The Imitation Game         2014         2

Where GSIH = Global secondary index hash, and GSIR = Global secondary index range.

TableName : "Movies",
IndexName: "CategoryIndex",
ProjectionExpression:"id, title, year",
KeyConditionExpression: "#cat = :v and #ye between :y and :z",
ExpressionAttributeNames: { "#cat": "category", "#ye": "year" },
ExpressionAttributeValues: { ":v": 2, ":y": 1995, ":z": 2005 }

This would give us the movie Pirates of Silicon Valley. Global secondary indexes can be created/deleted whenever you want: you can have up to 5 of them per table.

Local secondary indexes are almost the same; the difference is that instead of you choosing a hash and an optional range, the primary key is the hash, meaning it will have to appear in the query. They are also used to partition your table, meaning that they cannot be changed after the table is created.

But after all, why do we still need to divide our data into smaller categories to search? Well, because if you are working with big data, you should divide your data into smaller pieces somehow, otherwise it will just be too hard to search. How can you divide it? Just find something in common that separates the data nicely into homogeneous groups.

Remember my other example, when I only wanted to search movies from 1992 to 1999, but without scanning the whole table? How could we do this? Let’s think a bit about this example: why would you query this? If you are querying this because your website offers a list of “all movies released from the year X to Y in the Z decade”, you could make use of this common ground, create an attribute for it, and index it like this (I’ll call it DecadeIndex):

id (PK)  title                      decade (GSIH)  year (GSIR)  category
1        The Godfather              70             1972         1
2        GoldenEye                  90             1995         1
3        Pirates of Silicon Valley  90             1999         2
4        The Imitation Game         00             2014         2

Now look: we have a hash index (decade) that covers all the possible results that we want, and we also have a range field (year). We can search it with:

TableName : "Movies",
IndexName: "DecadeIndex",
ProjectionExpression:"id, title, year",
KeyConditionExpression: "#dec = :v and #ye between :y and :z",
ExpressionAttributeNames: { "#dec": "decade", "#ye": "year" },
ExpressionAttributeValues: { ":v": 90, ":y": 1992, ":z": 1999 }

If I didn’t type anything wrong, we would get the movies GoldenEye and Pirates of Silicon Valley.

If you are like me, you are probably thinking: “Ok, but what if I wanted movies from 1992 to 2005? This will span more than one decade”. This is also simple to solve: if this is a possibility, you could have another index with the same functionality, or simply query once per decade – it seems costly, but since the entries are indexed, the operation will still be far faster than doing a scan (and probably faster than doing the same operation in an SQL database).
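
Here is a sketch of the “query once per decade” idea, reusing the DocumentClient from the earlier snippet and the table layout above (the helper name and the decade encoding are assumptions):

// Get movies between two years by querying DecadeIndex once per decade
// involved and merging the results.
async function moviesBetween(startYear, endYear) {
  const decades = [];
  for (let d = Math.floor(startYear / 10) * 10; d <= endYear; d += 10) {
    decades.push(d % 100); // e.g. 1990 -> 90, 2000 -> 0
  }

  const pages = await Promise.all(decades.map(decade =>
    docClient.query({
      TableName: 'Movies',
      IndexName: 'DecadeIndex',
      ProjectionExpression: 'id, title, #ye',
      KeyConditionExpression: '#dec = :d and #ye between :start and :end',
      ExpressionAttributeNames: { '#dec': 'decade', '#ye': 'year' },
      ExpressionAttributeValues: { ':d': decade, ':start': startYear, ':end': endYear }
    }).promise()
  ));

  // Each query only returns years inside [startYear, endYear], so a simple
  // concatenation of the pages is enough.
  return pages.reduce((all, page) => all.concat(page.Items), []);
}

// moviesBetween(1992, 2005) would query decades 90 and 0 in parallel.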

In conclusion, DynamoDB seems to be extremely efficient for operations in tables with enormous amounts of data, but it comes with a price: you must plan the structure of your database well and create indexes wisely, having in mind what searches you will be doing.

by henrique coelho at September 21, 2016 09:55 PM

Why DynamoDB is awesome and why it is not

I made a new post about DynamoDB and how to solve its limitations: Getting acquainted with DynamoDB


We still don’t know for sure which technologies we are going to be using for our API, including the technologies for the databases; the two main options we are focusing on right now are DynamoDB and PostgreSQL. Most developers are already familiar with PostgreSQL: it is an open-source, free SQL database, similar to MySQL; DynamoDB, however, is a NoSQL, proprietary database that belongs to Amazon.

We did our research and tried both; these are our impressions and the main differences:

                     DynamoDB                  PostgreSQL
Structure            NoSQL                     SQL
Documentation        Misleading and confusing  Good
Price on AWS [1]     Cheap and flexible        Fair, but not flexible
Syntax               Has its own syntax        SQL
Easiness to use [2]  Fair                      Very easy
Scalability on AWS   Excellent                 Good
Performance on AWS   Excellent                 Good
  • [1] AWS = Amazon Web Services.
  • [2] May be misleading, since we come with an SQL perspective, so there is not much to learn in order to use PostgreSQL. In fairness, DynamoDB does a good job on being intuitive.

It seems that DynamoDB is a fair competitor; however, it may have a dealbreaker: the way it handles indexes and queries. To explain this, let’s suppose we have the following table called Movies (NoSQL doesn’t have tables, I know, can you stop being pedantic pls? Besides, “table” is actually the correct term for this structure in DynamoDB):

id (PK)  title                      year  category
1        The Godfather              1972  1
2        GoldenEye                  1995  1
3        Pirates of Silicon Valley  1999  1
4        The Imitation Game         2014  1

Just a disclaimer before I start explaining the differences: the documentation for DynamoDB is very obscure, so it is possible that I am missing some pieces of information or simply misunderstood them. So, in DynamoDB, your primary key must be a hash field – it is unique, but cannot be searched as a range (you can’t search for “id between 1 and 5”, for instance). You can, however, specify another column to be a range (year could be a range). For this example, there is only one index: id.

In order to select all the data from the table, this is how we could do in SQL:

SELECT id, title, year FROM Movies;

This is how we could do it with DynamoDB (it may have an error somewhere, I can’t test it right now, just bear with me, ok?).

TableName: "Movies",
ProjectionExpression: "id, title, year"

Nothing incredible, right? ProjectionExpression lists the fields we are looking for. This kind of operation is called a scan – it goes through the whole table and returns all the results. So how would we search for a specific ID, say, ID 3? In SQL:

SELECT id, title, year FROM Movies WHERE id=3;

In DynamoDB:

TableName : "Movies",
ProjectionExpression:"#k, title, year",
KeyConditionExpression: "#k = :v",
ExpressionAttributeNames: { "#k": "id" },
ExpressionAttributeValues: { ":v": 3 }

Weird, right? But the idea is actually simple: #k and :v are placeholders – #k is ‘id’ and :v is ‘3’, just like variables and their values. KeyConditionExpression is the condition, ExpressionAttributeNames is the “map” for the keys, and ExpressionAttributeValues is the “map” for the values.

So far so good, but here is the catch: when you create a table in DynamoDB, you have to specify a primary key, which is also the index, and you cannot make a query that doesn’t use that key in the condition. What I mean by this is that if, say, you want to “find the movies made in the 90s” by putting the condition in the query… well, in principle, you can’t, simple as that – because you are not using the primary key in the condition. There are, however, workarounds: doing a scan and filtering it, or using secondary indexes.

The first alternative is doing a scan in the database (getting all the data) and then filtering it like this:

TableName: "Movies",
ProjectionExpression: "id, title, year",
FilterExpression: "#yr between :start_yr and :end_yr",
ExpressionAttributeNames: { "#yr": "year", },
ExpressionAttributeValues: { ":start_yr": 1990, ":end_yr": 1999 }

Seems simple, but it has a big drawback: the scan will actually read ALL the data in the table and only then filter it – this is often unacceptable if you have large quantities of data.
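
To illustrate why, here is a sketch of the same scan-and-filter with the AWS SDK for JavaScript’s DocumentClient (the client setup is an assumption, since the post only shows the parameter objects). Scan reads every item in the table, page by page, and the filter only applies after the read, so you pay for the full scan even though few items come back:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function moviesOfTheNineties() {
  const items = [];
  let ExclusiveStartKey; // undefined on the first page

  do {
    const page = await docClient.scan({
      TableName: 'Movies',
      ProjectionExpression: 'id, title, #yr', // "year" is a reserved word, so alias it
      FilterExpression: '#yr between :start_yr and :end_yr',
      ExpressionAttributeNames: { '#yr': 'year' },
      ExpressionAttributeValues: { ':start_yr': 1990, ':end_yr': 1999 },
      ExclusiveStartKey
    }).promise();

    items.push(...page.Items);
    ExclusiveStartKey = page.LastEvaluatedKey; // undefined once the table is exhausted
  } while (ExclusiveStartKey);

  return items; // read capacity was consumed for every item scanned, not just these
}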

The other alternative is using what they call “secondary indexes”, and this is where things get complicated: secondary indexes can be local or global. Local indexes can be queried, but are still dependent on the original hash key (the primary key); global indexes can be queried, HOWEVER, they must rely on another hash, one that is not the primary key. If we made a global secondary index for year that used the category as the hash, we could query for the “movies made between 1990 and 1999 which belong to category 1” (assuming that category is the hash and year is the range) like this:

TableName: "Movies",
IndexName: "CategoryYearIndex",
ProjectionExpression: "id, title, year",
KeyConditionExpression: "category = :cat and year between :ys and :ye",
ExpressionAttributeValues: { ":ys": 1990, ":ye": 1999, ":cat": 1 }

Which is reasonable, HOWEVER, global secondary indexes also have problems: you are still tied to a hash, and you have to pay to use them.

Alright… But that does not really answer the question: how can I use it to “select movies from 1990 to 1999” without using another hash? Well, as we understood from the documentation, the only way around this is scanning your whole table and filtering it. Not ideal. HOWEVER, local secondary indexes kind of solve this: I read in another blog post that if you do scans filtered on secondary indexes, it is still very performant and won’t be as costly as fetching all the data. HOWEVER, local secondary indexes can only be made at the moment the table is created: you cannot change or add them later, which is not exactly scalable.

It seems that DynamoDB is really powerful and easy to use if you want to make simple, fast queries to only retrieve values without too many conditions – it will really shine in these situations. But if your system requires more unusual queries and your indexes may change over time, I don’t think DynamoDB is a good choice; you can work around these limitations, but I feel like you will just be swimming against the current.

by henrique coelho at September 21, 2016 12:12 AM

September 20, 2016

Matt Welke

DynamoDB Not So Dynamite

Today we checked out the wonders of NoSQL, specifically with DynamoDB. As we get closer to finalizing our plan on what technology to use for the project, we wanted to investigate the database the client likes to use for their setup right now. That’s DynamoDB, a NoSQL database-as-a-service that’s apparently quick to set up, easy to use, and infinitely scalable. Its scalability comes from the fact that Amazon abstracts away all the details of maintaining a robust database to store tons of data. We would simply throw data into it.

The problem is… throwing data in and pulling data out seems to be all DynamoDB is capable of. It excels at a few things. Its speed and scalability are amazing. But due to its nature, you cannot query a table based on attributes alone without iterating over every row in the table and checking it. You need the primary key to do a query. This is useless for data analysis. We need something where we can ask it “what are the articles published between such and such date that fit into such and such category” etc. We need something more capable than DynamoDB for the kind of work we’re going to be doing. Luckily, Amazon does offer another automatically-managed database service… RDS. We need to worry about instances, how powerful they should be, and it’s billed by the hour, not by the millisecond, but it’s able to do what we need. It allows databases such as PostgreSQL to run on it, which we strongly believe at this point we will need.

The rest of the week will be used to prepare our plan to submit to the client on Thursday and hopefully we’ll get to start building at that point! I’ve been anxious to start building. So far, it feels like we haven’t done anything measurable or useful. But I suppose when it comes to programming, it’s best to measure thrice and cut once.

by Matt at September 20, 2016 10:54 PM

Matthew Marangoni

Localization with React - Supporting a Multitude of Languages

In the HTML5 client, we've been working to implement an open-sourced method for translating text throughout the application (known as localization). What seems to be the most popular and compatible method for us is including the yahoo/react-intl package to support localization. One of the issues with this package is that it is not optimal for BigBlueButton, in the sense that each language must be statically imported and bundled with the application every time it loads. In most cases this is not an issue, since the majority of websites and web apps only support a handful of languages. For the BigBlueButton project, however, anywhere from 50-100 languages will (eventually) need to be supported. This means every user would potentially be loading all 100 languages every single time they want to use the HTML5 client, when in reality they will only need two or three languages at most.

Right now I'm trying to determine if there is a configuration file in this package that we can use to do a few things:
  1. Detect the browser language and region (e.g. pt-BR for Portuguese as spoken in Brazil)
  2. If the language and region combination is not found, try just the language instead (e.g. pt). Regions are optional, not required.
  3. Set the application's default language to en-US.
After looking over the issues and pull requests on the react-intl GitHub page, it doesn't appear that there is anything already in progress that will suit our needs, so we may be forced to come up with our own custom solution that works with this package, or find a different, more suitable one.
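
To illustrate the fallback we are after, here is a hypothetical sketch of the detection logic (this is not react-intl's API; the locale list and endpoint are made up):

// 1. try the exact browser tag (e.g. "pt-BR"), 2. fall back to the bare
// language ("pt"), 3. fall back to the application default ("en-US").
function pickLocale(supportedLocales, defaultLocale = 'en-US') {
  const requested = navigator.language || defaultLocale; // e.g. "pt-BR"
  if (supportedLocales.includes(requested)) return requested;

  const languageOnly = requested.split('-')[0];           // e.g. "pt"
  if (supportedLocales.includes(languageOnly)) return languageOnly;

  return defaultLocale;
}

// Usage: only the negotiated locale's messages would then be fetched,
// instead of bundling all ~100 translations up front.
const locale = pickLocale(['en-US', 'pt', 'pt-BR', 'fr']);
// fetch(`/locales/${locale}.json`).then(...); // hypothetical endpoint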

Another small issue that exists with this package is that every FormattedMessage requires a default message attribute which is identical to the message found in the en.json file. One problem that arises from this is that when a message needs to be altered, it has to be changed in two places for every piece of text that requires translation. The other issue is that when a default message is left out, screen readers will instead read the id attribute of the formatted message, which has no value or meaning to the user. Currently these are the two major issues we've encountered with the react-intl package, and we are in the process of seeking a solution or an alternative.

by Matthew Marangoni ( at September 20, 2016 08:43 PM

September 19, 2016

Jaeeun Cho

Localization and Internationalization.

As the world gets more connected and the number of online users increases rapidly, websites and applications need to be appropriate for users in different nations, especially for international companies, because each country uses different languages and different formats for dates, times, currencies and more. Unfortunately, it is easy to overlook users from different countries. To address these issues, we need to understand 'Localization' and 'Internationalization'.


Localization refers to the adaptation of an existing website to the local language and culture of the place where it will be used. It is sometimes written as 'l10n', because there are 10 letters between the 'l' and the 'n'. A successfully localized website is one that feels natural to its users.


Internationalization supports the languages and other formats of people from different parts of the world and different cultures. It is often written as 'i18n', because there are eighteen letters between the 'i' and the 'n'. In websites and applications, languages and formats are typically localized according to each user's locale.

Internationalization might involve the below:

  1. Removing barriers in design and development that would block localization or international deployment. This entails enabling the use of Unicode (UTF-8), or the proper handling of character encodings.
  2. Adding markup to support bidirectional text or multiple languages, or adding CSS support for vertical text.
  3. Code should support local, regional, language, or culturally related preferences. For example, each country has different formats for dates, times, calendars and numbers, and even for names and addresses.


A "locale" is a collection of language-related user preference information represented as a list of values. The locales argument must be either a string in BCP 47 format or an array of language tags. If no locales argument is given, the default locale is used. The language tags defined by BCP 47, which encode languages, scripts, countries and more, can be found in the IANA Language Subtag Registry.
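
For example, the built-in ECMAScript Internationalization API (Intl) takes exactly this kind of locales argument. A small sketch (the sample values and expected output are illustrative, assuming a standards-compliant environment):

const date = new Date(Date.UTC(2016, 8, 19)); // September 19, 2016

// The first argument is a BCP 47 language tag (or an array of them).
console.log(new Intl.DateTimeFormat('pt-BR', { timeZone: 'UTC' }).format(date)); // "19/09/2016"
console.log(new Intl.DateTimeFormat('en-US', { timeZone: 'UTC' }).format(date)); // "9/19/2016"

// Number and currency formats also follow the locale.
console.log(new Intl.NumberFormat('de-DE', { style: 'currency', currency: 'EUR' })
  .format(1234.5)); // "1.234,50 €"

// With no locales argument, the environment's default locale is used.
console.log(new Intl.NumberFormat().format(1234.5));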

by Jaeeun(Anna) Cho ( at September 19, 2016 11:35 PM

Matt Welke

Lambda vs. Docker

Today was heavy on research… We want to make sure we choose the right technology for our back end. Our client wants to use Amazon Lambda, which is the most popular of the new “serverless” architecture services, and we at first wanted to just make an application and push it to a cloud service. The solution is probably going to be a compromise in between.

We wanted to just create an application because that’s what we’re comfortable with. We’re students. We tinker with new technologies and quickly make prototypes and host them. We don’t necessarily care about supporting them. In order to be effective developers in the real world, we need to be willing to see things from the client’s perspective. They want to use “serverless” architectures. This is a bleeding edge new type of cloud service where you don’t have to think of the server you’re going to run your code on because they provision the servers for you. That’s what they mean by “serverless”. The server is there, it’s just abstract now. (I think of it like encapsulation with object oriented programming, the low level code actually doing things is there, it’s just abstracted by my methods). With the serverless architecture, you don’t even have to think of the application you’re going to create, just the code inside it, because it will actually take raw code and just run it on applications that it creates, on servers that it provisions. On paper, it sounds wonderful. But we are definitely running into some pain points as we try to adapt to this new style.

Most of the problems with using serverless architecture that we’ve encountered seem to come with a general unfamiliarity with the Amazon Web Services online interface. But a lot of the issues probably also stem from the relative immaturity of the serverless architecture compared to more traditional methods of doing web development. This sentiment was echoed by a blogger I encountered who described Amazon Lambda as being “not ready” just seven months ago. From when we tested it out, we encountered tons of digital red tape so to speak. We couldn’t just pass parameters to code to do stuff with them. We had to write middleware and config files just to get the parameters into our code.

An alternative we’re considering is using Docker. It has the advantages of an actual application (we get our parameters! yay!) but it also has that abstract slightly serverless style that the client is going for. It’s supposedly independent of the server you’re going to run your Docker “images” on. The client should be able to take a Docker image we produced and easily get it running on any Docker-supporting cloud provider they wish. And that includes Amazon Web Services, where they currently have everything running. Docker itself has a learning curve that my team mate and I will need to get through to go this route, but I’m confident we can get comfortable with it. Today, we were indeed able to get something running locally and pushed to Amazon, so this so far looks much more feasible to work with than Lambda.

One thing’s for sure, in the past few weeks I’ve learned that there’s a lot more to web dev in the real world than I imagined!

by Matt at September 19, 2016 09:35 PM

Laily Ajellu

Audio and Video in your App - Get "A" certified

In last week's post (Input Forms Accessibility - Get 'A' certified), we started to analyze what our web apps need to reach level A of accessibility certification.

Now we'll analyze how to make the audio and video in your web app accessible.

Why make Time Based Media more Accessible

Alternatives allow your content to be easily duplicated or re-purposed, which can make your application more popular and available to a wider audience.

For time-based media (media where information unfolds over time – audio and video), we provide alternatives for the following audiences:
  • Visually disabled users
  • Users who have less knowledge of the topic
  • Hearing disabled users
  • Users with time limitations

How to make Time Based Media more Accessible

  1. Link to a text transcript containing the content below - right beside the link to a video/audio page

  2. Link to an audio MP3 - right beside the link to a video page. The audio MP3 should be a mix of the audio extracted from the video and the content below

  3. Add captions to the video

Content of transcripts, captions and supplementary audio

  1. Identify the name of the person who is speaking
  2. Identify if the speaker is a person from the audience or the main speaker
  3. Identify if the statement is a question or a part of the main content
  4. Mention if there's applause
  5. Mention if there's laughter
  6. Mention when music is being played
    • Identify non-vocal music:
      • Title
      • Movement
      • Composer
      • Tempo
      • Mood
    • Transcribe the lyrics of vocal music
  7. Note other significant sounds that are part of the recording
  8. Description of what’s being show in the video
    • Actions
    • Characters
    • Scene changes
    • On-screen text

    Note: If there is already a text transcript of a video/audio, captions are not required

Special Cases

Interaction with a Video

If you have to interact with the video (Click here to continue/Choose an answer), you must provide a keyboard accessible way to interact, and include it in the transcript or text summary of the video

Video tutorials (without Audio)

  • Make sure any text in the video is placed in a transcript
  • If there is no text in the video, add a brief summary of what is shown in the video

by Laily Ajellu ( at September 19, 2016 08:37 PM

September 17, 2016

Andrew Smith

What does a POSIX signal handler and an SQL transaction have in common?

Since I expect this to be a long post, I’ll give you the answer at the top: both are in-effect critical sections, you should avoid performing unnecessary operations there at all cost. Or else it will cost you and other people days, months, years of wasted time.

POSIX Signals

I’ll start with a story about one of my successful open source projects: Asunder. I took over the project from a nice but busy guy at version 0.1. Most of it was written (pretty well too), and I’ve been fixing things here and there, improving it one bit at a time.

One of the things I added was logging, to make sure that I can fix problems experienced by others in their own environments. It was very useful (I now very rarely get bug reports) but I made one mistake: I added an fprintf in a SIGCHLD signal handler. It took me literally years to figure out that was a terrible mistake. For at least two years I kept getting bug reports about random, unrelated freezes and the log never provided any answers. This is what was happening:

  1. The app was running, starting sub-processes and waiting for them to complete.
  2. When a sub-process completed – it sent a SIGCHLD to the parent. That signal was handled in a signal handler, which interrupted whatever code was currently running in the parent.
  3. The above is expected and rarely a problem, except it turned out that the printf function takes some kind of global lock while it’s doing its work. So when:
  4. A signal handler itself was interrupted by another signal, the printf in the new signal handler was waiting for the old printf to complete, which would never happen because the original printf was interrupted by the new one.

When I figured that out I cried a little in my mind. But I fixed it and took it as a good addition to my industry experience.


Fast forward a few years where I maintain a MediaWiki instance for our school. I migrated it to a new server, updated the PHP to the newest version, updated the database, etc. All worked well.

But then the semester started and new students were trying to register to get accounts on the wiki. And disaster struck. It turned out that new users could not register. To make things worse – existing users couldn’t get their passwords changed or reset. Right at the beginning of the semester. When I did the migration I tested everything, but had not considered that operations on the user table were in any way special. Turns out they’re not, except they are. Here’s what was happening:

  1. The first person since the servers were rebooted tried to register for an account. The web interface would just hang there, with the spinning circle, until the end of time. No timeouts or error messages.
  2. MediaWiki started an SQL transaction on the MySQL backend. To record that a user is being created.
  3. Before committing the said SQL transaction – MediaWiki would attempt to send an email to the new user via some PEAR library, via the server configured in $wgSMTP.
  4. $wgSMTP was not configured correctly, and the step above never completed.
  5. Which means the SQL transaction was never committed.
  6. Which means the users table remained locked, permanently.

I spent so much time (including three overnighters) figuring this out! I ended up nearly desperate, asking for help on the MediaWiki-l mailing list. One guy (Brian Wolff) replied saying he didn’t know what the problem was, but he offered what turned out to be the straw I needed to figure it out myself: enabling the MediaWiki debug log. I had a bunch of logging enabled already, but this is the one that showed me the deadlock.

Before that, I would stare at MySQL’s “SHOW FULL PROCESSLIST” and wonder how it’s possible that even though no queries were being executed – new ones would result in a timeout like this:

MySQL [cdotwiki_db]> SELECT user_id FROM `mw_user` WHERE user_name = 
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

I would look at the output of “SHOW ENGINE INNODB STATUS;” and wonder why there are multiple transactions there that have been sitting for hours but not causing a deadlock, even though it looked like a deadlock. I spent hours trying to decipher memory dumps like this:

---TRANSACTION 4748, ACTIVE 62 sec
8 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 8
MySQL thread id 1345, OS thread handle 140329361524480, query id 20771 
web-cdot1.sparc cdotwiki_usr cleaning up
Trx read view will not see trx with id >= 4742, sees < 4742
TABLE LOCK table `cdotwiki_db`.`mw_user` trx id 4748 lock mode IS
RECORD LOCKS space id 70 page no 157 n bits 624 index user_name of table 
`cdotwiki_db`.`mw_user` trx id 4748 lock mode S locks gap before rec
Record lock, heap no 244 PHYSICAL RECORD: n_fields 2; compact format; 
info bits 0
  0: len 6; hex 41736f583139; asc AsoX19;;
  1: len 4; hex 000001ac; asc     ;;

Record lock, heap no 558 PHYSICAL RECORD: n_fields 2; compact format; 
info bits 0
  0: len 8; hex 41736d6974683230; asc Asmith20;;
  1: len 4; hex 000036c2; asc   6 ;;

TABLE LOCK table `cdotwiki_db`.`mw_user` trx id 4748 lock mode IX
RECORD LOCKS space id 70 page no 436 n bits 112 index PRIMARY of table 
`cdotwiki_db`.`mw_user` trx id 4748 lock_mode X locks rec but not gap
Record lock, heap no 42 PHYSICAL RECORD: n_fields 17; compact format; 
info bits 0
  0: len 4; hex 000036c2; asc   6 ;;
  1: len 6; hex 00000000128c; asc       ;;
  2: len 7; hex 21000001362118; asc !   6! ;;
  3: len 8; hex 41736d6974683230; asc Asmith20;;
  4: len 0; hex ; asc ;;
  5: len 30; hex 
3a70626b6466323a7368613235363a31303030303a3132383a2f66545962; asc 
:pbkdf2:sha256:10000:128:/fTYb; (total 222 bytes);
  6: len 0; hex ; asc ;;
  7: len 21; hex 61736d6974683230406c6974746c657376722e6361; asc 
asmith20 at;;
  8: len 14; hex 3230313630393135303530303137; asc 20160915050017;;
  9: len 30; hex 
623061346535323762613365336462656133323035633666343564663163; asc 
b0a4e527ba3e3dbea3205c6f45df1c; (total 32 bytes);
  10: SQL NULL;
  11: len 30; hex 
396561386335613365663263623666353062303736646165393934393331; asc 
9ea8c5a3ef2cb6f50b076dae994931; (total 32 bytes);
  12: len 14; hex 3230313630393232303530303130; asc 20160922050010;;
  13: len 14; hex 3230313630393135303530303130; asc 20160915050010;;
  14: SQL NULL;
  15: len 4; hex 80000000; asc     ;;
  16: SQL NULL;

TABLE LOCK table `cdotwiki_db`.`mw_watchlist` trx id 4748 lock mode IX
RECORD LOCKS space id 70 page no 157 n bits 624 index user_name of table 
`cdotwiki_db`.`mw_user` trx id 4748 lock_mode X locks rec but not gap
Record lock, heap no 558 PHYSICAL RECORD: n_fields 2; compact format; 
info bits 0
  0: len 8; hex 41736d6974683230; asc Asmith20;;
  1: len 4; hex 000036c2; asc   6 ;;

TABLE LOCK table `cdotwiki_db`.`mw_logging` trx id 4748 lock mode IX
TABLE LOCK table `cdotwiki_db`.`mw_recentchanges` trx id 4748 lock mode IX

and getting no closer to figuring it out. In the end – how I found the problem was a single log line on a debug instance of the server. What an adventure!


The bug in Asunder happened because I ignored the warnings in the glibc manual that told me to keep unnecessary code out of signal handlers. I did not at the time know (or even consider) that printf could lock some kind of global structure, which could eventually cause a deadlock.

The bug in MediaWiki happened for the same reason, except they ignored the MySQL manual: “keep transactions that insert or update data small enough that they do not stay open for long periods of time”. I’m sure their code is a lot more complicated than mine, but at the end of the day they are sending an email in the middle of an SQL transaction, which is just a disaster waiting to happen. There’s no way I’m the only one who has run into this problem.
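To be clear, this is not MediaWiki’s actual code (theirs is PHP and far more involved); it is just a made-up Node.js sketch with the mysql driver to show the shape of the problem: slow network I/O running while the transaction still holds its row locks.

var mysql = require('mysql');
var connection = mysql.createConnection({ host: 'localhost', user: 'wiki', password: 'secret', database: 'wikidb' });

// Stand-in for a slow SMTP round-trip or template rendering
function sendWelcomeEmail(address, callback) {
  setTimeout(callback, 5000);
}

connection.beginTransaction(function (err) {
  if (err) throw err;
  connection.query('UPDATE user SET email = ? WHERE id = ?', ['a@example.com', 42], function (err) {
    if (err) throw err;
    // The row locks taken by the UPDATE are held for the whole email round-trip,
    // so any other session touching this row (or the index gaps around it)
    // piles up behind the transaction until it finally commits.
    sendWelcomeEmail('a@example.com', function () {
      connection.commit(function (err) {
        if (err) throw err;
        connection.end();
      });
    });
  });
});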

I’ll report the bug and we’ll see if they take it as seriously as I took my Asunder bug.

by Andrew Smith at September 17, 2016 06:29 PM

September 16, 2016

Matt Welke

Checking Out Elasticsearch

One of the technologies the client said they currently use is Elasticsearch, so we set out to learn what it was and how to integrate it into what we are creating. At first I thought it was just some Amazon proprietary technology. Not that this means it’s bad, just that I get more excited about open source technologies, because if I learn how to use them once, I can use them freely in anything else I create, be it open source or other projects at work.

I was pleasantly surprised to learn that it was actually a pretty stable, useful, open source framework. It’s basically a program you run that abstracts away a lot of the complex data analysis and machine-learning-type work, so that you can run queries against it much like you’d run queries against a database: “show me articles that are similar to this one”, and so on. It performs these complex queries and filters against its own data store, so some people just use the Elasticsearch data store as their main database; others, who already have something started and want to add Elasticsearch to it, have to figure out how to sync their existing database with it. That is our situation: the client has a primary data store in their CMS containing their articles, they already sync it with Elasticsearch, and we’ll need to make sure we integrate our information that way too if that’s applicable.
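To get a feel for it, here is roughly what such a “similar articles” query looks like against the Elasticsearch REST API, sent with nothing but Node’s built-in http module. The index, type, field names and document id are all made up, and the query shape assumes Elasticsearch 2.x or newer.

var http = require('http');

// more_like_this asks Elasticsearch for documents that resemble the given one
var body = JSON.stringify({
  query: {
    more_like_this: {
      fields: ['title', 'body'],
      like: [{ _index: 'articles', _type: 'article', _id: '1' }],
      min_term_freq: 1,
      min_doc_freq: 1
    }
  }
});

var req = http.request({
  host: 'localhost',
  port: 9200,
  path: '/articles/_search',
  method: 'POST',
  headers: { 'Content-Type': 'application/json' }
}, function (res) {
  var chunks = [];
  res.on('data', function (c) { chunks.push(c); });
  res.on('end', function () { console.log(JSON.parse(chunks.join(''))); });
});

req.write(body);
req.end();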

I spent some time today learning about Elasticsearch as an open source framework, so that when it comes time to use Amazon Elasticsearch Service, Amazon’s hosted offering of it that the client already uses in production, I’ll understand it better. There is an official book available online for free, and I was able to find some nice YouTube videos which show you how to dive in. So far, my team mate and I imagine our system being a combination of querying the data we collect and the stats crunching Elasticsearch can do, to return really useful information for the client. I’ll be focusing a lot on continuing to learn and practice using Elasticsearch over the next little while.

by Matt at September 16, 2016 08:27 PM

Anderson Malagutti

iOS 10: Unlock iPhone without having to press the home button

Since the iOS 10 update, users are by default ‘forced’ to press the home button to unlock their iPhones; however, I’ve just found a way to change that and make the unlock process work the way it did on older versions of iOS, provided your iPhone has Touch ID.

It’s very simple.

Open Settings > General > Accessibility > Home Button.

Then you’ll have to check the option REST FINGER TO OPEN.



After that you’ll be able to unlock your iPhone just by resting your finger on the home button, as you probably used to do on iOS 9.🙂


by andersoncdot at September 16, 2016 03:49 PM

September 15, 2016

Henrique Coelho

Planning and technology limbo

In the last few days we spent a lot of time planning the system: iterating over the database schema, how to implement the APIs, how to implement the client modules, and how all these pieces fit together. This often means that one part will influence the other, until we finally find a setting that fits together and works.

I usually enjoy this part of the project; planning involves a lot of thinking and good strategy, like solving a puzzle, but it can be very stressful at times. What I don’t like about planning is that it takes time, and during this time you end up floating in limbo: you can’t make concrete plans because you don’t know if they will hold up in the long term. The technologies we are considering for the project right now are MySQL, AWS Lambda + API Gateway, and Amazon Elasticsearch Service.

The capabilities of PostgreSQL that I described in the previous post seem to be supported in MySQL 5.7, which makes it a suitable candidate for a database; however, we need to make sure it is capable of enduring the traffic. For the past few days, I’ve tried n times and failed n-1 times (for now) to create a suitable testing scenario for MySQL. The scenario is simple: set up a VM with MySQL, bombard it with hundreds of millions of rows (with numbers and JSON data), and see what happens; if it behaves as it should, we query it until it breaks. Seems simple, but the universe has been trying to thwart my plans (and succeeding) for the past few days:

  • 1st try: The internet was strangely slow the day I started the download of the VM. One hour later, it finished: the download was corrupted and I had to start over.
  • 2nd try: VM installed, but the version of MySQL was wrong and I had to update it; obviously, I broke the installation beyond repair and just rebuilt a new VM.
  • 3rd try: VM installed and MySQL updated. I also made a neat little script that inserts batches of 1,000 random records into the database (a sketch of what such a script might look like is below this list) and let it run for a while. The result: 55,000,000 rows inserted. “Great! Now I can start testing”, I messaged myself mentally. After some informal tests, it was time to go home and I still had to insert more records; we decided to let the script run overnight, but first we wondered, “does it stop when we lock the computer?”, and decided to try. Any sensible person would back up the database before doing this, but 55 million rows really take a while to download; besides, we are beyond sensible, so we locked it anyway: that’s how we corrupted the third VM.
  • 4th try: We quickly set up the database again (just made a new table) and left the script running overnight. During the commute, we were betting on the results: I bet 80% on the VM being corrupted, 15% on the script breaking somehow, 10% on someone turning the computer off, and 5% on it working (the fact that it sums to 110 does not matter; what matters is the general idea). The database was corrupted.
  • 5th try: New VM made. We left the script running for a few hours (reaching around 80 million rows) until the VM completely ran out of space; with a few adjustments, we increased the space and let it run a little more. Tomorrow we will run it again to insert more records.
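For reference, here is a rough sketch of the kind of insert script described in the 3rd try above. The table and column names are made up, and the real script may well have looked different.

var mysql = require('mysql');
var connection = mysql.createConnection({ host: 'localhost', user: 'bench', password: 'secret', database: 'benchdb' });

// One row: a number plus a small random JSON document, like the test data described above
function randomRow() {
  return [Math.floor(Math.random() * 1000000), JSON.stringify({ value: Math.random() })];
}

// Insert 1,000 rows in a single multi-row INSERT
function insertBatch(callback) {
  var rows = [];
  for (var i = 0; i < 1000; i++) { rows.push(randomRow()); }
  connection.query('INSERT INTO hits (num, doc) VALUES ?', [rows], callback);
}

// Keep inserting batches until the process is killed (or the VM runs out of space...)
(function loop() {
  insertBatch(function (err) {
    if (err) throw err;
    loop();
  });
})();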

So that was the MySQL saga so far. The other limbo we are floating in is the technology for the API: the client suggested that we use AWS Lambda + API Gateway, and maybe Amazon Elasticsearch Service. These services are really nice (I will probably post about them if we get to learn more about them), but Lambda + Gateway seems to be a very simplified “framework” for an API; I am afraid that in the future we will have to make it more robust and it will just not work. Although I would like to use them, I fear that the bureaucracy of AWS and its lack of intuitiveness will hurt us more than help.

by henrique coelho at September 15, 2016 11:01 PM

Matt Welke

Refining the Schema and Media Queries

Today we took another look at the schema for our back end API to iron out what it is we will be storing. With some insight from our team lead, who is experienced with web development, we identified more things we should store, but also ways to simplify the schema we had originally developed by storing less. For example, we originally planned on storing a “hit” when a visitor loaded a page and then coming back to that hit to add more info when the user did other things. We’ve now reduced it to a simpler model that acts more like a log: once a hit is written, it’s written. We don’t muck things up by revisiting its entry. I think the simpler we make the schema from the beginning, the fewer problems we’ll encounter later on.

I plan on investigating media queries thoroughly. They’re used to find out information about the client’s device and environment, and they can be very powerful. For example, they can tell whether a device has a low-resolution or high-resolution screen, which a site can use to decide whether to send the desktop or the mobile version of a page. They can also tell when the client is a mobile phone, so a modern phone with a high-resolution screen still wouldn’t get the desktop version, because that wouldn’t make sense to view on a phone. We can mine this for our analysis. It’s a wealth of information that can help us learn how the client’s visitors are viewing the site.
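As a rough sketch of what collecting this might look like in the browser (the particular features here are just examples, not a final list):

// Collect a few media-query answers on the client
var features = {
  highRes: window.matchMedia('(min-resolution: 2dppx)').matches,
  smallScreen: window.matchMedia('(max-width: 768px)').matches,
  portrait: window.matchMedia('(orientation: portrait)').matches
};

// These could then be POSTed to our logging route along with the rest of the hit data.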

by Matt at September 15, 2016 08:26 PM

September 14, 2016

Matt Welke

Data Mining Scenarios and MySQL

Today we continued to look into how to store the data we will be mining. We looked at our schema and decided it would be a good idea to abstract out our “visit” entity. A visit is going to be a series of hits (page views), related to each other because they’re from the same visitor. We can link them together and to a user either by using the user’s login session or with cookies if they aren’t logged in. Instead of worrying at this phase about what constitutes a visit (Is it when they haven’t returned for another hit in over half an hour? Does a visit end when they close their browser window?), we can just calculate the visits later on with parameters, linking the hits together to present the visits given those parameters. This may, for example, be an SQL view instead of the result of querying an SQL table.

After learning that the client would prefer we not use Postgres and instead use MySQL or SQL Server, we decided to benchmark MySQL, specifically how quickly it can read JSON data types. It turns out modern MySQL also supports JSON as a native type, and it can even index values extracted from JSON columns (via generated columns) to speed up read queries. Coupled with the fact that it is also free and open source, it’s the main option we’re looking at for our back end at this point. Given the log data we got from the client, where we learned how many unique visitors they get per month, we ran some scenarios to find out how many queries per second our system would need to handle. We’re still well within doable range. We’re confident we’ll be able to get a system ready that’s capable of handling the queries, with room to grow for the future.
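For a sense of what those reads look like from Node, here is a hypothetical query against a native JSON column, assuming a hits table with a JSON doc column (the mysql driver and the table are stand-ins, not our final schema):

var mysql = require('mysql');
var connection = mysql.createConnection({ host: 'localhost', user: 'bench', password: 'secret', database: 'benchdb' });

// Count page views per article straight out of the JSON payload (MySQL 5.7+)
connection.query(
  "SELECT JSON_UNQUOTE(JSON_EXTRACT(doc, '$.page')) AS page, COUNT(*) AS views " +
  "FROM hits GROUP BY page ORDER BY views DESC LIMIT 10",
  function (err, rows) {
    if (err) throw err;
    console.log(rows);
    connection.end();
  }
);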

by Matt at September 14, 2016 08:31 PM

Back End Routes and Promises

Today I contributed to the back end system by working on the routes. This meant creating a file for each HTTP verb in each folder for every model we expect to have in our application. For example, we expect to log visits, so we needed a “visits” controller and we needed to fulfill the read all, read one, create, update, and delete actions (or modes). A particular combination of HTTP verb and the presence of an “id” parameter implements each of these:

  • Read all – GET without an id
  • Read one – GET with an id
  • Create – POST without an id
  • Update – POST with an id
  • Delete – DELETE (with a mandatory id)

Using the HTTP verbs means we can have shorter, simpler route names. We can use GET for read one and POST for update, so we only need a “/visits” path instead of two separate ones like “/visits/read” and “/visits/update”. We still have to register a handler for each action, but shorter route names are always nice.
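A quick sketch of what that convention looks like in Express (the handler bodies are placeholders, not our real controllers):

var express = require('express');
var router = express.Router();

// Stub handlers; the real ones talk to the database
function listVisits(req, res)  { res.json([]); }
function showVisit(req, res)   { res.json({ id: req.params.id }); }
function createVisit(req, res) { res.status(201).end(); }
function updateVisit(req, res) { res.status(204).end(); }
function deleteVisit(req, res) { res.status(204).end(); }

router.get('/visits', listVisits);         // Read all  - GET without an id
router.get('/visits/:id', showVisit);      // Read one  - GET with an id
router.post('/visits', createVisit);       // Create    - POST without an id
router.post('/visits/:id', updateVisit);   // Update    - POST with an id
router.delete('/visits/:id', deleteVisit); // Delete    - DELETE with a mandatory id

module.exports = router;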

I again learned more about Node.js today as I tried to link all the routes into the application, and debug why they weren’t registering when I ran the server on our local machines. I learned about using “require” and the “module.exports” object in Node.js. I also learned about promises, and how they can help you code asynchronously without using callbacks. Thank you Mozilla Developer Network! ^_^ Instead of callbacks, you can chain JavaScript function calls:

function finishBlogPost() { return Math.random() > 0.5; } // placeholder for the real work

new Promise((resolve, reject) => {
    if (finishBlogPost()) { // maybe this isn't successful?
        resolve('Post published!');
    } else {
        reject('Still not finished...');
    }
}).then(
    // this runs if resolve() was called
    (successValue) => { console.log(successValue); },
    // this runs if reject() was called
    (failureReason) => { console.error(failureReason); }
);

by Matt at September 14, 2016 03:29 AM

September 13, 2016

Jaeeun Cho

Implemented Actions-dropdown list for actions button

For a long time I've been working on turning the dropdown components into a reusable component, and it has now been refactored, because the old components were strongly coupled and dependent on each other. I thought I understood what coupling and dependency are, but I was wrong. Knowing and understanding are totally different.

Dependency is a relationship between two classes. If a class A uses a class B, then A depends on B; if A cannot be reused without B, then A is the dependent and B is the dependency. According to this, my components had problems as reusable components.
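A tiny, made-up illustration of that idea (the classes here are hypothetical, not from our code base):

class Engine {
  start() { return 'vroom'; }
}

class Vehicle {
  constructor() { this.engine = new Engine(); } // Vehicle depends on Engine
  drive() { return this.engine.start(); }       // so Vehicle cannot be reused without Engine
}

console.log(new Vehicle().drive()); // 'vroom'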

My next task is to implement the actions button with the reusable dropdown components. When I built it with the new component, it was more convenient than the old version of the dropdown: I didn't need to pass a lot of props into the component. I'm still figuring out how the new dropdown components work as a reusable component.

It still has some issues related to the font and icon style. I will update it later.

I'm also studying the Internationalization API in ECMAScript and why we need to use this API for handling messages, as well as Underscore.js.

by Jaeeun (Anna) Cho at September 13, 2016 11:28 PM

Laily Ajellu

Input Forms Accessibility - Get "A" certified

In this post we’ll be discussing how to create accessible input forms to reach level A of accessibility certification.


In total, there are three levels of accessibility certification for your web app:
  1. A
  2. AA
  3. AAA
By law, you must be A certified if you are:
  • a private or non-profit organization with 50+ employees; or
  • a public sector organization
Starting in 2021, your app must reach level AA

Input Fields

One of the factors for achieving A level is Error Identification; here's how to make error identification more accessible (a short code sketch of steps 1 and 2 follows this list):
  1. Notifying a user when they have incorrect input
    It's best to programmatically connect an error message with the failed input field. But if you can't, set aria-invalid="true" on the failed fields.
    • It should not be set to "true" before input validation is performed or the form is submitted.
    • If the error message is connected programmatically, you're not required to use aria-invalid.
  2. Displaying error messages:
    To display an error message, give it role="alertdialog".

  3. How to design the Form:
    • Give each input field an HTML5 label tag, so that it will be read by the screen reader
    • Place each label beside the field so users that use zoom will be able to see both at once
    • Give examples for input that needs to be formatted a specific way like dates and addresses
    • Use aria-labelledby to tell the user a field is required. That doesn't mean you can't use asterisks, color or other visual cues; you just need both.
    • Group related input fields together visually, but also using roles. For example: put your radio button input fields into a div with role="radiogroup".
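A minimal sketch of steps 1 and 2 above, assuming an input with id "email", an empty element with id "email-error" beside it, and a form with id "signup-form" (the ids and the email pattern are made up for illustration):

function validateEmail() {
  var field = document.getElementById('email');
  var errorBox = document.getElementById('email-error');
  var ok = /\S+@\S+\.\S+/.test(field.value);

  if (!ok) {
    field.setAttribute('aria-invalid', 'true');   // only set after validation has actually run
    errorBox.setAttribute('role', 'alertdialog'); // so assistive technology announces the message
    errorBox.textContent = 'Please enter an email address like name@example.com';
  } else {
    field.removeAttribute('aria-invalid');
    errorBox.textContent = '';
  }
  return ok;
}

document.getElementById('signup-form').addEventListener('submit', function (e) {
  if (!validateEmail()) { e.preventDefault(); }
});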

If you want to explore these concepts in more detail, see the Reference below!

Reference and Credit: How to make Websites Accessible

by Laily Ajellu at September 13, 2016 10:55 PM

Matthew Marangoni

Hardware & Software Troubles

Unfortunately, I've had a lot of hardware problems in the past week. My hard drive began to fail early in the week, and after relocating my desk, my first thought was to perform a drive clone to avoid having to reinstall Windows, my virtual machine, and other software. I soon learned that unless you have powerful hard drive cloning software (like the tools used for forensic investigations on hard drives), it is near impossible to clone a drive using free cloning software once it has bad sectors; cloning is generally performed on healthy hard drives. I had read some suggestions that doing a Windows backup and restore was a good alternative to cloning; this, however, also failed due to the bad sectors. It appears that once a clone or backup operation encounters a bad sector, it does not know how to handle it (ideally it should skip/ignore these sectors and continue, or attempt to recover them if the software is advanced enough).

After having no success with cloning or backups, I decided I would just make a copy of my virtual machine so that I would only have to reinstall Windows and could keep my dev environment intact. Once Windows was installed, along with all my other required software and drivers I attempted to load and restore my virtual machine. To nobody's surprise, this also failed because VMware Player failed to recognize any of the files as a valid virtual machine (files that were created by VMware Player itself; so I don't know what could have caused this issue). So once again I had to reinstall my VM from scratch which brings me to my current state where I can start to resume work on the BigBlueButton project again.

One final thing I attempted: since I had upgraded to three smaller solid state drives, it was suggested to me that the best thing to do was set them up in RAID 5 so that I could have two drives' worth of space to work with and parity to survive a single drive failure. After reviewing the process of setting up hard drives in RAID 5, I learned that it is impossible to set this up in Windows 7 without a separate hardware RAID controller. The next best solution would be to set up RAID 0, but for some reason Windows installed the reserved boot partition on SSD 0 and did the rest of the Windows installation on SSD 1, making RAID setup impossible without a complete Windows reinstall. I am continuing to work with the 3 separate hard drives for now, since restarting this process to fix the Windows install for RAID would consume more time than it's worth.

by Matthew Marangoni at September 13, 2016 03:59 PM

Henrique Coelho

Why PostgreSQL is awesome

I was supposed to post this update on Friday (September 9th), but I forgot, so I decided to post it on Saturday (September 10th), but I forgot again; so then I decided to post it on Sunday (September 11th) and I forgot again; so I’ll post it today.

One of the most important features (or maybe the only one) behind the popularity of NoSQL is its ability to store data without a schema; in other words: forget about tables, you store anything you want in JSON. This flexibility comes in really handy when the data you need to store is a bit unpredictable but still needs to be indexed and searched. Normally we can overcome this with some complicated workarounds in the schemas, but that is where NoSQL really shines. Where it doesn’t shine, however, is where you actually need relational schemas and data organized in a very cohesive way.

Myself, I’ve never been a big fan of NoSQL: I love JSON, and I love to store information in JSON, but NoSQL never gave me the confidence of actually being reliable. Thankfully, newer relational databases already support similar features for storing JSON. PostgreSQL offers the JSON and JSONB data types, which let it recognize JSON documents stored in a column and query inside them as if they were structured data.

For instance, the entry below contains a JSON object with the data of a person called John Doe, 54 years old, who lives in Toronto, Ontario.

TABLE people

 id | doc
1   | {
    |   "city": "Toronto",
    |   "province": "Ontario",
    |   "person": {
    |     "name": { "first": "John", "last": "Doe" },
    |     "age": 54
    |   }
    | }

His first and last name could be retrieved using the following SQL query:

SELECT doc->'person'->'name'->>'first', doc->'person'->'name'->>'last'
FROM people WHERE id=1;

The syntax is fairly simple and almost self-explanatory, with only one detail: the arrow ‘->’ retrieves a value as a JSON object, while ‘->>’ retrieves it as text.

The nice thing about this feature is that SQL can now be a mix of both worlds. It also means that instead of pulling query results from the database and computing/filtering them in the API, this can be done, when necessary, directly in the SQL statement.
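For example, a filter on fields inside the JSON document can live entirely in the SQL statement. A sketch using the node pg driver and the people table from above (connection settings are placeholders):

var pg = require('pg');
var pool = new pg.Pool({ user: 'postgres', database: 'peopledb', password: 'secret', host: 'localhost' });

pool.connect(function (err, client, done) {
  if (err) throw err;

  // Find people in Toronto older than 50, straight from the JSON document
  client.query(
    "SELECT doc->'person'->'name'->>'first' AS first_name, doc->'person'->>'age' AS age " +
    "FROM people " +
    "WHERE doc->>'city' = 'Toronto' AND (doc->'person'->>'age')::int > 50",
    function (err, result) {
      done();
      if (err) throw err;
      console.log(result.rows);
    }
  );
});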

by henrique coelho at September 13, 2016 03:17 AM

Matt Welke

Getting Good at Git

Git has turned out to be so much more useful than I ever could have imagined. Today I again spent my time mostly learning as we started creating the back end system. We’re using WebStorm for a Node.js/Express.js project and I got to test my new Git knowledge and see how it all fits together. I can safely say my days of sharing code by tossing a USB drive or uploading a zip to Dropbox are over.

The tables in our plan for the back end are numerous, meaning many routes are needed. I created those today, while my team mate worked on unit testing and on some tricks he’s used before to streamline the code we’ll have to write for each route. Node.js concepts like middleware, promises, and generators are things he’s using to make our code look cleaner and reduce the amount of it we’ll have to write.
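As one small example of the kind of streamlining I mean (a sketch, not necessarily what we will end up with), a single piece of Express error-handling middleware can replace a try/catch in every route:

var express = require('express');
var app = express();

// Stand-in for a real database call that returns a promise
function fetchVisits() { return Promise.resolve([{ id: 1 }]); }

app.get('/visits', function (req, res, next) {
  fetchVisits()
    .then(function (visits) { res.json(visits); })
    .catch(next); // forward any failure to the shared error handler below
});

// Error-handling middleware: anything passed to next(err) ends up here
app.use(function (err, req, res, next) {
  console.error(err);
  res.status(500).json({ error: 'Something went wrong' });
});

app.listen(3000);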

by Matt at September 13, 2016 12:13 AM

September 09, 2016

Matt Welke

Getting To Know Node

Today I spent a lot of time getting familiar with our chosen back end framework, Express.js. I’ve used other MVC web frameworks before, so it didn’t take long to get used to Express. Aspects like views, models, controllers, and routes were already familiar to me. However, I need to spend time getting familiar with Node.js itself. It has unique development tools, commands, ways of managing dependencies, etc.

It also is more modular than other web frameworks I’ve used. To compare it with Ruby on Rails, my usual go-to framework: it gets out of your way and does nothing for you. No code is generated for you. You need to choose which modules to pull in and create the variables representing the server you want to run. I’m used to having a certain project structure enforced, where the framework expects certain files in certain folders, arranged a certain way, with a lot of stuff happening behind the scenes. Rails forces you to make controller classes in a controllers folder, but if you do, things just connect together. Node expects you to manually create routes and pass in functions as arguments as you do, and those functions become the controller behaviour.

There are other concepts often used in JavaScript programming, like closures, asynchronous programming, and AJAX, that I looked at today with the help of my team mate, who is much more experienced with JavaScript than I am. I’m sure I will have no problem becoming familiar with them soon.

We also continued to look into our database schema for when we will be logging user data and looked into our choice of database technology. It turns out PostgreSQL has support for JSON data as a native type. It can query JSON objects and perform all the complex and powerful analysis we’re used to with SQL, without pulling the JSON object out (as a string) and storing it in memory. This could have an incredible impact on how quickly our system can help correlate things for the users and recommend articles etc for them.



by Matt at September 09, 2016 09:27 PM

Benchmarking Back Ends

Today we started looking into options for creating our back end API, which will be used to log the information we track about the client’s visitors. The back end will receive the data using RESTful routes. This gives us a lot of flexibility, since REST is a standard implemented by basically every web framework out there. Four frameworks came to mind as feasible:

  • Express.js (running on a Node.js web server)
  • Rails API
  • ASP.NET
  • Good old, plain PHP (running on an Apache web server)

I love the sheer speed at which a developer can prototype and deploy things with Ruby on Rails, and my team mate has a strange love affair with JavaScript. However, we aimed to be objective and not choose to use one programming language or framework simply because we anecdotally liked it.  We decided to benchmark. Spoiler alert! I, for one, welcome our new JavaScript overlords.😀

We looked at the number of transactions per second the back end could accept, where a transaction was a request that involved accessing a database. We didn’t have time to properly test all the options (Rails was tricky to get set up), but some specific Express.js vs. PHP results so far can be found on my team member’s associated blog post.  In summary, we found that Express.js on Node.js ended up being incredibly fast, even faster than we thought it would be at first. We actually expected Apache with PHP to be quickest. We thought of it as the “C” of web programming languages. It’s simple and low level. It ended up being the slowest and the quickest to fail when the number of concurrent requests grew.

We didn’t get around to testing ASP.NET (and to be honest, we think we’ll ditch it anyways since we want to stay open source), and Rails API ended up performing somewhere in the middle. However, Rails needs to be further tested since we weren’t able to get it fully set up to take advantage of multiple CPU cores or even access the database. In all cases, we tested these against a PostgreSQL database. We strongly believe we’ll end up going with PostgreSQL because of its reputation as a powerful, stable, open source option.

We still have much to do here. We need to create realistic benchmarks. Real users don’t click on things or scroll their mouse wheel thousands of times per second. We need to create benchmarks that reflect the way people read articles online and engage by commenting, sharing, etc. Preferably, they would reflect the way visitors visit the site. Perhaps we’ll continually benchmark as we gain information by starting to monitor the users and make the benchmarks more accurate over time.

by Matt at September 09, 2016 04:13 AM

Henrique Coelho

Benchmarking PHP5 x Node.js

Long story short: one thing we did today was think about what would be the best language/framework to build an API: it should be stable under heavy load, fast, and capable of CPU-intensive operations. We ended up with two alternatives, PHP5 and Node.js, and decided to do a little benchmarking to find out which one would be the best.

For the first test, we set up one virtual machine with Apache + PHP5 and another with Express + Node.js and used Siege, a stress tester, to benchmark both servers. Siege creates several concurrent connections and produces some statistics, such as number of hits, Mb transferred, transaction rate, etc. For both servers, we used 4 combinations of settings:

  1. 1 core and 1,000 concurrent users
  2. 4 cores and 1,000 concurrent users
  3. 1 core and 1,500 concurrent users
  4. 4 cores and 1,500 concurrent users

The tests consisted of a very simple task: receive the user’s request, perform a SELECT query on a database, and return the raw results; we tried to keep the two versions as similar as possible. The database used was PostgreSQL, located on another virtual machine.

This is the source code we used for the tests:


Node.js (Express):

var express = require('express');
var pg = require('pg');

var config = {
  user: 'postgres',
  database: '...',
  password: '...',
  host: '...',
  max: 10,
  idleTimeoutMillis: 30000
};

var app = express();
var pool = new pg.Pool(config);

var query = 'SELECT * FROM testtable;';

function siege(req, res, next) {
    pool.connect(function (err, client, done) {
        if (err) throw err;

        client.query(query, function (err, result) {
            done(); // release the client back to the pool
            if (err) throw err;

            res.send(result.rows); // return the raw results to the caller
        });
    });
}

app.get('/siege', siege);

app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
});

PHP:

<?php
$connection = pg_connect("host=... dbname=... user=... password=...");
$result = pg_query($connection, "SELECT * FROM testtable");
echo json_encode(pg_fetch_all($result)); // return the raw rows; echoing the result resource directly would not

These are the results:

Results with 1 core:

                         1,000 users            1,500 users
                         Node.js      PHP       Node.js      PHP*
Number of hits           39,000       4,300     2,000        -
Availability (%)         100          95        66           -
Mb. transferred          11           0.06      0.56         -
Transaction rate (t/s)   1,300        148       800          -
Concurrency              655          355       570          -
Longest transfer (s)     0.96         28.14     1.16         -
Shortest transfer (s)    0.08         0.15      0.11         -

Results with 4 cores:

                         1,000 users            1,500 users
                         Node.js      PHP       Node.js      PHP*
Number of hits           55,000       5,100     14,000       -
Availability (%)         100          98        93           -
Mb. transferred          16.02        0.07      4            -
Transaction rate (t/s)   1,800        170       1,700        -
Concurrency              19.6         424       73           -
Longest transfer (s)     0.4          28.16     1            -
Shortest transfer (s)    0            0         0            -

* Aborted (too many errors)

I really was expecting the opposite result; Node.js seems to be incredibly fast in comparison to PHP for these operations.

For the next test, we tried to focus on CPU-intensive operations by running the following algorithm, which counts the prime numbers up to N (yes, it could be optimized, but the purpose of the test was to make it CPU-intensive):


Node.js (Express):

var express = require('express');
var app = express();

app.get('/', function (req, res) {
    function isPrime(num) {
        for (var i = 2; i < num; i++) {
            if (num % i === 0) { return false; }
        }
        return true;
    }

    function display(n) {
        var count = 0;
        for (var i = 3; i < n; i += 2) {
            if (isPrime(i)) { count++; }
        }
        res.send('' + count);
    }

    display(100000); // 70,000 or 100,000, depending on the test
});

app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
});


PHP:

<?php
function isPrime($num) {
    for ($i = 2; $i < $num; $i++) {
        if ($num % $i === 0) { return false; }
    }
    return true;
}

function display($n) {
    $count = 0;
    for ($i = 3; $i < $n; $i += 2) {
        if (isPrime($i)) { $count++; }
    }
    echo $count;
}

display(100000); // 70,000 or 100,000, depending on the test


My expectations were that PHP would perform much better for this kind of task. These were the results:

Result     70,000 numbers           100,000 numbers
           Node.js      PHP         Node.js      PHP
Seconds    2            26          2.5          Timed out after ~33 seconds

I don’t know what to think anymore. I guess we are not using PHP.

by henrique coelho at September 09, 2016 12:50 AM

September 08, 2016

Catherine Leung

Summer 2016

Once again, it is almost that time of the year for school to start.  The summer has been an interesting one and this blog post is a reflection on some of the things I did.

I had a good number of summer projects planned… but I only really got around to one of them.  This summer, I wrote a guide to using p5.js with my cousin Ben, who teaches at an international school in Hong Kong.  We wanted something that a teacher could use with their students in the classroom. We decided to write the guide using an online publisher named gitbook (I love gitbook for writing notes for my students.  Write it once with markdown, get it published to web, pdf, epub and mobi… awesome)

I had actually started this project back in February.  I got to about chapter 3 and I hated what I was doing with it.   I felt that it was very wordy, too much reading, not enough getting to the fun programming parts.  I remember learning to program when I was a kid.  I didn’t want to read about how things were done.  I didn’t care about the background of BASIC…  I just wanted to write programs to make my computer do things.

After talking things through with Ben, we decided to take a different approach to our project.  What is the minimum amount of background info/setup we need in order to get started?   How can we allow someone to write code with as little setup as possible? It turns out that we only need to write about 3 paragraphs, include a picture guide, add a link to a video and use an amazing web based editor.

Sometimes, I teach introduction to programming and the first week typically involves explaining how to set up the development environment.  It takes time to do this.  How to get the compiler.  How to get an IDE.  How to claim your unix account.  Where to find your text editor.  The joys of pico/nano (don’t laugh too hard…it was the first editor I learned how to use on unix…)…vi, emacs, gcc, vs, xcode… it’s a lot of setup.  I know a lot of us take this stuff for granted but think about what happens when you get a new computer… getting your dev environment set up is not a fast process.  So, how do we simplify this as much as possible?  How do we get to the fun parts as quickly as possible?

It starts by choosing tools that will minimize the setup.  p5.js is a JavaScript library.  To use it, you need to get the library files from  You need to set up an html page and you need a JavaScript file to write the script in.  After you set up your html page, you generally do not modify it.  You only need to edit your js file, so even though you absolutely need the html page, it’s not actually part of the program you are writing.  For tools you typically need a web browser and an editor.  This is not a lot… but if you are first starting, or if you are in an environment where what you are allowed to put on your machines is limited, every extra thing you need to do before you start coding makes it that much harder to start.
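For the curious, a complete p5.js sketch really is tiny.  This one is just a toy example of my own, not something from the guide:

function setup() {
  createCanvas(400, 400);          // p5.js calls setup() once at the start
}

function draw() {
  background(220);                 // and draw() on every frame
  ellipse(mouseX, mouseY, 50, 50); // a circle that follows the mouse
}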

To help simplify this setup, we decided to use Mozilla’s Thimble editor.  It is an html/css/js online editor.   It also allows you to publish your work. By doing this, we eliminate the text editor (and if you want to publish your work, we eliminated the webserver too).  Using Thimble means that the only application we need is a modern web browser.

Furthermore, and this is the really cool part, using Thimble means that we can actually set up the basic p5.js project for the reader.  Ben and I created an account on Thimble.  We then set up a Thimble project with all the files needed (the p5.js lib file, the html file and a stubbed out JavaScript file for people to write in).  The JavaScript file contains some starter code for the p5.js sketch.  Thimble also allows us to write our own tutorials.  Thus, we can write instructions on what to do inside Thimble.  We then publish this project (one button inside Thimble).  We get the link off of the Remix button from the published page and put that link into our project book.  Each chapter of our project book contains a goal (typically an image) to show what we are aiming for.  This is immediately followed by a link to the related Thimble project remix.  The remix contains instructions (typically where to write the code, what to write).  In other words, all you need to get started is to click a link!  No other setup.

The guide then continues on with more detailed explanations for those who want to know the why for each of the topics covered.  Towards the end of the guide, I added the chapters about how to setup your own sketch outside of thimble and some background material.

There is still a lot of work to be done on our guide for sure.  Currently we have only one very basic project.  We will add more in the future but I’m pretty happy with what we have done so far.  You can access our guide here

On a more personal note, I started the summer by helping my parents out for a bit at their restaurant.  It’s very different from my usual job, to say the least.  My part of the work was not really hard but the hours are quite long.  All I can say is how much I respect my parents for doing it.  I know how hard they have worked all these years to raise my brother and me.  I am forever grateful.
I am also continuing to decorate my new place.  This summer’s decorating involved the balconies, one of the best features of my new place.  I grew some strawberries, some herbs, and some cherry tomatoes (why are the leaves drying out ? there is plenty of water. help!). I even put in a couple of chairs.

I also made a few pieces of pottery this summer mostly for myself.   One of them is this garlic jar.  I am rather happy with it.


by Cathy at September 08, 2016 03:35 PM

Matt Welke

Data, Data, Data

The climax of today was getting to meet the client, an engineering news publisher. But while we waited for that meeting, my team mate and I re-watched some machine learning lectures from a Coursera MOOC we took. The lectures describe techniques for analyzing data and making recommendations or sorting the data.

For example, TF-IDF (term frequency-inverse document frequency) helps you recommend similar things (like news articles) based on how relevant they’re calculated to be in the context of every item in a system, and clustering helps you optimize things by pre-sorting the items based on their properties so that the recommender system won’t have to search everything to find the most appropriate recommendation.
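To make the TF-IDF idea concrete, here is a bare-bones sketch of the textbook formula (this is my own illustration, not whatever the client's recommender actually computes):

// How often a term appears in a document, normalized by document length
function termFrequency(term, doc) {
  var words = doc.toLowerCase().split(/\W+/).filter(Boolean);
  var hits = words.filter(function (w) { return w === term; }).length;
  return hits / words.length;
}

// Terms that appear in few documents score higher; the +1 avoids dividing by zero
function inverseDocumentFrequency(term, docs) {
  var containing = docs.filter(function (d) {
    return d.toLowerCase().split(/\W+/).indexOf(term) !== -1;
  }).length;
  return Math.log(docs.length / (1 + containing));
}

function tfIdf(term, doc, docs) {
  return termFrequency(term, doc) * inverseDocumentFrequency(term, docs);
}

var articles = ['gears and bearings', 'bearings under load', 'pizza recipes'];
console.log(tfIdf('gears', articles[0], articles));    // distinctive term, higher score
console.log(tfIdf('bearings', articles[0], articles)); // common term, lower score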

After the meeting with the client, we learned about how they want to increase the amount of article reading their visitors do, and target the users more effectively with mailing list emails to help them drive revenue. It sounds like my team mate and I had the right idea as we prepared. We’re going to have to log as much data as we can to uncover new things we can learn about’s users, and understand their behavior. Then, we can help them refine their current article recommender system (which is built into the web app framework they use for their site), or perhaps help them build a new, more powerful recommender system able to cooperate with their current system. Sounds fun!

by Matt at September 08, 2016 03:33 AM

September 07, 2016

Andrew Smith

Everyone disables SELinux, it’s annoying!

Everyone disables SELinux, it’s annoying!

Hah! I’ve been saying that for years and years, but the quote above isn’t mine, it’s from Scott McCarty‘s security talk at LinuxCon 2016. The room was full of other Linux Pros and the statement was followed by way more nods than grimaces :)

SELinux zealotry reminds me of the free software fanatics, emacs nutters, and other such special people. Why not tell it as it is? Why tell everyone to enable SELinux when you know, in your heart, that will cause them way more trouble than it will save them from?

Thanks Scott! I feel vindicated.

by Andrew Smith at September 07, 2016 07:09 PM

Matt Welke


My first day! I was nervous, but I think I was mostly overthinking things. Today was productive. I set up my workstation and got to know some of the people at CDOT. It’s hard to do anything on our project right now because we haven’t yet met with the client, but in a way, we’re already ahead of the curve.

We learned that the client uses ASP.NET for their current application. While investigating ASP.NET, we managed to get an ASP.NET 2.0 web application created in Visual Studio running in a Linux environment using the Mono framework on Linux Mint. Our workstations are also now able to develop ASP.NET applications using an IDE called Project Rider. Mono officially supports up to ASP.NET 4.5, but for now we were only able to get up to 2.0 working. With the rising importance of using and building open source software, this bodes well for our mission to create excellent open source software to help our client.

I can’t wait to begin the project itself!

by Matt at September 07, 2016 02:18 AM