The Dot Product (http://thedotproduct.org/)

Migrating to a statically served Ghost blog

The blog you're reading is produced by Ghost, which I'd previously served in a pretty typical way: an AWS instance running Node and a local MySQL database server. That was fine for quite a while, but there has recently been a spate of security issues in, for instance, OpenSSL. That, plus the rising cost of hosting on AWS, which frequently triggered my AWS billing alerts (incidentally, if you run anything on AWS, I'd strongly recommend setting these up), led me to reconsider how I host my blog.

After some thought, the strategy I came up with is static serving via AWS S3. I also wanted to add a CDN in order to better serve readers who are further from my hosting origin, so I took the opportunity to plan this in as well. Static serving gives a number of advantages, chiefly:

  • Much lower hosting costs
  • A massively reduced attack surface
  • Zero-downtime upgrades are much simpler

The one disadvantage is that the admin/editing side of my blog is no longer available over the internet. That's not a problem in my situation as I'm the only contributor.

The new solution involves me running VirtualBox (on my MacBook Pro) which hosts a Debian virtual machine onto which I installed:

  • io.js (compiled from source - instructions are in the source README)
  • Percona (installed from the Percona Debian repo)
  • Nginx (installed from the Nginx Debian repo)
  • NTP (from the standard Debian repo: sudo apt-get install ntp). This helps keep the VM system clock accurate - S3 uploads will fail if it has drifted.

I then had to download my Ghost website files (using tar + gzip: tar -czf blog.tar.gz /path/to/files) and database (via mysqldump), copy them to the VM (via SCP) and restore the database locally. I then amended my Ghost config to point to the new, local database server and also configured Ghost to listen on localhost.
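For anyone following along, the whole transfer boils down to a handful of commands - roughly the following sketch, in which the paths, usernames and database name are purely illustrative:

# On the old AWS instance: archive the Ghost files and dump the database
tar -czf blog.tar.gz /var/www/ghost
mysqldump -u ghost -p ghost_blog > ghost_blog.sql

# On the VM: pull both down, unpack and restore
scp admin@old-host.example.com:"blog.tar.gz ghost_blog.sql" .
tar -xzf blog.tar.gz -C /
mysql -u ghost -p ghost_blog < ghost_blog.sql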

Once that was done, I set up NginX to proxy to the local, dynamic (i.e. the version I use to add posts) version of Ghost and also to serve a local, static version of the site so I can check it before publishing (simplified):

# Dynamic, local version:
upstream ghost {  
    server 127.0.0.1:2368 max_fails=0 fail_timeout=10s;
}

server {  
    listen 80 deferred;
    server_name thedotproduct.org;
    server_tokens off;

    gzip on;
    gzip_min_length 1024;
    gzip_proxied any;
    gzip_types text/plain text/css application/javascript application/json application/xml application/octet-stream;
    location /
    {
        proxy_set_header Connection "";
        proxy_http_version 1.1;
        proxy_pass http://ghost;
    }
}

# Static, local version:
server {  
    server_name static.thedotproduct.org;
    listen 81;

    location / {
        index index.html;
        alias /var/www/static.thedotproduct.org/127.0.0.1/;
    }
}

A little port forwarding on VirtualBox from 127.0.0.1:9080 (on my host MacBook Pro) to the VM guest on 10.0.2.15:80, and from 127.0.0.1:9081 (host) to 10.0.2.15:81 (guest), allows me to access the dynamic Ghost instance via http://127.0.0.1:9080/ and the static instance on http://127.0.0.1:9081/ - nice!
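If you prefer the command line to the VirtualBox GUI, the same forwarding rules can be set with VBoxManage - a rough sketch, in which the VM name is illustrative and the VM is assumed to be using NAT networking:

# Forward host 127.0.0.1:9080 -> guest :80 and host 127.0.0.1:9081 -> guest :81
VBoxManage modifyvm "debian-ghost" --natpf1 "ghost-dynamic,tcp,127.0.0.1,9080,,80"
VBoxManage modifyvm "debian-ghost" --natpf1 "ghost-static,tcp,127.0.0.1,9081,,81"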

So I then needed to be able to reliably and automatically (via a CLI script) create the static version of my Ghost blog. I wanted to keep things as simple as I could, so I turned to wget, which is installed by default on most Linux distros. To cut a long-ish story short, I simply used the wget arguments -m (mirror) and -p (download assets) to recursively download the whole blog (running this on my VM). This worked just fine but there was a small issue with assets which I version via a query string: wget appended the query string to the downloaded filenames, while Nginx looks up static files by path alone (without the query string) - not good. I couldn't find a solution natively with wget (maybe I should have tried cURL?) so after a little googling I adapted some findings into a solution using find and its -exec argument to strip the query string from any downloaded local assets.

Once this was done, I finally needed to be able to upload my files to S3 so I used the AWS CLI (with an argument wrapper so I can easily first check locally before pushing to S3) in my publish script. The publish script now looks like this (it's really simple and somewhat rough and ready at the moment):

#!/bin/bash

# vars
HOSTNAME="thedotproduct.org"  
STATIC_DIR="/var/www/static.thedotproduct.org"  
STATIC_ASSET_DIR="/root/tdp-local/static-assets"  
LOCAL_SCHEME="http://"  
LOCAL_ADDRESS="127.0.0.1"  
LOCAL_PORT=80  
S3_BUCKET="thedotproduct.org"  
WEB_SERVER_GROUP="www-data"  
MAX_AGE="3600"

# Crawl local version of website
mkdir -p $STATIC_DIR  
cd $STATIC_DIR

# wget the local dynamic website content...
wget -m -p --header="Host: $HOSTNAME" -q $LOCAL_SCHEME$LOCAL_ADDRESS:$LOCAL_PORT/

# ...and some static asset files which aren't linked in the HTML
cp -R $STATIC_ASSET_DIR/* $STATIC_DIR/$LOCAL_ADDRESS/

chgrp -R $WEB_SERVER_GROUP $STATIC_DIR  
chmod -R g+rx $STATIC_DIR

# Remove query strings appended to filenames
find ./$LOCAL_ADDRESS -name "*\?*" -exec rename 's/\?.*$//' {} \;

# Upload to S3 if "upload" is the first arg
if [ "$1" = "upload" ]  
then  
aws s3 sync ./$LOCAL_ADDRESS/ s3://$S3_BUCKET --acl public-read --delete --cache-control "public,max-age=$MAX_AGE"  
fi  

So I can name the above as e.g. publish.sh, mark it executable (chmod +x publish.sh), run it with no arguments first to check locally (./publish.sh) and then, when I'm comfortable it's all good, upload to S3 with ./publish.sh upload.

Note: S3 does not send Cache-Control headers by default, hence the --cache-control "public,max-age=$MAX_AGE" in the script above.

I had one last issue to solve, and to be honest, I had been putting it off as it's a bit awkward. My dynamic hosting had been set up to send an HSTS header with a max-age of 1 year. This meant, essentially, that I had to keep serving the website over HTTPS, as anyone (or any crawler) who had visited the website in the past would require an HTTPS connection and would hard-fail if it was unavailable. S3 doesn't support HTTPS on a custom domain, so serving directly from S3 wasn't an option, and in any case I wanted to add a CDN. The added complexity was that I serve the website on the zone apex (i.e. thedotproduct.org - no "www" or other sub-domain). AWS CloudFront could have worked, as Route 53 offers CNAME flattening via its "alias" records, but try as I might, I couldn't get it to serve my pages (which are all called index.html in a directory named for the blog post title slug). I then remembered that Cloudflare had announced support for (RFC-compliant) CNAME flattening, so I headed over and configured a free Cloudflare account, which also now includes a free TLS certificate without even the need to create a CSR and private key - double win! This worked really nicely with very, very little effort - just a delay (about 10 hours) while Cloudflare presumably aggregated some TLS certificate requests and created a SAN certificate which included my hostname.

Lastly, it's just worth mentioning that the Cloudflare TLS and web-server configuration is (as you'd expect) really nice and gives me an A rating on the Qualys SSL Labs tester (including the newest strong ciphersuite, ChaCha20-Poly1305) as well as both IPv4 and IPv6 connectivity.

This new hosting method has reduced my hosting costs from about $60/month to single-digit dollars. I'm not quite sure how much yet as it's only been active a week or two, but it'll be an order of magnitude lower, which is not to be sniffed at.

I'm going to ultimately look at creating a Ghost plugin to statically publish in the future (unless someone has already done it or beats me to it).

Long post, but I hope it helps explain one approach - it'd be great to hear other people's approaches and opinions.

http://thedotproduct.org/migrating-to-a-statically-served-ghost-blog/ - Fri, 14 Aug 2015 15:33:30 GMT
AWS: The availability zones of the specified subnets and the AutoScalingGroup do not match

I've just been working on an AWS CloudFormation stack which sets up the infrastructure for my project. I usually deploy the stack to eu-west-1, but this time we're testing some multi-region functionality so I was launching into us-east-1.

To cut a long story short, my AWS cloudformation stack kept bombing out with an error message:

"The availability zones of the specified subnets and the AutoScalingGroup do not match"

Hmmm...that's a bit cryptic. So, I had a look through my ASG (auto-scaling group) cloudformation config and saw nothing unusual, just the standard:

"Properties":  
{
  "AvailabilityZones":{ "Fn::GetAZs" : { "Ref" : "AWS::Region" } },
  "VPCZoneIdentifier":
  [
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet0" ]},
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet1" ]},
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet2" ]}
  ],
...

That's the same as I use in eu-west-1 with no troubles.

Most AWS regions have 3 AZs (availability zones), so my "core-infrastructure" CloudFormation script allows for just 3 AZs, creating a subnet in each. I wondered whether us-east-1 had more or fewer than 3 AZs - I suspected more, as the AWS error messages when items are missing are usually a little clearer than this.

It turns out that us-east-1 does indeed have more AZs - 4, in fact. Oddly enough, for me at least, they're labelled 1a, 1b, 1c and 1e - no idea what happened to 1d.
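If you want to check how many AZs a region exposes to your account before building a stack, the AWS CLI will list them (assuming it's installed and configured with suitable credentials):

# List the AZ names visible to this account in us-east-1
aws ec2 describe-availability-zones --region us-east-1 --query 'AvailabilityZones[].ZoneName' --output text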

So the fix is super simple: I just had to create a subnet for AZ 1e (luckily my VPC had just enough space in its range for another /21 subnet) and then amend the ASG config in my stack above to:

"Properties":  
{
  "AvailabilityZones":{ "Fn::GetAZs" : { "Ref" : "AWS::Region" } },
  "VPCZoneIdentifier":
  [
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet0" ]},
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet1" ]},
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet2" ]},
    {"Fn::FindInMap" : [ "subnetIDRegionMap", { "Ref" : "AWS::Region" }, "publicSubnet3" ]}
  ],
...

Easy! The stack then builds successfully.

So the error comes from my specifying:

"AvailabilityZones":{ "Fn::GetAZs" : { "Ref" : "AWS::Region" } }  

Which essentially tells CloudFormation to build the ASG across all 4 AZs; since I was supplying only 3 subnets, the AZ set didn't match the subnets provided. So the error message makes sense... once you know/realise that!

http://thedotproduct.org/aws-the-availability-zones-of-the-specified-subnets-and-the-autoscalinggroup-do-not-match/ - Thu, 09 Jul 2015 12:01:02 GMT
nginx: the difference between $host, $http_host & $hostname and priorities of $host

Nginx has a lot of system variables, which makes it really flexible - great news. It also inevitably introduces some complexity. That's generally not a huge deal, but in the odd place the nginx docs are somewhat lacking, which means you have to either google the issue you're facing or investigate yourself.

I hit this situation in my current project for something simple yet fundamental. What I needed some clarity on is the $host system variable. Let's get one issue sorted straight off: the docs state that the two similar nginx system variables $host and $hostname are:

$host
in this order of precedence: host name from the request line, or host name from the “Host” request header field, or the server name matching a request

$hostname
host name

So $hostname is a little scant on detail. It’s (at least on *nix) the FQDN machine hostname that you’d get by running (on the shell):

#hostname -f

Good. Let’s move on.

So, the highest-priority match for the more useful $host variable looks to be 100% synonymous with the host requested in the HTTP request, but it's actually a normalised version thereof: $host is lowercased and has the port designation (e.g. :80) removed.

It seems that there’s also another, undocumented and in fact higher priority for $host (I tested nginx 1.9.2, the current latest) which I found experimentally: if you’re using nginx with a proxy_pass directive to an nginx upstream, $host appears to take the name of the upstream as the highest priority.

For example, I have a VM running on my laptop which is accessed from my host OS via port forwarding (:9080 on the host to :80 on the guest). The :80 listener on the guest is nginx, which has a server block for test.example.com that does an HTTP proxy_pass to an upstream named "prx", which in this case simply listens on :9999 and serves local, static files. When I access the guest from my host OS (via a record in /etc/hosts: 127.0.0.1 test.example.com), I see that $host contains "prx".

add_header X-Host $host always;  # add_header needs a header name; "X-Host" here is arbitrary

# OUTPUT: X-Host: prx
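A quick way to observe this from the host OS is to request the page through the forwarded port and look at that (arbitrarily named) response header, assuming the setup described above:

curl -sI -H "Host: test.example.com" http://127.0.0.1:9080/ | grep -i '^x-host'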

It's unclear whether this is a result of my host-to-guest port forwarding, a priority which hasn't yet been documented, or perhaps unintended functionality. Maybe, in fact, I did something wrong. Comments are very welcome!

http://thedotproduct.org/nginx-the-difference-between-host-http_host-hostname-and-priorities-of-host/ - Fri, 03 Jul 2015 12:57:39 GMT
NGINX vary header handling

I'll prefix this short article with a statement that this is not in any way a criticism of NGINX, just an observation which I have not found documented elsewhere, so I wanted to write it down in case it helps anyone else.

Recently, in my day job, I've been working on a project which uses NGINX as a caching reverse proxy - a task at which NGINX excels. Today, I was validating some of the detail of how NGINX handles HTTP headers when caching proxied HTTP content (in this case, mainly HTML web pages) - in our case we're doing this via proxy_cache. One of the functions I wanted to validate was correct (as per RFC 7234) handling of the vary HTTP response header.

If you're reading this, you likely know exactly what the vary HTTP header does, but for anyone else: the vary HTTP header names one or more request headers whose values a cache server/service should use in addition to its defined cache key. It thus effectively instructs the cache to keep multiple copies of the content - for example compressed and uncompressed versions, or geographically varying content.

Test 1 was no problem; the origin NGINX was proxying for returned HTTP headers indicating that the proxied content should be cached:

...  
Cache-Control: max-age=30, stale-while-revalidate
Vary: Accept-Encoding  
...

All pretty typical stuff; no surprise that NGINX handles that just fine and does indeed separately cache (for example) a gzip'd and an uncompressed version of the content once requests are made for each.

Test 2 was a different page which issued a larger set of headers on which to vary, including some of our custom response headers:

...
Cache-Control: max-age=30, stale-while-revalidate
Vary: Accept-Encoding,Custom-Header-One,Custom-Header-Two,Custom-Header-Three  
...

So nothing too unusual, admittedly, though some of the HTTP headers are custom - but this time NGINX did not cache the content, despite the above HTTP headers indicating it should. Begin investigation...

So, a colleague and I ran through the setup to try to find the issue. At first we had nothing to go on other than NGINX not caching, but after a quick comparison of test case 1 with test case 2, the only significant change was the vary HTTP header. I'd also seen a few recent fixes in the NGINX change log which centred around lengths of fields/values. So we took that as a starting point and fiddled with the vary HTTP header value, adjusting its length. Lo and behold, we quickly found that NGINX would cache the content correctly if the vary HTTP header value was 42 or fewer characters in length. A quick search through the NGINX source code mirror on GitHub for "42" showed the reason in the very first result:

#define NGX_HTTP_CACHE_VARY_LEN      42

This is in ngx_http_cache.h. A search for "NGX_HTTP_CACHE_VARY_LEN" shows a little more of the logic concerned in ngx_http_file_cache.c:

...
if (h->vary_len > NGX_HTTP_CACHE_VARY_LEN) {
    ngx_log_error(NGX_LOG_CRIT, r->connection->log, 0,
                  "cache file \"%s\" has incorrect vary length",
                  c->file.name.data);
    return NGX_DECLINED;
}
...

In effect, this logic means that if the vary HTTP header value is more than 42 characters in length, it is treated as if it had a value of "*", i.e. the content will not be cached. The violation is logged as per the above code snippet; it'll end up in the error log (assuming you have this enabled) at critical level.
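A quick way to check whether a given page is likely to trip this limit is to measure the length of the vary header value your origin returns - a rough one-liner (the URL is illustrative):

# Print the character count and value of the vary header
curl -sI http://origin.example.com/page | grep -i '^vary:' | cut -d' ' -f2- | tr -d '\r' | awk '{ print length, $0 }'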

The current NGINX vary HTTP header handling is described here but makes no mention of the maximum length of the header value.

As I noted earlier, this is not a criticism, but I don't believe it's in any NGINX documentation so it could perhaps catch people out.

HTTP headers do not have a standardised maximum length, but most software which handles them imposes a limit - usually several (often 8 or 16) kilobytes - so 42 bytes seems a little low (maybe it's a dev joke on the meaning of life? :-)). I'd personally like to see this limit raised, perhaps to 256 characters, so I'll do that as a patch on our build.

I'd also like to thank Maxim Dounin for his quick response on the issue and confirmation that increasing the vary HTTP header limit will likely have very little impact on performance.

http://thedotproduct.org/nginx-vary-header-handling/ - Wed, 03 Jun 2015 20:42:48 GMT
Achieving grade A SSL certificate configuration on AWS ELB

TL;DR

The situation: background

A while back, I wrote an article on strong SSL configuration for Nginx which was the result of many hours of research and trial. I need to update that article, as one of the articles I referenced (on raymii.org) has since been updated and basically has better advice than mine - so a big thanks to Remy.

Anyway, I'm currently in the final phase of architecting a new social network site hosting infrastructure and deployment strategy and am now pinning down the finer details. I've had a lot of input into this project and thus have insisted on HTTPS throughout, so naturally I wanted to ensure it's strongly configured. To this end, I have spent some time this afternoon essentially porting Remy's recommendations (in as much as possible in this case) over to AWS to form an ELB (Elastic Load Balancer) configuration.

The recommended ciphers from Remy's article are (naturally) specified as an OpenSSL cipher string which Nginx can use: AES256+EECDH:AES256+EDH. This is exactly what's needed for Nginx, but it doesn't match the explicit cipher listing the AWS ELB web console expects. So, I took the Qualys SSL certificate checker output for my own website, looked up the resulting ciphers on the openssl.org cipher name listing page, configured my ELB with those and re-ran the Qualys test. The result is an A grade - pretty respectable.
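Incidentally, you can expand an OpenSSL cipher string into its individual cipher names locally, which saves a lot of cross-referencing (the exact output depends on your OpenSSL version):

# List the ciphers matched by the cipher string from Remy's article
openssl ciphers -v 'AES256+EECDH:AES256+EDH'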

The ELB configuration

So the AWS ELB configuration is actually quite simple, just follow these steps when you get to the SSL configuration part of the AWS ELB setup or when you're editing your existing ELB listener (click the "Change" hyperlink under the "Ciphers" heading in the grid listing of your ELBs for the relevant ELB):

Security policy

Select "Custom security policy" (if not already selected) - this will allow you to configure the protocols etc. properly:

Protocols

Disable SSLv2 and SSLv3, enable TLSv1.0, TLSv1.1, TLSv1.2:

You'll need TLSv1.0 for IE <11.

SSL options

There's only one option in the SSL options, "Server Order Preference" - you want this enabled:

Ciphers

Next up is the critical part, selecting the ciphers, we want (only):

ECDHE-RSA-AES256-GCM-SHA384
ECDHE-RSA-AES256-SHA384
ECDHE-RSA-AES256-SHA
DHE-RSA-AES256-GCM-SHA384
DHE-RSA-AES256-SHA256
DHE-RSA-AES256-SHA

So deselect everything else.

Finishing up

Now you'll just need to either apply your changes (if you were editing an existing ELB) or finish the wizard if you started from scratch.

Cloudformation/API

I haven't detailed out exactly how this would be specified via cloudformation or the AWS API as yet but when I do, I'll update this article.
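In the meantime, here's a rough sketch of how an equivalent custom policy might be created and attached with the AWS CLI; the load balancer and policy names are illustrative, and the attribute names are my best guess - verify them against aws elb describe-load-balancer-policy-types before relying on this:

# Create a custom SSL negotiation policy with the desired protocols/ciphers
aws elb create-load-balancer-policy \
  --load-balancer-name my-elb \
  --policy-name custom-tls-policy \
  --policy-type-name SSLNegotiationPolicyType \
  --policy-attributes AttributeName=Protocol-TLSv1,AttributeValue=true \
                      AttributeName=Protocol-TLSv1.1,AttributeValue=true \
                      AttributeName=Protocol-TLSv1.2,AttributeValue=true \
                      AttributeName=Server-Defined-Cipher-Order,AttributeValue=true \
                      AttributeName=ECDHE-RSA-AES256-GCM-SHA384,AttributeValue=true \
                      AttributeName=DHE-RSA-AES256-GCM-SHA384,AttributeValue=true

# Attach the policy to the relevant listener (port is illustrative)
aws elb set-load-balancer-policies-of-listener \
  --load-balancer-name my-elb \
  --load-balancer-port 443 \
  --policy-names custom-tls-policy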

Footnote

Once again, thanks to Remy van Elst for the article that fed this.

I should note that my ELB is in pure TCP/SSL mode (not HTTPS) as I'm running websockets over it, so it has to operate at layer 4 only.

http://thedotproduct.org/achieving-grade-a-ssl-certificate-configuration-on-aws/ - Wed, 29 Oct 2014 16:19:10 GMT
Running multiple NodeJS / Travis CI tests without a test runner

I'm currently writing an authentication plugin for actionHero - it's a NodeJS module with a MongoDB backend. Usually, I'd use Mocha as a test runner and run my tests automatically (via Travis CI) when I commit/push to GitHub. In this instance though, I have some oddball issue which means that the MongoDB connection/driver just won't work under Mocha.

Owing to the above, I elected to bypass Mocha for the moment at least and just write the tests "raw". I knew I could do this as I recalled reading that Travis essentially uses the process exit status as the marker for success or failure (exit code 0 being success, anything else being failure). The issue was that I wanted to maintain several test files and to run them all on each commit/push. Usually, Mocha, as the test runner, would handle this for me but without Mocha, I was somewhat stuck.

To cut a long story short, with a couple of tests, I determined that the following works just fine to run multiple, independent test scripts via Travis and to record a build error if any one of them fails:

package.json file:

...
"scripts":  
{
    "test": "node ./test/test1.js; node ./test/test2.js"; node ./test/test3.js";
}

Then in the test files, exit with relevant codes to show test success/failure e.g:

...
if(testWasSuccessful)  
{
    process.exit(0);
}
else  
{
    process.exit(1);
}

Obviously, it's not ideal having to list all the test files manually and I could (and maybe will) create a small globbing test runner/wrapper but this gets me off the starting blocks. Maybe this will help someone else in a similar situation.
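For what it's worth, a minimal globbing wrapper along those lines could be as simple as the following sketch (assuming the tests live in ./test and each exits non-zero on failure); the package.json "test" script would then just call this one file:

#!/bin/bash
# Run every test file in ./test, failing the build on the first non-zero exit
for test_file in ./test/*.js
do
    echo "Running $test_file"
    node "$test_file" || exit 1
done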

http://thedotproduct.org/running-multiple-nodejs-travis-ci-tests-without-a-test-runner/ - Fri, 17 Oct 2014 11:41:04 GMT
Heartbleed fix for Debian including on AWS (Debian AMI)

As many others have in the past couple of days, I've spent a fair bit of time reading about, fixing and reassuring customers about the Heartbleed bug in OpenSSL. The OpenSSL and Debian package maintainers acted quickly to fix the issue and most people will simply be able to run:

apt-get update  
apt-get upgrade  

(remember to do this via sudo or as root)

On standard Debian wheezy installs, this will install a patched version of OpenSSL 1.0.1e, so despite appearances (the upstream version number doesn't change), you should then be free of Heartbleed. You can test this in a variety of ways, but one of the simplest is published by security consultant Filippo Valsorda here (including source code so you can satisfy yourself that it's friendly).
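Another quick check is to look at the Debian package itself - the fixed build carries a changelog entry referencing the CVE (package names here are the wheezy ones):

# Confirm what's installed, then look for the Heartbleed CVE in the installed changelog
dpkg -l openssl libssl1.0.0 | grep ^ii
zgrep -i cve-2014-0160 /usr/share/doc/libssl1.0.0/changelog.Debian.gz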

When I came to update my Debian installs on AWS, which I began building a week or two ago from the official Debian AMI (version 7.4), I found that there were no security apt sources in /etc/apt/sources.list, so running apt-get update/upgrade didn't install the patched OpenSSL version. All I had was:

deb http://cloudfront.debian.net/debian wheezy main  
deb-src http://cloudfront.debian.net/debian wheezy main  
deb http://cloudfront.debian.net/debian wheezy-updates main  
deb-src http://cloudfront.debian.net/debian wheezy-updates main  

So, I had to add them in:

deb http://security.debian.org/ wheezy/updates main contrib non-free  
deb-src http://security.debian.org/ wheezy/updates main contrib non-free  

After that, the usual:

apt-get update  
apt-get upgrade  

Installed the patched openSSL version and thus heartbleed is no more, happy days. All simple stuff but might perhaps help someone out who's not so familiar with these things.

http://thedotproduct.org/heartbleed-fix-for-debian-including-on-aws-debian-ami/ - Wed, 09 Apr 2014 17:51:23 GMT
A simple and rough comparison of Akamai and Cloudfront CDN's

In my day job, we have traditionally used Akamai as our content delivery network (CDN) of choice. For quite some time, Akamai was arguably the only true enterprise-ready CDN, and our clients demand a high level of service, hence Akamai is/was a good fit in most cases. The CDN world has definitely changed over recent months and years, with many new players entering the market and many existing players strengthening their offerings - Amazon Web Services (AWS) Cloudfront is a popular CDN amongst these newer/improved offerings and has caught my attention, not least due to its pricing.

AWS in general has been very disruptive in the market, offering very low-cost services which are perhaps not always as comprehensive as competitors' but fit the requirements of a large number of people/companies/projects. AWS Cloudfront is very much of this ilk: although expanding, it is currently quite basic as a CDN (it lacks many features offered by companies such as Akamai, e.g. DDoS insurance, WAF and so on) but is as much as an order of magnitude cheaper than the major competitors, so if simple object caching is what you need (HTTP or HTTPS) then Cloudfront may well work for you.

Akamai still has by far the largest CDN in terms of PoPs (Points of Presence, i.e. CDN nodes); however, the newer CDN vendors typically state that they operate under a different model, going less for sheer scale and more for strategically placed so-called super-PoPs (i.e. fewer PoPs, used more effectively).

As I've been using AWS more and more, I wanted a simple but reasonably meaningful (to me at least) comparison between Cloudfront and Akamai, so I set up the following.

Test configuration

This test is semi-scientific, i.e. it is simple but relatively logical and is as fair as I could reasonably make it (since I'm running it in my spare time). The test is not supported or endorsed by either Akamai or AWS, nor is it influenced by either company or by my employer.

Notes are:

  • Tested against a single page (the home page) on our company website (relatively low traffic - low thousands of page views across the site per day) via 2 DNS records (both DNS records have the same TTL at 600 seconds and are served from the same authority):
    • www. which is cname'd to an Akamai edge hostname
    • cf. which is cname'd to an AWS Cloudfront config hostname
  • Both CDN configs hit the same origin servers (a pair of web servers load balanced by a hardware load balancer)
  • Public/live traffic is using the Akamai (www.) hostname, only Pingdom traffic is on the Cloudfront (cf.) hostname
  • From looking at IP information and physical locations, it seems that Pingdom do not host (at least not entirely) on AWS - important, as this could have artificially skewed results
  • Our website is hosted in London, UK in our own private virtualised environment which is very well capacity-managed and has low contention and high-performance components
  • Cloudfront config uses all edge locations
  • As similar as possible Akamai and Cloudfront configs:
    • HTTP only traffic
    • Honour HTTP headers from origin with minimum cache times imposed
    • All items on the page are cached (the base HTML, CSS, JS and so on) by the CDN
  • Response times tested via Pingdom, one test per DNS record (as shown above) at a 1 minute interval from February 9th 2014 to today (18th March 2014) using default settings/locations (many locations, grouped into US and Europe)
  • All cacheable objects have a max-age of at least 1 day and thus the cache will remain warm from Pingdom traffic alone (since tests run every minute from many locations)
  • Test duration is Feb 9th 2014 to Mar 18th 2014

The results I observed under these conditions are as follows:

Metric                                  All locations   Europe (all)   U.S. (all)

Akamai
Overall average                         772ms           572ms          976ms
Fastest average                         425ms           293ms          591ms
Slowest average                         894ms           702ms          1130ms

Cloudfront
Overall average                         772ms           462ms          982ms
Fastest average                         592ms           285ms          778ms
Slowest average                         986ms           872ms          1125ms

% Cloudfront is faster than Akamai
Overall average                         0%              19.2%          -0.6%
Fastest average                         -39.3%          2.7%           -31.6%
Slowest average                         -10.3%          -24.2%         0.4%

Here's a graph of these results:
Akamai versus Cloudfront test results

Notes:

  • negative values of % indicate that Cloudfront is slower than Akamai
  • averages shown are calculated on a per-day basis i.e. slowest average is the average of the slowest response time on each day
  • the pingdom accounts used are the free accounts and thus stats are as-provided, with no modifications (i.e. I did not remove the top/bottom extreme values etc.)

Conclusion

In terms of overall average and across all locations, Cloudfront and Akamai are identical under my test conditions. Cloudfront is however a little faster on overall average in Europe.

The fastest/slowest times are somewhat more variable than the overall averages. This may be a result of Akamai having a larger and potentially more stable (under heavy load) network of PoPs, or perhaps the Akamai architecture is more capable of coping with heavy (overall internet) traffic. Akamai has an edge -> midgress -> origin architecture and, to my knowledge (please correct me if I'm wrong via comments), AWS is more direct: edge -> origin. Alternatively, it may be something I haven't thought of.

Further thoughts

It would be nice to have been able to test from more locations around the world and against more than one origin. This would perhaps better point to the reasons behind the differences.

Caveats

This is a semi-scientific but very small-scale and relatively limited test/comparison, but I haven't seen anything similar so I wanted to put it out there in case it's helpful. Read into it what you will, but you should do thorough work of your own before selecting a CDN.

Edit 1: Added duration of test info
Edit 2: Added Pingdom network location info, corrected typos, clarified network routes to origin, added TTL info for cached objects, added Cloudfront config info
Edit 3: Amended incorrect "Metric" column title order (fastest & slowest were the wrong way around) - thanks @CyrilDuprat!

http://thedotproduct.org/a-simple-and-rough-comparison-of-akamai-and-cloudfront-cdns/ - Tue, 18 Mar 2014 16:30:24 GMT
Splunk regular expression modifier flags

I use Splunk on a daily basis at work and have created a lot of searches/reports/alerts etc. A fair number of these use regular expressions (the Splunk "rex" function) and today, I absolutely had to be able to use a modifier flag, something of a rarity for me in Splunk.

As it turns out, modifier flags are not described in the Splunk rex documentation (unless I somehow missed it), so I had to do a bit of digging to find out how to do this. The upshot is that it's very simple, for example:

rex field=hostname "(?Ui)^(?<year>\d{4})-(?<month>\d{1,2})-(?<date>\d{1,2}) (?<hours>\d{1,2}):(?<minutes>\d{1,2}):(?<seconds>\d{1,2}).*$"  

The flags used in this example are in the leading (?Ui) before the caret (^):

  • U - ungreedy match
  • i - case-insensitive match

but you can use any PCRE modifier flags you want, e.g. multiline would be (?m).

Splunk uses PCRE regular expressions and there's a handy PCRE regex cheatsheet I found and also a really good regex tester.

http://thedotproduct.org/splunk-regular-expression-modifier-flags/ - Wed, 12 Feb 2014 16:30:29 GMT
Percona installation missing init script

I have just migrated from an RDS instance to a local Percona instance on a Debian server and encountered an odd issue...

After installation, which all appeared normal at first, I tried to start percona and there was no init script (which would usually live at /etc/init.d/mysql). Disaster!

Some googling ensued but to no avail. Then I decided to check out some likely locations and sure enough found the file I needed at /usr/share/mysql/mysql.server. Maybe this will help someone else out of the same situation!
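For completeness, a rough sketch of putting that bundled script into service as the usual init script (run as root or via sudo, and verify the paths on your own install first):

# Install the bundled script as the init script and register it
cp /usr/share/mysql/mysql.server /etc/init.d/mysql
chmod +x /etc/init.d/mysql
update-rc.d mysql defaults
service mysql start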

P.S. Percona has several pre-built my.cnf files (my-huge.cnf, my-medium.cnf etc.) which are also in /usr/share/mysql but a newer development is their configuration wizard which is well worth a try!

http://thedotproduct.org/percona-installation-missing-init-script/ - Wed, 12 Feb 2014 10:44:02 GMT
Secure SSL certificate configuration with Nginx

tl;dr: Secure SSL cert config for Nginx (grade A-rated on GlobalSign SSL cert checker)

UPDATE 8th July 2014: I have amended my config slightly to use OWASP recommended ciphers and some updates to Debian core libs mean this configuration now produces an A+ result.

It's been quite some time since I wrote a blog post and there are a fair number of reasons for this, not least that I have migrated this blog from Wordpress to Ghost, which took up a lot of time as it also included a platform migration to AWS. During this migration, I decided that my architecture would include SPDY, so I needed an SSL certificate and thus needed to configure said certificate.

As a bit of pertinent background information, my web server runs Nginx which proxies requests to Ghost, a NodeJS app. My Nginx install comes from the awesome dotdeb so SPDY support is really easy, as easy as installing the SSL cert and adding the "SPDY" keyword to the listen directive in Nginx:

server {
    listen 443 ssl spdy;
    server_name thedotproduct.org;
    ...
}

So that's great, we're up and running with SPDY, tidy!

After getting SPDY working, I wanted to check my SSL cert was working for people other than just me, on my browser, in my little world, so I ran a test using GlobalSign's SSL cert checker, which was really helpful. I then started reading, as SSL ciphers and so on are something I knew a little but not a lot about. I confess that I'm still not a hardcore expert, but I have a decent understanding of the basics now.
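Alongside the online checkers, a quick command-line sanity check of what a given client negotiates can be done with openssl s_client, for example (hostname as used in this article):

# Show the negotiated protocol and cipher for a connection to this site
openssl s_client -connect thedotproduct.org:443 -servername thedotproduct.org < /dev/null 2>/dev/null | grep -E 'Protocol|Cipher'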

Immediately from the test results, it was clear that some basic remedial actions were needed such as disabling SSLv2 and SSLv3 since no decent, half-modern browser needs them and they contain known weaknesses. Choosing the best combination of SSL ciphers was less simple but after a number of hours reading, tweaking and testing, I came up with the following configuration which achieves a grade A rating on the GlobalSign SSL cert checker:

Here are the test results:
thedotproduct.org SSL cert test results

The only compromise I am left with is being vulnerable to the BEAST attack. The choice I had in this regard was either using an RC4 cipher, which is fairly widely believed to have been compromised by the NSA, or remaining vulnerable to the BEAST attack; since BEAST has now been mitigated in most modern browsers and is by all accounts extremely difficult to execute, I see that as the less nasty option. I could simply disable the older TLS versions, but too many browsers require them (either because they don't support newer cryptographic protocols or because they're, for some ridiculous reason, disabled by default - IE, Firefox, I'm looking at you!) so we'll have to live with them for the moment at least. Here's a rundown on what is configured (with wording according to the test results):

If you're going for FIPS or PCI DSS compliance or a similar accreditation, you should note that this will almost certainly mandate how you handle keys and other sensitive info/files as well as the way in which your SSL certificate usage is configured.

I am pretty confident that this is the best (which is an opinion and/or requirement-specific of course) SSL config possible right now, at least under Nginx but if you know different, please let me know via a comment below.

Hopefully this'll help if you need to set up an SSL cert under Nginx. Many thanks to those who wrote the references I used, mainly these:

Cheers!

http://thedotproduct.org/secure-ssl-certificate-configuration-with-nginx/ - Fri, 06 Dec 2013 12:17:51 GMT
Javascript performance: document.getElementById versus document.querySelector and document.querySelectorAll

I've been lucky enough to be working on an internal project the last few weeks which has a known set of modern browsers as the target audience. This means I've been able to ignore some of the older design/development issues.

I am really liking document.querySelector and document.querySelectorAll, which are relatively modern (well, to me at least), as they're so convenient to use. This did make me wonder what their performance would be like, though; typically convenience is inversely proportional to performance, so I thought I'd put together a quick test on jsperf.com:

http://jsperf.com/document-getelementbyid-versus-document-queryselector

The results vary in scale by browser but the overall trend is that document.getElementById is much faster than document.querySelector which is in turn a fair amount faster than document.querySelectorAll (which is probably a more obvious outcome).

So there you have it! Use document.getElementById wherever you can and only use document.querySelectorAll if you definitely need to locate multiple elements.

Cheers!

UPDATE: Just tested on Opera 12.11 (OSX) and bizarrely, it's faster at document.querySelectorAll. Overall though, Opera is a great deal slower in this test case than Chrome/Firefox. Safari 6 on OSX is immensely fast for document.getElementById.

http://thedotproduct.org/javascript-performance-document-getelementbyid-versus-document-queryselector-and-document-queryselectorall/ - Mon, 26 Nov 2012 12:00:00 GMT
Previous sibling, the missing CSS selector?

CSS 2.1 has some really handy selectors, one of which is the adjacent (next) sibling selector which has the form:

el1 + el2
{
color:#f0f;
}

The above would apply a tasty pink(ish) text colour to el2 where it directly follows el1 in HTML element order. Excellent, that can be seriously useful.

The glaring omission (as far as I can see) in the CSS selectors currently available, though, is the exact opposite selector, previous-sibling, which might perhaps have the syntax:

el1 - el2
{
color:#f0f;
}

so I would see this as the obvious way to style el2 where it occurs directly before el1, with that same delightful pink(ish) text colour. This would have been immensely helpful in the project I am working on right now: I'm using a flexbox layout on a Zend Framework form and want to swap around the order of the input and label when and only when the input is a checkbox, so I'd have loved to have been able to do:

label - input[type="checkbox"]
{
order:-1;
}

on HTML source of:

<div>
<label for="a">
Label text
</label>
<input type="checkbox" name="a" id="a">
</div>

There is also currently a non-direct sibling selector which uses a tilde in place of the plus; the opposite of this could perhaps be:

el1 -~ el2
{
color:#f0f;
}

Please, please, please can we have these, browser devs? I'm certainly not the first to ask for them, and I am aware that e.g. jQuery implements these, so it obviously makes sense.

http://thedotproduct.org/previous-sibling-the-missing-css-selector/ - Thu, 08 Nov 2012 12:00:00 GMT
How to install a Java runtime on Apple Mac OSX Mountain Lion

One of the (perhaps) understandable but more irritating aspects of Apple Mac OSX upgrades, which I just experienced upgrading from Lion to Mountain Lion, is that the Java runtime (JRE) is uninstalled... This meant that my usual code editor/IDE, Netbeans, was broken - not good!

So I searched around and found some installers, e.g. from Oracle, but I couldn't be 100% sure they were going to work. Experience tells me that I should be careful installing JREs as broken installs can be a real pig to fix.

So I thought I'd just double-check that my JRE was definitely removed by launching a terminal and typing "java -v" (which actually seems to be incorrect, but was a guess). Luckily enough, OSX detected that I wanted a JRE and offered to download it for me. The install worked as far as I can tell and Netbeans is now working, bonus!
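For the record, the flag I was fumbling for is -version; either way, invoking java without a JRE installed is what triggers the OSX download prompt:

# Prints the JRE version if installed, otherwise OSX offers to install one
java -version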

Just thought that might help someone out and save some time messing around.

http://thedotproduct.org/how-to-install-a-java-runtime-on-apple-mac-osx-mountain-lion/ - Sat, 28 Jul 2012 10:00:00 GMT
NodeJS NPM registry add/update notifications via Twitter

I have been messing with NodeJS for a little while now and contrary to my preconceptions, I quite like it. NodeJS is not the right choice for every task (as with any development paradigm/language) but it has some great use cases for i/o bound tasks and is definitely easy to use!

NodeJS uses a module management system called NPM which makes it easy to obtain, manage and create NodeJS modules.

The npm-updates module was added a little while back and it made me think: wouldn't it be cool if I could allow people to hook into the alerts... so I threw together a very rough (and I do mean very rough) app last night and pushed it to Heroku. It seems to be running ok, so if NodeJS is your thing, maybe you'll want to take a look. I made 2 accounts in case you are only interested in either additions or updates.

Twitter accounts to follow are:

Additions to NPM: @npmadditions

Updates to NPM modules: @npmupdates

Happy noding!

http://thedotproduct.org/nodejs-npm-registry-addupdate-notifications-via-twitter/ - Fri, 13 Jul 2012 10:00:00 GMT