So you and your team have been talking about your new social web app. Development has been going well, and someone had the bright idea: “Hey, we should automatically populate some information from the links our users share. Maybe pull in an image or two too!”

Now you have to build a web scraper. How hard could it be?

Don’t you worry, friend. We’ll do it together.

(Mess around with the actual web scraper I built live!)

Make the Client Do the Work

Web scraping is not terribly resource intensive, but maybe you’re dreaming big and want to do a live scrape for every link anytime one of your users looks at it. No problem, add it to the client code and keep the extra load off your server!

Yeah . . . no. Unfortunately, because of browsers’ same-origin policy, the client cannot fetch HTML from anywhere but your own server (or other servers that have explicitly whitelisted your origin ahead of time). To avoid using your own server, you’re basically going to have to hook into some proxy service, which really isn’t helping anything.

So let’s just build the scraper on our own server.

Your Tools

You’re going to want to head over to your console and npm install request and cheerio.

Request is the standard for server-side HTTP requests. It makes it dead simple to send GET, POST or whatever requests to wherever you want. Even better, install request-promise instead of request, to get all that request goodness with some clean and clear Bluebird promises baked in.

Meanwhile, Cheerio is a server-side implementation of jQuery. You are going to be parsing through some html files in a second, and you are definitely going to want access to jQuery syntax when you do it.

Fetching the HTML

Using request-promise to get the html you want couldn’t be simpler:

var request = require('request-promise');

request(url)
.then(function (html) {
  scrape(html);
});

That’s it. So now that we have html, what do we do with it? What goes into that little scrape function there?

Cheerio And You

Cheerio requires a little bit of extra setup. After requiring it normally, you use the load method to create a jQuery-like $ object with all of the HTML you’re planning to scrape. So, if your goal was just to return the title of a webpage, your code might look something like this:

var cheerio = require('cheerio');

var scrape = function(html) {
  var $ = cheerio.load(html);

  return $('title').text();
};

Once properly loaded, cheerio works much like jQuery, giving you access to most of the same selectors and methods you would normally have on the front-end. So now that you have all of this power, what do you do with it?

meta Is So Meta

There’s a good chance that whatever information you wanted to scrape can be found in a site’s metadata. With the rise of social networks, most webpages are hoping to be shared, liked, tweeted, upvoted, pinned, or übermensched. Enter the meta tag. Designed to contain information about a site’s title, subject matter, authorship, and more, these tags are left in the head of a page for enterprising social media gurus like yourself (and Facebook, mostly Facebook) to find. For example, here’s some of the HTML from a Udemy course:

<head>
  <title>Advanced React and Redux | Udemy</title>

  <meta name="title" content="Advanced React and Redux - Udemy">
  <meta property="udemy_com:category" content="Development">
  <meta property="udemy_com:instructor" content="https://www.udemy.com/user/sgslo/">

  <meta property="og:title" content="Advanced React and Redux - Udemy">
  <meta property="og:url" content="https://www.udemy.com/react-redux-tutorial/">
  <meta property="og:description" content="Detailed walkthroughs on advanced React and Redux concepts - Authentication, Testing, Middlewares, HOC&#39;s, and Deployment">
  <meta property="og:image" content="https://udemy-images.udemy.com/course/480x270/781532_8b4d_6.jpg">

  <meta name="twitter:title" content="Advanced React and Redux - Udemy">
  <meta name="twitter:url" content="https://www.udemy.com/react-redux-tutorial/">
  <meta name="twitter:description" content="Detailed walkthroughs on advanced React and Redux concepts - Authentication, Testing, Middlewares, HOC&#39;s, and Deployment">
  <meta name="twitter:image" content="https://udemy-images.udemy.com/course/480x270/781532_8b4d_6.jpg">

  <meta itemprop="name" content="Advanced React and Redux - Udemy">
  <meta itemprop="url" content="https://www.udemy.com/react-redux-tutorial/">
  <meta itemprop="description" content="Detailed walkthroughs on advanced React and Redux concepts - Authentication, Testing, Middlewares, HOC&#39;s, and Deployment">
  <meta itemprop="image" content="https://udemy-images.udemy.com/course/480x270/781532_8b4d_6.jpg">

</head>

Well hello metadata. So what is all of this? Well, there isn’t yet any standard metadata system, and since meta tags are basically roll-your-own, most social networks have done just that. The og tags are used by Facebook, the twitter tags by Twitter. Udemy has even built some of their own custom udemy_com tags. Most sites will try to cover all of their bases, and include a hodge-podge of tags, many redundant, just to make sure they don’t spoil their chances of being the next viral sensation.

In all of these cases you’ll use attr('content') to get the information you need from the meta tag, but the selector is a little more complicated. Standard class or id selectors obviously won’t work, which is why you’ll want jQuery’s attribute selector. For the Twitter title for example, that would look like $('meta[name="twitter:title"]'). For the OG title, it would be $('meta[property="og:title"]'). Armed with these tools, we could easily write a simple metadata scraper:

var cheerio = require('cheerio');

var scrape = function(html) {
  var $ = cheerio.load(html);

  return {
    title: $('meta[property="og:title"]').attr('content'),
    url: $('meta[property="og:url"]').attr('content'),
    desc: $('meta[property="og:description"]').attr('content')
  };
};

This code will work assuming every site always uses Facebook metadata, which is not as reliable as you might think. And what if, heaven forbid, they don’t use meta tags at all? If your needs are simple, you may be fine with a scraper that only works most of the time, but if you’re planning on building something really robust, you’ll need a system.

Keep It Organized With JSON

What we want is an array of possible tags we can have Cheerio cycle through until it finds a match. This is exactly the sort of data we can store out of the way in a separate JSON file. For example, if we wanted to design a slightly more robust scraper for title and description, the JSON file might look like this:

{
  "title": [{
    "prop": "name", 
    "val": "title"
  }, {
    "prop": "itemprop", 
    "val": "name"
  }, {
    "prop": "property", 
    "val": "og:title"
  }, {
    "prop": "name", 
    "val": "twitter:title"
  }],

  "description": [{
    "prop": "itemprop",
    "val": "description"
  }, {
    "prop": "property",
    "val": "og:description"
  }, {
    "prop": "name",
    "val": "twitter:description"
  }]
}

With that in place we can write a simple function to loop through the specified options until it finds something.

var cheerio = require('cheerio');
var targets = require('./targets.json');

var scrape = function(html) {
  var $ = cheerio.load(html);

  // This function cycles through possible meta tags until it finds a match
  var scrapeFor = function(type) {
    for(var i = 0; i < targets[type].length; i++) {
      var prop = targets[type][i].prop;
      var val = targets[type][i].val;

      var scraped = $('meta[' + prop + '="' + val + '"]').attr('content');
      if (scraped) return scraped;
    }
  };

  return {
    title: scrapeFor('title'),
    desc: scrapeFor('description')
  };
};

The big advantage of this architecture is that as you discover new pages with weird tags, adding them is as simple as adding a new object to your JSON file. Of course, if you ever want to scrape something other than meta tags, your code will have to get a fair amount more complicated (heaven help you if you start scraping text), but I’m sure you can hash out all those details for yourself.
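If you want to poke at that fallback loop without pulling in Cheerio, here is a minimal sketch of the same idea. The names toSelector, firstMatch, and fakePage are mine and not part of any library, the inline targets object is a trimmed-down stand-in for the JSON file, and lookup is just a placeholder for something like $(selector).attr('content').

```javascript
// Hypothetical targets, mirroring the shape of the JSON file above.
var targets = {
  title: [
    { prop: 'property', val: 'og:title' },
    { prop: 'name', val: 'twitter:title' }
  ]
};

// Build the attribute selector for one target entry.
var toSelector = function(target) {
  return 'meta[' + target.prop + '="' + target.val + '"]';
};

// Try each target in order and return the first truthy lookup result.
// Returns undefined when nothing matches or the type is unknown.
var firstMatch = function(type, lookup) {
  var list = targets[type] || [];
  for (var i = 0; i < list.length; i++) {
    var found = lookup(toSelector(list[i]));
    if (found) return found;
  }
};

// Fake page for demonstration: only the Twitter tag "exists".
var fakePage = { 'meta[name="twitter:title"]': 'Hello from Twitter' };
var result = firstMatch('title', function(selector) {
  return fakePage[selector];
});

console.log(result); // logs: Hello from Twitter
```

Keeping the lookup as a parameter means the ordering logic can be tested with a plain object, and swapped over to Cheerio later without changing the loop.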

For those of you unfamiliar, Mongoose is a popular ODM for MongoDB. If the previous sentence made no sense to you, you should probably follow those links and do some reading, because this is about to get . . . technical.

Among Mongoose’s many conveniences there is a variety of middleware called “hooks”, which allow you to listen for certain database events, and trigger a function when they happen. A common example would be the “pre save” hook, which is often used to hash a user’s password before saving it to the database. That might look something like this:

UserSchema.pre('save', function(next) {
  var user = this;

  if (!user.isModified('password')) return next();

  bcrypt.hash(user.password, null, null, function(err, hash) {
    // Pass any hashing error along so the save is aborted
    if (err) return next(err);
    user.password = hash;
    next();
  });
});

Here we see Mongoose at its best. It almost reads like English. Before we save a user, check to see if their password is being modified. If not, skip to the next thing. If so, hash it, and then go to the next thing. As long as the developer remembers to call next at some point in their hook so that Mongoose can continue, it is really hard to screw this thing up. And you can create a “post” hook just as easily:

UserSchema.post('save', function(user) {
  console.log('User saved', user._id);
});

Note that with a post hook, Mongoose isn’t going to wait for you to finish, so there is no next function passed in. Instead, you get the new document that was stored in the db. Hooray! Easy!

If your need for middleware stops there, then you probably know all you need to. But as I learned the hard way in a recent project, if you want some higher level interactions, it starts to get much, much more complicated very quickly. What started as a convenience begins to feel like a weight on your shoulders. Don’t worry, I’m here to help.

The Many Many Events

Although save is by far the most common event to listen for in these hooks, you actually have a ton of options: save, init, validate, update, find, findOne, findOneAndUpdate . . . . Wait. What? How is “find” different from “findOne”? Does “update” fire when you “findOneAndUpdate”? “init” is for . . . when your document is new?

What each of these does and when it fires is not always particularly clear, and is generally poorly documented online. Trying to build the hooks you need for complex interactions with your various schemas can end up being a real minefield. So let’s start with just listing all of the available events, what they mean, and when they fire:

save
  • Refers to a document being saved to the database
  • Basically just fires on doc.save() or Doc({...}).save()
  • Will not fire on update. More on that later.
init
  • Refers to contact with the database being initialized
  • In other words, it has nothing to do with initializing a new document
  • Fires first on just about everything: find, findOneAndUpdate, save (usually), etc
  • But won’t fire on remove, update, or save when creating a new doc (i.e. Doc({...}).save())
remove
  • Refers to a document being removed/deleted from the db
  • Fires on doc.remove()
validate
  • Refers to Mongoose validating the properties of an object before saving it
  • Fires before “save” does on doc.save() and Doc({...}).save()
update
  • Refers to . . . the Mongoose method “update” being called
  • Has nothing to do with the general concept of db items being updated
  • Fires on doc.update(), and that is it
  • Does not fire on findOneAndUpdate, findByIdAndUpdate, or anything else
find
  • Same deal as update, refers to the method “find”, and that is all
  • Fires on Doc.find
  • Does not fire on Doc.findOne, Doc.findById, etc
findOne, findOneAndUpdate, findAndUpdate, findById, findByIdAndUpdate
  • Like find and update, you can probably guess when these events fire

Wha? Why?

It’s best to think of these events as being in two separate categories. save, init, remove, and validate are all events fired by interaction with the database itself. update, find, and the others are all events fired by a particular Mongoose method being invoked. Why the distinction? And why doesn’t a general db event like save fire when you are updating a document?

The reasons here are actually fairly technical. It is not actually possible to trigger a save event when you update a doc. You see, all of the various versions of update (update, findOneAndUpdate, etc) are what are called “atomic” methods. They don’t actually pull anything out of the database. They go in, find the thing, and then modify it in place. So when you “save”, by grabbing a thing, pulling it out, modifying it, and putting it back, the database will let everyone know: “Hey, someone is saving something here!” But if you just sneak in, and make a little tweak in place with update, the db has nothing to say.

Most of these “atomic” triggers then, are actually a convenience ginned up for Mongoose 4. In the past if you wanted to use nifty methods like findOneAndUpdate, you wouldn’t have been able to use any hooks at all. They are definitely a huge value add for the ODM, but lumping all of these different sorts of events together can lead to a lot of confusion for unfamiliar developers. So let’s clear up some important differences now.

Order Matters

First, declaring the hooks. For all of the method based events, order is very important:

// This is the proper implementation.
var mongoose = require('mongoose');

var UserSchema = new mongoose.Schema({...});

UserSchema.pre('save', function(next) {...});
UserSchema.pre('update', function(next) {...});

module.exports = mongoose.model('User', UserSchema);

The above will work fine. Both hooks are declared before you build the User model by invoking mongoose.model(). However, the below is broken:

// This update hook will never fire. Ever.
var mongoose = require('mongoose');

var UserSchema = new mongoose.Schema({...});

module.exports = mongoose.model('User', UserSchema);

UserSchema.pre('save', function(next) {...});
UserSchema.pre('update', function(next) {...});

You might shrug this off at first, “just declare hooks first”, but this rule does not apply to save, init, validate, or remove. The following works just fine:

// This works. Because reasons.
var mongoose = require('mongoose');

var UserSchema = new mongoose.Schema({...});

UserSchema.pre('update', function(next) {...});

module.exports = mongoose.model('User', UserSchema);

UserSchema.pre('save', function(next) {...});

Why? I actually have no idea on this one. But watch out.

This Is Different

You may have noticed in my original “pre save” example, we made good use of the this variable. That time, this referred to the actual user we were saving. Super useful and convenient. In fact, it would have been hard to hash the password without it. Does this refer to the same thing in a “pre update” hook? Well . . . try running this code sometime if you’re feeling adventurous:

UserSchema.pre('update', function(next) {
  console.log(this);
  next(); // without this, the update would never continue
});

What you’ll get is a huge garbled mess of private variables and various methods. You, my friend, are looking at a Mongoose Query object. Wh-why? What am I supposed to do with this? Well, once again we’re running up against some technical limitations. Because we’re working atomically, nothing has been pulled out of the database. There is simply no user model to give you. The same applies before a find resolves. Sorry.

The Query object does give us some options though. For example, say you have an updatedAt timestamp that you want to change on every save. That is easy as pie, and looks like this:

UserSchema.pre('save', function (next) {
  this.updatedAt = Date.now();
  next();
});

Brilliant. But I bet you can’t guess how we’d do the exact same thing when we’re handed a Query on update:

UserSchema.pre('update', function (next) {
  this.update({}, { updatedAt: Date.now() });
  next();
});

Although you don’t have the object itself, the Query object does have access to the same update function you used originally. So what you are doing here is adding to that. And what if you want to do the same thing on findOneAndUpdate? Remember, these just trigger on the method call, so you’ll have to set up a hook for each individual method you are listening for.
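Since every update-style method needs its own hook, one way to avoid copy-pasting is to register a single function in a loop. What follows is only a sketch: fakeSchema is a made-up stand-in so the example runs without a database, and touchUpdatedAt is a hypothetical name. With real Mongoose you would call pre on your actual schema instead.

```javascript
// A tiny stand-in for a Mongoose schema so this sketch runs on its own.
// It only records which function was registered for which event.
var fakeSchema = {
  hooks: {},
  pre: function(event, fn) {
    this.hooks[event] = fn;
  }
};

// Shared logic. `this` is expected to be the Query, as described above.
var touchUpdatedAt = function(next) {
  this.update({}, { updatedAt: Date.now() });
  next();
};

// Register the same function for each method we want to listen for.
['update', 'findOneAndUpdate'].forEach(function(method) {
  fakeSchema.pre(method, touchUpdatedAt);
});

console.log(Object.keys(fakeSchema.hooks));
```

The list of method names is the only thing you need to touch when you start using another update-style method elsewhere in your code.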

Examples

At this point your questions may be answered, and you may be ready to go out and conquer the world of Mongoose middleware. Excellent. If not, keep reading for some actual implementations my team and I used in my recent project Roadmap To Anything.

The setup: we have two schemas, User and Roadmap. A roadmap is an ordered collection of resources and lessons that people can follow to learn a thing. The user’s schema must be able to track three sets of roadmaps: ones they’ve personally created, ones they’ve begun but haven’t finished, and ones they’ve finished. The UserSchema then, will look something like this:

var mongoose = require('mongoose'),
    ObjectId = mongoose.Schema.ObjectId,
    hooks    = require('../modelHooks.js');

var UserSchema = new mongoose.Schema({
  username     : { type: String, required: true, unique: true },
  password     : { type: String, required: true },
  roadmaps     : {
    authored   : [ {type: ObjectId, ref: 'Roadmap'} ],
    inProgress : [ {type: ObjectId, ref: 'Roadmap'} ],
    completed  : [ {type: ObjectId, ref: 'Roadmap'} ]
  }
});

hooks.setUserHooks(UserSchema);

module.exports = mongoose.model('User', UserSchema);

Notice that our hooks were complicated and inter-related enough that it became worthwhile to move them to their own file. Also notice that they are still declared before the User model is instantiated!

Now, let’s build our hooks. First, when a new roadmap is created, we want a hook that will automatically add a roadmap to its author’s authored array. Since we are going to use save exclusively for creating new documents, and various versions of update for updating existing documents, we can use a “pre save” here:

RoadmapSchema.pre('save', function(next) {
  var User = require('./users/userModel.js');
  var authorId = this.author;
  var roadmapId = this._id;

  var update = { $push: { 'roadmaps.authored': roadmapId } };

  User.findByIdAndUpdate(authorId, update)
  .then(function() {
    next();
  })
  .catch(next); // hand any db error to Mongoose instead of hanging
});

Easy enough. Before saving itself, the roadmap will look for the author designated in its own author property and then send a db request to push itself to that user’s authored roadmaps array. Assuming we only ever use save for new roadmaps, we’re all set to go. If we were worried about save sometimes being used to update existing roadmaps, we could modify this solution by throwing in a check like if (this.isNew), or by using $addToSet instead of pushing.

Now, for our next hook we are going to need update. Anytime a user is modified by adding a roadmap to their completed roadmaps array, we want to automatically remove that same roadmap from their in progress array. A roadmap cannot be both in progress and completed. Since our plan is to use functions like User.findOneAndUpdate and User.findByIdAndUpdate elsewhere in our code, we cannot rely on the “save” event. And since there are three different possible method triggers we could be listening for, we’ll first abstract our logic into a helper function which we can call on each trigger:

UserSchema.pre('update', function(next) {
  handleCompletedRoadmaps.call(this, next);
});

UserSchema.pre('findOneAndUpdate', function(next) {
  handleCompletedRoadmaps.call(this, next);
});

UserSchema.pre('findByIdAndUpdate', function(next) {
  handleCompletedRoadmaps.call(this, next);
});

Notice how we make sure to call our helper function and pass in the current this context. As messy as that Query object is, we’re going to need it.

var handleCompletedRoadmaps = function (next) {
  var completeId;

  if (this._update.$addToSet) {
    completeId = this._update.$addToSet['roadmaps.completed'];
  }

  if(completeId) { 
    this.update({}, { $pull:{ 'roadmaps.inProgress': completeId } });
  }
  
  next();
};

Woof. Okay, let’s parse through this. While adding to an update is relatively straightforward (just use this.update()), figuring out what is already being updated, like we need to do here, is a little bit more of a challenge. Actually, if anyone knows a more standards compliant way to do it, please email it to me. In the meantime, this somewhat hacky way is the best I was able to figure out.

So, on that Query object will be a property called _update which is an object that contains all of the different updates you are about to make. What we want to know is if anyone is trying to add to the roadmaps.completed set. If they were, _update would look like this: {$addToSet: {'roadmaps.completed': 'SOME REALLY LONG ID STRING'}}. So first we check to make sure there is an $addToSet property, and then we set completeId to that long id string.

Assuming we set completeId to something, we know we have to add a $pull to our update, and do so using the same this.update method we used in the updatedAt example earlier. Invoke next and we’re done.
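If you want to test that lookup on its own, it can be pulled out into a small pure function. This is only a sketch: findCompletedId is a name I made up, and it assumes the update object has the shape described above. Mongoose queries also have a getUpdate() method that returns the pending update, which may be a cleaner way to get at the same object than reaching for _update directly, though you should verify that against the Mongoose version you are running.

```javascript
// Pull the roadmap id out of an update object, if one is being added to
// roadmaps.completed. `update` is a plain object shaped like the Query's
// pending update, e.g. { $addToSet: { 'roadmaps.completed': id } }.
var findCompletedId = function(update) {
  if (!update || !update.$addToSet) return undefined;
  return update.$addToSet['roadmaps.completed'];
};

// An update that completes a roadmap.
console.log(findCompletedId({ $addToSet: { 'roadmaps.completed': 'abc123' } }));

// An unrelated update, which yields undefined.
console.log(findCompletedId({ $set: { username: 'bob' } }));
```

Inside the hook, the helper could then be fed either this._update or the result of this.getUpdate(), and the $pull logic stays the same.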

That’s it! Hopefully these examples, and the preceding explanations, helped you grasp a little more of the logic and methodology behind Mongoose hooks. The learning curve is steep, but they are indispensable tools which allow you to wire up your database in loads of interesting and powerful ways. Happy coding!

So you’ve been programming in JavaScript for a little while now. Your code is getting lean and mean and DRY as a California reservoir. Maybe you’ve even started to take advantage of some of JavaScript’s single-line conditionals and you’re wondering just how deep this rabbit hole goes. Maybe you have no idea what I’m talking about when I say “single-line conditionals”. Whether you’re a doe-eyed newbie or a hardened professional, this post has some fun tricks you may be interested in.

Braces Optional

The traditional way to write an if statement looks something like this:

var five = 5;

if (five > 4) {
  console.log('Five is greater than four!');
  spreadTheNews();
}

This is all fine and good, sticks to practices we’re used to, makes a lot of sense. But what if we only have a single line of code in between those braces?

if (five > 4) {
  spreadTheNews();
}

Three lines of code for what amounts to one simple if/then statement? I don’t know about you, but this is starting to feel downright wasteful. Well, the fix in this case is rather simple. Just kill the braces:

if (five > 4) 
  spreadTheNews();

This is a totally valid JS statement and will execute just fine. Anytime an if is not followed by curly braces, JavaScript will look for the next statement, and consider that the then part of your conditional. Even better, since whitespace is ignored, let’s kill that too:

if (five > 4) spreadTheNews();

BAM! Single-line conditional. Not only is it less code, but by removing a bunch of extraneous braces, I think we’ve actually made our code more readable too. And what if our simple if/then has a simple else? Not a problem, else works the same way:

if (five > 4) spreadTheNews();
else rethinkMath();

Simple. Readable. Short. My favorite kind of code. Technically you can do the same thing with else if’s too, though in that case I might add a bit of white space back in to help with readability. Of course, if your plan was if/else all along, there may be a better tool:

The Ternary Operator

The ternary operator (so named because it takes three operands) is one of the more intimidating pieces of JavaScript syntax a new coder is likely to encounter. It looks strange and alien, and the way it works is sometimes profoundly unclear. However, if you really want to save space, you can write the above if else statement in one single line:

five > 4 ? spreadTheNews() : rethinkMath();

Frankly, I find that the ternary operator really hurts readability, and I generally avoid it for that reason. You could add some white space to help clear things up:

five > 4
  ? spreadTheNews()
  : rethinkMath();

This is a debatable improvement, and no longer satisfies our single line desires. Why not just go back to an explicit if else at this point? Well, I usually do. BUT, there is a scenario in which there is no substitute for our ternary frenemy: assignment.

var message = five > 4 ? 'excellent!' : 'wtf?';

Unlike an if else statement, the ternary operator is an operator. That means you are free to use it on the right-hand side of an assignment, which would throw one heck of a syntax error if you tried it with if else. Though this isn’t necessarily any more readable than other ternary uses, it saves an amazing amount of code when you compare it to the alternative:

var message;

if (five > 4) {
  message = 'excellent!';
} else {
  message = 'wtf?';
}

Gross.

The Case For Defaults

It turns out that there are more operators we can press into service to make our conditionals cleaner. One common example is to use the logical OR (||) to create default values in functions. For example:

var returnInputOrFive = function(input) {
  input = input || 5;
  return input;
};

If you’ve never seen it before, this construct may be a little confusing, though it does read in a remarkably sensible way: “input equals input or five.” In other words, if there’s an input, input should be that, if not, it should be five. Just like with a ternary, the beauty of this set up is that we can put this conditional in places you couldn’t put an if statement. Like for instance, as part of the return statement:

var returnInputOrFive = function(input) {
  return input || 5;
};

Same effect. Less Code. More readable. And imagine the alternate version using if else. Might as well go back to punch cards at that point.

But how does this bizarre hack of the OR operator actually work? The secret is in how JavaScript handles logical operators. In the case of ||, JS is trying to determine whether either of the two operands is “truthy”. As soon as it sees the first one is, there is no reason to bother with the second. So it doesn’t. Furthermore, JS never bothers converting a truthy value to true, or a falsey value to false. Why bother? If the first operand is truthy, just return it. If not, the truthiness of the second operand will determine if the overall expression is truthy or not. So don’t even check it, just return it and be done.
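A quick way to convince yourself of this is to look at what || actually hands back. The variable names here are just for illustration.

```javascript
// Each expression evaluates to one of its operands, not to true or false.
var a = 'first' || 'second';     // left side is truthy, so it is returned as-is
var b = undefined || 'fallback'; // left side is falsey, so the right side is returned
var c = null || 0;               // the right side is returned even though it is falsey too

console.log(a, b, c);
```

Notice that the last case never produces false: when both operands are falsey, you simply get the second one back.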

One big gotcha to watch out for here: be sure the value of input can’t be a falsey value that you want to keep. In the above code for example, if we passed in an input of 0, we would ignore it and return 5. The classic falsey values are false, 0, NaN, '', undefined, and null, and if our input evaluates to any of them, our default will be returned. But if that is the sort of behavior you are looking for, you can really clean up your code this way. Does that mean we can use && to write single-line conditionals too?
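If you do need to keep values like 0, one option is to check for undefined explicitly instead of leaning on ||. This is just a sketch of that alternative, and returnInputOrFiveStrict is a name I made up for it.

```javascript
// Only fall back to 5 when no argument was supplied at all.
var returnInputOrFiveStrict = function(input) {
  return input === undefined ? 5 : input;
};

console.log(returnInputOrFiveStrict());  // logs 5
console.log(returnInputOrFiveStrict(0)); // logs 0, where the || version would log 5
```

It is a little longer than the || version, so it is only worth reaching for when a falsey input is actually meaningful.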

Using && to Write Single-Line Conditionals

Similar to the logical OR, && checks to see if either of two operands is falsey. If the first operand is, there is no point in checking the second. This behavior is not taken advantage of nearly as often as ||, but I did just write some actual server code that I couldn’t have written any other way:

module.exports.seedUsers = function(next) {

  var addUser = function(i) {
    if (i === users.length) return next && next();

    User(users[i]).save()
      .then(function () {
        addUser(i + 1);
      });
  };

  addUser(0);
};

The above helper function may seem a little daunting out of context, so allow me to offer a brief explanation. Using an array of users defined elsewhere, I am creating a series of User objects in my database. The order is important here, so I can’t iterate through users with a simple for loop. If one User happens to be slow to save, the next one would end up being created first. Not good. By using a recursive function I can ensure that each iteration will wait for the one before it.
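To see the ordering guarantee without a database, here is a rough, self-contained version of the same pattern. Everything in it is a stand-in: saveInOrder and fakeSave are hypothetical names, and fakeSave uses a timer to imitate a save that sometimes takes longer.

```javascript
// Save each item in order, waiting for one save to finish before starting
// the next. `save` is any function that returns a promise.
var saveInOrder = function(items, save, next) {
  var step = function(i) {
    if (i === items.length) return next && next();

    save(items[i]).then(function() {
      step(i + 1);
    });
  };

  step(0);
};

// Fake save: the first item is deliberately the slowest, so kicking off
// all three saves at once would finish them out of order.
var saved = [];
var fakeSave = function(name) {
  return new Promise(function(resolve) {
    setTimeout(function() {
      saved.push(name);
      resolve();
    }, name === 'alice' ? 30 : 5);
  });
};

saveInOrder(['alice', 'bob', 'carol'], fakeSave, function() {
  console.log(saved.join(', ')); // logs: alice, bob, carol
});
```

Because each call to step only happens inside the previous save’s then, the slow first save holds up the rest of the queue, which is exactly the point.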

And what about that insane (read: beautiful) single-line base case? I might have written it more clearly (read: uglily) like this:

if (i === users.length) {
  if (next) {
    next();
  }
  return;
}

If you’ve never seen this usage before, a return statement is a handy way to break out of a function. No more code will be executed once you run that return. It’s perfect for a recursive base case. Even better, if you need to call a function on your way out, rather than write them on separate lines, you can just return the function call itself. So I might have used my own curly brace lesson from before and written:

if (i === users.length) return next();

But I have one more problem. This helper function can be called in both asynchronous and synchronous environments, and so I do not know ahead of time whether or not next will be defined. The typical construct for calling a function only if it exists is the simple and readable if (fun) fun();, but if you tried to return if (fun) fun();, you would be rewarded with a big fat syntax error. Why? Remember, if and else are statements; ?:, ||, and && are operators. You cannot return a statement. But you can return the results of an operation. Which brings us back to my original implementation:

if (i === users.length) return next && next();

If next is undefined, JavaScript has no need to evaluate next(), and will simply skip it, returning the value to the left (undefined, which is fine for my purposes). On the other hand, if next is a function (and therefore truthy), JS will look at the value on the right, see that there is a function that needs to be executed, and do so. A fairly complex series of operations has been reduced to one simple (okay, not that simple) line.
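Here is the same behavior in isolation, in case you want to try it in a console. The variable names are made up for the example.

```javascript
var calls = 0;
var next = function() {
  calls += 1;
  return 'done';
};
var missing; // never assigned, so it is undefined

var withCallback = next && next();          // next is truthy, so next() runs
var withoutCallback = missing && missing(); // stops at missing; no call is attempted

console.log(withCallback, withoutCallback, calls);
```

The second line would throw a TypeError if JavaScript actually tried to call missing, so the fact that it runs quietly is the short-circuit at work.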

With Great Power…

To me, JavaScript is the ultimate “eh, sure I guess” language. Can I just call undefined false? “Eh, sure I guess.” Can I get rid of these curly braces? “Eh, sure I guess.” What about this OR operator, seems like I could use it to set a default value. “Eh, sure I guess.”

That sort of flexibility can be very freeing, but there are pitfalls too. Remember that human beings still need to read your code. Before I pull out any of these tricks I always try to ask myself: does this make my code cleaner or messier? Does it make it more or less readable?

Shorter is nice, but clearer is better. It’s when you have the opportunity to do both that these tricks really shine.

So, you’ve been over the Node.js and Express documentation. You got a simple server up and running and everything is going great. You’ve even gone out into the wide web and found some awesome APIs to use with your web service, like the Sunlight Foundation’s amazing political APIs, or Giphy’s ridiculous gif API. Fun times all around.

But you’ve got a bit of a problem. You’ve put all your work on GitHub and now all those precious precious API keys you signed up for are as easily visible as the rest of your code. All someone has to do is glance at your repo and soon they’ll be guzzling up Twitter feed requests in your name! Maybe you’ve got some other bits and pieces to guard, like perhaps the secret you’re using for your session. What to do?

Enter config.js

A common practice for keeping sensitive information away from your version control system, but still available to your application, is to use a config file. The concept is fairly simple: put all those API keys into a separate file, and add that file to your .gitignore. Since it is a configuration file, you could also use it to store various other settings if that’s convenient. Just keep in mind, any updates here will not be pushed to GitHub.

The file itself is fairly simple. Just save the keys as strings, preferably with variable names in ALL_CAPS as this is the JS convention for constant variables which should not be altered. If you’re using a JavaScript file, finish up by making sure you set up the module.exports object for Node.js:

var SESSION_SECRET = 'shhh dont tell';
var KEYS = {};

KEYS.GIPHY = 'kajhfaksjdfawdb';
KEYS.SUNLIGHT = 'alkwejbsadhbvsal';


module.exports = {
  SESSION_SECRET: SESSION_SECRET,
  KEYS: KEYS
};

Afterwards add config.js (or whatever you called it) to your .gitignore, and then you require it in your server like any other JS file or module:

var express = require('express');
var session = require('express-session');
var app = express();

var SESSION_SECRET = require('./config.js').SESSION_SECRET;
app.use( session({ secret: SESSION_SECRET }) );

You’re done! Now your API keys and other secrets are stashed away from prying eyes on GitHub. But I think we can simplify this a bit more.

No logic? Use JSON.

Why bother writing out an entire JavaScript file when we’re just trying to save a few variables? If we aren’t planning to actually write any logic into our config file, JSON will suit just fine. Let’s do a quick refactor:

{
  "SESSION_SECRET": "shhh dont tell",
  "KEYS": {
    "GIPHY": "kajhfaksjdfawdb",
    "SUNLIGHT": "alkwejbsadhbvsal"
  }
}

Of course, you’ll have to change the filename to config.json, and update your .gitignore accordingly. But how do we import a JSON file into our Node server? Really easily, actually. Node can import JSON files just like anything else. All we have to change in our server is two little letters:

var SESSION_SECRET = require('./config.json').SESSION_SECRET;
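One thing to watch for: when Node requires a .json file, it reads the file and parses its contents as strict JSON, which is pickier than a JavaScript object literal. Keys and strings need double quotes, and comments or trailing commas are not allowed. Here is a minimal sketch of that behavior, using an inline string as a stand-in for the contents of config.json so it can run on its own:

```javascript
// Stand-in for the text inside config.json (inlined so the sketch is self-contained).
var raw = '{ "SESSION_SECRET": "shhh dont tell", "KEYS": { "GIPHY": "kajhfaksjdfawdb" } }';

// Roughly what require('./config.json') does with the file's contents.
var config = JSON.parse(raw);
console.log(config.KEYS.GIPHY);

// Single quotes are fine in a .js file, but they are a syntax error in JSON.
var sloppy = "{ 'SESSION_SECRET': 'shhh dont tell' }";
var parseFailed = false;
try {
  JSON.parse(sloppy);
} catch (err) {
  parseFailed = true; // JSON.parse threw a SyntaxError
}
console.log(parseFailed);
```

If your server crashes on startup with a SyntaxError pointing at your config file, a stray single quote is a likely suspect.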

As an added bonus, I also like to use the JavaScript convention for private variables and name the file _config.json if I’m not planning on making it public. It’s a nice reminder that no one should be passing the file around, though obviously this has no effect on the way the code actually runs.

And The Client?

So what if you are making API calls from the client-side? That’s a much thornier issue. Even if your sensitive client code isn’t on GitHub, it’s still on . . . the client. Ultimately anything in your client code is not going to be safe without some pretty complex workarounds like PHP wrappers or some sort of hashing scheme, both of which are beyond the scope of this blog post.

But depending on your needs, there are a couple of simpler options which might help. First, consider whether or not your client really has to be the one making API calls. Potentially the server could handle it all and then just send the data along. But, depending on how often these calls are made, or how much data they handle, this may be a poor solution as you’ve just added a somewhat extraneous networking middleman.
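To make that first option a little more concrete, here is a rough sketch of a server-side route that makes the API call on the client’s behalf. Plenty of this is assumption: the upstream URL is a placeholder, the api_key and q parameter names are made up, and fetchJson is a hypothetical helper standing in for however you make outgoing requests. The handler is written Express-style, as a function of req and res.

```javascript
// Stand-in for require('./config.json').KEYS so the sketch is self-contained.
var KEYS = { GIPHY: 'kajhfaksjdfawdb' };

// Hypothetical upstream endpoint; substitute the real API's URL and parameter names.
var UPSTREAM = 'https://api.example.com/search';

// fetchJson(url, callback) is an assumed helper that performs the outgoing
// request and calls back with (err, parsedBody).
function makeSearchHandler(fetchJson) {
  return function (req, res) {
    // The key is attached here, on the server, and never sent to the browser.
    var url = UPSTREAM +
      '?api_key=' + encodeURIComponent(KEYS.GIPHY) +
      '&q=' + encodeURIComponent(req.query.q || '');

    fetchJson(url, function (err, data) {
      if (err) {
        return res.status(502).json({ error: 'Upstream request failed' });
      }
      res.json(data); // only the results go back to the client
    });
  };
}
```

You would mount it with something like app.get('/api/search', makeSearchHandler(fetchJson)), and your client would call /api/search without ever seeing the key.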

You could split the difference by adding a route in your server’s internal API that will send back the keys to your client (which the server got from the config file). Using Angular’s $http service (because who doesn’t love Angular?), your client’s GET request might look something like this:

var GIPHY_KEY = '';
var SUNLIGHT_KEY = '';

$http({
  method: 'GET',
  url: '/api/keys'
})
.then(function (response) {
  GIPHY_KEY = response.data.GIPHY;
  SUNLIGHT_KEY = response.data.SUNLIGHT;
});

This call would only need to happen once when the client is first loaded, so the extra bandwidth required is negligible. But of course, if the route does not require some sort of authorization, this really isn’t much more secure than saving the API keys in your code. There won’t be much to stop determined ne’er-do-wells from sending their own GET request and peeking at the results. But at least you’ve taken a few first steps to building a better security strategy on your new server.
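For what it’s worth, the server side of that route might look something like the sketch below, with a simple authorization check bolted on. It assumes you are using express-session and that your login code sets req.session.user once someone signs in. That property name is my own invention, so swap in whatever your app actually uses.

```javascript
// Stand-in for require('./config.json').KEYS so the sketch is self-contained.
var KEYS = { GIPHY: 'kajhfaksjdfawdb', SUNLIGHT: 'alkwejbsadhbvsal' };

// Express-style handler for GET /api/keys.
// Assumes express-session is in use and that req.session.user
// (a hypothetical property) exists only for signed-in users.
function keysHandler(req, res) {
  if (!req.session || !req.session.user) {
    return res.status(401).json({ error: 'Not authorized' });
  }
  res.json(KEYS);
}
```

Mount it with app.get('/api/keys', keysHandler). Keep in mind that any signed-in user can still read the keys, so this narrows the audience rather than hiding anything.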

Test driven development (or TDD) is a software development paradigm that espouses not only building tests to make sure your code works properly, but actually writing those tests before you code. The workflow looks something like this:

  1. Conceptualize project.
  2. Build tests for project.
  3. Watch tests fail.
  4. Make tests not fail.
  5. Improve project.
  6. Repeat.

In this paradigm, the tests essentially serve as a rough outline for your code. Writing them first forces you to really think through what it is you want your code to do, and how it is you will actually do it. It can be a heady process, but once you are done, the benefits are self-evident. Afterwards, you just code to the test, filling in each piece as needed to make tests pass. And later, when you are adding features, you will instantly know if you broke your code.

So how can we make writing tests as painless as possible? Thankfully, there are a variety of testing libraries out there to help simplify and streamline the process. Today, we’re going to talk about one called Jasmine.

Getting Started

As you might expect, Jasmine has an npm module which you can install easily enough through your command line:

npm install -g jasmine

Afterwards, you’ll want to head to your project’s directory and type:

jasmine init

And if you’d like you can even provision yourself with some example tests using:

jasmine examples

Finally, if you’d like to configure your project’s testing suite at all, you will find a config file in spec/support/jasmine.json, and you can actually run your tests by going to your project’s root directory and typing:

jasmine

Writing a Test

Jasmine tests are built on fairly simple syntax, designed to look more or less like English. Well, English with a lot of extra dots and anonymous functions.

describe('Arithmetic', function() {
	
});

Say hello to the describe block. This is the broadest unit of Jasmine tests. It designates a collection of tests all pertaining to a particular part of your code. There is no hard and fast rule about how much of your code any given describe block should cover, just think of it as a sub-heading to help organize your tests.

describe('Arithmetic', function() {
	
  it('should be able to add', function() {
		
  });
	
});

If describe is a sub-heading, then the it block is a bullet point. These are used for each individual nugget of functionality you want to test. What do you want your code to do? List each thing you come up with in an it block. You’ll notice that the code for each of these is very similar. They are functions which take two parameters: a string description, followed by an anonymous function that contains what will actually be run.

describe('Arithmetic', function() {
	
  it('should be able to add', function() {
    expect(2+2).toEqual(4);
  });
	
});

And now with an expect statement, our test is complete! If an it block represents the individual qualities being tested, the expect statements are the actual stress tests we are subjecting those qualities to. If any expect statement fails in testing, the enclosing it block will show up as failed (and even print a convenient explanation for the failure). Specifically, expect statements work by comparing the argument passed in the expect call to the argument passed in the “matcher”, in this case toEqual. There are a variety of matchers (described below), and they are all designed to read like plain English.

describe('Arithmetic', function() {
	
  it('should be a word', function() {
    expect(typeof 'Arithmetic').toBe('string');
  });
	
  describe('Addition', function() {
		
    it('should be able to add', function() {
      expect(2+2).toEqual(4);
      expect(4+4).toEqual(8);
      expect(2+2).not.toEqual(5);
    });
	
  });
	
});

Note that there is a fair amount of freedom in how you mix and match your describe, it, and expect statements. describe blocks can contain other describe blocks, and it blocks can sit at the same level as those describes. But you would never place a describe block within an it block, and by convention every it block should be contained within at least one describe. Additionally, you can stack as many expect statements as you like within an it. Every expect must succeed for the test to pass.

Before and After

Frequently you will want to execute code before and/or after each of your tests. Rather than copying and pasting that code within every it block, Jasmine provides you with the beforeEach and afterEach functions:

describe('Arithmetic', function() {
  var add = function(x, y) {
    return x + y;
  };
  var operands;
	
  beforeEach(function() {
    operands = [1, 2, 3, 4];
  });
	
  it('should be able to add', function() {
    for (var i = 0; i < operands.length; i++) {
      expect(add(operands[i], 2)).toEqual(i+3);
    }
  });
	
  afterEach(function(){
    operands = [];
  });
	
});

Here we are working with some functions and variables that we’re going to use in our tests, and we use beforeEach to ensure they are set properly before each test. afterEach on the other hand, will allow us to reset any changes we may have made during the test. You do not need both of these blocks, and would probably only use one in most cases, but both are available. Note that unlike describe and it, neither takes a descriptive string as an argument, just the anonymous function. Also note that all of JavaScript’s regular scoping rules apply. If you declare a variable in your beforeEach block it will not be available inside your it blocks. Instead, you must declare variables within the enclosing describe and modify them as needed within beforeEach.

Some Useful Matchers

As I said earlier, toEqual is called a “matcher”, and there are a few others you can choose from. You can even write your own. In the meantime, here are some others you might find useful:

expect('a').not.toEqual('b');

Using not before your matcher will do just what you expect: expect things not to match.

expect('b').toBe('b');

toBe is very similar to toEqual, but is much stricter when it comes to objects. toEqual will actually recursively examine two different objects to see if they are “equivalent”. For example expect({a: 1}).toEqual({a: 1}) will pass, but expect({a: 1}).toBe({a: 1}) will fail. Although the two objects are equivalent, they do not actually refer to the same spot in memory and so are not strictly equal. For primitive data types there is no difference between toBe and toEqual.

expect(obj.prop).toBeDefined();

expect(obj.foobar).toBeUndefined();

As you might expect, toBeDefined and toBeUndefined test whether or not a thing is undefined.

expect(1).toBeTruthy();

expect(0).toBeFalsy();

Also pretty straightforward. Tests whether the given argument is truthy or falsy.

expect([1, 2, 3]).toContain(2);

toContain is a nifty little matcher, which will check an array or string to see if it contains a given item or substring.

expect(3.14).toBeCloseTo(3.1, 1);

toBeCloseTo allows you to round numbers to a specified decimal place. The above example passes because 3.14 equals 3.1 when rounded to the first decimal place. In contrast, expect(3.14).toBeCloseTo(3.1, 2) would fail. Using a zero (i.e. toBeCloseTo(n, 0)) will effectively round your arguments to whole integers, while negative numbers start looking at the tens places to the left of the decimal: expect(120).toBeCloseTo(100, -2).
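If the precision argument seems a bit magical, one way to think about it is that the two numbers must differ by less than half a unit at the chosen decimal place. The function below is my own rough sketch of that idea, not Jasmine’s actual source, and Jasmine’s implementation may differ in the details:

```javascript
// Sketch of the comparison behind toBeCloseTo (isCloseTo is a made-up name):
// the numbers must differ by less than half a unit at the given decimal place.
function isCloseTo(actual, expected, precision) {
  return Math.abs(expected - actual) < Math.pow(10, -precision) / 2;
}

console.log(isCloseTo(3.14, 3.1, 1)); // true:  off by 0.04, allowed 0.05
console.log(isCloseTo(3.14, 3.1, 2)); // false: off by 0.04, allowed 0.005
console.log(isCloseTo(120, 100, -2)); // true:  off by 20, allowed 50
```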

There are a handful of other matchers out there as well, which you can find explanations of in Jasmine’s documentation, including toBeNull, toBeGreaterThan, toBeLessThan, toThrow (for exceptions), and toMatch (for regex, i.e. regular expressions).
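And if none of the built-in matchers fit, you can write your own, as mentioned earlier. A custom matcher is a factory function that returns an object with a compare method, and compare returns an object with a pass flag and an optional message. Here is a minimal sketch. The toBeEven matcher is made up for illustration, and the registration step is shown in comments because it only works inside a Jasmine spec file.

```javascript
var customMatchers = {
  // toBeEven is a hypothetical matcher name, not something Jasmine ships with.
  toBeEven: function () {
    return {
      compare: function (actual) {
        var pass = actual % 2 === 0;
        return {
          pass: pass,
          message: 'Expected ' + actual + (pass ? ' not' : '') + ' to be even'
        };
      }
    };
  }
};

// In a spec file, register the matchers before your tests run:
//   beforeEach(function () {
//     jasmine.addMatchers(customMatchers);
//   });
// and then use them like any other matcher:
//   expect(4).toBeEven();
```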

Now Go Write Some Tests!

Test driven development is the basis for smart, focused, efficient coding. It allows you to spend less time wondering what to do next and helps you avoid project-breaking bugs before they begin. All the tools you need to build these tests are easy and available. So what are you waiting for?