Javascript Tricks: Maps

Maps and Sets are part of the ECMAScript 6 proposal (Harmony). While they haven’t officially been implemented yet, you can start experimenting with them by using Paul Miller’s es6-shim module. Maps are much like Javascript objects in that they are collections of key-value pairs, but they have a few features that may make them more useful than regular Objects. The Mozilla guide lists the following advantages of Maps over regular Objects:

  • An Object has a prototype, so there are default keys in the map. However, this can be bypassed using map = Object.create(null).
  • The keys of an Object are Strings, where they can be any value for a Map.
  • You can get the size of a Map easily while you have to manually keep track of size for an Object.
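Here is a small sketch of those three differences. It assumes Map is available (natively or through es6-shim); the variable names are made up for illustration.

```javascript
var obj = {};
var map = new Map();

// 1. A plain Object inherits keys from Object.prototype; a Map starts empty.
var objHasToString = "toString" in obj;     // inherited from Object.prototype
var mapHasToString = map.has("toString");   // a fresh Map has no keys at all

// 2. Object keys are coerced to strings; Map keys keep their type.
obj[1] = "number one";
var objKeys = Object.keys(obj);             // the key is stored as the string "1"
map.set(1, "number one");
map.set("1", "string one");                 // a different key from the number 1

// 3. A Map tracks its own size; for an Object you have to count keys yourself.
var mapSize = map.size;
var objSize = Object.keys(obj).length;

console.log(objHasToString, mapHasToString, objKeys, mapSize, objSize);
```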

In Parsoid, I used a Map to map logTypes (“error”, “warning”, or /error|warning/) to Arrays of logging backends (functions that would print to a console, write to a file, send an HTTP response, etc.)

Getting Started

Maps are easy to work with. Open up a terminal and try the following (but make sure to require es6-shim or an equivalent module first):

  • Add new key-value pairs with Map.set(key, value).
  • Retrieve a value for a given key using Map.get(key).
  • Determine whether a Map contains a given key with Map.has(key).
  • Get the size of a Map with Map.size.
  • Delete a key from a Map with Map.delete(key).
  • Clear all keys from a Map using Map.clear().
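The methods above can be tried out in one short session. This is only a sketch: the backends Map and its string values are placeholders, and it assumes Map is available natively or via es6-shim.

```javascript
var backends = new Map();

// set: add key-value pairs.
backends.set("error", ["console"]);
backends.set("warning", ["file"]);

// get / has: look up a key.
var errorBackends = backends.get("error");
var hasWarning = backends.has("warning");

// size is a property, not a method.
var sizeBefore = backends.size;

// delete returns true if the key existed.
var deleted = backends.delete("warning");
var sizeAfterDelete = backends.size;

// clear removes every remaining key.
backends.clear();
var sizeAfterClear = backends.size;

console.log(errorBackends, hasWarning, sizeBefore, deleted, sizeAfterDelete, sizeAfterClear);
```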

Beware of Non-Identical Keys

The keys to a Map can be any type of value, which seems like an improvement over using regular strings. Unfortunately, when a key is an object, Map.get only finds it if you pass in a reference to the very same object that you used with Map.set; an object that merely looks identical won't match. In Parsoid, I initially used regular expressions as logTypes (keys) corresponding to Arrays of backends (values). This made it impossible to retrieve the backends later, since every regular expression literal creates a distinct object, even when two literals have identical source strings.

var backendArray = [logToFirstBackend];
this._backends.set(/error|warning/, backendArray);
this._backends.get(/error|warning/); //undefined

Instead of storing regular expressions as keys, I had to obtain their source strings and save those instead. Unless you are able to store references to all the objects you are using as keys, you’ll have to do the same. From this perspective, Maps don’t have much of an advantage over regular objects.

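var logTypeString;
if (logType instanceof RegExp) {
  logTypeString = logType.source;
} else {
  // Anchor plain strings so that they only match the exact logType.
  // (No surrounding slashes: new RegExp() takes the bare pattern.)
  logTypeString = "^" + logType + "$";
}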

this._backends.set(logTypeString, backendArray);
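To make the difference concrete, here is a sketch that looks up the same backends three ways. The pattern variable and the "console" placeholder are invented for the example.

```javascript
var backends = new Map();
var pattern = /error|warning/;

// Keyed by the RegExp object itself.
backends.set(pattern, ["console"]);

// Works: we pass the very same object back in.
var bySameReference = backends.get(pattern);

// Fails: this literal creates a new, different object.
var byLookalike = backends.get(/error|warning/);

// Keyed by the source string instead: any RegExp with the
// same source produces an equal string, so lookup succeeds.
backends.set(pattern.source, ["console"]);
var bySource = backends.get(/error|warning/.source);

console.log(bySameReference, byLookalike, bySource);
```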

Iterating Over Maps

On the other hand, the convenient iteration method forEach is a good reason to use Maps. Like the forEach method for Arrays, forEach allows you to apply a callback function to every key/value pair in a Map. The arguments to the callback are the current value, the current key, and the Map itself. When you use a regular Object as a Map, that Object inherits properties from its prototype chain that you’ll want to ignore by checking hasOwnProperty; alternatively, as suggested in the Mozilla guide, you can create an Object with a prototype of null. Using a Map saves you the headache of fiddling with Objects, because the only key / value pairs in a Map are those that you deliberately set yourself.
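As a rough side-by-side sketch (the objects and keys here are invented), this is what the two styles of iteration look like. The plain Object is given a prototype with an enumerable property so that the hasOwnProperty check actually has something to filter out.

```javascript
// A plain Object used as a map, with an inherited enumerable property.
var plain = Object.create({ inherited: 0 });
plain.error = 1;
plain.warning = 2;

var seenInObject = [];
for (var key in plain) {
  // Without this check, "inherited" would show up too.
  if (Object.prototype.hasOwnProperty.call(plain, key)) {
    seenInObject.push(key);
  }
}

// The same data in a Map: forEach only visits what we set ourselves.
var map = new Map();
map.set("error", 1);
map.set("warning", 2);

var seenInMap = [];
map.forEach(function (value, mapKey, theMap) {
  // Callback arguments: value first, then key, then the Map itself.
  seenInMap.push(mapKey + "=" + value + " of " + theMap.size);
});

console.log(seenInObject, seenInMap);
```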

In Parsoid, I use forEach when figuring out which backends to log a message to. If the current logType matches any keys (saved logTypes) in my Map savedBackends, then I take the relevant backend functions from the matching values (arrays of backend functions) and push them onto an applicableBackends array. For example, if my logType is “error” and savedBackends contains the keys "error", "error|fatal", and "warning", then the applicable backends are the elements of the Arrays returned by savedBackends.get("error") and savedBackends.get("error|fatal").

// Iterate over all of the saved backends.
savedBackends.forEach(function(backendArray, logTypeString) {
  // Convert the saved string back into a regular expression
  // and test the passed-in logType.
  if (new RegExp(logTypeString).test(logType)) {
    backendArray.forEach(function(backend) {
      // Push each backend from the matching backendArray
      // onto my list of backends.
      applicableBackends.push(backend);
    });
  }
});
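For anyone who wants to run the lookup end to end, here is a self-contained sketch of the same idea. The function name getApplicableBackends and the three do-nothing backends are hypothetical stand-ins, not Parsoid's actual code.

```javascript
// Collects every backend whose saved logType pattern matches logType.
function getApplicableBackends(savedBackends, logType) {
  var applicableBackends = [];
  savedBackends.forEach(function (backendArray, logTypeString) {
    // Turn the saved string back into a regular expression.
    if (new RegExp(logTypeString).test(logType)) {
      backendArray.forEach(function (backend) {
        applicableBackends.push(backend);
      });
    }
  });
  return applicableBackends;
}

// Hypothetical backends; real ones would print, write to a file, etc.
function logToConsole() {}
function logToFile() {}
function logToHttp() {}

var savedBackends = new Map();
savedBackends.set("error", [logToConsole]);
savedBackends.set("error|fatal", [logToFile]);
savedBackends.set("warning", [logToHttp]);

// "error" matches the first two keys but not "warning".
var matched = getApplicableBackends(savedBackends, "error");
console.log(matched.length);
```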

Javascript Tricks: Object.assign

Object.assign() is a new ECMAScript 6 function that can be used to merge together two objects. (If you want to try ES6, I suggest checking out the es6-shim module!)

Basic Use Case: Merging Together Two Objects

Object.assign(target, source) copies all of the own enumerable properties of source into target. In the example below, we create an object duckEgg with a prototype egg and assign its properties to the object omelet. Thus, omelet now has the property fat from duckEgg, in addition to its initial own property carbs. However, it doesn’t have access to the property protein, which is defined only on duckEgg's prototype.

// Merging two objects together with Object.assign.     
> require('es6-shim');
{}
> var egg = {"protein": 4};
> var duckEgg = Object.create(egg);
> duckEgg.fat = 3;
3
> var omelet = {"carbs": 3};
> Object.assign(omelet, duckEgg);
{ carbs: 3, fat: 3 }
> omelet.protein
undefined

In Parsoid: Combining Fields from Multiple Objects

You can also use a combination of Object.assign and Array.prototype.reduce to merge together multiple objects. In my Parsoid logger, I can use this approach to combine logged objects with different custom fields into a single object. So far, I’ve mainly used Errors and strings for logging data rather than objects with specific fields, but you can imagine using different objects for different types of information and merging them together at the end.

// Calling env.log.
env.log("error", obj1, obj2, obj3);

// Within the log() function; combining logged objects into one
// loggedObjects is the Array [obj1, obj2, obj3].
loggedObjects = loggedObjects.reduce(function(prev, object) {
  return Object.assign(prev, object);
}, {});
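Here is a runnable sketch of that reduce step on its own. The helper name mergeLoggedObjects and the three sample objects are invented for illustration; Object.assign is assumed to be available natively or via es6-shim.

```javascript
// Folds an Array of objects into one new object, left to right.
function mergeLoggedObjects(loggedObjects) {
  return loggedObjects.reduce(function (prev, object) {
    return Object.assign(prev, object);
  }, {}); // Start from a fresh object so none of the inputs is mutated.
}

// Hypothetical logged objects with different custom fields.
var obj1 = { page: "Main_Page" };
var obj2 = { line: 42 };
var obj3 = { severity: "error" };

var merged = mergeLoggedObjects([obj1, obj2, obj3]);
console.log(merged);
```

Because the initial value passed to reduce is a new empty object, obj1 is left untouched; starting the reduce from obj1 itself would modify it in place.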

A Quick Caveat

If both the target and source objects have a property with the same name, Object.assign overwrites the target object’s property with the source object’s. You can easily lose information if you don’t make sure that the two objects have no overlapping properties. Note also that Object.assign modifies the target in place and returns it. In the example below, the nutrition and taste variables share the property calories. When the two are merged together, the resulting object only has the calories value from taste.

> var nutrition = {"calories": 5};
> var taste = {"savory": true, "calories": 100};
> Object.assign(nutrition, taste);
{ calories: 100, savory: true }
> nutrition
{ calories: 100, savory: true }
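One way to soften this caveat, sketched below, is to check for clashing keys first and to merge into a fresh object so that neither input is modified. The overlappingKeys helper is something I made up for this example, not a built-in.

```javascript
var nutrition = { calories: 5 };
var taste = { savory: true, calories: 100 };

// Hypothetical helper: lists the own keys that both objects define.
function overlappingKeys(a, b) {
  return Object.keys(a).filter(function (key) {
    return Object.prototype.hasOwnProperty.call(b, key);
  });
}

var clashes = overlappingKeys(nutrition, taste);

// Merging into {} leaves nutrition and taste as they were.
// Later sources still win when keys clash.
var combined = Object.assign({}, nutrition, taste);

console.log(clashes, combined, nutrition);
```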

Javascript Tricks: Array.prototype.slice.call(arguments)

In this round of posts, I’ll blog a little bit about Javascript tricks I picked up while working on Parsoid, using some of my logger code to illustrate.

1. Array.prototype.slice.call(arguments)

Ideal for manipulating an arbitrary number of arguments that have been passed into a function. This code copies all or some of the arguments into an array, which can then be handed off to nested functions. For example, we might want to pass an object with an arbitrary number of properties to my logging / tracing function in order to describe an error or to provide tracing information. The logger then hands the object off to a data-processing function that constructs logging messages based on the object’s properties.

// A few sample use cases of the logging function. 
// We pass all but the first argument (the logType) to a nested data-processing function.
env.log("trace/request", "completed parsing of", prefix, ":", target, "in",
             env.performance.duration, "ms");
env.log("error", new Error());
env.log("error", token);

The first argument to env.log is the logType (the type of log output that we’re generating), while the remaining arguments are data that’s used to construct a log message. The arguments can be anything from an error to an object to a bunch of strings. In my implementation of log, logType is the only named parameter. I want to separate the remaining arguments from logType, funneling them into a logObject variable.

// How Array.prototype.slice.call is used in the logger
Logger.prototype.log = function (logType) {
  var self = this;
  var logObject = Array.prototype.slice.call(arguments, 1);
  var logData = new LD(this.env, logType, logObject);
  // ... (rest of the function omitted)
};

arguments is a magical Javascript object, available inside every regular function, that lets us access all the arguments passed to that function. So if I call env.log("error", token); then arguments[0] is "error", while arguments[1] is token. It seems like an Array because it has a length and you can index into it, but it isn’t one; it lacks Array methods like pop, shift, and slice. If arguments were an Array, I could just set logObject to arguments.slice(1). But it isn’t, so that’s where Array.prototype.slice.call comes to the rescue.

// Copies an Array-like object into a new Array.
// beginningIndex and endingIndex are both optional.
newObject = Array.prototype.slice.call(oldObject, beginningIndex, endingIndex);

slice takes an Array and returns a new Array containing all or a subset of an existing Array. Its arguments are the beginning and ending indices of the copy. Even though slice is a method that’s only defined on Arrays, call allows us to use slice on Array-like objects. call redefines the this value in slice from an Array to the Arguments object. The first argument to call is the new this value. The remaining arguments to call are passed in as the regular arguments to slice. So you can use Array.prototype.slice on an Array-like object to get back a copy, starting (or ending) at specific indices.

In this case, we’re copying everything from arguments, except for the first argument, and putting it into an array named logObject. Although arguments isn’t an Array, slice can still handle it because it has the properties that slice is looking for (such as length and numeric indices).
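Here is a stripped-down sketch of the same trick outside of Parsoid. The log function below is a toy stand-in for the real logger; it only returns what it captured so that the result is easy to inspect.

```javascript
// Toy version of the logger: logType is the only named parameter.
function log(logType) {
  // Copy everything after the first argument into a real Array.
  var logObject = Array.prototype.slice.call(arguments, 1);
  return {
    logType: logType,
    logObject: logObject,
    isRealArray: Array.isArray(logObject)
  };
}

// arguments itself is Array-like, but not an Array.
function argumentsIsArray() {
  return Array.isArray(arguments);
}

var result = log("error", "bad token", 42);
console.log(result.logObject, result.isRealArray, argumentsIsArray(1, 2));
```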

The Never-Ending Patch

I’ve spent the past seven weeks on the same error logging patch. Being stuck on a patch is a new sort of purgatory; I’ll spend several days working on the next patchset, only to be sent back to the beginning when my team members discover a new error, ask for a new feature, or suggest different implementations.

Scoping is probably the biggest reason why the patch has dragged on for so long. The patch replaces every error and warning log in Parsoid with my logging function, which means that it’s used in a large number of files (23 at last count). In the beginning, this made for very slow going, since I wanted to test every call site to make sure that I was referencing the logging function properly and that it generated the desired output. Besides this, the potential for error increases along with the number of lines of code. As time goes on, my patchset gets larger and harder to review, and it’s easy for me and my reviewers to overlook important details.

Another reason is that I’m not very familiar with some of the underlying technologies. Not only is Parsoid a somewhat complicated project, but it relies on frameworks that are new to me: Node.js (server-side Javascript), Connect (a middleware framework for Node), and Express (a web development framework for Node). Whenever we run into a framework-related issue on Parsoid, I spend a day reading about the framework instead of writing code. I like to take the time to completely understand the problems with the current patchset before making any changes…which often results in too much rabbit-holing, and not enough coding.

A good example of this was an infinite error-logging bug that the team discovered on February 11th. It crashed the Parsoid servers by filling up the disk with identical error logs. The Parsoid web server uses Connect, which comes with its own default error handler. The web server also had its own custom error handler, which set HTTP headers and sent an HTTP response with information about an error. If we called the custom error handler but set HTTP headers again afterwards, we ended up with a “Can’t set headers after they are sent” error that would go to Connect’s default error handler. The default error handler would try to set headers again, resulting in another “Can’t set headers” error and sending Parsoid into an infinite error recursion tailspin.
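To illustrate the failure mode, here is a minimal sketch of an error handler that refuses to touch a response that has already been sent. This is not the fix that went into Parsoid; FakeResponse and respondWithError are invented for the example, and the only real API it leans on is the headersSent flag that Node's HTTP responses expose.

```javascript
// A stand-in for an HTTP response object, so the sketch runs without a server.
function FakeResponse() {
  this.headersSent = false;
  this.statusCode = null;
  this.body = null;
}

// Mimics the real behavior: sending twice throws.
FakeResponse.prototype.send = function (statusCode, body) {
  if (this.headersSent) {
    throw new Error("Can't set headers after they are sent.");
  }
  this.headersSent = true;
  this.statusCode = statusCode;
  this.body = body;
};

// Hypothetical error handler: only responds if nothing has gone out yet.
function respondWithError(res, err) {
  if (res.headersSent) {
    // Too late to respond; the caller can still log the error elsewhere.
    return false;
  }
  res.send(500, err.message);
  return true;
}

var res = new FakeResponse();
var first = respondWithError(res, new Error("parse failure"));
var second = respondWithError(res, new Error("parse failure"));
console.log(first, second, res.statusCode);
```

The second call returns false instead of throwing, so there is no new error to feed back into another error handler.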

It took me a couple of days of reading about Connect and talking to my mentor to even understand what had caused the error recursion in the main branch of Parsoid, and another several days to process my mentor’s suggestions for how to structure my logging function to avoid error recursion. Ten days passed before I felt confident enough about the restructured code to submit my next patch.

I’m now on my 22nd iteration of the patch and feeling (delusionally?) hopeful that the next patchset will be the last. If I were to do it all over again, I’d have kept my patch smaller and more tightly scoped; since it’s too late for that (we’re down to revising the same two files each time), I’ve devised a coping strategy to speed up the feedback cycle. I’ve been sending my mentor gists for specific files, instead of waiting for his input until I’ve submitted a patchset. I’d be curious to hear whether other people have suggestions for dealing with never-ending patches.

Coding by Consensus

My latest contribution to Parsoid was a generic logging and tracing function. It took me four weeks, twelve patchsets, and three different approaches before the patch was merged in.

Initially, I wrote a single function and put it in our Util module. Next, per my mentor’s suggestion, I expanded the function into a Logger class that could be customized with a different configuration for every class and file using it. The Logger class included a #log function as well as wrapper functions (#trace, #dump) that called the basic #log function with certain parameters. Our team lead disagreed with the Logger implementation, though, saying that it was too complicated to have separate loggers in each file. Based on his feedback, I moved the logging function to an “environment” object that’s accessible throughout most of the codebase. I also got rid of the wrapper functions, moving everything into a single logging function that prints different output depending on a logType parameter.

In the process of revising my patch, I learned a lot about Wikimedia’s culture. Whether it’s formal or informal, my team essentially operates by consensus. We can spend hours in friendly debates over questions of style and implementation (like the best way to write a logging function). And there’s always room for further discussion, even after the code’s already been merged. Because of the need for consensus, it takes longer to produce a final version of a patch.

I’ve learned a lot because of the consensus-based approach. Now I’ve implemented the logger three different ways (and understand the associated pros and cons), as opposed to having written it once and being done with it. I picked up some new concepts from my team’s debates on implementation, such as the difference between subclasses and subtypes. And I got used to the process of revising my code to accommodate feedback from many different perspectives.

I’m curious about how code review works in other teams. An inclusive, consensus-based approach has been helpful for my learning, but perhaps it would seem inefficient to some organizations.

Meeting My Mentor

Two weeks ago, I used my OPW internship travel stipend to visit my mentor Subbu Sastry in Minnesota. He spent two days helping me on my latest patch, explaining Parsoid and Wikimedia, and feeding me delicious South Indian food.

While visiting Subbu wasn’t strictly necessary (he’s extremely responsive on IRC, code reviews, and email), it was still very helpful to see him in person. Here are a few of the ways in which I benefited from the visit:

  • Understanding historical context. Documentation and wikis can give you a good sense of the current state of a project, but not its past or its future. Subbu helped me understand how Parsoid evolved out of MediaWiki’s original PHP parser, how it interacts with the Visual Editor project, and what the goals for Parsoid are going forward. (Some of this is also covered in a fairly lengthy and slightly outdated blog post.)
  • Visual learning. Parsoid’s process for converting wikitext tokens into an HTML DOM tree confused me until Subbu drew me a diagram showing the pipeline of transformations. I have a much better mental model of Parsoid as a result. You can’t readily send drawings back and forth over IRC or explain them very well over email; it’s really best when someone draws a diagram in real time and narrates as they go along.
  • Accidental learning. I learned a lot about Wikimedia’s internal tools and infrastructure just by looking over Subbu’s shoulder. For example, he showed me Zuul, a tool for running tests and other jobs on patches submitted through Wikimedia’s code review system Gerrit.

Most of all, meeting in person gave Subbu a good sense of who I am as a person and as a programmer. Even though we’re back to interacting on IRC, he can more readily detect when I’m making progress, or when I’m desperately confused and need to chat.

To other OPW interns: I definitely recommend seeing your mentor in person, assuming that the $500 travel stipend is sufficient to cover it. (I wish this stipend were higher for people living in other countries!)

Hacker School Month 3 Retrospective

Hacker School ended three weeks ago, a fact that I find both poignant and inescapable. In some ways, I feel like I didn’t make the most of it; I didn’t “finish” a major project while I was there. On the other hand, I didn’t go to Hacker School to learn more about web development. I went because I wanted to learn new languages and paradigms, to explore computer science topics like algorithms and data structures, and to collaborate with curious and talented programmers. From that perspective, I think that I spent my time well.

Throughout Hacker School, I made significant progress on my e-flirting web app, Datebot, but never completed and deployed it. I also worked through the first 1.5 chapters of SICP, learning about functional programming in the process. In addition, I paired extensively with other Rubyists on their projects, went to lots of seminars by Hacker School residents and facilitators, and engaged in lots of accidental learning. Finally, I started interning on an open source project, which is something I wouldn’t have dreamed of doing before Hacker School.

In the last month of Hacker School, I continued working on my projects, but also made time for the fun, sparkly, enlightening things that make Hacker School so wonderful and distracting. Here’s a roundup:

  • Datebot reorganization: I revised my database schema, added tests and validations, wrote some helpful Rake tasks, and began converting overly-powerful helper methods into modules. Now that I’m nearly done refactoring, the final step will be to finish the Google Calendar integration and actually schedule dates with crushes on behalf of the users.
  • Botastic: I paired with Will Chapin on this clever Zulip chatbot, which responds to messages with fun semi-relevant facts from Wikipedia. We refactored his code into short, three-line methods and experimented with functional programming techniques like pipelines. We also rewrote Botastic so that it could respond to any type of sentence, instead of only sentences in a specific format, by using a part-of-speech tagger.
  • Parsoid: I began interning on December 10, two weeks before Hacker School ended, so I had even less time for my Hacker School projects. On the other hand, working on Parsoid at Hacker School meant that I could get help from facilitators (especially maryrosecook, who’s a Javascript wizard), learn about Wikimedia’s organizational structure and developer tools from Sumana, and collaborate with Be Birchall, who’s both a Hacker School alumna and a fellow intern on Parsoid.
  • Markov Fun: I attended an amazing seminar by the lovely Alex Rudnick on using n-grams to generate sentences given a specific corpus. The demo code he used was all in Python, so I ported his code to a Ruby gem.
  • Functional programming techniques in Python and Ruby: maryrosecook gave a great practical introduction to functional programming using Python. I followed along in Ruby, and was surprised at how many functional techniques I take for granted (e.g., map, reduce, filter).
  • r0ml Lefkowitz’s talk on APL: Not only is APL a fascinating language (everything is a matrix! no for loops!), but it was great to hear about what programming was like Back In The Day (drum memory! teletypewriters!)
  • Korhal: I was deeply intrigued by Travis Thieman's Clojure-based Starcraft AI. I didn't know anything about Clojure or Starcraft, so I didn't feel qualified to contribute to it, but it was still thrilling to see him explain how to implement a zerg rush.

Making Ruby Gems with Bundler

Last week, I made my first Ruby gem, markovfun, which generates sentences using a technique that Alex Rudnick taught us at Hacker School. (There’s a Python version here for those who are interested.) Making a gem is very easy, especially when you’re using Bundler. The entire process takes about half an hour from start to finish, which is longer than it took me to write this blog post. I recommend it! You can use your gem locally, just as a way to keep your code uncluttered, or you can share it with the world by pushing it to Rubygems. As of this writing, my gem’s been downloaded 304 times, which means (hopefully) that I’ve helped hundreds of people have fun with Markov chains.

Read on to learn how to make and use a Ruby gem! If this takes you longer than 30 minutes, I’d like to hear about it.

Building Your Gem

  1. Run bundle gem my_gem from the command line. This will create a folder my_gem that contains a Gemfile, gemspec, Rakefile, and the lib folder. Inside the lib folder, you’ll see a file called my_gem.rb, which you’ll update with your gem’s methods. There’s also the folder lib/my_gem, which contains version.rb, a file that specifies the gem’s current version.

  2. Update my_gem.gemspec. Bundler has already pre-filled out this file with your name and email address (by integrating with Git, I imagine). Provide a gem description and summary; the gem won’t build until you do this. In addition, specify any gem dependencies at the bottom of the file: use spec.add_development_dependency for gems that you only need while developing the gem, and spec.add_dependency for gems that your gem needs at runtime.

    lib = File.expand_path('../lib', __FILE__)
    $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
    require 'my_gem/version'
    require 'pry'
    
    Gem::Specification.new do |spec|
      spec.name          = "my_gem"
      spec.version       = MyGem::VERSION
      spec.authors       = ["Maria Pacana"]
      spec.email         = ["maria.pacana-rubygems@gmail.com"]
      spec.description   = %q{Best gem ever}
      spec.summary       = %q{Highly recommended}
      spec.homepage      = ""
      spec.license       = "MIT"
    
      spec.files         = `git ls-files`.split($/)
    
      spec.add_development_dependency "bundler", "~> 1.3"
      spec.add_development_dependency "rake"
      spec.add_development_dependency "pry"
    end
    
  3. Run bundle to install any gems that your gem relies on.

  4. Update my_gem.rb with the code that you want to share.

    If everything fits in one file, go to lib/my_gem.rb, where Bundler’s automatically created the module MyGem. Add your methods to this module.

    require "my_gem/version"
    
    module MyGem
      def self.happy_new_year
        puts "Happy New Year!"
      end
    end
    

    If your code involves classes or modules that are spread out across multiple files, you can place them in lib/my_gem/ and then require them from lib/my_gem.rb.

  5. git commit, if you haven’t done so already. This is necessary because Bundler uses git ls-files to figure out what files are being used in your gem. (git ls-files lists the files in Git’s index, or staging area.)

  6. Build your gem using gem build my_gem.gemspec.

Using Your Gem

You can either push your gem up to RubyGems or continue to test it out locally.

  1. Pushing to RubyGems

    Create a RubyGems account, if you haven’t done so already. Next, push your gem up to RubyGems with gem push my_gem-0.0.1.gem.

  2. Using your gem locally.

    Use rake install to make your gem available throughout your system. You can also use bundle exec pry if you want to test your gem out in a REPL.

My OPW Internship: Getting Started

I’ve been interning on Wikimedia’s Parsoid project for three weeks now. I didn’t make much progress in the first couple of weeks, partly because I was finishing up Hacker School, but also because I was still learning how to be effective in an open-source project.

Challenges Thus Far

My first hurdle was simply getting used to asking for help on IRC. I knew that I should not be afraid to ask questions. My mentors had volunteered to take on interns, after all, and they knew what they were in for. Sumana additionally gave me some very good advice, which was that I should ask for help if I was confused for more than 20 minutes. Spending too much time puzzling over a problem prevents me from being productive and is ultimately not helpful for Wikimedia. But old habits die hard; I’m the sort of person who hates asking for directions or talking to salespeople at clothing stores. Moreover, I didn’t want to be a burden on people that I’d never met face-to-face.

As with most phobias, I eventually overcame my fear of IRC through gradual and increasing exposure. In the beginning, I’d join the #mediawiki-parsoid channel and passively lurk for most of the day, asking questions only when it seemed absolutely necessary. My mentors always responded thoroughly and promptly, so over time I became much more comfortable asking for help.

My second hurdle was (despite much good advice to the contrary!) that I kept trying to Understand All of Parsoid. Instead of actually working on the task I’d been assigned, I hand-drew diagrams of the relationships between different modules, variables, etc. in an effort to understand how everything fit together. I ended up with a better idea of how Parsoid worked, but knew only marginally more than I did when I began. (I did manage to clean up some bad indentation and typos along the way, so my efforts weren’t a total waste.)

After losing a week to this, I gave up on attempting to understand Parsoid completely and hacked together my patch in less than an hour. I still wish I had a better mental model of Parsoid, but I’m assuming that my understanding of the project will only get better as I touch more and more of the codebase.

A good measure of productivity is how often I submit a patch on Gerrit, Wikimedia’s code review system. I submitted 1 patch the first week and 0 patches the second week. Now that I’m in the third week, I’ve been submitting at least two patches every day and am hoping to keep up the streak.

What I’m Working On

My current patch is an effort to generate more informative error logs. The codebase is full of error notifications / warnings, usually implemented by using console.warn. My goal is to abstract out some of this error logging by making a utility function that prints out an error message and the context for the error (i.e., the wiki page in which the error occurred). This is the first step in transitioning error logging to a logging library like bunyan.

 // Before the patch
 var location = 'ERROR in ' + env.conf.wiki.iwp + ':' + env.page.name;
 console.warn(location);
 console.warn("Do not have necessary info. to encapsulate Tpl: " + i);
 console.warn("Start Elt : " + startElem.outerHTML);
 console.warn("End Elt   : " + range.endElem.innerHTML);
 console.warn("Start DSR : " + JSON.stringify(dp1 || {}));
 console.warn("End   DSR : " + JSON.stringify(dp2 || {}));

 // After the patch (errorLog is the function I added)
 var err = [ "Do not have necessary info. to encapsulate Tpl: " + i ];
 err.push( "Start Elt : " + startElem.outerHTML );
 err.push( "End Elt   : " + range.endElem.innerHTML );
 err.push( "Start DSR : " + JSON.stringify(dp1 || {}) );
 err.push( "End   DSR : " + JSON.stringify(dp2 || {}) );

 Util.errorLog( env, err );
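For a sense of what such a utility might look like, here is a rough sketch. It is a guess at the shape of the function, not the actual Util.errorLog from my patch; formatErrorLog and the sample env values are invented, and the env fields are the ones used in the "before" snippet above.

```javascript
// Hypothetical helper: prepends the error's location to the message lines.
function formatErrorLog(env, err) {
  var location = "ERROR in " + env.conf.wiki.iwp + ":" + env.page.name;
  return [location].concat(err);
}

// Hypothetical errorLog: prints the location and every line of the error.
function errorLog(env, err) {
  formatErrorLog(env, err).forEach(function (line) {
    console.warn(line);
  });
}

// Made-up environment with just the fields the function reads.
var env = {
  conf: { wiki: { iwp: "enwiki" } },
  page: { name: "Main_Page" }
};

var lines = formatErrorLog(env, ["Start Elt : <p>", "End Elt   : </p>"]);
errorLog(env, ["Start Elt : <p>", "End Elt   : </p>"]);
```

Keeping the formatting separate from the printing means the console.warn call is the only thing that would need to change when moving to a logging library later.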

Datebot Refactoring: The Trouble with Helpers

I began integrating Google Calendar into Datebot so it can determine when users are available for dates. My first step was to write a get_future_dates function that connects to a user’s calendar and retrieves all events marked “datebot”. For the time being, users will indicate when they are available for dates by creating events and labeling them “datebot”. When a crush indicates interest in a user, Datebot will search the user’s calendar for “datebot” events and then propose the next available date to the crush.

The get_future_dates function is part of a helper file called google_oauth. I have a slew of helper files where I stow all of the code that doesn’t quite fit anywhere else: google_oauth helpers that connect to the Google API and retrieve contact and calendar information; twilio helpers that send and receive SMSes using Twilio; and a user helper for login and signup.

# Helper file (method bodies omitted)
helpers do
  def initialize_client; end
  def display_oauth_google; end
  def import_contacts(currentuser); end
  def get_future_dates(user); end
end

# Controller
get '/get_gcal' do
  get_future_dates(currentuser)
end

After talking to Alan, I realized that my helpers were taking on responsibilities beyond the scope of ordinary route and view helpers. In addition, the lack of namespacing makes it hard to understand what a helper function is supposed to do. For example, it’s unclear whether import_contacts is a Twilio helper or a Google helper; Google.import_user_contacts is more descriptive. I’m planning to convert my helper files into modules and transfer them out to a lib folder instead of the original helper folder. If I’m feeling particularly ambitious, I may even convert my Google oauth helper into a gem and include it in my Gemfile.