Force Bubble Charts in D3


Authored by Steven Hall

Using the force layout in D3 to produce a bubble chart for categorical data can be a good choice for communicating the message your data has to offer, and it has the side benefit of being a real crowd-pleaser visually. In this post I am going to explain how to produce the three examples below using the force layout. I am also going to present a method I use to produce these quickly by laying out the graphic using the treemap layout (for rectangular schemes) and the pack layout (for radial schemes). In these examples I am not explicitly saying where to place the clusters of bubbles or the labels. They are determined on the fly using the built-in D3 layouts. This is a cool trick to have handy and, in this case, means you can have dynamic numbers of categories and not have to worry too much about laying out new coordinates.

Examples for this Post:

Rectangular Layout (treemap)

Force Bubble using Rectangular Layout

You can take a look at the example here. This one uses the treemap layout to set up the locations for the clusters. The size of the bubbles in these examples represents the combined (city, highway) miles per gallon (MPG).

Radial Layout (pack)

Force Bubble Using Radial Layout

You can look at the example here. Here, the clusters are organized in a radial pattern using the pack layout.

Jim Vallandingham example


You can take a look at my example here. I am recreating a great example that you have probably seen if you have been looking around for ideas. It's done in CoffeeScript and the code is really well documented. Definitely check it out. Here I am using his data with a slightly different method and the results are interesting to compare. The two have different layouts for the resulting nodes and the force animation is different.

A Few Words on Force Layouts

The examples seen here are the product of a lot of experimentation. The sort of "oozing" effect that I ended up with was not something that I intended to do. The D3 force layout is a pretty heavy physical simulation with a lot of things going on simultaneously that can be difficult to predict. To me, you just have to embrace that. After messing around with different settings I decided I liked this effect, but it is really kind of a quasi-force layout. If you look closely at the code, I never send the layout any nodes. I am really just using the repeated tick event and some code lifted from a Mike Bostock example to do collision detection, and together they produce the final result. Alrighty then, I'll try to point out some of the levers you can pull to create your own effects as we go along.

Begin at the Beginning

In the examples I am using fuel efficiency data from fueleconomy.gov. This is not a great example of hard hitting analysis, so don't make your next car purchases based on this. I just selected a few different makes and then did some filtering and recoding to set up the data.

In each case explained here we have a set of categorical variables and another numerical dimension that we want to use for the size of the bubbles. The code is largely the same for all of the examples, so I will do a detailed overview of the rectangular example and loop around briefly to the other two. I am making a few modifications in each case that I will highlight.

Along with D3, all the examples use Underscore, jQuery, and Bootstrap. The Bootstrap library is only needed for the "pretty" popovers and buttons seen in the examples, so it is totally optional.

The data is a csv file with the following relevant columns where "comb" (the combined city/highway mpg) is our numerical data for the size of the bubbles. All the other variables are categorical string variables.

make,model,year,trans,vclass,cylinders,drive,comb

Setting Up the Data

Looking at the first example, when the page loads we do a few things to set up the data:

d3.csv('data/fuel.csv', function (error, data) {

  var width = 800, height = 800;
  var fill = d3.scale.ordinal().range(['#827d92'...'#2a3285','#383435'])
  var svg = d3.select("#chart").append("svg")
      .attr("width", width)
      .attr("height", height);

   for (var j = 0; j < data.length; j++) {
     data[j].radius = +data[j].comb / 2; //A
     data[j].x = Math.random() * width; //B
     data[j].y = Math.random() * height; //C
   }

After setting up the SVG and scale, we add a few fields to each row of the data to make life a little easier down the road. At point A we add a "radius" field that is a function of our numerical field "comb." You could really do any kind of transformation here or have a scale set up to properly size the radius. In the last example covered here, a scale is set up and used at this point in the code. In this case I am dividing by two for no particular reason other than it was simply easier to get it to look nice for the example. In B and C we are adding the initial x and y locations for each of the points. I want the nodes to fly in from random locations when the force layout starts, but you could put constants or whatever you like. In this case, the x and y do have to be set here or your visualization will bomb out later. If you send nodes to the force layout directly it will handle initializing the needed variables, but in this case that is being sidestepped.
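As an illustration of the "have a scale set up" option, here is a minimal sketch in plain JavaScript of a square-root scale, which makes the area of each circle (rather than its radius) grow in proportion to the value. The helper name "makeRadiusScale", the sample rows, and the 32 pixel maximum are assumptions made up for this sketch and are not taken from the examples; D3's own "d3.scale.sqrt()" could play the same role.

```javascript
// Hypothetical helper (not from the examples): returns a function mapping a
// value in [0, maxValue] to a radius in [0, maxRadius] so that circle AREA
// grows linearly with the value.
function makeRadiusScale(maxValue, maxRadius) {
  return function (value) {
    var clamped = Math.max(0, Math.min(value, maxValue));
    return maxRadius * Math.sqrt(clamped / maxValue);
  };
}

// Invented sample rows shaped like the parsed CSV (values arrive as strings).
var rows = [{comb: "16"}, {comb: "25"}, {comb: "64"}];
var maxComb = Math.max.apply(null, rows.map(function (d) { return +d.comb; }));
var radiusScale = makeRadiusScale(maxComb, 32);

rows.forEach(function (d) {
  d.radius = radiusScale(+d.comb);
});

console.log(rows.map(function (d) { return d.radius; })); // radii 16, 20, 32
```

Whether a square-root scale or a simple division looks better is a judgment call; the division by two used above is perfectly fine when the values span a narrow range, as MPG figures do.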

The really interesting part of the code starts here and branches out:

draw('make');  //A

$( ".btn" ).click(function() {
 draw(this.id);  //B
});

function draw (varname) {
  var centers = getCenters(varname, [800, 800]);
  force.on("tick", tick(centers, varname));
  labels(centers)
  force.start();
}

The basic idea is that the page loads with the "Make" button active (selected), and data for the make variable gets displayed when we call the draw function at point A. If you look in the HTML, the make button has a class of active. When a user clicks on one of the buttons, Bootstrap handles changing the states of the buttons, so we don't need to code for that (which is one of many reasons to use a framework like Bootstrap). In the HTML you'll also see that each button has an "id" which corresponds to a field in the dataset. So when the user clicks, the buttons change state and the jQuery at point B above sends the id to the draw function.

Find Your Center

The first thing that happens when the draw function is called is a set of centers is created for the selected variable. These are the center points for each of our clusters of bubbles. The code sends the variable name and the height and width of the layout we want to the "getCenters" function. That function looks like this:

var getCenters = function (vname, size) {
  var centers, map;
  centers = _.uniq(_.pluck(data, vname)).map(function (d) { //A
    return {name: d, value: 1};
  });

  map = d3.layout.treemap().size(size).ratio(1/1); //B
  map.nodes({children: centers}); //C

  return centers; //D
};

At point A we simply get a unique list of values for the variable passed in and then map that into an array of objects with a name and value. At point B we set up a D3 treemap layout. This may seem a little strange, but the treemap layout simply breaks a rectangular area into subdivisions based on the data you send to it. This is exactly what we want. If we were to code this by hand we would likely do some quick math based on the height and width of the SVG. That would work, but it's brittle and takes time to do. If you later need to add a category, or the data is dynamic, this can be a big time waster. Also, you still haven't dealt with the labels. By setting it up with the layout we know the center and the height for label placement as well.
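For readers who are not using Underscore, the step at point A can be sketched in plain JavaScript. The function name "uniqueCategories" and the sample rows are invented for illustration; the intent is only to mirror what the uniq/pluck/map chain produces.

```javascript
// Hypothetical stand-in for _.uniq(_.pluck(data, vname)).map(...):
// collect each distinct value of one column, in order of first appearance.
function uniqueCategories(rows, vname) {
  var seen = {};
  var centers = [];
  rows.forEach(function (row) {
    var key = row[vname];
    if (!Object.prototype.hasOwnProperty.call(seen, key)) {
      seen[key] = true;
      // value: 1 gives every category an equal share of the layout.
      centers.push({name: key, value: 1});
    }
  });
  return centers;
}

// Invented sample rows.
var sample = [
  {make: "Honda", drive: "FWD"},
  {make: "Ford", drive: "RWD"},
  {make: "Honda", drive: "AWD"}
];

var makes = uniqueCategories(sample, "make");
console.log(makes.map(function (c) { return c.name; })); // Honda, then Ford
```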

A couple of things to note. I am giving the treemap layout a ratio parameter and using the default algorithm, "squarify". The ratio simply tells D3 that we want to try to maintain the 1/1 ratio as much as possible. You'll find that the layout gives pretty good results, but you have to think a little about how it will interpret what you send it to get the results you want. This is why you send the height and width explicitly instead of just assuming they should match the height and width of the SVG; at times you may want to tweak them a little. You could also experiment with another layout algorithm like "slice-dice" to achieve other layouts.

Rectangular Centers (treemap)

So with the layout set up, we send it a small dataset at point C with each category having a value of 1 (making them all equal size in the layout). We are essentially saying we have a group of equal things we want to lay out inside the height and width parameters given to the layout.

At point D we return an array of centers like the one pictured above. The layout attaches these relevant fields to the data:

x - the minimum x-coordinate of the node position.
y - the minimum y-coordinate of the node position.
dx - the x-extent of the node position.
dy - the y-extent of the node position.

With this attached we now have our foci for the visualization. The x and y give us the upper left of the cluster for the category and the dx and dy tell us how big it is. With this info we can make all the calculations needed to direct the nodes and place labels above the clusters.
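The calculation involved is small. As a sketch, assuming a node shaped like the treemap output described above (the helper name "cellFocus" and the sample cell are invented here):

```javascript
// Hypothetical helper: the point the bubbles of one category are pulled
// toward, i.e. the middle of the rectangle assigned to that category.
function cellFocus(cell) {
  return {
    x: cell.x + cell.dx / 2,
    y: cell.y + cell.dy / 2
  };
}

// An invented cell occupying the top-right quarter of an 800 x 800 layout.
var cell = {name: "Honda", x: 400, y: 0, dx: 400, dy: 400};
var focus = cellFocus(cell);
console.log(focus.x, focus.y); // 600 200
```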

Specifying a Tick Function

At each cycle of the force simulation D3 needs to direct each node from its current location to its next stop. The instructions for doing this are passed in the "tick" function. If you look back at the draw function above we specify a tick function here:

// in the draw function
force.on("tick", tick(centers, varname));

In my example the "tick" function is a higher-order, factory-type function that returns a function set up with the parameters we send it. This is a classic use of a closure to set up some internal variables on a function and return it. The tick function looks like this:

function tick (centers, varname) {
  var foci = {}; // Making an object here for quick look-up
  for (var i = 0; i < centers.length; i++) {
    foci[centers[i].name] = centers[i];
  }
  return function (e) { //A
    for (var i = 0; i < data.length; i++) {
      var o = data[i];
      var f = foci[o[varname]];
      o.y += ((f.y + (f.dy / 2)) - o.y) * e.alpha;
      o.x += ((f.x + (f.dx / 2)) - o.x) * e.alpha;
    }
    nodes.each(collide(.11)) //B
      .attr("cx", function (d) { return d.x; })
      .attr("cy", function (d) { return d.y; });
  }
}

The function receives the centers "map" that was set up previously and the variable name that is currently selected. So in the tick function above we use our foci to recode the data with updated x and y values at point A. You can see how the x, dx, y, and dy elements are being used to define the next location for the circle. That data is then used in the quadtree function called "collide" at point B. The collide function is taken directly from Mike Bostock with some very slight modifications. If you are not familiar with it you can read up on this here. This is some pretty heavy code that keeps the circles moving away from each other in the examples. Changing the parameter for the call to collide is another opportunity to experiment with the code to get different results. This one is largely responsible for the oozing effect on the examples.
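To make the two steps of each tick concrete, here is a stripped-down sketch in plain JavaScript that runs without D3. It keeps the factory pattern and the "move a fraction alpha of the way toward the focus" step, but swaps the quadtree-based collide for a naive pairwise check, so it is a simplification and not the code used in the examples. The names "makeTick" and "separate", the sample nodes, and the cooling schedule (alpha starting at 0.1 and shrinking by 1% per tick) are assumptions for illustration.

```javascript
// Factory: builds a tick handler closed over the foci lookup and field name.
function makeTick(nodes, centers, varname) {
  var foci = {};
  centers.forEach(function (c) { foci[c.name] = c; });

  return function (alpha) {
    // 1. Move each node a fraction (alpha) of the way toward its focus.
    nodes.forEach(function (o) {
      var f = foci[o[varname]];
      o.x += (f.x - o.x) * alpha;
      o.y += (f.y - o.y) * alpha;
    });
    // 2. Naive O(n^2) overlap removal, standing in for the quadtree collide.
    for (var i = 0; i < nodes.length; i++) {
      for (var j = i + 1; j < nodes.length; j++) {
        separate(nodes[i], nodes[j]);
      }
    }
  };
}

// Push two overlapping circles apart along the line joining their centers.
function separate(a, b) {
  var dx = b.x - a.x, dy = b.y - a.y;
  var dist = Math.sqrt(dx * dx + dy * dy);
  var minDist = a.radius + b.radius;
  if (dist >= minDist) return;
  if (dist === 0) { a.x -= minDist / 2; b.x += minDist / 2; return; }
  var shift = (minDist - dist) / dist / 2;
  a.x -= dx * shift; a.y -= dy * shift;
  b.x += dx * shift; b.y += dy * shift;
}

// Invented sample data: two nodes share a focus, one has its own.
var nodes = [
  {make: "Honda", x: 0, y: 0, radius: 10},
  {make: "Honda", x: 50, y: 0, radius: 10},
  {make: "Ford", x: 0, y: 90, radius: 10}
];
var centers = [{name: "Honda", x: 200, y: 200}, {name: "Ford", x: 600, y: 200}];
var tick = makeTick(nodes, centers, "make");

// Assumed cooling schedule: alpha starts at 0.1 and shrinks by 1% per tick.
for (var alpha = 0.1; alpha > 0.005; alpha *= 0.99) {
  tick(alpha);
}

console.log(nodes.map(function (n) {
  return [Math.round(n.x), Math.round(n.y)];
}));
```

In this run the lone Ford node settles on its focus, while the two Honda nodes end up side by side around theirs, held apart by the sum of their radii. Because the pull weakens as alpha decays while the collision pass keeps pushing at full strength, the nodes drift into place gradually instead of snapping there.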

Looking back at our code with the tick function specified, the labels are added with the following code:

function labels (centers) {
  svg.selectAll(".label").remove();

  svg.selectAll(".label")
  .data(centers).enter().append("text")
  .attr("class", "label")
  .text(function (d) { return d.name })
  .attr("transform", function (d) {
    return "translate(" + (d.x - ((d.name.length)*3)) + ", " + (d.y - d.r) + ")";
  });
}

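We're passing in the centers again and using the coordinates from the layout to figure out the label placement. Note that d.r is only set by the pack layout; treemap centers carry x, y, dx, and dy instead, so with the rectangular layout the offsets have to come from those fields. Looking at it now, this part could be more elegant but it works pretty well.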
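One way the placement could be tidied up is sketched below in plain JavaScript. It returns an anchor point for either kind of center: rectangles described by x, y, dx, dy, or circles described by x, y, r. The helper name "labelAnchor" and the 15 pixel margin are invented for this sketch and are not part of the examples.

```javascript
// Hypothetical helper: where to anchor a cluster label, for either layout.
// Rectangular centers carry x, y (top-left corner), dx, dy;
// circular centers carry x, y (center of the circle) and r.
function labelAnchor(d, margin) {
  if (typeof d.r === "number") {
    // Circle: sit just above the top of the circle.
    return {x: d.x, y: d.y - d.r - margin};
  }
  // Rectangle: horizontally centered, just inside the top edge of the cell.
  return {x: d.x + d.dx / 2, y: d.y + margin};
}

var rectLabel = labelAnchor({x: 400, y: 0, dx: 400, dy: 400}, 15);
var circleLabel = labelAnchor({x: 300, y: 300, r: 100}, 15);
console.log(rectLabel.x, rectLabel.y);     // 600 15
console.log(circleLabel.x, circleLabel.y); // 300 185
```

With an anchor like this, setting the SVG "text-anchor" attribute to "middle" on the text elements would center each label on the anchor, which avoids estimating the label width from the number of characters.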

With the labels in place the simulation starts with force.start() and we are off and running! The page is just waiting for a button click to trigger a new simulation.

A Radial Layout

The code is almost identical for the radial layout. We are swapping the treemap layout for the pack layout in the "getCenters" function here:

var getCenters = function (vname, size) {
  var centers, map;
  centers = _.uniq(_.pluck(data, vname)).map(function (d) {
    return {name: d, value: 1};
  });

  map = d3.layout.pack().size(size);
  map.nodes({children: centers});

  return centers;
};

The array of objects that gets returned is a little different since these are circles with a center and radius, so it requires a little modification to the tick function:

function tick (centers, varname) {
  var foci = {};
  for (var i = 0; i < centers.length; i++) {
    foci[centers[i].name] = centers[i];
  }
  return function (e) {
    for (var i = 0; i < data.length; i++) {
      var o = data[i];
      var f = foci[o[varname]];
      o.y += (f.y - o.y) * e.alpha;
      o.x += (f.x - o.x) * e.alpha;
    }
    nodes.each(collide(.11))
      .attr("cx", function (d) { return d.x; })
      .attr("cy", function (d) { return d.y; });
  }
}

Other than using the radius to place the label above the cluster, I believe those are the only differences between the two examples. On to the last one.

Gates Foundation Example

Ok. Last one! This is a cool example done by Jim Vallandingham that I wanted to recreate to show that if you are not interested in using the rectangular or radial layouts and have the coordinates ready to go you can just plug them in.

If you look at the example the coordinates are specified in the code here:

var year_centers = {
  "2008": {name:"2008", x: 150, y: 300},
  "2009": {name:"2009", x: 550, y: 300},
  "2010": {name:"2010", x: 900, y: 300}
}

var all_center = { "all": {name:"All Grants", x: 500, y: 300}};

He has the circles start in the middle and then separate by year. To follow my convention of using the variables in the file I just made a slight modification to the data as it is coming in and added an "all" variable that is the same for all of the nodes:

_.each(data, function (elem) {
  elem.radius = radius_scale(elem.total_amount)*.8;
  elem.all = 'all';
  elem.x = _.random(0, width);
  elem.y = _.random(0, height);
})

Then when the draw function is called it looks like this to get the foci sent to the tick function correctly:

function draw (varname) {
  var foci = varname === "all" ? all_center: year_centers;
  force.on("tick", tick(foci, varname));
  labels(foci)
  force.start();
}
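Note that "year_centers" and "all_center" are objects keyed by category value, not arrays like the ones "getCenters" returns. A keyed object can serve as the foci lookup directly, while anything that loops over a length or binds data with ".data()" needs an array. Below is a minimal sketch of helpers that accept either shape. The names "asLookup" and "asList" are invented, and this is an assumption about one workable approach, not necessarily what the example itself does.

```javascript
// Hypothetical helpers that accept either an array of centers (as returned
// by getCenters) or an object keyed by category value (as in year_centers).
function asLookup(centers) {
  if (!Array.isArray(centers)) return centers; // already keyed by value
  var lookup = {};
  centers.forEach(function (c) { lookup[c.name] = c; });
  return lookup;
}

function asList(centers) {
  if (Array.isArray(centers)) return centers;
  return Object.keys(centers).map(function (key) { return centers[key]; });
}

var all_center = {"all": {name: "All Grants", x: 500, y: 300}};
var node = {all: "all", x: 10, y: 10};

// The node's value "all" finds its focus even though the display name differs.
console.log(asLookup(all_center)[node.all].x); // 500
console.log(asList(all_center).map(function (c) { return c.name; })); // All Grants
```

Keeping the lookup keyed by the data value (here "all") and using "name" only for display lets the label read "All Grants" without changing the data.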