Tuesday, January 17, 2012

VC/Company Investment Visualizer

A friend asked me last week if I knew a tool to help him visualize which VCs were investing in a sector. I did not. But I realized I could pretty quickly repurpose the VC Bar Chart code and some unpublished code that pulls in data from a Google spreadsheet to show a force-directed graph. So, weekend project.

Data from Crunchbase, visualization using the d3.js library.

Here's my portfolio.

The site is here. Just start typing company names in the box in the upper-left corner and hit plus to add. The translation from real name to Crunchbase permalink uses the list of companies as of Friday* or so, so if a company was added to CB later, autocomplete finds nothing; just type in the permalink and the company will still be added. In the screenshot above a few of my companies had no CB investor entries, so they're just floating out there. Many of my other companies are not linked to me because CB does not mention me as an investor.

One way to explore is to enter a bunch of companies in your area of interest and see how the graph falls out.  Here's one of the AdTech industry.


The save functionality is experimental (to me, that is.) It uses HTML5 localStorage. The caveat is that you can't email visualizations around that way, and there may be times when your browser clears localStorage (sometimes when clearing cookies, for example.) If it does, you lose all saved visualizations.
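
A minimal sketch of what saving and loading through localStorage can look like, assuming each visualization is stored as a JSON list of Crunchbase permalinks. The function names, the "viz:" key prefix and the in-memory stand-in are made up for illustration; the site's own code may well do this differently.

```javascript
// All names here are hypothetical; this is not the site's actual code.
// A tiny in-memory store with the same three methods as localStorage,
// so the sketch also runs outside a browser.
function makeMemoryStore() {
   var mem = {};
   return {
      setItem: function (k, v) { mem[k] = String(v); },
      getItem: function (k) {
         return Object.prototype.hasOwnProperty.call(mem, k) ? mem[k] : null;
      },
      removeItem: function (k) { delete mem[k]; }
   };
}

var PREFIX = "viz:";   // assumed key prefix, to keep saved graphs together

// Save a named visualization as a JSON array of permalinks.
function saveViz(store, name, permalinks) {
   store.setItem(PREFIX + name, JSON.stringify(permalinks));
}

// Load it back; null means it was never saved or the browser cleared it.
function loadViz(store, name) {
   var raw = store.getItem(PREFIX + name);
   if (raw === null) { return null; }
   try { return JSON.parse(raw); } catch (e) { return null; }
}

var store = makeMemoryStore();   // in a browser, pass window.localStorage
saveViz(store, "adtech", ["company-one", "company-two"]);
console.log(loadViz(store, "adtech").length);   // 2
console.log(loadViz(store, "never-saved"));     // null
```

Because everything lives under string keys in one browser profile, a cleared store simply makes loadViz return null, which is the "you lose all saved visualizations" case.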

The code is all client-side, so it's right there in your browser if you want to look at it. I found myself late last night using a non-analytical debugging process** when I was trying to get the 'load visualization' piece to work. I'll put it up on GitHub some time after I clean it up.

-----
* And I redacted the list to only include companies that CB showed having investors. The full list was too big to load efficiently.
** Mainly making random code deletions.

Friday, December 2, 2011

You can't manage what you can't measure. Not at scale, anyway.

A year ago I wrote, re investing in social marketing, "The social loop will share superficial characteristics with the display loop, but it's really completely different... the area with the most near-term leverage will be tools that help communicators understand the impact of how they are communicating and then help them make better decisions." This has turned out to be completely true.

I've been thinking about social marketing for five years. It has seemed obvious that major advances in marketing technique will occur through the social channel, but it was never clear to me exactly what those would be. I looked at and worked with a couple dozen social media marketing companies before throwing up my hands and declaring non-prescience.

My rule of thumb is that when the evolution of the landscape seems unknowable it is usually because the technology that will underpin the advance is still in flux. The obvious solution is dropping a level deeper in the stack and looking for investments there. In mobile, that meant Flurry four years ago and Media Armor a year ago. In social, it meant Awe.sm.

The smartest guy I ever knew in the ad business (like being the tallest dwarf, I know...) said, of managing people, "Whatever chart you put on the wall goes up."
That was me, the tallest dwarf, from back when I knew Clay, when he was just another guy.

I worked at IBM during the heyday of the Six Sigma movement. I was a design engineer, trying to optimize a very small piece of the central processor of what became the System 390 series of mainframes. As a design engineer there were several layers of abstraction between me and the silicon: the design language was a visual one--I wrote a flowchart which was compiled into a set of logic gates which were then mapped onto silicon. Aside from tweaking the logic gate-level design to try to get better performance, I spent my time at the flowchart level, as did most of the engineers.

Six Sigma methodology has you measure processes, find causes of errors and remedy them. The idea is to improve processes until there are fewer than 3.4 defects per million opportunities. IBM had a company-wide mandate to implement Six Sigma. I was subject to this mandate.

I asked my manager how I was supposed to measure my 'defects' and why would I even want to if I had to define them in such a way that I essentially never, ever made that type of mistake. He said "How are you going to improve if you aren't noticing your mistakes and figuring out how to stop making them?" "I already do that," I said, "I'm just not marking them down on some stupid piece of graph paper that's been pre-printed with a normal curve." He said "But then how can we manage it?"

Ah, Bach.

You can't manage what you can't measure. Stupid as it is to manage designers on the binary idea of defect/not-defect, and on such a stringent scale, constantly knowing how well you are doing so that you can constantly improve is extremely powerful. This idea, probably more than any other, drives my investment strategy: things that are not being measured are being managed poorly; creating new ways to measure creates ways of doing things immensely better, and it creates entirely new businesses.

The fact is, you do get what you measure: whatever graph you put on the wall will go up. But the moral of that pithy aphorism was meant to be: be careful what you wish for.

If what you are measuring in social marketing is Likes or Follows, that is what you will get. But how closely aligned are these measures with what a business really wants: happy and loyal customers, higher sales? You don't know. No one knows. This particular loop hasn't been closed. Because the social gesture cause and business result can't be tied together in a measurable way, it can't be managed and it can't be improved.

I invested in Awe.sm's seed round because they provide core social measurement functionality, the ability to tie social actions into their actual results, to close the loop. I re-upped into their Series A because they're now doing something even more interesting: they're providing this functionality to other developers via API. Instead of being just an analytics player, they're now enabling the creation of an entire social marketing infrastructure that can use measurement to provide an ever-improving feedback loop.

I may have gravitated to marketing in part because dealing directly with people is too messy to ever even approach Six Sigma, but the engineer in me still believes that by measuring you can improve, and by linking measurement and algorithms you can create a feedback loop that allows you to improve adaptively and in real-time. This idea has revolutionized online advertising over the past few years. It's going to revolutionize social marketing over the next few.

Wednesday, November 23, 2011

iMapBox

I've always been the type who, when confronted with a one-hour task, will instead take two hours to automate it. Here's an example.

VCdelta is my bot that tracks additions to VC portfolio pages. It has its own twitter feed. Its twitter feed is about to surpass my twitter feed in number of followers. It seems my bot is more interesting than I am. I thought it would be interesting to graph the number of people who have followed me versus the number of people who have followed VCdelta over time. Twitter does not provide stats like that, but whenever I get a follow email from Twitter, I hit archive, not delete. So all I needed to do was count the follow emails by month.

Turns out Python doesn't have a very good library for using a mailbox as a data source. The Python email libraries assume you are planning on writing an email client. So I wrote an abstraction layer for the Python IMAP library. Code is here*.

Here's the code to count twitter followers:

import IMapBox 

me=IMapBox.IMapBox("imap.gmail.com",my_acct,my_pwd)
mymail=me["[Gmail]/All Mail"]

myfollows=mymail.frm("twitter").subject("following")

mydates=[myfollows[x]['date'] for x in myfollows]

The 'me=' and 'mymail=' open a connection to my email account and select a mailbox, in this case the All Mail mailbox. (The command 'me.list()' lists all the mailboxes for the account.)

The next line filters mymail so myfollows is only emails from Twitter that have 'following' in the subject line**. iMapBox is lazy--it doesn't fetch the emails itself until it has to--so this is pretty fast. myfollows acts like a dictionary, so you can len() it, ask for the keys()--these would be the message IDs--or the items(), iterate over it, or get items.

Each of the items in the dictionary is an email message. These also act like dictionaries, with keys like 'from','to','subject','date', and 'text'. The next line creates a list called mydates of the date each follow email was sent. It does this by iterating over each item in myfollows and pulling its date out. This is the slower part: when you set up an iterator, iMapBox gets all the headers***.

The part about counting follows per date I will leave as an exercise to the reader. Here's the graph of my follows and VCdelta's follows. I've been tweeting for some three years, VCdelta for six months.


On a side note, this is a logarithmic scale. The green line is my trend. This is odd, no? I mean, I'm not getting exponentially more popular, so this argues that a lot of follow behavior is algorithmic in some way. I had expected more linear growth. I also expect VCdelta to level out soon, as it reaches the limits of its natural audience.

Another example, email volume over time:



You can see where I started using my current email account full-time, in September 2006. And you can see when I started investing full-time, in mid-2009. And you can see why my email response time has slowed dramatically.

The code:

from datetime import date, timedelta
import IMapBox

me=IMapBox.IMapBox("imap.gmail.com",my_acct,my_pwd)
mymail=me["[Gmail]/All Mail"]

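for yr in range(2006,2012):
 for mo in range(1,13):
  # first day of the month, and the last day (first of next month minus one day)
  beg_month = date(yr,mo,1)
  end_month = date(yr+mo//12,mo%12+1,1)-timedelta(days=1)
  # a formatted print() runs under both Python 2 and Python 3
  print("%d/%d\t%d" % (mo, yr, len(mymail.dates(beg_month,end_month))))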

This is an alternative way to count emails per month, filtering by date instead of collecting dates. The 'dates(x,y)' method filters the emails for only those that were received between date x and date y (inclusive.) This is faster because even the headers are never fetched.

Some other ways to use it:

c=mymail.frm('josh')+mymail.frm('matt')
d=mymail.frm('josh')-mymail.to('matt')
e=mymail.today()
f=-mymail.today()

The first is all messages from either Josh or Matt. The second is all messages from Josh that aren't also to Matt, the third is all today's messages, the fourth is all messages except today's.

-----
* I'm an electrical engineer, not a computer scientist. So I can build a waveguide to your specifications, but I'm not entirely sure that this code is all that good. Please, feel free to fork, suggest improvements, make improvements, tutor me on garbage collection or unit testing, whatever.
** I like object chaining. I know it's not Pythonic, but I'm not sure why. It strikes me, since I don't really understand too deeply how Python garbage collects, that this may be creating extraneous intermediate objects. If you plan to use this in any sort of real code, you might want to figure that out. I did notice that if I object-chain the IMAP connection ('me' in this example), it gets dereferenced and gc'd, which invokes the very polite __del__ method, closing the connection. I'm not sure how to avoid that, so I just commented out the __del__ method, leaving a messy open connection to the server.
*** My thinking is to only go do the time-consuming fetching of messages when needed: when an email message object is referenced or when an iterator is set up (on the assumption that when you set up an iterator, you plan to consume the whole set of messages.) This latter is because fetching 100 messages in a single fetch is far faster than 100 single message fetches. The default is to only fetch the headers, except when the text itself is explicitly asked for. This default can be changed by setting priority='both' or priority='text' when you call iMapBox to open a connection to the server.

Friday, October 7, 2011

Disruptive innovation, buy vs. build, the most pernicious lie in business, and how to know if you're fooling yourself

If a man has good corn or wood, or boards, or pigs, to sell, or can make better chairs or knives, crucibles or church organs, than anybody else, you will find a broad hard-beaten road to his house, though it be in the woods. 
—Ralph Waldo Emerson, big fat liar

No matter what the dictionary says, you can't describe a company as disruptive without giving weight to Christensen's description of innovation. It's perhaps overly simplistic to divide innovation into two categories--disruptive and sustaining--but the strikingly different characteristics of companies pursuing these strategies make the partition a natural one.

Sustaining innovation means finding ways to do things better. Lowering the cost of manufacturing a widget by 10%, making a widget 20% more durable while only spending 10% more, reorganizing a department so ten people can do the work of twelve, creating an integrated supply chain to deliver goods to your stores in smaller quantities and less time. That sort of thing. Sustaining innovation often results in products that exceed customer needs at a given price point. The proliferating options in Microsoft Office show a sustaining innovation cycle that has exceeded most of the market's need.

Disruptive innovation means creating a product or service that is radically cheaper but much less functional (and this needs to appeal to a customer set that was previously underserved, so disruptive innovation often creates entirely new markets) and then using sustaining innovation to improve it until it meets mainstream customer needs (but is still radically cheaper.)

Before Google, there was targeted advertising. Very targeted. Hog Farmers Digest (now National Hog Farmer) was aimed at hog farmers. If you were a hog farmer, you read it; if you weren't, you didn't. It was a pretty effective buy: not a lot of wasted impressions. But creating an entire magazine for a very specific market is a difficult business proposition. The fixed cost of putting a book together limits how small its audience can be and so how targeted its ads can be.

Google's disruptive innovation was being able to create content for next to nothing. They can create a page that addresses a market segment as small as a single person for nominal marginal cost. Even though the content was lower quality than what it was competing with--the lack of human writers and editors means that any specific page is much less useful than a well-written and thought-out page would be--it turned out it was good enough. And because advertisers could be so specific in their buy, they could spend much less money. This opened up an entirely new market: advertisers that don't have multi-million dollar budgets.

Existing publishers could not compete: they could not lower their cost per page to anywhere near Google's. If they tried, they would lose quality and the loss of quality would mean losing their existing customers. This is the beauty of disruptive innovation: it is almost impossible for incumbents to respond. Disruptive innovations are disruptive because business logic precludes old-line companies from shrinking their business to address the disruptors.

It's incredibly difficult and expensive to challenge incumbents with nothing but a better product. Sustaining innovations are easy to copy and well-managed incumbents are always on the lookout for challengers and willing to learn from them. But when a disruptor comes along, they are trapped.

*****

What kind of innovation are we peddling in adtech? Article after article calls our companies disruptive, but do we really fit the Christensen mold? A disruption scenario would look like this:
  • the existing industry would supply a product of higher quality/functionality than the majority of potential customers actually needs and at a very high price;
  • the disruptive companies would find a way to bring in a product of lower quality/functionality at a much lower price;
  • customers that did not need and could not afford the old product would emerge as customers of the disruptive product, allowing the new companies the wherewithal to quickly mature their technology until it was competitive in the old product's market.
Does this sound like ad tech to you? It doesn't to me. The current ad-world is not supplying services at a higher quality than its customers need and there seems to be advertising inventory at every price point. If you can't supply advertising at a radically lower price point to customers who were previously underserved at a quality level that the incumbents are not interested in touching, you aren't really in a position to be disruptive. Almost all of adtech now is sustaining innovation: building a better mousetrap.

We clearly have a better solution than what existed, no argument. But the big lie of business, the pernicious fallacy that has deluded countless entrepreneurs, is that if you build a better mousetrap the world will beat a path to your door. It doesn't work that way.

*****

What is going on in adtech right now is clearly innovative. But because it's not disruptive in the Christensen sense, it means we're going to have to earn our money. We need to move fast to build scale.

There have been scores of M&A discussions in adtech this summer and only a few have resulted in deals. One of the things I heard as an excuse over and over (from buyers, from sellers, from bankers, from founders, after a few drinks) is that the buyer said "we don't need to pay up for this, we could build it internally."

Build versus buy is an interesting discussion to have before you buy anything, especially something with the revenue multiple adtech VCs are looking for. Cold hard fact is, there's almost nothing out there in adtech that someone else couldn't build from scratch. The CTO would certainly tell the CEO that building would be cheaper than buying a company, and be right.

And yet, and yet. And yet the companies that are prowling for bargains still can't get advertising right. They clearly have a ton of tech talent in their core businesses, and the ability to hire more. They have the money to hire and manage and build adtech solutions. But they don't. Why not?

When I was at Omnicom, back in the 90s, investing in the early interactive agencies--clearly not disruptive businesses--the old-guard ad agencies that then made up the bulk of Omnicom's business talked big about building their own interactive units. But they never could. They also refused to pay the valuations the i-agencies commanded. They were on the sidelines while their clients hired hotshot young startups to build their websites, and some of the startups got pretty big in the process.

There were several reasons for this. Primarily, the old guard couldn't hire good people: no one who understood the web back then would go work for an agency whose primary business was making 30-second films for TV. Why would anyone who was any good go be a second-class citizen at a firm that was paying nothing but a salary and had no career path in interactive? Why wouldn't they go instead to Razorfish and get stock options and be a hero to their management every day? They would, of course, and they did. And almost all the true stars of that era spent time in one of the independent agencies.

As then, so now. Why would any competent adtech engineer go work for AOL or Yahoo or Twitter or any of the other big old companies where stock options issued today will in all probability never be worth anything? There are plenty of good jobs at exciting startups where there's the possibility of making actual money*. More importantly, why go to one of those big companies and be a second-class citizen, the "ad guy," when at a startup you're essential to their product?**

Companies can do very well at their core mission. But when their core mission is media or software or infrastructure or professional services, it's going to be really hard for them to get a foothold in the quickly changing adtech world. This never seems to be taken into account in build versus buy analyses: they can't build, and even if they could, they won't. And if they do, it will suck. Trust me, I've been there. And if you don't trust me, just take a look around.

But remember that the era of the independent i-agencies only lasted some six or seven years. At some point the number of people that could do the work more than competently was enough that even old-line agencies could hire them. At that point the i-agencies were like every other agency: they competed head-to-head with the old guard. Many of the biggest remained independent until acquired for great prices. But these were the ones who earned it. Unlike a disruptive business where nothing but guts, an innovative spirit and a huge dose of luck are necessary, competing head-to-head means competing: blood, sweat and tears.

We need to keep building, ignore the distractions and focus on winning clients, not just raising money, so that when it comes time to compete head-to-head, we will win. That's as it should be, of course, and I think many of our industry leaders have what it takes. But if you're starting an adtech company and you want to win, you have to know that you're in it for the long-term. It's a marathon, not a sprint, the cliche goes, and it's true.

*****

Meh, you say. I'm disruptive, I am going to go viral, achieve imminent world domination and sell to Google for $5 billion in two years. Neumann's an idiot.

Maybe. But disruptive businesses have certain characteristics. Ask yourself these questions.

1. Am I creating a new market, bringing in a set of customers for whom there was previously no value proposition?

Disruptive businesses bring out a product or service that is so far off the industry price/quality line that customers who would never have used the industry's products start to. This gives the disruptor the foothold it needs to start improving quality until it threatens the incumbents. Google AdWords is an excellent example of this.

Who are the unserved markets in advertising? Are there any? I think there are, and I think that if you don't see any, you need to think about what advertising is more broadly.

2. What is price in my market?

If you're in ad-tech, what does price even mean to your end-customers (the advertisers***)? Is it just lower CPMs? There have always been low CPMs out there. Is it higher ROI? That's probably closer to the mark. The best answer I have heard is that it is lower risk: the ability to more accurately predict ROI.

You have to credibly answer this question and then be radically better along this dimension if you are disruptive. I think there are many answers here, and your answer will depend on your answer to question one, above.

3. What is quality in my market?

In disk drives (Christensen's first case study), this is an easy question: quality is how much data can be stored. The disruptors built lower-quality disk drives at lower prices, then used the march of progress to threaten the old-line disk makers. The old-line disk makers' customers wanted more storage, not less, so they did not see this market and could not address it with their existing customer bases. But key to the disruptors' long-term value was the ability to improve quality quickly. If they could not, they would not have been able to displace the old guard.

What is quality in adtech? Conversion? Click-through? Pinpoint targeting? And if you know what quality is to your market, can you then improve quickly along that metric so you serve not only the new market you've created, but the giant market that already exists?

Quality. I've been thinking about this question for ten years and don't have a definitive answer. Do you?

If you do, if you think you really have a disruptive business model, call me, I'm looking to back people like you.

-----
* If this is you, email me.
** Soldiers don't get promoted if they haven't seen battle. If you want a career path, always take the job in the middle of the action, even if it pays worse.
*** And are the advertisers really your customers? Why aren't the 'consumers'?

Sunday, September 11, 2011


To my friends who died ten years ago, I hope you had a fortunate rebirth.

To my friends who had family members die, may your loved ones find happiness.

To my son, born 42 weeks and one day later: there is only loss if there is love, the way forward is always through love.





Thursday, August 4, 2011

How I Wrote VCBar

All the people ask me
How I wrote elastic man.

                     - The Fall

My friend Chris Wiggins asked me to post the code for the VC bar chart generator I blogged earlier this week. It's here.

It's an interesting project if only because it's run entirely on the client-side. There's no server side (except, of course, for delivering the files to you.) This is possible because the Crunchbase API supports JSON callbacks. Every bit of code in the git repo is as you see it on http://neuvc.com/labs/vcbar.

But in the spirit of making the source code available, I'm going to go one better and show you how to write your own visualization of Crunchbase data. Because there's no server-side, you can play with this code on your computer with nothing more than a text editor and a web browser.

Adapt this code to visualize other data sets: people respond to visualizations and, as this shows, it's not very hard to make them.

This code is going to be as bare as possible, no bells and whistles. I hope to illustrate just the bones of it. You can add bells and whistles and DTD declarations to your heart's delight, but this works too.

*****

The program will be broken into three parts: the HTML, the CSS and the Javascript.

The HTML

Create a directory on your computer, download d3.js from https://github.com/mbostock/d3/archives/master, unzip the archive and move the file d3.js into your new directory. Then create a file named index.html in the directory. Put this in it:

<html> 
   <head>  
      <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.5.2/jquery.min.js"></script> 
      <script src="d3.js"></script>  
      <script src="vcbar.js"></script> 
      <link rel="stylesheet" type="text/css" href="vcbar.css" /> 
   </head> 

   <body> 
      <div id="barchart"> 
      </div> 
      <div id="controls"> 
         <a href="#" id="union-square-ventures">Union Square Ventures</a>
         <a href="#" id="true-ventures">True Ventures</a>
       </div> 
   </body> 
</html>

Pretty simple. The head loads the javascript (including jQuery from Google's CDN) and the CSS. The body has two divs, one with the id "barchart"--this is where the javascript will put the chart object itself--and one with the id "controls", which holds a link for each of the two VC firms in this example. Note that the links do not link anywhere. We will use the javascript to execute an action when a link is clicked.

The Javascript, part 1: Getting and Parsing the data

Put all the javascript into a file called vcbar.js in the same directory as index.html.

There are three things we want to do in the program:
1. Detect when one of the links is clicked;
2. Get and parse the data;
3. Display the bar chart.

The first is easy, especially using jQuery:
$(document).ready(function () {
   $("a").click(function () {
      // the link's id is the firm's Crunchbase permalink
      var vc = $(this).attr("id");
      // "callback=?" tells jQuery to make the request as JSONP
      $.getJSON("http://api.crunchbase.com/v/1/financial-organization/" +
               vc + ".js?callback=?",parseCB);
      // returning false stops the browser from following the href
      return false;
   });
});

This code uses jQuery (the '$') to run an anonymous function every time an <a> tag is clicked. The function first gets the id attribute of the clicked tag (which we set to Crunchbase's unique identifier, their 'permalink') and then uses jQuery to execute an Ajax call for JSON, with a callback. Returning false at the end prevents the default click action (following the link's href); a bare return would not. The enclosing document.ready method makes sure the script won't try to attach the code until after the HTML is loaded.

Part of the reason this site can do everything it does on the client-side is because Crunchbase's API supports JSON callbacks. In general, client-side Javascript can't go willy-nilly fetching things from other sites because of the same origin policy enforced by browsers for security purposes. But if you're trying to pull the data from a site that supports JSON with callbacks, you can easily get data from it.

The getJSON function sends a request to Crunchbase for the VC's data. You can see an example of the raw JSON here. When the data returns it calls the callback function--parseCB--with the JSON as the argument. Note that this happens asynchronously, so if you send multiple calls (as with the vcbar site, when you click one of the subset buttons) the data does not necessarily come back in the order you asked for it. Or, maybe, at all. The callback function gets called once for each set of JSON. You need to think through the implications, in some cases.
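
One way to cope with responses arriving out of order is to file each one under the permalink it was requested for and only act once they are all in. The sketch below assumes that approach; makeCollector and its arguments are hypothetical names, not part of vcbar.js, and the arrival order is simulated by hand rather than by real network calls.

```javascript
// makeCollector is a hypothetical helper, not part of vcbar.js.
// It files each response under its permalink and calls onDone once every
// expected permalink has reported back, in the order originally requested.
function makeCollector(permalinks, onDone) {
   var results = {}, remaining = permalinks.length;
   return function (permalink, jsn) {
      var known = permalinks.indexOf(permalink) !== -1,
          seen = Object.prototype.hasOwnProperty.call(results, permalink);
      if (!known || seen) { return; }   // ignore strays and duplicates
      results[permalink] = jsn;
      remaining -= 1;
      if (remaining === 0) {
         onDone(permalinks.map(function (p) { return results[p]; }));
      }
   };
}

var ordered = null;
var receive = makeCollector(["union-square-ventures", "true-ventures"],
                            function (all) { ordered = all; });

// simulate the second request's data arriving first
receive("true-ventures", { name: "True Ventures" });
receive("union-square-ventures", { name: "Union Square Ventures" });

console.log(ordered.map(function (r) { return r.name; }).join(", "));
// Union Square Ventures, True Ventures
```

In the click handler, the callback passed to getJSON would be a small function that calls receive(vc, jsn). Note that if one response never comes back, onDone never fires, so real code would also want some kind of timeout.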

Here we're asking for one set of data, so it's easy.  Here's parseCB:

var parseCB = function(jsn) {
   var i, j, idx, yr, mo,
       byear=2005, eyear=2011,
       months=(eyear-byear+1) * 12,
       data=[];

   // one zero-filled slot per month, January byear through December eyear
   for (i=0; i < months; i+=1) { data[i] = 0; }

   if ("investments" in jsn) {
      for (i=0; i < jsn["investments"].length; i+=1) {
         j = jsn["investments"][i];
         if ("funding_round" in j) {
            yr = j["funding_round"]["funded_year"];
            mo = j["funding_round"]["funded_month"];
            if (!yr || !mo || (mo == "None")) { continue; }
            idx = (parseInt(yr,10)-byear) * 12 + parseInt(mo,10) - 1;
            // skip rounds outside the charted span; an index past the end
            // would otherwise put NaN entries into the array
            if (idx < 0 || idx >= months) { continue; }
            data[idx] +=1;
         }
      }
   }
   return bchart(data,byear);
};

The first few lines declare the function's variables. They set the beginning year to 2005, the end year to 2011 and then calculate the number of months in that span. Then it creates an array with a zero value for each month.

The function then parses the JSON. Go look at the raw JSON at the link above again, if you want to see what's going on here. First it tests to see if there is an "investments" key in the JSON. If there is an investments key, the corresponding value will be an array with an entry for each investment. Each entry in this array will be a dictionary with a "funding_round" key, whose value is itself a dictionary with keys for "funded_year" and "funded_month". parseCB first tests to make sure that neither the year nor the month is empty and that the month is not "None", then computes how many months passed from the beginning of 2005 (byear) until the investment was made, skipping any investment made before 2005. It then increments the array element representing that month.

When it is finished slotting each investment into a month, it calls bchart, the charting function.
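
The month-slot arithmetic can be tried on its own. This is a sketch under the same assumptions as parseCB (a span from 2005 through 2011, months numbered 1 to 12); monthIndex is a hypothetical helper written for illustration, not a function in vcbar.js.

```javascript
// monthIndex is a hypothetical helper, not part of vcbar.js.
// Returns the zero-based month slot for a funding date, or -1 if the
// date is missing, malformed, or falls outside byear..eyear.
function monthIndex(yr, mo, byear, eyear) {
   var y = parseInt(yr, 10),
       m = parseInt(mo, 10),
       months = (eyear - byear + 1) * 12,
       idx;
   if (isNaN(y) || isNaN(m) || m < 1 || m > 12) { return -1; }
   idx = (y - byear) * 12 + m - 1;
   return (idx < 0 || idx >= months) ? -1 : idx;
}

console.log(monthIndex(2005, 1, 2005, 2011));       // 0, January 2005
console.log(monthIndex("2011", "12", 2005, 2011));  // 83, December 2011
console.log(monthIndex(2004, 6, 2005, 2011));       // -1, before the span
console.log(monthIndex(2008, "None", 2005, 2011));  // -1, no month recorded
```

Seven years of twelve months gives 84 slots, so valid indexes run from 0 to 83.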

The Javascript, part 2: Charting

The bar chart function is essentially cribbed from Mike Bostock's bar chart tutorial. It uses the d3.js data manipulation library to create an SVG element in the HTML.

Here's the code, broken into chunks so I can explain it.  It's all inside a

var bchart = function (data, byear) {
   ...
};

First, let's set up some variables. h is the height, totw is the total width, w is the width of each bar, lgst is the largest value in the data to be charted, tks is the number of horizontal ticks we want, years is an array of years from the beginning year (byear) to the end year (this is used to label the x-axis.)

y is a special d3 function that maps the 'domain' to the 'range'. In this case, it maps a value from 0 to lgst to the range 0 to h. That is, y(x) = x * h / lgst. This scales the bars so the largest value in the data is the height of the chart.

var h = 300,
    totw = 800,
    w = totw / data.length,
    lgst = d3.max(data),
    tks = Math.min(lgst,5),
    years = d3.range(byear,byear+data.length/12+1),
    y = d3.scale.linear()
          .domain([0,lgst])
          .range([0,h]);
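If the scale feels like magic, here is a rough stand-in written in plain JavaScript. It is not d3's implementation, just the straight-line mapping described above; linearScale is a name I made up, and the data values are invented.

```javascript
// Stand-in for d3.scale.linear().domain([d0,d1]).range([r0,r1]).
// Not d3's code; just the straight-line mapping it performs in this chart.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1],
      r0 = range[0], r1 = range[1];
  return function (x) {
    return r0 + (x - d0) * (r1 - r0) / (d1 - d0);
  };
}

var h = 300,
    data = [0, 2, 5, 1],                // made-up monthly deal counts
    lgst = Math.max.apply(null, data),  // plays the role of d3.max(data)
    y = linearScale([0, lgst], [0, h]);

console.log(y(0), y(2), y(lgst)); // prints: 0 120 300
```

With a domain starting at 0 and a range starting at 0 this collapses to x * h / lgst. Note that the sketch does not guard against lgst being 0 (a firm with no deals at all), where the division breaks down.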

Then, let's get rid of any chart that happens to already be there, so we don't keep adding new charts one after the other.

$(".chart").remove();

Now we add an SVG element to the div with id="barchart". We will make it wider than totw and higher than h so we have room to add the axes and their labels.

// insert SVG element     
var chart = d3.select("#barchart")
              .append("svg:svg")
                .attr("class","chart")
                .attr("width", totw+40)
                .attr("height", h+40);

Then we'll add the x and y-axis ticks, the light gray lines that help us see what the values are. We use a built-in d3 function called ticks, which chooses sensible values for the ticks based on tks, the number of ticks we want. The way d3 works (and I'm not going to explain this in too much depth, you can go to the d3 site for much better explanations) is that it takes an array of data (the data method below the select), matches each item against the existing svg elements that fit the select, and uses the enter method to get at the items that have no element yet. For those it appends new elements, as here.

The below code iterates through each of the ticks generated by ticks and appends a new svg:line with attributes (x1, y1) and (x2, y2). The methods chained after data can have anonymous functions that have access to the data in the array (d) and the index of the data (i). For instance, the y-axis ticks have an x1 of 20 (I've added an offset of 20 to all the x values to accommodate the y-axis labels) and an x2 of totw+20. The y1 and y2 values are trickier. They are both the same (it's a horizontal line) and they both take the d value (where the tick is), scale it using the y function and then subtract that value from h, because the origin of the svg plotting area, the (0,0) point, is in the top left whereas our chart's (0,0) point is in the bottom left.

The text labels do something similar. The y-axis uses the tick value as a string for the text and the dx attribute to move the label slightly before the axis itself. The x-axis uses the array of years we created earlier as labels, and centers them between ticks.

   // create y-axis ticks
   chart.selectAll("line.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:line")
            .attr("class","hrule")
            .attr("x1",20)
            .attr("x2",totw+20)
            .attr("y1",function(d) { return h-y(d); })
            .attr("y2",function(d) { return h-y(d); })
            .attr("stroke","#ccc");

   // label y-axis ticks  
   chart.selectAll("text.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:text")
            .attr("class","hrule")
            .attr("x",20)
            .attr("y",function(d) { return h-y(d); })
            .attr("dx",-1)
            .attr("text-anchor","end")
            .text(String);

   // create x-axis ticks           
   chart.selectAll("line.vrule")
            .data(years)
        .enter().append("svg:line")
            .attr("class","vrule")
            .attr("y1",h+10)
            .attr("y2",0)
            .attr("x1",function(d) { return (d-byear)*w*12 + 20; })
            .attr("x2",function(d) { return (d-byear)*w*12 + 20; })
            .attr("stroke","#ccc");

   // label x-axis ticks          
   chart.selectAll("text.vrule")
            .data(years)
        .enter().append("svg:text")
            .attr("class","vrule")
            .attr("y",h)
            .attr("x",function(d) { return (d-byear) * w * 12 + w * 6 + 20; })
            .attr("dy",10)
            .attr("text-anchor","middle")
            .text(String);
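The coordinate juggling is easier to check outside the browser. Here's a small sketch that computes the same positions with plain functions and no d3. The helper names (hrule, vruleX, labelX) and the value of lgst are invented for the example; the formulas are the ones in the code above.

```javascript
var h = 300, totw = 800, byear = 2005,
    months = 84,        // 2005 through 2011
    w = totw / months,  // width of one monthly bar
    lgst = 6;           // made-up largest monthly count

// Same mapping as the y scale: value -> pixel height.
function y(v) { return v * h / lgst; }

// Horizontal rule for a tick value: subtract from h so larger values sit higher up.
function hrule(d) {
  return { x1: 20, x2: totw + 20, y1: h - y(d), y2: h - y(d) };
}

// Vertical rule at the start of a year, and its label centered six months later.
function vruleX(year) { return (year - byear) * w * 12 + 20; }
function labelX(year) { return vruleX(year) + w * 6; }

console.log(hrule(0).y1, hrule(3).y1, hrule(6).y1); // prints: 300 150 0
console.log(vruleX(2005), labelX(2005));
```

A tick at 0 ends up at the bottom of the plotting area (y = 300) and a tick at the largest value ends up at the top (y = 0), which is the flip that h - y(d) buys us.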

Now we create the data bars. Here we feed d3 the array of data. For each of the data elements it creates (using enter) a new svg:rect, a rectangle.  Each rectangle has x and y as its top left point and a width and height. The rectangles will also be styled by the CSS, which we'll talk about later on.

    // create bars
    var bars = chart.selectAll("rect")
            .data(data)
        .enter().append("svg:rect")
            .attr("x", function(d, i) { return i * w + 20; })
            .attr("y", function(d) { return h - y(d); })
            .attr("width",w)
            .attr("height", function(d) { return y(d); }); 
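If you want to sanity-check the bar geometry without a browser, here is a sketch that computes the attributes each rectangle would get. barRects is a hypothetical helper, not part of vcbar.js, and the counts are made up.

```javascript
// Compute the attributes each svg:rect would get, without touching the DOM.
function barRects(data, h, totw) {
  var w = totw / data.length,
      lgst = Math.max.apply(null, data);
  return data.map(function (d, i) {
    var height = d * h / lgst;  // what y(d) returns in the chart code
    return { x: i * w + 20, y: h - height, width: w, height: height };
  });
}

var rects = barRects([1, 4, 2, 0], 300, 800); // made-up counts
rects.forEach(function (r) { console.log(JSON.stringify(r)); });
```

Every rectangle's y plus its height comes out to h, so all the bars sit on the x-axis no matter how tall they are.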

And, finally, the x and y axes. The reason we create the ticks first, then the bars and then the x and y-axis is that this is the order of layering we want, ticks at the bottom, bars on top of them, then the axes.

   // create x-axis
   chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",totw + 20)
        .attr("y2",h)
        .attr("stroke","#000");

   // create y-axis               
   chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",20)
        .attr("y2",0)
        .attr("stroke","#000");

Don't forget to include the function declaration before all the chart code and the '};' after it all. Just saying. Also, the javascript should have the functions first, so essentially in the opposite order presented here. I've put all the javascript in one contiguous piece at the bottom*.

That's the chart. After that, the CSS is a piece of cake.

CSS

Nothing fancy here. Put it in a file called vcbar.css in the same directory as index.html.

.chart {
    margin-left: 40px;
    font: 10px sans-serif;
    shape-rendering: crispEdges;
}
           
.chart rect {
    stroke: white;
    fill: steelblue;
}

And that's it. If you put this code into files on your computer and open index.html from your web browser, you should get a chart. Then go and change the code and see what happens, or add lots more code and do something really, really cool. When you do, tweet me, I want to see it.

-----
* vcbar.js, in total:

var bchart = function (data, byear) {
   var h = 300,
       totw = 800,
       w = totw / data.length,
       lgst = d3.max(data),
       tks = Math.min(lgst,5),
       years = d3.range(byear,byear+data.length/12+1);

    $(".chart").remove();

    var y = d3.scale.linear()
             .domain([0,lgst])
             .range([0,h]);

   // insert SVG element      
    var chart = d3.select("#barchart")
        .append("svg:svg")
            .attr("class","chart")
            .attr("width", totw+40)
            .attr("height", h+40);

   // create y-axis ticks
    chart.selectAll("line.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:line")
            .attr("class","hrule")
            .attr("x1",20)
            .attr("x2",totw+20)
            .attr("y1",function(d) { return h-y(d); })
            .attr("y2",function(d) { return h-y(d); })
            .attr("stroke","#ccc");

   // label y-axis ticks  
    chart.selectAll("text.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:text")
            .attr("class","hrule")
            .attr("x",20)
            .attr("y",function(d) { return h-y(d); })
            .attr("dx",-1)
            .attr("text-anchor","end")
            .text(String);

   // create x-axis ticks           
    chart.selectAll("line.vrule")
            .data(years)
        .enter().append("svg:line")
            .attr("class","vrule")
            .attr("y1",h+10)
            .attr("y2",0)
            .attr("x1",function(d) { return (d-byear)*w*12 + 20; })
            .attr("x2",function(d) { return (d-byear)*w*12 + 20; })
            .attr("stroke","#ccc");

   // label x-axis ticks          
    chart.selectAll("text.vrule")
            .data(years)
        .enter().append("svg:text")
            .attr("class","vrule")
            .attr("y",h)
            .attr("x",function(d) { return (d-byear) * w * 12 + w * 6 + 20; })
            .attr("dy",10)
            .attr("text-anchor","middle")
            .text(String);
   
    // create bars
    var bars = chart.selectAll("rect")
            .data(data)
        .enter().append("svg:rect")
            .attr("x", function(d, i) { return i * w + 20; })
            .attr("y", function(d) { return h - y(d); })
            .attr("width",w)
            .attr("height", function(d) { return y(d); }); 

   // create x-axis
    chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",totw+20)
        .attr("y2",h)
        .attr("stroke","#000");

   // create y-axis               
    chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",20)
        .attr("y2",0)
        .attr("stroke","#000");     
};

var parseCB = function(jsn) {
   var idx, yr, mo,
       byear=2005, eyear=2011,
       months=(eyear-byear+1) * 12,
       data=[];
      
   for (var i=0; i < months; i+=1) { data[i] = 0 };
  
   if ("investments" in jsn) {
      for (var i in jsn["investments"]) {
         var j = jsn["investments"][i];
         if ("funding_round" in j) {
            yr = j["funding_round"]["funded_year"];
            mo = j["funding_round"]["funded_month"];
            if (!yr || !mo || (mo == "None")) { continue };
            idx = (parseInt(yr, 10)-byear) * 12 + parseInt(mo, 10) - 1;
            // skip anything before byear or after eyear
            if (idx < 0 || idx >= months) { continue };
            data[idx] +=1;
         };
      };
   };
   return bchart(data,byear);
};

$(document).ready(function () {
   $("a").click(function () {
     var vc = $(this).attr("id");
     $.getJSON("http://api.crunchbase.com/v/1/financial-organization/" + vc + ".js?callback=?",parseCB);
     return;
   });
});

Monday, August 1, 2011

Pace of VC investing by subsector

I couldn't sleep last night so I figured I'd see if I could confirm a nagging suspicion about the early-stage VCs I know. About six months ago it seemed like they were slowing down their pace of investing while the corporates and newer super-angels were doing a lot more deals. If this were true it would be an interesting warning sign.

So I downloaded d3.js, pulled out the list of VCs I put together for VCdelta and built a visualizer for Crunchbase data. It's fun to play with*.

Here's a graph of the deals the 150+ VCs have done since 2005, according to Crunchbase. If you go to the site and click "All" at the bottom, you get this, except it's live: you can add or subtract either VC firms or round types, and you can hover over the bars to see the names of the companies invested in that month**. You can also, if you click the subsets below, see who I included and who I didn't. And then add or subtract to your heart's content.

What looks like a small downturn in 2008 and 2009 in deals done is mainly due to VCs continuing to do later rounds--B and later. I assume many of these were into companies that were already portfolio companies.

Here are all the VCs, but just the rounds tagged Seed, Angel and A.
This makes it easier to see the dropoff in 2008 and 2009. But the low point in early stage investments came later than I thought, in 2009. It had seemed to me that early 2008 was drier. Also, according to Crunchbase, more early stage deals are getting done now than in 2007.

New York City is on a roll, right? Right. Below are the NYC funds (not NYC deals) and how many early stage (Seed, Angel, A) deals they did.

Compare this to Sand Hill Road:

Sand Hill Road has remained relatively conservative into 2010 and 2011.

Some other VC subsets. I used the top 20 venture capitalists in Forbes' Midas List to create a 'smart money' subset of firms. Here are their early-stage deals. The pronounced uptick from the lows in 2008 and 2009 into 2010 and 2011 is heartening.


I also made a subset consisting of firms that have been around since before the 1980s, the 'old school.'  I assumed that if they've made it this long, they must be doing something right. Their increase in early stage investments, while less pronounced, is also heartening.

Last, the Super Angels. No surprise here.

The one thing these graphs don't do is support my original thesis: VCs are not slowing down their funding of early-stage companies. Interestingly, I found that even the VCs who have flat-out told me they are slowing down their investing are not really doing so: while there's fear in the market, VCs are also clearly seeing opportunities they can't turn down.

-----
* d3.js is awesome. The Yieldbot guys turned me on to it. I'm just learning it, so I know I'm manhandling it something awful, but it's a joy to work with.
** Let's do the usual caveats: Crunchbase data sucks for this kind of thing. It's incomplete, it's biased, it's not very clean or accurate, etc. This is all completely offset by the fact that it's free. If I had a better dataset, I'd use it, but I don't.