Tuesday, February 28, 2012

Selecting, not filtering: Give me a reason to say yes

Raising money for my last startup was humbling, frustrating, time-consuming. But that part was okay: any highly selective process will be humbling, frustrating, and time-consuming. The part that really bothered me wasn't that it took so much time, but that so much of that time was a complete and total waste. In almost all of the VC meetings, we did not leave with a check. But in some 80% of the meetings we also did not leave with any insight as to why not*.

The best VCs listened to us and then gave us some insight into their thinking. Fred Wilson and Brad Burnham actually said no to us and then worked through their thinking about what we were doing in great and helpful detail. Josh Kopelman and Howard Morgan told us it wouldn't work, told us exactly why, then invested, and--after that--introduced us to people who helped us fix the flaws in the plan. But many others gave us either no response at all ("we'll get back to you") or generic non-responses ("we'd like to see more traction.")

We had a pretty firm idea of the problem we wanted to solve, but we were somewhat flexible about how we would solve it. We used the feedback from the money-raising process to hone our ideas. When we got no feedback, we felt we had given the VCs critical market intelligence and gotten nothing in return.

One of my ideals when I started investing was to always provide feedback when I said no. But here I am four years in, going through the pitches that piled up last week while I was on vacation. I'm finding it hard to live up to that ideal. I'm saying no to companies that I don't have a concrete reason to say no to. After a bit of introspection, I think that finding a reason to say no is not really how I make the hard decisions.

I have a tangible reason to say no to some 85% of the pitches I see, and I say yes to less than 2% (some of these I don't end up doing because we can't agree on a deal.) Here's a swag at how my dealflow works out:

  • 40%: No; I do not know your market well enough to help you succeed (also known as I do not know your market well enough to make a good decision about investing);
  • 20%: No; I do not think your idea will work and I can't see where else you will be able to put the technology you're building to work/you are completely inflexible about entertaining other potential markets for your technology/you are too flexible about where you will put your technology to work (the "we're a platform!" syndrome);
  • 10%: No; You are creating something merely better, not different;
  • 5%: No; You have the wrong team/your team does not seem to gel/you do not seem to think you need a team at all/you are coding in .NET;
  • 5%: No; Other explainable reasons;
  • 5%: No; Bad**;
  • 13%: Meh;
  • 2%: Like.
I always explain, in as much detail as the entrepreneur wants, my thinking behind the 85% where I can say no. And I am always happy to explain why I like the 2% I like.

The rub is in the penultimate 13%. These are companies that I don't have a real reason to say no to, companies where I analytically think they have a venture-capital-winner expected value but where I just can't get excited about them. The reality is that with these companies--and, in fact, with all companies--I am not looking for a reason to say no; I am looking for a reason to say yes. With the 85%, there is a glaring reason why I can't say yes. With the 13%, I just can't get the word to come out of my mouth.

For the companies I can easily say no to, some dimension of their plan (team, market, vision, product, customer, etc.) does not rise above my threshold of yes. For the 15%, all aspects do. Analytically the fitness function then necessarily also rises above my threshold.

The difference between the 'meh' and the 'like' is that the 'meh' companies are good enough in all aspects but not great in any of them. The 'like' companies are the ones where they really excel in at least a couple of ways: a great team, a big market, a compelling vision. I try to select not just for how good a company is, but how good it will be. It's easy to improve along one dimension, it's possible to improve along a couple of dimensions, but it's almost impossible to improve along all dimensions. The companies that are just good enough in all dimensions need to improve in all dimensions. The companies that are great in a few just need to improve in a few others, not all, to be great overall.

In fact, some of my favorite companies are the ones that may not even rise above the threshold in one or two dimensions but make up for it by having a superstar team or a gigantic market or a world-beating vision. These are the companies that have a shot at being legendary.

I don't know what to say to 'meh' companies after they pitch me. It's hard for you to recover from a "we're not so bad" pitch. But if you're dreaming up your startup right now my recommendation is to be good at everything, but to be insanely great at something. That's what gets me excited.

-----
* And, I should note, the founding team knew the venture market inside and out. We had done our research on which firms to approach based on what they were interested in, which partners to approach, had pre-sold the idea before the physical meeting, had customized the deck to highlight the aspects that particular firm/partner could grab onto most quickly, etc. Highly suggested in any case.

** My dealflow right now is pretty highly curated so I don't get a lot of pitches that are just, well, bad. Not to be judgemental. Bad, to me, is a founder who simply does not know what they're doing: a non-coder trying to enter a market either (i) that they just don't know anything about--generally where they've had a bad customer experience but have not done the research to understand the institutional framework behind the root cause, (ii) where there are great companies already doing exactly what they want to do and they've never heard of them, or (iii) that is so small that even revolutionizing it will create almost no societal value. Or, they could give a damn about creating societal value, they just want to make some money quick.

Tuesday, January 17, 2012

VC/Company Investment Visualizer

A friend asked me last week if I knew a tool to help him visualize which VCs were investing in a sector. I did not. But I realized I could pretty quickly repurpose the VC Bar Chart code and some unpublished code that pulls in data from a Google spreadsheet to show a force-directed graph. So, weekend project.

Data from Crunchbase, visualization using the d3.js library.

Here's my portfolio.

The site is here. Just start typing company names in the box in the upper-left-hand corner and hit plus to add. Real name to Crunchbase permalink translation uses the list of companies as of Friday* or so, so if the company was added to CB later, autocomplete finds nothing; just type in the permalink and the company will still be added. In the screenshot above a few of my companies had no CB investor entries, so they're just floating out there. Many of my other companies are not linked to me because CB does not mention me as an investor.

One way to explore is to enter a bunch of companies in your area of interest and see how the graph falls out.  Here's one of the AdTech industry.


The save functionality is experimental (to me, that is.) It uses HTML5 localStorage. The caveat is that you can't email visualizations around that way, and there may be times when your browser clears localStorage (sometimes when clearing cookies, for example.) If it does, you lose all saved visualizations.

The code is all client-side, so it's right there in your browser if you want to look at it. I found myself late last night using a non-analytical debugging process** when I was trying to get the 'load visualization' piece to work. I'll put it up on GitHub some time after I clean it up.

-----
* And I redacted the list to only include companies that CB showed having investors. The full list was too big to load efficiently.
** Mainly making random code deletions.

Friday, December 2, 2011

You can't manage what you can't measure. Not at scale, anyway.

A year ago I wrote, re investing in social marketing, "The social loop will share superficial characteristics with the display loop, but it's really completely different... the area with the most near-term leverage will be tools that help communicators understand the impact of how they are communicating and then help them make better decisions." This has turned out to be completely true.

I've been thinking about social marketing for five years. It has seemed obvious that major advances in marketing technique will occur through the social channel, but it was never clear to me exactly what those would be. I looked at and worked with a couple dozen social media marketing companies before throwing up my hands and declaring non-prescience.

My rule of thumb is that when the evolution of the landscape seems unknowable it is usually because the technology that will underpin the advance is still in flux. The obvious solution is dropping a level deeper in the stack and looking for investments there. In mobile, that meant Flurry four years ago and Media Armor a year ago. In social, it meant Awe.sm.

The smartest guy I ever knew in the ad business (like being the tallest dwarf, I know...) said, of managing people, "Whatever chart you put on the wall goes up."
That was me, the tallest dwarf, from back when I knew Clay, when he was just another guy.

I worked at IBM during the heyday of the Six Sigma movement. I was a design engineer, trying to optimize a very small piece of the central processor of what became the System 390 series of mainframes. As a design engineer there were several layers of abstraction between me and the silicon: the design language was a visual one--I wrote a flowchart which was compiled into a set of logic gates which were then mapped onto silicon. Aside from tweaking the logic gate-level design to try to get better performance, I spent my time at the flowchart level, as did most of the engineers.

Six Sigma methodology has you measure processes, find causes of errors and remedy them. The idea is to improve processes until there are fewer than 3.4 defects per million. IBM had a company-wide mandate to implement Six Sigma. I was subject to this mandate.
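For the curious, the 3.4 figure can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: the usual Six Sigma convention of putting the specification limit six standard deviations out and then allowing the process mean to drift by 1.5 standard deviations, which leaves a one-sided normal tail at 4.5. The variable names are mine.

```python
from statistics import NormalDist

# Assumption: the conventional Six Sigma bookkeeping, a six-sigma
# specification limit less an allowed 1.5-sigma long-term drift of the mean.
SIGMA_LEVEL = 6.0
ASSUMED_SHIFT = 1.5

# Probability of landing beyond the limit on one side of a standard normal.
tail = 1.0 - NormalDist().cdf(SIGMA_LEVEL - ASSUMED_SHIFT)

defects_per_million = tail * 1000000
print(round(defects_per_million, 1))
```

Under that assumption the tail works out to roughly 3.4 defects per million opportunities.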

I asked my manager how I was supposed to measure my 'defects' and why would I even want to if I had to define them in such a way that I essentially never, ever made that type of mistake. He said "How are you going to improve if you aren't noticing your mistakes and figuring out how to stop making them?" "I already do that," I said, "I'm just not marking them down on some stupid piece of graph paper that's been pre-printed with a normal curve." He said "But then how can we manage it?"

Ah, Bach.

You can't manage what you can't measure. Stupid as it is to manage designers on the binary idea of defect/not-defect, and on such a stringent scale, constantly knowing how well you are doing so that you can constantly improve is extremely powerful. This idea, probably more than any other, drives my investment strategy: things that are not being measured are being managed poorly; creating new ways to measure creates ways of doing things immensely better, and it creates entirely new businesses.

The fact is, you do get what you measure: whatever graph you put on the wall will go up. But the moral of that pithy aphorism was meant to be: be careful what you wish for.

If what you are measuring in social marketing is Likes or Follows, that is what you will get. But how closely aligned are these measures with what a business really wants: happy and loyal customers, higher sales? You don't know. No one knows. This particular loop hasn't been closed. Because the social gesture cause and business result can't be tied together in a measurable way, it can't be managed and it can't be improved.

I invested in Awe.sm's seed round because they provide core social measurement functionality, the ability to tie social actions into their actual results, to close the loop. I re-upped into their Series A because they're now doing something even more interesting: they're providing this functionality to other developers via API. Instead of being just an analytics player, they're now enabling the creation of an entire social marketing infrastructure that can use measurement to provide an ever-improving feedback loop.

I may have gravitated to marketing in part because dealing directly with people is too messy to ever even approach Six Sigma, but the engineer in me still believes that by measuring you can improve, and by linking measurement and algorithms you can create a feedback loop that allows you to improve adaptively and in real-time. This idea has revolutionized online advertising over the past few years. It's going to revolutionize social marketing over the next few.

Wednesday, November 23, 2011

iMapBox

I've always been the type who, when confronted with a one-hour task, will instead take two hours to automate it. Here's an example.

VCdelta is my bot that tracks additions to VC portfolio pages. It has its own twitter feed. Its twitter feed is about to surpass my twitter feed in number of followers. It seems my bot is more interesting than I am. I thought it would be interesting to graph the number of people who have followed me versus the number of people who have followed VCdelta over time. Twitter does not provide stats like that, but whenever I get a follow email from Twitter, I hit archive, not delete. So all I needed to do was count the follow emails by month.

Turns out Python doesn't have a very good library for using a mailbox as a data source. The Python email libraries assume you are planning on writing an email client. So I wrote an abstraction layer for the Python IMAP library. Code is here*.

Here's the code to count twitter followers:

import IMapBox 

me=IMapBox.IMapBox("imap.gmail.com",my_acct,my_pwd)
mymail=me["[Gmail]/All Mail"]

myfollows=mymail.frm("twitter").subject("following")

mydates=[myfollows[x]['date'] for x in myfollows]

The 'me=' and 'mymail=' open a connection to my email account and select a mailbox, in this case the All Mail mailbox. (The command 'me.list()' lists all the mailboxes for the account.)

The next line filters mymail so myfollows is only emails from Twitter that have 'following' in the subject line**. iMapBox is lazy--it doesn't fetch the emails itself until it has to--so this is pretty fast. myfollows acts like a dictionary, so you can len() it, ask for the keys()--these would be the message IDs--or the items(), iterate over it, or get items.

Each of the items in the dictionary is an email message. These also act like dictionaries, with keys like 'from','to','subject','date', and 'text'. The next line creates a list called mydates of the date each follow email was sent. It does this by iterating over each item in myfollows and pulling its date out. This is the slower part: when you set up an iterator, iMapBox gets all the headers***.
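To make the lazy behavior concrete, here is a minimal sketch of the idea. It is not iMapBox's actual code: the class name LazyMessages, the fetch_many callable and the fake_fetch stand-in for the server are all hypothetical, invented for illustration. It shows a dictionary-like object that knows its message IDs up front, answers len() without fetching anything, and does one batched fetch the first time it is iterated.

```python
from collections.abc import Mapping


class LazyMessages(Mapping):
    """Dictionary-like view over message IDs; headers are fetched on demand."""

    def __init__(self, ids, fetch_many):
        self._ids = list(ids)          # cheap: IDs only, no message data yet
        self._fetch_many = fetch_many  # hypothetical: list of ids -> {id: headers}
        self._cache = {}

    def __len__(self):
        return len(self._ids)          # answering len() needs no fetch

    def __getitem__(self, msg_id):
        if msg_id not in self._ids:
            raise KeyError(msg_id)
        if msg_id not in self._cache:  # single-message fetch, only if needed
            self._cache.update(self._fetch_many([msg_id]))
        return self._cache[msg_id]

    def __iter__(self):
        # Setting up an iterator suggests the whole set will be consumed,
        # so fetch everything still missing in one batch.
        missing = [i for i in self._ids if i not in self._cache]
        if missing:
            self._cache.update(self._fetch_many(missing))
        return iter(self._ids)


calls = []  # records each simulated round trip to the server


def fake_fetch(ids):
    calls.append(list(ids))
    return {i: {"date": "day %d" % i} for i in ids}


msgs = LazyMessages([1, 2, 3], fake_fetch)
count = len(msgs)                        # no fetch happens here
dates = [msgs[x]["date"] for x in msgs]  # iterating triggers one batched fetch
```

After the list comprehension runs, calls holds a single entry covering all three IDs, which is the point: one round trip instead of three.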

The part about counting follows per date I will leave as an exercise to the reader. Here's the graph of my follows and VCdelta's follows. I've been tweeting for some three years, VCdelta for six months.
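For anyone who would rather not do the exercise, one possible sketch of the counting step follows. It assumes each entry in mydates is a datetime object, which may not be what the library actually returns; if the dates come back as raw header strings, something like email.utils.parsedate_to_datetime from the standard library could convert them first. The sample dates are made up.

```python
from collections import Counter
from datetime import datetime

# Stand-in for the mydates list built above. Assumption: each entry is a
# datetime; the sample values here are invented for illustration.
mydates = [
    datetime(2011, 9, 3), datetime(2011, 9, 17),
    datetime(2011, 10, 1), datetime(2011, 10, 9), datetime(2011, 10, 30),
    datetime(2011, 11, 2),
]

# Bucket each follow email by (year, month) and count the buckets.
follows_per_month = Counter((d.year, d.month) for d in mydates)

for (yr, mo), n in sorted(follows_per_month.items()):
    print("%d/%d\t%d" % (mo, yr, n))
```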


On a side note, this is a logarithmic scale. The green line is my trend. This is odd, no? I mean, I'm not getting exponentially more popular, so this argues that a lot of follow behavior is algorithmic in some way. I had expected more linear growth. I also expect VCdelta to level out soon, as it reaches the limits of its natural audience.

Another example, email volume over time:



You can see where I started using my current email account full-time, in September 2006. And you can see when I started investing full-time, in mid-2009. And you can see why my email response time has slowed dramatically.

The code:

from datetime import date, timedelta
import IMapBox

me=IMapBox.IMapBox("imap.gmail.com",my_acct,my_pwd)
mymail=me["[Gmail]/All Mail"]

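for yr in range(2006, 2012):
    for mo in range(1, 13):
        beg_month = date(yr, mo, 1)
        # last day of the month: first day of the next month, minus one day
        end_month = date(yr + mo // 12, mo % 12 + 1, 1) - timedelta(days=1)
        # the parenthesized form runs under both Python 2 and Python 3
        print("%d/%d\t%d" % (mo, yr, len(mymail.dates(beg_month, end_month))))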

This is an alternative way to count emails per month, filtering by date instead of collecting dates. The 'dates(x,y)' method filters the emails for only those that were received between date x and date y (inclusive.) This is faster because even the headers are never fetched.
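The end-of-month arithmetic is the one slightly tricky bit, so here is a small standalone check of it. The helper name month_bounds is mine, not part of iMapBox. It wraps the same expression and compares the result against calendar.monthrange from the standard library, including the December rollover and a leap-year February.

```python
import calendar
from datetime import date, timedelta


def month_bounds(yr, mo):
    """Return the first and last day of the given month."""
    first = date(yr, mo, 1)
    # First day of the following month (December rolls over into January
    # of the next year), minus one day.
    last = date(yr + mo // 12, mo % 12 + 1, 1) - timedelta(days=1)
    return first, last


# Cross-check every month of an ordinary year and a leap year.
for yr in (2011, 2012):
    for mo in range(1, 13):
        first, last = month_bounds(yr, mo)
        assert last.day == calendar.monthrange(yr, mo)[1]

print(month_bounds(2011, 12))
print(month_bounds(2012, 2))
```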

Some other ways to use it:

c=mymail.frm('josh')+mymail.frm('matt')
d=mymail.frm('josh')-mymail.to('matt')
e=mymail.today()
f=-mymail.today()

The first is all messages from either Josh or Matt. The second is all messages from Josh that aren't also to Matt, the third is all today's messages, the fourth is all messages except today's.
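The +, - and unary - behavior maps naturally onto Python's operator methods. What follows is a sketch of how that could work, not the library's implementation: the MessageSet class and the sample message IDs are hypothetical, and a real version would combine IMAP search criteria rather than sets of IDs held in memory.

```python
class MessageSet:
    """Toy stand-in for a filtered mailbox: a set of IDs plus the full mailbox."""

    def __init__(self, ids, universe):
        self.ids = frozenset(ids)
        self.universe = frozenset(universe)  # every message in the mailbox

    def __add__(self, other):   # a + b: messages in either set
        return MessageSet(self.ids | other.ids, self.universe)

    def __sub__(self, other):   # a - b: messages in a that are not in b
        return MessageSet(self.ids - other.ids, self.universe)

    def __neg__(self):          # -a: everything in the mailbox except a
        return MessageSet(self.universe - self.ids, self.universe)


all_ids = range(1, 7)                       # invented mailbox of six messages
from_josh = MessageSet({1, 2, 3}, all_ids)
from_matt = MessageSet({3, 4}, all_ids)
to_matt = MessageSet({2}, all_ids)
today = MessageSet({6}, all_ids)

c = from_josh + from_matt   # from either Josh or Matt
d = from_josh - to_matt     # from Josh but not to Matt
f = -today                  # everything except today's messages
```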

 ----- 
 * I'm an electrical engineer, not a computer scientist. So I can build a waveguide to your specifications, but I'm not entirely sure that this code is all that good. Please, feel free to fork, suggest improvements, make improvements, tutor me on garbage collection or unit testing, whatever. 
 ** I like object chaining. I know it's not Pythonic, but I'm not sure why. It strikes me that since I don't really understand too deeply how Python garbage collects, that this may be creating extraneous intermediate objects. If you plan to use this in any sort of real code, you might want to figure that out. I did notice that if I object-chain the IMAP connection ('me' in this example), it gets dereferenced and gc'd, which invokes the very polite __del__ method, closing the connection. I'm not sure how to avoid that, so I just commented out the __del__ method, leaving a messy open connection to the server. 
*** My thinking is to only go do the time-consuming fetching of messages when needed: when an email message object is referenced or when an iterator is set up (on the assumption that when you set up an iterator, you plan to consume the whole set of messages.) This latter is because fetching 100 messages in a single fetch is far faster than 100 single message fetches. The default is to only fetch the headers, except when the text itself is explicitly asked for. This default can be changed by setting priority='both' or priority='text' when you call iMapBox to open a connection to the server. 

Friday, October 7, 2011

Disruptive innovation, buy vs. build, the most pernicious lie in business, and how to know if you're fooling yourself

If a man has good corn or wood, or boards, or pigs, to sell, or can make better chairs or knives, crucibles or church organs, than anybody else, you will find a broad hard-beaten road to his house, though it be in the woods. 
—Ralph Waldo Emerson, big fat liar

No matter what the dictionary says, you can't describe a company as disruptive without giving weight to Christensen's description of innovation. It's perhaps overly simplistic to divide innovation into two categories--disruptive and sustaining--but the strikingly different characteristics of companies pursuing these strategies make the partition a natural one.

Sustaining innovation means finding ways to do things better. Lowering the cost of manufacturing a widget by 10%, making a widget 20% more durable while only spending 10% more, reorganizing a department so ten people can do the work of twelve, creating an integrated supply chain to deliver goods to your stores in smaller quantities and less time. That sort of thing. Sustaining innovation often results in products that exceed customer needs at a given price point. The proliferating options in Microsoft Office show a sustaining innovation cycle that has exceeded most of the market's need.

Disruptive innovation means creating a product or service that is radically cheaper but much less functional (and this needs to appeal to a customer set that was previously underserved, so disruptive innovation often creates entirely new markets) and then using sustaining innovation to improve it until it meets mainstream customer needs (but is still radically cheaper.)

Before Google, there was targeted advertising. Very targeted. Hog Farmers Digest (now National Hog Farmer) was aimed at hog farmers. If you were a hog farmer, you read it; if you weren't, you didn't. It was a pretty effective buy: not a lot of wasted impressions. But creating an entire magazine for a very specific market is a difficult business proposition. The fixed cost of putting a book together limits how small its audience can be and so how targeted its ads can be.

Google's disruptive innovation was being able to create content for next to nothing. They can create a page that addresses a market segment as small as a single person for nominal marginal cost. Even though the content was lower quality than what it was competing with--the lack of human writers and editors means that any specific page is much less useful than a well-written and thought-out page would be--it turned out it was good enough. And because advertisers could be so specific in their buy, they could spend much less money. This opened up an entirely new market: advertisers that don't have multi-million dollar budgets.

Existing publishers could not compete: they could not lower their cost per page to anywhere near Google's. If they tried, they would lose quality and the loss of quality would mean losing their existing customers. This is the beauty of disruptive innovation: it is almost impossible for incumbents to respond. Disruptive innovations are disruptive because business logic precludes old-line companies from shrinking their business to address the disruptors.

It's incredibly difficult and expensive to challenge incumbents with nothing but a better product. Sustaining innovations are easy to copy and well-managed incumbents are always on the lookout for challengers and willing to learn from them. But when a disruptor comes along, they are trapped.

*****

What kind of innovation are we peddling in adtech? Article after article calls our companies disruptive, but do we really fit the Christensen mold? A disruption scenario would look like this:
  • the existing industry would supply a product of higher quality/functionality than the majority of potential customers actually needs and at a very high price;
  • the disruptive companies would find a way to bring in a product of lower quality/functionality at a much lower price;
  • customers that did not need and could not afford the old product would emerge as customers of the disruptive product, allowing the new companies the wherewithal to quickly mature their technology until it was competitive in the old product's market.
Does this sound like ad tech to you? It doesn't to me. The current ad-world is not supplying services at a higher quality than its customers need and there seems to be advertising inventory at every price point. If you can't supply advertising at a radically lower price point to customers who were previously underserved at a quality level that the incumbents are not interested in touching, you aren't really in a position to be disruptive. Almost all of adtech now is sustaining innovation: building a better mousetrap.

We clearly have a better solution than what existed, no argument. But the big lie of business, the pernicious fallacy that has deluded countless entrepreneurs, is that if you build a better mousetrap the world will beat a path to your door. It doesn't work that way.

*****

What is going on in adtech right now is clearly innovative. But because it's not disruptive in the Christensen sense, it means we're going to have to earn our money. We need to move fast to build scale.

There have been scores of M&A discussions in adtech this summer and only a few have resulted in deals. One of the things I heard as an excuse over and over (from buyers, from sellers, from bankers, from founders, after a few drinks) is that the buyer said "we don't need to pay up for this, we could build it internally."

Build versus buy is an interesting discussion to have before you buy anything, especially something with the revenue multiple adtech VCs are looking for. Cold hard fact is, there's almost nothing out there in adtech that someone else couldn't build from scratch. The CTO would certainly tell the CEO that building would be cheaper than buying a company, and be right.

And yet, and yet. And yet the companies that are prowling for bargains still can't get advertising right. They clearly have a ton of tech talent in their core businesses, and the ability to hire more. They have the money to hire and manage and build adtech solutions. But they don't. Why not?

When I was at Omnicom, back in the 90s, investing in the early interactive agencies--clearly not disruptive businesses--the old-guard ad agencies that then made up the bulk of Omnicom's business talked big about building their own interactive units. But they never could. They also refused to pay the valuations the i-agencies commanded. They were on the sidelines while their clients hired hotshot young startups to build their websites, and some of the startups got pretty big in the process.

There were several reasons for this. Primarily, the old guard couldn't hire good people: no one who understood the web back then would go work for an agency whose primary business was making 30-second films for TV. Why would anyone who was any good go be a second-class citizen at a firm that was paying nothing but a salary and had no career path in interactive? Why wouldn't they go instead to Razorfish and get stock options and be a hero to their management every day? They would, of course, and they did. And almost all the true stars of that era spent time in one of the independent agencies.

As then, so now. Why would any competent adtech engineer go work for AOL or Yahoo or Twitter or any of the other big old companies where stock options issued today will in all probability never be worth anything? There are plenty of good jobs at exciting startups where there's the possibility of making actual money*. More importantly, why go to one of those big companies and be a second-class citizen, the "ad guy," when at a startup you're essential to their product?**

Companies can do very well at their core mission. But when their core mission is media or software or infrastructure or professional services, it's going to be really hard for them to get a foothold in the quickly changing adtech world. This never seems to be taken into account in build versus buy analyses: they can't build, and even if they could, they won't. And if they do, it will suck. Trust me, I've been there. And if you don't trust me, just take a look around.

But remember that the era of the independent i-agencies only lasted some six or seven years. At some point the number of people that could do the work more than competently was enough that even old-line agencies could hire them. At that point the i-agencies were like every other agency: they competed head-to-head with the old guard. Many of the biggest remained independent until acquired for great prices. But these were the ones who earned it. Unlike a disruptive business where nothing but guts, an innovative spirit and a huge dose of luck are necessary, competing head-to-head means competing: blood, sweat and tears.

We need to keep building, ignore the distractions and focus on winning clients, not just raising money, so that when it comes time to compete head-to-head, we will win. That's as it should be, of course, and I think many of our industry leaders have what it takes. But if you're starting an adtech company and you want to win, you have to know that you're in it for the long-term. It's a marathon, not a sprint, the cliche goes, and it's true.

*****

Meh, you say. I'm disruptive, I am going to go viral, achieve imminent world domination and sell to Google for $5 billion in two years. Neumann's an idiot.

Maybe. But disruptive businesses have certain characteristics. Ask yourself these questions.

1. Am I creating a new market, bringing in a set of customers for whom there was previously no value proposition?

Disruptive businesses bring out a product or service that is so far off the industry price/quality line that customers who would never have used the industry's products start to. This gives the disruptor the foothold it needs to start improving quality until it threatens the incumbents. Google AdWords is an excellent example of this.

Who are the unserved markets in advertising? Are there any? I think there are, and I think that if you don't see any, you need to think about what advertising is more broadly.

2. What is price in my market?

If you're in ad-tech, what does price even mean to your end-customers (the advertisers***)? Is it just lower CPMs? There have always been low CPMs out there. Is it higher ROI? That's probably closer to the mark. The best answer I have heard is that it is lower risk: the ability to more accurately predict ROI.

You have to credibly answer this question and then be radically better along this dimension if you are disruptive. I think there are many answers here, and your answer will depend on your answer to question one, above.

3. What is quality in my market?

In disk drives (Christensen's first case study), this is an easy question: quality is how much data can be stored. The disruptors built lower-quality disk drives at lower prices, then used the march of progress to threaten the old-line disk makers. The old-line disk makers' customers wanted more storage, not less, so they did not see this market and could not address it with their existing customer bases. But key to the disruptors' long-term value was the ability to improve quality quickly. If they could not, they would not have been able to displace the old guard.

What is quality in adtech? Conversion? Click-through? Pinpoint targeting? And if you know what quality is to your market, can you then improve quickly along that metric so you serve not only the new market you've created, but the giant market that already exists?

Quality. I've been thinking about this question for ten years and don't have a definitive answer. Do you?

If you do, if you think you really have a disruptive business model, call me, I'm looking to back people like you.

-----
* If this is you, email me.
** Soldiers don't get promoted if they haven't seen battle. If you want a career path, always take the job in the middle of the action, even if it pays worse.
*** And are the advertisers really your customers? Why aren't the 'consumers'?

Sunday, September 11, 2011


To my friends who died ten years ago, I hope you had a fortunate rebirth.

To my friends who had family members die, may your loved ones find happiness.

To my son, born 42 weeks and one day later: there is only loss if there is love, the way forward is always through love.





Thursday, August 4, 2011

How I Wrote VCBar

All the people ask me
How I wrote elastic man.

                     - The Fall
My friend Chris Wiggins asked me to post the code for the VC bar chart generator I blogged earlier this week. It's here.

It's an interesting project if only because it's run entirely on the client-side. There's no server side (except, of course, for delivering the files to you.) This is possible because the Crunchbase API supports JSON callbacks. Every bit of code in the git repo is as you see it on http://neuvc.com/labs/vcbar.

But in the spirit of making the source code available, I'm going to go one better and show you how to write your own visualization of Crunchbase data. Because there's no server-side, you can play with this code on your computer with nothing more than a text editor and a web browser.

Adapt this code to visualize other data sets: people respond to visualizations and, as this shows, it's not very hard to make them.

This code is going to be as bare as possible, no bells and whistles. I hope to illustrate just the bones of it. You can add bells and whistles and DTD declarations to your heart's delight, but this works too.

*****

The program will be broken into three parts: the HTML, the CSS and the Javascript.

The HTML

Create a directory on your computer, download d3.js from https://github.com/mbostock/d3/archives/master, unzip the archive and move the file d3.js into your new directory. Then create a file named index.html in the directory. Put this in it:

<html> 
   <head>  
      <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.5.2/jquery.min.js"></script> 
      <script src="d3.js"></script>  
      <script src="vcbar.js"></script> 
      <link rel="stylesheet" type="text/css" href="vcbar.css" /> 
   </head> 

   <body> 
      <div id="barchart"> 
      </div> 
      <div id="controls"> 
         <a href="#" id="union-square-ventures">Union Square Ventures</a>
         <a href="#" id="true-ventures">True Ventures</a>
      </div> 
   </body> 
</html>

Pretty simple. The head loads the javascript (including jQuery from Google's CDN) and the CSS. The body has two divs, one with the id "barchart"--this is where the javascript will put the chart object itself--and one with the id "controls", which holds a link for each of the two VC firms in this example. Note that the links do not link anywhere. We will use the javascript to execute an action when a link is clicked.

The Javascript, part 1: Getting and Parsing the data

Put all the javascript into a file called vcbar.js in the same directory as index.html.

There are three things we want to do in the program:
1. Detect when one of the links is clicked;
2. Get and parse the data;
3. Display the bar chart.

The first is easy, especially using jQuery:
$(document).ready(function () {
   $("a").click(function () {
      var vc = $(this).attr("id");
      $.getJSON("http://api.crunchbase.com/v/1/financial-organization/" +
               vc + ".js?callback=?",parseCB);
      return false;
   });
});

This code uses jQuery (the '$') to run an anonymous function every time an <a> tag is clicked. The function first gets the id attribute of the clicked tag (which we set to Crunchbase's unique identifier, their 'permalink') and then uses jQuery to execute an Ajax call for JSON, with a callback. Returning false from the handler tells jQuery to prevent the default click action (following the "#" link) and to stop the event from bubbling. A bare return would not do this. The enclosing document.ready method makes sure the script won't try to attach the code until after the HTML is loaded.

Part of the reason this site can do everything it does on the client side is that Crunchbase's API supports JSON callbacks. In general, client-side Javascript can't go willy-nilly fetching things from other sites because of the same origin policy enforced by browsers for security purposes. But if you're trying to pull the data from a site that supports JSON with callbacks, you can easily get data from it: the response is loaded as a script, and scripts are allowed to come from other sites.
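To make the mechanism concrete, here is a minimal sketch of a JSON callback exchange, run without a browser. The names jsonpUrl, fakeServerResponse and runJsonp are made up for illustration and the sample payload is invented. jQuery's real implementation inserts a script tag and generates the callback name itself, so treat this as a model of the idea rather than of jQuery.

```javascript
// Build the URL a JSON callback request would use as the src of a <script> tag.
// The callback parameter tells the server which function to wrap the JSON in.
var jsonpUrl = function (permalink, callbackName) {
   return "http://api.crunchbase.com/v/1/financial-organization/" +
          encodeURIComponent(permalink) + ".js?callback=" +
          encodeURIComponent(callbackName);
};

// Hypothetical stand-in for the server: it replies with JavaScript,
// not bare JSON, by wrapping the payload in a call to the callback.
var fakeServerResponse = function (callbackName, payload) {
   return callbackName + "(" + JSON.stringify(payload) + ");";
};

// Stand-in for the browser executing the returned script: the callback
// is made visible under its name, then the response text is run as code.
var runJsonp = function (responseText, callbackName, callback) {
   return new Function(callbackName, responseText)(callback);
};

var received = null;
var body = fakeServerResponse("handleData",
                              { name: "Example Fund", investments: [] });
runJsonp(body, "handleData", function (jsn) { received = jsn; });
// received now holds the object, even though no Ajax request was made
```

The point of the design is that the data arrives as a function call, so the page never has to read a cross-site response directly.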

The getJSON function sends a request to Crunchbase for the VC's data. You can see an example of the raw JSON here. When the data returns it calls the callback function--parseCB--with the JSON as the argument. Note that this happens asynchronously, so if you send multiple calls (as with the vcbar site, when you click one of the subset buttons) the data does not necessarily come back in the order you asked for it. Or, maybe, at all. The callback function gets called once for each set of JSON. You need to think through the implications, in some cases.
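One way to deal with responses that come back out of order is to key them by what was asked for and wait until the set is complete. The sketch below assumes a helper called makeCollector, which is not part of the vcbar code, and simulates the responses by hand instead of making real requests.

```javascript
// Collect responses that may arrive in any order, keyed by permalink.
// makeCollector is a hypothetical helper, not part of the vcbar code.
var makeCollector = function (permalinks, done) {
   var results = {},
       remaining = permalinks.length;

   return function (permalink, jsn) {
      // ignore anything we did not ask for, or already have
      if (permalinks.indexOf(permalink) < 0 || (permalink in results)) { return; }
      results[permalink] = jsn;
      remaining -= 1;
      if (remaining === 0) {
         // hand back the results in the order they were requested
         done(permalinks.map(function (p) { return results[p]; }));
      }
   };
};

var ordered = null;
var collect = makeCollector(["union-square-ventures", "true-ventures"],
                            function (all) { ordered = all; });

// simulate the second request coming back first
collect("true-ventures", { name: "second" });
collect("union-square-ventures", { name: "first" });
// ordered is now [{ name: "first" }, { name: "second" }]
```

If one of the responses never arrives, done is never called, so a real page would also want a timeout.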

Here we're asking for one set of data, so it's easy.  Here's parseCB:

var parseCB = function(jsn) {
   var idx, yr, mo,
       byear=2005, eyear=2011,
       months=(eyear-byear+1) * 12,
       data=[];
      
   for (var i=0; i < months; i+=1) { data[i] = 0; };
  
   if ("investments" in jsn) {
      for (var i in jsn["investments"]) {
         var j = jsn["investments"][i];
         if ("funding_round" in j) {
            yr = j["funding_round"]["funded_year"];
            mo = j["funding_round"]["funded_month"];
            if (!yr || !mo || (mo == "None")) { continue };
            idx = (parseInt(yr, 10)-byear) * 12 + parseInt(mo, 10) - 1;
            if (idx < 0 || idx >= months) { continue };
            data[idx] +=1;
         };
      };
   };
   return bchart(data,byear);
};

The first few lines declare the function's variables. They set the beginning year to 2005 and the end year to 2011, then calculate the number of months in that span. The loop then fills an array with a zero for each month.

The function then parses the JSON. Go look at the raw JSON at the link above again, if you want to see what's going on here. First it tests to see if there is an "investments" key in the JSON. If there is, the corresponding value will be an array with an entry for each investment. Each entry that has a "funding_round" key holds a dictionary under it with keys for "funded_year" and "funded_month". parseCB first tests to make sure that neither the year nor the month is empty and that the month is not "None", then computes how many months passed from the beginning of 2005 (byear) until the investment was made, skipping any investment made before then. It then increments the array element representing that month.
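The index arithmetic is easier to trust with a few numbers plugged in. This is a small sketch that pulls the calculation out into a function. The name monthIndex is made up here, but the arithmetic is the one parseCB uses.

```javascript
// Months elapsed between January of byear and the funding month.
// monthIndex is a hypothetical name; the arithmetic is the one in parseCB.
var monthIndex = function (yr, mo, byear) {
   return (parseInt(yr, 10) - byear) * 12 + parseInt(mo, 10) - 1;
};

var byear = 2005, eyear = 2011,
    months = (eyear - byear + 1) * 12;   // 7 years * 12 = 84 slots

monthIndex(2005, 1, byear);    // 0, the first slot
monthIndex(2008, 3, byear);    // 38 = 3 * 12 + 2
monthIndex(2011, 12, byear);   // 83, the last slot (months - 1)
monthIndex(2004, 6, byear);    // -7, negative, so parseCB skips it
```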

When it is finished slotting each investment into a month, it calls bchart, the charting function.

The Javascript, part 2: Charting

The bar chart function is essentially cribbed from Mike Bostock's bar chart tutorial. It uses the d3.js data manipulation library to create an SVG element in the HTML.

Here's the code, broken into chunks so I can explain it. It's all inside a function definition:

var bchart = function (data, byear) {
   ...
};

First, let's set up some variables. h is the height, totw is the total width, w is the width of each bar, lgst is the largest value in the data to be charted, tks is the number of horizontal ticks we want, years is an array of years from the beginning year (byear) to the end year (this is used to label the x-axis.)

y is a special d3 function that maps the 'domain' to the 'range'. In this case, it maps a value from 0 to lgst to the range 0 to h. That is, y(x) = x * h / lgst. This scales the bars so the largest value in the data is the height of the chart.

var h = 300,
    totw = 800,
    w = totw / data.length,
    lgst = d3.max(data),
    tks = Math.min(lgst,5),
    years = d3.range(byear,byear+data.length/12+1),
    y = d3.scale.linear()
          .domain([0,lgst])
          .range([0,h]);
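If the scale feels like magic, this sketch reproduces what y does using plain JavaScript and no d3. The name makeScale is invented, and it only covers this special case, where both the domain and the range start at zero.

```javascript
// A plain-JavaScript stand-in for d3.scale.linear().domain([0,lgst]).range([0,h]).
// It only covers this special case, where both intervals start at zero.
var makeScale = function (lgst, h) {
   return function (x) { return x * h / lgst; };
};

var y = makeScale(8, 300);   // suppose the busiest month had 8 investments
y(0);   // 0: an empty month gets no bar
y(2);   // 75: a quarter of the largest value, a quarter of the height
y(8);   // 300: the largest value fills the chart
```

Note that in this sketch a data set with no investments at all (an lgst of 0) would divide by zero.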

Then, let's get rid of any chart that happens to already be there, so we don't keep adding new charts one after the other.

$(".chart").remove();

Now we add an SVG element to the div with id="barchart". We will make it wider than totw and higher than h so we have room to add the axes and their labels.

// insert SVG element     
var chart = d3.select("#barchart")
              .append("svg:svg")
                .attr("class","chart")
                .attr("width", totw+40)
                .attr("height", h+40);

Then we'll add the x and y-axis ticks, the light gray lines that help us see what the values are. We use a built-in d3 function called ticks, which chooses sensible values for the ticks based on tks, the number of ticks we want. The way d3 works (and I'm not going to explain this in too much depth, you can go to the d3 site for much better explanations) is that it takes an array of data (the data method below the select), binds each item to an existing svg element that matches the select, and uses the enter method to collect the items left without an element. If there are not enough existing elements, it appends them, as here.
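As a rough mental model, the join can be imitated with array slices. This is a simplified sketch that matches data to elements by position only. The name joinByIndex is made up, and d3's real selections do a good deal more than this.

```javascript
// A simplified model of d3's data join, matching by index only.
// joinByIndex is a made-up name; d3's real selections do much more.
var joinByIndex = function (existing, data) {
   return {
      // data items that found an existing element to bind to
      update: data.slice(0, existing.length),
      // data items with no element yet: what enter() hands to append()
      enter: data.slice(existing.length),
      // elements left without data (d3 calls these the exit selection)
      exit: existing.slice(data.length)
   };
};

// The chart is freshly created, so selectAll("line.hrule") matches nothing
// and every tick value ends up in the enter group.
var fresh = joinByIndex([], [0, 2, 4, 6, 8]);
// fresh.enter is [0, 2, 4, 6, 8]; fresh.update and fresh.exit are empty

// If two lines already existed, only the remaining three would be appended.
var partial = joinByIndex(["line0", "line1"], [0, 2, 4, 6, 8]);
// partial.update is [0, 2]; partial.enter is [4, 6, 8]
```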

The code below iterates through each of the ticks generated by ticks and appends a new svg:line with attributes (x1, y1) and (x2, y2). The methods chained after data can have anonymous functions that have access to the data in the array (d) and the index of the data (i). For instance, the y-axis ticks have an x1 of 20 (I've added an offset of 20 to all the x values to accommodate the y-axis labels) and an x2 of totw+20. The y1 and y2 values are trickier. They are both the same (it's a horizontal line) and they both take the d value (where the tick is), scale it using the y function and then subtract that value from h, because the origin of the svg plotting area, the (0,0) point, is in the top left whereas our chart's (0,0) point is in the bottom left.

The text labels do something similar. The y-axis uses the tick value as a string for the text and the dx attribute to move the label slightly before the axis itself. The x-axis uses the array of years we created earlier as labels, and centers them between ticks.

   // create y-axis ticks
   chart.selectAll("line.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:line")
            .attr("class","hrule")
            .attr("x1",20)
            .attr("x2",totw+20)
            .attr("y1",function(d) { return h-y(d); })
            .attr("y2",function(d) { return h-y(d); })
            .attr("stroke","#ccc");

   // label y-axis ticks  
   chart.selectAll("text.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:text")
            .attr("class","hrule")
            .attr("x",20)
            .attr("y",function(d) { return h-y(d); })
            .attr("dx",-1)
            .attr("text-anchor","end")
            .text(String);

   // create x-axis ticks           
   chart.selectAll("line.vrule")
            .data(years)
        .enter().append("svg:line")
            .attr("class","vrule")
            .attr("y1",h+10)
            .attr("y2",0)
            .attr("x1",function(d) { return (d-byear)*w*12 + 20; })
            .attr("x2",function(d) { return (d-byear)*w*12 + 20; })
            .attr("stroke","#ccc");

   // label x-axis ticks          
   chart.selectAll("text.vrule")
            .data(years)
        .enter().append("svg:text")
            .attr("class","vrule")
            .attr("y",h)
            .attr("x",function(d) { return (d-byear) * w * 12 + w * 6 + 20; })
            .attr("dy",10)
            .attr("text-anchor","middle")
            .text(String);
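The position arithmetic in those four chunks can be checked by hand. Here is a sketch with the formulas pulled out into functions. The names LEFT, yearRuleX, yearLabelX and tickY are invented for illustration, and the bar width of 10 is picked to keep the numbers round.

```javascript
// x position of the vertical rule at the start of a year, and of the
// label centered in that year. Names here are illustrative only.
var LEFT = 20;   // the offset that leaves room for the y-axis labels

var yearRuleX = function (year, byear, w) {
   return (year - byear) * w * 12 + LEFT;
};
var yearLabelX = function (year, byear, w) {
   return yearRuleX(year, byear, w) + w * 6;   // half a year further right
};

// y position of a horizontal rule: SVG's origin is the top left, so a
// value's scaled height is subtracted from h to measure up from the bottom.
var tickY = function (d, lgst, h) {
   return h - d * h / lgst;
};

// With bars 10 pixels wide (w = 10) and a chart starting in 2005:
yearRuleX(2005, 2005, 10);    // 20
yearRuleX(2006, 2005, 10);    // 140
yearLabelX(2006, 2005, 10);   // 200
tickY(0, 8, 300);             // 300, the bottom of the chart
tickY(8, 8, 300);             // 0, the top
```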

Now we create the data bars. Here we feed d3 the array of data. For each of the data elements it creates (using enter) a new svg:rect, a rectangle. Each rectangle has x and y as its top left point and a width and height. The rectangles will also be styled by the CSS, which we'll talk about later on.

    // create bars
    var bars = chart.selectAll("rect")
            .data(data)
        .enter().append("svg:rect")
            .attr("x", function(d, i) { return i * w + 20; })
            .attr("y", function(d) { return h - y(d); })
            .attr("width",w)
            .attr("height", function(d) { return y(d); }); 
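To see what those four attributes work out to, here is a sketch that computes the rectangles as plain objects, without d3 or a browser. The helper name barRects and the four-month data set are made up for the example.

```javascript
// Compute the rectangle for each bar the way the chart code does.
// barRects is a hypothetical helper for illustration.
var barRects = function (data, h, totw) {
   var w = totw / data.length,
       lgst = Math.max.apply(null, data);
   return data.map(function (d, i) {
      var height = d * h / lgst;
      // y is the top edge, so taller bars start closer to zero
      return { x: i * w + 20, y: h - height, width: w, height: height };
   });
};

var rects = barRects([1, 4, 2, 0], 300, 800);
// rects[1] is the tallest bar: { x: 220, y: 0, width: 200, height: 300 }
// rects[3] is an empty month:  { x: 620, y: 300, width: 200, height: 0 }
```

Because y is the top edge of the rectangle, every bar's y plus its height comes to h, which is what keeps the bars sitting on the x-axis.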

And, finally, the x and y axes. The reason we create the ticks first, then the bars and then the x and y-axis is that this is the order of layering we want, ticks at the bottom, bars on top of them, then the axes.

   // create x-axis
   chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",totw + 20)
        .attr("y2",h)
        .attr("stroke","#000");

   // create y-axis               
   chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",20)
        .attr("y2",0)
        .attr("stroke","#000");

Don't forget to include the function declaration before all the chart code and the '};' after it all. Just saying. Also, the javascript should have the functions first, so essentially in the opposite order presented here. I've put all the javascript in one contiguous piece at the bottom*.

That's the chart. After that, the CSS is a piece of cake.

CSS

Nothing fancy here. Put it in a file called vcbar.css in the same directory as index.html.

.chart {
    margin-left: 40px;
    font: 10px sans-serif;
    shape-rendering: crispEdges;
}
           
.chart rect {
    stroke: white;
    fill: steelblue;
}

And that's it. If you put this code into files on your computer and open index.html from your web browser, you should get a chart. Then go and change the code and see what happens, or add lots more code and do something really, really cool. When you do, tweet me, I want to see it.

-----
* vcbar.js, in total:

var bchart = function (data, byear) {
   var h = 300,
       totw = 800,
       w = totw / data.length,
       lgst = d3.max(data),
       tks = Math.min(lgst,5),
       years = d3.range(byear,byear+data.length/12+1);

    $(".chart").remove();

    var y = d3.scale.linear()
              .domain([0,lgst])
              .range([0,h]);

   // insert SVG element      
    var chart = d3.select("#barchart")
        .append("svg:svg")
            .attr("class","chart")
            .attr("width", totw+40)
            .attr("height", h+40);

   // create y-axis ticks
    chart.selectAll("line.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:line")
            .attr("class","hrule")
            .attr("x1",20)
            .attr("x2",totw+20)
            .attr("y1",function(d) { return h-y(d); })
            .attr("y2",function(d) { return h-y(d); })
            .attr("stroke","#ccc");

   // label y-axis ticks  
    chart.selectAll("text.hrule")
            .data(y.ticks(tks))
        .enter().append("svg:text")
            .attr("class","hrule")
            .attr("x",20)
            .attr("y",function(d) { return h-y(d); })
            .attr("dx",-1)
            .attr("text-anchor","end")
            .text(String);

   // create x-axis ticks           
    chart.selectAll("line.vrule")
            .data(years)
        .enter().append("svg:line")
            .attr("class","vrule")
            .attr("y1",h+10)
            .attr("y2",0)
            .attr("x1",function(d) { return (d-byear)*w*12 + 20; })
            .attr("x2",function(d) { return (d-byear)*w*12 + 20; })
            .attr("stroke","#ccc");

   // label x-axis ticks          
    chart.selectAll("text.vrule")
            .data(years)
        .enter().append("svg:text")
            .attr("class","vrule")
            .attr("y",h)
            .attr("x",function(d) { return (d-byear) * w * 12 + w * 6 + 20; })
            .attr("dy",10)
            .attr("text-anchor","middle")
            .text(String);
   
    // create bars
    var bars = chart.selectAll("rect")
            .data(data)
        .enter().append("svg:rect")
            .attr("x", function(d, i) { return i * w + 20; })
            .attr("y", function(d) { return h - y(d); })
            .attr("width",w)
            .attr("height", function(d) { return y(d); }); 

   // create x-axis
    chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",totw+20)
        .attr("y2",h)
        .attr("stroke","#000");

   // create y-axis               
    chart.append("svg:line")
        .attr("x1",20)
        .attr("y1",h)
        .attr("x2",20)
        .attr("y2",0)
        .attr("stroke","#000");     
};

var parseCB = function(jsn) {
   var idx, yr, mo,
       byear=2005, eyear=2011,
       months=(eyear-byear+1) * 12,
       data=[];
      
   for (var i=0; i < months; i+=1) { data[i] = 0 };
  
   if ("investments" in jsn) {
      for (var i in jsn["investments"]) {
         var j = jsn["investments"][i];
         if ("funding_round" in j) {
            yr = j["funding_round"]["funded_year"];
            mo = j["funding_round"]["funded_month"];
            if (!yr || !mo || (mo == "None")) { continue };
            idx = (parseInt(yr, 10)-byear) * 12 + parseInt(mo, 10) - 1;
            if (idx < 0 || idx >= months) { continue };
            data[idx] +=1;
         };
      };
   };
   return bchart(data,byear);
};

$(document).ready(function () {
   $("a").click(function () {
     var vc = $(this).attr("id");
     $.getJSON("http://api.crunchbase.com/v/1/financial-organization/" + vc + ".js?callback=?",parseCB);
     return false;
   });
});