Lookalike Audiences for B2B Lead Generation
At first blush, creating better leads for companies that sell things to other companies may not seem like a big deal in a social sense, but I think it really is. Companies aren’t these cold corporate entities. They are really people, people trying to do things. In a lot of cases, they have built really great products, really great services, and yet they fail. I think one of the big reasons they fail is they just have not been able to connect with the market.
It could be that they failed to get product market fit right, that they built the wrong solution. I think a lot of times, they’ve built a great solution, but they just don’t have sales capabilities, or marketing capabilities. They don’t have that little bit of luck to find those first important customers who are going to pay the bills and help them prove that what they have really works in the marketplace.
What we’re doing here helps more companies do that better and not just the big guys. For example, in B2C marketing, you know who the behemoths are, right? Amazon, Facebook, Twitter, Netflix, these guys all have boatloads of data and they have the best machine learning guys money can buy. What about the little guys who are trying to sell B2C? It’s not a particularly equitable situation when it comes to the distribution of data, which is critical to success, and the AI talent, which is also critical to success in these kinds of marketing applications.
In B2B, the story is different. Nobody is dominant yet, and we can make a difference and make this something that small companies, big companies, everybody can have kind of on par. I think that’s really good for innovation and getting the best products to bubble to the top, and that helps everybody.
The data in B2B that is required is just forming up. A lot of it is in pretty awful shape, even basic things about companies, things like what industry a company is in. We could talk about that at length. There’re lots of issues with it.
Go to a data source, think of Dun & Bradstreet or one of those similar data sources. Pick a company, any company, and ask them, “What industry is this company in?” The answer is right somewhere in the vicinity of 50% or 60% of the time. Even if it’s reasonable, it’s not the whole picture. It’s the same with headcount, revenue, and those sorts of things.
The real issue is those things are far too simplistic to figure out what companies need and what they’re able to buy and consume and use for their benefit. It’s a much more complicated picture than B2C where you’re talking about a single individual, small-sized transactions generally. You’re talking about motivations that really aren’t that complicated. In B2B, there are teams of people who decide what to buy. There are companies that are either aware or not aware of what needs they have. Those things are always changing. It’s very difficult to get a clear picture of which companies need what I’m selling, and which companies are ready to buy. It’s very difficult compared to B2C. I think that’s why it’s been tough to solve.
When I first dove into this domain, it was a head-scratcher. Why hadn’t more progress been made? A lot of companies with a lot of funding had been trying to apply AI and machine learning in this area, yet no one had really hit the ball out of the park. There was no clear winner. There was no one who had made us say, “Okay. They’re really on to the right core of the solution.”
What I think happened was that because the data was in such bad shape and because there’s so much complexity to deal with, you have to know about all the companies and all the people who work at all the companies and how they interact with each other. It’s much easier for someone trying to solve those issues to say, “Look. I can’t deal with this complexity. I’m going to have to pick a piece where I can see some clarity.”
It’s one reason you see a lot of companies in the intent space. “Hey, let’s just figure out who’s raising their hand, who’s giving me signals that they need to buy something in this particular category.”
The problem with that is it’s only applicable to maybe a couple of percent of the cases of deciding about what to buy and when. Even if you have a great signal, it’s not great all the time. In fact, it’s usually great just a couple percent of the time. There’s no free lunch here. This problem is going to require pasting together lots of tiny little pieces, and so far, nobody seems to have the formula for doing that.
When I looked at the problem, that’s what I saw, and I just said, “Look. We’re going to have to do our best. We may not get enough of these little pieces pulled together, but if we don’t try, we’re not going to get the solution we want.” My approach is different. It’s to face this complexity head on. I know I’m not going to be able to solve it perfectly, but I’m going to take the best shot I can at putting all these pieces together and making a reasonable picture out of it to get a sense of why companies do what they do. To our surprise, we’ve got a small but very powerful team of scientists here. We’ve been able to actually show that, by doing that, we can create performance that others can’t.
In our view, industry is the core activity of a company. Companies buy things. They take inputs. They transform those inputs and they turn it into outputs that they sell. That’s what companies do. An industry is designed to capture that core activity that they’re doing.
Codes like the SIC codes and the NAICS codes that replaced them are government codes, but there are also a lot of organizations that have their own homegrown codes. These schemes have been based on a couple of really big assumptions that have giant implications for analyzing companies. They weren’t intended to support the kind of machine learning that we’re doing. They get a free pass for that, but it doesn’t mean that you can lean on that data and get what you want out of it. You have to fix it. You have to do something about it.
The big assumption that’s built in that often gets overlooked is that when you look at a company, they’re either in that category or not. You are either an agricultural company or you are not. Now, you might be able to say, “That’s one of my secondary codes,” or “Hey, that’s my primary code and I have a couple of other secondary codes,” but you’re either in or you’re out. I’m either a retailer or I’m not. I’m either an information company or I’m not.
That turns out to be a huge assumption, because a lot of companies do a lot of things. If you think about Apple, what do they do? Well, gosh! They build computers. That’s for sure. They’re a computer manufacturer. They do retail like crazy. They also build phones. In a sense, they’ve got a foot in the utility space. They write a lot of software, so they do that. They do consulting services when you go into an Apple Store. Now, they’re doing media.
There are a million different things. Are they all those things to the same degree? No. Not really. They are probably strongest in manufacturing and retail and some of these other things, like entertainment, media, not so strong, but they do them, and it’s important that they do them if you’re trying to figure out what they might want to buy.
Any company, not just Apple, but any company, is going to have a more complicated picture and it’s really important to get at the degree to which they do all the things they do to understand them. Even if we do that right, and we’ve constructed a system here that is able to do that right, you still have the issue of figuring out what each company is. The way that’s worked in the past is companies kind of raise their hand when they form, or occasionally a company like Dun & Bradstreet will reach out to them and try to determine what’s correct. There’s not a lot of incentive to get that exactly right, and that’s why we see these accuracy rates in the vicinity of batting .500. They’re not great.
The other issue, and really the big reason for that, is that the categories were defined for other reasons. NAICS codes weren’t defined to really understand the core activity of companies. They were designed to understand the sorts of people who were hired at companies and the sorts of processes that companies engaged in, and to kind of clump those together so that companies with similar processes land in similar codes. It serves that fairly well, but it doesn’t serve what we’re trying to do, which is figuring out what a company needs to buy.
There are lots of confusions, lots of conflated issues in the way the world looks at industry. I think the biggest is that a company’s market, who they sell to, and a company’s core activity, what they really do as they transform inputs to outputs, get confused all the time. I’m a software company. I build software systems for hospitals. I sell into the healthcare space. A fair amount of the time, we’ll see that company listed as a healthcare company, which is nonsense. They don’t treat patients. That’s not their business, but they will be listed as a healthcare company, or maybe as a secondary. I’m a software company and my secondary industry is healthcare.
It’s really important to tease those apart accurately and have a way of understanding a company’s market and a company’s industry. They’re separate issues. There are many, many issues with the way these taxonomies are set up. Many categories, things like warehousing or enterprise management, which is something that every company does to a degree, really aren’t core activities, even though they have codes in the NAICS and SIC hierarchies. They’re activities, but one is a supply chain role and the other is just something every company does to make an operation work.
We tease those apart as well and we clean this mess up. We build the right kind of coding scheme. We don’t confuse market and industry, and then we have the problem of figuring out from the data available for a company what is their industry and what is their market. We’ve built models to solve that problem.
If we talk about an entity in the B2B ecosystem, let me draw a little bit of that here. We have a bunch of companies. I’ll make companies circles, and we have a bunch of people, the triangles, who work at these companies. I can’t draw them all. There are millions and millions of companies and hundreds of millions of people who work there, and not every one of these triangles is an employee. Some might be investors. Some might just be consumers of products or services. We’re going to connect these guys. There are lots of connections, different types.
One of the most important is which companies sell to which other companies. I might sell something to this guy, and he might sell something back to me. This guy might sell things to this guy, but he actually is operating as a channel and he’s reselling those to these guys. There’re a lot of buying-and-selling relationships going on. Those are very important in the ecosystem. But there’re also people who work at companies, people who used to work at companies. He used to work there. He also used to work here. This guy actually consults for two companies. These guys work here, and this guy is his friend. There’s a connection there, and he knows this guy, too.
It’s a complicated picture, with lots of companies and lots of people. There are more entities, things like ideas that flow in this ecosystem. Relationships are critical, but the nodes are important, too. For every one of these nodes, I need to know more about this company. What do I know about Company C1?
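To make the picture concrete, here is a minimal sketch of that kind of ecosystem as a graph with typed connections. It is only an illustration, not the system described in the talk: the class name, the relationship labels (sells_to, works_at and so on) and the node names C1, C2, P1 are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EcosystemGraph:
    """Toy graph: nodes are company or person ids, edges carry a relationship type."""

    # source node -> list of (relationship type, target node); all names are hypothetical
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def connect(self, source, relation, target):
        """Record a directed, typed connection from source to target."""
        self.edges[source].append((relation, target))

    def neighbors(self, source, relation):
        """Return every node that source reaches through the given relationship type."""
        return [target for rel, target in self.edges[source] if rel == relation]


graph = EcosystemGraph()
graph.connect("C1", "sells_to", "C2")
graph.connect("C2", "sells_to", "C1")         # buying and selling can go both ways
graph.connect("C2", "sells_to", "C3")         # C2 also acts as a channel into C3
graph.connect("P1", "works_at", "C1")
graph.connect("P1", "used_to_work_at", "C2")
graph.connect("P2", "consults_for", "C1")     # one person consulting for two companies
graph.connect("P2", "consults_for", "C3")
graph.connect("P1", "knows", "P2")

c2_customers = graph.neighbors("C2", "sells_to")
p2_clients = graph.neighbors("P2", "consults_for")
print(c2_customers, p2_clients)
```

Keeping the relationship type on every edge is what lets one structure hold buying and selling, employment history and personal connections at the same time.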
In the traditional framework, I would know, like I said, their industry, their headcount, their revenue, things like that. There might be a few more, but it gets really thin pretty quickly. It turns out that’s not enough, and it’s also not enough to know an individual’s job title at this company, right? I need to go way deeper.
For a company, I need to know a lot more things. For an individual, I need to know a lot more things. The way we’ve chosen to do that is the concept of a vector. It’s just like the vectors you learned about in physics, such as the forces acting on a rock, the gravity downward, and maybe the force of throwing it upward that you imparted on it when you threw it. There’re vectors associated with the direction and magnitude of those forces.
It’s the same thing here. A vector is really just a list of numbers, and the key point that I want to get across is numbers and continuous values versus what we have in the old data repositories of our companies, which are very categorical. What industry category are you? What headcount bin are you in?
We took something that’s naturally a number and we turned it into a category. Why did we do that? I don’t know. Maybe it was convenient, but it’s not particularly useful when we’re really trying to dig in and understand companies.
Our decision was that everything is a number, and it’s typically a real number, meaning it can take any value. If I’m going to understand this company, I’m going to want to understand its industry, headcount, revenue, growth. Maybe I want a score for its sophistication in marketing, so a marketing sophistication score … right? This goes on and on. There are hundreds and hundreds of characteristics I might want to know about companies.
I can even make these things categorical. There might be 100 different industry codes, right? Typically, things like NAICS and SIC have thousands, tens of thousands of codes. We chose not to do that. We made it much simpler. For every one of these different categories that are in industry, we actually have a real number: A .9 in agriculture. A .1 in mining. Zero in construction … Lots of different things for industry.
Headcount. There is some sort of measure of what my headcount is relative to all other companies, say, in my category. Revenue; same thing, everything becomes a real number. Growth; I’m growing exceptionally fast, so I’m going to get a .95 in growth. My marketing sophistication is low, so I’m going to get a 0.1 there.
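As a rough sketch of how a company might be flattened into one list of real numbers, the example below uses a weight per industry category and a percentile rank against peers for headcount and revenue. The percentile rank is an assumption chosen for illustration (the talk only says "some sort of measure" relative to other companies), and the three-entry industry list, the function names and the sample figures are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical, deliberately tiny industry list; a real scheme would have many more entries.
INDUSTRIES = ["agriculture", "mining", "construction"]


def percentile_rank(value, peer_values):
    """Fraction of peers whose value is at or below `value`, a number in [0, 1]."""
    ordered = sorted(peer_values)
    return bisect_right(ordered, value) / len(ordered)


def company_vector(industry_weights, headcount, peer_headcounts,
                   revenue, peer_revenues, growth, marketing_sophistication):
    """Flatten one company's characteristics into a single list of real numbers."""
    # One real number per industry category instead of a single in-or-out code.
    industry_part = [industry_weights.get(name, 0.0) for name in INDUSTRIES]
    return industry_part + [
        percentile_rank(headcount, peer_headcounts),  # headcount relative to peers
        percentile_rank(revenue, peer_revenues),      # revenue relative to peers
        growth,                                       # assumed to be scored in [0, 1] already
        marketing_sophistication,                     # assumed to be scored in [0, 1] already
    ]


c1 = company_vector(
    industry_weights={"agriculture": 0.9, "mining": 0.1},
    headcount=120, peer_headcounts=[10, 50, 120, 400, 2000],
    revenue=5_000_000, peer_revenues=[1_000_000, 5_000_000, 20_000_000, 100_000_000],
    growth=0.95,
    marketing_sophistication=0.1,
)
print(c1)
```

Every company that goes through the same function comes out with the same structure, which is what makes the comparisons possible.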
When we squash all of these out, we end up with a big list of numbers. That’s all a vector is. It’s just a list of numbers, and the useful thing about having a list of numbers is you can do a lot of comparisons from one company to another. This is the vector for company C1, but I have a whole bunch more companies up here, millions of them. If I want to compare them to each other, I have a lot more flexibility when every one of them is represented by exactly the same structure for a list of numbers.
Their numbers are different, but the structure is the same. If this has 75 real numbers in it, every one of these guys does, I can do comparisons of all of it or pieces of it to answer questions like, “Hey, if I give you company C1, can you go find me out of all the millions of companies, the ones that have industry, [so these first few codes here, these first few real numbers], the most similar to this guy?” That’s a mathematical question. There’s a body of mathematics, essentially linear algebra, that says, “Go find me the closest numbers to this guy.” That math is blazingly fast. I don’t have to do a bunch of database queries to make it work.
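Here is a small sketch of that "find me the closest" question, using plain Euclidean distance over either the industry slice or the whole vector. The distance measure, the five-number layout, the function name and the sample vectors are assumptions for illustration. A production system would more likely use a vectorized linear algebra library than a Python loop.

```python
import math

# Assumed layout: the first three numbers are industry weights, the rest are other scores.
INDUSTRY_SLICE = slice(0, 3)


def most_similar(query_id, vectors, part=INDUSTRY_SLICE, k=2):
    """Return the ids of the k companies closest to query_id on the chosen slice."""
    query = vectors[query_id][part]
    scored = [
        (math.dist(query, vec[part]), company_id)   # Euclidean distance on that slice
        for company_id, vec in vectors.items()
        if company_id != query_id
    ]
    return [company_id for _, company_id in sorted(scored)[:k]]


# Made-up vectors that all share the same structure.
vectors = {
    "C1": [0.9, 0.1, 0.0, 0.60, 0.95],
    "C2": [0.8, 0.2, 0.0, 0.10, 0.20],
    "C3": [0.0, 0.1, 0.9, 0.60, 0.95],
    "C4": [0.7, 0.0, 0.3, 0.90, 0.50],
}

by_industry = most_similar("C1", vectors)                           # industry numbers only
by_everything = most_similar("C1", vectors, part=slice(None), k=1)  # the whole vector
print(by_industry, by_everything)
```

Comparing on the industry slice alone ranks C2 closest to C1, while comparing on the full vector ranks C4 closest. Same data, same structure, and the question changes just by changing which piece of the vector is compared.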
It’s a really convenient way to find what’s similar and what’s not, and that is a deeply valuable thing to do many times over for the different questions we might want to answer. It also has superior performance to just querying a database and trying to draw some sort of a box: “Well, if your headcount is between here and here and your industry is either this one or that one.” Those are the types of queries that get done today. This is a far more capable way of doing that, and it’s a much more precise way to characterize what this company really is.
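A toy contrast between the two styles of query, under made-up data: the box query uses hard cutoffs on headcount and an industry label, and the ranking orders companies by how close they are to a target. The field names, thresholds and target headcount are hypothetical.

```python
def box_query(companies, min_headcount, max_headcount, industries):
    """Traditional filter: a company is either inside the box or outside it."""
    return [
        name for name, c in companies.items()
        if min_headcount <= c["headcount"] <= max_headcount and c["industry"] in industries
    ]


def rank_by_closeness(companies, target_headcount):
    """Order companies by distance from a target instead of cutting at a threshold."""
    return sorted(companies, key=lambda name: abs(companies[name]["headcount"] - target_headcount))


# Made-up companies: C2 and C3 are nearly identical, C4 is much smaller.
companies = {
    "C2": {"headcount": 480, "industry": "software"},
    "C3": {"headcount": 510, "industry": "software"},
    "C4": {"headcount": 60, "industry": "software"},
}

in_box = box_query(companies, 50, 500, {"software"})
ranked = rank_by_closeness(companies, target_headcount=450)
print(in_box, ranked)
```

The box keeps C4 and drops C3 because of where the edge happens to fall, even though C3 looks far more like C2 than C4 does. The ranking puts C3 right behind C2.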
We do that for individuals, too. What are their skills? What sort of career path have they taken? What’s their education? Those kinds of things. Everything in this diagram gets a vector, and we can use those numbers to ask and answer a lot of questions that are very valuable to us.