How Cloudflare is Working to Fix the Internet with Matthew Prince

Published: Aug. 10, 2023, 10 a.m.


Matthew Prince, Co-founder & CEO at Cloudflare, joins Corey on Screaming in the Cloud to discuss how and why Cloudflare is working to solve some of the Internet's biggest problems. Matthew reveals some of his biggest issues with cloud providers, including the tendency to charge more for egress than ingress and the fact that the various clouds don't compete on a feature vs. feature basis. Corey and Matthew also discuss how Cloudflare is working to change those issues so the Internet is a better and more secure place. Matthew also discusses how transparency has been key to winning trust in the community and among Cloudflare's customers, and how he hopes the Internet and cloud providers will evolve over time.


About Matthew

Matthew Prince is co-founder and CEO of Cloudflare. Cloudflare's mission is to help build a better Internet. Today the company runs one of the world's largest networks, which spans more than 200 cities in over 100 countries. Matthew is a World Economic Forum Technology Pioneer, a member of the Council on Foreign Relations, winner of the 2011 Tech Fellow Award, and serves on the Board of Advisors for the Center for Information Technology and Privacy Law. Matthew holds an MBA from Harvard Business School where he was a George F. Baker Scholar and awarded the Dubilier Prize for Entrepreneurship. He is a member of the Illinois Bar, and earned his J.D. from the University of Chicago and B.A. in English Literature and Computer Science from Trinity College. He's also the co-creator of Project Honey Pot, the largest community of webmasters tracking online fraud and abuse.



Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.



Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the things we talk about here an awful lot is cloud providers. There sure are a lot of them, and there are the usual suspects that you would tend to expect to come up, and there are companies that work within their ecosystem. And then there are the enigmas.



Today, I'm talking to returning guest Matthew Prince, Cloudflare CEO and co-founder, who… well first, welcome back, Matthew. I appreciate your taking the time to come and suffer the slings and arrows a second time.



Matthew: Corey, thanks for having me.



Corey: What I'm trying to do at the moment is figure out where Cloudflare lives in the context of the broad ecosystem because you folks have released an awful lot. You had this vaporware-style announcement of R2, which was an S3 competitor, that then turned out to be real. And oh, it's always interesting, when vapor congeals into something that actually exists. Cloudflare Workers have been around for a while and I find that they become more capable every time I turn around. You have Cloudflare Tunnel which, to my understanding, is effectively a VPN without the VPN overhead. And it feels that you are coming at building a cloud provider almost from the other side than the traditional cloud provider path. Is it accurate? Am I missing something obvious? How do you see yourselves?



Matthew: Hey, you know, I think that, you know, you can often tell a lot about a company by what they measure and what they measure themselves by. And so, if you're at a traditional, you know, hyperscale public cloud, an AWS or a Microsoft Azure or a Google Cloud, the key KPI that they focus on is how much of a customer's data are they hoarding, effectively? They're all hoarding clouds, fundamentally. Whereas at Cloudflare, we focus on something that is very different, which is, how effectively are we moving a customer's data from one place to another? And so, while the traditional hyperscale public clouds are all focused on keeping your data and making sure that they have as much of it, what we're really focused on is how do we make sure your data is wherever you need it to be and how do we connect all of the various things together?



So, I think it's exactly right, where we start with a network and are kind of building more functions on top of that network, whereas other companies start really with a database—the traditional hyperscale public clouds—and the network is sort of an afterthought on top of it, just you know, a cost center on what they're delivering. And I think that describes a lot of the difference between us and everyone else. And so oftentimes, we work very much in conjunction with them. A lot of our customers use hyperscale public clouds and Cloudflare, but increasingly, there are certain applications, there's certain data that just makes sense to live inside the network itself, and in those cases, customers are using things like R2, they're using our Workers platform in order to be able to build applications that will be available everywhere around the world and incredibly performant. And I think that is fundamentally the difference. We're all about moving data between places, making sure it's available everywhere, whereas the traditional hyperscale public clouds are all about hoarding that data in one place.



Corey: I want to clarify that when you say hoard, I think of this, from my position as a cloud economist, as effectively in an economic story where hoarding the data, they get to charge you for hosting it, they get to charge you serious prices for egress. I've had people mishear that before in a variety of ways, usually distilled down to, "Oh, and they're data mining all of their customers' data." And I want to make sure that that's not the direction that you intend the term to be used. If it is, then great, we can talk about that, too. I just want to make sure that I don't get letters because God forbid we get letters for things that we say in the public.



Matthew: No, I mean, I had an aunt who was a hoarder and she collected every piece of everything and stored it somewhere in her tiny little apartment in the panhandle of Florida. I don't think she looked at any of it and for the most part, I don't think that AWS or Google or Microsoft are really using your data in any way that's nefarious, but they're definitely not going to make it easy for you to get it out of those places; they're going to make it very, very expensive. And again, what they're measuring is how much of a customer's data are they holding onto, whereas at Cloudflare we're measuring how much can we enable you to move your data around and connect it wherever you need it. And again, I think that that kind of gets to the fundamental difference between how we think of the world and how I think the hyperscale public clouds think of the world. And it also gets to where are the places where it makes sense to use Cloudflare, and where are the places that it makes sense to use an AWS or Google Cloud or Microsoft Azure.



Corey: So, I have to ask, and this gets into the origin story trope a bit, but what radicalized you? For me, it was the realization one day that I could download two terabytes of data from S3 once, and it would cost significantly more than having Amazon.com ship me a two-terabyte hard drive from their store.



Matthew: I think that—so Cloudflare started with the basic idea that the internet's not as good as it should be. If we all knew what the internet was going to be used for and what we're all going to depend on it for, we would have made very different decisions in how it was designed. And we would have made sure that security was built in from day one, we would have—you know, the internet is very reliable and available, but there are now airplanes that can't land if the internet goes offline, there are shopping transactions shut down if the internet goes offline. And so, I don't think we understood—we made it available to some extent, but not nearly to the level that we all now depend on it. And it wasn't as fast or as efficient as it possibly could be. It's still very dependent on the geography of where data is located.



And so, Cloudflare started out by saying, "Can we fix that? Can we go back and effectively patch the internet and make it what it should have been when we set down the original protocols in the '60s, '70s, and '80s?" But can we go back and say, can we build a new, sort of, overlay on the internet that solves those problems: make it more secure, make it more reliable, make it faster and more efficient? And so, I think that that's where we started, and as a result of, again, starting from that place, it just made fundamental sense that our job was, how do you move data from one place to another and do it in all of those ways? And so, where I think that, again, the hyperscale public clouds measure themselves by how much of a customer's data are they hoarding; we measure ourselves by how easy are we making it to securely, reliably, and efficiently move any piece of data from one place to another.



And so, I guess, that is radical compared to some of the business models of the traditional cloud providers, but it just seems like what the internet should be. And that's our North Star and that's what just continues to drive us and I think is a big reason why more and more customers continue to rely on Cloudflare.



Corey: The thing that irks me potentially the most in the entire broad strokes of cloud is how the actions of the existing hyperscalers have reflected mostly what's going on in the larger world. Moore's law has been going on for something like 100 years now. And compute continues to get faster all the time. Storage continues to cost less year over year in a variety of ways. But they have, on some level, tricked an entire generation of businesses into believing that network bandwidth is this precious, very finite thing, and of course, it's going to be ridiculously expensive. You know, unless you're taking it inbound, in which case, oh, by all means back the truck around. It'll be great.



So, I've talked to founders—or prospective founders—who had ideas but were firmly convinced that there was no economical way to build it. Because oh, if I were to start doing real-time video stuff, well, great, let's do the numbers on this. And hey, that'll be $50,000 a minute, if I read the pricing page correctly, it's like, well, you could get some discounts if you ask nicely, but it doesn't occur to them that they could wind up asking for a 98% discount on these things. Everything is measured in a per gigabyte dimension and that just becomes one of those things where people are starting to think about and meter something that—from my days in data centers where you care about the size of the pipe and not what's passing through it—to be the wrong way of thinking about things.



Matthew: A little of this is that everybody is colored by their experience of dealing with their ISP at home. And in the United States, in a lot of the world, ISPs are built on the old cable infrastructure. And if you think about the cable infrastructure, when it was originally laid down, it was all one-directional. So, you know, if you were turning on cable in your house in a pre-internet world, data fl—



Corey: Oh, you'd watch a show and your feedback was yelling at the TV, and that's okay. They would drop those packets.



Matthew: And there was a tiny, tiny, tiny bit of data that would go back the other direction, but cable was one-directional. And so, it actually took an enormous amount of engineering to make cable bi-directional. And that's the reason why if you're using a traditional cable company as your ISP, typically you will have a large amount of download capacity, you'll have, you know, 100 megabits of down capacity, but you might only have a tenth of that—so maybe ten megabits—of upload capacity. That is an artifact of the cable system. That is not just the natural way that the internet works.



And the way that it is different, that wholesale bandwidth works, is that when you sign up for wholesale bandwidth—again, as you phrase it, you're not buying this many bytes that flows over the line; you're buying, effectively, a pipe. You know, the late Senator Ted Stevens said that the internet is just a series of tubes and got mocked mercilessly, but the internet is just a series of tubes. And when Cloudflare or AWS or Google or Microsoft buys one of those tubes, what they pay for is the diameter of the tube, the amount that can fit through it. And the nature of this is you don't just get one tube, you get two. One that is down and one that is up. And they're the same size.



And so, if you've got a terabit of traffic coming down and zero going up, that costs exactly the same as a terabit going up and zero going down, which costs exactly the same as a terabit going down and a terabit going up. It is different than your home, you know, cable internet connection. And that's the thing that I think a lot of people don't understand. And so, as you pointed out, the great tragedy of the cloud is that for nothing other than business reasons, these hyperscale public cloud companies don't charge you anything to accept data—even though that is actually the more expensive of the two operations, because writes are more expensive than reads—but the inherent fact that they were able to suck the data in means that they have the capacity, at no additional cost, to be able to send that data back out. And so, I think that, you know, the good news is that you're starting to see some providers—so Cloudflare, we've never charged for egress because, again, we think that over time, bandwidth prices go to zero because it just makes sense; it makes sense for ISPs, it makes sense for connectiv—to be connected to us.



And that's something that we can do, but even in the cases of the cloud providers where maybe they're all in one place and somebody has to pay to backhaul the traffic around the world, maybe there's some cost, but you're starting to see some pressure from some of the more forward-leaning providers. So Oracle, I think, has done a good job of leaning in and showing how egress fees are just out of control. But it's crazy that in some cases, you have a 4,000x markup on AWS bandwidth fees. And that's assuming that they're paying the same rates as what we would get at Cloudflare, you know, even though we are a much smaller company than they are, and they should be able to get even better prices.
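Matthew's ports-versus-bytes argument can be made concrete with a back-of-envelope calculation. The port price and list egress rate below are illustrative assumptions for the sketch, not figures quoted in the conversation:

```python
# Back-of-envelope: wholesale transit is priced per port (capacity),
# not per byte moved. All rates here are illustrative assumptions.

PORT_GBPS = 10                   # assumed committed port size
PRICE_PER_MBPS_MONTH = 0.50      # assumed wholesale price, USD/Mbps/month
SECONDS_PER_MONTH = 30 * 24 * 3600

# Flat monthly cost of the port, regardless of how many bytes flow.
port_cost = PORT_GBPS * 1000 * PRICE_PER_MBPS_MONTH  # $5,000/month

# Gigabytes a fully utilized port can move in a month (each direction,
# since the up and down pipes come in equal-sized pairs).
gb_moved = PORT_GBPS * 1e9 / 8 * SECONDS_PER_MONTH / 1e9

effective_per_gb = port_cost / gb_moved

LISTED_EGRESS_PER_GB = 0.09      # assumed hyperscaler list price, USD/GB
markup = LISTED_EGRESS_PER_GB / effective_per_gb

print(f"effective wholesale cost: ${effective_per_gb:.4f}/GB")
print(f"markup vs. list egress:  ~{markup:.0f}x")
```

Even with these rough numbers the per-gigabyte gap is large, and at the cheaper wholesale rates a large network can negotiate, the multiple grows toward the 4,000x Matthew cites.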



Corey: Yes, if there's one thing Amazon is known for, it is being bad at negotiating. Yeah, sure it is. I'm sure that they're just a terrific joy to be a vendor to.



Matthew: Yeah, and I think that fundamentally what the price of bandwidth is, is tied very closely to what a port on a router costs. And what we've seen over the course of the last ten years is that cost has just gone enormously down, where the capacity of that port has gone way up and the physical, depreciated cost of that port has gone down. And yet, when you look at Amazon, you just haven't seen a decrease in the cost of bandwidth that they're passing on to customers. And so, again, I think that this is one of the places where you're starting to see regulators pay attention. We've seen efforts in the EU to say whatever you charge to take data out should be the same as what you charge to put data in. We're seeing the FTC start to look at this, and we're seeing customers that are saying that this is a purely anti-competitive action.



And, you know, I think what would be the best and healthiest thing for the cloud by far is if we made it easy to move between various cloud providers. Because right now the choice is, do I use AWS or Google or Microsoft, whereas what I think any company out there really wants to be able to do is they want to be able to say, "I want to use this feature at AWS because they're really good at that and I want to use this other feature at Google because they're really good at that, and I want to use this other feature at Microsoft, and I want to mix and match between those various things." And I think that if you actually got cloud providers to start competing on features as opposed to competing on their overall platform, we'd actually have a much richer and more robust cloud environment, where you'd see a significantly improved amount of what's going on, as opposed to what we have now, which is AWS being mediocre at everything.



Corey: I think that there's also a story where for me, the egress is annoying, but so is the cross-region and so is the cross-AZ, which in many cases costs exactly the same. And that frustrates me from the perspective of, yes, if you have two data centers ten miles apart, there is some startup cost to you in running fiber between them, however you want to wind up with that working, but it's a sunk cost. But at the end of that, though, when you wind up continuing to charge on a per gigabyte basis to customers on that, you're making them decide on a very explicit trade-off of, do I care more about cost or do I care more about reliability? And it's always going to be an investment decision between those two things, but when you make the reasonable approach of well, okay, an availability zone rarely goes down, and then it does, you get castigated by everyone for, "Oh, it even says in their best practice documents to go ahead and build it this way." It's funny how a lot of the best practice documents wind up suggesting things that accrue primarily to a cloud provider's benefit. But that's the way of the world I suppose.
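The cost side of the trade-off Corey describes is easy to quantify. This is a minimal sketch assuming a hypothetical $0.01/GB charge in each direction for cross-AZ traffic; the rate is an assumption for illustration only:

```python
# Rough cost of mirroring data between two availability zones for
# redundancy, assuming an illustrative $0.01/GB per-direction charge.

CROSS_AZ_PER_GB_EACH_WAY = 0.01   # assumed rate, USD/GB, each direction

def monthly_cross_az_cost(gb_replicated_per_month: float) -> float:
    """Traffic is metered on both sides, so the effective rate doubles."""
    return gb_replicated_per_month * CROSS_AZ_PER_GB_EACH_WAY * 2

# A service replicating 50 TB/month between zones:
cost = monthly_cross_az_cost(50_000)
print(f"${cost:,.0f}/month just for intra-region replication")
```

At those assumed rates, the "build for AZ failure" best practice carries a recurring per-gigabyte bill, which is exactly the cost-versus-reliability decision being forced on customers.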



I just know, there's a lot of customer frustration on it and in my client environments, it doesn't seem to be very acute until we tear apart a bill and look at where they're spending money, and on what, at which point, the dawning realization, you can watch it happen, where they suddenly realize exactly where their money is going—because it's relatively impenetrable without that—and then they get angry. And I feel like if people don't know what they're being charged for, on some level, you've messed up.



Matthew: Yeah. So, there's cost to running a network, but there's no reason other than limiting competition why you would charge more to take data out than you would to put data in. And that's a puzzle. The cross-region thing, you know, I think where we're seeing a lot of that is actually oftentimes, when you've got new technologies that come out and they need to take advantage of some scarce resource. And so, AI—and all the AI companies are a classic example of this—right now, if you're trying to build a model, an AI model, you are hunting the world for available GPUs at a reasonable price because there's an enormous scarcity of them.



And so, you need to move from AWS East to AWS West, to AWS, you know, Singapore, to AWS in Luxembourg and bounce around to find wherever there's GPU availability. And then that is crossed against the fact that these training datasets are huge. You know, I mean, they're just massive, massive, massive amounts of data. And so, what that is doing is you're having these AI companies that are really seeing this get hit in the face, where they literally can't get the capacity they need because of the fact that whatever cloud provider in whatever region they've selected to store their data isn't able to have that capacity. And so, they're getting hit not only by sort of a double whammy of, "I need to move my data to wherever there's capacity. And if I don't do that, then I have to pay some premium, an ever-escalating price for the underlying GPUs." And God forbid, you have to move from AWS to Google to chase that.



And so, we're seeing a lot of companies that are saying, "This doesn't make any sense. We have this enormous training set. If we just put it with Cloudflare, this is data that makes sense to live in the network, fundamentally." And not everything does. Like, we're not the right place to store your long-term transaction logs that you're only going to look at if you get sued. There are much better places, much more effective places to do it.



But in those cases where you've got to read data frequently, you've got to read it from different places around the world, and you need to decrease what the costs of each one of those reads are, what we're seeing is just an enormous amount of demand for that. And I think these AI startups are really just a very clear example of what company after company after company needs, and why R2—which is our zero egress cost S3 competitor—has seen such explosive growth from a broad set of customers.



Corey: Because I enjoy pushing the bounds of how ridiculous I can be on the internet, I wound up grabbing a copy of the model, the Llama 2 model that Meta just released earlier this week as we're recording this. And it was great. It took a little while to download here. I have gigabit internet, so okay, it took some time. But then I wound up with something like 330 gigs of models. Great, awesome.



Except for the fact that I do the math on that and just for me as one person to download that, had they been paying the listed price on the AWS website, they would have spent a bit over $30, just for me as one random user to download the model, once. If you extend that into the idea of this is a model that is absolutely perfect for whatever use case, but we want to have it run with some great GPUs available at another cloud provider. Let's move the model over there, ignoring the data it's operating on as well, it becomes completely untenable. It really strikes me as an anti-competitiveness issue.
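Corey's arithmetic can be reproduced directly. The ~$0.09/GB egress rate used below is an assumed list price for illustration, not a figure quoted in the episode:

```python
# Corey's Llama 2 example: egress cost of a single ~330 GB download,
# at an assumed ~$0.09/GB list egress rate (illustrative only).

MODEL_SIZE_GB = 330
EGRESS_PER_GB = 0.09   # assumed list price, USD/GB

cost_per_download = MODEL_SIZE_GB * EGRESS_PER_GB
print(f"one download: ${cost_per_download:.2f}")

# The same math at fleet scale: 10,000 users pulling the same weights.
print(f"10,000 downloads: ${cost_per_download * 10_000:,.0f}")
```

The per-download figure roughly matches Corey's "a bit over $30," and multiplying it out shows why distributing large model weights from a metered-egress bucket becomes untenable quickly.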



Matthew: Yeah. I think that's it. That's right. And that's just the model. To build that model, you would have literally millions of times more data that was feeding it. And so, the training sets for that model would be many, many, many, many, many, many orders of magnitude larger in terms of what's there. And so, I think the AI space is really illustrating where you have this scarce resource that you need to chase around the world, you have these enormous datasets, it's illustrating how these egress fees are actually holding back the ability for innovation to happen.



And again, they are absolutely—there is no valid reason why you would charge more for egress than you do for ingress other than limiting competition. And I think the good news, again, is that's something that's gotten regulators' attention, that's something that's gotten customers' attention, and over time, I think we all benefit. And I think actually, AWS and Google and Microsoft actually become better if we start to have more competition on a feature-by-feature basis as opposed to on an overall platform. The choice shouldn't be, "I use AWS." And any big company, like, nobody is all-in only on one cloud provider. Everyone is multi-cloud, whether they want to be or not because people end up buying another company or some skunkworks team goes off and uses some other function.



So, you are across multiple different clouds, whether you want to be or not. But the ideal, when I talk to customers, is they want to say, "Well, you know that stuff that they're doing over at Microsoft with AI, that sounds really interesting. I want to use that, but I really like the maturity and robustness of some of the EC2 API, so I want to use that at AWS. And Google is still, you know, the best in the world at doing search and indexing and everything, so I want to use that as well, in order to build my application." And the applications of the future will inherently stitch together different features from different cloud providers, different startups.



And at Cloudflare, what we see is our, sort of, purpose for being is how do we make that stitching as easy as possible, as cost-effective as possible, and make it just make sense so that you have one consistent security layer? And again, we're not about hoarding the data; we're about connecting all of those things together. And again, you know, from the last time we talked to now, I'm actually much more optimistic that you're going to see, kind of, this revolution where egress prices go down, you get competition on feature-by-features, and that's just going to make every cloud provider better over the long-term.



Corey: This episode is sponsored in part by Panoptica. Panoptica simplifies container deployment, monitoring, and security, protecting the entire application stack from build to runtime. Scalable across clusters and multi-cloud environments, Panoptica secures containers, serverless APIs, and Kubernetes with a unified view, reducing operational complexity and promoting collaboration by integrating with commonly used developer, SRE, and SecOps tools. Panoptica ensures compliance with regulatory mandates and CIS benchmarks for best practice conformity. Privacy teams can monitor API traffic and identify sensitive data, while identifying open-source components vulnerable to attacks that require patching. Proactively addressing security issues with Panoptica allows businesses to focus on mitigating critical risks and protecting their interests. Learn more about Panoptica today at panoptica.app.



Corey: I don't know that I would trust you folks to the long-term storage of critical data or the store of record on that. You don't have the track record on that as a company the way that you do for being the network interchange that makes everything just work together. There are areas where I'm thrilled to explore and see how it works, but it takes time, at least from the sensible infrastructure perspective of trusting people with track records on these things. And you clearly have the network track record on these things to make this stick. It almost—it seems unfair to you folks, but I view you as Cloudflare is a CDN, that also dabbles in a few other things here and there, though, increasingly, it seems "CDN" and "security company" are becoming synonymous.



Matthew: It's interesting. I remember—and this really is going back to the origin story, but when we were starting Cloudflare, you know, what we saw was that, you know, we watched as software—starting with companies like Salesforce—transitioned from something that you bought in the box to something that you bought as a service [into 00:23:25] the cloud. We watched as, sort of, storage and compute transitioned from something that you bought from Dell or HP to something that you rented as a service. And so the fundamental problem that Cloudflare started out with was if the software and the storage and compute are going to move, inherently the security and the networking is going to move as well because it has to be as a service as well; there's no way you can buy a, you know, Cisco firewall and stick it in front of your cloud service. You have to be in the cloud as well.



So, we actually started very much as a security company. And the objection that everybody had to us as we would sort of go out and describe what we were planning on doing was, "You know, that sounds great, but you're going to slow everything down." And so, we became just obsessed with latency. And Michelle, my co-founder, and I were business students and we had an advisor, a guy named Tom [Eisenmann 00:24:26] in business school. And I remember going in and that was his objection as well and so we did all this work to figure it out.



And obviously, you know, I'd studied computer science, and anytime that you have a problem around latency or speed, caching is an obvious part of the solution to that. And so, we went in and we said, "Here's how we're going to do it: [unintelligible 00:24:47] all this protocol optimization stuff, and here's how we're going to distribute it around the world and get close to where users are. And we're going to use caching in the places where we can do caching." And Tom said, "Oh, you're building a CDN." And I remember looking at him and then I'm looking at Michelle. And Michelle is Canadian, and so I was like, "I don't know that I'm building a Canadian, but I guess. I don't know."



And then, you know, we walked out in the hall and Michelle looked at me and she's like, "We have to go figure out what the CDN thing is." And we had no idea what a CDN was. And even when we learned about it, we were like, that business doesn't make any sense. Like because again, the CDNs were the first ones to really charge for bandwidth. And so today, we have effectively built, you know, a giant CDN and are the fastest in the world and do all those things.



But we've always given it away basically for free because fundamentally, what we're trying to do is all that other stuff. And so, we actually started with security. Security is—you know, my—I've been working in security now for over 25 years and that's where my background comes from, and if you go back and look at what the original plan was, it was how do we provide that security as a service? And yeah, you need to have caching because caching makes sense. What I think is the difference is that in order to do that, in order to be able to build that, we had to build a set of developer tools for our own team to allow them to build things as quickly as possible.



And, you know, if you look at Cloudflare, I think one of the things we're known for is just the rapid, rapid, rapid pace of innovation. And so, over time, customers would ask us, "How do you innovate so fast? How do you build things fast?" And part of the answer to that, there are lots of ways that we've been able to do that, but part of the answer to that is we built a developer platform for our own team, which was just incredibly flexible, allowed you to scale to almost any level, took care of a lot of the traditional SRE functions just behind the scenes without you having to think about it, and it allowed our team to be really fast. And our customers are like, "Wow, I want that too."



And so, customer after customer after customer after customer was asking and saying, you know, "We have those same problems. You know, if we're a big e-commerce player, we need to be able to build something that can scale up incredibly quickly, and we don't have to think about spinning up VMs or containers or whatever, we don't have to think about that. You know, our customers are around the world. We don't want to have to pick a region for where we're going to deploy code." And so, while we built Cloudflare Workers for ourselves first, customers really pushed us to make it available to them as well.



And that’s the way that almost any good developer platform starts out. That’s how AWS started. That’s how the Microsoft developer platform, the Apple developer platform, the Salesforce developer platform—they all started out as internal tools, and then someone said, “Can you expose this to us as well?” And that’s where, you know, I think that we have built this. And again, it’s very opinionated, it is right for certain applications, it’s never going to be the right place to run SAP HANA, but the company that builds the tool [crosstalk 00:27:58]—



Corey: I’m not convinced there is a right place to run SAP HANA, but that’s probably unfair of me.



Matthew: Yeah, but there is a startup out there, I guarantee you, that’s building whatever the replacement for SAP HANA is. And I think it’s a better than even bet that Cloudflare Workers is part of their stack because it solves a lot of those fundamental challenges. And that’s been great because it is now allowing customer after customer after customer, big and small, startups and multinationals, to do things that you just can’t do with traditional legacy hyperscale public cloud. And so, I think we’re sort of the next generation of building that. And again, I don’t think we set out to build a developer platform for third parties, but we needed to build it for ourselves, and that’s how we built such an effective tool that now so many companies are relying on.



Corey: As a Cloudflare customer myself, I think that one of the things that makes you folks stand out—it’s why I included security as well as CDN as one of the things I trust you folks with—has been—



Matthew: I still think CDN is Canadian. You will never see us use that term. It’s like, Gartner was like, “You have to submit something for the CDN-like ser—” and we ended up, like, being absolute top-right in it. But it’s a space that is inherently going to zero because again, if bandwidth is free, I’m not sure what—this is what the internet—how the internet should work. So yeah, anyway.



Corey: I agree wholeheartedly. But what I’ve always enjoyed, and this is probably going to make me sound meaner than I intend it to, has been your outages. Because when computers inherently at some point break, which is what they do, you personally and you as a company have both taken a tone that, I don’t want to say gleeful, but it’s sort of the next closest thing to it, regarding the postmortem that winds up getting published, the explanation of what caused it. The transparency is unheard of at companies at your scale, where usually they want to talk about these things as little as possible. Whereas you’ve turned these into things that are educational to those of us who don’t have the same scale to worry about but can take things from them that are helpful. And that transparency just counts for so much when we’re talking about things as critical as security.



Matthew: I would definitely not describe it as gleeful. It is incredibly painful. And we, you know, we know we let customers down anytime we have an issue. But we tend not to make the same mistake twice. And the only way that we really can reliably do that is by being just as transparent as possible about exactly what happened.



And we hope that others can learn from the mistakes that we made. And so, we own the mistakes we made and we talk about them and we’re transparent, both internally but also externally, when there’s a problem. And it’s really amazing to just see how much, you know, we’ve improved over time. So, it’s actually interesting: we test and measure all the big hyperscale public clouds, what their availability and reliability is, and measure ourselves against them. And across the board, the second half of 2021 into the first half of 2022 was the worst for every cloud provider in terms of reliability. And the question is why?



And the answer is, Covid. I mean, the answer to most things over the last three years is in one way, directly or indirectly, Covid. But what happened over that period of time was that in April of 2020, internet traffic, and traffic to our service and everyone who’s like us, doubled over the course of a two-week period. And there are not many utilities you can imagine that, if their usage doubled, wouldn’t have a problem. Imagine the sewer system all of a sudden has twice as much sewage, or the electrical grid has twice as much demand, or the freeways have twice as many cars. Like, things break down.



And especially the European internet came incredibly close to just completely failing at that time. And we all saw where our bottlenecks were. And what’s interesting is actually the availability wasn’t so bad in 2020 because people understood the absolute critical importance that, while we’re in the middle of a pandemic, we had to make sure the internet worked. And so, there were a lot of sleepless nights, and not just with us, but with every provider that’s out there. We were all doing Herculean tasks in order to make sure that things came online.



By the time we got to the second half of 2021, what everybody did, Cloudflare included, was we looked at it and we said, “Okay, here’s where the bottlenecks were. Here were the problems. What can we do to rearchitect our systems to address that?” And one of the things that we saw was that we effectively treated large data centers as one big block, and if certain pieces of equipment failed in a certain way, you would take that entire data center down, and then that could have cascading effects as traffic shifted around across our network. And so, we did the work to say, “Let’s take that one big data center and divide it effectively into multiple independent units, where you make sure that they’re all on different power supplies, you make sure they’re all in different [crosstalk 00:32:52]”—



Corey: [crosstalk 00:32:51] harder than it sounds. When you have redundant things, very often, the thing that takes you down the most is the heartbeat that determines whether something next to it is up or not. It gets a false reading and suddenly, they’re basically trying to clobber each other to death. So, this is a lot harder than it sounds like.



Matthew: Yeah, and what’s interesting is, like, we took all of that into account, but in the act of fixing things, you break things. And that was not just true at Cloudflare. If you look across Google and Microsoft and Amazon, everybody, their worst availability was the second half of 2021 into 2022. But both internally and externally, we talked about the mistakes we made, we talked about the challenges we had, and today, we’re significantly more resilient and more reliable because of that. And so, transparency has been built into Cloudflare from the beginning.



The earliest story of this I remember: there was a 15-year-old kid living in Long Beach, California who bought my social security number off of a Russian website that had hacked a bank that I’d once used to get a mortgage. He then used that to redirect my cell phone voicemail to a voicemail box he controlled. He then used that to get into my personal email. He then used that to find a zero-day vulnerability in Google’s corporate email where he could privilege-escalate from my personal email into Google’s corporate email, which is the provider that we use for our email service. And then he used that as an administrator on our email at the time—this is back in the early days of Cloudflare—to get into another administration account that he then used to redirect one of Cloudflare’s customers to a website that he controlled.



And thankfully, it wasn’t, you know, the FBI or the Central Bank of Brazil, which were all Cloudflare customers. Instead, it was 4chan, because he was a 15-year-old hacker kid. And we fixed it pretty quickly, and nobody knew who Cloudflare was at the time. And so potential—



Corey: The potential damage that could have been caused at that point with that level of access to things, like, that is such a ridiculous way to use it.



Matthew: And—yeah [laugh]—my temptation—because it was embarrassing. He took a bunch of stuff from my personal email and he put it up on a website, which, just to add insult to injury, was actually using Cloudflare as well. And I wanted to sweep it under the rug. And our team was like, “That’s not the right thing to do. We’re fundamentally a security company and we need to talk about when we make mistakes on security.” And so, we wrote a huge postmortem on, “Here’s all the stupid things that we did that caused this hack to happen.” And by the way, it wasn’t just us. It was AT&T, it was Google. I mean, there are a lot of people that ended up being involved.



Corey: It builds trust with that stuff. It’s painful in the short term, but I believe with the benefit of hindsight, it was clearly the right call.



Matthew: And it was—and I remember, you know, pushing ‘publish’ on the blog post and thinking, “This is going to be the end of the company.” And quite the opposite happened, which was all of a sudden, we saw just an incredible number of people who signed up the next day saying, “If you’re going to be that transparent about something that was incredibly embarrassing when you didn’t have to be, then that’s the sort of thing that actually makes me trust that you’re going to be transparent in the future.” And I think learning that lesson early on has been just an incredibly valuable lesson for us and made us the company that we are today.



Corey: A question that I have for you about the idea of there being no reason to charge in one direction but not the other. There’s something that I’m not sure that I understand on this. If I run a website, to use your numbers, of a terabit out—because it’s a web server—and effectively nothing in—because it’s a web server; other than the requests, nothing really is going to come in—that ingress bandwidth becomes effectively unused and also free. So, if I have another use case where I’m paying for it anyway, if I’m primarily caring about the outward direction, sure, you can send things in for free. Now, there’s a lot of nuance that goes into that. But I’m curious: is there a fundamental misunderstanding in that analysis of the bandwidth market?



Matthew: No. And I think that’s exactly, exactly right. And it’s actually interesting. At Cloudflare, our infrastructure team—which is the one that manages our connections to the outside world, manages the hardware we have—meets on a quarterly basis with our product team. It’s called the Hot and Cold Meeting.



And what they do is they go over our infrastructure, and they say, “Okay, where are we hot? Where do we have not enough capacity?” If you think of any given server, an easy way to think of a server is that it has, sort of, four resources that are available to it. This is, kind of, a vast simplification, but one is the connectivity to the outside world, both transit in and out. The second is the—



Corey: Otherwise it’s just a complicated space heater.



Matthew: Yeah [laugh]. The other is the CPU. The third is the longer-term storage—we use only SSDs, but, sort of, you know, hard-drive or SSD storage. And then the fourth is the short-term storage, or RAM, that’s in that server.



And so, at any given moment, there are going to be places where we are running hot, where we have a sort of capacity level that we’re targeting and we’re over that capacity level, but we’re also going to be running cold in some of those areas. And so, the infrastructure team and the product team get together, and the product team has requests, you know, “Here are some more places where it would be great to have more infrastructure.” And we’re really good at deploying that when we need to, but the infrastructure team then also says, “Here are the places where we’re cold, where we have excess capacity.” And that turns into products at Cloudflare. So, for instance, you know, the reason that we got into the zero-trust space was very much because we had all this excess capacity.
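The hot-and-cold review Matthew describes can be sketched roughly as follows; the resource names, target level, and numbers here are hypothetical illustrations, not Cloudflare's actual tooling:

```typescript
// Hypothetical sketch of a "hot and cold" capacity review: each server has
// four resources (network, CPU, disk, RAM); anything over the target
// utilization is "hot", anything under it is "cold" and a candidate for
// new products to soak up. The 0.8 target is an assumption for illustration.
type Resource = "network" | "cpu" | "disk" | "ram";

type Utilization = Record<Resource, number>; // fraction of capacity in use, 0..1

const TARGET = 0.8;

function classify(u: Utilization): { hot: Resource[]; cold: Resource[] } {
  const hot: Resource[] = [];
  const cold: Resource[] = [];
  for (const r of ["network", "cpu", "disk", "ram"] as Resource[]) {
    (u[r] > TARGET ? hot : cold).push(r);
  }
  return { hot, cold };
}
```

A fleet with cold inbound network capacity, for example, is exactly the situation he describes turning into DDoS mitigation and zero-trust products.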



We have 100 times the capacity of something like Zscaler across our network, and we can add that because—where most of our older products are all about outward traffic, the zero-trust products are all about inward traffic. And the reason that we can do everything that Zscaler does, but for, you know, a much, much, much more affordable price, is that we can basically just layer that on the network that already exists. The reason we don’t charge for the bandwidth behind DDoS attacks is that DDoS attacks are always about inbound traffic, and we have just a ton of excess capacity around that inbound traffic. And so, that unused capacity is a resource that we can then turn into products, and very much that conversation between our product team and our infrastructure team drives how we think about building new products. And we’re always trying to say, how can we get as much utilization as possible out of every single piece of equipment that we run everywhere in the world?



The way we build our network, we don’t have custom machines or different networks for every product. We build all of our machines—they come in generations. So, we’re on, I think, generation 14 of servers, where we spec a server and it has, again, a certain amount of each of those four [bits 00:39:22] of capacity. But we can then deploy that server all around the world, and we’re buying many, many, many of them at any given time so we can get the best cost on that. But our product team is very much in constant communication with our infrastructure team, saying, “What more can we do with the capacity that we have?” And then we pass that on to our customers by adding additional features that work across our network, and doing it in a way that’s incredibly cost-effective.



Corey: I really want to thank you for taking the time to, basically once again, suffer slings and arrows about networking, security, cloud, economics, and so much more. If people want to learn more, where’s the best place for them to find you?



Matthew: You know, it used to be an easy question to answer because it was just, you know, go on Twitter and find me, but now we have all these new mediums. So, I’m @eastdakota on Twitter. I’m eastdakota.com on Bluesky. I’m @real_eastdakota on Threads. And so, you know, one way or another, if you search for eastdakota, you’ll come across me somewhere out there in the ether.



Corey: And we will, of course, put links to that in the show notes. Thank you so much for your time. I appreciate it.



Matthew: It’s great to talk to you, Corey.



Corey: Matthew Prince, CEO and co-founder of Cloudflare. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that I will, of course, not charge you inbound data rates on.


Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

