Perspectives and guidance on how to scale your artifacts as your team grows
The term microservices has become ubiquitous in our industry, comprising many different meanings. From the engineer’s perspective, it promises the green-field dream, with framework freedom and a blind eye to existing code. This aspect is often emphasized by AWS, a vendor tailored for that. From a systems perspective, it promises stability through asynchronous communication and event-based systems.
Confluent, the makers of Kafka, have based their whole business strategy on this. On the tooling front, version control and IDEs welcome the piecemeal approach, avoiding the scale needs of humongous codebases, as the success of scaffolding technology like create-react-app, championed by Facebook, has shown.
Finally, the service mesh should eat the microservices chassis complexity, letting developers focus on business problems, a core tenet of Google Cloud Platform, operating on different layers of abstraction, from Kubernetes to Anthos.
For a time, it seemed a consensus was about to emerge that microservices were simply a superior architecture, despite the evidence that most successful large and small companies did not rely on it. Instagram is a large monolith Django application, Facebook sustained its growth in a simple PHP frontend, and Google, which pioneered containers, has many of its key services spanning hundreds of engineers, like GWS or Mustang. On the mid-size spectrum, Shopify and Stack Overflow are also monoliths.
Successful startups are invariably based on a simple CRUD server with some auxiliary flows, a highly efficient architecture for small groups of engineers. The challenge microservices initially came to solve was how to go from this small organization to a large one without facing the incredibly hard path of scaling in a centralized fashion taken by the early Big Five, namely Google, Amazon, Facebook, Apple, and Microsoft.
Instead, they proposed to scale as a group of small startups, an option AWS was famously created for, as it supported Amazon’s continued hypergrowth. Later, companies like Twitter, Uber, and Airbnb tried to mimic that with limited success. Netflix may be the most publicly successful case.
Recently, this consensus started to fade. Companies that went through the journey of adopting microservices started to talk about macroservices or well-sized services as a more suitable architecture. Event-oriented communication did not fully replace RPC technologies, and streams are still far from replacing databases as the source of truth.
The advances in scaling the tech stack of the verticalized trillion-dollar companies started to leak through their own SaaS or engineers creating their own super-scalable SaaS startups. But the industry shakeup brought important innovations which, when carefully combined with the modern incarnations of traditional technologies, can make the journey of a small to large company way smoother and help at any point in time.
In the last couple of years, I had the chance to experience those architectural decisions from many angles. In this write-up, I capture some of the thoughts I had and the techniques I developed as both an individual contributor and a leader in a large company, in my startup, and in a scale-up. With no further ado, here are my perspectives and guidance on how to scale your artifacts as your team grows.
Software Engineering is Programming Over Time
TL;DR: You do not have a crystal ball. Invest in decisions that are resilient to change and try to make the rest reversible.
A few important perspectives are often missed when talking about software engineering. The book “Software Engineering at Google” defines software engineering as programming over time. The key insight is that the optimal way to build a software artifact is tightly connected to the expected lifetime of that artifact. Since that quantity is often unknown, one also needs to account for uncertainty and decide not only the best strategy for writing software for the problem at hand but also the change management strategy if the lifetime of the software turns out to be shorter or longer than expected.
This uncertainty permeates the decision-making when writing software. Different engineers will have different guesses based on the specific problem at hand or on their past experience in different organizations, and they may arrive at distinct conclusions. Take the common task of doing something in parallel.
One might argue it is better to write a throw-away shell script using ampersands; another will argue it is better to write a small Golang CLI with proper channels and some types; and some will prefer to deploy a Rust server with provably correct concurrency. The correct answer not only depends on much broader reasoning but also on information that is not yet available, so the discussion ends up a bit like cheering for a sports team.
If you accept uncertainty as a premise, you stop looking at the problem as striking a balance since the problem is dynamic — with strong chances that whichever balance you pick today will turn out wrong tomorrow. Borrowing some machine learning lingo, we can say that instead of trying to tune precision and recall on every decision, you want to develop principles that improve the area under the curve of your model.
If you find principles that have little to no cost when you are right and that decrease the cost of the inevitable mistakes over time, you can iterate much faster and buy yourself the privilege of quickly adapting to the learnings and changes on the business side. That is the main pillar of this essay: what are the practices in your software architecture you should adopt, given that you cannot know how long your software will live?
First Comes Data
TL;DR: use semantic versioning for the private state, protocol buffers for everything else
There is little doubt that the most valuable asset in companies today is data. Once a startup is past the MVP phase, data will be its guide to finding the most promising markets, and if all goes well, it will become its treasure as it unlocks strong opportunities. And with a lot of luck and hard work, data can then become a moat, unveiling the powerful scale effects underpinning the most successful companies in our industry.
Data comes in many forms. Tables in a traditional database like Postgres are a common architecture to store durable data decoupled from code. Large amounts of nested data are often found in NoSQL databases, like MongoDB, which traded some of the ACID properties to scale geographically and in size. NewSQL databases like Spanner tried to combine properties of those systems, and there are still myriad variants, such as Neo4j with its focus on graph operations or Datomic with its strong functional bias. They are usually what comes to people’s minds when we talk about frequently written data that should live long. It is also where most systems are born.
There is also long-lived, but read-most data. This used to be the realm of the business intelligence teams and is now populated with a variety of professionals, from data engineers to machine learning experts. Data-warehouse systems like Redshift are the norm in this world, with columnar storage being the key defining property of those architectures.
Search-heavy systems with inverted indices on their core, like Elasticsearch, have arisen recently, bringing temporal data and logs to the core of the data analysis, and powerful mixed-paradigm data lakes like Snowflake have made strides in our industry.
Non-durable data is also present in several parts of a system, with its own challenges. External caching saw a revolution ignited by Memcached and consolidated by Redis. Modern systems like DAX bring external caching access to the microsecond latency level, on par with in-memory JSON parsing. SQLite has become ubiquitous, often used creatively in durable and non-durable scenarios. Inverted lists for search systems are often mmap’ed to memory or even recreated on-demand on system load. In the frontend space, sophisticated data management solutions, the so-called state libraries, like Redux, promise features like time-travel debugging.
The distinction between data and code gets blurry when we talk about machine learning models. They can be an h5 blob deployed as an asset in a binary or an independently versioned artifact served by TensorFlow Serving. Even though our mental model is always tied to the von Neumann architecture, with code and data flowing as separate entities, code is part of the system state in a distributed system. And as we will see, the lifetime of code needs to be managed just like any other piece of data.
As we discussed, software engineering is programming over time, so we need to talk about data over time as well. This is often discussed in the context of schema evolution. Ruby on Rails and Django are popular web frameworks whose ORMs come with migration tooling, a common but rather limited mechanism for schema evolution. So-called migrations are SQL-based transformations deployed with code that can evolve the schema of a relational database together with the changes you have made as your system has evolved. When carefully crafted, they can roll the database forward and back to match the code expectations.
They have one severe shortcoming though, which is that they are contained in the database. They have no effect on the data warehouse or other data copies and will generally result in breakages on analytical pipelines. It is beyond their reach to rewrite log lines from the past or invalidate caching layers like Redis, which may end up with a mix of data schemas across its data entries.
Another popular approach to dealing with data over time is schema versioning. The ecosystem around the serialization format Avro champions that solution, and central schema databases, dubbed schema registries, are offered both as open source and as SaaS by some vendors. Whereas migrations fall short because of the limited places where they can be applied, schema registries are a poor solution to schema evolution because they rely on cumbersome and mostly overlooked compatibility modes to specify how schemas mix along the time axis. Having access to all historical schemas of a given table will hardly suffice to efficiently write analytical code, which has to reduce or join data represented in multiple schema versions over the years.
Both migrations and schema registries are shy, if not shortsighted, solutions to the problem of schema evolution, offering only localized fixes that ignore that data crosses several systems with different properties, both at a given instant and over time.
But not all is lost. Protocol buffers are a schema language born inside Google with a strong focus on the schema lifetime and a highly efficient associated serialization machinery with powerful backward and forward compatibility semantics. They are pervasive in the Golang ecosystem but available in virtually every language. The modification rules for protocol buffers can be mechanically verified by different tools like Buf Breaking Changes Detector and offer a powerful set of properties around data over time in exchange for some extra weight on the data designer.
The trade-off is well worth it, as it allows both the core database and all the derived ones, from data lakes to message queues, to be resilient to schema changes across many different operations. I would say this is one of the core secret sauces of Google’s infrastructure. You can look years behind and still make sense of the data, even in the presence of cumulative software changes as products and people come and go.
A schema language with automatically enforced backward/forward compatibility rules and associated serialization machinery like Protocol Buffers is a strong requirement for any distributed system, and it is no coincidence that most large successful organizations adopt it, from Google itself to Netflix’s praised microservices architecture. Facebook’s Thrift drank from the same fountain, and Cap’n Proto gives a glimpse of what can be accomplished if you dare take it into the future.
But there is an overhead associated with it. Developer productivity takes a small hit from the extra cognitive and typing load of the schema language and its tooling, and performance takes a hit from the tag encoding that enables the forward/backward compatibility properties over the wire.
It would be excessive to use protocol buffers for Scala case classes, C++ POD types, or the new Python data classes. But it is a trade-off that makes sense to pay for whenever your data gets serialized. If it goes to the network, if it goes to the database, or even if it goes to logs, you should use protocol buffers. Because now, your data is part of an ecosystem that does not evolve atomically and needs the bells and whistles from protocol buffers.
You have a terrific escape hatch in JSON encoding for systems that do not natively support protocol buffers. By adopting the best practice of keeping your protos mostly flat, you can handle your data easily in most environments.
For data that is purely private, you can get away with something else. Think about in-memory vectors, caching layers, or even the code itself. These are pieces of data that often cannot absorb the overheads associated with protocol buffers, or on which we cannot impose them. For those, the industry has successfully adopted the semantic versioning protocol. You could think of major version changes as imposed migrations to systems that consume that data. Semantic versioning only offers backward/forward compatibility across minor versions, but by being explicit, it often allows different major versions to live side by side.
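To make the tag-encoding idea concrete, here is a minimal Python sketch of a toy tag-length-value codec. It is not the real protocol buffers wire format, and the `encode` and `decode` helpers, the tag numbers, and the field names are all made up for illustration. What it tries to show is only the principle: because every value travels with its numeric tag, an old reader can skip fields it does not know, and a new reader can fall back to defaults for fields that are missing.

```python
import struct

def encode(fields):
    """Encode {tag: bytes} as repeated (tag, length, payload) records."""
    out = bytearray()
    for tag, payload in sorted(fields.items()):
        # 1-byte tag and 2-byte big-endian length: the per-field overhead.
        out += struct.pack(">BH", tag, len(payload)) + payload
    return bytes(out)

def decode(data, known_tags, defaults=None):
    """Decode records, keeping known tags and silently skipping the rest."""
    result = dict(defaults or {})
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        payload = data[pos:pos + length]
        pos += length
        if tag in known_tags:  # unknown tags are skipped, not rejected
            result[known_tags[tag]] = payload.decode("utf-8")
    return result

# Hypothetical schema versions: v2 adds an "email" field under a new tag.
V1_TAGS = {1: "name"}
V2_TAGS = {1: "name", 2: "email"}

new_message = encode({1: b"Ada", 2: b"ada@example.com"})
old_message = encode({1: b"Ada"})

# Forward compatibility: an old reader consumes data from a new writer.
old_reader_view = decode(new_message, V1_TAGS)
# Backward compatibility: a new reader consumes data from an old writer.
new_reader_view = decode(old_message, V2_TAGS, defaults={"email": ""})
```

The three extra bytes per field in this toy format are the kind of cost the paragraph above refers to, paid in exchange for readers and writers that can be deployed at different times.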
We see that happening all the time as we push releases in systems like Kubernetes, where minor versions live together, and major versions require new endpoints, as commonly done in public REST APIs. We tend to think that only the binary artifacts require semantic versioning, but you can use the same strategy for table names in cache systems or for completely private databases. It is all about whether you have atomic updates or not. If you start to share your redux data stores across UI parts that get updated independently, soon you land in the same problem, hence the usage of hash-prefixed URLs, where you feed semver into the hash function. And once you start to expose the data, be it through logs or through your warehouse system, you really want to promote it to protocol buffers.
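As a rough illustration of semantic versioning applied to private state, the Python sketch below puts the major version of a schema into cache keys, so entries written under incompatible versions never collide. The helper names (`parse_version`, `is_compatible`, `cache_key`) and the key layout are assumptions of mine, not a convention of Redis or any other cache.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def is_compatible(reader_version, writer_version):
    """Treat two versions as compatible when they share the major number."""
    return parse_version(reader_version)[0] == parse_version(writer_version)[0]

def cache_key(namespace, schema_version, entity_id):
    """Build a key that isolates entries by major schema version."""
    major = parse_version(schema_version)[0]
    return f"{namespace}:v{major}:{entity_id}"

# Minor bumps share the same keys; a major bump gets a fresh keyspace.
key_before = cache_key("user", "2.3.1", 42)
key_minor_bump = cache_key("user", "2.4.0", 42)
key_major_bump = cache_key("user", "3.0.0", 42)
```

With this layout, a service on version 3 and a service still on version 2 can share the same cache cluster during a rollout, each reading only the entries it understands.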
Remember, we are ultimately talking about distributed systems here. What differentiates a distributed system from a centralized one is whether its parts share the same notion of “now.” They can live in the same microchip or in a cluster. What matters is if they have the global Newtonian clock, or if they are in Einstein’s reality where time depends on the point of reference. Once the data gets published to a system with a different clock, you need to handle schema evolution for your organization to succeed.
But despite how hard it is to grasp all this, the solution is simple. Always do semantic versioning. Once you start serializing your data to the network or to a database, write a protocol buffer to describe it and enforce breaking-change detection in your CI pipeline. This is the single most important property for your system, no matter if you are doing tiny microservices or a big monolith. Because a monolith also becomes a distributed system when you add a data lake or even when your clients store its responses.
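To give a feel for what breaking-change detection in CI means, here is a deliberately simplified Python sketch. It is not how Buf or any real tool works, and the two rules it checks (a field number must not change type, and a removed field number must be reserved) are a small subset that I picked for illustration. Schemas are represented as plain dictionaries mapping field numbers to hypothetical `(name, type)` pairs.

```python
def breaking_changes(old, new, reserved=frozenset()):
    """Return a list of human-readable problems between two schema versions.

    `old` and `new` map field numbers to (name, type) tuples; `reserved`
    holds field numbers that the new schema promises never to reuse.
    """
    problems = []
    for number, (name, field_type) in old.items():
        if number not in new:
            if number not in reserved:
                problems.append(
                    f"field {number} ({name}) removed without reserving its number"
                )
            continue
        _, new_type = new[number]
        if new_type != field_type:
            problems.append(
                f"field {number} ({name}) changed type from {field_type} to {new_type}"
            )
    return problems

old_schema = {1: ("id", "int64"), 2: ("email", "string"), 3: ("age", "int32")}

# A risky change: field 2 changes type and field 3 silently disappears.
risky_schema = {1: ("id", "int64"), 2: ("email", "int32"), 4: ("nickname", "string")}
# A safe change: only a new field number is added.
safe_schema = dict(old_schema)
safe_schema[4] = ("nickname", "string")

risky_report = breaking_changes(old_schema, risky_schema)
safe_report = breaking_changes(old_schema, safe_schema)
```

A CI job built on this idea would fail the build whenever the report is non-empty, which turns schema discipline from a code review convention into a mechanical guarantee.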
Mind the Cross-Cutting Concerns
TL;DR: adopt tooling on the boundaries and kill private tooling
Cross-cutting concerns are the problems that arise in most software engineering projects but are not related to the business problem at hand. Some of the classic examples are logs and security. In modern applications, the set of cross-cutting concerns tends to be pretty large. You typically have authentication/authorization. You have several forms of observability, from logs to tracing; several classes of experiments, from A/B testing to canary releases; and deployment concerns, from simple CI to a robust CD system. And on it goes.
Heroku famously published a collection of best practices they have collected over the years from those developing applications on their platform. Under the name “12-factor app”, the document collected extremely pragmatic and battle-tested solutions for problems every developer faced for some of the classic cross-cutting concerns of the time. The industry benefited tremendously from adopting the opinionated solutions provided in that document, and many of them are today not only considered a (somewhat aging) standard but available in refined form as off-the-shelf solutions you can download as open source or buy in SaaS form.
When an organization is in its infancy, the cost of cross-cutting concerns is low. Early successful companies will usually have a simple CRUD monolith; if they have drunk the microservices Kool-Aid, they will have a handful of services. At worst, you will have addressed a given cross-cutting concern, e.g., the CI pipeline, once or twice with deeper thoughts and maybe ignored it or hacked it a handful of times. The reality is that there isn’t much meat for the concern to cross. As companies grow, this starts to change. If you try to scale the monolith further, your cross-cutting concerns solutions need to grow in sophistication.
For example, your CI pipeline will start to have slow start times or use too much memory. If you started adopting microservices from the beginning, you would be swamped in setting up new, different CI setups that rot independently across microservices and need to be kept in sync. And then, you will create a scaffolding tool that will become the cross-cutting concern and start to divert resources from your business problem.
The reality is that there is no simple solution. Efficient and effective solutions for cross-cutting concerns are paramount for scaling a software organization but are irrelevant to the business. They are always a sunk cost that will be competing for resources and needs to be protected somehow. You will need error budgets, dedicated teams, and several processes. But it will always feel bad for the business, and it will always feel exciting for the engineering organization, even risking creating a miasma that can easily contaminate the company culture.
A robust strategy is to adopt or buy cross-cutting concerns solutions from companies or organizations dedicated to this problem (and hence the business goals are aligned) and kill the internal tooling. You will still dedicate significant resources, but not to build tools that fit your (secondary) use case, but to fit your use case to the existing tools. Resist the urge to write your own Kubernetes wrapper or a create-react-app clone for your backend. They start simple and grow into large projects that slow down your progress.
The tricky part is always when you start to consider what happens when they cross boundaries over time and over teams. Once an internal tool is no longer private to a team (or even to an individual), the organization should see that it is time to outsource it to someone who does that as their core business. In the long run, a company must be good at building what it sells, not what others sell.
Let us go through some of the common software-related cross-cutting concerns which pop up as your organization grows.
The cloud
TL;DR: never do anything on-premise
Unless you are competing with GCP, Azure, or AWS directly, there is no point in running your own infrastructure today. If internal policies or practices bind you, invest your time in changing them. If you are bound by regulation, you must comply, but do verify it, as that is becoming less and less common. Do not underestimate the high cost of running on-premise.
Also, you should aim to be in a single cloud. They have strengths and weaknesses, but combining them is a worse scenario than any individual cloud. Do not believe you can increase stability by being multi-cloud. This will only make sense if 5+ 9s stability is a crucial selling point for your product, and this will only be the case if you sell infrastructure SaaS. Unless you are dedicating the majority of your organization’s resources to reach this very hard goal, the load balancer you need to put on top of the clouds is a cross-cutting concern that will drag efforts from your business and fail more often than the underlying services.
If you use multiple clouds, which tends to happen even if you try not to, strongly avoid overlapping services. Cross-cloud communication is expensive and slow, and the cognitive cost of distinct tooling is also high. On the other hand, intra-cloud interop is not only fast and cheap but tends to be smooth as well since the cloud vendor has done the hard work for integration. If their exact implementation is not a fit for you, do not patch on top of it; instead, try to modify your requirements to match their offer.
Remember, it does not matter if you are running a gigantic monolith or a thousand microservices. If your architecture prevents you from using the cloud, the cost is too high. And do not forget that the cloud is also evolving. If your monolith requires 32GB of RAM, it is probably not a big deal since this can easily be provisioned on modern clouds. A few years ago, at this size, you would probably be already forced to review your architecture.
Service mesh
TL;DR: Use a high-level service mesh. Do not use plain Kubernetes.
Services, no matter their size, need to be brought live and need to communicate. The modern service mesh provides tooling for several important aspects of the software’s lifetime in production. Talking about GCP alone, you can look at plain Kubernetes, Istio, and Anthos. The higher the abstraction you can plug, the more you get. The service mesh operates on the boundaries of your services, and the more services you have, the more critical it becomes. Organizations that try to run microservices at scale and build their service mesh simultaneously will divert an incredible amount of resources from their core business.
An attractive sweet spot now is running Cloud Run for Anthos. This is a very high-level abstraction layer without imposing many constraints on your organization. Every cloud has something similar, like AWS Fargate or Azure Containers. What I believe sets Cloud Run apart is that it is close enough to the serverless offer without the downsides. Serverless is a pretty exciting trade-off, which imposes on developers the requirement of fast startup times, and it gives you automatic balancing, pay-for-what-you-use semantics, and a bunch of other goodies. It also imposes some restrictions on how to handle state.
Overall the restrictions by serverless are good practices software should adopt anyways, for various reasons, but it is still a world that requires a very strong opt-in from your organization. Cloud Run gives the same benefits and is a bit less stringent in imposing good practices. You can still use highly-optimized in-process concurrency frameworks and standard protocols (gRPC, HTTP), but the hourly restart means you don’t need to track down that nasty leak that only manifests itself after a week. And you can get pretty much all of the benefits of the pay-for-what-you-use model.
I remember a similar journey on a smaller scale as we started to build web services over twenty years ago. Full-blown server applications were being replaced by small cgi-bin scripts in Perl, powered by the Apache chassis. Those felt a lot like the microservices of that time. On the limit of simplicity, inetd mode forked a service reading from stdin and writing to stdout, very much like serverless nowadays. The world goes round and round.
Developer environments
TL;DR: favor vanilla stacks and limit their number; offer five-second modify/test cycles, syntax highlighting, and breakpoints from the get-go.
The discussion around microservices often revolves around people having the freedom to use the programming languages and tools they like. When a company is born, the first person writing code will pick the stack they like the best and will make the initial investment to have a productive environment. For a greenfield project, this is usually very easy to do. Just clone some skeleton git repository or use a scaffolding tool. As the organization grows, the dynamics change. New engineers come with their own experiences and preferences and get frustrated if they are not productive from the first day.
The reality is that setting up a development environment is a complex cross-cutting concern. As your company scales up, people come and go, and projects live and die. The requirements are not obvious either. Since most people in a company in hypergrowth have joined recently, they want the development environment to be tuned for the day-one productivity use case. On the other hand, the company has money to hire a lot of people because its existing code generates a lot of value, and keeping the core seasoned team productive and happy after two years is probably more valuable, even though it affects fewer individuals. These two use cases can often clash.
Once you are past a few dozen engineers, you should start thinking about developer experience as an internal product. A simple answer that is often flawed is to have a dedicated team. It gets better if it has a PM with OKRs and everything, but the risk is that the team forgets this is a cross-cutting concern and starts to build tools instead of adopting/buying from organizations that are thinking about the problem much more holistically.
You need a team, but it is a procurement and integration group.
Docker was one of the first external offers that helped with the environment skew problem by letting people adopt a mostly hermetic toolset. More recent offers, like devcontainers and GitHub Codespaces, bring the solution to a whole new layer, and hopefully soon, people shall be able to code with a simple laptop, leaving all the heavy lifting to cloud-based systems. JetBrains Gateway and many others are pointing in the same direction. Just make sure that, whatever you go with, your engineers can use breakpoints, syntax highlighting, and quick tests on day one and in year two as well.
Regardless of your solution, or if you deploy thousands of services or a handful of them, the more diverse your environment is, the more expensive it will be to maintain it. Because of that, you want to have a few general-purpose stacks only. For example, it could be Golang in the backend, Python for data science, and Flutter in the frontend. Or it could be Kotlin, Julia, and JavaScript. Or Node for isomorphic JS code and Scala for data. As we will discuss later, there is often a need for special-purpose projects, and for those, you probably cannot afford the cross-cutting concern of having a top-notch developer experience. But for the core projects, focus on a few stacks, and deliver the best developer experience.
The bridge between the developer cycle and production, with staging and testing environments, among others, is also a difficult challenge. Skaffold is a new kid in town offering what docker-compose in a sense failed to deliver. Not only do you get continuous build (a breeze with Gradle + Jib integration), but you can also override backends at will to run private environments, locally or in the cloud, or point to staging or production. Resist the urge to create per-team shared backends, though. SRE should manage the global staging and production environments, and developers who need to modify multiple services can simply expose them under their own namespace. Shared state is hard.
Finally, remember that your developer experience squad goal is not to build wrappers on top of vendors. Otherwise, these become internal products and start to drag resources from your business. Their goal is to delete the wrappers and make sure your requirements can be adjusted to match the vendor’s offer.
Observability
TL;DR: company-wide top-notch logs infrastructure, APM metrics, canaries and flag flips, out of the box
There is a blurry barrier between the developer experience and managing production. Code usually graduates from the developer environment to different deployments, sometimes called staging or beta, until finally reaching production. As this happens, the ability to observe and control the environment also changes for a myriad of reasons. Extensive tooling is required for this journey, and its quality is key to fostering collaboration between the SRE team (or the group ultimately responsible for 24/7 production responsiveness) and those working daytime shifts and producing new code.
In the past, the DevOps movement advocated for full ownership of production by engineers. In the microservices world, we still see residues of that school. As organizations grow, though, this hasn’t held well in practice. The requirements of a few critical systems, including the mesh itself, will always end up in the hands of some specialized coordinator group. You can and still should have developer-driven systems in their infancy. The key is to make the road towards managed production cheaper than to deviate from it.
Isolated change observation is a critical aspect that drives people away from shared deploys and towards microservices. Unfortunately, a small service does not suffice for proper changeset isolation. The extensive usage of feature flags tends to be cheap enough and to have a high payoff. This way, you can require code pushes to always be free of behavior changes and let engineers flip individual flags under close inspection. And you even get finer-grained backward/forward compatibility as a bonus.
This can make a daily monolith push safer than microservices pushes that bundle code and behavior changes together. I can’t emphasize enough how much of a game-changer feature flags are for a codebase. As always, you can check Martin Fowler’s excellent writing for an overview of feature flags, and then check out one of the several vendors that exist nowadays, from LaunchDarkly to Unleash, with many others in between. Don’t try to build it yourself, though. It is a cross-cutting concern; those who develop it as their first-class product will surely do a better job.
Pre-production and production observability tools are also typical cross-cutting concerns because, when needed, they often span several services. You need the ability to join logs across services, and your idealized distributed tracing infrastructure won’t be properly configured if every engineer is spinning up their own services with complete freedom. A proper microservices chassis requires opt-in effort from the organization that is virtually always beyond the skills or the effort budget of the teams working on the business’s problems. The solution is always the same.
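The separation between code pushes and behavior changes can be sketched in a few lines of Python. The `FlagStore` class, the flag name, and the `shipping_cost` function are hypothetical and exist only to show the pattern; in practice the store would be a client for one of the vendors mentioned above rather than an in-memory dictionary.

```python
class FlagStore:
    """In-memory stand-in for a feature-flag service (illustration only)."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def is_enabled(self, name):
        # Unknown flags are off, so shipping new code never changes behavior.
        return self._flags.get(name, False)

    def flip(self, name, enabled):
        self._flags[name] = enabled

def shipping_cost(order_total, flags):
    """New behavior ships dark behind a flag; the old path stays the default."""
    if flags.is_enabled("free_shipping_over_100") and order_total >= 100:
        return 0.0
    return 9.99

flags = FlagStore()
cost_after_push = shipping_cost(150, flags)   # code deployed, flag still off
flags.flip("free_shipping_over_100", True)
cost_after_flip = shipping_cost(150, flags)   # behavior changes on the flip
flags.flip("free_shipping_over_100", False)
cost_after_rollback = shipping_cost(150, flags)  # rollback without a deploy
```

The point of the pattern is that the deploy and the behavior change become two separately observable events, and the second one can be undone without touching the binary.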
Buy and adopt the best tools, and adapt yourself to them and not the other way around. ELK was a decisive innovator in this space, and today cloud vendors offer native alternatives with a similar feature set. Advanced tracing with correlation ids seems like a must, but, in practice, I have seldom seen teams that can keep that working, and even when it works, the extracted value seems to be not worth it. Give it a try, but don’t sweat too much.
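For readers who have not seen correlation ids in action, the Python sketch below shows the basic mechanics of joining logs across services. It assumes each service emits JSON lines carrying a `correlation_id` and a numeric `ts` field; those field names, the sample records, and the `join_by_correlation_id` helper are my own assumptions, and a real log platform would do this join for you.

```python
import json
from collections import defaultdict

def join_by_correlation_id(*streams):
    """Group JSON log lines from several services into per-request timelines."""
    timelines = defaultdict(list)
    for stream in streams:
        for line in stream:
            record = json.loads(line)
            timelines[record["correlation_id"]].append(record)
    for records in timelines.values():
        records.sort(key=lambda record: record["ts"])
    return dict(timelines)

# Hypothetical log lines from two services handling the same requests.
gateway_logs = [
    '{"ts": 1, "service": "gateway", "correlation_id": "req-1", "msg": "received"}',
    '{"ts": 4, "service": "gateway", "correlation_id": "req-1", "msg": "responded"}',
    '{"ts": 2, "service": "gateway", "correlation_id": "req-2", "msg": "received"}',
]
billing_logs = [
    '{"ts": 3, "service": "billing", "correlation_id": "req-1", "msg": "charged"}',
]

timelines = join_by_correlation_id(gateway_logs, billing_logs)
req1_services = [record["service"] for record in timelines["req-1"]]
```

The join itself is trivial; the hard part, as noted above, is getting every service to propagate the id consistently, which is why it tends to break when each team configures logging on its own.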
Security
TL;DR: pay people to hack your system
No discipline of software development is more of a cross-cutting concern than security. Security is not only very counter-intuitive, but it also requires a lot of cross-field knowledge and experience. From stack overflows to physical threats with guns, from XSS attacks to social engineering, it all needs to be taken care of. And even if you create the ultimate layered architecture to protect your assets, there will always be multiple bypasses, sometimes by mistake, often due to requirements. Never underestimate how deep an environment can be injected.
The security engineer’s belt is full of tools: periodic pen tests, automated linters, SSO, and cloud policies. The one recent addition that makes a difference is the service provided by companies like HackerOne. As with all cross-cutting concerns, you should always look at outsourcing all the aspects you can. This is hard for security since it is highly intertwined with your products. This new crop of SaaS offers, which bring a pay-as-you-go red team, yields great value simply because it emulates the threat model very closely. The services ain’t cheap, but there is enough competition in this space now that you may find a vendor that suits your needs. And once you sign the contract, you won’t be able to ignore security anymore, as you will find yourself flooded with work to do, and when the hose stops dripping, throw more money at it.
Also, don’t try to come up with your own security solutions. Do not write your encryption algorithm, do not invent your authentication solution. You cannot afford the resources a proper implementation of those systems requires, and you cannot afford an amateur in-house solution. And if you have not been hired as a security expert, you are an amateur in this field.
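When some security work unavoidably lands inside a service, the same rule applies: compose vetted primitives rather than inventing your own. The sketch below assumes you need to store a password verifier and uses only Python standard library calls. The function names and the iteration count are illustrative assumptions, and in most cases delegating to your identity provider is still the better option.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune it to your own latency budget


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Derive a verifier with PBKDF2-HMAC-SHA256 and a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the verifier and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Nothing here is novel cryptography: the key derivation, the random salt, and the constant-time comparison all come from the standard library.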
Luckily, the cloud vendors provide everything you need, and if you do a good job, you can get authorization into your storage layer. The trade-off with a poorly executed microservice strategy here shows up very quickly. You will need to configure SSO in different programming languages, have different secret stores, and so on and so forth. Move as much as you can to your service mesh chassis. Still, there is always some work that needs to be done inside the services, and if you cannot share libraries and best practices, you may end up with a very uneven level of security across your microservices. One failure may end up leaking too much.
Concurrency models
TL;DR: do not mix models, favor idiomatic, and be explicit about your choices
Significant complexity is introduced in a system with multiple, non-coordinated services due to the lack of a central clock. But even within a single service, there are already challenges in implementing concurrency to leverage the multiple cores and machines made available by modern architecture.
The simplest programming model is a purely blocking single-thread model. From the programmer’s perspective, there are very few traps, with zero implicit data sharing and all the coordination handled at the OS level. Concurrency is achieved through multiple processes, one of the recommendations of the 12-factor best practices. This trade-off suits simple CPU-bound loads well, but when your computation is mostly IO-bound, you start to need some form of multiplexing to achieve a reasonable throughput.
Unfortunately, no great general technical solution has yet been found to deliver highly efficient concurrency solutions without involving the programmer. Languages like C and C++ were developed in a single-core world and provided little support for concurrency. Java was one of the first languages to bring the problem front and center, advocating preemptive concurrency with a thread-per-request model, but the overhead was always significant. The reactive school of thought advocated cooperative concurrency, and several languages adopted syntax sugar wrapping promise objects, often with the async/await keyword pair. Specialized libraries, like NumPy and Spark, offered their own notions of concurrency optimized for their own use cases.
The concurrency model for a service may be seen at first glance as a private matter, but it is really a cross-cutting concern. First, because code is shared among services (either through public libraries or private ones), which often have expectations on the concurrency model. Second, the sort of problem that arises from improperly mixed concurrency models can decrease application performance by a thousand times. Subtle issues like thread starvation, unbounded concurrency, or blocking reactive threads will often lead people to conclude that code cannot scale and try to fix that by splitting services, only to end up in the same situation when the system grows by another order of magnitude. Ultimately, this comes from the fact that resources are not infinite, and you need to have reasonable utilization regardless of your architecture. If you fail on that, your users will have the services denied, typically once you get concurrency in the hundreds.
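As an illustration of two of the issues named above, blocking reactive threads and unbounded concurrency, here is a minimal asyncio sketch. The `blocking_query` function is a hypothetical stand-in for a blocking driver call, and the bound of three in-flight calls is an arbitrary assumption.

```python
import asyncio
import time
from typing import Dict, List, Tuple

MAX_IN_FLIGHT = 3  # assumed bound; derive the real value from load tests


def blocking_query(item: int) -> int:
    """Hypothetical stand-in for a blocking database driver call."""
    time.sleep(0.01)
    return item * 2


async def fetch(item: int, limiter: asyncio.Semaphore, stats: Dict[str, int]) -> int:
    async with limiter:  # bounded concurrency instead of unbounded fan-out
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            # Run the blocking call on a worker thread so the event loop
            # stays free to serve other coroutines.
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(None, blocking_query, item)
        finally:
            stats["in_flight"] -= 1


async def main() -> Tuple[List[int], int]:
    limiter = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fetch(i, limiter, stats) for i in range(10)))
    return list(results), stats["peak"]


results, peak = asyncio.run(main())
print(results, peak)
```

Calling `blocking_query` directly inside the coroutine would still return the right answers, which is exactly why this class of bug survives code review and only shows up under load.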
Unfortunately, there is no vendor that solves this problem. Part of the solution is trying to be idiomatic. In some stacks, this is easier: JavaScript does not have threads anyway, and the async/await syntax is well established. Python is also effectively single-threaded but highly parallel through extensions. It also has a lot of blocking libraries and only recently adopted async/await constructs. Java and friends have a mixed model, and Kotlin’s structured concurrency is a great relief when it comes to error handling.
Go’s goroutines do not offer strong guarantees, but the model is simple, and programmers have been successful in applying it. C++ and even Rust are still far from settling on a global idiomatic way of leveraging non-blocking I/O and multiple cores. Frontend programmers are also exposed to very similar challenges due to the rendering thread model used in native code and, to a lesser extent, in web code.
Above all, try to avoid mixing models within the same service or even stack. For example, the typical scientific Python code is CPU-bound and should live in its own single-threaded service. Stream and batch processing, often done by libraries like Kafka Streams, Apache Spark, or Apache Beam, should also live in their own services, using a thread-per-request model and a bounded thread pool. The typical CRUD application should choose between thread-per-request or a reactive model following the API of its database.
This will usually mean the idiomatic form of the language, but for some languages, it can go either way. Make sure you are explicit about the choice and provide configuration at the mesh level to support the option. Enforce it in code reviews and load-test your service to detect accidental bottlenecks when the models get mixed. Finally, make sure you have reasonable timeout defaults in the mesh and deploy backpressure mechanisms like circuit breakers everywhere. There will be problems, and you should both be robust to them and have the ability to diagnose them once they arrive.
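To show what a circuit breaker actually does, here is a deliberately small in-process sketch. In the setup described above this logic would normally live in the mesh configuration rather than in application code, so the class name, thresholds, and injectable clock are all illustrative assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


# Usage with a fake clock so the behavior is easy to follow.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])


def flaky():
    raise TimeoutError("dependency too slow")


for _ in range(2):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass

try:
    breaker.call(lambda: "ok")
    rejected = False
except CircuitOpenError:
    rejected = True

now[0] = 11.0  # the cool-down has passed
recovered = breaker.call(lambda: "ok")
print(rejected, recovered)
```

While the circuit is open, callers fail immediately instead of piling up on a struggling dependency, which is the backpressure effect you want.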
Version control and dependency management
TL;DR: one repo per stack, protocol buffers in a submodule
The way code gets organized is also highly influenced by Conway’s law. If you look at the open source community, you will see code freely available and globally indexed, but the repositories are very independent, and releases are mostly uncoordinated, relying on best-effort semantic versioning to avoid conflicts. The private for-profit organizations are usually more centralized, as they minimally share a mission.
Most large successful companies like Google and Facebook use large monorepos and vendoring as the code-sharing mechanism. This has been a rather successful trade-off as the barrier of consuming internal code decreases by a lot, mostly at the expense of tooling, which has trouble dealing with the scale. There is also a trade-off for developers, as now every new state of code needs to be tested with all downstream consumers at every change.
The first trade-off around tooling is purely a technical one. IDEs like IntelliJ, version control systems like git, or build tools like Gradle, are all fine-tuned to the open source model or small company model, which is where most of their users are. As your codebase grows, you leave this sweet spot and start to feel you should go to the multirepo paradigm and retain your tooling. However, those inside the large organizations that scaled monorepos found that the cost of not having global changes and the ability of vendoring dependencies will increase as your organization gets larger and will often try to scale tooling instead.
There are no vendors today that can give you the same experience developers from the Big Five get, and this is a cross-cutting concern that is just too expensive to tackle nowadays. Hence, you need to settle somewhere. In practice, it seems viable to have one monorepo per stack, each with a different level of maturity. The JavaScript ecosystem has several mature tools that spawned from Lerna, including Nx, which is inspired by the internal Google model. The Java world, through Gradle’s multi-project and composite builds, has reasonable support as well. Python still lacks much of what is needed, but cross-language build tools like Bazel can be of help. It is a significant upfront cost, but paying it once and gradually per stack seems feasible.
The other trade-off is more social in nature and comes from the fundamental property one wants in a codebase. If you upgrade a given dependency, do you want everyone else to upgrade at the same time? In the multirepo world, the answer by default is no, requiring less coordination. The effect in practice is that different people in different teams stumble upon the same problem multiple times, increasing development costs and sometimes creating strong divergences that remove the ability to share code internally.
On the other side, if you share dependency versions, you will often find roadblocks like the need to upgrade the JVM globally, a large undertaking for the project that finds itself blocked. In practice, these are few and far between, and a dedicated developer experience team can make them much cheaper. Most of the time, upgrading dependencies together is free, and for small divergences, you can always have escape hatches by letting subprojects override explicit versions for their needs.
How to choose which dependencies to allow in your codebase is a matter of its own, and I recommend going through one of my past articles for guidance.
Even if we have one repo per stack, where do data models and other stack-neutral artifacts go? If you use protocol buffers for your data models, you are lucky. Protocol buffers are a very special kind of dependency since their backward/forward compatibility is not given by semantic versioning but instead by their schema evolution rules. Also, they are naturally cross-language artifacts. This means they can be vendored on the individual per-language repositories without needing atomic global updates.
A good solution for achieving this is exposing them as a git submodule in each language repo. Protocol buffer encoded data has the same good properties as protocol buffer schema files, and similarly, submodules might be a good way of sharing cross-stack data, like mock table data and so on, as long as the data is encoded with the protocol buffer binary format (or JSON, given that you follow the more limiting JSON schema evolution rules).
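The schema evolution rules mentioned above can be approximated with a small check that runs in CI before a schema change is merged. This is only a sketch: it models a message as a mapping from field number to name and type, which is a simplification of mine, and it is stricter than protocol buffers themselves, which allow some type changes between wire-compatible types.

```python
from typing import Dict, List, Set, Tuple

# A schema version maps field number -> (field name, field type).
Schema = Dict[int, Tuple[str, str]]


def compatibility_problems(old: Schema, new: Schema, reserved: Set[int]) -> List[str]:
    """Return human-readable violations; an empty list means compatible."""
    problems = []
    for number, (name, ftype) in old.items():
        if number not in new:
            # Removing a field is fine as long as its number is retired.
            if number not in reserved:
                problems.append(f"field {number} ({name}) removed without reserving its number")
            continue
        _, new_type = new[number]
        if new_type != ftype:
            problems.append(f"field {number} changed type from {ftype} to {new_type}")
    for number in new:
        if number in reserved:
            problems.append(f"field {number} reuses a reserved number")
    return problems


v1 = {1: ("id", "string"), 2: ("amount", "int64")}
v2 = {1: ("id", "string"), 2: ("amount", "int64"), 3: ("currency", "string")}
v3 = {1: ("id", "string"), 2: ("amount", "string")}

print(compatibility_problems(v1, v2, set()))  # adding a field is allowed
print(compatibility_problems(v1, v3, set()))  # changing a type is not
```

Because compatibility is decided by these rules rather than by a version number, each language repo can pick up the submodule at its own pace.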
Design systems
TL;DR: pick modern, highly customized design systems
I will skip this one and recommend this excellent article:
https://medium.com/havingfun/rebuilding-loggis-design-system-on-top-of-material-ui-9555fede0466
Lakehouses
TL;DR: leverage modern cloud infrastructure for effortless extract-load, DBT for transform, and protocol buffers for robust schema evolution
As we have said, data comes first. And the data analysis part of modern companies usually lives segregated from the serving systems. There are good reasons for that; the main one is that the data view wants to be aware of the whole history of the data, whereas serving systems are primarily concerned with the current state of the data. Another reason, which is becoming less relevant, is that serving systems usually perform better with row-oriented storage, whereas data systems benefit from columnar storage.
Most data projects spend most of their time struggling with the second reason, moving data from databases to columnar storage. Modern data storage, like Google’s AlloyDB, takes care of this automagically. For more traditional data sources, one can resort to tools like GCP Datastream or Airbyte that keep data synchronized between your database and your data warehouse for you. There is an extensive range of vendors in this space, and you should resist the urge to build your own Kafka pipeline that listens to data, transforms it, and then stores it in your warehouse or lake. Forget ETL; embrace the ELT rage and leverage simpler tools and mental models. Reach out to DBT or BigQuery within your data lake for large transforms and golden datasets, and enjoy on-the-fly transformations for your explorations.
The complicated part comes when you remember that data exists across time. Don’t underestimate how valuable a fresh copy of your database is, but you want more than that; you want code that is robust to database changes; you want past versions of it; you want a log of the communication among your services. Since all those use cases cross the time axis, you need a robust schema evolution strategy, which, unlike database migrations, works across systems. Again, protocol buffers come to the rescue. If you follow their strict forward/backward compatibility rules, most problems will disappear. Use protocol buffers to describe your models and your events, and if you use the Outbox Pattern, those are one and the same.
A Growth Story
So far, we have warned about the risks of a poorly executed migration to microservices and spent significant time discussing how data and cross-cutting concerns should fit into your plan. In fact, those are the critical aspects to have in mind and usually the ones least talked about. Once that learning settles, let us take a look at how things usually play out in the open.
Probably one of the things that make microservices migrations particularly challenging is that we start discussing social matters, like the two-pizza team and Conway’s law, but what is keeping people up at night are purely technical problems, like system downtimes and slowness. When we mix scaling employees and customers, we get into a perilous place. Let us see an example that is very typical when companies go from ~20 to 50+ people.
In the beginning, there was ORM/REST
Startups need to transform themselves fast to get to product-market fit or die. The zero-to-one stage is very different from the late stage of companies. Whereas in the latter deep understanding of a problem and specialized tools reign, in the former stage, you need very general tools that you can quickly bend towards whatever unserved demand you identified that month. And there is no better general tool for bringing simple business requirements online than relational databases (maybe spreadsheets, but let us skip that part).
Because of that, virtually all unicorns start with something that looks like Django or Ruby on Rails. Those systems are incredibly adaptive and allow your engineering team to support sales and product, as they cover ground discovering or creating a new market.
Up to twenty employees, no one will call such a system a “monolith” or complain about legacy code. After all, it is a very robust design, which companies have used to serve billions of customers for at least a couple of decades. Latency is really small, and observability is great. But when the company hits the jackpot, it will usually grow to, let’s say, 50 employees, and things will start to change. Not because you have more customers. After all, this design has supported decades of large systems, but because you have more complex requirements.
The horror
If, and only if, a company is successful, it will get swamped by the need to address the more sophisticated needs of clients. They will ask for features that are inherently slow or require batching for economies of scale. They will demand faster response times or sophisticated analytical capabilities. This invariably leads to what I call “The Horror,” a distributed system disguised as a simple CRUD with appendages.
To offer analytical capabilities, developers will introduce a de-normalized version of the data, typically in the form of an Elasticsearch cluster or some warehouse technology. To improve response times, a Redis will be thrown on top of the database. And for those slow/batchy requests, there will be a cron job that periodically scans the database looking for non-consumed tasks and runs them asynchronously, toggling the “completed” column at the end.
That last one is the infamous database-as-queue anti-pattern and a guaranteed way to have your system slowly degrade and suddenly halt as you generate periodic load peaks, eventually resulting in a snowball self-inflicted denial of service attack. Good luck taming that once it pops up.
At this point, the company is still mostly run by the early engineers who know the system inside out and keep patching problems. Newcomers, however, will start calling it “the monolith,” despite it being anything but that. Data is spread everywhere, code is pushed independently in the cron jobs, and there is concurrent data access from many sources. Goodbye, atomicity.
Finally, after the 50-engineer barrier gets broken and we start to get close to an organization of over 100 individuals, the old-timers will feel ashamed of the system failures. As they get outnumbered, the company will decide to move to microservices.
Rinse and repeat
The decision to move to microservices is well-known to be expensive. To justify its cost, companies expect that after a fixed time frame, they will have stability and the ability to iterate fast. Naive groups estimate this in a couple of months of work, and more seasoned groups will define it as a multi-year undertaking delivering progressive improvements along the road.
However, the actual benefit that gets delivered on day one is that newcomers get to write code the way they want without the overhead of coordinating with the existing codebase. In a hypergrowth scenario, typically, startup folks will hire more startup folks, and they will build startups. With multiple groups undergoing the same journey with their tools of choice, you will end up with multiple CRUD stacks, which are highly idiomatic at day one but evolve to have their cron jobs, their own cross-cutting concerns solutions, and the same problems you had before.
You will evolve from “The Horror” to the “Never Ending Survival Mode,” where you get the same design mistakes replicated over and over again, often interacting through one (or more) higher-level Kafka-based event bus. As problems like idempotency are never properly addressed, data gets corrupted subtly in many ways. You are always giving up fixing the problems and rewriting your system in a new microservice. The cycle goes on and on.
This moment is often an existential risk for the company. Engineering resources can be fully diverted into this new architecture that is never delivered. When the new design starts to fail in different but familiar ways, you will often overhear that you have a “distributed monolith,” or the “microservices death-ball,” or the “big ball of mud architecture.” People talking about those also hint that you could do “microservices the right way,” and your problem would be gone (more likely, you would be back to square one).
What happens typically, though, is that the company will lose around one to two years of progress, and demographics will come into play. Enough newcomers will become old-timers and consider they already have their microservice and start resisting new newcomers who want to rewrite their systems. People will start to talk about macroservices, complain about abandoned microservices and rotten scaffolding tools, and start sharing domain best practices through libraries. Eventually, the engineer population growth decreases enough (overall or just in the core areas). In that case, the pressure to scale communication among employees decreases, and the work towards improving local quality and scaling towards numbers of clients gets traction, hopefully bringing stability and some of the efficiency gains desired.
This pyrrhic victory is a good scenario. In the worst-but-not-unrealistic scenario, competitors will outpace you, and the company will die. But even in a good scenario, it is not only that you lost time. The place you arrive still hardly feels like the promised nirvana. Simple things became simple, but hard things are still impossible. You never built the in-depth, domain-dependent platforms that the Big Five had the chance of building through their even more painful journeys and where they dug their moats deeper and deeper. Every new country you must launch still feels like building a whole new company from scratch. Every bug needs to be fixed a hundred times. Every team makes the same mistakes and writes the same workarounds with their mannerisms. You are slow because you are redoing the same work over and over again and failing the same way.
How can things be different?
Surviving Growth
If we want to do “microservices the right way,” we need to define it. Most of the time, the definition of a service is very open. At best, we can say a service should serve a bounded domain, itself a weak definition. The two-pizza team rule recommends that services do not get bigger than a dozen people. Let us try to rethink our growth story aiming for roughly the following targets: around one service for every dozen engineers, making the total number at most three times that, or at least a third of that, depending on how strongly connected your organization is.
This rules out the 100+ people working in a monolith and a company with more microservices than engineers. This is what we need to have in mind to scale the number of programmers in a company. But before we get there, we need to ensure our systems can scale to millions of clients. Once you detangle the problems, you can succeed.
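The sizing rule of thumb above works out as simple arithmetic. The function name and the default of twelve engineers per service are just a restatement of the targets in the text, not a formal model.

```python
def service_count_range(engineers: int, engineers_per_service: int = 12):
    """Return (lower bound, target, upper bound) for the number of services."""
    target = engineers / engineers_per_service
    return target / 3, target, target * 3


low, target, high = service_count_range(120)
print(f"120 engineers: between {low:.1f} and {high:.0f} services, aiming for {target:.0f}")
```

For an organization of 120 engineers, both a single monolith and a fleet of more than 120 microservices fall outside the range.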
First data
The first step when you start building a distributed system is to make sure you have a data language where you can specify schemas and ensure they can evolve without breaking any services. Because distributed systems have no global clock or state, you cannot assume things like atomic pushes or system-wide changes. We have discussed this already, but it is worth recapping the summary: ban migrations and adopt protocol buffers. The latter can be a bit hard due to tooling. But no worries, if you adopt the schema evolution rules enforced by protocol buffers, you will be good. I will name any table that follows those rules a “pbtable” (short for “protocol buffers table”), and we will see how interesting they are.
One creative technique I came up with recently to create a “universal” protocol buffer-based ORM is to use a Postgres JSONB column together with generated columns and INSTEAD rules, so you can have both the JSON view when working with full protos and a more regular relational view when just accessing individual fields. Let us see how it plays out in the long run.
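The technique above relies on Postgres features. As a rough, runnable analogue, the sketch below swaps in SQLite from the Python standard library: a view built with `json_extract` plays the role of the generated columns, and an `INSTEAD OF` trigger plays the role of the INSTEAD rule. The table and field names are hypothetical, and it assumes a SQLite build with the JSON functions enabled, so treat it as an illustration of the shape of the idea rather than the Postgres implementation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The full proto-like document is the single source of truth.
    CREATE TABLE orders_doc (id INTEGER PRIMARY KEY, doc TEXT NOT NULL);

    -- Relational view over individual fields of the document.
    CREATE VIEW orders AS
    SELECT id,
           json_extract(doc, '$.customer') AS customer,
           json_extract(doc, '$.total')    AS total
    FROM orders_doc;

    -- Writes against the view are redirected into the document.
    CREATE TRIGGER orders_update INSTEAD OF UPDATE ON orders
    BEGIN
        UPDATE orders_doc
        SET doc = json_set(doc, '$.customer', NEW.customer, '$.total', NEW.total)
        WHERE id = OLD.id;
    END;
    """
)

conn.execute(
    "INSERT INTO orders_doc (id, doc) VALUES (1, ?)",
    (json.dumps({"customer": "ana", "total": 10, "note": "gift"}),),
)

# Field-level access goes through the relational view...
conn.execute("UPDATE orders SET total = 25 WHERE id = 1")
row = conn.execute("SELECT customer, total FROM orders WHERE id = 1").fetchone()

# ...while full-document access still sees every field, including unmapped ones.
doc = json.loads(conn.execute("SELECT doc FROM orders_doc WHERE id = 1").fetchone()[0])
print(row, doc)
```

Fields the relational view does not know about, like `note` here, survive the update untouched, which is the property that keeps old code working while the schema evolves.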
Scale reads
If your system runs on the internet, it uses HTTP, which co-evolved with a sophisticated caching infrastructure to support the public web. This means that one can easily split the traffic that results in writes to your database from the traffic that does not. The latter is likely to use the GET verb and is often responsible for the majority of the load of a system in its early stage of life.
Hence, before we even start talking about bounded domains, independent pushes, or code ownership, all concerns related to the number of engineers working in a system, we will leverage this natural read/write split to scale the system in the number of requests, by explicitly declaring that things by default are eventually consistent, and that calls which generate side-effects go through a different path than intrinsically idempotent reads. This can be done nowadays by creating one (or many) read-only replicas of your data stores, a functionality broadly available today, and by creating replicas of your service that can only access the database read replicas. Finally, use the routing infrastructure to send all GETs to these newly created services.
This can be done without writing a single line of code; it is a pure SRE-led effort that can be done in a few days and will let you scale reads to millions of users. Yes, you are losing the strong consistency guarantee you had before, and a few use cases may break. You can add those requests to an exception list to fix them at your pace. Still, in practice, those will be few and far between because, often, the strong consistency guarantee is already absent for a myriad of reasons, from application code mistakes to subtle things like retries, load balancing, pushes, and so forth. When your millions of users start to reload your site, you add more database replicas and more copies of your nanoservice. You can scale horizontally as much as you want now.
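In practice this rule lives in your router or mesh configuration, not in application code, but spelling the decision out in a few lines of Python makes the exception list concrete. The upstream names and the example path are hypothetical.

```python
from typing import FrozenSet

READ_REPLICA_SERVICE = "nanoservice"  # hypothetical upstream names
PRIMARY_SERVICE = "monolith"

# Requests that still rely on strong consistency (hypothetical example path).
STRONG_CONSISTENCY_EXCEPTIONS: FrozenSet[str] = frozenset({"/checkout/status"})


def pick_upstream(method: str, path: str) -> str:
    """Send GETs to the read-only clone unless the path is on the exception list."""
    if method.upper() == "GET" and path not in STRONG_CONSISTENCY_EXCEPTIONS:
        return READ_REPLICA_SERVICE
    return PRIMARY_SERVICE


print(pick_upstream("GET", "/products/42"))
print(pick_upstream("GET", "/checkout/status"))
print(pick_upstream("POST", "/orders"))
```

Shrinking the exception list over time is the only part of this step that requires touching application code.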
We will call the new deployment, which initially runs a copy of the code of the monolith (but without write access to the database) a nanoservice. It is completely stateless, and although it can be pushed together with the monolith, it can also be pushed independently, and the code can slowly fork into something completely different. Yes, it is reading data from a replica of the monolith’s database, which may look like a violation of microservices’ rule of non-shared databases, but we will get to it in a second.
Now, contrast this with the usual strategy of extracting microservices from your CRUD service. Instead of creating a new deployment, you need to bootstrap a new codebase. And instead of replicating the database, you need to create a new one. Instead of days of work, this will take months. But worse than that, you are not truly solving the scalability problem. If you get too many reads, you can’t just increase the database because writes are still hitting there, and they cannot scale horizontally. You still have a design that is trying to keep strong consistency guarantees everywhere, even though you don’t need them most of the time.
Let us look at this from another angle. In a microservice architecture, each microservice should only talk directly to its own database. But often, it needs to join data that is born in a different microservice. You could make a request to that other microservice, coupling them together and mostly defeating the point of separating them. Instead, it would be best if you favored events and the CQRS (Command Query Responsibility Segregation) pattern. When the data is born in the first microservice (the command side of CQRS), an event is published, and your microservice can process it and store it locally with its data. Then when the request lands, everything is available and can be served locally (the query side of CQRS). It is sort of a long path where data gets copied around to be used in local joins. And because there are copies, we are again in an eventually consistent world.
Funnily, from the technical perspective, we landed in the same place by doing a read-only database replica plus a deployment clone in a few days and extracting a microservice over a few months. This is clear when you think about the process that is replicating your database as a CQRS pipeline, with your nanoservice being the query side of CQRS. Every write in the database is an event that gets picked by infrastructure and results in an identity transformation, aka a copy of the data to the new microservice. There is a small caveat, though. The database replica gives decoupled data, and the cloned deployment gives decoupled code. But you still don’t have a decoupled database schema because you don’t control the database replication code. That lives in the infrastructure. If the database schema changes, you will need to coordinate the deployment of your CRUD service and your new nanoservice.
And finally, we get back to protocol buffers. In a traditional events+CQRS microservice architecture, people tend to think of the data flowing through the events as an ETL pipeline. But if you want the infrastructure to move the data around, your transformations need to come last, making this more like an ELT pipeline. Instead of the low-coupling coordination around schema happening in the event publication/consumption, it needs to happen earlier as there is no transformation opportunity before the data movement. And with protocol buffers, it happens at schema design time by following protocol buffers rules. This means a read-only replica of a “pbtable” is in itself very much like a microservice. A somewhat dumb one as it only runs SQL queries that are highly decoupled (data, code, and schema), and you can add arbitrary code on top.
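The event-fed local copy described above can be sketched in a few lines. The event type, the service class, and the in-memory dictionaries are all hypothetical stand-ins for a real event bus and a real database.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CustomerRenamed:
    """Hypothetical event published by the service that owns customers."""
    customer_id: int
    name: str


class OrderQueries:
    """Query side: serves reads from local data only."""

    def __init__(self) -> None:
        self.customer_names: Dict[int, str] = {}  # local copy fed by events
        self.orders: List[Tuple[int, int]] = []   # (order_id, customer_id)

    def on_customer_renamed(self, event: CustomerRenamed) -> None:
        self.customer_names[event.customer_id] = event.name

    def add_order(self, order_id: int, customer_id: int) -> None:
        self.orders.append((order_id, customer_id))

    def list_orders(self) -> List[dict]:
        # A local join: no call to the customers service at request time.
        return [
            {"order_id": oid, "customer": self.customer_names.get(cid, "unknown")}
            for oid, cid in self.orders
        ]


service = OrderQueries()
service.add_order(1, customer_id=7)
before = service.list_orders()   # the event has not arrived yet
service.on_customer_renamed(CustomerRenamed(customer_id=7, name="Ana"))
after = service.list_orders()
print(before, after)
```

The gap between `before` and `after` is the eventual consistency window: the read is always served locally, but it may briefly lag behind the owning service.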
A database replica, some promises around schema evolution, a deployment clone, and routing GETs to it. Making eventual consistency explicit, we can scale the number of clients. Now we have our foundation from where we can start doing “microservices the right way.”
Scale writes
Handling writes is one of the hardest problems in software engineering. It is no surprise many programmers love functional languages, where side effects are kept at bay, and bugs are easily caught since everything is easily reproducible. But the harsh reality is that most useful systems have a complex state that must be dealt with. And this gets even more difficult when you have a distributed system, and there is no longer a global clock where you can observe the full state at once.
Writes in a database are side-effects; they modify the state of the system. In a CRUD system are usually associated with the HTTP POST verb (but really all other than GET). Modern database technology cannot yet easily scale writes (for many fundamental reasons), and developers need to use domain knowledge to make writes cheaper and more independent to make a system scale. When one extracts a microservice from another service, she is implicitly stating the writes across two services no longer need to be atomic, making them more independent. And when a large transaction is broken into several smaller ones, it results in cheaper individual writes. Finally, write latency can also be hidden by making it asynchronous from the client’s perspective, at the cost of not being able to handle errors while the client is looking.
Let us take a closer look at these ideas. Let us pick whichever POST request SRE believes is the most expensive for the system. It goes like this: open a transaction, read some data, compute, write some data, repeat this a couple of times, and close the transaction. This is the infamous read-modify-pattern, aka RMW, applied thrice, and it is notoriously hard to scale since the database needs to hold a lock over several operations, which are often fast the moment the code is written but get slow over time. As code changes, hotspots appear, and load increases. How can we break it into smaller and more independent parts?
Let us start decreasing the size of the transaction. Instead of three RMW cycles inside a single transaction, let us modify the code to have three transactions, each with a single RMW. The large transaction is more of an incidental and unnecessary thing, as it tends to be the case as code grows organically. If it is not, there is no way around it. You need to understand and modify the code so that splitting the transaction is possible. If you don’t, microservices won’t save you either. After all, you can’t extract the code. But often, one can modify the transaction boundaries, so let us assume it is true in this example. The problem after you break the transactions is that not only you are consuming data from potentially different states, which we deemed ok, but you are also now publishing piecemeal state changes from each transaction, and this broken-down state may be inconsistent.
The piecemeal state publication can be fixed by modifying the code to only make the changes visible in the final write. An elegant way of doing that is moving from the side-effect world of the transactional database to the functional world of event-oriented systems. The outbox pattern is a technique that is getting increased adoption because it bridges those two worlds very simply. When you want to publish an event, instead of working with queue APIs, you write to the database, and the infrastructure (e.g., debezium) will make that write into an event for you.
This is not just an improvement in ergonomics; it lets you publish both side effects and events atomically, which is impossible if you reach those two worlds independently. So, now each one of your RMWs can decide whether to make its final effect visible by writing to the destination table, or to delay its visibility by writing to an events table so a different replica of the service can pick it up from there with all the data it needs. This may all seem a bit too complicated at first, but with the proper syntax sugar, it is no different than adding Celery’s .delay to your function calls, and it will suffice to scale horizontal resources like the CPU and thread pool of your service. And, of course, the smaller critical sections will alleviate much of the database load.
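To make the outbox idea concrete, here is a small sketch with sqlite3. The orders and outbox tables, the topic name, and the `relay_once` function are all hypothetical. In particular, `relay_once` is a simple polling stand-in for what a change-data-capture tool such as Debezium would do by tailing the database log.

```python
import json
import sqlite3
import uuid


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id TEXT NOT NULL UNIQUE,
            topic TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn, order_id, items):
    """Write the side effect and its event in one transaction."""
    with conn:  # commits both inserts, or rolls both back on any exception
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
            (
                str(uuid.uuid4()),
                "order.placed",
                json.dumps({"order_id": order_id, "items": items}),
            ),
        )


def relay_once(conn, deliver):
    """Polling stand-in for the relay: push unpublished events in order."""
    rows = conn.execute(
        "SELECT seq, event_id, topic, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_id, topic, payload in rows:
        # If deliver raises, the row stays unpublished and is retried later,
        # which is why consumers must tolerate at-least-once delivery.
        deliver(event_id, topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because the event row lives in the same database as the side effect, there is no window in which one exists without the other.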
Now, let us break one of the RMWs into a read/modify phase and a write phase. For that, we will create another clone of the deployment, call it a microservice, and think of it as the “command” side of the CQRS pattern. This deployment will be connected to the read-only database replica and be capable of publishing events with the outbox pattern via a dedicated database, but it won’t be able to write to the original database. When the POST arrives in our monolith, we will forward it to this new deployment. There, the read and modify parts will run as usual, but we will intercept the write call and instead write its input as an event in the events table. This will get routed back to our monolith, which is allowed to write in the database and will now run the final, quick write part, making the side effects visible.
Technically, this is similar to the outbox pattern but takes care of buffering the input for the monolith and goes by the lesser-known name of inbox pattern. The multiple writes that yield events until they lead to a consistent state, as if it were an atomic operation, are part of a distributed transaction implemented with the choreography flavor of the Saga pattern. You have redesigned your three read-modify-writes from RMW:RMW:RMW to RM->RM->RM->WWW. All this is harder than it sounds, but remember, it can be done request by request, focusing on the slower ones. Once you manage to decrease the database timeouts to <10ms for all operations and the total request budget to <100ms, you should be safe.
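The RM -> W split can be sketched as follows. This is an illustration under simplifying assumptions, not a reference implementation. A single in-memory sqlite3 database plays the replica, the events database, and the primary at once, and the customers table, the tier rule, and all function names are hypothetical.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id TEXT PRIMARY KEY, total_spent INTEGER, tier TEXT);
        CREATE TABLE inbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            applied INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO customers VALUES ('c1', 1500, 'basic'), ('c2', 200, 'basic');
        """
    )
    return conn


def read_modify(replica, customer_id):
    """Command side: read (possibly stale) data and compute, but do not write."""
    (total,) = replica.execute(
        "SELECT total_spent FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"customer_id": customer_id, "tier": "gold" if total >= 1000 else "basic"}


def enqueue_write(events_db, intent):
    """Intercepted write: record the write's input as an event instead."""
    with events_db:
        events_db.execute(
            "INSERT INTO inbox (payload) VALUES (?)", (json.dumps(intent),)
        )


def drain_inbox(primary):
    """Monolith side: apply each buffered write in its own short transaction."""
    pending = primary.execute(
        "SELECT seq, payload FROM inbox WHERE applied = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in pending:
        intent = json.loads(payload)
        with primary:
            primary.execute(
                "UPDATE customers SET tier = ? WHERE id = ?",
                (intent["tier"], intent["customer_id"]),
            )
            primary.execute("UPDATE inbox SET applied = 1 WHERE seq = ?", (seq,))
    return len(pending)
```

Nothing becomes visible in the customers table until `drain_inbox` runs, and each write it performs is a single quick statement.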
The usage of the inbox and outbox patterns for communication among the services has several interesting properties. First, as events always originate from a write in the database, there is the opportunity to create an idempotency key for them automatically, making it much simpler (and explicit) to handle at-least-once semantics in the application code. On the consuming side of events, instead of using the pull-based model made popular by Kafka, we can use a push-based model with HTTP or gRPC callbacks, which completely isolates application programmers from the complex queue APIs, leaving them free to continue working with the simple semantics of the CRUD world. Again, this is more than ergonomics.
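As an illustration of the idempotency point, the sketch below derives a stable key from the write that produced the event and uses it to make a callback safe under at-least-once delivery. The namespace, the table names, and the event shape are assumptions made for the example. `on_event` stands for the body of whatever HTTP or gRPC handler the infrastructure would call.

```python
import sqlite3
import uuid

# Hypothetical namespace for this system's event keys.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.invalid")


def idempotency_key(table, primary_key, version):
    """Derive a stable key from the database write that produced the event."""
    return str(uuid.uuid5(EVENT_NAMESPACE, f"{table}:{primary_key}:{version}"))


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed (event_key TEXT PRIMARY KEY);
        CREATE TABLE ledger (account TEXT, amount INTEGER);
        """
    )
    return conn


def on_event(conn, event):
    """Callback body: apply the event once, however often it is delivered."""
    with conn:  # the dedupe marker and the side effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (event_key) VALUES (?)",
            (event["key"],),
        )
        if cur.rowcount == 0:
            return "duplicate"  # already applied: acknowledge and do nothing
        conn.execute(
            "INSERT INTO ledger VALUES (?, ?)", (event["account"], event["amount"])
        )
        return "applied"
```

The application code never touches a queue API. It only sees a request carrying a key and answers it like any other CRUD call.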
The pull-based model has the advantage of allowing very high throughput by sharing the parallelism configuration across consumers and producers, but it makes scaling a shared enterprise (e.g., you can increase service replicas, but if there are not enough Kafka partitions, you won’t get more throughput). In the push-based model, you lose some efficiency, but concurrency is much simpler, and even a satisfactory form of backpressure can be achieved by using circuit breakers at the infrastructure level. This is all possible working directly with queues, but in practice, teams never manage to get all the (many) details right.
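In the architecture described here the circuit breaker lives in the mesh, not in application code. Still, a tiny in-process version shows the mechanism that provides the backpressure. The class below is a hypothetical sketch; the thresholds and the injectable clock are assumptions made to keep it testable.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        still_open = (
            self.opened_at is not None
            and self.clock() - self.opened_at < self.reset_after
        )
        if still_open:
            # Fail fast so the overloaded consumer gets room to recover.
            raise CircuitOpenError("circuit open: shedding load")
        try:
            # Either the circuit is closed, or the cool-down has elapsed and
            # this call is the trial that decides whether to close it again.
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, undelivered events simply wait in the events table, which is what turns fast failure into backpressure.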
Up to this point, we have just improved the request by moving the read load to the read replica of the database and bringing in the new deployment to take care of the CPU load. From the client’s perspective, nothing has changed, and the monolith is still hanging on that POST as we distribute that load and wait for the write call to wrap it up. We have improved throughput significantly, but the latency gains may still not be enough. In that case, we use the ultimate trick by making this an async request from the client’s perspective. For that, we route the POST itself to the microservice, and once we get the request, we just write it as an event before any computation and return success.
The event will trigger the choreography we described. We will treat the communication with the client as we have treated the communication across the deployments. This is a powerful trick that is often abused in a microservices architecture. First, it requires a change of signature, requiring an idempotency key in the request. Otherwise, the client cannot safely do retries. Second, it requires machinery for the client to get the status of the command, whether it succeeded or, more importantly, failed. In a straightforward CRUD architecture, failure happens with the client on the line, so you can communicate and react to failure.
In an async architecture, you need some sort of callback, which is hard to implement and is often forgotten, with significant consequences. Minimally, you should have mesh-level configuration to send failures to dead letter queues and connect pagers to them. Don’t make clients’ requests async unless you must.
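The two requirements above, an idempotency key in the request and a way for the client to learn the outcome, can be sketched with an in-memory store. Everything here is hypothetical: the `CommandStore` class, the status names, and the list standing in for a dead letter queue. A real system would back this with the events table and mesh-level configuration.

```python
class CommandStore:
    """In-memory stand-in for the events table plus a status lookup."""

    def __init__(self):
        self.commands = {}      # idempotency key -> payload, status, error
        self.dead_letters = []  # keys of failed commands, to be wired to a pager

    def submit(self, key, payload):
        """Accept the request before any computation; safe for clients to retry."""
        if key not in self.commands:
            self.commands[key] = {"payload": payload, "status": "PENDING", "error": None}
        return self.commands[key]["status"]

    def process(self, key, handler):
        """Run later by the choreography; records the outcome for the client."""
        cmd = self.commands[key]
        if cmd["status"] != "PENDING":
            return cmd["status"]  # redelivered event: do not run the handler twice
        try:
            handler(cmd["payload"])
            cmd["status"] = "SUCCEEDED"
        except Exception as exc:
            cmd["status"], cmd["error"] = "FAILED", str(exc)
            self.dead_letters.append(key)
        return cmd["status"]

    def status(self, key):
        """What a status endpoint or client callback would report."""
        cmd = self.commands.get(key)
        return None if cmd is None else (cmd["status"], cmd["error"])
```

Even this toy version needs a status lookup and a failure path, which is exactly the machinery that tends to be forgotten.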
So, unlike scaling the reads, scaling the writes requires understanding a few more concepts and some modifications to the application. But more of these pieces are now available in the cloud. For example, GCP supports the outbox pattern and PubSub callbacks out of the box, and Microsoft’s Dapr can be used where those are not native. In fact, virtually all of this could be provided by a Kubernetes operator, and I eagerly await it to show up somewhere. Application code still needs to either be explicit or use some sort of syntax sugar for remote procedure calls, available in frameworks like Celery Tasks or Apache Spark. Moving all this to the mesh and libraries removes the traps people face when implementing ad-hoc retries (e.g., no jitter or exponential backoff) and error handling (swallowed errors, dead letter queues rotting everywhere).
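For reference, this is roughly what a retry with exponential backoff and jitter looks like when written by hand, which is the part that ad-hoc implementations tend to skip. The function name and default values are illustrative assumptions; a mesh or library would normally own this logic.

```python
import random
import time


def call_with_retries(fn, attempts=5, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn, retrying failures with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                # Out of attempts: surface the error instead of swallowing it,
                # so it can reach a dead letter queue.
                raise
            # Full jitter: a random delay between 0 and the capped exponential
            # bound, so that many clients do not retry in lockstep.
            sleep(rng() * min(cap, base * 2 ** attempt))
```

The `sleep` and `rng` parameters exist only so the behavior can be checked without waiting or relying on randomness.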
Despite the availability of infrastructure, this is still a much simpler task than “extracting a microservice” since you will have to face all those problems there as well. In fact, what we are doing here is the first step of extracting a microservice: solving the technical problems first so that we can scale the number of clients. Only then do we move on to solving the social problem of scaling the number of engineers.
Wrap It Up
Up to this point, we have discussed nothing about bounded domains, different programming languages, or application semantics. Instead, we created some clones of the monolith and split the requests among them, according to whether they were GET or POST. The former went to the replica we called a “nanoservice,” and most of the latter to the one we called a “microservice.” In the monolith, we kept the requests that could not afford eventual consistency.
We scaled the database with read replicas and improved decoupling by changing the communication across the deployments to be event-based, which was done with minimal code changes due to the use of the outbox pattern on the producer side and callbacks on the consumer side. We used protocol buffers to prevent the need for deployment coordination when the data schema evolves. The blast radius has decreased a lot, and the usage of circuit breakers in the mesh should take care of the more egregious failure modes.
And what we have in the end is, from the technical perspective, very much like a microservices architecture. We are still considering everything as one big bounded domain, and we still have a single, vertically scaling write database, but this is the starting point to scale the number of engineers. As for the number of clients, this design can handle millions or even billions of them. The rules are simple. If something is slow/expensive/fragile and you don’t have the time to improve it, make it eventually consistent. If it is a read, move it to your nanoservices and enjoy infinite linear scalability. If it is a write, move it to your microservice and make its response async, at the cost of giving up on error screens and monitoring dead letter queues instead. And that is all.
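Written as code, the routing rules fit in a few lines. This is only a restatement of the rules above. The function name and the `needs_strong_consistency` flag are hypothetical, and in practice the decision would live in the mesh or gateway configuration rather than in Python.

```python
def route(method, needs_strong_consistency=False):
    """Pick the deployment class for a request, following the rules above."""
    if needs_strong_consistency:
        return "monolith"      # cannot afford eventual consistency
    if method.upper() == "GET":
        return "nanoservice"   # reads served from eventually consistent replicas
    return "microservice"      # writes published as events, applied later
```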
In fact, this is how programmers instinctively approach these problems, but without a clear bridge between the eventually consistent world and the transactional world, things will break everywhere.
Finally, your deployments can now start to diverge. For convenience, we cloned them, making them assets built together from a monorepo, but deployed with different database access permissions. But that can change. You can make them different build targets and only ship relevant code. You can split them into different repositories if scaling the tooling is getting too hard. There may be one request you really need to optimize, so go ahead and use a different programming language that is a better fit for that specific technical problem, even if all the services belong to the same bounded domain.
But you can do it in your own time, when you see a benefit in doing so, not because your system will overload if you don’t. And you can create new “monoliths” or new “microservices” or new “nanoservices.” As long as you follow the database access and communication rules, you should be good.
And as time passes, you may see a few more specialized load types. Maybe you have some machine learning computation that you want to be in Python or that you need to run on a GPU-enabled machine. You can add a service with no access to any database and call it a “lambda service.” Or you may be missing that time-based aggregation that needs a peek at the event stream. For that, plug Kafka Streams or something similar directly into the event bus, and enjoy the exactly-once semantics you always struggled to achieve when your system did not separate the side-effect-inducing computations from the event world.
We have created a microservices chassis without ever extracting a microservice from our monolith. And we did it in days for the read case and probably weeks for the write cases. Not in months, not in years. With things in peace, in the next section, we will see how to leverage that chassis to let the newcomers have fun. You survived the 30–50 people growth phase. Let’s see how we go from that to a thousand and beyond.
Thriving on Extracted Value
Finally, with a stable and scalable system, we can make sure new people joining the company can create new value without worrying that the existing systems that pay for their salaries are on the verge of a breakdown.
The idea that we will explore is the one of freedom with responsibility. As we have seen, complexity in distributed systems is higher the closer the code is to the data. State is a hard thing to tame, and even communicating around it can be difficult. Newcomers will move faster if they don’t need to coordinate with several other teams and learn the intricacies of each part of a multi-year codebase. But to produce value, they will need access to the system data. We will categorize this need for data in three dimensions, whether they need to read, modify or write data, and whether they need to do it now, i.e., they cannot tolerate eventual consistency.
With the answer to these three questions, we will define four “classes” of services: monoliths, microservices, nanoservices, and lambda services. Respectively, those will have the main role of doing CRUD, event processing, querying, and computing. Let us talk about them.
Lambda services
Lambda services only do computation. They do not talk to other services. They do not read from a database, and they do not write data. All they do is receive a request, manipulate the data in the request, and send back a response. And yeah, AWS Lambda or Knative Functions would be very convenient ways of implementing them.
For example, if you need to solve a combinatorial problem, there is a good chance you will want to reuse OR-Tools. One reason to isolate it in its own service is that the parallelism properties of the library depend on the chosen algorithm, and its scheduling needs may conflict with those of the surrounding application.
A more mundane reason is that your main backend language might be TypeScript, and there are no bindings for OR-Tools in that language. In that case, you spin up a lambda service, which is a thin wrapper on top of OR-Tools exposing a simple gRPC API for solving your specific combinatorial problem. Other isolation techniques like process forking, swigging, or shared memory have a much bigger bug surface, are way more fragile, and will rarely work in practice. If it must be colocated, go with a sidecar.
Although not talking to other services or to the database may seem like a big limitation at first, if you adopt the event-carried state transfer pattern, you will be able to express a lot of needs with lambda services. And they are a joy to test and debug as they need no mocking.
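A lambda service in this sense is little more than a pure function behind an endpoint. The sketch below assumes a hypothetical quote event that carries all the state the computation needs (items, prices, tax rate), so the handler has nothing to look up.

```python
from decimal import Decimal


def quote(event):
    """Pure handler: everything it needs arrives inside the event itself."""
    subtotal = sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in event["items"]),
        Decimal("0"),
    )
    tax = (subtotal * Decimal(event["tax_rate"])).quantize(Decimal("0.01"))
    return {
        "order_id": event["order_id"],
        "subtotal": str(subtotal),
        "tax": str(tax),
        "total": str(subtotal + tax),
    }
```

Testing it is a plain function call with a dictionary, with no database fixture and no mocked client.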
Nanoservices
For nanoservices, let us go with an example. The company is running a Super Bowl campaign and needs a landing page with some dynamic elements, e.g., stock tickers, that can sustain a heavy load. For rendering that page, only reads are necessary, and if the data is slightly out of date (e.g., by seconds), it is no big deal. In that scenario, what we have dubbed a nanoservice suffices. The fancy name means that this service is not allowed to write or modify any data and that it reads its data from eventually consistent database replicas.
Despite those limitations, nanoservices are very powerful. If you are storing your data in “pbtables,” a nanoservice can access it, and in a well-designed system, this can be virtually all data. It suffices for the data owner to promise not to break forward/backward compatibility, and then you can spin a read-only replica. They are a great place to put business logic and slow joins or scans. Since we aim not to get flooded with services with no maintenance, we require a project to budget two engineers over a quarter to be deserving of a new nanoservice. If you think of your architecture using the CQRS pattern, nanoservices are where the query logic goes. Nanoservices already have some cross-cutting concerns, like accessing the database, so they are preferably developed in one of the approved stacks, but if someone thinks a given new stack is ready to be used, give it a try.
Microservices
We reserved the name microservices for services that work like the idealized microservices people often aim for. They receive events and publish events and hold no private state. Whereas nanoservices read eventually consistent data, microservices are where writes that will yield an eventually consistent view go. In a CQRS architecture, you would see them as the command service. If you are a fintech, a typical task that would suit a new microservice well is computing an investment portfolio for a new user. It can be done using potentially stale data, the computation deserves more than a few milliseconds, so it is naturally asynchronous, and you want to write its result.
If you are using the outbox pattern and a push model for your consumers, writing a microservice is no different than writing a regular HTTP service. You get requests, compute, and write in the database. There is no need to write to a queue like Kafka or poll from it, worry about partitions, or coordinate consumers’ and producers’ capacity when scaling up or down. These are all concerns of the pull model, which was made popular by Kafka due to its extremely high throughput, but very few applications need to optimize for that. Let the outbox pattern infrastructure publish the event for you and configure infrastructure-level callbacks to send the next HTTP request. Virtually everyone is better served with this simple push model with requests and responses.
Also, microservices will usually stitch data from several sources, so to have decent throughput, it is important to pay attention to the concurrency model. Minimally, you need to avoid blocking on I/O. Whereas CRUD services are happy with thread-per-request, microservices usually favor a reactive model.
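As a small illustration of not blocking on I/O, here is a sketch using asyncio. The `fetch` coroutine is a stand-in that only sleeps; in a real service it would be a non-blocking HTTP, gRPC, or database call, and the source names are made up.

```python
import asyncio


async def fetch(source, delay):
    """Stand-in for a non-blocking call to another service or replica."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking a thread
    return source, f"data from {source}"


async def build_view(sources):
    """Issue all reads concurrently and stitch the results together."""
    results = await asyncio.gather(
        *(fetch(name, delay) for name, delay in sources.items())
    )
    return dict(results)
```

With `asyncio.gather`, the total wait is close to the slowest source and not the sum of all of them, which is where the throughput gain over thread-per-request comes from when a handler fans out to several sources.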
Finally, because the microservices responses will eventually be published and modify our state, we need to be more careful when developing those. So, for spinning a new one, we will require three engineers working over two quarters, and the code being developed in one of the approved stacks.
CRUD services
Although CRUD services are usually where companies start and where most engineers have experience, they are the most complex class of services to develop because they deal with strong data consistency in the presence of reads and writes. This is a great property, but as Uncle Ben would point out, it comes with a great responsibility.
There are several sorts of hard-to-foresee traps as a CRUD service or a monolith scales. They are less severe than people think, and, as we discussed, several companies have scaled them to billions of users. Most traps will be avoided if you keep (all) database timeouts in the <100ms range. You will still have a somewhat hard time mocking state for testing or making peace with a simple thread-per-request concurrency model that lets the database do the heavy lifting. You must protect yourself from poorly behaved inbound traffic with circuit breakers. But as long as you don’t give up at the first bump and stop to rewrite it as a (broken) constellation of undisciplined microservices, it will work.
If you are introducing a new bounded domain with its data model for the next 100 billion dollar project, do not hesitate to spin off a new Django/Rails server even if your colleagues look down on you. It works. But because getting to that 100 B will take some time, to create a new CRUD service, you should have a team that looks like an early startup, with around six engineers working over three quarters. And to ensure SRE will carry the pager once it goes to prod, as should be the case for all data-bearing services, you must stay in a company-approved stack.
Freedom with responsibility
We have introduced four “classes” of services with the goal of giving people freedom with responsibility. The less dependent on state a service is, the more freedom people in the organization have in creating them. As state gets involved, more responsibility is required.
Notice that the definition of the classes doesn’t say anything about how those services are implemented. They are only an SRE concern, a set of configurations in the deployment that will grant the service access or not to data operations. In fact, the same binary may be deployed simultaneously as different service classes. For example, we could use this trick to route the command endpoints of a binary to its microservice deployment, whereas the query endpoints go to a nanoservice. Not only is this a good separation for resource allocation, but it allows us to establish a rule that services only talk to lower classes of services, avoiding the infamous Christmas tree light problem.
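One way to picture the classes as pure deployment configuration is the sketch below. The ordering and the grant fields are my reading of the descriptions in this section, not a specification, and all names are hypothetical. It also encodes the rule that a service may only call services of a strictly lower class.

```python
from dataclasses import dataclass
from enum import IntEnum


class ServiceClass(IntEnum):
    LAMBDA = 0  # computing only
    NANO = 1    # querying eventually consistent replicas
    MICRO = 2   # event processing, writes that become visible eventually
    CRUD = 3    # strongly consistent reads and writes


@dataclass(frozen=True)
class Grants:
    read: bool            # may read from a database (replica or primary)
    publish_events: bool  # may publish events through the outbox
    strong_writes: bool   # may write directly to the primary database


# Assumed mapping from class to data access granted at deployment time.
GRANTS = {
    ServiceClass.LAMBDA: Grants(read=False, publish_events=False, strong_writes=False),
    ServiceClass.NANO: Grants(read=True, publish_events=False, strong_writes=False),
    ServiceClass.MICRO: Grants(read=True, publish_events=True, strong_writes=False),
    ServiceClass.CRUD: Grants(read=True, publish_events=True, strong_writes=True),
}


def may_call(caller, callee):
    """Services only talk to lower classes, which rules out call cycles."""
    return callee < caller
```

Since calls only flow downward, no chain of calls can loop back on itself, and the lowest class, the lambda service, calls nothing at all.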
More importantly, we have fully detached the problems of scaling your artifacts in the number of users and in the number of developers. For the former, split your service into deployments that have different levels of access to the database, and make timeouts tight when close to data. For the latter, let people create bounded domains that map to the company’s business and not to technical constraints. Inside each bounded domain, repeat the scaling formula by starting with a CRUD service, and as things grow, create new deployments and re-route endpoints.
For needs that are born more independently, introduce new services within that bounded domain according to your needs. New bounded domains let you break the 50-engineer barrier, and now you can scale to hundreds of developers working together. Because client scalability is taken care of, you can focus on people, bow to Conway’s Law and work on the business.
A final interesting observation is that the amount of state does not increase linearly with the company size. In fact, with a strong product-market fit, it is more important to optimize value extraction from existing datasets than to introduce new ones. This means that the number of people working on CRUD services will not increase as fast as the number working on microservices once a tipping point is passed. And as it keeps going, this will be repeated for nanoservices and lambda services.
When you have thousands of engineers, optimization needs for value extraction will be so extreme that the company will build its libraries, and data will be so distant that domain-independent services will pop up everywhere. Get to the tens of thousands of engineers and stay on that path as long as Google, Microsoft, or Amazon to finally reach the moment you pivot to sell cloud services :-).
Scaling Your Artifacts With Your Team was originally published in Better Programming on Medium, where people are continuing the conversation by highlighting and responding to this story.