Part 1 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 2 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 3 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 5 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 6 – Service vs Components vs Microservices
In part 3 we saw that, in order to ensure a higher degree of autonomy for our services, we need to avoid (synchronous) 2 way communication (RPC/REST/etc.) between services and instead use 1 way communication.
A higher level of autonomy goes hand in hand with a lower degree of coupling. The less coupling we have, the less we need to bother with contract and data versioning.
We also increase our services’ stability – a failure in another service doesn’t directly affect our service’s ability to respond to stimuli.
But how can we get any work done, if we only use 1 way communication? How can we get any data back from other services this way?
The short answer is that you can’t, but with well defined Service Boundaries you (in most cases) shouldn’t need to call other services directly from your service to get data back.
What is a service boundary?
It’s basically a term used to define the business data and functionality that a Service is responsible for. In SOA: synchronous communication, data ownership and coupling we covered Service principles such as Boundaries and Autonomy in detail.
Boundaries determine what’s inside and outside of a Service. In part 2 we used the aggregate pattern to analyse which data belonged inside the Legal Entity service.
In the case of the Legal Entity service we realised that Legal Entity and its Addresses belonged together, because a LegalEntity and its associated Addresses were created, changed and deleted together. By replacing two services with one we gained full autonomy for the Legal Entity service, and thereby avoided the need for orchestration and for handling all the error scenarios that can result from orchestrating data mutating calls between services (LegalEntity service and Address service).
In the case of the Legal Entity the issue of coupling was easily solved, but what happens when you have a more complex set of data and relationships between these data? We could just pile all of that data into one service and thereby avoid the problem of having data mutations across processing boundaries (i.e. different services that are hosted in other OS processes or on different physical servers). The issue with this approach is that it quickly brings us into monolith territory. There’s nothing wrong with monoliths per se. Monoliths can be built using many of the same design principles described here, e.g. as modules instead of microservices, which are bundled together and deployed as a single unit, whereas microservices are often deployed individually (that’s at least one of the major qualities that people talk about in relation to microservices).
Blurry boundaries – the slippery slope of monoliths
One of the problems with monoliths is the risk of blurry boundaries. Because the modules are bundled closely together, often in the same code base, their boundaries tend to deteriorate slowly as more and more coupling creeps in between the modules. This happens because it’s easy, and in the beginning convenient, to do so: calling another module’s functions or components, or joining with its tables, gets the job done quickly.
The taste of a monolith feels good, especially at the beginning of a project when problems are fewer and complexity is lower.
Monoliths also have a tendency to take on too many responsibilities in form of data and functionality/logic.
With a monolith you can:
- Take advantage of locality
- Perform in memory calls and avoid distributed transactions
- Perform joins with other components’ SQL tables because they’re in the same DB
- Take advantage of development IDEs and use features such as refactoring, code completion and code searching
The flip side of this coin is the risk of higher coupling and lower cohesion. Monoliths tend to form a slippery slope where they slowly grow larger and larger as they take on more responsibilities, because it’s easy to just bolt new data and logic onto what already exists.
This is what I like to refer to as the slippery slope of monoliths:
With monoliths we also risk running into several disadvantages, such as:
- They are hard to adapt to new technology – you often need to rewrite the entire monolith to use new frameworks/languages/technologies (or use complicated solutions such as OSGi)
- Low reusability: the functionality of one part cannot be reused on its own
- Slow delivery train: introducing a new feature often requires coordination with other features to deliver all of them at the same time
- They grow and grow and grow in size and responsibilities
- Higher and higher coupling
- Higher and higher maintenance cost over time
- Starting the application often takes a long time
- Testing the application often takes a long time
- Monoliths place high demands on mental capacity, because you need to keep the entire monolith in your head
- The failure of one thing can potentially bring the entire monolith down (e.g. due to OutOfMemoryException)
You can design monoliths with internal services/components that have loose coupling and well defined boundaries, but in my 20 years of experience these are rare cases. Big balls of mud are usually the norm.
Integration as a bunch of webservices
In my experience many organisations approach SOA by bolting (web)services on top of existing monoliths. This can definitely make sense as a path to getting a higher degree of reuse out of old monoliths.
The problem with this is that most monoliths have evolved to contain many different business capabilities. This means that companies end up having multi-master systems, where many systems own similar or the same business data and there is no real single source of truth.
If we just take an existing monolith and carve it up into small (micro)services, then we also need to deal with the internal coupling of the monolith. The internal coupling is typically the result of inheritance, direct method calls, SQL joins, etc. If this is our approach to creating (Micro)services then we’ve just gone from bad to worse.
All of this is the result of weak or blurry service boundaries. We have services that are needy and greedy with regards to other services’ data and functionality. In my opinion this is not loose coupling, it’s the opposite.
Defining Service boundaries
When building new services or carving services out of old monoliths, we need to spend time defining the boundaries of our new services, so we (slowly – for migration cases) can get away from using 2 way communication between our services, except when authority is more important than autonomy – but more about this in a later blog post.
Note: High autonomy is not necessarily the solution for all cases. There might be cases where using 2 way communication is more cost effective from a development point of view and where the lack of autonomy is something that the organisation can live with (e.g. for reads across many services).
In an old monolith we might have collected all functionality and data related to a Retail domain, which might include functional areas such as Product Catalogue, Sales, Inventory, Shipping and Billing. Each of these functional areas could also be called subdomains or business capabilities:
Retail is about selling Products, so each of these functional areas, or subdomains, will in some way involve the domain concept Product:
- A product exists in the Product Catalogue together with e.g. the name, the description, pictures, etc.
- In the Sales subdomain we create Orders for Products.
- In the Inventory subdomain we will be interested in the quantity of a given Product we have in stock (QOH) and e.g. where the product is placed. We may or may not use the name product here; sometimes it is called a Stock Item or a Back Ordered Item, etc., depending on the state of the item/product in relation to our inventory.
- In the Pricing subdomain we will be interested in pricing strategies for our products. This may also include customer discounts depending on customer statuses (which might be maintained in a CRM monolith/services).
- In the Shipping subdomain we are interested in the size and weight of the Products plus where they should be sent to, etc.
If you think about it, all subdomains relate to Product in some way. Subdomains might use the same name or they might use a different name for Product (or other domain concepts such as customer, etc.). They’re also interested in different data in relation to products. Inventory is e.g. interested in Stock Keeping Unit (SKU), Quantity On Hand (QOH) and Location code. For them the name or the picture of the product may be irrelevant. If they need it, it would be to aid Inventory workers in doing their job; it would not be a necessity for handling inventory.
On the other hand, Shipping would not care about QOH, Location code, etc. They will be interested in the size of the product’s packaging, the weight and perhaps the name if they were to print the shipping receipt.
These different perspectives on the Product domain concept are what is known as different Bounded Contexts in Domain Driven Design (DDD).
In a monolith it would be very easy to create a Product table with many attributes/associations and then have all the different subdomains just insert/update and join data as they see fit. The risk is that this Product domain model will become big and will have many reasons to change (violating the Single Responsibility Principle) due to the coupling and lack of cohesion.
You can’t change the Product table layout since others depend on it. Elevating this from colocated code (e.g. in Java) to services and service contracts basically just removes the technical coupling. The fact that a service still needs data and functionality from another service will decrease our service’s autonomy to a level that may be unacceptable.
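To make the idea of Bounded Contexts a bit more concrete, here is a minimal Python sketch of how Inventory and Shipping could each model "their" product. The class and field names (StockItem, ShippableItem, package_size_cm, etc.) are my own assumptions for illustration; the only thing the two models share is the product ID.

```python
from dataclasses import dataclass
from typing import Tuple


# Inventory context: cares about stock levels and warehouse location.
@dataclass(frozen=True)
class StockItem:
    product_id: str
    sku: str
    quantity_on_hand: int
    location_code: str


# Shipping context: cares about packaging size and weight.
@dataclass(frozen=True)
class ShippableItem:
    product_id: str
    package_size_cm: Tuple[int, int, int]
    weight_kg: float


# The same product seen from two contexts; only the ID is shared.
stock = StockItem("p-1", "SKU-0001", quantity_on_hand=12, location_code="A-03")
parcel = ShippableItem("p-1", package_size_cm=(30, 20, 10), weight_kg=1.5)
```

Neither model needs the product name or picture, and a change to what Shipping needs doesn't force a change on Inventory.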
Defining Service Boundaries
We need a way to design our service boundaries so our services don’t need to talk to each other using 2 way communication in order to fetch data or invoke functionality.
We could start by building our services around functional areas, or business capabilities, and use that as our boundary. This means that our service owns its own data and functionality.
Other services aren’t allowed to own any of this service’s data.
There can only be one master of the data. With this guarantee in place we can trust our service to be the single source of truth with regards to all of its business data.
By doing this we ensure that our service only needs to change if the business functions that it’s responsible for change.
This is also known as the Single Responsibility Principle (SRP) for services. You can read a good discussion about this here and here.
Note: The example below is meant as the first step in the approach to building more loosely coupled services. Defining service boundaries is not easy, so in the next couple of blog posts I will dig deeper into how we can define better aligned service boundaries than what we get from the rudimentary approach described here. E.g. I think many will argue that the Product Catalogue is not the best service boundary, but since many organisations have one of these I will include it here.
Let’s start with the Product Catalogue service. In here we will store our single source of truth in relation to the Product aggregate, such as: name, id (remember an aggregate needs a unique id), pictures, description, etc.
In the Sales service we build up Orders based on the user’s choices through e.g. the web shop.
In Sales our interest in Products amounts to the ID of the Product that a specific OrderLine relates to. We don’t need the name of the product to build up an Order in the Sales service (we need the name in the webshop, but that is a read use-case and here we’re focusing on the write use-case of building an Order in the Sales service).
With this our boundaries and service models might look like this (simplified):
The two service domain models above represent two very clean data models which have high cohesion and low coupling. The only coupling between the two is that OrderLines reference Products by ID (remember the rule from part 2 which says that aggregates reference each other by ID).
The WebShop (which is the client of many services) is responsible for displaying the products for sale and the price the customer needs to pay for each product. When the user has finished filling his shopping basket, the WebShop will send a command message to the Sales service which contains the quantity, unit price and product ID for each of the products the customer wants to buy. In a later blog post we will look at how we can take advantage of composite UIs to ensure a low degree of coupling in the WebShop, but for now just assume that the WebShop is a client that calls each service using 2 way communication.
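As a rough illustration, the two service models could be sketched in Python like this. It is a simplified sketch and not a definitive design: field names such as picture_urls and the total() helper are assumptions of mine. The important part is that OrderLine only holds a product_id and never a Product object.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


# --- Product Catalogue service: owns the Product aggregate ---
@dataclass
class Product:
    product_id: str
    name: str
    description: str
    picture_urls: List[str] = field(default_factory=list)


# --- Sales service: owns the Order aggregate ---
@dataclass(frozen=True)
class OrderLine:
    product_id: str  # reference to the Product by ID only
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    order_id: str
    lines: List[OrderLine] = field(default_factory=list)

    def add_line(self, product_id: str, quantity: int, unit_price: Decimal) -> None:
        self.lines.append(OrderLine(product_id, quantity, unit_price))

    def total(self) -> Decimal:
        return sum((line.quantity * line.unit_price for line in self.lines), Decimal("0"))


# The Sales service can build an Order without asking the catalogue anything.
order = Order("o-1")
order.add_line("p-1", quantity=2, unit_price=Decimal("9.50"))
order.add_line("p-2", quantity=1, unit_price=Decimal("20.00"))
```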
As long as the Sales Service is provided with the quantity, unit price and product ID then it can create Orders and add OrderLines without needing to talk to the Product catalogue service.
But what happens when the Sales service wants to send the customer an Order confirmation?
When the user receives his Order confirmation, e.g. via email, he is interested in seeing more than prices, quantities and Product IDs. He wants to know the name, and maybe see a picture, of the product he has ordered, so he can be sure that the confirmation contains what he intended to buy.
So how should the Sales service get hold of the Product’s name from the Product catalogue while preparing the Order Confirmation email?
Let’s look at some options the Sales service has available:
- The most common approach: The Sales service could use 2 way communication to call the Product Catalogue service for each OrderLine in the Order (either as one call per OrderLine or as a batch call that collects information for all the Products referenced in the OrderLines).
  - This means that the Sales service now has a stronger contractual and temporal coupling to the Product Catalogue service. The Sales service knows which operations and data the Product Catalogue service offers.
  - This means that whenever the Product Catalogue service changes in a non backwards compatible way, the Sales service also needs to change (even if the Sales service didn’t care about the change), or the Product Catalogue service needs to version its contracts.
  - This problem can be reduced somewhat if the Product Catalogue service offers consumer driven contracts, where the clients of the service, e.g. the Sales service, determine what their individual contracts should look like.
  - If the Product Catalogue service is down, then the Sales service can’t create Order Confirmations due to the temporal coupling. This might not be a big issue since Order Confirmations aren’t time critical or directly exposed to customers.
  - If the Sales service was also responsible for rendering the UI containing the Products in the webshop (since it could own the webshop), then the temporal coupling between the Sales service and the Product Catalogue service might be too strong. Just because the Product Catalogue (which e.g. could be in an ERP) is down or unavailable, it shouldn’t mean that we cannot create and accept new Orders in the Sales service!
  - Added 28th of February 2015: There are other approaches to SOA that are different from the Autonomous Service approach I’m describing here. One other approach worth mentioning sees services as not being autonomous or owning business data; instead, Services in this approach expose intentional interfaces and are responsible for coordinating interaction between different Systems of Record (SoR). Using this approach, as I understand it, the Product Catalogue (service) and Sales (service) would instead be classified as SoRs, and still be autonomous. Instead a new coordinating Service that owns the “Send Order Confirmation Email” use case will be introduced. This service will call both the Product Catalogue SoR and the Sales SoR and fetch Order information and Product information to complete the Order Confirmation composition. The service’s operation might still be triggered by an event.
- The Product Catalogue service UI is mashed into the Order Confirmation process.
  - This is a more subtle and weaker form of coupling because the Sales service doesn’t need to know the data inside, or the contract of, the Product Catalogue service (except for a very small shared rendering context defined by the UI).
  - Service mashup still involves temporal coupling between our services.
  - I will get back to Composite UI/Service mashup in a later blog post.
- The final option would be that the Sales service contains cached/duplicated data from the Product Catalogue. This could be accomplished, without temporal coupling to the Product Catalogue service, by using Data duplication over Events.
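The temporal coupling in the first option can be shown with a small Python sketch. ProductCatalogueClient is a hypothetical in-memory stand-in for whatever RPC/REST client you would really use, and the `available` flag simulates the catalogue being down; none of these names come from a real library.

```python
class CatalogueUnavailable(Exception):
    """Raised when the (hypothetical) Product Catalogue cannot be reached."""


class ProductCatalogueClient:
    """Stand-in for a synchronous 2 way call (RPC/REST) to the catalogue."""

    def __init__(self, names, available=True):
        self._names = names
        self.available = available

    def get_product_name(self, product_id):
        if not self.available:
            raise CatalogueUnavailable("Product Catalogue is down")
        return self._names[product_id]


def render_confirmation_lines(order_lines, catalogue):
    """Build confirmation text; needs the catalogue to answer right now."""
    return [
        f"{quantity} x {catalogue.get_product_name(product_id)}"
        for product_id, quantity in order_lines
    ]


catalogue = ProductCatalogueClient({"p-1": "Kettle"})
order_lines = [("p-1", 2)]
lines_when_up = render_confirmation_lines(order_lines, catalogue)

# Simulate an outage: the Sales side can no longer finish its own work.
catalogue.available = False
try:
    render_confirmation_lines(order_lines, catalogue)
    failed = False
except CatalogueUnavailable:
    failed = True
```

As long as the catalogue answers, everything works. The moment it doesn't, the Sales service is stuck, even though nothing is wrong with the Sales service itself.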
Data duplication over Events
When Products are added, changed or removed from the Product Catalogue we can notify other services of this fact using Business Events.
In this case the Product Catalogue service is so simple from a business perspective that the business events would resemble Create/Update/Delete aka CUD Events: ProductAdded, ProductUpdated and ProductDeleted.
Notice that the events are named in the past tense, which is an important fact.
We could make the Sales Service listen for these events over a Message Channel (e.g. Publish/Subscribe style) and allow the Sales Service to build up its own internal representation of Products with the data it’s interested in:
This will result in the following Service data models:
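A minimal Python sketch of the idea, assuming a tiny in-memory publish/subscribe channel in place of a real message broker. MessageChannel, SalesProductReplica and the event payload fields are illustrative names of mine; only the event names ProductAdded, ProductUpdated and ProductDeleted come from the text above.

```python
from collections import defaultdict


class MessageChannel:
    """Tiny in-memory publish/subscribe channel (stand-in for a real broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)


class SalesProductReplica:
    """The Sales service's own copy of the product data it cares about."""

    def __init__(self, channel):
        self.names = {}  # product_id -> name; Sales ignores everything else
        channel.subscribe("ProductAdded", self._upsert)
        channel.subscribe("ProductUpdated", self._upsert)
        channel.subscribe("ProductDeleted", self._remove)

    def _upsert(self, event):
        self.names[event["product_id"]] = event["name"]

    def _remove(self, event):
        self.names.pop(event["product_id"], None)


channel = MessageChannel()
replica = SalesProductReplica(channel)

# The Product Catalogue publishes facts; it doesn't know who listens.
channel.publish("ProductAdded", {"product_id": "p-1", "name": "Kettle", "description": "1.7 l"})
channel.publish("ProductUpdated", {"product_id": "p-1", "name": "Electric kettle", "description": "1.7 l"})
```

When the Sales service later prepares an Order Confirmation it can look up the name in `replica.names` without calling the Product Catalogue service.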
By using data duplication in this way we have gained the following advantages:
- There’s still a clear owner of the data: the Product Catalogue is the owner of the data and will notify dependent services when data is changed.
  - This form of data caching is better than most traditional caching mechanisms, where you typically lack any notification or indication, from the owner of the data, of when the cached data has become invalid. With events you’re notified as soon as the data is changed.
- The contractual coupling is lower. You’re only bound to event contracts, which only contain data. Event contracts are therefore much simpler than classical WSDL service contracts that have both data and functions. Experience shows that Event contracts tend to change less often than normal functionality hungry contracts. Still they have to be designed with forward compatibility in mind, so it’s possible to add new non-mandatory fields without causing existing subscribers to fail.
- The degree of coupling between the Product Catalogue service and the Sales service is much lower.
  - The Sales service only needs to know the event contracts and the message channel address.
  - The Product Catalogue service doesn’t have any coupling to the Sales service. It doesn’t know what the Sales service intends to do with the events it receives – in fact the Product Catalogue service doesn’t even know which services get its events.
- You’ve broken the temporal coupling and technical coupling at the expense of being eventually consistent.
  - This follows nicely from the learnings in Pat Helland’s “Life Beyond Distributed Transactions – An Apostate’s Opinion” (PDF format). In this paper he concludes that you can only be consistent within a single Aggregate instance (i.e. within a transaction and therefore within a service), whereas you have to be eventually consistent between aggregate instances (i.e. between services and also between individual transactions inside a single service), because we have no way of ensuring consistency between them unless we’re ready to pay a very high price and use distributed transactions.
  - In this case eventual consistency means that if the Message channel is unavailable or unable to deliver messages to the Sales service, then we might be writing outdated product names in the Order Confirmation. As soon as the message channel is back up, the Sales service will catch up with the Product Catalogue service. Being eventually consistent is actually the norm when you use caching, whether you use Events or not.
  - We can make the eventual consistency problem smaller by anchoring our events to time. This can be done using the event name and data. In the data you could inform the recipient how long into the future the values are valid and therefore cacheable (e.g. prices might only change once a day, product names rarely change, etc.)
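Two of the points above, forward compatible event contracts and anchoring events to time, can be sketched together in Python. This is only one possible shape under my own assumptions: the valid_until field, the from_message helper and the dictionary based message format are hypothetical and not part of any particular messaging framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProductUpdated:
    product_id: str
    name: str
    # Non-mandatory field added in a later contract version (assumption).
    valid_until: Optional[datetime] = None

    @classmethod
    def from_message(cls, message: dict) -> "ProductUpdated":
        # Read only the fields this subscriber knows about and silently
        # ignore unknown ones, so publishers can add fields without breaking us.
        raw_valid_until = message.get("valid_until")
        return cls(
            product_id=message["product_id"],
            name=message["name"],
            valid_until=datetime.fromisoformat(raw_valid_until) if raw_valid_until else None,
        )

    def is_fresh(self, now: datetime) -> bool:
        # Without a validity hint we treat the data as cacheable until told otherwise.
        return self.valid_until is None or now <= self.valid_until


old_message = {"product_id": "p-1", "name": "Kettle"}
new_message = {
    "product_id": "p-1",
    "name": "Kettle",
    "valid_until": "2015-03-01T00:00:00+00:00",
    "colour": "red",  # unknown to this subscriber, ignored
}

old_event = ProductUpdated.from_message(old_message)
new_event = ProductUpdated.from_message(new_message)
now = datetime(2015, 2, 28, 12, 0, tzinfo=timezone.utc)
```

The subscriber keeps working with both the old and the new message shape, and it can use is_fresh to decide whether its cached copy can still be trusted.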
Sceptics might look at Data duplication over Events and say that it looks like a lot of work for something that could be achieved by existing database technologies. And if that’s all you use events for, then they’re not entirely wrong. Using Data duplication over Events is also not without its complexities, such as monitoring, channel setup, I/O overhead, and added memory and storage footprint for the service(s) that duplicate the data.
Using Data duplication over Events is a well known, technology neutral, pattern for slowly separating monoliths into autonomous services, but it’s not the final solution for event based integration.
We can go even further and reap more benefits. We can use Events to drive Business processes.
Using business events to drive business processes across services
If we elevate events from CUD events to be real business events that reflect the state changes (or facts) in our aggregates, then we can use these events to drive business processes instead of having to use a centralised coordinator (also known as orchestration) to coordinate our business processes (typically) using 2 way communication.
Let’s look at how we could drive the Order fulfilment process using Events. In the Webshop the customer presses the Accept Order button, which results in an AcceptOrder command message being sent (typically asynchronously) to the Sales Service:
The AcceptOrder command results in a state change in the Order, which as a result transitions into the state Accepted.
This state change (or fact) is communicated to all interested services as an OrderAccepted event – we’re stating the fact that the Order has been Accepted, which is irreversible (we can compensate, but not roll back this change).
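In Python the command, the state change and the resulting event could be sketched roughly like this. The "Pending" start state and the handle method are assumptions made for the example; the names AcceptOrder, OrderAccepted and the Accepted state come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AcceptOrder:
    """Command: a request to do something (imperative name)."""
    order_id: str


@dataclass(frozen=True)
class OrderAccepted:
    """Event: a fact that has already happened (past tense name)."""
    order_id: str


@dataclass
class Order:
    order_id: str
    state: str = "Pending"  # assumed initial state

    def handle(self, command: AcceptOrder) -> List[OrderAccepted]:
        """Apply the command and return the events to publish."""
        if self.state == "Accepted":
            return []  # already accepted, nothing new happened
        self.state = "Accepted"
        return [OrderAccepted(self.order_id)]


order = Order("o-1")
events = order.handle(AcceptOrder("o-1"))
events_on_retry = order.handle(AcceptOrder("o-1"))
```

The returned events are what the Sales service would publish on the message channel once the state change has been stored.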
The Sales service doesn’t know who’s interested in the event, but at company level we have rehearsed our Order fulfilment process and agreed which services should react to the OrderAccepted event.
This is known as reactive programming or Event Driven Architecture (EDA) and it’s very different from the classic BPEL inspired Orchestration approach where services are instructed about what to do.
With EDA services themselves determine what to do when an Event occurs. For scenarios where we need to coordinate multiple services, e.g. to make sure we don’t perform any shipping until the customer has been billed and all items are in stock (or whatever the criteria for shipping might be) we will introduce a new Aggregate that will be responsible for the Order fulfilment business process and capability. Whether this process aggregate belongs within the Shipping service, or if it’s a standalone service (as depicted below), is not so important right now. The important thing is that we have identified a central business capability that we explicitly can assign the responsibility to.
Such a process aggregate can be implemented/supported by a Process Manager or a Saga (as it’s called in Rebus and NServiceBus). The process manager can choose to instruct other services on what to do (i.e. partial orchestration) if it needs to, but in general a lot can be solved using events alone (we will in a later blog post get into when to favour other message types, such as Command messages or Documents, over Event messages).
In the example below the Order fulfilment service awaits two events, OrderAccepted and CustomerBilled, before it publishes the OrderReadyForShipping event (in this case we could also have sent a ShipOrder command to the Shipping Service, but let’s stick with events for now).
To coordinate two events, they must contain enough information to reveal that they’re related to the same Order fulfilment process instance. This could e.g. be the OrderID or another form of Correlation ID.
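A very small Python sketch of such a process manager, correlating on the order ID. The class name OrderFulfilmentProcess and the `publish` callback are assumptions for illustration, and a real implementation (e.g. a Saga in Rebus or NServiceBus) would also persist its state, which this in-memory version does not.

```python
class OrderFulfilmentProcess:
    """Waits for both events per order before announcing readiness."""

    REQUIRED = frozenset({"OrderAccepted", "CustomerBilled"})

    def __init__(self, publish):
        self._publish = publish   # callback: publish(event_name, order_id)
        self._seen = {}           # order_id -> set of event names received
        self._completed = set()   # order IDs already announced

    def on_event(self, event_name, order_id):
        if event_name not in self.REQUIRED or order_id in self._completed:
            return
        seen = self._seen.setdefault(order_id, set())
        seen.add(event_name)
        if seen == self.REQUIRED:
            # Both facts are known for this order: publish exactly once.
            self._completed.add(order_id)
            del self._seen[order_id]
            self._publish("OrderReadyForShipping", order_id)


published = []
process = OrderFulfilmentProcess(lambda name, order_id: published.append((name, order_id)))

# Events can arrive in any order and interleaved across orders.
process.on_event("CustomerBilled", "o-1")
process.on_event("OrderAccepted", "o-2")
process.on_event("OrderAccepted", "o-1")
```

Only order o-1 has received both events, so only o-1 is announced as ready for shipping. Order o-2 keeps waiting for its CustomerBilled event.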
This form of coordination between services using events is also known as Choreography, which can be seen as a supplement to the more traditional orchestration (any real life solution will be using a combination of both approaches).
There’s much more to say about Event Driven Architecture and Service boundary definition, but this blog post is already long enough, so that will have to wait until next time.