SOA/Microservices: synchronous communication, data ownership and coupling

Danish version: http://qed.dk/jeppe-cramon/2014/02/01/soa-synkron-kommunikation-data-ejerskab-og-kobling/

I read the other day that Proask, the new system for the National Board of Industrial Injuries in Denmark, was the first major project to realize the Ministry of Employment's strategic decision to use a Service Oriented Architecture (SOA). For those who have not heard of Proask, it is yet another heavily delayed public project which, like most other public projects, tries to solve a very big problem in one large chunk. A lot can be written about this approach, but what I will focus on in this blog post is their approach to SOA. A related article reports that the new Proask system is 5 times slower than the old system from 1991.

The Proask project was initiated in 2008. It made me think back on another (private) SOA prestige project from the same period, on which I was an architect for a subcontractor. The entire project was built around SOA, with many subsystems delivering services. The architecture was centered on an ESB that acted as facilitator for mapping and coordination. All communication was done as synchronous WebService calls over HTTP(S). So classic SOA for the period 2003-201? (sadly synchronous calls are still the predominant integration form today). This SOA realization was also characterized by very poor performance, high latency and low stability.

But why? The reason lies in the way they had chosen to split up the services, and not least in their choice to communicate synchronously between the different services using a layered SOA approach.

So what’s wrong with synchronous integration?

As such, nothing. It depends on how and when it is used. If we integrate with an older application, there is often nothing to do but use the synchronous API that is made available (if a public API exists at all). This kind of integration is already happening on the trailing edge, and the API, data ownership and functionality of the service/application we are integrating with are, so to speak, carved in stone and hard or impossible to change. It is also important to realize that such an integration falls under the category “integration by a bunch of Web services” or at best “Enterprise Application Integration (EAI)” (over WebServices/REST/etc.). Synchronous integration between services IMO has nothing to do with SOA, since it breaks several of the basic SOA principles.

SOA Principles?

Don Box (the man behind SOAP) suggested the 4 Tenets of service orientation:

  1. Boundaries are explicit
  2. Services are autonomous
  3. Services share schema and contract, not class
  4. Service compatibility is based on policy

Principles 1 and 2 are easily overlooked; the focus has mostly been on Principles 3 and 4.
For me, Principles 1 and 2 are very important, because they are the ones that guide us when we design services and assign them responsibility. They also help answer why synchronous calls between services should be avoided.

First SOA Principle: Boundaries are explicit

That the boundary of a service is explicit can mean different things to different people.
What I emphasize is: a service's responsibilities with respect to both data and functionality are clear and coherent.
By coherent I mean communicational cohesion (the logic and the data it works on are both part of the same service), layered cohesion (a service is responsible for its own data persistence, business-related security*, user interface, etc.) and temporal cohesion (the parts/aspects involved in handling the service's functionality are grouped so that they can execute close to each other in time). Without communicational and layered cohesion it is impossible to achieve temporal cohesion, which is a prerequisite for the Second SOA Principle, “Services are autonomous”.

* Business-related security – for example, is this user allowed to perform this action, like cancelling this purchase, approving this transfer, etc.

Most of us have grown up with monolithic applications driven by databases. What I have observed is that this type of application tends to end up with a large/thick data model where everything ends up being connected to everything else (because it is easy and convenient to create a join table, a union in a query, etc.).
It usually starts out simple but easily ends up as a big ball of mud:

Slowly our data models grow in size until they finally get confusing and messy

One of the challenges of large data/domain models is that the same entities/tables end up being used in many different contexts (use cases).
Knowledge of which associations/columns/properties you can or may use lies hidden in the code and ends up as implicit knowledge. Examples of this are the small rules people are told to remember: “Do not join these tables unless the customer is in arrears” or “the property only has a valid value if xxxx is true.”
Another problem is that, in the name of normalization or reuse, we prefer to reuse the same entities/tables to represent two (or more) entities in our domain that happen to share the same name, without questioning whether they in fact are the same concept (or a different perspective on the same concept) or whether they are different concepts with clashing names. Within each domain, for example online retail, there are many sub-domains that have their own needs and specialties. One of the reasons we end up with large domain models is that we do not break up our models per sub-domain.
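Such implicit rules can be made explicit in code instead of living only in developers' heads. A minimal sketch (all names are hypothetical, standing in for the kind of rules quoted above):

```python
from dataclasses import dataclass

# A shared model where the rule "discount_rate only has a valid value
# when the customer is not in arrears" is implicit - nothing in the
# type stops a caller from reading a meaningless discount_rate.
@dataclass
class Customer:
    name: str
    in_arrears: bool
    discount_rate: float

# Making the rule explicit: the invalid combination is unrepresentable,
# because a customer in arrears simply has no discount_rate field.
@dataclass
class CustomerInGoodStanding:
    name: str
    discount_rate: float

@dataclass
class CustomerInArrears:
    name: str

def classify(customer: Customer):
    """Translate the shared, implicit model into an explicit one."""
    if customer.in_arrears:
        return CustomerInArrears(customer.name)
    return CustomerInGoodStanding(customer.name, customer.discount_rate)
```

Once the rule is a type rather than a convention, the compiler/runtime and the code reviewer can see it, instead of having to remember it.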

Sub-domains within the retail domain

Just because some of our use cases all involve an entity called “Product” does not mean that it is the same concept, or that the use cases attach the same meaning to the word “Product”. Unfortunately most of us have only learned to look for nouns (entities) and verbs (functions) when we analyze use cases, stories, etc. There is no good guidance on how to figure out whether the nouns/entities we find really cover the same concept, or the same perspective on a concept. It is risky to automatically elevate common entities to sub-domains/services. Unfortunately this is what happens in most organizations – we get a Customer service or a Product service. This does not necessarily have to be bad, if the definition of the Customer is coherent. Where it becomes problematic is when we take data/perspectives on e.g. a Customer from other sub-domains (e.g. Billing, Shipping, etc.) and mix them into the Customer sub-domain/service, with the result that the Customer sub-domain grows in complexity and quietly loses data cohesion, because it is trying to be something for everyone (it is hard to satisfy everyone's expectations without disappointing some of them or succumbing to the pressure). In these cases our service boundaries get blurred.
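To make the point concrete, here is a minimal sketch (the fields are hypothetical) of what it looks like when each sub-domain owns its own definition of “Product” instead of sharing one fat entity:

```python
from dataclasses import dataclass

# Each sub-domain owns its own "Product" model; the only thing the
# contexts share is the identifier (the SKU).

@dataclass
class SalesProduct:
    """What "Product" means to Sales: something with a price."""
    sku: str
    name: str
    list_price: float

@dataclass
class InventoryProduct:
    """What "Product" means to Inventory: something that sits on a shelf."""
    sku: str
    quantity_on_hand: int
    warehouse: str

@dataclass
class ShippingProduct:
    """What "Product" means to Shipping: something with weight."""
    sku: str
    weight_kg: float

# The same physical product, seen from three perspectives:
sales = SalesProduct("SKU-42", "Coffee mug", 9.95)
inventory = InventoryProduct("SKU-42", 120, "Warehouse A")
shipping = ShippingProduct("SKU-42", 0.35)
```

None of these models needs to change when one of the others does – which is exactly the coupling we are trying to avoid.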

When we get islands of data/functionality in this way, it becomes necessary for our service to call other services to get the information it needs to perform its task. How can we obtain this information without using some form of two-way communication (e.g. synchronous WebService/REST calls over HTTP) between our service and the other services it relies upon?

Data service islands, synchronous communication and coupling

Conclusion: If our services have well-defined boundaries (i.e. the data and the logic we need are present within the service) and through our implementation we have achieved communicational and layered cohesion, we have laid the foundation for the next step: achieving temporal cohesion and thus temporal decoupling of our services. The higher our cohesion, the less we need to rely on other services for our own service to perform its work. As soon as services need to talk to each other, they begin to know too much about each other, which increases coupling (both the temporal coupling and the data-structure/functional coupling, which reduces communicational cohesion).

In a future blog post I will look into how we can analyze a domain and identify sub-domains and services.

Second SOA Principle: Services are autonomous

Autonomy means that our service is independent and self-contained, and as far as possible does not directly depend on other services in order to function.

97% of the SOA solutions I’ve seen mainly use synchronous service calls as their form of integration. In some contexts it is impossible to avoid synchronous integration, but in most contexts it is certainly possible to avoid them if we take care to not violate the first SOA Principle “Boundaries are explicit” as described above.

Why is synchronous integration so problematic?

The temporal and contractual coupling is obvious (we depend on other services being available for our service to work, and we depend on them not changing their contracts in ways that force us to migrate). But there are many other challenges that make synchronous integration even worse. Let's base the discussion on the sequence diagram below and identify some of the problems that arise when integrating using synchronous calls (no matter whether it's WebServices or REST):

Synchronous integration example

Here we have a process service, “Create Sales Order Service”, which in sequence (it could perhaps be done in parallel) calls the two task services “Create Order Service” and “Create Invoice Service”. These task services each call two of the Entity/Data Services “Order Service”, “Customer Service” and “Invoice Service”.

Problems identified with the example above:

  • Latency: from the time “Create Sales Order Service” is called until both task services have called their two entity services and can return an answer.
    • Sometimes you can parallelize some of these calls; other times there is a forced sequence due to inter-service dependencies (e.g. you cannot use the Customer in the second task service before the Customer has been created by the first task service).
    • It is not uncommon to see examples where more than 10 service calls are needed to complete one process service.
  • If just a single one of the underlying services is unavailable, the ENTIRE process service (and every other service that uses the same underlying service, or our service) is unavailable. So if the Invoice service is down, we cannot accept new orders.
    • Most processes, such as the one in our example, can benefit from postponing some steps to a later stage/time. E.g. for most online retail systems it is better to be able to receive an Order even though the Invoice service is down. If the invoice arrives minutes, hours or days later, that is rarely a major problem.
  • If a single one of the service calls above takes a long time, the entire process service takes a long time (the weakest link in the chain).
  • If one or more of the service calls above update data and we experience faults/exceptions or other errors (e.g. IO errors), we are faced with an inconsistent system, which requires complex compensation logic. Below I have given an example of the challenges you will encounter in terms of compensation. The example is by no means complete. The real solution is much more complex and must take into account that System B can shut down/crash during the compensation process – this is also known as resume functionality:
    Transactional compensation with synchronous integration


Synchronous and chatty services like the ones above are often the result of violating the First SOA Principle, “Boundaries are explicit”.
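To illustrate the compensation problem, here is a deliberately simplified sketch (the service functions are hypothetical). Note what it does not handle: if the process crashes in the middle of the compensation itself, the system is still left inconsistent, which is exactly why real solutions need resume functionality:

```python
class InvoiceServiceDown(Exception):
    """Raised when the synchronous call to the Invoice service fails."""

def create_order(orders, order_id):
    orders[order_id] = "CREATED"

def cancel_order(orders, order_id):
    # The compensating action for create_order.
    orders[order_id] = "CANCELLED"

def create_invoice(order_id, service_up):
    if not service_up:
        raise InvoiceServiceDown()
    return f"invoice-for-{order_id}"

def create_sales_order(orders, order_id, invoice_service_up=True):
    """Process service: two synchronous calls, with manual compensation."""
    create_order(orders, order_id)
    try:
        return create_invoice(order_id, invoice_service_up)
    except InvoiceServiceDown:
        # Compensate the first step so the system is not left half-done.
        # If we crash right here, the order stays CREATED with no
        # invoice - hence the need for resume functionality.
        cancel_order(orders, order_id)
        return None
```

Every additional synchronous, state-changing call multiplies the number of failure combinations the compensation logic must cover.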

Conclusion: When our services develop data/functionality envy and thereby need to talk to other services, our services begin to know too much about each other, which increases coupling and latency and reduces stability, because we depend on other services' availability and contract stability. All this reduces our autonomy and effectively creates a house of cards that can easily be knocked over. In order to achieve a higher level of service autonomy, we need to avoid integrating using synchronous service calls (and no, it does not help to perform the calls through an async API – the temporal coupling between the services remains the same: our service cannot continue before the other service has responded). Two-way communication in the form of request/reply or request/response is not the way forward; we need to look in another direction to find a solution to our decoupling and autonomy needs.
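The latency and availability penalties compound mechanically: end-to-end latency is the sum of the individual calls, and combined availability is the product of the individual availabilities. A back-of-the-envelope calculation with made-up numbers:

```python
from functools import reduce

def chain_availability(availabilities):
    """Combined availability of services that must ALL answer."""
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)

def chain_latency_ms(latencies_ms):
    """Sequential synchronous calls: latencies simply add up."""
    return sum(latencies_ms)

# Six services at 99.5% availability each (the process service, two task
# services and three entity services from the example above):
combined = chain_availability([0.995] * 6)
print(f"availability: {combined:.4f}")  # availability: 0.9704
print(f"latency: {chain_latency_ms([50, 40, 60, 45, 55, 30])} ms")  # latency: 280 ms
```

Six quite reliable services already push the chain below 97.1% availability – roughly ten days of downtime per year, dictated by the weakest links.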

This will be the subject of the next blog post. Until then, I’m really interested in hearing your opinions and ideas 🙂