My "todo" List

I am planning to write blog posts at some point in the future on the following topics:

Spring Framework, Hibernate, ADO.NET Entity Framework, WPF, SharePoint, WCF, What's new in Java EE 6, What's new in Oracle 11g, TFS, FileNet, OnBase, Lombardi BPMS, Microsoft Solutions Framework (MSF), Agile development, RUP ... the list goes on.


I am currently working on the following blog post:

Rational Unified Process (RUP) and Rational Method Composer (RMC)

Thursday, August 29, 2019

Kubernetes (K8s), ELK, Kafka, Fluent Bit and logging data analytics

Introduction
In this article I will provide an overview of how you can send logging information from a Kubernetes (K8s) cluster to a central location for data analysis. Even though the explanation targets a K8s cluster, the same approach applies to non-containerized applications.
Architectural Overview
The above logical diagram represents a high-level overview of the technology stack involved.
A typical log data generation flow is described below:
  • Users access the different applications hosted on the K8s cluster. In the above example I am showing two K8s services, each backed by three replicas. The K8s cluster provides load balancing and all the capabilities needed for High Availability (HA) and auto-scaling. These user interactions generate application-specific logs, for example an invalid login attempt or a runtime application exception.
  • The K8s cluster can be a 3-node (or N-node) cluster for small-scale applications. On each node you can run a K8s DaemonSet that streams log data from the Docker containers hosted on that node to a Kafka cluster or an ElasticSearch (ELK – ElasticSearch, Logstash and Kibana) cluster.
  • The DaemonSet runs Fluent Bit, the lightweight member of the Fluentd family; it is also possible to use Filebeat instead of Fluent Bit if you want to stay within the ELK ecosystem.
  • I am using a Kafka cluster to provide application-specific filtering and routing.
  • The ELK cluster stores all logs and acts as the central repository for log storage.

Component description:
Kubernetes (K8s) Cluster
This is the typical Kubernetes cluster that will host containerized applications.
Web service – 3 replicas / App service – 3 replicas
This represents a typical distributed application where your web tier is hosted as a Service within the K8s cluster and your application tier is hosted in the same cluster.

DB

I have shown the DB component outside the K8s cluster because that allows specialized teams to optimize that layer independently of K8s. Also, when I say DB, it is not necessarily a relational DB; it can be a NoSQL database like MongoDB or a combination of both. Keeping the data layer outside the K8s cluster is not a hard and fast rule either; I am simply describing a typically preferred approach.

DaemonSet (Fluent Bit or Filebeat)

Either a Fluent Bit container or a Filebeat container can be deployed as a DaemonSet on each K8s cluster node. The container can be configured to stream log information from the Docker log location (“/var/lib/docker/containers”).

NOTE: You can also use the same container to stream logs from other locations like “syslog”.

Assuming you are using Fluent Bit, within the K8s cluster you define a ConfigMap that is referenced by the Fluent Bit DaemonSet container; the container then uses the mounted configuration files to stream log contents to either ELK or Kafka, depending on how the Fluent Bit configuration is defined. Below is a high-level overview of how the Fluent Bit configuration is structured (a minimal sketch follows the list):
  1. You start by defining a main configuration file which loads the different sections of the Fluent Bit configuration by referencing other files (via includes).
  2. You will have one file defining the inputs – in this case the location of the Docker container logs on each node; you will use the standard “tail” plugin that comes with Fluent Bit to stream the log contents.
  3. You will have one file defining the filters – in this case the standard “kubernetes” filter that comes with Fluent Bit, which annotates each record with details such as the pod's namespace, node name, container name, etc. This additional metadata can be used later for data analytics via Kibana in the ELK cluster.
  4. You will have one file defining the outputs – in this case one output stream for Kafka using the standard “kafka” output plugin and another for ElasticSearch using the standard “es” plugin.
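To make the four steps above concrete, here is a minimal, hypothetical sketch of those configuration files; the ElasticSearch host, Kafka broker address and topic name are assumptions, and in a real deployment these files would be mounted from the ConfigMap:

    # fluent-bit.conf – main file (step 1), pulls in the other sections via includes
    [SERVICE]
        Flush         5
        Log_Level     info
        Parsers_File  parsers.conf
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-targets.conf

    # input-kubernetes.conf – step 2: tail the Docker container logs on the node
    [INPUT]
        Name           tail
        Tag            kube.*
        Path           /var/lib/docker/containers/*/*.log
        Parser         docker
        Mem_Buf_Limit  5MB

    # filter-kubernetes.conf – step 3: annotate records with K8s metadata
    [FILTER]
        Name   kubernetes
        Match  kube.*

    # output-targets.conf – step 4: fan records out to ElasticSearch and Kafka
    # (the host, broker and topic names below are placeholders)
    [OUTPUT]
        Name   es
        Match  *
        Host   elasticsearch.logging.svc
        Port   9200
    [OUTPUT]
        Name     kafka
        Match    *
        Brokers  kafka.logging.svc:9092
        Topics   k8s-logs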
The diagram below gives an overview of the different configuration elements involved in Fluent Bit.


Node 1 … Node N
This component represents the actual nodes hosting the K8s cluster. For smaller applications, start with a 3-node cluster and scale from there.

Kafka Cluster

The Kafka cluster is used to filter the log records streamed to it so that you can create application-specific tasks (through Kafka consumers) or send alert notifications, SMS messages, etc.
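As a hypothetical illustration of such a consumer (the broker address, topic name and filter condition are all assumptions, not part of the original setup), a small Java program using the kafka-clients library might look like this:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class LogAlertConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.logging.svc:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "log-alerts");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("k8s-logs")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Application-specific filtering: react only to error-level records.
                        if (record.value().contains("\"level\":\"error\"")) {
                            System.out.println("ALERT: " + record.value()); // hook for SMS/notifications
                        }
                    }
                }
            }
        }
    }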

ElasticSearch Cluster

The ElasticSearch cluster is used to store all log records, so you can use Kibana to create visual dashboards and the ELK console client to perform data analysis.
A few implementation details to consider
  • Even though I have mentioned Fluent Bit as the log forwarder in this article, you can certainly use Filebeat as a viable alternative if you want to stay within the ELK ecosystem.
  • You should consider defining buffer limits on your Fluent Bit input plugin to avoid backpressure (see the sketch after this list).
  • By default, Fluent Bit buffers records in memory while data is routed to the different outputs; consider using the filesystem buffering option provided by Fluent Bit to avoid data loss due to system failure.
  • I have used Kafka as the central filtering option for log records. It is possible to filter at the Fluent Bit level instead and remove records before they make their way to the Kafka topic. I prefer the former because that way I have all the log records in the Kafka topic, in case I need to do a detailed analysis later, and it avoids spending CPU/RAM at the container level on filtering. The benefit of filtering at the Fluent Bit level is that less data is transmitted over the network, so it is a trade-off.
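As referenced above, a hypothetical Fluent Bit snippet combining the buffer-limit and filesystem-buffering suggestions might look like this:

    [SERVICE]
        # enable filesystem buffering so queued records survive a restart or crash
        storage.path  /var/log/flb-storage/
        storage.sync  normal

    [INPUT]
        Name           tail
        Path           /var/lib/docker/containers/*/*.log
        # cap in-memory buffering for this input to limit backpressure
        Mem_Buf_Limit  5MB
        # spill records to the filesystem instead of keeping them in memory only
        storage.type   filesystem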
Conclusion
I hope you liked this article.

Thursday, April 25, 2019

Microservice and Distributed System infrastructure with Spring Cloud

Introduction
In this article I will provide an overview of the building blocks for creating a Microservice infrastructure and, more generally, a distributed system infrastructure. I will be using the Spring Cloud project, an umbrella project under which various sub-projects provide the building blocks for creating a distributed system (a Microservice architecture being one such example). If you are looking for a technology that provides all of these components out of the box, then I suggest you go with Kubernetes (K8s), the most widely used open source container platform for Microservices. This article is for folks who want to build these components from the ground up.

Building Blocks

The above diagram shows some of the building blocks that go into a Microservice or distributed system implementation. The individual block descriptions follow.
External Configuration Server
In a distributed system there needs to be a way for configuration details to be centrally available and managed for all services, so that changes can be made in one place. The External Configuration Server component serves that purpose. Within the Spring Cloud ecosystem, Spring Cloud Config Server provides that capability; Consul and etcd are the two other widely used alternatives.
NOTE: If you create a multi-node Kubernetes cluster, it uses “etcd” as its central configuration store (typically running three pods for high availability).
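As a minimal sketch (the port and Git repository URL are placeholders), a Spring Cloud Config Server is just a Spring Boot application with one extra annotation plus a pointer to a backing Git repository:

    // Requires the spring-cloud-config-server dependency on the classpath.
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.config.server.EnableConfigServer;

    @SpringBootApplication
    @EnableConfigServer
    public class ConfigServerApplication {
        public static void main(String[] args) {
            SpringApplication.run(ConfigServerApplication.class, args);
        }
    }

    # application.yml – the Git URI below is a placeholder
    server:
      port: 8888
    spring:
      cloud:
        config:
          server:
            git:
              uri: https://github.com/example/config-repo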

API Gateway

When you expose services via an API (typically a REST API) so that they can be consumed by other services or external systems, you need a gatekeeper to control access to those services. The API Gateway is the component that provides that capability. Within the Spring Cloud ecosystem, Spring provides a framework that allows you to use Netflix Zuul as one such API gateway server. Just recently, Spring Cloud announced another project, Spring Cloud Gateway, that is worth checking out. There are many other alternatives listed in the above diagram, and even that list is not complete. At a minimum, some of the capabilities this component provides include:

  • Dynamic routing
  • Security around access to services
  • Rate limiting
  • API management
  • A proxy for external clients accessing the internal services

Service Discovery
With services built around Domain-Driven Design (DDD) or business capabilities, these services need to be able to register and discover each other dynamically. The Service Discovery component provides that capability. Within the Spring Cloud ecosystem, the Netflix Eureka server fills this role; Apache Zookeeper and Consul are popular alternatives.
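A minimal sketch of a Eureka server, assuming the spring-cloud-starter-netflix-eureka-server dependency is on the classpath:

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

    @SpringBootApplication
    @EnableEurekaServer
    public class EurekaServerApplication {
        public static void main(String[] args) {
            SpringApplication.run(EurekaServerApplication.class, args);
        }
    }

Each service then registers itself by adding the Eureka client starter and pointing eureka.client.service-url.defaultZone at this server.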

Logging & Monitoring Tools

With any distributed architecture, logging, monitoring and tracing capabilities are essential. Multiple projects/libraries provide such functionality, a few of which are listed below.


Sleuth
Sleuth provides capabilities to capture identifiable logs, which assist in distributed tracing. Spring Cloud provides integration libraries for Sleuth.

Zipkin

Zipkin provides visual tracing of calls made across multiple services. It relies on Sleuth for the identifiable logging upon which it builds a visual representation via a dashboard. Spring Cloud provides libraries that integrate directly with Zipkin.
With Sleuth/Zipkin you can identify the services in your workflow path that take longer to execute, and you can pinpoint the root cause visually very quickly.


Hystrix
In a distributed system, inter-service communication can be complex, and one slow or non-functioning service can bring the entire service orchestration or choreography to a grinding halt, so it is essential to have a fail-safe mechanism in place. Circuit breakers provide that fail-safe capability. Netflix Hystrix is a library that allows a fallback service to be called if the primary service is not functioning within expected response times. Spring Cloud provides libraries that integrate with Hystrix and also provides a dashboard that gives you visual insight into a service's health. Hystrix relies on Spring Boot Actuator, a capability provided by Spring to monitor and manage any Spring-based Java application; Actuator exposes endpoints via JMX and/or HTTP, and Hystrix relies on those exposed endpoints for its functionality.
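A sketch of a Hystrix-protected call with a fallback; it assumes the spring-cloud-starter-netflix-hystrix dependency, @EnableCircuitBreaker on the main application class, and a @LoadBalanced RestTemplate bean, and the service id and endpoint are hypothetical:

    import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
    import org.springframework.stereotype.Service;
    import org.springframework.web.client.RestTemplate;

    @Service
    public class CatalogClient {

        private final RestTemplate restTemplate;

        public CatalogClient(RestTemplate restTemplate) { // a @LoadBalanced bean is assumed
            this.restTemplate = restTemplate;
        }

        // If the call fails or exceeds the configured timeout, Hystrix trips the
        // circuit and serves the fallback instead of cascading the failure.
        @HystrixCommand(fallbackMethod = "defaultCatalog")
        public String fetchCatalog() {
            return restTemplate.getForObject("http://catalog-service/api/items", String.class);
        }

        public String defaultCatalog() {
            return "[]"; // degraded but safe response
        }
    }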


Turbine
Hystrix provides metrics on a per-application basis. We need something that aggregates metrics across applications and provides a visual dashboard for them; Turbine is the library that provides that aggregation capability.

Logical Architecture:


The above diagram details the various components that can be utilized through Spring Cloud to provide a distributed system infrastructure suitable for a Microservices architecture implementation. All of the individual components shown above are typically implemented as Spring Boot applications, with Maven controlling which of the component-specific dependencies provided by Spring Cloud are used.

In the following sections I will elaborate on some of these components to explain how they interact with each other in the overall distributed architecture.

Zuul Proxy
A typical user flow starts with the user accessing a website URL via the Zuul proxy. The Zuul proxy server registers itself as a Eureka client with the Eureka server so that it can discover all the application-specific services it needs to interact with on behalf of the user. Zuul allows you to control which request headers are passed on to the services invoked for the client; typically you will pass JWT tokens for single-page applications via HTTP headers so that the services can authorize the client's requests. The Zuul proxy runs like any other Spring Boot application and can therefore use the Spring Security framework to authenticate the calling user or application. In the diagram I show a Redis cluster being used to externalize session management; Spring Session, another Spring sub-project, allows sessions to be externalized into a Redis cluster, which is what you will need in a distributed system versus traditional application-server-specific session management. If you are using Angular/React or any other Single Page Application (SPA) framework, you can use JWT tokens for authorization. Hystrix (circuit breaker pattern) support is built into Zuul and can be configured to provide fail-safe service implementations.
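A minimal gateway sketch, assuming the spring-cloud-starter-netflix-zuul dependency; the route path and service id below are hypothetical:

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

    @SpringBootApplication
    @EnableZuulProxy
    public class GatewayApplication {
        public static void main(String[] args) {
            SpringApplication.run(GatewayApplication.class, args);
        }
    }

    # application.yml
    zuul:
      # let the Authorization header (JWT) through to downstream services
      # while keeping session cookies from leaking past the gateway
      sensitiveHeaders: Cookie,Set-Cookie
      routes:
        cart:
          path: /cart/**
          serviceId: shopping-cart-service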


Spring Cloud Config Server and Eureka Server (Discovery)
All components that need to externalize their properties to a central place register with the Spring Cloud Config Server. The Eureka server, via the bootstrap process, also registers as a client with the Spring Cloud Config Server, and the Config Server in turn registers itself as a Eureka client with the Eureka server; this allows other services to discover the Config Server and use it to store and retrieve their service-specific configuration details. Basically, the Eureka server and the Config Server act as clients of each other, allowing dynamic discovery to occur.
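A hypothetical bootstrap.yml for a service using this "discovery first" setup (the service ids and Eureka URL are assumptions):

    # bootstrap.yml
    spring:
      application:
        name: shopping-cart-service
      cloud:
        config:
          discovery:
            enabled: true            # look the Config Server up in Eureka
            service-id: config-server
    eureka:
      client:
        service-url:
          defaultZone: http://localhost:8761/eureka/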

Redis Cluster

Spring Session provides capabilities to externalize session management to a Redis cluster, which is needed in a distributed system so that session state is not tied to a specific application server instance. You can also use the Redis cluster as a cache server for the Spring services that encapsulate your business functionality. Also, if you plan to use an RDBMS (specifically MySQL) as one of your data layers, you can consider using Redis as a second-level cache to improve the performance of your database calls.

NOTE: If you are building a Single Page Application, you can and should use JWT tokens to invoke services via the API gateway in a stateless manner, versus the traditional way of managing state via session cookies in a typical server-side web application.
Services

Depending on your design approach, your services can be built around business capabilities or DDD. In the diagram I show a sample application that divides services based on business capabilities. Each of these services is in turn a simple Spring Boot application that registers itself as a discoverable service with the Eureka server. It is through the Eureka server that the services discover each other, as does the API Gateway (Zuul). This decoupling allows each service to be built and deployed independently of the others, and lets us scale them horizontally for high availability and increased workload. For example, to run multiple instances of the Shopping Cart service, all you need to do is run multiple instances of the Spring Boot application and have each instance use the same service ID, which the Eureka server uses to identify those instances to any clients (discovery clients) that need to consume them. The client then load-balances requests between those service instances, which are typically stateless because any state they have is saved either in the Redis cluster (for sessions/cache) or in the data layer (relational like MySQL, or NoSQL like MongoDB).

Two project libraries worth mentioning in the context of services are:

Feign

Feign provides a declarative-style REST client that allows services and clients to invoke REST APIs quickly, with an annotation-based approach and very little boilerplate code. You should use this client library everywhere you consume a REST API exposed by a service. The other benefit of Feign is that you can refer directly to the service names registered with the Eureka server, because Feign integrates with Eureka and can discover services automatically by service name alone.
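A sketch of a declarative Feign client; it assumes @EnableFeignClients on the application class, and the service id and endpoint path are hypothetical:

    import org.springframework.cloud.openfeign.FeignClient;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;

    @FeignClient(name = "shopping-cart-service") // Eureka service id, no URL needed
    public interface ShoppingCartClient {

        // Calling this method issues an HTTP GET against a discovered instance.
        @GetMapping("/api/carts/{id}")
        String getCart(@PathVariable("id") Long id);
    }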


Ribbon
Ribbon provides client-side load balancing with configurable rules like round robin, availability filtering and weighted response time. It also works with Feign to let service consumers access services that are load-balanced on the client side, using one of the predefined rules or a custom rule if the predefined ones do not fit your needs. The Zuul proxy also uses Ribbon to invoke services.
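As a sketch, swapping the default round-robin rule for a weighted-response-time rule for one service can be done in configuration alone (the service id is hypothetical):

    # application.yml – per-service Ribbon rule override
    shopping-cart-service:
      ribbon:
        NFLoadBalancerRuleClassName: com.netflix.loadbalancer.WeightedResponseTimeRule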


Spring Security

As mentioned before, each of these components is a simple Spring Boot application that can easily take advantage of the Spring Security framework to perform authentication and authorization of users.
Spring Security provides authentication providers for custom databases, LDAP, SAML 2.0 (for SSO), OAuth2, CAS and a few more, and if none meets your needs you can always extend the security framework to provide your own authentication/authorization layer.


Data Layer
The choices here are plentiful, so I am not going to elaborate on this component; your business needs will determine whether you need a relational database, a NoSQL database like MongoDB, or an in-memory data grid like a Redis cluster. One thing worth mentioning: if you plan to use a Microservice architecture and do not want your data layer to become a single monolithic point of communication, you can consider a Database-per-Service approach, which allows you to build individual services in a “silo” manner. You then need to consider how to achieve eventual consistency after that architectural decision. Some of the things to consider are mentioned below:
  • Event sourcing – you can use Apache Kafka as an event source; there are alternatives like using an Event Store database. Conceptually, you store events in your database (either Kafka or EventStore) along with the payload of the entity whose state changes as a result of each event (or action). Event sourcing works by storing these events and arriving at the final state of your entity by replaying them from the start (basically a replay – as an analogy, think of a cartoon character drawn in different static positions appearing to run as you flip through the pages; this analogy brings back childhood memories for me). You may need techniques like snapshots to improve performance, so that over time you do not replay events from the very beginning but start from a snapshot of the data at a more recent point in time. A minimal replay sketch follows this list.
  • Saga patterns to achieve eventual consistency – two saga patterns, both built on an Event-Driven Architecture, need to be considered: the choreography-based saga and the orchestration-based saga. I am not elaborating on these, as they are well-defined patterns in an Event-Driven Architecture.
  • Be prepared to design systems for eventual consistency
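As promised in the event sourcing bullet above, here is a minimal replay sketch (the entity and event names are invented for illustration):

    import java.util.Arrays;
    import java.util.List;

    public class CartAggregate {

        private int itemCount;

        // Each stored event mutates the aggregate's state.
        public void apply(String event) {
            if ("ITEM_ADDED".equals(event)) {
                itemCount++;
            } else if ("ITEM_REMOVED".equals(event)) {
                itemCount--;
            }
        }

        // Replaying the full event stream (or a snapshot plus the events after it)
        // reconstructs the entity's current state.
        public static CartAggregate replay(List<String> events) {
            CartAggregate cart = new CartAggregate();
            events.forEach(cart::apply);
            return cart;
        }

        public static void main(String[] args) {
            CartAggregate cart = replay(Arrays.asList("ITEM_ADDED", "ITEM_ADDED", "ITEM_REMOVED"));
            System.out.println(cart.itemCount); // prints 1
        }
    }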

In a nutshell, choosing Database-per-Service will make your implementation of a distributed system a little more complex, given that you have to design for eventual consistency. If your business needs do not require a Database-per-Service approach, I recommend using one database to meet your ACID requirements.

Conclusion

I hope you liked this article. As always, send me your constructive feedback.

Wednesday, April 24, 2019

Shibboleth IdP/SP, Spring Boot with Redis and MySQL – SSO flow

Introduction
In this article I will provide an overview of the following technology stack:
  • Single Sign-On (SSO) identity/access management using Shibboleth IdP, the CAS server and the Shibboleth Service Provider (SP)
  • Using Spring Boot for Rapid Application Development
  • Redis to cache Spring data, as well as its use as a second-level cache for the ORM (Hibernate/JPA)
  • I will also provide an overview of JHipster – a Swiss army knife for code generation
There is a lot to cover in the above technology stack. The intent of this article is to provide pointers for creating an enterprise-grade, scalable system and to assist software architects in making decisions that will help them scale their systems using the mentioned technology stack. I know this is heavily tilted towards a Java audience, but a similar approach can be used for .NET with the Entity Framework.
Architectural Overview
The above logical diagram represents a high-level overview of the technology stack involved.

A typical user activity flow is described below:
  • The user starts the flow by accessing the protected resource (the web application) via the browser.
  • The Service Provider (SP) intercepts that request and redirects the user to the Identity Provider (IdP).
  • After successful authentication and exchange of the SAML token, the Service Provider redirects the user's request to the web application and sets authorization headers that the web application can read to determine what the user is authorized to do. These authorization headers are not sent back to the user's browser for security reasons. Typically you will have an Apache server hosting your Service Provider and acting as a reverse proxy for your web application, which is hosted on Tomcat.
  • Depending on what the user is authorized to do, different user interface functions are enabled for subsequent actions.
In this flow, I am assuming the authorization piece (roles, access policies, etc.) is application-dependent and, in the above diagram, is stored in the RDBMS (MySQL) database. It is recommended to externalize this part for all applications into a central access management system (similar to centralizing identity authentication in the IdP), thereby standardizing access management across all applications.


Component description:
LDAP server

In the overall Single Sign-On (SSO) flow, you want to ensure your user passwords are centrally managed; an LDAP server represents one such option.
Identity Provider (IdP) – like Shibboleth IdP or CAS Server
In the SSO flow, this component abstracts the authentication piece away from multiple applications and provides them a standard authentication interface. You can use either Shibboleth IdP, which uses the SAML protocol for authentication, or the CAS server if you are using the CAS protocol. (NOTE: the CAS server also supports SAML.)
Service Provider (SP) – like Shibboleth Service Provider
In an SSO flow, you can protect your web application behind a Service Provider, which coordinates the SAML SSO flow with your IdP and passes the authentication details via HTTP headers to your application. Shibboleth SP can be used as a service provider to perform such a function. If you use the Docker image for this component, it runs an Apache server and can be configured to act as a reverse proxy for your web application. Alternatively, you can co-locate the SP with your web application; if you are using Spring, you can use Spring Security SAML, which provides SAML support within your application and can act as a Service Provider for your Identity Provider. I still prefer a separate Service Provider configured as an Apache reverse proxy, as that abstracts this cross-cutting responsibility away from individual applications and also provides an extra layer of security through the proxy function. A hypothetical Apache snippet for that setup follows.
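The sketch below shows the Shibboleth SP module (mod_shib) protecting a path and Apache proxying it to Tomcat; the hostname and paths are assumptions:

    # Protect the application path with the Shibboleth SP module
    <Location /app>
        AuthType shibboleth
        ShibRequestSetting requireSession 1
        Require valid-user
    </Location>

    # Reverse proxy the protected path to the Tomcat-hosted web application
    ProxyPass        /app http://tomcat-host:8080/app
    ProxyPassReverse /app http://tomcat-host:8080/app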
Web application
This is the protected resource (the web application) that the users access in the SSO flow. In the following section I will provide further details on this component.
Redis Cluster (or AWS Elasticache or Azure Redis cache)
You can use Redis as a distributed data grid to reduce the read load on your relational database. The Redis cluster can be built using virtual machines in your local data center, or you can use cloud providers like AWS/Azure that offer a Redis cluster as a service. You can use client drivers (like Redisson) to communicate with your Redis cluster. Redisson also provides capabilities to integrate with Hibernate, providing second-level caching via Redis for JPA queries, thereby reducing database queries and speeding up your application. Redisson comes in two flavors – a professional edition and an open source edition. I recommend going with the professional edition, as it gives you features like local caching, data partitioning and XA transactions, to name a few. These features will come in handy for enterprise-level scalability (and the cost is not that bad).
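A minimal Redisson sketch – connecting to a Redis cluster and enabling it as Hibernate's second-level cache; the node addresses are placeholders, and the region factory class comes from the redisson-hibernate integration module:

    import org.redisson.Redisson;
    import org.redisson.api.RedissonClient;
    import org.redisson.config.Config;

    public class RedisClusterClient {
        public static void main(String[] args) {
            Config config = new Config();
            config.useClusterServers()
                  .addNodeAddress("redis://10.0.0.1:6379", "redis://10.0.0.2:6379"); // placeholders
            RedissonClient client = Redisson.create(config);
            client.getBucket("greeting").set("hello"); // simple smoke test
            client.shutdown();
        }
    }

    # hibernate.properties – enable Redisson as the second-level cache
    hibernate.cache.use_second_level_cache=true
    hibernate.cache.region.factory_class=org.redisson.hibernate.RedissonRegionFactory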

RDBMS (like MySQL)

In the diagram, this component is where your data is stored. If you are using MySQL, you can create read replicas, in addition to your Redis cluster, to further improve read performance. You can also use a NoSQL database like MongoDB instead of an RDBMS like MySQL to store the data.

A few implementation details for the components to consider

There are Docker images available on Docker Hub for:
  • Shibboleth IdP
  • Shibboleth SP
  • CAS Server
  • Tomcat
  • OpenLDAP (if you want to use this as your LDAP server)
  • Redis
  • MySQL
You can use these images to containerize your application and deploy it in container-aware environments like Kubernetes (K8s), EKS, AKS, GKE or ECS. Containerizing your components will help you design and build a scalable, highly available and resilient system.

Web Application Architecture:


The above diagram provides a further decomposition of the web application component.
Spring MVC/Spring Boot

On the left-hand side of the dotted line you will see the decomposition of a typical Spring MVC application; the right-hand side shows the same using Spring Boot. If you are heading towards a Microservice-style architecture and wish to use the Java platform, then Spring Boot is the way to go: it allows you to build applications rapidly. Even if you are building monolithic applications, I suggest you start with Spring Boot, since under the hood it uses the same concepts Spring MVC does – Dependency Injection (DI) and Inversion of Control (IoC) – and it eliminates a lot of boilerplate code, which encourages rapid application development. In my opinion, Spring Boot is the Java world's answer to competing technologies like NodeJS, Ruby on Rails and Django (Python) for rapid application development.

For the rest of the topics in this article I am going to assume the use of Spring Boot.
Redis Cluster (or AWS Elasticache or Azure Redis cache)
This component is used to provide caching at two levels:
  • At the application level (Spring), and
  • As a second-level cache for Hibernate
NOTE: Hibernate provides first-level caching at the session level by default.

As mentioned before, if you are using the Redisson client (open source or professional), it supports connecting to a Redis cluster in your local data center as well as to AWS/Azure-hosted Redis clusters.
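At the Spring level, the annotation-based caching looks the same regardless of provider; in this sketch the entity and cache names are invented, and @EnableCaching on the application class is assumed:

    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class ProductService {

        // The first call for a given id hits the database; subsequent calls are
        // served from the cache, which is backed by Redis when a Redis-based
        // CacheManager (e.g. via Spring Data Redis) is configured.
        @Cacheable(value = "products", key = "#id")
        public String findProductName(Long id) {
            return "product-" + id; // stand-in for a repository/database lookup
        }
    }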

View

You have multiple options to pick from:
  • If you are looking for server-side rendering of the user interface (UI), you can pick JSP, Thymeleaf or any other templating engine.
  • If you are building a Single Page Application (SPA), you can select Angular, React or Vue (there are many more choices in the SPA world, but these three cover 80%+ of the SPA market interest).
In the next section I am going to talk about a code generation tool called JHipster that allows you to build production-grade code very quickly.
JHipster:
The above diagram provides a high-level overview of how you can integrate SSO with an SPA application. I thought of including this section so I could talk about one of my favorite code generation tools – JHipster. I call it a Swiss army knife for code generation. In the above diagram, the section inside the dotted line is generated automatically for you by JHipster.

The user interaction flow described earlier is still valid – the user starts the flow by trying to access the protected resource (the SPA application in this case); the only delta is as follows:

Once the Service Provider redirects the user internally (via the reverse proxy) to the location where the SPA application is hosted, the rest of the interaction from that point forward – the authorization between the SPA application and the server-side component (typically a REST API exposed via Spring Boot) – is done by exchanging JSON Web Tokens (JWT).

To generate code with JHipster, you typically start by creating an entity or domain model using JHipster's online graphical editor. You can then save the model to a file that contains a textual representation of the graphical model in what is called the JHipster Domain Language (JDL); a small JDL sketch follows the list below. You feed that JDL file to JHipster's code generation engine and, voilà, you have a working application with basic CRUD operations ready for all your domain entities. JHipster has a lot of options to generate code for different technology stacks, but at a high level you get to decide whether you want to create:
  • A Monolithic application, or
  • A Microservice application
When you select a monolithic application, JHipster auto-generates code spread across the following layers, using the technology stack you select during the initialization step. Some of the layer choices are mentioned below:
  • It creates an SPA application using either Angular or React on the client side.
  • On the server side, it creates a Spring Boot REST-API-driven server layer.
  • Through the Hibernate/JPA layer, it also gives you the option to talk to an RDBMS (MySQL, Oracle, SQL Server, etc.).
  • It allows you to pick a caching option for Hibernate and Spring Boot – unfortunately not Redis at the time of writing this article. However, Spring Boot and Hibernate provide annotation-based caching that is independent of which cache provider you use, so the work to switch to Redis is restricted to touching one or two files at most; the rest of the Spring Boot/Hibernate code is unaware of the underlying caching provider. That said, JHipster does generate code automatically for Ehcache, Hazelcast, Infinispan and Memcached, and with the JCache API I have a feeling that JHipster's code generation will cover most of the other caching providers, including Redis, in the near future.
  • It integrates the client-side calls with the server side using the JWT token authentication flow built into Spring Security. If you want to authenticate users using the Identity Provider approach mentioned earlier, all you have to do is use the pre-authentication flow that Spring Security provides out of the box, thereby delegating the authentication piece to your IdP; you can then use JWT tokens for subsequent authorization. NOTE: It is also possible to use OpenID Connect for SSO with a Single Page Application as an alternative to an IdP.
A few of JHipster's cons, at the time of writing, that you should be aware of:
  • JHipster does not auto-generate code for versioning or optimistic locking for Hibernate entities.
  • JHipster requires JDK 1.8 for code generation. You can run the generated code on Java 11, but you will receive warnings that some of the class libraries (especially Hibernate) are using native calls, which is discouraged. I do not think this is necessarily going to be a big issue once JHipster catches up, in its next release, with a version of Hibernate that does not use these Java native calls. Just so you know, your code will still work with Java 11; you will simply get some nasty warnings in your application server logs that, for the time being, you can ignore.
  • Given the choice of technology stack, and more specifically the component versions JHipster uses to generate code, you need to be aware of being locked into a specific version of a technology. For example, if JHipster's code generation uses the most recent version of Angular today, six months down the road that version may become obsolete, and you will have to upgrade manually or wait for JHipster to catch up.
  • It only generates code for CRUD operations and does not generate code for business-specific workflows, which you still need to design and code with your developers; but in my opinion it generates a very good code base for you to build from.
Conclusion
I hope you liked this article. Next, I plan to write an article on Event-Driven Architecture, as well as one on Microservices – when to use them and, most importantly, when not to use them. As always, send me your constructive feedback.