Towards robustness and self-organization of ESB-based solutions using service life-cycle management

Enterprise Service Bus (ESB) is a middleware infrastructure that provides a way to integrate loosely-coupled heterogeneous software applications based on the services principles. The life-cycle management of services in such environments is a critical issue for component reuse, maintenance and operation. This paper introduces a service life-cycle management module that extends the traditional functionalities with advanced monitoring and data analytics, contributing to the robustness, reliability and self-organization of networks of clusters based on ESB platforms. The module was realized and embedded in the JBoss ESB, using a sniffer mechanism to collect the service messages crossing the bus and a Liferay portal to display relevant information related to the services' health.


Introduction
The conceptualization of the Internet of Things (IoT) paradigm and the implementation of distributed computational systems reinforce the importance of integrating heterogeneous software applications across enterprise Information Technology (IT) infrastructures. In fact, according to a prediction report from Gartner on Application Integration [Gartner, 2012], by 2016, midsize to large companies will spend 33% more on application integration than in 2013, and by 2018, more than 50% of the cost of implementing 90% of new large systems will be spent on integration.
The advent of Service Oriented Architecture (SOA) [Erl, 2005] as a software paradigm for distributed systems to integrate enterprise IT infrastructures brought the concept of Enterprise Service Bus (ESB). An ESB is a software architecture model used for designing and implementing the communication between interacting software applications in a SOA environment. It is based on the idea of having a standard and structured middleware that offers a way to connect and integrate loosely-coupled heterogeneous software components, named services, reducing the complexity of application interfaces. In 2003, Gartner predicted that the majority of large enterprises would have an ESB running to integrate their IT infrastructures in the near future [Gartner, 2002]. The ESB solutions provide a distributed, modular and pluggable architecture, supporting dynamic routing and mediation of services' discovery, request and execution. The main benefits of using an ESB platform are the increased flexibility and scalability, the interoperability transparency and the use of configuration rather than integration coding. In fact, large-scale distributed systems can benefit from an ESB middleware acting as a broker between the numerous heterogeneous service providers/requesters, avoiding a potentially huge number of point-to-point connections. However, the increased overhead and the possible slower communication speed are the main disadvantages of ESB solutions.

The Problem
One of the main functionalities of an ESB is to monitor and control the routing of messages exchanged between services. The life-cycle management of deployed services, e.g., including functions of monitoring and data analytics, is a crucial issue for component reuse, maintenance and monitoring [Wang et al., 2012], and, in the context of ESBs, contributes to increasing the system robustness, reliability and fault-tolerance. In fact, the possibility to monitor the performance of registered services and to analyse the evolution of their behaviour makes it possible to detect possible degradation or risk propagation in advance, generating warnings so that proper corrective actions can be implemented to mitigate the problem.
Currently, regarding life-cycle management, the ESB platforms only provide basic functions associated with the service registry and completely miss this kind of advanced functionality, leading to the need for a life-cycle management functionality embedded in the ESB that provides monitoring and data analytics of the registered services.

Objectives and Contributions
The objective of this work is to develop a life-cycle management module that may be embedded in the traditional ESBs to provide advanced monitoring and data analytics of the registered and deployed services, contributing to achieve more robust, reliable and fault-tolerant distributed SOA-based systems. Additionally, this functionality will also play an important role in the self-organization of the network of software applications organized as clusters of ESBs, in a dynamic and on-the-fly manner.
The ultimate goal of this work is the integration of this module in the final software deliverable of the ARUM (Adaptive Production Management) project that will be used in the final review meeting that will be held at the Airbus industrial facility in Hamburg.

Document Organization
This document is organized into five chapters, the first of which consists of this introduction where the problem statement and objectives were defined. The rest of the work is organized as follows: Chapter 2 shows an overview of the state of the art about SOA and ESB, pointing to the more technical aspects. Besides the description of the SOA principles and the ESBs' characteristics, a closer look is taken at Service Oriented Multi-agent Systems and Web services.
Chapter 3 discusses the architecture of the intelligent enterprise service bus (iESB) developed under the ARUM project, and presents the specification of the life-cycle management module as part of the iESB. The contribution of this module to achieving robustness and self-organization is also detailed, as well as technical details related to the implementation of the experimental prototype.
Chapter 4 discusses how the module contributes to achieve robustness and self-organization, by presenting two experimental scenarios and analysing the achieved results.
At last, Chapter 5 rounds up the work with the conclusions and points out some future work.

Amazon has gone from being an online retailer to being a dominant e-commerce platform by exposing services to its partners using SOA [Harris, 2007]. In 2007, the use of SOA concepts allowed British Telecom to close down 800 systems [Lawson, 2011]. The Federal Aviation Administration (FAA) used SOA and cloud computing for its National Airspace System, allowing the implementation of the next generation of air traffic management systems [Hritz, 2012].
Beyond business applications, due to the promised agility and flexibility benefits, embedded devices offering services in the context of Cyber-Physical Systems (CPS), namely in automation and manufacturing environments, are no novelty. In fact, as pointed out by [Mendes, 2011], SOA fits well with collaborative automation, "addressing distributed, modular and reconfigurable automation systems whose behavior is regulated by the coordination of services".
This chapter includes a comprehensive summary of the state of the art related to SOA and ESB. The choice is to visit the central concepts without dwelling on historical considerations, focusing on the technical aspects.

SOA Principles
Browsing through the different contributions related to SOA can be a challenging job. Listing some of the available definitions:
• SOA is "An architectural paradigm for defining how people, organizations and systems provide and use services to achieve results." [OMG, 2012]
• SOA is "A set of components which can be invoked, and whose interface descriptions can be published and discovered." [W3C, 2004a]
The same approach can be used for the concept of "service", listing the different definitions:
• Service is "The means by which the needs of a consumer are brought together with the capabilities of a provider." [OASIS, 2006]
• "A service is a logical representation of a repeatable business activity that has a specified outcome" (…) "and: Is self-contained; May be composed of other services; Is a "black box" to consumers of the service". [The Open Group, 2011a]
• "Service is defined as a resource that enables access to one or more capabilities. Here, the access is provided using a prescribed interface and is exercised consistent with constraints and policies as specified by the service description. A service is provided by an entity - called the provider - for use by others. The eventual consumers of the service may not be known to the service provider and may demonstrate uses of the service beyond the scope originally conceived by the provider." [OMG, 2012]
• "A service is an abstract resource that represents a capability of performing tasks that form a coherent functionality from the point of view of provider entities and requester entities. To be used, a service must be realized by a concrete provider agent." [W3C, 2004a]
After processing the central documents of open standards organizations, it is clear that SOA is a way to build distributed systems where the central piece is the concept of service that service providers offer to service consumers.
A service is the implementation of some business logic made accessible through a well-defined interface (contract), hiding the implementation details.
Service consumers and providers must be visible to each other. Preconditions to visibility are awareness, willingness (predisposition to interact) and reachability (participants must be able to communicate with each other). Awareness prescribes that the service consumer must have the information that leads to knowledge of the service provider's existence [OASIS, 2006].
Using discovery mechanisms, service consumers find the services they need, and interact directly with those services. These discovery mechanisms are not specified at this abstract level, but two main approaches can be considered: using a registry or considering a peer-to-peer solution [W3C, 2004b].
Services with more complexity can be composed of other services (in this context classified as atomic services). Two main types of composition are often distinguished: orchestration, in which one of the services schedules and directs the others [W3C, 2004a], and choreography, in which the composed services interact and cooperate without the aid of a directing service, in a peer-to-peer way [W3C, 2005].

Service Oriented Multi-agent Systems
Notwithstanding the several interpretations, a possible definition for agent is: "An autonomous component that represents physical or logical objects in the system, capable of acting in order to achieve its goals, and being able to interact with other agents when it doesn't possess the knowledge and skills to reach its objectives alone". An agent exhibits autonomy and cooperation, and may have reasoning and learning capabilities. For instance, in the manufacturing domain, an agent can represent a physical resource, such as a machine, a robot or a pallet, or a logical object, such as a scheduler or an order. A Multi-Agent System (MAS) can be defined as a set of agents that represent the objects of a system, capable of interacting in order to achieve their individual goals when they don't have enough knowledge and/or skills to achieve their objectives individually (note that each agent has a partial view of the system and no agent has a complete view of the system). These systems have the capability to respond promptly and correctly to change, and differ from conventional approaches due to their inherent capability to adapt to emergence without external intervention [Wooldridge, 2002].
The service-oriented principles can be integrated with MAS to enhance some functionalities and to overcome some of its limitations, namely in terms of interoperability and IT-vertical integration. Indeed, agents are already present in standard documents of SOA (e.g., see [OASIS, 2006]) and, at the same time, services are already part of the agents' specification [FIPA, 2002a]. In spite of being based on the same concept of providing a distributed approach to the system, MAS and SOA present some important differences, namely in terms of computational requirements and interoperability (see [Ribeiro et al., 2008] for a deeper study). These differences (presented in Table 1) highlight the complementary aspects of the two paradigms, suggesting the benefits of combining them.

Table 1 contrasts both paradigms: for instance, MAS provides well-established methods to describe the behaviour of an agent, while in SOA the focus is on detailing the public interface rather than describing execution details.

Traditionally, the combination of the MAS and SOA paradigms can be performed in different ways, as illustrated in Figure 1.a) [Greenwood et al., 2007]. However, using the described approach, the design of truly service-oriented multi-agent systems is far from the real expected potential and benefits. Another option, illustrated in Figure 1.b), was introduced by [Mendes et al., 2009] and is characterized by the use of a set of autonomous agents that use the SOA principles, i.e. oriented by the offer and request of services, to fulfil industrial systems goals. An important note is that these service-oriented agents do not only share services as their main form of communication, but also complement their own goals with externally provided services.

Web Services
The W3C defines the concept of Web service (WS) and offers a family of WS-* standards to support the implementation of concrete SOA applications using the Internet as the communication path between service consumers and service providers. As stated in the Web Services Glossary [W3C, 2004]: "A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards." This rigid view, requiring a specific implementation, ignores the existence of alternative Web services technologies like the Representational State Transfer (REST) [Fielding, 2000] approach (to be seen at the end of this subsection).
As stated, W3C Web services use XML messages following the SOAP standard [W3C, 2007a], which defines the message structure and is the base of service interoperability (dealing with requests and responses). SOAP was originally an acronym for Simple Object Access Protocol; since version 1.2, it is no longer an acronym. The common grammar used to describe the Web services is determined by the Web Services Description Language (WSDL) [W3C, 2007b]. A client connecting to a Web service reads the WSDL file to determine the available operations.
Web services awareness is typically achieved by using a registry. UDDI [OASIS, 2004] provides the infrastructure required to publish and discover services. The concomitant use of SOAP, WSDL and UDDI makes it possible to implement the most usual interaction schema for Web services, as shown in Figure 2.
Initially, a service provider requests the registration of a service into a UDDI registry service using a SOAP message (1). A service requester searches the service (2) and the UDDI registry sends a reference of the service (3). The requester calls the service using the reference (4). Then, after acceptance of the request (5), the interaction to consume the service starts (6).
With the emergence of numerous embedded devices with processing capabilities and connection to the Internet, it has become natural to think of their participation in SOA solutions. To cope with that reality, the Devices Profile for Web Services (DPWS) framework defines a minimal set of implementation constraints to enable secure Web service messaging, discovery, description, and eventing on resource-constrained endpoints [OASIS, 2009].
Languages for the description of complex operations allowing the composition of atomic Web services are available. When business processes are being exposed as Web services, the creation of new composite services using orchestration mechanisms may be described using the Business Process Execution Language (WS-BPEL) [OASIS, 2007]. On the other hand, choreography compositions (peer-to-peer collaborations) can be described using a choreography language like the Web Services Choreography Description Language (WS-CDL) [W3C, 2005].
The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, enabling cooperative work between computers and people [Agarwal, 2012]. In this context, Semantic Annotations for WSDL (SAWSDL) [W3C, 2007c] allows adding annotations to WSDL elements with additional semantic content.
More WS/SOAP-related standards are available from W3C and OASIS (see Table 2 for examples), but they will not be addressed in this work.

REST principles can be used to implement an alternative to SOAP-based Web services, the so-called RESTful Web services, currently used by big enterprises like Yahoo, Google, and Facebook [Rodriguez, 2008]. REST is an architectural style, initially proposed by [Fielding, 2000], where data and functionality are identified using Uniform Resource Identifiers (URIs).
All REST requests are stateless, meaning that they are independent of previous ones (no memory) and contain all the necessary information to make themselves understood at the destination. Data and functionality are collectively called resources. Resources are manipulated using a uniform interface considering the create, read/retrieve, update and delete operations, mimicking the HTTP PUT, GET, POST and DELETE methods [Pautasso et al., 2008]. Despite requests always being stateless, server responses can be stateful, meaning that the response to a request is labelled as cacheable or non-cacheable [Fielding, 2000].
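The uniform interface can be illustrated with Python's standard library, building (but not sending) stateless requests against a hypothetical resource URI. The CRUD-to-method mapping below follows a common convention; whether create maps to POST or PUT is a per-API design choice:

```python
# Sketch: mapping CRUD operations onto HTTP methods for a RESTful
# resource. The URI is hypothetical; no network request is performed.
from urllib.request import Request

CRUD_TO_HTTP = {
    "create": "POST",    # PUT is also common when the client picks the URI
    "retrieve": "GET",
    "update": "PUT",
    "delete": "DELETE",
}

def build_request(operation, uri, data=None):
    """Build a stateless request: everything needed travels with it."""
    body = data.encode() if data is not None else None
    return Request(uri, data=body, method=CRUD_TO_HTTP[operation])

req = build_request("retrieve", "http://example.com/orders/17")
print(req.get_method(), req.full_url)  # -> GET http://example.com/orders/17
```

Because each request is self-contained, any intermediary or replica server can answer it without session state, which is what enables the cacheable responses mentioned above.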

Enterprise Service Bus
SOA systems can be realized by an ESB that provides a layer on top of an implementation of an enterprise messaging system [Ziyaeva et al., 2008], as shown in Figure 3. Several ESB solutions are available (see Table 3), offering a variety of choices.

Interoperability between heterogeneous systems requires information exchange. To be able to communicate, each participant must understand the contents of incoming messages. Writing "glue code" to deal with the translation of data between systems is a case by case approach, since the inclusion of a novel system originates the writing of more "glue code". This approach does not scale well.

The importance of SOA-type solutions to provide monitoring and management capabilities at the service level is recognized [Wang et al., 2012]. Currently, the ESB platforms only provide basic functions associated with the service registry (see Table 4 with an example for a concrete solution) and completely miss more advanced functionalities.

Positioning in the ARUM Project
The ARUM project focuses on the development of mitigation strategies to respond faster to unexpected events and on the implementation of systems and tools for decision support in planning and operation. For this purpose, the ARUM platform comprises an intelligent ESB (iESB), which enriches the traditional ESBs with a plethora of advanced modules, and provides a common infrastructure for the integration of heterogeneous planning and scheduling tools (e.g., using the MAS principles) and legacy systems, as illustrated in Figure 5. The main modules deployed in the iESB are the Ontology service, Data Transformation Service, Sniffer, Node Management, and Life-Cycle Management (note that only the latter is the responsibility of the author of this work).
The Ontology service module is responsible for gathering the pieces of data from various legacy systems, e.g., MES (Manufacturing Execution System) and ERP (Enterprise Resource Planning), via the data transformation service, aggregating and storing them in the local triple store, and then providing them on request to other services. Aiming to provide common and explicitly defined semantics of data, a set of OWL (Web Ontology Language)-based ontologies was developed for the description of production processes, shop floor topologies, resources and their availability, scheduling strategies and disruption events [Inden et al., 2013]. The Data transformation service module is responsible for gathering data from legacy systems. The raw data, received from gateways using the legacy system specific interfaces and communication protocols, is transformed into the ontological format (RDF - Resource Description Framework) using the OWL-based ontologies provided in the Ontology service.
The Sniffer module is responsible for capturing the flow of messages across the ESB and related to the registered services, to support the monitoring and understanding of the overall state of the system especially in a distributed environment with multiple interacting services [Vrba et al., 2014].
The Node Management module supports the distributed management of iESB instances, allowing the inter-connection among several ESBs [Marín et al., 2013].
The Dashboard acts as a user interface (UI), providing the user with the means for administration and monitoring of the overall ARUM solution (including all deployed tools). This means, for example, the deployment of services, the monitoring of their parameters and health, and the visualization of the message flow and statistics. The dashboard leverages web portal technology: a specially designed web page on which the information is displayed within dedicated user interface components, the portlets.
The Life-cycle management module performs the life-cycle monitoring and analysis of the health of the services that are deployed within the iESB, supporting dynamic, online and on-the-fly actions to mitigate the degradation of their performance. This module will be analysed in depth in the rest of this document.

LCMM Architecture and Functions
The Life-cycle Management Module (LCMM) performs the continuous monitoring and data analytics of the services that are deployed within the iESB, making it possible to be dynamically aware of the current state and health of the services and to perform on-the-fly actions to increase the services' performance. In particular, the main features provided by the LCMM module are:
• Monitoring of the registered services' health, providing on-line information related to different KPIs, such as the failure rate and the occupancy.
• Detection of the registered services/tools that are not operating properly and analysis of trend and patterns on the services' performance, e.g., the detection of the degradation in the service quality.
• Analysis of the risk propagation in case of service quality degradation.
• Suggestion of actions to maintain the system's robustness and stability.
The LCMM module interacts with the Sniffer module to get data related to the exchanged messages, and with the UI Dashboard to support the interaction with the user, particularly to display the monitored information related to the health of registered services according to pre-defined KPIs, as illustrated in Figure 6. Internally, the module comprises the Event Monitoring, the Data Analysis and the local database components.

Figure 6: Architecture of the life-cycle management module.
The interaction between the Event Monitoring and Data Analysis components makes it possible to trigger a more detailed data analysis and also to provide feedback regarding the adjustment of the polling rate for a specific service.

Event Monitoring Component
The Event Monitoring component mainly performs the collection of the data related to the messages exchanged across the bus and the monitoring of the services' health. Since the Sniffer module is continuously sniffing the messages crossing the ESB and feeding its database with the gathered information, the Event Monitoring component can request this data using a proper and dynamic polling mechanism that is parameterized according to the service frequency and priority. In fact, the polling time is adjusted according to the service usage frequency, i.e. a short polling time if the service is frequently requested, or a larger one if it is rarely requested. Alternatively, an event-driven mechanism can be used to collect the data from the Sniffer module, but this alternative can only be used if the Sniffer module provides the subscription functionality.
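As a minimal sketch of this adaptive polling idea (the bounds and the inverse-frequency rule below are illustrative assumptions, not the actual LCMM parameterization):

```python
# Illustrative sketch: adjusting the polling interval of the Event
# Monitoring component according to how often a service is requested,
# clamped to configurable bounds. Numbers are arbitrary examples.

MIN_POLL_S = 1.0     # frequently used services are polled often
MAX_POLL_S = 60.0    # rarely used services are polled sparsely

def polling_interval(requests_per_minute, min_s=MIN_POLL_S, max_s=MAX_POLL_S):
    """Shorter interval for busy services, longer for idle ones."""
    if requests_per_minute <= 0:
        return max_s
    interval = 60.0 / requests_per_minute   # roughly one poll per request
    return max(min_s, min(max_s, interval))

print(polling_interval(120))  # busy service  -> 1.0
print(polling_interval(0.5))  # rare service  -> 60.0
```

The clamping keeps the Sniffer database from being hammered by hot services while still bounding the staleness of data for idle ones.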
The reasoning engine, embedded in this component, processes the gathered and historical information in order to support the health monitoring of registered services by calculating several pre-defined KPIs, namely in terms of performance and status, that will be exposed as monitoring services to the user, namely through the UI dashboard. Examples are detecting whether the registered services are alive, by identifying unanswered messages, and detecting behaviours that do not follow the service patterns.
Considering that $T = \{tool_i : i = 1, \ldots, m\}$ is the set of tools connected to the ESB and that each tool offers a set of services $S_i = \{service_{ij} : j = 1, \ldots, n_i\}$, the LCMM module provides a plethora of services aiming to monitor several KPIs, as detailed in the following (also illustrated in Figure 7):
• getFailureRate: provides the failure rate of a service, calculated as follows, where $f_{ij}$ is the number of failures of the $service_{ij}$ with reference to the last $n$ requests of this service:

$FR_{ij} = \dfrac{f_{ij}}{n}$ (1)

• getDegradation: provides the information related to the degradation of the response time of a service $j$ of the tool $i$ ($\delta_{ij}$). The degradation is obtained by comparing the response times of the last two events (2).
• getServiceOccupancy: provides the information related to the occupancy of a service. The Service Occupancy ($SO_{ij}$) of a service $j$ running in the tool $i$ is defined as the ratio of the overall time $t_{ij}$ during which the service is being used to the overall time $\Delta_i$ of the software tool deployed on the system:

$SO_{ij} = \dfrac{t_{ij}}{\Delta_i}$ (3)

• getToolOccupancy: provides the information related to the occupancy of a tool. The Tool Occupancy ($TO_i$) is defined as the ratio of the time $t_i$ during which a given tool $i$ is being used (independently of the overlapping of services in the tool) to the overall time $\Delta_i$ of the tool deployment on the system:

$TO_i = \dfrac{t_i}{\Delta_i}$ (4)

• getToolOverallDemand: provides the information related to the load of a tool within the overall ESB load. This load $D_i$ is the ratio of the number $r_i$ of requests to the services running in tool $i$ to the total number of requests to all tools:

$D_i = \dfrac{r_i}{\sum_k r_k}$ (5)

• overallStatus: provides the overall service status considering all evaluation parameters, namely the failure, degradation and occupancy, weighted according to pre-defined values. This can be defined as a health scale, where 0 means "good" and 1 means a potential "risk" or "problem".
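The KPI services above can be sketched directly as functions. The function names mirror the service names, while the default weights in overall_status are illustrative assumptions (the text only says they are pre-defined):

```python
# Hedged sketch of the LCMM KPI calculations; weights are illustrative.

def failure_rate(failures, n):
    """(1) FR_ij = f_ij / n over the last n requests."""
    return failures / n

def service_occupancy(t_ij, delta_i):
    """(3) SO_ij = t_ij / delta_i: service usage time over tool uptime."""
    return t_ij / delta_i

def tool_occupancy(t_i, delta_i):
    """(4) TO_i = t_i / delta_i: tool usage time over tool uptime."""
    return t_i / delta_i

def tool_overall_demand(r_i, all_requests):
    """(5) load of tool i within the overall ESB load."""
    return r_i / sum(all_requests)

def overall_status(failure, degradation, occupancy,
                   weights=(0.5, 0.3, 0.2)):
    """Weighted health scale: 0 = good, towards 1 = risk/problem.
    The weight values are illustrative defaults, not ARUM's."""
    w_f, w_d, w_o = weights
    return w_f * failure + w_d * degradation + w_o * occupancy

print(failure_rate(2, 100))                   # -> 0.02
print(tool_overall_demand(30, [30, 50, 20]))  # -> 0.3
```

With normalized inputs in [0, 1] and weights summing to 1, the overall status also stays in [0, 1], matching the health scale described above.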
This component can also implement a pre-risk analysis, making it possible to determine potential situations of service/tool failure. In this way, when a set of conditions is met, such as the presence of historically problematic tools or warnings coming from the evolution of service KPIs, the component can flag the critical service(s) and take more pro-active measures, such as changing the warning threshold values. Besides enabling the early signalling of potentially hazardous situations, this additionally allows the anticipated taking of known actions that permit overcoming the potential situation.
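A minimal sketch of this threshold-tightening idea follows; the numeric values are illustrative and not taken from the ARUM implementation:

```python
# Sketch of the pre-risk analysis: a stricter warning threshold for
# services flagged as historically problematic. Numbers are examples.

DEFAULT_THRESHOLD = 0.8  # overall-status value that triggers a warning

def warning_threshold(historically_problematic, base=DEFAULT_THRESHOLD):
    """Lower (stricter) threshold for services known to misbehave."""
    return base * 0.5 if historically_problematic else base

def check(status, historically_problematic):
    """True when the service's overall status warrants a warning."""
    return status >= warning_threshold(historically_problematic)

print(check(0.5, False))  # -> False: below the default threshold
print(check(0.5, True))   # -> True: the stricter threshold fires earlier
```

The point is that the same measured status produces an earlier warning for services with a problematic history, buying time for the known corrective actions.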

Data Analysis Component
The Data Analysis component aims to perform advanced reasoning, and particularly data analytics, over the historical and current collected information related to the deployed services. In fact, the functions provided by this component include: • Analysis of trends to detect deviations or patterns in the quality and performance of the service.
• Analysis of correlation among the execution of different services (also including the correlation considering services deployed in other ESBs belonging to the same network).
• Analysis of the impact and risk propagation related to the degradation of a service.
The implementation of these functionalities may consider the use of data mining techniques [Witten et al., 2011], namely clustering algorithms. Clustering is a technique used to find, in an automated way, hidden patterns in large quantities of data. Based on the k-means clustering algorithm presented in [Kanungo et al., 2002], Figure 8 illustrates a strategy integrated in the LCMM to perform data analytics to discover the set of services that present a higher risk of abnormal behaviour. The output of these functions is the generation of warnings to the user, e.g., providing useful information about the state and risk of a specific service and also suggesting the execution of proper actions, such as unregistering the service (e.g., when the service is not being used), re-starting a tool (e.g., when the service is degraded or not responding) or creating a clone (e.g., when the service/tool is too busy). These actions can also be performed automatically, under well controlled conditions. In this case, as also illustrated in Figure 9, and aiming to support the operation of the LCMM module, several services provided by other modules in the iESB may be requested, namely amIAlive (to verify if the service is alive), unregister (to unregister services by accessing the ESB Register Service), relaunch (to restart the service provided by the tool) and clone (to clone a service/tool, e.g., when a service/tool is too busy). Note that the use of these services may require some kind of privileged access to external tools. Learning is an important piece of the LCMM module, supporting the discovery in advance of potential problems and the definition of the actions to be implemented when a risk is detected (as well as the adaptation of the warning threshold values).
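To make the clustering step concrete, the following is a minimal plain-Python k-means sketch (k = 2) on synthetic KPI vectors, singling out the higher-risk group of services. It is a deliberate simplification, not the optimized algorithm of [Kanungo et al., 2002], and the data is invented:

```python
# Minimal k-means (k = 2) in the spirit of the strategy of Figure 8:
# grouping services by KPI vectors (failure rate, degradation) to find
# the cluster with higher risk. Data and initialization are illustrative.

def dist2(a, b):
    """Squared Euclidean distance between two KPI vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

def kmeans(points, k=2, iters=20):
    centroids = points[:k]  # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[idx].append(p)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# KPI vectors: (failure rate, degradation) per service -- synthetic data
services = [(0.01, 0.1), (0.02, 0.05), (0.03, 0.1),   # healthy
            (0.40, 0.9), (0.35, 0.8)]                  # risky
centroids, clusters = kmeans(services)
risky = max(clusters, key=lambda c: mean(c)[0] if c else -1)
print(len(risky))  # -> 2: the high-failure-rate group
```

In the LCMM, the services falling in the risky cluster would be the ones flagged for warnings or for the mitigation actions described above.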
The LCMM module should also consider self-monitoring and self-analysis in order to avoid chaotic behaviour, e.g., acting as a "cancer" that deploys services/tools very rapidly and consequently overloads the system.
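The mapping from detected conditions to the mitigation actions discussed above (unregister, relaunch, clone) can be sketched as a simple rule table; the thresholds and rule priority are illustrative assumptions, since in the LCMM this decision also involves the learning mechanisms:

```python
# Illustrative rule-based action suggester mapping service conditions to
# the mitigation actions named in the text. Thresholds are examples.

def suggest_action(alive, overall_status, occupancy, requests_last_day):
    if not alive or overall_status > 0.9:
        return "relaunch"      # degraded or not responding
    if occupancy > 0.8:
        return "clone"         # service/tool too busy
    if requests_last_day == 0:
        return "unregister"    # service not being used
    return "none"

print(suggest_action(True, 0.2, 0.95, 40))  # -> clone
print(suggest_action(True, 0.1, 0.10, 0))   # -> unregister
```

The rule order matters: liveness/degradation is checked first, since relaunching supersedes cloning or unregistering a failing service.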

LCMM Contribution to Robustness and Self-organization
The implementation of the LCMM functionalities is a step forward towards achieving intelligence in the ESB platform and, in this way, an iESB. More concretely, this module may contribute to achieving robustness, reliability, fault-tolerance and self-organization in this kind of distributed systems, i.e. those based on the ESB middleware.
Robustness can be defined as the capability of a control system to remain working correctly and relatively stable, even in the presence of disturbances [Pereira et al., 2013]. Additionally, an important issue is the system's fault-tolerance, i.e. the capability to detect and tolerate internal failures, in order to continue performing its operations without the need for an immediate intervention. Being more tolerant, the downtime is reduced, and being able to detect and diagnose, the repair process is sped up, increasing the robustness and productivity of manufacturing systems [Leitão, 2011].
In this kind of distributed systems, based on offering and requesting services, the inexistence of central nodes makes them more robust than traditional centralized systems, by eliminating the single-point-of-failure problem. In fact, more decentralization provides additional reliability due to the implicit redundancy and diversity and the non-dependency on central control nodes [Pereira et al., 2013]. However, the existence of a middleware infrastructure to integrate the IT software applications based on services can somehow restrict the robustness and reliability of such systems. Note that reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time. In this way, the LCMM module increases robustness and reliability by permitting the automatic discovery of problematic services, e.g., the ones that may be failing, not responding properly or overloaded, and by taking/suggesting appropriate actions, such as launching a service in parallel with one that is identified as being near failure, to mitigate the possible problems.
Additionally, the LCMM module can greatly contribute as an underlying mechanism to support self-organization at two levels: at the service level and at the iESB level. In the first case, the LCMM module can act as a referee, issuing warning signals to the deployed services to prevent erratic behaviour (e.g., when a tool is sending service requests over the limit). In this case, and if the appropriate behavioural actions are implemented in the affected tool, the tool can change its internal behaviour accordingly. A second example can be found in a tool that has reduced utilization. In this case, and if a redundant tool is present, the LCMM can advise the less used tool to switch into a low profile mode or even, at the limit, to unplug itself from the system, as seen in Figure 10 (hexagonal service).
At the iESB level, the information mined by the LCMM can be used by the self-organization mechanism as a way to internally (re)arrange the iESB structurally, by adding, modifying or removing services (ellipse service in Figure 10), or by (re)arranging the relations and constitution of clusters in an inter-iESB perspective. This structural self-organization level allows the dynamic clustering of iESBs, which arrange themselves accordingly, aiming at a uniform distribution of service performance, where the performance of each individual iESB is increased by the decrease of the individual service/tool overload and failure rate. The aforementioned insights are drawn from the ADACOR 2 control architecture, in which the individual behaviour of the entities [Barbosa et al., 2013a] is dynamically changed, aiming at a smooth evolution of the system, while a more drastic evolution is achieved through the change of the relations between entities [Barbosa et al., 2013b].
Similarly to what is achieved in the ADACOR 2 approach, by combining these two self-organization levels, the LCMM module will enable the achievement of a self-organized and evolvable iESB system, since the overall system is able to adapt itself internally and structurally to fluctuations in the system demand, to internal service disruptions or to iESB node changes.
Additionally, and as also indicated in the ADACOR 2 control architecture, the LCMM must be coupled with a nervousness controller in order to avoid entering a chaotic process when taking decisions. This stabilization mechanism prevents the intermittent stopping or launching of services/tools, as well as the constant (re)arrangement of the iESB clustering.
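A minimal sketch of such a stabilization mechanism is a cooldown between consecutive reconfiguration decisions; the class name and the time-based rule are illustrative assumptions, not the ADACOR 2 design:

```java
// Illustrative nervousness controller: a decision is accepted only if a
// cooldown period has elapsed since the last accepted decision, damping
// oscillatory stop/launch cycles and constant cluster (re)arrangement.
public class NervousnessController {

    private final long cooldownMillis;
    private Long lastActionAt = null; // null until the first accepted decision

    public NervousnessController(long cooldownMillis) {
        this.cooldownMillis = cooldownMillis;
    }

    /** Returns true (and records the instant) only if the decision may proceed. */
    public boolean allow(long nowMillis) {
        if (lastActionAt == null || nowMillis - lastActionAt >= cooldownMillis) {
            lastActionAt = nowMillis;
            return true;
        }
        return false; // suppressed: too close to the previous reconfiguration
    }
}
```

More elaborate variants could weight the suppression by the severity of the detected problem instead of using a fixed interval.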

LCMM Implementation and Operation
The proposed LCMM was developed and deployed as a JBoss ESB service, encapsulating its business logic into a set of Java classes. JBoss ESB [DiMaggio et al., 2012] is an ESB solution maintained under the umbrella of the JBoss Community and intends to provide an open source option for the construction of systems based on SOA principles.
The main constitutive part of the LCMM service is a chain of "Actions". Basically, in the JBoss ESB framework, an "Action" is a Java class that allows the ESB services to carry out their tasks. These tasks are realized after processing the data referring to the exchange of messages between the services registered in the ESB. To accomplish that, a connection to the Sniffer's database, which is implemented using MySQL, was established. Then a snapshot is created for each service and tool: ...

The LCMM integrates with the ESB through a configuration file, jboss-esb.xml, shown below in a partial view:

    ... name="LCMM">
      <listeners>
        <jms-listener busidref="serviceJMSChannel" name="ESBListener" />
      </listeners>
      <actions>
        <action class="pt.ipb.arum.lcmm.LcmmAction" name="LcmmAction" process="process" />
        <action name="notificationAction" class="org.jboss.soa.esb.actions.Notifier">
          <property name="okMethod" value="notifyOK" />
          <property name="notification-details">
            <NotificationList type="ok">
              <target class="NotifyTopics">
                <topic jndiName="topic/lcmm_JMS_topic" />
    ...

The user interface, supporting the visualization of the data resulting from the processing performed by the LCMM service, was developed as a web-based application that can be accessed via a web browser. This web-based application was built on the Liferay Portal [Sarang, 2009].
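Stepping back from the JBoss ESB and MySQL specifics, the snapshot computation performed by the action chain can be illustrated as an aggregation over the sniffed message records. The record layout and class names below are assumptions made for the sketch:

```java
// Simplified, self-contained sketch of the snapshot computation: aggregate
// sniffed request records into a failure rate and an average response time
// per service. The real LCMM reads the records from the Sniffer's MySQL
// database inside a JBoss ESB action.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotBuilder {

    /** One sniffed request: target service, whether it was answered, response time. */
    public static class MessageRecord {
        final String service;
        final boolean answered;
        final long responseMillis;

        public MessageRecord(String service, boolean answered, long responseMillis) {
            this.service = service;
            this.answered = answered;
            this.responseMillis = responseMillis;
        }
    }

    /** Per-service snapshot derived from the records. */
    public static class Snapshot {
        public final String service;
        public final double failureRate;
        public final double avgResponseMillis;

        Snapshot(String service, double failureRate, double avgResponseMillis) {
            this.service = service;
            this.failureRate = failureRate;
            this.avgResponseMillis = avgResponseMillis;
        }
    }

    public static Map<String, Snapshot> build(List<MessageRecord> records) {
        Map<String, long[]> acc = new HashMap<>(); // {total, failures, sumResponse}
        for (MessageRecord r : records) {
            long[] a = acc.computeIfAbsent(r.service, k -> new long[3]);
            a[0]++;
            if (!r.answered) a[1]++;
            else a[2] += r.responseMillis;
        }
        Map<String, Snapshot> out = new HashMap<>();
        for (Map.Entry<String, long[]> e : acc.entrySet()) {
            long[] a = e.getValue();
            long answered = a[0] - a[1];
            out.put(e.getKey(), new Snapshot(e.getKey(),
                    (double) a[1] / a[0],
                    answered == 0 ? 0.0 : (double) a[2] / answered));
        }
        return out;
    }
}
```

The resulting snapshots are the raw material from which the KPIs displayed in the user interface can be derived.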

The charts were built using a charting library, version 3.0, written in HTML5 and JavaScript, which allows, among other features, building dynamic charts; the overall status chart is instantiated through JavaScript code in the portlet. The communication between the LCMM service and the Liferay portlet is achieved using the Java Message Service (JMS). The JMS specification describes the exchange of messages between Java programs, in particular for use in publish-subscribe solutions. In more detail, a JMS topic is used, allowing the delivery of messages to multiple subscribers:

    <?xml version="1.0"?>
    <configuration xmlns="urn:hornetq"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="urn:hornetq /schema/hornetq-jms.xsd">
      <queue name="lcmm_ESB_request">
        <entry name="queue/lcmm_ESB_request"/>
      </queue>
      <topic name="lcmmtopic">
        <entry name="topic/lcmm_JMS_topic"/>
      </topic>
    </configuration>

Figure 11 illustrates a screenshot showing an overview of four KPIs related to the evolution of two services, namely the failure rate, the degradation, the occupancy and the overall demand.

Figure 11: Screenshot of the LCMM user interface.
For each KPI, a chart with its evolution is shown. In the case of the occupancy index, a joint view of services and tools is presented. The continuous monitoring of these KPIs makes it possible to detect problems and to trigger warnings for the implementation of proper actions that will mitigate their possible negative impact.
The part of the GUI related to the Data Analysis presents a table, as can be seen in Figure 12. The entries of each KPI for each service are presented in the form value [trend]. The trend reflects the evolution of the index compared with the previous instant and takes a value in the set {equal, up, down}. The "Actions" button, presented in the last column, allows some administrative tasks linked to the corresponding service to be performed. The options "Visible" / "Not visible" allow the user to decide whether to visualize the KPI curves of the service in the Event Monitoring part. The "Remove" option instructs the Node Management module to unregister/undeploy the service from the iESB. The "Clone" option instructs the Node Management module to trigger the installation of a new instance of the service in another iESB, with the consequent removal of the service from the current iESB. These two features enable the system to self-organize when the "Automatic" mode is activated; the default is the "Manual" mode. The column "Admin" shows the current state of the administrative options enabled for each of the services. More details and screenshots are presented in Chapter 4.
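The trend entry can be computed by comparing consecutive KPI samples, as sketched below; the tolerance below which two samples count as equal is an assumed value:

```java
// Sketch of the trend computation for the Data Analysis table: compare a
// KPI sample with its previous value and classify the evolution as
// equal, up or down.
public class Trend {

    public enum Direction { EQUAL, UP, DOWN }

    // Samples closer than this are treated as equal (assumed tolerance).
    static final double EPSILON = 1e-3;

    public static Direction of(double previous, double current) {
        double delta = current - previous;
        if (Math.abs(delta) < EPSILON) return Direction.EQUAL;
        return delta > 0 ? Direction.UP : Direction.DOWN;
    }
}
```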

Results and Discussion
This chapter presents two experimental use cases that demonstrate the advantages of using the LCMM towards the system's robustness and self-organization. The first one is a proof of concept that was presented and validated by the ARUM partners in the second year review meeting. The second use case exercises more advanced features of the module, namely the use of the LCMM's Data Analysis component to achieve structural self-organization in an ecosystem constituted by two iESBs.

Experimental Setup
The use cases were executed on a PC with the characteristics listed in Table 5.

Component          Description
Processor          Intel Core i5-3317U
Total memory       4 GB
Operating system   Windows 8 x64

For the first use case, one instance of the iESB was launched; for the second, two iESBs were launched.

Peak on Demand Use Case
In this use case, two services were deployed into the iESB: the "ontology service" and the "publish service". The first was implemented to leave 1/5 of the requests unanswered and the second to fail 1/10 of the requests. Each of the services was requested by a client every second. After 40 requests, the "publish service" began to fail 3/20 of the requests and the request rate doubled. As shown in Figure 13, the failure rate of the "ontology service" converges to 20%, stabilizing around this value. Observing the evolution of the failure rate curve of the "publish service", it is possible to verify that, between 17:46:10 and 17:46:50, the failure rate increased from 10% to 15%. In parallel (see the chart in Figure 14), it is possible to observe a degradation of the response time of the "ontology service" and the "publish service" after 17:46:00, which may be explained by a peak in the demand on the bus. However, both services recover after a while, as shown by the negative values in the chart.
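The convergence of the "ontology service" failure rate to 20% can be reproduced with a short deterministic calculation (a sketch, not the actual test harness), treating every fifth request as unanswered:

```java
// Deterministic reproduction of the "ontology service" behaviour: every
// fifth request is left unanswered, so the cumulative failure rate tends
// to 1/5 = 20% as the number of requests grows.
public class FailureRateDemo {

    /** Cumulative failure rate after n requests when every 5th request fails. */
    public static double failureRate(int n) {
        int failures = n / 5; // requests 5, 10, 15, ... are unanswered
        return (double) failures / n;
    }
}
```

For small n the rate oscillates below 20% and only stabilizes as the sample grows, which matches the shape of the curve in Figure 13.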
The continuous monitoring of these KPIs makes it possible to detect these problems and to trigger warnings for the implementation of proper actions that will mitigate their negative impact.

Inter-iESB Use Case
The objective of the second use case is to demonstrate the use of the Data Analysis component.
Two instances of the iESB were launched and joined using the Node Management module present in each of them. In the first iESB, 30 services were deployed, as illustrated in Figure 15. The services s10, s20 and s30 were conditioned to fail 1/10 of the requests, and no failures were forced for the other services. A request was sent to each service every second. In the second iESB, the native services were running along with the LCMM and the Node Management module (see Figure 16). It should be noted that the occupancy is increasing for the most problematic service (i.e. service 30) and that the automatic mode option was selected so that the system itself addresses the recovery from problematic situations. In automatic mode, after detecting a problem, the LCMM sends messages to the Node Management module whenever appropriate. One of the possible requests is the uninstallation of a service from a node (in this case the first iESB) and the installation/registration of the service in another available node (in this case the second iESB). The Node Management module deploys and undeploys the packages containing the service files by copying/deleting them between the file system and the deployment folder of the nodes.
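The decision taken in automatic mode can be sketched as a simple selection policy: among the services whose failure rate exceeds a threshold, pick the most occupied one as the candidate to hand over to the Node Management module. The class, threshold and selection rule below are hypothetical illustrations, not the actual LCMM code:

```java
// Hypothetical relocation rule for the automatic mode: select the failing
// service with the highest occupancy as the one to move to another iESB node.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class RelocationPolicy {

    public static class ServiceState {
        final String name;
        final double failureRate;
        final double occupancy;

        public ServiceState(String name, double failureRate, double occupancy) {
            this.name = name;
            this.failureRate = failureRate;
            this.occupancy = occupancy;
        }
    }

    static final double FAILURE_THRESHOLD = 0.05; // assumed value

    /** Service to hand over to the Node Management module, if any. */
    public static Optional<String> serviceToRelocate(List<ServiceState> states) {
        return states.stream()
                .filter(s -> s.failureRate > FAILURE_THRESHOLD)
                .max(Comparator.comparingDouble(s -> s.occupancy))
                .map(s -> s.name);
    }
}
```

In the scenario above, service 30 (failing and increasingly occupied) would be selected and relocated to the second iESB.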
After setting the automatic mode for the most troublesome service, the system was allowed to evolve for a further 5 minutes. Figure 18 shows a detail of the list of services deployed on the second iESB. As can be seen, service30 is now installed/registered in the second iESB.

Figure 18: Services deployed in the second iESB after the system's self-organization.
This demonstrates the ability of the LCMM to detect problematic situations and trigger actions in Node Management module, allowing the system's self-organization in an automated way.

Conclusions and Future Work
The use of ESB middleware makes it possible to implement distributed systems that integrate loosely-coupled heterogeneous IT infrastructures. The ESB provides several functionalities, namely the monitoring and control of the routing of messages exchanged between software applications that expose their functionalities as services. Regarding the life-cycle management of services, the ESB usually only provides basic functions associated with the service registry and completely misses advanced functionalities (e.g., data analytics).

Conclusions
This document described a service life-cycle management module that extends the traditional ESB features to provide advanced monitoring capabilities and data analytics to the registered services, contributing to achieve more robust and self-organized SOA-based systems.
The proposed module was implemented as a JBoss ESB service, using Java, and the user interface was developed as a web-based application built on the Liferay Portal. Several functions were implemented that allow monitoring the health of services according to pre-defined KPIs and also detecting trends and patterns in the service performance.
The experimental implementation provided a proof of concept, by exploring two use cases related to the ARUM project. In the first one, the effect of a peak in demand and its detection by the LCMM were observed, highlighting the utility of the Event Monitoring component. In the second use case, the system's self-organization induced by the LCMM after the detection of the worst-performing services was observed. Both use cases demonstrate the usefulness of the LCMM in making the system more robust and self-organized.