SOA: Responsibilities of Service Providers (Part 8)

Responsibility 7:  Provide business eventing, where applicable (e.g. asynchronously via JMS).

This blog post is one in a series.  An overview and general outline of this series is linked here.


In a nutshell, according to Wikipedia, an Event Driven Architecture (EDA) is one that promotes the publication of, consumption of, and subsequent reaction to events.  Publishing applications are independent of, and decoupled from, one to many consumer applications.  Consumption is usually performed asynchronously, after the producer's unit of work is complete.  Java Message Service (JMS) is a common asynchronous messaging specification.  An emerging asynchronous messaging specification is AMQP.  While it is possible to implement synchronous event processing, tread carefully.

EDA is complementary to the objectives of SOA but this is sometimes misunderstood.  The complementary relationship might be best described with an example:
  • An AccountService is invoked to Open an Account.
  • Once all the required attributes are validated, the account is created.  The customer proceeds to use the account to transact business.
  • As part of creating and committing the Service's unit of work, a "business event" is published to JMS.  The event contains the channel (e.g. 800 number, sales person, or website) used to open the account and other basic meta data about the account that was opened.
  • Asynchronously and in parallel, one or more separate consumer applications receive the business event as a notification that an account was opened.
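To make the publish step concrete, here is a minimal sketch in Python. An in-memory queue stands in for a JMS destination (a real system would use a JMS provider such as ActiveMQ), and names like `open_account` and the event fields are illustrative, not a real API:

```python
import json
import queue
from datetime import datetime, timezone

# In-memory stand-in for a JMS topic/queue.
event_channel = queue.Queue()

def open_account(customer_id, channel):
    """Create the account (the service's unit of work), then publish
    an 'Account Opened' business event with basic metadata."""
    account = {"accountId": "A-1001", "customerId": customer_id}
    # ... validate required attributes and commit the unit of work here ...
    event = {
        "eventType": "AccountOpened",
        "accountId": account["accountId"],
        "channel": channel,  # e.g. "800 number", "sales person", "website"
        "occurredAt": datetime.now(timezone.utc).isoformat(),
    }
    event_channel.put(json.dumps(event))  # published as part of the commit
    return account

open_account("C-42", "website")
consumed = json.loads(event_channel.get())  # a consumer receives the event
print(consumed["eventType"])  # -> AccountOpened
```

Any number of consumer applications could subscribe to this event without the account service knowing or caring who they are.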

Let's continue the example and assume two applications receive the event that was published and process the event independently and in parallel.

Application #1 receives the event and sends a "thank you" email or letter via Postal Mail, depending on the customer's contact preferences.  This correspondence may include invoice and payment terms, return policies, etc. that are specific to the account.

In parallel, Application #2, a SalesForce Automation (SFA) application, receives the event. If the account meets certain criteria (e.g. it is for a corporation) the sales person assigned to the customer's city, state, region, etc. is notified.  The SFA application might analyze/classify the account and make key details readily accessible to the sales person--examples include:
  1. This is an account for a customer that has never transacted business before, or this is a second account for a customer that has not transacted business since X date.
  2. The customer's business is in industry Y per SIC code 1234.
The SFA application might even alert the sales person's Blackberry if the account meets criteria specified in his/her alert preferences within the SFA app.  If so, this could be done automatically in a matter of seconds or minutes after the account was opened.

An event-driven approach is more effective than the SFA application waking up on a timed interval to poll the customer service database for new accounts, or than the customer service database creating a daily extract of new accounts to send to the SFA application.

A few benefits of asynchronous messaging include:
  1. An event-driven architecture provides the ability to react to business events as they occur.
  2. Consumers are decoupled from producers.  If a specific consumer application experiences a planned or unplanned outage, events simply queue up for it.  When the application becomes available again, all messages queued during the outage are processed.
  3. An application may process messages off the queue in parallel.  This is achieved by creating multi-threaded consumers that connect to the queue.  This is a simple means to achieve increased throughput.
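Benefit 3 can be sketched with the Python standard library, using `queue.Queue` and a small pool of consumer threads; the message "processing" here is a placeholder for real work:

```python
import queue
import threading

work = queue.Queue()
results = []
lock = threading.Lock()

def consumer():
    """One consumer thread: drain messages until a shutdown sentinel."""
    while True:
        msg = work.get()
        if msg is None:          # sentinel: shut down this consumer
            work.task_done()
            return
        with lock:
            results.append(msg.upper())  # stand-in for real processing
        work.task_done()

# Four multi-threaded consumers connected to the same queue.
threads = [threading.Thread(target=consumer) for _ in range(4)]
for t in threads:
    t.start()

for i in range(100):             # producer publishes 100 messages
    work.put(f"event-{i}")
for _ in threads:                # one sentinel per consumer
    work.put(None)

work.join()                      # wait until every message is processed
print(len(results))  # 100
```

Increasing throughput is then a matter of raising the thread count (or adding consumer processes), with no change to the producer.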

Typically, a business service parallels the durable business process it supports. As part of analysis, it is useful to identify the actors that initiate or interact with the business process and also the events (outcomes) that result. Events are often described as a noun plus a verb (or vice versa if you prefer). It is useful to identify and catalog these events as part of business process analysis, capability mapping, etc. Examples of business events:
  • Account Opened
  • Customer Order Placed
  • Customer Order Shipped
  • Inventory Replenished

When required, the service provider carries the responsibility to successfully publish the appropriate business events to the messaging infrastructure.  The general objectives and challenges are very similar to those of SOA, so there should be a common approach/strategy.  For eventing/messaging to be effective as an enterprise resource, publishing business events that are of high value across the enterprise requires planning and coordination across the organization.

While we've only covered some basics, an effective approach for business eventing is required to springboard into other areas such as event correlation, complex event processing, etc.  To use a football analogy, it is important to get blocking and tackling right before focusing too much on the other complexities of the game.

SOA: Responsibilities of Service Providers (Part 7)

Responsibility 6:  Implement formalized change management. And implement interface versioning to limit the number of clients impacted by changes (decoupling).

This blog post is one in a series.  An overview and general outline of this series is linked here.


Service Oriented Architecture (SOA) typically targets reducing duplication and increasing reuse as a means to promote a more cost effective and agile operating model.

As part of growing a SOA, we hope applications use existing services to solve business problems.  Over time this translates to a service having interface dependencies with many client applications.  Conversely, if a service only interacts with a single client this is not optimal as point-to-point solutions are not the primary focus of SOA.

As the number of interface dependencies grows for a service, there can be challenges.  Examples:
  1. Implementing changes to the interface definition can be a lengthy "ocean boiling" experience when many client applications are impacted.
  2. A change is needed to a service to meet a new set of requirements.  How do you pinpoint all applications that are impacted?  And can we spend less time and energy assessing impact?
  3. Unexpected outages can occur if impact analysis is inaccurate.


It is the responsibility of the service provider to follow a standard, generally accepted versioning strategy.  This consistency is in keeping with Responsibility 4.  A complete strategy supports major versions (breaking changes) and minor versions (compatible changes within a given major) of the interface definition or contract.  The service provider must assess whether a change to the interface contract is a major or minor change.

A minor change is a compatible change.  For example, adding an optional element to the service's request or response for a customer's "maiden name" would be considered a minor change.  Only clients needing visibility to "maiden name" would need to upgrade to adopt the new interface version.  All other clients will continue to work without any changes.
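From the client's perspective, what makes this a compatible change is that existing consumers simply ignore the addition. A minimal sketch with a hypothetical client binding that reads the fields it knows and tolerates optional additions:

```python
def parse_customer(payload):
    """Client binding for v1.x of the interface: reads required fields
    and tolerates optional additions introduced in minor versions."""
    return {
        "firstName": payload["firstName"],        # required in v1.0
        "lastName": payload["lastName"],          # required in v1.0
        "maidenName": payload.get("maidenName"),  # optional, added in v1.1
    }

v10 = {"firstName": "Ada", "lastName": "Lovelace"}
v11 = {"firstName": "Ada", "lastName": "Lovelace", "maidenName": "Byron"}

# The same client code handles both minor versions of the response.
assert parse_customer(v10)["maidenName"] is None
assert parse_customer(v11)["maidenName"] == "Byron"
```

A breaking change -- renaming `lastName` or making `maidenName` required -- would instead force a new major version and a coordinated client migration.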

It is important to identify opportunities to leverage minor versioning.  These changes can be implemented much more easily than major changes, but always in keeping with Responsibility 5.  No hacks please :-).

The service provider must maintain effective communication with all application owners that invoke a given service.  It is the responsibility of the service provider to ensure awareness and coordinate all upcoming changes.

Leveraging security is an effective method to identify client applications that invoke a given service.  Air-tight impact analysis can be achieved by all services implementing a common enterprise-wide security mechanism.  This can be justified through the benefits of enabling effective impact analysis and through Responsibility 4.  The basic mechanics include:
  • Each client application should be issued a unique identifier or token to use as a credential.  This is passed by the client to all services it invokes.
  • The service authorizes client-specific access using the unique identifier or token.  An authorization rule must exist for a client to invoke a service successfully--anonymous access should not be allowed.
  • Consider measures to prevent the client's unique identifier or token from being spoofed or shared among multiple applications. Addressing this is part technology (think public key encryption) and part governance (or "g8e" for those that cringe at the term governance :-).
  • Authorization rules (and secondarily usage metrics) are used to perform air-tight impact analysis.
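The mechanics above can be sketched in a few lines. The client tokens and the rule table here are hypothetical placeholders for a real enterprise security mechanism:

```python
# Hypothetical authorization rules: client token -> services it may invoke.
auth_rules = {
    "client-app-A": {"AccountService", "OrderService"},
    "client-app-B": {"AccountService"},
}

def authorize(client_token, service_name):
    """An authorization rule must exist; anonymous access is not allowed."""
    return service_name in auth_rules.get(client_token, set())

def impacted_clients(service_name):
    """Air-tight impact analysis: which clients may invoke this service?"""
    return sorted(c for c, svcs in auth_rules.items() if service_name in svcs)

assert authorize("client-app-A", "OrderService")
assert not authorize("unknown-app", "AccountService")  # anonymous rejected
print(impacted_clients("AccountService"))  # ['client-app-A', 'client-app-B']
```

Because every invocation must match a rule, the rule table doubles as a complete inventory of interface dependencies -- exactly what change management needs.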

View security as your friend in managing interface dependencies between all clients and services.

In a later series, I plan to blog about the details of an effective major/minor interface versioning strategy.  An effective versioning and decoupling strategy is a key component to realize the benefits of agility and reduced complexity. 

    SOA: Responsibilities of Service Providers (Part 6)

    Responsibility 5:  Implement business rules and edits to ensure the validity/integrity of the operation and any data that is mastered.

    This blog post is one in a series.  An overview and general outline of this series is linked here.


    Okay so you might think this responsibility is an obvious one.  It is.  But it shouldn't be overlooked.  A service that does not implement the complete set of edits breeds unexpected results, complexity, and other issues.  Interfacing apps and end-users dependent upon accessing resulting data later (and even using a downstream enterprise data warehouse) are left holding the proverbial bag.


    There are a number of methods to implement edits and rules.  The nature of the service will drive the technical approach.  Here are a few examples:
    1. Service uses XML Schema Validation to ensure a valid request has been submitted.  Much easier than writing code!!  See best practices below.
    2. Edits and rules implemented within code.  Typically this is useful for things that cannot be expressed via XML Schema.  The unit of work doesn't commit unless all edits and rules pass successfully as part of processing the request.
    3. As part of processing a request, a rules engine is invoked (either across the network as a chained service call or local to the service) to ensure validity.  Only insert a rules engine when it makes sense to do so.
    4. After a service is invoked, the underlying business process requires human action before a "logical unit of work" is complete.  This is best addressed with a human workflow engine to route requests for review and action.  On subsequent queries, a best practice is for the service to indicate the workflow status for any in-flight requests.
    Often a service will leverage one or more of these strategies.  For instance, it is common to use both 1 and 2 together.

    As a best practice, XML oriented services should leverage the richness of XML Schema to describe what constitutes a valid request and response. Examples of items that can be enforced include:
    • One to many relationships
    • Required vs. optional values
    • Dependent values (if X exists, Y and Z are required also)
    • Enumerated list of valid values for attributes/elements
    • Types and patterns.  Minimum and maximum lengths
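As an illustration, here is a hypothetical schema fragment for an account-opening request. The element names are invented, but each construct demonstrates one of the items above (required vs. optional values, enumerated values, and patterns) being enforced by the schema itself, before any code runs:

```xml
<!-- Hypothetical fragment; assumes the usual xs prefix is bound to
     http://www.w3.org/2001/XMLSchema in the enclosing xs:schema element. -->
<xs:element name="OpenAccountRequest">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="customerName" type="xs:string" minOccurs="1"/>
      <xs:element name="maidenName" type="xs:string" minOccurs="0"/>
      <xs:element name="channel">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="WEB"/>
            <xs:enumeration value="PHONE"/>
            <xs:enumeration value="SALES"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="postalCode">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{5}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

A request missing `customerName`, using an unknown `channel`, or carrying a malformed `postalCode` is rejected by schema validation with no hand-written edit code.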

    With both SOAP (within WSDL) and REST, XML Schema is the mechanism to outline an interface contract to all clients.  A rich contract also enables a client to generate rich language specific bindings.  This provides an object based representation to easily construct requests and parse responses to/from the service.

    Quite often I encounter opportunities to strengthen the XML Schema definition to increase the effectiveness of an interface contract.  In my view, managing a service's interface definition is a critical activity.  On the surface this seems like a mundane task; however, when done properly the benefits are quite visible.  This is especially the case when a robust interface versioning strategy is utilized to seize decoupling opportunities.

    SOA: Responsibilities of Service Providers (Part 5)

    This blog post is one in a series.  An overview and general outline of this series is linked here.

    Responsibility 4:  Implement common interface format and semantics applicable to the type of service.


    Imagine.  Your trusted business partners come to you with a new problem and clearly a user interface is the vehicle to provide a solution for them.  Behind the user interface there are a number of existing services that can be leveraged to perform 75% of the heavy lifting.  Sounds great, right?

    Stay with me.  Imagine.  The existing services were built independently across multiple development teams.  Some were even built by contractors and consultants during various engagements.  There was little coordination across the organization resulting in each service having its own "look and feel" so to speak.  I view this as an organization that simply has "lots of services" and not a "Service Oriented Architecture" (SOA).

    So what's the big deal?  Well, without a level of coordination and standardization for how services should be built, development teams must conform to every difference embodied by each service. Technical differences across multiple services may include:
    • Different authentication and authorization mechanisms (or no security at all)
    • Different protocols (e.g. SOAP/http, REST/http, tcp sockets, Java RMI)
    • Different payload formats (e.g. XML vs ASN.1) and conventions
    • Different versioning strategies (or none at all) /* TODO blog later */
    • Different methods/mechanisms to return error messages (e.g. SOAP Faults, Remote Exceptions, or returned in a response to be interrogated)

    With each service having a little different look and feel, constructing the user interface and leveraging the existing services can be tedious.  And if care isn't taken the user interface code can easily become more complicated given these differences.


    I've seen a couple examples where a formalized technical "blue-print" for how services should be constructed across the enterprise has yielded significant benefits.  First of all, the service developers are not left to make their own fine-grain decisions for how services should be constructed on a project-by-project basis.  Rather, focus can be redirected towards business logic and the overall solution.  Too many times I've seen developers get wrapped around the axle on questions such as "so, how do we implement security for our service?"  When these details are landed in a blue-print it provides a platform for the I.T. organization to reach a common understanding and agreement for how services will be built technically. Question marks are minimized for developers and consistency is the net result.

    From my experience, it is a best practice to implement the underlying service blue-print in a common framework.  This is the technical foundation that all services are built on top of.  At a high level, the key benefits include:
    1. A rich service framework increases the odds that services will be constructed with similar "look and feel".  As a best practice, the items enumerated in the previous section should be isolated from business logic as much as possible by the framework.
    2. Developers leverage a common code library to implement services.  Focus and energy can be redirected to service-specific interface definition, business logic, and configuration.
    3. Hard problems can be solved once and embodied within the framework to be consumed by all.
    4. When a change is needed to the underlying implementation, it can be made once within the framework and incorporated by each development organization.
    The goal of a common blue-print and framework for service development is to enable each development team to roll out services across the enterprise with a measure of consistency.  Ultimately, this consistency enables clients/consumers to reap the benefits of reduced effort and complexity when interacting with multiple services across the enterprise.
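A toy sketch of the framework idea, with invented names: the base class owns authorization and a uniform error format, so each service supplies business logic only:

```python
class BaseService:
    """Hypothetical framework base class: cross-cutting concerns
    (security, error semantics) live here, not in each service."""

    def __init__(self, auth_rules):
        self.auth_rules = auth_rules  # set of client tokens allowed in

    def invoke(self, client_token, request):
        if client_token not in self.auth_rules:
            return {"status": "error", "message": "unauthorized"}
        try:
            return {"status": "ok", "body": self.execute(request)}
        except Exception as exc:
            # every service returns errors the same way
            return {"status": "error", "message": str(exc)}

    def execute(self, request):
        raise NotImplementedError  # supplied by each concrete service

class CurrencyService(BaseService):
    def execute(self, request):
        return request["amount"] * 0.88  # business logic only (toy rate)

svc = CurrencyService(auth_rules={"client-app-A"})
print(svc.invoke("client-app-A", {"amount": 100}))
# {'status': 'ok', 'body': 88.0}
```

Every service built on the base class automatically shares the same security and error "look and feel", which is precisely the consistency the blue-print is after.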

    Now, think about the earlier scenario.  In taking this approach, the development team will invoke the first service as part of building their user interface.  As the team proceeds to invoke their second, third, and fourth service it will be done just like the first.  The team will not have to pause to learn anything new or implement semantics that are unique to each service.

    Formalizing a common blue-print to reach agreement across the organization can be challenging.  Allocating time and priority to build out a common framework is well worth the effort.  The rewards can be very tangible and ultimately contribute to achieving a high-value Service Oriented Architecture.

    SOA: Responsibilities of Service Providers (Part 4)

    This blog post is one in a series. An overview and general outline of this series is linked here.

    Responsibility 3:  Publish and commit to a defined level of service (SLA). Publish planned vs. actual performance, and availability metrics.


    "You manage things; you lead people." -Grace Hopper, pioneer of the compiler

    "If you can not measure it, you can not improve it." -Lord Kelvin, British scientist

    It is difficult to effectively manage anything significant without metrics.  Managing performance and availability should be treated no differently, especially when very aggressive requirements are in play.  In a Service Oriented Architecture, services are the "things" to manage.

    Ineffectively managing performance and availability can create problems.  A couple examples include:
    • Critical business functions may be negatively impacted when availability and performance levels are unknown.  When interfaces are created from clients to services that do not support the required service level, it can result in business and/or direct customer impact.
    • Lack of confidence in a service.  This can result from clients/consumers experiencing unknown, sporadic outages and performance issues.  When confidence is lacking this becomes a barrier to adopting shared services and achieving reusability.


    Historically at FedEx, Customer-Supplier Alignments have been a useful tool to synchronize needs and expectations between groups that are dependent upon one another.  This parallels a Service Level Agreement (SLA) in concept.  With either, the bottom line is effective communication and aligning expectations between constituent groups.  Much has been written about SLAs by others.  I will only hit on a few key points.

    Service level management effectively starts as a Design-time activity and requires Runtime enforcement. /* TODO blog details later */  Your focus and mileage may vary depending on how critical the service is in terms of the availability and performance requirements it must meet.

    Capacity Planning

    When a service is being developed, analysis is required to determine the requirements of the service.  This is typically started by analyzing the types of business processes that will be supported.  A service supporting customers placing orders via the web or 800 number can be very different than supporting back office batch processing.

    Each client application (service consumer) must quantify performance and availability required.  I typically like to quantify requirements using:
    • Requests per second (average and max during peak hour throughput)
    • Response time per request (average and max tolerable response time)
    • Minutes/hours downtime tolerable per hour/day/week/etc.
    • Business impact of not meeting above requirements
      Formal capacity planning/analysis is done to ensure the requirements can be met as-is, by adding hardware, or by taking other measures.  The more aggressive the performance and availability requirements, the more formalized the planning activity should be.
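One simple tool for a first cut at this analysis is Little's law, which relates arrival rate and response time to the number of requests in flight. A sketch with illustrative numbers:

```python
def required_concurrency(peak_requests_per_sec, avg_response_sec):
    """Little's law: requests in flight = arrival rate x average
    response time.  A rough first cut for sizing thread pools or
    instance counts; real planning adds headroom for peaks and failover."""
    return peak_requests_per_sec * avg_response_sec

# Hypothetical requirement: 200 req/s at peak, 0.25 s average response time.
in_flight = required_concurrency(200, 0.25)
print(in_flight)  # 50.0 concurrent requests to plan capacity for
```

If each server instance can comfortably hold, say, 10 concurrent requests, this points toward at least 5 instances before any redundancy is considered.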

      Capturing Metrics

      At run-time, the service should be instrumented to capture actual performance metrics.  These are most useful when captured per client/consumer--this level of detail can always be rolled up to an overall number.  Useful metrics to capture are requests per second and average response time for a given interval.
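A minimal sketch of per-client metric capture for a fixed interval; the class and interval are illustrative, not a real monitoring API:

```python
from collections import defaultdict

class IntervalMetrics:
    """Capture per-client request counts and response times for one
    interval; per-client detail can always be rolled up to a total."""

    def __init__(self, interval_sec=60):
        self.interval_sec = interval_sec
        self.counts = defaultdict(int)
        self.total_time = defaultdict(float)

    def record(self, client, response_time_sec):
        self.counts[client] += 1
        self.total_time[client] += response_time_sec

    def report(self, client):
        n = self.counts[client]
        return {
            "requests_per_sec": n / self.interval_sec,
            "avg_response_sec": self.total_time[client] / n,
        }

m = IntervalMetrics(interval_sec=60)
for _ in range(120):                  # simulate one minute of traffic
    m.record("client-app-A", 0.2)
print(m.report("client-app-A"))       # ~2 req/s, ~0.2 s average
```

In practice the interval would roll over on a timer, with each completed interval's numbers shipped to the metrics store for planned-vs-actual reporting.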

      Alerting when performance falls outside the expected variance is considered a best practice, especially when measures can be taken to address degradation.

      Services hosted by eBay and Twitter, for example, implement "rate limiting" features.  This prevents a run-away client's unplanned volume from impacting the ability to meet service commitments for other clients.  Typically this is implemented in a very simple way to enforce the SLA at run-time.  I prefer two levels of alerting:  a warning threshold, and an absolute ceiling by client that results in requests being turned away for a specified interval.  /* TODO blog details later */
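The two-level idea can be sketched with a simple per-interval counter. The thresholds and reset policy here are illustrative; a production limiter would likely use a sliding window or token bucket:

```python
class RateLimiter:
    """Per-client SLA enforcement with two thresholds per interval:
    a warning level and an absolute ceiling (illustrative numbers)."""

    def __init__(self, warn_at, ceiling):
        self.warn_at = warn_at
        self.ceiling = ceiling
        self.counts = {}

    def allow(self, client):
        n = self.counts.get(client, 0) + 1
        self.counts[client] = n
        if n > self.ceiling:
            return "rejected"   # turn the request away for this interval
        if n > self.warn_at:
            return "warn"       # alert operations, but still serve
        return "ok"

    def reset(self):            # called at each interval boundary
        self.counts.clear()

rl = RateLimiter(warn_at=3, ceiling=5)
statuses = [rl.allow("client-app-A") for _ in range(7)]
print(statuses)  # ['ok', 'ok', 'ok', 'warn', 'warn', 'rejected', 'rejected']
```

Because the counters are kept per client, one run-away consumer is throttled without affecting the service levels of well-behaved clients.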

      Planned vs. Actual performance and availability metrics should be published.  This is a critical tool to assist formal capacity planning.  And while alerting for outage conditions is a must, keeping metrics for total outage minutes, for both planned and unplanned events, is considered a best practice.  These details are very useful for building confidence with current and future consumers.  Also, measurements such as these drive improvements to meet business needs, depending on the criticality of the business processes supported.

      Details regarding service security will not be outlined in this post, but I don't wish to minimize its importance.  A critical success factor is being able to reliably identify each client uniquely.  This assists troubleshooting, enables metrics to be captured reliably at the client level, and allows the client's SLA to be enforced at run-time.  Allowing anonymous or rogue clients to invoke a service can skew metrics and cause other manageability issues. /* TODO blog details later */

      Level of Rigor May Vary

      The level of rigor applied can vary depending on the criticality of the service.  When developing and managing a service that demands high-performance and high-availability, it is difficult to imagine taking on this challenge without considering the key elements of this principle.

      SOA: Responsibilities of Service Providers (Part 3)

      This blog post is one in a series.  An overview and general outline of this series is linked here.

      Responsibility 2:  Meet performance and availability requirements. As requirements and usage patterns change over time, be prepared to adapt.


      Sometimes we encounter situations where an existing service does not meet the performance or availability requirements for a new project.  Resolving the matter often involves a technical constraint of some variety.  Here are a couple examples:
      • The service makes very heavy use of database stored procedures.  This complicates the long-term approach to scalability.
      • The service interface supports very large request/response payload sizes.  The worst case is probably where an "unbounded" size is allowed (e.g. by the XML Schema Definition or XSD).  Under low volumes the service may work fine.  However, the service becomes unstable as volume and number of concurrent requests increases (e.g. due to exhausting the JVM memory allocation).

      Issues such as these and/or organizational concerns can prevent changes from being made in a timely manner to meet requirements.  This can impact the re-usability of a service.


      This requirement, or principle, is front and center so it is clear to service providers the architecture must scale to meet current and future requirements.  Even a re-architecture should not be out of the question for any unforeseen situations.  The service provider must have "ownership" of a solution to meet the purpose and general objectives of a SOA.

      Typical patterns and decision points:

      1. Implement stateless request/response services that scale by adding additional computers or virtual machines behind a load balancer to handle additional volume.
      2. Use of an "intelligent" load balancer to direct requests away from failed instances during a period of outage (as opposed to static DNS round-robin).
      3. Clear separation between data persistence and application tiers to enable each to scale independently.
      4. Business logic is best suited in the application tier where scalability is much easier to achieve. If there is a belief it belongs within a stored procedure to meet performance requirements, evaluate this belief very carefully. (Maybe I'll cover this topic in a later blog post.)
      5. Where applicable, an in-memory caching strategy can and should be employed behind the service interface to meet aggressive requirements.
      6. The application team must make critical decisions around the appropriate level of granularity for the service. For instance, very fine-grained RPC-style invocations are typically not appropriate when the technology is SOAP or REST over http. A more coarse-grained approach is best with these technologies.
      7. Where applicable, enable a client to submit multiple "units of work" in a single request. This will cut down on the number of round trips across the network.
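Item 7 can be sketched as follows; the conversion rate and function names are hypothetical:

```python
def convert_one(amount, rate=0.88):
    """Toy per-item unit of work: convert one amount at a fixed rate."""
    return round(amount * rate, 2)

def convert_batch(requests):
    """Coarse-grain interface: many units of work carried in a single
    request/response, rather than one network round trip per amount."""
    return [convert_one(r) for r in requests]

# One round trip carries three units of work.
print(convert_batch([100, 250, 19.99]))  # [88.0, 220.0, 17.59]
```

For chatty clients, collapsing N fine-grained calls into one batched call removes N-1 network round trips, which is often the dominant cost over http.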
      In a later installment, we will touch on a related topic--Service Level Agreements and measuring planned vs. actual metrics.

      SOA: Responsibilities of Service Providers (Part 2)

      This blog post is one in a series.  An overview and general outline of this series is linked here.

      Responsibility 1:  The functional needs of the enterprise as a whole must be considered to maximize reuse.


      It can be common to find feature/function duplicated across multiple applications and development organizations, especially in companies that have been around a while.  For instance, "address validation", "geocoding", "credit card validation", and "currency conversion" are examples of functionality that might be duplicated.  In each case, the features might differ, the underlying technology might differ, and where applicable underlying software vendors might differ.  Ultimately this duplication translates into increased hardware, license, labor, and maintenance costs.

      What are some reasons duplication exists?

      In using "currency conversion" as an example:
      1. Technology and platform differences:  A distributed application invokes currency conversion directly using a C API.  A COBOL application uses currency conversion on the mainframe, which is inaccessible to distributed applications.
      2. Differing requirements:  An application area supporting North American operations deploys conversion between U.S. and Canadian currencies only.  At some other point in time, another development area supporting international operations requires conversion for a broader set of countries.  Or an application needs "historical" conversion rates but the service owner is only able to support "current" rates.
      3. Differing priorities and timelines:  An existing service cannot be extended to meet the needs of others due to project workload and competing priorities.
      4. Ownership considerations:  An application area is unwilling to take responsibility for any impacts resulting from outages.  Or is unwilling to coordinate changes/testing with any other development groups.
      5. Differing performance and availability requirements:  An application team with more stringent availability or performance needs decides to deploy duplicate capability to maintain control of their own destiny.
      6. Lack of visibility to existing services that are available
      At this point you might see a common thread:  some of these relate to underlying organizational considerations/issues.  Additionally, without a productive governance program, issues are accentuated in organizations operating in vertical silos (e.g. associated to line-of-business) as opposed to certain functions aligned in a more horizontal manner.

      These items are sometimes used as justification for sprouting duplicate functionality.  An archaeological dig into the history of how duplication came to be is educational.  Sometimes the reasons for duplication can be rationalized.  And sometimes the reasons aren't valid at all, especially after a certain amount of time has passed.


      When building a new service or transitioning a legacy environment to common services, it is important to ensure "functional ownership" is clear.  Achieving reuse can have different contexts.  Some services have utility across the enterprise.  Others are more domain-specific but hold the promise of greater use across the enterprise in the future.  This is the reason I slipped the word "enterprise" into Responsibility 1.

      To paint a simple picture, a good "currency conversion" service is one that maximizes reuse for today's use cases as well as enabling future use cases to be supported.  It supports currencies for the countries presently served and can expand to support other countries when needed.  It supports conversions for current conversion rates as well as historical rates if there is a need to do so.  And the architecture and development team are able to flex to meet additional functional, performance, and availability needs to maximize reuse across multiple applications.
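The core lookup of such a service can be sketched with a hypothetical rate table keyed by effective date, so both current and historical conversions are supported and new currency pairs are just additional rows:

```python
from datetime import date

# Hypothetical rate table keyed by (from, to, effective date); a real
# service would master this data and expand coverage as countries are added.
rates = {
    ("USD", "CAD", date(2010, 1, 1)): 1.05,
    ("USD", "CAD", date(2011, 1, 1)): 1.00,
}

def convert(amount, frm, to, on=None):
    """Current conversion uses the latest rate; historical conversion
    uses the rate in effect on the requested date."""
    effective = [d for (f, t, d) in rates
                 if f == frm and t == to and (on is None or d <= on)]
    if not effective:
        raise ValueError(f"no rate for {frm}->{to}")
    rate = rates[(frm, to, max(effective))]
    return round(amount * rate, 2)

print(convert(100, "USD", "CAD"))                       # latest rate: 100.0
print(convert(100, "USD", "CAD", on=date(2010, 6, 1)))  # historical: 105.0
```

The point is the shape of the interface: adding a country or a deeper rate history extends the data, not the contract, so existing clients are untouched as reuse grows.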

      This can be difficult to achieve without organizational considerations and/or a formalized governance process.  Additionally, standing-up a service suitable for the enterprise requires the ability to perform a necessary level of business analysis.  This is to ensure the broad functional needs of the enterprise are identified.

      As briefly mentioned in a prior post, the goals of SOA typically involve reducing duplication and increasing reuse as a means to promote a more cost effective and agile operating model.  It is difficult to meet these objectives without taking at least the spirit of this principle seriously.

      Next time I'll explore...Responsibility 2: Meet performance and availability requirements. As requirements and usage patterns change over time, be prepared to adapt.