Datasports on Software Development

Articles and updates from Datasports about the craft of software

Thoughts on Encapsulation Schemes, Part 1.


1. Introduction

StreamBase has a set of related features which allow the quick and relatively painless refactoring of logic into modules, and several mechanisms for reusing those modules in your code. These are very powerful features which are an important part of what makes StreamBase development so productive compared to other environments like .NET or J2EE.

Despite being incredibly powerful and flexible, the mechanisms for encapsulating and re-using logic, and the many options related to concurrency, are accessible and straightforward to use. While the StreamBase platform handles an enormous amount of plumbing that would otherwise fall to the application developer, due care must still be taken to choose the approach that yields correct behavior with the highest possible performance.

This article provides an overview of the different hosting types, along with some heuristics about when each approach makes sense.

This discussion considers encapsulation schemes along two axes: hosting types (discussed in this article), and concurrency settings (discussed in Part 2 of this article). Not every possible combination is valid, but in general the type of hosting and the concurrency options are independent.

2. Hosting Types

Once logic has been encapsulated in a module, there are 4 basic ways of hosting it (which may actually be combined under some circumstances). Those hosting mechanisms are discussed in this section.

2.1 Module Reference

The simplest hosting type is a Module Reference. This is what the StreamBase EventFlow editor creates when you drag a module from the Package Explorer to the canvas. With a Module Reference, you can create and name the instance and set instance-specific module parameters. By default, the logic within a Module Reference runs in the same thread as the containing module (unless the referenced module explicitly or implicitly creates threads of its own). A Module Reference can be configured to run in its own thread, or to have multiple threads (more on this in Part 2), but in that case all of the parallel instances will have the same values for their module parameters.

An important property of Module References is that the definitions of the input streams are not just checked at design time, they are actually defined at design time. You can create a module with one or more input streams that have dummy schemas defined, and then those schemas will be overridden by the schemas of the streams connected up to them when the module is referenced.

As a very simple example, consider a module which only adds the current time to tuples passed through. It makes no reference to any existing fields. We can define the schema of the input stream to be something simple, preferably indicating that the schema is to be overridden by the containing module:

[Figure: Dummy input schema in AddCurrentTime.sbapp]

The map operator adds one field, CurrentTime, to the input tuple. We see that the output schema reflects this:

[Figure: Dummy output schema in AddCurrentTime.sbapp]

Now we use 2 instances of this module in a parent module as follows:

[Figure: Overview of containing module]

No schema in this containing module corresponds to the dummy schema in the contained module. The building and typechecking mechanism will ensure that each module instance can work with the provided input schema, and calculate the correct output schema. The input schema for this module is as follows:

[Figure: InEvents schema]

Tuples received through this stream will first be sent unchanged to AddCurrentTimeRef1 and out the OutEvents stream, then through the second branch of logic where ReformatEvent changes the schema and the content of the tuple, and passes it to AddCurrentTimeRef2 and out the OutReformattedEvents stream.

Here is how ReformatEvent modifies the schema and content of its incoming tuple:

[Figure: The output from ReformatEvent]

NOTE: AddCurrentTimeRef1 and AddCurrentTimeRef2 have different input and output schema definitions, despite being instances of the same module. The implicit specification of the 2 different input schemas happens at design/build time, not at run time.

There are even ways to define a module so that certain operations can be performed on the unspecified fields, but that is an advanced topic beyond the scope of this article.

This results in the following output schemas:

[Figure: OutEvents schema, based on InEvents plus the CurrentTime field]

[Figure: OutReformattedEvents schema, based on ReformatEvent plus the CurrentTime field]

NOTE: Consider this flexibility and how best to use it (or avoid it) when defining your modules. If you require this flexibility, then a Module Reference is the only way to get it. If you do not require this flexibility, then consider one of the other mechanisms.

It is possible to define the inputs and outputs of modules using named schemas, which enforces strict matching. So this flexibility is supported by Module References, but it is not required.

2.2 Extension Point

Extension Points are a very powerful and relatively recent feature of the StreamBase platform. This is essentially how polymorphism is implemented in StreamBase. An Extension Point can contain any set of modules, provided all modules implement the same interface.

Interfaces are worth a topic of their own, but the short story is that an interface specifies the set of input and output streams, and referenced Query Tables that must be provided in a module that implements that interface. The names and schemas must match what’s defined in the interface, but the behavior behind those streams and Query Tables is entirely up to the module.

An Extension Point differs from a Module Reference in 3 key ways:

  1. All modules hosted within an Extension Point must have strictly specified schemas.
  2. Multiple instances of the same module hosted within a single line of logic can have different values for their module parameters.
  3. A single line of logic can hit instances of different modules.

NOTE: Consider using an Extension Point if you have a case where you want to be able to switch at design time, startup, or run time between different behaviors that act on the same data. A great example of this would be the encapsulation of data management that could be coming from different sources (e.g. different DB providers, or different DB instances with differing schemas). Another example is something like a market data provider, where subscriptions, quotes, and trades are all represented via normalized schemas, but where the handling of those items is very different depending on the actual market data source.

2.3 Container

A container is the top-level entity for hosting a module. When you host a module in its own container, there is no way to share data constructs with modules in other containers. For this reason, a placeholder Query Table in a top-level module makes no sense, although it is allowed: there is no containing module to bind it to a concrete table. If you are hosting a module in its own container, check carefully that there are no placeholder tables, and if you do find one, make sure the module is not depending on another component to maintain the correct contents of that table.

Consider using a container to host a module if any of the following are true:

  1. The module contains logic or manages data that should be accessible from a single centralized source (e.g. GUI interaction, stats collection, report generation, data access & management).
  2. It contains well-encapsulated functionality that you may want to run or suspend independent of other logical components (e.g. a complex interaction with an external system, such as an order entry session).
  3. The logic or functionality may be desired in some system configurations but not in others (e.g. simulators for external systems, rich metrics gathering and logging, optional features).

By default, all StreamBase applications will have at least one container, called “default”. You may specify any number of additional containers, but they are not without their cost, so use care and the above guidelines when planning container use. There are 2 ways to specify containers and the applications and connections between them, discussed below.

2.3.1 Defined in .sbconf file

In the past, your .sbconf file was the only place to define containers, specify which modules/applications would run in them, and how they would be connected. This is mentioned here for completeness and because you may encounter this in existing systems. This has been deprecated, and the new mechanism (discussed below) is much nicer to use for the following reasons:

  1. It decouples the specification of run-time parameters and configuration items from the structure of the system so that they can be managed independently.
  2. StreamBase Studio provides more support for .sbdeploy files, including design-time name completion and typechecking of connections. This reduces the likelihood of errors being discovered only at run time.

The old mechanism works fine for now, but it will not be supported indefinitely. If you take ownership of an application that defines containers within an .sbconf file, you should plan to migrate the container/connection definitions to an .sbdeploy file. The format for containers within a .sbconf file is:

<runtime>
  <application file="ernie.sbapp" container="Ernie"/>
  <application file="bert.sbapp" container="Bert">
    <container-connection dest="Bert.InSetValues" source="Ernie.OutNewValues"/>
    <container-connection dest="Bert.InGenerateReport" source="Ernie.OutRequests"/>
  </application>
</runtime>

For more complete documentation, see the StreamBase docs.

2.3.2 Defined in .sbdeploy file

NOTE: This is the way to go, for the reasons mentioned above. Unless you are dealing with a legacy system and/or an outdated version of the StreamBase platform, there is no reason to ever define containers anywhere other than in a .sbdeploy file.

The format of .sbdeploy files is documented in the StreamBase Deployment File XML Reference.
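As a sketch of what this looks like in practice, the Ernie/Bert example from the .sbconf section above might be expressed in an .sbdeploy file roughly as follows. The element and attribute names here are reproduced from memory, so treat this as illustrative only and verify the details against the StreamBase Deployment File XML Reference:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<deploy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <runtime>
    <!-- Bind each module to a named container -->
    <application container="Ernie" module="ernie.sbapp"/>
    <application container="Bert" module="bert.sbapp">
      <!-- Studio can name-complete and typecheck these connections -->
      <container-connection dest="Bert.InSetValues" source="Ernie.OutNewValues"/>
      <container-connection dest="Bert.InGenerateReport" source="Ernie.OutRequests"/>
    </application>
  </runtime>
</deploy>
```

Note how the deployment file carries only the structure of the system; run-time parameters and configuration items stay in the .sbconf file, which is exactly the decoupling described above.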

2.4 Separate Instance

StreamBase supports communication between separate server instances, either on the same machine, or remote. This is managed through the StreamBase to StreamBase Input and StreamBase to StreamBase Output adapters, or through remote container connections in the .sbdeploy file. By using these mechanisms, connections between server instances can be established in a way that is similar to the connections between containers within a single server instance.

The only difference between using the StreamBase to StreamBase adapters and specifying remote connections in the .sbdeploy file is the optional event port on the adapter. This event port reports connections, disconnections, timeouts, etc. I strongly recommend using the adapters and making use of this mechanism. At a minimum, it provides a clear way to reflect the health of the connection in whatever monitoring tool you create to manage the application. Going further, these events can be coupled with a StreamBase Admin operator to provide a mix of automatic and manual management of the remote system.

While similar to an inter-container connection within one server instance, an inter-server connection using the special adapters differs in the following ways:

  1. One end of the connection needs to use the appropriate StreamBase to StreamBase adapter.
  2. There is an extra burden on the application developer to handle cases where the remote system may not be available.
  3. There is no design time or build time typechecking of the stream schemas, so problems can only appear as run time errors.
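To make the .sbdeploy route concrete, a remote container connection names a stream on another server instance with a URI. The fragment below is a hypothetical sketch — the sb://host:port/container.stream URI form and the attribute names should be verified against the Remote Container Connection Parameters documentation:

```xml
<runtime>
  <application container="default" module="corelogic.sbapp">
    <!-- Subscribe to a stream on a remote server instance.       -->
    <!-- No design or build time typechecking happens across this -->
    <!-- boundary, so schema mismatches surface only at run time. -->
    <container-connection dest="default.InVendorEvents"
                          source="sb://vendorbox:10000/default.OutEvents"/>
  </application>
</runtime>
```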

Consider using separate server instances if any of the following are true:

  1. You have a requirement to distribute processing across multiple machines in order to make best use of computing infrastructure.
  2. You have a requirement for a component to run remotely (such as co-located at a client or vendor site).
  3. You want to provide the greatest possible insulation between two or more components of your system (e.g. you may have a crash-prone adapter provided by a vendor and you want to ensure that the adapter instance can not kill your core business logic).

The details of setting up a StreamBase to StreamBase connection are an advanced topic, but if you are interested in doing this, then consider the following guidelines:

  1. If possible, have only one of the connected systems be aware of the remote connection. i.e. only one of the two connected server instances uses StreamBase to StreamBase adapters.
  2. Use configuration items in the .sbconf file to specify the remote server and stream details. Never code these values as literals in the .sbapp files.
  3. Include monitoring, startup, and shutdown mechanisms for all remote server instances in your implementation.
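Guideline 2 amounts to externalizing the connection details. As a hypothetical illustration (the parameter names RemoteServerURI and RemoteStreamName are invented for this example, and the exact element names should be checked against the sbconf XML reference), the idea is:

```xml
<streambase-configuration>
  <operator-parameters>
    <!-- Referenced from the .sbapp, never hard-coded as literals -->
    <parameter name="RemoteServerURI" value="sb://vendorbox:10000"/>
    <parameter name="RemoteStreamName" value="default.OutEvents"/>
  </operator-parameters>
</streambase-configuration>
```

With the server and stream details held here, the same .sbapp files can be pointed at development, test, and production endpoints without modification.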

The mechanisms for communication between separate server instances, along with the related adapters and operator, are documented here:

  1. Remote Container Connection Parameters
  2. StreamBase to StreamBase Input Adapter
  3. StreamBase to StreamBase Output Adapter
  4. StreamBase Admin Operator

Conclusion

In this article, we have presented an overview of the different ways of hosting logic that has been encapsulated into a .sbapp module, as well as some guidance on when and how to use the different approaches. This is a very rich subject worthy of deeper discussion, but hopefully this article will help both new and experienced StreamBase developers think analytically about how best to encapsulate and host the logic components in their systems.

If you have found this article interesting or helpful, please take some time to post your comments, questions, or suggestions on this blog.

Thanks,
Phil Martin
Datasports Inc.


Written by datasports

Oct 4, 2011 at 10:46 PM

Posted in Best Practices
