In traditional source systems, the data is persistently stored and can be requested at any time. This is not the case for streaming-based source systems, where the complete data set is not available on demand. Instead, the required data must be retrieved, processed, and stored in real time by communicating with the streaming service. In Celonis, Realtime Connectivity for the supported streaming services is achieved using the Streaming Cockpit.
To accommodate these streaming source systems, we introduced a different type of extraction architecture. Each connection to a streaming source system is called a "Subscription" in the Streaming Cockpit. Subscriptions can be turned on and off in the Streaming Cockpit UI.
Celonis supports several streaming services, such as Azure Service Bus, Azure Event Hub, and Salesforce Streaming. Because these streaming services are not maintained by Celonis, we act as a “user” of each service. We use a different strategy to retrieve and process data for each streaming service. The following example makes this concrete.
[Figure: Real-time Connectors, Graphic 2]
For example, Salesforce Streaming assigns each event a “replayID” that indicates the position of the event in the stream. Among all the events that Salesforce Streaming currently provides, the replayID tells us where to resume processing. We can also signal how many new messages we received and processed. In case of a temporary failure, the extraction corrects itself and re-executes automatically, starting from the last successfully processed replayID.
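The resume behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Celonis implementation and not the Salesforce API: InMemoryStream, Subscription, TransientError, and flaky_store are hypothetical names, and the stream is simulated in memory. The point it shows is that the stored replayID advances only after an event has been processed successfully, so a retry picks up exactly where the failure happened.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class TransientError(Exception):
    """Hypothetical error type for a temporary failure (e.g. a network blip)."""


@dataclass
class Event:
    replay_id: int  # position of the event in the stream
    payload: str


@dataclass
class InMemoryStream:
    """Hypothetical stand-in for a streaming service that retains events."""
    events: List[Event] = field(default_factory=list)

    def read_after(self, replay_id: Optional[int]) -> List[Event]:
        # Return retained events newer than replay_id (all of them if None).
        return [e for e in self.events
                if replay_id is None or e.replay_id > replay_id]


@dataclass
class Subscription:
    stream: InMemoryStream
    last_replay_id: Optional[int] = None  # last successfully processed event

    def poll(self, handler: Callable[[Event], None]) -> int:
        """Process new events in order and return how many succeeded."""
        processed = 0
        try:
            for event in self.stream.read_after(self.last_replay_id):
                handler(event)
                # Advance the checkpoint only after the handler succeeded.
                self.last_replay_id = event.replay_id
                processed += 1
        except TransientError:
            pass  # keep the checkpoint; the next poll resumes from it
        return processed


def run_demo():
    stream = InMemoryStream([Event(i, f"record-{i}") for i in (1, 2, 3, 4)])
    subscription = Subscription(stream)
    stored: List[int] = []
    state = {"failed_once": False}

    def flaky_store(event: Event) -> None:
        # Fail exactly once on event 3 to simulate a temporary outage.
        if event.replay_id == 3 and not state["failed_once"]:
            state["failed_once"] = True
            raise TransientError("temporary failure")
        stored.append(event.replay_id)

    first = subscription.poll(flaky_store)   # stops at event 3
    second = subscription.poll(flaky_store)  # resumes after event 2
    return first, second, stored, subscription.last_replay_id
```

In this sketch the first poll processes two events and stops at the simulated failure; the second poll reads everything after replayID 2, so no event is skipped and none is stored twice. The return value of poll corresponds to the count of newly processed messages that can be signaled.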
Once we have successfully processed the data, we update our replayID, signal the changes, and let Salesforce Streaming decide whether to keep or delete the messages in its stream. We cannot force a streaming service to delete processed data, because other users might be reading the same stream.
Salesforce Streaming is just one example. Not all streaming services have the same structure, so we tailor our solution to the specific needs of each streaming service to process data efficiently in every case.
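To illustrate why deletion has to stay with the streaming service, the following Python sketch models two independent users of one stream. It is only a simplified model under assumptions: RetainedStream, Consumer, and the retention parameter are hypothetical and do not reflect how any particular service stores or expires events. Each consumer tracks its own replayID, and only the stream decides which events to drop.

```python
from typing import List, Tuple


class RetainedStream:
    """Hypothetical stream that keeps only the newest `retention` events."""

    def __init__(self, retention: int) -> None:
        self.retention = retention
        self._events: List[Tuple[int, str]] = []
        self._next_id = 1

    def publish(self, payload: str) -> int:
        replay_id = self._next_id
        self._next_id += 1
        self._events.append((replay_id, payload))
        # The service, not any consumer, decides which events to drop.
        self._events = self._events[-self.retention:]
        return replay_id

    def read_after(self, replay_id: int) -> List[Tuple[int, str]]:
        return [(rid, p) for rid, p in self._events if rid > replay_id]


class Consumer:
    """Hypothetical user of the stream with its own private checkpoint."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_replay_id = 0  # nothing processed yet (IDs start at 1 here)
        self.received: List[str] = []

    def poll(self, stream: RetainedStream) -> int:
        events = stream.read_after(self.last_replay_id)
        for replay_id, payload in events:
            self.received.append(payload)
            self.last_replay_id = replay_id  # update only our own checkpoint
        return len(events)


def run_two_consumers():
    stream = RetainedStream(retention=3)
    first_user, second_user = Consumer("first"), Consumer("second")

    stream.publish("a")
    stream.publish("b")
    first_user.poll(stream)   # first user processes a and b
    stream.publish("c")
    second_user.poll(stream)  # a and b are still there for the second user
    first_user.poll(stream)   # first user only sees the new event c
    return first_user.received, second_user.received
```

Because the first user never removes anything, the second user still receives every retained event even though it started reading later.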