Reactive Programming in a Nutshell
In this guest post, Peter Verhas attempts to explain reactive programming with simple yet effective examples, which drive home the concept of reactive programming.
Reactive programming is a paradigm that focuses more on where the data flows during computation than on how to compute the result. The problem is best described as several computations that depend on the output of one another, but if several may be executed independently of the other, reactive programming may come into the picture. As a simple example, we can have the following computation that calculates the value of h from some given b, c, e,and f values, using f1, f2, f3, f4,and f5 as simple computational steps:
a = f1(b,c)
d = f2(e,f)
k = f3(e,c)
g = f4(b,f,k)
h = f5(d,a,g)
If we write these in Java in a conventional way, the methods f1 to f5 will be invoked one after the other. If we have multiple processors and we are able to make the execution parallel, we may also perform some of the methods parallel. This, of course, assumes that these methods are purely computational methods and do not change the state of the environment, and, in this way, they can be executed independently of one another. For example, f1, f2, and f3 can be executed independently of one another. The execution of the f4function depends on the output of f3, and the execution of f5 depends on the output of f1, f2, and f4.
If we have two processors, we can execute f1 and f2 together, followed by the execution of f3, then f4, and, finally, f5. These are the four steps. If we look at the preceding calculation not as commands but rather as expressions and how the calculations depend on one another, then we do not dictate the actual execution order, and the environment may decide to calculate f1 and f3 together, then f2 and f4, and, finally f5, saving one step. This way, we can concentrate on the data flow and let the reactive environment act upon it without putting in extra constraints:
This is a very simple approach of reactive programming. The description of the calculation in the form of expressions gives the data flow, but in the explanation, we still assumed that the calculation is executed synchronously. If the calculations are executed on different processors located on different machines connected to a network, then the calculation may not and does not need to be synchronous.
Reactive programs can be asynchronously executed if the environment is asynchronous. It may happen that the different calculations, f1 to f4, are implemented and deployed on different machines.
In such a case, the values calculated are sent from one to the other over the network, and the nodes execute the calculation every time there is a change in the inputs. This is very similar to good old analog computers that were created using simple building blocks, and the calculations were done using analog signals.
The program was implemented as an electronic circuit, and when the input voltage or current (usually voltage) changed in the inputs, the analog circuits followed it at light speed, and the result appeared in the output. In such a case, the signal propagation was limited by the speed of light on the wires and analog circuitry speed in the wired modules, which was extremely fast and may beat digital computers.
When we talk about digital computers, the propagation of the signal is digital, and this way, it needs to be sent from one calculation node to the other one, be it some object in JVM or some program on the network. A node has to execute its calculation if either of the following apply:
- Some of the values in the input have changed
- The output of the calculation is needed
If the input has not changed, then the result should eventually be the same as the last time; thus, the calculation does not need to be executed again—it would be a waste of resources. If the result of the calculation is not needed, then there is no need to perform the calculation, even if the result would not be the same as the last one. No one cares.
To accommodate this, reactive environments implement two approaches to propagate the values. The nodes may pull the values from the output of other modules. This will ensure that no calculation that is not needed will be executed. The modules may push their output to the next module that depends on them. This approach will ensure that only changed values ignite calculation. Some of the environments may implement a hybrid solution.
When values change in the system, the change is propagated toward the other nodes that again propagate the changes to another node, and so on. If we imagine the calculation dependencies as a directed graph, then the changes travel towards the transitive closure of the changed values along the nodes connected.
The data may travel with all the values from one node output to the other node input, or only the change may travel. The second approach is more complex because it needs the changed data and also meta information that describes what has changed. On the other hand, the gain may be significant when the output and input set of data is huge, and only a small portion of it is changed.
It may also be important to calculate and propagate only the actual delta of the change when there is a high probability that some of the nodes do not change the output for many of the different inputs. In such a case, the change propagation may stop at the node where there is no real change in spite of the changed input values. This can save up a lot of calculation in some of the networks.
In the configuration of the data propagation, the directed acyclic graph can be expressed in the code of the program; it can be configured, or it can even be set up and changed during the execution of the code dynamically. When the program code contains the structure of the graph, the routes and the dependencies are fairly static.
To change the data propagation, the code of the program has to be changed, recompiled, and deployed. If there are multiple network node programs, this may even need multiple deployments that should be carefully furnished to avoid different incompatible versions running on different nodes.
There should be similar considerations when the graph is described in some configuration. In such a case, the compilation of the program(s) may not be needed when only the wiring of the graph is changed, but the burden to have compatible configuration on different nodes in the case of a network execution is still there.
Letting the graph change dynamically also does not solve this problem. The setup and the structure are more flexible and, at the same time, more complex. The data propagated along the edges of the graph may contain not only computational data but also data that drives changes in the graph. Many times, this leads to a very flexible model called higher-order reactive programming.
Reactive programming has a lot of benefits, but, at the same time, it may be very complex, sometimes too complex, for simple problems. It is to be considered when the problem to be solved can easily be described using data graph and simple data propagations. We can separate the description of the problem and the order of the execution of the different blocks. This is the same consideration that we discussed in the previous chapter. We describe more about the what to do part and less about the how to do part.
On the other hand, when the reactive system decides the order of execution, what is changed, and how that should be reflected on the output of other blocks, it should do so without knowing the core of the problem that it is solving. In some situations, coding the execution order manually based on the original problem could perform better.
Note
This is similar to the memory management issue. In modern runtime environments, such as the JVM, Python runtime, Swift programming, or even Golang, there is some automated memory management. When programming in C, the programmer has full control over memory allocation and memory release.
In the case of real-time applications, where the performance and response time is of the utmost importance, there is no way to let an automated garbage collector take time and delay the execution from time to time. In such a case, the C code can be optimized to allocate memory when needed; there is a resource for the allocation and release of memory when possible, and there is time to manage memory.
These programs are better performing than the ones created for the same purpose using a garbage collector. Still, we do not use C in most of the applications because we can afford the extra resources needed for automated memory collection. Even though it would be possible to write a faster code by managing the memory manually, automated code is faster than what an average programmer would have created using C, and also the frequency of programming errors is much lower.
Just as there are some issues that we have to pay attention to when using automated memory management, we have to pay attention to some issues in a reactive environment, which would not exist in the case of manual coding. Still, we use the reactive approach for its benefits.
The most important issue is to avoid loops in thedependency graph. Although it is absolutely perfect to write thedefinition of calculations, a reactive system would probably not be able tocope with these definitions. Some reactive systems may resolve in somesimple-case cyclic redundancy, but that is an extra feature, and we generallyjust have to avoid that. Consider the following computations:
a = b + 3
b = 4 / a
Here, a depends on b, so when b changes, a is calculated. However, b also depends on a, which is recalculated, and, in this way, the system gets into an infinite loop. The preceding example seems to be simple, but that is the feature of a good example. Real-life problems are not simple, and in a distributed environment, it is extremely hard sometimes to find cyclic redundancy.
Another problem is called a glitch.Consider the following definition:
a = b + 3
q = b + a
When the parameter b is changed, for example, from 3 to 6, the value of a will change from 6 to 9, and, thus, q will change from 9 to 15. This is very simple. However, the execution order based on the recognition of the changes may first alter the value of q from 9 to 12 before modifying it to 15 in the second step.
This can happen if the calculating node responsible for the calculation of q recognizes the change in b before the value of a as a consequence of the change in the value of b. For a short period of time, the value of q will be 12, which doesn’t match the previous one and the changed state. This value is only a glitch in the system that happens after an input changes and also disappears without any further change in the input in the system:
If you have ever learned the design of logical circuits, then static hazards may ring a bell. They are exactly the same phenomenon.
Reactive programming also assumes that the calculations are stateless. The individual nodes that perform the calculation may have a state in practice and, in most cases, they do. It is not inherently evil to have a state in some calculation. However, debugging something that has a state is significantly more complex than debugging something that is stateless, and functional.
It is also an important aid to the reactive environment, letting it perform different optimizations based on the fact that the calculations are functional. If the nodes have a state, then the calculations may not be rearranged freely because the outcome may depend on the actual evaluation order. These systems may not really be reactive, or, at least, this may be debated.
If this article piqued your interest in reactive programming and Java in general, you can explore Peter Verhas’s Java Projects – Second Edition to learn the fundamentals of Java 11 programming by building industry grade practical projects. Following a learn-as-you-do approach, Java Projects – Second Edition is perfect for anyone who wants to learn the Java programming language (no prior programming experience required).
Leave a Reply