Month: November 2018

Reactive Programming in a Nutshell

In this guest post, Peter Verhas attempts to explain reactive programming with simple yet effective examples, which drive home the concept of reactive programming.

Reactive programming is a paradigm that focuses more on where the data flows during computation than on how to compute the result. It fits best when a problem consists of several computations that depend on the output of one another, but some of them can be executed independently of the others. As a simple example, we can have the following computation that calculates the value of h from some given b, c, e, and f values, using f1, f2, f3, f4, and f5 as simple computational steps:

If we write these in Java in a conventional way, the methods f1 to f5 will be invoked one after the other. If we have multiple processors and we are able to make the execution parallel, we may also perform some of the methods in parallel. This, of course, assumes that these methods are purely computational methods and do not change the state of the environment, and, in this way, they can be executed independently of one another. For example, f1, f2, and f3 can be executed independently of one another. The execution of the f4 function depends on the output of f3, and the execution of f5 depends on the output of f1, f2, and f4.

If we have two processors, we can execute f1 and f2 together, followed by the execution of f3, then f4, and, finally, f5. These are the four steps. If we look at the preceding calculation not as commands but rather as expressions and how the calculations depend on one another, then we do not dictate the actual execution order, and the environment may decide to calculate f1 and f3 together, then f2 and f4, and, finally f5, saving one step. This way, we can concentrate on the data flow and let the reactive environment act upon it without putting in extra constraints:
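A minimal sketch of how an environment might derive such an order from the dependencies alone. The dependency table restates the relations described above; the function name plan, the workers parameter, and the longest-chain-first heuristic are assumptions made for illustration, not something a particular reactive framework prescribes.

```python
from functools import lru_cache

# Which step needs the output of which other steps (taken from the text above).
DEPS = {
    "f1": [],
    "f2": [],
    "f3": [],
    "f4": ["f3"],
    "f5": ["f1", "f2", "f4"],
}


def plan(deps, workers):
    """Group the steps into rounds that at most `workers` processors run together."""
    dependents = {n: [m for m in deps if n in deps[m]] for n in deps}

    @lru_cache(maxsize=None)
    def chain(node):
        # Length of the longest chain of steps that starts at this node.
        return 1 + max((chain(m) for m in dependents[node]), default=0)

    done, rounds = set(), []
    while len(done) < len(deps):
        ready = [n for n in deps
                 if n not in done and all(d in done for d in deps[n])]
        if not ready:
            raise ValueError("cyclic dependency, no step can be started")
        # Prefer the steps that the longest remaining chain is waiting for.
        ready.sort(key=lambda n: (-chain(n), n))
        batch = ready[:workers]
        rounds.append(batch)
        done.update(batch)
    return rounds


print(plan(DEPS, workers=2))  # [['f3', 'f1'], ['f2', 'f4'], ['f5']]
```

With two workers the sketch arrives at the three-step order mentioned above, because f3 is started first: the longest chain (f3, f4, f5) is waiting for it.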

This is a very simple approach to reactive programming. The description of the calculation in the form of expressions gives the data flow, but in the explanation, we still assumed that the calculation is executed synchronously. If the calculations are executed on different processors located on different machines connected to a network, then the calculation may not and does not need to be synchronous.

Reactive programs can be asynchronously executed if the environment is asynchronous. It may happen that the different calculations, f1 to f4, are implemented and deployed on different machines.

In such a case, the values calculated are sent from one to the other over the network, and the nodes execute the calculation every time there is a change in the inputs. This is very similar to good old analog computers that were created using simple building blocks, and the calculations were done using analog signals.

The program was implemented as an electronic circuit, and when the input voltage or current (usually voltage) changed in the inputs, the analog circuits followed it at light speed, and the result appeared in the output. In such a case, the signal propagation was limited by the speed of light on the wires and analog circuitry speed in the wired modules, which was extremely fast and may beat digital computers.

When we talk about digital computers, the propagation of the signal is digital, and this way, it needs to be sent from one calculation node to the other one, be it some object in JVM or some program on the network. A node has to execute its calculation if either of the following apply:

  • Some of the values in the input have changed
  • The output of the calculation is needed

If the input has not changed, then the result should eventually be the same as the last time; thus, the calculation does not need to be executed again, as it would be a waste of resources. If the result of the calculation is not needed, then there is no need to perform the calculation, even if the result would not be the same as the last one. No one cares.

To accommodate this, reactive environments implement two approaches to propagate the values. The nodes may pull the values from the output of other modules. This will ensure that no calculation that is not needed will be executed. The modules may push their output to the next module that depends on them. This approach will ensure that only changed values ignite calculation. Some of the environments may implement a hybrid solution.
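The two propagation styles can be contrasted in a small sketch. The class names Input, PushNode, and PullNode and the runs counter are hypothetical names chosen for this illustration; real reactive libraries structure this differently.

```python
class Input:
    """A source value; it pushes to listeners only when the value really changes."""

    def __init__(self, value):
        self.value = value
        self.listeners = []

    def get(self):
        return self.value

    def set(self, value):
        if value != self.value:
            self.value = value
            for listener in self.listeners:
                listener.on_change()


class PushNode:
    """Push style: recomputes as soon as an input reports a change."""

    def __init__(self, func, *sources):
        self.func, self.sources = func, sources
        self.listeners = []
        self.runs = 0
        for source in sources:
            source.listeners.append(self)
        self.on_change()  # compute the initial value

    def on_change(self):
        self.runs += 1
        self.value = self.func(*(s.get() for s in self.sources))
        for listener in self.listeners:
            listener.on_change()

    def get(self):
        return self.value


class PullNode:
    """Pull style: computes only when somebody asks for the output."""

    def __init__(self, func, *sources):
        self.func, self.sources = func, sources
        self.runs = 0

    def get(self):
        self.runs += 1
        return self.func(*(s.get() for s in self.sources))


b = Input(3)
pushed = PushNode(lambda x: x + 3, b)
pulled = PullNode(lambda x: x + 3, b)

b.set(6)   # pushed recomputes immediately, pulled does nothing yet
b.set(6)   # no real change, so nothing is recomputed
print(pushed.runs, pulled.runs)   # 2 0
print(pulled.get(), pulled.runs)  # 9 1
```

The pure pull node never works for an output nobody reads, but it recomputes on every read; the push node works only on real changes, but it does so even if nobody reads the result. A hybrid would combine the change check with the on-demand read.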

When values change in the system, the change is propagated toward the other nodes that again propagate the changes to another node, and so on. If we imagine the calculation dependencies as a directed graph, then the changes travel towards the transitive closure of the changed values along the nodes connected.

The data may travel with all the values from one node output to the other node input, or only the change may travel. The second approach is more complex because it needs the changed data and also meta information that describes what has changed. On the other hand, the gain may be significant when the output and input set of data is huge, and only a small portion of it is changed.

It may also be important to calculate and propagate only the actual delta of the change when there is a high probability that some of the nodes do not change the output for many of the different inputs. In such a case, the change propagation may stop at the node where there is no real change in spite of the changed input values. This can save a lot of calculation in some of the networks.
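A minimal sketch of propagation that stops at a node whose output did not really change. Cell and Derived are hypothetical names, and the sign example is an assumption picked because many inputs map to the same output.

```python
class Cell:
    """Holds a value and notifies dependants only when the value really changes."""

    def __init__(self, value=None):
        self.value = value
        self.dependants = []

    def update(self, value):
        if value == self.value:
            return  # no real change: propagation stops here
        self.value = value
        for dependant in self.dependants:
            dependant.recompute()


class Derived(Cell):
    """A cell whose value is computed from one source cell."""

    def __init__(self, func, source):
        super().__init__()
        self.func, self.source = func, source
        self.runs = 0
        source.dependants.append(self)
        self.recompute()

    def recompute(self):
        self.runs += 1
        self.update(self.func(self.source.value))


x = Cell(3)
sign = Derived(lambda v: (v > 0) - (v < 0), x)
label = Derived(lambda s: "positive" if s > 0 else "not positive", sign)

x.update(6)                      # sign is recomputed but stays 1
print(sign.runs, label.runs)     # 2 1
x.update(-2)                     # sign flips, so label is recomputed
print(label.value, label.runs)   # not positive 2
```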

In the configuration of the data propagation, the directed acyclic graph can be expressed in the code of the program; it can be configured, or it can even be set up and changed during the execution of the code dynamically. When the program code contains the structure of the graph, the routes and the dependencies are fairly static.

To change the data propagation, the code of the program has to be changed, recompiled, and deployed. If there are multiple network node programs, this may even need multiple deployments that should be carefully coordinated to avoid different incompatible versions running on different nodes.

There should be similar considerations when the graph is described in some configuration. In such a case, the compilation of the program(s) may not be needed when only the wiring of the graph is changed, but the burden to have compatible configuration on different nodes in the case of a network execution is still there.

Letting the graph change dynamically also does not solve this problem. The setup and the structure are more flexible and, at the same time, more complex. The data propagated along the edges of the graph may contain not only computational data but also data that drives changes in the graph. Many times, this leads to a very flexible model called higher-order reactive programming.

Reactive programming has a lot of benefits, but, at the same time, it may be very complex, sometimes too complex, for simple problems. It is to be considered when the problem to be solved can easily be described using a data graph and simple data propagations. We can separate the description of the problem and the order of the execution of the different blocks. This is the same consideration that we discussed in the previous chapter. We describe more about the what to do part and less about the how to do part.

On the other hand, when the reactive system decides the order of execution, what is changed, and how that should be reflected on the output of other blocks, it should do so without knowing the core of the problem that it is solving. In some situations, coding the execution order manually based on the original problem could perform better.

Note

This is similar to the memory management issue. In modern runtime environments, such as the JVM, the Python runtime, Swift, or even Golang, there is some automated memory management. When programming in C, the programmer has full control over memory allocation and memory release.

In the case of real-time applications, where the performance and response time is of the utmost importance, there is no way to let an automated garbage collector take time and delay the execution from time to time. In such a case, the C code can be optimized to allocate memory when it is needed and resources are available for the allocation, and to release memory when that is possible and there is time to manage memory.

These programs perform better than the ones created for the same purpose using a garbage collector. Still, we do not use C in most of the applications because we can afford the extra resources needed for automated memory collection. Even though it would be possible to write faster code by managing the memory manually, code that relies on automated memory management is faster than what an average programmer would have created using C, and also the frequency of programming errors is much lower.

Just as there are some issues that we have to pay attention to when using automated memory management, we have to pay attention to some issues in a reactive environment, which would not exist in the case of manual coding. Still, we use the reactive approach for its benefits.

The most important issue is to avoid loops in the dependency graph. Although it is perfectly valid to write such definitions of calculations, a reactive system would probably not be able to cope with them. Some reactive systems may resolve cyclic dependencies in some simple cases, but that is an extra feature, and we generally just have to avoid them. Consider the following computations:

Here, a depends on b, so when b changes, a is calculated. However, b also depends on a, which is recalculated, and, in this way, the system gets into an infinite loop. The preceding example seems to be simple, but that is the feature of a good example. Real-life problems are not simple, and in a distributed environment, it is sometimes extremely hard to find cyclic dependencies.
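When the dependencies are available as data, a loop like this can be found before anything is executed. The following is a minimal sketch using a depth-first search; find_cycle is a hypothetical helper, and only the dependency edges between a and b are taken from the text.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if there is none.

    `deps` maps each value to the values it is computed from.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {name: WHITE for name in deps}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for dep in deps.get(node, ()):
            if state.get(dep, WHITE) == GREY:
                # We came back to a node that is still on the current path.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for name in deps:
        if state[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


print(find_cycle({"a": ["b"], "b": ["a"]}))       # ['a', 'b', 'a']
print(find_cycle({"a": ["b"], "q": ["a", "b"]}))  # None
```

In a distributed setting the hard part is collecting the complete dependency table in one place; the search itself stays this simple.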

Another problem is called a glitch. Consider the following definition:

When the parameter b is changed, for example, from 3 to 6, the value of a will change from 6 to 9, and, thus, q will change from 9 to 15. This is very simple. However, the execution order based on the recognition of the changes may first alter the value of q from 9 to 12 before modifying it to 15 in the second step.

This can happen if the calculating node responsible for the calculation of q recognizes the change in b before it recognizes the change in the value of a that follows from the change in b. For a short period of time, the value of q will be 12, which matches neither the previous state nor the changed state. This value is only a glitch in the system that happens after an input changes and also disappears without any further change in the input in the system:
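The glitch can be reproduced in a few lines. The formulas a = b + 3 and q = b + a are inferred from the numbers quoted above (3 gives 6 and 9, 6 gives 9 and 15) and should be read as an assumption; observed_q is a hypothetical helper that records every value q takes for a given update order.

```python
def a_of(b):
    return b + 3


def q_of(b, a):
    return b + a


def observed_q(update_order):
    """Apply the change b: 3 -> 6 and record every value that q takes."""
    state = {"b": 3, "a": 6, "q": 9}
    state["b"] = 6
    seen = []
    for name in update_order:
        if name == "a":
            state["a"] = a_of(state["b"])
        elif name == "q":
            state["q"] = q_of(state["b"], state["a"])
            seen.append(state["q"])
    return seen


# q reacts to b before a has been refreshed: the glitch value 12 shows up.
print(observed_q(["q", "a", "q"]))  # [12, 15]
# Updating in dependency order (a first, then q) never exposes 12.
print(observed_q(["a", "q"]))       # [15]
```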

If you have ever learned the design of logical circuits, then static hazards may ring a bell. They are exactly the same phenomenon.

Reactive programming also assumes that the calculations are stateless. The individual nodes that perform the calculation may have a state in practice and, in most cases, they do. It is not inherently evil to have a state in some calculation. However, debugging something that has a state is significantly more complex than debugging something that is stateless and functional.

It is also an important aid to the reactive environment, letting it perform different optimizations based on the fact that the calculations are functional. If the nodes have a state, then the calculations may not be rearranged freely because the outcome may depend on the actual evaluation order. These systems may not really be reactive, or, at least, this may be debated.

If this article piqued your interest in reactive programming and Java in general, you can explore Peter Verhas's Java Projects – Second Edition to learn the fundamentals of Java 11 programming by building industry grade practical projects. Following a learn-as-you-do approach, Java Projects – Second Edition is perfect for anyone who wants to learn the Java programming language (no prior programming experience required).

Transfer Learning

Transfer learning does exactly as the name says. The idea is to transfer something learned from one task and apply it to another. Why? Practically speaking, training entire models from scratch every time is inefficient, and its success depends on many factors.

Another important reason is that for certain applications, the datasets that are publicly available are not big enough to train a deep architecture like AlexNet or ResNet without over-fitting, which means failing to generalize. Example applications could be online learning from a few examples given by the user or fine-grained classification, where the variation between the classes is minimal.

A very interesting observation is that the early layers of networks trained for different tasks, whether detection or classification, end up having weights that look very similar, so the final layers can be retrained for a different task while all the rest are frozen.

This leads to the idea of transfer learning. For example, a network trained on ImageNet can generalize so well that its convolutional weights can act as feature extractors, similar to conventional visual representations, and can be used to train a linear classifier for various tasks.

When?

Research has shown that feature extraction in convolutional network weights trained on ImageNet outperforms the conventional feature extraction methods such as SURF, Deformable Part Descriptors (DPDs), Histogram of Oriented Gradients (HOG), and bag of words (BoW).

This means we can use convolutional features just as well as the conventional visual representations. The only drawback is that deeper architectures might require a longer time to extract the features.

Consider a deep convolutional neural network trained on ImageNet. The visualization of convolution filters in the first layers shows that they learn low-level features similar to edge detection filters, whereas the convolution filters at the last layers learn high-level features that capture the class-specific information.

Hence, if you extract the features for ImageNet after the first pooling layer and embed them into a 2D space, the visualization will show that there is some anarchy in the data. However, if we do the same at fully connected layers, we can notice that the data with the same semantic information gets organized into clusters. This implies that the network generalizes quite well at higher levels, and it will be possible to transfer this knowledge to unseen classes.

According to transfer learning experiments conducted on datasets with a small degree of similarity to ImageNet, the features based on convolutional neural network weights trained on ImageNet perform better than the conventional feature extraction methods for the following tasks:

  • Object recognition: This CNN feature extractor can successfully perform classification tasks on other datasets with unseen classes.
  • Domain adaptation: This is when the training and testing data are from different distributions, while the labels and number of classes are the same. Different domains can be images captured with different devices or in different settings and environment conditions. A linear classifier with CNN features successfully clusters images with the same semantic information across different domains, while SURF features overfit to domain-specific characteristics.
  • Fine-grained classification: This is when we want to classify between the subcategories within the same high-level class. For example, we can categorize between bird species. CNN features, along with logistic regression, although not trained on fine-grained data, perform better than the baseline approaches.
  • Scene recognition: Here, we need to classify the scene of the entire image. A CNN feature extractor trained on object classification databases, with a simple linear classifier on top, outperforms complex learning algorithms applied on traditional feature extractors on recognition data.

Some of the tasks mentioned here are not directly related to image classification, which was the primary goal while training on ImageNet, and therefore one would expect that the CNN features would fail to generalize to unseen scenarios. However, those features, combined with a simple linear classifier, outperform the hand-crafted features. This means that the learned weights of a CNN are reusable.

So when should we use transfer learning? When we have a task where the available dataset is small due to the nature of the problem (such as classifying ants versus bees). In this case, we can train our model on a larger dataset that contains similar semantic information and, subsequently, retrain only the last layer (the linear classifier) with the small dataset.

If we have just enough data available, and there is a larger dataset similar to ours, pretraining on this similar dataset may result in a more robust model. Normally, we train models with the weights randomly initialized; in this case, they will be initialized with the weights trained on this other dataset. This will help the network converge faster and generalize better. In this scenario, it would make sense to only fine-tune a few layers at the top end of the model.

How? An overview

There are two typical ways to go about this.

The first and more common way is to use a pre-trained model, that is, a model that has previously been trained on a large-scale dataset. Those models are readily available across different deep learning frameworks and are often referred to as “model zoos”.

The choice of pre-trained model depends largely on the task to be solved and the size of the datasets. After choosing the model, we can use all of it, or parts of it, as the initialized model for the actual task that we want to solve.

The other, less common way is to pretrain the model ourselves. This typically occurs when the available pretrained networks are not suitable to solve specific problems, and we have to design the network architecture ourselves.

Obviously, this requires more time and effort to design the model and prepare the dataset. In some cases, the dataset to pre-train the network on can even be synthetic, generated from computer graphic engines such as 3D Studio Max or Unity, or by other convolutional neural networks, such as GANs. The model pre-trained on virtual data can be fine-tuned on real data, and it can work as well as a model trained solely on real data.

If we want to discriminate between cats and dogs, and we do not have enough data, we can download a network trained on ImageNet from the “model zoo” and use the weights from all but the last of its layers.

The last layer has to be adjusted to have the same size as the number of classes, and its weights have to be reinitialized and trained.

This means we can freeze the layers that are not to be trained by setting the learning rate for these layers to zero, or to a very small number. In case a bigger dataset is available, we can train the last three fully connected layers. Sometimes, a pre-trained network can be used only to initialize the weights and then be trained normally.

Transfer learning works because the features computed at the initial layers are more general and look similar. The features extracted in the top layers become more specific to the problem that we want to solve.
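The idea of freezing the lower layers and training only the last one can be sketched without any framework. Everything here is a toy assumption: the FROZEN weights stand in for a pre-trained feature extractor, the four data points are made up, and train_head is a hypothetical helper that applies gradient descent to the classifier head only.

```python
import math

# Hypothetical "pre-trained" lower layers: these weights are never updated.
FROZEN = {"w1": [1.0, -1.0], "w2": [0.5, 0.5]}

# Made-up toy task: the label is 1 when the first input is the larger one.
DATA = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.2, 0.8], 0)]


def features(x):
    """Frozen part: two fixed linear features followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(ws, x))) for ws in FROZEN.values()]


def predict(head, x):
    """Trainable part: a logistic classifier on top of the frozen features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def loss(head, data):
    """Mean cross-entropy of the head's predictions."""
    total = 0.0
    for x, y in data:
        p = predict(head, x)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(data)


def train_head(head, data, lr=0.5, epochs=200):
    """Full-batch gradient descent that touches the head parameters only."""
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(head["weights"])
        grad_b = 0.0
        for x, y in data:
            err = predict(head, x) - y
            for i, f in enumerate(features(x)):
                grad_w[i] += err * f
            grad_b += err
        head["weights"] = [w - lr * g / n for w, g in zip(head["weights"], grad_w)]
        head["bias"] -= lr * grad_b / n
    return head


head = {"weights": [0.0, 0.0], "bias": 0.0}  # reinitialized last layer
loss_before = loss(head, DATA)
train_head(head, DATA)
loss_after = loss(head, DATA)
print(loss_before, loss_after)
```

Because the optimizer loop only ever reads FROZEN, the feature extractor keeps its weights, which has the same effect as a learning rate of zero for those layers.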

How? Code example

In this section you will learn the practical skills needed to perform transfer learning in TensorFlow. More specifically, we’ll learn how to select layers to be loaded from a checkpoint and also how to instruct your solver to optimize only specific layers while freezing the others.

TensorFlow useful elements

Since transfer learning is about training a network initialized with weights taken from another trained model, we will need to find one. In our example, we will use the encoding part of a pretrained convolutional autoencoder. The advantage of using an autoencoder is that we do not need labelled data. It can be trained completely unsupervised.

An autoencoder without the decoder

An encoder (autoencoder without the decoder part) that consists of two convolutional layers and one fully connected layer is presented as follows. The parent autoencoder was trained on the MNIST dataset. Therefore, the network takes as input an image of size 28x28x1 and at latent space, encodes it to a 10-dimensional vector, one dimension for each class:

Selecting layers

Once the model is defined, model=CAE_CNN_Encoder(), it is important to select layers that will be initialized with pretrained weights. Note that the structure of both networks must be the same. So, for example, the following snippet of code will select all layers with name convs or fc:

Note that those lists are populated from tf.global_variables(); if we choose to print its content, we might observe that it holds all the model variables as shown:

Once the layers of the defined graph are grouped into two lists, convolutional and fully connected, you will use tf.train.Saver to load the weights that you prefer. First, we need to create a saver object, giving as input the list of variables that we want to load from a checkpoint as follows:

In addition to saver_load_autoencoder, we need to create another saver object that will allow us to store all the variables of the network to be trained into checkpoints.

Then, after the graph is initialized with init=tf.global_variables_initializer() and a session is created, we can use saver_load_autoencoder to restore the convolutional layers from a checkpoint as follows:

Note that calling restore overrides the global_variables_initializer, and all the selected weights are replaced by the ones from the checkpoint.
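The effect of a selective restore can be shown with plain dictionaries standing in for the graph variables and the checkpoint. This is a framework-free sketch, not TensorFlow code: the variable names, the checkpoint values, and the helpers initialize, select, and restore are all hypothetical.

```python
import random


def initialize(names, size=3, seed=0):
    """Stand-in for the initializer: every variable gets small random values."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(size)] for name in names}


def select(variables, keyword):
    """Pick the variables whose name contains the keyword, for example 'convs'."""
    return [name for name in variables if keyword in name]


def restore(variables, checkpoint, names):
    """Overwrite only the selected variables with the checkpoint values."""
    for name in names:
        variables[name] = list(checkpoint[name])


# Hypothetical variable names and checkpoint content.
NAMES = ["convs/conv1/kernel", "convs/conv2/kernel", "fc/dense/kernel"]
checkpoint = {name: [1.0, 2.0, 3.0] for name in NAMES if "convs" in name}

variables = initialize(NAMES)
fc_before = list(variables["fc/dense/kernel"])
restore(variables, checkpoint, select(variables, "convs"))

print(variables["convs/conv1/kernel"])            # [1.0, 2.0, 3.0]
print(variables["fc/dense/kernel"] == fc_before)  # True
```

The convolutional entries now hold the checkpoint values, while the fully connected entry keeps the values it received from the initializer.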

Training only some layers

Another important part of transfer learning is freezing the weights of the layers that we don’t want to train, while allowing some layers (typically the final ones) to be trained.

In TensorFlow, we can pass to our solver only the layers that we want to optimize (in this example, only the FC layers):

Complete source

In this example, we will load the weights from a MNIST convolutional autoencoder example. We will restore the weights of the encoder part only and freeze the CONV layers. Then, we will train the FC layers to perform digit classification:

If you enjoyed reading this article and want to learn more about convolutional neural networks, you can explore Hands-On Convolutional Neural Networks with TensorFlow, which has an emphatic focus on practical implementation and real-world problems.

Hands-On Convolutional Neural Networks with TensorFlow is a must-read for software engineers and data scientists who want to use CNNs to solve problems.

Programming Bitcoin with Python

Learn how to generate private and public keys, and how to create a multi-signature bitcoin address in this tutorial with Python.

In order to get started with bitcoin using Python, you must install Python 3.x and the bitcoin Python library called Pi Bitcoin tools in the system.

The Pi Bitcoin tools library

To install the Pi Bitcoin tools library, open the command-line program and execute the following command:

The best thing about this library is that it does not need to have a bitcoin node on your computer in order for you to start using it. It connects to the bitcoin network and pulls data from places such as Blockchain.info.

For a start, write the equivalent of a Hello World program for bitcoin in Python. The hello_bitcoin.py script demonstrates how a new bitcoin address is created using Python. Go through the following steps to run the program:

1. Import the bitcoin library:

2. Generate a private key using the random key function:

3. Display the private key on the screen:

How to generate private keys and public keys

With the private key, a public key is generated. Perform this step by passing the private key that was generated to the privtopub function, as shown here:

Now, with the public key, generate a bitcoin address. Do this by passing the public key that is generated to the pubtoaddr function:

The following screenshot shows the private key, public key, and the bitcoin address that is generated:

Note that a bitcoin address is a single-use token. Just as people use email addresses to send and receive emails, you can use this bitcoin address to send and receive bitcoins. Unlike email addresses, however, people have several bitcoin addresses, and it is a must to use a unique address for every transaction.
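To make the private-to-public step less of a black box, here is a standard-library-only sketch of the elliptic-curve multiplication that a function such as privtopub performs conceptually. It is not the library's implementation; the helper names are assumptions, the address hashing step is left out, and the code is meant for illustration rather than for handling real funds.

```python
import secrets

# secp256k1 domain parameters (public constants of the curve bitcoin uses).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its mirror image
    if p1 == p2:
        slope = 3 * x1 * x1 * pow(2 * y1, P - 2, P) % P  # tangent line
    else:
        slope = (y2 - y1) * pow(x2 - x1, P - 2, P) % P   # chord line
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def multiply(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def new_private_key():
    """A random integer in the range 1 .. N-1."""
    return secrets.randbelow(N - 1) + 1


def public_key_hex(private_key):
    """Uncompressed public key: 04 followed by the x and y coordinates."""
    x, y = multiply(private_key, G)
    return "04" + format(x, "064x") + format(y, "064x")


private_key = new_private_key()
print(format(private_key, "064x"))
print(public_key_hex(private_key))
```

The private key is just a large random number, and the public key is the curve point obtained by adding the generator G to itself that many times. Going in the other direction is computationally infeasible, which is why the public key can be shared.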

Creating a multisignature bitcoin address

A multisignature address is an address that is associated with more than one private key. In this section, you’ll create three private keys. Multisignature addresses are useful in organizations where no single individual is trusted with authorising the spending of bitcoins.

Go through the following steps to create a multisignature bitcoin address:

1. Create three private keys:

2. Create three public keys from those private keys using the privtopub function:

3. After generating the public keys, create the multisig by passing the three public keys to the mk_multisig_script function. The resulting multisig is passed to the scriptaddr function to create the multisignature bitcoin address.

4. Print the multisignature address and execute the script. The following screenshot shows the output for the multisig bitcoin address:
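What a multisignature script contains can be sketched with the standard library alone. The layout below is the usual bare multisig form (threshold, public keys, key count, OP_CHECKMULTISIG); the 2-of-3 threshold and the placeholder key bytes are assumptions for illustration, and multisig_script is a hypothetical helper, not the library function used in the steps above.

```python
OP_CHECKMULTISIG = 0xAE


def small_int_opcode(n):
    """OP_1 .. OP_16 are encoded as the bytes 0x51 .. 0x60."""
    if not 1 <= n <= 16:
        raise ValueError("only 1..16 can be encoded as a small-integer opcode")
    return 0x50 + n


def multisig_script(public_keys, required):
    """Build: <required> <key 1> ... <key n> <n> OP_CHECKMULTISIG."""
    if not 1 <= required <= len(public_keys) <= 16:
        raise ValueError("need 1 <= required <= number of keys <= 16")
    script = bytearray([small_int_opcode(required)])
    for key in public_keys:
        if not 1 <= len(key) <= 75:
            raise ValueError("key does not fit a single-byte push")
        script.append(len(key))  # push opcode: the length of the data
        script += key
    script.append(small_int_opcode(len(public_keys)))
    script.append(OP_CHECKMULTISIG)
    return bytes(script)


# Placeholder 65-byte values standing in for three uncompressed public keys.
keys = [bytes([4]) + bytes([i]) * 64 for i in (1, 2, 3)]
script = multisig_script(keys, required=2)
print(script.hex())
```

The multisignature address is then derived from a hash of this script, which is why all participants must agree on the exact keys, their order, and the threshold.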

You can also look at the preexisting bitcoin addresses' transactional history. You'll need to first get a valid address from Blockchain.info.

The following screenshot shows the copied address of a bitcoin block:

Pass the copied address to the history function, as shown in the following code, along with the output to get the history of the bitcoin address, including the transactional information:

Hope you found this article interesting. To learn more interesting stuff about bitcoins and Python, you can explore Hands-On Bitcoin Programming with Python. Written with an easy-to-understand approach in mind, Hands-On Bitcoin Programming with Python takes you through numerous practical examples to teach you to build software for mining and create bitcoin using Python.

Transforming Data in Different Ways in Pentaho

The set of operations covered in this tutorial is not a full list of the available options, but includes the most common ones, and will inspire you when you come to implement others.

Note that the files used in this tutorial were built with data downloaded from www.numbeo.com, a site containing information about living conditions in cities and countries worldwide. Before continuing, make sure you download the set of data from https://github.com/PacktPublishing/Pentaho-Data-Integration-Quick-Start-Guide.

Extracting data from existing fields

First, you’ll learn how to extract data from fields that exist in your dataset in order to generate new fields. For the first exercise, you’ll read a file containing data about the cost of living in Europe. The content of the file looks like this:

As you can see, the city field also contains the country name. The purpose of this exercise is to extract the country name from this field. In order to do this, go through the following steps:

  1. Create a new transformation and use a Text file input step to read the cost_of_living_europe.txt file.
  2. Drag a Split Fields step from the Transform category and create a hop from the Text file input towards the Split Fields step.
  3. Double-click the step and configure it, as shown in the following screenshot:
  4. Close the window and run a preview. You’ll see the following:

As you can see, the Split Fields step can be used to split the value of a field into two or more new fields. This step is perfect for the purpose of obtaining the country name because the values were easy to parse. You had a value, then a comma, then another value. This is not always the case, but PDI has other steps for doing similar tasks. Take a look at another method for extracting pieces from a field.
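Outside PDI, the same kind of split can be expressed in a few lines, which may help when checking what the step is expected to produce. The sample value and the helper name split_field are assumptions for illustration; the actual file content may differ.

```python
def split_field(value, delimiter=",", parts=2):
    """Split a value into a fixed number of trimmed pieces, padding with None."""
    pieces = [piece.strip() for piece in value.split(delimiter, parts - 1)]
    return pieces + [None] * (parts - len(pieces))


# Hypothetical value of the city field.
city, country = split_field("Amsterdam, Netherlands")
print(city)     # Amsterdam
print(country)  # Netherlands
```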

This time, you'll read a file containing common daily food items and their prices. The file has two fields, food and price, and looks as follows:

Suppose that you want to split the Food field into three fields for the name, quantity, and number of units respectively. Taking the value in the first row, Milk (regular), (0.25 liter), as an example, the name would be Milk (regular), the quantity would be 0.25, and the unit would be liter. You cannot solve this as you did before, but you can use regular expressions instead. In this case, the expression to use will be (.+)\(([0-9.]+)( liter| g| kg| head|)\).*.

Try it using the following steps:

  1. Create a new transformation and use a Text file input step to read the recommended_food.txt file. In order to define the Price as a number, use the format #.00 €.
  2. Drag a Regex Evaluation step from the Scripting category and create a hop from the Text file input toward this new step.
  3. Double-click the step and configure it as shown in the following screenshot. Don’t forget to check the Create fields for capture groups option:
  4. Close the window and run a preview. You will see the following:

The RegEx Evaluation step can be used just to evaluate whether a field matches a regular expression, or to generate new fields, as in this case. By capturing groups, you could create a new field for each group captured from the original field. You will also notice a field named result, which, in this example, has Y as its value. This Y means that the original field matched the given expression.

Note that while the Split Fields step removes the original field from the dataset, the RegEx Evaluation step does not.
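The regular expression can be tried outside PDI before configuring the step. The sketch below applies the expression quoted above to the first-row value from the text; parse_food is a hypothetical helper, and trimming the captured groups is an assumption about how you would want to post-process them.

```python
import re

# The expression given in the text, with one capture group per new field.
FOOD_PATTERN = re.compile(r"(.+)\(([0-9.]+)( liter| g| kg| head|)\).*")


def parse_food(value):
    """Return the captured fields plus a Y/N result flag."""
    match = FOOD_PATTERN.fullmatch(value)
    if match is None:
        return {"result": "N"}
    name, quantity, unit = match.groups()
    return {
        "result": "Y",
        "name": name.rstrip(", "),   # the raw group ends with ", "
        "quantity": float(quantity),
        "unit": unit.strip(),        # the raw group starts with a space
    }


print(parse_food("Milk (regular), (0.25 liter)"))
# {'result': 'Y', 'name': 'Milk (regular)', 'quantity': 0.25, 'unit': 'liter'}
print(parse_food("Bread"))
# {'result': 'N'}
```

Note that the first group is greedy, so it runs up to the last opening parenthesis that is followed by a number; that is what keeps (regular) inside the name.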

These are not the only steps that will allow this kind of operation.

More ways to create new fields

Besides just extracting data from the incoming fields, you can also combine the fields by performing arithmetic operations between them, concatenating String fields, and using other methods. Just as in the previous section, this section will expand on a simple example that will serve you as a model for creating your own process.

For this tutorial, you’ll continue using the file containing data about the cost of living. This time, you’ll generate a new field that creates a new index out of the average of the restaurant price index and the groceries index. To do this, go through the following steps:

  1. Create a new transformation and use a Text file input step to read the cost_of_living_europe.txt file.
  2. Drag a Calculator step from the Transform category and create a hop from the Text file input toward the calculator.
  3. Double-click the step and configure it as shown in the following screenshot:
  4. Close the window and run a preview. You will see the following:

As you can deduce from the configuration window, with the Calculator step, you can create new fields by using temporary fields along the way. In the final dataset, you can see each temporary field (two and temp, in the example) as a new column.

The Calculator step is a handy step that can be used for performing many common types of operations, such as arithmetic, string, and date operations, among others.

Of course, there is a simpler way for doing the calculation in the last transformation:

  1. Save the previous transformation under a different name.
  2. Remove the Calculator step. You can do this just by selecting it and pressing Delete.
  3. Drag and drop a User Defined Java Expression step from the Scripting folder. Create a hop from the Text file input step toward this new step.
  4. Double-click the step and configure it as shown in the following screenshot:
  5. Close the window and run a preview. You should see exactly the same result as before.

The Java Expression step is a powerful step that allows you to create fields of any type, not just numbers, by using a wide variety of expressions, as long as they can be expressed in a single line using Java syntax.

In the last example, using the Java Expression step was simpler than doing the same with a Calculator step. Depending on the case, it can be more convenient to use one or the other.
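The difference between the two approaches can be mirrored in a short sketch. The field names restaurant_price_index, groceries_index, and new_index, as well as the sample values, are assumptions; only the temporary fields temp and two come from the example above.

```python
def with_calculator_steps(row):
    """Calculator style: one simple operation per line, via temporary fields."""
    temp = row["restaurant_price_index"] + row["groceries_index"]
    two = 2
    return {**row, "temp": temp, "two": two, "new_index": temp / two}


def with_single_expression(row):
    """Java Expression style: the whole calculation in a single expression."""
    average = (row["restaurant_price_index"] + row["groceries_index"]) / 2
    return {**row, "new_index": average}


# Hypothetical input row.
row = {"city": "Example City", "restaurant_price_index": 80.0, "groceries_index": 60.0}
print(with_calculator_steps(row)["new_index"])   # 70.0
print(with_single_expression(row)["new_index"])  # 70.0
```

Both produce the same new field; the first one also leaves the temporary columns in the row, just as the Calculator step does unless they are removed.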

This was just an example that showed you how to add new fields based on the fields in your dataset. There are many steps available, and with different purposes. You will find them mainly in the Transform folder, but there are some others in different folders in the Design tab. No matter which step you pick, the way you use it is always the same. You add the step at the end of the stream and then configure it properly according to your needs.

If you found this article interesting and helpful, you can check out Pentaho Data Integration Quick Start Guide. Featuring simplified and easy-to-follow examples, Pentaho Data Integration Quick Start Guide takes you through the underlying concepts in a lucid manner, so you can create efficient ETL processes using Pentaho.