Category: Java

RESTful APIs and Testing

A software application has several layers, such as the user interface (UI), the business logic layer, middleware, and a database. Testing and certification primarily focus on data integration tests on the Business layer. API testing is software testing that exercises the API directly, unlike other generic tests, which primarily involve the UI:

The preceding diagram depicts the typical layers of software, with API testing on the Business layer and the functional or UI testing on the Presentation layer.

Understanding API testing approaches

Agreeing on an approach for API testing when beginning API development is an essential API strategy. Let’s look at a few principles of API testing:

  • Clear definition of the scope and a good understanding of the functionality of the API
  • Common testing methodologies such as boundary analysis and equivalence classes are part of API test cases
  • Plan, define, and be ready with input parameters, zero, and sample data for the API
  • Determine and compare expected and actual results, and ensure that there are no differences

API testing types

In this section, we will review the various categories of API testing.

Unit tests

Tests that involve the validation of individual operations are unit tests. The following is one of the sample code snippets of a specific unit test case that validates getting all the investors from the API:

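@Test
    public void fetchAllInvestors() throws Exception {
          // Build a GET request for /investors that accepts JSON
          RequestBuilder requestBuilder =
               MockMvcRequestBuilders.get(
                      "/investors").accept(
                      MediaType.APPLICATION_JSON);
          // Execute the request against the mocked MVC layer
          MvcResult result =
              mockMvc.perform(requestBuilder).andReturn();
          MockHttpServletResponse response =
              result.getResponse();
          // Validate the outcome; this assertion is an addition and assumes
          // a static import of org.junit.Assert.assertEquals
          assertEquals(200, response.getStatus());
    }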

API validation tests

Every piece of software needs a quick evaluation to confirm that it serves the purpose it was created for. Validation tests should be run for every function that is developed, at the end of the development process. Unlike unit tests, which focus on particular pieces or functions of the API, validation tests are a higher-level consideration, answering a set of questions so that the development can move on to the next phase.

A set of questions for validation tests could be the following:

  • A product-specific question, such as, is it the necessary function that is asked for?
  • A behavioral question, such as, is the developed function doing what is intended?
  • An efficiency-related question, such as, is the intended function using the necessary code, in an independent and optimized manner?

All of these questions, in essence, serve to validate the API in line with the agreed acceptance criteria and also to ensure its adherence to standards regarding the delivery of expected end goals and meeting user needs and requirements flawlessly.

Functional tests

Tests that involve specific functions of the APIs and their code base are functional tests. Validating the count of active users through the API, regression tests, and test case execution come under functional tests. The following screenshot demonstrates one such functional testing example of investor service validation for user authentication:

UI or end-to-end tests

Tests that assert end-to-end scenarios, covering both GUI and API functions and, in most cases, validating every transaction of an application, are grouped under end-to-end tests.

Load testing

An increase in the number of end users should not degrade the performance of an application's functions. Load testing uncovers such issues and also validates the performance of an API under normal conditions.

Runtime error detection tests

Tests that help monitor the application and detect problems such as race conditions, exceptions, and resource leaks belong in the runtime error tests category. The following subsections briefly describe those factors.

Monitoring APIs

Monitoring tests check for various implementation errors, handler failures, and other inherent concerns inside the API code base, and ensure that it does not have any holes that would lead to application insecurity.

Execution errors

Asserting that valid requests to the API return the expected valid responses is common; however, asserting that invalid requests fail in the expected way is also an essential part of an API testing strategy, and those tests come under execution errors:

The preceding screenshot depicts an example of expecting an error when the user gives an ID that is not present on the system.
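Since the screenshot is not reproduced here, the idea can be sketched with nothing but the JDK. The following is a minimal illustration, not the article's Spring-based investor service: the class name, the /investors/{id} path, and the INV001 identifier are all assumptions made for this example. It starts a throwaway HTTP server and checks that a known ID yields 200 while an unknown ID yields 404.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for an investor API (assumption, not the article's service).
class InvestorApiErrorCheck {

    // In-memory data: only the made-up investor "INV001" exists.
    private static final Map<String, String> INVESTORS =
            Map.of("INV001", "{\"id\":\"INV001\"}");

    // Starts a server on a free local port that answers GET /investors/{id}.
    static HttpServer startServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/investors/", exchange -> {
                String id = exchange.getRequestURI().getPath().substring("/investors/".length());
                String found = INVESTORS.get(id);
                String body = found == null ? "{\"error\":\"not found\"}" : found;
                byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
                // 200 for a known investor, 404 for an unknown one
                exchange.sendResponseHeaders(found == null ? 404 : 200, bytes.length);
                exchange.getResponseBody().write(bytes);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends GET /investors/{id} and returns the HTTP status code.
    static int statusFor(int port, String id) {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/investors/" + id)).GET().build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            System.out.println("known id   -> " + statusFor(port, "INV001"));
            System.out.println("unknown id -> " + statusFor(port, "NO_SUCH_ID"));
        } finally {
            server.stop(0);
        }
    }
}
```

In a real test suite, the two status checks would be assertions in separate test cases, one for the valid request and one for the expected failure.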

Resource leaks

These are negative tests that validate how the underlying API resources malfunction when invalid requests are submitted to the API. The resources, in this case, are memory, data, insecurities, timeout operations, and so on.

Error detection

These tests detect network communication failures. An authentication failure caused by wrong credentials is an example of an error detection scenario. These tests ensure that the errors are captured and then resolved as well:

Here’s an authentication error, and the previous screenshot depicts this, as the code returns 401 (as it should); this is an example of an error detection test.

If you found this article interesting, you can explore Hands-On RESTful API Design Patterns and Best Practices to build effective RESTful APIs for enterprise with design patterns and REST framework’s out-of-the-box capabilities. Hands-On RESTful API Design Patterns and Best Practices helps you explore the concepts of service-oriented architecture (SOA), event-driven architecture (EDA), and resource-oriented architecture (ROA).

Reactive Programming in a Nutshell

In this guest post, Peter Verhas attempts to explain reactive programming with simple yet effective examples, which drive home the concept of reactive programming.

Reactive programming is a paradigm that focuses more on where the data flows during computation than on how to compute the result. It comes into the picture when a problem is best described as several computations that depend on the output of one another, some of which may be executed independently of the others. As a simple example, we can have the following computation that calculates the value of h from some given b, c, e, and f values, using f1, f2, f3, f4, and f5 as simple computational steps:

a = f1(b,c) 
d = f2(e,f) 
k = f3(e,c) 
g = f4(b,f,k) 
h = f5(d,a,g)

If we write these in Java in a conventional way, the methods f1 to f5 will be invoked one after the other. If we have multiple processors and we are able to make the execution parallel, we may also perform some of the methods in parallel. This, of course, assumes that these methods are purely computational and do not change the state of the environment, so that they can be executed independently of one another. For example, f1, f2, and f3 can be executed independently of one another. The execution of the f4 function depends on the output of f3, and the execution of f5 depends on the output of f1, f2, and f4.

If we have two processors, we can execute f1 and f2 together, followed by the execution of f3, then f4, and, finally, f5. These are the four steps. If we look at the preceding calculation not as commands but rather as expressions and how the calculations depend on one another, then we do not dictate the actual execution order, and the environment may decide to calculate f1 and f3 together, then f2 and f4, and, finally f5, saving one step. This way, we can concentrate on the data flow and let the reactive environment act upon it without putting in extra constraints:
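One way to express only the dependencies in plain Java, and leave the scheduling to the runtime, is CompletableFuture. This is a minimal sketch rather than a reactive framework: the bodies of f1 to f5 are placeholder arithmetic invented for the example, since the text does not define them.

```java
import java.util.concurrent.CompletableFuture;

class DataFlowSketch {

    // Placeholder pure functions (assumptions; the article leaves f1..f5 undefined).
    static int f1(int b, int c) { return b + c; }
    static int f2(int e, int f) { return e * f; }
    static int f3(int e, int c) { return e - c; }
    static int f4(int b, int f, int k) { return b + f + k; }
    static int f5(int d, int a, int g) { return d + a + g; }

    // Declares which value depends on which; the executor decides the actual order.
    static int computeH(int b, int c, int e, int f) {
        // a, d, and k have no dependencies on each other, so they may run in parallel
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> f1(b, c));
        CompletableFuture<Integer> d = CompletableFuture.supplyAsync(() -> f2(e, f));
        CompletableFuture<Integer> k = CompletableFuture.supplyAsync(() -> f3(e, c));
        // g needs k, so it is chained onto k
        CompletableFuture<Integer> g = k.thenApply(kv -> f4(b, f, kv));
        // h needs d, a, and g; combine d with a first, then with g
        CompletableFuture<Integer> h = d
                .thenCombine(a, (dv, av) -> new int[] {dv, av})
                .thenCombine(g, (da, gv) -> f5(da[0], da[1], gv));
        return h.join();
    }

    public static void main(String[] args) {
        System.out.println("h = " + computeH(1, 2, 3, 4));
    }
}
```

Nothing in computeH states that f1 runs before f3 or after f2; only the data dependencies are written down, which is the point made above.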

This is a very simple approach to reactive programming. The description of the calculation in the form of expressions gives the data flow, but in the explanation, we still assumed that the calculation is executed synchronously. If the calculations are executed on different processors located on different machines connected to a network, then the calculation may not and does not need to be synchronous.

Reactive programs can be asynchronously executed if the environment is asynchronous. It may happen that the different calculations, f1 to f4, are implemented and deployed on different machines.

In such a case, the values calculated are sent from one to the other over the network, and the nodes execute the calculation every time there is a change in the inputs. This is very similar to good old analog computers that were created using simple building blocks, and the calculations were done using analog signals.

The program was implemented as an electronic circuit, and when the input voltage or current (usually voltage) changed in the inputs, the analog circuits followed it at light speed, and the result appeared in the output. In such a case, the signal propagation was limited by the speed of light on the wires and analog circuitry speed in the wired modules, which was extremely fast and may beat digital computers.

When we talk about digital computers, the propagation of the signal is digital, and this way, it needs to be sent from one calculation node to the other one, be it some object in the JVM or some program on the network. A node has to execute its calculation only if both of the following apply:

  • Some of the values in the input have changed
  • The output of the calculation is needed

If the input has not changed, then the result should eventually be the same as the last time; thus, the calculation does not need to be executed again—it would be a waste of resources. If the result of the calculation is not needed, then there is no need to perform the calculation, even if the result would not be the same as the last one. No one cares.

To accommodate this, reactive environments implement two approaches to propagate the values. The nodes may pull the values from the output of other modules. This will ensure that no calculation that is not needed will be executed. The modules may push their output to the next module that depends on them. This approach will ensure that only changed values trigger calculation. Some of the environments may implement a hybrid solution.
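The pull approach can be sketched as a node that recomputes only when somebody asks for its value and its input has been marked as changed. The PullNode class and its evaluations counter are hypothetical names introduced for this illustration; real reactive libraries are considerably more elaborate.

```java
import java.util.function.IntSupplier;

// A pull-style node: the calculation runs lazily, on demand.
class PullNode {
    private final IntSupplier calculation;
    private boolean dirty = true;  // true when an input changed since the last evaluation
    private int cached;
    int evaluations = 0;           // counts how often the calculation actually ran

    PullNode(IntSupplier calculation) {
        this.calculation = calculation;
    }

    // Called when an input changes; nothing is computed yet.
    void invalidate() {
        dirty = true;
    }

    // Called when the output is needed; computes only if an input changed.
    int get() {
        if (dirty) {
            cached = calculation.getAsInt();
            evaluations++;
            dirty = false;
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] b = {3};
        PullNode a = new PullNode(() -> b[0] + 3);
        a.get();
        a.get();              // second call is served from the cache
        b[0] = 6;
        a.invalidate();       // input changed, but nobody asked for the value yet
        System.out.println("evaluations before pull: " + a.evaluations);
        System.out.println("a = " + a.get() + ", evaluations: " + a.evaluations);
    }
}
```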

When values change in the system, the change is propagated toward the other nodes that again propagate the changes to another node, and so on. If we imagine the calculation dependencies as a directed graph, then the changes travel towards the transitive closure of the changed values along the nodes connected.

The data may travel with all the values from one node output to the other node input, or only the change may travel. The second approach is more complex because it needs the changed data and also meta information that describes what has changed. On the other hand, the gain may be significant when the output and input set of data is huge, and only a small portion of it is changed.

It may also be important to calculate and propagate only the actual delta of the change when there is a high probability that some of the nodes do not change the output for many of the different inputs. In such a case, the change propagation may stop at the node where there is no real change in spite of the changed input values. This can save a lot of calculation in some of the networks.
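A push-style node that stops the propagation when its own output does not change can be sketched as follows. PushNode is a hypothetical class written for this illustration: it remembers its last output and notifies its dependents only when the newly computed output differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// A push-style node: inputs are pushed in, changed outputs are pushed onward.
class PushNode {
    private final IntUnaryOperator function;
    private final List<PushNode> dependents = new ArrayList<>();
    private Integer lastOutput;  // null until the first calculation
    int evaluations = 0;         // counts how often this node was recalculated

    PushNode(IntUnaryOperator function) {
        this.function = function;
    }

    void addDependent(PushNode node) {
        dependents.add(node);
    }

    void onInput(int value) {
        int output = function.applyAsInt(value);
        evaluations++;
        if (lastOutput != null && lastOutput == output) {
            return;  // the input changed but the output did not: propagation stops here
        }
        lastOutput = output;
        for (PushNode dependent : dependents) {
            dependent.onInput(output);
        }
    }

    public static void main(String[] args) {
        PushNode sign = new PushNode(Integer::signum);
        PushNode scaled = new PushNode(x -> x * 100);
        sign.addDependent(scaled);
        sign.onInput(5);   // sign becomes 1, scaled is recalculated
        sign.onInput(7);   // sign stays 1, scaled is left alone
        sign.onInput(-2);  // sign becomes -1, scaled is recalculated
        System.out.println("downstream evaluations: " + scaled.evaluations);
    }
}
```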

In the configuration of the data propagation, the directed acyclic graph can be expressed in the code of the program; it can be configured, or it can even be set up and changed during the execution of the code dynamically. When the program code contains the structure of the graph, the routes and the dependencies are fairly static.

To change the data propagation, the code of the program has to be changed, recompiled, and deployed. If there are multiple network node programs, this may even need multiple deployments that should be carefully coordinated to avoid different incompatible versions running on different nodes.

There should be similar considerations when the graph is described in some configuration. In such a case, the compilation of the program(s) may not be needed when only the wiring of the graph is changed, but the burden to have compatible configuration on different nodes in the case of a network execution is still there.

Letting the graph change dynamically also does not solve this problem. The setup and the structure are more flexible and, at the same time, more complex. The data propagated along the edges of the graph may contain not only computational data but also data that drives changes in the graph. Many times, this leads to a very flexible model called higher-order reactive programming.

Reactive programming has a lot of benefits, but, at the same time, it may be very complex, sometimes too complex, for simple problems. It is to be considered when the problem to be solved can easily be described using a data graph and simple data propagations. We can separate the description of the problem and the order of the execution of the different blocks. This is the same consideration that we discussed in the previous chapter. We describe more about the what to do part and less about the how to do part.

On the other hand, when the reactive system decides the order of execution, what is changed, and how that should be reflected on the output of other blocks, it should do so without knowing the core of the problem that it is solving. In some situations, coding the execution order manually based on the original problem could perform better.

Note

This is similar to the memory management issue. In modern runtime environments, such as the JVM, the Python runtime, Swift, or even Golang, there is some automated memory management. When programming in C, the programmer has full control over memory allocation and memory release.

In the case of real-time applications, where performance and response time are of the utmost importance, there is no way to let an automated garbage collector take time and delay the execution from time to time. In such a case, the C code can be optimized to allocate memory when it is needed and resources are available for the allocation, and to release memory when it is possible and there is time to manage memory.

These programs perform better than the ones created for the same purpose using a garbage collector. Still, we do not use C in most applications because we can afford the extra resources needed for automated memory collection. Even though it would be possible to write faster code by managing the memory manually, automated memory management is faster than what an average programmer would have created using C, and the frequency of programming errors is also much lower.

Just as there are some issues that we have to pay attention to when using automated memory management, we have to pay attention to some issues in a reactive environment, which would not exist in the case of manual coding. Still, we use the reactive approach for its benefits.

The most important issue is to avoid loops in the dependency graph. Although it is perfectly valid to write such definitions of calculations, a reactive system would probably not be able to cope with them. Some reactive systems may resolve cyclic dependencies in some simple cases, but that is an extra feature, and we generally just have to avoid them. Consider the following computations:

a = b + 3 
b = 4 / a

Here, a depends on b, so when b changes, a is calculated. However, b also depends on a, which is recalculated, and, in this way, the system gets into an infinite loop. The preceding example seems to be simple, but that is the feature of a good example. Real-life problems are not simple, and in a distributed environment, it is sometimes extremely hard to find cyclic dependencies.
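When the whole dependency graph is available in one place, a loop like this can be found with a depth-first search before any calculation starts. The following is a minimal sketch under that assumption; the class and method names are made up for the example, and it does not address the harder distributed case mentioned above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DependencyCycleCheck {

    // deps maps each value to the values it is calculated from.
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> finished = new HashSet<>();  // nodes fully explored, known to be loop-free
        Set<String> onPath = new HashSet<>();    // nodes on the current search path
        for (String node : deps.keySet()) {
            if (visit(node, deps, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> deps,
                                 Set<String> finished, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;   // we came back to a node we are still exploring: a loop
        }
        if (finished.contains(node)) {
            return false;
        }
        onPath.add(node);
        for (String dependency : deps.getOrDefault(node, List.of())) {
            if (visit(dependency, deps, finished, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        finished.add(node);
        return false;
    }

    public static void main(String[] args) {
        // a = b + 3 and b = 4 / a: each depends on the other
        Map<String, List<String>> cyclic = Map.of("a", List.of("b"), "b", List.of("a"));
        System.out.println("cycle found: " + hasCycle(cyclic));
    }
}
```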

Another problem is called a glitch. Consider the following definition:

a = b + 3 
q = b + a

When the parameter b is changed, for example, from 3 to 6, the value of a will change from 6 to 9, and, thus, q will change from 9 to 15. This is very simple. However, the execution order based on the recognition of the changes may first alter the value of q from 9 to 12 before modifying it to 15 in the second step.

This can happen if the node responsible for the calculation of q recognizes the change in b before it recognizes the change in a that follows from the change in b. For a short period of time, the value of q will be 12, which matches neither the previous state nor the changed state. This value is only a glitch in the system: it appears after an input changes and disappears again without any further change in the input:

If you have ever learned the design of logical circuits, then static hazards may ring a bell. They are exactly the same phenomenon.
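The glitch can be reproduced in a few lines of sequential Java by choosing the update order by hand. This is only a simulation of the two propagation orders, with made-up method names, not a model of any particular reactive library.

```java
import java.util.ArrayList;
import java.util.List;

class GlitchSketch {

    // q is recalculated as soon as b changes, before a has been updated.
    static List<Integer> naiveOrder() {
        int b = 3;
        int a = b + 3;  // 6
        int q = b + a;  // 9
        List<Integer> observed = new ArrayList<>();

        b = 6;
        q = b + a;      // uses the stale a: this is the glitch
        observed.add(q);
        a = b + 3;
        q = b + a;      // a has caught up, q is consistent again
        observed.add(q);
        return observed;
    }

    // a is updated first, then q is recalculated once.
    static List<Integer> dependencyOrder() {
        int b = 3;
        int a = b + 3;
        int q = b + a;
        List<Integer> observed = new ArrayList<>();

        b = 6;
        a = b + 3;
        q = b + a;
        observed.add(q);
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("naive order:      " + naiveOrder());
        System.out.println("dependency order: " + dependencyOrder());
    }
}
```

Updating the nodes in dependency order is one common way for a reactive environment to avoid exposing the intermediate value.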

Reactive programming also assumes that the calculations are stateless. The individual nodes that perform the calculation may have a state in practice and, in most cases, they do. It is not inherently evil to have a state in some calculation. However, debugging something that has a state is significantly more complex than debugging something that is stateless, and functional.

It is also an important aid to the reactive environment, letting it perform different optimizations based on the fact that the calculations are functional. If the nodes have a state, then the calculations may not be rearranged freely because the outcome may depend on the actual evaluation order. These systems may not really be reactive, or, at least, this may be debated.

If this article piqued your interest in reactive programming and Java in general, you can explore Peter Verhas’s Java Projects – Second Edition to learn the fundamentals of Java 11 programming by building industry grade practical projects. Following a learn-as-you-do approach, Java Projects – Second Edition is perfect for anyone who wants to learn the Java programming language (no prior programming experience required).

Transforming Data in Different Ways in Pentaho

The set of operations covered in this tutorial is not a full list of the available options, but includes the most common ones, and will inspire you when you come to implement others.

Note that the files used in this tutorial were built with data downloaded from www.numbeo.com, a site containing information about living conditions in cities and countries worldwide. Before continuing, make sure you download the set of data from https://github.com/PacktPublishing/Pentaho-Data-Integration-Quick-Start-Guide.

Extracting data from existing fields

First, you’ll learn how to extract data from fields that exist in your dataset in order to generate new fields. For the first exercise, you’ll read a file containing data about the cost of living in Europe. The content of the file looks like this:

Rank  City                   Cost of Living Index  Rent Index  Cost of Living Plus Rent Index  Groceries Index  Restaurant Price Index  Local Purchasing Power Index
1     Zurich, Switzerland    141.25                66.14       105.03                          149.86           135.76                  142.70
2     Geneva, Switzerland    134.83                71.70       104.38                          138.98           129.74                  130.96
3     Basel, Switzerland     130.68                49.68       91.61                           127.54           127.22                  139.01
4     Bern, Switzerland      128.03                43.57       87.30                           132.70           119.48                  112.71
5     Lausanne, Switzerland  127.50                52.32       91.24                           126.59           132.12                  127.95
6     Reykjavik, Iceland     123.78                57.25       91.70                           118.15           133.19                  88.95

...

As you can see, the city field also contains the country name. The purpose of this exercise is to extract the country name from this field. In order to do this, go through the following steps:

  1. Create a new transformation and use a Text file input step to read the cost_of_living_europe.txt file.
  2. Drag a Split Fields step from the Transform category and create a hop from the Text file input towards the Split Fields step.
  3. Double-click the step and configure it, as shown in the following screenshot:
  4. Close the window and run a preview. You’ll see the following:

As you can see, the Split Fields step can be used to split the value of a field into two or more new fields. This step is perfect for the purpose of obtaining the country name because the values were easy to parse. You had a value, then a comma, then another value. This is not always the case, but PDI has other steps for doing similar tasks. Take a look at another method for extracting pieces from a field.
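For readers who think in code, the effect of splitting on a comma can be sketched in plain Java. This is only an analogy for what the step produces, not how PDI implements it, and the class and method names are invented for the example.

```java
class SplitCityField {

    // Splits "City, Country" at the first comma into two trimmed values.
    static String[] splitCity(String cityField) {
        String[] parts = cityField.split(",", 2);  // limit 2: at most two pieces
        String city = parts[0].trim();
        String country = parts.length > 1 ? parts[1].trim() : "";
        return new String[] {city, country};
    }

    public static void main(String[] args) {
        String[] fields = splitCity("Zurich, Switzerland");
        System.out.println("city = " + fields[0] + ", country = " + fields[1]);
    }
}
```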

This time, you’ll read a file containing common daily food items and their prices. The file has two fields—food and price—and looks as follows:

Food                                             Price
Milk (regular), (0.25 liter)                     0.19 €
Loaf of Fresh White Bread (125.00 g)             0.24 €
Rice (white), (0.10 kg)                          0.09 €
Eggs (regular) (2.40)                            0.33 €
Local Cheese (0.10 kg)                           0.89 €
Chicken Breasts (Boneless, Skinless), (0.15 kg)  0.86 €

...

Suppose that you want to split the Food field into three fields for the name, quantity, and unit, respectively. Taking the value in the first row, Milk (regular), (0.25 liter), as an example, the name would be Milk (regular), the quantity would be 0.25, and the unit would be liter. You cannot solve this as you did before, but you can use regular expressions instead. In this case, the expression to use will be (.+)\(([0-9.]+)( liter| g| kg| head|)\).*.
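To see what the three capture groups pick up, the same expression can be tried in plain Java with java.util.regex. This is a small sketch outside PDI; the class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FoodFieldRegex {

    // The expression from the text, with backslashes doubled for a Java string literal.
    static final Pattern FOOD =
            Pattern.compile("(.+)\\(([0-9.]+)( liter| g| kg| head|)\\).*");

    // Returns {name, quantity, unit} as captured, or null if the value does not match.
    static String[] parse(String food) {
        Matcher matcher = FOOD.matcher(food);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        String[] groups = parse("Milk (regular), (0.25 liter)");
        // Brackets make leading and trailing whitespace in the captures visible
        System.out.println("name=[" + groups[0] + "] quantity=[" + groups[1]
                + "] unit=[" + groups[2] + "]");
    }
}
```

Note that the groups are captured exactly as written: the first group keeps the trailing comma and space of the name, and the third group keeps the leading space of the unit, so the values may need trimming afterward.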

Try it using the following steps:

  1. Create a new transformation and use a Text file input step to read the recommended_food.txt file. In order to define the Price as a number, use the format #.00 €.
  2. Drag a RegEx Evaluation step from the Scripting category and create a hop from the Text file input toward this new step.
  3. Double-click the step and configure it as shown in the following screenshot. Don’t forget to check the Create fields for capture groups option:
  4. Close the window and run a preview. You will see the following:

The RegEx Evaluation step can be used just to evaluate whether a field matches a regular expression, or to generate new fields, as in this case. By capturing groups, you could create a new field for each group captured from the original field. You will also notice a field named result, which, in this example, has Y as its value. This Y means that the original field matched the given expression.

Note that while the Split Fields step removes the original field from the dataset, the RegEx Evaluation step does not.

These are not the only steps that will allow this kind of operation.

More ways to create new fields

Besides just extracting data from the incoming fields, you can also combine the fields by performing arithmetic operations between them, concatenating String fields, and using other methods. Just as in the previous section, this section will expand on a simple example that will serve you as a model for creating your own process.

For this tutorial, you’ll continue using the file containing data about the cost of living. This time, you’ll generate a new field that creates a new index out of the average of the restaurant price index and the groceries index. To do this, go through the following steps:

  1. Create a new transformation and use a Text file input step to read the cost_of_living_europe.txt file.
  2. Drag a Calculator step from the Transform category and create a hop from the Text file input toward the calculator.
  3. Double-click the step and configure it as shown in the following screenshot:
  4. Close the window and run a preview. You will see the following:

As you can deduce from the configuration window, with the Calculator step, you can create new fields by using temporary fields along the way. In the final dataset, you can see each temporary field—two and temp, in the example—as a new column.

The Calculator step is a handy step that can be used for performing many common types of operations, such as arithmetic, string, and date operations, among others.

Of course, there is a simpler way of doing the calculation in the last transformation:

  1. Save the previous transformation under a different name.
  2. Remove the Calculator step. You can do this just by selecting it and pressing Delete.
  3. Drag and drop a User Defined Java Expression step from the Scripting folder. Create a hop from the Text file input step toward this new step.
  4. Double-click the step and configure it as shown in the following screenshot:
  5. Close the window and run a preview. You should see exactly the same result as before.

The Java Expression step is a powerful step that allows you to create fields of any type—not just numbers—by using a wide variety of expressions, as long as they can be expressed in a single line using Java syntax.
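The expression configured in the screenshot is not reproduced in the text, so the following is only a guess at its shape: a single Java expression that averages two numeric fields. The parameter names are hypothetical stand-ins for the restaurant price index and groceries index fields.

```java
class AverageIndex {

    // The body of this method is the kind of one-line expression the step accepts.
    static double averageIndex(double restaurantPriceIndex, double groceriesIndex) {
        return (restaurantPriceIndex + groceriesIndex) / 2;
    }

    public static void main(String[] args) {
        // Values taken from the Zurich row of the sample file
        System.out.println(averageIndex(135.76, 149.86));
    }
}
```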

In the last example, using the Java Expression step was simpler than doing the same with a Calculator step. Depending on the case, it can be more convenient to use one or the other.

This was just an example that showed you how to add new fields based on the fields in your dataset. There are many steps available, and with different purposes. You will find them mainly in the Transform folder, but there are some others in different folders in the Design tab. No matter which step you pick, the way you use it is always the same. You add the step at the end of the stream and then configure it properly according to your needs.

If you found this article interesting and helpful, you can check out Pentaho Data Integration Quick Start Guide. Featuring simplified and easy-to-follow examples, Pentaho Data Integration Quick Start Guide takes you through the underlying concepts in a lucid manner, so you can create efficient ETL processes using Pentaho.

How to Use the jcmd Command for the JVM

This article focuses on jcmd, the JDK's command-line diagnostic utility, and on the commands that Java 9 added to it. If the bin folder is on the path, you can invoke it by typing jcmd on the command line. Otherwise, you have to go to the bin directory or prefix jcmd in the examples with the full or relative (to the location of your command-line window) path to the bin folder.

If you open the bin folder of the Java installation, you can find quite a few command-line utilities there. These can be used to diagnose issues and monitor an application deployed with the Java Runtime Environment (JRE). They use different mechanisms to get the data they report. The mechanisms are specific to the Virtual Machine (VM) implementation, operating systems, and release. Typically, only a subset of these tools is applicable to a given issue.

If you type jcmd and no other Java process is currently running on the machine, you’ll get back only one line, as follows:

87863 jdk.jcmd/sun.tools.jcmd.JCmd

This shows that only one Java process is currently running (the jcmd utility itself) and it has the process identifier (PID) of 87863 (which will be different with each run).

JAVA Example

Now run a Java program, for example:

java -cp ./cookbook-1.0.jar com.packt.cookbook.ch11_memory.Chapter11Memory

The output of jcmd will show (with different PIDs) the following:

87864 jdk.jcmd/sun.tools.jcmd.JCmd
87785 com.packt.cookbook.ch11_memory.Chapter11Memory

If entered without any options, the jcmd utility reports the PIDs of all the currently running Java processes. After getting the PID, you can then use jcmd to request data from the JVM that runs the process:

jcmd 87785 VM.version

Alternatively, you can avoid using PID (and calling jcmd without parameters) by referring to the process by the main class of the application:

jcmd Chapter11Memory VM.version

You can read the JVM documentation for more details about the jcmd utility and how to use it.
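The same request can also be issued from Java code by launching jcmd as a child process. The sketch below is one possible way to do it, under two assumptions: the jcmd executable is on the path, and the class and method names are invented for the example. It asks the JVM it is running in for VM.version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class JcmdSelfCheck {

    // Builds the command line "jcmd <pid> <command>".
    static List<String> jcmdCommand(long pid, String diagnosticCommand) {
        return List.of("jcmd", Long.toString(pid), diagnosticCommand);
    }

    // Runs the command and returns everything it printed (stdout and stderr merged).
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        // PID of the JVM executing this class
        long pid = ProcessHandle.current().pid();
        System.out.println(run(jcmdCommand(pid, "VM.version")));
    }
}
```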

How to do it…

jcmd is a utility that allows you to issue commands to a specified Java process:

  1. Get the full list of the jcmd commands available for a particular Java process by executing the following line:

jcmd PID/main-class-name help

Instead of PID/main-class-name, enter the process identifier or the main class name. The list is specific to the JVM, so each listed command requests the data from the specific process.

  2. In JDK 8, the following jcmd commands were available:

JFR.stop
JFR.start
JFR.dump
JFR.check
VM.native_memory
VM.check_commercial_features
VM.unlock_commercial_features
ManagementAgent.stop
ManagementAgent.start_local
ManagementAgent.start
GC.rotate_log
Thread.print
GC.class_stats
GC.class_histogram
GC.heap_dump
GC.run_finalization
GC.run
VM.uptime
VM.flags
VM.system_properties
VM.command_line
VM.version

JDK 9 introduced the following jcmd commands (JDK 18.3 and JDK 18.9 did not add any new commands):

  • Compiler.queue: Prints the methods queued for compilation with either C1 or C2 (separate queues)
  • Compiler.codelist: Prints n-methods (compiled) with full signature, address range, and state (alive, non-entrant, and zombie), and allows the selection of printing to stdout, a file, XML, or text printout
  • Compiler.codecache: Prints the content of the code cache, where the JIT compiler stores the generated native code to improve performance
  • Compiler.directives_add file: Adds compiler directives from a file to the top of the directives stack
  • Compiler.directives_clear: Clears the compiler directives stack (leaves the default directives only)
  • Compiler.directives_print: Prints all the directives on the compiler directives stack from top to bottom
  • Compiler.directives_remove: Removes the top directive from the compiler directives stack
  • GC.heap_info: Prints the current heap parameters and status
  • GC.finalizer_info: Shows the status of the finalizer thread, which collects objects with a finalizer (that is, a finalize() method)
  • JFR.configure: Allows configuring the Java Flight Recorder
  • JVMTI.data_dump: Prints the Java Virtual Machine Tool Interface data dump
  • JVMTI.agent_load: Loads (attaches) the Java Virtual Machine Tool Interface agent
  • ManagementAgent.status: Prints the status of the remote JMX agent
  • Thread.print: Prints all the threads with stack traces
  • VM.log [option]: Allows setting the JVM log configuration at runtime, after the JVM has started (the availability can be seen using VM.log list)
  • VM.info: Prints the unified JVM info (version and configuration), a list of all threads and their state (without thread dump and heap dump), heap summary, JVM internal events (GC, JIT, safepoint, and so on), memory map with loaded native libraries, VM arguments and environment variables, and details of the operating system and hardware
  • VM.dynlibs: Prints information about dynamic libraries
  • VM.set_flag: Allows setting the JVM writable (also called manageable) flags
  • VM.stringtable and VM.symboltable: Print all UTF-8 string constants
  • VM.class_hierarchy [full-class-name]: Prints all the loaded classes or just a specified class hierarchy
  • VM.classloader_stats: Prints information about the classloader
  • VM.print_touched_methods: Prints all the methods that have been touched (have been read at least) at runtime

As you can see, these new commands belong to several groups, denoted by the prefix compiler, garbage collector (GC), Java Flight Recorder (JFR), Java Virtual Machine Tool Interface (JVMTI), Management Agent (related to remote JMX agent), thread, and VM.

How it works…

  1. To get help for the jcmd utility, run the following command:

jcmd -h

Here is the result of the command:

It tells you that the commands can also be read from the file specified after -f, and there is a PerfCounter.print command, which prints all the performance counters (statistics) of the process.

  2. Run the following command:

jcmd Chapter11Memory GC.heap_info

The output may look similar to this screenshot:

It shows the total heap size and how much of it was used, the size of a region in the young generation and how many regions are allocated, and the parameters of Metaspace and class space.

  3. The following command is very helpful in case you are looking for runaway threads or would like to know what else is going on behind the scenes:

jcmd Chapter11Memory Thread.print

Here is a fragment of the possible output:

  4. This command is probably used most often, as it produces a wealth of information about the hardware, the JVM process as a whole, and the current state of its components:

jcmd Chapter11Memory VM.info

It starts with a summary, as follows:

The general process description is as follows:

Then the details of the heap are shown (this is only a tiny fragment of it):

It then prints the compilation events, GC heap history, de-optimization events, internal exceptions, events, dynamic libraries, logging options, environment variables, VM arguments, and many parameters of the system running the process.

The jcmd commands give a deep insight into the JVM process, which helps to debug and tune the process for best performance and optimal resource usage.
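Some of the figures that GC.heap_info and Thread.print report can also be read from inside a running application through the standard java.lang.management API. The following is only a minimal sketch of that in-process analogue, not a description of how jcmd works internally; the class name JvmSnapshot and its method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class; not part of jcmd or of the recipe above.
final class JvmSnapshot {

    // Heap currently in use, in bytes (comparable to the "used" figure of GC.heap_info).
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Number of live threads, daemon and non-daemon (Thread.print lists these with their stacks).
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + usedHeapBytes());
        System.out.println("Live threads: " + liveThreadCount());

        // Print the name and state of every live thread, without lock details.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println(info.getThreadName() + " -> " + info.getThreadState());
        }
    }
}
```

Unlike jcmd, this runs inside the process being inspected, so it suits logging or health checks rather than diagnosing a process from the outside.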

If you found this article interesting, you can dive into Java 11 Cookbook – Second Edition to explore the new features added to Java 11 that will make your application modular, secure, and fast. Java 11 Cookbook – Second Edition offers a range of software development solutions with simple and straightforward Java 11 code examples to help you build a modern software system.

How to Develop a Real-Time Object Detection Project

Developing a real-time object detection project

You can develop a video object classification application using pre-trained YOLO models (that is, transfer learning), Deeplearning4j (DL4J), and OpenCV that can detect labels such as cars and trees inside a video frame. You can find the relevant code files for this tutorial at https://github.com/PacktPublishing/Java-Deep-Learning-Projects/tree/master/Chapter06. This application is also about extending an image detection problem to video detection. Time to get started!

Step 1 – Loading a pre-trained YOLO model

Since the 1.0.0-alpha release, DL4J has provided a Tiny YOLO model through its model zoo. To use it, add the following dependency to your Maven pom.xml file:

<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>deeplearning4j-zoo</artifactId>
  <version>${dl4j.version}</version>
</dependency>

Apart from this, if possible, make sure that you utilize CUDA and cuDNN by adding the following dependencies:

<dependency>
  <groupId>org.nd4j</groupId>
  <artifactId>nd4j-cuda-9.0-platform</artifactId>
  <version>${nd4j.version}</version>
</dependency>
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>deeplearning4j-cuda-9.0</artifactId>
  <version>${dl4j.version}</version>
</dependency>

Now, use the below code to load the pre-trained Tiny YOLO model as a Computation Graph. You can use the PASCAL Visual Object Classes (PASCAL VOC) dataset (see more at http://host.robots.ox.ac.uk/pascal/VOC/) to train the YOLO model.

private ComputationGraph model;

private TinyYoloModel() {
    try {
        model = (ComputationGraph) new TinyYOLO().initPretrained();
        createObjectLabels();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

In the above code segment, the createObjectLabels() method builds the list of labels from the PASCAL Visual Object Classes (PASCAL VOC) dataset. The method is implemented as follows:

private HashMap<Integer, String> labels;

void createObjectLabels() {
    if (labels == null) {
        // One label per line; the index of each label is its class ID
        String label = "aeroplane\n" + "bicycle\n" + "bird\n" + "boat\n" + "bottle\n" + "bus\n" + "car\n" +
                "cat\n" + "chair\n" + "cow\n" + "diningtable\n" + "dog\n" + "horse\n" + "motorbike\n" +
                "person\n" + "pottedplant\n" + "sheep\n" + "sofa\n" + "train\n" + "tvmonitor";
        String[] split = label.split("\n");
        int i = 0;
        labels = new HashMap<>();
        for (String label1 : split) {
            labels.put(i++, label1);
        }
    }
}
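To see how a predicted class index turns into a readable label, the mapping can be tried without loading the model. This is a minimal standalone sketch; the class name VocLabels and its methods are hypothetical, and only the label order is taken from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone version of the label lookup; not part of the project sources.
final class VocLabels {

    // The 20 PASCAL VOC class names, one per line, in the order used above.
    private static final String LABELS =
            "aeroplane\nbicycle\nbird\nboat\nbottle\nbus\ncar\n"
          + "cat\nchair\ncow\ndiningtable\ndog\nhorse\nmotorbike\n"
          + "person\npottedplant\nsheep\nsofa\ntrain\ntvmonitor";

    // Builds the index-to-label map: line 0 is class 0, line 1 is class 1, and so on.
    static Map<Integer, String> build() {
        Map<Integer, String> labels = new HashMap<>();
        String[] names = LABELS.split("\n");
        for (int i = 0; i < names.length; i++) {
            labels.put(i, names[i]);
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<Integer, String> labels = build();
        System.out.println("Number of classes: " + labels.size());
        System.out.println("Class 6 is: " + labels.get(6));
    }
}
```

The separator must be a real newline escape (\n) in both the label string and the split() call; otherwise the whole string ends up as a single label.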

Now, create a Tiny YOLO model instance:

static final TinyYoloModel yolo = new TinyYoloModel();

public static TinyYoloModel getPretrainedModel() {
    return yolo;
}

Take a look at the model architecture and the number of parameters in each layer:

TinyYoloModel model = TinyYoloModel.getPretrainedModel();
System.out.println(TinyYoloModel.getSummary());

Network summary and layer structure of a pre-trained Tiny YOLO model

Your Tiny YOLO model has around 1.6 million parameters across its 29-layer network. However, the original YOLO 2 model has more layers. You can look at the original YOLO 2 at https://github.com/yhcc/yolo2/blob/master/model_data/model.png.

Step 2 – Generating frames from video clips

To deal with real-time video, you can use video processing tools or frameworks such as JavaCV, which can split a video into individual frames and report the height and width of each image. For this, include the following dependency in the pom.xml file:

<dependency>
  <groupId>org.bytedeco</groupId>
  <artifactId>javacv-platform</artifactId>
  <version>1.4.1</version>
</dependency>

JavaCV uses wrappers from the JavaCPP presets of libraries commonly used by researchers in the field of computer vision (for example, OpenCV and FFmpeg). It provides utility classes to make their functionality easier to use on the Java platform, including Android.

For this project, there are two video clips (each 1 minute long) that should give you a glimpse into an autonomous driving car. This dataset has been downloaded from the following YouTube links:

After downloading them, they were renamed as follows:

  • SelfDrivingCar_Night.mp4
  • SelfDrivingCar_Day.mp4

When you play these clips, you’ll see how Germans drive their cars at 160 km/h or even faster. Now, parse the video (starting with the day clip) and look at some of its properties to get an idea of the video quality and the hardware requirements:

String videoPath = "data/SelfDrivingCar_Day.mp4";
FFmpegFrameGrabber frameGrabber = new FFmpegFrameGrabber(videoPath);
frameGrabber.start();

Frame frame;
double frameRate = frameGrabber.getFrameRate();
System.out.println("The inputted video clip has " + frameGrabber.getLengthInFrames() + " frames");
System.out.println("The inputted video clip has frame rate of " + frameRate);
>>>
 The inputted video clip has 1802 frames
 The inputted video clip has frame rate of 29.97002997002997

Now grab each frame and use Java2DFrameConverter to convert frames to JPEG images:

Java2DFrameConverter converter = new Java2DFrameConverter();

// grab the first frame
frameGrabber.setFrameNumber(1);
frame = frameGrabber.grab();
BufferedImage bufferedImage = converter.convert(frame);
System.out.println("First Frame" + ", Width: " + bufferedImage.getWidth() + ", Height: " + bufferedImage.getHeight());

// grab the second frame
frameGrabber.setFrameNumber(2);
frame = frameGrabber.grab();
bufferedImage = converter.convert(frame);
System.out.println("Second Frame" + ", Width: " + bufferedImage.getWidth() + ", Height: " + bufferedImage.getHeight());
>>>
 First Frame, Width: 640, Height: 360
 Second Frame, Width: 640, Height: 360

The above code converts only the first two frames; repeating the same steps for every frame generates 1,802 JPEG images, one per frame. Take a look at the generated images:

From video clip to video frame to image

Thus, the 1-minute video clip has roughly 1,800 frames at about 30 frames per second, and each frame is 640 x 360 pixels (that is, 360p rather than 720p quality). Running detection on this many frames in real time requires good hardware; in particular, having a GPU configured should help.

Step 3 – Feeding generated frames into the Tiny YOLO model

Now that you know some properties of the clip, start generating the frames to be passed to the pre-trained Tiny YOLO model. First, look at a simple but transparent approach:

private volatile Mat[] v = new Mat[1];
private String windowName = "Object Detection from Video";

try {
    // Advance by (int) frameRate so that roughly one frame per second is processed
    for (int i = 1; i < frameGrabber.getLengthInFrames(); i += (int) frameRate) {
        frameGrabber.setFrameNumber(i);
        frame = frameGrabber.grab();
        v[0] = new OpenCVFrameConverter.ToMat().convert(frame);
        model.markObjectWithBoundingBox(v[0], frame.imageWidth,
                                        frame.imageHeight, true, windowName);
        imshow(windowName, v[0]);

        char key = (char) waitKey(20);
        // Exit on escape:
        if (key == 27) {
            destroyAllWindows();
            break;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    frameGrabber.stop();
}
frameGrabber.close();

In the above code, you convert each sampled frame to a Mat, which represents it as an n-dimensional, dense, numerical multi-channel (that is, RGB) array, and then send it to the model. Because the loop counter advances by (int) frameRate, roughly one frame per second of video is processed rather than every single frame.

In other words, you split the video clip into frames and pass them to the Tiny YOLO model one by one. This way, a single neural network is applied to the full image of each frame.
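The sampling arithmetic of this loop can be checked without a video file. The sketch below reproduces only the loop counter, using the frame count and frame rate reported earlier for the day clip; the class name FrameSampling and its methods are hypothetical, and the guard against a zero step is an addition that the original loop does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the loop counter only; no JavaCV classes are involved.
final class FrameSampling {

    // Frame numbers visited by: for (i = 1; i < totalFrames; i += (int) frameRate)
    static List<Integer> sampledFrames(int totalFrames, double frameRate) {
        // Added safeguard: a frame rate below 1 would otherwise give a step of 0 and loop forever.
        int step = Math.max(1, (int) frameRate);
        List<Integer> frames = new ArrayList<>();
        for (int i = 1; i < totalFrames; i += step) {
            frames.add(i);
        }
        return frames;
    }

    // Approximate clip length in seconds.
    static double durationSeconds(int totalFrames, double frameRate) {
        return totalFrames / frameRate;
    }

    public static void main(String[] args) {
        int totalFrames = 1802;                // reported by getLengthInFrames()
        double frameRate = 29.97002997002997;  // reported by getFrameRate()

        List<Integer> frames = sampledFrames(totalFrames, frameRate);
        System.out.println("Frames sent to the model: " + frames.size());
        System.out.println("Last sampled frame: " + frames.get(frames.size() - 1));
        System.out.println("Clip length in seconds: " + durationSeconds(totalFrames, frameRate));
    }
}
```

Since (int) 29.97 truncates to 29, the loop visits frames 1, 30, 59, and so on, which comes to 63 of the 1,802 frames for this clip.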

Step 4 – Real object detection from image frames

Tiny YOLO extracts the features from each frame as an n-dimensional, dense, numerical multi-channel array. Then, each image is split into a grid of smaller rectangles (boxes):

public void markObjectWithBoundingBox(Mat file, int imageWidth, int imageHeight, boolean newBoundingBOx,
                                      String winName) throws Exception {
    // parameters matching the pretrained TinyYOLO model
    int W = 416; // Width of the network input
    int H = 416; // Height of the network input
    int gW = 13; // Grid width
    int gH = 13; // Grid height
    double dT = 0.5; // Detection threshold

    Yolo2OutputLayer outputLayer = (Yolo2OutputLayer) model.getOutputLayer(0);
    if (newBoundingBOx) {
        // Run the network on the frame and keep the detections above the threshold
        INDArray indArray = prepareImage(file, W, H);
        INDArray results = model.outputSingle(indArray);
        predictedObjects = outputLayer.getPredictedObjects(results, dT);
        System.out.println("results = " + predictedObjects);
        markWithBoundingBox(file, gW, gH, imageWidth, imageHeight);
    } else {
        // Reuse the detections from the previous call
        markWithBoundingBox(file, gW, gH, imageWidth, imageHeight);
    }
    imshow(winName, file);
}

In the above code, the prepareImage() method takes a video frame as an image, loads and resizes it using the NativeImageLoader class, scales the pixel values to the range 0 to 1, and returns the result as an INDArray that the model can consume:

INDArray prepareImage(Mat file, int width, int height) throws IOException {
    NativeImageLoader loader = new NativeImageLoader(height, width, 3);
    ImagePreProcessingScaler imagePreProcessingScaler = new ImagePreProcessingScaler(0, 1);
    INDArray indArray = loader.asMatrix(file);
    imagePreProcessingScaler.transform(indArray);
    return indArray;
}
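The effect of scaling to the range 0 to 1 can be illustrated on plain numbers. The following is a minimal sketch that assumes 8-bit channel values (0 to 255), which is what ImagePreProcessingScaler(0, 1) is understood to expect by default; the class name PixelScaling is hypothetical, and the sketch works on an int array rather than an INDArray.

```java
// Hypothetical illustration of 0-to-1 pixel scaling on a plain array; not DL4J code.
final class PixelScaling {

    // Maps 8-bit channel values (0..255) to the range 0.0..1.0.
    static double[] scaleToUnitRange(int[] pixels) {
        double[] scaled = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            scaled[i] = pixels[i] / 255.0;
        }
        return scaled;
    }

    public static void main(String[] args) {
        int[] channelValues = {0, 51, 255};
        for (double value : scaleToUnitRange(channelValues)) {
            System.out.println(value);
        }
    }
}
```

Keeping the inputs in a small, fixed range matches what the pre-trained network saw during training, which is why the same scaling has to be applied at prediction time.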

Then, the markWithBoundingBox() method is used for non-max suppression in the case of more than one bounding box.

Step 5 – Non-max suppression in case of more than one bounding box

As YOLO predicts more than one bounding box per object, non-max suppression is implemented; it merges all detections that belong to the same object. To draw a box, instead of using its center and size (b_x, b_y, b_h, and b_w), you can use the top-left and bottom-right points. gridWidth and gridHeight are the number of cells you split your image into (13 x 13 in this case), and w and h are the original image frame dimensions:

void markObjectWithBoundingBox(Mat file, int gridWidth, int gridHeight, int w, int h, DetectedObject obj) {
    double[] xy1 = obj.getTopLeftXY();
    double[] xy2 = obj.getBottomRightXY();
    int predictedClass = obj.getPredictedClass();

    // Convert grid coordinates to pixel coordinates of the original frame
    int x1 = (int) Math.round(w * xy1[0] / gridWidth);
    int y1 = (int) Math.round(h * xy1[1] / gridHeight);
    int x2 = (int) Math.round(w * xy2[0] / gridWidth);
    int y2 = (int) Math.round(h * xy2[1] / gridHeight);

    rectangle(file, new Point(x1, y1), new Point(x2, y2), Scalar.RED);
    putText(file, labels.get(predictedClass), new Point(x1 + 2, y2 - 2),
            FONT_HERSHEY_DUPLEX, 1, Scalar.GREEN);
}
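As a worked example of this conversion, the sketch below applies the same formula to the center and size values of the DetectedObject entry in the console log shown in this article, using the 13 x 13 grid and the 640 x 360 frame size. It assumes that the top-left and bottom-right corners are the center minus and plus half of the width and height, which is how DetectedObject is understood to derive them; the class name GridToPixels is hypothetical.

```java
import java.util.Arrays;

// Hypothetical worked example of the grid-to-pixel conversion; no DL4J or OpenCV classes needed.
final class GridToPixels {

    // Returns {x1, y1, x2, y2} in pixels for a box given by center and size in grid units.
    static int[] toPixelBox(double centerX, double centerY, double width, double height,
                            int gridWidth, int gridHeight, int frameWidth, int frameHeight) {
        // Assumption: corners are center -/+ half the box size, still in grid units.
        double left = centerX - width / 2;
        double top = centerY - height / 2;
        double right = centerX + width / 2;
        double bottom = centerY + height / 2;

        // Same scaling as above: pixels = frame dimension * grid coordinate / grid size.
        return new int[] {
            (int) Math.round(frameWidth * left / gridWidth),
            (int) Math.round(frameHeight * top / gridHeight),
            (int) Math.round(frameWidth * right / gridWidth),
            (int) Math.round(frameHeight * bottom / gridHeight)
        };
    }

    public static void main(String[] args) {
        int[] box = toPixelBox(3.5445247292518616, 7.621537864208221,
                               2.2568163871765137, 1.9423424005508423,
                               13, 13, 640, 360);
        System.out.println(Arrays.toString(box));
    }
}
```

For the logged detection, this gives a rectangle from (119, 184) to (230, 238) in the 640 x 360 frame.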

Finally, remove those objects whose boxes overlap too much with the highest-confidence detection (an intersection over union above 0.5), as follows:

static void removeObjectsIntersectingWithMax(ArrayList<DetectedObject> detectedObjects,
                                             DetectedObject maxObjectDetect) {
    double[] bottomRightXY1 = maxObjectDetect.getBottomRightXY();
    double[] topLeftXY1 = maxObjectDetect.getTopLeftXY();
    List<DetectedObject> removeIntersectingObjects = new ArrayList<>();

    for (DetectedObject detectedObject : detectedObjects) {
        double[] topLeftXY = detectedObject.getTopLeftXY();
        double[] bottomRightXY = detectedObject.getBottomRightXY();

        // Corners of the intersection rectangle
        double iox1 = Math.max(topLeftXY[0], topLeftXY1[0]);
        double ioy1 = Math.max(topLeftXY[1], topLeftXY1[1]);
        double iox2 = Math.min(bottomRightXY[0], bottomRightXY1[0]);
        double ioy2 = Math.min(bottomRightXY[1], bottomRightXY1[1]);

        // Clamp at zero so that boxes that do not overlap get an intersection area of 0
        double inter_area = Math.max(0, ioy2 - ioy1) * Math.max(0, iox2 - iox1);

        double box1_area = (bottomRightXY1[1] - topLeftXY1[1]) * (bottomRightXY1[0] - topLeftXY1[0]);
        double box2_area = (bottomRightXY[1] - topLeftXY[1]) * (bottomRightXY[0] - topLeftXY[0]);

        double union_area = box1_area + box2_area - inter_area;
        double iou = inter_area / union_area;

        if (iou > 0.5) {
            removeIntersectingObjects.add(detectedObject);
        }
    }
    detectedObjects.removeAll(removeIntersectingObjects);
}
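The overlap test and the suppression loop can also be tried without any DL4J classes. The sketch below is a generic greedy non-max suppression on plain arrays, offered as an illustration of the technique rather than as the project's implementation; the class name NonMaxSuppression and the box layout {x1, y1, x2, y2, confidence} are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical standalone sketch; each box is {x1, y1, x2, y2, confidence}.
final class NonMaxSuppression {

    static double area(double[] box) {
        return (box[2] - box[0]) * (box[3] - box[1]);
    }

    // Intersection over union; 0 when the boxes do not overlap.
    static double iou(double[] a, double[] b) {
        double interWidth = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
        double interHeight = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
        double intersection = interWidth * interHeight;
        double union = area(a) + area(b) - intersection;
        return union <= 0 ? 0 : intersection / union;
    }

    // Greedy suppression: keep the most confident box, drop boxes that overlap it too much, repeat.
    static List<double[]> suppress(List<double[]> boxes, double iouThreshold) {
        List<double[]> remaining = new ArrayList<>(boxes);
        remaining.sort(Comparator.comparingDouble((double[] box) -> box[4]).reversed());
        List<double[]> kept = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] best = remaining.remove(0);
            kept.add(best);
            remaining.removeIf(other -> iou(best, other) > iouThreshold);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> boxes = Arrays.asList(
                new double[] {0, 0, 10, 10, 0.9},    // most confident box
                new double[] {1, 1, 10, 10, 0.8},    // overlaps the first box heavily
                new double[] {20, 20, 30, 30, 0.7}); // separate object
        for (double[] box : suppress(boxes, 0.5)) {
            System.out.println(Arrays.toString(box));
        }
    }
}
```

Clamping the intersection width and height at zero matters: without it, two boxes that are apart on both axes would multiply two negative numbers into a positive "intersection" and could be suppressed by mistake.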

In the prepareImage() method, you scaled each frame to 416 x 416 x 3 (that is, W x H x 3 RGB channels). This scaled image is then passed to Tiny YOLO for predicting and marking the bounding boxes as follows:

Your Tiny YOLO model predicts the class of an object detected in a bounding box

Once the markObjectWithBoundingBox() method is executed, logs containing the predicted class, the box center and size (b_x, b_y, b_h, and b_w, shown as centerX, centerY, height, and width), and the confidence score (which is compared against the detection threshold) are printed to the console:

 [4.6233e-11]], predictedClass=6),

DetectedObject(exampleNumber=0,

centerX=3.5445247292518616, centerY=7.621537864208221,

width=2.2568163871765137, height=1.9423424005508423,

confidence=0.7954192161560059,

classPredictions=[[ 1.5034e-7], [ 3.3064e-9]...

Step 6 – Wrapping up everything and running the application

Up to this point, you know the overall workflow of your approach. You can now wrap up everything and see whether it really works. However, before this, take a look at the functionalities of different Java classes:

  • java: This shows how to grab frames from the video clip and save each frame as a JPEG image. It also shows some exploratory properties of the video clip.
  • TinyYoloModel.java: This instantiates the Tiny YOLO model and generates the labels. It also creates and marks the objects with bounding boxes, and shows how to handle non-max suppression for more than one bounding box per object.
  • ObjectDetectorFromVideo.java: This main class continuously grabs the frames and feeds them to the Tiny YOLO model (until the user presses the Esc key). Then, it predicts the corresponding class of each object successfully detected inside the normal or overlapped bounding boxes with non-max suppression (if required).

In short, first, you create and instantiate the Tiny YOLO model. Then, you grab the frames and treat each frame as a separate JPEG image. Next, you pass all the images to the model and the model does its trick as outlined previously. The whole workflow can now be depicted with some Java code as follows:

// ObjectDetectorFromVideo.java
public class ObjectDetectorFromVideo {
    private volatile Mat[] v = new Mat[1];
    private String windowName;

    public static void main(String[] args) throws java.lang.Exception {
        String videoPath = "data/SelfDrivingCar_Day.mp4";
        TinyYoloModel model = TinyYoloModel.getPretrainedModel();

        System.out.println(TinyYoloModel.getSummary());
        new ObjectDetectorFromVideo().startRealTimeVideoDetection(videoPath, model);
    }

    public void startRealTimeVideoDetection(String videoFileName, TinyYoloModel model)
            throws java.lang.Exception {
        windowName = "Object Detection from Video";
        FFmpegFrameGrabber frameGrabber = new FFmpegFrameGrabber(videoFileName);
        frameGrabber.start();

        Frame frame;
        double frameRate = frameGrabber.getFrameRate();
        System.out.println("The inputted video clip has " + frameGrabber.getLengthInFrames() + " frames");
        System.out.println("The inputted video clip has frame rate of " + frameRate);

        try {
            // Advance by (int) frameRate so that roughly one frame per second is processed
            for (int i = 1; i < frameGrabber.getLengthInFrames(); i += (int) frameRate) {
                frameGrabber.setFrameNumber(i);
                frame = frameGrabber.grab();
                v[0] = new OpenCVFrameConverter.ToMat().convert(frame);
                model.markObjectWithBoundingBox(v[0], frame.imageWidth, frame.imageHeight,
                                                true, windowName);
                imshow(windowName, v[0]);

                char key = (char) waitKey(20);
                // Exit on escape:
                if (key == 27) {
                    destroyAllWindows();
                    break;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            frameGrabber.stop();
        }
        frameGrabber.close();
    }
}

Once the preceding class is executed, the application should load the pre-trained model and the UI should be loaded, showing each object being classified:

Your Tiny YOLO model can predict multiple cars simultaneously from a video clip (day)

Now, to see the effectiveness of your model even in night mode, perform a second experiment on the night dataset. To do this, just change one line in the main() method, as follows:

String videoPath = "data/SelfDrivingCar_Night.mp4";

Once the preceding class is executed using this clip, the application should load the pre-trained model and the UI should be loaded, showing each object being classified:

Your Tiny YOLO model can predict multiple cars simultaneously from a video clip (night)

Furthermore, to see the real-time output, play the accompanying screen recording clips, which show the output of the application.

If you found this interesting, you can explore Md. Rezaul Karim’s Java Deep Learning Projects to build and deploy powerful neural network models using the latest Java deep learning libraries. Java Deep Learning Projects starts with an overview of deep learning concepts and then delves into advanced projects.