10/8/09

The Role of pureQuery in Legacy Modernization: From DB2 COBOL or PL/I Applications to Java – One SQL Statement at a Time

The Unavoidable Task of Legacy Modernization
New technologies often lure us to write programs using such cool features like metaprogramming in Groovy, Erlang style parallelism in Scala, Hadoop clusters, or the power of our freshly developed Domain Specific Language. However, every now and then we face a serious task of migration a legacy system from COBOL or PL/I to Java. The languages themselves are old, the developers are difficult to find and there is usually an organizational mandate to move from legacy language to Java. The application that needs to be replaced, albeit elderly, may be a critical one, processing billions of dollars a year.

Challenges in Language Transition
Modernizing legacy systems is sometimes done by external teams that are proficient in Java. However, such teams often do not have the understanding of the application whose replacement they are creating. The unfamiliarity with the problem domain often results in a new application being an inadequate replacement. Some organizations, aware of this problem, engage original developers, after they receive Java training and mentoring. The idea is that they will be able to rewrite the old system in Java, having the advantage of understanding the legacy solution and the problem it addresses.

However, COBOL or PL/I developers trained to program in Java have difficulties in the migration. The syntax of the new language is not the key problem; it is the new way of thinking. There is a danger of developers using the Java syntax and the old way of thinking about development. These challenges are widely known and discussed in many articles in industry and academia.

Challenges in Database Access: Productivity, Performance, Security
There is one more difficulty that is not receiving enough attention: the difference in the way how database access is performed in Java and in legacy systems. Most legacy applications use embedded SQL, with convenient mechanisms for passing data between host language and embedded SQL statements. The migration to Java exposes the developers to JDBC, and its low level APIs which require many lines of code to accomplish a simple task. A common reaction to this is that the developers question how this new technology is better than legacy when they are forced to write more code to accomplish the same task. The JDBC database access is simply less productive then a legacy technology. There was an attempt to remedy this situation by creating embedded SQL for Java known as SQLJ, but this solution never gained much popularity. Various object-relational mapping solutions like Hibernate or JPA improve productivity, but at a cost in performance and less than ideal solution when a database already exists. Notice that an often touted performance benefit of cashing, provided by some object-relational mappers, does not help if the applications do not try to access the same data multiple times.

But it does not end there: when developers run their programs they are in for a further surprise: the new program runs slower than the new one. An examination reveals that the legacy systems use static SQL, while JDBC uses dynamic SQL.

When using static SQL with DB2, the development tool sends the SQL to the database during the development time. The SQL can be optimized at the highest optimization level, taking into considerations the properties of data actually stored in the database. On DB2, this compilation and preparation process is called binding. The applications at runtime execute this compiled SQL resulting in fast execution. JDBC applications use dynamic SQL: the SQL is sent to the database first time during the execution of the program. The level of optimization is much weaker, as the database does not performs a thorough optimization as it may take too much time. Albeit a database can use a cache of used SQL statements, this is still less efficient than running highly optimized, precompiled static SQL.

To dismay of DBAs, JDBC can send any SQL to the database for execution. Contrast this with static SQL, which can use a number of security controls restricting the SQL that will run against the database. A low hanging fruit of static SQL is the immunity against the SQL injection attacks.

Legacy Migration with Optim pureQuery Runtime and Optim Development Studio
Optim pureQuery runtime and its APIs allow us to be productive in the migration to Java and to achieve good performance and security. While there is more to migration from legacy to Java than just the database code, we see that this is often a critical part of an application. In migrating legacy to Java, we have noticed a pattern for effective migration using pureQuery persistence API and Optim Development Studio tooling. Here is how to accomplish it using Optim Development Studio using a seven step process.
  1. Create a Java project in Optim Development Studio. This is where your Java code will reside.
  2. Create a Data Development project. This is where you will collect and work with SQL from the legacy system.
  3. Find the SQL in the legacy application and copy it to a SQL script in a Data Development project. You may want to run the SQL in order to ensure that you understand what it is doing.
  4. Generate pureQuery code using the pureQuery. This will result in creation of annotated data interface that contains SQL code in annotations. Query results are passed through generated Java beans. The implementation for the interface is generated as well.
  5. Repeat step 4 for all SQL statements from the legacy application.
  6. Implement application logic in Java, using the generated data interface and beans for passing the values in and out of SQL. As an additional boost in performance, developers can arrange the execution of their code so that it uses heterogeneous batch – significantly reducing the number of interactions of the applications with the database: various SQL statements are accumulated and then send to the database in one interaction only. Notice that some object-relational mappers can send the SQL to the database at the end of transaction, but this can still incur the same number of database interactions as before.
  7. When ready for deployment, bind your application from Optim Development Studio. During execution, the application can then use static SQL on DB2.
The Advantages of pureQuery over Coding Database Access with JDBC
This approach to modernization of legacy applications has several advantages over coding with JDBC.
  1. Productivity. Optim Development Studio generates complete pureQuery code for database access. This feature is very much appreciated by the legacy systems developers who rightfully find JDBC code to be error prone and difficult to write. Even if we would not use the convenient Development Studio tooling, hand written pureQuery code is significantly shorter than equivalent JDBC code. In addition, when editing SQL from Java code, Optim Development Studio provide for SQL autocompletion for tables and columns using a live database connection.
  2. Performance. Performance is improved in two ways: first through the use of static SQL, and second through the use of heterogeneous batch. In our experience, when accessing a database residing on a different machine from the application, heterogeneous batch has a decisive performance advantage over JDBC.
  3. Security. This is due to the ability to restrict the SQL statements run on the database. The applications do not lose the security controls they had in the legacy application.
  4. Traceability and Troubleshooting. Optim Development Studio tooling provides for traceability between the SQL that is sent to the database and the Java statements that contain the SQL. This traceability mechanism provides for better troubleshooting, as problematic SQL can be easily identified in the application code. In addition, for the SQL in the application, one can use performance optimization tools like Optim Query Tuner to optimize SQL, use explanation of access path with Visual Explain and use performance metrics for queries.
Conclusion
We found the migration of legacy DB2 applications to Java is not as difficult as it used to be, thanks to Optim pureQuery runtime and Optim Development Studio. The pureQuery solution fits the gap that exists between cumbersome and verbose JDBC code and object-relational mapping which often underperform when using with existing databases. The presented seven step migration process is easy to follow, even for developers new to Java. The migrated applications execute SQL securely and efficiently, offering a rich functionality for traceability and sophisticated performance tuning.

The conclusion: If you are planning to migrate a legacy DB2 application to Java, I recommend to give pureQuery and Optim Development Studio a try – you might be pleasantly surprised with its efficiency and productivity.

The techniques described in this post are explained in detail in our course Development of Database Applications with pureQuery and Optim Development Studio.

Happy legacy modernization!

No comments:

Post a Comment