Dec 27, 2014

SQL benchmarking tool for Java

This article describes a simple java tool for recording and replaying MySQL queries while measuring their execution times. The replay mechanism makes the measurements comparable across different environments.

Why do we need another benchmarking tool? What's so special about this one?
  • It's capable of reading and replaying MySQL's 'general query logs', not 'slow query logs' and not tcpdumps. Percona Playback in version 0.3 doesn't supports 'general query logs', version 0.6 seems to also support 'general query logs'.
  • It's measuring execution times of all queries and aggregating identical executions. pt-query-digest supports general query log, but only without measuring execution times.
  • It's written in pure Java, making it the most realistic way of running queries against the server if you're developing a Java application.
  • It's small and easy to use, not a full blown full featured benchmarking suite.
  • It does not offer advanced filtering methods, just restriction to connection ids and prefix matchings.
  • Measurements can be exported in either CSV for easy Excel import or in JETM-Style.


In some cases, it's necessary to run a simple single-component benchmark on a MySQL server. In opposition to a full-stack benchmark, a single-component benchmark only targets the isolated database. It can easily be modified to target the network connection, too.

The benchmarking strategy presented in this article, records all queries during a real application run. Using a simple benchmarking tool written in java, allows you to re-run all recorded queries multiple times and record all query runtimes.

Besides slow query log and the built in profiling mechanisms, MySQL offers the often unnoticed general query log. You can use this mechanism to log all queries which run while executing various use cases in your application. My idea was to use the general query log to re-run all queries on the MySQL server and measure their timings. This enables me to reproduce the same use case multiple times on the same or on different MySQL servers or in different network configurations. Measuring the performance based on the same query execution makes measurements comparable.

This strategy is useful if you want to:
  • simply record and replay MySQL queries including all MySQL-parameters while executing a specific use case
  • compare MySQL performance on different server instances
  • compare MySQL performance in different network configurations to spot possible network bottlenecks
  • monitor MySQL performance continuously
  • spot and count slowest queries while executing a possibly complex use case

The following picture shows the whole benchmarking strategy in a sequential process: You execute a use case in your application with enabled general log in mysql. You can use this logfile to re-run all queries which were executed on the server during your usecase. The benchmark results can be viewed as text files or evaluated after an Excel import.


All I needed to do is writing a small tool which is able to parse the general log file and re-run all queries against a configurable mysql server. The output should be configurable in JETM-Style or csv-format which allows me to see all query timings in average, min, max, total and the count of each querie's excution.

Getting the log

Many developers know of the slow query log and the built in profiling mechanism which allow you to analyze and optimize single queries. But it seems not so many developers know about the general query log which is simply activated by setting one sql variable like this:

SET GLOBAL general_log = 'ON';

This method works on any platform and does not require a server restart.

To specify a log file name, use --general_log_file=file_name. To specify the log destination, use --log-output (as described by dev.mysql.com in " Selecting General Query and Slow Query Log Output Destinations")

The tool: MySQLQueryBenchmark

MySQLQueryBenchmark is a java console application, taking a slow query log, executing the contained queries and returning their execution times as JETM-measurements or in csv-format.

Usage for impatient:
# java -jar mysqlquerybenchmarking.jar -log LOG_FILE -db DATABASE -p MYSQL_USERNAME -u MYSQL_PASSWORD -ignore show,connect,set

The results will be stored in results.txt in JETM-Format by default. If nothing different is specified, localhost:3306 is being used as mysql server.

There are useful command line options which allow you to only execute specific queries, which match a prefix or are logged under one connection-id.


After executing all queries, the output in JETM-Format will look similar to this:



In some cases, you proably want to compare multiple results or evaluate the values in a program like Excel. The benchmarking tool supports csv output using the parameter -f csv. The output will look similar to this:


This file can easily be opened in Excel for further evaluation.

Digging deeper

As a developer, I cannot resist telling you how the tool is implemented.

  • For command line parsing, the tool uses jcommander. Parameters are simply stored in a parameter class like this:

  • The class QueryParser accepts one line from the log file, a restriction-id and a prefix to parse a query from that line if possible:

  • The class QueryBenchmark executes all queries and measures their timings using JETM:

  • For query execution, the class SQLStatementExecutor fires against the MySQL-server:

  • For csv output, I simply created a class CsvRenderer derived from JETM's MeasurementRenderer. I used the SimpleTextRenderer as template and adjusted it to generate a csv output.

Downloads

Git tepository
Precompiled jar
This tool is open source under Apache License, Version 2.0.


Enjoy benchmarking with this tool.
Have a nice day!