Using the JMeter Regular Expression Extractor with a For Each Loop to make a Web Crawler

JMeter uses regular expressions to correlate values via the Regular Expression Extractor control. Regular expressions are an excellent way to search text and are ideally suited to correlation, where a high degree of control is often wanted over when a match is and is not returned. You can read more about their use in the context of performance testing here.

A classic example of their use is to capture some form of session string from a server response and send it back to the server with subsequent calls. For example: this JMX file shows a dummy example illustrating how this control works. The first GET call simulates a response from the server containing the string `value="ABC123efg456" name="session-token"`. A Regular Expression Extractor is placed as a child of this GET call, which means it will only look at the response from this one, individual call. (This is normally how this control is used; it would be complicated and messy to try to extend the scope of regular expression extractors by placing them anywhere other than as a child of a single call.)

The Regular Expression Extractor itself looks like this:

An Example of a Typical Regular Expression Extractor

By using the regular expression `value="([^"]+)" name="session-token"` we are able to extract the value `ABC123efg456`, which is automatically stored in the variable ${session} and can be used later in the test as required.
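In terms of the extractor's own fields, the configuration amounts to something like this (a sketch using the standard Regular Expression Extractor fields; the Default Value shown is just an arbitrary placeholder):

Reference Name:       session
Regular Expression:   value="([^"]+)" name="session-token"
Template:             $1$
Match No.:            1
Default Value:        SESSION_NOT_FOUND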

Crawling a Page

A slightly more advanced example is to use the Regular Expression Extractor control in tandem with a For Each Controller. This test plan shows how this can be done to create a basic web crawler.

The main reason this is so simple to achieve here is because JMeter uses regular expressions to search the response from the server. Regular expressions support the concept of multiple matches and make it very easy to loop through a dynamic result set performing an action for each match.

In the example given we make a request to a dummy homepage:

Simple Homepage Request

As a child to this homepage request there is a regular expression extractor:

A RegEx that Returns Multiple Matches

This regex:

<a href="http:\/\/www.YOURSITE.com([^"]+)"

looks for every href link on the page that goes to the domain www.YOURSITE.com (hrefs to other sites are ignored). It puts every match into the resulting variable: `urls`.

The key here is the Match No. field, which is set to -1. By using -1 we tell JMeter to look for multiple matches and build an array of results. Normally, we would use a value like 1 here, to return the 1st match, or 0 for a random match.
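Putting that together, the crawler's extractor would be configured along these lines (a sketch; the Reference Name matches the urls variable used below):

Reference Name:       urls
Regular Expression:   <a href="http:\/\/www.YOURSITE.com([^"]+)"
Template:             $1$
Match No.:            -1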

Now that we have an array of results we can use the ForEach Controller to loop through them:

An Example of Iterating Multiple Matches

 

The configuration of this control is very simple. For the input variable prefix we use `urls`, as this is the variable populated earlier by the regular expression extractor. The output variable name can be anything but `url` is logical.
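A sketch of those settings (standard ForEach Controller fields; the underscore option matters because the extractor creates urls_1, urls_2 and so on):

Input variable prefix:    urls
Output variable name:     url
Add "_" before number?:   checked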

Now, this control will loop once for each url in urls. Each request that is a child of this ForEach Controller will be run once for each loop.

Using the Output from the ForEach Controller

The variable ${url} is created by the ForEach Controller and contains a different value on every loop, one for each href link that was found. If you wanted to extend this to really crawl a site, all you would have to do is copy and paste the regular expression extractor (you can click on a control and CTRL-C to copy it whole), create another ForEach Controller nested inside the first, and then repeat for as many levels as you want to check.

OK, there are better ways to write a web crawler, but this serves as an example to demonstrate how easy it is to build, extend and adjust JMeter test plans.

Getting Started with Apache JMeter

Apache JMeter is a free, open-source load testing tool with a rich set of features that enable the creation of advanced, transactional test scripts. It is, I think, the leading open source load testing tool available, but it is often seen as having a steep learning curve. This isn’t because JMeter is complicated; it’s actually very simple to use, but it does have a non-standard interface which can be initially confusing, especially to those used to other load test tools. This post is an attempt to smooth this learning curve by explaining some of the core concepts.

Installation

So, the first thing you need to do is install JMeter. This is very easily done because JMeter runs in Java, and the chances are you’re reading this on a machine that has a Java Runtime Environment already present.

1. Download the latest version from here. For simplicity, choose one of the ‘binary’ files.

2. Create a directory, maybe `c:\apache-jmeter\v2-8\` or `/Applications/apache-jmeter-v2.8/`, you get the idea.

3. Unzip the file to this directory

That’s it. You now have JMeter installed. To run it either execute `/Applications/apache-jmeter-v2.8/bin/jmeter` or double click the `c:\apache-jmeter\v2-8\jmeter.bat` file.

The JMeter GUI

This will open the JMeter GUI which will look a lot like this:

Default JMeter GUI

Here is an example test file that you can download and open using JMeter. Contained in this test plan are some of the basic elements for a load test against a typical web site. There are other features you might want to add in, such as correlation, loops etc., but this is a start to explain the interface.

The Tree View

An important concept to understand in JMeter is the use of the tree model. This is displayed in the panel on the left and it is the heart of how tests are put together in JMeter; it decides not only execution order but also the scope of different controls. In fact, it decides everything – in this one place you build the entire test. Here you can:

  • add new requests
  • perform correlation
  • add verification checks
  • define configuration for cookies, headers etc.
  • define virtual users (threads)
  • insert think time
  • define execution loops
  • add while / for loops
  • add if statements
  • define pacing
  • add transactions
  • configure rampup / duration values
  • define scheduling
  • configure groups of scripts
  • decide execution order
  • really, everything…

The tree view is a graphical representation of the test plan. JMeter uses XML to store your test plan and it is this XML that is being shown here. In the same way that XML uses parent child relationships to decide scope, JMeter decides the scope of each control based on its position in the tree.

You’ll notice that there are multiple Thread Groups in this example – loginAndBrowse & staticPages. JMeter uses Thread Groups to separate groups of requests (samples). Each group can be assigned a set of configuration options that decides how many threads (virtual users) are run, how many loops (iterations) are executed and also other values such as rampup and duration. Typically, each Thread Group runs in parallel and execution order in JMeter is simply top down within each Thread Group.

Configure a Thread Group

A Thread Group is best thought of as a script; each group is (normally) run in isolation and in parallel to other groups. In LoadRunner terms, a virtual user script is the equivalent of a thread group. Inside a thread group, rather than a script view containing code, you have a sub tree containing graphically represented requests (samplers) that are executed in order, from the top down. The next sampler (request) will not start until the previous one is complete. This is how JMeter defines journeys through a site.
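As a rough sketch, the fields you typically set on a Thread Group look like this (the values are arbitrary examples; Duration only applies if the Scheduler checkbox is ticked):

Number of Threads (users):      10
Ramp-Up Period (in seconds):    60
Loop Count:                     Forever, or a fixed number of iterations
Scheduler > Duration (seconds): 600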

Scope

As previously mentioned, the position of an element in the tree decides its scope, i.e. it decides the other controls that are affected by it. For example, in our example there is a control called HTTP Cookie Manager. This control is telling JMeter to manage cookies – simply by including it you tell JMeter to react to any cookies sent by the server, keep them in memory for each user (thread) and send them back with subsequent calls; it is fully automatic. But in this case we have positioned it inside the loginAndBrowse thread group so it will only apply to samplers inside the same group. In other words, the scope of this control is limited to this Thread Group. We could also have put the HTTP Cookie Manager at the root of the test plan like this:

With the cookie manager at the root of the test plan it will apply to all samplers, in all thread groups.

Another example of scope is shown with this Response Assertion:

A Response Assertion Limited to one Sampler

Because this control is a child of a single sampler, the only request affected by it is its parent, the login request. You could place this assertion higher up the tree and it would check the response of multiple requests. You could even place it at the root of the test plan and it would check every single request. A use case for this is setting a negative assertion: you might have a generic error page for your site that returns something like: “Aw Shucks, Something Went Wrong”. The configuration shown below will check each page for this error text and only pass a request if it is NOT found. Note how the ‘Pattern Matching Rules’ are set to ‘NOT’.

An example of a global error check
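A sketch of that assertion's settings (standard Response Assertion fields; the error text is the example used above):

Apply to:                Main sample only
Response Field to Test:  Text Response
Pattern Matching Rules:  Contains, with Not checked
Patterns to Test:        Aw Shucks, Something Went Wrong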

A final example of scope is Timers. Timers are instructions to JMeter to wait for a period of time before continuing. Like everything (yes, everything) they follow the same rules of scope. If you do this:

Then this will NOT insert a delay just between the login and browse samplers. Instead, it will apply a random wait to every request within its scope. That is, the homepage, login, browse and logout requests will all have a random wait applied to them. If you want a wait to be specific to a single sampler then do this:

But hey, maybe you want to have every request in this thread group wait for a random time before continuing. Maybe you want all requests in the whole test plan to have think time before them. All you have to do is drag and drop this timer control to the position you want it and the rest is done for you – welcome to the awesomeness of JMeter.

Pacing

In load testing it’s always a good idea to pace your test so that you are running at a controlled rate. In LoadRunner you do this using Pacing; in JMeter you can use the Constant Throughput Timer. When added, this will make sure the requests within its scope are made no more than X times per minute. Like everything in JMeter it follows the same scope rules, so if you put it globally your test would, overall, only make X requests per minute; if you put it as a child of a single request then this individual request will only be made X times per minute.

By the way, have you noticed how scope applies to everything? That’s a very cool thing that you will come to appreciate. Not only can you reliably and quickly change the whole setup of a test by simply dragging and dropping controls, but because the interface is a visual tree-view, it’s ridiculously easy to do it as well.

So, an example of pacing is this:

Using a Constant Throughput Timer for Pacing

The homepage request is limited to 1 request per minute, per thread. Because all the requests in a thread group are run sequentially and because each request waits for the one before it, you are effectively pacing the whole script (thread group) for each user (thread). This is a useful way to set pacing for a particular journey.
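A sketch of the timer settings behind this (field names as they appear on the Constant Throughput Timer; 1.0 matches the one-request-per-minute example above):

Target throughput (in samples per minute):  1.0
Calculate Throughput based on:              this thread only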

Controlling Execution with Loops

This is so easy it’s silly. Want to add a loop to your test to make one request 5 times? Do this:

That’s an obvious example to explain the interface but you should know that you can do so much more. For example, you can make a request to a search page and then parse the response to get the url of every results link. Then, you can use a For Each Controller to dynamically call each of the results. Hey, you just built a web crawler!

You can have an IF statement that only executes its children if a certain condition is true. Or a While loop that will run until a condition becomes true.

You don’t have to write code for this stuff, it is all visual, and this makes adding this sort of control to your test plan very easy and very fast.

Listeners

Listeners do two main things:

  1. Let you visualise the test as it is running (or after it has run, by importing the jtl file into them).
  2. Allow you to debug your test.

Visualisation

OK, so this is not JMeter’s strongest feature, JMeter is not pretty; it’s the geeky one in the corner with thick glasses. But you can make it slightly sexier using this plugin. Just copy the JMeterPlugins.jar file into the /lib/ext/ directory.

Debugging a Test

Use the View Results Tree Listener to debug your tests. It’s a very concise, graphical way to see what requests were made, the response you got for each call and the order in which they were made. There’s no need to trawl through log files.

Running JMeter

JMeter can be run in two modes: GUI and COMMAND LINE. There’s a good reason for this and, typically, they both have a purpose. The GUI is what you use to create, edit and debug tests. It is a rich graphical representation of your test plan that – once you get your head round it – allows scripts to be put together in a very short time. You can run complete tests from the GUI, but typically it is better to execute the full, proper test run in COMMAND LINE mode. The reason for this is performance: JMeter will perform much better when not in GUI mode and this is important if you are running heavy load tests.

Running in GUI Mode

Erm, well, just press the green button shaped like an arrow, or use CTRL-R. To stop, it’s the big red square or CTRL-. (period).

Running in CMD LINE Mode

[Note. I'm focusing on unix based systems here but the windows bat file follows very similar conventions.]

Less pretty, MUCH more powerful. Type:

./path/to/jmeter/bin/jmeter -n -t /path/to/my/test.jmx -l /path/where/to/put/results.jtl

Broken down:

./path/to/jmeter/bin/jmeter = run the jmeter process
-n = Run headless, without the GUI (I.e. n = non gui mode)
-t /path/to/my/test.jmx = tell jmeter which test plan to run
-l /path/where/to/put/results.jtl = put results here

When running at the command line you neither need nor want Listeners to be enabled. You can disable a control in the GUI by pressing CTRL-T. There’s one exception: you can use a Generate Summary Results listener to get output on the command line.

As ever, there’s a lot more available than just these basics. You can pass in custom variables and define custom properties. You can even integrate this command into a shell script and call it from your build server and, hey presto, you have the beginnings of a CI system.
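For example, a property can be passed in at the command line and read inside the test plan with the __P function (the property name 'users' here is just an illustration; inside the jmx you would reference it as ${__P(users,10)}, where 10 is the default):

./path/to/jmeter/bin/jmeter -n -t /path/to/my/test.jmx -l /path/where/to/put/results.jtl -Jusers=50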

Full Details here. Play.

 

 

jmeter-ec2 | Run JMeter on Amazon’s ec2 Cloud

jmeter-ec2 is a simple, freely-available shell script that automates running Apache JMeter on Amazon EC2 or any other server. It does things like:

  • Using Amazon’s API to launch instances
  • Installing JAVA & JMeter
  • Copying test files to each server
  • Adjusting thread counts to ensure the load is evenly distributed over each host
  • Editing the jmx for file paths to external data files
  • Displaying real-time aggregated results from the test as it is running
  • Downloading and collating all jtl files
  • Terminating instances.

Jump to Example for an idiot-level step by step example.

UPDATE: The name ‘jmeter-ec2’ is actually something of a misnomer now, as I recently added the ability to specify a list of already existing hosts that the script will run JMeter on (via the REMOTE_HOSTS property). These hosts can be any number of linux based machines in any location that you have access to; you are not limited to using Amazon EC2.

Even though ‘it’ has a ‘name’, it’s just a shell script. As a shell script, it is mainly designed to improve the efficiency of an otherwise repetitive task. It is not an application; it does not write test scripts, identify bottlenecks or make you better at your job. About the most advanced thing it does is aggregate the output from the Generate Summary Results listener to the screen, which is basic math. That said, it does save an awful lot of time.

[Source: https://github.com/oliverlloyd/jmeter-ec2 with usage instructions in the README file]

The Cloud and Load Testing
One of the things the Cloud is useful for is load testing; very large amounts of hardware can be used to generate load at minimal cost, with the added benefit that, if the application you are testing is external to your corporate network, your tests will be run from a realistic location, avoiding artificial bottlenecks on your LAN. This type of testing, using externally located hosts, is increasingly common and JMeter is a superb open-source solution for facilitating it. There are also several excellent paid-for options for running load tests from the Cloud. SOASTA are, in my opinion, developing a very interesting offering in this area – they even allow tests up to 100 threads/users for free using their CloudTest Lite solution. But if you are comfortable using Apache JMeter, do not need advanced analytics or presentation, and have a requirement for multiple externally-hosted load generators, then JMeter and EC2 will probably do the job.

There are still cases where a traditional test lab located internally can and should be used for load testing but there is currently a sea change happening in the industry where more and more functionality is being delivered via the SaaS or PaaS model and in my view our approach to Performance Testing needs to adapt to this. Fundamentally, if the application that is being tested is remotely hosted then the test rig should be also.

Using JMeter on Amazon EC2
Running JMeter in the Cloud is not as simple as you might think. Certainly, assuming you run JMeter from the command line then spinning up a single instance and running a test from there is not especially difficult, but running a test over multiple instances is time consuming.

Some problems with trying to run JMeter in the Cloud:

  1. If you want to use your local machine to control the test you have to navigate the joyous process of getting RMI to work over multiple subnets and firewalls so that your local machine can control multiple remote (EC2) slaves – you can do this by tunnelling RMI communication and patching JMeter, but it’s messy. To work around this issue you have to use a remote Master as well as remote Slaves.
  2. Even then, in master/slave mode, when running high throughput tests JMeter will eventually reach an IO or network bottleneck that will affect the results (even using Batch mode). Too many processes trying to write to a single file at the same time inevitably start to queue. To be certain of avoiding this issue you have to avoid Distributed mode altogether and instead run multiple independent tests over n hosts, each running at the command line without the GUI and keeping its results local rather than writing to a single master. Then, after the test is complete, you have to collate everything. This approach has several annoying problems: a) excessive terminal windows, b) limited visibility of test progress as it happens, c) to preserve throughput the jmx file must be adjusted for the number of hosts in use and d) too much time is spent on repetitive tasks that could be automated.

jmeter-ec2 Shell Script – Details
To get around the problems listed above and improve my life/work balance I wrote a shell script and then GitHub made me invent a name for it.

jmeter-ec2 running:

Displaying real-time test results:

Completing the test:

Project Files:
Jmeter-ec2.properties
This file contains several properties required for running the script. Most of these relate to your Amazon account and should already have been set up as part of your Amazon AWS registration. This file should have executable permissions (it is ‘run’ to set each property).

In place of using Amazon to create the hardware it is also possible to specify a list of hosts to run the test over. If the REMOTE_HOSTS property is populated in this file then the script will use these machines instead.

jmeter-ec2.sh
This is the main shell script. It should have executable permissions.

install.sh
Used to install Java & JMeter on each slave instance (not to run locally!)

jmeter & jmeter.properties
These are edited versions of two of JMeter source files. They are included here to enable certain features of the script. After JMeter is installed, if these files are present in the root directory then they are uploaded in place of the default files.

The values changed from their defaults are:
[jmeter]:
HEAP="-Xms2048m -Xmx2048m"                 - OPTIONAL (specific to your test)
NEW="-XX:NewSize=256m -XX:MaxNewSize=256m" - OPTIONAL (specific to your test)

[jmeter.properties]:
jmeter.save.saveservice.output_format=csv  - REQUIRED, DO NOT CHANGE
jmeter.save.saveservice.hostname=true      - OPTIONAL (but useful in this context)
jmeter.save.saveservice.thread_counts=true - REQUIRED, DO NOT CHANGE
summariser.interval=15                     - OPTIONAL (should be tuned)

The main change is to set the output format to csv. Without this the script will not be able to collate the results and they will be corrupted, but the test will still run. The post processing step requires that thread counts be present as it uses a column index to adjust them for the number of hosts used in the test.


An Example: (Using Mac OSX)

Prerequisites:

  • That the testplan to be run has a Generate Summary Results listener*. See here.
  • That your JMeter testplan has a duration or loop count value set**. See here.

[*Without this no results will be displayed to the screen but the test will still run. No other listeners need to nor should be present.]

[**Without this the test will run forever or until you press CTRL-C. All testplans should also employ some form of pacing as best practice - load tests should not be run without some way to control the throughput. One way this can be achieved in JMeter is using the Constant Throughput Timer.]

Prerequisites specific to using Amazon:

  • That you have an Amazon AWS account. See here.
  • You have Amazon’s API tools installed on your machine. See here.

Step by Step Instructions:
STEP 1 – First create a directory called something like /Users/oliver/jmeter-ec2. This is the script home and it is where you will place your test files, project by project.

STEP 2 – Clone / Fork the files from GitHub placing the files into the directory just created. Tip: For simple read-only snapshots use the ZIP button to download.

This gives:

STEP 3 – Now extract the contents of example-project.zip to give:

Note. myproject.jmx is simply a dummy testplan but it demonstrates the structure required for the script to work.

myproject.jmx:

STEP 4 – If you would like to execute a jmx file called foobar.jmx then you should copy this into the /myproject/jmx directory and rename the directory /myproject to /foobar.

Any jmx testplan used should have:

1. A Generate Summary Results listener.

2. Duration or Loop Count should be set to a fixed value. This is not essential as the test can be shut down using CTRL-C; the script will capture this and process any results up to that point.

STEP 5 – If your testplan uses any external files then copy these into the /data directory under your project. For example, if your testplan foobar.jmx references a file mydatafile.csv then copy this into /foobar/data. During execution the script will automatically adjust any references in the jmx file to point to the remote version of this file.

STEP 6 – The final step is to update the values in the jmeter-ec2.properties file. If using Amazon then these need to correspond with your Amazon AWS account, the AMI you intend to use and the directory you created above. If not using Amazon then REMOTE_HOSTS needs to be set to a valid, comma-separated list of hostnames.

jmeter-ec2.properties file:
# This is a java style properties file for the jmeter-ec2 shell script
#
# It is treated like a normal shell script and must have executable permissions
#
# See README.txt for more details about each property
#
LOCAL_HOME="/Users/oliver/jmeter-ec2"
REMOTE_HOME="/tmp"
AMI_ID="ami-a5e7dad1"
INSTANCE_TYPE="t1.micro"
INSTANCE_SECURITYGROUP="jmeter"
PEM_FILE="olloyd-eu"
PEM_PATH="/Users/oliver/.ec2"
INSTANCE_AVAILABILITYZONE="eu-west-1b"
USER="root"
RUNNINGTOTAL_INTERVAL="3"
REMOTE_HOSTS=""

Where:

  • LOCAL_HOME is the directory you created on your computer.
  • INSTANCE_SECURITYGROUP, PEM_FILE and PEM_PATH relate to your Amazon account.
  • AMI_ID, INSTANCE_TYPE, INSTANCE_AVAILABILITYZONE and USER all relate to the AMI used.
  • REMOTE_HOSTS is an optional list of hosts to use in place of creating new ones.
  • REMOTE_HOME & RUNNINGTOTAL_INTERVAL can be left as shown.

The test can now be run; open a terminal and type:

$ cd /Users/oliver/jmeter-ec2
$ ./jmeter-ec2.sh myproject 1

If you copied your own foobar.jmx file into the /jmx directory in step 4 then use:

$ ./jmeter-ec2.sh foobar 1

‘1’ is the count of hosts to request from Amazon; you can specify as many as your AWS account allows. If you try to launch more than you are allowed, Amazon will only return the maximum possible according to your account limits and the script will adjust for this. If you are specifying your own hosts then this value is ignored.

If you have more than one project in the /jmeter-ec2 directory then in the command you would replace ‘myproject’ with ‘anotherproject’. For example, if you wanted to run anotherproject using 5 Amazon instances you would type:

$ ./jmeter-ec2.sh anotherproject 5

Finally, if you do not want to use Amazon and already have a list of valid hostnames to run the test over then you could simply call:

$ ./jmeter-ec2.sh myproject

and the test will be run over as many hosts as were specified in the jmeter-ec2.properties file.

 

Project Source: https://github.com/oliverlloyd/jmeter-ec2

Why does Think Time matter?

A lot of people can tell you what think time is but surprisingly few can explain why it is important. It’s actually an excellent interview question: typically, if you ask a less experienced performance tester, they will give the stock response: “because it makes your scripts more realistic”. But why does this matter? If I have a simple script that logs into an application, performs 3 purchase actions, and then logs out, why do I care if it pauses in-between page requests? A more experienced person, when asked the same question, is likely to check their watch, take a deep breath, and begin with: “Well,…”.

What comes after that ‘Well,’ could actually be many things but from a person’s response you can gain a good understanding of their experience, for example, I spend a lot of time performance testing session based applications running on Java so my answer would start with something like this:

Think time matters because it holds open a user’s session for a realistic period of time. This means the memory footprint of that user exists for longer (compared to not using think time) and as a consequence I can be more confident that my test is providing a useful representation of how memory will be used in the JVM. Anyone who has experience tuning memory usage in JVMs will know that this is crucial. Without think time you cannot test concurrency, without accurate concurrency you cannot test heap size, garbage collection, thread / connection pool sizes, etc., your entire load profile would pretty much be wrong and your test substantially less useful.

Equally important is knowing when think time is not required. If I am writing a script to test calls to atomic webservices then I have no need to use artificial wait times. My requests will still be using memory in the application server but, in the case of a stateless, atomic call, this memory will only exist for the life of the actual request and the inclusion of think time in-between each call will serve no purpose and could actually detract from the benefit of the test.

Understanding the real purpose of think time is extremely important if you want to build an accurate test scenario; far too many applications are tested without think time (because it reduces the number of vusers required and is thus often cheaper) which results in completely unrealistic results leading to problems being missed and then performance issues in Production systems.

Note. Think time has almost nothing to do with Pacing, they are two separate concepts used for separate purposes. About the only time they relate is where you need to consider the impact of think time when calculating pacing.

How to write a useful Non Functional Requirement (NFR)

The most important points to consider when writing an NFR are:

1. Qualified – It should describe in words what is required. This is important, spreadsheets with boxes that have numbers in them are only useful if everyone agrees on the same assumptions – this is rarely the case as a box with 1 second in it can mean a lot of things to a lot of people.

2. Defined – For example, a response can be measured at various points, there are lots of layers that a request can pass through and the very last point in the stack is when it hits the user’s eyeball. It is important to define which layer we are working in. Typically, in load testing you want to remove the client and the network and only focus on server response times (different people have different bandwidths and different PCs so we remove those variables).

3. Quantified – We need actual numbers to work to. The best number to use is the 90th percentile as this prevents the minority skewing the majority.

4. Flexibility – NFRs are not written in stone and can be changed. But we must have a line in the sand to start with. This line can be, and usually is, moved but having it defined prevents either needless tuning or, worse, insufficient tuning.

5. Volumetrics – Now that we know how fast we want the system to respond we have to decide under what conditions we want it to do this. This is known as defining the ‘load’ to be tested to and is most often expressed in terms of business requests per second/minute. For end to end performance testing it should be complete and describe traffic from all sources. For component testing it should set the scope for what is to be tested. If the system is complicated then this data can be defined in a separate document and referenced.

 

An example:
“As a business owner, I want 85% of Login requests to be served within 650ms, 95% within 850ms and none over 1200ms where a response is defined as a HTTP 200 return code from the server to Apache and does not include any client side or external network (internet) processing. Requests must be made against a fully populated database that is a replica of Production and this level of performance should be maintained for 1 hour without degradation during a simulation of the peak load as defined in the ‘Project Volumetrics’ document which includes 2.6 Login requests per second via 220 concurrent threads.”

web_reg_save_param_regexp() – A regexp primer for performance testers

LoadRunner 11 and later versions come with the long overdue feature of being able to use regular expressions to correlate values. The standard web_reg_save_param_ex() function relies on left and right boundaries and some simple attributes like length, offset and ordinal to narrow down searches. This is generally functional, but regexps are better. They are more accurate, faster and more reliable. There’s a reason why they are arguably the de-facto standard method for extracting values from strings.

JMeter uses regexps as standard and, as a result, once you have a solid understanding of them I think it is substantially easier and faster to correlate scripts using this tool. Regexps are said to have a steep learning curve but, personally, I reckon you can get the basics in a couple of hours. Besides, any half decent load tester should have this skill even if they don’t use it to correlate; it is extremely useful for data manipulation and that is a common requirement in performance testing. Knowing regexps, awk and sed together will solve 90% of your data manipulation needs.

Before getting into detail, anyone starting out with regexps will want a handy regexp tester, like rubular.com. There’s lots and lots of others.

So, a recent post I read asked how to correlate the string 1945 from this json response:

"...:[{"containerName":" ","containerSize":"12","containerStartRow":"0","containerEndRow":"0","rows":[[“NW,RO,RA","DLY","10/07/2011","10/17/2011","10/01/2011","RA","Y","FR","AMEA","AC","1945","50","50","AC 100IOSH-08","UserDefined","10/07/2011","Reassigned"..."

Classically, in LoadRunner you would try to use left and right boundaries but this gets horrible with json.

The obvious LB and RBs here would be

"LB=\"AC\", \"",

"RB=\"",

But the problem with this is what if AC is dynamic too? It probably is. A reliable correlation would have to use the unique text "rows": [[ (I'm assuming this is unique.) but then you'd have to end at ] and you'd end up capturing the whole string and be left with some fun C string manipulation to get the required value.

Another method might be to use SaveOffset but the risk here is that one or more values might have dynamic lengths.

There are probably some ways it could be done - there are always ways - but using a regular expression using web_reg_save_param_regexp() is probably better.

The syntax for this function is:

int web_reg_save_param_regexp("ParamName=<output parameter name>", "RegExp=regular_expression", [<List of Attributes>,] [<SEARCH FILTERS>,] LAST );

where Attributes and SEARCH FILTERS are standard. This is pretty simple so I will focus on just the regexp syntax from here on.

In the case of the json string above, one regexp you could use is:

rows":\[\[“[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","RA","[^"\r\n]*","FR","AMEA","[^"\r\n]*","([^"\r\n]*)"

This will return:

1945

That might look like a bunch of gibberish but it's actually nice and logical and I think most performance testers should be able to grasp the concept and then you just need to learn a few rules. There's lots of stuff out there so I won't repeat it here.

But one thing that is worth highlighting here is the use of:

([^"\r\n]*)

This can actually be simplified as:

([^"]*)

It basically means match everything that is not a double quote. The previous expression matched everything that was not a double quote nor a newline. This is really useful, you can use it to capture any value that is enclosed in double quotes which, frankly, makes up a large part of correlation.

If your response contained something like:

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="JHGYTFDIUSI"

Then the regexp would be:

VIEWSTATE" value="([^"]+)"

(Technically, it should be VIEWSTATE"\s+value="([^"]+)" where the \s+ matches on white space(s) - it's safer.)
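Putting that into the function itself, the registration would look something like this (a sketch based on the syntax shown above; pViewState is an arbitrary parameter name, and note the extra escaping needed because the regexp lives inside a C string literal):

web_reg_save_param_regexp(
    "ParamName=pViewState",
    "RegExp=VIEWSTATE\"\\s+value=\"([^\"]+)\"",
    LAST);

As with any web_reg_ function, this registers against the next request, so it goes immediately before the call whose response contains the VIEWSTATE field; the captured value is then available as {pViewState}.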

That's the basics, but what makes regexps so great is that they can do so much more, and this is exactly why they are far superior to the old wrsp() with its clunky boundaries. The json string given above is one example where a regexp works better, but once you get the hang of them it's surprising what you can do. For example, multiple matches.

Multiple matches work in the same way as with the classic wrsp() using boundaries: you specify Ordinal=ALL and get param_1, param_2...param_count, etc., which by itself is useful. But actually, regexps can do even more than this; you can insert multiple parentheses into a single regexp to get multiple groups, and if any of these groups match multiple times then you create a multi-dimensional array - give this a little thought and you will realise the potential. Sadly, LR11 & 12 only support a single matching group, so no multi-dimensional arrays yet in HP land.

The lack of support for multiple groups is a shame, in JMeter you can have a regexp like:

rows":\[\[“[^"\r\n]*","([A-Z]{3})","[^"\r\n]*","[^"\r\n]*","[^\/]+\/[\d]+?\/2011","[A-Za-z]*","[^"\r\n]*","[^"\r\n]*","([^"\r\n]*)","[^"\r\n]*","([^"\r\n]*)"

which returns multiple matches looking like:

1    {param_g1} = DLY
2    {param_g2} = AMEA
3    {param_g3} = 1945

Instead, for LR, you'd need multiple wrsp_regexp() statements.

In general, one thing you need to be aware of when using regexps in load testing is greediness causing backtracking - this is crucial, if you don't take care, you'll eat CPU on your Load Generator machines.

Note. There is also a web_reg_save_param_xpath() function which works better for XML responses. This is also a long time feature of JMeter.

Why performance testing on scaled-down test environments does not work

Performance testing is misunderstood. Often. One misconception in particular that I routinely try to debunk is the belief that it is reasonable to run tests on a scaled-down environment and then extrapolate the results up. Equally false is the approach of relying on delta testing. This weekend I saw a classic example of why these concepts do not work.

The site in question is high volume and transactional. A method was noted to be consuming large amounts of CPU resource and needed tuning. The solution put in place involved a change to how data from the database was cached on the App servers. The initial round of performance testing involved using one App server and the delta results we saw showed a huge improvement – the CPU consumption dropped by a factor of three. Extrapolation indicated that we would be able to reach much higher throughput and the feeling in the team was positive. The change was functionally tested, shown to have no issues, and good to go.

At this stage we had proved that the code change successfully reduced CPU usage and did not affect functionality. We had run tests on a scaled-down environment and the delta results were very positive.

But then we decided to do more testing. We built a replica test environment with the full suite of App servers and ran a load test against this. What we found was that the code change had actually introduced a very serious performance bug that would have crashed the system had it been released. Where we only had one App server, the call to the DB to refresh the cache was not a problem, but when there were multiple servers making the same call this created locking and the requests were queued resulting in high database CPU and delays across the system.

This type of problem, and others like it, cannot be identified on scaled-down test environments and relying on this method and the concept of extrapolation is unwise for any enterprise level IT project where the cost of system failure would outweigh the cost of building a replica test environment.

HP LoadRunner Virtual Table Server (VTS) – A quick how to guide.

VTS is an unsupported add-on for LoadRunner. It is non-version specific and, to the best of my knowledge, no longer being developed – not for a long time. So it is what it is and no-one has any responsibility to help you with it. But that said, it is a very sturdy, reliable little beast – in all the years I have been using it I have not once come across any bug, or had a crash or in any way had a problem with this tool – it just works.

It is a simple in memory database that can be easily setup and used by LoadRunner to allow data to be dynamically shared, in real time as a scenario is being executed, between different virtual users.

It is the solution for when you want to store data that is being created by a script, for when you want to pass data from one script to another, for when you have data that can only be ‘used’ once and a lot more besides.

Installation

Initially this looks messy and there’s precious little advice on how to do this, but it’s actually a breeze. You can either download the file vts2.zip from here: VTS KB Article (HP support login required) or use the copy attached at the end of this post, then copy it to your Controller and to each and every LG that you intend to run any script referencing VTS on – basically your whole farm. For each machine do this:

  1. Right click the zip file
  2. Select ‘Extract to’
  3. Browse to the directory where LoadRunner is installed (eg. C:\Program Files\HP\LoadRunner)
  4. Choose this folder as the location to extract the zip file to.
  5. Unzip it

That’s it. It’s installed. Simple, no?

Oh, you may need to register a couple of files. I’ve not had to do this for some time now but it is listed as a required step so:

  1. Open a CMD Prompt
  2. Type (Only the text inside the brackets…)
  3. Type (Only the text inside the brackets…)

Running VTS

Now you have it installed, think about where you want to run it. Think of each little VTS instance as a server (they essentially are). They need to run on a machine that is available to every LG that wants to use it – i.e. you have to have network connectivity. You could just use your Controller box but it’s nice to have a separate box for VTS as it can use up a lot of memory and, potentially, network bandwidth.

Once you have a machine in mind, go there and do this:

  1. Open a new notepad document
  2. Type this:

ECHO ON
CD "C:\Program Files\HP\LoadRunner\bin\"
START vtconsole -port 9999 -launch
START vtconsole -port 9992 -launch

 Save this notepad file as ‘START_VTS.bat’. You can use another name if you like but the .bat part is key.

Find the file where you saved it – maybe use your desktop? – and double-click it.

Now you should see two windows pop up that look like some old school VB app with some grid control in the centre.

I used two VTS servers here to highlight that you can do this – and probably will want to. Each separate server has its own PORT number, which is how you reference it, and you can open as many as you want – think of them as tables in a DB, but in reality each is its own mini database sitting in memory.

Later, when you have started to use VTS, you will appreciate that you can enhance this START_VTS.bat file by using additional params. E.g. -file [FILENAME] will allow you to open the VTS table and pre-populate it with existing data. This data can either be something you have exported from VTS earlier or just plain old delimited text. The new command would look like this:

START vtconsole -port 9999 -launch -file \\MACHINENAME\DIRECTORY\file.csv

You can also set up VTS to export its current data to a file and to periodically update this file. Just select Options > Export / Import.

Connecting to VTS

Now it is running, you can create a script that uses it. First of all you need to copy this to the vuser_init() section.

#include "as_web.h"
#include "vts2.h"

vuser_init()
{
return 0;
}

Then you need to load the VTS dll and include a reference to the table itself, a bit like this:

vuser_init()
{
lr_load_dll("vtclient.dll");
lrvtc_connect( "[MACHINE_NAME/IP]",9999,0);

return 0;
}

Note. MACHINE_NAME/IP is where you have the VTS server(s) running.

For best practice you can improve this code to look like this:

#include "as_web.h"
#include "vts2.h"

vuser_init()
{
PVCI pvci = 0;
int rc = 0;
char *VtsServer = "{VTS_SERVER}";
lr_load_dll("vtclient.dll");
pvci = lrvtc_connect( VtsServer,9944,0);
rc = vtc_get_last_error(pvci);
if( rc != 0 ) {
lr_message("Connection to VTS %s failed", VtsServer);
return -1;
}
return 0;
}

Note. {VTS_SERVER} is the name of the machine running VTS.

So you have VTS up and running and your script connects to it. Now you just need to use it.

Using VTS

VTS comes with a document listing all the functions and I am not about to repeat this here, so I attached it. That said, the documentation is poor and contains errors, so beware.

In essence you can write, read, read unique, read a whole row and write a whole row. Think of it like a dat file but the difference is you can read and write to this dat file as the scripts are being executed and all scripts can share the same data.

But you CANNOT query VTS. You can’t say give me all data where CAT_ID=3 or anything like that. Just: give me the next CAT_ID.

When you read data from VTS there is a handy feature where you can read and remove the data from the table – you don’t have to do this but it is generally useful because, more often than not, you are using VTS precisely because you have context sensitive data that can only be ‘used’ once, or perhaps one at a time.

So, how to do all this:

Let’s say you have Script A that is creating orders and Script B that is paying them. Script A would connect to VTS Port 8888 and write data, Script B would also connect to VTS Port 8888 but it would read data.

It would look like this:

Script A

Action()
{
// [Do some stuff here to create a new user]

//Now write out the relevant data to VTS
lrvtc_send_row1("password;email", "hello123;{p_UniqueUsername}@MYDOMAIN.com", ";", VTSEND_SAME_ROW);
return 0;
}

Script B

Action()
{
lrvtc_retrieve_row("email;password",";");

// [Do some stuff here using this email and password]

return 0;
}

Notice that when I wrote the data to VTS I was able to use a parameter {p_UniqueUsername} – this is worth remembering; it’s pretty essential. I also made the example use two columns and thus the retrieve_row function. Also, by using the ‘retrieve’ function I am removing that row from the table.

You could also have a script that starts by removing a row from a table, then it does some stuff with it, and then it writes it back when it is finished. This enables you to prevent two scripts working on the same record at the same time.

You could also have multiple VTS tables representing ‘states’ of data. For example, an insurance quote needs to be created by a ‘clerk user’, then validated by a ‘super user’, then authorised by an ‘auth user’ – whatever – the point is you have one record going through multiple states and, rather than one long script where you have to log in and out multiple times to complete the journey, you use VTS to manage the data.

Finally, something to think about when using multiple tables like this is that sometimes you might find a VTS table empty. In this scenario you need to code a do/while loop that polls the VTS table looking for a record before carrying on. This prevents the script failing and, if you code the loop well, it will prevent overloading your system artificially; a sketch of such a loop is shown below.
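For illustration, here is a minimal sketch of such a loop (it reuses the email/password columns from Script B above; the retry limit, the wait time and the way an empty table is detected are assumptions that would need checking against your own VTS setup):

Action()
{
    int attempts = 0;

    // Poll VTS until a row comes back or we give up
    do {
        lrvtc_retrieve_row("email;password", ";");

        // Assumption: an empty table leaves the {email} parameter unset or empty
        if (strcmp(lr_eval_string("{email}"), "{email}") != 0 &&
            strlen(lr_eval_string("{email}")) > 0)
            break;

        lr_think_time(5); // wait before polling again rather than hammering VTS
        attempts++;
    } while (attempts < 12);

    if (attempts >= 12) {
        lr_error_message("No data available in VTS after %d attempts", attempts);
        return -1;
    }

    // [Do some stuff here using this email and password]

    return 0;
}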

Conclusion

Anyone with data that can only be used once, with data that they want to pass between scripts or with data that they generate during the run that they want to keep, should use VTS. Basically that’s pretty much anyone using LR.

Note. You don’t have to use VTS, you can use mySQL as well as this blog post shows.

Files

VTS2.doc
vts_2_10.zip
VTS_Example_Scripts.zip