Automated JRockit Flight Recording using Perl

The program below captures JVM activity using the JRockit Flight Recorder. It is automated to capture a dump trace every 5 minutes for a set duration; I chose the 5-minute interval because of the size of each dump trace. For a deeper analysis of the activity, the Diagnostic Volume of events generated can be changed via the WebLogic console to Low, Medium or High.

This is my first Perl program and I am open to suggestions for improving it.

#!/usr/bin/perl -w

use strict;
use Sys::Hostname;
use Time::Local;
use POSIX;

#Hostname && Instance Variables
my $PID;
my $name;
my $domain = substr hostname, 0, index(hostname, '.');

#grep the PID list and output to a file ("[w]eblogic" keeps the grep process itself out of the list)
my $ps = qx/ps -ef | grep "[w]eblogic.Name" | awk '{print \$2, \$9}' > "pid_list.txt"/;
open (my $fh, '<', 'pid_list.txt') or die "Unable to open pid_list.txt: $!";
print "\nThe LIST of the WEBLOGIC instances and their PID on $domain:\n\n";

#load the file into a hash and print each PID with its weblogic.Name argument
my %hash;
while (<$fh>) {
   chomp;
   ($PID, $name) = split ' ', $_;
   $hash{$PID} = $name;
   print "$PID => $hash{$PID}\n";
}
close ($fh);

print "\nEnter the PID:";
$PID=<>;
$PID=substr $PID, 0, index($PID, '\n');
$name = substr $hash{$PID}, 16;  


print "Enter the duration of the recording in SECS:";
my $duration =<>;
# print "The test duration is:$duration\n";
my $finalDuration = join '', (substr $duration, 0, index($duration, '\n')), 's';
# print "Duration:$finalDuration\n";

#Path - change to $JAVA_HOME/bin where the jrcmd utility lives
my $newWorkDir = $ENV{JAVA_HOME};
print "\nJava Home::$newWorkDir\n";
$newWorkDir = join '/', $newWorkDir, 'bin';

chdir $newWorkDir or die "Unable to chdir to $newWorkDir: $!";
my $dir = getcwd();
print "\nCURRENT WORKING DIRECTORY::$dir\n";

#Pick up the name of the default WLDF recording from the check_flightrecording output
my $wldf = qx{./jrcmd $PID check_flightrecording | awk '/compress=false/ {print \$4}'};
$wldf = substr $wldf, 6, -2;   #strip the leading name=" and the trailing quote plus newline
print "$wldf\n\n";

my $time = scalar (localtime(time));
print "CURRENT TIME::$time\n\n";

#Build a date/time stamp for the recording file names from the localtime string,
#e.g. "Thu Oct 13 16:35:42 2011"
my $mon  = substr $time, 4, 3;
my $year = substr $time, -4, 4;
my $mday = substr $time, -16, 2;
my $hour = substr $time, -13, 2;
my $min  = substr $time, -10, 2;
my $date = join '', $mday, $mon, $year;
my $testtime;

if ($hour >= 12)
   {
      $testtime = join '', $hour, $min, 'PM';
   }
else
   {
      $testtime = join '', $hour, $min, 'AM';
   }

my $totdumpTraces = ceil($duration/300);
print "$totdumpTraces DUMP TRACES will be collected\n\n";

my $pFile = "/opt/weblogic/JFRrecordings/$domain-$name-$date-$testtime.jfr";

#Start the parent flight recording for the full test duration
my $parentTrace = "./jrcmd $PID start_flightrecording filename=$pFile duration=$finalDuration compress=true";

print "$parentTrace\n\n";
system($parentTrace);

for (my $dumpId = 1; $dumpId <= $totdumpTraces; $dumpId++)
{
   my $dFile = "/opt/weblogic/JFRrecordings/$domain-$name-$date-${testtime}_Dump$dumpId.jfr";
   #Dump the flight recording every 5 minutes
   my $dumpTrace = "./jrcmd $PID dump_flightrecording name=\"$wldf\" recording=1 copy_to_file=\"$dFile\" compress_copy=true";
   system($dumpTrace);
   sleep (300);
}
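
For reference, a run looks roughly like the following. The script name (jfr_capture.pl), host, PID and server name here are made up, and the sketch assumes $JAVA_HOME points to the JRockit JDK running the WebLogic instances (so jrcmd is found under $JAVA_HOME/bin) and that /opt/weblogic/JFRrecordings exists and is writable:

$ perl jfr_capture.pl

The LIST of the WEBLOGIC instances and their PID on myhost:

12345 => -Dweblogic.Name=ManagedServer1

Enter the PID: 12345
Enter the duration of the recording in SECS: 1800

The script then starts an 1800s parent recording and collects ceil(1800/300) = 6 dump traces, one every 5 minutes, under /opt/weblogic/JFRrecordings.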

LoadRunner vs. NeoLoad

I have been asked recently what I like about NeoLoad compared to LoadRunner. Without delay I responded, “LoadRunner is undoubtedly the best tool in the market, but NeoLoad is also a promising tool with a small learning curve that is easy to implement and cost-effective.” Having used and evaluated a few of the many performance test tools, my priorities for the choice of a test tool would be as follows:

1. Supported Technologies

2. Functional Features (Reusability, Customizations, Ease of Maintenance)

3. Third-Party Tool integration

4. Tool Cost

5. Support

Having mentioned those, I would like to concisely differentiate between the two:

Supported Technologies
  LoadRunner: Many more than what NeoLoad supports.
  NeoLoad: Supports a suite of RIA frameworks, Web Services, SAP and Siebel in Web mode.

Functional Features
  LoadRunner: Excellent. VuGen integrates with .NET and hence extends the functional capabilities beyond belief. I am not sure whether VuGen can be extended to integrate with a Java IDE.
  NeoLoad: Limited results analysis due to unavailable raw data; NeoLoad will make raw data available in their next major release, which is slated to ship in a year. The ability to load Java classes from within JavaScript extends the functionality to capture complex virtual user behavior.

Reusability
  LoadRunner: Very flexible.
  NeoLoad: Possible, with a few limitations.

Customizations
  LoadRunner: Can use the LR API, C, Java, VB.
  NeoLoad: Can use JavaScript (limited API support) and Java.

Ease of Maintenance
  LoadRunner: A good tool for all kinds of projects and application sizes.
  NeoLoad: A good tool for small applications, as script maintenance complexity grows with the size of the application.

Third-Party Tool Integration
  LoadRunner: .NET, CA Wily, and seamless integration with the HP suite of tools.
  NeoLoad: dynaTrace, and a CA-APM module to integrate with CA Wily; helps monitor synthetic transactions effectively.

Tool Cost**
  LoadRunner: Close to $100,000 for 500 vusers.
  NeoLoad: Close to $15,000 for 500 vusers.*

Support
  LoadRunner: Slow.
  NeoLoad: Excellent.

*NeoLoad has flexible pricing options.
**The cost increases based on the selected add-ons.

Every tool has its pros and cons, and every choice is made according to the business need and objective, to leverage the potential benefit tied to the company’s ROI. Although NeoLoad is newer to the market, it has shown good promise to date and is definitely an economical choice.

Methodical Implementation of Application Performance Management

The concept of APM will not yield any value as long as the approach toward the process is flawed. Because of the live monitoring and problem alert notification capabilities of APM tools, organizations tend to rely heavily on them, hoping to triage a problem as soon as it happens. In a complex heterogeneous environment it is an advantage for the organization to be notified about a problem on the fly, but a disadvantage to be notified at an ever-increasing rate. The core problem with APM is the absence of methodical implementation and maintenance of instrumented data in pre- and post-production environments; simply enabling dashboards and streaming alerts quickly becomes overwhelming. To reduce the cost of quality, organizations need to proactively monitor, analyze and triage issues in the pre-production environment. To achieve this, organizations need to tap into the goldmine of instrumented data and ask themselves:

  1. How is the data being instrumented?
  2. What is being instrumented?
  3. How and where is it being retained? How long should it be retained?
  4. What is the quality of the data?
  5. Is the data being tied and referenced across various tiers in the application environment?
  6. Is the raw data being cleansed, characterized and clustered into a readable form for further analysis?
  7. How do we determine the sanity or correctness of the data? Are there any other tools and  methods to cross-reference the same?
  8. What tools do we apply to further analyze and report the data?

 

    http://www.shunra.com/shunrablog/index.php/2011/05/23/apm-is-broken-or-at-least-it%e2%80%99s-not-delivering-on-its-promise-of-improving-performance

Qualifying a Loadtest as PASS/FAIL/USEFUL

While trying to archive old data on my laptop I came across a document I had written long ago about load test statuses, and thought I should blog it here. The document details the PASS/FAIL/USEFUL criteria used to qualify a test. In order to apply them, we need to understand the criteria under which a test result is qualified as PASS, FAIL or USEFUL. This will help the test team collate the test data point(s) for further analysis, and qualifying and quantifying the results will also help the business understand the current state of the application.

Why should a load test be passed or failed?

Before an application is rolled out to production, extensive capacity and performance validation testing is performed in order to verify that the AUT (application under test) satisfies certain non-functional requirements. The results from these tests are further analyzed to define the performance (P), availability (A) and reliability (R) of the application, and these three variables could be incorrectly quantified if the test does not run satisfactorily. So before analyzing for performance, availability and reliability, it is important to identify whether the load test itself was successful. The table below details a FEW criteria to identify the same.

 

    Note: Please read the NOTE below in order to understand the table.

    Budget – The allocated response time SLA (service level agreement) for each KPI (Key Performance Indicator) and the peak hour transaction rate for each KPI.

    Optimal response time – The favorable response time calculated based on the system and network variability (e.g. +20% of the budgeted response time).

    Optimal transaction rate – The favorable transaction rate calculated based on the budgeted requirement for each KPI.

    1. The optimal response time requirement for the Key Performance Indicators is considered to have response times <= (Budget + 20% of Budget).

    2. The optimal transaction rate requirement for each KPI is considered to have a transaction rate: (Peak hour load – 10% of Peak hour load) <= Trans Rate <= (Peak hour load + 10% of Peak hour load). (A small script sketch of checks 1 and 2 follows Table 1 below.)

    3. The test is marked as PASS if all of the criteria for “PASS” are satisfied.

    4. The test should be marked as FAIL if any of criteria 7 – 9 exists during the test duration.

    5. The test should be marked as FAIL if extensive logging is turned on in the environment. The extent to which logging is enabled affects the performance of the application; typically, logging should contribute an increase in response time of only about 1% – 2%.

    6. The test should be marked as FAIL if all of the criteria for “FAIL” are satisfied.

 

Table 1:


S.No | CRITERIA | PASS | FAIL
1 | Transaction Rate (X) | (Peak hour load – 10% of Peak hour load) <= (X) <= (Peak hour load + 10% of Peak hour load) | (X) < (Peak hour load – 10% of Peak hour load)
2 | Average Transaction Response Time (Y) | (Y) <= Optimal response time | (Y) > Optimal response time
3 | Resource Utilization | <= 80% | > 80%
4 | Memory Utilization | <= 75% | > 75%
5 | Memory Leaks | None | Identified
6 | Load Balance | Balanced, or shared according to the implemented algorithm | Unbalanced, or not shared according to the implemented algorithm
7 | Script(s) | Passed | Failed
8 | Server(s) Crash | None | Exists
9 | Server(s) Restart | None | Exists
10 | Logging | Turned off, or enabled within an allowable limit | Extensively turned on
11 | HTTP 500 errors | None | Encountered
12 | HTTP 404 errors | None | Encountered
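
To make checks 1 and 2 from the notes concrete, below is a minimal Perl sketch (Perl, to match the recorder script above) of how a measured KPI could be qualified against the optimal response time and transaction rate bands. The subroutine name and the sample numbers are hypothetical; the thresholds come straight from the notes (Budget + 20% of Budget for response time, ±10% of peak hour load for the transaction rate).

#!/usr/bin/perl -w
use strict;

#Qualify a single KPI measurement against the PASS bands defined in the notes above.
# $budgetResp - budgeted response time SLA for the KPI (seconds)
# $peakRate   - budgeted peak hour transaction rate for the KPI (trans/sec)
# $measResp   - measured average response time (seconds)
# $measRate   - measured transaction rate (trans/sec)
sub qualify_kpi {
    my ($budgetResp, $peakRate, $measResp, $measRate) = @_;

    my $optimalResp = $budgetResp + 0.20 * $budgetResp;   #Budget + 20% of Budget
    my $rateLow     = $peakRate   - 0.10 * $peakRate;     #Peak hour load - 10%
    my $rateHigh    = $peakRate   + 0.10 * $peakRate;     #Peak hour load + 10%

    my $respOk = ($measResp <= $optimalResp);
    my $rateOk = ($measRate >= $rateLow && $measRate <= $rateHigh);

    return ($respOk && $rateOk) ? 'PASS' : 'FAIL';
}

#Hypothetical example: 2s budgeted response time, 50 trans/sec peak hour load,
#measured 2.3s average response time and 48 trans/sec.
print qualify_kpi(2.0, 50, 2.3, 48), "\n";   #prints PASS (2.3 <= 2.4 and 45 <= 48 <= 55)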

 

Why should a Load test be marked USEFUL?

Typically an application is released when it satisfies all the requirements, in other words the PASS criteria. But many times the application is also released when it does not satisfy all of the criteria. Does this necessarily mean that the AUT failed to meet the requirements and hence should be FAILED on the whole? Not really. In such situations the business takes a calculated risk by adding acceptable thresholds to the currently quantifiable data. This is where we need to identify the criteria that prompt the business to release the application anyway, and hence mark the load tests “USEFUL”.

 

Technically a test is passed or failed when the criteria in Table 1 are all satisfied. There are a few cases where the test results can neither be passed nor failed and fall into a grey area. In an effort to improve the performance of the application, many changes are made in the application environment; a FEW of the areas involved are:

  • Memory problems
  • Incorrect configurations
  • Overloaded CPU’s
  • Cache sizes
  • Network bandwidth
  • Database indexing
  • Database deadlocks
  • Test scenarios

We learn the best practices, configurations and changes suitable for performance improvement by tweaking the above variables. When load tests are run with new changes, every change made in the environment is a learning experience that further improves the performance of the application. Within the given test cycle time, successfully load testing all of these changes may or may not be accomplished. Among the failed tests, we need to identify the ones whose results are useful for further improving the application. Given the range of load tests we run during the testing life cycle, it is important to continuously differentiate and identify the usefulness of these tests and characterize them accordingly.

Quantifying Reliability through Dynamic and Static Code Analysis

I have been pondering application reliability for a few days. Is reliability solely based on a transaction being successful, or is it tempered with an element of security as well? Generally, users will appreciate an application that enables them to securely perform a transaction. So how will you retain stakeholder confidence regarding application security and reliability? There are many parameters to consider when talking about the security of an application, but in this context I would only like to point to static and dynamic code analysis.

My experiences have mostly been around measuring the reliability of a software application based on dynamic code analysis. Dynamic code analysis leverages the instrumentation data captured at application runtime, exposing vulnerabilities at runtime. It is achieved by running a set of pre-defined test cases, which are again limited by the stakeholders’ perceptions. Though dynamic code analysis helps build stakeholder confidence, it is not comprehensive. In order to evaluate a software application for reliability we need a more holistic approach, which can be achieved by complementing dynamic code analysis with static code analysis. Static code analysis is a source code or byte code analysis of a software application. Though inherently complex, it carries weight since it can be implemented early on and iteratively in the SDLC. Through that iterative process, at the point when a usable product of small scale is ready to be put to the test at runtime (dynamic code analysis), the data gathered can be used to further analyze the static code, and vice versa. The data thus harnessed can be used to remediate current and future issues of security and performance long before the product is available to the end user. I am emphasizing a proactive approach to tackling these issues rather than a reactive one.

Although a reliability of 100% would be ideal, I would still retain some doubt about the performance of the application, because no matter how smart an application design is, there is someone smarter out there to challenge it. Continuous verification and validation is the key to maintaining an optimum level of user satisfaction and productivity. This in turn would verify the rate of successful transactions, which can be quantified as the reliability factor.
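
As a small illustration of that last point, reliability as the rate of successful transactions can be computed directly; the sketch below (in Perl, to match the earlier scripts) uses made-up transaction counts and a hypothetical subroutine name.

#!/usr/bin/perl -w
use strict;

#Quantify reliability as the percentage of successful transactions
#out of all transactions observed in a test or monitoring window.
sub reliability_pct {
    my ($successful, $total) = @_;
    return 0 unless $total;   #guard against division by zero
    return 100 * $successful / $total;
}

#Hypothetical counts: 9,940 successful transactions out of 10,000 observed.
printf "Reliability: %.2f%%\n", reliability_pct(9940, 10000);   #prints "Reliability: 99.40%"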

A Simple Case for Compatibility Tests

After a long sabbatical I chose to post a presentation (of course with my manager’s permission) that I made long ago at work. I made a few changes in order to preserve the identity of the company.

The presentation is self-explanatory and might alarm you if you haven’t considered running these tests.

How well is your User Experience Defined?

In an ever more digitally transforming world, where users have begun to interact with each other and with businesses on customized, integrated, dynamic platforms, improving user experiences has become pivotal in adding business value. The emphasis here is on the Usability, Performance, Capacity, Reliability, Availability and Security of the applications. Quantifying these qualitative factors will lead to defining, improving and managing user experiences. Though each is an extensive study in itself, I would like to probe my understanding with a few fundamental questions. Please apply them to the application environment you are working on and feel free to share your insights.

 Usability

  1. Has the psychology of the users been captured well enough to suitably build the mockups, prototypes and wireframes in order to get buy-in?
  2. To what extent have the human factors been considered?
  3. How broadly was the heuristic evaluation and feasibility study done?
  4. What is the ease of use and integration with the other applications?
  5. Does the interface appeal to the users and reflect their needs?

 Performance

  1. How is your application behaving in terms of speed?
  2. Is your application exceeding the budgeted service level agreements?
  3. What are the industry performance standards for these transactions?
  4. What are your management’s acceptable thresholds towards latencies?

Capacity

  1. What are your sizing parameters?
  2. What factors constitute the sizing parameters?
  3. How will you compute the values for each of these parameters?
  4. Are the systems capable of handling unexpected variations in workloads?
  5. How much variation can the existing systems gracefully handle?
  6. What is your projection for future trends?

 Reliability

  1. What is the rate of pass and fail of your transactions (include transactions, sub transactions across all the systems and sub systems)?
  2. Does the total count of all the transactions within a user’s single transaction match the transaction delivery process defined for that particular transaction?
  3. What factors constitute reliability?
  4. What exactly qualifies a transaction as successful?

 Availability

  1. What percentage of the time is the application available?
  2. What are the industry standards for downtime?
  3. What are your contingency plans for downtime?

Security

  1. Why is security important?
  2. Are you conforming to the security standards?
  3. What is the cost of non-conformance?

User satisfaction is the key. I am sure these questions have made you think about your stakeholders’ satisfaction.

Let’s talk UEM

UEM is gaining momentum as the world of solution providers enters the giant virtual community called the CLOUD. UEM encompasses a wide spectrum of study in every component of the business process delivery chain, ranging from the inception of a concept, through the design and delivery of the product, to the end user experience. User experience management is very process oriented and helps align the business objectives with the expectations of the end user (customer). The emphasis here is on the end user. Each user experience is different, and strategically exceeding the user’s expectations is the key to improving business value.

Information technology has revolutionized the way the world interacts and has empowered businesses to gain competitive advantage. This element alone encourages businesses to transform their processes across the board to build value. Beneath this transformation lie many horrendous challenges, the biggest being data management, which creates a huge liability. This data can be of various kinds, a few of which are:

  1. The feasibility analysis data
  2. The requirements analysis data
  3. Design and development data
  4. Monitoring Data
  5. Test management data
  6. Maintenance data
  7. Customer / End User data

Users can be external and internal to an organization. End user data is central to all other components, and the qualitative data provided by end users helps business verticals strategically streamline their processes. My brother once told me that the “key to all the problems lies in the data”. If data is properly analysed and verified, then the proposed solutions will be as practical as they are feasible. The nature of data is dynamic, ever growing and increasingly complex. Businesses need to identify various methods and tools to statistically trend and predict the changing patterns. So how do businesses cope with this evolving challenge? Based on my experiences and study, I wish to further explore and share the various factors that define user experiences. Stay tuned!

Note: I use the terms end user and customer interchangeably.