When I sent a tweet to @AlinaAvadani and @gbrindusa in January 2013 all I expected was to have fun participating in a test competition. And some fun we had – you can read all about it here: http://devblog.xing.com/qa/nrg-global-test-competition-report/
What I did not expect was to be sitting in a room with Matt Heusser, Maik Nogens, Chris George, Alina Avadani and Katharina Gillmann almost two years later deciding which team would be the best software testers in the world.

Our prize for winning the NRGGlobal Test Competition had turned out to be a seat on the panel of judges for the final round of the Software Testing World Cup. How could we say no to that?

For me the finals started way too early at 3 am after waking up from a strange dream and being unable to fall asleep again. A few hours and a bit of travelling later I found myself at the reception of the Agile Testing Days holding the precious name tag with the title “Global Judge” in my hands. The continental winners were already in the big conference room preparing for the competition and I had a first look from the stage.
The tension in the room was clearly noticeable and I started getting really excited myself.
Picture thanks to @jdiaz_berlin

The competition kicked off almost on time and I spent the first minutes listening to the great moderation by Pete Walen and Kira and taking care of Twitter. I will have to listen to the recording of the competition as soon as it’s available since the few bits of the moderation I could focus on were very interesting and I’m sad to have missed so much of it.
Pete and Kira moderating the event - picture thanks to @jdiaz_berlin
The simple reason I missed so much of the moderation was the first bug reports coming in. It was really impressive to see how fast the teams started reporting inconsistencies, usability issues or even functional defects of a live website: zalando.co.uk. I immediately tried to reproduce them and to get a feeling for the way the teams worked. This also included me walking over to their team spaces and looking over their shoulders. I quickly realized how focused the teams were – I had to get really close to someone to make them even look up (sorry for that!).

What had felt like 15 minutes on the participant side of the NRGGlobal Test Competition turned into three relaxed but interesting hours on the judges’ side of the STWC. As soon as the competition was over the tension in the room dropped instantly. The teams started talking to each other and seemed to notice their surroundings for the first time in hours. I had a few talks with the teams and then the judges were off to dinner in the hotel restaurant. Almost exactly at 9 pm we settled down in the Havana lounge (which was used exclusively by us) and started judging.
Maik, Matt and Chris looked at the test reports, Alina, Katharina and I looked at the issues.
The judges hard at work - picture thanks to @BadTesting
We checked the content and structure of each headline and description, discussed whether the severities the teams had selected matched our expectations and tried to reproduce the issues which caught our interest. Since over 180 defects had been reported it was impossible to reproduce them all in a reasonable timeframe. To answer a question which had been asked on Twitter: the sheer number of issues reported didn’t automatically determine the winner.
The Zalando team had provided us with a list of issues that were of greatest interest to them, which also helped in scoring the teams.
Note at the Zalando booth on the following day - picture thanks to @ZalandoTech

One of the issues proved to be particularly hard to reproduce: It had been reported by three teams so we were sure that the problem existed but it only appeared after a lot of retries. I’m really curious to know what made it so sporadic.

Once we were done with the issues it was already late in the evening but we still had the reports to look at. We switched tasks and while Matt, Maik and Chris looked at the issues Alina, Katharina and I looked at the reports. It was a great learning experience to see the different styles of test reports and to discuss the content.

Looking at the issues and reports it was clear that we were grading the work of the six best test teams in the world – the quality of the work products was very high.

It was close to 2 am when we had finally graded everything (including teamwork) and the formulas in the spreadsheet produced a sorted list of the teams. We wouldn’t be testers if we hadn’t examined this list closely and performed a sanity check. Since everything lined up we happily shook hands and headed off. Some of us went back to the bar while I went straight to bed. I can remember only a very few times when mental work has exhausted me so much.
Still it felt good to have finished the judging and to know that we wouldn’t have to continue it the next day.

During the day a few emails flew back and forth containing the winners list and I got really paranoid about not letting any other attendee see it.

Fast forward to the evening event of the Agile Testing Days. The winners were announced during the costume party and it was great to see the reactions to 3rd place (Team Army Ants, representing Europe) and 2nd place (Team Annunciation, representing Oceania). The reaction from the winning team “Cesar Brazil” (representing South America), however, completely blew me away. I would have never expected tears of joy during the Software Testing World Cup. Then again, as Markus Gärtner put it: “Germany won the Soccer world cup in Brazil, I find it fair that Brazil won the more important Software Testing world cup in Germany. #agiletd” (https://twitter.com/mgaertne/status/532251363481518081)
The winning team during the award ceremony - picture thanks to @testchick

Matt asked us after the judging was done whether we would prefer to be judges again next year. My initial answer still stands: I actually don’t care – I just want to be part of it! José Díaz has already promised that there will be a Software Testing World Cup 2015, which makes me very happy.

I recently changed my job from being a tester in an agile team at a web company to doing quality assurance in a team at an automotive company. At least in Germany this industry is dominated by what I call “old-school” development techniques: V-Model, big requirements documents, manual regression test phases, …
As such I was pleasantly surprised when I was told that there's a Jenkins server doing continuous integration. The pleasantness ended when I learned three things about this server:
– it’s actually a desktop computer under a desk (high availability doesn’t seem to be an issue, still this scares me)
– builds are triggered by a timer (7 am, 11 am, 3 pm, “daily build” at 10pm)
– a build takes at least 90 minutes, more if the build is running in parallel for two branches
In their defense it has to be said that the server had only ever been worked on by whichever developer had a few minutes to spare – there was never a single responsible person with a budget for it.

Now there’s me and the task to “improve build times to speed up feedback”.

The application I’m working on is based on eclipse rcp, written in java and consists of a few dozen plugins. It’s distributed to our customers in an obfuscated package.

Jenkins performs the following steps in a regular build:
– compile the application code
– compile the test code
– compile debug extensions used internally
– obfuscate the application code
– run the tests
– gather static analysis data
All compilation steps are done for x86 in 32-bit and 64-bit.
The build isn't just done for CI purposes; the same build is used to generate the packages which are delivered to our customers.

After talking to a few people (and more importantly: keeping my ears open to what’s being said around me) I learned the following:
– There is a 64-bit version of the app, but it doesn't work properly because the license manager we use is only available for 32-bit
– There had been problems in the past with obfuscated code not working properly, but there's not a single test actually looking for those – they had all been found manually
– the debug extension is only needed for packages which have been shipped to customers
– the Java compiler uses only four threads because more didn't speed up the build due to disk I/O limits

Being the new guy in the company I felt free to challenge many things and experiment a lot.

First step:
Let’s get rid of the obfuscation for CI. This had been a parameter for the job already, all I had to do was to change it from default “true” to “false”. Woops, build times are down from 91 minutes to 58. Isn’t at nice?
Since we didn’t want to give up testing with an obfuscated build completely I enabled the obfuscation for our nightly build – noone cares how long that takes.

Second step:
No more debug extensions. No one is using them in CI, so why are we building them?
Sadly, this brought the build time down only by a minute. Then again, a minute also adds up during the day…

Third step:
No more 64-bit builds. As mentioned before, the 64-bit version of our app isn't working properly, so why would we want to build it in CI? Once the problems are sorted out I'll add it again.
This took the build time down another 4 minutes – less than I expected, to be honest.

Fourth step:
Luckily for us our IT department is fast and maintains quite a stock of hardware in-house. My request for an SSD to put the Jenkins workspace on was fulfilled amazingly fast, and after some trouble with the Dell BIOS I could move the workspace.
What shall I say? Disk I/O definitely WAS an issue – the build time dropped by 19 minutes, taking us down to a final 34 minutes. That's a bit more than a third of the original build time, so I consider it a success so far.

What did I learn from this experience?
First of all: Talk to people and keep your ears open. I wouldn't have touched the 64-bit builds had I not overheard a discussion about them not working.
Second learning: Be courageous and just do it! Everybody could have done what I did – yet somehow no one did. Getting the SSD in particular was a simple thing to do, yet somehow everybody shied away from it because of the external dependency on our IT department.
Last learning: Measure, measure, measure! I was expecting the SSD to help us a lot – but now that I have hard numbers I can justify the cost. On the other hand I expected the 64-bit builds to be more expensive – good to know that they aren't.

What’s next?
After going for the low-hanging fruit there are still things to do.
I plotted the test execution times by package against the number of tests in that package, which clearly shows that 10% of our tests take up 75% of the test execution time – that's 17 minutes. Individual test methods take up to a minute to execute!
I wonder if those tests can be sped up. If not, I might violate the “all tests, all green, all the time” principle and move the slow and not-so-important tests to the daily build.
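
If you want to produce such numbers yourself, they can be pulled straight out of the JUnit XML result files the build already writes. Here is a rough sketch of the aggregation, assuming the usual JUnit report format; the report path is a placeholder, adjust it to wherever your build puts the reports:

# Sketch: sum up JUnit XML results into per-package execution times.
# The report path is a placeholder; <testcase classname=... time=...> is the
# standard JUnit XML format.
import glob
from collections import defaultdict
from xml.etree import ElementTree

time_per_package = defaultdict(float)
tests_per_package = defaultdict(int)

for report in glob.glob("test-results/*.xml"):  # placeholder path
    for case in ElementTree.parse(report).getroot().iter("testcase"):
        package = case.get("classname", "").rsplit(".", 1)[0]
        time_per_package[package] += float(case.get("time", "0"))
        tests_per_package[package] += 1

# slowest packages first - these are the candidates for the daily build
for package in sorted(time_per_package, key=time_per_package.get, reverse=True):
    print("%7.1fs  %4d tests  %s" % (time_per_package[package], tests_per_package[package], package))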

Another thing to investigate is the number of parallel Java compiler threads. Right now there are four compiler threads – according to coworkers because more threads didn't help due to disk I/O. Now that this bottleneck is gone, let's see what we can do there.
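
When I get to that experiment it will probably be as simple as the sketch below: run the compile step with different thread counts and compare wall-clock times. How the thread count is actually passed into our build is still an open question – the property name used here is purely made up:

# Sketch: measure compile time for different thread counts.
# 'compile.threads' is an invented property name for illustration only;
# the real build may expose this setting differently (or not at all yet).
import subprocess
import time

for threads in (2, 4, 6, 8, 12):
    start = time.time()
    subprocess.check_call(["ant", "-Dcompile.threads=%d" % threads, "compile-app"])
    print("%2d threads: %.1f min" % (threads, (time.time() - start) / 60.0))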

By the way, the daily and weekly builds (with an extended testing scope) which existed before are still available. They still build the full package (incl. obfuscation). Even for those builds moving the workspace was beneficial; they came down to 59 minutes.

That’s it for now, I’ll post updates once I have more things to report.

At the beginning of November I gave a talk at the QS-Tag in Nürnberg.
The theme of the conference was (inspired by the 10 year anniversary of the Manifesto for Agile Software Development) “test and agile”.

During the first day, much to my surprise, the “Manifesto of Agile Software Testing” was announced. The attendees were invited to sign the manifesto to declare their support.

It states (I’ll only quote the english text here, there was a german one as well):

We who test in agile projects accept responsibility to contribute to the development of clean, high-quality software.
Through this work we have come to value:

constructive relationship between all team members over testing processes and tools
tested software over comprehensive test documentation
customer collaboration over formal acceptance testing after end of development
responding to change over following a test plan

That is, while there is value in the items to the right, we value the items on the left more.

(emphasis not by me, it’s the same as on the printout the conference attendees got)

As soon as I learned about this testing manifesto I didn't like it. It took me some time and discussions with other people to understand why, and I decided to share my opinion here.

Creation of the testing manifesto
The first thing about the testing manifesto that worried me was the fact that it had been created by

Two of my acquaintances have been dealing with craftsmen lately.
One thing I noticed when talking to them was the fact that craftsmen are not that different from programmers – in the tools they use as well as in the problems that occur.

Craftsmen often have to deal with ambiguous requirements (“I need two power sockets in that corner of the room”), so when they complete their work they perform a manual “developer” test and an acceptance test. Examples of manual tests are checking whether there's water coming out of the faucets and verifying that lights can be switched on and off in a room.
After these tests have been passed they do an acceptance test with the client to make sure the requirements have been met.

What sounds good in theory has a big flaw: often it's not just one trade working in the same place but several. Once the electricians are done the painters will start – similar to programmers building feature after feature.
It’s a well known fact that without regression tests in place chances are high that newly developed functionality breaks existing functionality. Automated continuous regression tests don’t prevent the functionality from breaking but they warn programmers quickly that it happened.

Coming back to craftsmen this is where we find room for improvement.
Should the painter make a mistake and damage a cable he might stop the lights in a room from working – without noticing it. Even the normal manual “developer” tests and acceptance tests may miss it for a long time – why should the lights be checked again if they worked previously? As Murphy's law tells us, the broken lights won't be noticed until after the bills have been paid and it's dark outside.

One solution for this problem would be to perform a complete test of every socket, faucet and lightbulb once all building activities have finished. As with software this may work on small projects but does not scale to large ones, e.g. office buildings. These tests would also happen too late – it’s better to get immediate feedback if something goes wrong. This is the same reason why regression tests are run on continuous integration servers in the software world.

Creating automated regression tests for craftsmen isn’t as difficult and unrealistic as it seems at first glance.
– The ends of electrical cables could be outfitted with sensors that detect a drop in voltage and trigger a battery-powered acoustic alarm
– Faucets could be outfitted with sensors that check the water pressure and trigger an alarm if it drops. This mechanism doesn't just help in noticing when a pipe gets damaged; thanks to the quick feedback it also helps reduce water damage
– Gas pipes could be outfitted with pressure detectors as well. Needless to say how important it is to notice gas leaks quickly. At the same time it’s really important that the alarm mechanism doesn’t strike any sparks.

Currently the cost of fixing damage (a bug) after all building activities have finished must be very low – I can't think of any other reason why the ideas I mentioned haven't been put into use yet.
Another explanation could be that it’s widely accepted that something will be broken by the time the last craftsman leaves the building.

Last Monday morning a coworker walked into the office and told the following story:
“There I was, out of the club and not exactly sober. The easiest way to get home seemed to be renting a StadtRad [network of hire stations in major cities in Germany where you can pick up & return bikes – see here for details].
So I went to the hire station and rented a bike with my iPhone. Strangely I couldn’t take the bike out of the station so I tried to rent another one. This worked fine so I rode it home.
When I wanted to return the bike at the hire station close to my house, the iPhone told me that I wasn't close enough to the hire station to return it. Whatever, I just locked it at the station [the bikes are marked as “returned” automatically then] and went to bed.

After waking up again I looked at my phone and realized that the bike was still marked as rented. A bit of panicking and a support call later I found out that my first rental attempt hadn't unlocked the bike in front of me because I had actually rented a bike in Frankfurt! [We're located in Hamburg, over 300 km away]
So when I used the “return” function it didn’t work because I wasn’t close enough to a hire station in Frankfurt. The bike was at the hire station in Frankfurt the whole night!”

Upon hearing this story I remembered the comic from Andy Glover about ways to become a testing expert.
His idea of “friday-nightility testing” seemed like a fun way to get drunk on company time when I first read it, yet I wasn't convinced that there were any real-world applications of this technique.
Looking at the story I had just been told, I have to revise my previous assumption.
So what can we learn from this accidental “friday-nightility testing” session?

To start with something positive: as long as you're still able to unlock your iPhone, being drunk doesn't prevent you from renting a bike with the StadtRad iPhone app – which shows that they did their usability homework.

At the same time there are no sanity checks in place. The iPhone is location-aware and the app even uses this on numerous occasions. During the renting process this information is ignored completely, although it could be used to prevent people from hiring bikes that are too far away. This distance would need to be defined; it should at least be smaller than the city limits.
Even if people have to be able to hire bikes that far away, at least an additional warning message should be displayed.

The next problem is the fact that hired bikes aren't automatically returned if nobody picks them up after some time. This timeout would need to be defined as well; it should probably be around 2 minutes.
There are two reasons for that:
These bikes are charged by the minute (so customers might lose money), and as long as a bike is unlocked anybody may ride away with it.
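
Neither check would be hard to sketch out. Here is a rough sketch of both ideas – the 500 m radius, the 2-minute timeout and all function names are my own assumptions, not anything the StadtRad app actually does:

# Sketch of the two missing sanity checks; all limits and names are assumptions.
from math import asin, cos, radians, sin, sqrt

MAX_RENTAL_DISTANCE_M = 500   # assumed limit: don't rent bikes further away than this
PICKUP_TIMEOUT_S = 2 * 60     # assumed limit: auto-return if the bike isn't picked up

def distance_m(lat1, lon1, lat2, lon2):
    # great-circle distance between two coordinates (haversine formula)
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def check_rental_distance(user_pos, station_pos):
    # sanity check 1: refuse (or at least warn) if the chosen station is far away
    if distance_m(*(user_pos + station_pos)) > MAX_RENTAL_DISTANCE_M:
        raise ValueError("This station is far away from your current position - really rent here?")

def auto_return_due(rented_at, picked_up, now):
    # sanity check 2: release the bike again if nobody picked it up in time
    return not picked_up and (now - rented_at) > PICKUP_TIMEOUT_S

try:
    # example: a user in Hamburg trying to rent a bike at a station in Frankfurt
    check_rental_distance((53.55, 9.99), (50.11, 8.68))
except ValueError as err:
    print(err)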

One session of “friday-nightility testing” found two issues: one a possible improvement to the initial renting process, the other a major issue where customers' money is at risk.
Regarding the technique itself, some testers might feel more comfortable with a note-taker at their side. I suggest using pen & paper only; audio or video recordings might lead to awkward situations.

What about your software?
Is it still usable if the customer isn't 100% focused and enters illogical data? Does it actively prevent them from making obvious mistakes?
Have you ever used “friday-nightility testing” yourself?

Sorry for the inconvenience – the article will be moved over to another blog.
Updates including the new location to follow soon.

Update: The blog post has been published in the company blog

I’ve been following Elisabeth Hendrickson on Twitter for a while now so I knew she was working on “a seekrit project”. On March 2nd I finally found out that this project was called Entaggle and open for registrations.
There wasn’t much of an explanation besides the tag line “Get recognition. Give recognition. Tag, you’re it.” so after logging in for the first time I started to explore the site.
The concept isn’t hard to get and the UI is very clear so my question “what is this?” was answered rather quickly.

There wasn’t too much in regard of a help system, explanations or user manual so my tester reflexes were triggered and I wanted to dig around to see what I could really do – and if I would be able to break the system. I knew the site had been developed in Ruby on Rails using lots of automated testing by a very skilled tester so it was clear this wasn’t going to be easy.
To my own shame I have to admit that the thought “testing on production systems is bad” didn’t occur to me until I was already way into testing.

Elisabeth on the other hand had been monitoring the site pretty closely and quickly discovered what was going on. It didn’t take long for her to come up with the “testing on production systems is bad” thought – but instead of locking the system, user accounts or IP ranges she did something very clever.

It was so clever it took me 24 hours to fully realize what had happened.

She sent out a message asking people to stop testing on the live server – because a staging server was coming. A mere two hours later the announcement of a staging server for everybody to test on followed.
That was exactly what I wanted to hear. A new product I had never seen before? A server with an official OK to test on? My evening was saved!

As I finally went to bed I had long decided that I would spend my lunch break extending my exploratory testing skills and have a good look at this staging server. Feedback could be provided easily by filling out a typewith.me document. A few bugs, notes & concerns later I had to stop and resume my day job but I felt really good. I felt that I had contributed to a great project by doing what I love the most.
This positive energy lasted until dinner, when I finally took a step back and realized what had happened there.

I had become a crowdsourced tester – and I didn’t even notice it!

I know it would be hard for large web portals to do something like this – replicating the live environment is no easy task. Still, I can remember many occasions where I would have loved to follow up on test ideas I got while using a website. My (admittedly limited) professional experience with web companies shows that all of them have some kind of beta server, so why not open it up to the public?
Especially when it comes to security testing I think it would be better to test on a system with fake data that can be monitored very closely instead of having those tests happen on a system where real user data is in danger. Collecting this kind of feedback isn't that different from collecting regular customer feedback – all you have to do is integrate the customer care and quality assurance departments better.

So what can we learn from this experience?

  1. If you notice people messing around with your system – don’t shut them out, channel the energy into something positive
  2. If you already have a test environment that could be opened up to the public – what’s stopping you from doing it?

I know I’m going to talk about this to some people in my company tomorrow and see how they react.