Sametime WAS Proxy Stops Working

I’ve had an interesting system down call with an existing Sametime 9.0.1 customer in the past week.  The environment is over 18 months old and consists of every server component in single instances including ST Proxy, Meetings, ST Advanced and all Media components.  The media components were added in Dec 2015 and everything has been fine. The Meeting and Proxy servers both have WAS proxies in front of them to handle traffic over port 80 / 443 separately.  Last week the Meeting node was restarted and the WAS Proxy stopped working.  It would load, and the Meeting server was responding on its own application ports (http(s)://hostname:9080 / 9443 both worked), but http(s)://hostname failed with

503 Service Unavailable

The WAS Proxy server showed started.  There were no errors in the logs for that or the ST Meeting server.  Not all WAS proxies were broken because the one in front of the ST Proxy server worked.  In short that error suggests that the Meeting server is offline when we knew it wasn’t and since there isn’t any real configuration for the WAS Proxy other than what node it points to - there was nothing to troubleshoot.  I tried deleting and recreating the WAS Proxy a few times, I tried switching it to use alternate ports 81/444, nothing would fix it.

It took a few days and some combined effort to find.  The WAS team wanted us to upgrade to WAS fixpack 5 but that would mean upgrading 8 working servers in the hope of fixing one WAS proxy.  There was a suggestion that since the Meeting server was a single server, not a cluster, I could just change the Meeting server ports to use 80/443 instead of 9080/9443 and do away with the WAS proxy entirely.  That would get rid of the problem but not fix it, just circumvent it.  I wanted to fix it and find out why it happened.

I had checked the virtual hosts to make sure the hostname / port combination was in the stmeet host and wasn’t anywhere else, and discovered that in default_host new wildcard port entries had appeared for ports 80 and 443.  I had already deleted those but that didn’t fix the problem.  How did those port entries appear? I’ve seen this before: when you install new ST servers (as we did with Media in Dec) they can sometimes write virtual host entries to the wrong places.  In fact that was my first guess, but after I removed those entries from default_host and it still didn’t fix the problem I was out of ideas.  Then Tony Payne from IBM spotted that the admin_host virtual host, which is only used by the SSC, had the ports 9080 and 9443 in it when it should only have 8700 and 8701.  Again I assume these were added by the previous server installs, and of course I never went to look there because the Meeting server was specifically set to use the STMeet host.

I removed those extra ports from the admin_host virtual host definition and restarted the Meeting node and servers (clearing the temp directories first \profilename\temp and \profilename\wstemp as well as \profilename\config\temp) and that fixed the problem.

So why was the presence of those two ports 9080/9443  (used by the ST Meeting server) that were in a virtual host the ST Meeting server doesn’t even use causing the WAS Proxy to break? Why didn’t the Meeting server itself break and why didn’t the ST Proxy Server which also had a WAS proxy in front of it break?

Turns out that no matter what virtual host mapping you have in place for applications, in Sametime the code checks the admin_host and if a port appears there - it silently disables looking up any other host.  The fact that the Meeting server ports appeared at all in the admin_host meant that the STMeet host was being ignored and the WAS Proxy had no way to direct the traffic.

Unfortunately none of that is visible in the logs or in debug logs which all reported the servers and services using the correct STMeet host.  So it wasn’t something that was able to be seen.  It was a combination of Tony seeing the admin entries and me having had a previous call with a server install which added ports to unwanted virtual hosts that allowed us to find it and fix it.

The ST Proxy server itself wasn’t affected because that server was running on 9082/9445 so its ports weren’t in admin_host and its virtual host therefore wasn’t ignored.
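If you want to sanity check your own virtual hosts for the same trap, the rule above can be sketched in a few lines of Python.  This is only an illustration of the check, not anything Sametime or WAS ships: the dictionary layout, the function name and the stproxy_host name are all made up for the example, and you would fill in the aliases by hand from what you see under the virtual hosts in the console.  The only facts taken from this case are that admin_host should hold just 8700 and 8701, that the Meeting server uses 9080/9443 and that the ST Proxy server uses 9082/9445.

```python
# Ports that admin_host is expected to contain in this environment (from the case above).
ADMIN_HOST_EXPECTED_PORTS = {8700, 8701}


def audit_virtual_hosts(virtual_hosts, admin_expected=ADMIN_HOST_EXPECTED_PORTS):
    """Flag stray ports in admin_host and the virtual hosts that share them.

    virtual_hosts is a hypothetical hand-built mapping of
    virtual host name -> list of (hostname, port) aliases.
    Returns (unexpected_admin_ports, affected_hosts).
    """
    admin_ports = {port for _host, port in virtual_hosts.get("admin_host", [])}
    unexpected = sorted(admin_ports - set(admin_expected))

    affected = {}
    for name, aliases in virtual_hosts.items():
        if name == "admin_host":
            continue
        # A host is at risk if any of its ports also sits in admin_host by mistake.
        clash = sorted({port for _host, port in aliases} & set(unexpected))
        if clash:
            affected[name] = clash
    return unexpected, affected


# Hypothetical snapshot resembling the broken state described above.
cell = {
    "admin_host": [("*", 8700), ("*", 8701), ("*", 9080), ("*", 9443)],
    "stmeet_host": [("*", 9080), ("*", 9443)],
    "stproxy_host": [("*", 9082), ("*", 9445)],  # illustrative name only
}

unexpected, affected = audit_virtual_hosts(cell)
print(unexpected)  # [9080, 9443]
print(affected)    # {'stmeet_host': [9080, 9443]}
```

The ST Proxy entry doesn’t show up in the result because 9082/9445 never made it into admin_host, which matches what we saw.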

Always good to have a problem fixed and learn a ton of stuff about application behaviour at the same time 🙂

Last week in Eindhoven…

We were in Eindhoven last week at the Engage conference.. over 400 attendees, speakers and IBM’ers gathered for two days of learning, talking and cleaning out the hotel bar of tonic water.. I’ve been to several of the past Engage conferences and Theo always puts on a great event but this was bigger and better than ever.  So why?

IBM sent a lot of executives to Engage with the Opening General Session being given by the new ICS general manager (appointed at Connect in January) Inhi Cho Suh and with product strategy presented by Suzanne Livingston, Sara Gibbons and Chris Crummey.  The first thing Inhi announced was that things are going to change - starting with the Orlando conference which moves to February 22nd at Moscone West in San Francisco.  That’s a big decision and commitment - serious tech companies have conferences in SF and that’s where ICS (IBM Collaboration Solutions) needs to be if they are going to innovate, lead and grow as opposed to maintain.   Inhi also let us know that she has asked the product team to work on a 2020 strategy and that it will include IBM Verse on premise.

Then we got the demo of Verse , Toscana and the thinking behind ICS design.  It’s a shame the OGS wasn’t recorded as Suzanne’s background to their design thinking and Sara & Chris’ demo were both much more detailed (and further advanced) than at Connect in January.  However if you want some idea of what we saw take a look at the OGS video from January (from about 90 seconds in to 20 mins in) here

Aside from the OGS the entire IBM team (of which there were more than 30 in attendance) were everywhere wanting to hear about problems, wanting to listen, wanting to change their relationship with partners, with customers and with development for the better.   It’s hard not to be taken up with the positivity and enthusiasm.  I’m an optimistic person but I don’t consider myself naive - I feel that I recognise honesty and intent when people talk to me, and what I heard was that ICS was important, investable and part of the core IBM development strategy.

In short I choose to believe until I’m proved wrong.

There were of course plenty of great sessions to attend and, as usual, I missed many of the ones I wanted.  Partly because there were also lots of round table discussions which I found very interesting.  Apparently I’m still the 8 year old in class first to put her hand up with a question.

My session on SHA2 and SSL vulnerabilities was up against Mat Newman’s User Blast and Sara Gibbons’ session on Toscana.   We were all along the same corridor and I watched person after person go past my room on their way to Mat’s or Sara’s, so thank you to everyone who chose to hear about security instead and filled out my room.  I hope you found it useful  (and the hand puppets helpful).  For anyone who wasn’t there I have added it to slideshare

On the final evening of the event Theo invited speakers to a dinner preceded by a surprise.  The surprise was that 32 of us were sent into the Escape Rooms.. you are locked in a themed room for an hour and have to decode lots of puzzles to find the code to get out.  I’ve always wanted to try an Escape Room and I chose the “Tomb” which was an Egyptian tomb and went in with a team including Tim and Mike, Sue Smith, Bill Malchisky, Mat Newman, Rene Winkelmeyer and Carl Tyler.  We didn’t make it out in time - we were soooooo close.. but there are a few things to bear in mind:

  • The tomb was entirely dark except for a small flashlight Tim found hidden in a basket in a corner and some candles.  My night vision varies from “bad” to “crappy”
  • Having multiple alpha males in a small space all shouting instructions at each other may not be the best way to get out quickly
  • There was sand everywhere.  Everywhere.  My shoes may never recover
  • Tim is great at puzzles but apparently in the dark, without his glasses (which he forgot to bring in) and with 7 people shouting at him to hurry up - not so much
  • There was a really cool effect where we completed a puzzle and lasers appeared out of the eyes of a skull on the wall and we had to position 7 different mirrors around the room to bounce the lasers around to hit a small hole on the wall.  We got so excited doing that we didn’t notice we had completed the puzzle and a new “door” had opened for about 10 mins.
  • I was given a cryptex to decode and open.  I broke it by pulling the end off.
  • With only 1 light source we could only do one thing at a time so some of us spent a lot of time kneeling in the sand feeling around fake skeletons for clues

In the end it was great fun and I’d definitely want to do it again.

All of that plus a chance to talk to lots of customers and see lots of friends - some of whom came along just to meet up.

I hope you’re recovered Theo - because we’re all up to do it again next year.