A Sametime Chat Mystery

Today I was contacted urgently by a site I did an install for back in early September.  The install went well and I left them several months ago with working components, but apparently about a week ago people stopped being able to log in to the Community server. In fact, not even the SSC could access it.

.. and yet no-one had changed anything at all.  I do love a good mystery so I thought it would be useful to someone (or even just future Gab) to document what I did:

  • verified if port 1533 was listening using netstat -an | find /i "1533".
  • verified there were no running AV services that could interfere with the ports.
  • checked if the ST services were running, in fact only about 6 were.
  • tried to start some of the services that weren’t running and they failed immediately.
  • since no-one touched Sametime my next guess was a Windows update that caused a problem.
  • checked the Windows networking settings hadn’t been overwritten (they had). Although those settings shouldn’t cause the services to fail completely, it was worth resetting them.
  • I then added vp_trace_all=1 to the [Debug] settings in the sametime.ini which creates detailed log files in the \ibm\domino\trace directory.
  • having added that I could see log files being created for every service, even the ones that wouldn’t stay started. In fact those ones were recreated every couple of minutes.  So the services were trying to start and failing.
  • reviewing the log files I could see on things like STPlaces there was a JVM error, but I put that aside for the time being in case it was a dependency issue.
  • in other logs such as STDirectory I could see broken networking errors and just before that I could see a comment about switching to TLS.

    A-ha! Well, that’s new.

  • checking the sametime.ini I found:
    VPS_PORT=1516
    VPS_TLS_PORT=1516

    which I changed to:
    VPS_PORT=1516
    #VPS_TLS_PORT=1516

    My guess being an incomplete TLS configuration from the SSC.  Having done that the server restarted perfectly and all services started.  The SSC could then access the server with no problem.
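For future Gab, a quick way to spot this kind of entry is to scan sametime.ini for active port settings that share a value. The Python below is only a minimal sketch of that idea: the function name find_port_conflicts is mine, and it assumes that lines starting with # or ; are comments and that any key ending in _PORT is a port setting. Two keys sharing a port isn’t automatically wrong, but it is worth a second look.

```python
def find_port_conflicts(ini_text):
    """Return {port: [keys]} for active *_PORT settings that share a value.

    Assumptions (not from any Sametime documentation): lines starting
    with '#' or ';' are comments, '[...]' lines are section headers,
    and any key ending in _PORT holds a port number.
    """
    ports = {}
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blanks, commented-out settings and section headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().upper().endswith("_PORT"):
            ports.setdefault(value.strip(), []).append(key.strip())
    # Only report ports claimed by more than one active key.
    return {port: keys for port, keys in ports.items() if len(keys) > 1}


if __name__ == "__main__":
    sample = "VPS_PORT=1516\nVPS_TLS_PORT=1516\n"
    print(find_port_conflicts(sample))
```

Run against the text of the broken sametime.ini this would flag VPS_PORT and VPS_TLS_PORT together; with the TLS line commented out it reports nothing.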

Of course, once I had spent 4 hours doing that I then found a technote on it, which I never would have found before I saw the TLS entry.  Here’s the technote.

Sometimes it’s a rollercoaster but so long as I get things working  I’m calling that a good day.  Now back to building more Connections servers.

 

Hello IBM? Your Sametime Requirements are borked

The first thing I do when building any product is go check the latest system requirements in case they have changed.  That’s a bit of a challenge with Sametime since the system requirements are (and have been for some time) nonsense.  No reference to WebSphere 8.5.5 and definitely no reference to the fixpacks or even the individual servers.

Try it yourself.  Go here http://www-01.ibm.com/support/docview.wss?uid=swg27007792 and click on any of the requirements for 9.0.1 Complete, Communicate or Conference to see what I mean.

Buried in the actual help documentation is the phrase “Restriction: Most of the Sametime 9.0.1 servers that run IBM WebSphere® Application Server require version 8.5.5.8” but that doesn’t help anyone wanting detailed system requirements or who doesn’t find that page.

If anyone from the ST or documentation team sees this – please fix it.

MWLUG Presentations & Wrap Up

After several weeks travelling around the US doing work and visiting friends we ended up in Austin for MWLUG.  Another great event organised by Richard Moy and the team with lots of great sessions including Scott Souder’s session on IBM Verse, more on Project Toscana and Ben Langhinrichs’ on Data Visualisation, which is an area I’m working a lot in right now.

I had three presentations during the conference and ended up doing four to fill in for a session that was cancelled at the last minute.  The Adminblast session I gave was one I hadn’t looked at in over a year until 20 minutes before I started so we all went on a magical journey discovering what I meant to say on each slide as it appeared.

Austin was a great town which I didn’t get to see enough of but luckily we arrived early on the Saturday before the rains started and walked around enjoying the bars and the music. Of all the amazing food on offer I will miss the Vegan Nom taco truck the most. Now to try and reproduce those flavours at home…

IBM Traveler, Management and Security

 

The SSL Problem and How To Deploy SHA2 (with Mark Myers from LDC Via)

 

Adminblast Emergency MWLUG Session (original co-authored with Paul Mooney)

 

Deploying Instant Messaging For Mobile Devices

 

Sametime For Mobile Users – #NWTL

My final New Way To Learn session today was looking at the Sametime mobile clients, Connections Chat and Sametime Meetings.  I hope you find it useful and as always the full recorded session is available in the #NWTL Community.

The slides by themselves are below

In this session we looked at the architecture behind the Sametime mobile applications for chat and meetings. What do you need to deploy to support mobile users and what features are available to them on the different mobile platforms. We also looked at potential bottlenecks, security and troubleshooting for the mobile clients.

Sametime Critical Hit – Missing Servlets

This week I will be presenting on upgrading Sametime to 9.0.1 as part of IBM’s New Way To Learn program (see here for details – requires login).  In preparation for that I wanted to take an existing environment I had and step through the upgrade of all components using the documentation.  I discovered a few things I’ll share in my presentation and on this blog, but one spectacular, recurring, critical, full-stop, can’t-move-any-further, what-was-THAT problem I thought best to share now.

After successfully upgrading the Community server (I know it was successful because the installer and the logs told me so 🙂) I discovered that the server couldn’t start the policy servlet.  It was hard to see since all the other servlets started fine, but if I watched the console as it tried to start I saw a servlet error when loading Policy and a message saying com.lotus.sametime.admin.policy.PolicyServlet could not be located.  Luckily I’ve seen similar errors before in some 9.0 upgrades and on those it was the STCore.jar file, which sits in the Domino program directory, that was at fault.  I took a backup of that STCore.jar and replaced the one in the program directory with one from a 9.0 server (bear with me, it was just to prove something) and sure enough, the server came up and launched Sametime, this time finding the Policy servlet but missing the UserInfo servlet.

OK so I knew where I was.  The STCore.jar that installed as part of the 9.0.1 upgrade was missing some policy files.  I renamed both the new 9.0.1 STCore.jar and the copy of my 9.0 STCore.jar to STCore.zip and then extracted them both so I could compare. I drilled down to the folder it claimed was missing, com\lotus\sametime\admin\policy, and in the screenshots below you can see my 9.0.1 version only has 4 files whereas my 9.0 version had 6 files including the missing one (PolicyServlet).

The STCore.jar as installed by the 9.0.1 upgrade

The STCore.jar from my 9.0 server

As you can see, the two missing files include the one the server was looking for.  I extracted the two files and added them to my 9.0.1 folder then compressed everything again as STCore.zip and renamed to STCore.jar.  I copied this new “fixed” (I hope) STCore.jar to the Domino directory and the server started with no problems.  At least none I could immediately see.

I had come across this once before (an incorrect STCore.jar) on an earlier customer upgrade so it’s a recurring problem. I’m not sure what happens during the upgrade process – the file itself is dated 25th April 2016 so it’s not built during the install and isn’t broken for new installs.  So two suggestions:

1. Always back up STCore.jar before starting any upgrade, along with sametime.ini, vpuserinfo.nsf, stconfig.nsf etc.

2. If your server console is reporting a missing servlet during launch then verify that servlet exists in the  STCore.jar
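Since a .jar is just a zip file, the rename-and-extract comparison above can also be done in a few lines of Python with the standard zipfile module. This is a rough sketch rather than a supported tool: the function names are mine, and the jar paths you pass in are whatever your own backup and upgraded copies happen to be called.

```python
import zipfile


def classes_in_package(jar_path, package_dir):
    """Return the .class file names stored under package_dir in a jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            name.rsplit("/", 1)[-1]
            for name in jar.namelist()
            if name.startswith(package_dir) and name.endswith(".class")
        }


def missing_classes(reference_jar, upgraded_jar,
                    package_dir="com/lotus/sametime/admin/policy/"):
    """List classes present in reference_jar but absent from upgraded_jar."""
    reference = classes_in_package(reference_jar, package_dir)
    upgraded = classes_in_package(upgraded_jar, package_dir)
    return sorted(reference - upgraded)
```

Calling missing_classes with the backed-up 9.0 jar as the reference and the freshly upgraded jar as the second argument lists anything the upgrade dropped from that package. Note that zip entries use forward slashes even on Windows, which is why the package path is written that way.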

Sametime 9.0.1 Arrives – Sort Of

Like the sun breaking through the clouds on a gorgeous May holiday weekend, the IBM site has just published a document announcing Sametime 9.0.1 with a release date of May 3.

There’s no documentation or even system requirements out there yet but here are some delicious part numbers from the IBM download site to get your teeth into.

I’m not a big fan of installing without documentation but as soon as it appears I’ll be documenting both a clean install and an upgrade process.  If you want any advice on how to upgrade your existing environment feel free to email me.

[Screenshot: part numbers from the IBM download site, 3 May 2016]

Sametime WAS Proxy Stops Working

I’ve had an interesting system down call with an existing Sametime 9.0.1 customer in the past week.  The environment is over 18 months old and consists of every server component in single instances including ST Proxy, Meetings, ST Advanced and all Media components.  The media components were added in Dec 2015 and everything has been fine. The Meeting and Proxy servers both have WAS proxies in front of them to handle traffic over port 80 / 443 separately.  Last week the Meeting node was restarted and the WAS Proxy stopped working.  It would load, and the Meeting server was responding on its own application ports (http(s)://hostname:9080 / 9443 both worked), but http(s)://hostname failed with

503 Service Unavailable

The WAS Proxy server showed as started.  There were no errors in the logs for that or the ST Meeting server.  Not all WAS proxies were broken because the one in front of the ST Proxy server worked.  In short, that error suggests that the Meeting server is offline when we knew it wasn’t, and since there isn’t any real configuration for the WAS Proxy other than what node it points to, there was nothing to troubleshoot.  I tried deleting and recreating the WAS Proxy a few times and I tried switching it to use alternate ports 81/444, but nothing would fix it.

It took a few days and some combined effort to find.  The WAS team wanted us to upgrade to WAS fixpack 5 but that would mean upgrading 8 working servers in the hope of fixing one WAS proxy.  There was a suggestion that since the Meeting server was a single, not a cluster, I could just change the Meeting server ports to use 80/443 instead of 9080/9443 and do away with the WAS proxy entirely.  That would get rid of the problem but not fix it, just circumvent it.  I wanted to fix it and find out why it happened.

I had checked the virtual hosts to make sure the hostname / port combination was in the stmeet host and wasn’t anywhere else, and discovered that in default_host new wildcard port entries had appeared for ports 80 and 443.  I had already deleted those but that didn’t fix the problem.  How did those port entries appear? I’ve seen this before: when you install new ST servers (as we did with Media in Dec) it can sometimes write virtual host entries to the wrong places.  In fact that was my first guess, but after I removed those entries from default_host and it still didn’t fix the problem I was out of ideas.  Then Tony Payne from IBM spotted that the admin_host virtual host, which is only used by the SSC, had the ports 9080 and 9443 in it when it should only have 8700 and 8701.  Again I assume these were added by the previous server installs, and of course I never went to look there because the Meeting server was specifically set to use the STMeet host.

I removed those extra ports from the admin_host virtual host definition and restarted the Meeting node and servers (clearing the temp directories first \profilename\temp and \profilename\wstemp as well as \profilename\config\temp) and that fixed the problem.

So why was the presence of those two ports 9080/9443  (used by the ST Meeting server) that were in a virtual host the ST Meeting server doesn’t even use causing the WAS Proxy to break? Why didn’t the Meeting server itself break and why didn’t the ST Proxy Server which also had a WAS proxy in front of it break?

Turns out that no matter what virtual host mapping you have in place for applications, in Sametime the code checks the admin_host and if a port appears there – it silently disables looking up any other host.  The fact that the Meeting server ports appeared at all in the admin_host meant that the STMeet host was being ignored and the WAS Proxy had no way to direct the traffic.
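If you want to check for this without waiting for a 503, the test is simply “does admin_host contain any port other than the two the SSC needs?”. The Python below is a minimal sketch of that check only. It assumes you have already copied the virtual host port lists out of the console into a dictionary yourself; the function name and the sample data are hypothetical and it does not talk to WebSphere at all.

```python
def unexpected_admin_ports(virtual_hosts, expected=(8700, 8701)):
    """Return ports listed under admin_host beyond the expected SSC ports.

    virtual_hosts is a hand-built mapping of virtual host name to a list
    of port numbers (an assumption for this sketch, not a WebSphere API).
    """
    admin_ports = set(virtual_hosts.get("admin_host", []))
    return sorted(admin_ports - set(expected))


if __name__ == "__main__":
    # Hypothetical data resembling the broken environment described above.
    hosts = {
        "admin_host": [8700, 8701, 9080, 9443],
        "default_host": [9080, 9443],
    }
    print(unexpected_admin_ports(hosts))
```

Anything this returns is a candidate for removal from admin_host, followed by the restart and temp directory clear-out described above.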

Unfortunately none of that is visible in the logs or in debug logs, which all reported the servers and services using the correct STMeet host.  So it wasn’t something we could see.  It was a combination of Tony seeing the admin entries and me having had a previous call with a server install which added ports to unwanted virtual hosts that allowed us to find it and fix it.

The ST Proxy server itself wasn’t affected because that server was running on 9082/9445 so its ports weren’t in admin_host and its virtual host therefore wasn’t ignored.

Always good to have a problem fixed and learn a ton of stuff about application behaviour at the same time 🙂