Connections Failure To Authenticate

Last week was spent working on a PMR where a newly migrated (side by side) Connections 5.5 environment refused to let anyone access any applications.  I could login using any credential but the Homepage wouldn’t load and any application that required authentication failed including Communities.

Here are some of the errors in the logs

CLFRW0016E: Could not retrieve details for the user with login ID gabriella.davis@domainname.com due to an exception. The exception occurred when retrieving the details via the virtual member manager directly: {1} (in system out for utilcluster which contains homepage)

ADMN0022E: Access is denied for the expandVariable operation on AdminOperations MBean because of insufficient or empty credentials. (in ffdc)

“CustomAuthent E com.ibm.connections.httpClient.CustomAuthenticatorFactory <init> SONATA: authenticator class name is missing!  {in SystemOut for InfraCluster)

webapp E com.ibm.ws. webcontainer.webapp.WebApp logServletError SRVE0293E: [Servlet Error]- [action]: com.ibm.tango.exception.AuthContextException: com.ibm. connections.directory.services.exception.DSException: com.ibm. connections.directory.services.exception.DSOutOfServiceException: java. lang.NullPointerException (in Systemout for InfraCluster).

 

Here (amongst others) are the things we tested / changed / reverted that didn’t fix it.  Bear in mind a working 5.0 production environment with the exact same configuration had no problems during this time.

  • LDAP was fine (we could login). For giggles we changed credentials and back again
  • We changed the login options from mail;cn;uid (which we use in this environment and works fine) to uid;mail;cn
  • We removed the mapped credentials for application security that were put there by the installer and put them back again – apparently that sometimes works
  • Set the authentication under application security for Communities and Profiles from None to Everyone just to confirm where the problem was
  • About 100 other things

Basically we managed to establish the issue was any intraservice communication but not why.  Eventually it went to L3 who isolated the error  as being something in the LotusConnections-Config.xml.

CustomAuthent E com.ibm.connections.httpClient.CustomAuthenticatorFactory <init> SONATA: authenticator class name is missing!  

That file had been migrated as an artifact via the migration tool and was the same as 5.0 but in there was the line <tns:customAuthenticator name=”DefaultAuthenticator” xmlns:tns1=”http://www.ibm.com/uiextensions-config”/>;

which they asked to be changed to <customAuthenticator name=”DefaultAuthenticator”/>

That immediately fixed the problem.

No-one is quite sure how that setting ever got into LotusConnections-Config.xml but my guess is during a CCM/Filenet installation.  The interesting thing is that it works in 5.0 but breaks 5.5. Maybe it requires you to have CCM installed to work as the 5.5 environment (mine or IBMs) didn’t have that.

Still a nice simple fix for such a painful problem and maybe somewhere for you to check when doing your own debugging.

Thanks very much to David McCarthy & the IBM L2 team for prioritising and working the problem.

Sametime For Mobile Users – #NWTL

My final New Way To Learn session today was looking at the Sametime mobile clients, Connections Chat and Sametime Meetings.  I hope you find it useful and as always the full recorded session is available in the #NWTL Community.

The slides by themselves are below

In this session we looked at the architecture behind the Sametime mobile applications for chat and meetings. What do you need to deploy to support mobile users and what features are available to them on the different mobile platforms. We also looked at potential bottlenecks, security and troubleshooting for the mobile clients.

Sametime Critical Hit – Missing Servlets

This week I will be presenting on upgrading Sametime to 9.0.1 as part of IBM’s New Way To Learn program (see here for details – requires login ).  In preparation for that I wanted to take an existing environment I had and step through the upgrade of all components using the documentation.  I discovered a few things I’ll share in my presentation and on this blog but one spectacular reoccuring critical full stop can’t move any further what was THAT – problem I thought best to share now.

After successfully upgrading the Community server (I know it was successful because the installer and the logs told me so 🙂  I discovered that the server couldn’t start the policy servlet.  It was hard to see since all the other servlets started fine but if I watched the console as it tried to start I saw a servlet error when loading Policy and a message saying com.lotus.sametime.admin.policy.PolicyServlet could not be located.  Luckily I’ve seen similar errors before in some 9.0 upgrades and on those it was the STCore.jar file which sits in the Domino program directory that was at fault.  I took a backup of that STCore.jar and replaced the one in the program directory with one from a 9.0 server (bear with me, it was just to prove something) and sure enough, the server came up and launched Sametime this time finding the Policy servlet but missing the UserInfo servlet.  

OK so I knew where I was.  The STCore.jar that installed as part of the 9.0.1 upgrade was missing some policy files.  I rename both the new 9.0.1 STCore.jar and the copy of my 9.0 STCore.jar to STCore.zip and then extracted them both so I could compare. I drilled down to the folder it claimed was mising com\lotus\sametime\admin\policy and in the screenshots below you can see my 9.0.1 version only has 4 files whereas my 9.0 version had 6 files including the missing one (PolicyServlet).

skitch 2

The STCore.jar as installed by the 9.0.1 upgrade

skitch

The STCore.jar from my 9.0 server

As you can see, the two missing files include the one the server was looking for.  I extracted the two files and added them to my 9.0.1 folder then compressed everything again as STCore.zip and renamed to STCore.jar.  I copied this new “fixed” (I hope) STCore.jar to the Domino directory and the server started with no problems.  At least none I could immediately see.

I had come across this once before (an incorrect STCore.jar) on an earlier customer upgrade so it’s a recurring problem. I’m not sure what happens during the upgrade process – the file itself is dated 25th April 2016 so it’s not built during the install and isn’t broken for new installs.  So two suggestions

1. Always backup STCore.jar before starting any upgrade along with sametime.ini vpuserinfo.nsf stconfig.nsf etc

2. If your server console is reporting a missing servlet during launch then verify that servlet exists in the  STCore.jar

Sametime WAS Proxy Stops Working

I’ve had an interesting system down call with an existing Sametime 9.0.1 customer in the past week.  The environment is over 18 months old and consists of every server component in single instances including ST Proxy, Meetings, ST Advanced and all Media components.  The media components were added in Dec 2015 and everything has been fine. The Meeting and Proxy servers both have WAS proxies in front of them to handle traffic over port 80 / 443 separately.  Last week the Meeting node was restarted and the WAS Proxy stopped working.  It would load.  The Meeting server was responding on its own application ports to http(s)://hostname:9080 / 9443 both worked but http(s)://hostname failed with

503 Service Unavailable

The WAS Proxy server showed started.  There were no errors in the logs for that or the ST Meeting server.  Not all WAS proxies were broken because the one in front of the ST Proxy server worked.  In short that error suggests that the Meeting server is offline when we knew it wasn’t and since there isn’t any real configuration for the WAS Proxy other than what node it points to – there was nothing to troubleshoot.  I tried deleting and recreating the WAS Proxy a few times, I tried switching it to use alternate ports 81/444, nothing would fix it.

It took a few days and some combined effort to find.  The WAS team wanted us to upgrade to WAS fixpack 5 but that would mean upgrading 8 working servers in the hopes of fixes one WAS proxy.  There was a suggestion that since the Meeting server was a single, not a cluster, I could just change the Meeting server ports to use 80/443 instead of 9080/9443 and do away with the WAS proxy entirely.  That would get rid of the problem but not fix it, just circumvent it.  I wanted to fix it and find out why it happened.

I had checked the virtual hosts to make sure the hostname / port combination was in the stmeet host and wasn’t anywhere else and discovered that in default_host new wildcard port entries had appeared for ports 80 and 443.  I had already deleted those but that didn’t fix the problem.  How did those port entries appear ? I’ve seen this before when you install new ST servers (as we did with Media in Dec) it come sometimes write virtual host entries to the wrong places.  In fact that was my first guess but after I removed those entries from default_host and it still didn’t fix the problem I was out of ideas.  Then Tony Payne from IBM spotted that the admin_host virtual host which is only used by the SSC had the ports 9080 and 9443 in it when it should only have 8700 and 8701.  Again I assume these were added by the previous server installs and of course I never went to look there because the Meeting server was specifically set to use the STMeet host.

I removed those extra ports from the admin_host virtual host definition and restarted the Meeting node and servers (clearing the temp directories first \profilename\temp and \profilename\wstemp as well as \profilename\config\temp) and that fixed the problem.

So why was the presence of those two ports 9080/9443  (used by the ST Meeting server) that were in a virtual host the ST Meeting server doesn’t even use causing the WAS Proxy to break? Why didn’t the Meeting server itself break and why didn’t the ST Proxy Server which also had a WAS proxy in front of it break?

Turns out that no matter what virtual host mapping you have in place for applications, in Sametime the code checks the admin_host and if a port appears there – it silently disables looking up any other host.  The fact that the Meeting server ports appeared at all in the admin_host meant that the STMeet host was being ignored and the WAS Proxy had no way to direct the traffic.

Unfortunately none of that is visible in the logs or in debug logs which all reported the servers and services using the correct STMeet host.  So it wasn’t something that was able to be seen.  It was a combination of Tony seeing the admin entries and me having had a previous call with a server install which added ports to unwanted virtual hosts that allowed us to find it and fix it.

The ST Proxy server itself wasn’t affected because that server was running on 9082/9445 so its ports weren’t in admin_host and its virtual host therefore wasn’t ignored.

Always good to have a problem fixed and learn a ton of stuff about application behaviour at the same time 🙂

Last week in Eindhoven…

We were in Eindhoven last week at the Engage conference.. over 400 attendees, speakers and IBM’ers gathered for two days of learning, talking and cleaning out the hotel bar of tonic water.. I’ve been to several of the past Engage conferences and Theo always puts on a great event but this was bigger and better than ever.  So why?

IBM sent a lot of executives to Engage with the Opening General Session being given by the new ICS general manager (appointed at Connect in January) Inhi Cho Suh and with product strategy presented by Suzanne Livingston , Sara Gibbons and Chris Crummey.  The first thing Inhi announced was that things are going to change – starting with the Orlando conference which moves to February 22nd at Moscone West in San Francisco.  That’s a big decision and commitment – serious tech companies have conferences in SF and that’s where ICS (IBM Collaboration Services) need to be if they are going to innovate, lead and grow as opposed to maintain.   Inhi also let us know that she has asked the product team to work on a 2020 strategy and that it will include IBM Verse on premise.

Then we got the demo of Verse , Toscana and the thinking behind ICS design.  It’s a shame the OGS wasn’t recorded as Suzanne’s background to their design thinking and Sara & Chris’ demo were both much more detailed (and further advanced) than at Connect in January.  However if you want some idea of what we saw take a look at the OGS video from January (from about 90 seconds in to 20 mins in) here

Aside from the OGS the entire IBM team (of which there were more than 30 in attendance) were everywhere wanting to hear about problems, wanting to listen, wanting to change their relationship with partners, with customers with development for the better.   It’s hard not to be taken up with the positivity and enthusiasm.  I’m an optimistic person but I don’t consider myself naive – I feel that I recognise honesty and intent when people talk to me and I what I heard that ICS was important, investable and part of the core IBM development strategy.

In short I choose to believe until I’m proved wrong.

There were of course plenty of great sessions to attend and, as usual, I missed many of the ones I wanted.  Partly because there were also lots of round table discussions too which I found very interesting.  Apparently I’m still the 8 year old in class first to put her hand up with a question.

My session on SHA2 and SSL vulnerabilities was against Mat Newman’s User Blast and Sara Gibbons’ with Toscana.   We were all along the same corridor and I watched person after person go past my room on their way to Mat or Sara’s , so thank you to everyone who chose to hear about security instead and filled out my room.  I hope you found it useful  (and the hand puppets helpful).  For anyone who wasn’t there I have added it to slideshare 

On the final evening of the event Theo invited speakers to a dinner preceded by a surprise.  The surprise was that 32 of us were sent into the Escape Rooms.. you are locked in a themed room for an hour and have to decode lots of puzzles to find the code to get out.  I’ve always wanted to try an Escape Room and I chose the “Tomb” which was an Egyptian tomb and went in with a team including Tim and Mike, Sue Smith, Bill Malchisky, Mat Newman, Rene Winkelmeyer and Carl Tyler.  We didn’t make it out in time – we were soooooo close.. but a few things to bear in mind

  • The tomb was entirely dark except for a small flashlight Tim found hidden in a basket in a corner and some candles.  My night vision varies from “bad” to “crappy”
  • Having multiple alpha males in a small space all shouting instructions at each other may not be the best way to get out quickly
  • There was sand everywhere.  Everywhere.  My shoes may never recover
  • Tim is great at puzzles but apparently in the dark, without his glasses (which he forgot to bring in) and with 7 people shouting at him to hurry up – not so much
  • There was a really cool effect where we completed a puzzle and lasers appeared out of the eyes of a skull on the wall and we had to position 7 different mirrors around the room to bounce the lasers around to hit a small hole on the wall.  We got so excited doing that we didn’t notice we had completed the puzzle and a new “door” had opened for about 10 mins.
  • I was given a cryptex to decode and open.  I broke it by pulling the end off.
  • With only 1 light source we could only do one thing at a time so some of us spent a lot of time kneeling in the sand feeling around fake skeletons for clues

In the end it was great fun and I’d definitely want to do it again.

All of that plus a chance to talk to lots of customers and see lots of friends – some of which came along just to meet up.

I hope you’re recovered Theo – because we’re all up to do it again next year.

 

 

 

 

 

There Has To Be A Better Way

I subscribe to IBM notifications for software. That means that every day I receive at least two, or more usually 4 notifications.  The reason I receive more than one? I apparently have two accounts on the IBM site, one for my email address which lets me open PMRs and see technotes and one for my IBM ID which  lets me download software and log into Partnerworld. Let’s not go there. I’ve spent 2 years escalating that to get it resolved and am now accepting of it 🙂

I have a customer who today reported NSDs on the HTTP task – well I know they are on the very latest fixes and patches so when I get a software report this afternoon with titles like “DOMINO SERVER CRASH ON HTTP” obviously I immediately want to go look at a new problem.  Unfortunately when I click on the link in the email I am told to login to the IBM Site before I can see the problem.  When I do that (and use the wrong ID) it tells me I’m a member of Partnerworld but not allowed to see technical support documents.  So I shut down my browser and try again with the other ID.  Finally the document opens up and this is what I see

Local fix
Problem summary
Problem conclusion
Temporary fix
Comments
This APAR is associated with SPR# PEJA9ZXRWJ.

That’s right. Nothing.  No problem detail, No Domino version or OS detail, No resolution. Nothing.  I click on the next problem in the email and get a similar technote.

I want to help IBM support their customers, I want to support my customers.  If there isn’t more information just don’t publish the technote because at this point it’s not a technote, it’s just a problem.  I have problems.  I hope that IBM only tells me of solutions.

Surely there’s a better way ?

Determining Connections Versions

As I start a new Healthcheck today I thought I’d share a tip with you.  One of the first things I do when coming clean to someone’s Connections environment is try and determine what’s installed, including CRs and fixes.  Installation Manager is good at telling you what it installed but less so if you installed fixes outside of its interface.  There are other methods too like checking the version logs and reading the about.jsp, but it can be fiddly to piece together all the information.

One of the best resources I’ve found is an IBM technote from this July which shows how to identify exactly what fixes and CRs are installed.  The most comprehensive is updateSilent which produces a report on screen of every version, CR and iFix.  Here’s the table of what each utility can do.

Identifying Connections

The updateSilent utility is run from with the updaterInstaller directory under your Connections install and the command is:

sh updateSilent (bat or sh depending on your OS) -fix -installDir <ConnectionsInstallDir>

You may have to set WAS_HOME first before it will run so my commands in Linux are:

WAS_HOME=/opt/IBM/WebSphere/AppServer

export WAS_HOME

sh updateSilent.sh -fix -installDir /opt/IBM/Connections

It will then output to screen every CR and iFix that exists.