You Lie! Error Messages and When To Ignore Them

Building Connections this week and troubleshooting some errors reminded me to share the process I have adopted when dealing with IBM error messages – which is to treat them as hints that can set you on the right path but also send you badly down the wrong one.

Problem 1:

Installing Connections itself via Installation Manager.  One of steps during the install requires you to specify the DB2 server, the database names and credentials to connect to them.  I click validate and it fails  with error CLFRP0030E and launch error!.  That points to this technote which says I left a space after the hostname for the DB2 server.

I absolutely didn’t leave a space and didn’t copy/paste.  Just in case (and working on the assumption that it’s always me and not the product) I cleared it all and typed carefully again. I confirmed the hostname was correct and could be reached.  I also relaunched Installation Manager and started from the beginning.  No luck.

It’s  at this point I have to accept the error is referring to something else and that’s all the information I’m going to get from Installation Manager.  So now I move to asking myself “what if I saw no error but it just failed to connect”.  Well the first answer to that is to check if the connection details, hostname, credentials etc actually work at all.   Having confirmed the hostname and ports (there were no firewalls turned on or virus software), I logged into the DB2 server and checked the LCUSER account. Locked out.  I unlocked the account and the install then completed.

Problem 2

The test server in this environment is one box with everything DB2, TDI and all the applications on it.  My base WebSphere install was WAS 8.5.5 FP10 since Connections System Requirements for WebSphere 8.5.5 says FP8 and higher and I wanted to test that out. Everything installed fine right up to when I went to install Connections Surveys.  That’s when I hit a 2 day brick wall.  Installation Manager couldn’t connect to the Deployment manager despite it being on the same server.

screen-shot-2016-12-09-at-18-26-10

Well that’s odd.  Deployment manager is running.  The hostname resolves. The port is listening. I try to find out what the system requirements are for Connections Surveys but for 2 days last week and through the weekend the IBM system requirements pages for Survey were down.  I’m stubborn so I won’t let it go.  Even the Forms Experience Builder requirements for earlier versions were down.  So eventually I had to leave it and move onto the production build. The work needs completing and I was suspicious that the issue might have been installing everything on one server.

I build production across 4 servers and this time I stick with WebSphere 8.5.5 FP8 just in case.  When I get to the Surveys install it goes without a hitch.  So back to the test server I go.  Roll back Websphere to 8.5.5.0 and then forwards to FP8 (thank you Installation Manager!).  Surprise surprise Surveys installed perfectly.

So. Not an issue connecting to deployment manager or port or the server running but instead “Connections Surveys cannot install onto WebSphere 8.5.5.10 at all.

 

 

User Denied Access To Files and Wikis

Another PMR this week on a new 5.5 side by side build. Once built everything looked OK except a couple of users in IT who received access denied errors when going to Files or Wikis, everything else worked.  Those two applications have databases with pretty much the same schema so we often see matching problems in both applications.

Checking the application security I could see that both were set to All Authenticated so there was no reason why those users couldn’t get at files.  The browser error contained

Identifier: LC6C54CE35BA4D41BF8CB2461634B9EAE6 EJPVJ9275E: Unable to add a group with the directory ID [E7F267C7-8811-D8EC-8025-7E57004A5278, 4339D1D3-2F37-ACDB-8025-7E57004A5285, C0085F47-7A84-EFD4-8025-7E57004A51FA, 4DB58BD6-77EA-80AC-8525-6B700078923E, A5456CF5-9FA0-E49B-8025-7E57004A5316, 54578802-623A-2E18-8025-7E57004A5289, 4EEFAFD1-A098-4155-8025-7F1D00522430, 0D1FD4C5-F61E-CB15-8025-7E57004A51F6, 5A3E2519-52BE-F072-8025-7E57004A527B, 04CE2967-BD15-B84D-8025-7E57004A52F1, 0D162A8C-223C-33C3-8025-7EB4002F6ADF, 6DCCEAE9-6A16-2A75-8025-7E57004A5377, 2B5D0EBA-B225-BA42-8025-7E57004A52DF, C6B296E2-5D27-0F89-8025-7E57004A532A].

If I count how many directory IDs are listed there, there are 14 which matched the number of groups that user was a member of when doing an LDAP query.  Still we weren’t using groups for any access and this exact configuration was working for the same users in 5.0.

In the SystemOut.log I could also see

CWWIM4546E  Duplicate entries were found in the external identifier ‘d68bb54dea77ac8085256b700078923e’ in repository ‘d68bb54dea77ac8085256b700078923e’.

That ID (formatted in various ways) would not resolve to any group in Files or Wikis never mind to duplicates.

Eventually David McCarthy @ IBM got me to change the wimconfig.xml file on the deployment manager which fixed the problem.  My configuration didn’t exactly match the documentation which said to change

<config:baseEntries name=”o=ORGX” nameInRepository=”o=ORGX”/>
to
<config:baseEntries name=”” nameInRepository=””/>

my configuration only had <config:baseEntries name=”” – no ORGX and no nameInRepository at all.  I believe that’s because we use  Domino for LDAP and “root” as the base entry so my federated repository looks like this – a configuration that results in no entry for nameInRepository in wimconfig.xml.

Screen Shot 2016-06-29 at 14.52.26

Once more this isn’t a problem in 5.0 but possibly due to a change in WebSphere behaviour in a newer version, I had to manually edit wimconfig.xml to add the nameInRepository=”” value.

At IBM’s request I also added the Group Membership Attribute which is used for resolving nested group memberships.  This customer uses Domino for LDAP and doesn’t really use nested groups in Connections so in 5.0 it was empty and worked fine however 5.5 may have been struggling with resolving group memberships for some individuals.  In 5.5 having it set to empty could have been contributing to the access problem.

The screenshot below is from 5.0. Screen Shot 2016-06-28 at 19.13.56this is how I changed it in 5.5 (same LDAP source, same users, same everything else)

Screen Shot 2016-06-29 at 15.06.31

Resyncing and restarting then fixed the problem and the users concerned could suddenly access Files and Wikis.

Not sure why it didn’t work for those users before the changes but it could have been something to do with one particular group and its nesting or maybe even a replication conflict which I couldn’t find.

Go figure.

 

Ephox Textbox.io – documentation error

When installing Textbox.io, one of the rich text editors for Connections 5.5 from Ephox,  you have the option post install to configure a spellchecker.  It’s actually a very nice feature that spellchecks on the fly in any rich text field within connections.

To enable it you have to install one of the ear files that comes with the Ephox installers and configure a configuration file that allows the spellchecker to run.  It’s a simple thing to do and the instructions are here however a few issues you should be aware of

  1. The documented example refers to you using server ports but if Connections is correctly configured via IHS and you have regenerated the plugin-cfg.xml then you don’t need to add the server ports for access
  2. The example refers to only one  origin URL but often we have more than one.  To add additional origin URLs you add a comma and a space. My example is
     ephox { allowed-origins { origins = [ "http://connections101.turtlehost.net", "https://devtest.turtlehost.net"], url = "https://connections101.turtlehost.net/ephox-allowed-origins/cors" } }
  3. The biggest problem is that the documentation is wrong when it says where to create the application.conf file

    On Windows: BOOT_DRIVE_LETTER:\opt\ephox\application.conf where BOOT_DRIVE_LETTER is the boot drive for your system

    it’s fairly clear it wants me to put the file on the C drive which is the boot drive for Windows but if you do that the spellchecker won’t work and the URL

    https://connections101.turtlehost.net/ephox-allowed-origins/cors will return

    {"value":["http://localhost"]} 

    which is obviously a default value when the file can’t be found.  

    You need to create the application.conf file not on the boot drive but on whatever drive you have installed WebSphere for the deployment manager which could be the D, E or even Z drive. By creating the /opt/ephox directories there and an application.conf file the spellchecker will find it and start working.

Connections Upgrade UI Weirdness

My latest PMR for a Connections 5.5 upgrade was concerning some very strange behaviour in the UI.  Seemingly unrelated but looking suspiciously like the same cause. Here’s what we discovered (and yes it was tested with multiple IDs , browsers and locations)

  1. The public files window had no scroll bar
  2. There were no upload buttons on the Files app
  3. When editing a profile picture I couldn’t remove it, only change it.  If I chose remove it just closed the profile with no changes.  The wsadmin profile command to remove the picture worked fine
  4. Editing a Community I owned just showed a blank page with a Community title but not editable fields

Weird right?

No errors anywhere in SystemOut or even in filebug or fiddler.  IBM could find no problems in the configuration or errors anywhere so today we did a screenshare to see what was going on.  After 90 minutes we had no luck, not even an error so Charlie Price joined the call as we talked over what I did to build the environment I explained it was a side by side upgrade where I

  • Built a new 5.5 server
  • Upgraded it to CR1
  • migrated the data to 5.0 CR1 databases using dbt.jar
  • upgraded those databases first to 5.5 then to CR1
  • used the migration tool to export artifacts from 5.0
  • copied the work folder the migration tool created to the new 5.5 server
  • used migration lc-import (from the updated tool issued D1 Dec 18) to import the artifacts

When I looked at the customization folder after completing the migration it was full of content that had been imported so the first thing I did when I saw this problem was empty that folder (by copying it somewhere else) and getting it to match my Connections101 5.5 CR1 server.  My suspicion was that somehow the migration lc-import had broken something.

Charlie compared with his system and noticed that both the javascript folder under Customizations and the webresources folder under provision were both completely wrong.  When we checked the 5.0 environment we saw that the contents matched that so must have been overwritten during migration lc-import.  The clue was the missing javascript and jar files for Ephox which I had installed and were now missing.  Charlie sent me a zipped up copy of his webresources directory and I overwrote mine with that and everything started working.

So.. be very careful using the migration wizard.  Take copies of your javascript and provision\webresources directories before doing an import and then make sure the files look up to date and there is nothing missing when you’re done.

 

 

 

Connections Failure To Authenticate

Last week was spent working on a PMR where a newly migrated (side by side) Connections 5.5 environment refused to let anyone access any applications.  I could login using any credential but the Homepage wouldn’t load and any application that required authentication failed including Communities.

Here are some of the errors in the logs

CLFRW0016E: Could not retrieve details for the user with login ID gabriella.davis@domainname.com due to an exception. The exception occurred when retrieving the details via the virtual member manager directly: {1} (in system out for utilcluster which contains homepage)

ADMN0022E: Access is denied for the expandVariable operation on AdminOperations MBean because of insufficient or empty credentials. (in ffdc)

“CustomAuthent E com.ibm.connections.httpClient.CustomAuthenticatorFactory <init> SONATA: authenticator class name is missing!  {in SystemOut for InfraCluster)

webapp E com.ibm.ws. webcontainer.webapp.WebApp logServletError SRVE0293E: [Servlet Error]- [action]: com.ibm.tango.exception.AuthContextException: com.ibm. connections.directory.services.exception.DSException: com.ibm. connections.directory.services.exception.DSOutOfServiceException: java. lang.NullPointerException (in Systemout for InfraCluster).

 

Here (amongst others) are the things we tested / changed / reverted that didn’t fix it.  Bear in mind a working 5.0 production environment with the exact same configuration had no problems during this time.

  • LDAP was fine (we could login). For giggles we changed credentials and back again
  • We changed the login options from mail;cn;uid (which we use in this environment and works fine) to uid;mail;cn
  • We removed the mapped credentials for application security that were put there by the installer and put them back again – apparently that sometimes works
  • Set the authentication under application security for Communities and Profiles from None to Everyone just to confirm where the problem was
  • About 100 other things

Basically we managed to establish the issue was any intraservice communication but not why.  Eventually it went to L3 who isolated the error  as being something in the LotusConnections-Config.xml.

CustomAuthent E com.ibm.connections.httpClient.CustomAuthenticatorFactory <init> SONATA: authenticator class name is missing!  

That file had been migrated as an artifact via the migration tool and was the same as 5.0 but in there was the line <tns:customAuthenticator name=”DefaultAuthenticator” xmlns:tns1=”http://www.ibm.com/uiextensions-config”/>;

which they asked to be changed to <customAuthenticator name=”DefaultAuthenticator”/>

That immediately fixed the problem.

No-one is quite sure how that setting ever got into LotusConnections-Config.xml but my guess is during a CCM/Filenet installation.  The interesting thing is that it works in 5.0 but breaks 5.5. Maybe it requires you to have CCM installed to work as the 5.5 environment (mine or IBMs) didn’t have that.

Still a nice simple fix for such a painful problem and maybe somewhere for you to check when doing your own debugging.

Thanks very much to David McCarthy & the IBM L2 team for prioritising and working the problem.

Severe TDI Issue In Connections 5.5

I have been working with a customer who is migrating to Connections 5.5 from Connections 5.0.   When I do a migration I like to do it properly and create clean data by using dbt.jar to migrate content to new databases.  I know a lot of people are happy with the backup/restore of databases idea but for me that leaves too much scope for bad data to carry over from old system to new.

Everything was going fine, the profiles data migrated and then I tried a sync all dns to sync the ldap data to the database.  Something we schedule daily at this customer.  It failed as it tried to hash the database tables.  The error in the ibmdi.log was

Error: The sort page size property – source_ldap_sort_page_size= – must be greater than 10 if it is not 0. Aborting.

That’s a value that is set in profiles_tdi.properties and it was already set to 0.  So why was it aborting?

I decided to troubleshoot just with a cutdown list of names in collect.dns and using populate_from_dn_file function.  Again it failed but with the strangest error that would find the user in LDAP, get all their values then fail to find the user in the database and fail to update.

In SyncUpdates.log I could see the following error no matter what user I chose for populate_from_dn_file.

ERROR [com.ibm.di.log.FileRollerAppender.bc9c35a0-aae5-416e-9a99-1d418c3c564c] – [callSyncDB_mod] [ProfileConnector] null
java.lang.IndexOutOfBoundsException
at java.util.Collections$EmptyList.get(Collections.java:87)

I then tried copying the collect.dns to my 5.0 production environment and running there and it worked fine, found the users as duplicates and didn’t update them which is the correct behaviour.

I compared the map_dbrepos_from_source.properties files in 5.0 with 5.5 and it all looked pretty much the same.  So I opened a PMR which was eventually escalated to development. As soon as they received it they knew what the problem was – apparently a known but not documented bug that was fixed in CR1 with files that you have to manually deploy (we were already at CR1).

Development’s report of the problem was

log4j.logger.com.ibm.lconn.profiles.internal.service=ALL           
                                                                       
in log4j.properties causes TDI populate and sync commands to crash if an
EMPLOYEE is altered        

Well the crashing was true but the value  log4j.logger.com.ibm.lconn.profiles.internal.service=ALL was # out and unused so it wasn’t related to that particular log setting in my case.

The fix was to go find the two files

lc.profiles.core.service.impl.jar
lc.profiles.core.service.api.jar

in the Connections install and copy them to your TDI\lib directory in your tdisol environment.  In my case I had created a folder called TDISOL55 and under that I had a TDI directory with all the properties, script etc files in and the lib subdirectory full of jar files.  That came from the D1 (day 1) release download of Wizards which contained updated TDIPopulation directory and was dated 18th Dec.  There was no new tdisol with CR1 but clearly there should have been.

I found the files in my Websphere Application profile directory for the profiles application server under the directory

D:\IBM\WebSphere\AppServer\profiles\AppSrv01\installedApps\conn55Cell01\Profiles.ear

I copied those two files over and it al-most worked.  I had one more problem.  The value source_ldap_sort_attribute in profiles_tdi.properties which was initially set to empty (not null but = ) had been changed at the request of L2 to source_ldap_sort_attribute=mail which matched the 5.0 properties we were using.  They asked me to change it for exact comparison and that broke the updates.  Once I took out the “mail” mapping the scripts, both populate_from_dn_file and sync_all_dns ran perfectly.

The new environment does use different LDAP servers (but the same source data) and I don’t know if attempting to tell the server to sort the LDAP results failing is a problem with that server configuration (both environments are Domino 9.0.1) or 5.5 itself. I’ll investigate that and update.

So my two fixes were

  • copy the two jar files from your CR1 installedapps directory to your TDISol directory (lib subdirectory).
  • make sure source_ldap_sort_attribute= in profiles_tdi.properties

Connections and Traveler At ISBG

I was delighted to be invited to speak at the ISBG (http://isbg.org) conference in Norway which this year was held in Oslo.  I’d like to thank the organisers for being so accommodating to the fact that I could only stay 1 day !

I presented on two topics , Upgrading Connections and Managing Traveler.  The content for both is on slideshare and linked below.  My upgrading Connections session had a lot of new content about 5.5 and 5.5 CR1 and I hadn’t written a Traveler management session from scratch in several years.  I’m not sure how well the audience received them but I am pleased with the content at least.  I hope you find them useful.

So you have IBM Connections installed, but now you need to decide what and when to update. It could be a WebSphere fix or a DB2 fixpack, a new application, a database schema or an entirely new version. Some updates are for security, some for performance and some for new features. In this session we’ll discuss how you can decide when and what to upgrade, how to plan for and perform a safe upgrade regardless of its size, and test when it’s complete. We’ll also discuss what things can trip you up along the way.

 

Traveler is a core component of most companies’ mail infrastructure, but its maintenance and security goes far beyond Domino server management. In this session we’ll look at a Traveler environment from daily tasks to enforcing TLS and starting with understanding how Traveler behaves. We’ll review both standalone and high availability configurations and discuss common problems, as well how best to plan and design a secure and stable infrastructure.