Creative Ideas For Docker (and Domino)

In an earlier post I mentioned that I have been working on new technology projects since the end of last year. I want to share what I’m doing here and keep you updated on my progress, if only to keep the pressure on myself. I have been working with, and speaking about, Docker and containers for the past year, so it was good news to hear that IBM will now support Docker as a platform for Domino (as of 9.0.1 FP10). http://www-01.ibm.com/support/docview.wss?uid=swg22013200

Good news, but only a first step. Domino still needs to be installed and run in its entirety inside a container, although the data would (or could) be mapped outside. Ideally, in a microservices model, Domino would be componentised and we could have separate containers for the router task, for AMgr, for Updall and so on, so we could build a server to the exact scale we needed. However, that is maybe in the future. Right now there’s a lot we can do, and I’m working on two projects in particular to solve existing issues.

Issue 1: A DR-Only Domino Cluster Mate

It’s a common request for me to design a Domino infrastructure that includes clustered servers but with at least one server at a remote location, never to be used except in a DR situation. The problem with that in a Domino world stems from Domino’s most powerful clustering feature: there is an assumption that every server in a cluster is as accessible to users as any other, so if the server a user tries to connect to is busy and a cluster mate isn’t, the user will be pushed to the not-busy server. That’s fine if all the cluster servers have equal bandwidth or are equally accessible, but a remote DR-only server that should only be accessed in emergencies should not be part of that failover process. It’s a double-edged sword. We want the DR server to be part of the cluster so it is kept up to date in real time and so users can fail over to it without any configuration changes or action on their part, but we don’t want users failing over to it until we say so.

I tend to tackle this by giving the DR server a notes.ini setting of server_availability_threshold=100, which marks it as “busy” and prevents any client failover to it while the other servers are online. It works, ‘ish’, but someone has to remove that setting to ensure all users fail over neatly when needed, and it isn’t unusual to have a few users end up on there regardless.
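To see why a few users can still land on the DR server, here is a deliberately simplified model of cluster failover in Python. It is a sketch under stated assumptions, not Domino’s actual algorithm: each server reports a hypothetical availability index from 0 (fully loaded) to 100 (idle), a server counts as busy when that index is below its threshold, and a client prefers its home server, then the least-loaded non-busy cluster mate, and only falls back to a busy server when nothing else answers. The server names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterServer:
    name: str
    reachable: bool
    availability_index: int  # simplified: 0 = fully loaded, 100 = idle
    threshold: int = 0       # models server_availability_threshold in notes.ini

    @property
    def busy(self) -> bool:
        # Simplified rule: the server is BUSY when its index is below the threshold.
        return self.availability_index < self.threshold


def choose_server(home: str, cluster: List[ClusterServer]) -> Optional[str]:
    """Return the server a client would open its mail file on (simplified model)."""
    by_name = {s.name: s for s in cluster}
    home_server = by_name.get(home)
    if home_server and home_server.reachable and not home_server.busy:
        return home_server.name

    reachable = [s for s in cluster if s.reachable]
    not_busy = [s for s in reachable if not s.busy]
    if not_busy:
        # Fail over to the least-loaded cluster mate that is not BUSY.
        return max(not_busy, key=lambda s: s.availability_index).name

    if home_server and home_server.reachable:
        # Everything reachable is BUSY: stay on the home server.
        return home_server.name
    if reachable:
        # Home is down and only BUSY servers answer: the DR server is all that is left.
        return max(reachable, key=lambda s: s.availability_index).name
    return None


def example_cluster(mail1_up: bool = True, mail2_up: bool = True) -> List[ClusterServer]:
    return [
        ClusterServer("Mail1/Acme", mail1_up, availability_index=40),
        ClusterServer("Mail2/Acme", mail2_up, availability_index=75),
        # Hypothetical DR server: a threshold of 100 keeps it BUSY in this model.
        ClusterServer("DR1/Acme", True, availability_index=95, threshold=100),
    ]


print(choose_server("Mail1/Acme", example_cluster()))                                 # Mail1/Acme
print(choose_server("Mail1/Acme", example_cluster(mail1_up=False)))                   # Mail2/Acme
print(choose_server("Mail1/Acme", example_cluster(mail1_up=False, mail2_up=False)))   # DR1/Acme
```

The last case is the one the threshold cannot prevent: in this model “busy” only deprioritises a server, so once a client cannot reach the primary servers, the DR server is the only candidate left.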

So what can Docker do for me?

I don’t see much value in a standard Domino image for Docker in my world. When I build a Domino server it tends to have a unique configuration and set of tasks, so although a standard image would be nice, my goal in deploying Domino under Docker is very different. It is to create identical containers running identical versions of Domino with identical names, e.g. two containers both named Brass/Turtle. Both containers will point to external data stores (either in another container or a file system mount). Both will be part of a larger Domino cluster. Both will have the same IP address. Obviously both can’t be online at the same time, so one will be online and operating as part of the cluster, and only if that server or container goes down would the other container (at another location) activate. In that model we have active/passive DR on a Domino server that participates fully in workload balancing and failover. I don’t have to worry about tuning the Domino server itself, because the remote instance will only be active if the local instance isn’t. I would use Docker clustering (both Swarm and Kubernetes can do this) to decide when to activate the second container.
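The activation rule itself is simple enough to write down. This Python sketch is only the decision logic, not how Swarm or Kubernetes implement rescheduling: it assumes a hypothetical health check per site, a hypothetical 120-second grace period to avoid reacting to short blips, and that exactly one of the two identically named containers may run at any time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    healthy: bool
    unhealthy_for_s: float = 0.0  # seconds since the last successful health check


def active_site(primary: Site, dr: Site, grace_s: float = 120.0) -> Optional[str]:
    """Decide which single site should run the identically named Domino container."""
    if primary.healthy or primary.unhealthy_for_s < grace_s:
        # Keep the local container as the one active instance; a short blip
        # inside the grace period should not start the remote copy.
        return primary.name
    if dr.healthy:
        # The local instance has been down long enough: activate the remote copy.
        return dr.name
    return None  # nowhere safe to run it


print(active_site(Site("local", True), Site("remote", True)))           # local
print(active_site(Site("local", False, 30.0), Site("remote", True)))    # local
print(active_site(Site("local", False, 600.0), Site("remote", True)))   # remote
```

Returning a single name is the point: because both containers share a server name and IP address, the orchestrator must never run both. Failback is deliberately naive here (the local site wins as soon as it is healthy again) and a real design would need to account for data written while the remote copy was active.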

In principle I have this designed, but I have lots of questions I need to test, not least where to locate the data. A data container, even a clustered data container, would be the simplest method: the Domino container(s) would reference the same data container(s). However, Domino is very demanding of disk resources and Docker data containers don’t have much in the way of file system protection, so I need to test both performance and stability. This won’t work if the data can be easily corrupted. The other idea is a host-based mount point, but of course that could easily become inaccessible to the remote Domino container. I am testing a few other options too, but they are too long to go into in this post. More on that later.

Issue 2: DomainKeys Identified Mail for Domino

In its simplest explanation, DKIM requires your sending SMTP server to sign selected message headers and the body with a private key, and to publish the matching public key in your DNS so that the receiving server can verify the signature, thereby confirming the message did actually originate from your server. It’s one of the latest attempts to control fraudulent email and, combined with SPF records, it forms the basis of DMARC compliance.
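To make “signing part of the message” concrete, the sketch below uses only the Python standard library to compute the DKIM body hash (the bh= tag) with the “relaxed” body canonicalization from RFC 6376, and to build the DNS TXT record where the public key is published. The selector mail2018, the domain example.com and the key placeholder are hypothetical. The actual RSA signature over the headers, which include this body hash, needs a crypto library and is left out.

```python
import base64
import hashlib
import re
from typing import Tuple


def relaxed_body(body: str) -> str:
    """Canonicalize a message body using DKIM 'relaxed' rules (RFC 6376, 3.4.4)."""
    lines = body.split("\r\n")
    # Collapse runs of spaces/tabs to a single space, then drop trailing whitespace.
    lines = [re.sub(r"[ \t]+", " ", line).rstrip(" ") for line in lines]
    # Ignore all empty lines at the end of the body.
    while lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return ""
    # A non-empty body always ends with exactly one CRLF.
    return "\r\n".join(lines) + "\r\n"


def body_hash(body: str) -> str:
    """Base64 SHA-256 of the canonical body, as carried in the bh= tag."""
    digest = hashlib.sha256(relaxed_body(body).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def dkim_txt_record(selector: str, domain: str, public_key_b64: str) -> Tuple[str, str]:
    """Name and value of the DNS TXT record receivers query to verify the signature."""
    name = f"{selector}._domainkey.{domain}"
    value = f"v=DKIM1; k=rsa; p={public_key_b64}"
    return name, value


print(relaxed_body("Hi  there \t\r\n\r\n\r\n") == "Hi there\r\n")  # True
print(dkim_txt_record("mail2018", "example.com", "<base64-public-key>")[0])
# mail2018._domainkey.example.com
```

Canonicalization is why DKIM survives minor whitespace changes by relays: both ends hash the normalised body, not the raw bytes.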

The DKIM component of DMARC is something Domino does not support, either inbound or outbound. It may in the future, but it doesn’t right now, and I am increasingly being asked for DMARC configurations. Devices like Barracuda can support inbound DMARC checking but not outbound DKIM signing. The approach I currently recommend is to deploy Postfix running OpenDKIM as a relay server between Domino and the outside world, so your mail is “stamped” (signed) by that server as it leaves.

My second Docker project, therefore, is to design and publish an image of Postfix + OpenDKIM that can be used by Domino (or any SMTP server).
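As a rough idea of what such an image would need to configure, here is a small Python helper that renders the relevant Postfix main.cf and OpenDKIM settings. It is a sketch, assuming OpenDKIM listens on local TCP port 8891 (a common choice, but an assumption), that Domino relays through this host from a single IP address, and that keys live under /etc/opendkim/keys. The domain, selector and IP address are hypothetical; check each directive against the Postfix and OpenDKIM documentation for the versions you package.

```python
from typing import Dict


def render_relay_config(domain: str, selector: str, domino_ip: str,
                        milter_port: int = 8891) -> Dict[str, str]:
    """Render hypothetical Postfix + OpenDKIM fragments for a signing relay."""
    main_cf = "\n".join([
        # Accept mail for relay from localhost and the Domino server only.
        f"mynetworks = 127.0.0.0/8 {domino_ip}/32",
        # Pass every message through the OpenDKIM milter before it leaves.
        f"smtpd_milters = inet:localhost:{milter_port}",
        "non_smtpd_milters = $smtpd_milters",
        # If OpenDKIM is down, deliver unsigned rather than defer (default is tempfail).
        "milter_default_action = accept",
    ])
    opendkim_conf = "\n".join([
        "Mode sv",  # sign outbound mail, verify inbound mail
        f"Domain {domain}",
        f"Selector {selector}",
        f"KeyFile /etc/opendkim/keys/{selector}.private",
        f"Socket inet:{milter_port}@localhost",
        # Clients whose mail is signed rather than only verified.
        "InternalHosts /etc/opendkim/TrustedHosts",
    ])
    trusted_hosts = "\n".join(["127.0.0.1", domino_ip])
    return {"main.cf": main_cf, "opendkim.conf": opendkim_conf,
            "TrustedHosts": trusted_hosts}


for filename, text in render_relay_config("example.com", "mail2018", "10.0.0.25").items():
    print(f"# {filename}\n{text}\n")
```

Listing the Domino server in InternalHosts matters: OpenDKIM only signs mail it considers internal, so relayed Domino mail would otherwise just be verified.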

More on these as I progress.


Champions Expertise – 2018 Technology

IBM Champion Expertise presentations are a new initiative we are starting this month, whereby Champions provide audio presentations on a particular topic. This month’s theme is “2018 Futures and Technology”, and here is my presentation on what I think is going to be big for 2018: containerisation vs virtualisation and where it goes next. The presentation has audio and I tried to keep it short, but feel free to play me at double speed if 14 minutes is too long.

I mention in my presentation that I have a more detailed presentation on Docker architecture on SlideShare; if you want to see it, it’s here. I’d also be grateful for any feedback on the length, style or other aspects of the presentation, and on what you think of the Champions Expertise idea.

Sametime Client Update Breaks Single Sign On

I recently built a new Sametime Complete environment for a customer that included an Advanced server and a Meeting server. When I had completed the build, I tested a new standalone Sametime client in a VM to confirm that I could log in to the new Community server and that it would log me into the Advanced and Meeting servers. Having added the necessary lines to plugin_customization.ini to enable Sametime Advanced*, I was able to log in to the Community server successfully and be automatically logged into the Meeting and Advanced servers. However, when I handed over to the customer for testing, I was surprised that they couldn’t log in to the Meeting server at all through the Sametime client. They got a “server unreachable” error.

So I did further testing:

  1. On my client I was configured to use SSL for both the Meeting server and Sametime Advanced. I could log in to the Community server and that logged me in securely to Meetings and Advanced. The same configuration on one of their test workstations failed to log in to the Meeting server, saying the server was not responding (although it did successfully log in to Advanced).
  2. If I removed the Sametime Advanced servers from the Sametime workstation client, it could suddenly log in to the Meeting server.
  3. If I changed the Meeting server configuration in the workstation client to use HTTP (80) instead of HTTPS (443), I would be logged in to the Meeting and Advanced servers.
  4. On the test workstation I could always log in to the Meeting server securely through a browser, open a tab to the Advanced server and be automatically logged in there, even when the Sametime client claimed it couldn’t reach the server.

So why did it fail on every one of their workstations and not for me? It turns out they were using the latest Sametime client (20170402-0344), which I had downloaded from Fix Central for them, whereas I was using the 2016 build (20160624-0209). I took a snapshot of my VM, upgraded my Sametime client to the April 2017 build and was immediately unable to log in to the Meeting server. I rolled the snapshot back to the 2016 client and everything worked again.

One of the major updates in the 2017 client was SAML functionality, and it does seem that the single sign-on logic has been broken in some way by that update. Everything works with the 2016 client, so for the time being (and whilst IBM investigate the PMR) we are rolling that out. One to watch out for, though: newer is not always better, and you might want to avoid the latest 20170402-0344 update.


*For Sametime Advanced login to work at all in the client, you must ensure “remember password” is checked and the following two lines are in plugin_customization.ini:

com.ibm.collaboration.realtime.bcs/useTokens=false
com.ibm.collaboration.realtime/enableAdvanced=true
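Because a missing line here fails quietly, a quick check of the file can save time. This Python sketch parses plugin_customization.ini as simple key=value lines (an assumption that holds for the two settings above) and reports which required settings are absent or wrong. It cannot see the “remember password” option, which is set in the client itself.

```python
from typing import Dict, List

# The two settings listed above, with the values they must have.
REQUIRED = {
    "com.ibm.collaboration.realtime.bcs/useTokens": "false",
    "com.ibm.collaboration.realtime/enableAdvanced": "true",
}


def parse_plugin_customization(text: str) -> Dict[str, str]:
    """Read key=value lines, skipping blanks and '#' comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_advanced_settings(text: str) -> List[str]:
    """Return the required lines that are absent or set to a different value."""
    settings = parse_plugin_customization(text)
    return [f"{key}={value}" for key, value in REQUIRED.items()
            if settings.get(key, "").lower() != value]
```

An empty list means both lines are present; anything returned can be pasted straight into the file.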

Engage – Was It Really Over A Week Ago?

It’s 2am, so apologies in advance for any rambling in this post, but I’ve been wanting to write about the Engage conference in Antwerp ever since I got back last Thursday (and if I leave it much longer I might as well write about next year’s conference).

This year Engage was held in Antwerp, which is only a 3.5-hour drive for me, so we drove and met up there with everyone else, who came by train. Top tip: don’t try to drive in Antwerp; the one-way systems will get you every time. It was yet another beautiful city and conference location from Theo and the Engage team. The Elizabeth conference center was spacious, and with 400 of us there and the Engage team having made sure to provide lots of seating and meeting areas, it felt right. One thing I really enjoy at conferences is the opportunity to meet people (OK, I hate approaching people to talk, but I like being part of a conversation), and I had some great conversations with sponsors and attendees. I managed to bore people to death about my latest obsession (Docker). IBM also sent a lot of speakers this year, with Scott Souder and Barry Rosen updating us on Domino and Verse futures, and both Jason Roy Gary and Maureen Leland there to sprinkle some (Connections) pink around. There was a lot of open discussion about technology now and what we were each learning, along with a fair amount of enthusiasm for what we’re each working with, so thanks to everyone for that.

This year the agenda expanded to include emerging technologies, and one of my sessions was in that track: IoT in the Enterprise, GDPR and data. I try to aim my presentations at the audience I’m talking to, and when it comes to IoT the IT audience naturally has a lot more concerns than line-of-business managers. Outside of IT, IoT is purely about opportunity, but since IT needs to take care of the rest, my presentation was more technical, with a security checklist for deploying IoT devices. All the opportunity for businesses will inevitably involve a lot of work from IT in the areas of data retention, data analysis, security and process redesign. Some really interesting technologies are evolving, and IoT is moving very fast, as evolving technologies do, so now is the time to start planning how your business can take advantage of the incoming swarm of data and tools.

My second session was on configuring a Domino/Cloud hybrid solution, with step-by-step instructions for setting up your first environment. That presentation is on my SlideShare and also shared below. The key thing to understand about hybrid cloud is that as a Domino administrator you still manage all your users, groups, policies, and your on-premises and hybrid servers; in fact the only things you don’t manage are the cloud servers themselves. Getting started with a hybrid cloud deployment is a good way to understand the potential for migrating or consolidating some of your mail services.

As always the Engage team put on an amazing event: lots of sessions to learn from, lots of people to meet and a lot of fun. I was very pleased to see Richard Moy, who runs the US-based MWLUG event, there for the first time, and I’m looking forward to attending his event in the US in August. Finally, my crowning achievement of the week was when no-one on my table could identify either a Miley Cyrus or a Justin Bieber song at the closing dinner and none of us considered cheating by using Shazam (I’m looking at YOU Steph Heit and Amanda Bauman :-)). Theo promises us Engage will be back in May 2018 at a new location. See you there.

More Adventures In *** RHEL Configuration

I know I shouldn’t have blogged on Saturday: as soon as I think I have a problem fixed, the universe rises up and slaps me roundly about the head. So, fast forward to the end: it’s Sunday night and I’m installing Connections on RHEL 7, so that’s good. However, to get there I had more hurdles, which I’ll note here both for myself and for anyone else.

I configured and enabled VNC and SSH for access, which worked fine on the same network but not from any other network (“Connection Refused”). The obvious first guess is that the firewall on the server hasn’t been disabled. Disabling it is always the first thing I do, since I have perimeter firewalls between networks and I don’t like to use OS ones. So Saturday and Saturday night were an adventure in checking, double checking and checking again that I had the firewall disabled. RHEL 7 has replaced iptables with firewalld, but iptables still exists, so my worry was that I had something enabled somewhere. I didn’t think it could be my perimeter firewall, since I had built the server with the same IP as an earlier server that already worked. My other guess was VNC being accidentally configured with -nolisten, but that wasn’t true either.
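When “Connection Refused” is the only symptom, it helps to separate an active rejection from packets being silently dropped. This Python sketch classifies a TCP connection attempt made from the remote network. Port 5901 assumes VNC display :1 (5900 plus the display number), and the address in the usage line is a documentation placeholder. Bear in mind that a firewall configured to reject, whether on the host or a perimeter device, also produces “refused”, so the result narrows the search rather than naming the culprit.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered with a reset: nothing listening, or a device rejecting.
        return "refused"
    except socket.timeout:
        # No answer at all: typically a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Routing problems and similar failures ("No route to host", etc.).
        return f"error: {exc}"
```

For example, run probe("203.0.113.10", 22) and probe("203.0.113.10", 5901) from the other network and compare them with the same calls made on the local network.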

By the time I went to bed on Sunday morning I had ruled out the OS and was going to start afresh a few hours later. I’d also noticed that although I could connect via VNC, it was slow as hell despite having a ton of resources.

Sunday morning I decided to delete all the entries referring to that server’s ip on our Sonicwall perimeter device and recreate them.  That fixed the network access. The one thing I didn’t build from scratch was the one thing that was broken. *sigh*.

At this point I did consider switching to Windows 2016 on a new box but I already planned to use that for another server component and wanted to build with mixed OS. Also #stubborn.

So now I have VNC and SSH access, but the GUI is awful. I can’t click on most of the menus and it keeps dropping out. I’m running GNOME 3 and I can find endless posts about problems with GNOME 3 on CentOS or Red Hat, so I bite the bullet and install KDE, because all I want is a GUI. KDE is as bad: slow, with menus not clickable. I make sure SELINUX is set to “disabled”, but still no luck. I try installing NoMachine as an alternative, but it has the same problem with the GUI: slow, unresponsive, menus unclickable and eventually a crash with “Oh no! Something has gone wrong.” That isn’t nearly as entertaining the 100th time you see it. Along the way I disable IPv6 entirely and find and fix this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=912892

and this one

https://bugzilla.redhat.com/show_bug.cgi?id=730378

oh and this irritating setting

https://access.redhat.com/solutions/195833 “Authentication is required” prompt

Throughout Sunday I’m continually working with /etc/systemd/system/vncserver@:1.service to modify the settings, create new instances and create new VNC users, but no matter what I try it proves unworkable.

I’m using the Red Hat instructions from here, which include a configurator you can use to automatically create the vncserver@ file according to your settings. I’m suspicious of that file because it has settings I don’t normally use, like -extension RANDR, so eventually I edit the file and change

ExecStart=/sbin/runuser -l turtlevnc -c \"/usr/bin/vncserver %i -extension RANDR -geometry 1024x768\"
PIDFile=~turtlevnc/.vnc/%H%i.pid

To

ExecStart=/sbin/runuser -l turtlevnc -c "/usr/bin/vncserver %i -geometry 1024x768"
PIDFile=~turtlevnc/.vnc/%H%i.pid

Then I clear out /tmp/.X11-unix/X? and restart once more. Everything, including GNOME 3, works and it’s zippy zippy fast.
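If you hit the same problem on several machines, the edit is easy to script. This Python sketch removes -extension RANDR from the ExecStart= line of a vncserver@ unit and leaves every other line alone. Reading and writing the actual unit file (for example /etc/systemd/system/vncserver@:1.service) is left to the caller; after writing it back you would still run systemctl daemon-reload and restart the vncserver@:1.service instance.

```python
import re


def strip_randr(unit_text: str) -> str:
    """Remove '-extension RANDR' from ExecStart= lines of a vncserver@ unit."""
    out = []
    for line in unit_text.splitlines():
        if line.startswith("ExecStart="):
            # Drop the option and the whitespace before it, keep the rest of the command.
            line = re.sub(r"\s+-extension\s+RANDR\b", "", line)
        out.append(line)
    # Preserve a trailing newline if the original file had one.
    return "\n".join(out) + ("\n" if unit_text.endswith("\n") else "")
```

Working line by line keeps the change narrow: PIDFile= and any other settings are passed through untouched.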


So. Note to self. Next time remove that RANDR setting and win yourself an entire day back.


Me vs Technology (spoiler: I win)

Yesterday Connections 6 shipped, and although I was in meetings all day, my goal for last night was to get everything downloaded and in place on a VM, with that VM built on a configured and hardened OS. That was the plan. I thought it might be fun to share my 4pm to 4am battle against technology, and maybe it will help someone else. It might also explain all the “other” work that tends to take up my time before I ever get to the actual stuff I’m meant to be installing.

All my servers are hosted in a data centre, and mostly I run ESXi boxes with multiple servers on them; I have 5 current ESXi boxes. So, first things first: create a new virtual machine on a box with capacity so I can download the software. All of this is done from a Windows VM on my Mac, which connects to Turtle’s data centre.

vSphere lets me create the machine, then gives me “VMRC disconnected” when I try to open a console. After some checking I realise it’s the older ESXi boxes that are throwing that error, for every VM, and only since I upgraded to Windows 10. If I can’t open a console on the VM I can’t do anything, so I search the internet and find various random advice, which includes:

  • Disable anti-virus
  • Remove vSphere
  • Install the latest vSphere (which keeps being overwritten with an older one each time I connect to an older machine)
  • Uninstall VMware Converter (which I had forgotten was even there); that required booting my VM into safe mode, which only worked if I used msconfig to get it to restart in safe mode
  • Downgrade Windows
  • Create a new clean desktop VM to install Vsphere into

This is a bigger problem than just this install because I also can’t manage any of my servers on those boxes.  I rarely connect to them via the console so I don’t know how long it’s been like that but it can’t stay like that.

Several hours later, still no luck. vSphere lets me do everything to a virtual machine except open a console. I could use another ESXi box, but I’m being stubborn at this point. I want to use this box.

Then I find a reference to VGC (Virtual Guest Console): https://labs.vmware.com/flings/vgc. Created in VMware Labs in 2010 and still in “beta”, it does the one thing I need, which is open a console. So now, with vSphere to create and manage the instances and VGC to open a console, I’m ready to install an OS.

But which OS? The host boxes already have ISOs on them that I use, but those are Windows 2012 R2 and RHEL 6.4. I want either Windows 2016 or RHEL 7.1. Again, I could use Windows 2012, but #stubborn.

I download Windows 2016 to my Mac and it’s over 5GB. That’s going to take a few hours to upload to the datastore, and I’m optimistically thinking I don’t have a few hours to waste. So Plan B is to take an existing RHEL 6.4 ISO, use that to install, then upgrade in place to 7.1, since you can now do that with Red Hat if you’re moving from the latest 6.x to 7.x. Top tip: it would have been quicker to upload Windows 2016.

I start building the new VM using RHEL 6.4, and eventually I get to the point where I can tell it to get all updates, and off it goes. It’s now 1am and it’s showing 19/1934 updates. So I go to bed, taking my iPad with me and leaving my laptop downstairs. Once I’m in bed I can use Jump on the iPad to connect to my laptop, which is on the same network, and Terminus and the VPN on the iPad to open an SSH session to the data centre. The 6.4 updates finish and now I need to get it to 7.1. The first thing I need to do is download 7.1 directly to the new VM, which is easy because I installed a browser, so I download the 3GB ISO directly to the VM. That only takes 3 minutes, and I’m ready to install.
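The bandwidth gap behind that “Top tip” is easy to put numbers on. A quick Python sketch using decimal units (1 GB = 8,000 megabits): the 3GB ISO arriving in 3 minutes works out at roughly 133 Mbit/s inside the data centre, whereas if “a few hours” for the 5GB upload is taken as 3 hours (an assumption), the upload path runs at under 4 Mbit/s, about 36 times slower.

```python
def mbit_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in Mbit/s, using decimal units (1 GB = 8,000 Mbit)."""
    return size_gb * 8000 / (minutes * 60)


download = mbit_per_s(3, 3)     # 3GB ISO in 3 minutes, inside the data centre
upload = mbit_per_s(5, 3 * 60)  # 5GB in an assumed 3 hours, from home to the datastore
print(round(download, 1), round(upload, 2), round(download / upload))  # 133.3 3.7 36
```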

Except not quite. Red Hat requires you to run their pre-upgrade utility before doing an in-place upgrade; in fact the upgrade won’t even run until you have run the pre-upgrade check. So I do that, and as expected it fails a bunch of checks I don’t care about, because this is a new machine and I’m not using anything yet, so I’m not bothered if something stops working. Except the upgrade still won’t run, because it spots that I failed the pre-upgrade test. That’s where “redhat-upgrade-tool -f” comes in. Around 4am I left that running and got some sleep.

Incidentally this is a great document on upgrading but I think you may need a login to read it https://access.redhat.com/solutions/637583

At 7am I found it completed at RHEL 7.1 and then ran one more update to make sure everything was on the latest patches,  added the GUI and configured the firewall.

I’m NOW ready to download Connections 6.