Building a Software Ecosystem

Introduction

Let me set the scene. You're fairly new to leading your own team and you have a couple of successful projects under your belt. You've been in the job nearly a year, your team is growing, a pandemic is raging and then you're asked to deliver an ambitious, high-profile, multi-application programme of work. Your team is about to double in size, your development processes are still maturing and you're going to have to architect the whole thing. What do you do?

This is the situation I faced at Laing O'Rourke (LOR). I started the job with just two developers. Within a year the team had added two more developers and we had settled on a new tech stack for building web applications (TypeScript / React JS, .NET Core, Azure). We had a couple of applications in production, but they were still rough around the edges. For example, we had only recently started using pull requests to enforce code reviews.

We were being asked to:

  • Develop a new application to manage the lifecycle of a construction project

  • Create a portal for us to more effectively interact with our supply chain

  • Develop a safety compliance application for managing temporary works on site

  • Create a centralised task management application so that tasks generated by the first three applications would appear in a single location

In order to create these applications and have them work together I knew we would also have to:

  • Deliver an authentication system that could manage both internal and external users

  • Build a centralised way of managing role-based access control (RBAC) to these applications, or face having to repeatedly add it to every application

  • Devise an integration architecture

  • Find an easy method of accessing the important data about people, projects, contracts etc. that come out of our core systems

  • Create shared services for common activities

  • Uplift the design of some of the existing applications and integrate them with the new service

In the end we were committed to delivering five new web applications and a data access service (see below), uplifting the design and integration of our two existing apps on the new tech stack, and possibly rewriting a further two applications that were built on an older stack.

Architecture

Hosting

The most obvious and influential question was where to host these new applications. It was clear that at least some of them would need to be accessible to external users, and we already had some presence in Azure, so I decided to continue there. I did not consider infrastructure-as-a-service (IaaS) particularly useful for us as a development team, as it offered little benefit over what was available to us on-premises.

The wide range of platform-as-a-service (PaaS) offerings on Azure convinced me that this would be the best way to accelerate our delivery. Which led me on to the next question…

To Microservice or not to Microservice

It's a rare day that I get through the day's tech news without seeing the word "Kubernetes". There seems to be a lot of hype around creating a vast number of microservices, all in their containers, all deployed, managed and scaled at the mere touch of a button.

There's a flip-side too, such as added complexity and the learning curve. I particularly like the following from this article on The Register:

…a lot of organisations implementing their own Kubernetes clusters have found their ability to deliver software hollowed out by the fact that everybody now has to go on Kubernetes training courses.

There were a few key reasons I decided not to go with microservices. They add a lot of complexity, such as operational overhead, service boundary headaches, and the coordination tax across teams. But the deciding factor for our specific situation was simpler than that. The applications we were writing weren't enormous, the user base was relatively small, and individual services were unlikely to need fine-tuned scaling on demand. I couldn't see the benefit of managing a large suite of microservices when none of the things microservices are good at applied to us.

The approach we took instead was to bundle all data access into a single GraphQL API. Doing so eliminated the need to develop many individual services.
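To illustrate why a single GraphQL endpoint can stand in for several small services, here is a minimal sketch of one request that pulls project, people and contract data together. The schema fields (project, manager, contracts and so on) are hypothetical and are not taken from the real LOR API; only the request shape (a JSON body with "query" and "variables") is standard GraphQL over HTTP.

```typescript
// Hypothetical query: field names are assumptions made for illustration only.
const PROJECT_OVERVIEW_QUERY = `
  query ProjectOverview($projectId: ID!) {
    project(id: $projectId) {
      name
      manager { fullName email }
      contracts { id supplierName status }
    }
  }
`;

// Standard shape of a GraphQL request body sent over HTTP POST.
interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

// Builds one request that replaces what might otherwise be three separate
// calls (project service, people service, contract service).
function buildProjectOverviewRequest(projectId: string): GraphQLRequest {
  return { query: PROJECT_OVERVIEW_QUERY, variables: { projectId } };
}

// The serialised body a client would POST to the single GraphQL endpoint.
const overviewBody: string = JSON.stringify(buildProjectOverviewRequest("P-1001"));
```

The client decides which fields it needs, so adding a new application usually means writing a new query rather than standing up a new service.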

Authentication

The key requirement for authentication was that it had to support non-LOR employees. This hadn't been a requirement for our newest apps, which were using Azure Active Directory. We looked at three options:

  • Auth0, which we were already using for a couple of smaller applications

  • Microsoft B2C

  • Identity Server

Auth0 was the obvious choice, but the costs involved seemed prohibitive at inception. Our discussions with Microsoft did not make us feel confident about the level of support for the product, so we were left with Identity Server, which would still require a lot of custom implementation.

The good news for us was that discussions with our enterprise architect in the UK revealed an appetite for a robust, global solution, which allowed us to bring in Auth0.

Authorisation and RBAC

You've all seen pages like it. A big grid of permissions and users, with a seemingly endless sea of check boxes. We were being asked to put this into every app we built. I proposed a different solution, which was to build another application dedicated to user and permissions management. This would provide a one-stop-shop for determining a user's permissions in any system.

This idea grew to include automatic mapping of a user's role in our HR system to their permissions, which massively reduced the administration burden.
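The idea of mapping an HR role to permissions in one central place can be sketched as follows. This is an illustration only: the application identifiers, role names and permission strings are all hypothetical, and the real service will have had a richer data model.

```typescript
// Hypothetical names throughout; none of these come from the real system.
type AppId = "projects" | "supplier-portal" | "temporary-works" | "tasks";
type Permission = string;

interface User {
  id: string;
  hrRole: string; // the role as recorded in the HR system
}

// Central mapping: HR role -> application -> permissions.
// Each application asks this one service instead of keeping its own grid.
const rolePermissions: Record<string, Partial<Record<AppId, Permission[]>>> = {
  "Project Manager": {
    projects: ["project:read", "project:edit"],
    tasks: ["task:read", "task:assign"],
  },
  "Site Engineer": {
    "temporary-works": ["permit:read", "permit:submit"],
    tasks: ["task:read"],
  },
};

// Returns the permissions a user holds in one application (empty if none).
function permissionsFor(user: User, app: AppId): Permission[] {
  return rolePermissions[user.hrRole]?.[app] ?? [];
}

// Convenience check used by an application before allowing an action.
function can(user: User, app: AppId, permission: Permission): boolean {
  return permissionsFor(user, app).includes(permission);
}

const alice: User = { id: "u1", hrRole: "Project Manager" };
console.log(can(alice, "projects", "project:edit")); // true
console.log(can(alice, "temporary-works", "permit:submit")); // false

// When the HR role changes, permissions follow with no per-user administration.
const aliceAfterMove: User = { ...alice, hrRole: "Site Engineer" };
console.log(can(aliceAfterMove, "temporary-works", "permit:submit")); // true
```

Because permissions hang off the role rather than the individual, a change in the HR system is enough to update what a person can do everywhere.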

Integration

The goal of integrating our applications was to minimise direct dependencies between them. The preferred approach was to use events. In our case events were lightweight messages that represented something that had happened within the business and would be system agnostic. This was important as we did not want to expose the technical implementation of a system via events. Events were strictly a representation of business interactions (such as a contract being signed, a new employee starting etc.).
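A system-agnostic business event of this kind can be sketched as below. The event names, payload fields and the in-memory bus are assumptions made for illustration; the bus is a stand-in for a real broker and does not use any Azure SDK.

```typescript
// Generic envelope: describes what happened in the business, not how any
// particular system stores it. All names here are hypothetical.
interface BusinessEvent<TType extends string, TData> {
  id: string;
  type: TType;
  occurredAt: string; // ISO 8601 timestamp
  data: TData;
}

type ContractSigned = BusinessEvent<
  "contract.signed",
  { contractId: string; projectId: string }
>;
type EmployeeStarted = BusinessEvent<
  "employee.started",
  { employeeId: string; startDate: string }
>;
type AnyEvent = ContractSigned | EmployeeStarted;

// In-memory stand-in for a message broker.
class InMemoryEventBus {
  private handlers = new Map<string, Array<(event: AnyEvent) => void>>();

  subscribe(type: AnyEvent["type"], handler: (event: AnyEvent) => void): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  publish(event: AnyEvent): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

// A task application reacts to the event without knowing which system
// published it.
const bus = new InMemoryEventBus();
const createdTasks: string[] = [];

bus.subscribe("contract.signed", (event) => {
  if (event.type === "contract.signed") {
    createdTasks.push(`Review contract ${event.data.contractId}`);
  }
});

bus.publish({
  id: "evt-1",
  type: "contract.signed",
  occurredAt: "2021-03-01T09:00:00Z",
  data: { contractId: "C-42", projectId: "P-1001" },
});
```

The publisher and the subscriber share only the event contract, which is what keeps direct dependencies between applications to a minimum.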

The chosen technology was Azure Event Grid, although given the chance to do it again, I would use Azure Service Bus Topics.

Initially, Azure Service Bus Queues did not feature in our plans. As we progressed and new requirements became clear, we realised we would need to create or update hundreds, and sometimes thousands, of tasks in one go, and queues became part of the design to handle that volume.
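One common way to handle bulk work like this is to split it into fixed-size batches and place each batch on a queue for a background worker to process. The sketch below shows only that batching step. It is an assumption about how such a workload could be shaped, not a description of the actual implementation, and the in-memory array is a stand-in for a real queue. The TaskCommand type and the batch size are hypothetical.

```typescript
// Hypothetical command describing one task to create or update.
interface TaskCommand {
  action: "create" | "update";
  taskId: string;
  title: string;
}

// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// In-memory stand-in for a message queue: each entry is one queued batch.
const taskQueue: TaskCommand[][] = [];

// Enqueues the commands batch by batch and returns the number of batches.
function enqueueInBatches(commands: readonly TaskCommand[], batchSize: number): number {
  const batches = chunk(commands, batchSize);
  for (const batch of batches) {
    taskQueue.push(batch);
  }
  return batches.length;
}

// 2,500 task commands in batches of 1,000 give batches of 1000, 1000 and 500.
const commands: TaskCommand[] = Array.from({ length: 2500 }, (_, i) => ({
  action: "create" as const,
  taskId: `T-${i + 1}`,
  title: `Task ${i + 1}`,
}));
const batchCount = enqueueInBatches(commands, 1000);
console.log(batchCount); // 3
```

Queuing the work lets the originating application return quickly while the tasks are created in the background at a pace the task service can sustain.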

Security

Securing our apps and infrastructure was an important aspect of the programme. It became clear to me fairly early on that we lacked expertise in Azure security and really needed to up our game. We engaged with Microsoft and external consultants, and did an incredible amount of self-learning, in order to come up with a security strategy that would work for us. You can learn more about Azure security in my article on Azure.

The Team

It is exciting to bring a talented team together to achieve something great, but it is not without its challenges. Developing so many applications simultaneously was difficult because we needed to build out the code libraries to support them as well as the shared services they would rely on. I'm a big believer in DRY (Don't Repeat Yourself), so I looked to develop libraries of code to standardise the solutions to common problems. I also follow this philosophy in my own development.

Keeping developers to a set of patterns without either stifling creativity and innovation or becoming a complete micro-manager is very hard. Developers are talented and creative people and you stifle that at your peril. My response was to set the standards, but be prepared to adjust them if something better came along. My stipulation was that a change should be applied across the whole team, and for that reason some good suggestions were not implemented. My motto was and still is:

Consistency is better than perfection

Of course I was not always successful. Unless you start with a clean sheet of paper and a very well-defined set of patterns, you'll be making compromises along the way.

This is another case where AI could upend current thinking. The cost of large-scale refactoring is steadily diminishing, so maybe the team can get closer to perfection while maintaining consistency.

Another challenge with such a large team is making sure everyone feels engaged, heard and respected. A good chunk of my time was spent doing one-on-ones. So many meetings can be draining, but it is important to connect with your colleagues, particularly when working remotely. Regular full team meetings, even virtual ones, are a great way for everyone to feel more connected, to interact with people they may not normally talk to, and to feel like they're part of a real team. In-person social events are also great if distance (or a flight budget) allows.

What I Learned

The thing that makes this programme stand out, looking back, is how many disciplines it stretched me across at once. I learned how to hire and build a team, and more importantly how to keep one cohesive when it doubled in size during a pandemic. I learned how to manage multiple concurrent streams of work without dropping any of them, which mostly came down to being honest with myself about what I could realistically manage in parallel. I got a deep, practical grounding in cloud-native architecture; not the MS Build "look how easy it is" version, but the version where you have to live with your decisions for years. I learned the Ops side of DevOps: infrastructure as code, backups, logging, monitoring, and all the things that don't matter until they really matter. And I learned that cost management in the cloud is its own discipline, quite distinct from budgeting for on-premises or SaaS software, and equally distinct from writing the code in the first place.

Most projects let you focus on one or two of these things. This one didn't, and that's why it shaped me more than anything else I've worked on.