Written by Gautier DI FOLCO
on October 1, 2021

Soft Compatibility

Hetchr is a SaaS product that centralizes the key features of many collaborative development tools (Github, Jira, Gitlab, and so on).

After a few weeks of work, we released gitlab.com support last week.

We previously integrated Github.com, so we expected Gitlab to be roughly the same. Well, those expectations set us up for disappointment. Let's see what was different and how we handled it.

History

Github was the first tool we integrated, meaning we had to do two things:

Interact with the API to see the data we can get, and which actions we can perform
Create the Atoms (the widgets) and model the Items

Atoms were initially designed to represent functional views. For example, we model Pull Requests (from Github), Merge Requests (from Gitlab), Patches, and so on, as CodeContribution. This is a good idea if you plan to capitalize on functional views and intend to integrate multiple tools around the same domain.

In terms of code, it looks like this:

data CodeContribution = CodeContribution
  { id :: ItemId,
    author :: User,
    title :: Title,
    description :: Message
  }
  deriving stock (Eq, Show)

newtype ItemId
  = ItemId Text
  deriving stock (Eq, Show)

newtype User
  = User Text
  deriving stock (Eq, Show)

newtype Title
  = Title Text
  deriving stock (Eq, Show)

newtype Message
  = Message Text
  deriving stock (Eq, Show)
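
As a rough illustration of the "functional view" idea, the sketch below maps two tool-specific payloads onto the same CodeContribution. GithubPullRequest, GitlabMergeRequest, their fields, and the two conversion functions are hypothetical and heavily simplified; they are not the real API schemas nor necessarily how Hetchr does it.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Prelude hiding (id) -- the record field below is also named `id`

-- Domain types, repeated from the post so the sketch is self-contained.
data CodeContribution = CodeContribution
  { id :: ItemId,
    author :: User,
    title :: Title,
    description :: Message
  }
  deriving stock (Eq, Show)

newtype ItemId = ItemId Text deriving stock (Eq, Show)
newtype User = User Text deriving stock (Eq, Show)
newtype Title = Title Text deriving stock (Eq, Show)
newtype Message = Message Text deriving stock (Eq, Show)

-- Hypothetical, heavily simplified payloads (not the real API schemas).
data GithubPullRequest = GithubPullRequest
  { ghNumber :: Int,
    ghAuthorLogin :: Text,
    ghTitle :: Text,
    ghBody :: Text
  }

data GitlabMergeRequest = GitlabMergeRequest
  { glIid :: Int,
    glAuthorUsername :: Text,
    glTitle :: Text,
    glDescription :: Text
  }

-- Both tools end up as the same functional view.
fromGithub :: GithubPullRequest -> CodeContribution
fromGithub pr =
  CodeContribution
    (ItemId (T.pack (show (ghNumber pr))))
    (User (ghAuthorLogin pr))
    (Title (ghTitle pr))
    (Message (ghBody pr))

fromGitlab :: GitlabMergeRequest -> CodeContribution
fromGitlab mr =
  CodeContribution
    (ItemId (T.pack (show (glIid mr))))
    (User (glAuthorUsername mr))
    (Title (glTitle mr))
    (Message (glDescription mr))
```

With this shape, anything downstream (widgets, storage, search) only needs to know about CodeContribution, not about the tool it came from.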

Concerns and Limitations

We actually use these types in different places:

To send updates of these Items through our streaming infrastructure (currently AWS Kinesis, but we're migrating to Apache Pulsar)
To store a reference of the current version of Items
To index them in, and query them from, ElasticSearch (to display Atoms)
To expose them through the API

All of these rely on a JSON representation (provided by aeson).

Having said that, we have only four possibilities for handling these JSON representations:

1. Having one instance of FromJSON/ToJSON, which would bind everything to the frontend representation
2. Having orphan instances, which could lead to struggles with imports or to errors that are not visible at compile-time
3. Having a Phantom type and defining instances on it
4. Having dedicated types and defining instances on them

#1 is not a realistic solution: we do not expose all the data we have.

#2 and #3 have the same problem: some changes cannot be prevented at compile-time.
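
The post does not show what the Phantom type option would look like, so the following is only one possible reading of it. For, Persistence, and Frontend are hypothetical names: the concern is a type parameter that never appears in the value, and each concern gets its own instance.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (ToJSON (..), Value, object, (.=))
import Data.Text (Text)

newtype Title = Title Text

-- Hypothetical concern tags: they have no values and only appear in types.
data Persistence
data Frontend

-- Hypothetical wrapper: `concern` is a phantom parameter.
newtype For concern a = For a

-- One instance per concern, both written against the domain type itself.
instance ToJSON (For Persistence Title) where
  toJSON (For (Title t)) = toJSON t

instance ToJSON (For Frontend Title) where
  toJSON (For (Title t)) = object ["title" .= t]

persisted, displayed :: Value
persisted = toJSON (For (Title "Fix login") :: For Persistence Title)
displayed = toJSON (For (Title "Fix login") :: For Frontend Title)
```

In this sketch the instances delegate to whatever Title wraps, so swapping that inner type for another one with a ToJSON instance would still type-check while possibly changing the encoding.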

Let's say we change our Title to a stricter version:

newtype Title
  = Title NonEmptyText
  deriving stock (Eq, Show)
  deriving newtype (FromJSON, ToJSON)

It will still compile, but it breaks backward compatibility: data serialized with the previous Title may no longer decode, and without a solid test suite nothing will catch it.
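
To make the break concrete, here is a minimal sketch. NonEmptyText is not defined in this post, so the version below is a hypothetical stand-in whose decoder rejects the empty string; the stored payloads are made up as well.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), ToJSON, decode, withText)
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for NonEmptyText: decoding rejects the empty string.
newtype NonEmptyText = NonEmptyText Text
  deriving stock (Eq, Show)
  deriving newtype (ToJSON)

instance FromJSON NonEmptyText where
  parseJSON = withText "NonEmptyText" $ \t ->
    if T.null t
      then fail "empty text"
      else pure (NonEmptyText t)

-- The stricter Title from the post.
newtype Title = Title NonEmptyText
  deriving stock (Eq, Show)
  deriving newtype (FromJSON, ToJSON)

-- Made-up payloads, as the older Text-based Title could have written them.
storedBefore :: [ByteString]
storedBefore = ["\"Fix login\"", "\"\""]

-- Decoding the old data with the new type.
decodedNow :: [Maybe Title]
decodedNow = map decode storedBefore
```

Under these assumptions, decodedNow is [Just (Title (NonEmptyText "Fix login")), Nothing]: the second document was valid before the change and is silently rejected after it, even though nothing failed to compile.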

Going Back-and-Forth

So we went with dedicated types: for each target type, we have a mirrored type per concern. Here is the one for persistence:

newtype TitleP
  = TitleP Text
  deriving stock (Eq, Show)
  deriving newtype (FromJSON, ToJSON)

This is the equivalent of a Data Transfer Object (DTO) in the OOP world.

Then we need to go back-and-forth between the two, which is the role of Isomorphism:

data Isomorphism a b = Isomorphism
  { rightWay :: a -> b,
    leftWay :: b -> a
  }

It's a concept borrowed from Category Theory. You can find a different version in lens. To go further, have a look at Dmitrii Kovanikov's talk at haskell.love 2021.

This leads to a simple implementation:

titlePIso :: Isomorphism Title TitleP
titlePIso = Isomorphism right left
  where
    right (Title x) = TitleP x
    left (TitleP x) = Title x
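
For illustration, the sketch below uses the isomorphism in both directions and checks that a round trip gives back the original value. invert and roundTrips are hypothetical helpers added for this example; they are not part of the code above.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Types repeated from the post so the sketch is self-contained.
newtype Title = Title Text deriving stock (Eq, Show)
newtype TitleP = TitleP Text deriving stock (Eq, Show)

data Isomorphism a b = Isomorphism
  { rightWay :: a -> b,
    leftWay :: b -> a
  }

titlePIso :: Isomorphism Title TitleP
titlePIso = Isomorphism right left
  where
    right (Title x) = TitleP x
    left (TitleP x) = Title x

-- Hypothetical helper: swap the two directions.
invert :: Isomorphism a b -> Isomorphism b a
invert (Isomorphism f g) = Isomorphism g f

-- Hypothetical helper: going right then left should give back the input.
roundTrips :: Eq a => Isomorphism a b -> a -> Bool
roundTrips iso x = leftWay iso (rightWay iso x) == x

-- Converting a domain value to its persistence counterpart.
persistedTitle :: TitleP
persistedTitle = rightWay titlePIso (Title "Fix login")
```

The useful property here is that right and left pattern match on the constructors: if Title stops wrapping Text, titlePIso no longer type-checks, which is the compile-time signal the other options lack.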

Dealing with Change: Integrating Gitlab

There are multiple differences, but we will focus on these:

Github works by call-by-value: resources are referenced by name (e.g. gautier for a user, myrepo for a repository)
Gitlab works by call-by-reference: resources are referenced by numeric id (e.g. 42 for a user, 123 for a project)

So User and UserP will have to evolve and deal with the change:

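-- Requires DeriveGeneric, DeriveAnyClass and DerivingStrategies.
import Control.Applicative ((<|>))
import Data.Aeson
import Data.Text (Text)
import GHC.Generics (Generic)

data UserP = UserP
  { userIdP :: Text,
    userNameP :: Text
  }
  -- Generic is needed by both the derived ToJSON and genericParseJSON
  deriving stock (Eq, Show, Generic)
  deriving anyclass (ToJSON)

instance FromJSON UserP where
  -- Try the new object format first, then fall back to the old bare-string format
  parseJSON x = genericParseJSON defaultOptions x <|> oldFormat x
    where oldFormat =
            withText "UserP" $ \t ->
              pure $
                UserP
                { userIdP = t, -- old data only held one value, reused for both fields
                  userNameP = t
                }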
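
Here is a small, self-contained check of that fallback. The two payloads are made-up examples of what an old (bare string) and a new (object) stored document could look like; the instance is the same as above, written more compactly.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Aeson (FromJSON (..), ToJSON, decode, defaultOptions, genericParseJSON, withText)
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import GHC.Generics (Generic)

data UserP = UserP
  { userIdP :: Text,
    userNameP :: Text
  }
  deriving stock (Eq, Show, Generic)
  deriving anyclass (ToJSON)

instance FromJSON UserP where
  -- New object format first, old bare-string format as a fallback.
  parseJSON x = genericParseJSON defaultOptions x <|> oldFormat x
    where
      oldFormat = withText "UserP" $ \t -> pure (UserP t t)

-- Made-up stored documents in the old and new formats.
oldPayload, newPayload :: ByteString
oldPayload = "\"gautier\""
newPayload = "{\"userIdP\":\"42\",\"userNameP\":\"gautier\"}"

decodedOld, decodedNew :: Maybe UserP
decodedOld = decode oldPayload
decodedNew = decode newPayload
```

decodedOld is Just (UserP "gautier" "gautier") and decodedNew is Just (UserP "42" "gautier"), so documents written before the Gitlab integration stay readable without a migration. Note that the old format yields an id that is really a name, which is the kind of data you cannot deduce.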

Conclusion

While imperfect, this solution is quite effective for its purpose.

However, it does not work when you have data you cannot deduce, or when you have external tools which are bound to a schema (e.g. ElasticSearch). Here, you are forced to reindex your documents.

What is your strategy regarding change?
