Soft Compatibility
Hetchr is a SaaS product that centralizes the key features of many collaborative development tools (Github, Jira, Gitlab, and so on).
After a few weeks of work, we released gitlab.com support last week.
We had previously integrated Github.com, so we expected Gitlab to be roughly the same. Well, expectations are a setup for disappointment. Let's see what was different and how we handled it.
History
Github was the first tool we integrated, meaning we had to do two things:
- Interact with the API to see the data we can get, and which actions we can perform
- Create the Atoms (the widgets) and model the Items
Atoms were initially designed to represent functional views. For example, we model Pull Requests (from Github), Merge Requests (from Gitlab), Patches, and so on, as CodeContribution.
This is a good idea if you plan to capitalize on functional views and when you are about to integrate multiple tools around the same domain.
In terms of code, it looks like this:
data CodeContribution = CodeContribution
  { id :: ItemId,
    author :: User,
    title :: Title,
    description :: Message
  }
  deriving stock (Eq, Show)

newtype ItemId
  = ItemId Text
  deriving stock (Eq, Show)

newtype User
  = User Text
  deriving stock (Eq, Show)

newtype Title
  = Title Text
  deriving stock (Eq, Show)

newtype Message
  = Message Text
  deriving stock (Eq, Show)
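To make the "functional view" idea concrete, here is a minimal sketch of how two tool-specific payloads could be folded into the same CodeContribution. The GithubPullRequest and GitlabMergeRequest records, their fields, and the fromGithub/fromGitlab functions are hypothetical simplifications made up for this illustration; they are not Hetchr's actual models nor the full API payloads.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Prelude hiding (id) -- the record below has a field named `id`

newtype ItemId = ItemId Text deriving stock (Eq, Show)
newtype User = User Text deriving stock (Eq, Show)
newtype Title = Title Text deriving stock (Eq, Show)
newtype Message = Message Text deriving stock (Eq, Show)

data CodeContribution = CodeContribution
  { id :: ItemId,
    author :: User,
    title :: Title,
    description :: Message
  }
  deriving stock (Eq, Show)

-- Hypothetical, heavily simplified tool-specific payloads.
data GithubPullRequest = GithubPullRequest
  { ghNumber :: Int,
    ghLogin :: Text,
    ghTitle :: Text,
    ghBody :: Text
  }

data GitlabMergeRequest = GitlabMergeRequest
  { glIid :: Int,
    glUsername :: Text,
    glTitle :: Text,
    glDescription :: Text
  }

-- Both tools end up in the same functional view.
fromGithub :: GithubPullRequest -> CodeContribution
fromGithub pr =
  CodeContribution
    { id = ItemId (T.pack (show (ghNumber pr))),
      author = User (ghLogin pr),
      title = Title (ghTitle pr),
      description = Message (ghBody pr)
    }

fromGitlab :: GitlabMergeRequest -> CodeContribution
fromGitlab mr =
  CodeContribution
    { id = ItemId (T.pack (show (glIid mr))),
      author = User (glUsername mr),
      title = Title (glTitle mr),
      description = Message (glDescription mr)
    }

-- Example value, only here for experimenting in GHCi.
examplePr :: GithubPullRequest
examplePr = GithubPullRequest 7 "gautier" "Fix typo" "Fixes a typo in the README"
```

Loaded in GHCi, `fromGithub examplePr` gives a CodeContribution that downstream code can treat exactly like one built from a Gitlab merge request.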
Concerns and Limitations
We actually use these types in different places:
- To send updates of these Items through our streaming infrastructure (currently AWS Kinesis, but we're migrating to Apache Pulsar)
- To store a reference of the current version of Items
- To store and query ElasticSearch (to display Atoms)
- To display it through the API
All of these rely on JSON representation (provided by aeson).
Having said that, we have only four options for handling these representations:
- Having a single instance of FromJSON/ToJSON, which would bind everything to the frontend representation
- Having orphan instances, which could lead to struggles with imports or to errors not visible at compile-time
- Having a Phantom type and defining instances on it
- Having dedicated types and defining instances on them
#1 is not a realistic solution: we do not expose all the data we have.
#2 and #3 have the same problem: some changes cannot be prevented at compile-time.
Let's say we change our Title to a stricter version:
newtype Title
  = Title NonEmptyText
  deriving stock (Eq, Show)
  deriving newtype (FromJSON, ToJSON)
It will still compile, but it breaks backward compatibility: documents stored with an empty title no longer decode, and without a solid test suite nobody notices before production.
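The failure mode can be reproduced in a few lines. This is only a sketch: NonEmptyText below is a hypothetical stand-in (the article does not show its definition), and TitleV1/TitleV2 are illustrative names for the Title before and after the change.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), eitherDecode, withText)
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for NonEmptyText: parsing rejects the empty string.
newtype NonEmptyText = NonEmptyText Text
  deriving stock (Eq, Show)

instance FromJSON NonEmptyText where
  parseJSON = withText "NonEmptyText" $ \t ->
    if T.null t
      then fail "expected a non-empty text"
      else pure (NonEmptyText t)

-- Title before the change: any text is accepted.
newtype TitleV1 = TitleV1 Text
  deriving stock (Eq, Show)
  deriving newtype (FromJSON)

-- Title after the change: the derived instance silently becomes stricter.
newtype TitleV2 = TitleV2 NonEmptyText
  deriving stock (Eq, Show)
  deriving newtype (FromJSON)

-- A document that was stored while TitleV1 was in use.
storedEmptyTitle :: ByteString
storedEmptyTitle = "\"\""

decodeV1 :: ByteString -> Either String TitleV1
decodeV1 = eitherDecode

decodeV2 :: ByteString -> Either String TitleV2
decodeV2 = eitherDecode
```

Both versions compile, and `decodeV1 storedEmptyTitle` succeeds, while `decodeV2 storedEmptyTitle` returns a `Left`: the type checker has no way to know that old data is still lying around.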
Going Back-and-Forth
So we went with #4: for each target type, we have a mirrored type per concern. Here is the one for persistence:
newtype TitleP
  = TitleP Text
  deriving stock (Eq, Show)
  deriving newtype (FromJSON, ToJSON)
This is the Data Transfer Object (DTO) pattern from the OOP world.
Then we need to go back and forth between the two types, which is the role of Isomorphism:
data Isomorphism a b = Isomorphism
  { rightWay :: a -> b,
    leftWay :: b -> a
  }
It's a concept borrowed from Category Theory, and you can find a different version of it in lens. To go further, have a look at Dmitrii Kovanikov's talk at haskell.love 2021.
This leads to a simple implementation:
titlePIso :: Isomorphism Title TitleP
titlePIso = Isomorphism right left
  where
    right (Title x) = TitleP x
    left (TitleP x) = Title x
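As a usage sketch, the isomorphism can be exercised with a round-trip check, which is the property that makes it an isomorphism in the first place. The helpers `inverse` and `roundTrips` are hypothetical names introduced for this example; they do not come from the original code base.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Domain type and its persistence mirror, as in the article
-- (JSON instances left out to keep the sketch dependency-free).
newtype Title = Title Text
  deriving stock (Eq, Show)

newtype TitleP = TitleP Text
  deriving stock (Eq, Show)

data Isomorphism a b = Isomorphism
  { rightWay :: a -> b,
    leftWay :: b -> a
  }

titlePIso :: Isomorphism Title TitleP
titlePIso = Isomorphism right left
  where
    right (Title x) = TitleP x
    left (TitleP x) = Title x

-- Hypothetical helper: swap the two directions.
inverse :: Isomorphism a b -> Isomorphism b a
inverse (Isomorphism f g) = Isomorphism g f

-- Hypothetical helper: going right and then left must give back the input.
roundTrips :: Eq a => Isomorphism a b -> a -> Bool
roundTrips iso x = leftWay iso (rightWay iso x) == x

-- Example values for experimenting in GHCi.
sampleTitle :: Title
sampleTitle = Title "Add Gitlab support"

sampleTitleP :: TitleP
sampleTitleP = rightWay titlePIso sampleTitle
```

A property test that calls `roundTrips titlePIso` on generated titles (and `roundTrips (inverse titlePIso)` on generated TitleP values) is a cheap way to catch a conversion that loses information.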
Dealing with Change: Integrating Gitlab
There are multiple differences, but we will focus on these:
- Github works by call-by-value: resources are addressed by their name (e.g. gautier for a user, myrepo for a repository)
- Gitlab works by call-by-reference: resources are addressed by a numeric identifier (e.g. 42 for a user, 123 for a project)
So User and UserP will have to evolve and deal with the change:
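data UserP = UserP
  { userIdP :: Text,
    userNameP :: Text
  }
  -- Generic is required by both the ToJSON default and genericParseJSON
  deriving stock (Eq, Show, Generic)
  deriving anyclass (ToJSON)

instance FromJSON UserP where
  -- Try the new object format first, then fall back to the old plain-text one
  parseJSON x = genericParseJSON defaultOptions x <|> oldFormat x
    where
      -- Old documents only stored the name, which was also the identifier
      oldFormat =
        withText "UserP" $ \t ->
          pure $
            UserP
              { userIdP = t,
                userNameP = t
              }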
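Here is a self-contained sketch of that fallback in action, decoding one document in each format. The two sample documents and the `decodeUserP` helper are made up for the illustration; the instance itself is the one above, with Generic derived so that it compiles on its own.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Aeson (FromJSON (..), ToJSON, defaultOptions, eitherDecode, genericParseJSON, withText)
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import GHC.Generics (Generic)

data UserP = UserP
  { userIdP :: Text,
    userNameP :: Text
  }
  deriving stock (Eq, Show, Generic)
  deriving anyclass (ToJSON)

instance FromJSON UserP where
  -- New object format first, old plain-string format as a fallback.
  parseJSON x = genericParseJSON defaultOptions x <|> oldFormat x
    where
      oldFormat =
        withText "UserP" $ \t ->
          pure UserP {userIdP = t, userNameP = t}

-- Hypothetical helper, only here to pin down the result type.
decodeUserP :: ByteString -> Either String UserP
decodeUserP = eitherDecode

-- Made-up document as it would have been stored before the Gitlab integration.
legacyDoc :: ByteString
legacyDoc = "\"gautier\""

-- Made-up document in the new format.
currentDoc :: ByteString
currentDoc = "{\"userIdP\":\"42\",\"userNameP\":\"gautier\"}"
```

`decodeUserP legacyDoc` yields a UserP whose identifier and name are both "gautier", while `decodeUserP currentDoc` keeps "42" and "gautier" apart. Anything that is neither an object with both fields nor a string is still rejected.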
Conclusion
While being an imperfect solution, it is quite efficient for its purpose.
However, it does not work when the new format needs data you cannot deduce from the old one, or when external tools are bound to a schema (e.g. ElasticSearch). In those cases, you are forced to reindex your documents.
What is your strategy regarding change?