Deriving Vinyl Representation from Plain Haskell Records
Jun 25, 2018
12 minute read

Note: The complete code associated with this post can be seen in this repository on the generic-vinyl branch.

Background

In the last post where we were trying to access Postgres from a dataframe, we had to manually declare an instance of the FromBeam class for the record at hand (i.e. UserT). With this manually declared instance we were able to go from the plain record to its vinyl representation.

But can we do better ? For instance, can we just say something like deriveVinyl ''UserT and have a typeclass instance generated for us automatically? More importantly, how can we make this instance generation work for arbitrary plain Haskell records (i.e. make it generic) ? These are some of the questions tackled in this blog post.

What we want to do

We want to convert from plain Haskell records to the corresponding vinyl representation. While looking up ways to do this generically, I stumbled across this comment on StackOverflow:

"...At least since vinyl-0.5, their Rec type is structurally isomorphic to NP in generics-sop..." - kosmikus

The above comment gave me quite a hint. I had to dig deeper. Looking at vinyl’s Rec:

data Rec :: (u -> *) -> [u] -> * where
  RNil :: Rec f '[]
  (:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)

And next at the NP (stands for N-ary Product) data type from generics-sop:

data NP :: (k -> *) -> [k] -> * where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)

As can now be seen, both the data declarations are parameterised over a functor (say f) and a type-level list (labels, in case of Rec); the type of values at any position, say for r :: u is given by f r :: *. Note that u and k are equivalent. Moreover, given an N-ary Product, by recursing over its structure and replacing its constructors appropriately with those of Rec, it should be possible to get it into a Rec representation; the opposite is also possible.

There is the records-sop library that “provides utilities for working with labelled single-constructor record types via generics-sop”. But it doesn’t work with vinyl records (yet!), which is what we are interested in here. Nevertheless, it is certainly prior work in the same direction as that pursued in this blog post.

So the transformation we are now interested in is:

Plain-records ~~~> Vinyl Representation   (Overall goal)

And we know that:

NP Representation ~ Vinyl Representation

Therefore, the following transformation is certainly worth exploring:

Plain-records ---> NP Representation ===> Vinyl Representation (Revised goal)

Note that all the funny looking arrows are an informal, made-up notation of sorts, different from any formal usage of “arrows”. Arrows as used above represent a conversion from one representation to another, and are used in a very informal sense. The second step of the transformation in the ‘Revised goal’ (i.e. “===>”) has a different kind of arrow. This is only to highlight (as we shall see) a different “sort” of transformation from NP to Rec in the final implementation, which is in addition to the isomorphism. We shall revisit this point again.

The next section delves into the essentials of the generics-sop library, which we shall be using to get an “intermediate” representation of plain-records.

A Short Introduction to generics-sop

Note: This section has been adapted from “Applying Type-level and Generic Programming in Haskell” by Andres Löh, who is one of the co-authors of the library. You should read the pdf if you want a deeper dive into generics-sop and the type-level machinery it leverages.

Moving on; assume there is a typeclass:

class Generic a where 
  type Rep a
  from :: a -> Rep a 
  to :: Rep a -> a

The core of what generics-sop enables is that given a type a that is representable as Rep a, it is possible to write functions that would work for all such representable data types.

More specifically, if all Rep a have a commmon structure then it becomes possible to define a generic function:

geq :: Generic a => Rep a -> Rep a -> Bool

We can now implement the following function that works for all representable types:

eq :: Generic a => a -> a -> Bool 
eq x y = geq (from x) (from y)

To use a (rough) visual metaphor, a large number of data types can be “unfurled” into a structurally similar representation. Once in this new representation, many sorts of generic functions can be written over this representation. This representation is called ‘Sum of Products’ (the ‘sop’ in generics-sop; see pdf for more details). And it is also possible to “pack-up” the representation back into the original structure.

The generics-sop library provides two approaches for “unfurling”/“furling” a to Rep a/vice-versa (i.e. comimg up with the from and to functions):

A. Using GHC.Generics

B. Using TemplateHaskell

According to the authors of generics-sop: using option A or B is “…mainly a matter of personal taste. The version based on Template Haskell probably has less run-time overhead.” Now for us, since the Rep is an intermediate step, we would benefit from the transparency afforded to us by the TemplateHaskell approach (by using some compiler options to inspect the generated code). So ‘option B’ it is.

Next, we try to inspect the output of trying to auto-derive a Generic instance for the UserT record. We setup the requisite compiler flags (‘-ddump-splices’, ‘-ddump-to-file’) in the package.yaml file. After adding the requisite language extensions/imports (see code repo for details), all we need to do is:

-- TestDeriveVinyl.hs

deriveGeneric ''UserT

The above snippet when compiled, brings into scope the following code (along with some other code that we’re not interested in). This can be seen in the TestDeriveVinyl.dump-splices file (note that some of the type variables have been made to look better (“sugared”) than their original generated form; also certain names have been un-qualified; if you were to compile the repo, then the code is sure to look slightly different):

gsoc-blog-code/src/TestDeriveVinyl.hs:23:1-21: Splicing declarations
    deriveGeneric ''UserT
  ======>
    instance Generic (UserT f) where
      type Code (UserT f) = '['[Columnar f Text,
                                     Columnar f Text,
                                     Columnar f Text,
                                     Columnar f Bool,
                                     Columnar f Int]]
      from (User x1 x2 x3 x4 x5)
        = SOP
            (Z ((I x1)
                  :*
                    ((I x2)
                       :*
                         ((I x3) :* ((I x4) :* ((I x5) :* Nil))))))
      to
        (SOP (Z ((I x_1)
                  :*
                    ((I x_2)
                       :*
                         ((I x_3) :* ((I x_4) :* ((I x_5) :* Nil)))))))
        = ((((User x_1) x_2) x_3) x_4) x_5
      to _ = error "unreachable"

As can be observed, the from function unfurls the record values into the SOP representation. And the to function gets us back to a User record. The Code type synonym stands for the field types, parametrised over an interpretation functor.

We have our intermediate representation. Next we try to take this to the vinyl representation.

GenericVinyl FTW!

Consider the following type family and type-class declaration:

-- Vinylize.hs
type family ZipTypes (ns :: [Symbol])  (ys :: [*]) = (zs :: [k]) | zs -> ns ys
type instance ZipTypes '[] '[] = '[]
type instance ZipTypes (n ': ns) (y ': ys)  =  ( n  :-> y) ': (ZipTypes ns  ys)

class GenericVinyl a names rs | a -> names rs where
  type FieldNames a :: [Symbol]
  createRecId :: a  -> Rec VF.Identity (ZipTypes names rs)

The type family ZipTypes takes two type variable lists (one for column-names and one for column types) and returns the column types of the vinyl record, by zipping the correspinding elements using the :-> type constructor imported from Frames.Col. Note that we add the type family dependency zs -> ns ys to state that this type family is injective, i.e. ZipTypes a1 b1 ~ ZipTypes a2 b2 implies that (a1, b1) ~ (a2, b2). Next we declare the GenericVinyl typeclass, that is similar to the FromBeam typeclass from the previous post, with the addition of the FieldNames a declaration and createRecId return type making use of the ZipTypes type family.

Before we write the GenericVinyl instance, we need to extract a type-level list of record field names. Consider the following code:

{-# LANGUAGE TemplateHaskell #-}
module Helpers where

import           Language.Haskell.TH
import           Language.Haskell.TH.Syntax

fNamesTypeLevel :: Name -> Q Type
fNamesTypeLevel name = do
  fnames <- fmap getRecordFields $ reify name
  fnames' <- fnames
  foldr (\x xs -> appT (appT promotedConsT x) xs) promotedNilT $ map (litT . strTyLit) fnames'


getRecordFields :: Info -> Q [String]
getRecordFields (TyConI (DataD _ _ _ _ cons _)) = return $ concatMap getRF cons
getRecordFields _                               = return []

getRF :: Con -> [String]
getRF (RecC _name fields) = map getFieldInfo fields
getRF _                   = []

getFieldInfo :: (Name, Strict, Type) -> String
getFieldInfo (name, _, AppT (AppT (ConT _) (VarT f)) (ConT ty)) = (nameBase name)

The function fNamesTypeLevel takes a Name, i.e. an in-scope record in our case and returns a type-level list of its field-names.

Next we try to write a GenericVinyl typeclass instance for plain records of the sort used by beam. Consider the following code:

{-# LANGUAGE DataKinds              #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses  #-}
{-# LANGUAGE PolyKinds              #-}
{-# LANGUAGE TemplateHaskell        #-}
{-# LANGUAGE TypeFamilies           #-}
{-# LANGUAGE TypeFamilyDependencies #-}
{-# LANGUAGE TypeOperators          #-}
module Vinylize where

import           Data.Proxy
import           Data.Vinyl
import qualified Data.Vinyl.Functor         as VF
import qualified Database.Beam              as B
import           Frames.Col
import           Generics.SOP
import qualified Generics.SOP.NP            as GSN
import           GHC.TypeLits
import           Helpers                    (fNamesTypeLevel)
import           Language.Haskell.TH
import           Language.Haskell.TH.Syntax

-- ZipTypes and GenericVinyl class declaration elided here

deriveVinyl :: Name -> DecsQ
deriveVinyl name = entireInstance
  where
    n = conT name
    typeList1 = fNamesTypeLevel name
    entireInstance=
      [d|
        instance (((Code ($(n) B.Identity)) ~ '[rs]),
          (ns3 ~ FieldNames ($(n) B.Identity)) )
          => GenericVinyl ($(n) B.Identity) ns3 rs where
          type FieldNames ($(n) B.Identity) = $(typeList1)
          createRecId r = (go tranformedNP)
            where
              SOP (Z prod) = from r
              tranformedNP = (((GSN.trans_NP (Proxy :: Proxy (LiftedCoercible I VF.Identity)) (\(I x) -> VF.Identity $ coerce x) prod)) )
              go = GSN.cata_NP RNil (:&)
         |]

The deriveVinyl function takes a record name as input and generates a GenericVinyl instance declaration for this record. On the whole, trans_NP transforms the NP from one interpretation functor to another, (safely) coercing the field values to the appropriate column types at the same time. In order to have a coercion constraint lifted over the interpretation functor, we use the Proxy (LiftedCoercible I VF.Identity) argument. Specifically, trans_NP :: AllZip c xs ys => proxy c -> (forall x y. c x y => f x -> g y) -> NP f xs -> NP g ys, so the proxy argument provides a partially applied class constraint, that upon being fully applied yeilds a constraint LiftedCoercible I VF.Identity x y, which happens to be equivalent to Coercible (I x) (VF.Identity y) (Coercible class comes from Data.Coerce; LiftedCoercible is declared in Generics.SOP.Constraint). In my understanding, this is necessary as the compiler usually infers on its own Coercible x y when x and y have the same representation; but here we need to lift the Coercible constraint over the respective interpretation functors.

Of particular note is the functiongo, which effectively replaces the NP constructors with constructors from vinyl (GSN.cata_NP is a generalization of foldr; we pass it RNil as the neutral element and (:&) combines consecutive elements). The overall effect of createRecId is therefore not just an isomorphism, but some additional things happening as well (most notably the coercing). This was the point that I had flagged earlier as one to be revisited, when we were discussing the informal arrow diagrams.

Testing it all

Consider the following code:

{-# LANGUAGE DataKinds              #-}
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses  #-}
{-# LANGUAGE OverloadedStrings      #-}
{-# LANGUAGE TemplateHaskell        #-}
{-# LANGUAGE TypeFamilies           #-}
{-# LANGUAGE TypeFamilyDependencies #-}
{-# LANGUAGE UndecidableInstances   #-}
module TestDeriveVinyl where

import           Data.Coerce
import           Data.Vinyl
import qualified Data.Vinyl.Functor         as VF
import qualified Database.Beam              as B
import           Database.PostgreSQL.Simple
import           Frames
import           Generics.SOP
import           Generics.SOP.TH
import           PostgresAccess             (User, UserT, selectAllUsers)
import           Vinylize                   (createRecId, deriveVinyl)

deriveGeneric ''UserT

deriveVinyl ''UserT

Compiling and checking for the generated code, reveals that the following instance of GenericVinyl for UserT has been generated (again, the type variables below have been sugared and names un-qualified in places):

gsoc-blog-code/src/TestDeriveVinyl.hs:25:1-19: Splicing declarations
    deriveVinyl ''UserT
  ======>
    instance (Code (UserT B.Identity) ~ '[rs],
              ns3 ~ FieldNames (UserT B.Identity)) =>
             GenericVinyl (UserT B.Identity) ns3 rs where
      type FieldNames (UserT B.Identity) = '["_userEmail",
                                             "_userFirstName",
                                             "_userLastName",
                                             "_userIsMember",
                                             "_userDaysInQueue"]
      createRecId r
        = go tranformedNP
        where
            SOP (Z prod) = from r
            tranformedNP
              = ((Generics.SOP.NP.trans_NP
                    (Proxy :: Proxy (LiftedCoercible I VF.Identity)))
                   (\ I x -> (VF.Identity $ (coerce x))))
                  prod
            go = (Generics.SOP.NP.cata_NP RNil) (Data.Vinyl.:&)

Next, we try to test this instance with the following test code (which is similar to what we did from ghci in the last post):

-- TestDeriveVinyl
test :: IO ()
test = do
  conn <- connectPostgreSQL "host=localhost dbname=shoppingcart1"
  us <- selectAllUsers conn
  mapM_ print $ toFrame $ map createRecId us

And then assuming you still have the test data from the last post in your local Postgres instance, upon going to ghci:

ghci>test
SELECT "t0"."email" AS "res0", "t0"."first_name" AS "res1", "t0"."last_name" AS "res2", "t0"."is_member" AS "res3", "t0"."days_in_queue" AS"res4" FROM "cart_users" AS "t0"
{_userEmail :-> "james@example.com", _userFirstName :-> "James", _userLastName :-> "Smith", _userIsMember :-> True, _userDaysInQueue :-> 1}
-- some output elided

We can confirm the following as well:

ghci>:set -XOverloadedStrings
ghci>conn <- connectPostgreSQL "host=localhost dbname=shoppingcart1"
ghci>us <- PostgresAccess.selectAllUsers conn
SELECT "t0"."email" AS "res0", "t0"."first_name" AS "res1", "t0"."last_name" AS "res2", "t0"."is_member" AS "res3", "t0"."days_in_queue" AS"res4" FROM "cart_users" AS "t0"
ghci>:t us
us :: [User]
-- type User = UserT B.Identity
ghci>:t (map createRecId us)
(map createRecId us)
  :: [Rec
        VF.Identity
        '["_userEmail" :-> Text, "_userFirstName" :-> Text,
          "_userLastName" :-> Text, "_userIsMember" :-> Bool,
          "_userDaysInQueue" :-> Int]]

It would be safe at this point to say: it works!

Conclusion

In this post we saw a (somewhat experimental) approach of converting plain records to their vinyl representation. We did this in order to convert our list of plain records (representing beam query results) to a dataframe with minimal boilerplate at the end-user side (compare the module TestDeriveVinyl to our older approach from the last post in module PostgresFrame).

Thanks

Thanks to Marco Zocca (@ocramz), my GSoC mentor, for reviewing an earlier version of this blog post, and providing valuable feedback/guidance over the course of this project.