Accessing Postgres in a Dataframe in Haskell: Early Steps, Part 2
Jun 9, 2018
9 minute read


This is part 2 in a 2-part series. Click here to go to part 1. The complete code associated with parts 1 and 2 can be seen here.

In the previous part of this series we saw some examples of usage of Frames, along with some explanation of how it works under the hood. In this part we take a look at the Postgres access code using beam, and then we finally tie things together by actually converting a ‘beam record’ (place holder for “a regular haskell record used by beam to describe tables”) to a vinyl record, and from a list of vinyls to an actual dataframe.

A Short Introduction to Beam

Haskell has many libraries for Postgres/other SQL database access/usage. To get some idea about how beam compares with library $x, I would refer you to beam’s FAQ page. Now onward to a mini-tutorial of how to write some data to beam and read it back.

Consider the following:

{-# LANGUAGE DeriveGeneric         #-}
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE ImpredicativeTypes    #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings     #-}
{-# LANGUAGE PartialTypeSignatures #-}
{-# LANGUAGE StandaloneDeriving    #-}
{-# LANGUAGE TypeFamilies          #-}
module PostgresAccess where

import           Data.Text                                (Text)
import           Database.Beam                            as B
import           Database.Beam.Backend.SQL.BeamExtensions
import           Database.Beam.Postgres
import           Database.Beam.Postgres.Syntax
import           Database.PostgreSQL.Simple
import           Lens.Micro

-- A
data UserT f = User
  { _userEmail       :: Columnar f Text
  , _userFirstName   :: Columnar f Text
  , _userLastName    :: Columnar f Text
  , _userIsMember    :: Columnar f Bool
  , _userDaysInQueue :: Columnar f Int
  } deriving (Generic)

-- B
type User = UserT Identity
type UserId = PrimaryKey UserT Identity

deriving instance Show User

instance Beamable UserT
instance Beamable (PrimaryKey UserT)

instance Table UserT where
  data PrimaryKey UserT f = UserId (Columnar f Text) deriving Generic
  primaryKey = UserId . _userEmail

-- C
data ShoppingCartDb f = ShoppingCartDb
                      { _shoppingCartUsers :: f (TableEntity UserT) }
                        deriving Generic

instance Database be ShoppingCartDb

-- D
ShoppingCartDb (TableLens shoppingCartUsers) = dbLenses

User (LensFor userEmail)    (LensFor userFirstName)
     (LensFor userLastName) (LensFor userIsMember)
     (LensFor userDaysInQueue) = tableLenses

-- E
shoppingCartDb :: DatabaseSettings be ShoppingCartDb
shoppingCartDb = defaultDbSettings

-- F
allUsers :: Q PgSelectSyntax ShoppingCartDb s (UserT (QExpr PgExpressionSyntax s))
allUsers = all_ (shoppingCartDb ^. shoppingCartUsers)

insertUsers :: Connection -> IO [User]
insertUsers conn =
  runBeamPostgresDebug putStrLn conn $ runInsertReturningList (shoppingCartDb ^. shoppingCartUsers) $
      insertValues users

users =  [ User "" "James" "Smith" True 1 {- james -}
                 , User "" "Betty" "Jones" False 42 {- betty -}
                 , User "" "James" "Pallo" True 1 {- james -}
                 , User "" "Betty" "Sims" False 42 {- betty -}
                 , User "" "James" "O'Reily" True 1 {- james -}
                 , User "" "Sam" "Sophitz" False 42 {- sam -}
                 , User "" "Sam" "Jely" True 1 {- sam -}
                 , User "" "Sam" "Taylor" False 42 {- sam -}

selectAllUsers :: Connection -> IO [User]
selectAllUsers conn =
  runBeamPostgresDebug putStrLn conn $ do
    users <- runSelectReturningList $ select allUsers
    (liftIO . return) users

Here’s how I would unpack all that is happening in the above:

A. The UserT record declaration (we’ll call it a ‘beam record’ from now on) is what maps onto the underlying database table. Fields in the UserT beam record are of the type Columnar f a; here Columnar is a type family such that when f is the Identity functor, then Columnar f a = a.

B. Next we declare some type synonyms for the table and its primary key. We also do ‘StandAloneDeriving’ of several typeclass instances needed by beam(hence we also use the language extension of the same name). The implementation of these instances is filled in by the compiler at compile time (we need the ‘DeriveGeneric’ extension for this). We also state that _userEmail is our primary key.

C. Here we have the data declaration corresponding to the database (ShoppingCartDb) and the table/tables inside the database (_shoppingCartUsers).

D. These are the declarations necessary for using lenses for accessing the table(s) in the database and fields in a particular table (the UserT table in this case).

E. defaultSettings implies, amongst other things, that our naming at the DB level and at the program-level is inline with the default naming that beam expects/handles automatically. In case there is any difference between the expectation and the implementation, we can specify that here.

F. The three declarations here, are used for inserting some mock data and reading back that data from the database.

This is all the essential code we need to access data written to Postgres. In case you’d like to learn more about beam, I would refer you to the excellent official tutorial.

Tying things together

Now let’s say you’d like to access the data written to Postgres above from a dataframe, in order interpret it (say in the context of some other CSV(s) also read into a dataframe(s)). In more concrete terms, you have code now that returns a list of User beam-records, and you want to convert this list to a dataframe. A quick glance at the documentation reveals that Frames.InCore has the following function:

toFrame :: (Foldable f, RecVec rs) => f (Record rs) -> Frame (Record rs)

This function can take in a foldable (e.g. a list) of Records. Record is a type synonym for Rec Identity taken from vinyl. Therefore if we can figure out a way to convert our beam-record to a vinyl record with the column names promoted to the type level (the rs), then our job is mostly done.

Consider the following code:

{-# LANGUAGE DataKinds              #-}
{-# LANGUAGE FlexibleContexts       #-}
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses  #-}
{-# LANGUAGE TypeOperators          #-}
{-# LANGUAGE TypeSynonymInstances   #-}
module PostgresFrame where

import           Data.Text          (Text, unpack)
import           Data.Vinyl
import qualified Data.Vinyl.Functor as VF
import           Frames.Col
import           Frames.InCore
import           Frames.Rec
import qualified Lens.Micro         as LM
import           PostgresAccess     (User, userDaysInQueue, userEmail,
                                     userFirstName, userIsMember, userLastName)

-- A
type SEmail = "email" :-> String
type SFirstName = "first_name" :-> String
type SLastName = "last_name" :-> String
type SIsMember = "is_member" :-> Bool
type SDaysInQueue = "days_in_queue" :-> Int

-- B
class FromBeam t rs | t -> rs where
  createRecId :: t -> Rec VF.Identity rs

-- C
instance FromBeam User '[SEmail, SFirstName, SLastName, SIsMember, SDaysInQueue] where
  createRecId u =
      VF.Identity (Col $ unpack $ u LM.^. userEmail) :&
      VF.Identity (Col $ unpack $  u LM.^. userFirstName) :&
      VF.Identity (Col $ unpack $ u LM.^. userLastName) :&
      VF.Identity (Col $ u LM.^. userIsMember) :&
      VF.Identity (Col $ u LM.^. userDaysInQueue) :&

Here’s what the above does:

A. Here we declare the column types using the :-> constructor imported from Frames.Col. TypeOperators extension helps here by allowing us to use the :-> as infix notation at the type-level. Also since DataKinds extension is in effect, we are able to promote the column names from the term level (i.e. String) to the type-level.

B. Here we declare the FromBeam class that has two parameters (t and rs), thus requiring the MultiParamTypeClasses extension. The only function under this typeclass is createRecId that takes something of type t (in our case the beam-record t) and returns a vinyl record. In fact given a t (say User), we are able to uniquely write the typeclass instance for it’s underlying set of column types (rs). We formalize this fact by making use of the FunctionalDependencies extension and add the | t -> rs annotation to the class declaration. (We could probably further constrain the type variable t to have certain properties, but I haven’t gotten around to thinking about that yet.)

C. This is where we implement an instance of the FromBeam typeclass for the User beam-record. Here we are building a heterogenous list (i.e. a list with values of different types) of column values to get a vinyl record representing a row in our dataframe. Specifically, :& is like list cons (i.e. :) and RNil is like the empty list ([]). We instantiate every column with the Col data constructor (it is the data constructor for the :-> type constructor) that returns column value of the corresponding type. The column types are inferred from using SEmail, SFirstName, SLastName, SIsMember, SDaysInQueue, in order.

Next we create a database in psql named shoppingcart1. And create a table in this DB named cart_users using the SQL from the sql/cart_users.sql file.

Once you’ve done the above, all that remains is to build the project, and fire up ghci:

$ stack build
-- output elided
$ stack ghci
-- some output elided
Ok, five modules loaded.
Collecting type info for 5 module(s) ...
-- some output elided
ghci>:set -XOverloadedStrings
ghci>import Database.PostgreSQL.Simple
ghci>conn <- connectPostgreSQL "host=localhost dbname=shoppingcart1"
ghci>insertUsers conn
INSERT INTO "cart_users"("email", "first_name", "last_name", "is_member", "days_in_queue") VALUES ('', 'James', 'Smith', true, 1), ('', 'Betty', 'Jones', false, 42), ('', 'James', 'Pallo', true, 1), ('', 'Betty', 'Sims', false, 42), ('', 'James', 'O''Reily', true, 1), ('', 'Sam', 'Sophitz', false, 42), ('', 'Sam', 'Jely', true, 1), ('', 'Sam', 'Taylor', false, 42)  RETURNING "email", "first_name", "last_name", "is_member", "days_in_queue"
[User {...},...] -- some output elided
ghci>us <- selectAllUsers conn
SELECT "t0"."email" AS "res0", "t0"."first_name" AS "res1", "t0"."last_name" AS "res2", "t0"."is_member" AS "res3", "t0"."days_in_queue" AS "res4" FROM "cart_users" AS "t0"
ghci>let recId = map createRecId us
ghci>import Frames
ghci>let userFrame = toFrame recId
ghci>:t userFrame
  :: Frame
       (Record '[SEmail, SFirstName, SLastName, SIsMember, SDaysInQueue])
ghci>mapM_ print userFrame
{email :-> "", first_name :-> "James", last_name :-> "Smith", is_member :-> True, days_in_queue :-> 1}
{email :-> "", first_name :-> "Betty", last_name :-> "Jones", is_member :-> False, days_in_queue :-> 42}
{email :-> "", first_name :-> "James", last_name :-> "Pallo", is_member :-> True, days_in_queue :-> 1}
{email :-> "", first_name :-> "Betty", last_name :-> "Sims", is_member :-> False, days_in_queue :-> 42}
{email :-> "", first_name :-> "James", last_name :-> "O'Reily", is_member :-> True, days_in_queue :-> 1}
{email :-> "", first_name :-> "Sam", last_name :-> "Sophitz", is_member :-> False, days_in_queue :-> 42}
{email :-> "", first_name :-> "Sam", last_name :-> "Jely", is_member :-> True, days_in_queue :-> 1}

And with that we are now reading data read from Postgres into a dataframe.

What Comes Next?

Over the next few weeks the idea is to generalize the code that reads a beam-record to vinyl to account for other cases (such as joins, embedded records etc.). This would be followed by figuring out a way to autogenerate the FromBeam instance for arbitrary beam-records, along with associated type synonyms for column types. This would take out of the picture all sorts of boiler plate (some of which we saw) that the user would have to come up with otherwise.