Definitely Impure IO_Haskell Notes 5

Written at the Beginning

I've always had a question: Haskell claims to be a pure functional language, so how does it handle scenarios that are inherently impure (they necessarily have side effects, or the operation itself is a side effect)?

For example, (pseudo) random numbers, I/O, etc. A pure function that takes no input yet returns a different random number on every call cannot exist, so how should such scenarios be handled?
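One way a generator can stay pure is to make its hidden state explicit: it takes a seed and returns both a value and the next seed. The following is only a minimal sketch of that idea. nextRand, randoms' and the constants are hypothetical illustrations (a toy linear congruential generator), not the API of any random-number library.

```haskell
-- A toy linear congruential generator. The same seed always yields the
-- same (value, nextSeed) pair, so the function is pure.
nextRand :: Int -> (Int, Int)
nextRand seed = (value, newSeed)
  where
    newSeed = (1103515245 * seed + 12345) `mod` 2147483648
    value   = newSeed `mod` 100   -- a number in the range 0..99

-- Produce n values by threading the seed from one call to the next.
randoms' :: Int -> Int -> [Int]
randoms' 0 _    = []
randoms' n seed = let (v, s') = nextRand seed in v : randoms' (n - 1) s'

main :: IO ()
main = print (randoms' 5 42)
```

The only thing that still has to come from outside is the initial seed (for example the current time), and obtaining it is an I/O operation.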

Haskell's approach is somewhat similar to React's componentDidMount() and other component lifecycle functions. React recommends (a convention, not an enforced rule) keeping render() pure and moving operations with side effects into componentDidMount() and other lifecycle hooks. That is, lifecycle hooks separate the pure from the impure. Haskell marks impure operations with the IO type and provides do blocks to combine them, which likewise keeps the impure parts isolated

I. I/O Action

First, look at a function type:

> :t print
print :: Show a => a -> IO ()

The print function accepts a value whose type is an instance of Show and returns an IO (), which is called an I/O Action. IO is itself a type constructor, as follows:

> :k IO
IO :: * -> *
> :k IO ()
IO () :: *
> :i IO
newtype IO a
  = GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld
                  -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
    -- Defined in 'GHC.Types'
instance Monad IO -- Defined in 'GHC.Base'
instance Functor IO -- Defined in 'GHC.Base'
instance Applicative IO -- Defined in 'GHC.Base'
instance Monoid a => Monoid (IO a) -- Defined in 'GHC.Base'

From the kind perspective, IO is similar to Maybe :: * -> *: both accept a concrete type as a parameter and return a concrete type (such as IO ())

P.S. newtype is similar to a data declaration, and its syntax and usage are basically the same, but newtype is stricter: any newtype can be replaced with data and still work, while a data declaration cannot always be replaced with newtype. The specific restriction is:

data can only be replaced with newtype if the type has exactly one constructor with exactly one field inside it.
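A small sketch of that rule follows. The types Meters, Point and Shape are hypothetical examples, not taken from any library.

```haskell
-- One constructor with exactly one field: newtype is allowed.
newtype Meters = Meters Double deriving (Show, Eq)

-- One constructor but two fields: this must be data. Changing the
-- keyword to newtype makes GHC reject the declaration.
data Point = Point Double Double deriving (Show, Eq)

-- Several constructors: also data only.
data Shape = Circle Double | Square Double deriving (Show, Eq)

main :: IO ()
main = do
  print (Meters 3.5)
  print (Point 1 2)
  print (Circle 1)
```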

II. User Input

User input can be obtained through an I/O Action, for example:

main = do
  line <- getLine
  if null line then
    return ()
  else do -- do used to combine actions
    putStrLn line
    main

The above example is a simple echo program. getLine reads one line of input and has type IO String; the <- operator extracts the String and binds it to line. If the line is empty, the program does nothing (return () produces an IO () and main ends); otherwise putStrLn writes the line to standard output followed by a newline, and main is executed again recursively

Here main is the entry point (similar to C). do combines multiple I/O Actions into one, and the result of the combined action is the result of its last action. When the combined action is executed, the I/O Actions inside the do block are executed in order, so do blocks serve 2 purposes:

They can contain multiple statements, but the last one must be an I/O Action
They delimit the impure environment in which I/O Actions can execute

By analogy with JS, combining multiple statements is similar to the comma operator, which returns the value of the last expression. Delimiting the impure environment is similar to an async function: an I/O Action is only executed as part of another I/O Action (usually written as a do block), much as await is only usable inside an async function

P.S. There are actually 3 ways to get an I/O Action executed:

Bind it to main, the entry point
Put it into a do block that is itself executed
Enter it in GHCi and press enter, such as putStrLn "hoho"
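The difference between creating an I/O Action and executing it can be seen in a short sketch (the strings are arbitrary). Putting actions into a list runs nothing; they run only when sequenced into the do block bound to main, and in the order in which they are sequenced.

```haskell
main :: IO ()
main = do
  -- Building the list runs nothing; the actions are just values here.
  let actions = [putStrLn "first", putStrLn "second"]
  putStrLn "start"
  -- An action runs only when it is sequenced into the do block.
  actions !! 1
  actions !! 0
```

This prints start, then second, then first.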

Execution

main can be executed like a normal function in GHCi, for example:

> :l echo
[1 of 1] Compiling Main             ( echo.hs, interpreted )
Ok, modules loaded: Main.
> main
what?
what?

Entering an empty line exits; any other input is echoed back line by line as-is

It can also be compiled into an executable file:

$ ghc --make ./echo.hs
[1 of 1] Compiling Main             ( echo.hs, echo.o )
Linking echo ...
$ ./echo
here
here

III. Control.Monad

The Control.Monad module also provides some functions useful in I/O scenarios. They encapsulate fixed patterns, such as forever $ do and when condition $ do, and simplify some common cases

return

return wraps a value into an I/O Action; it does not jump out of a function. return and <- have opposite effects (like boxing and unboxing):

main = do
  a <- return "hell"
  b <- return "yeah!"
  putStrLn $ a ++ " " ++ b

It has two purposes:

Creating an I/O Action that does nothing, such as the then branch in the echo example
Customizing the result of a do block, for cases where the result of the last I/O Action should not be used directly as the block's result and needs further processing
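A minimal sketch showing that return does not end the block. The values are arbitrary.

```haskell
main :: IO ()
main = do
  return ()                      -- does not exit main
  putStrLn "still running"
  x <- return (1 + 2 :: Int)     -- wrap, then immediately unwrap
  print x
```

Both lines after return () are still executed: the program prints still running and then 3.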

when

when is also a function:

Control.Monad.when :: Applicative f => Bool -> f () -> f ()

It accepts a boolean and an I/O Action (IO is an instance of Applicative). When the boolean is True the result is that I/O Action, otherwise the result is pure () (the same as return () for IO), so it is roughly equivalent to:

when' c io = do
  if c then io
  else return ()

The inferred type of this version is:

when' :: Monad m => Bool -> m () -> m ()

So when used for I/O, the second argument must have type IO (). That looks inconvenient, but it suits conditional output very well, since print and the other output functions all return IO ()
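A short sketch of conditional output with when. warnIfLarge and the threshold of 10 are hypothetical, chosen only for illustration.

```haskell
import Control.Monad (when)

-- Print a warning only for values above a hypothetical threshold of 10.
warnIfLarge :: Int -> IO ()
warnIfLarge n = when (n > 10) $ putStrLn ("large value: " ++ show n)

main :: IO ()
main = mapM_ warnIfLarge [3, 42, 7, 11]
```

Only 42 and 11 produce output; for the other values when yields an action that does nothing.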

sequence

sequence :: (Traversable t, Monad m) => t (m a) -> m (t a)

This type declaration looks relatively complex:

Traversable :: (* -> *) -> Constraint
Monad :: (* -> *) -> Constraint
-- Find two corresponding instances, List and IO
instance Traversable [] -- Defined in 'Data.Traversable'
instance Monad IO -- Defined in 'GHC.Base'

In the I/O list scenario (replace m with IO and t with []), the parameter has type [IO a] and the return value has type IO [a], so it is equivalent to:

sequence' [] = do
  return []
sequence' (x:xs) = do
  v <- x
  others <- (sequence' xs)
  return (v : others)

The effect is to execute every I/O Action in the list in order, collect the results into a list, and wrap that list in IO

P.S. It feels a bit like Promise.all, which accepts a set of promises and returns a new promise carrying the set of results
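Because sequence is written against Traversable and Monad rather than against IO, the same function also works for other instances. A small sketch with IO and Maybe:

```haskell
main :: IO ()
main = do
  -- IO: run each action in order and collect the results.
  xs <- sequence [return 1, return 2, return 3] :: IO [Int]
  print xs
  -- Maybe: the same function, with a different Monad instance.
  print (sequence [Just 1, Just 2 :: Maybe Int])
  print (sequence [Just 1, Nothing :: Maybe Int])
```

This prints [1,2,3], then Just [1,2], then Nothing: with Maybe, a single Nothing makes the whole result Nothing.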

mapM and mapM_

Control.Monad.mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Control.Monad.mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()

In the I/O list scenario, mapM's first parameter is a function from a to IO b and its second parameter is [a]; it returns IO [b], the same result type as sequence. The effect is equivalent to first mapping over [a] to get a list of I/O Actions and then applying sequence, for example:

> mapM (\x -> do return $ x + 1) [1, 2, 2]
[2,3,3]
> mapM print [1, 2, 2]
1
2
2
[(),(),()]

mapM_ is similar, but discards the results and returns IO (). It suits cases like print, where the results of the I/O Actions are of no interest:

> mapM_ print [1, 2, 2]
1
2
2

forM

Control.Monad.forM :: (Traversable t, Monad m) => t a -> (a -> m b) -> m (t b)

The parameter order is the opposite of mapM; the effect is the same:

> forM [1, 2, 2] print
1
2
2
[(),(),()]

It is only a difference in form: if the function is relatively complex, passing it as the second parameter with forM reads more clearly, for example:

import Control.Monad (forM)

main = do
  -- Ask one question per number and collect the answers
  colors <- forM [1,2,3,4] (\a -> do
    putStrLn $ "Which color do you associate with the number " ++ show a ++ "?"
    getLine)
  putStrLn "The colors that you associate with 1, 2, 3 and 4 are: "
  mapM putStrLn colors

P.S. The last line could also use forM (with the parameters swapped), but by convention forM is used where the I/O Action is defined inline (such as generating IO [b] from [a])

forever

Control.Monad.forever :: Applicative f => f a -> f b

In the I/O scenario, it accepts an I/O Action and returns an I/O Action that repeats it forever. So the echo example can be approximately rewritten as:

import Control.Monad (forever)

echo = forever $ do
    line <- getLine
    if null line then
      return ()  -- only ends this iteration, the loop continues
    else
      putStrLn line

In the echo scenario this shows no real advantage (the program can't even exit on an empty line anymore, only through a forced Ctrl+C interrupt), but there is a scenario that suits forever $ do very well:

import Control.Monad
import Data.Char

main = forever $ do
  line <- getLine
  putStrLn $ map toUpper line

That is the text processing (transformation) scenario: when the input ends, getLine fails with an end-of-file error, which also ends forever, for example:

$ ghc --make ./toUpperCase.hs
[1 of 1] Compiling Main             ( toUpperCase.hs, toUpperCase.o )
Linking toUpperCase ...
$ cat ./data/lines.txt
hoho, this is xx.
who's that ?
$ cat ./data/lines.txt | ./toUpperCase
HOHO, THIS IS XX.
WHO'S THAT ?
toUpperCase: <stdin>: hGetLine: end of file

With forever $ do, the file content is converted to uppercase line by line. Going one step further:

$ cat ./data/lines.txt | ./toUpperCase > ./tmp.txt
toUpperCase: <stdin>: hGetLine: end of file
$ cat ./tmp.txt
HOHO, THIS IS XX.
WHO'S THAT ?

The processing results are written to a file, as expected
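The hGetLine: end of file message in the runs above appears because getLine throws when the input runs out. If that message is unwanted, one possible alternative is to check for end of input before each read. The following is a minimal sketch using isEOF from System.IO with explicit recursion instead of forever; the name loop is arbitrary.

```haskell
import Control.Monad (unless)
import Data.Char (toUpper)
import System.IO (isEOF)

main :: IO ()
main = loop
  where
    loop = do
      eof <- isEOF            -- True once stdin has no more data
      unless eof $ do
        line <- getLine
        putStrLn (map toUpper line)
        loop                  -- continue with the next line
```

The trade-off is that the loop is written by hand again, so the convenience of forever is lost.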

IV. System.IO

The getLine and putStrLn used earlier are both functions in the System.IO module. Other commonly used ones include:

-- Output
print :: Show a => a -> IO ()
putChar :: Char -> IO ()
putStr :: String -> IO ()
-- Input
getChar :: IO Char
getLine :: IO String

print outputs any value that can be shown and is equivalent to putStrLn . show. putStr outputs a string without a trailing newline. The difference between the two is:

> print "hoho"
"hoho"
> putStr "hoho"
hoho

P.S. For details of the IO module, see the System.IO documentation

getContents

getContents :: IO String

getContents returns all user input as one string, so toUpperCase can be rewritten like this:

toUpperCase' = do
  contents <- getContents
  putStr $ map toUpper contents

It no longer processes line by line, but takes all the content and converts it at once. Yet after compiling and running it, the input still turns out to be processed line by line:

$ ./toUpperCase
abc
ABC
efd
EFD

This is related to input buffering; for details see "Haskell: How getContents works?"

Lazy I/O

String is a lazy list, and getContents is lazy I/O too: it does not read all the content into memory at once

In the toUpperCase' example, input is read line by line and the uppercase version is output, because the input data is only truly needed at output time. Everything before that is just a promise, fulfilled only when it has to be, similar to a JS Promise:

function toUpperCase() {
  let io;
  let contents = new Promise((resolve, reject) => {
    io = resolve;
  });
  let upperContents = contents
    .then(result => result.toUpperCase());
  putStr(upperContents, io);
}

function putStr(promise, io) {
  promise.then(console.log.bind(console));
  io('line\nby\nline');
}

// test
toUpperCase();

The analogy is quite vivid: getContents, map toUpper and the other operations only create a chain of Promises, and only when putStr needs to output the result does the actual I/O happen, followed by toUpper and the other calculations

interact

interact :: (String -> String) -> IO ()

It accepts a string processing function as its parameter and returns an I/O Action of type IO (). It is very suitable for text processing scenarios, for example:

-- Keep only lines shorter than 3 characters
lessThan3Char = interact (\s -> unlines $ [line | line <- lines s, length line < 3])

Equivalent to:

lessThan3Char' = do
  contents <- getContents
  let filtered = filterShortLines contents
  if null filtered then
    return ()
  else
    putStr filtered
  where
    filterShortLines = \s -> unlines $ [line | line <- lines s, length line < 3]

That looks much more cumbersome. As its name suggests, interact simplifies the most common interaction pattern: take a string as input, process it, and output the result
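A side benefit of interact is that the whole transformation is an ordinary pure function, which can be named and checked on its own. A small sketch, where shortLines is a hypothetical name for the lambda used above:

```haskell
-- Pure part: keep only lines shorter than 3 characters.
shortLines :: String -> String
shortLines s = unlines [line | line <- lines s, length line < 3]

-- Impure part: connect the pure function to stdin and stdout.
main :: IO ()
main = interact shortLines
```

For example, in GHCi shortLines "ab\nabcd\nx\n" evaluates to "ab\nx\n" without any I/O being involved.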

V. File Read/Write

Read a file, display it as-is:

import System.IO

main = do
  handle <- openFile "./data/lines.txt" ReadMode
  contents <- hGetContents handle
  putStr contents
  hClose handle

The form is similar to file read/write in C. The handle is the equivalent of a file pointer: open the file in read-only mode to get the handle, read the content through it, and finally release it. Intuitively, we can try this:

readTwoLines = do
  handle <- openFile "./data/lines.txt" ReadMode
  line1 <- hGetLine handle
  line2 <- hGetLine handle
  putStrLn line1
  putStrLn line2
  hClose handle

Everything works: it reads the first two lines of the file and outputs them, so the position behind the handle does move

P.S. There are many similar hGetxxx/hPutxxx functions, such as hPutStr, hPutStrLn and hGetChar. They behave like the versions without h, just with an extra handle parameter, for example:

hPutStr :: Handle -> String -> IO ()

Looking back at these function types:

openFile :: FilePath -> IOMode -> IO Handle
hGetContents :: Handle -> IO String
hGetLine :: Handle -> IO String
hClose :: Handle -> IO ()

openFile accepts a FilePath and an IOMode and returns an IO Handle. With this Handle, file content can be requested from hGetContents or hGetLine, and finally the resources behind the handle are released through hClose. FilePath is just String (a type alias), and IOMode is an enumeration (4 modes: read-only, write-only, append, read-write):

> :i FilePath
type FilePath = String 	-- Defined in 'GHC.IO'
> :i IOMode
data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
    -- Defined in 'GHC.IO.IOMode'

P.S. A handle can be thought of as a bookmark, with the book being the entire file system; I find this metaphor quite vivid

withFile

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

This looks like another pattern encapsulation, so let's use it to simplify the file reading example above:

readThisFile = withFile "./data/lines.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStr contents
  )

It looks a bit cleaner. More and more of these common functional patterns are showing up, and they do one of two things:

Abstract out common patterns, including type abstractions such as Maybe/Either and pattern abstractions such as forever $ do and interact
Simplify everything outside the key logic: utility functions such as withFile, map and filter strip away boilerplate (routine operations like openFile and hClose) so the code can focus on the key logic

So what withFile does is open the file according to the given path and mode, pass the resulting handle to the file processing function (the 3rd parameter), and finally close the handle:

withFile' path mode f = do
  handle <- openFile path mode
  result <- f handle
  hClose handle
  return result

Note that this shows the important role of return: hClose handle has to run before the result is handed back, so there must be a mechanism for returning a custom value
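One caveat about withFile' above: if f handle throws an exception, hClose is never reached. A possible way to cover that case is bracket from Control.Exception, which runs the release action whether or not the body fails. This is only a sketch; safeWithFile and the file name bracket-demo.txt are hypothetical.

```haskell
import Control.Exception (bracket)
import System.IO

-- Like withFile' above, but hClose also runs if the handler throws.
safeWithFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
safeWithFile path mode = bracket (openFile path mode) hClose

main :: IO ()
main = do
  -- Hypothetical demo file, created in the current directory.
  writeFile "bracket-demo.txt" "first line\nsecond line\n"
  firstLine <- safeWithFile "bracket-demo.txt" ReadMode hGetLine
  putStrLn firstLine
```

bracket takes an acquire action, a release function and a body, in that order, so the acquire/use/release shape of withFile' maps onto it directly.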

readFile

readFile :: FilePath -> IO String

It takes a file path and returns an IO String. The open and close steps are omitted entirely, which makes reading a file very simple:

readThisFile' = do
  contents <- readFile "./data/lines.txt"
  putStr contents

writeFile

writeFile :: FilePath -> String -> IO ()

It takes a file path and the string to write, and returns an IO (). It likewise omits all the steps that deal with a handle:

writeThatFile = do
  writeFile "./data/that.txt" "contents in that file\nanother line\nlast line"

If the file doesn't exist it is created automatically, and if it exists its content is overwritten, which is very convenient. It is equivalent to this more laborious manual version:

writeThatFile' = do
  handle <- openFile "./data/that.txt" WriteMode
  hPutStr handle "contents in that file\nanother line\nlast line"
  hClose handle

appendFile

appendFile :: FilePath -> String -> IO ()

The type is the same as writeFile; it just uses AppendMode internally and appends the content to the end of the file

Other File Operation Functions

-- In the directory given by FilePath, create and open a file named after the String template plus a random string; return a tuple of the temp file's path and its handle
openTempFile :: FilePath -> String -> IO (FilePath, Handle)
-- Defined in System.Directory module, used to delete specified file
removeFile :: FilePath -> IO ()
-- Defined in System.Directory module, used to rename specified file
renameFile :: FilePath -> FilePath -> IO ()

Note that removeFile and renameFile are both defined in the System.Directory module (not in System.IO). Functions for creating, deleting, updating and querying files, permission management and so on all live in System.Directory, such as doesFileExist, getAccessTime and findFile

P.S. For more file operation functions, see the System.Directory documentation
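A short sketch combining a few of these functions. It assumes the directory package is available (it normally ships with GHC); the file names demo-old.txt and demo-new.txt are hypothetical.

```haskell
import System.Directory (doesFileExist, removeFile, renameFile)

main :: IO ()
main = do
  -- Hypothetical file names, created in the current directory.
  writeFile "demo-old.txt" "some text\n"
  renameFile "demo-old.txt" "demo-new.txt"
  oldExists <- doesFileExist "demo-old.txt"
  newExists <- doesFileExist "demo-new.txt"
  print (oldExists, newExists)
  removeFile "demo-new.txt"   -- clean up
```

After the rename only the new name exists, so this prints (False,True).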