Preface
I have long wondered: Haskell claims to be a purely functional language, so how does it handle scenarios that are inherently impure (ones that have side effects, or where the operation itself is the side effect)?
Examples include (pseudo-)random numbers and I/O. A pure function always returns the same result for the same arguments, so a function that returns a different random number on every call cannot be pure. How should such cases be handled?
Haskell's approach is somewhat similar to React's componentDidMount() and the other component lifecycle methods. React recommends (as a convention it doesn't enforce) keeping render() a pure function and moving side-effecting operations into componentDidMount() and other lifecycle methods, so the lifecycle hooks separate the pure from the impure. Haskell draws this line with its type system: side-effecting code has type IO, and do blocks sequence those I/O actions, which keeps the impure parts isolated
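Strictly speaking, a pseudo-random generator can be a pure function if its state (the seed) is passed in and handed back explicitly; what cannot be pure is a call that yields a different number each time for the same arguments. The sketch below assumes a simple linear congruential generator; the constants are a common textbook choice used only for illustration, and nextRand/randoms' are hypothetical names:

```haskell
import Data.Word (Word32)

-- A pure pseudo-random step (hypothetical helper): the same seed always
-- yields the same (value, nextSeed) pair, so there is no hidden state.
-- The LCG constants are an illustrative assumption, not a recommendation.
nextRand :: Word32 -> (Word32, Word32)
nextRand seed = (next, next)
  where
    next = seed * 1664525 + 1013904223  -- Word32 arithmetic wraps mod 2^32

-- Draw n numbers by threading the seed through each step explicitly.
randoms' :: Int -> Word32 -> [Word32]
randoms' n seed
  | n <= 0    = []
  | otherwise = let (x, seed') = nextRand seed
                in x : randoms' (n - 1) seed'

main :: IO ()
main = print (randoms' 3 42)
```

Because the seed is explicit, the same seed always gives the same sequence. Obtaining a seed that actually varies between runs (for example from the clock) is the part that needs IO.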
I. I/O Action
First, look at a function type:
> :t print
print :: Show a => a -> IO ()
print accepts any value whose type is an instance of the Show class and returns a value of type IO (), called an I/O action. IO is itself a type (constructor), as follows:
> :k IO
IO :: * -> *
> :k IO ()
IO () :: *
> :i IO
newtype IO a
= GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld
-> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
-- Defined in 'GHC.Types'
instance Monad IO -- Defined in 'GHC.Base'
instance Functor IO -- Defined in 'GHC.Base'
instance Applicative IO -- Defined in 'GHC.Base'
instance Monoid a => Monoid (IO a) -- Defined in 'GHC.Base'
In terms of kinds, IO resembles Maybe :: * -> *: both take a concrete type parameter and produce a concrete type (such as IO ())
P.S. A newtype declaration looks and is used almost exactly like a data declaration, but newtype is more restrictive: replacing a newtype with data always works, while replacing data with newtype may not. The rule is:
data can only be replaced with newtype if the type has exactly one constructor with exactly one field inside it.
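A small sketch of that rule, with hypothetical types Meters, Point and Shape: only the single-constructor, single-field type can be a newtype.

```haskell
-- One constructor with exactly one field: allowed as a newtype.
newtype Meters = Meters Double deriving (Show, Eq)

-- Two fields: must stay a data declaration
-- (`newtype Point = Point Double Double` is rejected by GHC).
data Point = Point Double Double deriving (Show, Eq)

-- Two constructors: must also be a data declaration.
data Shape = Circle Double | Square Double deriving (Show, Eq)

-- Unwrap, add, rewrap.
addMeters :: Meters -> Meters -> Meters
addMeters (Meters a) (Meters b) = Meters (a + b)

main :: IO ()
main = print (addMeters (Meters 1.5) (Meters 2))
```

A newtype has the same runtime representation as the type it wraps, so the wrapper costs nothing at run time while remaining a distinct type for the type checker.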
II. User Input
User input can be obtained through I/O actions, for example:
main = do
    line <- getLine
    if null line then
        return ()
    else do -- do groups the two actions below into one
        putStrLn line
        main
This is a simple echo program. getLine reads one line of input and returns IO String; <- extracts the String and binds it to the variable line. If the line is empty, return () produces an action that does nothing and the program ends; otherwise putStrLn writes the line to standard output followed by a newline, and main runs again recursively
Here main is the program's entry point (as in C), and do glues several I/O actions together into one, whose result is the result of the last action. The I/O actions inside a do block are executed, in order, when the combined action runs, so a do block has 2 purposes:
- It can hold multiple statements, but the last one must be an I/O action (not a <- binding)
- It delimits an impure context in which I/O actions are executed and their results can be extracted with <-
By analogy with JS, combining multiple statements resembles the comma operator, which evaluates to its last expression. Delimiting an impure context resembles an async function: just as await can only be used inside an async function, the result of an I/O action can only be extracted with <- inside a do block (or at the GHCi prompt)
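A do block is syntactic sugar for chaining actions with >>=. The hypothetical pairUp below is written both ways; since it only assumes a Monad, it works for IO and can also be checked with Maybe:

```haskell
-- With do notation: run ma, then mb, then combine the results.
pairUp :: Monad m => m a -> m b -> m (a, b)
pairUp ma mb = do
  a <- ma
  b <- mb
  return (a, b)

-- The same function desugared by hand: each `x <- action` becomes
-- `action >>= \x -> ...`.
pairUp' :: Monad m => m a -> m b -> m (a, b)
pairUp' ma mb = ma >>= \a -> mb >>= \b -> return (a, b)

main :: IO ()
main = do
  (x, y) <- pairUp' getLine getLine  -- reads two lines, in order
  putStrLn (y ++ " " ++ x)           -- prints them in reverse order
```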
P.S. There are actually 3 ways to get an I/O action executed:
- Bind it to main, the entry point
- Put it inside a do block that is itself executed (ultimately one reachable from main)
- Type it at the GHCi prompt and press Enter, e.g. putStrLn "hoho"
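To see that an I/O action is just a value until one of the above runs it, consider this sketch (the names actions and unused are hypothetical):

```haskell
-- I/O actions are ordinary values: building this list prints nothing.
actions :: [IO ()]
actions = [putStrLn "first", putStrLn "second", putStrLn "third"]

main :: IO ()
main = do
  let unused = putStrLn "never printed"  -- bound but never sequenced, so never run
  actions !! 0                           -- runs only the first action
  actions !! 2                           -- runs only the third action
```

Running it prints first and then third; the second action and unused are never executed.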
Execution
You can run main like an ordinary function in GHCi, for example:
> :l echo
[1 of 1] Compiling Main ( echo.hs, interpreted )
Ok, modules loaded: Main.
> main
what?
what?
Entering an empty line exits; anything else is echoed back line by line as-is
You can also compile it into an executable:
$ ghc --make ./echo.hs
[1 of 1] Compiling Main ( echo.hs, echo.o )
Linking echo ...
$ ./echo
here
here
III. Control.Monad
The Control.Monad module provides further functions that are useful for I/O. They package up recurring patterns, such as forever do (repeat an action endlessly) and when condition do (run an action only if a condition holds), and simplify some common cases
return
return wraps a value into an I/O action; unlike in most languages, it does not exit the function. return and <- do opposite things (a bit like boxing and unboxing):
main = do
    a <- return "hell"
    b <- return "yeah!"
    putStrLn $ a ++ " " ++ b
It has two main uses:
- Creating an I/O action that does nothing, such as the then branch in the echo example
- Setting the result of a do block, when you don't want the last I/O action's result to become the block's result but want to process it further first
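A sketch of the second use, with a hypothetical shoutTwo: the block's result is computed with return rather than taken from the last I/O action. It is written for any Monad, so both IO and Maybe work:

```haskell
import Data.Char (toUpper)

-- Runs `next` twice and builds the block's result with `return`,
-- instead of ending with the result of an action directly.
shoutTwo :: Monad m => m String -> m String
shoutTwo next = do
  a <- next
  b <- next
  return (map toUpper (a ++ " " ++ b))

main :: IO ()
main = do
  s <- shoutTwo getLine  -- reads two lines, e.g. "hell" and "yeah!"
  putStrLn s             -- prints "HELL YEAH!" for that input
```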
when
when is an ordinary function too:
Control.Monad.when :: Applicative f => Bool -> f () -> f ()
It takes a Bool and an I/O action (IO is an instance of Applicative). If the Bool is True, the result is that I/O action; otherwise it is return (). So it is equivalent to:
when' c io = do
    if c then io
         else return ()
Its inferred type is:
when' :: Monad m => Bool -> m () -> m ()
So when used for I/O, the second argument must have type IO (). That looks restrictive, but it fits conditional output very well, since print and the other output functions all return IO ()
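A usage sketch for when, with a hypothetical printEvens; unless, also from Control.Monad, is the negated counterpart:

```haskell
import Control.Monad (forM_, unless, when)

-- Print only the even numbers: `when` runs the action only if the
-- condition holds, otherwise it behaves like `return ()`.
printEvens :: [Int] -> IO ()
printEvens xs = forM_ xs $ \x ->
  when (even x) (print x)

-- The hand-written version from above, for comparison.
when' :: Monad m => Bool -> m () -> m ()
when' c io = if c then io else return ()

main :: IO ()
main = do
  printEvens [1 .. 6]  -- prints 2, 4 and 6 on separate lines
  unless False (putStrLn "unless is the negated form of when")
```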
sequence
sequence :: (Traversable t, Monad m) => t (m a) -> m (t a)
This type signature looks rather complex:
Traversable :: (* -> *) -> Constraint
Monad :: (* -> *) -> Constraint
-- Find two corresponding instances, List and IO
instance Traversable [] -- Defined in 'Data.Traversable'
instance Monad IO -- Defined in 'GHC.Base'
For a list of I/O actions (replace m with IO and t with []), the argument has type [IO a] and the result has type IO [a], so it is equivalent to:
sequence' [] = do
    return []
sequence' (x:xs) = do
    v <- x
    others <- sequence' xs
    return (v : others)
It runs every I/O action in the list, collects their results into a list, and wraps that list in IO
P.S. It feels a bit like Promise.all, which takes a group of promises and returns a new promise carrying all of their results
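The hand-written sequence' above only uses return and <-, so it can be typed for any Monad. The sketch below does that, which also allows checking it against Maybe:

```haskell
-- The hand-written sequence' from above, generalised to any Monad.
sequence' :: Monad m => [m a] -> m [a]
sequence' [] = return []
sequence' (x : xs) = do
  v <- x                 -- run the first action
  others <- sequence' xs -- then the rest, in order
  return (v : others)

main :: IO ()
main = do
  rs <- sequence' [getLine, getLine]  -- reads two lines, in order
  print rs
```

With Maybe, a single Nothing makes the whole result Nothing, much as one rejected promise makes Promise.all reject.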
mapM and mapM_
Control.Monad.mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Control.Monad.mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()
For lists and IO, mapM takes a function from a to IO b as its first argument and [a] as its second, and returns IO [b], the same result type as sequence. It is equivalent to first mapping the function over [a] to get a list of I/O actions, then applying sequence, for example:
> mapM (\x -> do return $ x + 1) [1, 2, 2]
[2,3,3]
> mapM print [1, 2, 2]
1
2
2
[(),(),()]
mapM_ is similar but discards the results and returns IO (). It suits actions such as print, whose results you don't care about:
> mapM_ print [1, 2, 2]
1
2
2
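A sketch of the equivalence described above, using hypothetical names mapM' and mapM_' (plus a helper safeRecip) so they don't clash with the Prelude. The library's mapM_ is implemented differently (it folds over the structure without building a result list), so this only mirrors its behaviour:

```haskell
-- mapM f: map f over the list, then sequence the resulting actions.
mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' f = sequence . map f

-- mapM_ f: the same, but throw the collected results away.
mapM_' :: Monad m => (a -> m b) -> [a] -> m ()
mapM_' f xs = mapM' f xs >> return ()

-- A Maybe-returning step, so the behaviour can be checked without I/O.
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

main :: IO ()
main = do
  mapM_' print [1, 2, 2 :: Int]
  print (mapM' safeRecip [1, 2, 4])
```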
forM
Control.Monad.forM :: (Traversable t, Monad m) => t a -> (a -> m b) -> m (t b)
The argument order is the reverse of mapM's; the effect is the same:
> forM [1, 2, 2] print
1
2
2
[(),(),()]
It's only a difference in form, but when the function passed as the second argument is long, forM reads more clearly, for example:
import Control.Monad

main = do
    colors <- forM [1,2,3,4] (\a -> do
        putStrLn $ "Which color do you associate with the number " ++ show a ++ "?"
        getLine)
    putStrLn "The colors that you associate with 1, 2, 3 and 4 are: "
    mapM putStrLn colors
P.S. The last line could also use forM (with the arguments swapped), but by convention forM is used when the function argument defines an I/O action inline, such as generating IO [b] from [a]
forever
Control.Monad.forever :: Applicative f => f a -> f b
For I/O, it takes an I/O action and returns an I/O action that repeats it forever. So the echo example can be roughly rewritten as:
import Control.Monad (forever)

echo = forever $ do
    line <- getLine
    if null line then
        return ()
    else
        putStrLn line
For echo this brings little benefit (an empty line no longer exits the program; only Ctrl+C stops it), but there is a scenario that suits forever do very well:
import Control.Monad
import Data.Char

main = forever $ do
    line <- getLine
    putStrLn $ map toUpper line
namely text processing (transformation). When the input runs out, getLine fails with an end-of-file error, which also ends the forever loop, for example:
$ ghc --make ./toUpperCase.hs
[1 of 1] Compiling Main ( toUpperCase.hs, toUpperCase.o )
Linking toUpperCase ...
$ cat ./data/lines.txt
hoho, this is xx.
who's that ?
$ cat ./data/lines.txt | ./toUpperCase
HOHO, THIS IS XX.
WHO'S THAT ?
toUpperCase: <stdin>: hGetLine: end of file
forever do converts the file's content to uppercase line by line. Going further:
$ cat ./data/lines.txt | ./toUpperCase > ./tmp.txt
toUpperCase: <stdin>: hGetLine: end of file
$ cat ./tmp.txt
HOHO, THIS IS XX.
WHO'S THAT ?
The processed result is written to the file as expected. The end-of-file message goes to stderr, so it does not end up in tmp.txt
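If the end-of-file error is unwanted, one option is to check isEOF from System.IO before each getLine and stop cleanly. A sketch, with shout as a hypothetical helper:

```haskell
import Control.Monad (unless)
import Data.Char (toUpper)
import System.IO (isEOF)

-- The per-line transformation, kept pure.
shout :: String -> String
shout = map toUpper

-- Like the forever version, but checks for end of input before each
-- getLine, so the loop stops cleanly instead of with an hGetLine error.
main :: IO ()
main = do
  eof <- isEOF
  unless eof $ do
    line <- getLine
    putStrLn (shout line)
    main
```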
IV. System.IO
getLine and putStrLn, used above, are I/O functions from the System.IO module (the Prelude re-exports them). Other commonly used ones include:
-- Output
print :: Show a => a -> IO ()
putChar :: Char -> IO ()
putStr :: String -> IO ()
-- Input
getChar :: IO Char
getLine :: IO String
print outputs any Show-able value and is equivalent to putStrLn . show; putStr outputs a string without a trailing newline. The difference between print and putStr:
> print "hoho"
"hoho"
> putStr "hoho"
hoho
P.S. For detailed information of IO module see System.IO
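The quotes in print "hoho" come from show, which renders a String as a Haskell string literal. A sketch of print written out by hand (print' is a hypothetical name):

```haskell
-- A hand-written print: convert with show, then output with a newline.
print' :: Show a => a -> IO ()
print' = putStrLn . show

main :: IO ()
main = do
  print' "hoho"    -- outputs "hoho" (with quotes)
  putStrLn "hoho"  -- outputs hoho
```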
getContents
getContents :: IO String
getContents returns all of the user's input as a single string, so toUpperCase can be rewritten like this:
import Data.Char (toUpper)

toUpperCase' = do
    contents <- getContents
    putStr $ map toUpper contents
Instead of processing line by line, it takes all of the content and converts it in one go. But if you compile and run it, you'll find it still processes the input line by line:
$ ./toUpperCase
abc
ABC
efd
EFD
This comes from lazy I/O (below) combined with input buffering; for details see "Haskell: How getContents works?"
Lazy I/O
String is itself a lazy list, and getContents performs lazy I/O: it does not read all of the input into memory at once
In the toUpperCase' example, input is read line by line and the uppercase version is output, because the input data is only truly needed when it is output. Everything before that is just a promise, fulfilled only when there is no other choice, similar to a JS Promise:
function toUpperCase() {
    let io;
    let contents = new Promise((resolve, reject) => {
        io = resolve;
    });
    let upperContents = contents
        .then(result => result.toUpperCase());
    putStr(upperContents, io);
}
function putStr(promise, io) {
    promise.then(console.log.bind(console));
    io('line\nby\nline');
}
// test
toUpperCase();
The analogy fits well: getContents, map toUpper and so on only create a chain of Promises. The I/O actually happens, and toUpper and the other computations actually run, only when putStr needs to output the result
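The same laziness can be observed without any I/O. In this sketch (firstUpper is a hypothetical name), cycle produces an infinite String, yet the program terminates, because map toUpper only converts the characters that take actually demands:

```haskell
import Data.Char (toUpper)

-- `cycle` builds an infinite String; `map toUpper` is applied lazily,
-- so only the first n characters are ever converted.
firstUpper :: Int -> String
firstUpper n = take n (map toUpper (cycle "line by line\n"))

main :: IO ()
main = putStr (firstUpper 26)  -- two copies of "LINE BY LINE\n"
```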
interact
interact :: (String -> String) -> IO ()
It takes a string-processing function as its argument and returns IO (). It is well suited to text processing, for example:
-- Keep only the lines shorter than 3 characters
lessThan3Char = interact (\s -> unlines $ [line | line <- lines s, length line < 3])
Equivalent to:
lessThan3Char' = do
    contents <- getContents
    let filtered = filterShortLines contents
    if null filtered then
        return ()
    else
        putStr filtered
  where
    filterShortLines = \s -> unlines $ [line | line <- lines s, length line < 3]
This looks more cumbersome. interact (as in "interaction") simplifies this most common interaction pattern: read a string, process it, and output the result
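Since interact only needs a String -> String function, the filtering logic can live in a pure function that is testable on its own. A sketch, with shortLinesOnly as a hypothetical name:

```haskell
-- Keep only the lines shorter than 3 characters; no I/O involved.
shortLinesOnly :: String -> String
shortLinesOnly = unlines . filter ((< 3) . length) . lines

-- interact supplies stdin to the function and writes its result to stdout.
main :: IO ()
main = interact shortLinesOnly
```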
V. File Read/Write
Read a file and display it as-is:
import System.IO

main = do
    handle <- openFile "./data/lines.txt" ReadMode
    contents <- hGetContents handle
    putStr contents
    hClose handle
This looks like file handling in C, with the handle playing the role of a file pointer: open the file read-only to get a handle, read its content through the handle, and finally release the handle. Intuitively, we can also try this:
readTwoLines = do
    handle <- openFile "./data/lines.txt" ReadMode
    line1 <- hGetLine handle
    line2 <- hGetLine handle
    putStrLn line1
    putStrLn line2
    hClose handle
Everything works: it reads the first two lines of the file and prints them, so the handle's position does advance as we read
P.S. There are many similar hGet.../hPut... functions, such as hPutStr, hPutStrLn and hGetChar. They mirror the versions without the h prefix, just with an extra Handle parameter, for example:
hPutStr :: Handle -> String -> IO ()
Now look back at the types of these functions:
openFile :: FilePath -> IOMode -> IO Handle
hGetContents :: Handle -> IO String
hGetLine :: Handle -> IO String
hClose :: Handle -> IO ()
openFile takes a FilePath and an IOMode and returns IO Handle. With that Handle you can ask hGetContents or hGetLine for the file's content, and finally release the resources tied to the handle with hClose. FilePath is just a type synonym for String, and IOMode is an enumeration of 4 modes (read-only, write-only, append, read-write):
> :i FilePath
type FilePath = String -- Defined in 'GHC.IO'
> :i IOMode
data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
-- Defined in 'GHC.IO.IOMode'
P.S. You can think of the handle as a bookmark, with the whole file system as the book; the metaphor is apt
withFile
withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
It looks like another packaged-up pattern, so let's use it to simplify the file-reading example above:
readThisFile = withFile "./data/lines.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStr contents)
This is a bit cleaner. As functional common patterns accumulate, what they do boils down to two things:
- Abstracting out common patterns, including type abstractions such as Maybe/Either and pattern abstractions such as forever do and interact
- Simplifying everything outside the key logic: utilities such as withFile, map and filter strip away boilerplate (routine operations such as openFile and hClose) so you can focus on the key logic
So withFile opens the file at the given path in the given mode, passes the resulting handle to the processing function (the 3rd argument), and finally closes the handle:
withFile' path mode f = do
    handle <- openFile path mode
    result <- f handle
    hClose handle
    return result
Note the important role of return here: we must hClose the handle before returning result, so we need a way to make a custom value the result of the do block
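withFile' above leaves the handle open if f throws an exception, because hClose is never reached. A sketch of a safer variant using bracket from Control.Exception (withFile'' and countLines are hypothetical names; ./data/lines.txt is the file from the earlier examples):

```haskell
import Control.Exception (bracket)
import System.IO

-- Like withFile', but bracket runs hClose even when the processing
-- function throws an exception.
withFile'' :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile'' path mode = bracket (openFile path mode) hClose

-- Pure part of the processing, kept separate so it can be tested.
countLines :: String -> Int
countLines = length . lines

main :: IO ()
main = do
  n <- withFile'' "./data/lines.txt" ReadMode $ \handle -> do
    contents <- hGetContents handle
    let count = countLines contents
    -- Force the count before the handle is closed: hGetContents is lazy,
    -- and text not yet read when hClose runs is lost.
    count `seq` return count
  print n
```

The seq matters: returning the lazily read contents unevaluated from the callback would let hClose run before the text is actually read.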
readFile
readFile :: FilePath -> IO String
Given a file path, it returns IO String; the open/close steps are omitted, which makes reading a file very simple:
readThisFile' = do
    contents <- readFile "./data/lines.txt"
    putStr contents
writeFile
writeFile :: FilePath -> String -> IO ()
Given a file path and the string to write, it returns IO (); it likewise hides the handle management:
writeThatFile = do
    writeFile "./data/that.txt" "contents in that file\nanother line\nlast line"
If the file doesn't exist it is created automatically, and any existing content is overwritten, which makes it very convenient. It is equivalent to this more cumbersome manual version:
writeThatFile' = do
    handle <- openFile "./data/that.txt" WriteMode
    hPutStr handle "contents in that file\nanother line\nlast line"
    hClose handle
appendFile
appendFile :: FilePath -> String -> IO ()
Its type is the same as writeFile's; it just opens the file in AppendMode internally, so the content is appended to the end of the file (which is created if it doesn't exist)
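A sketch contrasting writeFile and appendFile; the log path ./data/log.txt and the helper formatEntry are hypothetical, and the ./data directory is assumed to exist:

```haskell
-- Hypothetical log file path, for illustration only.
logPath :: FilePath
logPath = "./data/log.txt"

-- Formats one log entry as a single line.
formatEntry :: Int -> String -> String
formatEntry n msg = show n ++ ": " ++ msg ++ "\n"

main :: IO ()
main = do
  writeFile logPath ""                        -- create or truncate the file
  appendFile logPath (formatEntry 1 "first")  -- each call adds to the end
  appendFile logPath (formatEntry 2 "second")
  readFile logPath >>= putStr                 -- prints both lines
```

writeFile truncates the file first; each appendFile call then adds one line, so the final read prints 1: first followed by 2: second.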
Other File Operation Functions
-- Creates and opens a new temporary file in the directory given by FilePath, named from the String template plus random characters; returns the temp file's path and its handle
openTempFile :: FilePath -> String -> IO (FilePath, Handle)
-- Defined in System.Directory module, used to delete specified file
removeFile :: FilePath -> IO ()
-- Defined in System.Directory module, used to rename specified file
renameFile :: FilePath -> FilePath -> IO ()
Note that removeFile and renameFile are defined in the System.Directory module, not in System.IO. Functions for creating, deleting, renaming and querying files, managing permissions and so on all live in System.Directory, e.g. doesFileExist, getAccessTime and findFile
P.S. For more file operation functions, see System.Directory