Definitely Impure: IO (Haskell Notes 5)

Preface

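One question had been bothering me: Haskell calls itself a purely functional language, so how does it handle scenarios that are definitely impure (operations that necessarily have side effects, or that are side effects in themselves)?

Take (pseudo)random numbers and I/O, for example. A pure function that returns a different random number on every call cannot exist, so how are such scenarios handled?

Haskell's approach is somewhat similar to React's component lifecycle methods such as componentDidMount(). React recommends (as a convention, not an enforced rule) keeping render() pure and moving side-effecting operations into lifecycle methods such as componentDidMount(). In other words, lifecycle hooks separate the pure from the impure. Haskell does something comparable with the IO type and do blocks, which keep the impure parts apart from the pure ones.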

1. I/O action

Start with the type of a function:

> :t print
print :: Show a => a -> IO ()

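The print function takes an argument whose type is an instance of Show and returns an IO (), which is called an I/O action. IO is itself a type constructor, as the following shows: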

> :k IO
IO :: * -> *
> :k IO ()
IO () :: *
> :i IO
newtype IO a
  = GHC.Types.IO (GHC.Prim.State# GHC.Prim.RealWorld
                  -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))
    -- Defined in 'GHC.Types'
instance Monad IO -- Defined in 'GHC.Base'
instance Functor IO -- Defined in 'GHC.Base'
instance Applicative IO -- Defined in 'GHC.Base'
instance Monoid a => Monoid (IO a) -- Defined in 'GHC.Base'

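Judging by its kind, IO is similar to Maybe :: * -> *: both take one concrete type as a parameter and return a concrete type (such as IO ()).

P.S. newtype is similar to a data declaration, with basically the same syntax and usage. newtype is the more restrictive of the two (replacing newtype with data always compiles, while replacing data with newtype may not). The exact restriction is: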

data can only be replaced with newtype if the type has exactly one constructor with exactly one field inside it.
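
As a small sketch of that rule (all type names here are made up for illustration), a type with one constructor and one field can be declared either way, while a type with several fields or several constructors has to stay a data declaration:

```haskell
-- Hypothetical types, for illustration only.

-- Exactly one constructor with exactly one field: newtype is allowed.
newtype Meters = Meters Double deriving (Show, Eq)

-- Two fields: must be data. Writing `newtype Point = ...` here is rejected.
data Point = Point Double Double deriving (Show, Eq)

-- Two constructors: must be data as well.
data Shape = Circle Double | Square Double deriving (Show, Eq)

addMeters :: Meters -> Meters -> Meters
addMeters (Meters a) (Meters b) = Meters (a + b)

main :: IO ()
main = print (addMeters (Meters 1.5) (Meters 2))
```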

2. User input

User input can be obtained through I/O actions, for example:

main = do
  line <- getLine
  if null line then
    return ()
  else do -- do combines several actions into one
    putStrLn line
    main

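The example above is a simple echo program. getLine reads one line of input and has type IO String; the <- operator extracts the String and binds it to the name line. If the line is empty, the program does nothing (it returns IO () and ends); otherwise it writes the line to standard output with putStrLn, which appends a newline, and then calls main recursively.

Here main is the program's entry point (as in C). do combines several I/O actions into one, and the result of the combined action is the result of the last action in the block. The I/O actions inside a do block are executed, in order, when the block itself is run. So a do block serves two purposes:

It can contain multiple statements, but the last one must be an I/O action
It marks out an impure context in which I/O actions can be run

By analogy with JS, combining multiple statements resembles the comma operator, which returns the value of its last expression. Marking out an impure context resembles an async function: the <- binding can only be used inside a do block, much as await can only be used inside an async function.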
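
To make the second point concrete, here is a minimal sketch (the names pick and actions are hypothetical) showing that an I/O action is an ordinary value: building or naming one does nothing, and it only takes effect when it becomes part of the action that main runs.

```haskell
-- A pure function: it only chooses which action to return; it runs nothing.
pick :: Bool -> IO ()
pick loud = if loud then putStrLn "HELLO" else putStrLn "hello"

-- A list of actions is just data; building it prints nothing.
actions :: [IO ()]
actions = [putStrLn "first", putStrLn "second", putStrLn "third"]

main :: IO ()
main = do
  let unused = putStrLn "never printed"  -- named but never run
  pick True                               -- prints HELLO
  actions !! 1                            -- runs only the second action
```

Running it should print HELLO and then second; the other actions exist as values but are never executed.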

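P.S. There are actually 3 ways to run an I/O action:

Bind it to main, the program's entry point
Put it inside a do block that is itself run
Type it at the GHCi prompt and press Enter, e.g. putStrLn "hoho"

Running

main can be run in GHCi like an ordinary function, for example: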

> :l echo
[1 of 1] Compiling Main             ( echo.hs, interpreted )
Ok, modules loaded: Main.
> main
what?
what?

Entering an empty line exits; any other input is echoed back line by line.

It can also be compiled into an executable:

$ ghc --make ./echo.hs
[1 of 1] Compiling Main             ( echo.hs, echo.o )
Linking echo ...
$ ./echo
here
here

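3. Control.Monad

The Control.Monad module also provides functions that are useful for I/O. They wrap up common patterns such as forever do and when condition do, and can simplify some situations.

return

return wraps a value in an I/O action; it does not exit the function. return and <- are opposites (think boxing and unboxing):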

main = do
  a <- return "hell"
  b <- return "yeah!"
  putStrLn $ a ++ " " ++ b

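It has two uses:

Creating an I/O action that does nothing, as in the then branch of the echo example
Customizing the result of a do block, for cases where the result of the last I/O action should not be used directly and needs further processing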
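
A short sketch of both uses, with a hypothetical helper named describe. It also shows that return in the middle of a do block does not end the block:

```haskell
-- Hypothetical helper: the result of the do block is set with return.
describe :: Int -> IO String
describe n = do
  putStrLn "describing..."
  return ("n = " ++ show n)

main :: IO ()
main = do
  return ()                 -- does nothing; the block keeps going
  x <- return (21 * 2)      -- wraps 42, then <- unwraps it again
  let y = 21 * 2            -- the usual way to name a pure value
  print (x == y)            -- True
  line <- describe 3
  putStrLn line             -- n = 3
```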

when

when is also just a function:

Control.Monad.when :: Applicative f => Bool -> f () -> f ()

It takes a Boolean and an I/O action (IO is an instance of Applicative). When the Boolean is True, the result is that I/O action; otherwise it is return (). So it is equivalent to:

when' c io = do
  if c then io
  else return ()

The inferred type of this definition is:

when' :: Monad m => Bool -> m () -> m ()

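So when it is used for I/O, the second argument must have type IO (). That looks inconvenient, but it suits conditional output well, since print and the other output functions all return IO ().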
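
A minimal sketch of conditional output with when (the helper warnNegative is hypothetical). unless, also exported by Control.Monad, is the same function with the condition negated:

```haskell
import Control.Monad (when, unless)

-- Hypothetical helper: report a value only when it is negative.
warnNegative :: Int -> IO ()
warnNegative n = when (n < 0) $
  putStrLn ("negative value: " ++ show n)

main :: IO ()
main = do
  mapM_ warnNegative [3, -1, 0, -7]  -- prints a line for -1 and for -7
  unless False $ putStrLn "unless runs its action when the condition is False"
```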

sequence

sequence :: (Traversable t, Monad m) => t (m a) -> m (t a)

This type signature looks rather complicated:

Traversable :: (* -> *) -> Constraint
Monad :: (* -> *) -> Constraint
-- Pick one instance of each: list and IO
instance Traversable [] -- Defined in 'Data.Traversable'
instance Monad IO -- Defined in 'GHC.Base'

For a list of I/O actions (substitute IO for m and [] for t), the argument has type [IO a] and the result has type IO [a], so it is equivalent to:

sequence' [] = do
  return []
sequence' (x:xs) = do
  v <- x
  others <- (sequence' xs)
  return (v : others)

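It collects the results of all the I/O actions in the list into a list, then wraps that list in IO.

P.S. This feels a bit like Promise.all, which takes a group of promises and returns a new promise carrying their results. One difference is that sequence runs the actions one after another, in list order.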
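
The ordering can be observed with a small sketch. The counter and the next action are made up for the example; Data.IORef is used only to have an action whose result depends on when it runs:

```haskell
import Data.IORef (newIORef, modifyIORef, readIORef)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  -- Hypothetical action: bump the counter and return its new value.
  let next = do
        modifyIORef counter (+ 1)
        readIORef counter
  -- [IO Int] becomes IO [Int]; the actions run left to right.
  values <- sequence [next, next, next]
  print values                        -- [1,2,3]
  -- The same function works for other monads, e.g. Maybe.
  print (sequence [Just 1, Just 2])   -- Just [1,2]
  print (sequence [Just 1, Nothing])  -- Nothing
```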

mapM and mapM_

Control.Monad.mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
Control.Monad.mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()

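For a list of I/O actions, mapM takes as its first argument a function from a to IO b and as its second argument an [a], and returns IO [b], the same result type as sequence. It is equivalent to mapping the function over the [a] to get a list of I/O actions and then applying sequence, for example: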

> mapM (\x -> do return $ x + 1) [1, 2, 2]
[2,3,3]
> mapM print [1, 2, 2]
1
2
2
[(),(),()]

mapM_ is similar but discards the results and returns IO (). It suits print and other cases where the results of the I/O actions are of no interest:

> mapM_ print [1, 2, 2]
1
2
2

forM

Control.Monad.forM :: (Traversable t, Monad m) => t a -> (a -> m b) -> m (t b)

It does the same thing as mapM, with the arguments in the opposite order:

> forM [1, 2, 2] print
1
2
2
[(),(),()]

The difference is purely one of form. When the function argument is fairly complex, forM reads more clearly, for example:

import Control.Monad (forM)

main = do
  colors <- forM [1,2,3,4] (\a -> do
    putStrLn $ "Which color do you associate with the number " ++ show a ++ "?"
    getLine)
  putStrLn "The colors that you associate with 1, 2, 3 and 4 are: "
  mapM putStrLn colors

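P.S. The last line could also use forM (with the arguments swapped), but by convention forM tends to be used where an I/O action is being defined inline (for example, producing an IO [b] from an [a]).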
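
forM also has a result-discarding counterpart, forM_, which relates to forM the way mapM_ relates to mapM. A minimal sketch with made-up data:

```haskell
import Control.Monad (forM, forM_)

main :: IO ()
main = do
  -- forM keeps the results: here, the length of each word.
  lens <- forM ["red", "green", "blue"] $ \w -> do
    putStrLn ("measuring " ++ w)
    return (length w)
  print lens  -- [3,5,4]
  -- forM_ throws the results away, so the whole expression has type IO ().
  forM_ (zip [1 :: Int ..] ["red", "green", "blue"]) $ \(i, w) ->
    putStrLn (show i ++ ": " ++ w)
```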

forever

Control.Monad.forever :: Applicative f => f a -> f b

For I/O, it takes an I/O action and returns an I/O action that repeats it forever. So the echo example can be rewritten approximately as:

echo = Control.Monad.forever $ do
    line <- getLine
    if null line then
      return ()
    else
      putStrLn line

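forever offers no advantage for echo (in fact the loop can no longer be exited, except by forcing an interrupt with Ctrl+C). But there is one scenario that forever do suits well: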

import Control.Monad
import Data.Char

main = forever $ do
  line <- getLine
  putStrLn $ map toUpper line

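That scenario is text processing (transformation). When the input runs out, getLine raises an end-of-file error, which also ends forever, for example: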

$ ghc --make ./toUpperCase.hs
[1 of 1] Compiling Main             ( toUpperCase.hs, toUpperCase.o )
Linking toUpperCase ...
$ cat ./data/lines.txt
hoho, this is xx.
who's that ?
$ cat ./data/lines.txt | ./toUpperCase
HOHO, THIS IS XX.
WHO'S THAT ?
toUpperCase: <stdin>: hGetLine: end of file

forever do converts the file's contents to upper case line by line. Going a step further:

$ cat ./data/lines.txt | ./toUpperCase > ./tmp.txt
toUpperCase: <stdin>: hGetLine: end of file
$ cat ./tmp.txt
HOHO, THIS IS XX.
WHO'S THAT ?

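The result is written to the file, as expected. The end-of-file message still shows up in the terminal because it goes to standard error rather than standard output.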
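
If the end-of-file error message is unwanted, one option is to check for end of input before each read. The following is only a sketch of that idea, using isEOF from System.IO and explicit recursion in place of forever; the name shout is hypothetical:

```haskell
import Control.Monad (unless)
import Data.Char (toUpper)
import System.IO (isEOF)

-- Pure part of the program (hypothetical name).
shout :: String -> String
shout = map toUpper

main :: IO ()
main = do
  done <- isEOF            -- True once standard input is exhausted
  unless done $ do
    line <- getLine
    putStrLn (shout line)
    main                   -- loop by recursion instead of forever
```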

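4. System.IO

The getLine and putStrLn functions used so far come from the System.IO module (they are also re-exported by Prelude). Other commonly used ones include: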

-- Output
print :: Show a => a -> IO ()
putChar :: Char -> IO ()
putStr :: String -> IO ()
-- Input
getChar :: IO Char
getLine :: IO String

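print outputs any value that has a Show instance and is equivalent to putStrLn . show. putStr outputs a string as-is, without a trailing newline. The difference between the two: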

> print "hoho"
"hoho"
> putStr "hoho"
hoho

P.S. For details of the I/O module, see System.IO

getContents

getContents :: IO String

getContents returns all of the user's input as a single string, so toUpperCase can be rewritten like this:

toUpperCase' = do
  contents <- getContents
  putStr $ map toUpper contents

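Instead of handling one line at a time, this takes the whole input and converts it in one go. Yet when the program is compiled and run, it turns out to process the input line by line: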

$ ./toUpperCase
abc
ABC
efd
EFD

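This has to do with input buffering; for details see Haskell: How getContents works?

Lazy I/O

A String is a lazy list, and getContents is lazy I/O: it does not read all of the input into memory at once.

In the toUpperCase' example, the input is read one line at a time and the upper-case version is printed, because the input data is only really needed at the moment of output. Everything before that is merely a promise, which is only called in when there is no other choice, similar to a JS Promise: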

function toUpperCase() {
  let io;
  let contents = new Promise((resolve, reject) => {
    io = resolve;
  });
  let upperContents = contents
    .then(result => result.toUpperCase());
  putStr(upperContents, io);
}

function putStr(promise, io) {
  promise.then(console.log.bind(console));
  io('line\nby\nline');
}

// test
toUpperCase();

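The picture is quite vivid: getContents, map toUpper and the like only build up a chain of promises. Only when putStr needs to output the result is the I/O actually performed, followed by toUpper and the other computations.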
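
The same laziness can be seen in Haskell itself. In this sketch (the name untilBlank is hypothetical) the program stops at the first empty line, and because the rest of the string is never demanded, the rest of the input is never read:

```haskell
import Data.Char (toUpper)

-- Pure part: upper-case the lines that come before the first empty one.
untilBlank :: String -> String
untilBlank = unlines . map (map toUpper) . takeWhile (not . null) . lines

main :: IO ()
main = do
  contents <- getContents       -- nothing has been read yet
  putStr (untilBlank contents)  -- input is pulled in only as the output needs it
```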

interact

interact :: (String -> String) -> IO ()

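It takes a string-processing function as its argument and returns an I/O action with an empty result. It is well suited to text processing, for example: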

-- Keep only the lines shorter than 3 characters
lessThan3Char = interact (\s -> unlines $ [line | line <- lines s, length line < 3])

This is equivalent to:

lessThan3Char' = do
  contents <- getContents
  let filtered = filterShortLines contents
  if null filtered then
    return ()
  else
    putStr filtered
  where
    filterShortLines = \s -> unlines $ [line | line <- lines s, length line < 3]

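That looks like a lot more trouble. As its name suggests, interact exists to simplify this most common interaction pattern: take a string as input, process it, then output the result.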
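
A side benefit of this pattern is that the transformation stays a pure String -> String function, so it can be checked without doing any I/O. A minimal sketch, with a hypothetical function numberLines:

```haskell
-- Pure transformation: prefix each input line with its line number.
numberLines :: String -> String
numberLines = unlines . zipWith label [1 :: Int ..] . lines
  where
    label n l = show n ++ ": " ++ l

main :: IO ()
main = interact numberLines
```

For instance, numberLines "a\nb\n" evaluates to "1: a\n2: b\n".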

5. Reading and writing files

Read a file and display it as-is:

import System.IO

main = do
  handle <- openFile "./data/lines.txt" ReadMode
  contents <- hGetContents handle
  putStr contents
  hClose handle

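This resembles file I/O in C. The handle plays the role of a file pointer: open the file in read-only mode to get the handle, read the contents through it, then release it. Intuitively, we might try this: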

readTwoLines = do
  handle <- openFile "./data/lines.txt" ReadMode
  line1 <- hGetLine handle
  line2 <- hGetLine handle
  putStrLn line1
  putStrLn line2
  hClose handle

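Everything works: the first two lines of the file are read and printed, so the position behind the handle really does move.

P.S. There are many more functions of the hGetXxx/hPutXxx kind, such as hPutStr, hPutStrLn and hGetChar. They resemble the versions without the h prefix but take an extra handle argument, for example: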

hPutStr :: Handle -> String -> IO ()

Looking back at the types of these functions:

openFile :: FilePath -> IOMode -> IO Handle
hGetContents :: Handle -> IO String
hGetLine :: Handle -> IO String
hClose :: Handle -> IO ()

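openFile takes a FilePath and an IOMode and returns IO Handle. With that Handle, hGetContents or hGetLine can be asked for the file's contents, and finally hClose releases the resources associated with the handle. FilePath is just String (a type synonym for String), and IOMode is an enumeration (4 modes: read-only, write-only, append, read-write):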

> :i FilePath
type FilePath = String 	-- Defined in 'GHC.IO'
> :i IOMode
data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
    -- Defined in 'GHC.IO.IOMode'

P.S. The file handle can be thought of as a bookmark, where the book is the entire file system; the metaphor is quite vivid.

withFile

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

This looks like yet another packaged pattern, so let's use it to simplify the file-reading example above:

readThisFile = withFile "./data/lines.txt" ReadMode (\handle -> do
    contents <- hGetContents handle
    putStr contents
  )

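That looks a bit cleaner. More and more of these common functional idioms keep appearing, and they do one of two things:

Abstract out general patterns, including type abstractions such as Maybe/Either and common pattern abstractions such as forever do and interact
Simplify everything except the key logic; utility functions such as withFile, map and filter help strip away boilerplate (rote operations such as openFile and hClose) so the focus stays on the key logic

So what withFile does is open the file with the given path and mode, pass the resulting handle to the file-processing function (the third argument), and finally close the handle: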

withFile' path mode f = do
  handle <- openFile path mode
  result <- f handle
  hClose handle
  return result

Note that this shows why return matters: the handle must be closed with hClose before the result is returned, so there has to be a way to return a custom value.
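
One thing withFile' does not cover is the case where f throws an exception: hClose is then skipped. The library's withFile closes the handle in that case too. A sketch of how that could be written with bracket from Control.Exception (the name withFileSafe and the scratch file demo-lines.txt are made up for this example):

```haskell
import Control.Exception (bracket)
import System.IO

-- Sketch: like withFile', but hClose also runs if f throws an exception.
withFileSafe :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFileSafe path mode f = bracket (openFile path mode) hClose f

main :: IO ()
main = do
  writeFile "demo-lines.txt" "first\nsecond\n"  -- hypothetical scratch file
  firstLine <- withFileSafe "demo-lines.txt" ReadMode hGetLine
  putStrLn firstLine                            -- first
```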

readFile

readFile :: FilePath -> IO String

It takes a file path and returns IO String. The open and close steps are gone, which makes reading a file very simple:

readThisFile' = do
  contents <- readFile "./data/lines.txt"
  putStr contents

writeFile

writeFile :: FilePath -> String -> IO ()

It takes a file path and the string to write, and returns an I/O action with an empty result. It likewise removes the need to deal with handles:

writeThatFile = do
  writeFile "./data/that.txt" "contents in that file\nanother line\nlast line"

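If the file does not exist it is created automatically, and existing contents are overwritten, which makes it very convenient. It is equivalent to this more laborious manual version: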

writeThatFile' = do
  handle <- openFile "./data/that.txt" WriteMode
  hPutStr handle "contents in that file\nanother line\nlast line"
  hClose handle

appendFile

appendFile :: FilePath -> String -> IO ()

The type is the same as writeFile; it just uses AppendMode internally, adding the content to the end of the file.
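
A small sketch contrasting the two, using a hypothetical scratch file named append-demo.txt:

```haskell
main :: IO ()
main = do
  let path = "append-demo.txt"       -- hypothetical scratch file
  writeFile path "first line\n"      -- creates the file or overwrites it
  appendFile path "second line\n"    -- adds to the end
  appendFile path "third line\n"
  contents <- readFile path
  putStr contents
  print (length (lines contents))    -- 3
```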

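Other file functions

-- Creates and opens a file in the directory given by FilePath, named after the String template plus a random string; returns a pair of the temporary file's name and its handle
openTempFile :: FilePath -> String -> IO (FilePath, Handle)
-- Defined in System.Directory; deletes the given file
removeFile :: FilePath -> IO ()
-- Defined in System.Directory; renames the given file
renameFile :: FilePath -> FilePath -> IO ()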

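Note that removeFile and renameFile are defined in System.Directory (not System.IO). Functions for creating, deleting, modifying and querying files, as well as permission management, all live in System.Directory, for example doesFileExist, getAccessTime and findFile.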
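
These functions are often combined to rewrite a file: write the new contents to a temporary file, then rename it over the original. The following is only a sketch of that pattern; upperCaseInPlace and the scratch file rename-demo.txt are made up for the example, and System.Directory comes from the directory package:

```haskell
import Data.Char (toUpper)
import System.Directory (doesFileExist, removeFile, renameFile)
import System.IO

-- Sketch: replace a file's contents with their upper-case version.
upperCaseInPlace :: FilePath -> IO ()
upperCaseInPlace path = do
  contents <- readFile path
  (tmpPath, tmpHandle) <- openTempFile "." "upper.tmp"
  hPutStr tmpHandle (map toUpper contents)  -- forces the whole lazy read
  hClose tmpHandle
  renameFile tmpPath path                   -- the temp file takes the original's place

main :: IO ()
main = do
  let path = "rename-demo.txt"   -- hypothetical scratch file
  writeFile path "hoho\n"
  upperCaseInPlace path
  readFile path >>= putStr       -- HOHO
  removeFile path
  doesFileExist path >>= print   -- False
```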

P.S. For more file functions, see System.Directory