Skip to main content

Types - Haskell Notes 3

Free2018-05-04#Functional_Programming#Haskell type#Haskell自定义类型#Haskell内置类型#Haskell类型派生

Haskell is statically typed; everything has a type, and it must be determined at compile time.

1. Built-in Types

Several common types are as follows:

  • Int: Bounded integer, the bounds on a 32-bit machine are [-2147483648, 2147483647].

  • Integer: Unbounded integer, a built-in large number type, less efficient than Int.

  • Float: Single-precision floating-point number, 6 decimal places.

  • Double: Double-precision floating-point number, 15 decimal places.

  • Bool: Boolean value, values are True/False.

  • Char: Character.

  • Tuple: A tuple itself is also a type, with () being its only value.

The built-in unbounded integer makes large number arithmetic very convenient, for example, calculating the factorial of 100:

> product [1..100]
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

2. Variable Types

name :: String
name = "ayqy"

:: is read as "has type", telling the compiler that the variable name is of type String (i.e., [Char]).

Additionally, the first letter of types is always capitalized.

P.S. Although in theory many scenarios do not require manual type annotations (the compiler will infer them itself), the practical recommendation is to at least annotate the types of top-level variables/functions. Of course, annotating everything is definitely a good habit, as explicit types can greatly enhance readability. See Type signatures as good style for details.

P.S. You can list all type inferences of a specified module using the :browse <module> command, which makes it easy to add types to existing code.

3. Function Types

The type signatures of some common functions are as follows:

show :: Show a => a -> String
odd :: Integral a => a -> Bool
fromIntegral :: (Num b, Integral a) => a -> b
(+) :: Num a => a -> a -> a
(++) :: [a] -> [a] -> [a]

Here, the part between :: and => is the type constraint (declaring type variables), and the part after => is its type. The lowercase letters in the type declaration (like a) are called type variables. Unconstrained type variables (like the a in the type of ++) act like generics. Functions that use type variables are called polymorphic functions.

For example, show :: Show a => a -> String means the type of show is a function that takes a parameter of type Show and returns a String. (+) :: Num a => a -> a -> a means the type of + is a (curried) function that takes two parameters of type Num and returns a Num. And (++) :: [a] -> [a] -> [a] means the type of ++ is a function that takes two List parameters and returns another List. Here a has no type constraint, so the elements in the List can be of any type.

The -> in the type part is read as "maps to". How to understand this?

The mathematical definition of a function is a mapping relationship from the domain to the codomain, so the mathematical meaning corresponding to f = x -> y is y = f(x). That is to say, x mapped to y (the mapping relationship) is f; input x and return the corresponding y.

So a -> b -> c represents a function that takes an input a and returns a function b -> c. By continuing to call this returned function, inputting b returns the corresponding c. If we ignore the currying feature, it can simply be understood as taking two parameters a, b and returning c.

4. Typeclass

(==) :: Eq a => a -> a -> Bool

Here, Eq is called a typeclass, which is equivalent to an interface; it defines the behavior that members of this type must possess.

All types except functions belong to Eq and can be checked for equality. Some other common typeclasses are:

  • Ord: Can be compared for size (can compare size using functions like <, >, <=, >=, so Ord must belong to Eq).

  • Show: Can be represented as a string (everything except functions is Showable). You can convert other types to strings via the show function.

  • Read: The opposite of Show. You can convert strings to other types via the read function.

  • Enum: Enumerable, i.e., sequential. Includes (), Bool, Char, Ordering, Int, Integer, Float, and Double. These types can be used for Ranges, and the successor and predecessor of a value of this type can be accessed via the succ and pred functions.

  • Bounded: Has clear upper and lower bounds. You can get the upper and lower bounds of a specified type via maxBound and minBound (e.g., maxBound :: Int).

  • Num: Numeric. Members all have the characteristics of numbers.

  • Integral: Integer. Includes Int and Integer.

  • Floating: Floating-point. Includes Float and Double.

For numeric conversions, going from a larger range to a smaller range can be done implicitly (e.g., Num to Float), while going from smaller to larger requires functions like fromIntegral :: (Num b, Integral a) => a -> b. A common scenario is the length function:

> length "Hello" + 0.5

<interactive>:191:18: error:
    ? No instance for (Fractional Int) arising from the literal ‘0.5’
    ? In the second argument of ‘(+)’, namely ‘0.5’
      In the expression: length "Hello" + 0.5
      In an equation for ‘it’: it = length "Hello" + 0.5

Because length :: Foldable t => t a -> Int, and Int cannot be directly added to Fractional, you need to do this:

> (fromIntegral (length "Hello")) + 0.5
5.5

Also, the read function is very interesting, for example:

> read "12" + 4
16
> 1 : read "[2, 4]"
[1,2,4]

It will infer the target type based on the context, so if there is no context, it cannot infer:

> read "12"
*** Exception: Prelude.read: no parse

The compiler doesn't know what type we want, so we can manually declare the type to give it a hint:

> read "12" :: Int
12
> read "12" :: Float
12.0

5. Custom Types

Algebraic Data Types

Algebraic Data Type refers to data structures constructed through algebraic operations, of which there are two types:

  • sum: Logical OR. For example, the possible values of the Maybe type have a logical OR relationship.

  • product: Logical AND. For example, the components of a tuple have a logical AND relationship.

For example:

-- Logical AND, Pair type is an Int-Double pair
data Pair = P Int Double
-- Logical OR, the value of Pair type is either an Int or a Double
data Pair = I Int | D Double

Data structures of arbitrary complexity can be built using logical OR and logical AND, all of which can be called algebraic data types.

In terms of status, algebraic data types are as fundamental to functional languages as algebra is to mathematics. Similarly, to perform algebraic operations, one must first have a definition of numbers:

[caption id="attachment_1705" align="alignnone" width="625"]map algebraic data type to math map algebraic data type to math[/caption]

Declaration

Use the data keyword to declare custom types:

data Shape = Circle Float Float Float | Rectangle Float Float Float Float

This means the Shape type has 2 value constructors (Circle, Rectangle), i.e., a value of the Shape type is either a Circle or a Rectangle. Value constructors are essentially functions:

Circle :: Float -> Float -> Float -> Shape
Rectangle :: Float -> Float -> Float -> Float -> Shape

The parameters of a value constructor (like the Float Float Float of Circle) are also called fields, which are practically just parameters.

Since value constructors are functions, pattern matching can also be used for custom types:

circleArea (Circle _ _ r) = pi * r ^ 2
> circleArea (Circle 1 1 1)
3.1415927

The type of the area function is:

circleArea :: Shape -> Float

The parameter type is Shape, not Circle, because the latter is just a value constructor, not a type.

In addition, pattern matching is always against value constructors. Common ones like [], otherwise/True, 5, etc., are parameterless value constructors.

Recursive Type Definitions

If a parameter (field) of a value constructor for a type is of that same type, a recursive definition occurs.

For example, the syntactic sugar for a List:

[1, 2, 3]
-- is equivalent to (: is right-associative, parentheses are optional)
1 : (2 : (3 : []))

This is a recursive definition: a List is formed by inserting the first element to the left of the List composed of the remaining elements.

Let's try hand-rolling one:

infixr 5 :>
data MyList a = MyEmptyList | a :> (MyList a) deriving (Show)

Here, the custom operator :> is equivalent to :, and both are value constructors (so pattern matching x:xs is actually against the List's value constructor :). Let's play with it:

> :t MyEmptyList
MyEmptyList :: MyList a
> 3 :> 5 :> MyEmptyList
3 :> (5 :> MyEmptyList)
> :t 3 :> 5 :> MyEmptyList
3 :> 5 :> MyEmptyList :: Num a => MyList a

Aside from syntactic differences, it is essentially identical to the List definition (3 : 5 : []). Let's build a few more characteristic List functions:

_fromList [] = MyEmptyList
_fromList (x:xs) = x :> (_fromList xs)
_map f MyEmptyList = MyEmptyList
_map f (x :> xs) = f x :> _map f xs

Continue playing:

> _fromList [1, 2, 3]
1 :> (2 :> (3 :> MyEmptyList))
> _map (+ 1) (_fromList [1, 2, 3])
2 :> (3 :> (4 :> MyEmptyList))

Deriving

Only members of the Show typeclass can be directly output in the GHCi environment (because show :: Show a => a -> String is called before output). So, let's make Shape a member of Show:

data Shape = Circle Float Float Float | Rectangle Float Float Float Float deriving (Show)

The deriving keyword is used to declare type derivation, making the values of one type also members of another typeclass. Try directly outputting a Shape value:

> Circle 1 1 1
Circle 1.0 1.0 1.0

Let's extract the coordinate point as well:

data Point = Point Float Float deriving (Show)
data Shape = Circle Point Float | Rectangle Point Point deriving (Show)
circleArea (Circle _ r) = pi * r ^ 2

Besides Show, other typeclasses that can automatically add default behaviors are Eq, Ord, Enum, Bounded, Read. For example, after deriving from Eq, you can compare values for equality using == and /=:

data Mytype = Mytype Int String deriving (Show, Eq)
> Mytype 3 "a" == Mytype 4 "b"
False
> Mytype 3 "a" == Mytype 3 "a"
True

Actually, the equality check automatically added when deriving from Eq just checks if the input parameters are identical:

1. Check if the value constructors are identical.
2. Check if the parameters of the value constructors are identical.

Of course, this requires the parameters to also be members of the Eq class; otherwise, it cannot automatically compare (if this is not met, an error will be thrown).

Show and Read are similar, used to convert between strings and values:

data Mytype = Mytype Int String deriving (Show, Eq, Read)
> Mytype 3 "a"
Mytype 3 "a"
> read "Mytype 3 \"a\"" :: Mytype
Mytype 3 "a"

Ord is interesting; it indicates that members can be ordered, but how is the default sorting basis determined?

data Mytype = EmptyValue | Singleton | Mytype Int String deriving (Show, Eq, Read, Ord)
> EmptyValue < Singleton
True
> Singleton < Mytype 3 "a"
True
> Mytype 3 "a" < Mytype 4 "a"
True

First, look at the order in the type declaration. Among those combined with OR (|), the value constructor that appears first creates the smallest value. Then, compare the parameters of the value constructors according to similar rules, so it also requires all parameters to be Ord members.

Enum, Bounded are used to define enumerated types, i.e., finite sets. Enum requires each value to have a predecessor/successor so it can be used for Ranges. Bounded requires the values to have upper and lower bounds. For example:

data Day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday deriving (Show, Bounded, Enum)
-- Upper and lower bounds
> maxBound :: Day
Sunday
> minBound :: Day
Monday
-- Predecessor/Successor
> pred Wednesday
Tuesday
> succ Wednesday
Thursday
-- Range
> [Wednesday ..]
[Wednesday,Thursday,Friday,Saturday,Sunday]

Record

For simple data types, such as Vector2D:

data Vector2D = Vector2D Float Float deriving(Show)

A simple data definition meets the semantic needs (we explicitly know the two parameters of a 2D vector are the x and y coordinates). If the object to be described is complex, such as a person with age, height, weight, and measurements:

data Person = Person Float Float Float Float Float Float deriving(Show)

This is too unintuitive. Let's add a comment:

-- Age Height Weight Bust Waist Hips
data Person = Person Float Float Float Float Float Float deriving(Show)

What does this bring to mind? This is just a function with a dozen parameters! A huge number of parameters that also require a specific order. What's more troublesome is that this is a data type, so we also need a series of getters:

getAge (Person age _ _ _ _ _) = age
getHeight (Person _ height _ _ _ _) = height
-- ... and a bunch more getters

How is this usually handled in other languages? Organize the scattered parameters (e.g., create an object):

data Person = Person {
  age :: Float,
  height :: Float,
  weight :: Float,
  xw :: Float,
  yw :: Float,
  tw :: Float
} deriving (Show)

Creating a person now has clear semantics, and we don't need to worry about parameter order:

person =  Person {age=1, height=2, xw=4, yw=5, tw=6, weight=3}

A bunch of getters are automatically created, for example:

> :t age
age :: Person -> Float
> weight person
3.0

This is much more convenient to use than a simple type definition.

Type Parameters

Type constructors can take parameters and return new types. For example:

data Maybe a = Nothing | Just a

Here, a is a type parameter, and Maybe is not a type, but a type constructor. The concrete Maybe xxx is the type, and Nothing and Just xxx are values of that type. For example:

Just 'a' :: Maybe Char
Nothing :: Maybe a

Doing this yields a set of types with similar behaviors. From an application scenario perspective, types with parameters are equivalent to generics, an abstraction over concrete types. For example, the classic List:

[1, 2, 3] :: Num t => [t]
"456" :: [Char]

Both support certain behaviors (various functions defined in the Data.List module):

map :: (a -> b) -> [a] -> [b]
> map (+ 1) [1, 2, 3]
[2,3,4]
> map (Data.Char.chr . (+ 1) . Data.Char.ord) "456"
"567"

length :: Foldable t => t a -> Int
> length [1, 2, 3]
3
> length "456"
3

The map and length functions don't care about the concrete type of List a; they are considered operations defined on an abstract data type.

Maybe vs. Either

data Maybe a = Nothing | Just a 	-- Defined in ‘GHC.Base’
data Either a b = Left a | Right b 	-- Defined in ‘Data.Either’

In terms of application scenarios, Maybe is used to represent results that might fail: success is Just a, failure is Nothing. It is suitable for scenarios with a single failure reason, such as elemIndex:

Data.List.elemIndex :: Eq a => a -> [a] -> Maybe Int

If found, it returns the index of type Just Int; if not found, it returns Nothing. There is no third result.

Looking solely at exception handling scenarios, Either is more powerful. Generally, the failure reason is put into Left a, and the successful result into Right b. It is formally very similar to Maybe, but Left a can carry arbitrary information. In comparison, Nothing is far too vague.

P.S. In the context of JS, Maybe is like agreeing to return a value on success and false on failure; you only know it failed, but might not know the exact reason. Either is like agreeing that the first parameter of a callback function carries the error information; if it's not empty, it means failure, and the specific reason is the value of that parameter.

Type Synonyms

Type synonyms have already been seen:

> :i String
type String = [Char] 	-- Defined in ‘GHC.Base’

The type keyword is used to define an alias for a type, making String equivalent to [Char], thereby bringing semantic benefits to type declarations, for example:

type PhoneNumber = String
type Name = String
type PhoneBook = [(Name,PhoneNumber)]

inPhoneBook :: Name -> PhoneNumber -> PhoneBook -> Bool
inPhoneBook name pnumber pbook = (name, pnumber) `elem` pbook

Input a name, phone number, and phonebook, and return whether this record exists in the phonebook. Without aliases, the type declaration could only look like this:

inPhoneBook :: String -> String -> [(String, String)] -> Bool

Of course, this scenario might seem like making a mountain out of a molehill, doing all this extra work for something with no real practical effect (the type declaration). However, the type synonym feature is designed to provide the ability to allow type definitions to be more vividly semantic, not just for a specific scenario. For example:

  • Making type declarations more readable.

  • Replacing long type names with high repetition rates (e.g., [(String, String)]).

This ability can make the types describe things much more explicitly.

Type synonyms can also take parameters. For example, a custom association list:

type AssocList k v = [(k, v)]

Allowing any k-v ensures its generality, for example:

inPhoneBook :: (Eq k, Eq v) => k -> v -> AssocList k v -> Bool
inPhoneBook name pnumber pbook = (name, pnumber) `elem` pbook
> inPhoneBook 1 "1234" [(0, "0012"), (1, "123")]
False

Here, the concrete type corresponding to AssocList k v is AssocList Int String:

> read "[(0, \"0012\"), (1, \"123\")]" :: AssocList Int String
[(0,"0012"),(1,"123")]

Type synonyms also have a feature similar to currying, for example:

type IntAssocList = Int
-- Equivalent to keeping one parameter
type IntAssocList v = Int v

If enough parameters are provided, it's a concrete type; otherwise, it's a parameterized type.

References

Comments

No comments yet. Be the first to share your thoughts.

Leave a comment