Safe Haskell	Safe-Inferred
Language	Haskell2010

Text.HTML.Scalpel

Contents

Selectors
Scrapers
- Primitives
- Executing scrapers
Serial Scraping
- Primitives

Description

Scalpel is a web scraping library inspired by libraries like parsec and Perl's Web::Scraper. Scalpel builds on top of Text.HTML.TagSoup to provide a declarative and monadic interface.

There are two general mechanisms provided by this library that are used to build web scrapers: Selectors and Scrapers.

Selectors describe a location within an HTML DOM tree. The simplest selector, that can be written is a simple string value. For example, the selector "div" matches every single div node in a DOM. Selectors can be combined using tag combinators. The // operator to define nested relationships within a DOM tree. For example, the selector "div" // "a" matches all anchor tags nested arbitrarily deep within a div tag.

In addition to describing the nested relationships between tags, selectors can also include predicates on the attributes of a tag. The @: operator creates a selector that matches a tag based on the name and various conditions on the tag's attributes. An attribute predicate is just a function that takes an attribute and returns a boolean indicating if the attribute matches a criteria. There are several attribute operators that can be used to generate common predicates. The @= operator creates a predicate that matches the name and value of an attribute exactly. For example, the selector "div" @: ["id" @= "article"] matches div tags where the id attribute is equal to "article".

Scrapers are values that are parameterized over a selector and produce a value from an HTML DOM tree. The Scraper type takes two type parameters. The first is the string like type that is used to store the text values within a DOM tree. Any string like type supported by Text.StringLike is valid. The second type is the type of value that the scraper produces.

There are several scraper primitives that take selectors and extract content from the DOM. Each primitive defined by this library comes in two variants: singular and plural. The singular variants extract the first instance matching the given selector, while the plural variants match every instance.

The following is an example that demonstrates most of the features provided by this library. Suppose you have the following hypothetical HTML located at "http:/example.comarticle.html" and you would like to extract a list of all of the comments.

<html>
  <body>
    <div class='comments'>
      <div class='comment container'>
        <span class='comment author'>Sally</span>
        <div class='comment text'>Woo hoo!</div>
      </div>
      <div class='comment container'>
        <span class='comment author'>Bill</span>
        <img class='comment image' src='http://example.com/cat.gif' />
      </div>
      <div class='comment container'>
        <span class='comment author'>Susan</span>
        <div class='comment text'>WTF!?!</div>
      </div>
    </div>
  </body>
</html>

The following snippet defines a function, allComments, that will download the web page, and extract all of the comments into a list:

type Author = String

data Comment
    = TextComment Author String
    | ImageComment Author URL
    deriving (Show, Eq)

allComments :: IO (Maybe [Comment])
allComments = scrapeURL "http://example.com/article.html" comments
   where
       comments :: Scraper String [Comment]
       comments = chroots ("div" @: [hasClass "container"]) comment

       comment :: Scraper String Comment
       comment = textComment <|> imageComment

       textComment :: Scraper String Comment
       textComment = do
           author      <- text $ "span" @: [hasClass "author"]
           commentText <- text $ "div"  @: [hasClass "text"]
           return $ TextComment author commentText

       imageComment :: Scraper String Comment
       imageComment = do
           author   <- text       $ "span" @: [hasClass "author"]
           imageURL <- attr "src" $ "img"  @: [hasClass "image"]
           return $ ImageComment author imageURL

Complete examples can be found in the examples folder in the scalpel git repository.

Synopsis

data Selector
data AttributePredicate
data AttributeName
- = AnyAttribute
- | AttributeString String
data TagName
- = AnyTag
- | TagString String
tagSelector :: String -> Selector
textSelector :: Selector
anySelector :: Selector
(//) :: Selector -> Selector -> Selector
atDepth :: Selector -> Int -> Selector
(@:) :: TagName -> [AttributePredicate] -> Selector
(@=) :: AttributeName -> String -> AttributePredicate
(@=~) :: RegexLike re String => AttributeName -> re -> AttributePredicate
hasClass :: String -> AttributePredicate
notP :: AttributePredicate -> AttributePredicate
match :: (String -> String -> Bool) -> AttributePredicate
type Scraper str = ScraperT str Identity
data ScraperT str (m :: Type -> Type) a
attr :: forall str (m :: Type -> Type). (Show str, StringLike str, Monad m) => String -> Selector -> ScraperT str m str
attrs :: forall str (m :: Type -> Type). (Show str, StringLike str, Monad m) => String -> Selector -> ScraperT str m [str]
html :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m str
htmls :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m [str]
innerHTML :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m str
innerHTMLs :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m [str]
text :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m str
texts :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m [str]
chroot :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => Selector -> ScraperT str m a -> ScraperT str m a
chroots :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => Selector -> ScraperT str m a -> ScraperT str m [a]
position :: forall str (m :: Type -> Type). (StringLike str, Monad m) => ScraperT str m Int
matches :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m ()
scrape :: StringLike str => Scraper str a -> [Tag str] -> Maybe a
scrapeStringLike :: StringLike str => str -> Scraper str a -> Maybe a
scrapeT :: StringLike str => ScraperT str m a -> [Tag str] -> m (Maybe a)
scrapeStringLikeT :: (StringLike str, Monad m) => str -> ScraperT str m a -> m (Maybe a)
type URL = String
fetchTags :: StringLike str => URL -> IO [Tag str]
fetchTagsWithConfig :: StringLike str => Config str -> URL -> IO [Tag str]
scrapeURL :: StringLike str => URL -> Scraper str a -> IO (Maybe a)
scrapeURLWithConfig :: StringLike str => Config str -> URL -> Scraper str a -> IO (Maybe a)
data Config str = Config {
- decoder :: Decoder str
- manager :: Maybe Manager
}
type Decoder str = Response ByteString -> str
defaultDecoder :: StringLike str => Decoder str
utf8Decoder :: StringLike str => Decoder str
iso88591Decoder :: StringLike str => Decoder str
type SerialScraper str a = SerialScraperT str Identity a
data SerialScraperT str (m :: Type -> Type) a
inSerial :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => SerialScraperT str m a -> ScraperT str m a
stepNext :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a
stepBack :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a
seekNext :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a
seekBack :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a
untilNext :: forall str (m :: Type -> Type) a b. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m b
untilBack :: forall str (m :: Type -> Type) a b. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m b

Selectors

data Selector #

Selector defines a selection of an HTML DOM tree to be operated on by a web scraper. The selection includes the opening tag that matches the selection, all of the inner tags, and the corresponding closing tag.

Instances

Instances details

IsString Selector
Instance details Defined in Text.HTML.Scalpel.Internal.Select.Types Methods fromString :: String -> Selector #

data AttributePredicate #

An AttributePredicate is a method that takes a Attribute and returns a Bool indicating if the given attribute matches a predicate.

data AttributeName #

The AttributeName type can be used when creating Selectors to specify the name of an attribute of a tag.

Constructors

AnyAttribute
AttributeString String

Instances

Instances details

IsString AttributeName
Instance details Defined in Text.HTML.Scalpel.Internal.Select.Types Methods fromString :: String -> AttributeName #

data TagName #

The TagName type is used when creating a Selector to specify the name of a tag.

Constructors

AnyTag
TagString String

Instances

Instances details

IsString TagName
Instance details Defined in Text.HTML.Scalpel.Internal.Select.Types Methods fromString :: String -> TagName #

tagSelector :: String -> Selector #

textSelector :: Selector #

A selector which will match all text nodes.

Wildcards

anySelector :: Selector #

A selector which will match any node (including tags and bare text).

Tag combinators

(//) :: Selector -> Selector -> Selector infixl 5 #

The // operator creates an Selector by nesting one Selector in another. For example, "div" // "a" will create a Selector that matches anchor tags that are nested arbitrarily deep within a div tag.

atDepth :: Selector -> Int -> Selector infixl 6 #

The atDepth operator constrains a Selector to only match when it is at depth below the previous selector.

For example, "div" // "a" atDepth 1 creates a Selector that matches anchor tags that are direct children of a div tag.

Attribute predicates

(@:) :: TagName -> [AttributePredicate] -> Selector infixl 9 #

The @: operator creates a Selector by combining a TagName with a list of AttributePredicates.

(@=) :: AttributeName -> String -> AttributePredicate infixl 6 #

The @= operator creates an AttributePredicate that will match attributes with the given name and value.

If you are attempting to match a specific class of a tag with potentially multiple classes, you should use the hasClass utility function.

(@=~) :: RegexLike re String => AttributeName -> re -> AttributePredicate infixl 6 #

The @=~ operator creates an AttributePredicate that will match attributes with the given name and whose value matches the given regular expression.

hasClass :: String -> AttributePredicate #

The classes of a tag are defined in HTML as a space separated list given by the class attribute. The hasClass function will match a class attribute if the given class appears anywhere in the space separated list of classes.

notP :: AttributePredicate -> AttributePredicate #

Negates an AttributePredicate.

match :: (String -> String -> Bool) -> AttributePredicate #

The match function allows for the creation of arbitrary AttributePredicates. The argument is a function that takes the attribute key followed by the attribute value and returns a boolean indicating if the attribute satisfies the predicate.

Scrapers

type Scraper str = ScraperT str Identity #

A value of Scraper a defines a web scraper that is capable of consuming a list of Tags and optionally producing a value of type a.

data ScraperT str (m :: Type -> Type) a #

A ScraperT operates like Scraper but also acts as a monad transformer.

Instances

Instances details

MonadError e m => MonadError e (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods throwError :: e -> ScraperT str m a # catchError :: ScraperT str m a -> (e -> ScraperT str m a) -> ScraperT str m a #
MonadReader s m => MonadReader s (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods ask :: ScraperT str m s # local :: (s -> s) -> ScraperT str m a -> ScraperT str m a # reader :: (s -> a) -> ScraperT str m a #
MonadState s m => MonadState s (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods get :: ScraperT str m s # put :: s -> ScraperT str m () # state :: (s -> (a, s)) -> ScraperT str m a #
MonadWriter w m => MonadWriter w (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods writer :: (a, w) -> ScraperT str m a # tell :: w -> ScraperT str m () # listen :: ScraperT str m a -> ScraperT str m (a, w) # pass :: ScraperT str m (a, w -> w) -> ScraperT str m a #
MonadTrans (ScraperT str)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods lift :: Monad m => m a -> ScraperT str m a #
Monad m => MonadFail (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods fail :: String -> ScraperT str m a #
MonadFix m => MonadFix (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods mfix :: (a -> ScraperT str m a) -> ScraperT str m a #
MonadIO m => MonadIO (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods liftIO :: IO a -> ScraperT str m a #
Monad m => Alternative (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods empty :: ScraperT str m a # (<\|>) :: ScraperT str m a -> ScraperT str m a -> ScraperT str m a # some :: ScraperT str m a -> ScraperT str m [a] # many :: ScraperT str m a -> ScraperT str m [a] #
Monad m => Applicative (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods pure :: a -> ScraperT str m a # (<>) :: ScraperT str m (a -> b) -> ScraperT str m a -> ScraperT str m b # liftA2 :: (a -> b -> c) -> ScraperT str m a -> ScraperT str m b -> ScraperT str m c # (>) :: ScraperT str m a -> ScraperT str m b -> ScraperT str m b # (<*) :: ScraperT str m a -> ScraperT str m b -> ScraperT str m a #
Functor m => Functor (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods fmap :: (a -> b) -> ScraperT str m a -> ScraperT str m b # (<$) :: a -> ScraperT str m b -> ScraperT str m a #
Monad m => Monad (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods (>>=) :: ScraperT str m a -> (a -> ScraperT str m b) -> ScraperT str m b # (>>) :: ScraperT str m a -> ScraperT str m b -> ScraperT str m b # return :: a -> ScraperT str m a #
Monad m => MonadPlus (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods mzero :: ScraperT str m a # mplus :: ScraperT str m a -> ScraperT str m a -> ScraperT str m a #
MonadCont m => MonadCont (ScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape Methods callCC :: ((a -> ScraperT str m b) -> ScraperT str m a) -> ScraperT str m a #

Primitives

attr :: forall str (m :: Type -> Type). (Show str, StringLike str, Monad m) => String -> Selector -> ScraperT str m str #

The attr function takes an attribute name and a selector and returns the value of the attribute of the given name for the first opening tag that matches the given selector.

This function will match only the opening tag matching the selector, to match every tag, use attrs.

attrs :: forall str (m :: Type -> Type). (Show str, StringLike str, Monad m) => String -> Selector -> ScraperT str m [str] #

The attrs function takes an attribute name and a selector and returns the value of the attribute of the given name for every opening tag (possibly nested) that matches the given selector.

s = "<div id=\"out\"><div id=\"in\"></div></div>"
scrapeStringLike s (attrs "id" "div") == Just ["out", "in"]

html :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m str #

The html function takes a selector and returns the html string from the set of tags described by the given selector.

This function will match only the first set of tags matching the selector, to match every set of tags, use htmls.

htmls :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m [str] #

The htmls function takes a selector and returns the html string from every set of tags (possibly nested) matching the given selector.

s = "<div><div>A</div></div>"
scrapeStringLike s (htmls "div") == Just ["<div><div>A</div></div>", "<div>A</div>"]

innerHTML :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m str #

The innerHTML function takes a selector and returns the inner html string from the set of tags described by the given selector. Inner html here meaning the html within but not including the selected tags.

This function will match only the first set of tags matching the selector, to match every set of tags, use innerHTMLs.

innerHTMLs :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m [str] #

The innerHTMLs function takes a selector and returns the inner html string from every set of tags (possibly nested) matching the given selector.

s = "<div><div>A</div></div>"
scrapeStringLike s (innerHTMLs "div") == Just ["<div>A</div>", "A"]

text :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m str #

The text function takes a selector and returns the inner text from the set of tags described by the given selector.

This function will match only the first set of tags matching the selector, to match every set of tags, use texts.

texts :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m [str] #

The texts function takes a selector and returns the inner text from every set of tags (possibly nested) matching the given selector.

s = "<div>Hello <div>World</div></div>"
scrapeStringLike s (texts "div") == Just ["Hello World", "World"]

chroot :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => Selector -> ScraperT str m a -> ScraperT str m a #

The chroot function takes a selector and an inner scraper and executes the inner scraper as if it were scraping a document that consists solely of the tags corresponding to the selector.

This function will match only the first set of tags matching the selector, to match every set of tags, use chroots.

chroots :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => Selector -> ScraperT str m a -> ScraperT str m [a] #

The chroots function takes a selector and an inner scraper and executes the inner scraper as if it were scraping a document that consists solely of the tags corresponding to the selector. The inner scraper is executed for each set of tags (possibly nested) matching the given selector.

s = "<div><div>A</div></div>"
scrapeStringLike s (chroots "div" (pure 0)) == Just [0, 0]

position :: forall str (m :: Type -> Type). (StringLike str, Monad m) => ScraperT str m Int #

The position function is intended to be used within the do-block of a chroots call. Within the do-block position will return the index of the current sub-tree within the list of all sub-trees matched by the selector passed to chroots.

For example, consider the following HTML:

<article>
 <p> First paragraph. </p>
 <p> Second paragraph. </p>
 <p> Third paragraph. </p>
</article>

The position function can be used to determine the index of each <p> tag within the article tag by doing the following.

chroots "article" // "p" $ do
  index   <- position
  content <- text "p"
  return (index, content)

Which will evaluate to the list:

[
  (0, "First paragraph.")
, (1, "Second paragraph.")
, (2, "Third paragraph.")
]

matches :: forall str (m :: Type -> Type). (StringLike str, Monad m) => Selector -> ScraperT str m () #

The matches function takes a selector and returns () if the selector matches any node in the DOM.

Executing scrapers

scrape :: StringLike str => Scraper str a -> [Tag str] -> Maybe a #

The scrape function executes a Scraper on a list of Tags and produces an optional value.

scrapeStringLike :: StringLike str => str -> Scraper str a -> Maybe a #

The scrapeStringLike function parses a StringLike value into a list of tags and executes a Scraper on it.

scrapeT :: StringLike str => ScraperT str m a -> [Tag str] -> m (Maybe a) #

The scrapeT function executes a ScraperT on a list of Tags and produces an optional value. Since ScraperT is a monad transformer, the result is monadic.

scrapeStringLikeT :: (StringLike str, Monad m) => str -> ScraperT str m a -> m (Maybe a) #

The scrapeStringLikeT function parses a StringLike value into a list of tags and executes a ScraperT on it. Since ScraperT is a monad transformer, the result is monadic.

type URL = String Source #

fetchTags :: StringLike str => URL -> IO [Tag str] Source #

Download and parse the contents of the given URL.

fetchTagsWithConfig :: StringLike str => Config str -> URL -> IO [Tag str] Source #

Download and parse the contents of the given URL with the given Config.

scrapeURL :: StringLike str => URL -> Scraper str a -> IO (Maybe a) Source #

The scrapeURL function downloads the contents of the given URL and executes a Scraper on it.

The default behavior is to use the global manager provided by http-client-tls (via getGlobalManager). Any exceptions thrown by http-client are not caught and are bubbled up to the caller.

scrapeURLWithConfig :: StringLike str => Config str -> URL -> Scraper str a -> IO (Maybe a) Source #

The scrapeURLWithConfig function takes a Config record type and downloads the contents of the given URL and executes a Scraper on it.

data Config str Source #

A record type that determines how scrapeURLWithConfig interacts with the HTTP server and interprets the results.

Constructors

Config
Fields decoder :: Decoder str manager :: Maybe Manager

Instances

Instances details

StringLike str => Default (Config str) Source #
Instance details Defined in Text.HTML.Scalpel.Internal.Scrape.URL Methods def :: Config str #

type Decoder str = Response ByteString -> str Source #

A method that takes a HTTP response as raw bytes and returns the body as a string type.

defaultDecoder :: StringLike str => Decoder str Source #

The default response decoder. This decoder attempts to infer the character set of the HTTP response body from the `Content-Type` header. If this header is not present, then the character set is assumed to be `ISO-8859-1`.

utf8Decoder :: StringLike str => Decoder str Source #

A decoder that will always decode using `UTF-8`.

iso88591Decoder :: StringLike str => Decoder str Source #

A decoder that will always decode using `ISO-8859-1`.

Serial Scraping

type SerialScraper str a = SerialScraperT str Identity a #

A SerialScraper allows for the application of Scrapers on a sequence of sibling nodes. This allows for use cases like targeting the sibling of a node, or extracting a sequence of sibling nodes (e.g. paragraphs (<p>) under a header (<h2>)).

Conceptually serial scrapers operate on a sequence of tags that correspond to the immediate children of the currently focused node. For example, given the following HTML:

 <article>
   <h1>title</h1>
   <h2>Section 1</h2>
   <p>Paragraph 1.1</p>
   <p>Paragraph 1.2</p>
   <h2>Section 2</h2>
   <p>Paragraph 2.1</p>
   <p>Paragraph 2.2</p>
 </article>

A serial scraper that visits the header and paragraph nodes can be executed with the following:

chroot "article" $ inSerial $ do ...

Each SerialScraper primitive follows the pattern of first moving the focus backward or forward and then extracting content from the new focus. Attempting to extract content from beyond the end of the sequence causes the scraper to fail.

To complete the above example, the article's structure and content can be extracted with the following code:

chroot "article" $ inSerial $ do
    title <- seekNext $ text "h1"
    sections <- many $ do
       section <- seekNext $ text "h2"
       ps <- untilNext (matches "h2") (many $ seekNext $ text "p")
       return (section, ps)
    return (title, sections)

Which will evaluate to:

 ("title", [
   ("Section 1", ["Paragraph 1.1", "Paragraph 1.2"]),
   ("Section 2", ["Paragraph 2.1", "Paragraph 2.2"]),
 ])

data SerialScraperT str (m :: Type -> Type) a #

Run a serial scraper transforming over a monad m.

Instances

Instances details

MonadError e m => MonadError e (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods throwError :: e -> SerialScraperT str m a # catchError :: SerialScraperT str m a -> (e -> SerialScraperT str m a) -> SerialScraperT str m a #
MonadReader r m => MonadReader r (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods ask :: SerialScraperT str m r # local :: (r -> r) -> SerialScraperT str m a -> SerialScraperT str m a # reader :: (r -> a) -> SerialScraperT str m a #
MonadState s m => MonadState s (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods get :: SerialScraperT str m s # put :: s -> SerialScraperT str m () # state :: (s -> (a, s)) -> SerialScraperT str m a #
MonadWriter w m => MonadWriter w (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods writer :: (a, w) -> SerialScraperT str m a # tell :: w -> SerialScraperT str m () # listen :: SerialScraperT str m a -> SerialScraperT str m (a, w) # pass :: SerialScraperT str m (a, w -> w) -> SerialScraperT str m a #
MonadTrans (SerialScraperT str)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods lift :: Monad m => m a -> SerialScraperT str m a #
Monad m => MonadFail (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods fail :: String -> SerialScraperT str m a #
MonadFix m => MonadFix (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods mfix :: (a -> SerialScraperT str m a) -> SerialScraperT str m a #
MonadIO m => MonadIO (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods liftIO :: IO a -> SerialScraperT str m a #
Monad m => Alternative (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods empty :: SerialScraperT str m a # (<\|>) :: SerialScraperT str m a -> SerialScraperT str m a -> SerialScraperT str m a # some :: SerialScraperT str m a -> SerialScraperT str m [a] # many :: SerialScraperT str m a -> SerialScraperT str m [a] #
Monad m => Applicative (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods pure :: a -> SerialScraperT str m a # (<>) :: SerialScraperT str m (a -> b) -> SerialScraperT str m a -> SerialScraperT str m b # liftA2 :: (a -> b -> c) -> SerialScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m c # (>) :: SerialScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m b # (<*) :: SerialScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m a #
Functor m => Functor (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods fmap :: (a -> b) -> SerialScraperT str m a -> SerialScraperT str m b # (<$) :: a -> SerialScraperT str m b -> SerialScraperT str m a #
Monad m => Monad (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods (>>=) :: SerialScraperT str m a -> (a -> SerialScraperT str m b) -> SerialScraperT str m b # (>>) :: SerialScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m b # return :: a -> SerialScraperT str m a #
Monad m => MonadPlus (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods mzero :: SerialScraperT str m a # mplus :: SerialScraperT str m a -> SerialScraperT str m a -> SerialScraperT str m a #
MonadCont m => MonadCont (SerialScraperT str m)
Instance details Defined in Text.HTML.Scalpel.Internal.Serial Methods callCC :: ((a -> SerialScraperT str m b) -> SerialScraperT str m a) -> SerialScraperT str m a #

inSerial :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => SerialScraperT str m a -> ScraperT str m a #

Executes a SerialScraper in the context of a Scraper. The immediate children of the currently focused node are visited serially.

Primitives

stepNext :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a #

Move the cursor forward one node and execute the given scraper on the new focused node.

stepBack :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a #

Move the cursor back one node and execute the given scraper on the new focused node.

seekNext :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a #

Move the cursor forward until the given scraper is successfully able to execute on the focused node. If the scraper is never successful then the serial scraper will fail.

seekBack :: forall str (m :: Type -> Type) a. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m a #

Move the cursor backward until the given scraper is successfully able to execute on the focused node. If the scraper is never successful then the serial scraper will fail.

untilNext :: forall str (m :: Type -> Type) a b. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m b #

Create a new serial context by moving the focus forward and collecting nodes until the scraper matches the focused node. The serial scraper is then executed on the collected nodes.

The provided serial scraper is unable to see nodes outside the new restricted context.

untilBack :: forall str (m :: Type -> Type) a b. (StringLike str, Monad m) => ScraperT str m a -> SerialScraperT str m b -> SerialScraperT str m b #

Create a new serial context by moving the focus backward and collecting nodes until the scraper matches the focused node. The serial scraper is then executed on the collected nodes.