Safe Haskell | None |
---|---|
Language | Haskell2010 |
Data.Hadoop.SequenceFile
Description
This module allows for lazy decoding of hadoop sequence files from a lazy
L.ByteString
. In the future an incremental API using strict
ByteString
will be provided, but for now if you need that
level of control you need to use the attoparsec parsers in
Data.Hadoop.SequenceFile.Parser directly.
Basic Examples
import Control.Applicative ((<$>)) import qualified Data.ByteString.Lazy as L import qualified Data.Foldable as F import Data.Int (Int32) import Data.Text (Text) import qualified Data.Text.IO as T import Data.Hadoop.SequenceFile -- | Print all the keys in a sequence file. printKeys :: FilePath -> IO () printKeys path = do bs <- L.readFile path let records = decode bs :: Stream (RecordBlock Text Int32) F.for_ records $ \rb -> do F.mapM_ T.putStrLn (rbKeys rb) -- | Count the number of records in a sequence file. recordCount :: FilePath -> IO () recordCount path = do bs <- L.readFile path let records = decode bs :: Stream (RecordBlock Text Int32) print $ F.sum $ rbCount <$> records
Integration with Conduit
sourceRecords :: MonadIO m => FilePath -> Source m (RecordBlock Text ByteString) sourceRecords path = do bs <- liftIO (L.readFile path) F.traverse_ yield (decode bs)
- data Stream a
- class (Collection a ~ c a) => Writable c a where
- data RecordBlock k v = RecordBlock {
- rbCount :: Int
- rbKeys :: Collection k
- rbValues :: Collection v
- decode :: (Writable ck k, Writable cv v) => L.ByteString -> Stream (RecordBlock k v)
Documentation
A lazy stream of values.
class (Collection a ~ c a) => Writable c a where Source
Equivalent to the java interface org.apache.hadoop.io.Writable. All serializable key or value types in the Hadoop Map-Reduce framework implement this interface.
data RecordBlock k v Source
A block of key/value pairs. The key at index i always relates to the value at index i. Both vectors will always be the same size.
Constructors
RecordBlock | |
Fields
|
decode :: (Writable ck k, Writable cv v) => L.ByteString -> Stream (RecordBlock k v) Source
Decode a lazy L.ByteString
in to a stream of record blocks.