Using threaded runtime destroys performance of C calls #115

@l29ah

Description

‰ ghc -O2 -threaded --make inline-c-crit.hs && ./inline-c-crit +RTS -N        
Linking inline-c-crit ...
benchmarking haskell +
time                 7.679 ns   (7.570 ns .. 7.774 ns)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 7.594 ns   (7.495 ns .. 7.733 ns)
std dev              397.9 ps   (300.3 ps .. 528.6 ps)
variance introduced by outliers: 76% (severely inflated)

benchmarking c +
time                 182.9 ns   (182.0 ns .. 184.6 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 182.7 ns   (182.1 ns .. 183.8 ns)
std dev              2.572 ns   (1.813 ns .. 3.817 ns)
variance introduced by outliers: 15% (moderately inflated)

benchmarking cu +
time                 78.18 ns   (76.60 ns .. 79.59 ns)
                     0.998 R²   (0.998 R² .. 1.000 R²)
mean                 77.06 ns   (76.43 ns .. 78.05 ns)
std dev              2.792 ns   (1.559 ns .. 4.588 ns)
variance introduced by outliers: 56% (severely inflated)

benchmarking c block +
time                 195.8 ns   (188.0 ns .. 203.0 ns)
                     0.993 R²   (0.990 R² .. 0.999 R²)
mean                 189.3 ns   (186.6 ns .. 193.3 ns)
std dev              11.25 ns   (7.601 ns .. 14.98 ns)
variance introduced by outliers: 76% (severely inflated)

benchmarking cu block +
time                 76.09 ns   (75.35 ns .. 76.99 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 78.40 ns   (76.93 ns .. 81.21 ns)
std dev              6.560 ns   (2.949 ns .. 10.09 ns)
variance introduced by outliers: 88% (severely inflated)

A call overhead of 76 ns on a modern 3.5 GHz i7 is just insane. That's about 266 cycles!
Running without -N reduces the overhead considerably (though the calls are still slower than the Haskell version), but then I can't meaningfully use threads in the application.
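
As a sanity check on the cycle count, the conversion can be sketched in a few lines. The helper name `nsToCycles` is made up for this illustration; it relies only on the fact that 1 GHz is one cycle per nanosecond.

```haskell
-- Convert a latency in nanoseconds into CPU cycles at a given clock rate.
-- 1 GHz is one cycle per nanosecond, so cycles = nanoseconds * GHz.
-- (nsToCycles is a hypothetical helper, not part of any library.)
nsToCycles :: Double -> Double -> Double
nsToCycles ghz ns = ns * ghz

main :: IO ()
main = do
  -- The "cu block +" case, rounded to 76 ns, on a 3.5 GHz core.
  print (round (nsToCycles 3.5 76) :: Int)   -- prints 266
  -- The pure Haskell addition from the same run, for comparison.
  print (round (nsToCycles 3.5 7.679) :: Int) -- prints 27
```

So the unsafe call costs a few hundred cycles per invocation, roughly ten times the pure Haskell addition in this run.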

Code:

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE QuasiQuotes #-}
import Criterion.Main
import qualified Language.C.Inline as C
import qualified Language.C.Inline.Unsafe as CU
import Data.Word
import System.IO.Unsafe

C.include "<stdint.h>"

type Fun = Word32 -> Word32 -> Word32

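-- Baseline: plain Haskell addition.
fun :: Fun
fun = (+)

-- Pure C expression through the default inline-c quasi-quoter.
cfun :: Fun
cfun x y = [C.pure| uint32_t { $(uint32_t x) + $(uint32_t y) }|]

-- Same expression through the Unsafe quasi-quoter (unsafe foreign call).
cufun :: Fun
cufun x y = [CU.pure| uint32_t { $(uint32_t x) + $(uint32_t y) }|]

-- C block run in IO and wrapped with unsafePerformIO, default quasi-quoter.
cblockfun :: Fun
cblockfun x y = unsafePerformIO [C.block| uint32_t { return $(uint32_t x) + $(uint32_t y); }|]

-- C block run in IO and wrapped with unsafePerformIO, Unsafe quasi-quoter.
cublockfun :: Fun
cublockfun x y = unsafePerformIO [CU.block| uint32_t { return $(uint32_t x) + $(uint32_t y); }|]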

main :: IO ()
main = defaultMain
  [ bgroup ""
    [ bench "haskell +" $ nf (fun 1) 1
    , bench "c +" $ nf (cfun 1) 1
    , bench "cu +" $ nf (cufun 1) 1
    , bench "c block +" $ nf (cblockfun 1) 1
    , bench "cu block +" $ nf (cublockfun 1) 1
    ]
  ]
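
For comparison, the same kind of measurement can be sketched without inline-c. This assumes that the gap between the `C` and `CU` variants comes from safe versus unsafe foreign calls (the `Language.C.Inline.Unsafe` quasi-quoters generate unsafe calls). The names `absSafe` and `absUnsafe` are made up for this illustration, and libc's `abs` merely stands in for the addition above. It is a sketch, not a confirmed diagnosis.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Criterion.Main
import Foreign.C.Types (CInt)

-- Safe import: the runtime does extra bookkeeping around the call so that
-- other Haskell threads can keep running while the C code executes.
foreign import ccall safe "stdlib.h abs" absSafe :: CInt -> CInt

-- Unsafe import: a plain C call without that bookkeeping.
foreign import ccall unsafe "stdlib.h abs" absUnsafe :: CInt -> CInt

main :: IO ()
main = defaultMain
  [ bench "safe abs" $ whnf absSafe (-1)
  , bench "unsafe abs" $ whnf absUnsafe (-1)
  ]
```

Building this with the same `ghc -O2 -threaded` command and running it with and without `+RTS -N` would show whether the overhead is specific to inline-c or comes from the foreign call itself.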
