content/writing-pipelines/index.md
181 additions & 9 deletions
@@ -115,9 +115,117 @@ files for [Vim](https://www.vim.org/), [Atom](https://atom.io/),
115
115
and [Sublime](https://www.sublimetext.com/) text editors. If your favorite
116
116
editor is missing from this list, pull requests are welcome.
117
117
118
+
### Types
119
+
120
+
#### Built-in types
121
+
122
+
Martian has several supported built-in basic types:
123
+
-`int`: A 64-bit signed integer type.
124
+
-`float`: A double-precision floating point value.
125
+
-`bool`: `true` or `false`
126
+
-`string`: A utf-8 string. String literals in pipeline source code are double-quoted and recognize json-style escape sequences.
127
+
-`file`: A generic type indicating that the value will contain a path to a regular file. Always use absolute paths, as each stage will run in its own working directory.
128
+
-`path`: A generic type indicating that the value will contain a path to a directory.
129
+
-`map`: An untyped map type, with string keys and arbitrary values. For new code targeting Martian 4.0, a typed map or struct is strongly recommended instead.
130
+
131
+
Implicit conversion from `string` to `file` or `path` is permitted, as well as
132
+
from `int` to `float` (though not the reverse).
133
+
134
+
#### User-defined file types
135
+
136
+
Users may define new file types using the `filetype` directive. These behave
137
+
like `file`, and can be implicitly converted to `string` or `file`, but not from
138
+
one user file type to another. This allows pipelines to be more clear about
139
+
the format of files being passed around, and to exploit the type checking to
140
+
ensure that files are being used consistently as they are passed around.
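
The conversion rules described so far can be summarized in a small Python model. This is only an illustrative sketch, not Martian's implementation: the function name `is_assignable` and the example file types `json` and `bam` are assumptions, and any conversion the text above does not mention is treated as disallowed.

```python
# Illustrative model of the scalar conversion rules described above.
# Not Martian's actual implementation; all names here are hypothetical.

# Example user-defined file types, as if declared with `filetype`.
USER_FILE_TYPES = {"json", "bam"}


def is_assignable(src: str, dst: str) -> bool:
    """Return True if a value of type `src` may be bound to type `dst`."""
    if src == dst:
        return True
    # string -> file or path is an implicit conversion.
    if src == "string" and dst in ("file", "path"):
        return True
    # int -> float is allowed, but not the reverse.
    if src == "int" and dst == "float":
        return True
    # A user file type converts to string or file, never to another user type.
    if src in USER_FILE_TYPES and dst in ("string", "file"):
        return True
    return False
```

With this model, `is_assignable("json", "file")` holds while `is_assignable("json", "bam")` does not, which is the check that keeps differently formatted files from being mixed up.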
141
+
142
+
#### Structured data types (Martian 4.0 preview)
143
+
144
+
A `struct` is analogous to a struct in C, a named
145
+
tuple in Python, or an object in JavaScript. A struct type is declared like this:
146
+
```
147
+
struct MyType(
148
+
int foo,
149
+
float bar,
150
+
)
151
+
```
152
+
Members of a struct can be extracted using the familiar `.` syntax, e.g.
153
+
```
154
+
call FOO(
155
+
foo = STRUCT_OUTPUT.struct.foo,
156
+
)
157
+
```
158
+
The output of a stage or pipeline is always a `struct`. Because of this, the
159
+
name of a stage or pipeline can be used as a type, to indicate a structure with
160
+
the same members and types as the outputs of the stage or pipeline.
161
+
162
+
Martian structures support a form of "[duck typing](https://en.wikipedia.org/wiki/Duck_typing)".
163
+
If one has a struct `MyType` as declared above, and another type
164
+
```
165
+
struct MyBiggerType(
166
+
int foo,
167
+
int bar,
168
+
string baz,
169
+
)
170
+
```
171
+
then a value of type `MyBiggerType` may be used for the input to a stage or
172
+
pipeline which asks for a `MyType`. This is because for every field in `MyType`
173
+
there is a field with the same name in `MyBiggerType`, and that field in
174
+
`MyBiggerType` has a type that is assignable to the type for that field in
175
+
`MyType`. Because of this, one can easily take a subset of the data from a
176
+
`struct` with only the values one actually needs. Values with struct types may
177
+
also always be used for untyped map values.
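
The duck-typing rule above can be modeled in a few lines of Python. This is a sketch under assumptions, not Martian's implementation: struct types are represented as plain dictionaries from field name to type name, and the helper names `scalar_assignable` and `struct_assignable` are hypothetical.

```python
# Illustrative model of struct "duck typing"; not Martian's implementation.
# Struct types are modeled as dicts mapping field name -> type name.

MY_TYPE = {"foo": "int", "bar": "float"}
MY_BIGGER_TYPE = {"foo": "int", "bar": "int", "baz": "string"}


def scalar_assignable(src: str, dst: str) -> bool:
    """Only identity and the int -> float widening are modeled here."""
    return src == dst or (src == "int" and dst == "float")


def struct_assignable(src: dict, dst: dict) -> bool:
    """True if every field of `dst` exists in `src` with an assignable type."""
    return all(
        name in src and scalar_assignable(src[name], dst_type)
        for name, dst_type in dst.items()
    )
```

Here `struct_assignable(MY_BIGGER_TYPE, MY_TYPE)` holds because `foo` matches and the `int` field `bar` widens to `float`. The reverse direction fails, since `MyType` has no `baz` field and its `float` field `bar` cannot narrow to `int`.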
178
+
179
+
As an additional convenience, Martian supports a "wildcard expansion" of a
180
+
struct value when calling a stage, e.g.
181
+
```
182
+
call STAGE2(
183
+
foo = self.foo,
184
+
* = STAGE1,
185
+
)
186
+
```
187
+
This is equivalent to `input = STAGE1.output` for every output of `STAGE1` that
188
+
is an input to `STAGE2`. To prevent ambiguity, only one wildcard expansion is
189
+
allowed for each call, and it is an error if one of the outputs of `STAGE1` was
190
+
already assigned in the input call explicitly (e.g. `foo` in the example).
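
One way to picture the expansion is as a rewrite of the call's bindings before they are checked. The following Python sketch models that rewrite. It is not how Martian implements it: the function `expand_wildcard` and its parameters are hypothetical, and bindings are represented as simple strings.

```python
# Illustrative model of wildcard expansion; not Martian's implementation.

def expand_wildcard(explicit, source_stage, source_outputs, callee_inputs):
    """Expand `* = SOURCE` into explicit `input = SOURCE.output` bindings.

    explicit:       bindings written out in the call, name -> expression
    source_stage:   name of the stage on the right-hand side of `* = ...`
    source_outputs: output parameter names of that stage
    callee_inputs:  input parameter names of the stage being called
    """
    bindings = dict(explicit)
    for name in source_outputs:
        if name not in callee_inputs:
            continue  # outputs the callee does not accept are skipped
        if name in explicit:
            # The same input is bound twice, which the text calls an error.
            raise ValueError(
                f"{name!r} is bound explicitly and also by * = {source_stage}"
            )
        bindings[name] = f"{source_stage}.{name}"
    return bindings
```

For a call that binds `foo = self.foo` explicitly, a `STAGE1` with outputs `bar` and `baz` expands cleanly, while a `STAGE1` that also has an output named `foo` raises the error.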
191
+
192
+
#### Collection types
193
+
194
+
Martian also supports collections of values as arrays or typed maps. These are
195
+
declared using a syntax that is familiar to users of C-style languages. Arrays
196
+
are declared as, for example, `int[]`. Typed maps (available in the Martian 4.0
197
+
preview) always have string keys, and
198
+
are declared as, for example, `map<int>`. These can be combined, as in
199
+
`map<int[][]>[]`.
200
+
201
+
In order to prevent confusing data flows, maps cannot be directly nested. That
202
+
is, `map<map<int>>` is not permitted, nor is it permitted to nest untyped maps,
203
+
e.g. `map<map>`. It _is_ permitted to have a map of structs, and those structs
204
+
may contain further maps.
205
+
206
+
Because the type system has no way to enforce the length of an array or the keys
207
+
of a map, there is no support for indexing into one. If one knows the keys
208
+
ahead of time, use a struct.
209
+
210
+
A struct can be assigned to a typed map value if every field in the struct has a
211
+
type that can be assigned to the type of the map. For example a struct with
212
+
only `int` and `float` fields may be assigned to a value of type `map<float>`.
213
+
214
+
Typed maps may be converted to untyped maps, and `map<T>` may be converted to
215
+
`map<U>` if type `T` is convertible to type `U`. The same applies for arrays,
216
+
e.g. converting `T[]` to `U[]`.
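
The collection conversions above compose recursively, which a short Python model makes concrete. This is a sketch, not Martian's implementation. The type encoding is an assumption made for illustration: a string is a basic type (with `"map"` as the untyped map), `("array", T)` stands for `T[]`, `("map", T)` stands for `map<T>`, and a dictionary of field names to types stands for a struct.

```python
# Illustrative model of collection conversions; not Martian's implementation.
# Types: a str is a basic type ("map" is the untyped map), ("array", T) is
# T[], ("map", T) is map<T>, and a dict of field name -> type is a struct.

def assignable(src, dst) -> bool:
    if src == dst:
        return True
    if src == "int" and dst == "float":
        return True
    if isinstance(src, tuple) and isinstance(dst, tuple):
        # T[] -> U[] and map<T> -> map<U> when T converts to U.
        return src[0] == dst[0] and assignable(src[1], dst[1])
    if isinstance(src, tuple) and src[0] == "map" and dst == "map":
        return True  # typed map -> untyped map
    if isinstance(src, dict) and dst == "map":
        return True  # struct -> untyped map
    if isinstance(src, dict) and isinstance(dst, tuple) and dst[0] == "map":
        # struct -> map<T> when every field converts to T.
        return all(assignable(t, dst[1]) for t in src.values())
    return False
```

For example, a struct with one `int` field and one `float` field is assignable to `("map", "float")`, and `("map", ("array", "int"))` is assignable to `("map", ("array", "float"))`.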
217
+
218
+
A very important convenience is "projection" through structs. Using the
219
+
`MyType` example struct from the previous section, if we have a value `FOO` of
220
+
type `map<MyType[]>` then `FOO.bar` has type `map<float[]>`.
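
Projection can be thought of as pushing the field access down through each layer of collection until it reaches the struct. The Python sketch below models both the resulting type and the resulting value. It is illustrative only: `project_type` and `project` are hypothetical names, and the tuple and dictionary encoding of types is an assumption, not Martian's representation.

```python
# Illustrative model of projection through collections; not Martian's code.
# Types: ("array", T) is T[], ("map", T) is map<T>, a dict is a struct.

MY_TYPE = {"foo": "int", "bar": "float"}


def project_type(type_, field):
    """Type of `value.field` when `value` has type `type_`."""
    if isinstance(type_, tuple):
        kind, inner = type_
        return (kind, project_type(inner, field))
    return type_[field]  # reached the struct


def project(value, type_, field):
    """Extract `field` from every struct inside the arrays and maps."""
    if isinstance(type_, tuple):
        kind, inner = type_
        if kind == "array":
            return [project(item, inner, field) for item in value]
        # Otherwise kind is "map": keep the keys, project each value.
        return {key: project(item, inner, field) for key, item in value.items()}
    return value[field]
```

With a type of `("map", ("array", MY_TYPE))`, `project_type` yields `("map", ("array", "float"))` for the field `bar`, matching the `map<float[]>` result described above.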
221
+
118
222
### Composability
119
223
120
-
Pipelines specify input and output parameters the same way stages do, so they may themselves also act as stages. This allows for the composition of an arbitrary mix of individual stages and pipelines into still larger pipelines. We refer to pipelines as "subpipelines" when they are composed into other pipelines.
224
+
Pipelines specify input and output parameters the same way stages do, so they
225
+
may themselves also act as stages. This allows for the composition of an
226
+
arbitrary mix of individual stages and pipelines into still larger pipelines.
227
+
We refer to pipelines as "subpipelines" when they are composed into other
228
+
pipelines.
121
229
122
230
Because parameter binding is done by stage name, pipelines cannot call the same
123
231
stage or sub-pipeline twice without aliasing it like so:
@@ -148,6 +256,39 @@ pipeline ADD_KEYS(
148
256
}
149
257
```
150
258
259
+
## Top-level file outputs
260
+
261
+
When a top-level pipeline completes, any outputs with file type are moved into
262
+
the pipestance directory's `outs` subdirectory. Symbolic links are added to
263
+
the original locations of those files in the stage output directories.
264
+
265
+
For an output with `file` or `path` type, the name of a file in the top-level
266
+
output directory will be the name of the output parameter of the pipeline. If
267
+
it is a user-defined file type, e.g. json, then the type will be appended to
268
+
the name as an extension, e.g. `.json`.
269
+
270
+
If a pipeline is defined as, for example,
271
+
```
272
+
pipeline PIPE(
273
+
out json foo "help text" "special_file",
274
+
)
275
+
```
276
+
then the string "help text" will be displayed in the console as a label for
277
+
the output file, and the default filename (which would be `foo.json`) is
278
+
overridden to `special_file`. These annotations apply when defining struct
279
+
types as well.
280
+
281
+
In Martian 4.0, if an output has a struct type, then in the top-level `outs`
282
+
directory there will be a _directory_ for that value, containing files from
283
+
within that structure. Nested structures are handled recursively as deeper
284
+
directories.
285
+
286
+
An array of files will become a directory, with files named for the array index,
287
+
e.g. for `json[] foo` there will be `foo/1.json` and so on. For typed maps,
288
+
`map<json>`, the outputs would be `foo/<key>.json` for each key in the map.
289
+
Arrays or typed maps of structs containing files end up as nested directories
290
+
as one would expect.
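
The naming rules in this section can be sketched as a pair of small Python functions. This is a rough model, not Martian's implementation. The function names are hypothetical, the array indices or map keys are supplied by the caller (the sketch makes no assumption about where array numbering starts), and dropping the extension for collections of plain `file` or `path` values is an extrapolation from the scalar rule.

```python
# Illustrative model of top-level output naming; not Martian's implementation.

def _suffix(type_name):
    """Extension appended for a type: none for file/path, else `.type`."""
    return "" if type_name in ("file", "path") else "." + type_name


def out_name(param, type_name, override=None):
    """Name under `outs` for a single file-typed output parameter."""
    if override is not None:
        return override  # an explicit output name replaces the default
    return param + _suffix(type_name)


def collection_out_names(param, type_name, keys):
    """Names under `outs` for an array or typed map of files.

    `keys` holds the array indices or the map keys, as given by the caller.
    """
    return [f"{param}/{key}{_suffix(type_name)}" for key in keys]
```

For the `PIPE` example above, `out_name("foo", "json")` gives the default `foo.json`, and `out_name("foo", "json", "special_file")` gives the overridden `special_file`.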
291
+
151
292
## Organizing Code
152
293
153
294
### MRO Files
@@ -157,7 +298,9 @@ by convention they are written in files that have an ```.mro``` extension.
157
298
158
299
### Preprocessing with @include
159
300
160
-
Martian supports lexical preprocessing with an ```@include``` directive, which takes the path to another MRO file as an argument. This directive is evaluated by splicing the contents of the included file into the file where the directive is given, replacing the directive itself. This evaluation is recursive, and Martian keeps track of the inclusion tree in order to be able to report errors using per-source file line numbers.
301
+
Martian supports organizing one's pipeline definitions into multiple files which
302
+
can use an `@include` directive to import the stages, pipelines, and types
303
+
defined in other files.
161
304
162
305
`_my_stages.mro`
163
306
@@ -193,7 +336,12 @@ pipeline DUPLICATE_FINDER(
193
336
194
337
### Stage Code vs Pipeline Code
195
338
196
-
By convention, the ```@include``` directive allows the developer to organize code into header files, although there is no formal distinction between header and non-header MRO files in Martian. Typically, stages that are logically grouped together are declared in one file, for example ```_sorting_stages.mro```, and that file would be included into another MRO file that declares a pipeline that calls these included stages. By convention, MRO files containing stage declarations should be named with the suffix ```_stages```.
339
+
The ```@include``` directive allows the developer to organize code. Typically,
340
+
stages that are logically grouped together are declared in one file, for example
341
+
```_sorting_stages.mro```, and that file would be included into another MRO
342
+
file that declares a pipeline that calls these included stages. By convention,
343
+
MRO files containing stage declarations should be named with the suffix
344
+
```_stages```.
197
345
198
346
### Martian Project Directory Structure
199
347
@@ -226,7 +374,10 @@ martian_project/
226
374
227
375
## Formatting Code
228
376
229
-
Martian includes a canonical code formatting utility called `mrf`. It parses your MRO code into its abstract syntax tree and re-emits the code with canonical whitespacing. In particular, `mrf` performs intelligent column-wise alignment of parameter fields so that this:
377
+
Martian includes a canonical code formatting utility called `mrf`. It parses
378
+
your MRO code into its abstract syntax tree and re-emits the code with
379
+
canonical whitespace. In particular, `mrf` performs intelligent column-wise
380
+
alignment of parameter fields so that this:
230
381
231
382
~~~~
232
383
stage SORT_ITEMS (in txt unsorted,
@@ -246,17 +397,38 @@ stage SORT_ITEMS(
246
397
)
247
398
~~~~
248
399
249
-
`mrf` is an "opinionated" formatter, inspired by tools like `gofmt`, therefore we will borrow
250
-
[their explanation](https://blog.golang.org/go-fmt-your-code) of the benefits of canonical code formatting:
400
+
`mrf` is an "opinionated" formatter, inspired by tools like `gofmt`, so
401
+
we will borrow [their explanation](https://blog.golang.org/go-fmt-your-code) of
402
+
the benefits of canonical code formatting:
251
403
252
404
-**Easier to write**: never worry about minor formatting concerns while hacking away.
253
405
-**Easier to read**: when all code looks the same you need not mentally convert others' formatting style into something you can understand.
254
406
-**Easier to maintain**: mechanical changes to the source don't cause unrelated changes to the file's formatting; diffs show only the real changes.
255
407
-**Uncontroversial**: never have a debate about spacing or brace position ever again!
256
408
257
-
`mrf` takes a list of MRO filenames as arguments. By default, it will output the formatted code back to `stdout`. If given the `--rewrite` option, it will write the formatted code back into the original files. If given the `--all` option, it will rewrite all MRO files found in your `MROPATH`. For consistency of your MRO codebase, consider configuring editor save-hooks or git commit-hooks that run `mrf --rewrite` or `mrf --all`.
258
-
259
-
`mrf` does not support any arguments that affect the formatting, otherwise it would not be canonical!
409
+
`mrf` takes a list of MRO filenames as arguments. By default, it will output
410
+
the formatted code back to `stdout`. If given the `--rewrite` option, it will
411
+
write the formatted code back into the original files. If given the `--all`
412
+
option, it will rewrite all MRO files found in your `MROPATH`. For consistency
413
+
of your MRO codebase, consider configuring editor save-hooks or git
414
+
commit-hooks that run `mrf --rewrite` or `mrf --all`.
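
As a starting point for such a commit hook, the following Python sketch formats the staged MRO files and re-stages them. It rests on several assumptions: `mrf` and `git` are on the `PATH`, `mrf` accepts the `--rewrite` option ahead of the file names, and the function names are hypothetical. Adapt it to your own repository layout before relying on it.

```python
# Sketch of a git pre-commit helper that formats staged MRO files.
# Assumes `mrf` and `git` are on PATH; function names are hypothetical.
import subprocess


def mro_files(paths):
    """Keep only paths with the conventional .mro extension."""
    return [p for p in paths if p.endswith(".mro")]


def mrf_command(paths):
    """Build the `mrf --rewrite` command line for the given files."""
    return ["mrf", "--rewrite", *paths]


def format_staged_files():
    """Run `mrf --rewrite` on staged .mro files, then re-stage them."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    files = mro_files(staged)
    if files:
        subprocess.run(mrf_command(files), check=True)
        subprocess.run(["git", "add", "--", *files], check=True)
```

Calling `format_staged_files()` from an executable `.git/hooks/pre-commit` script would apply the formatter on every commit. Limiting the run to staged files keeps the hook fast compared with `mrf --all`.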
415
+
416
+
`mrf` does not support any arguments that affect the formatting; otherwise it
417
+
would not be canonical!
418
+
419
+
If you run `mrf` with the `--includes` flag, it will (attempt to) fix up
420
+
`@include` directives. Specifically, if a pipeline in an `mro` source file
421
+
uses a stage, it will ensure that the file defining that stage is _directly_
422
+
included, and that files which are not directly depended on are not included.
423
+
It will only add `@include` statements referring to files in the root of your
424
+
`MROPATH` or in the transitive closure of the existing includes. The reason
425
+
for the convention of direct inclusions is the same as the reasons explained
426
+
in the [clang include-what-you-use][] tool - briefly, if a file you depend
427
+
on stops depending on another file, and you only included it transitively,
428
+
then your pipeline will fail to compile if the intermediate pipeline removes