== Rule-based task generators (Make-like)

This chapter illustrates how to perform common transformations used during the build phase through the use of rule-based task generators.

=== Declaration and usage

Rule-based task generators are a particular category of task generators producing exactly one task (transformation) at a time.

The following example presents a simple example of a task generator producing the file 'foobar.txt' from the project file 'wscript' by executing a copy (the command `cp`). Let's create a new project in the folder '/tmp/rule/' containing the following 'wscript' file:

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	bld( <1>
		rule   = 'cp ${SRC} ${TGT}', <2>
		source = 'wscript', <3>
		target = 'foobar.txt', <4>
	)
---------------

<1> To instantiate a new task generator, remember that all arguments have the form 'key=value'
<2> The attribute 'rule' is mandatory here. It represents the command to execute in a readable manner (more on this in the next chapters).
<3> Source files, either in a space-delimited string, or in a list of python strings
<4> Target files, either in a space-delimited string, or in a list of python strings

Upon execution, the following output will be observed:

[source,shishell]
---------------
$ waf distclean configure build -v
'distclean' finished successfully (0.000s)
'configure' finished successfully (0.021s)
Waf: Entering directory `/tmp/rule/out'
[1/1] foobar.txt: wscript -> out/default/foobar.txt <1>
16:24:21 runner system command ->  cp ../wscript default/foobar.txt <2>
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.016s)

$ tree
.
|-- out
|   |-- c4che
|   |   |-- build.config.py
|   |   `-- default.cache.py
|   |-- config.log
|   `-- default
|       `-- foobar.txt
`-- wscript

$ waf <3>
Waf: Entering directory `/tmp/rule/out'
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.006s)

$ echo " " >> wscript <4>

$ waf
Waf: Entering directory `/tmp/rule/out'
[1/1] foobar.txt: wscript → out/default/foobar.txt <5>
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.013s)
---------------

<1> In the first execution, the target is correctly created
<2> Command-lines are only displayed in 'verbose mode' by using the option '-v'
<3> The target are up-to-date, there is nothing to do
<4> Modify the source file in place by appending a space character
<5> Since the source has changed, the target is created once again.

The target is created once again whenever the source files or the rule change. This is achieved by computing a signature for the targets, and storing that signature between executions. By default, the signature is computed by hashing the rule and the source files (MD5 by default).

NOTE: The task (or transformation) are only executed during the build phase, after all build functions have been read

=== Rule functions

Rules may be given as expression strings or as python function. Let's modify the previous project file with a python function:

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	def run(task): <1>
		print(' → source is ' + task.generator.source) <2>

		src = task.inputs[0].srcpath(task.env) <3>
		tgt = task.outputs[0].bldpath(task.env) <4>

		import Utils
		cmd = 'cp %s %s' % (src, tgt)
		print(cmd)
		return Utils.exec_command(cmd) <5>

	bld(
		rule   = run, <6>
		source = 'wscript',
		target = 'same.txt',
	)
---------------

<1> Rule functions take the task instance as parameter.
<2> Task instances may access their task generator through the attribute 'generator'
<3> Sources and targets are represented internally as Node objects bound to the task instance.
<4> Commands are executed from the root of the build directory. Node methods such as 'bldpath' ease the command line creation.
<5> Utils.exec_command(...) is a wrapper around subprocess.Popen(...) from the Python library. Passing a string will execute the command through the system shell (use lists to disable this behaviour). The return code for a rule function must be non-0 to indicate a failure.
<6> Use a function instead of a string expression

The execution trace will be similar to the following:

[source,shishell]
---------------
$ waf distclean configure build
'distclean' finished successfully (0.001s)
'configure' finished successfully (0.001s)
Waf: Entering directory `/tmp/rule/out'
[1/1] same.txt: wscript -> out/default/same.txt
 → source is wscript
cp ../wscript default/same.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.010s)
---------------

The rule function must return '0' to indicate success, and must generate the files corresponding to the outputs. The rule function must also access the task object in a read-only manner, and avoid node creation or attribute modification.

NOTE: The string expression 'cp $\{SRC} $\{TGT}' from the previous example is converted internally to a function similar to 'run'.

NOTE: While a string expression may execute only one system command, functions may execute various commands at once.

WARNING: Due to limitations in the cPython interpreter, only functions defined in python modules can be hashed. This means that changing a function will trigger a rebuild only if it is defined in a waf tool (not in wscript files).

=== Shell usage

The attribute 'shell' is used to enable the system shell for command execution. A few points are worth keeping in mind when declaring rule-based task generators:


. The Waf tools do not use the shell for executing commands
. The shell is used by default for user commands and custom task generators
. String expressions containing the following symbols \'>', \'<' or \'&' cannot be transformed into functions to execute commands without a shell, even if told to
. In general, it is better to avoid the shell whenever possible to avoid quoting problems (paths having blank characters in the name for example)
. The shell is creating a performance penalty which is more visible on win32 systems.

Here is an example:

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	bld(rule='cp ${SRC} ${TGT}', source='wscript', target='f1.txt', shell=False)
	bld(rule='cp ${SRC} ${TGT}', source='wscript', target='f2.txt', shell=True)
---------------

Upon execution, the results will be similar to the following:

[source,shishell]
---------------
waf distclean configure build --zones=runner,action <1>
'distclean' finished successfully (0.004s)
'configure' finished successfully (0.001s)
Waf: Entering directory `/tmp/rule/out'
23:11:23 action <2>
def f(task):
	env = task.env
	wd = getattr(task, 'cwd', None)
	def to_list(xx):
		if isinstance(xx, str): return [xx]
		return xx
	lst = []
	lst.extend(['cp'])
	lst.extend([a.srcpath(env) for a in task.inputs])
	lst.extend([a.bldpath(env) for a in task.outputs])
	lst = [x for x in lst if x]
	return task.exec_command(lst, cwd=wd)

23:11:23 action
def f(task): <3>
	env = task.env
	wd = getattr(task, 'cwd', None)
	p = env.get_flat
	cmd = ''' cp %s %s ''' % (" ".join([a.srcpath(env) for a in task.inputs]),
		" ".join([a.bldpath(env) for a in task.outputs]))
	return task.exec_command(cmd, cwd=wd)

[1/2] f1.txt: wscript -> out/default/f1.txt
23:11:23 runner system command -> ['cp', '../wscript', 'default/f1.txt'] <4>
[2/2] f2.txt: wscript -> out/default/f2.txt
23:11:23 runner system command ->  cp ../wscript default/f2.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.017s)
---------------

<1> The 'debugging zones' enable the display of specific debugging information (comma-separated values) for the string expression conversion, and 'runner' for command execution
<2> String expressions are converted to functions (here, without the shell).
<3> Command execution by the shell. Notice the heavy use of string concatenation.
<4> Commands to execute are displayed by calling 'waf --zones=runner'. When called without the shell, the arguments are displayed as a list.

NOTE: Whenever possible, avoid using the shell to improve both performance and maintainability

=== Inputs and outputs

Source and target arguments are optional for make-like task generators, and may point at one or several files at once. Here are a few examples:

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	bld( <1>
		rule   = 'cp ${SRC} ${TGT[0].abspath(env)} && cp ${SRC} ${TGT[1].abspath(env)}',
		source = 'wscript',
		target = 'f1.txt f2.txt',
		shell  = True
	)

	bld( <2>
		source = 'wscript',
		rule   = 'echo ${SRC}'
	)

	bld( <3>
		target = 'test.k3',
		rule   = 'echo "test" > ${TGT}',
	)

	bld( <4>
		rule   = 'echo 1337'
	)

	bld( <5>
		rule   = "echo 'task always run'",
		always = True
	)
---------------

<1> Generate 'two files' whenever the input or the rule change. Likewise, a rule-based task generator may have multiple input files.
<2> The command is executed whenever the input or the rule change. There are no declared outputs.
<3> No input, the command is executed whenever it changes
<4> No input and no output, the command is executed only when the string expression changes
<5> No input and no output, the command is executed each time the build is called

For the record, here is the output of the build:

[source,shishell]
---------------
$ waf distclean configure build
'distclean' finished successfully (0.002s)
'configure' finished successfully (0.093s)
Waf: Entering directory `/tmp/rule/out'
[1/5] echo 1337:
1337
[2/5] echo 'task always run':
[3/5] echo ${SRC}: wscript
../wscript
[4/5] f1.txt f2.txt: wscript -> out/default/f1.txt out/default/f2.txt
task always run
[5/5] test.k3:  -> out/default/test.k3
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.049s)

$ waf
Waf: Entering directory `/tmp/rule/out'
[2/5] echo 'task always run':
task always run
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.014s)
---------------

=== Sharing data

Data sharing is performed through the configuration set. The following example illustrates how to use it.

[source,python]
---------------
top = '.'
out = 'out'

import Options

def configure(conf):
	print('prefix: %r' % conf.env.PREFIX)
	print('jobs:   %r' % Options.options.jobs)

def build(bld):
	bld(
		target = 't1.txt',
		rule   = 'echo ${PREFIX} > ${TGT}' <1>
	)

	obj = bld(
		target = 't2.txt',
		rule   = 'echo ${TEST_VAR} > ${TGT}', <2>
	)
	obj.env.TEST_VAR <3>= str(Options.options.jobs) <4>

	bld(
		rule = 'echo "hey"',
		vars = ['TEST_VAR'], <5>
		env  = obj.env <6>
	)
---------------

<1> The 'PREFIX' is one of the few predefined variables. It is necessary for computing the default installation path.
<2> The following rule will create the file 't2.txt' with the contents of 'TEST_VAR'
<3> The value of 'TEST_VAR' will be defined now
<4> Use the value of a predefined command-line option (the jobs control the amount of commands which may be executed in parallel)
<5> By default, the variables used in string expressions '$\{...}' are extracted automatically and used as dependencies (rebuild the targets when the value change). They are given manually in this case.
<6> Set the base environment. The variable TEST_VAR will be used for the dependency here.

By turning the debugging flags on, it will be easier to understand what is happening during the build:

[source,shishell]
---------------
$ waf distclean configure build --zones=runner,action
'distclean' finished successfully (0.003s)
prefix: '/usr/local' <1>
jobs:   2
'configure' finished successfully (0.001s)
Waf: Entering directory `/tmp/rule/out'
15:21:29 action
def f(task):
	env = task.env
	wd = getattr(task, 'cwd', None)
	p = env.get_flat
	cmd = ''' echo %s > %s ''' % (p('PREFIX'), <2>
		" ".join([a.bldpath(env) for a in task.outputs]))
	return task.exec_command(cmd, cwd=wd)

[...] <3>

[1/3] t2.txt:  -> out/default/t2.txt
15:21:29 runner system command ->  echo 2 > default/t2.txt
[2/3] t1.txt:  -> out/default/t1.txt
15:21:29 runner system command ->  echo /usr/local > default/t1.txt <4>
[2/3] echo "hey":
18:05:26 runner system command ->  echo "hey"
hey
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.052s)

$ waf
Waf: Entering directory `/tmp/rule/out'
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.007s)

$ waf --jobs=17 --zones=runner
Waf: Entering directory `/tmp/rule/out'
[1/3] t2.txt:  -> out/default/t2.txt
15:23:24 runner system command ->  echo 17 > default/t2.txt <5>
[2/3] echo "hey": <6>
hey
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.014s)
---------------

<1> The default values for prefix and jobs.
<2> The function generated from the string expression will use the 'PREFIX' value.
<3> Some output was removed.
<4> The expression '$\{PREFIX}' was substituted by its value.
<5> The target is created whenever the value of a variable changes, even though the string expression has not changed.
<6> The variable 'TEST_VAR' changed, so the command was executed again.

=== Execution order and dependencies

Although task generators will create the tasks in the relevant order, tasks are executed in parallel and the compilation order may not be the order expected. In the example from the previous section, the target 't2.txt' was processed before the target 't1.txt'. We will now illustrate two important concepts used for the build specification:

. order: sequential constraints between the tasks being executed (here, the commands)
. dependency: executing a task if another task is executed

For illustation purposes, let's create the following chain 'wscript' → 'w1.txt' → 'w2.txt'. The order constraint is given by the attribute 'after' which references a task class name. For rule-based task generators, the task class name is bound to the attribute 'name'

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	bld(
		rule   = 'cp ${SRC} ${TGT}',
		source = 'wscript',
		target = 'w1.txt',
		name   = 'w1',
	)

	bld(
		rule   = 'cp ${SRC} ${TGT}',
		source = 'w1.txt',
		target = 'w2.txt'
		after  = w1,
	)
---------------

The execution output will be similar to the following. Note that both files are created whenever the first source file 'wscript' changes:

[source,shishell]
---------------
$ waf distclean configure build
'distclean' finished successfully (0.001s)
'configure' finished successfully (0.001s)
Waf: Entering directory `/tmp/rule/out'
[1/2] w1: wscript -> out/default/w1.txt
[2/2] w2.txt: out/default/w1.txt -> out/default/w2.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.055s)

$ echo " " >> wscript

$ waf
Waf: Entering directory `/tmp/rule/out'
[1/2] w1: wscript -> out/default/w1.txt
[2/2] w2.txt: out/default/w1.txt -> out/default/w2.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.018s)
---------------

Although the order constraint between w1 and w2 is trivial to find in this case, implicit constraints can be a source of confusion for large projects. For this reason, the default is to 'encourage' the explicit order declaration. Nevertheless, a special method may be used to let the system look at all source and target files:

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):

	bld.use_the_magic()

	bld(
		rule   = 'cp ${SRC} ${TGT}',
		source = 'wscript',
		target = 'w1.txt',
	)

	bld(
		rule   = 'cp ${SRC} ${TGT}',
		source = 'w1.txt',
		target = 'w2.txt',
	)
---------------

WARNING: The method 'bld.use_the_magic()' will not work for tasks that do not have clear input and output files and will degrade performance for large builds.


=== Dependencies on file contents

As a second example, we will create a file named 'r1.txt' from the current date. It will be updated each time the build is executed. A second file named 'r2.txt' will be created from 'r1.txt'.

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	bld(
		name   = 'r1', <1>
		target = 'r1.txt',
		rule   = '(date > ${TGT}) && cat ${TGT}', <2>
		always = True, <3>
	)

	bld(
		name   = 'r2', <4>
		target = 'r2.txt',
		rule   = 'cp ${SRC} ${TGT}',
		source = 'r1.txt', <5>
		after  = 'r1', <6>
	)
---------------

<1> Give the task generator a name, it will create a task class of the same name to execute the command
<2> Create 'r1.txt' with the date
<3> There is no source file to depend on and the rule never changes. The task is then set to be executed each time the build is started by using the attribute 'always'
<4> If no name is provided, the rule is used as a name for the task class
<5> Use 'r1.txt' as a source for 'r2.txt'. Since 'r1.txt' was declared before, the dependency will be added automatically ('r2.txt' will be re-created whenever 'r1.txt' changes)
<6> Set the command generating 'r2.txt' to be executed after the command generating 'r1.txt'. The attribute 'after' references task class names, not task generators. Here it will work because rule-based task generator tasks inherit the 'name' attribute

The execution output will be the following:

[source,shishell]
---------------
$ waf distclean configure build -v
'distclean' finished successfully (0.003s)
'configure' finished successfully (0.001s)
Waf: Entering directory `/tmp/rule/out'
[1/2] r1:  -> out/default/r1.txt
16:44:39 runner system command ->  (date > default/r1.txt) && cat default/r1.txt
dom ene 31 16:44:39 CET 2010
[2/2] r2: out/default/r1.txt -> out/default/r2.txt
16:44:39 runner system command ->  cp default/r1.txt default/r2.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.021s)

$ waf -v
Waf: Entering directory `/tmp/rule/out'
[1/2] r1:  -> out/default/r1.txt
16:44:41 runner system command ->  (date > default/r1.txt) && cat default/r1.txt
dom ene 31 16:44:41 CET 2010
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.016s)
---------------

Although r2 *depends* on 'r1.txt', r2 was not executed in the second build. As a matter of fact, the signature of the task r1 has not changed, and r1 was only set to be executed each time, regardless of its signature. Since the signature of the 'r1.txt' does not change, the signature of r2 will not change either, and 'r2.txt' is considered up-to-date.

We will now illustrate how to make certain that the outputs reflect the file contents and trigger the rebuild for dependent tasks by enabling the attribute 'on_results':

[source,python]
---------------
top = '.'
out = 'out'

def configure(conf):
	pass

def build(bld):
	bld(
		name   = 'r1',
		target = 'r1.txt',
		rule   = '(date > ${TGT}) && cat ${TGT}',
		always = True,
		on_results = True,
	)

	bld(
		target = 'r2.txt',
		rule   = 'cp ${SRC} ${TGT}',
		source = 'r1.txt',
		after  = 'r1',
	)
---------------

Here 'r2.txt' will be re-created each time:

[source,shishell]
---------------
$ waf distclean configure build -v
'distclean' finished successfully (0.003s)
'configure' finished successfully (0.001s)
Waf: Entering directory `/tmp/rule/out'
[1/2] r1:  -> out/default/r1.txt
16:59:49 runner system command ->  (date > default/r1.txt) && cat default/r1.txt <1>
dom ene 31 16:59:49 CET 2010 <2>
[2/2] r2: out/default/r1.txt -> out/default/r2.txt
16:59:49 runner system command ->  cp default/r1.txt default/r2.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.020s)

$ waf -v
Waf: Entering directory `/tmp/rule/out'
[1/2] r1:  -> out/default/r1.txt
16:59:49 runner system command ->  (date > default/r1.txt) && cat default/r1.txt
dom ene 31 16:59:49 CET 2010 <3>
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.016s)

$ waf -v
Waf: Entering directory `/tmp/rule/out'
[1/2] r1:  -> out/default/r1.txt
16:59:53 runner system command ->  (date > default/r1.txt) && cat default/r1.txt
dom ene 31 16:59:53 CET 2010 <4>
[2/2] r2: out/default/r1.txt -> out/default/r2.txt
16:59:53 runner system command ->  cp default/r1.txt default/r2.txt
Waf: Leaving directory `/tmp/rule/out'
'build' finished successfully (0.022s)
---------------

<1> Start with a clean build, both 'r1.txt' and 'r2.txt' are created
<2> Notice the date and time
<3> The second build was executed at the same date and time, so 'r1.txt' has not changed, therefore 'r2.txt' is up to date
<4> The third build is executed at another date and time. Since 'r1.txt' has changed, 'r2.txt' is created once again

