Much of our Ruffus pipeline code is built on the same template, which is generally a good place to start when developing a new pipeline.
From version 2.4, Ruffus includes an optional Ruffus.cmdline module that provides support for a set of common command line arguments. This makes writing Ruffus pipelines much more pleasant.
All you need to do is copy these 6 lines:

```python
import ruffus.cmdline as cmdline

parser = cmdline.get_argparse(description='WHAT DOES THIS PIPELINE DO?')

#   <<<---- add your own command line options like --input_file here
#   parser.add_argument("--input_file")

options = parser.parse_args()

#   standard python logger which can be synchronised across concurrent Ruffus tasks
logger, logger_mutex = cmdline.setup_logging(__name__, options.log_file, options.verbose)

#   <<<---- pipelined functions go here

cmdline.run(options)
```

We recommend using the standard argparse module, but the deprecated optparse module works as well (see below for the optparse template).
Ruffus.cmdline by default provides these predefined options:
```
-v, --verbose
    --version
-L, --log_file

# tasks
-T, --target_tasks
    --forced_tasks
-j, --jobs
    --use_threads

# printout
-n, --just_print

# flow chart
    --flowchart
    --key_legend_in_graph
    --draw_graph_horizontally
    --flowchart_format

# check sum
    --touch_files_only
    --checksum_file_name
    --recreate_database
```
The script provides logging both to the command line:

```
myscript -v
myscript --verbose
```

and to an optional log file:

```
# keep tabs on yourself
myscript --log_file /var/log/secret.logbook
```
Logging is ignored if neither --verbose nor --log_file is specified on the command line.
Ruffus.cmdline automatically allows you to write to a shared log file via a proxy from multiple processes. However, you do need to use logging_mutex for the log files to be synchronised properly across different jobs:
```python
with logging_mutex:
    logger_proxy.info("Look Ma. No hands")
```

Logging is set up so that you can write:

```python
logger.info("A message")
logger.debug("A message")

from ruffus.cmdline import MESSAGE
logger.log(MESSAGE, "A message")
```
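Ruffus wires the shared-log proxy up for you, but it helps to see why the mutex matters. The following stdlib-only sketch (the `job` function and the `"demo"` logger name are illustrative, not Ruffus API) shows the same pattern: each job holds the lock while emitting a multi-line sequence, so lines from concurrent jobs never interleave.

```python
import io
import logging
import threading

# Stand-ins for the logger and mutex that cmdline.setup_logging() returns
log_stream = io.StringIO()
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(log_stream))
logger_mutex = threading.Lock()

def job(job_id: int) -> None:
    # Without the mutex, the two lines of one job could end up
    # interleaved with lines from other concurrently running jobs.
    with logger_mutex:
        logger.info("job %d: started", job_id)
        logger.info("job %d: finished", job_id)

threads = [threading.Thread(target=job, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = log_stream.getvalue().splitlines()
# Each job's "started"/"finished" pair stays contiguous
for start, finish in zip(lines[0::2], lines[1::2]):
    assert start.replace("started", "finished") == finish
```

In real Ruffus jobs the same `with logging_mutex:` block works across separate processes, because the proxy forwards log records to a single writer.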
This is extremely useful for understanding what is happening with your pipeline: which tasks and jobs are up to date, and so on.
See Chapter 5: Understanding how your pipeline works with pipeline_printout(...)
To trace the pipeline, call the script with the following options:

```
# well-mannered, reserved
myscript --just_print
myscript -n

# extremely loquacious
myscript --just_print --verbose 5
myscript -n -v5
```

Increasing levels of verbosity (--verbose up to --verbose 5) provide more detailed output.
This is the subject of Chapter 7: Displaying the pipeline visually with pipeline_printout_graph(...).
Flowcharts can be specified using the following option:
```
myscript --flowchart xxxchart.svg
```

The extension of the flowchart file indicates what format the flowchart should take, for example, svg, jpg etc.
Override with --flowchart_format
Optionally specify the number of parallel strands of execution and the final target task to run. The pipeline will run starting from any out-of-date tasks which precede the target and proceed no further beyond the target.
```
myscript --jobs 15 --target_tasks "final_task"
myscript -j 15
```
The checkpoint file name defaults to the value set in the environment variable DEFAULT_RUFFUS_HISTORY_FILE.
If this is not set, it will default to .ruffus_history.sqlite in the current working directory.
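The lookup order described above can be sketched as follows. `DEFAULT_RUFFUS_HISTORY_FILE` is the real environment variable name, but `resolve_history_file` is an illustrative helper written for this sketch, not part of the Ruffus API:

```python
import os

def resolve_history_file(cmdline_value=None):
    """Sketch of the checkpoint-file lookup order."""
    if cmdline_value:                          # --checksum_file_name wins
        return cmdline_value
    env_value = os.environ.get("DEFAULT_RUFFUS_HISTORY_FILE")
    if env_value:                              # then the environment variable
        return env_value
    return ".ruffus_history.sqlite"            # then the built-in default
```

For example, with no environment variable set, `resolve_history_file()` returns `".ruffus_history.sqlite"`, while `resolve_history_file("mychecksum.sqlite")` returns the command-line value.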
Either default can be overridden on the command line:

```
myscript --checksum_file_name mychecksum.sqlite
```
Create or update the checkpoint file so that all existing files in completed jobs appear up to date. This will stop sensibly if the current state is incomplete or inconsistent:

```
myscript --recreate_database
```
As far as possible, create empty files with the correct timestamp to make the pipeline appear up to date:

```
myscript --touch_files_only
```
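What "touching" means here can be illustrated with a small stdlib sketch (the file names are made up for the example): create an empty output file whose timestamp is newer than its input, so that a timestamp-based check would consider the step up to date without redoing the work.

```python
import os
import shutil
import tempfile
from pathlib import Path

tmp = tempfile.mkdtemp()
input_file = Path(tmp) / "reads.fastq"
output_file = Path(tmp) / "reads.aligned.bam"

input_file.write_text("pretend input data")
os.utime(input_file, (1_000_000_000, 1_000_000_000))  # backdate the input

output_file.touch()  # empty file with the current (newer) timestamp

# The output now looks newer than the input, so a timestamp
# comparison would report this step as up to date.
is_newer = output_file.stat().st_mtime > input_file.stat().st_mtime
output_size = output_file.stat().st_size
shutil.rmtree(tmp)
```

Note that Ruffus itself records checksums in the checkpoint file as well, so --touch_files_only also updates that database; this sketch only shows the timestamp half of the story.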
Note that particular options can be skipped (not added to the command line) if they conflict with your own options, for example:

```python
# see below for how to use get_argparse
parser = cmdline.get_argparse(
    description='WHAT DOES THIS PIPELINE DO?',
    # Exclude the following options:
    #     --log_file --key_legend_in_graph
    ignored_args=["log_file", "key_legend_in_graph"])
```
The verbosity can be specified on the command line:

```
myscript --verbose 5

# verbosity of 5 + 1 = 6
myscript --verbose 5 --verbose

# verbosity reset to 2
myscript --verbose 5 --verbose --verbose 2
```

If the printed paths are too long and need to be abbreviated, or, alternatively, if you want to see the full absolute paths of your input and output parameters, you can specify an extension to the verbosity. See the manual discussion of verbose_abbreviated_path for more details. This is specified as --verbose VERBOSITY:VERBOSE_ABBREVIATED_PATH (no spaces!).
For example:
```
# verbosity of 4
myscript.py --verbose 4

# display three levels of nested directories
myscript.py --verbose 4:3

# restrict input and output parameters to 60 letters
myscript.py --verbose 4:-60
```
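The VERBOSITY:VERBOSE_ABBREVIATED_PATH syntax above splits cleanly on the colon. As a sketch of how such a combined value could be parsed (`parse_verbose` is our own illustrative helper, not part of Ruffus):

```python
def parse_verbose(value):
    """Split a --verbose value like '4:-60' into its two parts."""
    if ":" in value:
        verbosity, _, abbreviated_path = value.partition(":")
        return int(verbosity), int(abbreviated_path)
    # plain verbosity, no path-abbreviation extension
    return int(value), None
```

For example, `parse_verbose("4")` gives `(4, None)`, `parse_verbose("4:3")` gives `(4, 3)`, and `parse_verbose("4:-60")` gives `(4, -60)`.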
Note that the version for your script will default to "%(prog)s 1.0" unless specified:
```python
parser = cmdline.get_argparse(
    description='WHAT DOES THIS PIPELINE DO?',
    version="my_programme.py v. 2.23")
```
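The "%(prog)s" placeholder is standard argparse behaviour: it expands to the script name. A plain-argparse sketch of the same default (the program name "myscript" is made up for the example):

```python
import argparse
import contextlib
import io

parser = argparse.ArgumentParser(prog="myscript")
# same default version string as Ruffus.cmdline: "%(prog)s 1.0"
parser.add_argument("--version", action="version", version="%(prog)s 1.0")

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    try:
        parser.parse_args(["--version"])   # prints the version, then exits
    except SystemExit:
        pass

assert buffer.getvalue().strip() == "myscript 1.0"
```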
The optparse module has been deprecated since Python 2.7, but the equivalent template still works:

```python
#
#   Using optparse (new in python v 2.6)
#
from ruffus import *
import ruffus.cmdline as cmdline

parser = cmdline.get_optgparse(version="%prog 1.0", usage="\n\n    %prog [options]")

#   <<<---- add your own command line options like --input_file here
#   parser.add_option("-i", "--input_file", dest="input_file", help="Input file")

(options, remaining_args) = parser.parse_args()

#   logger which can be passed to ruffus tasks
logger, logger_mutex = cmdline.setup_logging("this_program", options.log_file, options.verbose)

#   <<<---- pipelined functions go here

cmdline.run(options)
```