Snakemake Continued#
Snakemake stands out among data processing tools for its versatility and user-friendliness. It empowers researchers to create reproducible and scalable analyses through a human-readable, Python-based language. If you’re familiar with the make tool, picking up Snakemake will be a breeze.
1. Building the Workflow: The Snakefile#
The heart of Snakemake lies in the Snakefile, acting as its build file. It defines a series of rules dictating the workflow’s execution. Each rule details how to produce a specific target (output) using its required dependencies (inputs) and the necessary actions.
In the intro chapter it was enough to call the pipeline from the ZIPF directory as the root. For the following guide you need to change into the corresponding directory for each section.
Getting started - Executing Snakemake:#
If not already done, create a new directory, for example like the one provided: snakemake/1_the_snakefile.
Depending on how you follow the guide, the paths for your solution might be different!
By default, running Snakemake without specifying a target prompts it to search for a file named Snakefile. Upon execution, it provides details about the workflow, including the number of steps, involved rules, input and output files.
Therefore we need to create a Snakefile. For the purpose of the course we start with an example that counts words.
# Count words in one of the books
rule count_words:
    input: '../../data/dracula.txt'
    output: '../../results/dracula.dat'
    shell: 'python ../wordcount.py ../../data/dracula.txt ../../results/dracula.dat'
This is a build file, which for Snakemake is called a Snakefile - a file executed by Snakemake. Note that aside from a few keyword additions like rule, it follows standard Python 3 syntax.
The parts included in the Snakefile are explained as follows:
Comments: Lines starting with # provide explanations and are ignored by Snakemake.
Target: This represents the desired outcome, denoted by a filename (e.g., dracula.dat).
Dependencies: These are files (e.g., ../../data/dracula.txt) needed to create or update the target.
Action: This shell command (e.g., python ../wordcount.py ../../data/dracula.txt ../../results/dracula.dat) is responsible for generating or updating the target using the dependencies.
Snakemake follows Python 3 syntax, introducing keywords like rule. Indentation, whether using tabs or spaces, adheres to Python conventions. A rule combines target, dependencies, and actions, forming a “recipe” for a specific step in the workflow.
The rule we just created describes how to build the output dracula.dat using the action python wordcount.py and the input dracula.txt.
Information that was implicit in our shell script - that we are generating a file called dracula.dat and that creating this file requires dracula.txt - is now made explicit by Snakemake’s syntax.
Let’s first ensure we start from scratch and delete the .dat, .png, and results.txt files we created earlier:
rm ../../results/*.dat ../../results/*.png ../../results/results.txt
To run Snakemake we just have to call it from within the same directory:
snakemake
Depending on your system the following error message may appear:
Assuming unrestricted shared filesystem usage for local execution.
Error: cores have to be specified for local execution (use --cores N with N being a number >= 1 or 'all')
Instead run the following:
snakemake --cores all
By default, Snakemake tells us what it’s doing as it executes actions:
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job count
----------- -------
count_words 1
total 1
Select jobs to execute...
Execute 1 jobs...
[Sun Feb 18 15:30:44 2024]
localrule count_words:
input: ../../data/dracula.txt
output: ../../results/dracula.dat
jobid: 0
reason: Missing output files: ../../results/dracula.dat
resources: tmpdir=/tmp
[Sun Feb 18 15:30:44 2024]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2024-02-18T153044.591897.snakemake.log
If there are errors, check your syntax. Remember, aside from new keywords like rule and input, Snakemake follows Python syntax. Let’s see if we got what we expected:
head -5 ../../results/dracula.dat
The output should look like this:
the 8089 4.836269931901207
and 5976 3.5729446301201144
i 4846 2.897337630114136
to 4745 2.836951517724221
of 3748 2.240862863736645
If you now try to rerun Snakemake, the following message will appear:
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Nothing to be done (all requested files are present and up to date).
This simply means that since the output files are already present, the workflow is not re-run. To re-run the Snakefile, just delete the .dat file.
At the same time, Snakemake checks the last modification times: if the .dat file is newer than the input .txt file, the workflow won’t be re-run. But if changes are made and saved to the .txt file, making its modification time newer than that of the .dat file, Snakemake will re-run the workflow.
The advantage of this approach is that rebuilding files only when required makes processing more efficient.
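You can observe this behaviour directly. A quick demonstration (touch updates a file’s modification time without changing its contents):
touch ../../data/dracula.txt   # the input is now newer than the output
snakemake --cores all          # count_words runs again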
It is also important to note that Snakefiles do not need to be named Snakefile; you only need to tell Snakemake the name when running it. This means you can have Snakefiles for multiple workflows stored in the same location.
For example, using the -s flag:
snakemake -s snakefile.txt
Snakefiles as Documentation#
By explicitly recording the inputs to and outputs from steps in our analysis and the dependencies between files, Snakefiles act as a type of documentation, reducing the number of things we have to remember.
We can add additional rules to the Snakefile, e.g.:
rule count_words:
    input: '../../data/dracula.txt'
    output: '../../results/dracula.dat'
    shell: 'python ../wordcount.py ../../data/dracula.txt ../../results/dracula.dat'

rule count_words_moby_dick:
    input: '../../data/moby_dick.txt'
    output: '../../results/moby_dick.dat'
    shell: 'python ../wordcount.py ../../data/moby_dick.txt ../../results/moby_dick.dat'
If you run Snakemake now, nothing will happen, since Snakemake will by default choose the first rule, ignoring the others. To change this you need to run the following command instead:
snakemake --cores all ../../results/moby_dick.dat
This will give the following output:
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job count
--------------------- -------
count_words_moby_dick 1
total 1
Select jobs to execute...
Execute 1 jobs...
[Sun Feb 18 16:14:46 2024]
localrule count_words_moby_dick:
input: ../../data/moby_dick.txt
output: ../../results/moby_dick.dat
jobid: 0
reason: Missing output files: ../../results/moby_dick.dat
resources: tmpdir=/tmp
[Sun Feb 18 16:14:46 2024]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2024-02-18T161446.301442.snakemake.log
Nothing to be Done and MissingRuleException#
As you might have seen earlier, if the output file already exists, Snakemake will report ‘Nothing to be done’.
But if you try to invoke a rule or target that does not exist, a MissingRuleException will be triggered. It looks like this:
$ snakemake what.dat
MissingRuleException:
No rule to produce what.dat (if you use input functions make sure that they
don't raise unexpected exceptions).
To fix this, check the spelling of what.dat.
A Rule to Remove All Outputs#
One can also create a rule that deletes all output; this could look like the following:
# delete everything so we can re-run things
rule clean:
    shell: 'rm -f ../../results/*.dat'
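Like any other non-pattern rule, it can be invoked by name:
snakemake --cores 1 clean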
Dependencies#
Often workflows have dependencies, i.e. files that need to be created before a certain rule can run. In the following example, the rule dats relies on the input files dracula.dat and moby_dick.dat.
rule dats:
    input:
        '../../results/dracula.dat',
        '../../results/moby_dick.dat'
If you run Snakemake now, it will first check whether or not the input files exist; if not, Snakemake will look for rules that generate the input files and run them first. It is important to note that dependencies must form a directed acyclic graph: a target cannot depend on a dependency which itself, or one of its dependencies, depends on that target.
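To illustrate what a forbidden cycle would look like, here is a hypothetical pair of rules (not part of our workflow) that Snakemake would refuse to run:
# a.dat needs b.dat, but b.dat needs a.dat - a cycle, so Snakemake
# aborts with a cyclic-dependency error instead of building anything
rule a:
    input: 'b.dat'
    output: 'a.dat'
    shell: 'cp {input} {output}'

rule b:
    input: 'a.dat'
    output: 'b.dat'
    shell: 'cp {input} {output}'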
The output of snakemake --cores 1 dats will look like this:
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 count_words
1 count_words_moby_dick
1 dats
3
rule count_words_moby_dick:
input: ../../data/moby_dick.txt
output: ../../results/moby_dick.dat
jobid: 1
Finished job 1.
1 of 3 steps (33%) done
rule count_words:
input: ../../data/dracula.txt
output: ../../results/dracula.dat
jobid: 2
Finished job 2.
2 of 3 steps (67%) done
localrule dats:
input: ../../results/dracula.dat, ../../results/moby_dick.dat
jobid: 0
Finished job 0.
3 of 3 steps (100%) done
Meanwhile, the Snakefile itself should look as follows:
rule dats:
    input:
        '../../results/dracula.dat',
        '../../results/moby_dick.dat'

# delete everything so we can re-run things
rule clean:
    shell: 'rm -f ../../results/*.dat'

# Count words in one of the books
rule count_words:
    input: '../../data/dracula.txt'
    output: '../../results/dracula.dat'
    shell: 'python ../wordcount.py ../../data/dracula.txt ../../results/dracula.dat'

rule count_words_moby_dick:
    input: '../../data/moby_dick.txt'
    output: '../../results/moby_dick.dat'
    shell: 'python ../wordcount.py ../../data/moby_dick.txt ../../results/moby_dick.dat'
The directed graph of the dependencies looks like this:
Fig. 67 Dats Graph#
Debugging#
At this point, it becomes important to see what Snakemake is doing behind the scenes. What commands is Snakemake actually running? Snakemake has a special option (-p) that prints every command it is about to run. Additionally, we can perform a dry run with -n. A dry run does nothing; it simply prints out the commands instead of actually executing them. Very useful for debugging!
snakemake --cores 1 clean
snakemake -n -p ../../results/dracula.dat
Building DAG of jobs...
Job counts:
count jobs
1 count_words
1
rule count_words:
input: ../../data/dracula.txt
output: ../../results/dracula.dat
jobid: 0
python ../wordcount.py ../../data/dracula.txt ../../results/dracula.dat
Job counts:
count jobs
1 count_words
1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Before you continue with the book below, it is advised to first look at the exercise Write two new rules, since it will come in handy for the next section.
2. Wildcards#
When you have completed the exercise, you will notice that there is a lot of duplication and repetition. It is always good practice not to repeat yourself. Let’s start from the top and go rule by rule to refactor the Snakefile.
rule zipf_test:
    input: '../../results/dracula.dat', '../../results/frankenstein.dat', '../../results/sherlock_holmes.dat'
    output: '../../results/wildcards_results.txt'
    shell: 'python ../zipf_test.py ../../results/frankenstein.dat ../../results/dracula.dat ../../results/sherlock_holmes.dat > ../../results/wildcards_results.txt'
Here we can shorten the shell command by using variables. This could look like the following, where we add the {input} and {output} wildcards:
rule zipf_test:
    input: '../../results/dracula.dat', '../../results/frankenstein.dat', '../../results/sherlock_holmes.dat'
    output: '../../results/wildcards_zipf_results.txt'
    shell: 'python ../zipf_test.py {input} > {output}'
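When this rule runs, Snakemake substitutes all input files (space-separated) for {input} and the output file for {output}, so the command it executes is effectively:
python ../zipf_test.py ../../results/dracula.dat ../../results/frankenstein.dat ../../results/sherlock_holmes.dat > ../../results/wildcards_zipf_results.txt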
Handling dependencies differently#
For many rules, we will need to make finer distinctions between inputs. It is not always appropriate to pass all inputs as a lump to your action. For example, our rules for .dat files use their first (and only) dependency specifically as the input file to wordcount.py. If we add additional dependencies (as we will soon do) then we don’t want these being passed as input files to wordcount.py: it expects just one input file.
Let’s see this in action. We need to add wordcount.py as a dependency of each of our data files so that the rules will be executed if the script changes. In this case, we can use {input[0]} to refer to the first dependency and {input[1]} to refer to the second:
rule count_words:
    input: '../wordcount.py', '../../data/dracula.txt'
    output: '../../results/dracula.dat'
    shell: 'python {input[0]} {input[1]} {output}'
Alternatively, we can name our dependencies:
rule count_words_frankenstein:
    input:
        cmd='../wordcount.py',
        book='../../data/frankenstein.txt'
    output: '../../results/frankenstein.dat'
    shell: 'python {input.cmd} {input.book} {output}'
Let’s mark wordcount.py as updated, and re-run the pipeline:
touch ../wordcount.py
snakemake --cores all
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job count
------------------------ -------
count_words 1
count_words_frankenstein 1
count_words_sherlock 1
zipf_test 1
total 4
Select jobs to execute...
Execute 3 jobs...
localrule count_words:
input: ../wordcount.py, ../../data/dracula.txt
output: ../../results/dracula.dat
jobid: 1
reason: Missing output files: ../../results/dracula.dat
resources: tmpdir=/tmp
localrule count_words_frankenstein:
input: ../wordcount.py, ../../data/frankenstein.txt
output: ../../results/frankenstein.dat
jobid: 2
reason: Missing output files: ../../results/frankenstein.dat
resources: tmpdir=/tmp
localrule count_words_sherlock:
input: ../wordcount.py, ../../data/sherlock_holmes.txt
output: ../../results/sherlock_holmes.dat
jobid: 3
reason: Missing output files: ../../results/sherlock_holmes.dat
resources: tmpdir=/tmp
Finished job 2.
1 of 4 steps (25%) done
Finished job 3.
2 of 4 steps (50%) done
Finished job 1.
3 of 4 steps (75%) done
Select jobs to execute...
Execute 1 jobs...
localrule zipf_test:
input: ../../results/dracula.dat, ../../results/frankenstein.dat, ../../results/sherlock_holmes.dat
output: ../../results/wildcards_zipf_results.txt
jobid: 0
reason: Missing output files: ../../results/wildcards_zipf_results.txt; Input files updated by another job: ../../results/frankenstein.dat, ../../results/dracula.dat, ../../results/sherlock_holmes.dat
resources: tmpdir=/tmp
Finished job 0.
4 of 4 steps (100%) done
Complete log: .snakemake/log/2024-04-02T154614.819856.snakemake.log
Note: if a *.dat file such as dracula.dat already exists and is newer than its inputs, that part of the pipeline will not be triggered.
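If you ever want to rebuild everything regardless of timestamps, Snakemake offers the --forceall (-F) option:
snakemake --cores all --forceall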
Intuitively, we should also add wordcount.py as a dependency for results.txt, as the final table should be rebuilt if we remake the .dat files. However, it turns out we don’t have to! Let’s see what happens to results.txt when we update wordcount.py:
touch ../wordcount.py # or make a change and save it so that it has a newer modification time
snakemake ../../results/results.txt
The whole pipeline is triggered, even the creation of the results.txt file! To understand this, note that according to the dependency graph, results.txt depends on the .dat files. The update of wordcount.py triggers an update of the *.dat files. Thus, Snakemake sees that the dependencies (the .dat files) are newer than the target file (results.txt) and it therefore recreates results.txt. This is an example of the power of Snakemake: updating a subset of the files in the pipeline triggers rerunning the appropriate downstream steps.
3. Patterns#
Our Snakefile still has a ton of repeated content. The rules for each .dat file all follow a consistent pattern. We can replace these rules with a single pattern rule which can be used to build any .dat file from a .txt file in data/:
rule count_words:
    input:
        cmd='../wordcount.py',
        book='../../data/{book}.txt'
    output: '../../results/{book}.dat'
    shell: 'python {input.cmd} {input.book} {output}'
Here {book} is an arbitrary wildcard that we can use as a placeholder for any generic book to analyze. Note that we don’t have to use {book} as the name of our wildcard - it can be anything we want!
This rule can be interpreted as: “In order to build a file named [something].dat (the target), find a file named data/[that same something].txt (the dependency) and run wordcount.py [the dependency] [the target].”
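For example, asking for a concrete file lets Snakemake infer the wildcard value:
# matching '../../results/{book}.dat' sets book=frankenstein,
# so the rule reads '../../data/frankenstein.txt'
snakemake --cores 1 ../../results/frankenstein.dat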
Update your Snakefile now#
Replace all your count_words rules with the given pattern rule now.
Let’s test the new pattern rule. We use the -p option to show that it is running things correctly:
snakemake --cores 1 clean
snakemake --cores 1 -p dats
We should see the same output as before. Note that we can still use Snakemake to build individual .dat targets as before, and that our new rule will work no matter what stem is being matched.
snakemake --cores 1 -p ../../results/dracula.dat
which gives the output below:
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 count_words
1
rule count_words:
input: ../wordcount.py, ../../data/dracula.txt
output: ../../results/dracula.dat
jobid: 0
wildcards: book=dracula
python ../wordcount.py ../../data/dracula.txt ../../results/dracula.dat
Finished job 0.
1 of 1 steps (100%) done
Using wildcards continued#
Our arbitrary wildcards like {book} can only be used in the input: and output: fields. They cannot be used directly in actions. If you need to refer to the current value of a wildcard in an action, you need to qualify it with wildcards., for example: {wildcards.book}.
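As a small illustration (the echo is just for demonstration), the pattern rule could report the current wildcard value in its action like this:
rule count_words:
    input:
        cmd='../wordcount.py',
        book='../../data/{book}.txt'
    output: '../../results/{book}.dat'
    shell: 'echo "Counting words for book: {wildcards.book}" && python {input.cmd} {input.book} {output}'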
Running Pattern Rules#
Note that although Snakemake lets you execute a non-pattern rule by name, such as snakemake clean, you cannot execute a pattern rule this way:
snakemake --cores 1 count_words
Building DAG of jobs...
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
As the error message indicates, you need to ask for specific files. For example,
snakemake --cores 1 ../../results/dracula.dat
Our Snakefile is now much shorter and cleaner:
# generate summary table
rule zipf_test:
    input: '../zipf_test.py', '../../results/sherlock_holmes.dat', '../../results/frankenstein.dat', '../../results/dracula.dat'
    output: '../../results/wildcard_2_results.txt'
    shell: 'python {input[0]} {input[1]} {input[2]} {input[3]} > {output}'

rule dats:
    input: '../../results/dracula.dat', '../../results/frankenstein.dat', '../../results/sherlock_holmes.dat'

# delete everything so we can re-run things
rule clean:
    shell: 'rm -f ../../results/*.dat ../../results/wild*.txt'

# count words in one of our "books"
rule count_words:
    input:
        cmd='../wordcount.py',
        book='../../data/{book}.txt'
    output: '../../results/{book}.dat'
    shell: 'python {input.cmd} {input.book} {output}'
Now all that is left to do is update your Snakefile.
4. Snakefiles are Python code#
Despite our efforts, our pipeline still has repeated content, for instance the names of input and output files (dependencies and targets). Our zipf_test rule, for instance, is extremely clunky. What happens if we want to analyze data/jane_eyre.txt as well? We’d have to update everything!
rule zipf_test:
    input: '../zipf_test.py', '../../results/dracula.dat', '../../results/frankenstein.dat', '../../results/sherlock_holmes.dat'
    output: '../../results/results.txt'
    shell: 'python {input[0]} {input[1]} {input[2]} {input[3]} > {output}'
Let’s try to improve this rule. One thing you’ve probably noticed is that all of our rules are using Python strings. Other data structures work too - let’s try a list:
rule zipf_test:
    input:
        cmd='../zipf_test.py',
        dats=['../../results/dracula.dat', '../../results/sherlock_holmes.dat', '../../results/frankenstein.dat']
    output: '../../results/results.txt'
    shell: 'python {input.cmd} {input.dats} > {output}'
After updating your rule, run snakemake --cores 1 clean and snakemake --cores 1 -p to confirm that the pipeline still works.
Named Dependencies#
Note that we also had to switch to using named dependencies. This was required since the first input, zipf_test.py, should not be in the list of input files.
Inputs: named vs indexed?#
Having seen the use of both named and indexed dependencies, which approach do you prefer? Which approach do you think leads to Snakefiles that are easier to read and maintain?
The use of a list for the input files illustrates a key feature of Snakemake: Snakefiles are just Python code.
We can make our list into a variable to demonstrate this. Let’s create the global variable DATS and use it in our zipf_test and dats rules:
DATS = ['../../results/frankenstein.dat', '../../results/sherlock_holmes.dat', '../../results/dracula.dat']

# generate summary table
rule zipf_test:
    input:
        cmd='../zipf_test.py',
        dats=DATS
    output: '../../results/results.txt'
    shell: 'python {input.cmd} {input.dats} > {output}'

rule dats:
    input: DATS
Great! One more step towards reducing code duplication. Now there is just one place to update the list of files to process.
Update your Snakefile#
Update your Snakefile with the DATS global variable. Try recreating both the dats and results.txt targets (run snakemake clean in between).
Solution#
See Solution on Github for the full Snakefile. Otherwise, just refer to the code extracts above and modify your own file.
When are Snakefiles executed?#
The last example illustrated that we can use arbitrary Python code in our Snakefile. It’s important to understand when this code gets executed. Let’s add a print statement to the top of our Snakefile:
print('Snakefile is being executed!')
DATS = ['../../results/frankenstein.dat', '../../results/sherlock_holmes.dat', '../../results/dracula.dat']
# generate summary table
rule zipf_test:
    input:
        # rest of the Snakefile as before
Now let’s clean up our workspace with snakemake clean:
snakemake --cores 1 clean
Snakefile is being executed!
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 clean
1
rule clean:
jobid: 0
Finished job 0.
1 of 1 steps (100%) done
Now let’s re-run the pipeline…
snakemake --cores 1
Snakefile is being executed!
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
3 count_words
1 zipf_test
4
rule count_words:
input: ../wordcount.py, ../../data/sherlock_holmes.txt
output: ../../results/sherlock_holmes.dat
jobid: 3
wildcards: book=sherlock_holmes
Finished job 3.
1 of 4 steps (25%) done
rule count_words:
input: ../wordcount.py, ../../data/frankenstein.txt
output: ../../results/frankenstein.dat
jobid: 1
wildcards: book=frankenstein
Finished job 1.
2 of 4 steps (50%) done
rule count_words:
input: ../wordcount.py, ../../data/dracula.txt
output: ../../results/dracula.dat
jobid: 2
wildcards: book=dracula
Finished job 2.
3 of 4 steps (75%) done
rule zipf_test:
input: ../zipf_test.py, ../../results/frankenstein.dat, ../../results/sherlock_holmes.dat, ../../results/dracula.dat
output: ../../results/results.txt
jobid: 0
Finished job 0.
4 of 4 steps (100%) done
Let’s do a dry-run:
snakemake -n
Snakefile is being executed!
Nothing to be done.
In every case, the print() statement ran before any of the actual pipeline code. What we can take away from this is that Snakemake executes the entire Snakefile every time we run snakemake, even for a dry-run. Because of this we need to be careful and only put tasks that do “real work” (changing files on disk) inside rules.
Common tasks, such as building lists of input files that will be reused in multiple rules, are a good fit for Python code that lives outside the rules.
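A sketch of what such code might look like at the top of a Snakefile (the names here are illustrative):
# plain Python, run when the Snakefile is parsed - builds the target list once
BOOKS = ['dracula', 'frankenstein', 'sherlock_holmes']
DATS = ['../../results/' + book + '.dat' for book in BOOKS]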
Is your print output appearing last?#
On some systems, output is buffered. This means that nothing is actually output until the buffer is full. While this is more efficient, it can delay the output from the print() call. In my testing on Windows using the combination of Git Bash and Anaconda, the printed text appeared after the Snakemake output. If that happens to you, force the buffer to be flushed:
print("Snakefile is being executed!", flush=True)
You should then see the printed text before the Snakemake output, confirming that this code executes first.
Using functions in Snakefiles#
In our example here, we only have 4 books (and just 3 are being processed). But what if we had 700 books to be processed? It would be a massive effort to update our DATS variable to add the name of every single book’s corresponding .dat filename.
Fortunately, Snakemake ships with several functions that make working with large numbers of files much easier. The two most helpful ones are glob_wildcards() and expand(). Let’s start a Python session to see how they work.
This can be done in any Python environment#
You can use any Python environment for the following code exploring expand() and glob_wildcards(): the standard Python interpreter, ipython, or a Jupyter Notebook. It’s up to personal preference and what you have installed.
On Windows, calling python from Git Bash does not always work. It is better to use the Anaconda start menu entries to run a Python prompt and then run python from there.
Make sure you change to your Snakefile directory before launching Python.
In this example, we will import these Snakemake functions directly in our Python session.
Importing is not required in a Snakefile#
You don’t need to import the Snakemake utility functions within your Snakefile - they are always imported for you.
So in your chosen Python environment, run the following:
from snakemake.io import *
Generating file names with expand()#
The first function we’ll use is expand(). It is used, quite literally, to expand Snakemake wildcards into a set of filenames:
expand('folder/{wildcard1}_{wildcard2}.txt', wildcard1=['a', 'b', 'c'], wildcard2=[1, 2, 3])
['folder/a_1.txt',
'folder/a_2.txt',
'folder/a_3.txt',
'folder/b_1.txt',
'folder/b_2.txt',
'folder/b_3.txt',
'folder/c_1.txt',
'folder/c_2.txt',
'folder/c_3.txt']
In this case, expand() created every possible combination of filenames from the two wildcards. Nice! Of course, this still leaves us needing to get the values for wildcard1 and wildcard2 in the first place.
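As an aside, if you don’t want every combination, expand() also accepts zip to pair the wildcard lists element-wise:
expand('folder/{wildcard1}_{wildcard2}.txt', zip, wildcard1=['a', 'b'], wildcard2=[1, 2])
['folder/a_1.txt', 'folder/b_2.txt']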
Get wildcard values with glob_wildcards()#
To get a set of wildcard values from a list of files, we can use the glob_wildcards() function. It matches the given pattern against files on the file system, returning a named tuple containing all the matches. Let’s try grabbing all of the book titles in our data folder:
glob_wildcards('../../data/{example}.txt')
Wildcards(example=['dracula', 'jane_eyre', 'frankenstein', 'sherlock_holmes'])
In this case, there is only one wildcard, {example}. We can extract its values by getting the example property from the output of glob_wildcards():
glob_wildcards('../../data/{example}.txt').example
['dracula', 'jane_eyre', 'frankenstein', 'sherlock_holmes']
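Putting the two functions together, we could derive DATS automatically from whatever books are on disk - a sketch using this section’s paths:
# find all wildcard values, then expand them into the output filenames
BOOKS = glob_wildcards('../../data/{book}.txt').book
DATS = expand('../../results/{book}.dat', book=BOOKS)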
Using Python code as actions#
One very useful feature of Snakemake is the ability to execute Python code instead of just shell commands. Instead of shell: as an action, we can use run: instead.
Add the following to your snakefile:
# at the top of the file
import glob
import os
# add as the last rule (we don't want it to be the default)
rule print_book_names:
    run:
        print('These are all the book names:')
        for book in glob.glob('../../data/*.txt'):
            print(book)
Upon execution of the corresponding rule, Snakemake runs our Python code in the run: block:
snakemake --cores 1 --quiet print_book_names
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 print_book_names
1
rule print_book_names:
jobid: 0
These are all the book names:
../../data/dracula.txt
../../data/jane_eyre.txt
../../data/frankenstein.txt
../../data/sherlock_holmes.txt
Finished job 0.
1 of 1 steps (100%) done
Note the --quiet option#
--quiet (or -q) suppresses a lot of the rule progress output from Snakemake. This can be useful when you just want to see your own output.
Cleaning House#
It is common practice to have a clean rule that deletes all intermediate and generated files, taking your workflow back to a blank slate.
We already have a clean rule, so now is a good time to check that it removes all intermediate and output files. First do a snakemake all followed by snakemake clean. Then check to see if any output files remain and add them to the clean rule if required.
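Note that the Snakefile shown so far does not define an all rule, so snakemake all will fail with a MissingRuleException unless you add one. A minimal sketch, assuming results.txt is the final output:
# listing the final output as input makes this rule build the whole pipeline;
# placed first, it also becomes the default target for a bare `snakemake`
rule all:
    input: '../../results/results.txt'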
Next steps#
It would be best if you start doing some of the exercises now, especially the exercises related to working with files and outputs, before going to the next section! TODO insert link