Using custom exclusion rules with Backblaze

Backblaze has been a great and inexpensive backup solution for me for a little over a year now. I started with just having it on my HTPC but recently installed it on my work PC as well. After starting my work PC backup I noticed it has one rather large omission in its settings dialog: you cannot exclude anything but specific folders or specific extensions.

The problem

One of the folders I want to back up, of course, is my projects folder, which contains all the different websites/applications I'm working on. Most of these take advantage of various package management utilities like composer, npm, and bower.

These package managers make developing super easy but also bring in a ton of files from third-party libraries. As these files can always be easily reinstalled, it's a bit silly to waste bandwidth and CPU time backing them up.

After going into the Backblaze settings to exclude them I realized that the only way to do it was to add every folder individually for each project. That is entirely unreasonable when one has many projects and may add or remove projects at any time.

What is needed is a way to say "Exclude any folder that is named node_modules", for example. Backblaze claims this is possible in one of their Twitter posts, but from what I saw in the GUI it is not. After a quick Google search, however, I found it is possible, but you have to edit one of their XML configuration files to accomplish the task.

It's great that one can create tailored rules this way, but not having this functionality built into the GUI is a major fail on their part in my opinion. I'm not saying they need to expose all the functionality of these custom rules in the GUI, but the ability to exclude folders by a simple name really should be there, as it's likely a common task, especially for developers such as myself.

The solution

Creating a custom rule requires editing their bzexcluderules_editable.xml configuration file, which is stored in their program data directory. On Windows this is located at %ProgramData%\Backblaze\bzdata.

Open this file in any text editor. Within this file you'll find the following line:

<!-- This block are the optional Windows excludes the customer is allowed to edit    -->

Below this line is where you can add your own custom rules. Copying one of the existing rules and modifying it is the easiest way to get started. There are tips at the top of the file and in their help topic regarding best practices one should follow to ensure backups run smoothly.

The rules I wanted to create, and why:

  1. Exclude all node_modules, bower_components, and vendor folders. These folders can be huge and contain tons of tiny files. As these can easily be reinstalled using the associated package manager, there is no reason for them to be backed up.
  2. Exclude all .svn folders. The contents of these folders can easily be recovered by checking out a copy of the repository again. Nothing important is stored here.

To accomplish these tasks I created the following rules:

<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\" contains_1="\node_modules\" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\" contains_1="\vendor\" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\" contains_1="\bower_components\" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\" contains_1="\.svn\" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />

Since all my projects sit under my user folder I include their standard :\Users\ prefix for quick filtering. After that the contains_1 attribute specifies the different folder names surrounded by backslashes. The backslashes ensure that it only excludes the folders and not files that might happen to match. The rest of the filters are set to * because they do not apply.

After applying these rules the size of my initial backup was reduced by more than 200,000 files. Since these are generally small files the savings in bandwidth and storage space are not that significant in the grand scheme of things, but it does save a lot of time during the initial backup. It'll also save time and bandwidth in the future by not having to deal with changes to these files.

Extra info

Their documentation on what all of these attributes mean is a bit lacking, but it's not too hard to deduce from context and examples how they probably work.

plat: Specifies what platform the rule applies to. win for Windows and mac for macOS.
osVers: Specifies what version of the platform the rule applies to. A list of valid values can be found in the comments at the top of the file. Generally you'd probably just set this to * for your custom rules.
ruleIsOptional: I really have no idea what this is for. It appears to be a boolean field, but all the examples set it to t so I did as well.
skipFirstCharThenStartsWith: Matches a prefix against the file path. This skips over the drive letter so you only specify the path after that.
contains_1: I believe this is a first-pass content filter and should be as specific as possible.
contains_2: This I think is a second-pass content filter to further narrow down matches from the first filter. This would allow you to exclude something like *\Projects\*\cache\*.
doesNotContain: Allows a way to essentially re-include files that might otherwise match the other filters.
endsWith: Matches against the end of a file path. Use this if you need to match a specific filename.
hasFileExtension: Similar to endsWith but matches a specific extension and runs faster. Use this for matching an extension.
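
To make the contains_1 and contains_2 pairing concrete, here is a minimal sketch of a rule that should approximate the *\Projects\*\cache\* pattern mentioned above. The \Projects\ and \cache\ folder names are placeholders, and the rule rests on the assumption that both filters have to match for a file to be excluded. That is a guess based on the existing examples, not something the documentation confirms, and whether the order of the two strings within the path matters is also unknown.

```xml
<!-- Sketch only: assumes contains_1 AND contains_2 must both match. Folder names are placeholders. -->
<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\" contains_1="\Projects\" contains_2="\cache\" doesNotContain="*" endsWith="*" hasFileExtension="*" />
```

If a rule like this is added, it is worth checking afterwards that the cache folders really have dropped out of the file list before relying on it.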

They have some recommendations in the comments and help article about how best to structure your rules to ensure everything runs smoothly.

First, specify a prefix using skipFirstCharThenStartsWith if possible. This allows Backblaze to quickly and efficiently narrow down the list of files that might apply to the rule to a much smaller subset.
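
As an illustration of a tighter prefix, the node_modules rule from earlier could be limited to a single projects folder rather than everything under :\Users\. The yourname and Projects path segments below are hypothetical; substitute the real path, leaving off the drive letter.

```xml
<!-- Sketch only: "yourname" and "Projects" are hypothetical path segments. -->
<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\yourname\Projects\" contains_1="\node_modules\" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
```

The trade-off is that a narrower prefix stops covering projects kept anywhere else, so the broader :\Users\ prefix may still be the better fit if projects are scattered around the profile.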

Second, specify a hasFileExtension rule if it applies. Backblaze has special handling of extensions that allows it to quickly narrow the list of possible matches based on this filter. Keep in mind that their definition of the extension is the string of text after the last dot, provided there are no spaces. For a multi-extension file like .tar.gz you'd have to specify just gz as the extension and use endsWith to further narrow the results later.
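
A rule for the .tar.gz case might look something like the sketch below. It assumes the extension value is written without a leading dot, as described above, and that endsWith is applied in addition to hasFileExtension rather than instead of it; neither point is confirmed by the documentation.

```xml
<!-- Sketch only: assumes hasFileExtension takes "gz" with no dot and that endsWith narrows it further. -->
<excludefname_rule plat="win" osVers="*"  ruleIsOptional="t" skipFirstCharThenStartsWith=":\Users\" contains_1="*" contains_2="*" doesNotContain="*" endsWith=".tar.gz" hasFileExtension="gz" />
```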

Lastly, use contains_1, contains_2, doesNotContain and endsWith to further narrow the filter as desired. These are going to be the slowest filters, so avoid them where you can.