Removing old backup files

July 26, 2012

Following this post, what was left was to create an “automatic” method to remove old backups. Basically, I wanted newsyslog(8) (or logrotate, for the Linux enthusiasts) functionality for my own custom solution. The previous way was semi-automatic, since different requirements needed some manual intervention to prevent deleting something by mistake.

The following is the quick and dirty method I used to achieve it. I created a really simple script that reads a “configuration” file listing the backup directories and, after visiting each one, deletes files that are older than a defined time frame. The script is the following:

#!/bin/sh

# PATHTO must be set to the directory that holds rotate.conf.
# Each line of rotate.conf is: <backup directory> <days to keep> <type>
while read -r dir time wctype
do
	if [ "$wctype" -eq 1 ]
	then
		# Count the regular files in the directory (drop the "total"
		# line and any subdirectories from the long listing).
		num=`ls -l "$dir" | sed '1d' | grep -v '^d' | wc -l`
		# Only start deleting when at least $time files are present.
		if [ "$num" -ge "$time" ]
		then
			find "$dir" -maxdepth 1 -type f -mtime +"$time" -exec rm {} \;
#			find "$dir" -maxdepth 1 -type f -mtime +"$time"
		fi
	elif [ "$wctype" -eq 2 ]
	then
		# Group the files by prefix (everything before the first dot)
		# and age out each group separately.
		files=`ls "$dir" | cut -d . -f 1 | sort -u`
		for i in $files
		do
			# "ls -l" on explicit files prints no "total" line,
			# so there is nothing to strip before counting.
			num=`ls -l "$dir/$i"* | grep -v '^d' | wc -l`
			if [ "$num" -ge "$time" ]
			then
				find "$dir" -maxdepth 1 -name "$i*" -type f -mtime +"$time" -exec rm {} \;
#				find "$dir" -maxdepth 1 -name "$i*" -type f -mtime +"$time"
			fi
		done
	fi
done < "$PATHTO/rotate.conf"

And the configuration file is simply

/backup/machine1			10	2
/backup/machine2			10	1

Some comments:

  • In the configuration file the first column is the directory where the backup files live, the second column (the number) is the number of days that you want to keep backups, and the last column is either 1 or 2 and defines the backup type:
    1. type 1 is for when you want to deal with all the files in the directory
    2. type 2 is for when the files in the directory have a “structure” and you want to keep some of each kind. For example, in our case we back up MySQL databases and the backup files are named like
       mysqlDB.$DATE.sql.gz

       so type 2 checks each group of files individually and removes accordingly (see the example listing after this list).

  • The script tries to keep a number of backup files in place if no update happens for a certain time. There have been cases where the backup stopped unexpectedly for quite some time, and simply removing files older than a cutoff would eventually destroy all the backups. The number of backups kept in this version is the number of days that the user requests to keep. The way it is implemented has a number of shortcomings: for example, what happens if we have 2 backups per day? Or 1 backup per week? However, it is a known issue, and in the case where it is used it doesn’t matter much. Furthermore, suppose the backup process stopped for, say, 10 days, and we had kept the 9 previous files. The next day the backup succeeds, so the script runs and removes the previous 9 files, and we are left with just 1. In our case this is not much of an issue, since everything that is older than the number of days, and that we didn’t explicitly request to keep, can be removed.
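To make the type 2 grouping a bit more concrete, here is a small sketch. The file names are hypothetical (following the mysqlDB.$DATE.sql.gz pattern mentioned above), but the pipeline is the same one the script uses:

# Hypothetical contents of /backup/machine1 (a type 2 directory):
#   mysqlDB.2012-07-20.sql.gz
#   mysqlDB.2012-07-21.sql.gz
#   otherDB.2012-07-21.sql.gz
# The script derives the prefixes like this:
ls /backup/machine1 | cut -d . -f 1 | sort -u
# prints:
#   mysqlDB
#   otherDB
# and then each prefix is aged out independently with find -name "$prefix*".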

It is not a perfect solution, but at least it keeps the drive from filling up.
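For completeness, this is roughly how it can be run unattended; the script location /usr/local/sbin/rotate.sh, the PATHTO value and the daily schedule below are just assumptions, so adjust them to wherever the script and rotate.conf actually live:

# Hypothetical root crontab entry: run the cleanup once a day at 03:30,
# with PATHTO pointing at the directory that holds rotate.conf.
30 3 * * *	PATHTO=/usr/local/etc /bin/sh /usr/local/sbin/rotate.sh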
