How to Back Up Big Amounts of Data - Simple Strategy

In this video I will share with you my ways to back up and store big amounts of data. I must say that it's not an easy task, and this video is the result of a long road to a storage and backup system which fits my needs. Also make sure to watch this video until the end, because as a bonus I will share with you my way of backing up data everywhere. So let's jump right into it.

I think everybody nowadays has something to store. It can be projects, photos, videos, music, or films. We are getting better quality every year, and file sizes are getting bigger and bigger. It's completely normal to watch videos in 4K or take photos that are 50 MB each. But it may be problematic to store all these things efficiently and cheaply.


Storing projects

As a programmer I have quite a lot of projects to store. Luckily, we have a standard way to version and store projects: Git with remote repositories. I prefer GitLab, where you can store and access private projects for free from anywhere, but GitHub now also offers free private repositories. And just so you know, you can store whatever you want there: not only project files but also documents, photos and even videos. For example, I'm storing my documents with Git, and I'm 100% sure that I won't lose anything, because versioning comes out of the box. The main disadvantage is that the total size of a repository normally can't exceed 10 GB. That's completely fine for projects or documents, but not for music, videos or photos.
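To make the documents idea concrete, here is a minimal sketch of what such a Git-based snapshot could look like. It assumes the documents folder is already a Git repository with a remote configured; the path and branch name are placeholders, not my actual setup:

```python
import subprocess
from datetime import date

DOCS_DIR = "/path/to/documents"  # hypothetical path, replace with your own

def git(*args) -> str:
    # Run a git command inside the documents folder and return its output.
    result = subprocess.run(
        ["git", "-C", DOCS_DIR, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

# Only commit when something actually changed since the last snapshot.
if git("status", "--porcelain").strip():
    git("add", "--all")
    git("commit", "-m", f"documents snapshot {date.today().isoformat()}")
    git("push", "origin", "main")
```

Run it by hand (or hook it to a scheduler) and every change to your documents becomes a revision you can roll back to.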


Another important point here: don't assume that just because you store your data somewhere, it is safe and backed up there. You can delete it by mistake yourself, or the company can lose your data. So don't forget to back up your projects and documents to different places at least once per month.
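As an illustration of what such a monthly job could look like, here is a small sketch that keeps mirror clones of a list of repositories in a second location. The repository URLs and the backup folder are hypothetical:

```python
import subprocess
from pathlib import Path

# Hypothetical list of repositories to keep a second copy of.
REPOS = [
    "git@gitlab.com:user/project-one.git",
    "git@gitlab.com:user/documents.git",
]
BACKUP_ROOT = Path("/path/to/backup/repos")  # e.g. a folder on a NAS

for url in REPOS:
    target = BACKUP_ROOT / url.rstrip("/").split("/")[-1]
    if target.exists():
        # The mirror already exists: fetch all new branches and tags.
        subprocess.run(["git", "-C", str(target), "remote", "update"], check=True)
    else:
        # First run: a bare mirror clone copies every branch and tag.
        subprocess.run(["git", "clone", "--mirror", url, str(target)], check=True)
```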


The main problem with services like Google Drive, iCloud, Dropbox, you name it, is that they get expensive relatively fast. Storing even 1 terabyte of data will cost you $10 and more per month. Also, this data is not always mirrored locally, which means it's not that comfortable to download/upload files when you need them often.


Storing locally

This is why the cheapest way to store data is locally, by yourself.

When we talk about storing something locally, lots of people think that it's fine to store data on the drive inside their computer. This is why they buy the biggest drive possible, even an SSD, for their machine. This approach has several problems. First of all, if your drive dies, you lose all your data. Secondly, if your drive is full, your machine will start lagging. And lastly, you are paying for an expensive SSD just to store data, which is not what SSDs are really for.

So for your machine you need an SSD to get fast reads and writes. Which means the good idea is to buy the smallest drive that still has enough capacity for the programs you need. I typically use a 256 GB SSD just to be on the safe side, but never bigger.


A small reminder here: if you are doing any work which needs both fast storage and capacity, the correct way is to use an external SSD for it. In this case your projects are separated from the SSD of your working machine, and you can easily move them from one machine to another. I'm using a Samsung T5 1 TB ($100), which works amazingly for this purpose.


So to store data locally, you need to buy HDDs, which are cheaper and slower but perfectly fine for storing data. The solution that I used for a long time was the WD Elements Portable 1 TB ($60). If you don't have lots of data, these are fine, at least until you hit a terabyte of space. But it doesn't make sense to have several of them, because it's difficult to back them up, keep the information in sync and maintain them. You always have to remember which disk holds the project you need right now.


Another important point, which not everybody knows, is that drives don't last forever. The typical lifespan of an HDD is 3-5 years, and an SSD's lifespan is around 10 years. Which means it's not a question of if your drives will fail but when. So you shouldn't have only 1 copy of your data (no matter if it's on an HDD or an SSD), and you should have a strategy for what you will do when a drive dies.


One thing which helps with this problem, but is more expensive, is a RAID array. If you don't know what RAID is: it's several disks (typically HDDs) which hold redundant data. It serves 2 purposes. First of all, if you lose 1 drive, you still have your data, and you can rebuild it onto a new disk. Secondly, a RAID array makes read and write operations faster. Normally you can only get fast access with an SSD, but you can't get terabytes of SSD storage because it quickly gets super expensive; with RAID you get a speedup from cheap HDDs.
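To show the redundancy idea, here is a tiny sketch of how parity (the trick behind RAID 5-style setups) lets you rebuild a lost disk. The byte values are made up, and a single block per disk is of course a huge simplification of real RAID:

```python
# Three "data disks", each holding one block of bytes (made-up example data).
disk_a = bytes([0x10, 0x20, 0x30])
disk_b = bytes([0x0F, 0xAA, 0x55])
disk_c = bytes([0xFF, 0x01, 0x7E])

# A fourth disk stores the parity block: the XOR of all data blocks.
parity = bytes(a ^ b ^ c for a, b, c in zip(disk_a, disk_b, disk_c))

# If disk_b dies, XOR-ing the surviving disks with the parity rebuilds it.
rebuilt_b = bytes(a ^ c ^ p for a, c, p in zip(disk_a, disk_c, parity))
assert rebuilt_b == disk_b  # the lost data is fully recovered
```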

So my solution here is a NAS with a RAID array inside. NAS stands for network-attached storage, which means it's an additional computer with disks inside, entirely focused on storing lots of data. It's a more expensive solution, because you need to buy not only the NAS but also several disks for it. In my case the NAS cost $500, and the 4 disks of 4 terabytes each cost $100 apiece. But this allows me to store lots of data locally and access it fast over the local network.


Backup

Now you know enough about storage solutions, so let's talk about backup.

The basic rule of backups is the 3-2-1 strategy. You need to have 3 copies of your data: 1 production copy and 2 backup copies on different drives. One of these backup copies should not be in the same place as the other 2, because if some disaster happens, it will be your only way to restore your data.

So here is my workflow: I work on the external SSD, and when my project or files are ready, I move them to the NAS for long-term storage (this is my 1st copy). A lot of people think that RAID is a backup because you have disk redundancy there. But it's not a backup at all, because you only store the current version of your files and you don't have any history or revisions. Which means if you delete something, it's gone forever and you can't get it back. I also have a local backup drive where I back up the whole NAS once a day (this is my 2nd copy). And I need 1 more backup off-site, so that in case of a disaster it is not in my house. For this I am using the Backblaze service. It's a relatively cheap solution for storing lots of data, with quite fast upload and download. For each terabyte of data I'm paying $5 per month, which is the cheapest I've seen.
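The off-site leg can be scripted too. I use Backblaze's own tooling, but purely as an illustrative sketch, here is one way you could push a copy with rclone (a sync tool that supports Backblaze B2); the remote name, bucket and paths are all hypothetical and assume you have already configured an rclone remote:

```python
import subprocess
from datetime import date

NAS_SHARE = "/mnt/nas/data"          # hypothetical NAS mount point
REMOTE = "b2:my-backup-bucket/data"  # hypothetical rclone remote + bucket

# Sync the NAS to the cloud bucket. Files that were changed or deleted are
# moved into a dated history folder instead of being lost for good.
subprocess.run([
    "rclone", "sync", NAS_SHARE, REMOTE,
    "--backup-dir", f"b2:my-backup-bucket/history/{date.today().isoformat()}",
], check=True)
```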

So if one of the disks in my RAID dies, I just put a new one in. If my NAS dies completely, I have 1 more local backup. And if everything is burned, flooded or stolen, I can still download all my data from Backblaze.

Bonus

So here is the bonus that I promised: a relatively easy way to manage all these drives and backups. The main problem here is that we don't want to simply mirror the data, because a mirror doesn't help us if we delete something. We want to store history, keeping everything that changed for at least a month.
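For the curious: even plain rsync can keep history instead of a pure mirror, because with --backup-dir every file that a run would overwrite or delete gets moved into a dated folder. A minimal sketch with placeholder paths (this is not my actual setup, which I describe next):

```python
import subprocess
from datetime import date

NAS_SHARE = "/mnt/nas/data/"  # hypothetical source (trailing slash: copy contents)
BACKUP_DRIVE = "/mnt/backup/current/"
HISTORY_DIR = f"/mnt/backup/history/{date.today().isoformat()}"

# Mirror the NAS onto the backup drive, but move every overwritten or
# deleted file into a dated history folder instead of discarding it.
subprocess.run([
    "rsync", "--archive", "--delete",
    "--backup", f"--backup-dir={HISTORY_DIR}",
    NAS_SHARE, BACKUP_DRIVE,
], check=True)
```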

For this I'm using the Arq backup program. It's not free and costs $50, but it's a stable and proven solution which can back up folders, drives, SMB shares and much more to different places. The most important part is that it's not a mirror but a real backup system, which means I can restore files from previous versions, and that has already saved me several times.

So my workflow is that I copy everything I need to the NAS myself. Then, with just 1 click, I can start backing it up to the additional drive. I do this by hand and not on a schedule, because the backup drive is normally kept turned off to avoid ransomware attacks, and it's only on when I want to make a backup. I also have a scheduled backup of my NAS to Backblaze every night. This means that in 1 click I can back up my NAS everywhere I want.

As you can see, storing lots of data and backing it up is not an easy task. And I changed a lot in my workflow over the years. What I have now is a relatively simple setup which can be scaled indefinitely.

Also, if you want to improve your programming skills, I have lots of full courses on different web technologies.