Switch away from Azure for mass-storage backups #37
We are currently using Azure's Archive tier for long-term backups.
This was originally chosen because GRS provides incredible durability and the Archive tier is very cheap at $0.99/TB (LRS, Sweden) and $2.28/TB (GRS, Sweden).
Unfortunately, GRS requires very expensive replication, which would cost almost more than the storage itself for ~3.5 TB of data with a couple hundred GB changing per month. Additionally, putting objects into the `Archive` tier via `rclone` uploads to `Hot` first and then archives, doubling the API calls and adding a few cents of `Hot` storage on top.

And on top of all this, restoring and downloading is very expensive. Because Azure's `Archive` tier is offline tape storage, a file first needs to be rehydrated before it can be restored, which takes anywhere from a couple of hours to days - and costs money. Rehydrating 3.5 TiB costs ~100€, API calls for 1 million objects cost ~720€ (maybe even 2x, because each object needs to be restored first and then downloaded from a higher tier), bandwidth for downloading the data adds ~250€, plus taxes on everything.

TL;DR: Azure is only usable for restoring single files, because otherwise I'd go bankrupt.
Currently evaluating GCP:
Their `Archive` tier is actually not offline. It's slower (but access is still measured in milliseconds), and who cares - there are no rehydration fees or long waiting times. While storage is a little more expensive at $1.20/TB, and a full restore still costs ~160€ for data retrieval + ~260€ for download, API calls are way cheaper for me at ~50€ per million requests. That means most of the cost compared to Azure vanishes and a full restore becomes realistic - very expensive at ~470€ + taxes, but something I can pay in case both my main and mirror storage completely fail.
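Putting the rough numbers from both paragraphs side by side (all figures approximate, pre-tax, and just my estimates for ~3.5 TiB / ~1 million objects):

```python
# Back-of-the-envelope full-restore cost comparison (EUR, pre-tax),
# using the approximate figures quoted above.
azure = {
    "rehydration": 100,
    "api_calls": 720,   # possibly 2x: rehydrate + download per object
    "bandwidth": 250,
}
gcp = {
    "retrieval": 160,
    "download": 260,
    "api_calls": 50,
}

azure_total = sum(azure.values())
print(f"Azure full restore: ~{azure_total}€ "
      f"(worst case ~{azure_total + azure['api_calls']}€ if API calls double)")
print(f"GCP full restore:   ~{sum(gcp.values())}€")
```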
Additionally, this adds another capability I wanted: GCP storage is always zone-redundant. Their docs don't go into much detail, but unlike Azure, which stores data flagged as LRS (the cheapest option) in only one AZ, GCP makes sure that data is stored in at least two different AZs.
In my book, this makes GCP more durable than Azure and leaves me far less worried about data loss. While unlikely, Azure could lose LRS data to a single DC burning down or getting flooded. The probability of two DCs (in two AZs) going down within a short timeframe is tiny, which makes this backup suitable even for data stored only in B2 as primary storage (I plan on doing more directly within object storage in the future).
I usually require all non-temporary data on libre.moe to exist redundantly in three locations. When using object storage directly, this currently means data exists as one redundant copy in B2 and three copies in Azure within one physical location - so the data really exists in only two locations, and any kind of failure/error at B2 or Azure puts it at great risk, as a single local emergency could then mean the loss of all data.
With GCP keeping data in at least two AZs (let's simplify this to Google keeping two copies in two locations), data living directly in B2 still follows 3-2-1 (1x on B2, 2x on GCP) and also meets my three-locations requirement (B2, GCP-AZ1, GCP-AZ2).
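To make the counting explicit, here's a toy check of both rules - the placement names are made up, and "two GCP copies" is the simplification from above:

```python
# Toy check of the 3-2-1 rule plus my "three physical locations" rule
# for data living directly in object storage. Names are illustrative.
copies = [
    ("B2", "b2-region"),     # primary copy in Backblaze B2
    ("GCP", "gcp-az1"),      # GCP Archive, first availability zone
    ("GCP", "gcp-az2"),      # GCP Archive, second availability zone
]

providers = {provider for provider, _ in copies}
locations = {location for _, location in copies}

assert len(copies) >= 3, "3-2-1: need at least three copies"
assert len(providers) >= 2, "3-2-1: need at least two independent systems"
assert len(locations) >= 3, "my rule: data must exist in three locations"
print("B2 + GCP (2 AZs) satisfies both rules")
```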
Related to this is also #30, as Storj or Scaleway Glacier could be good options, especially because of their low storage and API/bandwidth cost.
I decided to skip the third cloud copy and resort to just keeping Seafile backups on a local disk at my house. Since this is the emergency backup for when both Hetzner's Storage Box and Backblaze B2 fail, keeping just one copy without redundancy is a valid option - especially since a full recovery of all data from Azure is financially impossible.
While the backups of other libre.moe services can stay on Azure thanks to its object-locking feature, this was never feasible for Seafile in the first place: Seafile data is backed up incrementally, and locking individual files would be transactionally expensive and complicated to set up. A local HDD will be the best option, as it will be literally air-gapped from the internet once the backup is done.
Now, while this is a manual process, it's not too difficult - and it's also less important, as this backup will likely never be used.
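Sketched out, the routine is basically one sync plus unplugging the disk - the remote name and mount point here are hypothetical:

```python
# Sketch of the manual local-HDD backup: sync, unmount, unplug.
# Assumes a (hypothetical) rclone remote "seafile-backup" and the
# HDD mounted at /mnt/backup-hdd.
import subprocess

subprocess.run(
    ["rclone", "sync", "seafile-backup:", "/mnt/backup-hdd/seafile",
     "--progress"],
    check=True,
)

# Unmount so the disk can be physically disconnected (the "air gap").
subprocess.run(["umount", "/mnt/backup-hdd"], check=True)
print("Sync done - unplug the disk.")
```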
If I switch away from Hetzner's Storage Box and decide to keep Seafile directly on object storage, I'll add Storj DCS into the mix, as it's likely very comparable to Hetzner in terms of price for the GBs actually used.
Also, if Seafile moves away from Azure and Azure is just used to store all the other backups, I can switch that storage to ZRS. It'll cost almost nothing and will make Azure pretty resilient against anything: these files will be protected from both accidental deletion and hardware - even datacenter - failures.
To-Do
Azure LRS/ZRS could also still be used as an alternative to Storj DCS when using the "Cold" instead of the "Archive" tier, as the API calls for reading 1 million objects would come to around 10€ instead of the ~500€ for archived blobs.
Currently, this is not an option, as having Hetzner BX + Backblaze B2 + Azure Cold is just too costly and not at all required. As long as Hetzner is the primary location, B2 + a local copy is more than enough; but once Seafile moves to object storage, a second, highly resilient storage is needed. This could be either Storj or Azure, as both theoretically provide a high amount of redundancy.
Changed title from *Switch away from Azure for long-term backups* to *Switch away from Azure for mass-storage backups*.

Closing this as the local copy has now been verified to be working.