Switch away from Azure for mass-storage backups #37

Closed
opened 2024-06-02 20:46:56 +02:00 by lukas · 6 comments
Owner

We are currently using Azure's Archive tier for long-term backups.

This was originally chosen because GRS provides incredible durability and the Archive tier is incredibly cheap at $0.99/TB (LRS, Sweden) and $2.28/TB (GRS, Sweden).
Unfortunately, GRS comes with very expensive replication traffic, which would cost almost as much as the storage itself for ~3.5 TB of data with a couple hundred GB changing per month. Additionally, putting objects into the `Archive` tier via `rclone` first uploads them to `Hot` and then archives them, doubling the API calls and adding a few cents of `Hot` storage on top.
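
As a rough illustration of that doubled write path, here is a minimal sketch; the object count and the one-write-plus-one-set-tier assumption are placeholders based on the description above, not measured values:

```python
# Rough sketch of the write-path overhead described above: each object is
# uploaded to the Hot tier and then moved to Archive, so every object costs
# two API operations instead of one. The object count is a placeholder.

def upload_api_calls(objects: int, archive_after_upload: bool = True) -> int:
    ops_per_object = 2 if archive_after_upload else 1  # upload + set-tier vs. upload only
    return objects * ops_per_object

changed_objects = 100_000  # hypothetical monthly churn
print(upload_api_calls(changed_objects))         # 200000 operations
print(upload_api_calls(changed_objects, False))  # 100000 operations
```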

And on top of all this, restoring and downloading is very expensive. Because Azure's `Archive` tier is offline tape storage, a file first needs to be rehydrated before it can be restored, which takes anywhere from a couple of hours to days - and costs money. Rehydrating 3.5 TiB costs ~100€, API calls for 1 million objects cost ~720€ (maybe even twice that, because each object needs to be restored and then downloaded from a higher tier), bandwidth for downloading the data adds ~250€, and taxes come on top of everything.
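
For orientation, the full-restore estimate as a back-of-the-envelope calculation; the figures are the rough numbers from this issue, not quoted prices:

```python
# Back-of-the-envelope version of the Azure full-restore estimate above.
# All figures are the rough numbers from this issue (~3.5 TiB, ~1 million
# objects); taxes and the possible 2x on API calls are left out.

rehydration_eur = 100  # rehydrate ~3.5 TiB from the Archive tier
api_calls_eur = 720    # restore + read ~1 million objects
egress_eur = 250       # download ~3.5 TiB

total_eur = rehydration_eur + api_calls_eur + egress_eur
print(f"Azure full restore: ~{total_eur}€ + taxes")  # ~1070€ + taxes
```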

TL;DR: Azure is only usable for restoring single files, because otherwise I'd go bankrupt.

Currently evaluating GCP:

Their `Archive` tier is actually not offline. It's slower (but access times are still measured in milliseconds), and there are no rehydration fees or long waiting times. While storage is a little more expensive at $1.20/TB, plus ~160€ data retrieval and ~260€ download for a full restore, API calls are way cheaper for me at ~50€ for one million requests. That means most of the cost compared to Azure vanishes and a full restore becomes realistic - still very expensive at ~470€ + taxes, but something I can pay in case both my main and mirror storage completely fail.
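
The same back-of-the-envelope calculation for GCP, again using only the rough figures above:

```python
# Back-of-the-envelope version of the GCP full-restore estimate above.

retrieval_eur = 160  # Archive-class data retrieval for ~3.5 TiB
egress_eur = 260     # download ~3.5 TiB
api_calls_eur = 50   # ~1 million read requests

total_eur = retrieval_eur + egress_eur + api_calls_eur
print(f"GCP full restore: ~{total_eur}€ + taxes")  # ~470€ + taxes
```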

Note: This applies only to Seafile data, the rest of libre.moe can probably be restored for under 1€.

lukas added the Involves/Testing, Priority/Low, Domain/libre.moe labels 2024-06-02 20:46:56 +02:00
lukas self-assigned this 2024-06-02 20:46:56 +02:00
lukas added this to the Issue Board project 2024-06-02 20:46:56 +02:00
Author
Owner

Additionally, this adds another capability I want: GCP is always zone-redundant. While their docs don't go into much detail, unlike Azure, which stores data flagged as LRS (the cheapest redundancy option) in only one AZ, GCP makes sure that data is stored in at least two different AZs.

In my books, this makes GCP more durable than Azure and leaves me less worried about data loss. While unlikely, Azure could lose data due to a DC burning down or getting flooded. The probability that two DCs (in two AZs) go down within a short timeframe is extremely low, which makes this backup suitable even for data that only has B2 as its primary storage (I plan on doing more directly within object storage in the future).

I usually require all non-temporary data on libre.moe to exist redundantly in three locations. When using object storage directly, this currently means that data exists as one redundant copy in B2 and three copies in Azure within a single physical location - so the data effectively lives in only two locations, and any kind of failure/error of B2 or Azure puts it at great risk, as a single local emergency could mean the loss of all of it.

With GCP keeping data in at least two AZs (let's simplify this to Google keeping two copies in two locations), data living directly in B2 still follows 3-2-1 (1x on B2, 2x on GCP) and also meets my requirement of three locations (B2, GCP-AZ1, GCP-AZ2).
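
To make the copy/location counting explicit, a minimal sketch of that check; the layout is just the simplified model from this comment, not an authoritative inventory:

```python
# Minimal sketch of the 3-2-1 / three-location check described above.
# Layout assumption: B2 as primary, GCP storing two copies in two AZs.

def satisfies_requirements(copies: list[tuple[str, str]]) -> bool:
    """copies = [(provider, location), ...]: >=3 copies, >=2 providers,
    and >=2 distinct physical locations."""
    providers = {provider for provider, _ in copies}
    locations = {location for _, location in copies}
    return len(copies) >= 3 and len(providers) >= 2 and len(locations) >= 2

layout = [("B2", "B2-DC"), ("GCP", "GCP-AZ1"), ("GCP", "GCP-AZ2")]
print(satisfies_requirements(layout))               # True
print(sorted({location for _, location in layout})) # three distinct locations
```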

Author
Owner

Related to this is also #30, as Storj or Scaleway Glacier could be good options, especially because of their low storage and API/bandwidth cost.

lukas added spent time 2024-06-03 17:15:56 +02:00
1 hour 12 minutes
Author
Owner

I decided to skip the 3rd cloud copy and resort to just keeping Seafile backups on a local disk at my house. Since this is the emergency backup for when both Hetzner's Storage Box and Backblaze B2 fail, keeping just one copy without redundancy is a valid option - especially since a full recovery of all data from Azure is financially impossible.

While the backups of the other libre.moe services can continue to stay on Azure thanks to its object locking feature, this was never feasible for Seafile in the first place, as Seafile data is backed up incrementally and locking individual files would be transactionally expensive and complicated to set up. A local HDD will be the best option, as it will be literally air-gapped from the internet once the backup is done.
Now, while this is a manual process, it's not too difficult and also less important, as this backup will likely never be used.

If I switch away from Hetzner's Storage Box and decide to keep Seafile directly on object storage, I'll add Storj DCS into the mix, as it's likely very comparable to Hetzner in terms of price for actually used GBs.

Also, if Seafile moves away from Azure and Azure is just used to store all the other backups, I can switch that storage account to ZRS. It'll cost almost nothing and will make Azure pretty resilient against anything: these files will be protected from both accidental deletion and hardware or even datacenter failures.

Author
Owner

To-Do

  - [x] make local copy of Seafile's data
  - [x] verify that the local copy can be used to successfully deploy a copy of the Seafile server
  - [x] remove Azure as backup target from scripts
  - [x] delete storage data from Azure
  - [x] set storage account to ZRS/GRS to improve resiliency of Azure backups
Author
Owner

Azure LRS/ZRS could also still be used as an alternative to Storj DCS when using the "Cold" instead of the "Archive" tier, as the API calls for reading 1 million objects would come to around 10€ instead of the ~500€ for archived blobs (see the sketch after the list below).
Currently, this is not an option, as having Hetzner BX + Backblaze B2 + Azure Cold is just too costly and not at all required. As long as Hetzner is the primary location, B2 + a local copy is more than enough, but when Seafile moves to object storage, a second, highly resilient storage location is needed. This could be either Storj or Azure, as both theoretically provide a high amount of redundancy.

  • Storj is globally distributed and can lose more than half of all shards without losing data, while also having no API fees and low download fees ($7/TB)
  • Azure keeps three copies (in multiple zones when using ZRS), is an industry standard and therefore highly trusted and expected to be stable, but charges for all kinds of API calls and has very costly download bandwidth - which can be avoided by having the data mailed on a disk for a small charge, or via a support ticket under the European Data Act.
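
A rough sketch of that read-API cost difference; the per-million rates are back-derived from the figures in this comment, not quoted Azure list prices:

```python
# Rough sketch of the read-API cost difference mentioned above. The
# per-million rates (~10€ for Cold vs. ~500€ for archived blobs) are taken
# from this comment, not from an Azure price sheet.

def api_read_cost_eur(objects: int, eur_per_million: float) -> float:
    return objects / 1_000_000 * eur_per_million

objects = 1_000_000  # roughly the Seafile backup object count
print(f"Cold:    ~{api_read_cost_eur(objects, 10):.0f}€")   # ~10€
print(f"Archive: ~{api_read_cost_eur(objects, 500):.0f}€")  # ~500€
```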
lukas changed title from Switch away from Azure for long-term backups to Switch away from Azure for mass-storage backups 2024-06-30 17:17:51 +02:00
Author
Owner

Closing this as the local copy has now been verified to be working.

lukas closed this issue 2024-07-25 20:52:39 +02:00