Infrastructure v2 #53
Reference: KomuSolutions/igot99issues#53
This issue tracks a breaking change to libre.moe's server infrastructure. Currently, two servers, arael and armisael, sit behind libre.moe. While running and maintaining this setup, I learned a few things about server administration, and I am now looking to provision a new infrastructure built for higher stability, better scaling, and better cost efficiency.
New Servers
Sahaquiel
This will be the "core" server, on which most of the others will depend and which will host only a few services, to maintain high stability and performance.
It will provide a central database for all other services and will also host authentication, a new LDAP user directory, Gitea, and the Vaultwarden and Mumble instances.
Ireul
Ireul will serve most of the other services, including Drone, Wiki, OnlyOffice, Nextcloud, Szuru, and the libre.moe website.
Leliel
As the name implies, Leliel will serve the art/waifu repo along with Seafile. Analogous to the angel Leliel, the server symbolizes a large unknown space in which an almost unlimited amount of data may reside, powered by object storage. This gives the hosted services easy scalability, as they are likely to grow in terms of data storage.
New Policies and Configurations
Transactional email
Emails sent by all services will be routed through Scaleway's transactional email service; the local relay on Arael will be removed in favor of a regular email service. Given the pitfalls of self-hosted email, I want to offload the sending of non-personal mail to a third party, at least while the email system is not yet fully set up.
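As a sketch of how services could hand mail to that relay, each host could run a minimal SMTP client such as msmtp pointed at Scaleway's endpoint; the host name, account, and credential paths below are assumptions for illustration, not values from this issue:

```shell
# /etc/msmtprc — hypothetical relay configuration for Scaleway
# transactional email (endpoint and credentials are placeholders)
account        scaleway
host           smtp.tem.scw.cloud
port           587
tls            on
auth           on
user           <project-id>
passwordeval   cat /etc/msmtp-password
from           noreply@libre.moe

account default : scaleway
```

Services that speak SMTP directly would instead get the same host, port, and credentials in their own configuration.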
Volumes
Local application data will no longer be stored on the server itself, but on a hosted volume providing triple replication (single-AZ, as far as I know) for the likes of databases, metadata, and Gitea repositories. This protects against the failure of an entire rack or two, to which local storage is susceptible.
Volumes will be manually formatted with Btrfs, which actively detects errors through checksums and blocks corrupted data from being read.
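A hedged sketch of those provisioning steps, assuming the attached volume shows up as /dev/sdb (device name and mount point are placeholders):

```shell
# Format the attached volume with Btrfs (data and metadata
# checksums are enabled by default)
mkfs.btrfs -L appdata /dev/sdb

# Mount it; transparent zstd compression is optional but cheap
mount -o compress=zstd /dev/sdb /srv/appdata

# Periodically verify all checksums; corrupted blocks are reported
# (and refused on read) instead of being silently returned
btrfs scrub start -B /srv/appdata
```

A scrub run from a systemd timer or cron job keeps silent corruption from going unnoticed between reads.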
Object Storage
Object Storage provides high availability, strong resilience and cost-effective data storage, which is why it will be used for applications storing larger amounts of data.
Most services on libre.moe store data directly in a mounted file system (as described above), so Hetzner Object Storage will be exposed through JuiceFS as a local storage mount for applications, enabling cost-effective storage with local NVMe caching for frequently used files.
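As a sketch, JuiceFS needs a small metadata store alongside the S3-compatible bucket; the Redis URL, bucket endpoint, and cache size below are placeholders, not values from this issue:

```shell
# One-time: create the file system, backed by Hetzner Object Storage
juicefs format \
  --storage s3 \
  --bucket https://<bucket>.your-objectstorage.com \
  --access-key "$ACCESS_KEY" --secret-key "$SECRET_KEY" \
  redis://127.0.0.1:6379/1 appdata

# Mount it like a local file system, with an NVMe-backed cache
# (--cache-size is in MiB, i.e. ~100 GiB here)
juicefs mount \
  --cache-dir /var/jfsCache --cache-size 102400 \
  redis://127.0.0.1:6379/1 /srv/appdata
```

Applications then read and write /srv/appdata as if it were local disk, with hot files served from the NVMe cache.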
For applications that natively support object storage and are used primarily as personal data storage, Backblaze B2 will usually be used instead, as it includes (in my case) large amounts of data transfer (when accessed directly from outside) at cost-effective pricing. For example, the (hopefully) soon-coming Blobfisch and Ente Photos will use it as primary storage.
Database dumps and server configurations are also mirrored daily to Backblaze B2 and versioned for a few days.
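A minimal sketch of such a daily job, assuming rclone with a configured remote named b2 and a bucket called libre-backups (both names are placeholders, as is the database name):

```shell
# Dump the central PostgreSQL database in custom format...
pg_dump -Fc -f "/tmp/db-$(date +%F).dump" appdb

# ...and mirror dumps plus configs to B2; with bucket versioning
# enabled and a lifecycle rule, old versions expire after a few days
rclone copy "/tmp/db-$(date +%F).dump" b2:libre-backups/dumps/
rclone sync /etc/ b2:libre-backups/configs/
```

Because B2 keeps prior versions of overwritten or deleted objects (subject to the bucket's lifecycle rules), `rclone sync` is safe to use for the config mirror.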
Backups
Log files, as well as user data of a highly temporary nature, may be excluded from all backups.
As all data lives on highly redundant volumes or object stores, backups focus largely on geo-resilience and on versioning against natural disasters, malware, or simply human error. Data is always backed up in its purest form, not through layers of abstraction such as overlay file systems; this protects against the failure of such a layered system and also allows the real files to be versioned.
Scaleway's Glacier storage, located in an underground fallout shelter 25 m below Paris, will store atomic monthly copies of small files such as database dumps and config files, and incremental backups of larger folders not suitable for atomic copies, such as those of Szuru, Gitea, or Seafile. With the servers and data stores in Germany and the Netherlands and the backup in France, this architecture also provides multi-regional durability for all data.
This backup is on a Glacier tier, meaning data cannot be read directly; it has to be requested beforehand and only becomes accessible once it has been restored from the Glacier, which can take from 12 hours to possibly several days. This is acceptable, as the likelihood of an object store failing to the point of permanent data loss is very, very low. Scaleway Glacier is not only very redundant, but also protected from many natural disasters by sitting 25 m underground in a fallout shelter. Another benefit of this offer is its price: at €2/TB (before tax) it is extremely cost-effective, yet it can replace offerings from other vendors, because in this architecture there is no need for the data to be available instantly and a few days of recovery time is just fine.
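Via the S3-compatible API, the upload and the later restore request could look roughly like this; the bucket and object names are placeholders, and fr-par is the Paris region:

```shell
# Upload a monthly atomic copy straight into the GLACIER storage class
aws s3 cp db-2024-01.dump s3://libre-glacier/dumps/ \
  --storage-class GLACIER \
  --endpoint-url https://s3.fr-par.scw.cloud

# Before the object can be read again, it must first be restored
# (hours to days); it then stays readable for the requested period
aws s3api restore-object \
  --bucket libre-glacier --key dumps/db-2024-01.dump \
  --restore-request '{"Days": 7}' \
  --endpoint-url https://s3.fr-par.scw.cloud
```

A recovery run would issue the restore requests first, wait for completion, and only then download the objects.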
Additionally, a local HDD will be asynchronously synchronized with the production data. While there is no redundancy in this third copy, two highly redundant storage systems, one of them in a nuclear fallout shelter, should already provide jaw-dropping durability, so a single HDD is perfectly fine here.
Too Long; Didn't Read
Notes for future archival services
In case I ever realize my idea of a YT archival service, it will probably just use Storj DCS, as it is cheap, highly resilient, and globally distributed. While every downloaded byte is billed, the archival nature of the service would keep download demand very low. The goal is to offer a service with dirt-cheap but decently reliable storage, so it will not be integrated into the backup policies outlined here.
Tasks
Sahaquiel
Ireul
Leliel
seaf-fsck to verify nothing is missing and all files pass integrity checks

Most changes have been completed, and the infrastructure is now running purely on the newly created servers, using block or object storage as the backend, with backups on B2 and soon also in Scaleway's fallout shelter.
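For the seaf-fsck task, Seafile ships a wrapper script in its installation directory; the path below reflects the typical layout and may differ on this setup:

```shell
# Read-only check: reports corrupted or missing blocks per library
/opt/seafile/seafile-server-latest/seaf-fsck.sh

# Optionally limit the check to specific libraries by their IDs
/opt/seafile/seafile-server-latest/seaf-fsck.sh <library-id>
```

The check should run while the data is mounted and quiescent, so a scan over the object-storage-backed mount reflects the real state of the libraries.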
Missing actions