[libreoffice-website] Minutes from the Tue May 21 infra call


1. guilhem
2. Brett
3. Cloph


* Gluster status:
+ Found this on excelsior: "got non-linkfile delta-replicate-0:/.shard/….2, gfid = 00000000-0000-0000-0000-000000000000"
+ Some orphaned shards
- No corresponding gfid:
~$ diff <(find /srv/delta/gluster/.shard -type f -printf "%P\\n" | sed 's/\..*//' | sort -u) \
<(getfattr -n glusterfs.gfid.string -e text /mnt/delta/* | sed -nr 's/^glusterfs\.gfid\.string="([^"]*)"$/\1/p' | sort)
- shard link count <2
~$ find /srv/delta/gluster/.glusterfs/[0-9a-f][0-9a-f] -type f -links -2
+ dlarchive: split the 2TiB drive into four 512GiB Physical Volumes for
better balancing across replicate pairs
+ Redmine: Planned auth backend migration to Single Sign-On (OAuth2)
- Will proceed after June 30
- Announced by banner + individual mail (for users who last logged in after
- 47 (verified) users logged in the past 90 days, 42 (89%) of which in
LDAP → +6% since announced
- 120 (verified) users logged in since 2018-01-01, 89 (74%) of which in
LDAP → +4% since announced
+ PITR (cf. Brett's message)
- Are tablespaces being used, and is there any plan to use them in future?
PITR behaves in tricky ways when they're in use.
. No they currently are not, and I don't *think* we'll use them in the
foreseeable future.
- Is storage a concern? CPU/disk usage? (i.e. should the WAL logs be
. How much data is being streamed
. Currently we have `ssh $remote 'pg_dump … | gzip' >/path/to/dump.sql.gz`
and this is not causing a bottleneck
. Brett: regular base dumps are still recommended; makes disaster recovery
a lot shorter (don't need to replay since epoch, only since the last
snapshot), possibility to squash and remove dead rows (à la VACUUM)
. Are there any databases with very low traffic?
. WAL segments are archived when older than $TIME or bigger than $SIZE;
this potentially blows up the archive size if the DB is never written to
. gerrit, pad, bugzilla, nextcloud, askbot → data loss really
problematic, lots of writes
. limesurvey, devcentral, download, downloadarchive → not often written
to, 24h data loss "acceptable" (can even be rebuilt from zero if
. crashreport → also often written to, but not as critical as bugzilla,
askbot or gerrit
. Brett: timeout can be expressed in minutes not hours, and still be
acceptable from a performance perspective
. Brett: knobs can be tuned later, only one reload/restart (+snapshot?) away
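To put the archive-size concern for rarely written databases into numbers,
here is a rough sketch of an upper bound. It assumes one segment switch per
timeout interval, uncompressed segments, and PostgreSQL's default 16 MiB WAL
segment size; the function name and the two timeout values are only
illustrative and were not discussed on the call.

```shell
# Upper bound on daily WAL archive growth (in MiB) for a mostly idle
# database, assuming one forced segment switch per timeout interval.
wal_archive_mib_per_day() {
    timeout_s=$1          # hypothetical archive timeout, in seconds
    segment_mib=${2:-16}  # 16 MiB is PostgreSQL's default WAL segment size
    echo $(( 86400 / timeout_s * segment_mib ))
}

wal_archive_mib_per_day 300    # 5-minute timeout → 4608
wal_archive_mib_per_day 3600   # 1-hour timeout   → 384
```

Compressing the archived segments would shrink these figures considerably,
since a forced switch on a quiet database leaves most of the segment unused.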
+ Do we want `archive_mode` to be "on" or "always"?
- on
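For reference, a minimal sketch of what the corresponding settings could look
like. The function name, the timeout value and the archive path are
assumptions for illustration, not decisions from the call; the
archive_command follows the pattern given in the PostgreSQL documentation
and would look different if Barman ends up being used.

```shell
# Print a hypothetical postgresql.conf fragment for WAL archiving.
# All values below are placeholders, not agreed-upon settings.
print_archive_conf() {
    cat <<'EOF'
archive_mode = on        # 'always' would also archive while in recovery/standby
archive_timeout = 300    # seconds; illustrative value only
archive_command = 'test ! -f /path/to/archive/%f && cp %p /path/to/archive/%f'
EOF
}

print_archive_conf
```

Changing archive_mode needs a server restart, whereas archive_timeout and
archive_command can be changed with a reload.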
+ Possibly later: move all RDBMS to physical machine(s) with SSDs, with a
failover or multi-master replication
- Would lower I/O on the guests
+ Brett: Barman vs. custom-made scripts (would have to maintain configuration too)
- can give Barman a try (available in Debian)
* Next call: Tuesday June 25 2019 at 18:30 Berlin time (16:30 UTC).
→ Note: 4th Tuesday this time!


