[libreoffice-website] Minutes from the Thu Apr 18 infra call

[Rescheduled from Tue Apr 16.]


1. guilhem
2. cloph
3. Brett


* Post mortem March 26 incident
+ Distributed FS
. cloph: not opposed to another solution; a cloud solution is probably not
feasible cost-wise (Brett and guilhem agree, AI guilhem to double
. would require rebalancing (hence changing isolation level) as ordering
50-60 VPS would be very costly
. cloph: a single big issue with Gluster in ~6 (?) years, convenient
scapegoat but not all issues should be attributed to that setup
. ceph was considered at the time, abandoned since ≥6 nodes are (were?)
+ guilhem: would like some SSD-based cache for faster reads; cloph: Gluster
supports tier-based setups, but we're not using that currently; not sure if
there are any slots left in the hypervisors (guilhem to check)
. Another problem is random IO from the crashtest VM (vm138), the disk
image of which resides on charly's /data/fast, like gluster's delta
arbiter. The two race and trigger heals
. If there is room for one or more SSDs on each hypervisor we should move
the arbiter (and swap) there
+ cloph: could also compare our current machines with the newer generation
from manitu (125€/month, Intel Xeon E3-1240 v5)
+ rebalance gluster volumes: too many VMs on delta, heals drain IO from too
many services
. delta 2 x (2+1), alpha, gamma, kappa 1 x 3
. cloph: straightforward to change the layout on an existing volume
. can rebalance, can also tune the healing settings so it doesn't starve
all guests
+ Continuous Archiving and Point-in-Time Recovery (PITR)
. replace dump-based database backups with continuous logs
. https://www.postgresql.org/docs/9.6/continuous-archiving.html
. can be dry-run without disruption to the current setup
. Brett: can try out on the [Matrix] host (vm222), push the WAL logs to
itself for now
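
A minimal sketch of what the archiving side of such a dry run could look
like, assuming PostgreSQL is configured with wal_level = replica,
archive_mode = on and an archive_command that invokes this script with %p
(path of the segment) and %f (its file name is taken from the path here).
The script name, the archive directory and the function name are
hypothetical and not part of the current setup. It follows the rule from the
PostgreSQL documentation that the command must return zero only on success
and should refuse to overwrite an already archived segment.

```python
import shutil
import sys
from pathlib import Path


def archive_wal(segment_path: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir and return a shell-style exit code.

    Returns 0 on success and 1 if the segment is already archived, so that
    PostgreSQL keeps the segment and retries instead of losing data.
    """
    src = Path(segment_path)
    dest = Path(archive_dir) / src.name
    if dest.exists():
        # Never overwrite an existing archived segment.
        return 1
    # Copy to a temporary name first, then rename, so that a partially
    # written file never appears under the final segment name.
    tmp = dest.with_name(dest.name + ".tmp")
    shutil.copy2(src, tmp)
    tmp.rename(dest)
    return 0


# Hypothetical wiring in postgresql.conf (paths are assumptions):
#   archive_command = 'python3 /usr/local/bin/archive_wal.py %p /srv/wal-archive'
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

Pushing the segments to a directory on the same host, as proposed for vm222,
only exercises the mechanism; it is not a backup until the archive lives on
another machine.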
* OCS-Webserver (new extension site) deployment
+ somewhat delayed due to infra issues
+ successfully bridged to SSO; some hard-coded third-party requests still
to chase and remove before opening the site for contributions
* tb31
+ connects to the outside from (not but
seems to block incoming TCP SYN, ICMP
+ is it sitting behind a firewall/NAT box?
+ cloph: we can move lcov to another host (part of LODE)
+ AI guilhem: check with Sophie/thb for contact info, fix incoming
connections, perform full backup and ask for an installation of CentOS7 (or
just ask for KVM access if possible)
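
A minimal sketch for re-checking incoming TCP connectivity once the
firewall/NAT question is settled, using only the Python standard library.
The function name is hypothetical, and the host and port to probe are left
to the caller. It cannot tell a dropped SYN from a closed port: both simply
show up as "not reachable".

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. silently dropped SYN),
        # unreachable networks and name resolution failures.
        return False
```

Running it from a machine outside tb31's network against the SSH port would
be the obvious first check.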
* Build bots:
+ tb66 will be brought offline permanently in early May
+ large build logs
. 4GiB logs are not sustainable (a build issue caused a series of build
logs to grow by a factor of >20 last month) and DoS the CI
. gerrit_linux_clang_dbgutil logs are normally up to ~250MiB, then grew to
4GiB, cf. gerrit_linux_clang_dbgutil/builds/28522/log.gz (288MiB
compressed, 4GiB uncompressed)
. there is a jenkins plugin to limit log size but that only works with
pipeline-based setups
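
Since the size-limiting plugin does not apply to non-pipeline jobs, one
possible stopgap is to detect oversized logs after the fact. The sketch
below streams a compressed log.gz and reports its uncompressed size without
loading it into memory. The function names and the 1GiB default limit are
assumptions for illustration, not an agreed policy.

```python
import gzip

MIB = 1024 ** 2
GIB = 1024 ** 3


def uncompressed_size(path, stop_after=None, chunk_size=MIB):
    """Return the uncompressed size in bytes of a gzip file, read in chunks.

    If stop_after is given, reading stops as soon as the running total
    exceeds it, so the result is then only a lower bound.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            if stop_after is not None and total > stop_after:
                break
    return total


def is_oversized(path, limit=1 * GIB):
    """Return True if the log decompresses to more than limit bytes."""
    return uncompressed_size(path, stop_after=limit) > limit
```

Stopping early matters here: a log like the 4GiB one cited above would
otherwise have to be decompressed in full just to learn that it is too big.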
* Preserve old download links (redirect to downloadarchive), cf. wget's mail
+ cloph: not keen to manually maintain a map for a single distro
+ switching the links to use downloadarchive instead of download might
take time but that's just a one-off overhead
* Chance to speed up latam conference site deployment? → done
* Next call: Tuesday May 21 2019 at 18:30 Berlin time (16:30 UTC).


