[libreoffice-website] Minutes from the Tue Jan 16 infra call

Guilhem Moulin <guilhem -AT- libreoffice.org>
Tue, 16 Jan 2018 23:57:06 +0100

Participants
1. davido
2. guilhem
3. Brett
4. Christian

Agenda
* Upgrade ancient Gerrit version 2.11.8 to 2.13.9 (used by OpenStack
and Wikimedia for years now, without any issue)
- Q: according to my notes 2.11.8 was released on 2016-03-09 and
2.13.9 on 2017-07-03? Are there known vulnerabilities in 2.11.x? Or
is it about getting fixes and the newest, shiniest software?
. No known vulnerability, but there are a bunch of new features,
especially the inline edit feature
- David: dedup scripts should keep working with 2.13.x
- David: see the old Redmine ticket Norbert filed about the migration
. do you mean my comments in https://redmine.documentfoundation.org/issues/1587#note-4 ?
. I meant Norbert's comment: https://redmine.documentfoundation.org/issues/1587#note-8
- Cloph: difficult to test everything, as OAuth needs a proper DNS setup
- Cloph: can't copy the database to a test VM and grant access to
everyone as we have private repos
- Cloph: release-wise, it would be ideal to do that (switching the
live instance) in March or so (after 6.0.1)
- Q: Is Norbert coming to FOSDEM? Would be ideal time to brainstorm
there
- Roadmap:
- Set up staging gerrit instance:
- Synchronize production gerrit content to gerrit-test:
- Simulate upgrade process:
. Stop gerrit
. Perform database and git repository backup
. Update gerrit version
. Update all external plugins (gerrit-oauth-provider)
. Run init command in batch mode, all used internal plugins
should be updated (double-check)
. Run reindex command
. Start gerrit
. Verify that gerrit still works as expected
→ this is the (very) hard part, as the test instance cannot have
all features enabled, and of course you can't think of every
possible user mistake that would have to be dealt with.
- Schedule the gerrit upgrade in production
- AI guilhem/cloph: create a Redmine ticket to track progress
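The upgrade simulation steps listed in the roadmap could be scripted on gerrit-test roughly as follows. This is a minimal Python sketch, not the procedure that was agreed on: it assumes (hypothetically) a PostgreSQL-backed ReviewDb named `reviewdb`, repositories under `<site>/git`, and example file paths. It only builds the ordered command list (a dry run) so it can be reviewed before anything is executed. The version check uses the Gerrit REST endpoint `GET /config/server/version`, whose JSON responses start with the `)]}'` anti-XSSI prefix.

```python
import json
from pathlib import Path


def upgrade_plan(site, new_war, oauth_jar, backup_dir, db_name="reviewdb"):
    """Return the upgrade steps as argv lists, in order. Nothing is executed.

    Hypothetical assumptions: ReviewDb is a PostgreSQL database named `db_name`,
    repositories live in <site>/git, and the OAuth plugin jar is copied as-is.
    """
    site, backup_dir, new_war = Path(site), Path(backup_dir), str(new_war)
    gerrit_sh = str(site / "bin" / "gerrit.sh")
    return [
        [gerrit_sh, "stop"],
        # database and git repository backup
        ["pg_dump", "-Fc", "-f", str(backup_dir / "reviewdb.dump"), db_name],
        ["tar", "-czf", str(backup_dir / "git.tar.gz"), "-C", str(site), "git"],
        # new gerrit version and external plugin (gerrit-oauth-provider)
        ["cp", new_war, str(site / "bin" / "gerrit.war")],
        ["cp", str(oauth_jar), str(site / "plugins" / Path(oauth_jar).name)],
        # init in batch mode (per the notes, should also update internal plugins)
        ["java", "-jar", new_war, "init", "-d", str(site), "--batch"],
        # rebuild the secondary index
        ["java", "-jar", new_war, "reindex", "-d", str(site)],
        [gerrit_sh, "start"],
    ]


def parse_server_version(body):
    """Parse the body returned by GET /config/server/version.

    Gerrit prefixes JSON responses with ")]}'"; strip it before decoding.
    """
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)
```

After the restart, checking that `parse_server_version(...)` returns `"2.13.9"` is a cheap first step of the verification; the functional checks remain the hard part.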
* Gerrit: added `git gc` to a weekly cronjob so crap doesn't accumulate
and slow down the repo
- Q: is the frequency suitable? Should we also pass --aggressive
(cloph: no) and/or --prune=<date>?
- Cloph: the slowness might be caused by gerrit keeping file
descriptors (FDs) open
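For reference, the weekly `git gc` job could look like the Python sketch below. The base directory and the `--prune` value are illustrative assumptions (git's own default prune expiry is two weeks); `--aggressive` is deliberately left out, following cloph's answer. It treats any directory named `*.git` that contains a `HEAD` file as a bare repository.

```python
import subprocess
from pathlib import Path


def find_bare_repos(base):
    """Yield bare repositories (directories named *.git with a HEAD file) below `base`."""
    for path in sorted(Path(base).rglob("*.git")):
        if path.is_dir() and (path / "HEAD").is_file():
            yield path


def gc_commands(base, prune="2.weeks.ago"):
    """Build one `git gc` invocation per repository (no --aggressive)."""
    return [["git", "-C", str(repo), "gc", "--quiet", f"--prune={prune}"]
            for repo in find_bare_repos(base)]


def run_weekly_gc(base, prune="2.weeks.ago"):
    """Run the commands one repository at a time; stop on the first failure."""
    for cmd in gc_commands(base, prune):
        subprocess.run(cmd, check=True)
```

From cron this would be called as e.g. `run_weekly_gc("/srv/gerrit/git")`, a hypothetical path. Running the repositories sequentially keeps the I/O load predictable.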
* Network issues (hypers, gustl) seem fixed since manitu plugged them
into a new switch last week (Wed Jan 10)
- Need to keep an eye on the RX FCS error counter (gustl) and the
link speed (hypers)
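Keeping an eye on both values can be automated by sampling them periodically and comparing samples. A minimal Python sketch, assuming the driver reports FCS errors through the generic sysfs counter `statistics/rx_crc_errors` (the names shown by `ethtool -S` vary per driver) and that 1000 Mb/s is the expected link speed; interface names are up to the caller.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def read_counters(iface, root=SYSFS_NET):
    """Read the RX CRC/FCS error counter and the link speed (Mb/s) of `iface`."""
    base = Path(root) / iface
    crc = int((base / "statistics" / "rx_crc_errors").read_text())
    try:
        speed = int((base / "speed").read_text())
    except OSError:  # some drivers refuse to report a speed when the link is down
        speed = -1
    return {"rx_crc_errors": crc, "speed": speed}


def check(previous, current, expected_speed=1000):
    """Compare two samples and return human-readable warnings."""
    warnings = []
    delta = current["rx_crc_errors"] - previous["rx_crc_errors"]
    if delta > 0:
        warnings.append(f"{delta} new RX CRC/FCS errors")
    if current["speed"] != expected_speed:
        warnings.append(f"link speed is {current['speed']} Mb/s, expected {expected_speed}")
    return warnings
```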
* Saltstack:
- mail.postfix state is now ready; see mail/README for the config
options (and pillar for usage examples: antares, vm150, vm194,
vm202, etc.)
- Proposal: tighter control over SSH and sudo access:
. ACL for SSH access already in place (user must be in ssh-login
Unix group, which is assigned — and possibly trimmed — with the
ssh salt state)
. Also limit the authorized_keys(5) list to the keys that are found
in pillar? This would avoid, e.g., leaving your key in
~root/.ssh/authorized_keys during a migration and forgetting
about it afterwards → OK
. Also assign — and possibly trim — the list of sudo group members
in salt? → OK
    group_map:
      sudo: [ superdev1, superdev2 ]
      adm: other-username
. Cloph: beware of shared users (e.g., tdf@ci); some YAML-foo is
needed to share SSH keys
. These would provide a clear overview (in pillar) of who has
access to what; the same could be done using NSS and pam_ldap(5).
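The "assign and possibly trim" logic behind both proposals boils down to a set difference between what pillar declares and what is on the host. The Python sketch below illustrates that logic only; it is not a Salt state, and the pillar layout `{user: [public keys]}` is a hypothetical assumption. It also accepts both forms used in the `group_map` example (a list, or a single user name).

```python
def render_authorized_keys(user, pillar_keys):
    """Return authorized_keys content for `user`, containing only keys found in pillar.

    Any key absent from pillar (e.g. one left behind during a migration) is dropped.
    """
    return "".join(f"{key}\n" for key in pillar_keys.get(user, []))


def normalize_group_map(group_map):
    """Accept both `sudo: [a, b]` and `adm: user` forms and return lists."""
    return {group: [members] if isinstance(members, str) else list(members)
            for group, members in group_map.items()}


def reconcile_group(desired, current):
    """Members to add to and remove from a Unix group so it matches pillar exactly."""
    desired, current = set(desired), set(current)
    return {"add": sorted(desired - current), "remove": sorted(current - desired)}
```

For example, with pillar listing `superdev1` and `superdev2` for `sudo` and the host currently having `superdev1` and `olduser`, the result is to add `superdev2` and remove `olduser`.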
* Backup
- right now rsnapshot-based (using `ssh -lroot` from berta as rsync's
remote shell)
. do we really want to open a root shell to each host from berta?
→ Nope :-)
. for rsync we could at least add restrictions on the SSH key
(remount the fs read-only, and use `rsync --server --sender …` as
the forced command)
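One way to restrict berta's key is an `authorized_keys` entry with `restrict` (OpenSSH 7.2 or later), `from=` and a forced command pointing at a small wrapper that only lets rsync run in sender mode. A minimal Python sketch; the wrapper path is hypothetical. sshd passes the command the client asked for in `SSH_ORIGINAL_COMMAND`. Only the argv prefix is checked, so sender-side options such as `--remove-source-files` are not filtered; that is why the read-only remount is still needed.

```python
import os
import shlex
import sys

WRAPPER = "/usr/local/bin/rsync-sender-only"  # hypothetical location of this wrapper


def authorized_keys_entry(pubkey, backup_host_ip):
    """Key line: no pty/forwarding, source address pinned, forced command."""
    return f'restrict,from="{backup_host_ip}",command="{WRAPPER}" {pubkey}'


def is_allowed(original_command):
    """Accept only rsync running as the server side in sender (read) mode."""
    if not original_command:
        return False
    argv = shlex.split(original_command)
    return argv[:3] == ["rsync", "--server", "--sender"]


def main():
    """Entry point of the wrapper: exec the requested rsync directly, without a shell."""
    cmd = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    if not is_allowed(cmd):
        sys.stderr.write("rejected: only `rsync --server --sender` is allowed\n")
        sys.exit(1)
    argv = shlex.split(cmd)
    os.execvp(argv[0], argv)
```

Since `main()` calls `execvp` without going through a shell, metacharacters such as `;` in the requested command are passed to rsync as plain arguments instead of being interpreted.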
- databases are downloaded in full each time, using pg_dumpall(1) or
mysqldump(1), and compressed locally
. large databases clutter disk I/O and network bandwidth (even
though we're far from saturating the link since the upgrade to the
new switch, that's wasteful); for instance, the bugzilla PostgreSQL
database is currently 44.9GiB (20.3GiB after gzip compression),
and takes around 95min to transfer at a sustained 5MiB/s transfer
rate *on the public interface*
→ AI guilhem: add a private interface on br1 to all VMs (brought
that up before, didn't do it yet)
. Q: do we know what the bottleneck is? Local disk I/O? Local
compression (zstd to the rescue)? Network (probably not)? berta's
disk I/O (probably not)?
→ cloph: it's single-threaded compression maxing out a CPU thread
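Since single-threaded compression is the bottleneck, piping the dump into a multi-threaded compressor should help. A minimal Python sketch of `pg_dumpall | zstd -T0 > file`, assuming zstd is installed on the host (`-T0` uses all cores; pigz would be the gzip-compatible alternative). The defaults are illustrative, and the function fails if either side of the pipe fails, so a truncated dump is never silently kept as a "successful" backup.

```python
import subprocess

# Illustrative defaults: full dump, compressed on all cores to stdout.
DEFAULT_DUMP = ["pg_dumpall"]
DEFAULT_COMPRESSOR = ["zstd", "-T0", "-3", "-c"]


def dump_compressed(out_path, dump_cmd=DEFAULT_DUMP, compressor_cmd=DEFAULT_COMPRESSOR):
    """Run `dump_cmd | compressor_cmd > out_path`; raise if either process fails."""
    with open(out_path, "wb") as out:
        dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
        comp = subprocess.Popen(compressor_cmd, stdin=dump.stdout, stdout=out)
        dump.stdout.close()  # so the dump gets SIGPIPE if the compressor exits early
        comp_rc = comp.wait()
        dump_rc = dump.wait()
    if dump_rc != 0 or comp_rc != 0:
        raise RuntimeError(f"dump exited {dump_rc}, compressor exited {comp_rc}")
```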
. full backup is wasteful, especially with large databases. Does
anyone have experience with PostgreSQL continuous archiving
(PITR)?
https://www.postgresql.org/docs/9.6/static/continuous-archiving.html
→ AI guilhem: deploy that on some hosts to try it out (push the base
dir to berta, plus xlogs for incremental backups)
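A trial of continuous archiving needs two pieces: `postgresql.conf` settings that ship each completed WAL segment to berta, and a periodic base backup. The Python sketch below renders both under hypothetical assumptions (archive directory `/srv/backup/wal/<host>` on berta, SSH access from the database host). Following the PostgreSQL documentation, the `archive_command` refuses to overwrite an already archived segment and exits non-zero in that case. `%p` and `%f` are expanded by PostgreSQL to the segment's path and file name. Changing `wal_level` or `archive_mode` requires a server restart.

```python
ARCHIVE_HOST = "berta"            # assumption: WAL archive lives on berta
ARCHIVE_DIR = "/srv/backup/wal"   # hypothetical directory layout


def quote_conf(value):
    """Quote a postgresql.conf value (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def archiving_conf(host_name, archive_host=ARCHIVE_HOST, archive_dir=ARCHIVE_DIR):
    """Render continuous-archiving settings (PostgreSQL 9.6 parameter names)."""
    dest = f"{archive_dir}/{host_name}"
    settings = {
        "wal_level": "replica",
        "archive_mode": "on",
        # fail (non-zero) if the segment already exists, otherwise copy it over
        "archive_command": f"ssh {archive_host} 'test ! -f {dest}/%f' && rsync -a %p {archive_host}:{dest}/%f",
    }
    return "\n".join(f"{k} = {quote_conf(v)}" for k, v in settings.items()) + "\n"


def base_backup_cmd(target_dir):
    """Compressed tar-format base backup including the WAL needed to restore it."""
    return ["pg_basebackup", "-D", str(target_dir), "-Ft", "-z", "-X", "fetch"]
```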
* Pootle backmigration
- in discussion with manitu
- the backend domain tdf.pootle.tech doesn't exist anymore (NXDOMAIN);
added an entry to vm183:/etc/hosts
* Monitoring status update.
- Q: Is Icinga the desired monitoring platform over something like
TICK? telegraf (client); central server with influxdb, chronograf,
kapacitor
. chronograf phones home; need to turn that off
. Q: which transport does it use? brett: Go server. G.: HTTPd?
. licences: Telegraf, InfluxDB and Kapacitor: MIT; Chronograf:
AGPL3
. AI guilhem: setup test VM for influxdb, chronograf, kapacitor
. Brett: get telegraf running on vm191 as a showcase
. https://github.com/influxdata/telegraf#input-plugins
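In the TICK stack, Telegraf writes its measurements to InfluxDB in the InfluxDB line protocol, which is also the format a custom check would emit (e.g. through Telegraf's exec input). A minimal Python formatter as illustration; the measurement, tag and field names in the usage example are hypothetical. It handles the common escaping rules (commas and spaces in measurements; commas, `=` and spaces in tag keys/values and field keys; quotes in string fields) and the `i` suffix for integers.

```python
def _escape(text, specials):
    """Backslash-escape the characters that are special at this position."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    if isinstance(value, bool):       # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"            # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Format one point as `measurement,tag=v field=v [timestamp]`."""
    line = _escape(measurement, [",", " "])
    for key in sorted(tags):          # InfluxDB recommends sorting tags by key
        line += "," + _escape(key, [",", "=", " "]) + "=" + _escape(str(tags[key]), [",", "=", " "])
    line += " " + ",".join(
        _escape(k, [",", "=", " "]) + "=" + _field_value(v) for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For example, `to_line("cpu", {"host": "vm191", "cpu": "cpu-total"}, {"usage_idle": 97.5}, 1516143426000000000)` gives `cpu,cpu=cpu-total,host=vm191 usage_idle=97.5 1516143426000000000`.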
* SSO adoption <https://user.documentfoundation.org>:
- 500 accounts in total \o/
- 54 new accounts created since the last infra call; 20 since
2018-01-01 00:00:00 UTC
- governance: 3/10 (new+old) board; 1/9 MC; 42/185 (23%) TDF members
missing
- contributors: 69/140 (49%) recent (last 90 days) wiki editors
missing
* Gluster: healing takes too long, perhaps due to 512B sharding? Or
large files?
- Cloph: we shouldn't limit ourselves to delta for new VMs: the other
volumes should also be reconfigured to use arbiters, and the images
rebalanced evenly
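Two small helpers could support this. First, the `gluster volume create` invocation for arbiter volumes, where every third brick of a replica set holds only metadata. Second, an even spread of VM images across volumes, assuming "rebalanced evenly" means balancing total image size per volume. The Python sketch uses the greedy largest-first heuristic. It is not optimal, but its largest per-volume total stays within 4/3 of the optimum. Volume, host and image names are hypothetical, and nothing is moved; it only proposes a placement.

```python
import heapq


def arbiter_create_cmd(volume, bricks):
    """`gluster volume create` argv for replica 3 with one arbiter per replica set."""
    if not bricks or len(bricks) % 3:
        raise ValueError("need a multiple of 3 bricks (2 data + 1 arbiter per set)")
    return ["gluster", "volume", "create", volume, "replica", "3", "arbiter", "1", *bricks]


def rebalance(images, volumes):
    """Assign images (name -> size) to volumes, biggest first, each onto the least-loaded volume."""
    heap = [(0, vol) for vol in volumes]   # (total size assigned so far, volume)
    heapq.heapify(heap)
    placement = {vol: [] for vol in volumes}
    for name, size in sorted(images.items(), key=lambda item: (-item[1], item[0])):
        load, vol = heapq.heappop(heap)
        placement[vol].append(name)
        heapq.heappush(heap, (load + size, vol))
    return placement
```

For instance, images of sizes 40, 30, 20 and 10 spread over two volumes end up as 40+10 and 30+20, i.e. 50 on each.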
* Next call: Tuesday February 20 2018 at 18:30 Berlin time (17:30 UTC)

--
Guilhem.
