
1. guilhem
2. Brett
3. Emiliano
4. Marco


 * New participant: Marco
   + TDF member
   + Authored the LOOL deployment guide on the wiki
   + Python dev + sysadmin
   + lurking here for now :-)
 * Presentation round
   + guilhem: TDF staff member in charge of infrastructure, main contact for
   + Emiliano: sysadmin, BoD member, part of the oversight area for infra
   + Brett: sysadmin/devops, TDF member, part of the infra team for a few
     years; helped with monitoring and metrics collection, among other things
   + cloph [absent]: mainly active on dev-related areas like tinderboxes,
     weblate and silverstripe
 * infratools (GitLab) deprecation
   + waiting to be opened
   + deadline to bring it down: June 30 (Jessie EOL)
   + Brett: wants to export the issues; interested in working on that again at
     some point
   + tickets could be moved to Redmine (~10 at most)
     - AI guilhem: create account for Brett (done)
   + Brett: CI won't be functional anymore [g. it wasn't really], could replace
     it with Jenkins if desired
      - either stick to a Docker-based deployment or switch to VMs
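A minimal Python sketch of the issue export/move step, assuming the GitLab v4 REST endpoint `GET /projects/:id/issues` for listing and Redmine's `POST /issues.json` (body wrapped in an `issue` object) for re-creating tickets. The base URL, project path, Redmine project identifier and the `[infratools#N]` subject prefix are hypothetical placeholders; authentication (GitLab `PRIVATE-TOKEN`, Redmine `X-Redmine-API-Key` headers) and the actual HTTP calls are left out.

```python
from typing import Any, Dict
from urllib.parse import quote


def gitlab_issues_url(base: str, project_path: str) -> str:
    """URL listing the open issues of a project via the GitLab v4 API.

    GitLab accepts the URL-encoded "namespace/project" path as the project id.
    """
    encoded = quote(project_path, safe="")
    return f"{base}/api/v4/projects/{encoded}/issues?state=opened&per_page=100"


def gitlab_issue_to_redmine(issue: Dict[str, Any], project_id: str) -> Dict[str, Any]:
    """Map one GitLab issue object to a Redmine POST /issues.json request body."""
    description = issue.get("description") or ""
    if issue.get("web_url"):
        # Record where the ticket originally came from.
        description += f"\n\nImported from {issue['web_url']}"
    return {
        "issue": {
            "project_id": project_id,  # hypothetical Redmine project identifier
            "subject": f"[infratools#{issue['iid']}] {issue['title']}",
            "description": description,
        }
    }
```

Pagination (GitLab's `page` parameter) is ignored here, since the minutes estimate ~10 tickets at most.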
 * OS upgrades:
   + 3 hypervisors upgraded to Buster
   + most guests as well
   + prod guests that are still pending: website, gerrit (waiting for upgrade to
     3.1), jenkins
 * libvirt API for guest healthchecks (FS %free, running kernel, etc)
   + guilhem: anyone with experience with golang/libvirt API?
     - Brett/guilhem: only basic patching, no real dev
   + EV: does the virt stack use plain qemu/KVM or is it behind proxmox or
     something else?
      - guilhem: KVM, qemu with gfapi block devices and libvirt for the
        management layer (gluing/monitoring/multiplexing)
   + Prometheus exporter:
      - removes the need to deploy node exporters on each guest
      - metric names are not that stable across node-exporter versions
      - not all VMs are connected to the VPN; would need to rewire or deploy a PKI
      - creates overhead
   + Brett: metrics can also be pushed to Prometheus (via the Pushgateway);
     advantages: no need to open a port or to connect guests to the VPN
     - guilhem: would still need a PKI for client auth though
   + libvirt stats API: network usage, %CPU, %mem, block device metrics (R/W,
     flush time and #requests)
      - qemu guest agent: can access more fine-grained info like FS usage,
        #users, running kernel
        . guilhem: would be nice to have dashboards showing how many VMs have a
          FS ≥85% full or run an old kernel, etc., and to configure the alert
          system accordingly
          (requires libvirt ≥5.7.0, guilhem: backported libvirt for that)
       . guilhem: wasn't able to patch libvirt_exporter_improved to use that API
     - Example using the virsh(1) binary; would like to have that data in /metrics
       $ virsh guestinfo vm150 | grep -F -eos.{pretty-name,kernel-release} -e{total,used}-bytes
       os.pretty-name      : Debian GNU/Linux 10 (buster)
        os.kernel-release   : 4.19.0-9-amd64
        fs.0.total-bytes    : 46909095936
       fs.0.used-bytes     : 27519135744
      - Brett: would be happy to have a look; EV: also interested in helping, not
        sure time will allow, though
 * gerrit upgrade to 3.1
   + before or after 7.0? (pending cloph's input)
   + git wire protocol v2 gives ~95% perf improvement on up-to-date trees (from
     20MiB to ~100kiB; also faster since git doesn't have to traverse as much)
      - might help TBs that are slow to fetch the repo and time out
 * tb69 "Disconnected by cloph : out of diskspace"
   + Do tinderbox owners need to perform routine maintenance?
      - g.: at least not for FS cleanups and baseline upgrades (unless physical
        login is needed)
   + Brett: Will install node exporter ≥0.16 on tb69 for monitoring
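A local check along these lines could warn before a build box runs out of disk space. This is a minimal Python sketch using only the standard library's `shutil.disk_usage`; the 10% threshold and the function names are hypothetical. Once node exporter ≥0.16 runs on tb69, the same ratio is available server-side as `node_filesystem_avail_bytes / node_filesystem_size_bytes`.

```python
import shutil


def free_ratio(path: str = "/") -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total


def needs_cleanup(free_bytes: int, total_bytes: int, min_free_ratio: float = 0.10) -> bool:
    """Hypothetical policy: flag the box once less than 10% of the disk is free."""
    return total_bytes > 0 and free_bytes / total_bytes < min_free_ratio


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(f"free: {free_ratio('/'):.1%}, cleanup needed: {needs_cleanup(usage.free, usage.total)}")
```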
 * Update on (Q: cloph)
   + user feedback/workflow?
   + vm179 (old Plone site) deprecation, deadline June 30 (Jessie EOL)
   + AI cloph:$NAME calls
     to Google (reCAPTCHA), put the comment/rating stuff on a separate page (or
     activate it with a button), perhaps even let authenticated users through
      - Marco: knows a reCAPTCHA alternative that can be self-hosted
       . does it work with
         (used to protect the page right now)?
 * Marco: what is used for collecting website visitor metrics?
   + Piwik/Matomo:
   + We use it for ~75/80 websites, and also for download and update check
     metrics collection (using custom exporters)
   + Fairly large instance with ~6M hits/day
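How the custom exporters work is not described here; purely as an illustration of recording a download, this Python sketch builds a request for Matomo's HTTP Tracking API (`matomo.php` with the `idsite`, `rec`, `url` and `download` parameters). The host, site ID and file URL are hypothetical, and actually sending the request (e.g. with `urllib.request.urlopen`) is left out.

```python
from urllib.parse import urlencode


def matomo_download_url(base: str, site_id: int, file_url: str) -> str:
    """Build a Matomo HTTP Tracking API request recording one file download."""
    params = {
        "idsite": site_id,     # Matomo site ID (hypothetical in the example below)
        "rec": 1,              # required for the hit to be recorded
        "url": file_url,       # URL associated with the hit
        "download": file_url,  # marks the hit as a download of this URL
    }
    return f"{base}/matomo.php?{urlencode(params)}"
```

For example, `matomo_download_url("https://matomo.example.org", 3, "https://download.example.org/x.msi")` returns a `matomo.php` URL with the file URL percent-encoded in both `url` and `download`.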
 * [rdm#3097] Try out MirrorBits as replacement for MirrorBrain
   + Marco: will look at this
     - /join #tdf-infra on freenode
 * Brett: how did the [Matrix] move go?
   + guilhem: adopted by BoD and staff right now
   + doesn't use Single Sign-On ATM; it was reverted last year to give clients
     (e.g. mobile) more time to implement SSO authentication
   + EV: let's brainstorm about it and see later whether it's feasible, maybe
     via the ML, and discuss it at the next (or a later) infra meeting
   + Brett: was assuming that #tdf-infra would move to [Matrix] and be
      . was hoping to connect own server
     . making the instance public could maybe lure Telegram users out :-)
 * Next call: Tue Jul 21 16:30 UTC 2020




Privacy Policy | Impressum (Legal Info) | Copyright information: Unless otherwise specified, all text and images on this website are licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the source code of LibreOffice, which is licensed under the Mozilla Public License (MPLv2). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy.