

On Mon, 22 Oct 2018 at 11:51:35 +0200, Lionel Elie Mamane wrote:
On Wed, Oct 17, 2018 at 09:03:45PM +0200, Guilhem Moulin wrote:
On Wed, 17 Oct 2018 at 14:05:27 +0200, Eike Rathke wrote:
On Wednesday, 2018-10-17 04:27:54 +0200, Guilhem Moulin wrote:
Lastly, it's now possible to clone and fetch git repositories over
https:// .  While git:// URLs will remain supported for the foreseeable
future, they're intentionally no longer advertised in gerrit, and we
encourage you to upgrade the scheme of your ‘remote.<name>.url’ to
secure transports (SSH for authenticated access, or HTTPS for anonymous
access).  We'll update ‘lode’ and chase remaining git:// URLs shortly.

Why is git:// deprecated? From what I know it's more efficient when
fetching/pulling than https:// (or ssh://?)

Since v1.6.6 it's no longer true [0], cf. git-http-backend(1) and
https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes

That webpage doesn't seem to contain a discussion of the efficiency of
the various protocols.

My bad, I probably copied the URL from the wrong tab.  This is what I intended
to share: https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols .
As you can see the protocols are essentially equivalent.

For a high-level overview and pros and cons of each protocol, there is
also https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols ,
which reads

    “There is very little advantage that other protocols have over
    Smart HTTP for serving Git content.” :-)

To be fair, it also says that “The Git protocol is often the fastest
network transfer protocol available”, but that's just because no
encryption is always faster than the fastest encryption.  In practice
however, this argument is moot on modern CPUs.

FWIW, GitHub doesn't mention git:// URLs either (even though they're still
supported): https://help.github.com/articles/which-remote-url-should-i-use/ .
 
SSH is only used for transport, a git process is exec()'ed on the
remote just like for git-daemon(1), so the only overhead is
crypto-related.  The handshake is a one-off thing, thus negligible
when you're transferring a large amount of data at once; (...) As
for symmetric crypto overhead, (...) the overhead should be
negligible.

All I know is that about 1/2/3 years ago ('t was I think in some
coworking space in Brussels, probably a hackfest) I showed Michael
Meeks how to have a separate "push" url (with ssh: protocol) and
"pull" url (with git: protocol) and he was very happy at the
speed-up.

Might be orthogonal to the git:// vs. https:// vs. ssh:// discussion.
Gerrit uses JGit as Git implementation, while git-daemon(1) spawns
“normal” (C-based) git-upload-pack(1) processes.  I recall Norbert and I
sat down during FOSDEM 2017 to solve perf issues with our JGit
deployment.  Perhaps you configured your ‘remote.<name>.pushurl’ at the
same time :-)
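
For reference, a split fetch/push setup of the kind mentioned above can be
sketched as follows.  This is only an illustration: it uses a scratch
repository, assumes the remote is named "origin" and that the Gerrit login
matches the local user name, and fetches over https:// (as recommended in
the announcement) rather than git://.

```shell
# Scratch repository so the sketch doesn't touch a real checkout.
repo=$(mktemp -d)
git init -q "$repo"

# Fetch anonymously over HTTPS...
git -C "$repo" config remote.origin.url https://git.libreoffice.org/core
# ...but push over authenticated SSH.  The login is an assumption: it falls
# back to a placeholder when $USER is unset.
git -C "$repo" config remote.origin.pushurl \
    "ssh://${USER:-yourlogin}@gerrit.libreoffice.org:29418/core"

# List the effective fetch and push URLs.
git -C "$repo" remote -v
```

When ‘remote.<name>.pushurl’ is unset, git pushes to ‘remote.<name>.url’.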

Anyway, it's easy enough to benchmark no-op `git fetch` on core.  master
is currently at c99732d59bc6, and I'm fetching from the same datacenter
to avoid metrics being polluted with network hiccups.

    $ git config remote.origin.url git://git.libreoffice.org/core && time git fetch
    0:01.62 (0.42 user, 0.64 sys)  142108k maxres
    ## Network usage: up 252kiB (4312 packets), down 10168kiB (7197 packets)

    $ git config remote.origin.url https://git.libreoffice.org/core && time git fetch
    0:01.63 (0.81 user, 0.29 sys)  141688k maxres
    ## Network usage: up 56kiB (924 packets), down 4194kiB (1241 packets)

    $ git config remote.origin.url "ssh://$USER@gerrit.libreoffice.org:29418/core" && time git fetch
    0:01.55 (0.62 user, 0.46 sys)  141588k maxres
    ## Network usage: up 67kiB (993 packets), down 9859kiB (1305 packets)

Pretty much equivalent, aren't they? :-)  (Network usage for https:// is
smaller because the TLS termination proxy is also compressing responses
from the git backend.  For git:// I guess the system time is higher than
the user time because it uses sendfile(2) and friends since there
is no user-space encryption.)

As one might notice, network usage (~10MiB down, and growing) is really
high for a no-op `git fetch`.  That's caused by the >140k refs/changes/…
in the initial git-upload-pack(1) advertisement:

    $ git ls-remote https://git.libreoffice.org/core | awk '
        $1 ~ /^[0-9a-f]{40}$/ {
            refs++;
            if ($2 ~ /^refs\/changes\//)
                changes++;
        }
        END {
            printf "refs=%d, changes=%d (%.1f%%)\n",
                refs, changes, 100*changes/refs;
        }
    '
    refs=144709, changes=142676 (98.6%)
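
To relate the ref count to the ~10MiB figure, the advertisement size can be
roughly estimated from the same `git ls-remote` output.  The sketch below
assumes the original (v0) wire format, where each ref is sent as one
pkt-line: a 4-byte length prefix, the 40-hex object ID, a separator, the ref
name and a newline.  It ignores the capability list on the first line and
any compression, and ‘estimate_advertisement’ is a made-up helper name.

```shell
# Estimate the size of a ref advertisement from `git ls-remote` output
# read on standard input.
estimate_advertisement() {
    awk '
        length($1) == 40 && $1 ~ /^[0-9a-f]+$/ {
            refs++
            # ls-remote prints "<oid>\t<refname>"; add 1 byte for the
            # newline and 4 bytes for the pkt-line length prefix.
            bytes += length($0) + 5
        }
        END { printf "refs=%d, approx. bytes=%d\n", refs, bytes }
    '
}

# Usage (needs network access):
#   git ls-remote https://git.libreoffice.org/core | estimate_advertisement
```

At roughly 70 bytes per ref, 144709 refs come to about 10MB, which is in line
with the download volume measured for git:// and ssh://.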

All remote types are affected.  Since the number of changesets seems to
grow linearly [0], we should try to find a solution if we want the repository
to keep scaling.  I had an attempt at setting ‘uploadpack.hideRefs’ (and
‘uploadpack.allowTipSHA1InWant’) last Friday, to exclude refs/changes/…
from the initial advertisement, but that broke CI hence needs more work.
There is no urgency anyway (it's not a regression) and although it's
getting worse over time, by the time it's unbearable the Git protocol v2
[1] might save us :-)
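
The effect of the two settings can be reproduced locally.  The following is
a sketch against a scratch repository, not the configuration that was
actually deployed (and it says nothing about why CI broke): it creates one
fake refs/changes/… ref and compares what `git ls-remote` sees before and
after hiding it.

```shell
# Scratch repository standing in for the server side.
srv=$(mktemp -d)
git init -q "$srv"
git -C "$srv" -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m 'initial commit'
# Mimic a Gerrit change ref pointing at that commit.
git -C "$srv" update-ref refs/changes/01/1/1 HEAD

# Before: refs/changes/... is part of the advertisement.
before=$(git ls-remote "file://$srv" | grep -c 'refs/changes/' || true)

# Hide refs/changes/... from the advertisement, while still letting
# clients request a hidden tip by its SHA-1.
git -C "$srv" config uploadpack.hideRefs refs/changes
git -C "$srv" config uploadpack.allowTipSHA1InWant true

# After: the same listing no longer contains refs/changes/...
after=$(git ls-remote "file://$srv" | grep -c 'refs/changes/' || true)
echo "refs/changes advertised: before=$before after=$after"
```

Clients that fetch changes by ref name (rather than by SHA-1) can no longer
resolve hidden refs, so anything relying on refs/changes/… needs to be
checked before such a setting is rolled out.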

-- 
Guilhem.

[0] https://dashboard.documentfoundation.org/app/kibana#/dashboard/Gerrit
[1] https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html
