On Mon, 22 Oct 2018 at 17:25:11 +0200, Lionel Elie Mamane wrote:
> On Mon, Oct 22, 2018 at 04:33:21PM +0200, Guilhem Moulin wrote:
>> On Mon, 22 Oct 2018 at 11:51:35 +0200, Lionel Elie Mamane wrote:
>>> On Wed, Oct 17, 2018 at 09:03:45PM +0200, Guilhem Moulin wrote:
>>>> SSH is only used for transport, a git process is exec()'ed on the remote just like for git-daemon(1), so the only overhead is crypto-related. The handshake is a one-off thing, thus negligible when you're transferring a large amount of data at once; (...) As for symmetric crypto overhead, (...) the overhead should be negligible.
>>>
>>> All I know is that about 1/2/3 years ago ('t was I think in some coworking space in Brussels, probably a hackfest) I showed Michael Meeks how to have a separate "push" url (with ssh: protocol) and "pull" url (with git: protocol) and he was very happy at the speed-up.
>>
>> Might be orthogonal to the git:// vs. https:// vs. ssh:// discussion. Gerrit uses JGit as Git implementation, while git-daemon(1) spawns “normal” (C-based) git-upload-pack(1) processes.
>
> For us developers of LibreOffice, and thus consumers of the Gerrit / Git service of freedesktop.org and TDF, whether the difference comes from the protocol itself or a different git implementation on the server to serve the different protocols is intellectually interesting (thanks for that!), but materially inconsequential: if using git: will be faster, we will use git:.
Following the same logic, you want gerrit.libreoffice.org to serve content over plain http:// so you can save the two round-trips when you launch your browser to submit your reviews? Oo

FWIW, we're stuck with git:// for years to come because there is no smooth upgrade path for clients; if I were to deploy the service today I would most likely skip git-daemon(1). Things have changed since 2012: encryption is faster (there are modern stream ciphers, and hardware acceleration is more widespread), and for situations like this one there is no reason not to encrypt data in transit.
>> I recall Norbert and I sat down during FOSDEM 2017 to solve perf issues with our JGit deployment. Perhaps you configured your ‘remote.<name>.pushurl’ at the same time :-)
>
> I can easily believe it was earlier.
Then it was before my time, so no idea what the bottleneck was.
>> Anyway, it's easy enough to benchmark no-op `git fetch` on core. master is currently at c99732d59bc6, and I'm fetching from the same datacenter to avoid metrics being polluted with network hiccups.
>
> Yes, but no. You also test in an environment where a network RTT is probably about one fifth to one third of a millisecond, and bandwidth at least 100Mbps if not 1000Mbps? In that case, everything will be fast. Time difference will be lost in noise.
I was arguing that the performance of C git and JGit is indistinguishable on the current instance. Transport overhead is the normal batch-mode SSH (resp. TLS) overhead for ssh:// (resp. https://) remotes.

As mentioned earlier the protocol is essentially the same for git:// and http:// (on servers supporting smart HTTP). In both cases there is a first round-trip (client hello + server git-upload-pack advertisement), and another if the client is missing some object(s) (client sends list of missing objects and receives a pack back). For http:// these are done in two sequential requests over the same connection (resp. ‘GET /$REPO/info/refs?service=git-upload-pack’ and ‘POST /$REPO/git-upload-pack’); for git:// there are equivalent requests in the Git wire protocol. https:// is just http:// wrapped in TLS, which costs an extra 2 round-trips (TLS 1.3 brings that down to a single extra round-trip, but we don't offer it yet).

For ssh://, what happens under the hood (as witnessed when adding “GIT_TRACE=1” to the environment) is that an ssh process is spawned to run git-upload-pack on the remote machine:

    ssh -p 29418 gerrit.libreoffice.org git-upload-pack "/core"

Counting round-trips for SSH isn't trivial, but let me try:

  * Client & server greet each other (in parallel)
  * Client & server initiate KEX (in parallel)
  * Key EXchange
  * Client & server send NEWKEYS (in parallel)
  * Client requests service, waits for response
  * Client sends each pubkey one at a time, waits for response; for the one that's accepted by the server, it sends the signed response and waits for the server to ack
  * Client asks to open a new channel, waits for response
  * Client asks to execute command in said channel, waits for response
  * […]
  * Server sends EOF and closes channel
  * Client acks, closes channels, and disconnects

So if the latency is symmetric and the first key offered is accepted by the server, that makes a constant overhead of 8.5 round-trips. (When using an existing, multiplexed connection the overhead becomes 2.5 round-trips.) Additionally, the sending side must wait for the client to adjust the window size when it's full. (OpenSSH's window size is 2MiB at compile-time and is adjusted on the fly depending on network conditions, cf. RFC 4254 sec. 5.2 for details.)
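To put the round-trip counts above side by side, here is a rough back-of-the-envelope sketch in Python. It only models latency: TCP connection setup, SSH window adjustments, and the time needed to actually transfer the advertisement or the pack are deliberately left out. The function names and the two sample RTT values are made up for illustration; the per-transport overheads are the ones counted above.

```python
# Extra round-trips added by each transport before any Git data flows,
# using the counts from the message (not measured values).
TRANSPORT_OVERHEAD_RTT = {
    "git://": 0.0,
    "http://": 0.0,
    "https:// (TLS 1.2)": 2.0,
    "https:// (TLS 1.3)": 1.0,
    "ssh://": 8.5,                # first offered key accepted
    "ssh:// (multiplexed)": 2.5,  # reusing an existing connection
}


def fetch_round_trips(transport, needs_pack=False):
    """Round-trips for one fetch: transport overhead plus the Git exchange.

    The Git exchange itself is 1 round-trip for the ref advertisement,
    plus 1 more if the client is missing objects and requests a pack.
    """
    git_round_trips = 2 if needs_pack else 1
    return TRANSPORT_OVERHEAD_RTT[transport] + git_round_trips


def fetch_latency_ms(transport, rtt_ms, needs_pack=False):
    """Latency-only cost of a fetch, in milliseconds, for a given RTT."""
    return fetch_round_trips(transport, needs_pack) * rtt_ms


if __name__ == "__main__":
    # Hypothetical RTTs: same-datacenter vs. a distant client.
    for rtt_ms in (0.3, 50.0):
        print(f"RTT = {rtt_ms} ms, no-op fetch:")
        for transport in TRANSPORT_OVERHEAD_RTT:
            cost = fetch_latency_ms(transport, rtt_ms)
            print(f"  {transport:<22} {cost:8.1f} ms")
```

Under this model the gap between transports scales linearly with the RTT, which is why it vanishes inside a datacenter and why it only matters on slow links if latency, rather than the size of the advertisement, dominates.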
> Are these protocols (or the *implementations* of these protocols) more sensitive to RTT than another? They do more roundtrips? Or not?
Given the current (and growing) size of the git-upload-pack advertisement, I doubt latency will be the bottleneck here. Not until we manage to shrink it.

FWIW there is another advantage of using HTTP as pull URL, namely that capable clients can send HTTP headers such as “Accept-Encoding: deflate, gzip” (on Debian Jessie and later it's compiled in, not sure if that's an upstream default). That way the backend can compress responses it thinks are worth compressing. As shown in my earlier message, this halves the size of the git-upload-pack advertisement, despite the fact that it contains 145k random 40-hex-digit strings. AFAIK compression of data in transit isn't in the git protocol, hence not available for ssh:// and git:// URLs. (For SSH one doesn't want transport-level compression, as packs are already compressed by the git protocol.) Saving 5MiB per fetch is certainly interesting in low-bandwidth networks.

Also, TCP port 443 is less likely to be blocked than 9418 :-)

--
Guilhem.
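On the compression point above: a rough way to see why deflate can roughly halve an advertisement full of random object IDs is that each hex digit carries only 4 bits of information but occupies an 8-bit character. The Python sketch below builds a made-up advertisement and deflates it with zlib. The line layout and the ref names are assumptions for illustration only, not the real pkt-line wire format or the actual refs on gerrit.libreoffice.org.

```python
import os
import zlib


def fake_advertisement(n_refs):
    """Build n_refs lines of '<40 hex digits> <ref name>' with random IDs."""
    lines = []
    for i in range(n_refs):
        object_id = os.urandom(20).hex()  # 40 random hex digits
        ref_name = f"refs/changes/{i % 100:02d}/{i}/1"  # made-up ref layout
        lines.append(f"{object_id} {ref_name}\n")
    return "".join(lines).encode("ascii")


def deflate_ratio(payload, level=6):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(payload, level)) / len(payload)


if __name__ == "__main__":
    payload = fake_advertisement(145_000)
    ratio = deflate_ratio(payload)
    print(f"original: {len(payload)} bytes, deflated to {ratio:.0%} of that")
```

The exact ratio depends on how long and how repetitive the ref names are, so the number printed here should not be read as a measurement of the real service.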