1
0
mirror of https://github.com/git/git.git synced 2025-04-16 12:35:49 +00:00
Jeff King 845de33a5b cat-file: avoid noop calls to sha1_object_info_extended
It is not unreasonable to ask cat-file for a batch-check
format of simply "%(objectname)". At first glance this seems
like a noop (you are generally already feeding the object
names on stdin!), but it has a few uses:

  1. With --batch-all-objects, you can generate a listing of
     the sha1s present in the repository, without any input.

  2. You do not have to feed sha1s; you can feed arbitrary
     sha1 expressions and have git resolve them en masse.

  3. You can even feed a raw sha1, with the result that git
     will tell you whether we actually have the object or
     not.

In case 3, the call to sha1_object_info is useful; it tells
us whether the object exists or not (technically we could
swap this out for has_sha1_file, but the cost is roughly the
same).

In case 2, the existence check is of debatable value. A
mass-resolution might prefer performance to safety (against
outputting a value for a corrupted ref, for example).
However, the object lookup cost is likely not as noticeable
compared to the resolution cost. And since we have provided
that safety in the past, the conservative choice is to keep
it.

In case 1, though, the object lookup is a definite noop; we
know about the object because we found it in the object
database. There is no new information gained by making the
call.

This patch detects that case and optimizes out the call.
Here are best-of-five timings for linux.git:

  [before]
  $ time git cat-file --buffer \
                      --batch-all-objects \
                      --batch-check='%(objectname)'
  real    0m2.117s
  user    0m2.044s
  sys     0m0.072s

  [after]
  $ time git cat-file --buffer \
                      --batch-all-objects \
                      --batch-check='%(objectname)'
  real    0m1.230s
  user    0m1.176s
  sys     0m0.052s

There are two implementation details to note here.

One is that we detect the noop case by seeing that "struct
object_info" does not request any information. But besides
object existence, there is one other piece of information
which sha1_object_info may fill in: whether the object is
cached, loose, or packed. We don't currently provide that
information in the output, but if we were to do so later,
we'd need to take note and disable the optimization in that
case.

And that leads to the second note. If we were to output
that information, a better implementation would be to
remember where we saw the object in --batch-all-objects in
the first place, and avoid looking it up again by sha1.

In fact, we could probably squeeze out some extra
performance for less-trivial cases, too, by remembering the
pack location where we saw the object, and going directly
there to find its information (like type, size, etc). That
would in theory make this optimization unnecessary.

I didn't pursue that path here for two reasons:

  1. It's non-trivial to implement, and has memory
     implications. Because we sort and de-dup the list of
     output sha1s, we'd have to record the pack information
     for each object, too.

  2. It doesn't save as much as you might hope. It saves the
     find_pack_entry() call, but getting the size and type
     for deltified objects requires walking down the delta
     chain (for the real type) or reading the delta data
     header (for the size). These costs tend to dominate the
     non-trivial cases.

By contrast, this optimization is easy and self-contained,
and speeds up a real-world case I've used.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-18 14:17:38 -07:00
2016-03-17 11:26:41 -07:00
2015-04-18 18:35:48 -07:00
2015-08-31 15:38:52 -07:00
2015-08-03 11:01:27 -07:00
2015-08-25 14:57:08 -07:00
2015-08-25 14:57:08 -07:00
2015-08-11 13:48:15 -07:00
2015-05-05 21:00:23 -07:00
2015-09-28 15:33:56 -07:00
2015-08-03 11:01:21 -07:00
2015-08-03 11:01:27 -07:00
2015-05-20 10:19:12 -07:00
2015-06-29 11:39:10 -07:00
2015-06-25 10:47:46 -07:00
2015-09-28 19:16:54 -07:00
2016-03-17 11:26:18 -07:00
2015-09-28 14:57:10 -07:00
2015-05-26 13:24:46 -07:00
2015-08-03 11:01:18 -07:00
2015-06-29 11:39:10 -07:00
2015-09-28 15:33:56 -07:00
2016-03-17 11:26:41 -07:00
2015-06-24 12:21:47 -07:00
2015-03-13 22:43:11 -07:00
2016-03-16 10:41:02 -07:00
2015-09-28 15:28:31 -07:00
2015-09-28 15:33:56 -07:00
2015-08-03 11:01:27 -07:00
2015-03-23 11:12:58 -07:00
2015-09-04 10:43:23 -07:00
2015-08-25 14:57:08 -07:00
2016-03-17 11:26:41 -07:00
2015-05-22 09:33:08 -07:00
2015-08-25 14:57:08 -07:00
2015-09-28 19:16:54 -07:00
2015-09-28 19:16:54 -07:00
2015-09-09 14:30:35 -07:00
2015-08-11 14:29:36 -07:00

////////////////////////////////////////////////////////////////

	Git - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-commandname.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with "man gittutorial" or "git help tutorial", and the
documentation of each command with "man git-commandname" or "git help
commandname".

CVS users may also want to read Documentation/gitcvs-migration.txt
("man gitcvs-migration" or "git help cvs-migration" if git is
installed).

Many Git online resources are accessible from http://git-scm.com/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org (read
Documentation/SubmittingPatches for instructions on patch submission).
To subscribe to the list, send an email with just "subscribe git" in
the body to majordomo@vger.kernel.org. The mailing list archives are
available at http://news.gmane.org/gmane.comp.version-control.git/,
http://marc.info/?l=git and other archival sites.

The maintainer frequently sends the "What's cooking" reports that
list the current status of various development topics to the mailing
list.  The discussion following them give a good reference for
project status, development direction and remaining tasks.
Description
Git Source Code Mirror - This is a publish-only repository but pull requests can be turned into patches to the mailing list via GitGitGadget (https://gitgitgadget.github.io/). Please follow Documentation/SubmittingPatches procedure for any of your improvements.
Readme 834 MiB
Languages
C 50.1%
Shell 38.4%
Perl 5.1%
Tcl 3.2%
Python 0.8%
Other 2.1%