From e40a9e2c9ed0723088e0fb65deb9b430fc91c367 Mon Sep 17 00:00:00 2001 From: Junio C Hamano <junkio@cox.net> Date: Fri, 29 Dec 2006 02:35:40 -0800 Subject: [PATCH 1/2] send-pack: fix pipeline. send-pack builds a pipeline that runs "rev-list | pack-objects" and sends the output from pack-objects to the other side, while feeding the input side of that pipe from itself. However, the file descriptor that is given to this pipeline (so that it can be dup2(2)'ed into file descriptor 1 of pack-objects) is closed by the caller before the complex fork+exec dance! Worse yet, the caller already dup2's it to 1, so the child process did not even have to. I do not understand how this code could possibly have been working, but it somehow was working by accident. Merging the sliding mmap() code reveals this problem, presumably because it keeps one extra file descriptor open for a packfile and changes the way file descriptors are allocated. I am too tired to diagnose the problem now, but this seems to be a sensible fix. Signed-off-by: Junio C Hamano <junkio@cox.net> --- send-pack.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/send-pack.c b/send-pack.c index cc884f3b2de..54de96e40c3 100644 --- a/send-pack.c +++ b/send-pack.c @@ -58,7 +58,7 @@ static void exec_rev_list(struct ref *refs) /* * Run "rev-list --stdin | pack-objects" pipe. */ -static void rev_list(int fd, struct ref *refs) +static void rev_list(struct ref *refs) { int pipe_fd[2]; pid_t pack_objects_pid; @@ -71,10 +71,8 @@ static void rev_list(int fd, struct ref *refs) * and writes to the original fd */ dup2(pipe_fd[0], 0); - dup2(fd, 1); close(pipe_fd[0]); close(pipe_fd[1]); - close(fd); exec_pack_objects(); die("pack-objects setup failed"); } @@ -85,7 +83,6 @@ static void rev_list(int fd, struct ref *refs) dup2(pipe_fd[1], 1); close(pipe_fd[0]); close(pipe_fd[1]); - close(fd); exec_rev_list(refs); } @@ -111,7 +108,7 @@ static void rev_list_generate(int fd, struct ref *refs) close(pipe_fd[0]); close(pipe_fd[1]); close(fd); - rev_list(fd, refs); + rev_list(refs); die("rev-list setup failed"); } if (rev_list_generate_pid < 0) From b5ffa5ceef250ae57b9088ac1de22e783faf7ff8 Mon Sep 17 00:00:00 2001 From: Junio C Hamano <junkio@cox.net> Date: Fri, 29 Dec 2006 12:14:30 -0800 Subject: [PATCH 2/2] Documentation: illustrate send-pack pipeline. Signed-off-by: Junio C Hamano <junkio@cox.net> --- .../technical/send-pack-pipeline.txt | 112 ++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100644 Documentation/technical/send-pack-pipeline.txt diff --git a/Documentation/technical/send-pack-pipeline.txt b/Documentation/technical/send-pack-pipeline.txt new file mode 100644 index 00000000000..bd32aff00be --- /dev/null +++ b/Documentation/technical/send-pack-pipeline.txt @@ -0,0 +1,112 @@ +git-send-pack +============= + +Overall operation +----------------- + +. Connects to the remote side and invokes git-receive-pack. + +. Learns what refs the remote has and what commit they point at. + Matches them to the refspecs we are pushing. + +. Checks if there are non-fast-forwards. Unlike fetch-pack, + the repository send-pack runs in is supposed to be a superset + of the recipient in fast-forward cases, so there is no need + for want/have exchanges, and fast-forward check can be done + locally. Tell the result to the other end. + +. Calls pack_objects() which generates a packfile and sends it + over to the other end. + +. If the remote side is new enough (v1.1.0 or later), wait for + the unpack and hook status from the other end. + +. Exit with appropriate error codes. + + +Pack_objects pipeline +--------------------- + +This function gets one file descriptor (`out`) which is either a +socket (over the network) or a pipe (local). What's written to +this fd goes to git-receive-pack to be unpacked. + + send-pack ---> fd ---> receive-pack + +It somehow forks once, but does not wait for it. I am not sure +why. + +The forked child calls rev_list_generate() with that file +descriptor (while the parent closes `out` -- the child will be +the one that writes the packfile to the other end). + + send-pack + | + rev-list-generate ---> fd ---> receive-pack + + +Then rev-list-generate forks after creates a pipe; the child +will become a pipeline "rev-list --stdin | pack-objects", which +is the rev_list() function, while the parent feeds that pipeline +the list of refs. + + send-pack + | + rev-list-generate ---> fd ---> receive-pack + | ^ (pipe) + v | + rev-list + +The child process, before calling rev-list, rearranges the file +descriptors: + +. what it reads from rev-list-generate via pipe becomes the + stdin; this is to feed the upstream of the pipeline which will + be git-rev-list process. + +. what it writes to its stdout goes to the fd connected to + receive-pack. + +On the other hand, the parent process, before starting to feed +the child pipeline, closes the reading side of the pipe and fd +to receive-pack. + + send-pack + | + rev-list-generate + | + v [0] + rev-list [1] ---> receive-pack + +The parent then writes to the pipe and later closes it. There +is a commented out waitpid to wait for the rev-list side before +it exits, I again do not understand why. + +The rev-list function further sets up a pipe and forks to run +git-rev-list piped to git-pack-objects. The child side, before +exec'ing git-pack-objects, rearranges the file descriptors: + +. what it reads from the pipe becomes the stdin; this gets the + list of objects from the git-rev-list process. + +. its stdout is already connected to receive-pack, so what it + generates goes there. + +The parent process arranges its file descriptors before exec'ing +git-rev-list: + +. its stdout is sent to the pipe to feed git-pack-objects. + +. its stdin is already connected to rev-list-generate and will + read the set of refs from it. + + + send-pack + | + rev-list-generate + | + v [0] + git-rev-list [1] ---> [0] git-pack-objects [1] ---> receive-pack + + +