1
0
mirror of https://github.com/git/git.git synced 2025-04-01 00:29:09 +00:00

133 Commits

Author SHA1 Message Date
Jonathan Nieder
ddcc8c5b46 vcs-svn: skeleton of an svn delta parser
A delta in the subversion delta (svndiff0) format consists of the
magic bytes SVN\0 followed by a sequence of windows of a certain well
specified format (starting with five integers).

Add an svndiff0_apply function and test-svn-fe -d commandline tool to
parse such a delta in the special case of not including any windows.

Later patches will add features to turn this into a fully functional
delta applier for svn-fe to use to parse the streams produced by
"svnrdump dump" and "svnadmin dump --deltas".

The content of symlinks starts with the word "link " in Subversion's
worldview, so we need to be able to prepend that text to input for the
sake of delta application.  So initialization of the input state of
the delta preimage is left to the calling program, giving callers a
chance to seed the buffer with text of their choice.

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:41:38 -05:00
Jonathan Nieder
896e4bfcec vcs-svn: make buffer_read_binary API more convenient
buffer_read_binary is a thin wrapper around fread, but its signature
is wrong:

 - fread can fill an arbitrary in-memory buffer.  buffer_read_binary
   is limited to buffers whose size is representable by a 32-bit
   integer.
 - The result from fread is the number of bytes actually read.
   buffer_read_binary only reports the number of bytes read by
   incrementing sb->len by that amount and returns void.

Fix both: let buffer_read_binary accept a size_t instead of uint32_t
for the number of bytes to read and as a convenience return the number
of bytes actually read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 22:37:05 -05:00
Jonathan Nieder
9d2f5ddfe5 vcs-svn: learn to maintain a sliding view of a file
Each section of a Subversion-format delta only requires examining (and
keeping in random-access memory) a small portion of the preimage.  At
any moment, this portion starts at a certain file offset and has a
well-defined length, and as the delta is applied, the portion advances
from the beginning to the end of the preimage.  Add a move_window
function to keep track of this view into the preimage.

You can use it like this:

	buffer_init(f, NULL);
	struct sliding_view window = SLIDING_VIEW_INIT(f);
	move_window(&window, 3, 7);	/* (1) */
	move_window(&window, 5, 5);	/* (2) */
	move_window(&window, 12, 2);	/* (3) */
	strbuf_release(&window.buf);
	buffer_deinit(f);

The data structure is called sliding_view instead of _window to
prevent confusion with svndiff0 Windows.

In this example, (1) reads 10 bytes and discards the first 3;
(2) discards the first 2, which are not needed any more; and (3) skips
2 bytes and reads 2 new bytes to work with.

When move_window returns, the file position indicator is at position
window->off + window->width and the data from positions window->off to
the current file position are stored in window->buf.

This function performs only sequential access from the input file and
never seeks, so it can be safely used on pipes and sockets.

On end-of-file, move_window silently reads less than the caller
requested.  On other errors, it prints a message and returns -1.

Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 20:23:32 -05:00
Jonathan Nieder
41e6b91f01 vcs-svn: add missing cast to printf argument
gcc -m32 correctly warns:

 vcs-svn/fast_export.c: In function 'fast_export_commit':
 vcs-svn/fast_export.c:54:2: warning: format '%llu' expects
   argument of type 'long long unsigned int', but argument 2
   has type 'unsigned int' [-Wformat]

Fix it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-27 12:21:12 -05:00
Junio C Hamano
a080fdd1b1 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn
* 'svn-fe' of git://repo.or.cz/git/jrn:
  vcs-svn: handle log message with embedded NUL
  vcs-svn: avoid unnecessary copying of log message and author
  vcs-svn: remove buffer_read_string
  vcs-svn: make reading of properties binary-safe
2011-03-26 11:35:41 -07:00
David Barr
43155cfe14 vcs-svn: avoid using ls command twice
Currently there are two functions to retrieve the mode and content
at a path:

	const char *repo_read_path(const uint32_t *path);
	uint32_t repo_read_mode(const uint32_t *path)

Replace them with a single function with two return values.  This
means we can use one round-trip to get the same information from
fast-import that previously took two.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 01:00:05 -05:00
Jonathan Nieder
195b7ca6f2 vcs-svn: handle log message with embedded NUL
Pass the log message by strbuf instead of as a C-style string and use
fwrite instead of printf to write it to fast-import so embedded '\0'
bytes can be preserved.

Currently "git log" doesn't show the embedded NULs but "git cat-file
commit" can.

While at it, stop including system headers from repo_tree.h.  git
source files need to include git-compat-util.h (or cache.h or
builtin.h) sooner to ensure the appropriate feature test macros are
defined.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:49:37 -05:00
Jonathan Nieder
4c3169b03e vcs-svn: avoid unnecessary copying of log message and author
Use strbuf_swap when storing the svn:log and svn:author properties, so
pointers to rather than the contents of buffers get copied.  The main
effect should be to make the code a little easier to read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:41:38 -05:00
Jonathan Nieder
7e2fe3a9fc vcs-svn: remove buffer_read_string
All previous users of buffer_read_string have already been converted
to use the more intuitive buffer_read_binary, so remove the old API to
avoid some confusion.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:17:35 -05:00
Jonathan Nieder
e7d04ee147 vcs-svn: make reading of properties binary-safe
svn-fe errors out on revision 59151 of the ASF repository:

 fatal: invalid dump: unexpected end of file

The proximate cause is a property with an embedded NUL character.
Previously such anomalies were ignored but commit c9d1c8ba
(2010-12-28) introduced a check strlen(val) == len to avoid reading
uninitialized data when a property list ends early and unfortunately
this test does not distinguish between "foo" followed by EOF and the
string "foo\0bar\0baz".

Fix it by using buffer_read_binary to read to a strbuf and checking
the actual length read.  Most consumers of properties still use
C-style strings, so in practice an author or log message with embedded
NULs will be truncated, but a least this way svn-fe won't error out
(fixing the regression).

Reported-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:15:10 -05:00
Junio C Hamano
785d6989da Merge branch 'svn-fe' of git://repo.or.cz/git/jrn
* 'svn-fe' of git://repo.or.cz/git/jrn:
  vcs-svn: use strchr to find RFC822 delimiter
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log
  vcs-svn: improve reporting of input errors
  vcs-svn: make buffer_copy_bytes return length read
  vcs-svn: make buffer_skip_bytes return length read
  vcs-svn: improve support for reading large files
  vcs-svn: allow input errors to be detected promptly
  vcs-svn: simplify repo_modify_path and repo_copy
  vcs-svn: handle_node: use repo_read_path
  vcs-svn: introduce repo_read_path to check the content at a path
2011-03-22 20:51:07 -07:00
Jonathan Nieder
41b9dd9d4f Merge branch 'db/length-as-hash' into svn-fe
* db/length-as-hash:
  vcs-svn: use strchr to find RFC822 delimiter
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys

Conflicts:
	vcs-svn/svndump.c
2011-03-22 18:44:49 -05:00
David Barr
cba3546a43 vcs-svn: drop obj_pool
This reverts commit 4709455db3891f6cad9a96a574296b4926f70cbe (Add
memory pool library, 2010-08-09).  svn-fe uses strbufs to avoid memory
allocation overhead nowadays.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:39:53 -05:00
David Barr
5db348dbd5 vcs-svn: drop treap
This reverts commit 951f316470acc7c785c460a4e40735b22822349f
(Add treap implementation, 2010-08-09).  The string_pool was
trp.h's last user.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:34:44 -05:00
David Barr
28c5d9ed2a vcs-svn: drop string_pool
This reverts commit 1d73b52f5ba4184de6acf474f14668001304a10c
(Add string-specific memory pool, 2010-08-09).  Now that svn-fe
does not need to maintain a growing collection of strings (paths)
over a long period of time, the string_pool is not needed.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:32:58 -05:00
David Barr
030879718f vcs-svn: pass paths through to fast-import
Now that there is no internal representation of the repo, it is not
necessary to tokenise paths.  Use strbuf instead and bypass
string_pool.

This means svn-fe can handle arbitrarily long paths (as long as a
strbuf can fit them), with arbitrarily many path components.

While at it, since we now treat paths in their entirety, only quote
when necessary.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:32:58 -05:00
Jonathan Nieder
fa6c4bceab Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purge
* db/strbufs-for-metadata:
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-03-22 18:19:46 -05:00
Jonathan Nieder
5c674860eb Merge branch 'db/length-as-hash' (early part) into db/svn-fe-code-purge
* 'db/length-as-hash' (early part):
  vcs-svn: implement perfect hash for top-level keys
  vcs-svn: implement perfect hash for node-prop keys
  vcs-svn: improve reporting of input errors
  vcs-svn: make buffer_copy_bytes return length read
  vcs-svn: make buffer_skip_bytes return length read
  vcs-svn: improve support for reading large files

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/svndump.c
2011-03-22 18:11:59 -05:00
David Barr
f1602054e3 vcs-svn: use strchr to find RFC822 delimiter
This is a small optimisation (4% reduction in user time) but is the
largest artifact within the parsing portion of svndump.c

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:05 -05:00
David Barr
90c0a3cfe3 vcs-svn: implement perfect hash for top-level keys
Instead of interning property names and comparing their string_pool
keys, look them up in a table by string length, which should be about
as fast.

Another small step towards removing dependence on string_pool
altogether.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:05 -05:00
David Barr
044ad2906a vcs-svn: implement perfect hash for node-prop keys
Instead of interning property names and comparing their string_pool
keys, look them up in a table by string length, which should be about
as fast.

This is a small step towards removing dependence on string_pool.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:09:02 -05:00
David Barr
7c5817d3ba vcs-svn: use strbuf for author, UUID, and URL
Use strbufs and strings instead of interned strings for values of rev,
dump, and node fields that happen to be strings.  After this change,
the only remaining string_pool use is for paths in the repo_tree API
and internals.

Functional change: treat an empty author, UUID, or URL as none at all.
So for example, in repos where the first revision has an empty
svn:author property, the first rev will be treated as by "nobody"
rather than by a person with empty name and email address created by
prepending an @ sign to the repository UUID.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:01:48 -05:00
David Barr
dce33c9c18 vcs-svn: use strbuf for revision log
obj_pool is overkill for this application: all that is needed is a
buffer that can resize from rev to rev to accomodate differently-sized
strings.  In the spirit of commit deadcef4 (2010-11-06), use a strbuf
instead.

This is a small step towards removing dependence on obj_pool.h.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:41:36 -05:00
Jonathan Nieder
c9d1c8ba05 vcs-svn: improve reporting of input errors
Catch input errors and exit early enough to print a reasonable
diagnosis based on errno.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:41:09 -05:00
Jonathan Nieder
26557fc1b3 vcs-svn: make buffer_copy_bytes return length read
Currently buffer_copy_bytes does not report to its caller whether
it encountered an early end of file.

Add a return value representing the number of bytes read (but not
the number of bytes copied).  This way all three unusual conditions
can be distinguished: input error with buffer_ferror, output error
with ferror(outfile), early end of input by checking the return
value.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:40:26 -05:00
Jonathan Nieder
d234f54b2f vcs-svn: make buffer_skip_bytes return length read
Currently there is no way to detect when input ended if it ended
early during buffer_skip_bytes.  Tell the calling program how many
bytes were actually skipped for easier debugging.

Existing callers will still ignore early EOF.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:39:54 -05:00
Jonathan Nieder
93b709c79e vcs-svn: improve support for reading large files
Move from uint32_t to off_t as the fundamental unit of length used by
the line_buffer library.  Performance would get worse if anything but
I think it's worth it for support of deltas that need to skip large
pieces (> 4 GiB).

Exception: buffer_read_string still takes a uint32_t, since it keeps
its result in an in-core obj_pool.

Callers still have to be updated to take advantage of this.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 16:32:09 -05:00
Jonathan Nieder
06316234ac vcs-svn: remove spurious semicolons
trp_gen is not a statement or function call, so it should not be
followed with a semicolon.  Noticed by gcc -pedantic.

 vcs-svn/repo_tree.c:41:81: warning: ISO C does not allow extra ';'
  outside of a function [-pedantic]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16 12:56:23 -07:00
David Barr
dd3f42ad79 vcs-svn: use mark from previous import for parent commit
With this patch, overlapping incremental imports work.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
1ae469b06c vcs-svn: handle filenames with dq correctly
Quote paths passed to fast-import so filenames with double quotes are
not misinterpreted.

One might imagine this could help with filenames with newlines, too,
but svn does not allow those.

Helped-by: David Barr <daivd.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
David Barr
e435811208 vcs-svn: quote paths correctly for ls command
This bug was found while importing rev 601865 of ASF.

[jn: with test]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
723b7a2789 vcs-svn: eliminate repo_tree structure
Rely on fast-import for information about previous revs.

This requires always setting up backward flow of information, even for
v2 dumps.  On the plus side, it simplifies the code by quite a bit and
opens the door to further simplifications.

[db: adjusted to support final version of the cat-blob patch]
[jn: avoiding hard-coding git's name for the empty tree for
 portability to other backends]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
7e11902c99 vcs-svn: add a comment before each commit
Current svn-fe produces output like this:

	blob
	mark :7382321
	data 5
	hello

	blob
	mark :7382322
	data 5
	Hello

	commit
	mark :3
[...]
	M 100644 :7382321 hello.c
	M 100644 :7382322 hello2.c

This means svn-fe has to keep track of the paths modified in each
commit and the corresponding marks, instead of dealing with each file
as it arrives in input and then forgetting about it.  A better
strategy would be to use inline blobs:

	commit
	mark :3
[...]
	M 100644 inline hello.c
	data 5
	hello
[...]

As a first step towards that, teach svn-fe to notice when the
collection of blobs for each commit starts and write a comment
("# commit 3.") there.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
78e1a3ff23 vcs-svn: save marks for imported commits
This way, a person can use

	svnadmin dump $path |
	svn-fe |
	git fast-import --relative-marks --export-marks=svn-revs

to get a list of what commit corresponds to each svn revision (plus
some irrelevant blob names) in .git/info/fast-import/svn-revs.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
d38f84484f vcs-svn: use higher mark numbers for blobs
Prepare to use mark :5 for the commit corresponding to r5 (and so on).

1 billion seems sufficiently high for blob marks to avoid conflicting
with rev marks, while still leaving room for 3 billion blobs.  Such
high mark numbers cause trouble with ancient fast-import versions, but
this topic cannot support git fast-import versions before 1.7.4 (which
introduces the cat-blob command) anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
David Barr
41529bbce4 vcs-svn: set up channel to read fast-import cat-blob response
Set up some plumbing: teach the svndump lib to pass a file descriptor
number to the fast_export lib, representing where cat-blob/ls
responses can be read from, and add a get_response_line helper
function to the fast_export lib to read a line from that file.

Unfortunately this means that svn-fe needs file descriptor 3 to be
redirected from somewhere (preferrably the cat-blob stream of a
fast-import backend); otherwise it will fail:

	$ svndump <path> | svn-fe
	fatal: cannot read from file descriptor 3: Bad file descriptor

For the moment, "svn-fe 3</dev/null" works as a workaround but it
will not work for very long.  A fast-import backend that can retrieve
old commits is needed in order to be able to fulfill svn
"Node-copyfrom-rev" requests that refer to revs from a previous run.

[jn: with new change description]

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
efc749b48f vcs-svn: allow input errors to be detected promptly
The line_buffer library silently flags input errors until
buffer_deinit time; unfortunately, by that point usually errno is
invalid.  Expose the error flag so callers can check for and
report errors early for easy debugging.

	some_error_prone_operation(...);
	if (buffer_ferror(buf))
		return error("input error: %s", strerror(errno));

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:32:51 -06:00
Jonathan Nieder
e75316de53 vcs-svn: simplify repo_modify_path and repo_copy
Restrict the repo_tree API to functions that are actually needed.

 - decouple reading the mode and content of dirents from other
   operations.
 - remove repo_modify_path.  It is only used to read the mode from
   dirents.
 - remove the ability to use repo_read_mode on a missing path.  The
   existing code only errors out in that case, anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder
5a38b186d3 vcs-svn: handle_node: use repo_read_path
svn-fe processes each commit in two stages: first decide on the
correct content for all paths and export the relevant blobs, then
export a commit with the result.

But we can keep less state and simplify svn-fe a great deal by
exporting the commit in one step: use 'inline' blobs for each path and
remember nothing.  This way, the repo_tree structure could be
eliminated, and we would get support for incremental imports 'for
free'.

Reorganize handle_node along these lines.  This is just a code
cleanup; the changes in repo_tree and handle_revision will come later.

[db: backported to apply without text delta support]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder
4f5de755a7 vcs-svn: introduce repo_read_path to check the content at a path
The repo_tree structure remembers, for each path in each revision, a
mode (regular file, executable, symlink, or directory) and content
(blob mark or directory structure).  Maintaining a second copy of all
this information when it's already in the target repository is
wasteful, it does not persist between svn-fe invocations, and most
importantly, there is no convenient way to transfer it from one
machine to another.  So it would be nice to get rid of it.

As a first step, let's change the repo_tree API to match fast-import's
read commands more closely.  Currently to read the mode for a path,
one uses

	repo_modify_path(path, new_mode, new_content);

which changes the mode and content as a side effect.  There is no
function to read the content at a path; add one.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 00:56:50 -06:00
Jonathan Nieder
a62bbf8f01 Merge commit 'jn/svn-fe' of git://github.com/gitster/git into svn-fe
* git://github.com/gitster/git:
  vcs-svn: Allow change nodes for root of tree (/)
  vcs-svn: Implement Prop-delta handling
  vcs-svn: Sharpen parsing of property lines
  vcs-svn: Split off function for handling of individual properties
  vcs-svn: Make source easier to read on small screens
  vcs-svn: More dump format sanity checks
  vcs-svn: Reject path nodes without Node-action
  vcs-svn: Delay read of per-path properties
  vcs-svn: Combine repo_replace and repo_modify functions
  vcs-svn: Replace = Delete + Add
  vcs-svn: handle_node: Handle deletion case early
  vcs-svn: Use mark to indicate nodes with included text
  vcs-svn: Unclutter handle_node by introducing have_props var
  vcs-svn: Eliminate node_ctx.mark global
  vcs-svn: Eliminate node_ctx.srcRev global
  vcs-svn: Check for errors from open()
  vcs-svn: Allow simple v3 dumps (no deltas yet)

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/svndump.c
2011-02-26 05:21:29 -06:00
Jonathan Nieder
b1c9b798a6 vcs-svn: teach line_buffer about temporary files
It can sometimes be useful to write information temporarily to file,
to read back later.  These functions allow a program to use the
line_buffer facilities when doing so.

It works like this:

 1. find a unique filename with buffer_tmpfile_init.
 2. rewind with buffer_tmpfile_rewind.  This returns a stdio
    handle for writing.
 3. when finished writing, declare so with
    buffer_tmpfile_prepare_to_read.  The return value indicates
    how many bytes were written.
 4. read whatever portion of the file is needed.
 5. if finished, remove the temporary file with buffer_deinit.
    otherwise, go back to step 2,

The svn support would use this to buffer the postimage from delta
application until the length is known and fast-import can receive
the resulting blob.

Based-on-patch-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
cb3f87cf1b vcs-svn: allow input from file descriptor
Based-on-patch-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
cc193f1f0b vcs-svn: allow character-oriented input
buffer_read_char can be used in place of buffer_read_string(1) to
avoid consuming valuable static buffer space.  The delta applier will
use this to read variable-length integers one byte at a time.

Underneath, it is fgetc, wrapped so the line_buffer library can
maintain its role as gatekeeper of input.

Later it might be worth checking if fgetc_unlocked is faster ---
most line_buffer functions are not thread-safe anyway.

Helpd-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
e832f43c1d vcs-svn: add binary-safe read function
buffer_read_string works well for non line-oriented input except for
one problem: it does not tell the caller how many bytes were actually
written.  This means that unless one is very careful about checking
for errors (and eof) the calling program cannot tell the difference
between the string "foo" followed by an early end of file and the
string "foo\0bar\0baz".

So introduce a variant that reports the length, too, a thinner wrapper
around strbuf_fread.  Its result is written to a strbuf so the caller
does not need to keep track of the number of bytes read.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:59:37 -06:00
Jonathan Nieder
e5e45ca1e3 vcs-svn: teach line_buffer to handle multiple input files
Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

svn-fe's delta applier will use this to stream a delta from svnrdump
and the preimage it applies to from fast-import at the same time.

The tests don't take advantage of the new features, but I think that's
okay.  It is easier to find lingering examples of nonreentrant code by
searching for "static" in line_buffer.c.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:59 -06:00
Jonathan Nieder
d350822fa7 vcs-svn: collect line_buffer data in a struct
Prepare for the line_buffer lib to support input from multiple files,
by collecting global state in a struct that can be easily passed
around.

No API change yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Jonathan Nieder
deadcef4c1 vcs-svn: replace buffer_read_string memory pool with a strbuf
obj_pool is inherently global and does not use the standard growing
factor alloc_nr, which makes it feel out of place in the git codebase.
Plus it is overkill for this application: all that is needed is a
buffer that can grow between requests to accomodate larger strings.
Use a strbuf instead.

As a side effect, this improves the error handling: allocation
failures will result in a clean exit instead of segfaults.  It would
be nice to add a test case (using ulimit or failmalloc) but that can
wait for another day.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Jonathan Nieder
4d21bec0d2 vcs-svn: eliminate global byte_buffer
The data stored in byte_buffer[] is always either discarded or
written to stdout immediately.  No need for it to persist between
function calls.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:58 -06:00
Ramsay Jones
5ee5f5a65d svndump.c: Fix a printf format compiler warning
In particular, on systems that define uint32_t as an unsigned long,
gcc complains as follows:

        CC vcs-svn/svndump.o
    vcs-svn/svndump.c: In function `svndump_read':
    vcs-svn/svndump.c:215: warning: int format, uint32_t arg (arg 2)

In order to suppress the warning we use the C99 format specifier
macro PRIu32 from <inttypes.h>.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-18 16:48:47 -08:00