
[Note new subject line]

On Sun, January 29, 2012 7:40 pm, don kretz wrote:
For one, maintaining text is not the same as maintaining source code. And in particular, the work flow for software development is not the same as ours. And if there's any guiding principle behind sccs, it's to support the depth and breadth of software development.
We found we could get along fine for a long time storing simple versions of text in a structured directory design that was easier to use and monitor. And we could always get to our text in another obvious way.
Like Mr. Perathoner I'm a little confused about this statement. A source code file is a text file just like an XML file is, and a version control system ought to be able to handle both of them equally well. I've spent the last 1.5 years working with an XHTML policy manual system using Subversion as the repository and for version control (corporate standard, not my choice). The VCS component of this project was the most straightforward part and the one part that "just worked." I'd be very interested in hearing more about your experiences, off-list if you would like, so I can be ahead of the curve if problems arise in my system.
We ended up spending a lot of time just figuring out the sccs APIs (which again are designed to support software development) and they aren't simple or really very flexible. We had to mostly adapt our conceptual models to theirs - there isn't much room to fiddle with theirs.
You mention the SCC API. I know that Microsoft purchased Visual Source Safe and then created the Source Code Control interface that Visual Studio used to integrate the IDE with VSS. I know that Adobe has adopted this interface exclusively for its Dreamweaver and RoboHelp products, and presumably for its Creative Suite. Presumably other companies have also implemented or consumed the SCC interface, but I have no experience with them. When you speak of SCC, are you referring to the Microsoft API?

Visual Source Safe and the SCC API are RCS-like systems. They do not support concurrent versioning, but rather use the sequential paradigm where a file is locked on check-out and cannot be locked by any other user until it is unlocked by being checked in. A user cannot submit changes to the repository until a lock is obtained. These kinds of systems seem to require a lot of administrative attention to break stale locks.

We were using the RoboHelp product, and integration between Adobe's SCC interface and the corporate-standard Subversion repository was quite challenging. There are a few SCC/SVN products out there, but most are quite long in the tooth. We ended up using the commercial PushOK product to convert SCC calls to SVN, and vice versa. If, when you say SCC, you are referring to the Microsoft Source Code Control interface, then I can understand your frustrations. But for this particular project I think we shouldn't face these problems if we simply stick with a concurrent version control system and eschew any RCS-like systems.
Sccs transactions can be very slow.
Again, it depends on what system you're talking about. My experience with Adobe suggests that even SCC transactions can be very quick if you have well-written software running on a 100BASE-T local area network ;-). I don't think it will be a problem to find a version control system fast enough for our needs. I /do/ think that Mr. Perathoner's concerns about users on GPRS connections are valid, and we need to think about how to address those concerns.
I realize we had different needs - here we're talking about at least mostly-completed projects, not page-or-less components. But I don't see how we easily avoid at least some extension in the proofing direction if we really want to do continuous semi-open-access improvement of texts. I think it requires administrative resources we'll never have to do it any other way.
Yes, I think some extension into the proofing direction is inevitable, and we should be prepared for it. This is why I suggest a rule that some sort of unambiguous page marker be inserted into the master file so that a single page can be programmatically extracted; a sketch of what I have in mind follows below.

This leads me around to the "unit of work" question. Mr. Perathoner suggests that Git may not be the best solution for a VCS as you have to check out the entire "project" before the efficiencies of diff merging kick in. So what is a "project"? I had always conceived of a project as being a single "work," whatever that means. I get the impression that others conceive of the project as encompassing all 5000 works that we choose as our starting point. I propose that for version control purposes, each "work" will have its own "project." Each project must contain the master file(s) and page scans of the work. (Would a simple reference to the page scans at IA be sufficient? Do we need to bust open IA's archive files so each page image can be viewed individually?)
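To make the page-marker rule concrete, here is a minimal sketch in Python of pulling a single page out of a master file. The marker syntax (an XML comment of the form <!-- page "0042" -->) and the file name master.xhtml are only assumptions on my part for illustration; the actual convention is exactly what we would need to agree on.

    # Minimal sketch: extract one page from a master file, assuming an
    # (invented) marker convention of <!-- page "NNNN" --> comments.
    import re

    PAGE_MARKER = re.compile(r'<!--\s*page\s+"(?P<num>[^"]+)"\s*-->')

    def extract_page(master_text, page_num):
        """Return the text between page_num's marker and the next marker."""
        markers = list(PAGE_MARKER.finditer(master_text))
        for i, m in enumerate(markers):
            if m.group('num') == page_num:
                start = m.end()
                end = markers[i + 1].start() if i + 1 < len(markers) else len(master_text)
                return master_text[start:end]
        return None        # no such page marker in the file

    with open('master.xhtml', encoding='utf-8') as f:
        print(extract_page(f.read(), '0042'))

A proofing interface could serve that fragment to a volunteer alongside the matching page scan, and splice the corrected text back in between the same two markers.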
Could we just call it "the repository" or something for a while? I think we should maybe spend more time coming at it from the user's direction and refining some requirements before we make technology choices.
The reason I would prefer choosing /something/ is that I tend to use agile development methodologies in my own work. If a repository were available now, I would start by generating an HTML file using Mr. Perathoner's text-to-HTML scripts, doing easy tweaks to the file, and checking it in as a first version. I would continue to work with the file, adding complexity and generating new ideas as I go, doing interim check-ins. With each revision I will have learned something, which will cause me to propose a new rule or the modification of an old one. If it turns out that the work I would have done is incorrect or unnecessary, we'd just throw it away and start over. If it turns out the VCS we have chosen is inadequate to the task, we just import the files into a new one (I think they all have some sort of import/export function). The most fundamental proposition of agile programming is that it's okay to throw away work if it's wrong.
And in that direction I think automating the build process to be more dependency-sensitive might pay off more in the short run. Maybe it's there and I don't know it, but I haven't heard much of that flavor to the discussion so far.
I'm opposed to a "build process." With the exception of CVS, every VCS I've been talking about has an HTTP interface, and in the case of CVS the project document directory (not CVSROOT) can be mounted as a web server document directory. The most recent version of any document should be available through a browser call. As for derived formats, a web server interface would serve them on demand. Caching would be appropriate, so any particular format would be cached on generation, but time-stamping should be observed: a cached file would be discarded and regenerated when a change to the project files occurs.
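As a rough illustration of the time-stamp-observant caching I mean, here is a small Python sketch. The cache/ directory layout and the use of Calibre's ebook-convert as the converter are my assumptions, not a description of any existing infrastructure.

    # Sketch: serve a derived format on demand, regenerating the cached
    # copy only when the master file is newer than the cache.
    import os
    import subprocess

    def get_derived(source, fmt, convert):
        """Return the path of a cached derived file, rebuilding it if stale."""
        os.makedirs('cache', exist_ok=True)
        target = os.path.join('cache', os.path.basename(source) + '.' + fmt)
        if (not os.path.exists(target)
                or os.path.getmtime(target) < os.path.getmtime(source)):
            convert(source, target)      # cache is missing or out of date
        return target

    def make_epub(source, target):
        # Calibre's command-line converter, used here purely as an example.
        subprocess.run(['ebook-convert', source, target], check=True)

    path = get_derived('master.xhtml', 'epub', make_epub)

The web server would call something like get_derived() from its request handler, so the first request after a commit pays the conversion cost and every later request is served from the cache.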