
[Note new subject line]

On Sun, January 29, 2012 7:40 pm, don kretz wrote:
For one, maintaining text is not the same as maintaining source code. And in particular, the work flow for software development is not the same as ours. And if there's any guiding principle behind sccs, it's to support the depth and breadth of software development.
We found we could get along fine for a long time storing simple versions of text in a structured directory design that was easier to use and monitor. And we could always get to our text in another obvious way.
Like Mr. Perathoner I'm a little confused about this statement. A source code file is a text file just like an XML file is, and a version control system ought to be able to handle both of them equally well. I've spent the last 1.5 years working with an XHTML policy manual system using Subversion as the repository and for version control (corporate standard, not my choice). The VCS component of this project was the most straightforward part and the one part that "just worked." I'd be very interested in hearing more about your experiences, off-list if you would like, so I can be ahead of the curve if problems arise in my system.
We ended up spending a lot of time just figuring out the sccs APIs (which again are designed to support software development) and they aren't simple or really very flexible. We had to mostly adapt our conceptual models to theirs - there isn't much room to fiddle with theirs.
You mention the SCC API. I know that Microsoft purchased Visual Source Safe and then created the Source Code Control interface that Visual Studio used to integrate the IDE with VSS. I know that Adobe has adopted this interface exclusively for its Dreamweaver and RoboHelp products, and presumably for its Creative Suite. Presumably other companies have also implemented or consumed the SCC interface, but I have no experience with them. When you speak of SCC, are you referring to the Microsoft API?

Visual Source Safe and the SCC API are RCS-like systems. They do not support concurrent versioning, but rather use the sequential paradigm where a file is locked on check-out and cannot be locked by any other user until it is unlocked by being checked in. A user cannot submit changes to the repository until a lock is obtained. These kinds of systems seem to require a lot of administrative attention to break stale locks.

We were using the RoboHelp product, and integration between Adobe's SCC interface and the corporate-standard Subversion repository was quite challenging. There are a few SCC/SVN products out there, but most are quite long in the tooth. We ended up using the commercial PushOK product to convert SCC calls to SVN, and vice versa. If, when you say SCC, you are referring to the Microsoft Source Code Control interface, then I can understand your frustrations. But for this particular project I think we shouldn't face these problems if we simply stick with a concurrent version control system and eschew any RCS-like systems.
Sccs transactions can be very slow.
Again, it depends on what system you're talking about. My experience with Adobe suggests that even SCC transactions can be very quick if you have well-written software running on a 100BASE-T local area network ;-). I don't think it will be a problem to find a version control system fast enough for our needs. I /do/ think that Mr. Perathoner's concerns about users on GPRS connections are valid, and we need to think about how to address those concerns.
I realize we had different needs - here we're talking about at least mostly-completed projects, not page-or-less components. But I don't see how we easily avoid at least some extension in the proofing direction if we really want to do continuous semi-open-access improvement of texts. I think it requires administrative resources we'll never have to do it any other way.
Yes, I think some extension into the proofing direction is inevitable, and we should be prepared for it. This is why I suggest a rule that some sort of unambiguous page marker be inserted into the master file so that a single page can be programmatically extracted; a sketch of what I have in mind follows below.

This leads me around to the "unit of work" question. Mr. Perathoner suggests that Git may not be the best solution for a VCS as you have to check out the entire "project" before the efficiencies of diff merging kick in. So what is a "project"? I had always conceived of a project as being a single "work," whatever that means. I get the impression that others conceive of the project as encompassing all 5000 works that we choose as our starting point. I propose that for version control purposes, each "work" will have its own "project." Each project must contain the master file(s) and page scans of the work. (Would a simple reference to the page scans at IA be sufficient? Do we need to bust open IA's archive files so each page image can be viewed individually?)
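To make the page-marker rule concrete, here is a minimal sketch in Python of pulling a single page out of a master file. The marker syntax (an XML comment of the form <!-- page "0042" -->) and the file name master.xhtml are only assumptions on my part for illustration; the actual convention is exactly what we would need to agree on.

    # Minimal sketch: extract one page from a master file, assuming an
    # (invented) marker convention of <!-- page "NNNN" --> comments.
    import re

    PAGE_MARKER = re.compile(r'<!--\s*page\s+"(?P<num>[^"]+)"\s*-->')

    def extract_page(master_text, page_num):
        """Return the text between page_num's marker and the next marker."""
        markers = list(PAGE_MARKER.finditer(master_text))
        for i, m in enumerate(markers):
            if m.group('num') == page_num:
                start = m.end()
                end = markers[i + 1].start() if i + 1 < len(markers) else len(master_text)
                return master_text[start:end]
        return None        # no such page marker in the file

    with open('master.xhtml', encoding='utf-8') as f:
        print(extract_page(f.read(), '0042'))

A proofing interface could serve that fragment to a volunteer alongside the matching page scan, and splice the corrected text back in between the same two markers.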
Could we just call it "the repository" or something for a while? I think we should maybe spend more time coming at it from the user's direction and refining some requirements before we make technology choices.
The reason I would prefer choosing /something/ is that I tend to use agile development methodologies in my own work. If a repository were available now, I would start by generating an HTML file using Mr. Perathoner's text-to-HTML scripts, doing easy tweaks to the file, and checking it in as a first version. I would continue to work with the file, adding complexity and generating new ideas as I go, doing interim check-ins. With each revision I will have learned something, which will cause me to propose a new rule or the modification of an old one. If it turns out that the work I would have done is incorrect or unnecessary, we'd just throw it away and start over. If it turns out the VCS we have chosen is inadequate to the task, we just import the files into a new one (I think they all have some sort of import/export function). The most fundamental proposition of agile programming is that it's okay to throw away work if it's wrong.
And in that direction I think automating the build process to be more dependency-sensitive might pay off more in the short run. Maybe it's there and I don't know it, but I haven't heard much of that flavor to the discussion so far.
I'm opposed to a "build process." With the exception of CVS, every VCS I've been talking about has an HTTP interface, and in the case of CVS the project document directory (not CVSROOT) can be mounted as a web server document directory. The most recent version of any document should be available through a browser call. As for derived formats, a web server interface would serve them on demand. Caching would be appropriate, so any particular format would be cached on generation, but time-stamping should be observed: a cached file would be discarded and regenerated when a change to the project files occurs.
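As a rough illustration of the time-stamp-observant caching I mean, here is a small Python sketch. The cache/ directory layout and the use of Calibre's ebook-convert as the converter are my assumptions, not a description of any existing infrastructure.

    # Sketch: serve a derived format on demand, regenerating the cached
    # copy only when the master file is newer than the cache.
    import os
    import subprocess

    def get_derived(source, fmt, convert):
        """Return the path of a cached derived file, rebuilding it if stale."""
        os.makedirs('cache', exist_ok=True)
        target = os.path.join('cache', os.path.basename(source) + '.' + fmt)
        if (not os.path.exists(target)
                or os.path.getmtime(target) < os.path.getmtime(source)):
            convert(source, target)      # cache is missing or out of date
        return target

    def make_epub(source, target):
        # Calibre's command-line converter, used here purely as an example.
        subprocess.run(['ebook-convert', source, target], check=True)

    path = get_derived('master.xhtml', 'epub', make_epub)

The web server would call something like get_derived() from its request handler, so the first request after a commit pays the conversion cost and every later request is served from the cache.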