@jkesselm
Contributor

Had to do some branch juggling in order to apply my changes on top of the history-recovery changes, but I think this includes everything from both.

Don't assume I haven't messed it up. Please retest, and tell me if any problems are detected.

Yes, I know I probably still have to squash this changeset.

Restructures serializer, xalan, and samples as sub-modules with dependencies on each other.

Avoids carrying binaries in the project; fetch from Maven Central, or (in the case of the Javadoc taglet) carry the source in our own project. As part of that, eliminates dependencies on outdated versions of some of these, specifically java_cup and jlex. JFlex has replaced JLex, which permitted/required handling the look-ahead cases in the grammar rather than by reading ahead in code.

Reworks "site" (documentation) generation. Some files may be landing in different locations in the resulting directory tree; if that's an issue, it can be fine-tuned. Some javadoc errors were (sloppily) fixed before discovering that I can turn off "doclint" and let them continue to be warnings rather than build errors; these should be fixed properly at some point (in many cases by switching to using javadoc inheritance). Some files were converted from HTML to XHTML, since Maven's doc plugin knows how to handle the latter.

JavaCupRedirect is no longer needed.

On Windows, javadoc needed javax.rmi as an explicit dependency.

CAVEAT: Some things wind up in different places than they did in the Ant build. If that bothers folks we can do more work to resolve it, but I think it's mostly harmless.

TODO: We still have some StyleBook files as well, mostly for the design documentation. I'm handling those with a "stylebook_docgen" script for now, invoked from the mvnbuild script/batch file; that should be moved into the pom's logic.

NOT YET DONE: More of the non-Java files may want to be moved into resource subdirectories.
@stanio
Contributor

stanio commented Oct 17, 2023

Yes, I know I probably still have to squash this changeset.

Note you can do this just when you're about to merge by choosing "Squash and merge" from the "Merge" web UI button options (there's a drop-down arrow on its right).

@garydgregory
Member

I use that option all the time, works like a champ.

@vlsi

vlsi commented Oct 18, 2023

I probably still have to squash this changeset

It would be better to have separate commits for "rename files" and "modify files" changes.

Currently, "Cutover from Ant-based build to Maven-based build" commit hides a lot, and it is hard to tell if it performs only the intended changes as there are a lot of renamed and modified files at the same time.


The +9,777 −1,199 changeset looks suspicious to me.
For instance, it looks like this commit resurrects xalan/src/main/java/org/apache/xalan/xsltc/compiler/XPathParser.java which should have been deleted long ago.
The same goes for xalan/src/main/java/org/apache/xalan/xsltc/compiler/sym.java.

It looks like samples/src/site/xhtml/AppletXMLtoHTML/README.xhtml duplicates samples/src/main/java/org/apache/xalan/samples/AppletXMLtoHTML/README.html.

@jkesselm , could you please split "rename files" from "modify files" changes, exclude generated code from the PR and exclude duplicated files?

@jkesselm
Contributor Author

Remove generated files: Sure. That snuck back in during the reset after recovering history.

Duplicate: Ditto, I think; lemme check. There are currently multiple documentation paths, which may have introduced redundancy between them. In general, though, Maven prefers xhtml.

Refactor: Sorry, not undertaking that level of additional effort at this time unless the PR is incomprehensible without it. If I get multiple requests, or if you can provide an easy and reliable way to do it, I will reconsider.

@vlsi

vlsi commented Oct 18, 2023

unless the PR is incomprehensible without it.

It is very hard to review in both the GitHub UI (which is very cluttered) and desktop tools (e.g. gitk, IDEA git, and so on). Many tools attempt to display the full diff for the commit, which makes the diff almost impossible to follow.

if you can provide an easy and reliable way to do it

  1. Squash the changes locally
  2. Amend the commit by excluding modified, added, and deleted files: keep only renames there. For instance, launch "git gui", click amend, and "un-add" all files except the moved ones
  3. Commit renames under "chore: move files according to Maven conventions"
  4. Re-add the modified files
  5. Commit them again

It should not take more than 15 minutes, the outcome would be byte-by-byte the same, and the change will split into two commits.
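Step 1 ("squash the changes locally") does not need an interactive rebase. A minimal sketch, assuming the branch forked from the given base and the work tree is clean; the function name squash_onto is hypothetical:

```shell
# squash_onto BASE MESSAGE
# Collapse every commit made since the branch forked from BASE into one commit.
squash_onto() {
    base=$1
    message=$2
    # Move the branch pointer back to the fork point but leave the index and
    # work tree untouched, so nothing is replayed and nothing can conflict.
    git reset --soft "$(git merge-base "$base" HEAD)"
    # Everything that differed from the fork point is now staged; commit it once.
    git commit -q -m "$message"
}
```

For example: squash_onto origin/master "Cutover from Ant-based build to Maven-based build". The resulting tree is the same as before the squash; only the history is collapsed.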

@jkesselm
Contributor Author

Removed generated files. Removed html files that had been refactored/edited into being xhtml files under src/site/xhtml.

@jkesselm
Contributor Author

jkesselm commented Oct 18, 2023

2. Amend the commit by excluding modified, added, and deleted files: keep only renames there. For instance, launch "git gui", click amend, and "un-add" all files except the moved ones

Haven't used git gui before. Not sure how to filter for "only renames". Are you proposing that be done manually? (It's also throwing error pop-ups when run under WSL, which may or may not be a problem.)

(The force-pushes mentioned below were abortive attempts to do a Squash which I could then run this filtering against. Each of them reported conflicts, so I backed 'em out. I can fly git; I don't claim to be an expert.)

@jkesselm jkesselm force-pushed the maven-build branch 4 times, most recently from d1f4a2b to 46d66d6 Compare October 18, 2023 22:03
@vlsi

vlsi commented Oct 19, 2023

Are you proposing that be done manually?

Yes.

@vlsi

vlsi commented Oct 19, 2023

Removed html files that had been refactored/edited into being xhtml files under src/site/xhtml

Could you please explain/provide a link to "maven prefers xhtml"?
I don't use Maven, so I might miss its preferences.

As far as I can remember, XHTML was abandoned long ago.
I performed a random search, and, they say, XHTML was deprecated in 2012: https://softwareengineering.stackexchange.com/a/149843.

What do you think of keeping the document as regular HTML rather than converting it?
Are you sure the updated documents are 100% XHTML? As the build scripts have no validation for XHTML, bugs will appear quite fast, and there will be no benefit from pretending "the documents are valid XHTML".

@vlsi

vlsi commented Oct 19, 2023

Are you proposing that be done manually?

Manual treatment is reasonable. Everybody will likely spend much more time trying to automate it.

There's an alternative option:

  1. Check out your branch in a separate folder (it will be needed for step 4)
  2. Create a new branch starting at master
  3. Perform a commit that only moves the files around. It should be more or less mechanical. Perform renames only and never edit files. Commit the changes as a single "move files according to Maven conventions" commit.
  4. Delete all files (without committing), and then copy all the files from your current branch (step 1). Then commit the resulting folder

The outcome will be the same as you have now, and "step 4" would produce a commit that includes far fewer renames, since most (all?) of the renames should have been handled in "step 3".
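The alternative steps above can be sketched as a script. This is an illustration under stated assumptions, not the procedure actually used in this PR: the function name split_renames is hypothetical, it relies on Git's own rename detection to decide what counts as a move, it takes the final tree with git checkout instead of copying from a second folder, and it assumes paths contain no tabs, newlines, or characters that Git would quote.

```shell
# split_renames BASE TARGET NEW_BRANCH
# Rebuild TARGET on top of BASE as two commits: pure renames, then everything else.
split_renames() {
    base=$1
    target=$2
    new_branch=$3
    git checkout -q -b "$new_branch" "$base"

    # Commit 1: replay only the renames Git detects between BASE and TARGET.
    # Files are moved with their BASE contents; nothing is edited here.
    git diff -M --diff-filter=R --name-status "$base" "$target" |
    while IFS="$(printf '\t')" read -r _status old new; do
        mkdir -p "$(dirname "$new")"
        git mv "$old" "$new"
    done
    git commit -q -m "chore: move files according to Maven conventions"

    # Commit 2: drop every tracked file, then take TARGET's tree wholesale,
    # so the final tree is identical to TARGET.
    git rm -r -q .
    git checkout "$target" -- .
    git commit -q -m "Cutover from Ant-based build to Maven-based build"
}
```

Afterwards, git diff --quiet TARGET NEW_BRANCH should exit with status 0, confirming that the split changed only the history and not the content.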


In any case, I suggest merging my PRs before proceeding with Maven.
For instance, PR #7 normalized the line endings, and you might get in conflicts if you touch those files.

@stanio
Contributor

stanio commented Oct 19, 2023

As far as I can remember, XHTML was abandoned long ago.

Aside: HTML5 still defines XML syntax for documents.

@jkesselm
Contributor Author

jkesselm commented Oct 19, 2023

And I've just reconfirmed that the output generated is .html files, not named .xhtml. So this is a difference that makes a difference to the build but no difference to the users. Working as designed and implemented. Valid question, but the answer is that it stays xhtml.

@jkesselm
Contributor Author

jkesselm commented Oct 19, 2023

Manual treatment is reasonable. Everybody will likely spend much more time trying to automate it.

So far you are the only person who has expressed concern. I'm willing to do a quick sizing to determine whether I can do this at all efficiently. No promises, so folks may or may not want to wait before starting to review.

@vlsi

vlsi commented Oct 19, 2023

So far you are the only person who has expressed concern

What do you want to convey with that?

It is not the first time you have said “the only person who has expressed..”, and I am truly puzzled about what it means in the context of xalan. My rough estimate is that only a few people voice their opinion at dev@xalan at all. I know I am the first to raise the concern in Xalan, however:

  1. The concern was valid, and it made it possible to detect unwanted changes
  2. It is important to keep git history reasonable for those who encounter a bug in their systems, so they could analyze the nature of the changes
  3. Having “technical commits” as separate ones makes it possible to use git-blame-ignore-revs to improve “git blame” output: https://github.com/pgjdbc/pgjdbc/blob/master/.git-blame-ignore-revs
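To illustrate the third point, a minimal sketch of how such a file is typically wired up. The function name ignore_rev_in_blame is hypothetical, the file name .git-blame-ignore-revs is a convention rather than a requirement, and the blame.ignoreRevsFile setting needs Git 2.23 or later.

```shell
# ignore_rev_in_blame REV
# Record a "technical" commit (for example a pure file move) so git blame skips it.
ignore_rev_in_blame() {
    # Resolve REV to a full commit hash; the ignore file expects full hashes.
    rev=$(git rev-parse --verify "$1^{commit}") || return 1
    printf '%s\n' "$rev" >> .git-blame-ignore-revs
    # Tell git blame in this clone to read the file by default.
    git config blame.ignoreRevsFile .git-blame-ignore-revs
}
```

After ignore_rev_in_blame HEAD~1, a plain git blame on a moved file attributes lines to the commits before the move where it can. The setting is per clone, so each contributor opts in.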

Of course I can wait; however, I do not believe that “Vladimir is the only one who raised a concern” is a valid excuse for not implementing a significant improvement, especially when there’s no-one else to agree or disagree. The mailing list is virtually dead, so I would not expect a lot of comments.

@jkesselm
Contributor Author

jkesselm commented Oct 19, 2023 via email

@vlsi

vlsi commented Oct 19, 2023

There are others active here besides you and I

Do you mean there are 10+ active contributors?
The project has succeeded in scaring off all the contributors, so it does not sound right to use “no-one else replies” as an excuse.

We agree that, currently, we mostly disagree on whether it's worth the additional effort.

The effort is trivial, and it is you who selected to work on this PR while I have always suggested help with a migration to Gradle.

It is unfair to reject ~15min efforts based on “it is not worth the additional effort” comments

@jkesselm
Contributor Author

jkesselm commented Oct 21, 2023

15min for you, perhaps. Longer for me. You could have volunteered those 15 minutes, y'know, rather than waiting for me to figure this out.

But OK. I'll make an attempt. No guarantee of correctness is expressed or implied.

... Hour-plus later: After multiple attempts, haven't got it yet.

  • Merge the commits: git rebase -i acdb8998f5b4a
  • Decommit: git reset --soft HEAD~1
  • Extract lists of which files were moved, moved and changed, deleted and created
    -- Manual process assisted by Emacs macros
    -- Caveat: Due to details of the process, some moved (with or without changes) are listed as deleted and created
    -- It's a relatively small number, mostly the xdocs/stylebook stuff.
    -- Presuming for now that folks can cope with this much.
  • Attempt to undo the staging: git reset HEAD
    -- PROBLEM: According to git status, that loses the distinction between added/deleted and renamed
    -- Yes, I can check them in separately to make this clear in the commits
    -- But I'm not willing to lose it in the history, and this looks likely to do so.
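On the PROBLEM noted above: Git commits store snapshots, not renames, and "renamed" is inferred whenever two trees are compared. A small demonstration in a throwaway repository, with hypothetical file names, shows the same commit reported both ways:

```shell
#!/bin/sh
# Sketch: the "renamed" status is computed at diff time, not stored in the commit.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.invalid"   # throwaway identity for the demo
git config user.name "Demo"

printf 'line 1\nline 2\nline 3\n' > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Move the file with plain mv (no "git mv"), then stage the result.
mkdir -p src
mv old.txt src/new.txt
git add -A
git commit -q -m "move file"

# With rename detection (-M) the commit is reported as a rename ...
with_detection=$(git diff -M --name-status HEAD~1 HEAD)
# ... and with detection disabled the very same commit is a delete plus an add.
without_detection=$(git diff --no-renames --name-status HEAD~1 HEAD)

printf 'with -M:\n%s\n' "$with_detection"
printf 'with --no-renames:\n%s\n' "$without_detection"
```

So a git status that shows added/deleted instead of renamed after unstaging reflects what is currently staged, not a loss of history; once the moves are committed, git diff -M and git log --follow can still detect them.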

Shelving it for now while I think about what else I can attempt. If you have specific advice, that would be useful.

@jkesselm
Contributor Author

jkesselm commented Oct 21, 2023

Trying another approach. Which is taking hours, not minutes, since it's requiring that I take the list of added/deleted/moved/moved-and-changed files, construct scripts to confirm which added/deleted are really moved/copied and where things were moved from/to, and then turn those confirmation scripts into appropriate git operations.

@vlsi : You say it can be done in 15 minutes. Please show me, specifically, how; I'd really like to learn whatever trick you're using. Otherwise, I'm on the verge of saying "nope, can't do it with reasonable effort, if I lose a reviewer that's acceptable." This can wait for Gary to have time to look at it.

@vlsi

vlsi commented Oct 22, 2023

@jkesselm , I suggested a step-by-step guide a couple of days ago in #105 (comment)

Here's a 13min video of me following the exact steps: https://youtu.be/xnkfUGWFUWQ
The video was not sped up. I just routinely go through renames.

The renames I ended up with were slightly different from your current ones.
For instance, you have serializer/resources/... while Maven convention is serializer/src/main/resources.

There were several .html -> .xhtml renames left, however, I think xhtml should not be included in the current commit as file format is more like a personal preference rather than Maven requirement. Either way, it can be adjusted easily.

Other than that, I believe, it confirms that 15 minutes was a reasonable estimate for splitting the renames.

@jkesselm
Contributor Author

jkesselm commented Oct 22, 2023 via email

@jkesselm
Contributor Author

Uhm. If I start this process from a directory on branch maven-build, I fail when I get to

[keshlam@goldtooth xalan-pr105]$ git --version
git version 2.41.0
[keshlam@goldtooth xalan-pr105]$ git diff origin/head --numstat | grep '=>' > renames
fatal: ambiguous argument 'origin/head': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

I appreciate the opportunity to learn new git features, but I'm not sure what I'm doing wrong here.

@vlsi

vlsi commented Oct 22, 2023

ambiguous argument 'origin/head': unknown revision

It says you should have put origin/master

@jkesselm
Contributor Author

jkesselm commented Oct 22, 2023

It says you should have put origin/master

Video said otherwise, but -- Yes, running it with origin/master instead does appear to have produced a renames file. I really have to remember that trick.
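One plausible reason for the discrepancy (an assumption; the thread does not confirm it): origin/HEAD is an optional symbolic ref that a clone may or may not have, and ref names are case-sensitive on Linux, so lowercase origin/head resolves only on case-insensitive filesystems. A sketch that picks the base ref defensively and lists renames as old/new pairs; both function names are hypothetical:

```shell
# default_base
# Print the branch origin/HEAD points at, or origin/master if the clone has no origin/HEAD.
default_base() {
    git symbolic-ref -q --short refs/remotes/origin/HEAD 2>/dev/null || echo origin/master
}

# list_renames [BASE]
# Print "old<TAB>new" for every rename Git detects between BASE and the work tree.
list_renames() {
    git diff -M --diff-filter=R --name-status "${1:-$(default_base)}" | cut -f2,3
}
```

Running list_renames > renames gives a file comparable to the one above, in a form that is easy to feed to a script. If origin/HEAD is missing, git remote set-head origin --auto asks the remote for its default branch and creates it.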

@jkesselm
Contributor Author

jkesselm commented Oct 22, 2023

Did it by using the "renamed" file (VERY useful trick, thanks!) to drive a batchfile rather than "by hand" -- we each have our own ways to manage RSI -- but there should now be a branch called xalan-java-mvn-refactored which separates moving the files from editing, adding, and deleting, making them two different commits.

(@vlsi: Just realized that in the "delete and replace" stage I probably lost moving the resources directory as you suggested. I'll do another commit. Sorry about that.)

Outside of that one deliberate change, it should be identical to the maven-build branch. And it should have preserved all the history. I think.

@vlsi, please examine. Does that satisfy, or are there further quibbles?

Testing is in progress. But since the final stage was "copy contents from a working version", it should be OK. If so, I will probably close this PR and open a new one.

Known glitch: Somehow the .sh scripts got permissions set to 755. I think we want them as 744; if the user wants to open them up further they can do so, but unnecessary execute permissions are asking for trouble.

@vlsi

vlsi commented Oct 23, 2023

If so, I will probably close this PR and open a new one

Why split the history/discussion across several PRs?

if the user wants to open them up further they can do so, but unnecessary execute permissions are asking for trouble.

If you feel the scripts are unsafe, just remove them from the repository.
If you feel the scripts are ok, then assign the proper execute bits.

Windows does not ask for execute bits, so you make the life of Linux/macOS users harder for no reason as you remove the execute bits.
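For context, Git records only a single executable bit per file (mode 100644 or 100755), so a 744 versus 755 distinction is not stored in the repository. A minimal sketch for setting that bit on tracked shell scripts, which also works from Windows; the function name mark_scripts_executable is hypothetical:

```shell
# mark_scripts_executable
# Set the executable bit in the index for every tracked *.sh file.
mark_scripts_executable() {
    git ls-files -- '*.sh' |
    while IFS= read -r script; do
        git update-index --chmod=+x -- "$script"
    done
    # Show the recorded modes; executable files are listed as 100755.
    git ls-files -s -- '*.sh'
}
```

The change is staged in the index and still has to be committed. Whatever group/other permissions a checkout ends up with are then governed by each user's umask.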


I guess you need to settle on what to do with xalan2jtaglet.jar. The change both moves xalan2jtaglet.jar to various locations and adds decompiled sources at the same time.

Testing is in progress

I've no idea how you test, however, I hope you'll add CI sooner rather than later.


I guess you should not put package.xhtml into site as package.html is javadoc-related rather than site-related.
Keeping package.html alone would reduce the noise in the second commit, and it would improve the generated javadocs at the same time.


Does that satisfy, or are there further quibbles?

Would you scan through the second commit to identify excessive changes like xalan/tools/xalan2jdoc.jar, serializer/tools/xalan2jdoc.jar, and so on?

Try something like git diff -M --diff-filter=AR --name-status origin/master. Ideally, the second commit should not contain renames (unless you rename classes for some reason), and it should not add unwanted files (e.g. jars).
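The same check can be narrowed to a single commit. A minimal sketch, assuming the commit has exactly one parent; the function name unwanted_in_commit and the choice to flag only .jar additions are illustrative:

```shell
# unwanted_in_commit [REV]
# List renames and added *.jar files introduced by a single commit (default: HEAD).
unwanted_in_commit() {
    rev=${1:-HEAD}
    # Keep only Added/Renamed entries, then print renames and any path ending in .jar.
    git diff -M --diff-filter=AR --name-status "${rev}~1" "$rev" |
        awk -F '\t' '$1 ~ /^R/ || $NF ~ /\.jar$/'
}
```

Empty output for the second commit would mean it contains no renames and adds no jars.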

@jkesselm
Contributor Author

jkesselm commented Oct 23, 2023 via email

@jkesselm
Contributor Author

Major reason I'm considering switching to a new PR is a combination of shedding the arguments about things other than the changes, and distrust of my Git skills in terms of forcing one branch over another. Closing this PR as "OK, we learned from it" and opening a new one is a fairly painless way to resolve both, and is pretty darned common Git practice in my experience.

Main reason I haven't yet done so is that I haven't finished testing to convince myself that yea, verily, this is a consistent copy of the migration.

@jkesselm
Contributor Author

jkesselm commented Oct 24, 2023

Confirmed that the maven-build branch passes smoketest, whereas xalan-java-mvn-refactored does not, complaining that

javax.xml.transform.TransformerConfigurationException: javax.xml.transform.TransformerException: org.apache.xml.serializer.utils.WrappedRuntimeException: Could not load the propery file 'output_xml.properties' for output method '' (check CLASSPATH)

(Yes, "propery". That should be fixed at some point.)

OK, time to diff the two trees and see what else has gone AWOL.

ADDENDUM: This appears to be the result of moving the serializer resource files. Moving them back until we can resolve what needs to be changed to support moving them.

ADDENDUM: Confirmed. Committing reversion to the xalan-java-mvn-refactored branch.

@jkesselm
Contributor Author

With that reversion (sigh, that's what I get for late changes), the refactored changeset now passes.

Of course that's now refactoring plus about six more commits. I am NOT going to refactor again; they're small enough (especially given all the other changes) that folks can Deal With Them.

I'm still uncomfortable overwriting this branch with that one, and would rather issue a new PR.
