Skip to content

Conversation

@hftf
Copy link
Contributor

@hftf hftf commented Jul 4, 2015

Fixes #2264.

@hftf
Copy link
Contributor Author

hftf commented Jul 16, 2015

Although pandoc doesn't currently support underlines, I do think there should be an underline builder function. The readers don't currently parse underlines consistently, and it makes more sense for readers to call a builder function instead of doing their own thing. A single function would also be very easy to change in the future if we decide to create a new Underline type, or parse all underlines as Emph, etc.

Would it be better to put the underlineSpan builder in pandoc-types with the other builders? (I think this would require a new PR.) And if you have any feedback or suggestions, please let me know.

Also, if it matters, is it possible to restart the build? I don't understand why one of the jobs failed.

$ cabal-$CABALVER sdist --output-directory=build
Warning: Cannot run preprocessors. Run 'configure' command first.
Building source dist for pandoc-1.15.0.4...
cabal-1.16: Error: Could not find module: Text.Pandoc.Data with any suffix:
["gc","chs","hsc","x","y","ly","cpphs","hs","lhs"]

The command "cabal-$CABALVER sdist --output-directory=build" exited with 1.

@jgm
Copy link
Owner

jgm commented Jul 16, 2015

+++ Ophir Lifshitz [Jul 16 15 10:14 ]:

Although pandoc doesn't currently support underlines, I do think there
should be an underline builder function. The readers don't currently
parse underlines consistently, and it makes more sense for readers to
call a builder function instead of doing their own thing. A single
function would also be very easy to change in the future if we decide
to create a new Underline type, or parse all underlines as Emph, etc.

Would it be better to put the underlineSpan builder in pandoc-types
with the other builders? (I think this would require a new PR.) And if
you have any feedback or suggestions, please let me know.

Also, if it matters, is it possible to restart the build? I don't
understand why one of the jobs failed.
$ cabal-$CABALVER sdist --output-directory=build
Warning: Cannot run preprocessors. Run 'configure' command first.
Building source dist for pandoc-1.15.0.4...
cabal-1.16: Error: Could not find module: Text.Pandoc.Data with any suffix:
["gc","chs","hsc","x","y","ly","cpphs","hs","lhs"]

The command "cabal-$CABALVER sdist --output-directory=build" exited with 1.

Can you rebase against master? There are some travis fixes
that should help with this.

Particularly
a27991d

@hftf
Copy link
Contributor Author

hftf commented Jul 16, 2015

My mistake – doing that now. If we decided to put the builder in pandoc-types, will it have to merge in first? I don't know if it's possible for Travis to know that this PR depends on another repo's PR.

@hftf hftf force-pushed the underline-consistent branch from f78c09f to 2b0d4d3 Compare July 16, 2015 17:45
@jgm
Copy link
Owner

jgm commented Jul 16, 2015

I'm inclined to think that if we're going to mess with
pandoc-types, we should consider adding a proper Underline
element.

+++ Ophir Lifshitz [Jul 16 15 10:36 ]:

My mistake – doing that now. If we decided to put the builder in
pandoc-types, will it have to merge in first? I don't know if it's
possible for Travis to know that this PR depends on another repo's PR.


Reply to this email directly or [1]view it on GitHub.

References

  1. Consistent underline for Readers #2270 (comment)

@hftf
Copy link
Contributor Author

hftf commented Jul 16, 2015

I agree. It would be a half-measure to add the underline builder to pandoc-types but not a full-fledged Underline element. Is Shared.hs still a good place for it?

@hftf
Copy link
Contributor Author

hftf commented Jul 30, 2015

Please let me know if you have any more feedback on this change.

@hftf
Copy link
Contributor Author

hftf commented Nov 21, 2015

Now that 1.16 includes a change to pandoc-types, is it possible to consider before the next release either adding the builder (easy to change later and makes underlines consistent for the time being) or adding a full-fledged Underline type (slightly harder, affects the 9 formats that I listed in #2264)?

@jgm
Copy link
Owner

jgm commented Nov 21, 2015

I'm still a bit uncomfortable with the mismatch between Strong/Emph (which are semantic) and Underline (which is more concrete). On the other hand, we do have Strikeout and SmallCaps.
I'm tempted to convert Strikeout into Deleted and add Inserted.

Also, while I agree that we should treat underline consistently in all readers, I'm not sure it's a good idea to have an underline builder unless it actually produces an underline. This seems likely to confuse people.

Perhaps this would be a good time to (re-)raise these questions on pandoc-discuss.

@hftf
Copy link
Contributor Author

hftf commented Feb 24, 2017

I’d like to revisit this before 2.0 is out. (The inconsistency is a main reason I can no longer use pandoc in my project. All of our documents use semantic underlines.) Do I still need to bring it up on the pandoc-discuss list?

Edit:
If nothing is done, should pandoc eventually emit a warning as part of the upcoming Lossiness Report feature (#3392)? There are already Readers that convert underlines into Strong/Emph/etc., but strip out other unsupported elements. In both of those cases, a lossiness warning should be emitted.

@jgm
Copy link
Owner

jgm commented Feb 24, 2017 via email

@mb21
Copy link
Collaborator

mb21 commented Feb 26, 2017

The inconsistency is a main reason I can no longer use pandoc in my project. All of our documents use semantic underlines.

So you've actually documents that use bold, italics and underline for three semantically different purposes? (Do you even have all possible combinations?!) I tend to take the view that all three are simply a form of emphasis with bold and italics being very common and underlining being used very little.

@hftf
Copy link
Contributor Author

hftf commented Feb 26, 2017

Yes. See these documents (document A or document B) containing quizbowl questions, for example. Tens of thousands of these documents already exist (many are archived here), and hundreds more are written every year. I will explain how basic formatting elements are used in quizbowl documents.

  1. Italics are used for titles, foreign terms, and rarely for emphasis or special instructions.
  2. Bold indicates where a player earns extra points by buzzing early on a tossup question.
  3. Bold+underline indicates what information a player is required to give in their answer.
  4. Underline indicates that when a player gives an incomplete answer, they will be prompted to give a complete answer.

Bonus 11 of document A demonstrates why only using bold, instead of bold+underline, for meaning 3 is insufficient. If two consecutive words of an answer are both acceptable individually, they should be separately underlined. (Thus, Agustín Barrios Mangoré has a different meaning than Agustín Barrios Mangoré. In the latter case, giving just “Barrios” or just “Mangoré” is an incorrect answer.) Notice that if only using bold, foo bar (<b>foo</b> <b>bar</b>) and foo bar (<b>foo bar</b>) can not be visually distinguished.

Yes, all combinations are possible. Tossup 20 of document B has almost all combinations; it is easy to find instances of the remaining combinations (bold+italic and underline) elsewhere in the same document.

I hope I have made the case for why stripping underlines or converting them into other elements will not work for these documents. Even if Pandoc never supports Underline, it should at least have Readers that behave consistently.

@mb21
Copy link
Collaborator

mb21 commented Feb 27, 2017

Interesting, if maybe a bit unusual, usecase! I can see the problem when the input format is docx/odt or a similarly unsemantic format that doesn't allow for easy preprocessing.

Even if Pandoc never supports Underline, it should at least have Readers that behave consistently.

Agreed.

To recap the three options and add a fourth:

  1. a new AST element Underline: now or never
  2. a workaround with spans: cluttering the AST with lots of spans seems messy
  3. emit a warning
  4. add Attr to Emph so we could parse to <em class="underline">foo</em> (I hope I'm not seeing only nails after having discovered a hammer...)

@hftf
Copy link
Contributor Author

hftf commented Jul 19, 2017

Sorry for the additional reminder. Does there need to be further discussion on this? Has there been a decision yet to either make the AST change or the workaround?

@jgm
Copy link
Owner

jgm commented Aug 23, 2017

@hftf sorry this has been neglected so long.

If you want to get these changes working with current dev, I'd be happy to switch to consistently parsing underlines as Span elements with class underline, instead of Emph or nothing.

It might be desirable to adjust some of the writers as well so they render this sort of Span as an actual underline element.

Alternatively, you could add the Underline element to pandoc-types and change all the readers and writers. This would require a new version of pandoc-types (and I suppose that it would be simplest for it to be 2.0 to match the upcoming pandoc).

@hftf hftf force-pushed the underline-consistent branch from 2b0d4d3 to 579a5d9 Compare September 1, 2017 03:10
@hftf
Copy link
Contributor Author

hftf commented Sep 1, 2017

Thank you for revisiting this issue.

I have rebased my changes. (There is a Travis build error, but I don't think it is my fault.)


However, there is one failing test, a strange Docx test that I wasn't comfortable passing or fixing.
Here's an edited snippet of the test diff:

 ... Str "dans",Space,Str "le",Space,Str "film",Space,Link ("",[],[])
-[Emph                       [      Str "\"Le",Space,Str "nom",Space,Str "des",Space,Str "gens\""]]
+[Span ("",["underline"],[]) [Space,Str "\"Le",Space,Str "nom",Space,Str "des",Space,Str "gens\""]]
 ("http://www.allocine.fr/film/fichefilm_gen_cfilm=172167.html",""),Str ".",Space ...

I spent a few hours trying to investigate why another Space appeared at the beginning of the Span, but couldn't really figure out the logic of smushInlines in Combine.hs. I'm not sure if the lack of trimSps on line 526 here is relevant. The test was added in 7a10507 for issue #2689 and it seems that the related code is in 2ee7752. (Seeking advice from the author, @jkr.)


Returning to the underline issue – I think this the best course of action:

  1. Workaround part 1:
    1. Make most Readers transform underlines to underline-spans (this pull request)
    2. Finish step i for the remaining Readers (see the table in Readers should be consistent in transforming underline #2264, might be out of date).
  2. Workaround part 2: Make Writers transform underline-spans to underlines
  3. Replace underline-spans with a new Underline element for 2.0

I can work on and submit a PR for step 2 soon, but I will probably need some help if I were to be the one to tackle step 3, since that change will affect a lot of the codebase.

@hftf
Copy link
Contributor Author

hftf commented Sep 5, 2017

The code responsible for the failure could be filter isAttrModifier in combineSingletonInlines, since Span corresponds to an AttrModifier and not a Modifier like other elements.

> combineSingletonInlines mempty $ fromList [Emph [Space]]
Many {unMany = fromList [Space]}
> combineSingletonInlines mempty $ fromList [Span ("",["underline"],[]) [Space]]
Many {unMany = fromList [Space,Span ("",["underline"],[]) [Space]]}

However, since it was probably done for a reason, I don't think I'm the right person to fix it. It's hard to know the intentions of this code when there are no comments or unit tests.

@hftf
Copy link
Contributor Author

hftf commented Oct 2, 2017

Any updates on this would be great.

@jgm
Copy link
Owner

jgm commented Oct 5, 2017

@jkr do you have any insight into this?

@hftf
Copy link
Contributor Author

hftf commented Oct 6, 2017

Why don't I just pass the test for now? When @jkr eventually sees this issue, then he can fix the test case and add any comments or unit tests.

I'd like to get started soon on implementing Underline via the course of action that I listed above. October would be the best month for me to put time into development; after that, it would be a while until I can contribute much again. Pandoc 2.0 is apparently also supposed to come out around the end of October. However, I don't think it would be worth my time to work on this improvement if it just ends up getting neglected, since I don't have much confidence it will ever be merged if it doesn't land in 2.0.

@jgm
Copy link
Owner

jgm commented Oct 7, 2017 via email

@jkr
Copy link
Collaborator

jkr commented Oct 7, 2017 via email

@jkr
Copy link
Collaborator

jkr commented Oct 8, 2017

Okay -- sorry about the long delay. Beginning of school email avalanche somehow hid the more recent ones from me.

So, @hftf, just to clarify, you'd like the remove the space at the beginning of the span output the way it's removed at the beginning of the emph output, right?

@hftf
Copy link
Contributor Author

hftf commented Oct 8, 2017

@jkr, Yes, that's right. I didn't know if it was intentional. Thank you for taking a look.

@jkr
Copy link
Collaborator

jkr commented Oct 9, 2017

@hftf : cool, I'll take a look this afternoon. Sorry again for the delay.

A couple of questions though, for all:

  1. can you think of any instances in docx where we would want a docx span to keep a leading space? In other words, should this be a thing for spans in general, or should we specialize for the "underline" class? My impulse is to make it general, but maybe I'm not thinking through all the cases.

  2. A lot of folks use underlined spaces for fill-in-the-blank forms, etc. We've been happily ignoring them by just ignoring underlines up until now. Are we cool continuing to ignore this frequent, though completely un-pandoc-like, use-case now that we're interpreting underlines? That would be the result if we munched leading spaces. (I guess we could be super clever and convert to underscores if the entire thing is full of nothing but spaces.)

@jkr
Copy link
Collaborator

jkr commented Oct 9, 2017

There are actually two issues going on here:

  1. The most important one is that the test is turning the links into emphases, which it probably shouldn't do. That's because it sees the links as underlined, when really that's the formatting of autolinks. I have to remind myself about what's going on with this test, but my sense is that there probably shouldn't be underlining in the output here. Basically, these are links masquerading as underlines, and are confusing the issue.

  2. Allowing spaces in the links. This is actually the lack of trimSps at lines 403 and 406 in Docx.hs. But the issue there is that you can see what you're making into a link, so you presumably mean to put a space in there (as opposed to what people do all the time with emph, etc).

I'll see if I can improve what's going on there.

@hftf: Do you have a rebased version of your changeset? That will help me figure this all out.

This can be easily updated if needed. The purpose is for Readers to
transform underlines consistently.
@hftf hftf force-pushed the underline-consistent branch from 579a5d9 to 7cbfb46 Compare October 9, 2017 21:33
@hftf
Copy link
Contributor Author

hftf commented Oct 9, 2017

@jkr, Thanks very much. I've just rebased again. (I haven't touched the adjacent_links test.)

That description seems right to me, but I'm still skeptical that people always intend to put a space at the edge of a link just because the link visibly formats a space, while Emph does not. I believe that people are just as careless with either element. In the documents I work with, bold/italic/underline elements often include the trailing space unintentionally since it gets automatically selected when authors drag the cursor in their word processor.

I simply expected to see the same behavior after changing the Emph to Span (which has no visual formatting on its own). I didn't want to make the decision to pass that test because I couldn't confirm if the behavior was intended or a bug. Would it be possible to add unit tests, comments, or some kind of documentation to cover the intended behavior of smushInlines and Combine.hs? That way, people like me who are not very familiar with the language might have an easier time debugging.

@jkr
Copy link
Collaborator

jkr commented Oct 10, 2017 via email

@jkr
Copy link
Collaborator

jkr commented Oct 10, 2017 via email

@hftf
Copy link
Contributor Author

hftf commented Oct 10, 2017

That seems logical. Some remarks:

  1. I'm in favor of changing the test as well for all the reasons you mentioned. I believe the test was added to handle the strange edge case of several consecutive links, which get combined into a single link. Let's just make sure that behavior is still tested.

  2. I agree that if an entire link is underlined, then the underline can be dropped. What if only part of the link is underlined/not underlined? For now, I suppose that nothing is dropped and whatever formatting is left as is.

  3. I think both leading spaces and trailing spaces should be moved outside the link. Exceptions can be dealt with later, as you say.

It might also be a good idea to move this underline-related discussion to separate issues.

@jgm
Copy link
Owner

jgm commented Oct 10, 2017 via email

@jkr
Copy link
Collaborator

jkr commented Oct 10, 2017 via email

@jgm
Copy link
Owner

jgm commented Oct 23, 2017

@jkr any success creating an alternative test?

I'd like to get this merged before 2.0.

@jgm jgm added this to the pandoc 2.0 milestone Oct 23, 2017
@jkr
Copy link
Collaborator

jkr commented Oct 23, 2017

Yes -- sorry, part of the difficulty was figuring out how to make a pathological case like that test case that would fail properly. The fixes will be up later this evening.

@jkr
Copy link
Collaborator

jkr commented Oct 24, 2017

Sorry -- work thing took over yesterday and today. This is still on my radar, and I plan to get to it tonight.

@jgm
Copy link
Owner

jgm commented Oct 27, 2017 via email

@hftf
Copy link
Contributor Author

hftf commented Oct 27, 2017

I am in favor of dropping the weird test case for now. This specific pull request is the first part of a workaround that really should be merged at this point.

I'm still interested in a new Underline element. I posted a roadmap for that goal 2 months ago, but there seemed to be no interest so I did not implement it. The most opportune time to land an AST change would be 2.0, so I would push for that, but I suppose it can wait.

(I also have another completely unrelated breaking change – a proposal for making "plain" output more useful – that I wanted to bring up on the mailing list before 2.0, but I procrastinated doing that.)

Both items deserve to land in 2.0; however, I would hate to keep everyone else waiting.

@jgm
Copy link
Owner

jgm commented Oct 27, 2017 via email

@jkr
Copy link
Collaborator

jkr commented Oct 27, 2017

I think we should just drop the weird test case. It really was a pathological case (note that in the original issue the original reporter couldn't recreate it). Finding an alternative that recreates the behavior can definitely be a TODO for a point release.

I apologize to everyone about this. There was a perfect storm of personal and professional.

@ickc
Copy link
Contributor

ickc commented Oct 27, 2017

The most opportune time to land an AST change would be 2.0, so I would push for that, but I suppose it can wait.

This was what I would have thought too. But the others convinced me otherwise. Even in pandoc 1.x, there were breaking changes in the point releases. However, the mentality of trying to change all "big changes" in pandoc 2.0 would make it quite overwhelmed, and it seemed to already be. (I just finished reading the changelog of pandoc 2.0, and it is very lengthy! I'm surprised so much is changed and fixed already. I didn't recall such a lengthy changelog (1788 lines!) in the past. And a quick regex confirmed that.)

There's at least one major AST change that everyone is hoping for—column/row span in table, and that is pushed to pandoc 2.1+. So don't worry, there's always a chance to make pandoc even better.

@jgm jgm merged commit 7f8a3c6 into jgm:master Oct 27, 2017
jgm added a commit that referenced this pull request Oct 27, 2017
See #2270 for background -- this test blocked the consistent
underline change and was hard to revise, so for now we are
removing it.
@jgm
Copy link
Owner

jgm commented Oct 27, 2017 via email

@hftf
Copy link
Contributor Author

hftf commented Dec 3, 2017

Is it a good idea to add a CSS rule for .underline in the default html template?

@jgm
Copy link
Owner

jgm commented Dec 3, 2017 via email

@hftf
Copy link
Contributor Author

hftf commented Dec 3, 2017

I imagine this should be sufficient:

span.underline { text-decoration: underline; }

@jgm
Copy link
Owner

jgm commented Dec 4, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants