• CommentRowNumber1.
• CommentAuthorMark C. Wilson
• CommentTimeDec 11th 2012

Robert Rosebrugh - Experience with a free electronic journal: Theory and Application of Categories http://www.ams.org/notices/201301/rnoti-p97.pdf

He claims that it is much easier to run such a journal than many people believe.

• CommentRowNumber2.
• CommentAuthorHenry Cohn
• CommentTimeDec 12th 2012

There are a couple of things I think TAC should do differently.

One is copyright. The articles are marked “Permission to copy for private use granted”, with no indication of exactly what constitutes “private use” or what sorts of uses might be objectionable, and the copyright agreement just gives the journal the right to publish the papers (it’s not clear to me whether they can even authorize anyone else to reprint or distribute papers). If you are going to have a free online journal, I strongly believe the papers should be made available under a license such as CC BY-ND. This would allow other people to maintain archives or distribute papers (provided they didn’t modify the papers), reprint them, etc. There’s great value in being clear and explicit about this and using a standard, well-understood license, and there’s no harm if you are going to distribute the papers for free anyway.

Another is archiving. As best I can tell from their archiving policy, the primary archive is a set of TeX files maintained by the managing editor. The National Library of Canada maintains another archive, but it does not seem to contain the TeX files. This is a pretty serious problem: the file types being maintained by the library were not designed or standardized for archival purposes and may not be adequate, TeX files are pretty tricky to use (the policy says they keep all the macro files needed, but various packages are updated in incompatible ways without notice, so this is painful to do right and easy to mess up without noticing), and in any case having the primary archive be maintained by the managing editor is not a viable long-term archiving plan.

• CommentRowNumber3.
• CommentAuthorzskoda
• CommentTimeDec 13th 2012
• (edited Dec 13th 2012)

I disagree with 2: the more formal the licenses, the less likely I am to submit to that free journal. We are now being bombarded by the proliferation of various licences which are very complicated to understand, all talk “free etc.”, pretend “standard”, but are mutually conflicting and embarrassingly annoying. If something is free and reasonable in tone, length, design and environment, then the rules of common sense are driving the mechanisms stronger than the law. Minimal and flexible statement is better than the detailed (which can still never literally predict all the future cases of conflict). As long as it is the culture of trust and low formalization I will prefer such communities; they have the flexibility to grow, to revolutionize, to allow rebellion when needed and will never be abused by the dirty culture of lawyers and other social saprophytes.

Archiving can always be made better, of course.

• CommentRowNumber4.
• CommentAuthorHenry Cohn
• CommentTimeDec 14th 2012

We are now being bombarded by the proliferation of various licences

I’m not convinced this is true any longer (significant new open access journals are generally using CC licenses), although it once was true. To the extent we are still seeing ad hoc, custom licenses, full of ambiguities and loopholes, CC licenses are the solution. They are the only licenses we have that are broadly used, standardized, and well understood.

the rules of common sense are driving the mechanisms stronger than the law

The problem with common sense is that everyone thinks they know what’s common sense and what isn’t, but they don’t always agree.

For example, one big issue is derivative works: should readers be allowed to distribute modified or adapted versions of papers (as long as they make it very clear what they have done)? For example, translations, conversions to slides for use in teaching, corrected or extended editions, etc. Some people think common sense demands that of course readers should be allowed to do some or all of these things; others disagree.

If we rely on common sense alone to sort this out, it’s guaranteed to end up making someone really upset. The only responsible solution is clarity: authors should understand what they are agreeing to allow, and readers should understand what authors have allowed. The beauty of CC licenses is that they are the only standard we have for explaining this. (By contrast, I don’t understand what Theory and Application of Categories does or doesn’t allow.)

Minimal and flexible statement is better than the detailed (which can still never literally predict all the future cases of conflict).

Exactly, and this is what the CC licenses do. The point is that if you try to craft a statement of which uses you will or won’t allow, you’ll probably screw it up. The only way to craft a clear, unproblematic statement is to keep it absolutely as simple as possible.

For example, CC BY is the simplest license that puts any restrictions at all (i.e., doesn’t just put the paper in the public domain for anyone to do whatever they want with). The one restriction is that you must credit the source, but otherwise you can do anything you want.

CC BY-ND is the next simplest license. It says you must credit the source and you cannot modify the work.

Many mathematicians prefer CC BY-ND to CC BY, because they are uncomfortable with possible derivative works. However, I see no reason to choose a license more complicated or restrictive than CC BY-ND for an open access paper, and there are real risks involved in doing so.

will never be abused by the dirty culture of lawyers and other social saprophytes

Exactly: the point of choosing a clear license is to avoid this risk. The problem with custom licenses is that they are generally full of loopholes that potentially allow this sort of abuse. You might think abuse is unlikely, because who is going to have an interest in messing with the mathematical community? One possibility is authors: a certain (small) fraction of authors are going to become extremely eccentric, and they may start trying to abuse the system. For example, Grothendieck has tried to prevent any future republication of any of his work. One thing I want to avoid is a situation where people do what they think common sense allows, and then years later an eccentric author appears and tries to crack down on this, with a potentially valid legal argument (namely that this use was never legally authorized). That’s a real mess.

culture of trust and low formalization

Trust and low formalization is a great solution for small, homogenous groups, but it requires a shared understanding. The mathematical community does not currently have such an understanding of what is appropriate for open access papers, and widespread use of CC licenses is the closest I can envision us coming to such an understanding.

• CommentRowNumber5.
• CommentAuthorzskoda
• CommentTimeDec 14th 2012

Interesting.

As for derivative works, I read extensive literature in the 1990s on the issues with reverse engineering of software. The authors are experts in authors’ rights, and they claimed that the various statements some software vendors made about preventing reverse engineering are a total bluff: once you get to use the software, you can translate it, study it, decode it, and reverse engineer it, just as you can with a children’s toy you buy in a store for your kid. Now selling the derivatives is another thing; there is no general rule.

• CommentRowNumber6.
• CommentAuthorDmitri Pavlov
• CommentTimeDec 26th 2012

there are real risks involved in doing so

What are these real risks?

• CommentRowNumber7.
• CommentAuthorHenry Cohn
• CommentTimeDec 27th 2012

What are these real risks?

One is that the agreement won’t legally capture what you intended it to, because it’s extraordinarily difficult to pin down certain concepts. For example, people sometimes worry about “commercial use” of a paper, but nobody can agree on what constitutes commercial use. A company that makes money through advertising-supported websites offering paper downloads is presumably commercial use. But what about posting a paper on a personal blog, with hosting costs supported by advertising? That’s almost the same thing theoretically, but it feels very different. What if I post a collection of my favorite papers on my Microsoft Research web site? Is that commercial use, because it could be viewed as helping to attract viewers to a Microsoft web site, even though no money would change hands? On the other hand, one shouldn’t necessarily rule out all cases of money changing hands. What about a company that offers to sell nicely printed and bound collections of papers? Is that more like the advertising-supported blog, or is it more commercial? And this is not even getting into the question of students paying tuition, or of for-profit vs. non-profit universities. (Plus there’s an infinite list of other issues. See, for example, the footnote on pages 32-33 of http://mirrors.creativecommons.org/defining-noncommercial/Defining_Noncommercial_fullreport.pdf.)

The net effect is that if some mathematicians spend a few hours writing a copyright agreement, there’s a high probability that it will be at best legally ambiguous, and at worst unambiguously not what was intended. (For example, I’m not a lawyer, but the TAC legalities look awfully sloppy and vague to me, and I wouldn’t consider them a clear, reliable guide to what can legally be done with TAC papers.) The extra value of being able to write a custom license is negligible, so it’s best to stick with one that’s simple and has already been thoroughly analyzed and endorsed by lawyers.

A related risk is what happens in a situation that was simply not anticipated when writing the agreement. For example, what if TAC decided to create a paywall and start charging for papers? As far as I can tell, they could do that, and nobody else would be allowed to distribute papers without their permission or the author’s permission. Over time, some authors would become difficult or impossible to track down, so the second option might be unavailable. Or what if the body that owns TAC fell apart? (It’s not clear from their web site who is legally in charge, but let’s assume it’s the editorial board.) It’s possible that an angry dispute might tear the board apart, lead to a number of resignations, and leave someone bitter or eccentric in charge or leave it unclear who was actually in charge. Then who knows what would happen to the papers, and nobody but the original authors could do anything about access.

Of course these events are extremely unlikely. However, there are 600+ mathematical journals out there, and we would like their contents to be available essentially forever (keeping in mind that if Disney gets its way, current United States copyrights will be extended perpetually and these works will never enter the public domain). With 600 journals extending over centuries, stupid and crazy things are sure to happen someday, somewhere. We need things set up in a robust way, so if a journal goes off the rails, the rest of the world can simply archive the journal’s papers and move on, without leaving part of the literature stuck in legal limbo.

The beauty of licenses like CC BY or CC BY-ND is that they are simple and completely clear to everyone involved and they cover all circumstances. If you aren’t planning to restrict/sell access, then I see no reason to prefer another license.

• CommentRowNumber8.
• CommentAuthorDmitri Pavlov
• CommentTimeDec 27th 2012
• (edited Dec 27th 2012)

but nobody can agree on what constitutes commercial use

But what about posting a paper on a personal blog, with hosting costs supported by advertising? What if I post a collection of my favorite papers on my Microsoft Research web site? What about a company that offers to sell nicely printed and bound collections of papers?

In my opinion, all of these uses are clearly commercial and should not be allowed under the noncommercial clause. However, I don’t see how this would present a problem. Why copy the files with papers to your site if you can simply link to either the arXiv or the journal website? It is much more likely that these papers will remain available there than on somebody’s webpage.

The beauty of licenses like CC BY or CC BY-ND is that they are simple and completely clear to everyone involved and they cover all circumstances.

CC-BY or CC-BY-ND are by no means “completely clear” or “cover all circumstances”, as the above examples demonstrate. They are just as vague as CC-BY-ND-NC.

If you aren’t planning to restrict/sell access, then I see no reason to prefer another license.

I don’t plan to restrict/sell access to my papers, yet I do see a reason to prefer CC-BY-ND-NC: I don’t want anybody to profit from my work. (I’m more concerned with the possible corrupting effect of a commercial use than with the pure monetary gain.)

Here is an explicit example from Heather Morrison’s blog: a commercial company takes a CC-BY paper describing a certain medical study and containing a photo of one of the patients (obtained with his consent) that participated in the study. It takes out the photo and starts running ads with this photo for its drug, which was used in the study. Using the CC-BY-ND-NC license would prevent such a use of the photo.

• CommentRowNumber9.
• CommentAuthorHenry Cohn
• CommentTimeDec 27th 2012

I wouldn’t say any legal document could ever be perfectly unambiguous, but CC BY and CC BY-ND are about as clear as we are going to get. Specifically, I don’t think the ambiguities listed above are serious for CC BY-ND. Up until the TeX issues, they are ruled out by the no derivative works clause. As for the TeX issues, this is a good reason to choose an archival format such as PDF/A instead of or in addition to TeX (since TeX is a horrific mess for long-term archiving). All the CC licenses allow translating to other formats, if one only makes the minimal technical changes necessary to do this. You could certainly debate exactly which changes are necessary, but I see it as a relatively small issue. (It would be annoying if someone screwed something up while translating formats, but not allowing anyone to change formats would be a disaster for long-term accessibility, and it’s not clear what it even means to insist on perfect fidelity, so we’re kind of stuck with this.)

I’d definitely not recommend using CC BY unless you are comfortable with allowing people the hypothetical possibility of doing all sorts of weird things with your paper. In practice nobody has any interest in doing weird things, but in principle the license would allow them to do things like edit the paper in crazy ways (provided they clearly identified it as having been edited by them).

Minor ambiguity about proper attribution or derivative works doesn’t bother me that much for academic papers. The danger with this ambiguity is in potentially allowing too much, and I don’t see that as a serious problem. The academic community will enforce its own norms regarding things like attribution much better than the courts can (the deterrent to academic misbehavior isn’t getting sued, but rather damaging your reputation or career). It would be annoying if people outside the academic community did stupid things with papers, but it will rarely happen and in the vast majority of cases it could just be ignored. As a general principle, I think not allowing enough use is much more of a long-term danger than allowing too much, and the worst-case scenario is being stuck without the legal right to do reasonable things.

Why copy the files with papers to your site if you can simply link to either the arXiv or the journal website? It is much more likely that these papers will remain available there than on somebody’s webpage.

If all goes well, yes, but there are several reasons one might prefer a local copy:

1. Speed or ease of access, especially from out of the way locations or where unusual usage might take down the original servers (e.g., if the paper is discussed in a massive online course with hundreds of thousands of participants).

2. Cloud storage (e.g., Dropbox). The line between storing a file on your hard drive and storing it on someone’s server is getting awfully blurry, and when you introduce sharing features, further distribution comes into the picture.

3. Archiving, if one doesn’t trust the journal to do a good job of long-term preservation or of maintaining a permanent link (DOIs do a good job of this, but they cost money and many free journals do not supply them).

4. Automated processing in various ways. For example, building a search index like Google Scholar, deduplication to save server space in Dropbox, etc.

5. Paranoia about other people’s logs and tracking (or, conversely, a desire to do your own usage tracking).

I don’t plan to restrict/sell access to my papers, yet I do see a reason to prefer CC-BY-ND-NC: I don’t want anybody to profit from my work.

In practice, the commercial value of reproducing a freely available math paper is negligible, so I think it would be incredibly rare for profit-oriented abuse to occur. On the other hand, for-profit companies are involved in all sorts of standard and harmless activities (e-mailing a paper to an e-mail account supplied by Google or Microsoft, storing a paper in Dropbox and sharing it with collaborators, putting a university course online via Coursera, etc.).

Most of the time, noncommercial clauses will be irrelevant: nobody has a profit incentive for any abuse, and typical authors would be extremely unlikely to take anyone to court anyway. The biggest danger I see is in allowing someone (for example, an eccentric, disgruntled author like Grothendieck) to mess with the system via lawsuits over totally harmless activities.

My feeling is that whenever the courts intervene in academic publishing, it is more likely to make things worse than better: they don’t understand academia, and their goals and purposes are not the same as ours. In particular, the worst case scenario is that they will decide to interpret a license literally and rule out usage allowed by academic customs. So I believe we should choose licenses that minimize the potential role of the courts (and reserve legal action only for the most extreme circumstances).

• CommentRowNumber10.
• CommentAuthorDmitri Pavlov
• CommentTimeDec 27th 2012

As for the TeX issues, this is a good reason to choose an archival format such as PDF/A instead of or in addition to TeX (since TeX is a horrific mess for long-term archiving).

PDF/A seems to be a terrible choice for an archival format. I recently read a rant by somebody involved with writing PDF software. As it turned out, very few documents that claimed to be in PDF/A format actually conformed to the PDF/A specification. In contrast to TeX, which simply rejects incorrect files, there is nothing that prevents publishers from producing nonconformant PDF files. In particular, there doesn’t seem to be any readily accessible software for verifying PDF/A conformance.

Also, in contrast to TeX, PDF/A is a binary opaque format. It’s much more difficult to process PDF files than TeX files. Google had been indexing TeX files for many years before it started indexing PDF files, and even now the text extracted by Google from PDF files is not always accurate, in contrast to TeX.

TeX is also much more stable than PDF: a set of TeX files written 30 years ago (!) in 1982 will produce exactly the same result now. (Of course, format files must be included. This is somewhat of a problem for LaTeX (as opposed to Plain TeX and AMS TeX); there are almost 20 different versions of LaTeX 2e, not to mention LaTeX 2.09 and the previous versions. On a typical LaTeX installation it might be difficult to compile an old LaTeX file precisely for these reasons. But projects like the arXiv can easily afford to maintain all these different versions.)

Finally, it’s much easier to produce a file in some new format from the original TeX source than from its compiled PDF version. For example, HTML5/CSS3 seem to be capable of an accurate reproduction of all TeX/DVI functionality and I bet we will soon see a new compiler from TeX to HTML5/CSS3 capable of reproducing DVI/PDF output with perfect accuracy. Converting PDF files to HTML5/CSS3 while preserving the layout is much more difficult because of the enormous (and mostly unnecessary) complexity of the PDF/A standard.

In fact, I would go as far as to claim that in a few years HTML5/CSS3 will make PDF disappear from common usage, whereas TeX will still be in use.

• CommentRowNumber11.
• CommentAuthorHenry Cohn
• CommentTimeDec 27th 2012

Yeah, I agree PDF/A has problems and I wish we had a better standard, but it is well documented and standardized, and there are an enormous number of PDF files out there, so these files should be readable long into the future. (And I’m at least thankful we have a PDF/A standard, since general PDF files are vastly more problematic for archival purposes.)

Plain TeX is also a stable and well documented format, and I expect those who are interested will be able to process it well into the future. It would be sensible to archive other file formats as well, but plain TeX could serve this purpose. However, almost no papers use plain TeX nowadays.

The frustrating aspect of LaTeX is that most mathematicians don’t understand what a terrible mess it is, that even common LaTeX packages frequently break backwards compatibility, as does LaTeX itself from time to time. The arXiv manages to store and process old LaTeX through really painstaking efforts, and this is not feasible for most organizations. (I.e., if a small journal run by volunteers tries to archive LaTeX code, I think there’s a high chance they won’t be careful enough about maintaining historical packages and LaTeX installations, and their code may not work ten or twenty years later without expert attention. It’s likely it could be fixed given enough expertise, but relying on expert intervention in the future is not a viable archiving strategy.)

For comparison, I’ve already had two of my papers become impossible to compile correctly using up to date LaTeX installations (an old one because a style file was incompatible with LaTeX 2e, and a newer one because TikZ changed). I fixed both of them, but it was annoying. I don’t know any statistics, but I’d bet this is surprisingly widespread, and that most people just don’t realize since they don’t LaTeX their old papers very often.

Sadly, I suspect the opacity of PDF contributes to its popularity (I agree it’s awful trying to extract structure from it). I know several mathematicians who refuse to put their papers on the arXiv, because they are afraid that other people will misuse the TeX source. I’d bet there are a number of others who would worry about this but aren’t really aware that the arXiv distributes source files. Of course PDF isn’t really a black box, but this impression pleases people. If it were pleasant to edit PDF files or copy/paste, I bet they’d be less popular.

• CommentRowNumber12.
• CommentAuthorNoah Snyder
• CommentTimeJan 12th 2013

Is there no good way to input LaTeX files and output plain TeX? Obviously it wouldn't work on old LaTeX files for the same reasons, but it'd still be useful in producing archival versions of current files. I'm sure there's some good reason this can't be done or else people would do it, but I was curious.

• CommentRowNumber13.
• CommentAuthorHenry Cohn
• CommentTimeJan 15th 2013

Hmm, I’m not sure. There’s a useless sense in which it could certainly be done in principle (by specifying the placement of each character individually in the plain TeX file), but that would negate any benefits from doing this. What I imagine is that any reasonable way of doing it would basically amount to including in the plain TeX file code that basically implements all the LaTeX features being used, which would be cumbersome and probably not worth the effort.

• CommentRowNumber14.
• CommentAuthorAndrew Stacey
• CommentTimeJan 22nd 2013

Regarding conversion of LaTeX to Plain TeX, the answer is “No”. See Convert from LaTeX to Plain TeX and the links therein.

The view that (La)TeX files are a good storage option is a practical myth. Until recently, there wasn’t even a way to get an old version of a particular package, so to store a document you would also have to store all the packages that generated it (see Historical, stable version archive of packages). You’d also have to store the engine: although TeX itself is fixed, what everyone uses in practice is most likely eTeX. Moreover, there is increasing use of XeTeX and LuaTeX, and these bring their own issues as they provide access to system fonts. So to archive a LaTeX document you potentially have to archive quite a lot.
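
A minimal sketch of the bookkeeping this implies, assuming a standard LaTeX 2e installation: the kernel command \listfiles makes LaTeX write the name, date, and version of every class and package it loads to the log file, which at least records what would have to be archived alongside the source. The document class and packages below are placeholders, not taken from any paper discussed here.

```latex
% Hypothetical example document; the packages shown are placeholders.
\listfiles                % placed before \documentclass so the class is listed too
\documentclass{article}
\usepackage{amsmath}
\usepackage{tikz}
\begin{document}
Some text with a formula, $a^2 + b^2 = c^2$.
\end{document}
```

After a run, the log contains a "File List" section naming each loaded file together with its release date and version. This only records the dependencies; it does not store them, and it says nothing about the engine or any system fonts used.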

I don’t know all of the gains that PDF/A brings, but one distinct disadvantage is the difficulty in producing it from LaTeX (see How to create tagged PDF?).

I’m not all that knowledgeable in these matters, but to my mind some variant of XML would make the best archive format. It has enough structure to contain semantic information, and is an open standard so it is easy to build tools to convert it to other formats (having written a couple of programs that attempt to parse TeX, I do know a bit about this part!).

• CommentRowNumber15.
• CommentAuthorDmitri Pavlov
• CommentTimeJan 22nd 2013

Plain TeX packages seldom change (and the engine itself is frozen), unlike LaTeX ones.

some variant of XML would make the best archive format

XML is being rapidly replaced with JSON and it doesn’t seem like it will survive for much longer. Its fate will probably be similar to that of SGML.

A major problem with XML is that it’s practically impossible to typeset formulas in it.

• CommentRowNumber16.
• CommentAuthorAndrew Stacey
• CommentTimeJan 22nd 2013

While that’s true about Plain TeX, the fact that almost no-one uses it makes it irrelevant for this issue. As you can’t “translate” from LaTeX to Plain TeX, the fact that everyone uses LaTeX makes it moot. And whilst it is reasonably fixed, it is not documented anywhere so writing a Plain TeX-to-something else engine would be difficult.

I have no opinion on the relative benefits of XML, JSON, SGML and their ilk. The point is to choose an open standard which is flexible enough to encode everything we’d want to encode, but it needs to be an open standard so that it genuinely is possible to reconstruct it at a later date. When I wrote XML I had in mind XHTML+MathML which seems to work fine for formulas. Maybe you don’t consider XHTML+MathML a dialect of XML.

The wider point is that the best format for authoring is not necessarily the best for storage, and the best for rendering might be another one entirely. I’m not an expert on the various possibilities for storage formats, but I am convinced that no dialect of TeX is suitable for this.

• CommentRowNumber17.
• CommentAuthorDmitri Pavlov
• CommentTimeJan 23rd 2013

it is not documented anywhere

Plain TeX is fully documented in The TeXbook, unlike LaTeX, which apparently has no complete documentation.

XHTML+MathML which seems to work fine for formulas

Nobody can type formulas in MathML anyway, so I don’t see how this proposal could potentially be relevant. If you are proposing to compile formulas to MathML from some other language (TeX?), then we would still need to archive the original files, which returns us to the same question.

• CommentRowNumber18.
• CommentAuthorAndrew Stacey
• CommentTimeJan 23rd 2013

then we would still need to archive the original files

That’s the bit I don’t agree with. Why do we need to archive the original files? Why not translate them to some format more suitable for storage? Yes, it is possible that we lose information in the translation process. But it is also possible that we gain overall since the information that we lose is potentially ambiguous. By converting to a rigid format at time of writing, the author is asked to resolve the ambiguities, and the chance of getting the right resolution is that much higher than if the conversion is done at a much later stage.

So, yes, I am proposing conversion from TeX to MathML and then storing the MathML and throwing away the TeX.

• CommentRowNumber19.
• CommentAuthordarij grinberg
• CommentTimeJan 23rd 2013
• (edited Jan 23rd 2013)

LaTeX might not be very standardized, but doesn’t its human-readability (unless very weird and badly-written packages are used) outweigh these issues in practice by far? I don’t think bit rot could realistically happen with LaTeX files in the sense that some ancient paper could not be decrypted by someone with an hour of time at his hands, barring some extremely hacky implementation of diagrams and pictures. Feel free to give me a counterexample…

• CommentRowNumber20.
• CommentAuthorAndrew Stacey
• CommentTimeJan 23rd 2013

\let~\catcode~76~A13~F1~j00~P2jdefA71F~7113jdefPALLF
PA''FwPA;;FPAZZFLaLPA//71F71iPAHHFLPAzzFenPASSFthP;AFevP
AGGFRruoPAqq71.72.F717271PAYY7172F727171PA??Fi*LmPA&&71jfi
Fjfi71PAVVFjbigskipRPWGAUU71727374 75,76Fjpar71727375Djifx
RrhC?yLRurtKFeLPFovPgaTLtReRomL;PABB71 72,73:Fjif.73.jelse
B73:jfiXF71PU71 72,73:PWs;AMM71F71diPAJJFRdriPAQQFRsreLPAI
I71Fo71dPA!!FRgiePBt'el@ lTLqdrYmu.Q.,Ke;vz vzLqpip.Q.,tz;
;Lql.IrsZ.eap,qn.i. i.eLlMaesLdRcna,;!;h htLqm.MRasZ.ilk,%
s\$;z zLqs'.ansZ.Ymi,/sx ;LYegseZRyal,@i;@ TLRlogdLrDsW,@;G
doTsW,Wk;Rri@stW aHAHHFndZPpqar.tridgeLinZpe.LtYer.W,:jbye


Actually, that is Plain TeX but the principle is the same.

• CommentRowNumber21.
• CommentAuthordarij grinberg
• CommentTimeJan 23rd 2013

I was not talking of obfuscated code, which can indeed be as sensitive to updates as one wishes in a Turing-complete language. And I do consider code that starts with redefining \ as j and } as P to be obfuscated. What I meant is that I have yet to see an unreadable arXiv source code, even among the ones from the early 1990s (and yes, I occasionally do peek into sources to understand how people drew diagrams; occasionally there are also interesting comments…). I fear that every derivative of XML will have much worse readability properties.

• CommentRowNumber22.
• CommentAuthorDmitri Pavlov
• CommentTimeJan 24th 2013

I agree with Darij. Whatever archival format we choose, it must be human readable, and MathML is very far from being human readable. Furthermore, XML and MathML can also be deliberately obfuscated, just like the TeX example above.

Despite the fact that the first MathML standard appeared in 1998, 15 years later we still lack an implementation of MathML capable of producing formulas of quality comparable to TeX. Furthermore, most browsers (except those based on Gecko) don’t even support MathML. This includes Internet Explorer, Safari, and Chromium, which is more than 70% of currently used browsers.

• CommentRowNumber23.
• CommentAuthorAndrew Stacey
• CommentTimeJan 24th 2013

Webkit now supports MathML. I agree it’s not yet perfect, but the latest versions of Chrome and Safari do now render MathML.

Here’s an extract from one of my papers if you don’t like deliberate obfuscation. This is from a paper on the arXiv.

 \item
An \emph{\doobj[\doobj]{}} consists of an \dobj, $$\abs{\doobj}$$, together with, for each
$$\oop \in \sabs{\otype}$$,
a $$\dcat$$\hyp{}morphism
$$\oop_{\doobj} \colon \abs{\doobj}^{n(\oop)} \to \abs{\doobj}$$;
these morphisms are called the \emph{operations} of the \doobj.
A morphism of \doobjs is a morphism of the underlying \dobjs which intertwines the operations.

\item
An \emph{\oalg{}} is an \soobj.

\item
We denote \docat by $$\docat$$ and \socat by $$\ocat$$.
We refer to the functor $$\docat \to \dcat$$ which assigns to an \doobj the underlying \dobj as the \emph{forgetful functor}.
We write the underlying \dobj of an \doobj[\doobj] as $$\abs{\doobj}$$.


Of course, you can work out what that says by tracing the definitions, but they aren’t all that straightforward since neither \newcommand{\doobj} nor \def\doobj appear anywhere.
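
To illustrate how a macro can be in use without any literal \newcommand{\doobj} or \def\doobj in the sources, here is a minimal sketch in which the control sequence name is assembled with \csname. The helper \makeobjmacro and the expansion texts are hypothetical and are not the definitions used in the paper quoted above.

```latex
\documentclass{article}
% Hypothetical helper: defines the macro whose name is given in #1.
% \csname #1\endcsname builds the control sequence from a string, so
% the literal text "\newcommand{\doobj}" never occurs in the source.
\newcommand{\makeobjmacro}[2]{%
  \expandafter\newcommand\csname #1\endcsname{#2}}
\makeobjmacro{doobj}{D-object}    % defines \doobj
\makeobjmacro{dobj}{object of D}  % defines \dobj
\begin{document}
The operations of a \doobj{} act on its underlying \dobj.
\end{document}
```

Searching such sources for the macro finds only the string doobj passed as an argument, which is why tracing definitions by text search can fail.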

Rather than arguing TeX vs MathML, why don’t we come up with a list of characteristics for what would make a good archival format? We might find something that we agree on!

Dmitri and Darij want “human readable”. I do think that that needs to be made more precise as there’s a danger of falling into the trap of thinking that a familiar syntax satisfies this simply by its familiarity.

What I want is an open standard format. I also want one with the flexibility to be converted into unusual final formats. One problem with TeX is that it is too tied to its output format. It is very hard to get TeX to output anything non-visual, and it makes it too easy to write one’s article purely for those output formats.

• CommentRowNumber24.
• CommentAuthordarij grinberg
• CommentTimeJan 24th 2013

Convincing: I didn’t think of packages which inject lines of text into sentences (don’t you ever get issues with grammar not fitting this way?). I think this can be figured out, nevertheless, with the hopf.sty file being part of the arXiv source code. Of course, if that style file were there without the comments, we’d have a problem, so I’m not claiming that the code “explains itself”.

Yes, the overreliance on the output is a big issue with LaTeX, particularly seeing that the output algorithm isn’t very good to begin with. But just as tex files are usually written with pdflatex in mind, so I fear maths in MathML will be mostly written for websites, completely ignoring any printability issues. I am not advocating LaTeX, I just don’t have much hope in anything XMLish solving the issue.

• CommentRowNumber25.
• CommentAuthorDmitri Pavlov
• CommentTimeJan 24th 2013

Here is my list:

1) Readability: Only text files written by humans can be archived.
2) Correctness: The program that interprets the files must unconditionally reject incorrect or noncompliant files, thus automatically serving as a verification program.
3) Durability: 70 years from now any archived file must produce exactly the same result as it does right now.

TeX, as well as METAFONT and METAPOST satisfy (1), (2), and (3) by design, provided that all macro files and fonts are stored together with the source files.
Plain TeX and Computer Modern fonts satisfy (3) by design, no need to store them.
LaTeX could satisfy (3) if one stores the LaTeX version (e.g., LaTeX 2e 2011/06/27) and all macro and font files used (rarely done in practice).
Nonstandard extensions to TeX, like pdfTeX and rapidly changing macro packages like TikZ, unless one stores them together with the source files,
which might be impractical due to the size of TikZ, probably don't satisfy (3).

PDF/A fails (1) and (2) miserably. Because of (2) and the fact that the majority of files that claim to be PDF/A seem to be noncompliant,
it probably also cannot satisfy (3).

HTML-based solutions fail (2) miserably. If one uses MathML, then (1) is also clearly violated.
The completely chaotic situation with web browsers and their support of HTML features seems to severely undermine (3).
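
For the LaTeX case in the list above, a minimal sketch of keeping the macro files together with the source, assuming only standard LaTeX 2e kernel features: the filecontents environment writes a bundled file to disk when the document is compiled, and \NeedsTeXFormat records the format release the author used. The file name paper-macros.sty and the macro \abs are hypothetical.

```latex
% Records a minimum format release; LaTeX warns if the installed
% format is older. It does not pin an exact version.
\NeedsTeXFormat{LaTeX2e}[2011/06/27]
% Hypothetical macro file bundled inside the main source file.
% It is written to disk only if no file of that name is found.
\begin{filecontents}{paper-macros.sty}
\ProvidesPackage{paper-macros}[2013/01/24 macros bundled with the paper]
\newcommand{\abs}[1]{\lvert #1\rvert}
\end{filecontents}
\documentclass{article}
\usepackage{amsmath}      % provides \lvert and \rvert
\usepackage{paper-macros}
\begin{document}
The underlying object is written $\abs{A}$.
\end{document}
```

This covers only the author's own macros. Third-party packages, fonts, and the engine would still have to be archived separately, which is the part the list describes as rarely done in practice.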