Russian invades TeX

After the introduction of the new X2(T2) encoding for Cyrillic, the russification of TeX and LaTeX has become very straightforward. In fact, X2(T2) is a very TeX-oriented way of supporting Cyrillic. Notice that I am not writing "Russian" here, because X2(T2) is a UNIVERSAL encoding: it supports virtually every language that is based on the Cyrillic alphabet (well over 100 of them). It has many more glyphs than the Russian language needs. The only time the word "Russian" comes up is when you have to deal with hyphenation. If you'd like to learn more about X2(T2), you can find more information at the X2(T2) home page.

A few words about the terminology. There are basically two encodings, T2 and X2, and they differ as follows. In X2, Cyrillic characters occupy all 256 slots, whereas in T2 they are located only in the upper part. This naturally leads to several T2's: T2A, T2B and T2C. The reason is that the LaTeX 3 team defines a T* encoding as one having a Latin character set in the lower part. To satisfy this requirement, the "original" T2 was split into several sets ("A", "B" and "C"), because the upper part alone cannot hold all the Cyrillic glyphs at once, and the full 256-character encoding was renamed X2. According to Alexander Berdnikov (berd@ianin.spb.su), the encoding was presented at the EuroTeX-98 meeting. The T2* encodings were not presented there and are generally known only to the participants of the CyrTeX-T2 mailing list. The T2* encodings will be reported to the public at TUG-98 (according to Alexander).

Work on the TS2 encoding has started. TS2 will be the Russian 'text companion' encoding. As of now, I do not know much about it. There are also some drafts of X3 (T3?), which will represent characters from the old Slavic languages. In the meantime, there is an old LaTeX 2.09 package called SlavTeX that implements some of the desired functionality.

Now I am going to focus on the Russian language. If anyone wants to support any other Cyrillic-based language, please contact CyrTUG. Its home page is at http://www.cemi.rssi.ru/cyrtug/. Notice that Ukrainian support is already available. So, you need three things:

LH fonts with X2(T2) support, produced by Olga Lapko at "Mir" publishers (olga@mir.msk.su); X2(T2) support for LaTeX2e, by Werner Lemberg (a7971428@unet.univie.ac.at) and Vladimir Volovich (vvv@vvv.vsu.ru); and NEW hyphenation patterns, by Andrey Slepuhin (pooh@msu.ru). The old patterns made by D. Vulis are now obsolete. Andrey's patterns are "strong" patterns that were checked against texts from the Central Patriarchate of the Russian Orthodox Church (monks typeset in LaTeX these days!), and they gave very good results.

There has been some discussion of Ukrainian hyphenation patterns. A word file is even available on CTAN, but according to some folks from the CyrTeX-T2 mailing list, this file is basically wrong. This work is yet to be done.

LH fonts can be configured to use encodings other than X2(T2), but I personally strongly advise against it. The fonts should be in X2(T2) because it is a standard. Why is X2 "more standard"? Well, here is the background. TeX is very frequently used for scientific purposes, which involve quite a few Latin letters. If you use pure X2, you always have to switch to, say, T1 whenever you need to write a Latin word. Some people consider this very tedious. So did I, once. But there is a big "but" here. If Latin and Cyrillic letters are combined in one table, the kerning in Cyrillic gets broken. This might significantly degrade the quality of some documents.
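
To make the cost of pure X2 concrete, here is a sketch of what such a switch looks like at the font-encoding level. The macro name \lat is hypothetical (it is not part of the X2(T2) support), and the sketch assumes that both encodings were loaded beforehand with something like \usepackage[T1,X2]{fontenc}.

    % \lat is a made-up helper: typeset its argument in T1, then
    % fall back to the surrounding encoding when the group closes
    \newcommand{\lat}[1]{{\fontencoding{T1}\selectfont #1}}
    % usage inside running Cyrillic text:
    %   ... \lat{Fourier} ...

Every Latin word needs its own \lat{...}, which is exactly the tedium described above.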

There was a big discussion about this on the CyrTeX-T2 mailing list. The authors of X2 argued that one should always switch languages to get correct kerning, and that users should not be allowed to mix different alphabets freely. But user demand was so overwhelming that the authors decided to compromise. They released the T2* encodings to avoid the chaotic creation of such things by users themselves, but they still encourage people to use the proper way of switching languages and alphabets.

Your input encoding can be anything: koi8-r, koi8-u, cp1251, cp855, cp866, maccyr, macukr. LaTeX provides all this flexibility through the fontenc and inputenc packages, which are part of the base distribution and should always be available to you. If they aren't, you have a faulty LaTeX distribution.

There is nothing really unusual about the installation. You copy the METAFONT sources to where METAFONT can find them and add this piece to your special.map file, so that METAFONT will put the .pk files into the appropriate directories. You also have to unpack the macro support and, following the instructions inside, run the install file through LaTeX; then put the resulting files in a place where TeX can find them.

Hyphenation may require a few more tricks, because there are several options to choose from. The most correct way is to use Babel with its language.dat file mechanism. As of v3.6h, Babel's support for Russian in X2(T2) is broken. To make it work, you should get a replacement from the X2(T2) support macro distribution. There is a rusbabel/ subdirectory there. Replace the Russian support in the original Babel distribution with those files. You might want to recreate the Babel styles from the ground up, or generate the Russian support separately and copy the .fd, .def, .sty and other appropriate files into a directory with a working Babel.

Then you need to edit Babel's language.dat file (adding the Russian hyphenation patterns) and regenerate the format by running something like

              inittex latex.ltx '\dump'

You should use the patterns in X2(T2) when generating the format.
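
As an illustration, the relevant part of language.dat might look like the sketch below. Each line names a language and the file holding its patterns. The file name rupatx2.tex is a placeholder of mine, not the real name; use the name of the X2(T2) pattern file you actually installed.

    % language.dat (sketch; the Russian file name is an assumption)
    english   hyphen.tex
    russian   rupatx2.tex

The order matters in that the first language listed gets \language0.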

After that, you can use your new Russian support with something like:

              \usepackage[russian]{babel}
              \usepackage[koi8-r]{inputenc}

There are a few examples in the X2(T2) support macro distribution.
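
For orientation, a complete minimal file built around those two lines might look like the following sketch. It assumes the fixed Russian support for Babel is installed and that the format was rebuilt with the X2(T2) patterns; the body is left as a placeholder so that it does not depend on any particular input encoding here.

    \documentclass{article}
    \usepackage[russian]{babel}
    \usepackage[koi8-r]{inputenc}  % must match the encoding your editor saves in
    \begin{document}
    % Russian text typed in koi8-r goes here
    \end{document}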

The second way to deal with hyphenation applies only when you have a bilingual environment. You can do this with or without Babel. I have never done it with Babel, but I have done it without Babel's support. The idea is to put both English and Russian patterns into the same \language0, which works because the two languages occupy different parts of the table. You can edit hyphen.tex directly, simply \input'ting the Russian hyphenation patterns into it. Under some circumstances this approach can be more convenient.
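
As a rough sketch of that edit, assuming the Russian patterns live in a file I will call rupatx2.tex (a made-up name; use the real one), the end of hyphen.tex would gain one line:

    % ... the original English \patterns{...} and \hyphenation{...}
    %     stay exactly as they are ...
    % Added: read the Russian patterns into the same \language (0)
    \input rupatx2.tex

The format then has to be generated again for the change to take effect.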

Now... This second way is directly related to situations where one needs direct access to the Latin letters. What does that mean? Well, if you are working with Babel, you switch languages (and alphabets, and hyphenation) with the commands defined in Babel. This is most certainly a good thing to do: it is the most correct and academic way of switching between languages. Many people are strong proponents of it, and you must use it if you are using Russian (or English) text for citation purposes only. However, as it turns out, these people use Latin as their basic alphabet, so they have no problem writing the Latin notation that is so frequently used in a scientific environment. Besides, scientific notation does not usually require hyphenation, and if the amount of notation becomes large, always switching languages becomes rather tedious. You might define macros for it, but again, when there are a lot of them to use, it becomes very tedious.
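
Such a macro could look like the following sketch. The name \eng is my own choice, not something Babel or the X2(T2) support defines; \selectlanguage is the standard Babel switch.

    % preamble: the last language listed becomes the default one
    \usepackage[english,russian]{babel}
    % \eng is a made-up helper: switch to English for its argument only
    \newcommand{\eng}[1]{{\selectlanguage{english}#1}}
    % usage in the body:
    %   ... Russian text ... \eng{Fourier transform} ... Russian text ...

Even with a short name like this, wrapping every piece of Latin notation by hand adds up quickly in a long paper.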

You'd want to write freely in Russian or English. This can be done in the bilingual environment. There are two options: with and without Babel. In both cases you must install ruseng.sty, written by V. Volovich (vvv@vvv.vsu.ru) and found in etc/ in the T2 support macro distribution. If you do not want to use Babel, just do this:

     \usepackage[T2,T1]{fontenc}   %% could be OT1 as well but T1 is nicer
     \usepackage[koi8-r]{inputenc} %% or whatever input encoding is
     \usepackage{ruseng}

If you want to use Babel, go and edit russian.ldf as Vladimir Volovich suggests. Find


    \usepackage[X2]{fontenc} % do not force the switch of \encodingdefault
    %\input{x2enc.def}
    \def\cyrillicencoding{X2}

and change to


    %\usepackage[X2]{fontenc} % do not force the switch of \encodingdefault
    \input{x2enc.def}
    \def\cyrillicencoding{X2}

Then, you're in business and can do:

    \usepackage[russian,english]{babel}
    \usepackage[koi8-r]{inputenc}   %% or whatever input encoding is
    \usepackage{ruseng}

Babel will yell at you that it could not switch to the Russian hyphenation patterns, simply because they are now in the upper part of the character table, together with English. Ignore it, and happy TeXing!

Which of the two approaches is "better" is hard to say. The first is the really correct and academic approach, which can sometimes be impractical; the second can be convenient, but it only works in a bilingual environment, and you lose kerning when typesetting Russian. I myself am drifting towards the first one, although I used the second approach for a very long time.