The poor man's interlinear


The poor man's interlinear

David Haslam
One of my friends recently asked:

Do you know of any program that will load two text files (plain text or Word files) and display them interlinearly?

Here's my reply:

Not off hand, but here's an easy workaround using Excel.

First create a double-spaced copy of each text file, i.e. replace every EOL by two EOLs.
e.g. Using Notepad++ (with Search Mode set to Extended), replace \r\n by \r\n\r\n for a Windows-style text file.

For the second file, also insert an extra blank line at the top, so that its text falls on even-numbered lines.

Paste the contents for each double-spaced text file into a separate Excel worksheet.

In another worksheet, use a formula in cell A1 that takes odd-numbered rows from the first sheet and even-numbered rows from the second. The formula is

=IF(ISODD(ROW()),Sheet1!A1,Sheet2!A1)

where I've not renamed the worksheets.

Copy cell A1 down to as many rows as you need.

Simples!

The whole operation could be achieved by a more complex formula without the prior need to convert the files to double-space.
I considered this at first, but came to the conclusion that simplicity is preferable.

The method I described could be extended to cope with three (or more) files by first making a triple-spaced (or wider) copy of each file and using the MOD function instead of the ISODD function.
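
As a rough sketch of the same row arithmetic outside Excel (Python is purely an assumption here, and the name interleave and the sample lines are made up for illustration): with n files, output row r, counting from zero, takes line r div n from file r mod n. That is the selection a MOD-based formula would make, without needing the multi-spaced copies.

```python
from itertools import zip_longest


def interleave(*files):
    """Interleave lists of lines: row r comes from files[r % n], line r // n.

    Shorter inputs are padded with empty strings so the rows stay aligned.
    """
    rows = []
    for group in zip_longest(*files, fillvalue=""):
        # group holds line k of every file, in file order
        rows.extend(group)
    return rows


# Made-up sample data standing in for the two text files
first_file = ["A1", "A2", "A3"]
second_file = ["B1", "B2"]
for row in interleave(first_file, second_file):
    print(row)
```

Padding with empty strings is a design choice of this sketch; it keeps line k of every file in the same group even when the files differ in length.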

David

PS. I'd resist doing it for Word files, unless they are first converted to Unicode text files.

Re: The poor man's interlinear

Sebastien Koechlin-5
On Fri, Sep 07, 2012 at 12:34:56AM -0700, David Haslam wrote:
> /One of my friends recently asked:/
>
> Do you know of any program that will load two text files (plain text or Word
> files) and display them interlinearly?
>
> /Here's my reply:/
>
> Not off hand, but here's an easy workaround using Excel.

Excel does not handle more than about 65,000 lines. Maybe LibreOffice has a
higher limit.

You can merge two files on the command line using, for example, Perl:

perl -e 'open A,$ARGV[0]; open B,$ARGV[1]; while( <A> ) { print $_; print "".<B>; }; print <B>;' file_a.txt file_b.txt > merge.txt

You can add an empty line after each pair of lines:

perl -e 'open A,$ARGV[0]; open B,$ARGV[1]; while( <A> ) { print $_; print "".<B>; print "\n"; }; print <B>;' file_a.txt file_b.txt > merge.txt


--
Seb, autocuiseur

_______________________________________________
sword-devel mailing list: [hidden email]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: The poor man's interlinear

Jonathan Marsden
In reply to this post by David Haslam
On 09/07/2012 12:34 AM, David Haslam wrote:

> /One of my friends recently asked:/
>
> Do you know of any program that will load two text files (plain text or Word
> files) and display them interlinearly?
>
> /Here's my reply:/
>
> Not off hand, but here's an easy workaround using Excel.

That's a somewhat bizarre choice of software tool for text processing,
to my mind :)

In Linux or (I think) using a port of paste for Windows, try

  paste -d '\n' file1 file2

That's all it takes, and it works for as many filenames as you can fit
on one command line.

Jonathan



Re: The poor man's interlinear

Jaak Ristioja-2
In reply to this post by David Haslam
On 07.09.2012 10:34, David Haslam wrote:


My first try would have been with GNU awk, but the "paste" utility
from GNU coreutils makes it even more simple:

   paste -d "\n" file1 file2

You can also paste together more than 2 files at once. See "man 1
paste" or http://linux.die.net/man/1/paste for details.

I really pity Windows users for the lack of such utilities on Windows. :)

Blessings,
Jaak

Re: The poor man's interlinear

David Haslam
In reply to this post by Jonathan Marsden
Thanks for all the suggestions.
I'm familiar with the paste command, having used it myself for various tasks.

It turns out the specific task he had in mind was simpler than text editing,
yet more complex than merely comparing two files line by line.

One file has a legacy font, the other has the same contents converted to Unicode.

He may even have had to reconstruct the legacy font himself during earlier work,
and design the Unicode conversion map table.

David

Re: The poor man's interlinear

Mike Hart
In reply to this post by David Haslam
David,

To make sure I understand the request, someone wanted either

A. To display each row of text from 2 separate files, alternating.

Or

B. To display an actual interlinear, matching 'related words'.

For A, I would have:
1. Loaded each file into Excel, created a new column 1 (named "order") and auto-numbered each row in it
2. Merged the 2 resulting tables into a single worksheet in Excel
3. Sorted them on column 1 in ascending order
4. Selected column 2 and output that to a file.
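
The number-merge-sort steps for A could be sketched roughly like this in Python (an assumption on my part; the function name merge_by_order and the sample lines are invented for illustration). The file index is added as a tie-breaker so that the first file's line always comes before the second file's line with the same number; in a spreadsheet this would depend on how the sort treats equal keys.

```python
def merge_by_order(file_a, file_b):
    """Tag each line with (row number, file index), pool, sort, keep the text."""
    tagged = [(n, 0, text) for n, text in enumerate(file_a, start=1)]
    tagged += [(n, 1, text) for n, text in enumerate(file_b, start=1)]
    # Sort on the "order" column first, then on which file the line came from
    tagged.sort(key=lambda item: (item[0], item[1]))
    return [text for _, _, text in tagged]


# Made-up sample data
print(merge_by_order(["a1", "a2", "a3"], ["b1", "b2"]))
```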

B is a little more tricky (I'm only working through the logical steps; you'll have to fill in the greps and specifics) and would require:
1. Creating an order number for each word (if paragraphs matter, give the EOL a placeholder word such as {{EOL}})
2. Doing a dictionary lookup (Excel function LOOKUP), assuming you can turn both files into something similar to Strong's numbers
3. Saving each file, which now has columns 1 = word order, 2 = word, 3 = dictionary number, as a flat file (removing the LOOKUP function)
4. Reopening each file
5. Sorting file 2 on the definition numbers
6. Creating a 4th column in file 1 and using LOOKUP on the definition number in file 2 to return the word from file 2
7. Creating an HTML table based on these results
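
Purely as an illustration of the logic in B, and assuming each word has already been given a dictionary number, the lookup and table steps might look like the Python sketch below. The function names, the words and the numbers are all hypothetical, and a dict gives exact matches, whereas Excel's LOOKUP works on sorted data.

```python
from html import escape


def align_words(words_a, words_b):
    """words_a, words_b: lists of (word, dictionary_number) pairs.

    Returns rows of (order, word_a, number, word_b), where word_b is the
    first word in file B carrying the same dictionary number, or "".
    """
    lookup = {}
    for word, number in words_b:
        lookup.setdefault(number, word)  # first match wins
    return [
        (order, word, number, lookup.get(number, ""))
        for order, (word, number) in enumerate(words_a, start=1)
    ]


def to_html_table(rows):
    """Render the aligned rows as a bare HTML table."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical words and dictionary numbers, for illustration only
file_a = [("in", 101), ("beginning", 102), ("unmatched", 999)]
file_b = [("principio", 102), ("en", 101)]
rows = align_words(file_a, file_b)
print(to_html_table(rows))
```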




Re: The poor man's interlinear

David Haslam
Here's a more detailed list of requirements from my friend (who's a volunteer for MissionAssist).

1. Definitely different encodings. This is where ALL comparison programs fall down. Most operate on ANSI plain text; better ones are Unicode compliant; none allows independent setting of fonts for the legacy file vs. the Unicode file.

2. There is a nastier snag which will deter those who aspire to writing a capable program: multi-byte encoding. Two or more legacy glyphs form a composed character. Its Unicode representation can involve 1 to 4 (occasionally more) glyphs to create the composed character. How on earth do you then compare legacy with Unicode? Since we are checking how the text DISPLAYS I have even investigated image matching programs to see whether we could compare screen shots - the matching tends to be too precise, and fuzzy matching is not offered.

3. I use <Compare It!> which is Unicode compliant but only allows one font per task. Besides the L&R panes it has another window which stacks line N in file A above line N in file B, which simplifies reading/comparing the text even if the displays are different. It is good enough to insert virtual blank lines to keep the display text blocks aligned. The author won't add a dual font feature - I've asked.

4. What I am looking for is a means to inter-linearise two text files, keeping the text as closely aligned as font differences allow. The text has to be more than plain text, since the projects always involve customised legacy proportional fonts.
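
On the Unicode side of point 2, a small Python sketch (my own illustration, not part of the friend's workflow) shows how one displayed character can be one code point or several, and how normalising both strings before comparing removes that particular source of false differences. The helper name same_text is made up, and this does nothing for the legacy side of the comparison.

```python
import unicodedata

# One displayed character: a with circumflex and dot below
composed = "\u1EAD"                                   # single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)   # base letter + two combining marks

print(len(composed), len(decomposed))
print(composed == decomposed)


def same_text(left, right):
    """Compare two Unicode strings after bringing both to NFC."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


print(same_text(composed, decomposed))
```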

David

Re: The poor man's interlinear

Jonathan Marsden
On 09/08/2012 02:22 PM, David Haslam wrote:

> Here's a more detailed list of requirements from my friend (who's a
> volunteer for MissionAssist).
>
> 1. Definitely different encodings. This is where ALL comparison
> programs fall down. ...

Would it be worthwhile to work around this by recoding one file into
the encoding of the other, and then comparing the two files "normally"?

In other words, in the Linux world, make the custom font/encoding
known to recode, and then use recode to transform the file.  Then a
conventional file comparator will (I think?) work fine.

Overall I suspect you might be looking at the problem from too narrow
a perspective... instead of seeking a way to display things and
compare them by eye, consider seeking a way to transform the encoding
of one file into that of the other, so that the computer then can do
the comparison for you automatically :)  Is this workable, or am I
missing something?

See man recode for info on its capabilities, and recode -l for a list of the
encodings it already knows about on your system.

Jonathan



Re: The poor man's interlinear

David Haslam
Further update ....

I referred Jonathan's reply to my friend in MissionAssist, with the following accompanying remarks.

Somehow, I think he's missed the main point.

i.e. You already have a legacy to Unicode conversion, yet because of the complexities of the original documents and how the legacy script works, the purpose of the required comparison is to discover any "corner cases" which the conversion algorithm didn't yet address.

Is this a right understanding of the task?

-----------------

My friend observes:

Your correspondent has not understood what we are trying to do. We are looking for mistakes in the algorithm.

We have already converted the encoding. It is a given that the source file is error-free as displayed. What we have to check is that our encoding conversion mapping (algorithm if you like) is accurate. Machine methods of doing this  have so far turned out to be circular in methodology. The only sure, but slow, method is sight checking. Until we find other means we would like to be able to view the text on lines stacked one above the other. Viewing Word files side-by-side at 75% zoom is hard on the eyes when the fonts in question are not standard Latin. Given that our source files always use customised TTF fonts we would need tools that operate on plain text to have import filters to handle RTF/DOC/DOCX/PDF as a minimum.

Many thanks for your interest in our problem.

-----------

David

Re: The poor man's interlinear

refdoc@gmx.net
That is bizarre.

If there is a conversion script then the way to test it is not by running 66 Bible books through it and then doing a side-by-side comparison, but to create a list of all "corner cases" and check for these.

Essentially there are x characters in the former version, which combine into y combined characters, all of which have a Unicode equivalent. Where is the problem?

If he really thinks his conversion script causes grief which he cannot find by analysing the script carefully, then he should do the following:

original in custom -> unicode -> reconvert to custom.

And then do a diff on the original and the reconvert.
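
A minimal Python sketch of that round trip, assuming a made-up single-character mapping table (real tables map sequences, and every name and value here is hypothetical). In this toy table two legacy codes map to the same Unicode character, so the reconverted text differs from the original and the diff shows where.

```python
import difflib

# Hypothetical mapping table: legacy character -> Unicode character.
# "B" and "C" deliberately collide on the same Unicode character.
LEGACY_TO_UNICODE = {"A": "\u03b1", "B": "\u03b2", "C": "\u03b2"}

# Reverse table built by inversion; on a collision the first entry wins
UNICODE_TO_LEGACY = {}
for legacy_char, unicode_char in LEGACY_TO_UNICODE.items():
    UNICODE_TO_LEGACY.setdefault(unicode_char, legacy_char)


def to_unicode(line):
    return "".join(LEGACY_TO_UNICODE.get(ch, ch) for ch in line)


def to_legacy(line):
    return "".join(UNICODE_TO_LEGACY.get(ch, ch) for ch in line)


def round_trip_diff(legacy_lines):
    """Convert to Unicode and back, then diff against the original lines."""
    restored = [to_legacy(to_unicode(line)) for line in legacy_lines]
    return list(difflib.unified_diff(
        legacy_lines, restored,
        fromfile="original", tofile="reconverted", lineterm=""))


for diff_line in round_trip_diff(["ABA", "ACA"]):
    print(diff_line)
```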

Peter



Re: The poor man's interlinear

David Haslam
In reply to this post by Sebastien Koechlin-5
Sebastien,

Excel 2007 and up can handle 1,048,576 rows!

David

Re: The poor man's interlinear

David Haslam
In reply to this post by refdoc@gmx.net
Peter,

Your underlying assumption is questionable.

As with transliteration - in general it's not always true that the conversion process can be reversed without loss of information. I've seen several examples involving such ambiguities.

David

Re: The poor man's interlinear

DM Smith-5
While the reversal is not always possible, I think Peter's observation is correct, if I understood the thread at all.

Today, the legacy encoding has a custom TTF font that displays the text perfectly. Someone had to create that font and that font maps from the text to the glyphs.

That mapping is the only mapping that needs to be handled.

Presuming that the glyphs can be identified and all the patterns that each glyph supports can be identified, all that is needed is to map those glyphs to their Unicode equivalents. That then will create a directional, transitive relationship A ===> B ===> C, such that A ===> C.

I think the real difficulty is that a non-technical solution is being sought for a technical problem. What was originally presented was: How do I create an interlinear? That wasn't the problem at all.

In Him,
        DM


Re: The poor man's interlinear

David Haslam
Having pressed the matter further with my good friend at MissionAssist, here is his response:

---

This sums up what NRSI told me when I began to look at machine checking of old vs new:

"In doing automated checking, one has to be careful not to rely on processes which give a false impression of accuracy. For example, some people have proposed converting a file to Unicode, and then converting it back to legacy, and comparing the original legacy file to the final version. But that only tells you that the conversion table is reversible, not that it is accurate. A comparison which relies on the same mapping table as was used to do the conversion will only tell you whether the rules of the mapping table were applied as written. In general, comparison of two data sets is useful only if the two data sets were created by independent paths."

---
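
To illustrate the NRSI point with a toy case in Python (all tables and names below are invented): a mapping table with two entries swapped is still perfectly reversible, so the round trip reports no problem, while a comparison against an independently produced reference does. The CORRECT table only stands in for that independent path, which in practice is the sight check, since no second table exists.

```python
# Hypothetical reference: what the legacy font really displays
CORRECT = {"A": "\u03b1", "B": "\u03b2"}

# Table under test, with a mistake: the two entries are swapped
TABLE = {"A": "\u03b2", "B": "\u03b1"}
REVERSE = {unicode_char: legacy_char for legacy_char, unicode_char in TABLE.items()}


def convert(text, table):
    """Apply a single-character mapping table to a string."""
    return "".join(table.get(ch, ch) for ch in text)


legacy = "ABBA"
converted = convert(legacy, TABLE)

# Round trip relies on the same table, so the mistake cancels out
round_trip_ok = convert(converted, REVERSE) == legacy

# Independent check compares against a result obtained another way
independent_ok = converted == convert(legacy, CORRECT)

print(round_trip_ok, independent_ok)
```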

My remarks follow:

Having met one of the programmers (during the EMDC) who works for SIL's NRSI (on implementing the Graphite Engine), I hold them in the highest regard for their technical knowledge and skills.

So yes, the interlinear arrangement originally requested does serve only one purpose: to provide additional confirmation that the visual appearance of text in the Unicode version matches that of the original with the legacy font.

Some details have been omitted in this reply.

Aside:  DM himself should recognize the truth of the statement "In general, comparison of two data sets is useful only if the two data sets were created by independent paths." That's precisely the background and underlying philosophy for the KJV2006 project.

David



Re: The poor man's interlinear

refdoc@gmx.net
While I will never wish to stop anyone from creating themselves more
work than necessary (as long as they do not take my taxes or tithes) I
remain in awe over the work created here and described as necessary, yet
being entirely unnecessary. And it prejudices me heavily against working
ever with the organisation who enquired from you.

There is an existing conversion route. The route is called legacy font.

It converts a bizarre binary into something readable.

Every custom font item has specific rules which lead to its creation.

Every custom font item has one adequate and correct Unicode
representation. If there is more than one graphical representation
(like Cyrillic and Latin 'a') then this is made irrelevant by the
language being assigned a certain area in Unicode chosen for it (Latin,
Cyrillic, Arabic, whatever).

So the rule which leads to the selection of a custom font item can,
without risk of error, duplication or indeed "corner cases", select a
single Unicode item.

There may be a small number of select and well-defined exceptions to
the above rules:

1) Some custom font items may have several conversion rules leading to
them. This is only relevant if the custom font incorporated compromises
which can now be undone.

2) Some custom font items may after all not have a Unicode equivalent.
This is unlikely, but in odd languages not impossible. Language-specific
ligatures, language-specific diacritics etc. are the likely candidates in
"normal" scripts. These can be assigned unused Unicode code points and then
offered a custom font (after all).

All these matters require careful analysis, none require wholesale text
comparisons by eye.

Peter




