EMTV text source URL is now unrelated

EMTV text source URL is now unrelated

David Haslam
The file EMTV.conf includes
TextSource=http://www.emtvonline.com

However, this URL no longer relates to the EMTV translation.  It appears to be a site about the Stock Market.

A Google search reveals that the relevant URL should now be http://www.majoritytext.com/
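
For anyone who wants to script the correction, here is a minimal Python sketch. It assumes the .conf file is plain key=value text; the function name and the sample fragment are hypothetical, and only the TextSource line itself comes from the report above.

```python
def update_text_source(conf_text: str, new_url: str) -> str:
    """Return conf_text with the value of its TextSource entry replaced."""
    updated = []
    for line in conf_text.splitlines():
        key, sep, _old_value = line.partition("=")
        # Only touch the line whose key is exactly TextSource.
        if sep and key.strip() == "TextSource":
            line = f"TextSource={new_url}"
        updated.append(line)
    return "\n".join(updated) + "\n"


# Hypothetical two-line fragment standing in for the real EMTV.conf.
sample = "[EMTV]\nTextSource=http://www.emtvonline.com\n"
print(update_text_source(sample, "http://www.majoritytext.com/"))
```

All other lines pass through unchanged, so the rest of the .conf is left as it was.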

David

Re: EMTV text source URL is now unrelated

troypulk
I've updated the .conf file and fixed all the other errors too.

I'm also updating the whole EMTV module and as of right now I'm 70% done.

Troy P.



_______________________________________________
sword-devel mailing list: [hidden email]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: EMTV text source URL is now unrelated

David Haslam
The EMTV module had errors in red letter markup, so I assume that Troy Pulk is attending to those too.

See http://www.crosswire.org/bugs/browse/MOD-173

"Moreover, the red letter markup appears to have been done by our own module creator, even though the online edition now at http://www.majoritytext.com/ does not display any text in red. So should our EMTV module really have red letters?"

David


Re: EMTV text source URL is now unrelated

Chris Little-2
In reply to this post by troypulk
If you are updating the EMTV, did you get a new copy of the Word docs
from Paul Esposito? We didn't create the existing module from HTML files
on a website, which are slightly degraded from the original text of the
translation.

It would be great if we could also get an updated text of his OT
translation, so that we can distribute a complete LOGOS Bible. And, we
will also need permission to distribute the updated text (which I assume
you have if you got the text directly from the source).

--Chris





Re: EMTV text source URL is now unrelated

troypulk
Yes,

I've fixed the RED letter issue as well.

I think that sword has the ability to display red letters so we should use it, even when the source doesn’t have it.

Chris,

I e-mailed and talked to Paul directly and received permission to update and resubmit back to sword for redistribution.

Also, I didn't even think of asking Paul for a Word doc. Thinking about it now, I did it the long way: I downloaded the online text from majoritytext.com, saved it to a text file, and converted that to osis.xml. Then I purchased the 2011 EMTV version in PDF format from Paul Esposito's LULU store and used Meld to update the text, because the Web version is old compared to the 2011 version. I guess I wanted to support Paul, so I bought a copy.

So, I will e-mail Paul again and ask him for the LOGOS Bible in the original format that he was using. While I'm at it, I'll also ask him for the 2011 EMTV NT version so you can have it as a backup. If need be I'll buy another copy; they're less than $25.

He told me that he will not be updating the 2011 EMTV for a long time.

What format would you need for permission? Will my e-mail work or does it need to be some kind of official document?

Troy P.

Re: EMTV text source URL is now unrelated

Greg Hellings
On Tue, Oct 11, 2011 at 12:05 PM, troypulk <[hidden email]> wrote:
> I think that sword has the ability to display red letters so we should use
> it, even when the source doesn’t have it.

I strongly disagree.  Absence of red letters in a source is usually a
choice - even a theological statement - of the person who prepares the
text. When I have done translations I have specifically omitted red
text as a pointed theological choice and would be very upset if
someone represented my work with it.

--Greg


Re: EMTV text source URL is now unrelated

troypulk
>I strongly disagree.

Okay,

The old EMTV of 2004, the one I'm updating, already had the RED letters; that's why I continued to use them.

Also, I can e-mail Paul and ask what he would like.

Troy P.

Re: EMTV text source URL is now unrelated

Chris Little-2
In reply to this post by troypulk


On 10/11/11 10:05 AM, troypulk wrote:
> Yes,
>
> I've fixed the RED letter issue as well.
>
> I think that sword has the ability to display red letters so we should use
> it, even when the source doesn’t have it.

The red-letter text had to have come from somewhere. I had assumed I was
the last to convert the EMTV text, using Word docs Paul sent me. I
certainly know that I didn't hand-encode red-letter text, but I also
can't find any version of Paul's site that included red-letter text.

I'm not clear on what problems actually exist with the red letters
(assuming they are Paul's, and not from some intermediary source that
added them). The problem David points out in Mark 1:11 is an instance of
words of the Spirit being marked red. That's not necessarily an error at
all. We shouldn't get caught up on the name that CrossWire has given to
its filter and we shouldn't be distracted by the most common use of red
lettering in Bibles being used to mark words of Christ. There are Bibles
that mark the text of the OT in red whenever God speaks; it may be that
whoever marked the text of Mark 1:11 red did so intentionally, and has
marked other instances of the Spirit's speech in red as well. (I don't
know for sure; I'm simply offering a possibility.)

Either way, I would concur with Greg and David in suggesting that the
red lettering be removed, IF it didn't come from Paul Esposito.

> Chris,
>
> I e-mailed and talked to Paul directly and received permission to update and
> resubmit back to sword for redistribution.

Awesome! Thanks!

> Also, I didn't even think of asking Paul for a word doc, what I did was the
> long way as I think about it, I downloaded and saved the online text at
> majoritytext.com to a text file and from there I converted it to osis.xml
> then I purchased the 2011 EMTV version from Paul Esposito's LULU store in a
> PDF format and then used Meld to update the text because the Web version is
> old compared to the 2011 version. I guess that I wanted to support Paul so I
> bought a copy.
>
> So, I will e-mail Paul again and ask him for the LOGOS Bible in the original
> format that he was using, Also while I'm at it I'll ask him for the 2011
> EMTV NT version so you can have it as a backup, if need be I'll buy another
> copy, they're less than $25

Supporting him is one thing. Making extra work for yourself, entirely
another. :) Converting from PDF or trying to meld a PDF into a text
basis will always be more work than the worst Word doc conversion.

I hope you spotted that the HTML version lacks italicization entirely. I
could see that being lost, depending on what you exported from the PDF.

> He told me that he will not be updating the 2011 EMTV for a long time.
>
> What format would you need for permission? Will my e-mail work or does it
> need to be some kind of official document?

I would send Troy (Griffitts) a copy of the permission email. That
should suffice.

--Chris



Re: EMTV text source URL is now unrelated

troypulk
Hello,

I received a reply back from Paul today and he said that the RED Letters were okay to use.

All I know is that the 2004 EMTV module at SWORD right now has red lettering, so I continued to use it. But I have removed the RED letters from Mark 1:11 because it was not Jesus speaking.

He also said that he will send me the updated 2011 Logos and EMTV Bibles tomorrow as Word docs. Chris, do you want me to e-mail them to you or to someone else?

>Supporting him is one thing. Making extra work for yourself, entirely
>another. :) Converting from PDF or trying to meld a PDF into a text
>basis will always be more work than the worst Word doc conversion.
>
>I hope you spotted that the HTML version lacks italicization entirely. I
>could see that being lost, depending on what you exported from the PDF.

Yes, I realize that now but I'm 70% done and it's too late to turn back ;-)

Yeah, I saw right away that the italics were gone, so I have been adding them manually. Paul has also changed the text so many times that the 2011 version differs from the 2010 one, and the italics differ too.

Actually, I converted the HTML from majoritytext.com straight to text and then to OSIS, and I converted the 2011 PDF to text. I compared these two text files in Meld to find the differences, of which there were very many. Using Meld as a guide, I'm updating the OSIS EMTV XML. I tried to convert the PDF directly to OSIS, but from what I found out that was not possible.
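
For reference, the same kind of comparison can be scripted. This is only a rough sketch using Python's standard difflib module, not what Meld does internally; the function name and the sample strings are placeholders, not text from either edition.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """List lines that differ, prefixed '- ' (old only) or '+ ' (new only)."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Placeholder strings standing in for the web text and the PDF text.
web_text = "alpha\nbeta\ngamma"
pdf_text = "alpha\nbeta changed\ngamma"
for line in changed_lines(web_text, pdf_text):
    print(line)
```

A list like this only shows where the two texts differ; deciding which reading goes into the OSIS file is still a manual step.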

I know I'm probably doing it the hard way but it works for me and I'm almost done.

Okay, I'll get the permission e-mails together and send them to Troy G.

Troy P.

Re: EMTV text source URL is now unrelated

David Haslam
If Troy started with the HTML as the files to preprocess, what was the main difficulty that prevented tagging the words in italics, such that in the OSIS XML files these would be marked as transChange elements?

If the HTML does contain the italics, then the conversion of tags should be amenable to scripting.
The main thing to be careful about is remembering that attributes are not stacked, so italics within words of Jesus always require special care.

Another problem example is that Mark 2:11 had no tag to mark the end of red letters, so the red letter attribute "leaks" to the rest of the chapter. There may be other instances like this.
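
A check for this kind of leak could be scripted. The sketch below assumes the red letters are encoded as OSIS milestone pairs (a q element with sID to start and a q element with a matching eID to end); whether the existing module uses that form is an assumption, and the function name and sample markup are hypothetical.

```python
import re

# Milestone-style quote markers: <q ... sID="x"/> opens, <q ... eID="x"/> closes.
START = re.compile(r'<q\b[^>]*\bsID="([^"]+)"[^>]*/>')
END = re.compile(r'<q\b[^>]*\beID="([^"]+)"[^>]*/>')


def unclosed_quotes(osis_text: str) -> list[str]:
    """Return the sID of every quote start that has no matching eID."""
    ended = set(END.findall(osis_text))
    return [sid for sid in START.findall(osis_text) if sid not in ended]


# Hypothetical fragment: q1 is closed properly, q2 is never closed.
sample = ('<q who="Jesus" sID="q1"/>text<q eID="q1"/> more '
          '<q who="Jesus" sID="q2"/>text that leaks')
print(unclosed_quotes(sample))
```

Any sID it reports marks a place where the red lettering would run on until the next end marker, or to the end of the chapter.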

David

Re: EMTV text source URL is now unrelated

troypulk
As far as I know, there is no tool that converts an HTML or .doc file to an OSIS Bible.

Converting the HTML to a .txt file caused the italics to disappear, but I'm putting them all back in.

As I was fixing the RED letters I noticed the lack of tags as well, but these are all fixed now.

Troy P.



Re: EMTV text source URL is now unrelated

David Haslam
Hi Troy,

Yes - you're probably right about lack of a readily available tool for direct conversion.

Had I been tackling the task, I might have considered these steps:

1. Open each HTML file using MS Word, save each file as RTF.
2. Open each RTF file using WordPad, save again as RTF (smaller and simpler file structure).
3. Create & run a script to process the RTF tags for italics attribute and for red font colour.
4. Open the processed RTF files using WordPad, save as Unicode text  (encoded as UTF-16 LE).
5. Use a suitable editor to open the Unicode text files and change encoding to UTF-8 (without BOM).

After step 5 you'd have something similar to where you began converting plain text to OSIS, but with some ingenuity at step 3, you'd also have some elementary markup for italics and red letters that survives the complete loss of formatting attributes at step 4.

During my Go Bible activities, I've used this approach more times than I can recall.

The steepest part of the learning curve is getting used to the format of RTF files when viewed by an ordinary text editor.
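
To give a flavour of step 3, here is a minimal Python sketch for the italics case only. In RTF, the control word \i switches italics on and \i0 switches them off; the `<i>` and `</i>` placeholder markers and the function name are arbitrary choices for illustration. It assumes the simplified RTF that WordPad writes and does not handle escaped backslashes or italics set through style definitions.

```python
import re

# \i0 turns italics off; \i (not followed by more letters or digits) turns it on.
# One trailing space is part of the control word delimiter, so it is consumed.
ITALIC_OFF = re.compile(r"\\i0(?![a-z0-9]) ?")
ITALIC_ON = re.compile(r"\\i(?![a-z0-9]) ?")


def mark_italics(rtf: str) -> str:
    """Replace RTF italic control words with plain-text placeholder markers."""
    rtf = ITALIC_OFF.sub("</i>", rtf)
    return ITALIC_ON.sub("<i>", rtf)


# Placeholder RTF run, not text from any edition.
print(mark_italics(r"plain \i slanted\i0  plain again"))
```

Red text would need a similar pass over the colour control words (\cf followed by an index into the file's colour table), which is left out here.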

After step 5, it's often simpler to do the next conversion to USFM, and then use usfm2osis.pl

Best regards,
David



Re: EMTV text source URL is now unrelated

Greg Hellings
On Wed, Oct 12, 2011 at 2:18 PM, David Haslam <[hidden email]> wrote:

> Hi Troy,
>
> Yes - you're probably right about lack of a readily available tool for
> direct conversion.
>
> Had I been tackling the task, I might have considered these steps:
>
> 1. Open each HTML file using MS Word, save each file as RTF.
> 2. Open each RTF file using WordPad, save again as RTF (smaller and simpler
> file structure).
> 3. Create & run a script to process the RTF tags for italics attribute and
> for red font colour.
> 4. Open the processed RTF files using WordPad, save as Unicode text
> (encoded as UTF-16 LE).
> 5. Use a suitable editor to open the Unicode text files and change encoding
> to UTF-8 (without BOM).

This seems far more complicated than it needs to be, and filtering HTML
through MS Word is probably a terrible idea.  We frequently talk about
format-shifting and the information loss that results from it.
Every programming language a person is likely to know has a library
for directly parsing HTML in some fashion. If you have any knowledge
of scripting and coding, it is probably a much better idea to leverage
one of those and go directly from HTML to OSIS.  I have done this
at least twice now, and with only a small amount of work you can adapt
a script that will process any source text from a given format source.
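
As a rough illustration of that direct route, here is a minimal sketch using Python's standard html.parser module. It assumes the source marks italics with `<i>` or `<em>` and maps them to OSIS transChange elements with type="added", as suggested earlier in the thread; the class and function names are hypothetical, and everything a real conversion needs beyond italics (verse and chapter structure, red letters) is left out.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class ItalicsToOsis(HTMLParser):
    """Keep the text, turn italics into transChange, drop all other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.parts.append('<transChange type="added">')

    def handle_endtag(self, tag):
        if tag in ("i", "em"):
            self.parts.append("</transChange>")

    def handle_data(self, data):
        # Re-escape &, < and > so the output stays valid XML.
        self.parts.append(escape(data))


def html_to_osis_fragment(html: str) -> str:
    parser = ItalicsToOsis()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Placeholder markup, not text from the actual site.
print(html_to_osis_fragment("<p>plain <i>added</i> plain &amp; more</p>"))
```

Because the parser works on the HTML itself, nothing is lost to an intermediate format along the way.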

With Wycliffe we have two source formats which are proprietary SGML
formats akin to HTML. We wrote parsing scripts using well established
SGML and XML formatting tools and are able to leverage this for
automated processing of around 800 different source texts. Moreover
most scripting languages have a simple mechanism that will do the
encoding shifting as well.  A single line in the script is sufficient
in Python to convert from any given source encoding into UTF-8.
Assume that the variable 'text' contains the source in encoding 'enc'.
Just execute
text.decode(enc).encode('utf-8')
and you're done. The SWORD library has similar functionality in SWBuf,
and I'm fairly sure Perl has similar abilities.
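
Note that the one-liner above is the Python 2 form, where 'text' is a byte string. In Python 3 a str has no decode method, so the equivalent starts from bytes. A minimal sketch, with a hypothetical function name:

```python
def to_utf8(raw: bytes, enc: str) -> bytes:
    """Re-encode bytes from the source encoding 'enc' to UTF-8 (no BOM)."""
    return raw.decode(enc).encode("utf-8")


# Latin-1 input: the single byte 0xE9 becomes the two-byte UTF-8 sequence.
print(to_utf8("\u00e9".encode("latin-1"), "latin-1"))

# UTF-16 LE input with a BOM: the "utf-16" codec reads and strips the BOM.
print(to_utf8(b"\xff\xfe" + "abc".encode("utf-16-le"), "utf-16"))
```

The second call covers the UTF-16 LE to UTF-8 (without BOM) case, since the plain "utf-8" codec never writes a BOM.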

All in all, you're much better off creating a script that goes straight
from the source markup (HTML in this case) into OSIS. Yes, you'd
need to create a new script for each source, as each one will utilize
different HTML constructs, but a single script could be used to - for
instance - lift all the translations on Biblegateway into a person's
local repository. A single script could run through his website and
scrape it and dump it into an OSIS text with little effort. The markup
format is simple and readily handled by many HTML loading/parsing
libraries.

--Greg
