Observations on the SahidicBible module

David Haslam
In my other thread, Troy kindly wrote,

So, as a side note to this thread,

The Sahidic Bible is maintained at coptot.manuscriptroom.com:

http://coptot.manuscriptroom.com/transcribing?docID=1620025&userName=PUBLISHED

and we regularly export from there and import into swordweb, which is used for their browser plugin (first link on Christian Askeland's wonderful resource list for Coptic):

https://sites.google.com/site/askelandchristian/copticlinks

We don't index the text.  They typically search with regex (and yes, they know about the {byte_count} anomaly with our regex search).

-Troy

---------------------------------

FYI. I am now in email contact with Dr Askeland, so I hope we can make progress towards a properly updated module.

The one in the CrossWire repo was last updated in 2012 and, judging by the anomalies in the conf file, was originally put together in something of a hurry.

Although I have submitted an updated conf file to the modules team, something more substantial is required.

One thing I didn't change in the .conf file was this line:

SourceType=OSIS

I expressed my doubts in my cover email, but now I have conclusive evidence that the module was made using imp2vs.

There are two verses where the IMP format reference was not processed, for whatever reason.

$$$Jer.44.6
$$$Acts.7.60

That being the case, and there being no XML fragments in the module text, that line should be

SourceType=PlainText
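
A check for leftover IMP keys of this kind could be sketched roughly as below. This is only an illustration, not the procedure actually used: it assumes the verse texts have already been extracted into a dictionary keyed by verse reference, and the function name `find_stray_imp_keys` and the sample data are hypothetical.

```python
import re

# An IMP key line starts with "$$$"; inside verse text it signals that the
# importer did not treat the line as the start of a new entry.
IMP_KEY = re.compile(r"^\$\$\$(\S.*)$", re.MULTILINE)


def find_stray_imp_keys(verses):
    """Return (verse_key, stray_reference) pairs for every '$$$' line that
    still sits inside a verse text. `verses` maps verse keys to text."""
    hits = []
    for key, text in verses.items():
        for match in IMP_KEY.finditer(text):
            hits.append((key, match.group(1).strip()))
    return hits


# Hypothetical sample: the host verse and its wording are made up.
sample = {
    "Jer.44.5": "verse text\n$$$Jer.44.6\nmore text",
    "Jer.44.7": "verse text only",
}
print(find_stray_imp_keys(sample))
```

Any hit means the module text still carries raw IMP syntax, which is consistent with a plain-text import rather than an OSIS build.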

Provided we can obtain the most recent source text, which now contains input from further contributors, it should be feasible to build a new module.

btw. Our other four Coptic modules have "Cop" as a module name prefix. This one doesn't.

Would it be sensible to rename the module as [CopSahidicBible] when it's rebuilt, and include the following line?

Obsoletes=SahidicBible


Best regards,

David

Re: Observations on the SahidicBible module

Troy A. Griffitts
The text is in constant development and each biblical book is in a
different state of completeness and quality.  An export of the work in
its most current state can always be obtained with:

http://coptot.manuscriptroom.com/community/vmr/api/transcript/export/?docID=1620025&biblicalContent=Gen-Rev

OSIS is our preferred type.  It may not have any tags yet, but it might
and probably will.  The editor at the link I sent in my previous email
produces TEI markup within each verse (which OSIS is heavily based
upon).  The hope is that any markup that might be added would be valid
OSIS.  We reserve TEI for our lex/dict modules because there are some
very specific tags in TEI focused on lexica and dictionaries which OSIS
doesn't support; we don't want to produce a Bible using TEI unless
there is a very compelling reason.  "SahidicBible" was used as the
module name in SWORDWeb so it would be easy for them to look up when they
choose their base text for transcribing Coptic.  I'm not sure how the
module got in the main repository for download.  I don't believe I put
it there, unless maybe for a test with one of our frontends.  If you
want to release it there, we certainly should follow our naming conventions.

Troy



On 05/01/2017 01:45 AM, David Haslam wrote:

> [original post quoted in full; snipped]

Re: Observations on the SahidicBible module

David Haslam
Thanks Troy,

I only got drawn into this through my curiosity when I came across a later post of yours in this mailing list in which you were asking about viewing the SahidicBible module in PocketSword. :)

Is it likely then that the "2012-10-10" module version 1.1 in our repository is merely a snapshot that somebody wanted to test with a front-end other than SwordWeb?

I've just downloaded the exported text file, which is in IMP format with (as you describe) some simple TEI markup in the verse text, even for "empty" verses (as it were).

It shouldn't be difficult to at least strip out the TEI markup where it currently has no useful equivalent in OSIS. Then we're back to a plain text source in IMP format.  

I imagine this is how the module was made in 2012. Is that correct?

On the other hand, I could actually retain the <w>...</w> elements as these are valid in OSIS as well, and merely discard the <ab ... > and </ab> items at the start and end of each verse.
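
As a rough sketch of that second option, the `<ab>` wrapper could be dropped with a regular expression while leaving the `<w>` elements untouched. It assumes the wrapper always looks like the exported samples (an `<ab ...>` start tag, an `</ab>` end tag, or a self-closing `<ab .../>` for an empty verse) and that attribute values never contain `>`. The function name `strip_ab` is hypothetical.

```python
import re

# Matches <ab ...>, </ab> and the self-closing <ab .../> form.
# The \b stops it from matching other element names that start with "ab".
AB_TAG = re.compile(r"</?ab\b[^>]*>")


def strip_ab(verse_markup):
    """Remove the <ab> wrapper from one verse, keeping everything inside."""
    return AB_TAG.sub("", verse_markup).strip()


verse = '<ab n="B07K20V33" id="1007020000.1"><w>ⲁⲩⲱ</w> <w>ⲛⲣⲱⲙⲉ</w></ab>'
print(strip_ab(verse))
```

An empty verse such as `<ab n="B07K20V34" id="1007020000.1"/>` comes out as an empty string, so it can be detected and handled separately.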

It will be interesting to find out how much has changed in the past five years.

If nothing has changed, then it may merely indicate that the SwordVersionDate in the conf file is left untouched by whatever procedure is in place whenever the module is rebuilt.

That might be of some concern.

Best regards,

David



Re: Observations on the SahidicBible module

David Haslam
Hi Troy,

I can certainly see that OSIS will be needed to make some parts of the source text more usable.

There are 201 w elements similar to this example:

<w><supplied source="na28" reason="illegible">ⲉⲧ</supplied>ⲃⲉ</w>

I'm thinking that these should be converted to use the OSIS transChange element.

<w><transChange type="x-supplied-na28">ⲉⲧ</transChange>ⲃⲉ</w>

The reason attribute could be discarded if a suitable note was added to the module .conf file.
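
The conversion proposed above could be sketched along these lines. It is a minimal illustration, assuming that `supplied` elements are never nested and that the `source` attribute is written with double quotes as in the example. The function name and the fallback type `x-supplied` (used when no `source` attribute is present) are my own assumptions, not anything agreed for the module.

```python
import re

SUPPLIED = re.compile(r"<supplied\b([^>]*)>(.*?)</supplied>", re.DOTALL)
SOURCE = re.compile(r'\bsource="([^"]*)"')


def supplied_to_transchange(text):
    """Rewrite TEI <supplied> as OSIS <transChange>, dropping `reason`."""
    def repl(match):
        attrs, content = match.group(1), match.group(2)
        source = SOURCE.search(attrs)
        # Assumed fallback when no source attribute is given.
        type_value = "x-supplied-" + source.group(1) if source else "x-supplied"
        return f'<transChange type="{type_value}">{content}</transChange>'
    return SUPPLIED.sub(repl, text)


tei = '<w><supplied source="na28" reason="illegible">ⲉⲧ</supplied>ⲃⲉ</w>'
print(supplied_to_transchange(tei))
```

Because the `reason` value is discarded, it would be worth counting the distinct reasons in the source first, so that the note in the .conf file covers all of them.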

Even so, many other anomalies make the file as it stands hard to fathom: the semantic significance of various constructions within w elements is unclear.

I'd not expect to see ordinary punctuation marks within a w element, would you?

So stuff like <w>[……</w> and <w>[ⲉ]ⲣⲉϯ…………</w> requires a lot of resolution.

There are also what appear to be errors in how a few verses are marked.
Here's an example of such.

Judges 20:33 contains Judges 20:34

$$$Judges 20:33
<ab n="B07K20V33" id="1007020000.1"><w>ⲁⲩⲱ</w> <w>ⲛⲣⲱⲙⲉ</w> <w>ⲧⲏⲣⲟⲩ</w> <w>ⲙⲡⲓⲥⲣⲁⲏⲗ</w> <w>ⲁⲩⲧⲱⲟⲩⲛⲟⲩ</w> <w>ⲉⲃⲟⲗ</w> <w>ϩⲙⲡⲓⲥⲣⲁⲏⲗⲉⲩⲙⲁ</w> <w>ⲁⲩⲙⲓϣⲉ</w> <w>ϩⲛⲃⲁⲁⲗⲑⲁⲙⲁⲣ</w> <w>ⲁⲩⲱ</w> <w>ⲡⲉⲙⲗⲁϩ</w> <w>ⲙⲡⲓⲥⲣⲁⲏⲗ</w> <w>ⲉⲧⲟ</w> <w>ⲛⲕⲣⲟϥ</w> <w>ⲁϥⲉⲓ</w> <w>ⲉⲃⲟⲗ</w> <w>ϩⲙⲡⲉϥⲙⲁ</w> <w>ϩⲛⲙⲙⲁ</w> <w>ⲛϩⲱⲧⲡ</w> <w>ⲛⲅⲁⲃⲁⲁ</w> <w>Judges</w> <w>20:34</w> <w>ⲁⲩⲉⲓ</w> <w>ⲉⲃⲟⲗ</w> <w>ⲙⲡⲉⲙⲧⲟ</w> <w>ⲉⲃⲟⲗ</w> <w>ⲛⲅⲁⲃⲁⲁ</w> <w>ⲉⲩⲛⲁⲣⲟⲩⲧⲃⲁ</w> <w>ⲛⲣⲱⲙⲉ</w> <w>ⲛⲥⲱⲧⲡ</w> <w>ⲉⲃⲟⲗ</w> <w>ϩⲙⲡⲓⲥⲣⲁⲏⲗ</w> <w>ⲧⲏⲣϥ</w> <w>ⲁⲩⲛⲟϭ</w> <w>ⲛⲙⲗⲁϩ</w> <w>ϣⲱⲡⲉ</w> <w>ⲛⲧⲟⲟⲩ</w> <w>ⲇⲉ</w> <w>ⲙⲡⲟⲩⲉⲓⲙⲉ</w> <w>ϫⲉ</w> <w>ⲧⲕⲁⲕⲓⲁ</w> <w>ⲛⲁⲉⲓ</w> <w>ⲉϩⲣⲁⲓ</w> <w>ⲉϫⲱⲟⲩ</w></ab>
$$$Judges 20:34
<ab n="B07K20V34" id="1007020000.1"/>

How much of this is made worse by the export feature isn't clear.
It might simply be due to a missing EOL, with the mistake then compounded by the export mechanism.
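
Cases like the Judges example could be located mechanically. The sketch below is only a heuristic under stated assumptions: it flags any `<w>` whose content is a bare `chapter:verse` pair, reports it together with the preceding word, and separately recognises a verse that consists solely of a self-closing `<ab .../>`. Function names are hypothetical, and multi-word book names (such as "I Samuel") would be reported with only their last word.

```python
import re

W_TEXT = re.compile(r"<w>(.*?)</w>", re.DOTALL)
CHAPTER_VERSE = re.compile(r"\d+:\d+")
EMPTY_AB = re.compile(r"<ab\b[^>]*/>")


def embedded_references(verse_markup):
    """Return 'Book C:V' strings found as consecutive <w> elements."""
    words = W_TEXT.findall(verse_markup)
    found = []
    for previous, current in zip(words, words[1:]):
        if CHAPTER_VERSE.fullmatch(current):
            found.append(f"{previous} {current}")
    return found


def is_empty_verse(verse_markup):
    """True when the verse is nothing but a self-closing <ab .../>."""
    return EMPTY_AB.fullmatch(verse_markup.strip()) is not None


fragment = ('<ab n="B07K20V33" id="1007020000.1"><w>ⲛⲅⲁⲃⲁⲁ</w> '
            '<w>Judges</w> <w>20:34</w> <w>ⲁⲩⲉⲓ</w></ab>')
print(embedded_references(fragment))
print(is_empty_verse('<ab n="B07K20V34" id="1007020000.1"/>'))
```

A verse that reports an embedded reference and is immediately followed by an empty verse with that same reference is a strong candidate for a missing line break in the source.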

It's probably best if I pursue this off-line with those concerned.

Best regards,

David

Re: Observations on the SahidicBible module

David Haslam
In reply to this post by Troy A. Griffitts
Troy wrote, "... each biblical book is in a different state of completeness and quality".

One piece of evidence for this is that these three books have text, but it does not yet contain any w elements.

I Samuel
II Samuel
Isaiah
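
A check of this sort can be sketched as follows. It assumes the export uses key lines of the form `$$$Book chapter:verse` (as in the Judges sample in this thread) and that a book counts as lacking w elements when at least one of its lines has text outside tags but none of its lines contains a `<w>` start tag. The function name and the sample data, including the placeholder text and the `n` values, are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]+>")
W_OPEN = re.compile(r"<w[\s>]")


def books_without_w(imp_text):
    """List books that have verse text but no <w> elements at all."""
    stats = {}  # book -> [has_text, has_w], in order of first appearance
    book = None
    for line in imp_text.splitlines():
        if line.startswith("$$$"):
            # The book name is everything before the final "chapter:verse".
            book = line[3:].strip().rsplit(" ", 1)[0]
            stats.setdefault(book, [False, False])
        elif book is not None:
            if TAG.sub("", line).strip():
                stats[book][0] = True
            if W_OPEN.search(line):
                stats[book][1] = True
    return [b for b, (has_text, has_w) in stats.items() if has_text and not has_w]


# Hypothetical sample in the shape of the export.
sample = "\n".join([
    "$$$I Samuel 1:1",
    '<ab n="x1">placeholder text</ab>',
    "$$$Judges 20:33",
    '<ab n="x2"><w>ⲁⲩⲱ</w></ab>',
    "$$$Judges 20:34",
    '<ab n="x3"/>',
])
print(books_without_w(sample))
```

Books whose verses are all empty are deliberately not reported, since they have no text to tokenise.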

Best regards,

David