XML attribute delimiters in OSIS files?

David Haslam
We describe and normally expect that all XML attributes are to be delimited by "quotation marks".

Am I correct in thinking that XML attributes delimited by 'apostrophes' would not be parsed correctly by osis2mod?  i.e. Such attributes would be ignored rather than processed.

Background: I just observed that some of the OSIS files hosted by Myanmar Bibles have attributes delimited by apostrophes.

These files were "converted from TEX into OSIS by bibleTec2osis.pl " - so I thereby conclude that the Perl script that was used by the Society contains a significant software bug.

David

Re: XML attribute delimiters in OSIS files?

Chris Little-2
On 10/15/2011 4:36 AM, David Haslam wrote:
> We describe and normally expect that all XML attributes are to be delimited
> by "quotation marks".
>
> Am I correct in thinking that XML attributes delimited by 'apostrophes'
> would not be parsed correctly by osis2mod?  i.e. Such attributes would be
> ignored rather than processed.

I would guess that our converters only look for double quoted attribute
values. However, using single quotes (apostrophes) is perfectly valid
for XML. I'm not sure whether mixed use of single & double quotes in a
single document is valid or not. It might fall in the category of valid,
but poor style.

> Background: I just observed that some of the OSIS files hosted by *Myanmar
> Bibles* have attributes delimited by apostrophes.
>
> These files were "converted from TEX into OSIS by bibleTec2osis.pl " - so I
> thereby conclude that the Perl script that was used by the Society contains
> a significant software bug.

I would not call that a bug on their part. You could say that our
importers have bugs, but I would recommend conforming the import
document to our expectations rather than meddling with osis2mod. It's a
minor issue, and trivial to fix in the source docs. One caveat: use of
single quotes to indicate attribute values permits use of double quotes
within the values, so those double quotes need to be escaped before the
value delimiters are changed to double quotes.
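
The caveat above (escape embedded double quotes before switching delimiters) can be sketched roughly as follows. This is only an illustration in Python, not part of osis2mod or any SWORD tool; the names `TAG`, `ATTR` and `requote` are made up for this example, and it assumes no literal ">" occurs inside an attribute value. Comments and CDATA sections get no special handling.

```python
import re
import xml.etree.ElementTree as ET

# Start or empty-element tags; comments, declarations, PIs and end tags are skipped.
# Assumption: no literal ">" occurs inside an attribute value.
TAG = re.compile(r"<[^<>!?/][^<>]*>")
# One attribute, delimited either by quotation marks or by apostrophes.
ATTR = re.compile(r"""([\w:.-]+)(\s*=\s*)(?:"([^"]*)"|'([^']*)')""")

def _requote_attr(match):
    name, equals, double_quoted, single_quoted = match.groups()
    if single_quoted is None:
        return match.group(0)  # already delimited by quotation marks
    # Escape embedded quotation marks first, then switch the delimiters.
    return '%s%s"%s"' % (name, equals, single_quoted.replace('"', "&quot;"))

def requote(xml_text):
    """Rewrite apostrophe-delimited attribute values to use quotation marks."""
    return TAG.sub(lambda tag: ATTR.sub(_requote_attr, tag.group(0)), xml_text)

sample = "<title type='nested \"quotation\" ' n=\"it's\"/>"
fixed = requote(sample)
print(fixed)
# Both forms parse to the same attribute values.
assert ET.fromstring(fixed).attrib == ET.fromstring(sample).attrib
```

Scanning each tag attribute by attribute, rather than replacing every apostrophe pair, keeps an apostrophe inside a double-quoted value (such as n="it's") from being mistaken for a delimiter.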

--Chris



_______________________________________________
sword-devel mailing list: [hidden email]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: XML attribute delimiters in OSIS files?

David Haslam
Thanks Chris,

You have confirmed what further meditation on my part had led me to conclude, and - yes - the point about quotes within quotes had crossed my mind too!

I think I will leave well alone, unless I discover anything that seems to fall short of our expectations.

David

Re: XML attribute delimiters in OSIS files?

David Haslam
In reply to this post by Chris Little-2
I have discovered today that osis2mod output is incorrect for any text in which the delimiters used for the sID milestone attribute are not the same as the delimiters for the eID attribute in the same verse [or verse range].

This phenomenon was encountered in one of the Myanmar Bibles translations.

btw. The XML files for most of their hosted translations are badly encoded on many other aspects of OSIS, not just XML attribute delimiters.

As source text, most of these files are unsuitable for making a module that we could release. Perhaps the one exception is the Judson translation.

David


Re: XML attribute delimiters in OSIS files?

Greg Hellings
On Tue, Oct 25, 2011 at 2:14 PM, David Haslam <[hidden email]> wrote:
> I have discovered today that osis2mod output is incorrect for any text in
> which the delimiters used for the *sID* milestone attribute are not the same
> as the delimiters for the *eID* attribute in the same verse [or verse
> range].

If the sID and eID do not match (but should) then this is an error in
the XML document.  It's likely that osis2mod is handling them
correctly, it just looks like it's incorrect because the OSIS file is,
itself, wrong.  This is the type of "incorrect" that is beyond what an
XML Schema validation step is capable of detecting.

--Greg


Re: XML attribute delimiters in OSIS files?

DM Smith-5
In reply to this post by David Haslam
I'm not clear what you mean by delimiters? Can you provide an example of
the problem?

Thanks.

In Him,
     DM


Re: XML attribute delimiters in OSIS files?

Chris Little-2
Do you mean they are using single-quotes for one and double-quotes for
the other? That's certainly bad style, at the least. I'm not sure
whether it is well-formed XML. Did you happen to try validating?

An example, as DM requests, would help in diagnosis. Also, some
elaboration on "output is incorrect" would aid.

I have a feeling, though, that our advice would be something along the
lines of "fix the XML" in any case, or "run it through a regularizing
pre-processor" of some sort.

--Chris


Re: XML attribute delimiters in OSIS files?

Sebastien Koechlin-5
On Tue, Oct 25, 2011 at 03:51:03PM -0700, Chris Little wrote:
> Do you mean they are using single-quotes for one and double-quotes
> for the other? That's certainly bad style, at the least. I'm not
> sure whether it is well-formed XML. Did you happen to try
> validating?

Using single and double quotes in the same element is valid, and such files
are well-formed XML.  There's a test case at the end of this mail.

You can declare it as not valid for osis2mod, but it's one more pitfall for
OSIS writers and one more step away from XML. Is it so difficult to correct
this bug?

% cat test.xml  
<?xml version="1.0"?>

<a one='1' two="2">
        <b type="start">
                <c/>
                <c attr='single&apos;'/>
                <c attr="double&quot;"/>
                <c attr='&apos; &amp; &quot;'/>
        </b>
</a>
% xmlwf test.xml
% echo $?      
0
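
For readers without xmlwf to hand, roughly the same check can be made with Python's standard library. This is only a sketch of an equivalent test, not something from the original mail; `DOC` simply repeats the test file above.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>

<a one='1' two="2">
        <b type="start">
                <c/>
                <c attr='single&apos;'/>
                <c attr="double&quot;"/>
                <c attr='&apos; &amp; &quot;'/>
        </b>
</a>
"""

# fromstring raises xml.etree.ElementTree.ParseError if DOC is not well-formed.
root = ET.fromstring(DOC)
values = [c.get("attr") for c in root.iter("c")]
print(root.attrib)
print(values)
```

After parsing, the delimiter that was used in the source is no longer visible: the parser hands back only the attribute values.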

--
Seb, autocuiseur


Re: XML attribute delimiters in OSIS files?

David Haslam
In reply to this post by DM Smith-5
Mixing double and single quotes, as per earlier messages in this thread.

Example (minus the chaff):

sID="reference"
.....
eID='reference'

But this time for the same verse, just as Chris replied, rather than in completely separate OSIS elements.

As this is just an observation, I see no immediate need to give a detailed example of what happens to the module.
To locate the places where I spotted it yesterday would take some time.

Perhaps the most interesting thing is that there was no error message from osis2mod.

And I agree with Chris, the OSIS needs fixing first, before using as input for osis2mod.
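
A rough way to locate such places in a large file is a small scan like the one below. It is a hypothetical helper written for illustration (`MILESTONE_ID` and `mismatched_pairs` are invented names), and it works on the raw text because an XML parser does not report which delimiter was used.

```python
import re

# Captures the quote character and the value of each sID / eID attribute.
# \b keeps the "sID" inside "osisID" from matching.
MILESTONE_ID = re.compile(r"""\b(sID|eID)\s*=\s*(["'])(.*?)\2""")

def mismatched_pairs(osis_text):
    """Return the IDs whose sID and eID use different quote characters."""
    start_quote = {}
    mismatches = []
    for kind, quote, value in MILESTONE_ID.findall(osis_text):
        if kind == "sID":
            start_quote[value] = quote
        elif value in start_quote and start_quote[value] != quote:
            mismatches.append(value)
    return mismatches

sample = (
    '<verse sID="Gen.1.1" osisID="Gen.1.1"/>text<verse eID=\'Gen.1.1\'/>'
    '<verse sID="Gen.1.2" osisID="Gen.1.2"/>text<verse eID="Gen.1.2"/>'
)
print(mismatched_pairs(sample))
```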

David



Re: XML attribute delimiters in OSIS files?

David Haslam
In reply to this post by Chris Little-2
PS.

Yes - I had validated the OSIS files, after correcting the specified schemaLocation.

They did validate, despite the mixture of attribute delimiters.

David

Re: XML attribute delimiters in OSIS files?

DM Smith-5
In reply to this post by David Haslam
Ah, now I understand. This is a bug. And should be fixed. (BTW, not having the entire thread reproduced in each email makes it harder to understand the context of the email. I don't like having to go digging for the context. Having looked, I see that the first email in the thread defines delimiters.)

But I'm not sure where it should be fixed. I haven't looked at the code, but as I recall, we use the SWORD parser to obtain the attribute value. My guess is that it is returning it with the quotes. If the problem is there and we fix it there, it may break a whole host of other things. (This parser is not a true XML parser, but one that is highly optimized for speed and thus we work with its definition.)

It should be easy to change osis2mod to work. I'll look into doing this soon.

That said, it is and has been the recommendation that double quotes be used to wrap attribute values. It is valid to use single quotes, but it may (does) expose bugs. Fixing this bug does not change this recommendation.

Until osis2mod has been changed and the fix is available, it is advisable to change the input so that the quoting of sID/eID pairs is identical.

In Him,
        DM


Re: XML attribute delimiters in OSIS files?

refdoc@gmx.net
Is there any actual credible reason for having quotation marks in attributes? I agree that it may be grammatically correct for XML as such, but OSIS's attributes are defined and do not contain quotation marks. And x-marked attributes are largely thrown out during the osis2mod run, no? Or at least ignored - apart from our own - like x-preverse.

Peter

 

Re: XML attribute delimiters in OSIS files?

David Haslam
Yes - there are very good reasons for the delimiters.
It's because many attributes have several terms separated by spaces.

e.g. For a verse range:

osisID="Exod.22.2 Exod.22.3 Exod.22.4" n="2-4"

That is in addition to the fact that removing the delimiters would leave the XML no longer well-formed, let alone cause problems for OSIS.
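
As a small illustration of both points, using Python's standard XML parser rather than anything in SWORD: the delimited value splits cleanly into its space-separated terms, while the same element without delimiters is rejected outright.

```python
import xml.etree.ElementTree as ET

verse = ET.fromstring('<verse osisID="Exod.22.2 Exod.22.3 Exod.22.4" n="2-4"/>')
ids = verse.get("osisID").split()   # one entry per verse in the range
print(ids, verse.get("n"))

# Without delimiters the element is not well-formed and parsing fails.
try:
    ET.fromstring("<verse osisID=Exod.22.2 Exod.22.3 Exod.22.4 n=2-4/>")
    undelimited_ok = True
except ET.ParseError:
    undelimited_ok = False
print(undelimited_ok)
```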

David

Re: XML attribute delimiters in OSIS files?

DM Smith-5
On 10/26/2011 10:34 AM, David Haslam wrote:
> Yes - there are very good reasons for the delimiters.
> It's because many attributes have several terms separated by spaces.

I think he meant within the attribute's value in an OSIS document.


Re: XML attribute delimiters in OSIS files?

David Haslam
You think Peter was thinking of nested quotations?  Chris hinted at the possibility in his first reply.

So to illustrate...(merely hypothetical - not something to expect in OSIS)

------

!!!!! Does not address the possibility of nesting within an attribute !!!!!

Example:
   <title type='nested "quotation" '/>
becomes
   <title type="nested "quotation" "/>
which is invalid XML

However this is much less likely than
   <title type="nested 'quotation' "/>
so probably not an issue.

------

(Pasted from a comment in the TextPipe filter I was just writing.)

Special filter to rectify XML attribute delimiters
Converts delimiters that use apostrophes to using double quotes

My filter doesn't care that the input files are OSIS.
It's just a simple solution to the issue I encountered.
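
The limitation noted in that comment can be reproduced with a few lines of Python. This is a stand-in for illustration only, not the TextPipe filter itself; `naive_requote` is a hypothetical name for a delimiter swap that does no escaping.

```python
import re
import xml.etree.ElementTree as ET

def naive_requote(xml_text):
    # Swaps '...' for "..." without escaping anything inside the value.
    return re.sub(r"='([^']*)'", r'="\1"', xml_text)

before = "<title type='nested \"quotation\" '/>"
after = naive_requote(before)

ET.fromstring(before)            # parses: the original is well-formed
try:
    ET.fromstring(after)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(after, well_formed)
```

Replacing each embedded double quote with &quot; before swapping the delimiters would avoid the problem.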

David

Re: XML attribute delimiters in OSIS files?

refdoc@gmx.net
I was thinking of nested quotations. And I see no obvious use for them. No legal attribute I know of uses them, no?

Peter

Re: XML attribute delimiters in OSIS files?

DM Smith-5
In reply to this post by refdoc@gmx.net
On 10/26/2011 09:47 AM, Peter von Kaehne wrote:
> Is there any actual credible reason for having quotation marks in attributes? I agree that it may be grammatically correct for XML as such, but OSIS's attributes are defined and do not contain quotation marks. And x-marked attributes are largely thrown out during the osis2mod run, no? Or at least ignored - apart from our own - like x-preverse.
>
> Peter

I had never spent the time to look at the allowable attribute values in
an OSIS document. Now, having looked at the schema, it is allowed to
nest quotes. See below for details.

I think there are many good reasons that a single quote will be found in
an attribute value. Many languages use it for other things than quoting.

I can only think of a few, probably obscure, reasons for a double quote
to be there, e.g. chapterTitle='xxx aka "yyy"', who='James "Jimmy"
Smith', ...

osis2mod *should* allow for all well-formed, valid (both syntactically
and semantically) OSIS documents. Regarding quoting attribute values,
the recommendation still stands: use double quotes if at all possible,
and avoid &quot; and &apos; as well. (Note that these entities are only
needed within attribute values and never elsewhere in the text.)
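
To illustrate when an entity is actually needed, Python's standard `xml.sax.saxutils.quoteattr` picks the delimiter for a value and only falls back to &quot; when the value contains both kinds of quote. This is just one library's policy, shown as an example; it is not how osis2mod or any OSIS converter behaves.

```python
from xml.sax.saxutils import quoteattr

samples = ('James "Jimmy" Smith', "Ma'at", 'both \' and "')
quoted = [quoteattr(value) for value in samples]
for text in quoted:
    print("who=" + text)
```

Note that this policy prefers apostrophe delimiters over &quot; for the first value, which is the opposite of the recommendation above, so output from such a tool may still need regularizing.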

(Below I'm using x@y to mean element x with attribute y.)

In looking at this, I think there are some bugs in the definition of
l@type, lg@type, and rdg@type.

In Him,
     DM

Here are the attributes that allow for arbitrary text:
actor@who
<xs:attribute name="who" type="xs:string" use="optional"/>
contributor@file-as
<xs:attribute name="file-as" type="xs:string" use="optional"/>
a@href
<xs:attribute name="href" type="xs:string" use="required"/>
abbr@expansion
<xs:attribute name="expansion" type="xs:string" use="optional"/>
chapter@chapterTitle
<xs:attribute name="chapterTitle" type="xs:string" use="optional"/>
figure@alt, @catalog, @location, @rights, @size, @src
<xs:attribute name="alt" type="xs:string" use="optional"/>
<xs:attribute name="catalog" type="xs:string" use="optional"/>
<xs:attribute name="location" type="xs:string" use="optional"/>
<xs:attribute name="rights" type="xs:string" use="optional"/>
<xs:attribute name="size" type="xs:string" use="optional"/>
<xs:attribute name="src" type="xs:string"/>
index@index, @level1, @level2, @level3, @level4, @see
<xs:attribute name="index" type="xs:string" use="required"/>
<xs:attribute name="level1" type="xs:string" use="required"/>
<xs:attribute name="level2" type="xs:string" use="optional"/>
<xs:attribute name="level3" type="xs:string" use="optional"/>
<xs:attribute name="level4" type="xs:string" use="optional"/>
<xs:attribute name="see" type="xs:string" use="optional"/>
item@role
<xs:attribute name="role" type="xs:string" use="optional"/>
label@role
<xs:attribute name="role" type="xs:string" use="optional"/>
milestone@marker
<xs:attribute name="marker" type="xs:string" default="DEFAULT" use="optional"/>
milestoneEnd@start
<xs:attribute name="start" type="xs:string" use="required"/>
milestoneStart@end
<xs:attribute name="end" type="xs:string" use="required"/>
name@regular
<xs:attribute name="regular" type="xs:string" use="optional"/>
q@level, @marker, @who
<xs:attribute name="level" type="xs:string" use="optional"/>
<xs:attribute name="marker" type="xs:string" default="DEFAULT" use="optional"/>
<xs:attribute name="who" type="xs:string" use="optional"/>
speaker@who
<xs:attribute name="who" type="xs:string" use="optional"/>
speech@marker
<xs:attribute name="marker" type="xs:string" default="DEFAULT" use="optional"/>
title@short
<xs:attribute name="short" type="xs:string" use="optional"/>
w@gloss, @src, @xlit
<xs:attribute name="gloss" type="xs:string" use="optional"/>
<xs:attribute name="src" type="xs:string" use="optional"/>
<xs:attribute name="xlit" type="xs:string" use="optional"/>
Globally (globalWithType, globalWithoutType)
@annotateWork, @resp, @n
<xs:attribute name="annotateWork" type="xs:string" use="optional"/>
<xs:attribute name="resp" type="xs:string" use="optional"/>
<xs:attribute name="n" type="xs:string" use="optional"/>
Milestone attributes
@sID, @eID
<xs:attribute name="sID" type="xs:string" use="optional"/>
<xs:attribute name="eID" type="xs:string" use="optional"/>
osisID, osisRef, osisAnnotateType regexes allowing quotation marks:
(look for [^...] constructs)
<xs:pattern value="((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)"/>
<xs:pattern
value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?"/>
<xs:pattern
value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?(\-((((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*)+)(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?)?"/>
Attribute extension regex:
<xs:pattern value="x-([^\s])+"/>
l@type
<xs:union memberTypes="osisLine attributeExtension xs:string"/>
lg@type
<xs:union memberTypes="osisLineGroup attributeExtension xs:string"/>
<xs:simpleType name="osisLineGroup">
<xs:restriction base="xs:string">
<!-- <xs:enumeration value="doxology"/> -->
</xs:restriction>
</xs:simpleType>
rdg@type
<xs:union memberTypes="osisRdg attributeExtension xs:string"/>
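
As a quick check of the attribute extension regex listed above, the pattern can be tried in Python. This is an approximation: XSD patterns match the whole value, which `re.fullmatch` imitates, and Python's `\s` covers slightly more whitespace characters than the XSD one.

```python
import re

# The attributeExtension pattern from the schema excerpt above.
ATTRIBUTE_EXTENSION = re.compile(r"x-([^\s])+")

candidates = ('x-preverse', 'x-"quoted"', "x-it's", 'x-two words')
results = [bool(ATTRIBUTE_EXTENSION.fullmatch(c)) for c in candidates]
for candidate, ok in zip(candidates, results):
    print(candidate, ok)
```

Only the value containing a space is rejected; both kinds of quotation mark are accepted by the pattern.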


Re: XML attribute delimiters in OSIS files?

Troy A. Griffitts
Hey guys.  Just did some testing.  If you have a look at
sword/tests/xmltest and try the problem case:

./xmltest "<title type='nested \"quotation\" '/>"

(xmltest already tries to add an attribute to your input which tests for
embedded quotes, so you'll see an addedAttribute in your output)

You get:

[scribe@charis tests]$ ./xmltest "<title type='nested \"quotation\" '/>"
<title type='nested "quotation" '/>
<title type='nested "quotation" '/>
<title addedAttribute='with a " quote' type='nested "quotation" '/>
Tag name: [title]
  - attribute: [addedAttribute] = [with a " quote]
     4 parts:
     with
     a
     "
     quote
  - attribute: [type] = [nested "quotation" ]
     3 parts:
     nested
     "quotation"

  isEmpty: 1
  isEndTag: 0


It is a little odd that the second attribute has "3 parts", but the
example given has a space at the end of the value, so I suppose this
might be correct.
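
For anyone who wants to check the "3 parts" observation outside of SWORD, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree as a stand-in, not SWORD's own tag parser, and it assumes that the parts come from splitting the value on single spaces (that is my guess from the output above, not something confirmed from the xmltest source).

```python
import xml.etree.ElementTree as ET

# Same input as the xmltest run above: apostrophes delimit the value,
# and the value itself contains double quotes plus a trailing space.
tag = ET.fromstring("<title type='nested \"quotation\" '/>")
value = tag.get("type")

# Assumption: the "parts" are the pieces between single spaces.
# Splitting this way keeps the empty piece after the trailing space.
parts = value.split(" ")

print(repr(value))
print(len(parts), parts)
```

Under that assumption the trailing space yields an empty third piece, which would be consistent with the blank third line in the xmltest listing.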

Hope this is helpful in tracking this down,

Troy





On 10/26/2011 06:38 PM, DM Smith wrote:

> On 10/26/2011 09:47 AM, Peter von Kaehne wrote:
>> Is there any actual credible reason for having quotation marks in
>> attributes? I agree that it may be grammatically correct for XML as
>> such, but OSIS's attributes are defined and do not contain quotation
>> marks. And x-marked attributes are largely thrown out during the
>> osis2mod run, no? Or at least ignored - apart from our own - like
>> x-preverse.
>>
>> Peter
>
> I had never spent the time to look at the allowable attribute values
> in an OSIS document. Now, having looked at the schema, it is allowed
> to nest quotes. See below for details.
>
> I think there are many good reasons that a single quote will be found
> in an attribute value. Many languages use it for other things than
> quoting.
>
> I can only think of a few, probably obscure, reasons for a double
> quote to be there. E.g chapterTitle='xxx aka "yyy"', who='James
> "Jimmy" Smith', ...
>
> Osis2mod *should* allow for all well-formed, valid (both syntactically
> and semantically) OSIS documents. Regarding quoting attribute values,
> the recommendation still stands, use double quotes if at all possible,
> but also avoid &quot; and &apos; too. (Note that these entities are
> only needed within attribute values and never elsewhere in the text.)
>
> (Below I'm using x@y to mean element x with attribute y.)
>
> In looking at this, I think there are some bugs in the definition of
> l@type, lg@type, and rdg@type.
>
> In Him,
>     DM
>
> Here are the attributes that allow for arbitrary text:
> actor@who
> <xs:attribute name="who" type="xs:string" use="optional"/>
> contributor@file-as
> <xs:attribute name="file-as" type="xs:string" use="optional"/>
> a@href
> <xs:attribute name="href" type="xs:string" use="required"/>
> abbr@expansion
> <xs:attribute name="expansion" type="xs:string" use="optional"/>
> chapter@chapterTitle
> <xs:attribute name="chapterTitle" type="xs:string" use="optional"/>
> figure@alt, @catalog, @location, @rights, @size, @src
> <xs:attribute name="alt" type="xs:string" use="optional"/>
> <xs:attribute name="catalog" type="xs:string" use="optional"/>
> <xs:attribute name="location" type="xs:string" use="optional"/>
> <xs:attribute name="rights" type="xs:string" use="optional"/>
> <xs:attribute name="size" type="xs:string" use="optional"/>
> <xs:attribute name="src" type="xs:string"/>
> index@index, @level1, @level2, @level3, @level4, @see
> <xs:attribute name="index" type="xs:string" use="required"/>
> <xs:attribute name="level1" type="xs:string" use="required"/>
> <xs:attribute name="level2" type="xs:string" use="optional"/>
> <xs:attribute name="level3" type="xs:string" use="optional"/>
> <xs:attribute name="level4" type="xs:string" use="optional"/>
> <xs:attribute name="see" type="xs:string" use="optional"/>
> item@role
> <xs:attribute name="role" type="xs:string" use="optional"/>
> label@role
> <xs:attribute name="role" type="xs:string" use="optional"/>
> milestone@marker
> <xs:attribute name="marker" type="xs:string" default="DEFAULT"
> use="optional"/>
> milestoneEnd@start
> <xs:attribute name="start" type="xs:string" use="required"/>
> milestoneStart@end
> <xs:attribute name="end" type="xs:string" use="required"/>
> name@regular
> <xs:attribute name="regular" type="xs:string" use="optional"/>
> q@level, @marker, @who
> <xs:attribute name="level" type="xs:string" use="optional"/>
> <xs:attribute name="marker" type="xs:string" default="DEFAULT"
> use="optional"/>
> <xs:attribute name="who" type="xs:string" use="optional"/>
> speaker@who
> <xs:attribute name="who" type="xs:string" use="optional"/>
> speech@marker
> <xs:attribute name="marker" type="xs:string" default="DEFAULT"
> use="optional"/>
> title@short
> <xs:attribute name="short" type="xs:string" use="optional"/>
> w@gloss, @src, @xlit
> <xs:attribute name="gloss" type="xs:string" use="optional"/>
> <xs:attribute name="src" type="xs:string" use="optional"/>
> <xs:attribute name="xlit" type="xs:string" use="optional"/>
> Globally (globalWithType, globalWithoutType)
> @annotateWork, @resp, @n
> <xs:attribute name="annotateWork" type="xs:string" use="optional"/>
> <xs:attribute name="resp" type="xs:string" use="optional"/>
> <xs:attribute name="n" type="xs:string" use="optional"/>
> Milestone attributes
> @sID, @eID
> <xs:attribute name="sID" type="xs:string" use="optional"/>
> <xs:attribute name="eID" type="xs:string" use="optional"/>
> osisID, osisRef, osisAnnotateType regexes allowing quotation marks:
> (look for [^...] constructs)
> <xs:pattern
> value="((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)"/>
> <xs:pattern
> value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?"/>
> <xs:pattern
> value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?(\-((((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*)+)(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?)?"/>
> Attribute extension regex:
> <xs:pattern value="x-([^\s])+"/>
> l@type
> <xs:union memberTypes="osisLine attributeExtension xs:string"/>
> lg@type
> <xs:union memberTypes="osisLineGroup attributeExtension xs:string"/>
> <xs:simpleType name="osisLineGroup">
> <xs:restriction base="xs:string">
> <!-- <xs:enumeration value="doxology"/> -->
> </xs:restriction>
> </xs:simpleType>
> rdg@type
> <xs:union memberTypes="osisRdg attributeExtension xs:string"/>
>
>>
>>
>> -------- Original Message --------
>>> Date: Wed, 26 Oct 2011 08:59:14 -0400
>>> From: DM Smith<[hidden email]>
>>> To: SWORD Developers' Collaboration Forum<[hidden email]>
>>> Subject: Re: [sword-devel] XML attribute delimiters in OSIS files?
>>> Ah, now I understand. This is a bug. And should be fixed. (BTW, not having
>>> the entire thread reproduced in each email makes it harder to understand
>>> the context of the email. I don't like having to go digging for the context.
>>> Having looked, I see that the first email in the thread defines delimiters.)
>>>
>>> But I'm not sure where it should be fixed. I haven't looked at the code,
>>> but as I recall, we use the SWORD parser to obtain the attribute value. My
>>> guess is that it is returning it with the quotes. If the problem is there
>>> and we fix it there, it may break a whole host of other things. (This parser
>>> is not a true XML parser, but one that is highly optimized for speed and
>>> thus we work with its definition.)
>>>
>>> It should be easy to change osis2mod to work. I'll look into doing this
>>> soon.
>>>
>>> That said, it is and has been the recommendation that double quotes be
>>> used to wrap attribute values. It is valid to use single quotes, but it may
>>> (does) expose bugs. Fixing this bug does not change this recommendation.
>>>
>>> Until osis2mod has been changed and it is available, it is advisable to
>>> change the input so that the quoting of sID/eID pairs is identical.
>>>
>>> In Him,
>>>     DM
>>>
>>> On Oct 26, 2011, at 6:38 AM, David Haslam wrote:
>>>
>>>> Mixing double and single quotes, as per earlier messages in this thread.
>>>>
>>>> Example (minus the chaff):
>>>>
>>>> sID="reference"
>>>> .....
>>>> eID='reference'
>>>>
>>>> But this time for the same verse, just as Chris replied, rather than in
>>>> completely separate OSIS elements.
>>>>
>>>> As this is just an observation, I see no immediate need to give a detailed
>>>> example of what happens to the module.
>>>> To locate the places where I spotted it yesterday would take some time.
>>>>
>>>> Perhaps the most interesting thing is that there was no error message from
>>>> osis2mod.
>>>>
>>>> And I agree with Chris, the OSIS needs fixing first, before using as input
>>>> for osis2mod.
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>

Re: XML attribute delimiters in OSIS files?

David Haslam
The idea of nested quotes within an OSIS attribute was something of a side issue.
I'm glad you've delved into this aspect, but we do seem to have come some distance from my original question.

The more important issue is still how our utilities handle the case I reported earlier today:

sID="reference" ....  eID='reference'

It would seem that this causes the eID to not match the sID - and as Chris remarked, this may be considered a software bug.
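
Until the cause is pinned down, one way to conform the input (as recommended earlier in the thread) might be to parse the file with a real XML parser and write it back out. The sketch below is only an illustration in Python: requote is a hypothetical helper name, and it relies on the standard library's ElementTree serializer, which writes attribute values in double quotes. A real OSIS file has namespace declarations, comments and an XML declaration that this simple round trip would not preserve as-is, so treat it as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET


def requote(xml_text):
    """Hypothetical helper: parse and re-serialize so that every
    attribute value is written with double-quote delimiters."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


# A cut-down fragment with the mixed quoting described above.
mixed = "<div><verse sID=\"Gen.1.1\"/>text<verse eID='Gen.1.1'/></div>"
fixed = requote(mixed)
print(fixed)
```

Note that a value containing a literal double quote would come out with that quote written as &quot;, which the earlier advice suggests avoiding where possible.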

David


Re: XML attribute delimiters in OSIS files?

DM Smith-5
In reply to this post by David Haslam
David,

I'm looking at the code and it looks correct. I tested it and find no
problem. Bottom line: Your problem is elsewhere.

Here is the result. (It might wrap, making it ugly.)

Given Troy's recent example,
SWBuf sid = t.getAttribute("sID");
will get the value of the sID attribute and it will not include the
quotation marks, whether single or double.

Later when we compare
SWBuf eid = t.getAttribute("eID");
if (sid == eid) {
    // they match
} else {
    // they don't match: complain loudly!!!!
}

I think there is something else going on. I need a good test case.
The simplest one I can think of:
<osis><div type="book" osisID="Gen"><chapter osisID="Gen.1"><verse
osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning...<verse
eID='Gen.1.1'/></chapter></div></osis>
If I'm understanding you, this should not produce a module with Gen 1:1
and will give no error message.
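
As an independent cross-check of the same test case, here is a rough Python sketch. It does not use the SWORD parser or osis2mod at all; it uses the standard library's ElementTree, and unmatched_milestones is a hypothetical helper written only for this illustration. It reports any sID that never meets an equal eID later in the document.

```python
import xml.etree.ElementTree as ET

# The minimal test case from above: sID in double quotes, eID in apostrophes.
OSIS = (
    '<osis><div type="book" osisID="Gen"><chapter osisID="Gen.1">'
    '<verse osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning...'
    "<verse eID='Gen.1.1'/></chapter></div></osis>"
)


def unmatched_milestones(xml_text):
    """Hypothetical helper: return the sID values that are never
    closed by an equal eID later in document order."""
    open_ids = []
    for elem in ET.fromstring(xml_text).iter():
        sid = elem.get("sID")
        eid = elem.get("eID")
        if sid is not None:
            open_ids.append(sid)
        if eid is not None and eid in open_ids:
            open_ids.remove(eid)
    return open_ids


unmatched = unmatched_milestones(OSIS)
print(unmatched)
```

With a conforming XML parser the delimiters are not part of the attribute value, so the pair compares equal here regardless of which quote character was used.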

I ran this with full debug on and got the following:
[dmsmith@www utilities]$ ./osis2mod /tmp - -d 1023 <<.
 > <osis><div type="book" osisID="Gen"><chapter osisID="Gen.1"><verse
osisID="Gen.1.1" sID="Gen.1.1"/>In the beginning...<verse
eID='Gen.1.1'/></chapter></div></osis>
 > .
You are running osis2mod: $Rev: 2659 $
DEBUG(ARGS):
     path: /tmp
     osisDoc: -
     create: 0
     compressType:
     blockType: 4
     cipherKey:
     normalize: 1
DEBUG(XFORM): N/A: xform push (1) <osis> (tagname=osis)
DEBUG(XFORM): N/A: xform top(1) <osis>
DEBUG(STACK): N/A: push (1) osis
DEBUG(XFORM): N/A: xform push (2) <div osisID="Gen" sID="gen1"
type="book"/> (tagname=div)
DEBUG(XFORM): N/A: xform top(2) <div osisID="Gen" sID="gen1" type="book"/>
DEBUG(FOUND): Found first div and pitching prior material: <osis>
DEBUG(TITLE): Gen: Looking for book introduction
DEBUG(FOUND): New book is Gen
DEBUG(XFORM): Gen: xform push (3) <chapter osisID="Gen.1" sID="gen2"/>
(tagname=chapter)
DEBUG(XFORM): Gen: xform top(3) <chapter osisID="Gen.1" sID="gen2"/>
DEBUG(TITLE): Gen: BOOK INTRO <div osisID="Gen" sID="gen1" type="book"/>
DEBUG(FOUND): Current chapter is Gen.1 (Gen.1)
DEBUG(TITLE): Gen.1: Looking for chapter introduction
DEBUG(XFORM): Gen.1: xform empty <verse osisID="Gen.1.1" sID="Gen.1.1"/>
DEBUG(FOUND): Entering verse
DEBUG(TITLE): Gen.1: Done looking for chapter introduction
DEBUG(TITLE): Gen.1: CHAPTER INTRO <chapter osisID="Gen.1" sID="gen2"/>
DEBUG(WRITE): Gen:Gen: <div osisID="Gen" sID="gen1" type="book"/>
DEBUG(REF): Copy osisID:Gen.1.1
Member Key Count = 1
contains = Genesis 1:1
DEBUG(FOUND): New current verse is Gen.1.1
DEBUG(FOUND): osisID/annotateRef is adjusted to: Gen.1.1
DEBUG(XFORM): Gen.1.1: xform empty <verse eID="Gen.1.1"/>
DEBUG(WRITE): Gen.1:Gen.1: <chapter osisID="Gen.1" sID="gen2"/>
DEBUG(XFORM): Gen.1.1: xform pop(3) <chapter osisID="Gen.1" sID="gen2"/>
DEBUG(XFORM): Gen.1.1: xform pop(2) <div osisID="Gen" sID="gen1"
type="book"/>
DEBUG(XFORM): Gen.1.1: xform pop(1) <osis>
DEBUG(STACK): Gen.1.1: pop(1) osis
DEBUG(WRITE): Gen.1.1:Gen.1.1: <milestone osisID="Gen.1.1" resp="v"
sID="Gen.1.1"/>In the beginning...<milestone eID="Gen.1.1" resp="v"/>
<chapter eID="gen2" osisID="Gen.1"/> <div eID="gen1" osisID="Gen"
type="book"/>

Here is the content of the module's "/tmp/ot" file:
<milestone type="x-importer" subType="x-osis2mod" n="$Rev: 2659 $"/>
<div osisID="Gen" sID="gen1" type="book"/>
<chapter osisID="Gen.1" sID="gen2"/>
<milestone osisID="Gen.1.1" resp="v" sID="Gen.1.1"/>In the
beginning...<milestone eID="Gen.1.1" resp="v"/> <chapter eID="gen2"
osisID="Gen.1"/> <div eID="gen1" osisID="Gen" type="book"/>

In Him,
     DM

On 10/26/2011 06:38 AM, David Haslam wrote:

> Mixing double and single quotes, as per earlier messages in this thread.
>
> Example (minus the chaff):
>
> sID="reference"
> .....
> eID='reference'
>
> But this time for the same verse, just as Chris replied, rather than in
> completely separate OSIS elements.
>
> As this is just an observation, I see no immediate need to give a detailed
> example of what happens to the module.
> To locate the places where I spotted it yesterday would take some time.
>
> Perhaps the most interesting thing is that there was no error message from
> osis2mod.
>
> And I agree with Chris, the OSIS needs fixing first, before using as input
> for osis2mod.
>
> David
>
>
>
>