|
|
|
|
|
|
XML = unranked trees |
|
XML Schema = wants to be a regular language of
unranked trees |
|
|
|
Several official proposals: |
|
DTD, XSchema - baroque and bad |
|
Several counter proposals |
|
Relax NG |
|
Often arbitrary restrictions are imposed on the
REs, claiming “efficiency” |
|
|
|
|
Containment:
E1 ⊆ E2 |
|
|
|
Equivalence:
E1 = E2 |
|
|
|
Intersection: E1 ∩ E2 = ∅ |
|
|
|
In class: What is their complexity for all
regular expressions ? |
|
|
|
|
E1 ⊆ E2: PSPACE complete |
|
|
|
E1 = E2: PSPACE complete |
|
|
|
E1 ∩ E2 = ∅: PTIME |
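The PTIME bound for intersection emptiness comes from the product construction. A minimal sketch, assuming both expressions have already been compiled to ε-free NFAs of polynomial size (for instance by the Glushkov construction). The tuple layout `(starts, finals, delta)` and the function name are illustrative choices, not taken from the paper:

```python
from collections import deque

def intersection_is_empty(nfa1, nfa2):
    """Each NFA is (starts, finals, delta), where delta maps
    (state, symbol) to a set of successor states (no epsilon moves)."""
    starts1, finals1, delta1 = nfa1
    starts2, finals2, delta2 = nfa2
    # Breadth-first search over pairs of states of the product automaton.
    queue = deque((p, q) for p in starts1 for q in starts2)
    seen = set(queue)
    while queue:
        p, q = queue.popleft()
        if p in finals1 and q in finals2:
            return False  # some word is accepted by both automata
        for (source, symbol), targets1 in delta1.items():
            if source != p:
                continue
            for p2 in targets1:
                for q2 in delta2.get((q, symbol), ()):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return True

# Hand-built automata for a*b, a a*, and b.
A_STAR_B = ({0}, {1}, {(0, "a"): {0}, (0, "b"): {1}})
A_PLUS = ({0}, {1}, {(0, "a"): {1}, (1, "a"): {1}})
JUST_B = ({0}, {1}, {(0, "b"): {1}})
```

At most |Q1| · |Q2| pairs are ever visited, which is why the problem stays polynomial even though containment and equivalence do not.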
|
|
|
|
Paper claims (and is right) that in practice the
DTDs or XSchema use “simple” regular expressions |
|
|
|
What is “simple” ? Open to debate, but paper makes the following proposals |
|
|
|
|
|
Symbol s = a letter or a word |
|
Notation a
or w |
|
Possibly followed by ? or *. Notation: a?, a*, w?, w* |
|
|
|
Factor f = s | s | . . . | s |
|
Notation:
s or +s (e.g. (+w*) or (+a*)? or (w*)? ) |
|
Possibly followed by ? or * |
|
|
|
Simple RE = f.f….f |
|
Notation:
RE(f1, f2, …, fk) where f1, …, fk
are the kinds of factors allowed |
|
|
|
|
|
Examples |
|
RE(a,a*): |
|
Name.Address.Phone*.Email |
|
RE(a,a?,a*): |
|
Name.Email?.Address?.Email*.Phone?.Email |
|
RE((+a),a*) |
|
Name.(Email | Phone).Address*.Email* |
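To make the RE(f1, …, fk) notation concrete, the following sketch reports which kinds of factors a dot-separated expression uses. It is an illustration only, under these assumptions: symbols are single element names (words w are not handled), a disjunction contains plain names without ? or *, and the helper names `factor_kind` and `factor_kinds` are hypothetical:

```python
import re

# A factor is either a single name or a parenthesised disjunction of names,
# optionally followed by ? or *.
FACTOR = re.compile(r"^(?:(\w+)|\(([^()]*)\))([?*]?)$")

def factor_kind(factor):
    """Return the kind of one factor: 'a', 'a?', 'a*', '(+a)', '(+a)?' or '(+a)*'."""
    match = FACTOR.match(factor.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a simple factor: {factor!r}")
    name, disjunction, suffix = match.groups()
    if name is not None:
        return "a" + suffix
    parts = disjunction.split("|")
    if not all(re.fullmatch(r"\w+", part) for part in parts):
        raise ValueError(f"unsupported disjunction: {factor!r}")
    # A disjunction with a single alternative is just a symbol.
    return ("a" if len(parts) == 1 else "(+a)") + suffix

def factor_kinds(expression):
    """Set of factor kinds used by a dot-separated simple RE."""
    return {factor_kind(factor) for factor in expression.split(".")}
```

Note that the third example above reports the kind `a` for `Name` in addition to `(+a)` and `a*`; a single symbol is the special case of a disjunction with one alternative, so the expression is still in RE((+a), a*).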
|
|
|
|
|
RE(a?, (+a)*)
in PTIME [1] |
|
RE(a, Σ, Σ*)
in PTIME [17] |
|
|
|
RE(a, a*)
or RE(a, a?) coNP hard |
|
WOW ! |
|
|
|
Others in the paper |
|
|
|
|
Background: |
|
Given two ranked-tree automata A1, A2,
checking L(A1) ⊆ L(A2) is EXPTIME complete |
|
Note: if A2 is deterministic, then one can
check containment in time O(|A1| · |A2|) |
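The deterministic case can be sketched as a reachability computation on pairs of states. This is a naive fixpoint, not a tuned implementation that meets the stated bound, and it rests on assumptions: both automata are bottom-up, A1 is given as a list of rules `(symbol, child_states, state)`, A2 as a dict from `(symbol, child_states)` to a single state, and a missing A2 transition leads to an implicit rejecting sink. All names are illustrative:

```python
from itertools import product

SINK = None  # implicit rejecting state of A2 for missing transitions

def contained(rules1, finals1, rules2, finals2):
    """Decide L(A1) ⊆ L(A2) for a nondeterministic bottom-up A1
    and a deterministic bottom-up A2 over a ranked alphabet."""
    # reachable holds pairs (q, p): some tree has a run of A1 ending in q
    # while the unique run of A2 on that tree ends in p.
    reachable = set()
    changed = True
    while changed:
        changed = False
        for symbol, children, q in rules1:
            # For each child position, the A2 states paired with the required A1 state.
            options = [[p for (q_i, p) in reachable if q_i == child] for child in children]
            for ps in product(*options):
                p = SINK if SINK in ps else rules2.get((symbol, ps), SINK)
                if (q, p) not in reachable:
                    reachable.add((q, p))
                    changed = True
    # Containment fails iff A1 accepts some tree that A2 rejects.
    return all(p in finals2 for (q, p) in reachable if q in finals1)

# A1 accepts every tree built from leaf a and binary f.
ALL_TREES = [("a", (), "q"), ("f", ("q", "q"), "q")]
# A2 tracks the parity of the number of leaves.
PARITY = {
    ("a", ()): "odd",
    ("f", ("odd", "odd")): "even",
    ("f", ("even", "even")): "even",
    ("f", ("odd", "even")): "odd",
    ("f", ("even", "odd")): "odd",
}
```

Determinism of A2 is what keeps this cheap: each tree has exactly one A2 state, so pairs suffice and no subset construction is needed.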
|
|
|
|
|
|
DTDs and XML Schemas are unranked tree languages |
|
No big deal: easy to encode unranked trees into
ranked trees [show in class] |
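One standard encoding is first-child / next-sibling, which turns a forest of unranked trees into a binary tree. The slides do not say which encoding was shown in class, so the following is only a sketch of that common choice. The tuple representations are assumptions: an unranked tree is `(label, children)` and a binary tree is `(label, left, right)` or `None`:

```python
def encode(forest):
    """Encode a list of unranked trees as one binary tree:
    left child = first child, right child = next sibling."""
    if not forest:
        return None
    (label, children), *siblings = forest
    return (label, encode(children), encode(siblings))

def decode(binary):
    """Inverse of encode: rebuild the list of unranked trees."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)

EXAMPLE = ("root", [("a", []), ("b", [("c", [])])])
```

Because the encoding is a bijection between forests and binary trees, automata results for ranked trees carry over to the unranked setting.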
|
Still, lots of papers out there that re-invent
regular languages for unranked trees |
|
|
|
|
|
|
Given alphabet Σ |
|
|
|
A DTD is a set of expressions:
σ := E where E is a “regular expression” |
|
|
|
Example:
root := person*
person := name, project?, email*, (address | contact)
project := name, project* |
|
A tree T satisfies the DTD iff it is a
derivation tree |
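Checking whether a tree is a derivation tree amounts to matching the children of every node against the rule for its label. A minimal sketch using the example DTD, under assumptions that are not in the slides: trees are `(label, children)` tuples, elements without a rule (name, email, address, contact) must have no children, and the helper names are hypothetical:

```python
import re

DTD = {
    "root": "person*",
    "person": "name, project?, email*, (address | contact)",
    "project": "name, project*",
}

def content_regex(expr):
    """Translate a DTD content model into a Python regex over 'label;' tokens."""
    pattern = re.sub(r"\w+", lambda m: f"(?:{m.group()};)", expr)
    return re.compile(pattern.replace(",", "").replace(" ", ""))

def satisfies(tree, dtd):
    """Check every node's child sequence against the rule for its label."""
    label, children = tree
    expr = dtd.get(label, "")  # assumption: undeclared elements are empty
    word = "".join(child[0] + ";" for child in children)
    if content_regex(expr).fullmatch(word) is None:
        return False
    return all(satisfies(child, dtd) for child in children)

def is_valid(tree, dtd, root="root"):
    return tree[0] == root and satisfies(tree, dtd)

GOOD = ("root", [("person", [
    ("name", []),
    ("project", [("name", []), ("project", [("name", [])])]),
    ("email", []),
    ("address", []),
])])
BAD = ("root", [("person", [("name", [])])])  # lacks address or contact
```

The check is purely local: each node is tested against one regular expression, independently of its context.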
|
|
|
|
Strictly weaker than regular tree languages on
unranked trees [why ?] |
|
|
|
Lots of ways to extend them; most popular in the
theory community: specialized DTDs |
|
|
|
|
Given two alphabets Σ, Σ’ |
|
|
|
A specialized DTD is a set of expressions:
σ’ := E’ where E’ is a “regular expression”
and a mapping μ : Σ’ → Σ |
|
Example:
root := (person|project)*
person := name1,phone
project := name2,cost
name1 := firstName, lastName
name2 := internalName, publicName |
|
|
|
|
|
One more restriction: if E’ contains two
occurrences of σ ∈ Σ, then they have the same “type”. |
|
Formally: no regular expression E’ contains two distinct symbols σ1’ ≠ σ2’
s.t. μ(σ1’) = μ(σ2’) |
|
The XML Schema standard has such a requirement |
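The restriction is easy to test rule by rule. A minimal sketch, assuming the rules are given as strings and that the mapping from specialized symbols to element names simply strips trailing digits (so name1 and name2 both map to name). That convention and the function names are illustrative, not part of the definition:

```python
import re

def strip_digits(symbol):
    """Assumed mapping from specialized symbols to element names."""
    return symbol.rstrip("0123456789")

def is_single_type(sdtd, mu=strip_digits):
    """True iff no rule mentions two distinct symbols with the same image under mu."""
    for expr in sdtd.values():
        chosen = {}  # element name -> the specialized symbol used in this rule
        for symbol in re.findall(r"\w+", expr):
            if chosen.setdefault(mu(symbol), symbol) != symbol:
                return False
    return True

SLIDE_EXAMPLE = {
    "root": "(person|project)*",
    "person": "name1,phone",
    "project": "name2,cost",
    "name1": "firstName, lastName",
    "name2": "internalName, publicName",
}
```

The example from the earlier slide passes: name1 and name2 never occur in the same rule, so the type of a name element is determined by its parent.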
|
|
|
|
The main result is that the following have the
same complexities: |
|
|
|
Inclusion for a class R of RE’s |
|
Inclusion for a class of DTD’s restricted to R |
|
Inclusion for a class of single-type SDTDs
restricted to R |
|
BUT not for SDTD’s over R |
|
|
|
|
|
FOk is FO restricted to only k variables:
x1, …, xk |
|
|
|
What can we express here ? |
|
Try this in FO3: |
|
There exists a path of length 10 from u to v
[in class] |
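One way to write this with three variables is to re-quantify u once it is no longer needed. The following is a sketch of that standard trick; the edge relation E and the third variable w are names chosen here:

```latex
\begin{align*}
\varphi_1(u,v)     &= E(u,v) \\
\varphi_{n+1}(u,v) &= \exists w\,\bigl(E(u,w) \wedge \exists u\,(u = w \wedge \varphi_n(u,v))\bigr)
\end{align*}
```

Then φ_10(u,v) states that there is a path of length 10 from u to v and mentions only the variables u, v, w.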
|
|
|
|
|
|
|
The combined complexity of query evaluation: |
|
Given A, φ, decide whether A ⊨ φ |
|
|
|
What is the complexity of:
{(A,φ) | A ∈ STRUCT[σ], φ ∈ FO, A ⊨ φ}
{(A,φ) | A ∈ STRUCT[σ], φ ∈ FOk, A ⊨ φ} |
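The decision problem can be made concrete with a naive recursive evaluator. This is a sketch for intuition only, with assumed representations: a structure is `(universe, relations)`, and a formula is a nested tuple such as `("exists", "y", ("rel", "E", ("x", "y")))`:

```python
def holds(structure, formula, env):
    """Evaluate a first-order formula in a finite structure under assignment env."""
    universe, relations = structure
    op = formula[0]
    if op == "rel":
        _, name, variables = formula
        return tuple(env[v] for v in variables) in relations[name]
    if op == "eq":
        return env[formula[1]] == env[formula[2]]
    if op == "not":
        return not holds(structure, formula[1], env)
    if op == "and":
        return holds(structure, formula[1], env) and holds(structure, formula[2], env)
    if op == "or":
        return holds(structure, formula[1], env) or holds(structure, formula[2], env)
    if op in ("exists", "forall"):
        _, variable, body = formula
        # Rebinding a variable overwrites its old value in the assignment.
        results = (holds(structure, body, {**env, variable: d}) for d in universe)
        return any(results) if op == "exists" else all(results)
    raise ValueError(f"unknown connective: {op!r}")

# "Every node has a successor."
EVERY_NODE_HAS_SUCCESSOR = ("forall", "x", ("exists", "y", ("rel", "E", ("x", "y"))))
CYCLE = ([0, 1, 2], {"E": {(0, 1), (1, 2), (2, 0)}})
PATH = ([0, 1, 2], {"E": {(0, 1), (1, 2)}})
```

The assignment `env` never holds more entries than there are distinct variable names in the formula, so for an FOk formula it has at most k entries. That observation is the starting point for comparing the two cases.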
|
|
|
|
Satisfiability |
|
Given φ, decide if ∃ A s.t. A ⊨ φ |
|
|
|
Undecidable for FO (Trakhtenbrot) |
|
Decidable for FO2 (WOW !) |
|
Undecidable for FO3 (Hmm….) |
|
|
|
|
|
Extensions of FO: |
|
LFP, IFP, PFP, TC, you-name-it, … |
|
|
|
All are expressible in infinitary FO, L_{∞,ω}: |
|
Allowed to take infinite
conjunctions/disjunctions:
⋀_{i ∈ I} φi or
⋁_{i ∈ I} φi [why ?] |
|
But L_{∞,ω} is boring… |
|
All are expressible in ⋃_{k ≥ 0} L^k_{∞,ω} = L^ω_{∞,ω} |
|
|
|