XML Data Repository

Datasets, Details, and Download

Protein Sequence Database
Integrated collection of functionally annotated protein sequences.
from Georgetown Protein Information Resource, Nov 9 2001
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| psd7003.xml | dtd | Protein Sequence Database | .xml (683 MB), .gz (103 MB), .xmi (70 MB) | 21305818 | 1290647 | 7 | 5.15147 |
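
The larger files are also offered as .gz downloads, and these can be read without unpacking them to disk first. The following is a minimal sketch using only the Python standard library; the function name `count_tags_gz` and the file `demo.xml.gz` are hypothetical, and real files that rely on entities declared in an external DTD may need extra handling that is not shown here.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags_gz(path):
    """Count element tags in a gzip-compressed XML file, streaming it."""
    counts = Counter()
    with gzip.open(path, "rb") as fh:
        # Handle each element when its end tag is seen, then drop its content.
        for _, elem in ET.iterparse(fh, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()
    return counts


# Tiny stand-in file (hypothetical content, not taken from the repository).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.xml.gz")
    with gzip.open(demo_path, "wb") as out:
        out.write(b"<db><entry><id>1</id></entry><entry><id>2</id></entry></db>")
    demo_counts = count_tags_gz(demo_path)

print(demo_counts["entry"])
```

Clearing each element after it is counted keeps memory use low, which matters for inputs in the hundreds of megabytes.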

SwissProt
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
from ExPASy - SWISS-PROT and TrEMBL, 1998
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| SwissProt.xml | NA | SwissProt database | .xml (109 MB), .gz (13 MB), .xmi (7 MB) | 2977031 | 2189859 | 5 | 3.55671 |

Auction Data
Auction data converted to XML from web sources.
from Anhai Doan, 2001
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| 321gone.xml | dtd | | .xml (23 KB), .gz (6 KB), .xmi (6 KB) | 311 | 0 | 5 | 3.76527 |
| ebay.xml | dtd | EBay auction data | .xml (34 KB), .gz (10 KB), .xmi (10 KB) | 156 | 0 | 5 | 3.75641 |
| ubid.xml | dtd | UBid auction data | .xml (19 KB), .gz (3 KB), .xmi (3 KB) | 342 | 0 | 5 | 3.76608 |
| yahoo.xml | dtd | Yahoo auction data | .xml (24 KB), .gz (6 KB), .xmi (5 KB) | 342 | 0 | 5 | 3.76608 |

DBLP Computer Science Bibliography
The DBLP server provides bibliographic information on major computer science journals and proceedings. DBLP stands for Digital Bibliography Library Project.
from DBLP Homepage, Oct 2002
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| dblp.xml | dtd | DBLP Bibliography | .xml (127 MB), .gz (23 MB), .xmi (19 MB) | 3332130 | 404276 | 6 | 2.90228 |

University Courses
Course data derived from university websites.
from Anhai Doan, 1999
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| reed.xml | dtd | Courses from Reed College | .xml (277 KB), .gz (18 KB), .xmi (12 KB) | 10546 | 0 | 4 | 3.19979 |
| uwm.xml | NA | Courses from UWM | .xml (2 MB), .gz (157 KB), .xmi (102 KB) | 66729 | 6 | 5 | 3.95243 |
| wsu.xml | NA | Courses from WSU | .xml (1 MB), .gz (99 KB), .xmi (61 KB) | 74557 | 0 | 4 | 3.15787 |

NASA
Datasets converted from legacy flat-file format into XML and made available to the public.
from GSFC/NASA XML Project, 2001
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| nasa.xml | NA | Astronomical Data | .xml (23 MB), .gz (3 MB), .xmi (2 MB) | 476646 | 56317 | 8 | 5.58314 |

SIGMOD Record
Index of articles from SIGMOD Record.
from ACM SIGMOD Record in XML, 2001
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| SigmodRecord.xml | dtd | SIGMOD Record in XML | .xml (467 KB), .gz (79 KB), .xmi (56 KB) | 11526 | 3737 | 6 | 5.14107 |

TPC-H Relational Database Benchmark
TPC-H Benchmark, 10 MB version, in XML form. Converted to XML by Zack Ives.
from Transaction Processing Performance Council (TPC), 2002
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| part.xml | NA | Parts | .xml (603 KB), .gz (70 KB), .xmi (45 KB) | 20001 | 1 | 3 | 2.8999 |
| lineitem.xml | NA | Line items | .xml (30 MB), .gz (2 MB), .xmi (1 MB) | 1022976 | 1 | 3 | 2.94117 |
| partsupp.xml | NA | Part/Supplier relationship | .xml (2 MB), .gz (311 KB), .xmi (236 KB) | 48001 | 1 | 3 | 2.8333 |
| supplier.xml | NA | Supplier | .xml (28 KB), .gz (6 KB), .xmi (5 KB) | 801 | 1 | 3 | 2.87266 |
| orders.xml | NA | Orders | .xml (5 MB), .gz (556 KB), .xmi (358 KB) | 150001 | 1 | 3 | 2.89999 |
| nation.xml | NA | Nations | .xml (4 KB), .gz (1 KB), .xmi (1 KB) | 126 | 1 | 3 | 2.78571 |
| region.xml | NA | Regions | .xml (787 B), .gz (373 B), .xmi (370 B) | 21 | 1 | 3 | 2.66667 |
| customer.xml | NA | Customers | .xml (503 KB), .gz (101 KB), .xmi (76 KB) | 13501 | 1 | 3 | 2.88875 |
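
The page does not define the four statistics columns (elements, attributes, max-depth, avg-depth). The sketch below is one way to compute them, assuming the root element sits at depth 1 and avg-depth is the mean depth over all elements. Under that assumption, a document shaped like the TPC-H region table (one root, 5 rows, 3 fields per row) has 21 elements and an average depth of 56/21, about 2.66667, which agrees with the figures listed for region.xml. The function name `xml_stats`, the sample tag names, and the placement of the single attribute on the root are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def xml_stats(source):
    """Return (elements, attributes, max_depth, avg_depth) for an XML source.

    Assumption: the root is at depth 1 and avg_depth is the mean element depth.
    """
    elements = attributes = max_depth = depth_sum = 0
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            elements += 1
            attributes += len(elem.attrib)
            depth_sum += depth
            max_depth = max(max_depth, depth)
        else:
            depth -= 1
            elem.clear()  # drop the finished element's content to limit memory use
    avg_depth = depth_sum / elements if elements else 0.0
    return elements, attributes, max_depth, avg_depth


# Hypothetical stand-in shaped like a 5-row, 3-field table (not the real file).
rows = "".join(f"<T><K>{i}</K><N>name{i}</N><C>c</C></T>" for i in range(5))
sample = f'<table ID="x">{rows}</table>'.encode()

print(xml_stats(io.BytesIO(sample)))
```

`xml_stats` also accepts a file path, so it can be pointed at a downloaded .xml file directly.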

Treebank (partially encrypted)
English sentences, tagged with parts of speech. The text nodes have been encrypted because they are copyrighted text from the Wall Street Journal. Nevertheless, the deep recursive structure of this data makes it an interesting case for experiments.
from University of Pennsylvania Treebank Project, added Nov 2002
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| treebank_e.xml | NA | Partially-encrypted treebank | .xml (82 MB), .gz (30 MB), .xmi (24 MB) | 2437666 | 1 | 36 | 7.87279 |

Mondial
World geographic database integrated from the CIA World Factbook, the International Atlas, and the TERRA database among other sources.
from Florid-Mondial Case Study, 2002
| filename | DTD | Description | Download | elements | attributes | max-depth | avg-depth |
|---|---|---|---|---|---|---|---|
| mondial-3.0.xml | dtd | DTD is available, but data is not valid. | .xml (1 MB), .gz (167 KB), .xmi (95 B) | 22423 | 47423 | 5 | 3.59274 |
