Using XML with Java (that's all for today; class dismissed)
Published: 2007-07-01
Keywords: Java, XML
Processing XML with Java is easy
1,You need a JDK
2,You need some free class libraries, such as XML4J, Xerces, or LotusXSL
3,You need a text editor
4,You need some data to process
Prerequisites
Familiarity with Java, especially I/O, classes, objects, and polymorphism.
Familiarity with XML concepts such as well-formedness, validity, and namespaces.
I will briefly review the proper terminology.
A simple example:
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
xmlns:xlink="http://www.w3.org/1999/xlink">
<TITLE>Hot Cop</TITLE>
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<!-- The publisher is actually Polygram but I needed
an example of a general entity reference. -->
<PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A &amp; M Records
</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was
listening to when I wrote this example -->
Markup and Character Data
Markup includes:
1,Tags
2,Entity References
3,Comments
4,Processing Instructions
5,Document Type Declarations
6,XML Declaration
7,CDATA Section Delimiters
Character data includes everything else
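Example (the same document again: tags, comments, declarations, processing instructions, and entity references are markup; everything else is character data):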
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG xmlns="http://metalab.unc.edu/xml/namespace/song"
xmlns:xlink="http://www.w3.org/1999/xlink">
<TITLE>Hot Cop</TITLE>
<PHOTO
xlink:type="simple" xlink:show="onLoad" xlink:href="hotcop.jpg"
ALT="Victor Willis in Cop Outfit" WIDTH="100" HEIGHT="200"/>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<!-- The publisher is actually Polygram but I needed
an example of a general entity reference. -->
<PUBLISHER xlink:type="simple" xlink:href="http://www.amrecords.com/">
A &amp; M Records
</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
<!-- You can tell what album I was
listening to when I wrote this example -->
Entities
An XML document is made up of one or more physical storage units called entities
Entity references :
Parsed internal general entity references like &amp;
Parsed external general entity references
Unparsed external general entity references
External parameter entity references
Internal parameter entity references
Reading an XML document is not the same thing as reading an XML file
The file contains entity references.
The document contains the entities' replacement text.
When you use a parser to read a document, you'll get the replacement text, including characters like <. You will not see the entity references.
Parsed Character Data
Character data left after entity references are replaced with their text
Suppose the element is:
<PUBLISHER>A &amp; M Records</PUBLISHER>
Then the parsed character data is: A & M Records
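The difference between what the file contains and what the parser reports can be seen in a short Java sketch. It assumes the JAXP DOM parser bundled with the JDK (javax.xml.parsers) rather than one of the libraries listed earlier; the class name ParsedTextDemo and the method rootText are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class: shows that the parser hands back replacement text,
// not the entity reference that was written in the file.
final class ParsedTextDemo {

    /** Parses a small XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The source text contains the reference &amp; ...
        String xml = "<PUBLISHER>A &amp; M Records</PUBLISHER>";
        // ... but the parsed character data contains a plain ampersand.
        System.out.println(rootText(xml)); // prints: A & M Records
    }
}
```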
CDATA Sections
Used to include large blocks of text with lots of normally illegal literal characters like < and &, typically XML or HTML.
<p>You can use a default <code>xmlns</code>
attribute to avoid having to add the svg prefix to all
your elements:</p>
<![CDATA[
<svg xmlns="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
]]>
CDATA is for human authors, not for programs!
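One way to see why CDATA matters to authors but not to programs: a DOM parser reports the same text whether the author used a CDATA section or escaped each character by hand. The following is a minimal sketch using the JDK's built-in JAXP parser; CdataDemo and rootText are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class comparing a CDATA section with hand-escaped text.
final class CdataDemo {

    /** Returns the text content of the root element of the given XML string. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Literal < and & are legal inside the CDATA section.
        String withCdata = "<p><![CDATA[<rect x=\"4cm\"/> & more]]></p>";
        // The same content written with entity references instead.
        String withEscapes = "<p>&lt;rect x=\"4cm\"/&gt; &amp; more</p>";

        System.out.println(rootText(withCdata));
        System.out.println(rootText(withCdata).equals(rootText(withEscapes)));
    }
}
```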
Comments
<!-- Before posting this page, I need to double check the number of pelicans in Louisiana in 1970 -->
Comments are for humans, not programs.
Processing Instructions
Divided into a target and data for the target:
1,The target must be an XML name
2,The data can have an effectively arbitrary format
Examples:
<?robots index="yes" follow="no"?>
<?xml-stylesheet href="pelicans.css" type="text/css"?>
<?php
mysql_connect("database.unc.edu", "clerk", "password");
$result = mysql("CYNW", "SELECT LastName, FirstName FROM Employees
ORDER BY LastName, FirstName");
$i = 0;
while ($i < mysql_numrows ($result)) {
$fields = mysql_fetch_row($result);
echo "<person>$fields[1] $fields[0] </person>\r\n";
$i++;
}
mysql_close();
?>
These are for programs.
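A program can pick up processing instructions through the DOM, where each one appears as a node with a target and a data string. This is a rough sketch assuming the JDK's JAXP parser; the class ProcessingInstructionDemo and its method are hypothetical, and only instructions that are direct children of the document are collected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.SAXException;

// Hypothetical demo class: lists the processing instructions in a document's prolog.
final class ProcessingInstructionDemo {

    /** Returns target -> data for each processing instruction that is a direct child of the document. */
    static Map<String, String> topLevelInstructions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                    ProcessingInstruction pi = (ProcessingInstruction) n;
                    // The parser does not interpret the data; it is passed through as a string.
                    result.put(pi.getTarget(), pi.getData());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?robots index=\"yes\" follow=\"no\"?>"
                + "<?xml-stylesheet href=\"pelicans.css\" type=\"text/css\"?>"
                + "<doc/>";
        topLevelInstructions(xml).forEach(
                (target, data) -> System.out.println(target + " -> " + data));
    }
}
```

Although the data of these two instructions looks like attributes, the parser treats it as one opaque string; splitting it into name/value pairs is left to the application.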
The XML Declaration
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Looks like a processing instruction but isn't.
version attribute
required
has the value 1.0 (or 1.1 for XML 1.1 documents)
encoding attribute
UTF-8
ISO-8859-1
etc.
standalone attribute
yes
no
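The three parts of the XML declaration are exposed on the DOM Document object. The sketch below assumes a DOM Level 3 parser such as the one shipped with the JDK; XmlDeclarationDemo is a name chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class: reads back the values given in the XML declaration.
final class XmlDeclarationDemo {

    /** Parses the given XML string (encoded as UTF-8 bytes) into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><SONG/>");
        System.out.println("version    = " + doc.getXmlVersion());
        System.out.println("encoding   = " + doc.getXmlEncoding());
        System.out.println("standalone = " + doc.getXmlStandalone());
    }
}
```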
Document Type Declaration
<!DOCTYPE SONG SYSTEM "song.dtd">
Document Type Definition
--------------------------------------------------------------------------------
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*,
PUBLISHER*, YEAR?, LENGTH?, ARTIST+)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999",
not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
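To check a document against these declarations from Java, a validating parser can be used. The following is only a sketch: it places the declarations in an internal DTD subset instead of a separate song.dtd file, uses the JDK's JAXP parser, and counts validity errors through an ErrorHandler. The class and method names are invented here. Note that, as declared, the content model expects YEAR before LENGTH and does not declare PHOTO, so the sketch uses a trimmed-down song.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: validates a SONG element against the DTD shown above.
final class DtdValidationDemo {

    // The element declarations from the article, wrapped in an internal DTD subset.
    static final String DOCTYPE = "<!DOCTYPE SONG ["
            + "<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, YEAR?, LENGTH?, ARTIST+)>"
            + "<!ELEMENT TITLE (#PCDATA)><!ELEMENT COMPOSER (#PCDATA)>"
            + "<!ELEMENT PRODUCER (#PCDATA)><!ELEMENT PUBLISHER (#PCDATA)>"
            + "<!ELEMENT LENGTH (#PCDATA)><!ELEMENT YEAR (#PCDATA)>"
            + "<!ELEMENT ARTIST (#PCDATA)>]>";

    /** Parses DOCTYPE + body with validation on and returns the number of validity errors. */
    static int countValidityErrors(String body) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        final int[] errors = {0};
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) { errors[0]++; } // validity error
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(
                    (DOCTYPE + body).getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document is not well-formed", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        String valid = "<SONG><TITLE>Hot Cop</TITLE><COMPOSER>Jacques Morali</COMPOSER>"
                + "<YEAR>1978</YEAR><LENGTH>6:20</LENGTH><ARTIST>Village People</ARTIST></SONG>";
        String missingArtist = "<SONG><TITLE>Hot Cop</TITLE>"
                + "<COMPOSER>Jacques Morali</COMPOSER></SONG>";
        System.out.println("valid song errors:   " + countValidityErrors(valid));
        System.out.println("missing ARTIST > 0:  " + (countValidityErrors(missingArtist) > 0));
    }
}
```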
XML Names
Used for element, attribute, and entity names
Can contain any alphabetic, ideographic, or numeric Unicode character
Can contain hyphen, underscore, or period
Can also contain colons but these are reserved for namespaces
Can begin with any alphabetic or ideographic character or the underscore but not digits or other punctuation marks
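The naming rules above can be approximated in a few lines of Java. This is a simplified sketch, not the exact character tables of the XML specification: it assumes that Character.isLetter and Character.isLetterOrDigit are close enough stand-ins for "alphabetic, ideographic, or numeric". The class XmlNameCheck and the method looksLikeXmlName are hypothetical.

```java
// Hypothetical helper: a rough check of the XML name rules listed above.
final class XmlNameCheck {

    /** Returns true if s follows the (simplified) rules for an XML name. */
    static boolean looksLikeXmlName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        // First character: a letter (including ideographs) or underscore,
        // never a digit or other punctuation.
        int first = s.codePointAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        // Remaining characters: letters, digits, hyphen, underscore, period,
        // or colon (the colon is reserved for namespace prefixes).
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            boolean allowed = Character.isLetterOrDigit(cp)
                    || cp == '-' || cp == '_' || cp == '.' || cp == ':';
            if (!allowed) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"SONG", "xlink:type", "vehicle.year", "1999", "-x"}) {
            System.out.println(candidate + " -> " + looksLikeXmlName(candidate));
        }
    }
}
```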
XML Namespaces
Raison d'être:
To distinguish between elements and attributes from different vocabularies with different meanings.
To group all related elements and attributes together so that a parser can easily recognize them.
Each element is given a prefix
Each prefix (as well as the empty prefix) is associated with a URI
Elements with the same URI are in the same namespace
URIs are purely formal. They do not necessarily point to a page.
--------------------------------------------------------------------------------
Namespace Syntax
Prefixed elements and attributes that are in namespaces have names that contain exactly one colon. They look like this:
rdf:description
xlink:type
xsl:template
Everything before the colon is called the prefix
Everything after the colon is called the local part or local name.
The complete name including the colon is called the qualified name or raw name.
--------------------------------------------------------------------------------
Namespace URIs
Each prefix in a qualified name is associated with a URI.
For example, all elements in XSLT 1.0 style sheets are associated with the http://www.w3.org/1999/XSL/Transform URI.
The customary prefix xsl is a shorthand for the longer URI http://www.w3.org/1999/XSL/Transform.
You can't use the URI in the element name directly.
Binding Prefixes to Namespace URIs
Prefixes are bound to namespace URIs by attaching an xmlns:prefix attribute to the prefixed element or one of its ancestors.
<svg:svg xmlns:svg="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd"
width="12cm" height="10cm">
<svg:ellipse rx="110" ry="130" />
<svg:rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg:svg>
Bindings have scope within the element where they're declared.
An SVG processor can recognize all three of these elements as SVG elements because they all have prefixes bound to the particular URI defined by the SVG specification.
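From Java, the prefix, the local name, and the bound URI are each available once the parser is made namespace aware. The sketch below assumes the JDK's JAXP DOM parser; NamespaceDemo and describeRoot are names made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical demo class: splits a qualified name into prefix, local name, and URI.
final class NamespaceDemo {

    /** Returns {prefix, local name, namespace URI} of the root element. */
    static String[] describeRoot(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in JAXP
        try {
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new String[] {root.getPrefix(), root.getLocalName(), root.getNamespaceURI()};
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<svg:svg xmlns:svg=\"http://www.w3.org/Graphics/SVG/SVG-19991203.dtd\""
                + " width=\"12cm\" height=\"10cm\">"
                + "<svg:ellipse rx=\"110\" ry=\"130\"/>"
                + "</svg:svg>";
        // Prints the prefix, the local name, and the URI the prefix is bound to.
        System.out.println(Arrays.toString(describeRoot(xml)));
    }
}
```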
The Default Namespace
Indicate that an unprefixed element and all its unprefixed descendant elements belong to a particular namespace by attaching an xmlns attribute with no prefix:
<DATASCHEMA xmlns="http://www.w3.org/2000/P3Pv1">
<DATA name="vehicle.make" type="text" short="Make"
category="preference" size="31"/>
<DATA name="vehicle.model" type="text" short="Model"
category="preference" size="31"/>
<DATA name="vehicle.year" type="number" short="Year"
category="preference" size="4"/>
<DATA name="vehicle.license.state." type="postal." short="State"
category="preference" size="2"/>
<DATA name="vehicle.license.number" type="text"
short="License Plate Number" category="preference" size="12"/>
</DATASCHEMA>
Both the DATASCHEMA and DATA elements are in the http://www.w3.org/2000/P3Pv1 namespace.
Default namespaces apply only to elements, not to attributes. Thus in the above example the name, type, short, category, and size attributes are not in any namespace. Unprefixed attributes are never in any namespace.
You can change the default namespace within a particular element by adding an xmlns attribute to the element.
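The rule that a default namespace covers elements but not attributes can be observed with a namespace-aware DOM parser. This is a minimal sketch using the JDK's JAXP parser and a shortened version of the P3P example; the class DefaultNamespaceDemo and its method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical demo class: compares the namespace of an element with that of its attribute.
final class DefaultNamespaceDemo {

    static final String P3P = "http://www.w3.org/2000/P3Pv1";

    /** Returns {namespace URI of the first DATA element, namespace URI of its name attribute}. */
    static String[] namespacesOfDataAndName(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element data = (Element) doc.getElementsByTagNameNS(P3P, "DATA").item(0);
            return new String[] {
                data.getNamespaceURI(),                         // inherited default namespace
                data.getAttributeNode("name").getNamespaceURI() // null: unprefixed attribute
            };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DATASCHEMA xmlns=\"" + P3P + "\">"
                + "<DATA name=\"vehicle.make\" type=\"text\" short=\"Make\"/>"
                + "</DATASCHEMA>";
        String[] ns = namespacesOfDataAndName(xml);
        System.out.println("DATA element:   " + ns[0]);
        System.out.println("name attribute: " + ns[1]);
    }
}
```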
How Parsers Handle Namespaces
Namespaces were added to XML 1.0 after the fact, but care was taken to ensure backwards compatibility.
An XML 1.0 parser that does not know about namespaces will most likely not have any troubles reading a document that uses namespaces.
A namespace aware parser also checks to see that all prefixes are mapped to URIs. Otherwise it behaves almost exactly like a non-namespace aware parser.
Other software that sits on top of the raw XML parser, an XSLT engine for example, may treat elements differently depending on what namespace they belong to. However, the XML parser itself mostly doesn't care as long as all well-formedness and namespace constraints are met.
A possible exception occurs in the unlikely event that elements with different prefixes belong to the same namespace or elements with the same prefix belong to different namespaces.
Many parsers have the option of whether to report namespace violations so that you can turn namespace processing on or off as you see fit.
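In JAXP, the switch for namespace processing is DocumentBuilderFactory.setNamespaceAware. The following sketch, written against the JDK's bundled parser, feeds the same document with an unbound prefix to both kinds of parser; the class NamespaceAwarenessDemo and the method parses are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: the same document parsed with namespace processing on and off.
final class NamespaceAwarenessDemo {

    /** Returns true if the document parses without errors under the given setting. */
    static boolean parses(String xml, boolean namespaceAware) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn every reported error into an exception instead of printing it.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The svg prefix is used but never bound to a URI.
        String unbound = "<svg:rect width=\"3cm\" height=\"6cm\"/>";
        System.out.println("namespace processing off: " + parses(unbound, false));
        System.out.println("namespace processing on:  " + parses(unbound, true));
    }
}
```

With namespace processing off, svg:rect is just a name that happens to contain a colon; with it on, the unmapped prefix is reported as an error.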
Canonical XML
A W3C standard for determining when two documents are the same after:
Entity references are resolved
Document is converted to Unicode
Unicode combining forms are combined
Comments are stripped
White space is normalized
Default attribute values are added
If at all possible, your programs should depend only on the canonical form of the document
Canonical form of hotcop.xml:
<?xml-stylesheet type="text/css" href="song.css"?><SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>A &amp; M Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
Trees
An XML document is a tree.
It has a root.
It has nodes.
It is amenable to recursive processing.
Not all applications agree on what the root is.
Not all applications agree on what is and isn't a node.
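Recursive processing of the tree might look like the sketch below, which walks a DOM tree depth first and collects element names in document order. It assumes the DOM view of the tree (where the Document node, not the SONG element, is the root) and the JDK's JAXP parser; TreeWalkDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical demo class: a depth-first, recursive walk over a DOM tree.
final class TreeWalkDemo {

    /** Visits node and all of its descendants, recording the name of each element. */
    static void collectElementNames(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
        }
        // Recurse into the children; text, comment, and other nodes are visited but not recorded.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectElementNames(child, out);
        }
    }

    /** Parses the XML string and returns its element names in document order. */
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            collectElementNames(doc, names); // start at the Document node, the DOM's root
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SONG><TITLE>Hot Cop</TITLE><COMPOSER>Jacques Morali</COMPOSER>"
                + "<ARTIST>Village People</ARTIST></SONG>";
        System.out.println(elementNames(xml)); // prints: [SONG, TITLE, COMPOSER, ARTIST]
    }
}
```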
Next time I will present a simple Java application.
Original source: http://www.ltesting.net