Python 5: Parsing XML Files

Published: 2008-11-05 10:39 | Author: reposted from the web | Source: reposted from the web

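The goal this time is to understand the basic flow of parsing XML in Python.

I am going to parse the document below (for background on XML, see the relevant Recommendations at http://www.w3.org):
Code: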

<catalog>
  <book isbn="1-56592-724-9">
    <title>The Cathedral &amp; the Bazaar</title>
    <author>Eric S. Raymond</author>
  </book>
  <book isbn="1-56592-051-1">
    <title>Making TeX Work</title>
    <author>Norman Walsh</author>
  </book>
  <!-- imagine more entries here... -->
</catalog>

Python's standard library includes modules for XML processing. This time we use xml.dom.minidom, a minimal implementation of the DOM API.
Code:

#! /usr/bin/python

import xml.dom.minidom
from xml.dom.minidom import Node

# Parse the file into a DOM Document.
doc = xml.dom.minidom.parse("books.xml")

# Map each book's ISBN to its title.
mapping = {}
for node in doc.getElementsByTagName("book"):
    isbn = node.getAttribute("isbn")
    L = node.getElementsByTagName("title")
    for node2 in L:
        title = ""
        # Concatenate the text nodes inside <title>.
        for node3 in node2.childNodes:
            if node3.nodeType == Node.TEXT_NODE:
                title += node3.data
        mapping[isbn] = title

# mapping now has the same value as in the SAX example:
print(mapping)

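This program shows the steps involved in parsing an XML file.
minidom.parse returns an instance of xml.dom.minidom.Document, which is minidom's implementation of the Document interface defined by DOM. Most DOM operations start from this object, such as building the ISBN-to-title mapping in the example. For the details of the DOM API, see the relevant documentation.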
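
As a minimal sketch of the points above, the snippet below parses the same catalog from an in-memory string with xml.dom.minidom.parseString (instead of reading books.xml), inspects the object that comes back, and rebuilds the ISBN-to-title mapping. The names CATALOG_XML and build_mapping are illustrative assumptions, not part of the original program.

```python
import xml.dom.minidom
from xml.dom.minidom import Node

# Assumption: the catalog is inlined as a string so the sketch runs without books.xml.
CATALOG_XML = """<catalog>
  <book isbn="1-56592-724-9">
    <title>The Cathedral &amp; the Bazaar</title>
    <author>Eric S. Raymond</author>
  </book>
  <book isbn="1-56592-051-1">
    <title>Making TeX Work</title>
    <author>Norman Walsh</author>
  </book>
</catalog>"""


def build_mapping(doc):
    """Hypothetical helper: map each book's isbn attribute to its title text."""
    mapping = {}
    for book in doc.getElementsByTagName("book"):
        isbn = book.getAttribute("isbn")
        for title_node in book.getElementsByTagName("title"):
            # Join the text nodes directly under <title>.
            text = "".join(
                child.data
                for child in title_node.childNodes
                if child.nodeType == Node.TEXT_NODE
            )
            mapping[isbn] = text
    return mapping


doc = xml.dom.minidom.parseString(CATALOG_XML)
print(type(doc).__name__)            # class name of the parsed object
print(doc.documentElement.tagName)   # name of the root element
print(build_mapping(doc))
```

Note that the ampersand in the first title is written as &amp; in the XML source; the parser hands it back as a plain & in the text node.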

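This example also introduces a new control structure, the for loop. It differs somewhat from the for loop in C and Java (although Java 5.0 also added a loop of this kind). Python's for is a for-each-in loop: it iterates over the items of a collection, rather than being driven by an initial value, a step, and a termination condition as in the traditional form.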
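
To make the contrast concrete, the sketch below walks the same list twice: once in Python's for-each-in form, and once with an index-driven loop that imitates the C style of start value, step, and end condition. The list titles and the two result lists are illustrative assumptions.

```python
titles = ["The Cathedral & the Bazaar", "Making TeX Work"]

# for-each-in: the loop variable takes each item in turn.
seen_foreach = []
for title in titles:
    seen_foreach.append(title)

# C-style imitation: range(start, stop, step) supplies the indices.
seen_indexed = []
for i in range(0, len(titles), 1):
    seen_indexed.append(titles[i])

# Both loops visit the same items in the same order.
print(seen_foreach == seen_indexed)
```

The first form is the usual choice in Python; the indexed form is mainly useful when the position itself is needed.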