Parsing XML with Python's xml.dom module

 


1. What is XML, and what are its characteristics?

XML stands for Extensible Markup Language. It can be used to mark up data and to define data types. It is a meta-language: it lets users define their own markup languages.

Example: del.xml

<pre>
<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
</pre>
<p>Structurally, XML looks much like HTML, but the two were designed for different purposes. HTML was designed to display data and focuses on how the data looks. XML was designed to <strong>transmit</strong> and <strong>store</strong> data and focuses on the data's <strong>content</strong>.</p>
<p>XML has the following characteristics:</p>
<ul>
<li>It is made up of <strong>tag pairs</strong>: <aa></aa></li>
<li>Tags can have attributes: <aa id="123"></aa></li>
<li>Tag pairs can enclose data: <aa>abc</aa></li>
<li>Tags can contain child tags, which gives a hierarchy.</li>
</ul>
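The four characteristics can be seen in a few lines of Python. This is a minimal sketch rather than part of the original walkthrough: the `SAMPLE` string and the tag names `aa` and `bb` are made up for illustration, and `minidom.parseString` is used so that no file is needed.

```python
from xml.dom import minidom

# Hypothetical sample document (not del.xml): one tag pair with an
# attribute, some embedded data, and one nested child tag.
SAMPLE = '<aa id="123">abc<bb>child</bb></aa>'

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

tag_name = root.tagName                      # the tag pair <aa>...</aa>
attr_id = root.getAttribute("id")            # the attribute id="123"
text = root.firstChild.data                  # the data enclosed by the pair
child = root.getElementsByTagName("bb")[0]   # the nested child tag

print(tag_name, attr_id, text, child.tagName)
```

Each of these calls is covered in more detail in the sections that follow.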

2. Get tag properties

<pre>
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")  # parse the XML file into a Document
root = dom.documentElement              # get the document's root element

# Every node has nodeName, nodeValue, and nodeType attributes.
print("nodeName:", root.nodeName)
# nodeValue is None for element nodes; text nodes carry their text in it.
print("nodeValue:", root.nodeValue)
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)
</pre>
<p>nodeType is the type of the node. The catalog element is of type ELEMENT_NODE.</p>
<p>The following node types are defined:</p>
'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'

Running result

nodeName: catalog
nodeValue: None
nodeType: 1
ELEMENT_NODE: 1
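
The node type constants listed above are also available on the `xml.dom.Node` class. As a rough sketch, assuming a made-up in-memory document (the `SAMPLE` string is hypothetical and is not del.xml), the following checks the type of a few different kinds of node:

```python
from xml.dom import minidom, Node

# Hypothetical sample containing a comment, an element, and a text node.
SAMPLE = "<catalog><!-- note --><maxid>4</maxid></catalog>"

doc = minidom.parseString(SAMPLE)
root = doc.documentElement
comment = root.firstChild                        # the comment node
maxid = root.getElementsByTagName("maxid")[0]
text = maxid.firstChild                          # the text node holding "4"

node_types = {
    "document": doc.nodeType == Node.DOCUMENT_NODE,
    "element": root.nodeType == Node.ELEMENT_NODE,
    "comment": comment.nodeType == Node.COMMENT_NODE,
    "text": text.nodeType == Node.TEXT_NODE,
}
print(node_types)
```

Comparing against the named constants is usually easier to read than comparing against the raw integers.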

3. Get child tags

<pre>
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement

bb = root.getElementsByTagName('maxid')  # all maxid elements below root
print(type(bb))
print(bb)
b = bb[0]
print(b.nodeName)
print(b.nodeValue)
</pre>
<p>Running result</p>
<class 'xml.dom.minicompat.NodeList'>
[<DOM Element: maxid at 0x2707a48>]
maxid
None
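
The last line of the result is None because maxid is an element node, and element nodes have no value of their own. The "4" lives in a text node that is a child of the element. A small sketch, assuming a cut-down copy of the document held in a string (`SAMPLE` is hypothetical):

```python
from xml.dom import minidom

# Hypothetical cut-down version of del.xml.
SAMPLE = "<catalog><maxid>4</maxid></catalog>"

root = minidom.parseString(SAMPLE).documentElement
maxid = root.getElementsByTagName("maxid")[0]

element_value = maxid.nodeValue      # None: element nodes carry no value
text_node = maxid.firstChild         # the child text node
text_value = text_node.nodeValue     # "4", the same string as text_node.data

print(element_value, text_value)
```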

4. Get tag attribute values

<pre>
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement

itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))

itemlist = root.getElementsByTagName("item")
item = itemlist[0]   # elements are told apart by their position in itemlist
print(item.getAttribute("id"))
item2 = itemlist[1]  # elements are told apart by their position in itemlist
print(item2.getAttribute("id"))
</pre>
<p>Running result</p>
pytest
123456
4
2
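
Relying on list position can be fragile if the document changes. One possible alternative, sketched here under the assumption that id values are unique, is to index the elements by attribute value. The `SAMPLE` string is a hypothetical cut-down copy of del.xml, and the "name" attribute is deliberately one that does not exist.

```python
from xml.dom import minidom

# Hypothetical cut-down version of del.xml.
SAMPLE = """<catalog>
  <login username="pytest" passwd="123456">
    <item id="4"/>
  </login>
  <item id="2"/>
</catalog>"""

root = minidom.parseString(SAMPLE).documentElement
items = root.getElementsByTagName("item")

# getAttribute returns an empty string for a missing attribute, so
# hasAttribute is needed to tell "absent" apart from "empty".
missing = items[0].getAttribute("name")
has_name = items[0].hasAttribute("name")

# Look elements up by attribute value instead of by list position.
by_id = {item.getAttribute("id"): item for item in items}
parent_of_4 = by_id["4"].parentNode.tagName
parent_of_2 = by_id["2"].parentNode.tagName

print(repr(missing), has_name, parent_of_4, parent_of_2)
```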

5. Get the data between a tag pair

<pre>
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement

itemlist = root.getElementsByTagName('caption')
item = itemlist[0]
print(item.firstChild.data)   # text of the first caption

item2 = itemlist[1]
print(item2.firstChild.data)  # text of the second caption
</pre>
<p>Running result</p>
Python
test
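
Reading firstChild.data works when the element contains exactly one text node. For an empty element firstChild is None, and when a comment splits the text only the first piece is returned. A hedged sketch of a more tolerant helper follows; `get_text` and the `SAMPLE` string are hypothetical names introduced here, not part of minidom or of del.xml.

```python
from xml.dom import minidom


def get_text(element):
    """Join the data of all direct text children (hypothetical helper)."""
    parts = []
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
    return "".join(parts)


# Hypothetical sample: a normal caption, an empty one, and one whose
# text is split into two text nodes by a comment.
SAMPLE = (
    "<catalog>"
    "<caption>Python</caption>"
    "<caption/>"
    "<caption>Zo<!-- x -->pe</caption>"
    "</catalog>"
)

captions = minidom.parseString(SAMPLE).getElementsByTagName("caption")
texts = [get_text(c) for c in captions]
first_piece = captions[2].firstChild.data  # only the text before the comment

print(texts, first_piece)
```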

6. Example

<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<users>
    <user id="1000001">
        <username>Admin</username>
        <email>admin@live.cn</email>
        <age>23</age>
        <sex>boy</sex>
    </user>
    <user id="1000002">
        <username>Admin2</username>
        <email>admin2@live.cn</email>
        <age>22</age>
        <sex>boy</sex>
    </user>
    <user id="1000003">
        <username>Admin3</username>
        <email>admin3@live.cn</email>
        <age>27</age>
        <sex>boy</sex>
    </user>
    <user id="1000004">
        <username>Admin4</username>
        <email>admin4@live.cn</email>
        <age>25</age>
        <sex>girl</sex>
    </user>
    <user id="1000005">
        <username>Admin5</username>
        <email>admin5@live.cn</email>
        <age>20</age>
        <sex>boy</sex>
    </user>
    <user id="1000006">
        <username>Admin6</username>
        <email>admin6@live.cn</email>
        <age>23</age>
        <sex>girl</sex>
    </user>
</users>
</pre>
<p>Save the XML above as user.xml. The goal is to print each user's name, email, age, and sex.</p>
<p><strong>Reference code</strong></p>
<pre>
# -*- coding: utf-8 -*-
from xml.dom import minidom


def get_attrvalue(node, attrname):
    """Return the value of an attribute, or '' if there is no node."""
    return node.getAttribute(attrname) if node else ''


def get_nodevalue(node, index=0):
    """Return the value of the child node at the given index."""
    return node.childNodes[index].nodeValue if node else ''


def get_xmlnode(node, name):
    """Return all elements below node that have the given tag name."""
    return node.getElementsByTagName(name) if node else []


def get_xml_data(filename='user.xml'):
    doc = minidom.parse(filename)
    root = doc.documentElement

    user_nodes = get_xmlnode(root, 'user')
    print("user_nodes:", user_nodes)
    user_list = []
    for node in user_nodes:
        user_id = get_attrvalue(node, 'id')
        node_name = get_xmlnode(node, 'username')
        node_email = get_xmlnode(node, 'email')
        node_age = get_xmlnode(node, 'age')
        node_sex = get_xmlnode(node, 'sex')

        user = {
            'id': int(user_id),
            'username': get_nodevalue(node_name[0]),
            'email': get_nodevalue(node_email[0]),
            'age': int(get_nodevalue(node_age[0])),
            'sex': get_nodevalue(node_sex[0]),
        }
        user_list.append(user)
    return user_list


def test_load_xml():
    user_list = get_xml_data()
    for user in user_list:
        print('-----------------------------------------------------')
        if user:
            user_str = 'No.:\t%d\nname:\t%s\nsex:\t%s\nage:\t%s\nEmail:\t%s' % (
                int(user['id']), user['username'], user['sex'],
                user['age'], user['email'])
            print(user_str)


if __name__ == "__main__":
    test_load_xml()
</pre>
<p><strong>Result</strong></p>
C:\Users\jihite\Desktop\xml>python user.py
user_nodes: [<DOM Element: user at 0x2758c48>, <DOM Element: user at 0x2756288>, <DOM Element: user at 0x2756888>, <DOM Element: user at 0x2756e88>, <DOM Element: user at 0x275e4c8>, <DOM Element: user at 0x275eac8>]
-----------------------------------------------------
No.:    1000001
name:   Admin
sex:    boy
age:    23
Email:  admin@live.cn
-----------------------------------------------------
No.:    1000002
name:   Admin2
sex:    boy
age:    22
Email:  admin2@live.cn
-----------------------------------------------------
No.:    1000003
name:   Admin3
sex:    boy
age:    27
Email:  admin3@live.cn
-----------------------------------------------------
No.:    1000004
name:   Admin4
sex:    girl
age:    25
Email:  admin4@live.cn
-----------------------------------------------------
No.:    1000005
name:   Admin5
sex:    boy
age:    20
Email:  admin5@live.cn
-----------------------------------------------------
No.:    1000006
name:   Admin6
sex:    girl
age:    23
Email:  admin6@live.cn 
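
The reference code looks up every field by name. A more generic variant, sketched here as one possible approach, walks the child nodes of each user element directly. It has to skip the whitespace-only text nodes that the indentation of the XML produces. `user_to_dict` is a hypothetical helper, and `SAMPLE` is a one-user excerpt of the file above.

```python
from xml.dom import minidom

# One-user excerpt of user.xml, held in a string for illustration.
SAMPLE = """<users>
  <user id="1000001">
    <username>Admin</username>
    <email>admin@live.cn</email>
    <age>23</age>
    <sex>boy</sex>
  </user>
</users>"""


def user_to_dict(user_node):
    """Map each child element's tag name to its text (hypothetical helper)."""
    record = {"id": user_node.getAttribute("id")}
    for child in user_node.childNodes:
        # Skip the whitespace-only text nodes created by the indentation.
        if child.nodeType != child.ELEMENT_NODE:
            continue
        record[child.tagName] = child.firstChild.data if child.firstChild else ""
    return record


root = minidom.parseString(SAMPLE).documentElement
user = root.getElementsByTagName("user")[0]
raw_children = len(user.childNodes)  # 4 elements plus 5 whitespace text nodes
record = user_to_dict(user)

print(raw_children, record)
```

All values come back as strings here; converting id and age with int, as the reference code does, is left to the caller.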

7. Summary

minidom.parse(filename)
Loads and parses an XML file.

doc.documentElement
Gets the document's root element.

node.getAttribute(AttributeName)
Gets the value of an attribute of an XML node.

node.getElementsByTagName(TagName)
Gets the collection of elements below the node that have the given tag name.

node.childNodes
Returns the list of child nodes.

node.childNodes[index].nodeValue
Gets the value of the child node at the given index.

node.firstChild
Accesses the first child node. Equivalent to node.childNodes[0] when the node has children.

doc = minidom.parse(filename)
doc.toxml('UTF-8')
Returns the XML that the node represents, encoded as UTF-8.

a = node.attributes["id"]
a.name   # the attribute name, "id" in this case
a.value  # the attribute value
Accesses an element's attributes.
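
The last two entries, toxml and attributes, are not exercised in the earlier sections. A minimal sketch, assuming Python 3 and a made-up one-element document:

```python
from xml.dom import minidom

# Hypothetical one-element document, parsed from a string.
doc = minidom.parseString('<item id="4"><caption>test</caption></item>')
node = doc.documentElement

attr = node.attributes["id"]       # an attribute node from the element's map
attr_pair = (attr.name, attr.value)

as_text = node.toxml()             # str: the markup of this element only
as_bytes = doc.toxml("UTF-8")      # bytes: whole document with an XML declaration

print(attr_pair)
print(as_text)
print(as_bytes)
```

Passing an encoding to toxml makes it return bytes instead of str on Python 3, which matters when the result is written to a file.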
