注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

Bioinformatics home

 
 
 

日志

 
 

Using XML in MySQL 5.1 and 6.0  

2008-01-12 09:51:26|  分类: 默认分类 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

Using XML in MySQL 5.1 and 6.0

Using XML in MySQL 5.1 and 6.0

Table of Contents

In this article, we discuss the XML functionality available in MySQL, with an emphasis on new features coming online in MySQL 5.1 and MySQL 6.0. We assume that you already have a working knowledge of XML, and that you know what the terms “valid” and “well-formed” mean. We also assume that you have some knowledge of XPath.

We cover the following topics:

  • Methods for outputting MySQL data in XML format, including the use of

    lib_mysqludf_xql

    , a third-party library that can be used for this task

  • Using the functions (new in MySQL 5.1)

    ExtractValue()

    and

    UpdateXML()

    for working with XML and XPath

  • Storing data from XML in a MySQL database using the

    LOAD XML

    statement (implemented in MySQL 6.0)

  • Some security considerations to keep in mind when using these techniques

http://dev.mysql.com/doc/.

Beginning with MySQL 5.1.12, the

<field>

and

<row>

format produced by the mysql client matches that produced by mysqldump. However, the root element in the output of mysql

--xml

, is

<resultset>

, whose

statement

attribute contains the SQL statement passed to mysql, as shown here:

shell>

mysql -uroot --xml -e 'SELECT * FROM xmltest.cities ORDER BY name'

<?xml version="1.0"?><resultset statement="SELECT * FROM xmltest.cities ORDER BY name" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <row> <field name="name">Ciudad de Méico</field> <field name="country">Mexico</field> <field name="population">8591309</field> </row> <row> <field name="name">Istanbul</field> <field name="country">Turkey</field> <field name="population">8787958</field> </row> <row> <field name="name">Jakarta</field> <field name="country">Indonesia</field> <field name="population">9604900</field> </row> <row> <field name="name">Karachi</field> <field name="country">Pakistan</field> <field name="population">9269265</field> </row> <row> <field name="name">Moscow</field> <field name="country">Russian Federation</field> <field name="population">8389200</field> </row> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> <field name="population">10500000</field> </row> <row> <field name="name">New York</field> <field name="country">United States</field> <field name="population">8008278</field> </row> <row> <field name="name">São Paulo</field> <field name="country">Brazil</field> <field name="population">9968485</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> <field name="population">9981619</field> </row> <row> <field name="name">Shanghai</field> <field name="country">China</field> <field name="population">9696300</field> </row></resultset>

The output of mysqldump

--xml

is structured somewhat differently, as shown here:

shell>

mysqldump --xml xmltest cities

<?xml version="1.0"?><mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><database name="xmltest"> <table_structure name="cities"> <field Field="name" Type="char(35)" Null="NO" Key="" Default="" Extra=""/> <field Field="country" Type="char(52)" Null="NO" Key="" Default="" Extra=""/> <field Field="population" Type="int(11)" Null="NO" Key="" Default="0" Extra=""/> <options Name="cities" Engine="MyISAM" Version="10" Row_format="Fixed" Rows="10" Avg_row_length="92" Data_length="920" Max_data_length="25895697857380351" Index_length="1024" Data_free="0" Create_time="2007-08-24 14:19:42" Update_time="2007-08-24 14:19:42" Collation="latin1_swedish_ci" Create_options="" Comment="" /> </table_structure> <table_data name="cities"> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> <field name="population">10500000</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> <field name="population">9981619</field> </row> <row> <field name="name">São Paulo</field> <field name="country">Brazil</field> <field name="population">9968485</field> </row> <row> <field name="name">Shanghai</field> <field name="country">China</field> <field name="population">9696300</field> </row> <row> <field name="name">Jakarta</field> <field name="country">Indonesia</field> <field name="population">9604900</field> </row> <row> <field name="name">Karachi</field> <field name="country">Pakistan</field> <field name="population">9269265</field> </row> <row> <field name="name">Istanbul</field> <field name="country">Turkey</field> <field name="population">8787958</field> </row> <row> <field name="name">Ciudad de México</field> <field name="country">Mexico</field> <field name="population">8591309</field> </row> <row> <field name="name">Moscow</field> <field name="country">Russian Federation</field> <field name="population">8389200</field> </row> <row> <field name="name">New York</field> <field name="country">United States</field> <field name="population">8008278</field> </row> </table_data></database></mysqldump>

Note

The formatting of some of the XML output has been altered slightly to fit the space available on a printed page.

mysqldump

--xml

employs the following elements:

  • The root element of the XML document representing the dump is

    <mysqldump>

  • The table definitions and data belonging to each database represented in the dump are wrapped together in a

    <database>

    element whose

    name

    attribute value is the name of that database

  • Each table definition is wrapped in a

    <table_structure>

    element

  • The data from each table is wrapped in a

    <table_data>

    element, and is comprised of

    <field>

    and

    <row>

    elements

To save the output of either mysql or mysqldump to a file, simply use the

>

operator with the desired filename, as shown here:

shell>

mysql -uroot --xml xmltest -e 'SELECT name FROM cities LIMIT 2' > /tmp/2cities.xml

shell>

more /tmp/2cities.xml

<?xml version="1.0"?><resultset statement="SELECT name,country FROM cities LIMIT 2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> </row></resultset>

Note

An XML namespace declaration is included in the XML output of both mysql and mysqldump, beginning with MySQL 5.1.18.

There may be times when your application expects an XML format that is different from what is normally obtain from mysql or mysqldump. Suppose, for example, that your application expects a format such as this one:

<cities> <city>

name1

<city> <city>

name2

<city> <-- etc. --></cities>

One way to obtain such output would be to use concatenation with the

CONCAT()

and

GROUP_CONCAT()

functions, like this:

mysql>

SELECT CONCAT('\n<cities>\n',

->

GROUP_CONCAT(' <city>', name, '</city>\n' SEPARATOR ''),

->

'</cities>') AS xmldoc

->

FROM cities\G

*************************** 1. row ***************************xmldoc:<cities> <city>Mumbai (Bombay)</city> <city>Seoul</city> <city>Sã Paulo</city> <city>Shanghai</city> <city>Jakarta</city> <city>Karachi</city> <city>Istanbul</city> <city>Ciudad de Méico</city> <city>Moscow</city> <city>New York</city></cities>1 row in set (0.01 sec)

Another such example is this format:

<cities> <city name="

name1

" population="

population1

"/> <city name="

name2

" population="

population2

"/> <-- etc. --></cities>

This can be produced as shown here:

mysql>

SELECT CONCAT(

->

'\n<cities>',

->

GROUP_CONCAT(

->

'\n\t<city name="', name, '" population="', population, '"/>'

->

SEPARATOR ''

->

),

->

'\n</cities>'

->

) AS xmldoc

->

FROM cities\G

*************************** 1. row ***************************xmldoc:<cities> <city name="Mumbai (Bombay)" population="10500000"/> <city name="Seoul" population="9981619"/> <city name="Sã Paulo" population="9968485"/> <city name="Shanghai" population="9696300"/> <city name="Jakarta" population="9604900"/> <city name="Karachi" population="9269265"/> <city name="Istanbul" population="8787958"/> <city name="Ciudad de Méico" population="8591309"/> <city name="Moscow" population="8389200"/> <city name="New York" population="8008278"/></cities>1 row in set (0.01 sec)

You can see that generating even relatively simple XML output by such means can rapidly grow very complicated. While you can wrap such

SELECT

statements in stored procedures, the fact that MySQL stored routines cannot (at least at present) take a variable number of arguments. Fortunately, there is a third-party library that can be used with MySQL to make this task easier.

Using the

lib_mysqludf_xql

library. The

lib_mysqludf_xql

library is a set of UDFs originally written and placed under the GPL by Arnold Daniels. The source for these is now housed at www.mysqludf.org.

mysql>

SELECT xql_element('city', name) FROM cities;

+-------------------------------+| xql_element('city', name) |+-------------------------------+| <city>Mumbai (Bombay)</city> || <city>Seoul</city>|| <city>Sã Paulo</city> || <city>Shanghai</city> || <city>Jakarta</city> || <city>Karachi</city> || <city>Istanbul</city> || <city>Ciudad de Méico</city> || <city>Moscow</city> || <city>New York</city> |+--------------------------------+10 rows in set (0.00 sec)mysql>

SELECT xql_element('city', NULL, name AS 'name') FROM cities;

+-------------------------------------------+| xql_element('city', NULL, name AS 'name') |+-------------------------------------------+| <city name="Mumbai (Bombay)"/>|| <city name="Seoul"/> || <city name="S&#x3BE0;Paulo"/> || <city name="Shanghai"/> || <city name="Jakarta"/> || <city name="Karachi"/> || <city name="Istanbul"/> || <city name="Ciudad de M&#x9E29;co"/> || <city name="Moscow"/> || <city name="New York"/> |+-------------------------------------------+10 rows in set (0.00 sec)mysql>

SELECT xql_element('city', NULL, name AS 'name', population AS 'population')

->

FROM cities;

+-----------------------------------------------------------------------+| xql_element('city', NULL, name AS 'name', population AS 'population') |+-----------------------------------------------------------------------+| <city name="Mumbai (Bombay)" population="10500000"/> || <city name="Seoul" population="9981619"/> || <city name="S&#x3BE0;Paulo" population="9968485"/> || <city name="Shanghai" population="9696300"/> || <city name="Jakarta" population="9604900"/> || <city name="Karachi" population="9269265"/> || <city name="Istanbul" population="8787958"/> || <city name="Ciudad de M&#x9E29;co" population="8591309"/> || <city name="Moscow" population="8389200"/> || <city name="New York" population="8008278"/> |+-----------------------------------------------------------------------+10 rows in set (0.00 sec)

the section called “XPath Functionality”); however having to do so every time you want to access the data s not terribly convenient. However, MySQL developer Alexander Barkov has written a stored procedure

xmldump_load

that extracts data from the XML elements and attributes found in a file created using mysqldump --xml, and inserts this data into the columns of a MySQL table. The source of this stored procedure is shown here:

DELIMITER |DROP PROCEDURE IF EXISTS xmldump_load |CREATE PROCEDURE xmldump_load(file_name VARCHAR(128), database_name VARCHAR(128), table_name VARCHAR(128))BEGIN DECLARE xml TEXT; DECLARE nrows INT; DECLARE rownum INT DEFAULT 1; DECLARE ncols INT; DECLARE colnum INT DEFAULT 1; DECLARE ins_list TEXT DEFAULT ''; DECLARE val_list TEXT DEFAULT ''; DECLARE tmp VARCHAR(255); # load the XML file's contents into a string SET xml = LOAD_FILE(file_name); # get the number of <row>s in this table SET nrows = ExtractValue(xml, 'count(/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row)'); # get the number of <field>s (columns) in this table SET ncols = ExtractValue(xml, 'count(/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row[1]/field)'); # for each <row> WHILE rownum <= nrows DO # for each <field> (column) WHILE colnum <= ncols DO SET tmp = ExtractValue(xml, '/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row[$rownum]/field[$colnum]/@name'); SET ins_list = CONCAT(ins_list, tmp, IF(colnum<ncols, ',', '')); SET tmp = ExtractValue(xml, '/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row[$rownum]/field[$colnum]'); SET val_list = CONCAT(val_list, '''', tmp ,'''', IF(colnum<ncols, ',', '')); SET colnum = colnum + 1; END WHILE; SET @ins_text = CONCAT('INSERT INTO t1 (', ins_list, ') VALUES (', val_list, ')'); SET ins_list = ''; SET val_list = ''; PREPARE stmt FROM @ins_text; EXECUTE stmt; SET rownum = rownum + 1; SET colnum = 1; END WHILE;END |DELIMITER ;

This stored procedure employs XPath user variables (see User variables in XPath expressions) and so runs only under MySQL 5.1.20 or later. You can find a copy of it in the file

create-xmpdump-load.sql

included with this article.

Important

The caller of this stored procedure must have the MySQL

FILE

privilege.

You can test

xmldump_load

using the supplied

test.sh

script, shown here:

# Demo for xmldump_load()DB="test"# Change "root" and "mypass" in the following 2 lines to # a user and password appropriate to your installation MYSQL="mysql -uroot -pmypass --socket=/tmp/mysql.sock"MYSQLDUMP="mysqldump -uroot -pmypass --socket=/tmp/mysql.sock"# Creates a test table with two columns and fills it with some data$MYSQL $DB <> ENDSELECT VERSION();DROP TABLE IF EXISTS t1;CREATE TABLE t1 (a INT, b VARCHAR(128));INSERT INTO t1 VALUES (1,'11111');INSERT INTO t1 VALUES (2,'22222');INSERT INTO t1 VALUES (3,'33333');INSERT INTO t1 VALUES (4,'44444');INSERT INTO t1 VALUES (5,'55555');INSERT INTO t1 VALUES (6,'66666');INSERT INTO t1 VALUES (7,'77777');INSERT INTO t1 VALUES (8,'88888');INSERT INTO t1 VALUES (9,'99999');END# Dumps data into an XML file$MYSQLDUMP --xml $DB t1 > /tmp/t1.xml# Empties the table$MYSQL --execute="DELETE FROM t1" $DB # Creates the procedure, calls it, and # makes sure we've restored all records$MYSQL $DB << END\. create-xmldump-load.sqlCALL xmldump_load('/tmp/t1.xml', 'test', 't1');SELECT * FROM t1;END# Performs cleanup# Comment out the remaining lines if you wish to # preserve the stored procedure, table, and XML# file following the test run$MYSQL $DB << ENDDROP PROCEDURE xmldump_load;DROP TABLE t1;ENDrm /tmp/t1.xml

Using

LOAD XML

. An implementation contributed by Erik Wetterberg of a new SQL statement has been accepted for MySQL 6.0, and is available beginning with version 6.0.3.

LOAD XML

greatly simplifies the task of importing data from an XML file into a MySQL table, without having to use the stored procedure discussed previously (see Importing mysqldump XML data into MySQL table columns using a stored procedure). The syntax for this statement is as shown here:

LOAD XML [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE '

filename

'[REPLACE | IGNORE]INTO TABLE [

db_name

.]

tbl_name

[CHARACTER SET

charset_name

][ROWS IDENTIFIED BY '<

tagname

>'][IGNORE

number

[LINES | ROWS]][(

column_or_user_var

,...)][SET

col_name

=

expr

,...]

This statement reads data from an XML file into a table, and acts as the complement mysql or mysqldump in XML output mode (that is, using the

--xml

option). The

filename

must be given as a literal string. The

tagname

in the optional

ROWS IDENTIFIED BY

clause must also be given as a literal string, and must be surrounded by angle brackets (

<

and

>

).

LOAD DATA

clauses used in

LOAD XML

. If you have used MySQL's

LOAD DATA

statement before, then you should find the following clauses familiar, since they work in essentially the same way for

LOAD XML

as they do for

LOAD DATA

:

  • LOW_PRIORITY

    or

    CONCURRENT

  • LOCAL

  • REPLACE

    or

    IGNORE

  • CHARACTER SET

  • The

    IGNORE number LINES

    clause is analogous to the

    LOAD DATA

    statement's

    IGNORE ... LINES

    clause. LOAD XML also accepts

    IGNORE number ROWS

    . In either case, the clause causes the first

    number

    rows in the XML file to be skipped, and not to be imported.

  • (column_or_user_var,...)

  • SET

See

LOAD DATA INFILE

Syntax in the MySQL 5.1 Manual for more information about these clauses.

ROWS IDENTIFIED BY '<

tagname

>'.

Accepted input formats.

LOAD XML

supports three different XML formats:

  • Attributes are interpreted as column names, and attribute values as interpreted as column values:

    <

    row

    column1

    ="

    value1

    "

    column2

    ="

    value2

    " .../>

  • Tag names are interpreted as column names, and the content of these tags are interpreted as column values:

    <

    row

    > <

    column1

    >

    value1

    </

    column1

    > <

    column2

    >

    value2

    </

    column2

    ></

    row

    >

  • Table column names are derived from the

    name

    attributes of

    <field>

    tags, and column values are taken from the contents of these tags:

    <row> <field name='

    column1

    '>

    value1

    </field> <field name='

    column2

    '>

    value2

    </field></row>

    This is the format used by MySQL tools such as mysqldump

The import routine used by

LOAD XML

automatically detects the format used for each row and interprets it correctly, matching based on the tag or attribute name and the column name. You can easily verify this for yourself, by creating an XML file that uses any two or even all three formats and then using

LOAD XML

to import it into a table.

Important

You must have the

FILE

privilege to use

LOAD XML

.

Using

LOAD_FILE()

demonstrates how to get the content of an XML file into a MySQL database, but the problem of getting at the actual data remains. One way to accomplish this is to use the

ExtractValue()

function.

The syntax for this function is shown here:

ExtractValue(

xml_fragment

,

xpath_expression

)

ExtractValue()

takes two arguments. The first of these is the XML fragment to be tested; the second is the XPath expression to be matched.

ExtractValue()

example. Let' see how we might obtain the name of the first city in the document (which we saved as

2cities.xml

). We do this in two steps. First, we get the XML from the

xmldocs

table and place into a user variable:

mysql>

SELECT doc_content FROM xmldocs LIMIT 1 INTO @xml;

Query OK, 1 row affected (0.00 sec)mysql>

SELECT @xml\G

*************************** 1. row ***************************@xml: <?xml version="1.0"?><resultset statement="SELECT name,country FROM cities LIMIT 2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> </row></resultset>1 row in set (0.00 sec)

Now we use

ExtractValue()

with

@xml

as the first argument. For the second argument, we employ an XPath expression that means “the content of the first

<field>

element contained in the second <row> element found anywhere in the document”:

mysql>

SELECT ExtractValue(@xml, '//row[2]/field[1]');

+-----------------------------------------+| ExtractValue(@xml, '//row[2]/field[1]') |+-----------------------------------------+| Seoul |+-----------------------------------------+1 row in set (0.00 sec)

Another way to do this would be to use an XPath expression that means “the content of the second

<field>

having the

name

attribute "name"”:

mysql>

SELECT ExtractValue(@xml, '//field[@name="name"][2]');

+------------------------------------------------+| ExtractValue(@xml, '//field[@name="name"][2]') |+------------------------------------------------+| Seoul |+------------------------------------------------+1 row in set (0.00 sec)

As you can see, the result (

Seoul

) is the same as before, which is exactly what we would expect.

XML Path Language (XPath) 1.0 standard. The Zvon.org XPath Tutorial also provides a useful resource for those who are new to XPath or who would like a refresher in XPath basics.

the section called “Loading Data from Files”).

In other words, the

FILE

privilege affects only whether the client can read files on the server; it has no bearing on whether the client can read files on the local filesystem.

Blind XPath Injection (PDF, 46KB).

It is also important to check the output being sent back to the client. For an example, let's revisit our previous example; however, this time, instead of using PHP's XPath functionality, we use the MySQL

ExtractValue()

function:

mysql>

SELECT ExtractValue(

->

LOAD_FILE('users.xml'),

->

'//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'

->

) AS id;

+-------------------------------+| id |+-------------------------------+| 00327 13579 02403 42354 28570 |+-------------------------------+1 row in set (0.01 sec)

Because

ExtractValue()

returns multiple matches as a single space-delimited string, this injection attack provides to the user in a single row every valid ID contained within

users.xml

. As an extra safeguard, you should also test output before returning it to the user. Here is a simple example:

mysql>

SELECT @id = ExtractValue(

->

LOAD_FILE('users.xml'),

->

'//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'

->

);

Query OK, 0 rows affected (0.00 sec)mysql>

SELECT IF(

->

INSTR(@id, ' ') = 0,

->

@id,

->

'Unable to retrieve user ID')

->

AS singleID;

+----------------------------+| singleID |+----------------------------+| Unable to retrieve user ID |+----------------------------+1 row in set (0.00 sec)

In general, the guidelines for returning data to users securely are the same as for accepting user input. These can be summed up as:

  • Always test outgoing data for type and allowable values.

  • Never allow unauthorized users to view error messages that might provide information about the application that could be used to exploit it.

Following these principles in architecting an XML application can help to insure that it is a secure application.

  评论这张
 
阅读(898)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017