Strange error: doc.setContent(data) returns false
Hi,
I have a problem parsing some XML files from MegaUpload. In short, my application extracts links from a MegaUpload folder URL. It works well, but sometimes I get an error. First of all, here is the code:
Code:
void Dialog::parseXml()
{
    muitems.clear();
    if (!internetJob->isRequestAborted() && internetJob->getStatusCode() == 200)
    {
        qDebug() << "Checking for data!!\n";
        if (data.isEmpty()) {
            qDebug() << "Megaupload returned an empty result!\n";
            return;
        }
        if (!doc.setContent(data)) {
            qDebug() << "The XML obtained from Megaupload is invalid.";
            return;
        }
        // The document element must be <FILES>
        QDomElement root = doc.documentElement();
        if (root.tagName() != "FILES") {
            qDebug() << "The XML file is invalid.";
            return;
        }
        // Walk the children of <FILES> and collect every <ROW>
        QDomNode n = root.firstChild();
        while (!n.isNull())
        {
            QDomElement e = n.toElement(); // null if n is not an element
            if (!e.isNull())
            {
                if (e.tagName() == "ROW")
                {
                    MegaUploadItem mui;
                    mui.name = e.attribute("name", "");
                    mui.url = e.attribute("url", "");
                    muitems.append(mui);
                }
            }
            n = n.nextSibling();
        }
    }
    qSort(muitems); // requires operator< for MegaUploadItem
}
Most of the time the code above works and my QVector muitems is filled with the needed data (file name and URL). Sometimes, though, doc.setContent(data) fails and I don't know why. In fact, if I comment out the return statement in the setContent() error branch, parseXml() continues and the data is retrieved correctly. Other times I get the same error, but only the first sibling is parsed (I hope that is clear). Below are three different URLs: with the first I get no error; with the second, doc.setContent(data) returns false, but with that return commented out the data is retrieved correctly; the third gives the error and the while(!n.isNull()) {...} loop runs only once.
http://www.megaupload.com/xml/folder...derid=D4HQHPLJ
http://www.megaupload.com/xml/folder...derid=0JY6SVP1
http://www.megaupload.com/xml/folder...derid=QEKO90W1
You can paste these URLs into your browser and look at the generated XML files. They are structurally the same...
So my questions are:
1) Why does doc.setContent(data) return false? How can I get more verbose output?
2) Why is only the first tag parsed and not the remaining ones?
I was thinking of using QRegExp for parsing instead, but the captured text is wrong. Here is a sample row of the XML file:
Quote:
<ROW name="VIDEO_TS.part06.rar" name_cut="VIDEO_TS.part06.rar" size="400 MB" url="http://www.megaupload.com/?d=3B1SORP1" downloadid = "3B1SORP1" sizeinbytes="419430400" expired="0"></ROW>
and here is my pattern:
Code:
QRegExp pattern("<ROW\\s((\\w+)\\s*=\\s*(\"[^\"]\"))+></ROW>");
The regular expression is probably wrong; could someone review it?
Best regards.
Re: Strange error: doc.setContent(data) returns false
Hello Giuseppe,
I'm not going to comment on the problem you're having with XML, since I haven't used it enough to be knowledgeable in that area. I can comment on your regexp, however.
To start with, using regular expressions for XML is not a good idea. They are far too brittle. If you choose to use one anyway, here is your corrected expression. I use a free program called Regex Coach to proofread mine. I didn't test this in Qt, but it should work.
Code:
QRegExp pattern("<ROW\\s+((\\w+)\\s*=\\s*(\"[^\"]*\")\\s*)+></ROW>");
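A quick way to see what that corrected expression actually captures is sketched below. It uses the standard library's std::regex (ECMAScript syntax) in place of QRegExp, on the assumption that both engines treat a repeated group the same way: the group is matched once per attribute, but only its last iteration is kept. The function name lastCapturedPair is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Sketch with std::regex standing in for QRegExp.
// Matches a whole <ROW ...></ROW> line with the corrected pattern and returns
// what is left in groups 2 and 3 (attribute name, quoted value) after the match.
std::pair<std::string, std::string> lastCapturedPair(const std::string &line)
{
    // Raw string literal: no extra C++ escaping for \w, \s or the quotes.
    static const std::regex rowPattern(
        R"re(<ROW\s+((\w+)\s*=\s*("[^"]*")\s*)+></ROW>)re");
    std::smatch m;
    if (!std::regex_search(line, m, rowPattern))
        return {};
    // The group (...)+ ran once per attribute, but each iteration overwrote
    // groups 2 and 3, so they describe only the final attribute on the line.
    return {m[2].str(), m[3].str()};
}
```

Fed the sample ROW line from the first post, the whole line matches, yet the returned pair is expired and "0": the earlier attributes were matched but not kept.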
Re: Strange error: doc.setContent(data) returns false
Quote:
Originally Posted by init2null
To start with, using regular expressions for XML is not a good idea. They are far too brittle. If you choose to use one anyway, here is your corrected expression. I use a free program called Regex Coach to proofread mine. I didn't test this in Qt, but it should work.
Code:
QRegExp pattern("<ROW\\s+((\\w+)\\s*=\\s*(\"[^\"]*\")\\s*)+></ROW>");
I have tested your pattern, but it doesn't seem to work: it matches only the last attribute name/value pair.
I have used your pattern in the following piece of code:
Code:
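while (!ts.atEnd()) {
    QString line = ts.readLine();          // read the next line of the file
    if (pattern.indexIn(line) == -1)
        continue;                          // no <ROW> element on this line
    QStringList caps = pattern.capturedTexts();
    // caps[0] is the whole match, caps[1] the outer repeated group,
    // caps[2] and caps[3] the attribute name and value
    for (int i = 2; i + 1 < caps.size(); i += 2)
    {
        QString attributeName = caps.at(i);
        QString attributeValue = caps.at(i + 1);
        // do something with that data now
        qDebug() << attributeName + "=" + attributeValue;
    }
}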
Thanks
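One common way around the "only the last pair" behaviour described above is to drop the repeated group and instead match a single name="value" pattern over and over along the line, collecting every hit. The sketch below does that with std::regex and std::sregex_iterator standing in for QRegExp; the function name rowAttributes and the choice of std::map as the result type are illustrative assumptions.

```cpp
#include <map>
#include <regex>
#include <string>

// Sketch with std::regex standing in for QRegExp: match one name="value"
// pair at a time and keep going until the line is exhausted.
// The quotes are left out of the value capture.
std::map<std::string, std::string> rowAttributes(const std::string &line)
{
    static const std::regex pairPattern(R"re((\w+)\s*=\s*"([^"]*)")re");
    std::map<std::string, std::string> attributes;
    for (std::sregex_iterator it(line.begin(), line.end(), pairPattern), end;
         it != end; ++it)
    {
        const std::smatch &m = *it;
        // A repeated attribute name would overwrite the earlier value.
        attributes[m[1].str()] = m[2].str();
    }
    return attributes;
}
```

Because each match consumes the whole quoted value, text inside a value such as the d=3B1SORP1 in the url attribute is not picked up as a separate pair. With QRegExp the same loop is usually written by calling indexIn(line, pos) and advancing pos by matchedLength() until indexIn returns -1.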
Re: Strange error: doc.setContent(data) returns false
I guess a regexp can only capture one value for each set of parentheses. Since that's the case, either repeat this snippet once for each key-value pair
Code:
(\\w+)\\s*=\\s*(\"[^\"]*\")\\s*
or parse the XML. I looked a little at your initial questions, and I can answer your question on getting the XML error message. Use the optional arguments for the setContent method: