
Thread: How to remove whitespace inside XML tag?

  1. #1

    How to remove whitespace inside XML tag?

    Hello,

    I have some broken XML files in which some tags contain whitespace (spaces, tabs, or line breaks).

    How can I remove it? I tried the code below, but it doesn't work.

    Thank you!

    Qt Code:
    QString str = QString( "xml content <xml tag> tag value </xml tag>" );

    qDebug() << str.replace( QRegExp( "[<](\\s+)[>]"), QString( "" ) );

  2. #2

    Re: How to remove whitespace inside XML tag?

    Your regexp is wrong. It matches only an opening bracket, one or more whitespace characters, and a closing bracket, so essentially only empty tags such as "< >". It will never match a tag that contains a name, whether that is an opening tag "<tag>", a closing tag "</tag>", or an empty-element tag "<tag/>".
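
To make the point concrete, here is a minimal sketch that applies the same pattern to a few inputs. It uses std::regex instead of QRegExp only so that it compiles without Qt, and the function name applyOriginalPattern is made up for this illustration.

```cpp
#include <regex>
#include <string>

// Same pattern as in the question, written as a raw string literal:
// '<', then one or more whitespace characters, then '>'.
// std::regex stands in for QRegExp here (an assumption made so the
// sketch builds without Qt).
std::string applyOriginalPattern(const std::string &text)
{
    static const std::regex emptyTag(R"([<](\s+)[>])");
    return std::regex_replace(text, emptyTag, "");
}
```

Passing "a < > b" removes the "< >" part, while "<xml tag> v </xml tag>" comes back unchanged because there is a name between the brackets.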

    But I do not think you can do this with a simple regexp match and a string replace.

    It is true that XML element and attribute names cannot contain whitespace. However, besides uppercase and lowercase letters and digits, other characters such as '.', '_', and '-' are allowed, although a name cannot start with a digit, '.', or '-'.

    More importantly though, XML opening element tags can contain attributes, and attributes must be separated by whitespace. Attribute values can contain embedded spaces. You do not want to remove either the whitespace between attributes or the spaces inside attribute values. In addition, attribute values can contain a literal '>' character (a literal '<' must be escaped as "&lt;" in well-formed XML, but a broken file may not respect that), so you can't rely on the brackets as delimiters in a regexp match either.

    Finally, you also have to distinguish between a tag name with an embedded space and a tag name followed by a space before an attribute name.
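
A small sketch of what goes wrong with the obvious approach, assuming a hypothetical helper stripWhitespaceInsideBrackets that drops every whitespace character between '<' and the next '>' (plain standard C++, no Qt):

```cpp
#include <cctype>
#include <string>

// Naive approach: drop every whitespace character between '<' and the
// next '>'. This repairs "<xml tag>" but also destroys the separator
// before an attribute and the spaces inside attribute values.
std::string stripWhitespaceInsideBrackets(const std::string &text)
{
    std::string out;
    bool insideTag = false;
    for (char c : text) {
        if (c == '<')
            insideTag = true;
        else if (c == '>')
            insideTag = false;
        if (insideTag && std::isspace(static_cast<unsigned char>(c)))
            continue; // skipped: whitespace inside the brackets
        out += c;
    }
    return out;
}
```

With this helper, "<xml tag>" becomes "<xmltag>" as intended, but "<item id="a b">x y</item>" becomes "<itemid="ab">x y</item>", which is worse than the input.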

    So I think you will have to forget about using a regexp alone and essentially write a mini XML parser that embeds these rules of XML into its matching and replacement logic. You might be able to write a regexp that matches an entire XML opening or closing tag, but then you would have to parse the content of the tag to ensure that the only spaces you were replacing are those embedded in the tag name.
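
As one possible building block for such a mini parser, here is a sketch of a scanner that finds where a tag ends while skipping over quoted attribute values. The name findTagEnd is hypothetical, and the sketch assumes attribute values are quoted with matching '"' or '\'' characters; it does not handle comments, CDATA sections, or processing instructions.

```cpp
#include <string>

// Returns the index of the '>' that closes the tag starting at `open`
// (which is expected to point at '<'), ignoring any '>' that appears
// inside a quoted attribute value. Returns std::string::npos if the
// tag is never closed.
std::size_t findTagEnd(const std::string &text, std::size_t open)
{
    char quote = '\0'; // the quote character currently open, if any
    for (std::size_t i = open + 1; i < text.size(); ++i) {
        const char c = text[i];
        if (quote != '\0') {
            if (c == quote)
                quote = '\0'; // end of the attribute value
        } else if (c == '"' || c == '\'') {
            quote = c;        // start of an attribute value
        } else if (c == '>') {
            return i;         // first '>' outside any quotes
        }
    }
    return std::string::npos;
}
```

Once the tag boundaries are known, the text between them still has to be split into name and attributes, which is where the ambiguity described above comes in.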

    Google for "recursive descent XML parsing" for some code you might be able to adapt.
    Last edited by d_stranz; 7th November 2021 at 18:04.
    <=== The Great Pumpkin says ===>
    Please use CODE tags when posting source code so it is more readable. Click "Go Advanced" and then the "#" icon to insert the tags. Paste your code between them.

  3. The following user says thank you to d_stranz for this useful post:

    sophvic (23rd November 2021)

  4. #3

    Re: How to remove whitespace inside XML tag?

    If your problem is that some XML elements have a space in the name (which is indeed forbidden), then you can do it with an algorithm like this:

    1. Find all closing tags that contain a space, extract the element name from each, and store those names in a QStringList.
    2. For each name in the list, make a substitution name (by removing the spaces, replacing them with '_', or whatever you prefer).
    3. For each name in the list, run a find-replace for "<%1". This ensures that every opening tag name is corrected without touching possible attributes.
    4. Do the same for closing tags ("</%1>").
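
The four steps above could be sketched roughly as follows. This version uses std::string and std::regex rather than QString and QStringList so that it compiles without Qt; fixTagNames, replaceAll, and isSpace are names invented for the sketch. It assumes the opening and closing tags contain exactly the same broken name, and that no other element name starts with the same broken text.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

namespace {

bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Plain find-and-replace of every occurrence of `from` with `to`.
void replaceAll(std::string &text, const std::string &from, const std::string &to)
{
    for (std::size_t pos = 0; (pos = text.find(from, pos)) != std::string::npos; pos += to.size())
        text.replace(pos, from.size(), to);
}

} // namespace

std::string fixTagNames(std::string xml)
{
    // Step 1: closing tags carry no attributes, so any whitespace between
    // two name parts must belong to a broken name. Collect each one once.
    static const std::regex closingTag(R"(</([^<>\s]+(?:\s+[^<>\s]+)+)\s*>)");
    std::vector<std::string> broken;
    for (std::sregex_iterator it(xml.begin(), xml.end(), closingTag), end; it != end; ++it) {
        const std::string name = (*it)[1].str();
        if (std::find(broken.begin(), broken.end(), name) == broken.end())
            broken.push_back(name);
    }

    for (const std::string &name : broken) {
        // Step 2: build the substitution name by removing the whitespace.
        std::string fixed = name;
        fixed.erase(std::remove_if(fixed.begin(), fixed.end(), isSpace), fixed.end());

        // Step 3: opening tags. Only the "<name" prefix is replaced, so
        // attributes after the name are left alone.
        replaceAll(xml, "<" + name, "<" + fixed);

        // Step 4: closing tags. The '>' is left out of the search text so
        // that a closing tag with trailing whitespace is also corrected.
        replaceAll(xml, "</" + name, "</" + fixed);
    }
    return xml;
}
```

Applied to the string from the first post, this turns "<xml tag>" and "</xml tag>" into "<xmltag>" and "</xmltag>" and leaves the text between them alone. In a Qt program the same structure should carry over to QString::replace and QRegularExpression.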

  5. The following 2 users say thank you to White_Owl for this useful post:

    d_stranz (11th November 2021), sophvic (23rd November 2021)

  6. #4

    Re: How to remove whitespace inside XML tag?

    "If your problem is that some XML elements have a space in the name (which is indeed forbidden), then you can do it with an algorithm like this:"

    This is a great idea, and it seems like it will work around all of the difficulties I mentioned with attributes.
