Are you sure about the Latin-1 "U+00FC"? To me it looks like Ã+00BC, with Ã acting as a sort of escape character.
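To make the byte-level picture concrete, here is a minimal sketch in Python (my choice for illustration; the variable names are made up). It shows that "ü" is a single byte in Latin-1 but two bytes in UTF-8, and that reading those two bytes as Latin-1 gives exactly the "Ã plus something" pattern. The "escape character" impression comes from 0xC3 being the lead byte of a two-byte UTF-8 sequence.

```python
# "ü" is code point U+00FC.
u_umlaut = "\u00fc"

# In Latin-1 it is stored as the single byte 0xFC.
latin1_bytes = u_umlaut.encode("latin-1")

# In UTF-8 it is stored as two bytes: 0xC3 (lead byte) and 0xBC.
utf8_bytes = u_umlaut.encode("utf-8")

# Misreading the UTF-8 bytes as Latin-1 yields two characters:
# U+00C3 (Ã) followed by U+00BC (¼).
misread = utf8_bytes.decode("latin-1")

print(latin1_bytes.hex(), utf8_bytes.hex(), [hex(ord(c)) for c in misread])
```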

I've looked up the codes in this UTF-8 table here. With that information I can simply string-replace all umlauts and other important characters. For my case, that actually works as expected.
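Roughly, the string-replace approach looks like the sketch below. This is Python for illustration only; the table `MOJIBAKE_TO_CHAR` and the function `repair_by_replacement` are hypothetical names, and the table covers just a few characters under the assumption that the UTF-8 bytes were misread as Latin-1. A real table would need one entry per character that matters.

```python
# Hypothetical lookup table: misread two-character sequence -> intended character.
# Assumes UTF-8 bytes were decoded as Latin-1; only a few entries are shown.
MOJIBAKE_TO_CHAR = {
    "\u00c3\u00a4": "\u00e4",  # "Ã¤" -> "ä"
    "\u00c3\u00b6": "\u00f6",  # "Ã¶" -> "ö"
    "\u00c3\u00bc": "\u00fc",  # "Ã¼" -> "ü"
    "\u00c3\u00a9": "\u00e9",  # "Ã©" -> "é"
}


def repair_by_replacement(text: str) -> str:
    """Replace each known misread sequence with the character it stands for."""
    for broken, fixed in MOJIBAKE_TO_CHAR.items():
        text = text.replace(broken, fixed)
    return text


print(repair_by_replacement("M\u00c3\u00bcller"))
```

Anything not listed in the table is left untouched, which is why this only works as long as the set of affected characters is small and known.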

That said, it's probably quite slow to tackle the problem this way; it feels like reassembling the debris instead of preventing the accident.
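For comparison, "preventing the accident" would mean decoding the bytes with the right codec in the first place. The following is only a sketch in Python under the assumption that the damage comes from UTF-8 bytes being decoded as Latin-1; the sample string is made up. If Windows-1252 or some other codec was involved, the round trip in the last step may fail or give wrong results.

```python
# Stand-in for bytes arriving from a file, socket, or database (made-up sample).
raw = "M\u00fcller".encode("utf-8")

# The accident: decoding UTF-8 bytes with the wrong codec.
wrong = raw.decode("latin-1")

# Prevention: decode with the codec the bytes were actually written in.
right = raw.decode("utf-8")

# Repair after the fact in one pass: undo the wrong decode, then decode correctly.
repaired = wrong.encode("latin-1").decode("utf-8")

print(wrong, right, repaired)
```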


Many thanks for the idea to actually look into the bytes myself - sometimes I don't see the wood for the trees. If anyone has a better idea (in terms of performance or safety of use), I'd be very happy to improve or entirely change my approach, but for the moment I can use that.