Unauthenticated RCE via SAML and XML (a presentation)
Context
There was a high-severity security incident at Zoho in '22: attackers could run any code on affected computers without having to log in.
More than 20 products were affected.
SAML is a foreign concept to most devs, but XML is widely known.
What's not widely known are the features XML comes with and how they can be abused, as in this case.
The only way to truly improve security is for the actual developers to understand the risks - not just "experts", not just a security team.
This was my attempt at starting at the basics and sharing the knowledge required to truly understand what could (and did) go wrong.
Here are the slides from my talk:
Say you want to display this on your website.
This is what the HTML for that table looks like.
HTML has both the styling needed + the data.
If you had to write the data down, how would you write it?
In computers, we generally store it in a format called JSON (JavaScript Object Notation).
There's also an alternate format to store data, called XML.
XML comes with a friend - a complementary format, XSLT (stylesheet transformations), that can be used to store styling for that data.
If you look closely, you can see where the XSLT refers to the 'user' entry in the XML.
Overall, XSLT picks data up from XML and tries to TRANSFORM it into HTML.
This gives us a happy family: XML to store data, XSLT to convert it into HTML.
buuttt JS is much more powerful than XSLT at transformations (yay JSON?)
But XSLT said 'screw you!' and developed a plugin where you can use JS inside XSLT.
You can see an example here, where the JS function on top is being called inside the XSLT below it.
Here it takes the date from the XML and converts it to the user's timezone.
But why stop at JS? Xalan (an XSLT processor) could support other languages like Java too.
What have we learnt, kids? XSLT is capable of transforming XML into any other format, not only HTML - it can even transform it into JSON!
So we realize that there are a lot of cool things going on in the XML ecosystem. Transferring data is just a teeny tiny part of it.
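To make the "same data, different formats" idea from the slides above concrete, here is a minimal sketch in Python. The user record and its field names are invented for illustration, and `to_html_row` is a hand-written stand-in for what an XSLT template does (pick data out of the XML, wrap it in HTML); it is not real XSLT, since Python's standard library does not ship an XSLT engine.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration: the same data in both formats.
USER_JSON = '{"user": {"name": "Asha", "joined": "2022-01-15"}}'
USER_XML = "<user><name>Asha</name><joined>2022-01-15</joined></user>"


def user_from_json(text: str) -> dict:
    """Parse the JSON form into a plain dict."""
    return json.loads(text)["user"]


def user_from_xml(text: str) -> dict:
    """Parse the XML form into the same dict shape (child tag -> text)."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def to_html_row(user: dict) -> str:
    """Stand-in for an XSLT template: pull fields out and emit an HTML table row."""
    row = ET.Element("tr")
    for key in ("name", "joined"):
        ET.SubElement(row, "td").text = user[key]
    return ET.tostring(row, encoding="unicode")


def to_json(user: dict) -> str:
    """The same data transformed into JSON instead of HTML."""
    return json.dumps({"user": user}, sort_keys=True)


if __name__ == "__main__":
    user = user_from_xml(USER_XML)
    print(to_html_row(user))
    print(to_json(user))
```

The point of the sketch is only that the data and the presentation are separate: one parsed record can be turned into HTML, JSON, or anything else by swapping the transform step.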
Let's take a quick detour and talk about some other stuff.
A crash-course on signing.
Take a minute to check this out: we can 'sign' data in XML by adding it to <SignedInfo>, and including the encrypted version of that data and a decryption key.
We could sign all the original data if we want to (although that means including the same data in an encrypted format too, so 3x the size)
...or we could just encrypt the hash of the original data.
So here are the updated steps: instead of directly comparing the original data with the decrypted data, we hash the original data and compare it with the decrypted hash.
Instead of copying the original data again inside <SignedInfo>, we can just add a reference to it.
A quick recap: what we want to sign is inside <SignedInfo>, the encrypted hash of <SignedInfo> is inside <SignatureValue>. <KeyInfo> has the info required to decrypt <SignatureValue> and check if <SignedInfo> is legit.
What if we want to sign not only user-details, but also last-modified? Easy, just add that reference too.
Oh, I forgot: since we want to decrypt the encrypted data with the key to check the signature, we also need to include the algo used to encrypt it.
We have another problem: hashes are unforgiving. In the app, all 3 formats in the image will give the same value, '123' - but the hashes will be different.
Another case: for the app, it doesn't matter if user-details comes on top or last-modified does. But again, the hash will differ.
But we want the same value to give the same hash. So we also include the algorithm to be used to normalize the data (sort/remove whitespace), and call it the Canonicalization Method.
XML has comments etc. too, which actually need to be ignored while hashing - because to the app it'd all be the same value.
So we specify any transformations to be done before hashing it as well, like removing comments.
A summary of the full process.
It gets interesting now!
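The "hashes are unforgiving" problem and the canonicalization step above can be sketched with Python's standard library. The three XML variants below are invented for illustration (they are not the ones from the slide), and `ET.canonicalize` implements C14N 2.0, which may differ from the canonicalization algorithm a given SAML deployment names in its <CanonicalizationMethod>; treat this as a sketch of the idea rather than a signature implementation.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# Three hypothetical spellings of the same data: attribute order, stray
# whitespace and a comment differ, but an app would read '123' from all of them.
VARIANTS = [
    '<user id="1" role="admin"><score>123</score></user>',
    '<user role="admin" id="1"><score> 123 </score></user>',
    '<user id="1" role="admin"><!-- note --><score>123</score></user>',
]


def digest(text: str) -> str:
    """Base64-encoded SHA-256 of the text, the usual shape of a digest value."""
    raw = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


def canonical(text: str) -> str:
    """Normalize the XML: sorted attributes, comments dropped, text trimmed."""
    return ET.canonicalize(text, strip_text=True)


def reference_is_valid(data_xml: str, expected_digest: str) -> bool:
    """Canonicalize first, then hash, then compare with the stored digest."""
    return digest(canonical(data_xml)) == expected_digest


if __name__ == "__main__":
    print(len({digest(v) for v in VARIANTS}), "distinct raw digests")
    print(len({digest(canonical(v)) for v in VARIANTS}), "distinct canonical digest(s)")
```

Hashing the raw strings gives three different digests, while hashing the canonical forms gives one. Note that this only covers the per-reference digest; checking <SignatureValue> against <SignedInfo> is a separate step.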
yep yepp (from the first part)
dayumn sonn
Let's get to the issue now.
aight aight back up
SAML is a protocol used to log in via 3rd party services (like log in with FB/log in with Google etc).
SAML uses XML to transfer data between these apps and calls it a 'SAML assertion'.
Looks all complicated, but it's basically what we saw before with lots of data. You can see the <SignedInfo>, the data outside, the transform, the canonicalization method etc.
This is what the attacker's modified XML looked like. They got a legit copy of the SAML XML and modified it to run Java code via Xalan inside the <Transform> part!
This is how we process the XML: we verify the hashes of the data referenced in <SignedInfo>, and then verify <SignedInfo> itself using <KeyInfo>.
The problem is that by the time it got to the <KeyInfo> part, the attack was over! The malicious code runs during verification of the hashes in the first step. Then the <KeyInfo> verification happens, which fails and shows an error - but the damage is already done.
The obvious question.
And that was the obvious fix: check if <SignedInfo> is valid using the <SignatureValue> first, before touching the hashes.
The obvious fix's code (this is from the library Zoho used, not Zoho's code itself). Actual code changes from that library.
Don't worry, there are some more footguns.
Some apps use one library to validate the XML and a different library to handle the SAML process. If the same XML includes multiple assertions (one legit and one malicious), validation could happen on one and SAML processing on the other.
There's also the problem of the XML spec not helping out. For example, this order of validation (<KeyInfo> first, then <SignedInfo>) should have been in the spec/docs, but is mentioned nowhere.
Because the spec isn't clear, different libraries behave differently.
Not to mention, just like signing there's also encryption ._.
andd you can include external files in the XML as well.
Apart from this, there's also another format in the XML family: DTD.
Did I mention you can use variables too?
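Assuming the 'variables' on that slide are XML entities declared in a DTD, here is a deliberately tiny sketch of how they expand, using Python's standard library parser. The document below is made up for illustration and only produces 9 copies of "ha"; `copies_after` is a hypothetical helper that just does the arithmetic for deeper nesting.

```python
import xml.etree.ElementTree as ET

# Deliberately tiny: each entity is 3 copies of the previous one,
# nested 2 levels above the base entity, so 3 ** 2 = 9 copies of "ha".
TINY_LAUGHS = """<!DOCTYPE note [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;">
]>
<note>&c;</note>"""


def expanded_text(xml_text: str) -> str:
    """Parse the document and return the root's text after entity expansion."""
    return ET.fromstring(xml_text).text or ""


def copies_after(nesting_levels: int, fanout: int) -> int:
    """How many copies of the base string you get from nested entities."""
    return fanout ** nesting_levels


if __name__ == "__main__":
    print(expanded_text(TINY_LAUGHS))
    print(copies_after(9, 10), "copies with 9 levels of 10 references each")
```

A few short lines of input grow exponentially once the parser expands them, which is why parsers and libraries differ in whether (and how much) entity expansion they allow by default.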
Search for 'billion laughs'.
And tons of tiny 'features' that are waiting to bite you in your plumpy backside.
Even worse: each library has a different way of handling each feature, requires you to manually disable features, has different defaults...
while the average user just wants to use XML to store some data!
And there's that!
I encourage you to put in the effort to learn things in depth and explore internals. I hope you do - and that you share that knowledge with me too!
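If you want a starting point for exploring the internals, the ordering bug from the SAML slides can be reproduced with a toy model. Everything here is an assumption made for illustration: `ToyAssertion`, the transform functions and the HMAC "signature" (a stand-in for the public-key signature and <KeyInfo>) are hypothetical, and none of it is Zoho's code or the code of the library they used. It only shows why running transforms before checking the signature is dangerous.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Callable, List

SECRET = b"demo-key"        # stand-in for the identity provider's key (assumption)
EXECUTED: List[str] = []    # records side effects so we can see what actually ran


def strip_whitespace(data: bytes) -> bytes:
    """A harmless transform, like the normalization steps in the slides."""
    return data.strip()


def malicious(data: bytes) -> bytes:
    """Stand-in for attacker code smuggled into the transform step."""
    EXECUTED.append("malicious ran")
    return data


@dataclass
class ToyAssertion:
    data: bytes
    transforms: List[Callable[[bytes], bytes]]  # applied before hashing the data
    digest: str       # expected hash of the transformed data
    signature: str    # toy signature over signed_info()

    def signed_info(self) -> bytes:
        # The transform list and the digest are both covered by the signature.
        names = ",".join(t.__name__ for t in self.transforms)
        return f"{names}|{self.digest}".encode("utf-8")


def sign(signed_info: bytes) -> str:
    return hmac.new(SECRET, signed_info, hashlib.sha256).hexdigest()


def transformed_hash(a: ToyAssertion) -> str:
    data = a.data
    for transform in a.transforms:
        data = transform(data)  # whatever is listed here gets executed
    return hashlib.sha256(data).hexdigest()


def make_assertion(data: bytes, transforms: List[Callable[[bytes], bytes]]) -> ToyAssertion:
    draft = ToyAssertion(data, transforms, digest="", signature="")
    draft.digest = transformed_hash(draft)
    draft.signature = sign(draft.signed_info())
    return draft


def tamper(a: ToyAssertion) -> ToyAssertion:
    """What the attacker does: swap the transform, keep the old signature."""
    return replace(a, transforms=[malicious])


def signature_ok(a: ToyAssertion) -> bool:
    return hmac.compare_digest(sign(a.signed_info()), a.signature)


def digest_ok(a: ToyAssertion) -> bool:
    return transformed_hash(a) == a.digest


def verify_unsafe(a: ToyAssertion) -> bool:
    # Vulnerable order: transforms run before the signature is checked.
    return digest_ok(a) and signature_ok(a)


def verify_safe(a: ToyAssertion) -> bool:
    # Fixed order: nothing from the document runs until the signature holds.
    return signature_ok(a) and digest_ok(a)


if __name__ == "__main__":
    bad = tamper(make_assertion(b" 123 ", [strip_whitespace]))
    print("unsafe:", verify_unsafe(bad), EXECUTED)
    EXECUTED.clear()
    print("safe:  ", verify_safe(bad), EXECUTED)
```

Both functions reject the tampered assertion, but only the unsafe one has already run the attacker's transform by the time it says no.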