Hi, all. Welcome back to Track 2. The next talk is "Attacking XML Processing" by Nicolas Gregoire. Please welcome him.

Hi, all. So, I'm Nicolas Gregoire, and I will talk about attacking XML processing. A short introduction about myself: I have been working in information security for more than 12 years. Eighteen months ago, a customer asked me to audit several applications using XML digital signatures, and I compromised three targets, one client side and two server side. During this engagement I found this kind of technology fun, and I chose to investigate further. Now I have a very big bunch of XML-related vulnerabilities: client side in WebKit, Adobe Reader, and Firefox; server side in Liferay or DotNetNuke; and in some libraries which are used a lot. That's what I will speak about.

I will first introduce XML technologies. Then I will speak about encapsulation, which is the fact of hiding some interesting data inside XML containers. Then I will present limited denial-of-service attacks, which are useful in black-box mode to detect whether a specific kind of processing occurs. And the main technical part is the exploitation of both XXE and XSLT vulnerabilities.

So XML stands for Extensible Markup Language. We will try to define the three terms. Markup is the easiest one. It's like HTML: everything is between angle brackets, and you have tags and attributes. So it's very simple. Extensibility is very important, and to understand it we need to understand namespaces. Namespaces are used to define the precise meaning of a tag, and they are usually identified by a URL. Here is a very simple example: an HTML document with two namespaces. The default one is the XHTML namespace, and the foo one is a private namespace with my own URL. When a browser interprets this document, only the word will be displayed.
It is not underlined, because the browser has no idea of the meaning of the u tag in this namespace. So the main use of namespaces is to avoid ambiguities. For example, if you find a font tag, you need to know whether it's related to XHTML, to SVG, or to a very specific configuration file.

But there is some more interesting stuff for attackers: some namespaces can be used to trigger very specific features. For example, the first one can be used to call PHP code from XSLT. That could be useful. The second one can be used to create files with the libxslt parser, which is used, for example, in WebKit; last year I published a vulnerability where you can create files on the client hard drive. And the third one is used in Xalan-J to execute Java code from XSLT.

The next term is language. In fact, in a valid XML document you can find much more than data. You can have XSLT code inside the XML document, and it's still valid XML. You can find some grammar in the document type definition (DTD). And you can find processing instructions, which instruct the parser to, for example, apply specific XSLT code to an XML document. The main point is that we need to be careful: there can be much more than data when processing an XML document.

So here is an example of a complex XML document. On the first line, you have a processing instruction. On the second line, you have a DTD. Then you have some XML data, then some XSLT code, and the overall goal of this document is to generate some SVG output. You can see the namespaces here. The question is: is it really useful to create this kind of document? The answer is yes. This document is several interesting things at once. It's first some XML data with embedded XSLT code. It's also a self-contained dynamic SVG image. And it's an exploit for the WebKit vulnerability, dropping files on the victim's hard drive. If we open this document in several browsers: in Firefox, you get a green circle.
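To make the namespace example concrete, here is a minimal Python sketch using the standard library's ElementTree. The URI http://example.org/private is a hypothetical stand-in for the speaker's own URL. The sketch shows that a consumer which only knows the XHTML namespace sees foo:u as an unrelated element rather than as XHTML's u tag.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
PRIVATE_NS = "http://example.org/private"  # hypothetical stand-in for "my own URL"

DOC = f"""<html xmlns="{XHTML_NS}" xmlns:foo="{PRIVATE_NS}">
  <body><p>A <foo:u>word</foo:u> here</p></body>
</html>"""


def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{namespace}local' notation into (namespace, local name)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def foreign_elements(markup: str, known: tuple[str, ...] = (XHTML_NS,)) -> list[tuple[str, str]]:
    """Return (namespace, local name) for every element outside the known namespaces."""
    root = ET.fromstring(markup)
    return [split_tag(el.tag) for el in root.iter() if split_tag(el.tag)[0] not in known]


print(foreign_elements(DOC))  # [('http://example.org/private', 'u')]
```

The local name alone ("u") is ambiguous; only the (namespace, name) pair identifies the element, which is exactly why a browser underlines XHTML's u but not this one.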
It seems okay. In Opera, you get the same result. In Chrome, you get a red circle, and you need to check your /tmp directory for the file that was just written.

XML is used in a lot of technologies. I already spoke about SVG and XSLT. If you read blogs, you are probably using RSS or Atom feeds, and these documents are XML. Web services use a lot of XML technologies, for example SOAP or XML-RPC. You can even listen to music using XSPF playlists, which are supported by VLC. So there are XML documents everywhere.

I have some real-life screenshots of applications using XML. This is the Microsoft Link online service: as a user, you can provide an XML file which will be parsed by the application, so you can trigger some specific processing server side. The W3C offers this kind of functionality too: an XSLT engine available online, where you can provide your own URLs for the XSLT and the XML. So it's very dangerous. I didn't check, but you can probably execute Java code server side using this kind of interface. Chronopost is a French shipping service. If you are using the professional version of its interface, you have links like this one. The fun part is that this servlet and these parameters are exactly the same as in the sample application provided with Xalan-J. So it could be that Chronopost is using, in production, over the Internet, the sample Xalan-J application, which is vulnerable to a lot of things. I didn't play with it, but you could try. Here are some Google results for URLs containing xslurl=http. The fun part is that you see a lot of technologies: ASPX, XQuery, Java servlets, and here some PHP code. So it's really cross-platform: everybody is using XSLT, even if they are using PHP or ASPX. Everything is possible in XML.

When auditing applications that use XML, we need to ask ourselves a few questions. The first question is: how can we feed an XML document to the application?
It's always the same problem when auditing. If you find a way to submit data, for example file upload, a REST interface, or anything else, you need to ask: is the data processed by the application? Here is a nice example. If you upload an SVG file to Wikimedia, for example for Wikipedia, an automatic conversion to PNG occurs. That can be difficult to detect: you just upload an SVG file, but you need to know that some processing is done automatically server side. If there is some processing, you need to find who is doing it and where. It's mostly client side or server side. In some B2B environments, you can also find gateways doing on-the-fly transformations, for example to support several versions of the same protocol. A very basic example is Atom or RSS feeds, which can be read client side in your browser or in dedicated software, or server side, for example in your Google home page.

Once you know where the processing points are, you need to ask some questions about their functionality. If you can submit arbitrary data, are processing instructions executed by the application? If you can submit a grammar, are external entities resolved? If yes, you probably have an XML external entity vulnerability. And if you can provide some code, which extensions are available? You may get something like database access or Java code execution.

All the demonstrations during this talk are based on Atom feeds. This isn't exactly an Atom feed; it's a simplified view of one. You have a root tag, feed, with the blog title, and for each entry you have another title. The demo infrastructure is simple. We have three server-side applications: one using Perl and a dedicated library, one using PHP where the formatting is done with XSLT, and one under Windows using JSP and XSLT. The Atom content is provided by a server on the Internet. And here is the article.
So that's a local copy of the Atom feed from the US, as you can see here. If we submit it to the Perl application, we get a very simple display: for each entry, we have the size of the entry and its name. We have exactly the same layout in PHP, and in Java. Okay.

So everybody is now okay with XML. We will speak about encapsulation. XDP is a file format defined by Adobe. It's a very old file format, and this is a screenshot of its Wikipedia page. The interesting point is that this format can be used as a container for PDF and XFA data, and it can also be used inside a PDF document. So we can put PDF inside XDP, and XDP inside PDF, and we will try to find some interesting uses, like bypassing antivirus. This is an msf CoolType PDF file generated by Metasploit. It's very basic; it's a three-year-old vulnerability. It's detected here by eight out of nine antivirus products, and here by 27 out of 43. We will try to evade at least some of them. This is the patch for Metasploit. It's something like ten lines long, and it's very simple: you take your PDF, you do some basic Base64 encoding, you put the result here, and you get an XDP file. If we submit this XDP file, we get zero out of nine and zero out of 43 detections. So we have a complete evasion of detection. The fun part is that XDP files are automatically opened by Adobe Reader under Windows, with a very similar icon. So if you send an email with an XDP attachment and the user double-clicks it, your malicious PDF is opened inside Adobe Reader.

Now we will try to create some limited resource exhaustion in order to detect some kinds of processing in black-box mode. If we go back to the SVG-to-PNG conversion by Wikimedia, we may wonder: is the DTD processed? So we will use an attack which is very well known. It's quite simple. You define your DTD. You define a lol entity whose value is "lol".
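Because the XDP container simply carries the PDF as Base64 text, a scanner has to decode it before any PDF signature can match. Here is a minimal Python sketch of that unwrapping step, from the defender's side. It assumes the PDF sits Base64-encoded inside chunk elements, the layout commonly seen in XDP files; it matches on the local name chunk so that the exact namespace URIs used in SAMPLE, which are an assumption here, don't matter.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed XDP layout; the payload "JVBERi0xLjQK" is Base64 for b"%PDF-1.4\n".
SAMPLE = """<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <pdf xmlns="http://ns.adobe.com/xdp/pdf/">
    <document><chunk>JVBERi0xLjQK</chunk></document>
  </pdf>
</xdp:xdp>"""


def extract_pdf_chunks(xdp_text: str) -> list[bytes]:
    """Decode the Base64 payload of every <chunk> element, whatever its namespace."""
    root = ET.fromstring(xdp_text)
    chunks = []
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]  # drop the '{namespace}' prefix if any
        if local == "chunk" and elem.text:
            # b64decode ignores the whitespace/newlines often present in XML text
            chunks.append(base64.b64decode(elem.text))
    return chunks


print(extract_pdf_chunks(SAMPLE))  # [b'%PDF-1.4\n']
```

Once decoded, the bytes can be handed to whatever PDF scanning is already in place; scanning the XDP text as-is sees only Base64.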
The lol1 entity's value is ten times the lol entity, et cetera, et cetera. In your document, you insert the lol9 entity, so it gets expanded in memory. If we do some quick math, we get one billion "lol" strings, which is three gigabytes. That could be enough to slow down the server and detect that the DTD is effectively processed by the Wikimedia application.

Now we want to detect whether an XML digital signature application supports XSLT. The specification allows it, and the W3C best practices say it shouldn't be supported. So we use something similar. You have the xsl:number instruction, which converts a number to another format. We will try to convert 137, and the i format is for Roman numerals, so we get this kind of output. The interesting thing is that M is the biggest unit in Roman numerals, so if we ask for bigger values, we get a lot of Ms. We are requesting a very large number here, and we will get one gigabyte of Ms. It's similar to the previous trick, but here you are detecting XSLT processing and not DTD processing.

So I have a few demonstrations. This is the billion laughs attack. I just use lol6 in order not to crash my computer. If I submit it to the Perl application, it works: I get four megabytes. In PHP, you get a kind of error message, because an entity reference loop is detected. I didn't have a look at the code; it can probably be bypassed, it's PHP, it's easy. And in JSP, we will get an error because... oh, no. So it works; that's another demo. It works in JSP too. Here is the xsl:number proof of concept. The Perl application is not using XSLT, so there's no point in testing it. In PHP, it works. I don't know if you can see the scroll bar; it's very small. And if we submit it to JSP, we get an error, because it cannot convert values greater than 4,000. I don't know why.
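Both detection tricks rely on a small input growing into a huge output, and the arithmetic can be checked without running anything dangerous. The Python sketch below builds the nested-entity document for a given depth, computes its expanded size without expanding it, and mimics xsl:number's lower-case Roman numerals (format="i"), where every extra thousand adds one more "m". The function names are hypothetical; the Roman converter is a plain greedy conversion, assumed to match the output shown in the talk for values like 137, not any particular XSLT engine's code.

```python
import xml.etree.ElementTree as ET


def lol_document(levels: int, fanout: int = 10) -> str:
    """Billion-laughs style document: lol is "lol", lolN is `fanout` x lol(N-1)."""
    def name(n: int) -> str:
        return "lol" if n == 0 else f"lol{n}"

    decls = [f'<!ENTITY {name(0)} "lol">']
    for n in range(1, levels + 1):
        body = f"&{name(n - 1)};" * fanout
        decls.append(f'<!ENTITY {name(n)} "{body}">')
    return f"<!DOCTYPE lolz [{''.join(decls)}]><lolz>&{name(levels)};</lolz>"


def expanded_size(levels: int, fanout: int = 10, unit: str = "lol") -> int:
    """Characters produced by expanding the top entity, computed without expanding it."""
    return len(unit) * fanout ** levels


# Largest unit first; xsl:number format="i" produces lower-case numerals.
ROMAN_UNITS = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
               (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
               (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n: int) -> str:
    """Greedy Roman numeral conversion; above 1000 the output is mostly 'm'."""
    parts = []
    for value, symbol in ROMAN_UNITS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def roman_length(n: int) -> int:
    """Length of to_roman(n) without building it: one 'm' per thousand, plus the rest."""
    return n // 1000 + len(to_roman(n % 1000))


print(expanded_size(9))       # 3000000000 characters, about 3 GB
print(to_roman(137))          # cxxxvii
print(roman_length(10**12))   # 1000000000 characters, about 1 GB of 'm'
```

Only a shallow document such as lol_document(2) (300 characters once expanded) is safe to actually parse; the size functions are there so the large cases never need to be built.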
Probably in order to avoid this kind of vulnerability, but there is no documentation related to this bug, or feature; it depends. Okay. And the last thing: if I open the feed directly in Firefox, it detects that it's Atom and proposes to subscribe to the feed. And as you can see, the expansion occurs. So it's very funny. Okay.

Now I will speak about XML external entities (XXE), which is probably the most common XML vulnerability. It's very simple. You define a DOCTYPE and an entity whose value is the content of the /etc/passwd file. Each time you use this entity in your document, it is replaced by the content of /etc/passwd. Here are some XXE vulnerabilities published or found since last summer. In bold are the vulnerabilities I found myself, and in red are vulnerabilities in libraries. For example, one of them was patched this week, and something like 10 or 20 applications use that library, so they are all impacted. Being able to speak REST with the application is enough to steal every file you want, including configuration files. In my opinion, the impact is mostly underestimated.

This is what is well known about XXE attacks. You can read ASCII, text, or UTF-8 files. You can reach the internal network; I mean, you see the network as seen from the parser. You can, for example, do some banner grabbing, or send blind requests in order to exploit internal Tomcat servers. You have some specific tricks in Windows environments: you can use UNC paths to steal NTLM hashes. And under Java, you can list the content of directories, which is very useful if you don't know the setup of the target. Then we have some advanced features. You can read binary files in PHP. You can access internal servers using Perl. You can execute arbitrary commands in PHP, and much more. Everything depends on the context of execution.
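One common mitigation for the XXE pattern described above is to refuse any document that carries a DOCTYPE, before a single entity can be declared. Here is a minimal sketch using Python's low-level xml.parsers.expat binding. The helper names and the entity name xxe in the usage example are hypothetical, and rejecting every DTD is a policy choice that only suits applications, such as an Atom feed reader, that never legitimately need one.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(data: str) -> list[str]:
    """Parse `data` with expat, rejecting any DOCTYPE; return element names in order."""
    parser = xml.parsers.expat.ParserCreate()
    names: list[str] = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of <!DOCTYPE ...>, before any entity is declared or used.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # exceptions raised in handlers propagate out of Parse
    return names


FEED = "<feed><title>My blog</title><entry><title>Post</title></entry></feed>"
print(parse_without_dtd(FEED))  # ['feed', 'title', 'entry', 'title']
```

Feeding it a document such as `<!DOCTYPE feed [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><feed>&xxe;</feed>` raises DTDForbidden instead of reading the file.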
It depends on the XML parser itself; on the operating system, as we just saw with the specific tricks under Windows; on the programming language, because most Java XSLT parsers allow you to execute Java code; and on the application features, which could add additional URL handlers.

So that's the file URL handler. Under Unix, you can access special filesystems like /proc, which could be useful later for exploiting memory corruption bugs. UNC paths under Windows. Directory listing under Java. PHP: there are a lot of special URL handlers in PHP. The HTTP one is very friendly, because if you get an error message, you get the banner: here it's SSH, here it's VNC. There is a php:// URL handler which can be used to do some preprocessing on an opened file. This means that if you open this kind of URL, each time read is called on the file descriptor, you get a Base64-encoded result. Files in /proc often contain null bytes, and using this trick you can get the Base64 version and decode it later. You can also use ssh2 to access the local machine or another internal machine; you only need to find a weak username and password. Here I'm using oracle. Why? I don't know. You can use this trick to read files on another machine.

And I have a demo, including one using the local root account. It's very simple. We request the /proc/self/maps file. Under PHP, as we can see, we are in an Apache 2 process, and you get some nice information about the stack and memory addresses, and every loaded module. If we do the same thing in the CGI application, we get everything related to Perl, including the stack addresses. And of course, it works just as well with /etc/passwd. That's for Linux. If you want to play with Windows, it's very simple, and it works. Okay. Something more interesting: here we will use PHP filters twice, once for a ROT13 conversion and once for Base64 encoding. We will read two specific files, as you can see here. It works only on PHP.
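The "decode it later" step of the PHP filter trick can be sketched in Python. This assumes the file was read through a filter chain of the form php://filter/read=string.rot13|convert.base64-encode/resource=..., which applies ROT13 first and Base64 second, so decoding reverses that order. The helper names and the sample apache2 command line are hypothetical illustrations.

```python
import base64
import codecs


def decode_filtered(b64_text: str, rot13: bool = False) -> bytes:
    """Undo convert.base64-encode and, optionally, a preceding string.rot13."""
    raw = base64.b64decode(b64_text)
    if rot13:
        # latin-1 maps each byte to one code point, so NUL and other binary
        # bytes survive the round trip; the rot_13 codec only shifts ASCII letters.
        raw = codecs.encode(raw.decode("latin-1"), "rot_13").encode("latin-1")
    return raw


def split_cmdline(raw: bytes) -> list[str]:
    """/proc/<pid>/cmdline separates arguments with NUL bytes."""
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]


# Hypothetical example: what an apache2 cmdline looks like after both filters.
sample = b"/usr/sbin/apache2\x00-k\x00start\x00"
rotated = codecs.encode(sample.decode("latin-1"), "rot_13").encode("latin-1")
wire = base64.b64encode(rotated).decode("ascii")
print(split_cmdline(decode_filtered(wire, rot13=True)))  # ['/usr/sbin/apache2', '-k', 'start']
```

The Base64 step is what makes the NUL bytes survive transport inside an XML document; ROT13 is self-inverse, so applying it again restores the original letters.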
So that's your lsb-release file, no problem. And here we get the command line of the current process: we have apache2, then a null byte, then -k, then a null byte, then start. It's totally impossible to read this kind of file without the PHP trick, because it's not an ASCII file. Now we will connect to the SSH port on the local machine, and we get everything we need. I'm using Ubuntu, and here the local root account has a weak password. I will try to read the /etc/shadow file. It takes a little longer because we need to make an additional SSH connection. And here is my root password; you can take pictures. Okay. And just to show directory listing in Java, I will request the Program Files folder on the Windows machine. Question? Okay. Let's go.

So I will now speak about XSLT. I have an introduction slide. XSLT is a functional programming language, and the word "functional" is part of the problem, as we will see later. It has been defined by the W3C for many years, and it still seems to be misunderstood. The purpose of this language is to transform an XML document into something else, which could be XML for data extraction, an SVG image (you can produce charts from XML data), PDF, TXT, whatever. It's Turing complete; there are proofs available online. So we know we can play a lot of tricks. Its main use is extracting data, either to display it to humans or to feed it to another application. For example, in OpenOffice, and probably Microsoft Office, when you convert an office document to another format, XSLT is used behind the scenes to do the conversion. And where can you find XSLT processors? Your word processor is using XSLT. Your browser is using XSLT. Your database server is using XSLT; for example, in Oracle it's very easy to trigger the XSLT parser. And in XML digital signatures you may find some XSLT support; it depends on the implementation. This is a list of XSLT engines I played with.
In green are parsers where I didn't find any vulnerability. In blue are parsers which are safe by default, but where you can easily change a configuration option to get a vulnerable state. And in red are vulnerable XSLT parsers; vulnerable could mean design issues, memory corruption, or anything else. So that's the introduction to XSLT.

The fuzzing part: I have done some very stupid fuzzing on XSLT. By stupid, I mean mutation-based fuzzing. It's very simple. You take a lot of XSLT engines. You take a lot of input files, which you find on Google, in bug trackers, or in conformance test suites. You use a diversifier, which is a tool that takes an input file and produces several output files with small modifications. And you do some kind of monitoring, looking for bugs. The diversifier I use is very, very, very cool. I generally give it 5,000 files and ask for one billion outputs, and I feed them to the parser. The monitoring was done with Valgrind when I didn't have access to the source code, for example for the Oracle XSLT engine, and with AddressSanitizer, which is an LLVM plugin, when I could recompile the parser. As for any fuzzing session, the only interesting thing is how many bugs you found, and I found a lot. I have very few published advisories, because vendors are slow. These are the ones that are out. The interesting thing about this bug, apart from the free T-shirt, is the bug itself. In fact, I took an SVG image and only replaced the SVG namespace with the XSLT one. That triggered a very specific code path in Firefox. Nobody was able to understand what happened, and we don't know whether it's exploitable or not. Mozilla chose to play safe and rate it exploitable. But the modification is like copy-pasting 20 characters, and you trigger a crash in Firefox. Okay. This point is interesting. On the lower part, as you can see, this is the Oracle error code for a segmentation fault.
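The diversifier step of mutation-based fuzzing can be illustrated with a toy byte-flipping mutator in Python. This is not the tool used in the talk, whose internals aren't described; the function names and the flip strategy (XOR a few distinct byte positions with a non-zero mask) are assumptions chosen so that every variant is guaranteed to differ from its seed and runs are reproducible from an RNG seed.

```python
import random


def mutate(data: bytes, rng: random.Random, max_flips: int = 4) -> bytes:
    """Return a copy of `data` with 1..max_flips distinct bytes XOR-ed with a non-zero mask."""
    if not data:
        return data
    buf = bytearray(data)
    flips = rng.randint(1, min(max_flips, len(buf)))
    for pos in rng.sample(range(len(buf)), flips):  # distinct positions, so no flip cancels another
        buf[pos] ^= rng.randint(1, 255)
    return bytes(buf)


def diversify(seed: bytes, count: int, rng_seed: int = 0) -> list[bytes]:
    """Produce `count` small variations of one input file, reproducibly."""
    rng = random.Random(rng_seed)
    return [mutate(seed, rng) for _ in range(count)]


stylesheet = b'<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
for variant in diversify(stylesheet, 3, rng_seed=42):
    print(variant)
```

In a real session each variant would be fed to the XSLT engine under Valgrind or an AddressSanitizer build, and the RNG seed recorded so that any crashing input can be regenerated.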
And as you can see in the log, the frame pointer is controlled by the attacker. So we have a very basic stack overflow. For everybody interested, this is the way to reach the XML and XSLT parser in Oracle. This means that if you have a SQL injection on an Oracle database, you can very easily trigger this kind of overflow and get a shell on the database server. This is Adobe Reader under Linux. It looks like only a crash, but it occurs during malloc_consolidate, so it's a heap corruption. It's not a very nice bug, which is why I'm able to give some information before the patch. But I have some bugs working very well on Adobe Reader 10 under Windows, which is the latest version, and there is no patch yet. There is much more: WebKit, Oracle, Opera, a lot of bugs.

Now we will speak about basic constructs. The problem with XSLT is that it's a functional language. This means that by construction you cannot create state and you cannot have writable variables. From an exploitation point of view, it's a big problem, because we can't create a loop: we can create neither a for nor a while loop, and every variable is read-only. But we have some very specific objectives. We want to do some brute force, so we want a for loop. And we want to execute commands and get their output, so we need a while loop.

Brute force is quite simple. We put the data for the for loop in XML, and we use xsl:for-each for the processing. This is an example from the WebKit file-dropping exploit. In blue is the content of the file I want to drop, and in green are some possible paths. I want to try every possible path, so we get this kind of loop. You can see the root tag is data, and then location. So we go to the root tag, and for each location we get the value and try to create a file at this location with this content. So we effectively have a for loop.
If you have SQL extensions in your XSLT parser, you can use this kind of for loop to brute-force credentials for an internal database. Once you find valid credentials, you can use the XSLT extension to grab data. I have a brute-force password cracker; it should be available online on the wiki.

The while loop is much more complicated. We want to read stdout. The solution is to use templates, recursion, and a code generator. The code generator was created by a German guy, and it introduces three new tags: a loop update tag, which gives you a writable variable, and loop for and loop while constructs. Just to show the complexity: this is four lines of Java. It reads a line and appends it to the result, and at the end it prints the result. Four lines. If we use the XSLT loop compiler syntax, it's 18 lines. We can see our readLine here, and at the end we print... what? We print the result. But this is invalid XSLT code, because it contains tags which are forbidden. So you run the code generator, and you get this kind of valid XSLT code, which is 50 lines long but works like the loop version.

For the for loop demonstration, we have a very large list of Java properties, and we want to get their values. We have a for loop here: for each property, we get its name and its value, and we display everything. It works very well. As you can see, I'm using XP SP3 on a French computer, blah, blah, blah, and it's Java 1.7.0. For the while loop, we have an XML document with a list of commands we want to execute. You can see the scroll bar here: we have a very long XSLT document which, in fact, does some fingerprinting of the underlying operating system in order to choose between bash and cmd.exe, and then executes the commands. So it works. As you can see, we get the Microsoft version, a directory listing of the C drive, and then the output of the set command. Okay.
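The idea behind the generated code, replacing a mutable while loop with a recursive template that carries its state as a parameter, can be sketched in Python. This illustrates the transformation only; it is not the loop compiler's output. The function mirrors the four-line Java loop described above, which reads a line and appends it to a result until the reader is exhausted, but without ever reassigning a variable.

```python
import io


def read_all(reader: io.TextIOBase, acc: str = "") -> str:
    """Read every line without reassigning any variable: state lives in the arguments."""
    line = reader.readline()
    if line == "":                       # end of stream: the recursion's exit condition
        return acc
    return read_all(reader, acc + line)  # the "loop body": recurse with a new accumulator


print(read_all(io.StringIO("first\nsecond\n")), end="")
```

Each recursive call plays the role of one loop iteration, and the acc parameter plays the role of the writable result variable, which is the same trick an XSLT named template uses with xsl:call-template and xsl:with-param. CPython does not eliminate tail calls, so this sketch hits the recursion limit after roughly a thousand lines.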
Next, the last part. There are several ways to get execution of high-level code from an XSLT parser, mainly PHP, .NET, and Java. For example, in Xalan-J you can by default execute Java code if you can execute XSLT code. In PHP, you can execute PHP code if registerPHPFunctions is called. And in XMLSpy, which is client-side software used for creating XML files, you can by default execute .NET or Java code.

In PHP, it's very simple. We use the namespace I showed at the beginning and associate it with the foo prefix. Then, in your XSLT document, you call foo:function with the PHP function name and its arguments. So, for example, calling phpinfo is just foo:function('phpinfo'). The bad news is that require, include, include_once, and eval cannot be called this way. I don't know why; it seems to be a PHP specificity: they are not functions. But we can use assert and preg_replace, which are well known to allow PHP code execution.

Java is more complicated. In fact, at BerlinSides last year, I said that it seemed impossible to get an interactive Java shell through XSLT, because we can't create threads and we can't create our own classes. So it seemed very difficult. Just after BerlinSides, I published my content online, and a German guy, mihi42, told me that I was wrong and that it was possible to execute arbitrary Base64-encoded class files in Java. For me, it was a very big win, because it had seemed impossible during the two or three months I spent on this kind of bug. In fact, mihi has published JavaPayload, which is a Swiss army knife for Java exploits. You can output a lot of payloads in a lot of formats, for example Java files or applets. And, interesting to me, there is a Xalan-J output. So you can choose your payload, for example a reverse shell, ask for Xalan-J output, and you get an XSLT stylesheet which will trigger a reverse shell.
To make things easier, I coded a Metasploit module, which should perhaps be in the next version, 4.3. For the moment, you can still access the ticket; the code is available in the ticket. It's a very easy way to get a PHP or Java Metasploit session. And I have some demonstrations. This is the way to get a PHP Metasploit shell through XSLT under PHP, when registerPHPFunctions is called. We call preg_replace first: here is the namespace associated with PHP, here is the PHP function, preg_replace, and here is our PHP code. Our PHP code is very simple: we do an eval on the Base64 decoding of this kind of string, and the Base64 string is generated automatically by Metasploit. Okay. Using Metasploit is very simple. You use the exploit; target zero is for PHP, target one is for Java. Okay. This is an RC file, a Metasploit resource script, just to avoid typing every command. I don't know if you can see it from the back: I have two jobs waiting, the PHP one and the Java one. And if I submit to PHP... Okay. Thank you. Okay. As you can see, it's a PHP Metasploit shell, running under Apache. Okay.

The Java one is much more complicated. In fact, we use reflection to access enough internal Java functions to create our own classes, and enough functionality to download an external Java file which includes the Metasploit Meterpreter, and we just execute it. Just to show you, if somebody understands this kind of Java code. And we submit it to JSP. It takes a little longer. Okay. Test complete. And another Metasploit shell, this time in Java. As you can see, I get SYSTEM privileges on the Windows machine, which is game over. Okay. And the last one: this is the smallest Firefox crash I have. As you can see, there is a right parenthesis missing here, and this breaks the parser. You can get something like that. Boom. Bye-bye. Okay.

Conclusion. XML is everywhere.
You really need to ask how your application is parsing XML, and not whether it is parsing XML. XML is much more than data. DTD attacks are related to the grammar, and XSLT attacks are related to code execution; it's not data. You need to check very carefully whether your documents embed more than data, and how your application will parse this kind of content. Third point: the offensive side has more and more tools. In penetration tests, I sometimes get shells on big, big, big back ends, like ones processing credit cards, using XSLT. So it works really well in real life. And the last point, which is not good news, is that these DTD and XSLT attacks have been known for more than ten years. The first publications were in 2001, by Guninski for example, and we still find a lot of vulnerable applications. If you have some questions, it's time. And thank you for coming.

Thank you for a nice presentation. Cool stuff. Basically, a short question: where can we find the demo pages you used for the presentation? Did you publish them? Can we get the source code and play with it?

Okay, the URL was on the last slide, which was already skipped. Okay. Thank you. On this wiki, you will find most of my findings, including all the source code and everything. The fun part is that this wiki has a REST interface for publishing, and it's vulnerable to XXE attacks, still unpatched, so please do not attack my wiki.

Any more questions? No? Well, then please thank him. Thank you.