Re: [WEB SECURITY] noise about full-width encoding bypass?
- From: "Arian J. Evans" <arian@xxxxxxxxxxxxxx>
- Subject: Re: [WEB SECURITY] noise about full-width encoding bypass?
- Date: Mon, 21 May 2007 12:24:37 -0700
Let me clarify my first hastily and poorly written response:
1. The target parser for attacks using these techniques could be a SQL
query engine, a web user agent, and so on. The target parser, however, is
not necessarily where the decoding occurs.
2. Somewhere along the path from the HTTP protocol --> to the app's
untrusted entry point --> to the parser, there are several possible layers
of decoding. These could include:
+ The web server itself
+ A web server plugin
+ Canonicalization in the framework (e.g., some .NET modules)
+ Canonicalization steps in web code
+ Decoding and interpretation by shell scripts and the like
+ Decoding of certain encoding types for normalization (you see this a lot
in PHP, or in cookies that are base64 file-system encoded, etc.)
+ etc.
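To make the layering concrete, here is a minimal Python sketch. It assumes one of the layers above percent-decodes the value and then applies Unicode compatibility normalization (NFKC); the thread does not say which real components do this, so both function names and the sample value are hypothetical.

```python
import unicodedata
from urllib.parse import unquote


def naive_filter_allows(raw: str) -> bool:
    """Hypothetical IDS-style check: reject only a literal ASCII single quote."""
    return "'" not in raw


def hypothetical_canonicalize(raw: str) -> str:
    """Assumed downstream layer: percent-decode as UTF-8, then NFKC-normalize."""
    return unicodedata.normalize("NFKC", unquote(raw))


# %EF%BC%87 is the UTF-8 percent-encoding of U+FF07 FULLWIDTH APOSTROPHE.
raw_request_value = "x%EF%BC%87 OR 1=1"

print(naive_filter_allows(raw_request_value))        # the filter sees no ASCII quote
print(hypothetical_canonicalize(raw_request_value))  # what the parser would receive
```

The filter inspects the value before the canonicalizing layer runs, so it and the target parser end up looking at two different strings.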
This means that:
3. It is possible for an app to have one or more layers of
canonicalization/conversion, allowing for even crazy things like double-
and triple-encoding, which IDS/IPS do not handle at all over HTTP (heck,
only a year or two ago most of the WAFs didn't handle these properly; I
could walk through all but Teros with simple encoding attacks):
+ The web server decodes hex URL encoding and some forms of full-width
+ A shell script converts shellcode or some hex form
+ The final interpreter decodes UTF-7 to UTF-8 to normalize before passing
its output to the parser
So you could have:
1. Hex URL
2. Full-Width Unicode
3. Shellcode Hex payload
4. UTF-7
all decoded in order, potentially, to get down to the canonicalized attack
that works for the specific parser you are targeting. Now, the four-step
example above would be a crazy rare case, but just recently I saw one
instance of double/triple-decoding on a major production site that was
fairly insane. One of our developers keeps asking me over and over:
"Why in the world do these tests work?"
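A short Python sketch of why such tests can work, assuming a hypothetical chain of three decoding layers (two percent-decoding passes followed by a UTF-7 decode). The chain and the payload are made up for illustration and are not taken from the production site mentioned above.

```python
from urllib.parse import unquote


def decode_layers(value: str) -> list[str]:
    """Return the value as seen before and after each assumed decoding layer."""
    stages = [value]
    value = unquote(value)                          # layer 1: percent-decode
    stages.append(value)
    value = unquote(value)                          # layer 2: percent-decode again
    stages.append(value)
    value = value.encode("ascii").decode("utf-7")   # layer 3: UTF-7 decode
    stages.append(value)
    return stages


# "+ACc-" is the UTF-7 form of an ASCII single quote; its "+" is then
# percent-encoded twice ("%2B" -> "%252B").
payload = "id=1%252BACc- OR 1=1"

for stage in decode_layers(payload):
    print(stage)
```

Only the last stage contains a literal single quote, so a check placed at any earlier point in the chain would not see one.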
As a side note: I suspect it's related to that college-java-programmer
supplied IDE:
- control-c
- control-v
-ae
On 5/21/07, Arian J. Evans <arian.evans@anachronic.com> wrote:
>
> 1. You are missing what I consider to be the major point.
>
> 2. I don't know the context of the CERT advisory; there are more encoding
> types than just full-width that IDS today don't decode (that are of
> interest to us as well), but...
>
> 3. The question we need to ask ourselves is one of canonicalization. In
> monolithic J2EE projects and modern cobbled-together web code (PHP is
> notoriously dirty for this), there are *multiple* layers of
> canonicalization that often occur specific to particular untrusted entry
> points. This stuff is really hard to find (initially) in source code.
>
> You will find that sometimes you can even double-encode your attacks, and
> they get decoded/canonicalized to their common ASCII or UTF-8 (or whatever
> format) before they reach the parser (query engine, browser, shell script,
> SMTP relay, whatever parser you are targeting).
>
> It's fair to be skeptical about this though, Brian. It's not common to find
> where these attacks work, and I find that few people go beyond buzzwords and
> encoding-attack-technobafflegab when discussing this subject in the security
> "consultant" space.
>
> Guess it's finally time for a paper on this,
>
> --
> Arian Evans
> solipsistic software security sophist
>
> "I love deadlines. I like the whooshing sound they make as they fly by." -
> Douglas Adams
>
> On 5/21/07, Brian Eaton <eaton.lists@gmail.com> wrote:
> >
> > Has anyone had a look at the full-width unicode encoding trick discussed
> > here?
> >
> > http://www.kb.cert.org/vuls/id/739224
> >
> > AFAICT, this technique could be useful for a homograph attack. I
> > don't think it's useful for much else. However, a few vendors have
> > reacted already, so I may be missing something important.
> >
> > Here's why I think the attack is mostly harmless:
> >
> > Let's say an attacker wants to use this technique to hide a SQL
> > injection attack. They decide to use a full-width encoding for single
> > quote, 0xff 0x07. They successfully bypass the IDS, because the IDS
> > is only scanning for normal single quotes. (You can see the encodings
> > and their graphical representation here:
> > http://www.unicode.org/charts/PDF/UFF00.pdf )
> >
> > If the SQL engine is processing queries in Unicode, then 0xff 0x07
> > will be treated as a normal unicode character, not a single quote.
> > The sequence 0xff 0x07 is not equivalent to 0x27, the real single
> > quote value. No SQL injection occurs.
> >
> > If the SQL engine is processing queries in UTF-8, then 0xff 0x07 will
> > be converted from Unicode to UTF-8: 0xef 0xbc 0x87. Again, the engine
> > does not recognize 0xef 0xbc 0x87 as equivalent to 0x27.
> >
> > If the SQL engine is processing queries in ASCII or ISO-8859-1, the
> > conversion from unicode to the code page used by the engine will fail.
> > Either the engine will give up on the query, or it might substitute a
> > question mark (?) for the unconvertible character.
> >
> > To summarize: I think half-width and full-width unicode characters are
> > characters that happen to have the same graphical representation as
> > other characters, but don't carry any special significance outside of
> > that graphical representation. The graphical representation can be
> > important in homograph attacks, but otherwise I don't see this
> > technique as particularly useful to an attacker.
> >
> > Any comments on what I may have missed?
> >
> > Regards,
> > Brian
> >
> > ----------------------------------------------------------------------------
> >
> > Join us on IRC: irc.freenode.net #webappsec
> >
> > Have a question? Search The Web Security Mailing List Archives:
> > http://www.webappsec.org/lists/websecurity/
> >
> > Subscribe via RSS:
> > http://www.webappsec.org/rss/websecurity.rss [RSS Feed]
> >
>
>
>
>
--
Arian Evans
solipsistic software security sophist
"I love deadlines. I like the whooshing sound they make as they fly by." -
Douglas Adams