Re: [RFC] Block requests to builtin SQL functions where PHP can prove the call is vulnerable to a potential SQL-injection attack

From: Date: Sun, 02 Aug 2015 09:50:34 +0000
Subject: Re: [RFC] Block requests to builtin SQL functions where PHP can prove the call is vulnerable to a potential SQL-injection attack
References: 1 2 3 4 5 6 7 8  Groups: php.internals 
Request: Send a blank email to [email protected] to get a copy of this message
On 31 Jul 2015, at 17:40, Ronald Chmara <[email protected]> wrote:
> The RFC and bug report both make an erroneous assumption that unescaped GPC input is wrong.




Hi Ronabop,

Slightly continuing our discussion, but also replying to your on-list message...


Starting with your examples:

1) Web applications that send variables directly to the database (e.g. phpMyAdmin), if I remember
their codebase correctly, there is about 3 places where that happens, and after doing an
authorisation and CSRF check, they could put in the 1 line to say that this value is already SQL
escaped.

2) Overhead of escaping, where you later explained that this was more for legacy systems such as
FoxBase and legacy ODBC  data sources, where they pass the values over the network to have it
escaped (slow)... and while that is a valid concern, hopefully all escaping is done on the client
side now (if not, my previously non-parameterised queries would have taken far too long to load)...
but that said, mysql_real_escape_string() isn't good enough either (it does not apply quote
marks, as Matts example shows), hence why my examples talk about pg_escape_literal().

3) Generic user written escaping libraries/functions to support multiple destinations... if they are
still using the native escaping functions, then they don't need to do anything different... if
the are using something like addslashes (maybe this is acceptable for something), then that
library/function should mark it as having been escaped.

4) From a pre-cleaned source, e.g. taking HTML from a WYSIWYG editor, passing it though HTMLTidy,
storing in the database... likewise, this just needs a single function call to say that this value
is already HTML encoded - this is the "bio_html" example shown in bug 69886 :-)

5) Where a developer "is doing a file read off of a local hard drive and assuming their file
contents will never have an SQL injection"... well, I'd like all string variables (e.g.
from GPC, a file, database, xml, etc) to start with the assumption that it is unescaped... the
developer could mark that as having been escaped, but more likely, just escape it (or use it with a
hard coded, parameterised query, in the case of SQL).

--

But I completely agree that raising too many notices will just mean that this feature would be
switched off.

I've been focusing too much of my own systems, and the assumption that most systems
would/should be using parameterised queries, html encoding, etc (so they shouldn't get any
notices, unless they have made a couple of mistakes).

So the suggestion of using a new ini setting is a good idea (from Matts or the 2008 RFC's)...
or as you suggested, perhaps have it on a per file basis, that could also work, like the
declare(strict_types=1).

That way we have a useful security feature that can be switched on (rather than everyone investing
in an expensive static code analysis system, which uses dark magic to find "every
problem")... then hopefully early adopters would start using it, it slowly becomes good
practice, and sometime in the long distant future, it can be enabled by default (while in the mean
time we try to educate as many developers as we can though channels like Stack Overflow).

---

As to the "rarely" comment, fair point, I don't have numbers to back this up.

What I'm trying to say is that, for the typical use case for websites (i.e. not things like
phpMyAdmin), when you are using values in SQL, it is for them to be used as variables in the
query... e.g. where an id equals this value, or insert a record with these values, or update this
record with this value... i.e. how values in ORMs should be used.

Likewise, when you have variables that need to be printed in HTML, the "safe" normal would
be to have the PHP (or other static code) provide the HTML, and the variables should be html encoded
(e.g. if you were printing someones name)... there is an exception to this was explained above
(example 4).

I realise that I'm trying to word it differently, but what I'm trying to say is that all
values should be assumed bad, and it is up to the developer to either escape them, use parameterised
queries, etc... or call a single function to say "yes, I know this one is fine"... then
PHP can identify anything that has been missed.

Craig







On 31 Jul 2015, at 17:40, Ronald Chmara <[email protected]> wrote:

> On Thu, Jul 30, 2015 at 8:38 AM, Craig Francis <[email protected]> wrote:
> On 30 Jul 2015, at 16:26, Ronald Chmara <[email protected]> wrote:
> > Perhaps I have missed something in this discussion
> I think you have... my email from a couple of weeks ago was ignored... so I replied to
> Matt's suggestion (which is similar, but different).
> Please, just spend a few minutes reading my suggestion, it has absolutely nothing todo with
> breaking applications:
>    http://news.php.net/php.internals/87207
>    https://bugs.php.net/bug.php?id=69886
> 
> The RFC and bug report both make an erroneous assumption that unescaped GPC input is wrong.
> 
> I was addressing some cases where it is the correct, intended, behavior. 
> 
> WRT "breaking":
> Application stacks which go from zero E_NOTICE warnings, to hundreds or thousands of them a
> second or day, is (admittedly) poorly phrased as "breaking". It is a possible side effect
> of the proposed solutiions. I have worked in very stingent environments where an unapproved E_NOTICE
> is considered a ship-stop break, but I did not explicitly explain that. Such environments would
> require re-writes of all SQL that throws an E_NOTICE, or a per-line exception review and approval
> process, or disabling/not enabling the checking.
>  
> And yes, I do have a bypass_the_nerfing function (well, a function to say the variable has
> already been escaped)... but the idea is that it's ever so slightly harder to use than the
> related escaping functions, and rarely needed.
> 
> "Rarely" is subjective at this point, a quanyifyable sampling of some kind could be
> more meaningful. (How rare?)
> 
> Based on *my* purely anecdotal experience, in the last X years of using PHP I have have
> frequently encountered situations where passing through engine-unescaped text strings, to SQL, is
> desired for some reason, in nearly every environment. I mentioned one use case that I thought might
> be relevant to a large number of users, there are others.
> 
> Off the top of my head, some use cases I have dealt with relevant to this discussion, that
> should be considered (even if they're discarded as trivial):
> 1. Administrator has a web application that is supposed to allow them access functionally
> equivalent to a direct connection to a database.
> 2. Overhead of using the engine escaping technique (setup connection(if not done yet), send
> text to escape at network speed, recieve escaped text at network speed) was considered too much of
> an additional performance hit.
> 3. Text needed to have a generic, user written, escaping library/function to handle multiple
> destinations (think 5 different data storage systems, all with different escaping rules, some
> without connection based escaping).
> 4. Text being supplied was coming from a pre-cleaned, trusted, source, even though the variable
> was GPC, (example: it was a GET string assembled by a batch job that was pulling from another
> database)
> 
> I'm sure there are many more.
> 
> Basing language decisions on personal perceptions of "rarely" and
> "frequently" is not a good practice, but ensuring that we are covering multiple use-cases
> is. Discussing the use cases doesn't mean I think the idea is without merit. 
> 
> -Ronabop



Thread (45 messages)

« previous php.internals (#87498) next »