Code Analysis
At least three times in my web work, a check of Google Search Console has helped me figure out that a site was hacked. What I found in Search Console was different each time, but the underlying problem was always a hacked site. Here is an analysis of the most recent situation.
“%3C?php%20bloginfo('comments_rss2_url');%20?%3E”
A client’s site was hacked. In her Google Search Console, there were “URL Not found” errors where the address was made up of a legitimate address with the code above attached.
That bit of ugliness is someone trying to use the address to hack into the site! There are two parts to interpreting this string: 1) the URL encoding characters and 2) the PHP code. In web development, there is usually more than one type of code involved, even inside a single line.
The URL Encoding (UTF-8)
Notice the % characters, each followed by two other characters. Three different codes appear in the string:
%3C = <
%20 = space
%3E = >
So, this is actually code to run PHP: <?php bloginfo('comments_rss2_url'); ?>. Characters such as <, >, and spaces can’t appear literally in a URL, so they are replaced with those codes.
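To check the decoding yourself, here is a minimal sketch in Python (my choice for illustration; the site itself runs PHP). It assumes the quotes in the original address were plain straight quotes, and uses only the standard library's urllib.parse.unquote.

```python
from urllib.parse import unquote

# The suffix seen on the "Not found" addresses in Search Console.
# Assumption: the quotes around comments_rss2_url were straight quotes.
encoded = "%3C?php%20bloginfo('comments_rss2_url');%20?%3E"

# unquote() turns each %XX code back into the character it stands for.
decoded = unquote(encoded)
print(decoded)
```

Running it prints the same PHP snippet shown above, with %3C, %20, and %3E turned back into <, a space, and >.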
The PHP Function Call
The bloginfo( ) part of the code will run a function that is part of the WordPress core. The bloginfo( ) function can get many different pieces of information about your site.
Notice that this particular call asks for the comments RSS feed, which is also a feature of the WordPress core. “'comments_rss2_url' – Displays the comments RSS 2.0 feed URL (/comments/feed).” Before I found this in Google Search Console, my client had found references to viagra and other extraneous information in her Google search results. Hackers have quite a lot of ways of hiding their bad stuff in your website.
How Will Injecting PHP into a Link Help the Hacker?
The next question is, “How does doing this help the hacker?” I had some vague ideas, but not enough specifics, so I posted my question in the WordPress Development group on StackExchange.
Some months ago, a former client came to me with a hacked website. The site had not been maintained for 2 – 3 years. In the rush to rebuild the site, I didn’t take the time to tear down the compromise, and now I’m trying to figure out what WordPress processes were involved.
The only two pieces of evidence I have are that the client noticed strange SERPs and that some of the URL not found links in the Search Console have the pattern listed in the title.
My guess is that the hacker injected spam comments through a vulnerability and then used the RSS feed somehow to broadcast the spam. My question is whether my guess is correct, and could I have a few more of the particulars without going to the code level.
I find that when I can supply clients with some details about how the hackers use WordPress structures, they are more likely to agree to the costs of site backups and updates.
One person gave a very interesting answer ***.
The hacker is trying to find a site that is not properly sanitizing user input. If that is not done, the query string of the URL could be interpreted as a PHP command. With that command, the hacker is trying to get the comment RSS link, for further possible attack of the site. WP won’t allow that, as it sanitizes things. Although perhaps a plugin or theme might not have the same ‘best practice’ of query sanitation. So the hacker is just ‘probing’ to see if the site is not properly configured. If so, they will try a further attack later.
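If you have a list of Not found addresses from Search Console, the kind of probing described in that answer can be spotted with a few lines of code. This is only a rough sketch in Python, not a security tool: the function name looks_like_php_probe and the example.com addresses are made up for illustration, and it only checks for a PHP opening tag after decoding.

```python
from urllib.parse import unquote


def looks_like_php_probe(url: str) -> bool:
    """Return True if the percent-decoded URL contains a PHP opening tag."""
    decoded = unquote(url).lower()
    return "<?php" in decoded or "<?=" in decoded


# Hypothetical addresses, invented for this example only.
not_found_urls = [
    "https://example.com/about/",
    "https://example.com/about/%3C?php%20bloginfo('comments_rss2_url');%20?%3E",
]

# Keep only the addresses that carry an injected PHP tag.
suspicious = [url for url in not_found_urls if looks_like_php_probe(url)]

for url in suspicious:
    print("Possible probe:", unquote(url))
```

A hit does not prove the site was compromised. It only shows that someone requested, or linked to, an address with PHP in it, which is a good reason to look more closely.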
The First Time was in 2009
At that time, Google Search Console was called Google Webmaster Tools. I was downloading a file set from a new client who had a previously existing site, and my antivirus software started flashing warnings. It’s hard to believe that was my first personal experience with a hacked site. But in preparing to teach web development courses, I had gained some basic understanding. A few years before, I had been part of a CS class team that did a project on website security. At the time of this first hacked-site experience, my web development team was mostly building static sites, which didn’t give the hackers much to hack. Hacked sites were what happened to other people until this happened!
Google Webmaster Tools provided confirmation that the site was hacked. The site was ranking well for a whole lot of irrelevant terms. When viewers are allowed to fill out a form, hackers will fill out that form too. If the form software is not kept secure, it’s a doorway for all kinds of bad code. If the form is secure, the hacker’s entries are rejected. In this case, the software that made the form work was not secure. That was a long time ago, and this software is probably not in use any longer, but it was called FCKEditor. Here is a link to a page that describes the long list of vulnerabilities in this script.
In this case, it wasn’t very difficult to find the code the hacker had added to the site, because it was all in one folder. I have not seen a hacked site since that was so easy to fix. Now, the hackers scatter bad code throughout the whole site to make it difficult to clean up.
*** Note: You have to have a very thick skin to post in StackExchange and StackOverflow. While one developer thought my question was “off topic,” another thought it was a valuable question and provided an answer. In fact, I have had quite a lot of help from other questions they decided were off topic.