How to moderate bbPress submissions that contain links

The most common trait of forum spam submissions is that they contain links. The code below (add it to your WordPress theme’s functions.php file) filters new bbPress topics and replies; if it detects a link, it marks the submission as “pending”, allowing moderators to review it in the back end before publishing it. The code works on bbPress version 2.5.4.

The code, however, creates front end issues. If it is a new topic, the user is redirected to a page that contains the topic title but not the topic content. If it is a new reply, the page reloads with no indication that the reply has been saved. These issues may be solvable with query variables and some jQuery, but in my case almost all submissions that contain links are spam, so user experience is not a big concern.

function bb_filter_handler( $data, $postarr ) {

    // If post_date and post_modified are the same, it is a new reply/topic. If they
    // differ, a moderator is editing the reply/topic (such as changing it from pending
    // to published status), so we let the data through without filtering. Without this
    // check, admins/moderators won't be able to change a reply/topic from "pending"
    // status to "published".
    if ( strtotime( $data["post_date"] ) != strtotime( $data["post_modified"] ) ) {
        return $data;
    }

    // Only filter bbPress replies and topics that are about to be published.
    if ( ( $data["post_type"] == 'reply' || $data["post_type"] == 'topic' ) && $data["post_status"] == 'publish' ) {

        $text = $data["post_content"];

        $regex = "((https?|ftp)\:\/\/)?"; // Scheme
        $regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and pass
        $regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
        $regex .= "(\:[0-9]{2,5})?"; // Port
        $regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
        $regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET query
        $regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor

        // Hold the submission for moderation if it contains anything that looks like a link.
        if ( preg_match( "/$regex/", $text ) ) {
            $data["post_status"] = 'pending';
        }
    }

    return $data;
}
add_filter( 'wp_insert_post_data', 'bb_filter_handler', 99, 2 );
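
To get a feel for what the link pattern catches before installing the filter, it can be tried in a browser console. The sketch below is a JavaScript port of the same seven-part pattern, not the PHP code itself; the names linkPattern and looksLikeLink are made up for this illustration.

```javascript
// JavaScript port of the link pattern, assembled from the same seven parts
// as the PHP version above. Names here are hypothetical.
const linkPattern = new RegExp([
  /((https?|ftp):\/\/)?/,                                   // scheme
  /([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?/, // user and pass
  /([a-z0-9.-]*)\.([a-z]{2,3})/,                            // host
  /(:[0-9]{2,5})?/,                                         // port
  /(\/([a-z0-9+$_-]\.?)+)*\/?/,                             // path
  /(\?[a-z+&$_.-][a-z0-9;:@&%=+\/$_.-]*)?/,                 // GET query
  /(#[a-z_.-][a-z0-9+$_.-]*)?/,                             // anchor
].map((part) => part.source).join(''));

// Returns true when the text contains anything the pattern treats as a link.
function looksLikeLink(text) {
  return linkPattern.test(text);
}

console.log(looksLikeLink('See http://example.com/page')); // true
console.log(looksLikeLink('Visit example.com today'));     // true
console.log(looksLikeLink('Thanks, that fixed it.'));      // false
console.log(looksLikeLink('It works.thanks'));             // true
console.log(looksLikeLink('HTTP://EXAMPLE.COM'));          // false
```

Two things stand out. Since every part except the host is optional, a missing space after a period ("works.thanks") is enough to send a post to moderation. And since the pattern has no case-insensitive modifier, an all-uppercase link slips through; the PHP pattern above is case-sensitive in the same way.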

Using jQuery and JSON to recover from a failed TablePress save

I was happily working away on my 700+ row table in TablePress, saving occasionally. Server issues came up and I was prevented from saving for a few hours. Eventually the server was back up again and I wanted to save, but I ran into the dreaded Ajax save failure message.

Even using shift+save did not work, taking me to the silly and useless Are you sure? WordPress page.

Refreshing the page would have meant losing many hours of work. I tried various ideas but all failed. The most desperate idea was to use jQuery to get the values of all the table cells, put them into an array, copy the string of the array, refresh the page, use jQuery to feed the array back into the cells. I tried to do it in Firefox, using the built-in inspector and Firebug, only to be reminded of how much I dislike Firefox’s slow and clunky inspector tools (I was using Firefox since it performs better than Chrome on super-sized web apps like a massive TablePress table).

So I needed a way to move my work to Chrome, but how? I saved the TablePress page as an HTML document on my computer, then opened it in Chrome. Saving the editor as an HTML document causes the values of the input fields to be saved, so when I opened it in Chrome all the values of the cells were there.

Next, I used a jQuery bookmark to load jQuery on the page in Chrome, then I ran the following two lines in the console:

my_array = [];
$('textarea').each(function(){ my_array.push($(this).val()); });

The above code loads the values of the textboxes into an array. The Chrome console doesn’t have a way of letting you copy an object or array’s source code so that you can paste it somewhere else, therefore we have to improvise. We know that the console will print out the value of any object, and if it is a string, it will plainly print the string.

In the above example, we place the word “hello” in the variable x, then on the next line simply write the name of the variable and press enter, causing Chrome to give us the string “hello”. As seen below, if we type the name of an array variable, Chrome enables us to browse the values inside the array. This is usually helpful, but not this time, since we need the array in a format that can be copied.

What we need is to stringify the array somehow. In this case, the JavaScript JSON API comes to the rescue. We place the array my_array inside the my_string variable using the line below:

var my_string = JSON.stringify(my_array);

Afterwards, we type my_string into the console, causing Chrome to show the plaintext version of the array:

We then copy the entire text (making sure to skip the beginning and end quotes added by the stringify function, since we won’t be needing them), then open the TablePress backend on a new tab, loading the table we were working on. The table will lack the cells we had added but could not save. Now we populate this working backend with the data we copied. We open the console, re-enable jQuery using the bookmark, and use the following line to load the text into an array. We do not have a need to use the JSON API’s parse function, since the plaintext is already a valid array initialization.

Below we see the array my_array, ready to be populated with the string we copied:

Next, we use the line below to add the values of the array into the table:

$('textarea').each(function(){ $(this).val(my_array.shift());   });

All done! In the first .each function above, we used my_array.push() to add values to the end of the array. To keep the values in order, we now use my_array.shift(), getting items from the beginning of the array and feeding them to the textareas from first to last.
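
The whole round trip can be tried without a browser. The sketch below is only an illustration: plain objects stand in for the textareas, and JSON.parse stands in for pasting the copied text into the console as an array literal. The variable names are made up.

```javascript
// Stand-ins for the editor's textareas; in the browser these would be the
// elements matched by $('textarea').
const oldCells = [{ value: 'Name' }, { value: 'He said "hi"' }, { value: '42' }];

// Step 1 (broken page): collect every cell value, first to last.
const saved = [];
oldCells.forEach((cell) => saved.push(cell.value));

// Step 2: turn the array into text that can be copied out of the console.
const text = JSON.stringify(saved);

// Step 3 (fresh page): rebuild the array from the copied text.
// Assumption: JSON.parse is used here instead of pasting the text as an array literal.
const restored = JSON.parse(text);

// Step 4: feed the values back, taking from the front so the order is preserved.
const newCells = [{ value: '' }, { value: '' }, { value: '' }];
newCells.forEach((cell) => { cell.value = restored.shift(); });

console.log(newCells.map((cell) => cell.value));
```

This only works if the fresh page has the same number of textareas in the same order as the broken one; otherwise the values will land in the wrong cells.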

In this way I managed to get my work back. Another solution I could have tried would have been to see if WordPress could be forced to accept the data that it was rejecting (it was rejecting it due to an expired session or something like that). But such a solution may have required a lot more work and possibly modifications to the WordPress core, which is always risky and not fun.

How to automate and throttle Relevanssi indexing on large websites

First of all, update Relevanssi to the latest version. This significantly increased indexing performance on my 80,000+ page website.

Next, I created the following hacky solution for a problem that shouldn’t exist: Relevanssi cannot silently index everything without hogging all server resources. First, find out the number of pages Relevanssi can index in one go without overloading your server, say 500. Then use the following Tampermonkey script on the Relevanssi settings page. You need Chrome’s Tampermonkey extension. Here’s what the script does:

  1. It enables jQuery on the Relevanssi dashboard.
  2. It waits 15 seconds, then clicks the “Continue indexing” button. Once the indexing is done and the page reloads, it waits 15 seconds, then clicks it again, and so on.
  3. Leave this running in a tab until all pages are indexed, then turn the script off and close the tab.

Below is the code:

// ==UserScript==
// @name       Relevanssi Index Button Clicker
// @namespace  http://hawramani.com
// @version    0.1
// @description  Click click click
// @match      http://mywordpressite.com/wp-admin/options-general.php?page=relevanssi/relevanssi.php
// @copyright  2014 jQuery, Ikram Hawramani
// ==/UserScript==


(function () {
 
    function loadScript(url, callback) {
 
        var script = document.createElement("script");
        script.type = "text/javascript";
 
        if (script.readyState) { //IE
            script.onreadystatechange = function () {
                if (script.readyState == "loaded" || script.readyState == "complete") {
                    script.onreadystatechange = null;
                    callback();
                }
            };
        } else { //Others
            script.onload = function () {
                callback();
            };
        }
 
        script.src = url;
        document.getElementsByTagName("head")[0].appendChild(script);
    }
 
    loadScript("https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function () {
 
         //jQuery loaded
         console.log('jquery loaded');

        setTimeout(function(){$('[name="index_extend"]').click();},15000);
 
    });
 
 
})();
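
The script loads jQuery only to click one button, so the same thing can probably be done without it. Below is a minimal sketch of that variant, assuming the button is still named index_extend as in the script above; the function name and its parameters are hypothetical, and the document and timer are passed in only so the logic can be tried outside a browser.

```javascript
// Hypothetical helper: schedule a click on the "Continue indexing" button.
// doc      - the page's document (or a stand-in with querySelector)
// delayMs  - how long to wait before clicking
// schedule - a setTimeout-like function
function clickContinueIndexing(doc, delayMs, schedule) {
  const button = doc.querySelector('[name="index_extend"]');
  if (!button) {
    return false; // no such button on this page, so nothing is scheduled
  }
  schedule(() => button.click(), delayMs);
  return true;
}

// In the userscript itself (runs only where a document exists):
if (typeof document !== 'undefined') {
  clickContinueIndexing(document, 15000, (fn, ms) => setTimeout(fn, ms));
}
```

As with the original, this relies on the page reloading after each indexing run so that Tampermonkey runs the script again.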

How cyber pirates anonymously torrent movies on the internet

For my views on Internet piracy see my essay: Why Digital Piracy is Ethical and Necessary

We all know that you, as a law-abiding citizen, would never download a car. And yet there are people out there who download movies for free and refuse to add a few more bucks to the billions of dollars that movie studios squat upon. There are film executives who, thanks to cyber pirates, only have a net worth of $100 million instead of $101.

So how do they do it? How are these cyber criminals subverting our democracy and freedoms to acquire knowledge and entertainment for free without making the wealthy even wealthier? It all boils down to three simple letters: I2P.

I2P, or the Invisible Internet Project, is a project that enables anyone anywhere to download information in a way that makes it impossible for anyone to track them or reveal their identity. Many experts at the CNN agree that our democracy is in great danger when we freely allow citizens to practice speech that is genuinely free. Speech needs to be controlled and approved by the government, for our own security, and most importantly, the safety of our children. The cyber police work tirelessly to prevent free speech from actually taking place. But the pirates have found a home in I2P where no one can catch them.

I2P is slightly like TOR, which you may have heard of. However, unlike TOR, I2P is not used to browse normal internet sites (though it can be used that way), rather, it has its own sites, such as stats.i2p. And unlike TOR, I2P supports and encourages torrenting; it even has a built-in torrent client that is ready to go as soon as you install I2P.

Cyber pirates follow the steps below when they anonymously download high quality Blu-ray movies, ebook and textbook collections, and the latest Battlefield video game. We can show you the steps since downloading, installing and using I2P is perfectly legal under current laws (so long as you do not intentionally seek out and download copyrighted movies, books, songs, etc., see step 18 below for more clarification on this).

  1. First, they visit the I2P site to download the I2P software:
    If the site is for some reason down or has moved, they can easily find the new site by Googling “download i2p”:
  2. Then they click the I2P download link to download the I2P software:
  3. Below is a picture of the finished I2P software download:
  4. They may then do a signature check to make sure their version of I2P has not been tampered with. You can read TOR’s guide for how to do this, and apply the same logic to I2P.
  5. Once I2P is installed, they do not run it. They will set up a browser to be fully dedicated to I2P. This means that the browser will be able to browse I2P websites, but not ordinary internet sites. In our example we show how the Opera browser can be configured to handle I2P. They click on the Opera button, then point to Settings->Preferences:
  6. Then they click the Advanced tab:
  7. Then they click on the Network section, then the “Proxy servers” button:
  8. Then they make the following changes to the window that pops up, then click “OK”:
  9. Once they are done setting up Opera, they start I2P. There are two programs, and it doesn’t matter which one you run, the only difference is that the second one has a restart option. In our example we show you the restartable one:
  10. The I2P Service window shows up for them. Here they wait a little while for the program to fully start up.
  11. If all goes well, their computer launches their default browser, which could be Internet Explorer. While they do not want this, it is useful for getting the address to the I2P service. Thus they copy the address shown.
  12. They go to Opera and paste the address in the address bar. Then they drag the icon where it says “Web” to the bookmarks bar for easy navigation in the future.
  13. They wait a while as their I2P program becomes integrated into the worldwide network. They watch these two indicators on the I2P homepage. Once they are green, they know they are good to go:
  14. Now, they click on the “Torrents” link at the top of the I2P Console.
  15. They are taken to I2PSnark, which is the built-in torrent client for I2P. Currently the client is empty since we haven’t added any torrents. They click on the “Postman” link to take them to the Postman tracker, which is the largest torrent tracker on I2P. There is also the Diftracker link, which is another tracker.
  16. Depending on how long the I2P program has been running, the Postman website will open immediately or after a while. They may also get a “Proxy server error” kind of page, which is nothing to be worried about, they will simply try the website again in 5-10 minutes.
    In the image it can be seen that the sneaky anonymous cyber pirates have uploaded torrents for a movie called Let the Right One In and a video game called Wasteland 2.
  17. Since we are perfectly law-abiding citizens, we will show an example of downloading a legal non-copyrighted file from the Postman I2P bittorrent tracker. But the pirates download movies and other files, committing copyright infringement. Of course, nobody, government or otherwise, can catch them doing it, since everything is fully anonymous and encrypted. So they get away with downloading their favorite movies without making the super wealthy even wealthier. The communism!
    Here, to find a legal file, we put the keyword “pdf” into the search box so that we only see ebook files, some of which are copyright-free and legally distributable.
  18. Here is an example of some of the books that came up. We find some German magazine, a book by John Gray for clueless men trying to lead a politically correct life, two sex guides for autistic individuals, some convoluted self help nonsense, and a book for antenna nerds. These are all copyrighted books, therefore we will have to skip them; we wouldn’t download a car, so why would we download books? Of course, balaclava-wearing cyber pirates do not skip them just because they are copyrighted, since they know the cyber police have no way of catching them, since they are using I2P.
  19. After a very, very long time, we find a book that seems copyright-free.
  20. Here, the pirates will right-click the magnet icon on the left of the book title and click “Copy Link Address”.
  21. Then, they will go back to I2PSnark, paste the link in the “From URL” box, then click “Add torrent”.
  22. Below we see that the torrent has been added to the list of torrents. The word “Magnet” ahead of all those numbers tells the pirate that the torrent file hasn’t been fetched yet (it usually takes a minute or two). Once it is, the name for the torrent will be shown.
  23. Below you can see the finished torrent being seeded. We have blacked out the names of the other torrents for undisclosed reasons. Seeding is also perfectly anonymous; therefore pirates often leave many torrents running in the seeding mode to help other pirates download things faster. Due to all of the cryptography that happens, downloading more than 5 torrents at the same time can cause significant CPU usage.

No Reddit table maker? How to easily put a large table on reddit

It is pretty easy to put a table on reddit. Below are step-by-step instructions:

  1. If the data is on a website, or in an Excel spreadsheet (you can skip these if not):
    1. Create a new spreadsheet and paste your data in there.
    2. Save your data as a comma-separated text file (CSV).
    3. Open the file in Notepad (or another plain text editor) and copy its contents.
  2. Go to Truben.no.
  3. If you copied your data from a CSV (as in step 1), go to file->import and choose CSV, then paste what you copied in step 1.3.
  4. If you are making a table from scratch, simply enter your data into the boxes you see. Use the menus to modify the table to fit your needs.
  5. Once you have your data, click on the Markdown tab.
  6. Copy the text you see below the tab and paste it in the reddit editor.
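
For small tables, the conversion the steps above rely on can also be sketched in a few lines of JavaScript. This is only an illustration of the Markdown table layout, not how Truben.no works; the function name is made up, and it assumes simple CSV whose cells contain no commas, quotes, or pipe characters.

```javascript
// Hypothetical helper: convert simple CSV text into a Markdown table.
// Assumes the first line is the header and that cells contain no commas,
// quotes, or pipe characters.
function csvToMarkdownTable(csv) {
  const rows = csv
    .trim()
    .split(/\r?\n/)
    .map((line) => line.split(',').map((cell) => cell.trim()));
  const [header, ...body] = rows;
  // The row of dashes under the header is what marks the block as a table.
  const separator = header.map(() => '---');
  return [header, separator, ...body]
    .map((cells) => cells.join(' | '))
    .join('\n');
}

console.log(csvToMarkdownTable('Name,Age\nAda,36\nAlan,41'));
// Name | Age
// --- | ---
// Ada | 36
// Alan | 41
```

The printed text can be pasted into the reddit editor the same way as the output of the Markdown tab in step 6.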

Using query_posts() as if it is get_posts()

Some filters work only with query_posts(); but what if you wanted to use one of these filters in a situation where you would normally use get_posts()? Below is the translation:

Original get_posts() query:

$args = array('orderby'=> 'title', 'order' => 'ASC','fields' =>'ids');

$posts_array = get_posts($args);

Translated to query_posts():

$args = array('posts_per_page'=>-1, 'orderby'=> 'title', 'order' => 'ASC','fields' =>'ids');
// the -1 means return all posts, without it you will get the
// number of posts you've set your blog to show per page

$posts_array = query_posts($args);

// do your thing here

wp_reset_query(); // this stops your query_posts() query from affecting other functions;
                  // without it functions like is_single() will break

How to ignore accents and other diacritics in WordPress/MySQL search (Arabic, French, etc.)

On my new Asmaa.org website, which is an Arabic-language baby name resource, I use a simple loop to show the posts in alphabetical order. Each post title is a baby name:

$args = array( 'paged' => $paged, 'orderby' => 'title', 'order' => 'ASC', 'cat' => $cat_id );
query_posts( $args );
while ( have_posts() ) : the_post();
    // output the post (the baby name) here
endwhile;

Since the Arabic alphabet is an abjad, most vowels are added to a word as diacritical marks. This has the unfortunate consequence of causing علم and عَلَم, two words that should be shown right next to each other, to be shown miles apart in an alphabetical sort.

I solved the issue with this WordPress filter:

add_filter('posts_orderby', 'cleanse_diacritics');

function cleanse_diacritics($d) { // $d is the string 'wp_posts.post_title ASC' (or something similar) in a default
                                  // WordPress install, assuming you are sorting alphabetically ascending
    if(strpos($d,'title') !== false) { // if the string 'title' is in the ORDER BY clause, we know that
                                       // we are dealing with an alphabetical sort;
                                       // no need to touch other queries, such as ordering by post_date

        // below we replace the default ORDER BY clause WordPress passes to MySQL,
        // using a series of nested REPLACE() calls to strip diacritics before sorting
        $d = 'REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(HEX(REPLACE(
wp_posts.post_title, "-", "")), "D98E", ""), "D98B", ""), "D98F", ""), "D98C", 
""),"D991",""),"D992",""),"D990",""),"D98D","") ASC';
    }
    return $d;
}

I got the nested MySQL replace() functions from this StackOverflow answer.

Explanation: When you run query_posts(array('orderby' => 'title')) or something similar, the posts_orderby filter can be used to modify the ORDER BY part of the MySQL query. We wrap the name of the relevant MySQL column in REPLACE() functions that remove all diacritics by their hex UTF-8 byte sequences, which results in a diacritic-insensitive sort.

If you are dealing with a language other than Arabic, you may need to replace a code with another code (é [C3A9] to e [65] for example) instead of replacing with an empty string.
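
The idea is easier to see outside SQL. The sketch below is a JavaScript illustration of the same technique, not part of the WordPress filter: the eight hex codes stripped above (D98B through D992) are the UTF-8 encodings of the Arabic marks U+064B through U+0652, so removing that range from a title before comparing gives the same diacritic-insensitive order. The function names are made up.

```javascript
// The Arabic diacritics U+064B..U+0652, whose UTF-8 encodings are D98B..D992.
const ARABIC_DIACRITICS = /[\u064B-\u0652]/g;

// Hypothetical helper: build the sort key by removing hyphens and diacritics,
// mirroring the nested REPLACE() calls in the SQL above.
function stripArabicDiacritics(title) {
  return title.replace(/-/g, '').replace(ARABIC_DIACRITICS, '');
}

// Sort a copy of the titles by their stripped keys.
function sortIgnoringDiacritics(titles) {
  return [...titles].sort((a, b) => {
    const keyA = stripArabicDiacritics(a);
    const keyB = stripArabicDiacritics(b);
    if (keyA < keyB) return -1;
    if (keyA > keyB) return 1;
    return 0;
  });
}

const plain = '\u0639\u0644\u0645';             // علم
const voweled = '\u0639\u064E\u0644\u064E\u0645'; // عَلَم
const other = '\u0639\u0645\u0631';             // عمر

console.log([voweled, other, plain].sort());                 // plain, other, voweled
console.log(sortIgnoringDiacritics([other, voweled, plain])); // voweled, plain, other
```

With the plain sort, عمر lands between the two spellings; with the stripped keys, the two spellings compare as equal and end up next to each other.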

Considerations

The filter posts_orderby does not seem to work with get_posts(). There is a workaround however; see: Using query_posts() as if it is get_posts().