A friend asked me to help her manipulate the results of an online poll. Being a challenge-lover, and not having done anything like this before, I gladly accepted.

Using the Inspector tool in Firefox, we can see that the vote button calls the JavaScript function chkCapPass(id), which is listed below along with other related code. (Comments have been removed, since they were in Chinese anyway.)

function chkCapPass(id) {
    $.ajax({
        url: "http://hidden/vote/chkcapnum",
        type: "post",
        dataType: "json",
        data: {},
        success: function(response) {
            if (response.status == "faild") {
                alertify.alert(response.errMsg, function() {});
            } else if (response.status == "success") {
                if (response.num > 0) {
                    send_vote(id);
                } else {
                    $('.modal-content').html('');
                    $.ajax({
                        url: "http://hidden/vote/votecaptchain" + "/" + id,
                        type: "get",
                        success: function(response) {
                            $('.modal-content').html(response);
                        }
                    });
                    $('#showcap')[0].click();
                }
            }
        }
    });
}
isvote = "";

function send_vote(id) {
    if (isvote != '') {
        alertify.alert(isvote, function() {});
        return false;
    }
    $.ajax({
        url: "http://hidden/vote/to_vote",
        type: "post",
        dataType: "json",
        data: {
            voteid: id,
            captchain: $('#vote_captcha').val()
        },
        success: function(response) {
            if (response.status == "faild") {
                alertify.alert(response.errMsg, function() {});
            } else if (response.status == "success") {
                if ($('.close').length > 0) {
                    $('.close')[0].click();
                }
                add_vote_num(id);
                isvote = "您今天已投過此項目"; // "You have already voted for this item today"
                showlotteryview(response.chk_val, response.lottery, id);
            }
        }
    });
}

Combining this with the network tab while manually sending a vote, it’s easy to see that the process is as follows:

  1. Fetch /vote/chkcapnum to see if a CAPTCHA should be displayed
  2. If so (in my tests, it seems to always be the case), get the CAPTCHA page from /vote/votecaptchain/[id]
  3. Fetch the CAPTCHA image from /captcha/chkcode (link from the previous step)
  4. Send a POST request to /vote/to_vote with the id and CAPTCHA string

After understanding the requests, we need to programmatically solve the CAPTCHAs.

I tried to use Tesseract for the job. However, many of the Python wrappers depend on pip modules such as PIL or OpenCV, which aren’t that straightforward to install on Windows. (Asking my friend to install Linux in VirtualBox would, of course, be difficult.)

Thus, I decided to call the command-line program directly via subprocess.Popen. Dirty, but it works. However, another problem arose: Tesseract did not seem to recognize the characters at all.

Then, I found a snippet that uses OpenCV to enhance the image before feeding it into Tesseract. However, as mentioned above, it seems that you need to jump through a lot of hoops to install OpenCV on Windows.

Thus, I launched GIMP and used its “threshold” function. Essentially, this forces all pixels darker than a certain threshold to black, and all the others to white. Sure enough, it worked like a charm.
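
To make the idea concrete, here is a minimal pure-Python sketch of thresholding on a grayscale image stored as nested lists. The function name `threshold`, the cutoff of 128, and the pixel values are all made up for illustration; this is not how GIMP implements it.

```python
def threshold(pixels, cutoff=128):
    """Map each 8-bit grayscale value to pure black (0) or pure white (255)."""
    return [[0 if value < cutoff else 255 for value in row] for row in pixels]


# A tiny 2x4 grayscale "image" with made-up values.
image = [
    [30, 200, 90, 250],
    [180, 40, 220, 100],
]
print(threshold(image))  # -> [[0, 255, 0, 255], [255, 0, 255, 0]]
```

Every in-between gray disappears, which removes faint background noise and leaves only two levels for the OCR engine to work with.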

Of course, I didn’t want to call such a large program (in fact, I don’t even know whether GIMP has a CLI). So I turned to ImageMagick, one of the best-known command-line image processing tools.

After some trial and error, the following parameters seem to work best:

convert captcha.png -black-threshold 60% -white-threshold 40% captcha-mod.png

Also, it is worth mentioning that calling Tesseract like the following:

tesseract captcha-mod.png captcha-txt digits

limits the results to digits, which can potentially increase the accuracy here.
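
OCR output can still carry whitespace or a stray non-digit character. A minimal sketch of a cleanup step, assuming the CAPTCHA consists only of digits (the helper name `clean_digits` is hypothetical, and the full script further down simply uses strip() instead):

```python
def clean_digits(ocr_text: str) -> str:
    """Keep only the ASCII digits from the text Tesseract wrote out."""
    return "".join(ch for ch in ocr_text if ch in "0123456789")
```

With this, an output such as " 12 3a4\n" becomes "1234". Whether that guess is actually the right answer is still up to the server.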

After trying the code out, however, it seems like the server blocks your IP if you send requests too frequently. Of course, I could add some code that fetches a list of proxies from the Internet and rotates through them, but frankly I’m a bit lazy. So in the end I just added a simple time.sleep(30) in the loop.

The following is the resulting full code: (The website and vote id have been hidden/modified.)

from urllib import request, parse
from http.cookiejar import CookieJar
import subprocess
import time

while True:
  cj = CookieJar()
  opener = request.build_opener(request.HTTPCookieProcessor(cj))
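  # Accept-Encoding is left out on purpose: urllib does not decompress
  # gzip/deflate responses for us.
  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:46.0) Gecko/20100101 Firefox/46.0',
          'Accept-Language': 'zh-TW,en-US;q=0.7,en;q=0.3',
          'X-Requested-With': 'XMLHttpRequest',
          'Referer': 'http://hidden/project/inside/79/'}
  # The opener reads its default headers from `addheaders`, a list of
  # (name, value) tuples; assigning to `addheader` would silently do nothing.
  opener.addheaders = list(headers.items())

  # Steps 1 and 2: replicate the requests the browser makes before voting.
  opener.open('http://hidden/vote/chkcapnum')
  opener.open('http://hidden/vote/votecaptchain/123')

  # Step 3: download the CAPTCHA image.
  resp = opener.open('http://hidden/captcha/chkcode')
  with open('captcha.png', 'wb') as pict:
    pict.write(resp.read())

  # Threshold the image, then run OCR restricted to digits.
  subprocess.Popen(['Imagemagick\\convert.exe', 'captcha.png', '-black-threshold', '60%',
                    '-white-threshold', '40%', 'captcha-mod.png']).wait()
  subprocess.Popen(['Tesseract\\tesseract.exe', 'captcha-mod.png', 'captcha-txt', 'digits']).wait()

  # Tesseract appends .txt to the output base name.
  captchaStr = ''
  with open('captcha-txt.txt') as t:
    captchaStr = t.read().strip()

  # Step 4: submit the vote together with the recognized CAPTCHA string.
  params = parse.urlencode({'voteid': 123, 'captchain': captchaStr})
  headers.update({'Content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
          'Accept': 'application/json, text/javascript, */*; q=0.01'})
  opener.addheaders = list(headers.items())
  resp = opener.open('http://hidden/vote/to_vote', params.encode('ASCII'))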

  print(resp.read())
  time.sleep(30)