Avoiding Chromedriver site detection on Apple Silicon Macs

To this day (2022-11-03 as of this writing), many sites hold a plethora of useful information, some of which you may even pay for, without making that data accessible via any sort of API (cough, banks, cough). A pretty common workaround is to use an automation tool to screen scrape the data so that you can do whatever you like with it. For a variety of valid and invalid reasons, though, site makers have a vested interest in preventing people from doing this. Think of scalper bots that flood a retailer to grab in-demand items, only to resell them at huge markups (the PS5 comes to mind).

As a result, these companies have implemented tools that detect whether automation software is controlling a browser and, if so, make their site stop working properly. They'll even be sneaky about it and not tell you why the site isn't working.

One common way these tools work is by scanning the JavaScript variables in use on the page and looking for one that chromedriver commonly adds. The name is somewhat random and changes frequently, but it usually starts with $cdc_.
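
If you want to see which marker your copy of chromedriver carries, you can search the binary for it. The following is a minimal sketch, assuming your grep supports the -a and -o options (GNU and BSD grep both do). The name list_markers is a hypothetical helper for illustration, not part of chromedriver.

```shell
# list_markers FILE
# Print each distinct identifier in FILE that starts with "cdc_".
# (Hypothetical helper name, used only for this illustration.)
list_markers() {
    # -a treats the binary as text, -o prints only the matching part.
    grep -a -o 'cdc_[A-Za-z0-9_]*' "$1" | sort -u
}
```

You would call it as list_markers "$(which chromedriver)". If it prints nothing, that build may use a different marker and the replacement described here may not apply.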

A fairly common workaround for this is to replace the variable name in the chromedriver binary with something different. You can do this in a hex editor or with a simple perl command:

perl -pi -e 's/cdc_/dog_/g' `which chromedriver`

This little trick worked like a charm in so many instances. When Apple switched to Apple Silicon, however, that all changed. As a security feature, whenever a native ARM64 binary is executed, Gatekeeper first verifies the binary's signature to make sure it hasn't been tampered with. On Intel Macs, this check was optional. If you modify chromedriver with the command above, the binary will silently fail to execute. You need to open the macOS Console to see what's going on, where you'll find this error:

Exception Type:        EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))

You could disable Gatekeeper, but that's a really bad idea since it's there to protect you. A better workaround is to re-sign the binary after the modification has been completed. You simply need to remove the existing signature and then replace it with a brand new one that you've created:

# Path to the chromedriver binary that was modified above
CHROMEDRIVER=$(which chromedriver)
codesign --remove-signature "$CHROMEDRIVER"
codesign --force --deep -s - "$CHROMEDRIVER"
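
The re-signing steps can be wrapped in a small helper that also checks the result. This is a minimal sketch, not a definitive script: the name resign_binary is hypothetical, and the final step assumes that codesign --verify is an acceptable way to confirm the new signature. On a machine without codesign (anything other than macOS), it skips the work instead of failing.

```shell
# resign_binary FILE
# Remove the existing signature from FILE, apply an ad hoc one, and verify it.
# (Hypothetical helper name, used only for this illustration.)
resign_binary() {
    target="$1"
    # codesign only exists on macOS; do nothing elsewhere.
    if ! command -v codesign >/dev/null 2>&1; then
        echo "codesign not found; skipping (not macOS?)"
        return 0
    fi
    # Each step runs only if the previous one succeeded.
    codesign --remove-signature "$target" &&
    codesign --force --deep -s - "$target" &&
    codesign --verify --verbose "$target"
}
```

You would call it as resign_binary "$(which chromedriver)". A non-zero exit status means one of the codesign steps failed, in which case the binary will probably still be killed at launch.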

So until your site exposes a proper, secure API, or until Apple decides to disallow ad hoc signed binaries (which is what signing with "-s -" produces), this trick will have to do.

Important note 1: Screen scraping may violate a site's terms of service, so be careful on that front. The cost of violating those terms may not be worth the benefits you get from creating basic automations.

Important note 2: While there are some genuinely great bots out there (e.g., ones that scanned all the COVID vaccine portals to help people book appointments), the vast majority are used to scalp in-demand goods and resell them at stupid prices. I hope those bots get detected and suffer severe consequences, as they add no value to society. They prey on people and create chaos, all to enrich a select few individuals. It's highly unlikely that people reading this post will use the knowledge to deploy bots like this, since those usually run on Linux servers. But if for some reason that is you, please consider applying your talents elsewhere and helping someone in need.

Important note 3: This post was written as a summary of a question I asked on Stack Overflow. I'd like to thank the community members there who helped answer my question as this was driving me nuts.