Code written for research with the Wesleyan Media Project on Google political ads

dknopf/WesleyanMediaProjectResearch

WesleyanMediaProjectResearch

Google Ad Data Creatives Webscraper

To run AdCreativeWebscraper.py, you must install the selenium Python package. The script also uses the csv module and islice from itertools, both of which are part of the Python standard library and need no separate installation. You MUST also install the chromedriver that matches your Chrome browser version from https://chromedriver.chromium.org/downloads. You can find your Chrome version at https://www.whatismybrowser.com/detect/what-version-of-chrome-do-i-have.

The webscraper takes in a CSV and outputs a new CSV. The filterAds.py script takes a CSV in the form of the google-political-ads-creative-stats file from the political ads transparency report found at https://transparencyreport.google.com/political-ads/region/US?hl=en and filters it so that only the US ads beginning on a specific date are included.
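The filtering step can be sketched roughly as follows. This is a minimal illustration, not the code in filterAds.py: the function name `filter_us_ads` is hypothetical, the column names `Regions` and `Date_Range_Start` are assumed to match the transparency report CSV, and "beginning on a specific date" is read here as "starting on or after a cutoff date".

```python
import csv
import io
from datetime import date


def filter_us_ads(in_file, out_file, start_date, region="US"):
    """Copy rows that list `region` and start on or after `start_date`.

    Assumes columns named "Regions" (comma-separated region codes) and
    "Date_Range_Start" (YYYY-MM-DD); both names are assumptions.
    Returns the number of rows kept.
    """
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        regions = [r.strip() for r in row["Regions"].split(",")]
        started = date.fromisoformat(row["Date_Range_Start"])
        if region in regions and started >= start_date:
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory example; real use would pass files opened with newline="".
sample = io.StringIO(
    "Ad_ID,Regions,Date_Range_Start\n"
    "a1,US,2020-01-05\n"
    "a2,EU,2020-01-05\n"
    "a3,US,2019-12-31\n"
)
out = io.StringIO()
kept = filter_us_ads(sample, out, date(2020, 1, 1))
```

Only the first sample row passes both checks: the second is not a US ad and the third starts before the cutoff.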

The webscraper opens a headless (no visible window) instance of Chrome, goes to the URL of each ad creative given in the original Google CSV, and scrapes the text of the ad, appending that text to the end of the CSV. It waits 0.75 seconds for the page to load; if it cannot find the XPath of the ad, it assumes that the ad has violated Google's ad terms and is therefore inaccessible. The webscraper takes about 14 hours to run on a year's worth of ads.
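The scraping loop described above might look something like the sketch below. It is an illustration under assumptions, not the contents of AdCreativeWebscraper.py: `AD_TEXT_XPATH` and `VIOLATION_MARKER` are hypothetical placeholders (the README does not give the real XPath or the text written for inaccessible ads), the function names are invented, and the scraped text is assumed to be appended as a new last field on each row.

```python
import time

# Hypothetical placeholders: the real XPath and marker are not given in this README.
AD_TEXT_XPATH = "//div[@id='ad-text']"
VIOLATION_MARKER = "POLICY VIOLATION"


def make_headless_chrome():
    """Start Chrome with no visible window (needs selenium and a matching chromedriver)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def scrape_ad_text(driver, url, wait_seconds=0.75):
    """Load the ad page, wait a fixed time, and return the ad text or the marker."""
    driver.get(url)
    time.sleep(wait_seconds)  # fixed wait for the page to load
    # "xpath" is the value of selenium's By.XPATH constant;
    # find_elements returns an empty list when nothing matches.
    elements = driver.find_elements("xpath", AD_TEXT_XPATH)
    if not elements:
        # Element not found: treat the ad as removed for a policy violation.
        return VIOLATION_MARKER
    return elements[0].text


def scrape_rows(driver, rows, url_column, wait_seconds=0.75):
    """Yield each CSV row (a list) with the scraped text appended as a last field."""
    for row in rows:
        yield row + [scrape_ad_text(driver, row[url_column], wait_seconds)]
```

A caller would create the driver once with `make_headless_chrome()`, feed `scrape_rows` the rows from `csv.reader`, write the results with `csv.writer`, and call `driver.quit()` at the end. The fixed sleep mirrors the 0.75 second wait described above; Selenium's explicit waits are an alternative when page load times vary.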

TestCloudAd... is a version of the webscraper that runs on a virtual machine. The filteredAds.csv file is a temporary CSV of ads; it does not contain the most recent snapshot of ads. AnalyzeData is a function I wrote to do some basic analysis: it gets the total number of ads and the number of ads that violate the ad policy.
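A count like the one AnalyzeData produces could be sketched as below. This is not the actual AnalyzeData code: the function name `count_ads` is hypothetical, and it assumes the scraped text is the last field of each row and that inaccessible ads carry a marker string (`VIOLATION_MARKER`, also hypothetical) in that field.

```python
import csv
import io

# Hypothetical marker: the value the scraper writes for inaccessible ads is not documented here.
VIOLATION_MARKER = "POLICY VIOLATION"


def count_ads(csv_file, marker=VIOLATION_MARKER):
    """Return (total_ads, violating_ads) for a scraped CSV with a header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    total = 0
    violations = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        total += 1
        if row[-1] == marker:  # scraped text is assumed to be the last field
            violations += 1
    return total, violations


# Tiny in-memory example with one inaccessible ad out of three.
scraped = io.StringIO(
    "Ad_ID,Ad_URL,Ad_Text\n"
    "a1,http://example.com/1,Vote on Tuesday\n"
    "a2,http://example.com/2,POLICY VIOLATION\n"
    "a3,http://example.com/3,Register today\n"
)
total, violations = count_ads(scraped)
```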
