wikis Search Results · repo:privacy-tech-lab/gpc-web-crawler language:Python

7 results

Welcome to the GPC Web Crawler Wiki. Here we document the specific settings for the crawler and the steps for deployment. Please feel free to take a look at our documentation in the sidebar.
  • Last updated
    on Jun 20

This page provides guidance for running and saving data from a crawl of our full 11,708-site dataset. Performing a full crawl consists of crawling crawl-set-pt1.csv - crawl-set-pt8.csv (our crawl set divided ...
  • Last updated
    on Jun 21
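
As a rough illustration of the file naming described above, the following sketch lists the eight crawl-set files in order. The function name `crawl_set_files` is hypothetical and is not part of the crawler; only the `crawl-set-pt1.csv` through `crawl-set-pt8.csv` naming comes from the page.

```python
def crawl_set_files(parts: int = 8) -> list[str]:
    """Return the crawl-set file names, pt1 through pt<parts>, in crawl order.

    Hypothetical helper: only the naming pattern is taken from the wiki text.
    """
    return [f"crawl-set-pt{i}.csv" for i in range(1, parts + 1)]


# Print the files that together make up one full crawl.
for name in crawl_set_files():
    print(name)
```

Generating the names rather than typing them out makes it harder to skip one of the eight parts when running a full crawl.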

1. Install and Configure XAMPP. Install XAMPP from here: https://sourceforge.net/projects/xampp/files/XAMPP%20Mac%20OS%20X/8.0.2/. After installing, there will be an XAMPP folder in your Applications. Open ...
  • Last updated
    on Jun 21

Our crawler is developed using Selenium. We use Selenium's built-in function driver.installAddon(PATH_TO_EXTENSION) to install our analysis extension on the automated browser. This function only takes ...
  • Last updated
    on Jun 21

After making modifications to the OptMeowt Analysis extension, it must be re-packed into an XPI file (instructions here). Then, it can be tested to ensure it is analyzing sites properly. The extension ...
  • Last updated
    on Jun 21

Navigate to the rest-api folder and run npm install. Create a file named .env with the following information:
DB_CONNECTION=mysql
DB_HOST=localhost
DB_DATABASE=analysis
DB_USERNAME=root
DB_PASSWORD=
To ...
  • Last updated
    on Jun 21
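
To check that a .env file like the one above contains the expected keys, a minimal sketch such as the following could be used. The `parse_env` helper and the `ENV_TEXT` constant are hypothetical and only illustrate the KEY=VALUE layout. The REST API itself presumably reads the file through its own Node.js tooling, which this page does not show.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; it does not handle quoting or export syntax.
    """
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# The values listed on the wiki page; DB_PASSWORD is intentionally empty.
ENV_TEXT = """\
DB_CONNECTION=mysql
DB_HOST=localhost
DB_DATABASE=analysis
DB_USERNAME=root
DB_PASSWORD=
"""

settings = parse_env(ENV_TEXT)
print(settings["DB_HOST"], settings["DB_DATABASE"])
```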

This page provides guidance for running and saving data from a crawl of our full 11,708-site dataset. Performing a full crawl consists of crawling crawl-set-pt1.csv - crawl-set-pt8.csv (our crawl set divided ...
  • Last updated
    on Jun 15