Code used to build a Tracker Radar data set from raw crawl data.

duckduckgo/tracker-radar-detector

DuckDuckGo Tracker Radar Detector

This is the code used to build a Tracker Radar data set using crawl data from the Tracker Radar Collector.

Getting Started

To generate a Tracker Radar data set, follow these steps:

  1. Clone the Tracker Radar data repo

  2. Generate 3rd party request data using the Tracker Radar Collector

  3. Update the paths in config.json to point to your newly created crawler data files and the location of your Tracker Radar data repository

  • trackerDataLoc: path to your Tracker Radar data repository
  • crawlerDataLoc: path to your crawler data directory
  • performanceDataLoc: path to your performance crawler data
  • nameserverListLoc: path to your nameserver-to-entity file
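
Before building, it can help to see which of these paths are still unset. The following is a minimal sketch, assuming config.json is a flat JSON object with the four keys above at its top level. The helper name `unsetConfigPaths` is hypothetical and not part of this repository, and some of the paths may be optional for your workflow (for example, the performance data).

```javascript
// Hypothetical helper, not part of tracker-radar-detector.
// Assumes the four path keys listed above sit at the top level of config.json.
const PATH_KEYS = [
    'trackerDataLoc',
    'crawlerDataLoc',
    'performanceDataLoc',
    'nameserverListLoc'
]

// Return the path keys that are absent or empty in a parsed config object.
function unsetConfigPaths(config) {
    return PATH_KEYS.filter(key => typeof config[key] !== 'string' || config[key].trim() === '')
}

// Example with placeholder paths (replace with your own locations).
const exampleConfig = {
    trackerDataLoc: '/path/to/tracker-radar',
    crawlerDataLoc: '/path/to/crawl-data',
    performanceDataLoc: ''
}
console.log(unsetConfigPaths(exampleConfig))
```

In practice you would pass in the parsed contents of your own config.json, for example via `JSON.parse(fs.readFileSync('config.json', 'utf8'))`.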

Generating Tracker Radar data

  • Install dependencies

npm install

  • Build site performance summary (optional)

npm run build-performance

  • Update entity data (optional). Note: this step requires some manual validation of the output data; see here for more info.

npm run update-entities
npm run apply-entity-changes

  • Build Tracker Radar data files

npm run build

Note that resolving CNAMEs requires Node.js version 12 or later. You can disable CNAME resolution by setting the options treatCnameAsFirstParty=true and keepFirstParty=false in the config file.
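
The rule in the note above can be sketched as a small check. This is an illustration only: it assumes treatCnameAsFirstParty and keepFirstParty are top-level booleans in config.json, and the helper name `cnameStatus` is hypothetical, not something this repository provides.

```javascript
// Hypothetical helper, not part of tracker-radar-detector.
// Assumes treatCnameAsFirstParty and keepFirstParty are top-level booleans
// in config.json, as the note above suggests.
function cnameStatus(nodeVersion, config) {
    // Per the note above, this combination turns CNAME resolution off.
    if (config.treatCnameAsFirstParty === true && config.keepFirstParty === false) {
        return 'disabled'
    }
    // nodeVersion is a string such as "12.22.0" (the format of process.versions.node).
    const major = parseInt(nodeVersion.split('.')[0], 10)
    return major >= 12 ? 'supported' : 'needs-node-12'
}

// Check the running Node.js version against an empty (default) config.
console.log(cnameStatus(process.versions.node, {}))
```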

PostgreSQL data source

Crawler data can also be read from a PostgreSQL database. To enable this, set the crawlerDataLoc to postgres, and set the crawlId and region options in config.json. Database details should be provided via environment variables, for example with envdir:

envdir /etc/ddg/dbenv/tracker_radar_readonly/ npm run build

See the node-postgres documentation for more details on connection options.
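
As a rough pre-flight check for this setup, the sketch below inspects a parsed config object and an environment object. It assumes crawlId and region are top-level keys in config.json. The helper name `checkPostgresSetup` and the example values are hypothetical. The variable names are the standard libpq-style ones that node-postgres reads; which of them you actually need depends on your database setup, since node-postgres falls back to defaults for unset values.

```javascript
// Hypothetical helper, not part of tracker-radar-detector.
// Standard libpq-style variables that node-postgres reads when no explicit
// connection options are passed.
const PG_ENV_VARS = ['PGHOST', 'PGPORT', 'PGUSER', 'PGPASSWORD', 'PGDATABASE']

// Report config problems and list which connection variables are unset.
// Assumes crawlId and region are top-level keys in config.json.
function checkPostgresSetup(config, env) {
    const problems = []
    if (config.crawlerDataLoc !== 'postgres') {
        problems.push('crawlerDataLoc is not set to "postgres"')
    }
    for (const key of ['crawlId', 'region']) {
        if (config[key] === undefined || config[key] === null || config[key] === '') {
            problems.push(`${key} is not set in config.json`)
        }
    }
    // Unset variables are informational only: node-postgres has defaults.
    const unsetEnv = PG_ENV_VARS.filter(name => !env[name])
    return {problems, unsetEnv}
}

// Example with placeholder values; pass process.env for a real check.
const pgCheck = checkPostgresSetup(
    {crawlerDataLoc: 'postgres', crawlId: 1, region: 'US'},
    {PGHOST: 'db.example.com', PGUSER: 'readonly'}
)
console.log(pgCheck)
```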

Nameserver list file

To assign entity/domain ownership using groups of nameservers, you can provide a nameserver list file.

The format of the nameserver list is:

[
    {
        "name": "entity name, must match name in Tracker Radar /entities file",
        "nameservers": [
            "nameserver1",
            "nameserver2",
            ...
        ]
    }
]
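
A file in this shape can be sanity-checked before a build. The sketch below validates only the structure described above (an array of objects, each with a string "name" and an array of nameserver strings). It does not check that names match the Tracker Radar entities, and the helper name `validateNameserverList` and the example entries are hypothetical.

```javascript
// Hypothetical helper, not part of tracker-radar-detector.
// Checks only the structure of a parsed nameserver list, not its contents.
function validateNameserverList(list) {
    if (!Array.isArray(list)) {
        return ['top-level value must be an array']
    }
    const errors = []
    list.forEach((entry, i) => {
        const isObject = entry !== null && typeof entry === 'object'
        if (!isObject || typeof entry.name !== 'string' || entry.name === '') {
            errors.push(`entry ${i}: "name" must be a non-empty string`)
        }
        if (!isObject || !Array.isArray(entry.nameservers) ||
            entry.nameservers.some(ns => typeof ns !== 'string')) {
            errors.push(`entry ${i}: "nameservers" must be an array of strings`)
        }
    })
    return errors
}

// Example with placeholder data: the second entry is missing its nameservers.
const sampleList = [
    {name: 'Example Entity', nameservers: ['ns1.example.com', 'ns2.example.com']},
    {name: 'Another Entity'}
]
console.log(validateNameserverList(sampleList))
```

For a real file, parse it first, for example with `JSON.parse(fs.readFileSync(pathToList, 'utf8'))`, where `pathToList` is whatever you set as nameserverListLoc.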

Contributing

Reporting bugs

  1. Check whether the bug has already been reported
  2. Create a bug report issue

New features

Right now all new feature development is handled internally.

Bug fixes

Most bug fixes are handled internally, but we will accept pull requests for bug fixes if you first:

  1. Create an issue describing the bug.
  2. Get approval from DDG staff before working on it. Since most bug fixes and feature development are handled internally, we want to make sure that your work doesn't conflict with any current projects.

Questions or help with anything else DuckDuckGo related?

See the DuckDuckGo Help Pages.

This software is licensed under the terms of the Apache License, Version 2.0 (see LICENSE).
