• Hello WordPress community 🙂

    I am currently working on a research project that compares self-hosted and cloud-hosted websites, using WordPress as the example.
    For this, I need to automatically separate WordPress websites from non-WordPress websites. Since I am not very proficient with WordPress, I wanted to ask whether you know of any features that identify a WordPress website (like specific HTTP headers or anything else that could help).

    So far, my WordPress identification involves checking whether a login is present on /wp-admin or /wp-login, in addition to parsing the HTML to determine how many CSS classes and href attributes contain the string “wp-”.
    Because the login could also be provided via a different source and non-WordPress sites could potentially use WordPress CSS stylesheets (or vice versa), I am not yet confident in my identification process, but I lack the in-depth knowledge of WordPress to refine it further.
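
To make the HTML part concrete, here is a minimal sketch of the counting step, assuming the page source has already been downloaded as a string. The names `WpMarkerCounter` and `count_wp_markers` are made up for illustration, and how many hits should count as "WordPress" is still an open question.

```python
from html.parser import HTMLParser


class WpMarkerCounter(HTMLParser):
    """Counts class names and href values that contain the substring 'wp-'."""

    def __init__(self):
        super().__init__()
        self.class_hits = 0
        self.href_hits = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                # Attribute without a value, e.g. <a href>.
                continue
            if name == "class":
                # A class attribute may hold several space-separated names.
                self.class_hits += sum("wp-" in cls for cls in value.split())
            elif name == "href" and "wp-" in value:
                self.href_hits += 1


def count_wp_markers(html):
    """Return (class_hits, href_hits) for one HTML document."""
    counter = WpMarkerCounter()
    counter.feed(html)
    counter.close()
    return counter.class_hits, counter.href_hits
```

The two counts are returned separately so that a threshold can be tuned for each one on its own.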

    If anyone has better ideas or more insight into WordPress, I would be really thankful for your suggestions!

    PS: Also, if you have any info on anything that is specific to self-hosted WordPress instances, that would help me a lot as well.

    Cheers

Viewing 2 replies - 1 through 2 (of 2 total)
  • You would have to check in several ways, and as soon as one of them applies, you have found a WordPress site.

    The existence of /wp-admin/ is a good indication, but the path can be changed. wp-login.php is much more likely to be present. You could also look for the “generator” meta tag in the source code of the start page; its content has to contain “WordPress” to count as a match. Then there are the readme.html and license.txt files in the root directory, but these are not always there either. Finally, there is the REST API under /wp-json/.
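
The checks above could be combined roughly like this. It is only a sketch under a few assumptions: the caller has already fetched the start page and collected the final HTTP status code for each path (the fetching is left out), a status of 200 is treated as "present", and the function names are made up for illustration.

```python
from html.parser import HTMLParser

# Paths mentioned above; a 200 response is treated as "present" (assumption).
PROBE_PATHS = ("/wp-login.php", "/wp-admin/", "/wp-json/",
               "/readme.html", "/license.txt")


class GeneratorFinder(HTMLParser):
    """Remembers the content of <meta name="generator">, if there is one."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "generator":
                self.generator = attr.get("content") or ""


def wordpress_signals(homepage_html, status_by_path):
    """List every indicator that matched for one site."""
    signals = []
    finder = GeneratorFinder()
    finder.feed(homepage_html)
    finder.close()
    if finder.generator and "wordpress" in finder.generator.lower():
        signals.append("generator meta tag")
    for path in PROBE_PATHS:
        if status_by_path.get(path) == 200:
            signals.append(path)
    return signals


def looks_like_wordpress(homepage_html, status_by_path):
    """Any single matching indicator counts as a hit."""
    return bool(wordpress_signals(homepage_html, status_by_path))
```

Returning the list of matched indicators, not just a yes/no, makes it easier to see afterwards which checks caused false positives, for example a readme.html on a non-WordPress server.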

    Thread Starter user01234567

    (@user01234567)

    Sorry for the late reply, and thank you so much for your input.

    wp-login.php and the REST API sound like helpful indicators. The readme and license files could also be present in the root directory of non-WordPress web servers, but I am not sure how common that is.

    I’ll probably run a small proof of concept in the next few weeks and check the accuracy manually afterwards. If I find something interesting, I can add it here if you’re interested.
