
I'm looking for a native way in Node.js to parse HTTP messages, no matter where they come from: a simple hard-coded string, the network via TCP/UDP, or custom Duplex streams that are treated as "sockets".

In my searches I only found outdated answers that use process.binding and private properties/methods that no longer exist.

What would be the "correct" way to parse incoming HTTP messages with built-in modules?

(In the comments I accept any hint/package, but the goal should be to use only built-in Node modules.)

Thanks in advance.

  • parse-raw-http has no dependencies and about 100 lines of code.
    – Amadan
    Commented Apr 27, 2022 at 9:24
  • @Amadan Thanks for the hint, but the package has had open issues since 2018 and its last commit was 5 years ago. It doesn't look that bad, though.
    – Marc
    Commented Apr 27, 2022 at 9:48
  • I meant it more as inspiration, not as a library.
    – Amadan
    Commented Apr 27, 2022 at 10:22
  • Indeed, it's a good starting point! But I can't (and don't want to) believe there is no way to let Node do this itself. I mean, all the needed components are there.
    – Marc
    Commented Apr 27, 2022 at 10:23
  • @Marc have you looked at the node source?
    – Evert
    Commented Jun 30, 2023 at 6:52


I hope you'll forgive me if I start with a story; it gives a bit of context.

A long time ago (around 2001/2002) I decided I needed to learn more about web programming. I already knew how to write HTML and load it in the browser. I had heard about this thing called CGI, but it required me to install a server, something I thought I did not have time to learn. So, armed with my bachelor's degree and one or two years of "professional" programming experience, I thought the obvious thing to do was to learn how HTTP works.

So I started reading the HTTP RFC: https://www.rfc-editor.org/rfc/rfc2616

It turns out that HTTP is really simple but has a lot of details, most of which are optional. I needed a way to know what I could ignore and what I had to implement, because I didn't have time to implement the full spec (I had a day job to do). Google had come out only a few years earlier (too late to help me with my dissertation at uni, but just in time for my first job), so I googled "minimal http implementation". These days you'll probably get different results, but back then it gave me James Marshall's excellent HTTP Made Really Easy (which is still online today).

So the answer you're looking for is to read the HTTP Made Really Easy page.

TLDR

Due to Stack Overflow's policy it would be rude for me to just leave you with a link. I still strongly suggest you read that article, but in any case, here's the TLDR:

  1. HTTP requests are just plain text. They have the simple format of:

    Request Line
    Header
    Header
    Header
    
  2. Each part of an HTTP request is separated by a newline.

     Note: Technically the separator should be \r\n, but you are strongly encouraged to also accept \n as a newline.

  3. The head of an HTTP request (request line plus headers) is terminated by two newlines, i.e. an empty line. Any body follows after it.

     Note: Technically the terminator should be 4 bytes, \r\n\r\n, but you are strongly encouraged to also accept the 2-byte terminator \n\n.

  4. The format of the request line is:

      METHOD path PROTOCOL_VERSION
    

     METHOD is the HTTP method, such as POST, GET, PUT, DELETE etc. Typically it should be upper case.

     The path is the URL path. Traditionally it is the path of the file you're requesting, but nowadays it is more typically an endpoint handled by a web framework.

     The protocol version is in the format:

      HTTP/1.1
    

     Normally you can ignore this.

  5. The parts of the request line are separated by a space character. Technically there should be only one space, though I've seen badly malformed requests that send multiple spaces. Browsers will never send more than one space.

  6. Headers are in the format:

      Header-name: header value
    
  7. Header names can be title case, lowercase or mixed case; all are valid, because header names are case-insensitive.
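The request-line and header rules from the list can be expressed as two small helpers. This is a minimal JavaScript sketch, not a complete parser: the names parseRequestLine and parseHeaderLine are hypothetical, and it assumes each line has already been split off and stripped of its trailing \r\n.

```javascript
// Hypothetical helpers illustrating the request-line and header rules.
// Assumes `line` has already had its trailing \r\n (or \n) removed.

function parseRequestLine(line) {
  // Split on runs of spaces so sloppy clients sending "GET  /  HTTP/1.1" still parse.
  const parts = line.trim().split(/ +/);
  if (parts.length !== 3) {
    throw new Error(`Malformed request line: ${JSON.stringify(line)}`);
  }
  const [method, path, version] = parts;
  return { method, path, version };
}

function parseHeaderLine(line) {
  // Split on the FIRST colon only: values such as "example.com:8080" contain colons too.
  const colon = line.indexOf(':');
  if (colon <= 0) {
    throw new Error(`Malformed header line: ${JSON.stringify(line)}`);
  }
  return {
    // Header names are case-insensitive, so normalise them to lowercase.
    name: line.slice(0, colon).trim().toLowerCase(),
    // Whitespace around the value is not part of the value.
    value: line.slice(colon + 1).trim(),
  };
}

console.log(parseRequestLine('GET /index.html HTTP/1.1'));
console.log(parseHeaderLine('Host: example.com:8080'));
```

Lowercasing the name is one common choice (Node's own req.headers does the same); keeping the original spelling and comparing case-insensitively would work just as well.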

Knowing this, parsing HTTP is actually fairly simple:

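// Pseudocode: read() stands for "wait for the next chunk of input"

let headers = {};
let method = '';
let path = '';
let buffer = '';

while (true) {
    const input = read();

    buffer += input;

    // The head ends at the first empty line: \r\n\r\n (or \n\n from lenient clients)
    if (buffer.includes('\r\n\r\n') || buffer.includes('\n\n')) {
        // Split on \n and drop an optional preceding \r
        const raw_request = buffer.split(/\r?\n/);
        const request_line = raw_request[0].split(' ');
        method = request_line[0];
        path = request_line[1];

        for (let i = 1; i < raw_request.length; i++) {
            if (raw_request[i] === '') break; // empty line: end of the headers
            // Split on the first colon only; values like "host:8080" contain colons
            const colon = raw_request[i].indexOf(':');
            const name = raw_request[i].slice(0, colon).trim();
            headers[name.toLowerCase()] = raw_request[i].slice(colon + 1).trim();
        }
        break;
    }
}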

Obviously you shouldn't use a blocking while loop to read from I/O in JavaScript, because I/O in Node is asynchronous and it wouldn't work, but you get the general idea.
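A non-blocking version of the same idea might look like the following. It is a minimal sketch, assuming the input is any Node.js Readable that emits Buffers (a net.Socket, a custom Duplex, or Readable.from() over hard-coded data); readRequestHead is a hypothetical helper name. It only reads the head, keeps the last value of duplicate headers, and hands back the bytes after the blank line as rest.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: event-driven version of the pseudocode. Works with any
// Readable that emits Buffers (net.Socket, a custom Duplex, Readable.from()...).
// Resolves with the parsed head plus `rest`, the bytes after the blank line.
function readRequestHead(stream) {
  return new Promise((resolve, reject) => {
    let buffer = Buffer.alloc(0);

    function onData(chunk) {
      buffer = Buffer.concat([buffer, chunk]);

      // End of head: \r\n\r\n, or the lenient \n\n, whichever comes first.
      const crlf = buffer.indexOf('\r\n\r\n');
      const lf = buffer.indexOf('\n\n');
      let headEnd;
      let bodyStart;
      if (crlf !== -1 && (lf === -1 || crlf < lf)) {
        headEnd = crlf;
        bodyStart = crlf + 4;
      } else if (lf !== -1) {
        headEnd = lf;
        bodyStart = lf + 2;
      } else {
        return; // head not complete yet: wait for the next chunk
      }

      // Stop consuming; the caller can resume `stream` to read more of the body.
      stream.removeListener('data', onData);
      stream.pause();

      const lines = buffer.subarray(0, headEnd).toString('latin1').split(/\r?\n/);
      const [method, path, version] = lines[0].split(/ +/);
      const headers = {};
      for (const line of lines.slice(1)) {
        const colon = line.indexOf(':');
        if (colon <= 0) continue; // this sketch silently skips malformed header lines
        // Later duplicates overwrite earlier ones (no merging in this sketch).
        headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
      }
      resolve({ method, path, version, headers, rest: buffer.subarray(bodyStart) });
    }

    stream.on('data', onData);
    stream.on('error', reject);
    stream.on('end', () => reject(new Error('stream ended before the end of the request head')));
  });
}

// Usage: the head arrives split across two chunks, as it can on a real socket.
readRequestHead(Readable.from([
  Buffer.from('GET /hello HTTP/1.1\r\nHo'),
  Buffer.from('st: example.com\r\n\r\n'),
])).then((head) => console.log(head.method, head.path, head.headers));
```

Accumulating into one Buffer until the terminator shows up is what makes a header split across two reads work; the repeated Buffer.concat is fine for a sketch but not tuned for large inputs.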

There are additional things you need to handle, such as reading the Content-Length header to determine when the body of a POST request is complete, and parsing POST form data (most JS frameworks don't do this themselves; they require you to use an additional module to parse the request body). But this should get you started with a minimal viable implementation, to which you can keep adding features until it handles all the different request types.
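For the Content-Length part, here is a hedged sketch. takeBody is a hypothetical helper that assumes lowercase header names (as in the pseudocode) and a Buffer holding everything received after the blank line. It deliberately does not handle Transfer-Encoding: chunked or multipart/form-data; for application/x-www-form-urlencoded bodies, Node's built-in URLSearchParams can decode the result.

```javascript
// Hypothetical helper: given parsed headers (lowercase names) and every byte
// received after the blank line, return { body, rest } once the body is complete,
// or null if more data is still needed. Chunked bodies are NOT supported.
function takeBody(headers, afterHead) {
  if (headers['transfer-encoding'] !== undefined) {
    throw new Error('Transfer-Encoding (e.g. chunked) is not handled in this sketch');
  }

  const declared = headers['content-length'];
  if (declared === undefined) {
    // A request without Content-Length (or Transfer-Encoding) has no body.
    return { body: Buffer.alloc(0), rest: afterHead };
  }
  if (!/^\d+$/.test(declared.trim())) {
    throw new Error(`Invalid Content-Length: ${declared}`);
  }

  const length = Number(declared.trim());
  if (afterHead.length < length) {
    return null; // body not complete yet: keep reading, then call again
  }
  // Bytes beyond `length` belong to the next (pipelined) request.
  return { body: afterHead.subarray(0, length), rest: afterHead.subarray(length) };
}

// Usage: an 11-byte url-encoded body followed by the start of another request.
const headers = { 'content-type': 'application/x-www-form-urlencoded', 'content-length': '11' };
const result = takeBody(headers, Buffer.from('Hello=WorldGET / HTTP/1.1\r\n'));
const form = new URLSearchParams(result.body.toString());
console.log(form.get('Hello'));
```

Returning null instead of waiting keeps the helper independent of where the bytes come from; the caller decides when to read more and try again.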


I did some research and found the relevant Node.js documentation. The http module docs include some examples, such as how to create a basic HTTP server and client: https://nodejs.org/api/http.html

This tutorial shows how to parse requests with the http module: https://www.digitalocean.com/community/tutorials/how-to-create-a-web-server-in-node-js-with-the-http-module

Here's an example that does what you want, using a server that prints the parsed parts of each request:

const http = require('http');

const server = http.createServer((req, res) => {
  console.log('Received a request');
  console.log('Headers:', req.headers);
  console.log('Method:', req.method);
  console.log('URL:', req.url);

  let body = [];
  req.on('data', chunk => {
    body.push(chunk);
  }).on('end', () => {
    body = Buffer.concat(body).toString();
    console.log('Body:', body);
    res.end();
  });
});

server.listen(8000, () => {
  console.log('Server listening on port 8000');
});

Update:

Due to your comment, I will add: Node.js does not expose its HTTP parser as a public API. The http module does include a parser, but it's used internally and is not accessible from outside the module.

The process.binding() method is discouraged in recent versions of Node.js, as stated in the process documentation:

The process.binding() method is used to load core modules built into Node.js and primarily to provide access to a set of ECMAScript Modules internal to Node.js that implement various APIs. It is used primarily by Node.js internal code. In general, userland code should prefer the public APIs provided by the various core modules over using process.binding(). The module identifiers passed to process.binding() lack the stability guarantees provided by require('module'). Non-internal modules should not use process.binding(). In general, using process.binding() should be considered the same as using a private API for a third-party module. Any change could break the code that uses it.

https://nodejs.org/api/process.html#process_process_binding_id
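That said, the public http.Server can still act as a parser without process.binding(). As I read the Node.js docs for the http.Server 'connection' event, it can be emitted explicitly to inject connections, and any Duplex stream can be passed in that case. A minimal sketch under those assumptions: parseRawRequest is a hypothetical helper, the server never listens on a port, the input is one complete request (for example a hard-coded string or a UDP datagram's payload), and whatever response the server writes is discarded.

```javascript
const http = require('http');
const { Duplex } = require('stream');

// Hypothetical helper: parse ONE raw HTTP/1.x request with Node's own parser,
// by injecting a fake "socket" into an http.Server that never listens on a port.
function parseRawRequest(raw) {
  return new Promise((resolve, reject) => {
    let pushed = false;
    const socket = new Duplex({
      read() {
        // Hand the whole message to the server once. The readable side is
        // deliberately never ended, so the server doesn't treat the "client"
        // as gone before the request has been read.
        if (!pushed) {
          pushed = true;
          this.push(Buffer.from(raw));
        }
      },
      write(chunk, encoding, callback) {
        callback(); // discard whatever the server writes back
      },
    });

    const server = http.createServer((req) => {
      const chunks = [];
      req.on('data', (chunk) => chunks.push(chunk));
      req.on('end', () => {
        resolve({
          method: req.method,
          url: req.url,
          httpVersion: req.httpVersion,
          headers: req.headers, // names are lowercased by Node
          body: Buffer.concat(chunks),
        });
        socket.destroy(); // no real response is needed; drop the fake connection
      });
    });

    // Malformed input is reported through the server's 'clientError' event.
    server.on('clientError', (err, sock) => {
      reject(err);
      sock.destroy();
    });

    server.emit('connection', socket);
  });
}

// Usage: note the Content-Length header; without it Node treats the body as empty.
parseRawRequest('POST /submit HTTP/1.1\r\nHost: www.example.com\r\nContent-Length: 11\r\n\r\nHello=World')
  .then((msg) => console.log(msg.method, msg.url, msg.headers, msg.body.toString()));
```

The trade-off is that you get exactly Node's parsing behaviour (limits, lenient/strict rules) through stable public APIs, at the cost of spinning up a throwaway server object per message.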

Update:

const httpStringParser = require('http-string-parser');

const requestData = 'POST / HTTP/1.1\r\nHost: www.example.com\r\n\r\nHello=World';
const parsedRequest = httpStringParser.parseRequest(requestData);
console.log(parsedRequest);

Given your comments, the above (a third-party npm package, not a built-in module) is the only other method I can offer for your needs. The implementation will be complex and will depend on whether you need UDP or TCP.

  • The question is not how to create an HTTP server or client. It is about parsing incoming HTTP messages/requests, and how to parse an HTTP request with the built-in Node.js HTTP parser.
    – Marc
    Commented Jul 5, 2023 at 8:38
  • By creating an HTTP web server, you can parse incoming HTTP messages/requests. That is how the internet works, as discussed in the link I shared. HTTP is just plain text, but the body can also be structured as JSON or other types. Your question is really about learning HTTP as a protocol; you may not be interested in learning ways to program using that protocol. I'm hoping someone else who finds this post may see value in it. The next step is learning APIs with HTTP and HTTPS; you should focus on HTTPS for production.
    – BlackFox
    Commented Jul 5, 2023 at 8:51
  • Read my question again. The focus is not on HTTP per se. It's about parsing messages from different kinds of sources, like UDP sockets, hard-coded strings etc. It's not about creating a web/HTTP server.
    – Marc
    Commented Jul 5, 2023 at 9:20
  • See the update, and please add more detail to your question.
    – BlackFox
    Commented Jul 5, 2023 at 9:33
