How can a number range and value be extracted from this complicated string using Python?

Question

I have a complicated string that includes a kilometer range and a fee for users that fall into that range. Ideally, I would like to transform the string into something that I could use to easily assign fees to users. I've taken the steps below to extract a list from the string, but the kilometer range is still not usable for classifying users:

import string
import unicodedata
import re
str = '{"row1":{"from":"0","to":"500","fee":"23100&nbsp;&nbsp;&nbsp; "},"row2":{"from":"500","to":"1000","fee":"24100&nbsp;&nbsp;&nbsp; "},"row3":{"from":"1000","to":"1500","fee":"25200&nbsp;&nbsp;&nbsp; "},"row4":{"from":"1500","to":"2000","fee":"26200&nbsp;&nbsp;&nbsp; "},"row5":{"from":"2000","to":"2500","fee":"27200&nbsp;&nbsp;&nbsp; "},"row6":{"from":"2500","to":"3000","fee":"28300&nbsp;&nbsp;&nbsp; "},"row7":{"from":"3000","to":"3500","fee":"29300&nbsp;&nbsp;&nbsp; "},"row8":{"from":"3500","to":"4000","fee":"30400&nbsp;&nbsp;&nbsp; "},"row9":{"from":"4000","to":"4500","fee":"31400&nbsp;&nbsp;&nbsp; "},"row10":{"from":"4500","to":"5000","fee":"32400&nbsp;&nbsp;&nbsp; "},"row11":{"from":"5000","to":"5500","fee":"33500&nbsp;&nbsp;&nbsp; "},"row12":{"from":"5500","to":"6000","fee":"34600&nbsp;&nbsp;&nbsp; "},"row13":{"from":"6000","to":"6500","fee":"35500&nbsp; "},"row14":{"from":"6500","to":"7000","fee":"36600&nbsp;&nbsp;&nbsp; "},"row15":{"from":"7000","to":"7500","fee":"37700&nbsp;&nbsp;&nbsp; "},"row16":{"from":"7500","to":"8000","fee":"38600&nbsp;&nbsp;&nbsp; "},"row17":{"from":"8000","to":"8500","fee":"39700&nbsp;&nbsp;&nbsp; "},"row18":{"from":"8500","to":"9000","fee":"40300&nbsp;&nbsp;&nbsp; "},"row19":{"from":"9000","to":"9500","fee":"41400&nbsp;&nbsp;&nbsp; "},"row20":{"from":"9500","to":"10000","fee":"42700&nbsp;&nbsp;&nbsp; "},"row21":{"from":"10000","to":"10500","fee":"43500&nbsp;&nbsp;&nbsp; "},"row22":{"from":"10500","to":"11000","fee":"44500&nbsp;&nbsp;&nbsp; "},"row23":{"from":"11000","to":"11500","fee":"45600&nbsp;&nbsp;&nbsp; "},"row24":{"from":"11500","to":"12000","fee":"46600&nbsp;&nbsp;&nbsp; "},"row25":{"from":"12000","to":"12500","fee":"47700&nbsp;&nbsp;&nbsp; "},"row26":{"from":"12500","to":"13000","fee":"48700&nbsp;&nbsp;&nbsp; "},"row27":{"from":"13000","to":"13500","fee":"49700&nbsp;&nbsp;&nbsp; "},"row28":{"from":"13500","to":"14000","fee":"50800&nbsp;&nbsp;&nbsp; "},"row29":{"from":"14000","to":"14500","fee":"51900&nbsp;&nbsp;&nbsp; "},"row30":{"from":"14500","to":"15000","fee":"52800&nbsp;&nbsp;&nbsp; "},"row31":{"from":"15000","to":"15500","fee":"52800&nbsp;&nbsp;&nbsp; "},"row32":{"from":"15500","to":"16000","fee":"52800&nbsp;&nbsp;&nbsp; "},"row33":{"from":"16000","to":"70000","fee":"52800&nbsp;&nbsp;&nbsp; "'
str1 = unicodedata.normalize("NFKD", str)
str2 = str1.translate({ord(c): None for c in string.whitespace})
s1 = str2.replace('{', "")
s2 = s1.replace('"', ' ')
s3 = s2.split(sep='},')

for i in range(len(s3)):
    ftm = re.search(r'from : (\d+) , to : (\d+)', s3[i])
    fm = re.search(r'fee : (\d+)', s3[i])
    if ftm and fm:
        vfrom = int(ftm.group(1))
        vto = int(ftm.group(2))
        vfee= int(fm.group(1))
        print(f"({vfrom}, {vto}), {vfee}")

Instead, I would like to transform it into a form that keeps the lower and upper bounds of each range as separate values next to the fee, so that I can later export it to Excel and check which range a user falls into.

Where did you get this data from? Other than that it is missing }} at the end, this appears to be valid JSON. So it would probably make more sense, if you tried to get this data "complete", if you can, and then used a JSON parser ... — CBroe, Commented Jul 10 at 6:58
This needs to be fixed at the origin of the data. Do you get it using some web-scraping technology or perhaps an API? Please give more details. — SIGHUP, Commented Jul 10 at 7:45
@CBroe You're absolutely right, I was completely unfamiliar with JSON. The string had the extra braces at the end, the fact that they're missing is my mistake. I didn't have access to a "source", they just gave me this at the office without any explanation so I was clueless. Thank you so so much. — Feiznia, Commented Jul 10 at 8:14
@SIGHUP It was correct in the origin, this was my mistake. I received it in text form from my manager, I had no idea it was JSON. — Feiznia, Commented Jul 10 at 8:15

SIGHUP · Accepted Answer · 2024-07-10 07:57:51Z

Your input data is almost valid JSON; it is only missing the two trailing right braces. If you append them in code (although this should really be fixed at the origin), then you can simply do this:

import json

_str = '{"row1":{"from":"0","to":"500","fee":"23100&nbsp;&nbsp;&nbsp; "},"row2":{"from":"500","to":"1000","fee":"24100&nbsp;&nbsp;&nbsp; "},"row3":{"from":"1000","to":"1500","fee":"25200&nbsp;&nbsp;&nbsp; "},"row4":{"from":"1500","to":"2000","fee":"26200&nbsp;&nbsp;&nbsp; "},"row5":{"from":"2000","to":"2500","fee":"27200&nbsp;&nbsp;&nbsp; "},"row6":{"from":"2500","to":"3000","fee":"28300&nbsp;&nbsp;&nbsp; "},"row7":{"from":"3000","to":"3500","fee":"29300&nbsp;&nbsp;&nbsp; "},"row8":{"from":"3500","to":"4000","fee":"30400&nbsp;&nbsp;&nbsp; "},"row9":{"from":"4000","to":"4500","fee":"31400&nbsp;&nbsp;&nbsp; "},"row10":{"from":"4500","to":"5000","fee":"32400&nbsp;&nbsp;&nbsp; "},"row11":{"from":"5000","to":"5500","fee":"33500&nbsp;&nbsp;&nbsp; "},"row12":{"from":"5500","to":"6000","fee":"34600&nbsp;&nbsp;&nbsp; "},"row13":{"from":"6000","to":"6500","fee":"35500&nbsp; "},"row14":{"from":"6500","to":"7000","fee":"36600&nbsp;&nbsp;&nbsp; "},"row15":{"from":"7000","to":"7500","fee":"37700&nbsp;&nbsp;&nbsp; "},"row16":{"from":"7500","to":"8000","fee":"38600&nbsp;&nbsp;&nbsp; "},"row17":{"from":"8000","to":"8500","fee":"39700&nbsp;&nbsp;&nbsp; "},"row18":{"from":"8500","to":"9000","fee":"40300&nbsp;&nbsp;&nbsp; "},"row19":{"from":"9000","to":"9500","fee":"41400&nbsp;&nbsp;&nbsp; "},"row20":{"from":"9500","to":"10000","fee":"42700&nbsp;&nbsp;&nbsp; "},"row21":{"from":"10000","to":"10500","fee":"43500&nbsp;&nbsp;&nbsp; "},"row22":{"from":"10500","to":"11000","fee":"44500&nbsp;&nbsp;&nbsp; "},"row23":{"from":"11000","to":"11500","fee":"45600&nbsp;&nbsp;&nbsp; "},"row24":{"from":"11500","to":"12000","fee":"46600&nbsp;&nbsp;&nbsp; "},"row25":{"from":"12000","to":"12500","fee":"47700&nbsp;&nbsp;&nbsp; "},"row26":{"from":"12500","to":"13000","fee":"48700&nbsp;&nbsp;&nbsp; "},"row27":{"from":"13000","to":"13500","fee":"49700&nbsp;&nbsp;&nbsp; "},"row28":{"from":"13500","to":"14000","fee":"50800&nbsp;&nbsp;&nbsp; "},"row29":{"from":"14000","to":"14500","fee":"51900&nbsp;&nbsp;&nbsp; "},"row30":{"from":"14500","to":"15000","fee":"52800&nbsp;&nbsp;&nbsp; "},"row31":{"from":"15000","to":"15500","fee":"52800&nbsp;&nbsp;&nbsp; "},"row32":{"from":"15500","to":"16000","fee":"52800&nbsp;&nbsp;&nbsp; "},"row33":{"from":"16000","to":"70000","fee":"52800&nbsp;&nbsp;&nbsp; "'

for v in json.loads(_str + "}}").values():
    row = [v[k].replace("&nbsp;", "").strip() for k in ("from", "to", "fee")]
    print(", ".join(row))

Output (partial):

0, 500, 23100
500, 1000, 24100
1000, 1500, 25200
1500, 2000, 26200
2000, 2500, 27200
2500, 3000, 28300
3000, 3500, 29300
3500, 4000, 30400
4000, 4500, 31400
4500, 5000, 32400
5000, 5500, 33500
5500, 6000, 34600
6000, 6500, 35500
6500, 7000, 36600
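
If the goal is to check which range a user falls into and to export the table for Excel, the parsed rows could be converted to integers and searched with the standard bisect module. The following is only a sketch under some assumptions: the helper names (parse_fee_table, fee_for_distance, to_csv) and the CSV column names are made up for illustration, the sample string is a shortened three-row version of the data with the closing braces already present, and each range is assumed to include its lower bound and exclude its upper bound, since neighbouring rows share an endpoint and the data does not say which row owns it.

```python
import csv
import io
import json
from bisect import bisect_right

# Shortened sample in the same shape as the question's data (three rows only).
SAMPLE = (
    '{"row1":{"from":"0","to":"500","fee":"23100&nbsp;&nbsp;&nbsp; "},'
    '"row2":{"from":"500","to":"1000","fee":"24100&nbsp;&nbsp;&nbsp; "},'
    '"row3":{"from":"1000","to":"1500","fee":"25200&nbsp;&nbsp;&nbsp; "}}'
)


def parse_fee_table(text):
    """Return a list of (from_km, to_km, fee) integer tuples sorted by from_km."""
    rows = []
    for v in json.loads(text).values():
        # Strip the HTML entity and surrounding whitespace before converting.
        lo, hi, fee = (
            int(v[k].replace("&nbsp;", "").strip()) for k in ("from", "to", "fee")
        )
        rows.append((lo, hi, fee))
    return sorted(rows)


def fee_for_distance(table, km):
    """Look up the fee for km, ASSUMING each range means from <= km < to."""
    starts = [lo for lo, _, _ in table]
    # Index of the last range whose lower bound is <= km.
    i = bisect_right(starts, km) - 1
    if i < 0 or km >= table[i][1]:
        return None  # km lies outside every range
    return table[i][2]


def to_csv(table):
    """Render the table as CSV text, which Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["from_km", "to_km", "fee"])  # hypothetical column names
    writer.writerows(table)
    return buf.getvalue()


table = parse_fee_table(SAMPLE)
print(fee_for_distance(table, 750))
print(to_csv(table))
```

If the upper bound should be inclusive instead, the comparison in fee_for_distance would need to change accordingly; that has to be decided by whoever owns the fee table.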

I deleted the braces in order to make the parsing easier... which in hindsight was counterintuitive. The JSON was correct in the source. Thank you! — Feiznia, Commented Jul 10 at 8:11
