Numpythonic way to fill value based on range indices reference (label encoding from given range indices)

Question

I have a tensor with this shape and dtype:

(batch_size, class_id, range_indices) -> (4, 3, 2)
int64
[[[1250 1302]
  [1324 1374]
  [1458 1572]]

 [[1911 1955]
  [1979 2028]
  [2120 2224]]

 [[2546 2599]
  [2624 2668]
  [2765 2871]]

 [[3223 3270]
  [3286 3347]
  [3434 3539]]]

How do I construct a dense representation filled according to this rule?

Since there are 3 class IDs:

Class ID 0: filled with 1
Class ID 1: filled with 2
Class ID 2: filled with 3
Default: filled with 0

The output should be a vector like this:

[0 0 0 ...(until 1250)... 1 1 1 ...(until 1302)... 0 0 0 ...(until 1324)... 2 2 2 ...(until 1374)... and so on]

Here is the data as copyable code:

data = np.array([[[1250, 1302],
                  [1324, 1374],
                  [1458, 1572]],

                 [[1911, 1955],
                  [1979, 2028],
                  [2120, 2224]],

                 [[2546, 2599],
                  [2624, 2668],
                  [2765, 2871]],

                 [[3223, 3270],
                  [3286, 3347],
                  [3434, 3539]]])

Here is code generated by ChatGPT, but I'm not sure it is the Numpythonic way, since it uses list comprehensions:

import numpy as np

# Given tensor
tensor = np.array([[[1250, 1302],
                    [1324, 1374],
                    [1458, 1572]],

                   [[1911, 1955],
                    [1979, 2028],
                    [2120, 2224]],

                   [[2546, 2599],
                    [2624, 2668],
                    [2765, 2871]],

                   [[3223, 3270],
                    [3286, 3347],
                    [3434, 3539]]])

# Determine the maximum value in the tensor to define the size of the output array
max_value = tensor.max()

# Create an empty array filled with zeros of size max_value + 1
dense_representation = np.zeros(max_value + 1, dtype=int)

# Generate the class_ids array, replicated for each batch
class_ids = np.tile(np.arange(1, tensor.shape[1] + 1), tensor.shape[0])

# Generate start and end indices
start_indices = tensor[:, :, 0].ravel()
end_indices = tensor[:, :, 1].ravel()

# Create an array of indices to fill
indices = np.hstack([np.arange(start, end) for start, end in zip(start_indices, end_indices)])

# Create an array of values to fill
values = np.hstack([np.full(end - start, class_id) for start, end, class_id in zip(start_indices, end_indices, class_ids)])

# Fill the dense representation array
dense_representation[indices] = values

# The resulting dense representation
print(dense_representation)
print(dense_representation[1249:1303])
print(dense_representation[1323:1375])
print(dense_representation[1457:1573])
print(dense_representation[1910:1956])

Output:

[0 0 0 ... 3 3 0]
[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0]
[0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0]
[0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 0]
[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 0]

It might be interesting to provide a smaller example (which could result in a printable array) with the exact expected output. — mozway, Commented Jul 10 at 10:16
@mozway Sorry for the inconvenience. I know what you mean; it's because the tensor is my real data. — Muhammad Ikhwan Perwira, Commented Jul 10 at 10:22

mozway · Accepted Answer · 2024-07-10 10:42:54Z

IIUC, you could build the output array with np.zeros, np.repeat, and np.tile:

start = data[..., 0].ravel()
end = data[..., 1].ravel()
slices = [slice(a,b) for a,b in zip(start, end)]
n = end-start
out =  np.zeros(data.max(), dtype='int')
out[np.r_[*slices]] = np.repeat(np.tile(np.arange(data.shape[1])+1, data.shape[0]), n)
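To make the expected result easy to inspect, here is a minimal sketch of the same approach on a small input. The array `small` is a hypothetical example, not the question's data, and the ranges are assumed to be half-open `[start, end)`. It passes the slices to `np.r_` as a tuple rather than with star-unpacking, so it should also run on Python versions older than 3.11.

```python
import numpy as np

# Hypothetical small input: 2 batches, 3 classes, half-open [start, end) ranges
small = np.array([[[1, 3], [4, 6], [7, 9]],
                  [[10, 12], [13, 14], [15, 18]]])

start = small[..., 0].ravel()
end = small[..., 1].ravel()
n = end - start  # length of each range

# Labels 1..num_classes, repeated once per batch: [1, 2, 3, 1, 2, 3]
labels = np.tile(np.arange(small.shape[1]) + 1, small.shape[0])

out = np.zeros(small.max(), dtype=int)

# Indexing np.r_ with a tuple of slices concatenates the ranges into one index array
slices = tuple(slice(a, b) for a, b in zip(start.tolist(), end.tolist()))
out[np.r_[slices]] = np.repeat(labels, n)

print(out)
# [0 1 1 0 2 2 0 3 3 0 1 1 0 2 0 3 3 3]
```

The output has length `small.max()`, which is enough because the largest end index is exclusive.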

Variant with boolean indexing:

start = data[..., 0].ravel()
end = data[..., 1].ravel()
out =  np.zeros(data.max(), dtype='int')
idx = np.arange(len(out))
m = ((idx >= start[:, None]) & (idx < end[:, None])).any(axis=0)
n = end-start
out[m] = np.repeat(np.tile(np.arange(data.shape[1])+1, data.shape[0]), n)
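The boolean-mask assignment above pairs mask positions (in ascending index order) with values (in range order), so it appears to rely on the flattened ranges being ascending and non-overlapping. That holds for the question's data. A small check, sketched here with a hypothetical helper name `ranges_sorted_and_disjoint`, can confirm the assumption before using the variant:

```python
import numpy as np

def ranges_sorted_and_disjoint(start, end):
    """True if half-open ranges [start, end) are ascending and do not overlap."""
    start = np.asarray(start)
    end = np.asarray(end)
    # Each range must be well-formed, and each must finish before the next begins
    return bool(np.all(start <= end) and np.all(end[:-1] <= start[1:]))

# First batch of the question's data, as 1-D start/end arrays
start = np.array([1250, 1324, 1458])
end = np.array([1302, 1374, 1572])
print(ranges_sorted_and_disjoint(start, end))      # True

# Hypothetical counter-example: the second range comes before the first
print(ranges_sorted_and_disjoint([5, 1], [7, 3]))  # False
```

If the check fails, sorting the ranges by start index first, or using an approach that does not depend on ordering, would be safer.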

Or:

start = data[..., 0].ravel()
end = data[..., 1].ravel()
out =  np.zeros(data.max(), dtype='int')
idx = np.arange(len(out))

m1 = ((idx >= start[:, None]) & (idx < end[:, None]))
m2 = m1.any(axis=0)
nums = np.tile(np.arange(data.shape[1])+1, data.shape[0])

out[m2] = nums[m1[:, m2].argmax(0)]

Output:

[0 0 0 ... 3 3 3]
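For reference, here is a sketch of a different technique that is not part of the answer above: a difference array followed by a cumulative sum. It avoids both the Python-level loop over slices and the (number of ranges x output length) boolean mask. The function name `dense_labels_cumsum` and the array `small` are hypothetical, and the sketch assumes half-open, non-overlapping ranges (overlapping ranges would have their labels summed).

```python
import numpy as np

def dense_labels_cumsum(ranges):
    """Label-encode half-open [start, end) ranges of shape (batch, num_classes, 2)."""
    start = ranges[..., 0].ravel()
    end = ranges[..., 1].ravel()
    # Labels 1..num_classes, repeated once per batch
    labels = np.tile(np.arange(1, ranges.shape[1] + 1), ranges.shape[0])
    # One extra slot so that an end index equal to the output length is valid
    delta = np.zeros(ranges.max() + 1, dtype=np.int64)
    np.add.at(delta, start, labels)   # switch the label on at each start
    np.add.at(delta, end, -labels)    # switch it off again at each end
    return np.cumsum(delta)[:-1]

# Hypothetical small input: 2 batches, 3 classes
small = np.array([[[1, 3], [4, 6], [7, 9]],
                  [[10, 12], [13, 14], [15, 18]]])
result = dense_labels_cumsum(small)
print(result)
# [0 1 1 0 2 2 0 3 3 0 1 1 0 2 0 3 3 3]
```

`np.add.at` is used instead of plain fancy-index assignment so that a range ending exactly where the next one starts still accumulates both updates at that index.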

edited Jul 10 at 10:42

answered Jul 10 at 10:15


out[np.r_[*slices]] = np.repeat(np.tile(np.arange(data.shape[1])+1, data.shape[0]), n) gives an invalid syntax error; perhaps you forgot a comma in np.r_?
– Muhammad Ikhwan Perwira
Commented Jul 10 at 10:19
I think your Python version is too old for this syntax. I used 3.11.
– mozway
Commented Jul 10 at 10:22
It's the Google Colaboratory runtime. But at least the boolean indexing works for me, and I have verified that np.array_equal(out, dense_representation) is True.
– Muhammad Ikhwan Perwira
Commented Jul 10 at 10:26
@MuhammadIkhwanPerwira If np.r_[*slices] does not work for you, the following should be equivalent and "backward-compatible": np.r_[tuple(slices)] (thus using an indexing tuple rather than an unpacked index)
– simon
Commented Jul 10 at 14:09
