Determine which elements of sequence A can be used to create given sequence B in linear time

Question

Let's say we have sequence A and subsequence of A, called B. I need to determine which elements of sequence A could be potentially used to construct subsequence B.

For example, assume A = [1, 3, 2, 1, 2, 3, 1, 3, 2] and B = [1, 3, 1, 2]

To construct B from A, we could use elements at indexes:

1, 2, 4, 5
1, 2, 4, 9
1, 2, 7, 9
1, 6, 7, 9

So our answer should look like [true, true, false, true, true, true, true, false, true], where true means that it's possible that element at i-th index was used to construct subsequence B.

It would be easy to solve with O(2^N * M) time complexity (generate all subsequences of A and compare them with B). But I need solution that works in something like O(NlogN).

I figured out solution that seems working when all A elements (so B also) are unique.

I iterate over A, and for every i-th element I check whether:

A[i] exists in B, otherwise we can discard this i-th element
Let's say j is equal to index where A[i] was found in B. We check whether i >= j, otherwise we can discard this i-th element, because if j > i, then our subsequence won't maintain order
We check if any neigbhour of B[j], so B[j - 1] or B[j + 1] is at the right side of A[i] (all elements whose indexes are greater than i). Now, because the elements are unique, we can say that this specific A[i] element must have been used at some B[j] position, because there will always be one A subsequence that is equal to B.

That's how it could look in code:

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

bool val_exists(const vector<pair<int, int>>& A_val_sorted, int val, int min_idx) {
    if (val == -1) {
        return true;
    }

    if (min_idx == A_val_sorted.size())
        return false;

    int l = 0;
    int r = A_val_sorted.size() - 1;

    while (l <= r) {
        int mid = (l + r) / 2;

        if (A_val_sorted[mid].first == val) {
            if (A_val_sorted[mid].second >= min_idx)
                return true;

            l = mid + 1;
        }
        else if (A_val_sorted[mid].first < val) {
            l = mid + 1;
        }
        else {
            r = mid - 1;
        }
    }

    return false;
}

int main()
{
    int n, m;
    cin >> n >> m;

    // val, original index
    vector<pair<int, int>> A_val_sorted(n);
    vector<pair<int, int>> B_val_sorted(m);

    vector<int> A(m);

    for (int i = 0; i < n; ++i) {
        cin >> A_val_sorted[i].first;
        A_val_sorted[i].second = i;
    }

    for (int i = 0; i < m; ++i) {
        cin >> B_val_sorted[i].first;
        B_val_sorted[i].second = i;

        A[i] = B_val_sorted[i].first;
    }

    sort(A_val_sorted.begin(), A_val_sorted.end());
    sort(B_val_sorted.begin(), B_val_sorted.end());

    vector<int> result(n);
    for (const pair<int, int>& b : A_val_sorted) {
        int val = b.first;
        int idx = b.second;

        int l = 0;
        int r = m - 1;
        bool valid = false;
        while (l <= r) {
            int mid = (l + r) / 2;

            if (B_val_sorted[mid].first == val) {
                int right_ngbr_idx = B_val_sorted[mid].second + 1;
                if (B_val_sorted[mid].second <= idx && val_exists(A_val_sorted, (right_ngbr_idx < A.size() ? A[right_ngbr_idx] : -1), idx + 1)) {
                    valid = true;
                    break;
                }

                r = mid - 1;
            }
            else if (B_val_sorted[mid].first < val) {
                l = mid + 1;
            }
            else {
                r = mid - 1;
            }
        }

        result[idx] = valid;
    }

    for (int i = 0; i < n; ++i)
        std::cout << result[i] << ' ';
}

I know, for the case where elements are unique, it can be done much easier. But this approach provides idea on how to solve this problem when elements are non-unique.

Instead of checking one of the neigbhours, we can just check whole left and right part from B[j]. Lets say N = A.size() and M = B.size(). This solution leads to O(NlogN + MlogM + N*M*logM), because we can sort the subsequences, store the original indexes etc. and use binary search to find A[i] in B.

It's too slow. How can I solve it for big sequences, like N = 10^5?

Consider asking this on math.stackexchange.com or computerscience.stackexchange.com -- it doesn't appear to be directly on-topic here as it's not a programming question but rather a mathematical/theoretical concept question. — TylerH, Commented Jul 8 at 18:56
@TylerH I think it's more programming question than computer science. It's simple programming problem from Polish Olympiad of Informatics (first stage), I just translated and paraphrased it so it's more general and fits better to SO, as it represents actual algorithmic problem, not the question about solving specifc CP task. — Szyszka947, Commented Jul 8 at 19:15
@TylerH This is a dynamic programming problem. Which is a pretty square fit with the algorithm tag. It isn't a fit with either CS or math. — btilly, Commented Jul 8 at 19:28
The edit to make it specific to C++ has made it more of a programming question, yes. But until then it was not programming-specific, just a math question. Algorithms are a math concept, not a programming one. Stack Overflow is for programming questions, not math questions, and algorithms are squarely on-topic (and popular) on both math.SE and CS.SE. — TylerH, Commented Jul 8 at 19:29
@btilly Don't be silly, the definition of an algorithm is a set of operations to perform in a certain order/given certain conditions, usually in calculations. It is inherently math. See my comment to beaker above. People can ask about implementing algorithms in code here all day! But if they're just asking about the algorithm, and not about implementing it in code, they need to take their question elsewhere. — TylerH, Commented Jul 8 at 19:56

btilly · Accepted Answer · 2024-07-08 19:37:58Z

There are a number of ways to do this.

The simplest is the following.

Use a greedy algorithm from the front to figure out which possible index is the earliest possible place that each position in B could go.

Use a greedy algorithm from the back to figure out which possible index is the latest possible place that each position in B could go.

Then in pseudocode:

create an array of False values
for each position in B
    for each place in A with that value from earliest to latest this position in B can go
        mark it True

Optimization. Create a dictionary mapping each value, to all of the locations in A where it is found. With the locations listed in order. This allows the search to be fast.

Second optimization. As a given position is marked True, remove it and all positions before it with the same value from the dictionary of locations. (Either they are already marked True, or no later position in B can result in them being marked True.)

With those optimizations, this becomes O(n). And therefore you'll have no problem with 10^5 elements.

Thanks! Can I know what was your thinking process? I started with observing example, and based on observations I tried to generalize conditions which must be satisfied for every A[i]. With this approach, I would never get idea like yours. How did u start? — Szyszka947, Commented Jul 9 at 9:37
@Szyszka947 I started knowing my trick for doing dynamic programming to generate a data structure that can reconstruct solutions. See stackoverflow.com/questions/78498209/… for that. Then I realized that you don't need the whole matching pattern - just which can go where. Then I realized that earliest - latest would do it. Then I began to think of optimizations to that. — btilly, Commented Jul 9 at 14:55

Collectives™ on Stack Overflow

Determine which elements of sequence A can be used to create given sequence B in linear time

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
c++
algorithm
subsequence
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Not the answer you're looking for? Browse other questions tagged c++algorithmsubsequence or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
c++
algorithm
subsequence
or ask your own question.