
Am I overengineering my lenient NER F1 measures?


Hi all,

I need to customize my F1 measurement quite heavily when evaluating my fine-tuned NER model, but I don't see other people with similar issues. I wonder if I am doing something wrong, or if people customize their F1 measures all the time but just don't mention it.

I can easily get an F1 measure from the default Hugging Face evaluation library and add some common lenient measures with minimal effort.
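
To be concrete, this is roughly the stock measure (a minimal sketch assuming the `seqeval` backend of the `evaluate` library, which is the usual choice for token classification; the tags are from my fly example below):

```python
# pip install evaluate seqeval
import evaluate

# seqeval scores at the entity level and is strict by default:
# a predicted entity counts only if both its span and its label
# match the gold entity exactly.
seqeval = evaluate.load("seqeval")

references = [["O", "B-predator", "I-predator", "I-predator", "I-predator", "O", "O"]]
predictions = [["B-predator", "I-predator", "O", "O", "B-predator", "O", "O"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_f1"])  # 0.0 under strict matching
print(results["predator"])    # per-label precision/recall/f1 come for free
```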

For example, I have the labels <predator> and <food>, and my dataset is:

Mice eat cheese.
The Southeast Asian soldier fly eats nectar.
Owls eat mice.

Most off-the-shelf lenient measures ignore the exact text span. Under those, the partially recognised fly below would be counted as two positives:

The(B-predator) Southeast(I-predator) Asian soldier fly(B-predator) eats nectar.
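
For illustration, here is a hand-rolled sketch of why that happens (the `spans` helper is mine, not from any library; token offsets are 0-based):

```python
def spans(tags):
    """Extract (start, end, label) spans from a BIO tag sequence."""
    out, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel closes a trailing span
        if start is not None and (tag == "O" or tag.startswith("B-")):
            out.append((start, i, label))
            start = None
        if tag.startswith(("B-", "I-")) and start is None:
            start, label = i, tag[2:]
    return out

gold = spans(["O", "B-predator", "I-predator", "I-predator", "I-predator", "O", "O"])
pred = spans(["B-predator", "I-predator", "O", "O", "B-predator", "O", "O"])

def overlap(p, g):
    return p[0] < g[1] and g[0] < p[1] and p[2] == g[2]

# Span-lenient matching: any predicted span that overlaps a gold span
# of the same label is a true positive, so both fragments of the fly
# ("The Southeast" and "fly") count against the single gold entity.
tp = sum(any(overlap(p, g) for g in gold) for p in pred)
print(tp)  # 2 positives from one gold entity
```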

There are also lenient measures that ignore mismatched labels, as long as the right text is highlighted.
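
The converse is easy too; here exact boundaries are required but the label is ignored (reusing my `spans` helper from above, with made-up tags):

```python
# Label-lenient matching: span boundaries must match exactly,
# the entity label is ignored.
gold = spans(["O", "B-food", "O"])      # gold: "cheese" is <food>
pred = spans(["O", "B-predator", "O"])  # model highlighted it with the wrong label

tp = sum(any(p[:2] == g[:2] for g in gold) for p in pred)
print(tp)  # 1: the highlighted text matches despite the label mismatch
```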

I ended up considering all the options and calculating four types of lenient measures, and I haven't even touched the per-label measurements yet. Is that too much?

2 replies


Hey @FewKey!
There is no such thing as "overengineering" a metric, only defining it inappropriately. So let's start there: what exactly do you want to evaluate? What is the desired result? What kinds of problems led you to "engineer" the metric in the first place?

Author

I see. In my case, the desired result is "some entity is found" so that we can do downstream analysis, so I guess both the label and the span issues can be ignored. Thanks!
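
In case anyone else lands here: the loosest version of that ("was any entity found in this sentence at all") reduces to something like this sketch, which ignores both spans and labels entirely:

```python
# Sentence-level presence check: did the model flag any entity at all?
def has_entity(tags):
    return any(tag != "O" for tag in tags)

gold_tags = [["O", "B-predator", "I-predator", "O"], ["O", "O", "O"]]
pred_tags = [["B-food", "O", "O", "O"], ["O", "O", "O"]]

agree = sum(has_entity(g) == has_entity(p) for g, p in zip(gold_tags, pred_tags))
print(agree / len(gold_tags))  # 1.0: presence/absence agrees on both sentences
```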