Friday, April 12, 2013

ATK text pitfalls

Just as I had convinced myself that I understood ATK text well, it brought me back to reality. Once again I have to admit that ATK text is as unknowable as the universe. Seriously: shortly after I started fixing ATK text bugs in Firefox, Orca, a Linux screen reader, suddenly began to misbehave. It was suggested that I compare Firefox and GEdit to see whether their implementations differ. I did, and realized that the result depends on where you start reading the ATK spec (btw, the GEdit implementation doesn't always follow the spec). If you read the spec from the beginning (the first sentence), you get one result. If you read it from the end (the second sentence), you might conclude that a different result is expected. I filed a bug against ATK. But let's read it again together; I might be missing something.

Let's consider an example: "a funny word".

* atk_text_get_text_at_offset for BOUNDARY_WORD_END
The returned string is from the word end before the offset to the word end at or after the offset.
I think you will agree that there's no word end *before* offset 0, so this can be treated as an author error. ATK doesn't say how erroneous values should be handled, so I guess any reasonable return value is allowed. Firefox returned a ('', 0, 0) triple, and that confused Orca.

Read on in the spec:
The returned string will contain the word at the offset if the offset is inside a word.
This means we should return a ('a', 0, 1) triple because offset 0 is inside the word 'a' (btw, that's what GEdit did).
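
To make the two readings concrete, here is a small Python sketch. It is not ATK code: the helper names are made up for illustration, and it assumes a word is simply a run of non-whitespace characters, which real implementations may not agree with.

```python
import re

TEXT = "a funny word"

def word_spans(text):
    """(start, end) offsets of each run of non-space characters (assumed word model)."""
    return [m.span() for m in re.finditer(r"\S+", text)]

def word_end_before(text, offset):
    """First sentence, left side: nearest word end strictly before offset, or None."""
    ends = [end for _, end in word_spans(text) if end < offset]
    return ends[-1] if ends else None

def word_end_at_or_after(text, offset):
    """First sentence, right side: nearest word end at or after offset, or None."""
    ends = [end for _, end in word_spans(text) if end >= offset]
    return ends[0] if ends else None

def word_containing(text, offset):
    """Second sentence: the (word, start, end) triple whose span contains offset."""
    for start, end in word_spans(text):
        if start <= offset < end:
            return (text[start:end], start, end)
    return None

print(word_end_before(TEXT, 0))       # None: the first sentence has no start point
print(word_end_at_or_after(TEXT, 0))  # 1
print(word_containing(TEXT, 0))       # ('a', 0, 1)
```

Under these assumptions the first sentence has nowhere to start at offset 0, while the second sentence gives the ('a', 0, 1) triple.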


* atk_text_get_text_at_offset for BOUNDARY_WORD_START

This is the dual of the problem above, for an offset equal to the text length. The spec says:
The returned string is from the word start at or before the offset to the word start after the offset.
and
The returned string will contain the word at the offset if the offset is inside a word.
There's no word start after the offset, but at the same time the offset is inside the word 'word'. Reading on.


* atk_text_get_text_after_offset for BOUNDARY_WORD_END
The returned string is from the word end at or after the offset to the next word end.
It might not be evident, but offset 0 is a word end offset. Here is a proof by contradiction. If offset 0 is not a word end offset, then
get_text_at_offset(0, BOUNDARY_WORD_END)
for a single word (like 'word') should return empty text. But this contradicts the semantics of the get_text_at_offset method name and Orca's expectations (see the case above). Therefore offset 0 is a word end.

That means the method at offset 0 should return the first word ('a' in our example). But the second sentence says that it must be the second word ('funny' in our case):
The returned string will contain the word after the offset if the offset is inside a word.
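
The outcome of the first sentence hinges on whether offset 0 counts as a word end, so here is a rough Python sketch that tries it both ways. The function names and the `start_counts_as_end` flag are hypothetical, and words are again assumed to be runs of non-whitespace characters; this is not how any real ATK implementation is written.

```python
import re

TEXT = "a funny word"

def word_ends(text, start_counts_as_end):
    """Offsets just past each run of non-space characters.

    With start_counts_as_end=True, offset 0 is also treated as a word end.
    """
    ends = [m.end() for m in re.finditer(r"\S+", text)]
    return [0] + ends if start_counts_as_end else ends

def text_after_offset_word_end(text, offset, start_counts_as_end):
    """First sentence: from the word end at or after offset to the next word end."""
    ends = [e for e in word_ends(text, start_counts_as_end) if e >= offset]
    if len(ends) < 2:
        # Placeholder only: the spec does not say what to return here.
        return ("", offset, offset)
    return (text[ends[0]:ends[1]], ends[0], ends[1])

print(text_after_offset_word_end(TEXT, 0, True))   # ('a', 0, 1)
print(text_after_offset_word_end(TEXT, 0, False))  # (' funny', 1, 7)
```

If offset 0 is a word end, the first sentence yields 'a'; if it is not, the same sentence yields ' funny', which is the range that contains the word the second sentence asks for.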


* atk_text_get_text_before_offset for BOUNDARY_WORD_START

This is the dual of the get_text_after_offset (word end boundary) case. Let's take an offset equal to the text length.
The returned string is from the word start before the word start before or at the offset to the word start before or at the offset.
The text length offset is a word start offset; the proof is by analogy (see above). That means that the 3rd word is expected ('word' in our case). However, the second sentence says that it should be the 2nd word ('funny' in our case):
The returned string will contain the word before the offset if the offset is inside a word.
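
The mirrored case can be sketched the same way in Python. As before, the names and the `end_counts_as_start` flag are made up for illustration, and a word is assumed to be a run of non-whitespace characters.

```python
import re

TEXT = "a funny word"

def word_starts(text, end_counts_as_start):
    """Offsets where each run of non-space characters begins.

    With end_counts_as_start=True, the text length is also treated as a word start.
    """
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    return starts + [len(text)] if end_counts_as_start else starts

def text_before_offset_word_start(text, offset, end_counts_as_start):
    """First sentence: from the word start before the word start at or before
    offset, to the word start at or before offset."""
    starts = [s for s in word_starts(text, end_counts_as_start) if s <= offset]
    if len(starts) < 2:
        # Placeholder only: the spec does not say what to return here.
        return ("", offset, offset)
    return (text[starts[-2]:starts[-1]], starts[-2], starts[-1])

print(text_before_offset_word_start(TEXT, len(TEXT), True))   # ('word', 8, 12)
print(text_before_offset_word_start(TEXT, len(TEXT), False))  # ('funny ', 2, 8)
```

Treating the text length as a word start gives the 3rd word, while not treating it as one gives the range holding the 2nd word, which is what the second sentence describes.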
