Ticket #1111 (new enhancement)

Opened 7 years ago

Last modified 1 year ago

Translit rule syntax possibilities

Reported by: mark.davis(at)us.ibm.com Assigned to: andy
Priority: trivial Milestone: UNSCH
Component: transliterate Version: 1.8.1
Keywords: transliterate Cc:
Load: Xref: 3921
Java Version: Operating System: all
Project (C/J): all Weeks: 2
Review:

Description

I thought of something while doing the Hindi and Pig Latin examples, where you have to percolate. I'm not saying we should implement it, but I wanted to capture the thought in the bug database.

With a small syntax addition (and probably a small implementation change), we could add state to a transliterator. A rule of the form xxx : m > yyy : n would mean: if you encounter xxx AND are in state m, replace it by yyy and go into state n. The defaults for m and n are both 0. If we did that, we would actually have a complete Turing machine!!
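As a rough illustration of the proposed semantics, here is a minimal Python sketch. It is not ICU code: the names StateRule and transliterate are made up for this example, patterns are plain non-empty literals, and the context, cursor, and quantifier features of real transliterator rules are ignored.

```python
from typing import List, NamedTuple, Tuple


class StateRule(NamedTuple):
    """Hypothetical model of one rule 'xxx : m > yyy : n'."""
    state: int        # m: state required for the rule to fire
    pattern: str      # xxx: literal text to match (assumed non-empty)
    replacement: str  # yyy: text emitted when the rule fires
    next_state: int   # n: state entered after the rule fires


def transliterate(text: str, rules: List[StateRule], state: int = 0) -> Tuple[str, int]:
    """Apply the first rule that matches at each position in the current state.

    Returns the converted text and the state the machine ended in.
    """
    out = []
    i = 0
    while i < len(text):
        for rule in rules:
            if rule.state == state and text.startswith(rule.pattern, i):
                out.append(rule.replacement)
                i += len(rule.pattern)
                state = rule.next_state
                break
        else:
            # No rule fired in this state: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out), state


# Two rules on the same input text that alternate between states 0 and 1.
toggle_rules = [
    StateRule(0, "a", "X", 1),
    StateRule(1, "a", "Y", 0),
]
result, final_state = transliterate("banana", toggle_rules)
```

The same input "a" produces a different output depending on the current state, which is something a stateless rule set cannot express directly.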

Attachments

Change History

12/31/69 18:26:54 changed by notes

RFE assigned to alan target: unscheduled

12/31/69 18:26:55 changed by auditor

  • 11/02/01 20:29:00 schererm changed notes
  • 11/02/01 20:29:02 schererm moved from incoming to feature
  • 12/10/01 17:35:58 alan moved from feature to transliterate
  • 04/12/02 17:47:13 alan sent reply 1
  • 04/12/02 17:48:00 alan changed notes
  • 07/31/02 19:09:37 schererm changed notes
  • 10/29/02 15:57:58 hshih changed notes2
  • 10/29/02 15:57:58 hshih changed notes
  • 05/30/03 10:13:11 hshih changed notes2
  • 05/30/03 10:13:11 hshih changed notes
  • 07/12/04 18:46:32 andy changed notes2
  • 07/12/04 18:46:32 andy changed notes
  • 07/12/04 18:56:33 andy changed notes2
  • 07/12/04 18:56:33 andy changed notes

04/12/02 17:47:13 changed by Alan Liu <alan.liu(at)jtcsv.com>

A good way to write the syntax might be:

alpha : a > b; c > d; e > f : beta; g > h;

beta: q > r; s > t;

That is, it will get tedious to have to repeat the left state tag. So once a left state tag has been seen, it should repeat until a different left state tag is seen.

Furthermore, the default right state tag should probably be the current left state tag. That way the "c > d;" rule above works in state "alpha" and stays in state "alpha".

I also propose that the state tags be alphanumeric identifiers, like C or Java labels, rather than numerals. And the default state tag can be "main", following C++ and Perl conventions.
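The defaulting behavior described above can be sketched as a tiny parser. This is only an illustration under simplifying assumptions: parse_rules is a hypothetical name, rules are split naively on ";", ">" and ":", and quoting, escaping, and the rest of the real rule syntax are not handled.

```python
from typing import List, Tuple

# (left state tag, pattern, replacement, right state tag)
ParsedRule = Tuple[str, str, str, str]


def parse_rules(source: str, default_state: str = "main") -> List[ParsedRule]:
    """Parse 'tag : lhs > rhs : tag;' rules, filling in omitted state tags."""
    rules = []
    current = default_state  # left tag in effect until a new one is seen
    for statement in source.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        lhs, rhs = statement.split(">", 1)
        if ":" in lhs:
            # An explicit left tag replaces the current one and keeps applying.
            tag, lhs = lhs.split(":", 1)
            current = tag.strip()
        if ":" in rhs:
            rhs, target = rhs.split(":", 1)
            target = target.strip()
        else:
            # Default right tag: stay in the current left state.
            target = current
        rules.append((current, lhs.strip(), rhs.strip(), target))
    return rules


example = "alpha : a > b; c > d; e > f : beta; g > h;\n\nbeta: q > r; s > t;"
parsed = parse_rules(example)
```

With the example rules from this comment, "c > d" and "g > h" both come out tagged as working in "alpha" and staying in "alpha", while "e > f" moves to "beta". Rules that appear before any left tag would be placed in "main".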

ISSUE: Should the transliterator become stateful between calls to incremental transliterate()? That is, if I get a partial match, and I'm in state "foo", should the transliterator remember this, and then be in state "foo" when the next bit of text comes in? I think it must for this to work correctly.

Implementation probably consists of 2 new data members on each rule, a left tag and a right tag. These can be integers created by the parser. If we want pretty output we can keep the string<->integer mapping in the core data object.

Then the RBT object needs a new data member encoding the current state, for incremental operation.
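A minimal sketch of how these pieces might fit together, in Python rather than ICU's own C++ or Java. All names here (StateInterner, IncrementalTransliterator, feed) are hypothetical, patterns are assumed to be non-empty literals, and the partial-match handling is a simplified guess at the behavior raised in the ISSUE above.

```python
from typing import Dict, List, Tuple


class StateInterner:
    """Maps state tag strings to integers and back, for pretty output."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}
        self.names: List[str] = []

    def intern(self, name: str) -> int:
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


# (left tag, pattern, replacement, right tag), with integer tags
Rule = Tuple[int, str, str, int]


class IncrementalTransliterator:
    """Keeps the current state and any unmatched tail between calls."""

    def __init__(self, rules: List[Rule], start_state: int = 0) -> None:
        self.rules = rules
        self.state = start_state  # the new "current state" data member
        self.pending = ""         # text held back because of a partial match

    def feed(self, chunk: str, final: bool = False) -> str:
        text = self.pending + chunk
        out = []
        i = 0
        while i < len(text):
            rest = text[i:]
            action = None
            for left, pattern, replacement, right in self.rules:
                if left != self.state:
                    continue
                if rest.startswith(pattern):
                    action = (pattern, replacement, right)
                    break
                if not final and pattern.startswith(rest):
                    # Partial match at the end of input: wait for more text.
                    action = "wait"
                    break
            if action == "wait":
                break
            if action is None:
                out.append(text[i])  # no rule applies: copy one character
                i += 1
            else:
                pattern, replacement, right = action
                out.append(replacement)
                i += len(pattern)
                self.state = right
        self.pending = text[i:]
        return "".join(out)


tags = StateInterner()
alpha, beta = tags.intern("alpha"), tags.intern("beta")
translit = IncrementalTransliterator(
    [(alpha, "ab", "X", beta), (beta, "ab", "Y", alpha)],
    start_state=alpha,
)
```

Passing final=True tells the sketch that no more text is coming, so a held-back partial match is processed as ordinary text instead of waiting.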

09/28/07 12:27:39 changed by andy

  • load changed.
  • project set to all.
  • java changed.
  • revw changed.
  • summary changed from RFE: translit rule syntax possibilities to Translit rule syntax possibilities.
