KStr: Multilingual String Class
By Keiji Ikuta (Jul. 5, 2001)
Objectives
KStr is a string class designed to:
  • Handle multiple languages at the same time.
  • Handle strings of variable length.

Structure
  • A KStr is a byte sequence.
  • It carries length information (a long).
  • ESC sequences are used to switch character codes.
  • Normal text is assumed to be Windows 1252.
  • To enable character search, each ESC sequence records both the previous and the next character code.
  • Each character code has its own byte length.
    e.g. Shift JIS = 2 bytes, Windows 1252 = 1 byte.
Example:
In the representation below, an ESC sequence is written as [Prev,Next].
The actual bytes are stored as: [ESC] [Next] [Prev]

representation: ABC[20,22]kanji[22,20]XYZ
hex: 41 42 43 1B 22 20 8A BF 8E 9A 1B 20 22 58 59 5A
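
As a minimal sketch of the layout above, the following C++ builds the example byte sequence from runs of same-code text. The names Run, Encode and ToHex are hypothetical and not part of KStr. The code values 0x20 (Windows 1252) and 0x22 (Shift JIS) are read off the example, and the length field that KStr also stores is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Values taken from the example; treating them as named constants is an assumption.
constexpr std::uint8_t kEsc = 0x1B;
constexpr std::uint8_t kCp1252 = 0x20;  // Windows 1252
constexpr std::uint8_t kSjis = 0x22;    // Shift JIS

// One run of bytes that all belong to the same character code (hypothetical type).
struct Run {
    std::uint8_t code;
    std::string bytes;
};

// Serializes runs, emitting [ESC][Next][Prev] whenever the character code changes.
// The text is assumed to start in Windows 1252.
std::string Encode(const std::vector<Run>& runs) {
    std::string out;
    std::uint8_t current = kCp1252;
    for (const Run& r : runs) {
        assert(r.code != kEsc);  // ESC itself cannot be used as a code tag
        if (r.code != current) {
            out += static_cast<char>(kEsc);
            out += static_cast<char>(r.code);   // Next
            out += static_cast<char>(current);  // Prev
            current = r.code;
        }
        out += r.bytes;
    }
    return out;
}

// Formats a byte string as space-separated uppercase hex pairs.
std::string ToHex(const std::string& data) {
    static const char* const digits = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : data) {
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Encoding the runs "ABC" (Windows 1252), the Shift JIS bytes 8A BF 8E 9A, and "XYZ" (Windows 1252), then passing the result to ToHex, reproduces the hex line of the example.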

This structure has several merits:

  • Each string block can be drawn with a different font suite for its character set.
  • Structured strings can be drawn. As an extension of this structure, I developed a block structure for expressing equations.
    [Please see Equation]
  • It can hold more than UNICODE. UNICODE is controversial because of its unification of Kanji characters.
    (Japanese Kanji and Chinese Kanji are different even though they are similar in form!)
    Because KStr uses ESC sequences like ISO-2022, its capacity can be extended.

Character Code
  • ASCII
  • Windows Code Page
  • ISO 8859
  • JIS X 0201 (Japanese, Katakana)
  • JIS X 0208 (Japanese, Kanji)
    These revisions are almost the same, but some characters are different.
    JIS C 6226-1978
    JIS X 0208-1983
    JIS X 0208-1990
    JIS X 0208-1997
  • JIS X 0212 (Japanese, Kanji)
    JIS X 0212-1990
  • SJIS (Japanese, Kanji)
    Standard on PCs and AIX.
  • EUC
    Standard on UNIX (Sun). It can hold any character code.
  • ISO-2022-JP
    Similar in concept to KStr. It uses ESC sequences.
  • ISO 10646/UNICODE
    UNICODE has many problems!
    e.g. CJK: Even though the shapes are similar, they are NOT the same!
    e.g. Combining mechanism: Why was such a mechanism defined in a character code?
  • GB2312-80 (Chinese, Kanji)
  • BIG 5 (Chinese, Kanji)
  • KS C 5601 (Korean)
    KS C 5601-1992
  • VISCII (Vietnamese)
  • Mojikyo (Kanji and any characters)
  • EBCDIC
  • JEF (Japanese, Kanji)

Character Code and Language References

String Manipulations

Definition:

KStr s,s2; // Variable length. Allocated from heap memory.
KStr_<100> s3; // Fixed length. Allocated on stack.

Assignment:

s="abcde";
s2='A';

Concatenation:

s<<"xyz"<<'A';

String Functions:

s2=Mid(s,3,2);

Substring:

To work with substrings, use KStr_i, the iterator class for KStr.
KChar kc;
KStr s;
KStr_i si(s);
while(si){
  kc=si();
  si++;
}
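
To illustrate how such an iterator might step over mixed-width characters, here is a minimal sketch that walks the raw byte layout from the Structure section. Cursor and Ch are hypothetical stand-ins, not the real KStr_i and KChar. The code values 0x20 and 0x22 and the byte widths come from the example on this page, and the sketch only moves forward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::uint8_t kEsc = 0x1B;
constexpr std::uint8_t kCp1252 = 0x20;  // assumed tag for Windows 1252
constexpr std::uint8_t kSjis = 0x22;    // assumed tag for Shift JIS

// One character: its character code and its raw bytes (hypothetical type).
struct Ch {
    std::uint8_t code;
    std::string bytes;
};

// Hypothetical forward cursor over a byte buffer laid out as in the example.
// The buffer must outlive the cursor.
class Cursor {
public:
    explicit Cursor(const std::string& data) : data_(data) { SkipEscapes(); }

    // True while a character is available at the current position.
    explicit operator bool() const { return pos_ < data_.size(); }

    // Returns the character at the current position.
    Ch operator()() const {
        assert(pos_ < data_.size());
        return Ch{code_, data_.substr(pos_, Width())};
    }

    // Advances by one character, then consumes any ESC sequences that follow.
    void operator++(int) {
        pos_ += Width();
        SkipEscapes();
    }

private:
    // Byte length of one character in the current code.
    std::size_t Width() const { return code_ == kSjis ? 2 : 1; }

    // Consumes complete [ESC][Next][Prev] sequences and switches to Next.
    void SkipEscapes() {
        while (pos_ + 2 < data_.size() &&
               static_cast<std::uint8_t>(data_[pos_]) == kEsc) {
            code_ = static_cast<std::uint8_t>(data_[pos_ + 1]);
            pos_ += 3;
        }
    }

    const std::string& data_;
    std::uint8_t code_ = kCp1252;  // text starts as Windows 1252
    std::size_t pos_ = 0;
};
```

Used with the same loop shape as above (while(si){ kc=si(); si++; }), the cursor visits eight characters in the example string: A, B, C, two 2-byte Shift JIS characters, then X, Y, Z.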

String Functions
long Delete(long len);
long Insert(long st,rcKStr s);
KStrp Mid(rcKStr s,long st,long len);
KStrp Mid(rcKStr s,long st);
KStrp Left(rcKStr s,long len);
KStrp Right(rcKStr s,long len);
long Delete(rKStr s,long st,long len);
long Insert(rKStr s,long st,rcKStr src);
KStrf UL2Str(ulong n,long bs=10,char bc='a');
KStrf L2Str(long n,long bs=10,char bc='a');
KStrf Hex(ulong n,char bc='a');
KStrf Oct(ulong n);
long Str2L(rcKStr s,long bs=10);
ulong Str2UL(rcKStr s,long bs=10);
KStrf Format(long n,rcKStr fmt);
KStrf Format(rcKStr s,rcKStr fmt);
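
The page does not document the parameters of the numeric conversions. As a rough sketch only, the code below mirrors the parameter shape of UL2Str and Str2UL using std::string, under the assumption that bs is the radix (2 to 36) and bc is the character used for digit value 10 ('a' for lowercase, 'A' for uppercase). The function names UnsignedToString and StringToUnsigned are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Converts n to text in radix bs. Assumption: bc is the character for digit
// value 10, so 'a' gives lowercase digits and 'A' gives uppercase digits.
std::string UnsignedToString(unsigned long n, long bs = 10, char bc = 'a') {
    assert(bs >= 2 && bs <= 36);
    const unsigned long base = static_cast<unsigned long>(bs);
    std::string out;
    do {
        const unsigned long d = n % base;
        out += static_cast<char>(d < 10 ? '0' + d : bc + (d - 10));
        n /= base;
    } while (n != 0);
    std::reverse(out.begin(), out.end());  // digits were produced low to high
    return out;
}

// Parses leading digits of s in radix bs and stops at the first character
// that is not a valid digit. Accepts both letter cases.
unsigned long StringToUnsigned(const std::string& s, long bs = 10) {
    assert(bs >= 2 && bs <= 36);
    const unsigned long base = static_cast<unsigned long>(bs);
    unsigned long n = 0;
    for (char c : s) {
        unsigned long d;
        if (c >= '0' && c <= '9') {
            d = static_cast<unsigned long>(c - '0');
        } else if (c >= 'a' && c <= 'z') {
            d = static_cast<unsigned long>(c - 'a') + 10;
        } else if (c >= 'A' && c <= 'Z') {
            d = static_cast<unsigned long>(c - 'A') + 10;
        } else {
            break;
        }
        if (d >= base) break;
        n = n * base + d;
    }
    return n;
}
```

Under these assumptions, Hex(n, bc) would correspond to a radix of 16 and Oct(n) to a radix of 8. The real KStr functions may differ, for example in how they handle invalid input.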

Test Application
A multilingual text editor built with KStr, using Win32 NLS and Win32 IME support under Windows 2000.
I admit that Windows 2000 is great regarding language support, including its various IMEs and fonts!