Wednesday, October 28, 2009

How swatantra is this?

Obstinacy

Several years back, when language computing was based on the ISCII standard and Unicode was known only to some developers in the Free Software community, Microsoft issued a set of specifications for their fonts.

Called the OpenType specifications (current specifications for Indic languages available here), they were what the entire software community, not just the Free Software community, had to rely on for guidance, not just for fonts but also for language encoding.

Those standards issued by Microsoft had a serious little problem (കാര്യം നിസ്സാരം, പ്രശ്നം ഗുരുതരം, "the matter is trivial, the problem is grave", as they say in Malayalam): they (or rather, the OpenType specifications as available then) were either precise or ambiguous, depending on your point of view, about how the chillus were to be encoded. To be fair, it was not Microsoft's fault - they had simply copied and pasted some paragraphs from the then latest Unicode standard into the OT specifications.

One reading meant that a consonant + chandrakkala + ZWJ sequence was to render a chillaksharam, or chillu. The other meant that the chillu was to be formed with the consonant + chandrakkala sequence alone. For some reason, the Free Software community decided to stick to Microsoft's definition / interpretation of the Unicode standards about chillus.

And the publisher of the OpenType specifications, the hallowed Microsoft Corporation, chose to adopt the other interpretation.

But the Free Software community, blissfully unaware of what was happening in the Microsoft world, went on to create a huge pile of software and data (mostly user interface translations, personal web pages and blogs) on a standard which nobody implemented.

When the chillus were finally encoded, we raised a hue and cry about "incompatible legacy data".
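For readers who want to see the difference at the code point level, here is a minimal Python sketch. It assumes chillu n as the example, with the code points U+0D28 (NA), U+0D4D (chandrakkala / virama), U+200D (ZWJ) and U+0D7B (the atomic chillu n); the variable and function names are mine, purely for illustration. It lists the two readings of the old text next to the atomic character, and checks that Unicode normalization does not turn one into the other, which is roughly where the "legacy data" complaint comes from.

```python
import unicodedata

NA = "\u0d28"               # MALAYALAM LETTER NA
CHANDRAKKALA = "\u0d4d"     # MALAYALAM SIGN VIRAMA (chandrakkala)
ZWJ = "\u200d"              # ZERO WIDTH JOINER
ATOMIC_CHILLU_N = "\u0d7b"  # MALAYALAM LETTER CHILLU N

# The two readings of the old specification, plus the atomic character.
candidates = {
    "consonant + chandrakkala + ZWJ": NA + CHANDRAKKALA + ZWJ,
    "consonant + chandrakkala": NA + CHANDRAKKALA,
    "atomic chillu": ATOMIC_CHILLU_N,
}


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for label, text in candidates.items():
    # NFC is applied only to show that normalization leaves each form as it is.
    nfc = unicodedata.normalize("NFC", text)
    print(f"{label:32} {code_points(text):22} NFC: {code_points(nfc)}")

# The joiner sequence and the atomic chillu remain different strings after NFC.
legacy = unicodedata.normalize("NFC", candidates["consonant + chandrakkala + ZWJ"])
print("legacy == atomic after NFC:", legacy == ATOMIC_CHILLU_N)
```

Since the three forms stay distinct, any data typed with one of them will not match a search or comparison made with another unless the application converts it explicitly.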

And, not learning from our mistakes, the community of developers for Malayalam Free Software became probably the first community in the Free Software world to openly declare "we will not implement a standard".

And here are a couple of reasons why that stand is obstinate and idiotic.
Quote from the above link:

1. The atomic chillu's are unacceptable because it destroys the link of a chillu with its base character.
...
Here , the fundamental problem lies in Unicode's way of treating only representational forms without checking linguistic correctness.

The above quotes demonstrate a fundamental [mis|refusal]understanding about the Unicode standards.

The concerns are addressed in the Unicode FAQs - see the Indic FAQ, and for the bigger picture, the Unicode FAQ index; and here is another.

Here is what a "character" is, according to Unicode - from the glossary.

Character.
(1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader’s understanding.
(2) Synonym for abstract character.
(3) The basic unit of encoding for the Unicode character encoding.
(4) The English name for the ideographic written elements of Chinese origin. [See ideograph (2).]

(Emphasis is mine.)

You should read that again. Especially, the "The smallest component of written language that has semantic value" part.

So the developer community, which claims to represent the Malayalam population, should rethink its stand when it says:

2. The examples used to justify semantic difference between words only separated by ZWJ are non-existent in dictionary , not in are grammatically wrong or meaningless without proper context.

And finally, the security | spoofing part.

Did anybody in the "Swatantra Malayalam" community at least attempt to read RFC 3454 ("stringprep")?

If you are bandwidth or time challenged, you can avoid going to the RFC and read the excerpts below.

 5. Prohibited Output

   Before the text can be emitted, it MUST be checked for prohibited
   code points.  There are a variety of prohibited code points, as
   described in this section.  A profile of this document MAY use all or
   some of the tables in appendix C.

   The stringprep process never emits both an error and a string.  If an
   error is detected during the checking for prohibited code points,
   only an error is returned.

   Note that the subsections below describe how the tables in appendix C
   were formed.  They are here for people who want to understand more,
   but they should be ignored by implementors.  Implementations that use
   tables MUST map based on the tables themselves, not based on the
   descriptions in this section of how the tables were created.

   The lists in appendix C MUST be used by implementations of this
   specification.  If there are any discrepancies between the lists in
   appendix C and subsections below, the lists in appendix C always take
   precedence.

   Some code points listed in one section may also appear in other
   sections.

   It is important to note that a profile of this document MAY prohibit
   additional characters.

Hoffman & Blanchet          Standards Track                    [Page 10]

RFC 3454        Preparation of Internationalized Strings   December 2002


   Each subsection of this section has a matching subsection in appendix
   C.  For example, the characters listed in section 5.1 are listed in
   appendix C.1.

5.1 Space characters

   Space characters can make accurate visual transcription of strings
   nearly impossible and could lead to user entry errors in many ways.
   Note that the list below is split into two tables in appendix C:
   Table C.1.1 contains the ASCII code points, while Table C.1.2
   contains the non-ASCII code points.  Most profiles of this document
   that want to prohibit space characters will want to include both
   tables.

   0020; SPACE
   00A0; NO-BREAK SPACE
   1680; OGHAM SPACE MARK
   2000; EN QUAD
   2001; EM QUAD
   2002; EN SPACE
   2003; EM SPACE
   2004; THREE-PER-EM SPACE
   2005; FOUR-PER-EM SPACE
   2006; SIX-PER-EM SPACE
   2007; FIGURE SPACE
   2008; PUNCTUATION SPACE
   2009; THIN SPACE
   200A; HAIR SPACE
   200B; ZERO WIDTH SPACE
   202F; NARROW NO-BREAK SPACE
   205F; MEDIUM MATHEMATICAL SPACE
   3000; IDEOGRAPHIC SPACE

5.2 Control characters

   Control characters (or characters with control function) cannot be
   seen and can cause unpredictable results when displayed.  Note that
   the list below is split into two tables in appendix C: Table C.2.1
   contains the ASCII code points, while Table C.2.2 contains the non-
   ASCII code points.  Most profiles of this document that want to
   prohibit control characters will want to include both tables.

   0000-001F; [CONTROL CHARACTERS]
   007F; DELETE
   0080-009F; [CONTROL CHARACTERS]
   06DD; ARABIC END OF AYAH
   070F; SYRIAC ABBREVIATION MARK
   180E; MONGOLIAN VOWEL SEPARATOR
   200C; ZERO WIDTH NON-JOINER
   200D; ZERO WIDTH JOINER
   2028; LINE SEPARATOR
   2029; PARAGRAPH SEPARATOR
   2060; WORD JOINER
   2061; FUNCTION APPLICATION
   2062; INVISIBLE TIMES
   2063; INVISIBLE SEPARATOR
   206A-206F; [CONTROL CHARACTERS]
   FEFF; ZERO WIDTH NO-BREAK SPACE
   FFF9-FFFC; [CONTROL CHARACTERS]
   1D173-1D17A; [MUSICAL CONTROL CHARACTERS]

C.2.2 Non-ASCII control characters

   ----- Start Table C.2.2 -----
   0080-009F; [CONTROL CHARACTERS]
   06DD; ARABIC END OF AYAH
   070F; SYRIAC ABBREVIATION MARK
   180E; MONGOLIAN VOWEL SEPARATOR
   200C; ZERO WIDTH NON-JOINER
   200D; ZERO WIDTH JOINER
   2028; LINE SEPARATOR
   2029; PARAGRAPH SEPARATOR
   2060; WORD JOINER
   2061; FUNCTION APPLICATION
   2062; INVISIBLE TIMES
   2063; INVISIBLE SEPARATOR
   206A-206F; [CONTROL CHARACTERS]
   FEFF; ZERO WIDTH NO-BREAK SPACE
   FFF9-FFFC; [CONTROL CHARACTERS]

In simple words, the above means: no zero width joiners or zero width non-joiners in the address bar of your browser. Or, more accurately, browsers are not supposed to send strings / addresses containing them.
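As a rough illustration of that check, here is a small Python sketch. The ranges are copied from Table C.2.2 exactly as quoted above; the function name and the sample strings (chillu n written with a joiner, and the atomic chillu n at U+0D7B) are my own assumptions for the example. It is not a stringprep implementation, only the "is there a prohibited code point in this string" step.

```python
# Inclusive code point ranges copied from Table C.2.2 as quoted above.
TABLE_C22 = [
    (0x0080, 0x009F),  # control characters
    (0x06DD, 0x06DD),  # ARABIC END OF AYAH
    (0x070F, 0x070F),  # SYRIAC ABBREVIATION MARK
    (0x180E, 0x180E),  # MONGOLIAN VOWEL SEPARATOR
    (0x200C, 0x200C),  # ZERO WIDTH NON-JOINER
    (0x200D, 0x200D),  # ZERO WIDTH JOINER
    (0x2028, 0x2029),  # LINE SEPARATOR, PARAGRAPH SEPARATOR
    (0x2060, 0x2063),  # WORD JOINER .. INVISIBLE SEPARATOR
    (0x206A, 0x206F),  # control characters
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE
    (0xFFF9, 0xFFFC),  # control characters
]


def prohibited_code_points(text):
    """Return the code points of text that fall inside Table C.2.2."""
    return [
        ord(ch)
        for ch in text
        if any(low <= ord(ch) <= high for low, high in TABLE_C22)
    ]


# Hypothetical sample strings: chillu n with a joiner, and the atomic chillu n.
joiner_chillu = "\u0d28\u0d4d\u200d"
atomic_chillu = "\u0d7b"

print([f"U+{cp:04X}" for cp in prohibited_code_points(joiner_chillu)])
print([f"U+{cp:04X}" for cp in prohibited_code_points(atomic_chillu)])
```

The joiner-based spelling trips the table because of its ZWJ, while the atomic chillu contains nothing from the table.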

Just for more technical accuracy, here is some more info.

The IDNA RFC is RFC 3490. That is the basic document developers and applications should conform to, to enable Unicode-compliant domain names. The Unicode characters are first processed through a "ToASCII" filter, which entails checking that the string conforms to the requirements of nameprep (RFC 3491), the IDNA profile of RFC 3454.
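If you want to try this without reading the RFCs, Python ships an "idna" codec which, according to its documentation, implements RFC 3490. The sketch below is only an experiment with that one implementation, not a statement about every browser; the sample label (NA + chandrakkala, with and without a ZWJ) is my own choice.

```python
NA = "\u0d28"            # MALAYALAM LETTER NA
CHANDRAKKALA = "\u0d4d"  # MALAYALAM SIGN VIRAMA (chandrakkala)
ZWJ = "\u200d"           # ZERO WIDTH JOINER

# Run both spellings through ToASCII via Python's built-in "idna" codec.
with_joiner = (NA + CHANDRAKKALA + ZWJ).encode("idna")
without_joiner = (NA + CHANDRAKKALA).encode("idna")

print(with_joiner)
print(without_joiner)
print("same name on the wire:", with_joiner == without_joiner)
```

In this implementation the joiner never reaches the wire: as far as I can tell it is removed in nameprep's mapping step (Table B.1 of RFC 3454, "commonly mapped to nothing") before the prohibited-output check is even reached. Either way, two labels that differ only by a ZWJ do not end up as two different domain names.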

So, the question of spoofing does not arise, unless applications do not conform to the IDNA standards (and there are plenty of them out there - see here for conforming applications). In the case of applications which do not conform to standards, it is within our power to uninstall that application (in the case of proprietary software), or to file bugs / fix them.

If you are a Malayalam developer, and if the RFCs above do not convince you that spoofing is not possible with chillus, I have just one thing to tell you - stop pretending to represent the Malayalam Free Software Community.

I know that I sound rude, but there is no choice left for me.

And look at the danger we are heading into. The latest version has specified that we should use the chillu na for the /nta/ stacked conjunct. However dunderheaded that specification may be, the specification is a specification, which means that it is a specification. In other words, it is a specification. And because it is a specification, we are obliged to implement it.

The danger is that the stacked /nta/ requires the atomic chillu. Not the "canonical equivalent" one. True, it is going to introduce plenty of ambiguities and legacy data problems, but that is the pain Microsoft is. The user community does not want another strawman-based pain from the Free Software world which says "we will not implement that standard".

Sunday, February 8, 2009

A Phishing Mail From London

The new phishing spam

I received the following phishing attempt by mail this morning. It was so funny that I could not help posting it here.

I was looking for Nigerian origins but, strangely, it looks like it came from China / Taiwan. I have added a wee bit of markup since this looks strange in blogspot's rendering.

From metropol.investigation ( at )gmail.com Sat Feb 07 23:52:42 2009
Return-path:
Envelope-to: paivakil ( at )localhost
Delivery-date: Sat, 07 Feb 2009 23:52:42 +0530
Received: from localhost ([127.0.0.1] helo=home.amd)
by home.amd with esmtp (Exim 4.69)
(envelope-from )
id 1LVrp8-0001YW-NB
for paivakil ( at )localhost; Sat, 07 Feb 2009 23:52:40 +0530
X-Apparently-To: paivakil ( at )yahoo.co.in via 203.84.221.15; Sat, 07 Feb 2009 21:46:22 +0530
X-YahooFilteredBulk: 61.219.218.115
X-YMailISG: 7KQ9HBAWLDu6R1F4irIwJm5qIQiIU6P6S2AX31Uv9G2bYZVffjR.c9lIhvyLmVKqwPpLYa0fp.mK8UxJiekEjJuiSh7IT.JMzwGjYos0gjwf1.5tK3HgWcTvaeJdvGgjwKO0bIC8qLIyDXY3rC8SBzT7wNaEB1BvJEzAf2nO3CwLxbRxuQBYlJzn3gvWbfzsKUcTlXipAHULWETGLGe_yNo5IxtXwMVkguKUPCDgNx9q
X-Originating-IP: [61.219.218.115]
Authentication-Results: mta107.mail.in.yahoo.com from=gmail.com; domainkeys=neutral (no sig); from=gmail.com; dkim=neutral (no sig)
Received: from pop.plus.mail.fy4.b.yahoo.com [206.190.53.11]
by home.amd with POP3 (fetchmail-6.3.9-rc2)
for (single-drop); Sat, 07 Feb 2009 23:52:38 +0530 (IST)
Received: from 61.219.218.115 (EHLO changsing.com.tw) (61.219.218.115)
by mta107.mail.in.yahoo.com with SMTP; Sat, 07 Feb 2009 21:46:21 +0530
Received: by changsing.com.tw (Postfix, from userid 401)
id 52057701FF; Sun, 8 Feb 2009 00:16:18 +0800 (CST)
Received: from User (unknown [218.57.11.112])
by changsing.com.tw (Postfix) with ESMTP
id 0CA8A701FC; Sun, 8 Feb 2009 00:14:29 +0800 (CST)
Reply-To:
From: "Metropolitan Police Service"
Subject: Your payment overview
Date: Sat, 7 Feb 2009 17:16:00 +0100
MIME-Version: 1.0
Content-Type: text/plain;
charset="Windows-1251"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-Id: <20090207161429.0CA8A701FC ( at )changsing.com.tw>
To: undisclosed-recipients: ;
X-Virus-Status: No (Scaned by Clam Antivirus)
Status: RO
Content-Length: 2766
Lines: 41
LONDON METROPOLITAN POLICE SERVICE.
ANTI-MONEY LAUNDERING UNIT
Wellington House 67-73 Buckingham Gate
London SW1E 6BE
Attention: Beneficiary,
Dated: 7th/FEB/2009
RE: AN IN-DEPT INVESTIGATION INTO YOUR DELAYED PAYMENT
We wish to inform you that it has come to our notice through our online security service that a huge amount of United States Dollars was scheduled to be remitted into your bank account a few months ago.
According to the report we received from the paying bank here in London, it states expressly that you have been dealing with the wrong people who have used several fake documents to obtain money from you for payment of charges/fees which we consider to be obnoxious.
We have been mandated to step into your transaction and put a STOP ORDER pending until you revert to us for clarification why your money is been delayed more than is necessary. You are hereby advised to stop further communications with your partners in Africa and Europe and co-operate with us to assist you get your payment in record time.
The government of the United Kingdom will not hesitate to bring you to book if you ignore this notice as your payment is causing so much embarrassment to our government and global financial Institutions who repose so much trust in the British banking sector for competence and accountability.
We have resolved with the internal Minister that your money should be paid to you through an Interswitch ATM Card (Automatic Teller Machine) You will only be allowed to withdraw $50,000 per day. The ATM will be loaded with $10,000,000.00 (Ten Million United States Dollars Only. This is inline with the international monetary regulations that not more than $10M should be loaded in an ATM Card. The PIN (Personal Identification Number) would be sent to you alone to your private email box for security reasons.
In view of the foregoing, you are expected to send us your mailing address where you wish the ATM card to me mailed to you by a UK Courier Company. You must also send us proof of ownership of the said fund before we can be able to process your ATM Card. This is for security reasons as there are many fraudsters on the internet.
We look forward to hearing from you at your earliest convenience.
My direct telephone number is:+44 70457 17275
Direct fax:+44 8709 743597 We are glad to be of services to you.
Yours faithfully,
Inspector Donald Boldman
Metropolitan Police Service
London, England.
CC: British Home Office Logistics Department
CC: United States Financial Action Task Force (FATF)

I am sure that the spammers are going to pick up this post, and are going to send out more mails apparently originating from paivakil.