Collating Sequence

Posted: Fri Jul 29, 2016 11:58 am
by Nasirshiak
Hi all,

Could anyone please tell me how comparisons are done based on a collating sequence? I have read about collating sequences, got some idea, and understand that a collating sequence is a representation of characters in a machine-understandable form. But I am still confused.

It would be helpful if you could tell me how the comparisons are done. Since I will be moved to a migration (Mainframe to UNIX) project in the next three months, I am teaching myself everything related to migration.

I know that the UNIX collating sequence (data is in ASCII format) is different from the Mainframe one (data is in EBCDIC format). Each character has a different internal code that the computer understands.

In EBCDIC:

Suppose A = 123NASIR

IF A < SPACE or A > 9
then
Display "Print something"
END-IF

Why will this statement not work in an ASCII environment? I have tried hard to understand it but could not. Please help.

I know the ASCII collating sequence is like - NUMBERS,ALPHABETS LOWER,ALPHABETS UPPER,SPACE,SPECIAL CHARACTERS
EBCDIC collating sequence is like - ALPHABETS LOWER,ALPHABETS UPPER,NUMBERS,SPACE,SPECIAL CHARACTERS

How is the bit-by-bit comparison done on the mainframe or on UNIX? Does the system internally sort in the order mentioned above for the respective environment (ASCII or EBCDIC) and then start comparing?

I am new to Mainframe and migration. Please help me understand collating sequences so that I can build my knowledge of migration.


Thanks,
Nasir

Re: Collating Sequence

Posted: Fri Jul 29, 2016 5:17 pm
by Robert Sample
Nasirshiak wrote: I know the ASCII collating sequence is like - NUMBERS,ALPHABETS LOWER,ALPHABETS UPPER,SPACE,SPECIAL CHARACTERS
EBCDIC collating sequence is like - ALPHABETS LOWER,ALPHABETS UPPER,NUMBERS,SPACE,SPECIAL CHARACTERS

What you "know" is WRONG. The SPACE character, for example, is X'20' in ASCII and X'40' in EBCDIC. The space in both collating sequences comes before numbers and before letters (upper AND lower), and in both collating sequences there are other characters below the space (control characters) and above it (special characters).

Numbers are X'30' through X'39' in ASCII while they are X'F0' through X'F9' in EBCDIC. Hence if you sort data in ASCII, the numbers will come out before the letters whereas sorting the same data in EBCDIC, the numbers will come out after all the letters. There are 128 characters in ASCII (256 in extended ASCII) and 256 in EBCDIC. Google collating sequence and read up on where different characters come in the collating sequence; be aware, too, that some characters are NOT shared between ASCII and EBCDIC. Also, there are multiple definitions for EBCDIC collating sequence so some characters may appear differently depending upon which definition you are using.
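
The hexadecimal values above can be checked on any machine with a short script. This is only a sketch in Python rather than COBOL, and it assumes Python's built-in "cp037" codec as a stand-in for EBCDIC; as noted, other EBCDIC definitions place some special characters differently. The helper names hex_value and sort_as are made up for this illustration.

```python
# Compare the byte values of a few characters in ASCII and EBCDIC.
# Assumption: Python's "cp037" codec stands in for EBCDIC here; other
# EBCDIC code pages differ for some special characters.

def hex_value(ch: str, codec: str) -> str:
    """Return the single-byte hex value of ch in the given codec."""
    return ch.encode(codec).hex().upper()

def sort_as(items, codec):
    """Sort strings by their encoded bytes, i.e. by that codec's collating sequence."""
    return sorted(items, key=lambda s: s.encode(codec))

# Print the code of each character in both character sets.
for ch in [" ", "a", "A", "0", "9"]:
    print(repr(ch),
          "ASCII X'%s'" % hex_value(ch, "ascii"),
          "EBCDIC X'%s'" % hex_value(ch, "cp037"))

# Sort the same data under each collating sequence.
data = ["9", "A", "a", " "]
ascii_order = sort_as(data, "ascii")    # space, digit, upper, lower
ebcdic_order = sort_as(data, "cp037")   # space, lower, upper, digit
print(ascii_order)
print(ebcdic_order)
```

In both results the space sorts first; what changes is whether the digit lands before the letters (ASCII) or after them (EBCDIC), and whether upper case comes before or after lower case.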

Re: Collating Sequence

Posted: Sat Jul 30, 2016 9:13 am
by Anuj Dhawan
Along with what Robert has said, you might also want to read this explanation on Collating Sequence by William Collins:

William Collins wrote: Collating sequence starts from the lowest value, and continues, in sequence, by each subsequent higher value.

ABCDEFG

A is lowest, B is greater than A, C is greater than B (so also greater than A), D is greater than C (so also greater than B and A) etc.

Without a collating sequence, you can do no "greater than" or "less than" comparisons.

The collating sequence also determines in what order data will be sorted.

At the basic level data "collates" from X'00' thru X'FF', in sequence.

In EBCDIC, all "displayable" characters have a hexadecimal value. This is also true in ASCII, but the hexadecimal values of, for instance, the alphabet and the numbers are different between the two character sets, so the collating sequence is different (in EBCDIC, letters collate lower than numbers, in ASCII the reverse).

At the basic level, in COBOL LOW-VALUES is the lowest hexadecimal value in the collating sequence, and HIGH-VALUES is the highest, and that is X'00' and X'FF' respectively.

However, in a COBOL program (and elsewhere outside COBOL) you can use a different collating sequence for a specific purpose. In a COBOL program running on a Mainframe you could process ASCII data using an ASCII collating sequence, or some custom collating sequence where LOW-VALUES and HIGH-VALUES still contain the lowest and highest in the sequence, but do not contain X'00' and X'FF'.

It is very rare that you would need to do this "for real", but you could do some little tests anyway. Have a look at the ALPHABET clause of the SPECIAL-NAMES paragraph (part of the CONFIGURATION SECTION in the ENVIRONMENT DIVISION) and how it would be used in tests, and how you could use it to specify a COLLATING SEQUENCE for a SORT or MERGE statement in COBOL.
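
The idea of sorting the same data under a different collating sequence can be tried outside COBOL as well. The following is a rough Python sketch, not the COBOL ALPHABET clause itself: it assumes the "cp037" codec as a stand-in for EBCDIC, and CUSTOM, make_key and ebcdic_key are hypothetical names invented for the example.

```python
# Sort the same records under three different collating sequences.
# Assumptions: "cp037" stands in for EBCDIC, and CUSTOM is a made-up
# alphabet used purely for illustration.

def make_key(alphabet: str):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    return lambda s: [rank[ch] for ch in s]

def ebcdic_key(s: str) -> bytes:
    """Sort key that orders strings by their EBCDIC (cp037) byte values."""
    return s.encode("cp037")

# Hypothetical custom sequence: digits first, then letter pairs aA, bB, cC.
CUSTOM = "0123456789aAbBcC"

records = ["B2", "a1", "A1", "b2", "10"]

native_order = sorted(records)                        # code point order (same as ASCII here)
ebcdic_order = sorted(records, key=ebcdic_key)        # EBCDIC order
custom_order = sorted(records, key=make_key(CUSTOM))  # custom order

print(native_order)
print(ebcdic_order)
print(custom_order)
```

The data never changes; only the rule used to rank one character against another does, which is loosely what naming a different collating sequence for a SORT achieves.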


Also, I believe the example you are showing us is not complete. Why would someone want to compare a variable A, which can hold a value like 123NASIR, with the single digit 9, which is just one byte? (The PICture clause of A is not defined in your example; I am assuming PIC X(n).) I think the example should have been:

A = 123NASIR

Code:

IF A(1:1) < SPACE OR A(1:1) > 9
	DISPLAY "Print something"
END-IF
...to make better sense of the comparison.
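
To see why such a test behaves differently after a migration, the condition can be evaluated for a single byte under both character sets. This is a sketch in Python, assuming the intent is to compare only the first byte against both SPACE and 9, and assuming the "cp037" codec as a stand-in for EBCDIC; the function name condition is made up for the illustration.

```python
# Evaluate "first byte < SPACE or first byte > 9" under ASCII and EBCDIC.
# Assumption: "cp037" stands in for EBCDIC; the comparison is done on the
# encoded byte, which is what a byte-wise collating comparison amounts to.

def condition(first_char: str, codec: str) -> bool:
    """Return True when first_char collates below SPACE or above '9' in codec."""
    c = first_char.encode(codec)
    return c < " ".encode(codec) or c > "9".encode(codec)

for ch in ["1", "N", "n", "*"]:
    print(repr(ch),
          "EBCDIC:", condition(ch, "cp037"),
          "ASCII:", condition(ch, "ascii"))
```

In EBCDIC the digits sit at the top of the displayable range, so no letter collates above 9 and the condition is false for "N". In ASCII the letters come after the digits, so the same test becomes true for any letter, and the DISPLAY would fire for data that never triggered it on the mainframe.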

Re: Collating Sequence

Posted: Thu Aug 04, 2016 3:21 pm
by Nasirshiak
Thanks Robert and Anuj.

Yes, Anuj, you were right; the example should be as shown above. Thank you.

Re: Collating Sequence

Posted: Fri Aug 05, 2016 1:24 pm
by Anuj Dhawan
Hope we have been helpful.