Handling funky characters


10 replies to this topic

#1 anortonbk

anortonbk

    Newbie

  • Members
  • Pip
  • 8 posts

Posted 27 March 2007 - 07:25 PM

I have an update that reads a field populated by another system. My program expects pipe-delimited text, but odd characters are often embedded in the text (I'll attach a few screenshots). When my program reads the field and reaches one of these funky characters, it can't handle it, and nothing from that point forward is written or processed.

So far we haven't been able to pinpoint how these characters get embedded in the text, so I would love to know how to handle this on my end. The problem is unpredictable, and it is impossible to write code for every potential character that could appear in the field. Is there any way for me to strip them out?

When you look in the database, you can't see them, but when I display the field after reading it, you can.

Please HELP!!

Thanks a ton in advance!

#2 Chris Pepper

Chris Pepper

    ProIV Guru

  • Members
  • PipPipPipPipPip
  • 369 posts
  • Gender:Male
  • Location:United Kingdom

Posted 27 March 2007 - 08:33 PM

So what do you mean when you say your program can't handle it? Do you mean that the file system you are using doesn't like the characters? What type of file system are you using?

#3 anortonbk


Posted 27 March 2007 - 09:29 PM

I mean that as my ProIV SuperLayer program reads the text field, if there is an ASCII symbol in the field, my program doesn't process anything beyond that point.

The other system passes me one huge field containing 26 fields delimited by pipes. My program reads the big field, then writes each individual field to a second table.

It is supposed to read the pipe-delimited fields and, based on position and the start and end pipes, take each field and write it to a field in the second table. When it hits one of these characters, none of the fields from that point forward get written to the second table.

So, I need to know how to strip these ASCII characters out of the text, or how to ignore them.
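The strip-then-split step described above can be sketched outside ProIV. This is a minimal Python illustration, not ProIV code. It assumes the unwanted characters fall outside printable ASCII (space through tilde), which the thread does not confirm, and the name `split_record` is hypothetical.

```python
def split_record(raw, expected_fields=26):
    """Drop characters outside printable ASCII, then split the record on pipes."""
    # Assumption: only characters 0x20 (space) through 0x7E (~) are real data.
    cleaned = "".join(ch for ch in raw if 0x20 <= ord(ch) <= 0x7E)
    fields = cleaned.split("|")
    if len(fields) != expected_fields:
        # A wrong field count means the record needs a manual look.
        raise ValueError(f"expected {expected_fields} fields, got {len(fields)}")
    return fields


# A NUL byte inside the second field is removed before splitting.
print(split_record("A|B\x00C|D", expected_fields=3))
```

Stripping before splitting means an odd character can no longer stop the fields after it from being processed.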

#4 Neil Hunter

Neil Hunter

    ProIV Guru

  • Members
  • PipPipPipPipPip
  • 414 posts
  • Gender:Male
  • Location:Johannesburg, South Africa

Posted 28 March 2007 - 07:45 AM

Why not do the checking before you populate any tables?

i.e. go through each field and make sure numeric fields are numeric, special characters aren't present, etc.

Then throw out an exception report.
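A pre-load check along these lines might look like the following Python sketch. The function name `check_fields`, the choice of which positions are numeric, and the definition of "special character" (anything outside printable ASCII) are all assumptions made for illustration.

```python
def check_fields(fields, numeric_positions):
    """Return (position, problem) pairs; an empty list means the record is clean."""
    problems = []
    for pos, value in enumerate(fields, start=1):
        # Assumption: a "special character" is anything outside printable ASCII.
        if any(not (0x20 <= ord(ch) <= 0x7E) for ch in value):
            problems.append((pos, "contains a special character"))
        # Assumption: numeric fields hold whole numbers only.
        if pos in numeric_positions and not value.strip().isdigit():
            problems.append((pos, "not numeric"))
    return problems


# Exception report for one record: fields 1 and 3 are meant to be numeric.
for pos, problem in check_fields(["12", "ab\x07c", "x9"], {1, 3}):
    print(f"field {pos}: {problem}")
```

Records that come back with a non-empty list would go on the exception report instead of being loaded.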

#5 Mike Nicholson

Mike Nicholson

    Expert

  • Members
  • PipPipPipPip
  • 196 posts
  • Gender:Male
  • Location:Stockholm, Sweden

Posted 28 March 2007 - 10:28 AM

Just a thought but are you sure it's those characters causing the problem? There are no control characters in there doing something odd?

Having worked in languages with extra characters I've had a few odd things happen but I've not seen anything hang because of it. Usually you'll truncate or lose some data.

Whatever the characters are, you can either:

a. Check for invalid characters in each string, if you have a full list of invalids.

or

b. Go through each string a character at a time and validate against a permissible string.

As Neil suggests, probably best to do this in a pre-load check so you'll know whether there will be problems and can sort it out in advance.
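The two options can be compared side by side in a short Python sketch. The function names, the list of known-bad characters, and the permitted string are all hypothetical; the real lists depend on the data.

```python
import string

# Hypothetical lists, for illustration only.
KNOWN_INVALID = "\x00\x1a"
PERMITTED = string.ascii_letters + string.digits + " |"


def find_invalid(text, invalid_chars):
    """Option (a): return the characters of text that are on a known-bad list."""
    return [ch for ch in text if ch in invalid_chars]


def find_not_permitted(text, permitted):
    """Option (b): return the characters of text missing from the permitted string."""
    return [ch for ch in text if ch not in permitted]


sample = "AB\x1aC|D~"
print(find_invalid(sample, KNOWN_INVALID))    # only flags what is on the bad list
print(find_not_permitted(sample, PERMITTED))  # flags anything not explicitly allowed
```

Option (b) also catches characters nobody thought to list, which suits the case where the bad characters are unpredictable.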

Cheers

Mike

#6 anortonbk


Posted 28 March 2007 - 04:12 PM

Thanks for all of the advice. I might not have been clear - my program doesn't hang, it finishes, we just lose data as you say - any data from the symbol onward do not get written.


So, I could compare every character of the field to a list of acceptable characters (perhaps with an IN clause), and if it is not in the list, then eliminate it?

Or, maybe create a table with all acceptable characters, and compare against that. Then, if we needed to add more characters, we would just need to modify the table, and not the code.

#7 Mike Nicholson


Posted 29 March 2007 - 07:25 AM

If you're going to put the characters into a table, I'd suggest loading that table into a memory file before you start the routine and using that for the check. An extra file read for each character in each line being read will really slow you down if the input file is of any real size ...
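The same idea, reading the table once and then checking against an in-memory copy, can be sketched in Python. Here an SQLite table stands in for the character table and a Python set stands in for the memory file. The table name `allowed_chars`, its column `ch`, and both function names are hypothetical.

```python
import sqlite3


def load_allowed(conn):
    """Read the allowed-character table once and keep it in memory as a set."""
    rows = conn.execute("SELECT ch FROM allowed_chars")
    return {row[0] for row in rows}


def strip_disallowed(text, allowed):
    """Keep only characters found in the in-memory set; no table read per character."""
    return "".join(ch for ch in text if ch in allowed)


# Demo setup: a throwaway in-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allowed_chars (ch TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO allowed_chars VALUES (?)",
    [(c,) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789| "],
)

allowed = load_allowed(conn)  # one read, before the main loop starts
print(strip_disallowed("AB\x1aC|1\x002", allowed))
```

Adding a character then means inserting a row in the table, with no change to the code.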

Cheers

Mike

Edited by Mike Nicholson, 29 March 2007 - 07:27 AM.


#8 Chris Mackenzie

Chris Mackenzie

    ProIV Guru

  • Members
  • PipPipPipPipPip
  • 368 posts
  • Gender:Male
  • Location:Bristol, United Kingdom

Posted 29 March 2007 - 08:36 AM


Yeah, and some would use a Value Variable rather than a table to avoid having to change the code.
The content and views expressed in this message are those
of the poster and do not represent those of any organisation.

#9 Donald Miller

Donald Miller

    ProIV Guru

  • Members
  • PipPipPipPipPip
  • 205 posts
  • Gender:Male
  • Location:Cupar, Fife, Scotland
  • Interests:Motorcycling, Running, Cooking

Posted 22 April 2007 - 08:15 PM

Hi anortonbk

You could use a table for the data, avoiding the need to change the code when there were changes to the list of characters to check. Another efficient method could be to read the table and add the elements into an array prior to looping through the string to compare to the array. This also may make the code easier to maintain.

Just a thought....
Half of what he said meant something else, and the other half didn't mean anything at all

#10 Mike Nicholson


Posted 23 April 2007 - 09:12 AM

Donnie, a memory file should be just as efficient as an array and needs no updating when the array size changes.

Cheers

Mike

#11 Rob Donovan

Rob Donovan

    rob@proivrc.com

  • Admin
  • 1,640 posts
  • Gender:Male
  • Location:Spain

Posted 23 April 2007 - 11:21 AM

Actually, in my tests, looping through an array is slower than using a memory file.

So in all situations, using a memory file is better.
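Those ProIV timings are not reproduced here. As a loose analogy only, the Python sketch below times a linear scan of a list against a keyed lookup in a set. Treating a ProIV array check as a scan and a memory file as a keyed lookup is an assumption, so any real comparison should be measured in ProIV itself.

```python
import timeit

# Stand-ins: a list to scan and a set for keyed lookup (printable ASCII only).
allowed_list = [chr(c) for c in range(0x20, 0x7F)]
allowed_set = set(allowed_list)
text = "FIELD ONE|FIELD TWO|" * 1000


def count_by_scan():
    """Check each character by scanning the list from the start."""
    return sum(1 for ch in text if ch in allowed_list)


def count_by_key():
    """Check each character with a keyed (hashed) lookup."""
    return sum(1 for ch in text if ch in allowed_set)


if __name__ == "__main__":
    print("scan :", timeit.timeit(count_by_scan, number=20))
    print("keyed:", timeit.timeit(count_by_key, number=20))
```

Both functions return the same count; only the time taken per lookup differs.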

Rob.


