[SGVLUG] Grep "quickie" needed -- searching for hi-bit characters
Christopher Smith
x at xman.org
Fri Jan 4 20:59:19 PST 2008
Emerson, Tom (*IC) wrote:
>
> We have some files that were transferred from one machine to another
> [one of which was a PC], and somewhere in the process, it appears that
> some local-language/"multi-byte" characters got translated to
> multiple-ascii-bytes, which in turn buggered up the record length.
> Fortunately, these are easy to detect visually as the new values for
> each "byte" of the character are between 128 and 255 and generally
> look like "line noise" when cat'd to the screen. Unfortunately, the
> files involved are thousands of lines long, so a pure visual search is
> out of the question.
>
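For the "grep quickie" itself, here is a minimal Python sketch (not from the original thread). It reads each file as raw bytes so nothing gets decoded or mangled, then reports every line containing a byte in the 128-255 range, printing the offending runs in hex so the "line noise" is readable. The names `find_high_bit_lines` and `HIGH_BIT_RUN` are made up for illustration.

```python
import re
import sys

# Hypothetical helper: matches one or more consecutive bytes in 0x80-0xFF ("hi-bit" bytes).
HIGH_BIT_RUN = re.compile(rb"[\x80-\xff]+")


def find_high_bit_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (line_number, line) for every line containing a byte >= 128.

    Line numbers are 1-based, like `grep -n`. bytes.splitlines() treats
    \n, \r and \r\n as line breaks, so PC-style line endings are handled.
    """
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        if HIGH_BIT_RUN.search(line):
            hits.append((line_no, line))
    return hits


def main(paths: list[str]) -> None:
    for path in paths:
        with open(path, "rb") as f:  # binary mode: raw bytes, no decoding
            for line_no, line in find_high_bit_lines(f.read()):
                # Show each run of hi-bit bytes as hex, e.g. "c3a9".
                runs = [m.group().hex() for m in HIGH_BIT_RUN.finditer(line)]
                print(f"{path}:{line_no}: {' '.join(runs)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python find_hibit.py file1 file2 ...` (the script name is arbitrary); the output format mimics `grep -n`, one `path:line: hexruns` entry per suspect line.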
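This all sounds suspiciously like UTF-8 encoding, no? In UTF-8, every byte
of a multi-byte character falls in the 128-255 range, which matches what
you're seeing. If so, most Unicode libraries have handy routines for
detecting and decoding it.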
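As a sketch of what those library routines buy you, assuming the suspect lines really are UTF-8, Python's built-in `bytes.decode("utf-8")` (strict by default) either decodes a line or raises `UnicodeDecodeError` pointing at the first bad byte. Comparing the byte length with the decoded character length shows exactly how much each record grew. The function name `describe_line` is hypothetical.

```python
def describe_line(line: bytes) -> dict:
    """Report whether a line is valid UTF-8 and how its byte and character lengths differ.

    Hypothetical helper; relies only on the standard str/bytes API.
    """
    try:
        text = line.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8.
        return {"utf8": False, "bytes": len(line), "chars": None, "bad_offset": err.start}
    # For valid UTF-8, bytes - chars is the number of "extra" bytes per record.
    return {"utf8": True, "bytes": len(line), "chars": len(text), "bad_offset": None}


if __name__ == "__main__":
    print(describe_line(b"caf\xc3\xa9"))  # valid UTF-8: 5 bytes, 4 characters
    print(describe_line(b"caf\xe9 x"))    # not UTF-8: bad byte at offset 3
```

A line full of hi-bit bytes that still decodes cleanly is a fairly good hint that it is UTF-8, since longer runs of random high bytes rarely form valid sequences by accident.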
--Chris