[SGVLUG] Help with file format
Jeremy Leader
jleader at alumni.caltech.edu
Wed Nov 29 10:10:02 PST 2006
Keep in mind that "file" just looks at the first few hundred bytes of the file,
but if it's ostensibly a text file, you should be able to break it up into
smaller chunks and run "file" on each of them. You might want to try "file -k"
which tells it to look for more than one matching format, since the format it's
reporting is very likely a false positive.
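
The chunking idea above can be sketched as a small shell function. This is only a sketch: check_chunks is a made-up name, the 5000-line default is arbitrary, and it assumes split, mktemp and file are all installed.

```shell
#!/bin/sh
# check_chunks (hypothetical helper): split a file into line-based chunks
# and run "file -k" on each chunk, so a bad region shows up by chunk name.
check_chunks() {
    input=$1
    lines=${2:-5000}                    # lines per chunk; arbitrary default
    tmpdir=$(mktemp -d) || return 1
    split -l "$lines" "$input" "$tmpdir/chunk." || return 1
    for chunk in "$tmpdir"/chunk.*; do
        file -k "$chunk"                # -k: keep going after the first match
    done
    rm -rf "$tmpdir"                    # chunks are only needed for the report
}
```

Something like "check_chunks customer-file.txt 5000" would then print one line per chunk; the chunk whose reported type differs from the others is where to look more closely.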
Also, in vim, try doing ":set fileencoding?" to see what encoding vim thinks the
file is in.
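
To check what the file itself says, rather than what an editor guesses, one option is to look at the first few bytes for a byte-order mark (BOM). A minimal sketch, assuming head -c and od are available; bom_type is a hypothetical helper name:

```shell
#!/bin/sh
# bom_type (hypothetical helper): report which byte-order mark, if any,
# a file starts with.
bom_type() {
    # First four bytes as hex with no separators, e.g. "fffe3100".
    head4=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $head4 in
        efbbbf*)  echo "UTF-8 BOM" ;;
        fffe0000) echo "UTF-32LE BOM" ;;   # must be tested before UTF-16LE
        0000feff) echo "UTF-32BE BOM" ;;
        fffe*)    echo "UTF-16LE BOM" ;;
        feff*)    echo "UTF-16BE BOM" ;;
        *)        echo "no BOM" ;;
    esac
}
```

A file starting with ff fe would be UTF-16 little-endian, which would fit an importer calling it UNICODE. Those two bytes also look like the start of an MPEG audio frame header, which may be why "file" guessed MPEG ADTS, but that is an inference and not something confirmed in this thread.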
od is a handy tool; my favorite invocation is something like this:
od -tx1 -tc ./garbage.foo | less
which shows alternating lines of hex bytes and ascii text:
0000000  73  65  6c  65  63  74  20  6d  65  73  73  61  67  65  2c  20
          s   e   l   e   c   t       m   e   s   s   a   g   e   ,
0000020  63  6f  75  6e  74  28  2a  29  20  66  72  6f  6d  20  64  69
          c   o   u   n   t   (   *   )       f   r   o   m       d   i
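
When the trouble starts at a known line, the same od invocation can be pointed at just that region instead of the whole file. A sketch, assuming sed and od are available; dump_lines is a hypothetical helper name:

```shell
#!/bin/sh
# dump_lines (hypothetical helper): hex/char dump of a range of lines.
#   $1 = file, $2 = first line, $3 = last line
dump_lines() {
    # LC_ALL=C makes sed treat the input as raw bytes, so odd byte
    # sequences do not trip up a multibyte locale.
    # "q" stops reading once the last wanted line has been printed.
    LC_ALL=C sed -n "${2},${3}p;${3}q" "$1" | od -tx1 -tc
}
```

For example, "dump_lines customer-file.txt 15100 15106 | less" would show the bytes on either side of the line where the import failed, which should make a change of encoding or a stray binary blob visible.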
--
Jeremy Leader
jleader at alumni.caltech.edu
leaderj at yahoo-inc.com (work)
on 11/29/2006 08:11 AM Ted Arden wrote:
> sounds like the file is multi-part. you could
> try stripping off the txt bits in front, redirecting
> the 'garbage bits' to another file then asking
> linux to tell you what that 'garbage' is..
> octal dumps are kinda fun too.
>
> od -c ./garbage.foo | less
>
> then you can kinda see what sorta characters are
> there.
>
> if there's any sorta texty type stuff in the
> 'garbage', use strings to strip it out as well.
>
> anyway, the od stuff is olde skewl unix commands
> back from my OSF/1 days.. strings sometimes works
> a bit better to *read* files like that.
>
> strings ./garbage.foo
>
> =ted=
>
> On Wed, 29 Nov 2006, James Neff wrote:
>
>> Greetings,
>>
>> We received a file from a customer and I'm having trouble determining what
>> the character set is.
>>
>> When I run the "file" utility:
>>
>> [root@appserver2 06-11-28]# file customer-file.txt
>> customer-file.txt: MPEG ADTS, layer I, v1, 96 kBits, 44.1 kHz, Stereo
>>
>>
>> When I run "less" it thinks it's a binary file and I see garbage if I
>> choose to look at it anyway.
>>
>> When I run "vi" I can read the file just fine from start to finish but
>> at the bottom of the terminal is:
>>
>> "customer-file.txt" [converted][dos] 47830L, 9943298C
>>
>> The line count is correct.
>>
>> When I run "more" I can read the file just fine from start to finish.
>>
>> When I try to use "split", the first 15103 lines look ok, but after that
>> everything looks like garbage, as if it's binary.
>>
>> Before I can go back to our customer and ask them for a proper file, I
>> need to at least tell them what is wrong with this file (other than
>> saying something is wrong with it).
>>
>> What started this problem was when we tried to import this into our MS
>> SQL database using DTS. At line 15103 the DTS reported an error saying
>> there were extra columns in that record. When we first opened DTS it
>> reported the file is in UNICODE. How would I go about verifying that?
>>
>> So how do I determine what exactly is wrong with this? Any ideas?
>>
>> Thanks in advance,
>> James