Subscribe: Comments on My SysAd Blog -- Unix: Split XML Records with Perl Script
http://www.mysysad.com/feeds/3871066359170940878/comments/default
Added By: Feedage Forager Feedage Grade B rated
Language: English
Tags:
aug myxml  aug  awk  count  file  filename  myxml xml  myxml  open  part root  part  root aug  root  splitter  xml part  xml 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: Comments on My SysAd Blog -- Unix: Split XML Records with Perl Script

Comments on My SysAd Blog -- Unix: Split XML Records with Perl Script





Updated: 2017-08-10T07:44:05.379-07:00

 



If you use lexical filehandles you not only circum...

2008-10-05T19:19:00.000-07:00

If you use lexical filehandles you not only circumvent Bloggers mangling, but also make your live easier.

E.g.:

open my $fh, '<', $filename or die "Can't open '$filename' for reading: $!";
while ( my $line = <$fh> ) {
...
}
close $fh or die "Can't close '$filename': $!";

Notice that I've added some more good Perl programming advice (like correctly reporting errors, using 3 parameter version of open, etc.).

Also:
$file = @ARGV[ 0 ];

is technically not correct, you want:

$file = $ARGV[ 0 ];

or

$file = shift;



Starfrit,I just ran it. It worked fine. Maybe Blog...

2008-08-16T06:29:00.000-07:00

Starfrit,

I just ran it. It worked fine. Maybe Blogger parsed a tag. Did you do a copy and paste? Note below, I added an underscore, "_" in the while statement, <_FH>, because Blogger was complaining about during the submission of this comment. Remove it in your real script.

# more splitter.pl
#!/usr/bin/perl
$file = @ARGV[0];

open(FH, "< $file") or die "Unable to open file\n";

$count = 0;
$files_counter=1;
$max_records = 300;

while ( <_FH> )
{
if($count == 0)
{
$filename = $file . "_part_" . $files_counter;
open(FH2, "> $filename") or die "Unable to open file: $filename\n";
$count++;
}

if (grep /<\/item>/, $_ )
{
$count++;
}

print FH2 $_;

if ($count == $max_records + 1)
{
$count = 0;
$files_counter++;
close(FH2);
}
}

Now run it.
#./splitter.pl myxml.xml

Here's the output...
# ls -l myxml.xml*
-rwxrwxrwx 1 root other 25372146 Sep 19 2007 myxml.xml
-rw-r--r-- 1 root other 308835 Aug 16 22:05 myxml.xml_part_1
-rw-r--r-- 1 root other 282294 Aug 16 22:05 myxml.xml_part_10
-rw-r--r-- 1 root other 295288 Aug 16 22:05 myxml.xml_part_11
-rw-r--r-- 1 root other 298320 Aug 16 22:05 myxml.xml_part_12
-rw-r--r-- 1 root other 303570 Aug 16 22:05 myxml.xml_part_13
-rw-r--r-- 1 root other 297563 Aug 16 22:05 myxml.xml_part_14
-rw-r--r-- 1 root other 304841 Aug 16 22:05 myxml.xml_part_15
-rw-r--r-- 1 root other 298786 Aug 16 22:05 myxml.xml_part_16
-rw-r--r-- 1 root other 293452 Aug 16 22:05 myxml.xml_part_17
-rw-r--r-- 1 root other 293747 Aug 16 22:05 myxml.xml_part_18
-rw-r--r-- 1 root other 309752 Aug 16 22:05 myxml.xml_part_19
-rw-r--r-- 1 root other 284603 Aug 16 22:05 myxml.xml_part_2
-rw-r--r-- 1 root other 304117 Aug 16 22:05 myxml.xml_part_20
-rw-r--r-- 1 root other 310830 Aug 16 22:05 myxml.xml_part_21
...



I tried it because I'm expreriencing the same prob...

2008-08-14T16:32:00.000-07:00

I tried it because I'm expreriencing the same problem, but the script doesn't work.

I'm having a Syntax error at split.pl line 29, near : (:wq!)
Execution aborted because of compilation errors.

What do I have to do to correct the case ?



Erek,IMHO, they had other problems. I created anot...

2008-01-18T09:19:00.000-08:00

Erek,

IMHO, they had other problems. I created another tool and they wouldn't let me run it because special requirements were needed.

At first, I tried to load the larger XML file but it was killing their CPU. Then I used this splitter to better manage it. For some reason, my web host was even sensitive to these smaller files after several uploads. I guess you get what you paid for.

ux-admin is right -- I have bigger issues. Allowing someone else to run my site is crap but my choices are limited right now. I'm looking for an alternative solution.

Interestingly enough, the servers I work with have several web servers and are very powerful. The web servers are just minor points.



I'm assuming a simple file upload isn't causing th...

2008-01-18T08:49:00.000-08:00

I'm assuming a simple file upload isn't causing their CPU to go crazy, or they've got other problems, so I'm assuming that you're parsing the XML on the server side as it's uploaded.

Have you considered nicing the receiving process down, or adding some sleep statements between iterations that'll make your % CPU usage seem lower?

Cheers,
Erek



ux-admin,From previous posts, I know you're a big ...

2008-01-13T04:25:00.000-08:00

ux-admin,

From previous posts, I know you're a big AWK supporter.

I thought about that scenario many times but living in South Korea as a US expat legally limits my enterprise options on their soil. If I lived in the US, it would be a completely different story.

Believe me, I hate depending on someone else and almost to a fault. Obviously, I'm still looking for a solution to this problem...

Ideas?



Right. AWK is that perfect tool for the job in thi...

2008-01-13T02:19:00.000-08:00

Right. AWK is that perfect tool for the job in this particular case.

Anyways, if the web hoster is giving you crap, why just not roll out your own infrastructure?

All you need is a basement with electricity, a used rack, and some sniping on ebay, or, if you're good with logistics, brand new servers at fraction of price of the those cheap DELLs (some assembly required).

I'm kind of surprised. With your expertise, I didn't think you'd stand for depending on someone else for hosting.



A very good point jean-marc liotier.I wrote an AWK...

2008-01-12T09:41:00.000-08:00

A very good point jean-marc liotier.

I wrote an AWK-based parser script to get it into the XML format, which I shared a few months ago.

Anyways, I'll make the changes to better reflect its functionality. For our small project, it was a steady stream.



Handy, but it is actually more a XML splitter than...

2008-01-12T08:55:00.000-08:00

Handy, but it is actually more a XML splitter than a XML parser. In addition it assumes the flat structure of a steady stream of items.