ITworld.com
The /c Regex Modifier
PERL --- 10/11/2001

Andrew Johnson

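To understand the /c regex modifier you first need to know how the /g modifier and the \G anchor behave. The /g modifier, as you probably already know, means 'keep applying the regex until it fails or we hit the end of the string'. In scalar context, such as the condition of a while loop, each evaluation of a /g match resumes searching where the previous match left off: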

$_ = '123456abc789';
my $pattern = '\d\d\d';
while ( m/($pattern)/g ) {
    print "$1\n";
}

The above will match each sequence of 3 digits and execute the loop body once per match. Each string has a positional marker associated with it that records where the last /g match ended; this is how the regex engine knows where to continue searching in the string, and you can read or set the marker directly with the pos() function. When the pattern can no longer be found, the match operator returns false (ending the while loop in this case) and the positional marker is reset: pos() returns undef, and the next /g match starts again at the beginning of the string.
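
To make the positional marker concrete, here is a small sketch that records pos() after each match of the first snippet and checks it once more after the loop ends. The @positions and $after variables are illustrative additions, not part of the original example:

```perl
use strict;
use warnings;

$_ = '123456abc789';
my @positions;

# Each successful /g match leaves the marker just past the matched text.
while ( m/(\d\d\d)/g ) {
    push @positions, pos($_);
    print "matched $1, marker now at ", pos($_), "\n";
}

# The final, failed match reset the marker: pos() is undef again,
# so a new /g match on $_ would start from the beginning.
my $after = pos($_);
print defined $after ? "marker at $after\n" : "marker reset (undef)\n";
```

The recorded positions are 3, 6 and 12. The jump from 6 to 12 is the third match skipping over 'abc' to find '789'.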

One thing to notice is that the above snippet will skip over the 'abc' part of the string: on the third attempt to match, we start trying at position 6 (right before the 'a'), but nothing forces the match to begin at that point, so the engine scans ahead and finds '789'. To require each match to begin exactly where the previous one left off, we anchor the pattern with \G:

$_ = '123456abc789';
my $pattern = '\d\d\d';
while ( m/\G($pattern)/g ) {
    print "$1\n";
}

In this case, each occurrence of $pattern *must* be found immediately following the positional marker (either the beginning of the string or wherever the last successful match left off). Thus, this snippet finds and prints only '123' and '456'; the third attempt, at the 'a', fails and ends the loop.
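
Because pos() can also be assigned to, an anchored scan can be started at any offset you choose. A minimal sketch; the @from_start and @from_nine arrays and the starting offset of 9 are illustrative choices, not from the article:

```perl
use strict;
use warnings;

$_ = '123456abc789';

# Anchored scan from the start: stops at 'abc', as described above.
my @from_start;
while ( m/\G(\d\d\d)/g ) { push @from_start, $1; }

# The failed match reset the marker, so set it by hand to resume
# the anchored scan at offset 9 (just before '789').
pos($_) = 9;
my @from_nine;
while ( m/\G(\d\d\d)/g ) { push @from_nine, $1; }

print "from start: @from_start\n";
print "from nine:  @from_nine\n";
```

The first loop collects '123' and '456'; the second, starting at the assigned offset, collects only '789'.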

What if we wanted to match different patterns while stepping through the string (say, sequences of three digits or three lowercase letters)? We could set up an alternation pattern and then test the captured results:

$_ = '123456abc789';
my $pattern = '\d\d\d|[a-z]{3}';
while ( m/\G($pattern)/g ) {
    my $result = $1;
    if ($result =~ /\d/) {
        print "We got 3 digits\n";
    }
    else {
        print "We got 3 letters\n";
    }
}

That's not horrible, though we had to test for digits twice (once in the original pattern and once in the if test). This would get more cumbersome with more choices to distinguish, and possibly slower, since the engine may try each alternative in turn before finding the one that matches.

The /c modifier lets a /g match fail without resetting the positional marker, so we can try another match from the same position:

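$_ = '123XYZ456abc789';
while (1) {
    # print returns true, so "print ... and next" prints, then moves on.
    # With /gc, a failed match leaves pos() where it was.
    print "Got digits ($1)\n" and next if m/\G(\d\d\d)/gc;
    print "Got UCase ($1)\n"  and next if m/\G([A-Z]{3})/gc;
    print "Got LCase ($1)\n"  and next if m/\G([a-z]{3})/gc;
    print "End of Parsing\n"  and last if m/\G$/gc;
    # No pattern matched at the current position.
    print "Parse Error at position: ", pos(), "\n" and last;
}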
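
The difference /c makes shows up directly in pos(). This sketch (the variable names are illustrative) performs the same failing match twice, once with /g alone and once with /gc, and inspects the marker afterwards:

```perl
use strict;
use warnings;

my $plain  = '123XYZ';
my $with_c = '123XYZ';

# /g alone: '123' matches, then digits are not found at offset 3,
# and that failure resets the marker.
$plain =~ m/\G(\d\d\d)/g;
$plain =~ m/\G(\d\d\d)/g;
my $plain_pos = pos($plain);

# /gc: same two matches, but the failure leaves the marker at offset 3.
$with_c =~ m/\G(\d\d\d)/gc;
$with_c =~ m/\G(\d\d\d)/gc;
my $c_pos = pos($with_c);

# A different pattern can now pick up exactly where the digits ended.
my $upper = $with_c =~ m/\G([A-Z]{3})/gc ? $1 : undef;

print 'without /c: ', ( defined $plain_pos ? $plain_pos : 'undef' ), "\n";
print "with /c:    $c_pos, then matched $upper\n";
```

Without /c the marker ends up undef; with /c it stays at 3, and the uppercase pattern then matches 'XYZ' and advances it to 6.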

Now we never skip over any data that we haven't accounted for, yet when any regex fails we simply try the next regex from the same position. Our parse of the string fails only if all of the regexen fail and we reach the last line of the loop. The above parses the whole string successfully, but if you try $_ = '123ABC456ab789'; you'll get a parse error message at position 9. Without the /c modifier this approach would break: as soon as the first regex failed, it would reset the positional marker to the beginning of the string, so the next regex would not start where we wanted.
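
The same loop can be packaged as a small tokenizer. The sketch below is a hypothetical helper (tokenize and its DIGITS/UCASE/LCASE labels are not from the article) that returns the tokens instead of printing them and dies with the failing offset. It uses \z rather than $ for the end test, because $ can also match just before a trailing newline:

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into [type, text] tokens using /gc.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    while (1) {
        if    ( $str =~ m/\G(\d\d\d)/gc )   { push @tokens, [ DIGITS => $1 ] }
        elsif ( $str =~ m/\G([A-Z]{3})/gc ) { push @tokens, [ UCASE  => $1 ] }
        elsif ( $str =~ m/\G([a-z]{3})/gc ) { push @tokens, [ LCASE  => $1 ] }
        elsif ( $str =~ m/\G\z/gc )         { last }    # clean end of input
        else {
            # pos() is undef if nothing has matched yet, i.e. offset 0.
            die "Parse error at position " . ( pos($str) // 0 ) . "\n";
        }
    }
    return @tokens;
}

my @tokens = tokenize('123XYZ456abc789');
print join( ' ', map { "$_->[0]:$_->[1]" } @tokens ), "\n";
# DIGITS:123 UCASE:XYZ DIGITS:456 LCASE:abc DIGITS:789

my $ok = eval { tokenize('123ABC456ab789'); 1 };
print $@ unless $ok;    # Parse error at position 9
```

Returning a list of tokens instead of printing keeps the parsing separate from whatever the caller does with the pieces, and dying with the offset lets the caller decide how to report the error.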

 

Andrew Johnson works as a programmer/consultant and is the author of Elements of Programming with Perl from Manning Publications.



Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.