Beyond Regexes: Text Parsing with Perl Modules

1-day course
by Visiting Instructor Dr. Damian Conway

Next Public Offering: TBA; Help Us Schedule!

Description

This 1-day seminar, written and presented by Dr. Damian Conway, will show you how to use a range of standard Perl features and several CPAN modules (including Conway's Parse::RecDescent and Text::Balanced) to decipher and process a variety of complex data and command formats. It's a practical introduction to the techniques of grammar-based recursive-descent parsing.

You might like to read comments from an attendee at The Perl Conference 4.0 presentation of this seminar.

Attendees will learn:

  • how to design and build parsers to process Apache configuration files and log data,
  • how to process structured expressions (e.g. search engine queries),
  • how to balance nested brackets and match delimiters without a regular expression,
  • how to fold, spindle and mutilate the comments in a C program,
  • how to dissect C++ type declarations with a self-adapting parser,
  • how to allow embedded Perl code in your own data format or command language,
  • how to deal with ambiguous data by parsing it in multiple universes simultaneously,
  • how to get Parse::RecDescent to write most of your grammar for you,
  • how to parse modular text (e.g. source with #includes in it),
  • how to pre-filter your source code by tricking Perl into (nearly) parsing Perl,
  • how to debug Parse::RecDescent parsers efficiently and how to improve the efficiency of your Parse::RecDescent grammars,
  • how to convert natural language queries into SQL,
  • how to pull pesky unmatched <P> tags from HTML,
  • how to write a program that does stand-up comedy! 8-}
See below for the full Seminar Outline.


NOTE: The first part of this seminar was presented in 1999 as Tutorial P22 at The Perl Conference 3.0. Both parts were presented in tutorial sessions at The Perl Conference 4.0 in Monterey in July 2000.

Who Should Attend

The techniques presented in this course have general applicability, and will be useful to anyone who needs to process structured input of any kind.

This seminar is designed for those who are familiar with basic Perl programming and have experience with regular expressions, subroutines, hashes and arrays, references, data structures built on hashes and arrays, and the methods of object-oriented Perl modules. Most of these prerequisites can be satisfied by attending the Consultix "Perl Programming, plus Modules" course (or having equivalent experience), and studying the following resources:

Author & Instructor

Dr. Damian Conway holds a Ph.D. in Computer Science and is a Senior Research Fellow with the School of Computer Science and Software Engineering at Monash University, Melbourne, Australia.

He is the author of numerous well-known Perl modules including: Class::Contract, Text::Autoformat, Parse::RecDescent, Text::Balanced, Lingua::EN::Inflect, Class::Multimethods, Switch, Quantum::Superpositions, NEXT, Filter::Simple, Attribute::Handlers, Inline::Files, and Coy (all available from your local CPAN mirror).

  Damian was the winner of the 1998, 1999, and 2000 Larry Wall Award competitions for the most practical Perl utility program. He is a member of the technical committee for The Perl Conference, a former columnist for The Perl Journal, author of the book Object Oriented Perl, a member of the Perl 6 design team, and a popular public speaker.

In 2001, Damian received the first YAS Perl Development Grant and spent the year working on projects for the betterment of Perl. He is continuing this work in 2002 under a similar grant from The Perl Foundation.
 

Consultix instructors, including Visiting Instructor Dr. Damian Conway, are renowned for their ability to communicate complex concepts in simple terms and to make the study of dry technical material enjoyable. We pride ourselves on providing training experiences that our customers rave about!
 

Seminar Outline

    Part I
    • A brief history of parsing
      • grammars, rules, recursive descent, etc.
    • Implementing parsers
      • top-down vs bottom-up approaches
    • Useful tools
      • Text::Balanced, Parse::Yapp, perl-byacc, Parse::RecDescent
    • Simple parsing
      • Parsing delimited text, parsing Perl subsets
    • Parsing data
      • Parsing Apache log files
      • optional subrules, list parsing
      • run-time parser generation
    • Parsing input
      • The Text::Query modules
      • OO parsing
      • operator precedence, lists, look-ahead, rejections, etc.
    • Parsing code
      • parsing C and C++
      • stateful grammars
      • porting yacc grammars (including left-recursion)
      • self-extending parsers, committing rules, deferred actions
      • grammar precompilation
    • Parsing natural language
      • generating SQL queries from natural language input
      • synthetic stand-up via reciprocal parsers
    Part II
    • Miscellaneous advanced features of Text::Balanced
      • precompiling delimiter extractions
      • extracting tagged text
      • extracting Perl variables
      • extracting mixed components
    • Miscellaneous extra features of Parse::RecDescent
      • Named items (the %item hash)
      • Debugging grammars: <trace>, <warn>, <hint>, and <nocheck>
      • Context information: $thisline, $lastoffset, @itempos, etc.
      • Extreme prejudice: the <fail> directive
    • Non-deterministic parsing
      • tracking "goodness-of-match"
      • the <score> and <autoscore> directives
    • Pre-tokenization
      • the <token> directive
      • token-based parsing
    • Automatic grammar generation
      • autoactions
      • autostubbing
      • autotrees
      • the <perl_quotelike>, <perl_codeblock>, and <perl_variable> directives
    • Generic rules
      • the <matchrule> directive
      • subrule arguments: @arg and %arg
    • Handling distributed text
      • processing file inclusions recursively
      • processing file inclusions by input modification
      • other uses of input modification
    • Semi-grammatical parsing
      • when Parse::RecDescent is overkill and regexes don't appeal
      • CSV revisited, text interpolation, simple command interfaces
    • Self-modification
      • Run-time parser generation and self-extending parsers revisited
      • A self-modifying Apache config/log file parser
      • (Nearly) parsing Perl
        • parsing with Text::Balanced on Occam's Razor
        • source code filtering
      • Metagrammars
        • building a grammar for parsing grammars
        • beat poetry and postmodern literature

Other Courses


We have courses on many other Perl and UNIX/Linux topics!
 


© Copyright 1994-2005   Pacific Software Gurus, Inc.   All Rights Reserved.
