Ruby strtok()
While developing a CSS analyzer, I needed a method for splitting CSS files into meaningful chunks.
These 'meaningful' chunks are the sequences of characters that have semantic value in the CSS specification, such as keywords (em, border), selectors (div, p + p), and property values (bold, #773e1a).
C provides a function specifically crafted for this occasion: strtok().
strtok() is defined in the ISO C standard and is available in many C-based languages (C++, PHP, and MATLAB).
strtok() is a function that splits a string into tokens based on a set of delimiters.
A string passed into strtok() is divided into a sequence of tokens, returned one per call, each containing the characters between runs of one or more delimiters.
Here is a simplified example (in C) of the tokenizer to demonstrate strtok().
#include <stdio.h>
#include <string.h>

#define DELIM "{}:; "

int main(void) {
    /* strtok() writes into its input, so use a modifiable array, not a string literal. */
    char str_to_tokenize[] = "p { font-size: 1.4em; font-weight: bold }";
    char *str_ptr;

    fprintf(stdout, "Split \"%s\" into tokens:\n", str_to_tokenize);

    /* The first call takes the string; later calls pass NULL to continue from the last token. */
    str_ptr = strtok(str_to_tokenize, DELIM);
    while (str_ptr != NULL) {
        fprintf(stdout, "%s\n", str_ptr);
        str_ptr = strtok(NULL, DELIM);
    }

    return 0;
}
Ruby does not provide an interface to strtok().
However, the String#split method can perform the same task with more flexibility.
split takes the delimiter as its first argument, which can be either a string or a regular expression; an optional second argument limits the number of pieces returned.
Below is the Ruby version of the same example.
DELIM = /[{}:;]+/
str_to_tokenize = "p { font-size: 1.4em; font-weight: bold }"
puts str_to_tokenize.split(DELIM)
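One difference from the C version is worth seeing in the output: the C DELIM includes a space, while the Ruby pattern above does not, so the Ruby tokens keep their surrounding whitespace. The following is a minimal sketch of two ways to get the same tokens as the C program, assuming that is the goal. The variable names are illustrative only.

```ruby
str_to_tokenize = "p { font-size: 1.4em; font-weight: bold }"

# Pattern from the example above: a space is not a delimiter,
# so whitespace stays attached to the tokens.
raw_tokens = str_to_tokenize.split(/[{}:;]+/)
p raw_tokens
# => ["p ", " font-size", " 1.4em", " font-weight", " bold "]

# Option 1: add a space to the character class, mirroring the C DELIM.
c_like_tokens = str_to_tokenize.split(/[{}:; ]+/)
p c_like_tokens
# => ["p", "font-size", "1.4em", "font-weight", "bold"]

# Option 2: keep the pattern and strip each token afterwards.
# This preserves spaces inside a token, such as the selector "p + p".
stripped_tokens = raw_tokens.map(&:strip).reject(&:empty?)
p stripped_tokens
# => ["p", "font-size", "1.4em", "font-weight", "bold"]
```

Option 2 is probably the better fit for CSS, since a selector like p + p would otherwise be split into three tokens.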
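Since we are using a regular expression to define the delimiter, we can include additional functionality.
For example, to simplify harmony's parser, I opted to collect the delimiter that was found between each token.
This tweak required only two additional characters, the parentheses of a capture group: DELIM = /([{}:;]+)/.
When the pattern contains a capture group, split includes the captured delimiters in the array it returns.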
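To make the effect of the capture group concrete, here is a small sketch of what split returns with the parenthesized pattern. The pairing step at the end is only one assumed way to consume the result, not necessarily how harmony's parser does it.

```ruby
DELIM = /([{}:;]+)/
str_to_tokenize = "p { font-size: 1.4em; font-weight: bold }"

# With a capture group, the matched delimiters appear between the tokens.
pieces = str_to_tokenize.split(DELIM)
p pieces
# => ["p ", "{", " font-size", ":", " 1.4em", ";", " font-weight", ":", " bold ", "}"]

# Illustrative follow-up: pair each token with the delimiter that ended it.
# The delimiter is nil if the input ends without one.
pairs = pieces.each_slice(2).map { |token, delim| [token.strip, delim] }
p pairs
# => [["p", "{"], ["font-size", ":"], ["1.4em", ";"], ["font-weight", ":"], ["bold", "}"]]
```

Knowing which delimiter followed a token lets a parser tell a selector (followed by "{") from a property name (followed by ":") without rescanning the input.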
By using regular expressions with String#split, we can duplicate the behavior of strtok().
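A few details differ between the two. strtok() never returns an empty token and skips delimiters at the start of the string, whereas split keeps a leading empty string when the input begins with a delimiter. strtok() also modifies its input and hands back one token per call, while split leaves the string untouched and returns everything at once. The sketch below wraps split in a helper that matches the first behavior. The name strtok_like is hypothetical and not part of Ruby, and the helper assumes a non-empty delimiter string.

```ruby
# Hypothetical helper (not part of Ruby's standard library): returns all
# tokens at once, dropping empty ones the way strtok() never yields them.
# Assumes delims is a non-empty string of delimiter characters.
def strtok_like(str, delims)
  # Build a character class from the delimiter characters, e.g. "{}:; ".
  pattern = /[#{Regexp.escape(delims)}]+/
  str.split(pattern).reject(&:empty?)
end

css = "{ p { color: red } }"

# Plain split keeps a leading empty string when the input starts with a delimiter.
p css.split(/[{}:; ]+/)
# => ["", "p", "color", "red"]

p strtok_like(css, "{}:; ")
# => ["p", "color", "red"]
```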