[Tutorial] A Simple Tokenizer for Use in Parsers
Here we go:
First, create a module and add a public function named "TokenizeLine":
Code:
Public Function TokenizeLine(ByVal line As String) As String
End Function
Inside this function, add the following code:
Code:
'Counters for opened (BCO) and closed (bcc) round brackets
Dim BCO As Integer = 0
Dim bcc As Integer = 0
'Accumulates the result; tokens are separated by line breaks
Dim g As String = ""
'True while inside a double-quoted string
Dim q As Boolean = False
'True while inside a bracket group
Dim bracket As Boolean = False
For i As Integer = 1 To line.Length
    Dim c As Char = line(i - 1)
    If c = """"c Then
        q = Not q
    End If
    If q Then
        g &= c
        Continue For
    End If
    If c = "("c Then
        BCO += 1
        bracket = True
        'Only the outermost "(" starts a new token
        If BCO = 1 Then
            g &= vbCrLf
        End If
    End If
    'Ignore a stray ")" that has no matching "("
    If c = ")"c AndAlso bracket Then
        bcc += 1
        If BCO = bcc Then
            g &= c & vbCrLf
            bracket = False
            BCO = 0
            bcc = 0
            Continue For
        End If
    End If
    If bracket Then
        g &= c
        Continue For
    End If
    If Char.IsLetter(c) OrElse Char.IsDigit(c) Then
        g &= c
    ElseIf Char.IsWhiteSpace(c) Then
        g &= vbCrLf
    ElseIf "+-/*=.<>_".IndexOf(c) >= 0 Then
        g &= vbCrLf & c & vbCrLf
    ElseIf c = """"c Then
        g &= c
    End If
Next
'Split on line feeds, trim each piece and drop the empty ones
Dim tokens As New List(Of String)
For Each piece As String In g.Split(ControlChars.Lf)
    Dim token As String = piece.Trim()
    If token.Length > 0 Then tokens.Add(token)
Next
Return String.Join(vbLf, tokens)
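The 'q' flag is what keeps quoted text together as one token. To see that idea on its own, here is a minimal stand-alone sketch (not part of the tutorial's code; 'SplitOutsideQuotes' is a hypothetical helper name) that splits a line on whitespace except inside double quotes, using a StringBuilder instead of repeated string concatenation:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module QuoteAwareSplitDemo
    ' Hypothetical helper: splits on whitespace, but keeps everything between
    ' double quotes (the quotes included) together as a single token.
    Function SplitOutsideQuotes(ByVal line As String) As List(Of String)
        Dim tokens As New List(Of String)
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False

        For Each c As Char In line
            If c = """"c Then
                ' Toggle the flag on every quote and keep the quote in the token
                inQuotes = Not inQuotes
                current.Append(c)
            ElseIf Char.IsWhiteSpace(c) AndAlso Not inQuotes Then
                ' Whitespace outside quotes ends the current token
                If current.Length > 0 Then
                    tokens.Add(current.ToString())
                    current.Clear()
                End If
            Else
                current.Append(c)
            End If
        Next

        ' Flush the last token, if any
        If current.Length > 0 Then tokens.Add(current.ToString())
        Return tokens
    End Function

    Sub Main()
        For Each t As String In SplitOutsideQuotes("print ""hello world"" now")
            Console.WriteLine(t)
        Next
    End Sub
End Module
```

Running it prints three tokens: print, "hello world" and now. The space inside the quotes does not split the string, which is the same behaviour TokenizeLine gets from its 'q' check.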
Explanation:
'These two counters keep track of opened and closed round brackets.
Dim BCO As Integer = 0
Dim bcc As Integer = 0
'Accumulates the result; tokens are separated by line breaks.
Dim g As String = ""
'True while we are inside a double-quoted string.
Dim q As Boolean = False
'True while we are inside a bracket group.
Dim bracket As Boolean = False
'Loop through every character of the line.
For i As Integer = 1 To line.Length
    'Store the current character (line(i - 1), because the loop counter starts at 1).
    Dim c As Char = line(i - 1)
    'If the current character is a double quote, invert q:
    'an opening quote sets it to True, a closing quote sets it back to False.
    If c = """"c Then
        q = Not q
    End If
    'While inside a quoted string, append every character (spaces, brackets and
    'operators included) and skip the rest of the loop.
    If q Then
        g &= c
        Continue For
    End If
    'If the current character is an opening bracket:
    If c = "("c Then
        'Increment the opened-bracket counter
        BCO += 1
        'Remember that a bracket group is open
        bracket = True
        'Only the outermost "(" starts a new token, so insert a line break before it;
        'nested brackets stay inside the current group.
        If BCO = 1 Then
            g &= vbCrLf
        End If
    End If
    'If the current character is a closing bracket and a group is open
    '(a stray ")" is ignored so it cannot upset the counters):
    If c = ")"c AndAlso bracket Then
        'Increment the closed-bracket counter
        bcc += 1
        'If every opened bracket has now been closed:
        If BCO = bcc Then
            'Append the ")" and a line break to end the group's token
            g &= c & vbCrLf
            'The bracket group is closed
            bracket = False
            'Reset the two bracket counters to 0
            BCO = 0
            bcc = 0
            'Nothing else to do for this character
            Continue For
        End If
    End If
    'We want to keep everything on one line while a bracket group is open,
    If bracket Then
        'so append the current character as-is until the group is closed
        g &= c
        'and skip the rest of the loop.
        Continue For
    End If
    'Letters and digits are appended to the current token
    If Char.IsLetter(c) OrElse Char.IsDigit(c) Then
        g &= c
    'Whitespace ends the current token: insert a line break
    ElseIf Char.IsWhiteSpace(c) Then
        g &= vbCrLf
    'Each of these characters becomes a token of its own: put a line break before and after it
    ElseIf "+-/*=.<>_".IndexOf(c) >= 0 Then
        g &= vbCrLf & c & vbCrLf
    'A double quote that reaches this point is a closing quote (q was just set back to False),
    'so append it to finish the quoted string
    ElseIf c = """"c Then
        g &= c
    End If
    'Any other character is dropped
Next
'Split 'g' on its line feeds. Each piece is trimmed, which also removes the
'carriage return left over from vbCrLf, and empty pieces are dropped.
Dim tokens As New List(Of String)
For Each piece As String In g.Split(ControlChars.Lf)
    Dim token As String = piece.Trim()
    If token.Length > 0 Then tokens.Add(token)
Next
'Return the tokens, one per line, separated by line feeds (vbLf)
Return String.Join(vbLf, tokens)
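The walkthrough tracks nesting with two counters (BCO and bcc) that are compared and then reset. An alternative, shown here only as a sketch, is a single depth counter that goes up on "(" and down on ")"; a group is complete whenever the depth returns to zero. 'TopLevelGroups' is a hypothetical helper that returns only the outermost bracket groups and, for brevity, ignores quotes:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module BracketDepthDemo
    ' Hypothetical helper: returns each outermost bracket group in the line,
    ' tracking nesting with one depth counter instead of two counters.
    Function TopLevelGroups(ByVal line As String) As List(Of String)
        Dim groups As New List(Of String)
        Dim current As New StringBuilder()
        Dim depth As Integer = 0

        For Each c As Char In line
            If c = "("c Then
                depth += 1
            End If
            ' Collect every character while at least one bracket is open
            If depth > 0 Then current.Append(c)
            ' A ")" at depth 0 is stray and ignored, so depth never goes negative
            If c = ")"c AndAlso depth > 0 Then
                depth -= 1
                ' Back at depth 0: the outermost bracket has just closed
                If depth = 0 Then
                    groups.Add(current.ToString())
                    current.Clear()
                End If
            End If
        Next

        ' An unclosed group is not returned
        Return groups
    End Function

    Sub Main()
        For Each grp As String In TopLevelGroups("f(a(b)c) + g(d)")
            Console.WriteLine(grp)
        Next
    End Sub
End Module
```

For "f(a(b)c) + g(d)" it prints (a(b)c) and (d). The nested "(b)" stays inside its parent group because the depth only reaches zero at the outer ")".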
Example:
Create a form and add a RichTextBox, a Button and a TreeView, arranged however you like.
Rename the RichTextBox to 'rtb' and the TreeView to 'tv'.
Now double-click the button and add the following code to its Click event handler:
Code:
'Clear the TreeView
tv.Nodes.Clear()
'Line numbers start at 1
Dim ln As Integer = 1
'Loop through the lines of the RichTextBox
For Each x As String In rtb.Lines
    'Skip empty (or whitespace-only) lines
    If x.Trim().Length > 0 Then
        'Add a node for the current line number to 'tv'
        Dim lineNode As TreeNode = tv.Nodes.Add("Line No: " & ln)
        'Tokenize the line and split the result into individual tokens.
        'Splitting on both CR and LF characters and removing empty entries
        'works whichever line-break style separates the tokens.
        Dim tokens As String() = TokenizeLine(x).Split(New Char() {ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
        'Add each token as a child of its line's node
        For Each n As String In tokens
            lineNode.Nodes.Add(n)
        Next
    End If
    'Increment the line number (empty lines are still counted)
    ln += 1
Next
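If you want to try the tokenizer without building a form, the same per-line grouping can be sketched as a console program. This is only an illustration: 'TokensByLine' is a hypothetical helper, and the space-splitting stand-in tokenizer in Main is an assumption that you would replace with TokenizeLine:

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module LineTokensDemo
    ' Hypothetical helper: maps each non-empty line's number to its tokens.
    ' "tokenize" is any function that returns tokens separated by line breaks.
    Function TokensByLine(ByVal lines As String(), ByVal tokenize As Func(Of String, String)) As SortedDictionary(Of Integer, String())
        Dim result As New SortedDictionary(Of Integer, String())
        For i As Integer = 0 To lines.Length - 1
            ' Skip empty lines, but still count them (like 'ln' in the example)
            If lines(i).Trim().Length = 0 Then Continue For
            ' Split on CR and/or LF characters and drop empty entries
            result(i + 1) = tokenize(lines(i)).Split(New Char() {ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
        Next
        Return result
    End Function

    Sub Main()
        ' Stand-in tokenizer (assumption): one token per space-separated word
        Dim simple As Func(Of String, String) = Function(s As String) s.Replace(" ", vbCrLf)
        Dim lines() As String = {"x = 5", "", "print x"}
        For Each kv As KeyValuePair(Of Integer, String()) In TokensByLine(lines, simple)
            Console.WriteLine("Line No: " & kv.Key.ToString() & " -> " & String.Join(" | ", kv.Value))
        Next
    End Sub
End Module
```

With the stand-in tokenizer this prints "Line No: 1 -> x | = | 5" and "Line No: 3 -> print | x"; line 2 is empty, so it is skipped but still counted, just as in the TreeView version.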
Note: the attached project was created in Visual Basic 2010 Beta 2, so older versions of Visual Studio may not be able to open it.
Required Virus Scans
VirusTotal: every scanner reports it clean except Symantec, which gives its usual false positive.
Virus Scan.org: Tokenizer_********* (MD5: cbf96b6ecdf781a0f914e1702d5b90a6): no malware found.