RegEx - Using regular expressions on result data tables
In PROMPT it is now possible to apply regular expressions to the columns of a result data table. Regular expressions can be used for simple find & replace, but they also allow you to reformat the data shown in the spreadsheet viewer in a multitude of ways. To open the result data viewer, just double-click on any result line in the main window. (If no results have been produced so far, load the demo workspace from the Help menu.)
Here we present a simple application of PROMPT's regular expression functionality,
called RegEx, to the result of a mapping between two protein sets.
You may notice that the identifiers in the SUBJECT_ID column contain not only the GenBank identifier but also additional information. Here we want to adjust the format of the identifiers in the SUBJECT_ID column so that they match the simple GI_NUMBER format used in the QUERY_ID column.
After you have opened the spreadsheet viewer for the mapping result, choose RegEx ... Apply regular expression from the top menu bar as shown below. The RegEx dialog will then appear.
In the first step you can choose the column you want to change with your regular expression from all the columns that are in the underlying result data table.
To continue, click the Next button.
Step 2: Enter and preview the regular expression
In the second step of the RegEx dialog, you can enter the pattern that should match the column and choose whether a standard pattern match should be performed or whether the replace with option is more applicable. The pattern has to be in Java regular expression format. An overview is presented in the Pattern reference of this document.
You want to change the GenBank identifiers in the SUBJECT_ID column of the given mapping result (image above) so that they have the same format as the ones in the QUERY_ID column. This is done in two steps. First, you have to cut out the important part via a standard pattern match. Use the following pattern (as seen in the screenshot): 'gi.\d+'. This matches the letters 'gi', followed by any single character (in this case '|') and one or more digits (\d for digits, + for one or more times; * would be zero or more times). After clicking the Preview button, you can observe the effect of your pattern on the data in the Preview section, where the first five rows of the result data table are shown. If you don't like the effect, you can undo it via the Reset button.
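The effect of this first step can be reproduced outside PROMPT with plain java.util.regex. The following is only a minimal sketch: the class and method names are made up for illustration, the sample cell value is hypothetical apart from the 'gi|13364696' part, and returning the cell unchanged when nothing matches is an assumption, not documented PROMPT behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CropExample {
    // Keeps only the first substring that matches the pattern.
    // Assumption: a cell without any match is returned unchanged.
    static String crop(String regex, String cell) {
        Matcher m = Pattern.compile(regex).matcher(cell);
        return m.find() ? m.group() : cell;
    }

    public static void main(String[] args) {
        // Hypothetical SUBJECT_ID value with extra information after the GI number.
        String cell = "gi|13364696|ref|EXAMPLE_1|";

        // 'gi', any single character, then one or more digits.
        System.out.println(crop("gi.\\d+", cell)); // gi|13364696

        // Without '+', only a single digit is kept.
        System.out.println(crop("gi.\\d", cell));  // gi|1
    }
}
```

Note that inside a Java string literal the backslash has to be doubled ("\\d"); in the RegEx dialog the pattern is entered as plain text with a single backslash.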
In our example, the identifiers of the SUBJECT_ID column should now look like those in the given screenshot (e.g. gi|13364696). Now you can apply a second step (Attention: At the moment, you have to click Finish after every change to transfer it to the underlying result data table.) and use the replace with option to turn '|' into '_' and 'gi' into uppercase. But be careful: you have to use 'gi\|' as the pattern and 'GI_' as the replacement string, because '|' is a special operator (more details in the Pattern reference). After this second step, the SUBJECT_ID column should have the same format as the QUERY_ID column (e.g. GI_13364696). Now you can press the Finish button. To keep the changes, you have to save them in the result data table window (e.g. File...SaveAsExcel).
Attention: You can only Finish after having previewed the changes at least once. The underlying result data table is not changed until you click Finish.
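Why the '|' has to be escaped can be seen in a small Java sketch. The class and method names are hypothetical, and String.replaceAll is used here only as a stand-in for whatever PROMPT does internally.

```java
class ReplaceExample {
    // Replaces every occurrence of the pattern in the cell with the replacement string.
    static String replace(String regex, String replacement, String cell) {
        return cell.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String cell = "gi|13364696";

        // Escaped: '\|' stands for a literal pipe character.
        System.out.println(replace("gi\\|", "GI_", cell)); // GI_13364696

        // Unescaped: '|' means "gi OR the empty string", so the empty string
        // matches at many positions and the result is not the intended one.
        System.out.println(replace("gi|", "GI_", cell));
    }
}
```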
Modes of the pattern matching method:
Standard pattern match: If you selected the standard pattern match option, everything except the string matching your pattern will be cropped from the cells of the selected column.
Replace with: If you selected the replace with option, you can specify a string that will replace every occurrence of the pattern in the cells of the selected column.
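As a rough illustration of how the two modes act on a whole column, the sketch below applies them one after the other to a list of cell values. All names are hypothetical, the cell values are made up around the 'gi|13364696' example, and leaving non-matching cells unchanged in the standard pattern match is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ColumnModes {
    // Standard pattern match: keep only the matching part of each cell.
    static List<String> standardMatch(List<String> column, String regex) {
        Pattern p = Pattern.compile(regex);
        return column.stream().map(cell -> {
            Matcher m = p.matcher(cell);
            return m.find() ? m.group() : cell; // assumption: no match leaves the cell as-is
        }).collect(Collectors.toList());
    }

    // Replace with: substitute every occurrence of the pattern in each cell.
    static List<String> replaceWith(List<String> column, String regex, String replacement) {
        Pattern p = Pattern.compile(regex);
        return column.stream()
                .map(cell -> p.matcher(cell).replaceAll(replacement))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical SUBJECT_ID column.
        List<String> subjectIds = Arrays.asList(
                "gi|13364696|ref|EXAMPLE_1|",
                "gi|15800001|ref|EXAMPLE_2|");

        List<String> cropped = standardMatch(subjectIds, "gi.\\d+");
        List<String> renamed = replaceWith(cropped, "gi\\|", "GI_");
        System.out.println(renamed); // [GI_13364696, GI_15800001]
    }
}
```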
Details about the pattern syntax for regular expressions can be found in the Pattern reference.