This is for work, so I've changed the extensions and files to protect the innocent.
I am parsing text from a description, looking for file names of the form word_here. A name can have as many underscores as needed, plus an optional extension. I came up with this regular expression, which works:
Test 1
import re
text = 'Some text here: * my_file_stuff.mat * other_file * third_file *'
FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)
print(res)
Output 1
python test_regex.py
['my_file_stuff.mat', 'other_file', 'third_file']
The problem is that it doesn't work for input like this:
Test 2
text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'
FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)
print(res)
Output 2
python test_regex.py
['my_file|a', 'nother_file.mat|', 'O_HERES_ONE|', '_O_HERES_ANOTHER|']
I modified my regex to include the vertical bar:
Test 3
text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'
FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|plot]*)\|'
res = re.findall(FILE_REG_EX, text)
print(res)
Output 3
python test_regex.py
['my_file', 'another_file.mat', 'O_HERES_ONE', 'O_HERES_ANOTHER']
That works for the second string, but no longer for the first. Part of the issue is that I will be searching descriptions to find out where a file is located, and I have no way of knowing how the file names will be formatted, only that they will look something like MY_FILE_HERE01.py, with or without the extension.
I've also tried using the not symbol to exclude the vertical bars before and after the name, but that comes up empty for both strings.
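One possible direction, offered only as a sketch: in the patterns above, [py|mat] is a character class, so it matches any run of the characters p, y, |, m, a and t rather than the words "py" or "mat", which is why the vertical bar ends up inside the matches in Output 2. Using a group for the extension and describing the name itself, instead of its delimiters, seems to handle both sample strings. The names FILE_NAME_RE and find_file_names are hypothetical, and the extension list (py, mat) is an assumption taken from the examples.

```python
import re

# Hypothetical pattern (not from the original post). Assumptions: a name starts
# with a letter or digit, has at least one underscore-separated part, and the
# only extensions of interest are .py and .mat.
FILE_NAME_RE = re.compile(
    r'[A-Za-z0-9]+(?:_+[A-Za-z0-9]+)+'  # word, then one or more _word parts
    r'(?:\.(?:py|mat)\b)?'              # optional extension; a group, not a character class
)


def find_file_names(text):
    """Return every file-name-like token found in text."""
    return FILE_NAME_RE.findall(text)


print(find_file_names('Some text here: * my_file_stuff.mat * other_file * third_file *'))
print(find_file_names('|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'))
```

Because the pattern never mentions the surrounding characters, it should not matter whether names are separated by asterisks, bars or spaces. Note that it drops the leading underscore of _O_HERES_ANOTHER, as in Output 3; if leading underscores need to be kept, the pattern could start with _* instead.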