Regular Expressions

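A regular expression is a notation for describing the pattern of text you want to match within a string. See a regular expression syntax reference for the basic syntax.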

Using Boost::regex

Matching: regex_match

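regex_match checks whether the entire string matches the regular expression. If any part of the string falls outside the pattern, the match fails.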

match.cpp
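```cpp
#include <boost/regex.hpp>
#include <iostream>
#include <string>

// Names are fully qualified: "using namespace boost" together with
// "using namespace std" can make regex, cmatch, regex_match, etc. ambiguous.

void test()
{
    // Literal dots are escaped (\\.) so they do not match arbitrary characters.
    boost::regex re("(https?://www\\.ttufo\\.com/.+/.+/.+)(_\\d+)(\\.html?)");
    std::string target("http://www.ttufo.com/ufo/201705/154053_3.html");

    // NOTE: cmatch is for const char* input, smatch is for std::string input.
    boost::cmatch what;
    // boost::smatch what;

    // if (boost::regex_match(target, what, re)) {   // smatch variant
    if (boost::regex_match(target.c_str(), what, re)) {
        std::cout << "match " << what.size() << std::endl;

        // what[0] is the whole match; what[1], what[2], ... are the capture groups.
        for (std::size_t i = 0; i < what.size(); i++) {
            std::cout << "what[" << i << "]: " << what[i] << std::endl;
        }
    } else {
        std::cout << "not match" << std::endl;
    }
}

int main()
{
    test();
    return 0;
}
```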

Searching: regex_search

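regex_search scans the string for the first substring that matches the regular expression and stores it, together with its capture groups, in the match_results object. Unlike regex_match, the rest of the string may contain anything.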

Search.cpp
```cpp
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    // std::string str = "##0101QN=20160801085857223;ST=32;CN=1062;PW=100000;MN=010000A8900016F000169DC0;Flag=5;CP=&&RtdInterval=30&&1C80\r\n";
    // std::string str = "mm789.232";
    // boost::regex expression("([^0-9]+)([0-9]+)\\.([0-9]+)");

    std::string str = "LB33";
    // Group 1: a run of non-digits; group 2: the digits that follow it.
    boost::regex expression("([^0-9]+)([0-9]+)");

    // NOTE: cmatch is for const char* input, smatch is for std::string input.
    boost::smatch what;
    // boost::cmatch what;

    // if (boost::regex_search(str.c_str(), what, expression)) {   // cmatch variant
    if (boost::regex_search(str, what, expression)) {
        // Prints 3, then what[0]:LB33, what[1]:LB, what[2]:33
        std::cout << what.size() << std::endl;
        for (std::size_t i = 0; i < what.size(); ++i) {
            // Skip optional groups that did not take part in the match.
            if (what[i].matched)
                std::cout << "what[" << i << "]:" << what[i] << std::endl;
        }
    }

    // Alternative: to find a plain substring without a regex, use std::string::find.
    // std::string target = *json_string;
    // std::string::size_type idx = target.find("\r\n");
    // if (idx == std::string::npos) {
    //     std::cout << "not found\n";
    // } else {
    //     std::cout << "found\n";
    // }

    return 0;
}
```

Encoding Conversion with Boost

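Boost (through the Boost.Locale library) provides functions for converting text between common encodings, for example from UTF-8 to GB2312 and from GB2312 back to UTF-8: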

demo.cpp
```cpp
#include <iostream>
#include <string>

#include <boost/locale.hpp>  // link with -lboost_locale

using namespace std;

// Convert UTF-8 text to GB2312.
void Utf8TranferToGB2312(const string& src_string, string& dec_string)
{
    dec_string = boost::locale::conv::from_utf(src_string, "GB2312");
}

// Convert GB2312 text back to UTF-8.
void GB2312TranferToUtf8(const string& src_string, string& dec_string)
{
    dec_string = boost::locale::conv::to_utf<char>(src_string, "GB2312");
}

int main(int argc, char const *argv[])
{
    // Assumes this source file is saved and compiled as UTF-8.
    string text = "中国你好";
    string dec_string;
    cout << text << endl;

    Utf8TranferToGB2312(text, dec_string);
    // These are GB2312 bytes, so they look garbled on a UTF-8 terminal.
    cout << dec_string << endl;

    string tmp_dec_string;
    GB2312TranferToUtf8(dec_string, tmp_dec_string);
    cout << tmp_dec_string << endl;  // the original UTF-8 text again

    return 0;
}
```