Boost Regular Expressions and Encoding Conversion
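Regular Expressions
A regular expression is a notation for describing the pattern that a field inside a string must follow. For the basic syntax, consult a regular expression reference.
Using Boost::regex
Matching a field
This approach checks whether a string matches the regular expression you wrote. regex_match succeeds only when the entire string fits the pattern, not just part of it.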
match.cpp
```cpp
#include <boost/regex.hpp>
#include <iostream>
#include <string>

void test(void)
{
    // Group 1: URL prefix, group 2: "_<digits>" page suffix, group 3: extension.
    // Literal dots are escaped so they no longer match arbitrary characters.
    boost::regex re("(https?://www\\.ttufo\\.com/.+/.+/.+)(_\\d+)(\\.html?)");
    std::string target("http://www.ttufo.com/ufo/201705/154053_3.html");
    boost::cmatch what;

    // regex_match succeeds only if the whole string matches the pattern.
    if (boost::regex_match(target.c_str(), what, re)) {
        std::cout << "match " << what.size() << std::endl;
        // what[0] is the full match; what[1..] are the capture groups.
        for (std::size_t i = 0; i < what.size(); i++) {
            std::cout << "what[" << i << "]: " << what[i] << std::endl;
        }
    } else {
        std::cout << "not match " << std::endl;
    }
}

int main(void)
{
    test();
    return 0;
}
```
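Searching a field
This approach extracts the part of a string that fits the pattern. regex_search succeeds if any substring matches, and the match results hold the matched text and its capture groups.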
Search.cpp
```cpp
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(void)
{
    std::string str = "LB33";
    // Group 1: leading non-digits, group 2: the digits that follow.
    boost::regex expression("([^0-9]+)([0-9]+)");
    boost::smatch what;

    // regex_search succeeds if any substring matches the pattern.
    if (boost::regex_search(str, what, expression)) {
        std::cout << what.size() << std::endl;
        // Print the full match and every capture group that participated.
        for (std::size_t i = 0; i < what.size(); ++i) {
            if (what[i].matched)
                std::cout << "what[" << i << "]:" << what[i] << std::endl;
        }
    }
    return 0;
}
```
Boost Encoding Conversion
Boost.Locale provides functions for converting text between common encodings, such as UTF-8 to GB2312 or GB2312 back to UTF-8. For example:
demo.cpp
```cpp
#include <boost/locale.hpp>
#include <iostream>
#include <string>

using namespace std;

// Converts a UTF-8 string to GB2312.
void Utf8TranferToGB2312(const string& src_string, string& dec_string)
{
    dec_string = boost::locale::conv::from_utf(src_string, "GB2312");
}

// Converts a GB2312 string back to UTF-8.
void GB2312TranferToUtf8(const string& src_string, string& dec_string)
{
    dec_string = boost::locale::conv::to_utf<char>(src_string, "GB2312");
}

int main(int argc, char const *argv[])
{
    string text = "中国你好";  // assumes this source file is saved as UTF-8
    string dec_string;
    cout << text << endl;
    Utf8TranferToGB2312(text, dec_string);
    cout << dec_string << endl;  // GB2312 bytes; may look garbled on a UTF-8 terminal
    string tmp_dec_string;
    GB2312TranferToUtf8(dec_string, tmp_dec_string);
    cout << tmp_dec_string << endl;  // round trip back to the original text
    return 0;
}
```