Parsing Strings in C with sscanf()

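I recently finished The C Programming Language, and it really is a "small but beautiful" book. It is best read with a concrete goal in mind rather than skimmed for syntax; I believe every example and exercise will teach you something. My own goal was the "homework" a colleague handed me: refactor an IP Mapping program implemented in C++.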

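After consulting C: A Reference Manual, I came away thinking that the C library probably has no built-in function for parsing CSV, so the first step of the task was to reinvent a CSV-parsing wheel. (PHP is the best language in the world :p )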

While I was working through the book, section 7.4 on formatted input had this passage:

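There is also an input function, sscanf, that reads from a string instead of the standard input:

int sscanf(char *string, char *format, arg1, arg2, ...)

It scans the string according to the format in format and stores the resulting values through arg1, arg2, and so on. These arguments must be pointers.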
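As a quick illustration of the passage above, here is a minimal sketch of sscanf reading from a string rather than from standard input. The helper name parse_point and the "x y" input format are made up for this example and do not come from the book. The points to notice are that every argument after the format is a pointer, and that the return value says how many items were actually assigned.

```c
#include <stdio.h>

/* Hypothetical helper (not from the book): parse "x y" out of a string.
 * Returns 1 on success, 0 if the string does not contain two integers. */
int parse_point(const char *s, int *x, int *y)
{
    /* x and y are already pointers, so no & is needed here */
    int assigned = sscanf(s, "%d %d", x, y);
    return assigned == 2;
}
```

A caller would pass the addresses of two ints, for example parse_point("3 4", &x, &y), and check the result before using x and y.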

So I built this wooden wheel:

void resolve_line(char *lp) {
    char start_ip[15], end_ip[15], base_code[10];
    sscanf(lp, "%s,%s,%s", start_ip, end_ip, base_code);
    printf("%s\n", base_code);
}

Sure enough, it doesn't work :)

So I kept reading, and a little further on there is this passage:

Suppose we want to read input lines that contain dates of the form

25 Dec 1988

The scanf statement can be written like this:

int day, year;
char monthname[20];

scanf("%d %s %d", &day, monthname, &year);

No & is used with monthname, since an array name is a pointer.
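The same date example can be tried against a string with sscanf. The sketch below is only an illustration: the helper name parse_date is hypothetical, and the %19s width is a safety measure added here that the book's version does not have. It keeps a long month name from overflowing the 20-byte buffer.

```c
#include <stdio.h>

/* Hypothetical helper: parse a date such as "25 Dec 1988".
 * monthname must have room for at least 20 chars.
 * Returns 1 if all three fields were read, 0 otherwise. */
int parse_date(const char *line, int *day, char *monthname, int *year)
{
    /* %19s stores at most 19 chars plus the terminating '\0' */
    return sscanf(line, "%d %19s %d", day, monthname, year) == 3;
}
```

Note that %s works here only because the fields are separated by whitespace, which is exactly where %s stops reading.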

After reading that, I was even more convinced my code was fine. Time to see how other people write it online:

/*****************************************************
** Name : sscanf.c
** Author : gzshun
** Version : 1.0
** Date : 2011-12
** Description : sscanf function
******************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void sscanf_test(void);

static void sscanf_test(void)
{
    int ret;
    const char *string;
    int digit;
    char buf1[255];
    char buf2[255];
    char buf3[255];
    char buf4[255];

    /* 1. The simplest usage */
    string = "china beijing 123";
    ret = sscanf(string, "%s %s %d", buf1, buf2, &digit);
    printf("1.string=%s\n", string);
    printf("1.ret=%d, buf1=%s, buf2=%s, digit=%d\n\n", ret, buf1, buf2, digit);
    /*
    ** Output:
    ** 1.ret=3, buf1=china, buf2=beijing, digit=123
    ** As you can see, sscanf returns the number of items successfully assigned
    */

    /* 2. Read a string of a given maximum length */
    string = "123456789";
    sscanf(string, "%5s", buf1);
    printf("2.string=%s\n", string);
    printf("2.buf1=%s\n\n", buf1);
    /*
    ** Output:
    ** 2.buf1=12345
    */

    /* 3. Read up to (but not including) a given character */
    string = "123/456";
    sscanf(string, "%[^/]", buf1);
    printf("3.string=%s\n", string);
    printf("3.buf1=%s\n\n", buf1);
    /*
    ** Output:
    ** 3.buf1=123
    */

    /* 4. Read up to the first character that belongs to a given set */
    string = "123abcABC";
    sscanf(string, "%[^A-Z]", buf1);
    printf("4.string=%s\n", string);
    printf("4.buf1=%s\n\n", buf1);
    /*
    ** Output:
    ** 4.buf1=123abc
    */

    /* 5. Read strings made up only of characters from a given set */
    string = "0123abcABC";
    sscanf(string, "%[0-9]%[a-z]%[A-Z]", buf1, buf2, buf3);
    printf("5.string=%s\n", string);
    printf("5.buf1=%s, buf2=%s, buf3=%s\n\n", buf1, buf2, buf3);
    /*
    ** Output:
    ** 5.buf1=0123, buf2=abc, buf3=ABC
    */

    /* 6. Extract the string between two given characters */
    string = "ios<android>wp7";
    sscanf(string, "%*[^<]<%[^>]", buf1);
    printf("6.string=%s\n", string);
    printf("6.buf1=%s\n\n", buf1);
    /*
    ** Output:
    ** 6.buf1=android
    */

    /* 7. Skip a literal string */
    string = "iosVSandroid";
    sscanf(string, "%[a-z]VS%[a-z]", buf1, buf2);
    printf("7.string=%s\n", string);
    printf("7.buf1=%s, buf2=%s\n\n", buf1, buf2);
    /*
    ** Output:
    ** 7.buf1=ios, buf2=android
    */

    /* 8. Split a string on a delimiter character */
    string = "android-iphone-wp7";
    /*
    ** Each field is read up to '-'; the literal '-' that follows in the
    ** format consumes the delimiter itself, much like example 7
    */
    sscanf(string, "%[^-]-%[^-]-%[^-]", buf1, buf2, buf3);
    printf("8.string=%s\n", string);
    printf("8.buf1=%s, buf2=%s, buf3=%s\n\n", buf1, buf2, buf3);
    /*
    ** Output:
    ** 8.buf1=android, buf2=iphone, buf3=wp7
    */

    /* 9. Pick apart an email address */
    string = "Email:beijing@sina.com.cn";
    sscanf(string, "%[^:]:%[^@]@%[^.].%s", buf1, buf2, buf3, buf4);
    printf("9.string=%s\n", string);
    printf("9.buf1=%s, buf2=%s, buf3=%s, buf4=%s\n\n", buf1, buf2, buf3, buf4);
    /*
    ** Output:
    ** 9.buf1=Email, buf2=beijing, buf3=sina, buf4=com.cn
    */

    /* 10. Discard fields you do not need (addendum):
    ** a * after the % means the field is matched but not stored
    */
    string = "android iphone wp7";
    sscanf(string, "%s %*s %s", buf1, buf2);
    printf("10.string=%s\n", string);
    printf("10.buf1=%s, buf2=%s\n\n", buf1, buf2);
    /*
    ** Output:
    ** 10.buf1=android, buf2=wp7
    */
}

int main(int argc, char **argv)
{
    sscanf_test();

    return 0;
}

/*
**Test run
**Environment:
**Linux ubuntu 2.6.32-24-generic-pae #39-Ubuntu SMP Wed Jul 28 07:39:26 UTC 2010 i686 GNU/Linux
**gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
**
gzshun@ubuntu:~/c/sscanf$ gcc sscanf.c -o sscanf
gzshun@ubuntu:~/c/sscanf$ ./sscanf
1.string=china beijing 123
1.ret=3, buf1=china, buf2=beijing, digit=123

2.string=123456789
2.buf1=12345

3.string=123/456
3.buf1=123

4.string=123abcABC
4.buf1=123abc

5.string=0123abcABC
5.buf1=0123, buf2=abc, buf3=ABC

6.string=ios<android>wp7
6.buf1=android

7.string=iosVSandroid
7.buf1=ios, buf2=android

8.string=android-iphone-wp7
8.buf1=android, buf2=iphone, buf3=wp7

9.string=Email:beijing@sina.com.cn
9.buf1=Email, buf2=beijing, buf3=sina, buf4=com.cn

10.string=android iphone wp7
10.buf1=android, buf2=wp7
*/

I think I'm starting to see something here, but let's RTFM anyway:

scanf Width Specification

Reading Undelimited strings

To read strings not delimited by whitespace characters, a set of characters in brackets ([ ]) can be substituted for the s (string) type character. The set of characters in brackets is referred to as a control string. The corresponding input field is read up to the first character that does not appear in the control string. If the first character in the set is a caret (^), the effect is reversed: The input field is read up to the first character that does appear in the rest of the character set.
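To make the manual's wording concrete, here is a small sketch that runs two formats over the same comma-separated line. The function names and the 64-byte buffer size are assumptions made for this illustration. Since %s only stops at whitespace, the first %s swallows the commas along with everything else, the literal ',' in the format never gets a chance to match, and sscanf reports a single assigned field. The scanset version stops each field at the comma and reports three.

```c
#include <stdio.h>

/* Try the "%s,%s,%s" style format; returns the number of fields assigned.
 * Each buffer must hold at least 64 chars. */
int split_with_s(const char *line, char *a, char *b, char *c)
{
    /* %s only stops at whitespace, so the first %s eats the commas too */
    return sscanf(line, "%63s,%63s,%63s", a, b, c);
}

/* Same idea, but the first two fields are scansets that stop at a comma */
int split_with_scanset(const char *line, char *a, char *b, char *c)
{
    return sscanf(line, "%63[^,],%63[^,],%63s", a, b, c);
}
```

Checking the return value is the quickest way to tell the two apart: for a line with three comma-separated fields and no spaces, split_with_s returns 1 with the whole line in the first buffer, while split_with_scanset returns 3.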

Then I changed the wheel to this:

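#include <stdio.h>

void resolve_line(char *lp) {
    /* 15 chars for "255.255.255.255" plus the terminating '\0' */
    char start_ip[16], end_ip[16], base_code[10];
    /* %[^,] reads up to the next comma; the widths keep each field inside its buffer */
    if (sscanf(lp, "%15[^,],%15[^,],%9s", start_ip, end_ip, base_code) == 3)
        printf("%s\n", base_code);
}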

Now it flies :)
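For anyone who wants to go one step further, here is a rough sketch of the same idea with a return-value check and bounded buffers. The struct ip_range, its field sizes, and the function name parse_ip_line are all assumptions made for illustration; they are not part of the program I was refactoring.

```c
#include <stdio.h>

/* Hypothetical record layout; the field names and sizes are assumptions */
struct ip_range {
    char start_ip[16];   /* "255.255.255.255" + '\0' */
    char end_ip[16];
    char base_code[10];
};

/* Parse one "start,end,code" line into out.
 * Returns 1 on success, 0 if the line does not have three non-empty fields. */
int parse_ip_line(const char *line, struct ip_range *out)
{
    /* The last field uses %9[^,\r\n] so that a trailing newline left by
     * fgets() and any extra columns are not copied into base_code */
    int n = sscanf(line, "%15[^,],%15[^,],%9[^,\r\n]",
                   out->start_ip, out->end_ip, out->base_code);
    return n == 3;
}
```

One limitation worth knowing: a scanset must match at least one character, so a line with an empty field such as "1.0.0.0,,AU" is rejected rather than parsed with an empty string. Quoted CSV fields that contain commas are not handled either, so this only suits simple, well-formed input.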


Unless otherwise stated, all posts on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!