print('Available IP Address in log file '+'=> '+str(k)+' '+'Count '+'=> '+str(v))
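Translator's note: if the log file is large, you can limit the read to a specified number of bytes from the start of the file, as Alan does.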
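A minimal sketch of that idea follows. It is not the article's code: the helper name `count_ips_in_prefix`, the IPv4-style pattern and the 1 MB default limit are assumptions chosen for illustration. Opening the file in binary mode makes `read(n)` count bytes rather than characters, and any line cut off at the limit is dropped so that a half-read address is not counted.

```python
import re
from collections import Counter

# Assumed IPv4-style pattern (hypothetical; the article's own pattern may differ).
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')


def count_ips_in_prefix(path, max_bytes=1_000_000):
    """Count IPv4-looking strings in at most the first max_bytes bytes of path."""
    # Binary mode: read(n) returns at most n bytes, not n characters.
    with open(path, 'rb') as f:
        chunk = f.read(max_bytes)
        truncated = f.read(1) != b''  # is there more data past the limit?
    text = chunk.decode('utf-8', errors='replace')
    if truncated:
        # Keep only complete lines; the last one may have been cut mid-address.
        text = text[:text.rfind('\n') + 1]
    return Counter(IP_PATTERN.findall(text))
```

For example, `count_ips_in_prefix('access.log', 10 * 1024 * 1024)` would scan only the first 10 MiB. If the whole file is needed but memory is the concern, iterating over the open file line by line is an alternative to a single large `read()`.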
Running the script gives the following output:
$ python3 parse_ip_address.py
Reading Apache log file
Available IP Address in log file => 117.188.30.192 Count => 65
Available IP Address in log file => 66.249.79.227 Count => 1
Available IP Address in log file => 111.202.101.7 Count => 1
Available IP Address in log file => 46.229.168.146 Count => 1
Available IP Address in log file => 46.229.168.133 Count => 1
Available IP Address in log file => 54.36.148.177 Count => 1
Available IP Address in log file => 23.239.1.95 Count => 1
Available IP Address in log file => 91.121.155.172 Count => 4
Available IP Address in log file => 64.233.172.168 Count => 1
Available IP Address in log file => 54.36.148.58 Count => 1
Available IP Address in log file => 116.227.9.111 Count => 1
Available IP Address in log file => 207.46.13.34 Count => 1
Available IP Address in log file => 66.249.79.203 Count => 1
Available IP Address in log file => 40.77.167.115 Count => 4
Available IP Address in log file => 136.243.70.151 Count => 2
Available IP Address in log file => 37.115.190.120 Count => 3
Available IP Address in log file => 42.156.137.108 Count => 8
Available IP Address in log file => 42.156.136.64 Count => 1
Available IP Address in log file => 42.120.161.64 Count => 2
Available IP Address in log file => 42.156.138.64 Count => 1
Available IP Address in log file => 42.156.139.108 Count => 1
Available IP Address in log file => 207.46.13.33 Count => 1
Available IP Address in log file => 1.10.187.34 Count => 9
Available IP Address in log file => 66.249.79.229 Count => 3
Available IP Address in log file => 109.228.56.115 Count => 1
Available IP Address in log file => 66.249.79.231 Count => 2
Available IP Address in log file => 183.143.43.108 Count => 9
Available IP Address in log file => 42.156.137.83 Count => 2
Available IP Address in log file => 42.156.139.83 Count => 2
Available IP Address in log file => 42.120.161.83 Count => 1
Available IP Address in log file => 42.120.160.95 Count => 1
Available IP Address in log file => 42.120.161.95 Count => 1
Available IP Address in log file => 182.134.133.186 Count => 1
Available IP Address in log file => 157.55.39.109 Count => 1
Available IP Address in log file => 42.156.136.22 Count => 2
Available IP Address in log file => 42.156.137.22 Count => 3
Available IP Address in log file => 42.156.138.22 Count => 2
Available IP Address in log file => 42.120.160.22 Count => 4
Available IP Address in log file => 42.156.139.22 Count => 4
Available IP Address in log file => 37.9.87.213 Count => 4
Available IP Address in log file => 54.36.149.17 Count => 1
Available IP Address in log file => 54.36.148.229 Count => 1
Available IP Address in log file => 54.36.149.22 Count => 1
Available IP Address in log file => 168.181.61.154 Count => 1
Available IP Address in log file => 125.82.16.199 Count => 2
Available IP Address in log file => 123.125.71.95 Count => 1
Available IP Address in log file => 111.206.198.79 Count => 1
Available IP Address in log file => 111.206.198.100 Count => 1
Available IP Address in log file => 54.36.149.64 Count => 1
Available IP Address in log file => 42.84.39.146 Count => 1
Available IP Address in log file => 180.173.173.168 Count => 1
Available IP Address in log file => 125.70.190.216 Count => 7
Available IP Address in log file => 54.36.148.104 Count => 1
Available IP Address in log file => 42.120.160.114 Count => 2
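In the example above, we built an Apache log parser that extracts the IP addresses in the log and the number of requests each one made to the server. We do not need every field of every line in the Apache log file; we only need the IP addresses. To extract them, we define a pattern that searches for IP addresses, which is a job for a regular expression, so we import the re module. We then import the collections module, which implements specialized container datatypes that serve as alternatives to Python's built-in dict, list, set and tuple. With the required modules imported, we write a regular-expression pattern that matches the IP addresses in the log file.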
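To illustrate what a specialized container datatype buys here, the sketch below counts a list of addresses twice: once with a plain dict and once with `collections.Counter`. The address values are hypothetical (taken from reserved documentation ranges), standing in for what the regular expression might return.

```python
from collections import Counter

# Hypothetical addresses, e.g. as re.findall() might return them.
ips = ['203.0.113.5', '198.51.100.7', '203.0.113.5']

# With a plain dict, unseen keys need an explicit default.
manual = {}
for ip in ips:
    manual[ip] = manual.get(ip, 0) + 1

# Counter, a dict subclass from collections, does the same in one call.
counted = Counter(ips)

print(manual)                # {'203.0.113.5': 2, '198.51.100.7': 1}
print(counted == manual)     # True
print(counted['192.0.2.1'])  # 0: a missing key counts as zero, no KeyError
```

Because `Counter` is a dict subclass, its result can be used anywhere a dict of counts is expected.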
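In this pattern, \d matches any digit from 0 to 9, and the r prefix makes the pattern a raw string, so its backslashes are passed to the regular-expression engine unchanged instead of being treated as Python escape sequences. We then open the Apache log file named access.log and read its contents. Next, we apply the regular expression to the log contents and use the Counter class from the collections module to count the IP addresses it matched. Finally, we print the results, as shown in the output above.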
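The steps described above can be sketched end to end as follows. This is an illustrative variant, not the article's script: the pattern, the hypothetical `count_ips` helper and the sample log lines (which use reserved documentation addresses) are assumptions. It reads line by line instead of loading the whole file, and it anchors the match at the start of each line with `match()` so that only the client-address field is counted; applying the pattern to the whole text, as the article does, would also count addresses that appear inside URLs. `Counter.most_common()` then lists the busiest clients first.

```python
import re
from collections import Counter

# Assumed IPv4-style pattern (hypothetical). The r'' prefix makes it a raw
# string, so \d ("any digit 0-9") and \. ("a literal dot") reach re unchanged.
IP_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

# Hypothetical sample lines in Apache access-log format (documentation IPs).
SAMPLE_LOG = """\
203.0.113.5 - - [01/Jan/2019:00:00:01 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [01/Jan/2019:00:00:02 +0000] "GET /a HTTP/1.1" 404 128
203.0.113.5 - - [01/Jan/2019:00:00:03 +0000] "GET /b?from=192.0.2.1 HTTP/1.1" 200 64
"""


def count_ips(lines):
    """Count the client IP at the start of each line.

    Accepts any iterable of strings, including an open file object,
    so a large log is processed one line at a time.
    """
    counts = Counter()
    for line in lines:
        match = IP_PATTERN.match(line)  # client address is the first field
        if match:
            counts[match.group()] += 1
    return counts


if __name__ == '__main__':
    # With a real log: with open('access.log') as f: counts = count_ips(f)
    counts = count_ips(SAMPLE_LOG.splitlines())
    for ip, n in counts.most_common():  # highest count first
        print(f'Available IP Address in log file => {ip} Count => {n}')
```

On the sample, 203.0.113.5 is reported with a count of 2 and 198.51.100.7 with 1, while 192.0.2.1 is ignored because it only appears inside a URL. Note that `\d{1,3}` also accepts impossible values such as 999.999.999.999; if that matters, the standard library's `ipaddress.ip_address()` can validate each candidate.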