Shell | Analyzing a log file and writing a small shell program

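My boss asked me to put together a small program that analyzes the Confluence login log, so we could see who logged in and when. If the application had this feature built in it would be easy, but I could not find it at the time. After N hours of painful searching I finally found something, and although it was not the key piece, I was very happy with it back then! (The official method I found later is at the end of this post.)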

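Why happy? Heh, because Confluence does not log this by default, and just finding that out cost me a lot of time (I had plenty of other work, so I could not give it my full attention). Then I started looking for patterns. I skimmed Confluence's administration modules and there did not seem to be any place in the application that shows login details, though maybe one appears once logging is enabled. So I at least set out to find regularities in the file contents: which keywords show up on login and never show up for any other operation. After a little experimenting I found the keyword dologin, which is the login action and is not tied to anything else (if it is present, a login happened). Next I found the keyword logout, the logout action, which likewise only appears on logout. With those two found, the next question was who logged in. I logged in with my own account, kinggoo, and found it: the name shows up within roughly 10 lines below the dologin keyword (note this point: within roughly 10 lines), as in figure (3) below.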

[Figure (3): screenshot of the log around a dologin entry]

See the effect? That's it. The next picture is my rough analysis diagram.

[Figure: rough analysis diagram]

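1). When searching for dologin, name is always found within the 10 lines below it.

2). When searching for logout, name is always found within the 10 lines below it.

3). When searching for name, however, dologin or logout is not necessarily nearby, and a pile of unrelated data comes back as well. So I will not make name the first search condition. I don't want to do anything that silly!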

Then run

# grep dologin catalina.out

to see how many there are.

Result:

[root@Config script]# grep dologin catalina.out
2010-12-02 13:37:16,090 DEBUG [http-8090-14] [atlassian.util.profiling.UtilTimerStack] log [55ms] – /dologin.action
2010-12-02 13:50:10,062 DEBUG [http-8090-12] [atlassian.util.profiling.UtilTimerStack] log [72ms] – /dologin.action

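There are two records, but notice that nothing about name is printed. That is why I stressed "within roughly 10 lines" above. Five lines would actually do, but to be safe I let the program loop a few more times.
Then I had to find a way. My shell skills are not that good, so there may well be a way to extract several lines directly that I don't know about, but I do know other methods that can stand in for it. If the main road is blocked, try the side path, haha.
Then I remembered that cat -n prints line numbers, so why not use those? After trying it, though, the file cat -n produces gives too many repeated matches (a search for one line number also hits others), so I dropped it (maybe I just know too few cat options).
Then I remembered nl, which prints a file with line numbers and can pad the numbers with leading zeros automatically. That's the one!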
  
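nl -b t -n rz -w 4  // -b t numbers only non-empty lines; -n rz right-justifies the number and pads it with zeros; -w sets the number of digits, e.g. -w 4 gives 0003. What to pass to -w depends on how big your log is; you could check that first and pass a variable after -w.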
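
The remark about choosing -w from the size of the log could look roughly like this. It is only a sketch, assuming a POSIX shell with nl and wc; the sample input and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: three non-empty lines and one empty line.
input=$(mktemp)
printf 'first\n\nsecond\nthird\n' > "$input"

# Use the number of digits in the line count as the width,
# with a minimum of 4 to match the command in the text.
lines=$(wc -l < "$input" | tr -d ' ')
width=${#lines}
if [ "$width" -lt 4 ]; then
    width=4
fi

# -b t : number only non-empty lines
# -n rz: right-justify the number and pad it with leading zeros
# -w N : use N digits for the number
numbered=$(nl -b t -n rz -w "$width" "$input")
printf '%s\n' "$numbered"
```

The empty line is still printed, it just gets no number, so the numbered lines run 0001 to 0003.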

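So that method was found too. Next: how to extract multiple lines.
Line numbers are absolutely unique, and line numbers within the log are unique too, so I use two files for the lookup.
First I grep dologin in the log file and write the result to temp.g, then run
nl -b t -n rz -w 4 temp.g and write the output to temp.n.
The contents of the two temp.* files are then: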

temp.g:

4112
4114
4115
4123
7104
7106
7107
7115

temp.n:

0001  4112
0002  4114
0003  4115
0004  4123
0005  7104
0006  7106
0007  7107
0008  7115
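
The post does not show the command that filled temp.g with line numbers. One way to produce both files is sketched below, assuming grep -n plus cut; the log content is an invented stand-in for catalina.out, and the scratch directory is only there to keep the example self-contained.

```shell
#!/bin/sh
# Scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical stand-in for catalina.out.
cat > "$workdir/catalina.out" <<'EOF'
line one
GET /dologin.action
name kinggoo
line four
GET /dologin.action
name alice
EOF

# grep -n prefixes each match with "<line number>:"; cut keeps only the number.
grep -n 'dologin' "$workdir/catalina.out" | cut -d: -f1 > "$workdir/temp.g"

# Give every entry of temp.g its own zero-padded index.
nl -b t -n rz -w 4 "$workdir/temp.g" > "$workdir/temp.n"

cat "$workdir/temp.n"
```

With this sample, temp.g holds 2 and 5, and temp.n pairs them with the indexes 0001 and 0002.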

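Compare the two files.
Next I worked out how to do the comparison with less code and in a way that stays reusable. The sketches below are drawn roughly, heh; most of it was in my head.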

[Figures: two hand-drawn analysis sketches]

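With the analysis done, the matching has to be exact. Take 0001, which contains a 1: you grep for this index to get the matching line number from temp.g. If you searched for a bare 1, every index that contains a 1, such as 0021, 0031 and 1111, would match as well, so the only way is to search with the leading zeros, which is guaranteed to be unique.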
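
A small demonstration of why the zero-padded form matters, using the index values mentioned above. The file is made up for the example, and anchoring the pattern with ^ is my own addition to keep the match at the start of the line.

```shell
#!/bin/sh
# Hypothetical index column, as nl -n rz -w 4 would produce it.
index=$(mktemp)
printf '0001\n0021\n0031\n1111\n' > "$index"

# A bare 1 matches every line that contains the digit.
loose=$(grep -c '1' "$index")

# The zero-padded form, anchored to the start of the line, matches one entry.
exact=$(grep -c '^0001' "$index")

echo "loose=$loose exact=$exact"   # prints: loose=4 exact=1
```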
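We can then use a regular expression to pick out the value that belongs to an index, assign it to a variable, and use that variable, which is a line number in the log, to fetch the data from the log file. Since all of this repeats, a for loop fits; while works too. It will probably take a few nested loops, and the innermost loop runs 10 times rather than once per line number. Otherwise the computer would get very tired, haha.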
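
Since the program itself is not shown, here is one possible shape for the nested loops described above. Treat it as a sketch under assumptions: the log lines (including the "name" lines) are invented, the result file is a hypothetical helper, and sed -n is my choice for fetching a single line by number.

```shell
#!/bin/sh
workdir=$(mktemp -d)
log="$workdir/catalina.out"

# Hypothetical log; the real catalina.out layout may differ.
cat > "$log" <<'EOF'
13:37:16 /dologin.action
filler
name kinggoo
filler
13:50:10 /dologin.action
name alice
EOF

# Build the two lookup files: log line numbers and their padded indexes.
grep -n 'dologin' "$log" | cut -d: -f1 > "$workdir/temp.g"
nl -b t -n rz -w 4 "$workdir/temp.g" > "$workdir/temp.n"

: > "$workdir/result"
total=$(wc -l < "$workdir/temp.g" | tr -d ' ')
i=1
while [ "$i" -le "$total" ]; do
    # Look up the log line number through its zero-padded index.
    key=$(printf '%04d' "$i")
    start=$(grep "^$key" "$workdir/temp.n" | cut -f2)

    # Inner loop: look at no more than the 10 lines below the dologin line.
    offset=1
    while [ "$offset" -le 10 ]; do
        line=$(sed -n "$((start + offset))p" "$log")
        case $line in
            *dologin*) break ;;   # next login reached, stop early
            *name*)    echo "$start:$line" >> "$workdir/result"; break ;;
        esac
        offset=$((offset + 1))
    done
    i=$((i + 1))
done

cat "$workdir/result"
```

Each output line is the log line number of a dologin entry followed by the name line found below it. Where GNU or BSD grep is available, its -A option (print lines after each match) could replace the inner loop.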

That is roughly it. The little program is not finished yet, but I think the approach matters more, and it can be extended.

Thanks, End!

The official method I found later (the log still has to be analyzed, though):
How to audit Confluence – enabling user access logging
Added by Matt Ryall (Atlassian), last edited by Roy Hartono on Nov 17, 2010

Often, for auditing purposes, administrators need to know who did what. Notifications are not ideally suited for this purpose. Instead, you can generate a basic log indicating which users are accessing which pages in Confluence. Application servers are able to log the requested URL, but they cannot determine the currently logged in user. This log is not currently formatted to be accessible to web log analysis tools such as AwStats as it lacks a host and get method, so must be viewed manually.

Similar to JIRA, Confluence has a built-in access logging mechanism, which shows the user and URL invoked. To enable it, you need to modify a couple of configuration files and restart Confluence.
Configuring the AccessLogFilter

There is a simple AccessLogFilter in Confluence that can be enabled via confluence/WEB-INF/classes/log4j.properties and confluence/WEB-INF/web.xml.
    Please do not modify the application-wide web descriptor, $server/conf/web.xml. This will be ineffective and potentially may break Confluence.

   1. Uncomment these lines in log4j.properties:
   2. Enable the filter in web.xml by removing the comments around these lines:

      Notice that the *.action pattern is added optionally to log the actions of Confluence in addition to the page views, such as user logins by specifying login.action. This combination of URL patterns will work for all URLs. You can further modify the pattern by adjusting the url-pattern field.
          For troubleshooting purposes, often it is useful to capture all accesses to Confluence. To do this use this filter mapping in web.xml instead of the above:
   3. Restart Confluence

This will result in logging information being stored in the atlassian-confluence.log file in the confluence-home directory.
Advanced configuration

After this is working, you could redirect the access log to a different file by adding a new RollingFileAppender at the top of log4j.properties:

log4j.appender.accesslog=org.apache.log4j.RollingFileAppender
log4j.appender.accesslog.Threshold=DEBUG
log4j.appender.accesslog.File=${catalina.home}/logs/atlassian-confluence-access.log
log4j.appender.accesslog.MaxFileSize=20480KB
log4j.appender.accesslog.MaxBackupIndex=5
log4j.appender.accesslog.layout=com.atlassian.confluence.util.PatternLayoutWithStackTrace
log4j.appender.accesslog.layout.ConversionPattern=%d %p [%c{4}] %M %m%n

Find this line:

#log4j.category.com.atlassian.confluence.util.AccessLogFilter=INFO

Change it to this:

log4j.category.com.atlassian.confluence.util.AccessLogFilter=INFO, accesslog
log4j.additivity.com.atlassian.confluence.util.AccessLogFilter=false

The web.xml url-pattern given above only matches page views (/display/*). You could change the url-pattern, or duplicate the entire filter-mapping to log access for different kinds of access (/admin/* for admin functions, /pages/* for edits and creates, etc.). Note that /pages/editpage.action* doesn't work.
What is logged

The format produced is the following values separated by spaces:

   1. Username or '-' if no user
   2. URL
   3. VM free memory at start of request (in KB)
   4. Change in free memory after request is finished (in KB)
   5. Time taken for request (in ms).
   6. Remote address

Example:

2008-08-08 10:33:05,359 INFO [atlassian.confluence.util.AccessLogFilter] init AccessLogFilter initialized. Format is: <user> <url> <starting memory free (kb)> +- <difference in free mem (kb)> <query time (ms)> <remote address>
2008-08-08 10:47:27,015 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds 42025-154 15 127.0.0.1
2008-08-08 10:47:27,187 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds/Confluence+Overview 41805+982 172 127.0.0.1
2008-08-08 10:47:36,296 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds/Breadcrumb+demonstration 42102-6660 156 127.0.0.1
2008-08-08 11:08:16,875 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds/test+firelite 34362-1616 188 127.0.0.1
2008-08-08 11:47:01,890 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/sand 59711-148 0 127.0.0.1
2008-08-08 11:47:02,171 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/sand/Home 59497-2302 234 127.0.0.1
2008-08-08 11:47:04,500 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds/Tasklist 57124+155 1266 127.0.0.1

The above may be preceded by additional log4j-generated text, depending on the log4j pattern which is configured.
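
Once the access log exists, counting views per user and URL can be sketched with awk. This assumes each entry sits on one line in the layout of the example, so that the user is field 6 and the URL is field 7; a different log4j pattern would shift those positions. The three request lines are copied from the example, and the last sample line is invented to show the filtering.

```shell
#!/bin/sh
access_log=$(mktemp)
cat > "$access_log" <<'EOF'
2008-08-08 10:47:27,015 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds 42025-154 15 127.0.0.1
2008-08-08 10:47:27,187 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds/Confluence+Overview 41805+982 172 127.0.0.1
2008-08-08 11:47:01,890 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/sand 59711-148 0 127.0.0.1
2008-08-08 11:47:04,500 some unrelated log line
EOF

# Keep only AccessLogFilter request lines, then count hits per "user URL" pair.
report=$(awk '/AccessLogFilter\] doFilter/ { hits[$6 " " $7]++ }
              END { for (k in hits) print hits[k], k }' "$access_log" | sort)
printf '%s\n' "$report"
```

Each output line is a hit count followed by the user and the URL; piping through sort only makes the order stable.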

Copyright notice:
When reposting this original article, please credit the source: //kinggoo.com
Original URL: https://kinggoo.com/chengxu-shell-fenxiwikilogshell.htm

8 comments.

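  1. Hi, I read your post and you seem to know a lot, so I'd like to ask a few questions. Thank you! I'm an intern at a university, where the smart lighting center uses Confluence as its wiki. My supervisor wants to know who has viewed the pages in a space, how many times each page was viewed and by whom, and how many times each attachment was downloaded and by whom. I searched for days. Many plugins, viewtracker for example, can do some of this, but they only record data from the moment the plugin is installed, so the data for pages written before that is incomplete. She wants complete data. The official site says the log file can be analyzed, but I don't have a CS background and have no idea where to start with a log file of several dozen MB. Searching online led me to your post. My questions: 1. Can this log file meet her requirements? 2. How should I analyze it? Do I only need to look at lines like these?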
    2008-08-08 10:47:27,015 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds 42025-154 15 127.0.0.1
    2008-08-08 10:47:27,187 INFO [atlassian.confluence.util.AccessLogFilter] doFilter admin http://localhost:8080/display/ds/Confluence+Overview 41805+982 172 127.0.0.1
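    2008-08-08 10:47:36,296 INFO [atlassian.confluence.util.AccessLogFilter] But how do I strip out everything else and extract just these lines?
    Thanks a lot!!! If you don't know, that's fine, and thank you for your help anyway. I'm really out of options. I even phoned support in Australia and the US; they told me to post an issue, which I did, but they just keep recommending plugins.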

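    • PHP can do it. Here are a few pointers.
      My PHP isn't great either; I only know a bit.
      You can read the log with file_get_contents. If the file is too large, read it in pieces using the offset and maximum-length arguments, then use a regular expression to find the matching URLs and save the matches to a variable or a file.
      Most of this logging ends up in catalina.out and is not kept in the other log files. So it's best to back up catalina.out every day and then empty the original, giving the backups regular, date-based file names (ideally in the same date format that catalina.out uses).
      Your PHP page can then take the page URL, user, log date, sort order and so on via POST or GET parameters and produce the result.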

  2. Can you provide more information on this? cheers

  3. Thank you, I have recently been searching for information about this topic for ages and yours is the best I have discovered so far.

    • Very happy to help! I hope it is useful to you.
      Sorry, I can't speak English! I use the bubble Translate plugin, a browser extension, for translation.
      Do you speak Chinese? How did you read this article?
      Thanks.

  4. Your process for analyzing the problem is quite good.

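    • That said, I talked it over with Xiangming the other day and he said sed can do the extraction as well! I checked at home and it really can, and it saves a detour.
      I hadn't paid much attention to what sed can do and only thought of it as replace, add and delete. I need to read up more, heh.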
