Wednesday, April 29, 2015

3. Defined Goal

Agenda for 5.00 pm meeting on 10th Apr, 2015:
a) Sharing a single line of log file
b) Expected output, i.e. three graphs built from the log data:
             1. Past records analysis
             2. Showing in-progress data (Dashboard)
             3. Future load projection
c) Immediate action plan
    * Install Hadoop in 2 systems (so that we can test scaling) - Need 2 volunteers
    * Port data - Need 1 volunteer to learn and share what needs to be done
    * Analyze using map-reduce to get graphs (1) - Need 1 volunteer to identify how to do this
d) Next meet on coming Wednesday, 5.00 pm
e) A brief Q & A session

Sample log lines:
[00001] 2015-04-10 11:21:39 [Root]system-notification-00257(traffic): start_time="2015-04-10 11:21:39" duration=0 policy_id=11 service=http proto=6 src zone=Trust dst zone=Untrust action=Permit sent=0 rcvd=0 src=192.168.16.236 dst=62.67.193.31 src_port=43855 dst_port=80 translated ip=111.93.148.154 port=1101
[00002] 2015-04-10 11:21:39 [Root]system-notification-00257(traffic): start_time="2015-04-10 11:21:39" duration=0 policy_id=7 service=NETBIOS (NS) proto=17 src zone=Trust dst zone=VPN action=Permit sent=0 rcvd=0 src=192.168.16.87 dst=192.168.0.32 src_port=137 dst_port=137 translated ip=192.168.16.87 port=137 
:
: 
[00115] 2015-04-10 11:21:43 [Root]system-notification-00257(traffic): start_time="2015-04-10 11:21:43" duration=0 policy_id=11 service=dns proto=17 src zone=Trust dst zone=Untrust action=Permit sent=0 rcvd=0 src=192.168.16.12 dst=205.251.192.60 src_port=40901 dst_port=53 translated ip=111.93.148.154 port=1772 
:
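
The sample lines above follow a fixed header followed by key=value fields, where some keys ("src zone", "translated ip") and some values ("NETBIOS (NS)") contain spaces. A minimal Python sketch of turning one such line into a dictionary is shown here. It assumes the field names are exactly those seen in the three samples; the function name parse_line and the header field names (seq, timestamp, context, msg_id, msg_type) are made up for illustration.

```python
import re

# Header, e.g. "[00001] 2015-04-10 11:21:39 [Root]system-notification-00257(traffic): ..."
HEADER = re.compile(
    r"^\[(?P<seq>\d+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<context>[^\]]+)\](?P<msg_id>[\w-]+)\((?P<msg_type>\w+)\):\s*"
    r"(?P<body>.*)$"
)

# Assumption: only the keys that appear in the sample lines are listed here.
KEYS = [
    "start_time", "duration", "policy_id", "service", "proto",
    "src zone", "dst zone", "action", "sent", "rcvd",
    "src", "dst", "src_port", "dst_port", "translated ip", "port",
]

# A key only counts when it starts the body or follows whitespace,
# so "port=" is not found inside "src_port=".
KEY_RE = re.compile(r"(?<!\S)(" + "|".join(re.escape(k) for k in KEYS) + r")=")


def parse_line(line):
    """Return a dict of fields, or None if the line is not a log record."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    record = {
        name: match.group(name)
        for name in ("seq", "timestamp", "context", "msg_id", "msg_type")
    }
    body = match.group("body")
    hits = list(KEY_RE.finditer(body))
    for i, hit in enumerate(hits):
        # A value runs from its "=" up to the start of the next known key.
        end = hits[i + 1].start() if i + 1 < len(hits) else len(body)
        record[hit.group(1)] = body[hit.end():end].strip().strip('"')
    return record
```

Listing the keys explicitly is a deliberate choice: a generic "word=value" pattern would misread "service=http proto=6", because "http proto" looks just like a two-word key such as "src zone".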
 
Quick summary of our meeting:
We briefly discussed the problem, i.e. analyzing router logs (mail files) and making sense of them using Hadoop.

Our primary purpose is to learn how to analyze huge amounts of data using Hadoop, with a specific deliverable, so that we have a concrete problem to tackle and solve.

c) Immediate action plan
   * Install Hadoop in 2 systems (so that we can test scaling) : Need 2 volunteers ::
      - 10th Apr '15: We got Pavithra, Radhakrish & Prateek.
      - Kiran Patil will assist us with the installation if there are any issues.
      - Kiran Patil also offered to create a VM, on which he will install Hadoop.
      - Hadoop will be installed on the local systems themselves, provided the Java version it needs does not conflict with the Java version used by their projects.

   * Port data - Need 1 volunteer to learn and share what needs to be done
      - Kiran Kumar, Sangamesh & Prateek volunteered.
      - What needs to be done? If there is data in a file (sample format shown above), how do we import it into the Hadoop environment?

   * Analyze using map-reduce to get graphs (1) - Need 1 volunteer to identify how to do this
      - Mustafa has volunteered to learn about map reduce and share it with the team.
      - ?: If there is data in Hadoop, how do we read it (map-reduce it) for further processing?
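
As a starting point for the map-reduce question, here is a rough Python sketch of the idea behind graph (1): count connections per minute and per service. It is only an illustration and not something decided in the meeting. It assumes that "proto=" always follows "service=" as in the sample lines, and the names mapper, reducer and run_local are made up. The sort inside run_local stands in for the shuffle step that Hadoop performs between map and reduce.

```python
import re
from itertools import groupby

# Timestamp truncated to the minute, taken from the start of each record.
TIMESTAMP = re.compile(r"^\[\d+\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")
# Assumption: "proto=" always comes right after the service value.
SERVICE = re.compile(r"\bservice=(.+?)\s+proto=")


def mapper(lines):
    """Emit 'minute|service<TAB>1' for every line that parses."""
    for line in lines:
        ts = TIMESTAMP.search(line)
        svc = SERVICE.search(line)
        if ts and svc:
            yield f"{ts.group(1)}|{svc.group(1)}\t1"


def reducer(sorted_pairs):
    """Sum the counts per key; the input must already be sorted by key."""
    split = (pair.rstrip("\n").split("\t", 1) for pair in sorted_pairs)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce on one machine."""
    return list(reducer(sorted(mapper(lines))))
```

On a real cluster the same two functions could sit in small scripts that read standard input and write standard output, which is the contract Hadoop Streaming expects. The log file itself would first have to be copied into HDFS, for example with "hdfs dfs -put". Both steps still need to be tried out by the volunteers.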

Sangamesh mentioned that we might need at least 3 systems for Apache Hadoop to run. So three volunteers were needed, and we got them.

A few questions discussed:
? Why Hadoop?
! We needed an environment that allows us to import and analyze data. Hadoop was chosen in view of its non-restrictive license.
? How to handle duplicates, if any, in input data?
! Logically, there won't be any duplicates. If there are any, we need to handle them. [How??? That was the question... Huh?? I couldn't hear you.. I guess your cell phone tower signal is weak].
? What is the deliverable for Wednesday? Is it the graph itself?
! No. Graph 1 is not the deliverable for Wednesday. We will meet on Wednesday to share our learnings and then take it forward on how to apply them.
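
On the duplicates question, one simple option is sketched below in Python. It assumes that a "duplicate" means a line repeated exactly, for example because the same file was imported twice. That definition is an assumption and was not settled in the meeting. The function name drop_duplicates is made up for illustration.

```python
import hashlib


def drop_duplicates(lines):
    """Yield each distinct non-empty line once, in first-seen order."""
    seen = set()
    for line in lines:
        record = line.strip()
        if not record:
            continue
        # Keep a fixed-size digest instead of the full line to save memory.
        digest = hashlib.sha1(record.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield record
```

This keeps every digest in memory on one machine, so it only suits small inputs. For large inputs the same idea can be expressed as a map-reduce job in which the whole line is the key and the reducer emits each key once.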

What more?
Gowri & Giri will prepare a quick timeline and share it on Wednesday.
