Abstract of Biographical Dictionary Generator for Password Cracking
A biographical dictionary generator is used to crack passwords. This research
proposes such a generator and aims to bring several important improvements to
the password cracking method. Hacking has two sides, one positive and one
negative: many hackers and attackers around the world are always trying to
crack passwords for their own purposes, but hacking can also be used
constructively. This study provides information about improving the hacking
process. Accordingly, the literature review chapter covers recent studies on
the challenges of making password cracking algorithms easier. A password
cracking program is proposed and developed using an agile methodology. After
development is complete, the program is tested to check the accuracy of the
algorithm. The program still requires some amendments to work efficiently in
the future.
Acknowledgment
I am very thankful to my ****** and ****** for their help and their endless
support. Moreover, their supervision provided me a lot of help for my project.
I would also like to thank Birmingham City University for providing me with the
resources I needed to complete this project.
List of Tables
Table 1: Personal information type in 12306 dataset
Table 2: Percentage of personal information use in password in 12306 dataset
List of Abbreviations
Acronym | Abbreviation
2FA | Two-factor authentication
RTT | Reverse Turing Tests
EKE | Encrypted Key Exchange
PCFG | Password Context-Free Grammar
MPCFG | Modified Context-Free Grammar
Chapter 1: Introduction of Biographical Dictionary Generator for Password Cracking
Introduction
The biographical dictionary generator is used for password
cracking. This study describes how biographical password cracking dictionaries
are generated and how the generator can be modified to bring some improvements
(Lancrenon, 2013).
Such improvements can be based on the victim's data. The study shows that the
data of the victim can be very effective for cracking passwords in an attack
because it increases the speed of the cracking technique. The main aim of the
research is to generate a dictionary for cracking a password. Its objective is
to produce an effective method that improves password cracking by matching the
data of the user.
The password cracking program is intended to work effectively
when users change the password of their account (Kody, 2018). This study
presents a report on hacking that contains important statistics about users.
Reports on hacking by Google and Harris Poll show that many people hold
internet accounts and reuse their passwords on multiple accounts. On the other
hand, about one-third of web and mobile application account holders use a
different password for every account, and a smaller group reuse the same
password for all of their accounts (Saliba, 2018). The scope of this
research is to gain a better understanding of the password cracking techniques
that rely on biographical information. Securing passwords is the need of the
hour, as the demand for social media platforms on the internet is increasing
day by day. Password relationships are divided into three categories:
similarity-based, probability-based and modification-based (Wang, 2016).
Furthermore, the literature review section discusses related studies and
related work on hacking and on securing passwords (Vigliarolo, 2018).
Agile and iterative methodologies are proposed for creating the program that
generates the biographical dictionary. This dictionary will be used for
cracking the password (Sterling, 2013).
The program focuses on matching and retrieving information.
The user name, email address, phone number, account ID and password are the
main focus when cracking passwords (Bhattacharjee, 2013). For visualization,
several diagrams are provided that describe the password cracking program
(Rivest, 2013). To check the accuracy of the proposed technique, testing and
results sections are also provided in this report. The recommendation section
gives some information for future study, and the conclusion section summarizes
the whole research.
Biographical dictionary generator for password cracking
This project explores the generation of biographical dictionaries, which can
be improved using data about the victim. Such data is also useful to the
system during password cracking (Kävrestad, 2018) because it reduces the time
taken and increases the speed of cracking. Passwords are widely used across
different applications, data and devices to authenticate a user's identity.
They often start from biographical information about the user, and some
systems also ask users to include different characters and symbols. There are
security settings that can stop dictionary attacks (Kävrestad, 2017), but
these settings usually have to be enabled by the user, for example two-factor
authentication (2FA).
This project looks at creating custom password wordlists using personalized
information related to the victim. For this research the data is collected per
individual, so each wordlist is customized to the victim and includes
characters specific to that victim. It also respects the specific character
length that is required. The custom wordlist therefore includes a complete set
of characteristics that are used for generating the password wordlist
(Yazdi, 2011).
Aims
This project aims to develop a biographical password generating program with the
intent of improving how password cracking dictionaries are created.
Objectives
The objectives of this project are:
· Research different password cracking techniques.
· Design and develop a password cracking program and test the authenticity.
· Generate biographical information for a password-protected file.
· Explore if biographical information is enough to decrease the time taken to conduct an attack.
· Implement new techniques for generating a biographical dictionary.
· Explain the key principles of password cracking in detail.
Rationale
The main benefit of this research study is to increase awareness of password
cracking among organizations, researchers in the industry and the wider
audience. It provides insight into what makes a password weak when
biographical information is used. A weak password is typically based on
secondary details, dictionary words or personal information of the user, such
as a favorite football team, date of birth, personal name, nicknames or
favorite celebrities. These passwords can be guessed easily by people who know
enough about the user (Notoatmodjo et al., 2009). This paper explores whether
biographical information is enough to crack a user's password. Moreover, it
implements new techniques that are useful for improving dictionary attacks.
This could be helpful if a user wanted to test the strength of their password.
Additionally, it gives users a stronger understanding of how to secure their
data with a stronger password. Using the same password across multiple
accounts and programs can be extremely dangerous because it can compromise the
security of that password everywhere it is used. This project will also help
those who need to recover a forgotten or lost password that may contain
biographical data.
Timetable for project
Figure 1: Dates for the project
Gantt chart
Figure 2: Schedule dates in a Gantt chart for the project
Problem definition
This project aims to
develop a biographical password generating program with the intent of improving
how password cracking dictionaries are created. Hacking has become increasingly
mainstream, which may be seen in a positive or negative light. According to a
study conducted by Google and Harris Poll, around 52% of individuals reuse
their passwords for “multiple (but not all) accounts”, 35% “use a different
password for all accounts” and 13% “reuse the same password for all their
accounts” (Services.Google.Com, 2019).
Reusing a password may appear to be an efficient way to recall the passwords
of important accounts, but it leaves the user vulnerable to a data breach.
Nobody understands this better than Facebook CEO Mark Zuckerberg, who fell
victim to hacking and had his social media accounts compromised, including
Twitter, where hackers tweeted from his account (Fox, 2019). Hackers uncovered
the account passwords of well-known CEOs after a security breach of the
LinkedIn server. Mark Zuckerberg's password for his LinkedIn account,
“dadada”, was also used for his Twitter and other compromised social media
accounts. These sorts of attacks can have immense consequences for a business
(Bhole, 2017). Dropbox suffered an attack in 2012 that originated from an
employee who used the same password for LinkedIn and for their corporate
Dropbox account. Rather than some indiscreet tweets from a hacker, this attack
resulted in the theft of approximately 60 million user details. The benefit of
this research study will be to increase awareness of password cracking among
organizations, researchers in the industry and the wider audience. It will
provide better insight into what makes a password weak when biographical
information is used. This will be outlined in the guidelines. An example of a
weak password is one built from a secondary detail, such as a favorite
football team or the date of birth of a loved one. The paper will explore
whether biographical information is enough to crack a user's password and will
also implement new techniques that may improve dictionary attacks. This could
be helpful if a user wants to test the strength of their password.
Additionally, it gives users a stronger understanding of how to secure their
data with a stronger password. Using the same password across various accounts
and programs is a danger that can compromise the security of that password.
This project can also help those who need to recover a forgotten or lost
password that may contain biographical data.
The research study will
help educate readers about multiple techniques that are used for cracking
passwords with the help of the biographical password cracking program that
will be designed. This study does not aim to promote hacking. The project will
help readers understand the importance of setting strong passwords and of not
creating passwords based on biographical information, which is easy to crack.
This will be showcased with the help of the program.
Scope
The scope of this research is to understand the different password cracking
techniques that use biographical information. With the surge in demand for
internet and social media platforms, securing passwords has become the need of
the hour. Passwords are among the most common targets of cyber-attacks, as a
successfully cracked password gives unauthorized access to sensitive
information. User analysis is one method that is used to crack passwords. In
this method, the user is analyzed to understand their commonly used phrases so
that hints regarding their passwords can be generated. Hackers can reduce the
time taken to crack a password by tracing and understanding the conversation
style and characteristics of the user (Wong, 2013). A popular example is that
passwords in the corporate world are mostly aligned with business activities,
which makes them easy to crack. Establishing password relationships is another
technique that is used to crack users' passwords based on biographical
information (Zheng et al., 2018).
Password relationships are further divided into three categories:
modification-based, similarity-based and probability-based.
The modification-based technique observes the changes that users typically
make when changing their passwords and cracks passwords based on these
observations. The similarity-based technique relies on users changing their
passwords to similar strings. Finally, the probability-based technique derives
probabilities from the way passwords are created, forming a chain in which the
next candidate password is chosen according to the weight assigned to it.
Chapter 2: Literature Review of Biographical Dictionary Generator for Password Cracking
Review of Existing Knowledge
Many attacks are now being made by hackers to crack passwords. As mentioned
above, the reason is that passwords are highly vulnerable, and with access to
them all important information can be exposed.
Brute Force Attacks
This is one of the most well-known methods for cracking passwords of up to
eight characters. It is fundamentally a trial-and-error strategy: the hacker
systematically checks every possible combination of characters, computes the
hash of each candidate string and compares it with the target hash until the
password is cracked. The success of this attack depends on the length of the
password. In this attack, the hacker tries every combination of letters,
numbers and punctuation to produce the password. If the password is longer
than eight characters, this strategy takes additional time, from minutes to a
very long time, depending on the system used and the length of the password
(Raza et al., 2012).
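The trial-and-error loop described above can be sketched in a few lines. This is a minimal illustration rather than the program developed in this project: the use of unsalted SHA-256, the function names and the tiny alphabet are all assumptions chosen to keep the example fast.

```python
import hashlib
from itertools import product


def brute_force(target_hash, alphabet, max_len):
    """Try every string over `alphabet` of length 1..max_len.

    Assumption: the target is an unsalted SHA-256 hex digest.
    """
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
                return candidate
    return None  # nothing of length <= max_len matched


def search_space(alphabet_size, max_len):
    """Number of candidates a full search up to max_len has to try."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


target = hashlib.sha256(b"cab").hexdigest()
print(brute_force(target, "abc", 3))
print(search_space(26, 8))  # lowercase letters only, up to 8 characters
```

The `search_space` helper shows why length matters: every extra character multiplies the number of candidates by the size of the alphabet.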
Dictionary Attacks
A dictionary attack is like a brute force attack, but there is one significant
difference between the two. Here, the hacker uses a list of plausible matches
(based on words of the English language, for instance) rather than trying
every possible character combination one by one. The list used in this attack
often includes known passwords, words from the English language, sentences
from books, and more (Sharif & Khan, 2007).
Consolidated Dictionary Attacks
Taking the dictionary attack one step further and adding more complexity,
hackers can combine a list of existing words with numbers in the same way that
people do when making new passwords, for example by swapping the letter 'e'
with '3'. This is known as a "consolidated dictionary" attack, where the
database used can contain words from one or more dictionaries
(Bošnjak et al., 2018).
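As an illustration of the letter-for-number swaps mentioned above, the following sketch expands each word into its substituted variants and optionally appends numbers. The substitution table, the function names and the sample suffix are hypothetical and only serve the example.

```python
from itertools import product

# Hypothetical substitution table; real tools ship much larger rule sets.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def leet_variants(word):
    """Return every variant of `word` with zero or more letters swapped."""
    options = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word.lower()]
    return ["".join(chars) for chars in product(*options)]


def combine(words, suffixes):
    """Yield each variant on its own and followed by each numeric suffix."""
    for word in words:
        for variant in leet_variants(word):
            yield variant
            for suffix in suffixes:
                yield variant + suffix


print(leet_variants("eve"))
print(list(combine(["eve"], ["1965"])))
```

The number of variants doubles with every substitutable letter, so long words are usually limited to a few swaps in practice.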
Dictionary and Rule-Based Dictionary Attacks
The hybrid dictionary attack takes the words listed in a dictionary and
combines them with a brute-force element by prepending three digits to every
entry. It produces candidates such as 111apple up to 999apple. This method of
cracking a password can take some time, so adding a few rules to the guessing
process can shorten the time it takes to crack the password. This technique,
however, leaves a lot of room for the hacker's own judgment in defining the
rules that the password cracking software will apply (Madiraju, 2014).
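A small sketch of this idea is given below. It assumes that the three-digit prefixes run over all zero-padded values, which includes the 111apple to 999apple range from the text, and it uses two made-up rules (capitalize, append "!") purely for illustration.

```python
def expand(words, rules):
    """Yield each word unchanged and once per rule applied to it."""
    for word in words:
        yield word
        for rule in rules:
            yield rule(word)


def prepend_digits(words, width=3):
    """Prefix every word with each zero-padded number of `width` digits."""
    for word in words:
        for n in range(10 ** width):
            yield f"{n:0{width}d}{word}"


# Hypothetical rules; a cracking tool would let the user define these.
RULES = [str.capitalize, lambda w: w + "!"]

candidates = list(prepend_digits(expand(["apple"], RULES)))
print(len(candidates), candidates[0], candidates[-1])
```

Each rule multiplies the size of the candidate list, which is why the choice of rules has such a large effect on the time an attack takes.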
Rainbow Table Attacks
A rainbow table is a precomputed table used for reversing hashes. Each rainbow
table covers passwords of a particular length drawn from a defined set of
characters. This strategy aims to reduce the guessing time but is limited to
passwords no longer than nine characters and to unsalted hashes
(Zhang et al., 2017).
Biographical Information on Attacks
The author presented a survey of existing dictionary attack prevention
techniques and their drawbacks. The research explored Encrypted Key Exchange
(EKE) based protocols for protection against offline dictionary attacks. For
defeating online dictionary attacks, the paper discussed account lockout,
delayed response, extra computation and Reverse Turing Tests (RTT).
The password-based EKE by Steven Bellovin and Michael Merritt combines
cryptographic schemes to prevent offline dictionary attacks. However, it was
later seen that EKE and its variants are vulnerable to plaintext equivalence,
which allows a hacker to impersonate a user by using the user's hashed
password captured through eavesdropping (Fink, 1997). Furthermore, delayed
responses and account locking are basic countermeasures against online
dictionary attacks. They reduce the number of passwords that can be guessed in
a given time and lock the user account after a threshold of failed login
attempts is reached (Shapira, 2016). These countermeasures can result in
increased customer support costs because of account locking. Moreover,
attackers can try many login attempts in parallel against multiple user
accounts to evade the delayed response and account lockout countermeasures.
The extra computation technique requires a non-trivial computation in addition
to supplying the password (Lee, 1999). Such a technique adds a large overhead
for password attack tools, as it requires a computation for each login
attempt, thereby reducing the number of attempts. However, the extra
computation may introduce usability issues for a legitimate user, while a
hacker can handle the overhead by using a powerful attack machine or
environment.
Borror (2004) used a similar approach by incorporating the RTT method to keep
automated programs from carrying out dictionary attacks. Here, the user is
required to enter the password and pass the RTT for a login to succeed.
However, the author believes that RTT-based implementations are prone to RTT
relay attacks. Several real-world RTTs (also known as CAPTCHAs, which stands
for Completely Automated Public Turing test to tell Computers and Humans
Apart) implemented by popular online service providers have been broken in
the past using computer character recognition techniques. Methods for
improving authentication or security against password attacks range from
hardware-based solutions, biometric authentication, client certificate
mechanisms and graphical password schemes to grid-based logins and
multi-factor authentication (Almasizadeh, 2013).
Shapira (2016) presents a range of such existing authentication mechanisms
and the drawbacks related to them, which are discussed as follows. Hardware
and biometric-based authentication solutions form a robust authentication
method. Their disadvantages include extra costs, the overhead of requiring
additional devices and the migration away from traditional password-based
authentication. Moreover, hardware-based authentication solutions have
usability issues because devices can be lost or forgotten. Multi-factor
authentication schemes typically combine passwords (something one knows) with
hardware (something one has) or biometric solutions (something one is) and
therefore exhibit the same drawbacks as hardware and biometric solutions,
along with other convenience issues. Client certificates are another solution
that implements a software-based authentication approach, but they have
drawbacks related to the convenience and storage of keys.
Analysis of password security dates back to 1979, when Morris and Thompson
carried out a seminal analysis of more than 3000 passwords (Morris et al.,
1979). There are two ways in which password analysis can be carried out:
password cracking and semantic evaluation. Both methods focus mainly on the
individual password based on the user information and do not consider the
relationships that can be established between passwords. To solve this issue,
a technique was developed to ensure that these relationships are not neglected
when cracking passwords, as they lay a good foundation for hackers to crack
passwords using biographical information. The password relationship is one
such method, and it is based entirely on biographical information about the
users. It was discovered that 71% of the simple passwords were shorter than 6
characters and that 86% of those belonged to the same category, which was
clustered into the name category. Klein (1992) acquired a password file, tried
to crack the passwords and was able to crack 21% of them. It was also found
that more than 50% of the passwords were shorter than 6 characters.
Zhang (2010) stated that the modifications a user makes to their passwords
are fairly predictable, and observation was the method used to predict and
crack the password. Zhang's (2010) research focused on the relationships
between the passwords of a single user. However, methods have since evolved
that focus on the relationships between the passwords of different users,
based on biographical information and on the clustering methods that are used
to classify and segment the users.
Juels & Rivest (2013) proposed a simple method to improve the security of
hashed passwords: maintaining additional honeywords (false passwords)
associated with every user account. An adversary who steals the file of hashed
passwords and inverts the hash function cannot tell whether they have found
the real password or a honeyword. An attempt to log in with a honeyword sets
off an alarm. An auxiliary server, the honeychecker, can distinguish the
user's password from the honeywords during the login routine and sets off an
alarm when a honeyword is submitted (Juels & Rivest, 2013).
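The split between the login server and the honeychecker can be illustrated with a toy model. This is only a sketch of the idea under simplifying assumptions: the class and method names are invented here, the honeywords are supplied by the caller rather than generated, and unsalted SHA-256 stands in for a proper password hash.

```python
import hashlib
import random


def digest(word):
    # Assumption: unsalted SHA-256 keeps the sketch short; a real system
    # would use a salted, deliberately slow password hash.
    return hashlib.sha256(word.encode()).hexdigest()


class HoneyChecker:
    """Auxiliary server that stores only the index of the real password."""

    def __init__(self):
        self._index = {}

    def set_index(self, user, index):
        self._index[user] = index

    def is_real(self, user, index):
        return self._index.get(user) == index


class LoginServer:
    def __init__(self, checker):
        self.checker = checker
        self.sweetwords = {}  # user -> hashes of real password + honeywords

    def register(self, user, password, honeywords):
        words = list(honeywords) + [password]
        random.shuffle(words)  # hide the position of the real password
        self.sweetwords[user] = [digest(w) for w in words]
        self.checker.set_index(user, words.index(password))

    def login(self, user, attempt):
        hashes = self.sweetwords.get(user, [])
        attempt_hash = digest(attempt)
        if attempt_hash not in hashes:
            return "denied"
        if self.checker.is_real(user, hashes.index(attempt_hash)):
            return "ok"
        return "alarm"  # a honeyword was submitted


server = LoginServer(HoneyChecker())
server.register("alice", "john1965", ["john1956", "jon1965"])
print(server.login("alice", "john1965"), server.login("alice", "jon1965"))
```

Because the password file alone does not reveal which entry is real, an attacker who cracks all three hashes still has to guess, and a wrong guess is detectable.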
Li, Romdhani, & Buchanan (2016) described how text-based passwords are used
for mobile applications as well as web applications. For both kinds of
application, the paper investigated vulnerabilities and patterns in terms of
guessing entropy, Shannon entropy and minimum entropy. The study also
provides brief information on how to substantially improve password strength
based on the analysis of these entropies in text-based passwords. Furthermore,
by analyzing datasets of text-based passwords, the authors argue that
applications can be secured with strong passwords that are designed for
memorability, good usability, security entropy and deployability (Li,
Romdhani, & Buchanan, 2016).
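For readers unfamiliar with the three measures, the sketch below computes them for a small, made-up list of passwords using their standard textbook definitions. The exact formulations used in the cited paper may differ, and the function names and sample data are assumptions of this example.

```python
import math
from collections import Counter


def distribution(passwords):
    """Empirical probabilities of each password, most common first."""
    counts = Counter(passwords)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def shannon_entropy(probs):
    """H = -sum(p * log2 p): average uncertainty in bits."""
    return -sum(p * math.log2(p) for p in probs)


def min_entropy(probs):
    """H_min = -log2(max p): set by the single most likely password."""
    return -math.log2(max(probs))


def guessing_entropy(probs):
    """G = sum(i * p_i): expected guesses when trying the likeliest first."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


sample = ["123456", "123456", "password", "john1965"]  # made-up data
probs = distribution(sample)
print(shannon_entropy(probs), min_entropy(probs), guessing_entropy(probs))
```

Minimum entropy is the most pessimistic of the three because it depends only on the most popular password, which is the one an attacker tries first.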
Sandvoll (2014) described password management as a very important issue for
many people across the world. The study designs and implements a password
management system as an iOS application called PassCue. PassCue is based on
the Shared Cues password management model proposed by J. Blocki, A. Datta and
M. Blum in Naturally Rehearsing Passwords. The design and implementation
choices, including the evaluation of the parameters, were significant in
developing a secure and usable system. PassCue uses cues to share secrets
across several accounts in order to balance the competing goals of security
and usability. Moreover, PassCue provides higher security than several popular
password management techniques without significantly reducing usability.
Sandvoll (2014) also discussed password management and provided probabilistic
results to give a better understanding of password security. These results
describe the chance that an attacker compromises an account in an online
attack for PassCue (9, 4, 3), PassCue (43, 4, 1) and PassCue (60, 5, 1).
Furthermore, Sandvoll (2014) explained that cracking the passwords of PassCue
(9, 4, 3) and PassCue (43, 4, 1) in an offline attack, with no leakage of
previous plaintext passwords, would take more than thirty-eight years and cost
more than 700,000 dollars.
Moreover, cracking the passwords of PassCue (60, 5, 1) would take more than
1.5 million days and cost around $2.84442×10^(10) with current technology.
PassCue (9, 4, 3) does not require the user to invest extra time in keeping
the passwords in memory, while PassCue (43, 4, 1) and PassCue (60, 5, 1)
require the user to perform 11 and 20 rehearsals respectively. The design and
implementation of PassCue can also easily be customized to support other
security and usability needs and requirements. Furthermore, the PassCue
application uses only a small share of the resources of the iPhone 5: less
than 1% of the CPU and only 5.9 MB of memory in the idle state (Sandvoll,
2014).
Helkala & Snekkenes (2009) explained that it is easy for humans to design
passwords that are simple for them to remember but that look complex to
others and are difficult for other people to remember even if they see them.
Such passwords are familiar to their creator but complicated for other
people, which increases the strength of the password. However, passwords
generated this way may have a predictable structure or a particular pattern
that makes an exhaustive search feasible. Furthermore, Helkala & Snekkenes
(2009) divided human-generated passwords into three categories according to
their structure: mixture passwords, non-word passwords and word passwords.
Helkala & Snekkenes (2009) also analyzed some important aspects of the
password types mentioned previously. They analyzed the search-space reduction
within each category for many common password substructures. From this
analysis, the researchers derived some notable guidelines for creating strong
passwords in every category. Thus, the results of this study contribute toward
the goal of achieving passwords that are both memorable and strong (Helkala &
Snekkenes, 2009).
Chapter 3: Methodology of Biographical Dictionary Generator for Password Cracking
Generally, people want a password that is easy to remember and hard to crack
at the same time. More often than not these goals are mutually exclusive.
There is generally a rule, or a context-free grammar, which people follow for
their passwords. To explain this with an example, let L denote an alphabetic
string, D a digit string and S a string of special characters. One commonly
used type of password takes the form S1L8D3, that is, one special character
followed by eight letters and three digits. Weir et al. studied a dataset of
passwords and derived a context-free grammar for generating the most common
passwords. They took a probabilistic approach to the context-free grammar and
assigned different probabilities to different types of strings after
exhaustively studying the password dataset. This context-free grammar can be
used to generate a dictionary of the most commonly used passwords. The
probabilistic approach also helps with sorting the probable passwords.
People do not have an infinite amount of computing power when they are trying
to break a password, so it makes sense to try the most probable passwords
first and move to the least probable passwords later (Weir et al., 2009).
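The L, D and S notation can be made concrete with a short sketch that derives the base structure of a password and estimates how often each structure occurs in a list. It only covers the base-structure part of the grammar, not the probabilities of the individual letter, digit and symbol strings, and the function names and sample passwords are assumptions of this example.

```python
from collections import Counter
from itertools import groupby


def char_class(ch):
    """Map a character to L (letter), D (digit) or S (special)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def base_structure(password):
    """Collapse runs of the same class, e.g. '$password123' -> 'S1L8D3'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)
    )


def structure_probabilities(passwords):
    """Relative frequency of each base structure, most probable first."""
    counts = Counter(base_structure(p) for p in passwords)
    total = sum(counts.values())
    ranked = [(s, c / total) for s, c in counts.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


print(base_structure("$password123"))
print(structure_probabilities(["abc123", "xyz789", "hello!"]))
```

Ranking the structures by frequency is what allows a cracker to try the most probable shapes of password first.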
As people tend to use passwords that they can remember later on, most people
end up using some biographical information. To help understand the
correlation of personal information with user-generated passwords, a subset
of the publicly available leaked dataset of 12306.cn was used. This dataset
contains personal information of users along with the passwords they used.
The personal information available along with the passwords is listed in the
table below:
Table 1: Personal information type in 12306 dataset
Information Type | Description
Name | Name of the user in Chinese
Email address | The email address of the user
Cell phone number | Cell phone number of the user
Account name | The username which the user uses to log in to the platform
ID number | ID number which the government issued to the user
The ID number also holds important personal information that can be parsed
out: digits 7-14 of the ID are the birth date of the user and digit 17
represents the gender of the user. So, in addition to the information listed
in the table above, the birth date and gender of the user are also accessible.
In effect, six types of personal information are available, namely name,
birth date, cell phone number, email address, account name and gender.
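Extracting the two extra fields from the ID number is a matter of slicing. The sketch below assumes an 18-character ID with the birth date stored as YYYYMMDD, and it assumes the usual convention that an odd seventeenth digit means male and an even one means female. The ID used in the example is made up and its check digit is not validated.

```python
from datetime import date


def parse_id_number(id_number):
    """Return (birth_date, gender) parsed from an 18-character ID number."""
    if len(id_number) != 18:
        raise ValueError("expected an 18-character ID number")
    birth = id_number[6:14]  # digits 7-14, assumed to be YYYYMMDD
    birth_date = date(int(birth[:4]), int(birth[4:6]), int(birth[6:8]))
    # Assumption: odd digit 17 means male, even means female.
    gender = "male" if int(id_number[16]) % 2 == 1 else "female"
    return birth_date, gender


# Made-up ID for illustration only.
print(parse_id_number("110101196505150017"))
```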
To include this personal information in the generic representation of a
password with the characters L, D and S, new variables are introduced, namely
[NAME], [BD], [ACCT], [EMAIL] and [CELL]. Usually, people do not use their
gender in their password. If “John”, who was born in 1965, has the password
“john1965xyz”, it is represented as “[NAME][BD]L3” as opposed to L4D4L3. The
latter would lose very important information, because the biographical parts
of the password would be characterized as ordinary letters and digits, whereas
knowing them in advance helps in cracking the password.
Algorithm 1: Personal information matching
The passwords are then matched to personal information. For matching, the set of all possible substrings of a password is used, sorted in ascending order of length. These substrings are matched against the personal information of the user recursively, with the base case being that no personal information matches. A different technique is used for each type of personal information. The name is first converted to English characters and then matched in different arrangements such as first_name + last_name, last_name + first_name, and first_name_initial + last_name_initial. For the birth date, different forms are tried, such as only the last two digits of the year or all four digits. Other personal information is matched in a similar way. Personal information was used in around 60% of the passwords. The usage of each type is listed in the table below.
Table 2: Percentage of personal information use in a password in 12306 dataset
Information Type | Usage percentage
Birthdate | 24.1%
Account name | 23.6%
Name | 22.35%
Email | 12.66%
Cell phone number | 2.7%
This shows that people rely heavily on personal information when creating passwords, with the birth date being the most commonly used type. The correlation between personal information and the passwords chosen by users warrants a modification of the “Password Context-free Grammar” (PCFG). The PCFG is extended to include personal information. In addition to the usual L, D, and S, semantic symbols are introduced for personal information: B for birth date, N for name, E for email address, A for account name, and C for cell phone number. The password is matched against the personal information, and the number of matching characters is recorded as well. For example, if John has the password “helloJohn93”, it is translated as L5N4B2.
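The translation of a password into this extended notation can be sketched as below. This is an illustrative sketch under stated assumptions, not the project's implementation. The function names `lds_structure` and `tag_password` and the candidate sets are hypothetical. The sketch also tries longer substrings first so that the longest match wins, which is an assumption about the search order.

```python
import re


def lds_structure(text: str) -> str:
    """Encode a string as runs of L (letters), D (digits) and S (symbols)."""
    pattern = r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)"
    return "".join(f"{m.lastgroup}{len(m.group())}" for m in re.finditer(pattern, text))


def tag_password(password: str, info: dict) -> str:
    """Recursively replace substrings that match personal information.

    `info` maps a semantic symbol (N, B, E, A, C) to a set of lowercase
    candidate strings for one user.
    """
    lowered = password.lower()
    # Try every substring of length >= 2, longest first (assumed ordering).
    for length in range(len(password), 1, -1):
        for start in range(len(password) - length + 1):
            piece = lowered[start:start + length]
            for symbol, candidates in info.items():
                if piece in candidates:
                    left = tag_password(password[:start], info)
                    right = tag_password(password[start + length:], info)
                    return f"{left}{symbol}{length}{right}"
    # Base case: no personal information matched.
    return lds_structure(password)


# Hypothetical candidate sets for the user "John".
john = {"N": {"john"}, "B": {"1993", "93"}}
print(tag_password("helloJohn93", john))  # L5N4B2
```

Variants such as last_name + first_name or initials would simply be added to the candidate set for N before calling `tag_password`.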
The next phase is calculating the probabilities and generating the dictionary of password guesses. The L segments are filled in from a dictionary of the words most commonly used in passwords. The symbolic passwords are then stored in a dictionary ordered by probability, with the most probable at the top and the least probable at the bottom. Because the personal information differs for every user, it cannot be filled into the dictionary at this stage. The attack scenario is assumed to be that of an attacker who knows the target, because personal information about users is otherwise not easy to obtain. The realistic cases are therefore an attacker known to the user or a leaked password dataset that also contains personal information.
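The two steps described above, ranking symbolic passwords by probability and filling in the personal information only once a target is known, can be sketched as follows. This is a simplified illustration under stated assumptions. The names `rank_structures` and `instantiate` are hypothetical, probabilities are plain relative frequencies from the training data, and fillers for D and S segments are left out, so structures containing them yield no guesses here.

```python
import re
from collections import Counter
from itertools import product


def rank_structures(training_structures):
    """Return (structure, probability) pairs, most probable first."""
    counts = Counter(training_structures)
    total = sum(counts.values())
    return [(s, c / total) for s, c in counts.most_common()]


def instantiate(structure, user_info, l_words):
    """Expand one symbolic password into concrete guesses for one user."""
    options = []
    for symbol, length in re.findall(r"([A-Z])(\d+)", structure):
        pool = l_words if symbol == "L" else user_info.get(symbol, [])
        # Keep only fillers whose length matches the segment length.
        options.append([w for w in pool if len(w) == int(length)])
    return ["".join(parts) for parts in product(*options)]


ranked = rank_structures(["L5N4B2", "N4B4", "N4B4", "D6"])
john = {"N": ["john"], "B": ["93", "1993"]}  # hypothetical target
for structure, probability in ranked:
    print(structure, probability, instantiate(structure, john, ["hello", "admin"]))
```

Keeping the ranked structures separate from the per-user expansion means the expensive training step runs once, while the cheap expansion runs for each target.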
Coverage is also an important metric. It ranges from 0 to 1, and a higher value indicates a stronger correlation between personal information and the password. A coverage of 0 means that no personal information helps in cracking the password, and a coverage of 1 means that a single type of personal information is enough to crack the password completely. Although coverage is calculated for an individual password, the average coverage can be used as a metric for the correlation between personal information and passwords across a whole dataset.
Coverage is calculated with a sliding-window approach that takes the password and the user's personal information as input. A dynamically sized window slides from the start to the end of the password with an initial size of 2. If the segment under the window matches a certain type of personal information, the window grows by 1, and it keeps growing until a mismatch is found. This yields the longest match. At that point the window size is reset and the process restarts. At the same time, a tag array with the same length as the password is maintained. Take the tag array [4,4,4,4,0,0,2,2] of length 8 as an example. The first 4 elements {4,4,4,4} match a certain type of personal information, the next 2 elements {0,0} match nothing, and the last 2 elements {2,2} match another type of personal information. Coverage is defined as
$$\mathrm{CVG} = \sum_{i=1}^{n} \frac{l_i^2}{L^2}$$
where $n$ is the number of matched segments in the tag array, $l_i$ is the length of the $i$-th matched segment, and $L$ is the length of the password. The coverage of the above example is therefore $(4^2 + 2^2)/8^2 = 20/64 = 0.3125$. Coverage can be a good indicator for selecting the right password cracking algorithm. If the coverage is close to 0, it does not make much sense to use algorithms that take personal information into account.
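The coverage calculation can be checked with a short Python sketch. It assumes the tag array has already been produced by the sliding-window matching, and it sums the squared lengths of matched segments only, which is what reproduces the value 0.3125 for the worked example. The function name `coverage` is hypothetical, and adjacent segments carrying the same non-zero tag are treated as one segment.

```python
from itertools import groupby


def coverage(tags):
    """Coverage of one password, computed from its tag array.

    A tag of 0 marks characters that matched no personal information.
    """
    if not tags:
        return 0.0
    # Sum the squared lengths of the matched (non-zero) segments.
    matched = sum(len(list(run)) ** 2 for tag, run in groupby(tags) if tag != 0)
    return matched / len(tags) ** 2


print(coverage([4, 4, 4, 4, 0, 0, 2, 2]))  # 0.3125
```

Squaring the segment lengths rewards one long match more than several short ones, so a password made entirely of one type of personal information reaches a coverage of 1.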
This section discusses the methods chosen for the research project. The study is conducted from a secondary research stance, supported by an experiment that tests whether a profile of biographical information is enough to crack a password. Secondary research is carried out to support my findings. It suits this project better than empirical research, which would require collecting data from users about the biographical information their passwords are based on. That is difficult because it is a security issue, and participants may not be willing to share this information. Conducting the project from a secondary research perspective is also more credible, because it draws on research that has already been published in various journals.
This investigation relates to users' password security and to determining whether biographical information is enough to crack a password. The only way to gain an understanding of this is by experimenting, in this case by setting up dummy profiles containing biographical data, from which test participants create passwords. The profiles will include data such as:
• First and last name
• Date of birth
• Mobile number
• Favorite color, books, songs, and movies
• Interests and hobbies
Figure 3: The Iterative Model
There are a variety of methodologies that can be used within this research paper, such as:
• Agile
• Iterative
• Waterfall
Agile Methodology
The Agile methodology is a process for managing a project by breaking it up into several tasks (Sacolick, 2020). The work is organized into six main stages:
• Project Vision Statement (A summary of the goals of the project.)
• Project Roadmap (The standards that need to be achieved for the project vision.)
• Project Backlog (A priority list of what needs to be done for the project.)
• Release Plan (A timetable for the release of the working project.)
• Sprint Backlog (The requirements, tasks, and goals that are linked to the current sprint.)
• Increment (The final working project that could be the end product.)
Fernandez and Fernandez (2008) described Agile as a “set of values and principles”. The methodology works by having an idea of what the finished project will be and what problem it will solve. An Agile project runs through a cycle of planning, executing, and evaluating. The advantage of an agile method is that the quality of the project is higher, because the work is broken up into several manageable tasks (Fernandez et al., 2008).
Iterative Methodology
The Iterative model is a software development cycle that focuses on building a simple version of the project first and then adding complexity to it until the project meets its final goal. The iterative model contains five stages (Morse, 2016):
• Planning and Requirements
• Analysis and Design
• Implementation
• Testing
• Evaluation
The Iterative model is preferred over other methods, for example waterfall, because it allows enhancements to be implemented quickly, each one improving on the last iteration. This method was used by NASA in software development to aid the first manned space flight (Morse, 2016).
Waterfall Methodology
In the waterfall methodology, the specifications of the project are all gathered before the project begins, and a sequential project plan is then created to meet the requirements. The activities in the project are broken down into small, linear, sequential phases. This method is used for large projects and organizations, with benefits including flexibility for early design changes (Sherman, 2015). The stages involved in a waterfall methodology include:
• Requirements (A specification of what the application should do.)
• Analysis (An analysis of the models and business logic that will be used in the application.)
• Design (This stage covers the technical requirements such as programming language, services, and data layers.)
• Coding (The code is written, implementing the outputs of the previous stages.)
• Testing (Tests are carried out to discover any bugs that may need resolving.)
• Operation (This final stage is where the application is ready for deployment.)
This project will be carried out using the Iterative model, where the focus will be on building the password cracker first and then adding more complex features to the program. The iterative model is known to work well for small projects, and using it here will help ensure that the program is built to a high standard of quality. A basic password cracker will be built, and more complex features will then be implemented, such as the use of biographical data about the user and of Windows registry files such as the NTUSER.DAT file.
Chapter 4: Development of Biographical Dictionary Generator for Password Cracking
Development of Biographical Dictionary Generator for Password Cracking
The project demanded a high-performance computer, so I used a workstation with 1 TB of memory. The dataset used for evaluation was quite large and needed this kind of workstation. The project was implemented in the Python programming language, which is well suited to scripting, data analysis, and research. Python has libraries that make handling large datasets straightforward, and one of the most popular of these, NumPy, was also used in this project.
Unified Modeling Language
Unified Modeling Language (UML) is used to visualize the way a system has been designed. It is referred to as a visual language because diagrams are used to demonstrate the actions and structure of a software system. Seven diagrams can be used to portray the structure of a software system:
• Class Diagram (Describes the types of objects in the system and the variety of relationships that exist between them.)
• Component Diagram (Helps with the visualization of the physical components in a system.)
• Deployment Diagram (Specifies the physical hardware on which the software system will execute.)
• Object Diagram (A snapshot of the detailed state of a system at a point in time.)
• Package Diagram (Used to simplify complicated class diagrams.)
• Composite Structure Diagram (Shows the internal structure of a classifier, including its interaction points with other parts of the system.)
• Profile Diagram (Used to create domain-specific and platform-specific stereotypes and to define the relationships between them.)
These diagrams are known as structure diagrams: “structure diagrams show the static structure of the system” (Fakhroutdinov, 2019). A class diagram will be used for the biographical password cracker, as it gives a sense of orientation. It provides a detailed insight into the structure of the program while still giving an overview.
Testing/Results
The results of the Password Context-free Grammar (PCFG) are compared with the results of the Modified Password Context-free Grammar (MPCFG). PCFG is one of the best-known algorithms for creating dictionaries of possible passwords. The 12306 dataset was used: half of it as training data and the other half as testing data. For the L segments a “perfect” dictionary is used, meaning that all possible L-segment words were collected directly from the dataset. This eliminates the possibility of an unfair dictionary selection, because every word is guaranteed to be present in the dictionary and can be found efficiently. The dictionary contained more than 15,000 words.
The individual number of guesses is used as the metric to compare the effectiveness of MPCFG against PCFG. It is defined as the number of password guesses generated for each account. The bottleneck in cracking passwords lies in the number of hashing operations, including the added salt, so the cost is bounded by G·N, where G is the individual number of guesses and N is the size of the dataset being used.
For different numbers of guesses, the percentage of the entire password set that has been cracked is calculated. Both methods have a very quick start, since both begin with high-probability guesses. With 0.5 million guesses MPCFG achieves a success rate similar to the one PCFG achieves with 200 million guesses, which shows that adding personal information to the guesses improves password cracking considerably. MPCFG can also cover a larger password space, meaning it can crack more passwords than PCFG, because the information that personal data adds cannot be obtained from even the best dictionaries used with PCFG.
Figure 4: Comparison of PCFG and MPCFG
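The metric behind this comparison, the share of accounts cracked within a given individual number of guesses, can be sketched as follows. The function name `cracked_fraction` and the sample numbers are hypothetical and are not results from this project. The sketch assumes that each account has a recorded guess number at which its password was found, or `None` if it was never found.

```python
def cracked_fraction(guess_numbers, budget):
    """Fraction of accounts cracked within `budget` guesses per account."""
    if not guess_numbers:
        return 0.0
    cracked = sum(1 for g in guess_numbers if g is not None and g <= budget)
    return cracked / len(guess_numbers)


# Made-up guess numbers for four accounts; None means never cracked.
sample = [1, 50, None, 1000]
for budget in (10, 100, 1000):
    print(budget, cracked_fraction(sample, budget))
```

Evaluating this function over a range of budgets for each method gives the kind of curve that a comparison such as Figure 4 plots.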
Chapter 5: Discussion
Discussion of Biographical Dictionary Generator for Password Cracking
This document has described the research on a biographical dictionary generator for password cracking, which has a clear main goal. The main objective is to build an effective, efficient, and optimized program for biographical password generation, to improve the password generation method, and to explain briefly how dictionaries for password cracking are generated. Hacking of computers, websites, network systems, and other computing devices has become the norm and is increasingly mainstream, and it has both a negative and a positive side. Both sides can be seen in the real world, where many attackers are always ready to attack network and computer security for their own purposes. For estimation purposes, a study conducted by Harris Poll was reviewed. Its statistics show that the majority of people reuse old passwords across multiple web and mobile applications, about one-third use different passwords and patterns for their accounts, and a minority use the same password for all of their web and mobile accounts.
Beyond these objectives, the study provides a comprehensive discussion and reviews different related studies on password cracking techniques; an extensive literature review is included for this purpose. The most significant part of the research is the design and development of the password cracking program and the testing of the cracking algorithm to verify the technique. For the development of the program, the agile and iterative methodologies and their processes were analyzed to see whether they would be effective. The program first matches the relevant personal information of the users, also matching it against their different accounts, and then attempts to crack the password. For cracking, the password is divided into three categories. After the program was created, it needed some changes and modifications, which were based on observing the changes users make to their passwords. Passwords can be cracked when users change them: in simple terms, the technique relies on the fact that users change a password by making small modifications to the same string. Finally, the probability-based approach identifies how likely each candidate password is to match.
Critical analysis of product
Using personal information in password cracking increases the chances of a password being cracked, but it also has limitations. It can only be used when the person who wants to crack the password is a close associate of the user, because the personal information of random people is not easy to obtain, and when a dataset contains only passwords there is no way of getting the users' personal information. The other case is a leaked dataset that also contains the users' personal information, but leaked datasets usually do not include this kind of information.
Critical analysis of my process
The process used in this project has limitations, because only one dataset was used to evaluate performance and compare it with other techniques. The dataset also does not take into account people of other nations. It is possible that people in other countries do not use personal information in their passwords as the Chinese users in this dataset did. Although this is unlikely, we cannot be certain without evaluating such datasets.
Chapter 6: Conclusion of Biographical Dictionary Generator for Password Cracking
It is concluded that the password cracking program works effectively when users change the passwords of their accounts. The reviewed report on hacking provides significant statistics about users. The benefit of this research is to increase awareness of password cracking among organizations, researchers in the industry, and the wider audience. Hacking has become the norm and increasingly mainstream, and it may be seen in a positive or negative light; this study does not aim to promote hacking. Hackers can reduce the time taken to crack a password by tracing and understanding the conversation style and characteristics of the user. The success of a brute-force attack depends on the length of the password, because in this attack the hacker tries every combination of letters, numbers, and punctuation to find the password. With honeywords, an auxiliary server, the honeychecker, can distinguish the user's real password from the honeywords during the login routine, and an alarm is set off when a honeyword is submitted.
In this project, passwords are matched to personal information using the set of all possible substrings of a password, sorted in ascending order of length. Different techniques are used to match each type of personal information; for example, the name is first converted to English characters. The dataset used for evaluation was quite large and required a high-performance workstation.
Recommendations and future work
This section suggests modifications that would make the password cracking program and algorithm more optimized, more effective, and faster. The program should be optimized in the future to produce perfect biographical dictionaries based on the passwords, because password matching becomes faster when the program generates perfect dictionaries of text-based passwords. It is also recommended that the program be made more efficient so that it generates the dictionary file faster, and that the size limit of each password dictionary be increased, because the algorithm can currently generate only a limited number of records. Furthermore, the program should be amended to handle the bottlenecks in password cracking by generating larger password dictionaries. Future work will address the probability of password combinations and improve the matching between passwords and the related information. Every dataset record should be sorted and saved in the dictionary, so that guessing by probability becomes faster and more accurate.
Bibliography
1. Bhattacharjee, S. (2013). Cockcroft-Walton generator: Circuit Analysis And Applications. i-Manager's Journal on Electronics Engineering, 3(3), 20.
2. Bhole, M. M. (2017). Honeywords for Password Security and Management.
3. Bošnjak, L., et al. (2018). Brute-force and dictionary attack on hashed real-world passwords. 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).
4. Fernandez, D., et al. (2008). Agile Project Management: Agilism versus Traditional Approaches. Journal of Computer Information Systems, 10-17.
5. Fox, K. (2019). ‘True Biographies of Nations?’: The Cultural Journeys of Dictionaries of National Biography. ANU Press.
6. Helkala, K., & Snekkenes, E. (2009). Password Generation and Search Space Reduction. Journal of Computers, 663-669.
7. Juels, A., & Rivest, R. L. (2013). Honeywords: making password-cracking detectable. Proceedings of the 2013 ACM SIGSAC conference on Computer & communications, 145–160.
8. Kävrestad, J. (2017). Indexing, Searching, and Cracking. In Guide to Digital Forensics. Springer, Cham, 61-70.
9. Kävrestad, J. (2018). Cracking. In Fundamentals of Digital Forensics. Springer, Cham, 93-103.
10. Kody. (2018). Create Custom Wordlists for Password Cracking Using the Mentalist. https://null-byte.wonderhowto.com/how-to/create-custom-wordlists-for-password-cracking-using-mentalist-0183992/.
11. Lancrenon, J. K. (2013). Password-based Authenticated Key Establishment Protocols. In Computer and Information Security Handbook. Morgan Kaufmann (pp. 705-720).
12. Li, S., Romdhani, I., & Buchanan, W. (2016). Password pattern and vulnerability analysis for web and mobile applications. ZTE Communications, 14(S0), 32-36.
13. Madiraju, T. (2014). Dictionary Attacks and Password Selection. Rochester Institute of Technology, RIT Scholar Works.
14. Morris, R., et al. (1979). Password Security: A Case History. 22(11).
15. Morse, A. (2016, December 15). Iterative Model: What Is It And When Should You Use It. Retrieved from https://airbrake.io/blog/sdlc/iterative-model
16. Morse, A. P. (2016, November 23). Rapid Application Development (RAD): What Is It And How Do You Use It? Retrieved from https://airbrake.io/blog/sdlc/rapid-application-development
17. Notoatmodjo, G., et al. (2009). Passwords and Perceptions. Proc. 7th Australasian Information Security Conference (AISC 2009).
18. Raza, M., et al. (2012). A Survey of Password Attacks and Comparative Analysis on Methods for Secure Authentication. World Applied Sciences Journal, 19(4), 439-444.
19. Rivest, R. L. (2013). Honeywords: Making password-cracking detectable. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security (pp. 145-160).
20. Sacolick, I. (2020, February 25). What is agile methodology? Modern software development explained. Retrieved from https://www.infoworld.com/article/3237508/what-is-agile-methodology-modern-software-development-explained.html
21. Saliba, J. (2018). Extracting Gold: Creating Wordlists from AXIOM Cases to Crack Passwords. https://www.magnetforensics.com/blog/extracting-gold-creating-wordlists-axiom-cases-crack-passwords/.
22. Sandvoll, M. B. (2014). Design and Analysis of a Password Management System. Fakultet for informasjonsteknologi og elektroteknikk (IE).
23. Services.Google.Com. (2019). Retrieved from https://services.google.com/fh/files/blogs/google_security_infographic.pdf
24. Sharif, M., & Khan, A. U. (2007). Benchmarking of PVM and LAM/MPI Using OSCA Rocks and Knoppix Clustering Tools in ICCISSE. International Conference on Computer, Information and Systems Science and Engineering.
25. Sherman, R. (2015). Project Management. Business Intelligence Guidebook, 449–492.
26. Sterling, C. H. (2013). Biographical Dictionary of Radio. Routledge.
27. Vigliarolo, B. (2018). Brute force and dictionary attacks: A cheat sheet. https://www.techrepublic.com/article/brute-force-and-dictionary-attacks-a-cheat-sheet/.
28. Wang, R. C. (2016). Phoney: protecting password hashes with threshold cryptology and honeywords. International Journal of Embedded Systems, 8(2-3), 146-154.
29. Weir, M., et al. (2009). Password Cracking Using Probabilistic Context-Free Grammars. 30th IEEE Symposium on Security and Privacy.
30. Yazdi, S. H. (2011). Analyzing Password Strength & Efficient Password Cracking.
31. Zhang, L., et al. (2017). An Improved Rainbow Table Attack for Long Passwords. Procedia Computer Science, 107(C).
32. Zheng, Z., et al. (2018). An Alternative Method for Understanding User-Chosen Passwords. Security and Communication Networks, 1–12.