Anuja Vasant Kale, Vishwajeet Bajpayee, Shyam P Dubey
ABSTRACT: Every organization or institution has to deal with large amounts of data. This data may include confidential customer information, project-related data, employee personal data, and so on. If such confidential data leaks from the organization, it may harm the organization's health. Hence we have to enforce policies that prevent data leakage. Data leakage is a loss of data that can occur on any device where data is stored. Data can be leaked in two ways: the system is hacked, or internal resources intentionally or unintentionally make the data public. If the system is hacked, existing technologies such as antivirus software and firewalls can help protect the data. Here we discuss the second scenario and provide a data leakage prevention solution for it. We use Bayes' theorem to maintain the confidentiality of data in an organization.
KEYWORDS: sensitive data, data leakage, internal attack, external attack, data leakage prevention, Bayesian approach.
Nowadays, information security has become a vital subject, especially with the spread of information sharing across private and public networks for organizations in different industrial sectors all over the world (e.g. telecom, banking, education). Securing information plays a significant role when sharing, distributing, accessing and publishing any information that has been classified as sensitive, either for the organization itself or for the clients who share their private information with it. This includes information stored, shared, distributed and viewed through the electronic document systems and/or paper-document imaging systems widely used by many organizations.
Many organizations have given a great deal of attention to protecting their sensitive data from outside threats by using security countermeasures such as intrusion prevention systems, firewalls, and vulnerability management. Organizations must now turn their attention to an equally critical situation that poses a great challenge for them today: the problem of data leakage or loss from the inside.
In fact, many organizations have a gaping hole in controlling, monitoring, and protecting their business environment and electronic data assets from leakage or loss to the wrong individuals or groups, whether intentional or accidental. This hole is now ubiquitous among businesses, health and education organizations, and individuals who need to communicate with each other over the Internet.
Today, many of the electronic communication channels heavily used inside an organization, for instance local mail, instant messaging, web mail, file transfers, and the organization's website, still carry data to different destinations without any limitation, monitoring, or control over its movement out of the organization. As a result, there is a big potential for the organization's confidential information to fall into the wrong hands. The organization's sensitive data must therefore be well protected; otherwise the organization will face serious consequences such as business loss, damaged reputation, bad publicity, loss of strategic customers, and loss of competitiveness.
As a result, any organization using a similar electronic document system must keep a close eye on the sensitive data that passes back and forth through this system or application in order to maintain its reputation and business continuity, ensure compliance with regulations and laws, and differentiate itself from others. One of the recent methodologies and technical solutions to gain prominence is the Data Leakage Prevention (DLP) solution, which protects an organization's sensitive data from being viewed by the wrong individuals, whether outside or inside the organization. This means that specific data can be viewed only by a specific set of authorized individuals or groups.
- Related work
As organizations progress into a more technological environment, the amount of digitally stored data increases dramatically. As a consequence, keeping track of where it is stored is no longer as easy as before. The modern workforce naturally creates and uses data sensitive to the organization to do their job. This data is then used across services such as email, business applications and cloud services, and is accessed from multiple devices, including laptops and mobile phones. In many cases it is hard even for users themselves to manage the amount of data they deal with, and the (ir)responsibility doesn’t end there: a user also needs to keep track of how sensitive the data is and who should be allowed to access it.
DLP is a recent type of security technology that works toward securing sensitive data in an automated and non-intrusive fashion. Through policies, a DLP system automatically makes sure no sensitive data is stored, sent or accessed where it shouldn’t be, while still allowing users to use the tools and services they choose and need to fulfil their tasks. Unlike traditional white- and blacklisting, DLP only blocks the actions where sensitive data is involved, e.g. sending e-mails is perfectly acceptable, but not if they contain sensitive data. DLP can also be set to handle different levels of sensitivity and document access control. To quote George Lawton: “DLP systems keep people from deliberately or inadvertently sending out sensitive material without authorization”.
In addition to protecting sensitive data, a modern DLP should be adaptive, mobile and as minimally intrusive as possible. Adaptive means that it can work in different environments and be configured to meet the needs of a wide range of different businesses. Mobile means that it can still protect the data, even when the device is used outside the company network. The products today only fulfil this to a certain degree. DLP is still maturing, but unlike a few years ago, most vendors have standardized on the core functionality that defines a modern DLP solution.
- Proposed System
In today's business world, many organizations use information systems to manage their sensitive and business-critical information. The need to protect such a key component of the organization cannot be overemphasized. Data Loss/Leakage Prevention has been found to be one of the effective ways of preventing data loss.
DLP solutions detect and prevent unauthorized attempts, whether intentional or unintentional, to copy or send sensitive data by people who are authorized to access that information. DLP is designed to detect potential data breach incidents in a timely manner by monitoring data. Data Loss Prevention is a leakage/loss control mechanism that fits naturally with the organizational structure of businesses. It helps the organization protect not only structured data but also unstructured data.
Any organization or institution has to maintain large amounts of sensitive or confidential data. This data may include confidential project information, privileged customer data, or employee personal data. If such confidential data leaks from the organization, it may harm the organization's health.
Data leakage is a loss of data that can occur on any device that stores data. It is a problem for anyone who uses a computer. Data loss happens when data is physically or logically removed from the organization, either intentionally or unintentionally.
Data stored on any storage device can be leaked in two ways: the system is hacked, or internal resources intentionally or unintentionally make the data public.
Hacking can be guarded against by carefully configuring firewalls and other security devices. We discuss the second scenario, i.e. an internal resource making sensitive data public. Consider the possibility of an employee leaking sensitive data. Data can leave the organization in various ways: via the Internet, email, webmail, FTP, etc. Suppose an employee forwards confidential data through email or uploads the files to a server that can be accessed by the outside world. Before that confidential data reaches an unauthorized person, we need to enforce policies that protect the organization's health.
To achieve this, the primary requirement is to scan all outbound traffic. We maintain a DLP (Data Leakage Prevention) server, which scans each complete attachment for matching patterns. If a pattern matches, the attachment is corrupted with a user-designed message and an automated response email is sent out. This mechanism is shown in the figure below.
Figure 3: The process of Data Leakage Prevention (DLP) mechanism.
As shown in the figure above, an internal employee of the organization tries to send confidential data via email. Before that confidential data reaches an unauthorized person, we need to enforce some policies. For this we use the Data Leakage Prevention (DLP) server.
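The scanning step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pattern set, the block message, the `ScanResult` type and the `scan_attachment` function are hypothetical names introduced here, and the attachment is treated as already-extracted plain text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical patterns (assumption); a real deployment would define its own.
SENSITIVE_PATTERNS = {
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),
}

# Hypothetical user-designed message that replaces a blocked attachment.
BLOCK_MESSAGE = "This attachment was removed by the DLP server."


@dataclass
class ScanResult:
    attachment: str          # content that is allowed to leave the organization
    blocked: bool            # True if any pattern matched
    matched: List[str]       # names of the patterns that matched
    notice: Optional[str]    # automated response for the sender, if blocked


def scan_attachment(sender: str, attachment_text: str) -> ScanResult:
    """Scan the complete attachment text against every sensitive pattern."""
    matched = [name for name, pattern in SENSITIVE_PATTERNS.items()
               if pattern.search(attachment_text)]
    if not matched:
        # Nothing sensitive found: the attachment passes through unchanged.
        return ScanResult(attachment_text, False, [], None)
    # A pattern matched: replace the content and prepare the automated reply.
    notice = (f"To {sender}: your outbound attachment was blocked "
              f"(matched: {', '.join(matched)}).")
    return ScanResult(BLOCK_MESSAGE, True, matched, notice)


clean = scan_attachment("alice@example.com", "Lunch menu for Friday")
leak = scan_attachment("bob@example.com", "Salary sheet for EMP-004217")
```

In this sketch `clean` passes through unchanged, while `leak` has its content replaced and carries a notice for the sender. Actually sending that notice by email is left out.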
3.1 Data Leakage Prevention (DLP) Server
Data Leakage Prevention (DLP) is a computer security term for systems that identify, monitor, and protect data in use, data in motion, and data at rest. DLP identifies sensitive content by using deep content analysis to peer inside files and network communications. DLP is designed to protect information assets with minimal interference in business processes. It also enforces protective controls to prevent unwanted incidents. DLP can also be used to reduce risk, improve data management practices, and even lower compliance costs.
A DLP solution prevents confidential data loss by monitoring communications that leave the organization and encrypting emails that contain confidential information. It enables compliance with global privacy and data security requirements and secures outsourcing and partner communication. To check whether an email contains confidential data, the DLP server uses Naive Bayes spam filtering.
3.2 Naive Bayes Algorithm
The Naive Bayesian method is used for the learning process. A mail is analyzed to calculate its probability of being spam using the individual characteristics of the words in the mail.
For each word in the mail, calculate the following:
S (w) = (number of Spam emails containing the word)/(total number of Spam emails)
H (w) = (number of Ham emails containing the word)/(total number of Ham emails)
P (w) = S(w)/(S(w)+H(w))
P (w) can be interpreted as the probability that a randomly chosen email containing the word w is Spam.
Say a word w = “success” has appeared only once, and in a Spam email. Then H(w) = 0 and the above formula gives P(w) = 1.
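The three ratios above can be computed directly from a labelled corpus. The following Python sketch assumes each email has already been reduced to a set of words; the tiny corpus and the function name `word_probability` are hypothetical and serve only to reproduce the “success” case.

```python
def word_probability(word, spam_emails, ham_emails):
    """Return (S(w), H(w), P(w)) for one word.

    Each email is represented as a set of words (an assumption of this sketch).
    P(w) is None when the word has never been seen in either class.
    """
    s = sum(word in email for email in spam_emails) / len(spam_emails)
    h = sum(word in email for email in ham_emails) / len(ham_emails)
    if s + h == 0:
        return s, h, None
    return s, h, s / (s + h)


# Hypothetical training data, for illustration only.
spam_emails = [{"success", "offer", "free"}, {"free", "offer"}]
ham_emails = [{"meeting", "report"}, {"free", "lunch", "meeting"}]

s_w, h_w, p_success = word_probability("success", spam_emails, ham_emails)
p_free = word_probability("free", spam_emails, ham_emails)[2]
```

Here “success” occurs in one Spam email and in no Ham email, so `p_success` is 1, whereas “free” occurs in both classes and gets a value strictly between 0 and 1.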
This doesn’t mean that all future mails containing this word will be considered as Spam. It will rather depend upon its degree of belief. The Bayesian method allows us to combine our intuitive background information with this collected data.
Degree of belief f(w) = [(s*x) + (n*P(w))]/(s + n)
s = assumed strength of the background information.
x = assumed probability of the background information.
n = number of emails received containing word w.
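A short Python sketch shows how the degree of belief tempers an extreme P(w) when little data is available. The default values s = 1 and x = 0.5 are assumptions chosen for illustration; the source does not fix them.

```python
def degree_of_belief(p_w, n, s=1.0, x=0.5):
    """f(w) = (s*x + n*P(w)) / (s + n).

    s and x are the assumed strength and probability of the background
    information; the defaults here are illustrative assumptions.
    """
    return (s * x + n * p_w) / (s + n)


# A word seen once, in a Spam email: P(w) = 1, n = 1.
f_once = degree_of_belief(1.0, 1)
# The same word seen in 100 emails, all Spam: P(w) = 1, n = 100.
f_often = degree_of_belief(1.0, 100)
# A word never seen before: the result falls back to x.
f_unseen = degree_of_belief(0.0, 0)
```

With a single observation the belief is pulled halfway back toward x, and it approaches P(w) only as n grows.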
Combining the probabilities
Each email is represented by a set of probabilities. Combining these individual probabilities gives the overall indicator of spamminess.
H = Chi_inverse(-2*ln(Product of all f(w)), 2*n)
S = Chi_inverse(-2*ln(Product of all (1 - f(w))), 2*n)
I = (1 + H - S)/2
Here, n is the number of word probabilities being combined and I is the Indicator of Spamminess.
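The combination step can be sketched in Python without external libraries. The sketch makes two assumptions: Chi_inverse is read as the probability that a chi-square variable with the given (even) degrees of freedom is at least the given value, and the indicator is taken to be I = (1 + H - S)/2, as in Gary Robinson's chi-square combining scheme. The function names are hypothetical.

```python
import math


def chi_inverse(x2, dof):
    """P(chi-square with `dof` degrees of freedom >= x2), for even dof only."""
    if dof % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed form for even dof: exp(-m) * sum_{i=0}^{dof/2 - 1} m**i / i!
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spamminess(beliefs):
    """Combine the per-word degrees of belief f(w) into one indicator I.

    Every f(w) must lie strictly between 0 and 1 so that both logarithms exist.
    """
    n = len(beliefs)
    log_f = sum(math.log(f) for f in beliefs)
    log_not_f = sum(math.log(1.0 - f) for f in beliefs)
    h = chi_inverse(-2.0 * log_f, 2 * n)
    s = chi_inverse(-2.0 * log_not_f, 2 * n)
    return (1.0 + h - s) / 2.0  # assumed form of the indicator


neutral = spamminess([0.5, 0.5, 0.5])
spammy = spamminess([0.99, 0.95, 0.9])
hammy = spamminess([0.01, 0.05, 0.1])
```

Under these assumptions a mail whose words are all neutral lands at 0.5, mostly high f(w) values push I toward 1, and mostly low values push it toward 0.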
A Genetic Algorithm
A mail can be divided into three parts: the body, the “from” field, and the subject.
A genetic algorithm can be used to find appropriate weights, say α, β and γ, for the “body” part, the “from” part and the “subject” part.
I_Final = α*I_Body + β*I_From + γ*I_Subject
The overall accuracy is a function of α, β and γ. The genetic algorithm maximizes this function.
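A minimal genetic-algorithm sketch for choosing the three weights is given below in Python. It is not the authors' implementation: the labelled samples, the 0.5 decision threshold, the constraint that the weights sum to 1, and the particular selection, crossover and mutation choices are all assumptions made for illustration.

```python
import random


def final_indicator(weights, parts):
    """I_Final = alpha*I_Body + beta*I_From + gamma*I_Subject."""
    alpha, beta, gamma = weights
    return alpha * parts["body"] + beta * parts["from"] + gamma * parts["subject"]


def accuracy(weights, samples, threshold=0.5):
    """Fraction of samples classified correctly (threshold is an assumption)."""
    correct = sum((final_indicator(weights, parts) >= threshold) == is_spam
                  for parts, is_spam in samples)
    return correct / len(samples)


def normalise(weights):
    """Scale the weights so they sum to 1 (an assumed constraint)."""
    total = sum(weights)
    return tuple(w / total for w in weights)


def evolve(samples, generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [normalise([rng.random() + 1e-9 for _ in range(3)])
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=lambda w: accuracy(w, samples), reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            i = rng.randrange(3)
            child[i] = max(1e-9, child[i] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(normalise(child))
        population = parents + children
    return max(population, key=lambda w: accuracy(w, samples))


# Hypothetical per-part indicators with their true labels (True = Spam).
samples = [
    ({"body": 0.9, "from": 0.2, "subject": 0.8}, True),
    ({"body": 0.7, "from": 0.9, "subject": 0.6}, True),
    ({"body": 0.2, "from": 0.8, "subject": 0.1}, False),
    ({"body": 0.4, "from": 0.1, "subject": 0.3}, False),
    ({"body": 0.6, "from": 0.3, "subject": 0.9}, True),
    ({"body": 0.3, "from": 0.7, "subject": 0.4}, False),
]
best = evolve(samples)
best_accuracy = accuracy(best, samples)
```

Keeping the weights normalised means I_Final stays in the same 0 to 1 range as the per-part indicators, so a single threshold remains meaningful.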
Advantages of Bayesian Method
- The Bayesian approach is self-adapting. It keeps learning from new spam.
- The Bayesian method takes the whole message into account.
- The Bayesian method is easy to use and very accurate (the claimed accuracy is 97%).
- The Bayesian approach is multi-lingual.
- The Bayesian approach reduces the number of false positives.
Preventing the leakage of sensitive data has become one of the most pressing security issues facing organizations today. The most effective approach is to treat a Data Leakage Prevention (DLP) solution as part of the organization's overall security strategy. The solution can be fully integrated with the other security tools within the organization to form a comprehensive security plan that protects this data properly. A DLP solution can effectively reduce intentional leakage of sensitive data by monitoring users' actions and protecting three groups of the organization's data: data at rest, data in use, and data in motion. The solution can be regarded as “integrated” because it provides two layers of defence: protecting the organization's sensitive data and securing it. The organization also needs to create an Acceptable Use Policy (AUP) for users and ensure that both comply with organization policies. To avoid being broadsided by a data leak, organizations must evaluate their vulnerabilities and respond appropriately, for example with endpoint protection, gateway protection, and data encryption.
 Bradley R. Hunter, Available: http://www.ironport.com/pdf/ironport_dlp_booklet.pdf
 Data loss problems, Available: http://www.webspy.com/reso urces/whitepapers/2009WebSpy Ltd-Information Security and Data Loss Prevention.pdf
 Report, the Office of the U.S. Trade Representative, Available: http://www.ustr.gov/about -us/press- office/reports-and-publications/archive
 Lubich, H.P; “The changing roel of IT security in an Internet world, a business perspective”; Available: http://www.terena.nl/conference/archieve/tnc2000/proceedings/2A/2a2.html
Sithirasenan, E.;Muthukkumarasamy, V., “Word N-Gram Based Classification for Data Leakage Prevention”, Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on 16-18 July 2013, 578 – 585, Melbourne, VIC, 13971211, 10.1109/TrustCom.2013.71.
Pham, D.V. “Threat analysis of portable hack tools from USB storage devices and protection solutions,” IEEE ISBN: 978-1-4244-8001-2
 Bai Xiaoping; Wei Yuanfeng; , “Study on the signal detection and simulation of universal serial bus 2.0 IP core circuit system, “SoutheastCon, 2007. Proceedings. IEEE , vol., no., pp.59-62, 22-25 March 2007
 S. Jithesh and U. Naveen, “Improved key management methodology for enhanced media security in IMS networks”, New York, US: Institute of Electrical and Electronics Engineers Inc., 2007, pp. 903-907.
 AK. Gupta, U. Chandrashekhar, S.V. Sabnis and F.A, “Building secure products and solutions”, Bell Labs Technical Journal, Hoboken, US: John Wiley and Sons Inc., 2007.3, pp. 21-38
 R.A. Shaikh, S. Rajput, S.M.H. Zaidi and K. Sharif, “Comparative analysis and design philosophy of next generation unified enterprise application security”, Piscataway, US: Institute of Electrical and Electronics Engineers Computer Society, 2005, pp. 517-524.
 Data Leakage Prevention A newsletter for IT Professionals Issue 5.
 Data Leakage Detection SandipA.Kale1, Prof. S.V.Kulkarni2 Department Of CSE, MIT College of Engg, Aurangabad, Dr.B.A.M.University, Aurangabad (M.S), India1,
 Journal Of Information, Knowledge And Research In Computer Engineering Issn: 0975 – 6760| Nov 12 To Oct 13 | Volume – 02, Issue – 02| Page 534 Data Leakage Detection Nikhil Chaware 1,Prachi Bapat 2, Rituja Kad 3, Archana Jadhav 4, Prof.S.M.Sangve
Copyright to IJIRCCE www.ijircce.com 1