OmniTera Software (https://omniterasoft.com)

AI and Cybersecurity: A New Era
Published: Thu, 02 May 2024
Permalink: https://omniterasoft.com/ai-and-cybersecurity-a-new-era/

In the evolving landscape of artificial intelligence (AI), both cybersecurity teams and hackers are using AI to their advantage.

Key Takeaways

  • Cybercriminals are using AI to carry out a variety of sophisticated attacks, from data poisoning to deepfakes.
  • Cybersecurity organizations also increasingly rely on AI to help flag suspicious data and detect or thwart attacks.
  • To help keep your data safe, review your current cybersecurity protection and make sure it follows best practices.

If you recently used your car’s GPS system, relied on auto-correct when writing an email or conducted an online search, chances are you’ve experienced artificial intelligence (AI).

So, let’s discuss the basics of AI, how cybersecurity teams and hackers are using AI, and how you can help keep yourself safe.

What Is AI?

AI is a broad term that refers to the science of simulating human intelligence in machines with the goal of enabling them to think like us and mimic our actions. This would allow AI machines to perform tasks that previously only human beings could handle. For some tasks, AI may even surpass human beings.

Many AI machines attempt to determine the best way to achieve an outcome or solve a problem. They typically do this by analyzing enormous amounts of training data and then finding patterns in the data to replicate in their own decision-making.

While AI may seem futuristic, the concept behind it is believed to have begun in 1950, when British mathematician and logician Alan Turing speculated about “thinking machines” that could reason similarly to humans.1 The term “artificial intelligence” was born a few years later.2

How AI Benefits Cybersecurity

Artificial intelligence (AI) is reshaping nearly every industry – and cybersecurity is no exception. A recent research report estimated the global market for AI-based cybersecurity products was about $15 billion in 2021 and will surge to roughly $135 billion by 2030.3

Cybersecurity organizations increasingly rely on AI in conjunction with more traditional tools such as antivirus protection, data-loss prevention, fraud detection, identity and access management, intrusion detection, risk management and other core security areas. Because of the nature of AI, which can analyze enormous sets of data and find patterns, AI is uniquely suited to tasks such as:

  • Detecting actual attacks more accurately than humans, creating fewer false-positive results, and prioritizing responses based on their real-world risks;
  • Identifying and flagging the type of suspicious emails and messages often employed in phishing campaigns;
  • Simulating social engineering attacks, which help security teams spot potential vulnerabilities before cybercriminals exploit them; and
  • Analyzing huge amounts of incident-related data rapidly, so that security teams can swiftly take action to contain the threat.
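As a toy illustration of the pattern-finding idea behind these tasks, the sketch below flags login counts that deviate sharply from an account's historical baseline. This is a simplified, hypothetical stand-in for AI-based anomaly detection, not any vendor's engine; the data and threshold are invented for the example:

```python
def flag_anomalies(values, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0  # avoid division by zero for constant data
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

# Daily login counts for one account; the spike on the last day is suspicious.
logins = [12, 15, 11, 14, 13, 12, 16, 14, 13, 480]
print(flag_anomalies(logins, threshold=2.5))  # → [9]
```

Real systems learn far richer baselines over many signals, but the principle is the same: model normal behavior, then surface the outliers for a human or an automated response.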

Additionally, AI has the potential to be a game-changing tool in penetration testing—intentionally probing the defenses of software and networks to identify weaknesses. By developing AI tools to target their own technology, organizations will be better able to identify their weaknesses before hackers can maliciously exploit them.

Having this intelligence would provide cybersecurity organizations with a significant edge in preventing future attacks. Stopping breaches before they occur would not only help protect the data of individuals and companies, but also lower IT costs for businesses.

How Hackers Abuse AI

Unfortunately, cybercriminals are relentless and resourceful. Let’s look at several ways they’re using AI for their own benefit:

  1. Social engineering schemes:

These schemes rely on psychological manipulation to trick individuals into revealing sensitive information or making other security mistakes. They include a broad range of fraudulent activity categories, including phishing, vishing and business email compromise scams.

AI allows cybercriminals to automate many of the processes used in social-engineering attacks, as well as create more personalized, sophisticated and effective messaging to fool unsuspecting victims. This means cybercriminals can generate a greater volume of attacks in less time—and experience a higher success rate.

  2. Password hacking:

Cybercriminals exploit AI to improve the algorithms they use for deciphering passwords. The enhanced algorithms provide quicker and more accurate password guessing, which allows hackers to become more efficient and profitable. This may lead to an even greater emphasis on password hacking by cybercriminals.

  3. Deepfakes:

This type of deception leverages AI’s ability to easily manipulate visual or audio content and make it seem legitimate. This includes using phony audio and video to impersonate another individual. The doctored content can then be broadly distributed online in seconds—including on influential social media platforms—to create stress, fear or confusion among those who consume it.

Cybercriminals can use deepfakes in conjunction with social engineering, extortion and other types of schemes.

  4. Data poisoning:

Hackers “poison” or alter the training data used by an AI algorithm to influence the decisions it ultimately makes. In short, the algorithm is being fed with deceptive information, and bad input leads to bad output.

Additionally, data poisoning can be difficult and time-consuming to detect. So, by the time it’s discovered, the damage could be severe.
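To make "bad input leads to bad output" concrete, here is a toy, hypothetical illustration (not any real detector): a nearest-centroid classifier over invented one-dimensional "threat scores". Injecting high scores mislabeled as benign drags the benign centroid upward until a clearly malicious score is misclassified:

```python
def centroid_classify(point, training):
    """Assign `point` to the class whose centroid (mean of its samples) is nearest."""
    best_label, best_dist = None, float("inf")
    for label, samples in training.items():
        centroid = sum(samples) / len(samples)
        if abs(point - centroid) < best_dist:
            best_label, best_dist = label, abs(point - centroid)
    return best_label

# Hypothetical training data: low scores are benign, high scores malicious.
clean = {"benign": [1.0, 2.0, 1.5], "malicious": [8.0, 9.0, 8.5]}
print(centroid_classify(7.0, clean))      # → malicious

# Poisoning: the attacker injects high scores mislabeled as "benign",
# shifting the benign centroid toward the malicious region.
poisoned = {"benign": [1.0, 2.0, 1.5] + [7.2] * 8,
            "malicious": [8.0, 9.0, 8.5]}
print(centroid_classify(7.0, poisoned))   # → benign
```

A handful of poisoned samples is enough to flip the decision here, which is exactly why poisoning that trickles in slowly can go unnoticed until the damage is done.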

Staying Secure in a Changing AI Environment

As AI evolves, concerns about data privacy and risk management for both individuals and businesses continue to grow. Regulators are considering ways to develop AI and maximize its benefits while reducing the likelihood of negative impacts on society. However, there is currently no comprehensive federal AI legislation in the United States.

So, what does all this mean to you? How do the advancements in AI impact your life from a security perspective?

Fortunately, the answer is surprisingly simple. You don’t need to learn a new set of cybersecurity rules. Instead, you should review your current cybersecurity protection and make sure it follows best practices in critical areas such as passwords, data privacy, personal cybersecurity and especially social engineering.

Doing so makes it easier for all of us to enjoy the conveniences and other enhancements that AI makes possible in our daily lives.

GitOps – A Framework to Implement DevOps Best Practices
Published: Thu, 02 May 2024
Permalink: https://omniterasoft.com/gitops-a-framework-to-implement-devops-best-practices/

Introduction

In the ever-evolving world of DevOps, GitOps has emerged as a powerful framework that leverages Git’s potential to the fullest. GitOps, a term coined by Weaveworks, is a way to do Kubernetes cluster management and application delivery. It works by using Git as a single source of truth for declarative infrastructure and applications.

What is GitOps?

GitOps is a paradigm or a set of practices that empowers developers to perform tasks which typically fall under the purview of IT operations. GitOps requires us to describe and observe systems with declarative specifications that can be version controlled and managed with Git.

Why GitOps?

The adoption of GitOps brings numerous benefits to the table:

  1. Increased Productivity: GitOps allows developers to use familiar tools like Git and Continuous Deployment tools.
  2. Enhanced Developer Experience: Developers are more comfortable writing code, reviewing pull requests and merging than writing scripts and performing manual deployments.
  3. Improved Stability: GitOps provides a stable framework for cloud-native application delivery.
  4. Better Security: With Git at the center of your delivery pipelines, every pull request has a traceable record. This improves auditability and compliance.

GitOps Best Practices

Here are some best practices to follow while implementing GitOps:

  1. Use Declarative Infrastructure: All resources should be defined declaratively. Declarative descriptions are idempotent, meaning they can be reapplied without changing the outcome.
  2. Version Control System as a Single Source of Truth: The canonical desired system state should be versioned in Git. This makes it easy to track every change.
  3. Automated Delivery: Changes to the desired state in version control should automatically result in system state change. This ensures a fast and consistent delivery.
  4. Software Agents to Ensure Correctness: Software agents should continuously monitor and ensure correctness. An alert should be triggered if there’s a divergence between the desired and actual state.
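The fourth practice, software agents that ensure correctness, boils down to a reconciliation loop: diff the desired state in Git against the actual cluster state and act on (or alert about) any divergence. Below is a hypothetical, minimal Python model of that loop, not the implementation of any real controller; the resource names and specs are invented:

```python
def diff_states(desired, actual):
    """Return the changes needed to reconcile actual state with desired state.

    Both states are dicts mapping resource name -> spec (any comparable value).
    """
    changes = []
    for name, spec in desired.items():
        if name not in actual:
            changes.append(("create", name))     # in Git, missing from the cluster
        elif actual[name] != spec:
            changes.append(("update", name))     # drifted from the declared spec
    for name in actual:
        if name not in desired:
            changes.append(("delete", name))     # running, but not declared in Git
    return changes

desired = {"web": {"replicas": 3}, "worker": {"replicas": 2}}
actual  = {"web": {"replicas": 2}, "debug-pod": {"replicas": 1}}
print(diff_states(desired, actual))
```

A real agent would run this comparison continuously, apply the changes automatically, and raise an alert when drift keeps reappearing, which is how manually created resources like the `debug-pod` above get caught.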

Conclusion

Embracing GitOps practices can lead to more efficient and reliable delivery pipelines, making it an excellent choice for teams looking to scale their DevOps efforts. As the cloud-native ecosystem continues to evolve, GitOps is likely to play an increasingly important role in how we build and manage our systems.

Vector-based Search: An Efficient Technique for Unstructured Duplicate Data Detection
Published: Wed, 24 Apr 2024
Permalink: https://omniterasoft.com/vector-based-search-an-efficient-technique-for-unstructured-duplicate-data-detection/

Organizations today are driven by a competitive landscape to make insights-led decisions at speed and scale, and data is at the core of this. Capturing, storing and analyzing large volumes of data properly has become a business necessity. Analyst firm IDC predicts that the global creation and replication of data will reach 181 zettabytes in 2025. However, almost 80% of that data will be unstructured, and far less of it will be analyzed and stored.

A single user or organization may collect large amounts of data in multiple formats such as images, documents and audio files, which consume significant storage space. Most storage applications use a predefined folder structure and give a unique file name to every item stored. Because each copy receives its own unique name, the same file can exist under several different names, making it difficult to identify duplicate data without checking its content.

This blog focuses on the challenges associated with data duplication and how to detect duplicates in unstructured folder directories.

The complications of unstructured data

Unstructured data is defined as data that lacks a predefined data model or that cannot be stored in relational databases. According to one report, 80% to 90% of the world’s data is unstructured, the majority of it created in the last couple of years, and it is growing at a rate of 55%-65% every year. Unstructured data may contain large amounts of duplicate data, limiting enterprises’ ability to analyze their data.

Here are a few issues with unstructured data (duplicate data in particular) and its impact on any system and its efficiency:

  • Increased storage requirements: The more duplicate data there is, the greater the storage requirements, which substantially increases operating costs for applications.
  • Large number of data files: A bloated file count significantly increases the response time of every type of search function.
  • Delays in migration: Migrating data from one storage facility to another takes considerably longer.
  • Difficulty in eliminating duplicates: Removing duplicate files becomes harder as the system scales.

Redundant data creates disarray in the system. For that reason, it becomes imperative for organizations to identify and eliminate duplicate files. A clean database free of duplicate data avoids unnecessary computation requirements and improves efficiency.

Challenges in duplicate record detection

Detecting duplicate files with search functions that use file characteristics such as name, size and type may seem to be the easiest method. However, it is rarely the most efficient one, especially at large scale. Here’s why:

  • Searching by file name: Most applications store media files under unique file names, which makes this search unreliable because the same file can exist under different names. Duplicate data cannot be identified unless the content is examined.
  • Searching by content: Since searching by file name isn’t suitable, a content-based search appears to be the next option. However, for a large document or multi-page PDF this is not feasible either: it has high latency and is computationally expensive.
  • Searching by type and format: Media files come in different types (images, video, audio and so on), and each type can be stored in multiple formats. For instance, an audio file can be saved as .wav, .mp3, AAC or others. The file structure and encoding differ for each format, making duplicate detection difficult.

The proposed solution

A suitable solution for detecting duplicate files must handle large volumes of data and multiple media formats at low latency. If each file is converted into a multi-dimensional vector and fed into a nearest neighbors algorithm, one gets the top 5-10 possible duplicate copies of that file. Once files are converted into vectors, duplicates can be identified easily, because the distance between the respective dimensions of duplicate files will be almost indistinguishable from zero.

Here’s how different types of files can be converted to multi-dimensional vectors.

  1. Image files: Images are multi-dimensional arrays that have multiple pixels. Each pixel has three values – red, green and blue. When passed through a pre-trained convolution neural network, the images or a video frame get converted into vectors. A convolution neural network is a deep learning architecture, specifically designed to work with image inputs. Many standard architectures like VGG16, ResNet, MobileNet, AlexNet and others are proven to be very efficient in prediction based on inputs. These architectures are trained on large standard datasets like ImageNet with classification layers at the top.

    A very simple sample convolution neural network works as follows: the required images are fed into multiple convolution layers as inputs. Convolution layers are trained to identify underlying patterns in image inputs, and each convolution layer has its own set of filters that multiply the pixels of the input image. The pooling layer takes the average of the pixels and reduces the image size as it passes to the next step in the network. The flatten layer collects the input from the pooling layers and gives out the vector form of the images.
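For intuition, the convolution → pooling → flatten pipeline described above can be miniaturized in pure Python. This is a didactic sketch with a tiny hand-set filter and no training, not a real pre-trained network such as VGG16 or ResNet; the image values are invented:

```python
def convolve2d(img, kernel):
    """Valid 2-D convolution (cross-correlation, as in most deep learning libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def avg_pool(img, size=2):
    """Average-pool non-overlapping size x size windows, shrinking the image."""
    out = []
    for i in range(0, len(img) - size + 1, size):
        row = []
        for j in range(0, len(img[0]) - size + 1, size):
            row.append(sum(img[i + di][j + dj] for di in range(size)
                           for dj in range(size)) / (size * size))
        out.append(row)
    return out

def flatten(img):
    """Collapse the pooled feature map into a flat vector."""
    return [x for row in img for x in row]

image = [[1, 2, 0, 1],
         [3, 1, 2, 0],
         [0, 2, 1, 3],
         [1, 0, 3, 2]]
kernel = [[1, 0], [0, 1]]  # a hand-set filter; real networks learn these
vec = flatten(avg_pool(convolve2d(image, kernel)))
print(vec)  # → [3.25]
```

A real network stacks many such layers with learned filters and produces vectors with hundreds or thousands of dimensions, but the mechanics are the same.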
  2. Text files: To convert text files into vectors, the words that comprise the particular file are used. Words are combinations of ASCII codes of characters, but there is no ready-made representation for a complete word. In such cases, pre-trained word vectors such as Word2Vec or GloVe can be used. Pre-trained word vectors are obtained by training a deep-learning model, such as the skip-gram model, on large text data; more details on the skip-gram model are available in the TensorFlow documentation. The output vector dimension will change with respect to the chosen pre-trained word representation model.

To convert a text document with multiple words, where the number of words is not fixed, an Average Word2Vec representation of the complete document can be used. The Average Word2Vec vector is simply the mean of the individual word vectors:

    AvgWord2Vec(doc) = (1 / N) * Σ_{i=1..N} Word2Vec(word_i)

where N is the number of words in the document.

This solution can be made more feasible by adding a 36-dimensional (26 letters + 10 digits) character-count vector as an extension to the final representation of the text file. This helps in cases where two text files contain the same characters but in different sequences.
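A minimal sketch of this text pipeline is shown below, with a toy two-dimensional embedding table standing in for real Word2Vec or GloVe vectors (all values are invented), plus the 36-dimensional character-count extension:

```python
import string

# Toy pre-trained embeddings; a real system would load Word2Vec or GloVe vectors.
EMBED = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [0.1, 0.9]}

def avg_word2vec(words, embed, dim=2):
    """Average the embedding vectors of all known words in the document."""
    total = [0.0] * dim
    hits = 0
    for w in words:
        if w in embed:
            for j, x in enumerate(embed[w]):
                total[j] += x
            hits += 1
    return [t / hits for t in total] if hits else total

def char_histogram(text):
    """36-dimensional count vector over a-z and 0-9, as described above."""
    alphabet = string.ascii_lowercase + string.digits
    return [text.count(ch) for ch in alphabet]

doc = "cat dog"
vec = avg_word2vec(doc.split(), EMBED) + char_histogram(doc)
print(vec[:2])  # the averaged word-vector part
```

Two documents with the same average word vector but different character make-up now get different final representations, which is exactly what the extension is for.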

  3. PDF files: PDF files usually contain text, images or a mix of both. Therefore, to make the solution more inclusive, vector conversion for both text and images is programmed in, combining the approaches discussed earlier for converting text and images into vectors.

First, to convert the text into a vector, it needs to be extracted from the PDF file and then passed through a similar pipeline as discussed before. Similarly, to convert images to vectors, each page in a PDF is considered as an image and is passed through a pre-trained convolution neural network as discussed before. A PDF file can have multiple pages and to include this aspect, the average of all page vectors is taken to get the final representation.
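Averaging per-page vectors into one fixed-length PDF representation is straightforward; here is a hedged sketch with made-up page vectors:

```python
def pdf_vector(page_vectors):
    """Average per-page vectors (text + image features) into one PDF representation."""
    n, dim = len(page_vectors), len(page_vectors[0])
    return [sum(pv[j] for pv in page_vectors) / n for j in range(dim)]

# Two invented 3-dimensional page vectors for a two-page PDF.
pages = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(pdf_vector(pages))  # → [2.0, 1.0, 1.0]
```

Because the average has the same dimension regardless of page count, PDFs of different lengths remain directly comparable.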
  4. Audio files: Audio files stored in .wav or .mp3 formats are sampled values of audio levels. Audio signals are analogue, and to store them digitally they undergo sampling, a process in which an analogue-to-digital converter captures the sound wave at regular intervals of time (samples) and stores the values. The sampling rate may vary across applications, so while converting audio files to vectors, a fixed resampling step is used to obtain a standard sampling rate.

Another difficulty in converting audio files into vectors is that the lengths of the audio files may vary. To solve this, a fixed vector length can be enforced by padding (adding zeros at the start or end) or trimming (cutting the vector to a fixed length), depending on the audio length.
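The padding-or-trimming step can be sketched in a few lines; the sample values below are invented:

```python
def fix_length(samples, target_len):
    """Pad with trailing zeros or trim so every audio vector has the same length."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))

print(fix_length([0.2, -0.1, 0.4], 5))                 # → [0.2, -0.1, 0.4, 0.0, 0.0]
print(fix_length([0.2, -0.1, 0.4, 0.8, 0.5, 0.1], 5))  # → [0.2, -0.1, 0.4, 0.8, 0.5]
```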

Finding duplicates with vector representations

With vector representations for all types of files, it now becomes easier to find duplicate data based on the difference in distance of respective dimensions. As previously stated, detection by comparing each vector may not be an efficient method as it can increase latency. Therefore, a more efficient method with lower latency is to use the nearest neighbors algorithm.

This algorithm takes vectors as inputs and computes the Euclidean distance or cosine distance between the respective dimensions of all the possible vectors. The files with the shortest distance between their respective vector dimensions are likely duplicates.
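A minimal pure-Python version of this nearest-neighbor lookup (brute force, without the KD-tree optimization) might look like the sketch below; the 3-dimensional "file vectors" are invented for illustration:

```python
import math

def top_k_nearest(query, vectors, k=3):
    """Return indices of the k vectors with the smallest Euclidean distance to `query`."""
    dists = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    dists.sort()
    return [i for _, i in dists[:k]]

# Toy 3-dimensional "file vectors"; index 1 is a near-duplicate of the query file.
files = [[0.9, 0.1, 0.4], [0.20, 0.81, 0.59], [0.5, 0.5, 0.5]]
query = [0.21, 0.80, 0.60]
print(top_k_nearest(query, files, k=2))  # → [1, 2]
```

The nearly-zero distance to index 1 is the signal that it is a likely duplicate; a distance threshold then separates real duplicates from merely similar files.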

Computing Euclidean distances pairwise takes O(n^2) time, but the optimized Scikit-Learn implementation, which integrates KD-trees, reduces this to roughly O(n(k + log n)), where k is the dimension of the input vector.

Please note that different processing pipelines are used to convert images, texts, PDFs and audio files into vectors, so the resulting vectors may be on different scales. Since the nearest neighbors algorithm is distance-based, we may not get correct results if the vectors are on different scales. For instance, one vector’s values may range from 0 to 1 while another’s range from 100 to 200; in that case, the second vector will dominate the distance regardless of actual similarity.

The nearest neighbors algorithm also tells us how similar the files are: the smaller the distance between dimensions, the more similar the files. To get a uniform distance measure, each file vector has to be scaled to a standard range, for example with a pre-processing technique such as StandardScaler from Scikit-Learn. After pre-processing, the nearest neighbors algorithm can be applied to find the nearest vector for each file. Since the Euclidean distances are computed along with the nearest neighbor vectors, a distance threshold can be applied to filter out less probable duplicates.
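The standardization step itself is simple. This sketch reimplements, per feature, the zero-mean/unit-variance scaling that StandardScaler performs, using invented values on a 100-200 scale:

```python
def standardize(column):
    """Scale one feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0
    return [(v - mean) / std for v in column]

# A dimension that ranged 100-200 now lives on the same scale as a 0-1 dimension.
print(standardize([100.0, 150.0, 200.0]))
```

After every dimension is scaled this way, no single dimension can dominate the Euclidean distance simply because of its units.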

Conclusion

Data duplication in any system impacts performance and drives unnecessary infrastructure costs. Duplicate record detection based on file characteristics alone is not recommended, since accurate results require examining content. Vector-based search is a more efficient technique for duplicate record detection, and a successful implementation can identify the most and least probable duplicate files in unstructured data storage systems.

Generative AI—Is It the Catalyst for Evolution in Test Automation?
Published: Wed, 24 Apr 2024
Permalink: https://omniterasoft.com/generative-ai-is-it-the-catalyst-for-evolution-in-test-automation/

Generative AI is changing testing practices by automating the creation of test cases, adapting to software changes and improving test efficiency. This highlights the growing importance of artificial intelligence in improving test coverage and accuracy, making test automation even more adaptive and intelligent. It has the potential to change the way software is tested, ultimately leading to higher-quality software products.

When I think about artificial intelligence (AI), I’m reminded of a time not too long ago when AI was just a futuristic concept, something we saw in sci-fi movies and read about in books. Little did I know that AI would soon become an integral part of my daily life, reshaping how I work and make decisions.

As a software test automation engineer, I’ve seen the testing landscape evolve dramatically. But the introduction of AI genuinely revolutionized how I approached my work.

Before Generative AI

Automation testing has been an invaluable tool throughout the software development lifecycle, enabling rapid execution of test cases and reducing the time and effort required for regression testing. Automation frameworks and command sets have played a central role in ensuring software functionality, detecting defects and maintaining product quality. New AI tools built on neural networks and deep learning models continue to emerge, however, and many of them are genuinely effective at improving testing.

But these conventional testing methodologies possess their own inherent restrictions. Let us delve into a few of these limitations:

  • Maintenance costs: Automation scripts require constant maintenance as the software evolves, which increases the cost and effort of script maintenance.
  • Limited scope of testing: Automation testing focuses on predefined tests that often struggle to adapt to dynamic changes in the software environment.
  • Complex UI testing: Extensive automation of user interface (UI) testing can be complex, leading to gaps in test coverage.

Generative AI can help overcome some of these limitations, but it may also bring about new challenges.

Generative AI

Generative AI is a type of deep learning that can generate new data or content similar to the original data or content. For example, generative AI can create text that imitates the style and tone of a particular genre. Generative AI uses various techniques such as generative adversarial networks (GAN), variational autoencoders (VAE), and transformers to learn patterns and features of data and then generate new samples.

In recent years, software testing has undergone significant change with the arrival of generative AI. Traditionally, automation testing has been the cornerstone of software quality assurance, ensuring the efficiency and accuracy of software functionality verification. Generative AI, however, is poised to transform the automation testing landscape, introducing new approaches and capabilities that promise to change the way software quality is assured:

  • Test case generation: Generative AI can dynamically generate test cases by learning how the application works, identifying edge cases and creating test scenarios that humans might miss.
  • Self-healing: Test automation tools that use AI algorithms can adapt to software changes and update test locators when they change, which reduces the maintenance burden on testers.
  • Improved UI testing: After learning the UI, generative AI can simulate human interaction with it.
  • Larger data sets: Generative AI analyzes large data sets and generates test data with different combinations, discovering subtle flaws.
  • Shift-left testing: Generative AI facilitates error detection earlier in the development process, reducing the cost and effort of debugging later.
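As a small, hypothetical illustration of combinatorial test-data generation (the inputs are invented, and real generative tools go well beyond exhaustive products), Python's standard library can already enumerate input combinations, including edge cases a human might skip:

```python
import itertools

# Candidate inputs for a login form, deliberately including edge values.
usernames = ["alice", "", "a" * 256]          # normal, empty, and very long names
passwords = ["correct-horse", "", "p@ss word"]  # normal, empty, and whitespace cases
cases = list(itertools.product(usernames, passwords))
print(len(cases))  # → 9
```

A generative model adds value on top of this by learning which combinations are realistic or risky for the application under test, rather than enumerating everything blindly.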

There are a few challenges and factors, however, that must be taken into account prior to the employing of any generative AI models:

  • Data Quality: Models need high-quality training data to perform effectively—garbage in and garbage out applies here.
  • Cost: Machine learning and deep learning are expensive technologies to implement, and the resources required to develop large AI models have risen sharply over time.
  • Interpretability: Understanding AI-generated test cases and results can be challenging. Ensuring transparency and interpretability is crucial for trust and accountability.

After all this discussion of the drawbacks of traditional automation and the benefits and challenges of generative AI, testing methodologies and disciplines aren’t going anywhere. In fact, generative AI creates new opportunities to build and learn new skills, tools and roles. The requirement to optimize and go faster will never go away, and ChatGPT and generative AI help us automation testers do just that.

Below is an example that shows how ChatGPT or generative AI can help automation engineers part of the way; to finish the work, the engineer needs to use their own skills to fine-tune the solution the generative AI tool provides.

Let’s say you want to test the UI of a web application in the browser, and you decide to use ChatGPT to help generate the automation script, sending it the following prompt:

“Generate selenium java script to visit https://www.amazon.com/ and choose the ‘Medical Care’ option and click ‘See all health on Amazon.’”

ChatGPT responds with a complete-looking Selenium script (not reproduced here).

The script is close, but there’s one major problem: it tries to click an element that is not actually clickable, so when we run the script as-is, it fails. An automation engineer can easily identify this issue. There are many more such examples that need human intervention and expertise. So, in this case, generative AI is capable of doing perhaps 80% of the work necessary to solve a problem, but the remaining fine-tuning needs domain-specific knowledge and expertise.

Conclusion

Generative AI is evolving at a rapid pace, but at this time it’s not capable of replacing automation engineers. As generative AI evolves more, automation engineers need to also change their testing practices by using this technology to deliver bug-free products in a more efficient way. Will ChatGPT or generative AI eventually take automation engineers’ jobs? That we don’t know, but one thing we do know is AI will change the testing landscape forever.
