Real-Time Risk: AI-Driven Website Vulnerability Detection via Browser Extensions
Source Publication: Scientific Publication
Primary Authors: Roy, Jiskar, Mishra

A new browser-based tool utilises supervised machine learning to classify web page security risks instantly. This system offers a resource-light alternative for identifying threats without heavy infrastructure. By integrating AI-driven website vulnerability detection directly into the user's workflow, organisations can conduct preliminary security sweeps with minimal latency.
The operational gap in threat intelligence
Small and medium-sized enterprises (SMEs) frequently operate at a security deficit. Enterprise-grade Security Operations Centres (SOCs) demand capital that smaller entities simply do not possess. Consequently, these organisations rely on static scanners that often fail to catch emerging threats in real time. Furthermore, the industry trend toward deep learning models creates a computational bottleneck: these models are powerful but resource-intensive and opaque.
The market requires a solution that balances accuracy with accessibility. This study addresses that specific friction point. It moves away from heavy computational requirements, proposing a system that functions within a standard web browser while maintaining compliance with rigorous standards.
Architecture: From CodeBERT to Chrome
The research team curated a dataset of 40,000 vulnerability entries, harvesting data using reconnaissance tools such as Nmap and Nessus. They labelled HTML snippets based on severity metrics from the National Vulnerability Database (NVD) and the Common Vulnerability Scoring System (CVSS). The goal was to train a model to categorise sites into Low, Medium, or High-risk tiers.
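The labelling step can be illustrated with a minimal sketch. The thresholds follow the CVSS v3.x qualitative severity scale. Folding the Critical band into "High" and a score of 0.0 into "Low" is an assumption made here to arrive at three tiers, since the source does not say how the authors handled those cases. The function name and sample records are hypothetical.

```python
def cvss_to_tier(score: float) -> str:
    """Map a CVSS base score (0.0-10.0) to a three-tier risk label.

    Thresholds follow the CVSS v3.x qualitative scale. Assumption: Critical
    (9.0-10.0) is merged into "High" and 0.0 ("None") into "Low".
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    return "High"


# Hypothetical records for illustration only: (HTML snippet, CVSS base score).
samples = [
    ('<input type="text" name="q">', 2.1),
    ('<form action="http://example.com/login">', 5.3),
    ('<script>document.write(location.hash)</script>', 8.2),
]

# Pair each snippet with its tier, ready to serve as a training label.
labelled = [(html, cvss_to_tier(score)) for html, score in samples]
```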
The technical execution relies on a hybrid approach. The system employs CodeBERT transformer models to convert raw HTML code into numerical embeddings. These embeddings are then processed by a Random Forest classification algorithm. To ensure speed, the team utilised Term Frequency-Inverse Document Frequency (TF-IDF) vectorisation for optimisation.
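The source does not describe how the TF-IDF features were combined with the CodeBERT embeddings, so the following is only a sketch of the TF-IDF step on its own, written in plain Python. The tokeniser, the smoothed IDF variant, the L2 normalisation, and the sample snippets are all assumptions chosen for illustration, not the authors' configuration.

```python
import math
import re
from collections import Counter

# Assumed tokeniser: lowercase word-like tokens (tag names, attributes, identifiers).
TOKEN_RE = re.compile(r"[a-z][a-z0-9_-]*")


def tokenize(html: str) -> list[str]:
    return TOKEN_RE.findall(html.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, one L2-normalised TF-IDF row per document)."""
    tokenised = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    # Document frequency: number of documents in which each token occurs.
    df = Counter(t for toks in tokenised for t in set(toks))
    # Smoothed IDF variant (an assumption): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Hypothetical HTML snippets for illustration only.
snippets = [
    "<script>eval(location.hash)</script>",
    '<form action="/login"><input name="user"></form>',
    '<script src="/static/app.js"></script>',
]
vocab, matrix = tfidf(snippets)
```

Each row of `matrix` is a fixed-length numeric vector, which is the kind of input a tree-based classifier such as scikit-learn's `RandomForestClassifier` accepts.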
Operational deployment is handled through a custom Chrome extension. This extension extracts live webpage content and sends it to a Flask-based API hosted on Amazon EC2. The model, trained on AWS SageMaker, processes the request and returns a risk classification to the extension in near real time.
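The exchange between extension and API might look like the sketch below. It models only the request handling logic as a plain function, so it runs without Flask. The field names (`html`, `risk`), the size limit, and the function names are hypothetical, and `placeholder_model` is a keyword heuristic standing in for the trained classifier, not the authors' model.

```python
import json

RISK_TIERS = ("Low", "Medium", "High")
MAX_HTML_BYTES = 1_000_000  # assumed upper bound on the accepted request size


def placeholder_model(html: str) -> str:
    """Stand-in for the trained classifier; NOT the authors' model."""
    lowered = html.lower()
    if "eval(" in lowered or "document.write(" in lowered:
        return "High"
    if "<form" in lowered and "https://" not in lowered:
        return "Medium"
    return "Low"


def handle_classify(body: bytes, model=placeholder_model) -> tuple[int, str]:
    """Validate a JSON request body and return (HTTP status, JSON response)."""
    if len(body) > MAX_HTML_BYTES:
        return 413, json.dumps({"error": "payload too large"})
    try:
        payload = json.loads(body)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, json.dumps({"error": "body must be valid JSON"})
    # Hypothetical contract: a JSON object with a non-empty "html" string.
    html = payload.get("html") if isinstance(payload, dict) else None
    if not isinstance(html, str) or not html.strip():
        return 400, json.dumps({"error": "missing 'html' string field"})
    risk = model(html)
    if risk not in RISK_TIERS:
        return 500, json.dumps({"error": "model returned unknown tier"})
    return 200, json.dumps({"risk": risk})


# Example call, as the extension's POST body might arrive at the server.
status, response = handle_classify(
    json.dumps({"html": "<script>eval(x)</script>"}).encode("utf-8")
)
```

In a Flask deployment, a route function could pass `request.get_data()` to `handle_classify` and return the resulting body and status. Keeping validation separate from the web framework also makes the logic easy to test in isolation.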
Performance metrics and operational reality
Testing revealed a classification accuracy of 66.3%. The area under the receiver operating characteristic curve (ROC-AUC) ranged between 0.60 and 0.70 across the severity classes. Since an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, these figures indicate only moderate discriminative power: the tool is far from infallible, but it may still be useful for initial triage.
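To make the two metrics concrete, the sketch below computes accuracy and a one-vs-rest ROC-AUC per severity class in plain Python. The labels and probabilities are invented toy values, not the study's data, and the function names are hypothetical. The AUC is computed from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """Binary ROC-AUC via pairwise comparison (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, classes):
    """AUC for each class, treating that class as positive and the rest as negative."""
    return {
        c: roc_auc([t == c for t in y_true], [row[i] for row in proba])
        for i, c in enumerate(classes)
    }


# Toy values for illustration only; these are NOT results from the study.
classes = ("Low", "Medium", "High")
y_true = ["Low", "Low", "Medium", "Medium", "High", "High"]
proba = [
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.3, 0.5, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
]
# Predicted class = the one with the highest probability in each row.
y_pred = [classes[max(range(len(classes)), key=row.__getitem__)] for row in proba]

acc = accuracy(y_true, y_pred)
auc_per_class = one_vs_rest_auc(y_true, proba, classes)
```

Reporting AUC per class, as the study does, shows whether the model separates some tiers better than others, which a single accuracy figure hides.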
The system distinguishes between risk levels in real time, which suggests that lighter supervised learning models can take on tasks typically reserved for heavier neural networks. The trade-off is lower accuracy in exchange for a significant gain in speed and auditability.
Strategic implications for SME defence
The primary value driver here is auditability. Deep learning approaches often function as 'black boxes', making it difficult to trace why a specific decision was made. Classical supervised models such as Random Forest, conversely, offer a clearer audit trail. This transparency is vital for maintaining compliance with emerging governance frameworks, including ISO/IEC 42001 and the NIST AI Risk Management Framework.
For SMEs, this technology democratises access to threat intelligence. It provides a cost-effective layer of defence that does not require specialised hardware. The study suggests that integrating such lightweight models into browser extensions could standardise basic vulnerability detection, allowing smaller firms to identify and mitigate risks before they escalate into critical breaches.