The S-Graffito project aims to build a streaming graph management system that processes OLTP and OLAP queries on very large graphs with high streaming rates. These graphs are increasingly deployed to capture relationships between entities (e.g., customers and catalog items in an online retail environment), both for transactional processing and for analytics (e.g., recommendation systems). The project consists of two components: streaming graph querying and streaming graph analytics. I have been involved in the second component, streaming graph analytics, during my PhD and postdoctoral fellowship at the University of Waterloo.
Towards RELIABLE Streaming Graph Analytics
Continuing my PhD studies (published in the sGrow and sGrapp papers), I am developing solutions for streaming graph drift detection (sGradd) and streaming graph drift prediction (sGradp) that are simultaneously effective, efficient, and privacy-preserving. The underlying problem is concept drift (CD) detection. CD is a phenomenon that occurs when "changes in hidden context can induce more or less radical changes in target concept". Hidden context refers to insufficient, incomplete, or unobservable information about the input data. Target concept refers to known and/or observable information that has a direct impact on the task's output. Our goal is to signal changes in hidden contexts, such as the generative source(s) of streaming (graph) data records.
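To make the notion of signaling a change in the generative source concrete, here is a minimal, generic sketch of window-based drift detection. It is not the sGradd or sGradp method. A plain numeric stream stands in for streaming graph records, and the function name `detect_drift` and the parameters `window` and `threshold` are assumptions chosen for illustration only.

```python
from collections import deque
from statistics import mean, pstdev


def detect_drift(stream, window=50, threshold=3.0):
    """Yield the index of each record at which drift is signaled.

    Illustrative assumption: drift shows up as a shift in the mean of a
    numeric stream. The first `window` records form a reference sample;
    a sliding window of the most recent `window` records is compared to it.
    """
    reference = []                   # records assumed to share one generative source
    recent = deque(maxlen=window)    # sliding window over the newest records

    for i, x in enumerate(stream):
        # Phase 1: (re)learn the reference sample.
        if len(reference) < window:
            reference.append(x)
            continue

        # Phase 2: slide the recent window and wait until it is full.
        recent.append(x)
        if len(recent) < window:
            continue

        ref_mean = mean(reference)
        ref_std = pstdev(reference) or 1e-9   # guard against a constant reference

        # z-score of the recent mean under the reference distribution.
        z = abs(mean(recent) - ref_mean) / (ref_std / window ** 0.5)

        if z > threshold:
            yield i
            # Forget the old source and learn a new reference from later records.
            reference = []
            recent.clear()
```

For example, `list(detect_drift(values))` returns the positions at which an alarm is raised for a list `values`. Resetting the reference after each alarm is one possible design choice: it keeps a single abrupt change from triggering repeated alarms, at the cost of being blind to further drift while the new reference is being collected.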
Prompt CD management in streaming settings is important for generating relevant, reliable, and effective outputs. Detecting and understanding CD benefits (i) the development of accurate generative data models, which are expected to (not) preserve concepts, (ii) the generalization of analytics, (iii) the design of algorithms (e.g., network protocols for error control and for estimating the time-to-live of routing packets), and (iv) anomaly detection in areas such as the following.
data management systems: erroneous data for buffer management in streaming graph management systems
security: data manipulation in communication links
health care and bioinformatics: irregular data for emergency health conditions
e-commerce and web systems: unexpected data for supply chain management
distributed systems, social networks, computational networks, networks-on-chip, cellular backhaul and core networks: artificial and abrupt traffic data for network traffic control
environments, geological and geospatial analytics: abnormal data for map routing technologies, analysis of mineral by-products, geohazard assessments
multimedia software development: unprecedented data for musical contests and athletic performance evaluation (e.g., in artistic gymnastics and taekwondo matches), sound control, speaker view, and bandwidth adjustment in videoconferencing tools
neuroscience: neuronal spikes for cognitive neuroscience studies