Text layout, update of contribution in intro.

rustam-azimov · rustam-azimov · commit bf903656b1c3 · 2025-08-24T12:10:08.000+03:00
diff --git a/InProgress/tensors_prog/eng/001intro.tex b/InProgress/tensors_prog/eng/001intro.tex
@@ -51,9 +51,7 @@
 \begin{enumerate}
 	\item We rethink and improve the CFPQ algorithm based on tensor-product proposed by~\cite{10.1007/978-3-030-54832-2_6}.
 	We reduce this algorithm to operations over Boolean matrices.
-	As a result, all-path query semantics is handled, as opposed to the previous matrix-based solution capable of handling only the single-path semantics.
-	Best to our knowledge, our algorithm is the first CFPQ algorithm based on linear algebra which is capable to handle all-path query semantics.
-	Also, both regular and context-free grammars can be used as queries.
+	As a result, all-path query semantics is handled. Also, both regular and context-free grammars can be used as queries.
 	\item
 	We prove the correctness and time complexity for the proposed algorithm thus providing an upper bound on the complexity of the CFPQ problem in dependence on the size of the query (its context-free grammar) and the number of vertices in the input graph.
 	The proposed algorithm has subcubic complexity in terms of the grammar and the input graph sizes, which is comparable with the state-of-the-art solutions.
diff --git a/InProgress/tensors_prog/eng/002prelim.tex b/InProgress/tensors_prog/eng/002prelim.tex
@@ -4,7 +4,7 @@
 
 In this section, we introduce some basic notation and definitions from graph theory and formal language theory which will be used in the rest of the paper.
 
-\subsection{Language-Constrained Path Querying Problem}
+\he{Language-Constrained Path Querying Problem}
 
 We use a directed edge-labeled graph as a data model.
 To introduce the \term{Language-Constrained Path Querying Problem}~\cite{barrett2000formal} over directed edge-labeled graphs we first give definitions for both languages and grammars.
@@ -106,7 +106,7 @@ \subsection{Language-Constrained Path Querying Problem}
 
 Note that $\Pi$ can be infinite, thus in practice, we should provide a way to build a finite representation of such paths with reasonable complexity, instead of explicit construction of the $\Pi$.
 
-\subsection{Regular Path Queries and Finite State Machine}
+\he{Regular Path Queries and Finite State Machine}
 
 In \term{Regular Path Querying} (RPQ) the language $\mathcal{L}$ is regular.
 This case is widespread and well-studied.
@@ -171,7 +171,7 @@ \subsection{Regular Path Queries and Finite State Machine}
 Thus RPQ evaluation is an intersection of two FSMs.
 The query result can also be represented as FSM because regular languages are closed under intersection~\cite{automata:theory:10.5555/1177300}.
 
-\subsection{Context-Free Path Querying and Recursive State Machines}
+\he{Context-Free Path Querying and Recursive State Machines}
 
 An even more general case than RPQ is a \term{Context-Free Path Querying Problem (CFPQ)}, where one can use context-free languages as constraints.
 These constraints are more expressive than regular ones.
@@ -310,7 +310,7 @@ \subsection{Context-Free Path Querying and Recursive State Machines}
     $$
 
 Matrix $M_1$ can be represented as a set of Boolean matrices as follows:
-{\small
+{\scriptsize
 \begin{align*}
 M_1^S =
 \begin{pmatrix}
@@ -338,7 +338,7 @@ \subsection{Context-Free Path Querying and Recursive State Machines}
 Also, an RSM can be viewed as an FSM over $\Sigma \cup N$.
 In this work, we use this point of view to propose a unified algorithm to evaluate both regular and context-free path queries with zero overhead for regular queries.
 
-\subsection{Graph Kronecker Product and Machines Intersection}
+\he{Graph Kronecker Product and Machines Intersection}
 
 In this section, we introduce the classic Kronecker product definition,
 describe graph Kronecker product and its relation to Boolean matrices algebra,
diff --git a/InProgress/tensors_prog/eng/003_algorithm_example.tex b/InProgress/tensors_prog/eng/003_algorithm_example.tex
@@ -1,4 +1,4 @@
-\subsection{An example}
+\he{An example}
 \label{example:section}
 In this section, we introduce a detailed example to demonstrate the steps taken by the proposed algorithms.
 Namely, consider the graph $\mathcal{G}$ presented in Figure~\ref{fig:example_input_graph} and the RSM $R$ presented in Figure~\ref{example:automata}.
@@ -16,6 +16,7 @@ \subsection{An example}
 $\mathcal{M}_1$ and $\mathcal{M}_{2,(0)}$ matrices and collapse the result to the single Boolean matrix
 $M_{3,(1)}$. For the sake of simplicity, we provide only
 $M_{3,(1)}$, which is evaluated as follows.
+\small
 {
     \renewcommand{\arraystretch}{0.5}
     \setlength\arraycolsep{0.1pt}
@@ -76,6 +77,7 @@ \subsection{An example}
 corresponding matrix block in the evaluated matrix $M_{3,{2}}$. The transitive closure
 evaluation introduces three new paths $(0, 1) \rightarrow (2,1), (1, 0) \rightarrow (3,1)$ and $(0, 1) \rightarrow (3,1)$ (see Figure~\ref{fig:example_2_product}). Since only the path between vertices $(0,1)$ and
 $(3,1)$ connects the start and final states in the automaton, the edge $(1,S,1)$ is added to the resulting graph.
+\small
 {
     \renewcommand{\arraystretch}{0.5}
     \setlength\arraycolsep{0.1pt}
diff --git a/InProgress/tensors_prog/eng/003_index_creation_algorithm.tex b/InProgress/tensors_prog/eng/003_index_creation_algorithm.tex
@@ -1,4 +1,4 @@
-\subsection{Index Creation Algorithm}
+\he{Index Creation Algorithm}
 
 The \textit{index creation} algorithm outputs the final adjacency matrix for the input graph with all pairs of vertices which are reachable through some nonterminal in the input grammar $G$, as well as the index matrix which is to be used to extract paths in the \textit{path extraction} algorithm.
 
@@ -79,7 +79,8 @@ \subsection{Index Creation Algorithm}
 \EndFunction
 \end{algorithmic}
 \end{algorithm}
-\subsubsection{Application of Dynamic Transitive Closure}
+
+\textbf{Application of Dynamic Transitive Closure.}
 The most time-consuming steps of the algorithm are the computations of the Kronecker product and transitive closure.
 Note that the adjacency matrix $\mathcal{M}_2$ is changed incrementally i.e. elements (edges) are added to $\mathcal{M}_2$ at each iteration of the algorithm and are never deleted from it.
 So it is not necessary to recompute the whole product or transitive closure if some appropriate data structure is maintained.
@@ -159,7 +160,7 @@ \subsubsection{Application of Dynamic Transitive Closure}
 %one to express it in terms of basic matrix operations.
 % TODO: more accurate upper bound for the algorithm complexity
 
-\subsubsection{Index creation for RPQ}
+\textbf{Index creation for RPQ.}
 In the case of the RPQ, the main \textbf{while} loop takes only one iteration to actually append data.
 Since the input query is provided in the form of the regular expression, one can construct the corresponding RSM which consists of the single \textit{component state machine}.
 This CSM is built from the regular expression and is labeled as $S$, for example, and has no \textit{recursive calls}.
diff --git a/InProgress/tensors_prog/eng/003_paths_extraction_algorithm.tex b/InProgress/tensors_prog/eng/003_paths_extraction_algorithm.tex
@@ -1,4 +1,4 @@
-\subsection{Paths Extraction Algorithm}
+\he{Paths Extraction Algorithm}
 After the index has been created, one can enumerate all paths between specified vertices.
 The index $M_3$ already stores data about all paths derivable from nonterminals.
 This data can be used to construct these paths. However, the set of such paths can be infinite.
diff --git a/InProgress/tensors_prog/eng/004evaluation.tex b/InProgress/tensors_prog/eng/004evaluation.tex
@@ -12,21 +12,21 @@
 We only measure the execution time of the algorithms themselves, thus we assume an input graph is loaded into RAM in the form of its adjacency matrix in the sparse format.
 Note, that the time needed to load an input graph into the RAM is excluded from the time measurements.
 
-\subsection{RPQ Evaluation}
+\he{RPQ Evaluation}
 
 To investigate the applicability of the proposed algorithm for regular path querying we gathered a dataset that consists of both real-world and synthetically generated graphs.
 We generated the queries from the most popular RPQ templates.
 
-\subsubsection{Dataset}
+\he{Dataset}
 
 We gathered several graphs that represent real-world data from different areas and are frequently used for the evaluation of the graph querying algorithms.
 Namely, the dataset consists of three parts.
 The first part is the set of LUBM graphs\footnote{Lehigh University Benchmark (LUBM) web page: \url{http://swat.cse.lehigh.edu/projects/lubm/}. Access date: 07.07.2020.}~\cite{10.1016/j.websem.2005.06.005} which have different numbers of vertices.
 The second one is the set of graphs from Uniprot database\footnote{Universal Protein Resource (UniProt) web page: \url{https://www.uniprot.org/}. All files used can be downloaded via the link: \url{ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/}. Access date: 07.07.2020.}: \textit{proteomes}, \textit{taxonomy} and \textit{uniprotkb}.
-The~last part consists of the RDF files \textit{mappingbased\_properties} from DBpedia\footnote{DBpedia project web site: \url{https://wiki.dbpedia.org/}. Access date: 07.07.2020.} and \textit{geospecies}\footnote{The Geospecies RDF: \url{https://old.datahub.io/dataset/geospecies}. Access date: 07.07.2020.}.
+The~last part consists of the RDF files \textit{mappingbased\_properties} (\textit{mapping\_prop}) from DBpedia\footnote{DBpedia project web site: \url{https://wiki.dbpedia.org/}. Access date: 07.07.2020.} and \textit{geospecies}\footnote{The Geospecies RDF: \url{https://old.datahub.io/dataset/geospecies}. Access date: 07.07.2020.}.
 A brief description of the graphs in the dataset is presented in Table~\ref{tbl:graphs_for_rpq}.
 
-\begin{table}
+\begin{table}[h]
     \centering
 \caption{Graphs for RPQ evaluation}
 \label{tbl:graphs_for_rpq}
@@ -50,7 +50,7 @@ \subsubsection{Dataset}
 Taxonomy & 5 728 398 & 14 922 125 \\
 \hline
 Geospecies & 450 609 & 2 201 532 \\
-Mappingbased\_properties & 8 332 233 & 25 346 359 \\
+Mapping\_prop & 8 332 233 & 25 346 359 \\
 \hline
 \end{tabular}
 }
@@ -63,11 +63,11 @@ \subsubsection{Dataset}
 The most frequent relations from the given graph were used as symbols in the query template\footnote{The generator used is available as part of the CFPQ\_Data project: \url{https://github.com/JetBrains-Research/CFPQ_Data/blob/master/tools/gen_RPQ/gen.py}. Access date: 07.07.2020.}.
 We used the same set of queries for all LUBM graphs to investigate the scalability of the proposed algorithm.
 
-\begin{table}
+\begin{table}[h]
     \centering
 \caption{Queries templates for RPQ evaluation}
 \label{tbl:queries_templates}
-{\small
+{\scriptsize
 \renewcommand{\arraystretch}{1.2}
 %\rowcolors{2}{black!2}{black!10}
 \begin{tabular}{|c|c||c|c|}
@@ -96,7 +96,7 @@ \subsubsection{Dataset}
 \end{table}
 
 
-\subsubsection{Results}
+\he{Results}
 
 We averaged the execution time of index creation over 5 runs for each query.
 Index creation time for LUBM graphs set is presented in Figure~\ref{fig:lubm_all_qs}.
@@ -105,7 +105,7 @@ \subsubsection{Results}
 We conclude that our algorithm demonstrates reasonable performance to be applied to the real-world data analysis.
 %\cho{Note that the accurate comparison of different approaches may be a promising direction of future research.}
 
-\begin{figure}
+\begin{figure}[h]
     \centering
    \includegraphics[width=0.5\textwidth]{LUBM_all.pdf}
    \caption{Index creation time for LUBM graphs}
@@ -119,7 +119,7 @@ \subsubsection{Results}
 On the other hand, \textit{taxonomy} querying in many cases requires significantly more time than for other graphs, while \textit{taxonomy} is not the biggest graph.
 Finally, in most cases, query execution lasts less than 10 seconds, even for bigger graphs, and no query requires more than 52.17 seconds.
 
-\begin{figure}
+\begin{figure}[h]
     \centering
    \includegraphics[width=0.5\textwidth]{other_all.pdf}
    \caption{Index creation time for real-world RDFs}
@@ -163,12 +163,12 @@ \subsubsection{Results}
 %   \caption{Single path extraction for specific graph and query for our solution (\subref{fig:geo_tensors_rpq}, \%subref{fig:dbpedia_tensors_rpq}, \subref{fig:geo_tensors_cfpq}), and Azimov's (\subref{fig:geo_matrix_cfpq})}
 %\end{figure}
 
-\subsection{CFPQ Evaluation}
+\he{CFPQ Evaluation}
 
 We evaluate the applicability of the proposed algorithm to CFPQ processing over real-world graphs on a number of classic cases and compare them with Azimov's algorithm.
 Currently, only a single path version of Azimov's algorithm exists, and we use its implementation using PyGraphBLAS. Note that it is not trivial to compare our results with the state-of-the-art results provided by~\cite{10.1145/3398682.3399163} (Azimov's algorithm) because our algorithm computes significantly more information. While the state-of-the-art solution computes only reachability facts or a single-path semantics, our algorithm computes data necessary to restore all possible paths.
 
-\subsubsection{Dataset}
+\he{Dataset}
 
 We use CFPQ\_Data\footnote{CFPQ\_Data is a dataset for CFPQ evaluation which contains both synthetic and real-world data and queries \url{https://github.com/JetBrains-Research/CFPQ\_Data}. Access date: 07.07.2020.} dataset for evaluation.
 Namely, we use relatively big RDF files and respective same-generation queries $G_1$~(Eq.~\ref{eqn:g_1}) and $G_2$~(Eq.~\ref{eqn:g_2}) which are used in other works for CFPQ evaluation.
@@ -206,7 +206,7 @@ \subsubsection{Dataset}
 The detailed data about all the graphs used is presented in Table~\ref{tbl:graphs_for_cfpq}.
 
 {\setlength{\tabcolsep}{0.2em}
-\begin{table}
+\begin{table*}[h]
     \centering
 {
 \caption{Graphs for CFPQ evaluation: \textit{bt} is broaderTransitive, \textit{sco} is subClassOf}
@@ -232,14 +232,14 @@ \subsubsection{Dataset}
 \hline
 \end{tabular}
 }
-\end{table}
+\end{table*}
 }
-\subsubsection{Results}
+\he{Results}
 
 We averaged the index creation time over 5 runs for both single-path Azimov's algorithm (\textbf{Mtx}) and the proposed algorithm (\textbf{Tns}) (see Table~\ref{tbl:CFPQ_index}).
 
 {\setlength{\tabcolsep}{0.2em}
-  \begin{table}
+  \begin{table*}[h]
     \centering
     \caption{CFPQ evaluation results, time is measured in seconds}
     \label{tbl:CFPQ_index}
@@ -267,7 +267,7 @@ \subsubsection{Results}
       fs              & ---    & ---    & ---  & ---  & ---   & ---   & 470.49  & 370.73  \\
       \hline
     \end{tabular}
-  \end{table}
+  \end{table*}
 }
 
 We can see that while in some cases our solution is comparable to or just slightly better than Azimov's algorithm (\textit{enzyme, eclass\_514en, go}), there are cases when our solution is significantly faster (\textit{go-hierarchy}, up to 9 times faster), and cases when Azimov's algorithm is about 1.3 times faster (all memory aliases and \textit{geospecies} with \textit{Geo} query).
@@ -285,7 +285,7 @@ \subsubsection{Results}
 %While both methods demonstrate linear time on the length of the extracted path, our generic solution is more than 1000 times slower than Azimov's single path extraction procedure.
 %We conclude that current generic all-path extraction procedure is not optimal for single path extraction.
 
-\subsection{Conclusion}
+\he{Conclusion}
 
 We conclude that the proposed algorithm is applicable to real-world data processing: the algorithm allows one to solve both the reachability problem and to extract paths of interest in a reasonable time.
 While index creation time (reachability query evaluation) is comparable with other existing solutions, the paths extraction procedure should be improved in the future. However, the state-of-the-art solution computes only reachability facts or a single-path semantics, whereas our algorithm computes data necessary to restore all possible paths (all-paths semantics).
diff --git a/InProgress/tensors_prog/eng/main.pdf b/InProgress/tensors_prog/eng/main.pdf
diff --git a/InProgress/tensors_prog/eng/main.tex b/InProgress/tensors_prog/eng/main.tex
@@ -55,7 +55,6 @@
 \label{sec:prelim}
 \input{002prelim}
 
-\clearpage
 \He{Context-free path querying by Kronecker product}
 \label{sec:algo}
 \input{003algo.tex}
@@ -68,7 +67,6 @@
 \label{sec:related}
 \input{005related.tex}
 
-\newpage
 \He{Conclusion and Future Work}
 \label{sec:conclusion}
 \input{006conclusion.tex}
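
For readers of this diff, the RPQ case described in 002prelim.tex and 003_index_creation_algorithm.tex (RPQ evaluation as an intersection of two FSMs, computed through a Kronecker product followed by a transitive closure) can be sketched as below. This is only an illustrative sketch, not the paper's implementation: the function names (`kron`, `transitive_closure`, `rpq_reachability`), the dense list-of-lists matrices, the dictionary-per-label layout, and the use of Warshall's algorithm are all assumptions made for the example. The transitive closure here is non-reflexive, so empty-word paths are not reported.

```python
def kron(a, b):
    """Boolean Kronecker product of two square 0/1 matrices (lists of lists)."""
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] & b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]


def transitive_closure(mat):
    """Non-reflexive transitive closure via Warshall's algorithm."""
    n = len(mat)
    closure = [row[:] for row in mat]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure


def rpq_reachability(fsm, graph, start, finals):
    """Return graph vertex pairs (u, v) joined by a path accepted by the FSM.

    fsm and graph map each edge label to a Boolean adjacency matrix
    (hypothetical layout chosen for this sketch).
    """
    k = len(next(iter(fsm.values())))    # number of FSM states
    n = len(next(iter(graph.values())))  # number of graph vertices
    total = [[0] * (k * n) for _ in range(k * n)]
    # Sum (Boolean OR) of per-label Kronecker products = product automaton.
    for label in fsm.keys() & graph.keys():
        block = kron(fsm[label], graph[label])
        for i in range(k * n):
            for j in range(k * n):
                total[i][j] |= block[i][j]
    closure = transitive_closure(total)
    # Product state (q, u) is stored at index q * n + u.
    return {
        (u, v)
        for u in range(n)
        for v in range(n)
        for f in finals
        if closure[start * n + u][f * n + v]
    }


# FSM for the regular expression a b*: state 0 --a--> 1, state 1 --b--> 1.
fsm = {"a": [[0, 1], [0, 0]], "b": [[0, 0], [0, 1]]}
# Graph: 0 --a--> 1, 1 --b--> 2.
graph = {
    "a": [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    "b": [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
}
result = rpq_reachability(fsm, graph, start=0, finals={1})
print(sorted(result))
```

Because the FSM here has no recursive calls, a single product-and-closure pass suffices, which matches the remark in the diff that the main loop takes only one iteration to append data in the RPQ case.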
