<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Backend Engineering]]></title><description><![CDATA[Backend Engineering]]></description><link>https://backend.engg.wiki</link><generator>RSS for Node</generator><lastBuildDate>Mon, 18 May 2026 00:20:14 GMT</lastBuildDate><atom:link href="https://backend.engg.wiki/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Text and Pattern matching in Postgres]]></title><description><![CDATA[Text matching is a pretty common problem while working with SQL. LIKE, SIMILAR TO and POSIX are pretty common querying use cases. But, taking advantage of indexes when using these is something to pay attention to. A simple b-tree index on a text/varc...]]></description><link>https://backend.engg.wiki/text-and-pattern-matching-in-postgres</link><guid isPermaLink="true">https://backend.engg.wiki/text-and-pattern-matching-in-postgres</guid><category><![CDATA[PostgreSQL]]></category><category><![CDATA[Databases]]></category><category><![CDATA[indexing]]></category><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 17:24:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705598172444/a3cc0938-2ba9-47ff-b850-38f2a0386ecd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Text matching is a common problem when working with SQL, and <a target="_blank" href="https://www.postgresql.org/docs/current/functions-matching.html">LIKE, SIMILAR TO and POSIX</a> regular expressions are everyday query tools. Getting these queries to use an index, however, needs some attention. As shown below, a <code>LIKE 'abcd%'</code> query on a varchar column does not use a plain B-tree index on that column.</p>
<pre><code class="lang-powershell">testdb=<span class="hljs-comment">#  \d test_table_2</span>
                                        Table <span class="hljs-string">"public.test_table_2"</span>
   Column    |          <span class="hljs-built_in">Type</span>          | Collation | Nullable |                   Default
-------------+------------------------+-----------+----------+---------------------------------------------
 id          | integer                |           | not null | nextval(<span class="hljs-string">'your_table_name_id_seq'</span>::regclass)
 domain_name | character varying(<span class="hljs-number">255</span>) |           |          |
Indexes:
    <span class="hljs-string">"your_table_name_pkey"</span> PRIMARY KEY, btree (id)
    <span class="hljs-string">"test_index"</span> btree (domain_name)
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment">#  explain analyze select * from test_table_2 where domain_name like 'abcd%';</span>
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=<span class="hljs-number">1000.00</span>..<span class="hljs-number">13576.05</span> rows=<span class="hljs-number">100</span> width=<span class="hljs-number">25</span>) (actual time=<span class="hljs-number">78.204</span>..<span class="hljs-number">80.452</span> rows=<span class="hljs-number">0</span> loops=<span class="hljs-number">1</span>)
   Workers Planned: <span class="hljs-number">2</span>
   Workers Launched: <span class="hljs-number">2</span>
   -&gt;  Parallel Seq Scan on test_table_2  (cost=<span class="hljs-number">0.00</span>..<span class="hljs-number">12566.05</span> rows=<span class="hljs-number">42</span> width=<span class="hljs-number">25</span>) (actual time=<span class="hljs-number">50.570</span>..<span class="hljs-number">50.571</span> rows=<span class="hljs-number">0</span> loops=<span class="hljs-number">3</span>)
         <span class="hljs-keyword">Filter</span>: ((domain_name)::text ~~ <span class="hljs-string">'abcd%'</span>::text)
         Rows Removed by <span class="hljs-keyword">Filter</span>: <span class="hljs-number">333443</span>
 Planning Time: <span class="hljs-number">2.578</span> ms
 Execution Time: <span class="hljs-number">80.777</span> ms
(<span class="hljs-number">8</span> rows)
</code></pre>
<p>This is because most databases are not initialized with the C locale (this one uses en_US.UTF-8, as shown below), and in any other locale an index built with the default operator class cannot be used for pattern matching.</p>
<p>To see why, two terms need to be clear: locale and operator class.</p>
<p><strong>Locale:</strong><br />A locale is the set of regional or cultural settings used by a system, including language, date and time formats, and collation rules. Each locale has its own rules for sorting and comparing characters.<br />You can check the collation locale your database uses as follows:</p>
<pre><code class="lang-powershell">testdb=<span class="hljs-comment"># SHOW LC_COLLATE;</span>
 lc_collate
-------------
 en_US.UTF<span class="hljs-literal">-8</span>
(<span class="hljs-number">1</span> row)
</code></pre>
<p><strong>Operator class:</strong><br />An operator class in PostgreSQL tells an index access method (such as B-tree, GIN or GiST) which operators and support functions to use for a particular data type, such as text, integer or double precision. In effect, it defines how the index orders and compares values of that type, and therefore which operators in a query the index can serve. Besides the default operator class for each type, PostgreSQL ships alternative ones, and extensions can add more.</p>
<p>If you wish to read more, <a target="_blank" href="https://www.cybertec-postgresql.com/en/operator-classes-explained/">here</a> is an excellent explanation of operator classes. In short, if your database uses the C locale, an index with the default operator class serves both ordinary comparisons and pattern-matching expressions. With any other locale, the default operator class still serves ordinary comparisons but not pattern matching. For pattern matching, you can use one of the built-in pattern operator classes instead of, or alongside, the default one. Here is an excerpt from the PostgreSQL documentation:</p>
<blockquote>
<p><em>The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard “C” locale. As an example, you might index a varchar column like this:</em><br /><code>CREATE INDEX test_index ON test_table (col varchar_pattern_ops);</code></p>
<p><em>Note that you should also create an index with the default operator class if you want queries involving ordinary &lt;, &lt;=, &gt;, or &gt;= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes. If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.</em></p>
</blockquote>
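<p>Why does the locale matter for prefix searches? Below is a toy Python sketch, not PostgreSQL or glibc code: <code>toy_locale_key</code> is a hypothetical, heavily simplified stand-in for a linguistic collation such as en_US that compares letters and digits first and ignores case and punctuation. Under plain code-point (C) order, the values starting with <code>abc-</code> sit next to each other; under the toy collation they are split apart, so a <code>LIKE 'abc-%'</code> search could not be answered with one contiguous range of an index sorted that way.</p>
<pre><code class="lang-python">def toy_locale_key(s: str) -> tuple[str, str]:
    """Hypothetical, heavily simplified linguistic collation: compare on
    letters and digits only (ignoring case and punctuation), then break
    ties on the raw string. This is NOT glibc's or ICU's algorithm."""
    primary = "".join(ch for ch in s.lower() if ch.isalnum())
    return primary, s


def positions(order: list[str], prefix: str) -> list[int]:
    # Where in the sorted order do the values with this prefix sit?
    return [i for i, v in enumerate(order) if v.startswith(prefix)]


values = ["abc-z.com", "abca.com", "abcd.com", "abc-a.com"]
c_order = sorted(values)                            # code-point ("C") order
locale_order = sorted(values, key=toy_locale_key)   # toy linguistic order

print(positions(c_order, "abc-"))       # [0, 1]  one contiguous range
print(positions(locale_order, "abc-"))  # [0, 3]  split apart
</code></pre>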
<p>Now, after dropping the earlier index and recreating <code>test_index</code> with the <code>varchar_pattern_ops</code> operator class, the planner switches to an index scan:</p>
<pre><code class="lang-powershell">testdb=<span class="hljs-comment"># CREATE INDEX test_index ON test_table_2 (domain_name varchar_pattern_ops);</span>
CREATE INDEX
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment"># \d test_table_2</span>
                                        Table <span class="hljs-string">"public.test_table_2"</span>
   Column    |          <span class="hljs-built_in">Type</span>          | Collation | Nullable |                   Default
-------------+------------------------+-----------+----------+---------------------------------------------
 id          | integer                |           | not null | nextval(<span class="hljs-string">'your_table_name_id_seq'</span>::regclass)
 domain_name | character varying(<span class="hljs-number">255</span>) |           |          |
Indexes:
    <span class="hljs-string">"your_table_name_pkey"</span> PRIMARY KEY, btree (id)
    <span class="hljs-string">"test_index"</span> btree (domain_name varchar_pattern_ops)

testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment">#  explain analyze select * from test_table_2 where domain_name like 'abcd%';</span>
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan <span class="hljs-keyword">using</span> test_index on test_table_2  (cost=0.42..8.45 rows=100 width=25) (actual time=0.488..0.489 rows=0 loops=1)
   Index Cond: (((domain_name)::text ~&gt;=~ <span class="hljs-string">'abcd'</span>::text) AND ((domain_name)::text ~&lt;~ <span class="hljs-string">'abce'</span>::text))
   <span class="hljs-keyword">Filter</span>: ((domain_name)::text ~~ <span class="hljs-string">'abcd%'</span>::text)
 Planning Time: <span class="hljs-number">86.148</span> ms
 Execution Time: <span class="hljs-number">0.517</span> ms
(<span class="hljs-number">5</span> rows)
</code></pre>
<p>Here, the <code>varchar_pattern_ops</code> operator class (the varchar counterpart of <code>text_pattern_ops</code>) makes the index compare values strictly character by character instead of by the locale's collation rules. That is what lets the planner rewrite <code>LIKE 'abcd%'</code> as the range condition <code>~&gt;=~ 'abcd' AND ~&lt;~ 'abce'</code> seen in the Index Cond above.</p>
<p>So far so good: for prefix queries of the form LIKE 'abcd%', the index is used, as shown above.</p>
<p>But what if the query has the form LIKE '%abcd', with the same index in place?</p>
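<p>The Index Cond in the plan above turns a prefix pattern into a half-open range. A minimal Python sketch of that idea follows; it assumes plain code-point ordering as a stand-in for the character-by-character comparison of the pattern_ops classes, and <code>prefix_range</code> and <code>in_range</code> are hypothetical names, not PostgreSQL functions.</p>
<pre><code class="lang-python">def prefix_range(prefix: str) -> tuple[str, str]:
    """Turn LIKE 'prefix%' into a half-open range [lower, upper).
    Under code-point order, s starts with prefix exactly when
    s >= lower and upper > s."""
    if not prefix:
        raise ValueError("an empty prefix matches everything")
    # Bump the last character by one code point: 'abcd' -> 'abce'.
    # Assumes the last character is not the highest possible code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


def in_range(s: str, lower: str, upper: str) -> bool:
    # Python compares str values code point by code point, like the
    # character-by-character comparison of the pattern_ops classes.
    return s >= lower and upper > s


lower, upper = prefix_range("abcd")
print(lower, upper)  # abcd abce

for value in ["abcd", "abcd.com", "abcdzzz", "abce", "abc", "xabcd"]:
    # The range test and the prefix test agree on every sample value.
    assert in_range(value, lower, upper) == value.startswith("abcd")
</code></pre>
<p>The equivalence only holds because the ordering compares character by character. Under a linguistic collation, the same two bounds could include or skip strings, which is why the default operator class cannot be used this way outside the C locale.</p>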
<pre><code class="lang-powershell">testdb=<span class="hljs-comment">#  explain analyze select * from test_table_2 where domain_name like '%abcd';</span>
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=<span class="hljs-number">1000.00</span>..<span class="hljs-number">13576.05</span> rows=<span class="hljs-number">100</span> width=<span class="hljs-number">25</span>) (actual time=<span class="hljs-number">69.966</span>..<span class="hljs-number">71.298</span> rows=<span class="hljs-number">0</span> loops=<span class="hljs-number">1</span>)
   Workers Planned: <span class="hljs-number">2</span>
   Workers Launched: <span class="hljs-number">2</span>
   -&gt;  Parallel Seq Scan on test_table_2  (cost=<span class="hljs-number">0.00</span>..<span class="hljs-number">12566.05</span> rows=<span class="hljs-number">42</span> width=<span class="hljs-number">25</span>) (actual time=<span class="hljs-number">46.064</span>..<span class="hljs-number">46.064</span> rows=<span class="hljs-number">0</span> loops=<span class="hljs-number">3</span>)
         <span class="hljs-keyword">Filter</span>: ((domain_name)::text ~~ <span class="hljs-string">'%abcd'</span>::text)
         Rows Removed by <span class="hljs-keyword">Filter</span>: <span class="hljs-number">333443</span>
 Planning Time: <span class="hljs-number">0.207</span> ms
 Execution Time: <span class="hljs-number">71.753</span> ms
(<span class="hljs-number">8</span> rows)
</code></pre>
<p>Clearly, the index is not used; the planner falls back to a (parallel) sequential scan.</p>
<p>Why is this?</p>
<p>The reason lies in how B-tree indexes are organized:</p>
<blockquote>
<p>B-tree indexes are well-suited for handling ordered data and range queries, but they have limitations when it comes to wildcard matching at the beginning of the search pattern. In B-tree indexes, the index entries are sorted in a specific order, allowing efficient range queries by traversing the tree structure. However, wildcard matching at the beginning of the pattern requires scanning the entire index because the index entries are ordered based on their values from left to right.</p>
<p>When you use a LIKE query with a wildcard character at the end, such as "LIKE 'abcd%'", the B-tree index can efficiently determine the range of values that satisfy the query condition, starting from the specified prefix. It can traverse the index entries in the sorted order and efficiently identify the matching values.</p>
<p>On the other hand, when you use a LIKE query with a wildcard character at the beginning, such as "LIKE '%abcd'", the B-tree index cannot effectively leverage its sorting order to narrow down the search space. As a result, it needs to scan the entire index to find the matching values, which can be less efficient compared to a sequential scan of the table itself.</p>
</blockquote>
<p>To put it simply: if the wildcard is at the end, the leading characters are known, so the tree can be descended from the root to the first entry with that prefix and scanned forward until entries stop matching. That is how the index gets used.</p>
<p>If the wildcard is at the beginning, neither the leading characters nor their number is known, so there is no starting point for a descent through the tree, and the index cannot narrow the search.</p>
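<p>The same point as a small Python sketch: a sorted list stands in for the ordered leaf entries of a B-tree (an assumption that ignores pages and tree levels). A prefix lookup binary-searches to its starting point and stops at the first non-matching key, while a suffix lookup has no starting point and must examine every key. <code>prefix_search</code> and <code>suffix_search</code> are illustrative names only.</p>
<pre><code class="lang-python">import bisect

# A sorted list stands in for the ordered leaf entries of a B-tree index.
keys = sorted(["abcd.com", "myabcd", "example.org", "abcd.net", "zzz.io", "abc.org"])


def prefix_search(sorted_keys: list[str], prefix: str) -> tuple[list[str], int]:
    """LIKE 'prefix%': binary-search to the first key that is >= prefix,
    then walk forward while keys still match. Returns (matches, examined),
    where examined counts keys read after the binary search."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches, examined = [], 0
    for key in sorted_keys[start:]:
        examined += 1
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches, examined


def suffix_search(sorted_keys: list[str], suffix: str) -> tuple[list[str], int]:
    """LIKE '%suffix': sort order says nothing about how keys end,
    so every key has to be examined."""
    matches = [key for key in sorted_keys if key.endswith(suffix)]
    return matches, len(sorted_keys)


print(prefix_search(keys, "abcd"))  # (['abcd.com', 'abcd.net'], 3)
print(suffix_search(keys, "abcd"))  # (['myabcd'], 6)
</code></pre>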
<p>To handle such cases, we can use another kind of index: a trigram index.</p>
<p>Trigram indexes work by breaking text into trigrams (sequences of three consecutive characters), and the index itself must be a GIN or GiST index.</p>
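<p>To make this concrete, here is a Python approximation of how pg_trgm extracts trigrams: text is lower-cased, split into words on non-alphanumeric characters, and each word is padded with two spaces in front and one behind. This is a sketch, not pg_trgm's code; in particular, it assumes ASCII letters and digits only, whereas pg_trgm's notion of a word character depends on the locale. The result for <code>cat</code> matches the example given in the pg_trgm documentation.</p>
<pre><code class="lang-python">import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram extraction (assumption: ASCII-only
    word characters, lower-casing as in a default build)."""
    result = set()
    # Non-alphanumeric characters act as word separators.
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
print(len(trigrams("foo|bar")))  # 8: two words, four trigrams each
</code></pre>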
<p>GiST supports more kinds of queries than GIN (for example, nearest-neighbour searches ordered by the <code>&lt;-&gt;</code> distance operator) and is quicker to build, but slower to query.</p>
<p>GIN takes longer to build but is faster to query.</p>
<p>So which one to use depends on your use case. For most simple search cases, GIN queries are considerably faster than GiST ones.</p>
<p>Also, before using them, you need to enable the pg_trgm extension in your database, for example with <code>CREATE EXTENSION pg_trgm;</code>.</p>
<p><a target="_blank" href="https://about.gitlab.com/blog/2016/03/18/fast-search-using-postgresql-trigram-indexes/#trigram-indexes">Here</a> is a very good explanation of trigram indexes. Postgres' <a target="_blank" href="https://www.postgresql.org/docs/current/pgtrgm.html">documentation</a> is pretty lucid as well.</p>
<pre><code class="lang-powershell">testdb=<span class="hljs-comment"># create index test_index on test_table_2 using gin (domain_name gin_trgm_ops);</span>
CREATE INDEX
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment"># \d test_table_2</span>
                                        Table <span class="hljs-string">"public.test_table_2"</span>
   Column    |          <span class="hljs-built_in">Type</span>          | Collation | Nullable |                   Default
-------------+------------------------+-----------+----------+---------------------------------------------
 id          | integer                |           | not null | nextval(<span class="hljs-string">'your_table_name_id_seq'</span>::regclass)
 domain_name | character varying(<span class="hljs-number">255</span>) |           |          |
Indexes:
    <span class="hljs-string">"your_table_name_pkey"</span> PRIMARY KEY, btree (id)
    <span class="hljs-string">"test_index"</span> <span class="hljs-built_in">gin</span> (domain_name gin_trgm_ops)

testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment"># explain analyze select * from test_table_2 where domain_name like 'abcd%';</span>
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_table_2  (cost=<span class="hljs-number">68.78</span>..<span class="hljs-number">435.05</span> rows=<span class="hljs-number">100</span> width=<span class="hljs-number">25</span>) (actual time=<span class="hljs-number">2.136</span>..<span class="hljs-number">2.139</span> rows=<span class="hljs-number">0</span> loops=<span class="hljs-number">1</span>)
   Recheck Cond: ((domain_name)::text ~~ <span class="hljs-string">'abcd%'</span>::text)
   Rows Removed by Index Recheck: <span class="hljs-number">4</span>
   Heap Blocks: exact=<span class="hljs-number">4</span>
   -&gt;  Bitmap Index Scan on test_index  (cost=<span class="hljs-number">0.00</span>..<span class="hljs-number">68.75</span> rows=<span class="hljs-number">100</span> width=<span class="hljs-number">0</span>) (actual time=<span class="hljs-number">1.679</span>..<span class="hljs-number">1.681</span> rows=<span class="hljs-number">4</span> loops=<span class="hljs-number">1</span>)
         Index Cond: ((domain_name)::text ~~ <span class="hljs-string">'abcd%'</span>::text)
 Planning Time: <span class="hljs-number">3.563</span> ms
 Execution Time: <span class="hljs-number">2.262</span> ms
(<span class="hljs-number">8</span> rows)

testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment">#</span>
testdb=<span class="hljs-comment"># explain analyze select * from test_table_2 where domain_name like '%abcd';</span>
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_table_2  (cost=<span class="hljs-number">52.78</span>..<span class="hljs-number">419.05</span> rows=<span class="hljs-number">100</span> width=<span class="hljs-number">25</span>) (actual time=<span class="hljs-number">1.013</span>..<span class="hljs-number">1.014</span> rows=<span class="hljs-number">0</span> loops=<span class="hljs-number">1</span>)
   Recheck Cond: ((domain_name)::text ~~ <span class="hljs-string">'%abcd'</span>::text)
   Rows Removed by Index Recheck: <span class="hljs-number">2</span>
   Heap Blocks: exact=<span class="hljs-number">2</span>
   -&gt;  Bitmap Index Scan on test_index  (cost=<span class="hljs-number">0.00</span>..<span class="hljs-number">52.75</span> rows=<span class="hljs-number">100</span> width=<span class="hljs-number">0</span>) (actual time=<span class="hljs-number">0.839</span>..<span class="hljs-number">0.840</span> rows=<span class="hljs-number">2</span> loops=<span class="hljs-number">1</span>)
         Index Cond: ((domain_name)::text ~~ <span class="hljs-string">'%abcd'</span>::text)
 Planning Time: <span class="hljs-number">0.171</span> ms
 Execution Time: <span class="hljs-number">1.056</span> ms
(<span class="hljs-number">8</span> rows)
</code></pre>
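<p>Both plans above have the same two-step shape: a Bitmap Index Scan collects candidate rows from the index, and the Bitmap Heap Scan rechecks the real condition (<code>Recheck Cond</code>) to discard false positives. A toy Python model of that idea is sketched below. It is a simplification, not GIN's implementation: <code>raw_trigrams</code> takes every 3-character window without pg_trgm's word splitting and padding, and the rows are hypothetical.</p>
<pre><code class="lang-python">from collections import defaultdict
from typing import Callable


def raw_trigrams(s: str) -> set[str]:
    # Simplification: every 3-character window of the lower-cased string,
    # without pg_trgm's word splitting and space padding.
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Toy inverted index mapping each trigram to the set of row ids that
    contain it (a rough analogue of GIN posting lists)."""

    def __init__(self, rows: dict[int, str]) -> None:
        self.rows = rows
        self.postings: dict[str, set[int]] = defaultdict(set)
        for row_id, value in rows.items():
            for tri in raw_trigrams(value):
                self.postings[tri].add(row_id)

    def search(self, literal: str, recheck: Callable[[str], bool]) -> tuple[set[int], set[int]]:
        """Return (candidates, matches). Candidates contain every trigram
        of the literal; the recheck applies the real condition."""
        tris = raw_trigrams(literal)
        if tris:
            candidates = set.intersection(*(self.postings.get(t, set()) for t in tris))
        else:
            # Literal shorter than 3 characters: the index cannot narrow anything.
            candidates = set(self.rows)
        matches = {r for r in candidates if recheck(self.rows[r])}
        return candidates, matches


rows = {1: "abcd.com", 2: "abcxbcd.net", 3: "example.org", 4: "myabcd"}
index = TrigramIndex(rows)

# LIKE '%abcd': the index supplies candidates, the suffix test is the recheck.
candidates, matches = index.search("abcd", lambda v: v.endswith("abcd"))
print(sorted(candidates), sorted(matches))  # [1, 2, 4] [4]
</code></pre>
<p>Rows 1 and 2 pass the index step but fail the recheck: row 2 contains <code>abc</code> and <code>bcd</code> without containing <code>abcd</code>, and row 1 contains <code>abcd</code> but not at the end. This is the same effect reported as Rows Removed by Index Recheck in the plans.</p>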
<p>With the trigram index in place, the index is used in both cases: wildcard at the beginning and wildcard at the end. For LIKE 'abcd%' queries, the B-tree index with the xxx_pattern_ops operator class performed better in these runs (0.517 ms versus 2.262 ms execution time), but it only helps with prefix patterns. Trigram indexes, on the other hand, give you an index advantage in a much wider variety of cases and come in handy in some really tricky situations. So choose wisely, and see whether you can leverage both for your use case.</p>
<p>Happy indexing!</p>
]]></content:encoded></item><item><title><![CDATA[URL redirection]]></title><description><![CDATA[URL redirection is a concept where a browser when visits a source URL gets redirected to a destination URL. There are multiple reasons to do so some of which are :-

Want users to always visit your site on HTTPS instead of HTTP.

Have a shortened URL...]]></description><link>https://backend.engg.wiki/url-redirection</link><guid isPermaLink="true">https://backend.engg.wiki/url-redirection</guid><category><![CDATA[networking]]></category><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 17:12:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705597941567/da9f6dd8-ad56-4569-bfce-45b02088951c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>URL redirection sends a browser that visits a source URL on to a destination URL. There are several reasons to do this, including:</p>
<ul>
<li><p>Want users to always visit your site on HTTPS instead of HTTP.</p>
</li>
<li><p>Have a shortened URL and want users to visit the actual site when the short URL is used.</p>
</li>
<li><p>Have your content on one website but have an alias for the same.</p>
</li>
<li><p>Have a new home for your website and want users to no longer visit the old one.</p>
</li>
</ul>
<p>There are several ways to achieve this, each suited to particular needs. Some of the popular ones are listed below.</p>
<h2 id="heading-301302307308-redirection">301/302/307/308 redirection</h2>
<p>This is an unmasked redirection: the target URL is not hidden, so the user sees it and is aware of the redirect.</p>
<p>It requires the response to carry a 301, 302, 307 or 308 status code along with a Location header.</p>
<p>Here's an excerpt from the <a target="_blank" href="https://www.rfc-editor.org/rfc/rfc9110.html#name-redirection-3xx">RFC</a> itself:</p>
<blockquote>
<p>The server SHOULD generate a <a target="_blank" href="https://www.rfc-editor.org/rfc/rfc9110.html#field.location">Location</a> header field in the response containing a preferred URI reference for the new permanent URI. The user agent MAY use the Location field value for automatic redirection.</p>
</blockquote>
<p>The browser handles this on its own. When it sees a 3xx status code, it knows a redirection is needed, reads the Location header from the response, and then visits the target URL through the usual process of <a target="_blank" href="https://aws.amazon.com/blogs/mobile/what-happens-when-you-type-a-url-into-your-browser">visiting a URL</a>.</p>
<p>301 and 308 are used for permanent redirections whereas 302 and 307 are used for temporary redirections.</p>
<p>With a permanent redirection, search engines index the target URL; with a temporary one, they understand that the redirect is temporary and keep indexing the source URL.</p>
<p>You can read more on the specific use cases of each <a target="_blank" href="https://blog.hubspot.com/blog/tabid/6307/bid/7430/what-is-a-301-redirect-and-why-should-you-care.aspx">here</a>.</p>
<p>The difference between 301 and 308 (or between 302 and 307) lies in the request method. With 301 and 302, a user agent may, for historical reasons, change the method from POST to GET when following the redirect, and browsers generally do. With 307 and 308, the user agent must not change the request method. The newer codes simply make that contract explicit.</p>
<p>You can read more on this <a target="_blank" href="https://www.baeldung.com/cs/redirection-status-codes">here</a>.</p>
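<p>To see the mechanism end to end, here is a minimal sketch using only Python's standard library: a tiny server answers a hypothetical table of paths with a 3xx status plus a Location header, and <code>http.client</code>, which does not follow redirects, shows the raw response a browser would act on. The paths and example.com targets are placeholders.</p>
<pre><code class="lang-python">import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical redirect table: source path -> (status code, target URL).
REDIRECTS = {
    "/old-blog": (301, "https://example.com/blog"),              # permanent
    "/sale": (302, "https://example.com/promo"),                 # temporary
    "/api/v1/items": (308, "https://example.com/api/v2/items"),  # permanent, keep method
}


class RedirectHandler(BaseHTTPRequestHandler):
    def _redirect(self) -> None:
        target = REDIRECTS.get(self.path)
        if target is None:
            self.send_error(404)
            return
        status, location = target
        self.send_response(status)              # the 3xx status code
        self.send_header("Location", location)  # where the client should go next
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Same handling for GET and POST; only the status code tells the client
    # whether it may switch POST to GET when following the redirect.
    do_GET = _redirect
    do_POST = _redirect

    def log_message(self, format, *args) -> None:
        pass  # keep the demo output quiet


def fetch(port: int, method: str, path: str) -> tuple[int, Optional[str]]:
    """Send one request; http.client never follows redirects, so the raw
    status and Location header are visible."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Port 0 lets the OS pick a free port; the daemon thread dies with the process.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

print(fetch(port, "GET", "/old-blog"))       # (301, 'https://example.com/blog')
print(fetch(port, "POST", "/api/v1/items"))  # (308, 'https://example.com/api/v2/items')
</code></pre>
<p>In a real deployment the same two ingredients (status code and Location header) are usually configured in the web server or load balancer rather than written by hand; call <code>server.shutdown()</code> to stop the demo server.</p>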
<h2 id="heading-frame-redirection">Frame redirection</h2>
<p>This is a masked redirection: the target URL is hidden and the user is not explicitly made aware of it.</p>
<p>It works by embedding the target site in a <a target="_blank" href="https://www.w3schools.com/tags/tag_frame.asp">frame</a> inside a new page, which is served by a domain forwarding service that makes the whole setup work seamlessly.</p>
<p>When a user visits the source URL, DNS resolution returns the IP address of the domain forwarding service. The service looks up the target URL mapped to the source URL and returns a page containing a frame whose source is that target URL; the browser then loads the target's content into the frame.</p>
<p>An important property of this kind of redirection is that the URL in the address bar does not change.</p>
<p>If <a target="_blank" href="http://abc.com">abc.com</a> frames content from <a target="_blank" href="http://xyz.com">xyz.com</a> and the user clicks a link on the page, the content is fetched from <a target="_blank" href="http://xyz.com/some-path">xyz.com/some-path</a>, yet the address bar does not even change to <a target="_blank" href="http://abc.com/some-path">abc.com/some-path</a>; it keeps showing the original URL.</p>
<p>The address bar does not reflect the path because of how browsers handle frames. The URL in the address bar belongs to the main document, not to the individual frames within it. Each frame is a separate HTML document with its own URL, but that URL is not displayed in the browser's address bar.</p>
<p>When a frame is loaded, it's essentially a separate webpage being loaded within the main webpage. The browser treats each frame (and the content within) as a separate document. This means that when you navigate within a frame, the URL of the main document (which is what's displayed in the browser's address bar) doesn't change.</p>
<p>This redirection is not to be confused with CNAME redirection, which is also a masked redirection.</p>
<h2 id="heading-cname-redirection">CNAME redirection</h2>
<p>This kind of redirection is masked. </p>
<p>When a URL is entered in the browser, the DNS lookup does not complete until an A or AAAA record is found, since only those records provide an IP address for the browser to connect to.</p>
<p>A CNAME record maps a source (alias) domain to a target (canonical) domain: whenever the source domain is resolved, resolution continues with the target domain, and the target's IP address is returned for the source domain.</p>
<p>So this redirection works at the DNS level, which means the web server behind that IP address must handle requests for both domains. This differs from frame redirection, where the target's web server does not have to handle requests for the source domain.</p>
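<p>A minimal Python sketch of the lookup loop described above, using a hypothetical in-memory zone instead of real DNS: each CNAME record restarts resolution with the target name until an A or AAAA record supplies the address. The names and the 203.0.113.10 address (a documentation range) are placeholders.</p>
<pre><code class="lang-python"># Hypothetical zone data: name -> (record type, value).
ZONE = {
    "alias.abc.com": ("CNAME", "www.abc.com"),
    "www.abc.com": ("CNAME", "xyz.com"),
    "xyz.com": ("A", "203.0.113.10"),
}


def resolve(name: str, max_hops: int = 8) -> tuple[str, list[str]]:
    """Follow CNAME records until an A/AAAA record is found.
    Returns (ip_address, names_visited). Unknown names raise KeyError."""
    chain = [name]
    for _ in range(max_hops):
        record_type, value = ZONE[name]
        if record_type in ("A", "AAAA"):
            return value, chain  # an address record ends the lookup
        # CNAME: continue resolution with the canonical (target) name.
        name = value
        chain.append(name)
    raise RuntimeError("CNAME chain too long (possible loop)")


print(resolve("alias.abc.com"))
# ('203.0.113.10', ['alias.abc.com', 'www.abc.com', 'xyz.com'])
</code></pre>
<p>The browser still sends the original name (here alias.abc.com) in its HTTP Host header, which is why the server behind that address has to be configured to answer for every alias.</p>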
<p>You can read more about this and the two methods above <a target="_blank" href="https://www.namecheap.com/support/knowledgebase/article.aspx/9604/2237/types-of-domain-redirects-301-302-url-redirects-url-frame-and-cname/">here</a>.</p>
<h2 id="heading-refresh-meta-tag-and-http-refresh-header">Refresh Meta tag and HTTP refresh header</h2>
<p>This kind of redirection is unmasked. </p>
<p>When an HTML document contains a <a target="_blank" href="https://en.wikipedia.org/wiki/Meta_refresh">refresh meta tag</a>, the browser automatically loads the specified URL after the given delay; with a delay of 0, this happens immediately, simulating a redirection.</p>
<p>This works just like sending a Refresh header in the HTTP response.</p>
<p>So the web server for the source domain can either serve an HTML page with a refresh meta tag pointing at the target URL, or respond with a Refresh header; either way, the browser then loads the target URL.</p>
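<p>For the header variant, a minimal sketch as a Python WSGI app: it answers with <code>200 OK</code> (not a 3xx) and a <code>Refresh</code> header with a delay of 0. <code>TARGET_URL</code> is a placeholder, and the app is called directly with a test environ from <code>wsgiref.util.setup_testing_defaults</code> rather than behind a real server.</p>
<pre><code class="lang-python">from wsgiref.util import setup_testing_defaults

TARGET_URL = "https://example.com/new-home"  # hypothetical target


def refresh_redirect_app(environ, start_response):
    """WSGI app that answers every request with 200 OK plus a Refresh
    header; a delay of 0 makes the browser load TARGET_URL immediately."""
    body = f"Moved to {TARGET_URL}\n".encode("utf-8")
    start_response("200 OK", [
        ("Refresh", f"0; url={TARGET_URL}"),
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly instead of running a server.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
body = b"".join(refresh_redirect_app(environ, start_response))
print(captured["status"], "|", captured["headers"]["Refresh"])
# 200 OK | 0; url=https://example.com/new-home
</code></pre>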
]]></content:encoded></item><item><title><![CDATA[Assigning a rate-based rule to WebACL in WAF classic on AWS]]></title><description><![CDATA[You can assign rules to your WebACL in your WAF classic but what you can also do are assign rate-based rules to it. You will use this when you need to apply rate-limiting at your WAF level.
And if you are trying to do it via Cloudformation like below...]]></description><link>https://backend.engg.wiki/assigning-a-rate-based-rule-to-webacl-in-waf-classic-on-aws</link><guid isPermaLink="true">https://backend.engg.wiki/assigning-a-rate-based-rule-to-webacl-in-waf-classic-on-aws</guid><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 17:07:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705597374568/2226cacb-71b7-487a-a255-34d664f08ef7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In AWS WAF Classic, besides regular rules, you can also assign <a target="_blank" href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-rate-based.html">rate-based rules</a> to a WebACL. Use them when you need rate limiting at the WAF level.</p>
<p>If you try to do this via CloudFormation, as below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">WebACL:</span>
  <span class="hljs-attr">Type:</span> <span class="hljs-string">"AWS::WAFRegional::WebACL"</span>
  <span class="hljs-attr">Properties:</span>
    <span class="hljs-attr">DefaultAction:</span>
      <span class="hljs-attr">Type:</span> <span class="hljs-string">BLOCK</span>
    <span class="hljs-attr">MetricName:</span> <span class="hljs-string">"MyWebACL"</span>
    <span class="hljs-attr">Name:</span> <span class="hljs-string">MyWebACL</span>
    <span class="hljs-attr">Rules:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">Action:</span>
          <span class="hljs-attr">Type:</span> <span class="hljs-string">ALLOW</span>
        <span class="hljs-attr">Priority:</span> <span class="hljs-number">1</span>
        <span class="hljs-attr">RuleId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">RateBasedRule</span>


<span class="hljs-attr">RateBasedRule:</span>
  <span class="hljs-attr">Type:</span> <span class="hljs-string">"AWS::WAFRegional::RateBasedRule"</span>
  <span class="hljs-attr">Properties:</span>
    <span class="hljs-attr">Name:</span> <span class="hljs-string">MyRateBasedRule</span>
    <span class="hljs-attr">MetricName:</span> <span class="hljs-string">"MyRateBasedRule"</span>
    <span class="hljs-attr">RateKey:</span> <span class="hljs-string">"IP"</span>
    <span class="hljs-attr">RateLimit:</span> <span class="hljs-number">2000</span>
    <span class="hljs-attr">MatchPredicates:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">DataId:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">IPSet</span>
        <span class="hljs-attr">Negated:</span> <span class="hljs-literal">false</span>
        <span class="hljs-attr">Type:</span> <span class="hljs-string">"IPMatch"</span>
</code></pre>
<p>This does not work, because <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-wafregional-ratebasedrule.html#:~:text=Note%20you%20can%20only%20create%20rate%2Dbased%20rules%20using%20an%20AWS%20CloudFormation%20template.%20To%20add%20the%20rate%2Dbased%20rules%20created%20through%20AWS%20CloudFormation%20to%20a%20web%20ACL%2C%20use%20the%20AWS%20WAF%20console%2C%20API%2C%20or%20command%20line%20interface%20(CLI).">rate-based rule creation is supported via CloudFormation but association is not</a>. This is a <a target="_blank" href="https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/940">known issue</a> on the AWS side.</p>
<p>You can either do the association in the console or, if it is part of an automation, use the AWS CLI.</p>
<p>With the CLI, you first need to fetch a change token, which is tied to the change you are making to the WebACL.</p>
<blockquote>
<p><em>When you want to create, update, or delete AWS WAF objects, get a change token and include the change token in the create, update, or delete request. Change tokens ensure that your application doesn't submit conflicting requests to AWS WAF.</em></p>
<p><em>Each create, update, or delete request must use a unique change token. If your application submits a</em> <code>GetChangeToken</code> request and then submits a second <code>GetChangeToken</code> request before submitting a create, update, or delete request, the second <code>GetChangeToken</code> request returns the same value as the first <code>GetChangeToken</code> request.</p>
<p><em>When you use a change token in a create, update, or delete request, the status of the change token changes to</em> <code>PENDING</code> , which indicates that AWS WAF is propagating the change to all AWS WAF servers. Use <code>GetChangeTokenStatus</code> to determine the status of your change token.</p>
</blockquote>
<pre><code class="lang-bash">$ change_token=$(aws waf-regional get-change-token --output text) <span class="hljs-comment"># this line is needed so that the output stored in the variable isn't enclosed in quotes</span>
$ aws waf-regional update-web-acl --web-acl-id <span class="hljs-variable">${web_acl_id}</span> --change-token <span class="hljs-variable">${change_token}</span> --updates Action=<span class="hljs-string">"INSERT"</span>,ActivatedRule=<span class="hljs-string">"{Priority=1,RuleId=<span class="hljs-variable">${rule_id}</span>,Action={Type=\"BLOCK\"},Type=\"RATE_BASED\"}"</span>
</code></pre>
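<p>Because the change token stays <code>PENDING</code> while AWS WAF propagates the change, an automation may want to wait until it is done. Below is a minimal Python sketch of that, assuming a boto3 <code>waf-regional</code> client, whose <code>get_change_token_status</code> call wraps the <code>GetChangeTokenStatus</code> API quoted above. The function name and the polling/timeout parameters are hypothetical; the client is passed in so the sketch does not depend on boto3 directly.</p>
<pre><code class="lang-python">import time


def wait_until_insync(client, change_token, poll_seconds=5.0, timeout_seconds=300.0):
    """Poll GetChangeTokenStatus until AWS WAF reports the change as INSYNC.

    `client` is assumed to be a boto3 "waf-regional" client (or any object
    with the same get_change_token_status method). Hypothetical helper.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_change_token_status(ChangeToken=change_token)
        status = response["ChangeTokenStatus"]  # PROVISIONED, PENDING or INSYNC
        if status == "INSYNC":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("change token " + change_token + " is still " + status)
        time.sleep(poll_seconds)  # still propagating; check again later
</code></pre>
<p>For example, with <code>client = boto3.client("waf-regional")</code>, you would call <code>wait_until_insync(client, change_token)</code> after <code>update_web_acl</code> returns.</p>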
<h3 id="heading-references">References</h3>
<p><a target="_blank" href="https://docs.aws.amazon.com/cli/latest/reference/waf-regional/get-change-token.html">https://docs.aws.amazon.com/cli/latest/reference/waf-regional/get-change-token.html</a></p>
<p><a target="_blank" href="https://docs.aws.amazon.com/cli/latest/reference/waf-regional/update-web-acl.html">https://docs.aws.amazon.com/cli/latest/reference/waf-regional/update-web-acl.html</a></p>
]]></content:encoded></item><item><title><![CDATA[CGI and mod_perl]]></title><description><![CDATA[CGI
CGI or Common Gateway Interface is a standard protocol that defines how web servers and external programs can communicate. It allows web servers to execute programs written in various languages, such as Perl, Python, and PHP, in response to web r...]]></description><link>https://backend.engg.wiki/cgi-and-modperl</link><guid isPermaLink="true">https://backend.engg.wiki/cgi-and-modperl</guid><category><![CDATA[perl]]></category><category><![CDATA[cgi]]></category><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 17:01:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705597261345/3b6f190e-4f35-40a4-bdfb-0ec1e9d00987.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-cgi">CGI</h2>
<p>CGI or Common Gateway Interface is a standard protocol that defines how web servers and external programs can communicate. It allows web servers to execute programs written in various languages, such as Perl, Python, and PHP, in response to web requests. CGI scripts are typically used to generate dynamic web pages, process form data, and access databases.  </p>
<h3 id="heading-working">Working</h3>
<p>Here's how CGI works:</p>
<ul>
<li><p>A web browser sends a request to a web server for a specific URL.</p>
</li>
<li><p>The web server checks if the URL maps to a static file (e.g., HTML, CSS, image).</p>
</li>
<li><p>If it's not a static file, the web server searches for a CGI script that matches the URL.</p>
</li>
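<li><p>If a CGI script is found, the web server launches it as a new process and passes it the request details: metadata such as the URL's query string through environment variables (for example, <code>QUERY_STRING</code>), and any request body on standard input.</p>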
</li>
<li><p>The CGI script executes and writes its output (HTTP headers followed by the body, e.g., HTML) to standard output.</p>
</li>
<li><p>The web server sends the output back to the web browser.</p>
</li>
</ul>
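<p>To make these steps concrete, here is a minimal CGI script sketch in Python (not from the original post). It reads the query string from the <code>QUERY_STRING</code> environment variable set by the web server and writes a header, a blank line and a body to standard output. The <code>render</code> function and the <code>name</code> parameter are hypothetical.</p>
<pre><code class="lang-python">import os
import sys
from urllib.parse import parse_qs


def render(environ):
    """Build a CGI response (header lines, blank line, body) from the request environment."""
    # The web server passes the URL's query string in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]  # "name" is a hypothetical parameter
    headers = "Content-Type: text/plain\r\n"
    return headers + "\r\n" + "Hello, " + name + "!\n"


if __name__ == "__main__":
    # One process per request: read the environment, write to stdout, exit.
    sys.stdout.write(render(os.environ))
</code></pre>
<p>Every request runs this script as a fresh process, which is exactly the overhead described next.</p>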
<p>Despite its advantages, CGI has some limitations. One of the main issues is performance. Every time a CGI script is executed, a new process is created on the server. This can consume significant resources, especially if there are many simultaneous requests. This led to the development of alternative solutions, one of which is mod_perl.  </p>
<h2 id="heading-modperl">Mod_perl</h2>
<p>mod_perl is an Apache module (modules are programs that can be dynamically linked and loaded to extend the functionality of the HTTP server) that embeds the Perl interpreter directly into the web server. This allows Perl code to run much faster than CGI scripts, because it doesn't need to be launched as a separate process for each request.</p>
<h3 id="heading-working-1">Working</h3>
<p>With mod_perl, the Perl interpreter is started only once when the server starts. Perl scripts are loaded into memory, and subsequent requests are handled by the persistent interpreter. This eliminates the overhead of starting a new Perl process for each request, resulting in faster execution and reduced resource consumption.  </p>
<p>mod_perl also extends the Apache API, allowing developers to write Apache modules entirely in Perl. This gives developers access to all stages of the request processing cycle and allows them to manipulate Apache's internal tables and state mechanisms. This level of control and integration is not possible with traditional CGI.  </p>
<p>To ease the transition from CGI, mod_perl includes features to run existing CGI scripts with little or no modification. For example, Apache::Registry and Apache::PerlRun (ModPerl::Registry and ModPerl::PerlRun in mod_perl 2) are two mod_perl modules that can execute CGI scripts faster than traditional CGI, because they take advantage of the persistent Perl interpreter embedded in the server.</p>
<p>When Apache receives a request, it processes it in <a target="_blank" href="https://perl.apache.org/docs/2.0/user/handlers/http.html#HTTP_Request_Cycle_Phases">12 phases</a>. The advantage of breaking up the request process into phases is that Apache gives a programmer the opportunity to hook into the process at any of those phases. For every phase a standard default handler is supplied by Apache.  </p>
<p>Modules take control of request processing at each of the phases through a set of well-defined hooks provided by Apache. The subroutine or function in charge of a particular request phase is called a handler. Apache also provides modules with a comprehensive set of functions they can call to achieve common tasks, including file I/O, sending HTTP headers and parsing URIs. These functions are collectively known as the Apache API.</p>
<p>Like other Apache modules, mod_perl is written in C, registers handlers for request phases and uses the Apache API. However, mod_perl doesn't directly process requests. Rather, it allows you to write handlers in Perl. When the Apache core yields control to mod_perl through one of its registered handlers, mod_perl dispatches processing to one of the registered Perl handlers.  </p>
<p>The &lt;Location&gt; section in the Apache configuration (httpd.conf) assigns a number of rules that the server follows when the request's URI matches the location.  </p>
<pre><code class="lang-apache">&lt;Location /foo&gt;
    SetHandler modperl
    PerlResponseHandler FooServer
&lt;/Location&gt;
</code></pre>
<p>This configuration causes all requests for URIs starting with /foo to be handled by mod_perl, using the handler in the FooServer Perl module.</p>
<p>Directives:</p>
<p><em>SetHandler</em>  </p>
<p>SetHandler set to <a target="_blank" href="https://perl.apache.org/docs/2.0/user/config/config.html#C_perl_script_">perl-script</a> or <a target="_blank" href="https://perl.apache.org/docs/2.0/user/config/config.html#C_modperl_">modperl</a> tells Apache that mod_perl is going to handle the response generation.  </p>
<p><em>PerlResponseHandler</em>  </p>
<p>This tells mod_perl to use the FooServer Perl module to handle the response generation.</p>
<p>By default, the mod_perl API expects a subroutine named handler() to handle the request in the registered Perl*Handler module. Thus, if your module implements this subroutine, you can register the handler with mod_perl by just specifying the module name.</p>
]]></content:encoded></item><item><title><![CDATA[Hub vs Switch vs Router]]></title><description><![CDATA[HubSwitchRouter



FunctionConnects devices within a LAN in a simple broadcast mannerConnects devices within a LANConnects networks and allows devices to communicate across them

OSI layerLayer 1 (physical)Layer 2 (data-link)Layer 3 (network)

Data t...]]></description><link>https://backend.engg.wiki/hub-vs-switch-vs-router</link><guid isPermaLink="true">https://backend.engg.wiki/hub-vs-switch-vs-router</guid><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 16:57:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705597041859/9b201784-76e6-4fe6-b337-ac3b3077ddc3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Hub</td><td>Switch</td><td>Router</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Function</strong></td><td>Connects devices within a LAN in a simple broadcast manner</td><td>Connects devices within a LAN</td><td>Connects networks and allows devices to communicate across them</td></tr>
<tr>
<td><strong>OSI layer</strong></td><td>Layer 1 (physical)</td><td>Layer 2 (data-link)</td><td>Layer 3 (network)</td></tr>
<tr>
<td><strong>Data transmission</strong></td><td>Broadcast (repeats every signal out of all ports)</td><td>Unicast, multicast and broadcast</td><td>Unicast and multicast (does not forward broadcasts)</td></tr>
<tr>
<td><strong>Addressing</strong></td><td>None</td><td>MAC addresses, switch table</td><td>IP addresses, routing tables</td></tr>
<tr>
<td><strong>Cost</strong></td><td>Low</td><td>Moderate</td><td>High</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[DNS zones]]></title><description><![CDATA[A DNS Zone is a portion of the DNS namespace that is managed by an organization or administrator. It serves as an administrative space with granular control of DNS components and records, such as authoritative nameservers. A DNS zone can contain mult...]]></description><link>https://backend.engg.wiki/dns-zones</link><guid isPermaLink="true">https://backend.engg.wiki/dns-zones</guid><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 16:49:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705596674174/ce3c6458-9b70-4b26-9864-d4c7dd2e8063.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A DNS Zone is a portion of the DNS namespace that is managed by an organization or administrator. It serves as an administrative space with granular control of DNS components and records, such as authoritative nameservers. A DNS zone can contain multiple domains and subdomains, and multiple zones can exist on the same server. The information for a DNS zone is stored in a text file called a DNS zone file.</p>
<h2 id="heading-dns-zone-files">DNS Zone Files</h2>
<p>A DNS zone file is a plain text file stored on the authoritative DNS server for the zone, and it contains all the records for every domain within that zone. Zone files can include many different record types, but the first record must always be an SOA (Start of Authority) record.</p>
<h2 id="heading-types-of-records">Types of Records</h2>
<p>As mentioned, there are a handful of different types of records used within a DNS Zone, all of which serve a unique purpose. Below are some examples of the most commonly used record types and a brief description of each.</p>
<h3 id="heading-start-of-authority-soa"><em>Start of Authority (SOA)</em></h3>
<p>The first resource record in any zone file is the SOA record. It is mandatory, and each zone file can contain only one. It identifies the zone's primary name server and the administrator's contact address, and it holds the zone's serial number along with timing values (refresh, retry, expire and a minimum TTL) that govern how secondary servers and resolvers treat the zone's data.</p>
<h3 id="heading-name-server-ns"><em>Name Server (NS)</em></h3>
<p>NS records specify the DNS servers that are authoritative for the zone. Recursive name servers use these authoritative NS records to decide which server to ask next when resolving a name.</p>
<h3 id="heading-mail-exchange-mx"><em>Mail Exchange (MX)</em></h3>
<p>MX records specify which mail servers accept email on behalf of a domain; there are often two or more for redundancy. A sending mail server tries to make an SMTP connection to the server with the lowest preference value, which acts as the primary. If the primary server is not available, the sender tries the next one in order of preference. MX records must point to a hostname, not an IP address.</p>
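<p>The preference ordering above can be sketched in a few lines of Python. This is only an illustration, not how any particular mail server is implemented: records are hypothetical <code>(preference, hostname)</code> pairs, and <code>is_up</code> stands in for a real SMTP connection attempt. Real senders are expected to randomize among servers with equal preference; this sketch breaks ties by name to stay deterministic.</p>
<pre><code class="lang-python">def delivery_order(mx_records):
    """Hostnames in the order a sending server tries them: lowest preference first.

    mx_records is a list of (preference, hostname) pairs (hypothetical format).
    Ties are broken by hostname here; real senders randomize them.
    """
    return [host for _preference, host in sorted(mx_records)]


def first_reachable(mx_records, is_up):
    """Return the first server that accepts a connection, or None if all are down.

    is_up(host) is a stand-in for an actual SMTP connection attempt.
    """
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None
</code></pre>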
<h3 id="heading-address-a"><em>Address (A)</em></h3>
<p>The A (address) record maps a hostname to an IPv4 address, which is how a client such as a web browser finds the server to connect to. It is the most commonly used resource record in a zone file. When you query a domain with <code>dig</code>, it asks for the A record by default, and the answer is shown with the type <code>A</code>.</p>
<h3 id="heading-aaaa"><em>AAAA</em></h3>
<p>The quadruple A (AAAA) record does the same job as the A record, but maps a hostname to an IPv6 address.</p>
<h3 id="heading-canonical-name-cname"><em>Canonical Name (CNAME)</em></h3>
<p>This record aliases one name to another. The resolver then looks up the target (canonical) name, whose A record holds the address. A CNAME must point to a fully qualified domain name.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>NAME</td><td>RR</td><td>VALUE</td></tr>
</thead>
<tbody>
<tr>
<td>xyz.yourdomain.com</td><td>CNAME</td><td>abc.yourdomain.com</td></tr>
<tr>
<td>abc.yourdomain.com</td><td>A</td><td>172.16.142.34</td></tr>
</tbody>
</table>
</div>
<p>In the example above, if you want to reach "xyz.yourdomain.com", your computer's DNS resolver first fires an address lookup for "xyz.yourdomain.com"; on finding the CNAME record pointing to "abc.yourdomain.com", it fires another address lookup for "abc.yourdomain.com", which returns 172.16.142.34.</p>
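<p>The CNAME chase described above can be modelled with a toy resolver in Python. This is a simplified sketch, not how real resolvers work (they cache, talk to remote servers, and usually receive the CNAME and the target's A record in one response). The zone data mirrors the records in the table above, and <code>resolve_a</code> is a hypothetical name.</p>
<pre><code class="lang-python"># Toy zone data mirroring the example records: (name, type) -> value.
ZONE = {
    ("xyz.yourdomain.com", "CNAME"): "abc.yourdomain.com",
    ("abc.yourdomain.com", "A"): "172.16.142.34",
}


def resolve_a(name, zone=ZONE, max_hops=8):
    """Follow CNAME records until an A record is found (toy resolver, no caching)."""
    for _ in range(max_hops):
        if (name, "A") in zone:
            return zone[(name, "A")]
        if (name, "CNAME") in zone:
            name = zone[(name, "CNAME")]  # repeat the lookup with the canonical name
            continue
        raise LookupError("no A or CNAME record for " + name)
    raise LookupError("too many CNAME hops (possible loop)")
</code></pre>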
<h3 id="heading-alias-record-alias"><em>Alias Record (ALIAS)</em></h3>
<p>The ALIAS record is functionally similar to a CNAME record in that it points one name to another. It is used to point the apex domain name (example.com) at another hostname such as host.example.com, which a CNAME cannot do at the apex because a CNAME cannot coexist with the SOA and NS records the apex must have. ALIAS is not a standard DNS record type but a feature offered by some DNS providers: the authoritative nameservers for the apex domain resolve the target hostname themselves and answer with its IP address.</p>
<h3 id="heading-text-txt"><em>Text (TXT)</em></h3>
<p>TXT records hold free-form text. Originally they were meant for human-readable information about the server, such as its location or data center. Today, their most common uses are SPF and DKIM (DomainKeys Identified Mail).</p>
<h3 id="heading-service-locator-srv"><em>Service Locator (SRV)</em></h3>
<p>Generalized service location record, used for newer protocols instead of creating protocol-specific records such as MX. This type of record, while helpful, is not commonly used.</p>
<h3 id="heading-pointer-ptr"><em>Pointer (PTR)</em></h3>
<p>Pointer (PTR) records map an IP address back to a canonical name and are used specifically for reverse DNS. Note that a reverse DNS record needs to be set up on the authoritative nameservers of whoever owns the IP address, not whoever owns the canonical name.</p>
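<p>Reverse DNS works by looking up PTR records under special names built from the reversed address: <code>in-addr.arpa</code> for IPv4 and <code>ip6.arpa</code> for IPv6. Python's standard <code>ipaddress</code> module can build these names; the small wrapper below is a hypothetical helper, shown only to illustrate where a PTR record lives.</p>
<pre><code class="lang-python">import ipaddress


def ptr_name(ip):
    """Return the reverse-DNS name whose PTR record maps this IP back to a hostname."""
    # reverse_pointer reverses the octets (IPv4) or nibbles (IPv6) and
    # appends in-addr.arpa or ip6.arpa respectively.
    return ipaddress.ip_address(ip).reverse_pointer
</code></pre>
<p>For the address used in the CNAME example, <code>ptr_name("172.16.142.34")</code> gives <code>34.142.16.172.in-addr.arpa</code>, a name inside the IP owner's reverse zone.</p>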
]]></content:encoded></item><item><title><![CDATA[.zip Tld]]></title><description><![CDATA[So, Google decided to make the Internet more unsafe (because why not) by making .ZIP and .MOV TLD extensions accessible for the public. News
Well, not really but that's the general complaint on the release of the news.
Here is a very good explanation...]]></description><link>https://backend.engg.wiki/zip-tld</link><guid isPermaLink="true">https://backend.engg.wiki/zip-tld</guid><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 16:44:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705596181215/f9e1fc81-71f0-4d92-8284-943edad2fed2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So, Google decided to make the Internet more unsafe (because why not) by making .ZIP and .MOV TLD extensions accessible for the public. <a target="_blank" href="https://www.searchenginejournal.com/google-domain-registrar-new-tlds/485971/">News</a></p>
<p>Well, not really, but that was the general complaint when the news came out.</p>
<p><a target="_blank" href="https://medium.com/@bobbyrsec/the-dangers-of-googles-zip-tld-5e1e675e59a5">Here</a> is a very good explanation of how .ZIP TLD can be a security threat by <a target="_blank" href="https://medium.com/@bobbyrsec">Bobby Rauch</a>.</p>
<p>And <a target="_blank" href="https://textslashplain.com/2023/05/13/new-tlds-not-bad-actually/">here</a> is the counter argument for it by Eric Lawrence.</p>
<h2 id="heading-my-thoughts">My thoughts</h2>
<p>I personally feel this is a valid threat, for the simple reason that most users are not technically aware enough to even understand how it is a threat.</p>
<p>Next, comparisons with existing TLDs like .SH don't really hold, because, again, common users are far more likely to download a .ZIP file on a casual basis than a .SH file.</p>
<p>That said, the HSTS argument is completely valid (and a big silver lining, provided that those Unicode characters won't be disallowed). Malicious domains will also get suspended or marked as spam, but they can still do some damage in those short windows.</p>
<h2 id="heading-why-are-these-tlds-even-made-available">Why are these TLDs even made available?</h2>
<p>Well, they were definitely not made available with the intent of making the internet less safe. TLDs like these actually help the internet in multiple ways.</p>
<p>ChatGPT does a much better job of explaining this, so here's its answer:</p>
<blockquote>
<p>The introduction of new generic top-level domains (gTLDs) serves several purposes:</p>
<p>1. Increased Domain Name Availability: The primary motivation behind releasing new gTLDs is to expand the pool of available domain names. With the rapid growth of the internet and the exhaustion of traditional gTLDs like .com, .org, and .net, the introduction of new gTLDs allows for more choices and availability of domain names.</p>
<p>2. Branding and Differentiation: New gTLDs enable businesses, organizations, and individuals to create domain names that are more specific and relevant to their brand or industry. For example, a company in the automotive industry might choose a .auto domain, while a non-profit organization might opt for a .ngo domain. This allows for better branding, differentiation, and recognition within their respective sectors.</p>
<p>3. Market Competition and Innovation: The introduction of new gTLDs promotes competition among domain registries, fostering innovation and new business opportunities. It encourages registry operators to offer unique services and features tailored to specific industries or communities. This can lead to enhanced user experiences, specialized online communities, and improved online services.</p>
<p>4. Localization and Regional Identity: Some new gTLDs are designed to represent specific geographic regions or communities. For instance, city-based gTLDs like .nyc, .london, and .berlin can be used by businesses and individuals associated with those cities, helping to establish a sense of local identity and online presence.</p>
<p>5. Niche and Industry-Specific TLDs: New gTLDs also cater to specific industries, interests, or communities. For example, there are gTLDs like .photography, .tech, .fashion, and .guru that cater to photographers, technology enthusiasts, fashion brands, and consultants, respectively. These industry-specific domains allow businesses and individuals to align their web presence with their specialized niche or expertise.</p>
<p>Overall, the introduction of new gTLDs aims to provide more options for domain name registration, promote innovation, enhance branding, and create a more diverse and inclusive online environment.</p>
</blockquote>
<p>Seen that way, the .zip top-level domain (TLD) is the perfect fit for organizations specializing in file sharing, storage, and download technology, or for anyone offering speedy and efficient online services. (<a target="_blank" href="https://www.namecheap.com/domains/registration/gtld/zip/">reference</a>)</p>
<h2 id="heading-what-should-you-do">What should you do?</h2>
<p>Nothing much, really: be more wary of malicious links, click only links provided by trusted sources, and be especially careful with links that contain these TLDs anywhere in their URLs. Easier said than done, ha!</p>
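<p>What decides where a link takes you is the URL's host, not how the link text ends. Here is a small Python sketch (not from the original post; the function names and URLs are made up) using the standard library's <code>urllib.parse.urlsplit</code>. Note that anything before an <code>@</code> in the host part of a URL is user info, so the real host is whatever follows it.</p>
<pre><code class="lang-python">from urllib.parse import urlsplit

RISKY_TLDS = (".zip", ".mov")  # the TLDs discussed in this post


def real_host(url):
    """Return the hostname a browser would actually connect to."""
    # urlsplit drops any user info before "@" and lowercases the host.
    return urlsplit(url).hostname or ""


def uses_risky_tld(url):
    """True if the link's host (not just its path) ends in one of the TLDs above."""
    return real_host(url).endswith(RISKY_TLDS)
</code></pre>
<p>A link to a file such as <code>https://example.com/release.zip</code> is not flagged, while <code>https://github.com@v1.zip/</code> is: despite how it reads, its host is <code>v1.zip</code>.</p>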
]]></content:encoded></item><item><title><![CDATA[XOR using Tries - Part 2]]></title><description><![CDATA[Before you go on, please make sure you have read Part 1.
Great! Let’s talk about the next usecase.
Problem
Given an array of integers, find the maximum xor subarray. Or simply,Given a1, a2, ....., an, find i and j , i <= j, such that ai xor ai+1 xor ...]]></description><link>https://backend.engg.wiki/xor-using-tries-part-2</link><guid isPermaLink="true">https://backend.engg.wiki/xor-using-tries-part-2</guid><category><![CDATA[algorithms]]></category><category><![CDATA[Trie]]></category><category><![CDATA[XOR]]></category><category><![CDATA[data structures]]></category><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 16:28:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705595255037/1d3425eb-b0bb-4f64-9eb5-3e78458d7797.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before you go on, please make sure you have read <a target="_blank" href="https://engg.wiki/xor-using-tries-part-1">Part 1</a>.</p>
<p>Great! Let’s talk about the next use case.</p>
<h2 id="heading-problem">Problem</h2>
<p>Given an array of integers, find the maximum xor subarray. Or simply,<br />Given a<sub>1</sub>, a<sub>2</sub>, ....., a<sub>n</sub>, find i and j , i &lt;= j, such that a<sub>i</sub> xor a<sub>i+1</sub> xor ..... a<sub>j-1</sub> xor a<sub>j</sub> is the maximum possible value.</p>
<h2 id="heading-simple-solution">Simple solution</h2>
<p>Run two loops and for every combination of i and j, run one more loop to calculate xor from index i to j.</p>
<p>maxSoFar = -1<br />For i = 1 to N<br />    For j = i to N<br />        xorVal = 0<br />        For k = i to j<br />            xorVal = xorVal ^ a[k]<br />        maxSoFar = max(maxSoFar, xorVal)<br />print maxSoFar</p>
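<p>Here is the same brute force as a runnable Python sketch (the function name is mine, not the post's). It uses 0-based indices as Python lists do, and starts each running xor at 0, the identity value for xor.</p>
<pre><code class="lang-python">def max_xor_subarray_bruteforce(a):
    """O(N^3): try every (i, j) and recompute the xor of a[i..j] from scratch."""
    max_so_far = -1
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            xor_val = 0  # 0 is the identity for xor
            for k in range(i, j + 1):
                xor_val ^= a[k]
            max_so_far = max(max_so_far, xor_val)
    return max_so_far
</code></pre>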
<p>That is O(N<sup>3</sup>)! Can we simplify it a bit?</p>
<p>Let’s start slow.</p>
<h2 id="heading-better-solution">Better solution</h2>
<p>Let’s first see where we can optimise it.</p>
<p>Now, running two loops to get each combination of i and j is fine. What about running the loop from index i to index j to calculate xor?</p>
<p>Suppose, you have calculated xor from index 2 to 5, and later on, to calculate from index 2 to 7, why would you calculate xor from index 2 to 5 again?</p>
<p>Or suppose you want the xor from index 1 to 4: you already know the xor from index 2 to 4 from calculating the xor from index 2 to 5, and just need to xor it with the element at index 1, right?</p>
<p>So, we need some means of storing things.</p>
<p>Let’s optimise here first.</p>
<p>What we will be using here is called a precomputed array.</p>
<p>A precomputed array holds results that we compute once at the start and reuse later on.</p>
<p>What do we need to precompute here?<br />Let’s see.</p>
<p>What if I store the xor of the elements from index 1 to index i in array[i], for every i from 1 to N?</p>
<p>What I mean is, suppose we have an array,</p>
<p>2, 4, 3, 6, 8, 7</p>
<p>If I can store an array as</p>
<p><em>{^ indicates xor}</em></p>
<p>2, 2^4, 2^4^3, 2^4^3^6, 2^4^3^6^8, 2^4^3^6^8^7</p>
<p>pre[i] = pre[i-1] xor a[i]</p>
<p>This is the precomputation we will do, and this array will be our precomputed array (often called a prefix xor array). But why exactly are we doing this?</p>
<p>Let’s see the properties of xor once again.</p>
<p>The triangular property of xor (it follows from x xor x = 0) is:</p>
<p>[A xor B = C] =&gt; [B xor C = A] =&gt; [A xor C = B]</p>
<p>So, suppose I need to find the xor from the 3rd element to the 5th element,</p>
<p>That is,</p>
<p>3 ^ 6 ^ 8.</p>
<p>Now,</p>
<p>2^4^3^6^8 = (2^4) ^ (3^6^8)</p>
<p>Now, with the triangular property, I can write it as</p>
<p>3^6^8 = (2^4^3^6^8) ^ (2^4)</p>
<p>Which is the xor of 2nd and 5th element in the precomputed array.</p>
<p>So, xor from 3rd to 5th element is the xor of pre[2] and pre[5].</p>
<p>So generally speaking,</p>
<p>A<sub>i</sub> xor A<sub>i+1</sub> xor .... A<sub>j</sub> = pre[i-1] xor pre[j]</p>
<p>Or, in plain words, the xor of the numbers from index i to j is the xor of (xor of numbers from 1 to i-1) and (xor of numbers from 1 to j), where pre[0] = 0 covers the case i = 1.</p>
<p>Great, so once we build the precomputed array, we can calculate xor from index i to index j, in constant time.</p>
<p>Given this, we only need two loops now: for each pair of i and j, we take the max over all [(xor 1 to i-1) xor (xor 1 to j)].</p>
<p>So this now takes O(N<sup>2</sup>).</p>
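<p>A Python sketch of the precomputed array and the O(N<sup>2</sup>) search (function names are mine). To match the text's 1-based formulas, <code>pre</code> has an extra slot: <code>pre[0] = 0</code> and <code>pre[i]</code> is the xor of the first i elements.</p>
<pre><code class="lang-python">def prefix_xor(a):
    """pre[0] = 0 and pre[i] = a[1] ^ ... ^ a[i] (1-based, as in the text)."""
    pre = [0]
    for x in a:
        pre.append(pre[-1] ^ x)  # pre[i] = pre[i-1] xor a[i]
    return pre


def range_xor(pre, i, j):
    """xor of a[i..j] (1-based, inclusive) in O(1): pre[i-1] ^ pre[j]."""
    return pre[i - 1] ^ pre[j]


def max_xor_subarray_quadratic(a):
    """O(N^2): every (i, j) pair, each range xor answered in O(1). Expects a non-empty list."""
    pre = prefix_xor(a)
    n = len(a)
    return max(range_xor(pre, i, j) for i in range(1, n + 1) for j in range(i, n + 1))
</code></pre>
<p>For the array 2, 4, 3, 6, 8, 7, <code>prefix_xor</code> gives 0, 2, 6, 5, 3, 11, 12, and <code>range_xor(pre, 3, 5)</code> is 6 xor 11 = 13, the same as 3 xor 6 xor 8.</p>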
<p>Still not enough; we can optimise it further.</p>
<h2 id="heading-optimised-solution">Optimised solution</h2>
<p>Let’s use tries like in the previous use case.</p>
<p>Let us rewrite the original problem.</p>
<p>Given a<sub>1</sub>, a<sub>2</sub>, .... a<sub>n</sub>, find i and j , i &lt;= j, such that a<sub>i</sub> xor a<sub>i+1</sub> xor .... a<sub>j-1</sub> xor a<sub>j</sub> is the maximum possible value.</p>
<p>Can now be written (thanks to precomputed array) as,</p>
<p>Given a<sub>1</sub>, a<sub>2</sub>, .... a<sub>n</sub>, find i and j, i &lt;= j, such that (xor from 1 to i-1) xor (xor from 1 to j) is maximum.</p>
<p>So, since the precomputed array stores the xor from 1 up to each index, we need to find two numbers in the precomputed array (including pre[0] = 0) whose xor is maximum.</p>
<p>Hold on, wasn’t that the same question in the previous usecase?</p>
<p>The only difference is that in the previous use case we found the two numbers with maximum xor in the given array, whereas here we look for them in the precomputed array, where each entry is the xor of the original array from index 1 up to that index.</p>
<p>That’s it: build the precomputed array (including pre[0] = 0), treat it as the original array, and proceed exactly as in the previous use case.</p>
<p>This is the optimised solution.</p>
<p>The complexity is:</p>
<p>O(N log MAX) + O(N) [for creating the precomputed array] = O(N log MAX), the same as the previous use case.</p>
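<p>Putting it together, here is a Python sketch of the trie-based solution (my own illustration, not the author's code). It assumes non-negative integers that fit in 32 bits (<code>BITS</code> is an assumption). It inserts pre[0] = 0 first, then for each new prefix xor it greedily walks the trie from the most significant bit, preferring the opposite bit, exactly as in the previous use case, before inserting that prefix.</p>
<pre><code class="lang-python">BITS = 32  # assumption: inputs are non-negative and fit in 32 bits


def max_xor_subarray(a):
    """O(N * BITS): max xor over all subarrays using a binary trie of prefix xors."""
    root = [None, None]  # each node is [child for bit 0, child for bit 1]

    def insert(x):
        node = root
        for b in range(BITS - 1, -1, -1):
            bit = (x >> b) & 1
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]

    def best_partner_xor(x):
        # Greedy from the most significant bit: take the opposite bit when it exists.
        node, result = root, 0
        for b in range(BITS - 1, -1, -1):
            bit = (x >> b) & 1
            if node[1 - bit] is not None:
                result |= 1 << b
                node = node[1 - bit]
            else:
                node = node[bit]  # compromise: same bit as x
        return result

    insert(0)  # pre[0] = 0, so subarrays starting at index 1 are covered
    best, pre = 0, 0
    for x in a:
        pre ^= x  # pre now holds the xor of a[1..current index]
        best = max(best, best_partner_xor(pre))
        insert(pre)
    return best
</code></pre>
<p>For the example array 2, 4, 3, 6, 8, 7 from earlier, <code>max_xor_subarray</code> returns 15, which comes from the subarray 8, 7 (8 xor 7 = 15).</p>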
<p>We will explore the next use case in another post.</p>
]]></content:encoded></item><item><title><![CDATA[XOR using Tries - Part 1]]></title><description><![CDATA[Trie
Trie can store information about keys/numbers/strings compactly in a tree.
Tries consists of nodes, where each node stores a character/bit. We can insert new strings/numbers accordingly.
Storing numbers in trie
We can store numbers in trie using...]]></description><link>https://backend.engg.wiki/xor-using-tries-part-1</link><guid isPermaLink="true">https://backend.engg.wiki/xor-using-tries-part-1</guid><category><![CDATA[algorithms]]></category><category><![CDATA[Trie]]></category><category><![CDATA[XOR]]></category><category><![CDATA[data structures]]></category><dc:creator><![CDATA[Pradeep Chodisetti]]></dc:creator><pubDate>Thu, 18 Jan 2024 16:21:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705595385281/7accbe6b-581f-41c8-8519-cb9edbad05f1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-trie">Trie</h1>
<p>A trie stores keys (numbers or strings) compactly in a tree.</p>
<p>A trie consists of nodes, where each node stores a character or a bit. We insert a new string or number by walking down from the root, one character or bit at a time.</p>
<h1 id="heading-storing-numbers-in-trie">Storing numbers in trie</h1>
<p>We can store numbers in a trie using their binary representation.<br />Say we have 4 numbers: 1, 4, 5, 7.</p>
<p>We first write their binary representations, padded to the same width of 4 bits:</p>
<p>0001<br />0100<br />0101<br />0111</p>
<p>And then, for every node, 0 goes on the left hand side and 1 goes on the right hand side.</p>
<p>Let’s insert 1, which is 0001:</p>
<ol>
<li><p>We start with the root, epsilon (E).</p>
</li>
<li><p>The first bit is 0, so go to the left of the root and create a node.</p>
</li>
<li><p>Next bit is 0, go to left of this node and create a node.</p>
</li>
<li><p>Next bit is 0, go to left of this node and create a node.</p>
</li>
<li><p>Next bit is 1, go to right of this node and create a node.</p>
</li>
<li><p>Similarly for other numbers.</p>
</li>
</ol>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJJeDG1xHyx20Z9q6BZjHeFWHnRXf-39h9hrINoEL_9eDYPRxIUdlHIbRCsYgbPnq841SUTD36TIt186C1czd5oyKHdnHI1-cqoQKg29fFFeUmAP9g2-iLSe-so0aWpTDHiyRd2WtmirYCLgz831xtO36nMEOkzjYFjgmmjqgA4MErXdScSRifCrNZxA/w658-h418/t1.png" alt /></p>
<p>While traversing the trie, if we already have a node while inserting 0 or 1, simply go to that node and move on to the next bit.</p>
<p>The number of leaves in the trie is the number of distinct integers stored in it.</p>
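<p>The insertion steps above can be sketched in Python as follows (an illustration, not the author's code; function names are mine). Each node is a two-slot list, with slot 0 as the left (0) child and slot 1 as the right (1) child, and numbers are padded to 4 bits as in the example.</p>
<pre><code class="lang-python">BITS = 4  # the example above pads every number to 4 bits


def new_node():
    return [None, None]  # [left child for bit 0, right child for bit 1]


def insert(root, x, bits=BITS):
    """Walk from the most significant bit down, creating nodes only where missing."""
    node = root
    for b in range(bits - 1, -1, -1):
        bit = (x >> b) & 1
        if node[bit] is None:
            node[bit] = new_node()
        node = node[bit]  # existing nodes are simply reused


def count_leaves(node):
    """Leaves are nodes with no children; each one ends exactly one stored number."""
    if node[0] is None and node[1] is None:
        return 1
    return sum(count_leaves(child) for child in node if child is not None)
</code></pre>
<p>Inserting 1, 4, 5 and 7 into <code>root = new_node()</code> gives 4 leaves; inserting 5 again reuses its existing path, so the count stays at 4.</p>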
<h1 id="heading-xor">XOR</h1>
<p>Exclusive or is a logical operation that outputs true only when inputs differ.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3F66TzFdtD5I533yi_rZR2DE8K5cESBBuY7GuHTaP53BsLYLcTXlbsiyVkJXqD4qLNXlJCTr6gog54CubY_jTefw6OFF1xRexORbeqmWCsjuVe4aASA9YaAjzi-y8kghiHjaFxbTfe4xZ73eGbVsJHtiTFBtdCNZtgfBQ9NcMnhjBu5TgEGIJrYaRrg/s320/t2.png" alt /></p>
<h1 id="heading-problem">Problem</h1>
<p>Given an array, find two numbers whose XOR is maximum in the array.</p>
<h2 id="heading-simple-solution">Simple solution</h2>
<p>Run two loops and, for every pair, calculate the xor. If the xor is greater than max_so_far, replace max_so_far with it.</p>
<h3 id="heading-algorithm">Algorithm</h3>
<p>A[n] = [a1, a2, a3 …. an]<br />max_so_far = -1<br />For i=1 to n<br />    For j=1 to i<br />        If a[i] xor a[j] &gt; max_so_far<br />            max_so_far = a[i] xor a[j]<br />Print max_so_far</p>
<p>But, as we see, this has 2 loops and so takes O(N<sup>2</sup>).</p>
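<p>For reference, here is the brute force as a Python sketch (the function name is mine). It only compares distinct positions (j before i). The pseudocode also pairs a number with itself, but that always gives 0, so the answer is the same whenever the array has at least two elements.</p>
<pre><code class="lang-python">def max_xor_pair_bruteforce(a):
    """O(N^2): check every pair of distinct positions (j before i)."""
    max_so_far = -1
    for i in range(len(a)):
        for j in range(i):
            max_so_far = max(max_so_far, a[i] ^ a[j])
    return max_so_far
</code></pre>
<p>For the array 9, 3, 10, 1, 13 used later in this post, it returns 14 (3 xor 13).</p>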
<h2 id="heading-optimised-solution">Optimised solution</h2>
<p>Now, we use the properties of xor, bit manipulation and a data structure, the trie, to optimise this solution.</p>
<p>Let us try exploring the properties of xor.</p>
<p>Now, basic properties:</p>
<p>0 xor 0 = 0<br />0 xor 1 = 1<br />1 xor 0 = 1<br />1 xor 1 = 0</p>
<p>Now, forget about getting maximum xor pair from the array.</p>
<p>What is the maximum N-bit number we can get when we xor an N-bit number with another N-bit number?</p>
<p>The answer is simple: all 1s.</p>
<p>Or</p>
<p>XXXXXXX xor YYYYYYY = 1111111</p>
<p>So, basically if I have a number say 19, I can represent it in binary form as 10011.</p>
<p>If I wish to xor it with some other number and get 11111 (all 1s), what do I xor it with? 🤔</p>
<p>Here is where we use the basic properties.</p>
<p>Let’s call that number X.</p>
<p>So,</p>
<p>19 xor X = 31</p>
<p>Or</p>
<p>10011 xor X = 11111</p>
<p>From left hand side to right hand side,</p>
<p>1st bit of X should be 0, since, 1 xor 0 = 1<br />2nd bit of X should be 1, since 0 xor 1 = 1</p>
<p>Likewise...</p>
<p>So X should be 01100, which is 12.</p>
<p>So, basically, every bit of X should be the opposite of the corresponding bit of 19 to get 31.</p>
<p>10011<br />01100<br />--------<br />11111</p>
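<p>Flipping every bit within a fixed width is itself an xor: xor with a mask of all 1s. A tiny Python sketch (the helper name is mine):</p>
<pre><code class="lang-python">def opposite_bits(x, width):
    """Flip every bit of x within `width` bits by xor-ing with 0b11...1."""
    mask = (1 << width) - 1  # `width` ones, e.g. 0b11111 for width 5
    return x ^ mask
</code></pre>
<p><code>opposite_bits(19, 5)</code> is 12 (01100), and 19 xor 12 = 31, matching the working above.</p>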
<p>At this stage, we can build an intuition for our original problem. For each number in the array, I need to look for some other number in the array whose bits are the opposite of this number's bits. Pretty simple, right?</p>
<p>But, how do we ensure we have a number with opposite bits "at all places" for each number in the array?</p>
<p>Here’s the interesting part, we, sort of, compromise at places where we are unable to get the opposite bit, so, we take the same bit and move on to the next place.</p>
<p>The example below makes this clearer.</p>
<p>Suppose we have the array 9, 3, 10, 1, 13.</p>
<p>We have already learnt how to insert numbers into a trie, so let us insert 9 (1001) into the trie.</p>
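<p>As a refresher, here is a minimal Python sketch of inserting numbers into a binary trie. It assumes a fixed width of 4 bits, as in this example, and one child slot per bit value, walking from the most significant bit down. The <code>TrieNode</code> class and the <code>insert</code> and <code>paths</code> functions are illustrative names, not a fixed API.</p>
<pre><code class="lang-python">class TrieNode:
    def __init__(self):
        # children[0] / children[1] hold the next node for bit value 0 / 1
        self.children = [None, None]


def insert(root, num, bits):
    """Insert num into the trie, walking from the most significant bit down."""
    node = root
    for b in range(bits - 1, -1, -1):
        bit = (num >> b) % 2  # value of bit b of num
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]


def paths(node, prefix=""):
    """List the bit strings stored in the trie (for inspection only)."""
    if node.children == [None, None]:
        return [prefix]
    result = []
    for bit in (0, 1):
        if node.children[bit] is not None:
            result += paths(node.children[bit], prefix + str(bit))
    return result


root = TrieNode()
insert(root, 9, bits=4)
print(paths(root))  # ['1001']
</code></pre>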
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqV2LPEic-Ku2hN6SasxPK3IXkHf9OpMefGAK4RpAWwDhOJPkW8JHr6JSc2m7H9t1eZ02BhBgyeqGjx7YzmZ7osQsy0ibnMASUJQ9769Zh9HxnyyDZb6AEtNR-qnUXhvjPX_11iZ9Px4ableYc3SQ-tgisbkWyyzizJWqHUnq9lelzJ91OM_r_1g5WIg/w255-h400/t3.png" alt /></p>
<p>Now, consider number 3.</p>
<p>3 = 0011</p>
<p>So we need 1100 to make 1111 (the maximum possible), since 0011 xor 1100 = 1111.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-h4D2KZ8e5yuLlZxM5SAaf3iG9d8Vid6LqjdCScQN-wTBzemRx82uhVLTmKI5DaMePW7zoQ_w4Gil5EPPCLafvgXyeNfmvw1sVdJos84BRkX5aV6P5TUYxtZ1KeFxwEnFIl73KhyE8TYGh5RQNgs_0fb0xCQCEFuiOWMkqqi6RX1YiAP0Iv-WNeFDA/w284-h400/t4.png" alt /></p>
<p>We need 1 and have it, so go to it.</p>
<p>Next, we need 1 but don’t have it, so we compromise, take 0 and go to it.</p>
<p>Next, we need 0 and have it, so go to it.</p>
<p>Next, we need 0 but don’t have it, so we compromise again, take 1 and go to it.</p>
<p>Finally, we get 1001, which is 9.</p>
<p>So, among the numbers before 3, 9 gives the maximum xor with 3: 3 xor 9 = 10.</p>
<p>So max_so_far = 10.</p>
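<p>The greedy walk described above can be sketched in Python as follows. At each bit it tries to step to the child holding the opposite bit and falls back to the same bit when that child is missing; the path it ends on spells out the best partner. This assumes the same 4-bit binary trie as in the figures, stored as nested dicts for brevity; <code>insert</code> and <code>best_partner</code> are illustrative helper names.</p>
<pre><code class="lang-python">def insert(trie, num, bits):
    """Insert num into a dict-based binary trie, most significant bit first."""
    node = trie
    for b in range(bits - 1, -1, -1):
        node = node.setdefault((num >> b) % 2, {})


def best_partner(trie, num, bits):
    """Follow the opposite bit where possible, else the same bit; return the stored number reached."""
    node, partner = trie, 0
    for b in range(bits - 1, -1, -1):
        bit = (num >> b) % 2
        wanted = 1 - bit
        step = wanted if wanted in node else bit  # compromise when the opposite bit is missing
        partner = partner * 2 + step              # append the chosen bit to the partner
        node = node[step]
    return partner


trie = {}
insert(trie, 9, bits=4)
p = best_partner(trie, 3, bits=4)
print(p, 3 ^ p)  # 9 10
</code></pre>
<p>Since every stored number has the same width, each node on the way down has at least one child, so the fallback step always exists.</p>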
<p>Now, insert 3 into the trie.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkf0LP9IRYqX5E5Y_2QwmNS5e3KiMtdT4l2UKdy6_zQhjeXs2EdDqrrH96xAxpRkPj7yyc3rh-lJ4GIrFQ464v2oSxsCfxM957A8dX0sl1_-x3_peQ65Va9BmqtHXq-PqPiVZASRSw_A_q9A7zDorezoHB4NKrEkT2Jj0p50Kx3HSlKue1rJgHZm9tnQ/w400-h375/t5.png" alt /></p>
<p>Now, consider number 10.</p>
<p>10 = 1010</p>
<p>So, we need 0101 to make 1111.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiEwR9elo80VpHCRVLMym-6pqSrlVOaJqdlxd00WWvusOqfNJY-2Og34CxSNGjbCETOaS9PO801TCwBw4jyLDcAT_i5z4mOLZiFJ1mSlj8kLUoKa17eea65OpJJZjd_QzSxoroZ3ZhRikEBpFLm7NMukoW0Mzwdq1E_uwOjNPfI2byP_dU2Le5FOE37g/w400-h351/t6.png" alt /></p>
<p>We have 0, so go to it.<br />We don't have 1, so take 0 and go to it.<br />We don't have 0, so take 1 and go to it.<br />We have 1, so take it.</p>
<p>So we end at 0011, which is 3: among the numbers before 10 in the array, 3 gives the maximum xor with 10, which is 10 xor 3 = 9.</p>
<p>Since max_so_far = 10 &gt; 9, max_so_far stays 10.</p>
<p>Insert 10 into the trie.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzZ9bPWoQj10YMh3Aq4E3mqbxOHUIm1DHuBScC2WibnMIBI8RGoxdENsSNgzDwdoWnjs9O_5n2OqXV70sn00J8Aree3w2wZVW046RZ9i9QqxMEF3dyKJ06beLEthczdkT_8l_oNwJhUGdxp-HjDNImYkqGqVr7TBQ9rV4_rK_kl9pS3ItGtbltEqbbcw/w400-h360/t7.png" alt /></p>
<p>Now, consider 1.</p>
<p>1 = 0001</p>
<p>So, we need 1110 to make it 1111.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUTU-wabl64KCjpvypDvQXMrL3kxQKfr9g3L-gsM34J7P9QM7nWj9Iw-TDzGH0Gv2e7-k7h1axS0DBFHIELe9QNzBSnGsHU3DhrEpz3XqjcmD9CAq2TtTowmgkMcDFTLg9PKsIH3Lv_UFxO3aSPLT9EqjfLC_MNntubv21CVwXoeX4ytfs7k_k53bCEw/w400-h366/t8.png" alt /></p>
<p>We have 1, so take it.<br />We don't have 1, so take 0.<br />We have 1, so take it.<br />We have 0, so take it.</p>
<p>So we get 1010, which is 10: among the numbers before 1 in the array, 10 gives the maximum xor with 1, which is 1 xor 10 = 11.</p>
<p>Since max_so_far = 10 &lt; 11, max_so_far becomes 11.</p>
<p>Insert 1 into the trie.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo0BLiKdOSo0WbwC-LC1Kc0lBAAL9Nt6274iqSFPjnMC41na5V8Mols2oK_o5K3TZkOINnyD_WEXTATPr9qrh8opVcmzANBPl6OhwXe15sioh1190WcT4Z-46t-pkCzBGicl6vsD5ZeBJ2Ljd_t5cxHLt7UbeucGLneZv_Q-mh3r7p3MnAEhp0axDrug/w437-h366/t9.png" alt /></p>
<p>Now, consider 13.<br />13 = 1101<br />So we need 0010 to make 1111.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPu2_-DTUC0TQab9phhKtjoKWM7x4XFF7jdNbdhmA_rfPf-cU7TYKfngHb-3nLvN3x2A-NWDJIYreQHSFItn0aD5L5CE7sbGlkU3MoKKebYjFLV6RQXhz1duRzKSaidsrd7bl80dIYdhnlEOuHqdYjzs2cewaB3AzqXoGDdVz7nXhsgCwmAKL5_fPCig/w509-h417/t10.png" alt /></p>
<p>We have 0, so take it.<br />We have 0, so take it.<br />We have 1, so take it.<br />We don't have 0, so take 1.</p>
<p>So we get 0011, which is 3: among the numbers before 13, 3 gives the maximum xor with 13, which is 13 xor 3 = 14.</p>
<p>Since 14 &gt; max_so_far = 11, max_so_far becomes 14.</p>
<p>Insert 13 into the trie.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwt44afdyqpL9URznJJ3tZGZMJCDipaFjhlPp1u8R50X2OPDTx-lLpJVhL08PAxVFSNsnGrxCzNfjYNgddPKebPthz6QezMehS5xQ0mSDUPuk5xeIKfUGq3Cj0UMv8qc1wlNzPBCvEY-DFt9sWVfOvQ1m216EugdDpdszidkPZyUSHxq_ffTszCf5NHg/w640-h414/t11.png" alt /></p>
<p>Now, this is the complete trie with all the numbers.</p>
<p>So the maximum xor of any pair in the array is 14.</p>
<p>Now, let’s look at its time complexity.</p>
<p>We insert N numbers into the trie,<br />and each insertion takes log(MAX) operations, one per bit.</p>
<p>What is MAX now?</p>
<p>MAX is the largest number representable with the chosen number of bits, that is, all 1s.</p>
<p>To choose it, take the maximum number in the array (13 in the example above), find the smallest power of 2 greater than it, and subtract 1: 16 - 1 = 15, which is 1111, so 4 bits. So we need a trie in which every number in the array is represented with 4 bits.</p>
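<p>In Python, assuming non-negative integers, this bit width can be read off with the built-in <code>int.bit_length()</code> applied to the maximum number; MAX is then 2<sup>bits</sup> - 1. A small sketch, where <code>trie_width</code> is an illustrative name:</p>
<pre><code class="lang-python">def trie_width(nums):
    """Number of bits needed so every non-negative number in nums fits, plus the matching MAX."""
    bits = max(1, max(nums).bit_length())  # at least 1 bit, even if all numbers are 0
    return bits, 2 ** bits - 1             # (bits, MAX with all bits set)


print(trie_width([9, 3, 10, 1, 13]))  # (4, 15)
print(trie_width([16]))               # (5, 31)
</code></pre>
<p>Note that a maximum that is itself a power of 2, such as 16, needs one more bit than the power below it.</p>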
<p>Each traversal of the trie also takes log(MAX) operations. We do one traversal per number, so insertions and traversals together take about 2 * N * log(MAX) operations, which is O(N log(MAX)).</p>
<p>Neat! We have drastically reduced it from O(N<sup>2</sup>) to O(N log(MAX)).</p>
<h3 id="heading-algorithm-1">Algorithm</h3>
<p>A[N] = [a1, a2, a3, ..., aN]<br />max_so_far = -1<br />insert_in_trie(a1)<br />for i = 2 to N<br />    best_partner_for_ai = traverse_in_trie_to_get_best_partner(ai)<br />    if best_partner_for_ai xor ai &gt; max_so_far<br />        max_so_far = best_partner_for_ai xor ai<br />    insert_in_trie(ai)<br />print max_so_far</p>
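<p>Putting it together, here is a minimal, self-contained Python sketch of the algorithm above, assuming the input holds at least two non-negative integers. The helper names are illustrative, and the trie is stored as nested dicts keyed by bit value rather than as a dedicated node class.</p>
<pre><code class="lang-python">def max_xor_pair(nums):
    """Maximum xor of any pair in nums, in O(N log(MAX)) using a binary trie."""
    bits = max(1, max(nums).bit_length())  # width of the largest number

    def insert(trie, num):
        node = trie
        for b in range(bits - 1, -1, -1):
            node = node.setdefault((num >> b) % 2, {})

    def best_partner(trie, num):
        # Walk from the most significant bit, preferring the opposite bit.
        node, partner = trie, 0
        for b in range(bits - 1, -1, -1):
            bit = (num >> b) % 2
            step = 1 - bit if (1 - bit) in node else bit
            partner = partner * 2 + step
            node = node[step]
        return partner

    trie = {}
    insert(trie, nums[0])
    max_so_far = -1
    for num in nums[1:]:
        # best xor for num among the numbers already inserted
        max_so_far = max(max_so_far, num ^ best_partner(trie, num))
        insert(trie, num)
    return max_so_far


print(max_xor_pair([9, 3, 10, 1, 13]))  # 14
</code></pre>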
<p>We will explore another similar use case in <a target="_blank" href="https://engg.wiki/xor-using-tries-part-2">another problem</a>.</p>
]]></content:encoded></item></channel></rss>