LinuxTopic
2024-03-02T17:36:00.003+05:30
Linux Command: How to print all URLs from a file - grep url from file linux
<h3 style="text-align: left;"><b><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEj7hr7Hk7f7SzcEyteI80tfdz6hrke0Bm6hrGQ2inrzbUZgoD8fNcMX1h9ghPJj9YVTq5tMaME6_zSlp5MVsqWjfO21WV3u5d0mnzjjHPZnH1wJ_Iw8PkCV1oe8P-I0uN8lyNM-EzOY_gM0gp5wS19SWjouK7KgQQkbhyYHa1-zo8OTGcCCyL1O_eA4H1o" style="margin-left: auto; margin-right: auto;"><img alt="grep all http urls, grep url from file, print text from file" data-original-height="251" data-original-width="326" src="https://blogger.googleusercontent.com/img/a/AVvXsEj7hr7Hk7f7SzcEyteI80tfdz6hrke0Bm6hrGQ2inrzbUZgoD8fNcMX1h9ghPJj9YVTq5tMaME6_zSlp5MVsqWjfO21WV3u5d0mnzjjHPZnH1wJ_Iw8PkCV1oe8P-I0uN8lyNM-EzOY_gM0gp5wS19SWjouK7KgQQkbhyYHa1-zo8OTGcCCyL1O_eA4H1o=s16000" title="grep all http urls - grep url from file" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">grep all http urls - grep url from file - grep url regex</td></tr></tbody></table><br /><br /></b></h3><h3 style="text-align: left;"><b>Scenario</b></h3><div>I have a sitemap.xml file that is too big to check by hand, and in this 
sitemap.xml file there are http/https URLs like this:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgLhH6ENMJ-Bqmcj4t61PS_O_DCCMa1epjETrYJSYCfub_90_2DqhvDiGLcx6GI0x9lhkOR9MUwQkz52yy9bUIidQ7rVa4HFxzC60Nqypki3pxSWzhEn7JOIbY0ibhfH3WR43qTj2kVWvKD7dDDTx-E_YqcpLaDOOqpK2YD3CMsxnT4WuvDUjqcpzX_wUg" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="88" data-original-width="1275" height="44" src="https://blogger.googleusercontent.com/img/a/AVvXsEgLhH6ENMJ-Bqmcj4t61PS_O_DCCMa1epjETrYJSYCfub_90_2DqhvDiGLcx6GI0x9lhkOR9MUwQkz52yy9bUIidQ7rVa4HFxzC60Nqypki3pxSWzhEn7JOIbY0ibhfH3WR43qTj2kVWvKD7dDDTx-E_YqcpLaDOOqpK2YD3CMsxnT4WuvDUjqcpzX_wUg=w640-h44" width="640" /></a></div><br />I need all of those URLs so that I can submit them for review and approval with IndexNow tools.</div><div><br /></div><h3 style="text-align: left;">Challenges: </h3><div><br /></div><div>We don't know how many URLs exist in this sitemap.xml file, and collecting them manually would take a lot of time and leave room for human error.</div><div><br /></div><h3 style="text-align: left;">Solution: </h3><div>The grep command can help in this situation. (Let me know in the comments if this can be done another way.) </div><div>Here, I have a Windows operating system with the MobaXterm SSH manager installed, and I'm using its local terminal, so let's try. </div><div> </div><div>1 - First, I downloaded the sitemap.xml file</div><div><br /></div><div><b><blockquote>curl https://www.linuxtopic.com/sitemap.xml -o /tmp/sitemap.xml</blockquote></b></div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjcM7AMBa_69qWlP36ss14xoxbItCtce2sZfuuh1wRNAjgmGpl94IqDzuEX7otbglc9dpX-fSctwiq6ggWh8-DI5r8HjKhQMejiAAVf3BEAIimlrJunlLCdXLbRhEBcugDUCVbTtFXgNA9_msi_ORIiif6RauUHIrlpKdI9PQ1fKQqwVTtXA5uNvWvvqmk" style="margin-left: 1em; margin-right: 
1em;"><img alt="" data-original-height="81" data-original-width="936" src="https://blogger.googleusercontent.com/img/a/AVvXsEjcM7AMBa_69qWlP36ss14xoxbItCtce2sZfuuh1wRNAjgmGpl94IqDzuEX7otbglc9dpX-fSctwiq6ggWh8-DI5r8HjKhQMejiAAVf3BEAIimlrJunlLCdXLbRhEBcugDUCVbTtFXgNA9_msi_ORIiif6RauUHIrlpKdI9PQ1fKQqwVTtXA5uNvWvvqmk=s16000" /></a></div><br /><br /></div><div><br /></div><div>2 - Once the download finishes, use either of the grep commands below to print all the http/https URLs</div><div><br /></div><b><blockquote>grep -o -E "https?://[][[:alnum:]._~:/?#@&'()*+,;%=-]+" /tmp/sitemap.xml<br />OR<br />grep -Eo "(http|https)://[a-zA-Z0-9./?=_%:-]*" /tmp/sitemap.xml </blockquote></b><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjgt5VBXd7iX9wafelQgW2myXf6ZnkoLdIuEtKJDr9eoE43nh4cFwgJkC6FdrWkCfkYG-o3xCjff1a4M4WujYbqdsyARqfXHzRghwBNf3dOJo1NJWBLBOG7ppd9WT4c7xA8ETKdTPGlClMQkEQNb9Zw12DIeSKx_QZAG82pWIcjChgd_srIlhsRZmaT2dA" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="403" data-original-width="1012" src="https://blogger.googleusercontent.com/img/a/AVvXsEjgt5VBXd7iX9wafelQgW2myXf6ZnkoLdIuEtKJDr9eoE43nh4cFwgJkC6FdrWkCfkYG-o3xCjff1a4M4WujYbqdsyARqfXHzRghwBNf3dOJo1NJWBLBOG7ppd9WT4c7xA8ETKdTPGlClMQkEQNb9Zw12DIeSKx_QZAG82pWIcjChgd_srIlhsRZmaT2dA=s16000" /></a></div><br />So, I got all the URLs and submitted them for review. </div><div><br /></div><div><br /></div><div><i><br /></i></div><i>Thank you!! </i><div><i>I hope this topic gave you all the information you needed. If you have any further questions or would like more detailed directions, feel free to contact us. We look forward to talking to you.</i></div>Lokesh Carpenter
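<div>The challenge above was that we don't know how many URLs the sitemap holds. As a rough sketch only, the first grep command from step 2 could be wrapped so that duplicates are dropped, the list is saved, and the total is printed. The function name <b>extract_urls</b> and the output path <b>/tmp/sitemap-urls.txt</b> are my own assumptions for illustration and are not part of the steps above.</div>

```shell
#!/bin/sh
# extract_urls is a hypothetical helper name, not part of the original steps.
# It prints every distinct http/https URL found in the given file, one per line.
extract_urls() {
    grep -Eo "https?://[][[:alnum:]._~:/?#@&'()*+,;%=-]+" "$1" | sort -u
}

# Run only if the sitemap from step 1 has already been downloaded.
sitemap=/tmp/sitemap.xml
urls=/tmp/sitemap-urls.txt   # assumed output path, change as needed
if [ -f "$sitemap" ]; then
    extract_urls "$sitemap" > "$urls"
    echo "Unique URLs found: $(wc -l < "$urls")"
fi
```

<div>Keep in mind that a standard sitemap declares the namespace http://www.sitemaps.org/schemas/sitemap/0.9 in its first tag, so that address will probably show up in the list too and should be removed before submitting.</div>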