Ramith's Space – https://ramith.fyi/

An inspiring open science journey to remember 💙

https://ramith.fyi/an-open-science-journey/ · Tue, 13 Feb 2024 04:29:00 GMT

Aya (the model and the dataset) will be released six hours from now. In the meantime, I thought I'd write down my thoughts on this journey and the things I learned while collaborating with so many people across the world who shared a common set of values and goals. I've been part of this effort since June 2023, and time flew by so fast that I had almost forgotten how I joined in the first place. So this is a reflection for me to look back on this day, from sometime in the future.


We all know that data is a cornerstone of AI advancements. But apart from English, most of the world's languages have negligible representation on the Internet. Can we change this with the power of community collaboration? That is what Aya was all about: a massive global effort to uplift under-resourced languages in today's natural language processing landscape.

Before elaborating on how I joined this project and its specifics, I want to think out loud about open science efforts. Over the course of this project, I saw,

  • people of various ages and levels of expertise gathering to build datasets for under-resourced languages
  • like-minded individuals coming together for a common goal and inspiring more people to work on these research questions
  • people who are new to research leading this effort, paying careful attention to very subtle details, and encapsulating the work in research papers [1,2].

Watching all of this unfold on Discord was so inspiring. And finally seeing a multilingual large language model (LLM) proficient in 101 languages – and, more importantly, seeing Aya-101 [1] handle my native language, Sinhala (සිංහල), too – was such a joy 🎉❤️.


I asked Aya these two questions in Sinhala: 1) වෙහෙසකර අභියෝගයක් සාර්ථකව ජය ගත් පසු කුමක්ද කරන්න ඕනේ? ("What should you do after successfully overcoming an exhausting challenge?") 2) ලංකාවෙ නිදහස් දිනය කවද්ද? ("When is Sri Lanka's Independence Day?")
While there is a long way to go for Sinhala LLMs, I'm honestly surprised that it follows Sinhala instructions quite nicely!


The Start

If I recall correctly, I joined the C4AI (Cohere For AI$^\diamond$) Discord server during my final year of undergrad. I saw so many wonderful initiatives by this community to help students and researchers who wanted to learn about AI and ultimately contribute to research. I didn't follow these initiatives closely due to other time commitments, but every once in a while I would check out what the community was up to.


Filling out a Google Form (May 2023)

On May 8th, I was getting ready for a morning run and saw this message on Discord.


Whenever I fill out something like this, I always think about whether it's something I can actually do given my other commitments$^\ddagger$ 😅

I knew I didn't have much time to dedicate, but the ask seemed reasonable – it sounded like knowing the language was enough – so I filled it out. Words like "multilingual" and "underrepresented languages" made the Google form very tempting.


An Email ✉️

June 2023 was a great month; I was finishing up experiments for my ICML 2023 workshop paper and writing it up. And then I received this email..

It was sort of unexpected.. it felt serious, but then again it seemed like something I could do in my free time (a weekly meeting and spreading the word about the project sounded good and doable 🤷🏻‍♂️).

So I replied that I'd love to help out and represent my native language!$^\dagger$

Journey as a Language Ambassador

I attended the first meeting, where I was introduced to Project Aya. The presentation was incredible... I thought to myself, "a great research group with a clear vision of what they want to achieve in 2023..". There was so much positivity and hope within this community, and I loved that energy!

Somewhat of a rough start for Sinhala 🤕

Now it was my turn to contribute to my language. I remember going to the Aya annotation platform$^\star$ and filling out my details (such as the languages I speak).

I was surprised to see Sinhala prompts and completions 😃, but I soon realized they were machine translations that, more often than not, didn't make much sense 🤕 (see the left side of the picture below for some context).

So we gradually started to refine those. Soon, with the help of many Sinhala contributors, we had a massive pool of prompts and completions that we kept refining.

Realizing that we need to speed up

From June to the end of September we gathered a lot of contributions, but we soon realized our pace was too slow 😬. So we visualized$^\star$ our goal and worked towards it, and we spread the message through many more contributors (kudos to Jalina, Chamod, Nawoda, and Chanuka).

(a discord message from the Aya Server) 

So we accelerated our pace! 🇱🇰

Over the remaining three months, all the Sinhala contributors helped immensely, and we even surpassed our original goals!

The pictures below capture some of the initiatives we took to gather more contributors..


📑 Papers accepted to workshops, Sampling and Optimization in Discrete Space (SODS) ፨ and Differentiable Almost Everything (DiffAE) 〆 at ICML 2023 🎉

https://ramith.fyi/papers-accepted-to-icml-workshops/ · Sat, 24 Jun 2023 07:07:44 GMT
Paper TL;DR : We introduce a differentiable approach to search for phylogenetic trees. We optimize the tree and ancestral sequences to reduce the total evolutionary steps (parsimony cost).


Check out our work at ICML 2023 workshops - Sampling and Optimization in Discrete Space (SODS) ፨  (Saturday, July 29) and Differentiable Almost Everything (DiffAE) 〆 (Friday, July 28)


Update 2023.09.19: Eric J. Ma has written a very detailed article on our paper's key contribution: making the trees and sequences differentiable. You can read it here. It does a great job of explaining our method.



Lion Optimizer (optimizer discovered through evolutionary search)

https://ramith.fyi/lion-optimizer-optimizer-discovered-through-evolutionary-search/ · Sun, 14 May 2023 18:54:13 GMT

I recently discussed the Lion optimizer at the Journal Club of the Wadduwage Lab. It's an optimizer discovered through regularized (aging) evolution, and the authors deal with the vast search space through clever tricks. You can find my slides and the video below (the video's audio quality is a bit low, though).
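For reference, here is a minimal NumPy sketch of the Lion update rule as I understand it from the paper (function and variable names are my own, not the authors' code):

import numpy as np

def lion_update(w, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    # The update direction is only the *sign* of an interpolated momentum;
    # this sign-based update is what the evolutionary search discovered.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    w = w - lr * (update + weight_decay * w)        # step with decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * grad            # momentum is updated after the step
    return w, m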

ESM-2 (evolutionary-scale prediction of atomic level protein structure with a language model)

https://ramith.fyi/esm-2-evolutionary-scale-prediction-of-atomic-level-protein-structure-with-a-language-model/ · Fri, 10 Feb 2023 22:42:19 GMT
✏️
I write these paper 'summaries' to understand each paper clearly by summarizing it and synthesizing the related literature. I hope they might be helpful to some readers too. If you have any feedback, please write to me at hello<at>ramith.fyi or comment below.

Highlights of the ESM-2 Paper

💡
• Trains protein language models of up to 15B parameters$^\bigstar$

• Infers structure directly from the primary sequence using an LLM

• The LM leverages the evolutionary patterns captured during pretraining to produce atomic-level predictions

• An order of magnitude faster (up to 60x) at high-resolution structure prediction

• Presents the ESM Metagenomic Atlas (structural characterization of more than 617 million$^\ddagger$ metagenomic proteins$^\dagger$)


1. Introduction

1.1 - Structure and Function are Hidden in Sequences

The biological properties of a protein influence which position(s) in its sequence can undergo mutations. Based on these constraints, we can identify evolutionary patterns such as coevolution and conservation of amino acids, and from these patterns we can infer properties of a protein's function and structure.$^\star$

Usually we rely on aligning sequences before we can draw conclusions about function and structure. This intermediate representation, known as a multiple sequence alignment (MSA), is expensive to build because we have to 1) search for related sequences first$^\star$, and 2) align them.

What if we can get rid of this intermediate representation? That's one aspect this paper accomplishes.

1.2 - Large language models (LLMs)

Historically, LMs were pretrained with objectives such as predicting the next word in a sentence. But Devlin et al. [BERT] showed that masking some words in the input and trying to predict them (the "masked language modeling" objective, MLM) is a better pretraining strategy$^\star$.

1.3 - Contributions

Inspired by this widely adopted strategy, the authors hypothesise that learning to fill in missing amino acids forces the model to learn information valuable enough to infer structure. They therefore scale protein language models from 8 million up to 15 billion parameters. Doing so reveals the following:

  • Atomic-level structure prediction directly from sequence
  • A strong correlation between perplexity and structure-prediction accuracy
  • A 60x speed improvement at inference
  • No need to search for related sequences

Because of this one-to-two-order-of-magnitude speedup and the fact that no MSA is needed, they expand structure prediction to metagenomic proteins, a far larger and more diverse set. In summary, they:

  • Predict structures for all sequences (over 617M) in MGnify90$^\dagger$
    • Out of 617M proteins, 225M structures have high confidence.
      • Of the high-confidence structures, 76.8% are disjoint from the UniRef90 dataset by at least 90% sequence identity.
      • 12.6% have no experimental ground truth.

2. Method

2.1 - How does structure emerge from an LM trained on sequences?

The ESM-2 language model is trained on ~65 million unique sequences$^\bigstar$. With the MLM objective, we ask the model to predict missing pieces (amino acids) of a sequence using the neighbouring amino-acid context, so the model has to learn the inter-dependencies between amino acids. Previous work [1] and [2] showed$^\dagger$ that transformer models trained with MLM on protein sequences develop attention patterns that correspond to the residue–residue contact map.
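As a rough illustration of what the MLM objective looks like on protein sequences (a simplified sketch, not the actual ESM-2 training code, which additionally follows BERT's 80/10/10 masking scheme):

import torch

AA = "ACDEFGHIKLMNPQRSTVWY"
tok = {a: i for i, a in enumerate(AA)}
MASK_ID = len(AA)  # extra index reserved for the mask token

def mask_sequence(seq, mask_prob=0.15):
    # Randomly hide ~15% of residues; the model must recover them from the surrounding context.
    ids = torch.tensor([tok[a] for a in seq])
    targets = ids.clone()
    mask = torch.rand(len(ids)) < mask_prob
    ids[mask] = MASK_ID
    targets[~mask] = -100  # cross-entropy ignores these; loss is computed only on masked positions
    return ids, targets

ids, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
# loss = F.cross_entropy(model(ids).logits, targets, ignore_index=-100)  # "model" is hypothetical here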

After training the LM, the authors compute the contact map from the attention patterns using the approach in [2], which fits a logistic regression to identify contacts, as follows.

(Source: Rao, Roshan, et al. "Transformer protein language models are unsupervised structure learners." bioRxiv (2020))
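A schematic of that recipe (not the authors' implementation – the real one also applies average-product correction and trains on a small set of proteins) could look like this, where each residue pair's features are its symmetrized attention weights from every layer and head:

import numpy as np
from sklearn.linear_model import LogisticRegression

def contact_features(attn):
    # attn: (layers, heads, L, L) attention maps for one protein.
    # Symmetrize each map and flatten to one feature vector of size layers*heads per residue pair.
    sym = attn + attn.transpose(0, 1, 3, 2)
    return sym.reshape(sym.shape[0] * sym.shape[1], -1).T   # (L*L, layers*heads)

# Toy example: random "attention" for a length-50 protein and random contact labels.
rng = np.random.default_rng(0)
attn = rng.random((12, 8, 50, 50))
labels = rng.integers(0, 2, size=50 * 50)   # in practice: true residue-residue contacts

clf = LogisticRegression(max_iter=1000).fit(contact_features(attn), labels)
contact_prob = clf.predict_proba(contact_features(attn))[:, 1].reshape(50, 50)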

2.2 - Ok, what about atomic level structure? (Enter ESMFold)

While the contact map is extracted from the attention maps, to obtain the spatial coordinates of the atoms the authors use an equivariant transformer – the structure module introduced in AlphaFold. This module makes it possible to project out the atoms' spatial coordinates directly from the internal language-model representation. The full architecture is referred to in the paper as ESMFold.

Steps in ESMFold (sketched in code below)

  1. Process the sequence through ESM-2.
  2. Pass the representation learnt by ESM-2 to a series of folding blocks; each block sequentially updates a sequence representation and a pairwise representation.
  3. Pass the output to the structure module.
  4. Repeat with 3 steps of recycling (view code).
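In very rough pseudocode (a sketch of the control flow only – the module names here are placeholders, not the real ESMFold API):

def esmfold_sketch(sequence, esm2, folding_blocks, structure_module, n_recycles=3):
    # 1. Language model produces per-residue and pairwise representations.
    seq_repr, pair_repr = esm2(sequence)
    coords = None
    for _ in range(n_recycles + 1):            # initial pass + 3 recycling iterations
        s, z = seq_repr, pair_repr
        for block in folding_blocks:           # 2. each block updates both representations
            s, z = block(s, z, prev_structure=coords)
        coords = structure_module(s, z)        # 3. equivariant structure module -> 3D coordinates
    return coords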

(ESMFold architecture; image credit: Lin et al.)

Training: To train the structure model to output spatial coordinates, they use experimentally determined structures from the PDB (~25K clusters covering a total of ~325K structures), augmented with 12M structures predicted$^\ddagger$ with AlphaFold2.

Evaluation: 194 CAMEO proteins and 51 CASP14 proteins

This language-model-based approach vastly simplifies the usual SOTA structure-prediction pipeline by eliminating the need for the following$^\dagger$:

  • External evolutionary databases
  • Multiple sequence alignments (MSAs)
  • Templates


3. Results

3.1 - How well does it predict structures ?

As mentioned before, they evaluate performance on CAMEO and CASP14 proteins and check how well the structure was predicted using the TM-Score.

When predicting structure from single sequences alone, ESMFold achieves very good performance compared to AlphaFold and RoseTTAFold.

(Figure 2B from the ESM-2 paper)

3.2 - How important is the language model in the pipeline ?  

The key question is: how important is the representation learnt by the LM for the task of structure prediction? To quantify this we need a couple of metrics.

First, we need to characterize how well the language model (ESM-2) understands sequences; this is where perplexity comes in. We already have the TM-score to determine how well a predicted structure matches the ground truth.
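As a quick reminder, perplexity here is just the exponential of the average masked-token loss, so lower means the LM models the sequence better: $\text{perplexity} = \exp\big(-\frac{1}{|M|}\sum_{i \in M} \log p_\theta(x_i \mid x_{\setminus M})\big)$, where $M$ is the set of masked positions.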

Thus, the graph on the right of Fig. 2B shows that,

  • Predictions with high ESMFold TM-scores tend to come from sequences with low perplexity (numerically, the Pearson correlation coefficient is -0.55 on CAMEO and -0.67 on CASP14)

How can we achieve better perplexity?

Okay, now we know that having better language model representation (lower perplexity) leads to better structure prediction.
So how can we achieve better language model representation ? 🤔  Is scaling all you need ?

To answer this question, the authors explore the effect of scaling and look at what happens to the following:

  • Precision @ L
  • Change in perplexity
(Figure 1D of the paper)

They plot how the long-range precision @ L changes as we move from a smaller model (x-axis) to a larger one (y-axis). From the points above the diagonal, it seems that scaling does help achieve better long-range precision @ L (some proteins show improvement).

So is scaling 'the' answer ?

It seems it's not that simple. While P@L certainly increases with scale for some proteins, looking at how many evolutionarily related sequences were available tells another story: LMs do not perform well when there is little training data relevant to the query. That seems intuitive – more studying, more results – but it also suggests the gains lean more on memorization than on truly understanding the subject.



3.3 - Ok, what about prediction speed ?

  • A protein with 384 residues on a single NVIDIA V100 GPU => 14.2 seconds$^\star$
  • For shorter sequences, the speedup is ~60x

(ESMFold performs better for shorter sequences. However, since the pairwise representations have a complexity of $\mathcal{O}(n^3)$, performance gets worse for large sequence lengths. Other methods also need to search for and construct the MSA, which can take an additional >10 minutes.)

3.4 - Comparison with other protein language models


4. Conclusion

It's remarkable that scaling protein language models results in learning the structure hidden across databases of sequences, so that we no longer need to depend on the MSA.

Is it because the model has learnt to extract the signal we previously obtained through MSAs? And what can we say about sequences that had few evolutionarily related sequences in the training data – why does the model still struggle to reach decent performance on them? These would be very interesting directions to analyze.


Thanks for reading – I hope you found it useful. If you have any suggestions or comments, please share them below.

References

  1. Lin, Zeming, et al. "Evolutionary-scale prediction of atomic level protein structure with a language model." bioRxiv (2022): 2022-07.
  2. https://twitter.com/Eric_Wallace_/status/1592929060539469824

[Invited Talk] - Northeast Symposium on Biomedical Optics

https://ramith.fyi/invited-talk/ · Fri, 11 Nov 2022 19:30:47 GMT
NESBO 2022 | OCT Research
Northeast symposium on biomedical optics 2022

Abstract

Computational imaging performs intelligent measurements with a "brain" made of programmed optics such as metasurfaces. These programmable optics – though packed with billions of linear operations in a cubic millimeter – often perform poorly due to fabrication constraints. Here we propose a physics-informed quantization-aware training framework that accounts for these constraints and achieves robust designs. We discuss two types of metasurfaces, a learnable Fourier filter and a diffractive deep neural network, for applications such as phase imaging and phase-object classification while accounting for the aforementioned fabrication constraints.

Bio

Ramith is a Joint Post-Bac Fellow affiliated with the Wadduwage Lab and the So Lab in the Division of Science at Harvard University. He completed his B.Sc. degree in Electronic Engineering at the University of Moratuwa, Sri Lanka. As a Post-Bac Fellow at the Wadduwage Lab, he is currently working with Dr. Dushan Wadduwage on making learnable optical systems robust and realizable by factoring in practical considerations such as fabrication constraints. His research interests include using machine learning for scientific discovery, with a focus on the robustness, interpretability, and equitability of machine learning algorithms.

[Invited talk] - Nano-SymBioSys workshop at UiT, The Arctic University of Norway 🇳🇴

https://ramith.fyi/gave-talk-at-the-nano-symbiosys-workshop-at-uit-the-arctic-university-of-norway/ · Mon, 26 Sep 2022 07:13:12 GMT

Shared some aspects of the work @khpiyumantha & I are doing at Wadduwage Lab, during the Nano-SymBioSys workshop held at UiT, The Arctic University of Norway (@UiTNorgesarktis)

Workshop Day 1


📄 Paper accepted to IEEE Signal Processing Magazine

https://ramith.fyi/paper-accepted-to-ieee-signal-processing-magazine/ · Tue, 09 Aug 2022 01:09:00 GMT

Selected to Princeton Pathways to Graduate School Program

https://ramith.fyi/selected-to-princeton-pathways-to-graduate-school-program/ · Mon, 08 Aug 2022 18:38:00 GMT

Link - https://engineering.princeton.edu/graduate-studies/academic-pathways/prospective-graduate-students

"Princeton Engineering’s annual program Pathways to Graduate School for Rising College Seniors invites high-achieving students in science, engineering and math for a series of interactive workshops aimed at breaking down barriers and boosting success in applying for doctoral programs."

"Princeton Engineering brings together people from across academic disciplines, from industry, non-profits and government, and from all nations and backgrounds in a collaborative culture to achieve breakthroughs of benefit to humanity. Thus the Pathways program especially seeks candidates with strong potential to contribute – through their future research, teaching, and service – to the diversity and excellence of our academic community and STEM fields as a whole. Women and other historically underrepresented groups in STEM disciplines are encouraged to apply."
Princeton Engineering - Ramith Hettiarachchi
Started work as a Post Baccalaureate Fellow at Harvard

https://ramith.fyi/started-as-a-post-baccalaureate-fellow-at-harvard/ · Fri, 01 Jul 2022 02:50:00 GMT

📄 Internship project research paper accepted to ICCAR 2022

https://ramith.fyi/my-internship-project-research-paper-accepted-to-i/ · Wed, 13 Apr 2022 04:09:00 GMT

Introduction to Adam Optimizer & advancements leading to it

https://ramith.fyi/intro-to-optimizers/ · Wed, 09 Mar 2022 19:36:00 GMT

This is a presentation I gave at one of the Journal Club meetings of the computational imaging group at Harvard (wadduwagelab). It covers how (full-batch) gradient descent [1] evolved into the Adam optimizer [2] by tackling the optimization challenges along the way.
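For completeness, here is a minimal NumPy sketch of the Adam update rule from [2] (variable names are mine):

import numpy as np

def adam_update(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum (m) plus a per-parameter adaptive scale (v), both bias-corrected
    # because they start at zero; t is the 1-based step count.
    m = beta1 * m + (1 - beta1) * grad            # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v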

You can download the slides below. 👇

Adam-Review-2.pdf

Access Harvard FASRC through vscode without entering password & two step code every time

https://ramith.fyi/harvard-fasrc-single-sign-on/ · Tue, 08 Mar 2022 11:10:50 GMT
I usually get annoyed at having to enter my password and the two-step verification code every time I connect to the server via vscode.

Worse, I need to go through the same process when:

  • Opening a new folder in vscode
  • The connection gets disrupted

Luckily there's a handy solution documented in the FASRC Docs. I found that doc page through Stack Overflow$^\star$.

The trick is to authenticate only once, then send our extra SSH sessions (vscode) through that connection.

Step 1 - Modify the ~/.ssh/config file
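The original post shows the config as a screenshot; a minimal example along the lines of the FASRC docs (the host alias harvard, the login hostname, and the username are placeholders you would adapt) is:

Host harvard
    HostName login.rc.fas.harvard.edu
    User your_username
    ControlMaster auto
    ControlPath ~/.ssh/%r@%h:%p.conn
    ControlPersist yes

ControlMaster, ControlPath, and ControlPersist are the OpenSSH options that let later sessions reuse the first authenticated connection.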

Step 2 - Open a background ssh connection

ssh -CX -o ServerAliveInterval=30 -fN harvard
🎉 Once you've done step #2 one time, vscode won't ask for credentials again because we have already set up a background ssh connection.

I encourage you to read the original documentation for more comprehensive detail about these two steps.

Update 2023.01.22: For some reason this doesn't work if I initiate the first connection from the terminal. However, if I connect from vscode first, then it works.

How to setup a JAX/Tensorflow 1.15 environment in the FASRC Cluster

https://ramith.fyi/how-to-setup-a-tensorflow-1-15-environment-in-the-fasrc-cluster/ · Fri, 25 Feb 2022 02:29:19 GMT

Update 2023.07.17 - Due to a cluster update, some of the packages listed here no longer exist. @cschesch kindly shared the process that worked for him in the comments below. The general process is the same; feel free to read on to learn about FASRC modules.


Note : This guide is only for setting up TF in the FASRC Cluster. I followed the official documentation listed in the references. Skip to that section if you want to learn more.

Background info

I had a lot of trouble trying to set up JAX and older TensorFlow versions on the FASRC cluster. What I later realized was that, since there are lots of diverse projects being done in FAS, the cluster supports many modules that can be loaded with a single command. 😆❤️

Ok, now let's proceed with installing tensorflow 1.15.

Identify which CUDA and cuDNN versions are required by the tensorflow version you need to install. (in our specific case, we need CUDA 10.0 and cuDNN 7.4)

Build from source | TensorFlow

So now we know that tensorflow_gpu-1.15 needs CUDA 10.0 and cuDNN 7.4

1. Identify FASRC Modules to load

In FAS-RC we can load additional runtime libraries (cublas, cufftw, …). To see what's available, you can run the command module-query cuda. After that we can identify that we need,

  • cuda/10.0.130-fasrc01
  • cudnn/7.4.1.5_cuda10.0-fasrc01

Identify which versions are available

[ramith@xxxxxxx ~]$ module-query cuda

-----------------------------------------------------------------------------------------------------------------------------
  cuDNN
-----------------------------------------------------------------------------------------------------------------------------
    Description:
      The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep
      neural networks.

    Versions:
      HeLmod CentOS 7
            cudnn/5.1_cuda8.0-fasrc01............... x86-64 binary built against cuda 8.0
            cudnn/6.0_cuda7.5-fasrc01............... x86-64 binary built against cuda 7.5
            cudnn/6.0_cuda8.0-fasrc01............... x86-64 binary built against cuda 8.0
            cudnn/7.0.5_cuda8.0-fasrc01............. x86-64 binary built against cuda 8.0
            cudnn/7.0.5_cuda9.1-fasrc01............. x86-64 binary built against cuda 9.1
            cudnn/7.0_cuda9.0-fasrc01............... x86-64 binary built against cuda 9.0
            cudnn/7.1_cuda9.0-fasrc01............... x86-64 binary built against cuda 9.0
            cudnn/7.3.1.20_cuda10.0-fasrc01......... x86-64 binary built against cuda 10
            cudnn/7.4.1.5_cuda10.0-fasrc01.......... x86-64 binary built against cuda 10
            cudnn/7.4.1.5_cuda9.0-fasrc01........... x86-64 binary built against cuda 9.0
            cudnn/7.4.1.5_cuda9.2-fasrc01........... x86-64 binary built against cuda 9.2
            cudnn/7.6.5.32_cuda10.0-fasrc01......... x86-64 binary built against cuda 10.0
            cudnn/7.6.5.32_cuda10.1-fasrc01......... x86-64 binary built against cuda 10.1
            cudnn/7.6.5.32_cuda10.2-fasrc01......... x86-64 binary built against cuda 10.2
            cudnn/8.0.4.30_cuda11.0-fasrc01......... x86-64 binary built against cuda 11.0.3
            cudnn/8.0.4.30_cuda11.1-fasrc01......... x86-64 binary built against cuda 11.1
            cudnn/8.1.0.77_cuda11.2-fasrc01......... x86-64 binary built against cuda 11.2


    To find detailed information about a module, search the full name.

      module-query cudnn/8.1.0.77_cuda11.2-fasrc01

    You may need to specify the build "flavor" to get a single record

      module-query cudnn/8.1.0.77_cuda11.2-fasrc01 --flavor 'HeLmod CentOS 7'
      

    

-----------------------------------------------------------------------------------------------------------------------------
  CUDA
-----------------------------------------------------------------------------------------------------------------------------
    Description:
      Module that activates the CUDA libraries

    Versions:
      HeLmod CentOS 7
            cuda/7.5.18-fasrc01..................... install cuda toolkit and samples
            cuda/8.0.61-fasrc01..................... install cuda toolkit and samples
            cuda/9.0-fasrc02........................ install cuda toolkit and samples
            cuda/9.1.85-fasrc01..................... install cuda toolkit and samples
            cuda/9.2.88-fasrc01..................... install cuda toolkit and samples
            cuda/10.0.130-fasrc01................... install cuda toolkit and samples
            cuda/10.1.243-fasrc01................... install cuda toolkit and samples
            cuda/10.2.89-fasrc01.................... install cuda toolkit and samples
            cuda/11.0.3-fasrc01..................... install cuda toolkit and samples
            cuda/11.1.0-fasrc01..................... install cuda toolkit and samples
            cuda/11.4.2-fasrc01..................... install cuda toolkit and samples
      Easy Build
            CUDA/9.2.88.............................
            CUDA/10.0.130...........................


    To find detailed information about a module, search the full name.

      module-query CUDA/10.0.130

    You may need to specify the build "flavor" to get a single record

      module-query CUDA/10.0.130 --flavor 'Easy Build'
      
     

Load the selected CUDA and cuDNN version

module load cuda/10.0.130-fasrc01 cudnn/7.4.1.5_cuda10.0-fasrc01

2. Create Environment

conda create -n tf1.15_cuda10.0.130 python=3.6 numpy six wheel

3. Activate the conda environment & Install Tensorflow

source activate tf1.15_cuda10.0.130

pip install --upgrade tensorflow-gpu==1.15

4. Check if tensorflow uses GPU 👀

(tf1.15_cuda10.0.130) [ramith@xxxxxx ~]$ python
Python 3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
True

5. Add new environment to Jupyter Lab (so that we can select it)

conda install -c anaconda ipykernel -y
python -m ipykernel install --user --name=fyp_env

6. Working in JupyterLab ?

At first, even though TensorFlow used the GPU when run from the terminal, it didn't work in Jupyter 😬.

Ok, found the solution! So here's the thing: before you start the JupyterLab instance, you can actually specify which modules to load!

(When creating the Jupyter instance, you can include these modules!! 😃)
(Working!)

7. JAX ?

Initially I had lots of issues like the following,

  • Unimplemented: DNN library is not found.
  • Couldn't invoke ptxas --version

The issue was that I couldn't get cuDNN to work. I tried various things – editing PATH variables, etc. 😆 – but nothing seemed to work. Ultimately I got it working by loading cudnn/8.1.0.77_cuda11.2-fasrc01 when creating the Jupyter environment, which was pretty straightforward!! 😃
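As a quick sanity check (assuming JAX is already installed in the conda environment), you can confirm that JAX actually sees the GPU:

import jax

print(jax.devices())           # should list a GPU device rather than only the CPU
print(jax.numpy.ones(3) * 2)   # runs a tiny computation on the default device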

Important ❗️

Every time you connect to the cluster, you will need to load the additional CUDA and cuDNN modules as shown below, or specify the modules when you create the notebook (as shown above).

[ramith@xxxxxxx ~]$ module load cuda/10.0.130-fasrc01 cudnn/7.4.1.5_cuda10.0-fasrc01
[ramith@xxxxxxx ~]$ source activate tf1.15_cuda10.0.130

References

Deep Residual Learning for Image Recognition

https://ramith.fyi/deep-residual-learning-for-image-recognition/ · Wed, 26 Jan 2022 21:25:38 GMT

Highlights of ResNet Paper

  • Presents a residual learning framework to train very deep networks more easily.
  • Trains networks 8x deeper than VGG nets.
  • 3.57% error on the ImageNet test set and a 28% relative improvement on the COCO object detection dataset.
  • Topped the leaderboards in ILSVRC & COCO 2015 (ImageNet classification, detection, and localization; COCO detection & segmentation).

Introduction

From the ImageNet classification results in 2014–2015, it was evident that deeper networks help learn richer levels of features. As shown in the figure below, VGG-Net and GoogLeNet reduced the top-5 error rate further by going deeper.

So if we just stack more and more layers, does that help? It turns out it doesn't. One reason is the vanishing gradient problem, which was studied by Sepp Hochreiter in 1991 [14] and discussed over the years [1], [8]. This problem makes it difficult for a network to start converging at all. It has been addressed through various initialization methods and through batch normalization [16].

Even when a deeper network does start converging, researchers have found a degradation of accuracy: as the depth of a model increases, accuracy saturates and then degrades rapidly. As is evident from the learning curves, this is not due to overfitting$^{\star}$.



The authors of the ResNet paper argue that, even if we increase the depth, there should theoretically be a solution that gives at least the same accuracy: the shallow network plus extra layers that compute the identity transform. The problem seems to be that optimizers cannot reach that solution. So can we use a trick to get there more easily? That's what the authors hypothesize.

Methodology - Deep Residual Learning

Fitting a residual mapping

Let's say we need to approximate a function $\mathcal{H}$ with some set of layers of a neural network. The authors propose that, rather than learning $\mathcal{H}$ directly, we let those few layers approximate a residual function $\mathcal{H}(\mathrm{x})-\mathrm{x}$. 🤔

Let's denote this new function by $\mathcal{F}$. We have now rewritten the original function we need to approximate as $\mathcal{H}(\mathrm{x})=\mathcal{F}(\mathrm{x})+\mathrm{x}$.

Ok, so what benefit does this give?  🤷🏻

With this reformulation, $\mathcal{H}$ is split into the sum of a function $\mathcal{F}$ and the input. In the degradation problem we saw earlier, the issue was that learning the identity function was hard$^{\star}$. With residual learning, however, it should be easy for the optimizer to drive the weights of the layers so that $\mathcal{F}$ becomes a zero mapping; we are then left with $\mathcal{H}(\mathrm{x})=\mathcal{F}(\mathrm{x})+\mathrm{x}=\mathrm{x}$, the identity mapping.
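A minimal PyTorch sketch of this idea (a basic residual block with an identity skip; the projection/downsampling shortcuts from the paper are omitted):

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # y = F(x) + x, where F is two 3x3 conv layers; if F is driven to zero, the block is the identity.
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)   # the skip connection adds the input back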


ImageNet classification with deep convolutional neural networks (AlexNet)

https://ramith.fyi/imagenet-classification-with-deep-convolutional-neural-networks/ · Wed, 19 Jan 2022 18:26:26 GMT

The AlexNet paper by Krizhevsky et al. [1] was published in 2012. It is a highly influential paper in computer vision, showing that deep networks, together with efficient utilization of GPUs for training, can help build better models. The authors achieved top-1 and top-5 test-set error rates of 37.5% and 17.0%, while the best in ILSVRC 2010 was 47.1% and 28.2%.

Introduction

Before 2008, computer vision researchers mostly evaluated their methods on datasets with tens of thousands of images. Thanks to the ImageNet dataset [2], published in 2009, researchers got the opportunity to move from small-scale datasets such as MNIST and CIFAR-10/100 to a much larger dataset with over 15 million labeled images from more than 22,000 categories.

The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [3] which started in 2010 focuses on a subset of ImageNet which has 1.2 million training images, 50,000 validation images, and 150,000 testing images.

Architecture

Key characteristics of the AlexNet architecture : 8 learned layers (5 Conv, 3 Fully-connected)

1. ReLU nonlinearity

Inspired by Nair and Hinton's work [4], the authors utilize the ReLU non-linearity and find that it makes the training process several times faster.

2. Multiple GPU Training

The authors employ a cross-GPU parallelization approach to train AlexNet, where communication between GPUs happens only in the layers that require it. They utilize 2x GTX 580 GPUs.

Just to get an idea on how GPUs now and then (2012) compare with each other. Source : GadgetVersus

3. Local Response Normalization

Krizhevsky et al. found that their local normalization scheme helps achieve better generalization. For this procedure, they sum over $n$ adjacent kernel maps at the same spatial location, as shown in the equation below. This normalization is applied in certain layers of AlexNet.
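Reconstructing the equation (from the AlexNet paper; $a^i_{x,y}$ is the activity of kernel $i$ at position $(x,y)$ and $N$ is the total number of kernels in the layer): $b^i_{x,y} = a^i_{x,y} \Big/ \Big(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} \big(a^j_{x,y}\big)^2\Big)^{\beta}$, with the constants $k=2$, $n=5$, $\alpha=10^{-4}$, $\beta=0.75$ used in the paper.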

Local response normalization reduced the top-1 and top-5 error rates by 1.4% and 1.2%.

4. Overlapping Pooling

The overlapping pooling scheme reduced the top-1 and top-5 error rates by 0.4% and 0.3%. Furthermore, the authors observed that models with overlapping pooling are slightly more difficult to overfit.

Reducing Overfitting

The authors utilize 1) data augmentation and 2) dropout to reduce overfitting. Data augmentation is done in two ways.

1) Generating image translations and horizontal reflections

2) Altering RGB channel intensities in training images

The authors cite their earlier work on dropout [6] and mention that it reduced overfitting substantially while roughly doubling the number of training iterations needed to converge.
