Designing the molecular probe with NUPACK

NUPACK is a cloud-based software suite developed at Caltech for modelling and designing the secondary structure of DNA, including multi-strand structures. It can be used for either analysis or design.

The sequence for the DNA dogbone used in our project was generated by NUPACK.

To arrive at the design of our first dogbone, which detects miR-193b, we wrote and tested over 50 unique NUPACK scripts across 3 iterations.

The biggest changes our prototype went through were a switch in the enzyme we use to circularize our DNA and a reduction in the overall size of our structure. By switching the ligation enzyme from CircLigase to T4 ligase we were able to reduce cost by a factor of 315! Decreasing the size and modifying the DNA shape allowed us to switch ligation enzymes while ensuring our final structure was stable enough to stay together on its own.

When used as an analysis tool, NUPACK calculates the most probable secondary structure of one or more DNA strands from user-inputted nucleotide sequences. The user can also input other parameters such as salt concentration, DNA concentration, and temperature. NUPACK uses the “nearest neighbour method” of DNA stability calculation to do this, a method accepted in the field as reliable and accurate.

The nearest neighbour method relies on the calculation of free energy: the more negative the free energy of a structure, the more stable it is and the more readily it forms. The free energy of a duplex is estimated by adding a term for every pair of adjacent nucleotides together with the nucleotides to which they bind. For example, the free energy for the sequence 5’-AGCC-3’ binding with its complement 3’-TCGG-5’ would be the sum of the free energies for AG/TC, GC/CG, and CC/GG binding. By calculating the free energy of all the physically possible shapes of a DNA structure, NUPACK determines the most probable secondary structure of one or multiple DNA sequences. Our final design has a free energy of -6.10 kcal/mol.
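The summation described above can be sketched in a few lines of Python. This is only an illustration, not NUPACK's implementation: it sums the stacking terms for a perfectly complementary duplex and leaves out initiation, loop, salt and concentration terms. The values in `STACK_DG` are approximate DNA nearest neighbour parameters at 37 °C, included here as an assumption for illustration; they should be checked against a published parameter table before being relied on. All function names are ours.

```python
# Approximate DNA stacking free energies at 37 C, in kcal/mol.
# ASSUMPTION: illustrative values only, verify against a published table.
# Keys are the top-strand dimer read 5'->3'; the bottom strand is its complement.
STACK_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_stacks(top):
    """List each nearest neighbour stack as (top 5'->3', bottom 3'->5')."""
    bottom = top.translate(COMPLEMENT)
    return [(top[i:i + 2], bottom[i:i + 2]) for i in range(len(top) - 1)]


def stack_energy(dimer):
    """Look up one stack; a dimer and its reverse complement are the same stack."""
    if dimer in STACK_DG:
        return STACK_DG[dimer]
    return STACK_DG[reverse_complement(dimer)]


def stacking_free_energy(top):
    """Sum the stacking terms of a fully complementary duplex."""
    return sum(stack_energy(top_dimer) for top_dimer, _ in duplex_stacks(top))


if __name__ == "__main__":
    for top_dimer, bottom_dimer in duplex_stacks("AGCC"):
        print(f"{top_dimer}/{bottom_dimer}: {stack_energy(top_dimer):.2f} kcal/mol")
    print(f"stacking total: {stacking_free_energy('AGCC'):.2f} kcal/mol")
```

For 5’-AGCC-3’ the sketch lists the three stacks AG/TC, GC/CG and CC/GG and adds their terms. A full calculation of the kind NUPACK performs also scores every alternative structure, which is why the number it reports for a whole probe cannot be reproduced by this sum alone.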

When used as a design tool, NUPACK generates a DNA sequence from user-inputted specifications such as structure shape, required nucleotide regions and ensemble defect limits (a measure of how far the sequence is expected to deviate from the specified structure at equilibrium). NUPACK generates candidate sequences within the inputted constraints and returns the one that best matches the specified structure, that is, the one with the lowest ensemble defect. Since NUPACK’s algorithm involves an aspect of randomness, the user may get more stable structures by inputting more limiting parameters. Understanding and exploiting this was important for us when we were designing our DNA structure.

Source: NUPACK model 

Our design includes three fundamental sites that were present in all iterations.

Fig 1. Diagram of miRNA binding regions.

The first is the reverse complement of the miRNA sequence. This sequence is split between two regions of our structure, the toehold and the stem:

  • The toehold is the part of the miRNA-binding sequence that sits unpaired in the loop. The miRNA binds to this part first. If the toehold is too short, the miRNA binds at a slower rate; if it is too long, the miRNA has a higher chance of binding without opening up the structure and starting RCA. 9 nucleotides is the ideal length for a toehold [1], so this is the length used in our design.
  • The stem is the double-stranded part of the structure. It contains the remainder of the reverse complement of the miRNA. All four of the miRNA sequences we are working with are 22 nt long. Since our toehold length is locked at 9 nt, our stem has to be at least 13 nt long.
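The split described in the two bullets above can be sketched as follows. This is a minimal illustration under our own assumptions: the function names are hypothetical, the example miRNA is a made-up 22 nt placeholder rather than one of our targets, and both possible toehold positions are returned because the choice between them is a design decision.

```python
# RNA base -> complementary DNA base
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def probe_binding_site(mirna):
    """DNA reverse complement of an RNA target, written 5'->3'."""
    return mirna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def split_toehold_stem(mirna, toehold_len=9):
    """Return both possible (toehold, stem) splits of the binding site.

    The toehold can be taken from either end of the binding site;
    the rest of the site goes into the stem.
    """
    site = probe_binding_site(mirna)
    stem_len = len(site) - toehold_len
    return {
        "toehold_at_5prime": (site[:toehold_len], site[toehold_len:]),
        "toehold_at_3prime": (site[stem_len:], site[:stem_len]),
    }


if __name__ == "__main__":
    placeholder_mirna = "ACGUACGUACGUACGUACGUAC"  # hypothetical 22 nt sequence
    for name, (toehold, stem) in split_toehold_stem(placeholder_mirna).items():
        print(f"{name}: toehold {toehold} ({len(toehold)} nt), "
              f"stem {stem} ({len(stem)} nt)")
```

With a 22 nt target and a 9 nt toehold, either split leaves 13 nt for the stem, which is where the minimum stem length comes from.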

The presence of strand displacement in our design ensures the specificity of our probes and provides a sensitive means of detecting miRNAs.

The second fundamental aspect of our design is the restriction enzyme site. Once the DNA-RNA complex is made, phi29 polymerase binds and produces a very long strand of DNA. Because these strands are very long, the sample mixture will not be homogeneous. There is also a higher chance that the DNA strands clump together and gather in one spot. Both of these factors substantially decrease the reliability and the quantitative value of the results. We therefore put a restriction enzyme site in all of our designs. The site is amplified along with the rest of the DNA, so we can cut the amplified DNA and obtain a homogeneous mixture.

The third design constraint is the presence of a site for the binding of molecular beacons. Although our final workflow does not include molecular beacons, we wanted to test them because they have benefits of their own. All of the iterations include two 20-nucleotide sites for the binding of molecular beacons on the single-stranded region of the probes.

1. Toehold-initiated rolling circle amplification for visualizing individual microRNAs in situ in single cells

In our first iteration, we had a separate site for each of the fundamental aspects of our system. 

Fig 2. First dogbone structure iteration, rendered by NUPACK.

We designed a site for the restriction enzyme HaeIII (GGCC) in one of the loops. The idea was to add primers to the system before adding the restriction enzyme. These primers would contain GGCC and would bind the amplified DNA at every copy of the site. We would then add the restriction enzyme to cut the amplified DNA into smaller pieces. While this idea could have worked, we realized there are more efficient methods for our purpose, which are discussed in iterations 2 and 3.

In addition, we placed two 20-nucleotide sites for the molecular beacons, one on each side. The presence of these sites, together with the primer site for restriction enzyme cutting, led to a very large probe (158 nt), which is more expensive to synthesize.

Furthermore, after designing this probe we realized that circularizing it would require an extremely expensive enzyme able to ligate single-stranded DNA, because the probe started and ended in the single-stranded region. This would make our assay very expensive and not viable for commercialization. In the next iterations, we devised a method to solve this important issue.

The big change we made in our second iteration was cutting down our design size. We did this by putting the GGCC restriction enzyme site inside the molecular beacon attachment site, which reduced our probe size to 100 nt! Removing 58 nt lowered the cost of producing our probe.

The shorter length made the structures in this iteration less stable. To compensate for the lost stability, we started testing both possible toehold locations on the structure and choosing the more stable one, rather than choosing the toehold region based on the miRNA secondary structure.

Even though we were able to shorten our structure, ligating the loose ends of the DNA with CircLigase was still very expensive. It wasn’t until our final iteration that we figured out a way around this problem and made our system economically viable.

Fig 3. Final dogbone probe structure, rendered by NUPACK.

In our final iteration, we fixed our circularization problem. Our solution was to move the loose ends of our DNA from the bottom of the single-stranded loop to the double-stranded stem. This change allowed us to ligate our DNA with a cheaper enzyme, T4 ligase, instead of CircLigase.

Changing the location of the DNA break meant that our system was now 315 times cheaper! It also meant that our structure became a lot less stable. To make a stable dogbone structure that met our new requirements, we added a G/C pair to the stem, making it 14 nt instead of 13 nt. We have some ideas about how we could change our structure so that we don’t have to lengthen our stem, but for now our design works with 14 nt. By experimenting in NUPACK we learned that, to get an optimal structure, there had to be 5 nt of bound nucleotides on each side of the loose ends, and the break also had to sit between two G/C pairs. Implementing these new constraints in our NUPACK scripts allowed us to find a sequence that was stable as a dogbone, would still open up to bind the miRNA, and could be ligated using T4 ligase.
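The two rules for placing the break can be written as a small check. This is a sketch under our own assumptions, not part of our NUPACK scripts: the function names are hypothetical, the stem is given as one fully paired strand written 5'->3', and the example stem is a made-up 14 nt placeholder.

```python
def nick_is_allowed(stem, nick, min_flank=5):
    """Check a break placed between stem[nick - 1] and stem[nick].

    stem: one strand of the double-stranded stem, 5'->3', every base paired.
    The break must leave at least min_flank paired nucleotides on each side
    and must sit between two G/C pairs.
    """
    if not 0 < nick < len(stem):
        return False
    enough_flank = nick >= min_flank and len(stem) - nick >= min_flank
    gc_on_both_sides = stem[nick - 1] in "GC" and stem[nick] in "GC"
    return enough_flank and gc_on_both_sides


def allowed_nicks(stem, min_flank=5):
    """All break positions in the stem that satisfy both rules."""
    return [n for n in range(1, len(stem)) if nick_is_allowed(stem, n, min_flank)]


if __name__ == "__main__":
    placeholder_stem = "ATGCAGGCCTGATA"  # hypothetical 14 nt stem strand
    print(allowed_nicks(placeholder_stem))
```

For a 14 nt stem the flank rule alone leaves only positions 5 to 9, and the G/C rule narrows that further, which shows how tightly these constraints limit where the loose ends can go.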

The experiments performed in wet-lab led to a much greater understanding of our system. In our future work, we will use these understandings to further refine the system.

The first important understanding came from the molecular beacon vs SYBR Gold experiment. Through that experiment, we realized SYBR Gold is a much better choice for staining the DNA, as it produces a lot more signal and is a lot cheaper. This leaves us with one fewer design constraint in our system. But without the molecular beacons, how would the restriction enzymes perform their task?

The answer to this question came from our restriction enzyme test. Our miRNA itself contains the GGCC sequence that our restriction enzyme recognizes, so the stem of the dogbone structure also contains GGCC. In our experiment, we added neither primers nor molecular beacons, so how would the restriction enzyme cut the amplified DNA? The answer is that the amplified DNA naturally forms stems wherever the original stem is amplified. HaeIII can therefore cut at each of those sites, resulting in a homogeneous mixture. Other miRNAs will contain different restriction enzyme sites, which can be identified very easily. A different restriction enzyme can therefore be used for each miRNA, and no additional restriction enzyme site is necessary.
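Identifying a usable site in another miRNA can be sketched as a simple scan. The example below is a minimal illustration: only HaeIII (GGCC) comes from our work, the other enzymes in `ENZYME_SITES` are common four-cutters added here as examples, the function name is ours, and the miRNA is a made-up placeholder. Because these recognition sites are palindromic, a site found in the miRNA sequence is also present in its reverse complement in the probe stem.

```python
# Recognition sites of a few four-cutter restriction enzymes (all palindromic).
# HaeIII is the enzyme used in the text; the others are illustrative additions.
ENZYME_SITES = {
    "HaeIII": "GGCC",
    "AluI": "AGCT",
    "MspI": "CCGG",
    "RsaI": "GTAC",
}


def sites_in_mirna(mirna, enzymes=ENZYME_SITES):
    """Map enzyme name -> 0-based positions of its site in the miRNA (as DNA)."""
    dna = mirna.upper().replace("U", "T")
    hits = {}
    for name, site in enzymes.items():
        positions = [
            i for i in range(len(dna) - len(site) + 1)
            if dna[i:i + len(site)] == site
        ]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    placeholder_mirna = "ACUGGCCAUAGCUUACGAUCGA"  # hypothetical 22 nt sequence
    print(sites_in_mirna(placeholder_mirna))
```

A site is only useful if it falls in the part of the sequence that ends up double-stranded in the amplified DNA, so positions returned by this scan still need to be checked against the toehold and stem split.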

The removal of both of these sites results in a much less constrained design for the probes. As a result, the probes could be smaller and cheaper. In addition, with fewer constraints and a smaller size, NUPACK should be able to produce probes with no defects and exactly 13 base pairs in their stem. A shorter stem that contains only the nucleotides complementary to our miRNA would make the dogbone more likely to open and would probably increase the sensitivity of our system.

In essence, our system is modular. For each miRNA, a different probe can be easily synthesized and a different restriction enzyme can be chosen. Our focus has been on miR-193b for small cell lung cancer, but our system could be applied to the detection of any mature miRNA.