Recent fig DNA testing by Richard F. and Brian M.

Some members here are aware that Brian M. and I have been obtaining fig DNA marker data from a lab in Sacramento. Brian’s interest is in finding synonyms among fig cultivars, whereas I’ve been collecting the marker data to see whether the markers also occur in the whole genome sequencing of figs I have underway at Arizona Genomics Institute (AGI).

Here are the markers in use at the lab:
CSP Labs Fig primers-1.xls (23 KB)

Brian has found that the Black Madeira and Figo Preto have identical “fingerprints” according to the lab markers.

I was curious about this and sent samples of them to the lab along with several others that are being sequenced at AGI. Here is the raw marker data I received in return:
251899 Fig.xlsx (12.8 KB)

I then processed the data into a more standard form, showing the counts returned (or lack thereof) for each marker. In the table, the F and R refer to the two parts of each marker which are listed in the file above. I’ve put Black Madeira and Figo Preto in the first two data columns so you can see they really are identical in this marker set:

Black Madeira Figo Preto Bardacik Burgan Unknown Filacciano Bianco Janice Kadota Seedless Martinenca Rimada Nuestra Senora del Carmen Unknown Pastiliere Violette de Bordeaux
Frub398 F 189 189 189 187,189 187 187 189 187 187,189 187,189
Frub398 R 191 191 191 191,193 193 193 191 193 191,193 191
V230 F 109 109 102 109 109 109 102 102 109
V230 R 115 115 115 115 115 115
Frub391 F 173 173 169 165 171 173 165 177 169 165
Frub391 R 175 175 173 171 173 186 173 186 186 171
Frac241 F 284 284 282,284 284 284 284 284 284 284
Frac241 R 288 288 288
V149 F 215 215 207 215 207 215 215 215 200 207
V149 R 219 219 219 219 219 219 219 215 215
FCUP008 F 154 154 152 154 154 154 154 152
FCUP008 R 171 171 169,171 173 173,177 171 171 171 171
FCUP027 F 196 196 192,196 196 192,196 192
FCUP027 R 208 208 202,208 208 208 208 208 208
FCUP016 F 162 162 162 162 162 162 162 162 162 166
FCUP016 R 168 168 168 168 168
FCUP038 F 160 160 158 166 160,166 160 158,166 166 158
FCUP038 R 183 183 173,179 181 183 183 183 172,175 170
FCUP069 F 183,185 183,185 183 183 183,185 183,185 183 175
FCUP069 R 191,193 193 193 191,193 193 191,193
FCUP044 F 199,201 199,201 201 197,201 197 197,199 199 200 197,201
FCUP044 R 208 208 208 208
Frac13 F 123 123 123 123 123 123 123 123
Frac13 R 138 138 138 138 138 136
FCUP042 F 169 169 169 169 169 169 169 169 167,169 169
FCUP042 R 175
FCUP045 F 121 121 121 121 121 121 121 119,121 121
FCUP045 R 131 131 131 131 131 131 131
FCUP070 F 165 167,169
FCUP070 R 171,175 171,175 183 171 171,173 171,173 175,179 173,175 173 179

The markers in use are not ideal but are good for ferreting out possible synonyms. Beyond that, though, they have little utility. For example, if you want to get an idea of the relationships between these figs, then further vetting of the markers is needed to eliminate those with missing values. Here’s what you get after dropping them:

| Marker | Bardacik | Black Madeira | Burgan Unknown | Filacciano Bianco | Janice Kadota Seedless | Martinenca Rimada | Nuestra Senora del Carmen | Unknown Pastiliere | Violette de Bordeaux |
|---|---|---|---|---|---|---|---|---|---|
| Frub398 F | {189} | {189} | {187,189} | {187} | {187} | {189} | {187} | {187,189} | {187,189} |
| Frub398 R | {191} | {191} | {191,193} | {193} | {193} | {191} | {193} | {191,193} | {191} |
| Frub391 F | {169} | {173} | {165} | {171} | {173} | {165} | {177} | {169} | {165} |
| Frub391 R | {173} | {175} | {171} | {173} | {186} | {173} | {186} | {186} | {171} |
| V149 F | {207} | {215} | {215} | {207} | {215} | {215} | {215} | {200} | {207} |
| FCUP016 F | {162} | {162} | {162} | {162} | {162} | {162} | {162} | {162} | {166} |
| FCUP042 F | {169} | {169} | {169} | {169} | {169} | {169} | {169} | {167,169} | {169} |
| FCUP070 R | {183} | {171,175} | {171} | {171,173} | {171,173} | {175,179} | {173,175} | {173} | {179} |
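The vetting step described above (dropping any marker that is missing a value for at least one fig) can be sketched roughly as follows. This is only an illustration, not necessarily how the table was actually produced. It assumes the rows are pasted as whitespace-separated text, so that a marker with a missing call simply has fewer values than there are cultivars (10 in the first table). The names `RAW_ROWS`, `N_CULTIVARS` and `complete_markers` are made up for this example, and only a handful of rows from the first table are included.

```python
# A few rows copied from the first table in this post.
# Assumption: a blank cell leaves no token behind, so incomplete rows are shorter.
RAW_ROWS = """\
Frub398 F 189 189 189 187,189 187 187 189 187 187,189 187,189
V230 F 109 109 102 109 109 109 102 102 109
Frub391 F 173 173 169 165 171 173 165 177 169 165
FCUP042 F 169 169 169 169 169 169 169 169 167,169 169
FCUP042 R 175
FCUP070 F 165 167,169
FCUP070 R 171,175 171,175 183 171 171,173 171,173 175,179 173,175 173 179
"""

N_CULTIVARS = 10  # number of cultivar columns in the first table


def complete_markers(raw_rows, n_cultivars):
    """Return the marker names whose row has one call per cultivar."""
    kept = []
    for line in raw_rows.strip().splitlines():
        fields = line.split()
        marker = " ".join(fields[:2])  # e.g. "Frub398 F"
        calls = fields[2:]             # one token per non-blank cell
        if len(calls) == n_cultivars:
            kept.append(marker)
    return kept


if __name__ == "__main__":
    for marker in complete_markers(RAW_ROWS, N_CULTIVARS):
        print(marker)
```

With a real spreadsheet you would check for empty cells directly rather than counting tokens, since counting cannot tell you which fig is missing the value.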

From that, a diagram of nearest-neighbor relations can be made, which shows the number of marker mismatches between each pair of figs. Notice the unlikely close relations to Janice-Kadota Seedless.
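For anyone who wants to reproduce the mismatch counts, here is a minimal sketch using the eight complete marker rows from the table above. It assumes that two figs "mismatch" at a marker whenever their sets of values are not identical; the diagram may have been built with a different rule. The names `CULTIVARS`, `CALLS`, `build_profiles`, `mismatches` and `nearest_neighbor` are made up for this example.

```python
CULTIVARS = [
    "Bardacik", "Black Madeira", "Burgan Unknown", "Filacciano Bianco",
    "Janice Kadota Seedless", "Martinenca Rimada",
    "Nuestra Senora del Carmen", "Unknown Pastiliere", "Violette de Bordeaux",
]

# One string per marker, values in the same order as CULTIVARS.
CALLS = {
    "Frub398 F": "189 189 187,189 187 187 189 187 187,189 187,189",
    "Frub398 R": "191 191 191,193 193 193 191 193 191,193 191",
    "Frub391 F": "169 173 165 171 173 165 177 169 165",
    "Frub391 R": "173 175 171 173 186 173 186 186 171",
    "V149 F": "207 215 215 207 215 215 215 200 207",
    "FCUP016 F": "162 162 162 162 162 162 162 162 166",
    "FCUP042 F": "169 169 169 169 169 169 169 167,169 169",
    "FCUP070 R": "183 171,175 171 171,173 171,173 175,179 173,175 173 179",
}


def build_profiles(cultivars, calls):
    """Map each cultivar to a list of value sets, one set per marker."""
    profiles = {name: [] for name in cultivars}
    for marker, row in calls.items():
        values = row.split()
        if len(values) != len(cultivars):
            raise ValueError(f"{marker}: expected {len(cultivars)} values")
        for name, value in zip(cultivars, values):
            profiles[name].append(frozenset(value.split(",")))
    return profiles


def mismatches(a, b, profiles):
    """Count markers at which the two cultivars' value sets differ."""
    return sum(x != y for x, y in zip(profiles[a], profiles[b]))


def nearest_neighbor(name, profiles):
    """Return the other cultivar with the fewest mismatches (first one on ties)."""
    others = [n for n in profiles if n != name]
    return min(others, key=lambda n: mismatches(name, n, profiles))


if __name__ == "__main__":
    profiles = build_profiles(CULTIVARS, CALLS)
    for name in CULTIVARS:
        nn = nearest_neighbor(name, profiles)
        print(f"{name} -> {nn} ({mismatches(name, nn, profiles)} mismatches)")
```

Counting whole-marker mismatches is the simplest choice; treating partially overlapping values such as 187 vs 187,189 as a half match would give different distances.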


Thanks for working to unravel and understand some of these relationships!


I am impressed by the tremendous effort that you are putting into this, and am excited to watch the results continue to unravel the confusing story of how different fig cultivars are related to each other. Keep up the good work @Richard!


This is incredible. Thank you for the information, and I hope this keeps going. The work is very much appreciated.


I’d give you 2 hearts if I could, an extra for extra coolness.


Amazing work @Richard!

It’s so wild that these figs can exhibit such differences in ripening times, and sometimes even flavor, despite having the same DNA. It’s incredible how much of an effect epigenetics has.

I’m very much looking forward to more revelations as far as synonyms and closely related cousins :heart:


They do not have the same DNA. Instead, 11 markers returned the same fluorescence values (including blank) when annealed with the chromosomes of the specimens.


Ah I see, thank you for the clarification


This is very cool to see; there’s so much left to learn by applying even classic molecular biology methods to commonly cultivated fruit! It’s been cost-prohibitive to do whole genome sequencing until recently, and it’s still a bit too pricey for the hobby market: we’re talking roughly 1k USD per plant genome, and then you still need some expertise to generate a good genome assembly.

Out of curiosity, can you share more details about the markers being tested? It’s obviously some kind of PCR or hybridization method given the primer sequences you shared. But what are the marker values exactly? Are these PCR amplicon sizes?

You cannot get a reasonably accurate perennial plant genome for US$ 1k, or even 5k.

They are among a standard set of markers used in kits sold to university labs and also by manufacturers of automated SSR hardware, e.g. Illumina. If I were to search for the origin I’d try the BMC journal dedicated to markers. They were selected by the lab for Brian’s project, and I stuck with them for consistency. Note that Brian and I have very different goals – see OP.

You cannot get a reasonably accurate perennial plant genome for US$ 1k, or even 5k.

You absolutely can, but only at scale :slight_smile:
Not as a single sample, and not direct to consumer. And to be clear, I’m just talking sequencing cost here (though sample prep doesn’t really add that much extra cost, maybe another $100). But you might be surprised by how much prices have come down in the last few years. Just this year the prices dropped nearly 3x once again for both Illumina and PacBio. Illumina is cheaper of course, but you won’t get any decent assembly from that alone. Long read sequencing has come a long way, though, and you can get some darn good assemblies with just PacBio. If you’re going for something complicated like a hexaploid, you might need to throw in some additional Hi-C to get homologous chromosomes assembled.

But you don’t have to take my word for it!

U.S. list price is $995 for sequencing reagents for one Revio SMRT Cell, which has an expected yield of 90 Gb, equivalent to a 30X human genome. (source)

Note that quoted price is for a human genome at about 3 gigabases haploid, which is a few times larger than many plant genomes (though some of them do get crazy big like redwood at >26 gigabases!).
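As a rough back-of-the-envelope sketch of the arithmetic behind that quote: coverage is just total yield divided by genome size. The 90 Gb yield and $995 reagent price come from the quote, and the 3 Gb and 26 Gb genome sizes from the note above; the function names and the 30X target for a non-human genome are assumptions made for this example. It covers reagent list price only, not library prep, labor, or analysis.

```python
import math

CELL_YIELD_GB = 90.0  # expected yield of one Revio SMRT Cell (from the quote)
CELL_PRICE_USD = 995  # U.S. list price for reagents per cell (from the quote)


def coverage(genome_gb, cells=1, yield_gb=CELL_YIELD_GB):
    """Average depth of coverage for a haploid genome of the given size."""
    return cells * yield_gb / genome_gb


def cells_needed(genome_gb, target_coverage, yield_gb=CELL_YIELD_GB):
    """Whole SMRT Cells required to reach the target coverage."""
    return math.ceil(genome_gb * target_coverage / yield_gb)


def reagent_cost_usd(genome_gb, target_coverage):
    """Reagent list price only; excludes prep, labor, and analysis."""
    return cells_needed(genome_gb, target_coverage) * CELL_PRICE_USD


if __name__ == "__main__":
    for label, size_gb in [("human, ~3 Gb", 3.0), ("redwood, ~26 Gb", 26.0)]:
        print(label,
              f"one cell gives {coverage(size_gb):.1f}X;",
              f"30X needs {cells_needed(size_gb, 30)} cell(s),",
              f"about ${reagent_cost_usd(size_gb, 30)} in reagents")
```

The same functions work for any genome size estimate, which is why knowing the approximate size up front matters so much for pricing.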


That’s for a human genome. Now price a perennial (e.g. Plinia edulis) for which there is no reference genome.

Yeah there will always be outliers, and to estimate the cost you’d need to know roughly the size of the genome (there are methods to do this without sequencing just based on the mass of the chromosomes themselves).

But it’s exciting to think how much we’ll be able to learn by widespread sequencing of just the handful of the most common commercial fruit trees like apples and stonefruit. All of those are on the order of 5-10x smaller than a human genome (ok, there are triploid apples, and those are going to take a bit more). And the tech has advanced to the point where this is done routinely for de novo genome assembly (i.e. no reliance at all on mapping to anything previously sequenced).

And to the topic at hand: fig genomes are only a bit over 300 megabases! It’s feasible to do those for hundreds of dollars apiece using current tech! After you buy a million dollar sequencer and plop it down in your 10 million dollar lab :wink:

That’s why I think your Fig project is very cool, it’s a great first look into the nearly unexplored world of “heirloom” plant genomics!


Horticulturists have long thought that genomics is simpler than the actual reality.

The majority of fruit crops are currently outliers.

This is not entirely true in cases without reference genomes, because other data is needed which can drive the cost significantly higher.

Accurate polyploid genomes are still not within reach. They are presently considered the holy grail by the research community.

I believe I’m qualified to evaluate it.

This statement does not match the quotes I have in hand from PacBio Sequel Labs in the U.S. for high-quality libraries.

Ah well I didn’t come here to argue :grin: so I won’t derail the discussion any further!

I do, however, know what I’m talking about. I do this for a living.


Then provide me with a quote from your lab for a high quality Ficus carica reference genome including all costs (sample preparation, PacBio Sequel sequencing, HQ analysis, resequencing, etc. through library construction).

The next synonyms that I am (and I am sure others too) interested in investigating are Strawberry Verte, Battaglia Green and Paradiso.


The Paradiso in Europe is different from the Paradiso distributed from NCGR Davis.


I really admire this work and will follow any further studies.
Pardon my ignorance of DNA markers, but while trying to grasp the data you shared I came up with some questions that I would greatly appreciate having answered.

  1. I’m guessing the DNA marker data in the sheet are in kb units, showing the positions of the electrophoresis results, which allows a genetic distance to be computed via the Jaccard coefficient?

  2. As far as I know, to expand an existing distance relationship diagram you need the raw marker data (electrophoresis data) from the previous study, and you need to use the same markers (primers).
    If I am aiming to expand the work of Ikegami (2008), do you personally think Ikegami would provide such data to a random person like myself? I posted this question on ourfigs but did not get clear answers.

  3. Is there a specific reason why you chose to make a distance relationship diagram that is not a dendrogram? (Is it because you view Aradhya’s 2010 approach as flawed?)

Thank you for reading.

No. They are reaction counts.

He’s congenial, but during busy times of the year will ignore some email.

Further, counts from SSR markers are questionable. Occurrences in high-quality genome sequences are missing or far less frequent than counted by PCR.

See section 1.8 in