Recent fig DNA testing by Richard F. and Brian M.

Richard · June 14, 2023, 9:37am

Some members here are aware that Brian M. and I have been obtaining fig DNA marker data from a lab in Sacramento. Brian’s interest is in finding synonyms among fig cultivars whereas I’ve been capturing them to see if they also occur in whole genome sequencing of figs I have underway at Arizona Genomics Institute.

Here are the markers in use at the lab:
CSP Labs Fig primers-1.xls (23 KB)

Brian has found that the Black Madeira and Figo Preto have identical “fingerprints” according to the lab markers.

I was curious about this and sent samples of them to the lab along with several others that are being sequenced at AGI. Here is the raw marker data I received in return:
251899 Fig.xlsx (12.8 KB)

I then processed the data into a more standard form, showing the counts returned (or lack thereof) for each marker. In the table, the F and R refer to the two parts of each marker which are listed in the file above. I’ve put Black Madeira and Figo Preto in the first two data columns so you can see they really are identical in this marker set:

	Black Madeira	Figo Preto	Bardacik	Burgan Unknown	Filacciano Bianco	Janice Kadota Seedless	Martinenca Rimada	Nuestra Senora del Carmen	Unknown Pastiliere	Violette de Bordeaux
Frub398 F	189	189	189	187,189	187	187	189	187	187,189	187,189
Frub398 R	191	191	191	191,193	193	193	191	193	191,193	191
V230 F	109	109	102	109	109		109	102	102	109
V230 R			115		115	115		115	115	115
Frub391 F	173	173	169	165	171	173	165	177	169	165
Frub391 R	175	175	173	171	173	186	173	186	186	171
Frac241 F	284	284	282,284	284	284	284	284	284	284
Frac241 R					288				288	288
V149 F	215	215	207	215	207	215	215	215	200	207
V149 R	219	219	219	219	219	219		219	215	215
FCUP008 F	154	154	152		154		154	154	154	152
FCUP008 R	171	171		169,171	173	173,177	171	171	171	171
FCUP027 F	196	196	192,196		196		192,196			192
FCUP027 R	208	208		202,208	208	208		208	208	208
FCUP016 F	162	162	162	162	162	162	162	162	162	166
FCUP016 R	168	168	168				168			168
FCUP038 F	160	160		158	166	160,166	160	158,166	166	158
FCUP038 R	183	183	173,179	181	183	183	183		172,175	170
FCUP069 F	183,185	183,185		183	183	183,185	183,185	183	175
FCUP069 R			191,193	193	193			191,193	193	191,193
FCUP044 F	199,201	199,201	201	197,201	197		197,199	199	200	197,201
FCUP044 R					208	208		208	208
Frac13 F			123	123	123	123	123	123	123	123
Frac13 R	138	138	138				138	138	136
FCUP042 F	169	169	169	169	169	169	169	169	167,169	169
FCUP042 R							175
FCUP045 F	121	121	121	121		121	121	121	119,121	121
FCUP045 R	131	131	131	131	131		131	131
FCUP070 F			165						167,169
FCUP070 R	171,175	171,175	183	171	171,173	171,173	175,179	173,175	173	179
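
For readers who want to check the "identical fingerprint" claim themselves, here is a minimal sketch in Python. It is not the lab's software or the exact processing used for this post; the helper names `parse_cell` and `identical` are made up for illustration, and only a few rows from the table above are copied in. It assumes two cultivars match at a marker only when their value sets are exactly equal, with a blank counting as an empty set.

```python
def parse_cell(cell: str) -> frozenset:
    """Turn a table cell such as '187,189' (or a blank) into a set of integers."""
    return frozenset(int(v) for v in cell.split(",") if v.strip())


# A few rows copied from the table above:
# (marker, Black Madeira, Figo Preto, Bardacik)
rows = [
    ("Frub398 F", "189", "189", "189"),
    ("V230 F", "109", "109", "102"),
    ("V230 R", "", "", "115"),
    ("Frub391 F", "173", "173", "169"),
    ("FCUP070 R", "171,175", "171,175", "183"),
]


def identical(rows, i: int, j: int) -> bool:
    """True when columns i and j agree on every marker, blanks included."""
    return all(parse_cell(r[i]) == parse_cell(r[j]) for r in rows)


print(identical(rows, 1, 2))  # Black Madeira vs Figo Preto
print(identical(rows, 1, 3))  # Black Madeira vs Bardacik
```

On these rows the first comparison is a match and the second is not. A match here only means agreement on the markers tested.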

The markers in use are not ideal, but they are good for ferreting out possible synonyms. Beyond that, though, they have little utility. For example, if you want to get an idea of the relationships between these figs, the markers first need further vetting to eliminate those with missing values. Here’s what you get:

	Bardacik	Black Madeira	Burgan Unknown	Filacciano Bianco	Janice Kadota Seedless	Martinenca Rimada	Nuestra Senora del Carmen	Unknown Pastiliere	Violette de Bordeaux
Frub398 F	{189}	{189}	{187,189}	{187}	{187}	{189}	{187}	{187,189}	{187,189}
Frub398 R	{191}	{191}	{191,193}	{193}	{193}	{191}	{193}	{191,193}	{191}
Frub391 F	{169}	{173}	{165}	{171}	{173}	{165}	{177}	{169}	{165}
Frub391 R	{173}	{175}	{171}	{173}	{186}	{173}	{186}	{186}	{171}
V149 F	{207}	{215}	{215}	{207}	{215}	{215}	{215}	{200}	{207}
FCUP016 F	{162}	{162}	{162}	{162}	{162}	{162}	{162}	{162}	{166}
FCUP042 F	{169}	{169}	{169}	{169}	{169}	{169}	{169}	{167,169}	{169}
FCUP070 R	{183}	{171,175}	{171}	{171,173}	{171,173}	{175,179}	{173,175}	{173}	{179}
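
The vetting step can be sketched as a simple filter, assuming "missing" means a blank cell for at least one cultivar. The function name `complete_markers` is hypothetical, and the three rows below are taken from the first table, restricted to three cultivars to keep the example short.

```python
def complete_markers(table: dict) -> dict:
    """Keep only the markers for which every cultivar has a non-blank value."""
    return {
        marker: cells
        for marker, cells in table.items()
        if all(cell.strip() for cell in cells)
    }


# Columns: Black Madeira, Janice Kadota Seedless, Martinenca Rimada
table = {
    "Frub398 F": ["189", "187", "189"],
    "V230 F": ["109", "", "109"],     # blank for Janice Kadota Seedless
    "V149 R": ["219", "219", ""],     # blank for Martinenca Rimada
}

print(list(complete_markers(table)))  # ['Frub398 F']
```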

From that a diagram of nearest neighbor relations can be made, which shows the number of marker mismatches between each fig. Notice the unlikely close relations to Janice-Kadota Seedless.
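
One way to count mismatches is sketched below. How the diagram itself was produced is not stated in this post, so treat this as an assumption: two figs mismatch at a marker whenever their value sets are not exactly equal. The profiles are copied from the vetted table above for four of the cultivars; `mismatches` is an illustrative name.

```python
from itertools import combinations

# Marker order: Frub398 F, Frub398 R, Frub391 F, Frub391 R,
#               V149 F, FCUP016 F, FCUP042 F, FCUP070 R
profiles = {
    "Black Madeira": [{189}, {191}, {173}, {175}, {215}, {162}, {169}, {171, 175}],
    "Filacciano Bianco": [{187}, {193}, {171}, {173}, {207}, {162}, {169}, {171, 173}],
    "Janice Kadota Seedless": [{187}, {193}, {173}, {186}, {215}, {162}, {169}, {171, 173}],
    "Nuestra Senora del Carmen": [{187}, {193}, {177}, {186}, {215}, {162}, {169}, {173, 175}],
}


def mismatches(a, b) -> int:
    """Number of markers at which two profiles have different value sets."""
    return sum(x != y for x, y in zip(a, b))


# Print the mismatch count for every pair of cultivars.
for a, b in combinations(profiles, 2):
    print(f"{a} vs {b}: {mismatches(profiles[a], profiles[b])}")
```

Under this rule, Janice Kadota Seedless and Nuestra Senora del Carmen differ at 2 of the 8 markers, and Janice Kadota Seedless and Filacciano Bianco at 3.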

PhilaGardener · June 14, 2023, 9:55am

Thanks for working to unravel and understand some of these relationships!

Audi_o_phile · June 14, 2023, 5:39pm

I am impressed by the tremendous effort that you are putting into this, and am excited to watch the results continue to unravel the confusing story of how different fig cultivars are related to each other. Keep up the good work @Richard!

etheth32992 · June 14, 2023, 5:51pm

This is incredible. Thank you for the information, and I hope this keeps going. The work is very much appreciated.

NJpete · June 15, 2023, 3:09am

I’d give you 2 hearts if I could, an extra for extra coolness.

kybishop · June 15, 2023, 4:37am

Amazing work @Richard!

It’s so wild that these figs can exhibit such differences in ripening times, and sometimes even flavor, despite having the same DNA. It’s incredible how much of an effect epigenetics has.

I’m very much looking forward to more revelations about synonyms and closely related cousins.

Richard · June 15, 2023, 4:52am

They do not have the same DNA. Instead, 11 markers returned the same fluorescence values (including blank) when annealed with the chromosomes of the specimens.

kybishop · June 15, 2023, 11:09am

Ah I see, thank you for the clarification

gibberellin · June 16, 2023, 1:34am

This is very cool to see; there’s so much left to learn by applying even classic molecular biology methods to commonly cultivated fruit! It’s been cost-prohibitive to do whole genome sequencing until recently, and it’s still a bit too pricey for the hobby market: we’re talking roughly 1k USD per plant genome, and then you still need some expertise to generate a good genome assembly.

Out of curiosity, can you share more details about the markers being tested? It’s obviously some kind of PCR or hybridization method given the primer sequences you shared. But what are the marker values exactly? Are these PCR amplicon sizes?

Richard · June 16, 2023, 1:44am

You cannot get a reasonably accurate perennial plant genome for US$ 1k, or even 5k.

They are among a standard set of markers used in kits sold to university labs and also by manufacturers of automated SSR hardware, e.g. Illumina. If I were to search for the origin I’d try the BMC journal dedicated to markers. They were selected by the lab for Brian’s project, and I stuck with them for consistency. Note that Brian and I have very different goals – see OP.

gibberellin · June 16, 2023, 4:54am

You cannot get a reasonably accurate perennial plant genome for US$ 1k, or even 5k.

You absolutely can, but only at scale: not as a single sample, and not direct to consumer. And to be clear, I’m just talking sequencing cost here (though sample prep doesn’t really add that much extra cost, maybe another $100). But you might be surprised by how much prices have come down in the last few years. Just this year the prices dropped nearly 3x once again for both Illumina and PacBio. Illumina is cheaper, of course, but you won’t get any decent assembly from that alone. Long read sequencing has come a long way, though, and you can get some darn good assemblies with just PacBio. If you’re going for something complicated like a hexaploid, you might need to throw in some additional HiC to get homologous chromosomes assembled.

But you don’t have to take my word for it!

U.S. list price is $995 for sequencing reagents for one Revio SMRT Cell, which has an expected yield of 90 Gb, equivalent to a 30X human genome. (source)

Note that the quoted price is for a human genome at about 3 gigabases haploid, which is a few times larger than many plant genomes (though some of them do get crazy big, like redwood at >26 gigabases!).
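
The arithmetic behind that comparison, as a rough sketch: expected coverage is just sequencing yield divided by haploid genome size. The function name `coverage` is illustrative, and this ignores everything other than raw yield (library prep, assembly, read quality), so it says nothing definitive about total project cost.

```python
def coverage(yield_gb: float, genome_gb: float) -> float:
    """Approximate fold coverage from sequencing yield and haploid genome size."""
    return yield_gb / genome_gb


# 90 Gb expected yield per Revio SMRT Cell, from the quote above.
print(coverage(90, 3))   # human genome, about 3 Gb -> 30.0
print(coverage(90, 26))  # a 26 Gb genome: well under 4x from one cell
```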

Richard · June 16, 2023, 5:35am

@gibberellin
That’s for a human genome. Now price a perennial (e.g. Plinia edulis) for which there is no reference genome.

gibberellin · June 16, 2023, 6:00am

Yeah there will always be outliers, and to estimate the cost you’d need to know roughly the size of the genome (there are methods to do this without sequencing just based on the mass of the chromosomes themselves).

But it’s exciting to think how much we’ll be able to learn by widespread sequencing of just the handful of the most common commercial fruit trees like apples and stonefruit. All of those are on the order of 5-10x smaller than a human genome (OK, there are triploid apples, and those are going to take a bit more). And the tech has advanced to the point where this is done routinely for de novo genome assembly (i.e. no reliance at all on mapping to anything previously sequenced).

And to the topic at hand: fig genomes are only a bit over 300 megabases! It’s feasible to do those for hundreds of dollars apiece using current tech! After you buy a million dollar sequencer and plop it down in your 10 million dollar lab.

That’s why I think your Fig project is very cool, it’s a great first look into the nearly unexplored world of “heirloom” plant genomics!

Richard · June 16, 2023, 6:56am

Horticulturists have long thought that genomics is simpler than the actual reality.

The majority of fruit crops are currently outliers.

This is not entirely true in cases without reference genomes, because other data is needed which can drive the cost significantly higher.

Accurate polyploid genomes are still not within reach. They are presently considered the holy grail by the research community.

I believe I’m qualified to evaluate it.

This statement does not match the quotes I have in hand from PacBio Sequel Labs in the U.S. for high-quality libraries.

gibberellin · June 16, 2023, 7:06am

Ah well I didn’t come here to argue so I won’t derail the discussion any further!

I do, however, know what I’m talking about. I do this for a living.

Richard · June 16, 2023, 7:22am

Then provide me with a quote from your lab for a high quality Ficus carica reference genome including all costs (sample preparation, PacBio Sequel sequencing, HQ analysis, resequencing, etc. through library construction).

Ahmad · June 19, 2023, 2:29pm

The next synonyms that I am (and I am sure others too) interested in investigating are Strawberry Verte, Battaglia Green and Paradiso.

Richard · June 19, 2023, 3:21pm

The Paradiso in Europe is different from the Paradiso distributed from NCGR Davis.

cerana_lim · June 20, 2023, 3:12pm

I really admire this work and will follow if there will be further studies.
Pardon my ignorance of DNA markers. I’m trying to grasp the data you shared and have some questions that I would greatly appreciate answers to.

I’m guessing the DNA marker data in the sheet is in kb, showing the position of the electrophoresis bands, which allows a genetic distance to be computed via the Jaccard coefficient?
As far as I know, to expand an existing distance relationship diagram you need the raw marker data (electrophoresis data) from the previous study, and you must use the same markers (primers).
If I am aiming to expand the work of Ikegami (2008), do you personally think Ikegami would provide such data to a random person like myself? I posted this question on ourfigs but got less clear answers.
Is there a specific reason why you chose to make a distance relationship diagram that is not a dendrogram? (Is it because you view Aradhya’s 2010 approach as flawed?)

Thank you for reading.
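
For anyone unfamiliar with the Jaccard coefficient mentioned above, here is a minimal sketch of how it is computed on two sets of marker values. Whether it is an appropriate measure for this particular data set is a separate question; the function name `jaccard` and the choice to return 1.0 for two empty sets are assumptions made for illustration. The example values are the Frub398 F entries for Burgan Unknown and Black Madeira from the table in the first post.

```python
def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; taken as 1.0 here when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


burgan_unknown = {187, 189}  # Frub398 F
black_madeira = {189}        # Frub398 F

similarity = jaccard(burgan_unknown, black_madeira)
print(similarity)      # 0.5
print(1 - similarity)  # the corresponding Jaccard distance, 0.5
```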

Richard · June 20, 2023, 5:52pm

No. They are reaction counts.

He’s congenial, but during busy times of the year will ignore some email.

Further, counts from SSR markers are questionable. Occurrences in high-quality genome sequences are missing or far less frequent than counted by PCR.

See section 1.8 in http://wireilla.com/papers/ijcsa/V12N4/12422ijcsa01.pdf